Qualifications
Required Skills & Experience:
- Demonstrated experience in Python for data engineering tasks (including PySpark, Pandas, etc.)
- Hands-on expertise with Databricks and the Spark ecosystem
- Strong understanding of ETL/ELT concepts, data modeling, and pipeline orchestration
- Experience with Microsoft SQL Server, including direct database connections
- Practical knowledge of ingesting Parquet data and managing large historical datasets
- Familiarity with Delta Lake and Structured Streaming in Databricks is a plus
- Understanding of secure data transfer protocols between on-premises and cloud platforms
- Excellent problem-solving abilities and the capability to work independently

Preferred Qualifications:
- Experience with AI/ML data preparation workflows
- Knowledge of data governance and compliance requirements related to customer and contract data
About the job
Join our innovative team at Inetum Polska as a Data Engineer, where you will apply your data engineering expertise in a fast-paced environment. Your role will be pivotal in ensuring smooth data migration and optimization for cutting-edge AI and ML projects. Don't miss the opportunity to contribute to our groundbreaking initiatives!
Key Responsibilities
Data Pipeline Development:
- Design, develop, and implement Python-based ETL/ELT pipelines to migrate data from on-premises MS SQL Server to our Databricks environment,
- Ensure effective ingestion of historical Parquet datasets into Databricks.
Data Quality & Validation:
- Establish validation, reconciliation, and quality assurance protocols to guarantee the accuracy and completeness of migrated data,
- Manage schema mapping, field transformations, and metadata enrichment to standardize datasets,
- Integrate data governance, quality assurance, and compliance into all migration processes.
Performance Optimization:
- Optimize pipelines for enhanced speed and efficiency, leveraging Databricks capabilities, including Delta Lake when applicable,
- Oversee resource utilization and scheduling for large dataset transfers.
Collaboration:
- Coordinate closely with AI engineers, data scientists, and business stakeholders to outline data access patterns needed for upcoming AI POCs,
- Work alongside infrastructure teams to ensure secure connections between legacy systems and Databricks.
Documentation & Governance:
- Maintain comprehensive technical documentation for all data pipelines,
- Adhere to best practices for data governance, compliance, and security throughout the migration process.
About Inetum
Inetum Polska is a prominent member of the global Inetum Group, dedicated to driving digital transformation across businesses and public institutions. With operations in cities such as Warsaw, Poznan, Katowice, Lublin, Rzeszow, and Lodz, we offer a diverse range of IT services. Our commitment to employee development is evident through our support for training, certifications, and participation in technology conferences. We also engage in local social initiatives, promoting charitable projects and an active lifestyle. At Inetum, we celebrate diversity and inclusivity, ensuring equal opportunities for all.

Globally, Inetum operates in 19 countries and employs over 28,000 professionals, focusing on four key areas:
- Consulting (Inetum Consulting): Strategic advisory services that empower organizations to define and implement innovative solutions.
- Infrastructure and Application Services: Tailored solutions that enhance operational efficiency and foster growth.