About the job
Join our team as a Senior Data Engineer, where you will play a pivotal role in building and managing scalable data ingestion and Change Data Capture (CDC) capabilities on our Azure-based Lakehouse platform. Your expertise will drive our engineering maturity as we deliver ingestion and CDC preparation through Python projects and reusable frameworks. We are seeking a professional who applies software engineering best practices, including clean architecture, rigorous testing, code reviews, effective packaging, CI/CD, and operational excellence.
Our platform emphasizes batch-first processing: streaming sources are landed in their raw form and processed in batch. We evolve toward streaming selectively, only where it is genuinely needed.
As part of the Common Data Intelligence Hub, you will collaborate closely with data architects, analytics engineers, and solution designers to create robust data products and ensure governed data flows across the enterprise.
- Your team is responsible for end-to-end ingestion and CDC engineering, including design, build, operation, observability, reliability, and reusable components.
- You will contribute to the development of platform standards, including contracts, layer semantics, and readiness criteria.
- While you will not primarily manage cloud infrastructure provisioning, you will work with the platform team to define requirements, review changes, and maintain deployable code for pipelines and jobs.
Platform Data Engineering & Delivery
- Design and develop ingestion pipelines utilizing Azure and Databricks services, including Azure Data Factory pipelines and Databricks notebooks/jobs/workflows.
- Implement and manage CDC patterns for inserts, updates, and deletes, accommodating late-arriving data and reprocessing strategies.
- Structure and maintain bronze and silver Delta Lake datasets, focusing on schema enforcement, de-duplication, and performance tuning.
- Create “transformation-ready” datasets and interfaces with stable schemas, contracts, and metadata expectations for analytics engineers and downstream modeling.
- Adopt a batch-first approach for data ingestion, ensuring raw landing, replayability, and idempotent batch processing while progressing towards true streaming as required.
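To illustrate the CDC and idempotent-batch responsibilities above, here is a minimal in-memory sketch of merging a batch of change events into keyed state. The `ChangeEvent` type, field names, and `apply_cdc_batch` function are hypothetical illustrations, not part of the platform; a production implementation would typically target a Delta Lake `MERGE` instead of a Python dict.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical change event shape: op is "I" (insert), "U" (update),
# or "D" (delete); ts is the source commit timestamp used to order
# events so late-arriving changes never overwrite newer state.
@dataclass(frozen=True)
class ChangeEvent:
    key: str
    op: str
    ts: datetime
    payload: Optional[dict] = None

def apply_cdc_batch(state: Dict[str, dict], events: List[ChangeEvent]) -> Dict[str, dict]:
    """Merge a CDC batch into keyed state.

    Idempotent at the batch level: replaying the same batch leaves the
    state unchanged, because events are applied in timestamp order and
    an event older than the currently held version is skipped.
    """
    merged = dict(state)
    # Sort so older events within the batch never clobber newer ones.
    for ev in sorted(events, key=lambda e: e.ts):
        current = merged.get(ev.key)
        if current is not None and ev.ts < current["_ts"]:
            continue  # late-arriving event older than held state: skip
        if ev.op == "D":
            merged.pop(ev.key, None)  # delete: drop the key
        else:
            # "I" and "U" both upsert in this simplified sketch.
            merged[ev.key] = {**(ev.payload or {}), "_ts": ev.ts}
    return merged
```

Replayability falls out of the same design: re-running `apply_cdc_batch` with an already-applied batch is a no-op, which is what makes batch reprocessing safe.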
Software Engineering for Data Frameworks
- Develop and maintain Python-based ingestion and CDC components as production-grade software, focusing on modules, packaging, versioning, and releases.
- Implement engineering best practices such as code reviews, unit/integration tests, static analysis, formatting/linting, type hints, and comprehensive documentation.
- Establish and enhance CI/CD pipelines for data engineering code and pipeline assets, covering build, testing, security checks, deployment, and rollback patterns.
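As a flavor of the production-grade Python components and unit testing described above, here is a small sketch: a typed, documented silver-layer de-duplication helper with a pytest-style test. The function name, parameters, and test data are hypothetical examples, not an existing framework API.

```python
from typing import Dict, Iterable, List

def dedupe_latest(rows: Iterable[dict], key: str, order_by: str) -> List[dict]:
    """Keep only the latest row per key, a common silver-layer step.

    Rows with a higher order_by value win; on ties, the later row
    encountered in the input wins.
    """
    latest: Dict[object, dict] = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] >= latest[k][order_by]:
            latest[k] = row
    return list(latest.values())

# A pytest-style unit test for the helper above: pytest would discover
# and run any function whose name starts with test_.
def test_dedupe_latest_keeps_newest_version():
    rows = [
        {"id": 1, "seq": 1, "v": "old"},
        {"id": 1, "seq": 2, "v": "new"},
        {"id": 2, "seq": 1, "v": "only"},
    ]
    result = {r["id"]: r["v"] for r in dedupe_latest(rows, "id", "seq")}
    assert result == {1: "new", 2: "only"}
```

In a CI/CD pipeline, tests like this run alongside static analysis, linting, and type checks before any pipeline asset is deployed.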