About the job
As part of the development of DatahubV2, we are seeking a skilled Data Engineer who is well-versed in Java, Spark, and modern data architectures.
In this role, you will be a key member of the Data Foundation team, responsible for building, optimizing, and ensuring the reliability of data pipelines while adhering to best development practices and performance standards.
Your main responsibilities:
- Design, develop, and optimize efficient data pipelines (batch processing, distributed computing).
- Implement Spark 3 processes running on Kubernetes (Spark as a Service).
- Manipulate DatahubV2 data using Starburst (Trino) with SQL.
- Build and orchestrate data workflows on Astronomer / Apache Airflow using Python.
- Conduct application integrations in Java (and potentially Scala, depending on the application).
- Ensure the quality, performance, and reliability of data jobs.
- Contribute to the setup and maintenance of DevOps pipelines (GitLab, Jenkins, ArgoCD…).
- Monitor and analyze application logs via ELK (Log as a Service).
- Collaborate with the Data Foundation team, leveraging and evolving internal Python libraries.
- Participate in the documentation and continuous improvement of DatahubV2 practices.

