Responsibilities
Multi-Modal Ingestion Pipeline: Design ETL/ELT pipelines to extract, decode, and store raw EO and IR video data in optimized formats such as WebDataset, TFRecords, or Parquet.
Sensor Synchronization & Alignment: Develop algorithms that synchronize EO and IR frames temporally and spatially for model training.
High-Throughput Data Loading: Build storage-to-GPU pipelines that keep GPU utilization high across multi-node training clusters without I/O bottlenecks.
Distributed Processing: Develop and optimize distributed data processing jobs with tools such as Apache Spark, Ray, or Apache Beam to handle large volumes of tactical video logs.
Data Quality & Versioning: Set up automated quality checks that filter out corrupted or blank frames, and ensure reproducible training runs through dataset versioning and lineage tracking.
Infrastructure Evaluation: Evaluate and implement storage solutions (e.g., MinIO, S3 tiering) that manage growing datasets efficiently while balancing cost and performance.
About Us
At Harmattan AI, we are at the forefront of developing autonomous and scalable defense systems. We recently closed a $200M Series B funding round, bringing our valuation to $1.4 billion. As we expand our teams and capabilities, we remain committed to delivering mission-critical systems to allied forces.
Our mission is driven by core values: creating impactful technologies, striving for excellence, setting ambitious goals, and tackling the most challenging technical problems. In our rigorous work environment, we expect ownership, precision, and effective execution.
About the Role
As a Data Engineer on our Foundational team, you will play a key role in building the robust data infrastructure behind our deep learning initiatives. Based in Paris, you will manage large volumes of raw, unstructured Electro-Optical (EO) and Infrared (IR) video data. Your goal will be to make our ML engineers more efficient by streamlining data access and ensuring reliable data pipelines.