ABOUT MITHRL
We envision a world where innovative drugs and therapies reach patients in months rather than years, expediting breakthroughs that save lives.
Mithrl is building the world's first commercially available AI Co-Scientist: an advanced discovery engine that lets life-science teams turn chaotic biological data into insight in minutes. Scientists pose questions in natural language, and Mithrl responds with rigorous analysis, novel targets, and patent-ready reports.
Our success is evident:
12X year-over-year revenue growth
Trusted by leading biotech firms and major pharmaceutical companies across three continents
Driving significant breakthroughs from target discovery to patient outcomes
WHAT YOU WILL DO
Lead the design and operation of an AI-driven data ingestion and normalization pipeline that assimilates data from diverse sources, from raw Excel/CSV uploads to lab and instrument exports and processed outputs from internal systems.
Develop comprehensive schema mapping, coercion, and conversion logic, including unit normalization, metadata standardization, variable-name harmonization, vendor and instrument quirks, plate-reader formats, reference-genome and annotation updates, and batch-effect correction.
Use LLM-driven and classical data-engineering tools to structure messy or semi-structured tabular data: extracting metadata, inferring column roles and types, cleaning free-text headers, resolving inconsistencies, and producing final clean datasets.
Ensure that one-time transformations (normalization, coercion, batch correction) are executed during ingestion, so that downstream analytics and the AI Co-Scientist always operate on clean, canonical data.
Establish validation, verification, and quality control measures to detect ambiguous, inconsistent, or corrupted data before it enters the platform.
Collaborate with product teams, data science/bioinformatics colleagues, and infrastructure engineers to define and uphold data standards, ensuring that pipeline outputs integrate smoothly into downstream analysis and storage systems.
WHAT YOU BRING
Must-have:
5+ years of experience in data engineering or data wrangling with real-world tabular or semi-structured data.
Strong proficiency in Python,