ABOUT MITHRL
At Mithrl, we envision a future where groundbreaking medicines are delivered to patients within months rather than years, and scientific discoveries unfold at an extraordinary pace.
We are pioneering the first commercially available AI Co-Scientist, a revolutionary discovery engine that converts chaotic biological data into actionable insights in mere minutes. Scientists can pose questions in natural language, and Mithrl provides comprehensive analyses, identifies novel targets, formulates hypotheses, and generates patent-ready reports.
Our remarkable growth:
Achieved 12X revenue growth year-over-year
Trusted by leading biotech firms and major pharmaceutical companies across three continents
Facilitating significant breakthroughs from target discovery to improved patient outcomes
ROLE OVERVIEW
We are seeking a Data Engineer specializing in Knowledge Graphs to develop the foundational infrastructure for Mithrl’s biological knowledge layer. You will collaborate closely with the Data Scientist focused on Knowledge Graphs to convert curated knowledge sources into scalable, dependable, and production-ready systems that support our entire platform.
Your responsibilities will include constructing ETL pipelines for extensive biological datasets, designing schemas and storage models for graph-structured data, and developing APIs that enable ML engineers and application teams to efficiently query and utilize the knowledge graph. Additionally, you will be responsible for ensuring the reliability, performance, and versioning of the knowledge graph infrastructure across releases.
This position serves as a vital link between biological knowledge ingestion and the high-performance engineering systems that apply it. If you are passionate about data modeling, schema design, graph storage, ETL processes, and scalable infrastructure, this role offers a unique opportunity to significantly influence the intelligence layer at Mithrl.
KEY RESPONSIBILITIES
Develop and maintain ETL pipelines for large-scale public biological datasets and curated knowledge sources
Design, implement, and refine schemas and storage models for graph-structured biological data
Create effective APIs and query interfaces that enable internal teams and AI systems to access nodes, relationships, pathways, and annotations, and to perform graph analytics
Collaborate with Data Scientists to operationalize curated relationships, harmonized variable IDs, metadata standards, and ontologies

