Our Mission
Aleph Alpha is at the forefront of foundation model pre-training in Europe. Our clients in finance, manufacturing, and public administration need models that understand German, comply with European regulations, and perform reliably in high-stakes environments. We are based in Heidelberg.
As we expand our pre-training team, we are looking for someone to take ownership of model evaluation: shaping the metrics that define success, building the systems that measure them, and giving our training team the insights it needs to iterate with confidence.
The Role
As a Senior AI Engineer focused on Pre-training Evaluation, you will own evaluation end to end, from methodology design through implementation and analysis. Your work will include curating benchmarks, assessing how well different metrics predict downstream performance, and optimizing evaluation pipelines and dashboards.
We are looking for a candidate who combines extensive research experience with strong engineering skills. Your evaluations will directly shape our training direction, data priorities, and resource allocation, and therefore the models we deliver.
This role is part of Aleph Alpha Research.
Your Responsibilities
- Own benchmarks end-to-end: Select, implement, and maintain the evaluation suite used during pre-training, encompassing dataset curation, scoring infrastructure, and result analysis.
- Build evaluation infrastructure: Develop and optimize pipelines that evaluate training checkpoints quickly, reliably, and reproducibly.
- Design aggregation and reporting: Define how benchmark results feed into training decisions, and build tools that make those results easy to interpret.
- Close capability gaps: Work with the product and post-training teams to identify areas for improvement and create benchmarks that measure progress on them.
- Own German evaluation: Ensure thorough assessment of German language capabilities, which are central to our value proposition.
- Correlate signals: Identify which pre-training metrics predict downstream and system-level performance.
