About Basis
Basis is a pioneering nonprofit organization dedicated to applied AI research, driven by two key objectives.
The first objective is to comprehend and develop intelligence. This encompasses establishing the mathematical foundations of reasoning, learning, decision-making, understanding, and explanation, while also creating software that embodies these principles.
The second objective is to enhance society’s capacity to tackle complex challenges. This involves broadening the scope, scale, and complexity of problems we can address today, and crucially, accelerating our capacity to solve future problems.
To meet these objectives, we are building a new technological infrastructure inspired by human reasoning, along with a collaborative organization that prioritizes human values.
About the Role
As an ML Systems Engineer at Basis, you will ensure that our training and evaluation infrastructure is fast, reliable, and scalable. You will manage the entire stack, from distributed training frameworks to cloud administration, enabling researchers to rapidly iterate on complex models while efficiently managing computational resources.
We are seeking engineers who possess a profound understanding of ML systems paired with operational excellence. The ideal candidate will have experience in distributed training at scale, expertise in debugging numerical instabilities, and the ability to manage cloud infrastructure that seamlessly transitions from experimentation to production. You will be the steward of training stability, an optimizer of computational costs, and a facilitator of reproducible research.
This position encompasses both traditional ML engineering and cloud/DevOps responsibilities. You will oversee GPU clusters, optimize cloud expenditures, ensure security and compliance, and build the infrastructure that allows researchers to focus on algorithms rather than operations.
We are looking for people who are committed to building robust ML infrastructure, who document issues and their solutions as a matter of habit, and who treat operational excellence as a core value.

