About the job
About Us
Odyssey is at the forefront of artificial intelligence research, specializing in general-purpose world models that are revolutionizing consumer, enterprise, and intelligence applications. Our innovative models, such as the Odyssey-2 Pro, represent the next significant advancement in AI technology.
Position Overview
We are looking for passionate individuals dedicated to extracting maximum performance from complex systems. Our goal is to build inference infrastructure that can scale to hundreds of thousands of users within a year while handling vast and continuously growing datasets. Your role will be critical in ensuring our models achieve outstanding speed, reliability, and scalability during both training and inference, driving efficiency gains that reduce compute (TFLOPS) per user and the cost of training compute.
Key Responsibilities
- Optimize models for real-time serving to a user base in the hundreds of thousands.
- Design and execute distributed training strategies aimed at reducing training time and resource usage across extensive GPU clusters.
- Collaborate with a high-caliber team of ML researchers and engineers to ensure model architectures are performance-driven from the start.
- Build advanced tooling to pinpoint performance bottlenecks and stability issues in both training and deployment environments.
- Innovate new approaches, frameworks, and system designs that improve performance metrics throughout our model development and inference infrastructure.
- Enjoy a considerable degree of autonomy in making technical decisions.
- Utilize state-of-the-art GPUs in your work.