About the Role
At bjakcareer, we are building an advanced AI system designed to retain context across interactions, plan actions, and drive work forward effectively over time.
In this role, you will turn research insights into production-ready machine learning systems. You will own the execution layer of our AI capabilities: training pipelines, inference systems, evaluation tools, and deployment processes.
Key Responsibilities
- Develop and manage comprehensive ML pipelines encompassing data preparation, model training, evaluation, inference, and deployment.
- Fine-tune and customize models using techniques such as LoRA, QLoRA, SFT, DPO, and distillation.
- Design and implement scalable inference frameworks, optimizing for latency, cost, and reliability.
- Establish and sustain data systems to ensure high-quality synthetic and real-world training datasets.
- Build and run evaluation pipelines covering performance, robustness, safety, and bias, in collaboration with research leadership.
- Own production deployment, including GPU optimization, memory management, latency reduction, and scaling strategies.
- Collaborate closely with application engineers to seamlessly integrate ML systems into backend, mobile, and desktop applications.
- Make practical trade-offs and ship improvements quickly, learning from real-world usage.
- Operate within real production constraints: latency, cost, reliability, and safety.
Qualifications
- Strong expertise in deep learning and transformer-based architectures.
- Hands-on experience in training, fine-tuning, or deploying large-scale ML models in a production environment.
- Proficiency in at least one modern ML framework such as PyTorch or JAX, with the ability to pick up others quickly.
- Experience with distributed training and inference frameworks like DeepSpeed, FSDP, Megatron, ZeRO, or Ray.
- Robust software engineering skills, capable of writing maintainable, production-level code.
- Familiarity with GPU optimization techniques, including memory efficiency, quantization, and mixed precision.
- Comfortable owning ambiguous, zero-to-one ML projects from inception to completion.
- A strong bias toward shipping quickly, learning iteratively, and improving systems.
