Lead Principal Machine Learning Engineer

Bjak, United States
On-site, Full-time

Experience Level

Mid to Senior

About the job

About the Role

At Bjak, we are building an advanced AI system that keeps context across interactions, plans its actions, and carries work forward over time.

In this role, you will turn research insights into production-ready machine learning systems. You will own the execution layer of our AI capabilities: training pipelines, inference systems, evaluation tooling, and deployment processes.

Key Responsibilities

  • Develop and manage comprehensive ML pipelines encompassing data preparation, model training, evaluation, inference, and deployment.
  • Fine-tune and adapt models using techniques such as LoRA, QLoRA, SFT, DPO, and distillation.
  • Design and implement scalable inference frameworks, optimizing for latency, cost, and reliability.
  • Build and maintain data systems that produce high-quality synthetic and real-world training datasets.
  • Execute evaluation pipelines focused on performance, robustness, safety, and bias, in collaboration with research leadership.
  • Oversee production deployment, including GPU optimization, memory management, latency reduction, and scaling strategies.
  • Collaborate closely with application engineers to seamlessly integrate ML systems into backend, mobile, and desktop applications.
  • Make practical trade-offs and ship improvements quickly, learning from real-world usage.
  • Operate within real production constraints: latency, cost, reliability, and safety.

Qualifications

  • Strong expertise in deep learning and transformer-based architectures.
  • Hands-on experience in training, fine-tuning, or deploying large-scale ML models in a production environment.
  • Proficiency in at least one modern ML framework, such as PyTorch or JAX, and the ability to pick up others quickly.
  • Experience with distributed training and inference frameworks like DeepSpeed, FSDP, Megatron, ZeRO, or Ray.
  • Strong software engineering skills, with the ability to write maintainable, production-quality code.
  • Familiarity with GPU optimization techniques, including memory efficiency, quantization, and mixed precision.
  • Comfortable owning ambiguous, zero-to-one ML projects from inception to completion.
  • A strong bias toward rapid deployment, iterative learning, and continuous system improvement.

About Bjak

Bjak is at the forefront of AI innovation, building intelligent systems that improve decision-making and streamline workflows. Join us and play a significant role in shaping the future of AI technology.