Foundation Model DevOps Engineer - AI Research Infrastructure
About ifm-us
The Institute of Foundation Models is a pioneering research lab dedicated to the advancement and practical application of foundation models in AI. We focus on nurturing talent and driving innovation that contributes to a robust knowledge economy.
Similar jobs
Sonsoft Inc.
Join Sonsoft Inc. as a Principal Consultant specializing in DevOps CI/CD, where you'll lead transformative projects that enhance software delivery and operational efficiency. As a key player in our dynamic team, you will leverage your expertise to design, implement, and optimize CI/CD pipelines, ensuring seamless integration and deployment processes. If you a…
usm2
Join our dynamic team at usm2 as a Continuous Integration Engineer. In this crucial role, you will be responsible for designing and implementing robust CI/CD pipelines, ensuring seamless integration and delivery of software products. Collaborate with cross-functional teams to optimize development workflows and enhance software quality.
Sonsoft Inc.
Join Sonsoft Inc. as a Genesys Principal Consultant and be part of a dynamic team dedicated to delivering exceptional customer experience solutions. In this role, you will leverage your expertise in Genesys platforms to design, implement, and support innovative contact center solutions for our clients. We are seeking an individual who is passionate about technology and has a strong background in consulting to help our clients achieve their business goals through effective use of Genesys solutions.
Mindlance
Join our dynamic team at Mindlance as a DevOps Engineer, where you will play a crucial role in optimizing our development and operational processes. Your expertise will be instrumental in deploying, managing, and scaling our applications efficiently. In this role, you will collaborate closely with development teams to automate and streamline our operations and processes. You will also monitor system performance, troubleshoot issues, and ensure the security and scalability of our infrastructure.
Collabera
Join our dynamic team at Collabera as a DevOps Engineer. In this pivotal role, you will be responsible for optimizing and automating our software deployment processes, ensuring seamless integration and delivery of applications. We are looking for an innovative thinker who thrives in a fast-paced environment and is passionate about cloud technologies.
360 IT Professionals
Join our dynamic team at 360 IT Professionals as a DevOps Engineer. In this role, you will collaborate with software developers and IT staff to oversee code releases and manage complex infrastructure systems efficiently. You will play a crucial part in automating processes and ensuring a seamless deployment of applications and services.
Sonsoft Inc.
Join Sonsoft Inc. as a Genesys Principal Consultant, where you will leverage your expertise in Genesys technology to drive innovative solutions for our clients. You will play a pivotal role in designing, implementing, and optimizing customer experience strategies that enhance client satisfaction and operational efficiency.
360 IT Professionals
Join our team as a DevOps Engineer and play a pivotal role in enhancing our infrastructure and deployment processes. You will work collaboratively with software developers and system operators to build, maintain, and optimize our systems and services. Your expertise will contribute to automating our workflows and ensuring high availability and performance of our applications.
Intuitive Surgical, Inc.
Join our innovative team at Intuitive Surgical, Inc. as a Principal Research Engineer. In this pivotal role, you will lead advanced research initiatives to enhance our cutting-edge surgical technologies. Your expertise will contribute to the development of innovative solutions that improve patient outcomes and streamline surgical procedures.
Intuitive Surgical, Inc.
Join Intuitive Surgical as a Managing Network Principal, where your leadership will guide our network management team towards excellence in operational efficiency and innovation. In this pivotal role, you will oversee the development and implementation of network strategies, ensuring that our systems are robust, secure, and aligned with our organizational goals.
CoreWeave
About the Role
CoreWeave operates some of the largest GPU clusters globally. The AI infrastructure behind these clusters plays a crucial role in determining workload placement, resource sharing, and system reliability under continuous pressure.

As a Principal Engineer specializing in AI Infrastructure, you will spearhead the design and enhancement of cluster orchestration systems, including Slurm, Kubernetes, SUNK, and the control planes that facilitate AI training, inference, and model onboarding at scale. Your responsibilities will include defining long-term architecture, addressing complex scaling challenges, and establishing technical direction across various teams. Your contributions will significantly impact the speed at which customers can deploy models, the efficiency of GPU utilization, and the overall reliability of the platform at scale.
Cerebras Systems
Cerebras Systems is at the forefront of AI technology, having developed the world's largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture delivers the computational power equivalent to dozens of GPUs on a single chip while maintaining the programming simplicity of a single device. This unique approach enables Cerebras to provide unparalleled training and inference speeds, allowing machine learning practitioners to seamlessly run large-scale ML applications without the complexities of managing numerous GPUs or TPUs. Cerebras proudly serves a diverse clientele, including leading model labs, global enterprises, and pioneering AI-native startups. Notably, OpenAI has recently formed a multi-year partnership with Cerebras to harness 750 megawatts of scale, revolutionizing key workloads with ultra-high-speed inference. Our groundbreaking wafer-scale architecture ensures that Cerebras Inference stands as the world's fastest solution for Generative AI inference, achieving speeds over ten times faster than GPU-based hyperscale cloud inference services. This remarkable increase in speed is transforming the user experience of AI applications, enabling real-time iterations and enhancing intelligence through additional agentic computation.

About the Role
Cerebras is expanding its Machine Learning team to spearhead a new initiative that aligns with our existing teams. We are seeking a Principal Investigator to collaborate with our ML leaders in shaping this new effort while building the team and enhancing our capabilities. This new team will work in concert with our current ML divisions: Field ML, which directly engages with customers; Applied ML, which develops new ML capabilities and applications; and Core ML, which adapts ML algorithms to leverage the unique features of Cerebras hardware. The new team may undertake similar or complementary responsibilities, focusing on areas such as:

Post-training and reinforcement learning: enhancing model deployment quality through advanced training, tuning, and reinforcement learning techniques, concentrating on specific downstream tasks.
Dataset curation and optimization: implementing strategies to gather and select high-quality data, facilitating quicker and higher-quality model training and tuning.
LLM pretraining: engaging in...
About the Institute of Foundation Models
At the Institute of Foundation Models, we are on a mission to innovate and enhance the development of foundation models. Our research lab is committed to advancing AI through understanding, utilization, and effective risk management of these models. We aim to empower the next generation of AI developers and contribute significantly to a knowledge-driven economy.

Joining our team means you will work at the forefront of foundation model training, collaborating with elite researchers, data scientists, and engineers. You will tackle pivotal challenges in AI development and contribute to the creation of revolutionary AI solutions that could transform various industries. Your strategic and innovative problem-solving skills will play a key role in establishing MBZUAI as a global leader in high-performance computing for deep learning, fostering impactful discoveries that will inspire future AI visionaries.

The Role
We are in search of a Foundation Model DevOps Engineer focused on operational stability to support our AI research infrastructure. You will be responsible for creating an efficient environment that facilitates model development. Your role involves building tooling, release pipelines, and storage policies that alleviate burdens on our research team.
You will manage the foundational layer, ensuring that researchers have immediate, secure, and reliable access to essential tools, data, and computational resources.

Key Responsibilities

Model Release Engineering
High-Fidelity Release Management: You will uphold the standards of our public presence, ensuring that all releases (weights, code, training logs, data) are reproducible, comprehensively documented, and presented with the professionalism of a leading open-source product.
CI/CD for Research: You will design and implement pipelines that automate the testing and packaging of intricate model releases, transitioning us from manual procedures to automated validation.
Repo Administration: You will administer the organization's Git repositories, ensuring optimal performance and accessibility.
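As a purely illustrative sketch of the kind of automated release validation described above (the file names, manifest format, and artifact list here are hypothetical, not taken from the posting), a CI step might verify that a release directory is complete and that every artifact matches its recorded checksum before anything is published:

```python
import hashlib
import json
from pathlib import Path

# Hypothetical release layout: weights, a model card, training logs, plus a
# manifest.json listing each artifact's expected SHA-256 checksum.
REQUIRED_ARTIFACTS = ["weights.safetensors", "model_card.md", "training_log.jsonl"]

def sha256(path: Path) -> str:
    """Stream a file through SHA-256 so large weight files never sit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def validate_release(release_dir: Path) -> list[str]:
    """Return a list of validation errors; an empty list means the release passes."""
    errors = []
    for name in REQUIRED_ARTIFACTS:
        if not (release_dir / name).exists():
            errors.append(f"missing artifact: {name}")
    manifest_path = release_dir / "manifest.json"
    if not manifest_path.exists():
        errors.append("missing manifest.json")
        return errors
    manifest = json.loads(manifest_path.read_text())
    for name, expected in manifest.get("checksums", {}).items():
        path = release_dir / name
        if path.exists() and sha256(path) != expected:
            errors.append(f"checksum mismatch: {name}")
    return errors
```

A check like this would typically run as a gating step in the release pipeline, failing the build whenever the error list is non-empty.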
Apptronik
Join Apptronik, a leading human-centered robotics company revolutionizing the world with AI-powered robots designed to enhance every aspect of life. Our flagship humanoid robot, Apollo, is engineered for seamless collaboration with people, initially focusing on critical sectors like manufacturing and logistics, and poised for future applications in healthcare, domestic environments, and more.

We are at the forefront of embodied AI, utilizing our extensive expertise across the entire robotics stack to address some of society's most pressing challenges. As part of our team, you will play a pivotal role in scaling Apollo for market readiness, navigating complex issues around safety, commercialization, and mass production to make a positive impact on the world.

JOB SUMMARY
We are in search of a Principal Engineer to spearhead the development of high-performance embedded AI systems and advanced simulation infrastructure for our humanoid robots. This position emphasizes GPU-centric workload orchestration, graphics-driven simulation performance, and robust on-device AI execution. The ideal candidate will possess extensive experience in graphics, display systems, Linux platforms, and low-level embedded software, enabling them to enhance simulation fidelity and optimize real-time AI workloads across robotic platforms.

ESSENTIAL DUTIES AND RESPONSIBILITIES

GPU Workload Orchestration
Architect and implement a pipeline for the effective utilization of GPUs across various concurrent AI workloads.
Design and develop schedulers and runtime systems to coordinate perception, planning, and control models on-device.
Optimize latency, throughput, and power efficiency for real-time robotic operations.

On-Device AI Systems
Enhance the robustness and reliability of deployed AI models in constrained embedded environments.
Facilitate efficient execution of multi-model pipelines (vision, tracking, control).
Collaborate with ML teams to co-design models and runtime systems effectively.

Simulation Performance
Lead initiatives to significantly boost simulation throughput and realism.
Optimize rendering, physics integration, and data pipelines through graphics expertise.
Align simulation outputs with the requirements of real-world deployments.

Embedded Systems & Platform Integration
Oversee low-level system integration across Linux-based platforms.
Collaborate across kernel, drivers, HAL, and user-space layers to ensure seamless operation.
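To give a flavor of what coordinating periodic perception, planning, and control workloads can look like, here is a toy earliest-deadline-first (EDF) dispatcher. It is a generic illustration only, not Apptronik's system; the task names and rates are hypothetical:

```python
import heapq

# Illustrative periodic workloads: name -> period in milliseconds.
TASKS = {
    "control": 10,     # 100 Hz control loop
    "perception": 33,  # ~30 Hz vision pipeline
    "planning": 100,   # 10 Hz planner
}

def schedule(horizon_ms: int) -> list[tuple[int, str]]:
    """Return (release_time, task) pairs in dispatch order over the horizon.

    Each task is re-released every period; among released jobs, the one
    with the earliest absolute deadline (release + period) is dispatched
    first, which is the classic EDF ordering for periodic tasks.
    """
    # Heap entries: (absolute_deadline, release_time, task_name).
    heap = [(period, 0, name) for name, period in TASKS.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        deadline, release, name = heapq.heappop(heap)
        if release >= horizon_ms:
            continue  # job falls outside the horizon; drop it
        order.append((release, name))
        period = TASKS[name]
        heapq.heappush(heap, (deadline + period, release + period, name))
    return order
```

Over a 100 ms horizon this dispatches the 100 Hz control task ten times, the ~30 Hz perception task four times, and the planner once, with the tightest deadline always served first. A real on-device runtime would additionally account for execution times, preemption, and GPU stream placement.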
We are seeking a skilled Senior DevOps Engineer with expertise in MESOS and Marathon to join our dynamic team in Sunnyvale, CA. In this role, you will be responsible for implementing and optimizing our cloud infrastructure, ensuring high availability and scalability of applications. The ideal candidate will have a proven track record in DevOps practices, strong problem-solving skills, and the ability to work collaboratively in a fast-paced environment. You will be instrumental in driving our DevOps initiatives forward and enhancing our deployment processes.
Cerebras Systems
Cerebras Systems is at the forefront of AI innovation, manufacturing the largest AI chip in the world, which is 56 times bigger than conventional GPUs. Our cutting-edge wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This pioneering approach enables us to offer unmatched training and inference speeds, allowing machine learning practitioners to smoothly execute large-scale ML applications without the complexity of managing numerous GPUs or TPUs. Our clientele includes leading model laboratories, major global corporations, and innovative AI-native startups. Notably, OpenAI has recently partnered with Cerebras to leverage 750 megawatts of scale, revolutionizing critical workloads with ultra-high-speed inference. Our advanced wafer-scale architecture makes Cerebras Inference the fastest Generative AI inference solution available, outperforming GPU-based hyperscale cloud inference services by over tenfold. This remarkable speed enhancement is reshaping the user experience of AI applications, enabling real-time iterations and enhanced intelligence through additional agentic computation.

In late 2024, we launched Cerebras Inference, setting a new standard for Generative AI inference speed. Since its launch, we have rapidly scaled our services to meet the rising demand from AI labs, enterprises, and a vibrant developer community. In October 2025, we celebrated our Series G funding round, successfully raising $1.1 billion USD to accelerate the growth of our product offerings and services to satisfy global AI demand.

About the Team
The Cerebras Inference team is dedicated to delivering the most efficient, secure, and reliable enterprise-grade AI service. We design and manage expansive distributed systems that facilitate AI inference with unparalleled speed and efficiency. Join us in scaling our inference capabilities to new heights!
SpaceX
At SpaceX, we believe that a future where humanity explores the stars is immensely more thrilling than one where we remain Earth-bound. Our mission is to develop cutting-edge technologies that will make this vision a reality, with the ultimate goal of facilitating human life on Mars.

Lead Principal DFT Engineer - Silicon Engineering
We are looking for an enthusiastic and innovative engineer to join our elite team at SpaceX, where we leverage our expertise in rocket and spacecraft development to enhance Starlink – the most advanced broadband internet system in the world. Starlink is the largest satellite constellation ever built, delivering fast and reliable internet connectivity to millions of users globally. Our team designs, builds, tests, and operates every aspect of this groundbreaking system, from thousands of satellites to user-friendly consumer receivers and the integrated software that powers it all.

As a Lead Principal DFT Engineer, you will collaborate with a diverse group of experts across multiple disciplines, including systems, firmware, architecture, design, validation, product engineering, and ASIC implementation. In this pivotal role, you will be responsible for developing next-generation ASICs that will be deployed both in space and on the ground, expanding connectivity to previously unreachable locations. Your contributions will be vital in advancing the performance and functionalities of the Starlink network.
Artech Information Systems LLC
Join our dynamic team as a DevOps Engineer at Artech Information Systems LLC, where you'll be at the forefront of enhancing our software development processes. In this role, you will collaborate with cross-functional teams to streamline operations and implement robust DevOps practices that drive efficiency and innovation.
Sonsoft Inc.
We are seeking a talented and experienced Ab Initio Consultant to join our dynamic team at Sonsoft Inc. In this role, you will leverage your expertise in Ab Initio to help our clients optimize their data processing and integration solutions. As an integral part of our team, you will work closely with various stakeholders to understand their business needs and deliver tailored solutions.
Sonsoft Inc.
Join Sonsoft Inc. as an SDN & Networking Consultant, where you will leverage your expertise in Software Defined Networking to deliver innovative solutions for our clients. In this role, you will collaborate with cross-functional teams to design, implement, and optimize networking solutions that meet the evolving demands of our clients.
