About the job
Site Reliability Engineer
Location: San Francisco, CA (5 Days In-Office)
As a Site Reliability Engineer at Latent, you will be the backbone of our infrastructure, ensuring exceptional stability and performance for our cutting-edge clinical AI platform, which serves major health systems. Your role is pivotal to our operational excellence and directly impacts patient access to critical treatments.
What Makes a Great Engineer at Latent
We seek individuals who are not just technically skilled but also passionate about ownership and high standards. You will thrive in our dynamic, in-office culture where teamwork and a winning mentality are key.
Tool Proficiency: You are highly adept with your tools, fluent on the command line, and quick with keyboard shortcuts.
Ownership: You take pride in managing complex systems and have a successful history of scaling mission-critical deployments.
Automation Drive: You have a passion for automation, consistently seeking innovative methods to enhance efficiency and establish operational excellence.
Problem Solver: You proactively address challenges, stepping in to resolve issues without waiting for others.
Your Responsibilities
As our SRE, you will take full ownership of the production environment and enhance the developer experience:
Infrastructure Ownership: Design, implement, and maintain a robust production environment, drawing on your experience with 500+ machine deployments.
Kubernetes Mastery: Utilize your expertise in Kubernetes and Helm to manage our containerized infrastructure, ensuring optimal deployment, scalability, and operational health.
CI/CD & Deployment Optimization: Streamline the deployment pipelines for TypeScript and Python/ML, supporting rapid feature releases while upholding top-notch reliability.
DevX Support: Support Developer Experience (DevX) initiatives that improve developer workflows, tool proficiency, and CI/CD systems.
Infrastructure as Code (IaC): Manage infrastructure definitions using Terraform.

