Cloud Infrastructure Engineer at Braintrust | San Francisco

Braintrust, San Francisco
On-site, Full-time

Experience Level

Mid to Senior

Qualifications

5+ years of experience in DevOps, SRE, or Infrastructure Engineering roles. In-depth expertise with Terraform and familiarity with major cloud providers, especially AWS. Strong skills in Kubernetes for deploying and managing workloads. Proficiency in at least one programming language (Python, TypeScript, or Go). Experience in supporting production systems and incident management. Excellent communication skills, particularly in a customer support context.

About the job

About Us

Braintrust is at the forefront of AI observability, seamlessly integrating evaluations and observability into a single workflow. Our platform empowers innovators by providing them with the critical insights needed to understand AI performance in production environments and the tools required to enhance it.

Recognized by leading companies such as Notion, Stripe, Zapier, Vercel, and Ramp, Braintrust enables teams to compare AI models, test prompts, and detect regressions, transforming production data into superior AI with each iteration.

Role Overview

We are seeking a talented Cloud Infrastructure Engineer to join our team and contribute to the development of a robust and scalable infrastructure. You will provide developers with a premium platform to deploy code efficiently and confidently. Your role will involve leading initiatives across Terraform, Kubernetes, CI/CD, observability, and support, significantly impacting Braintrust's internal operations and the self-hosted experiences of our customers.

This position is pivotal as you will manage our AWS environment while assisting customers in deploying our infrastructure on AWS, Azure, and GCP.

Your Responsibilities

  • Develop and maintain Terraform modules for both internal infrastructure and customer deployments.

  • Engage directly with customers via Slack to assist with self-hosting and troubleshoot infrastructure challenges, and build tools that simplify the support process.

  • Take ownership of our CI/CD pipeline, aiming to reduce build times, enhance failure visibility, and facilitate safer, quicker releases.

  • Centralize and scale observability through logs, metrics, dashboards, and alerts.

  • Collaborate with engineering teams to create and enhance a secure, developer-friendly infrastructure platform.

  • Support multi-cloud deployment strategies, primarily in AWS, while also extending support for Azure and GCP for our enterprise clientele.

  • Implement tools and automation to bolster deployment, rollback, and infrastructure reliability.

Ideal Candidate Profile

  • A minimum of 5 years of experience in DevOps, SRE, or Infrastructure Engineering roles.

  • In-depth knowledge of Terraform and experience with at least one major cloud provider, preferably AWS.

  • Proficient in Kubernetes, with capabilities in deploying, debugging, and scaling real workloads.

  • Strong programming skills in a language such as Python, TypeScript, or Go.

  • Experience in supporting production systems and managing incidents effectively.

  • Comfortable working closely with customers in a support or deployment capacity.

  • Bonus: Familiarity with monitoring and logging tools, as well as knowledge of security best practices.

About Braintrust

Braintrust is an innovative AI observability platform that connects evaluations and observability into a streamlined workflow, enhancing AI performance and enabling teams to transform production data into improved AI solutions.
