Site Reliability Engineer (SRE)

xAI, London, UK
On-site, Full-time

Qualifications

Expertise in Kubernetes and continuous deployment systems (Buildkite, ArgoCD), proficiency in monitoring tools (Prometheus, Grafana, PagerDuty), strong knowledge of infrastructure as code (Pulumi, Terraform), familiarity with systems programming languages (Rust, C++, Go), and experience with traffic management (nginx, Envoy).

About the job

About xAI

At xAI, our mission is to develop advanced AI systems that can comprehend the universe and assist humanity in its quest for knowledge. Our dedicated team is small, highly motivated, and committed to engineering excellence, making it an ideal environment for individuals who thrive on challenges and curiosity. We foster a flat organizational structure where every employee plays a crucial role in driving our mission forward. We value initiative and excellence, rewarding those who consistently demonstrate a strong work ethic and sound prioritization skills. Effective communication is essential, and all team members are expected to share their insights clearly and concisely.

About the Team

You will join a team responsible for the backend services that power our innovative products, including grok.com and our API. Our focus is on developing and maintaining highly scalable and reliable services capable of efficiently processing tens of thousands of queries per second, hosted across multiple Kubernetes clusters in both on-premises and cloud environments.

About the Role

We are looking for a candidate who meets the following criteria:

  • In-depth expertise in Kubernetes.
  • Proficiency with continuous deployment systems, including Buildkite and ArgoCD.
  • Extensive experience with monitoring tools such as Prometheus, Grafana, and PagerDuty.
  • Strong knowledge of infrastructure as code practices utilizing tools like Pulumi or Terraform.
  • Familiarity with systems programming languages such as Rust, C++, or Go.
  • Experience in traffic management and HTTP proxies, such as nginx and Envoy.

Location

This position requires in-person attendance in London, UK. While we typically work from the office five days a week, we do provide flexibility for remote work when necessary. Candidates should be prepared to attend late meetings at least once a week to coordinate with our global teams.

About xAI

xAI is a pioneering company dedicated to creating AI systems that enhance our understanding of the universe and support humanity's intellectual pursuits. Our team embodies motivation and engineering excellence, encouraging curiosity and problem-solving in a collaborative environment.
