Site Reliability Engineer At Hebbia New York City Or San Francisco jobs in San Francisco – Browse 11,459 openings on RoboApply Jobs

Site Reliability Engineer at Hebbia | New York City or San Francisco

Hebbia | New York City; San Francisco, CA

On-site Full-time $160K/yr - $300K/yr

Experience Level

Entry Level

About the job

About Hebbia

Hebbia is an innovative AI platform designed for investors and bankers, focused on generating alpha and driving exceptional financial outcomes.

Founded in 2020 by George Sivulka and supported by notable investors such as Peter Thiel and Andreessen Horowitz, Hebbia empowers investment decisions for prominent firms including BlackRock, KKR, Carlyle, and Centerview, along with 40% of the world’s largest asset managers. Our flagship product, Matrix, is known for its unmatched accuracy, speed, and transparency in AI-driven analysis, and is trusted to manage over $30 trillion in assets worldwide.

We provide the intelligence that gives finance professionals a decisive edge. Our AI uncovers insights that are beyond human perception, reveals hidden opportunities, and accelerates decision-making with extraordinary speed and confidence. We don’t just streamline workflows; we revolutionize capital deployment, risk management, and value creation across markets.

Hebbia is not merely a tool; it is the competitive advantage that enhances performance, alpha, and market leadership.

The Role

We are seeking a Site Reliability Engineer who prioritizes software engineering principles. You will take ownership of critical production systems from design to deployment, focusing on building and optimizing rather than just operating. Your responsibilities will include writing production-quality code to ensure platform reliability at scale, collaborating with product engineering teams to influence architecture from the outset, and developing essential internal tools that support every engineer at Hebbia. This role is not about managing tickets; it is primarily about coding, instrumenting services, resolving performance bottlenecks, creating deployment platforms, and translating incident analyses into meaningful architectural enhancements.

Responsibilities

Take full responsibility for critical production services, overseeing design, code review, deployment, operations, and incident response.
Profile, benchmark, and refactor critical paths to eliminate bottlenecks as Hebbia scales.

Similar jobs

Site Reliability Engineer at Hebbia | New York City or San Francisco

Hebbia

Full-time|$160K/yr - $300K/yr|On-site|New York City; San Francisco, CA

About Hebbia

Hebbia is an innovative AI platform designed for investors and bankers, focused on generating alpha and driving exceptional financial outcomes. Founded in 2020 by George Sivulka and supported by notable investors such as Peter Thiel and Andreessen Horowitz, Hebbia empowers investment decisions for prominent firms including BlackRock, KKR, Carlyl…

Feb 27, 2026

Integrations Engineer at Hebbia | New York City, San Francisco

Hebbia

Full-time|$160K/yr - $265K/yr|On-site|New York City; San Francisco, CA

About Hebbia

Hebbia is an innovative AI platform designed for investors and bankers, delivering exceptional insights that generate alpha and maximize potential returns.

Founded in 2020 by George Sivulka and supported by notable investors such as Peter Thiel and Andreessen Horowitz, Hebbia empowers investment decisions for major firms including BlackRock, KKR, Carlyle, and Centerview, along with 40% of the world's largest asset managers. Our flagship product, Matrix, offers unparalleled accuracy, speed, and transparency in AI-driven analysis, entrusted to manage over $30 trillion in global assets.

We empower finance professionals with intelligence that provides a definitive competitive edge. Our AI uncovers unseen signals, reveals hidden opportunities, and accelerates decision-making with unmatched speed and confidence. We don't just enhance workflows; we revolutionize how capital is allocated, risk is managed, and value is created in the markets.

Hebbia is not merely a tool; it is the competitive advantage that drives performance, alpha, and market leadership.

The Team

The Integrations team serves as the crucial link between Hebbia’s AI platform and the diverse data sources our customers depend on. We design and maintain the pipelines that integrate content from enterprise systems (such as Snowflake, S3, SharePoint, Dropbox, and more), enabling Hebbia to effectively analyze it.

This role involves critical infrastructure work that directly impacts our customers. When integrations are swift, reliable, and seamless, our clients trust Hebbia with their most vital workflows. Conversely, any disruption is immediately felt. Our team is responsible for the entire process: creating new connectors, reinforcing existing ones, and ensuring a continuous data flow.

The Role

This is not a standard backend engineering position with just a few integrations. You will primarily focus on developing, troubleshooting, and managing data integrations across numerous third-party platforms. This hands-on role directly influences customer experiences and requires a passion for delivering exceptional solutions.

Mar 20, 2026

Engineering Manager at Hebbia | San Francisco, CA

Hebbia

Full-time|$200K/yr - $300K/yr|On-site|New York City; San Francisco, CA

About Hebbia

Hebbia is an innovative AI platform designed specifically for investors and bankers, enabling them to generate alpha and maximize returns. Founded in 2020 by George Sivulka and supported by notable investors like Peter Thiel and Andreessen Horowitz, Hebbia is transforming investment strategies for leading firms including BlackRock, KKR, Carlyle, and Centerview. Our flagship product, Matrix, is renowned for its unmatched accuracy, speed, and transparency in AI-driven analysis, managing assets exceeding $30 trillion globally.

At Hebbia, we equip finance professionals with unparalleled intelligence that not only uncovers hidden signals but also accelerates decision-making processes. We revolutionize capital deployment, risk management, and value creation across markets, ensuring our clients remain leaders in their fields.

The Role

As an Engineering Manager at Hebbia, you will play a pivotal role as a technical leader, actively involved in coding, architecture decisions, and guiding your team through complex challenges. This position is not solely about management; it demands strong technical skills and a hands-on approach. You will be responsible for setting the technical direction, resolving blockers, elevating quality standards, and delivering products that our clients depend on for informed investment decisions.

Collaboration is key in this role, as you will partner with your product counterpart to drive initiatives and, in their absence, take the lead on product direction. A deep understanding of product intuition and business acumen is essential, alongside your technical expertise. You will also be tasked with hiring, mentoring, and developing your team members to foster their growth. Expect our interview process to reflect our work culture, including discussions on systems design, coding, architecture, and leadership.

Mar 30, 2026

Data Engineer at Hebbia | New York City, San Francisco

Hebbia

Full-time|$190K/yr - $250K/yr|Remote|New York City; San Francisco, CA

Join the Hebbia Team

Hebbia is pioneering an AI platform that empowers investors and bankers, enabling them to generate alpha and unlock potential financial upside. Founded in 2020 by George Sivulka and proudly backed by Peter Thiel and Andreessen Horowitz, Hebbia is at the forefront of transforming investment strategies for major players like BlackRock, KKR, Carlyle, and Centerview, serving 40% of the world’s largest asset managers. Our flagship product, Matrix, is renowned for its unparalleled accuracy, speed, and transparency in AI-driven analysis, currently managing over $30 trillion in global assets. We don’t just optimize workflows; we redefine capital deployment, risk management, and value creation across diverse markets. Hebbia is not merely a tool; it embodies the competitive edge that propels performance and leadership in the finance sector.

Feb 11, 2026

Senior Site Reliability Engineer at Drata | San Francisco

Drata

Full-time|$166.9K/yr - $225.9K/yr|Hybrid|Hybrid - San Francisco

Drata helps organizations demonstrate their commitment to security and integrity. The platform supports companies as they build and maintain trust with users, customers, partners, and prospects.

Values

Built on Trust: Consistency shapes decisions and actions.
Integrity: Choosing to do what is right, every time.
Customer-Obsessed: Prioritizing customer needs above all else.
Competitive Fire: Striving for higher standards and greater achievements.
Diversity: Welcoming different perspectives to encourage creative solutions.
Automation First: Pursuing efficiency by saving time and resources wherever possible.

How the Team Works

Drata blends high standards with a supportive environment focused on growth. Team members are encouraged to own their work, improve continuously, and deliver meaningful results. The company values quick, informed decisions that drive immediate impact, while always keeping the mission and customer needs at the center.

The San Francisco-based team uses a hybrid work model. Colleagues collaborate in the office Tuesday through Thursday, focusing on alignment and innovation. Mondays and Fridays offer flexibility for deep work or personal needs.

Growth and Culture

Drata has expanded to over 600 professionals worldwide, recognized for a culture that values trust, speed, and continuous learning. The environment supports both personal and professional development.

See the Speed: CEO Adam Markowitz discusses Drata’s rapid journey to $100M ARR in four years.
Hear the Voice of the Team: Employee stories highlight collaboration and growth at Drata.

Apr 27, 2026

Site Reliability Engineer at Mercor | San Francisco

Mercor

Full-time|On-site|San Francisco

Join the Mercor Team

At Mercor, we stand at the dynamic intersection of labor markets and AI research. Collaborating with premier AI labs and enterprises, we empower the human intelligence that is crucial for AI's evolution.

Our expansive talent network plays a vital role in training cutting-edge AI models, akin to the way educators impart knowledge to their students, by sharing insights, experiences, and contextual understanding that code alone cannot convey. Currently, our network of over 30,000 experts generates more than $2 million daily.

We are pioneering a novel category of work where expertise fuels AI progress. Achieving this vision necessitates an ambitious, fast-paced, and deeply dedicated team. You will collaborate with researchers, operators, and AI firms that are at the forefront of transforming societal structures.

Mercor is a thriving Series C company with a valuation of $10 billion. We operate five days a week in-person at our new headquarters in San Francisco.

About the Role

As a Site Reliability Engineer (SRE) at Mercor, you will take ownership of production reliability for our critical systems, working closely with our infrastructure leadership. You will play a pivotal role in establishing our SRE function and defining how Mercor manages large-scale, high-availability systems.

Your Responsibilities

Ensure the reliability and safety of production for key shared services and customer-facing systems.
Collaborate directly with infrastructure leadership to outline SRE priorities, reliability benchmarks, and the production safety roadmap.
Enhance the structure of our production systems to ensure stability, resource efficiency, isolation, and observability.
Advocate for and implement modern SRE methodologies (e.g., incident management, postmortems, SLIs/SLOs) across engineering teams.
Work alongside engineering and applied AI teams to facilitate sustainable growth.
Promote SRE best practices internally, supporting teams in a safe, scalable, and consistent production onboarding process.

Who We Seek

The ideal candidate will have:
Extensive experience in genuine SRE roles (not merely operations) across various positions or organizations.
A deep understanding of SRE methodologies popularized by Google (e.g., error budgets, reliability vs. risk trade-offs, large-scale distributed systems).
5+ years of SRE experience; ideally, 15+ years in total experience for this inaugural SRE position.
A proven track record of managing systems at scale, with a strong grasp of the complexities involved.

Dec 27, 2025

Site Reliability Engineer at Superhuman | San Francisco

Superhuman, Inc.

Full-time|$214K/yr - $260K/yr|Hybrid|Hub - San Francisco

At Superhuman, we embrace a vibrant hybrid work model that offers our team members the ideal blend of focused individual work and collaborative in-person interactions, fostering trust, innovation, and a robust team culture.

About Superhuman

Superhuman, the AI productivity platform, is on a transformative mission to unlock the superhuman potential within everyone. With the integration of Grammarly's writing assistance and innovative tools like Coda’s collaborative workspaces and Go, our proactive AI assistant, we empower over 40 million individuals and 50,000 organizations globally. Founded in 2009, we strive to eliminate busywork and enhance productivity. Discover more at superhuman.com and explore our values here.

The Opportunity

To meet our ambitious goals, we are seeking a Site Reliability Engineer (SRE) to join our infrastructure team. This pivotal role focuses on developing software solutions to maintain the reliability of our back-end systems while collaborating with engineering teams to strategize our future growth. You will also engage with our production engineering teams in Europe as we transition from a “you build it, you own it” approach.

At Superhuman, our engineers and researchers enjoy the autonomy to innovate and drive breakthroughs, directly impacting our product roadmap. As we rapidly scale our interfaces, algorithms, and infrastructure, the complexity of our technical challenges is growing. Learn more about our technical endeavors on our technical blog.

As an SRE, your responsibilities will include:
Scaling our Kubernetes-based control plane that processes billions of events each day.
Enhancing our automation mechanisms to efficiently respond to workload demands.
Deploying machine learning systems across various departments.

Jun 18, 2025

Senior Site Reliability Engineer at Hyperbolic | San Francisco

Hyperbolic Labs

Full-time|On-site|San Francisco, CA

Who We Are

At Hyperbolic Labs, we are committed to democratizing AI by removing barriers to computing power with our Open-Access AI Cloud. By aggregating global computing resources, we provide an innovative GPU marketplace and AI inference service that ensures both affordability and accessibility. As trailblazers at the convergence of AI and open-source technology, we envision a future where AI innovation is only limited by creativity, not by resource availability. We invite forward-thinking individuals who share our dedication to making AI universally accessible, secure, and affordable. Join us in crafting a platform that empowers innovators worldwide to realize their visionary AI projects.

In anticipation of our growth following our Series A funding, our team, guided by co-founders with advanced degrees in AI, Mathematics, and Computer Science, is set to transform the computing landscape.

About the Role

We are in search of a skilled Site Reliability Engineer to guarantee that Hyperbolic's GPU marketplace and AI infrastructure function with outstanding reliability, performance, and security. As an aggregator of computational resources from numerous global providers, our service level objectives (SLOs), trust, and economic efficiency are critical to our product. Your key responsibilities will include defining and maintaining service level objectives, developing resilient incident response protocols, managing capacity across our extensive GPU network, and implementing secure rollout and rollback mechanisms to ensure uninterrupted platform operation around the clock.

In this influential role, you'll set the reliability benchmarks that foster customer trust in our platform, design comprehensive monitoring and alerting systems for enhanced infrastructure visibility, automate capacity management and resource allocation processes, lead incident response and post-mortem evaluations, and collaborate closely with engineering teams to bolster system resilience. Security and infrastructure hardening will be paramount, necessitating strong isolation protocols between tenants and suppliers, the implementation of effective key management systems, and the establishment of compliance frameworks. This high-impact position will directly affect our ability to deliver on our commitment to providing affordable, accessible AI compute at scale.

Mar 26, 2026

Forward Deployed Engineer at Hebbia | San Francisco, CA

Hebbia, Inc.

Full-time|$180K/yr - $300K/yr|On-site|New York City; San Francisco, CA

About Hebbia

Hebbia is an innovative AI platform tailored for investors and bankers, designed to generate alpha and unlock potential in financial markets. Founded in 2020 by George Sivulka and supported by notable investors including Peter Thiel and Andreessen Horowitz, we empower investment decisions for leading firms such as BlackRock, KKR, Carlyle, and Centerview. Our flagship product, Matrix, showcases unparalleled accuracy and speed in AI-driven analysis, trusted to manage over $30 trillion in assets worldwide.

Our technology uncovers insights that are otherwise invisible, revealing hidden opportunities and expediting decision-making with unprecedented speed. We don't simply enhance workflows; we revolutionize capital deployment, risk management, and value creation across various markets.

Hebbia transcends traditional tools; it serves as a competitive advantage that propels performance, alpha generation, and market leadership.

The Role

As a Forward Deployed Engineer, you will immerse yourself with Hebbia's most strategic clients, tailoring the final components of our platform to align with their unique workflows and data needs. This is a hands-on engineering position where you will write, deploy, and take ownership of production code.

You will act as a critical link between Hebbia’s platform and the complexities of our customers’ environments. Collaborating closely with customer teams, you will identify their challenges and develop indispensable solutions. Insights gained will be relayed back to our engineering and product teams to enhance the overall platform.

This opportunity is ideal for engineers eager to blend technical expertise with direct customer impact, witnessing the value of their code within days rather than months. The Forward Deployed Engineer team operates at the intersection of engineering and market engagement, collaborating closely with both our core engineering team and account representatives to direct deployment efforts effectively.

Mar 30, 2026

Site Reliability Engineer at Superhuman | San Francisco

Superhuman

Full-time|$214K/yr - $260K/yr|Hybrid|San Francisco, CA

At Superhuman, we embrace a flexible hybrid working model that combines focused work time with in-person collaboration, fostering trust, innovation, and a vibrant team culture.

About Superhuman

Superhuman, now part of Grammarly, is an AI productivity platform dedicated to unlocking the superhuman potential in everyone. Our suite of applications integrates AI with over 1 million tools and websites, offering innovative solutions such as Grammarly's writing assistance, Coda's collaborative workspaces, Mail's inbox management, and Go, our proactive AI assistant. Since our inception in 2009, we have empowered over 40 million individuals and 50,000 organizations worldwide, enabling them to eliminate busywork and focus on what truly matters. Discover more at superhuman.com and explore our values here.

The Opportunity

In pursuit of our ambitious goals, we are seeking a Site Reliability Engineer to enhance our infrastructure team. This pivotal role involves building software that ensures the reliability of our back-end systems while collaborating closely with our engineering teams. You will also help plan for our future growth as we shift from a “you build it, you own it” model.

Our engineers and researchers enjoy the freedom to innovate and influence our product roadmap, tackling increasingly complex technical challenges as we scale our systems. Learn more about our technical endeavors on our technical blog.

As a Site Reliability Engineer, your responsibilities will include:
Scaling our Kubernetes-based control plane, processing billions of events daily.
Enhancing our automation mechanisms in response to workload demands.
Deploying machine learning systems across the organization.

Mar 18, 2026

Platform Engineer - Document Intelligence at Hebbia

Hebbia

Full-time|On-site|New York City; San Francisco, CA

About Hebbia

Hebbia is an innovative AI platform designed specifically for investors and bankers, aimed at generating alpha and maximizing returns. Founded in 2020 by George Sivulka, and backed by prominent investors including Peter Thiel and Andreessen Horowitz, we empower investment decisions for industry leaders like BlackRock, KKR, Carlyle, and Centerview, managing over $30 trillion in global assets.

Our flagship product, Matrix, is renowned for providing exceptional accuracy, speed, and transparency in AI-driven analysis, helping finance professionals gain a competitive edge. Our technology uncovers unseen signals, reveals hidden opportunities, and facilitates rapid decision-making, thus transforming capital deployment, risk management, and value creation across markets.

Hebbia isn't just a tool; it's the competitive advantage that enhances performance and market leadership.

The Team

The Document Intelligence team at Hebbia is dedicated to developing state-of-the-art AI solutions that revolutionize how users interact with vast collections of private and public documents. Through our innovative Browse application, we enable intelligent document exploration, advanced search functionality, and profound insights extraction. Our commitment to continuous improvement means we work closely with our customers to address real-world challenges and foster impactful, data-driven decisions.

The Role

As a Platform Engineer at Hebbia, you will be at the forefront of building scalable systems that support billions of tokens across significant assets under management. Your role will involve deploying optimized systems and ensuring high-performance capabilities in our infrastructure.

Feb 11, 2026

Senior Site Reliability Engineer at prosper | San Francisco

prosper

Full-time|On-site|San Francisco, CA

Role overview

The Senior Site Reliability Engineer at prosper plays a key role in maintaining and improving the reliability and performance of the company’s core systems. Collaboration with teams across the organization is essential to ensure services remain stable and efficient.

What you will do

Design and set up monitoring tools to track the health and performance of systems
Automate routine operational tasks to minimize manual intervention and boost efficiency
Diagnose and resolve complex technical problems that impact infrastructure or services
Support projects aimed at strengthening infrastructure stability and preparing for future growth

Location

This role is located in San Francisco, CA.

Apr 27, 2026

Site Reliability Engineer at Blaxel | San Francisco

Blaxel

Full-time|On-site|San Francisco

Join Our Team as a Site Reliability Engineer

Blaxel is seeking a highly skilled Site Reliability Engineer to enhance the reliability, performance, and scalability of our cutting-edge AI infrastructure platform.

In this role, you will develop and manage the essential systems that support scalable agentic AI. Your primary goal: maintain our ultra-low-latency, stateful, serverless compute engine, ensuring it remains robust as we handle billions of agent requests from the world's most advanced AI teams.

This position is deeply technical and execution-oriented. You will take charge of our reliability framework, encompassing observability, performance optimization, incident management, infrastructure health, and the automation processes that ensure seamless operations. We are looking for innovators who can design new reliability systems, advance automation capabilities, and continuously adapt the platform to accommodate next-generation AI workloads. If you are a builder who excels in managing critical infrastructure at scale, we want to hear from you.

Your Responsibilities

Working closely with our founders, infrastructure team, and development team, and leveraging AI for maximum efficiency, you will architect and manage the systems that keep Blaxel fast, resilient, and secure.

Design, operate, and iteratively enhance the core infrastructure that drives our 25ms cold-start compute engine.
Develop and refine our observability stack (metrics, traces, logs), ensuring proactive issue detection.
Establish, monitor, and drive SLOs/SLIs across vital system components to ensure world-class reliability.
Lead incident response with precision: conduct root cause analyses, post-mortems, and implement systemic solutions.
Design and deploy self-healing, automated operational systems to minimize manual work and scale operations.
Collaborate across compute, networking, storage, and sandboxed execution layers to optimize performance under intense workloads.
Create automation tools (often utilizing AI agents) to enhance operations, debugging, capacity planning, and failure predictions.
Test and stress our systems to their limits: engage in load testing, chaos engineering, and performance benchmarking.
Champion security best practices at the infrastructure level, from sandboxed compute to network isolation.
Collaborate with platform engineers to ensure reliability is an integral part of new features from inception.

Who You Are

Extensive technical expertise in site reliability engineering, with a passion for building scalable systems.

Mar 3, 2026

Senior Site Reliability Engineer at Carta | San Francisco, CA

Carta

Full-time|On-site|San Francisco, California; Santa Clara, California; Seattle, WA

Join Carta as a Senior Site Reliability Engineer, where you will play a pivotal role in enhancing our infrastructure and ensuring the reliability of our platforms. You will work collaboratively with cross-functional teams to implement innovative solutions that drive operational excellence and scalability.

Apr 3, 2026

Site Reliability Engineer at EngFlow | San Francisco

EngFlow

Full-time|On-site|San Francisco

Join Our Team at EngFlow

EngFlow is revolutionizing the software development process by enabling developers to save valuable time in their build and test cycles. Our innovative cloud-based distributed service optimizes workflows through advanced remote execution and caching, significantly enhancing efficiency, productivity, and product quality.

Supported by esteemed investors, EngFlow is at the forefront of transforming how organizations develop software and deliver thoroughly tested products. Our solutions can accelerate builds by tenfold or more, and our observability platform provides crucial insights for ongoing optimization. Founded by leading contributors to Bazel, we create tools that empower engineering teams, from startups to Fortune 500 companies, to boost developer velocity and build performance.

Discover more about our mission, culture, and team: EngFlow | Watch Our Video

We are seeking a talented and experienced Site Reliability Engineer to join our dynamic engineering team. In this pivotal role, you will bridge the gap between software engineering and systems operations, ensuring our distributed infrastructure is highly available, performant, and scalable, thereby allowing our engineers to work swiftly and with confidence.

Jan 27, 2026

Platform Engineer at AirOps | San Francisco, New York City

AirOps

Full-time|On-site|New York City or San Francisco

Join Our Team at AirOps

At AirOps, we are pioneering the first comprehensive content engineering platform tailored for the AI-driven landscape. As traditional search methods transition towards AI-enhanced platforms, we empower brands to enhance their visibility and maintain a strong presence. Our recent growth surge, achieving a fivefold increase in revenue over the past year, is a testament to our success in enabling marketing teams at industry leaders like Ramp, Chime, Carta, and Rippling to transform content quality into a sustainable competitive edge.

Our innovative platform enables marketers to adeptly navigate the ever-evolving discovery landscape, prioritize impactful opportunities, and generate accurate, brand-consistent content that garners recognition from AI systems and builds trust with audiences. Supported by prominent investors such as Greylock, Unusual Ventures, Wing VC, and Founder Collective, we are developing intelligent systems designed to empower the next wave of marketing leaders. Our headquarters are located in San Francisco, New York, and Montevideo.

As we establish our platform engineering function from the ground up, you will be among the initial US-based hires, responsible for the infrastructure that every engineer in our organization relies on. Currently, our deployment processes involve manual steps that have led to production outages, and our development environments do not align with production. Your contributions will be pivotal in creating a robust foundation for productivity. This is not a role focused on maintaining established systems; you will have the opportunity to design and implement the deployment pipeline, health check protocols, alert systems, and developer tools that the entire engineering team will depend upon. With real challenges to tackle and a substantial backlog of tasks, your work will have a direct impact on our operational speed and efficiency.

May 1, 2026

Site Reliability Engineer at Latent | San Francisco

Latent

Full-time|On-site|San Francisco

Site Reliability Engineer

Location: San Francisco, CA (5 Days In-Office)

As a Site Reliability Engineer at Latent, you will be the backbone of our infrastructure, ensuring the exceptional stability and performance of our cutting-edge clinical AI platform that serves major health systems. Your role is pivotal in enhancing operational excellence, directly impacting patient access to critical treatments.

What Makes a Great Engineer at Latent

We seek individuals who are not just technically skilled but also passionate about ownership and high standards. You will thrive in our dynamic, in-office culture where teamwork and a winning mentality are key.

Tool Proficiency: You are highly adept with your tools, fluent in command line operations, and skilled in keyboard shortcuts.
Ownership: You take pride in managing complex systems and have a successful history of scaling mission-critical deployments.
Automation Drive: You have a passion for automation, consistently seeking innovative methods to enhance efficiency and establish operational excellence.
Problem Solver: You proactively address challenges, stepping in to resolve issues without waiting for others.

Your Responsibilities

As our SRE, you will take full ownership of the production environment and enhance the developer experience:

Infrastructure Ownership: Design, implement, and maintain a robust production environment, having experience with over 500 machine deployments.
Kubernetes Mastery: Utilize your expertise in Kubernetes and Helm to manage our containerized infrastructure, ensuring optimal deployment, scalability, and operational health.
CI/CD & Deployment Optimization: Streamline the deployment pipelines for TypeScript and Python/ML, supporting rapid feature releases while upholding top-notch reliability.
DevX Support: Enhance developer workflows by supporting Developer Experience (DevX) initiatives to improve tool proficiency and CI/CD systems.
Infrastructure as Code (IaC): Manage infrastructure definitions using Terraform.

Dec 5, 2025

Site Reliability Engineer - Platform at CodeRabbit | San Francisco

CodeRabbit

Full-time|On-site|San Francisco

About CodeRabbit

CodeRabbit is a pioneering research and development firm dedicated to creating highly efficient human-machine collaboration systems. Our mission is to develop the next generation of AI-driven code review tools, fostering a harmonious partnership between human creativity and advanced algorithms that far exceed the capabilities of individual engineers. By merging language models with human innovation, we aim to elevate the standards of efficiency and quality in software development.

The Role

We are in search of a talented Site Reliability Engineer (SRE) to become a vital part of our Platform Engineering team located in the Bay Area. In this role, you will play a crucial part in maintaining the high availability, performance, and scalability of CodeRabbit's AI-enhanced code review platform. This position lies at the nexus of software engineering and systems operations, where you will construct the foundational platforms and automation that empower our engineering teams to deploy, monitor, and scale our services with reliability.

As a Site Reliability Engineer at CodeRabbit, your responsibilities will include improving the reliability of our essential services that handle millions of code reviews, developing sophisticated automation platforms, and managing the infrastructure that drives our AI analysis engine. You will engage with cutting-edge technologies such as large language models, real-time processing systems, and distributed architectures that function at scale.

Key Responsibilities

Infrastructure & Platform Ownership

Design, implement, and maintain scalable infrastructure on Google Cloud Platform to accommodate CodeRabbit's expanding user base and processing needs.
Take ownership of and operate essential platform services.
Develop and manage Infrastructure as Code using Terraform to guarantee consistent, reproducible, and version-controlled infrastructure deployments.

Reliability & Performance Engineering

Establish and uphold SLI/SLO frameworks for all critical services, ensuring we fulfill our reliability commitments to users.
Implement comprehensive monitoring, alerting, and observability solutions utilizing Datadog and custom instrumentation.
Conduct in-depth incident response, root cause analysis, and post-mortem processes to continually enhance system reliability.
Optimize application and infrastructure performance to manage millions of pull request analyses with minimal latency.

Jan 9, 2026

Senior Site Reliability Engineer at Unify | San Francisco

Unify

Full-time|On-site|San Francisco Office

About Unify

At Unify, we're pioneering the first AI-driven system of action for revenue teams. Our innovative approach empowers companies to transform their outbound strategies into a leading growth engine, ensuring that go-to-market execution is observable, repeatable, and scalable. Established in 2023 by visionaries from Ramp and Scale AI, our diverse team boasts experience from industry giants such as Airbnb, Meta, Waymo, and Perplexity.

Having achieved an impressive 8x revenue growth in 2024, we proudly serve esteemed clients including Perplexity, Cursor, SoFi, and Justworks. With a dynamic team that has successfully raised $58M from prominent investors like Thrive, Emergence, and OpenAI, we are at the forefront of revolutionizing the future of GTM. Come and be a part of this exciting journey!

About the Role

As a Senior Site Reliability Engineer (SRE) at Unify, you will play a pivotal role in addressing the challenges of scaling and maintaining reliability as we handle immense data volumes and support enterprise clients with stringent uptime standards. Your expertise will span the entire tech stack: optimizing databases, fortifying services, and crafting automation and observability tools to ensure Unify remains fast and dependable at scale.

Jan 5, 2026

Site Reliability Engineer at Air Apps | San Francisco

Air Apps

Full-time|On-site|San Francisco

Join Our Team at Air Apps

At Air Apps, we are driven by innovation and speed. Founded by a family in 2018 in Lisbon, Portugal, we are on a quest to revolutionize how individuals and entrepreneurs manage their resources through the world’s first AI-powered Personal & Entrepreneurial Resource Planner (PRP). With over 100 million downloads globally, our self-funded journey now spans across offices in Lisbon and San Francisco.

We constantly challenge conventional norms, leveraging AI to develop solutions that genuinely impact lives. As part of our team, you will be a critical player in shaping impactful products that empower users around the world.

Join us as we redefine resource management and make a difference in people’s lives.

Your Role as a Site Reliability Engineer (SRE)

As a Site Reliability Engineer at Air Apps, you will play a pivotal role in maintaining the reliability, availability, and scalability of our systems. Your work will bridge software development and operations by implementing automation, monitoring solutions, and performance optimization strategies to minimize downtime and enhance system resilience.

Mar 27, 2025
