Experience Level
Senior
Qualifications
Qualifications We Need
Strong foundation in Computer Science principles.
Over 5 years of experience in the industry, focusing on building and maintaining high-quality software used by other engineers.
A product-focused mindset when it comes to infrastructure systems, with a passion for empowering others.
A desire to be an excellent team player while enjoying the work environment.
A strong sense of craftsmanship and a healthy academic curiosity.
Qualifications We Want (also, skills you’ll learn!)
Experience in building systems for data analytics.
Skills in monitoring and profiling distributed systems.
Understanding of cloud application security models.
Experience in administering cloud service infrastructure like GCP, AWS, or Azure.
Previous experience in a startup environment.
About the job
About the Role
Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration.
What You Will Do
Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring.
Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges.
Take part in on-call rotations to help maintain strong uptime for Sigma’s services.
Create tools and processes to improve cloud incident triage and reduce downtime.
Define and promote practices that make systems and services measurable and observable.
Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.
About Sigma Computing
Sigma Computing is at the forefront of engineering innovation, committed to making data accessible and actionable for businesses. Our talented team is dedicated to delivering high-quality software solutions in a collaborative and fun work environment.
Become part of the innovative engineering teams at OpenAI, where we create and deliver groundbreaking AI technologies responsibly and safely to the world!
Our Applied Engineering team collaborates across research, engineering, product, and design disciplines to deploy OpenAI's cutting-edge technology for both consumers and businesses. We are committed to learning from our deployments and ensuring that AI is utilized ethically while maximizing its benefits. To us, safety takes precedence over unchecked growth.
About the Role
We are in the process of developing OpenAI's observability product, which encompasses everything from scalable infrastructure to an intuitive, AI-enhanced user interface. Our systems process petabytes of logs and billions of time series metrics throughout our infrastructure. We are now integrating intelligence to create features like agents that summarize service events, auto-generate dashboards, and assist engineers in debugging through user-friendly notebook-like interfaces.
We are looking to hire software engineers at all levels of our stack, be it infrastructure, backend, or product. You will be part of a dynamic, resourceful team that develops both foundational infrastructure and innovative internal tools, ensuring the reliability, performance, and observability of OpenAI's production systems.
What You’ll Do
Lead the development of core observability infrastructure, focusing on distributed logging, time series, and trace storage.
Create AI-integrated tools that empower engineers to autonomously identify, comprehend, and resolve issues.
Enhance user interface experiences including dashboards, notebooking, and interactive debugging.
Work collaboratively with engineers, researchers, user operations, and various teams to craft the next generation of the observability product.
You Might Be a Fit If You:
Have experience operating large-scale distributed systems in production, particularly logging systems or time series databases.
Excel in ambiguous environments and tackle unscoped challenges head-on.
Possess full-stack development skills or a strong product sensibility; you are eager to build practical tools that users will engage with.
Demonstrate robust knowledge of systems, networking, and cloud infrastructure (Kubernetes, AWS, etc.).
Bonus: Have built or contributed to observability systems (e.g., Prometheus, OpenTelemetry, etc.).
Why This Team?
We combine infrastructure and product development to create real AI applications for in-house use.
Your contributions will directly enhance the reliability of GPT-based products at OpenAI.
Join Gusto as a Staff Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our software's performance and reliability. Utilize your expertise to develop and implement monitoring solutions that provide insights into application behavior, ensuring a seamless experience for our users.
Your contributions will directly impact our engineering processes and product quality. Collaborate with cross-functional teams to identify and resolve issues proactively, while also driving initiatives to improve system observability.
Full-time|On-site|San Francisco, CA • New York, NY • United States
Join Figma as a Software Engineering Manager specializing in Observability. In this pivotal role, you will lead a dynamic team of engineers in developing cutting-edge solutions that enhance visibility and performance across our platform. Your expertise will drive the design and implementation of observability tools that empower our engineering teams to optimize their workflows, ensuring the robustness and reliability of our applications.
Full-time|On-site|San Francisco, CA | New York City, NY | Seattle, WA
Join Anthropic as a Staff+ Software Engineer specializing in Observability, where you will play a crucial role in enhancing our systems to ensure high-performance and reliability. Collaborate with cross-functional teams to develop innovative solutions, implement observability metrics, and drive improvements that enable better decision-making and user experiences.
Join our dynamic team at Cloudflare as a Software Engineer focused on Workers Observability. In this pivotal role, you'll be instrumental in enhancing the observability features of our Workers platform, ensuring optimal performance and reliability for our users. You will collaborate with cross-functional teams, tackle complex technical challenges, and contribute to the advancement of our innovative cloud solutions.
Full-time|$170K/yr - $240K/yr|On-site|San Francisco, CA
About the Role Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration. What You Will Do Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring. Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges. Take part in on-call rotations to help maintain strong uptime for Sigma’s services. Create tools and processes to improve cloud incident triage and reduce downtime. Define and promote practices that make systems and services measurable and observable. Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.
Role overview Adyen seeks a Senior Software Engineer in San Francisco to focus on Customer Developer Observability. This position aims to enhance the tools and systems that let clients monitor and analyze their performance across the Adyen platform. What you will do Collaborate with cross-functional teams to design and build observability solutions. Create and implement features that provide customers with deeper insights into their systems and data. Help improve the customer experience by making monitoring and analysis more effective and accessible.
Full-time|$166K/yr - $201K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to accelerate the availability of energy and intelligence. We are building the foundational technology that empowers individuals to innovate boldly with AI while maintaining speed, scale, and sustainability.
Join us in the AI revolution with sustainable technology at Crusoe, where you will lead significant innovations, make a real impact, and collaborate with a team that is pioneering responsible and transformative cloud infrastructure.
About the Role:
We are seeking a highly proficient engineer with extensive experience in designing and managing observability platforms at scale. You will be responsible for architecting, developing, and operating Crusoe’s next-generation observability stack, which will allow engineers to gain insights into the internal state of distributed systems through metrics, logs, and traces. Your contributions will guarantee reliability, performance, and actionable insights across Crusoe’s global infrastructure and cloud platform.
Key Responsibilities:
Design and manage scalable observability systems (metrics, logging, tracing) in multi-datacenter Kubernetes environments.
Architect comprehensive telemetry pipelines, covering ingestion, storage, querying, and visualization.
Enhance monitoring and alerting mechanisms with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry.
Develop scalable log collection and processing pipelines utilizing Fluent Bit, Vector, Loki, or ELK/Opensearch stacks.
Implement distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrate with service meshes, load balancers, and APIs.
Establish and promote the adoption of SLOs, SLIs, and error budgets across various services and teams.
Automate the provisioning and scaling of observability infrastructure using Kubernetes, Terraform, and custom tools (Go, Python).
Ensure the reliability and cost-effectiveness of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure).
Integrate security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls.
Collaborate with engineering teams to embed observability into applications, services, and infrastructure.
Mentor engineers and influence Crusoe’s observability strategy and technical roadmap.
Full-time|Remote|Remote with offices in San Francisco, CA / New York, NY / Minneapolis, MN
Join Dagster Labs as a Software Engineer specializing in our Observability Product. In this fully remote role, you will play a crucial part in enhancing the visibility and performance of our software solutions. Collaborate with cross-functional teams to develop and implement innovative observability features that empower our users to monitor and optimize their applications effectively.
Join Crusoe as a Senior Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our systems and ensuring robust performance across our platforms. You will collaborate with cross-functional teams to develop innovative solutions that improve the visibility and reliability of our software applications.
Join Adyen as an Engineering Manager for our Developer Observability team! In this pivotal role, you will lead a dynamic group of engineers dedicated to enhancing the observability of our developer platforms. You will be responsible for driving technical innovation, mentoring your team, and collaborating closely with cross-functional partners to deliver exceptional developer experiences.
As a leader, you will empower your team to excel in building tools and solutions that provide insights into system performance, ensuring our developers have everything they need to thrive. If you are passionate about technology, leadership, and fostering a culture of excellence, we want to hear from you!
Full-time|$200K/yr - $250K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a standard part of everyday life. Our goal is to provide the essential framework for agent engineering, empowering developers to transition their ideas from prototypes to production-ready AI agents that teams can trust. Initially launched as a widely embraced open-source initiative, our evolution has led us to offer a robust platform tailored for building, evaluating, deploying, and managing agents at scale.
Our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are now instrumental for teams delivering innovative AI solutions across diverse sectors, from startups to major corporations. Industry leaders such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, and Vanta, along with 35% of the Fortune 500, rely on LangChain for their AI initiatives.
Having successfully secured $125M in Series B funding from prominent investors like IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are poised for continued growth and innovation. At LangChain, every team member plays a vital role in shaping our projects and collaborative work environment, making it a place where your input can significantly influence the future of technology.
About The Role:
We are seeking a dynamic Engineering Manager to spearhead the development of LangSmith, our observability and evaluation platform designed for LLM applications. In this role, you will set the technical vision, cultivate and mentor a high-performing engineering team, and collaborate closely with product and design teams to deliver features that enable developers to construct and deploy reliable AI systems with assurance.
You will:
Build, mentor, and expand a talented team of engineers, fostering a culture of collaboration, ownership, and accountability.
Enhance LangChain’s engineering culture through mentorship, commitment to high-quality code, and technical excellence.
Define long-term technical strategy and guarantee the scalability and reliability of the LangSmith AI Observability Platform.
Work alongside product and design teams to outline project scope, sequence, and success metrics for key initiatives.
Uphold a high standard of technical excellence while ensuring the team remains focused and operates with urgency.
Lead by example in producing clean, maintainable, and thoroughly tested code using Go/Python and TypeScript.
Engage directly with customers to grasp their needs and translate those insights into actionable product enhancements.
Join DigitalOcean as a Senior Observability Engineer, where you will play a critical role in enhancing our monitoring and observability platforms. Your expertise will help us ensure that our systems are performant, reliable, and scalable, providing a seamless experience for our customers.
Full-time|$125K/yr - $145K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a fundamental part of everyday technology. Our mission is to provide the essential tools for agent engineering in practical applications, enabling developers to transition seamlessly from initial prototypes to production-ready AI agents that organizations can depend on. Starting as a suite of widely adopted open-source tools, we have expanded to offer a comprehensive platform for building, evaluating, deploying, and managing AI agents at scale.
Currently, our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are trusted by teams developing real AI solutions in both startups and established enterprises. Our technology powers AI initiatives for renowned companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
With $125M raised in Series B funding from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are at an exciting juncture where we continue to innovate, grow rapidly, and every team member can make a significant impact on our products and collaboration. Join us at LangChain, where your contributions can reshape the technology landscape.
About the Role:
In-person, 5 days a week in San Francisco
We are seeking a Fullstack Engineer to join our LangSmith product team, focusing on our commercial AI observability and evaluation platform. In this position, you will have the opportunity to develop new features and capabilities for our platform while collaborating closely with enterprise clients, developer end-users, and internal stakeholders.
Your Responsibilities:
Design and implement critical product features utilizing our Go, Python, and TypeScript stack
Work in close partnership with product and design teams to refine features and enhance the product roadmap
Drive project timelines effectively while maintaining high engineering standards through clean, maintainable, and well-tested code
To Succeed in This Role:
2+ years of experience in software engineering, particularly with complex platform products
Fullstack engineering experience with Go or Python on the backend and React + TypeScript on the frontend
Strong understanding of database systems, especially Postgres and Redis
Experience in designing and scaling APIs, ideally in high-performance environments
Full-time|$320K/yr - $405K/yr|On-site|San Francisco, CA
About Anthropic
At Anthropic, we are dedicated to developing AI systems that are reliable, interpretable, and controllable. Our mission is to ensure that artificial intelligence remains safe and beneficial for individuals and society at large. Our rapidly expanding team comprises passionate researchers, engineers, policy experts, and business leaders collaborating to create positive AI solutions.
About the Team
As the scale of AI training and deployment increases, so does the volume of data that requires monitoring and comprehension. Our team utilizes Claude to interpret this data effectively. We manage an integrated suite of tools that empowers Anthropic to pose open-ended inquiries, identify unexpected patterns, and maintain significant human oversight over extensive datasets.
Our tools are widely utilized internally, driving ongoing enforcement, threat intelligence investigations, model audits, and much more. We are seeking skilled engineers and researchers to enhance existing applications and innovate new ones from the ground up.
About the Role
As a Research Engineer on our team, you will design and develop systems that enable AI to analyze vast, unstructured datasets (think tens or hundreds of thousands of conversations or documents) and generate structured, reliable insights. You will engage with the entire technology stack, from foundational analysis frameworks to user-facing applications and interfaces.
This is a high-impact position. The tools you create will be utilized by numerous researchers and investigators, directly influencing our capacity to assess and counteract both misuse and misalignment.
Full-time|$194K/yr - $267K/yr|On-site|San Francisco, California
Discover Okta
Okta is recognized as The World’s Identity Company, empowering individuals to securely leverage any technology across various devices and applications. Our versatile Okta Platform and Auth0 Platform provide reliable access, authentication, and automation, placing identity at the forefront of business security and expansion.
At Okta, we value diverse perspectives and experiences. We seek continuous learners and individuals who can enhance our team with their distinct backgrounds.
Join us as we create a world where identity is truly yours.
We are in search of a highly skilled Observability Site Reliability Engineer specializing in Google Cloud, to take charge of and elevate our Observability ecosystem within GCP. In this position, you will progress beyond basic monitoring to develop a world-class, comprehensive, and scalable Observability Platform that supports our SRE teams and business collaborators. You will implement infrastructure as code by employing Terraform and demonstrating strong coding skills in Go, Python, or Ruby to automate the deployment of agents and collectors across intricate distributed systems.
Key Responsibilities
Automated Infrastructure: Design, build, and maintain scalable observability infrastructure utilizing tools such as Terraform.
GCP Observability Engineering: Enhance the collection, processing, and storage of Observability data to guarantee high reliability and low latency for our Splunk and Grafana services.
Incident Response: Engage in on-call rotations and conduct post-incident reviews to foster systemic improvements and promote 'observability-driven development.'
Automation: Minimize 'toil' by automating the deployment and scaling of observability agents and collectors.
About Us
At Braintrust, we are pioneering the AI observability landscape. Our platform seamlessly integrates evaluations and observability into a unified workflow, providing developers with crucial insights into AI behavior in production and powerful tools for enhancement.
Our clients, including renowned teams at Notion, Stripe, Zapier, Vercel, and Ramp, leverage Braintrust to benchmark models, optimize prompts, and identify regressions, transforming production data into superior AI functionality with each iteration.
The Opportunity
We are in search of a motivated Product Engineer who is enthusiastic about crafting tools that users adore and rely on daily. In this role, you will engage closely with our users (developers, product managers, and designers in the AI domain) and significantly influence our product roadmap.
Our platform is built on a high-performance, local-first architecture with a visualization-heavy UI, utilizing modern TypeScript and React. Our clientele, comprised of some of the leading technology firms, demands a product that is exceptionally fast, reliable, and user-friendly.
Your Responsibilities
Key responsibilities include:
Guaranteeing a seamless experience for customers enabling AI observability in their systems.
Contributing to the foundational UI architecture with a focus on performance and efficient data loading strategies.
You will also develop user-facing components for Braintrust, such as:
An exceptional prompt playground that accommodates multiple models and thousands of user inputs.
A robust system for managing prompts, configurations, and version comparisons.
A scalable multiplayer human review system.
LLM output analysis and comparison across extensive datasets.
Ideal Candidate Profile
Proficient in TypeScript, React, HTML, CSS, SQL, and NextJS.
Experience in founding or working with startups is advantageous.
Familiarity with writing prompts and experimenting with GPT models and applications.
What We Offer
Comprehensive medical, dental, and vision coverage.
Daily lunch, snacks, and beverages provided.
Flexible time-off policy.
Competitive salary and equity opportunities.
AI stipend for continued learning and tools.
Commitment to Diversity
Braintrust is a staunch advocate for equal opportunity in the workplace. We believe in fostering a diverse and inclusive environment.
Full-time|$175K/yr - $225K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a common part of everyday technology. Our goal is to provide a robust foundation for agent engineering that empowers developers to transition from prototypes to production-ready AI agents that teams can depend on. Initially starting as a widely embraced open-source toolset, we have expanded our offerings to include a comprehensive platform for the building, evaluating, deploying, and managing of agents at scale.
Currently, our tools (LangChain, LangGraph, LangSmith, and Agent Builder) are utilized by teams developing real AI products in both startups and large enterprises. Millions of developers rely on LangChain to power AI initiatives at notable companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
Having secured $125M in Series B funding from leading investors like IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are in an exciting phase of product development and rapid growth, where every team member has a substantial impact on our projects and collaborative efforts. At LangChain, your contributions will play a crucial role in shaping how this technology manifests in the real world.
About the Role:
This position requires in-person attendance 5 days a week in San Francisco, CA, as well as options in New York and Boston.
We are seeking a seasoned frontend engineer to innovate and improve features on LangSmith, our enterprise platform designed for LLM application observability, testing, and debugging.
What You Will Do:
Create new user-facing features utilizing React and TypeScript.
Develop reusable components and front-end libraries for future projects.
Convert designs and wireframes into high-quality, maintainable code.
Optimize components for peak performance across diverse web-capable devices and browsers.
Collaborate with fullstack and backend developers as well as UX/UI designers to enhance usability and experience.
You’re a Good Fit If You Have:
Extensive frontend engineering experience, with strong command of React, JavaScript, and TypeScript.
Practical experience with frontend development tools such as Babel, Vite, Webpack, NPM, and Yarn.
Familiarity with REST APIs and experience collaborating closely with fullstack and backend developers.
Full-time|$175K/yr - $225K/yr|On-site|San Francisco, CA
About Us:
LangChain is dedicated to making intelligent agents commonplace. We are pioneering the foundations of agent engineering in the real world, empowering developers to transition from prototypes to production-ready AI agents that teams can depend on. Initially known for our widely embraced open-source tools, we have expanded to provide a comprehensive platform for constructing, assessing, deploying, and managing agents at scale.
Our products, including LangChain, LangGraph, LangSmith, and Agent Builder, are utilized by teams delivering genuine AI solutions in both startup environments and large corporations. Millions of developers trust our technology to elevate AI initiatives at organizations such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
With $125M raised in our Series B funding from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are poised for continued product development and accelerating growth, where each team member plays a significant role in shaping our technology and collaborative culture.
About the Role:
On-site 5 days a week in San Francisco
We are seeking a Senior Fullstack Engineer for our commercial product, LangSmith, which serves as an observability and evaluation platform. In this role, you will have the chance to influence the technical direction of our platform while engaging with enterprise clients, developer end-users, and internal stakeholders.
Lead the technical architecture and implementation of essential product features for LangSmith, utilizing our entire stack of Go, Python, and TypeScript.
Work closely with product and design teams to iterate and refine new features.
Mentor and support junior team members, driving ambitious project timelines while upholding high engineering standards.
Set an example by producing clean, maintainable, and thoroughly tested code.
About Braintrust
Braintrust is at the forefront of AI observability. By merging evaluation and observability into a singular workflow, we empower developers with the insights needed to comprehend AI behavior in production environments, along with the tools to enhance it.
Leading teams at Notion, Stripe, Zapier, Vercel, and Ramp utilize Braintrust to compare models, test prompts, and monitor regressions, transforming production data into superior AI with each new release.
About the role
We are in search of a passionate software engineer dedicated to crafting high-performance data processing systems. Our clientele consists of large enterprises handling complex, semi-structured data, which they require for real-time processing and analysis. Our distinct architecture enables these organizations to keep data on-premises while creating intricate visualizations that load without delay. Explore our Brainstore blog post.
If you have experience with database systems, compilers, networks, or storage systems and aspire to pivot your expertise into the AI sector, this role could be your ideal fit. You will significantly influence foundational system architecture, technology selection, and implementation. Our founding team possesses extensive knowledge in database and ML systems, and you will have the autonomy to collaborate closely with them while exploring your innovative ideas.
Your Responsibilities
As a systems engineer at Braintrust, you’ll contribute to the core systems that empower Braintrust’s capability to process and query vast amounts of unstructured data at an enterprise scale. Key areas of responsibility include:
Enhancing the storage, indexing, and query execution performance of Brainstore.
Developing Braintrust's btql query language.
Optimizing query patterns to boost performance across our platform.
Qualifications
Deep understanding of systems programming (C++ or Rust, concurrency, databases, operating systems).
Experience in founding or working at startups is advantageous.
Familiarity with writing prompts or experimenting with GPT models and applications.
Benefits
Comprehensive medical, dental, and vision insurance.
Daily lunch, snacks, and beverages provided.
Flexible time off policy.
Competitive salary with equity options.
Mar 29, 2024