Anthropic
San Francisco, CA | New York City, NY | Seattle, WA
On-site Full-time
Experience Level
Entry Level
Qualifications
Proven experience in software engineering with a focus on observability and monitoring systems. Strong programming skills in languages such as Python, Go, or Java. Experience with cloud platforms and container orchestration technologies. Ability to analyze and troubleshoot complex systems issues. Excellent communication and teamwork skills.
About the job
Join Anthropic as a Staff+ Software Engineer specializing in Observability, where you will play a crucial role in enhancing our systems to ensure high performance and reliability. Collaborate with cross-functional teams to develop innovative solutions, implement observability metrics, and drive improvements that enable better decision-making and user experiences.
About Anthropic
Anthropic is a forward-thinking technology company committed to building safe and beneficial artificial intelligence. We foster a collaborative environment that encourages innovation and values diverse perspectives, making it a great place for driven individuals to thrive.
Similar jobs
Full-time|$166K/yr - $201K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to accelerate the availability of energy and intelligence. We are building the foundational technology that empowers individuals to innovate boldly with AI while maintaining speed, scale, and sustainability.
Join us in the AI revolution with sustainable technology at Crusoe, where you will lead significant innovations, make a real impact, and collaborate with a team that is pioneering responsible and transformative cloud infrastructure.
About the Role:
We are seeking a highly proficient engineer with extensive experience in designing and managing observability platforms at scale. You will be responsible for architecting, developing, and operating Crusoe’s next-generation observability stack, which will allow engineers to gain insights into the internal state of distributed systems through metrics, logs, and traces. Your contributions will guarantee reliability, performance, and actionable insights across Crusoe’s global infrastructure and cloud platform.
Key Responsibilities:
Design and manage scalable observability systems (metrics, logging, tracing) in multi-datacenter Kubernetes environments.
Architect comprehensive telemetry pipelines, covering ingestion, storage, querying, and visualization.
Enhance monitoring and alerting mechanisms with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry.
Develop scalable log collection and processing pipelines utilizing Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks.
Implement distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrate with service meshes, load balancers, and APIs.
Establish and promote the adoption of SLOs, SLIs, and error budgets across various services and teams.
Automate the provisioning and scaling of observability infrastructure using Kubernetes, Terraform, and custom tools (Go, Python).
Ensure the reliability and cost-effectiveness of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure).
Integrate security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls.
Collaborate with engineering teams to embed observability into applications, services, and infrastructure.
Mentor engineers and influence Crusoe’s observability strategy and technical roadmap.
Join DigitalOcean as a Senior Observability Engineer, where you will play a critical role in enhancing our monitoring and observability platforms. Your expertise will help us ensure that our systems are performant, reliable, and scalable, providing a seamless experience for our customers.
Full-time|$170K/yr - $240K/yr|On-site|San Francisco, CA
About the Role Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration. What You Will Do Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring. Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges. Take part in on-call rotations to help maintain strong uptime for Sigma’s services. Create tools and processes to improve cloud incident triage and reduce downtime. Define and promote practices that make systems and services measurable and observable. Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.
Full-time|$180K/yr - $180K/yr|On-site|South San Francisco, California, USA
Senior Software Engineer – Cloud Communications Platform Location: South San Francisco, California, USA About Zipline Are you passionate about making a difference in the world? At Zipline, we are dedicated to revolutionizing the movement of goods globally. Our mission is to tackle the world’s most pressing access challenges by developing the first instant delivery and logistics system that serves all individuals, irrespective of their location. From facilitating Rwanda’s national blood delivery network and distributing COVID-19 vaccines in Ghana to offering on-demand home delivery for major retailers and enabling healthcare providers to deliver care directly to homes in the U.S., we are reshaping logistics for businesses, governments, and consumers alike. While our technology is sophisticated, the concept is straightforward: a teleportation service that delivers what you need, when you need it. By utilizing robotics and autonomy, we are committed to decarbonizing delivery, alleviating road congestion, minimizing fossil fuel usage, and enhancing the resilience of the global supply chain. Join Zipline and contribute to creating an equitable and resilient logistics system that impacts billions of lives. About You and The Role Zipline operates a large-scale autonomous system that relies on dependable, low-latency communication between vehicles, ground infrastructure, and cloud services. Our Cloud Communications team is responsible for the platform that transfers critical data from embedded systems to the cloud, ensuring data reliability, scalability, and observability. We seek a Senior Software Engineer to enhance and fortify this platform. This role centers on connecting hardware assets to the cloud, hosting and orchestrating new data use cases, and constructing distributed observability across embedded software, cellular networks, and cloud microservices. 
In this position, you will collaborate closely with the Embedded and Autonomy teams that develop software to extract data from devices. Your primary responsibility will be to guarantee that data is securely ingested into the cloud, deduplicated, stored, processed, monitored, and accessible for both real-time and offline workflows. This is a high-ownership role directly influencing flight reliability, operational visibility, and the scalability of our global network. What You’ll Do Lead the evolution of services connecting vehicles, charging and loading stations, fulfillment hardware, and other field-deployed infrastructure to the cloud. Design and maintain asset-to-cloud APIs, message schemas, and communication clients in collaboration with embedded teams. Develop and manage ingestion pipelines for new data use cases.
Role overview Adyen seeks a Senior Software Engineer in San Francisco to focus on Customer Developer Observability. This position aims to enhance the tools and systems that let clients monitor and analyze their performance across the Adyen platform. What you will do Collaborate with cross-functional teams to design and build observability solutions. Create and implement features that provide customers with deeper insights into their systems and data. Help improve the customer experience by making monitoring and analysis more effective and accessible.
Full-time|$125K/yr - $145K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a fundamental part of everyday technology. Our mission is to provide the essential tools for agent engineering in practical applications, enabling developers to transition seamlessly from initial prototypes to production-ready AI agents that organizations can depend on. Starting as a suite of widely adopted open-source tools, we have expanded to offer a comprehensive platform for building, evaluating, deploying, and managing AI agents at scale.
Currently, our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are trusted by teams developing real AI solutions in both startups and established enterprises. Our technology powers AI initiatives for renowned companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
With $125M raised in Series B funding from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are at an exciting juncture where we continue to innovate, grow rapidly, and every team member can make a significant impact on our products and collaboration. Join us at LangChain, where your contributions can reshape the technology landscape.
About the Role:
In-person, 5 days a week in San Francisco.
We are seeking a Fullstack Engineer to join our LangSmith product team, focusing on our commercial AI observability and evaluation platform. In this position, you will have the opportunity to develop new features and capabilities for our platform while collaborating closely with enterprise clients, developer end-users, and internal stakeholders.
Your Responsibilities:
Design and implement critical product features utilizing our Go, Python, and TypeScript stack.
Work in close partnership with product and design teams to refine features and enhance the product roadmap.
Drive project timelines effectively while maintaining high engineering standards through clean, maintainable, and well-tested code.
To Succeed in This Role:
2+ years of experience in software engineering, particularly with complex platform products.
Fullstack engineering experience with Go or Python on the backend and React + TypeScript on the frontend.
Strong understanding of database systems, especially Postgres and Redis.
Experience in designing and scaling APIs, ideally in high-performance environments.
Full-time|$287.6K/yr - $345.1K/yr|On-site|Denver, CO; New York City, NY; San Francisco, CA
At Fastly, we empower individuals to maintain meaningful connections with the things they cherish. Our cutting-edge edge cloud platform allows customers to deliver exceptional digital experiences with speed, security, and reliability by processing and securing applications as close to end-users as possible, right at the edge of the Internet. Our platform is engineered to harness the potential of the modern internet, enabling programmability and supporting agile software development. We proudly serve some of the world's most influential companies, including GitHub, Yelp, Paramount, and JetBlue.
Join us in building a more trustworthy Internet.
Posting Open Date: Reposted March 30, 2026
Anticipated Posting Close Date*: April 20, 2026
*This job posting may close earlier due to high applicant volume.
Senior Principal Engineer, Platform Engineering
As a member of Fastly’s Platform Engineering team, you will be instrumental in establishing the essential frameworks that enable engineers to deliver quickly, safely, and at scale. In the role of Senior Principal Engineer, you will guide the technical direction and spearhead cross-organizational initiatives aimed at enhancing our platform, developer experience, and operational excellence. Collaborating closely with fellow engineers, you will work to eliminate friction, standardize best practices, and create paved roads to expedite product delivery. This is a highly collaborative and hands-on position, representing the Platform Engineering organization and reporting directly to the Senior Director of Engineering.
Become part of the innovative engineering teams at OpenAI, where we create and deliver groundbreaking AI technologies responsibly and safely to the world!
Our Applied Engineering team collaborates across research, engineering, product, and design disciplines to deploy OpenAI's cutting-edge technology for both consumers and businesses. We are committed to learning from our deployments and ensuring that AI is utilized ethically while maximizing its benefits. To us, safety takes precedence over unchecked growth.
About the Role
We are in the process of developing OpenAI's observability product, which encompasses everything from scalable infrastructure to an intuitive, AI-enhanced user interface. Our systems process petabytes of logs and billions of time series metrics throughout our infrastructure. We are now integrating intelligence to create features like agents that summarize service events, auto-generate dashboards, and assist engineers in debugging through user-friendly notebook-like interfaces.
We are looking to hire software engineers at all levels of our stack, be it infrastructure, backend, or product. You will be part of a dynamic, resourceful team that develops both foundational infrastructure and innovative internal tools, ensuring the reliability, performance, and observability of OpenAI's production systems.
What You’ll Do
Lead the development of core observability infrastructure, focusing on distributed logging, time series, and trace storage.
Create AI-integrated tools that empower engineers to autonomously identify, comprehend, and resolve issues.
Enhance user interface experiences including dashboards, notebooking, and interactive debugging.
Work collaboratively with engineers, researchers, user operations, and various teams to craft the next generation of the observability product.
You Might Be a Fit If You:
Have experience operating large-scale distributed systems in production, particularly logging systems or time series databases.
Excel in ambiguous environments and tackle unscoped challenges head-on.
Possess full-stack development skills or a strong product sensibility; you are eager to build practical tools that users will engage with.
Demonstrate robust knowledge of systems, networking, and cloud infrastructure (Kubernetes, AWS, etc.).
Bonus: Have built or contributed to observability systems (e.g., Prometheus, OpenTelemetry, etc.).
Why This Team?
We combine infrastructure and product development to create real AI applications for in-house use.
Your contributions will directly enhance the reliability of GPT-based products at OpenAI.
Join Crusoe as a Senior Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our systems and ensuring robust performance across our platforms. You will collaborate with cross-functional teams to develop innovative solutions that improve the visibility and reliability of our software applications.
Full-time|On-site|San Francisco, CA • New York, NY • United States
Join Figma as a Software Engineering Manager specializing in Observability. In this pivotal role, you will lead a dynamic team of engineers in developing cutting-edge solutions that enhance visibility and performance across our platform. Your expertise will drive the design and implementation of observability tools that empower our engineering teams to optimize their workflows, ensuring the robustness and reliability of our applications.
Join Gusto as a Staff Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our software's performance and reliability. Utilize your expertise to develop and implement monitoring solutions that provide insights into application behavior, ensuring a seamless experience for our users. Your contributions will directly impact our engineering processes and product quality. Collaborate with cross-functional teams to identify and resolve issues proactively, while also driving initiatives to improve system observability.
MidiHealth is seeking a Senior Software Engineer to join the Platform Engineering team. This hybrid role is based in the SF Bay Area and centers on building and enhancing the software that drives MidiHealth’s healthcare technology platform. The work contributes directly to improving patient outcomes through technology. Key responsibilities Design and develop software solutions for the core platform Collaborate with engineering, product, and cross-functional teams to deliver integrated features Support the reliability and scalability of the platform Location This position requires regular on-site work in the SF Bay Area as part of a hybrid schedule.
Full-time|On-site|San Francisco, CA | New York City, NY | Seattle, WA
Join Anthropic as a Staff+ Software Engineer specializing in Observability, where you will play a crucial role in enhancing our systems to ensure high performance and reliability. Collaborate with cross-functional teams to develop innovative solutions, implement observability metrics, and drive improvements that enable better decision-making and user experiences.
Full-time|$179.4K/yr - $224.3K/yr|On-site|San Francisco, CA; New York, NY
In a world where software is rapidly evolving, artificial intelligence (AI) is at the forefront, transforming how we interact with technology. At Scale AI, we recognize the immense potential of AI to enhance human capabilities, offering personalized support across various aspects of life, from coaching and tutoring to shopping and travel guidance. As enterprises, startups, and governments rush to integrate large language models (LLMs) into their operations, it is crucial to ensure these systems are safe, aligned, and effective. This involves rigorous human evaluation and reinforcement learning from human feedback (RLHF) during all stages of model development.
Our innovative products, including the Generative AI Data Engine, SGP, and Donovan, are designed to empower the most advanced LLMs and generative models globally. By leveraging world-class RLHF, human data generation, model evaluation, safety, and alignment, we are shaping the future of human-AI interaction.
As a member of our Platform Engineering team, you will play a pivotal role in designing and developing the foundational platforms that support Scale's operations. Your responsibilities will include architecting our core cloud infrastructure, enhancing our data lifecycle, and transforming the software development process for engineers at Scale. You will gain invaluable insights into the AI landscape as it develops within diverse sectors.
Join our innovative team at Unify as a Senior Software Engineer, Platform, where you will play a crucial role in enhancing our platform capabilities. You will collaborate with cross-functional teams to design, develop, and implement high-quality software solutions that meet our clients' needs.
Aura is seeking a talented and experienced Senior Software Engineer, Platform to join our innovative team. In this role, you will be responsible for designing and implementing scalable software solutions that enhance our platform capabilities. You will work closely with cross-functional teams to ensure the delivery of high-quality software that meets the needs of our users.
Full-time|$175K/yr - $225K/yr|On-site|San Francisco, CA
About Us:
LangChain is dedicated to making intelligent agents commonplace. We are pioneering the foundations of agent engineering in the real world, empowering developers to transition from prototypes to production-ready AI agents that teams can depend on. Initially known for our widely embraced open-source tools, we have expanded to provide a comprehensive platform for constructing, assessing, deploying, and managing agents at scale.
Our products, including LangChain, LangGraph, LangSmith, and Agent Builder, are utilized by teams delivering genuine AI solutions in both startup environments and large corporations. Millions of developers trust our technology to elevate AI initiatives at organizations such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
With $125M raised in our Series B funding from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are poised for continued product development and accelerating growth, where each team member plays a significant role in shaping our technology and collaborative culture.
About the Role:
On-site 5 days a week in San Francisco.
We are seeking a Senior Fullstack Engineer for our commercial product, LangSmith, which serves as an observability and evaluation platform. In this role, you will have the chance to influence the technical direction of our platform while engaging with enterprise clients, developer end-users, and internal stakeholders.
Lead the technical architecture and implementation of essential product features for LangSmith, utilizing our entire stack of Go, Python, and TypeScript.
Work closely with product and design teams to iterate and refine new features.
Mentor and support junior team members, driving ambitious project timelines while upholding high engineering standards.
Set an example by producing clean, maintainable, and thoroughly tested code.
Full-time|$190K/yr - $230K/yr|Remote|Remote with offices in San Francisco, CA / New York, NY / Minneapolis, MN
Dagster Labs develops tools that enable organizations to build scalable and efficient data platforms. The company’s core offerings include Dagster, an open-source project popular among developers, and Dagster+, a managed cloud solution. These products support thousands of teams, ranging from early-stage startups to established enterprises, in their analytics, machine learning, and AI initiatives. With the rapid growth of AI, the need for reliable, high-quality data has never been greater. Dagster Labs is dedicated to simplifying the testing, comprehension, and usability of data platforms. Many top AI companies have adopted Dagster as a foundational part of their technology stack. Team culture The team operates with strong funding and a collaborative spirit. High standards, open communication, and a focus on trust and curiosity shape the work environment. The company values a workplace free from egos and unnecessary drama. Locations This is a remote-first company with offices in San Francisco, New York, and Minneapolis.
Full-time|$209K/yr - $253K/yr|On-site|San Francisco, CA - US
At Crusoe, our mission is to catalyze the proliferation of energy and intelligence. We are engineering the driving force behind a future where individuals can ambitiously create with AI without compromising on scale, speed, or sustainability.
Join us at Crusoe as we lead the charge in the AI revolution through sustainable technology. You will play a pivotal role in fostering meaningful innovation, making a significant impact, and collaborating with a team that is pioneering the development of responsible and transformative cloud infrastructure.
Position Overview:
We are in search of experienced Staff/Senior Staff Software Engineers who will be tasked with the architecture, design, and development of advanced Cloud Infrastructure management systems and platforms. You will be vital in delivering end-to-end use cases and workflows for our integrated AI-First Crusoe Cloud. Your contributions will be essential in constructing systems and platforms that effectively plan, monitor, deploy, and operate Crusoe Cloud, achieving key business revenue metrics.
Your expertise will be crucial in evaluating, implementing, and building platforms, tools, and frameworks that prioritize reliability, scalability, operational efficiency, and user-friendliness. You will enhance our infrastructure planning and management workflows, driving efficiency and improving the overall performance and reliability of our cloud platform as we ambitiously scale our Crusoe Cloud products and services by more than 10X.
In this role, you will also develop and refine technical designs and architectures, mentor fellow engineers, and actively contribute to the growth of the team in partnership with engineering managers.
Your Key Responsibilities:
Engage collaboratively across teams to design, architect, and implement physical infrastructure management software systems and availability platforms that meet end-to-end customer use cases, ensuring an exceptional customer experience.
Champion the reliability, scalability, and security of our systems and platforms, acting as the guardian of our infrastructure!
Create workflows designed to enhance efficiency and achieve key business objectives and metrics.
Design and implement high-performance, highly available cloud architectures, optimizing for both performance and cost-effectiveness.
Enhance cloud deployment, configuration management, and operations by developing and maintaining effective platforms, interfaces, and automation tools.
Actively participate in the evolution of our platform, working closely with cross-functional teams.
Full-time|$175K/yr - $225K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a common part of everyday technology. Our goal is to provide a robust foundation for agent engineering that empowers developers to transition from prototypes to production-ready AI agents that teams can depend on. Initially starting as a widely embraced open-source toolset, we have expanded our offerings to include a comprehensive platform for the building, evaluating, deploying, and managing of agents at scale.
Currently, our tools (LangChain, LangGraph, LangSmith, and Agent Builder) are utilized by teams developing real AI products in both startups and large enterprises. Millions of developers rely on LangChain to power AI initiatives at notable companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
Having secured $125M in Series B funding from leading investors like IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are in an exciting phase of product development and rapid growth, where every team member has a substantial impact on our projects and collaborative efforts. At LangChain, your contributions will play a crucial role in shaping how this technology manifests in the real world.
About the Role:
This position requires in-person attendance 5 days a week in San Francisco, CA, as well as options in New York and Boston.
We are seeking a seasoned frontend engineer to innovate and improve features on LangSmith, our enterprise platform designed for LLM application observability, testing, and debugging.
What You Will Do:
Create new user-facing features utilizing React and TypeScript.
Develop reusable components and front-end libraries for future projects.
Convert designs and wireframes into high-quality, maintainable code.
Optimize components for peak performance across diverse web-capable devices and browsers.
Collaborate with fullstack and backend developers as well as UX/UI designers to enhance usability and experience.
You’re a Good Fit If You Have:
Extensive frontend engineering experience, with strong command of React, JavaScript, and TypeScript.
Practical experience with frontend development tools such as Babel, Vite, Webpack, NPM, and Yarn.
Familiarity with REST APIs and experience collaborating closely with fullstack and backend developers.
Jun 9, 2025