Experience Level
Senior
Qualifications
The ideal candidate will possess a strong background in software engineering, with expertise in observability tools and techniques. A Bachelor's Degree in a relevant field is required, along with experience in distributed systems and cloud technologies. Proficiency in programming languages such as Java, Python, or Go is essential. You should also demonstrate excellent problem-solving skills and the ability to work effectively in a team environment.
About the job
Adyen seeks a Senior Software Engineer in San Francisco to focus on Customer Developer Observability. This position aims to enhance the tools and systems that let clients monitor and analyze their performance across the Adyen platform.
What you will do
Collaborate with cross-functional teams to design and build observability solutions.
Create and implement features that provide customers with deeper insights into their systems and data.
Help improve the customer experience by making monitoring and analysis more effective and accessible.
About Adyen
Adyen is a leading global payment company that provides businesses with a seamless payment experience across multiple channels. Our mission is to empower businesses to accept payments anywhere in the world, and we pride ourselves on our innovative technology and commitment to customer satisfaction.
Join Adyen as an Engineering Manager focused on Developer Experience and Observability, where you'll lead a talented team dedicated to enhancing the development processes and monitoring capabilities of our platform. You'll play a critical role in shaping the future of our engineering practices, ensuring that our developers have the tools they need to succeed…
Join Adyen as an Engineering Manager for our Developer Observability team! In this pivotal role, you will lead a dynamic group of engineers dedicated to enhancing the observability of our developer platforms. You will be responsible for driving technical innovation, mentoring your team, and collaborating closely with cross-functional partners to deliver exceptional developer experiences. As a leader, you will empower your team to excel in building tools and solutions that provide insights into system performance, ensuring our developers have everything they need to thrive. If you are passionate about technology, leadership, and fostering a culture of excellence, we want to hear from you!
Full-time|$200K/yr - $250K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a standard part of everyday life. Our goal is to provide the essential framework for agent engineering, empowering developers to transition their ideas from prototypes to production-ready AI agents that teams can trust. Initially launched as a widely embraced open-source initiative, our evolution has led us to offer a robust platform tailored for building, evaluating, deploying, and managing agents at scale.
Our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are now instrumental for teams delivering innovative AI solutions across diverse sectors, from startups to major corporations. Industry leaders such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, and Vanta, along with 35% of the Fortune 500, rely on LangChain for their AI initiatives.
Having successfully secured $125M in Series B funding from prominent investors like IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are poised for continued growth and innovation. At LangChain, every team member plays a vital role in shaping our projects and collaborative work environment, making it a place where your input can significantly influence the future of technology.
About The Role:
We are seeking a dynamic Engineering Manager to spearhead the development of LangSmith, our observability and evaluation platform designed for LLM applications. In this role, you will set the technical vision, cultivate and mentor a high-performing engineering team, and collaborate closely with product and design teams to deliver features that enable developers to construct and deploy reliable AI systems with assurance.
You will:
Build, mentor, and expand a talented team of engineers, fostering a culture of collaboration, ownership, and accountability.
Enhance LangChain’s engineering culture through mentorship, commitment to high-quality code, and technical excellence.
Define long-term technical strategy and guarantee the scalability and reliability of the LangSmith AI Observability Platform.
Work alongside product and design teams to outline project scope, sequence, and success metrics for key initiatives.
Uphold a high standard of technical excellence while ensuring the team remains focused and operates with urgency.
Lead by example in producing clean, maintainable, and thoroughly tested code using Go/Python and TypeScript.
Engage directly with customers to grasp their needs and translate those insights into actionable product enhancements.
Full-time|On-site|San Francisco, CA • New York, NY • United States
Join Figma as a Software Engineering Manager specializing in Observability. In this pivotal role, you will lead a dynamic team of engineers in developing cutting-edge solutions that enhance visibility and performance across our platform. Your expertise will drive the design and implementation of observability tools that empower our engineering teams to optimize their workflows, ensuring the robustness and reliability of our applications.
Join DigitalOcean as a Senior Observability Engineer, where you will play a critical role in enhancing our monitoring and observability platforms. Your expertise will help us ensure that our systems are performant, reliable, and scalable, providing a seamless experience for our customers.
About Our Team
At OpenAI, the Developer Experience team is dedicated to empowering developers around the globe. Our mission is to provide exceptional and seamless experiences for developers and startups, enabling them to integrate AI technologies into their applications and products effortlessly. We ensure that our developers have access to the tools, resources, and support necessary to fully harness the power of AI.
We create engaging demos, robust developer tools, sample projects, and informative content that illustrate how to build outstanding applications using our advanced models, including OpenAI o3 and GPT-4.1, along with multimodal capabilities and tools like Codex.
Our team collaborates closely with product, engineering, research, and go-to-market teams to ensure that the developer journey—from the first API call to full production deployment—is seamless, efficient, and enjoyable.
About the Role
As a Developer Experience Engineer, you will be responsible for creating compelling technical content, innovative developer tools, and sample applications that inspire and empower developers to succeed with OpenAI's APIs and products. You will interact with developers and technical founders, showcasing best practices and building cutting-edge applications powered by our leading models and tools.
We are seeking individuals who possess a blend of strong technical abilities, creativity, and a passion for engaging with and empowering the developer community.
Key Responsibilities:
Design and develop demos and sample applications that highlight innovative integrations and best practices utilizing reasoning models, multimodal capabilities, and agent tools.
Produce high-quality technical content—including tutorials, blog posts, videos, and code samples—to educate and inspire developers about our models, APIs, and Codex.
Engage actively with and cultivate a vibrant local and global developer ecosystem surrounding OpenAI’s platform.
Represent OpenAI at developer events and online platforms, serving as a knowledgeable and approachable advocate for developers.
Collect and synthesize developer feedback to inform and enhance our product roadmap.
Collaborate cross-functionally with product, engineering, and marketing teams to ensure the successful adoption of OpenAI’s developer tools and APIs.
Directly contribute to the improvement and refinement of OpenAI’s developer interfaces and experiences.
Become part of the innovative engineering teams at OpenAI, where we create and deliver groundbreaking AI technologies responsibly and safely to the world!
Our Applied Engineering team collaborates across research, engineering, product, and design disciplines to deploy OpenAI's cutting-edge technology for both consumers and businesses. We are committed to learning from our deployments and ensuring that AI is utilized ethically while maximizing its benefits. To us, safety takes precedence over unchecked growth.
About the Role
We are in the process of developing OpenAI's observability product, which encompasses everything from scalable infrastructure to an intuitive, AI-enhanced user interface. Our systems process petabytes of logs and billions of time series metrics throughout our infrastructure. We are now integrating intelligence to create features like agents that summarize service events, auto-generate dashboards, and assist engineers in debugging through user-friendly notebook-like interfaces.
We are looking to hire software engineers at all levels of our stack—be it infrastructure, backend, or product. You will be part of a dynamic, resourceful team that develops both foundational infrastructure and innovative internal tools, ensuring the reliability, performance, and observability of OpenAI's production systems.
What You’ll Do
Lead the development of core observability infrastructure, focusing on distributed logging, time series, and trace storage.
Create AI-integrated tools that empower engineers to autonomously identify, comprehend, and resolve issues.
Enhance user interface experiences including dashboards, notebooking, and interactive debugging.
Work collaboratively with engineers, researchers, user operations, and various teams to craft the next generation of the observability product.
You Might Be a Fit If You:
Have experience operating large-scale distributed systems in production, particularly logging systems or time series databases.
Excel in ambiguous environments and tackle unscoped challenges head-on.
Possess full-stack development skills or a strong product sensibility; you are eager to build practical tools that users will engage with.
Demonstrate robust knowledge of systems, networking, and cloud infrastructure (Kubernetes, AWS, etc.).
Bonus: Have built or contributed to observability systems (e.g., Prometheus, OpenTelemetry, etc.).
Why This Team?
We combine infrastructure and product development to create real AI applications for in-house use.
Your contributions will directly enhance the reliability of GPT-based products at OpenAI.
Join Gusto as a Staff Software Engineer specializing in Observability, where you will play a pivotal role in enhancing our software's performance and reliability. Utilize your expertise to develop and implement monitoring solutions that provide insights into application behavior, ensuring a seamless experience for our users. Your contributions will directly impact our engineering processes and product quality. Collaborate with cross-functional teams to identify and resolve issues proactively, while also driving initiatives to improve system observability.
Full-time|On-site|San Francisco, CA | New York City, NY | Seattle, WA
Join Anthropic as a Staff+ Software Engineer specializing in Observability, where you will play a crucial role in enhancing our systems to ensure high-performance and reliability. Collaborate with cross-functional teams to develop innovative solutions, implement observability metrics, and drive improvements that enable better decision-making and user experiences.
Full-time|$150K/yr - $190K/yr|Hybrid|San Francisco, California
About Sentry
At Sentry, we believe that poor software is a thing of the past. Our mission is to empower developers to create high-quality software more efficiently, allowing us to enjoy the technology we love.
With over $217 million in funding and a community of more than 100,000 organizations backing our vision, we are developing advanced performance and error monitoring tools used by industry leaders such as Disney, Microsoft, and Atlassian, enabling them to focus on innovation rather than bug-fixing.
We embrace a hybrid work model across our global offices, designating Mondays, Tuesdays, and Thursdays as in-office days to foster meaningful collaboration. If you are passionate about creating tools that enhance the digital experience, join us in building the future of software monitoring.
About the Role
Our Developer Experience team is expanding in San Francisco! This role is hybrid and requires you to be based at our headquarters. We are searching for an innovative builder with a strong viewpoint on exceptional developer documentation.
If you are an individual who enjoys experimenting with the latest features in your favorite tools, loves exploring new JavaScript frameworks, and is frustrated by inaccurate code snippets or documentation, then this role at Sentry is perfect for you. The Developer Experience team merges the excitement of shipping cutting-edge products with ensuring developers can use them effectively.
If you are confident in collaborating with Product and Engineering teams to test and launch new features, and possess the ability to dive in and solve complex problems while maintaining a keen eye for quality documentation, this could be your dream job.
As part of Sentry's Developer Experience team, you will be a builder constantly seeking ways to simplify the experience for developers using our tools. You will engage with the technical community to understand their challenges, support developers, gather feedback, and assist our product and engineering teams in releasing new features.
In this role, you will be expected to bring solutions to the table, propose product strategies, identify necessary resources and partnerships, and execute your plans effectively.
Full-time|$125K/yr - $145K/yr|On-site|San Francisco, CA
About Us:
At LangChain, we are dedicated to making intelligent agents a fundamental part of everyday technology. Our mission is to provide the essential tools for agent engineering in practical applications, enabling developers to transition seamlessly from initial prototypes to production-ready AI agents that organizations can depend on. Starting as a suite of widely adopted open-source tools, we have expanded to offer a comprehensive platform for building, evaluating, deploying, and managing AI agents at scale.
Currently, our platforms, including LangChain, LangGraph, LangSmith, and Agent Builder, are trusted by teams developing real AI solutions in both startups and established enterprises. Our technology powers AI initiatives for renowned companies such as Replit, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, and 35% of the Fortune 500.
With $125M raised in Series B funding from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we are at an exciting juncture where we continue to innovate, grow rapidly, and every team member can make a significant impact on our products and collaboration. Join us at LangChain, where your contributions can reshape the technology landscape.
About the Role:
In-person, 5 days a week in San Francisco
We are seeking a Fullstack Engineer to join our LangSmith product team, focusing on our commercial AI observability and evaluation platform. In this position, you will have the opportunity to develop new features and capabilities for our platform while collaborating closely with enterprise clients, developer end-users, and internal stakeholders.
Your Responsibilities:
Design and implement critical product features utilizing our Go, Python, and TypeScript stack
Work in close partnership with product and design teams to refine features and enhance the product roadmap
Drive project timelines effectively while maintaining high engineering standards through clean, maintainable, and well-tested code
To Succeed in This Role:
2+ years of experience in software engineering, particularly with complex platform products
Fullstack engineering experience with Go or Python on the backend and React + TypeScript on the frontend
Strong understanding of database systems, especially Postgres and Redis
Experience in designing and scaling APIs, ideally in high-performance environments
Full-time|$320K/yr - $405K/yr|On-site|San Francisco, CA
About Anthropic
At Anthropic, we are dedicated to developing AI systems that are reliable, interpretable, and controllable. Our mission is to ensure that artificial intelligence remains safe and beneficial for individuals and society at large. Our rapidly expanding team comprises passionate researchers, engineers, policy experts, and business leaders collaborating to create positive AI solutions.
About the Team
As the scale of AI training and deployment increases, so does the volume of data that requires monitoring and comprehension. Our team utilizes Claude to interpret this data effectively. We manage an integrated suite of tools that empowers Anthropic to pose open-ended inquiries, identify unexpected patterns, and maintain significant human oversight over extensive datasets.
Our tools are widely utilized internally, driving ongoing enforcement, threat intelligence investigations, model audits, and much more. We are seeking skilled engineers and researchers to enhance existing applications and innovate new ones from the ground up.
About the Role
As a Research Engineer on our team, you will design and develop systems that enable AI to analyze vast, unstructured datasets—think tens or hundreds of thousands of conversations or documents—and generate structured, reliable insights. You will engage with the entire technology stack, from foundational analysis frameworks to user-facing applications and interfaces.
This is a high-impact position. The tools you create will be utilized by numerous researchers and investigators, directly influencing our capacity to assess and counteract both misuse and misalignment.
Join our dynamic team at Cloudflare as a Software Engineer focused on Workers Observability. In this pivotal role, you'll be instrumental in enhancing the observability features of our Workers platform, ensuring optimal performance and reliability for our users. You will collaborate with cross-functional teams, tackle complex technical challenges, and contribute to the advancement of our innovative cloud solutions.
Join us at datacurve as we innovate a gamified developer platform that empowers thousands of engineers to create high-fidelity datasets, advancing the frontiers of large language models (LLMs). In this pivotal role, you will oversee the entire technical lifecycle of our data pipelines—from collaborating with partner labs to establish new data formats, to delivering the essential tools, environments, documentation, and quality assurance processes that bring these formats to life at scale.
Key Responsibilities
Lead Projects End-to-End: Take ownership of projects from initial prototyping through to ongoing maintenance and iterative improvements based on user feedback.
Oversee Developer Experience Pipelines: Develop and prototype tools for capturing new data formats, transitioning to a production workflow, and refining the developer experience.
Champion Developer Experience: Produce clear and concise guidelines and documentation to empower our contributors and ensure the quality of project inputs.
Quality Assurance & Governance: Establish and manage quality standards for your projects, which includes training content reviewers to ensure data consistency and accuracy. Implement automated checks, evaluation harnesses, and workflows to meet data quality benchmarks.
Continuous Improvement: Monitor systems, troubleshoot issues, and enhance reliability, latency, and contributor success rates.
Occasional Responsibilities
Define Innovative Data Formats: Collaborate with frontier lab researchers to create specifications and design schemas, metadata, and versioning for new formats.
Develop Tools and Environments: Deliver tools, sandboxes, command-line interfaces (CLIs), and instrumentation to streamline contribution processes.
Full-time|$220K/yr - $275K/yr|On-site|San Francisco, CA
Peregrine Technologies, headquartered in San Francisco, develops an AI platform that transforms fragmented data into actionable intelligence for public safety, government, and enterprise clients. With support from top Silicon Valley investors, Peregrine serves hundreds of organizations across more than 30 states and two countries, impacting over 125 million people. The company is now expanding further into enterprise and international markets.
The Developer Experience (DevEx) team plays a central role as Peregrine’s engineering organization rapidly grows from 50 to over 120 members in a year. As the team expands, challenges like slower development cycles, onboarding complexity, and deployment reliability have become more pressing. The DevEx group, a small and influential team of 2 to 3 engineers, focuses on making daily work smoother for internal developers by identifying pain points and delivering tools and processes that improve efficiency across engineering.
Role overview
The Tech Lead Manager, Developer Experience, leads the team responsible for engineering tooling, developer velocity, observability, and monitoring. This role is hands-on: setting technical direction, mentoring a small group of senior engineers, and collaborating with other teams to address developer productivity challenges. Defining and tracking metrics for engineering effectiveness is a central part of the job, helping guide where to invest resources for the greatest impact. The work of this team directly influences the productivity of an engineering group of more than 100 people.
Key responsibilities
Own the DevEx roadmap: Define strategy for CI/CD, observability, and developer productivity. Prioritize work based on data and developer feedback.
Lead and develop the team: Manage and mentor 2-3 engineers, support their career growth, provide direction, recruit new talent, and raise engineering standards.
Establish effectiveness metrics: Instrument workflows and develop metrics such as build times, deployment frequency, developer satisfaction, and onboarding speed.
Improve CI/CD and deployment reliability: Maintain efficient, dependable build, test, and deployment pipelines as the company scales.
Develop observability and monitoring frameworks: Build systems for real-time insights and operational monitoring.
Join Crusoe as the Senior Director of Engineering, focusing on enhancing Developer Experience. In this pivotal role, you will lead a dynamic engineering team to innovate and improve our developer tools and platforms, ensuring seamless integration and exceptional user experience. You will collaborate closely with cross-functional teams to drive technical excellence and deliver outstanding solutions that empower developers. Your leadership will be integral to shaping our engineering culture and fostering a collaborative environment where creativity and problem-solving thrive.
Full-time|$194K/yr - $267K/yr|On-site|San Francisco, California
Discover Okta
Okta is recognized as The World’s Identity Company, empowering individuals to securely leverage any technology across various devices and applications. Our versatile Okta Platform and Auth0 Platform provide reliable access, authentication, and automation, placing identity at the forefront of business security and expansion.
At Okta, we value diverse perspectives and experiences. We seek continuous learners and individuals who can enhance our team with their distinct backgrounds.
Join us as we create a world where identity is truly yours.
We are in search of a highly skilled Observability Site Reliability Engineer specializing in Google Cloud, to take charge of and elevate our Observability ecosystem within GCP. In this position, you will progress beyond basic monitoring to develop a world-class, comprehensive, and scalable Observability Platform that supports our SRE teams and business collaborators. You will implement infrastructure as code by employing Terraform and demonstrating strong coding skills in Go, Python, or Ruby to automate the deployment of agents and collectors across intricate distributed systems.
Key Responsibilities
Automated Infrastructure: Design, build, and maintain scalable observability infrastructure utilizing tools such as Terraform.
GCP Observability Engineering: Enhance the collection, processing, and storage of Observability data to guarantee high reliability and low latency for our Splunk and Grafana services.
Incident Response: Engage in on-call rotations and conduct post-incident reviews to foster systemic improvements and promote 'observability-driven development.'
Automation: Minimize 'toil' by automating the deployment and scaling of observability agents and collectors.
Full-time|$170K/yr - $240K/yr|On-site|San Francisco, CA
About the Role
Sigma Computing is growing its engineering team in San Francisco, CA. The company builds technology to help users access data with ease. As a Senior Software Engineer focused on Observability and Reliability, you will work alongside engineers who value high standards and collaboration.
What You Will Do
Design and build observability platforms and tools, including metrics collection, logging, distributed tracing, dashboards, alerting, and application performance monitoring.
Work with technologies such as Go, OpenTelemetry, and Kubernetes to solve reliability challenges.
Take part in on-call rotations to help maintain strong uptime for Sigma’s services.
Create tools and processes to improve cloud incident triage and reduce downtime.
Define and promote practices that make systems and services measurable and observable.
Join design and code reviews with peers and stakeholders to reinforce quality and effective collaboration.
Full-time|$166K/yr - $201K/yr|On-site|San Francisco, CA - US
At Crusoe, we are on a mission to accelerate the availability of energy and intelligence. We are building the foundational technology that empowers individuals to innovate boldly with AI while maintaining speed, scale, and sustainability.
Join us in the AI revolution with sustainable technology at Crusoe, where you will lead significant innovations, make a real impact, and collaborate with a team that is pioneering responsible and transformative cloud infrastructure.
About the Role:
We are seeking a highly proficient engineer with extensive experience in designing and managing observability platforms at scale. You will be responsible for architecting, developing, and operating Crusoe’s next-generation observability stack, which will allow engineers to gain insights into the internal state of distributed systems through metrics, logs, and traces. Your contributions will guarantee reliability, performance, and actionable insights across Crusoe’s global infrastructure and cloud platform.
Key Responsibilities:
Design and manage scalable observability systems (metrics, logging, tracing) in multi-datacenter Kubernetes environments.
Architect comprehensive telemetry pipelines, covering ingestion, storage, querying, and visualization.
Enhance monitoring and alerting mechanisms with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry.
Develop scalable log collection and processing pipelines utilizing Fluent Bit, Vector, Loki, or ELK/Opensearch stacks.
Implement distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrate with service meshes, load balancers, and APIs.
Establish and promote the adoption of SLOs, SLIs, and error budgets across various services and teams.
Automate the provisioning and scaling of observability infrastructure using Kubernetes, Terraform, and custom tools (Go, Python).
Ensure the reliability and cost-effectiveness of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure).
Integrate security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls.
Collaborate with engineering teams to embed observability into applications, services, and infrastructure.
Mentor engineers and influence Crusoe’s observability strategy and technical roadmap.
Oct 1, 2025