Experience Level
Entry Level
Qualifications
Strong knowledge of cloud computing platforms (e.g., AWS, Azure, GCP). Experience with CI/CD pipelines and DevOps practices. Proficiency in programming languages such as Python, Go, or Java. Understanding of containerization technologies (Docker, Kubernetes). Excellent problem-solving and debugging skills. Ability to work in an agile environment. Strong communication skills and a collaborative mindset.
About the job
Join our innovative team at worldlabs as a Platform Engineer specializing in Developer Infrastructure. In this pivotal role, you will enhance and maintain the tools and platforms that empower our developers to build efficient and scalable applications. You will work collaboratively across teams to ensure that our infrastructure meets the needs of our growing organization.
About worldlabs
worldlabs is a forward-thinking tech company based in San Francisco, dedicated to creating innovative solutions that drive efficiency and foster collaboration. Our team is made up of passionate individuals who thrive in a dynamic environment and are committed to pushing the boundaries of technology.
Similar jobs
Search for Engineering Manager Platform Infrastructure
Decagon seeks an Engineering Manager to lead its Platform Infrastructure team in San Francisco. This position shapes the technical foundation behind Decagon’s scalable applications, focusing on both performance and reliability. The role involves hands-on leadership and a commitment to building infrastructure that supports the company’s growth. Role overview This manager will oversee a group of engineers dedicated to platform infrastructure. The team’s work underpins the systems that allow Decagon’s products to scale smoothly and operate dependably. What you will do Guide and support engineers working on key infrastructure projects Direct the development and maintenance of systems that power Decagon’s applications Encourage solutions that boost platform performance and reliability
Full-time|$185K/yr - $400K/yr|On-site|San Francisco, California, United States
Join Our Team as an Infrastructure & Platform Engineer
We are seeking a talented Infrastructure & Platform Engineer to join our dynamic team at mlabs in San Francisco. As a rapidly growing technology company, we are at the cutting edge of the crypto derivatives market, an industry that generates tens of billions in annual revenue. Our exchange is one of the fastest-growing platforms for crypto derivatives, and we are committed to enhancing our offerings to meet the evolving needs of our users.
Your mission will be to develop the next critical feature: Multi-Asset Margin, which will streamline how users post collateral directly on-chain, thus improving trading efficiency. You will work alongside our Infrastructure & Platform team, focusing on designing and managing our high-performance systems that deliver exceptional speed and reliability.
Key Responsibilities:
Design and implement robust scripts and services that ensure optimal performance in real-time environments.
Manage and deploy computing resources and containers for tailored services and integrations.
Automate scaling, load balancing, and congestion control for both compute and database layers.
Establish and maintain CI/CD pipelines for streamlined deployments and continuous delivery.
Monitor and optimize system performance across multiple metrics to enhance throughput and resilience.
Develop and maintain indexing and explorer services for fast, real-time data access.
Provision and optimize diverse database systems, including time-series, relational, key-value, and in-memory databases.
Full-time|$160K/yr - $225K/yr|Hybrid|San Francisco, CA (Hybrid)
About Fable Security
In today’s digital landscape, AI-driven threats and human errors represent the most significant risks to enterprise security. Cybercriminals exploit human behavior, contributing to 70% of security breaches. At Fable, we empower individuals to transform from potential targets to active defenders with innovative tools.
Fable is at the forefront of human risk management, offering a platform that effectively influences employee behavior. Our user-friendly, scalable solution analyzes complex employee data, identifies high-risk behaviors, and delivers timely interventions directly to users in their work environment.
Supported by notable investors like Redpoint Ventures and Greylock Partners, and founded by former members of the Abnormal Security team, Fable is tackling one of cybersecurity's greatest challenges in a rapidly expanding market. Our team comprises alumni from esteemed organizations such as Meta, Twitter, and Flexport, as well as top universities including Waterloo, Columbia, and Stanford. This is an exceptional opportunity for you to join us at a time of rapid growth and help shape the future of security.
Why Join Us
Build and scale the foundational data infrastructure that drives a groundbreaking product.
Collaborate closely with engineering, data science, and product teams to operationalize data at scale.
Become part of a small, high-caliber team where your contributions will have a significant impact.
As part of an early-stage company, every engineer plays a crucial role in shaping the evolution of our products and the company's approach to data management.
Your Role
As a Platform and Infrastructure Engineer, you will be instrumental in developing and scaling the core systems that underpin Fable’s product and data operations.
Your responsibilities will span backend systems including real-time services and data pipelines. You will ensure reliability, scalability, and optimal performance across all layers. This highly collaborative role involves working closely with data and ML teams, contributing to systems that effectively manage data ingestion, processing, and delivery.
This role demands cross-functional collaboration with engineering, data, and product teams to create robust, production-grade systems that grow alongside the company.
Responsibilities
Design, develop, and maintain scalable backend and infrastructure systems.
Collaborate with cross-functional teams to deliver high-quality software solutions.
Ensure system reliability, performance, and security through rigorous testing and monitoring.
About the Role
Join the innovative team at Known as an Infrastructure and Platform Engineer, where you will take the lead in managing and enhancing our core infrastructure and platform systems. Your work will be crucial in powering AI-driven matching, voice, and scheduling functionalities. You will be responsible for everything from cloud infrastructure and data orchestration to performance monitoring and model deployment support, designing and scaling systems that ensure Known operates swiftly, reliably, and securely.
In this pivotal role, you will collaborate closely with the founding team, comprising experts in AI/ML, product development, and design, to establish Known’s technical foundation. You will play a key role in shaping our architecture, engineering culture, and best practices right from the start. This position is perfect for a practical builder who thrives in early-stage environments and is passionate about taking projects from concept to production.
Join our innovative team at worldlabs as a Platform Engineer specializing in Developer Infrastructure. In this pivotal role, you will enhance and maintain the tools and platforms that empower our developers to build efficient and scalable applications. You will work collaboratively across teams to ensure that our infrastructure meets the needs of our growing organization.
Full-time|$282K/yr - $363K/yr|On-site|San Francisco, CA
Supported by premier investors from Silicon Valley, Peregrine Technologies empowers public safety organizations, government entities, federal agencies, and private institutions to tackle societal challenges with unmatched speed and precision. Our AI-driven platform transforms isolated and unconnected data into actionable operational intelligence, swiftly surfacing critical information that enables better, faster decision-making, thereby enhancing outcomes at every interaction. Currently, Peregrine serves hundreds of clients across more than 30 states and two countries, impacting over 125 million individuals, and we are poised to extend our influence into enterprise sectors and globally.
Our Team
As a cohesive engineering unit, we firmly believe that empathy enhances our solutions. Observing how users interact with our products is pivotal in guiding us toward the right solutions. Engineers will have the opportunity to collaborate closely with our onsite team to grasp the diverse use cases that Peregrine addresses.
We are on the lookout for an Engineering Manager to join our core engineering teams. You will collaborate cross-functionally with design and product management to develop robust, scalable, and user-centered systems. Our teams face a range of challenges, from enabling real-time collaboration on detailed maps to constructing high-scale backend architectures capable of processing billions of data points.
We value both ownership and collaboration: you will take full responsibility for significant features while working closely with fellow engineers to drive projects to fruition. We hold that humility and empathy are vital for crafting the right solutions: you will engage directly with our deployment team and users as we iterate to tackle their challenges. Creativity and perseverance are essential in realizing our vision.
Role
This position is central to the strategic execution of Peregrine's platform. You will define how our core systems scale, perform, and evolve as Peregrine continues its rapid growth and strengthens its impact across public safety, government, and enterprise sectors.
As a senior platform leader, your role transcends mere system management; you will establish the technical direction, build your team, and create the operational framework that empowers every product team at Peregrine to progress with speed, safety, and assurance. Your contributions will directly influence system reliability and performance.
Full-time|$162K/yr - $216K/yr|Hybrid|San Francisco, California, United States
Who We Are
Baton is Ryder’s innovative product development division dedicated to leveraging cutting-edge technologies to transform the transportation and logistics landscape. Managing over $10 billion in freight, our technology has a significant impact across the U.S. economy.
We are committed to creating and delivering software that not only meets but exceeds the needs of Ryder and its 50,000+ clients, which includes some of the most recognized brands globally. Our projects range from user-centric applications to the robust data platform that will drive the future of Ryder’s innovations.
Baton’s mission: To enable a supply chain that operates on autopilot.
Since Ryder’s acquisition of Baton in 2022, we have been operating with the agility of a startup while benefiting from the extensive reach of a Fortune 500 company. If you're passionate about tackling intricate challenges and making a real impact in the backbone of the American economy, you’ll thrive with us.
Role: Software Engineer - Infrastructure
Department: Data Platform
Location: Hayes Valley, San Francisco, CA
Full-time|$196K/yr - $220.5K/yr|On-site|San Francisco Bay Area
At Discord, we connect over 200 million users monthly for diverse experiences, with gaming being the predominant activity. Our platform supports more than 90% of our users in enjoying games, collectively logging 1.5 billion hours each month across various titles. As we shape the future of gaming, our mission is to enhance interactions before, during, and after gaming sessions.
The Platform Infrastructure teams are pivotal in constructing and upholding the essential systems that energize Discord's core functionalities. We manage systems that process hundreds of thousands of requests per second and handle tens of billions of transactions daily, enabling seamless connections for millions of users. By developing foundational platform components, we empower internal developers to deploy new features swiftly and securely, ensuring Discord remains reliable, efficient, and scalable.
As a Senior Software Engineer on our team, you will play a crucial role in continuously refining our codebase, processes, and infrastructure, directly impacting user interactions on Discord!
Plasmidsaurus helps scientists worldwide by streamlining sequencing. Researchers from leading institutions and companies rely on this platform daily. With a global network of labs, the company delivers fast, affordable sequencing results, and has recently expanded into RNA-seq to broaden its genomics reach. The team is focused on building a universal sequencing platform designed for efficiency and global scale. Role overview The Lead Engineer for AI Infrastructure in Platform Engineering sets both technical direction and management strategy for the company’s compute, data, AI, and security infrastructure. This position oversees the entire sequencing operation, from laboratory devices to data delivery. What you will do Oversee core services that coordinate laboratory devices, including robots, sequencers, and on-premises Linux servers, as well as the data ingestion pipeline. Develop cloud infrastructure and data pipelines for storing, processing, and delivering terabytes of sequencing data. Design systems to manage millions of bioinformatics tasks, handling queue management, workflow orchestration, and scheduling. Build AI infrastructure and internal tools to support autonomous systems, including: Quality Scientist Agents: Monitor operations, detect anomalies, and escalate quality or reliability concerns. Logistics Agents: Coordinate global transportation of samples to labs and carriers. Bioinformatics Coding Agents: Run adaptive analyses on varied sample types with different data distributions. Culture The team values initiative and a strong sense of ownership. High agency and responsibility shape how work gets done at Plasmidsaurus.
About Our Team
The Scaling team at OpenAI is dedicated to designing, constructing, and managing essential infrastructure that powers groundbreaking research.
Our mission is straightforward: to expedite the advancement of research towards Artificial General Intelligence (AGI). We achieve this by developing foundational systems that researchers depend on, spanning from core infrastructure elements to specialized applications tailored for research. Our systems are designed to scale efficiently with the growing complexity and size of our workloads while ensuring reliability and user-friendliness.
About the Position
We are seeking a Senior Software Engineer to take charge of critical production infrastructure from start to finish.
This role primarily focuses on backend and systems engineering, with a strong emphasis on low-level performance, distributed systems, and the hands-on management of vital services at scale. You will be responsible for transforming ambiguous challenges into actionable plans, delivering pragmatic solutions promptly, and refining them based on real-world feedback and iterations.
This position goes beyond a standard Python backend role; we are specifically on the lookout for candidates with robust systems experience in Rust or C++, particularly in performance-sensitive infrastructure.
This is an in-office role based in San Francisco, CA, following a hybrid model of three days in the office per week. We also provide relocation assistance for new hires.
Your Responsibilities
Manage critical infrastructure throughout its lifecycle, including design, implementation, deployment, operation, and ongoing improvements.
Develop and maintain high-performance backend systems in Rust or C++ that facilitate core research operations.
Design and optimize distributed data and serving systems, considering partitioning, replication, consistency, retries, backpressure, and failure isolation.
Identify and resolve production bottlenecks related to latency, throughput, contention, hot spots, and overload scenarios.
Oversee mission-critical services, including on-call duties, incident management, postmortems, observability, deployment safety, and zero-downtime migrations.
Enhance the reliability of services running on Kubernetes, focusing on resource tuning and failure management.
Collaborate closely with engineers and researchers to deliver fast, dependable, and effective systems.
Elevate standards through strong technical judgment, ownership, and commitment to quality.
You Will Excel in This Role If You Have:
A proven track record of owning and delivering operationally critical systems end to end in ambiguous settings.
Experience with systems programming in Rust or C++.
Strong analytical skills and a problem-solving mindset.
Excellent communication and collaboration skills.
Full-time|$200K/yr - $240K/yr|Hybrid|United States
SentiLink is at the forefront of delivering groundbreaking identity verification and risk management solutions, enabling both institutions and individuals to engage in transactions with utmost assurance. We are revolutionizing the identity verification landscape in the United States by replacing outdated and costly methods with solutions that are ten times faster, more intelligent, and remarkably accurate.
Our rapid growth is a testament to our success, with our real-time APIs having verified hundreds of millions of identities, starting with the financial services sector and swiftly branching out into diverse markets. SentiLink is proudly supported by esteemed investors such as Craft Ventures, Andreessen Horowitz, NYCA, and Max Levchin.
Our accomplishments have been recognized by notable media outlets including TechCrunch, CNBC, Bloomberg, Forbes, Business Insider, PYMNTS, and American Banker. We are also honored to have been featured on the Forbes Fintech 50 list every year since 2023. In a historical milestone, we became the first company to launch the eCBSV and have provided testimony before the United States House of Representatives regarding the future of identity verification.
SentiLink embraces various working styles, offering options that range from fully remote to in-office arrangements. As a digital-first organization, we foster robust collaboration across teams in the U.S. and India, with physical offices located in Austin, San Francisco, New York City, Seattle, Los Angeles, and Chicago in the U.S., as well as Gurugram (Delhi) and Bengaluru in India. If you are near one of our offices, we encourage you to join us in-person regularly. Certain roles are specifically designed to be hybrid or in-office; for instance, our engineering team in India predominantly operates from our Gurugram office.
Role:
We are seeking a skilled Engineering Manager to spearhead our Data Platform team, which is dedicated to building and scaling the data infrastructure that underpins SentiLink’s products and decision-making systems.
This integral team is responsible for the pipelines, storage systems, and data services that empower all of our products, facilitate data analysis, and support business intelligence efforts. These systems are fundamental in detecting fraud, driving machine learning models, and delivering valuable insights to our customers.
The Scaling team at OpenAI builds and maintains the core infrastructure that supports research efforts. This group focuses on enabling rapid progress toward Artificial General Intelligence by providing the systems and tools researchers rely on every day. Their work covers everything from foundational infrastructure to specialized applications, all designed to handle increasing complexity and scale without sacrificing reliability or ease of use. Role overview OpenAI is seeking a Site Reliability Engineer to manage and improve the infrastructure behind its analytics platform. This position centers on supporting production systems that handle data-intensive, low-latency workloads. Key technologies include large-scale ClickHouse clusters, high-throughput Kafka pipelines, and stable integrations with Snowflake. The engineer in this role will turn ambiguous operational challenges into concrete solutions, deliver improvements quickly, and iterate based on real-world feedback. Success in this role means independently setting and raising operational standards, working closely with production systems, and collaborating across teams to ensure reliability at scale. Key responsibilities Manage the full lifecycle of infrastructure: provisioning, upgrades, scaling, and decommissioning using Infrastructure as Code (IaC). Operate and scale ClickHouse clusters, including sharding, replication, capacity planning, tuning, and maintenance. Run Kafka as the primary data ingestion layer, improving throughput, managing lag and backpressure, and ensuring robust failure recovery. Improve latency and reliability for workloads involving heavy data serving and querying. Develop and maintain monitoring and alerting systems, including SLIs/SLOs, dashboards, alert policies, and actionable runbooks. Create and refine incident response protocols, on-call procedures, and postmortem practices. Oversee backup, restore, and disaster recovery strategies, including regular drills. 
Plan and execute safe rollouts across development, staging, and production environments, using canary deployments and rollback plans. Work daily with software engineers to embed reliability into system design, implementation, and release cycles. Set and promote standards for operational readiness and runbooks, encouraging adoption across teams. Enhance CI/CD pipelines and improve the developer experience for greater speed and safety.
Role overview The Platform Engineer at Coframe will help shape the core infrastructure that powers the company’s engineering efforts. Based in the SF Bay Area, this role involves designing and refining the systems that underpin how teams build, deploy, and manage software. Day-to-day work includes using AI tools to improve productivity and streamline deployment processes. The engineer will also focus on strengthening monitoring, enhancing security, and managing costs across the platform. Impact This position carries significant responsibility. The solutions developed will directly support all teams at Coframe, influencing how software is created and maintained across the company. The work done in this role will help set the direction for future engineering practices.
Full-time|$180K/yr - $210K/yr|On-site|San Francisco, CA
About Sigma Computing Sigma Computing builds AI-powered apps and analytics tools that connect directly to cloud data warehouses. Teams use Sigma to create applications, automate workflows, and analyze live data through a spreadsheet interface, SQL and Python editors, visual builders, and integrated AI features. The platform supports everything from interactive analyses to reports and embedded data experiences. Role Overview: Senior Product Manager - Platform Performance & Infrastructure Sigma is growing to serve larger enterprises with demanding, complex workloads. The Senior Product Manager for Platform Performance & Infrastructure will guide the development of core backend systems that keep Sigma responsive and reliable as usage scales. This role focuses on driving improvements in: Workbook performance Query lifecycle management Compute and caching strategies Metadata services Compiler components New warehouse connectors These systems are essential for Sigma’s ability to deliver consistent, high-quality performance to enterprise customers. What You Will Do Define and prioritize product enhancements for backend platform performance and scalability Work closely with platform engineering and cross-functional teams to address technical challenges Translate performance and scalability needs into clear product requirements and measurable objectives Ensure Sigma’s infrastructure can support enterprise clients with reliability and speed Who We’re Looking For Experienced Senior Product Manager with strong technical background Comfortable working hands-on with backend systems and infrastructure Skilled at collaborating with engineering and cross-functional partners Focused on delivering measurable improvements for customers Location & On-Site Requirement This position is based in San Francisco, CA. It requires working on-site at the Sigma office at least four days per week.
Lambda, recognized as The Superintelligence Cloud, is a pioneering force in AI cloud infrastructure, empowering tens of thousands of customers, from AI researchers to large enterprises and hyperscalers. Our mission is to make computational power as accessible as electricity, providing everyone the capability of superintelligence: one person, one GPU.
Join us in our quest to build the world’s leading AI cloud platform.
Note: This role mandates in-office presence in our San Francisco location four days a week; Lambda’s designated remote work day is Tuesday.
As an Engineering Manager at Lambda, you will lead the charge in developing and scaling our cloud offerings, which encompass the Lambda website, cloud APIs, and internal tools for deployment, management, and maintenance.
Full-time|$217K/yr - $312.2K/yr|On-site|San Francisco, California
At Databricks, we are dedicated to empowering data teams to tackle the world's most challenging issues, from realizing the next generation of transportation to expediting medical advancements. Our mission involves constructing and managing the premier data and AI infrastructure platform, enabling our clients to leverage profound data insights to enhance their operations. The Workspace Platform team is embarking on an ambitious journey to scale our customer base by 100x and support the evolution of agentic AI workloads. Our objective is to create a unified, consistent, and foundational Shared Platform that enhances the overall Databricks Workspace experience. As a Senior Engineering Manager within the Workspace Platform team, you will spearhead the development of a cohesive infrastructure that underpins vital customer-facing functionalities across various Databricks products. These include Content Discovery (similar to Google Search), Content Organization (akin to Google Drive), collaborative code editing, and repository management (comparable to GitHub). This is a high-impact opportunity to lead a team of approximately 20 software engineers in developing platform features, intuitive workspace experiences, and essential partner integrations that are pivotal to Databricks' growth and user adoption.
Slash Financial develops business banking infrastructure tailored for real operational demands. Since its founding in 2021, the company has processed over $10 billion in annual business transactions across several industries. With $100M in Series C funding from investors including Ribbit Capital, Khosla Ventures, Goodwater Capital, NEA, and Y Combinator, Slash continues to grow its product offerings and market presence. The San Francisco headquarters fosters a collaborative, in-person work environment. Role overview The Senior Infrastructure/Platform Engineer will take a hands-on role in scaling and strengthening the core platform behind Slash’s banking products. This position involves designing, building, and maintaining infrastructure to support rapid company growth and high transaction volumes. The work spans AWS cloud operations, Terraform, Kubernetes, and related systems. This engineer will help shape infrastructure strategy and make key decisions on performance, observability, security, and deployment. What you will do Lead the development of next-generation database, real-time workflow, and container orchestration infrastructure. Scale and enhance the Kubernetes (EKS) platform, CockroachDB clusters, Kafka (MSK), Temporal workflows, and ElastiCache Redis. Collaborate with engineering teams to establish and scale best practices using AWS ALB, WAF, Route 53, OpenSearch, S3, and Vercel. Create and maintain abstractions in Terraform and Pulumi to streamline architecture and assist product teams. Improve the speed and reliability of CI/CD pipelines. Tackle complex scaling, performance, and low-latency challenges within a monolithic architecture. Location This role is based in the San Francisco office and supports an in-person work culture.
About Anything
Anything is a pioneering AI product engineering company, empowering the next generation of entrepreneurs. Our innovative AI agent transforms English into fully functional applications, encapsulating everything needed to monetize online ventures, including mobile solutions, web interfaces, design, AI capabilities, backend services, infrastructure, and payment systems. Since our launch on August 7th, we have achieved $5 million in revenue and are rapidly expanding. Discover more at anything.com.
Role Overview
What You Will Do
We are looking for individuals eager to accelerate their growth and make a significant impact. In this role, you will develop systems that support millions of applications and billions of users, addressing the challenges that arise in a high-demand environment. You will design and maintain the runtime, control plane, and isolation boundaries essential for safely executing user-generated applications at scale.
Your innovative solutions will utilize platform telemetry, execution data, and feedback loops to enhance code generation and application performance, all powered by our AI-centric platform.
You will take ownership of key components of the platform from architecture and implementation to operational production and iteration.
Operational Responsibilities
Design and manage multi-tenant cloud infrastructure, focusing on isolation, deployment, observability, and cost control for customer applications.
Ensure top-tier reliability and performance for our platform.
Conduct research to inform decisions regarding technology choices and service providers.
Collaborate closely with product teams to develop platform features that drive product innovation.
Stay informed about the latest advancements in infrastructure research and development.
Successful platform management requires composure under pressure. We value self-assurance coupled with curiosity and a commitment to evidence-based decision-making.
Key Performance Metrics
Your effectiveness will be evaluated based on:
1. Runtime Infrastructure
Develop and oversee scalable, low-latency infrastructure for user applications.
2. Platform Reliability
You will ensure the platform's uptime and reliability, preventing failures from affecting multiple customers. Our users expect high availability and rapid issue resolution.
3. Platform Support for Product Features
You will create the platform features essential to support our product roadmap, ensuring seamless integration and performance.
Full-time|$225K/yr - $275K/yr|On-site|South San Francisco, California, USA
Senior Manager of Data Platforms & Autonomy Infrastructure San Francisco Bay Area — In Person About Zipline Zipline is pioneering an innovative instant logistics system, utilizing autonomous aircraft to deliver essential and everyday items to individuals precisely when and where they need them. Currently, Zipline boasts the world's largest autonomous delivery network, providing support to healthcare systems, governmental agencies, and commercial partners across various continents. As Zipline expands its operations from tens of thousands to millions of flights daily, the significance of data as core infrastructure cannot be overstated. The mechanisms that determine the data we gather, how we process it, and how teams utilize it directly influence safety, autonomy performance, system uptime, and operational costs. About the Role We are seeking a Senior Manager of Data Platforms & Autonomy Infrastructure to spearhead teams and systems that transform real-world flight data into actionable insights and learning experiences. This pivotal role is responsible for the comprehensive data platform for autonomy and operations, overseeing everything from onboard data logging and ingestion to post-processing, sampling, and the creation of curated datasets utilized by autonomy, hardware, operations, and business teams. The ideal candidate will establish technical direction, build and guide the organization, and ensure the reliability of these systems to accommodate 1 million flights daily with exceptional uptime. This position is particularly suited for leaders who have experience in developing large-scale robotics or autonomy data systems in production environments. This is an in-person position based in the San Francisco Bay Area. Key Responsibilities Set Technical Direction Develop a long-term strategy and roadmap for Zipline’s data, autonomy, and machine learning infrastructure. 
Establish architectural standards across logging, data ingestion, processing, storage, access/visualization, and machine learning training and evaluation. Balance reliability, performance, cost, and developer productivity across the platform. Support a diverse range of internal stakeholders, including hardware teams, autonomy/software teams, and analytics/business teams. Facilitate Debugging, Learning, and Scalability Enable swift root-cause analysis across autonomy, hardware, and operations.
Full-time|$179.4K/yr - $224.3K/yr|On-site|San Francisco, CA; New York, NY
In a world where software is rapidly evolving, artificial intelligence (AI) is at the forefront, transforming how we interact with technology. At Scale AI, we recognize the immense potential of AI to enhance human capabilities, offering personalized support across various aspects of life, from coaching and tutoring to shopping and travel guidance. As enterprises, startups, and governments rush to integrate large language models (LLMs) into their operations, it is crucial to ensure these systems are safe, aligned, and effective. This involves rigorous human evaluation and reinforcement learning through human feedback (RLHF) during all stages of model development.
Our innovative products, including the Generative AI Data Engine, SGP, and Donovan, are designed to empower the most advanced LLMs and generative models globally. By leveraging world-class RLHF, human data generation, model evaluation, safety, and alignment, we are shaping the future of human-AI interaction.
As a member of our Platform Engineering team, you will play a pivotal role in designing and developing the foundational platforms that support Scale's operations. Your responsibilities will include architecting our core cloud infrastructure, enhancing our data lifecycle, and transforming the software development process for engineers at Scale. You will gain invaluable insights into the AI landscape as it develops within diverse sectors.
Mar 26, 2026