Senior Site Reliability Engineer jobs in Toronto – Browse 1,270 openings on RoboApply Jobs

Senior Site Reliability Engineer jobs in Toronto

Open roles matching “Senior Site Reliability Engineer” with location signals for Toronto. 1,270 active listings on RoboApply Jobs.


1 - 20 of 1,270 Jobs
Relayfi
Full-time|CA$243K/yr - CA$297K/yr|On-site|Toronto, ON

At Relay, we empower self-made business owners with a digital banking platform that transforms financial management into a source of clarity, confidence, and control. Our mission is to replace financial uncertainty with genuine visibility, enabling entrepreneurs to convert their hard work into enduring success. By alleviating the stress of cash flow management, we provide the tools necessary for owners to operate robust and resilient businesses.
As Relay continues its growth trajectory, the reliability, performance, and resilience of our platform have become integral to both our customer experience and overall business success.
This senior leadership position is crucial in steering a team of Site Reliability Engineers while shaping how reliability strategies influence engineering and product decisions throughout the organization. You will determine the future direction of the SRE function, promote operational excellence, and assist the company in anticipating and managing scale challenges before they pose risks.
If you thrive on tackling complex systems, leading organizations, and building resilient platforms that customers depend on daily, we are eager to connect with you!
Key Responsibilities
- Lead and enhance Relay’s Site Reliability Engineering function, establishing strategic direction as the company scales.
- Define and implement a long-term reliability roadmap, making informed trade-offs under real business and capacity constraints.
- Act as the senior reliability voice in discussions involving engineering and product leadership.
- Influence the integration of reliability considerations into product planning, architectural decisions, and delivery processes.
- Serve as a senior escalation point during critical production incidents, ensuring effective communication and thorough follow-up actions.
- Enhance Relay’s observability, performance, and operational maturity practices across teams.
- Establish and uphold standards concerning SLOs, operational readiness, incident management, and continuous improvement.
- Collaborate with stakeholders in Engineering, Product, Data, and Finance to balance velocity, risk, performance, and cost.
- Build and nurture a high-performing SRE organization capable of supporting future growth.

Feb 26, 2026
MongoDB, Inc.
Full-time|CA$144K/yr - CA$200K/yr|Hybrid|Montreal; Toronto

The Storage Layer Services (SLS) team at MongoDB is embarking on an innovative journey to re-architect our cloud storage layer, forming the core of our next-generation cloud storage architecture. This newly established team is dedicated to creating high-performance, multi-tenant distributed storage services that not only enhance our current Atlas storage stack but also enable more efficient customer workloads. As a Senior Site Reliability Engineer, you will collaborate closely with teams responsible for these storage services to establish Service Level Objectives (SLOs), develop capacity plans, and guarantee the reliability, durability, and operational safety of the foundational storage layer supporting Atlas. By joining our small team of seasoned SREs, you will play an integral role in executing a multi-year roadmap for MongoDB’s cloud storage architecture. This position is open to candidates based in our Toronto or Montreal offices or those working remotely from anywhere in Canada, provided they are located in the Eastern or Central time zones.

Apr 8, 2026
MongoDB, Inc.
Full-time|CA$144K/yr - CA$200K/yr|Hybrid|Toronto; Vancouver

The Team
At MongoDB, our Platform Engineering division within Site Reliability Engineering (SRE) is tasked with managing essential infrastructure and operational functions that empower our engineering teams. This includes our robust, multi-cloud Kubernetes infrastructure, deployment systems, and advanced observability and alerting mechanisms.
The Fabric team is at the forefront of enabling secure communication across systems and from the public internet. Our responsibilities involve designing network architecture, implementing service mesh solutions, and optimizing edge load balancing to ensure the safety of customer data in transit. This team is vital in developing and maintaining a dependable and globally connected multi-cloud network that underpins MongoDB products.
This position can be based in our Toronto or Vancouver offices, or you can work completely remotely from anywhere in North America. We provide flexible hybrid work arrangements for those in our offices.

Apr 8, 2026
Pinterest, Inc.
Full-time|On-site|Toronto, ON, Canada

Pinterest is hiring a Senior Site Reliability Engineer in Toronto, ON, Canada. The focus of this role is to ensure that Pinterest’s services remain reliable, scalable, and performant as the platform grows. Working closely with software engineers, this position involves designing and implementing solutions that strengthen system reliability and efficiency.
Key responsibilities
- Partner with engineering teams to maintain and enhance the reliability of Pinterest’s services.
- Design and implement improvements to support scalability and performance.
- Troubleshoot and resolve service issues to reduce downtime.
Requirements
- Extensive experience in site reliability engineering or a closely related field.
- Strong technical background with proven problem-solving abilities.
- Comfort working alongside software engineers to improve systems.
This position is located in Toronto, ON, Canada.

Apr 24, 2026
Okta, Inc.
Full-time|$136K/yr - $187K/yr|On-site|Toronto, Ontario, Canada

Empower Every Identity, from AI to Human
Identity is the cornerstone of unlocking AI's potential. At Okta, we secure AI by creating a trustworthy, neutral infrastructure that allows organizations to confidently navigate this transformative era. This mission demands an unwavering commitment to addressing intricate challenges with significant real-world implications. We seek innovative builders who act with speed and urgency and execute with exceptional proficiency.
This is your chance to engage in work that can define your career. We are fully dedicated to this mission. If you share this passion, we want to hear from you.
Join Us in Securing Every Identity, from AI to Human
Okta is at the forefront of providing a superior authentication experience for hundreds of millions globally. Our focus on reliability forms the bedrock of our product, and surpassing customer expectations for availability is a fundamental engineering priority. As a Senior Site Reliability Engineer, you will be part of our SRE team, ensuring our production systems are not only fully operational but also resilient, scalable, and poised for remarkable growth. This role goes beyond mere maintenance; it is about playing a significant role in enhancing the core robustness and resilience of our platform. You will be a proactive builder, developing solutions that inherently boost our system's reliability.
Your Responsibilities:
- Craft and develop custom software in Go to bolster the platform’s reliability and resilience.
- Collaborate with engineering teams to integrate reliability principles, enhancing the availability, performance, and observability of our services.
- Utilize your profound understanding of infrastructure and observability to pinpoint improvement opportunities within the product and implement effective solutions.
- Participate in our on-call rotation, providing swift, effective responses to critical incidents and utilizing your expertise to troubleshoot, mitigate, or accurately escalate production issues.
- Enhance our SRE tooling and processes, focusing on automation and operational efficiency.
- Establish, document, and promote reliability best practices throughout the organization.

Apr 8, 2026
Veeva Systems Inc.
Full-time|Hybrid|Canada - Toronto

Veeva Systems is a mission-driven leader in industry cloud technology, dedicated to accelerating the delivery of therapies to patients in the life sciences sector. As one of the fastest-growing SaaS companies ever, we surpassed $2 billion in revenue last fiscal year with significant growth prospects ahead.
Central to Veeva's mission are our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. Notably, we made history in 2021 by becoming a public benefit corporation (PBC), which legally commits us to balance the interests of our customers, employees, society, and investors.
As a Work Anywhere company, we empower you to choose your work environment, whether it's from home or in our office, enabling you to excel in your preferred setting.
Be part of our journey in transforming the life sciences industry and making a positive impact on our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be instrumental in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge of Java and modern open-source technologies to make a meaningful impact on our production systems.
The ideal candidate will possess substantial experience with Java applications and cutting-edge open-source technologies, particularly within the context of enterprise software development or a high-growth tech environment. As a Senior SRE, you should have a natural curiosity and a strong aptitude for problem-solving. Your unique engineering perspective will be critical to understanding how systems integrate in production and function efficiently on a global scale, supporting hundreds of customers across North America, Europe, and Asia.

Oct 7, 2025
Veeva Systems Inc.
Full-time|Hybrid|Canada - Toronto

At Veeva Systems, we are driven by a mission to revolutionize the life sciences industry, empowering companies to bring therapies to patients at an accelerated pace. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year and possess immense growth potential.
Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) define who we are. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.
As a Work Anywhere organization, we offer the flexibility for you to work remotely or from our office, allowing you to thrive in your preferred environment.
Join us in transforming the life sciences sector and making a positive impact on our customers, employees, and communities.

Oct 7, 2025
Rootly
Full-time|Hybrid|Toronto, Ontario, Canada (Hybrid)

About Rootly
At Rootly, we are dedicated to revolutionizing how organizations manage incidents. Our mission is to provide a reliable incident management platform that empowers companies to respond swiftly and effectively when challenges arise. Our innovative approach has established us as leaders in a new multi-billion dollar segment, and we are seeking exceptional talent to help us achieve our ambitious goals.
Our customers, including industry giants like NVIDIA, Figma, Canva, and Tripadvisor, trust Rootly for their critical incident management needs. They appreciate our user-friendly platform and unique partnership approach, which has garnered us a stellar 5-star rating on G2. Join us in creating a reliable future for organizations worldwide.
Backed by prestigious investors from Y Combinator to key operators in tech, we prioritize transparency and team involvement in our financial health. We conduct monthly business reviews and share updates through our weekly changelog.
About the Role
As a Senior Site Reliability Engineer at Rootly, you will play a crucial role in shaping our technical infrastructure. You will thrive in a dynamic environment where each day presents new challenges and opportunities for growth. This position is perfect for individuals who seek ownership, enjoy tackling complex technical problems, and are driven by a mission to enhance reliability. While the work will be demanding, it promises to be one of the most rewarding experiences in your career.
- Collaborate with product teams to enhance the observability, reliability, and performance of services.
- Take ownership of our CI/CD pipelines, observability tools, monitoring systems, and incident response processes.
- Develop tools and automation to reduce manual toil, enhance engineering velocity, and improve developer experience and system reliability.
- Engage deeply with engineering teams to gain insights into system performance and identify cross-functional reliability and scaling concerns.
- Design and scale our infrastructure while ensuring top-notch performance and operational excellence.

Mar 5, 2026
acquird
Full-time|CA$130K/yr - CA$180K/yr|Hybrid|Toronto - Hybrid

A Few Important Notes:
- Join a profitable B2B SaaS company with teams primarily located in North America.
- This position is predominantly remote, with a requirement to meet in Toronto once a month.
- Candidates must possess the legal right to work in Canada; we are unable to provide visa sponsorship.
As our platform continues to expand, we are actively seeking a Senior Site Reliability Engineer (SRE) / Cloud Engineer. Experience with Azure is highly prioritized, as it is our primary cloud platform.
About Our Company:
We are recognized as one of the leading retail analytics platforms, empowering marketing teams and brands to decode retail data and execute targeted media campaigns without the need for coding. Our services enhance client understanding of customer behavior and maximize ROI on marketing campaigns, with notable clients including Home Depot. We use a modern cloud stack, with a focus on Azure, CI/CD, containerization, and distributed computing technologies.
About You:
We are in search of a dynamic and skilled Senior SRE/Cloud Engineer who is eager to take on a pivotal role in managing our Cloud Operations, ensuring uptime, reliability, and automation.
Key Responsibilities:
- Collaborate with software engineering teams to design, implement, and maintain CI/CD pipelines for rapid and reliable software releases.
- Automate and optimize infrastructure provisioning, configuration, and management processes utilizing industry-standard tools and methodologies.
- Implement and manage containerization and orchestration technologies to enhance scalability and resource efficiency.
- Own the end-to-end availability and performance of our cloud infrastructure; proactively identify potential issues and implement automation to mitigate recurrence.
- Participate in an on-call rotation to ensure system stability and responsiveness during off-hours.
- Lead the development and implementation of service-level objectives crucial for maintaining product reliability.

Oct 23, 2024
Tenstorrent
Full-time|On-site|Toronto, Ontario, Canada

Join Tenstorrent as a Site Reliability Engineer, where you will play a crucial role in ensuring the reliability and performance of our cutting-edge systems. As a member of our dedicated engineering team, you will work on innovative solutions to enhance our infrastructure and streamline operations. Your expertise will help us deliver exceptional service and uptime to our customers.

Apr 10, 2026
Relay
Full-time|$211.5K/yr - $258.5K/yr|On-site|Toronto, ON

At Relay, we are revolutionizing the way self-made business owners manage their finances through our cutting-edge digital banking platform. Our mission is to empower entrepreneurs with the tools and knowledge they need to achieve financial clarity, confidence, and control over their earnings. By transforming cash flow management from a source of stress into clear, actionable insight, we help our customers build stronger and more resilient businesses.
As we continue to grow, the reliability, performance, and resilience of our platform have become critical components of our customer experience and overall business success.
We are currently seeking an Engineering Manager to lead our Site Reliability Engineering (SRE) team. In this pivotal role, you will oversee the scalability, reliability, and robustness of Relay's systems. This position goes beyond infrastructure management and incident response; it is a leadership opportunity that sits at the nexus of technology, team dynamics, and business strategy. You will mentor and manage a talented SRE team, influence how reliability is integrated across the organization, and ensure our systems can safely scale in response to increasing customer demand and complexity.
If you thrive in technically demanding environments and are passionate about fostering strong teams, a healthy workplace culture, and effective cross-functional collaboration, this position is designed for you.

Feb 2, 2026
Newton
Full-time|Remote|Toronto, Ontario

Join our innovative team at Newton as a Site Reliability Engineer, where you'll play a crucial role in ensuring the reliability and performance of our systems. In this fully remote position, you will collaborate with engineering and operations teams to develop solutions that enhance system uptime and efficiency.
Your expertise will help us transition and maintain our infrastructure, ensuring our services are resilient and scalable. This is an exciting opportunity to contribute to a company that values innovation and teamwork.

Mar 26, 2026
Movable Ink
Full-time|On-site|Movable Ink - Toronto

At Movable Ink, we empower marketers with cutting-edge content personalization through data-driven content creation and AI-driven decision-making. Our innovative platform is trusted by top global brands to enhance revenue, streamline workflows, and increase marketing agility. With our headquarters in New York City and a talented team of nearly 600 employees, Movable Ink has a presence across North America, Central America, Europe, Australia, and Japan.
As a Lead Site Reliability Engineer, you will leverage your technical expertise and leadership skills to oversee infrastructure and software development initiatives. You will play a pivotal role in designing and evolving key systems within our multi-cloud, multi-region content serving platform, which handles over 25 billion requests daily. By fostering architectural vision, cross-team collaboration, and mentorship, you will spearhead reliability initiatives and define the technical strategies necessary for scaling our platform to accommodate 50 billion requests per day and beyond.

Feb 24, 2026
Momentum Financial Services Group
Full-time|Hybrid|Toronto, Canada

Momentum Financial Services Group (MFSG) is the company behind Money Mart, Canada’s largest non-bank branch network. With over four decades of experience, MFSG delivers financial solutions for underserved communities, including short-term loans, money transfers, and prepaid cards. Each year, millions of customers rely on these services for timely financial support.
Role Overview: Site Reliability Engineer
The Site Reliability Engineer plays a key role in keeping MFSG’s digital banking and financial services platforms available, responsive, and resilient. This position centers on automating operational tasks, setting and maintaining service-level objectives, and engineering systems to withstand and recover from failures. Daily work involves close collaboration with engineering, DevOps, QA, cybersecurity, and compliance teams to ensure platform reliability meets both technical and regulatory requirements. The role also emphasizes proactive monitoring, incident response, and ongoing improvements to the software delivery process to reduce production risk.
Why Join Momentum Financial Services Group?
- Competitive compensation that reflects experience and current market rates
- Annual bonus based on individual and company achievements
- Comprehensive benefits including health and dental coverage with premiums fully paid, plus Employee Assistance Program access
- Retirement planning support to help prepare for the future
- Hybrid work model offering flexibility between remote work and in-office collaboration at the Toronto headquarters
- Employee perks such as tuition reimbursement, professional development, Perkopolis discounts, and recognition programs
Location
Toronto, Canada (hybrid work model)

Apr 14, 2026
Okta, Inc.
Full-time|$194K/yr - $266K/yr|On-site|Toronto, Ontario, Canada

Welcome to Okta
At Okta, we are redefining identity management. We empower individuals to securely access any technology, from any device or application, fostering a transformative approach to business security and growth. Our innovative solutions, including the Okta Platform and Auth0 Platform, put identity at the heart of operational success.
We value diverse perspectives and experiences, seeking lifelong learners who contribute to our dynamic culture.
Join us as we shape a future where identity is truly in your hands.
Are you driven to tackle complex data challenges and make a significant impact? Do you want to collaborate with a passionate team of cloud engineers and architects? If yes, we want to hear from you!
The Auth0 platform manages over 100 million logins daily for clients worldwide and is rapidly expanding. As part of the Data Platform team, you will be instrumental in developing and managing essential data services that enable scalability, reliability, efficiency, and operational excellence. In your role as Senior Manager, you will collaborate with engineers across departments, guide the platform roadmap, and establish the foundational infrastructure for Auth0's future growth.
As a leader, your passion for developing high-performing teams and your ability to coordinate across organizations will make you an ideal fit for this position!
Your Responsibilities Include:
- Leading a diverse, agile software development team focused on delivering value with expertise in distributed systems, cloud infrastructure, and site reliability engineering.
- Fostering a culture of discovery, learning, and experimentation within a geographically distributed team through continuous coaching and mentoring.
- Collaborating closely with architects and engineers to design scalable, robust, and extensible services using modern technologies such as Go, Node.js, Kubernetes, Docker, AWS, and Azure.
- Building and managing data streaming teams utilizing event-driven architecture and Kafka.
- Partnering with product management and engineering leadership to define a platform roadmap that supports the next generation of identity products, overseeing planning, execution, and delivery of data platform services.
- Implementing process improvements to drive operational excellence and efficiency during a period of significant growth.

Mar 19, 2026
Quince
Full-time|$110K/yr - $130K/yr|On-site|Toronto, Ontario, Canada

ABOUT QUINCE
Established in 2018, Quince is revolutionizing the retail landscape by demonstrating that high-quality goods can be affordably priced. Our mission is straightforward: to provide premium essentials at accessible costs, produced ethically and sustainably. We believe everyone deserves exceptional craftsmanship and timeless design without the inflated prices typically associated with luxury. Quince operates on a direct-to-consumer (DTC) model that eliminates intermediaries, utilizing just-in-time manufacturing to reduce waste and enhance value.
Quince is a tech-driven company that is transforming the retail sector by integrating AI, analytics, and automation into our core operations. Our steadfast dedication to excellence and adherence to our company values shape our decisions and actions:
- Customer First: We prioritize customer satisfaction in every decision.
- High Quality: True quality means premium materials and rigorous production standards you can feel good about.
- Essential Design: We focus on timeless, functional essentials instead of chasing trends.
- Always a Better Deal: Innovation and transparency ensure value for both customers and partners.
- Social & Environmental Responsibility: We commit to sustainable materials, ethical production, and fair wages.
Quince collaborates with top-tier manufacturers worldwide, serving millions of satisfied customers. Backed by strong investors and a commitment to sustainable growth, we are rapidly expanding while upholding our focus on quality, simplicity, and radical price transparency.
JOIN OUR TEAM AND BE PART OF OUR SUCCESS

Feb 12, 2026
Knix
Contract|On-site|Toronto, ON

Join Knix, a celebrated brand in intimate apparel and activewear, as we redefine the way people experience intimates in everyday life. Since our inception in 2013, we've rapidly grown to become one of North America's leading intimate apparel brands, recognized globally for our innovative approach to apparel. With a community of over 3 million customers, we offer our products through online platforms, Knix retail locations across North America, and partnerships with wholesale and Amazon.
We are committed to revolutionizing the apparel industry with our unparalleled customer experience and innovative product lines such as Knix, Kt by Knix, and Mntd. If you're seeking a meaningful and authentic career, we invite you to join our dynamic team!
The Knix eCommerce Team is in search of a Senior Manager, Site Merchandising who will spearhead and enhance the onsite experience, serving as a key driver for revenue growth, conversion rates, and customer acquisition.
Reporting directly to the Director of Site Merchandising, this role is both strategic and operational. You will be tasked with defining how customers navigate, evaluate, and purchase products on our site, ensuring swift and quality execution in a fast-paced environment.
Your responsibilities will encompass executing the comprehensive onsite merchandising strategy, which includes managing the homepage, navigation, product discovery, launches, and conversion optimization. You will translate business objectives into clear onsite priorities, harness data to identify opportunities, and facilitate cross-departmental collaboration with Growth, Product, Creative, and Analytics teams. Following implementation, you will utilize your insights to influence and refine ongoing and future site merchandising strategies.
This position demands a leader capable of taking ownership of the site experience, striking a balance between customer satisfaction, brand storytelling, and commercial performance.

Apr 7, 2026
Cohere
Full-time|On-site|Toronto

About Us:
At Cohere, we are dedicated to scaling intelligence to enhance human experience. We specialize in training and deploying cutting-edge AI models for developers and businesses, empowering them to create extraordinary applications such as content generation, semantic search, retrieval-augmented generation (RAG), and intelligent agents. We believe our innovative work is pivotal in driving the adoption of AI across various sectors.
Our team is passionate and meticulous about what we create. Every team member plays a crucial role in enhancing our models' capabilities and the value they deliver to our clients. We prioritize hard work and agility to serve our customers effectively.
Cohere comprises a diverse team of researchers, engineers, designers, and industry experts, all committed to excellence in their respective fields. We understand that a variety of perspectives is essential for developing outstanding products.
Join us in our mission to shape the future of AI!
Why This Position Matters:
If you thrive on building high-performance, scalable, and reliable machine learning systems, and you are excited about defining the future of AI platforms that power advanced NLP applications, we want you on our Model Serving team at Cohere. As a Site Reliability Engineer, you will be instrumental in developing, deploying, and managing our AI platform, which delivers Cohere's large language models via user-friendly API endpoints. You will collaborate with multiple teams to deploy optimized NLP models in environments characterized by low latency, high throughput, and high availability. This role also offers the chance to engage with customers and create tailored deployments that address their unique requirements.
Your Responsibilities:
- Design and build self-service systems that streamline the management, deployment, and operation of services.
- Develop custom Kubernetes operators that facilitate language model deployments.
- Automate observability and resilience within the environment, empowering developers to troubleshoot and resolve issues efficiently.
- Ensure adherence to defined Service Level Objectives (SLOs), which includes participating in an on-call rotation.
- Foster strong relationships with internal developers and help guide the Infrastructure team’s roadmap based on their feedback.
- Contribute to the development of our team through knowledge sharing and an active review process.

Jan 12, 2026
Site Geologist

Ontario Transit Group

Full-time|On-site|Toronto

Join our dynamic team at Ontario Transit Group as a Site Geologist, where you will play a pivotal role in shaping the future of transit infrastructure. In this full-time position, you will be responsible for conducting geological assessments and providing expert recommendations to ensure the safety and sustainability of our projects.

Feb 19, 2026
Site Superintendent

Ontario Transit Group

Full-time|On-site|Toronto

Join the Ontario Transit Group as a Site Superintendent, where you will lead and manage construction projects from inception through completion. Your expertise will ensure that projects are completed on time, within budget, and to the highest quality standards. As a key player in our team, you will liaise with various stakeholders, oversee work schedules, and enforce safety regulations on-site.

Apr 1, 2026
