Experience Level
Senior
Qualifications
The ideal candidate should:
- Possess 10+ years of experience in software engineering and operations, particularly with a focus on distributed systems and a profound understanding of networking fundamentals, including TCP/IP (notably IPv6), DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles.
- Exhibit a customer-centric mindset, consistently driving enhancements that prioritize end-user experience.
- Value operational efficiency and demonstrate a strong inclination toward automation over manual processes, often described as 'allergic to ops work.'
- Be well-versed in contemporary cloud infrastructures and the foundational network design elements of at least one major provider such as AWS, Azure, or GCP, including VPCs, subnetting, routing, VPNs, peering, private links, and CDNs.
- Have substantial knowledge of service mesh and load-balancing principles, and show eagerness to implement these in a multi-cloud framework.
About the job
The Team
At MongoDB, our Platform Engineering division within Site Reliability Engineering (SRE) is tasked with managing essential infrastructure and operational functions that empower our engineering teams. This includes our robust, multi-cloud Kubernetes infrastructure, deployment systems, and advanced observability and alerting mechanisms.
The Fabric team is at the forefront of enabling secure communication across systems and from the public internet. Our responsibilities involve designing network architecture, implementing service mesh solutions, and optimizing edge load balancing to ensure the safety of customer data in transit. This team is vital in developing and maintaining a dependable and globally connected multi-cloud network that underpins MongoDB products.
This position can be based in our Toronto or Vancouver offices, or you can work completely remotely from anywhere in North America. We provide flexible hybrid work arrangements for those in our offices.
About MongoDB, Inc.
MongoDB, Inc. is a leading modern data platform that empowers developers and businesses to harness the power of data through innovative cloud services. With a commitment to open-source technology and a focus on scalability and performance, MongoDB enables organizations to build and deploy applications with unparalleled flexibility.
Full-time|CA$144K/yr - CA$200K/yr|Hybrid|Toronto; Vancouver
Full-time|CA$243K/yr - CA$297K/yr|On-site|Toronto, ON
At Relay, we empower self-made business owners with a digital banking platform that transforms financial management into a source of clarity, confidence, and control. Our mission is to replace financial uncertainty with genuine visibility, enabling entrepreneurs to convert their hard work into enduring success. By alleviating the stress of cash flow management, we provide the tools necessary for owners to operate robust and resilient businesses.
As Relay continues its growth trajectory, the reliability, performance, and resilience of our platform have become integral to both our customer experience and overall business success.
This senior leadership position is crucial in steering a team of Site Reliability Engineers while shaping how reliability strategies influence engineering and product decisions throughout the organization. You will determine the future direction of the SRE function, promote operational excellence, and assist the company in anticipating and managing scale challenges before they pose risks.
If you thrive on tackling complex systems, leading organizations, and building resilient platforms that customers depend on daily, we are eager to connect with you!
Key Responsibilities
- Lead and enhance Relay’s Site Reliability Engineering function, establishing strategic direction as the company scales.
- Define and implement a long-term reliability roadmap, making informed trade-offs under real business and capacity constraints.
- Act as the senior reliability voice in discussions involving engineering and product leadership.
- Influence the integration of reliability considerations into product planning, architectural decisions, and delivery processes.
- Serve as a senior escalation point during critical production incidents, ensuring effective communication and thorough follow-up actions.
- Enhance Relay’s observability, performance, and operational maturity practices across teams.
- Establish and uphold standards concerning SLOs, operational readiness, incident management, and continuous improvement.
- Collaborate with stakeholders in Engineering, Product, Data, and Finance to balance velocity, risk, performance, and cost.
- Build and nurture a high-performing SRE organization capable of supporting future growth.
Full-time|CA$144K/yr - CA$200K/yr|Hybrid|Montreal; Toronto
The Storage Layer Services (SLS) team at MongoDB is embarking on an innovative journey to re-architect our cloud storage layer, forming the core of our next-generation cloud storage architecture. This newly established team is dedicated to creating high-performance, multi-tenant distributed storage services that not only enhance our current Atlas storage stack but also enable more efficient customer workloads. As a Senior Site Reliability Engineer, you will collaborate closely with teams responsible for these storage services to establish Service Level Objectives (SLOs), develop capacity plans, and guarantee the reliability, durability, and operational safety of the foundational storage layer supporting Atlas. By joining our small team of seasoned SREs, you will play an integral role in executing a multi-year roadmap for MongoDB’s cloud storage architecture. This position is open to candidates based in our Toronto or Montreal offices or those working remotely from anywhere in Canada, provided they are located in the Eastern or Central time zones.
Pinterest is hiring a Senior Site Reliability Engineer in Toronto, ON, Canada. The focus of this role is to ensure that Pinterest’s services remain reliable, scalable, and perform well as the platform grows. Working closely with software engineers, this position involves designing and implementing solutions that strengthen system reliability and efficiency.
Key responsibilities
- Partner with engineering teams to maintain and enhance the reliability of Pinterest’s services
- Design and implement improvements to support scalability and performance
- Troubleshoot and resolve service issues to reduce downtime
Requirements
- Extensive experience in site reliability engineering or a closely related field
- Strong technical background with proven problem-solving abilities
- Comfort working alongside software engineers to improve systems
This position is located in Toronto, ON, Canada.
Empower Every Identity, from AI to Human
Identity is the cornerstone of unlocking AI's potential. At Okta, we secure AI by creating a trustworthy, neutral infrastructure that allows organizations to confidently navigate this transformative era. This mission demands an unwavering commitment to addressing intricate challenges with significant real-world implications. We seek innovative builders who act with speed and urgency and execute with exceptional proficiency.
This is your chance to engage in work that can define your career. We are fully dedicated to this mission. If you share this passion, we want to hear from you.
Join Us in Securing Every Identity, from AI to Human
Okta is at the forefront of providing a superior authentication experience for hundreds of millions globally. Our focus on reliability forms the bedrock of our product, with a strong commitment to surpassing customer expectations for availability being a fundamental engineering priority. As a Senior Site Reliability Engineer, you will be part of our SRE team, ensuring our production systems are not only fully operational but also resilient, scalable, and poised for remarkable growth. This role goes beyond mere maintenance; it is about playing a significant role in enhancing the core robustness and resilience of our platform. You will be a proactive builder, developing solutions that inherently boost our system's reliability.
Your Responsibilities:
- Craft and develop custom software in Go to bolster the platform’s reliability and resilience.
- Collaborate with engineering teams to integrate reliability principles, enhancing the availability, performance, and observability of our services.
- Utilize your profound understanding of infrastructure and observability to pinpoint improvement opportunities within the product and implement effective solutions.
- Participate in our on-call rotation, providing swift, effective responses to critical incidents and utilizing your expertise to troubleshoot, mitigate, or accurately escalate production issues.
- Enhance our SRE tooling and processes, focusing on automation and operational efficiency.
- Establish, document, and promote reliability best practices throughout the organization.
Veeva Systems is a mission-driven leader in industry cloud technology, dedicated to accelerating the delivery of therapies to patients in the life sciences sector. As one of the fastest-growing SaaS companies ever, we surpassed $2 billion in revenue last fiscal year with significant growth prospects ahead.
Central to Veeva's mission are our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. Notably, we made history in 2021 by becoming a public benefit corporation (PBC), which legally commits us to balance the interests of our customers, employees, society, and investors.
As a Work Anywhere company, we empower you to choose your work environment, whether it's from home or in our office, enabling you to excel in your preferred setting.
Be part of our journey in transforming the life sciences industry and making a positive impact on our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be instrumental in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge of Java and modern open-source technologies to create a meaningful impact on our production systems.
The ideal candidate will possess substantial experience with Java applications and cutting-edge open-source technologies, particularly within the context of enterprise software development or a high-growth tech environment. As a Senior SRE, you should have a natural curiosity and a strong aptitude for problem-solving. Your unique engineering perspective will be critical as you understand how systems integrate in production to function efficiently on a global scale, supporting hundreds of customers across North America, Europe, and Asia.
At Veeva Systems, we are driven by a mission to revolutionize the life sciences industry, empowering companies to bring therapies to patients at an accelerated pace. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year and possess immense growth potential.
Our core values - Do the Right Thing, Customer Success, Employee Success, and Speed - define who we are. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.
As a Work Anywhere organization, we offer the flexibility for you to work remotely or from our office, allowing you to thrive in your preferred environment.
Join us in transforming the life sciences sector and making a positive impact on our customers, employees, and communities.
About Rootly
At Rootly, we are dedicated to revolutionizing how organizations manage incidents. Our mission is to provide a reliable incident management platform that empowers companies to respond swiftly and effectively when challenges arise. Our innovative approach has established us as leaders in a new multi-billion dollar segment, and we are seeking exceptional talent to help us achieve our ambitious goals. Our customers, including industry giants like NVIDIA, Figma, Canva, and Tripadvisor, trust Rootly for their critical incident management needs. They appreciate our user-friendly platform and unique partnership approach, which has garnered us a stellar 5-star rating on G2. Join us in creating a reliable future for organizations worldwide. Backed by prestigious investors from Y Combinator to key operators in tech, we prioritize transparency and team involvement in our financial health. We conduct monthly business reviews and share updates through our weekly changelog.
About the Role
As a Senior Site Reliability Engineer at Rootly, you will play a crucial role in shaping our technical infrastructure. You will thrive in a dynamic environment where each day presents new challenges and opportunities for growth. This position is perfect for individuals who seek ownership, enjoy tackling complex technical problems, and are driven by a mission to enhance reliability. While the work will be demanding, it promises to be one of the most rewarding experiences in your career.
- Collaborate with product teams to enhance the observability, reliability, and performance of services.
- Take ownership of our CI/CD pipelines, observability tools, monitoring systems, and incident response processes.
- Develop tools and automation to reduce manual toil, enhance engineering velocity, and improve developer experience and system reliability.
- Engage deeply with engineering teams to gain insights into system performance and identify cross-functional reliability and scaling concerns.
- Design and scale our infrastructure while ensuring top-notch performance and operational excellence.
A Few Important Notes:
- Join a profitable B2B SaaS company with teams primarily located in North America.
- This position is predominantly remote, with a requirement to meet in Toronto once a month.
- Candidates must possess the legal right to work in Canada; we are unable to provide visa sponsorship.
- As our platform continues to expand, we are actively seeking a Senior Site Reliability Engineer (SRE) / Cloud Engineer.
- Experience with Azure is highly prioritized as it is our primary cloud platform.
About Our Company:
We are recognized as one of the leading retail analytics platforms, empowering marketing teams and brands to decode retail data and execute targeted media campaigns without the need for coding. Our services enhance client understanding of customer behavior and maximize ROI on marketing campaigns, with notable clients including Home Depot.
Utilize a modern cloud stack, with a focus on Azure, CI/CD, containerization, and distributed computing technologies.
About You:
We are in search of a dynamic and skilled Senior SRE/Cloud Engineer who is eager to take on a pivotal role in managing our Cloud Operations, ensuring uptime, reliability, and automation.
Key Responsibilities:
- Collaborate with software engineering teams to design, implement, and maintain CI/CD pipelines for rapid and reliable software releases.
- Automate and optimize infrastructure provisioning, configuration, and management processes utilizing industry-standard tools and methodologies.
- Implement and manage containerization and orchestration technologies to enhance scalability and resource efficiency.
- Own the end-to-end availability and performance of our cloud infrastructure; proactively identify potential issues and implement automation to mitigate recurrence.
- Participate in an on-call rotation to ensure system stability and responsiveness during off-hours.
- Lead the development and implementation of service-level objectives crucial for maintaining product reliability.
Join Tenstorrent as a Site Reliability Engineer, where you will play a crucial role in ensuring the reliability and performance of our cutting-edge systems. As a member of our dedicated engineering team, you will work on innovative solutions to enhance our infrastructure and streamline operations. Your expertise will help us deliver exceptional service and uptime to our customers.
Full-time|$211.5K/yr - $258.5K/yr|On-site|Toronto, ON
At Relay, we are revolutionizing the way self-made business owners manage their finances through our cutting-edge digital banking platform. Our mission is to empower entrepreneurs with the tools and knowledge they need to achieve financial clarity, confidence, and control over their earnings. By transforming cash flow management from a source of stress into a clear, actionable insight, we help our customers build stronger and more resilient businesses.
As we continue to grow, the reliability, performance, and resilience of our platform have become critical components of our customer experience and overall business success.
We are currently seeking an Engineering Manager to lead our Site Reliability Engineering (SRE) team. In this pivotal role, you will oversee the scalability, reliability, and robustness of Relay's systems. This position transcends infrastructure management and incident response; it is a leadership opportunity that sits at the nexus of technology, team dynamics, and business strategy. You will mentor and manage a talented SRE team, influence how reliability is integrated across the organization, and ensure our systems can safely scale in response to increasing customer demands and complexity.
If you thrive in technically demanding environments and are passionate about fostering strong teams, a healthy workplace culture, and effective cross-functional collaboration, this position is designed for you.
Join our innovative team at Newton as a Site Reliability Engineer, where you'll play a crucial role in ensuring the reliability and performance of our systems. In this fully remote position, you will collaborate with engineering and operations teams to develop solutions that enhance system uptime and efficiency.
Your expertise will help us transition and maintain our infrastructure, ensuring our services are resilient and scalable. This is an exciting opportunity to contribute to a company that values innovation and teamwork.
At Movable Ink, we empower marketers with cutting-edge content personalization through data-driven content creation and AI-driven decision-making. Our innovative platform is trusted by top global brands to enhance revenue, streamline workflows, and increase marketing agility. With our headquarters in New York City and a talented team of nearly 600 employees, Movable Ink has a presence across North America, Central America, Europe, Australia, and Japan.
As a Lead Site Reliability Engineer, you will leverage your technical expertise and leadership skills to oversee infrastructure and software development initiatives. You will play a pivotal role in designing and evolving key systems within our multi-cloud, multi-region content serving platform, which handles over 25 billion requests daily. By fostering architectural vision, cross-team collaboration, and mentorship, you will spearhead reliability initiatives and define the technical strategies necessary for scaling our platform to accommodate 50 billion requests per day and beyond.
Momentum Financial Services Group (MFSG) is the company behind Money Mart, Canada’s largest non-bank branch network. With over four decades of experience, MFSG delivers financial solutions for underserved communities, including short-term loans, money transfers, and prepaid cards. Each year, millions of customers rely on these services for timely financial support.
Role Overview: Site Reliability Engineer
The Site Reliability Engineer plays a key role in keeping MFSG’s digital banking and financial services platforms available, responsive, and resilient. This position centers on automating operational tasks, setting and maintaining service-level objectives, and engineering systems to withstand and recover from failures. Daily work involves close collaboration with engineering, DevOps, QA, cybersecurity, and compliance teams to ensure platform reliability meets both technical and regulatory requirements. The role also emphasizes proactive monitoring, incident response, and ongoing improvements to the software delivery process to reduce production risk.
Why Join Momentum Financial Services Group?
- Competitive compensation that reflects experience and current market rates
- Annual bonus based on individual and company achievements
- Comprehensive benefits including health and dental coverage with premiums fully paid, plus Employee Assistance Program access
- Retirement planning support to help prepare for the future
- Hybrid work model offering flexibility between remote work and in-office collaboration at the Toronto headquarters
- Employee perks such as tuition reimbursement, professional development, Perkopolis discounts, and recognition programs
Location
Toronto, Canada (hybrid work model)
Welcome to Okta
At Okta, we are redefining identity management. We empower individuals to securely access any technology, from any device or application, fostering a transformative approach to business security and growth. Our innovative solutions, including the Okta Platform and Auth0 Platform, prioritize identity at the heart of operational success.
We value diverse perspectives and experiences, seeking lifelong learners who contribute to our dynamic culture.
Join us as we shape a future where identity is truly in your hands.
Are you driven to tackle complex data challenges and make a significant impact? Do you want to collaborate with a passionate team of cloud engineers and architects? If yes, we want to hear from you!
The Auth0 platform manages over 100 million logins daily for clients worldwide and is rapidly expanding. As part of the Data Platform team, you will be instrumental in developing and managing essential data services that enable scalability, reliability, efficiency, and operational excellence. In your role as Senior Manager, you will collaborate with engineers across departments, guide the platform roadmap, and establish the foundational infrastructure for Auth0's future growth.
As a leader, your passion for developing high-performing teams and your ability to coordinate across organizations will make you an ideal fit for this position!
Your Responsibilities Include:
- Leading a diverse, agile software development team focused on delivering value with expertise in distributed systems, cloud infrastructure, and site reliability engineering.
- Fostering a culture of discovery, learning, and experimentation within a geographically distributed team through continuous coaching and mentoring.
- Collaborating closely with architects and engineers to design scalable, robust, and extensible services using modern technologies such as Go, Node.js, Kubernetes, Docker, AWS, and Azure.
- Building and managing data streaming teams utilizing event-driven architecture and Kafka.
- Partnering with product management and engineering leadership to define a platform roadmap that supports the next generation of identity products, overseeing planning, execution, and delivery of data platform services.
- Implementing process improvements to drive operational excellence and efficiency during a period of significant growth.
Join Waabi, a pioneering force in Physical AI, founded by the visionary Raquel Urtasun. We are at the forefront of revolutionizing autonomous transportation, developing cutting-edge technology that powers commercial autonomous trucks and robotaxis. With esteemed partnerships in AI, automotive, logistics, and deep tech, we are shaping the future of transportation.
Located in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is rapidly expanding and seeking diverse, innovative, and collaborative individuals eager to make a positive impact on the world. Discover more about our journey at: www.waabi.ai
Your Role:
- Collaborate within a multidisciplinary team of Engineers and Researchers, utilizing an AI-first approach to ensure safe self-driving technology is deployed at scale.
- Develop robust and scalable tools and frameworks that support Autonomous Vehicle (AV) development.
- Lead technical discussions and architectural planning in collaboration with both Researchers and Engineers.
- Mentor fellow software engineers through code reviews, design discussions, and by sharing best practices in software development.
- Assist with task planning and estimation to enhance project efficiency.
Join Our Team!
The Loyalty & Virality tribe is integral to HelloFresh's strategy for customer retention and organic growth. We develop innovative products and systems that engage customers, reward loyalty, and encourage them to share their HelloFresh experiences with others. Our team is responsible for loyalty and rewards programs, referral systems, viral growth mechanics, and personalized re-engagement experiences, functioning at a global scale across various brands and markets. We combine profound product insight with data-driven experimentation, leveraging AI to create smarter, more personalized customer experiences. If you're enthusiastic about building systems that transform satisfied customers into loyal advocates, this is the ideal team for you! Explore more about our initiatives by visiting our tech blog.
Key Responsibilities:
- Act as a senior technical authority within the Loyalty & Virality tribe, establishing architectural direction, driving technical excellence, and mentoring fellow engineers. This influential individual contributor role significantly impacts the global scaling of our referral and virality systems.
- Define and oversee the technical strategy for our referral and virality platform, ensuring that our systems remain flexible, resilient, and scalable across all HelloFresh brands and markets.
- Take complete ownership of the architecture, design, development, deployment, and operations of the Share squad’s microservices.
- Promote the integration of AI and machine learning within the squads, identifying opportunities for enhancing personalization, predictive engagement, and intelligent reward optimization.
- Collaborate actively as a solution-oriented member of an autonomous, cross-functional agile team, working closely with Product Owners, Engineers, Designers, and Business Intelligence teams.
- Facilitate effective communication across the engineering organization, ensuring technical alignment among diverse stakeholders and colleagues.
- Perform additional duties as assigned.
Qualifications:
- Extensive experience with microservice architecture and event-driven systems (e.g., Kafka, RabbitMQ), alongside distributed architecture principles.
- Strong proficiency in programming languages and frameworks pertinent to our tech stack.
- Demonstrated ability to mentor and lead technical discussions, fostering a culture of excellence and continuous improvement.
- Excellent communication skills, with the ability to articulate complex technical concepts to a non-technical audience.
- A passion for leveraging data and technology to enhance customer experiences.
About Faire
Faire is a dynamic online wholesale marketplace driven by the belief that the future lies in local commerce. Independent retailers worldwide are achieving greater revenue than giants like Walmart and Amazon combined, yet remain relatively small in stature. At Faire, we harness the power of technology, data, and machine learning to connect this vibrant community of entrepreneurs around the globe. Imagine your favorite local boutique; we empower them to discover the finest products globally to stock their shelves. With the right tools and insights, we aim to level the playing field, enabling small businesses everywhere to compete against large box stores and e-commerce behemoths.
By championing the growth of independent enterprises, Faire fosters positive economic impacts within local communities on a global scale. We are on the lookout for intelligent, resourceful, and passionate individuals to join us as we drive the shop-local movement. If you believe in the power of community, we invite you to be a part of ours.
Role Description:
Our Engineering organization is the backbone of our marketplace, responsible for the software that enables it to function seamlessly. The Product Security team empowers product engineering teams to create and deploy secure software solutions. We prioritize best engineering practices, striving to deliver software that is secure, thoroughly tested, easy to maintain, and capable of scaling to millions of users. We develop scalable, reusable frameworks, consult with product teams, leverage data-driven insights, and continually iterate on our practices.
As a Senior Staff Software Engineer in Product Security, you will take on the role of technical lead for the Product Security domain. You will establish the long-term technical vision for integrating security within Faire’s application framework. Collaborating closely with Platform and Product Engineering teams, you will identify and mitigate security vulnerabilities, spearhead significant security initiatives, and mentor engineers across the organization to enhance secure engineering practices.
Additionally, you will lead cross-functional programs to embed security deeply within our architecture, pipelines, and developer experience, effectively minimizing risk while maintaining development velocity.
In this role, you will:
- Define the long-term technical strategy for application security at Faire, establishing scalable and developer-friendly frameworks and principles that facilitate secure development across all product areas.
ABOUT QUINCE
Established in 2018, Quince is revolutionizing the retail landscape by demonstrating that high-quality goods can be affordably priced. Our mission is straightforward: to provide premium essentials at accessible costs, produced ethically and sustainably. We believe everyone deserves exceptional craftsmanship and timeless design without the inflated prices typically associated with luxury. Quince operates on a direct-to-consumer (DTC) model that eliminates intermediaries, utilizing just-in-time manufacturing to reduce waste and enhance value.
Quince is a tech-driven company that is transforming the retail sector by integrating AI, analytics, and automation into our core operations. Our steadfast dedication to excellence and adherence to our company values shape our decisions and actions:
- Customer First: We prioritize customer satisfaction in every decision.
- High Quality: True quality means premium materials and rigorous production standards you can feel good about.
- Essential Design: We focus on timeless, functional essentials instead of chasing trends.
- Always a Better Deal: Innovation and transparency ensure value for both customers and partners.
- Social & Environmental Responsibility: We commit to sustainable materials, ethical production, and fair wages.
Quince collaborates with top-tier manufacturers worldwide, serving millions of satisfied customers. Backed by strong investors and a commitment to sustainable growth, we are rapidly expanding while upholding our focus on quality, simplicity, and radical price transparency.
JOIN OUR TEAM AND BE PART OF OUR SUCCESS
Findem is looking for a Senior Staff Full Stack Engineer based in Toronto or working remotely. This role centers on building and improving the company’s platform through hands-on development and close collaboration with colleagues from different disciplines. Role overview This position involves designing, developing, and refining features that support Findem’s products. The Senior Staff Full Stack Engineer will work with cross-functional teams to deliver solutions that drive the platform forward. What you will do Develop and enhance platform features using full stack technologies Collaborate with product, design, and engineering teams to deliver impactful updates Apply deep experience in full stack development to solve technical challenges Requirements Extensive background in full stack engineering Experience working with cross-functional teams Ability to work remotely from Toronto or elsewhere
Full-time|CA$144K/yr - CA$200K/yr|Hybrid|Toronto; Vancouver
The Team
At MongoDB, our Platform Engineering division within Site Reliability Engineering (SRE) is tasked with managing essential infrastructure and operational functions that empower our engineering teams. This includes our robust, multi-cloud Kubernetes infrastructure, deployment systems, and advanced observability and alerting mechanisms.
The Fabric team is at the forefront of enabling secure communication across systems and from the public internet. Our responsibilities involve designing network architecture, implementing service mesh solutions, and optimizing edge load balancing to ensure the safety of customer data in transit. This team is vital in developing and maintaining a dependable and globally connected multi-cloud network that underpins MongoDB products.
This position can be based in our Toronto or Vancouver offices, or you can work completely remotely from anywhere in North America. We provide flexible hybrid work arrangements for those in our offices.
Full-time|CA$243K/yr - CA$297K/yr|On-site|Toronto, ON
At Relay, we empower self-made business owners with a digital banking platform that transforms financial management into a source of clarity, confidence, and control. Our mission is to replace financial uncertainty with genuine visibility, enabling entrepreneurs to convert their hard work into enduring success. By alleviating the stress of cash flow management, we provide the tools necessary for owners to operate robust and resilient businesses.
As Relay continues its growth trajectory, the reliability, performance, and resilience of our platform have become integral to both our customer experience and overall business success.
This senior leadership position is crucial in steering a team of Site Reliability Engineers while shaping how reliability strategies influence engineering and product decisions throughout the organization. You will determine the future direction of the SRE function, promote operational excellence, and assist the company in anticipating and managing scale challenges before they pose risks.
If you thrive on tackling complex systems, leading organizations, and building resilient platforms that customers depend on daily, we are eager to connect with you!
Key Responsibilities
- Lead and enhance Relay’s Site Reliability Engineering function, establishing strategic direction as the company scales.
- Define and implement a long-term reliability roadmap, making informed trade-offs under real business and capacity constraints.
- Act as the senior reliability voice in discussions involving engineering and product leadership.
- Influence the integration of reliability considerations into product planning, architectural decisions, and delivery processes.
- Serve as a senior escalation point during critical production incidents, ensuring effective communication and thorough follow-up actions.
- Enhance Relay’s observability, performance, and operational maturity practices across teams.
- Establish and uphold standards concerning SLOs, operational readiness, incident management, and continuous improvement.
- Collaborate with stakeholders in Engineering, Product, Data, and Finance to balance velocity, risk, performance, and cost.
- Build and nurture a high-performing SRE organization capable of supporting future growth.
Full-time|CA$144K/yr - CA$200K/yr|Hybrid|Montreal; Toronto
The Storage Layer Services (SLS) team at MongoDB is embarking on an innovative journey to re-architect our cloud storage layer, forming the core of our next-generation cloud storage architecture. This newly established team is dedicated to creating high-performance, multi-tenant distributed storage services that not only enhance our current Atlas storage stack but also enable more efficient customer workloads. As a Senior Site Reliability Engineer, you will collaborate closely with teams responsible for these storage services to establish Service Level Objectives (SLOs), develop capacity plans, and guarantee the reliability, durability, and operational safety of the foundational storage layer supporting Atlas. By joining our small team of seasoned SREs, you will play an integral role in executing a multi-year roadmap for MongoDB’s cloud storage architecture. This position is open to candidates based in our Toronto or Montreal offices or those working remotely from anywhere in Canada, provided they are located in the Eastern or Central time zones.
Pinterest is hiring a Senior Site Reliability Engineer in Toronto, ON, Canada. The focus of this role is to ensure that Pinterest’s services remain reliable, scalable, and perform well as the platform grows. Working closely with software engineers, this position involves designing and implementing solutions that strengthen system reliability and efficiency. Key responsibilities Partner with engineering teams to maintain and enhance the reliability of Pinterest’s services Design and implement improvements to support scalability and performance Troubleshoot and resolve service issues to reduce downtime Requirements Extensive experience in site reliability engineering or a closely related field Strong technical background with proven problem-solving abilities Comfort working alongside software engineers to improve systems This position is located in Toronto, ON, Canada.
Empower Every Identity, from AI to Human
Identity is the cornerstone of unlocking AI's potential. At Okta, we secure AI by creating a trustworthy, neutral infrastructure that allows organizations to confidently navigate this transformative era. This mission demands an unwavering commitment to addressing intricate challenges with significant real-world implications. We seek innovative builders who act with speed and urgency and execute with exceptional proficiency.
This is your chance to engage in work that can define your career. We are fully dedicated to this mission. If you share this passion, we want to hear from you.
Join Us in Securing Every Identity, from AI to Human
Okta is at the forefront of providing a superior authentication experience for hundreds of millions globally. Our focus on reliability forms the bedrock of our product, with a strong commitment to surpassing customer expectations for availability being a fundamental engineering priority. As a Senior Site Reliability Engineer, you will be part of our SRE team, ensuring our production systems are not only fully operational but also resilient, scalable, and poised for remarkable growth. This role goes beyond mere maintenance; it is about playing a significant role in enhancing the core robustness and resilience of our platform. You will be a proactive builder, developing solutions that inherently boost our system's reliability.
Your Responsibilities:
- Craft and develop custom software in Go to bolster the platform’s reliability and resilience.
- Collaborate with engineering teams to integrate reliability principles, enhancing the availability, performance, and observability of our services.
- Utilize your profound understanding of infrastructure and observability to pinpoint improvement opportunities within the product and implement effective solutions.
- Participate in our on-call rotation, providing swift, effective responses to critical incidents and utilizing your expertise to troubleshoot, mitigate, or accurately escalate production issues.
- Enhance our SRE tooling and processes, focusing on automation and operational efficiency.
- Establish, document, and promote reliability best practices throughout the organization.
Veeva Systems is a mission-driven leader in industry cloud technology, dedicated to accelerating the delivery of therapies to patients in the life sciences sector. As one of the fastest-growing SaaS companies ever, we surpassed $2 billion in revenue last fiscal year with significant growth prospects ahead.
Central to Veeva's mission are our core values: Do the Right Thing, Customer Success, Employee Success, and Speed. Notably, we made history in 2021 by becoming a public benefit corporation (PBC), which legally commits us to balance the interests of our customers, employees, society, and investors.
As a Work Anywhere company, we empower you to choose your work environment, whether it's from home or in our office, enabling you to excel in your preferred setting.
Be part of our journey in transforming the life sciences industry and making a positive impact on our customers, employees, and communities.
The Role
We are seeking a talented Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be instrumental in ensuring the scalability and reliability of our enterprise applications. You will face complex challenges on a global scale, leveraging your extensive knowledge of Java and modern open-source technologies to create a meaningful impact on our production systems.
The ideal candidate will possess substantial experience with Java applications and cutting-edge open-source technologies, particularly within the context of enterprise software development or a high-growth tech environment. As a Senior SRE, you should have a natural curiosity and a strong aptitude for problem-solving. Your unique engineering perspective will be critical as you understand how systems integrate in production to function efficiently on a global scale, supporting hundreds of customers across North America, Europe, and Asia.
At Veeva Systems, we are driven by a mission to revolutionize the life sciences industry, empowering companies to bring therapies to patients at an accelerated pace. As one of the fastest-growing SaaS companies in history, we achieved over $2 billion in revenue last fiscal year and possess immense growth potential.
Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) define who we are. In 2021, we made history by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.
As a Work Anywhere organization, we offer the flexibility for you to work remotely or from our office, allowing you to thrive in your preferred environment.
Join us in transforming the life sciences sector and making a positive impact on our customers, employees, and communities.
About Rootly At Rootly, we are dedicated to revolutionizing how organizations manage incidents. Our mission is to provide a reliable incident management platform that empowers companies to respond swiftly and effectively when challenges arise. Our innovative approach has established us as leaders in a new multi-billion dollar segment, and we are seeking exceptional talent to help us achieve our ambitious goals. Our customers, including industry giants like NVIDIA, Figma, Canva, and Tripadvisor, trust Rootly for their critical incident management needs. They appreciate our user-friendly platform and unique partnership approach, which has garnered us a stellar 5-star rating on G2. Join us in creating a reliable future for organizations worldwide. Backed by prestigious investors from Y Combinator to key operators in tech, we prioritize transparency and team involvement in our financial health. We conduct monthly business reviews and share updates through our weekly changelog. About the Role As a Senior Site Reliability Engineer at Rootly, you will play a crucial role in shaping our technical infrastructure. You will thrive in a dynamic environment where each day presents new challenges and opportunities for growth. This position is perfect for individuals who seek ownership, enjoy tackling complex technical problems, and are driven by a mission to enhance reliability. While the work will be demanding, it promises to be one of the most rewarding experiences in your career. Collaborate with product teams to enhance the observability, reliability, and performance of services. Take ownership of our CI/CD pipelines, observability tools, monitoring systems, and incident response processes. Develop tools and automation to reduce manual toil, enhance engineering velocity, and improve developer experience and system reliability. Engage deeply with engineering teams to gain insights into system performance and identify cross-functional reliability and scaling concerns. 
Design and scale our infrastructure while ensuring top-notch performance and operational excellence.
A Few Important Notes:
- Join a profitable B2B SaaS company with teams primarily located in North America.
- This position is predominantly remote, with a requirement to meet in Toronto once a month.
- Candidates must possess the legal right to work in Canada; we are unable to provide visa sponsorship.
- As our platform continues to expand, we are actively seeking a Senior Site Reliability Engineer (SRE) / Cloud Engineer.
- Experience with Azure is highly prioritized as it is our primary cloud platform.
About Our Company:
We are recognized as one of the leading retail analytics platforms, empowering marketing teams and brands to decode retail data and execute targeted media campaigns without the need for coding. Our services enhance client understanding of customer behavior and maximize ROI on marketing campaigns, with notable clients including Home Depot.
Utilize a modern cloud stack, with a focus on Azure, CI/CD, containerization, and distributed computing technologies.
About You:
We are in search of a dynamic and skilled Senior SRE/Cloud Engineer who is eager to take on a pivotal role in managing our Cloud Operations, ensuring uptime, reliability, and automation.
Key Responsibilities:
- Collaborate with software engineering teams to design, implement, and maintain CI/CD pipelines for rapid and reliable software releases.
- Automate and optimize infrastructure provisioning, configuration, and management processes utilizing industry-standard tools and methodologies.
- Implement and manage containerization and orchestration technologies to enhance scalability and resource efficiency.
- Own the end-to-end availability and performance of our cloud infrastructure; proactively identify potential issues and implement automation to mitigate recurrence.
- Participate in an on-call rotation to ensure system stability and responsiveness during off-hours.
- Lead the development and implementation of service-level objectives crucial for maintaining product reliability.
Join Tenstorrent as a Site Reliability Engineer, where you will play a crucial role in ensuring the reliability and performance of our cutting-edge systems. As a member of our dedicated engineering team, you will work on innovative solutions to enhance our infrastructure and streamline operations. Your expertise will help us deliver exceptional service and uptime to our customers.
Full-time|$211.5K/yr - $258.5K/yr|On-site|Toronto, ON
At Relay, we are revolutionizing the way self-made business owners manage their finances through our cutting-edge digital banking platform. Our mission is to empower entrepreneurs with the tools and knowledge they need to achieve financial clarity, confidence, and control over their earnings. By transforming cash flow management from a source of stress into a clear, actionable insight, we help our customers build stronger and more resilient businesses.
As we continue to grow, the reliability, performance, and resilience of our platform have become critical components of our customer experience and overall business success.
We are currently seeking an Engineering Manager to lead our Site Reliability Engineering (SRE) team. In this pivotal role, you will oversee the scalability, reliability, and robustness of Relay's systems. This position transcends infrastructure management and incident response; it is a leadership opportunity that sits at the nexus of technology, team dynamics, and business strategy. You will mentor and manage a talented SRE team, influence how reliability is integrated across the organization, and ensure our systems can safely scale in response to increasing customer demands and complexity.
If you thrive in technically demanding environments and are passionate about fostering strong teams, a healthy workplace culture, and effective cross-functional collaboration, this position is designed for you.
Join our innovative team at Newton as a Site Reliability Engineer, where you'll play a crucial role in ensuring the reliability and performance of our systems. In this fully remote position, you will collaborate with engineering and operations teams to develop solutions that enhance system uptime and efficiency.
Your expertise will help us transition and maintain our infrastructure, ensuring our services are resilient and scalable. This is an exciting opportunity to contribute to a company that values innovation and teamwork.
At Movable Ink, we empower marketers with cutting-edge content personalization through data-driven content creation and AI-driven decision-making. Our innovative platform is trusted by top global brands to enhance revenue, streamline workflows, and increase marketing agility. With our headquarters in New York City and a talented team of nearly 600 employees, Movable Ink has a presence across North America, Central America, Europe, Australia, and Japan.
As a Lead Site Reliability Engineer, you will leverage your technical expertise and leadership skills to oversee infrastructure and software development initiatives. You will play a pivotal role in designing and evolving key systems within our multi-cloud, multi-region content serving platform, which handles over 25 billion requests daily. By fostering architectural vision, cross-team collaboration, and mentorship, you will spearhead reliability initiatives and define the technical strategies necessary for scaling our platform to accommodate 50 billion requests per day and beyond.
Momentum Financial Services Group (MFSG) is the company behind Money Mart, Canada’s largest non-bank branch network. With over four decades of experience, MFSG delivers financial solutions for underserved communities, including short-term loans, money transfers, and prepaid cards. Each year, millions of customers rely on these services for timely financial support. Role Overview: Site Reliability Engineer The Site Reliability Engineer plays a key role in keeping MFSG’s digital banking and financial services platforms available, responsive, and resilient. This position centers on automating operational tasks, setting and maintaining service-level objectives, and engineering systems to withstand and recover from failures. Daily work involves close collaboration with engineering, DevOps, QA, cybersecurity, and compliance teams to ensure platform reliability meets both technical and regulatory requirements. The role also emphasizes proactive monitoring, incident response, and ongoing improvements to the software delivery process to reduce production risk. Why Join Momentum Financial Services Group? Competitive compensation that reflects experience and current market rates Annual bonus based on individual and company achievements Comprehensive benefits including health and dental coverage with premiums fully paid, plus Employee Assistance Program access Retirement planning support to help prepare for the future Hybrid work model offering flexibility between remote work and in-office collaboration at the Toronto headquarters Employee perks such as tuition reimbursement, professional development, Perkopolis discounts, and recognition programs Location Toronto, Canada (hybrid work model)
Welcome to Okta
At Okta, we are redefining identity management. We empower individuals to securely access any technology, from any device or application, fostering a transformative approach to business security and growth. Our innovative solutions, including the Okta Platform and Auth0 Platform, prioritize identity at the heart of operational success.
We value diverse perspectives and experiences, seeking lifelong learners who contribute to our dynamic culture.
Join us as we shape a future where identity is truly in your hands.
Are you driven to tackle complex data challenges and make a significant impact? Do you want to collaborate with a passionate team of cloud engineers and architects? If yes, we want to hear from you!
The Auth0 platform manages over 100 million logins daily for clients worldwide and is rapidly expanding. As part of the Data Platform team, you will be instrumental in developing and managing essential data services that enable scalability, reliability, efficiency, and operational excellence. In your role as Senior Manager, you will collaborate with engineers across departments, guide the platform roadmap, and establish the foundational infrastructure for Auth0's future growth.
As a leader, your passion for developing high-performing teams and your ability to coordinate across organizations will make you an ideal fit for this position!
Your Responsibilities Include:
- Leading a diverse, agile software development team focused on delivering value with expertise in distributed systems, cloud infrastructure, and site reliability engineering.
- Fostering a culture of discovery, learning, and experimentation within a geographically distributed team through continuous coaching and mentoring.
- Collaborating closely with architects and engineers to design scalable, robust, and extensible services using modern technologies such as Go, Node.js, Kubernetes, Docker, AWS, and Azure.
- Building and managing data streaming teams utilizing event-driven architecture and Kafka.
- Partnering with product management and engineering leadership to define a platform roadmap that supports the next generation of identity products, overseeing planning, execution, and delivery of data platform services.
- Implementing process improvements to drive operational excellence and efficiency during a period of significant growth.
Join Waabi, a pioneering force in Physical AI, founded by the visionary Raquel Urtasun. We are at the forefront of revolutionizing autonomous transportation, developing cutting-edge technology that powers commercial autonomous trucks and robotaxis. With esteemed partnerships in AI, automotive, logistics, and deep tech, we are shaping the future of transportation.
Located in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is rapidly expanding and seeking diverse, innovative, and collaborative individuals eager to make a positive impact on the world. Discover more about our journey at: www.waabi.ai
Your Role:
- Collaborate within a multidisciplinary team of Engineers and Researchers, utilizing an AI-first approach to ensure safe self-driving technology is deployed at scale.
- Develop robust and scalable tools and frameworks that support Autonomous Vehicle (AV) development.
- Lead technical discussions and architectural planning in collaboration with both Researchers and Engineers.
- Mentor fellow software engineers through code reviews, design discussions, and by sharing best practices in software development.
- Assist with task planning and estimation to enhance project efficiency.
Join Our Team! The Loyalty & Virality tribe is integral to HelloFresh's strategy for customer retention and organic growth. We develop innovative products and systems that engage customers, reward loyalty, and encourage them to share their HelloFresh experiences with others. Our team is responsible for loyalty and rewards programs, referral systems, viral growth mechanics, and personalized re-engagement experiences, functioning at a global scale across various brands and markets. We combine profound product insight with data-driven experimentation, leveraging AI to create smarter, more personalized customer experiences. If you're enthusiastic about building systems that transform satisfied customers into loyal advocates, this is the ideal team for you! Explore more about our initiatives by visiting our tech blog. Key Responsibilities: Act as a senior technical authority within the Loyalty & Virality tribe, establishing architectural direction, driving technical excellence, and mentoring fellow engineers. This influential individual contributor role significantly impacts the global scaling of our referral and virality systems. Define and oversee the technical strategy for our referral and virality platform, ensuring that our systems remain flexible, resilient, and scalable across all HelloFresh brands and markets. Take complete ownership of the architecture, design, development, deployment, and operations of the Share squad’s microservices. Promote the integration of AI and machine learning within the squads, identifying opportunities for enhancing personalization, predictive engagement, and intelligent reward optimization. Collaborate actively as a solution-oriented member of an autonomous, cross-functional agile team, working closely with Product Owners, Engineers, Designers, and Business Intelligence teams. Facilitate effective communication across the engineering organization, ensuring technical alignment among diverse stakeholders and colleagues. 
Perform additional duties as assigned. Qualifications: Extensive experience with microservice architecture and event-driven systems (e.g., Kafka, RabbitMQ), alongside distributed architecture principles. Strong proficiency in programming languages and frameworks pertinent to our tech stack. Demonstrated ability to mentor and lead technical discussions, fostering a culture of excellence and continuous improvement. Excellent communication skills, with the ability to articulate complex technical concepts to a non-technical audience. A passion for leveraging data and technology to enhance customer experiences.
About Faire
Faire is a dynamic online wholesale marketplace driven by the belief that the future lies in local commerce. Independent retailers worldwide are achieving greater revenue than giants like Walmart and Amazon combined, yet remain relatively small in stature. At Faire, we harness the power of technology, data, and machine learning to connect this vibrant community of entrepreneurs around the globe. Imagine your favorite local boutique: we empower them to discover the finest products globally to stock their shelves. With the right tools and insights, we aim to level the playing field, enabling small businesses everywhere to compete against large box stores and e-commerce behemoths.
By championing the growth of independent enterprises, Faire fosters positive economic impacts within local communities on a global scale. We are on the lookout for intelligent, resourceful, and passionate individuals to join us as we drive the shop-local movement. If you believe in the power of community, we invite you to be a part of ours.
Role Description:
Our Engineering organization is the backbone of our marketplace, responsible for the software that enables it to function seamlessly. The Product Security team empowers product engineering teams to create and deploy secure software solutions. We prioritize best engineering practices, striving to deliver software that is secure, thoroughly tested, easy to maintain, and capable of scaling to millions of users. We develop scalable, reusable frameworks, consult with product teams, leverage data-driven insights, and continually iterate on our practices.
As a Senior Staff Software Engineer in Product Security, you will take on the role of technical lead for the Product Security domain. You will establish the long-term technical vision for integrating security within Faire’s application framework.
Collaborating closely with Platform and Product Engineering teams, you will identify and mitigate security vulnerabilities, spearhead significant security initiatives, and mentor engineers across the organization to enhance secure engineering practices. Additionally, you will lead cross-functional programs to embed security deeply within our architecture, pipelines, and developer experience, effectively minimizing risk while maintaining development velocity.
In this role, you will:
- Define the long-term technical strategy for application security at Faire, establishing scalable and developer-friendly frameworks and principles that facilitate secure development across all product areas.
Apr 28, 2026