About the job
At Affirm, we are redefining the credit landscape to foster a more transparent and user-friendly experience, allowing consumers to purchase now and pay later without hidden fees or compounding interest.
The Site Reliability Engineering (SRE) team at Affirm works with our engineering partners to maintain high operational standards and protect the customer experience. The team does this by establishing frameworks and best practices for application operations, developing tools, and offering training and consulting services. Key responsibilities of the SRE team include:
- Providing teams and leadership with data and insights on application performance
- Guiding the establishment of Service Level Objectives (SLOs)
- Managing the Incident Management and Analysis process
- Overseeing Change Management and Deployment practices
- Participating in service and architectural discussions
- Recommending observability and alerting settings
The SRE group brings together expertise across several domains, including:
- Infrastructure, platform, and distributed systems
- Capacity management, load testing, and chaos engineering
- Automation, observability, and configuration management
- Development and product experience
We are looking for driven software and systems engineers who can build, iterate on, and enhance incident lifecycle, reliability, and resilience practices throughout Affirm's engineering organization and beyond.
