About the role
At Braze, we pride ourselves on our exceptional team—a group that is approachable, kind, and profoundly passionate about our work.
We strive to ignite that passion through high standards, collaborative teamwork, and a commitment to work-life balance as we navigate rapid global growth while promoting equity and opportunity within and outside our organization.
To thrive here, you need to hold yourself and your colleagues to high standards. Everyone has an opportunity to contribute: autonomy, accountability, and openness to new perspectives are crucial to our ongoing success.
Our deep curiosity and willingness to share diverse interests infuse our culture with vibrancy and balance.
If you are eager to tackle stimulating challenges and possess a proactive mindset in the face of change, you'll have the chance to make a significant impact here, supported by a talented and passionate team. If Braze resonates with you, we look forward to meeting you.
WHAT YOU'LL DO
As a Platform Software Engineer (PSWE), you will design and build distributed systems that power Braze's extensive background processing platform. We manage Sidekiq at Braze, where it handles over a trillion jobs daily across Kubernetes clusters around the world. Your work will encompass autoscaling systems, metrics pipelines, reliable job execution, and internal frameworks that make distributed processing safer for application teams.
Our operations are vast: we cater to 3.3 billion monthly active users, process hundreds of billions of data points monthly, and send billions of messages every day. Our tech stack includes Ruby on Rails, Go, MongoDB, Redis, and Kafka. As a PSWE, you'll collaborate with application teams to advance the Sidekiq platform they rely on, enhancing reliability, performance, and developer experience.
Main responsibilities:
- Develop Braze’s embedded frameworks for large-scale distributed processing.
- Design, build, and operate internal software frameworks that support Braze’s asynchronous and background processing systems at massive scale.
- Evolve and extend frameworks based on technologies like Sidekiq to reliably execute over a trillion jobs per day across a globally distributed platform.
- Oversee scaling behavior, reliability guarantees, failure modes, and operational safety of these systems.
- Provide opinionated abstractions, tooling, and guardrails that empower application teams to use distributed processing safely without managing the underlying complexity.
- Enhance observability, debuggability, and maintainability of our systems.

