About the job
Rithum™ is recognized as the leading commerce network globally, streamlining collaboration between brands, suppliers, and retailers to provide exceptional e-commerce experiences. Our unparalleled platform empowers brands and retailers to boost growth, refine channel operations, expand product ranges, and enhance margins.
More than 40,000 businesses use Rithum to elevate their operations across numerous channels, generating more than $50 billion in annual GMV. We equip our clients with commerce, marketing, and delivery solutions that optimize consumer shopping journeys from start to finish.
Overview
The Database Reliability Engineering (DBRE) team at Rithum is dedicated to ensuring the availability, reliability, and observability of all database systems. Our team emphasizes automation to minimize manual tasks and constantly seeks to enhance our processes. We manage and optimize a large-scale SQL Server environment that spans hundreds of instances across a hybrid infrastructure (on-prem VMware and AWS), in addition to various relational and NoSQL database platforms including MongoDB, DynamoDB, Elasticsearch, MySQL, Postgres, and Redis. These systems are integral to all business operations. Our DBRE team is built on a strong foundation of curiosity, transparency, collaboration, and a commitment to continuous learning.
As a Senior Database Reliability Engineer, you will be expected to embody these values and promote them among your colleagues. You will manage a diverse array of database systems and spearhead your own projects with a highly technical focus.
Responsibilities
- Ensure maximum availability and reliability of mission-critical database systems across hybrid infrastructure.
- Design, implement, and maintain SQL Server Always On availability groups, clustering, and replication topologies.
- Lead major database upgrade and modernization efforts, and support other engineers and teams in using database systems.
- Consistently improve observability through telemetry, performance analysis, and proactive monitoring.
- Enhance operational processes through automation, utilizing PowerShell, Python, and CI/CD tools.
- Ensure the security and protection of all data.
- Participate in our on-call rotation.
- Diagnose and fine-tune high-load production systems, including complex performance and replication challenges.
- Lead technical responses during high-severity incidents.

