Intermediate Software Engineer - Reliability/SRE

Electrum, Cape Town, Western Cape, South Africa
On-site, Full-time


Experience Level

Mid to Senior

Qualifications

  • Proficiency in software engineering principles and practices.
  • Experience with reliability engineering and SRE principles.
  • Strong understanding of observability tools and frameworks.
  • Familiarity with CI/CD processes and deployment automation.
  • Excellent problem-solving skills and an analytical mindset.
  • Ability to work collaboratively in cross-functional teams.

About the job

Electrum is a pioneering payment software technology firm.

Since our inception in 2012, we have consistently provided trusted, enterprise-grade, cloud-native solutions to enhance financial transaction processing. Our extensive expertise has positioned us as a reputable partner in high-volume, low-value payment schemes, enabling our clients to deliver services to millions of South Africans every day.

At Electrum, our mission is driven by impact – we prioritize designing solutions that matter, acting with urgency, and fostering continuous learning as we scale. We stand by the principle of collaboration – working closely with our clients and teams to create meaningful, sustainable solutions. Safety is paramount; we promote open communication, smart risk-taking, and trust, ensuring that creativity and alignment can flourish. We believe in empowering strong teams – we hire exceptional talent, collaborate vigorously, and hold one another to high standards while leading with empathy and kindness.

The Role

As a Core Reliability Engineer, you will act as a central enabler for our software teams. Your responsibilities will include defining standards, implementing observability tools, and establishing automation frameworks that empower our product teams to manage their service health independently.

In our unique FinTech environment, reliability transcends mere server uptime; it encompasses the processing of high-volume, impactful financial transactions where even a single dropped message can have significant real-world implications. We seek an innovative systems thinker eager to tackle challenging industry problems, architect solutions for scalability while ensuring reliability, and help us set new benchmarks for reliability in payments.

Your primary objective will be to make building reliable software straightforward, and to ensure that we are alerted to failures before our clients notice them.

Responsibilities

Enablement & RelOps Culture

  • Implement the Observability Ladder: Guide teams from basic monitoring to advanced metric tracking. Collaborate with product teams to define SLAs, SLIs, and SLOs, while creating dashboards that monitor error budgets effectively.
  • Empower Product Teams: Develop frameworks and deployment tools (e.g., CI/CD, internal tool integrations) that enable teams to make informed, data-driven decisions regarding deployment safety, and automate rollbacks when error budgets are exhausted.
  • Champion Reliability: Foster a blameless post-mortem culture focused on actionable insights, system enhancements, and quantifiable metrics (MTBF, MTTR).

Frameworks & Automation

  • Standardised Alerting & On-Call: Continuously refine our company-wide alerting and on-call frameworks to minimize alert fatigue and ensure clarity when alerts are triggered.

About Electrum

Electrum is a forward-thinking payment software technology company that has established itself as a trusted partner in the financial sector since 2012. We focus on delivering high-quality, cloud-native solutions that optimize transaction processes for millions of users daily.
