About the job
At BetterUp, we believe in the power of human transformation, and our approach to the employer-employee relationship reflects that belief.
From the moment you engage with us, you will notice a distinct experience. It's not just about filling a position; it's about joining a mission-driven team.
Upon accepting an offer, you gain more than just a paycheck: you will receive a dedicated BetterUp Coach, a personalized development plan, and a supportive manager. You'll also be part of an extraordinary team, each member accompanied by their own BetterUp Coach, working on projects that make a real impact.
This unique environment fosters a focused and fulfilling work experience. While it may not be for everyone, for those who are passionate and driven, this role represents a transformative career opportunity.
Join us for an intense and rewarding journey, where you'll engage in meaningful work within a vibrant and creative culture.
If this resonates with you and the job description aligns with your skills, let’s start a conversation.
As a hybrid company, we emphasize in-person collaboration when necessary. Employees must be available to work from one of our office hubs a minimum of two days per week, totaling eight days per month. Our US hubs are Austin, TX; Chicago, IL; New York City, NY; San Francisco, CA; and the Washington, DC metro area. For roles based in Europe, our hubs are in London, UK, and Amsterdam, NL. Please ensure you can commit to this structure before applying.
Key Responsibilities:
Utilize AI-driven tools and automation to enhance monitoring, troubleshooting, and maintenance of production systems.
Develop and manage cloud infrastructure on AWS, employing Terraform for codifying and version-controlling our environments.
Oversee and scale Kubernetes clusters that support BetterUp's platform, ensuring optimal availability and performance.
Create intelligent alerting and observability frameworks.
Collaborate with engineering teams to integrate reliability into the development lifecycle, proactively addressing operational concerns.
Automate incident response processes and establish self-healing infrastructure.
Explore and implement cutting-edge AI tools for log analysis, anomaly detection, and predictive maintenance.
