About the job
At Crusoe, we are on a mission to accelerate the abundance of energy and intelligence, creating the driving force behind a world where individuals can ambitiously innovate with AI without compromising on scale, speed, or sustainability.
Join us in the AI revolution powered by sustainable technology at Crusoe. Here, you will foster meaningful innovation, make a significant impact, and be part of a team that leads the way in responsible and transformative cloud infrastructure.
Role Overview:
As a Senior Cloud Support Engineer, you will be instrumental in the transformation of high-performance computing through the provision of sustainable and cost-effective GPU compute power. Your role will empower our customers to harness this technology for pioneering developments in areas such as AI/ML, physics simulations, and computational biology. Acting as the primary technical support contact, you will ensure that our customers can effortlessly utilize Crusoe Cloud to reach their objectives. This position is vital to Crusoe's mission, facilitating our customers' research and development efforts and contributing to a sustainable future. You will engage in exciting projects, collaborate with a talented team, and tackle complex challenges using cutting-edge technologies.
We are seeking a highly motivated and experienced technical professional with a strong commitment to customer success, a comprehensive understanding of cloud technologies, and alignment with Crusoe's core values. This is a full-time position.
Key Responsibilities:
Customer Support: Deliver outstanding technical support to customers via Zendesk, adhering to SLAs and maintaining a high customer satisfaction score (CSAT of 95% or greater).
On-Call Rotation: Participate in a 24/7 on-call rotation to promptly address critical issues.
Troubleshooting: Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools.
Alert Management: Oversee alert triage, prepare for maintenance windows, and conduct node delivery testing.
Collaboration: Work closely with SRE, Networking, and Storage teams from initial triage through root cause analysis (RCA) delivery.
Global Collaboration: Follow established global team collaboration and handoff procedures for ticketing and on-call management.
Knowledge Development: Create onboarding materials, knowledge base documentation, and standard operating procedures (SOPs).

