About the job
Join DigitalOcean and take your career to new heights while collaborating with a vibrant community of exceptional talent dedicated to simplifying the cloud experience. If you possess a growth mindset, are inspired by bold thinking, and thrive in a dynamic environment where innovation disrupts the status quo, you’ll find a welcoming home here. We cherish success as a team and are committed to learning, enjoying our work, and significantly impacting the aspirations of visionaries and creators worldwide.
Scope and Mission:
- Lead the design and operation of the Gradient AI platform, ensuring a seamless and innovative agent development process characterized by top-tier scalability, performance, and reliability.
- Influence architectural vision, uphold technical excellence, and foster innovation across backend systems and user-facing interactions.
Key Responsibilities:
- Architect and Build
  - Shape and advance the architecture for our agent development experience, including code integration, evaluations, observability, tools, and cross-agent collaboration.
  - Drive projects aimed at creating an architecture that excels in scalability, reliability, and cost efficiency.
  - Oversee and enhance our benchmarking system to continuously elevate our platform experience.
  - Lead the rollout of new services, taking on a hands-on role as necessary to achieve timely delivery.
- Technical Leadership
  - Establish and maintain technical standards, coding practices, tooling, and infrastructure guidelines across AI/ML engineering teams.
  - Promote best practices for design, testing, deployment, instrumentation, and performance optimization.
  - Mentor senior engineers, cultivating a culture of architectural integrity and operational excellence within the team.
- Cross-functional Collaboration
  - Collaborate with product managers, stakeholders, and business leaders to translate strategic goals into scalable technical roadmaps.
  - Assist customer-facing teams (e.g., consultants, support, sales engineers) in shaping AI modernization initiatives through agents.
- Reliability, Performance & Scaling
  - Lead operational excellence for our agent development platform, establishing scalable mechanisms and processes throughout the engineering organization.

