About the job
About the Team You'll Join
- The ML Platform Team Leader will oversee the pivotal ML Platform team responsible for the enterprise's AI/ML development and operational environment. This team comprises the 'ML Ops Part', which builds and operates a stable MLOps environment based on Kubernetes, and the 'LLM Part', which leads the development of cutting-edge LLM applications along with the necessary LLM Ops environment.
- Upon joining, you will oversee both Parts, establishing and executing a technical strategy that provides a scalable and efficient platform, allowing data scientists and ML engineers to focus solely on model development and experimentation. You will play a crucial role in presenting the vision for the ML Platform, fostering team growth, and solidifying the foundation of the company's AI technology capabilities.
Your Responsibilities
- Establish and lead the technical vision and long-term roadmap for the ML Platform team (ML Ops Part, LLM Part).
- Mentor team members, conduct code reviews, and manage performance to drive team growth.
- Design and oversee the operation of a scalable and reliable ML Platform (model training, deployment, serving, monitoring) based on Kubernetes.
- Lead the technical development of various LLM applications and establish the LLM Ops environment for fine-tuning, serving, and evaluating large-scale models.
- Collaborate closely with cross-functional teams, including data scientists, ML engineers, and product teams, to resolve bottlenecks in AI/ML development and maximize developer experience (DX).
- Continuously research the latest MLOps and LLM technology trends, design the architecture of the enterprise AI tech stack, and provide technical direction.
- Define and continuously improve metrics for platform stability, cost efficiency, and performance optimization.
Ideal Candidate
- We are looking for someone with proven experience in leading engineering teams (ML, Infrastructure, Platform, etc.), successfully presenting technical visions, establishing roadmaps, and nurturing team members.
- Deep practical experience in building and operating MLOps platforms (e.g., Kubeflow, MLflow, CI/CD, model serving) in a Kubernetes environment is essential.
- A strong understanding of, and practical experience in, developing LLM applications (RAG, fine-tuning, agents, etc.) or LLM Ops (large-scale model serving, Vector DB, evaluation pipelines) is required.
- Experience in designing and optimizing system architecture capable of handling large-scale traffic and data is preferred.
- You should be able to define complex technical problems, communicate clearly with various stakeholders, and solve issues strategically.
- A passion for designing and advancing the platform with business impact and developer experience (DX) in mind, rather than adopting technology for its own sake, is essential.
Resume Recommendations
- Detail your contributions as a leader of ML platform or system projects (technical leadership, architecture design, team management, strategic planning).
- For each project, clearly outline how you addressed technical and organizational challenges, including your proposed solutions (architecture, adopted technologies), how you collaborated with team members and stakeholders, and the final results (platform performance improvements, faster development, cost reductions).
- Instead of simply listing tasks, include insights and learnings gained as a technical leader, along with the criteria for making technical trade-offs.
- Emphasize quantifiable achievements that you have led.
- If you have successfully built or operated large-scale systems by collaborating with multiple engineering teams to resolve complex technical problems, please share that experience.
Journey to Joining Toss Bank
- Application submission > Job interview > Cultural fit interview > Leadership interview > Reference check > Compensation negotiation > Final acceptance
Please Note
- Any false information found in the resume or submitted documents may lead to disqualification.
