Toss Securities

Machine Learning Engineer - Platform

On-site, Full-time



About the job

About the Team You'll Join

  • The ML Engineer (Platform) at Toss Securities is part of the ML Platform Team within the Product Division.
  • The ML Platform Team's mission is to build the best possible machine learning platform, one that lets the many AI/ML services at Toss Securities be developed and operated efficiently and reliably.

 

Your Responsibilities

Develop and enhance the Gateway system, the entry point for ML services.

  • Develop and operate a FastAPI-based Gateway system that handles enterprise-level LLM API requests.
  • Design and implement authentication, routing, traffic control, fault isolation (circuit breakers, fallbacks), high-TPS request handling, and load-balancing strategies for the FastAPI-based Gateway, from both the application and the infrastructure perspective.

Manage and serve ML services.

  • Directly operate a machine learning model serving system in a Kubernetes environment.
  • Design and improve the LLM serving architecture so that it runs stably under heavy traffic.
  • Monitor latency, error rates, and resource usage for models in production, and analyze and resolve operational issues.
  • Identify the root causes of failures and make structural improvements to operational policies and architecture.


Develop and manage a company-wide ML platform.

  • Develop and manage a Kubeflow-based common platform for efficiently training and serving internal ML/LLM models.
  • Continuously monitor and optimize the performance and resources of workloads executed on the platform.

Build infrastructure for LLM-based services.

  • Operate LLM services using various serving frameworks such as vLLM, SGLang, and Triton.
  • Manage the environment so that training and serving workloads run stably on high-performance GPU clusters such as H100 and B300.
  • Build and operate a large-scale training environment for finance-domain LLMs.

 

We Are Looking for Candidates Who:

  • Are proficient in one or more programming languages such as Python, Go, Java, or Kotlin, and have experience designing and developing API servers in production environments.
  • Have experience developing or operating API Gateways (Nginx, Kong, etc.) or LLM Routers (LiteLLM, Envoy AI Gateway, etc.), with a background in handling high-volume traffic and incident response.
  • Have experience operating serving-log and event pipelines built on Kafka, Elasticsearch, and Kibana.
  • Have experience defining monitoring metrics for model serving and configuring and operating dashboards using Prometheus and Grafana.
  • Have experience operating ML/LLM model serving using KServe, BentoML, vLLM, SGLang, etc.
  • Have hands-on experience managing MLOps components (Kubeflow, KServe, Airflow, Argo CD, MLflow, etc.) on Kubernetes, including debugging and resolving issues.
  • Go beyond short-term fixes for issues that arise in service operations, using root-cause analysis to design and apply long-term improvements.

 

Additional Preferred Experience:

  • Experience in related fields or technologies will be a plus.

About Toss Securities

Toss Securities is a leading financial technology company in South Korea, focusing on innovative solutions in the fintech space. Our teams are dedicated to developing cutting-edge AI and machine learning technologies to enhance financial services for our users.
