About the job
Contribute to a Safer World.
At TRM Labs, we leverage blockchain analytics and artificial intelligence to empower law enforcement, national security agencies, financial institutions, and cryptocurrency enterprises in the fight against crypto-related fraud and financial crime. Our advanced blockchain intelligence and AI platforms are designed to trace transactions, identify illicit activities, build investigative cases, and establish a comprehensive view of potential threats. Trusted by leading organizations worldwide, TRM is committed to fostering a safer, more secure environment for everyone.
The AI Engineering Team is dedicated to driving the development of next-generation AI applications, specifically focusing on Large Language Models (LLMs) and agentic systems. Our mission is to create resilient pipelines, high-performance infrastructure, and operational tools that facilitate the swift, safe, and scalable deployment of AI systems.
We manage extensive petabyte-scale data pipelines, deliver model outputs with millisecond-level latency, and ensure observability and governance to make AI production-ready. Our team actively evaluates and integrates state-of-the-art tools in the LLM and agent domain, such as open-source stacks, vector databases, evaluation frameworks, and orchestration tools, enhancing TRM's ability to innovate faster than the competition.
As a Senior MLOps Engineer specializing in LLMOps, you will play a pivotal role in building and scaling the technical infrastructure required for AI and ML systems. Responsibilities include:
Develop reusable CI/CD workflows for model training, evaluation, and deployment, incorporating tools such as Langfuse and GitHub Actions alongside experiment tracking.
Automate model versioning, approval processes, and compliance checks across various environments.
Construct a modular and scalable AI infrastructure stack, integrating vector databases, feature stores, model registries, and observability tools.
Collaborate with engineering and data science teams to integrate AI models and agents into real-time applications and workflows.
Regularly assess and incorporate cutting-edge AI tools (e.g., LangChain, LlamaIndex, vLLM, MLflow, BentoML).
Enhance AI reliability and governance, promoting experimentation while ensuring compliance, security, and system uptime.
Optimize AI/ML model performance by ensuring data accuracy, consistency, and reliability to improve training and inference processes.
Deploy infrastructure that supports both offline and online LLM evaluations.

