About the job
At Judgment Labs, we are building Agent Behavior Monitoring (ABM) infrastructure. Unlike conventional observability, which focuses primarily on logging exceptions and latency, ABM surfaces behavioral anomalies, such as instruction drift and context retrieval loss, in production environments at scale.
Many teams building autonomous agents rely on Judgment to understand how their systems perform after deployment. Our approach lets them aggregate patterns across conversations and workflows, correlate regressions with specific interaction types, and pinpoint where reliability breaks down in their own usage context.
We recently raised over $30 million across two rounds within five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, and Nova Global, as well as individuals such as Chris Manning, Michael Ovitz, and Kevin Hartz.
Your Role:
As a Forward Deployed AI Engineer at Judgment Labs, you'll integrate our ABM infrastructure into customer production systems. This hands-on role involves working directly within customer codebases to build monitoring and evaluation into their real-world agent workflows, diagnosing failures in live environments, and making sure deployments hold up reliably in production.
This position emphasizes deep technical execution combined with customer ownership. You will collaborate closely with customer teams to analyze agent behavior, convert high-level objectives into actionable ABM deployments, and take full responsibility for outcomes across live production settings. The scope and autonomy in this role offer a unique training ground for those aspiring to lead or found a technical company.
Your Responsibilities:
Deploy and embed Judgment Labs’ ABM platform and AI components into customer codebases and production AI systems.
Integrate monitoring, evaluation, and agent-facing components into real workflows within customer systems.
Advise customers on technical decisions about agent monitoring and evaluation strategy, ensuring smooth integration with existing production systems.
Manage multiple customer engagements from start to finish, ensuring the successful integration and sustained adoption of monitoring and evaluation systems within production agent workflows.

