About the job
Join General Intelligence Company as an Applied AI Engineer and work at the forefront of our Cofounder agent technology. You'll own critical backend systems and lead applied large language model (LLM) projects focused on improving agent reliability and autonomy. The role centers on building evaluation pipelines and implementing techniques that measurably boost agent performance. This is a hands-on position with high ownership from research to production: prototyping, evaluating, and deploying improvements that directly affect the user experience.
Your Responsibilities
Architect and deliver comprehensive agent enhancements: prompting strategies, tool selection, action planning, memory optimization, safety measures, and failure recovery.
Develop robust evaluation frameworks for the agent: offline evaluations (golden tasks, regression tests, behavior assessments), online metrics (latency, success rates, failure modes, cost efficiency), and experimentation methods (A/B testing, canaries, guardrail thresholds).
Apply LLM techniques: function/tool-call orchestration, self-reflection, retrieval-augmented generation (RAG), multi-agent handoffs, caching and embedding strategies, and hallucination reduction.
Enhance core backend systems: reliable job orchestration with retries/backoff, idempotency, and auditability; scalable memory and context routing; data pipelines across integrations such as Gmail, Slack, Notion, Linear, and Google Workspace; and observability and tracing for agent actions and outcomes.
Collaborate with product and infrastructure teams to define success metrics and deliver rapid, safe iterations.
Produce clean, thoroughly tested code; document design choices and operational guidelines.
Qualifications
A minimum of 4 years of backend engineering experience, ideally with Python (we prioritize impact over tenure).
Hands-on experience with LLMs: prompt engineering, function calling, retrieval techniques, embeddings, and evaluation frameworks; demonstrated experience shipping LLM features to production.
Proven track record of building evaluation harnesses and using them to drive improvements (regression suites, task success metrics, cost/runtime trade-offs).
Strong understanding of distributed systems principles: concurrency, reliability, performance optimization, data modeling, and lifecycle management.
Practical approach to experimentation: from hypothesis to prototype to measured improvement to rollout.
Exceptional debugging and instrumentation skills; a passion for identifying and resolving edge cases in real-world scenarios.
Preferred Qualifications
Experience with additional programming languages and technologies.
