About the job
About Ethos
Ethos is dedicated to revolutionizing the life insurance industry, making it faster and easier for families to secure coverage. By integrating industry knowledge with cutting-edge technology and a personal touch, we help individuals find the perfect policy to safeguard their loved ones.
Our innovative use of deep technology and data science streamlines the life insurance process, transforming what was once a lengthy, multi-week endeavor into a seamless digital experience that can be completed in mere minutes! Each month, we issue billions in coverage while dismantling traditional barriers, propelling the industry into a new age. Our comprehensive technology platform serves as the foundation for family financial health.
At Ethos, we strive to make life insurance more accessible, efficient, and beneficial for everyone.
Our esteemed investors include General Catalyst, Sequoia Capital, Accel Partners, Google Ventures, and SoftBank, as well as notable figures such as Jay-Z, Kevin Durant, and Robert Downey Jr. This year, we've been recognized in CB Insights' Global Insurtech 50 list and BuiltIn's Top 100 Midsize Companies in San Francisco. We are rapidly scaling our operations and are eager to welcome passionate individuals to help protect the next million families!
About the Role
We are developing several LLM-powered copilots across essential workflows, including underwriting productivity, agent enablement, customer support, operations/compliance, and fraud detection. We are seeking an AI engineer to take ownership of the LLM + retrieval + context layer, ensuring that these copilots are accurate, auditable, fast, and cost-efficient.
Typical technology stack: Python/FastAPI, Postgres plus a vector store (pgvector/Pinecone/Weaviate), OpenSearch, an optional graph DB, Kubernetes with GPUs, and OTEL/Datadog.
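For context, here is a minimal sketch of how the retrieval side of that stack can fit together (FastAPI serving a pgvector nearest-neighbor query). The `documents` table, the 384-dimension embeddings, the connection string, and the toy `embed()` stand-in are illustrative assumptions, not details from this posting:

```python
# Sketch only: FastAPI + Postgres/pgvector retrieval endpoint.
# Assumes a `documents(id, content, embedding vector(384))` table and the
# pgvector extension installed in the database.
import numpy as np
import psycopg
from fastapi import FastAPI
from pgvector.psycopg import register_vector
from pydantic import BaseModel

app = FastAPI()
conn = psycopg.connect("dbname=copilot")  # assumed DSN
register_vector(conn)  # lets psycopg send/receive pgvector values


def embed(text: str) -> np.ndarray:
    # Toy stand-in so the sketch runs end to end; in practice this would
    # call whatever embedding model the team standardizes on.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384).astype(np.float32)


class Query(BaseModel):
    question: str
    k: int = 5


@app.post("/retrieve")
def retrieve(q: Query):
    # Nearest-neighbor search by cosine distance (`<=>`) over the assumed table.
    vec = embed(q.question)
    rows = conn.execute(
        "SELECT id, content, embedding <=> %s AS distance "
        "FROM documents ORDER BY distance LIMIT %s",
        (vec, q.k),
    ).fetchall()
    return [{"id": r[0], "content": r[1], "distance": float(r[2])} for r in rows]
```

A production version would swap the toy embedding for a real model and layer hybrid search, reranking, grounding, and citations (see the responsibilities below) on top of an endpoint like this.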
Duties and Responsibilities:
- Production RAG: indexing, retrieval, hybrid search, reranking, query rewriting, grounding, citations (a minimal hybrid-retrieval sketch follows this list)
- Context Graph: entity resolution + linking + provenance; graph + vector retrieval; supports multi-hop context
- LLM orchestration: tool/function calling, structured outputs, routing across model tiers, failure modes (a model-tier routing sketch also follows this list)
- GPU/inference cost optimization: batching, caching/KV reuse, quantization, autoscaling; optimize $/session + latency
- Safety + compliance: PII/PHI handling, redaction, audit logs, deterministic replay, hallucination mitigation
- LLMOps: evaluation harness (golden ...
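To make the hybrid search, reranking, and citation pieces of the Production RAG bullet concrete, here is a small self-contained sketch: candidate lists from a vector index and a keyword index are merged with reciprocal rank fusion, and the top passages are numbered so a copilot's answer can cite its sources. The document ids and texts are placeholders, not Ethos data; a production system would pull candidates from pgvector/Pinecone and OpenSearch and typically add a learned reranker after fusion:

```python
# Sketch only: hybrid retrieval via reciprocal rank fusion + numbered citations.
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document ids (e.g., one from vector
    search, one from keyword/BM25 search) into a single fused ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def build_context(fused: list[tuple[str, float]], docs: dict[str, str], top_n: int = 4) -> str:
    """Assemble grounded prompt context with numbered citations so the
    model's answer can point back to its sources."""
    lines = []
    for i, (doc_id, _score) in enumerate(fused[:top_n], start=1):
        lines.append(f"[{i}] ({doc_id}) {docs[doc_id]}")
    return "\n".join(lines)


if __name__ == "__main__":
    vector_hits = ["doc_policy", "doc_faq", "doc_guideline"]       # e.g. from pgvector/Pinecone
    keyword_hits = ["doc_guideline", "doc_policy", "doc_memo"]     # e.g. from OpenSearch
    docs = {
        "doc_policy": "Sample policy document text.",
        "doc_faq": "Sample customer FAQ text.",
        "doc_guideline": "Sample underwriting guideline text.",
        "doc_memo": "Sample fraud-review memo text.",
    }
    fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
    print(build_context(fused, docs))
```

Reciprocal rank fusion is shown because it needs no score normalization across the two retrievers; a cross-encoder reranker would usually run on the fused candidates before context assembly.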
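Likewise, for the LLM orchestration bullet, a rough sketch of routing across model tiers with a single escalation on failure. The tier names, token thresholds, prices, and the `call_model()` stub are assumptions for illustration, not the actual models or vendors in use:

```python
# Sketch only: route cheap/simple prompts to a small model tier, escalate otherwise.
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    max_input_tokens: int
    cost_per_1k_tokens: float


TIERS = [
    Tier("small-model", max_input_tokens=4_000, cost_per_1k_tokens=0.001),
    Tier("large-model", max_input_tokens=32_000, cost_per_1k_tokens=0.01),
]


def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 chars per token); a real system would use the
    # model's tokenizer.
    return max(1, len(text) // 4)


def route(prompt: str, needs_tools: bool) -> Tier:
    """Send short, tool-free prompts to the cheap tier; escalate otherwise."""
    if needs_tools or estimate_tokens(prompt) > TIERS[0].max_input_tokens:
        return TIERS[-1]
    return TIERS[0]


def call_model(model: str, prompt: str) -> str:
    # Placeholder for the actual LLM client call; the posting does not
    # name a provider, so none is assumed here.
    return f"[{model}] response to: {prompt[:40]}"


def call_with_fallback(prompt: str, needs_tools: bool) -> str:
    """Try the routed tier first; on failure, escalate once and re-raise."""
    tier = route(prompt, needs_tools)
    try:
        return call_model(tier.name, prompt)
    except RuntimeError:
        if tier is not TIERS[-1]:
            return call_model(TIERS[-1].name, prompt)
        raise
```

The same routing point is where per-session cost and latency budgets from the GPU/inference bullet would be enforced.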

