About the job
Full-time | Remote
About Fastino:
At Fastino, we are pioneering the future of Large Language Models (LLMs). Our team includes distinguished alumni from renowned institutions such as Google Research, Apple, Stanford, and Cambridge, united in our goal to create specialized and efficient AI solutions.
Our GLiNER family of open-source models has been downloaded over 5 million times and is used by industry giants such as NVIDIA, Meta, and Airbnb.
We have secured $25 million in seed funding, as highlighted by TechCrunch, with support from prominent investors like Microsoft, Khosla Ventures, Insight Partners, and the CEOs of GitHub and Docker.
Key Responsibilities:
Lead the innovation of high-performance agentic systems by designing and deploying solutions that utilize Fastino’s optimized model architectures to surpass conventional LLM benchmarks.
Collaborate closely with engineering teams to seamlessly transition research breakthroughs into scalable, low-latency applications for enterprise clients.
Rapidly prototype AI capabilities, continuously refining model accuracy and performance based on real-world telemetry so that they meet developer standards.
Ensure the stability and efficiency of inference pipelines, proactively addressing scalability challenges to maintain consistent model performance under heavy operational demands.
Develop large-scale data strategies and fine-tuning methodologies to enhance the accuracy and domain-specific application of Fastino models.
Qualifications:
Minimum of 2 years of practical experience in AI/ML engineering roles.
Proven expertise with LLMs and a successful history of applying AI/ML methodologies to resolve complex, unstructured challenges.
Ability to work across the technology stack, from prompt engineering to Kubernetes deployment and API design.
Experience in building microservices that manage high-concurrency workloads is a plus.
Familiarity with GLiNER or similar information extraction architectures is preferred.

