Freelance AI Agent Evaluation Engineer

Mindrift — Hyderabad, Telangana, India
Remote, part-time, $12/hr

What We Look For

This opportunity is ideal for seasoned developers, software engineers, and test automation specialists seeking part-time, non-permanent projects. The ideal candidate will possess:

  • A degree in Computer Science, Software Engineering, or a related field.
  • Over 5 years of experience in software development, predominantly using Python (FastAPI, pytest, async/await, subprocess, file operations).
  • A background in full-stack development, particularly in creating React-based interfaces (JavaScript/TypeScript) and robust backend systems.
  • Experience in writing various kinds of tests (functional, integration), not just executing them.
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis).
  • An understanding of CI/CD processes (GitHub Actions as a user: triggers, labels, interpreting results).
  • Proficiency in English.

About the job

Please submit your CV in English and include your English proficiency level.

Mindrift offers project-based freelance roles for specialists interested in AI system evaluation. This contract position focuses on assessing and improving AI coding agents for technology clients. The role is remote, with a preference for candidates based in Hyderabad, Telangana, India. Please note: this is a freelance contract, not a permanent position.

Role overview

The Freelance AI Agent Evaluation Engineer works on building datasets to measure how well AI coding agents handle realistic developer tasks. The position centers on creating and refining simulated development environments and evaluating model performance in those settings.

What you will do

  • Set up virtual companies using detailed plans, including codebases, infrastructure, and supporting materials (documentation, tickets, conversations) to mirror real-world development environments.
  • Design and adapt tasks as these virtual companies evolve: write prompts, define fair evaluation criteria, and ensure tasks are solvable and judged objectively.
  • Create assignments within isolated environments that simulate a developer’s workstation, including a Linux machine with development tools, MCP servers (repository, task tracker, messenger, documentation), and a real web application codebase.
  • Develop tests that accept all valid solutions and reject incorrect ones, ensuring the tests are neither too strict nor too lenient (a short sketch follows this list).
  • Collaborate with an AI agent to check that tests catch real issues, avoid missing faulty solutions, and do not penalize correct ones.
  • Review agent-generated code, analyze agent performance, and design edge cases and adversarial scenarios to further challenge the models.
  • Incorporate feedback from expert QA reviewers to refine and improve your work to meet quality standards.
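To make the testing bullets concrete, here is a minimal sketch of the kind of behavior-focused pytest check this work involves. Everything in it is a hypothetical illustration, not Mindrift tooling: slugify and its module are assumed names. The point is that the test pins down observable behavior rather than one particular implementation, so any valid solution passes while faulty ones fail.

    # Hypothetical example: a behavior-based pytest check for an assumed
    # slugify() helper. It asserts required behavior (lowercase output,
    # hyphen-separated words, defined error handling) without dictating
    # how the function is implemented.
    import pytest

    from myproject.text import slugify  # assumed module under test

    @pytest.mark.parametrize("raw, expected", [
        ("Hello World", "hello-world"),
        ("Mixed CASE 123", "mixed-case-123"),
        ("  padded   input  ", "padded-input"),
    ])
    def test_slugify_accepts_any_valid_solution(raw, expected):
        # Any implementation producing the specified output should pass.
        assert slugify(raw) == expected

    def test_slugify_rejects_empty_input():
        # An overly lenient suite would skip edge cases like this one.
        with pytest.raises(ValueError):
            slugify("")

In practice, a check like this would then be run against known-good and deliberately broken solutions, including agent-generated ones, to confirm it accepts the former and rejects the latter, which is the verification loop described in the collaboration bullet above.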

What this role is not

  • This is not a data labeling position.
  • This is not prompt engineering.
  • You will not write code from scratch; the AI agent produces most of the code. The main focus is on guidance and evaluation.

Much of the work involves collaborating closely with AI systems. Creating tasks that challenge advanced models requires direct interaction with these agents.

About Mindrift

Mindrift specializes in connecting talent with innovative AI projects across the globe, providing opportunities for professionals to work alongside leading tech companies to enhance AI systems.
