Position has been filled

Freelance AI Evaluation Engineer

toloka-ai
Remote (Lyon, Auvergne-Rhône-Alpes, France)
Remote · Contract · $50/hr

Qualifications

This role is ideal for seasoned developers, software engineers, or test automation specialists seeking part-time, non-permanent projects. We look for individuals with:

  • A degree in Computer Science, Software Engineering, or a related discipline.
  • Over 5 years of experience in software development, primarily using Python (including FastAPI, pytest, async/await, subprocess, file operations).
  • A strong background in full-stack development, with proficiency in building React-based interfaces (JavaScript/TypeScript) and robust back-end systems.
  • Experience in writing tests (both functional and integration), not merely executing them.
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis).
  • Understanding of CI/CD processes (GitHub Actions as a user: triggers, labels, result interpretation).
  • English proficiency at a conversational level or higher.

About the role

Please submit your CV in English and indicate your English proficiency level.

toloka-ai, working with Mindrift, offers freelance, project-based roles that connect experienced professionals to AI-driven projects for major technology companies. This contract position is not permanent employment.

Role overview

This freelance AI Evaluation Engineer role centers on building datasets to assess AI-powered coding agents in realistic software development scenarios. The work involves designing complex tasks and assessment criteria that reflect actual development workflows, all within simulated environments.

Main responsibilities

  • Create virtual companies using a defined strategy. Develop a codebase, infrastructure, and contextual materials, such as documentation, conversations, and tickets, that mirror real development histories.
  • Design and configure tasks at various stages of the virtual company. Write prompts, set evaluation standards, and ensure tasks are solvable and fairly assessed.
  • Set up tasks in isolated environments that simulate a developer's workstation. These environments include a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation), and a real web application codebase.
  • Develop tests that reliably accept all correct solutions and reject incorrect ones, balancing strictness with flexibility.
  • Collaborate with an AI agent to test and validate assessments, confirming the agent can spot genuine issues and does not overlook valid solutions.
  • Review code generated by AI agents, analyze both successes and failures, and create edge cases or challenging scenarios to further evaluate capabilities.
  • Incorporate feedback from expert QA reviewers to refine tasks and assessments, aligning with quality benchmarks.

What this role is not

  • This is not a data labeling job.
  • This is not a prompt engineering position.
  • This does not require writing code from scratch. The AI agent handles most coding tasks; your focus is on guidance and evaluation.

Much of the work involves direct collaboration with advanced AI systems. Designing meaningful challenges for these models requires hands-on interaction with them.

About toloka-ai

Mindrift connects skilled professionals with project-based opportunities in artificial intelligence for leading tech firms, emphasizing the evaluation and enhancement of AI systems. We offer a dynamic environment for developers to engage with cutting-edge technologies and contribute to the advancement of AI capabilities.
