About the job
Please submit your CV in English and indicate your level of English proficiency.
At Mindrift, we specialize in connecting talented professionals with innovative, project-based AI opportunities from leading tech companies, focusing on the evaluation, testing, and enhancement of AI systems. This role is project-based and operates under a freelance agreement, which does not establish an employment relationship with Toloka or our clients.
Role Overview
As a Freelance Agent Evaluation Engineer, you will design intricate coding test cases that push AI coding systems to their limits. You will:
- Critically assess and enhance realistic coding tasks grounded in actual production codebases, considering realistic parameters and information sources.
- Develop comprehensive functional tests that validate true end-to-end functionality and edge cases, moving beyond basic checks.
- Design “fair but challenging” problems where the AI has all the necessary context but must piece together information from various files and external sources through complex reasoning.
- Evaluate AI failures to discern the model's strengths and weaknesses.
- Iterate on your work based on evaluations from expert QA reviewers who assess your contributions against seven quality criteria.
Qualifications
This role suits experienced developers, software engineers, and test automation specialists who are open to part-time, non-permanent projects. Ideal candidates will have:
- A degree in Computer Science, Software Engineering, or a related discipline.
- 5+ years of experience in software development, predominantly in Python (including pytest, async/await, subprocess, and file operations).
- A strong background in full-stack development, with equal expertise in building React-based interfaces and robust back-end systems.
- Proficiency in writing tests (functional and integration), not just executing them.
- Experience with Docker containers (for running evaluations locally).
- Understanding of CI/CD processes, particularly with GitHub Actions (triggers, labels, and result analysis).
- English proficiency at B2 level or higher.
Application Process
Submit your application → complete qualification assessments → select a project → manage tasks at your convenience within project deadlines → receive payment for your contributions.
Project Time Expectations
Project tasks are estimated to take around 20 hours to complete, depending on complexity. This is only an estimate; you set your own schedule. All tasks must be submitted by their deadlines and meet the acceptance criteria to be approved.
Compensation
- Freelance contributions are compensated at rates of up to $80/hour* (project and task-based).
- Compensation may be fixed per project or vary by task.
- Some projects may offer incentive payments.
*Note: Rates may vary based on expertise, skills assessment, location, and experience.

