
Freelance AI Evaluation Engineer

toloka-ai — Remote, Queensland, Australia
Remote contract — A$45/hr


Experience Level

Mid to Senior

Qualifications

Ideal Candidate Profile

This opportunity is well-suited for seasoned developers, software engineers, and test automation specialists seeking part-time, non-permanent projects. Candidates should ideally possess:

  • A degree in Computer Science, Software Engineering, or a related discipline.
  • 5+ years of experience in software development, predominantly in Python (FastAPI, pytest, async/await, subprocess, file operations).
  • A strong foundation in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems.
  • Experience writing tests (functional, integration — beyond merely executing them).
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis).
  • An understanding of CI/CD processes (GitHub Actions as a user: triggers, labels, interpreting results).
  • Proficiency in English...

About the job

Please submit your CV in English and include your English proficiency level.

This project-based contract with toloka-ai connects engineers to AI evaluation work for leading technology companies. The focus is on testing, assessment, and improvement of AI systems. This is not a permanent position.

Role overview

The Freelance AI Evaluation Engineer builds realistic virtual companies, complete with codebases, infrastructure, and supporting context such as documentation, conversations, and tickets. These environments simulate authentic development settings for AI systems to operate within.

What you will do

  • Design simulated companies based on strategic plans, including the creation of codebases and infrastructure.
  • Develop and refine tasks within these environments, setting clear prompts and evaluation metrics to ensure tasks are solvable and fairly assessed.
  • Set up isolated developer workstations, configuring Linux machines with development tools, repositories, task trackers, messaging platforms, and real web application codebases.
  • Create tests that accurately accept all valid solutions and reject incorrect ones, maintaining a careful balance to avoid blocking correct approaches or allowing flawed ones.
  • Iterate with AI agents during testing, ensuring they identify real issues, avoid missing mistakes, and do not incorrectly flag correct solutions.
  • Review AI-generated code, analyze agent performance, and design edge cases and adversarial scenarios to strengthen evaluation processes.
  • Incorporate feedback from expert QA reviewers, refining deliverables to meet quality standards.
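To illustrate the test-design duty above, here is a minimal, hypothetical pytest sketch (not part of the listing) of a test that accepts any valid solution while rejecting incorrect ones. It assumes a stand-in function `solution_under_test` representing AI-generated code, and asserts on properties of the output rather than on one hard-coded approach:

```python
import pytest

def solution_under_test(items):
    # Hypothetical placeholder for the AI-generated code being evaluated.
    return sorted(items)

@pytest.mark.parametrize("items", [
    [],
    [3, 1, 2],
    [5, 5, 1],
    [-2, 0, 7, -2],
])
def test_accepts_any_correct_sort(items):
    result = solution_under_test(items)
    # Check the defining properties of a correct answer, so any valid
    # implementation passes and any flawed one fails.
    assert result == sorted(items)
    assert len(result) == len(items)
```

Asserting on properties instead of implementation details is one way to avoid blocking correct approaches while still catching flawed ones.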

What this role does not include

  • Data labeling
  • Prompt engineering
  • Writing code from scratch (the AI agent handles most code generation; your focus is on guidance and evaluation)

This role centers on collaborating with advanced AI systems. Much of the work involves designing and refining tasks that challenge these models, requiring close interaction with AI agents throughout the process.

About toloka-ai

Through its Mindrift platform, toloka-ai connects specialists to exciting AI projects within leading tech companies. Focused on enhancing AI systems, the platform enables professionals to engage in meaningful project-based work.
