Freelance AI Evaluation Engineer

Mindrift (Remote; Stuttgart, Baden-Württemberg, Germany)
Remote Contract, $50/hr

Experience Level

Mid to Senior

Qualifications

This role is well-suited for experienced developers, software engineers, or test automation specialists seeking part-time, non-permanent project engagements. Ideal candidates will possess:

  • A degree in Computer Science, Software Engineering, or a related discipline.
  • Over 5 years of experience in software development, predominantly in Python (experience with FastAPI, pytest, async/await, subprocess, file operations).
  • A background in full-stack development, including experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems.
  • Proficiency in writing tests (functional and integration), not merely executing them.
  • Experience with Docker containers and familiarity with infrastructure tools (Postgres, Kafka, Redis).
  • An understanding of CI/CD processes (specifically GitHub Actions: triggers, labels, and result interpretation).
  • English proficiency at a professional level.

About the job

Please submit your CV in English and indicate your English proficiency level.

Mindrift connects experienced specialists with project-based AI work for technology companies. Assignments focus on testing, evaluating, and improving AI systems. This freelance, project-based position does not offer permanent employment.

Role overview

As a Freelance AI Evaluation Engineer, the primary focus is building a dataset to assess AI coding agents using real-world developer tasks. The work involves designing detailed tasks and evaluation methods in realistic simulated environments.

Main responsibilities

  • Create virtual companies from high-level plans, including codebases, infrastructure, and realistic context such as conversations, documentation, and tickets that reflect authentic development history.
  • Develop and refine tasks for different stages of the virtual company. This includes writing prompts, setting evaluation criteria, and ensuring tasks are solvable and assessments are fair.
  • Design assignments for isolated environments that mimic a developer's workstation, using a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation), and a real web application codebase.
  • Build tests that accept all valid solutions and reject incorrect ones, aiming for balanced strictness.
  • Work with an AI agent to confirm that tests detect real issues, do not overlook errors, and validate correct solutions.
  • Review code generated by agents, analyze why solutions succeed or fail, and invent edge cases and adversarial scenarios.
  • Incorporate feedback from expert QA reviewers to improve your work and meet quality standards.

Scope clarifications

  • This position does not include data labeling.
  • This position does not cover prompt engineering.
  • Writing code from scratch is not required. The AI agent handles most coding; your focus is on guidance and evaluation.

Much of the work involves collaborating directly with AI systems, as designing challenges for advanced models requires hands-on interaction with those models.

About Mindrift

Mindrift specializes in connecting skilled professionals with cutting-edge AI projects, focusing on enhancing and evaluating AI systems for prominent technology firms. We prioritize collaboration and innovation, making us a valuable partner in the evolving landscape of artificial intelligence.
