About the job
Please submit your CV in English and indicate your language proficiency.
Mindrift connects skilled professionals with project-based AI roles at leading technology companies. This freelance position is remote, open to candidates based in Pretoria, Gauteng, South Africa, and does not constitute permanent employment.
Role overview
The Freelance AI Agent Evaluation Engineer will help build a dataset to assess AI coding agents. The main focus is evaluating how these agents perform on practical developer tasks. This involves designing complex assignments and creating fair evaluation criteria within simulated environments that reflect real-life development settings.
Main responsibilities
- Create virtual companies according to a strategic plan, including setting up codebases, infrastructure, and realistic context such as conversations, documentation, and tickets to simulate a development history.
- Develop and refine tasks based on the evolving state of these virtual companies. Draft prompts, define evaluation criteria, and ensure tasks are solvable and fairly assessed.
- Design assignments for isolated environments that mimic a developer’s workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation), and a real web application codebase.
- Write tests that accept all valid solutions and reject incorrect ones. Find the right balance between strictness and leniency to ensure good approaches are not penalized and weak solutions do not pass.
- Work with AI agents on test cases, making sure tests uncover genuine issues, do not miss faulty solutions, and properly validate successful ones.
- Review code produced by AI agents, analyze reasons for success or failure, and design edge cases and adversarial scenarios.
- Iterate on your work based on feedback from expert QA reviewers who check your output against quality standards.
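To illustrate the test-balance point in the responsibilities above, here is a minimal, hypothetical sketch in Python. The task name (`slugify`), the reference solution, and the checks are all invented for illustration; the idea is that property-style assertions accept any correct implementation while still rejecting weak ones that miss edge cases.

```python
# Hypothetical example: behavioral tests for a "slugify" task.
# Property-style checks are lenient on implementation style but
# strict on correctness, so good approaches are not penalized
# and weak solutions do not pass.

import re

def slugify_reference(title: str) -> str:
    """One valid solution among many; the tests must not assume it."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def check_solution(slugify):
    # Core behavior every valid solution must satisfy.
    assert slugify("Hello World") == "hello-world"
    # Properties rather than exact strings, so stylistic variation passes.
    out = slugify("  Multiple   Spaces & Symbols!  ")
    assert out == out.lower()
    assert " " not in out
    assert not out.startswith("-") and not out.endswith("-")
    # Idempotence: slugifying a slug changes nothing.
    assert slugify(out) == out

# A valid implementation passes all checks.
check_solution(slugify_reference)

def weak_solution(title: str) -> str:
    # Naive replacement that mishandles leading whitespace and symbols.
    return title.lower().replace(" ", "-")

# The same checks reject the weak solution.
try:
    check_solution(weak_solution)
    rejected = False
except AssertionError:
    rejected = True
```

In practice the evaluation harness would run such checks against agent-generated code inside the isolated environment, but the balance shown here, exact assertions for required behavior plus property checks for everything else, is the core design question.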
What this role does not cover
- Data labeling
- Prompt engineering
- Writing code from scratch (the AI agent generates most code; your focus is on guidance and evaluation)
Much of the work involves collaborating with AI systems: creating tasks that challenge advanced models requires working closely with these agents throughout the process.

