About the role
Gramian Consultancy seeks an AI Evaluation Engineer with a strong background in software engineering and coding. This remote contract role is open to candidates based in Brazil, as well as Bangladesh, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, and Vietnam.
Role overview
This position centers on designing and implementing benchmark tasks that reflect real-world software engineering challenges. The work involves evaluating AI systems by constructing scenarios in which a codebase requires targeted changes, such as bug fixes, refactoring, or migrations, and then verifying the correctness of AI-generated solutions.
What you will do
- Create and implement multi-agent benchmark tasks that simulate practical code modifications, including bug fixes, migrations, and refactoring.
- Apply the Harbor evaluation framework to run and validate tasks in containerized environments.
- Draft clear task instructions, specifying file paths, function signatures, expected behaviors, and constraints.
- Write Python validation scripts to check the correctness of code changes.
- Decompose complex tasks into steps for specialized agents.
- Review large open-source codebases to identify realistic scenarios for tasks.
- Run, debug, and refine tasks within Docker to ensure they are reproducible.
- Iterate on task quality and complexity based on evaluation feedback.
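To illustrate the validation-script responsibility above, here is a minimal sketch of the kind of Python script that could check an AI-generated code change. All names in it (slugify, MAX_LEN, the test cases) are hypothetical examples invented for illustration, not details from the role description; a real script would import the patched module from the task's codebase rather than defining the function inline.

```python
import re

MAX_LEN = 16  # hypothetical constraint the task instructions would specify


def slugify(title: str) -> str:
    # Stand-in for the patched implementation under evaluation; in a real
    # task this would be imported from the modified codebase instead.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:MAX_LEN].rstrip("-")


def validate() -> list[str]:
    """Return a list of failure messages; an empty list means the change passes."""
    failures = []
    cases = {
        "Hello, World!": "hello-world",
        "A  B": "a-b",
        # Long input: must be truncated to MAX_LEN with no trailing dash.
        "Edge--case TITLE here": "edge-case-title",
    }
    for raw, expected in cases.items():
        got = slugify(raw)
        if got != expected:
            failures.append(f"slugify({raw!r}) -> {got!r}, expected {expected!r}")
        if len(got) > MAX_LEN:
            failures.append(f"slugify({raw!r}) exceeds {MAX_LEN} characters")
    return failures


if __name__ == "__main__":
    problems = validate()
    # Exit code signals pass/fail to the harness running the container.
    raise SystemExit(0 if not problems else "\n".join(problems))
```

Returning a list of specific failure messages, rather than a bare pass/fail flag, makes it easier to iterate on task quality: the evaluation feedback shows exactly which expected behavior the AI-generated change missed.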
Key details
- Contract type: Contractor assignment (no medical benefits or paid leave)
- Duration: 4 weeks or longer
- Schedule: 8 hours per day, with at least 4 hours overlapping Pacific Standard Time (PST)
- Interview process: Take-home assessment
Gramian Consultancy is a boutique firm specializing in IT professional services and engineering talent solutions. The team focuses on software engineering and leadership, helping organizations build effective teams by connecting them with professionals who match their needs.
