About CodeNinja
CodeNinja stands at the forefront of global AI and engineering services, empowering enterprises to build, scale, and operate intelligent systems. With a diverse team of over 350 engineers across four continents and a track record of more than 400 successful deployments, we help organizations leverage artificial intelligence through our Global Capability Centers, Work AI, Physical AI, and AI Labs. Proud to be recognized as one of Pakistan’s fastest-growing AI firms and a multi-award recipient on Clutch, CodeNinja is dedicated to enabling over 250 clients worldwide to innovate, automate, and thrive in the intelligence economy.
Role Overview
We are seeking a highly skilled AI Software Development Engineer in Test II (AI SDET II) to join our team. As a senior quality engineer, you will own the test strategy and automation for our AI-enabled products. You will design and maintain automated tests and AI evaluation suites, integrate quality metrics into our CI/CD pipelines, and collaborate with Scrum teams to deliver robust, defect-free AI capabilities.
Key Responsibilities
- Lead the development and execution of a comprehensive test strategy for assigned AI features, ensuring alignment with the Definition of Done and release criteria across the unit, API, UI, and end-to-end testing layers.
- Design, create, and maintain automated AI evaluation suites, which include developing golden datasets, scoring mechanisms, and regression baselines to cover critical user journeys and failure modes.
- Validate prompt, retrieval, and orchestration changes using systematic test harnesses and established acceptance thresholds.
- Enhance and extend automation frameworks and reusable utilities, focusing on improving test stability, execution speed, and signal-to-noise ratios.
- Integrate automated tests and AI evaluations into our CI/CD pipeline, triage pipeline failures, and reduce flaky tests.
- Collaborate with engineering teams to enhance testability through improved contracts, logging practices, and observability, promoting shift-left quality initiatives.
- Analyze defects and AI output issues, perform root cause analysis, and add regression coverage to prevent recurrence.
- Mentor junior SDETs in automation, test design, and AI validation methodologies, while effectively communicating quality risks to stakeholders.
- Ensure production readiness by validating guardrails, monitoring performance signals, and confirming compliance with performance requirements for AI features.

