AI Infrastructure Engineer

Eram Talent
Dhahran, Eastern Province, Saudi Arabia
On-site, Full-time

Qualifications:

  • Bachelor’s degree or higher in Computer Science, Engineering, or a related technical field.
  • 3+ years of experience in infrastructure engineering, ideally focused on AI, machine learning, or high-performance computing environments.
  • Cloud proficiency in GCP/OpenShift, Kubernetes (k8s), and Docker containers/images.
  • Deep understanding of the AI model lifecycle, including model training, testing/evaluation, and deployment.
  • Familiarity with MLOps/LLMOps.
  • Knowledge of LLM and GenAI fundamentals, including how LLMs operate and their inference mechanics.
  • Experience in inference scaling, distributed computing, and benchmarking against SLAs/SLOs.
  • Proficiency in working with GPUs and handling distributed workloads with autoscaling.
  • Experience with NVIDIA NIMs and Hugging Face frameworks.
  • Knowledge of NVIDIA SuperPODs (HPC, Slurm, k8s).
  • Ability to develop monitoring and dashboard solutions for LLM/ML workloads and applications.
  • Understanding of AI application architecture and end-to-end flows.
  • Familiarity with DevOps practices (CI/CD, ArgoCD, Git, Jenkins, etc.).
  • Programming skills in Python and SQL.

About the job

Eram Talent is seeking an AI Infrastructure Engineer to join our team. The successful candidate will design, build, and maintain scalable and resilient infrastructure that underpins AI and machine learning operations. This position requires close collaboration with data scientists, machine learning engineers, and software developers to improve infrastructure performance and streamline AI model development and deployment.

Key Responsibilities:

  • Design, implement, and manage high-performance computing environments tailored for AI and machine learning applications.
  • Deploy and maintain GPU-accelerated clusters, cloud-based AI platforms, and parallel processing systems.
  • Collaborate with data scientists and ML engineers to understand infrastructure requirements for various AI projects.
  • Optimize resource allocation and scalability of AI infrastructure to support large datasets and complex models.
  • Automate infrastructure provisioning and deployment using Infrastructure as Code (IaC) tools.
  • Ensure security, compliance, and reliability of AI infrastructure.
  • Monitor system performance and troubleshoot issues to minimize downtime and maximize productivity.
  • Stay updated on emerging technologies and best practices in AI infrastructure and propose continuous improvements.

About Eram Talent

Eram Talent is a dynamic organization dedicated to innovation in technology and human resource solutions. Our mission is to empower businesses through cutting-edge infrastructure and talent development, making us a leader in the field of AI and machine learning.
