Mid-Level/High-Level DevOps / SRE Engineer

Aghanim, Lisbon
On-site, Full-time

Experience Level

Mid to Senior

Qualifications

The ideal candidate will have a solid foundation in DevOps principles and practices, with experience in managing cloud infrastructure, preferably in GCP and GKE environments. Proficiency in Terraform and CI/CD processes, along with a strong focus on observability and incident management, is crucial.

About the job

Aghanim is hiring a Mid-Level/High-Level DevOps / SRE Engineer in Lisbon. This role focuses on managing and improving our production platform, which runs on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE). Cloudflare sits in front of the platform, Datadog provides observability, and CI/CD pipelines run on GitHub Actions.

You will work closely with Senior and Principal engineers to strengthen reliability, expand monitoring, and reduce manual operational work. The systems you support handle high load and must be ready for sudden traffic spikes.

What You Will Do

Platform Operations (GCP/GKE)

  • Manage and support production systems on GCP, with a focus on GKE and other managed services.
  • Carry out platform enhancements and operational tasks as directed by more senior engineers.

Infrastructure as Code & Delivery Enablement

  • Apply infrastructure changes using Terraform and, where needed, Terragrunt.
  • Develop and maintain Helm charts and Kubernetes manifests.
  • Improve reliability of GitHub Actions and CI/CD workflows, including deployment automation.

Monitoring & Observability (Datadog)

  • Create and manage Datadog dashboards and monitors to ensure effective alerting.
  • Find and address monitoring gaps in key system components. Refine alerts to cut noise and improve signal quality.

Incident Management

  • Participate in incident response and operational support: triage, mitigation using runbooks, escalation, and follow-up remediation.
  • Contribute to postmortem reviews with clear facts, timelines, and actionable remediation steps.

Security Fundamentals (DevSecOps)

  • Set up and operate security tools and monitoring systems. Help triage findings and implement solutions under supervision.
  • Promote secure-by-default practices such as secrets management, access control, and baseline hardening.

Cost Awareness

  • Understand and manage operational costs for the platform.

About Aghanim

At Aghanim, we pride ourselves on fostering an innovative and collaborative work environment. Our mission is to leverage cutting-edge technologies to deliver exceptional services. We value our employees and strive to create a workplace that promotes professional growth and personal development.
