About Us:
At onhires, we build a scalable trading platform for high-traffic, latency-sensitive applications. Our infrastructure uses modern technologies to support real-time trading with high reliability and performance. Join us to help shape the evolution of our platform and engineering culture.
Job Summary:
We are seeking a Lead DevOps & Platform Engineer to lead the design, implementation, and management of our AWS-centric infrastructure. In this role, you will improve the productivity of our product engineering team while keeping the platform scalable, reliable, and secure. The position combines DevOps, Platform Engineering, and Site Reliability Engineering (SRE). You will advocate for best practices, shape our engineering culture, and keep our platform robust, efficient, and future-ready.
Key Responsibilities:
Platform Engineering
Infrastructure Design: Design and implement scalable infrastructure to support the deployment and management of our trading platform.
Developer Tooling: Create and maintain internal tools that enhance developer workflows, including sophisticated CI/CD pipelines.
Infrastructure as Code (IaC): Advocate for and implement IaC practices using Terraform, CloudFormation, or Pulumi.
Core Services Management: Oversee and optimize platform-critical services, including:
NATS Cluster
RabbitMQ
AWS RDS PostgreSQL
Redis Cluster
DevOps
Automation and CI/CD: Streamline and optimize deployment processes to ensure smooth continuous integration and delivery.
Container Orchestration: Operate and scale containerized workloads using Kubernetes and Docker.
Cloud Optimization: Assess and enhance cloud resource utilization for optimal performance and cost-effectiveness.
Site Reliability Engineering (SRE)
Reliability Metrics: Define and uphold Service Level Objectives (SLOs) and the Service Level Indicators (SLIs) that measure them.
Monitoring & Observability: Implement monitoring tools and dashboards (e.g., Prometheus, Datadog, Grafana) for enhanced visibility into system performance.

