About the job
dyneits is hiring a Site Reliability Engineer focused on OpenStack and private cloud operations. This remote role supports EST and other North American time zones and is available as a full-time or long-term contract position.
Role overview
This position centers on production support, troubleshooting, and platform reliability for OpenStack-based private clouds. The engineer will work hands-on with Linux, networking, and storage systems. Collaboration with internal engineering teams and direct interaction with customers are key aspects of the job.
What you will do
- Diagnose and resolve complex issues in OpenStack and Linux environments.
- Support and manage OpenStack services, including Nova, Neutron, Cinder, and Keystone.
- Perform root cause analysis to implement long-term solutions.
- Participate in incident management and on-call rotations.
- Monitor system performance, availability, and reliability.
- Work with engineering teams to implement fixes and improvements.
- Communicate with customers through various channels.
- Carry out system optimization and performance tuning tasks.
Requirements
- Deep understanding of Linux internals and system performance.
- Experience with kernel tuning, troubleshooting, file systems, and disk management.
- Familiarity with partitions, LVM, and SCSI multipath, plus basic knowledge of Ceph.
- Ability to troubleshoot I/O and performance issues.
- Understanding of DHCP, DNS, VLANs, network bonding, and routing concepts.
- Hands-on experience with OpenStack services (Nova, Neutron, Cinder, Keystone).
- Strong troubleshooting and debugging skills, including root cause analysis.
- Experience supporting production environments and handling customer-facing technical issues.
Nice to have
- Basic knowledge of Kubernetes concepts.
- Familiarity with monitoring tools like Prometheus and Grafana.
- Understanding of metrics, logging, and alerting systems.
- Basic scripting skills in Python or Go.
- Experience with automation and observability practices.
Soft skills
- Strong problem-solving and analytical thinking.
- Ability to perform in high-pressure production settings.
- Clear and effective communication.
- Proactive approach to preventing issues.
- Comfortable working in remote, distributed teams.

