About the job
Join NEXTDC as a Network Platform DevOps Engineer and play a pivotal role in modernizing network operations in our data center. Your primary focus will be designing, building, and maintaining the foundational network automation and monitoring tools for both IT and OT operations. Working closely with the Network & Security Operations team, you will move the organization from manual processes to automated, code-driven networking practices.
Your responsibilities will include:
- Crafting and deploying the initial suite of network automation playbooks, scripts, and templates for systematic changes, provisioning, and configuration compliance across IT and OT networks.
- Identifying and implementing essential toolchain components (e.g., Git repositories, CI/CD tools, Ansible/Terraform, and basic testing frameworks) and outlining their usage within the operations team.
- Establishing core network monitoring and logging frameworks for the data center, including onboarding devices, setting alert rules, and creating dashboards and basic SLOs, while integrating with ITSM/CMDB/DCIM platforms wherever feasible.
- Developing and documenting standard operating procedures and runbooks for utilizing automation and tools effectively (including change requests, pipeline operations, and rollback procedures).
- Supporting incident and problem management by delivering tools, diagnostic scripts, and data exports that speed up root cause analysis and post-incident reviews.
Your technical expertise will encompass:
- Strong Python and shell scripting skills for building automation, lightweight tools, and integrations where no existing framework is in place.
- Practical experience in setting up at least one network automation/IaC stack (e.g., Ansible + Git + CI or Terraform-based workflows) from scratch.
- Experience in configuring and onboarding devices into monitoring/logging platforms (e.g., SolarWinds/PRTG, Prometheus/Grafana, Elastic/Splunk), establishing alert rules, and developing insightful dashboards.
- Knowledge of switching, routing, firewalls, and VPN configurations in data center or enterprise networks, including high availability and basic segmentation designs.
- Ability to design simple yet robust automation patterns and coding standards that are practical to operate (including idempotent changes, rollbacks, and safety checks).

