About the job

About Truelogic

Truelogic stands as a premier provider of nearshore staff augmentation services, proudly located in New York. With over 20 years of experience, we deliver exceptional technology solutions to a diverse client base, from pioneering startups to established industry leaders, facilitating their digital transformation journeys.

Our dedicated team of 600+ highly qualified tech professionals, situated across Latin America, is at the forefront of digital innovation, collaborating with U.S. companies on significant projects. Whether engaging with Fortune 500 firms or fast-growing startups, we are committed to achieving impactful results.

By applying for this role, you are taking the first step towards becoming part of a vibrant team that values your skills and aspirations. We strive to align your expertise with opportunities that promote outstanding career advancement and success while contributing to transformative initiatives that will shape the future.

Our Client

A leading player in the Financial Services sector.

Job Summary

The Site Reliability Operations (SRO) team is responsible for ensuring the continuous stability of our internal IT infrastructure and mission-critical backend systems. While this position is not directly focused on DevOps, it plays a vital role in incident monitoring, coordination, and operational restoration, especially in a high-stakes, regulated environment.

This role encompasses incident command, technical troubleshooting, project leadership, and effective communication with various internal and external stakeholders.

Responsibilities

Act as Incident Commander during incidents, leading team coordination, communication, and service restoration efforts.
Generate executive-level incident reports, conduct Root Cause Analyses (RCAs), and advocate for continuous improvement.
Enhance observability using tools such as AWS CloudWatch and New Relic, reducing alert noise and closing monitoring gaps.
Provide hands-on support in both Linux and Windows environments, addressing complex infrastructure challenges.
Oversee and implement deployments utilizing Jenkins, GitLab, or comparable CI/CD platforms.
Lead infrastructure projects, including migrations, upgrades, and process enhancements.
Implement change management and risk assessment protocols for production changes.
Maintain accurate documentation and Standard Operating Procedures (SOPs), serving as a crucial liaison between engineering teams and external vendors.
Participate in a 1-week on-call rotation, with the possibility of call-ins for critical incidents.
