AI Agent Reliability Engineer jobs in Bangalore – Browse 956 openings on RoboApply Jobs

Open roles matching “AI Agent Reliability Engineer” with location signals for Bangalore. 956 active listings on RoboApply Jobs.

1 - 20 of 956 Jobs
Emergent Labs Inc.
Full-time|On-site|Bangalore

At Emergent Labs Inc., we are pioneering the future of software development by creating autonomous coding agents that revolutionize traditional programming methods. Our innovative systems can generate, test, and deploy production applications directly from plain-language commands, allowing for a seamless development experience. Since our public launch, we hav…

Mar 5, 2026
Cloudflare, Inc.
Full-time|On-site|In-Office

Cloudflare runs a global network that supports millions of websites and online services, from individual creators to large enterprises. The company’s platform helps speed up and secure Internet applications without requiring customers to install extra hardware or software. Every request that passes through Cloudflare’s infrastructure benefits from improved performance and security, helping to reduce spam and cyberattacks. Cloudflare’s culture has been recognized by Entrepreneur Magazine and Fast Company for innovation and positive workplace values. The engineering team values people who notice weaknesses in the Internet and want to address them. Curiosity, a drive to solve problems with AI, and a collaborative approach are key traits for success here. The team works in a fast-learning environment where improvements help everyone.
Role overview: This in-office role in Bangalore focuses on building AI agents at scale. These agents interact directly with Cloudflare’s customers, and code written in this position will be used by real users around the world from day one. The work draws on Cloudflare’s core technologies, such as Workers, Durable Objects, KV, R2, D1, Vectorize, Workers AI, AI Gateway, and the Agent SDK, to create reliable agents that customers rely on daily.
What you will do: Develop agents using Cloudflare Workers, with Durable Objects for managing state and short-term memory. Integrate tools through the Agent SDK, MCP, and function calling. Leverage Vectorize, KV, R2, and D1 for semantic memory, caching, file storage, and configuration management. Operate models using Workers AI and AI Gateway.
Purpose of the role: The main goal is to deliver production-ready AI agents using Cloudflare’s stack. This involves an ongoing cycle of building, deploying, learning, and refining. Code produced in this role will serve as a key entry point for Cloudflare customers.

Apr 23, 2026
Zscaler
On-site|On-site|Bangalore, IND

About Zscaler: Zscaler stands at the forefront of zero trust security, empowering the world’s largest enterprises, critical infrastructure organizations, and government bodies to protect their users, branches, applications, data, and devices. Our cutting-edge Zscaler Zero Trust Exchange platform, fortified by advanced AI, effectively mitigates billions of cyber threats and policy violations daily, enabling organizations to enhance productivity while minimizing costs and complexity. At Zscaler, we prioritize impact over titles and cultivate a culture built on trust, transparency, and constructive dialogue. We focus on harnessing the best ideas at speed, fostering high-performing teams that deliver impactful results with exceptional quality. Our core values revolve around customer obsession, collaboration, ownership, and accountability. We embrace an “AI Forward, People First” philosophy to inspire innovation and empower our team members to reach their fullest potential. If you are motivated by purpose, excel in solving intricate challenges, and aim to make a positive global impact, we invite you to join Zscaler and help shape the future of cybersecurity.
Role: We are in search of an experienced Senior Staff Machine Learning Engineer to become an integral part of our Engineering team. This hybrid position is based in Bangalore and reports to the Manager of Machine Learning Engineering. In this role, you will guide the technical direction, bridge the divide between research and production, and drive technical excellence throughout the organization. You will lead complex projects, mentor junior engineers, and architect the scalable models and systems that power the world’s leading cloud security platform.
Key Responsibilities: Design and implement scalable, reliable, and efficient production-grade Gen AI/ML systems, from data ingestion to monitoring. Drive innovation by researching and assessing emerging AI/ML frameworks, rapidly prototyping innovative solutions, and advocating for full-scale implementation. Establish and uphold robust MLOps practices, including logging, monitoring, and CI/CD pipelines for distributed ML systems. Mentor junior engineers in system design best practices while fostering a culture of technical excellence. Collaborate with cross-functional teams to translate intricate business requirements into impactful technical solutions.

Feb 3, 2026
Sumo Logic
Full-time|On-site|Bangalore, Karnataka, India

Sumo Logic seeks a Staff Site Reliability Engineer based in Bangalore, Karnataka, India. The main focus of this position is to maintain and enhance the reliability and performance of company systems. Collaboration with development teams is central, especially when resolving operational issues and building solutions that keep systems stable.
Key Responsibilities: Partner with engineers to boost system reliability and maximize uptime. Create and improve monitoring and automation tools to support operational goals. Diagnose and resolve operational challenges as they occur. Contribute to optimizing performance throughout the infrastructure.

Apr 28, 2026
Veeam Software
Full-time|On-site|Bangalore, India

Veeam is a leading provider of data and AI solutions, dedicated to helping organizations protect and manage their data effectively. Recognized as a pioneer in data resilience and security posture management, we empower businesses to navigate the complexities of identity, data, security, and AI risk. With our headquarters in Seattle and operations in over 30 countries, Veeam proudly safeguards the operations of more than 550,000 customers globally. Join our dynamic team and be part of a transformative journey as we advance together, fostering growth, learning, and making a significant impact for renowned brands around the world.
About the Role: As a Staff Site Reliability Engineer, you will take on a pivotal role as a hands-on technical leader within our Site Reliability Engineering (SRE) team. Your expertise will guide senior engineers, influence product development efforts, and ensure our systems are constructed to be reliable, scalable, and observable from the ground up. You will spearhead strategic initiatives, mentor peers in SRE practices, and help define architectural best practices across our platform. This role is crucial for aligning teams, enforcing high standards, and scaling SRE principles globally at Veeam.
What You’ll Do
Reliability Engineering & Resilience: Serve as a technical authority, mentoring senior engineers and guiding design decisions to enhance service reliability and resilience. Lead the establishment and enforcement of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets; ensure adherence across engineering teams. Collaborate with fellow staff members across teams to unify strategy and promote shared reliability standards and objectives. Engage with development and product teams to proactively design for failure, construct resilient architectures, and operationalize reliability from inception.
Observability & Operational Excellence: Promote the organization-wide adoption of observability best practices and tools. Ensure that metrics, logs, and traces yield deep, actionable insights throughout systems. Lead complex incident responses, conduct postmortems, and drive systemic reliability enhancements. Encourage and uphold a blameless culture of learning and continuous improvement.

Mar 10, 2026
Rubrik
Full-time|On-site|Bangalore

About the Role: Production Engineer
The Production Engineer at Rubrik is essential for achieving operational excellence. This position involves managing alerts, addressing outages, and leading incident resolution as an Incident Manager. The ideal candidate will possess hands-on experience in maintaining highly available critical services across multi-cloud environments while continuously enhancing processes through automation and intelligent monitoring.
What You’ll Do: Become a vital part of a 24/7 Production Operations team dedicated to managing and supporting critical infrastructure and services in multi-cloud environments. Supervise staging and production environments to ensure optimal uptime and reliability. Implement and uphold comprehensive observability solutions for real-time monitoring, alerting, and metrics collection. Lead incident management initiatives by promptly responding to alerts and outages, coordinating teams for timely resolutions. Investigate recurring incidents to identify root causes, minimize toil, and enhance system resilience. Design and develop automation tools to proactively detect, triage, and remediate production issues. Maintain and update runbooks to facilitate incident response and address recurring issues. Exhibit strong decision-making skills under pressure, effectively managing critical situations with urgency and composure.

Feb 21, 2026
Veeam Software
Full-time|On-site|Bangalore, India

Veeam is recognized as the premier Data and AI Trust Company, dedicated to assisting organizations in comprehending, securing, and fortifying their data and AI systems. As the leading entity in data resilience and security posture management, Veeam is designed to address the convergence of identity, data, security, and AI risk. Our headquarters are in Seattle, and we operate in over 30 countries, safeguarding the data of more than 550,000 customers globally who rely on Veeam to maintain business continuity. Join us as we advance together, fostering growth, learning, and making a significant impact for some of the world’s most renowned brands.
We are seeking a Senior Software Engineer - Reliability to take on a pivotal role as a hands-on technical leader within our Site Reliability Engineering (SRE) team. In this position, you will mentor senior engineers, influence product development, and ensure that our operational systems are designed for reliability, scalability, and observability from the ground up. Your responsibilities will include driving strategic initiatives, mentoring others in SRE practices, and defining architectural best practices across our platform. This role is crucial for aligning teams, maintaining high standards, and scaling SRE principles globally within Veeam.
Your tasks will include:
Reliability Engineering & Resilience: Design and enhance infrastructure to ensure high availability, fault tolerance, and scalability across public clouds, starting with Azure and planning expansion to other providers. Establish and uphold Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to define and enforce reliability goals. Lead incident response initiatives, conduct thorough analysis, facilitate blameless postmortems, and host sharing sessions to maximize learning throughout our engineering team, driving improvements across the socio-technical engineering ecosystem.
Observability & Operational Excellence: Promote deep observability practices, ensuring telemetry, logs, and metrics are effectively utilized to enhance our operational insights.

Mar 10, 2026
Rubrik
Full-time|On-site|Bangalore

About the Role: Production Engineer
The Production Engineer at Rubrik is pivotal in ensuring operational excellence, managing alerts, addressing outages, and spearheading incident resolution as an Incident Manager. This position demands hands-on expertise in maintaining highly available critical services across multi-cloud environments while fostering continuous improvements through automation and intelligent monitoring.
What You Will Do: Become a key member of a 24/7 Production Operations team dedicated to managing and supporting vital infrastructure and services across multi-cloud environments. Supervise staging and production environments to guarantee maximum uptime and reliability. Deploy and maintain comprehensive observability solutions for real-time monitoring, alerting, and metrics collection. Lead incident management initiatives by promptly responding to alerts and outages, coordinating teams for swift resolution. Investigate recurring incidents to identify root causes, mitigate toil, and enhance system resilience. Design and develop automation tools to proactively detect, triage, and rectify production issues. Update and maintain runbooks to facilitate incident response and address recurring issues. Exhibit strong decision-making abilities under pressure, managing critical situations with urgency and composure.

Feb 21, 2026
Reltio
Full-time|Hybrid|Bangalore

At Reltio®, we are passionate about transforming data into a powerful asset that drives business success. Our award-winning AI-driven data unification and management solutions, including entity resolution, multi-domain master data management (MDM), and innovative data products, help organizations break free from data silos and harness trusted, interoperable data. The Reltio Connected Data Platform™ delivers the right data at the right time, enabling data and analytics leaders to respond swiftly to business demands. Major enterprise brands across various industries worldwide depend on our cloud-native MDM capabilities to enhance efficiency, mitigate risks, and propel growth.
Our values are at the heart of everything we do. We prioritize our customers' success with a steadfast commitment to a “Customer First” philosophy. Embracing diversity, we stand by the belief that we are “Better Together” as One Reltio. We seek to “Simplify and Share” knowledge collaboratively, removing obstacles to foster progress. We hold ourselves accountable for outcomes, embodying a mindset of excellence where we “Own It”. Each day, we strive to innovate, ensuring that today is “Always Better Than Yesterday”. If you resonate with these values, we welcome you to join our team at Reltio and contribute to our mission of excellence.
Recognized with numerous awards for our technology and culture, Reltio was founded on a distributed workforce model and offers flexible work arrangements to support our employees in balancing their professional and personal lives. If you are eager to contribute to groundbreaking technology and thrive in a collaborative environment focused on enabling digital transformation through connected data, we would love to hear from you!

May 4, 2026
zzazz
Full-time|On-site|Bangalore Office

Roles and Responsibilities: Guarantee the reliability, availability, and optimal performance of our systems and services. Automate and optimize operations and processes for greater efficiency. Continuously monitor system health, identify bottlenecks, and proactively resolve potential issues. Collaborate with development teams to enhance system architecture and performance. Conduct thorough post-incident reviews and implement necessary improvements. Develop and maintain infrastructure as code using industry-standard tools like Terraform and Ansible.

Jan 28, 2025
Black Duck Software
Full-time|On-site|Bangalore

Role Overview: Black Duck Software is looking for a Senior Site Reliability Engineer in Bangalore. This role focuses on maintaining the reliability, availability, and performance of our systems. Collaboration with development teams is central to the work, with an emphasis on building and supporting scalable infrastructure.
What You Will Do: Work with developers to design, implement, and maintain scalable systems. Troubleshoot production issues and identify long-term solutions. Strengthen the resilience of our platform through process and technical improvements. Promote a culture of continuous improvement across teams.

Apr 14, 2026
UiPath
Full-time|On-site|INDIA : BANGALORE - ENGINEERING

Join the UiPath Team: The team at UiPath is passionate about harnessing the transformative potential of automation to redefine the way the world operates. We are dedicated to developing industry-leading enterprise software that empowers organizations. To realize this vision, we seek individuals who are inquisitive, motivated, generous, and authentic. We value those who thrive in a dynamic, fast-paced environment and who genuinely care about their colleagues, the mission of UiPath, and the broader impact of our work. Are you ready to make a difference?
Your Role: As a Principal Site Reliability Engineer at UiPath, you will play a pivotal role in enhancing the reliability of our expansive, cloud-native systems. This position requires a comprehensive understanding of the full reliability spectrum, going beyond any single domain. You will define and drive the architecture, scalability, measurement, and automation of reliability across our systems. This role focuses on shaping the reliability practices at UiPath rather than merely reacting to outages or coding. You will collaborate with engineering and platform teams to integrate reliability into our systems, workflows, and organizational culture. Your contributions will elevate our standards for monitoring, automation, and ensuring our systems can withstand real-world loads and failures. You will take ownership of service reliability, observability, automation, and continuous improvement initiatives, partnering with teams in Romania and India as necessary.
Your Responsibilities at UiPath
Comprehensive Reliability Ownership: Develop and refine the reliability strategy for our distributed systems, ensuring a balance of availability, performance, velocity, and cost through well-defined SLIs/SLOs and error budgets.
Incident Management & Operational Excellence: Lead and actively participate in high-severity incidents, driving structured troubleshooting in uncertain situations and ensuring sustainable systemic enhancements.
Observability & Operational Insights: Advocate for robust observability practices to make service health and performance risks visible and actionable.
Automation, Tooling & Engineering Discipline: Automate manual operational tasks through effective tooling and self-service options while applying disciplined engineering methodologies.
Infrastructure, Cloud & IaC: Champion reliable and scalable cloud infrastructure utilizing Infrastructure as Code, collaborating with platform teams to establish best practices.
Technical Leadership & Organizational Impact: Influence strategic decisions to improve reliability outcomes and mentor team members to foster a culture of excellence.

Feb 10, 2026
Zscaler
Full-time|On-site|Bangalore, IND

As a Staff Site Reliability Engineer at Zscaler, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based security services. You will engage in troubleshooting complex Linux and network issues while implementing automation solutions to enhance operational efficiency. Your expertise will contribute to our mission of delivering unparalleled security solutions to our clients.

Mar 4, 2026
AxiCorp Financial Services
Full-time|Hybrid|Bangalore, India (Hybrid)

Please note that we will only accept candidates who possess the appropriate rights and documentation for employment in India.
About Us: Axi is a premier global provider specializing in margin and deliverable Foreign Exchange, Contracts for Difference (CFDs), and Financial Spread Betting. Our evolution into a world-class, multifaceted brokerage is marked by a presence across six regions and significant investments in cutting-edge trading technology, designed to deliver the most comprehensive trading experience for clients ranging from novices to institutional investors.
Your Role: As a Site Reliability Engineer, you will be pivotal in ensuring the availability, reliability, and operational excellence of Axi's technology infrastructure. You will design, implement, and maintain sophisticated monitoring, alerting, and log management solutions. Collaborating closely with Technology teams throughout the Development and Operations phases, your goal is to proactively identify and address any business-impacting incidents before they are reported by affected users, ensuring thorough observability and analysis through effective log management.
Your Responsibilities: Act as the Product Owner for Monitoring and Observability within Axi's Technology Operations Environment. Evaluate the current environment and propose a roadmap for optimizing product offerings while managing the lifecycle of existing products. Support technology delivery teams through all product delivery phases by gathering requirements, producing detailed designs, conducting PoCs, and architecting solutions. Tweak and refine health rules while maintaining existing monitoring solutions. Minimize toil by documenting and automating repeatable processes. Communicate ideas and designs effectively to both technical and non-technical stakeholders. Consistently document processes and maintain an up-to-date knowledge base of your product expertise.

Feb 10, 2026
Coram AI
Full-time|On-site|Bangalore

At Coram AI, we are revolutionizing video security for today's world. Our cutting-edge cloud-native platform leverages computer vision and artificial intelligence to empower businesses to enhance safety, make informed decisions, and accelerate their operations. From real-time alerts to effortless clip sharing and comprehensive multi-site visibility, we are setting new standards in security. Join our dynamic, fast-paced team that prioritizes transparency, skillful execution, and impactful contributions. Here, every team member's voice is valued, and everyone contributes to shaping how AI enhances safety and connectivity in our world.
Meet the Team: Our team at Coram AI is comprised of seasoned entrepreneurs and technology innovators with over a decade of experience building autonomous vehicles at prestigious institutions such as Stanford University, Oxford University, Zoox, and Lyft. Having successfully founded and exited various tech companies, they have embarked on this new venture with Coram AI.
The Role & Requirements: We are on the lookout for a dedicated Technical Support Engineer to become an integral part of our Bangalore team. In this role, you will act as a vital link between our customers and our product, diagnosing issues, analyzing systems and metrics, and ensuring that our clients receive prompt, insightful solutions. Additionally, you will contribute to our continuous improvement by documenting solutions and automating responses to recurring challenges.
In this role, you will: Diagnose and resolve customer issues by analyzing logs, querying APIs, and utilizing monitoring tools such as Grafana. Maintain clear communication with customers throughout the resolution process, ensuring they remain informed and reassured. Develop and maintain documentation for critical and recurring issues. Identify trends and create automation or solutions to prevent recurring issues. Collaborate with engineering teams to escalate complex bugs and advocate for customer needs. Assist in enhancing support processes and tools.
You would be an excellent fit if you have: 2+ years of experience in technical support or a related customer-facing technical role. Proficiency in debugging issues using APIs, logs, and observability tools. Strong ability to produce clear documentation and runbooks. Experience with scripting or automation tools to enhance efficiency.

Jan 19, 2026
Coram AI
Full-time|On-site|Bangalore

Join Coram AI as a Founding QA Engineer and be a pivotal part of our innovative team in Bangalore. We are dedicated to building state-of-the-art AI solutions that enhance the quality and efficiency of our products. As a key member of our engineering team, you will help shape our quality assurance processes from the ground up, ensuring that our software meets the highest standards of performance and reliability.

Apr 8, 2026
StockX
Full-time|On-site|Bangalore, India

Join StockX as an AI Automation Engineer and play a pivotal role in driving automation solutions that enhance our operational efficiency. You will be responsible for designing, developing, and implementing cutting-edge AI-driven automation strategies that align with our business goals. Your expertise in AI technologies will help us streamline processes and improve service delivery. You will collaborate with cross-functional teams to identify automation opportunities and deliver high-quality solutions.

Mar 23, 2026
Celonis
Full-time|On-site|Bangalore, India

Join our innovative team as a Senior Applied AI Engineer at Celonis in Bangalore, where you will leverage advanced AI technologies to drive impactful solutions. Your expertise will play a crucial role in developing cutting-edge applications that enhance operational efficiency for our clients. In this role, you will collaborate with cross-functional teams to design, implement, and optimize AI-driven models and algorithms, transforming complex data into actionable insights.

Mar 26, 2026
Staff AI Engineer

ChargePoint, Inc.

Full-time|On-site|Bangalore, India

About Us: As electric vehicles are projected to account for almost 30% of new vehicle sales by 2025 and over 50% by 2040, the shift towards electric mobility is inevitable. ChargePoint (NYSE: CHPT) is at the forefront of this transformation, operating one of the world's most extensive EV charging networks and providing a complete suite of hardware, software, and mobile solutions tailored for every charging requirement across North America and Europe. We connect drivers, businesses, automakers, policymakers, utilities, and other stakeholders, making e-mobility a global reality. Since our inception in 2007, ChargePoint has dedicated itself to simplifying the transition to electric for businesses, fleets, and drivers. Joining ChargePoint offers a unique chance to contribute to creating an all-electric future and tapping into a trillion-dollar market. At ChargePoint, we cultivate a vibrant and productive work environment, rooted in our core values: Be Courageous, Charge Together, Love our Customers, Operate with Openness, and Relentlessly Pursue Awesome. These principles shape our daily interactions and collective efforts to build a brighter future for everyone. Become part of the team that is shaping the EV charging landscape and leave your imprint on how people and goods navigate their journeys for generations to come.

Feb 17, 2026
parspec
Full-time|Hybrid|Hybrid - Bangalore, India

Join parspec as an AI Operations Engineer and become a key player in our innovative technology team. In this hybrid role based in Bangalore, India, you will leverage your expertise in AI operations to enhance our processes and deliver cutting-edge solutions. Your contributions will be vital in ensuring the seamless integration of AI technologies into our systems.

Mar 10, 2026
