Senior Operations Expert (Full-Time) - Shanghai

flowith
Shanghai, Shanghai, China
On-site Full-time


Experience Level

Senior

Qualifications

Expertise in building resilient systems that operate continuously, with a strong emphasis on automation.

  • Solid Operations Foundation: Minimum of 5 years of hands-on experience in SRE/DevOps/Operations, with a proven track record of serving millions of users. Strong proficiency in Linux and networking (TCP/IP, DNS, HTTP/HTTPS, TLS) with advanced troubleshooting skills.
  • Cloud-Native & Edge Specialist: Deep knowledge of the Cloudflare ecosystem (CDN/WAF/DNS/Edge Computing) and effective resource management on mainstream overseas cloud infrastructure (compute, network, load balancing, storage, managed databases).
  • Automation & Monitoring Advocate: Skilled in building and maintaining Prometheus + Grafana monitoring systems. Proficient with Terraform (or similar IaC) and popular CI/CD toolchains. Able to write operational tools in Shell/Python/Go.
  • Architectural Insight: Thorough understanding of managed cloud caching and messaging systems (Serverless Redis, queues/event-driven architectures), with practical experience in security and compliance measures.

About the job

You will be the architect and steward of Flowith’s global production environment. The role goes beyond firefighting outages: you will provide the infrastructure foundation for our rapid business expansion. Using the Cloudflare ecosystem alongside mainstream global cloud infrastructure, you will design and implement distributed architectures that handle high concurrency with low latency. Through performance optimization and a steady commitment to automation, you will help ensure that millions of users worldwide enjoy seamless and stable AI interactions.

Key Responsibilities

  • Global Architecture Implementation: Architect and oversee cross-platform cloud-native architectures, facilitating multi-region deployments, elastic scaling, canary releases, and swift rollbacks to ensure the efficient functioning of globally distributed applications.
  • Traffic & Performance Optimization: Lead the architectural design of managed caching and asynchronous messaging capabilities to handle hot-key caching, task decoupling, and traffic peak shaving.
  • High Availability & Continuity: Build and continuously refine an observability framework (SLI/SLO and alert governance). Define and rehearse backup/recovery, disaster-recovery failover, and emergency-response procedures to safeguard business continuity.
  • Technical Vision & Empowerment: Engage in tech stack selection and architecture reviews for essential business features, striking the ideal balance between reliability, security, cost, and maintainability.


About flowith

Flowith is at the forefront of innovation, dedicated to enhancing user experiences through cutting-edge AI technologies. Our commitment to excellence drives our global operations, making us a leader in the cloud infrastructure domain.
