Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets, unburden administrators, and unleash innovators. Together with our partners, we’re helping customers break free from the restrictive, overpriced legacy solutions that hold them back, and blaze forward with the full power of the open cloud in their hands.
Founded in 2007, we scaled the business with less than $3 million in outside funding until 2021, when we did a traditional IPO on the Nasdaq stock exchange. Today, Backblaze generates over $100 million in revenue and is the leading specialized storage cloud, managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries, including businesses, developers, IT professionals, and individuals.
But while there is a lot to celebrate in our past, there is almost as much opportunity ahead of us. We are seeking a Sr. Manager, Operations (NOC + SRE) to join our team!
What You’ll Do:
The Senior Manager, Operations is responsible for leading the 24x7 Network Operations Center (NOC) and Site Reliability Engineering (SRE) teams in the Backblaze India office. This role ensures service continuity, system reliability, observability, and operational excellence across a diverse client infrastructure portfolio. It is critical to delivering high-availability services with rapid incident resolution, automation, and measurable performance improvements.
Key Responsibilities
Operational Management
- Lead and develop a 24x7 NOC team to monitor, triage, and resolve incidents across customer environments (network, server, cloud, and security systems).
- Oversee daily operations including alert response, incident escalation, service reporting, and SLA adherence.
- Manage shift schedules, on-call rotations, escalation policies, and team performance reviews.
Site Reliability Engineering (SRE) Integration
- Champion and implement SRE practices such as SLIs/SLOs, error budgets, reliability scorecards, and toil reduction strategies.
- Drive automation and tool development to reduce manual work and improve response times.
- Establish observability practices using metrics, logs, traces, and health checks for proactive issue identification.
- Collaborate with Engineering to embed reliability, scalability, and fault tolerance into client solutions.
Service Quality & Improvement
- Conduct root cause analysis (RCA) and lead post-incident reviews (PIRs) to prevent recurrence and drive continuous improvement.
- Own the monitoring, incident, and change management frameworks based on ITIL and DevOps best practices.
- Define and track key performance indicators (KPIs) such as uptime, MTTR, first contact resolution, SLO compliance, and automation coverage.
- Ensure accurate and timely client communication during service-impacting events.
Client and Stakeholder Engagement
- Partner with Engineering, Service Delivery, and Account Management teams to support operational onboarding and ongoing service support.
- Serve as a technical and operational escalation point for high-priority issues and executive briefings.
- Support pre-sales activities by providing input on operational readiness and service reliability.
The Right Fit:
- Bachelor’s degree in Computer Science, Engineering, or related field—or equivalent hands-on experience.
- 8+ years of experience in IT operations or infrastructure support, with at least 3 years in a leadership role within an MSP or SaaS environment.
- Proven experience managing NOC operations and applying SRE practices to improve system availability and reduce manual operations.
- Strong knowledge of networking (BGP, VPN, SD-WAN), server infrastructure (Linux), public cloud platforms, and automation frameworks.
- Experience with monitoring and incident management tools such as Zabbix, Prometheus, Grafana, Jira, and FireHydrant.
- ITIL Foundation and/or demonstrated experience with Incident, Problem, and Change Management processes.
Preferred:
- ITIL Foundation certification or higher.
- Experience with remote infrastructure management.
- Exposure to compliance standards (SOC 2, HIPAA, etc.).
- Knowledge of automation, scripting, or orchestration technologies.
Work Environment
- Operates within a 24x7 delivery model, with rotating on-call responsibilities and potential support for critical incident response outside business hours.
- Remote flexibility, with occasional travel to the corporate office, client sites, or data center sites.
At this point, we hope you're feeling excited about the job description you're reading. Even if you don't meet every requirement, we still encourage you to apply. Learning, developing, and growing are key parts of our culture. We're eager to meet people who believe in our mission and can contribute to our team in various ways. We want people to feel comfortable expressing their true selves and to come, stay, and do their best work here.
At Backblaze, we value being fair and good to our customers, partners, and employees. That’s why diversity, equity, and inclusion are at the core of our values. We are committed to fostering a workforce where all employees feel a sense of belonging regardless of race, ethnicity, nationality, gender, sexual orientation, age, religion, socio-economic status, ability, veteran status, or education. We believe that our dedication to cultivating a diverse workplace not only allows us to better serve our customers in over 175 countries, but further reinforces our commitment to doing the right thing. We are proud to be an Equal Opportunity Employer.
To understand more about the data we collect and process as part of your application, please view our Backblaze Employee Privacy Notice.