As a Hardware Systems Engineer, you'll troubleshoot and maintain Cloudflare's server fleet, validate firmware updates, and enhance automation tools.
Available Locations: Bengaluru
About the department
Cloudflare's Infrastructure group is responsible for building our global network. Our Hardware Engineering team helps research, develop, test, and deploy new equipment enabling 20% of the world's Internet traffic to be served smoothly. Deployed across 330 cities in 120+ countries, the hardware we select helps improve the security, reliability, and performance of the Internet.
About the Role
We need to make thoughtful infrastructure choices affecting a significant portion of the Internet. The hardware we work with includes servers and components, as well as PDUs and network hardware. As a Hardware Systems Engineer, you will work with colleagues on the Hardware Engineering, Product, and Hardware Sourcing teams to troubleshoot and maintain Cloudflare's worldwide fleet of storage and compute servers.
What you'll do
- Work with software teams to validate bug fixes and assess performance of new firmware revisions
- Validate and deploy firmware updates to the fleet, monitoring the progress of the rollout for compliance and reliability
- Work with server and component vendors to obtain, debug, and maintain the latest updates
- Work with our Site Reliability Engineering teams to triage hardware problem reports
- Support our Data Centre Engineering teams in resolving hardware issues
- Develop and maintain automation tools to update firmware on servers and components in Cloudflare's fleet
- Communicate your results and updates through blog posts, internal talks, and tickets
Examples of desirable skills, knowledge and experience
- Bachelor's degree in Computer Engineering, Electrical Engineering, or Computer Science
- Desire to learn about the Cloudflare hardware used by 20% of all websites
- Desire to learn how a diverse server fleet is managed at scale
- Desire to learn the tools Cloudflare uses to maintain and monitor our hardware
- Knowledge of Bash and Python and basic Linux task automation
- Knowledge of x86 server hardware, including motherboards, CPUs, memory, storage, and firmware updates. Knowledge of other platforms such as Arm is a bonus.
- Knowledge of configuration management principles; in particular, we use Salt to manage our fleet
- Knowledge of Redfish, IPMI, and other server remote management protocols
- Knowledge of running mission-critical production systems
Bonus Points
- Familiarity with server hardware architecture
- Knowledge of debugging server hardware faults and the ability to engage with our sourcing team and vendors to improve quality
- Experience of managing large fleets comprising thousands of servers
- Experience of observability and monitoring tools such as Prometheus and Grafana, and the ability to observe trends over time
- Experience with software development tools and processes such as Git, Bitbucket, TeamCity, and Jira
Top Skills
Bash
Bitbucket
Git
Grafana
IPMI
Jira
Linux
Prometheus
Python
Redfish
Salt
TeamCity
X86 Server Hardware