About NetApp
NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or environment, we help our customers identify and realize new business possibilities. And it all starts with our people.
If this sounds like something you want to be part of, NetApp is the place for you. You can help bring new ideas to life, approaching each challenge with fresh eyes. We embrace diversity and openness because it's in our DNA. Of course, you won't be doing it alone. At NetApp, we're all about asking for help when we need it, collaborating with others, and partnering across the organization - and beyond.
"At NetApp, we fully embrace and advance a diverse, inclusive global workforce with a culture of belonging that leverages the backgrounds and perspectives of all employees, customers, partners, and communities to foster a higher performing organization." - George Kurian, CEO
Job Summary
As a Site Reliability Engineer (SRE) with a specialization in storage, you'll manage and optimize a portfolio of customer-facing cloud services (SaaS/IaaS) on Google Cloud Platform (GCP), ensuring their overall availability, performance, and security. You will collaborate closely with global teams from NetApp and GCP, with a primary focus on supporting Google Cloud NetApp Volumes. This position includes rotational on-call work as part of a global team due to the critical nature of the services we support.
Job Requirements
You will be working in a dynamic and fast-paced environment as an engineer on the Site Reliability Engineering (SRE) team. This team is responsible for assisting customers of Google Cloud NetApp Volumes in resolving complex technical issues in production environments. We are seeking an SRE with a deep understanding of storage systems, complex distributed systems, and cloud technologies, and the ability to articulate these concepts clearly to customers and fellow engineers.
You will work with your teammates and our customers to support innovative, cutting-edge technologies that address real-world challenges. You will provide valuable feedback and guidance to our Product and Engineering teams while representing the voice of our customers. You have the opportunity to make a significant impact and take real ownership of your work.
Job Responsibilities
• Collaborate with external customers and partners to ensure their success with Google Cloud NetApp Volumes.
• Respond to, troubleshoot, and drive root cause analysis (RCA) of complex live production incidents in cloud-based SaaS/IaaS environments, including cross-platform issues involving OS, networking, and databases, while following and implementing SRE best practices.
• Continuously monitor, analyze, and measure system health, availability, and latency using tools like Prometheus, Google Cloud Monitoring, Elasticsearch, Grafana, and SolarWinds. Develop and implement steps to improve system and application performance, availability, and reliability.
• Document system knowledge, create runbooks, and ensure critical system information is readily available.
• Stay up to date with security trends and proactively identify, diagnose, and resolve complex security issues.
• Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.
• Automate manual tasks and system components that would benefit from automation.
• Utilize Atlassian Jira to track issues to resolution based on their priority.
• Engage in incident management processes and resolve issues within agreed SLAs/SLOs.
Qualifications
• Extensive experience in storage technologies and incident management processes.
• Advanced knowledge of Linux operating systems (e.g., Ubuntu, CentOS).
• Proficiency in container-based architecture (e.g., Kubernetes).
• Intermediate to advanced knowledge of automation tools and scripting languages such as Ansible, Python, Bash, Go, and PowerShell.
• Solid understanding of algorithms, data structures, and databases (SQL/NoSQL).
• Intermediate knowledge of networking concepts.
• Hands-on experience with cloud environments, particularly GCP.
• Exceptional debugging skills across various platforms and technologies.
• Familiarity with site reliability engineering principles and best practices.
Education
• BE in Computer Science or a related field.
• 6+ years of professional experience in a relevant role.
Equal Opportunity Employer:
NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability, genetic information, pregnancy, or any other protected classification.
Did you know...
Statistics show women apply to jobs only when they're 100% qualified. But no one is 100% qualified. We encourage you to shift the trend and apply anyway! We look forward to hearing from you.
Why NetApp?
We are all about helping customers turn challenges into business opportunity. It starts with bringing new thinking to age-old problems, like how to use data most effectively to run better - but also to innovate. We tailor our approach to the customer's unique needs with a combination of fresh thinking and proven approaches.
We enable a healthy work-life balance. Our volunteer time off program is best in class, offering employees 40 hours of paid time off each year to volunteer with their favorite organizations. We provide comprehensive benefits, including health care, life and accident plans, emotional support resources for you and your family, legal services, and financial savings programs to help you plan for your future. We support professional and personal growth through educational assistance and provide access to various discounts and perks to enhance your overall quality of life.
If you want to help us build knowledge and solve big problems, let's talk.