Your opportunity
We are an SRE team that focuses on the customer experience by improving the reliability of New Relic’s streaming services and making it easy for the development teams that build those services to do reliability right. We concentrate on services built to stream customer data; this is currently limited to the Kafka platform but may later expand to other frameworks.

You will bring your operational paranoia and a zest for pragmatic reliability to a team centered on providing a service-oriented experience for our proprietary Kafka client library, running operational readiness activities, and solving problems to reduce our risk of lag-related incidents. You will work with this intercontinental team to develop the expertise and best practices that New Relic development teams need to operate highly reliable services.

Opportunity to work from a remote office may be available depending on applicant location.

What you'll do
  • Participate in a “first line of defense” on-call rotation for the services your team owns and supports, aiming for proactive reliability and continually seeking automation opportunities that build reliability in
  • Perform SRE operational activities such as managing alerts within PagerDuty, running incident response procedures, maintaining runbooks and automation, and executing gamedays and chaos practices
  • Build tooling, libraries, frameworks, and guidance for the effective and efficient creation and operation of Kafka-based streaming services

This role requires
  • 2+ years of experience writing software in Java, Python, or Go.
  • Experience with configuration management using Ansible or Terraform.
  • Experience operating software in production Kubernetes environments at scale.
  • Outstanding communication and interpersonal skills.

Bonus points if you have
  • Experience with SRE and/or Operations principles.
  • Experience dealing with concurrency in Java (especially helpful).
  • Experience developing services that consume and produce to Kafka.
  • Background working with AWS products, including Managed Streaming for Apache Kafka (MSK) and Elastic Kubernetes Service (EKS).
  • Experience using New Relic for service and infrastructure observability.

Is this a remote job?
Remote

New Relic helps engineers and developers do their best work every day — using data, not opinions — at every stage of the software lifecycle. The world’s best engineering teams rely on New Relic to...
