Site Reliability Engineer

 

Description:

PagerDuty is seeking a Site Reliability Engineer to join our SRE-Platform team. In this role you will be a key contributor to building, maintaining and scaling the Kubernetes platform that powers PagerDuty. We build solutions that accelerate developer productivity, improve reliability and help PagerDuty scale for today and tomorrow. If you’re passionate about platform engineering, developer experience and all things Kubernetes, we’d love to hear from you!

PagerDuty is a flexible, hybrid workplace. We embrace and encourage in-person working as an integral part of our culture. Both our employees and external research tell us that co-located collaboration strengthens connections, drives innovation, and accelerates learning.

You are expected to come into our Toronto office one day per month so that you can thrive in your new role and fully embrace being a Dutonian!

Key Responsibilities

  • You deploy, configure, monitor and optimize highly available Kubernetes clusters on AWS/EKS
  • You help maintain the overall health of the platform including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
  • You continuously strive to improve the developer experience: full lifecycle support (creation, development, deployment, retirement), observability, flexible connectivity, and monitoring
  • You stay current on technical trends in order to suggest innovative tools and approaches to interesting problems
  • You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules

Basic Qualifications

  • 2+ years of experience in platform engineering, site reliability engineering or DevOps roles
  • Experience managing multiple Kubernetes clusters in a production environment
  • Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
  • Experience with infrastructure as code (e.g. Terraform or CloudFormation)
  • Knowledge of a dynamic language (e.g. Ruby or Python)

Organization: PagerDuty
Industry: Engineering Jobs
Occupational Category: Site Reliability Engineer
Job Location: Toronto, Canada
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Intermediate
Experience: 2 Years
Posted at: 2024-08-15 12:01 pm
Expires on: 2024-12-23