Senior Site Reliability Engineer

Description:

In this role, you are expected to come into our Toronto office one day per month, so you can thrive in your new role and fully embrace being a Dutonian!

Key Responsibilities

You help maintain the overall health of the platform including triaging and troubleshooting production issues, monitoring system capacity, and working with other technical teams to ensure adherence to compliance and security best practices
You partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform
You continuously strive to improve the developer experience: full lifecycle support (creation, development, deployment, retirement), observability, flexible connectivity, and monitoring
You share your expertise with the entire Engineering organization
You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules

Basic Qualifications

5+ years of experience in Platform Engineering, Site Reliability Engineering or DevOps roles
Experience managing multiple Kubernetes clusters in a production environment
Experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
Experience deploying web applications on Kubernetes (Helm, ArgoCD)
Experience with infrastructure as code (e.g. Terraform or CloudFormation)
Knowledge of a dynamic language (e.g. Ruby or Python)

Preferred Qualifications

Experience with monitoring, observability and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk)
Knowledge of configuration management systems (e.g. Ansible, Chef, Puppet)
Experience in automating releases, continuous integration/delivery systems and relevant tools (e.g. Jenkins, CircleCI, Travis CI, Buildkite)

Organization	PagerDuty
Industry	Engineering Jobs
Occupational Category	Senior Site Reliability Engineer
Job Location	Toronto, Canada
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Experienced Professional
Experience	5 Years
Posted at	2024-08-16 5:48 am
Expires on	2025-06-02