Lead Site Reliability Engineer

Description:

This team is focused on the execution and delivery of highly resilient and scalable platforms using Container as a Service technologies. As a member of the team, you will lead and partner with various application owners, platform leads and team members to design, implement and continuously improve container technologies. You must have knowledge of Platform as a Service (PaaS) technologies and Container as a Service (CaaS) technologies such as Kubernetes.

What You Are Great At

Applying a broad range of knowledge, skills, and experience within an area of expertise to assignments received in the form of objectives.
Determining how to use resources to meet schedules and goals.
Providing guidance to peers within the latitude of established company policy.
Using broad knowledge of the organization to influence strategy, policy, and process development as a technical authority and a leader with a vision for positive business outcomes.
Leading multi-functional strategic and tactical efforts.
Providing leadership by assisting with the triage of escalated production incidents.
Being a change agent who can develop, implement, and maintain policies and processes.
Collaborating with peer technology organizations, business, clients and management to review application, systems and infrastructure functionality and develop plans for improvement.
Leading the development and implementation of strategies that make system delivery more efficient.
Identifying and implementing strategies to reduce platform Mean Time To Resolution (MTTR) through Site Reliability Engineering (SRE) practices and automation principles.
Managing continuous improvement of service engineering, delivery, and operational practices.
Reducing expenses by eliminating unnecessary downtime and disruption.
Understanding current business and technology trends to find opportunities for improving services and reducing risk.
Adopting and promoting a service level objective (SLO) mindset, with disaster recovery best practices in mind.
Effectively navigating organizational structure and culture to achieve positive outcomes.

What It Takes

10+ years of related experience, or equivalent intermediate- and advanced-level certifications that demonstrate knowledge of cloud and security concepts.
Extensive knowledge of: CaaS Technologies including Kubernetes, Google Anthos/Google Kubernetes Engine (GKE), Ingress and PaaS technologies
Knowledge of Infrastructure as a Service (IaaS) technologies, including hypervisors (VMware ESX), routing (VMware NSX-T), and load balancing (F5, etc.).
Knowledge of monitoring and logging technologies, including VMware Tanzu Observability/Wavefront, Dynatrace, and Splunk.
In-depth knowledge of network and infrastructure security best practices, including governance.
Experience implementing CI/CD pipelines and automating build, packaging, and release management activities (build automation, CI/CD, Git, Jenkins).
Experience with tools such as JIRA, Git/Bitbucket, and Confluence.
Ability to build self-healing and automated systems.
Ability to design and build systems that collect, visualize, and store service health indicators.
Demonstrated ability to achieve successful outcomes in difficult situations and to work with customers and management at various levels.
Previous experience working on Agile teams using Scrum and Kanban.
Ability to communicate effectively with technical and non-technical audiences.
A self-starter with the ability to work independently and in a collaborative team environment.

Organization	OpenText
Industry	IT / Telecom / Software Jobs
Occupational Category	Lead Site Reliability Engineer
Job Location	Ontario, Canada
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Experienced Professional
Experience	10 Years
Posted at	2023-12-03 10:56 am
Expires on	2025-03-27