Lead Site Reliability Engineer

 

Description:


This team is focused on the execution and delivery of highly resilient and scalable platforms using Container as a Service (CaaS) technologies. As a member of the team, you will lead and partner with application owners, platform leads and team members to design, implement and continuously improve container technologies. You must have knowledge of Platform as a Service (PaaS) technologies and CaaS technologies such as Kubernetes.

What You Are Great At
 

  • Applying a broad range of knowledge, skills and experience within an area of expertise to assignments received in the form of objectives.
  • Determining how to use resources to meet schedules and goals.
  • Providing guidance to peers within the latitude of established company policy.
  • Using broad knowledge of the organization to influence strategy, policy and process development as a technical authority and leader with a vision for positive business outcomes.
  • Leading multi-functional strategic and tactical efforts.
  • Providing leadership by assisting in the triage of escalated production incidents.
  • Being a change agent able to develop, implement and maintain policies and processes.
  • Collaborating with peer technology organizations, business, clients and management to review application, systems and infrastructure functionality and develop plans for improvement.
  • Leading the development and implementation of strategies for delivering systems more efficiently.
  • Identifying and implementing strategies to reduce platform Mean Time To Resolution (MTTR).
  • Applying Site Reliability Engineering (SRE) practices and automation principles.
  • Managing continuous improvement of service engineering, delivery, and operational practices.
  • Reducing expenses by eliminating unnecessary downtime and disruptions.
  • Understanding current business and technology trends to identify opportunities for improving services and reducing risk.
  • Adopting and promoting a service level objective (SLO) mindset with disaster recovery best practices in mind.
  • Effectively navigating organizational structure and culture to achieve positive outcomes.
     

What It Takes
 

  • 10+ years of related experience, or equivalent intermediate- and advanced-level certifications that demonstrate knowledge of cloud and security concepts.
  • Extensive knowledge of CaaS technologies, including Kubernetes, Google Anthos/Google Kubernetes Engine (GKE) and Ingress, as well as PaaS technologies.
  • Knowledge of Infrastructure as a Service (IaaS) technologies, including hypervisors (VMware ESX), routing (VMware NSX-T) and load balancing (F5, etc.).
  • Knowledge of monitoring and logging technologies, including VMware Tanzu Observability/Wavefront, Dynatrace and Splunk.
  • In-depth knowledge of network and infrastructure security best practices, including governance.
  • Experience implementing CI/CD pipelines and automating build, packaging and release management activities (build automation, CI/CD, Git, Jenkins).
  • Experience with tools such as JIRA, Git/Bitbucket, Confluence, etc.
  • Ability to build self-healing, automated systems.
  • Ability to design and build systems that collect, visualize and store service health indicators.
  • Demonstrated ability to achieve successful outcomes in difficult situations and to work with customers and management at various levels.
  • Demonstrated experience working on Agile teams using Scrum and Kanban.
  • Ability to communicate effectively with technical and non-technical audiences.
  • A self-starter with the ability to work independently and in a collaborative team environment.

Organization: OpenText
Industry: IT / Telecom / Software Jobs
Occupational Category: Lead Site Reliability Engineer
Job Location: Ontario, Canada
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Experienced Professional
Experience: 10 Years
Posted at: 2023-12-03 10:56 am
Expires on: 2024-12-10