Site Reliability Engineer

Description:

The Reliability Engineering organization provides a multitude of products and services related to operations and the continuity of business delivery.


The Site Reliability Engineering teams make the SAP Business Technology Platform run better by providing 24x7 deep technical coverage for Incident Management (outages and other incidents with major customer impact), applying SRE principles. We share a Live Site First culture and care about the business continuity of our customers running mission-critical applications in the cloud.

We are looking for an engineer to join an already established SRE team for the SAP Business Technology Platform.

EXPECTATIONS AND TASKS

As a Site Reliability Engineer, you will have the opportunity to operate and support business-critical cloud services. As part of your daily work, you will proactively monitor service behavior and identify areas for improvement. You will also help develop tools for monitoring and troubleshooting cloud services, built on the latest open-source and SAP technologies and following SRE principles.

Responsibilities

  • Act as a technical expert during live-site incidents (downtimes of supported services in scope); investigate and resolve incidents at a deep technical level.
  • Drive root cause analysis and follow-up improvements to prevent issues from recurring.
  • Perform in-depth troubleshooting and log analysis to identify and solve complex issues in accordance with internal and external SLAs.
  • Build software-based solutions to address improvements in service reliability and stability.
  • Enhance infrastructure and platform monitoring by gathering system metrics (the four Golden Signals: latency, traffic, errors, and saturation) and implementing tools for recovery.
  • Integrate and collaborate closely with development teams, working with them on postmortem outcomes and product improvements.
  • Learn new technologies and keep up to date with the latest development increments.
  • Create and maintain technical documentation.
  • Define, advocate, and apply SRE best practices.
  • Participate in the on-call rotation (follow-the-sun model) to respond to major incidents. On-call duty comes with a special compensation package.

Organization SAP
Industry Engineering Jobs
Occupational Category Site Reliability Engineer
Job Location Montreal, Canada
Shift Type Morning
Job Type Full Time
Gender No Preference
Career Level Intermediate
Experience 2 Years
Posted at 2024-05-04 5:40 am
Expires on 2025-01-22