Description:
We are looking for a Senior Site Reliability Engineer to join the Site Reliability Engineering team in our growing Core Services group! Reporting to the Senior Engineering Manager, you'll apply your technical and domain expertise to solve complex technical and business challenges; respond to and assist with production incidents in collaboration with product teams; participate in design discussions, code reviews, and project-related team meetings; and work with other engineers to develop innovative solutions that meet business needs concerning functionality, performance, observability, scalability, and reliability.
You will:
- Build, deploy, and maintain observability platforms to enable teams to self-serve their metrics gathering and dashboarding needs
- Help improve the monitoring and metrics pipelines for application APIs and ETL workflows through SLIs, SLOs, and drill-down metrics
- Partner with other teams to iterate on and improve BenchSci’s Incident Response processes
- Help other teams respond to, mitigate, and remediate production incidents
- Help other teams write effective post-mortems and improve our reliability culture and processes
- Help reduce toil and improve developer productivity by automating our team and business processes
- Lead software and system design initiatives by leveraging cloud-native design patterns and injecting your cloud expertise into the entire development lifecycle
- Partner with engineering and product stakeholders and other cross-functional teams to devise and refine requirements
- Work with your team, Staff Engineers, and Engineering Managers to help promote SRE best practices
- Communicate cross-cutting decisions to all potentially impacted teams
You have:
- 5+ years of experience working as a Senior Site Reliability Engineer preferred; experience working in DevOps with a focus on observability and reliability may also be considered
- Expert knowledge of incident response, observability, and reliability tools and techniques in a cloud-native environment (Google Cloud is preferred, but AWS experience is also valuable)
- Experience with cloud design patterns (Google Cloud is considered an asset) and developing specialized application stacks on cloud services (Python backend, TypeScript frontend)
- Experience working in Python and JavaScript/TypeScript codebases
- Eagerness to share your own ideas, and openness to those of others