Description:
As a System Support Advisor for our data center engineering team, you will architect, implement, and manage the infrastructure necessary to support cloud workloads on physical servers.
Key Responsibilities
- Monitoring the health and performance of physical servers used for cloud computing (e.g., HPE, Dell, Supermicro).
- Diagnosing and resolving software issues, hardware failures, kernel panics, system crashes, and performance degradation.
- Leading and supporting a team of two technical support specialists through daily calls in an agile environment.
- Leveraging scripting languages (e.g., Bash, Python) and automation tools (e.g., Ansible, Puppet) to automate repetitive tasks, streamline operations, and ensure consistency across bare-metal systems.
- Assessing the need for software and firmware updates based on vendor recommendations, security patches, and performance enhancements.
- Hardening the security of bare-metal servers through configuration management, access controls, and encryption mechanisms.
- Ensuring that bare-metal deployments comply with regulatory requirements and internal governance policies.
- Maintaining comprehensive documentation of system configurations, procedures, and troubleshooting steps to facilitate knowledge sharing and continuity of operations.
- Working closely with vendors and other internal and external parties to deliver an efficient support solution.
- Contributing to problem resolution as part of cross-functional teams using Agile/Scrum and Lean methodologies.
Critical Qualifications
- Degree in Computer Science or Information Science
- 5 to 10 years of technical and operational experience in IT
- Strong knowledge of physical server hardware and data center management
- Experience building and maintaining Ansible playbooks (including Ansible Tower)
- Experience with server hardware and firmware management tools:
  - HPE OneView
  - HPE OneView Global Dashboard
  - Dell EMC OpenManage
  - Dell EMC SupportAssist Enterprise
- Linux knowledge (intermediate to advanced)
- Python and API skills (intermediate to advanced)
- Strong problem-solving skills and ability to work under pressure
- Strong time management skills and work ethic to manage multiple accountabilities
- Ability to build relationships and work effectively with internal stakeholders and vendors
- Willingness to embrace change within the team
- Availability for 24/7 on-call work (rotating schedule)