
HPC System Administrator
Dynamics Ats
Canada
JOB-10044328
Anticipated Start Date
June 23, 2025
Location
Houston, TX
Type of Employment
Contract Hire
Employer Info
Our client is a global leader in energy technology, providing cutting-edge solutions across the oil and gas industry. Operating in over 100 countries, they focus on digital innovation and sustainable practices to drive the future of energy and support the transition to lower-carbon operations.
Job Summary
We are seeking a skilled HPC Systems Administrator to join our Houston-based Systems team. This role focuses on the management and support of a hybrid high-performance computing (HPC) environment, both on-premises and in the cloud, that supports proprietary scientific applications and mission-critical production and development workflows. The ideal candidate will have a deep understanding of large-scale compute infrastructure, strong Linux system administration skills, and experience supporting enterprise HPC environments across multiple data center technologies.
Job Description
- Administer and maintain a hybrid HPC infrastructure with thousands of servers, large-scale storage systems, and tape automation technologies.
- Install, configure, and manage Linux operating systems (RHEL, CentOS, Rocky Linux) in a distributed enterprise environment.
- Deploy and manage HPC-related software using IBM xCAT and automation tools such as Ansible and Terraform.
- Maintain and support compute hardware, including servers, GPUs, SSDs, robotic tape libraries, and disk arrays.
- Administer storage solutions including HPE ClusterStor, NetApp, Dell Isilon, and Pure Storage systems.
- Manage and provision cloud instances in Google Cloud Platform and Microsoft Azure; build custom VM images and write cloud automation scripts.
- Support database environments (PostgreSQL) and perform installation and ongoing maintenance.
- Script in Bash, Perl, Python, and Ruby, and use MRTG, for automation and system diagnostics.
- Ensure secure system configurations, including the deployment and management of Linux endpoint security tools.
- Maintain networking configurations related to Ethernet, InfiniBand, and Fibre Channel SANs.
- Participate in and support disaster recovery, system backup, and restore operations using IBM Spectrum, Dell NetWorker, and related tools.
- Evaluate system performance and recommend improvements for operational efficiency and scalability.
- Investigate, debug, and resolve system-level issues proactively.
- Follow structured change management procedures including testing in non-production environments.
- Provide clear communication regarding planned changes, outages, and maintenance windows.
- Adhere to internal IT deployment standards with regular reporting and compliance tracking.
- Document and share technical solutions and best practices via internal support systems.
- Work collaboratively with internal teams (networking, desktop support, programming) and external vendors.
- Submit detailed weekly status reports and participate in weekly technical review meetings.
- Provide 24/7 on-call support on a rotational basis, including participation in periodic power-downs and emergency data center operations.
- Actively engage in peer review of major projects prior to deployment.
- Ensure compliance with internal quality assurance protocols, best practices, and safety requirements.
Skills Required
- Minimum 5 years of experience in a large-scale HPC environment.
- Proven experience managing Linux-based systems (RHEL, CentOS, or Rocky Linux).
- Familiarity with HPC management tools like IBM xCAT.
- Strong background in computer hardware support and enterprise-level storage solutions.
- Proficiency with public cloud platforms (Google Cloud Platform and Microsoft Azure).
- Hands-on experience with automation and configuration management tools (e.g., Ansible, Terraform).
- Competence in scripting languages (Bash, Python, Perl, Ruby).
- Knowledge of network infrastructure including LAN/WAN, Ethernet, InfiniBand, and SAN technologies.
- Experience with container technologies and security configurations for Linux environments.
- Solid understanding of backup and recovery solutions and enterprise monitoring.
Education
- High school diploma or GED
Pay Rate
- $45 per hour and higher depending on experience
Additional Details
- Shift: 9am - 5pm
- Highly self-motivated and capable of working independently.
- Team player who can collaborate effectively across technical groups.
- Strong communication skills—written, verbal, and interpersonal.
- Willingness to mentor and train junior team members.
- Commitment to documentation, operational discipline, and industry best practices.
We are an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law.