HPC Linux System Administrator
Description
This role has been designed as ‘Onsite’ with an expectation that you will primarily work from an HPE office.
Who We Are:
Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.
Job Description:
High Performance Computing, AI and Labs is a critical element of HPE. We are focused on delivering innovative solutions that accelerate our customers’ digital transformation, enabling them to tackle their complex, data-intensive workloads. Combining deep expertise with the development of the world’s most cutting-edge, high-performance supercomputers, we are defining the next era of computing and delivering valuable insight and innovation. Join us and redefine what’s next for you.
HPE Nonstop is a complete software stack and offers a rich suite of security products to protect Nonstop workloads and data, covering new cryptography, standards compliance, auditing, new security design in the enterprise, and so on. Nonstop comes with its own unique security architecture that has continued to protect Nonstop deployments over time. The portfolio of security products is enhanced at regular intervals to keep up with the times and stay ahead of the curve.
At Nonstop, we have been studying the trends, listening to our customers, and looking at ways not only to make Nonstop secure but also to enable customers to secure their Nonstop environment and implement corporate security practices for their Nonstop infrastructure. Given that Nonstop is a complete software stack, we look at it in terms of: a) providing security infrastructure in the platform so that customers and application developers can protect and secure the information flow; b) integrating into the enterprise’s authentication and access control architecture; c) implementing modern cryptography; d) helping customers monitor their Nonstop workloads using modern tools and techniques; e) implementing advanced forensics to track user actions; f) helping customers implement regular compliance measures and demonstrate compliance; g) monitoring emerging threats across the technology landscape, and researching and scouting for solutions.
What you’ll do:
Responsibilities:
• Be hands-on: develop a solid understanding of the Linux system and be able to test it.
• Manage and maintain HPC clusters, including installation, configuration, and optimization of compute and management nodes.
• Administer Linux/Unix-based systems, ensuring high availability, performance, and security.
• Perform system imaging, software provisioning, and configuration management using tools such as Ansible.
• Conduct hardware troubleshooting and coordinate with vendors or internal teams for hardware repairs and replacements.
• Oversee lab systems used for development, testing, and release validation in HPC environments.
• Manage storage systems (NFS, Lustre, GPFS, RAID) and ensure efficient data flow across the HPC environment.
• Monitor system performance, perform regular health checks, and implement preventive maintenance measures.
• Apply OS, firmware, and security updates to maintain system stability and compliance.
• Develop and maintain automation scripts (using Bash, Python, or Ansible) to improve operational efficiency.
• Document system configurations, maintenance procedures, and troubleshooting guides.
• Collaborate with cross-functional teams across geographies to resolve issues, plan upgrades, and support project activities.
• Provide guidance and mentoring to less-experienced staff members.
What you need to bring:
Education and Experience Required:
• Bachelor's or Master's engineering degree in Computer Science or Information Systems.
• Typically 4-8 years of experience.
Knowledge and Skills:
• Strong proficiency in Linux/Unix administration (installation, configuration, tuning, troubleshooting).
• Experience managing HPC clusters and workload managers (e.g., HPE Cray systems, Slurm, PBS, LSF).
• Solid understanding of networking fundamentals (TCP/IP, DNS, DHCP, VLANs).
• Experience with storage management systems such as NFS, Lustre, or GPFS.
• Hands-on experience in hardware diagnostics and maintenance.
• Familiarity with system monitoring tools such as Prometheus, Grafana, or Nagios.
• Working knowledge of containerization (Docker, Singularity) and virtualization technologies is a plus.
• Proficiency in shell scripting (Bash).
• Familiarity with Python or Ansible for automation and orchestration.
• Ability to automate routine tasks and enhance operational efficiency.
• Strong troubleshooting and problem-solving skills with a focus on root cause analysis.
• Experience in maintaining accurate system documentation and change logs.
Additional Skills:
Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Security-First Mindset, Solutions Design, Testing & Automation, User Experience (UX)
What We Can Offer You:
Health & Wellbeing
We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
Personal & Professional Development
We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.
Unconditional Inclusion
We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.
Let's Stay Connected:
Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.
#india
Job:
Engineering
Job Level:
TCP_03
HPE is an Equal Employment Opportunity/Veterans/Disabled/LGBT