HPC Ecosystems - OpenHPC 2.x System Administrator 101
A practical guide for managing users, software, and infrastructure in an OpenHPC 2.x cluster environment.
Contents
- Adding Users to Cluster
- Installing Software to Compute Nodes
- IPMI / BMC (Remote Management)
- Learning Steps
- Cluster Configuration
- Cheatsheet
smshost Cheatsheet
Quick reference guide for common administrative tasks on the smshost.
Adding Users to Cluster (OpenHPC 2.x)
Warewulf manages system files using the wwsh file * interface.
To view currently managed files:
wwsh file list
User accounts are created on the smshost and then propagated to compute nodes via Warewulf.
- Add users using the standard sudo useradd command.
- Sync account files across the cluster:
Force Propagation
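A sketch of syncing and force-propagating account files, assuming the Warewulf 3 defaults used by the OpenHPC 2.x recipe: wwsh file resync re-imports the updated files, and running wwgetfiles on the nodes forces an immediate pull. The wwgetfiles path is the Warewulf 3 default and the node names c[1-4] are placeholders for your cluster.

```shell
# Re-import the updated account files into Warewulf after useradd
sudo wwsh file resync passwd shadow group

# Compute nodes refresh managed files periodically; to force an
# immediate update, run wwgetfiles on the nodes (names are examples)
sudo pdsh -w c[1-4] /warewulf/bin/wwgetfiles
```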
Installing Software to Compute Nodes (OpenHPC 2.x)
Summary
Most of the provisioned image's configuration is conducted in a chroot filesystem. These chroots cannot be directly provisioned by Warewulf. Once satisfied with the chroot configuration, it is encapsulated and compressed into a Virtual Node File System (VNFS) image, which Warewulf provisions. Think of the chroot as the βsource codeβ and the VNFS as the βcompiled binary.β
Software Installation Steps
- Install software into the compute node root filesystem (chroot):
sudo dnf -y install fail2ban --installroot $CHROOT
- Rebuild the VNFS:
sudo wwvnfs --chroot $CHROOT
- Reboot compute nodes.
- Verify scheduler is running.
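The four steps above can be sketched end-to-end. The chroot path is the rocky8.6 default mentioned below; the node names and the scheduler check are illustrative assumptions, not prescribed commands.

```shell
# 1. Install into the chroot (path is the rocky8.6 default)
export CHROOT=/opt/ohpc/admin/images/rocky8.6
sudo dnf -y install fail2ban --installroot $CHROOT

# 2. Rebuild the VNFS image from the chroot
sudo wwvnfs --chroot $CHROOT

# 3. Reboot the compute nodes (node names are examples)
sudo pdsh -w c[1-4] reboot

# 4. After the nodes return, verify the scheduler sees them
sinfo
```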
Install System Software for Compute Nodes
A directory structure on the smshost represents the root filesystem of the compute node (the chroot).
The default location is defined in input.local and is likely: /opt/ohpc/admin/images/rocky8.6
export CHROOT=/opt/ohpc/admin/images/rocky8.6
sudo dnf -y --installroot $CHROOT install python37
The above command installs Python 3.7 directly into the root filesystem of the compute node image.
Install Software Apps for Users (OpenHPC 2.x)
Building Python 3 from source: download the source code, extract it, and change into the extracted directory:
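A sketch of the download-extract-build sequence. The version number and install prefix are examples, not prescribed values; installing under /opt/ohpc/pub puts the build on the shared tree that compute nodes can see.

```shell
# Fetch and unpack the Python source (version is an example)
wget https://www.python.org/ftp/python/3.10.13/Python-3.10.13.tgz
tar xzf Python-3.10.13.tgz
cd Python-3.10.13

# Configure with a prefix on the shared OpenHPC public tree
./configure --prefix=/opt/ohpc/pub/apps/python/3.10.13
make -j"$(nproc)"
sudo make install
```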
PATH Warnings
Binaries installed outside the default system paths will not be found on users' PATH; we solve this PATH warning using module files.
Update Application Module Files (OpenHPC 2.x)
Copy a template from $MODULEPATH to /opt/ohpc/pub/modulefiles/ and edit as needed:
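A minimal sketch of such a modulefile (modulefiles are written in TCL), for a hypothetical application installed under /opt/ohpc/pub/apps/python/3.10.13; every name, path, and version here is an example to be adapted to your install.

```tcl
#%Module1.0
# Hypothetical modulefile; paths and version are examples only
proc ModulesHelp { } {
    puts stderr "Locally built Python 3"
}
module-whatis "Name: Python 3 (local build)"

set version 3.10.13
set prefix  /opt/ohpc/pub/apps/python/$version

prepend-path PATH            $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
prepend-path MANPATH         $prefix/share/man
```

Saved as, for example, /opt/ohpc/pub/modulefiles/python/3.10.13, it would become visible to users via module avail and loadable with module load python.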
IPMI / BMC (Remote Management)
The IPMI / BMC network allows remote control of hardware (reboot, power up/down).
Recommended practice is to separate the BMC management network from the production network.
Default node credentials: Username: chpc, Password: bmc123qwe.
IP Address Conventions
Standard IP assignments for production and management networks:
The Most Common IPMI Commands
Check the status of a node (verify if it is powered on or unreachable):
Remotely power down a node:
Remotely power up a node:
Remotely reboot a node:
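The four operations above map onto ipmitool power subcommands. The credentials are the defaults stated earlier; the BMC address is an example.

```shell
BMC=10.10.10.203                 # example BMC address
CRED="-U chpc -P bmc123qwe"      # default node credentials

ipmitool $CRED -H $BMC power status   # check whether the node is on
ipmitool $CRED -H $BMC power off      # remotely power down
ipmitool $CRED -H $BMC power on       # remotely power up
ipmitool $CRED -H $BMC power reset    # remotely (hard) reboot
```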
Scripts
There is a resource script setbmc.sh to enable faster manual configuration (see HPC Ecosystems GitHub).
Learning Steps
Extract node configuration from Warewulf and store in input.local for future provisioning.
Determine the correct ordering of nodes:
- Use wwnodescan to quickly add nodes; check the ordering carefully.
- Use BMC commands to flash node LEDs to visually identify them.
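One way to flash a node for visual identification is ipmitool's chassis identify subcommand, which blinks the chassis identify LED (the address below is an example).

```shell
# Blink the identify LED for 30 seconds on one node's chassis
ipmitool -U chpc -P bmc123qwe -H 10.10.10.203 chassis identify 30
```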
Cheatsheet
IPMI Quick Commands
ipmitool -U chpc -P bmc123qwe -H 10.10.10.203 power on
ipmitool -U chpc -P bmc123qwe -H 10.10.10.213 power off
ipmitool -U chpc -P bmc123qwe -H 10.10.10.202 sdr list
ipmitool -U chpc -P bmc123qwe -H 10.10.10.202 sdr type Temperature