NEC SX-Aurora TSUBASA: ICM User's Guide
Last revision: 2023-10-27

2. Basic usage

To use the ICM's TSUBASA installation, users must first access the login node at hpc.icm.edu.pl through SSH [3] and then establish a further connection to the Rysy cluster, as shown in Listing 1. Alternatively, the -J command line option can be passed to the (OpenSSH) client application to specify a jump host (here, the hpc login node) through which the connection will be established (issue the man ssh command for details).

[Listing 1] Accessing the NEC SX-Aurora TSUBASA installation at ICM:

  $ ssh username@hpc.icm.edu.pl
  $ ssh rysy

The system runs the Slurm Workload Manager [4] for job scheduling and Environment Modules [5] to manage application software. The single compute node (PBaran) of the ve partition can be used interactively -- see Listing 2 -- or in batch mode (see further below).

[Listing 2] Running an interactive Slurm session on Rysy/PBaran:

  $ srun -A GRANT_ID -p ve --gres=ve:1 --pty bash -l

Once the interactive shell session has started, the environment variable $VE_NODE_NUMBER is automatically set to control which VE card is to be used by user programs. This variable can be read and set manually with the echo [6] and export [7] commands, respectively.

The software used to operate the VEs -- including binaries, libraries, header files, etc. -- is installed in the /opt/nec/ve directory. Its effective use requires modification of environment variables [8], such as $PATH, $LD_LIBRARY_PATH and others, which can be done conveniently with the source command [9]:

[Listing 3] Sourcing VE environment variables:

  $ source /opt/nec/ve/mpi/2.2.0/bin/necmpivars.sh

Sourcing the variables (Listing 3) makes various VE tools accessible within the user environment. This includes the NEC compilers for the C, C++, and Fortran languages, which can be invoked as ncc, nc++, and nfort, respectively, or through their respective MPI wrappers: mpincc, mpinc++, and mpinfort. Please note that several compiler versions are currently installed and it might be necessary to include a version number in your command, e.g. ncc-2.5.1. The general usage is consistent with GNU GCC. Table 1 lists several standard options for the NEC compilers; see the compiler documentation for details. The last five of them are used for performance analysis and allow for efficient software development. Some of these, apart from being used as command line options at compile time, also rely on dedicated environment variables that need to be set at runtime. For a full list of performance-related options and variables, as well as a description of their output, see the PROGINF/FTRACE User's Guide [10] and the compiler-specific documentation [11, 12].

[Table 1] Several basic options for the NEC compilers:

  Option                Description
  -c                    create object file
  -o                    output file name
  -I/path/to/include    include header files
  -L/path/to/lib        include libraries
  -g                    debugger symbols
  -Wall                 enable syntax warnings
  -Werror               treat warnings as errors
  -O[0-4]               optimisation levels
  -ftrace               use the profiler
  -proginf              enable execution analysis
  -report-all           report diagnostics
  -traceback            provide traceback information
  -fdiag-vector=[0-3]   level of detail for vector diagnostics
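As a brief illustration of how these options combine in practice, the commands below sketch a typical build; the source file name (example.c) and the particular options chosen are hypothetical, and the same flags apply to nc++, nfort, and the MPI wrappers:

  $ ncc -O2 -Wall -c example.c                       # compile to an object file (example.o)
  $ ncc -o example example.o                         # link into the executable 'example'
  $ ncc -O2 -proginf -ftrace -o example example.c    # one-step build with profiling support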
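At runtime, the profiling features enabled above are driven by environment variables and post-processing tools. The following is a minimal sketch, assuming (per the PROGINF/FTRACE User's Guide [10]) that the PROGINF report is requested through the VE_PROGINF variable and that an -ftrace build writes an ftrace.out file readable with the ftrace command; verify the exact names against the installed documentation:

  $ export VE_PROGINF=DETAIL   # request a detailed PROGINF report at program exit
  $ ./example                  # the report is printed once the program completes
  $ ftrace -f ftrace.out       # analyse the function-level trace produced by the -ftrace build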
The binaries can be run directly by specifying the path or by using the VE loader program (ve_exec) -- a few examples, including parallel execution, are gathered in Listing 4. For a full listing of options available for mpirun, see the corresponding manual page [13] or issue the mpirun -h command.

[Listing 4] Executing serial and parallel VE programs:

  $ ./program
  $ ve_exec ./program
  $ mpirun ./program
  $ mpirun -v -np 2 -ve 0-1 ./program   # enables VE cards 0 and 1

The non-interactive mode of operation is batch mode, which requires a script to be submitted to Slurm. An example job script is shown in Listing 5. It specifies the name of the job (-J), the requested number of nodes (-N), CPUs (--ntasks-per-node), memory (--mem; here in megabytes), wall time limit (--time), grant ID (-A), partition (-p), generic resources (--gres), output file (--output), and the actual commands to be executed once the resources are granted. See the Slurm documentation for an extensive list of available options [14].

[Listing 5] Example Slurm job script:

  #!/bin/bash -l
  #SBATCH -J name
  #SBATCH -N 1
  #SBATCH --ntasks-per-node 1
  #SBATCH --mem 1000
  #SBATCH --time=1:00:00
  #SBATCH -A <Grant ID>
  #SBATCH -p ve
  #SBATCH --gres=ve:1
  #SBATCH --output=out

  ./program

Listing 6 provides a few basic example commands used to work with job scripts: submitting the job (sbatch), which returns the ID number assigned to it by the queuing system; listing the user's jobs along with their status (squeue); listing the details of a specified job (scontrol); and cancelling execution of a job (scancel). Consult the documentation for more [14].

[Listing 6] Example Slurm commands:

  $ sbatch job.sl            # submits the job
  $ squeue -u $USER          # lists the user's current jobs
  $ scontrol show job <ID>   # lists the details of the job with the given <ID>
  $ scancel <ID>             # cancels the job with the given <ID>

Since there is no dedicated filesystem to be used for calculations on the Rysy cluster (in contrast to other ICM systems), jobs should be run from within the $HOME directory. The ve partition (PBaran compute node) is intended for jobs utilizing VE cards, and as such it should not be used for intensive CPU-consuming tasks.

Full documentation for NEC SX-Aurora TSUBASA, its hardware and software components, is available at the NEC website [15]. An accessible introduction to using VEs is also provided on a dedicated blog [16].

References:

[3]  SSH: Secure Shell
     https://en.wikipedia.org/wiki/Secure_Shell
[4]  Slurm Workload Manager
     https://slurm.schedmd.com/overview.html
[5]  Environment Modules
     https://modules.readthedocs.io/en/latest
[6]  echo (command)
     https://en.wikipedia.org/wiki/Echo_(command)
[7]  export command
     https://ss64.com/bash/export.html
[8]  Environment variable
     https://en.wikipedia.org/wiki/Environment_variable
[9]  source command
     https://ss64.com/bash/source.html
[10] PROGINF/FTRACE User's Guide
     https://www.hpc.nec/documents/sdk/pdfs/g2at03e-PROGINF_FTRACE_User_Guide_en.pdf
[11] NEC C/C++ Compiler User's Guide
     https://www.hpc.nec/documents/sdk/pdfs/g2af01e-C++UsersGuide-016.pdf
[12] NEC Fortran Compiler User's Guide
     https://www.hpc.nec/documents/sdk/pdfs/g2af02e-FortranUsersGuide-016.pdf
[13] mpirun command
     https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php
[14] Slurm Workload Manager: Documentation
     https://slurm.schedmd.com/documentation.html
[15] NEC SX-Aurora TSUBASA Documentation
     https://www.hpc.nec/documents/
[16] NEC Blog: First Steps with the SX-Aurora TSUBASA vector engine
     https://sx-aurora.github.io/posts/VE-first-steps