NEC SX-Aurora TSUBASA: ICM User's Guide
Last revision: 2023-10-27

2. Basic usage

To use the ICM's TSUBASA installation, users must first access the
login node at hpc.icm.edu.pl through SSH [3] and then establish a
further connection to the Rysy cluster as in Listing 1. Alternatively,
the -J command line option can be passed to the (OpenSSH) client
application to specify a jump host (here, the hpc login node) through
which the connection will be established (issue the man ssh command
for details).

[Listing 1] Accessing NEC SX-Aurora TSUBASA installation at ICM:
$ ssh username@hpc.icm.edu.pl
$ ssh rysy
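
The jump-host alternative mentioned above can be expressed on the
command line or, persistently, in the SSH client configuration; the
rysy host alias below mirrors Listing 1:

```shell
# One-step connection through the hpc login node as a jump host:
ssh -J username@hpc.icm.edu.pl rysy

# Equivalent persistent entry in ~/.ssh/config:
#   Host rysy
#       ProxyJump username@hpc.icm.edu.pl
```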

The system runs Slurm Workload Manager [4] for job scheduling and
Environment Modules [5] to manage application software. The single
compute node (PBaran) of the ve partition can be used interactively --
see Listing 2 -- or as a batch job (see further below).

[Listing 2] Running interactive Slurm session on Rysy/PBaran:
$ srun -A GRANT_ID -p ve --gres=ve:1 --pty bash -l

Once the interactive shell session has started, the environment
variable $VE_NODE_NUMBER is set automatically to control which VE
card is used by the user's programs. This variable can be read and
set manually with the echo [6] and export [7] commands,
respectively. The software used to operate the VEs -- including
binaries, libraries, header files, etc. -- is installed in the
/opt/nec/ve directory. Its effective use requires modification of the
environment variables [8], such as $PATH, $LD_LIBRARY_PATH, and
others, which can be done conveniently with the source command [9]:

[Listing 3] Sourcing VE environmental variables:
$ source /opt/nec/ve/mpi/2.2.0/bin/necmpivars.sh
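
The VE card selection mentioned earlier can likewise be checked and
changed from the shell (the value 0 below is an example, not a
recommendation):

```shell
# Inspect the VE card number assigned for this session:
echo $VE_NODE_NUMBER

# Override it manually, e.g. to select VE card 0:
export VE_NODE_NUMBER=0
```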

Sourcing the variables (Listing 3) makes various VE tools accessible
within the user environment. This includes the NEC compilers for the
C, C++, and Fortran languages, invoked as ncc, nc++, and nfort,
respectively, or through their MPI wrappers: mpincc, mpinc++, and
mpinfort. Please note that several compiler versions are currently
installed, so it might be necessary to include a version number in
the command, e.g. ncc-2.5.1. The general usage is consistent with
that of GNU GCC.
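
As a sketch of this GCC-like usage, a C source file might be compiled
as follows (file names and the exact version suffix are illustrative):

```shell
ncc -O2 -Wall -o hello hello.c        # default compiler version
ncc-2.5.1 -O2 -Wall -o hello hello.c  # version-suffixed invocation
mpincc -O2 -o hello_mpi hello_mpi.c   # MPI compiler wrapper for C
```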

Table 1 lists several standard options for the NEC compilers; see the
compiler documentation for details. The last few of them are used for
performance analysis
and allow for efficient software development. Some of these, apart
from being used as command line options at compile time, also rely on
dedicated environmental variables that need to be set at runtime. For
a full list of performance-related options, variables, as well as
their output description, see PROGINF/FTRACE User's Guide [10] and the
compiler-specific documentation [11, 12].

[Table 1] Several basic options for the NEC compilers:
Option Description
-c			create object file
-o			output file name
-I/path/to/include	include header files
-L/path/to/lib		include libraries
-g			debugger symbols
-Wall			enable syntax warnings
-Werror			treat warnings as errors
-O[0-4]			optimisation levels
-ftrace			use the profiler
-proginf		enable execution analysis
-report-all		report diagnostics
-traceback		provide traceback information
-fdiag-vector=[0-3]	level of details for vector diagnostics
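
As an illustrative sketch of combining compile-time flags with
runtime settings (the program name is hypothetical; consult [10] for
the authoritative environment variable names and output formats):

```shell
# Compile with execution analysis and profiling enabled:
ncc -O2 -proginf -ftrace -o program program.c

# Request PROGINF output at runtime via its environment variable
# (see [10] for the accepted values):
export VE_PROGINF=DETAIL
./program

# -ftrace makes the run produce ftrace.out* files, which can be
# inspected with the ftrace command from the VE toolchain:
ftrace -f ftrace.out
```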

The binaries can be run directly by specifying the path or by using
the VE loader program (ve_exec) -- a few examples including parallel
execution are gathered in Listing 4. For a full listing of options
available for mpirun, see the corresponding manual page [13] or issue
the mpirun -h command.

[Listing 4] Executing serial and parallel VE programs:
$ ./program
$ ve_exec ./program
$ mpirun ./program
$ mpirun -v -np 2 -ve 0-1 ./program # enables VE cards 0 and 1

The non-interactive mode of operation is batch mode, which requires a
script to be submitted to Slurm. An example job script is shown in
Listing 5. It specifies the name of the job (-J), the requested number
of nodes (-N), tasks per node (--ntasks-per-node), memory (--mem;
here in megabytes), wall time limit (--time), grant ID (-A),
partition (-p), generic resources (--gres), output file (--output),
and the actual commands to be executed once the resources are
granted. See the Slurm documentation for an extensive list of
available options [14].

[Listing 5] Example Slurm job script:
#!/bin/bash -l
#SBATCH -J name
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH --mem 1000
#SBATCH --time=1:00:00
#SBATCH -A <Grant ID>
#SBATCH -p ve
#SBATCH --gres=ve:1
#SBATCH --output=out
./program

Listing 6 provides a few basic example commands used to work with job
scripts: submitting the job (sbatch) which returns the ID number
assigned to it by the queuing system, listing the user's jobs along
with their status (squeue), listing the details of a specified job
(scontrol), cancelling execution of a job (scancel). Consult the
documentation for more [14].

[Listing 6] Example Slurm commands:
$ sbatch job.sl # submits the job
$ squeue -u $USER # lists the user's current jobs
$ scontrol show job <ID> # lists the details of the job by given <ID>
$ scancel <ID> # cancels the job with given <ID>

Since there is no dedicated filesystem for calculations on the Rysy
cluster, jobs should be run from within the $HOME directory. The ve
partition (the PBaran compute node) is intended for jobs utilizing VE
cards and as such should not be used for CPU-intensive tasks.

Full documentation for NEC SX-Aurora TSUBASA, its hardware and
software components, is available at the NEC website [15]. An
accessible introduction to using VEs is also provided on a dedicated
blog [16].

References:

[3] SSH: Secure Shell
    https://en.wikipedia.org/wiki/Secure_Shell
[4] Slurm Workload Manager
    https://slurm.schedmd.com/overview.html
[5] Environment Modules
    https://modules.readthedocs.io/en/latest
[6] echo (command)
    https://en.wikipedia.org/wiki/Echo_(command)
[7] export command
    https://ss64.com/bash/export.html
[8] Environment variable
    https://en.wikipedia.org/wiki/Environment_variable
[9] source command
    https://ss64.com/bash/source.html
[10] PROGINF/FTRACE User's Guide
     https://www.hpc.nec/documents/sdk/pdfs/
     g2at03e-PROGINF_FTRACE_User_Guide_en.pdf
[11] NEC C/C++ Compiler User's Guide
     https://www.hpc.nec/documents/sdk/pdfs/
     g2af01e-C++UsersGuide-016.pdf
[12] NEC Fortran Compiler User's Guide
     https://www.hpc.nec/documents/sdk/pdfs/
     g2af02e-FortranUsersGuide-016.pdf
[13] mpirun command
     https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php
[14] Slurm Workload Manager: Documentation
     https://slurm.schedmd.com/documentation.html
[15] NEC SX-Aurora TSUBASA Documentation
     https://www.hpc.nec/documents/
[16] NEC Blog: First Steps with the SX-Aurora TSUBASA vector engine
     https://sx-aurora.github.io/posts/VE-first-steps