Cekon is the smaller in-house cluster of the G-CSC. It consists of 23 compute nodes with 4 cores per node, for a total of 92 compute cores.
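Configuration: Normally a run of CMake with "standard" parameters should do the job.
There are some problems with the pre-installed GCC 4.1.2, see GCC 4.1.2. You can also use a different compiler such as icc. Compilers are selected through the CMake variables CMAKE_C_COMPILER and CMAKE_CXX_COMPILER, for example for clang: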
cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang ..
To start jobs on Cekon, use ugsubmit / uginfo / ugcancel (see ugsubmit - Job Scheduling on Clusters).
To start a job with 16 processes and arguments -ex conv_diff/laplace.lua -numPreRef 3 -numRefs 9, use:
ugsubmit 16 --- ugshell -ex conv_diff/laplace.lua -numPreRef 3 -numRefs 9
For information about your jobs, use uginfo. To kill the job with id 2359, use ugcancel 2359.
Internally, ugsubmit passes all the information to the appropriate job scheduler of the cluster (for Cekon this is SLURM).
If you want to do the job scheduling yourself with SLURM, start your job with:
salloc -n 16 mpirun ugshell -ex conv_diff/laplace.lua -numPreRef 3 -numRefs 9
Please note that the salloc parameter -n (lowercase) reserves a number of processes / cores for a job, while -N (capital N) reserves a number of nodes. See the salloc manual page for further details.
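The relation between the two parameters can be sketched with a little shell arithmetic. This is only an illustration: the helper name nodes_needed is hypothetical, and it assumes one process per core and the 4 cores per node stated above.

```shell
# Cores per node on Cekon, as stated above (23 nodes with 4 cores each).
CORES_PER_NODE=4

# nodes_needed N: smallest number of nodes that can hold N processes,
# assuming one process per core (ceiling division).
# Hypothetical helper, not part of ug4 or SLURM.
nodes_needed() {
    echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

nodes_needed 16   # prints 4
nodes_needed 17   # prints 5
nodes_needed 92   # prints 23
```

Under these assumptions, a request with -n 16 occupies at least 4 nodes, whereas -N 16 would ask for 16 whole nodes.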
To display information about jobs already running (located in the SLURM scheduling queue), use the squeue command.
Type ddt in the Unix shell to start the debugger and fill in the definition of the job to be debugged in the GUI (X11 based). Everything should be quite self-explanatory.
Troubleshooting: If you encounter an error like the following when submitting your job on Cekon, you might have ignored the general remark in section General Notes:
salloc -n 64 ./ugshell -ex conv_diff/laplace.lua -numPreRefs 3 -numRefs 7
salloc: Granted job allocation 14437
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: cekon.gcsc.uni-frankfurt.de
Local device: mthca0
--------------------------------------------------------------------------
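The warning refers to RLIMIT_MEMLOCK, the locked-memory limit of the submitting shell. As a minimal sketch, the limit can be inspected with the shell builtin ulimit -l before submitting. The helper name describe_memlock is hypothetical, and the sketch assumes bash, which reports this limit in units of 1024 bytes. It only displays the limit. The remedy itself is described in section General Notes.

```shell
# describe_memlock VALUE: print a locked-memory limit in readable form.
# VALUE is what `ulimit -l` prints: "unlimited" or a number of KiB (bash).
# Hypothetical helper for illustration only.
describe_memlock() {
    if [ "$1" = "unlimited" ]; then
        echo "memlock limit: unlimited"
    else
        echo "memlock limit: $1 KiB ($(( $1 * 1024 )) bytes)"
    fi
}

# Show the limit of the current shell.
describe_memlock "$(ulimit -l)"
```

A shell with the limit from the log above would report 32 KiB (32768 bytes).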