ug4
Cekon

Cekon is the smaller in-house cluster of the G-CSC. It consists of 23 compute nodes with 4 cores each, for a total of 92 compute cores.

  • Configuration: Normally, running CMake with "standard" parameters is sufficient.

    The pre-installed GCC 4.1.2 has some known problems, see GCC 4.1.2. Alternatively, you can use icc as the compiler.

  • You can also use llvm/clang as compiler:
    cmake -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang ..
    
    Note
    Clang produces debug information only in the DWARF 4 format, which the gdb on this machine does not support. Use ddt instead.

  • To start jobs on Cekon, use ugsubmit / uginfo / ugcancel (see ugsubmit - Job Scheduling on Clusters).

    To start a job with 16 processes and arguments -ex conv_diff/laplace.lua -numPreRef 3 -numRefs 9, use

    ugsubmit 16 --- ugshell -ex conv_diff/laplace.lua -numPreRef 3 -numRefs 9 
    

    For information about your jobs, use uginfo. To kill the job with id 2359, use ugcancel 2359. Internally, ugsubmit passes all the information to the appropriate job scheduler of the cluster (for Cekon this is SLURM).

  • If you want to call the job scheduler (SLURM) directly instead, start your job with

    salloc -n 16 mpirun ugshell -ex conv_diff/laplace.lua -numPreRef 3 -numRefs 9
    Note
    Be aware that this will only work on Cekon, whereas the ugsubmit scripts are just as easy to use and work on a number of clusters.

    Please note that the salloc parameter -n reserves a number of processes / cores for a job, while -N (capital N) reserves a number of nodes. See the salloc manual page for further details.
    To display information about jobs already running (located in the SLURM scheduling queue) use the squeue command.

  • Debugging: If DDT is installed, simply type ddt in the Unix shell to start the debugger, then fill in the definition of the job to be debugged in the (X11-based) GUI. Everything should be quite self-explanatory.
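
The difference between -n (processes / cores) and -N (nodes) can be made concrete with a little arithmetic on the cluster size given above (23 nodes with 4 cores each). The following is only a minimal sketch: cekon_nodes_needed is a hypothetical helper, not part of ug4 or SLURM, and it assumes that every core of a node is available to your job.

```shell
#!/bin/sh
# Hypothetical helper, not part of ug4 or SLURM.
# Cluster size as stated in this page: 23 nodes with 4 cores each.
CEKON_NODES=23
CEKON_CORES_PER_NODE=4

# Print the minimum number of nodes that can host $1 processes.
# Returns 1 if $1 is not a number or does not fit on the cluster.
cekon_nodes_needed() {
    procs=$1
    case "$procs" in
        ''|*[!0-9]*) echo "not a non-negative integer: $procs" >&2; return 1 ;;
    esac
    total=$((CEKON_NODES * CEKON_CORES_PER_NODE))
    if [ "$procs" -lt 1 ] || [ "$procs" -gt "$total" ]; then
        echo "request must be between 1 and $total processes" >&2
        return 1
    fi
    # Integer ceiling of procs / cores-per-node.
    echo $(( (procs + CEKON_CORES_PER_NODE - 1) / CEKON_CORES_PER_NODE ))
}

cekon_nodes_needed 16   # prints 4
```

Under these assumptions, the 16 processes of the example above occupy at least 4 nodes, so a request such as salloc -N 4 -n 16 would ask SLURM for the same amount of resources expressed in both units.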

Troubleshooting

  • If you encounter an error like the following when submitting your job on Cekon, you might have ignored the general remark in section General Notes:

    salloc -n 64 ./ugshell -ex conv_diff/laplace.lua -numPreRefs 3 -numRefs 7
    salloc: Granted job allocation 14437
    libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
        This will severely limit memory registrations.
    --------------------------------------------------------------------------
    WARNING: There was an error initializing an OpenFabrics device.
    
      Local host:   cekon.gcsc.uni-frankfurt.de
      Local device: mthca0
    --------------------------------------------------------------------------
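
The RLIMIT_MEMLOCK value in the warning above is the locked-memory limit of the shell that started the job. As a rough sketch for inspecting it (this is not the remedy itself, which is described in General Notes), the shell builtin ulimit -l reports that limit in kbytes, so the 32768 bytes from the log correspond to a reported value of 32. memlock_bytes is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a value as printed by `ulimit -l`
# (a number of kbytes, or the word "unlimited") to bytes.
memlock_bytes() {
    case "$1" in
        unlimited) echo unlimited ;;
        ''|*[!0-9]*) echo "unexpected value: $1" >&2; return 1 ;;
        *) echo $(( $1 * 1024 )) ;;
    esac
}

# Locked-memory limit of the current shell, in bytes (or "unlimited").
memlock_bytes "$(ulimit -l)"
```

Running this in the shell from which you submit jobs shows whether that shell has the same limit as the one reported by libibverbs.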