Then, to get "clean" timing measurements, you should do a release build. (Obviously the above configurations can be performed with a single cmake command.)
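A minimal sketch of such a release configuration (assuming ug4's DEBUG CMake option; the exact option names may differ for your ug4 version and profiler setup):

    # hypothetical release configuration for timing runs; adjust options and paths to your setup
    cd build
    cmake -DDEBUG=OFF ..
    make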
Timing measurements are only useful at points where the processes are synchronised, e.g. after the computation of global norms, or after calls like pcl::Synchronize() or pcl::AllProcsTrue(). For weak scalability (e.g. of GMG): check whether the number of iterations is constant over all problem sizes.
The approach of (re)distributing the grid to all MPI processes involved in a simulation run in a hierarchical fashion turned out to be essential for a good performance of large jobs (running with >= 1024 PE) on JuGene (see Working with ug4 on JuGene).
Hierarchical (re)distribution is controlled by the following parameters (an example invocation is given after this list):

- -numPreRefs (as usual): level on which the grid is distributed for the first time.
- -numRefs (as usual): toplevel of the grid hierarchy.
- -hRedistFirstLevel (default -1): first level on which the grid is redistributed (the default, -1, means that hierarchical redistribution is deactivated).
- -hRedistStepSize (default 1): specifies after how many further refinements the grid is redistributed again.
- -hRedistNewProcsPerStep (default 2^dim): number of MPI processes ("target procs") to which each processor that has already received its part of the grid redistributes it in a redistribution step.
- -hRedistMaxSteps (default 1000000000; not used in the example above): limits the number of redistribution steps (to avoid useless redistributions involving only a few processes at the "end of the hierarchy").

Note that numPreRefs < hRedistFirstLevel < numRefs must hold.
Sketch of the algorithm:

1. Refine the grid numPreRefs times => toplevel of the grid hierarchy is now level numPreRefs.
2. Distribute the grid to k MPI processes (k will be explained below).
3. Refine the grid until hRedistFirstLevel is reached.
4. Redistribute the grid to hRedistNewProcsPerStep MPI processes for every process which has already received a part of the grid.
5. Refine the grid hRedistStepSize times.
6. If numRefs refinement steps, and also the number of redistribution steps controlled by -hRedistMaxSteps, are not yet reached, go to 4 (else: finished).

Now all MPI processes of the simulation run have their part of the grid.

To make things clear: After numPreRefs refinement steps the grid is distributed for the first time, and it is redistributed on the levels hRedistFirstLevel + i * hRedistStepSize (as long as numPreRefs + hRedistFirstLevel + i * hRedistStepSize < numRefs and 0 <= i < hRedistMaxSteps). So, all parameters with name part "Redist" refer to the redistribution of an already distributed grid.
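A small worked example, reusing the hypothetical parameter values from the invocation above and applying the rule exactly as stated:

    # numPreRefs = 2, hRedistFirstLevel = 4, hRedistStepSize = 2, numRefs = 10
    # first distribution : after numPreRefs refinements, i.e. on level 2
    # redistributions    : on levels 4 + 2*i  as long as  2 + 4 + 2*i < 10
    #                      => i = 0, 1, i.e. on levels 4 and 6 (two redistribution steps)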
The number of processes k in the first distribution step is determined by the (total) number of MPI processes, numProcs, on the one hand, and the other redistribution parameters on the other, working "from top" (i.e. the topmost redistribution level) "to bottom" (the first distribution step): numProcs / hRedistNewProcsPerStep is the number of target procs to which the grid is distributed in the second-last redistribution step, numProcs / hRedistNewProcsPerStep / hRedistNewProcsPerStep the number of target procs in the third-last redistribution step (or in the first distribution step, if only two redistribution steps are performed), etc.
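Continuing the hypothetical example (two redistribution steps, as above), and assuming the run uses 1024 MPI processes:

    # numProcs = 1024, hRedistNewProcsPerStep = 4, two redistribution steps
    # last redistribution step             : 1024 target procs (all processes)
    # second-last (= first) redistribution : 1024 / 4     = 256 target procs
    # first distribution step              : 1024 / 4 / 4 =  64 target procs
    # => the grid is initially distributed to k = 64 processes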
The hierarchical redistribution parameters are used e.g. in parameter_test.lua (see e.g. ddu.PrintSteps()) as well as by the function that actually carries out the (re)distribution (ddu.RefineAndDistributeDomain(); cf. domain_distribution_util.lua). Please note that hierarchical redistribution is not compatible with "grid distribution type" (distributionType) "grid2d" (see Mapping of MPI processes). At the moment (March 2012), grid distribution type "metis" is also unsupported.
See also e.g. ll_scale_gmg.x (in scripts/shell/) for usage examples (specific to JuGene, but also instructive in general).
"Topology aware mapping" of MPI processes to nodes / cores with respect to the network topology of the parallel machine on which a parallel job is run might be important one day.
-distType: available values are "grid2d", "bisect", "metis".

Example scalability test scripts:
- apps/scaling_tests/modular_scalability_test.lua
- apps/d3f/elder_scalability_test.lua

Profiling results of several runs can be analysed with scripts/tools/scaling_analyzer.lua. To get logfiles, use the ugshell parameter -logtofile. This is of course not necessary if logfiles are automatically created by the resource manager (e.g. on JuGene; see General Information about JuGene). In the variable inFiles of the analyzer script, enter the names of the logfiles of all runs whose profiling results should be analysed (edit a local copy). If ugshell is executable on the machine used for this analysis (which is not the case if you are working on a login node of e.g. JuGene), one can also execute ugshell -ex jugene/scaling_analyzer.lua (adapt the file paths in inFiles relative to ugshell). See also the util.printStats(stats) functionality, e.g. in apps/scaling_tests/modular_scalability_test.lua.
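A hypothetical end-to-end sketch of this workflow (launcher, process counts, refinement counts and logfile names are assumptions; inFiles must be edited to match the chosen logfile names):

    # run the scalability test for several process counts, writing one logfile per run
    mpirun -np 64  ugshell -ex scaling_tests/modular_scalability_test.lua -numRefs 7 -logtofile scale_064
    mpirun -np 256 ugshell -ex scaling_tests/modular_scalability_test.lua -numRefs 8 -logtofile scale_256
    # enter the logfile names in 'inFiles' of a local copy of the analyzer script, then run it
    ugshell -ex jugene/scaling_analyzer.lua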