Then, to get "clean" timing measurements, you should do a release build, e.g.:
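(ug4 may offer its own cmake switches for this; the standard CMake release option is used below as an assumption.)

    # configure an optimised release build; run this from the build directory
    cmake -DCMAKE_BUILD_TYPE=Release ..
    make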
(Obviously the above configurations can be performed by only one cmake command.)
Timing measurements are only useful at points where processes are synchronised, e.g. after computing global norms, or after calls to pcl::Synchronize(), pcl::AllProcsTrue() ...
For weak scalability, e.g. of GMG: check whether the number of iterations stays constant over all problem sizes.
The approach of (re)distributing the grid to all MPI processes involved in a simulation run in a hierarchical fashion turned out to be essential for good performance of large jobs (>= 1024 PEs) on JuGene (see Working with ug4 on JuGene).
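A run with hierarchical redistribution enabled might be launched as follows (a sketch only: the launcher, the script path, the number of processes and all parameter values are merely illustrative):

    # example job with hierarchical redistribution (all values are illustrative)
    mpirun -np 256 ugshell -ex apps/scaling_tests/modular_scalability_test.lua \
        -numPreRefs 3 -numRefs 9 \
        -hRedistFirstLevel 5 -hRedistStepSize 2 -hRedistNewProcsPerStep 4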
The hierarchical redistribution is controlled by the following parameters:

-numPreRefs (as usual): level on which the grid is distributed for the first time.
-numRefs (as usual): toplevel of the grid hierarchy.
-hRedistFirstLevel (default -1): first level on which the grid is redistributed (default -1, i.e. hierarchical redistribution is deactivated).
-hRedistStepSize (default 1): specifies after how many further refinements the grid is redistributed again.
-hRedistNewProcsPerStep (default: 2^dim): number of MPI processes ("target procs") to which each process that has already received its part of the grid redistributes it in a redistribution step.
-hRedistMaxSteps (default: 1000000000; not used in the example above): limits the number of redistribution steps (to avoid useless redistributions involving only a few processes at the "end of the hierarchy").

The parameters must fulfil numPreRefs < hRedistFirstLevel < numRefs.

Sketch of the algorithm:
1. Refine the grid numPreRefs times => the toplevel of the grid hierarchy is now level numPreRefs.
2. Distribute the grid to k MPI processes (k is explained below).
3. Refine the grid until level hRedistFirstLevel is reached.
4. Redistribute the grid to hRedistNewProcsPerStep MPI processes for every process which has already received a part of the grid.
5. Refine the grid hRedistStepSize times.
6. If neither numRefs refinement steps nor the number of redistribution steps controlled by -hRedistMaxSteps have been reached yet, go to 4 (else: finished). Now all MPI processes of the simulation run have their part of the grid.

To make things clear: After numPreRefs refinement steps the grid is distributed for the first time, and on the levels hRedistFirstLevel + i * hRedistStepSize it is redistributed (as long as numPreRefs + hRedistFirstLevel + i * hRedistStepSize < numRefs and 0 <= i < hRedistMaxSteps). So, all parameters whose names contain "Redist" refer to the redistribution of an already distributed grid.
The number of processes k of the first distribution step is determined by the (total) number of MPI processes, numProcs, on the one hand and by the redistribution parameters on the other hand, working "from top" (i.e. the topmost redistribution level) "to bottom" (the first distribution step):
numProcs / hRedistNewProcsPerStep is the number of target procs to which the grid is distributed in the second to last redistribution step,
numProcs / hRedistNewProcsPerStep / hRedistNewProcsPerStep is the number of target procs in the third to last redistribution step (or in the first distribution step, if only one redistribution step is performed), etc.
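For illustration, with the (purely exemplary) values of the invocation sketched above, i.e. numProcs = 256, hRedistNewProcsPerStep = 4 and two redistribution steps (on levels 5 and 7), the grid is first distributed to k = 256 / 4 / 4 = 16 processes, then redistributed to 64 processes on level 5 and to all 256 processes on level 7.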
The distribution parameters can be inspected with the same Lua utility (see parameter_test.lua, e.g. ddu.PrintSteps()) as the one that actually carries out the (re)distribution (ddu.RefineAndDistributeDomain(); cf. domain_distribution_util.lua). Please note that hierarchical redistribution is not compatible with the grid distribution type (distributionType) "grid2d" (see Mapping of MPI processes). At the moment (March 2012) the grid distribution type "metis" is also unsupported.
See also e.g. ll_scale_gmg.x (in scripts/shell/) for usage examples (specific to JuGene, but instructive in general).
"Topology aware mapping" of MPI processes to nodes / cores with respect to the network topology of the parallel machine on which a parallel job is run might be important one day.
-distType: available values: "grid2d", "bisect", "metis".

Example scripts: apps/scaling_tests/modular_scalability_test.lua and apps/d3f/elder_scalability_test.lua.

To analyse the profiling results of several runs, use scripts/tools/scaling_analyzer.lua. Write the output of every run to a logfile, e.g. via the ugshell parameter -logtofile. This is of course not necessary if logfiles are created automatically by the resource manager (e.g. on JuGene; see General Information about JuGene). In inFiles of the analyzer script, enter the names of the logfiles of all runs whose profiling results should be analysed (edit a local copy). If ugshell is executable on the machine used for this analysis (which is not the case if you are working on a login node of e.g. JuGene), one can also execute ugshell -ex jugene/scaling_analyzer.lua (adapt file paths in inFiles relative to ugshell). See also the util.printStats(stats) functionality, e.g. in apps/scaling_tests/modular_scalability_test.lua.
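A possible workflow might look like this (a sketch only: the logfile names and process counts are made up, and the exact form of the -logtofile argument should be checked against your ugshell version):

    # write the output of each run to its own logfile (names are illustrative)
    mpirun -np 16 ugshell -ex apps/scaling_tests/modular_scalability_test.lua -logtofile run_16pe
    mpirun -np 64 ugshell -ex apps/scaling_tests/modular_scalability_test.lua -logtofile run_64pe
    # afterwards, list those logfiles in inFiles of a local copy of scaling_analyzer.lua
    # and run the analyzer (here via ugshell, if it is executable on this machine)
    ugshell -ex jugene/scaling_analyzer.lua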