Dynamic Load Balancing Library

DLB is a dynamic library designed to speed up hybrid HPC applications (i.e., applications with two levels of parallelism). It improves the load balance of the outer level of parallelism (e.g., MPI) by dynamically redistributing the computational resources of the inner level of parallelism (e.g., OpenMP) at run time.

This dynamism allows DLB to react to different sources of imbalance: algorithm, data, hardware architecture and resource availability, among others.

Lend When Idle

LeWI (Lend When Idle) is the algorithm used to redistribute computational resources that one process is not using to another process inside the same shared memory node, in order to speed up the execution of the receiving process.
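
To give a feel for why lending idle CPUs helps, here is a back-of-the-envelope sketch in shell. It is not DLB code: the function name `lewi_toy` and the model itself (two processes on one node, perfectly divisible work, ideal scaling, integer inputs that divide evenly) are assumptions made for illustration only.

```shell
# Toy model, not DLB: two processes share a node, each owning the same
# number of CPUs. Work is measured in abstract CPU-time units.
# Usage: lewi_toy <work_a> <work_b> <cpus_per_process>
lewi_toy() {
    local work_a="$1" work_b="$2" cpus="$3"
    local small large t_static t_first remaining t_lewi
    if [ "$work_a" -lt "$work_b" ]; then
        small=$work_a; large=$work_b
    else
        small=$work_b; large=$work_a
    fi
    # Without lending, the node waits for the most loaded process.
    t_static=$(( large / cpus ))
    # With lending, the less loaded process finishes first ...
    t_first=$(( small / cpus ))
    # ... and the other one finishes its remaining work on all CPUs.
    remaining=$(( large - t_first * cpus ))
    t_lewi=$(( t_first + remaining / (2 * cpus) ))
    echo "static=$t_static lewi=$t_lewi"
}

lewi_toy 8 4 2   # prints: static=4 lewi=3
```

In this idealized case the imbalanced run drops from 4 to 3 time units. Real gains depend on how well the application scales and on when processes actually become idle.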

Dynamic Resource Ownership Manager

DROM (Dynamic Resource Ownership Manager) is the algorithm used to manage the CPU affinity of a process running a shared memory programming model (e.g., OpenMP).

Tracking Application Live Performance

TALP (Tracking Application Live Performance) is the module used to gather performance data from the application. The data can be obtained during the execution or as a report at the end.

Installation

Build requirements
- A supported platform running GNU/Linux (i386, x86-64, ARM, PowerPC or IA64)
- C compiler
- Python 2.4 or higher (Python 3 recommended)
- GNU Autotools, only needed if you want to build from the repository.
Download the DLB source code:
1. Either from our website: DLB Downloads.
2. Or from a git repository
  - Clone DLB repository
    - From GitHub:
      git clone https://github.com/bsc-pm/dlb.git
    - From our internal GitLab repository (BSC users only):
      git clone https://pm.bsc.es/gitlab/dlb/dlb.git
  - Or download from GitHub releases
  - Bootstrap autotools:
```
cd dlb
./bootstrap
```
Run configure. Optionally, run ./configure -h first to see detailed information about the available flags and features. MPI support must be enabled with --with-mpi, which optionally takes an argument indicating where MPI is installed.
```
./configure --prefix=<DLB_PREFIX> [<configure-flags>]
```
Build and install
```
make
make install
```
Optionally, add the installed bin directory to your PATH
```
export PATH=<DLB_PREFIX>/bin:$PATH
```

For more information about the autotools installation process, please refer to the INSTALL file.
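
After `make install`, a quick sanity check of the prefix can catch a wrong `--prefix` early. The helper below is a hypothetical sketch, not something DLB ships: the name `check_dlb_prefix` is made up, and it only looks for the two paths that this README itself uses (`lib/libdlb.so` and `bin/dlb_run`).

```shell
# Hypothetical helper (not part of DLB): check that a prefix contains the
# files referenced in this README. Returns 0 if all are present.
check_dlb_prefix() {
    local prefix="$1" status=0 path
    for path in "$prefix/lib/libdlb.so" "$prefix/bin/dlb_run"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Example call, with your own prefix in place of the placeholder path:
#   check_dlb_prefix /path/to/dlb/prefix && echo "DLB installation looks complete"
```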

Basic usage

Either link your binary against the DLB shared library libdlb.so or preload the library at launch time, then configure DLB through the environment variable DLB_ARGS.
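
Since DLB_ARGS is a space-separated list of flags, job scripts that enable modules in several places may end up repeating them. The following is a minimal sketch of a helper that appends a flag only if it is not already there. The function name `dlb_args_add` is hypothetical and not part of DLB; the flags shown are the ones used in the examples of this README.

```shell
# Hypothetical helper (not part of DLB): append a flag to DLB_ARGS
# unless the exact same flag is already present.
dlb_args_add() {
    case " ${DLB_ARGS:-} " in
        *" $1 "*) ;;                                  # already set
        *) DLB_ARGS="${DLB_ARGS:+$DLB_ARGS }$1" ;;    # append with a space
    esac
    export DLB_ARGS
}

DLB_ARGS=""
dlb_args_add --lewi
dlb_args_add --ompt
dlb_args_add --lewi      # ignored, already present
echo "$DLB_ARGS"         # prints: --lewi --ompt
```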

Example 1: Share CPUs between MPI processes

# Link application with DLB
mpicc -o myapp myapp.c -L<DLB_PREFIX>/lib -ldlb -Wl,-rpath,<DLB_PREFIX>/lib

# Launch MPI as usual, each process will dynamically adjust the number of threads
export DLB_ARGS="--lewi"
mpirun -n <np> ./myapp

Example 2: Share CPUs between MPI processes with advanced affinity control through OMPT.

# Link application with an OMPT capable OpenMP runtime
OMPI_CC=clang mpicc -o myapp myapp.c -fopenmp

# Launch application:
#   * Set environment variables
#   * DLB library is preloaded
#   * Run application with binary dlb_run
export DLB_ARGS="--lewi --ompt"
export OMP_WAIT_POLICY="passive"
preload="<DLB_PREFIX>/lib/libdlb.so"
mpirun -n <np> <DLB_PREFIX>/bin/dlb_run env LD_PRELOAD="$preload" ./myapp

Example 3: Manually reduce assigned CPUs to an OpenMP process.

# Launch an application preloading DLB
export OMP_NUM_THREADS=4
export DLB_ARGS="--drom"
export LD_PRELOAD=<DLB_PREFIX>/lib/libdlb.so
taskset -c 0-3 ./myapp &

# Reduce CPU binding to [1,3] and threads to 2
myapp_pid=$!
dlb_taskset -p $myapp_pid -c 1,3
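
In Example 3 the CPU list shrinks from 0-3 to 1,3, which is why the process goes from 4 threads to 2. When scripting such changes it can help to compute the CPU count from the list. The sketch below is a hypothetical helper, not a DLB tool: `count_cpus` is a made-up name, and it only handles plain lists of CPUs and ranges (no stride syntax such as 0-10:2).

```shell
# Hypothetical helper (not part of DLB): count the CPUs named by a
# taskset-style list such as "0-3" or "1,3".
count_cpus() {
    local list="$1" total=0 item first last
    local IFS=,
    for item in $list; do
        first=${item%-*}     # "0-3" -> 0, "1" -> 1
        last=${item#*-}      # "0-3" -> 3, "1" -> 1
        total=$(( total + last - first + 1 ))
    done
    echo "$total"
}

count_cpus 0-3    # prints: 4
count_cpus 1,3    # prints: 2
```

For instance, `export OMP_NUM_THREADS=$(count_cpus 0-3)` would keep the initial thread count in sync with the list passed to taskset.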

Example 4: Get a TALP summary report at the end of an execution

export DLB_ARGS="--talp --talp-summary=pop-metrics"
PRELOAD=<DLB_PREFIX>/lib/libdlb_mpi.so
mpirun <opts> env LD_PRELOAD="$PRELOAD" ./app

User Guide

Please refer to our DLB User Guide for more complete documentation.

Citing DLB

If you want to cite DLB, you can use the following publications:

- Hints to improve automatic load balancing with LeWI for hybrid applications, Journal of Parallel and Distributed Computing, 2014.
- LeWI: A Runtime Balancing Algorithm for Nested Parallelism, International Conference on Parallel Processing (ICPP09), 2009.
- DROM: Enabling Efficient and Effortless Malleability for Resource Managers, 47th International Conference on Parallel Processing, August 2018.
- TALP: A Lightweight Tool to Unveil Parallel Efficiency of Large-scale Executions, Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy, June 2021.

Contact Information

For questions, suggestions and bug reports, you can contact us via e-mail at [email protected].
