Use device arena for the_fa_arena when activating GPU-aware MPI (AMReX-Codes#3362)

## Summary

This change, suggested by @WeiqunZhang, points `the_fa_arena` to `The_Device_Arena` when GPU-aware MPI is activated. This obviates the need to set `the_arena_is_managed=0` to take advantage of GPU-aware MPI, which does not work well with managed memory.

## Additional background

This was a long-pending change, but the immediate trigger was finding that GPU-aware MPI can reduce communication times significantly, yet currently only when `the_arena_is_managed=0` is set. Not setting this while using GPU-aware MPI currently results in degraded performance.

Past discussion on GPU-aware MPI: AMReX-Codes#2967

## Preliminary performance test

Running 100 steps on 8 GPUs across 2 Perlmutter A100 nodes with `Tests/GPU/CNS/Exec/Sod`, `amr.n_cell = 128^3` per GPU, `amr.max_grid_size = 128`, `amrex.use_profiler_syncs = 1`, and optimal GPU affinities.

### Without `amrex.use_gpu_aware_mpi=1`

```
FabArray::ParallelCopy_nowait()  200  0.133    0.1779   0.2067  17.82%
FabArray::ParallelCopy_finish()  200  0.07822  0.1193   0.1786  15.40%
```

### With `amrex.use_gpu_aware_mpi=1`

```
FabArray::ParallelCopy_nowait()  200  0.05655  0.07633  0.1034  11.20%
FabArray::ParallelCopy_finish()  200  0.03969  0.06087  0.09024  9.77%
```

Co-authored-by: Mukul Dave <[email protected]>