From 502d25777892b013ae5b5ac0ef674910259e6845 Mon Sep 17 00:00:00 2001
From: nlg550 <38725499+nlg550@users.noreply.github.com>
Date: Wed, 19 May 2021 14:19:19 -0300
Subject: [PATCH] Update README.md

---
 README.md | 81 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 43 insertions(+), 38 deletions(-)

diff --git a/README.md b/README.md
index 8f396f4..5dd9821 100644
--- a/README.md
+++ b/README.md
@@ -1,51 +1,51 @@
-# ZPIC - OmpSs-2
+# ZPIC

-ZPIC is a 2D plasma simulator using the widely used PIC (particle-in-cell) algorithm. The program uses a finite difference model to simulate eletromagnetic plasma events. These version is an adaptation of ZPIC, so it can be executed in parallel using the OmpSs-2 programming model. The original serial code belongs to the [ZPIC suite](https://github.com/ricardo-fonseca/zpic).
+[ZPIC](https://github.com/ricardo-fonseca/zpic) is a sequential 2D EM-PIC kinetic plasma simulator based on OSIRIS [1], implementing the same core algorithm and features. From the ZPIC code, we developed several parallel versions to explore task-based programming models ([OmpSs-2](https://pm.bsc.es/ompss-2)) and emerging platforms (GPUs with [OpenACC](https://www.openacc.org/)).

-## Features
+## Parallel versions

### OmpSs:
-- Rows decomposition
-- Parallelism based on tasks and data dependencies (OmpSs-2)
-- No taskwait (or synchronism) between iterations of the simulation (OmpSs-2)
+- Spatial row-wise decomposition (i.e., the simulation is split into regions along the y axis)
+- All simulation steps defined as tasks
+- Tasks are synchronized exclusively by data dependencies
+- Fully asynchronous execution
+- Local buffers (one per region) + parallel reduction to resolve data races in the current deposition

-### OmpSs@OpenAcc (under development):
-- Rows decomposition
-- Parallelism based on tasks and data dependencies (OmpSs-2)
-- No taskwait (or synchronism) between iterations of the simulation (OmpSs-2)
-- The OpenAcc kernels are integrated into OmpSs tasks
-- Hybrid structure for the particles (Structure of Arrays (SoA) for GPU and Array of Structure (AoS) for CPU)
-- Manual allocation of regions to be execute in the GPU (based on a percentage of the total number of regions)
-- Bucket Sort every 15 iterations
-
-### OpenAcc:
-- Based on the serial version
-- All computation are done in the GPU
-- Particles use a Structure of Array (SoA) to improve GPU performance
-- Bucket Sort every 15 iterations
+### OpenACC:
+- Target architecture: NVIDIA GPUs
+- Spatial row-wise decomposition (i.e., the simulation is split into regions along the y axis). Each region is further divided into tiles (16x16 cells).
+- Particles: Structure of Arrays (SoA) for coalesced memory accesses
+- Highly optimized particle advance (shared memory usage, atomic operations with infrequent memory conflicts, etc.)
+- Highly optimized bucket sort based on [2, 3]
+- NVIDIA Unified Memory + explicit memory management for critical sections
+- Support for multi-GPU systems (OpenMP as the management layer: launching kernels, synchronizing devices, etc.)

-## Plasma Experiments / Input
-In the same way of the original code, the simulation paramenters are set in a .c file in input folder that are later included in the main.c
+### OmpSs + OpenACC:
+- Target architecture: NVIDIA GPUs
+- Spatial row-wise decomposition (i.e., the simulation is split into regions along the y axis). Each region is further divided into tiles (16x16 cells).
+- Particles: Structure of Arrays (SoA) for coalesced memory accesses
+- Highly optimized particle advance (shared memory usage, atomic operations with infrequent memory conflicts, etc.)
+- Highly optimized bucket sort based on [2, 3]
+- NVIDIA Unified Memory + explicit memory management for critical sections
+- Support for multi-GPU systems
+- OpenACC kernels incorporated as OmpSs tasks
+- Asynchronous queues/streams for kernel overlapping
+- Fully asynchronous execution
+- Variant (Manual): manual management of the asynchronous queues and tasks (instead of delegating this to the Nanos6 runtime)

-Two widely known plasma experiments - LWFA and Weibel Instability - are already included. Each experiment have a smaller and a larger variant.
+## Plasma Experiments / Input
+Please check the [ZPIC documentation](https://github.com/ricardo-fonseca/zpic/blob/master/doc/Documentation.md) for more information on setting up the simulation parameters. Included experiments: LWFA and Weibel Instability. For organization purposes, the simulation parameters are encoded in the file name with the following naming scheme: `experiment type - number of time steps - number of particles per species - grid size x - grid size y`

## Output
-The same file used to set the parameters of the simulation defines the ouput files and the frequency of the output (in terms of simulation iterations).
-### ZDF Format
-Like the original ZPIC, both serial and pure OmpSs-2 versions supports the ZDF file format. For more information, please visit the [ZDF repository](https://github.com/ricardo-fonseca/zpic/tree/master/zdf).
+Like the original ZPIC, all versions output the simulation results in the ZDF format. For more information, please visit the [ZDF repository](https://github.com/ricardo-fonseca/zpic/tree/master/zdf).

-In the future, all the versions will have support for this file format.
+## Compilation and Execution

-### CSV Format
-Besides the ZDF files, ZPIC can produce .csv files (delimiter = ";") for the following parameters:
-- Charge map (for each particle type)
-- Eletric field magnitude
-- Magnetic field magnitude
-- EM fields' energy
-- Particles' energy
+### Compilation requirements:
+- [Nanos6 Runtime](https://github.com/bsc-pm/nanos6)
+- [Mercurium Compiler](https://github.com/bsc-pm/mcxx)

-## Compilation and Execution
### OmpSs:
```
make
@@ -62,6 +62,11 @@ make
make
./zpic
```
-### Compilation requirements:
-- [Nanos6 Runtime](https://github.com/bsc-pm/nanos6)
-- [Mercurium Compiler](https://github.com/bsc-pm/mcxx)
+
+
+## References
+
+[1] R. A. Fonseca et al., ‘OSIRIS: A Three-Dimensional, Fully Relativistic Particle in Cell Code for Modeling Plasma Based Accelerators’, in Computational Science — ICCS 2002, Berlin, Heidelberg, 2002, vol. 2331, pp. 342–351. doi: 10.1007/3-540-47789-6_36.
+[2] A. Jocksch, F. Hariri, T. M. Tran, S. Brunner, C. Gheller, and L. Villard, ‘A bucket sort algorithm for the particle-in-cell method on manycore architectures’, in Parallel Processing and Applied Mathematics, 2016, pp. 43–52. doi: 10.1007/978-3-319-32149-3_5.
+[3] F. Hariri et al., ‘A portable platform for accelerated PIC codes and its application to GPUs using OpenACC’, Computer Physics Communications, vol. 207, pp. 69–82, Oct. 2016, doi: 10.1016/j.cpc.2016.05.008.
+
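The parallel versions summarized in the patched README are easier to picture with a few schematic fragments. The first sketch below illustrates the OmpSs pattern described in the "OmpSs" bullets: every step is a task, ordering comes only from data dependencies, and there is no taskwait between time steps. It is a minimal sketch, not code from this repository; the `t_region` type and the `advance_particles` / `reduce_current` / `update_fields` functions are hypothetical placeholders.

```
// Minimal sketch of the OmpSs-2 pattern described above (hypothetical names).
// Each region advances through a chain of tasks; ordering comes only from the
// in/inout dependencies, so consecutive time steps overlap where data allows.

typedef struct t_region {
	/* field slices, particle buffers, local current buffer, ... */
	int id;
} t_region;

void advance_particles(t_region *r);                  // hypothetical
void reduce_current(t_region *r, t_region *neighbor); // hypothetical
void update_fields(t_region *r);                      // hypothetical

void simulation_step(t_region *regions, int n_regions)
{
	// 1) Particle advance: each task writes only its region's local current buffer.
	for (int r = 0; r < n_regions; r++)
	{
		#pragma oss task inout(regions[r])
		advance_particles(&regions[r]);
	}

	// 2) Current reduction: merges the neighbouring local buffer, resolving the
	//    data races at the region boundaries.
	for (int r = 0; r < n_regions; r++)
	{
		#pragma oss task inout(regions[r]) in(regions[(r + 1) % n_regions])
		reduce_current(&regions[r], &regions[(r + 1) % n_regions]);
	}

	// 3) Field solver: depends only on this region's reduced current.
	for (int r = 0; r < n_regions; r++)
	{
		#pragma oss task inout(regions[r])
		update_fields(&regions[r]);
	}

	// No taskwait here: calling simulation_step() again simply creates more
	// tasks, and the dependency graph orders them against the previous step.
}
```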
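The OpenACC bullets (SoA particle storage for coalesced accesses, atomic current deposition) translate roughly into the kind of kernel sketched below. Again this is an illustration with assumed names (`t_part_soa`, a flattened `current` grid) and simplified physics, not the optimized kernel from the repository, which additionally relies on tiling, shared memory and the bucket sort of [2, 3].

```
// Illustrative OpenACC sketch (assumed names, simplified physics) of the
// SoA particle layout and the atomic current deposition mentioned above.
typedef struct {
	int   *ix, *iy;        // cell indices
	float *x, *y;          // positions inside the cell
	float *ux, *uy, *uz;   // generalized velocities
	int    np;             // number of particles
} t_part_soa;

// Schematic particle advance + current deposition for one region.
// Assumes device-visible buffers (e.g. CUDA managed memory, matching the
// "NVIDIA Unified Memory" bullet), so no explicit data clauses are shown.
void advance_and_deposit(t_part_soa *part, float *current, int nx, float dt)
{
	int    np = part->np;
	float *x  = part->x, *ux = part->ux;
	int   *ix = part->ix, *iy = part->iy;

	#pragma acc parallel loop
	for (int i = 0; i < np; i++)
	{
		// ... field interpolation and momentum update omitted ...
		x[i] += ux[i] * dt;

		// Several particles may deposit into the same cell, hence the atomic;
		// the real kernels stage deposits in shared memory to keep conflicts rare.
		int cell = iy[i] * nx + ix[i];
		#pragma acc atomic update
		current[cell] += ux[i];
	}
}
```

SoA storage is what makes the loop above coalesce: thread `i` and thread `i + 1` touch adjacent elements of the same attribute array instead of strided fields of a struct.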
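Finally, the OmpSs + OpenACC versions wrap the OpenACC kernels inside OmpSs tasks and use asynchronous queues so that kernels from independent regions can overlap. The sketch below follows the spirit of the "Manual" variant, where the queue is waited on explicitly inside the task rather than being handled by the Nanos6 runtime; the function and argument names are placeholders, not the repository's actual API.

```
// Sketch of an OpenACC kernel wrapped in an OmpSs task (placeholder names).
// Independent regions use different queues, so their kernels can overlap;
// the task body waits on its own queue before the task is considered done.
void push_region_task(float *x, float *ux, int np, float dt, int queue)
{
	#pragma oss task inout(x[0;np]) in(ux[0;np])
	{
		#pragma acc parallel loop async(queue)
		for (int i = 0; i < np; i++)
			x[i] += ux[i] * dt;

		// Manual variant: block this task until its queue has drained.
		#pragma acc wait(queue)
	}
}
```

In the default (non-manual) variant, as the README bullet states, the management of the asynchronous queues and tasks is instead entrusted to the Nanos6 runtime.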