Update README.md
nlg550 authored May 19, 2021
1 parent acd2108 commit 502d257
Showing 1 changed file with 43 additions and 38 deletions.
81 changes: 43 additions & 38 deletions README.md
@@ -1,51 +1,51 @@
# ZPIC - OmpSs-2
# ZPIC

ZPIC is a 2D plasma simulator using the widely used PIC (particle-in-cell) algorithm. The program uses a finite difference model to simulate electromagnetic plasma events. This version is an adaptation of ZPIC that can be executed in parallel using the OmpSs-2 programming model. The original serial code belongs to the [ZPIC suite](https://github.com/ricardo-fonseca/zpic).
[ZPIC](https://github.com/ricardo-fonseca/zpic) is a sequential 2D EM-PIC kinetic plasma simulator based on OSIRIS [1], implementing the same core algorithm and features. Starting from the ZPIC code, we developed several parallel versions to explore task-based programming models ([OmpSs-2](https://pm.bsc.es/ompss-2)) and emerging platforms (GPUs with [OpenACC](https://www.openacc.org/)).

## Features
## Parallel versions

### OmpSs:
- Rows decomposition
- Parallelism based on tasks and data dependencies (OmpSs-2)
- No taskwait (or synchronism) between iterations of the simulation (OmpSs-2)
- Spatial row-wise decomposition (i.e., simulation split into regions along the y axis)
- All simulation steps defined as tasks
- Tasks are synchronized exclusively by data dependencies
- Fully asynchronous execution
- Local buffers (one per region) + parallel reduction for solving data races in the current deposition
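
The last item above (one local buffer per region, followed by a reduction) can be illustrated with a minimal serial sketch. This is not the repository's code: the function names, the flat buffer layout, and the unit charge deposited per particle are assumptions made for illustration. In the OmpSs-2 version each region's deposit would presumably run as its own task.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: every region writes only to its own private buffer,
   so concurrent deposits from different regions never touch the same memory.
   local holds nregions buffers of ncells entries each, stored back to back.
   cell[i] is the grid cell of particle i, owner[i] the region that holds it. */
void deposit_local(double *local, size_t ncells, size_t nregions,
                   const size_t *cell, const size_t *owner, size_t np)
{
    memset(local, 0, nregions * ncells * sizeof(double));
    for (size_t i = 0; i < np; i++)
        local[owner[i] * ncells + cell[i]] += 1.0; /* unit deposit, for illustration */
}

/* Reduction step: sum the private buffers cell by cell into the global grid.
   Each cell is independent, so this loop can itself be split among workers. */
void reduce_current(double *global, const double *local,
                    size_t ncells, size_t nregions)
{
    for (size_t c = 0; c < ncells; c++) {
        double sum = 0.0;
        for (size_t r = 0; r < nregions; r++)
            sum += local[r * ncells + c];
        global[c] = sum;
    }
}
```

The trade-off is extra memory (one buffer per region) in exchange for deposits that need no locks or atomics.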

### OmpSs@OpenAcc (under development):
- Rows decomposition
- Parallelism based on tasks and data dependencies (OmpSs-2)
- No taskwait (or synchronism) between iterations of the simulation (OmpSs-2)
- The OpenAcc kernels are integrated into OmpSs tasks
- Hybrid structure for the particles (Structure of Arrays (SoA) for GPU and Array of Structures (AoS) for CPU)
- Manual allocation of regions to be executed on the GPU (based on a percentage of the total number of regions)
- Bucket Sort every 15 iterations

### OpenAcc:
- Based on the serial version
- All computations are done on the GPU
- Particles use a Structure of Arrays (SoA) to improve GPU performance
- Bucket Sort every 15 iterations
### OpenACC:
- Target architecture: NVIDIA GPUs
- Spatial row-wise decomposition (i.e., simulation split into regions along the y axis). Each region is further divided into tiles (16x16 cells).
- Particles: Structure of Arrays (SoA) for coalesced memory accesses
- Highly optimized particle advance (shared memory usage, atomic operations with infrequent memory conflicts, etc.)
- Highly optimized bucket sort based on [2, 3]
- NVIDIA Unified Memory + Explicit memory management for critical sections
- Support for multi-GPU systems (OpenMP as management layer: launching kernels, synchronizing devices, etc.)
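
As a rough illustration of the row-wise decomposition and the 16x16 tiling listed above, the sketch below computes which grid rows a region owns and which tile a cell falls into. The function names, the rule for distributing leftover rows, and the row-major tile numbering are assumptions, not taken from the repository.

```c
#include <stddef.h>

#define TILE_SIZE 16 /* tile edge in cells, as stated in the list above */

/* Rows [*first, *first + *count) owned by region r when ny rows are split as
   evenly as possible among nregions regions. Giving the leftover rows to the
   lowest-numbered regions is an assumed balancing rule. */
void region_rows(size_t ny, size_t nregions, size_t r,
                 size_t *first, size_t *count)
{
    size_t base = ny / nregions;
    size_t extra = ny % nregions;
    *count = base + (r < extra ? 1 : 0);
    *first = r * base + (r < extra ? r : extra);
}

/* Linear (row-major) index of the tile containing local cell (ix, iy)
   inside a region that is nx cells wide. */
size_t tile_index(size_t ix, size_t iy, size_t nx)
{
    size_t tiles_x = (nx + TILE_SIZE - 1) / TILE_SIZE; /* round up */
    return (iy / TILE_SIZE) * tiles_x + ix / TILE_SIZE;
}
```

For example, splitting 10 rows among 3 regions under this rule gives regions of 4, 3, and 3 rows.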

## Plasma Experiments / Input
As in the original code, the simulation parameters are set in a .c file in the input folder, which is later included in main.c.
### OmpSs + OpenACC:
- Target architecture: NVIDIA GPUs
- Spatial row-wise decomposition (i.e., simulation split into regions along the y axis). Each region is further divided into tiles (16x16 cells).
- Particles: Structure of Arrays (SoA) for coalesced memory accesses
- Highly optimized particle advance (shared memory usage, atomic operations with infrequent memory conflicts, etc.)
- Highly optimized bucket sort based on [2, 3]
- NVIDIA Unified Memory + Explicit memory management for critical sections
- Support for multi-GPU systems
- OpenACC kernels incorporated as OmpSs tasks
- Asynchronous queues/streams for kernel overlapping
- Fully asynchronous execution
- Variant: Manual - Manual management of asynchronous queues and tasks (instead of entrusting this function to the NANOS6 runtime)
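
To give an idea of how a bucket sort interacts with the SoA particle layout, here is a generic counting-based bucket sort written as serial C. It is not the optimized GPU algorithm from [2, 3], and the function names and the use of a permutation array are assumptions for illustration. Particles are grouped by tile index, and each SoA attribute array is then reordered with the same permutation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stable counting-based bucket sort. Fills perm so that visiting particles in
   the order perm[0], perm[1], ... groups them by tile index.
   tile[i] must be smaller than ntiles. Returns 0 on success, -1 if the
   temporary offset array cannot be allocated. */
int bucket_sort_perm(const size_t *tile, size_t np, size_t ntiles, size_t *perm)
{
    size_t *offset = calloc(ntiles + 1, sizeof *offset);
    if (offset == NULL)
        return -1;
    for (size_t i = 0; i < np; i++)     /* histogram, shifted by one slot */
        offset[tile[i] + 1]++;
    for (size_t t = 0; t < ntiles; t++) /* prefix sum: start of each bucket */
        offset[t + 1] += offset[t];
    for (size_t i = 0; i < np; i++)     /* scatter particle ids into buckets */
        perm[offset[tile[i]]++] = i;
    free(offset);
    return 0;
}

/* Reorder one SoA attribute array (e.g. the x positions) with the permutation.
   With SoA, the same call is repeated for every attribute array. */
void gather_float(float *dst, const float *src, const size_t *perm, size_t np)
{
    for (size_t i = 0; i < np; i++)
        dst[i] = src[perm[i]];
}
```

After sorting, particles that share a tile sit next to each other in every attribute array, which is what makes coalesced accesses and per-tile shared memory usage practical.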

Two widely known plasma experiments - LWFA and Weibel Instability - are already included. Each experiment has a smaller and a larger variant.
## Plasma Experiments / Input
Please check the [ZPIC documentation](https://github.com/ricardo-fonseca/zpic/blob/master/doc/Documentation.md) for more information on setting up the simulation parameters. Included experiments: LWFA and Weibel Instability. For organizational purposes, the simulation parameters are included in the file name using the following naming scheme: `experiment type - number of time steps - number of particles per species - grid size x - grid size y`
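
As an illustration of the naming scheme, the helper below assembles a name from the five fields. The function name, the plain `-` separator without spaces, and the example values are hypothetical; check the actual files in the input folder for the exact spelling.

```c
#include <stdio.h>

/* Hypothetical helper: builds "<type>-<steps>-<particles>-<nx>-<ny>" in buf.
   The "-" separator and the absence of a file extension are assumptions.
   Returns the snprintf result, i.e. the length the full name needs
   (excluding the terminating null byte). */
int experiment_name(char *buf, size_t len, const char *type,
                    unsigned long steps, unsigned long particles,
                    unsigned long nx, unsigned long ny)
{
    return snprintf(buf, len, "%s-%lu-%lu-%lu-%lu",
                    type, steps, particles, nx, ny);
}
```

With made-up values, `experiment_name(buf, sizeof buf, "lwfa", 4000, 1000000, 2000, 256)` writes `lwfa-4000-1000000-2000-256` into `buf`.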

## Output
The same file used to set the parameters of the simulation defines the output files and the frequency of the output (in terms of simulation iterations).

### ZDF Format
Like the original ZPIC, both the serial and pure OmpSs-2 versions support the ZDF file format. For more information, please visit the [ZDF repository](https://github.com/ricardo-fonseca/zpic/tree/master/zdf).
Like the original ZPIC, all versions output the simulation results in the ZDF format. For more information, please visit the [ZDF repository](https://github.com/ricardo-fonseca/zpic/tree/master/zdf).

In the future, all the versions will have support for this file format.
## Compilation and Execution

### CSV Format
Besides the ZDF files, ZPIC can produce .csv files (delimiter = ";") for the following parameters:
- Charge map (for each particle type)
- Electric field magnitude
- Magnetic field magnitude
- EM fields' energy
- Particles' energy
### Compilation requirements:
- [Nanos6 Runtime](https://github.com/bsc-pm/nanos6)
- [Mercurium Compiler](https://github.com/bsc-pm/mcxx)

## Compilation and Execution
### OmpSs:
```
make
@@ -62,6 +62,11 @@ make
make
./zpic <Number of Regions> <Percentage of regions dedicated to GPU> <Number of GPU regions>
```
### Compilation requirements:
- [Nanos6 Runtime](https://github.com/bsc-pm/nanos6)
- [Mercurium Compiler](https://github.com/bsc-pm/mcxx)


## References

[1] R. A. Fonseca et al., ‘OSIRIS: A Three-Dimensional, Fully Relativistic Particle in Cell Code for Modeling Plasma Based Accelerators’, in Computational Science — ICCS 2002, Berlin, Heidelberg, 2002, vol. 2331, pp. 342–351. doi: 10.1007/3-540-47789-6_36.
[2] A. Jocksch, F. Hariri, T. M. Tran, S. Brunner, C. Gheller, and L. Villard, ‘A bucket sort algorithm for the particle-in-cell method on manycore architectures’, in Parallel Processing and Applied Mathematics, 2016, pp. 43–52. doi: 10.1007/978-3-319-32149-3_5.
[3] F. Hariri et al., ‘A portable platform for accelerated PIC codes and its application to GPUs using OpenACC’, Computer Physics Communications, vol. 207, pp. 69–82, Oct. 2016, doi: 10.1016/j.cpc.2016.05.008.
