From 502d25777892b013ae5b5ac0ef674910259e6845 Mon Sep 17 00:00:00 2001
From: nlg550 <38725499+nlg550@users.noreply.github.com>
Date: Wed, 19 May 2021 14:19:19 -0300
Subject: [PATCH] Update README.md

---
 README.md | 81 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 43 insertions(+), 38 deletions(-)

diff --git a/README.md b/README.md
index 8f396f4..5dd9821 100644
--- a/README.md
+++ b/README.md
@@ -1,51 +1,51 @@
-# ZPIC - OmpSs-2
+# ZPIC

-ZPIC is a 2D plasma simulator using the widely used PIC (particle-in-cell) algorithm. The program uses a finite difference model to simulate eletromagnetic plasma events. These version is an adaptation of ZPIC, so it can be executed in parallel using the OmpSs-2 programming model. The original serial code belongs to the [ZPIC suite](https://github.com/ricardo-fonseca/zpic).
+[ZPIC](https://github.com/ricardo-fonseca/zpic) is a sequential 2D EM-PIC kinetic plasma simulator based on OSIRIS [1], implementing the same core algorithm and features. From the ZPIC code, we developed several parallel versions to explore task-based programming models ([OmpSs-2](https://pm.bsc.es/ompss-2)) and emerging platforms (GPUs with [OpenACC](https://www.openacc.org/)).

-## Features
+## Parallel versions

### OmpSs:
-- Rows decomposition
-- Parallelism based on tasks and data dependencies (OmpSs-2)
-- No taskwait (or synchronism) between iterations of the simulation (OmpSs-2)
+- Spatial row-wise decomposition (i.e., the simulation is split into regions along the y axis)
+- All simulation steps defined as tasks
+- Tasks are synchronized exclusively by data dependencies
+- Fully asynchronous execution
+- Local buffers (one per region) + parallel reduction to resolve data races in the current deposition

-### OmpSs@OpenAcc (under development):
-- Rows decomposition
-- Parallelism based on tasks and data dependencies (OmpSs-2)
-- No taskwait (or synchronism) between iterations of the simulation (OmpSs-2)
-- The OpenAcc kernels are integrated into OmpSs tasks
-- Hybrid structure for the particles (Structure of Arrays (SoA) for GPU and Array of Structure (AoS) for CPU)
-- Manual allocation of regions to be execute in the GPU (based on a percentage of the total number of regions)
-- Bucket Sort every 15 iterations
-
-### OpenAcc:
-- Based on the serial version
-- All computation are done in the GPU
-- Particles use a Structure of Array (SoA) to improve GPU performance
-- Bucket Sort every 15 iterations
+### OpenACC:
+- Target architecture: NVIDIA GPUs
+- Spatial row-wise decomposition (i.e., the simulation is split into regions along the y axis). Each region is further divided into tiles (16x16 cells).
+- Particles: Structure of Arrays (SoA) for coalesced memory accesses
+- Highly optimized particle advance (shared memory usage, atomic operations with infrequent memory conflicts, etc.)
+- Highly optimized bucket sort based on [2, 3]
+- NVIDIA Unified Memory + explicit memory management for critical sections
+- Support for multi-GPU systems (OpenMP as the management layer: launching kernels, synchronizing devices, etc.)

-## Plasma Experiments / Input
-In the same way of the original code, the simulation paramenters are set in a .c file in input folder that are later included in the main.c
+### OmpSs + OpenACC:
+- Target architecture: NVIDIA GPUs
+- Spatial row-wise decomposition (i.e., the simulation is split into regions along the y axis). Each region is further divided into tiles (16x16 cells).
+- Particles: Structure of Arrays (SoA) for coalesced memory accesses
+- Highly optimized particle advance (shared memory usage, atomic operations with infrequent memory conflicts, etc.)
+- Highly optimized bucket sort based on [2, 3]
+- NVIDIA Unified Memory + explicit memory management for critical sections
+- Support for multi-GPU systems
+- OpenACC kernels incorporated as OmpSs tasks
+- Asynchronous queues/streams for kernel overlapping
+- Fully asynchronous execution
+- Variant (Manual): manual management of the asynchronous queues and tasks (instead of delegating this to the Nanos6 runtime)

-Two widely known plasma experiments - LWFA and Weibel Instability - are already included. Each experiment have a smaller and a larger variant.
+## Plasma Experiments / Input
+Please check the [ZPIC documentation](https://github.com/ricardo-fonseca/zpic/blob/master/doc/Documentation.md) for more information on setting up the simulation parameters. Included experiments: LWFA and Weibel Instability. For organization purposes, the simulation parameters are encoded in the file name with the following naming scheme: `experiment type - number of time steps - number of particles per species - grid size x - grid size y`

## Output
-The same file used to set the parameters of the simulation defines the ouput files and the frequency of the output (in terms of simulation iterations).
-### ZDF Format
-Like the original ZPIC, both serial and pure OmpSs-2 versions supports the ZDF file format. For more information, please visit the [ZDF repository](https://github.com/ricardo-fonseca/zpic/tree/master/zdf).
+Like the original ZPIC, all versions output the simulation results in the ZDF format. For more information, please visit the [ZDF repository](https://github.com/ricardo-fonseca/zpic/tree/master/zdf).

-In the future, all the versions will have support for this file format.
+## Compilation and Execution

-### CSV Format
-Besides the ZDF files, ZPIC can produce .csv files (delimiter = ";") for the following parameters:
-- Charge map (for each particle type)
-- Eletric field magnitude
-- Magnetic field magnitude
-- EM fields' energy
-- Particles' energy
+### Compilation requirements:
+- [Nanos6 Runtime](https://github.com/bsc-pm/nanos6)
+- [Mercurium Compiler](https://github.com/bsc-pm/mcxx)

-## Compilation and Execution
### OmpSs:
```
make
@@ -62,6 +62,11 @@ make
make
./zpic
```
-### Compilation requirements:
-- [Nanos6 Runtime](https://github.com/bsc-pm/nanos6)
-- [Mercurium Compiler](https://github.com/bsc-pm/mcxx)
+
+
+## References
+
+[1] R. A. Fonseca et al., ‘OSIRIS: A Three-Dimensional, Fully Relativistic Particle in Cell Code for Modeling Plasma Based Accelerators’, in Computational Science — ICCS 2002, Berlin, Heidelberg, 2002, vol. 2331, pp. 342–351. doi: 10.1007/3-540-47789-6_36.
+[2] A. Jocksch, F. Hariri, T. M. Tran, S. Brunner, C. Gheller, and L. Villard, ‘A bucket sort algorithm for the particle-in-cell method on manycore architectures’, in Parallel Processing and Applied Mathematics, 2016, pp. 43–52. doi: 10.1007/978-3-319-32149-3_5.
+[3] F. Hariri et al., ‘A portable platform for accelerated PIC codes and its application to GPUs using OpenACC’, Computer Physics Communications, vol. 207, pp. 69–82, Oct. 2016, doi: 10.1016/j.cpc.2016.05.008.
+
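The parallel versions summarized in the patched README are easier to picture with a few schematic fragments. The first sketch below illustrates the OmpSs pattern described in the "OmpSs" bullets: every step is a task, ordering comes only from data dependencies, and there is no taskwait between time steps. It is a minimal sketch, not code from this repository; the `t_region` type and the `advance_particles` / `reduce_current` / `update_fields` functions are hypothetical placeholders.

```
// Minimal sketch of the OmpSs-2 pattern described above (hypothetical names).
// Each region advances through a chain of tasks; ordering comes only from the
// in/inout dependencies, so consecutive time steps overlap where data allows.

typedef struct t_region {
	/* field slices, particle buffers, local current buffer, ... */
	int id;
} t_region;

void advance_particles(t_region *r);                  // hypothetical
void reduce_current(t_region *r, t_region *neighbor); // hypothetical
void update_fields(t_region *r);                      // hypothetical

void simulation_step(t_region *regions, int n_regions)
{
	// 1) Particle advance: each task writes only its region's local current buffer.
	for (int r = 0; r < n_regions; r++)
	{
		#pragma oss task inout(regions[r])
		advance_particles(&regions[r]);
	}

	// 2) Current reduction: merges the neighbouring local buffer, resolving the
	//    data races at the region boundaries.
	for (int r = 0; r < n_regions; r++)
	{
		#pragma oss task inout(regions[r]) in(regions[(r + 1) % n_regions])
		reduce_current(&regions[r], &regions[(r + 1) % n_regions]);
	}

	// 3) Field solver: depends only on this region's reduced current.
	for (int r = 0; r < n_regions; r++)
	{
		#pragma oss task inout(regions[r])
		update_fields(&regions[r]);
	}

	// No taskwait here: calling simulation_step() again simply creates more
	// tasks, and the dependency graph orders them against the previous step.
}
```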
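The OpenACC bullets (SoA particle storage for coalesced accesses, atomic current deposition) translate roughly into the kind of kernel sketched below. Again this is an illustration with assumed names (`t_part_soa`, a flattened `current` grid) and simplified physics, not the optimized kernel from the repository, which additionally relies on tiling, shared memory and the bucket sort of [2, 3].

```
// Illustrative OpenACC sketch (assumed names, simplified physics) of the
// SoA particle layout and the atomic current deposition mentioned above.
typedef struct {
	int   *ix, *iy;        // cell indices
	float *x, *y;          // positions inside the cell
	float *ux, *uy, *uz;   // generalized velocities
	int    np;             // number of particles
} t_part_soa;

// Schematic particle advance + current deposition for one region.
// Assumes device-visible buffers (e.g. CUDA managed memory, matching the
// "NVIDIA Unified Memory" bullet), so no explicit data clauses are shown.
void advance_and_deposit(t_part_soa *part, float *current, int nx, float dt)
{
	int    np = part->np;
	float *x  = part->x, *ux = part->ux;
	int   *ix = part->ix, *iy = part->iy;

	#pragma acc parallel loop
	for (int i = 0; i < np; i++)
	{
		// ... field interpolation and momentum update omitted ...
		x[i] += ux[i] * dt;

		// Several particles may deposit into the same cell, hence the atomic;
		// the real kernels stage deposits in shared memory to keep conflicts rare.
		int cell = iy[i] * nx + ix[i];
		#pragma acc atomic update
		current[cell] += ux[i];
	}
}
```

SoA storage is what makes the loop above coalesce: thread `i` and thread `i + 1` touch adjacent elements of the same attribute array instead of strided fields of a struct.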
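Finally, the OmpSs + OpenACC versions wrap the OpenACC kernels inside OmpSs tasks and use asynchronous queues so that kernels from independent regions can overlap. The sketch below follows the spirit of the "Manual" variant, where the queue is waited on explicitly inside the task rather than being handled by the Nanos6 runtime; the function and argument names are placeholders, not the repository's actual API.

```
// Sketch of an OpenACC kernel wrapped in an OmpSs task (placeholder names).
// Independent regions use different queues, so their kernels can overlap;
// the task body waits on its own queue before the task is considered done.
void push_region_task(float *x, float *ux, int np, float dt, int queue)
{
	#pragma oss task inout(x[0;np]) in(ux[0;np])
	{
		#pragma acc parallel loop async(queue)
		for (int i = 0; i < np; i++)
			x[i] += ux[i] * dt;

		// Manual variant: block this task until its queue has drained.
		#pragma acc wait(queue)
	}
}
```

In the default (non-manual) variant, as the README bullet states, the management of the asynchronous queues and tasks is instead entrusted to the Nanos6 runtime.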