A3C is the asynchronous version of A2C, where gradients are computed on the workers directly after trajectory rollouts, and only then shipped to a central learner to accumulate these gradients on the central model. After the central model update, parameters are broadcast back to all workers. Similar to A2C, A3C scales to 16-32+ worker processes depending on the environment.
conda create -n rllib-a3c python=3.10
conda activate rllib-a3c
pip install -r requirements.txt
pip install -e '.[development]'