| branch | macOS | Ubuntu | Format | Coverage |
|---|---|---|---|---|
| main | TBA | TBA | TBA | |
| develop | TBA | TBA | TBA | |

TACOS receives an arbitrary point-to-point network topology and autonomously synthesizes a topology-aware All-Reduce (Reduce-Scatter and All-Gather) collective communication algorithm for it. TACOS is powered by the Time-expanded Network (TEN) representation and the Utilization Maximizing Link-Chunk Matching algorithm, which together let it scale to large networks.
The figure below summarizes the TACOS framework:
More information about TACOS is available in the following paper:
- William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Swati Gupta, and Tushar Krishna, "TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning," arXiv:2304.05301 [cs.DC]
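
To make the time-expanded network idea above more concrete, here is a minimal Python sketch of greedy link-chunk matching for All-Gather. It is an illustration only, not the algorithm from the paper or the TACOS codebase (which is C++). The function name `synthesize_all_gather`, the unit-latency links, the one-chunk-per-NPU setup, and the "lowest chunk id first" tie-break are all assumptions made for this example.

```python
def synthesize_all_gather(num_npus, links):
    """Greedy link-chunk matching over an implicit time-expanded network.

    Assumptions (hypothetical, for illustration): NPU n starts with chunk n,
    every directed link (src, dst) carries at most one chunk per time step,
    and a chunk sent at step t is usable by dst from step t + 1.
    """
    holds = [{n} for n in range(num_npus)]  # chunks each NPU holds at step start
    schedule = []                           # entries: (time_step, src, dst, chunk)
    step = 0
    while any(len(h) < num_npus for h in holds):
        arriving = [set() for _ in range(num_npus)]
        for src, dst in links:
            # Chunks src can send that dst neither holds nor already receives now.
            candidates = holds[src] - holds[dst] - arriving[dst]
            if candidates:
                chunk = min(candidates)  # arbitrary tie-break for this sketch
                arriving[dst].add(chunk)
                schedule.append((step, src, dst, chunk))
        if not any(arriving):
            raise ValueError("no progress: topology must be strongly connected")
        for n in range(num_npus):
            holds[n] |= arriving[n]
        step += 1
    return step, schedule


# Usage: a 4-NPU unidirectional ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
steps, schedule = synthesize_all_gather(4, ring)
print(steps)          # 3
print(len(schedule))  # 12
```

Each pass of the outer loop corresponds to one time step of the expanded network, and every link that can deliver a useful chunk is kept busy during that step. On the ring this yields the familiar N - 1 step All-Gather.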
We highly recommend using the provided Docker image as the runtime environment, since TACOS requires several dependencies, including protobuf and boost. You can either pull the image from Docker Hub or build it locally using the provided script.
- Download the TACOS project.
git clone --recurse-submodules https://github.com/astra-sim/tacos.git
- Pull the TACOS Docker Image.
docker pull astrasim/tacos:latest
# Alternatively, build the Docker image locally.
./utils/build_docker_image.sh
- Start the Docker Container (which becomes your TACOS runtime environment).
./utils/start_docker_container.sh
- Run TACOS with the provided script.
[docker] ./tacos.sh
If you'd like to analyze the codebase, src/main.cpp is the main entry point.
For any questions about TACOS, please contact Will Won or Tushar Krishna. You may also search the existing GitHub Issues in this repository or open a new one.