TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning

🌮 TACOS

Project Status

branch macOS Ubuntu Format Coverage
main TBA TBA format TBA
develop TBA TBA format TBA

Overview

TACOS takes an arbitrary point-to-point network topology as input and automatically synthesizes a topology-aware All-Reduce collective communication algorithm, composed of a Reduce-Scatter phase followed by an All-Gather phase. TACOS is built on the Time-expanded Network (TEN) representation and a Utilization Maximizing Link-Chunk Matching algorithm, which together allow it to scale to large networks.

The figure below summarizes the TACOS framework:

[Figure: TACOS Abstraction]

For more information about TACOS, please refer to the following paper:

  • William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Swati Gupta, and Tushar Krishna, "TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning," arXiv:2304.05301 [cs.DC]
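
To make the link-chunk matching idea from the overview more concrete, here is a minimal Python sketch of a greedy All-Gather synthesis over a time-expanded network. It is an illustration under simplifying assumptions, not the TACOS implementation: every link is assumed to have unit latency and to carry one chunk per time step, node i is assumed to start with chunk i only, and the names `synthesize_all_gather`, `links`, and `holdings` are hypothetical. At each time step, every directed link is matched to a chunk that its source already holds and its destination still needs, so that as many links as possible stay busy.

```python
import random


def synthesize_all_gather(num_nodes, links, seed=0):
    """Greedy All-Gather schedule on a time-expanded network (illustrative only).

    links: list of directed (src, dst) pairs, each assumed to have unit latency
           and to carry at most one chunk per time step.
    Returns a list of steps; each step is a list of (src, dst, chunk) transfers.
    """
    rng = random.Random(seed)
    # Assumption: node i initially holds only chunk i.
    holdings = [{i} for i in range(num_nodes)]
    schedule = []

    while any(len(h) < num_nodes for h in holdings):
        # Chunks already scheduled to arrive at each node during this step.
        incoming = [set() for _ in range(num_nodes)]
        step = []

        # Visit links in random order so that no link is systematically favored.
        order = list(links)
        rng.shuffle(order)
        for src, dst in order:
            # Useful chunks: held by src, missing at dst, not already on the way.
            candidates = holdings[src] - holdings[dst] - incoming[dst]
            if candidates:
                chunk = min(candidates)
                incoming[dst].add(chunk)
                step.append((src, dst, chunk))

        if not step:
            raise ValueError("no progress possible; topology is not strongly connected")

        # Transfers complete at the end of the step (unit latency).
        for dst, chunks in enumerate(incoming):
            holdings[dst] |= chunks
        schedule.append(step)

    return schedule


# Example: a unidirectional ring of 4 nodes.
ring = [(i, (i + 1) % 4) for i in range(4)]
for t, step in enumerate(synthesize_all_gather(4, ring)):
    print(f"step {t}: {sorted(step)}")
```

On the 4-node ring, each node has a single incoming link and must receive three chunks, so the schedule takes three steps with all four links busy in every step. Handling heterogeneous link latencies and bandwidths, as well as the Reduce-Scatter phase, is outside the scope of this sketch; see the paper for how TACOS addresses them.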

Getting Started

We highly recommend using the provided Docker image as the runtime environment, since TACOS requires several dependencies, including protobuf and boost. You can either download the image from Docker Hub or build it locally using the provided script.

  1. Download the TACOS project.
     git clone --recurse-submodules https://github.com/astra-sim/tacos.git
  2. Pull the TACOS Docker image.
     docker pull astrasim/tacos:latest

     # Alternatively, build the Docker image locally.
     ./utils/build_docker_image.sh
  3. Start the Docker container (which becomes your TACOS runtime environment).
     ./utils/start_docker_container.sh
  4. Run TACOS inside the container with the provided script.
     [docker] ./tacos.sh

If you'd like to analyze the codebase, src/main.cpp is the main entry point.

Contact Us

For any questions about TACOS, please contact Will Won or Tushar Krishna. You may also find or open a GitHub Issue in this repository.
