
LLM Parallelism Explorer PoC

LLM Parallelism Explorer PoC is a research tool for exploring parallelism strategies for large language models, with a particular focus on Mixture of Experts (MoE) architectures. This proof of concept searches over parallelism configurations, estimates the memory each one requires, and identifies setups suited to efficient training on distributed systems. Configuration is managed with facebookresearch/hydra.

Features

  • Supports advanced parallelism techniques:
    • Tensor Parallelism (TP)
    • Pipeline Parallelism (PP)
    • Expert Parallelism (EP)
    • Context Parallelism (CP)
    • Data Parallelism (DP)
  • Precise memory estimation for model parameters, optimizer states, and activations (a rough sketch of the arithmetic follows this list)
  • Flexible search space for parallelism configurations
  • Multiple data parallel sharding strategies
  • CSV output for in-depth analysis of results
  • Hydra-powered configuration management
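
The exact formulas live in main.py; as a rough illustration of the kind of arithmetic such an estimator performs, the sketch below computes per-GPU memory for parameters and optimizer states under tensor-, pipeline-, and data-parallel sharding. The function name, byte counts, and sharding assumptions (bf16 weights, Adam with fp32 states, optional ZeRO-1-style optimizer sharding) are illustrative simplifications, not the tool's actual model.

def per_gpu_param_and_optimizer_gib(
    num_params: float,      # total trainable parameters
    tp: int,
    pp: int,
    dp: int,
    param_bytes: int = 2,   # bf16 weights (assumption)
    optim_bytes: int = 12,  # fp32 master copy + Adam moments, 4 + 4 + 4 bytes/param (assumption)
    shard_optimizer_over_dp: bool = True,  # ZeRO-1-style optimizer sharding (assumption)
) -> float:
    """Very rough per-GPU estimate; activations and MoE experts are ignored here."""
    params_per_gpu = num_params / (tp * pp)  # TP and PP both split the weights
    param_mem = params_per_gpu * param_bytes
    optim_mem = params_per_gpu * optim_bytes
    if shard_optimizer_over_dp:
        optim_mem /= dp                      # optimizer states sharded across DP ranks
    return (param_mem + optim_mem) / 1024**3

# Example: a 405B-parameter model with TP=8, PP=16, DP=8
print(f"{per_gpu_param_and_optimizer_gib(405e9, tp=8, pp=16, dp=8):.1f} GiB")

The tool itself also accounts for activation memory and the expert/non-expert parameter split (see Output below).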

Installation

Install the required dependencies with:

pip install -r requirements.txt

Usage

Select a configuration file with --config-name and override the range of GPU counts to search:

python main.py \
    --config-name llama3.1-405b.yaml \
    +ngpus_range="[8, 128, 1024, 10240]"

Configuration

Configuration is managed with Hydra. Set the following parameters in your YAML config file (an illustrative sketch follows the list):

  • Model architecture details (e.g., hidden size, number of layers)
  • MoE-specific settings (e.g., number of experts, expert frequency)
  • Training parameters (e.g., global batch size, data types)
  • Parallelism search ranges (e.g., TP, PP, EP ranges)
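
The bundled configs (for example llama3.1-405b.yaml) define the concrete schema; the snippet below is only an illustrative sketch of such a file, and every key name and value in it is hypothetical, to be replaced with whatever main.py actually reads.

# Illustrative Hydra config sketch; all keys and values are hypothetical.
model:
  hidden_size: 8192
  num_layers: 80
  num_attention_heads: 64
  vocab_size: 128256

moe:
  num_experts: 16          # experts per MoE layer
  expert_frequency: 2      # every N-th layer is an MoE layer

training:
  global_batch_size: 2048
  sequence_length: 8192
  param_dtype: bf16
  optimizer_dtype: fp32

search:
  tp_range: [1, 2, 4, 8]       # tensor-parallel degrees to try
  pp_range: [1, 2, 4, 8, 16]   # pipeline-parallel degrees to try
  ep_range: [1, 2, 4, 8]       # expert-parallel degrees to try
  cp_range: [1, 2]             # context-parallel degrees to try

ngpus_range: [8, 128, 1024]

Any of these values can also be overridden from the command line using Hydra's standard key=value syntax, as in the Usage example above.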

Output

The script generates memory_estimation.csv with comprehensive memory estimations for each valid parallelism configuration, including:

  • Total memory usage
  • Model and optimizer states memory
  • Activations memory
  • Expert and non-expert parameters
  • Component-specific activation memory
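
Because the results are written to a CSV, any tabular tool works for the in-depth analysis; the snippet below is a minimal pandas sketch, and the total_memory_gb column name is an assumption, so check the actual header of memory_estimation.csv first.

import pandas as pd

df = pd.read_csv("memory_estimation.csv")
print(df.columns.tolist())   # inspect the real column names first

# Hypothetical column name: keep configurations whose estimated total memory
# fits on an 80 GB GPU and show the least memory-hungry ones first.
fits = df[df["total_memory_gb"] <= 80].sort_values("total_memory_gb")
print(fits.head(10))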

Credits

This project builds upon state-of-the-art parallelism techniques from recent research.
