Insights: deepspeedai/DeepSpeed
Overview
8 Pull requests merged by 5 people
- Update references to new X/Twitter handle (#7110, merged Mar 5, 2025)
- Fix fused_qkv print model ValueError (#7109, merged Mar 4, 2025)
- Avoid graph break due to unsupported frozenset (#7105, merged Mar 4, 2025)
- Only run pre-commit on the changes (#7106, merged Mar 4, 2025)
- Avoid graph breaks in torch.compile caused by inner classes in the backward hooks (#7062, merged Mar 4, 2025)
- Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx (#7081, merged Mar 3, 2025)
- Use new dlpack api; formatting fixes (#7101, merged Mar 3, 2025)
- Remove workflows for very old torch versions (#7090, merged Feb 28, 2025)
5 Pull requests opened by 5 people
- Update Gaudi2 nightly/CI to latest 1.20.0 build (#7093, opened Feb 28, 2025)
- Variable batch size and LR scheduler (#7104, opened Mar 3, 2025)
- [Draft] Add support for seq split in Domino (#7111, opened Mar 4, 2025)
- Fix keep_module_on_host (#7112, opened Mar 6, 2025)
- [XPU] Support XCCL on the DeepSpeed side (#7113, opened Mar 6, 2025)
3 Issues closed by 3 people
- nv-sd CI test failure (#7098, closed Mar 3, 2025)
- [BUG] Training a custom model with NPU ZeRO-3 raises "Function SumBackward0 returned an invalid gradient at index 0" (#7078, closed Mar 3, 2025)
- [BUG] Is MoQ mode deprecated in DeepSpeed? Running with an MoQ config, but no quantization appears in the log (#7091, closed Feb 28, 2025)
8 Issues opened by 8 people
- DeepSpeed on Power with CPU Accelerator and AutoTP (#7108, opened Mar 4, 2025)
- [REQUEST] An option for SUM gradient allreduce instead of MEAN (#7107, opened Mar 4, 2025)
- Support DualPipe training (#7100, opened Mar 2, 2025)
- About delayed parameter update in ZeRO-Offload (#7099, opened Mar 2, 2025)
- [REQUEST] Proposal for enhancing ChatGPT's response quality during training (#7097, opened Mar 1, 2025)
- LLaMA Factory, quantized model, and DeepSpeed compatibility (#7096, opened Mar 1, 2025)
- nv-nightly CI test failure (#7095, opened Mar 1, 2025)
- [BUG] Is MoQ mode deprecated in DeepSpeed? Running with an MoQ config, but no quantization appears in the log (#7092, opened Feb 28, 2025)
22 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Training multiple models (#7018, commented on Mar 6, 2025 • 10 new comments)
- Enabled high-performance Automatic Tensor Parallelism (auto TP) for MoE models on multiple GPUs/HPUs (#6964, commented on Mar 5, 2025 • 2 new comments)
- Unpin once transformers latest is fixed (#7088, commented on Mar 3, 2025 • 0 new comments)
- Update Domino for Llama3 (#7084, commented on Mar 5, 2025 • 0 new comments)
- Conditionally quote env vars (#7071, commented on Mar 5, 2025 • 0 new comments)
- Fix: pipeline model with MoE causes error when sending grads (#7055, commented on Mar 5, 2025 • 0 new comments)
- Enable ZeRO set/get APIs for NVMe offload (#7046, commented on Mar 5, 2025 • 0 new comments)
- Enable Python 3.11 and 3.12 tests (#7007, commented on Mar 5, 2025 • 0 new comments)
- Enable torch.autocast with ZeRO (#6993, commented on Mar 6, 2025 • 0 new comments)
- Improve overflow handling in ZeRO (#6976, commented on Mar 4, 2025 • 0 new comments)
- Enabled configurable auto Tensor Parallelism (TP) for the inference of diverse models (#6553, commented on Feb 28, 2025 • 0 new comments)
- Support autoTP with weight-only quantization in the DS inference path (#4750, commented on Mar 5, 2025 • 0 new comments)
- Getting requirements to build wheel: finished with status 'error' (#7043, commented on Mar 5, 2025 • 0 new comments)
- AssertionError: no_sync context manager is incompatible with gradient partitioning logic of ZeRO stage 3 (#6793, commented on Mar 4, 2025 • 0 new comments)
- DeepSpeed Inference not working on Llama when input has padding and kernel injection is used (#3960, commented on Mar 4, 2025 • 0 new comments)
- [REQUEST] Runnable combination of RTX 5090 GPU + Linux driver version + PyTorch version + DeepSpeed version for LLM finetuning? (#7042, commented on Mar 3, 2025 • 0 new comments)
- [BUG] DeepSpeed does not update the model when using "Qwen/Qwen2.5-3B" but is fine with "Qwen/Qwen2.5-1.5B" (#7077, commented on Mar 3, 2025 • 0 new comments)
- Dynamic/variable batch size support (#1051, commented on Mar 3, 2025 • 0 new comments)
- [BUG] DS ZeRO stage 1 or 2 communication uses reduce-scatter instead of all-reduce (#7059, commented on Mar 1, 2025 • 0 new comments)
- Suspected memory leak during ZeRO-3 training; OOM eventually after several checkpoints (#3582, commented on Mar 1, 2025 • 0 new comments)
- Ascend 910B: AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' (#7061, commented on Feb 28, 2025 • 0 new comments)
- [BUG] DeepSpeed ZeRO-2 training hangs and times out after a fixed number of steps (#7044, commented on Feb 28, 2025 • 0 new comments)