
[BUG] FLUX model components fail to utilize available GPU memory (6.8GB/16GB used) with t5xxl/vae/clip falling back to CPU #199

Open
WpythonW opened this issue Oct 30, 2024 · 1 comment
Labels
🐞 bug Something isn't working

Comments

@WpythonW

Issue Description

I'm experiencing inefficient GPU utilization with the FLUX model components. While the main FLUX model runs on the GPU (6.8GB VRAM), the other components (t5xxl-q4_0, ae-fp16, clip_l-fp16) appear to run on the CPU even though 9.2GB of GPU memory remains free.

Expected behavior: All model components should utilize available GPU memory for optimal performance.
Actual behavior:

  • Main FLUX model uses GPU (6.8GB VRAM)
  • t5xxl-q4_0, ae-fp16, and clip_l-fp16 components appear to run on CPU
  • Getting warnings: WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
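As a sanity check, here is a minimal sketch of how the memory figures above can be confirmed. It assumes the pynvml bindings are available (pip install nvidia-ml-py); this is not part of the Nexa SDK:

import pynvml

# One-shot NVML query of device 0 (the Tesla P100 in this environment).
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used:  {mem.used / 1024**3:.1f} GiB")
print(f"VRAM free:  {mem.free / 1024**3:.1f} GiB")
print(f"VRAM total: {mem.total / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()

Run while txt2img is executing, this reports the 6.8GB used / 9.2GB free split described above.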

Steps to Reproduce

  1. Install Nexa SDK with CUDA support (run in a Kaggle/Jupyter cell, hence the leading !):
!CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
  2. Run the following code:
from nexa.gguf import NexaImageInference

model_path = "FLUX.1-schnell:q4_0"
inference = NexaImageInference(
    model_path=model_path,
    wtype="q4_0",            # weight quantization type
    num_inference_steps=5,
    width=1024,
    height=1024,
    guidance_scale=1.5,
    random_seed=42
)
img = inference.txt2img("A sunset over a mountain range")
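To make the symptom measurable, the txt2img call can be wrapped with a timer and a background thread that samples peak VRAM via pynvml (again an assumption on my side, not SDK functionality; it reuses the inference object from step 2):

import threading, time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
peak = 0
stop = threading.Event()

def sample_vram():
    # Poll used VRAM every 0.5s and remember the peak.
    global peak
    while not stop.is_set():
        peak = max(peak, pynvml.nvmlDeviceGetMemoryInfo(handle).used)
        time.sleep(0.5)

threading.Thread(target=sample_vram, daemon=True).start()
t0 = time.time()
img = inference.txt2img("A sunset over a mountain range")
stop.set()
print(f"txt2img took {time.time() - t0:.0f}s, peak VRAM {peak / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()

A peak well below 16 GiB during a slow text-encoding phase would confirm that t5xxl is not being offloaded.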

OS

Linux (Kaggle environment)

Python Version

3.10

Nexa SDK Version

nexaai-0.0.9.0

GPU (if using one)

NVIDIA Tesla P100 (16GB VRAM)

@WpythonW added the 🐞 bug (Something isn't working) label on Oct 30, 2024
@WpythonW changed the title from [BUG] <title> to [BUG] FLUX model components fail to utilize available GPU memory (6.8GB/16GB used) with t5xxl/vae/clip falling back to CPU on Oct 30, 2024
@zhiyuan8
Contributor

zhiyuan8 commented Nov 2, 2024

Thank you for reaching out with your report!

We are actively addressing the inefficient GPU utilization of FLUX model components: we are working to offload more of them to the GPU and will fix this soon.
