
Add flux recipe for ci #12037

Status: Open — wants to merge 106 commits into base branch main.

Changes shown from 1 commit (of 106 commits total).
0e3c818
Vae added and matched flux checkpoint
Victor49152 Sep 4, 2024
8c9c56f
Flux model added.
Victor49152 Sep 18, 2024
9a304dc
Copying FlowMatchEulerScheduler over
Victor49152 Sep 19, 2024
73c714d
WIP: Start to test the pipeline forward pass
Victor49152 Sep 24, 2024
f4d7747
Vae added and matched flux checkpoint
Victor49152 Sep 4, 2024
6e4de91
Inference pipeline runs with offloading function
Victor49152 Sep 27, 2024
2cb67f2
Start to test image generation
Victor49152 Oct 1, 2024
c18cf60
Decoding with VAE part has been verified. Still need to check the den…
Victor49152 Oct 2, 2024
072ce16
The inference pipeline is verified.
Victor49152 Oct 3, 2024
b4d281f
Add arg parsers and refactoring
Victor49152 Oct 3, 2024
7d27534
Tested on multi batch sizes and prompts.
Victor49152 Oct 4, 2024
6d2da09
Add headers
Victor49152 Oct 4, 2024
db43ec7
Apply isort and black reformatting
Victor49152 Oct 4, 2024
597a646
Renaming
Victor49152 Oct 4, 2024
d2bfbc3
Merge remote-tracking branch 'origin/mingyuanm/diffusion' into mingyu…
Victor49152 Oct 4, 2024
7894f2c
Merge branch 'refs/heads/main' into mingyuanm/diffusion
Victor49152 Oct 14, 2024
6fb7433
Move scheduler to sampler folder
Victor49152 Oct 14, 2024
f4cf498
Merging folders.
Victor49152 Oct 14, 2024
756b8ee
Apply isort and black reformatting
Victor49152 Oct 14, 2024
73e2099
Tested after path changing.
Victor49152 Oct 14, 2024
aec7a13
Apply isort and black reformatting
Victor49152 Oct 14, 2024
15db8ad
Move MMDIT block to NeMo
Victor49152 Oct 14, 2024
6801903
Apply isort and black reformatting
Victor49152 Oct 14, 2024
7d34b30
Add joint attention and single attention to NeMo
Victor49152 Oct 15, 2024
2bf20e1
Apply isort and black reformatting
Victor49152 Oct 15, 2024
d78c682
Joint attention updated
Victor49152 Oct 16, 2024
fbd6987
Apply isort and black reformatting
Victor49152 Oct 16, 2024
aa9df2a
Remove redundant importing
Victor49152 Oct 16, 2024
ae18bb6
Refactor to inherit megatron module
Victor49152 Oct 17, 2024
94b1a3d
Adding mockdata
Victor49152 Oct 21, 2024
15761a4
DDP training works
Victor49152 Oct 25, 2024
83456df
Added flux controlnet training components while not tested yet
Victor49152 Oct 30, 2024
e0de704
Flux training with DDP tested on 1 GPU
Victor49152 Nov 1, 2024
3b62be0
Flux and controlnet now could train on precached mode.
Victor49152 Nov 5, 2024
b8044db
Custom FSDP path added to megatron parallel.
Victor49152 Nov 12, 2024
a57162b
Bug fix
Victor49152 Nov 13, 2024
fe0f705
A hacky way to wrap frozen flux into FSDP to reproduce illegal memory…
Victor49152 Nov 13, 2024
0237c05
Typo
Victor49152 Nov 13, 2024
cb8cd6e
Bypass the no grad issue when no single layers exists
Victor49152 Nov 13, 2024
d48a60e
Merge branch 'refs/heads/main' into mingyuanm/flux_controlnet
Victor49152 Nov 14, 2024
e2fb592
A hacky way to wrap frozen flux into FSDP to reproduce illegal memory…
Victor49152 Nov 13, 2024
31b849c
Merge remote-tracking branch 'origin/mingyuanm/fsdp_debugging' into m…
Victor49152 Nov 14, 2024
0226a88
Let the flux model's dtype autocast before FSDP wrapping
shjwudp Nov 14, 2024
4ed3a6d
fix RuntimeError: "Output 0 of SliceBackward0 is a view and is being …
shjwudp Nov 14, 2024
d1a28bb
Add a wrapper to flux controlnet so they are all wrapped into FSDP au…
Victor49152 Nov 15, 2024
47ca7e5
Get rid of concat op in flux single transformer
Victor49152 Nov 20, 2024
3ff2c1b
Get rid of concat op in flux single transformer
Victor49152 Nov 20, 2024
de62607
Merge remote-tracking branch 'origin/mingyuanm/single_transformer_tp_…
Victor49152 Nov 20, 2024
8786981
single block attention.linear_proj.bias must not require grads after …
Victor49152 Nov 20, 2024
47fbedc
use cpu initialization to avoid OOM
Victor49152 Nov 25, 2024
8b051d4
Set up flux training script with tp
Victor49152 Dec 5, 2024
733f5fe
SDXL fid image generation script updated.
Victor49152 Dec 9, 2024
950b362
Mcore self attention API changed
Victor49152 Dec 10, 2024
229ece4
Add a dummy task encoder for raw image inputs
Victor49152 Dec 10, 2024
f8f31df
Support loading crudedataset via energon dataloader
Victor49152 Dec 12, 2024
389c14e
Default save last to True
Victor49152 Dec 12, 2024
0ddca3e
Add controlnet inference pipeline
Victor49152 Dec 13, 2024
615dbea
Add controlnet inference script
Victor49152 Dec 13, 2024
b5ea320
Image resize mode update
Victor49152 Dec 13, 2024
bdb8155
Remove unnecessary bias to avoid sharding issue.
Victor49152 Dec 13, 2024
78eed47
Handle MCore custom fsdp checkpoint load (#11621)
shjwudp Dec 17, 2024
f94f142
Checkpoint naming
Victor49152 Dec 17, 2024
31a6bae
Image logger WIP
Victor49152 Dec 17, 2024
ba0d84e
Image logger works fine
Victor49152 Dec 17, 2024
d78a329
save hint and output to image logger.
Victor49152 Dec 19, 2024
e50cbfd
Update flux controlnet training step
Victor49152 Dec 20, 2024
9c26721
Add model connector and try to load from dist ckpt but failed.
Victor49152 Dec 31, 2024
1026c0a
Renaming and refactoring submodel configs for nemo run compatibility
Victor49152 Jan 6, 2025
2dfab9e
Nemo run script works for basic testing recipe
Victor49152 Jan 6, 2025
0d874f5
Added tp2 training factory
Victor49152 Jan 7, 2025
ec3154e
Added convergence recipe
Victor49152 Jan 8, 2025
0aab7dc
Added flux training scripts
Victor49152 Jan 8, 2025
5053c41
Inference script tested
Victor49152 Jan 8, 2025
0d73d69
Controlnet inference script tested
Victor49152 Jan 8, 2025
b016614
Moving scripts to correct folder and modify headers
Victor49152 Jan 8, 2025
848bd27
Apply isort and black reformatting
Victor49152 Jan 8, 2025
6129c82
Doc strings update
Victor49152 Jan 9, 2025
790abdd
Apply isort and black reformatting
Victor49152 Jan 9, 2025
90a20e8
pylint correction
Victor49152 Jan 9, 2025
36bcf5c
Apply isort and black reformatting
Victor49152 Jan 9, 2025
36644c8
Merge branch 'refs/heads/main' into mingyuanm/flux_controlnet
Victor49152 Jan 9, 2025
1ec2c16
Add import guard since custom fsdp is not merged to mcore yet
Victor49152 Jan 9, 2025
6c2cbeb
Add copy right headers and correct code check
Victor49152 Jan 9, 2025
20a52e3
Apply isort and black reformatting
Victor49152 Jan 9, 2025
cab13b6
Dist loading with TP2 resolved. Convergence not tested because of Mco…
Victor49152 Jan 13, 2025
ccf1526
Sharded state dict method tested
Victor49152 Jan 14, 2025
5e460f8
Improve hf ckpt converting and saving logic
Victor49152 Jan 15, 2025
510c16c
Update recipes
Victor49152 Jan 21, 2025
abede80
Merge branch 'refs/heads/main' into mingyuanm/flux_controlnet_sharded…
Victor49152 Jan 21, 2025
1f084f7
Add notebook
Victor49152 Jan 22, 2025
385b359
Apply isort and black reformatting
Victor49152 Jan 22, 2025
5035c11
Merge branch 'main' into mingyuanm/flux_controlnet_sharded_dict
Victor49152 Jan 22, 2025
22222e4
Add CI recipe file
Victor49152 Jan 23, 2025
71d7294
Update recipe
Victor49152 Jan 23, 2025
8b93421
Refactor names
Victor49152 Jan 23, 2025
b5d1ea1
Merge branch 'main' into mingyuanm/flux_controlnet_CI
yaoyu-33 Feb 3, 2025
d5c0d67
Add guard
yaoyu-33 Feb 3, 2025
11fcab9
Apply isort and black reformatting
yaoyu-33 Feb 4, 2025
dc4cf2a
Fix
yaoyu-33 Feb 4, 2025
0e1c17b
Apply isort and black reformatting
yaoyu-33 Feb 4, 2025
54f9e71
fix known issues
yaoyu-33 Feb 4, 2025
9a7feb6
Merge remote-tracking branch 'origin/mingyuanm/flux_controlnet_CI' in…
yaoyu-33 Feb 4, 2025
7081e1b
Add import guard
yaoyu-33 Feb 4, 2025
efa82d8
fix issues importing
yaoyu-33 Feb 4, 2025
f2dc0b5
Apply isort and black reformatting
yaoyu-33 Feb 4, 2025
17b8d65
Update flux_535m.py
yaoyu-33 Feb 5, 2025
Remove redundant importing
Signed-off-by: mingyuanm <[email protected]>
Victor49152 committed Oct 16, 2024
commit aa9df2a02ae59356a093549387b4ec3d611eac00
nemo/collections/diffusion/encoders/conditioner.py (0 additions, 3 deletions)

@@ -12,9 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from dataclasses import dataclass
-from typing import Any, Callable, Dict, List, Optional, Union
-
 import torch
 import torch.nn as nn
 from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer
nemo/collections/diffusion/models/flux/model.py (1 addition, 1 deletion)

@@ -13,7 +13,7 @@
 # limitations under the License.

 from dataclasses import dataclass
-from typing import Any, Callable, Dict, List, Optional, Union
+from typing import Callable

 import torch
 from megatron.core.transformer.transformer_config import TransformerConfig
nemo/collections/diffusion/models/flux/pipeline.py (2 additions, 4 deletions)

@@ -13,7 +13,7 @@
 # limitations under the License.

 import os
-from typing import Any, Callable, Dict, List, Optional, Union
+from typing import List, Optional, Union

 import numpy as np
 import torch
@@ -24,11 +24,11 @@
 from tqdm import tqdm

 from nemo.collections.diffusion.encoders.conditioner import FrozenCLIPEmbedder, FrozenT5Embedder
 from nemo.collections.diffusion.models.flux.model import Flux, FluxParams
 from nemo.collections.diffusion.sampler.flow_matching.flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler
 from nemo.collections.diffusion.utils.flux_ckpt_converter import flux_transformer_converter
 from nemo.collections.diffusion.utils.flux_pipeline_utils import FluxModelParams
-from nemo.collections.diffusion.vae.autoencoder import AutoEncoder, AutoEncoderParams
+from nemo.collections.diffusion.vae.autoencoder import AutoEncoder


 class FluxInferencePipeline(nn.Module):

Code scanning / CodeQL notice on the `FluxParams` import above: Import of 'FluxParams' is not used.

@@ -247,8 +247,6 @@
         offload: bool = True,
     ):
         assert device == 'cuda', 'Transformer blocks in Mcore must run on cuda devices'
-        height = height
-        width = width

         if prompt is not None and isinstance(prompt, str):
             batch_size = 1
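The CodeQL notice above flags exactly the kind of dead import this commit removes. As an illustrative sketch (not NeMo or CodeQL tooling — the helper name and sample source are made up here), the standard-library `ast` module is enough to find names that are imported but never referenced:

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the module."""
    tree = ast.parse(source)
    imported, used = [], set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # Record the bound name: the alias if present, else the top-level name.
            for alias in node.names:
                imported.append(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return [name for name in imported if name not in used]


src = "from models.flux.model import Flux, FluxParams\npipe = Flux()\n"
print(unused_imports(src))  # → ['FluxParams']
```

A real linter (pyflakes, CodeQL) also handles `__all__`, re-exports, and string annotations; this sketch only covers the simple case shown in the diff.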
nemo/collections/diffusion/utils/mcore_parallel_utils.py (0 additions, 1 deletion)

@@ -20,7 +20,6 @@

 import megatron.core.parallel_state as ps
 import torch
-from megatron.core.tensor_parallel.random import model_parallel_cuda_manual_seed


 class Utils:
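Besides unused imports, the pipeline.py hunk earlier in this commit also drops the no-op self-assignments `height = height` and `width = width`. As a hedged sketch (the helper below is illustrative, not part of the PR), such statements are easy to detect mechanically with `ast`:

```python
import ast


def self_assignments(source: str) -> list[int]:
    """Return line numbers of no-op statements of the form `x = x`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Name)
            and node.targets[0].id == node.value.id
        ):
            flagged.append(node.lineno)
    return flagged


print(self_assignments("height = height\nwidth = width\nbatch = 1\n"))  # → [1, 2]
```

Note that `x = x` is not always dead in Python (e.g. it can rebind a closure variable's type annotation scope in edge cases), so a reviewer still confirms each hit — as was done here before deleting the two lines.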