Insights: NVIDIA/NeMo
Overview
3 Releases published by 1 person
- v2.2.0rc0 NVIDIA Neural Modules 2.2.0rc0, published Feb 2, 2025
- v2.2.0rc1 NVIDIA Neural Modules 2.2.0rc1, published Feb 4, 2025
- v2.2.0rc2 NVIDIA Neural Modules 2.2.0rc2, published Feb 17, 2025
254 Pull requests merged by 56 people
-
chore: Cherry pick deepseek
#12324 merged
Feb 22, 2025 -
Cherry pick
fix masked loss calculation (12255)
into r2.2.0
#12286 merged
Feb 22, 2025 -
DeepSeek
#11971 merged
Feb 22, 2025 -
Cherry pick
Energon ckpt multimodal (12245)
into r2.2.0
#12307 merged
Feb 22, 2025 -
cherry pick 12209
#12240 merged
Feb 22, 2025 -
Fix BertEmbeddingDataset
#12272 merged
Feb 22, 2025 -
Cherry pick
Add modelopt to requirements_nlp.txt (12261)
into r2.2.0
#12278 merged
Feb 22, 2025 -
Cherry pick
Add eval requirement to setup.py (12152)
into r2.2.0
#12277 merged
Feb 22, 2025 -
build: Bump PyT to 25.01
#11973 merged
Feb 22, 2025 -
Cherry pick
Fix the local path in Sortformer diarizer training tutorial (12135)
into r2.2.0
#12316 merged
Feb 22, 2025 -
automodel notebooks fix
#12238 merged
Feb 22, 2025 -
Cherry pick
build: Exclude tensorstore 0.1.72 (12317)
into r2.2.0
#12318 merged
Feb 22, 2025 -
Fixes and refactor for custom pretraining loop
#12319 merged
Feb 22, 2025 -
Misc resiliency features
#12302 merged
Feb 21, 2025 -
Add checkpointing support to custom pretraining loop
#12291 merged
Feb 21, 2025 -
build: Exclude tensorstore 0.1.72
#12317 merged
Feb 21, 2025 -
Fix the local path in Sortformer diarizer training tutorial
#12135 merged
Feb 21, 2025 -
[nemo1] Fix Mamba/Bert loading from checkpoint after TE extra states were introduced
#12275 merged
Feb 21, 2025 -
add ctc segmentation
#12312 merged
Feb 21, 2025 -
ci: Fix test workflow
#12311 merged
Feb 21, 2025 -
Update for pytorch 25.01 container
#12310 merged
Feb 21, 2025 -
Test model loading for nemo export
#12262 merged
Feb 21, 2025 -
Add test for evaluation
#12276 merged
Feb 21, 2025 -
build: Editable nemo install (#12304)
#12308 merged
Feb 21, 2025 -
Energon ckpt multimodal
#12245 merged
Feb 21, 2025 -
build: Editable nemo install
#12304 merged
Feb 21, 2025 -
Cherry pick
Set L2_Speech_Batch_Size_OOMptimizer_Canary to be optional (12299)
into r2.2.0
#12300 merged
Feb 21, 2025 -
Set L2_Speech_Batch_Size_OOMptimizer_Canary to be optional
#12299 merged
Feb 21, 2025 -
ci: Flaky tests release
#12293 merged
Feb 20, 2025 -
Add sampling args in TRTLLM generate
#11612 merged
Feb 20, 2025 -
build: Bump mcore ref
#12287 merged
Feb 20, 2025 -
fix masked loss calculation
#12255 merged
Feb 20, 2025 -
fix speechlm import ckpt on slurm
#12244 merged
Feb 20, 2025 -
remove nemo1 tests
#12280 merged
Feb 20, 2025 -
remove nemo1 unit tests
#12281 merged
Feb 20, 2025 -
Fixing error when loading T5 checkpoint created with TE<1.13
#12264 merged
Feb 20, 2025 -
Build bitsandbytes
#12279 merged
Feb 20, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `05ac33c...` (2025-02-20)
#12274 merged
Feb 20, 2025 -
Add modelopt to requirements_nlp.txt
#12261 merged
Feb 20, 2025 -
Add eval requirement to setup.py
#12152 merged
Feb 20, 2025 -
Remove modelopt state when empty in NeMo 1.0 distillation
#12266 merged
Feb 20, 2025 -
Training loop
#12268 merged
Feb 20, 2025 -
Add optimizer fix
#12253 merged
Feb 19, 2025 -
Remove some old nemo1 contents in doc
#12156 merged
Feb 19, 2025 -
Llama Embedding Model Improvement
#12236 merged
Feb 19, 2025 -
Add evaluate utilities to custom pretraining loop
#12250 merged
Feb 19, 2025 -
Fix distillation state-dict loading bug
#12270 merged
Feb 19, 2025 -
add default kwargs for trtllm model runner
#12248 merged
Feb 19, 2025 -
ci: Fix pypi link of dry-run
#12267 merged
Feb 19, 2025 -
Fix 2D bucketing test on Python 3.12
#12265 merged
Feb 19, 2025 -
add default param dtype in mistral configs
#12186 merged
Feb 19, 2025 -
ci: Bump release workflows
#12259 merged
Feb 19, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `61b2c4f...` (2025-02-19)
#12251 merged
Feb 19, 2025 -
Ckpt fixes pytorch update
#12228 merged
Feb 19, 2025 -
build: force-reinstall
#12214 merged
Feb 19, 2025 -
Restructure llm perf scripts to support vlm/diffusion/flux collections
#12252 merged
Feb 19, 2025 -
fix[export]: reshard model correctly handles extra_state when it's a tensor
#12132 merged
Feb 19, 2025 -
ci: Generate coverage for e2e tests
#12120 merged
Feb 18, 2025 -
Add setup function to setup model, optimizer and dataloaders for training
#12247 merged
Feb 18, 2025 -
Update scheduler
#12243 merged
Feb 18, 2025 -
Asr fixes 2.2
#12227 merged
Feb 18, 2025 -
Fix Llama Embedding Tutorial
#12149 merged
Feb 18, 2025 -
Add ConfigContainer plus data and tokenizer modules to nemo/tron
#12241 merged
Feb 18, 2025 -
Apply missing lr_mult and wd_mult to the lr and weight_decay of megatron param groups.
#12123 merged
Feb 18, 2025 -
ci: Disable flaky transcription tests
#12237 merged
Feb 18, 2025 -
add missing __init__
#12239 merged
Feb 18, 2025 -
[Draft] Llama Embedding Model Fix
#12235 merged
Feb 18, 2025 -
skip-linting label optionally disables lint checks
#12179 merged
Feb 18, 2025 -
ci: Simplify install-check
#12231 merged
Feb 18, 2025 -
Cherry pick
nemo-automodel checkpoint-io refactor (12070)
into r2.2.0
#12234 merged
Feb 18, 2025 -
Cherry pick
Fix loading extra states from torch tensor (12185)
into r2.2.0
#12226 merged
Feb 18, 2025 -
Add automodel multinode tut and fix sft peft bug
#12209 merged
Feb 18, 2025 -
nemo-automodel checkpoint-io refactor
#12070 merged
Feb 18, 2025 -
Cherry pick
build: Pin down transformers (12229)
into r2.2.0
#12230 merged
Feb 17, 2025 -
build: Pin down transformers
#12229 merged
Feb 17, 2025 -
Cherry pick
disable moe logging to avoid deepseek hang (12168)
into r2.2.0
#12192 merged
Feb 17, 2025 -
Fix loading extra states from torch tensor
#12185 merged
Feb 17, 2025 -
Cherry pick
Fix multi-GPU in-framework deployment (12090)
into r2.2.0
#12172 merged
Feb 17, 2025 -
ci: Remove pull_request trigger
#12224 merged
Feb 17, 2025 -
ci: Update release workflows
#12223 merged
Feb 17, 2025 -
ci: Add install-test
#12215 merged
Feb 17, 2025 -
ci: Use release-ref
#12219 merged
Feb 17, 2025 -
ci: Code-freeze dry-run
#12217 merged
Feb 17, 2025 -
Cherry pick
Update TTS code to remove calls to deprecated functions (12153)
into r2.2.0
#12201 merged
Feb 17, 2025 -
Cherry pick
Add function calling SFT NeMo2.0 tutorial (11868)
into r2.2.0
#12180 merged
Feb 17, 2025 -
Revert pytest -s
#12202 merged
Feb 15, 2025 -
Fix nemo-run stdin exception
#12197 merged
Feb 15, 2025 -
Update TTS code to remove calls to deprecated functions
#12153 merged
Feb 15, 2025 -
Set weights_only=False in torch.load in EMA callback and AdapterMixin
#12198 merged
Feb 15, 2025 -
Update forward for eval/inference pass in FlowMatching models
#12056 merged
Feb 15, 2025 -
Use pip --no-deps --force-reinstall when building the test container
#12175 merged
Feb 14, 2025 -
Fixes for bumping pyt to 25.01
#12165 merged
Feb 14, 2025 -
Cherry pick
build: Force re-install VCS dependencies (12155)
into r2.2.0
#12191 merged
Feb 14, 2025 -
use getattr for .optim and allow variable num of args
#12184 merged
Feb 14, 2025 -
Add perf model configs gb200
#12140 merged
Feb 14, 2025 -
disable moe logging to avoid deepseek hang
#12168 merged
Feb 14, 2025 -
Fix nsys callback tests
#12177 merged
Feb 14, 2025 -
Add usage instructions for Cosmos TensorRT
#11650 merged
Feb 14, 2025 -
Fix num nodes to match parallel mappings
#12170 merged
Feb 14, 2025 -
Update Llama 3.1 model IDs
#12089 merged
Feb 14, 2025 -
Distillation NeMo run entrypoint and recipe
#12143 merged
Feb 14, 2025 -
Add function calling SFT NeMo2.0 tutorial
#11868 merged
Feb 14, 2025 -
Migrate SpeechLM to NeMo 2.0
#10808 merged
Feb 13, 2025 -
Fix multi-GPU in-framework deployment
#12090 merged
Feb 13, 2025 -
remove nemo.collection.nlp imports from nemo2
#11904 merged
Feb 13, 2025 -
Fix state transform
#12147 merged
Feb 13, 2025 -
[Audio] Tiny fix for time generation in flow matching
#12091 merged
Feb 13, 2025 -
AutoModel Notebooks
#12013 merged
Feb 13, 2025 -
Revert "build: Force re-install VCS dependencies"
#12163 merged
Feb 13, 2025 -
fix: add default kwargs for trtllm model runner
#12131 merged
Feb 13, 2025 -
chore: Update notebooks
#12161 merged
Feb 12, 2025 -
chore: Version bump
#12160 merged
Feb 12, 2025 -
ci: Bump code-freeze workflow
#12159 merged
Feb 12, 2025 -
build: Force re-install VCS dependencies
#12155 merged
Feb 12, 2025 -
Minor Bug Fixes - LLaMa Embedding
#12146 merged
Feb 12, 2025 -
update export io call
#12144 merged
Feb 12, 2025 -
Skip initialization in hf export
#12136 merged
Feb 12, 2025 -
interface for asymmetric pipeline schedule
#12039 merged
Feb 12, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `55cdfc1...` (2025-02-12)
#12148 merged
Feb 12, 2025 -
AudioToAudioModel: fix model->dataloader sample_rate parameter injection
#12092 merged
Feb 12, 2025 -
Add error message when downloading failed.
#12139 merged
Feb 12, 2025 -
fix: export weight name mapping if model is nemo model
#11497 merged
Feb 12, 2025 -
changed asr models outputs to be consistent
#11818 merged
Feb 12, 2025 -
tests: Run FSDP2 on dual-gpu
#12145 merged
Feb 11, 2025 -
Prevent downloading dataset every time in ci test
#12095 merged
Feb 11, 2025 -
Update vLLM to 0.7.2
#12078 merged
Feb 11, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `26ad9b3...` (2025-02-11)
#12130 merged
Feb 11, 2025 -
fix the issue during batched inference of Sortformer diarizer
#12047 merged
Feb 11, 2025 -
Rename neva datamodule
#12121 merged
Feb 11, 2025 -
Bug fix with generation of expert_tensor_parallel_rank
#12125 merged
Feb 11, 2025 -
Add Automodel support for Deepseek v3 model
#12099 merged
Feb 11, 2025 -
Add performance-optimized example for llama2 70b LoRA
#12055 merged
Feb 11, 2025 -
ci: Disable checks
#12129 merged
Feb 11, 2025 -
[MoE] Add type annotation for mixtral configs
#12126 merged
Feb 10, 2025 -
Malay/bw scripts
#11961 merged
Feb 10, 2025 -
DAPT playbooks - with NeMo 2.0
#12067 merged
Feb 10, 2025 -
Propogate dp last changes from mcore
#12012 merged
Feb 10, 2025 -
Add neva pretrain script
#12033 merged
Feb 10, 2025 -
ci: Bump bot
#12117 merged
Feb 10, 2025 -
ci: Bump Mcore inplace
#12115 merged
Feb 10, 2025 -
Update mcore commit (02.06.25)
#12114 merged
Feb 10, 2025 -
refactor peft module matching; introduce exclude_modules
#12066 merged
Feb 9, 2025 -
build: Optimize
#12112 merged
Feb 9, 2025 -
Ensure nemo.collections.vlm does not strictly require transformer engine
#12108 merged
Feb 9, 2025 -
ci: Fix flaky test
#12113 merged
Feb 9, 2025 -
build: Better caching
#12109 merged
Feb 9, 2025 -
etp docs
#12111 merged
Feb 9, 2025 -
ci: Update bump workflow
#12106 merged
Feb 8, 2025 -
ci: Modular unit tests
#12104 merged
Feb 8, 2025 -
ci: Update bump workflow
#12105 merged
Feb 8, 2025 -
build: Improve installer
#12016 merged
Feb 8, 2025 -
ci: codecov
#12030 merged
Feb 8, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=bcee052... (2025-02-08)
#12100 merged
Feb 8, 2025 -
Fix SBERT with sequence_len_offset
#12057 merged
Feb 8, 2025 -
Refactor VLM modules / Add InternVit submodule support
#11851 merged
Feb 7, 2025 -
Add Llama2 7B recipe
#11649 merged
Feb 7, 2025 -
fix nmt dataclass issue
#12081 merged
Feb 7, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=6219d96... (2025-02-07)
#12088 merged
Feb 7, 2025 -
minor fix in model's summary identation during logging
#12084 merged
Feb 7, 2025 -
set TOKENIZERS_PARALLELISM=True
#12083 merged
Feb 7, 2025 -
Fix hf_dataset bug
#12072 merged
Feb 7, 2025 -
Fix Linting
#12079 merged
Feb 7, 2025 -
Add Llama Embedding Tutorial
#12042 merged
Feb 6, 2025 -
throw MegatronOptimizerModule warning only with mcore models
#12085 merged
Feb 6, 2025 -
Debug Apex distributed optimizer to handle Transformer Engine 2.0
#12004 merged
Feb 6, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=0ae1d14... (2025-02-06)
#12073 merged
Feb 6, 2025 -
Update optimization features readme from nemo1 to nemo2
#12071 merged
Feb 6, 2025 -
attn_implementation eager fallback
#12060 merged
Feb 6, 2025 -
Adding nemo CI
#12052 merged
Feb 6, 2025 -
Conformer-based spectrogram estimator
#12002 merged
Feb 6, 2025 -
add cp_comm_type param to Mistral config
#12049 merged
Feb 6, 2025 -
Pipeline-parallel support for Knowledge Distillation (NeMo 2)
#11766 merged
Feb 6, 2025 -
Sortformer Diarizer 4spk v1 model PR Part 4: Sortformer Documents and Notebook Tutorials
#11707 merged
Feb 5, 2025 -
Recipe changes for performance
#11763 merged
Feb 5, 2025 -
ci: Lint Python files only
#12064 merged
Feb 5, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=ca46c53... (2025-02-04)
#12053 merged
Feb 5, 2025 -
chore: Add warning for rebase
#12061 merged
Feb 5, 2025 -
Add padding in mllama vision encoder to align with HF
#11808 merged
Feb 5, 2025 -
avoid missmatch error when loading older TE checkpoints
#12028 merged
Feb 4, 2025 -
ci: Allow skipping docs
#12048 merged
Feb 4, 2025 -
Adding TFLOPs callback for Multimodal models and NeVA calculator
#11969 merged
Feb 4, 2025 -
Clip Model in Nemo2
#11980 merged
Feb 4, 2025 -
fix llama-3.1 hf model_id
#11774 merged
Feb 4, 2025 -
nemo-automodel: fsdp2 support for peft
#12008 merged
Feb 4, 2025 -
ci: Update workflow
#12044 merged
Feb 4, 2025 -
ci: Update weekly brain
#12043 merged
Feb 4, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=284ed81... (2025-02-04)
#12038 merged
Feb 4, 2025 -
Version bump to 2.2.0rc2.dev0
#12040 merged
Feb 4, 2025 -
[MoE] fix run err in mixtral22B recipe and update its perf config
#12036 merged
Feb 4, 2025 -
ci: Retry on timeout
#11974 merged
Feb 4, 2025 -
ci: Always run linting
#12035 merged
Feb 3, 2025 -
Replace reference of requirements_infer.txt with requirements_deploy.txt
#12029 merged
Feb 3, 2025 -
ci: Run linting per domain
#12027 merged
Feb 3, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=dbe8fa0... (2025-02-01)
#12014 merged
Feb 3, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=d5069b8... (2025-01-29)
#11981 merged
Feb 3, 2025 -
Weekly bump
#11896 merged
Feb 3, 2025 -
Set zarr range to >=2.18.2 and <3.0.0
#12005 merged
Feb 3, 2025 -
[Audio] Fix extra step in Euler sampler for flow matching inference
#11989 merged
Feb 3, 2025 -
ci: Run unit tests on main
#11986 merged
Feb 3, 2025 -
Version bump to 2.2.0rc1
#12023 merged
Feb 2, 2025 -
ci: Release workflow
#12022 merged
Feb 2, 2025 -
ci: Build wheel workflow
#12021 merged
Feb 2, 2025 -
minor fix and simplify
#12007 merged
Feb 2, 2025 -
Adding speechlm AutoModel test
#11990 merged
Feb 2, 2025 -
chore(ci): Disable VMs cron job on forks
#12020 merged
Feb 2, 2025 -
Add the NeMo2 memory profiling plugin
#12009 merged
Feb 2, 2025 -
Llama3.2 1B Embedding Model Support
#11909 merged
Jan 31, 2025 -
Mask vocab padding token ids from CE loss
#11999 merged
Jan 31, 2025 -
Checkpoint saving for automodels via ModelCheckpoint
#11998 merged
Jan 31, 2025 -
Remove deprecated tests/infer_data_path.py
#11997 merged
Jan 31, 2025 -
Introduce evaluation API
#11895 merged
Jan 31, 2025 -
Adding serialization to all Auto* objects in HuggingFace transformers
#11645 merged
Jan 31, 2025 -
Speechllm develop gen duplex
#11993 merged
Jan 30, 2025 -
nemo automodel sft squad data prep fix
#11994 merged
Jan 30, 2025 -
remove --disable-ckpt from tests
#11996 merged
Jan 30, 2025 -
remove renormalize_blend_weights flag
#11975 merged
Jan 30, 2025 -
add exception when loading ckpt saved by TE < 1.13
#11988 merged
Jan 30, 2025 -
callbacks and bf16 grad
#11985 merged
Jan 30, 2025 -
Use override_vocab_size for trtllm export to support qwen
#11982 merged
Jan 29, 2025 -
[checkpoint][docs] Fix typos in dist checkpointing docs
#11983 merged
Jan 29, 2025 -
Update transcribe_utils.py
#11984 merged
Jan 29, 2025 -
improve error and debug messages in model connector
#11979 merged
Jan 29, 2025 -
add use_fast option
#11976 merged
Jan 29, 2025 -
Add batching support for evaluation
#11934 merged
Jan 29, 2025 -
ci: Add coverage reports
#11912 merged
Jan 28, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=0e85db5... (2025-01-28)
#11967 merged
Jan 28, 2025 -
Add options to add mp_policy and parallel_fn for NeMo automodel fsdp2
#11956 merged
Jan 27, 2025 -
Update torch load for load from disk
#11963 merged
Jan 27, 2025 -
Run Flake8 for nemo.export module
#11728 merged
Jan 27, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=f960d4d... (2025-01-26)
#11958 merged
Jan 27, 2025 -
[MoE] add expert tensor parallelism support for NeMo2.0 MoE
#11880 merged
Jan 27, 2025 -
llm performance scripts
#11736 merged
Jan 25, 2025 -
ci: Use single runner machines for unit tests
#11937 merged
Jan 25, 2025 -
PTQ & TRT-LLM updates related to upcoming PyTorch 25.01 bump
#11941 merged
Jan 25, 2025 -
enable loading older TE checkpoints
#11930 merged
Jan 24, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=2167226... (2025-01-24)
#11947 merged
Jan 24, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=0d59157... (2025-01-23)
#11932 merged
Jan 24, 2025 -
Add sharding for speechlm and vlm
#11876 merged
Jan 23, 2025 -
TPS-free 2D bucket estimation and filtering
#11738 merged
Jan 23, 2025 -
Cherry pick
build: Add sox to SDE (11882)
into r2.1.1
#11936 merged
Jan 23, 2025 -
build: Fix triton
#11940 merged
Jan 23, 2025 -
build: Pin triton
#11938 merged
Jan 23, 2025 -
Autodetect dtype on exporting to TensorRT-LLM
#11907 merged
Jan 23, 2025 -
build: Add sox to SDE
#11882 merged
Jan 23, 2025 -
Add more fine-grained performance metrics
#11619 merged
Jan 23, 2025 -
Enable NeMo importer and loading dist CKPT for training
#11927 merged
Jan 23, 2025 -
Create test_phi3.py
#11843 merged
Jan 22, 2025 -
ci: Adjust input argument
#11921 merged
Jan 22, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=5c12382... (2025-01-22)
#11925 merged
Jan 22, 2025 -
Revert #11890 and add a test that would have caught the error
#11914 merged
Jan 22, 2025 -
chore(beep boop 🤖): Bump MCORE_TAG=5c12382... (2025-01-22)
#11922 merged
Jan 22, 2025 -
Add New Transformer Backbone for TTS Models
#11911 merged
Jan 22, 2025
71 Pull requests opened by 41 people
-
nemo-ux: deprecate app state
#11935 opened
Jan 23, 2025 -
add initial hf automodel docs
#11942 opened
Jan 23, 2025 -
Add NVTX ranges to categorize execution
#11945 opened
Jan 23, 2025 -
[checkpoint] Log timings for checkpoint IO save and load
#11972 opened
Jan 28, 2025 -
Add docs on env vars
#11991 opened
Jan 29, 2025 -
Avoid init_ddp for inference
#12011 opened
Jan 31, 2025 -
Draft: Enable a minimal docs linkcheck build
#12015 opened
Feb 1, 2025 -
Log checkpoint saves at start and finish
#12018 opened
Feb 2, 2025 -
replace n-1 -> n
#12025 opened
Feb 3, 2025 -
Script for estimating data weights with optional temperature
#12032 opened
Feb 3, 2025 -
Add flux recipe for ci
#12037 opened
Feb 4, 2025 -
NeMo export: Remove unnecessary expert key mapping
#12041 opened
Feb 4, 2025 -
Fix/update audio to text dataset
#12045 opened
Feb 4, 2025 -
Fix per-rank log file creation
#12058 opened
Feb 5, 2025 -
Fix bugs in `AudioToMelSpectrogramPreprocessor.input_example`
#12063 opened
Feb 5, 2025 -
numactl cmd
#12069 opened
Feb 5, 2025 -
Configure FSDP to keep module params
#12074 opened
Feb 6, 2025 -
Make TETransformerLayerAutocast Support Cuda Graph
#12075 opened
Feb 6, 2025 -
Fix dataclass field in some asr examples.
#12076 opened
Feb 6, 2025 -
Add T5TTSv2 and Updates NeMo Audio Codecs
#12082 opened
Feb 6, 2025 -
[WIP] add auto model pretrain example
#12087 opened
Feb 6, 2025 -
Save and Restore ModelOpt state in NeMo 2.0
#12094 opened
Feb 7, 2025 -
Avoid rewrapping modules with DDP/FSDP if already wrapped
#12096 opened
Feb 7, 2025 -
Training Performance Optimization for flux_controlnet
#12097 opened
Feb 7, 2025 -
Add FastAPI v1/completions/ endpoint
#12101 opened
Feb 8, 2025 -
fix max_utts hard reqs
#12119 opened
Feb 10, 2025 -
Support Nvidia-DLFramework-Inspect
#12122 opened
Feb 10, 2025 -
Enable ucc backend for pp [NeMo1/NeMo2]
#12128 opened
Feb 10, 2025 -
Fix has_global_batch_sampler to handle datamodules without data_sampler attribute
#12137 opened
Feb 11, 2025 -
updated nemotron h100 cfgs
#12138 opened
Feb 11, 2025 -
Avoid rewrapping modules with DDP and Float16Module on repeated trainer.fit calls
#12141 opened
Feb 11, 2025 -
Add err msg if download failed
#12142 opened
Feb 11, 2025 -
Neva ETP EPP support
#12154 opened
Feb 12, 2025 -
Parakeet RNNT with target lang ID
#12173 opened
Feb 13, 2025 -
Remove getattr_proxy to avoid problematic edge cases
#12176 opened
Feb 13, 2025 -
Abhi/llava next sp
#12182 opened
Feb 14, 2025 -
Add DeepSeek-R1 Distillation NeMo 2.0 tutorial
#12187 opened
Feb 14, 2025 -
Fix: 'IterableDatasetWrapper' has no len() when using Lhotse datasets
#12190 opened
Feb 14, 2025 -
fix for te.linear
#12196 opened
Feb 14, 2025 -
exp manager updates
#12211 opened
Feb 17, 2025 -
Bump zarr version
#12216 opened
Feb 17, 2025 -
Support customization of a few parameters in scripts/vlm/llava_next_pretrain
#12218 opened
Feb 17, 2025 -
Moving async-queue to AppState
#12221 opened
Feb 17, 2025 -
Version bump to `2.2.0rc3.dev0`
#12222 opened
Feb 17, 2025 -
Add Trapezoidal / WSD LR scheduler
#12225 opened
Feb 17, 2025 -
Update L2_NeMo_2_NeMo_Mcore_Mixtral_bitexact to reenable failure on mismatch
#12233 opened
Feb 18, 2025 -
ONNX exporter
#12242 opened
Feb 18, 2025 -
Fixed normalization of feature vector and weight vector
#12246 opened
Feb 18, 2025 -
feat: Allow reshaping of HF checkpoint when converting from .nemo
#12249 opened
Feb 18, 2025 -
Add energon neva pretrain script and fix checkpoint saving
#12256 opened
Feb 19, 2025 -
Evo2 merge 20250214
#12263 opened
Feb 19, 2025 -
Fix model validate broadcast error
#12269 opened
Feb 19, 2025 -
NeVA performance recipe and script
#12271 opened
Feb 19, 2025 -
Fix for te v2.0
#12273 opened
Feb 19, 2025 -
Add trust_remote_code to load_context
#12282 opened
Feb 20, 2025 -
Perf script fix
#12285 opened
Feb 20, 2025 -
fix: typos in documentation files
#12288 opened
Feb 20, 2025 -
Call default factory in dataclasses when saving yaml via nemo.lightning.io
#12289 opened
Feb 20, 2025 -
Update README.md
#12294 opened
Feb 20, 2025 -
Adding FLOP calculator for FLUX
#12295 opened
Feb 20, 2025 -
Respect `pad_seq_length_to_mult` for chat datasets
#12297 opened
Feb 20, 2025 -
Add nemo-run recipe for evaluation
#12301 opened
Feb 21, 2025 -
fix loss reporting
#12303 opened
Feb 21, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `c91756d...` (2025-02-21)
#12305 opened
Feb 21, 2025 -
fixing max_utts
#12309 opened
Feb 21, 2025 -
Bug fixes
#12315 opened
Feb 21, 2025 -
build: Bump mcore
#12320 opened
Feb 21, 2025 -
chore(🤖): Bump `NVIDIA/Megatron-LM` to `7980711...` (2025-02-22)
#12321 opened
Feb 22, 2025 -
Entrypoint
#12322 opened
Feb 22, 2025 -
build: Bump PyT to 25.01 (#11973)
#12323 opened
Feb 22, 2025 -
use /tmp for HF_HOME
#12325 opened
Feb 22, 2025
29 Issues closed by 7 people
-
Optical Flow classifier
#11847 closed
Feb 21, 2025 -
Unserializable Error with using Energon Dataloader for NeVA (LLaVA) pretraining / fine-tuning and NeMo 2.0
#11931 closed
Feb 20, 2025 -
How to use nemo docker container as base image
#11824 closed
Feb 20, 2025 -
Failing convert_llama_hf_to_nemo.py
#11840 closed
Feb 20, 2025 -
can't load saved fp8 checkpoint when resume training (MOE model)
#11828 closed
Feb 19, 2025 -
How to set lhotse mixed noise parameters in yaml
#11812 closed
Feb 17, 2025 -
convert_llama_hf_to_nemo.py use llama31
#11717 closed
Feb 15, 2025 -
Llama 405B NeMo version
#11776 closed
Feb 15, 2025 -
`cfg` must have `tokenizer` config to create a tokenizer !
#12019 closed
Feb 13, 2025 -
llm.import_ckpt cannot run directly
#11756 closed
Feb 11, 2025 -
[QST] Found no performance gain training Mixtral-8x7B with FP8 on H800
#11959 closed
Feb 10, 2025 -
What kind of manifest does NEST require?
#11752 closed
Feb 10, 2025 -
Issue contributing
#11440 closed
Feb 9, 2025 -
NeMo won't install, no module named torch
#11601 closed
Feb 9, 2025 -
NeMo/examples/slu/speech_intent_slot 0 files were filtered totalling 0.00 hours
#11734 closed
Feb 9, 2025 -
I came here for llama3 meets upcycling
#11644 closed
Feb 7, 2025 -
tutorial notebook fails installing mamba-ssm dependency
#11691 closed
Feb 6, 2025 -
NeMo intermittent start-up failure with OMPI temp directory error on k8s
#11724 closed
Feb 6, 2025 -
how to inference a .nemo file which is converted from a HuggingFace format?
#11478 closed
Jan 31, 2025 -
IndexError: index 0 is out of bounds for dimension 1 with size 0
#11700 closed
Jan 30, 2025 -
Which version of transformer engine should I use, when I try to open ub_tp_comm_overlap?
#11683 closed
Jan 27, 2025 -
SymbolicValueError: STFT does not currently support complex types
#11684 closed
Jan 27, 2025 -
docs.nvidia.com link to tutorial notebooks via `stable` tag, which fail to install dependencies
#11690 closed
Jan 27, 2025 -
NeMo Git tag for patching a nvcr.io/nvidia/nemo:24.07 based Docker container
#11943 closed
Jan 24, 2025 -
How to disable `torch_dist` ckpt format ?
#11625 closed
Jan 24, 2025
41 Issues opened by 34 people
-
Nvidia NEMO 2.0 Serialization Issue: I am facing the same serialization issue with fiddle
#12296 opened
Feb 20, 2025 -
Issues around Resumed Runs
#12290 opened
Feb 20, 2025 -
Off By One Error When Checkpointing and Old Checkpoints Getting Deleted During Run
#12284 opened
Feb 20, 2025 -
Concurrency Issues with MSDD Diarization
#12254 opened
Feb 19, 2025 -
Exported Llama Models Trained Using NeMo Generate The Same Token Repeatedly
#12212 opened
Feb 17, 2025 -
loss divergence when CP>1 and MBS>1
#12210 opened
Feb 17, 2025 -
Pre-Training Neva under pipeline parallel set to 2.
#12205 opened
Feb 16, 2025 -
Checkpointing randomly fails
#12203 opened
Feb 15, 2025 -
Support configuration of num_workers and max_samples_per_sequence in llava_next_pretrain
#12195 opened
Feb 14, 2025 -
Bloated pre-requirements
#12188 opened
Feb 14, 2025 -
HiFiGAN Finetune "Cannot re-initialize CUDA in forked subprocess."
#12178 opened
Feb 13, 2025 -
Update TE version for support of `pad_between_seqs=True`
#12174 opened
Feb 13, 2025 -
I am trying to train the FastConformer 120M model from scratch, but it is not converging?
#12167 opened
Feb 13, 2025 -
NeMo is not friendly to HF compatibility.
#12166 opened
Feb 13, 2025 -
[HELP] Run into the NaN grad problem while going through the exmaple of official document with fp16
#12134 opened
Feb 11, 2025 -
Implement multi-token prediction option for models
#12133 opened
Feb 11, 2025 -
Fail to convert trained checkpoint to HF format
#12124 opened
Feb 10, 2025 -
[QST] How to set MoE-specific TP size in recipe?
#12103 opened
Feb 8, 2025 -
Loss Fails to Converge in Nemo2-sft.ipynb with Precision 16
#12102 opened
Feb 8, 2025 -
ASR Lhotse dataloader : TypeError: object of type 'IterableDatasetWrapper' has no len()
#12093 opened
Feb 7, 2025 -
AttributeError: 'HFDatasetDataModule' object has no attribute 'tokenizer'
#12080 opened
Feb 6, 2025 -
Need some help/clarity on installing
#12068 opened
Feb 5, 2025 -
extra_loggers is not used to log metrics or hyperparameters
#12046 opened
Feb 4, 2025 -
llava-like dataset implementation "LazySupervisedDataset" likely fails to handle large dataset
#12034 opened
Feb 3, 2025 -
ASR: Is there a coefficient for transcripted words/phrases?
#12026 opened
Feb 3, 2025 -
How do I enable dllogger in NeMo 2.0?
#12010 opened
Jan 31, 2025 -
ASR: How to convert .ckpt to nemo correctly?
#12003 opened
Jan 31, 2025 -
num_sanity_val_steps too large issue
#11978 opened
Jan 28, 2025 -
Add option for prefetch factor of data loader to config
#11977 opened
Jan 28, 2025 -
Megatron BERT Embedding conversion inconsistency
#11970 opened
Jan 28, 2025 -
Pickling error when trying to save checkpoints with custom checkpointIO
#11955 opened
Jan 24, 2025 -
Add Git Tag associated with NeMo Docker containers
#11954 opened
Jan 24, 2025 -
Gemma 2 NeMo 2.0 to HF conversion bug
#11951 opened
Jan 24, 2025 -
Hybrid Sharding support with FSDP
#11946 opened
Jan 23, 2025 -
MegatronGPTModel trains much worse when reducing micro_batch_size
#11939 opened
Jan 23, 2025 -
Have a nemo training container without additional framework elements
#11933 opened
Jan 23, 2025 -
Installation instruction for conda/pip does not work
#11929 opened
Jan 22, 2025 -
Tenacity/s3fs not in requirements
#11926 opened
Jan 22, 2025
36 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Aligner/nemotron5
#11264 commented on
Feb 22, 2025 • 8 new comments -
Add Minitron pruning example for NeMo 2.0
#11848 commented on
Feb 20, 2025 • 7 new comments -
Allow configuration of PP communication backend to UCC in nemo2
#11755 commented on
Feb 18, 2025 • 6 new comments -
replaced classification model with EncDecSpeakerLabelModel
#11887 commented on
Feb 22, 2025 • 5 new comments -
Fast N-Gram LM on GPU + greedy decoding (RNN-T, TDT, CTC)
#10989 commented on
Feb 15, 2025 • 5 new comments -
Improving NeMo export
#11920 commented on
Feb 21, 2025 • 1 new comment -
Add "_skipme" option to Lhotse Dataloading
#11793 commented on
Feb 21, 2025 • 1 new comment -
Add a checkpoint averaging script for the new .distcp checkpoint format
#10462 commented on
Jan 23, 2025 • 0 new comments -
fix(huggingface-hub): allow offline mode
#11901 commented on
Feb 21, 2025 • 0 new comments -
fix: correct signal for Windows
#11898 commented on
Feb 15, 2025 • 0 new comments -
Lack of GPU memory
#11915 commented on
Jan 23, 2025 • 0 new comments -
Add nemo1 to nemo2 conversion for neva
#11860 commented on
Feb 18, 2025 • 0 new comments -
Not able to run LLaVA-Next pretraining with NeMo 2.0 using container version nemo:24.12
#11741 commented on
Jan 23, 2025 • 0 new comments -
FastPitch_Adapter_Finetuning doesn't works
#11666 commented on
Jan 30, 2025 • 0 new comments -
when i use container to do sft for any model, it has context not found error
#11825 commented on
Feb 7, 2025 • 0 new comments -
How to load local model using import ckpt function
#11867 commented on
Feb 7, 2025 • 0 new comments -
Add safetensor option when saving and restoring models
#11549 commented on
Feb 15, 2025 • 0 new comments -
Fixes ASR numpy > 2.x compatibility issues while replicating existing behavior
#11447 commented on
Feb 14, 2025 • 0 new comments -
NeMo-UX: MegatronAutoModel
#11341 commented on
Feb 19, 2025 • 0 new comments -
FilterbankFeatures may return NaNs on CUDA device - torch autocast problem
#11541 commented on
Feb 13, 2025 • 0 new comments -
Fix: Data from AIStore
#11241 commented on
Feb 22, 2025 • 0 new comments -
Add MCore FSDP2 support
#11216 commented on
Feb 13, 2025 • 0 new comments -
Add scripts for importing a ckpt and running a forward step on it for nemo.collections.llm
#11108 commented on
Feb 19, 2025 • 0 new comments -
[NeMo-UX] Add option to drop optimizer states
#11089 commented on
Feb 20, 2025 • 0 new comments -
ASR: Is there any checked and stable way for pretrain?
#11813 commented on
Feb 15, 2025 • 0 new comments -
Self_hosted not honor the parameters
#11924 commented on
Feb 22, 2025 • 0 new comments -
Add CI Tests for Canary/AEDMultitask "lang_field"
#10103 commented on
Feb 21, 2025 • 0 new comments -
Broken offline mode of NeMo
#11899 commented on
Feb 21, 2025 • 0 new comments -
XLarge Fastconformer Long FT does not converge with default parameters
#11894 commented on
Feb 20, 2025 • 0 new comments -
max_steps and time calculation are not working as expected.
#11900 commented on
Feb 20, 2025 • 0 new comments -
Support Pipeline Parallel in Knowledge Distillation
#11531 commented on
Feb 17, 2025 • 0 new comments -
Possible bug in ASRDecoderTimeStamps - math.ceil on fractional tokens_per_chunk leads to timestamps displacements on long files
#11604 commented on
Feb 16, 2025 • 0 new comments -
Canary ouputs English for Arabic Speech
#11826 commented on
Feb 16, 2025 • 0 new comments -
Cosmos support
#11844 commented on
Feb 16, 2025 • 0 new comments -
`prepare_energon_dataset.py` is supposed to save encoded latents but reconstructed videos are saved instead.
#11853 commented on
Feb 16, 2025 • 0 new comments