ohjoonhee/lightning-codebase

Intro

This is a go-to PyTorch template utilizing Lightning and wandb. The template uses Lightning CLI for config management and follows most of the Lightning CLI docs, but is integrated with wandb. Since Lightning CLI instantiates classes on the fly, some workarounds were needed while integrating WandbLogger into the template. This may not be best practice, but it works and is quite convenient.

How To Use

It uses Lightning CLI, so most of its usage can be found in the official docs.
There are some added arguments related to wandb:

  • --name or -n: Name of the run, displayed in wandb
  • --version or -v: Version of the run, displayed in wandb as tags

Basic cmdline usage is as follows.
We assume the cwd is the project root dir.

fit stage

python src/main.py fit -c configs/config.yaml -n debug-fit-run -v debug-version

If using wandb for logging, change the "project" key in cli_module/rich_wandb.py. If you want to access the log directory in your LightningModule, you can access it as follows.

log_root_dir = self.logger.log_dir or self.logger.save_dir
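
The `or` fallback matters because not every logger exposes a local log_dir (for WandbLogger it may be None), in which case save_dir is used instead. Here is a minimal stdlib-only sketch of that pattern; the Dummy* classes are hypothetical stand-ins for the real loggers, not part of the template:

```python
# Illustration of the `log_dir or save_dir` fallback pattern.
# DummyCSVLogger / DummyWandbLogger are hypothetical stand-ins.

class DummyCSVLogger:
    log_dir = "logs/run/version_0"   # local loggers expose a concrete log_dir
    save_dir = "logs"

class DummyWandbLogger:
    log_dir = None                   # wandb-style loggers may not have one
    save_dir = "logs/run"

def log_root_dir(logger):
    # Prefer log_dir when available; otherwise fall back to save_dir.
    return logger.log_dir or logger.save_dir

print(log_root_dir(DummyCSVLogger()))    # logs/run/version_0
print(log_root_dir(DummyWandbLogger()))  # logs/run
```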

Clean Up Wandb Artifacts

If using wandb for logging, model ckpt files are uploaded to wandb.
Since ckpt files can be quite large, a clean-up process is needed.
The clean-up process deletes all model ckpt artifacts without any aliases (e.g. best, latest). To turn off the clean-up process, add the following to config.yaml. Then every version of the model ckpt files will be saved to wandb.

trainer:
  logger:
    init_args:
      clean: false

Model Checkpoint

One can save model checkpoints using Lightning callbacks. A checkpoint contains the model weights and other state_dicts needed for resuming training.
There are several ways to save ckpt files, either locally or in the cloud.

  1. Just leave everything at the defaults, and ckpt files will be saved locally (at logs/${name}/${version}/fit/checkpoints).

  2. If you want to save ckpt files as wandb Artifacts, add the following config. (The ckpt files will be saved locally too.)

trainer:
  logger:
    init_args:
      log_model: all

  3. If you want to save ckpt files in the cloud rather than locally, you can change the save path by adding the following config. (The ckpt files will NOT be saved locally.)

model_ckpt:
  dirpath: gs://bucket_name/path/for/checkpoints

AsyncCheckpointIO Plugins

You can enable asynchronous checkpoint saving by providing the config as follows.

trainer:
  plugins:
    - AsyncCheckpointIO
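
Conceptually, async checkpointing hands the checkpoint off to a background thread so the training loop does not block on disk I/O. Below is a stdlib-only sketch of that idea (the function names are hypothetical and json stands in for torch.save; this is not the plugin's implementation):

```python
# Conceptual sketch of async checkpoint saving: the caller enqueues the
# state and continues immediately; a background thread does the write.
import threading, queue, json, tempfile, os

_jobs: queue.Queue = queue.Queue()

def _writer():
    while True:
        state, path = _jobs.get()
        with open(path, "w") as f:
            json.dump(state, f)      # the real plugin would use torch.save
        _jobs.task_done()

threading.Thread(target=_writer, daemon=True).start()

def save_checkpoint_async(state: dict, path: str) -> None:
    _jobs.put((state, path))         # returns immediately, no blocking I/O

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
save_checkpoint_async({"epoch": 3}, path)
_jobs.join()                         # wait here only to demonstrate the result
print(json.load(open(path))["epoch"])  # 3
```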

Automatic Batch Size Finder

Just add the BatchSizeFinder callback in the config.

trainer:
  callbacks:
    - class_path: BatchSizeFinder

Or add it on the cmdline.

python src/main.py fit -c configs/config.yaml --trainer.callbacks+=BatchSizeFinder

NEW! tune.py for lr_find and batch size finding

python src/tune.py -c configs/config.yaml

NOTE: no subcommand on the cmdline

Resume

Basically, all logs are stored in logs/${name}/${version}/${job_type}, where ${name} and ${version} are configured in the yaml file or on the cmdline. ${job_type} can be one of fit, test, validate, etc.
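
The layout above can be sketched as a small path helper (a hypothetical illustration, not a function in the repo):

```python
# Build the log directory path logs/${name}/${version}/${job_type}
# described above. Purely illustrative.
def log_path(name: str, version: str, job_type: str, root: str = "logs") -> str:
    return f"{root}/{name}/{version}/{job_type}"

print(log_path("debug-fit-run", "debug-version", "fit"))
# logs/debug-fit-run/debug-version/fit
```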

test stage

python src/main.py test -c configs/config.yaml -n debug-test-run -v debug-version --ckpt_path YOUR_CKPT_PATH

TODO

  • Check pretrained weight loading
  • Consider multiple-optimizer use cases (e.g. GANs)
  • Add instructions in README (ongoing)
  • Clean up code
