Update README (grok-ai#42)
* Add dynamic badges for tests, docs and nn-core version

* Remove unnecessary information in the README

* Update README

* Update structure in README

* Update checks badges

Co-authored-by: Valentino Maiorca <[email protected]>
lucmos and Flegyas committed Feb 3, 2022
1 parent cd5167b commit 76714de
Showing 1 changed file with 96 additions and 166 deletions.
262 changes: 96 additions & 166 deletions README.md
@@ -1,197 +1,127 @@
# NN Template

<p align="center">
<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/-PyTorch-red?logo=pytorch&labelColor=gray"></a>
<a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/code-Lightning-blueviolet"></a>
<a href="https://hydra.cc/"><img alt="Conf: hydra" src="https://img.shields.io/badge/conf-hydra-blue"></a>
<a href="https://wandb.ai/site"><img alt="Logging: wandb" src="https://img.shields.io/badge/logging-wandb-yellow"></a>
<a href="https://dvc.org/"><img alt="Data: dvc" src="https://img.shields.io/badge/data-dvc-9cf"></a>
<a href="https://streamlit.io/"><img alt="UI: streamlit" src="https://img.shields.io/badge/ui-streamlit-orange"></a>
<a href="https://github.com/grok-ai/nn-template/actions/workflows/test_suite.yml"><img alt="CI" src="https://img.shields.io/github/workflow/status/grok-ai/nn-template/Test%20Suite/main?label=main%20checks"></a>
<a href="https://github.com/grok-ai/nn-template/actions/workflows/test_suite.yml"><img alt="CI" src="https://img.shields.io/github/workflow/status/grok-ai/nn-template/Test%20Suite/develop?label=develop%20checks"></a>
<a href="https://grok-ai.github.io/nn-template"><img alt="Docs" src="https://img.shields.io/github/workflow/status/grok-ai/nn-template/pages%20build%20and%20deployment/gh-pages?label=docs"></a>
<a href="https://pypi.org/project/nn-template-core/"><img alt="Release" src="https://img.shields.io/pypi/v/nn-template-core?label=nn-core"></a>
<a href="https://black.readthedocs.io/en/stable/"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
</p>

[comment]: <> (<p align="center">)

Generic template to bootstrap your [PyTorch](https://pytorch.org/get-started/locally/) project. Click on [![](https://img.shields.io/badge/-Use_this_template-success?style=flat)](https://github.com/lucmos/nn-template/generate) and avoid writing boilerplate code for:
[comment]: <> ( <a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/-PyTorch-red?logo=pytorch&labelColor=gray"></a>)

- [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning), lightweight PyTorch wrapper for high-performance AI research.
- [Hydra](https://github.com/facebookresearch/hydra), a framework for elegantly configuring complex applications.
- [DVC](https://dvc.org/doc/start/data-versioning), track large files, directories, or ML models. Think "Git for data".
- [Weights and Biases](https://wandb.ai/home), organize and analyze machine learning experiments. *(educational account available)*
- [Streamlit](https://streamlit.io/), turns data scripts into shareable web apps in minutes.

*`nn-template`* is opinionated so you don't have to be.
If you use this template, please add
[![](https://shields.io/badge/-nn--template-emerald?style=flat&logo=github&labelColor=gray)](https://github.com/lucmos/nn-template)
to your `README`.


### Usage Examples

Check out the [`mwe` branch](https://github.com/lucmos/nn-template/tree/mwe) to view a minimum working example on MNIST.

# Structure

```bash
.
├── .cache
├── conf # hydra compositional config
│   ├── nn
│   ├── default.yaml # current experiment configuration
│   ├── hydra
│   └── train
├── data # datasets
├── .env # system-specific env variables, e.g. PROJECT_ROOT
├── requirements.txt # basic requirements
├── src
│   ├── common # common modules and utilities
│   ├── data # PyTorch Lightning datamodules and datasets
│   ├── modules # PyTorch Lightning modules
│   ├── run.py # entry point to run current conf
│   └── ui # interactive streamlit apps
└── wandb # local experiments (auto-generated)
```

# Streamlit
[Streamlit](https://docs.streamlit.io/) is an open-source Python library that makes
it easy to create and share beautiful, custom web apps for machine learning and data science.

In just a few minutes, you can build and deploy powerful data apps to:

- **Explore** your data
- **Interact** with your model
- **Analyze** your model behavior and input sensitivity
- **Showcase** your prototype with [awesome web apps](https://streamlit.io/gallery)

Moreover, Streamlit enables interactive development by automatically rerunning the app when files change.

Launch a minimal app with `PYTHONPATH=. streamlit run src/ui/run.py`. There is a built-in function to restore a model checkpoint stored on W&B, which downloads the checkpoint automatically if it is not present on the local machine:

![](https://i.imgur.com/3lTnOA1.png)



# Data Version Control

DVC runs alongside `git` and uses the current commit hash to version control the data.

Initialize the `dvc` repository:

```bash
$ dvc init
```

To start tracking a file or directory, use `dvc add`:

```bash
$ dvc add data/ImageNet
```

DVC stores information about the added file (or a directory) in a special `.dvc` file named `data/ImageNet.dvc`, a small text file with a human-readable format.
This file can be easily versioned like source code with Git, as a placeholder for the original data (which gets listed in `.gitignore`):

```bash
git add data/ImageNet.dvc data/.gitignore
git commit -m "Add raw data"
```

## Making changes

When you make a change to a file or directory, run `dvc add` again to track the latest version:
[comment]: <> ( <a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/code-Lightning-blueviolet"></a>)

```bash
$ dvc add data/ImageNet
```

## Switching between versions

The regular workflow is to use `git checkout` first to switch branches, check out a commit, or check out a revision of a `.dvc` file, and then run `dvc checkout` to sync the data:

```bash
$ git checkout <...>
$ dvc checkout
```

---

Read more in the [docs](https://dvc.org/doc/start/data-versioning)!


# Weights and Biases
[comment]: <> ( <a href="https://hydra.cc/"><img alt="Conf: hydra" src="https://img.shields.io/badge/conf-hydra-blue"></a>)

Weights & Biases helps you keep track of your machine learning projects. Use tools to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
[comment]: <> ( <a href="https://wandb.ai/site"><img alt="Logging: wandb" src="https://img.shields.io/badge/logging-wandb-yellow"></a>)

[This](https://wandb.ai/gladia/nn-template?workspace=user-lucmos) is an example of a simple dashboard.
[comment]: <> ( <a href="https://dvc.org/"><img alt="Conf: hydra" src="https://img.shields.io/badge/data-dvc-9cf"></a>)

## Quickstart
[comment]: <> ( <a href="https://streamlit.io/"><img alt="UI: streamlit" src="https://img.shields.io/badge/ui-streamlit-orange"></a>)

Log in to your `wandb` account by running `wandb login` once.
Configure the logging in `conf/logging/*`.
[comment]: <> (</p>)


---


Read more in the [docs](https://docs.wandb.ai/). Particularly useful is the [`log` method](https://docs.wandb.ai/library/log), accessible from inside a PyTorch Lightning module with `self.logger.experiment.log`.

> W&B is our logger of choice, but that is a purely subjective decision. Since we are using Lightning, you can replace
`wandb` with the logger you prefer (you can even build your own).
More about Lightning loggers [here](https://pytorch-lightning.readthedocs.io/en/latest/extensions/logging.html).

# Hydra

Hydra is an open-source Python framework that simplifies the development of research and other complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line. The name Hydra comes from its ability to run multiple similar jobs - much like a Hydra with multiple heads.

The basic functionality is intuitive: it is enough to change the configuration files in `conf/*` according to your preferences. Everything will be logged in `wandb` automatically.

Consider creating new root configurations `conf/myawesomeexp.yaml` instead of always using the default `conf/default.yaml`.
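
The idea of overriding a hierarchical configuration with a dotted key can be illustrated with a minimal, standard-library sketch. This is not how Hydra is implemented: `apply_override` and the `base` values are hypothetical names chosen for illustration, values are kept as plain strings, and unknown keys are simply rejected.

```python
import copy


def apply_override(cfg, override):
    """Return a copy of cfg with one `dotted.key=value` override applied.

    Hypothetical helper for illustration only; it is not part of Hydra.
    """
    key, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    out = copy.deepcopy(cfg)  # leave the original configuration untouched
    node = out
    *parents, leaf = key.split(".")
    for name in parents:
        node = node[name]  # raises KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(key)
    node[leaf] = raw  # kept as a string in this sketch
    return out


# Assumed example configuration, not taken from the template.
base = {"optim": {"optimizer": {"lr": 0.001, "weight_decay": 0}}}
cfg = apply_override(base, "optim.optimizer.lr=0.02")
print(cfg["optim"]["optimizer"]["lr"])
```

The original dictionary is left unchanged, so several overridden variants can be derived from the same base configuration.
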
<p align="center">
<i>
nn-template is opinionated so you don't have to be
</i>
</p>


## Sweeps
Generic cookiecutter template to bootstrap your [PyTorch](https://pytorch.org/get-started/locally/) project,
read more in the [documentation](https://lucmos.github.io/nn-template).

You can easily perform hyperparameter [sweeps](https://hydra.cc/docs/advanced/override_grammar/extended), which override the configuration defined in `/conf/*`.
## Get started

The simplest one is the grid search. It executes the code with every possible combination of the specified hyperparameters:
Generate your project with cookiecutter:

```bash
PYTHONPATH=. python src/run.py -m optim.optimizer.lr=0.02,0.002,0.0002 optim.lr_scheduler.T_mult=1,2 optim.optimizer.weight_decay=0,1e-5
cookiecutter https://github.com/lucmos/nn-template
```
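
To make the size of such a grid concrete, the sketch below enumerates the Cartesian product of the values passed to the `-m` multirun command above. It only illustrates which combinations a grid search covers; it is not Hydra's sweeper, and `grid` is a hypothetical helper name.

```python
from itertools import product

# Values copied from the multirun command above.
sweep = {
    "optim.optimizer.lr": [0.02, 0.002, 0.0002],
    "optim.lr_scheduler.T_mult": [1, 2],
    "optim.optimizer.weight_decay": [0, 1e-5],
}


def grid(sweep):
    """Yield one {key: value} dict per point of the Cartesian product."""
    keys = list(sweep)
    for values in product(*(sweep[k] for k in keys)):
        yield dict(zip(keys, values))


jobs = list(grid(sweep))
print(len(jobs))  # 3 * 2 * 2 = 12 combinations
print(jobs[0])
```

Each added hyperparameter multiplies the number of runs, so grids grow quickly.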

You can explore aggregate statistics or compare and analyze each run in the W&B dashboard.
> This is a *parametrized* template that uses [cookiecutter](https://github.com/cookiecutter/cookiecutter).
> Install cookiecutter with:
>
> ```pip install cookiecutter```
---

We recommend going through at least the [Basic Tutorial](https://hydra.cc/docs/tutorials/basic/your_first_app/simple_cli) and the docs about [Instantiating objects with Hydra](https://hydra.cc/docs/patterns/instantiate_objects/overview).
## Integrations

Avoid writing boilerplate code to integrate:

# PyTorch Lightning

Lightning makes coding complex networks simple.
It is not a high-level framework like `keras`, but it enforces neat code organization and encapsulation.

You should be somewhat familiar with PyTorch and [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/index.html) before using this template.

# Environment Variables

System-specific variables (e.g. absolute paths to datasets) should not be under version control, otherwise there will be conflicts between different users.

The best way to handle system-specific variables is through environment variables.
- [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning), lightweight PyTorch wrapper for high-performance AI research.
- [Hydra](https://github.com/facebookresearch/hydra), a framework for elegantly configuring complex applications.
- [Weights and Biases](https://wandb.ai/home), organize and analyze machine learning experiments. *(educational account available)*
- [Streamlit](https://streamlit.io/), turns data scripts into shareable web apps in minutes.
- [MkDocs](https://www.mkdocs.org/) and [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/), a fast, simple and downright gorgeous static site generator.
- [DVC](https://dvc.org/doc/start/data-versioning), track large files, directories, or ML models. Think "Git for data".
- [GitHub Actions](https://github.com/features/actions), to run the tests, publish the documentation, and publish to PyPI automatically.
- Python best practices for developing and publishing research projects.

You can define new environment variables in a `.env` file in the project root. A copy of this file (e.g. `.env.template`) can be under version control to ease new project configurations.
## Structure

To define a new variable write inside `.env`:
The generated projects will contain the following files:

```bash
export MY_VAR=/home/user/my_system_path
```
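
As a rough illustration of what loading such a file involves, the following standard-library sketch parses `export KEY=value` lines and copies them into an environment mapping. It is an assumption-laden stand-in, not the loader the template actually uses: `parse_env_file` and `load_env` are hypothetical names, and only simple unquoted or quoted values are handled.

```python
import os


def parse_env_file(text):
    """Parse `export KEY=value` or `KEY=value` lines into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip().strip("'\"")
    return env


def load_env(text, environ=os.environ):
    """Copy parsed variables into environ, keeping values already set."""
    for key, value in parse_env_file(text).items():
        environ.setdefault(key, value)


example = "# system-specific paths\nexport MY_VAR=/home/user/my_system_path\n"
env = {}  # a plain dict stands in for os.environ in this example
load_env(example, environ=env)
print(env["MY_VAR"])  # /home/user/my_system_path
```

Using `setdefault` means a variable already exported in the shell takes precedence over the file, which is one possible design choice rather than a documented behaviour of the template.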

You can dynamically resolve the variable name from Python code with:

```python
get_env("MY_VAR")
```

and in the Hydra `.yaml` configuration files with:

```yaml
${oc.env:MY_VAR}
.
├── conf
│   ├── default.yaml
│   ├── hydra
│   │   └── default.yaml
│   ├── nn
│   │   └── default.yaml
│   └── train
│       └── default.yaml
├── data
│   └── .gitignore
├── docs
│   ├── index.md
│   └── overrides
│       └── main.html
├── .editorconfig
├── .env
├── .env.template
├── env.yaml
├── .flake8
├── .github
│   └── workflows
│       ├── publish.yml
│       └── test_suite.yml
├── .gitignore
├── LICENSE
├── mkdocs.yml
├── .pre-commit-config.yaml
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── src
│   └── awesome_project
│       ├── data
│       │   ├── datamodule.py
│       │   ├── dataset.py
│       │   └── __init__.py
│       ├── __init__.py
│       ├── modules
│       │   ├── __init__.py
│       │   └── module.py
│       ├── pl_modules
│       │   ├── __init__.py
│       │   └── pl_module.py
│       ├── run.py
│       └── ui
│           ├── __init__.py
│           └── run.py
└── tests
    ├── conftest.py
    ├── __init__.py
    ├── test_checkpoint.py
    ├── test_configuration.py
    ├── test_nn_core_integration.py
    ├── test_resume.py
    ├── test_storage.py
    └── test_training.py
```
