DVC Get Started Project with a focus on `dvc experiment` features.

aberabde/get-started-experiments

DVC Get Started with Experiments

This project showcases the dvc exp commands for managing a large number of experiments. It trains a CNN on the Fashion-MNIST dataset with TensorFlow.

Installation Instructions

After installing DVC and cloning the repository, you can run:

virtualenv .venv
. .venv/bin/activate
pip install -r requirements.txt

Retrieve all the required data and model files:

dvc pull

Running Experiments

You can run the experiment defined in the project.

dvc exp run

This command, new in DVC 2.0, also lets you change parameters on the fly with the --set-param option.

dvc exp run --set-param model.conv_units=128

params.yaml defines two parameters that you can modify with the --set-param/-S option of dvc exp run. The command above sets conv_units: 128 under model in params.yaml before running the experiment.
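To make the dotted parameter path concrete, here is a minimal Python sketch of how a key such as model.conv_units can address a nested value. It is an illustration only, not DVC's implementation: the set_param helper is hypothetical, and the dictionary assumes that params.yaml loads as nested mappings with the values shown in the Parameters section.

```python
import copy


def set_param(params, dotted_key, value):
    """Return a copy of params with the nested key named by dotted_key set to value.

    Hypothetical helper for illustration; it is not part of DVC.
    """
    updated = copy.deepcopy(params)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Walk down one mapping per path segment, creating it if missing.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


# Assumed in-memory form of params.yaml.
params = {"train": {"epochs": 1}, "model": {"conv_units": 16}}
updated = set_param(params, "model.conv_units", 128)
print(updated["model"]["conv_units"])  # prints 128
```

The helper returns a modified copy, so the original dictionary stays unchanged. DVC itself writes the new value into params.yaml before running the experiment.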

The experiment produces metrics.json and models/model.h5.

You can check the changes in metrics:

dvc exp diff
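As a rough picture of what a metrics comparison involves, the sketch below subtracts baseline metric values from experiment values. It assumes that metrics.json is a flat JSON object with loss and acc keys, which this README does not confirm. The load_metrics and metric_changes helpers are hypothetical, and the numbers are placeholders rather than results from this project.

```python
import json


def load_metrics(path):
    """Read a metrics file, assumed to be a flat JSON object of numbers."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def metric_changes(baseline, experiment):
    """Return experiment minus baseline for every metric present in both."""
    return {
        name: experiment[name] - baseline[name]
        for name in baseline
        if name in experiment
    }


# Placeholder values for illustration only, not measured results.
baseline = {"loss": 0.5, "acc": 0.75}
experiment = {"loss": 0.25, "acc": 0.875}
for name, change in metric_changes(baseline, experiment).items():
    print(f"{name}: {change:+.3f}")
```

In practice dvc exp diff does this comparison for you, including parameters, so a helper like this is only useful for custom reporting.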

You can also queue experiments with the --queue option and then run them all in a single batch with --run-all.

dvc exp run --queue -S model.conv_units=32
dvc exp run --queue -S model.conv_units=64
dvc exp run --queue -S model.conv_units=96

The queued experiments can be run in parallel with --jobs.

dvc exp run --run-all --jobs 2
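When the list of values grows, typing each queue command by hand gets tedious. The following Python sketch builds the same commands as above from a list of values and prints them. The function names are hypothetical, and the script defaults to a dry run, so nothing executes unless you call main(dry_run=False) in a repository where DVC is installed.

```python
import shlex
import subprocess


def queue_commands(conv_units_values):
    """Build one `dvc exp run --queue` command per conv_units value."""
    return [
        ["dvc", "exp", "run", "--queue", "-S", f"model.conv_units={units}"]
        for units in conv_units_values
    ]


def run_all_command(jobs):
    """Build the command that runs all queued experiments with `jobs` workers."""
    return ["dvc", "exp", "run", "--run-all", "--jobs", str(jobs)]


def main(dry_run=True):
    commands = queue_commands([32, 64, 96]) + [run_all_command(2)]
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # Stops at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    main()
```

Passing each command as an argument list avoids shell quoting problems if a parameter value ever contains spaces.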

You can get the summary of experiments with:

dvc exp show

Limit the parameters and metrics shown with the --include-params and --include-metrics options, respectively.

By default, experiments get auto-generated names derived from their inputs and environment. They may be easier to review if you name them with the --name/-n option.

dvc exp run -n my-baseline-experiment

Artifacts produced by experiments are normally not checked out to the repository. If you want to do so, you can use:

dvc exp apply exp-123456

where exp-123456 is the experiment ID you see with dvc exp show.

Then, you can use DVC and Git commands on the artifacts and code as usual.

You can push and pull the changes related to an experiment with dvc exp push and dvc exp pull, respectively. These two commands work with Git remotes and DVC remotes together: changes in text files tracked by Git are transferred to and from Git repositories, and binary files tracked by DVC are transferred to and from DVC remotes.

You can clean up the unused experiments with:

dvc exp gc --workspace

Parameters

The project has two parameters, both set in params.yaml. model.conv_units defines the number of convolutional units in the model, and train.epochs sets the number of epochs to train the model.

train:
  epochs: 1
model:
  conv_units: 16

You can select these parameters in dvc exp show with --include-params.

Metrics

There are two metrics produced by the training stage.

  • loss: categorical cross-entropy loss value
  • acc: categorical accuracy over the classes

You can select these metrics in dvc exp show with --include-metrics.

Data Files

The data files used in the project are found in data/images. All of these files are tracked by DVC and should be retrieved using dvc pull from the configured remote.

Contributing

This repository is generated by example-repos-dev. Please submit pull requests with fixes to that repository.
