DVC is an open-source tool for data science projects. It helps data scientists manage their code and data together through simple, Git-like commands.
Step | Command |
---|---|
Track code and data together | `$ git add train.py`<br>`$ dvc add images.zip` |
Connect code and data by commands | `$ dvc run -d images.zip -o images/ unzip -q images.zip`<br>`$ dvc run -d images/ -d train.py -o model.p python train.py` |
Make changes and reproduce | `$ vi train.py`<br>`$ dvc repro` |
Share code | `$ git add .`<br>`$ git commit -m 'The baseline model'`<br>`$ git push` |
Share data and ML models | `$ dvc config AWS.StoragePath mybucket/image_cnn`<br>`$ dvc push` |
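
The "Make changes and reproduce" step rests on a simple idea: a command is re-run only when one of its declared dependencies has changed. The sketch below illustrates that idea in plain shell. It is not DVC's implementation; the `run_if_changed` helper and the stamp file are hypothetical names introduced here for illustration, and it tracks a single dependency file only.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): re-run a command only when its
# dependency file has changed since the last successful run.
# Usage: run_if_changed <dependency> <stamp-file> <command> [args...]
run_if_changed() {
    dep=$1
    stamp=$2
    shift 2

    # cksum is POSIX; reading from stdin keeps the file name out of the output.
    current=$(cksum < "$dep")

    # If the recorded checksum matches, there is nothing to do.
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
        echo "up to date: $dep"
        return 0
    fi

    # The dependency is new or changed: run the command, and record the
    # checksum only if the command succeeded.
    "$@" && printf '%s\n' "$current" > "$stamp"
}

# Example (assumes train.py exists in the current directory):
#   run_if_changed train.py .train.stamp python train.py
```

Calling the helper twice in a row with an unchanged `train.py` runs the training command once and reports "up to date" the second time. DVC itself tracks several dependencies and outputs per command, as the `-d` and `-o` flags in the table show.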
OS-specific packages are the recommended way to install DVC. The latest packages are available on the GitHub releases page: https://github.com/dataversioncontrol/dvc/releases
DVC can also be installed from the Python Package Index (PyPI) using pip:

`pip install dvc`
Website: https://dataversioncontrol.com
Tutorial: https://blog.dataversioncontrol.com/data-version-control-tutorial-9146715eda46?gi=a3e49be7976c
Documentation: http://dataversioncontrol.com/docs/
Discussion: https://discuss.dataversioncontrol.com/
This project is distributed under the Apache license version 2.0 (see the LICENSE file in the project root).
By submitting a pull request for this project, you agree to license your contribution under the Apache license version 2.0 to this project.