How to use DVC pipelines and vscode notebook extensions to run Data Science experiments in Python

polecat-dev/talk-dvc-vscode

dvc_vscode

How to use DVC pipelines and vscode notebook extensions to run Data Science experiments in Python

Introduction

This repo was created for a talk given at PyData Bristol in October 2022 and at Data Bristol in March 2023.

It contains the materials for a simple DVC pipeline and shows how to monitor results with the VSCode notebook extension by checking what changed between runs.

Lecture from PyData Bristol

Prerequisites

  1. Python 3.11.
  2. Poetry.
  3. Graphviz.
  4. VSCode.
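
Before installing, it can help to confirm that the prerequisites above are actually available. The following is a minimal sketch, not part of the repo: the function name `missing_prerequisites` is hypothetical, and the executable names (`poetry`, `dot` for Graphviz, `code` for the VSCode command-line launcher) are assumptions about how these tools are exposed on your PATH.

```python
import shutil
import sys

# Assumed executable names: "dot" ships with Graphviz, "code" is the VSCode CLI launcher.
REQUIRED_TOOLS = {"Poetry": "poetry", "Graphviz": "dot", "VSCode": "code"}


def missing_prerequisites(tools=REQUIRED_TOOLS, which=shutil.which, version=None):
    """Return the names of prerequisites that could not be found."""
    # Compare only major.minor, since the README asks for Python 3.11.
    version = tuple(version or sys.version_info[:2])
    missing = []
    if version != (3, 11):
        missing.append("Python 3.11")
    # shutil.which returns None when an executable is not on PATH.
    for name, executable in tools.items():
        if which(executable) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found." if not problems else f"Missing: {', '.join(problems)}")
```

If VSCode is installed without the `code` launcher on PATH, the sketch reports it as missing even though the editor itself works.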

Installation (on Linux)

git clone https://github.com/polecat-dev/talk-dvc-vscode.git

cd talk-dvc-vscode

poetry shell

pip install --upgrade pip

poetry install  # installs the dependencies

python scripts/init_files.py

To start

To reproduce the pipeline:

dvc repro

To visualise results:

Go to the results notebook.

Select your kernel (the Poetry virtualenv).

Run all cells.

Save the notebook changes.

Open the Source Control view in VSCode to check how the results changed.

How to visualise the pipeline

You can explore the pipeline in dvc.yaml. To generate the graph visualization shown below, run:

poetry run dvc dag --dot | dot -Tpng > dag.png

dag-image
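
The same rendering step can be driven from Python, for example inside a notebook cell. This is a minimal sketch that mirrors the shell pipeline above with `subprocess`; the function name `render_dag` and its `run` parameter are hypothetical and exist only so the commands can be swapped out for illustration. It assumes `poetry` and Graphviz's `dot` are on your PATH and that it is called from the repository root.

```python
import subprocess
from pathlib import Path


def render_dag(output_path="dag.png", run=subprocess.run):
    """Render the DVC pipeline graph to a PNG, mirroring `dvc dag --dot | dot -Tpng`."""
    # Step 1: ask DVC for the pipeline graph in Graphviz DOT format.
    dot_source = run(
        ["poetry", "run", "dvc", "dag", "--dot"],
        check=True,
        capture_output=True,
    ).stdout
    # Step 2: feed the DOT text to Graphviz and capture the PNG bytes.
    png_bytes = run(
        ["dot", "-Tpng"],
        input=dot_source,
        check=True,
        capture_output=True,
    ).stdout
    # Step 3: write the image, as the shell redirection `> dag.png` would.
    target = Path(output_path)
    target.write_bytes(png_bytes)
    return target
```

Calling `render_dag()` writes `dag.png` to the current directory; `check=True` raises `subprocess.CalledProcessError` if either command fails, so a broken pipeline definition does not leave a silently empty image.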

Credits

Thanks to PyData Bristol & Data Bristol for the talk opportunity.

Thanks to Polecat for the time and information sharing.

Thanks to spaCy for the materials.

Thanks to DVC.

Maintainer - Alon Samuel.
