Skip to content

Latest commit

 

History

History
389 lines (313 loc) · 13.8 KB

README_bulk.md

File metadata and controls

389 lines (313 loc) · 13.8 KB

MLflow Export Import - Bulk Object Tools

Overview

High-level tools to copy an entire tracking server or a collection of MLflow objects (runs, experiments and registered models). Full object referential integrity is maintained as well as the original MLflow object names.

Three types of bulk tools:

  • All - all MLflow objects of the tracking server.
  • Registered models - models and their versions' run and the run's experiment.
  • Experiments.

Notes:

  • Original source model and experiment names are preserved.
  • Leverages the Single tools as basic building blocks.

Tools

MLflow Object Documentation Code Description
All export-all code Exports all MLflow objects (registered models, experiments and runs) to a directory.
import-all Uses import-models Imports MLflow objects from a directory.
Model export-models code Exports several (or all) registered models and their versions' backing run along with the run's experiment to a directory.
import-models code Imports registered models from a directory.
Experiment export-experiments code Export several (or all) experiments to a directory.
import-experiments code Imports experiments from a directory.

All MLflow Objects Tools

Export all MLflow objects

Exports all MLflow objects of the tracking server (Databricks workspace) - all models, experiments and runs. If you are exporting from Databricks, the notebook can be exported in several different formats.

Source: export_all.py.

Usage

export-all --help

Options:
  --output-dir TEXT               Output directory.  [required]
  --export-latest-versions BOOLEAN
                                  Export latest registered model versions
                                  instead of all versions.  [default: False]
  --stages TEXT                   Stages to export (comma seperated). Default
                                  is all stages and all versions. Stages are
                                  Production, Staging, Archived and None.
                                  Mututally exclusive with option --versions.
  --run-start-time TEXT           Only export runs started after this UTC time
                                  (inclusive). Format: YYYY-MM-DD.
  --export-deleted-runs BOOLEAN   Export deleted runs.  [default: False] 
  --export-version-model BOOLEAN  Export registered model version's 'cached'
                                  MLflow model.  [default: False] 
  --export-permissions BOOLEAN    Export Databricks permissions.  [default:
                                  False]
  --notebook-formats TEXT         Databricks notebook formats. Values are
                                  SOURCE, HTML, JUPYTER or DBC (comma
                                  seperated).
  --use-threads BOOLEAN           Process in parallel using threads.
                                  [default: False]

Example

export-all --output-dir out

Import all MLflow objects

import-all imports all exported MLflow objects. Since the exported output directory is the same structure for both export-all and export-models, this script calls import-models.

Examples

import-all --input-dir out

Registered Models

Copy registered models and transitively all the objects that the model versions depend on: runs and their experiments.

See also Single tools Registered Model Tools.

When exporting a registered models the following model's associated objects are also transitively exported:

  • Versions of a model.
  • The run associated with each version.
  • The experiment that the run belongs to.

Scripts

  • export-models - exports registered models.
  • import-models - imports models.

Export directory

Export directory samples: open source - Databricks.

+-manifest.json
|
+-models/
| +-models.json
| +-Sklearn_WineQuality/
| | +-model.json
| +-Keras_MNIST/
| | +-model.json
|
+-experiments/
| +-experiments.json
| +-1280664374380606/
| | +-experiment.json
| | | . . .

Export registered models

Exports registered models and their versions' backing run along with the run's experiment.

The export-all-runs option is of particular significance. It controls whether all runs of an experiment are exported or only those associated with a registered model version. Obviously there are many runs that are not linked to a registered model version. This can make a substantial difference in export time.

Source: export_models.py.

Usage

export-models --help

Options:
  --models TEXT                   Registered model names (comma delimited)  or
                                  filename ending with '.txt' containing them.
                                  For example, 'model1,model2'. 'all' will
                                  export all models. Or 'models.txt' will
                                  contain a list of model names.  [required]
  --output-dir TEXT               Output directory.  [required]
  --export-latest-versions BOOLEAN
                                  Export latest registered model versions
                                  instead of all versions.  [default: False]
  --export-all-runs BOOLEAN       Export all runs of experiment or just runs
                                  associated with registered model versions.
                                  [default: False]
  --stages TEXT                   Stages to export (comma seperated). Default
                                  is all stages and all versions. Stages are
                                  Production, Staging, Archived and None.
                                  Mututally exclusive with option --versions.
  --export-permissions BOOLEAN    Export Databricks permissions.  [default:
                                  False]
  --export-deleted-runs BOOLEAN   Export deleted runs.  [default: False]
  --export-version-model BOOLEAN  Export registered model version's 'cached'
                                  MLflow model.  [default: False]
  --notebook-formats TEXT         Databricks notebook formats. Values are
                                  SOURCE, HTML, JUPYTER or DBC (comma
                                  seperated).
  --use-threads BOOLEAN           Process in parallel using threads.
                                  [default: False]

Examples

Export all models
export-models --output-dir out
Export specified models
export-models \
  --output-dir out \
  --models sklearn-wine,sklearn-iris
Export models starting with prefix
export-models \
  --output-dir out \
  --models sklearn*
Export models from filename
export-models \
  --output-dir out \
  --models my-models.txt

where my-models.txt is:

sklearn_iris
sklearn_wine

Import registered models

Source: import_models.py.

Usage

import-models --help

Options:
  --input-dir TEXT               Input directory.  [required]
  --delete-model BOOLEAN         If the model exists, first delete the model
                                 and all its versions.  [default: False]
  --import-permissions BOOLEAN   Import Databricks permissions using the HTTP
                                 PATCH method.  [default: False]
  --experiment-rename-file TEXT  File with experiment names replacements:
                                 comma-delimited line such as
                                 'old_name,new_name'.
  --model-rename-file TEXT       File with registered model names
                                 replacements: comma-delimited line such as
                                 'old_name,new_name'.
  --import-source-tags BOOLEAN   Import source information for registered
                                 model and its versions ad tags in destination
                                 object.  [default: False]
  --use-src-user-id BOOLEAN      Set the destination user field to the source
                                 user field.  Only valid for open source
                                 MLflow.  When importing into Databricks, the
                                 source user field is ignored since it is
                                 automatically picked up from your Databricks
                                 access token.  There is no MLflow API
                                 endpoint to explicity set the user_id for Run
                                 and Registered Model.  [default: False]
  --use-threads BOOLEAN          Process in parallel using threads.  [default:
                                 False]

Examples

import-models  --input-dir out

Experiments

Export/import experiments to a directory.

Export Directory

Export directory samples: open source - Databricks.

Export directory

+-experiments.json
| +-5bd3b8a44faf4803989544af5cb4d66e/
| | +-run.json
| | +-artifacts/
| | | +-sklearn-model/
| +-4273c31c45744ec385f3654c63c31360/
| | +-run.json
| | +-artifacts/
| | +- . . .

Export Experiments

Export several (or all) experiments to a directory.

Usage

export-experiments --help

Options:
  --experiments TEXT             Experiment names or IDs (comma delimited).
                                 For example, 'sklearn_wine,sklearn_iris' or
                                 '1,2'. 'all' will export all experiments.
                                 [required]
  --output-dir TEXT              Output directory.  [required]
  --export-permissions BOOLEAN   Export Databricks permissions.  [default:
                                 False]
  --run-start-time TEXT          Only export runs started after this UTC time
                                 (inclusive). Format: YYYY-MM-DD.
  --export-deleted-runs BOOLEAN  Export deleted runs.  [default: False]
  --notebook-formats TEXT        Databricks notebook formats. Values are
                                 SOURCE, HTML, JUPYTER or DBC (comma
                                 seperated).
  --use-threads BOOLEAN          Process in parallel using threads.  [default:
                                 False]

Examples

Export experiments by experiment ID
export-experiments \
  --experiments 1280664374380606,e090757fcb8f49cb \
  --output-dir out
Export experiments by experiment name
export-experiments \
  --experiments /Users/[email protected]/sklearn_iris,/Users/[email protected]/keras_mnist \
  --output-dir out
Export experiments from filename
export-experiments \
  --output-dir out \
  --experiments my-experiments.txt

where my-experiments.txt is:

/Users/[email protected]/sklearn_iris
/Users/[email protected]/keras_mnist
Export all experiments
export-experiments \
  --experiments all --output-dir out
Exporting experiment: {'name': '/Users/[email protected]/sklearn_iris', 'id': '1280664374380606', 'mlflow.experimentType': 'MLFLOW_EXPERIMENT', 'lifecycle_stage': 'active'}
Exporting experiment: {'name': '/Users/[email protected]/keras_mnist', 'id': 'e090757fcb8f49cb', 'mlflow.experimentType': 'NOTEBOOK', 'lifecycle_stage': 'active'}
. . .
249 experiments exported
1770/1770 runs succesfully exported
Duration: 103.6 seonds

Import experiments

Import experiments from a directory. Reads the manifest file to import expirements and their runs.

The experiment will be created if it does not exist in the destination tracking server. If the experiment already exists, the source runs will be added to it.

Usage

import-experiments --help

Options:
  --input-dir TEXT               Input directory.  [required]
  --import-permissions BOOLEAN   Import Databricks permissions using the HTTP
                                 PATCH method.  [default: False]
  --import-source-tags BOOLEAN   Import source information for registered
                                 model and its versions ad tags in destination
                                 object.  [default: False]
  --use-src-user-id BOOLEAN      Set the destination user field to the source
                                 user field.  Only valid for open source
                                 MLflow.  When importing into Databricks, the
                                 source user field is ignored since it is
                                 automatically picked up from your Databricks
                                 access token.  There is no MLflow API
                                 endpoint to explicity set the user_id for Run
                                 and Registered Model.  [default: False]
  --experiment-rename-file TEXT  File with experiment names replacements:
                                 comma-delimited line such as
                                 'old_name,new_name'.
  --use-threads BOOLEAN          Process in parallel using threads.  [default:
                                 False]

Examples

import-experiments \
  --input-dir exported_experiments

Replace /Users/[email protected] with /Users/[email protected] in experiment name.

import-experiments \
  --input-dir exported_experiments \
  --experiment-name-replacements-file experiment-names.csv
cat experiment-names.csv

/Users/[email protected],/Users/[email protected]
/Users/[email protected],/Users/[email protected]