Commit 8fa9053

Adjust Examples to New Pipeline Interface (zenml-io#1552)
* Allow passing any step function input as parameter or artifact

* Simplify and niceify parsing

* Raise ZenML error message instead of type error

* Simplify getting unwrapped signature

* Some improvements

* Correctly handle BaseParameters in StepRunner

* Formatting

* Improve pipeline step verification

* Preparations for new pipeline interface

* Helper function for unique step names

* Allow passing name to pipeline init

* New pipelines WIP

* Add some todos

* Hacky implementation of some todos

* A little cleanup

* More cleanup

* Update new pipeline with dev changes

* Fix some stuff

* Some additional verification

* WIP step invocation

* Some cleanup

* Allow partial step calls

* Add method to load v0.3 pipelines

* Refactoring

* Update BaseParameter handling

* Remove inputs from config

* Allow configuration of step in __init__

* Improve readability

* POC implementation of Optional/Union/Any step inputs and outputs

* POC implementation of external artifacts

* POC implementation pipeline composition

* Store pipeline outputs

* Prevent BaseStep.after(...) when calling steps multiple times

* Implement new way to call pipeline

* More type validation

* Remove breakpoint

* Support pydantic custom root type

* Exclude unmaterialized artifacts from type checking

* Refactor some things

* Refactor

* Cleanup

* Cleanup step validation

* More type validation

* Remove some validation to make it more flexible

* Re-add parameter value validation

* Parameter tests

* Allow configuring multiple materializers for union outputs

* Post merge fixes

* Fix materializer import

* Misc changes

* Try to remove hack

* Refactor upload

* Hopefully make pipeline backwards compatible

* Some docstrings

* More docstrings

* Mypy

* Public pipeline and step API (zenml-io#1532)

* Adjust all imports to use zenml.step and zenml.pipeline

* Update src/zenml/steps/entrypoint_function_utils.py

Co-authored-by: Felix Altenberger <[email protected]>

* Update src/zenml/new/pipelines/pipeline.py

Co-authored-by: Felix Altenberger <[email protected]>

* Apply some review suggestions

* Move build_utils and deserialization_utils to zenml.new.pipelines

* Adjust airflow example

* Fix new pipeline and step decorators

* Adjust add_your_own example

* Fix most docstrings

* Delete old pipeline files

* Adjust Kubernetes example

* Update hook loading

* Adjust Kubeflow example (untested)

* Adjust Kubeflow example (untested)

* Remove pipeline inputs/outputs from config

* Adjust Vertex example

* Adjust Sagemaker example

* Refactor into function to avoid duplicate code

* (Failing) Adjust XGB example

* Fix XGB example by removing params

* Adjust scipy example

* Improve materializer selection

* Adjust pytorch example

* Fix base params bug

* Adjust neural prophet example

* (Untested) Adjust LGBM example

* Adjust facets example

* Fix error when using class based api

* Adjust wandb example

* Adjust whylogs integration steps and example

* Adjust deepchecks integration steps and example

* (Failing) Adjust Evidently example

* Fix step parameter configuration

* Fix some tests

* Fix merge error

* Fix more unit tests

* Fix post_execution.get_pipeline() for new pipeline classes

* Add missing comment

* Add docstring

* Adjust great expectations integration steps and example

* (Untested) Adjust Tekton example

* More unit test fixes

* (Untested) Adjust MLflow tracking example

* Adjust mlflow deployer steps

* Adjust Slack example

* Replace some step name occurrences with invocation id

* Adjust MLflow deployment example

* Adjust MLflow registry step and example

* Fix pydantic name error for reserved step arguments

* (Untested) Adjust Neptune example

* Adjust quickstart

* Adjust step operator example

* update mlflow removing run()

* Implement new versioning logic

* Fix last unit tests

* Docstrings

* Functional integration tests

* Adjust Seldon deployer integration steps

* Remove useless 'condition' arg of GE steps

* Implement missing property

* Revert "Adjust Seldon deployer integration steps"

This reverts commit 1109ee2.

* update facets WIP

* Fix formatting

* Improve step/pipeline entrypoint validation

* Revert unfinished examples / integration steps

* update whylogs notebook

* update GE but post-execution still broken

* Add missing annotations to func

* Add deprecation warnings

* Expose parameters on step context

* Darglint

* Adjust old examples to integration step changes

* Revert import changes in tests

* Fix whylogs integration test

* update the pipeline step name responsible for deployment

* delete great_expectations notebook (drifted)

* Fix quickstart_py37 notebook

* Update examples/facets_visualize_statistics/run.py

* Update examples/kserve_deployment/steps/tensorflow_steps/tf_trainer.py

* Update examples/deepchecks_data_validation/run.py

* Update examples/vertex_ai_orchestration/run.py

* Update examples/step_operator_remote_training/run.py

* Update examples/scipy/run.py

* Update examples/lightgbm/run.py

* fix deepchecks notebook

* update facets notebook

* Adjust quickstart notebook

* Fix quickstart script

* Fix GE integration test

* Fix quickstart integration test

* update evidently notebook

* deprecate kubeflow notebook

* Update hook readme

* formatting fixes

* Fix formatting

* Cleanup quickstart notebook

* Cleanup quickstart notebook more

* add exception

* fix typos

* fix deployer step name for mlflow deployment test

* formatting fixes

* final typo fix

---------

Co-authored-by: Michael Schuster <[email protected]>
Co-authored-by: Michael Schuster <[email protected]>
Co-authored-by: Alex Strick van Linschoten <[email protected]>
Co-authored-by: Safoine El Khabich <[email protected]>
Co-authored-by: Alex Strick van Linschoten <[email protected]>
6 people authored May 25, 2023
1 parent ea4a84b commit 8fa9053
Showing 215 changed files with 1,457 additions and 3,147 deletions.
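
Before the file-by-file diff, a minimal sketch of the pipeline interface these commits converge on, distilled from the example changes below (all names are illustrative):

```python
from zenml import pipeline, step


@step
def importer() -> int:
    # Step inputs may now be passed as parameters or as upstream artifacts.
    return 42


@step
def trainer(data: int) -> int:
    return data + 1


@pipeline
def my_pipeline():
    # Steps are called directly inside the pipeline function ...
    trainer(importer())


if __name__ == "__main__":
    # ... and the pipeline runs when called, replacing the old
    # pipeline_instance = my_pipeline(...); pipeline_instance.run() pattern.
    my_pipeline()
```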
1 change: 1 addition & 0 deletions .typos.toml
@@ -22,6 +22,7 @@ prepending = "prepending"
 prev = "prev"
 creat = "creat"
 ret = "ret"
+daa = "daa"
 
 [default]
 locale = "en-us"
@@ -600,7 +600,7 @@ The following output shows three different Service Connectors configured from th
 * a multi-type GCP Service Connector that allows access to every possible resource accessible with the configured credentials
 * a multi-instance GCS Service Connector that allows access to multiple GCS buckets
-* a single-instance GCS Serice Connector that only permits access to one GCS bucket
+* a single-instance GCS Service Connector that only permits access to one GCS bucket
 ```
 $ zenml service-connector list
@@ -8,7 +8,7 @@ If you have used ZenML before, you must be familiar with the flow of registering
 like this:
 
 ```
-zenml artifact-store regsiter my_store --flavor=s3 --path=s3://my_bucket
+zenml artifact-store register my_store --flavor=s3 --path=s3://my_bucket
 ```
 
 Commands like these assume that you already have the stack component deployed. In this case, it would mean that you must
@@ -83,7 +83,7 @@ from zenml.config import DockerSettings
 
 ZenML determines the root directory of your source files in the following order:
 
-* If you've intialized zenml (`zenml init`), the repository root directory will be used.
+* If you've initialized zenml (`zenml init`), the repository root directory will be used.
 * Otherwise, the parent directory of the python file you're executing will be the source root. For example, running `python /path/to/file.py`, the source root would be `/path/to`.
 
 You can specify how these files are handled using the `source_files` attribute on the `DockerSettings`:
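
The doc's `source_files` code block is truncated by the diff viewer; as a hedged sketch of such a configuration — the `"download"` mode string is an assumption about this ZenML version, so verify it against the `DockerSettings` reference:

```python
from zenml import pipeline
from zenml.config import DockerSettings

# Assumption: "download" is one of the accepted source-handling modes;
# check the DockerSettings docs for the exact values.
docker_settings = DockerSettings(source_files="download")


@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    ...
```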
4 changes: 2 additions & 2 deletions docs/book/user-guide/advanced-guide/manage-environments.md
@@ -2,7 +2,7 @@
 description: Navigating multiple development environments.
 ---
 
-# Manage multiple environments through which code propogates
+# Manage multiple environments through which code propagates
 
 {% hint style="warning" %}
 **Note:** This page is a work in progress (WIP) and is currently under development. If you have any questions or need
@@ -63,4 +63,4 @@ offers [image builders](../component-guide/image-builders/), a
 special [stack component](../starter-guide/understand-stacks.md), allowing users to build and push docker images in a different specialized *image builder environment*.
 
 Note that even if you don't configure an image builder in your stack, ZenML still uses
-the [local image builder](../component-guide/image-builders/local.md) to retain consistency across all builds. In this case, the image builder environment is the same as the client environment.
+the [local image builder](../component-guide/image-builders/local.md) to retain consistency across all builds. In this case, the image builder environment is the same as the client environment.
@@ -83,7 +83,7 @@ The sorting of **versions** on a `PipelineView` is from **newest** to **oldest**
 
 #### Getting runs from a fetched pipeline version
 
-Each pipeline version can be executed many times. You can get a list of all runs using the `runs` attribute of a `PiplineVersionView`:
+Each pipeline version can be executed many times. You can get a list of all runs using the `runs` attribute of a `PipelineVersionView`:
 
 ```python
 print(latest_version.runs)
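
For context, one hedged way `latest_version` might be fetched post-execution — `get_pipeline` is ZenML's post-execution entry point, while the `versions` attribute and its newest-first ordering are inferred from the surrounding docs:

```python
from zenml.post_execution import get_pipeline

# "my_pipeline" is a placeholder; index 0 assumes the newest-first
# version ordering described above.
pipeline_view = get_pipeline("my_pipeline")
latest_version = pipeline_view.versions[0]
print(latest_version.runs)
```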
7 changes: 7 additions & 0 deletions examples/add_your_own/README.md
@@ -30,6 +30,7 @@ from typing import Type
 
 from zenml import step, pipeline
 
+from zenml import step, pipeline
 from zenml.enums import ArtifactType
 from zenml.io import fileio
 from zenml.materializers.base_materializer import BaseMaterializer
@@ -63,6 +64,12 @@ def my_first_step() -> MyObj:
 
 my_first_step = my_first_step.configure(output_materializers=MyMaterializer)
+
+# (Optional) tell ZenML to use your custom materializer for this step.
+# This is usually not needed since ZenML automatically discovers materializers
+# and determines which one to use based on the data type of your output
+step1.configure(output_materializers=MyMaterializer)
+
 
 @step
 def my_second_step(my_obj: MyObj) -> None:
     """Step that log the input object and returns nothing."""
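
For readers following along, a self-contained sketch of the custom materializer this README builds up to — the class bodies are illustrative reconstructions under the imports shown above, not the exact example code:

```python
import os
from typing import Type

from zenml import step
from zenml.enums import ArtifactType
from zenml.io import fileio
from zenml.materializers.base_materializer import BaseMaterializer


class MyObj:
    def __init__(self, name: str) -> None:
        self.name = name


class MyMaterializer(BaseMaterializer):
    ASSOCIATED_TYPES = (MyObj,)
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def load(self, data_type: Type[MyObj]) -> MyObj:
        # Read the object back from the artifact store.
        with fileio.open(os.path.join(self.uri, "data.txt"), "r") as f:
            return MyObj(name=f.read())

    def save(self, my_obj: MyObj) -> None:
        # Persist the object to the artifact store.
        with fileio.open(os.path.join(self.uri, "data.txt"), "w") as f:
            f.write(my_obj.name)


@step
def my_first_step() -> MyObj:
    return MyObj(name="my_object")


my_first_step.configure(output_materializers=MyMaterializer)
```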
@@ -12,22 +12,21 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
+from steps.evaluators import evaluator
+from steps.importers import importer_mnist
+from steps.trainers import trainer
+
+from zenml import pipeline
 from zenml.config import DockerSettings
 from zenml.integrations.constants import PYTORCH
-from zenml.pipelines import pipeline
 
 docker_settings = DockerSettings(
     required_integrations=[PYTORCH], requirements=["torchvision"]
 )
 
 
 @pipeline(settings={"docker": docker_settings})
-def fashion_mnist_pipeline(
-    importer,
-    trainer,
-    evaluator,
-):
-    """Link all the steps and artifacts together."""
-    train_dataloader, test_dataloader = importer()
+def fashion_mnist_pipeline():
+    train_dataloader, test_dataloader = importer_mnist()
     model = trainer(train_dataloader)
     evaluator(test_dataloader=test_dataloader, model=model)
17 changes: 3 additions & 14 deletions examples/airflow_orchestration/run.py
@@ -13,34 +13,23 @@
 # permissions and limitations under the License.
 
 from pipelines.fashion_mnist_pipeline import fashion_mnist_pipeline
-from steps.evaluators import evaluator
-from steps.importers import importer_mnist
-from steps.trainers import trainer
 
 if __name__ == "__main__":
-    pipeline_instance = fashion_mnist_pipeline(
-        importer=importer_mnist(),
-        trainer=trainer(),
-        evaluator=evaluator(),
-    )
-
-    pipeline_instance.run()
+    fashion_mnist_pipeline()
 
 # In case you want to run this on a schedule uncomment the following lines.
 # Note that Airflow schedules need to be set in the past:
 
 # from datetime import datetime, timedelta
 
-# from zenml.integrations.airflow.flavors.airflow_orchestrator_flavor import (
-#     AirflowOrchestratorSettings,
-# )
 # from zenml.pipelines import Schedule
 
-# pipeline_instance.run(
+# scheduled_pipeline = fashion_mnist_pipeline.with_options(
 #     schedule=Schedule(
 #         start_time=datetime.now() - timedelta(hours=1),
 #         end_time=datetime.now() + timedelta(hours=1),
 #         interval_second=timedelta(minutes=15),
 #         catchup=False,
 #     )
 # )
+# scheduled_pipeline()
2 changes: 1 addition & 1 deletion examples/airflow_orchestration/steps/evaluators.py
@@ -16,7 +16,7 @@
 from torch import nn
 from torch.utils.data import DataLoader
 
-from zenml.steps import step
+from zenml import step
 
 # Get cpu or gpu device for training.
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
3 changes: 2 additions & 1 deletion examples/airflow_orchestration/steps/importers.py
@@ -16,7 +16,8 @@
 from torchvision import datasets
 from torchvision.transforms import ToTensor
 
-from zenml.steps import Output, step
+from zenml import step
+from zenml.steps import Output
 
 
 @step
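
As a quick aside, the `Output` import that moves here is the legacy annotation for multiple named step outputs; a minimal sketch (the step body is illustrative):

```python
from zenml import step
from zenml.steps import Output


@step
def importer() -> Output(train_size=int, test_size=int):
    # Output(...) names each element of the returned tuple so that
    # downstream steps can consume them as separate artifacts.
    return 60000, 10000
```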
2 changes: 1 addition & 1 deletion examples/airflow_orchestration/steps/trainers.py
@@ -16,7 +16,7 @@
 from torch import nn
 from torch.utils.data import DataLoader
 
-from zenml.steps import step
+from zenml import step
 
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
 
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
+
 from zenml.steps import BaseParameters, step
@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step

@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step

@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step

@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step
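
Several commits above touch `BaseParameters` handling; for orientation, a minimal sketch of the legacy parameter pattern these example files still import (class and values are illustrative):

```python
from zenml.steps import BaseParameters, step


class TrainerParams(BaseParameters):
    """Pydantic-backed parameter class from the legacy step API."""

    epochs: int = 10
    learning_rate: float = 1e-3


@step
def trainer(params: TrainerParams) -> float:
    # Parameter values are tracked by ZenML and can be overridden per run.
    return params.learning_rate * params.epochs
```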
74 changes: 23 additions & 51 deletions examples/deepchecks_data_validation/deepchecks.ipynb
@@ -75,8 +75,8 @@
 "source": [
 "# Install the ZenML CLI tool, Deepchecks and scikit-learn\n",
 "\n",
-"!pip install zenml \n",
-"!zenml integration install deepchecks sklearn -f"
+"!pip install zenml\n",
+"!zenml integration install deepchecks sklearn -y"
 ]
 },
 {

@@ -177,10 +177,10 @@
 "from sklearn.base import ClassifierMixin\n",
 "from sklearn.ensemble import RandomForestClassifier\n",
 "\n",
+"from zenml import pipeline, step\n",
 "from zenml.integrations.constants import DEEPCHECKS, SKLEARN\n",
 "from zenml.logger import get_logger\n",
-"from zenml.pipelines import pipeline\n",
-"from zenml.steps import Output, step"
+"from zenml.steps import Output"
 ]
 },
 {

@@ -272,16 +272,14 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksDataIntegrityCheckStepParameters,\n",
 "    deepchecks_data_integrity_check_step,\n",
 ")\n",
 "\n",
-"data_validator = deepchecks_data_integrity_check_step(\n",
-"    step_name=\"data_validator\",\n",
-"    params=DeepchecksDataIntegrityCheckStepParameters(\n",
+"data_validator = deepchecks_data_integrity_check_step.with_options(\n",
+"    parameters=dict(\n",
 "        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
 "    ),\n",
-")\n"
+")"
 ]
 },
 {

@@ -298,15 +296,11 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksDataDriftCheckStepParameters,\n",
 "    deepchecks_data_drift_check_step,\n",
 ")\n",
 "\n",
-"data_drift_detector = deepchecks_data_drift_check_step(\n",
-"    step_name=\"data_drift_detector\",\n",
-"    params=DeepchecksDataDriftCheckStepParameters(\n",
-"        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
-"    ),\n",
+"data_drift_detector = deepchecks_data_drift_check_step.with_options(\n",
+"    parameters=dict(dataset_kwargs=dict(label=LABEL_COL, cat_features=[]))\n",
 ")"
 ]
 },

@@ -324,13 +318,11 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksModelValidationCheckStepParameters,\n",
 "    deepchecks_model_validation_check_step,\n",
 ")\n",
 "\n",
-"model_validator = deepchecks_model_validation_check_step(\n",
-"    step_name=\"model_validator\",\n",
-"    params=DeepchecksModelValidationCheckStepParameters(\n",
+"model_validator = deepchecks_model_validation_check_step.with_options(\n",
+"    parameters=dict(\n",
 "        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
 "    ),\n",
 ")"

@@ -350,16 +342,14 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksModelDriftCheckStepParameters,\n",
 "    deepchecks_model_drift_check_step,\n",
 ")\n",
 "\n",
-"model_drift_detector = deepchecks_model_drift_check_step(\n",
-"    step_name=\"model_drift_detector\",\n",
-"    params=DeepchecksModelDriftCheckStepParameters(\n",
+"model_drift_detector = deepchecks_model_drift_check_step.with_options(\n",
+"    parameters=dict(\n",
 "        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
 "    ),\n",
-")\n"
+")"
 ]
 },
 {

@@ -392,14 +382,7 @@
 "docker_settings = DockerSettings(required_integrations=[DEEPCHECKS, SKLEARN])\n",
 "\n",
 "@pipeline(enable_cache=False, settings={\"docker\": docker_settings})\n",
-"def data_validation_pipeline(\n",
-"    data_loader,\n",
-"    trainer,\n",
-"    data_validator,\n",
-"    model_validator,\n",
-"    data_drift_detector,\n",
-"    model_drift_detector,\n",
-"):\n",
+"def data_validation_pipeline():\n",
 "    \"\"\"Links all the steps together in a pipeline\"\"\"\n",
 "    df_train, df_test = data_loader()\n",
 "    data_validator(dataset=df_train)\n",

@@ -444,19 +427,10 @@
 },
 "outputs": [],
 "source": [
-"pipeline_instance = data_validation_pipeline(\n",
-"    data_loader=data_loader(),\n",
-"    trainer=trainer(),\n",
-"    data_validator=data_validator,\n",
-"    model_validator=model_validator,\n",
-"    data_drift_detector=data_drift_detector,\n",
-"    model_drift_detector=model_drift_detector,\n",
-")\n",
-"pipeline_instance.run()"
+"data_validation_pipeline()"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [

@@ -476,13 +450,11 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"pipeline_instance.run()\n",
-"\n",
-"last_run = pipeline_instance.get_runs()[0]\n",
-"data_val_step = last_run.get_step(step=data_validator)\n",
-"model_val_step = last_run.get_step(step=model_validator)\n",
-"data_drift_step = last_run.get_step(step=data_drift_detector)\n",
-"model_drift_step = last_run.get_step(step=model_drift_detector)"
+"last_run = data_validation_pipeline.get_runs()[0]\n",
+"data_val_step = last_run.get_step(step=\"deepchecks_data_drift_check_step\")\n",
+"model_val_step = last_run.get_step(step=\"deepchecks_data_integrity_check_step\")\n",
+"data_drift_step = last_run.get_step(step=\"deepchecks_model_drift_check_step\")\n",
+"model_drift_step = last_run.get_step(step=\"deepchecks_model_validation_check_step\")"
 ]
 },
 {

@@ -550,7 +522,7 @@
 "provenance": []
 },
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },

@@ -564,7 +536,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.2"
+"version": "3.9.13"
 },
 "vscode": {
 "interpreter": {
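
To recap the pattern threaded through this notebook diff: steps are configured with `.with_options(parameters=...)` and pipelines run by being called. A hedged, toy-sized sketch (not the Deepchecks steps themselves):

```python
from zenml import pipeline, step


@step
def greeter(name: str = "world") -> str:
    return f"Hello, {name}!"


# .with_options(parameters=...) returns a configured copy of the step,
# mirroring how the notebook configures each Deepchecks check above.
custom_greeter = greeter.with_options(parameters={"name": "ZenML"})


@pipeline(enable_cache=False)
def greeting_pipeline():
    custom_greeter()


if __name__ == "__main__":
    greeting_pipeline()
```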