Commit 8fa9053

Adjust Examples to New Pipeline Interface (zenml-io#1552)
* Allow passing any step function input as parameter or artifact

* Simplify and niceify parsing

* Raise ZenML error message instead of type error

* Simplify getting unwrapped signature

* Some improvements

* Correctly handle BaseParameters in StepRunner

* Formatting

* Improve pipeline step verification

* Preparations for new pipeline interface

* Helper function for unique step names

* Allow passing name to pipeline init

* New pipelines WIP

* Add some todos

* Hacky implementation of some todos

* A little cleanup

* More cleanup

* Update new pipeline with dev changes

* Fix some stuff

* Some additional verification

* WIP step invocation

* Some cleanup

* Allow partial step calls

* Add method to load v0.3 pipelines

* Refactoring

* Update BaseParameter handling

* Remove inputs from config

* Allow configuration of step in __init__

* Improve readability

* POC implementation of Optional/Union/Any step inputs and outputs

* POC implementation of external artifacts

* POC implementation pipeline composition

* Store pipeline outputs

* Prevent BaseStep.after(...) when calling steps multiple times

* Implement new way to call pipeline

* More type validation

* Remove breakpoint

* Support pydantic custom root type

* Exclude unmaterialized artifacts from type checking

* Refactor some things

* Refactor

* Cleanup

* Cleanup step validation

* More type validation

* Remove some validation to make it more flexible

* Re-add parameter value validation

* Parameter tests

* Allow configuring multiple materializers for union outputs

* Post merge fixes

* Fix materializer import

* Misc changes

* Try to remove hack

* Refactor upload

* Hopefully make pipeline backwards compatible

* Some docstrings

* More docstrings

* Mypy

* Public pipeline and step API (zenml-io#1532)

* Adjust all imports to use zenml.step and zenml.pipeline

* Update src/zenml/steps/entrypoint_function_utils.py

Co-authored-by: Felix Altenberger <[email protected]>

* Update src/zenml/new/pipelines/pipeline.py

Co-authored-by: Felix Altenberger <[email protected]>

* Apply some review suggestions

* Move build_utils and deserialization_utils to zenml.new.pipelines

* Adjust airflow example

* Fix new pipeline and step decorators

* Adjust add_your_own example

* Fix most docstrings

* Delete old pipeline files

* Adjust Kubernetes example

* Update hook loading

* Adjust Kubeflow example (untested)

* Adjust Kubeflow example (untested)

* Remove pipeline inputs/outputs from config

* Adjust Vertex example

* Adjust Sagemaker example

* Refactor into function to avoid duplicate code

* (Failing) Adjust XGB example

* Fix XGB example by removing params

* Adjust scipy example

* Improve materializer selection

* Adjust pytorch example

* Fix base params bug

* Adjust neural prophet example

* (Untested) Adjust LGBM example

* Adjust facets example

* Fix error when using class based api

* Adjust wandb example

* Adjust whylogs integration steps and example

* Adjust deepchecks integration steps and example

* (Failing) Adjust Evidently example

* Fix step parameter configuration

* Fix some tests

* Fix merge error

* Fix more unit tests

* Fix post_execution.get_pipeline() for new pipeline classes

* Add missing comment

* Add docstring

* Adjust great expectations integration steps and example

* (Untested) Adjust Tekton example

* More unit test fixes

* (Untested) Adjust MLflow tracking example

* Adjust mlflow deployer steps

* Adjust Slack example

* Replace some step name occurrences with invocation id

* Adjust MLflow deployment example

* Adjust MLflow registry step and example

* Fix pydantic name error for reserved step arguments

* (Untested) Adjust Neptune example

* Adjust quickstart

* Adjust step operator example

* update mlflow removing run()

* Implement new versioning logic

* Fix last unit tests

* Docstrings

* Functional integration tests

* Adjust Seldon deployer integration steps

* Remove useless 'condition' arg of GE steps

* Implement missing property

* Revert "Adjust Seldon deployer integration steps"

This reverts commit 1109ee2.

* update facets WIP

* Fix formatting

* Improve step/pipeline entrypoint validation

* Revert unfinished examples / integration steps

* update whylogs notebook

* update GE but post-execution still broken

* Add missing annotations to func

* Add deprecation warnings

* Expose parameters on step context

* Darglint

* Adjust old examples to integration step changes

* Revert import changes in tests

* Fix whylogs integration test

* update the pipeline step name responsible for deployment

* delete great_expectations notebook (drifted)

* Fix quickstart_py37 notebook

* Update examples/facets_visualize_statistics/run.py

* Update examples/kserve_deployment/steps/tensorflow_steps/tf_trainer.py

* Update examples/deepchecks_data_validation/run.py

* Update examples/vertex_ai_orchestration/run.py

* Update examples/step_operator_remote_training/run.py

* Update examples/scipy/run.py

* Update examples/lightgbm/run.py

* fix deepchecks notebook

* update facets notebook

* Adjust quickstart notebook

* Fix quickstart script

* Fix GE integration test

* Fix quickstart integration test

* update evidently notebook

* deprecate kubeflow notebook

* Update hook readme

* formatting fixes

* Fix formatting

* Cleanup quickstart notebook

* Cleanup quickstart notebook more

* add exception

* fix typos

* fix deployer step name for mlflow deployment test

* formatting fixes

* final typo fix

---------

Co-authored-by: Michael Schuster <[email protected]>
Co-authored-by: Michael Schuster <[email protected]>
Co-authored-by: Alex Strick van Linschoten <[email protected]>
Co-authored-by: Safoine El Khabich <[email protected]>
Co-authored-by: Alex Strick van Linschoten <[email protected]>
6 people authored May 25, 2023
1 parent ea4a84b commit 8fa9053
Showing 215 changed files with 1,457 additions and 3,147 deletions.
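
Before the file-by-file diff, a minimal sketch of the pipeline interface these commits converge on, distilled from the example changes below (all names are illustrative):

```python
from zenml import pipeline, step


@step
def importer() -> int:
    # Step inputs may now be passed as parameters or as upstream artifacts.
    return 42


@step
def trainer(data: int) -> int:
    return data + 1


@pipeline
def my_pipeline():
    # Steps are called directly inside the pipeline function ...
    trainer(importer())


if __name__ == "__main__":
    # ... and the pipeline runs when called, replacing the old
    # pipeline_instance = my_pipeline(...); pipeline_instance.run() pattern.
    my_pipeline()
```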
1 change: 1 addition & 0 deletions .typos.toml
@@ -22,6 +22,7 @@ prepending = "prepending"
 prev = "prev"
 creat = "creat"
 ret = "ret"
+daa = "daa"
 
 [default]
 locale = "en-us"
@@ -600,7 +600,7 @@ The following output shows three different Service Connectors configured from th
 * a multi-type GCP Service Connector that allows access to every possible resource accessible with the configured credentials
 * a multi-instance GCS Service Connector that allows access to multiple GCS buckets
-* a single-instance GCS Serice Connector that only permits access to one GCS bucket
+* a single-instance GCS Service Connector that only permits access to one GCS bucket
 ```
 $ zenml service-connector list
@@ -8,7 +8,7 @@ If you have used ZenML before, you must be familiar with the flow of registering
 like this:
 
 ```
-zenml artifact-store regsiter my_store --flavor=s3 --path=s3://my_bucket
+zenml artifact-store register my_store --flavor=s3 --path=s3://my_bucket
 ```
 
 Commands like these assume that you already have the stack component deployed. In this case, it would mean that you must
@@ -83,7 +83,7 @@ from zenml.config import DockerSettings
 
 ZenML determines the root directory of your source files in the following order:
 
-* If you've intialized zenml (`zenml init`), the repository root directory will be used.
+* If you've initialized zenml (`zenml init`), the repository root directory will be used.
 * Otherwise, the parent directory of the python file you're executing will be the source root. For example, running `python /path/to/file.py`, the source root would be `/path/to`.
 
 You can specify how these files are handled using the `source_files` attribute on the `DockerSettings`:
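
The doc's `source_files` code block is truncated by the diff viewer; as a hedged sketch of such a configuration — the `"download"` mode string is an assumption about this ZenML version, so verify it against the `DockerSettings` reference:

```python
from zenml import pipeline
from zenml.config import DockerSettings

# Assumption: "download" is one of the accepted source-handling modes;
# check the DockerSettings docs for the exact values.
docker_settings = DockerSettings(source_files="download")


@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    ...
```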
4 changes: 2 additions & 2 deletions docs/book/user-guide/advanced-guide/manage-environments.md
@@ -2,7 +2,7 @@
 description: Navigating multiple development environments.
 ---
 
-# Manage multiple environments through which code propogates
+# Manage multiple environments through which code propagates
 
 {% hint style="warning" %}
 **Note:** This page is a work in progress (WIP) and is currently under development. If you have any questions or need
@@ -63,4 +63,4 @@ offers [image builders](../component-guide/image-builders/), a
 special [stack component](../starter-guide/understand-stacks.md), allowing users to build and push docker images in a different specialized *image builder environment*.
 
 Note that even if you don't configure an image builder in your stack, ZenML still uses
-the [local image builder](../component-guide/image-builders/local.md) to retain consistency across all builds. In this case, the image builder environment is the same as the client environment.
+the [local image builder](../component-guide/image-builders/local.md) to retain consistency across all builds. In this case, the image builder environment is the same as the client environment.
@@ -83,7 +83,7 @@ The sorting of **versions** on a `PipelineView` is from **newest** to **oldest**
 
 #### Getting runs from a fetched pipeline version
 
-Each pipeline version can be executed many times. You can get a list of all runs using the `runs` attribute of a `PiplineVersionView`:
+Each pipeline version can be executed many times. You can get a list of all runs using the `runs` attribute of a `PipelineVersionView`:
 
 ```python
 print(latest_version.runs)
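
For context, one hedged way `latest_version` might be fetched post-execution — `get_pipeline` is ZenML's post-execution entry point, while the `versions` attribute and its newest-first ordering are inferred from the surrounding docs:

```python
from zenml.post_execution import get_pipeline

# "my_pipeline" is a placeholder; index 0 assumes the newest-first
# version ordering described above.
pipeline_view = get_pipeline("my_pipeline")
latest_version = pipeline_view.versions[0]
print(latest_version.runs)
```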
7 changes: 7 additions & 0 deletions examples/add_your_own/README.md
@@ -30,6 +30,7 @@ from typing import Type
 
 from zenml import step, pipeline
 
+from zenml import step, pipeline
 from zenml.enums import ArtifactType
 from zenml.io import fileio
 from zenml.materializers.base_materializer import BaseMaterializer
@@ -63,6 +64,12 @@ def my_first_step() -> MyObj:
 
 my_first_step = my_first_step.configure(output_materializers=MyMaterializer)
+
+# (Optional) tell ZenML to use your custom materializer for this step.
+# This is usually not needed since ZenML automatically discovers materializers
+# and determines which one to use based on the data type of your output
+step1.configure(output_materializers=MyMaterializer)
+
 
 @step
 def my_second_step(my_obj: MyObj) -> None:
     """Step that log the input object and returns nothing."""
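
For readers following along, a self-contained sketch of the custom materializer this README builds up to — the class bodies are illustrative reconstructions under the imports shown above, not the exact example code:

```python
import os
from typing import Type

from zenml import step
from zenml.enums import ArtifactType
from zenml.io import fileio
from zenml.materializers.base_materializer import BaseMaterializer


class MyObj:
    def __init__(self, name: str) -> None:
        self.name = name


class MyMaterializer(BaseMaterializer):
    ASSOCIATED_TYPES = (MyObj,)
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def load(self, data_type: Type[MyObj]) -> MyObj:
        # Read the object back from the artifact store.
        with fileio.open(os.path.join(self.uri, "data.txt"), "r") as f:
            return MyObj(name=f.read())

    def save(self, my_obj: MyObj) -> None:
        # Persist the object to the artifact store.
        with fileio.open(os.path.join(self.uri, "data.txt"), "w") as f:
            f.write(my_obj.name)


@step
def my_first_step() -> MyObj:
    return MyObj(name="my_object")


my_first_step.configure(output_materializers=MyMaterializer)
```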
@@ -12,22 +12,21 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
+from steps.evaluators import evaluator
+from steps.importers import importer_mnist
+from steps.trainers import trainer
+
+from zenml import pipeline
 from zenml.config import DockerSettings
 from zenml.integrations.constants import PYTORCH
-from zenml.pipelines import pipeline
 
 docker_settings = DockerSettings(
     required_integrations=[PYTORCH], requirements=["torchvision"]
 )
 
 
 @pipeline(settings={"docker": docker_settings})
-def fashion_mnist_pipeline(
-    importer,
-    trainer,
-    evaluator,
-):
-    """Link all the steps and artifacts together."""
-    train_dataloader, test_dataloader = importer()
+def fashion_mnist_pipeline():
+    train_dataloader, test_dataloader = importer_mnist()
     model = trainer(train_dataloader)
     evaluator(test_dataloader=test_dataloader, model=model)
17 changes: 3 additions & 14 deletions examples/airflow_orchestration/run.py
@@ -13,34 +13,23 @@
 # permissions and limitations under the License.
 
 from pipelines.fashion_mnist_pipeline import fashion_mnist_pipeline
-from steps.evaluators import evaluator
-from steps.importers import importer_mnist
-from steps.trainers import trainer
 
 if __name__ == "__main__":
-    pipeline_instance = fashion_mnist_pipeline(
-        importer=importer_mnist(),
-        trainer=trainer(),
-        evaluator=evaluator(),
-    )
-
-    pipeline_instance.run()
+    fashion_mnist_pipeline()
 
 # In case you want to run this on a schedule uncomment the following lines.
 # Note that Airflow schedules need to be set in the past:
 
 # from datetime import datetime, timedelta
 
-# from zenml.integrations.airflow.flavors.airflow_orchestrator_flavor import (
-#     AirflowOrchestratorSettings,
-# )
 # from zenml.pipelines import Schedule
 
-# pipeline_instance.run(
+# scheduled_pipeline = fashion_mnist_pipeline.with_options(
 #     schedule=Schedule(
 #         start_time=datetime.now() - timedelta(hours=1),
 #         end_time=datetime.now() + timedelta(hours=1),
 #         interval_second=timedelta(minutes=15),
 #         catchup=False,
 #     )
 # )
+# scheduled_pipeline()
2 changes: 1 addition & 1 deletion examples/airflow_orchestration/steps/evaluators.py
@@ -16,7 +16,7 @@
 from torch import nn
 from torch.utils.data import DataLoader
 
-from zenml.steps import step
+from zenml import step
 
 # Get cpu or gpu device for training.
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
3 changes: 2 additions & 1 deletion examples/airflow_orchestration/steps/importers.py
@@ -16,7 +16,8 @@
 from torchvision import datasets
 from torchvision.transforms import ToTensor
 
-from zenml.steps import Output, step
+from zenml import step
+from zenml.steps import Output
 
 
 @step
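
As a quick aside, the `Output` import that moves here is the legacy annotation for multiple named step outputs; a minimal sketch (the step body is illustrative):

```python
from zenml import step
from zenml.steps import Output


@step
def importer() -> Output(train_size=int, test_size=int):
    # Output(...) names each element of the returned tuple so that
    # downstream steps can consume them as separate artifacts.
    return 60000, 10000
```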
2 changes: 1 addition & 1 deletion examples/airflow_orchestration/steps/trainers.py
@@ -16,7 +16,7 @@
 from torch import nn
 from torch.utils.data import DataLoader
 
-from zenml.steps import step
+from zenml import step
 
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
 
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
+
 from zenml.steps import BaseParameters, step
@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step

@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step

@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step

@@ -12,7 +12,6 @@
 # or implied. See the License for the specific language governing
 # permissions and limitations under the License.
 
-
 from zenml.steps import BaseParameters, step
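
Several commits above touch `BaseParameters` handling; for orientation, a minimal sketch of the legacy parameter pattern these example files still import (class and values are illustrative):

```python
from zenml.steps import BaseParameters, step


class TrainerParams(BaseParameters):
    """Pydantic-backed parameter class from the legacy step API."""

    epochs: int = 10
    learning_rate: float = 1e-3


@step
def trainer(params: TrainerParams) -> float:
    # Parameter values are tracked by ZenML and can be overridden per run.
    return params.learning_rate * params.epochs
```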
74 changes: 23 additions & 51 deletions examples/deepchecks_data_validation/deepchecks.ipynb
@@ -75,8 +75,8 @@
 "source": [
 "# Install the ZenML CLI tool, Deepchecks and scikit-learn\n",
 "\n",
-"!pip install zenml \n",
-"!zenml integration install deepchecks sklearn -f"
+"!pip install zenml\n",
+"!zenml integration install deepchecks sklearn -y"
 ]
 },
 {

@@ -177,10 +177,10 @@
 "from sklearn.base import ClassifierMixin\n",
 "from sklearn.ensemble import RandomForestClassifier\n",
 "\n",
+"from zenml import pipeline, step\n",
 "from zenml.integrations.constants import DEEPCHECKS, SKLEARN\n",
 "from zenml.logger import get_logger\n",
-"from zenml.pipelines import pipeline\n",
-"from zenml.steps import Output, step"
+"from zenml.steps import Output"
 ]
 },
 {

@@ -272,16 +272,14 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksDataIntegrityCheckStepParameters,\n",
 "    deepchecks_data_integrity_check_step,\n",
 ")\n",
 "\n",
-"data_validator = deepchecks_data_integrity_check_step(\n",
-"    step_name=\"data_validator\",\n",
-"    params=DeepchecksDataIntegrityCheckStepParameters(\n",
+"data_validator = deepchecks_data_integrity_check_step.with_options(\n",
+"    parameters=dict(\n",
 "        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
 "    ),\n",
-")\n"
+")"
 ]
 },
 {

@@ -298,15 +296,11 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksDataDriftCheckStepParameters,\n",
 "    deepchecks_data_drift_check_step,\n",
 ")\n",
 "\n",
-"data_drift_detector = deepchecks_data_drift_check_step(\n",
-"    step_name=\"data_drift_detector\",\n",
-"    params=DeepchecksDataDriftCheckStepParameters(\n",
-"        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
-"    ),\n",
+"data_drift_detector = deepchecks_data_drift_check_step.with_options(\n",
+"    parameters=dict(dataset_kwargs=dict(label=LABEL_COL, cat_features=[]))\n",
 ")"
 ]
 },

@@ -324,13 +318,11 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksModelValidationCheckStepParameters,\n",
 "    deepchecks_model_validation_check_step,\n",
 ")\n",
 "\n",
-"model_validator = deepchecks_model_validation_check_step(\n",
-"    step_name=\"model_validator\",\n",
-"    params=DeepchecksModelValidationCheckStepParameters(\n",
+"model_validator = deepchecks_model_validation_check_step.with_options(\n",
+"    parameters=dict(\n",
 "        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
 "    ),\n",
 ")"

@@ -350,16 +342,14 @@
 "outputs": [],
 "source": [
 "from zenml.integrations.deepchecks.steps import (\n",
-"    DeepchecksModelDriftCheckStepParameters,\n",
 "    deepchecks_model_drift_check_step,\n",
 ")\n",
 "\n",
-"model_drift_detector = deepchecks_model_drift_check_step(\n",
-"    step_name=\"model_drift_detector\",\n",
-"    params=DeepchecksModelDriftCheckStepParameters(\n",
+"model_drift_detector = deepchecks_model_drift_check_step.with_options(\n",
+"    parameters=dict(\n",
 "        dataset_kwargs=dict(label=LABEL_COL, cat_features=[]),\n",
 "    ),\n",
-")\n"
+")"
 ]
 },
 {

@@ -392,14 +382,7 @@
 "docker_settings = DockerSettings(required_integrations=[DEEPCHECKS, SKLEARN])\n",
 "\n",
 "@pipeline(enable_cache=False, settings={\"docker\": docker_settings})\n",
-"def data_validation_pipeline(\n",
-"    data_loader,\n",
-"    trainer,\n",
-"    data_validator,\n",
-"    model_validator,\n",
-"    data_drift_detector,\n",
-"    model_drift_detector,\n",
-"):\n",
+"def data_validation_pipeline():\n",
 "    \"\"\"Links all the steps together in a pipeline\"\"\"\n",
 "    df_train, df_test = data_loader()\n",
 "    data_validator(dataset=df_train)\n",

@@ -444,19 +427,10 @@
 },
 "outputs": [],
 "source": [
-"pipeline_instance = data_validation_pipeline(\n",
-"    data_loader=data_loader(),\n",
-"    trainer=trainer(),\n",
-"    data_validator=data_validator,\n",
-"    model_validator=model_validator,\n",
-"    data_drift_detector=data_drift_detector,\n",
-"    model_drift_detector=model_drift_detector,\n",
-")\n",
-"pipeline_instance.run()"
+"data_validation_pipeline()"
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [

@@ -476,13 +450,11 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"pipeline_instance.run()\n",
-"\n",
-"last_run = pipeline_instance.get_runs()[0]\n",
-"data_val_step = last_run.get_step(step=data_validator)\n",
-"model_val_step = last_run.get_step(step=model_validator)\n",
-"data_drift_step = last_run.get_step(step=data_drift_detector)\n",
-"model_drift_step = last_run.get_step(step=model_drift_detector)"
+"last_run = data_validation_pipeline.get_runs()[0]\n",
+"data_val_step = last_run.get_step(step=\"deepchecks_data_drift_check_step\")\n",
+"model_val_step = last_run.get_step(step=\"deepchecks_data_integrity_check_step\")\n",
+"data_drift_step = last_run.get_step(step=\"deepchecks_model_drift_check_step\")\n",
+"model_drift_step = last_run.get_step(step=\"deepchecks_model_validation_check_step\")"
 ]
 },
 {

@@ -550,7 +522,7 @@
 "provenance": []
 },
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },

@@ -564,7 +536,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.2"
+"version": "3.9.13"
 },
 "vscode": {
 "interpreter": {
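
To recap the pattern threaded through this notebook diff: steps are configured with `.with_options(parameters=...)` and pipelines run by being called. A hedged, toy-sized sketch (not the Deepchecks steps themselves):

```python
from zenml import pipeline, step


@step
def greeter(name: str = "world") -> str:
    return f"Hello, {name}!"


# .with_options(parameters=...) returns a configured copy of the step,
# mirroring how the notebook configures each Deepchecks check above.
custom_greeter = greeter.with_options(parameters={"name": "ZenML"})


@pipeline(enable_cache=False)
def greeting_pipeline():
    custom_greeter()


if __name__ == "__main__":
    greeting_pipeline()
```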