
Commit

FIX: reflected 2nd review
takumiohym committed Mar 16, 2023
1 parent 82a26ee commit 5332dce
Showing 1 changed file with 25 additions and 18 deletions.
@@ -12,7 +12,7 @@
"1. Learn how to use the hyperparameter tuning engine on Vertex AI to find the best hyperparameters\n",
"1. Learn how to deploy a trained Pytorch model on Vertex AI as a REST API and query it\n",
"\n",
"In this lab, you develop, package as a docker image, and run on **Vertex AI Training** a training application that trains a multi-class classification model that predicts the type of forest cover from cartographic data. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on **Covertype Data Set** from UCI Machine Learning Repository.\n",
"In this lab, you will develop a multi-class classification training application, package it as a Docker image, and run the application on Vertex AI. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on the **Covertype Data Set** from the UCI Machine Learning Repository.\n",
"\n",
"The training code uses `Pytorch` for data pre-processing and modeling. The code has been instrumented using the `hypertune` package so it can be used with **Vertex AI** hyperparameter tuning.\n"
]
@@ -179,7 +179,13 @@
"Soil_Type:STRING,\\\n",
"Cover_Type:INTEGER\n",
"\n",
"bq --location=$DATASET_LOCATION --project_id=$PROJECT_ID mk --dataset $DATASET_ID\n",
"exists=$(bq ls -d | grep -w $DATASET_ID)\n",
"if [ -n \"$exists\" ]; then\n",
" echo \"$DATASET_ID already exists\"\n",
"else\n",
" echo \"Creating $DATASET_ID\"\n",
" bq --location=$DATASET_LOCATION --project_id=$PROJECT_ID mk --dataset $DATASET_ID\n",
"fi\n",
"\n",
"bq --project_id=$PROJECT_ID --dataset_id=$DATASET_ID load \\\n",
"--source_format=CSV \\\n",
@@ -375,7 +381,7 @@
" )\n",
" return torch.tensor(one_hot).float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"dictionary\": self.dictionary.get_itos()}\n",
"\n",
"\n",
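The encoder above keeps a vocabulary and returns a one-hot tensor; `get_itos()` is the torchtext-style vocabulary's index-to-string list, which is exactly the state that needs to be serialized. As a rough stdlib-only sketch of the same pattern (not the notebook's code, and the category names below are illustrative):

```python
# Stdlib-only sketch of the one-hot pattern; the notebook's version
# builds torch tensors from a torchtext vocabulary instead.
def fit_vocab(values):
    # Deterministic index-to-string mapping, like get_itos().
    return sorted(set(values))

def one_hot(value, vocab):
    return [1.0 if v == value else 0.0 for v in vocab]

vocab = fit_vocab(["Rawah", "Neota", "Rawah"])  # illustrative categories
print(vocab)                    # ['Neota', 'Rawah']
print(one_hot("Rawah", vocab))  # [0.0, 1.0]
```

Serializing the vocabulary list is enough to rebuild the encoding at prediction time, since the index of each category is implied by its position.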
@@ -394,7 +400,7 @@
" standardized = (feature - self.mean) / self.std\n",
" return torch.tensor(standardized)[:, None].float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"mean\": self.mean, \"std\": self.std}"
]
},
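Both transformers follow the same contract: capture state when fitting, then expose a `serialize_constants` method returning exactly the constants needed to reproduce the transform at serving time. A minimal stdlib sketch of the standardizer half of that contract (the notebook's version works on torch tensors):

```python
import json
import statistics

class Standardizer:
    """Stdlib-only sketch of the notebook's standardizer pattern."""

    def fit(self, values):
        # Population statistics captured at fit time.
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def serialize_constants(self):
        # Exactly the state needed to redo the transform at serving time.
        return {"mean": self.mean, "std": self.std}

tf = Standardizer().fit([2.0, 4.0, 6.0])
print(json.dumps(tf.serialize_constants()))
```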
@@ -437,7 +443,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Preproc Categorical columns\n",
"# Preprocess categorical columns\n",
"transformers = {\n",
" c_feature: OneHotEncoder().fit(df_train[c_feature])\n",
" for c_feature in CATEGORICAL_FEATURES\n",
@@ -458,8 +464,7 @@
"metadata": {},
"source": [
"### Export preprocessing states file for prediction\n",
"Our training and validation data are transformed successfully.<br>\n",
"Then, let's test the seriarize_constants function, and save the states in a JSON file."
"Once the training and validation data have been transformed successfully, we can use `serialize_constants` to save the states to a JSON file."
]
},
{
@@ -470,7 +475,7 @@
"source": [
"# export json for preprocessing\n",
"preprocessing_json = {\n",
" c: transformers[c].seriarize_constants()\n",
" c: transformers[c].serialize_constants()\n",
" for c in df_train.columns\n",
" if c != LABEL_COLUMN\n",
"}\n",
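At prediction time, the saved JSON is read back and the same arithmetic is reapplied to incoming feature values. A hedged sketch of that round trip (the feature name and constants below are illustrative, not values from the actual dataset):

```python
import json

# Illustrative constants; real values come from the exported
# preprocessing.json file.
saved = {"Elevation": {"mean": 2959.0, "std": 280.0}}
blob = json.loads(json.dumps(saved))  # simulate writing then reading the file

def standardize(value, constants):
    # Same arithmetic as the training-time transformer.
    return (value - constants["mean"]) / constants["std"]

print(standardize(3239.0, blob["Elevation"]))  # (3239 - 2959) / 280 = 1.0
```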
Expand All @@ -493,10 +498,10 @@
"metadata": {},
"source": [
"### Define a model and training/validation step\n",
"We define a simple neural network model, and training and validation steps in Pytorch.\n",
"We will define a simple neural network model with the training and validation steps in PyTorch.\n",
"\n",
"In Pytorch, we can define a neural network model in a class subclassing `torch.nn.Module`.<br>\n",
"there is an `__init__()` method that defines the layers and other components of a model, and a `forward()` method where the computation gets done. <br>\n",
"In PyTorch, we can define a neural network model in a class that subclasses `torch.nn.Module`.<br>\n",
"There is an `__init__()` method that defines the layers and other components of a model, and a `forward()` method where the computation gets done. <br>\n",
"For more detail, refer to [the official document](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).\n",
"\n",
"Also, to make our model executable on either a CPU or a GPU, we define a [`torch.device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device) based on CUDA availability and pass it to `torch.nn.Module.to()`, which takes care of moving the parameters to that device (handling any dtype conversion)."
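A minimal sketch of that pattern follows; the layer sizes are illustrative and are not the lab's actual architecture. Layers live in `__init__`, the computation happens in `forward()`, and a device is chosen from CUDA availability:

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, input_dim=10, num_classes=7):
        super().__init__()
        # Layers and other components are defined in __init__ ...
        self.net = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        # ... and the computation is done in forward().
        return self.net(x)

# Run on GPU when available, CPU otherwise; .to() moves the parameters.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Classifier().to(device)
logits = model(torch.randn(4, 10, device=device))
print(logits.shape)  # a (4, num_classes) batch of logits
```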
@@ -621,7 +626,9 @@
"metadata": {},
"source": [
"### Run training locally.\n",
"Let's test if it runs locally before passing it Cloud training."
"Let's test if it runs locally before passing it to Vertex for training.\n",
"\n",
"This will help us identify and fix any errors in our code before launching a remote training job."
]
},
{
@@ -687,10 +694,10 @@
"metadata": {},
"source": [
"### Write the tuning script. \n",
"In order to run Cloud training, we define a python file that includes all the codes from preprocessing to training. Most of the codes are the same as the one above.<br>\n",
"But we need to add some codes to do hyperparameter tuning, and save trained model and preprocessing states for later use.\n",
"To run training on Vertex AI, we define a Python file that includes all the code from preprocessing to training. <br>\n",
"Most of the code is the same as what we have in the cells above, but we will add a few additional lines to do hyperparameter tuning and to save the trained model and the preprocessing states for later use.\n",
"\n",
"Notice the use of the `hypertune` package to report the `accuracy` optimization metric to Vertex AI hyperparameter tuning service.\n"
"We will use the `hypertune` library to report `accuracy` as the optimization metric."
]
},
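The reporting pattern can be sketched as below. Here `report_metric` is a hypothetical stand-in stub for `hypertune.HyperTune().report_hyperparameter_tuning_metric(...)` so the loop runs without the library, and the predictions and labels are illustrative:

```python
# Stub standing in for the hypertune reporter; it just records calls
# so the per-epoch reporting pattern is runnable here.
reported = []

def report_metric(tag, value, step):
    reported.append({"tag": tag, "value": value, "step": step})

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

for epoch in range(3):
    # Illustrative validation predictions; real ones come from the model.
    val_preds, val_labels = [1, 0, 2, 2], [1, 0, 1, 2]
    report_metric("accuracy", accuracy(val_preds, val_labels), epoch)

print(len(reported), reported[-1]["value"])  # 3 0.75
```

In the real training script, each epoch's call tells the Vertex AI tuning service how this trial is doing, so it can compare trials and pick the best hyperparameters.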
{
@@ -747,7 +754,7 @@
" )\n",
" return torch.tensor(one_hot).float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"dictionary\": self.dictionary.get_itos()}\n",
"\n",
"\n",
@@ -766,7 +773,7 @@
" standardized = (feature - self.mean) / self.std\n",
" return torch.tensor(standardized)[:, None].float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"mean\": self.mean, \"std\": self.std}\n",
"\n",
"def preprocess(df, transformers):\n",
@@ -877,7 +884,7 @@
" gcs_model_path = \"{}/{}\".format(job_dir, model_filename)\n",
"\n",
" # export json for preprocessing\n",
" preprocesinng_json = {c: transformers[c].seriarize_constants() \n",
" preprocesinng_json = {c: transformers[c].serialize_constants() \n",
" for c in df_train.columns if c != LABEL_COLUMN}\n",
" preproc_json_filename = 'preprocessing.json'\n",
" with open(preproc_json_filename, 'w') as f:\n",
