
Commit

FIX: reflected 2nd review
takumiohym committed Mar 16, 2023
1 parent 82a26ee commit 5332dce
Showing 1 changed file with 25 additions and 18 deletions.
@@ -12,7 +12,7 @@
"1. Learn how to use the hyperparameter tuning engine on Vertex AI to find the best hyperparameters\n",
"1. Learn how to deploy a trained Pytorch model on Vertex AI as a REST API and query it\n",
"\n",
"In this lab, you develop, package as a docker image, and run on **Vertex AI Training** a training application that trains a multi-class classification model that predicts the type of forest cover from cartographic data. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on **Covertype Data Set** from UCI Machine Learning Repository.\n",
"In this lab, you will develop a multi-class classification training application, package it as a Docker image, and run the application on Vertex AI. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on the **Covertype Data Set** from the UCI Machine Learning Repository.\n",
"\n",
"The training code uses `Pytorch` for data pre-processing and modeling. The code has been instrumented using the `hypertune` package so it can be used with **Vertex AI** hyperparameter tuning.\n"
]
@@ -179,7 +179,13 @@
"Soil_Type:STRING,\\\n",
"Cover_Type:INTEGER\n",
"\n",
"bq --location=$DATASET_LOCATION --project_id=$PROJECT_ID mk --dataset $DATASET_ID\n",
"exists=$(bq ls -d | grep -w $DATASET_ID)\n",
"if [ -n \"$exists\" ]; then\n",
" echo \"$DATASET_ID already exists\"\n",
"else\n",
" echo \"Creating $DATASET_ID\"\n",
" bq --location=$DATASET_LOCATION --project_id=$PROJECT_ID mk --dataset $DATASET_ID\n",
"fi\n",
"\n",
"bq --project_id=$PROJECT_ID --dataset_id=$DATASET_ID load \\\n",
"--source_format=CSV \\\n",
@@ -375,7 +381,7 @@
" )\n",
" return torch.tensor(one_hot).float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"dictionary\": self.dictionary.get_itos()}\n",
"\n",
"\n",
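The encoder above keeps a vocabulary and returns a one-hot tensor; `get_itos()` is the torchtext-style vocabulary's index-to-string list, which is exactly the state that needs to be serialized. As a rough stdlib-only sketch of the same pattern (not the notebook's code, and the category names below are illustrative):

```python
# Stdlib-only sketch of the one-hot pattern; the notebook's version
# builds torch tensors from a torchtext vocabulary instead.
def fit_vocab(values):
    # Deterministic index-to-string mapping, like get_itos().
    return sorted(set(values))

def one_hot(value, vocab):
    return [1.0 if v == value else 0.0 for v in vocab]

vocab = fit_vocab(["Rawah", "Neota", "Rawah"])  # illustrative categories
print(vocab)                    # ['Neota', 'Rawah']
print(one_hot("Rawah", vocab))  # [0.0, 1.0]
```

Serializing the vocabulary list is enough to rebuild the encoding at prediction time, since the index of each category is implied by its position.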
@@ -394,7 +400,7 @@
" standardized = (feature - self.mean) / self.std\n",
" return torch.tensor(standardized)[:, None].float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"mean\": self.mean, \"std\": self.std}"
]
},
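Both transformers follow the same contract: capture state when fitting, then expose a `serialize_constants` method returning exactly the constants needed to reproduce the transform at serving time. A minimal stdlib sketch of the standardizer half of that contract (the notebook's version works on torch tensors):

```python
import json
import statistics

class Standardizer:
    """Stdlib-only sketch of the notebook's standardizer pattern."""

    def fit(self, values):
        # Population statistics captured at fit time.
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def serialize_constants(self):
        # Exactly the state needed to redo the transform at serving time.
        return {"mean": self.mean, "std": self.std}

tf = Standardizer().fit([2.0, 4.0, 6.0])
print(json.dumps(tf.serialize_constants()))
```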
@@ -437,7 +443,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Preproc Categorical columns\n",
"# Preprocess categorical columns\n",
"transformers = {\n",
" c_feature: OneHotEncoder().fit(df_train[c_feature])\n",
" for c_feature in CATEGORICAL_FEATURES\n",
@@ -458,8 +464,7 @@
"metadata": {},
"source": [
"### Export preprocessing states file for prediction\n",
"Our training and validation data are transformed successfully.<br>\n",
"Then, let's test the seriarize_constants function, and save the states in a JSON file."
"Once the training and validation data have been transformed successfully, we can use `serialize_constants` to save the states to a JSON file."
]
},
{
@@ -470,7 +475,7 @@
"source": [
"# export json for preprocessing\n",
"preprocessing_json = {\n",
" c: transformers[c].seriarize_constants()\n",
" c: transformers[c].serialize_constants()\n",
" for c in df_train.columns\n",
" if c != LABEL_COLUMN\n",
"}\n",
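At prediction time, the saved JSON is read back and the same arithmetic is reapplied to incoming feature values. A hedged sketch of that round trip (the feature name and constants below are illustrative, not values from the actual dataset):

```python
import json

# Illustrative constants; real values come from the exported
# preprocessing.json file.
saved = {"Elevation": {"mean": 2959.0, "std": 280.0}}
blob = json.loads(json.dumps(saved))  # simulate writing then reading the file

def standardize(value, constants):
    # Same arithmetic as the training-time transformer.
    return (value - constants["mean"]) / constants["std"]

print(standardize(3239.0, blob["Elevation"]))  # (3239 - 2959) / 280 = 1.0
```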
Expand All @@ -493,10 +498,10 @@
"metadata": {},
"source": [
"### Define a model and training/validation step\n",
"We define a simple neural network model, and training and validation steps in Pytorch.\n",
"We will define a simple neural network model with the training and validation steps in PyTorch.\n",
"\n",
"In Pytorch, we can define a neural network model in a class subclassing `torch.nn.Module`.<br>\n",
"there is an `__init__()` method that defines the layers and other components of a model, and a `forward()` method where the computation gets done. <br>\n",
"In PyTorch, we can define a neural network model in a class that subclasses `torch.nn.Module`.<br>\n",
"There is an `__init__()` method that defines the layers and other components of a model, and a `forward()` method where the computation gets done. <br>\n",
"For more detail, refer to [the official document](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).\n",
"\n",
"Also, to make our model executable on either a CPU or a GPU, we define a [`torch.device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device) based on CUDA availability and pass it to `torch.nn.Module.to()`, which takes care of moving the parameters to that device (handling any dtype conversion)."
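A minimal sketch of that pattern follows; the layer sizes are illustrative and are not the lab's actual architecture. Layers live in `__init__`, the computation happens in `forward()`, and a device is chosen from CUDA availability:

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, input_dim=10, num_classes=7):
        super().__init__()
        # Layers and other components are defined in __init__ ...
        self.net = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        # ... and the computation is done in forward().
        return self.net(x)

# Run on GPU when available, CPU otherwise; .to() moves the parameters.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Classifier().to(device)
logits = model(torch.randn(4, 10, device=device))
print(logits.shape)  # a (4, num_classes) batch of logits
```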
@@ -621,7 +626,9 @@
"metadata": {},
"source": [
"### Run training locally.\n",
"Let's test if it runs locally before passing it Cloud training."
"Let's test if it runs locally before passing it to Vertex for training.\n",
"\n",
"This will help us identify and fix any errors in our code before launching a remote training job."
]
},
{
@@ -687,10 +694,10 @@
"metadata": {},
"source": [
"### Write the tuning script. \n",
"In order to run Cloud training, we define a python file that includes all the codes from preprocessing to training. Most of the codes are the same as the one above.<br>\n",
"But we need to add some codes to do hyperparameter tuning, and save trained model and preprocessing states for later use.\n",
"To run training on Vertex AI, we define a Python file that includes all the code from preprocessing to training. <br>\n",
"Most of the code is the same as what we have in the cells above, but we will add a few additional lines to do hyperparameter tuning and to save the trained model and the preprocessing states for later use.\n",
"\n",
"Notice the use of the `hypertune` package to report the `accuracy` optimization metric to Vertex AI hyperparameter tuning service.\n"
"We will use the `hypertune` library to report `accuracy` as the optimization metric."
]
},
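The reporting pattern can be sketched as below. Here `report_metric` is a hypothetical stand-in stub for `hypertune.HyperTune().report_hyperparameter_tuning_metric(...)` so the loop runs without the library, and the predictions and labels are illustrative:

```python
# Stub standing in for the hypertune reporter; it just records calls
# so the per-epoch reporting pattern is runnable here.
reported = []

def report_metric(tag, value, step):
    reported.append({"tag": tag, "value": value, "step": step})

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

for epoch in range(3):
    # Illustrative validation predictions; real ones come from the model.
    val_preds, val_labels = [1, 0, 2, 2], [1, 0, 1, 2]
    report_metric("accuracy", accuracy(val_preds, val_labels), epoch)

print(len(reported), reported[-1]["value"])  # 3 0.75
```

In the real training script, each epoch's call tells the Vertex AI tuning service how this trial is doing, so it can compare trials and pick the best hyperparameters.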
{
@@ -747,7 +754,7 @@
" )\n",
" return torch.tensor(one_hot).float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"dictionary\": self.dictionary.get_itos()}\n",
"\n",
"\n",
@@ -766,7 +773,7 @@
" standardized = (feature - self.mean) / self.std\n",
" return torch.tensor(standardized)[:, None].float()\n",
"\n",
" def seriarize_constants(self):\n",
" def serialize_constants(self):\n",
" return {\"mean\": self.mean, \"std\": self.std}\n",
"\n",
"def preprocess(df, transformers):\n",
@@ -877,7 +884,7 @@
" gcs_model_path = \"{}/{}\".format(job_dir, model_filename)\n",
"\n",
" # export json for preprocessing\n",
" preprocesinng_json = {c: transformers[c].seriarize_constants() \n",
" preprocesinng_json = {c: transformers[c].serialize_constants() \n",
" for c in df_train.columns if c != LABEL_COLUMN}\n",
" preproc_json_filename = 'preprocessing.json'\n",
" with open(preproc_json_filename, 'w') as f:\n",
