This directory contains the code for running a Wide and Deep model, both locally and on Cloud ML Engine. The code has been tested on Python 2.7 but should also run on Python 3.5.
Follow along with the codelab here: http://bit.ly/widendeep-slides
A wide and deep model jointly trains a wide linear model and a deep neural network, combining the benefits of memorization and generalization for recommender systems. See the research paper for more details. The code is based on the TensorFlow wide and deep tutorial.
We will use the Kaggle Criteo Dataset to predict the probability that an ad is clicked.
The dataset is downloaded as part of the script (when running in the cloud, the copy stored online is used directly).
If you wish to download a copy, the data are located here:
- gs://dataset-uploader/criteo-kaggle/small_version -- 2.5MB, 10K rows
- gs://dataset-uploader/criteo-kaggle/medium_version -- 273MB, 1M rows
- gs://dataset-uploader/criteo-kaggle/large_version -- 2.7GB, 10M rows
Each folder contains 2 files: train.csv and eval.csv
The command-line tool gsutil is part of the Google Cloud SDK and comes with gcloud. No gsutil but still want to download? Replace "gs://" with "https://storage.googleapis.com/", for example: https://storage.googleapis.com/dataset-uploader/criteo-kaggle/small_version/train.csv
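For example, to copy the small version with gsutil (the data/ destination directory here is just an example):
# Copy the 10K-row sample (train.csv and eval.csv) into ./data
mkdir -p data
gsutil cp gs://dataset-uploader/criteo-kaggle/small_version/train.csv data/
gsutil cp gs://dataset-uploader/criteo-kaggle/small_version/eval.csv data/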
This repo presents three ways of running the model: locally from the command line, in a Jupyter notebook, and on Google Cloud ML Engine.
The commands below assume you are in this directory (wide_n_deep). Move to it with:
cd workshop_sections/wide_n_deep
To run training locally:
python trainer/task.py
To use the notebook instead, start the Jupyter server with the command below, then open the notebook and step through the cells.
jupyter notebook
The workflow to run this on Cloud Machine Learning Engine is to do a local run first, then move to the cloud.
gcloud ml-engine local train --package-path=trainer --module-name=trainer.task
You should see output similar to the following:
TensorFlow version 1.0.0
model directory = models/model_WIDE_AND_DEEP_1491431579
estimator built
fit done
evaluate done
Accuracy: 0.84125
Model exported to models/model_WIDE_AND_DEEP_1491431579/exports
Ensure you have the project you want to work in selected. You can check using gcloud config list
If you need to set it to a new value, do so using gcloud config set project <YOUR_PROJECT_ID_HERE>
You should also make sure that the Cloud ML Engine API is turned on for your project. More info about getting set up is here: https://cloud.google.com/ml-engine/docs/quickstarts/command-line
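If the API is not yet enabled, one way to turn it on from the command line (assuming a recent Cloud SDK) is:
# Enable the Cloud ML Engine API for the currently selected project.
gcloud services enable ml.googleapis.com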
Next, set the following environment variables and submit a training job.
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-c
export PROJECT_ID=`gcloud config list project --format "value(core.project)"`
export BUCKET=gs://${PROJECT_ID}-ml
export JOB_NAME=widendeep_${USER}_$(date +%Y%m%d_%H%M%S)
export TRAIN_PATH=${BUCKET}/${JOB_NAME}
gcloud ml-engine jobs submit training ${JOB_NAME} --package-path=trainer --module-name=trainer.task --region=us-central1 --job-dir=${TRAIN_PATH}
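Note that the submit command assumes the bucket ${BUCKET} already exists. If it does not, one way to create it (using the same region as training) is:
# Create the bucket gs://<project-id>-ml in us-central1.
gsutil mb -l us-central1 ${BUCKET}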
When you are ready to run it on a more powerful cluster, you can customize a config.yaml file. Included in this repo are two examples: one for the STANDARD_1 tier, and one custom setup which includes 3 GPU-enabled machines (a sketch of what such a custom config might contain follows the commands below).
gcloud ml-engine jobs submit training ${JOB_NAME} --package-path=trainer --module-name=trainer.task --job-dir=${TRAIN_PATH} --config config_standard.yaml
gcloud ml-engine jobs submit training ${JOB_NAME} --package-path=trainer --module-name=trainer.task --job-dir=${TRAIN_PATH} --config config_gpu.yaml
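For reference, a custom GPU configuration along the lines of config_gpu.yaml might contain something like the sketch below. This is illustrative only; the machine types and worker count in the repo's actual file may differ.
trainingInput:
  scaleTier: CUSTOM
  # 1 GPU master + 2 GPU workers = 3 GPU-enabled machines
  masterType: standard_gpu
  workerType: standard_gpu
  workerCount: 2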
You can check the status of your training job with the command:
gcloud ml-engine jobs describe $JOB_NAME
You can also see its progress in the Cloud Console and view the logs.
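One way to follow the logs from the command line (assuming a recent Cloud SDK) is:
# Stream the job's logs to your terminal as it runs.
gcloud ml-engine jobs stream-logs $JOB_NAME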
To run another job (in your dev workflow), simply set a new JOB_NAME and TRAIN_PATH, and then re-run the jobs submit call. Job names must be unique.
export JOB_NAME=widendeep_${USER}_$(date +%Y%m%d_%H%M%S)
export TRAIN_PATH=${BUCKET}/${JOB_NAME}
gcloud ml-engine jobs submit training ${JOB_NAME} --package-path=trainer --module-name=trainer.task --region=us-central1 --job-dir=${TRAIN_PATH}
Whether you ran your training locally or in the cloud, you should now have a set of exported model files. If you ran training locally, they will be somewhere like models/model_WIDE_AND_DEEP_1234567890/exports/1234567890. If you ran it in the cloud, they will be in the GCS bucket that you passed.
The trained model files that were exported are ready to be used for prediction.
You can run prediction jobs in Cloud ML Engine as well, using the Prediction Service.
Before we begin, if you trained your model locally, upload the exported contents (something like saved_model.pb and a folder called variables) to a Google Cloud Storage location and make a note of its address (gs://<BUCKET_ID>/path/to/model), for example using gsutil as shown below.
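For example (the local export path and destination bucket here are placeholders):
# Copy the exported SavedModel directory (saved_model.pb plus the variables/ folder) to GCS.
gsutil cp -r models/model_WIDE_AND_DEEP_1234567890/exports/1234567890 gs://<BUCKET_ID>/path/to/model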
Now we are ready to create a model
export MODEL_NAME='my_model'
gcloud ml-engine models create $MODEL_NAME
Next, create a 'version' of that model
export VERSION_NAME='my_version'
export DEPLOYMENT_SOURCE='gs://LOCATION_OF_MODEL_FILES'
gcloud ml-engine versions create $VERSION_NAME --model $MODEL_NAME --origin $DEPLOYMENT_SOURCE
Finally, make a prediction with your newly deployed version!
gcloud ml-engine predict --model $MODEL_NAME --version $VERSION_NAME --json-instances test_instance.json