
Commit 1cbdd53

Authored by iElsha, philschmid, Joe Davison, and julien-c
Add How to deploy a pipeline to Google Cloud blog article (huggingface#85)
* Add How to deploy a pipeline to Google Cloud blog article
* Update how-to-deploy-a-pipeline-to-google-clouds.md
* Update how-to-deploy-a-pipeline-to-google-clouds.md
* Update how-to-deploy-a-pipeline-to-google-clouds.md
* Update how-to-deploy-a-pipeline-to-google-clouds.md
* Apply suggestions from code review (batch suggestion update)
* Improvements & requested changes; typos fixed
* Added more information about the Dockerfile: added an environment variable in the Dockerfile content, added instructions "before you begin the deployment"
* Added more information in the section "My Goal"
* Apply suggestions from code review
* Removed outdated paragraph
* Apply suggestions from code review
* Add thumbnail
* Add metadata + fix a few English mistakes
* Link from /blog

Co-authored-by: Philipp Schmid <[email protected]>
Co-authored-by: Joe Davison <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
1 parent dc1e023 commit 1cbdd53

9 files changed: +187 −0 lines changed

_blog.yml (+6)
@@ -110,3 +110,9 @@
   title: "Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers"
   thumbnail: ./assets/16_fine_tune_xlsr_wav2vec2/xlsr_wav2vec2.png
   date: March 12, 2021
+
+- local: how-to-deploy-a-pipeline-to-google-clouds
+  title: "My Journey to a serverless transformers pipeline on Google Cloud"
+  author: Maxence
+  guest: true
+  date: March 18, 2021
how-to-deploy-a-pipeline-to-google-clouds.md (+181)
@@ -0,0 +1,181 @@
---
title: "My Journey to a serverless transformers pipeline on Google Cloud"
thumbnail: /blog/assets/14_how_to_deploy_a_pipeline_to_google_clouds/thumbnail.png
---

# My Journey to a serverless transformers pipeline on <br>Google Cloud

<div class="blog-metadata">
  <small>Published March 18, 2021.</small>
  <a target="_blank" class="btn no-underline text-sm mb-5 font-sans" href="https://github.com/huggingface/blog/blob/master/how-to-deploy-a-pipeline-to-google-clouds.md">
    Update on GitHub
  </a>
</div>

<div class="author-card">
  <a href="/Maxence">
    <img class="avatar avatar-user" src="https://aeiljuispo.cloudimg.io/v7/https://s3.amazonaws.com/moonup/production/uploads/1613496680893-602bfe18c4f8038e9a1e0a66.jpeg?w=200&h=200&f=face" title="Gravatar">
    <div class="bfc">
      <code>Maxence</code>
      <span class="fullname">Maxence Dominici</span>
      <span class="bg-gray-100 rounded px-1 text-gray-600 text-sm font-mono">guest</span>
    </div>
  </a>
</div>

> ##### A guest blog post by community member <a href="/Maxence">Maxence Dominici</a>

This article will discuss my journey to deploy the `transformers` _sentiment-analysis_ pipeline on [Google Cloud](https://cloud.google.com). We will start with a quick introduction to `transformers` and then move to the technical part of the implementation. Finally, we'll summarize this implementation and review what we have achieved.

## The Goal
![img.png](assets/14_how_to_deploy_a_pipeline_to_google_clouds/Customer_review.png)
I wanted to create a micro-service that automatically detects whether a customer review left in Discord is positive or negative. This would allow me to treat the comment accordingly and improve the customer experience. For instance, if the review was negative, I could create a feature that contacts the customer, apologizes for the poor quality of service, and informs them that our support team will reach out as soon as possible to assist them and, hopefully, fix the problem. Since I don't plan to get more than 2,000 requests per month, I didn't impose any performance constraints regarding time or scalability.

## The Transformers library
I was a bit confused at the beginning when I downloaded the `.h5` file. I thought it would be compatible with `tensorflow.keras.models.load_model`, but this wasn't the case. After a few minutes of research, I figured out that the file was a weights checkpoint rather than a Keras model.
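
To make the distinction concrete, here is a minimal sketch (my illustration, using the model this post settles on later, not code from the documentation):

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# This fails: the downloaded tf_model.h5 is a weights checkpoint,
# not a serialized Keras model.
# model = tf.keras.models.load_model("tf_model.h5")

# The checkpoint is meant to be loaded through the transformers machinery:
model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
```
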
After that, I tried out the API that Hugging Face offers and read a bit more about their pipeline feature. Since the results of both the API and the pipeline were great, I decided that I could serve the model through the pipeline on my own server.

Below is the [official example](https://github.com/huggingface/transformers#quick-tour) from the Transformers GitHub page.

```python
from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
classifier = pipeline('sentiment-analysis')
classifier('We are very happy to include pipeline into the transformers repository.')
# [{'label': 'POSITIVE', 'score': 0.9978193640708923}]
```


## Deploy transformers to Google Cloud
> GCP is chosen as it is the cloud environment I am using in my personal organization.

### Step 1 - Research
I already knew that I could use an API service like `flask` to serve a `transformers` model. I searched the Google Cloud AI documentation and found a service to host TensorFlow models named [AI Platform Prediction](https://cloud.google.com/ai-platform/prediction/docs). I also found [App Engine](https://cloud.google.com/appengine) and [Cloud Run](https://cloud.google.com/run) there, but I was concerned about the memory usage of App Engine and was not very familiar with Docker.

### Step 2 - Test on AI Platform Prediction

As the model is not a "pure TensorFlow" saved model but a checkpoint, and I couldn't turn it into a "pure TensorFlow" model, I figured that the example on [this page](https://cloud.google.com/ai-platform/prediction/docs/deploying-models) wouldn't work.
From there I saw that I could write some custom code, allowing me to load the `pipeline` instead of having to handle the model myself, which seemed easier. I also learned that I could define a pre-prediction and post-prediction action, which could be useful in the future for pre- or post-processing the data for customers' needs.
I followed Google's guide, but encountered an issue as the service is still in beta and not everything is stable. This issue is detailed [here](https://github.com/huggingface/transformers/issues/9926).
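
For reference, custom prediction code on AI Platform is packaged as a small predictor class. Before hitting that issue, what I had in mind looked roughly like the sketch below (the class and variable names are mine; the two-method interface is the one documented for the beta custom prediction routines):

```python
from transformers import pipeline

class SentimentPredictor:
    """Loads the transformers pipeline once and serves predictions."""

    def __init__(self, classifier):
        self._classifier = classifier

    def predict(self, instances, **kwargs):
        # A pre-prediction / post-prediction action could wrap this call.
        return [self._classifier(instance)[0] for instance in instances]

    @classmethod
    def from_path(cls, model_dir):
        # Called by AI Platform when the model version is created.
        return cls(pipeline("sentiment-analysis", model=model_dir, tokenizer=model_dir))
```
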

### Step 3 - Test on App Engine

I moved to Google's [App Engine](https://cloud.google.com/appengine), as it's a service that I am familiar with, but encountered an installation issue with TensorFlow due to a missing system dependency file. I then tried PyTorch, which worked with an F4_1G instance, but the instance couldn't handle more than 2 requests at once, which isn't great performance-wise.
66+
67+
### Step 4 - Test on Cloud Run
68+
69+
Lastly, I moved to [Cloud Run](https://cloud.google.com/run) with a docker image. I followed [this guide](https://cloud.google.com/run/docs/quickstarts/build-and-deploy#python) to get an idea of how it works. In Cloud Run, I could configure a higher memory and more vCPUs to perform the prediction with PyTorch. I ditched Tensorflow as PyTorch seems to load the model faster.


## Implementation of the serverless pipeline

The final solution consists of four different components:
- `main.py`, handling the request to the pipeline
- `Dockerfile`, used to create the image that will be deployed on Cloud Run
- A model folder containing `pytorch_model.bin`, `config.json` and `vocab.txt`
  - Model: [DistilBERT base uncased finetuned SST-2](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
  - To download the model folder, follow the instructions behind the button (or use the script after this list). ![img.png](assets/14_how_to_deploy_a_pipeline_to_google_clouds/Download_instructions_button.png)
  - You don't need to keep `rust_model.ot` or `tf_model.h5`, as we will use [PyTorch](https://pytorch.org/)
- `requirements.txt` for installing the dependencies
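
As an alternative to the download button, here is a short sketch of fetching the same files with `save_pretrained` (my suggestion; any method that fills the `./model` folder works):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Writes config.json and pytorch_model.bin, then the tokenizer files
# (including vocab.txt), into the ./model folder used by main.py.
AutoModelForSequenceClassification.from_pretrained(model_name).save_pretrained("./model")
AutoTokenizer.from_pretrained(model_name).save_pretrained("./model")
```
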

The content of `main.py` is really simple. The idea is to receive a `GET` request containing two fields: first, the review that needs to be analysed; second, an API key to "protect" the service. The second parameter is optional; I used it to avoid setting up OAuth2 for Cloud Run. After these arguments are provided, we load the pipeline, which is built from the model `distilbert-base-uncased-finetuned-sst-2-english` (provided above). In the end, the best match is returned to the client.

```python
import os
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)

model_path = "./model"

@app.route('/')
def classify_review():
    review = request.args.get('review')
    api_key = request.args.get('api_key')
    if review is None or api_key != "MyCustomerApiKey":
        return jsonify(code=403, message="bad request")
    # The pipeline is loaded on every request here; see the Performance
    # section for the warmed-up variant.
    classify = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)
    return classify(review)[0]


if __name__ == '__main__':
    # This is used when running locally only. When deploying to Google Cloud
    # Run, a webserver process such as Gunicorn will serve the app.
    app.run(debug=False, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

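
Once deployed, the service can be exercised with a simple `GET` request. A hypothetical example (Cloud Run assigns the real URL at deployment time):

```shell
curl "https://ai-customer-review-xxxxxxxxxx-ew.a.run.app/?review=that%20was%20great&api_key=MyCustomerApiKey"
# e.g. {"label": "POSITIVE", "score": 0.99...}
```
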

Then there is the `Dockerfile`, which will be used to create a Docker image of the service. We specify that our service runs with Python 3.7 and that we need to install our requirements. Then we use `gunicorn` to handle our process on port `5000`.
```dockerfile
# Use Python 3.7
FROM python:3.7
# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True
# Copy requirements.txt to the docker image and install packages
COPY requirements.txt /
RUN pip install -r requirements.txt
# Copy the application code into the image
COPY . /app
# Expose port 5000 and tell gunicorn (via $PORT) to bind to it
EXPOSE 5000
ENV PORT 5000
# Set the working directory to the app folder
WORKDIR /app
# Use gunicorn as the entrypoint, with a single worker and a single thread
CMD exec gunicorn --bind :$PORT main:app --workers 1 --threads 1 --timeout 0
```

It is important to note the arguments `--workers 1 --threads 1`, which mean that I want to execute my app with only one worker (= 1 process) and a single thread. This is because I don't want 2 instances up at once, as that might increase the billing. One of the downsides is that it will take more time to process if the service receives two requests at once. I also limited it to one thread because of the memory needed for loading the model into the pipeline: with 4 threads, each might have only 4 GB / 4 = 1 GB to perform the full process, which is not enough and would lead to a memory error.

Finally, the `requirements.txt` file:
```text
Flask==1.1.2
torch==1.7.1
transformers~=4.2.0
gunicorn>=20.0.0
```


## Deployment instructions

First, you will need to meet some requirements, such as having a project on Google Cloud, enabling billing, and installing the `gcloud` CLI. You can find more details in [Google's guide - Before you begin](https://cloud.google.com/run/docs/quickstarts/build-and-deploy#before-you-begin).

Second, we need to build the Docker image and deploy it to Cloud Run by selecting the correct project (replace `PROJECT-ID`) and setting the name of the instance, such as `ai-customer-review`. You can find more information about the deployment in [Google's guide - Deploying to](https://cloud.google.com/run/docs/quickstarts/build-and-deploy#deploying_to).

```shell
gcloud builds submit --tag gcr.io/PROJECT-ID/ai-customer-review
gcloud run deploy --image gcr.io/PROJECT-ID/ai-customer-review --platform managed
```

After a few minutes, you will also need to upgrade the memory allocated to your Cloud Run instance from 256 MB to 4 GB. To do so, head over to the [Cloud Run Console](https://console.cloud.google.com/run) of your project.
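
If you prefer the command line, the same change should also be possible with `gcloud` (a sketch, assuming the service was deployed as `ai-customer-review` on the managed platform):

```shell
gcloud run services update ai-customer-review --memory 4Gi --platform managed
```
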

There you should find your instance; click on it.

![img.png](assets/14_how_to_deploy_a_pipeline_to_google_clouds/Cloud_run_instance.png)

After that, you will see a blue button labelled "Edit and deploy new revision" at the top of the screen. Click on it and you'll be prompted with many configuration fields. At the bottom you should find a "Capacity" section where you can specify the memory.

![img.png](assets/14_how_to_deploy_a_pipeline_to_google_clouds/Edit_memory.png)

## Performance
![img.png](assets/14_how_to_deploy_a_pipeline_to_google_clouds/Request_Result.png)

Handling a request takes less than five seconds from the moment you send it, including loading the model into the pipeline and making the prediction. A cold start might add roughly 10 seconds on top of that.

We can improve the request-handling performance by warming up the model, i.e. loading it on start-up instead of on each request (in a global variable, for example). By doing so, we save time and memory usage; a sketch of that variant follows.
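
A minimal sketch of that warmed-up `main.py` (the idea applied to the code above, not what I actually deployed):

```python
import os
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)

model_path = "./model"

# Load the pipeline once at start-up instead of on every request.
classify = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)

@app.route('/')
def classify_review():
    review = request.args.get('review')
    api_key = request.args.get('api_key')
    if review is None or api_key != "MyCustomerApiKey":
        return jsonify(code=403, message="bad request")
    return classify(review)[0]

if __name__ == '__main__':
    app.run(debug=False, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```
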

## Costs
I simulated the cost based on the Cloud Run instance configuration with the [Google pricing simulator](https://cloud.google.com/products/calculator#id=cd314cba-1d9a-4bc6-a7c0-740bbf6c8a78).
![Estimate of the monthly cost](./assets/14_how_to_deploy_a_pipeline_to_google_clouds/Estimate_of_the_monthly_cost.png)

For my micro-service, I am optimistically planning on close to 1,000 requests per month; 500 is more likely for my usage. That's why I considered 2,000 requests as an upper bound when designing it.
Due to that low number of requests, I didn't bother much about scalability, but I might come back to it if my billing increases.

Nevertheless, it's important to stress that you will pay for the storage of each gigabyte of your build image. It's roughly €0.10 per GB per month, which is fine if you don't keep all your versions in the cloud; my image is slightly above 1 GB (700 MB for PyTorch and 250 MB for the model).

## Conclusion

By using the Transformers sentiment analysis pipeline, I saved a non-negligible amount of time. Instead of training/fine-tuning a model, I could find one ready to be used in production and start the deployment in my system. I might fine-tune it in the future, but as shown in my tests, the accuracy is already amazing!
I would have liked a "pure TensorFlow" model, or at least a way to load it in TensorFlow without the Transformers dependencies, so I could use AI Platform. It would also be great to have a lite version.
