|
---
title: 'Deploy Hugging Face models easily with Amazon SageMaker'
thumbnail: /blog/assets/17_the_partnership_amazon_sagemaker_and_hugging_face/thumbnail.png
---

<img src="/blog/assets/17_the_partnership_amazon_sagemaker_and_hugging_face/cover.png" alt="hugging-face-and-aws-logo" class="w-full">

# **Deploy Hugging Face models easily with Amazon SageMaker 🏎**

Earlier this year, [we announced a strategic collaboration with Amazon](https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face) to make it easier for companies to use Hugging Face in Amazon SageMaker, and ship cutting-edge Machine Learning features faster. We introduced new Hugging Face Deep Learning Containers (DLCs) to [train Hugging Face Transformer models in Amazon SageMaker](https://huggingface.co/transformers/sagemaker.html#getting-started-train-a-transformers-model).

Today, we are excited to share a new inference solution with you that makes it easier than ever to deploy Hugging Face Transformers with Amazon SageMaker! With the new Hugging Face Inference DLCs, you can deploy your trained models for inference with just one more line of code, or select any of the 10,000+ publicly available models from the [Model Hub](https://huggingface.co/models) and deploy them with Amazon SageMaker.

Deploying models in SageMaker provides you with production-ready endpoints that scale easily within your AWS environment, with built-in monitoring and a ton of enterprise features. It's been an amazing collaboration and we hope you will take advantage of it!

Here's how to use the new [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit) to deploy Transformers-based models:

```python
from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class and deploy it as SageMaker Endpoint
huggingface_model = HuggingFaceModel(...).deploy()
```

That's it! 🚀

To learn more about accessing and using the new Hugging Face DLCs with the Amazon SageMaker Python SDK, check out the guides and resources below.

---

# **Resources, Documentation & Samples 📄**

Below you can find all the important resources for deploying your models to Amazon SageMaker.

## **Blog/Video**

- [Video: Deploy a Hugging Face Transformers Model from S3 to Amazon SageMaker](https://youtu.be/pfBGgSGnYLs)
- [Video: Deploy a Hugging Face Transformers Model from the Model Hub to Amazon SageMaker](https://youtu.be/l9QZuazbzWM)

## **Samples/Documentation**

- [Hugging Face documentation for Amazon SageMaker](https://huggingface.co/docs/sagemaker/main)
- [Deploy models to Amazon SageMaker](https://huggingface.co/docs/sagemaker/inference)
- [Amazon SageMaker documentation for Hugging Face](https://docs.aws.amazon.com/sagemaker/latest/dg/hugging-face.html)
- [Python SDK SageMaker documentation for Hugging Face](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html)
- [Deep Learning Container](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers)
- [Notebook: Deploy one of the 10,000+ Hugging Face Transformers to Amazon SageMaker for Inference](https://github.com/huggingface/notebooks/blob/master/sagemaker/11_deploy_model_from_hf_hub/deploy_transformer_model_from_hf_hub.ipynb)
- [Notebook: Deploy a Hugging Face Transformer model from S3 to SageMaker for inference](https://github.com/huggingface/notebooks/blob/master/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb)

---

# **SageMaker Hugging Face Inference Toolkit ⚙️**

In addition to the Hugging Face Transformers-optimized Deep Learning Containers for inference, we have created a new [Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit) for Amazon SageMaker. This new Inference Toolkit leverages the `pipelines` from the `transformers` library to allow zero-code deployments of models, without writing any code for pre- or post-processing. In the "Getting Started" section below, you will find two examples of how to deploy your models to Amazon SageMaker.

In addition to zero-code deployment, the Inference Toolkit supports "bring your own code" methods, where you can override the default methods. You can learn more about "bring your own code" in the documentation [here](https://github.com/aws/sagemaker-huggingface-inference-toolkit#-user-defined-codemodules), or you can check out the sample notebook "deploy custom inference code to Amazon SageMaker".
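
For illustration, here is a minimal sketch of what such a user-defined module could look like. The Toolkit picks up an `inference.py` from the `code/` folder of your model archive and calls hooks such as `model_fn` and `predict_fn` when they are defined; the model class and the processing below are assumptions for this example, not a prescribed implementation, so check the linked documentation for the exact supported hooks.

```python
# code/inference.py -- a hedged sketch of "bring your own code";
# the hook names follow the Inference Toolkit documentation linked above,
# everything else (model class, processing) is illustrative
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def model_fn(model_dir):
    # load whatever you saved into model.tar.gz
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # custom pre- and post-processing around the model call
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt", truncation=True)
    logits = model(**inputs).logits
    return {"predicted_label": logits.argmax().item()}
```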

## **API - Inference Toolkit Description**

Using the `transformers` `pipelines`, we designed an API which makes it easy for you to benefit from all `pipelines` features. The API has a similar interface to the [🤗 Accelerated Inference API](https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html), meaning your inputs need to be defined in the `inputs` key, and if you want additional supported `pipelines` parameters, you can add them in the `parameters` key. Below you can find examples for requests.

```python
# text-classification request body
{
    "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}
# question-answering request body
{
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}
# zero-shot classification request body
{
    "inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
    "parameters": {
        "candidate_labels": [
            "refund",
            "legal",
            "faq"
        ]
    }
}
```
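
These payloads are plain JSON, so once an endpoint is up you are not tied to the SageMaker Python SDK. As a hedged sketch (the endpoint name below is a placeholder), the same kind of request can be sent with the low-level `boto3` runtime client:

```python
import json

import boto3

# call an already-deployed endpoint with the low-level runtime client;
# "my-huggingface-endpoint" is a placeholder name
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-huggingface-endpoint",  # placeholder
    ContentType="application/json",
    Body=json.dumps({"inputs": "I love the new Hugging Face Inference DLCs!"}),
)
print(json.loads(response["Body"].read().decode("utf-8")))
```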

# **Getting started 🧭**

In this guide, we will use the new Hugging Face Inference DLCs and the Amazon SageMaker Python SDK to deploy two Transformer models for inference.

In the first example, we deploy a Hugging Face Transformer model that was trained in Amazon SageMaker.

In the second example, we directly deploy one of the 10,000+ publicly available Hugging Face Transformers models from the [Model Hub](https://huggingface.co/models) to Amazon SageMaker for inference.

## **Setting up the environment**

We will use an Amazon SageMaker Notebook Instance for the example. You can learn how to set up a Notebook Instance [here](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html). To get started, jump into your Jupyter Notebook or JupyterLab and create a new notebook with the `conda_pytorch_p36` kernel.

**_Note: The use of Jupyter is optional: we could also launch SageMaker API calls from anywhere we have an SDK installed, connectivity to the cloud, and appropriate permissions, such as a laptop, another IDE, or a task scheduler like Airflow or AWS Step Functions._**

After that, we can install the required dependencies:

```bash
pip install "sagemaker>=2.48.0" --upgrade
```

To deploy a model on SageMaker, we need to create a `sagemaker` Session and provide an IAM role with the right permissions. The `get_execution_role` method is provided by the SageMaker SDK as an optional convenience; you can also specify the role by writing the specific role ARN you want your endpoint to use. This IAM role will later be attached to the endpoint, e.g. to download the model from Amazon S3.

```python
import sagemaker

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
```

---

## **Deploy a trained Hugging Face Transformer model to SageMaker for inference**

There are two ways to deploy your SageMaker-trained Hugging Face model. You can either deploy it right after your training finishes, or you can deploy it later, using `model_data` pointing to your saved model on Amazon S3. In addition to the two options covered below, you can also instantiate Hugging Face endpoints with lower-level SDKs such as `boto3` and the AWS CLI, with Terraform, or with CloudFormation templates, as sketched next.
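
For illustration, here is a minimal sketch of the three low-level calls that `HuggingFaceModel(...).deploy()` makes for you. All names, the role ARN, and the image URI below are placeholders; look up the correct Hugging Face Inference DLC image for your region and versions in the Deep Learning Container list linked above.

```python
import boto3

sm = boto3.client("sagemaker")

# all of the following identifiers are placeholders
image_uri = "<huggingface-pytorch-inference-dlc-image-uri>"

# 1. register the model artifact and container image
sm.create_model(
    ModelName="my-hf-model",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/my-sagemaker-role",
    PrimaryContainer={
        "Image": image_uri,
        "ModelDataUrl": "s3://models/my-bert-model/model.tar.gz",
    },
)

# 2. describe the fleet that should serve the model
sm.create_endpoint_config(
    EndpointConfigName="my-hf-endpoint-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-hf-model",
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.xlarge",
    }],
)

# 3. create the HTTPS endpoint
sm.create_endpoint(
    EndpointName="my-hf-endpoint",
    EndpointConfigName="my-hf-endpoint-config",
)
```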

### **Deploy the model directly after training with the Estimator class**

If you deploy your model directly after training, you need to ensure that all required model artifacts are saved in your training script, including the tokenizer and the model. A benefit of deploying directly after training is that the SageMaker model container metadata will contain the source training job, providing lineage from training job to deployed model.
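
For example, the end of a training script could persist both artifacts to the directory SageMaker archives into `model.tar.gz`. This is a sketch; the model and tokenizer below stand in for whatever your training code actually produced.

```python
import os

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# stand-ins for the model and tokenizer produced by your training code
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# SageMaker packs everything written to SM_MODEL_DIR (default /opt/ml/model)
# into model.tar.gz when the training job finishes
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)
```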

```python
from sagemaker.huggingface import HuggingFace

############ pseudo code start ############

# create HuggingFace estimator for running training
huggingface_estimator = HuggingFace(...)

# start the training job with our uploaded datasets as input
huggingface_estimator.fit(...)

############ pseudo code end ############

# deploy model to SageMaker Inference
predictor = huggingface_estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# example request; you always need to define "inputs"
data = {
    "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}
# request
predictor.predict(data)
```

After we run our request, we can delete the endpoint again with:

```python
# delete endpoint
predictor.delete_endpoint()
```

### **Deploy the model from pre-trained checkpoints using the `HuggingFaceModel` class**

If you've already trained your model and want to deploy it at some later time, you can use the `model_data` argument to specify the location of your tokenizer and model weights.
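
If your artifacts are not archived yet, here is a small sketch of packaging a locally saved model directory into a `model.tar.gz` with the files at the root of the archive, which is the layout the Inference DLC expects (the directory name is a placeholder):

```python
import tarfile
from pathlib import Path

# package a saved model/tokenizer directory into model.tar.gz,
# keeping the files at the root of the archive
model_path = Path("my-bert-model")  # placeholder directory

with tarfile.open("model.tar.gz", "w:gz") as tar:
    for file in model_path.iterdir():
        tar.add(file, arcname=file.name)
```

Upload the archive to S3, for example with `sess.upload_data("model.tar.gz", key_prefix="my-bert-model")`, and point `model_data` at the resulting URI.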

```python
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://models/my-bert-model/model.tar.gz", # path to your trained SageMaker model
    role=role,                                           # IAM role with permissions to create an Endpoint
    transformers_version="4.6",                          # transformers version used
    pytorch_version="1.7",                               # pytorch version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge"
)

# example request; you always need to define "inputs"
data = {
    "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

# request
predictor.predict(data)
```

After we run our request, we can delete the endpoint again with:

```python
# delete endpoint
predictor.delete_endpoint()
```

## **Deploy one of the 10,000+ Hugging Face Transformers to Amazon SageMaker for Inference**

To deploy a model directly from the Hugging Face Model Hub to Amazon SageMaker, we need to define two environment variables when creating the `HuggingFaceModel`:

* `HF_MODEL_ID`: defines the model id, which will be automatically loaded from [huggingface.co/models](http://huggingface.co/models) when creating the SageMaker Endpoint. The 🤗 Hub provides 10,000+ models, all available through this environment variable.
* `HF_TASK`: defines the task for the used 🤗 Transformers pipeline. A full list of tasks can be found [here](https://huggingface.co/transformers/main_classes/pipelines.html).

```python
from sagemaker.huggingface.model import HuggingFaceModel

# Hub Model configuration. <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', # model_id from hf.co/models
    'HF_TASK':'question-answering'                           # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                     # configuration for loading model from Hub
    role=role,                   # IAM role with permissions to create an Endpoint
    transformers_version="4.6",  # transformers version used
    pytorch_version="1.7",       # pytorch version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge"
)

# example request; you always need to define "inputs"
data = {
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}

# request
predictor.predict(data)
```

After we run our request, we can delete the endpoint again with:

```python
# delete endpoint
predictor.delete_endpoint()
```

---

# **FAQ 🎯**

You can find the complete [Frequently Asked Questions](https://huggingface.co/docs/sagemaker/faq) in the [documentation](https://huggingface.co/docs/sagemaker/faq).

_Q: Which models can I deploy for Inference?_

A: You can deploy:
* any 🤗 Transformers model trained in Amazon SageMaker or on another compatible platform, provided it can accommodate the SageMaker Hosting design,
* any of the 10,000+ publicly available Transformer models from the Hugging Face [Model Hub](https://huggingface.co/models), or
* your private models hosted in your Hugging Face premium account!

_Q: Which pipelines and tasks are supported by the Inference Toolkit?_

A: The Inference Toolkit and DLC support any of the `transformers` `pipelines`. You can find the full list [here](https://huggingface.co/transformers/main_classes/pipelines.html).

_Q: Do I have to use the `transformers` `pipelines` when hosting SageMaker endpoints?_

A: No, you can also write your custom inference code to serve your own models and logic, documented [here](https://github.com/aws/sagemaker-huggingface-inference-toolkit#-user-defined-codemodules).

_Q: Do I have to use the SageMaker Python SDK to use the Hugging Face Deep Learning Containers (DLCs)?_

A: You can use the Hugging Face DLC without the SageMaker Python SDK and deploy your models to SageMaker with other SDKs, such as the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-training-job.html), [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_training_job) or [CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-endpoint.html). The DLCs are also available through Amazon ECR and can be pulled and used in any environment of choice.

_Q: Why should I use the Hugging Face Deep Learning Containers?_

A: The DLCs are fully tested, maintained, optimized deep learning environments that require no installation, configuration, or maintenance. In particular, our inference DLC comes with a pre-written serving stack, which drastically lowers the technical bar of DL serving.

_Q: How is my data and code secured by Amazon SageMaker?_

A: Amazon SageMaker provides numerous security mechanisms including **[encryption at rest](https://docs.aws.amazon.com/sagemaker/latest/dg/encryption-at-rest-nbi.html)** and **[in transit](https://docs.aws.amazon.com/sagemaker/latest/dg/encryption-in-transit.html)**, **[Virtual Private Cloud (VPC) connectivity](https://docs.aws.amazon.com/sagemaker/latest/dg/interface-vpc-endpoint.html)**, and **[Identity and Access Management (IAM)](https://docs.aws.amazon.com/sagemaker/latest/dg/security_iam_service-with-iam.html)**. To learn more about security in the AWS cloud and with Amazon SageMaker, you can visit **[Security in Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/security.html)** and **[AWS Cloud Security](https://aws.amazon.com/security/)**.

_Q: Is this available in my region?_

A: For a list of the supported regions, please visit the **[AWS region table](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/)** for all AWS global infrastructure.

_Q: Do you offer premium support or support SLAs for this solution?_

A: AWS Technical Support tiers are available from AWS and cover development and production issues for AWS products and services; please refer to AWS Support for specifics and scope.

If you have questions which the Hugging Face community can help answer and/or benefit from, please **[post them in the Hugging Face forum](https://discuss.huggingface.co/c/sagemaker/17)**.

---

If you need premium support from the Hugging Face team to accelerate your NLP roadmap, our [Expert Acceleration Program](https://huggingface.co/support) offers direct guidance from our open-source, science, and ML Engineering teams.