
Commit 0a70009

Merge pull request graviraja#11 from graviraja/week7 (Week7)

2 parents 90903ea + 367aedc

24 files changed, +1565 -5 lines changed

.dvc/config (+3, -1)
@@ -1,4 +1,6 @@
 [core]
-    remote = storage
+    remote = model-store
 ['remote "storage"']
     url = gdrive://19JK5AFbqOBlrFVwDHjTrf9uvQFtS0954
+['remote "model-store"']
+    url = s3://models-dvc/trained_models/

.github/workflows/build_docker_image.yaml (+19, -4)
@@ -7,14 +7,29 @@ jobs:
     runs-on: ubuntu-latest
     defaults:
       run:
-        working-directory: ./week_6_github_actions
+        working-directory: ./week_7_ecr
     steps:
       - name: Checkout
         uses: actions/checkout@v2
         with:
           ref: ${{ github.ref }}
+      - name: Configure AWS Credentials
+        uses: aws-actions/configure-aws-credentials@v1
+        with:
+          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          aws-region: us-west-2
       - name: Build container
         run: |
-          docker network create data
-          docker build --tag inference:latest .
-          docker run -d -p 8000:8000 --network data --name inference_container inference:latest
+          docker build --build-arg AWS_ACCOUNT_ID=${{ secrets.AWS_ACCOUNT_ID }} \
+            --build-arg AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }} \
+            --build-arg AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }} \
+            --tag mlops-basics .
+      - name: Push2ECR
+        id: ecr
+        uses: jwalton/gh-ecr-push@v1
+        with:
+          access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          region: us-west-2
+          image: mlops-basics:latest

README.md (+25)
@@ -208,3 +208,28 @@ References
 - [Configuring service account](https://dvc.org/doc/user-guide/setup-google-drive-remote)
 
 - [Github actions](https://docs.github.com/en/actions/quickstart)
+
+
+## Week 7: Container Registry - AWS ECR
+
+<img src="https://img.shields.io/static/v1.svg?style=for-the-badge&label=difficulty&message=medium&color=orange"/>
+
+Refer to the [Blog Post here](https://www.ravirajag.dev/blog/mlops-container-registry)
+
+A container registry is a place to store container images. A container image is a file comprised of multiple layers which can execute applications in a single instance. Hosting all the images in one stored location allows users to commit, identify and pull images when needed.
+
+Amazon Simple Storage Service (S3) is storage for the internet. It is designed for large-capacity, low-cost storage across multiple geographical regions.
+
+This week, I will be going through the following topics:
+
+- `Basics of S3`
+
+- `Programmatic access to S3`
+
+- `Configuring AWS S3 as remote storage in DVC`
+
+- `Basics of ECR`
+
+- `Configuring GitHub Actions to use S3, ECR`
+
+![Docker](images/ecr_flow.png)
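
As a rough illustration of the `Programmatic access to S3` topic listed above, here is a minimal boto3 sketch. It assumes the AWS credentials are already exported as environment variables (as described in week_7_ecr/README.md) and uses the `models-dvc` bucket behind the DVC remote; the object key and local file path are placeholders.

```python
import boto3

# assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are exported in the environment
s3 = boto3.client("s3", region_name="us-west-2")

# upload a local model file into the bucket backing the DVC remote
s3.upload_file("models/model.onnx", "models-dvc", "trained_models/model.onnx")

# list what is currently stored under the trained_models/ prefix
response = s3.list_objects_v2(Bucket="models-dvc", Prefix="trained_models/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```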

images/ecr_flow.png (binary, 439 KB)

week_7_ecr/Dockerfile (+33, new file)
@@ -0,0 +1,33 @@
FROM huggingface/transformers-pytorch-cpu:latest

COPY ./ /app
WORKDIR /app

ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY

# these env vars are experimental
ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY

# install requirements
RUN pip install "dvc[s3]"
RUN pip install -r requirements_inference.txt

# initialise dvc
RUN dvc init --no-scm
# configure the remote storage in dvc
RUN dvc remote add -d model-store s3://models-dvc/trained_models/

RUN cat .dvc/config
# pull the trained model
RUN dvc pull dvcfiles/trained_model.dvc

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# run the application
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

week_7_ecr/README.md (+169, new file)
@@ -0,0 +1,169 @@
**Note: The purpose of the project is to explore the libraries and learn how to use them. Not to build a SOTA model.**

## Requirements:

This project uses Python 3.8

Create a virtual env with the following command:

```
conda create --name project-setup python=3.8
conda activate project-setup
```

Install the requirements:

```
pip install -r requirements.txt
```

## Running

### Training

After installing the requirements, in order to train the model simply run:

```
python train.py
```

### Monitoring

Once the training is completed, at the end of the logs you will see something like:

```
wandb: Synced 5 W&B file(s), 4 media file(s), 3 artifact file(s) and 0 other file(s)
wandb:
wandb: Synced proud-mountain-77: https://wandb.ai/raviraja/MLOps%20Basics/runs/3vp1twdc
```

Follow the link to see the wandb dashboard which contains all the plots.

### Versioning data

Refer to the blog: [DVC Configuration](https://www.ravirajag.dev/blog/mlops-dvc)

### Exporting model to ONNX

Once the model is trained, convert the model using the following command:

```
python convert_model_to_onnx.py
```

### Inference

#### Inference using standard pytorch

```
python inference.py
```

#### Inference using ONNX Runtime

```
python inference_onnx.py
```

## S3 & ECR

Follow the instructions mentioned in the [blog post](https://www.ravirajag.dev/blog/mlops-container-registry) for creating the S3 bucket and ECR repository.

### Configuring dvc

```
dvc init   # this has to be done at the root folder
dvc remote add -d model-store s3://models-dvc/trained_models/
```

### AWS credentials

Create the credentials as mentioned in the [blog post](https://www.ravirajag.dev/blog/mlops-container-registry)

**Do not share the secrets with others**

Set the access key id and secret access key values as environment variables:

```
export AWS_ACCESS_KEY_ID=<ACCESS KEY ID>
export AWS_SECRET_ACCESS_KEY=<ACCESS SECRET>
```

### Trained model in DVC

Add the trained model (onnx) to dvc using the following command:

```shell
cd dvcfiles
dvc add ../models/model.onnx --file trained_model.dvc
```

Push the model to remote storage:

```shell
dvc push trained_model.dvc
```

### Docker

Install docker using the [instructions here](https://docs.docker.com/engine/install/)

Build the image using the command

```shell
docker build -t mlops-basics:latest .
```

Then run the container using the command

```shell
docker run -p 8000:8000 --name inference_container mlops-basics:latest
```

(or)

Build and run the container using the command

```shell
docker-compose up
```

### Pushing the image to ECR

Follow the instructions mentioned in the [blog post](https://www.ravirajag.dev/blog/mlops-container-registry) for creating the ECR repository.

- Authenticating docker client to ECR

```
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 246113150184.dkr.ecr.us-west-2.amazonaws.com
```

- Tagging the image

```
docker tag mlops-basics:latest 246113150184.dkr.ecr.us-west-2.amazonaws.com/mlops-basics:latest
```

- Pushing the image

```
docker push 246113150184.dkr.ecr.us-west-2.amazonaws.com/mlops-basics:latest
```

Refer to the `.github/workflows/build_docker_image.yaml` file for automatically creating the docker image with the trained model and pushing it to ECR.


### Running notebooks

I am using [Jupyter lab](https://jupyter.org/install) to run the notebooks.

Since I am using a virtualenv, when I run the command `jupyter lab` it might or might not use the virtualenv.

To make sure to use the virtualenv, run the following commands before running `jupyter lab`

```
conda install ipykernel
python -m ipykernel install --user --name project-setup
pip install ipywidgets
```

week_7_ecr/app.py (+15, new file)
@@ -0,0 +1,15 @@
from fastapi import FastAPI
from inference_onnx import ColaONNXPredictor

app = FastAPI(title="MLOps Basics App")

predictor = ColaONNXPredictor("./models/model.onnx")

@app.get("/")
async def home_page():
    return "<h2>Sample prediction API</h2>"


@app.get("/predict")
async def get_prediction(text: str):
    result = predictor.predict(text)
    return result
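
For a quick way to exercise the `/predict` endpoint defined above, here is a minimal client sketch, assuming the service is running locally and mapped to port 8000 (as with the `docker run -p 8000:8000 ...` command in week_7_ecr/README.md); the example sentence is arbitrary.

```python
import requests

# assumes the inference container is running locally and mapped to port 8000
response = requests.get(
    "http://localhost:8000/predict",
    params={"text": "The movie was great!"},
)
print(response.json())
```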

week_7_ecr/configs/config.yaml (+6, new file)
@@ -0,0 +1,6 @@
defaults:
  - model: default
  - processing: default
  - training: default
  - override hydra/job_logging: colorlog
  - override hydra/hydra_logging: colorlog

week_7_ecr/configs/model/default.yaml (+2, new file)
@@ -0,0 +1,2 @@
name: google/bert_uncased_L-2_H-128_A-2 # model used for training the classifier
tokenizer: google/bert_uncased_L-2_H-128_A-2 # tokenizer used for processing the data
week_7_ecr/configs/processing/default.yaml (+2, new file)

@@ -0,0 +1,2 @@
batch_size: 64
max_length: 128
week_7_ecr/configs/training/default.yaml (+5, new file)

@@ -0,0 +1,5 @@
max_epochs: 1
log_every_n_steps: 10
deterministic: true
limit_train_batches: 0.25
limit_val_batches: ${training.limit_train_batches}
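
These config groups are composed by Hydra at runtime into a single `cfg` object, as in the script that follows. The snippet below is a minimal sketch for inspecting the merged configuration; `show_config` is a hypothetical helper, not part of this repository.

```python
import hydra
from omegaconf import OmegaConf


@hydra.main(config_path="./configs", config_name="config")
def show_config(cfg):
    # print the fully composed configuration (model + processing + training groups)
    print(OmegaConf.to_yaml(cfg))
    # individual values are available via attribute access
    print("batch size:", cfg.processing.batch_size)
    print("max epochs:", cfg.training.max_epochs)


if __name__ == "__main__":
    show_config()
```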

week_7_ecr/convert_model_to_onnx.py (+57, new file)
@@ -0,0 +1,57 @@
import torch
import hydra
import logging

from omegaconf.omegaconf import OmegaConf

from model import ColaModel
from data import DataModule

logger = logging.getLogger(__name__)


@hydra.main(config_path="./configs", config_name="config")
def convert_model(cfg):
    root_dir = hydra.utils.get_original_cwd()
    model_path = f"{root_dir}/models/best-checkpoint.ckpt"
    logger.info(f"Loading pre-trained model from: {model_path}")
    cola_model = ColaModel.load_from_checkpoint(model_path)

    data_model = DataModule(
        cfg.model.tokenizer, cfg.processing.batch_size, cfg.processing.max_length
    )
    data_model.prepare_data()
    data_model.setup()
    input_batch = next(iter(data_model.train_dataloader()))
    input_sample = {
        "input_ids": input_batch["input_ids"][0].unsqueeze(0),
        "attention_mask": input_batch["attention_mask"][0].unsqueeze(0),
    }

    # Export the model
    logger.info(f"Converting the model into ONNX format")
    torch.onnx.export(
        cola_model,  # model being run
        (
            input_sample["input_ids"],
            input_sample["attention_mask"],
        ),  # model input (or a tuple for multiple inputs)
        f"{root_dir}/models/model.onnx",  # where to save the model (can be a file or file-like object)
        export_params=True,
        opset_version=10,
        input_names=["input_ids", "attention_mask"],  # the model's input names
        output_names=["output"],  # the model's output names
        dynamic_axes={
            "input_ids": {0: "batch_size"},  # variable length axes
            "attention_mask": {0: "batch_size"},
            "output": {0: "batch_size"},
        },
    )

    logger.info(
        f"Model converted successfully. ONNX format model is at: {root_dir}/models/model.onnx"
    )


if __name__ == "__main__":
    convert_model()
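
Once exported, the ONNX file can be loaded with ONNX Runtime. Below is a minimal sketch of running the exported graph directly, assuming the tokenizer from `configs/model/default.yaml` and the input/output names declared in the export call above; the repository's own `inference_onnx.py` (`ColaONNXPredictor`) remains the actual inference path.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# load the exported graph and the tokenizer used during training
session = ort.InferenceSession("models/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2")

# tokenize a sample sentence with the same max_length as in the processing config
encoded = tokenizer(
    "The movie was great!",
    max_length=128,
    padding="max_length",
    truncation=True,
    return_tensors="np",
)

# run the graph using the input/output names declared in torch.onnx.export
outputs = session.run(
    ["output"],
    {
        "input_ids": encoded["input_ids"].astype(np.int64),
        "attention_mask": encoded["attention_mask"].astype(np.int64),
    },
)
print(outputs[0])  # raw model output for the sample sentence
```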
