Le Wagon Data Science certification exam starter pack for the predictive API test.
💡 This challenge is completely independent of other challenges. It is not required to complete any other challenge in order to work on this challenge.
📝 Let's duplicate the repository of the API challenge.
Go to https://github.com/lewagon/data-certification-api:
- Click on
Use this template
- Enter the repository name
data-certification-api
- Set it as Public
- Click on
Create repository from template
- Click on
Code
- Select
SSH
- Copy the SSH URL of the repository (the format is
[email protected]:YOUR_GITHUB_NICKNAME/data-certification-api.git
)
📝 Now we will clone your new repository.
Open your terminal and run the following commands:
👉 replace YOUR_GITHUB_NICKNAME
with your github nickname and PASTE_REPOSITORY_URL_HERE
with the SSH URL you just copied:
cd ~/code/YOUR_GITHUB_NICKNAME
git clone PASTE_REPOSITORY_URL_HERE
cd data-certification-api
💡 The content of the challenge should look like this:
tree
.
├── Dockerfile
├── MANIFEST.in
├── Makefile
├── README.md
├── api
│ ├── __init__.py
│ └── app.py
├── exampack
│ ├── __init__.py
│ ├── data
│ ├── models
│ ├── predictor.py
│ ├── tests
│ │ └── __init__.py
│ └── utils.py
├── notebooks
├── requirements.txt
├── scripts
│ └── exampack-run
└── setup.py
Open your favourite text editor and proceed with the challenge.
📝 In this challenge, you are provided with a trained model saved as model.joblib
. The goal is to create an API that will predict the popularity of a song based on its other features.
👉 You will only need to edit the code of the API in api/app.py
🚨
👉 The package versions listed in requirements.txt
should work out of the box with the pipelined model saved in model.joblib
The requirements.txt
file lists the exact version of the packages required in order to be able to load the pipelined model that we provide.
pip install -r requirements.txt
👉 If you encounter a version conflict while installing the packages 👈
In this case you will need to create a new virtual environment in order to be able to load the pipeline.
👉 Only execute this commands if you encounter an issue while installing the packages 🚨
pyenv install 3.8.6
pyenv virtualenv 3.8.6 certif
pyenv local certif
pip install -r requirements.txt
📝 Start a uvicorn
server in order to make sure that the setup works correctly.
Run the server:
uvicorn api.app:app --reload
Open your browser at http://localhost:8000/
👉 You should see the response { "ok": true }
You will now be able to work on the content of the API while uvicorn
automatically reloads your code as it changes.
Predict the popularity of a Spotify song
GET /predict
Parameter | Type | Description |
---|---|---|
acousticness | float | whether the track is acoustic |
danceability | float | describes how suitable a track is for dancing |
duration_ms | int | duration of the track in milliseconds |
energy | float | represents a perceptual measure of intensity and activity |
explicit | int | whether the track has explicit lyrics |
id | string | id for the track |
instrumentalness | float | predicts whether a track contains no vocals |
key | int | the key the track is in |
liveness | float | detects the presence of an audience in the recording |
loudness | float | the overall loudness of a track in decibels |
mode | int | modality of a track |
name | string | name of the track |
release_date | string | release date |
speechiness | float | detects the presence of spoken words in a track |
tempo | float | overall estimated tempo of a track in beats per minute |
valence | float | describes the musical positiveness conveyed by a track |
artist | string | artist who performed the track |
Returns a dictionary with the artist
, the name
of the song and predicted popularity
as an integer.
Example request:
/predict?acousticness=0.654&danceability=0.499&duration_ms=219827&energy=0.19&explicit=0&id=0B6BeEUd6UwFlbsHMQKjob&instrumentalness=0.00409&key=7&liveness=0.0898&loudness=-16.435&mode=1&name=Back%20in%20the%20Goodle%20Days&release_date=1971&speechiness=0.0454&tempo=149.46&valence=0.43&artist=John%20Hartford
Example response:
{
"artist": "John Hartford",
"name": "Back in the Goodle Days",
"popularity": 22
}
👉 It is your turn, code the endpoint in api/app.py
. If you want to verify what data types the pipeline expects, have a look at the docstring of the create_pipeline
method in exampack/trainer.py
.
📝 Push your API to production on the hosting service of your choice.
👉 If you opt for Google Cloud Platform 👈
Once you have changed your GCP_PROJECT_ID
in the Makefile
, run the following commands to build and deploy your containerized API to Container Registry and finally Cloud Run.