This repo accompanies the Machine Learning in Production course. The code is written for educational purposes: it is neither real production code nor a toy example that is easy to understand but useless. We tried to make it as close as possible to a real production system, emphasizing some parts and omitting others to keep it readable.
In the 2020 edition we train a model to tag Stack Overflow questions. The data is publicly available here. In short:
- We build a pipeline in Airflow to preprocess data in Google's BigQuery.
- We create Python packages, with their corresponding tests, to preprocess text, train a model, and make predictions with it.
- We create Dockerfiles to run a Flask app that serves the model.
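
To give a feel for the "Python package plus tests" step, here is a minimal sketch of a text-preprocessing function and a pytest-style test for it. The names `preprocess` and `test_preprocess_keeps_language_names` are hypothetical and are not taken from this repo; the actual packages may clean the text differently. The sketch assumes question bodies arrive as HTML and uses only the standard library.

```python
import html
import re
from typing import List

# Hypothetical helpers for illustration; not the repo's actual API.
_TAG_RE = re.compile(r"<[^>]+>")
# Tokens start with a letter or digit and may contain characters that are
# common in technology names, such as "c++", "c#" or "node.js".
_TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9+#.\-]*")


def preprocess(text: str) -> List[str]:
    """Strip HTML tags, unescape entities, lowercase, and tokenize."""
    no_tags = _TAG_RE.sub(" ", text)
    unescaped = html.unescape(no_tags)
    tokens = _TOKEN_RE.findall(unescaped.lower())
    # Remove sentence punctuation left at the end of a token ("python." -> "python").
    return [token.rstrip(".-") for token in tokens]


def test_preprocess_keeps_language_names():
    # A test like this can be collected by pytest when placed in a tests/ folder.
    question = "<p>How do I use C++ &amp; C# in Python?</p>"
    expected = ["how", "do", "i", "use", "c++", "c#", "in", "python"]
    assert preprocess(question) == expected
```

Keeping preprocessing in a small, pure function like this makes it easy to reuse the same code at training time and inside the serving app, and to test it without any external services.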