In this tutorial, we'll cover the full process of building a beginner machine learning project. This includes creating a hypothesis, setting up the model, and measuring error. By the end, you'll understand how to build an end-to-end machine learning project using Python and Jupyter.
To make this interesting, we'll use data from historical Olympic Games. We'll try to predict how many medals a country will win based on historical and current data.
Most machine learning projects follow a similar outline, which we'll also follow here. This outline gives you a starting point for most machine learning problems.
Project Steps
- Form a hypothesis.
- Find and explore the data.
- (If necessary) Reshape the data to predict your target.
- Clean the data for ML.
- Pick an error metric.
- Split your data.
- Train a model.
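The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the rows are made-up toy values, the fields (year, athletes, medals) are hypothetical, mean absolute error is just one possible metric, and a one-feature least-squares line stands in for whatever model you end up training. It uses only the standard library so it runs before any packages are installed.

```python
# Hypothesis (assumed for illustration): teams that send more athletes win more medals.

# Toy rows of (year, athletes, medals). Made-up values, NOT real Olympic data.
rows = [
    (2000, 10, 1),
    (2000, 50, 6),
    (2004, 20, 2),
    (2004, None, 3),  # a missing value, removed in the cleaning step
    (2008, 40, 5),
    (2008, 30, 3),
    (2012, 60, 7),
    (2012, 25, 3),
]

# Clean the data: drop any row that has a missing value.
clean = [row for row in rows if None not in row]


# Pick an error metric: mean absolute error (average size of the miss).
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Split the data: train on earlier games, test on the latest games.
train = [row for row in clean if row[0] < 2012]
test = [row for row in clean if row[0] >= 2012]


# Train a model: closed-form least squares for a single feature.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([r[1] for r in train], [r[2] for r in train])

# Measure error on the held-out rows only.
predictions = [slope * r[1] + intercept for r in test]
error = mean_absolute_error([r[2] for r in test], predictions)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, test MAE={error:.3f}")
```

Splitting by year rather than at random is a design choice for this sketch: it keeps future games out of the training rows, which mirrors how a prediction would be used in practice.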
You can find the code for this project here.
File overview:
- machine_learning.ipynb - the main project code
- data_prep.ipynb - the code to generate the team-level dataset from an athlete-level dataset
To follow this project, please install the following locally:
- Python 3.8+
- Python packages:
  - pandas
  - numpy
  - scikit-learn
  - seaborn
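One way to confirm the requirements above are in place is a short check script. This is a sketch, not part of the project: `check_environment` is a hypothetical helper name, and it assumes the packages were installed under the distribution names listed above (for example with pip). It relies only on the standard library's `importlib.metadata`, which is available from Python 3.8.

```python
import sys
from importlib import metadata

# Distribution names as listed in the requirements above.
REQUIRED_PACKAGES = ["pandas", "numpy", "scikit-learn", "seaborn"]


def check_environment(packages=REQUIRED_PACKAGES, min_python=(3, 8)):
    """Return a dict with the Python check and each package's installed version.

    A package that is not installed maps to None.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


report = check_environment()
for name, value in report.items():
    print(f"{name}: {value if value is not None else 'MISSING'}")
```

Any line that prints MISSING points to a package that still needs to be installed before running the notebooks.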
We'll be using data from the Olympics, which was originally published on Kaggle.
You can download the files we'll use in this project here:
- teams.csv - the team-level data that we use in this project.
- athlete_events.csv - the original athlete-level data.