Machine Learning Pipeline

A machine learning pipeline that predicts total spend and customer groups from a set of survey data, using H2O.ai's GBM (Gradient Boosting Machine). This is part of a data science assignment for a job interview.

Installation

Firstly, clone the repository to a folder of your choice.

Next, install the Anaconda distribution of Python, then set up an environment with the necessary libraries.

Create a conda environment pinned to a specific version of Python:
```
conda create -n <env name> python=3.7.6
```
Activate the conda environment:
```
conda activate <env name>
```
Install the libraries listed in requirements.txt. On a Windows machine, run the following at the command prompt (inside a batch file, write `%%i` instead of `%i`):
```
for /f %i in (requirements.txt) do conda install --yes %i
```
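The loop above is specific to the Windows command prompt. On macOS or Linux, the same per-package install could be sketched in Python as below. This is a minimal sketch, not part of the repository: the helper names `read_requirements`, `build_commands`, and `install_all` are hypothetical, and skipping lines that start with `#` is an assumption about how requirements.txt is written.

```python
import subprocess
from pathlib import Path


def read_requirements(path):
    """Return package specs from a requirements file.

    Blank lines and lines starting with '#' are skipped (an assumption
    about the file format, not something the repository confirms).
    """
    specs = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            specs.append(line)
    return specs


def build_commands(specs):
    """Build one `conda install --yes <spec>` command per package spec."""
    return [["conda", "install", "--yes", spec] for spec in specs]


def install_all(path="requirements.txt"):
    """Run the install commands one by one inside the active environment."""
    for cmd in build_commands(read_requirements(path)):
        # check=False: carry on with the next package if one install fails,
        # which mirrors the behaviour of the command-prompt loop.
        subprocess.run(cmd, check=False)
```

With the environment activated, calling `install_all("requirements.txt")` from a Python session in the repository root would install each listed package in turn.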

The input dataset is not included in this repo for confidentiality reasons.

Once the input dataset is available, run the training script:

python src/train.py

This runs the machine learning pipeline, which processes the input data and generates predictions for both total spend and customer groups. The results of the model are as follows:

For the other models tried during the exploration phase, please refer to notebooks/Assignment 1 and 2.html.
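As a rough illustration of the approach described above (one GBM for total spend, one for customer groups), the sketch below trains two H2O gradient boosting models on the same features. It is not the code in src/train.py: the column names `total_spend` and `customer_group`, the 80/20 split, and the `ntrees` value are all assumptions made for the example, and running `train_models` requires the `h2o` package and a Java runtime.

```python
def feature_columns(all_columns, targets):
    """Treat every column that is not a prediction target as a feature."""
    return [col for col in all_columns if col not in targets]


def train_models(csv_path, spend_col="total_spend", group_col="customer_group"):
    """Train one GBM regressor and one GBM classifier (hypothetical columns)."""
    # Imported lazily so the helper above can be used without starting H2O.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()
    frame = h2o.import_file(csv_path)

    # A categorical (factor) target makes the GBM train as a classifier.
    frame[group_col] = frame[group_col].asfactor()

    # Assumed 80/20 train/test split with a fixed seed for repeatability.
    train, test = frame.split_frame(ratios=[0.8], seed=42)
    features = feature_columns(frame.columns, [spend_col, group_col])

    # Numeric target: regression on total spend.
    spend_model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
    spend_model.train(x=features, y=spend_col, training_frame=train)

    # Factor target: classification of customer groups.
    group_model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
    group_model.train(x=features, y=group_col, training_frame=train)

    return spend_model.predict(test), group_model.predict(test)
```

Using the same feature set for both targets keeps the two models comparable; the actual pipeline may preprocess the survey data or tune the models differently.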
