Link: https://datahack.analyticsvidhya.com/contest/hikeathon/
Private Leaderboard Rank: 1, Private Leaderboard Score: 0.9409118988
Public Leaderboard Rank: 1, Public Leaderboard Score: 0.9415281974
VM specs used while working out these steps:
RAM - 156 GB
CPU - 32 cores
Min free hard disk space - 100 GB
Environment Setup:
Install Anaconda version 4.5.2
Create a virtual environment with Python 2.7.15
Run pip install -r requirement.txt to install all the dependencies
Execution Steps:
Run the following notebooks in order (files generated by one notebook may be used in notebooks that follow later)
- undirected_graph_features.ipynb
Run time - around 30 hours
The script generates the following files, which are used in final model building (a sketch follows the file list):
degrees_contact.pkl - degrees for the undirected contact graph
cluster_coeffs.pkl - clustering coefficients for the node pairs in the full data set
triangles.pkl - number of triangles for the node pairs in the full data set
jc_rsa_pa_aai.csv - graph features for the node pairs in the full data set
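The notebook is the authoritative code; purely as orientation, here is a minimal sketch of how such features can be computed with networkx. The node1/node2 column names and the train.csv file name are assumptions, and jc/rsa/pa/aai are read here as Jaccard coefficient, resource allocation, preferential attachment and Adamic-Adar index (also an assumption):

```python
import pickle
import networkx as nx
import pandas as pd

# Assumed input: contact pairs with hypothetical column names node1, node2.
pairs = pd.read_csv('train.csv', usecols=['node1', 'node2'])

# Undirected contact graph over all pairs.
G = nx.Graph()
G.add_edges_from(pairs.itertuples(index=False, name=None))

# Per-node features: degree, clustering coefficient, triangle count.
degrees = dict(G.degree())
cluster_coeffs = nx.clustering(G)
triangles = nx.triangles(G)
for name, obj in [('degrees_contact.pkl', degrees),
                  ('cluster_coeffs.pkl', cluster_coeffs),
                  ('triangles.pkl', triangles)]:
    with open(name, 'wb') as f:
        pickle.dump(obj, f)

# Per-pair link-prediction features -- plausibly the jc/rsa/pa/aai
# in the output file name (an interpretation, not confirmed).
ebunch = list(pairs.itertuples(index=False, name=None))
jc = [s for _, _, s in nx.jaccard_coefficient(G, ebunch)]
ra = [s for _, _, s in nx.resource_allocation_index(G, ebunch)]
pa = [s for _, _, s in nx.preferential_attachment(G, ebunch)]
aa = [s for _, _, s in nx.adamic_adar_index(G, ebunch)]
pd.DataFrame({'jc': jc, 'rsa': ra, 'pa': pa, 'aai': aa}).to_csv(
    'jc_rsa_pa_aai.csv', index=False)
```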
- directed_degrees_feature_creation.ipynb
Run time - around 1.5 hours
The script generates the following file, which is used in final model building (a sketch follows):
directed_degrees.pkl - degree features for the directed contact graph
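A minimal sketch of the directed-degree idea, assuming each row is a directed contact node1 -> node2 (hypothetical column names and contact direction):

```python
import pickle
import pandas as pd

# Assumed input: directed contact pairs node1 -> node2.
pairs = pd.read_csv('train.csv', usecols=['node1', 'node2'])

# In-/out-degrees of the directed contact graph, computed with plain
# value_counts rather than building an explicit graph object.
out_deg = pairs['node1'].value_counts().to_dict()
in_deg = pairs['node2'].value_counts().to_dict()

with open('directed_degrees.pkl', 'wb') as f:
    pickle.dump({'out_degree': out_deg, 'in_degree': in_deg}, f)
```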
- neighbours_features.ipynb
Run time - around 12 hours
The script generates the following files, which are used in final model building (a sketch follows the file list):
neigbours_vars_pat_leftover_2.pkl
neigbours_vars_sahil_2.csv
neigbours_vars_sahil_1.csv
degree_2_neighbour_feats.pkl
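The precise neighbour features aren't documented beyond the file names; below is a sketch of two typical variants, common-neighbour counts and 2-hop (degree-2) neighbourhood sizes. Both are assumptions about what the notebook computes:

```python
import pickle
from collections import defaultdict
import pandas as pd

pairs = pd.read_csv('train.csv', usecols=['node1', 'node2'])  # assumed names

# Adjacency sets of the undirected contact graph.
adj = defaultdict(set)
for a, b in pairs.itertuples(index=False, name=None):
    adj[a].add(b)
    adj[b].add(a)

def common_neighbours(a, b):
    # Number of contacts shared by the two nodes of a pair.
    return len(adj[a] & adj[b])

def two_hop_size(a):
    # Nodes reachable in exactly two hops, excluding direct contacts and a.
    hop2 = set()
    for n in adj[a]:
        hop2 |= adj[n]
    return len(hop2 - adj[a] - {a})

feats = {
    (a, b): (common_neighbours(a, b), two_hop_size(a), two_hop_size(b))
    for a, b in pairs.itertuples(index=False, name=None)
}
with open('degree_2_neighbour_feats.pkl', 'wb') as f:
    pickle.dump(feats, f)
```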
- leak_analysis.ipynb
Run time - around 1 hour
The script generates the following file, which is used in final model building (a sketch follows):
leak_feature.pkl
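The nature of the leak is not documented here. One leak commonly exploited in pairwise link-prediction contests is whether the reversed pair also occurs in the data; the sketch below shows that idea and is purely hypothetical, not necessarily the leak this notebook uses:

```python
import pickle
import pandas as pd

pairs = pd.read_csv('train.csv', usecols=['node1', 'node2'])  # assumed names

# Hypothetical leak feature: does the reversed pair (node2, node1)
# also occur in the data? (An assumption -- the actual leak exploited
# by the notebook is not documented.)
pair_set = set(pairs.itertuples(index=False, name=None))
leak = [(b, a) in pair_set
        for a, b in pairs.itertuples(index=False, name=None)]

with open('leak_feature.pkl', 'wb') as f:
    pickle.dump(leak, f)
```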
- freq_train_test.ipynb
Merges the user features with the train and test data and applies frequency encoding to the nodes (see the sketch after the file list)
Run time - around 1.5 hours
freq_new_train.pkl - frequency-encoded train data with user features and contact degrees
freq_new_test.pkl - frequency-encoded test data with user features and contact degrees
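A minimal sketch of the frequency-encoding step for the train side, assuming a user_features.csv keyed by a node_id column (all names are assumptions):

```python
import pandas as pd

train = pd.read_csv('train.csv')                    # assumed file name
user_features = pd.read_csv('user_features.csv')    # assumed file name

# Merge user features onto both ends of each node pair.
for side in ('node1', 'node2'):
    train = train.merge(user_features.add_prefix(side + '_'),
                        left_on=side, right_on=side + '_node_id',
                        how='left')

# Frequency encoding: replace each node id by how often it occurs,
# so the model sees activity level rather than an arbitrary id.
freq = pd.concat([train['node1'], train['node2']]).value_counts()
train['node1_freq'] = train['node1'].map(freq)
train['node2_freq'] = train['node2'].map(freq)

train.to_pickle('freq_new_train.pkl')
```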
- train_script.ipynb
Merges all the feature sets above to create the final modelling data and builds 10 LightGBM models on it (a training sketch follows this item)
Run time - around 20 hours (approx. 2 hours per model)
As a sanity check, the final data set should contain 128 columns
Saves pickles of all the models
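A minimal training sketch. Whether the 10 models were folds, seeds, or parameter variants is not stated, so the 10-fold split, the is_chat target name, the final_train.pkl input name, and the parameters below are all assumptions:

```python
import pickle
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import KFold

data = pd.read_pickle('final_train.pkl')   # hypothetical merged-data file
assert data.shape[1] == 128                # sanity check from the note above

X = data.drop(columns=['is_chat'])         # 'is_chat' target name is assumed
y = data['is_chat']

# Ten LightGBM models, here one per fold (assumed scheme); each model
# is pickled, matching the "saves pickles of all the models" step.
params = {'objective': 'binary', 'metric': 'auc', 'learning_rate': 0.05}
folds = KFold(n_splits=10, shuffle=True, random_state=0)
for i, (tr, va) in enumerate(folds.split(X)):
    model = lgb.train(params,
                      lgb.Dataset(X.iloc[tr], y.iloc[tr]),
                      num_boost_round=2000,
                      valid_sets=[lgb.Dataset(X.iloc[va], y.iloc[va])])
    with open('lgb_model_%d.pkl' % i, 'wb') as f:
        pickle.dump(model, f)
```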
- test_script.ipynb
Creates the final test data and the final predictions (a sketch follows)
Generates final_sol.csv, which is used for the final submission
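A minimal prediction sketch; the mean blend over the 10 pickled models, the final_test.pkl input name, and the id/is_chat submission columns are assumptions:

```python
import pickle
import numpy as np
import pandas as pd

X_test = pd.read_pickle('final_test.pkl')  # hypothetical merged test file

# Average the predictions of the ten saved models (a simple mean blend --
# an assumption about how the final predictions are combined).
preds = np.zeros(len(X_test))
for i in range(10):
    with open('lgb_model_%d.pkl' % i, 'rb') as f:
        model = pickle.load(f)
    preds += model.predict(X_test) / 10.0

# Submission layout (id/is_chat column names are assumed).
sub = pd.DataFrame({'id': np.arange(len(X_test)), 'is_chat': preds})
sub.to_csv('final_sol.csv', index=False)
```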