DrivenData-PumpItUp

This repository contains R code for the Pump it Up: Data Mining the Water Table competition on Driven Data.

The data is provided by Taarifa and the Tanzanian Ministry of Water. The goal is to predict whether a water pump is functional, functional but in need of repair, or non-functional.

I use H2O's random forest to get a score of 0.821. I have uploaded my best (current) submission but not the data. Sign up at Driven Data to download the following files:

SubmissionFormat.csv
Test set values.csv
Training set labels.csv
Training set values.csv

Read the data and do some preprocessing

The first step is to read the data and set some values to missing (NA in R). See read-data.md.
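
As a rough illustration of this step, the sketch below reads a tiny inline CSV and recodes placeholder values to NA. The column names, the "unknown" marker and the rule that 0 means "not recorded" are assumptions made for the example. They are not taken from read-data.R.

```r
# Toy stand-in for "Training set values.csv"; columns and values are illustrative.
csv_text <- "id,funder,construction_year,longitude
1,Roman,1999,34.94
2,,0,0
3,unknown,2010,38.49"

# Treat empty strings and the literal "unknown" as missing while reading.
values <- read.csv(text = csv_text, na.strings = c("", "unknown"),
                   stringsAsFactors = FALSE)

# Recode numeric placeholders (assumed here: 0 means "not recorded") to NA.
# %in% is used instead of == so that existing NAs do not break the indexing.
values$construction_year[values$construction_year %in% 0] <- NA
values$longitude[values$longitude %in% 0] <- NA

# Count missing values per column.
print(colSums(is.na(values)))
```

With the real files, the same call would take a file path instead of `text =`.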

Engineer features

The next step is to clean up the features (transform some, remove others) and possibly engineer some new features. See transform-data.md.
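
A minimal sketch of what such a step could look like, using base R only. The columns (`date_recorded`, `construction_year`, `recorded_by`) and the derived `age_years` feature are hypothetical choices for illustration. The transformations actually applied are the ones in transform-data.R.

```r
# Toy data frame; the column names are assumptions made for this example.
values <- data.frame(
  id = 1:3,
  date_recorded = as.Date(c("2011-03-14", "2013-02-04", "2012-10-01")),
  construction_year = c(1999, NA, 2010),
  recorded_by = "same value everywhere",
  stringsAsFactors = FALSE
)

# New feature: age of the pump (in years) when the record was taken.
recorded_year <- as.integer(format(values$date_recorded, "%Y"))
values$age_years <- recorded_year - values$construction_year

# Remove columns that carry no information (a single distinct value).
is_constant <- vapply(values, function(col) length(unique(col)) == 1, logical(1))
values <- values[, !is_constant, drop = FALSE]

print(names(values))
```

Missing construction years simply propagate to a missing age, which leaves the decision about imputation to a later step.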

Predict status with a random forest

Use a random forest to predict the functionality status of pumps in the test set. See predict-data.md.
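
The outline below sketches how a fit-and-predict step with H2O's random forest might be wired together. It is not the code from predict-data.R: the function names `fit_and_predict` and `make_submission`, the column names `id` and `status_group`, and the seed are assumptions, and `ntrees = 1000` is only inferred from the submission file name. The H2O function is defined but not called here, because running it needs the h2o package and a Java runtime.

```r
# Hypothetical helper: build a submission table with an id and a predicted label.
make_submission <- function(ids, predicted) {
  data.frame(id = ids, status_group = as.character(predicted),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper around H2O's random forest (needs h2o and Java to run).
fit_and_predict <- function(train, test, target = "status_group",
                            id_col = "id", ntrees = 1000) {
  h2o::h2o.init(nthreads = -1)                   # start or connect to a local H2O
  train[[target]] <- as.factor(train[[target]])  # classification needs a factor
  predictors <- setdiff(names(train), c(target, id_col))

  model <- h2o::h2o.randomForest(
    x = predictors, y = target,
    training_frame = h2o::as.h2o(train),
    ntrees = ntrees, seed = 1
  )
  # The prediction frame has a "predict" column with the predicted class.
  pred <- as.data.frame(h2o::h2o.predict(model, h2o::as.h2o(test)))
  make_submission(test[[id_col]], pred$predict)
}

# The submission helper can be tried without H2O:
submission <- make_submission(1:2, factor(c("functional", "non functional")))
print(submission)
```

Categorical predictors should be converted to factors before calling `as.h2o`, so that H2O treats them as categorical columns and not as free text.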

Update

Added a Makefile, which spins the R scripts to produce the md files. See how to Build a report based on an R script.
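
For readers unfamiliar with spinning, the sketch below shows the idea with `knitr::spin`, assuming that is the function the Makefile calls (the Makefile itself is not reproduced here). The script lines are made up for the example, and the demo is skipped when knitr is not installed.

```r
# A tiny R script held in memory; lines starting with #' become Markdown text.
script <- c(
  "#' Read the data",
  "#'",
  "#' Lines starting with #' become Markdown text.",
  "x <- c(1, NA, 3)",
  "sum(is.na(x))"
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit = FALSE only converts the script to R Markdown source;
  # the default (knit = TRUE) would also run the code chunks.
  rmd <- knitr::spin(text = script, knit = FALSE)
  cat(rmd, sep = "\n")
} else {
  message("knitr is not installed; skipping the spin() demo")
}
```

Applied to a file, the same function takes the path of an R script such as read-data.R as its first argument.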


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DrivenData-PumpItUp

Read the data and do some preprocessing

Engineer features

Predict status with a random forest

Update

About

Releases

Packages

Languages

dipetkov/DrivenData-PumpItUp
