First of all, I must thank Chris Whong for obtaining the incredible dataset that is used in this project. Here is his little odyssey to get the data.
After cleaning the original dataset and taking a sample from it, it's possible to predict, with an accuracy of 71.74%, whether the tip for a NYC taxi trip is going to be less than 20% or greater than or equal to 20% of the charge.
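As a minimal sketch of the prediction target described above (not the exact code from the notebooks), the binary label and the accuracy metric could be computed like this. The function names `tip_label` and `accuracy` and the sample values are assumptions made for illustration only, and "charge" is taken here to mean whatever amount the tip is compared against.

```python
def tip_label(tip, charge, threshold=0.20):
    """Return 1 if the tip is at least `threshold` of the charge, else 0.

    Hypothetical helper: the 20% threshold comes from the text above,
    the argument names are assumptions.
    """
    if charge <= 0:
        raise ValueError("charge must be positive")
    return int(tip / charge >= threshold)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Made-up example trips as (tip, charge) pairs, not real data.
trips = [(2.0, 10.0), (1.5, 10.0), (5.0, 20.0)]
labels = [tip_label(tip, charge) for tip, charge in trips]
```

Framing the problem as two classes keeps the metric simple: an accuracy of 71.74% means that roughly 72 out of every 100 sampled trips were placed on the correct side of the 20% threshold.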
For an extended version, there are some IPython notebooks that describe the complete process. You can find them in this repo, but for a better reading experience use this nbviewer link.