New-York-City-Taxi-Trip-Duration

Project Overview

This project focus on predicting the duration of NYC taxi rides using machine learning techniques. The workflow include data preprocessing, feature engineering, model development, and performance evaluation. Ridge Regression serves as the primary predictive model, with efforts focused on maximizing its performance.

Data Description

The dataset contains information about NYC taxi rides. columns include:

vendor_id : A unique identifier representing the taxi service provider associated with the trip.
pickup_datetime : The exact date and time when the taxi meter was started, indicating the beginning of the trip.
passenger_count : The number of passengers in the taxi, as recorded by the driver.
pickup_longitude : The geographic longitude of the location where the trip started.
pickup_latitude : The geographic latitude of the location where the trip started.
dropoff_longitude : The geographic longitude of the location where the trip ended.
dropoff_latitude : The geographic latitude of the location where the trip ended.
trip_duration : The total time of the trip, measured in seconds, from start to finish.

Feature Engineering

Various features are engineered to improve the model's performance:

Time-Based Features: Extracted from pickup_datetime (hour, dayofmonth, dayofweek, month,etc..).
Geographical Features: Direction and Distances between pickup and dropoff locations.
Log Transformation: Applied to trip duration, distance and manhattan distance to reduce skewness.
Rush Hour: A newly created feature that identifies whether the trip occurred during a peak traffic period

Model

The model which used in this project is Ridge Regression. It has pipeline contains of:

Column Transformer: Applies OneHotEncoder to categorical features and StandardScaler to numeric features.
Polynomail Features : Generates polynomial features to enable the model to capture more complex patterns in the data
Ridge Regression: A linear regression model with L2 regularization to prevent overfitting.

Evaluation Metrics

The model is evaluated using the following metrics:

Root Mean Squared Error (RMSE): Evaluates the average magnitude of the prediction errors.
R² Score: Reflects the proportion of variance in the target variable that can be explained by the model's features.

Final Model results

Train Evaluation : RMSE: 0.4254, R²: 0.6700
Test Evaluation : RMSE: 0.4977, R²: 0.6130

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Data		Data
Models		Models
Notebooks		Notebooks
Scripts		Scripts
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

New-York-City-Taxi-Trip-Duration

Project Overview

Data Description

Feature Engineering

Model

Evaluation Metrics

Final Model results

About

Releases

Packages

Languages

MamdouhHisham/New-York-City-Taxi-Trip-Duration

Folders and files

Latest commit

History

Repository files navigation

New-York-City-Taxi-Trip-Duration

Project Overview

Data Description

Feature Engineering

Model

Evaluation Metrics

Final Model results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages