Public Transport (Prasarana) Kuala Lumpur Ridership Forecasting based on various factors
Data sources:
- Data.gov.my
- Publicholidays.com.my
- Visualcrossing.com (Weather)
- Wikipedia
The merged and cleaned dataset, df_cleaned.csv, is available at:
https://drive.google.com/file/d/10z8PAfh2_Sp8KeEhHtPN3YbIJ3iI5ooD/view?usp=share_link
Introduction
According to a recent NST article, Malaysian Transport Minister Anthony Loke stated that cars made up the largest share of registered vehicles, with 17,244,978 cars in 2023. The total number of registered vehicles in Malaysia has passed 36.3 million, which exceeds the country's population. This many vehicles on the road is a concern because it can lead to heavy traffic congestion in the Klang Valley and produces a large amount of CO2 emissions. Transport is Malaysia's second largest source of carbon dioxide emissions after electricity and heat production (Solaymani, 2022). Other concerns include the lack of low-carbon technologies, slow vehicle electrification, the limited rollout of EV charging ports across Malaysia, and poor rail and bus connectivity both within the Klang Valley and in more rural states. We aim to explore the main factors behind fluctuations in public transport ridership, such as weather, COVID-19, fuel price, and public holidays, and to predict ridership for each Prasarana rail and bus service. Ultimately, we aim to contribute to infrastructure optimization, for instance by adjusting service intervals during the rainy season, and to provide data that supports decision-making.
Business Objective
- To develop a time series forecasting model using the Prasarana ridership dataset obtained from data.gov.my
- To build a regression model of the factors that may influence ridership on Prasarana services, with a target accuracy of 70%
- To explore and analyze the data using various EDA techniques to uncover patterns and insights in the dataset
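The 70% accuracy target above is not tied to a specific metric in this README. As a minimal sketch, assuming accuracy is read as 1 minus the mean absolute percentage error (MAPE), the check could look like the following. The function names and the sample ridership figures are hypothetical and only illustrate the calculation; they are not taken from the project.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, skipping days with zero actual ridership."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score")
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def accuracy_from_mape(actual, predicted):
    """Assumed definition of accuracy: 1 - MAPE (not confirmed by the project)."""
    return 1.0 - mape(actual, predicted)


def meets_target(actual, predicted, target=0.70):
    """True when the assumed accuracy reaches the target (70% by default)."""
    return accuracy_from_mape(actual, predicted) >= target


# Hypothetical daily ridership values, for illustration only.
actual = [100_000, 120_000, 80_000, 110_000]
predicted = [90_000, 126_000, 100_000, 99_000]

print(f"accuracy = {accuracy_from_mape(actual, predicted):.3f}")
print("meets 70% target:", meets_target(actual, predicted))
```

Other readings of the target are equally plausible for a regression model, for example an R-squared of 0.70, so the metric should be stated explicitly alongside the result.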
Understanding our dataset
Our dataset combines several sources: Wikipedia, data.gov.my, visualcrossing.com (weather), and publicholidays.com.my. The ridership and fuel price data from data.gov.my were the latest available on the date we extracted them and cover 2019 to November 2023, while the other datasets cover 2019 to 2023. The purpose of the dataset is to investigate the factors that may contribute to a spike or drop in Prasarana ridership. We will explore the data's dimensions, content, structure, and summary in the section below.
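Because the sources above are all daily records, one way to combine them is a left join on the calendar date, keeping every ridership row. The sketch below illustrates that idea only. It is not the code used to produce df_cleaned.csv, and the column names (`ridership`, `precip_mm`, `is_holiday`) and sample values are hypothetical.

```python
def merge_by_date(ridership_rows, weather_by_date, holiday_dates):
    """Left-join weather and holiday flags onto ridership rows by ISO date string."""
    merged = []
    for row in ridership_rows:
        day = row["date"]
        weather = weather_by_date.get(day, {})
        merged.append({
            **row,
            # None marks a day with no weather record, so gaps stay visible.
            "precip_mm": weather.get("precip_mm"),
            # Any date absent from the holiday set is treated as a normal day.
            "is_holiday": day in holiday_dates,
        })
    return merged


# Hypothetical sample rows, for illustration only.
ridership_rows = [
    {"date": "2023-01-01", "ridership": 50_000},
    {"date": "2023-01-02", "ridership": 95_000},
    {"date": "2023-01-03", "ridership": 98_000},
]
weather_by_date = {
    "2023-01-01": {"precip_mm": 12.4},
    "2023-01-03": {"precip_mm": 0.0},
}
holiday_dates = {"2023-01-01"}

merged = merge_by_date(ridership_rows, weather_by_date, holiday_dates)
for row in merged:
    print(row)
```

Keeping missing weather as `None` instead of filling in zero avoids mistaking a data gap for a dry day when the merged table is later used for modelling.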
Link to RPubs: http://rpubs.com/jsheng/prasaranaforecast
Link to Video Presentation: https://drive.google.com/drive/folders/1jBYMxtQJhuaimiZw1udxb7P67n41ppEa?usp=share_link
Link to Youtube: https://www.youtube.com/watch?v=agMUD5AtvEI
References
- https://www.nst.com.my/news/nation/2023/12/987062/363-million-vehicles-malaysia
- Solaymani, S. (2022). CO2 emissions and the transport sector in Malaysia. Frontiers in Environmental Science, 9, 774164. https://doi.org/10.3389/fenvs.2021.774164
I would like to thank all of my group members for their contributions and support in this project.
- @Nisyhaal
- @juinnsheng
- @marknicholas15
- @Triyin
- Zhang Yan Yu Bo
- Bruce Xiao Rui Jie