Description:
This project explores sports event attendance data using Python libraries such as Pandas and pandasql, together with the visualization tools Matplotlib and Seaborn. It covers loading and manipulating sports event data from Excel files, running SQL-like queries in Python, and visualizing attendance trends by factors such as game timing and weather conditions.
Key Features and Analysis Performed:
- Data Loading and Preparation: Imported datasets from Excel files into Pandas DataFrames, including game sales, UCP scans, and reservations.
- SQL-like Data Queries: Used pandasql for SQL-style querying within Python. Analyses included comparing attendance for evening and afternoon games, monthly attendance trends, and UCP member behavior.
- Attendance Analysis: Investigated the relationship between game timing (evening vs. afternoon) and attendance, average attendance by month, and the behavior of UCP members who met specific reservation and scan thresholds.
- Web Scraping: Used Python's requests and BeautifulSoup to scrape game data from a sports website and extend the dataset.
- Data Transformation and Feature Engineering: Added calculated fields such as game time and attendance status, then merged the result with weather data for a more complete analysis.
- Data Visualization: Used Matplotlib and Seaborn to plot various aspects of the data, such as attendance trends by game time, month, and other factors.
- Predictive Modeling and Machine Learning: Explored machine learning models including Linear Regression, Random Forest, and XGBoost to predict attendance, with hyperparameter tuning and model evaluation using RMSE and R² score.
- Feature Importance Analysis: Determined the importance of various features in the attendance prediction using the Random Forest model.
- Custom Predictions and Insights: Made custom attendance predictions based on specific game conditions and visualized true vs. predicted attendance comparisons.
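
Example (SQL-style attendance query): the evening-versus-afternoon comparison listed above can be sketched roughly as follows. This is an illustration only, not the project's actual query. The table name `games`, the columns `start_hour` and `attendance`, the 17:00 cutoff for an evening game, and the sample rows are all assumptions made up for the example. It uses the standard library's `sqlite3` module so it runs without extra installs; pandasql accepts the same SQLite syntax against DataFrames.

```python
import sqlite3

# Hypothetical sample rows: (game_date, start_hour, attendance).
SAMPLE_GAMES = [
    ("2023-04-08", 13, 1200),
    ("2023-04-15", 19, 1800),
    ("2023-05-06", 14, 1000),
    ("2023-05-20", 18, 2200),
]

# Assumed rule: games starting at 17:00 or later count as evening games.
QUERY = """
    SELECT CASE WHEN start_hour >= 17 THEN 'evening' ELSE 'afternoon' END AS game_time,
           COUNT(*) AS games,
           AVG(attendance) AS avg_attendance
    FROM games
    GROUP BY game_time
    ORDER BY game_time
"""


def average_attendance_by_game_time(rows):
    """Load rows into an in-memory SQLite table and aggregate by game time."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE games (game_date TEXT, start_hour INTEGER, attendance INTEGER)"
        )
        conn.executemany("INSERT INTO games VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = average_attendance_by_game_time(SAMPLE_GAMES)
for game_time, games, avg_attendance in result:
    print(f"{game_time}: {games} games, average attendance {avg_attendance:.0f}")
```

With pandasql, the same query text would be passed to `sqldf` together with a DataFrame named `games` instead of being run through an explicit connection.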
Technologies Used: Python, Pandas, pandasql, requests, BeautifulSoup, Matplotlib, Seaborn, Machine Learning (scikit-learn, XGBoost).
The project examines factors associated with sports event attendance, such as game timing, month, and weather, by combining Python and SQL-style analysis with machine learning models for attendance prediction. The resulting visualizations and models can support sports event management and marketing decisions.
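
Example (evaluation metrics): the RMSE and R² scores mentioned under predictive modeling can be computed as in the minimal sketch below. The formulas are written out by hand to show what the metrics measure; the project itself may well have used library functions instead. The attendance figures are hypothetical and are not results from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted attendance for four games.
true_attendance = [1000, 1500, 2000, 2500]
predicted_attendance = [1100, 1400, 2100, 2400]

print(f"RMSE: {rmse(true_attendance, predicted_attendance):.1f}")
print(f"R2:   {r2_score(true_attendance, predicted_attendance):.3f}")
```

RMSE is expressed in the same units as attendance, which makes it easy to read as a typical prediction error. R² is unitless and reports the share of variance in attendance that the model explains.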