The goal of this project is to improve the basket prediction algorithm for Instacart, aiming to increase the F1 score from 0.25 to at least 0.28. Various techniques were used to explore customer purchasing behavior and enhance prediction accuracy. The final model surpassed the success threshold, achieving an F1 score of 0.30. Detailed insights and findings are available in the report.
- Improve the F1 score of the current basket prediction algorithm by at least 0.03 (from 0.25 to 0.28).
- Analyze purchasing patterns and explore feature engineering to enhance decision-making.
The project uses the Instacart dataset of approximately 30 million purchased items across roughly 3 million orders placed by 200,000 customers. The data is split across multiple files covering product information, user orders, and the products in each prior order.
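For orientation, here is a minimal loading sketch with Polars. The file names assume the public Instacart release rather than anything stated in the report; adjust paths to your own layout.

```python
import polars as pl

# File names assume the public Instacart release; adjust to your layout.
orders = pl.read_csv("data/orders.csv")                  # one row per order
products = pl.read_csv("data/products.csv")              # product catalogue
prior = pl.read_csv("data/order_products__prior.csv")    # items in prior orders
train = pl.read_csv("data/order_products__train.csv")    # items in training orders

print(orders.shape, prior.shape)
```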
The project relies on a handful of data processing and modeling frameworks; the key technologies (Polars, PySpark, XGBoost, LightGBM, and H2O) are covered in the approach below.
To tackle the problem, I used a combination of feature engineering, distributed computing, and GPU-accelerated training:
- Feature Engineering: Focused on user-, product-, and time-based features, such as reorder rates and reorder intervals; several of these proved strong predictors (see the sketches after this list).
- Data Processing: Initially used Polars for efficient data manipulation, then transitioned to PySpark for distributed data processing as the dataset grew in size.
- Modeling: Models were trained using XGBoost, LightGBM, and H2O, with distributed computing and GPU training for scalability.
- Validation: Employed a time-based validation strategy so the train/validation split respects the sequential nature of purchases (see the training sketch after this list).
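To make the feature-engineering step concrete, here is a minimal Polars sketch that builds user- and product-level reorder features. Column names follow the public Instacart schema; the feature set actually used in the project was broader than this.

```python
import polars as pl

prior = pl.read_csv("data/order_products__prior.csv")
orders = pl.read_csv("data/orders.csv")

# Attach user_id and order timing to every purchased item.
prior_orders = prior.join(orders, on="order_id", how="left")

# User-level behaviour: volume, reorder tendency, typical gap between orders.
user_feats = prior_orders.group_by("user_id").agg(
    pl.len().alias("user_total_items"),
    pl.col("reordered").mean().alias("user_reorder_rate"),
    pl.col("days_since_prior_order").mean().alias("user_mean_days_between_orders"),
)

# Product-level behaviour: popularity and reorder tendency.
product_feats = prior_orders.group_by("product_id").agg(
    pl.len().alias("product_order_count"),
    pl.col("reordered").mean().alias("product_reorder_rate"),
)

# One candidate row per (user, product) pair seen in the prior data.
features = (
    prior_orders.select("user_id", "product_id").unique()
    .join(user_feats, on="user_id")
    .join(product_feats, on="product_id")
)
print(features.head())
```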
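As intermediate tables grow, the same aggregations port to PySpark for distributed processing. A hypothetical sketch of the user-level aggregation above:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("instacart-features").getOrCreate()

prior = spark.read.csv("data/order_products__prior.csv", header=True, inferSchema=True)
orders = spark.read.csv("data/orders.csv", header=True, inferSchema=True)

# Same user-level aggregation as the Polars version, expressed with Spark SQL functions.
user_feats = (
    prior.join(orders, "order_id")
    .groupBy("user_id")
    .agg(
        F.count("*").alias("user_total_items"),
        F.avg("reordered").alias("user_reorder_rate"),
    )
)
user_feats.show(5)
```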
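And a hedged sketch of the training and validation step. It assumes a hypothetical `features.parquet` table holding the engineered features, a binary `reordered` label, and each user's `order_number`; LightGBM on GPU stands in for the full XGBoost/LightGBM/H2O setup, and the 0.2 decision threshold is purely illustrative.

```python
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import f1_score

# Hypothetical table produced by the feature step: one row per (user, product)
# candidate with engineered features, a `reordered` label, and `order_number`.
data = pd.read_parquet("features.parquet")

# Time-based split: each user's most recent order becomes the validation fold.
valid_mask = data["order_number"] == data.groupby("user_id")["order_number"].transform("max")
train_df, valid_df = data[~valid_mask], data[valid_mask]

feature_cols = [c for c in data.columns
                if c not in ("user_id", "product_id", "order_number", "reordered")]

model = lgb.LGBMClassifier(
    n_estimators=1000,
    learning_rate=0.05,
    device="gpu",            # GPU training; drop this on CPU-only machines
)
model.fit(
    train_df[feature_cols], train_df["reordered"],
    eval_set=[(valid_df[feature_cols], valid_df["reordered"])],
    callbacks=[lgb.early_stopping(50)],
)

# Illustrative 0.2 threshold; in practice the threshold is tuned for F1.
pred = (model.predict_proba(valid_df[feature_cols])[:, 1] > 0.2).astype(int)
print("validation F1:", f1_score(valid_df["reordered"], pred))
```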
- Reorder Patterns: Users tend to reorder on the same day, the 7th day, or the 30th day after a previous order.
- Peak Ordering Time: Orders are mostly placed between 8 AM and 4 PM.
- Product Preference: Organic products are reordered 8% more frequently than non-organic products.
- Department Reorder Rates: Dairy, Eggs, Produce, Beverages, and Bakery have reorder rates above 65%, while Personal Care and Pantry have rates below 35%.
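The kind of query behind these insights can be reproduced with a short Polars sketch; file names again assume the public Instacart release, and the exact figures depend on the data snapshot.

```python
import polars as pl

prior = pl.read_csv("data/order_products__prior.csv")
products = pl.read_csv("data/products.csv")
departments = pl.read_csv("data/departments.csv")
orders = pl.read_csv("data/orders.csv")

# Department-level reorder rates (Dairy & Eggs, Produce, etc.).
dept_reorder = (
    prior.join(products, on="product_id")
    .join(departments, on="department_id")
    .group_by("department")
    .agg(pl.col("reordered").mean().alias("reorder_rate"))
    .sort("reorder_rate", descending=True)
)

# Distribution of days between consecutive orders (peaks around 7 and 30).
gap_counts = (
    orders.drop_nulls("days_since_prior_order")
    .group_by("days_since_prior_order")
    .agg(pl.len().alias("n_orders"))
    .sort("n_orders", descending=True)
)

print(dept_reorder.head(10))
print(gap_counts.head(5))
```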
The model achieved an F1 score of 0.30, surpassing the success threshold of 0.28. Further gains could come from additional feature engineering and from sequence models such as LSTMs, GRUs, and Transformers, which handle the order history more directly.