CTR prediction models implemented purely with Spark MLlib, with no third-party libraries.
- Naive Bayes
- Logistic Regression
- Factorization Machine
- Random Forest
- Gradient Boosted Decision Tree
- GBDT + LR
- Neural Network
- Inner Product Neural Network (IPNN)
- Outer Product Neural Network (OPNN)
A small sample of a public ads dataset is included for testing and initial debugging.
You can directly compare the models on metrics such as the area under the ROC curve and the area under the P-R curve.
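As a minimal sketch of how both metrics can be computed with stock Spark ML (not necessarily how this project does it), the snippet below uses `BinaryClassificationEvaluator`. The object name `MetricsSketch` and the column names `label` and `rawPrediction` are assumptions for illustration.

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of this project.
object MetricsSketch {
  // `predictions` is assumed to contain a numeric "label" column and a
  // "rawPrediction" column (a Double score or an ML Vector of raw scores).
  // Returns (area under ROC, area under P-R).
  def evaluate(predictions: DataFrame): (Double, Double) = {
    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setRawPredictionCol("rawPrediction")
    val aucRoc = evaluator.setMetricName("areaUnderROC").evaluate(predictions)
    val aucPr = evaluator.setMetricName("areaUnderPR").evaluate(predictions)
    (aucRoc, aucPr)
  }
}
```

Calling `MetricsSketch.evaluate` on the output of each fitted model's `transform` gives one pair of numbers per model, which can then be compared side by side.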
Data Format
root
|-- user_id: integer (user id)
|-- item_id: integer (item id)
|-- category_id: integer (item category id)
|-- content_type: string (item content type)
|-- timestamp: string (timestamp)
|-- user_item_click: long (number of times the user clicked the item)
|-- user_item_imp: double (number of times the user viewed the item, i.e. impressions)
|-- item_ctr: double (historical CTR of the item)
|-- is_new_user: integer (whether the user is a new user)
|-- user_embedding: array (embedding of the user)
|    |-- element: double
|-- item_embedding: array (embedding of the item)
|    |-- element: double
|-- label: integer (label of the sample: 0 = negative, 1 = positive)
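For reference, the schema above can be written out as a Spark `StructType`, which may be handy when loading the samples from a source that carries no embedded schema. This is a sketch only: the object name `CtrSchema` is hypothetical, and how the project actually loads its data is not stated here.

```scala
import org.apache.spark.sql.types._

// Hypothetical holder for the sample schema; field names and types
// mirror the printed schema, nullability is left at Spark's default.
object CtrSchema {
  val schema: StructType = StructType(Seq(
    StructField("user_id", IntegerType),
    StructField("item_id", IntegerType),
    StructField("category_id", IntegerType),
    StructField("content_type", StringType),
    StructField("timestamp", StringType),
    StructField("user_item_click", LongType),
    StructField("user_item_imp", DoubleType),
    StructField("item_ctr", DoubleType),
    StructField("is_new_user", IntegerType),
    StructField("user_embedding", ArrayType(DoubleType)),
    StructField("item_embedding", ArrayType(DoubleType)),
    StructField("label", IntegerType)
  ))
}
```

It can be passed to a reader with `spark.read.schema(CtrSchema.schema)` before choosing the file format.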
This is a Maven project built against Spark 2.3.0 and Scala 2.11.
Once Maven has imported the dependencies, simply run the example entry point (com.ggstar.example.ModelSelection) to train all the CTR models and compare their metrics.
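The code of `ModelSelection` is not reproduced here. As a rough, hypothetical illustration of fitting one of the listed models (logistic regression) with plain Spark ML, a baseline could look like the sketch below. The object name `LrBaselineSketch`, the choice of feature columns, and the iteration count are all assumptions, and the embedding columns are left out for brevity.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

// Hypothetical baseline, not the project's own implementation.
object LrBaselineSketch {
  // Numeric columns from the sample schema; this feature choice is an assumption.
  val numericCols: Array[String] =
    Array("user_item_click", "user_item_imp", "item_ctr", "is_new_user")

  // Fits a logistic regression on `train` and returns `test` with the
  // prediction columns ("rawPrediction", "probability", "prediction") appended.
  def fitAndPredict(train: DataFrame, test: DataFrame): DataFrame = {
    // Pack the numeric columns into a single feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(numericCols)
      .setOutputCol("features")
    val lr = new LogisticRegression()
      .setFeaturesCol("features")
      .setLabelCol("label")
      .setMaxIter(20)
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(train)
    model.transform(test)
  }
}
```

The returned DataFrame keeps the `label` column, so it can be scored with any binary classification evaluator.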
- [GBDT+LR]Practical Lessons from Predicting Clicks on Ads at Facebook.pdf
- [FNN]Deep Learning over Multi-field Categorical Data.pdf
- [Multi-Task]An Overview of Multi-Task Learning in Deep Neural Networks.pdf
- [PNN]Product-based Neural Networks for User Response Prediction.pdf
- [Wide & Deep]Wide & Deep Learning for Recommender Systems.pdf
- [DeepFM]- A Factorization-Machine based Neural Network for CTR Prediction.pdf
- Deep Crossing- Web-Scale Modeling without Manually Crafted Combinatorial Features.pdf
- Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction.pdf
- Entire Space Multi-Task Model_ An Effective Approach for Estimating Post-Click Conversion Rate.pdf
- Deep Interest Network for Click-Through Rate Prediction.pdf
- Bid-aware Gradient Descent for Unbiased Learning with Censored Data in Display Advertising.pdf
- Ad Click Prediction a View from the Trenches.pdf
- Image Matters- Visually modeling user behaviors using Advanced Model Server.pdf
- Logistic Regression in Rare Events Data.pdf
- Deep & Cross Network for Ad Click Predictions.pdf
- Learning Deep Structured Semantic Models for Web Search using Clickthrough Data.pdf
- Adaptive Targeting for Online Advertisement.pdf