SparrowRecSys (Sparrow Recommendation System) is a movie recommendation system. The name comes from the saying "a sparrow may be small, but it has all the vital organs": the project is compact yet complete. It is a mixed-language project built with Maven that brings together the different modules of a recommendation system, using TensorFlow, Spark, and Jetty Server.
- Java 8
- Scala 2.11
- Python 3.6+
- TensorFlow 2.0+
The project data comes from the open-source movie data set MovieLens. The data set bundled with the project is a trimmed-down version of MovieLens that retains only 1,000 movies and the related ratings and user data. To get the full data set, download it from the official MovieLens website; the MovieLens 20M Dataset is recommended.
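As a rough illustration of how a full MovieLens download could be trimmed to a small subset, the sketch below keeps the first N movies and only the ratings that refer to them. The column names (`movieId`, `userId`, `rating`, `timestamp`) follow the MovieLens CSV files; the function name `filter_ratings`, the "first N movies in file order" rule, and the sample rating rows are assumptions made for this example, not the procedure the project actually used.

```python
import csv
import io

def filter_ratings(movies_csv, ratings_csv, max_movies=1000):
    """Keep the first `max_movies` movies (in file order) and only their ratings.

    Hypothetical helper: the selection rule is an assumption for illustration.
    """
    kept_movies = []
    for row in csv.DictReader(movies_csv):
        if len(kept_movies) >= max_movies:
            break
        kept_movies.append(row)
    kept_ids = {row["movieId"] for row in kept_movies}
    kept_ratings = [
        row for row in csv.DictReader(ratings_csv) if row["movieId"] in kept_ids
    ]
    return kept_movies, kept_ratings

# Tiny in-memory stand-ins for movies.csv and ratings.csv.
# The rating rows are illustrative, not taken from the real data set.
movies_text = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
"""
ratings_text = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
2,2,3.5,1141415820
"""

movies, ratings = filter_ratings(
    io.StringIO(movies_text), io.StringIO(ratings_text), max_movies=2
)
print([m["title"] for m in movies])
print([(r["userId"], r["movieId"]) for r in ratings])
```

With real files, the two `io.StringIO` objects would be replaced by open file handles for `movies.csv` and `ratings.csv`.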
The technical architecture of SparrowRecSys follows the classic industrial-grade deep learning recommendation system architecture. It includes modules for offline data processing, model training, near-line stream processing, online model serving, and front-end display of recommendation results. The following is the architecture diagram of SparrowRecSys:
It is divided into three main sections: data processing, model part, and frontend part.
- Data Processing Section
  - User Information: User data includes user actions, social relationships, and attribute tags.
  - Item Information: Item data includes item attributes, tags, and third-party information.
  - Context Information: Contextual data includes time, location, and other contextual parameters.
  - Data Processing Platforms:
    - Flink: Used for real-time data processing.
    - Spark: Used for offline data processing.
    - Redis: Used for storing user, item, and context features.
  - Feature Engineering:
    - User Features: User actions, social relationships, attribute tags.
    - Item Features: Item attributes, tags, third-party information.
    - Context Features: Time, location, and other contextual parameters.
    - Techniques: Normalization, binarization, non-linear transformations, ID features, one-hot encoding, embedding, feature combination.
- Model Part
  - Recommendation System Model and Online Serving:
    - Cold Start Strategy
    - Recall Layer: Embedding, collaborative filtering, multi-dimensional tags, social relationships, freshness update.
    - Ranking Layer: Temporal and sequential models, LR (Logistic Regression), FM (Factorization Machines), MLR (Mixed Logistic Regression), deep learning models.
    - Supplementary Strategies: Diversity, novelty, popularity, flow control, freshness.
    - Exploration and Exploitation: Interaction with the candidate item database.
  - Model Serving:
    - MLeap: Model deployment.
    - TensorFlow Serving: Model serving.
  - Model Training:
    - Platforms: Spark MLlib, TensorFlow.
    - Offline Evaluation: Metrics include AUC, Recall, RMSE.
- Frontend Part
  - Implementation: Based on HTML and JavaScript, using AJAX.
  - Recommendation Item List: Display of recommended items.
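Several of the feature-engineering techniques named in the architecture above (normalization, binarization, non-linear transformation, one-hot encoding) can be sketched in a few lines of plain Python. This is a minimal illustration only: the function names, the genre vocabulary, and the sample values are assumptions for this example, and the project itself performs this kind of processing with Spark and TensorFlow rather than with these helpers.

```python
import math

def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column, nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

def binarize(values, threshold):
    """Binarization: 1 if the value reaches the threshold, otherwise 0."""
    return [1 if v >= threshold else 0 for v in values]

def log_transform(values):
    """Non-linear transformation: log(1 + x) compresses long-tailed counts."""
    return [math.log1p(v) for v in values]

def one_hot(value, vocabulary):
    """One-hot encoding: a single 1 at the position of `value`."""
    return [1 if value == item else 0 for item in vocabulary]

def multi_hot(values, vocabulary):
    """Multi-hot encoding for multi-valued features such as movie genres."""
    present = set(values)
    return [1 if item in present else 0 for item in vocabulary]

# Illustrative inputs (assumed for this sketch).
GENRES = ["Action", "Comedy", "Drama"]
ratings = [1.0, 3.0, 5.0]

print(min_max_normalize(ratings))
print(binarize([2.0, 3.5, 5.0], threshold=3.5))
print(log_transform([0.0, 9.0, 99.0]))
print(one_hot("Comedy", GENRES))
print(multi_hot(["Action", "Drama"], GENRES))
```

Embedding and feature combination are learned or model-dependent steps, so they are left out of this sketch.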
- Word2vec (Item2vec)
- DeepWalk (Random Walk based Graph Embedding)
- Embedding MLP
- Wide&Deep
- Neural CF
- Two Towers
- DeepFM
- DIN (Deep Interest Network)
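To make the "random walk based graph embedding" idea behind DeepWalk in the list above more concrete, the sketch below builds a weighted item graph from user watch sequences and samples random walks over it. In DeepWalk, such walks are then treated like sentences and passed to a Word2vec-style model to learn item embeddings; that training step is omitted here. All names (`build_transition_counts`, `random_walk`, `generate_walks`) and the sample sequences are assumptions for illustration, and the project's own Spark implementation may differ in its details.

```python
import random
from collections import defaultdict

def build_transition_counts(sequences):
    """Count how often item `dst` directly follows item `src` in any sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return counts

def random_walk(counts, start, walk_length, rng):
    """Walk from `start`, choosing each next item in proportion to its edge count."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = counts.get(walk[-1])
        if not neighbors:
            break  # dead end: the current item has no outgoing edges
        items = list(neighbors.keys())
        weights = list(neighbors.values())
        walk.append(rng.choices(items, weights=weights, k=1)[0])
    return walk

def generate_walks(sequences, walks_per_item, walk_length, seed=0):
    """Sample `walks_per_item` walks from every item that has outgoing edges."""
    rng = random.Random(seed)
    counts = build_transition_counts(sequences)
    walks = []
    for start in list(counts):
        for _ in range(walks_per_item):
            walks.append(random_walk(counts, start, walk_length, rng))
    return walks

# Illustrative watch sequences, one list of movie IDs per user (assumed data).
sequences = [["m1", "m2", "m3"], ["m2", "m3", "m4"], ["m1", "m3"]]
walks = generate_walks(sequences, walks_per_item=2, walk_length=4)
for walk in walks:
    print(walk)
```

Weighting each step by the transition count means that frequently co-watched pairs appear together in more walks, which is what pulls their embeddings closer once the walks are fed to Word2vec.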
- [FFM] Field-aware Factorization Machines for CTR Prediction (Criteo 2016)
- [GBDT+LR] Practical Lessons from Predicting Clicks on Ads at Facebook (Facebook 2014)
- [LS-PLM] Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction (Alibaba 2017)
- [FM] Fast Context-aware Recommendations with Factorization Machines (UKON 2011)
- [DCN] Deep & Cross Network for Ad Click Predictions (Stanford 2017)
- [Deep Crossing] Deep Crossing - Web-Scale Modeling without Manually Crafted Combinatorial Features (Microsoft 2016)
- [PNN] Product-based Neural Networks for User Response Prediction (SJTU 2016)
- [DIN] Deep Interest Network for Click-Through Rate Prediction (Alibaba 2018)
- [ESMM] Entire Space Multi-Task Model - An Effective Approach for Estimating Post-Click Conversion Rate (Alibaba 2018)
- [Wide & Deep] Wide & Deep Learning for Recommender Systems (Google 2016)
- [xDeepFM] xDeepFM - Combining Explicit and Implicit Feature Interactions for Recommender Systems (USTC 2018)
- [Image CTR] Image Matters - Visually modeling user behaviors using Advanced Model Server (Alibaba 2018)
- [AFM] Attentional Factorization Machines - Learning the Weight of Feature Interactions via Attention Networks (ZJU 2017)
- [DIEN] Deep Interest Evolution Network for Click-Through Rate Prediction (Alibaba 2019)
- [DSSM] Learning Deep Structured Semantic Models for Web Search using Clickthrough Data (UIUC 2013)
- [FNN] Deep Learning over Multi-field Categorical Data (UCL 2016)
- [DeepFM] A Factorization-Machine based Neural Network for CTR Prediction (HIT-Huawei 2017)
- [NFM] Neural Factorization Machines for Sparse Predictive Analytics (NUS 2017)