- Data Source: Hive
- Word Segmentation: Ansj
- Feature Engineering: NGram + TF-IDF, or Pre-Trained Word2Vec
- Classification Algorithm: XGBoost
- Model Training: Spark Pipeline
- Model Selection and Tuning: Cross Validation + Grid Search
- Spark 2.1.1
- Hive 1.2.1
- XGBoost4J-Spark 0.7
- Ansj 5.1.2
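
The NGram + TF-IDF feature step listed above can be illustrated with a small, dependency-free sketch. This is not the project's Spark code: the helper names `ngrams` and `tfidf` and the sample documents are hypothetical, the tokens are assumed to be already segmented (the job Ansj does in this stack), and the smoothed IDF form `log((N + 1) / (df + 1))` is assumed to match what Spark MLlib's `IDF` documents.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=2):
    """Compute TF-IDF weights over n-gram terms for already-segmented documents."""
    grams = [ngrams(doc, n) for doc in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(term for doc in grams for term in set(doc))
    num_docs = len(docs)
    # Smoothed IDF, log((N + 1) / (df + 1)); assumed to mirror Spark MLlib's IDF.
    idf = {term: math.log((num_docs + 1) / (count + 1)) for term, count in df.items()}
    # Raw term counts serve as TF and are scaled by the IDF weight.
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in grams
    ]


# Hypothetical sample input: each document is a list of segmented tokens.
docs = [
    ["spark", "pipeline", "text", "classification"],
    ["spark", "pipeline", "grid", "search"],
    ["text", "classification", "with", "xgboost"],
]
weights = tfidf(docs, n=2)
```

Each entry of `weights` is a sparse mapping from bigram to weight. Bigrams shared by several documents, such as `spark pipeline`, receive a lower weight than bigrams unique to one document. The resulting vectors would then be passed to the classifier.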