
yushiyin/xgbspark-text-classification

 
 


Features

  • Data Source: Hive
  • Word Segmentation: Ansj
  • Feature Engineering: NGram + TF-IDF or Pre-Trained Word2Vec
  • Classification Algorithm: XGBoost
  • Model Training: Spark Pipeline
  • Model Selection and Tuning: Cross Validation + Grid Search
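The feature list above can be sketched as a single Spark ML pipeline wrapped in a cross-validated grid search. This is a minimal illustration, not the repository's actual code. It assumes the input DataFrame already has a `words` column of Ansj-segmented tokens (`Array[String]`) and a numeric `label` column, and that `XGBoostClassifier` from the `xgboost4j-spark` package is on the classpath (older releases of that package expose a different estimator class). The column names, the object and method names, and all hyperparameter values are hypothetical.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, IDF, NGram}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Hypothetical helper: builds an NGram + TF-IDF + XGBoost pipeline
// and wraps it in a CrossValidator over a small parameter grid.
object TextPipelineSketch {
  def buildCrossValidator(numClasses: Int): CrossValidator = {
    // Turn segmented tokens into bigrams (assumed column names).
    val ngram = new NGram()
      .setN(2)
      .setInputCol("words")
      .setOutputCol("ngrams")

    // Hash bigrams into term-frequency vectors, then rescale by IDF.
    val tf = new HashingTF()
      .setInputCol("ngrams")
      .setOutputCol("tf")
      .setNumFeatures(1 << 18)
    val idf = new IDF()
      .setInputCol("tf")
      .setOutputCol("features")

    // Multi-class XGBoost classifier; the values here are illustrative only.
    val xgb = new XGBoostClassifier()
      .setFeaturesCol("features")
      .setLabelCol("label")
      .setObjective("multi:softprob")
      .setNumClass(numClasses)
      .setNumRound(50)

    val pipeline = new Pipeline().setStages(Array(ngram, tf, idf, xgb))

    // 2 x 2 grid over tree depth and learning rate.
    val grid = new ParamGridBuilder()
      .addGrid(xgb.maxDepth, Array(4, 6))
      .addGrid(xgb.eta, Array(0.1, 0.3))
      .build()

    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("f1")

    new CrossValidator()
      .setEstimator(pipeline)
      .setEstimatorParamMaps(grid)
      .setEvaluator(evaluator)
      .setNumFolds(3)
  }
}
```

Calling `TextPipelineSketch.buildCrossValidator(numClasses).fit(trainingDF)` on a DataFrame loaded from Hive would train one pipeline per grid point and fold, then keep the best model by F1. The pre-trained Word2Vec variant would replace the three feature stages with a stage that maps tokens to embedding vectors.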

Environments

About

XGBoost on Spark for Chinese Text Classification


Languages

  • Scala 100.0%