A Golang machine learning library. It is forked from github.com/xlvector/hector, but has been rebuilt for clearer CLI commands and better algorithm scalability.
- Logistic Regression
- Factorization Machine
- CART, Random Forest, Random Decision Tree, Gradient Boosting Decision Tree
- Neural Network
Hector supports a libsvm-like data format: each line holds a label followed by sparse `index:value` feature pairs. The following is a sample dataset:
1 1:0.7 3:0.1 9:0.4
0 2:0.3 4:0.9 7:0.5
0 2:0.7 5:0.3
...
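As an illustration of the format above, here is a minimal sketch of parsing one line into a label and a sparse feature map. The `Sample` type and `ParseLine` function are hypothetical names chosen for this example; they are not taken from Hector's source, and Hector's own reader may behave differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is a hypothetical container for one parsed line (not Hector's own type).
type Sample struct {
	Label    float64
	Features map[int]float64 // sparse features: index -> value
}

// ParseLine parses one "label index:value index:value ..." line.
func ParseLine(line string) (Sample, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return Sample{}, fmt.Errorf("empty line")
	}
	label, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return Sample{}, fmt.Errorf("bad label %q: %w", fields[0], err)
	}
	s := Sample{Label: label, Features: make(map[int]float64, len(fields)-1)}
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, ":", 2)
		if len(parts) != 2 {
			return Sample{}, fmt.Errorf("bad feature %q: want index:value", f)
		}
		idx, err := strconv.Atoi(parts[0])
		if err != nil {
			return Sample{}, fmt.Errorf("bad feature index %q: %w", parts[0], err)
		}
		val, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			return Sample{}, fmt.Errorf("bad feature value %q: %w", parts[1], err)
		}
		s.Features[idx] = val
	}
	return s, nil
}

func main() {
	for _, line := range []string{"1 1:0.7 3:0.1 9:0.4", "0 2:0.3 4:0.9 7:0.5", "0 2:0.7 5:0.3"} {
		s, err := ParseLine(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s.Label, s.Features)
	}
}
```

A map keeps the representation sparse, so feature indices that do not appear on a line are treated as zero.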
go get github.com/pantsing/hector
hector --help
The supported algorithms include:
- lr : logistic regression with SGD and L2 regularization.
- ftrl : FTRL-proximal logistic regression with L1 regularization. For more details, see the paper "Ad Click Prediction: a View from the Trenches".
- ep : Bayesian logistic regression with expectation propagation. For more details, see the paper "Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine".
- fm : factorization machine
- cart : classification tree
- cart-regression : regression tree
- rf : random forest
- rdt : random decision trees
- gbdt : gradient boosting decision tree
- linear-svm : linear SVM with L1 regularization
- svm : SVM optimized by SMO (currently, it is a linear SVM)
- l1vm : vector machine with L1 regularization and an RBF kernel
- knn : k-nearest neighbor classification
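To make the `lr` entry above more concrete, here is a minimal sketch of one SGD step for logistic regression with L2 regularization on sparse features. It is an illustration under stated assumptions, not Hector's implementation: the names `Predict` and `SGDStep`, the parameters `lr` and `lambda`, and the choice to apply the L2 penalty only to the features active in the current example are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid maps a raw score to a probability in (0, 1).
func sigmoid(z float64) float64 { return 1 / (1 + math.Exp(-z)) }

// Predict returns the estimated P(y = 1 | x) for a sparse feature map x.
func Predict(w, x map[int]float64) float64 {
	z := 0.0
	for i, v := range x {
		z += w[i] * v // missing weights read as 0
	}
	return sigmoid(z)
}

// SGDStep updates w in place from one example (x, y), with y in {0, 1}.
// For each active feature i the gradient of the log loss plus
// (lambda/2)*w_i^2 is (p - y)*x_i + lambda*w_i.
// Assumption: the penalty is applied only to features present in x.
func SGDStep(w, x map[int]float64, y, lr, lambda float64) {
	p := Predict(w, x)
	for i, v := range x {
		w[i] -= lr * ((p-y)*v + lambda*w[i])
	}
}

func main() {
	// The three sample rows from the data format section.
	xs := []map[int]float64{
		{1: 0.7, 3: 0.1, 9: 0.4},
		{2: 0.3, 4: 0.9, 7: 0.5},
		{2: 0.7, 5: 0.3},
	}
	ys := []float64{1, 0, 0}

	w := map[int]float64{}
	for epoch := 0; epoch < 100; epoch++ {
		for i := range xs {
			SGDStep(w, xs[i], ys[i], 0.5, 0.001)
		}
	}
	for i := range xs {
		fmt.Printf("label=%v predicted=%.3f\n", ys[i], Predict(w, xs[i]))
	}
}
```

Storing the weights in a map keeps each update proportional to the number of non-zero features in the example rather than to the full feature space.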
The following datasets are used in the benchmarks. You can find them in the UCI Machine Learning Repository:
- heart
- fourclass
I ran 5-fold cross-validation on each dataset and used AUC as the evaluation metric. The results are as follows:
DataSet | Method | AUC |
---|---|---|
heart | FTRL-LR | 0.9109 |
heart | EP-LR | 0.8982 |
heart | CART | 0.8231 |
heart | RDT | 0.9155 |
heart | RF | 0.9019 |
heart | GBDT | 0.9061 |
fourclass | FTRL-LR | 0.8281 |
fourclass | EP-LR | 0.7986 |
fourclass | CART | 0.9832 |
fourclass | RDT | 0.9925 |
fourclass | RF | 0.9947 |
fourclass | GBDT | 0.9958 |
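For readers who want to reproduce this kind of evaluation, here is a minimal sketch of the two pieces involved: splitting samples into k folds and computing AUC from scores and binary labels. It is not the code that produced the numbers above. `KFold` and `AUC` are hypothetical helper names, the fold assignment is a plain round-robin without shuffling, and AUC is computed with the rank-sum (Mann-Whitney) formulation, giving tied scores their average rank.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// KFold assigns sample indices 0..n-1 to k folds in round-robin order.
// Assumption: the data is already shuffled; no shuffling is done here.
func KFold(n, k int) [][]int {
	folds := make([][]int, k)
	for i := 0; i < n; i++ {
		folds[i%k] = append(folds[i%k], i)
	}
	return folds
}

// AUC returns the area under the ROC curve for labels in {0, 1}.
// It returns NaN when only one class is present.
func AUC(scores []float64, labels []int) float64 {
	n := len(scores)
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	// Sort sample indices by ascending score.
	sort.Slice(idx, func(a, b int) bool { return scores[idx[a]] < scores[idx[b]] })

	var posRankSum float64
	var nPos, nNeg int
	for i := 0; i < n; {
		// Find the run [i, j) of tied scores; it covers ranks i+1..j.
		j := i
		for j < n && scores[idx[j]] == scores[idx[i]] {
			j++
		}
		avgRank := float64(i+1+j) / 2
		for k := i; k < j; k++ {
			if labels[idx[k]] == 1 {
				posRankSum += avgRank
				nPos++
			} else {
				nNeg++
			}
		}
		i = j
	}
	if nPos == 0 || nNeg == 0 {
		return math.NaN()
	}
	// Mann-Whitney U statistic for the positives, normalized by the pair count.
	u := posRankSum - float64(nPos)*float64(nPos+1)/2
	return u / (float64(nPos) * float64(nNeg))
}

func main() {
	fmt.Println("folds:", KFold(10, 5))
	fmt.Println("auc:", AUC([]float64{0.1, 0.4, 0.35, 0.8}, []int{0, 0, 1, 1}))
}
```

In a cross-validation loop, each fold in turn serves as the test set while the remaining folds are used for training, and the per-fold AUC values are then averaged.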