Comparison of Machine Learning Models

K-nearest neighbors (KNN)

Advantages:

  • Simple to understand and explain
  • Model training is fast (it amounts to storing the training data)
  • Can be used for classification and regression

Disadvantages:

  • Must store all of the training data
  • Prediction can be slow when n (the number of training observations) is large
  • Sensitive to irrelevant features
  • Sensitive to the scale of the data
  • Accuracy is (generally) not competitive with the best supervised learning methods
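
The sensitivity to scale can be seen in a small sketch. The data below is hypothetical (height in metres, income in dollars), and `knn_predict` and `standardize` are illustrative helpers written for this example, not functions from any library. Because income spans a far larger numeric range than height, it dominates the Euclidean distance until both features are standardized.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

def standardize(rows, reference):
    """Scale each column of `rows` using the mean and std of `reference`."""
    cols = list(zip(*reference))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical training data: [height in metres, income in dollars]
train_X = [[1.50, 30000], [1.55, 52000], [1.90, 31000], [1.95, 50000]]
train_y = ["short", "short", "tall", "tall"]
query = [1.92, 51500]

# On raw features, income differences swamp height differences.
raw_prediction = knn_predict(train_X, train_y, query, k=1)

# After standardizing, both features contribute on a comparable scale.
scaled_prediction = knn_predict(
    standardize(train_X, train_X),
    train_y,
    standardize([query], train_X)[0],
    k=1,
)

print(raw_prediction, scaled_prediction)
```

On this toy data the raw-feature neighbor is chosen almost entirely by income, so the query is labelled "short" even though its height is 1.92 m; after standardization the nearest neighbor is a "tall" point.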

Linear Regression

Advantages:

  • Simple to explain
  • Highly interpretable
  • Model training and prediction are fast
  • No tuning is required (excluding regularization)
  • Features don't need scaling (unless regularization is used)
  • Can perform well with a small number of observations
  • Well-understood

Disadvantages:

  • Presumes a linear relationship between the features and the response
  • Performance is (generally) not competitive with the best supervised learning methods due to high bias
  • Can't automatically learn feature interactions
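
The linearity assumption and the need for manual feature engineering can be illustrated with a minimal sketch. The data is made up (the response is exactly the square of the feature), and `fit_simple_ols` is a hypothetical helper implementing the closed-form one-feature least-squares fit. A straight line in x cannot capture the curve, but the same model fits perfectly once the squared term is supplied by hand.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_error(x, y, intercept, slope):
    """Total squared residual of the fitted line on (x, y)."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data with a purely quadratic relationship.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]

# Fit on the raw feature: the best line is flat, so the model underfits.
raw_intercept, raw_slope = fit_simple_ols(x, y)
raw_error = sum_squared_error(x, y, raw_intercept, raw_slope)

# Fit on a hand-engineered feature (x squared): the relationship is now linear.
x_squared = [xi ** 2 for xi in x]
eng_intercept, eng_slope = fit_simple_ols(x_squared, y)
eng_error = sum_squared_error(x_squared, y, eng_intercept, eng_slope)

print(raw_slope, raw_error)
print(eng_slope, eng_error)
```

The same reasoning applies to interactions: a product term such as x1 * x2 has to be added as an explicit feature before a linear model can use it.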

Logistic Regression

Advantages:

  • Highly interpretable (each coefficient is the change in log-odds per unit increase in its feature)
  • Model training and prediction are fast
  • No tuning is required (excluding regularization)
  • Features don't need scaling (unless regularization is used)
  • Can perform well with a small number of observations
  • Outputs well-calibrated predicted probabilities

Disadvantages:

  • Presumes a linear relationship between the features and the log-odds of the response
  • Performance is (generally) not competitive with the best supervised learning methods
  • Can't automatically learn feature interactions
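
As a reminder of how the coefficients are read, here is a small sketch with a hypothetical one-feature model (the `intercept` and `coef` values are invented for illustration, not fitted to any data). It checks that a one-unit increase in the feature shifts the log-odds by exactly the coefficient, which is the same as multiplying the odds by exp(coef).

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Map a probability back to log-odds (the logit)."""
    return math.log(p / (1.0 - p))

# Hypothetical fitted parameters, chosen only for illustration.
intercept = -1.0
coef = 0.7

def predict_proba(x):
    """Predicted probability of the positive class for feature value x."""
    return sigmoid(intercept + coef * x)

p_at_2 = predict_proba(2.0)
p_at_3 = predict_proba(3.0)

# The model is linear in log-odds, so this difference equals coef.
log_odds_change = log_odds(p_at_3) - log_odds(p_at_2)

# Equivalently, the odds are multiplied by exp(coef).
odds_ratio = (p_at_3 / (1.0 - p_at_3)) / (p_at_2 / (1.0 - p_at_2))

print(log_odds_change, odds_ratio, math.exp(coef))
```

Note that the change in probability itself is not constant: it depends on where along the sigmoid curve the starting point sits.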

Naive Bayes

Advantages:

  • Model training and prediction are very fast
  • Somewhat interpretable
  • Little or no tuning is required (at most a smoothing parameter)
  • Features don't need scaling
  • Insensitive to irrelevant features (with enough observations)
  • Performs better than logistic regression when the training set is very small

Disadvantages:

  • Predicted probabilities are not well-calibrated
  • Correlated features can be problematic (due to the independence assumption)
  • Multinomial Naive Bayes can't handle negative feature values
  • Has a higher "asymptotic error" than logistic regression (its error rate as the training set grows very large)
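
The effect of correlated features on the predicted probabilities can be sketched with made-up numbers. `naive_bayes_posterior` is a hypothetical helper for a two-class problem, and the likelihoods and prior are invented for illustration. Feeding the same piece of evidence in three times (a perfectly correlated feature) makes the model count it three times, pushing the posterior toward an extreme.

```python
def naive_bayes_posterior(prior_pos, likelihoods_pos, likelihoods_neg):
    """P(positive | features), assuming conditionally independent features."""
    score_pos = prior_pos
    score_neg = 1.0 - prior_pos
    for like_pos, like_neg in zip(likelihoods_pos, likelihoods_neg):
        score_pos *= like_pos
        score_neg *= like_neg
    return score_pos / (score_pos + score_neg)

# Hypothetical values: P(word | spam) = 0.8, P(word | not spam) = 0.2.
prior_spam = 0.5

# One observed feature.
posterior_once = naive_bayes_posterior(prior_spam, [0.8], [0.2])

# The same feature duplicated twice more (three perfectly correlated copies).
posterior_thrice = naive_bayes_posterior(prior_spam, [0.8] * 3, [0.2] * 3)

print(posterior_once, posterior_thrice)
```

No new information was added by the duplicates, yet the posterior moves from 0.8 to roughly 0.98, which is one way the independence assumption leads to overconfident, poorly calibrated probabilities.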