Advantages:
- Simple to understand and explain
- Model training is fast
- Can be used for classification and regression
Disadvantages:
- Must store all of the training data
- Prediction phase can be slow when n is large
- Sensitive to irrelevant features
- Sensitive to the scale of the data
- Accuracy is (generally) not competitive with the best supervised learning methods
Advantages:
- Simple to explain
- Highly interpretable
- Model training and prediction are fast
- No tuning is required (excluding regularization)
- Features don't need scaling
- Can perform well with a small number of observations
- Well-understood
Disadvantages:
- Presumes a linear relationship between the features and the response
- Performance is (generally) not competitive with the best supervised learning methods due to high bias
- Can't automatically learn feature interactions
Advantages:
- Highly interpretable (if you remember how)
- Model training and prediction are fast
- No tuning is required (excluding regularization)
- Features don't need scaling
- Can perform well with a small number of observations
- Outputs well-calibrated predicted probabilities
Disadvantages:
- Presumes a linear relationship between the features and the log-odds of the response
- Performance is (generally) not competitive with the best supervised learning methods
- Can't automatically learn feature interactions
Advantages:
- Model training and prediction are very fast
- Somewhat interpretable
- No tuning is required
- Features don't need scaling
- Insensitive to irrelevant features (with enough observations)
- Performs better than logistic regression when the training set is very small
Disadvantages:
- Predicted probabilities are not well-calibrated
- Correlated features can be problematic (due to the independence assumption)
- Can't handle negative features (with Multinomial Naive Bayes)
- Has a higher "asymptotic error" than logistic regression
Advantages:
- Can be used for regression or classification
- Can be displayed graphically
- Highly interpretable
- Can be specified as a series of rules, and more closely approximate human decision-making than other models
- Prediction is fast
- Features don't need scaling
- Automatically learns feature interactions
- Tends to ignore irrelevant features
- Non-parametric (will outperform linear models if relationship between features and response is highly non-linear)
Disadvantages:
- Performance is (generally) not competitive with the best supervised learning methods
- Can easily overfit the training data (tuning is required)
- Small variations in the data can result in a completely different tree (high variance)
- Recursive binary splitting makes "locally optimal" decisions that may not result in a globally optimal tree
- Doesn't tend to work well if the classes are highly unbalanced
- Doesn't tend to work well with very small datasets
Advantages (compared to decision trees):
- Performance is competitive with the best supervised learning methods
- Provides a more reliable estimate of feature importance
- Allows you to estimate out-of-sample error without using train/test split or cross-validation
Disadvantages (compared to decision trees):
- Less interpretable
- Slower to train
- Slower to predict
Advantages (compared to unregularized linear models):
- Better performance
- L1 regularization performs automatic feature selection
- Useful for high-dimensional problems (p > n)
Disadvantages (compared to unregularized linear models):
- Tuning is required
- Feature scaling is recommended
- Less interpretable (due to feature scaling)