- Created notebook to regenerate the buggy lasso/elastic net plots from Hastie's book (Vlad)
- Added a note that the L2 constraint for linear models gives Lipschitz continuity of the prediction function (Thanks to Brian Dalessandro for pointing this out to me); a short derivation is sketched after this list.
- Expanded discussion of L1, L2, and elastic net regularization with correlated random variables (Thanks to Brett for the figures)
- New lecture on multiclass classification and an intro to structured prediction
- New homework on multiclass hinge loss and the multiclass SVM; the loss is sketched after this list
- New homework on Bayesian methods, covering the beta-binomial model, hierarchical models, empirical Bayes (ML-II), and MAP-II; the beta-binomial update is sketched after this list
- New short lecture on correlated variables with L1, L2, and Elastic Net regularization
- Added some details about subgradient methods, including a one-slide proof that subgradient descent moves us towards a minimizer of a convex function (based on Boyd's notes); the key inequality is sketched after this list
- Added some review notes on directional derivatives, gradients, and first-order approximations; the basic definitions are sketched after this list
- Added light discussion of convergence rates for SGD vs. GD (the convergence theorem for SGD was accidentally left out)
- For lack of time, dropped the curse of dimensionality discussion, originally based on Guillaume Obozinski's slides
- New lecture (from slide 12) on the Representer Theorem (without RKHS) and its use for kernelization (based on Shalev-Shwartz and Ben-David's book); the statement is sketched after this list
- Dropped the kernel machine approach (slide 16) to introducing kernels, which was based on the approach in Kevin Murphy's book
- Added EM algorithm convergence theorem (slide 20) based on Vaida's result
- New lecture giving more details on gradient boosting, including brief mentions of some variants (stochastic gradient boosting, LogitBoost, XGBoost); the generic update step is sketched after this list
- New worked example on predicting exponentially distributed responses with generalized linear models and gradient boosting models; one possible setup is sketched after this list.
- Deconstructed 2015's lecture on generalized linear models, which started with natural exponential families (slide 15) and built up to a definition of GLMs (slide 20). Instead, presented the more general notion of conditional probability models, focused on estimation with MLE, and gave multiple examples; the formal introduction of exponential families and generalized linear models is relegated to the end.
- Removed equality constraints from the convex optimization lecture to simplify it, but check here if you want them back
- Dropped content on Bayesian Naive Bayes, for lack of time
- Dropped formal discussion of k-means objective function (slide 9)
- Dropped the brief introduction to information theory. It was initially included because we needed to introduce KL divergence and Gibbs' inequality anyway for the EM algorithm; the mathematical prerequisites are now given here (slide 15). Both facts are sketched after this list.
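For the L2-constraint item above, a minimal derivation of the Lipschitz claim, assuming a linear prediction function $f_w(x) = w^\top x$ and constraint $\|w\|_2 \le r$ (notation chosen here for illustration, not taken from the slides):

```latex
% Cauchy--Schwarz gives Lipschitz continuity (in x) with constant r:
|f_w(x) - f_w(x')| = |w^\top (x - x')|
                   \le \|w\|_2 \, \|x - x'\|_2
                   \le r \, \|x - x'\|_2 .
```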
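For the multiclass hinge loss homework, one standard form of the loss: the margin-rescaled version with a class-sensitive feature map $\Psi$ and target margin $\Delta$ (the homework's notation may differ):

```latex
% Generalized (margin-rescaled) multiclass hinge loss; \Delta(y, y) = 0.
\ell(w; x, y) = \max_{y' \in \mathcal{Y}}
    \left[ \Delta(y, y') + \langle w, \Psi(x, y') \rangle
                         - \langle w, \Psi(x, y) \rangle \right].
```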
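For the Bayesian methods homework, the beta-binomial conjugate update, assuming a Beta(a, b) prior on the success probability $\theta$ and $k$ successes observed in $n$ Bernoulli trials:

```latex
% Posterior is proportional to likelihood times prior:
p(\theta \mid k) \propto \theta^{k}(1-\theta)^{\,n-k}
                  \cdot \theta^{a-1}(1-\theta)^{\,b-1}
                = \theta^{a+k-1}(1-\theta)^{\,b+n-k-1},
\quad\text{so}\quad
\theta \mid k \sim \mathrm{Beta}(a + k,\; b + n - k).
```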
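For the subgradient methods item, the key inequality behind the Boyd-notes argument, for convex $f$ with minimizer $x^*$, step $x_{t+1} = x_t - \eta g_t$, and $g_t \in \partial f(x_t)$:

```latex
% Expand the square, then apply the subgradient inequality
% f(x^*) >= f(x_t) + g_t^T (x^* - x_t):
\|x_{t+1} - x^*\|_2^2
  = \|x_t - x^*\|_2^2 - 2\eta\, g_t^\top (x_t - x^*) + \eta^2 \|g_t\|_2^2
  \le \|x_t - x^*\|_2^2 - 2\eta \bigl(f(x_t) - f(x^*)\bigr) + \eta^2 \|g_t\|_2^2 ,
% so the distance to x^* strictly decreases whenever
% 0 < \eta < 2 ( f(x_t) - f(x^*) ) / \|g_t\|_2^2 .
```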
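For the review notes on directional derivatives, the two definitions being reviewed, for a differentiable $f$ at $x$ and direction $v$:

```latex
% Directional derivative and the first-order (linear) approximation:
f'(x; v) = \lim_{t \to 0} \frac{f(x + t v) - f(x)}{t} = \nabla f(x)^\top v,
\qquad
f(x + v) \approx f(x) + \nabla f(x)^\top v .
```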
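For the Representer Theorem lecture, an informal statement in the Shalev-Shwartz and Ben-David style, with generic notation:

```latex
% For any nondecreasing R and any loss L that depends on w only through
% the inner products <w, x_i>, the problem
\min_{w}\; R\bigl(\|w\|\bigr)
   + L\bigl(\langle w, x_1\rangle, \dots, \langle w, x_n\rangle\bigr)
% has a minimizer in the span of the data:
w^* = \sum_{i=1}^{n} \alpha_i x_i ,
% so predictions depend on inputs only through inner products,
% \langle w^*, x \rangle = \sum_i \alpha_i \langle x_i, x \rangle,
% which is what allows kernelization.
```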
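For the gradient boosting lecture, the generic update at stage $m$, written here with a fixed shrinkage/learning rate $\nu$ in place of Friedman's line-search step:

```latex
% Fit a base learner h_m to the pseudo-residuals (negative functional
% gradient of the loss at the current model), then take a shrunken step:
r_{im} = -\left.\frac{\partial \ell\bigl(y_i, f(x_i)\bigr)}{\partial f(x_i)}
          \right|_{f = f_{m-1}}, \qquad
h_m \approx \arg\min_{h}\sum_{i=1}^{n}\bigl(r_{im} - h(x_i)\bigr)^2, \qquad
f_m = f_{m-1} + \nu\, h_m .
```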
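For the worked example on exponentially distributed responses, one natural setup (the lecture may use a different link or parametrization): model $y \mid x$ as exponential with rate $\lambda(x) = e^{w^\top x}$, giving the MLE objective

```latex
% Density p(y | x) = \lambda(x) e^{-\lambda(x) y}, so the negative
% log-likelihood to minimize over w is
\hat{w} = \arg\min_{w}\; \sum_{i=1}^{n}
    \Bigl[\, y_i\, e^{w^\top x_i} - w^\top x_i \,\Bigr].
```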
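For the dropped information theory material, the two facts still needed for the EM derivation: the KL divergence and Gibbs' inequality, the latter following from Jensen's inequality applied to the convex function $-\log$:

```latex
% KL divergence between distributions p and q on the same support:
\mathrm{KL}(p \,\|\, q) = \sum_{x} p(x)\,\log \frac{p(x)}{q(x)} ,
% and Gibbs' inequality, KL(p || q) >= 0 with equality iff p = q:
\mathrm{KL}(p \,\|\, q)
  = \mathbb{E}_{p}\!\left[-\log \tfrac{q(x)}{p(x)}\right]
  \ge -\log \mathbb{E}_{p}\!\left[\tfrac{q(x)}{p(x)}\right]
  = -\log 1 = 0 .
```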
Machine Learning Course Materials by Various Authors is licensed under a Creative Commons Attribution 4.0 International License. The author of each document in this repository is considered the license holder for that document.