CS231n: Convolutional Neural Networks for Visual Recognition
How to Choose a Loss Function For Face Recognition
An overview of gradient descent optimization algorithms
Nonlinear Optimization Using Newton’s Method
Nonlinear Optimization Using Halley’s Method
An Interactive Tutorial on Numerical Optimization
Alternatives to the Gradient Descent Algorithm
The Unreasonable Effectiveness of Recurrent Neural Networks
Character-level recurrent sequence-to-sequence model - Keras Example
Understanding LSTM Networks
A simple overview of RNN, LSTM and Attention Mechanism
A Deep Dive Into the Transformer Architecture – The Development of Transformer Models
Transformers from scratch
Attention Mechanism
Sequence to Sequence (seq2seq) and Attention
Building Seq2Seq LSTM with Luong Attention in Keras for Time Series Forecasting
A Comprehensive Guide to Attention Mechanism in Deep Learning for Everyone
Attention Mechanisms in Recurrent Neural Networks (RNNs) With Keras
seq2seq Part F Encoder Decoder with Bahdanau & Luong Attention Mechanism.ipynb
The encoder-decoder model as a dimensionality reduction technique
Essential Math for Data Science: Eigenvectors and application to PCA
Determinant of a Matrix
Eigenvector and Eigenvalue
A geometric interpretation of the covariance matrix
6 Types of “Feature Importance” Any Data Scientist Should Know
Bayesian Optimization with Python
Variance Reduction
Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
Auto-encoder based Model for High-dimensional Imbalanced Industrial Data
Training Recurrent Neural Networks by Ilya Sutskever
Deep learning via Hessian-free optimization
Accelerated gradient methods combining Tikhonov regularization with geometric damping driven by the Hessian
Bayesian Methods for Hackers
An Introduction to Statistical Learning