fall + classes
sqqueak committed Aug 24, 2023
1 parent c45bead commit 0d632d5
Showing 26 changed files with 250 additions and 18 deletions.
7 changes: 3 additions & 4 deletions content/_index.md
@@ -4,12 +4,11 @@ enableToc: false
---

# hey, it's squeak!
- Welcome to my pothole on the Internet. I'm Emily! My nickname is "*squeak*" — from which most of my usernames on the web are derived. I'm currently an undergraduate student at UW-Madison studying computer science, mathematics, and economics. This summer I am working at [Mandli Communications](https://www.mandli.com/) as an SWE intern!
+ Welcome to my pothole on the Internet. I'm Emily! My nickname is "*squeak*" — from which most of my usernames on the web are derived. I'm currently an undergraduate student at UW-Madison studying computer science, mathematics, and economics.

- My main hobby is puzzle-solving, which is where all my other interests stem from. I particularly enjoy competitive coding, among other computer science topics such as [computer networks](/ece537), image processing, robotics, and [low-level computing](https://store.steampowered.com/app/370360/TIS100/). In terms of math, my favorite subject is combinatorics, followed closely by probability theory and stochastic processes. In my free time, I like [reading](https://thebookerprizes.com/the-booker-library/books), playing poker, and listening to [video game music](https://youtu.be/HL9_xm5HwrE).
+ My main hobby is puzzle-solving, which is where all my other interests stem from. I particularly enjoy competitive coding, among other computer science topics such as [computer networks](/ece537), image processing, robotics, and [low-level computing](https://store.steampowered.com/app/370360/TIS100/). In terms of math, my favorite subject is combinatorics, or more generally, probability theory. In my free time, I like [reading](https://thebookerprizes.com/the-booker-library/books), playing poker, and listening to [video game music](https://youtu.be/HL9_xm5HwrE).

- I like meeting interesting people -- if you want to say hi or grab lunch with me, send an email at `hello at emilyyao dot me`!
- <!-- If you're on campus, feel free to drop by during my [office hours]() at the Undergraduate Project Lab! -->
+ I like meeting interesting people -- if you want to say hi or grab lunch with me, send an email at `hello at emilyyao dot me`! If you're on campus, feel free to drop by during my [office hours](https://www.upl.cs.wisc.edu/hours.html) at the Undergraduate Projects Lab!

<!-- # projects...
- Split wireless Lily58 Pro keyboard
26 changes: 13 additions & 13 deletions content/academics.md
@@ -2,31 +2,31 @@
title: Classes
enableToc: false
---
- <!--
- # fall 2023
- &nbsp; &nbsp; ✏️ &nbsp; **CS 760:** Machine Learning
- &nbsp; &nbsp; ✏️ &nbsp; **CS 538:** Introduction to Theory and Design of Programming Languages
- &nbsp; &nbsp; ✏️ &nbsp; **MATH 632 (*Honors*):** Introduction to Stochastic Processes
- &nbsp; &nbsp; ✏️ &nbsp; **ECON 111 (*Honors Accelerated*):** Principles of Economics
- &nbsp; &nbsp; ✏️ &nbsp; **HISTORY 143:** History of Race and Inequality in Urban America
- &nbsp; &nbsp; 💼 &nbsp; System Administrator Intern `@` Morgridge Institute of Research -->
+ ✏️ &nbsp; **CS 538:** Introduction to Theory and Design of Programming Languages
+ ✏️ &nbsp; **MATH 525:** Introduction to Linear Optimization
+ ✏️ &nbsp; **MATH 632 (*Honors*):** Introduction to Stochastic Processes
+ ✏️ &nbsp; **ECON 111 (*Honors Accelerated*):** Principles of Economics
+ ✏️ &nbsp; **HISTORY 143:** History of Race and Inequality in Urban America
+ ✏️ &nbsp; **MUSIC 113:** Music in Performance
+ 💼 &nbsp; System Administrator Intern `@` Morgridge Institute for Research

# summer 2023 (self-studied)
- <!-- ✏️ &nbsp; **6.S191:** Introduction to Deep Learning `MIT` -->
+ ✏️ &nbsp; **[6.S191](/notes/6-s191):** Introduction to Deep Learning `MIT`
✏️ &nbsp; **[CS 544](/notes/cs544):** Introduction to Big Data Systems
💼 &nbsp; Software Engineer Intern `@` Mandli Communications

# spring 2023
- ✏️ &nbsp; **CS 354:** Machine Organization & Programming
+ ✏️ &nbsp; **CS 354:** Machine Organization and Programming
✏️ &nbsp; **CS 540:** Introduction to Artificial Intelligence
✏️ &nbsp; **CS 577:** Introduction to Algorithms
✏️ &nbsp; **MATH 431:** Introduction to the Theory of Probability
✏️ &nbsp; **MATH 521 (*Honors*):** Analysis I
- 🔬 &nbsp; Research project: Training and optimizing AI image generation models on custom datasets.
+ 🔬 &nbsp; Research project: Training and optimizing image generation models on custom datasets.

# fall 2022
- ✏️ &nbsp; **CS 475:** Introduction to Combinatorics
✏️ &nbsp; **CS 252:** Introduction to Computer Engineering
+ ✏️ &nbsp; **CS 475:** Introduction to Combinatorics
✏️ &nbsp; **[ECE 537](/ece537):** Communication Networks
✏️ &nbsp; **PHYSICS 201 (*Honors*):** General Physics
✏️ &nbsp; **MSE 299:** Independent Study -- Machine Learning for Engineering Research. Learned basic ML workflow like cleaning data, training models, and optimization.
@@ -36,8 +36,8 @@ enableToc: false
✏️ &nbsp; **[CS 61A](/notes/cs61a):** Structure and Interpretation of Computer Programs `UC-Berkeley`

# previous
- ✏️ &nbsp; **CS 300:** Programming I (SU '20)
- ✏️ &nbsp; **CS 400:** Programming II (FA '20)
+ ✏️ &nbsp; **CS 300:** Programming II (SU '20)
+ ✏️ &nbsp; **CS 400:** Programming III (FA '20)
✏️ &nbsp; **ECE 203:** Signals, Information, and Computation (SU '21)
✏️ &nbsp; **MATH 20804232:** Calculus & Analytic Geometry 2 (SU '21) `Madison College`
✏️ &nbsp; **MATH 234:** Calculus - Functions of Several Variables (FA '21)
116 changes: 116 additions & 0 deletions content/notes/6-s191.md
@@ -0,0 +1,116 @@
---
title: "6.S191"
---

# introduction to deep learning
- VISTA: synthesizing environments for autonomous vehicles to train in
- Don't have to send the vehicles out into the real world to train; can do so through simulation!
- Artificial intelligence (AI): techniques that allow computers to mimic human behavior
- Machine learning (ML): train a machine to make decisions based on data, without explicitly programming the decision rules
- Deep learning (DL): use neural networks to extract patterns directly from raw data
- The machine extracts the core patterns itself, then applies them to new data
- This is as opposed to classical ML, where humans hand-pick and define what correct and incorrect data look like before feeding it to the machine
- **Perceptron**: a single neuron
- Composed of inputs, weights, a bias, a nonlinear activation function, and a summation
- Steps to get the output of a perceptron ($\hat{y}$), sketched in code after the figure below:
1. multiply inputs by weights
2. sum
3. add nonlinearity
![math representation of perceptron](/notes/image-2.png)
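
A minimal sketch of these three steps in numpy (the input, weight, and bias values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # nonlinear activation: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

def perceptron(x, w, b):
    # 1. multiply inputs by weights  2. sum (plus bias)  3. add nonlinearity
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([1.0, 2.0])     # inputs
w = np.array([0.5, -0.3])    # weights
b = 0.1                      # bias
y_hat = perceptron(x, w, b)  # 0.5, since z = 0.5*1 - 0.3*2 + 0.1 = 0
```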
- What is a **nonlinear function**? Why is it useful?
- What: a function whose output is not a linear combination of its inputs; common choices also squash any real number into a fixed range
- Example: the sigmoid function maps every real number into $[0, 1]$ using the function $g(z)=\frac{1}{1+e^{-z}}$
- Why: it introduces nonlinearity into the network -- without it, stacked layers collapse into a single linear map (see the sketch after the figure below)
![example of why nonlinear data is important](/notes/image-3.png)
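
A quick sanity check of that point: with no activation between them, two stacked layers are exactly equivalent to one linear layer (the matrices here are arbitrary):

```python
import numpy as np

W1 = np.array([[1.0, 2.0], [0.0, 1.0]])
W2 = np.array([[0.5, 0.0], [1.0, 1.0]])
x = np.array([3.0, -1.0])

deep = W2 @ (W1 @ x)      # two "layers" with no nonlinearity...
shallow = (W2 @ W1) @ x   # ...collapse into a single linear map
assert np.allclose(deep, shallow)
```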
- Deep neural networks are just neural networks with many hidden layers
- **Loss functions** tell the neural network how big of a mistake it made, given the predicted value and the true value
- Loss optimization means minimizing the loss value -- we want to find the NN weights that achieve this
- Gradient descent:
- For any point, we can compute the gradient of the loss function at that point, then step the weights in the opposite direction of the gradient so that the loss decreases
- Repeat this until the loss converges to a minimum
- **Backpropagation:** computing the gradient of the loss with respect to each weight by applying the chain rule from the output back to the input (a worked sketch follows the figure below)
![backpropagation example](/notes/image-4.png)
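
Putting gradient descent and the chain rule together for the single perceptron above, with a squared-error loss (a hand-rolled sketch; frameworks compute these gradients automatically):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x, y = np.array([1.0, 2.0]), 1.0   # one made-up training example
w, b = np.array([0.5, -0.3]), 0.1  # initial weights and bias
lr = 0.5                           # learning rate

for step in range(100):
    z = np.dot(w, x) + b
    y_hat = sigmoid(z)
    loss = (y_hat - y) ** 2
    # chain rule from the loss back to the weights:
    # dL/dw = dL/dy_hat * dy_hat/dz * dz/dw
    dL_dy = 2 * (y_hat - y)
    dy_dz = y_hat * (1 - y_hat)    # derivative of sigmoid
    w -= lr * dL_dy * dy_dz * x    # step against the gradient
    b -= lr * dL_dy * dy_dz        # so the loss decreases
```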
- Training NNs in practice is difficult
- What is the learning rate? How do we set it?
- Small LRs converge slowly and can get stuck in local minima
- Large LRs overshoot and diverge, so the NN never trains
- Instead of a fixed value, the learning rate can be set by an algorithm that adapts to the loss landscape (a simple schedule is sketched below)
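
One simple version of this is a time-based decay schedule -- large steps early, fine-grained steps later (a sketch; the constants are arbitrary, and optimizers like Adam instead adapt the step size per weight from gradient statistics):

```python
import math

def lr_schedule(step, lr0=0.1, decay=0.01):
    # exponentially shrink the learning rate as training progresses
    return lr0 * math.exp(-decay * step)

lr_schedule(0)    # 0.1
lr_schedule(100)  # ~0.037
```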
- What are batches? Why are they useful?
- It's not feasible to compute the gradient over the entire dataset because the dataset is too large
- Take a small "batch" (sample) of the dataset and compute the gradient over that instead of the entire set or a single point
- Using mini-batches allows for smoother convergence and faster training (batches can be processed in parallel!) -- see the loop sketched after this list
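
A sketch of that mini-batch loop, assuming a `grad(X, y, w)` function like the perceptron gradient above (the name and signature are placeholders):

```python
import numpy as np

def minibatch_sgd(X, y, w, grad, lr=0.1, batch_size=32, epochs=10):
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)  # shuffle once per epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # the gradient over a small sample approximates the full gradient
            w = w - lr * grad(X[batch], y[batch], w)
    return w
```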
- What is overfitting? How do we correct for it?
- A model that has overfit has followed the training data too closely and can't generalize to unseen data
![types of fit](/notes/image-5.png)
- Regularization is introduced into the NN to discourage overfitting
- Dropout: randomly set some neurons' activations to 0 during training
- Early stopping: stop training once validation loss starts to rise, before the model overfits (both are sketched after the figure below)
![early stopping graph](/notes/image-7.png)
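
Sketches of both regularizers (`train_epoch` and `val_loss` are hypothetical placeholders for a real training setup):

```python
import numpy as np

def dropout(a, p=0.5):
    # zero each activation with probability p at train time, scaling the
    # survivors by 1/(1-p) so the expected activation stays the same
    mask = (np.random.rand(*a.shape) > p) / (1 - p)
    return a * mask

# early stopping: quit once validation loss stops improving
best, bad_epochs, patience = float("inf"), 0, 5
while bad_epochs < patience:
    train_epoch()   # placeholder: one pass over the training data
    v = val_loss()  # placeholder: loss on held-out data
    if v < best:
        best, bad_epochs = v, 0
    else:
        bad_epochs += 1
```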
- Summary:
- What is the perceptron? What are its parts? What is a nonlinear activation function and why is it useful?
- How do we get from a single perceptron to a NN? How does the NN learn? What is backpropagation and what is its relation to weight calculation?
- What are some techniques applied in practice that allow for NNs to be accurate?

---

# recurrent neural networks, transformers, and attention
- Sequential data is data whose points depend on other points, for example, sound waves defining audio, or a time series such as stock prices
![types of sequential models](/notes/image-13.png)
- For working with sequential data, we define a recurrence relation on a state $h_t$ at each time step $t$, which retains information about the state the NN was in when it produced the output $\hat{y}_t$
- Since the state of the NN is tracked across time steps, the output now depends on the state: $\hat{y}_t=f(x_t, h_{t-1})$
- Formally, **recurrent neural networks (RNNs)** track state $h_t$ which is updated each time step
- Given an input vector, update the hidden state: combine the input (through its weight matrix) with the previous hidden state (through its own weight matrix), then apply a nonlinearity
- The output at each timestep is computed from the updated hidden state (a forward-pass sketch follows the figure below)
- Loss is calculated for each individual timestep and then combined into an overall loss value
- Backpropagation occurs through each individual timestep, then from the current time all the way back to the beginning
![RNN model example](/notes/image-10.png)
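
A minimal forward pass matching this description -- one weight matrix for the input, one for the hidden state, one for the output, reused at every timestep (dimensions and data are made up):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)  # update state, add nonlinearity
    y_t = W_hy @ h_t                           # output depends on the state
    return h_t, y_t

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
W_xh = rng.normal(size=(n_hid, n_in))
W_hh = rng.normal(size=(n_hid, n_hid))
W_hy = rng.normal(size=(n_out, n_hid))

h = np.zeros(n_hid)                     # initial state
for x_t in rng.normal(size=(5, n_in)):  # 5 timesteps, same weights each step
    h, y_hat = rnn_step(x_t, h, W_xh, W_hh, W_hy)
```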
- Issues with backpropagating in RNNs
- Computing the gradient wrt the initial input requires repeated gradient calculations involving the state weight matrix
- Exploding gradients → gradient clipping (sketched after this list)
- Gradients keep increasing and get extremely large
- Scale back large gradients by clipping
- Vanishing gradients
- Harder time capturing long-term dependencies because many small numbers are multiplied together, shrinking the gradient toward zero. Fixes:
1. Activation function tweaking
![ReLU vs sigmoid](/notes/image-11.png)
2. Parameter initialization via setting weights to identity matrices
3. Gated cells: using gates to filter information per recurrent unit (LSTM)
![LSTM example](/notes/image-12.png)
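
The clipping fix from above, sketched as clipping by global norm:

```python
import numpy as np

def clip_by_norm(grads, max_norm=5.0):
    # if the combined gradient norm is too large, scale everything down
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads
```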
- What should we keep in mind when designing models for sequences?
- Variable-length sequences
- Dependencies between data points that are distant from each other
- Maintaining order
- Weight sharing: one set of weights can be applied to the input at any timestep and still work
- Encoding language for NNs: transform words into vector representations
- One-hot embedding
1. Obtain a vocabulary (corpus)
2. Map each word to an index
3. A word is a vector of 0s with a 1 at the word's index
- Cons of one-hot embedding: the vectors carry no notion of similarity between words -- every pair is equally distant (see the sketch after this list)
- Learned embedding: use a NN to learn an embedding
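
The one-hot scheme on a toy vocabulary (words chosen arbitrarily):

```python
import numpy as np

corpus = ["deep", "learning", "is", "fun"]
index = {word: i for i, word in enumerate(corpus)}  # 2. map word -> index

def one_hot(word):
    v = np.zeros(len(corpus))
    v[index[word]] = 1.0  # 3. all zeros except at the word's index
    return v

one_hot("learning")  # array([0., 1., 0., 0.])
# any two distinct words have dot product 0 -- one-hot vectors carry
# no notion of similarity, which is what learned embeddings fix
```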
- Limitations of RNNs in application
- Encoding bottlenecks: in many-to-one models (e.g., sentiment classification), how do we compress an entire sequence into one single result without losing information?
- Slow: can't parallelize because every step depends on the previous one
- No long-term memory
- **Attention:** how can we eliminate the need for recurrence and improve on the above issues, but still analyze data in sequential order?
- Identify the most important features in the input
1. Encode position information (embedding)
2. Extract query, key, value for search -- what is the most important information related to my request?
3. Compute attention weighting -- compute pairwise similarity between each query and key
- How similar are any two features?
- Computed using the dot product between query and key (the normalized dot product is the cosine similarity)
4. Extract the features with high attention (sketched after the figures below)
![attention weighting diagram](/notes/image-17.png)
![attention output diagram](/notes/image-18.png)
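
A sketch of these steps as scaled dot-product self-attention, with random matrices standing in for the learned query/key/value projections:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # 3. pairwise query-key similarity
    weights = softmax(scores)        #    normalized attention weighting
    return weights @ V               # 4. extract features with high attention

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))             # 5 positions, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = attention(X @ Wq, X @ Wk, X @ Wv)  # shape (5, 16)
```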
- Summary:
- What are RNNs? What are their key features? What are they useful for?
- How do RNNs perform backpropagation, and what are the major issues with backpropagating in RNNs? What are some downsides of RNNs?
- What is attention? How does it relate to RNNs, and why is it important?

---

# convolutional neural networks



<!-- http://ruder.io/optimizing-gradient-descent/ -->