fall + classes
sqqueak committed Aug 24, 2023
1 parent c45bead commit 0d632d5
Showing 26 changed files with 250 additions and 18 deletions.
7 changes: 3 additions & 4 deletions content/_index.md
@@ -4,12 +4,11 @@ enableToc: false
---

# hey, it's squeak!
- Welcome to my pothole on the Internet. I'm Emily! My nickname is "*squeak*" — from which most of my usernames on the web are derived. I'm currently an undergraduate student at UW-Madison studying computer science, mathematics, and economics. This summer I am working at [Mandli Communications](https://www.mandli.com/) as an SWE intern!
+ Welcome to my pothole on the Internet. I'm Emily! My nickname is "*squeak*" — from which most of my usernames on the web are derived. I'm currently an undergraduate student at UW-Madison studying computer science, mathematics, and economics.

- My main hobby is puzzle-solving, which is where all my other interests stem from. I particularly enjoy competitive coding, among other computer science topics such as [computer networks](/ece537), image processing, robotics, and [low-level computing](https://store.steampowered.com/app/370360/TIS100/). In terms of math, my favorite subject is combinatorics, followed closely by probability theory and stochastic processes. In my free time, I like [reading](https://thebookerprizes.com/the-booker-library/books), playing poker, and listening to [video game music](https://youtu.be/HL9_xm5HwrE).
+ My main hobby is puzzle-solving, which is where all my other interests stem from. I particularly enjoy competitive coding, among other computer science topics such as [computer networks](/ece537), image processing, robotics, and [low-level computing](https://store.steampowered.com/app/370360/TIS100/). In terms of math, my favorite subject is combinatorics, or more generally, probability theory. In my free time, I like [reading](https://thebookerprizes.com/the-booker-library/books), playing poker, and listening to [video game music](https://youtu.be/HL9_xm5HwrE).

- I like meeting interesting people -- if you want to say hi or grab lunch with me, send an email at `hello at emilyyao dot me`!
- <!-- If you're on campus, feel free to drop by during my [office hours]() at the Undergraduate Project Lab! -->
+ I like meeting interesting people -- if you want to say hi or grab lunch with me, send an email at `hello at emilyyao dot me`! If you're on campus, feel free to drop by during my [office hours](https://www.upl.cs.wisc.edu/hours.html) at the Undergraduate Projects Lab!

<!-- # projects...
- Split wireless Lily58 Pro keyboard
26 changes: 13 additions & 13 deletions content/academics.md
@@ -2,31 +2,31 @@
title: Classes
enableToc: false
---
- <!--
- # fall 2023
- &nbsp; &nbsp; ✏️ &nbsp; **CS 760:** Machine Learning
- &nbsp; &nbsp; ✏️ &nbsp; **CS 538:** Introduction to Theory and Design of Programming Languages
- &nbsp; &nbsp; ✏️ &nbsp; **MATH 632 (*Honors*):** Introduction to Stochastic Processes
- &nbsp; &nbsp; ✏️ &nbsp; **ECON 111 (*Honors Accelerated*):** Principles of Economics
- &nbsp; &nbsp; ✏️ &nbsp; **HISTORY 143:** History of Race and Inequality in Urban America
- &nbsp; &nbsp; 💼 &nbsp; System Administrator Intern `@` Morgridge Institute of Research -->
+ ✏️ &nbsp; **CS 538:** Introduction to Theory and Design of Programming Languages
+ ✏️ &nbsp; **MATH 525:** Introduction to Linear Optimization
+ ✏️ &nbsp; **MATH 632 (*Honors*):** Introduction to Stochastic Processes
+ ✏️ &nbsp; **ECON 111 (*Honors Accelerated*):** Principles of Economics
+ ✏️ &nbsp; **HISTORY 143:** History of Race and Inequality in Urban America
+ ✏️ &nbsp; **MUSIC 113:** Music in Performance
+ 💼 &nbsp; System Administrator Intern `@` Morgridge Institute for Research

# summer 2023 (self-studied)
- <!-- ✏️ &nbsp; **6.S191:** Introduction to Deep Learning `MIT` -->
+ ✏️ &nbsp; **[6.S191](/notes/6-s191):** Introduction to Deep Learning `MIT`
✏️ &nbsp; **[CS 544](/notes/cs544):** Introduction to Big Data Systems
💼 &nbsp; Software Engineer Intern `@` Mandli Communications

# spring 2023
- ✏️ &nbsp; **CS 354:** Machine Organization & Programming
+ ✏️ &nbsp; **CS 354:** Machine Organization and Programming
✏️ &nbsp; **CS 540:** Introduction to Artificial Intelligence
✏️ &nbsp; **CS 577:** Introduction to Algorithms
✏️ &nbsp; **MATH 431:** Introduction to the Theory of Probability
✏️ &nbsp; **MATH 521 (*Honors*):** Analysis I
- 🔬 &nbsp; Research project: Training and optimizing AI image generation models on custom datasets.
+ 🔬 &nbsp; Research project: Training and optimizing image generation models on custom datasets.

# fall 2022
- ✏️ &nbsp; **CS 475:** Introduction to Combinatorics
✏️ &nbsp; **CS 252:** Introduction to Computer Engineering
+ ✏️ &nbsp; **CS 475:** Introduction to Combinatorics
✏️ &nbsp; **[ECE 537](/ece537):** Communication Networks
✏️ &nbsp; **PHYSICS 201 (*Honors*):** General Physics
✏️ &nbsp; **MSE 299:** Independent Study -- Machine Learning for Engineering Research. Learned basic ML workflow like cleaning data, training models, and optimization.
@@ -36,8 +36,8 @@ enableToc: false
✏️ &nbsp; **[CS 61A](/notes/cs61a):** Structure and Interpretation of Computer Programs `UC-Berkeley`

# previous
- ✏️ &nbsp; **CS 300:** Programming I (SU '20)
- ✏️ &nbsp; **CS 400:** Programming II (FA '20)
+ ✏️ &nbsp; **CS 300:** Programming II (SU '20)
+ ✏️ &nbsp; **CS 400:** Programming III (FA '20)
✏️ &nbsp; **ECE 203:** Signals, Information, and Computation (SU '21)
✏️ &nbsp; **MATH 20804232:** Calculus & Analytic Geometry 2 (SU '21) `Madison College`
✏️ &nbsp; **MATH 234:** Calculus - Functions of Several Variables (FA '21)
116 changes: 116 additions & 0 deletions content/notes/6-s191.md
@@ -0,0 +1,116 @@
---
title: "6.S191"
---

# introduction to deep learning
- VISTA: synthesizing environments for autonomous vehicles to train in
- Don't have to send the vehicles out into the real world to train; can do so through simulation!
- Artificial intelligence (AI): techniques that allow computers to mimic human behavior
- Machine learning (ML): train a machine to make decisions based on data, without explicitly programming the decision rules
- Deep learning (DL): use neural networks to extract patterns directly from raw data
- The machine extracts the core patterns itself, then applies them to new data
- This is as opposed to classical ML, where humans hand-pick and define what correct and incorrect data look like before feeding it to the machine
- **Perceptron**: a single neuron
- Composed of inputs, weights, a bias, a nonlinear activation function, and a summation
- Steps to get the output of a perceptron ($\hat{y}$), sketched in code after the figure below:
1. multiply inputs by weights
2. sum
3. add nonlinearity
![math representation of perceptron](/notes/image-2.png)
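
A minimal sketch of these three steps in numpy (the input, weight, and bias values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # nonlinear activation: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

def perceptron(x, w, b):
    # 1. multiply inputs by weights  2. sum (plus bias)  3. add nonlinearity
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([1.0, 2.0])     # inputs
w = np.array([0.5, -0.3])    # weights
b = 0.1                      # bias
y_hat = perceptron(x, w, b)  # 0.5, since z = 0.5*1 - 0.3*2 + 0.1 = 0
```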
- What is a **nonlinear function**? Why is it useful?
- What: a function whose output is not a linear combination of its inputs; common choices also squash any real number into a fixed range
- Example: the sigmoid function maps every real number into $[0, 1]$ using the function $g(z)=\frac{1}{1+e^{-z}}$
- Why: it introduces nonlinearity into the network -- without it, stacked layers collapse into a single linear map (see the sketch after the figure below)
![example of why nonlinear data is important](/notes/image-3.png)
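
A quick sanity check of that point: with no activation between them, two stacked layers are exactly equivalent to one linear layer (the matrices here are arbitrary):

```python
import numpy as np

W1 = np.array([[1.0, 2.0], [0.0, 1.0]])
W2 = np.array([[0.5, 0.0], [1.0, 1.0]])
x = np.array([3.0, -1.0])

deep = W2 @ (W1 @ x)      # two "layers" with no nonlinearity...
shallow = (W2 @ W1) @ x   # ...collapse into a single linear map
assert np.allclose(deep, shallow)
```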
- Deep neural networks are just neural networks with many hidden layers
- **Loss functions** tell the neural network how big of a mistake it made, given the predicted value and the true value
- Loss optimization means minimizing the loss value -- we want to find the NN weights that achieve this
- Gradient descent:
- For any point, we can compute the gradient of the loss function at that point, then step the weights in the opposite direction of the gradient so that the loss decreases
- Repeat this until the loss converges to a minimum
- **Backpropagation:** computing the gradient of the loss with respect to each weight by applying the chain rule from the output back to the input (a worked sketch follows the figure below)
![backpropagation example](/notes/image-4.png)
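
Putting gradient descent and the chain rule together for the single perceptron above, with a squared-error loss (a hand-rolled sketch; frameworks compute these gradients automatically):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x, y = np.array([1.0, 2.0]), 1.0   # one made-up training example
w, b = np.array([0.5, -0.3]), 0.1  # initial weights and bias
lr = 0.5                           # learning rate

for step in range(100):
    z = np.dot(w, x) + b
    y_hat = sigmoid(z)
    loss = (y_hat - y) ** 2
    # chain rule from the loss back to the weights:
    # dL/dw = dL/dy_hat * dy_hat/dz * dz/dw
    dL_dy = 2 * (y_hat - y)
    dy_dz = y_hat * (1 - y_hat)    # derivative of sigmoid
    w -= lr * dL_dy * dy_dz * x    # step against the gradient
    b -= lr * dL_dy * dy_dz        # so the loss decreases
```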
- Training NNs in practice is difficult
- What is the learning rate? How do we set it?
- Small LRs converge slowly and can get stuck in local minima
- Large LRs overshoot and diverge, so the NN never trains
- Instead of a fixed value, the learning rate can be set by an algorithm that adapts to the loss landscape (a simple schedule is sketched below)
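
One simple version of this is a time-based decay schedule -- large steps early, fine-grained steps later (a sketch; the constants are arbitrary, and optimizers like Adam instead adapt the step size per weight from gradient statistics):

```python
import math

def lr_schedule(step, lr0=0.1, decay=0.01):
    # exponentially shrink the learning rate as training progresses
    return lr0 * math.exp(-decay * step)

lr_schedule(0)    # 0.1
lr_schedule(100)  # ~0.037
```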
- What are batches? Why are they useful?
- It's not feasible to compute the gradient over the entire dataset because the dataset is too large
- Take a small "batch" (sample) of the dataset and compute the gradient over that instead of the entire set or a single point
- Using mini-batches allows for smoother convergence and faster training (batches can be processed in parallel!) -- see the loop sketched after this list
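
A sketch of that mini-batch loop, assuming a `grad(X, y, w)` function like the perceptron gradient above (the name and signature are placeholders):

```python
import numpy as np

def minibatch_sgd(X, y, w, grad, lr=0.1, batch_size=32, epochs=10):
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)  # shuffle once per epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # the gradient over a small sample approximates the full gradient
            w = w - lr * grad(X[batch], y[batch], w)
    return w
```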
- What is overfitting? How do we correct for it?
- A model that has overfit has followed the training data too closely and can't generalize to unseen data
![types of fit](/notes/image-5.png)
- Regularization is introduced into the NN to discourage overfitting
- Dropout: randomly set some neurons' activations to 0 during training
- Early stopping: stop training once validation loss starts to rise, before the model overfits (both are sketched after the figure below)
![early stopping graph](/notes/image-7.png)
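
Sketches of both regularizers (`train_epoch` and `val_loss` are hypothetical placeholders for a real training setup):

```python
import numpy as np

def dropout(a, p=0.5):
    # zero each activation with probability p at train time, scaling the
    # survivors by 1/(1-p) so the expected activation stays the same
    mask = (np.random.rand(*a.shape) > p) / (1 - p)
    return a * mask

# early stopping: quit once validation loss stops improving
best, bad_epochs, patience = float("inf"), 0, 5
while bad_epochs < patience:
    train_epoch()   # placeholder: one pass over the training data
    v = val_loss()  # placeholder: loss on held-out data
    if v < best:
        best, bad_epochs = v, 0
    else:
        bad_epochs += 1
```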
- Summary:
- What is the perceptron? What are its parts? What is a nonlinear activation function and why is it useful?
- How do we get from a single perceptron to a NN? How does the NN learn? What is backpropagation and what is its relation to weight calculation?
- What are some techniques applied in practice that allow for NNs to be accurate?

---

# recurrent neural networks, transformers, and attention
- Sequential data is data whose points depend on other points, for example, sound waves defining audio, or a time series such as stock prices
![types of sequential models](/notes/image-13.png)
- For working with sequential data, we define a recurrence relation on a state $h_t$ at each time step $t$, which retains information about the state the NN was in when it produced the output $\hat{y}_t$
- Since the state of the NN is tracked across time steps, the output now depends on the state: $\hat{y}_t=f(x_t, h_{t-1})$
- Formally, **recurrent neural networks (RNNs)** track state $h_t$ which is updated each time step
- Given an input vector, update the hidden state: combine the input (through its weight matrix) with the previous hidden state (through its own weight matrix), then apply a nonlinearity
- The output at each timestep is computed from the updated hidden state (a forward-pass sketch follows the figure below)
- Loss is calculated for each individual timestep and then combined into an overall loss value
- Backpropagation occurs through each individual timestep, then from the current time all the way back to the beginning
![RNN model example](/notes/image-10.png)
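
A minimal forward pass matching this description -- one weight matrix for the input, one for the hidden state, one for the output, reused at every timestep (dimensions and data are made up):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)  # update state, add nonlinearity
    y_t = W_hy @ h_t                           # output depends on the state
    return h_t, y_t

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
W_xh = rng.normal(size=(n_hid, n_in))
W_hh = rng.normal(size=(n_hid, n_hid))
W_hy = rng.normal(size=(n_out, n_hid))

h = np.zeros(n_hid)                     # initial state
for x_t in rng.normal(size=(5, n_in)):  # 5 timesteps, same weights each step
    h, y_hat = rnn_step(x_t, h, W_xh, W_hh, W_hy)
```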
- Issues with backpropagating in RNNs
- Computing the gradient wrt the initial input requires repeated gradient calculations involving the state weight matrix
- Exploding gradients → gradient clipping (sketched after this list)
- Gradients keep increasing and get extremely large
- Scale back large gradients by clipping
- Vanishing gradients
- Harder time capturing long-term dependencies because many small numbers are multiplied together, shrinking the gradient toward zero. Fixes:
1. Activation function tweaking
![ReLU vs sigmoid](/notes/image-11.png)
2. Parameter initialization via setting weights to identity matrices
3. Gated cells: using gates to filter information per recurrent unit (LSTM)
![LSTM example](/notes/image-12.png)
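
The clipping fix from above, sketched as clipping by global norm:

```python
import numpy as np

def clip_by_norm(grads, max_norm=5.0):
    # if the combined gradient norm is too large, scale everything down
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads
```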
- What should we keep in mind when designing models for sequences?
- Variable-length sequences
- Dependencies between data points that are distant from each other
- Maintaining order
- Weight sharing: one set of weights can be applied to the input at any timestep and still work
- Encoding language for NNs: transform words into vector representations
- One-hot embedding
1. Obtain a vocabulary (corpus)
2. Map each word to an index
3. A word is a vector of 0s with a 1 at the word's index
- Cons of one-hot embedding: the vectors carry no notion of similarity between words -- every pair is equally distant (see the sketch after this list)
- Learned embedding: use a NN to learn an embedding
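
The one-hot scheme on a toy vocabulary (words chosen arbitrarily):

```python
import numpy as np

corpus = ["deep", "learning", "is", "fun"]
index = {word: i for i, word in enumerate(corpus)}  # 2. map word -> index

def one_hot(word):
    v = np.zeros(len(corpus))
    v[index[word]] = 1.0  # 3. all zeros except at the word's index
    return v

one_hot("learning")  # array([0., 1., 0., 0.])
# any two distinct words have dot product 0 -- one-hot vectors carry
# no notion of similarity, which is what learned embeddings fix
```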
- Limitations of RNNs in application
- Encoding bottlenecks: in many-to-one models (e.g., sentiment classification), how do we compress an entire sequence into one single result without losing information?
- Slow: can't parallelize because every step depends on the previous one
- No long-term memory
- **Attention:** how can we eliminate the need for recurrence and improve on the above issues, but still analyze data in sequential order?
- Identify the most important features in the input
1. Encode position information (embedding)
2. Extract query, key, value for search -- what is the most important information related to my request?
3. Compute attention weighting -- compute pairwise similarity between each query and key
- How similar are any two features?
- Computed using the dot product between query and key (the normalized dot product is the cosine similarity)
4. Extract the features with high attention (sketched after the figures below)
![attention weighting diagram](/notes/image-17.png)
![attention output diagram](/notes/image-18.png)
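
A sketch of these steps as scaled dot-product self-attention, with random matrices standing in for the learned query/key/value projections:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # 3. pairwise query-key similarity
    weights = softmax(scores)        #    normalized attention weighting
    return weights @ V               # 4. extract features with high attention

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))             # 5 positions, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = attention(X @ Wq, X @ Wk, X @ Wv)  # shape (5, 16)
```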
- Summary:
- What are RNNs? What are their key features? What are they useful for?
- How do RNNs perform backpropagation, and what are the major issues with backpropagating in RNNs? What are some downsides of RNNs?
- What is attention? How does it relate to RNNs, and why is it important?

---

# convolutional neural networks



<!-- http://ruder.io/optimizing-gradient-descent/ -->