
#dailyNote

spam for note testing

more preamble

Afternoon To-Do

#meta #todoList

Priority

  • Complete [[Python]] module
  • Complete Bias and Variance

Bonus/tomorrow:

  • Note cleanup with Aliases (see [[Career and Study To - Do]])
  • [[Linear Algebra]] [[Note Migration]]
  • Integrate with this daily note:
    - [ ] [[Training, Test, and Dev Sets]]
    - [ ] [[regularization]] (#merge and integrate)
    - [ ] [[Cost and Loss]]
  • Big overhaul of [[Bias and Variance]] with regard to today's notes.

[[Machine Learning Specialization]] Notes:

These notes will be integrated into other notes but kept here in their entirety.

Bias and Variance

[[Machine Learning Specialization]] Advanced Machine Learning Algorithms, Week 3

Models almost never work the first time you try them out. Let's see how we can fix them.

Diagnosing Bias/Variance

See [[Bias and Variance]]

If you have more than a couple of features, you can't plot the fit to eyeball [[Bias and Variance]].

A more systematic way to tell whether you have high bias or high variance is to look at the performance of the algorithm on the [[Training, Test, and Dev Sets|training set and dev set]].

A characteristic of a high-bias (under-fit) model is that $J_{train}$ (the [[cost]] on the [[Training, Test, and Dev Sets|training set]]) is high. A characteristic of a high-variance (overfit) model is that $J_{cv}$ is high but $J_{train}$ is low.

Slide

![[Screenshot 2023-10-17 at 1.26.00 PM.png]]

Understanding Bias and Variance

Take the degree of the polynomial, $d$. As $d$ increases, that is to say as we add degrees to the polynomial, $J_{train}(\vec{w},b)$ will decrease. $J_{cv}(\vec{w},b)$ dips toward the middle, showing the ideal degree for the polynomial.
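A minimal numpy sketch of this sweep over $d$, on hypothetical one-dimensional data (the data and the halved-MSE cost are my own stand-ins, not the course's):

```python
import numpy as np

# Hypothetical 1-D data split into training and cross-validation sets.
rng = np.random.default_rng(0)
x_train, x_cv = rng.uniform(0, 4, 60), rng.uniform(0, 4, 20)
y_train = x_train**2 + rng.normal(0, 1, 60)
y_cv = x_cv**2 + rng.normal(0, 1, 20)

for d in range(1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=d)   # fit a degree-d polynomial
    j_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2) / 2
    j_cv = np.mean((np.polyval(coeffs, x_cv) - y_cv) ** 2) / 2
    print(f"d={d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
# J_train keeps falling as d grows; J_cv dips near the best degree, then rises.
```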

Slide

![[Screenshot 2023-10-17 at 1.29.21 PM.png]]

Technique for Diagnosing Bias and Variance

How do you tell if your algorithm has a high bias or variance problem?

One key takeaway: high bias means the model is not even doing well on the training set, while high variance means it does much worse on the cross-validation set than on the training set.

High Bias (under-fit)

In an under-fit model, $J_{train}$ will be high and $J_{train} \approx J_{cv}$.

High Variance (overfit)

In an overfit model, $J_{train}$ will be low and $J_{cv} \gg J_{train}$.

High Bias and High Variance

In a model with both high bias and high variance, $J_{train}$ will be high and $J_{cv} \gg J_{train}$. This is rare, and doesn't really happen for simple linear models on one-dimensional input, but it does happen: the model overfits on some part of the input and under-fits on another part.
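These three rules are easy to encode. A hedged sketch (the `diagnose` helper and its `gap` threshold are my own invention, not from the course):

```python
def diagnose(j_train: float, j_cv: float, baseline: float = 0.0,
             gap: float = 0.02) -> str:
    """Label a model from its training and cross-validation costs.

    `gap` is an arbitrary threshold for what counts as a meaningful
    difference; tune it to your problem's error scale.
    """
    high_bias = (j_train - baseline) > gap      # poor even on training data
    high_variance = (j_cv - j_train) > gap      # much worse on CV data
    if high_bias and high_variance:
        return "high bias AND high variance"
    if high_bias:
        return "high bias (under-fit)"
    if high_variance:
        return "high variance (overfit)"
    return "looks OK"

print(diagnose(j_train=0.15, j_cv=0.16))   # -> high bias (under-fit)
```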

Slide

![[Screenshot 2023-10-17 at 1.38.51 PM.png]]

Regularization and bias/variance

Linear Regression with Regularization

[[Regularized Linear Regression]] model: [[Lambda]] $\lambda$ is the [[regularization]] parameter. (Big #merge and cleanup needed here across [[regularization]], [[Regularized Linear Regression]], [[cost]], [[Cost and Loss]], [[loss]], [[Loss Function]], etc.; aliases are needed in many of these.)

Take this model: $$f_{\vec{w},b}(x)=w_1x+w_2x^2+w_3x^3+w_4x^4+b$$ with this [[regularization|regularized]] [[Cost and Loss]]: $$J(\vec{w},b)=\frac{1}{2m} \sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)^2+\frac{\lambda}{2m} \sum_{j=1}^{n}w_j^2$$
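A direct translation of that cost into numpy (a sketch; `regularized_cost` and the data shapes are hypothetical, and `f` can be any vectorized model):

```python
import numpy as np

def regularized_cost(w, b, X, y, lam, f):
    """J(w, b): squared-error term plus (lambda / 2m) * sum(w_j ** 2).

    `f` is any model f(w, b, X) -> predictions; note that the bias b
    is not regularized.
    """
    m = len(y)
    err = f(w, b, X) - y
    return (err @ err) / (2 * m) + (lam / (2 * m)) * (w @ w)

# Tiny usage example with a linear model and made-up shapes:
f_linear = lambda w, b, X: X @ w + b
w, b = np.array([0.5, -1.2]), 3.0
X, y = np.ones((4, 2)), np.zeros(4)
print(regularized_cost(w, b, X, y, lam=1.0, f=f_linear))
```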

If lambda is large, say $\lambda = 10{,}000$, then $w_1 \approx 0, w_2 \approx 0$ and $f_{\vec{w},b}(\vec{x}) \approx b$, so the model produces a flat line. This is under-fit, and $J_{train}$ will be high.

On the other hand, if we set $\lambda=0$, then we have a fourth-order polynomial with no regularization, and we end up with a very overfit curve.

So, how do we find a good value for $\lambda$?

Slide

![[Screenshot 2023-10-17 at 3.56.53 PM.png]]

Choosing the Regularization parameter $\lambda$

This is similar to choosing the degree of the polynomial $d$ with [[Training, Test, and Dev Sets|cross validation]]: try multiple values for $\lambda$ and then choose the option with the lowest cross-validation cost.
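A hedged sketch of that search, with scikit-learn's `Ridge` standing in for the course's regularized linear regression (its `alpha` plays the role of $\lambda$, up to a scaling constant) and hypothetical data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Hypothetical pre-split data.
rng = np.random.default_rng(1)
X_train, X_cv = rng.normal(size=(80, 3)), rng.normal(size=(20, 3))
w_true = np.array([2.0, -1.0, 0.5])
y_train = X_train @ w_true + rng.normal(0, 0.3, 80)
y_cv = X_cv @ w_true + rng.normal(0, 0.3, 20)

# A roughly doubling grid is one common choice of candidates.
candidates = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 10]
j_cv = []
for lam in candidates:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    j_cv.append(mean_squared_error(y_cv, model.predict(X_cv)) / 2)

print("best lambda:", candidates[int(np.argmin(j_cv))])
```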

Slide

![[Screenshot 2023-10-17 at 4.15.27 PM.png]]

Bias and Variance as a Function of regularization parameter $\lambda$

Cross validation tries out many values of $\lambda$ and then chooses the one with the lowest cost. Cross validation helps us find a good value of $d$ as well as $\lambda$.

Slide

![[Screenshot 2023-10-17 at 4.14.07 PM.png]]

Establishing A Baseline Level of Performance

Speech Recognition example

The job is to take in audio and output the text of what a person is saying.

Training error $J_{train}$ is the percentage of audio clips that the program does not transcribe correctly in their entirety. Let's say:

  • Human-level performance: 10.6%
  • $J_{train}$: 10.8%
  • $J_{cv}$: 14.8%

Why is human-level error so high? There is lots of noise in the audio, so it seems unfair to expect a learning algorithm to do much better. It is thus more useful to measure the training error against the human error. Looking at these results, $J_{train}$ is only 0.2% higher than human-level performance, whereas $J_{cv}$ is a full 4.2% higher than human-level performance (and 4.0% higher than $J_{train}$). We can thus conclude that this algorithm has more of a variance problem than a bias problem.

Establishing a baseline level of performance

What is the level of error you can reasonably hope to get to?

  • Human level performance
  • Competing algorithms performance
  • Guess based on experience

Bias Variance Examples

A gap between the baseline and the training error indicates high bias; a gap between the training error and the cross-validation error indicates high variance.

If your goal is perfection, the baseline would be zero. But for a lot of real-world tasks, like audio recognition, there is a lot of noise in the data, so you need a higher baseline.

If the gaps between all three are high, you have both high bias and high variance.
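In code, the two gaps read directly off the numbers from the speech example (a trivial sketch):

```python
baseline, j_train, j_cv = 0.106, 0.108, 0.148   # speech example above

bias_gap = j_train - baseline    # 0.2% -> bias looks fine
variance_gap = j_cv - j_train    # 4.0% -> variance problem
print(f"bias gap: {bias_gap:.1%}, variance gap: {variance_gap:.1%}")
```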

Slide

![[Screenshot 2023-10-17 at 4.42.49 PM.png]]

Learning Curves

Noted in [[Learning Curves]]

Overview

Learning curves help you understand how your learning algorithm is doing as a function of the amount of experience it has, where experience means the number of training examples.

The bigger the training set, the harder it is to fit all examples perfectly. Thus, as the training set grows, so does the training error $J_{train}(\vec{w},b)$.

Plotting a learning curve by training different models on different subsets of the training data is computationally expensive, so in practice it isn't done that often, but it's a good mental visualization.
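For illustration, a minimal sketch of tracing one by hand, with hypothetical data and scikit-learn standing in for the course's models (scikit-learn also automates this with `sklearn.model_selection.learning_curve`):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical data; the last 50 rows serve as a fixed CV split.
rng = np.random.default_rng(2)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
X_cv, y_cv = X[150:], y[150:]

for m in [10, 25, 50, 100, 150]:             # growing training subsets
    model = LinearRegression().fit(X[:m], y[:m])
    j_train = mean_squared_error(y[:m], model.predict(X[:m])) / 2
    j_cv = mean_squared_error(y_cv, model.predict(X_cv)) / 2
    print(f"m={m:3d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
# Plot these pairs against m to get the learning curve: J_train rises
# with m while J_cv falls toward it.
```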

Slides

![[Screenshot 2023-10-17 at 5.40.04 PM.png]]

High Bias Example

If a learning algorithm suffers from high bias, getting more training data will not (by itself) help that much.

Slide for High Bias

![[Screenshot 2023-10-17 at 5.44.29 PM.png]]

High Variance Example

If a learning algorithm suffers from high variance, getting more training data is likely to help.

Slide for High Variance

![[Screenshot 2023-10-17 at 5.49.45 PM.png]]

Deciding what to try next revisited

Examples when Debugging an Algorithm:

  • Get more training examples: fixes high variance.
  • Try a smaller set of features: fixes high variance.
  • Try getting additional features: fixes high bias. An algorithm that lacks information won't even do well on the training set.
  • Add polynomial features ($x_1^2, x_2^2$, etc.): fixes high bias.
  • Decrease $\lambda$: fixes high bias.
  • Increase $\lambda$: fixes high variance; forces the algorithm toward a smoother function.

Note! Don't randomly throw away training examples just to fix a high bias problem.

Takeaway

#merge with [[Bias and Variance]]? If your algorithm has high variance, try simplifying your model or getting more training data; simplification can mean a smaller set of features or increased [[regularization]]. If your algorithm has high bias, that is to say it's not even doing well on the training set, you mainly need to make your model more powerful and flexible so it can fit more complex functions. To do so, you can give it additional features, add polynomial features, or decrease $\lambda$.

Bias/Variance and Neural Networks

The bias variance tradeoff

Simple model = high bias; complex model = high variance. Before neural networks, we had to worry about balancing model complexity to trade off bias against variance. With neural networks, we are now mostly worried about high variance.

Neural Networks and bias variance

Large [[Neural Network|neural networks]] are low bias machines. If you make your neural network large enough you can almost always fit your training set well.

Recipe for decreasing bias with a neural network

  1. Train a neural network
  2. If the training set error $J_{train}(\vec{w},b)$ is high relative to your baseline, increase the size of the neural network by adding hidden layers.
  3. Once $J_{train}(\vec{w},b)$ is low enough, see if it does well on the cross validation set
  4. If the cross validation set $J_{cv}(\vec{w},b)$ is too high, add more data, then test again from step 2.
  5. Repeat until $J_{cv}(\vec{w},b)$ is low enough for your liking (see the sketch below).
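A pseudocode-style sketch of this recipe as a loop; `train`, `enlarge`, and `get_more_data` are hypothetical callables you would supply, not real APIs:

```python
def develop_model(train, enlarge, get_more_data, model, data,
                  baseline, target_cv_error):
    """Iterate the recipe: fix bias first, then variance.

    All callables are placeholders for your own training code:
      train(model, data)  -> (j_train, j_cv)
      enlarge(model)      -> a bigger network (more layers/units)
      get_more_data(data) -> an augmented dataset
    """
    while True:
        j_train, j_cv = train(model, data)      # step 1
        if j_train > baseline:                  # step 2: still high bias
            model = enlarge(model)
            continue
        if j_cv > target_cv_error:              # step 4: high variance
            data = get_more_data(data)
            continue                            # re-test from step 2
        return model                            # step 5: J_cv acceptable
```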
Slide Illustration

![[Screenshot 2023-10-17 at 6.38.46 PM.png]]

Limitations and Notes

Bigger networks are limited by your computing power, and getting more data is limited by how much data you actually have.

Sometimes you will ping-pong back and forth between high bias and high variance as you move through this recipe and develop a machine learning algorithm. Use these observations to shape what you do next in the process.

Neural Networks and Regularization

A large neural network will usually do as well or better than a smaller one so long as [[regularization]] is chosen appropriately. Of course, larger neural networks are more computationally expensive.

Neural Network Regularization #important

#function and [[TensorFlow]] implementation: ![[Screenshot 2023-10-17 at 6.46.28 PM.png]]

Note: we usually don't regularize $b$; it doesn't really affect anything.
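For reference, a minimal Keras sketch of L2 weight regularization along the lines of the slide (the layer sizes and $\lambda = 0.01$ here are assumptions on my part, not necessarily what the screenshot shows):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# L2 regularization is applied to the weights (kernel) only; the bias
# terms are left unregularized, matching the note above.
model = tf.keras.Sequential([
    layers.Dense(25, activation="relu",
                 kernel_regularizer=regularizers.L2(0.01)),
    layers.Dense(15, activation="relu",
                 kernel_regularizer=regularizers.L2(0.01)),
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.L2(0.01)),
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(0.001))
```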

[[Python]] Notes

Keywords

From [[codecademy]]

  • `continue` keyword: used inside a loop to skip the remaining loop body and begin the next iteration.
  • `break` keyword: escapes the loop, regardless of the iteration number. Once `break` executes, the program continues executing after the loop. (Both keywords are demonstrated below.)
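A quick runnable demonstration of both keywords:

```python
# `continue` skips the rest of this iteration; `break` exits the loop.
for n in range(1, 10):
    if n % 2 == 0:
        continue        # skip even numbers, go to the next iteration
    if n > 7:
        break           # stop the loop entirely once n passes 7
    print(n)            # prints 1, 3, 5, 7
```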

New Techniques

  • List comprehension
    • Run a loop within a list literal (expanded below):
    • `new_prices = [price - 5 for price in prices]`
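The comprehension next to its equivalent loop form, with a hypothetical `prices` list:

```python
prices = [30, 25, 40]                            # hypothetical input
new_prices = [price - 5 for price in prices]     # comprehension

# Equivalent loop form:
loop_prices = []
for price in prices:
    loop_prices.append(price - 5)

assert new_prices == loop_prices == [25, 20, 35]
```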
