
Commit

merged pull reqs
Aditya Grover authored and Aditya Grover committed Jan 16, 2017
1 parent c61071c commit 2e12f6a
Showing 1 changed file with 3 additions and 3 deletions.
preliminaries/introduction/index.md (6 changes: 3 additions & 3 deletions)
@@ -20,7 +20,7 @@ Perhaps the simplest model would be a linear equation of the form
y = \beta^T x,
{% endmath %}

- where $$y$$ is an outcome variable that we want to predict, and $$x$$ are known (given) variables that affect the outcome. For example, $$y$$ may be the price of a house, and $$x$$ are a series of factors that affect this price, e.g. the location, the number of bedrooms, the age of the house, etc. We assume that $$y$$ is a linear function of this inputs (parametrized by $$\beta$$).
+ where $$y$$ is an outcome variable that we want to predict, and $$x$$ are known (given) variables that affect the outcome. For example, $$y$$ may be the price of a house, and $$x$$ are a series of factors that affect this price, e.g. the location, the number of bedrooms, the age of the house, etc. We assume that $$y$$ is a linear function of this input (parametrized by $$\beta$$).
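
For concreteness, a minimal sketch of this linear prediction in Python; the feature values and weights below are made up purely for illustration:

```python
import numpy as np

# Made-up house features: [location score, number of bedrooms, age in years]
x = np.array([0.8, 3.0, 25.0])

# Made-up weights beta, one per feature
beta = np.array([120.0, 45.0, -1.5])

# Linear prediction y = beta^T x (e.g., price in thousands of dollars)
y = beta @ x
print(y)  # 193.5
```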

Often, the real world that we are trying to model is very complicated; in particular, it often involves a significant amount of *uncertainty* (e.g., the price of a house has a certain chance of going up if a new subway station opens within a certain distance). It is therefore very natural to deal with this uncertainty by modeling the world in the form of a probability distribution.{% sidenote 2 'For a more philosophical discussion of why one should use probability theory as opposed to something else, see the [Dutch book argument](http://plato.stanford.edu/entries/dutch-book/) for probabilism.'%}

@@ -64,7 +64,7 @@ Each factor {%m%}p(x_i | y){%em%} can be completely described by a small number

## Describing probabilities with graphs

- Our independence assumption can be conveniently represented in the form of a graph.{% marginfigure 'nb1' 'assets/img/naive-bayes.png' 'Graphical representation of the Naive Bayes spam classification model. We can interpret the directed graph as indicating a story of how the data was generated: first, we a spam/non-spam label was chosen at random; then a subset of $$n$$ possible English words sampled independently and at random.' %}
+ Our independence assumption can be conveniently represented in the form of a graph.{% marginfigure 'nb1' 'assets/img/naive-bayes.png' 'Graphical representation of the Naive Bayes spam classification model. We can interpret the directed graph as indicating a story of how the data was generated: first, a spam/non-spam label was chosen at random; then a subset of $$n$$ possible English words was sampled independently and at random.' %}
This representation has the immediate advantage of being easy to understand. It can be interpreted as telling us a story: an email was generated by first choosing at random whether the email is spam or not (indicated by $$y$$), and then by sampling words one at a time. Conversely, if we have a story of how our dataset was generated, we can naturally express it as a graph with an associated probability distribution.
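
This generative story is easy to write down as a short simulation. The sketch below assumes a tiny four-word vocabulary and made-up probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters over a tiny four-word vocabulary
vocab = ["pill", "meeting", "free", "report"]
p_spam = 0.3                                              # p(y = spam)
p_word_given_spam = np.array([0.6, 0.1, 0.7, 0.05])       # p(x_i = 1 | y = spam)
p_word_given_not_spam = np.array([0.02, 0.5, 0.1, 0.4])   # p(x_i = 1 | y = not spam)

def sample_email():
    """Follow the story: first draw the label, then draw each word independently given it."""
    y = rng.random() < p_spam
    p_words = p_word_given_spam if y else p_word_given_not_spam
    x = rng.random(len(vocab)) < p_words
    return y, x

y, x = sample_email()
print("spam" if y else "not spam", [w for w, present in zip(vocab, x) if present])
```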

More importantly, we want to submit various queries to the model (e.g. what is the probability of spam given that I see the word "pill"?); answering these questions will require specialized algorithms that will be most naturally defined using graph-theoretical concepts. We will also use graph theory to analyze the speed of learning algorithms and to quantify the computational complexity (e.g. NP-hardness) of different learning tasks.
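
For instance, the spam-given-"pill" query can be answered directly with Bayes' rule; the numbers in the sketch below are again made up:

```python
# Made-up parameters for the single word "pill"
p_spam = 0.3                  # p(y = spam)
p_pill_given_spam = 0.6       # p(x_pill = 1 | y = spam)
p_pill_given_not_spam = 0.02  # p(x_pill = 1 | y = not spam)

# Bayes' rule: p(spam | pill) = p(pill | spam) p(spam) / p(pill)
p_pill = p_pill_given_spam * p_spam + p_pill_given_not_spam * (1 - p_spam)
p_spam_given_pill = p_pill_given_spam * p_spam / p_pill
print(p_spam_given_pill)  # ~0.93: seeing "pill" makes spam much more likely
```
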
@@ -100,4 +100,4 @@ It turns out that inference is a very challenging task. For many probabilities o

### Learning

- Our last key task refers to fitting a model to a dataset, which could be for example a large number of labeled examples of spam. By looking at the data, we can infer useful patterns (e.g. which word are found more frequently in spam emails), which we can then use to make predictions about the future. However, we will see that learning and inference are also inherently linked in a more subtle way, since inference will turn out to be a key subroutine that we will repeatedly call within learning algorithms. Also, the topic of learning will feature important connections to the field of computational learning theory --- which deals with questions such as generalization from limited data and overfitting --- as well as to Bayesian statistics --- which tells us (among other things) about how to combine prior knowledge and observed evidence in a principled way.
+ Our last key task refers to fitting a model to a dataset, which could be, for example, a large number of labeled examples of spam. By looking at the data, we can infer useful patterns (e.g. which words are found more frequently in spam emails), which we can then use to make predictions about the future. However, we will see that learning and inference are also inherently linked in a more subtle way, since inference will turn out to be a key subroutine that we will repeatedly call within learning algorithms. Also, the topic of learning will feature important connections to the field of computational learning theory --- which deals with questions such as generalization from limited data and overfitting --- as well as to Bayesian statistics --- which tells us (among other things) about how to combine prior knowledge and observed evidence in a principled way.
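
As a rough illustration of what learning amounts to in the Naive Bayes model, the sketch below estimates $$p(y)$$ and each $$p(x_i \mid y)$$ by counting word occurrences in a tiny made-up labeled dataset; add-one smoothing is one common choice for avoiding zero counts:

```python
import numpy as np

# Made-up labeled dataset: rows are binary bag-of-words vectors over
# the vocabulary ["pill", "meeting", "free", "report"]; y marks spam (1) vs. not spam (0)
X = np.array([[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 1, 0, 0]])
y = np.array([1, 1, 0, 0])

# Estimate the Naive Bayes parameters by counting,
# with add-one (Laplace) smoothing to avoid zero probabilities
p_spam = y.mean()
p_word_given_spam = (X[y == 1].sum(axis=0) + 1) / ((y == 1).sum() + 2)
p_word_given_not_spam = (X[y == 0].sum(axis=0) + 1) / ((y == 0).sum() + 2)

print(p_spam)                  # 0.5
print(p_word_given_spam)       # [0.75 0.25 0.5  0.25]
print(p_word_given_not_spam)   # [0.25 0.75 0.25 0.5 ]
```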
