...menustart
- Bayes' Nets: Inference
- Recap: Example: Alarm Network
- Inference
- Inference by Enumeration
- Inference by Enumeration in Bayes’ Net
- Inference by Enumeration vs. Variable Elimination
- Factor Zoo
- Inference by Enumeration: Procedural Outline
- General Variable Elimination
- Variable Elimination Ordering
- VE: Computational and Space Complexity
- Polytrees
- Quiz BN2-2
- Quiz BN2-3
...menuend
- Inference: calculating some useful quantity from a joint probability distribution
- Examples:
- Posterior probability
- P(Q | E₁=e₁, …, Eₖ=eₖ)
- computing the posterior probability of some set of query variables conditioned on some evidence having been observed
- Most likely explanation
- argmax_q P(Q=q | E₁=e₁, …)
- given some evidence has been observed, what is the most likely assignment to the query variables?
- the first, naive type of inference is inference by enumeration
- we have some query, and the query has
- some evidence variables E₁ … Eₖ ,
- some query variables Q , the ones whose distribution we want given the evidence variables
- some hidden variables H₁ … Hᵣ , the ones that are neither query variables nor evidence variables, but which still appear in our joint distribution
- we have to deal with them: effectively, we sum them out to get rid of them
- see details in Probability
- Given unlimited time, inference in BNs is easy
- Reminder of inference by enumeration, by example:
- P(B|+j,+m) ∝ P(B,+j,+m)
- = ∑e,a P(B,e,a,+j,+m)
- now we can use the definition of the joint distribution as specified by the Bayes' net:
- = ∑e,a P(B)P(e)P(a|B,e)P(+j|a)P(+m|a)
- = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a) + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
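For concreteness, here is a minimal Python sketch of this enumeration. The CPT numbers are illustrative placeholders (the notes don't give them), and everything else in the sketch is an assumption of this example:

```python
from itertools import product

# Illustrative CPTs for the alarm network; these numbers are placeholders,
# not values given in these notes.
def P_B(b): return 0.001 if b else 0.999
def P_E(e): return 0.002 if e else 0.998
def P_A(a, b, e):
    p = {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001}[(b, e)]  # P(+a | b, e)
    return p if a else 1 - p
def P_J(j, a):
    p = 0.9 if a else 0.05   # P(+j | a)
    return p if j else 1 - p
def P_M(m, a):
    p = 0.7 if a else 0.01   # P(+m | a)
    return p if m else 1 - p

# P(B | +j, +m): sum the full joint over the hidden variables e, a,
# then normalize at the very end.
score = {}
for b in (True, False):
    score[b] = sum(P_B(b) * P_E(e) * P_A(a, b, e) * P_J(True, a) * P_M(True, a)
                   for e, a in product((True, False), repeat=2))
z = sum(score.values())
print({b: s / z for b, s in score.items()})   # posterior over Burglary
```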
- If we have 100 variables in the BN and 50 of them are evidence variables, then there are 50 non-evidence variables, and we would be summing over 2⁵⁰ entries
- it gets very expensive: exponential in the number of non-evidence variables, unless almost everything is evidence
- It's not ideal to do it this way
- Why is inference by enumeration so slow?
- Idea: interleave joining and marginalizing!
- Called "Variable Elimination"
- Still NP-hard, but usually much faster than inference by enumeration
- we'll find a way to interleave joining CPTs together and summing out over hidden variables
- keep in mind it's not a silver bullet: inference in BNs is NP-hard. There are BNs where, no matter what you do, computing the answer to the query is equivalent to solving a SAT (Boolean satisfiability) problem, which nobody knows how to solve efficiently.
- First we’ll need some new notation: factors
- Joint distribution: P(X,Y)
- Entries P(x,y) for all x, y
- Sums to 1
Example: P (T,W)
T | W | P |
---|---|---|
hot | sun | 0.4 |
hot | rain | 0.1 |
cold | sun | 0.2 |
cold | rain | 0.3 |
- Selected joint: P(x,Y)
- A slice of the joint distribution
- Entries P(x,y) for fixed x, all y
- Sums to P(x)
- factors don't have to sum to 1
Example: P( cold , W)
T | W | P |
---|---|---|
cold | sun | 0.2 |
cold | rain | 0.3 |
- Number of capitals = dimensionality of the table
- So as we work through the variable elimination process, the game will be one of trying to keep the number of capitalized variables in our factors small.
- Single conditional: P(Y | x)
- Entries P(y | x) for fixed x, all y
- Sums to 1
Example: P(W|cold)
T | W | P |
---|---|---|
cold | sun | 0.4 |
cold | rain | 0.6 |
- Family of conditionals: P(X | Y)
- Multiple conditionals
- Entries P(x | y) for all x, y
- Sums to |Y|
Example: P(W|T)
T | W | P |
---|---|---|
hot | sun | 0.8 |
hot | rain | 0.2 |
cold | sun | 0.4 |
cold | rain | 0.6 |
- Specified family: P( y | X )
- Entries P(y | x) for fixed y, but for all x
- Sums to … who knows!
Example: P(rain | T)
T | W | P |
---|---|---|
hot | rain | 0.2 |
cold | rain | 0.6 |
- In general, when we write P(Y₁ … Yₙ | X₁ … Xₘ)
- It is a “factor,” a multi-dimensional array
- Its values are P(y₁ … yₙ | x₁ … xₘ)
- Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array
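A tiny Python sketch of these factor types, using the P(T,W) table from above (the dict representation is my own choice here, not something fixed by the notes):

```python
# The joint P(T, W) from the table above; sums to 1.
P_TW = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

# Selected joint P(cold, W): slice the table at T=cold; sums to P(cold) = 0.5.
P_coldW = {w: p for (t, w), p in P_TW.items() if t == 'cold'}

# Single conditional P(W | cold): normalize the slice; sums to 1.
z = sum(P_coldW.values())
print({w: p / z for w, p in P_coldW.items()})   # {'sun': 0.4, 'rain': 0.6}
```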
- R → T → L
- Random Variables
- R: Raining
- T: Traffic
- L: Late for class!
P(R)
+r | 0.1 |
---|---|
-r | 0.9 |
P(T|R)
+r | +t | 0.8 |
---|---|---|
+r | -t | 0.2 |
-r | +t | 0.1 |
-r | -t | 0.9 |
P(L|T)
+t | +l | 0.3 |
---|---|---|
+t | -l | 0.7 |
-t | +l | 0.1 |
-t | -l | 0.9 |
- query: P(L) = ?
- for inference by enumeration:
- P(L) = ∑r,t P(r,t,L)
- = ∑r,t P(r)P(t|r)P(L|t)
- Track objects called factors
- Initial factors are local CPTs (one per node)
- Any known values are selected
- Procedure: Join all factors, then eliminate all hidden variables
- First basic operation: joining factors
- Combining factors:
- Just like a database join
- Get all factors over the joining variable
- Build a new factor over the union of the variables involved
- Example: Join on R
- Computation for each entry: pointwise products
- ∀r,t : P(r,t) = P(r)·P(t|r)
- "Join on R" means that you grab all tables that have R in them.
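A minimal sketch of "Join on R" with the tables above, computing each entry as a pointwise product:

```python
# P(R) and P(T|R) from the tables above.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

# Join on R: build a new factor over the union of the variables, P(R, T),
# where each entry is the pointwise product P(r) * P(t|r).
P_RT = {(r, t): P_R[r] * p for (r, t), p in P_T_given_R.items()}
print(P_RT)
# {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```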
- Second basic operation: marginalization
- Take a factor and sum out a variable
- Shrinks a factor to a smaller one
- A projection operation
- get rid of the variables that don't matter -- the hidden variables
- why do we even have hidden variables?
- the reason we have them is that we started with a joint distribution over more variables than the ones that appear in our query
- Example: summing out R from P(R,T), produced by the join above, gives P(T)
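Continuing the join sketch above, summing out R shrinks P(R,T) to P(T):

```python
from collections import defaultdict

# P(R, T) as produced by the join above.
P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

# Sum out (marginalize) R: project the factor onto T.
P_T = defaultdict(float)
for (r, t), p in P_RT.items():
    P_T[t] += p
print(dict(P_T))   # {'+t': 0.17, '-t': 0.83}
```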
- Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
- Marginalizing Early (= Variable Elimination)
- intuition
- if you want to eliminate a variable, you cannot do so until you have joined on that variable
- so that is variable elimination: join on one variable, sum it out right away, and repeat
- what if you have evidence?
- just like with inference by enumeration, when there is evidence you just look at your tables and retain only those entries consistent with your evidence
- If evidence, start with factors that select that evidence
- We eliminate all vars other than query + evidence
- Result will be a selected joint of query and evidence
- To get our answer, just normalize this!
- That's it!
- Query: P(Q | E₁=e₁, …, Eₖ=eₖ)
- Start with initial factors:
- Local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
- Pick a hidden variable H
- any ordering of hidden variables is valid
- but some orderings will lead to very big factors being generated along the way
- and some orderings might be able to keep the factors generated along the way very small
- Join all factors mentioning H
- Eliminate (sum out) H
- Join all remaining factors and normalize
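Here is a minimal, generic sketch of this loop, assuming binary variables and a factor representation of (variable list, table keyed by value tuples); the representation and all names are choices made for this sketch:

```python
from itertools import product
from collections import defaultdict

def join(factors):
    """Pointwise product over the union of the variables involved."""
    variables = sorted({v for vars_, _ in factors for v in vars_})
    table = {}
    for values in product((0, 1), repeat=len(variables)):   # binary variables
        env = dict(zip(variables, values))
        p = 1.0
        for vars_, t in factors:
            p *= t[tuple(env[v] for v in vars_)]
        table[values] = p
    return variables, table

def sum_out(var, factor):
    """Marginalize var out of a factor, shrinking it by one dimension."""
    vars_, table = factor
    i = vars_.index(var)
    out = defaultdict(float)
    for values, p in table.items():
        out[values[:i] + values[i + 1:]] += p
    return vars_[:i] + vars_[i + 1:], dict(out)

def variable_elimination(factors, order):
    for h in order:                                    # any ordering is valid
        touching = [f for f in factors if h in f[0]]   # join all factors mentioning h
        factors = [f for f in factors if h not in f[0]]
        factors.append(sum_out(h, join(touching)))     # then eliminate (sum out) h
    vars_, table = join(factors)                       # join remaining factors
    z = sum(table.values())
    return vars_, {k: v / z for k, v in table.items()} # and normalize

# The R -> T -> L network from earlier: query P(L), eliminating R then T.
f_R  = (['R'], {(1,): 0.1, (0,): 0.9})
f_TR = (['R', 'T'], {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.9})
f_LT = (['T', 'L'], {(1, 1): 0.3, (1, 0): 0.7, (0, 1): 0.1, (0, 0): 0.9})
print(variable_elimination([f_R, f_TR, f_LT], order=['R', 'T']))
# (['L'], {(0,): ~0.866, (1,): ~0.134})
```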
- P(B|j,m) ∝ P(B,j,m)
- after introducing evidence, we have the following factors:
- P(B) , P(E) , P(A|B,E) , P(j|A) , P(m|A)
- Choose A
- Choose E
- Finish with B
- equations from top to bottom:
- the marginal can be obtained from the joint by summing out
- use the Bayes' net joint distribution expression
- use x·(y+z) = xy + xz
- joining on a, and then summing out, gives f₁
- use x·(y+z) = xy + xz
- joining on e, and then summing out, gives f₂
- All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
- how do you decide which variable to pick first?
- the suggestion here was a variable with very few connections, where a connection means participating in a factor with other variables
- Query: P(X₃|Y₁=y₁,Y₂=y₂,Y₃=y₃)
- Start by inserting evidence , which gives the following initial factors:
- p(Z),p(X₁|Z),p(X₂|Z),p(X₃|Z),p(y₁|X₁),p(y₂|X₂),p(y₃|X₃)
- Eliminate X₁. This introduces the factor f₁(Z,y₁) = ∑ₓ₁ p(x₁|Z)p(y₁|x₁) , and we are left with
- p(Z),f₁(Z,y₁),p(X₂|Z),p(X₃|Z),p(y₂|X₂),p(y₃|X₃)
- Eliminate X₂. This introduces the factor f₂(Z,y₂) = ∑ₓ₂ p(x₂|Z)p(y₂|x₂) , and we are left with
- p(Z),f₁(Z,y₁),f₂(Z,y₂),p(X₃|Z),p(y₃|X₃)
- Eliminate Z. This introduces the factor f₃(y₁,y₂,X₃) = ∑z p(z)f₁(z,y₁)f₂(z,y₂)p(X₃|z) , and we are left with
- p(y₃|X₃),f₃(y₁,y₂,X₃)
- quiz: why is there a factor p(X₃|z)? why not some other form, like p(X₃,z) or p(Z,y)?
- X₃ is the query variable, and the query variable doesn't go through the elimination process the way a hidden variable does
- No hidden variables left. Join the remaining factors to get
- f₄(y₁,y₂,y₃, X₃) = p(y₃|X₃)·f₃(y₁,y₂,X₃)
- Normalizing over X₃ gives P(X₃|y₁,y₂,y₃)
- For the query P(Xₙ|y₁,…,yₙ), work through the following two different orderings as done in the previous slide: Z, X₁, …, Xₙ₋₁ and X₁, …, Xₙ₋₁, Z. What is the size of the maximum factor generated for each of the orderings?
- Answer: 2ⁿ⁺¹ versus 2² (assuming binary)
- In general: the ordering can greatly affect efficiency.
- starting from X₁ we get f₁(Z,y₁), then X₂ gives f₂(Z,y₂), and so on up to fₙ₋₁(Z,yₙ₋₁)
- not Xₙ : Xₙ is the query variable
- now we eliminate Z, and we get f(y₁, …, yₙ₋₁, Xₙ) ; again only 2 variables (Z and Xₙ) are involved in the join
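This comparison can be checked with a small scope-tracking sketch (n = 5 for concreteness). It only tracks which variables each generated factor mentions, counting the joined factor before the hidden variable is summed out, which is the convention behind the 2ⁿ⁺¹ vs. 2² answer:

```python
# Network Z -> X_i -> y_i with evidence y_i fixed: factor scopes are
# {Z}, {Z, X_i} for each p(X_i|Z), and {X_i} for each p(y_i|X_i).
n = 5
Xs = [f'X{i}' for i in range(1, n + 1)]
factors = [{'Z'}] + [{'Z', x} for x in Xs] + [{x} for x in Xs]

def max_join_scope(order, factors):
    biggest = 0
    for h in order:
        joined = set().union(*[f for f in factors if h in f])
        biggest = max(biggest, len(joined))   # scope before summing out h
        factors = [f for f in factors if h not in f] + [joined - {h}]
    return biggest

print(max_join_scope(['Z'] + Xs[:-1], list(factors)))  # 6 = n+1 -> 2^(n+1) entries
print(max_join_scope(Xs[:-1] + ['Z'], list(factors)))  # 2       -> 2^2 entries
```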
- The computational and space complexity of variable elimination is determined by the largest factor
- The elimination ordering can greatly affect the size of the largest factor.
- E.g., previous slide’s example: 2ⁿ vs. 2 counting the factor left after summing out (2ⁿ⁺¹ vs. 2² counting the joined factor before summing out)
- Does there always exist an ordering that only results in small factors?
- No!
- There are BNs, and we'll see one in the next slides, where no matter which ordering you pick, large factors get generated along the way.
- CSP
- a 3-SAT problem, a special kind of CSP
- there are 7 variables, X₁ … X₇ ; we want to find an assignment to these 7 variables such that all the clauses are true
- the clauses say: x₁ or x₂ or not x₃ has to be true, and not x₁ or x₃ or not x₄ has to be true, and so forth
- P(Xᵢ=0) = P(Xᵢ=1) = 0.5
- Y₁ = X₁ ∨ X₂ ∨ ¬X₃ , ... , Y₈ = ¬X₅ ∨ X₆ ∨ X₇
- Y₁‚₂ = Y₁ ∧ Y₂ , ... , Y₇‚₈ = Y₇ ∧ Y₈
- Y₁‚₂‚₃‚₄ = Y₁‚₂ ∧ Y₃‚₄ , Y₅‚₆‚₇‚₈ = Y₅‚₆ ∧ Y₇‚₈
- Z = Y₁‚₂‚₃‚₄ ∧ Y₅‚₆‚₇‚₈
- why do we use so many Y variables here?
- a BN in which a variable has many parents has huge CPTs; it is not very compact
- the Y variables give us a BN where every variable has at most 3 parents, so it is a very compact BN
- If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
- Hence inference in Bayes’ nets is NP-hard. No known efficient probabilistic inference in general.
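A brute-force sketch of the key point: under uniform priors on the Xᵢ, P(Z=1) is exactly the fraction of satisfying assignments, so P(Z=1) > 0 iff the formula is satisfiable. Only the first two clauses below are from the notes; restricting to 4 variables is an illustrative simplification:

```python
from itertools import product

# Clauses as (variable index, required truth value) literals:
# x1 v x2 v ~x3, and ~x1 v x3 v ~x4 (the two clauses spelled out above).
clauses = [((1, True), (2, True), (3, False)),
           ((1, False), (3, True), (4, False))]

def satisfied(assignment):
    return all(any(assignment[i] == val for i, val in clause) for clause in clauses)

n = 4
count = sum(satisfied(dict(enumerate(bits, start=1)))
            for bits in product((False, True), repeat=n))
print(count / 2 ** n)   # P(Z=1); nonzero iff the 3-SAT formula is satisfiable
```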
There exist special graph structures of BNs where inference can be done efficiently. One example is the polytree.
- A polytree is a directed graph with no undirected cycles
- so a directed acyclic graph can still have undirected cycles
- but if you don't allow those undirected cycles, you have a polytree
- For poly-trees you can always find an ordering that is efficient
- Try it!!
- Cut-set conditioning for Bayes’ net inference
- Choose set of variables such that if removed only a polytree remains
- Exercise: Think about how the specifics would work out!
- BN structure: A → B , C ← B → D , C → E ← D
- Query: P(C | e=1)
- we are given the probability tables P(A), P(B|A), P(C|B), P(D|B), P(E|C,D)
- Step 1: eliminate A
- so we get the factor A→B : f₁(B) = ∑ₐ P(a)P(B|a)
B | A | P(A,B) |
---|---|---|
0 | 0 | 0.35 |
1 | 0 | 0.35 |
0 | 1 | 0.27 |
1 | 1 | 0.03 |
sum out A, we get
B | f₁(B) |
---|---|
0 | 0.62 |
1 | 0.38 |
- Step 2: eliminate B.
- so we get the factor C←B→D : f₂(C,D) = ∑b P(C|b)P(D|b)f₁(b)
B | C | D | f₁(B)·P(C|B)·P(D|B) |
---|---|---|---|
0 | 0 | 0 | 0.155 |
0 | 1 | 0 | 0.155 |
0 | 0 | 1 | 0.155 |
0 | 1 | 1 | 0.155 |
1 | 0 | 0 | 0.0076 |
1 | 1 | 0 | 0.0684 |
1 | 0 | 1 | 0.0304 |
1 | 1 | 1 | 0.2736 |
sum out B, we get
C | D | f₂(C,D) |
---|---|---|
0 | 0 | 0.1626 |
1 | 0 | 0.2234 |
0 | 1 | 0.1854 |
1 | 1 | 0.4286 |
- Step 3: eliminate D.
- so we get factor C→E←D: f₃(C,e=1) = ∑d P(e=1|C,d)f₂(C,d)
C | D | e | P(e=1|C,d)·f₂(C,d) |
---|---|---|---|
0 | 0 | 1 | 0.900*0.1626 |
1 | 0 | 1 | 0.600*0.2234 |
0 | 1 | 1 | 0.700*0.1854 |
1 | 1 | 1 | 0.500*0.4286 |
sum out D , we get
C | f₃(C,e=1) |
---|---|
0 | 0.27612 |
1 | 0.34834 |
- After getting the final factor f₃(C,e=1), a final renormalization step needs to be carried out to obtain the conditional probability P(C|e=1).
C | P(C|e=1) |
---|---|
0 | 0.44217404 |
1 | 0.55782596 |
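A quick numeric check of Step 3 and the renormalization, using the f₂ entries and the P(e=1|C,d) multipliers from the tables above:

```python
# f2(C, D) and P(e=1 | C, d) copied from the tables above.
f2   = {(0, 0): 0.1626, (1, 0): 0.2234, (0, 1): 0.1854, (1, 1): 0.4286}
P_e1 = {(0, 0): 0.900,  (1, 0): 0.600,  (0, 1): 0.700,  (1, 1): 0.500}

# Step 3: join on D's factors and sum out D -> f3(C, e=1).
f3 = {c: sum(P_e1[(c, d)] * f2[(c, d)] for d in (0, 1)) for c in (0, 1)}
print(f3)   # {0: 0.27612, 1: 0.34834}

# Final renormalization -> P(C | e=1).
z = sum(f3.values())
print({c: p / z for c, p in f3.items()})   # {0: ~0.4422, 1: ~0.5578}
```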
- BN structure: A → B ; A,B → C ; C → D ; C → E ; D,E → F
- Query: P(A|+f)
- we run variable elimination with the following ordering: E, D, C, B
- After introducing evidence, we have the following factors:
- P(A) , P(B|A), P(C|A,B), P(D|C), P(E|C), P(+f|E,D)
- Step 1:
- After joining on E and summing out over E, we have generated a new factor f₁ over the following variables and/or evidence
- C,D,+f
- the factors containing variable E are P(E|C) and P(+f|E,D)
- together they contain the variables C, D, E, +f
- E is not included after summing it out, which leaves
- C,D,+f
- After this step, the remaining factors are:
- P(A) , P(B|A), P(C|A,B), P(D|C), f₁
- P(E|C) and P(+f|E,D), which contain variable E, are not available any more
- and the new factor f₁ comes in
- Step 2:
- After joining on D and summing out over D, we have generated a new factor f₂ over the following variables and/or evidence
- C,+f
- After this step, the remaining factors are:
- P(A) , P(B|A), P(C|A,B), f₂
- P(D|C) is not available any more
- f₁ was consumed as well
- and the new factor f₂ comes in
- Step 3:
- After joining on C and summing out over C, we have generated a new factor f₃ over the following variables and/or evidence
- A,B,+f
- After this step, the remaining factors are:
- P(A) , P(B|A), f₃
- Step 4: After joining on B and summing out over B, we have generated a new factor f₄ over the following variables and/or evidence
- A,+f
- After this step, the remaining factors are:
- P(A) , f₄
- Finally, joining the remaining factors P(A) and f₄ and normalizing gives the answer P(A|+f)
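The whole quiz can be replayed symbolically by tracking only factor scopes (evidence +f is fixed, so P(+f|E,D) has scope {D,E}); a short sketch:

```python
# Initial factor scopes after introducing evidence +f.
factors = [{'A'}, {'A', 'B'}, {'A', 'B', 'C'}, {'C', 'D'}, {'C', 'E'}, {'D', 'E'}]

for hidden in ['E', 'D', 'C', 'B']:
    joined = set().union(*[f for f in factors if hidden in f])  # join on hidden
    new_factor = joined - {hidden}                              # then sum out hidden
    factors = [f for f in factors if hidden not in f] + [new_factor]
    print(f"eliminate {hidden}: new factor over {sorted(new_factor)}, "
          f"remaining scopes {[sorted(f) for f in factors]}")
# Prints f1 over C,D then f2 over C then f3 over A,B then f4 over A,
# matching Steps 1-4 above.
```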