...menustart
- Bayes' Nets: Inference
- Recap: Example: Alarm Network
- Inference
- Inference by Enumeration
- Inference by Enumeration in Bayes’ Net
- Inference by Enumeration vs. Variable Elimination
- Factor Zoo
- Inference by Enumeration: Procedural Outline
- General Variable Elimination
- Variable Elimination Ordering
- VE: Computational and Space Complexity
- Polytrees
- Quiz BN2-2
- Quiz BN2-3
...menuend
- Inference: calculating some useful quantity from a joint probability distribution
- Examples:
- Posterior probability
- P(Q | E₁=e₁, …, Eₖ=eₖ)
- computing the posterior probability of some set of query variables conditioned on some evidence having been observed
- Most likely explanation
- argmax_q P(Q=q | E₁=e₁, …)
- given some evidence has been observed, what is the most likely assignment to the query variables?
- the first, naive type of inference is inference by enumeration
- we have some query, and the query has
- some evidence variables E₁ … Eₖ ,
- some query variables Q , the ones whose distribution we want given the evidence variables
- some hidden variables H₁ … Hᵣ , the ones that are neither query variables nor evidence variables, but which still appear in our joint distribution
- we have to deal with them: effectively, we sum them out to get rid of them
- see details in Probability
- Given unlimited time, inference in BNs is easy
- Reminder of inference by enumeration, by example:
- P(B|+j,+m) ∝ P(B,+j,+m)
- = ∑e,a P(B,e,a,+j,+m)
- now we can use the definition of the joint distribution as specified by the Bayes' net:
- = ∑e,a P(B)P(e)P(a|B,e)P(+j|a)P(+m|a)
- = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a) + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
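For concreteness, here is a minimal Python sketch of this enumeration. The CPT numbers are illustrative placeholders (the notes don't give them), and everything else in the sketch is an assumption of this example:

```python
from itertools import product

# Illustrative CPTs for the alarm network; these numbers are placeholders,
# not values given in these notes.
def P_B(b): return 0.001 if b else 0.999
def P_E(e): return 0.002 if e else 0.998
def P_A(a, b, e):
    p = {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001}[(b, e)]  # P(+a | b, e)
    return p if a else 1 - p
def P_J(j, a):
    p = 0.9 if a else 0.05   # P(+j | a)
    return p if j else 1 - p
def P_M(m, a):
    p = 0.7 if a else 0.01   # P(+m | a)
    return p if m else 1 - p

# P(B | +j, +m): sum the full joint over the hidden variables e, a,
# then normalize at the very end.
score = {}
for b in (True, False):
    score[b] = sum(P_B(b) * P_E(e) * P_A(a, b, e) * P_J(True, a) * P_M(True, a)
                   for e, a in product((True, False), repeat=2))
z = sum(score.values())
print({b: s / z for b, s in score.items()})   # posterior over Burglary
```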
- If we have 100 variables in the BN and 50 of them are evidence variables, then there are 50 non-evidence variables, and we would be summing over 2⁵⁰ entries
- it gets very expensive: exponential in the number of non-evidence variables, unless almost everything is evidence
- It's not ideal to do it this way
- Why is inference by enumeration so slow?
- Idea: interleave joining and marginalizing!
- Called "Variable Elimination"
- Still NP-hard, but usually much faster than inference by enumeration
- we'll find a way to interleave joining CPTs together and summing out over hidden variables
- keep in mind it's not a silver bullet: inference in BNs is NP-hard. There are BNs where, no matter what you do, computing the answer to the query is equivalent to solving a SAT (Boolean satisfiability) problem, which nobody knows how to solve efficiently.
- First we’ll need some new notation: factors
- Joint distribution: P(X,Y)
- Entries P(x,y) for all x, y
- Sums to 1
Example: P (T,W)
T | W | P |
---|---|---|
hot | sun | 0.4 |
hot | rain | 0.1 |
cold | sun | 0.2 |
cold | rain | 0.3 |
- Selected joint: P(x,Y)
- A slice of the joint distribution
- Entries P(x,y) for fixed x, all y
- Sums to P(x)
- factors don't have to sum to 1
Example: P( cold , W)
T | W | P |
---|---|---|
cold | sun | 0.2 |
cold | rain | 0.3 |
- Number of capitals = dimensionality of the table
- So as we work through the variable elimination process, the game will be one of trying to keep the number of capitalized variables in our factors small.
- Single conditional: P(Y | x)
- Entries P(y | x) for fixed x, all y
- Sums to 1
Example: P(W|cold)
T | W | P |
---|---|---|
cold | sun | 0.4 |
cold | rain | 0.6 |
- Family of conditionals: P(X | Y)
- Multiple conditionals
- Entries P(x | y) for all x, y
- Sums to |Y|
Example: P(W|T)
T | W | P |
---|---|---|
hot | sun | 0.8 |
hot | rain | 0.2 |
cold | sun | 0.4 |
cold | rain | 0.6 |
- Specified family: P( y | X )
- Entries P(y | x) for fixed y, but for all x
- Sums to … who knows!
Example: P(rain | T)
T | W | P |
---|---|---|
hot | rain | 0.2 |
cold | rain | 0.6 |
- In general, when we write P(Y₁ … Yₙ | X₁ … Xₘ)
- It is a “factor,” a multi-dimensional array
- Its values are P(y₁ … yₙ | x₁ … xₘ)
- Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array
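A tiny Python sketch of these factor types, using the P(T,W) table from above (the dict representation is my own choice here, not something fixed by the notes):

```python
# The joint P(T, W) from the table above; sums to 1.
P_TW = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

# Selected joint P(cold, W): slice the table at T=cold; sums to P(cold) = 0.5.
P_coldW = {w: p for (t, w), p in P_TW.items() if t == 'cold'}

# Single conditional P(W | cold): normalize the slice; sums to 1.
z = sum(P_coldW.values())
print({w: p / z for w, p in P_coldW.items()})   # {'sun': 0.4, 'rain': 0.6}
```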
- R → T → L
- Random Variables
- R: Raining
- T: Traffic
- L: Late for class!
P(R)
+r | 0.1 |
---|---|
-r | 0.9 |
P(T|R)
+r | +t | 0.8 |
---|---|---|
+r | -t | 0.2 |
-r | +t | 0.1 |
-r | -t | 0.9 |
P(L|T)
+t | +l | 0.3 |
---|---|---|
+t | -l | 0.7 |
-t | +l | 0.1 |
-t | -l | 0.9 |
- query: P(L) = ?
- for inference by enumeration:
- P(L) = ∑r,t P(r,t,L)
- = ∑r,t P(r)P(t|r)P(L|t)
- Track objects called factors
- Initial factors are local CPTs (one per node)
- Any known values are selected
- Procedure: Join all factors, then eliminate all hidden variables
- First basic operation: joining factors
- Combining factors:
- Just like a database join
- Get all factors over the joining variable
- Build a new factor over the union of the variables involved
- Example: Join on R
- Computation for each entry: pointwise products
- ∀r,t : P(r,t) = P(r)·P(t|r)
- "Join on R" means that you grab all tables that have R in them.
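A minimal sketch of "Join on R" with the tables above, computing each entry as a pointwise product:

```python
# P(R) and P(T|R) from the tables above.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

# Join on R: build a new factor over the union of the variables, P(R, T),
# where each entry is the pointwise product P(r) * P(t|r).
P_RT = {(r, t): P_R[r] * p for (r, t), p in P_T_given_R.items()}
print(P_RT)
# {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```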
- Second basic operation: marginalization
- Take a factor and sum out a variable
- Shrinks a factor to a smaller one
- A projection operation
- get rid of the variables that don't matter -- the hidden variables
- why do we even have hidden variables?
- the reason we have them is that we started with a joint distribution over more variables than the ones that appear in our query
- Example: summing out R from P(R,T), produced by the join above, gives P(T)
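Continuing the join sketch above, summing out R shrinks P(R,T) to P(T):

```python
from collections import defaultdict

# P(R, T) as produced by the join above.
P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

# Sum out (marginalize) R: project the factor onto T.
P_T = defaultdict(float)
for (r, t), p in P_RT.items():
    P_T[t] += p
print(dict(P_T))   # {'+t': 0.17, '-t': 0.83}
```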
- Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
- Marginalizing Early (= Variable Elimination)
- intuition
- if you want to eliminate a variable, you cannot do so until you have joined on that variable
- so that is variable elimination: join on one variable, sum it out right away, and repeat
- what if you have evidence?
- just like with inference by enumeration, when there is evidence you just look at your tables and retain only those entries consistent with your evidence
- If evidence, start with factors that select that evidence
- We eliminate all vars other than query + evidence
- Result will be a selected joint of query and evidence
- To get our answer, just normalize this!
- That's it!
- Query: P(Q | E₁=e₁, …, Eₖ=eₖ)
- Start with initial factors:
- Local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
- Pick a hidden variable H
- any ordering of hidden variables is valid
- but some orderings will lead to very big factors being generated along the way
- and some orderings might be able to keep the factors generated along the way very small
- Join all factors mentioning H
- Eliminate (sum out) H
- Join all remaining factors and normalize
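Here is a minimal, generic sketch of this loop, assuming binary variables and a factor representation of (variable list, table keyed by value tuples); the representation and all names are choices made for this sketch:

```python
from itertools import product
from collections import defaultdict

def join(factors):
    """Pointwise product over the union of the variables involved."""
    variables = sorted({v for vars_, _ in factors for v in vars_})
    table = {}
    for values in product((0, 1), repeat=len(variables)):   # binary variables
        env = dict(zip(variables, values))
        p = 1.0
        for vars_, t in factors:
            p *= t[tuple(env[v] for v in vars_)]
        table[values] = p
    return variables, table

def sum_out(var, factor):
    """Marginalize var out of a factor, shrinking it by one dimension."""
    vars_, table = factor
    i = vars_.index(var)
    out = defaultdict(float)
    for values, p in table.items():
        out[values[:i] + values[i + 1:]] += p
    return vars_[:i] + vars_[i + 1:], dict(out)

def variable_elimination(factors, order):
    for h in order:                                    # any ordering is valid
        touching = [f for f in factors if h in f[0]]   # join all factors mentioning h
        factors = [f for f in factors if h not in f[0]]
        factors.append(sum_out(h, join(touching)))     # then eliminate (sum out) h
    vars_, table = join(factors)                       # join remaining factors
    z = sum(table.values())
    return vars_, {k: v / z for k, v in table.items()} # and normalize

# The R -> T -> L network from earlier: query P(L), eliminating R then T.
f_R  = (['R'], {(1,): 0.1, (0,): 0.9})
f_TR = (['R', 'T'], {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.9})
f_LT = (['T', 'L'], {(1, 1): 0.3, (1, 0): 0.7, (0, 1): 0.1, (0, 0): 0.9})
print(variable_elimination([f_R, f_TR, f_LT], order=['R', 'T']))
# (['L'], {(0,): ~0.866, (1,): ~0.134})
```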
- P(B|j,m) ∝ P(B,j,m)
- after introducing evidence, we have the following factors:
- P(B) , P(E) , P(A|B,E) , P(j|A) , P(m|A)
- Choose A
- Choose E
- Finish with B
- equations from top to bottom:
- the marginal can be obtained from the joint by summing out
- use the Bayes' net joint distribution expression
- use x·(y+z) = xy + xz
- joining on a, and then summing out, gives f₁
- use x·(y+z) = xy + xz
- joining on e, and then summing out, gives f₂
- All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
- how do you decide which variable to pick first?
- the suggestion here was a variable with very few connections, where a connection means participating in a factor with other variables
- Query: P(X₃|Y₁=y₁,Y₂=y₂,Y₃=y₃)
- Start by inserting evidence , which gives the following initial factors:
- p(Z),p(X₁|Z),p(X₂|Z),p(X₃|Z),p(y₁|X₁),p(y₂|X₂),p(y₃|X₃)
- Eliminate X₁. This introduces the factor f₁(Z,y₁) = ∑ₓ₁ p(x₁|Z)p(y₁|x₁) , and we are left with
- p(Z),f₁(Z,y₁),p(X₂|Z),p(X₃|Z),p(y₂|X₂),p(y₃|X₃)
- Eliminate X₂. This introduces the factor f₂(Z,y₂) = ∑ₓ₂ p(x₂|Z)p(y₂|x₂) , and we are left with
- p(Z),f₁(Z,y₁),f₂(Z,y₂),p(X₃|Z),p(y₃|X₃)
- Eliminate Z. This introduces the factor f₃(y₁,y₂,X₃) = ∑z p(z)f₁(z,y₁)f₂(z,y₂)p(X₃|z) , and we are left with
- p(y₃|X₃),f₃(y₁,y₂,X₃)
- quiz: why is there a factor p(X₃|z)? why not some other form, like p(X₃,z) or p(Z,y)?
- X₃ is the query variable, and the query variable doesn't go through the elimination process the way a hidden variable does
- No hidden variables left. Join the remaining factors to get
- f₄(y₁,y₂,y₃, X₃) = p(y₃|X₃)·f₃(y₁,y₂,X₃)
- Normalizing over X₃ gives P(X₃|y₁,y₂,y₃)
- For the query P(Xₙ|y₁,…,yₙ), work through the following two different orderings as done in the previous slide: Z, X₁, …, Xₙ₋₁ and X₁, …, Xₙ₋₁, Z. What is the size of the maximum factor generated for each of the orderings?
- Answer: 2ⁿ⁺¹ versus 2² (assuming binary)
- In general: the ordering can greatly affect efficiency.
- starting from X₁ we get f₁(Z,y₁), then X₂ gives f₂(Z,y₂), and so on up to fₙ₋₁(Z,yₙ₋₁)
- not Xₙ : Xₙ is the query variable
- now we eliminate Z, and we get f(y₁, …, yₙ₋₁, Xₙ) ; again only 2 variables (Z and Xₙ) are involved in the join
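This comparison can be checked with a small scope-tracking sketch (n = 5 for concreteness). It only tracks which variables each generated factor mentions, counting the joined factor before the hidden variable is summed out, which is the convention behind the 2ⁿ⁺¹ vs. 2² answer:

```python
# Network Z -> X_i -> y_i with evidence y_i fixed: factor scopes are
# {Z}, {Z, X_i} for each p(X_i|Z), and {X_i} for each p(y_i|X_i).
n = 5
Xs = [f'X{i}' for i in range(1, n + 1)]
factors = [{'Z'}] + [{'Z', x} for x in Xs] + [{x} for x in Xs]

def max_join_scope(order, factors):
    biggest = 0
    for h in order:
        joined = set().union(*[f for f in factors if h in f])
        biggest = max(biggest, len(joined))   # scope before summing out h
        factors = [f for f in factors if h not in f] + [joined - {h}]
    return biggest

print(max_join_scope(['Z'] + Xs[:-1], list(factors)))  # 6 = n+1 -> 2^(n+1) entries
print(max_join_scope(Xs[:-1] + ['Z'], list(factors)))  # 2       -> 2^2 entries
```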
- The computational and space complexity of variable elimination is determined by the largest factor
- The elimination ordering can greatly affect the size of the largest factor.
- E.g., previous slide’s example: 2ⁿ vs. 2 counting the factor left after summing out (2ⁿ⁺¹ vs. 2² counting the joined factor before summing out)
- Does there always exist an ordering that only results in small factors?
- No!
- There are BNs, and we'll see one in the next slides, where no matter which ordering you pick, large factors get generated along the way.
- CSP
- a 3-SAT problem, a special kind of CSP
- there are 7 variables, X₁ … X₇ ; we want to find an assignment to these 7 variables such that all the clauses are true
- the clauses say: x₁ or x₂ or not x₃ has to be true, and not x₁ or x₃ or not x₄ has to be true, and so forth
- P(Xᵢ=0) = P(Xᵢ=1) = 0.5
- Y₁ = X₁ ∨ X₂ ∨ ¬X₃ , ... , Y₈ = ¬X₅ ∨ X₆ ∨ X₇
- Y₁‚₂ = Y₁ ∧ Y₂ , ... , Y₇‚₈ = Y₇ ∧ Y₈
- Y₁‚₂‚₃‚₄ = Y₁‚₂ ∧ Y₃‚₄ , Y₅‚₆‚₇‚₈ = Y₅‚₆ ∧ Y₇‚₈
- Z = Y₁‚₂‚₃‚₄ ∧ Y₅‚₆‚₇‚₈
- why do we use so many Y variables here?
- a BN in which a variable has many parents has huge CPTs; it is not very compact
- the Y variables give us a BN where every variable has at most 3 parents, so it is a very compact BN
- If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
- Hence inference in Bayes’ nets is NP-hard. No known efficient probabilistic inference in general.
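A brute-force sketch of the key point: under uniform priors on the Xᵢ, P(Z=1) is exactly the fraction of satisfying assignments, so P(Z=1) > 0 iff the formula is satisfiable. Only the first two clauses below are from the notes; restricting to 4 variables is an illustrative simplification:

```python
from itertools import product

# Clauses as (variable index, required truth value) literals:
# x1 v x2 v ~x3, and ~x1 v x3 v ~x4 (the two clauses spelled out above).
clauses = [((1, True), (2, True), (3, False)),
           ((1, False), (3, True), (4, False))]

def satisfied(assignment):
    return all(any(assignment[i] == val for i, val in clause) for clause in clauses)

n = 4
count = sum(satisfied(dict(enumerate(bits, start=1)))
            for bits in product((False, True), repeat=n))
print(count / 2 ** n)   # P(Z=1); nonzero iff the 3-SAT formula is satisfiable
```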
There exist special graph structures of BNs where inference can be done efficiently. One example is the polytree.
- A polytree is a directed graph with no undirected cycles
- so a directed acyclic graph can still have undirected cycles
- but if you don't allow those undirected cycles, you have a polytree
- For poly-trees you can always find an ordering that is efficient
- Try it!!
- Cut-set conditioning for Bayes’ net inference
- Choose set of variables such that if removed only a polytree remains
- Exercise: Think about how the specifics would work out!
- BN structure: A → B , C ← B → D , C → E ← D
- Query: P(C | e=1)
- we are given the probability tables P(A), P(B|A), P(C|B), P(D|B), P(E|C,D)
- Step 1: eliminate A
- so we get the factor A→B : f₁(B) = ∑ₐ P(a)P(B|a)
B | A | P(A,B) |
---|---|---|
0 | 0 | 0.35 |
1 | 0 | 0.35 |
0 | 1 | 0.27 |
1 | 1 | 0.03 |
sum out A, we get
B | f₁(B) |
---|---|
0 | 0.62 |
1 | 0.38 |
- Step 2: eliminate B.
- so we get the factor C←B→D : f₂(C,D) = ∑b P(C|b)P(D|b)f₁(b)
B | C | D | f₁(B)·P(C|B)·P(D|B) |
---|---|---|---|
0 | 0 | 0 | 0.155 |
0 | 1 | 0 | 0.155 |
0 | 0 | 1 | 0.155 |
0 | 1 | 1 | 0.155 |
1 | 0 | 0 | 0.0076 |
1 | 1 | 0 | 0.0684 |
1 | 0 | 1 | 0.0304 |
1 | 1 | 1 | 0.2736 |
sum out B, we get
C | D | f₂(C,D) |
---|---|---|
0 | 0 | 0.1626 |
1 | 0 | 0.2234 |
0 | 1 | 0.1854 |
1 | 1 | 0.4286 |
- Step 3: eliminate D.
- so we get factor C→E←D: f₃(C,e=1) = ∑d P(e=1|C,d)f₂(C,d)
C | D | e | P(e=1|C,d)·f₂(C,d) |
---|---|---|---|
0 | 0 | 1 | 0.900*0.1626 |
1 | 0 | 1 | 0.600*0.2234 |
0 | 1 | 1 | 0.700*0.1854 |
1 | 1 | 1 | 0.500*0.4286 |
sum out D , we get
C | f₃(C,e=1) |
---|---|
0 | 0.27612 |
1 | 0.34834 |
- After getting the final factor f₃(C,e=1), a final renormalization step needs to be carried out to obtain the conditional probability P(C|e=1).
C | P(C|e=1) |
---|---|
0 | 0.44217404 |
1 | 0.55782596 |
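A quick numeric check of Step 3 and the renormalization, using the f₂ entries and the P(e=1|C,d) multipliers from the tables above:

```python
# f2(C, D) and P(e=1 | C, d) copied from the tables above.
f2   = {(0, 0): 0.1626, (1, 0): 0.2234, (0, 1): 0.1854, (1, 1): 0.4286}
P_e1 = {(0, 0): 0.900,  (1, 0): 0.600,  (0, 1): 0.700,  (1, 1): 0.500}

# Step 3: join on D's factors and sum out D -> f3(C, e=1).
f3 = {c: sum(P_e1[(c, d)] * f2[(c, d)] for d in (0, 1)) for c in (0, 1)}
print(f3)   # {0: 0.27612, 1: 0.34834}

# Final renormalization -> P(C | e=1).
z = sum(f3.values())
print({c: p / z for c, p in f3.items()})   # {0: ~0.4422, 1: ~0.5578}
```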
- BN structure: A → B ; A,B → C ; C → D ; C → E ; D,E → F
- Query: P(A|+f)
- we run variable elimination with the following ordering: E, D, C, B
- After introducing evidence, we have the following factors:
- P(A) , P(B|A), P(C|A,B), P(D|C), P(E|C), P(+f|E,D)
- Step 1:
- After joining on E and summing out over E, we have generated a new factor f₁ over the following variables and/or evidence
- C,D,+f
- the factors containing variable E are P(E|C) and P(+f|E,D)
- together they contain the variables C, D, E, +f
- E is not included after summing it out, which leaves
- C,D,+f
- After this step, the remaining factors are:
- P(A) , P(B|A), P(C|A,B), P(D|C), f₁
- P(E|C) and P(+f|E,D), which contain variable E, are not available any more
- and the new factor f₁ comes in
- Step 2:
- After joining on D and summing out over D, we have generated a new factor f₂ over the following variables and/or evidence
- C,+f
- After this step, the remaining factors are:
- P(A) , P(B|A), P(C|A,B), f₂
- P(D|C) is not available any more
- f₁ was consumed as well
- and the new factor f₂ comes in
- Step 3:
- After joining on C and summing out over C, we have generated a new factor f₃ over the following variables and/or evidence
- A,B,+f
- After this step, the remaining factors are:
- P(A) , P(B|A), f₃
- Step 4: After joining on B and summing out over B, we have generated a new factor f₄ over the following variables and/or evidence
- A,+f
- After this step, the remaining factors are:
- P(A) , f₄
- Finally, joining the remaining factors P(A) and f₄ and normalizing gives the answer P(A|+f)
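The whole quiz can be replayed symbolically by tracking only factor scopes (evidence +f is fixed, so P(+f|E,D) has scope {D,E}); a short sketch:

```python
# Initial factor scopes after introducing evidence +f.
factors = [{'A'}, {'A', 'B'}, {'A', 'B', 'C'}, {'C', 'D'}, {'C', 'E'}, {'D', 'E'}]

for hidden in ['E', 'D', 'C', 'B']:
    joined = set().union(*[f for f in factors if hidden in f])  # join on hidden
    new_factor = joined - {hidden}                              # then sum out hidden
    factors = [f for f in factors if hidden not in f] + [new_factor]
    print(f"eliminate {hidden}: new factor over {sorted(new_factor)}, "
          f"remaining scopes {[sorted(f) for f in factors]}")
# Prints f1 over C,D then f2 over C then f3 over A,B then f4 over A,
# matching Steps 1-4 above.
```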