
Bayes' Nets: Inference

Recap: Example: Alarm Network

Inference

  • Inference: calculating some useful quantity from a joint probability distribution
  • Examples:
    • Posterior probability
      • P( Q | E₁=e₁, …, Eₖ=eₖ )
      • computing the posterior probability of some set of query variables conditioned on some evidence having been observed
    • Most likely explanation
      • argmax_q P( Q=q | E₁=e₁, … )
      • given that some evidence has been observed, what's the most likely assignment of the query variables?

Inference by Enumeration

  • the first and most naive type of inference is inference by enumeration
  • we have some query, which involves
    • some evidence variables E₁...Eₖ ,
    • some query variables Q , the ones for which we want a distribution given the evidence,
    • some hidden variables H₁...Hᵣ , the ones that are neither query variables nor evidence variables, but still appear in our joint distribution.
      • we have to deal with them: effectively, we sum them out to get rid of them.
  • see details in Probability

Inference by Enumeration in Bayes’ Net

  • Given unlimited time, inference in BNs is easy

  • Reminder of inference by enumeration by example:

    • P(B|+j,+m) ∝ P(B,+j,+m)
    • = ∑e,a P(B,e,a,+j,+m)
    • now we can use the definition of the joint distribution as it is specified through the Bayes' net:
    • = ∑e,a P(B)P(e)P(a|B,e)P(+j|a)P(+m|a)
    • = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a) + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
  • if we have 100 variables in the BN and 50 of them are evidence variables, then the other 50 are non-evidence variables, and the sum ranges over 2⁵⁰ entries

    • it gets very expensive: exponential in the number of non-evidence variables, unless almost everything is evidence
  • It's not ideal to do it this way
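To make the cost concrete, here is the sum above written directly as code. This is a sketch; the CPT numbers are assumed for illustration (the classic textbook values for the alarm network), not values given in these notes:

```python
from itertools import product

# Illustrative alarm-network CPTs (assumed values, not from these notes).
P_B = {True: 0.001, False: 0.999}                    # P(B)
P_E = {True: 0.002, False: 0.998}                    # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(+j | A)
P_M = {True: 0.70, False: 0.01}                      # P(+m | A)

def p_b_jm(b):
    """P(B=b, +j, +m): sum the full joint over the hidden variables e, a."""
    return sum(P_B[b] * P_E[e]
               * (P_A[b, e] if a else 1.0 - P_A[b, e])
               * P_J[a] * P_M[a]
               for e, a in product([True, False], repeat=2))

z = p_b_jm(True) + p_b_jm(False)   # normalize over B
print(p_b_jm(True) / z)            # P(+b | +j, +m) ≈ 0.284 with these numbers
```

With 2 hidden variables the sum has 2² = 4 terms; with 50 it would have 2⁵⁰.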

Inference by Enumeration vs. Variable Elimination

  • Why is inference by enumeration so slow?
    • You join up the whole joint distribution before you sum out the hidden variables
  • Idea: interleave joining and marginalizing!
    • Called "Variable Elimination"
    • Still NP-hard, but usually much faster than inference by enumeration
    • we'll find a way to interleave joining CPTs together and summing out over hidden variables.
    • keep in mind it's not a silver bullet. Inference in BNs is NP-hard: there are BNs where, no matter what you do, computing the answer to the query is equivalent to solving a SAT (Boolean satisfiability) problem, for which no efficient algorithm is known.
    • First we’ll need some new notation: factors

Factor Zoo

Factor Zoo I

  • Joint distribution: P(X,Y)
    • Entries P(x,y) for all x, y
    • Sums to 1

Example: P(T,W)

| T | W | P |
| --- | --- | --- |
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
  • Selected joint: P(x,Y)
    • A slice of the joint distribution
    • Entries P(x,y) for fixed x, all y
    • Sums to P(x)
      • factors don't have to sum to 1

Example: P(cold, W)

| T | W | P |
| --- | --- | --- |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
  • Number of capitals = dimensionality of the table
  • So as we work through the variable elimination process, the game will be to keep the number of capitalized (unassigned) variables in our factors small.

Factor Zoo II

  • Single conditional: P(Y | x)
    • Entries P(y | x) for fixed x, all y
    • Sums to 1

Example: P(W|cold)

| T | W | P |
| --- | --- | --- |
| cold | sun | 0.4 |
| cold | rain | 0.6 |
  • Family of conditionals: P(X |Y)
    • Multiple conditionals
    • Entries P(x | y) for all x, y
    • Sums to |Y|

Example: P(W|T)

| T | W | P |
| --- | --- | --- |
| hot | sun | 0.8 |
| hot | rain | 0.2 |
| cold | sun | 0.4 |
| cold | rain | 0.6 |

Factor Zoo III

  • Specified family: P( y | X )
    • Entries P(y | x) for fixed y, but for all x
    • Sums to … who knows!

Example: P(rain | T)

| T | W | P |
| --- | --- | --- |
| hot | rain | 0.2 |
| cold | rain | 0.6 |

Factor Zoo Summary

  • In general, when we write P(Y₁ … Yₙ | X₁ … Xₘ)
    • It is a "factor," a multi-dimensional array
    • Its values are P(y₁ … yₙ | x₁ … xₘ)
    • Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array
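To make factors concrete in code, one possible representation (our own choice for these sketches, not anything prescribed by the lecture) is a pair: the tuple of variables, plus a table keyed by value tuples:

```python
# A factor is (vars, table): `vars` names the dimensions, and `table`
# maps one value per variable to a number.
joint_TW = (("T", "W"), {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
})

# Selecting T = cold slices the joint: P(cold, W).
vars_, table = joint_TW
selected = (vars_, {k: p for k, p in table.items() if k[0] == "cold"})
print(sum(selected[1].values()))   # 0.5 = P(cold); factors need not sum to 1
```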

Example : Traffic Domain

  • R → T → L
  • Random Variables
    • R: Raining
    • T: Traffic
    • L: Late for class!

P(R)

| R | P |
| --- | --- |
| +r | 0.1 |
| -r | 0.9 |

P(T|R)

| R | T | P |
| --- | --- | --- |
| +r | +t | 0.8 |
| +r | -t | 0.2 |
| -r | +t | 0.1 |
| -r | -t | 0.9 |

P(L|T)

| T | L | P |
| --- | --- | --- |
| +t | +l | 0.3 |
| +t | -l | 0.7 |
| -t | +l | 0.1 |
| -t | -l | 0.9 |
  • query: P(L) = ?
    • by inference by enumeration:
    • P(L) = ∑r,t P(r,t,L)
    • = ∑r,t P(r)P(t|r)P(L|t)
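Plugging in the numbers from the tables above:

  • P(+t) = 0.1·0.8 + 0.9·0.1 = 0.17 , so P(-t) = 0.83
  • P(+l) = 0.17·0.3 + 0.83·0.1 = 0.134 , so P(-l) = 0.866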

Inference by Enumeration: Procedural Outline

  • Track objects called factors
  • Initial factors are local CPTs (one per node)
  • Any known values are selected
    • E.g. if we know L = +l , the initial factors are P(R) , P(T|R) , and the selected P(+l|T)
  • Procedure: Join all factors, then eliminate all hidden variables

Operation 1: Join Factors

  • First basic operation: joining factors
  • Combining factors:
    • Just like a database join
    • Get all factors over the joining variable
    • Build a new factor over the union of the variables involved
  • Example: Join on R , i.e. P(R) and P(T|R) combine into a factor over (R,T)
  • Computation for each entry: pointwise products
    • for all r,t : P(r,t) = P(r)·P(t|r)
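A minimal sketch of the join operation in Python, assuming the (vars, table) factor representation from the Factor Zoo sketch above:

```python
from itertools import product

def join(f, g):
    """Pointwise product of two factors; a factor is (vars, table)."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)   # union of variables
    # Collect each variable's domain from the tables themselves.
    domains = {}
    for vars_, table in ((fv, ft), (gv, gt)):
        for key in table:
            for var, val in zip(vars_, key):
                domains.setdefault(var, set()).add(val)
    out = {}
    for values in product(*(sorted(domains[v]) for v in out_vars)):
        a = dict(zip(out_vars, values))
        fk = tuple(a[v] for v in fv)
        gk = tuple(a[v] for v in gv)
        if fk in ft and gk in gt:
            out[values] = ft[fk] * gt[gk]                 # pointwise product
    return (out_vars, out)

# Join on R: P(R) x P(T|R) -> P(R,T)
P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})
P_T_given_R = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                            ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})
P_RT = join(P_R, P_T_given_R)
print({k: round(p, 2) for k, p in P_RT[1].items()})
# {('+r', '+t'): 0.08, ('+r', '-t'): 0.02, ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}
```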

Example: Multiple Joins

  • "Join on R" means: grab all the tables that have R in them, and combine them into one.

Operation 2: Eliminate

  • Second basic operation: marginalization
  • Take a factor and sum out a variable
    • Shrinks a factor to a smaller one
    • A projection operation
      • get rid of the variables that don't matter -- the hidden variables
      • why do we even have hidden variables?
        • because we started with a joint distribution over more variables than the ones that appear in our query
  • Example: summing R out of P(R,T) gives P(T)
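And a matching sketch of the sum-out operation on the same representation:

```python
def eliminate(factor, var):
    """Sum `var` out of a factor, shrinking it by one dimension."""
    vars_, table = factor
    idx = vars_.index(var)
    out_vars = vars_[:idx] + vars_[idx + 1:]
    out = {}
    for key, p in table.items():
        new_key = key[:idx] + key[idx + 1:]
        out[new_key] = out.get(new_key, 0.0) + p          # accumulate the sum
    return (out_vars, out)

# Sum R out of P(R,T) to get P(T).
P_RT = (("R", "T"), {("+r", "+t"): 0.08, ("+r", "-t"): 0.02,
                     ("-r", "+t"): 0.09, ("-r", "-t"): 0.81})
print({k: round(p, 2) for k, p in eliminate(P_RT, "R")[1].items()})
# {('+t',): 0.17, ('-t',): 0.83}
```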

Multiple Elimination


  • Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
  • Marginalizing Early (= Variable Elimination)
    • switch the order of some joins and eliminations
  • intuition
    • if you want to eliminate a variable, you cannot do so until you have joined on that variable.

Example: Traffic Domain again

  • R → T → L
  • P(L) = ?
    • Inference by Enumeration
    • Variable Elimination

Marginalizing Early! (aka VE)

  • so that is variable elimination.
  • what if you have evidence?
    • just as with inference by enumeration, when there is evidence you look at your tables and only retain those entries consistent with the evidence.

Evidence

  • If evidence, start with factors that select that evidence
    • with no evidence, we use the initial factors P(R) , P(T|R) , P(L|T)
    • with evidence, e.g. computing P(L|+r) , the initial factors become P(+r) , P(T|+r) , P(L|T)
  • We eliminate all vars other than query + evidence
  • Result will be a selected joint of query and evidence
    • E.g. for P(L | +r), we would end up with the selected joint P(+r, L)
  • To get our answer, just normalize this!
  • That's it!
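In the same sketch representation, selecting evidence is just filtering rows before any joining happens:

```python
def select(factor, var, value):
    """Keep only the entries of a factor consistent with evidence var = value."""
    vars_, table = factor
    idx = vars_.index(var)
    return (vars_, {k: p for k, p in table.items() if k[idx] == value})

# Evidence +r in the traffic domain: restrict P(R) before running VE.
P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})
print(select(P_R, "R", "+r")[1])   # {('+r',): 0.1} -- stays unnormalized until the end
```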

General Variable Elimination

  • Query: P( Q | E₁=e₁, …, Eₖ=eₖ )
  • Start with initial factors:
    • Local CPTs (but instantiated by evidence)
  • While there are still hidden variables (not Q or evidence):
    • Pick a hidden variable H
      • any ordering of hidden variables is valid
      • but some orderings will lead to very big factors being generated along the way
      • and some orderings might be able to keep the factors generated along the way very small
    • Join all factors mentioning H
    • Eliminate (sum out) H
  • Join all remaining factors and normalize
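Putting the pieces together, a sketch of the whole loop, reusing the join() and eliminate() helpers (and the traffic-domain factors P_R and P_T_given_R) from the sketches above:

```python
from functools import reduce

def variable_elimination(factors, hidden_vars):
    """Run VE over a list of (vars, table) factors, summing out hidden_vars."""
    for h in hidden_vars:                                     # pick a hidden variable...
        touching = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        factors.append(eliminate(reduce(join, touching), h))  # join on h, sum h out
    final = reduce(join, factors)                             # join remaining factors
    z = sum(final[1].values())
    return (final[0], {k: p / z for k, p in final[1].items()})  # normalize

# Traffic domain, query P(L): sum out R, then T.
P_L_given_T = (("T", "L"), {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
                            ("-t", "+l"): 0.1, ("-t", "-l"): 0.9})
answer = variable_elimination([P_R, P_T_given_R, P_L_given_T], ["R", "T"])
print({k: round(p, 3) for k, p in answer[1].items()})
# {('+l',): 0.134, ('-l',): 0.866}
```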

Example

  • P(B|j,m) ∝ P(B,j,m)
  • after introducing evidence, we have the following factors:
    • P(B) , P(E) , P(A|B,E) , P(j|A) , P(m|A)

  • Choose A
    • the size of the largest table generated along the way is 2³; in this case that isn't too bad and is possible to handle, but in a very large BN you need to be careful about your ordering to keep the factor sizes low.
    • P(B) , P(E) , P(j,m|B,E)
  • Choose E
    • P(B) , P(j,m|B)
  • Finish with B

Same Example in Equations

  • equations from top to bottom (written out after this list):
    1. marginal can be obtained from joint by summing out
    2. use Bayes' net joint distribution expression
    3. use x*(y+z) = xy + xz
    4. joining on a and then summing out gives f₁
    5. use x*(y+z) = xy + xz
    6. joining on e and then summing out gives f₂
  • All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy +vxz = (u+v)(w+x)(y+z) to improve computational efficiency !
  • how do you decide which variables to pick first ?
    • the suggestion here was: pick a variable with very few connections, where a "connection" means participating in a factor together with other variables.
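Written out, the chain of equations is:

  • P(B|j,m) ∝ P(B,j,m)
  • = ∑e,a P(B,j,m,e,a)
  • = ∑e,a P(B)P(e)P(a|B,e)P(j|a)P(m|a)
  • = ∑e P(B)P(e) ∑a P(a|B,e)P(j|a)P(m|a)
  • = ∑e P(B)P(e) f₁(j,m|B,e)
  • = P(B) ∑e P(e) f₁(j,m|B,e)
  • = P(B) f₂(j,m|B)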

Another Variable Elimination Example

  • Query: P(X₃|Y₁=y₁,Y₂=y₂,Y₃=y₃)
  • Start by inserting evidence , which gives the following initial factors:
    • p(Z),p(X₁|Z),p(X₂|Z),p(X₃|Z),p(y₁|X₁),p(y₂|X₂),p(y₃|X₃)
  • Eliminate X₁; this introduces the factor f₁(Z,y₁) = ∑ₓ₁ p(x₁|Z)p(y₁|x₁) , and we are left with
    • p(Z),f₁(Z,y₁),p(X₂|Z),p(X₃|Z),p(y₂|X₂),p(y₃|X₃)
  • Eliminate X₂; this introduces the factor f₂(Z,y₂) = ∑ₓ₂ p(x₂|Z)p(y₂|x₂) , and we are left with
    • p(Z),f₁(Z,y₁),f₂(Z,y₂),p(X₃|Z),p(y₃|X₃)
  • Eliminate Z; this introduces the factor f₃(y₁,y₂,X₃) = ∑z p(z)f₁(z,y₁)f₂(z,y₂)p(X₃|z) , and we are left with
    • p(y₃|X₃),f₃(y₁,y₂,X₃)
    • quiz: why is there a factor p(X₃|z) inside the sum? why not something else, like p(X₃,z) or p(Z,y)?
      • X₃ is the query variable; the query variable does not go through the elimination process the way a hidden variable does
  • No hidden variables left. Join the remaining factors to get
    • f₄(y₁,y₂,y₃, X₃) = p(y₃|X₃)·f₃(y₁,y₂,X₃)
  • Normalizing over X₃ gives P(X₃|y₁,y₂,y₃)

Variable Elimination Ordering

  • For the query P(Xₙ|y₁,…,yₙ) work through the following two different orderings as done in the previous slide: Z, X₁, …, Xₙ₋₁ and X₁, …, Xₙ₋₁, Z. What is the size of the maximum factor generated for each of the orderings?
  • Answer: 2ⁿ⁺¹ versus 2² (assuming binary)
  • In general: the ordering can greatly affect efficiency.

  • second ordering: start from X₁ , we get f₁(Z,y₁) ; then X₂ , we get f₂(Z,y₂) ; … ; then Xₙ₋₁ , we get fₙ₋₁(Z,yₙ₋₁)
    • not Xₙ : Xₙ is the query variable
  • now we eliminate Z : the join is over only two capitalized variables, Z and Xₙ (the yᵢ are fixed), so at most 2² entries; summing out Z leaves f(y₁, …, yₙ₋₁, Xₙ)
  • first ordering: eliminating Z first joins p(Z), p(X₁|Z), …, p(Xₙ|Z) into a factor over Z, X₁, …, Xₙ , which has 2ⁿ⁺¹ entries

VE: Computational and Space Complexity

  • The computational and space complexity of variable elimination is determined by the largest factor
  • The elimination ordering can greatly affect the size of the largest factor.
    • E.g., previous slide's example: 2ⁿ⁺¹ vs. 2²
  • Does there always exist an ordering that only results in small factors?
    • No!
    • There are BNs (we'll see one on the next slides) where, no matter which ordering you pick, large factors are generated along the way.

Worst Case Complexity?

  • CSP
    • a 3-SAT problem, a special kind of CSP
    • there are 7 variables, X₁ ... X₇ ; we want to find an assignment to these 7 variables such that the whole formula is true
      • the formula is saying: x₁ or x₂ or not x₃ has to be true, and not x₁ or x₃ or not x₄ has to be true, and so forth
    • P(Xᵢ=0) = P(Xᵢ=1) = 0.5
    • Y₁ = X₁ ∨ X₂ ∨ ¬X₃ , ... , Y₈ = ¬X₅ ∨ X₆ ∨ X₇
    • Y₁‚₂ = Y₁ ∧ Y₂ , ... , Y₇‚₈ = Y₇ ∧ Y₈
    • Y₁‚₂‚₃‚₄ = Y₁‚₂ ∧ Y₃‚₄ , Y₅‚₆‚₇‚₈ = Y₅‚₆ ∧ Y₇‚₈
    • Z = Y₁‚₂‚₃‚₄ ∧ Y₅‚₆‚₇‚₈
  • why do we use so many Y variables here?
    • a BN where a variable has many parents has very large CPTs; it is not compact
    • the Y variables give us a BN where every variable has at most 3 parents, so it is a very small BN.
  • If we can answer whether P(Z=1) is zero or not, we have answered whether the 3-SAT problem has a solution.
  • Hence inference in Bayes’ nets is NP-hard. No known efficient probabilistic inference in general.

Polytrees

There are certain special graph structures of BNs where inference can be done efficiently. One example is the polytree.

  • A polytree is a directed graph with no undirected cycles
    • a directed acyclic graph can still have undirected cycles
    • but if you don't allow those undirected cycles, you have a polytree
  • For poly-trees you can always find an ordering that is efficient
    • Try it!!
  • Cut-set conditioning for Bayes’ net inference
    • Choose set of variables such that if removed only a polytree remains
    • Exercise: Think about how the specifics would work out!

Quiz BN2-2

  • BN: A → B , C ← B → D , C → E ← D (read off from the factors used below)

  • Query: P(C |e=1)

  • we are given the probability tables; their entries appear in the joined tables below

  • Step 1: eliminate A

    • so we get the factor A→B : f₁(B) = ∑ₐ P(a)P(B|a)

| B | A | P(A,B) |
| --- | --- | --- |
| 0 | 0 | 0.35 |
| 1 | 0 | 0.35 |
| 0 | 1 | 0.27 |
| 1 | 1 | 0.03 |

sum out A, we get

| B | f₁(B) |
| --- | --- |
| 0 | 0.62 |
| 1 | 0.38 |
  • Step 2: eliminate B.
    • so we get the factor C←B→D : f₂(C,D) = ∑b P(C|b)P(D|b)f₁(b)

| B | C | D | f₁(b)·P(C\|b)·P(D\|b) |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0.155 |
| 0 | 1 | 0 | 0.155 |
| 0 | 0 | 1 | 0.155 |
| 0 | 1 | 1 | 0.155 |
| 1 | 0 | 0 | 0.0076 |
| 1 | 1 | 0 | 0.0684 |
| 1 | 0 | 1 | 0.0304 |
| 1 | 1 | 1 | 0.2736 |

sum out B, we get

| C | D | f₂(C,D) |
| --- | --- | --- |
| 0 | 0 | 0.1626 |
| 1 | 0 | 0.2234 |
| 0 | 1 | 0.1854 |
| 1 | 1 | 0.4286 |
  • Step 3: eliminate D.
    • so we get factor C→E←D: f₃(C,e=1) = ∑d P(e=1|C,d)f₂(C,d)

| C | D | P(e=1\|C,d)·f₂(C,d) |
| --- | --- | --- |
| 0 | 0 | 0.900·0.1626 |
| 1 | 0 | 0.600·0.2234 |
| 0 | 1 | 0.700·0.1854 |
| 1 | 1 | 0.500·0.4286 |

sum out D , we get

| C | f₃(C,e=1) |
| --- | --- |
| 0 | 0.27612 |
| 1 | 0.34834 |
  • After getting the final factor f₃(C,e=1), a final renormalization step needs to be carried out to obtain the conditional probability P(C|e=1).

| C | P(C\|e=1) |
| --- | --- |
| 0 | 0.44217404 |
| 1 | 0.55782596 |
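A quick check of that renormalization step (values from the tables above):

```python
# Normalize the final factor f3(C, e=1) over C.
f3 = {0: 0.27612, 1: 0.34834}
z = sum(f3.values())                                # P(e=1) = 0.62446
print({c: round(p / z, 8) for c, p in f3.items()})
# {0: 0.44217404, 1: 0.55782596}
```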

Quiz BN2-3

  • BN: A → B , (A,B) → C , C → D , C → E , (D,E) → F (read off from the factors below)

  • Query : P(A|+f)

  • we run variable elimination with the following ordering: E,D,C,B.

  • After introducing evidence, we have the following factors:

    • P(A) , P(B|A), P(C|A,B), P(D|C), P(E|C), P(+f|E,D)
  • Step 1:

    • After joining on E and summing out over E, we have generated a new factor f₁ over the following variables and/or evidence
      • C,D,+f
        • the factors containing variable E are P(E|C) and P(+f|E,D)
        • together these mention the variables C,D,E,+f
        • and E itself is not included, since it has been summed out
    • After this step, the remaining factors are:
      • P(A) , P(B|A), P(C|A,B), P(D|C), f₁
        • P(E|C) and P(+f|E,D), which contain variable E, are not available any more
        • and the new factor f₁ comes in
  • Step 2:

    • After joining on D and summing out over D, we have generated a new factor f₂ over the following variables and/or evidence
      • C,+f
    • After this step, the remaining factors are:
      • P(A) , P(B|A), P(C|A,B), f₂
        • P(D|C) not available now
        • f₁ was consumed
        • f₂ comes in
  • Step 3:

    • After joining on C and summing out over C, we have generated a new factor f₃ over the following variables and/or evidence
      • A,B,+f
    • After this step, the remaining factors are:
      • P(A) , P(B|A), f₃
  • Step 4:

    • After joining on B and summing out over B, we have generated a new factor f₄ over the following variables and/or evidence
      • A,+f
    • After this step, the remaining factors are:
      • P(A) , f₄
    • joining these and normalizing over A gives the answer P(A|+f)
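The bookkeeping in these four steps can be checked mechanically. A small sketch that tracks only which variables each factor mentions, with the evidence +f kept as the fixed symbol "f":

```python
# Initial factors: P(A), P(B|A), P(C|A,B), P(D|C), P(E|C), P(+f|E,D).
factors = [{"A"}, {"A", "B"}, {"A", "B", "C"},
           {"C", "D"}, {"C", "E"}, {"E", "D", "f"}]

for hidden in ["E", "D", "C", "B"]:               # the ordering from the quiz
    touching = [f for f in factors if hidden in f]
    factors = [f for f in factors if hidden not in f]
    new = set().union(*touching) - {hidden}       # join on hidden, then sum it out
    factors.append(new)
    print(f"eliminate {hidden}: new factor over {sorted(new)}")
# eliminate E: new factor over ['C', 'D', 'f']
# eliminate D: new factor over ['C', 'f']
# eliminate C: new factor over ['A', 'B', 'f']
# eliminate B: new factor over ['A', 'f']
```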