forked from bayesnet/bnt
-
Notifications
You must be signed in to change notification settings - Fork 0
/
usage_dbn_02nov13.html
715 lines (637 loc) · 22.7 KB
/
usage_dbn_02nov13.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
<HEAD>
<TITLE>How to use BNT for DBNs</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<!-- white background is better for the pictures and equations -->
Documentation last updated on 13 November 2002
<h1>How to use BNT for DBNs</h1>
<p>
<ul>
<li> <a href="#spec">Model specification</a>
<ul>
<li> <a href="#hmm">HMMs</a>
<li> <a href="#lds">Kalman filters</a>
<li> <a href="#chmm">Coupled HMMs</a>
<li> <a href="#water">Water network</a>
<li> <a href="#bat">BAT network</a>
</ul>
<li> <a href="#inf">Inference</a>
<ul>
<li> <a href="#discrete">Discrete hidden nodes</a>
<li> <a href="#cts">Continuous hidden nodes</a>
</ul>
<li> <a href="#learn">Learning</a>
<ul>
<li> <a href="#param_learn">Parameter learning</a>
<li> <a href="#struct_learn">Structure learning</a>
</ul>
</ul>
Note:
you are recommended to read an introduction
to DBNs first, such as
<a href="http://www.ai.mit.edu/~murphyk/Papers/dbnchapter.pdf">
this book chapter</a>.
<h1><a name="spec">Model specification</h1>
<!--<h1><a name="dbn_intro">Dynamic Bayesian Networks (DBNs)</h1>-->
Dynamic Bayesian Networks (DBNs) are directed graphical models of stochastic
processes.
They generalise <a href="#hmm">hidden Markov models (HMMs)</a>
and <a href="#lds">linear dynamical systems (LDSs)</a>
by representing the hidden (and observed) state in terms of state
variables, which can have complex interdependencies.
The graphical structure provides an easy way to specify these
conditional independencies, and hence to provide a compact
parameterization of the model.
<p>
Note that "temporal Bayesian network" would be a better name than
"dynamic Bayesian network", since
it is assumed that the model structure does not change, but
the term DBN has become entrenched.
We also normally assume that the parameters do not
change, i.e., the model is time-invariant.
However, we can always add extra
hidden nodes to represent the current "regime", thereby creating
mixtures of models to capture periodic non-stationarities.
<p>
There are some cases where the size of the state space can change over
time, e.g., tracking a variable, but unknown, number of objects.
In this case, we need to change the model structure over time.
BNT does not support this.
<!--
, but see the following paper for a
discussion of some of the issues:
<ul>
<li> <a href="ftp://ftp.cs.monash.edu.au/pub/annn/smc.ps">
Dynamic belief networks for discrete monitoring</a>,
A. E. Nicholson and J. M. Brady.
IEEE Systems, Man and Cybernetics, 24(11):1593-1610, 1994.
</ul>
-->
<h2><a name="hmm">Hidden Markov Models (HMMs)</h2>
The simplest kind of DBN is a Hidden Markov Model (HMM), which has
one discrete hidden node and one discrete or continuous
observed node per slice. We illustrate this below.
As before, circles denote continuous nodes, squares denote
discrete nodes, clear means hidden, shaded means observed.
<!--
(The observed nodes can be
discrete or continuous; the crucial thing about an HMM is that the
hidden nodes are discrete, so the system can model arbitrary dynamics
-- providing, of course, that the hidden state space is large enough.)
-->
<p>
<img src="Figures/hmm3.gif">
<p>
We have "unrolled" the model for three "time slices" -- the structure and parameters are
assumed to repeat as the model is unrolled further.
Hence to specify a DBN, we need to
define the intra-slice topology (within a slice),
the inter-slice topology (between two slices),
as well as the parameters for the first two slices.
(Such a two-slice temporal Bayes net is often called a 2TBN.)
<p>
We can specify the topology as follows.
<PRE>
intra = zeros(2);
intra(1,2) = 1; % node 1 in slice t connects to node 2 in slice t
inter = zeros(2);
inter(1,1) = 1; % node 1 in slice t-1 connects to node 1 in slice t
</pre>
We can specify the parameters as follows,
where for simplicity we assume the observed node is discrete.
<pre>
Q = 2; % num hidden states
O = 2; % num observable symbols
ns = [Q O];
dnodes = 1:2;
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes);
for i=1:4
bnet.CPD{i} = tabular_CPD(bnet, i);
end
</pre>
<p>
We assume the distributions P(X(t) | X(t-1)) and
P(Y(t) | X(t)) are independent of t for t > 1.
Hence the CPD for nodes 5, 7, ... is the same as for node 3, so we say they
are in the same equivalence class, with node 3 being the "representative"
for this class. In other words, we have tied the parameters for nodes
3, 5, 7, ...
Similarly, nodes 4, 6, 8, ... are tied.
Note, however, that (the parameters for) nodes 1 and 2 are not tied to
subsequent slices.
<p>
Above we assumed the observation model P(Y(t) | X(t)) is independent of t for t>1, but
it is conventional to assume this is true for all t.
So we would like to put nodes 2, 4, 6, ... all in the same class.
We can do this by explicitely defining the equivalence classes, as
follows (see <a href="usage.html#tying">here</a> for more details on
parameter tying).
<p>
We define eclass1(i) to be the equivalence class that node i in slice
1 belongs to.
Similarly, we define eclass2(i) to be the equivalence class that node i in slice
2, 3, ..., belongs to.
For an HMM, we have
<pre>
eclass1 = [1 2];
eclass2 = [3 2];
eclass = [eclass1 eclass2];
</pre>
This ties the observation model across slices,
since e.g., eclass(4) = eclass(2) = 2.
<p>
By default,
eclass1 = 1:ss, and eclass2 = (1:ss)+ss, where ss = slice size = the
number of nodes per slice.
<!--This will tie nodes in slices 3, 4, ... to the the nodes in slice 2,
but none of the nodes in slice 2 to any in slice 1.-->
But by using the above tieing pattern,
we now only have 3 CPDs to specify, instead of 4:
<pre>
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
obsmat0 = mk_stochastic(rand(Q,O));
bnet.CPD{1} = tabular_CPD(bnet, 1, prior0);
bnet.CPD{2} = tabular_CPD(bnet, 2, obsmat0);
bnet.CPD{3} = tabular_CPD(bnet, 3, transmat0);
</pre>
We discuss how to do <a href="#inf">inference</a> and <a href="#learn">learning</a> on this model
below.
(See also
my <a href="../HMM/hmm.html">HMM toolbox</a>, which is included with BNT.)
<p>
Some common variants on HMMs are shown below.
BNT can handle all of these.
<p>
<center>
<table>
<tr>
<td><img src="Figures/hmm_gauss.gif">
<td><img src="Figures/hmm_mixgauss.gif"
<td><img src="Figures/hmm_ar.gif">
<tr>
<td><img src="Figures/hmm_factorial.gif">
<td><img src="Figures/hmm_coupled.gif"
<td><img src="Figures/hmm_io.gif">
<tr>
</table>
</center>
<h2><a name="lds">Linear Dynamical Systems (LDSs) and Kalman filters</h2>
A Linear Dynamical System (LDS) has the same topology as an HMM, but
all the nodes are assumed to have linear-Gaussian distributions, i.e.,
<pre>
x(t+1) = A*x(t) + w(t), w ~ N(0, Q), x(0) ~ N(init_x, init_V)
y(t) = C*x(t) + v(t), v ~ N(0, R)
</pre>
Some simple variants are shown below.
<p>
<center>
<table>
<tr>
<td><img src="Figures/ar1.gif">
<td><img src="Figures/sar.gif">
<td><img src="Figures/kf.gif">
<td><img src="Figures/skf.gif">
</table>
</center>
<p>
We can create a regular LDS in BNT as follows.
<pre>
intra = zeros(2);
intra(1,2) = 1;
inter = zeros(2);
inter(1,1) = 1;
n = 2;
X = 2; % size of hidden state
Y = 2; % size of observable state
ns = [X Y];
dnodes = [];
onodes = [2];
eclass1 = [1 2];
eclass2 = [3 2];
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
x0 = rand(X,1);
V0 = eye(X); % must be positive semi definite!
C0 = rand(Y,X);
R0 = eye(Y);
A0 = rand(X,X);
Q0 = eye(X);
bnet.CPD{1} = gaussian_CPD(bnet, 1, 'mean', x0, 'cov', V0, 'cov_prior_weight', 0);
bnet.CPD{2} = gaussian_CPD(bnet, 2, 'mean', zeros(Y,1), 'cov', R0, 'weights', C0, ...
'clamp_mean', 1, 'cov_prior_weight', 0);
bnet.CPD{3} = gaussian_CPD(bnet, 3, 'mean', zeros(X,1), 'cov', Q0, 'weights', A0, ...
'clamp_mean', 1, 'cov_prior_weight', 0);
</pre>
We discuss how to do <a href="#inf">inference</a> and <a href="#learn">learning</a> on this model
below.
(See also
my <a href="../Kalman/kalman.html">Kalman filter toolbox</a>, which is included with BNT.)
<p>
<h2><a name="chmm">Coupled HMMs</h2>
Here is an example of a coupled HMM with N=5 chains, unrolled for T=3
slices. Each hidden discrete node has a private observed Gaussian
child.
<p>
<img src="Figures/chmm5.gif">
<p>
We can make this using the function
<pre>
Q = 2; % binary hidden nodes
discrete_obs = 0; % cts observed nodes
Y = 1; % scalar observed nodes
bnet = mk_chmm(N, Q, Y, discrete_obs);
</pre>
<!--We will use this model <a href="#pred">below</a> to illustrate online prediction.-->
<h2><a name="water">Water network</h2>
Consider the following model
of a water purification plant, developed
by Finn V. Jensen, Uffe Kjærulff, Kristian G. Olesen, and Jan
Pedersen.
<!--
The clear nodes represent the hidden state of the system in
factored form, and the shaded nodes represent the observations in
factored form.
-->
<!--
(Click <a
href="http://www-nt.cs.berkeley.edu/home/nir/public_html/Repository/water.htm">here</a>
for more details on this model.
Following Boyen and Koller, we have added discrete evidence nodes.)
-->
<!--
We have "unrolled" the model for three "time slices" -- the structure and parameters are
assumed to repeat as the model is unrolled further.
Hence to specify a DBN, we need to
define the intra-slice topology (within a slice),
the inter-slice topology (between two slices),
as well as the parameters for the first two slices.
(Such a two-slice temporal Bayes net is often called a 2TBN.)
-->
<p>
<center>
<IMG SRC="Figures/water3_75.gif">
</center>
We now show how to specify this model in BNT.
<pre>
ss = 12; % slice size
intra = zeros(ss);
intra(1,9) = 1;
intra(3,10) = 1;
intra(4,11) = 1;
intra(8,12) = 1;
inter = zeros(ss);
inter(1, [1 3]) = 1; % node 1 in slice 1 connects to nodes 1 and 3 in slice 2
inter(2, [2 3 7]) = 1;
inter(3, [3 4 5]) = 1;
inter(4, [3 4 6]) = 1;
inter(5, [3 5 6]) = 1;
inter(6, [4 5 6]) = 1;
inter(7, [7 8]) = 1;
inter(8, [6 7 8]) = 1;
onodes = 9:12; % observed
dnodes = 1:ss; % discrete
ns = 2*ones(1,ss); % binary nodes
eclass1 = 1:12;
eclass2 = [13:20 9:12];
eclass = [eclass1 eclass2];
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
for e=1:max(eclass)
bnet.CPD{e} = tabular_CPD(bnet, e);
end
</pre>
We have tied the observation parameters across all slices.
Click <a href="param_tieing.html">here</a> for a more complex example
of parameter tieing.
<!--
Let X(i,t) denote the i'th hidden node in slice t,
and Y(i,y) denote the i'th observed node in slice t.
We also use the notation Nj to refer to the j'th node in the
unrolled network, e.g., N25 = X(1,3), N33 = Y(1,3).
<p>
We assume the distributions P(X(i,t) | X(i,t-1)) and
P(Y(i,t) | X(i,t)) are independent of t for t > 1 and for all i.
Hence the CPD for N25, N37, ... is the same as for N13, so we say they
are in the same equivalence class, with N13 being the "representative"
for this class. In other words, we have tied the parameters for nodes
N13, N25, N37, ...
Note, however, that the parameters for the nodes in the first slice
are not tied, so each equivalence class for nodes 1..12 contains a
single node.
<p>
Above we assumed P(Y(i,t) | X(i,t)) is independent of t for t>1, but
it is conventional to assume this is true for all t.
So we would like to put N9, N21, N33, ... all in the same class, and
similarly for the other observed nodes.
We can do this by explicitely defining the equivalence classes, as
follows.
<p>
We define eclass1(i) to be the equivalence class that node i in slice
1 belongs to.
Similarly, we define eclass2(i) to be the equivalence class that node i in slice
2, 3, ..., belongs to.
For the water model, we have
<pre>
</pre>
This ties the observation model across slices,
since e.g., eclass(9) = eclass(21) = 9, so Y(1,1) and Y(1,2) belong to the
same class.
<p>
By default,
eclass1 = 1:ss, and eclass2 = (1:ss)+ss, where ss = slice size = the
number of nodes per slice.
This will tie nodes in slices 3, 4, ... to the the nodes in slice 2,
but none of the nodes in slice 2 to any in slice 1.
By using the above tieing pattern,
we now only have 20 CPDs to specify, instead of 24:
<pre>
bnet = mk_dbn(intra, inter, ns, dnodes, eclass1, eclass2);
for e=1:max(eclass)
bnet.CPD{e} = tabular_CPD(bnet, e);
end
</pre>
-->
<h2><a name="bat">BATnet</h2>
As an example of a more complicated DBN, consider the following
example,
which is a model of a car's high level state, as might be used by
an automated car.
(The model is from Forbes, Huang, Kanazawa and Russell, "The BATmobile: Towards a
Bayesian Automated Taxi", IJCAI 95. The figure is from
Boyen and Koller, "Tractable Inference for Complex Stochastic
Processes", UAI98.
For simplicity, we only show the observed nodes for slice 2.)
<p>
<center>
<IMG SRC="Figures/batnet.gif">
</center>
<p>
Since this topology is so complicated,
it is useful to be able to refer to the nodes by name, instead of
number.
<pre>
names = {'LeftClr', 'RightClr', 'LatAct', ... 'Bclr', 'BYdotDiff'};
ss = length(names);
</pre>
We can specify the intra-slice topology using a cell array as follows,
where each row specifies a connection between two named nodes:
<pre>
intrac = {...
'LeftClr', 'LeftClrSens';
'RightClr', 'RightClrSens';
...
'BYdotDiff', 'BcloseFast'};
</pre>
Finally, we can convert this cell array to an adjacency matrix using
the following function:
<pre>
[intra, names] = mk_adj_mat(intrac, names, 1);
</pre>
This function also permutes the names so that they are in topological
order.
Given this ordering of the names, we can make the inter-slice
connectivity matrix as follows:
<pre>
interc = {...
'LeftClr', 'LeftClr';
'LeftClr', 'LatAct';
...
'FBStatus', 'LatAct'};
inter = mk_adj_mat(interc, names, 0);
</pre>
To refer to a node, we must know its number, which can be computed as
in the following example:
<pre>
obs = {'LeftClrSens', 'RightClrSens', 'TurnSignalSens', 'XdotSens', 'YdotSens', 'FYdotDiffSens', ...
'FclrSens', 'BXdotSens', 'BclrSens', 'BYdotDiffSens'};
for i=1:length(obs)
onodes(i) = stringmatch(obs{i}, names);
end
onodes = sort(onodes);
</pre>
(We sort the onodes since most BNT routines assume that set-valued
arguments are in sorted order.)
We can now make the DBN:
<pre>
dnodes = 1:ss;
ns = 2*ones(1,ss); % binary nodes
bnet = mk_dbn(intra, inter, ns, 'iscrete', dnodes);
</pre>
To specify the parameters, we must know the order of the parents.
See the function BNT/general/mk_named_CPT for a way to do this in the
case of tabular nodes. For simplicity, we just generate random
parameters:
<pre>
for i=1:2*ss
bnet.CPD{i} = tabular_CPD(bnet, i);
end
</pre>
A complete version of this example is available in BNT/examples/dynamic/bat1.m.
<h1><a name="inf">Inference</h1>
The general inference problem for DBNs is to compute
P(X(i,t0) | Y(:, t1:t2)), where X(i,t) represents the i'th hidden
variable at time t and Y(:,t1:t2) represents all the evidence
between times t1 and t2.
There are several special cases of interest, illustrated below.
The arrow indicates t0: it is X(t0) that we are trying to estimate.
The shaded region denotes t1:t2, the available data.
<p>
<img src="Figures/filter.gif">
<p>
BNT can currently only handle offline smoothing.
(The HMM engine handles filtering and, to a limited extent, prediction.)
The usage is similar to static
inference engines, except now the evidence is a 2D cell array of
size ss*T, where ss is the number of nodes per slice (ss = slice sizee) and T is the
number of slices.
Also, 'marginal_nodes' takes two arguments, the nodes and the time-slice.
For example, to compute P(X(i,t) | y(:,1:T)), we proceed as follows
(where onodes are the indices of the observedd nodes in each slice,
which correspond to y):
<pre>
ev = sample_dbn(bnet, T);
evidence = cell(ss,T);
evidence(onodes,:) = ev(onodes, :); % all cells besides onodes are empty
[engine, ll] = enter_evidence(engine, evidence);
marg = marginal_nodes(engine, i, t);
</pre>
<h2><a name="discrete">Discrete hidden nodes</h2>
If all the hidden nodes are discrete,
we can use the junction tree algorithm to perform inference.
The simplest approach,
<tt>jtree_unrolled_dbn_inf_engine</tt>,
unrolls the DBN into a static network and applies jtree; however, for
long sequences, this
can be very slow and can result in numerical underflow.
A better approach is to apply the jtree algorithm to pairs of
neighboring slices at a time; this is implemented in
<tt>jtree_dbn_inf_engine</tt>.
<p>
A DBN can be converted to an HMM if all the hidden nodes are discrete.
In this case, you can use
<tt>hmm_inf_engine</tt>. This is faster than jtree for small models
because the constant factors of the algorithm are lower, but can be
exponentially slower for models with many variables
(e.g., > 6 binary hidden nodes).
<p>
The use of both
<tt>jtree_dbn_inf_engine</tt>
and
<tt>hmm_inf_engine</tt>
is deprecated.
A better approach is to construct a smoother engine out of lower-level
engines, which implement forward/backward operators.
You can create these engines as follows.
<pre>
engine = smoother_engine(hmm_2TBN_inf_engine(bnet));
or
engine = smoother_engine(jtree_2TBN_inf_engine(bnet));
</pre>
You then call them in the usual way:
<pre>
engine = enter_evidence(engine, evidence);
m = marginal_nodes(engine, nodes, t);
</pre>
Note: you must declare the observed nodes in the bnet before using
hmm_2TBN_inf_engine.
<p>
Unfortunately, when all the hiddden nodes are discrete,
exact inference takes O(2^n) time, where n is the number of hidden
nodes per slice,
even if the model is sparse.
The basic reason for this is that two nodes become correlated, even if
there is no direct connection between them in the 2TBN,
by virtue of sharing common ancestors in the past.
Hence we need to use approximations.
<p>
A popular approximate inference algorithm for discrete DBNs, known as BK, is described in
<ul>
<li>
<A HREF="http://robotics.Stanford.EDU/~xb/uai98/index.html">
Tractable inference for complex stochastic processes </A>,
Boyen and Koller, UAI 1998
<li>
<A HREF="http://robotics.Stanford.EDU/~xb/nips98/index.html">
Approximate learning of dynamic models</a>, Boyen and Koller, NIPS
1998.
</ul>
This approximates the belief state with a product of
marginals on a specified set of clusters. For example,
in the water network, we might use the following clusters:
<pre>
engine = bk_inf_engine(bnet, { [1 2], [3 4 5 6], [7 8] });
</pre>
This engine can now be used just like the jtree engine.
Two special cases of the BK algorithm are supported: 'ff' (fully
factored) means each node has its own cluster, and 'exact' means there
is 1 cluster that contains the whole slice. These can be created as
follows:
<pre>
engine = bk_inf_engine(bnet, 'ff');
engine = bk_inf_engine(bnet, 'exact');
</pre>
For pedagogical purposes, an implementation of BK-FF that uses an HMM
instead of junction tree is available at
<tt>bk_ff_hmm_inf_engine</tt>.
<h2><a name="cts">Continuous hidden nodes</h2>
If all the hidden nodes are linear-Gaussian, <em>and</em> the observed nodes are
linear-Gaussian,
the model is a <a href="http://www.cs.berkeley.edu/~murphyk/Bayes/kalman.html">
linear dynamical system</a> (LDS).
A DBN can be converted to an LDS if all the hidden nodes are linear-Gaussian
and if they are all persistent. In this case, you can use
<tt>kalman_inf_engine</tt>.
For more general linear-gaussian models, you can use
<tt>jtree_dbn_inf_engine</tt> or <tt>jtree_unrolled_dbn_inf_engine</tt>.
<p>
For nonlinear systems with Gaussian noise, the unscented Kalman filter (UKF),
due to Julier and Uhlmann, is far superior to the well-known extended Kalman
filter (EKF), both in theory and practice.
<!--
See
<A HREF="http://phoebe.robots.ox.ac.uk/default.html">"A General Method for
Approximating Nonlinear Transformations of
Probability Distributions"</A>.
(If the above link is down,
try <a href="http://www.ece.ogi.edu/~ericwan/pubs.html">Eric Wan's</a>
page, who has done a lot of work on the UKF.)
<p>
-->
The key idea of the UKF is that it is easier to estimate a Gaussian distribution
from a set of points than to approximate an arbitrary non-linear
function.
We start with points that are plus/minus sigma away from the mean along
each dimension, and then pipe them through the nonlinearity, and
then fit a Gaussian to the transformed points.
(No need to compute Jacobians, unlike the EKF!)
<p>
For systems with non-Gaussian noise, I recommend
<a href="http://www.cs.berkeley.edu/~jfgf/smc/">Particle
filtering</a> (PF), which is a popular sequential Monte Carlo technique.
<p>
The EKF can be used as a proposal distribution for a PF.
This method is better than either one alone.
See <a href="http://www.cs.berkeley.edu/~jfgf/upf.ps.gz">The Unscented Particle Filter</a>,
by R van der Merwe, A Doucet, JFG de Freitas and E Wan, May 2000.
<a href="http://www.cs.berkeley.edu/~jfgf/software.html">Matlab
software</a> for the UPF is also available.
<p>
Note: none of this software is part of BNT.
<h1><a name="learn">Learning</h1>
Learning in DBNs can be done online or offline.
Currently only offline learning is implemented in BNT.
<h2><a name="param_learn">Parameter learning</h2>
Offline parameter learning is very similar to learning in static networks,
except now the training data is a cell-array of 2D cell-arrays.
For example,
cases{l}{i,t} is the value of node i in slice t in sequence l, or []
if unobserved.
Each sequence can be a different length, and may have missing values
in arbitrary locations.
Here is a typical code fragment for using EM.
<pre>
ncases = 2;
cases = cell(1, ncases);
for i=1:ncases
ev = sample_dbn(bnet, T);
cases{i} = cell(ss,T);
cases{i}(onodes,:) = ev(onodes, :);
end
[bnet2, LLtrace] = learn_params_dbn_em(engine, cases, 'max_iter', 10);
</pre>
If the observed node is vector-valued and stored in an OxT array, you
need to assign each vector to a single cell, as in the following
example.
<pre>
data = [xpos(:)'; ypos(:)'];
ncases = 1;
cases = cell(1, ncases);
onodes = bnet.observed;
for i=1:ncases
cases{i} = cell(ss,T);
cases{i}(onodes,:) = num2cell(data(:,1:T), 1);
end
</pre>
<p>
For a complete code listing of how to do EM in a simple DBN, click
<a href="dbn_hmm_demo.m">here</a>.
<h2><a name="struct_learn">Structure learning</h2>
There is currently only one structure learning algorithm for DBNs.
This assumes all nodes are tabular and observed, and that there are
no intra-slice connections. Hence we can find the optimal set of
parents for each node separately, without worrying about directed
cycles or node orderings.
The function is called as follows
<pre>
inter = learn_struct_dbn_reveal(cases, ns, max_fan_in, penalty)
</pre>
A full example is given in BNT/examples/dynamic/reveal1.m.
Setting the penalty term to 0 gives the maximum likelihood model; this
is equivalent to maximizing the mutual information between parents and
child (in the bioinformatics community, this is known as the REVEAL
algorithm). A non-zero penalty invokes the BIC criterion, which
lessens the chance of overfitting.
<p>
<a href="http://www.bioss.sari.ac.uk/~dirk/software/DBmcmc/">
Dirk Husmeier has extended MCMC model selection to DBNs</a>.
</BODY>