Bayesian AI Tutorial
Ann E. Nicholson and Kevin B. Korb
School of Computer Science and Software Engineering
Monash University, Clayton, VIC 3168, AUSTRALIA
{annn,korb}@csse.monash.edu.au
AI'99, Sydney, 6 December 1999

Tutorial Outline
1. Introduction to Bayesian AI (20 min)
2. Bayesian networks (50 min)
Break (10 min)
3. Applications (50 min)
Break (10 min)
4. Learning Bayesian networks (50 min)
5. Current research issues (10 min)
6. Bayesian Net Lab (60 min; optional)
7. Dinner (optional)
Introduction to Bayesian AI
- Reasoning under uncertainty
- Probabilities
- Alternative formalisms: fuzzy logic; MYCIN's certainty factors; default logic
- Bayesian philosophy: Dutch book arguments; Bayes' Theorem; conditionalization; confirmation theory
- Bayesian decision theory
- Towards a Bayesian AI
The axioms of probability:
$P(U) = 1$
$\forall X \subseteq U,\; P(X) \geq 0$
$\forall X, Y \subseteq U$: if $X \cap Y = \emptyset$ then $P(X \vee Y) = P(X) + P(Y)$
Independence: $X \perp Y$ iff $P(X \mid Y) = P(X)$
Conditional probability: $P(X \mid Y) = \frac{P(X \wedge Y)}{P(Y)}$
Probability Theory
So, why not use probability theory to represent uncertainty? That's what it was invented for... dealing with physical randomness and degrees of ignorance. Furthermore, if you make bets which violate probability theory, you are subject to Dutch books: a Dutch book is a sequence of fair bets which collectively guarantee a loss. Fair bets are bets based upon the standard odds-probability relation:
A Dutch Book
Payoff table on a bet for h (Odds = p/(1-p); S = betting unit):

h   Payoff
T   $(1-p)S
F   -$pS
Given a fair bet, the expected value of such a payoff is always $0. Now, let's violate the probability axioms. Example: suppose an agent's degrees of belief give P(A) + P(¬A) > 1; a bookie who sells that agent fair bets on both A and ¬A then makes a guaranteed profit, whichever truth value A takes.
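To see the guarantee numerically, here is a minimal sketch; the degrees of belief P(A) = 0.6 and P(¬A) = 0.6 are made-up numbers chosen to violate the axioms:

```python
# Dutch book: an agent prices a bet on A at p and a bet on not-A at q,
# with p + q > 1, violating the probability axioms. A bookie sells the
# agent both (fair-odds) bets; the agent loses (p + q - 1)*S either way.
p, q, S = 0.6, 0.6, 100.0   # made-up degrees of belief and betting unit

for A in (True, False):
    payoff_bet_on_A    = (1 - p) * S if A else -p * S
    payoff_bet_on_notA = -q * S if A else (1 - q) * S
    print(A, payoff_bet_on_A + payoff_bet_on_notA)   # -20.0 in both cases
```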
Conditionalization: $P'(h) = P(h \mid e)$, i.e., the new degree of belief in h after learning evidence e.
Bayesian AI
A Bayesian conception of an AI is: an autonomous agent which
- has a utility structure (preferences)
- can learn about its world and the relation between its actions and future states (probabilities)
- maximizes its expected utility
The techniques used in learning about the world are (primarily) statistical... Hence: Bayesian data mining.
Bayesian Networks
A data structure which represents the dependence between variables, and gives a concise specification of the joint probability distribution. A Bayesian network is a graph in which the following holds:
1. A set of random variables makes up the nodes in the network.
2. A set of directed links or arrows connects pairs of nodes.
3. Each node has a conditional probability table that quantifies the effects the parents have on the node.
4. The graph is directed and acyclic (a DAG), i.e. there are no directed cycles.
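As a concrete sketch of points 1-4, one plausible in-memory representation (our own layout, not any particular package's API; the numbers are those of the burglary example used below):

```python
# A Bayesian network as parents + CPTs. Each CPT maps a tuple of parent
# values to P(node = True); all nodes here are Boolean for simplicity.
network = {
    'Burglary':   {'parents': [], 'cpt': {(): 0.001}},
    'Earthquake': {'parents': [], 'cpt': {(): 0.002}},
    'Alarm':      {'parents': ['Burglary', 'Earthquake'],
                   'cpt': {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}},
    'JohnCalls':  {'parents': ['Alarm'], 'cpt': {(True,): 0.90, (False,): 0.05}},
    'MaryCalls':  {'parents': ['Alarm'], 'cpt': {(True,): 0.70, (False,): 0.01}},
}

def p(node, value, assignment):
    """P(node = value | parent values in `assignment`), read from the CPT."""
    spec = network[node]
    p_true = spec['cpt'][tuple(assignment[q] for q in spec['parents'])]
    return p_true if value else 1 - p_true
```

For example, p('Alarm', True, {'Burglary': True, 'Earthquake': False}) reads off 0.94.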
Assumptions: John and Mary don't perceive burglary directly, and they do not feel minor earthquakes. Note: there is no info about loud music or the telephone ringing and confusing John. These are summarised in the uncertainty in the links from Alarm to JohnCalls and MaryCalls.
Once the topology is specified, we need to specify a conditional probability table (CPT) for each node:
- Each row contains the conditional probability of each node value for a conditioning case.
- Each row must sum to 1.
- A table for a Boolean variable with $n$ Boolean parents contains $2^{n+1}$ probabilities.
- A node with no parents has one row (the prior probabilities).
[Figure: the burglary network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) with CPTs, e.g. the CPT for MaryCalls conditioned on Alarm (A = T/F).]
Example:
$P(J \wedge M \wedge A \wedge \neg B \wedge \neg E)$
$= P(J \mid A)\, P(M \mid A)\, P(A \mid \neg B \wedge \neg E)\, P(\neg B)\, P(\neg E)$
$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$
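The same chain-rule product, checked numerically (values as in the example):

```python
# P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) by the chain rule, using the CPT entries above.
joint = 0.9 * 0.7 * 0.001 * 0.999 * 0.998
print(round(joint, 5))   # 0.00063
```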
Network Construction
1. Choose the set of relevant variables $X_i$ that describe the domain.
2. Choose an ordering for the variables.
3. While there are variables left:
(a) Pick a variable $X_i$ and add a node to the network for it.
(b) Set $\mathrm{Parents}(X_i)$ to some minimal set of nodes already in the net such that the conditional independence property is satisfied.
(c) Define the conditional probability table for $X_i$.
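A schematic of this loop in Python; `ci(x, c, rest)` stands in for the judgment "x is conditionally independent of c given rest" (in practice an expert call or statistical test), so it is an assumed callback, not a supplied function:

```python
# Sketch of the network-construction loop above.
def minimal_parent_set(x, candidates, ci):
    """Greedily drop candidates that x is independent of, given the others."""
    parents = set(candidates)
    for c in candidates:
        if ci(x, c, parents - {c}):
            parents.discard(c)
    return parents

def build_network(ordered_vars, ci):
    net = []                                   # list of (node, parents)
    for x in ordered_vars:                     # step 3(a)
        prior = [n for n, _ in net]            # nodes already in the net
        net.append((x, minimal_parent_set(x, prior, ci)))   # step 3(b)
        # step 3(c): a CPT for x given its parents would be attached here
    return net
```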
[Figure: example network over Alarm, Burglary, Earthquake.]
[Figure: Burglary, Earthquake, Alarm network fragment.]
$P(C \mid A \wedge B) = P(C \mid B)$
Example: C = Jill's flu, A = Jack's flu, B = severe cough. More probabilities than the full joint! See below for why.
Common Causes
$P(C \mid A \wedge B) = P(C \mid B)$
Example: A = Jack's flu, B = Joe's flu, C = Jill's flu.

Common Effects
$P(A \mid C \wedge B) \neq P(A \mid B)$
Example: A = flu, B = severe cough, C = tuberculosis. Given a severe cough, flu explains away tuberculosis.
D-separation
Graph-theoretic criterion of conditional independence. We can determine whether a set of nodes X is independent of another set Y, given a set of evidence nodes E, i.e., $X \perp Y \mid E$. Earthquake example:
[Figure: d-separation illustrated on the Burglary, Earthquake, Alarm, JohnCalls, MaryCalls network.]
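D-separation can be checked mechanically. Here is a minimal sketch using the standard moralized ancestral graph test (our own implementation, not code from the tutorial):

```python
from collections import deque

def ancestors(parents, nodes):
    """`nodes` plus every node with a directed path into them."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in found:
                found.add(p); stack.append(p)
    return found

def d_separated(parents, xs, ys, es):
    """parents: {node: set of parents}. True iff X ⊥ Y | E by d-separation."""
    xs, ys, es = set(xs), set(ys), set(es)
    keep = ancestors(parents, xs | ys | es)
    adj = {n: set() for n in keep}              # moralized, undirected graph
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:                            # parent-child edges
            adj[p].add(child); adj[child].add(p)
        for i, p in enumerate(ps):              # "marry" co-parents
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    frontier, seen = deque(xs - es), set(xs)    # drop evidence, test reachability
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False
        for m in adj[n] - es - seen:
            seen.add(m); frontier.append(m)
    return True

burglary = {'B': set(), 'E': set(), 'A': {'B', 'E'}, 'J': {'A'}, 'M': {'A'}}
print(d_separated(burglary, {'J'}, {'M'}, {'A'}))   # True: calls independent given Alarm
print(d_separated(burglary, {'B'}, {'E'}, {'A'}))   # False: explaining away
```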
Causal Ordering
Why does variable order affect network density? Because:
- using the causal order allows direct representation of conditional independencies;
- violating the causal order requires new arcs to re-establish conditional independencies.
Inference in Bayesian Networks
Basic task for any probabilistic inference system: Compute the posterior probability distribution for a set of query variables, given values for some evidence variables. Also called Belief Updating. Types of Inference:
[Figure: Flu and TB as parents of Cough.]
Flu and TB are marginally independent. Given the ordering Cough, Flu, TB, new arcs are needed to recover this independence structure.
[Figure: inference types on the Flu/TB/Cough network: diagnostic (evidence E below the query Q), causal (evidence above the query), and mixed.]
Kinds of Inference
Diagnostic inferences: from effects to causes, e.g. P(Burglary | JohnCalls).
Causal inferences: from causes to effects, e.g. P(JohnCalls | Burglary), P(MaryCalls | Burglary).
Intercausal inferences: between causes of a common effect, e.g. P(Burglary | Alarm) vs. P(Burglary | Alarm ∧ Earthquake).

Inference algorithms:
- Exact inference: message-passing for trees and polytrees; clustering for multiply-connected networks.
- Approximate inference: stochastic simulation for large, complex networks; other approximation methods.
In the general case, both sorts of inference are computationally complex (NP-hard).
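For intuition, here is exact inference done the brute-force way on the burglary network: enumerate the full joint and sum. The alarm CPT entries not shown in the earlier worked example (0.95, 0.94, 0.29) are the standard values from Russell & Norvig, an assumption since the slide's tables did not survive:

```python
import itertools

# Brute-force exact inference: exponential in the number of variables,
# which is why the polytree and clustering algorithms above matter.
VARS = ['B', 'E', 'A', 'J', 'M']

def joint(assign):
    b, e, a, j, m = (assign[v] for v in VARS)
    f = lambda p, val: p if val else 1 - p
    p_alarm = {(True, True): .95, (True, False): .94,
               (False, True): .29, (False, False): .001}[(b, e)]
    return (f(.001, b) * f(.002, e) * f(p_alarm, a) *
            f(.9 if a else .05, j) * f(.7 if a else .01, m))

def query(var, evidence):
    """Posterior P(var | evidence) by summing the full joint."""
    dist = {True: 0.0, False: 0.0}
    for vals in itertools.product([True, False], repeat=len(VARS)):
        assign = dict(zip(VARS, vals))
        if all(assign[k] == v for k, v in evidence.items()):
            dist[assign[var]] += joint(assign)
    total = dist[True] + dist[False]
    return {v: p / total for v, p in dist.items()}

print(query('B', {'J': True, 'M': True}))   # diagnostic: P(Burglary | both call)
```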
Multiply-Connected Networks
Networks where two nodes are connected by more than one path:
- two or more possible causes which share a common ancestor, or
- one variable can influence another through more than one causal mechanism.
Example: Cancer network.
[Figure: the Metastatic Cancer network, a multiply-connected example.]
[Figure: λ/π message passing in the Burglary, Earthquake, Alarm network, with evidence messages λ(J) = (1,1) and λ(M) = (1,0).]
Clustering: compile the network into a join-tree. This may be slow, and may require too much memory if the original network is highly connected.
[Figure: the network with B and C merged into a cluster node Z = {B, C}, with children D and E.]
Then do belief updating in the join-tree (usually fast). Caveat: clustered nodes have increased complexity; updates may be computationally complex.
$P(z \mid a) = P(b, c \mid a) = P(b \mid a)\,P(c \mid a)$
$P(e \mid z) = P(e \mid b, c) = P(e \mid c)$
$P(d \mid z) = P(d \mid b, c)$
Making Decisions
Bayesian networks can be extended to support decision making. Utility theory captures preferences between the different outcomes of various plans. Decision theory = utility theory + probability theory.
Example: Umbrella
[Figure: umbrella decision network with chance nodes Weather and Forecast and decision node Take Umbrella.]
P(Weather = Rain) = 0.3
P(Forecast = Rainy | Weather = Rain) = 0.60
P(Forecast = Cloudy | Weather = Rain) = 0.25
P(Forecast = Sunny | Weather = Rain) = 0.15
P(Forecast = Rainy | Weather = NoRain) = 0.1
P(Forecast = Cloudy | Weather = NoRain) = 0.2
P(Forecast = Sunny | Weather = NoRain) = 0.7
U(NoRain, TakeUmbrella) = 20
U(NoRain, LeaveAtHome) = 100
U(Rain, TakeUmbrella) = 70
U(Rain, LeaveAtHome) = 0
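Putting the numbers to work, a sketch of the expected-utility computation (action names shortened to take/leave): condition on the forecast via Bayes' theorem, then pick the action maximizing expected utility.

```python
# Evaluating the umbrella decision network: Bayes' theorem for
# P(Rain | Forecast), then the expected utility of each action.
p_rain = 0.3
p_forecast = {'rain':   {'rainy': .60, 'cloudy': .25, 'sunny': .15},
              'norain': {'rainy': .10, 'cloudy': .20, 'sunny': .70}}
U = {('norain', 'take'): 20, ('norain', 'leave'): 100,
     ('rain', 'take'): 70, ('rain', 'leave'): 0}

for f in ('rainy', 'cloudy', 'sunny'):
    jr = p_rain * p_forecast['rain'][f]          # P(rain, forecast=f)
    jn = (1 - p_rain) * p_forecast['norain'][f]  # P(norain, forecast=f)
    post = jr / (jr + jn)                        # P(rain | forecast=f)
    eu = {a: post * U[('rain', a)] + (1 - post) * U[('norain', a)]
          for a in ('take', 'leave')}
    print(f, {a: round(v, 1) for a, v in eu.items()}, '->', max(eu, key=eu.get))
```

For a rainy forecast this yields EU(take) = 56 vs. EU(leave) = 28, so the umbrella goes along.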
[Figure: a dynamic Bayesian network: state nodes State t-2, State t-1, ... with observation nodes Obs t-2, Obs t-1, ..., Obs t+2.]
The values of the state variables at time t depend only on the values at t-1. We can calculate distributions for S_{t+1} and further: probabilistic projection. This can be done using standard BN updating algorithms. This type of DBN gets very large, very quickly, so usually only two time slices of the network are kept.
Similarly, decision networks can be extended to include temporal aspects (dynamic decision networks); the sequence of decisions taken is a plan.
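A toy probabilistic projection for a two-state DBN, i.e. a Markov chain; the transition matrix below is a made-up example, not from the tutorial:

```python
import numpy as np

# Forward projection: state_{t+1} depends only on state_t.
T = np.array([[0.7, 0.3],        # P(state_{t+1} | state_t = 0)
              [0.2, 0.8]])       # P(state_{t+1} | state_t = 1)
belief = np.array([1.0, 0.0])    # current belief over the state at time t
for step in range(1, 4):
    belief = belief @ T          # project one slice ahead
    print(f"t+{step}:", belief.round(3))
```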
[Figure: a dynamic decision network with decision nodes D_t, D_{t+1}, ... and state nodes State t through State t+3.]
Summary
- Bayes' rule allows unknown probabilities to be computed from known ones.
- Conditional independence (due to causal relationships) allows efficient updating.
- Bayesian networks are a natural way to represent conditional independence info: links between nodes capture the qualitative aspects; conditional probability tables capture the quantitative aspects.
- Inference means computing the probability distribution for a set of query variables, given a set of evidence variables.
- Inference in Bayesian networks is very flexible: one can enter evidence about any node and update beliefs in any other nodes.
- The speed of inference in practice depends on the structure of the network: how many loops; the number of parents; the location of evidence and query nodes.
- Bayesian networks can be extended with decision nodes and utility nodes to support decision making: decision networks or influence diagrams.
- Bayesian and decision networks can be extended to allow explicit reasoning about changes over time.
Applications: Overview
- (Simple) Example Networks
- Medical Decision Making: survey of applications
- Planning and Plan Recognition
- Natural Language Generation (NAG)
- Bayesian poker
- Deployed Bayesian Networks (see Handout for details)
- BN Software
- Web Resources
Metastatic cancer is a possible cause of a brain tumour, and is also an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into a coma. Severe headache is also possibly associated with a brain tumour. (Example from Pearl, 1988.)
[Figure: A = Metastatic Cancer, B = Increased total serum calcium, C = Brain tumour, D = Coma, E = Severe Headaches; arcs A→B, A→C, B→D, C→D, C→E.]
CPTs: $P(a) = 0.20$; $P(b \mid a) = 0.80$, $P(b \mid \neg a) = 0.20$; $P(c \mid a) = 0.20$, $P(c \mid \neg a) = 0.05$; $P(d \mid b, c) = 0.80$, $P(d \mid b, \neg c) = 0.80$, $P(d \mid \neg b, c) = 0.80$, $P(d \mid \neg b, \neg c) = 0.05$; $P(e \mid c) = 0.80$, $P(e \mid \neg c) = 0.60$.
Example: Asia
A patient presents to a doctor with shortness of breath (dyspnoea). The doctor considers that the possible causes are tuberculosis, lung cancer and bronchitis. Additional relevant information includes whether the patient has recently visited Asia (where tuberculosis is more prevalent) and whether or not the patient is a smoker (which increases the chances of cancer and bronchitis). A positive X-ray would indicate either TB or lung cancer. (Example from Lauritzen & Spiegelhalter, 1988.)
[Figure: the Asia network, with nodes visit to Asia, smoking, tuberculosis, lung cancer, bronchitis, positive X-ray, dyspnoea.]
[Figure: example network with nodes in-office, lights-on, logged-on.]
Multiply-connected network (QMR structure). B = background information (e.g., age, sex of patient).
Medical Applications
- Pathfinder case study: see handout, using material from (Russell & Norvig, 1995, pp. 457-458).
- QMR (Quick Medical Reference): 600 diseases, 4,000 findings, 40,000 arcs (Dean & Wellman, 1991).
- MUNIN (Andreassen et al., 1989): neuromuscular disorders, about 1000 nodes; exact computation < 5 seconds.
- Glucose prediction and insulin dose adjustment, a DBN application (Andreassen et al., 1991).
- CPCS project (Pradham et al., 1994): 448 nodes, 906 links, 8254 conditional probability values; LW algorithm gave answers in 35 mins (1994).
- Application of LW to medical diagnosis (Shwe & Cooper, 1990).
- Forecasting sleep apnea (Dagum et al., 1993).
- ALARM (Beinlich et al., 1989): 37 nodes, 42 arcs. (See Netica examples.)
[Figure: the ALARM network (37 nodes, 42 arcs): MinVolSet, Ventmach, Disconnect, PulmEmbolus, Intubation, PAP, Shunt, Press, VentLung, FiO2, VentAlv, MinVol, ArtCO2, TPR, Hypovolemia, ErrCauter, HR, ErrLowOutput, History, StrokeVolume, LVEDVolume, HRSat, HRBP, CO, CVP, PCWP, BP, and so on, annotated with node state counts and arc strengths.]
[Figure: four DBN structures for keyhole plan recognition: (a) mainModel, (b) indepModel, (c) actionModel, (d) locationModel, over location nodes L0-L3, action nodes A0-A3, and quest nodes Q.]
[Figure: NAG architecture: a Bayesian network connected through two semantic-network layers, linking lower-level concepts (e.g., Grade Point Average) to higher-level concepts (e.g., motivation or ability).]
Bayesian Poker
(Korb et al., 1999) Poker is ideal for testing automated reasoning under uncertainty:
- physical randomisation
- incomplete hand information
- incomplete opponent info (strategies, bluffing, etc.)
Bayesian networks are a good representation for complex game playing. Our Bayesian Poker Player (BPP) plays 5-card stud poker at the level of a good amateur human player.
To play: telnet indy13.cs.monash.edu.au, login: poker, password: maverick
Bayesian Poker BN
The Bayesian network provides an estimate of the probability of winning at any point in the hand. Betting curves based on pot odds are used to determine the action (bet/call, pass or fold).
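The pot-odds rule can be sketched as follows (a hypothetical helper, not BPP's actual code; the betting-curve refinement and bluffing are omitted):

```python
# Pot-odds decision rule, simplified: call only when the estimated win
# probability exceeds the break-even point implied by the pot.
def choose_action(p_win, pot, to_call):
    pot_odds = to_call / (pot + to_call)     # break-even win probability
    return 'bet/call' if p_win >= pot_odds else 'fold'

print(choose_action(0.35, pot=80, to_call=20))   # 0.35 >= 0.20 -> 'bet/call'
```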
[Figure: BPP's network: nodes BPP Win, OPP Final, BPP Final, BPP Current, OPP Action, OPP Upcards, with conditional probability matrices M_{A|C}, M_{U|C} and M_{C|F}.]
The initial 9 hand types were too coarse. We use a finer granularity for the most common hands (busted and a pair): low, medium, Q-high, K-high, A-high. This results in 17 hand types.
Conditional probability matrices (e.g., $M_{U \mid C}$ and $M_{C \mid F}$) relate these hand types for poker hands.
Belief updating: since the network is a polytree, a simple, fast propagation updating algorithm is used.
From Web Site database (see handout for details):
- TRACS: predicting reliability of military vehicles.
- Andes: intelligent tutoring system for physics.
- Distributed Virtual Agents advising online users on web sites.
- Information extraction from natural language text.
- DXPLAIN: decision support for medical diagnosis.
- Illiad: teaching tool for medical students.
- Microsoft Health Product: find-by-symptom feature.
- Weapons scheduling.
- Monitoring power generation.
- Processor fault diagnosis.
- Knowledge Industries applications: (a) in medicine: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, and home-based health evaluations; (b) in capital equipment: locomotives, gas-turbine engines for aircraft and land-based power production, the space shuttle, and office equipment.
- Software debugging.
- Vista: decision support system used at NASA Mission Control Center.
- Microsoft: (a) Answer Wizard (Office 95), information retrieval; (b) Print Troubleshooter; (c) Aladdin, troubleshooting customer support.
BN Software: Issues
- Functionality: especially application vs. API
- Price: many free demo versions or educational-use licences; commercial licence costs
- Availability (platforms)
- Quality: GUI, documentation and help
- Leading edge
- Robustness
- Software company
BN Software
- Analytica: www.lumina.com
- Hugin: www.hugin.com
- Netica: www.norsys.com
The above 3 are available during the tutorial lab session.

Web Resources
- Bayesian Belief Network site (Russell Greiner): www.cs.ualberta.ca/~greiner/bn.html
- Bayesian Network Repository (Nir Friedman): www-nt.cs.berkeley.edu/home/nir/public_html/Repository/index.htm
Applications: Summary
Various BN structures are available to compactly and accurately represent certain types of domain features. Bayesian networks have been used for a wide range of AI applications. Robust and easy to use Bayesian network software is now readily available.
Learning Bayesian Networks
- Learning Probability Tables
- Learning Causal Structure
- Conditional Independence Learning
- Statistical Equivalence
- TETRAD II
- Bayesian Learning of Bayesian Networks
- Cooper & Herskovits: K2
- Learning Variable Order
- Statistical Equivalence Learners
- Full Causal Learners
- Minimum Encoding Methods
- Lam & Bacchus's MDL learner
- MML metrics
- MML search algorithms
- MML Sampling
- Empirical Results
[Figure: path model with $X_1$ and $X_2$ as parents of $X_3$.]
$X_3 = a_{13} X_1 + a_{23} X_2 + \epsilon_3$
Discrete models: Bayesian nets replace vectors of linear coefficients with CPTs.
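To make the linear case concrete, a small simulation; the coefficients $a_{13} = 0.8$ and $a_{23} = -0.5$ are made-up values for illustration:

```python
import numpy as np

# Simulate X3 = a13*X1 + a23*X2 + e3, then recover the path coefficients
# by least squares, as path-model fitting would.
rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

coef, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), x3, rcond=None)
print(coef.round(2))   # ~[ 0.8 -0.5]
```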
Learning Probability Tables
For a node with $K$ values (for a given parent configuration), maintain a Dirichlet prior $D[\alpha_1, \ldots, \alpha_i, \ldots, \alpha_K]$: the probability of outcome $i$ is $\alpha_i / \sum_{k=1}^{K} \alpha_k$, and after observing outcome $i$ the posterior is $D[\alpha_1, \ldots, \alpha_i + 1, \ldots, \alpha_K]$ (cf. Spiegelhalter & Lauritzen, 1990).
Dual log-linear and full CPT models (Neil, Wallace, Korb 1999).
Others are looking at learning without parameter independence, e.g., decision trees to learn structure within CPTs (Boutilier et al. 1996).
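A toy version of the Dirichlet updating rule above (a sketch; the uniform prior D[1, 1, 1] over K = 3 outcomes is an assumed default):

```python
# Dirichlet-multinomial updating for one CPT row.
alphas = [1.0, 1.0, 1.0]

def predict(alphas):
    return [a / sum(alphas) for a in alphas]

print(predict(alphas))     # [1/3, 1/3, 1/3]
alphas[0] += 1             # observe outcome 0: D[.., a_i + 1, ..]
print(predict(alphas))     # [0.5, 0.25, 0.25]
```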
Learning Causal Structure
This is the real problem; parameterizing models is essentially numerical computing. There are two basic methods:
- learning from conditional independencies (CI learning)
- learning using a scoring metric (metric learning)
Statistical Equivalence
Verma and Pearl's rules identify the set of causal models which are statistically equivalent. Two causal models H1 and H2 are statistically equivalent iff they contain the same variables and joint samples over them provide no statistical grounds for preferring one over the other. Examples: all fully connected models are equivalent; $A \to B \to C$, $A \leftarrow B \leftarrow C$ and $A \leftarrow B \to C$ are equivalent (but not the common-effect structure $A \to B \leftarrow C$).
CI learning (Verma and Pearl, 1991) Suppose you have an Oracle who can answer yes or no to any question of the type:
$X \perp Y \mid S$?
Then you can learn the correct causal model, up to statistical equivalence.
Statistical Equivalence
Chickering (1995): any two causal models over the same variables which have the same skeleton (undirected arcs) and the same directed v-structures are statistically equivalent. If H1 and H2 are statistically equivalent, then they have the same maximum likelihoods relative to any joint samples:
$\max_{\theta_1} P(e \mid H_1, \theta_1) = \max_{\theta_2} P(e \mid H_2, \theta_2)$
where $\theta_i$ is a parameterization of $H_i$.

TETRAD II
Spirtes, Glymour and Scheines (1993): replace the Oracle with statistical tests:
- for linear models, a significance test on partial correlation: $X \perp Y \mid S$ iff $\rho_{XY \cdot S} = 0$
- for discrete models, a $\chi^2$-style test on the difference between the CPT counts expected under independence ($E_i$) and those observed ($O_i$): $G^2 = 2 \sum_i O_i \ln \frac{O_i}{E_i}$
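The $G^2$ statistic computed for a 2x2 contingency table of made-up counts (a sketch of the discrete test; the choice of counts is ours):

```python
import numpy as np

# Observed counts O vs. counts E expected if X and Y were independent.
O = np.array([[30., 10.],
              [20., 40.]])
E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
G2 = 2 * np.sum(O * np.log(O / E))
print(round(G2, 2))   # compare to a chi-square critical value (1 d.o.f. here)
```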
TETRAD II:
- Asymptotically finds causal structure to within the statistical equivalence class of the true model.
- Requires larger sample sizes than MML (Dai, Korb, Wallace & Wu, 1997): statistical tests are not robust given weak causal interactions and/or small samples.
- Cheap, and easy to use.

Cooper & Herskovits
Cooper & Herskovits (1991, 1992) compute $P(h_i \mid e)$ by brute force, under the assumptions:
1. All variables are discrete.
2. Samples are i.i.d.
3. No missing values.
4. All values of child variables are uniformly distributed.
5. Priors over hypotheses are uniform.
With these assumptions, Cooper & Herskovits reduce the computation of $P_{CH}(h, e)$ to a polynomial-time counting problem.
Reliance upon a given variable order is a major drawback to K2, and to many other algorithms (Buntine 1991, Bouckaert 1994, Suzuki 1996, Madigan & Raftery 1994). What's wrong with that?
- We want autonomous AI (data mining). If experts can order the variables, they can likely supply models.
- Determining the variable ordering is half the problem: if we know A comes before B, the only remaining issue is whether there is a link between the two.
- The number of orderings consistent with DAGs is apparently exponential (Brightwell & Winkler 1990), so iterating over all possible orderings will not scale up.
⇒ Madigan, Andersson, Perlman & Volinsky (1996) follow this advice, using a uniform prior over equivalence classes.
⇒ Geiger and Heckerman (1994) define Bayesian metrics for linear and discrete equivalence classes of models (BGe and BDe).
MDL
Minimum Description Length (MDL) inference: invented by Rissanen (1978), based upon Minimum Message Length (MML), invented by Wallace (Wallace and Boulton, 1968). It plays off model simplicity against fit to the data, by minimizing the length of a joint description of the model and of the data given the model.
Lam & Bacchus's MDL metric uses $d(s_i - 1)\prod_{j \in \pi(i)} s_j$ bits for specifying the CPT of node $i$, where $d$ is the fixed bit-length per probability and $s_i$ is the number of states for node $i$. The data code length is $N \sum_{i=1}^{n} H(X_i) - N \sum_{i=1}^{n} M(X_i; \pi(i))$, where $M(X_i; \pi(i))$ is the mutual information between $X_i$ and its parent set $\pi(i)$, $H(X_i)$ is the entropy of $X_i$, and $N$ is the sample size.
(NB: This code is not efficient. E.g., it treats every node as equally likely to be a parent, and it assumes knowledge of all $k_i$.)
MML
Minimum Message Length (Wallace & Boulton 1968) uses Shannon's measure of information: $I(m) = -\log P(m)$.
The message encodes the connectivity and, for each node $X_j$, its parameters, at a cost of
$-\log \frac{f(\theta_j \mid h)}{\sqrt{F(\theta_j)}}$
where $\theta_j$ are the parameters for $X_j$ and $F(\theta_j)$ is the Fisher information. $f(\theta_j \mid h)$ is assumed to be $N(0, \sigma_j)$. (Cf. MDL's fixed length for parameters.)
MML Metric for Discrete Models
We can use $P_{CH}(h_i, e)$ (from Cooper & Herskovits) to define an MML metric for discrete models. Difference between MML and Bayesian metrics: MML partitions the parameter space and selects optimal parameters, equivalent to a penalty of $\frac{1}{2} \log 6e$ per parameter (Wallace & Freeman 1987).

MML Metric for Linear Models
Sampling for $X_j$ given $h$ and $\theta_j$:
$$\log P(e \mid h, \theta_j) = \log \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_j} e^{-\nu_{jk}^2 / 2\sigma_j^2}$$
where $K$ is the number of sample values and $\nu_{jk}$ is the difference between the observed value of $X_j$ and its linear prediction.
MML Sampling
Search the space of totally ordered models (TOMs), sampled via a Metropolis algorithm (Metropolis et al. 1953). From the current model $M$, find the next model $M'$ by:
- randomly selecting a variable and attempting to swap its order with its predecessor; or
- randomly selecting a pair of variables and attempting to add/delete an arc between them.
Attempts succeed whenever $P(M')/P(M) > U$ (per the MML metric), where $U$ is uniformly random from $[0, 1]$.
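A schematic of this sampling loop; `mml_prob` (the model's posterior per the MML metric) and `propose` (the order-swap / arc add-delete moves above) are assumed callbacks, not implementations:

```python
import random

# Metropolis sampling over totally ordered models, per the rule above:
# accept the candidate whenever P(M')/P(M) exceeds a uniform draw.
def metropolis(model, mml_prob, propose, steps=10_000):
    visited = []
    for _ in range(steps):
        candidate = propose(model)
        if mml_prob(candidate) / mml_prob(model) > random.random():
            model = candidate            # attempt succeeds
        visited.append(model)
    return visited
```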
(Other) Limitations
Introduction to Bayesian AI
T. Bayes (1764) An Essay Towards Solving a Problem in the Doctrine of Chances. Phil Trans of the Royal Soc of London. Reprinted in Biometrika, 45 (1958), 296-315.
B. Buchanan and E. Shortliffe (eds.) (1984) Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley.
B. de Finetti (1964) Foresight: Its Logical Laws, Its Subjective Sources, in Kyburg and Smokler (eds.) Studies in Subjective Probability. NY: Wiley.
D. Heckerman (1986) Probabilistic Interpretations for MYCIN's Certainty Factors, in L.N. Kanal and J.F. Lemmer (eds.) Uncertainty in Artificial Intelligence. North-Holland.
C. Howson and P. Urbach (1993) Scientific Reasoning: The Bayesian Approach. Open Court. A MODERN REVIEW OF BAYESIAN THEORY.
K.B. Korb (1995) Inductive learning and defeasible inference, Jrn for Experimental and Theoretical AI, 7, 291-324.
F.P. Ramsey (1931) Truth and Probability, in The Foundations of Mathematics and Other Essays. NY: Humanities Press. THE ORIGIN OF MODERN BAYESIANISM. INCLUDES LOTTERY-BASED ELICITATION AND DUTCH-BOOK ARGUMENTS FOR THE USE OF PROBABILITIES.
R. Reiter (1980) A logic for default reasoning, Artificial Intelligence, 13, 81-132.
J. von Neumann and O. Morgenstern (1947) Theory of Games and Economic Behavior, 2nd ed. Princeton Univ. STANDARD REFERENCE ON ELICITING UTILITIES VIA LOTTERIES.
Bayesian Networks
E. Charniak (1991) Bayesian Networks Without Tears, Artificial Intelligence Magazine, Vol 12, pp. 50-63. AN ELEMENTARY INTRODUCTION.
B. D'Ambrosio (1999) Inference in Bayesian Networks. Artificial Intelligence Magazine, Vol 20, No. 2.
P. Haddawy (1999) An Overview of Some Recent Developments in Bayesian Problem-Solving Techniques. Artificial Intelligence Magazine, Vol 20, No. 2.
Howard & Matheson (1981) Influence Diagrams, Principles and Applications of Decision Analysis.
F.V. Jensen (1996) An Introduction to Bayesian Networks, Springer.
R. Neapolitan (1990) Probabilistic Reasoning in Expert Systems. Wiley. SIMILAR COVERAGE TO THAT OF PEARL; MORE EMPHASIS ON PRACTICAL ALGORITHMS FOR NETWORK UPDATING.
Applications
D.W. Albrecht, I. Zukerman and A.E. Nicholson (1998) Bayesian Models for Keyhole Plan Recognition in an Adventure Game. User Modeling and User-Adapted Interaction, 8(1-2), 5-47, Kluwer Academic Publishers.
S. Andreassen, F.V. Jensen, S.K. Andersen, B. Falck, U. Kjærulff, M. Woldbye, A.R. Sørensen, A. Rosenfalck and F. Jensen (1989) MUNIN - An Expert EMG Assistant, Computer-Aided Electromyography and Expert Systems, Chapter 21, J.E. Desmedt (ed.), Elsevier.
S.A. Andreassen, J.J. Benn, R. Hovorka, K.G. Olesen and R.E. Carson (1991) A Probabilistic Approach to Glucose Prediction and Insulin Dose Adjustment: Description of Metabolic Model and Pilot Evaluation Study.
I. Beinlich, H. Suermondt, R. Chavez and G. Cooper (1989) The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks, Proc. of the 2nd European Conf. on Artificial Intelligence in Medicine, pp. 689-693.
T.L. Dean and M.P. Wellman (1991) Planning and Control, Morgan Kaufmann.
T.L. Dean, J. Allen and J. Aloimonos (1994) Artificial Intelligence: Theory and Practice, Benjamin/Cummings.
J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann. THIS IS THE CLASSIC TEXT INTRODUCING BAYESIAN NETWORKS TO THE AI COMMUNITY.
D. Poole, A. Mackworth and R. Goebel (1998) Computational Intelligence: A Logical Approach. Oxford University Press.
Russell & Norvig (1995) Artificial Intelligence: A Modern Approach, Prentice Hall.
P. Dagum, A. Galper and E. Horvitz (1992) Dynamic Network Models for Forecasting, Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, pp. 41-48.
J. Forbes, T. Huang, K. Kanazawa and S. Russell (1995) The BATmobile: Towards a Bayesian Automated Taxi, Proceedings of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI'95), pp. 1878-1885.
S.L. Lauritzen and D.J. Spiegelhalter (1988) Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems, Journal of the Royal Statistical Society, 50(2), pp. 157-224.
McConachy et al. (1999)
A.E. Nicholson (1999) CSE2309/3309 Artificial Intelligence, Monash University, Lecture Notes, http://www.csse.monash.edu.au/~annn/2-3309.html.
M. Pradham, G. Provan, B. Middleton and M. Henrion (1994) Knowledge engineering for large belief networks, Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence.
D. Pynadath and M.P. Wellman (1995) Accounting for Context in Plan Recognition, with Application to Traffic Monitoring, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 472-481.
M. Shwe and G. Cooper (1990) An Empirical Analysis of Likelihood-Weighting Simulation on a Large, Multiply Connected Belief Network, Proceedings of the Sixth Workshop on Uncertainty in Artificial Intelligence, pp. 498-508.
L.C. van der Gaag, S. Renooij, C.L.M. Witteman, B.M.P. Aleman and B.G. Taal (1999) How to Elicit Many Probabilities, in Laskey & Prade (eds.) UAI'99, 647-654.
I. Zukerman, R. McConachy, K. Korb and D. Pickett (1999) Exploratory Interaction with a Bayesian Argumentation System, in IJCAI-99: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 1294-1299, Stockholm, Sweden, Morgan Kaufmann.
G. Brightwell and P. Winkler (1990) Counting linear extensions is #P-complete. Technical Report DIMACS 90-49, Dept of Computer Science, Rutgers Univ.
W. Buntine (1991) Theory refinement on Bayesian networks, in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 52-69.
W. Buntine (1996) A Guide to the Literature on Learning Probabilistic Networks from Data, IEEE Transactions on Knowledge and Data Engineering, 8, 195-210.
D.M. Chickering (1995) A Transformational Characterization of Equivalent Bayesian Network Structures, in P. Besnard and S. Hanks (eds.) Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 87-98). San Francisco: Morgan Kaufmann. STATISTICAL EQUIVALENCE.
G.F. Cooper and E. Herskovits (1991) A Bayesian Method for Constructing Bayesian Belief Networks from Databases, in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 86-94.
G.F. Cooper and E. Herskovits (1992) A Bayesian Method for the Induction of Probabilistic Networks from Data, Machine Learning, 9, 309-347. AN EARLY BAYESIAN CAUSAL DISCOVERY METHOD.
H. Dai, K.B. Korb, C.S. Wallace and X. Wu (1997) A study of causal discovery with weak links and small samples, Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1304-1309. Morgan Kaufmann.
N. Friedman (1997) The Bayesian Structural EM Algorithm, in D. Geiger and P.P. Shenoy (eds.) Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (pp. 129-138). San Francisco: Morgan Kaufmann.
Geiger and Heckerman (1994) Learning Gaussian networks, in Lopez de Mantaras and Poole (eds.) UAI 1994, 235-243.
D. Heckerman and D. Geiger (1995) Learning Bayesian networks: A unification for discrete and Gaussian domains, in Besnard and Hanks (eds.) UAI 1995, 274-284.
D. Heckerman, D. Geiger and D.M. Chickering (1995) Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, Machine Learning, 20, 197-243. BAYESIAN LEARNING OF STATISTICAL EQUIVALENCE CLASSES.
K. Korb (1999) Probabilistic Causal Structure, in H. Sankey (ed.) Causation and Laws of Nature: Australasian Studies in History and Philosophy of Science 14. Kluwer Academic. INTRODUCTION TO THE RELEVANT PHILOSOPHY OF CAUSATION FOR LEARNING BAYESIAN NETWORKS.
P. Krause (1998) Learning Probabilistic Networks.
W. Lam and F. Bacchus (1993) Learning Bayesian belief networks: An approach based on the MDL principle, Jrn Comp Intelligence, 10, 269-293.
D. Madigan, S.A. Andersson, M.D. Perlman and C.T. Volinsky (1996) Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs, Comm in Statistics: Theory and Methods, 25, 2493-2519.
D. Madigan and A.E. Raftery (1994) Model selection and accounting for model uncertainty in graphical models using Occam's window, Jrn Amer Stat Assoc, 89, 1535-1546.
N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller (1953) Equations of state calculations by fast computing machines, Jrn Chemical Physics, 21, 1087-1091.
J.R. Neil and K.B. Korb (1999) The Evolution of Causal Models: A Comparison of Bayesian Metrics and Structure Priors, in Methodologies for Knowledge Discovery and Data Mining: Third Pacific-Asia Conference (pp. 432-437). Springer Verlag. GENETIC ALGORITHMS FOR CAUSAL DISCOVERY; STRUCTURE PRIORS.
J.R. Neil, C.S. Wallace and K.B. Korb (1999) Learning Bayesian networks with restricted causal interactions, in Laskey and Prade (eds.) UAI 99, 486-493.
J. Rissanen (1978) Modeling by shortest data description, Automatica, 14, 465-471.
H. Simon (1954) Spurious Correlation: A Causal Interpretation, Jrn Amer Stat Assoc, 49, 467-479.
D. Spiegelhalter and S. Lauritzen (1990) Sequential Updating of Conditional Probabilities on Directed Graphical Structures, Networks, 20, 579-605.
P. Spirtes, C. Glymour and R. Scheines (1990) Causality from Probability, in J.E. Tiles, G.T. McKee and G.C. Dean (eds.) Evolving Knowledge in Natural Science and Artificial Intelligence. London: Pitman. AN ELEMENTARY INTRODUCTION TO STRUCTURE LEARNING VIA CONDITIONAL INDEPENDENCE.
P. Spirtes, C. Glymour and R. Scheines (1993) Causation, Prediction and Search: Lecture Notes in Statistics 81.
J. Suzuki (1996) Learning Bayesian Belief Networks Based on the Minimum Description Length Principle, in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning (pp. 462-470). San Francisco: Morgan Kaufmann.
T.S. Verma and J. Pearl (1991) Equivalence and Synthesis of Causal Models, in P. Bonissone, M. Henrion, L. Kanal and J.F. Lemmer (eds.) Uncertainty in Artificial Intelligence 6 (pp. 255-268). Elsevier. THE GRAPHICAL CRITERION FOR STATISTICAL EQUIVALENCE.
C.S. Wallace and D. Boulton (1968) An information measure for classification, Computer Jrn, 11, 185-194.
C.S. Wallace and P.R. Freeman (1987) Estimation and inference by compact coding, Jrn Royal Stat Soc (Series B), 49, 240-252.
C.S. Wallace and K.B. Korb (1999) Learning Linear Causal Models by MML Sampling, in A. Gammerman (ed.) Causal Models and Intelligent Data Management. Springer Verlag. SAMPLING APPROACH TO LEARNING CAUSAL MODELS; DISCUSSION OF STRUCTURE PRIORS.
C.S. Wallace, K.B. Korb and H. Dai (1996) Causal Discovery via MML, in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning (pp. 516-524). San Francisco: Morgan Kaufmann. INTRODUCES AN MML METRIC FOR CAUSAL MODELS.
S. Wright (1921) Correlation and Causation, Jrn Agricultural Research, 20, 557-585.
S. Wright (1934) The Method of Path Coefficients, Annals of Mathematical Statistics, 5, 161-215.
Current Research