The Chow-Liu algorithm constructs a maximum-weight spanning tree to model the dependencies between discrete variables. It runs in O(n² log n) time and finds the tree structure whose associated distribution best approximates the probability distribution of the data, as measured by the Kullback-Leibler divergence. The algorithm takes as input a dataset and measures of association between variable pairs, and outputs a tree whose nodes are variables and whose edges indicate direct dependencies.
2. What is the Chow-Liu Algorithm?
Introduced by Chow and Liu (1968).
Also known as the maximum-weight spanning tree (MWST) algorithm; it is typically implemented using Kruskal's algorithm.
Running time complexity is O(n² log n), where n is the number of variables.
The probability distribution associated with the tree constructed by the Chow-Liu algorithm is the one closest to the probability distribution associated with the data, as measured by the Kullback-Leibler divergence (Pearl and Dechter 1989).
One of its many uses is constructing graphical relationships between variables.
Assumptions: no missing data; variables are discrete; data is independently and identically distributed.
Copyright 2009 by Jee Vang
3. Pseudo-code of Chow-Liu Algorithm
Denote the following:
A(Xi, Xj): a measure of association between two variables, Xi and Xj
U = {X1, X2, …, Xn}: a set of n variables
D: a database of cases of the variables in U
T: a tree whose nodes have a 1-to-1 correspondence to the variables in U
Inputs to the Chow-Liu algorithm: A, U, D
Output of the Chow-Liu algorithm: T
4. Pseudo-code of Chow-Liu Algorithm, Procedure
Begin
  i <- 1                              // a counter, set to one
  L = {L1, L2, …, L_n(n-1)/2} <- all pairwise associations
                                      // L is a list of all pairwise associations in descending order; no self-pairing
  P <- L_i                            // P is a pointer to the first pair of variables in L
  construct T                         // create a disconnected tree with nodes corresponding to the variables in U
  Do until n-1 edges have been added:
    If a path does not exist between the pair of variables in P, add an undirected arc between them
    i <- i + 1                        // advance the counter
    P <- L_i                          // set P to the next pair of variables in L
End
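The procedure above can be sketched in Python. This is a minimal sketch, not the authors' implementation: it replaces the literal "does a path exist?" test with a union-find (disjoint-set) structure, the standard component check in Kruskal-style spanning-tree construction. The function and parameter names are my own.

```python
from itertools import combinations

def chow_liu_tree(variables, assoc):
    """Build a maximum-weight spanning tree over `variables`.

    `assoc` maps frozenset({Xi, Xj}) -> association score A(Xi, Xj).
    Returns the n-1 undirected edges as (Xi, Xj) tuples, in the
    order they were added.
    """
    # L: all pairwise associations, sorted descendingly (no self-pairing)
    pairs = sorted(combinations(variables, 2),
                   key=lambda p: assoc[frozenset(p)], reverse=True)

    # Union-find stands in for the pseudo-code's path-existence test:
    # two nodes share a root iff a path already connects them.
    parent = {v: v for v in variables}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    edges = []
    for xi, xj in pairs:
        ri, rj = find(xi), find(xj)
        if ri != rj:              # no path exists between xi and xj yet
            parent[ri] = rj       # merge the two components
            edges.append((xi, xj))
        if len(edges) == len(variables) - 1:
            break                 # n-1 edges added; tree is complete
    return edges
```

With the association scores from the example on the later slides, this returns the four edges X2–X3, X1–X5, X1–X2, and X1–X4.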
5. Measure of Association
Mutual information was originally used as the measure of association (Chow and Liu 1968).
However, any measure of association satisfying the following property may be used: given three variables Xi, Xj, and Xk such that Xi and Xk are conditionally independent given Xj, written I(Xi, Xj, Xk), any measure of association A(Xi, Xj) satisfying min(A(Xi, Xj), A(Xj, Xk)) > A(Xi, Xk) can be used for the Chow-Liu algorithm (Acid and de Campos 1994).
Copyright 2009 by Jee Vang
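Mutual information between two discrete variables can be estimated from data. A minimal sketch, assuming a plug-in estimator where empirical frequencies stand in for the true probabilities (the function name is my own):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired samples of two discrete variables.

    Plug-in estimate: I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x)p(y))),
    with all probabilities taken as empirical frequencies.
    """
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi
```

For two identical binary variables this yields 1 bit; for two independent uniform binary variables it yields 0.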
6. Example: Inputs
A(Xi, Xj): mutual information between two variables, Xi and Xj
U = {X1, X2, X3, X4, X5}: 5 variables
Domains of X1 through X5 are {true, false}
D: database of cases of X1 through X5:

  X1     X2     X3     X4     X5
  true   false  false  false  false
  false  true   true   false  true
  false  true   false  false  true
  …      …      …      …      …

Copyright 2009 by Jee Vang
7. Example: Procedure, Pairwise Associations
Compute pairwise associations:
  L1,2 = 0.55   L1,3 = 0.34   L1,4 = 0.11   L1,5 = 0.59   L2,3 = 0.68
  L2,4 = 0.03   L2,5 = 0.25   L3,4 = 0.01   L3,5 = 0.22   L4,5 = 0.10
Sort pairwise associations descendingly into list L:
  L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
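The sorting step on this slide can be reproduced directly. A small sketch using the slide's scores (variable names as on the slide):

```python
# Pairwise association scores from the slide, keyed by variable pair.
weights = {
    ("X1", "X2"): 0.55, ("X1", "X3"): 0.34, ("X1", "X4"): 0.11,
    ("X1", "X5"): 0.59, ("X2", "X3"): 0.68, ("X2", "X4"): 0.03,
    ("X2", "X5"): 0.25, ("X3", "X4"): 0.01, ("X3", "X5"): 0.22,
    ("X4", "X5"): 0.10,
}

# Sort the pairs descendingly by score, as the slide does for list L.
L = sorted(weights, key=weights.get, reverse=True)
# The first pairs are (X2,X3), (X1,X5), (X1,X2), ..., matching the slide.
```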
8. Example: Procedure, Tree Construction, Empty Tree
L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
Construct disconnected tree, T, with 5 nodes corresponding to the variables in U.
[Diagram: five disconnected nodes, X1 through X5]
9. Example: Procedure, Tree Construction (cont.)
L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
Add arc between X2 and X3 in T: X2–X3
[Diagram: edge X2–X3; nodes X1, X4, X5 still disconnected]
10. Example: Procedure, Tree Construction (cont.)
L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
Add arc between X1 and X5 in T: X1–X5
[Diagram: edges X2–X3 and X1–X5; node X4 still disconnected]
11. Example: Procedure, Tree Construction (cont.)
L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
Add arc between X1 and X2 in T: X1–X2
[Diagram: edges X2–X3, X1–X5, X1–X2; node X4 still disconnected]
12. Example: Procedure, Tree Construction (cont.)
L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
Skip adding arc between X1 and X3 // path exists
Skip adding arc between X2 and X5 // path exists
Skip adding arc between X3 and X5 // path exists
[Diagram unchanged: edges X2–X3, X1–X5, X1–X2; node X4 still disconnected]
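The "path exists" test that causes these pairs to be skipped can be carried out with a plain breadth-first search over the edges added so far, exactly as the pseudo-code states it. A minimal sketch (function name is my own):

```python
from collections import deque

def path_exists(edges, a, b):
    """Return True if a path connects a and b in the undirected graph `edges`."""
    # Build an adjacency list from the undirected edges.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Breadth-first search from a, looking for b.
    seen, queue = {a}, deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# After the first three arcs of the example, X1 and X3 are already
# connected (via X2), so the arc X1-X3 is skipped; X4 is still free.
edges = [("X2", "X3"), ("X1", "X5"), ("X1", "X2")]
```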
13. Example: Procedure, Tree Construction (cont.)
L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
Add arc between X1 and X4 in T: X1–X4
[Diagram: edges X2–X3, X1–X5, X1–X2, X1–X4]
14. Example: Procedure, Tree Construction (cont.)
L = {L2,3, L1,5, L1,2, L1,3, L2,5, L3,5, L1,4, L4,5, L2,4, L3,4}
Final tree constructed.
[Diagram: final tree with edges X2–X3, X1–X2, X1–X5, X1–X4]
15. References
Chow, C.K., and Liu, C.N. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462-467, 1968.
Pearl, J., and Dechter, R. Learning Structure from Data: A Survey. UCLA Cognitive Systems Laboratory, Technical Report CSD-910048 (R-132), June 1989.
Acid, S., and de Campos, L.M. Approximation of causal networks by polytrees. Proceedings of Information Processing and Management of Uncertainty in Knowledge-Based Systems, 1994.