ºÝºÝߣ

ºÝºÝߣShare a Scribd company logo
Stefano Bragaglia
DEIS ¨C University of Bologna
stefano.bragaglia@unibo.it

Fabrizio Riguzzi
ENDIF ¨C University of Ferrara
fabrizio.riguzzi@unife.it
?

?

Increasing interest in combining
logic and probability

?

C1: sneezing(X): 0.7 v
not_sneezing(X): 0.3 ¡û
flu(X).
C2: sneezing(X): 0.6 v
not_sneezing(X): 0.4 ¡û
hayfever(X).
C3: flu(andrew).
C4: flu(david).
C5: hayfever(david).
C6: hayfever(robert).

Many formalism:
? Markov Logic Networks
? ProbLog
? Logic Programs with Annotated

Disjunctions (LPADs)
? etc.
?

LPADs can easily express:

?

Queries:
?- sneezing(andrew).
?- sneezing(robert).
?- sneezing(david).

? cause-effect relationships among

events
? possible effects of a single cause
? contemporary contribution of
more causes to the same effect

Example: simple medical diagnosis

?

0.7
0.6
0.88

Probability:
Instances where query is true: 3 out of 4
P = 0.7¡Á0.6 + 0.7¡Á0.4 + 0.3¡Á0.6 =

0.88
?

Syntax
?
?

?

?

?

Program: set of disjunctive clauses
Head: set of mutually exclusive and
exhaustive logical atoms annotated with
probability values between 0 and 1
whose sum is 1
Body: event whose effects are
represented by the atoms of the
corresponding head

1.
2.

Semantic
?
?
?

Inference: in 2 steps (by cplint)
Explanations: a meta-interpreter
performs resolution keeping current
set of choices
Probability of a query:
a dynamic algorithm converts
explanations into a Binary Decision
Diagram (BDD) and traverse it; this
makes explanations mutually
exclusive
?- sneezing(david).
(C1;{X1/david};0)
(C2;{X2/david};0)
:- flu(david).
:- hayfever(david).

Instance: normal logic program obtained
by choosing a logical atom from the
head of each grounding of every clause
Probability of an instance: product of
the probability values of all the atoms
chosen for that instance
Probability of a query: sum of the
probabilities of each instance where the
query is true

?

?
0.4
0.3

0

XC2

XC1
0.7

0.6

1

P = 0.3¡Á0.6 + 0.7 = 0.88

?

Not possible in some domains
? Best K
? Deterministic algorithm
based on branch and bound
technique
? Explanations built
incrementally by keeping
only the k most probable
ones (k = 64)
? Probability of the query
computed on chosen
explanations trough BDD
(lower bound)
? Note: limiting explanations
allows better control on
complexity

? Monte Carlo
? Stochastic algorithm based
on Monte Carlo approach
? Instances sampled
repeatedly from LPAD by
considering only relevant
clauses
? Head atoms of resolving
clauses chosen stochastically
by a meta-interpreter
? Note: the probability of the
query is the fraction of
sampled instances where the
query is true (no BDD is
needed)
?

Real datasets: 10 incremental
samples of a graph describing
biological entities responsible of
Alzheimer¡¯s disease

Biological Graph
HGNC_1505

PubMed_12653
0.643

0.493

0.665
EntrezGene_11803

PubMed_15529

0.523

?

?

Artificial datasets: 3
procedurally built graphs of
increasing complexity
Queries: compute the probability
that a path exists between two
given nodes of each graph

0.567
EntrezProtein_7891032

Tests: performed on Linux
machines equipped with Intel
Core 2 Duo E6550 (2333 MHz) and
4 Gb RAM with a 24 hours time
limit

PubMed_1741124

0.602
0.621
PubMed_157782

0.713

EntrezProtein_8147603

Artificial Graphs
0.3
0.3

0.3

0.3

0.3
0.3

0.3

0.3

0.3

0.3
0.3
0.3

¡°lanes¡±

0.3

0.3

0.3

0.3

?

HGNC_620

0.3

0.3

0.3

0.3

0.3

0.3
0.3

¡°branches¡±

¡°parachutes¡±
12
10
8
6
4
2
0

Biological Graphs:
CPU times averaged on successes
100000

Standard
Best K
Monte Carlo

Size (edges)

Time (log s)

Answers

Biological Graphs:
number of successes

1000
Standard
10

Best K
Monte Carlo

0,1
0,001

Size (edges)
Lanes Graphs: CPU times

Parachutes Graphs: CPU times
100

Standard
Best K
Monte Carlo

0,01

Monte Carlo

0,000001

Size (steps)

Size (steps)

Branches Graphs: CPU times

Time (log s)

100
1
1

3

5

7

9

11 13 15 17 19

0,01

Standard
Best K
Monte Carlo

0,0001
0,000001

Size (steps)

Standard
Best K

0,0001

0,0001
0,000001

1
1
23
45
67
89
111
133
155
177
199
221
243
265
287

0,01

Time (log s)

1
1
23
45
67
89
111
133
155
177
199
221
243
265
287

Time (log s)

100

More Related Content

Approximate Inference for Logic Programs with Annotated Disjunctions (RCRA 2009)

  • 1. Stefano Bragaglia DEIS ¨C University of Bologna stefano.bragaglia@unibo.it Fabrizio Riguzzi ENDIF ¨C University of Ferrara fabrizio.riguzzi@unife.it
  • 2. ? ? Increasing interest in combining logic and probability ? C1: sneezing(X): 0.7 v not_sneezing(X): 0.3 ¡û flu(X). C2: sneezing(X): 0.6 v not_sneezing(X): 0.4 ¡û hayfever(X). C3: flu(andrew). C4: flu(david). C5: hayfever(david). C6: hayfever(robert). Many formalism: ? Markov Logic Networks ? ProbLog ? Logic Programs with Annotated Disjunctions (LPADs) ? etc. ? LPADs can easily express: ? Queries: ?- sneezing(andrew). ?- sneezing(robert). ?- sneezing(david). ? cause-effect relationships among events ? possible effects of a single cause ? contemporary contribution of more causes to the same effect Example: simple medical diagnosis ? 0.7 0.6 0.88 Probability: Instances where query is true: 3 out of 4 P = 0.7¡Á0.6 + 0.7¡Á0.4 + 0.3¡Á0.6 = 0.88
  • 3. ? Syntax ? ? ? ? ? Program: set of disjunctive clauses Head: set of mutually exclusive and exhaustive logical atoms annotated with probability values between 0 and 1 whose sum is 1 Body: event whose effects are represented by the atoms of the corresponding head 1. 2. Semantic ? ? ? Inference: in 2 steps (by cplint) Explanations: a meta-interpreter performs resolution keeping current set of choices Probability of a query: a dynamic algorithm converts explanations into a Binary Decision Diagram (BDD) and traverse it; this makes explanations mutually exclusive ?- sneezing(david). (C1;{X1/david};0) (C2;{X2/david};0) :- flu(david). :- hayfever(david). Instance: normal logic program obtained by choosing a logical atom from the head of each grounding of every clause Probability of an instance: product of the probability values of all the atoms chosen for that instance Probability of a query: sum of the probabilities of each instance where the query is true ? ? 0.4 0.3 0 XC2 XC1 0.7 0.6 1 P = 0.3¡Á0.6 + 0.7 = 0.88 ? Not possible in some domains
  • 4. ? Best K ? Deterministic algorithm based on branch and bound technique ? Explanations built incrementally by keeping only the k most probable ones (k = 64) ? Probability of the query computed on chosen explanations trough BDD (lower bound) ? Note: limiting explanations allows better control on complexity ? Monte Carlo ? Stochastic algorithm based on Monte Carlo approach ? Instances sampled repeatedly from LPAD by considering only relevant clauses ? Head atoms of resolving clauses chosen stochastically by a meta-interpreter ? Note: the probability of the query is the fraction of sampled instances where the query is true (no BDD is needed)
  • 5. ? Real datasets: 10 incremental samples of a graph describing biological entities responsible of Alzheimer¡¯s disease Biological Graph HGNC_1505 PubMed_12653 0.643 0.493 0.665 EntrezGene_11803 PubMed_15529 0.523 ? ? Artificial datasets: 3 procedurally built graphs of increasing complexity Queries: compute the probability that a path exists between two given nodes of each graph 0.567 EntrezProtein_7891032 Tests: performed on Linux machines equipped with Intel Core 2 Duo E6550 (2333 MHz) and 4 Gb RAM with a 24 hours time limit PubMed_1741124 0.602 0.621 PubMed_157782 0.713 EntrezProtein_8147603 Artificial Graphs 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 ¡°lanes¡± 0.3 0.3 0.3 0.3 ? HGNC_620 0.3 0.3 0.3 0.3 0.3 0.3 0.3 ¡°branches¡± ¡°parachutes¡±
  • 6. 12 10 8 6 4 2 0 Biological Graphs: CPU times averaged on successes 100000 Standard Best K Monte Carlo Size (edges) Time (log s) Answers Biological Graphs: number of successes 1000 Standard 10 Best K Monte Carlo 0,1 0,001 Size (edges)
  • 7. Lanes Graphs: CPU times Parachutes Graphs: CPU times 100 Standard Best K Monte Carlo 0,01 Monte Carlo 0,000001 Size (steps) Size (steps) Branches Graphs: CPU times Time (log s) 100 1 1 3 5 7 9 11 13 15 17 19 0,01 Standard Best K Monte Carlo 0,0001 0,000001 Size (steps) Standard Best K 0,0001 0,0001 0,000001 1 1 23 45 67 89 111 133 155 177 199 221 243 265 287 0,01 Time (log s) 1 1 23 45 67 89 111 133 155 177 199 221 243 265 287 Time (log s) 100