VIS 2013 Presentation
Paper is available here: http://www.oerc.ox.ac.uk/personal-pages/emaguire/AutoMacron.pdf
Code is available here: http://github.com/isa-tools/automacron
1. Visual Compression of Workflow Visualizations with Automated Detection of Macro Motifs
Eamonn Maguire, Philippe Rocca-Serra, Susanna-Assunta Sansone, Jim Davies and Min Chen
University of Oxford e-Research Centre
University of Oxford Department of Computer Science
VIS 2013, 13th-18th October 2013
2. Some terminology
Workflow
Literally, a flow of work showing the processes enacted from start to finish in, say, business processes, software execution, analysis procedures or, in our case, biological experiments. Workflows are used to enable reproducibility, e.g. VisTrails in our VIS community (40,000 downloads).
Motif
A commonly observed subgraph. Motifs are widely used in biology (protein-protein interaction and transcription/regulation networks), in chemistry, and even in visualization (e.g. VisComplete).
Macro
A single instruction that expands automatically into a more complex set of instructions.
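To make the macro definition concrete, here is a minimal sketch, assuming a workflow is a plain list of step names and a macro is an entry in a hypothetical `MACROS` table mapping one name to several steps. Neither the table nor `expand` comes from the tool; they only illustrate "one instruction expanding into many".

```python
from typing import Dict, List

# Hypothetical macro table (illustrative only): one macro name stands for
# a longer run of concrete workflow steps.
MACROS: Dict[str, List[str]] = {
    "LABELING": ["Labeling Protocol", "Labeled Extract"],
}


def expand(workflow: List[str], macros: Dict[str, List[str]]) -> List[str]:
    """Expand every macro step into its steps; ordinary steps pass through.

    Expansion is recursive so a macro may contain other macros; a macro that
    (directly or indirectly) contains itself would recurse forever.
    """
    expanded: List[str] = []
    for step in workflow:
        if step in macros:
            expanded.extend(expand(macros[step], macros))
        else:
            expanded.append(step)
    return expanded


if __name__ == "__main__":
    print(expand(["Sample Name", "LABELING", "Hybridisation Protocol"], MACROS))
```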
7. Blockades
Current Motif Detection Algorithm Limitations
No semantics
Limited motif sizes (max 10 nodes)
Deciding what should be a Macro
Macros in electronic circuit diagrams are the product of years of refinement. Macros in biological workflows, for instance, are new... how do we determine what should be a macro?
9. Extension on Previous Work
Taxonomy-based Glyph Design: visualizing (ISA-based) workflows of biological experiments
Maguire et al., IEEE TVCG, 2012
10. A Typical Biological Experiment
Hypothesis → Experiment → Analysis → Results & Paper
11. Representing an Experiment - Workflows!
Source Name → Sampling Protocol → Sample Name → Labeling Protocol (Chemical Label) → Labeled Extract → Hybridisation Protocol → Assay Name → Scanning Protocol → Raw Data File → Feature Extraction Protocol → Processed Data File
Workflows describe the flow of work from a biological sample to the data file. They vary between technologies, but many steps are shared. For example, the labeling step is very common in DNA microarray experiments.
Reproducibility!
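The microarray workflow above can be held as a directed graph. The sketch below is an illustration, not the ISA tools' data model: it assumes each step is a node, treats the chemical label as a parameter of the labeling protocol (so it is not a node), and uses Python's standard `graphlib` (3.9+) to confirm the workflow is acyclic and recover its order. `STEPS` and `chain_to_graph` are hypothetical names.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Nodes of the microarray workflow above, in order. The chemical label is
# treated as a parameter of the labeling protocol, so it is not a node here.
STEPS: List[str] = [
    "Source Name", "Sampling Protocol", "Sample Name", "Labeling Protocol",
    "Labeled Extract", "Hybridisation Protocol", "Assay Name",
    "Scanning Protocol", "Raw Data File", "Feature Extraction Protocol",
    "Processed Data File",
]


def chain_to_graph(steps: List[str]) -> Dict[str, Set[str]]:
    """Build a graph mapping each node to its predecessors (graphlib's format)."""
    graph: Dict[str, Set[str]] = {steps[0]: set()}
    for prev, nxt in zip(steps, steps[1:]):
        graph.setdefault(nxt, set()).add(prev)
    return graph


graph = chain_to_graph(STEPS)
# static_order() raises graphlib.CycleError if the workflow contains a cycle.
order = list(TopologicalSorter(graph).static_order())
print(" -> ".join(order))
```

A graph rather than a flat list is used so the same structure can also hold workflows that branch and merge.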
12. Our Process
[Figure: overview of our process. A biological workflow repository feeds the motif extraction algorithm; the extracted motifs, each described by occurrence, workflow count and compression, go to a ranking algorithm; a domain expert then selects macros via a UI (e.g. a Branch & Merge motif); the selected macros go through glyph design and macro annotation, and are finally inserted into the graph.]
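The last step, macro insertion in the graph, is what produces the visual compression. A minimal sketch for the simplest case, assuming a linear workflow and a linear motif: every non-overlapping occurrence is collapsed into one macro node, and "nodes saved" is used as a stand-in compression measure. This measure and the `collapse` helper are assumptions for illustration, not the metric or code from the paper.

```python
from typing import List, Sequence, Tuple


def collapse(workflow: Sequence[str], motif: Sequence[str],
             macro: str) -> Tuple[List[str], int]:
    """Replace each non-overlapping, left-to-right run equal to `motif`
    with a single `macro` node.

    Returns the compressed workflow and how many nodes were saved.
    """
    out: List[str] = []
    k = len(motif)
    i = 0
    while i < len(workflow):
        if k > 0 and list(workflow[i:i + k]) == list(motif):
            out.append(macro)  # one macro node now stands for k steps
            i += k
        else:
            out.append(workflow[i])
            i += 1
    return out, len(workflow) - len(out)


wf = ["Sample Name", "Labeling Protocol", "Labeled Extract",
      "Hybridisation Protocol",
      "Sample Name", "Labeling Protocol", "Labeled Extract"]
compressed, saved = collapse(
    wf, ["Sample Name", "Labeling Protocol", "Labeled Extract"], "LABELING")
print(compressed, saved)
```

Real workflows branch and merge, so an actual implementation would match subgraphs rather than runs in a list.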
14. Workflow Repository
9,670 Biological Experiment Workflows
Why such a large number?
With this many workflows, we can statistically suggest to users which motifs could become macros, based on a number of metrics (detailed later). We can also robustly test our algorithm's performance across a huge cross-section of experiments.
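A large repository allows two different counts per motif: how often it occurs in total, and in how many workflows it appears at all. The sketch below assumes each workflow has already been reduced to the list of motif instances found in it, with a motif written as a tuple of node labels. `motif_stats` is a hypothetical helper, not part of the released code.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Motif = Tuple[str, ...]  # a motif written as its sequence of node labels


def motif_stats(corpus: Iterable[List[Motif]]) -> Tuple[Counter, Counter]:
    """Return (occurrences, workflow_presence) for every motif in a corpus.

    Each workflow is given as the list of motif instances found in it. A motif
    seen three times in one workflow adds 3 to its occurrence count but only 1
    to its workflow-presence count.
    """
    occurrences: Counter = Counter()
    presence: Counter = Counter()
    for motifs_in_workflow in corpus:
        occurrences.update(motifs_in_workflow)
        presence.update(set(motifs_in_workflow))
    return occurrences, presence


corpus = [
    [("Sample", "Labeling"), ("Sample", "Labeling"), ("Labeling", "Extract")],
    [("Sample", "Labeling")],
    [("Labeling", "Extract")],
]
occ, pres = motif_stats(corpus)
print(occ[("Sample", "Labeling")], pres[("Sample", "Labeling")])
```

Keeping both counts separates a motif repeated many times in a few workflows from one that is spread across the whole repository.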
17. The Current Weaknesses
FANMOD, mFinder, etc.
No semantics (edge or node)
Small node limit, normally <10
Imagine n-grams with no information other than topology: e.g. bigrams of DNA 'motifs' where, instead of A-T, T-C, T-G, we only see x-x, x-x, x-x.
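The n-gram analogy can be shown directly. This sketch counts DNA bigrams twice: once with their labels, and once after every symbol has been replaced by `x`, which mimics a topology-only motif finder. The sequence and helper names are made up for illustration.

```python
from collections import Counter


def bigrams(seq: str) -> Counter:
    """Count overlapping bigrams (e.g. 'AT', 'TC') in a sequence."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def topology_only(seq: str) -> Counter:
    """The same bigrams once node labels are discarded: every symbol is 'x'."""
    return bigrams("x" * len(seq))


seq = "ATCGATCG"
print(bigrams(seq))        # four distinct, meaningful bigrams
print(topology_only(seq))  # everything collapses into a single 'xx' bucket
```

Once the labels are gone, every bigram looks the same, so no function can be read from the counts.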
18. The Problem... Current Motif Extraction Algorithms
Unable to infer function
Unable to produce a macro
"What's up?"
"We can't infer function from these results."
"Ah, and you can't have macros without function..."
"Exactly!"
19. Solution
[Figure: the motif extraction algorithm as a state machine with states s0 to s4 and transition cases A to H. Legend: a normal state, with a "legal" ...; a holding state, with a pseudo-...; a star state; a transition that generates a ...; a transition that does not generate a ...]
More detail about each individual case (A-H) is available in the paper.
22. Resulting In...
Running our algorithm over 9,670 workflows, we retrieved ~12,000 motifs up to depth 12.
Semantically aware
Limited by depth, not node count: we have motifs with more than 80 nodes
Essentially, more complex, topology-sensitive n-grams
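To make "semantically aware, limited by depth rather than node count" concrete, here is a deliberately simplified sketch. It is not the authors' state-machine algorithm (whose cases A-H are in the paper). It enumerates only linear, labeled paths from every node, and it assumes depth means the number of nodes along a path. `labeled_paths` and the example graph are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # node id -> successor node ids


def labeled_paths(graph: Graph, labels: Dict[str, str],
                  max_depth: int) -> Counter:
    """Count the label sequences of all downward paths of 2..max_depth nodes.

    Search is bounded by depth rather than by node count, and node labels are
    kept, so ('Sample', 'Labeling') and ('Sample', 'Extraction') are different
    motifs. Only linear paths are enumerated; branch/merge motifs are not.
    """
    counts: Counter = Counter()

    def walk(node: str, path: Tuple[str, ...]) -> None:
        path = path + (labels[node],)
        if len(path) >= 2:
            counts[path] += 1
        if len(path) < max_depth:
            for nxt in graph.get(node, []):
                walk(nxt, path)

    for start in labels:  # every node is a potential motif root
        walk(start, ())
    return counts


graph = {"n1": ["n2"], "n2": ["n3", "n4"]}
labels = {"n1": "Sample", "n2": "Labeling", "n3": "Extract", "n4": "Extract"}
for motif, n in sorted(labeled_paths(graph, labels, 3).items()):
    print(n, " -> ".join(motif))
```

Because the bound is on depth, a motif can cover many nodes whenever a workflow branches widely, which matches the > 80-node motifs noted above.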
28. Ranking Algorithm
M1 - Occurrences in data repository
M2 - Workflow presence
M3 - Compression potential
For the raw indicators A_t, A_w and A_c, we map each to the fixed range [-1, 1] using a linear mapping based on the min-max range of that indicator, yielding three normalized metrics M1, M2 and M3.
No algorithm would be complete without a weighting element, so each metric can be weighted. We use a default weight of 1.
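The min-max mapping onto [-1, 1] is M = 2 (A - min A) / (max A - min A) - 1. A minimal sketch follows. The slides only say each metric can be weighted with a default weight of 1, so combining M1, M2 and M3 as a weighted sum is an assumption, and `to_unit_range` and `scores` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def to_unit_range(values: Sequence[float]) -> List[float]:
    """Linearly map raw indicator values onto [-1, 1] using their min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # every motif has the same value; map all to the midpoint
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def scores(a_t: Sequence[float], a_w: Sequence[float], a_c: Sequence[float],
           weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[float]:
    """Per-motif weighted sum of the normalized metrics M1, M2 and M3.

    Combining by a weighted sum is an assumption made for illustration.
    """
    m1, m2, m3 = to_unit_range(a_t), to_unit_range(a_w), to_unit_range(a_c)
    w1, w2, w3 = weights
    return [w1 * x + w2 * y + w3 * z for x, y, z in zip(m1, m2, m3)]


# Three hypothetical motifs: raw occurrences, workflow counts, compression.
print(scores([10, 50, 90], [5, 20, 35], [3.0, 1.0, 2.0]))
```

Normalizing first matters because raw occurrence counts can be far larger than compression values. Without it, one indicator would dominate the sum regardless of the weights.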
34. Ranking Algorithm
[Figure: the macro selection interface. Motifs are arranged by depth and can be filtered by min/max depth and by pattern presence (linear, branching and merging). Each motif shows its subgraph, three glyph representations, three normalized metrics (occurrences, workflow presence, compression potential), a score, a downgrade icon and an adjusted score. The screenshot shows depth 6 motifs, with a magnified view in B and a detailed popup of the selected motif in D.]
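The depth and pattern filters can be sketched as follows. This assumes a motif is "branching" if some node has more than one successor, "merging" if some node has more than one predecessor, and "linear" if neither holds. The interface's exact rules are not given in the slides, and the `Motif` class and `select` function are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Motif:
    depth: int
    edges: List[Tuple[str, str]]  # (source, target) node pairs
    score: float = 0.0


def patterns(motif: Motif) -> Set[str]:
    """Classify a motif subgraph as 'branching', 'merging', both, or 'linear'."""
    out_deg = Counter(src for src, _ in motif.edges)
    in_deg = Counter(dst for _, dst in motif.edges)
    found: Set[str] = set()
    if any(d > 1 for d in out_deg.values()):
        found.add("branching")  # some node has several successors
    if any(d > 1 for d in in_deg.values()):
        found.add("merging")    # some node has several predecessors
    return found or {"linear"}


def select(motifs: List[Motif], min_depth: int, max_depth: int,
           pattern: str) -> List[Motif]:
    """Keep motifs in the depth range that show `pattern`; best score first."""
    hits = [m for m in motifs
            if min_depth <= m.depth <= max_depth and pattern in patterns(m)]
    return sorted(hits, key=lambda m: m.score, reverse=True)
```

A Branch & Merge motif (a to b, a to c, b to d, c to d) would match both the "branching" and "merging" filters.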
58. Overcoming the blockades
Current Motif Detection Algorithm Limitations: no semantics; limited motif sizes (max 10)
→ New semantically enabled algorithm
Deciding what should be a Macro: macros in electronic circuit diagrams are the product of years of refinement, while macros in biological workflows are new... how do we determine what should be a macro?
→ Statistically informed selection from analysis of a large corpus of workflows
59. Summary
New semantically enabled motif discovery algorithm
Statistically informed selection of macro candidates for use in biological workflow visualizations
Automated macro image generation, inferred from algorithm states
Integration of the final selections, and the ability to compress workflows, in the ISAcreator tool for curators and biologists alike
Open source - we want you to extend it!
60. And yes.
Co-authors
Philippe Rocca-Serra
Susanna-Assunta Sansone
Jim Davies
Min Chen
It is open source!
Bye.
You can download this software now!
Also, thanks to Alejandra Gonzalez Beltran for many useful discussions.
github.com/isa-tools/automacron