Visual Compression of Workflow Visualizations with
Automated Detection of Macro Motifs

Eamonn Maguire, Philippe Rocca-Serra, Susanna-Assunta Sansone, Jim Davies
and Min Chen
University of Oxford e-Research Centre
University of Oxford Department of Computer Science

VIS 2013, 13th-18th October 2013
Some terminology

Workflow
Literally a flow of work showing the processes enacted from start to finish in, say, business processes, software execution, analysis procedures, or in our case, biological experiments.
They are used to enable reproducibility.
e.g. VisTrails in our VIS community - 40,000 downloads

Motif
Commonly observed subgraphs.
Very commonly used in: biology (protein-protein interaction, transcription/regulation networks); chemistry; and even visualization (e.g. VisComplete)

Macro
A single instruction that expands automatically into a more complex set of instructions.

Roadmap

Workflow
Automatically Detect Motifs
Substitute motifs with 'macros'
Blockades

Current Motif Detection Algorithm Limitations
No semantics
Limited motif sizes (Max 10)

Deciding what should be a Macro
Macros in electronic circuit diagrams are the product of years of refinement.
Macros in biological workflows, for instance, are new... how do we determine what should be a macro?
Example case

Biology

Extension on Previous Work

Taxonomy-based Glyph Design
Visualizing (ISA based) workflows of
biological experiments
Maguire et al, 2012
IEEE TVCG

A Typical Biological Experiment

Hypothesis
Experiment
Analysis
Results & Paper
Representing an Experiment - Workflows!

Describe the flow of work from a biological sample to the data file.
Workflow varies between technologies, but there is a large commonality in steps.
For example, the labeling step is very common in DNA microarray experiments.

[Figure: example workflow. Nodes, in the order shown: Source name, Sampling Protocol, Sample name, Chemical Label, Labeling Protocol, Labeled Extract, Hybridisation Protocol, Assay Name, Scanning Protocol, Raw Data File, Feature Extraction Protocol, Processed Data File]

Reproducibility!
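As a rough illustration of the idea that an experiment can be described as a workflow, the sketch below stores the steps named on this slide as a small directed graph. The node labels come from the slide; linking them as a simple chain in the listed order is an assumption made for illustration, and `build_workflow` and `path_from` are hypothetical helper names, not part of any ISA tool.

```python
from collections import defaultdict

# Node labels as listed on the slide. Chaining them in this order is an
# assumption; the real workflow may branch (e.g. the chemical label may be
# an input to the labeling protocol rather than a step of its own).
STEPS = [
    "Source name", "Sampling Protocol", "Sample name", "Chemical Label",
    "Labeling Protocol", "Labeled Extract", "Hybridisation Protocol",
    "Assay Name", "Scanning Protocol", "Raw Data File",
    "Feature Extraction Protocol", "Processed Data File",
]


def build_workflow(steps):
    """Return an adjacency list linking each step to the one after it."""
    graph = defaultdict(list)
    for src, dst in zip(steps, steps[1:]):
        graph[src].append(dst)
    return dict(graph)


def path_from(graph, start):
    """Follow the first outgoing edge of each node until a sink is reached."""
    path = [start]
    while path[-1] in graph:
        path.append(graph[path[-1]][0])
    return path


workflow = build_workflow(STEPS)
print(" -> ".join(path_from(workflow, "Source name")))
```

Any motif detection over such graphs can then work on the node labels as well as on the edges, which is the point the following slides build on.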
Our Process

[Figure: process pipeline. Stages as labelled: Biological Workflow Repository; Motif Extraction Algorithm; Motifs; Ranking Algorithm; Macro Selection via UI; Selected Macros; Glyph Design; Macro Annotation; Macro Insertion in Graph. Domain experts are involved along the way. Candidate motifs (e.g. "Branch & Merge") are listed with Occurrence, Workflows and Compression values and an overall score (e.g. 2.87, 2.4, -2.43).]
Workflow Repository

9,670 Biological Experiment Workflows
Why such a large number?
We can make statistically grounded suggestions to users about which motifs can be macros, based on a number of metrics (detailed later).
We can also robustly test our algorithm's performance across a huge cross section of experiments.

Motif Extraction Algorithm
The Current Weaknesses
FANMOD, mFinder etc.
No semantics (edge or node)
Small node limit, normally < 10
Imagine n-grams with no information other than topology,
e.g. bi-grams of DNA 'motifs' where, instead of A-T, T-C, T-G, we only see x-x, x-x, x-x
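The n-gram analogy can be made concrete with a tiny sketch. It counts bi-grams in a toy sequence twice: once with the labels intact and once with every label replaced by the same placeholder, which is roughly what a topology-only motif finder sees. The sequence and the helper names `bigrams` and `strip_labels` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def bigrams(sequence):
    """Count adjacent pairs in a sequence of labels."""
    return Counter(zip(sequence, sequence[1:]))


def strip_labels(sequence, placeholder="x"):
    """Mimic a topology-only view: every label becomes the same placeholder."""
    return [placeholder for _ in sequence]


dna = list("ATCTG")  # hypothetical toy sequence

labelled = bigrams(dna)                  # A-T, T-C, C-T and T-G stay distinct
unlabelled = bigrams(strip_labels(dna))  # everything collapses into x-x

print(labelled)
print(unlabelled)
```

With labels, four different bi-grams are found; without them there is a single x-x pattern, so nothing can be said about what the pattern does.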
The Problem...Current Motif Extraction Algorithms

Unable to infer function
Unable to produce a macro

What's up?
We can't infer function from these results.
Ah, and you can't have macros without function...
Exactly!
Solution

[Figure: state-transition model with states s0-s4 and transition cases A-H. Legend: a normal state, with a 'legal' ...; a holding state, with a pseudo-...; a start state; a transition that generates a ...; a transition that does not generate a ...]

More detail about each individual case, A-H, is available in the paper.
Resulting In...

From our algorithm, running over 9,670 workflows, we retrieved ~12,000 motifs up to depth 12

Semantically aware
Limited by depth, not node count - we have motifs with > 80 nodes

Essentially, more complicated, topologically sensitive n-grams
Ranking Algorithm
...because 12,000 is just too much.
Ranking Algorithm

M1 - Occurrences in data repository
M2 - Workflow Presence
M3 - Compression Potential

[Figure: example motif annotated with the values 1,043 and 640]

For At, Aw and Ac, we map each to a fixed range [-1, 1] using a linear mapping based on the min-max range of each indicator, yielding three normalized metrics M1, M2 and M3.

No algorithm would be complete without a weighting element, so each metric can be weighted. We use a default weight of 1.
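A minimal sketch of the normalization and weighting described above, under stated assumptions: the slide says each indicator is mapped linearly onto [-1, 1] from its min-max range and that each metric has a weight (default 1), but it does not say how the metrics are combined. The weighted sum below is therefore an assumption, as are the function names and the raw values, which are illustrative only.

```python
def normalize(values):
    """Linearly map raw indicator values onto [-1, 1] using their min-max range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: every motif has the same value
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def score(motifs, weights=(1.0, 1.0, 1.0)):
    """Combine the three normalized metrics per motif.

    motifs: list of (occurrences, workflow_presence, compression_potential).
    Assumption: the combined score is a weighted sum of M1, M2 and M3.
    """
    columns = [normalize(column) for column in zip(*motifs)]
    return [
        sum(w * m for w, m in zip(weights, row))
        for row in zip(*columns)
    ]


# Illustrative raw indicator values for three hypothetical motifs.
raw = [(1092, 476, 3276), (600, 240, 2400), (20, 10, 200)]
scores = score(raw)
print(scores)
```

With default weights, a score under this assumption always lies in [-3, 3]; raising one weight lets a user favour, say, compression potential over raw occurrence counts.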
Ranking Algorithm

[Figure: macro selection UI, showing depth 6 motifs with magnified view in B and detailed popup of selected motif in D]
Motifs arranged by depth
Filter by min/max depth
Filter by pattern presence: linear, branching and merging
Per motif: motif subgraph, 3 glyph representations, 3 normalized metrics (Occurrences, Workflow presence, Compression Potential) and Score
Downgrade icon and adjusted score (values in the figure: 1000, "Subset of" 1200, 200)
Glyph Design

Things we'd like to see...
Density
Annotation
Topology/structure within a macro
Node type
Glyph Design

[Figure: three candidate glyph designs, each annotated with the properties it encodes]
First glyph: breadth, length, overall topology, node type (colour), annotation
Second glyph: breadth, length, topology (arrangement), node type (colour/shape), annotation
Third glyph: breadth, length, topology (arrangement), node type (colour/shape), annotation
State-transition model and examples

[Figure: the state-transition model (states s0-s4, transition cases A-H) shown next to example state sequences and the three glyph designs]
Macro Insertion for Workflow Compression

[Figure: workflow graph with macros inserted, panels A-D]
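To make macro insertion concrete, here is a minimal sketch that collapses one motif occurrence into a single macro node in a directed graph. It is not the Automacron implementation: the function name `insert_macro`, the adjacency-list representation and the example node names are assumptions for illustration only.

```python
def insert_macro(graph, members, macro_name):
    """Replace the nodes in `members` with a single node `macro_name`.

    graph: dict mapping each node to a list of successor nodes.
    Edges inside the motif are hidden by the macro; edges that cross its
    boundary are rewired to start or end at the macro node.
    """
    members = set(members)
    compressed = {macro_name: []}
    for node, successors in graph.items():
        src = macro_name if node in members else node
        targets = compressed.setdefault(src, [])
        for succ in successors:
            if node in members and succ in members:
                continue  # internal motif edge, not shown after compression
            dst = macro_name if succ in members else succ
            compressed.setdefault(dst, [])
            if dst not in targets:
                targets.append(dst)
    return compressed


# Hypothetical workflow fragment using step names from the earlier slide.
workflow = {
    "Sample name": ["Labeling Protocol"],
    "Labeling Protocol": ["Labeled Extract"],
    "Labeled Extract": ["Hybridisation Protocol"],
    "Hybridisation Protocol": ["Assay Name"],
    "Assay Name": [],
}

compressed = insert_macro(
    workflow, {"Labeling Protocol", "Labeled Extract"}, "Labeling macro"
)
print(compressed)
```

Here five nodes become four; applying the same substitution to every occurrence of a frequent motif is what yields the visual compression.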
Evaluation

User Testing
Performance
Community Dissemination

Dissemination of macros to community
Automacron API available as an OSGi plugin for ISAcreator
Roadmap

Workflow
Automatically Detect Motifs
Substitute motifs with 'macros'
Overcoming the blockades

Current Motif Detection Algorithm Limitations
No semantics
Limited motif sizes (Max 10)
-> New semantically enabled algorithm

Deciding what should be a Macro
Macros in electronic circuit diagrams are the product of years of refinement.
Macros in biological workflows, for instance, are new... how do we determine what should be a macro?
-> Statistically informed selection from analysis of a large corpus of workflows
Summary

New semantically enabled motif discovery algorithm
Statistically informed selection of macro candidates for use in biological workflow visualizations
Automated macro image generation inferred from algorithm states
Integration of final selections and utility to compress in ISAcreator tool for curators and biologists alike
Open source - we want you to extend!
And yes. It is open source!
You can download this software now!
github.com/isa-tools/automacron

Co-authors
Philippe Rocca-Serra
Susanna-Assunta Sansone
Jim Davies
Min Chen

Also
Alejandra Gonzalez Beltran
for many useful discussions

Bye.
