Analysis and visualization of large collections of trees
A case study in Chalcidoidea (Insecta: Hymenoptera)
Ana Dal Molin, Suzanne Matthews, James Munro, John Heraty, Jim Woolley
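The tree space
The number of possible trees for a given number of terminals is known, and it grows super-exponentially.
Criteria exist to determine which trees are better hypotheses.
Heuristics are needed to search the space.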
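The count mentioned here is a standard combinatorial result: for n labelled terminals there are (2n - 5)!! = 3 × 5 × ... × (2n - 5) unrooted, fully resolved trees, and (2n - 3)!! rooted ones. The short sketch below evaluates the formula exactly with Python integers. It can be applied to the 525 terminals of this case study to show why exhaustive search is out of the question and heuristics are needed.

```python
def unrooted_binary_trees(n: int) -> int:
    """Number of distinct unrooted, fully resolved trees on n labelled terminals: (2n - 5)!!."""
    if n < 3:
        raise ValueError("at least 3 terminals are required")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


def rooted_binary_trees(n: int) -> int:
    """Rooted, fully resolved trees on n terminals: (2n - 3)!!, i.e. the unrooted count for n + 1."""
    return unrooted_binary_trees(n + 1)


for n in (4, 5, 10, 20):
    print(n, unrooted_binary_trees(n))
print("525 terminals:", len(str(unrooted_binary_trees(525))), "decimal digits")
```

Printing only the number of digits keeps the output readable. The exact integer for 525 terminals runs to well over a thousand digits.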
Case study
525 terminals
2,992 characters from rDNA sequences (18S and 28S D2-D5)
Structural alignment, plus MAFFT (E-INS-i) alignment of the RAAs
Secondary structure, characterized by stems (paired bases) and loops (unpaired bases), guides the alignment
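The slides do not say how the secondary-structure models were encoded. The sketch below assumes the common dot-bracket notation, where matched parentheses mark paired bases and dots mark unpaired ones. Under that assumption it separates stem positions from loop positions, which is the distinction a structural alignment relies on. It is an illustration only, not the alignment pipeline used in this study.

```python
from typing import Dict, List


def base_pairs(structure: str) -> Dict[int, int]:
    """Map each paired position of a dot-bracket string to its partner (0-based)."""
    open_positions: List[int] = []
    pairs: Dict[int, int] = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


def stem_loop_mask(structure: str) -> str:
    """'S' for stem (paired) positions, 'L' for loop (unpaired) positions."""
    pairs = base_pairs(structure)
    return "".join("S" if i in pairs else "L" for i in range(len(structure)))


hairpin = "((((....))))"
print(stem_loop_mask(hairpin))   # SSSSLLLLSSSS
print(base_pairs(hairpin)[0])    # 11
```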
Case study: symptoms
Inconsistencies across repeated analyses
Spurious relationships
Why?
Low support
Results highly sensitive to the method used
Recognized tribes and subfamilies are recovered as groups, but not in a plausible place
Problems
1. Growing data sets lead to a growing number of trees, sometimes too many to compare by eye
2. Tens of thousands of trees with hundreds of terminals make for very large files: can we even load them?
3. Inconsistencies and polytomies in consensus trees:
Do we have rogue taxa?
Has the search run long enough?
Do we have enough signal?
Methods
TNT, 5 seeds, unweighted parsimony
Each of the 5 seeds yielded 30,000 trees (150,000 in total); 20,061 steps, CI = 0.165, RI = 0.62
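The CI and RI reported above are the standard ensemble indices. Let M be the sum over characters of the minimum possible number of steps, S the observed number of steps on the tree, and G the sum of the maximum possible steps (the length on a star tree). Then CI = M / S and RI = (G - S) / (G - M). The minimal sketch below shows only the arithmetic. It assumes the per-character inputs are already available (TNT reports these indices itself), and the numbers in it are toy values, not data from this study.

```python
from typing import Sequence, Tuple


def ensemble_ci_ri(
    min_steps: Sequence[int],
    observed_steps: Sequence[int],
    max_steps: Sequence[int],
) -> Tuple[float, float]:
    """Ensemble consistency index (CI) and retention index (RI).

    min_steps[i]:      minimum possible steps for character i on any tree (m)
    observed_steps[i]: steps character i requires on the tree being scored (s)
    max_steps[i]:      maximum possible steps for character i, i.e. on a star tree (g)
    """
    m = sum(min_steps)
    s = sum(observed_steps)
    g = sum(max_steps)
    if s == 0 or g == m:
        raise ValueError("indices undefined: no steps, or no informative characters")
    ci = m / s              # CI = M / S
    ri = (g - s) / (g - m)  # RI = (G - S) / (G - M)
    return ci, ri


# Toy example with three characters (not data from this study).
ci, ri = ensemble_ci_ri([1, 1, 2], [1, 3, 4], [2, 3, 5])
print(f"CI = {ci:.3f}, RI = {ri:.3f}")  # CI = 0.500, RI = 0.333
```

The formula can also be read backwards. If the CI was computed over all characters (an assumption), then CI = 0.165 at 20,061 steps implies a summed minimum of roughly 0.165 × 20,061 ≈ 3,300 steps.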



Portability: TreeZip
Set operations: TreeZip
Comparison via matrices of Robinson-Foulds (RF) distances: MrsRF
Heatmaps of the distance matrices plotted in R
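MrsRF uses MapReduce to compute all-against-all RF distances for very large collections. The quadratic, single-machine sketch below only illustrates what each matrix cell contains. Every unrooted tree is reduced to its set of nontrivial bipartitions (splits), and the RF distance is the size of the symmetric difference of two such sets. The sketch assumes Newick strings with unquoted labels, and it is not MrsRF's implementation. Conventions differ on whether this count is halved; the sketch reports the raw symmetric difference.

```python
import re
from typing import FrozenSet, List

Split = FrozenSet[str]


def split_set(newick: str) -> FrozenSet[Split]:
    """Nontrivial splits of an unrooted tree given as a Newick string.

    Each split is stored as the side that does not contain the alphabetically
    first taxon, so every bipartition has one canonical form. Branch lengths
    and internal node labels are ignored; quoted labels are not supported.
    """
    text = re.sub(r"\s+", "", newick.strip().rstrip(";"))
    text = re.sub(r":[^,();]*", "", text)  # drop branch lengths
    stack: List[set] = []
    clades: List[set] = []
    taxa: set = set()
    prev = None
    for tok in re.findall(r"[(),]|[^(),]+", text):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            clade = stack.pop()
            clades.append(clade)
            if stack:
                stack[-1] |= clade
        elif tok != "," and prev != ")":  # a name right after ')' is an internal label
            taxa.add(tok)
            if stack:
                stack[-1].add(tok)
        prev = tok
    ref, n = min(taxa), len(taxa)
    splits = set()
    for clade in clades:
        side = frozenset(clade) if ref not in clade else frozenset(taxa - clade)
        if 2 <= len(side) <= n - 2:  # skip trivial splits (single leaves, whole tree)
            splits.add(side)
    return frozenset(splits)


def rf_distance(a: FrozenSet[Split], b: FrozenSet[Split]) -> int:
    """Number of splits found in exactly one of the two trees."""
    return len(a ^ b)


def rf_matrix(newicks: List[str]) -> List[List[int]]:
    trees = [split_set(t) for t in newicks]
    return [[rf_distance(a, b) for b in trees] for a in trees]


trees = ["((a,b),c,(d,e));", "((a,c),b,(d,e));", "((a:0.1,b:0.2)90:0.3,c:1,(d:1,e:1):0.5);"]
for row in rf_matrix(trees):
    print(row)
```

The first and third trees share a topology, so their cell is 0. A matrix like this is what gets drawn as a heatmap.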
1. Portability and set operations
File size comparison
Screenshot of the file structure
[hashing?]
Reference for details
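The slides do not describe TreeZip's format. The sketch below is a hedged illustration of why bipartition-based encodings compress large, highly similar tree collections well. It assumes each tree has already been reduced to its canonical set of nontrivial splits, with each split stored as the side that excludes a fixed reference taxon. Each distinct split is then stored only once, as a hashable integer bitmask over a fixed taxon order, together with the list of trees that contain it. A fully resolved unrooted topology is determined by its split set, so the round trip is lossless for topology; branch lengths are ignored. This is the general idea only, not TreeZip's actual file format or algorithm.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence

Split = FrozenSet[str]


def compress(trees: Sequence[FrozenSet[Split]], taxa: Sequence[str]) -> Dict[int, List[int]]:
    """Map each distinct split (an integer bitmask over `taxa`) to the ids of the trees containing it."""
    bit = {taxon: i for i, taxon in enumerate(taxa)}
    table: Dict[int, List[int]] = defaultdict(list)
    for tree_id, splits in enumerate(trees):
        for split in splits:
            mask = 0
            for taxon in split:
                mask |= 1 << bit[taxon]
            table[mask].append(tree_id)
    return dict(table)


def decompress(table: Dict[int, List[int]], n_trees: int, taxa: Sequence[str]) -> List[FrozenSet[Split]]:
    """Rebuild every tree's split set from the shared table."""
    rebuilt: List[set] = [set() for _ in range(n_trees)]
    for mask, tree_ids in table.items():
        split = frozenset(t for i, t in enumerate(taxa) if (mask >> i) & 1)
        for tree_id in tree_ids:
            rebuilt[tree_id].add(split)
    return [frozenset(s) for s in rebuilt]


taxa = ["a", "b", "c", "d", "e"]
trees = [
    frozenset({frozenset("cde"), frozenset("de")}),  # ((a,b),c,(d,e))
    frozenset({frozenset("bde"), frozenset("de")}),  # ((a,c),b,(d,e))
]
table = compress(trees, taxa)
print(len(table), "distinct splits stored for", sum(len(t) for t in trees), "split occurrences")
```

The more splits the trees share, the larger the saving. The shared (d,e) split above is stored once for both trees.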
Set operations
All trees were unique within each set, but:
Union = 32,300 unique trees, not 150,000
Intersection = 28,422 trees
Consensus
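The deck used TreeZip for these set operations. The naive in-memory sketch below shows only the logic. Two Newick strings describe the same unrooted topology exactly when their sets of nontrivial splits are equal, so each run becomes a set of such keys, and union and intersection are ordinary set operations. This is why the union can hold far fewer unique trees than the raw total. The toy runs are hypothetical stand-ins for the per-seed tree files, and the parser assumes unquoted labels and ignores branch lengths.

```python
import re
from typing import FrozenSet, Iterable, Set

Key = FrozenSet[FrozenSet[str]]


def topology_key(newick: str) -> Key:
    """Canonical key of an unrooted topology: its set of nontrivial splits,
    each stored as the side without the alphabetically first taxon.
    Assumes unquoted labels; branch lengths and internal labels are ignored."""
    text = re.sub(r"\s+", "", newick.strip().rstrip(";"))
    text = re.sub(r":[^,();]*", "", text)  # drop branch lengths
    stack, clades, taxa, prev = [], [], set(), None
    for tok in re.findall(r"[(),]|[^(),]+", text):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            clade = stack.pop()
            clades.append(clade)
            if stack:
                stack[-1] |= clade
        elif tok != "," and prev != ")":  # a name right after ')' is an internal label
            taxa.add(tok)
            if stack:
                stack[-1].add(tok)
        prev = tok
    ref, n = min(taxa), len(taxa)
    sides = (frozenset(c) if ref not in c else frozenset(taxa - c) for c in clades)
    return frozenset(s for s in sides if 2 <= len(s) <= n - 2)


def run_to_set(newicks: Iterable[str]) -> Set[Key]:
    """Collapse one run's trees to its set of unique topologies."""
    return {topology_key(t) for t in newicks}


# Toy runs (hypothetical, standing in for the per-seed tree files).
run1 = ["((a,b),c,(d,e));", "((a,c),b,(d,e));"]
run2 = ["(c,(e,d),(b,a));", "((a,d),b,(c,e));"]
runs = [run_to_set(r) for r in (run1, run2)]

union = set().union(*runs)
intersection = set.intersection(*runs)
print(len(union), len(intersection))  # 3 1
```

The first tree of each toy run is written differently but has the same topology, so four input trees yield only three unique topologies in the union.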
2. Comparisons: MrsRF
5x5 heatmap
Large heatmap
Distance between consensus trees = 0
Colored strict consensus tree
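A strict consensus keeps only the splits present in every tree. With trees held as split sets it is therefore a set intersection, and an RF distance of 0 between two consensus trees means their split sets are identical. The sketch below assumes split sets in canonical form, with each split stored as the side that excludes the alphabetically first taxon, and writes the consensus back out as Newick. The toy trees are hypothetical. The sketch illustrates the definition only; the tools used in the study may compute consensus trees differently.

```python
from typing import FrozenSet, List, Sequence

Split = FrozenSet[str]


def strict_consensus(trees: Sequence[FrozenSet[Split]]) -> FrozenSet[Split]:
    """Splits shared by every tree in the collection."""
    return frozenset.intersection(*trees)


def to_newick(splits: FrozenSet[Split], taxa: Sequence[str]) -> str:
    """Write compatible splits as unrooted Newick; unresolved nodes become polytomies.
    Splits must exclude the alphabetically first taxon, which is placed at the basal node."""
    ref = min(taxa)

    def children(members: FrozenSet[str]) -> List[str]:
        inner = [c for c in splits if c < members]
        # maximal clusters inside `members` are pairwise disjoint because the splits are compatible
        top = [c for c in inner if not any(c < d for d in inner)]
        covered = set().union(*top)
        groups = sorted("(" + ",".join(children(c)) + ")" for c in top)
        return groups + sorted(members - covered)

    return "(" + ",".join([ref] + children(frozenset(taxa) - {ref})) + ");"


taxa = ["a", "b", "c", "d", "e"]
t1 = frozenset({frozenset("cde"), frozenset("de")})  # ((a,b),c,(d,e))
t2 = frozenset({frozenset("bde"), frozenset("de")})  # ((a,c),b,(d,e))
c1 = strict_consensus([t1, t2])
c2 = strict_consensus([t2, t1, t2])
print(to_newick(c1, taxa))  # (a,(d,e),b,c);
print(len(c1 ^ c2))         # RF distance between the two consensus trees: 0
```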
More Information
Acknowledgements
