This document summarizes a case study on analyzing and visualizing large collections of trees representing relationships within Chalcidoidea (Insecta: Hymenoptera). The data set comprised 525 terminals and 2992 rDNA characters; five parsimony analyses with different seeds each produced 30,000 trees. Tree sets were compared with set operations such as union and intersection in TreeZip. MrsRF was used to calculate Robinson-Foulds distances among trees and consensus trees, and the resulting distance matrices were visualized as heatmaps. Although inconsistencies emerged, the consensus tree agreed across analyses and recognized taxonomic groups were recovered.
1. Analysis and visualization of large collections of trees
A case study in Chalcidoidea (Insecta: Hymenoptera)
Ana Dal Molin, Suzanne Matthews, James Munro, John Heraty, Jim Woolley
2. The tree space
The number of possible
trees is given
Criteria exist to
determine which ones
are better hypotheses
Heuristics
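To give a sense of why heuristics are needed, the size of the tree space can be computed directly. The sketch below assumes the standard count of unrooted, fully bifurcating trees on n labeled terminals, (2n-5)!!; the slide does not state which formula it showed, and the function name is hypothetical.

```python
def count_unrooted_binary_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating trees on n_taxa labeled tips.

    Assumption: the standard double-factorial formula (2n-5)!!,
    i.e. the product 3 * 5 * ... * (2n-5).
    """
    if n_taxa < 3:
        return 1  # one or two tips admit a single topology
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):  # odd factors 3 .. 2n-5
        count *= factor
    return count


for n in (4, 5, 10):
    print(n, count_unrooted_binary_trees(n))

# For the 525 terminals of the case study, report only the order of magnitude.
digits = len(str(count_unrooted_binary_trees(525)))
print(f"525 terminals: a number with {digits} digits")
```

Even ten terminals already allow more than two million topologies, so an exhaustive search is out of the question for hundreds of terminals.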
4. Case study
525 terminals
2992 characters, rDNA (18S and 28S D2-D5)
sequences
Structural alignment + MAFFT (E-INS-i) alignment of the
regions of ambiguous alignment (RAAs)
5. Secondary structure is characterized by stems (paired bases) and loops
(unpaired bases): alignment
8. Problems
1. Growing data sets lead to growing numbers of trees,
sometimes too many to be compared by eye
2. Tens of thousands of trees with hundreds of
terminals = really large files
Can I even load them?
3. Inconsistencies and polytomies in consensus trees:
Do we have rogue taxa?
Has the search run enough?
Do we have enough signal?
9. Methods
TNT, 5 seeds, unweighted parsimony
Each of the 5 seeds resulted in 30,000 trees (150,000 in total),
20061 steps, CI=0.165, RI=0.62
Portability: TreeZip
Set operations: TreeZip
Comparison via matrices of RF distances: MrsRF
Heatmaps of the distance matrices plotted using R
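The comparison step can be illustrated with a minimal sketch of the Robinson-Foulds (RF) distance: the number of bipartitions (splits) found in one tree but not in the other. This is not MrsRF's implementation. The nested-tuple tree representation, the function names, and the toy trees are assumptions made for illustration only.

```python
def bipartitions(tree):
    """Return the set of non-trivial splits of a tree given as nested tuples."""
    splits = set()

    def visit(node):
        if isinstance(node, str):  # a leaf is just its label
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        splits.add(below)
        return below

    leaves = visit(tree)
    anchor = min(leaves)
    # Orient each split away from one anchor leaf so that both sides of the
    # same edge map to a single key, then drop trivial splits.
    oriented = (s if anchor not in s else leaves - s for s in splits)
    return {s for s in oriented if 2 <= len(s) <= len(leaves) - 2}


def rf_distance(tree_a, tree_b):
    """Unnormalized RF distance: size of the symmetric difference of splits."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Toy trees on five terminals (hypothetical data).
trees = [
    (("A", "B"), ("C", "D"), "E"),
    (("A", "C"), ("B", "D"), "E"),
    (("A", "B"), "C", "D", "E"),  # partly unresolved
]

# Pairwise distance matrix, the kind of table a heatmap would display.
matrix = [[rf_distance(a, b) for b in trees] for a in trees]
for row in matrix:
    print(row)
```

With tens of thousands of trees, the all-against-all matrix grows quadratically, which is why a dedicated tool is used for this step.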
10. 1. Portability and set operations
File size comparison
Screenshot of the file
structure
[hashing?]
Reference for details
11. Set Operations
All trees were unique within every set, but:
Union = 32,300 (unique) trees, not 150,000
Intersection = 28,422 trees
Consensus.
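The logic of these set operations can be sketched with toy data: once each tree is reduced to an order-independent key, union and intersection across runs are ordinary set operations. This is not TreeZip's implementation. The key function, the nested-tuple representation, and the two toy runs are assumptions made for illustration only.

```python
def tree_key(tree):
    """Hashable key (set of non-trivial splits) that ignores child order."""
    splits = set()

    def visit(node):
        if isinstance(node, str):  # a leaf is just its label
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        splits.add(below)
        return below

    leaves = visit(tree)
    anchor = min(leaves)
    # Orient each split away from one anchor leaf, then drop trivial splits.
    oriented = (s if anchor not in s else leaves - s for s in splits)
    return frozenset(s for s in oriented if 2 <= len(s) <= len(leaves) - 2)


# Two hypothetical runs; the first tree of each is the same topology
# written with its children in a different order.
run_a = [(("A", "B"), ("C", "D"), "E"), (("A", "C"), ("B", "D"), "E")]
run_b = [(("B", "A"), ("D", "C"), "E"), (("A", "D"), ("B", "C"), "E")]

tree_sets = [{tree_key(t) for t in run} for run in (run_a, run_b)]

union = set().union(*tree_sets)              # trees found in any run
intersection = set.intersection(*tree_sets)  # trees found in every run

print("sum of set sizes:", sum(len(s) for s in tree_sets))
print("union:", len(union))
print("intersection:", len(intersection))
```

As in the case study, the union is smaller than the sum of the set sizes because the same topology turns up in more than one run.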