Metagenomic tools for the fungal community
Holly Bik, UC Davis
19 October 2012
http://phylosift.wordpress.com
Explicitly Phylogenetic Approaches
[Figure labels: Aligned environmental sequences; Guide Tree; Evolutionary placement of short reads]
We provide:
• Support for Paired End (raw) Illumina data
• Marker gene data for Bacteria, Archaea, Eukaryotes, Viruses
• Taxonomy assignments based on probability distributions over a reference phylogeny
• Complement to existing tools - QIIME/VAMPS
  - Inputs/outputs will be compatible for use with other software tools
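As a rough illustration of taxonomy assignment from a probability distribution over a reference phylogeny, the sketch below sums one read's placement probabilities per taxon. The edge numbers, probabilities, taxon labels, and the function name `summarize_placements` are all hypothetical; this is not PhyloSift's own code, only a minimal sketch assuming each placement is an (edge, probability) pair.

```python
from collections import defaultdict


def summarize_placements(placements, edge_to_taxon):
    """Sum the placement probability mass per taxon for a single read."""
    mass = defaultdict(float)
    for edge, prob in placements:
        mass[edge_to_taxon[edge]] += prob
    total = sum(mass.values())
    # Normalize so the per-taxon values form a probability distribution.
    return {taxon: p / total for taxon, p in mass.items()}


# Hypothetical read placed on three edges of a reference tree.
placements = [(12, 0.6), (13, 0.3), (40, 0.1)]
edge_to_taxon = {12: "Ascomycota", 13: "Ascomycota", 40: "Basidiomycota"}
summary = summarize_placements(placements, edge_to_taxon)
```

Here the read is assigned mostly to the taxon that holds two of its three candidate edges, rather than to a single best hit.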
Markers
• PMPROK - Dongying Wu's Bac/Arch markers
• Eukaryotic Orthologs - Parfrey 2011 paper
• 16S/18S rRNA
• Mitochondria - protein-coding genes
• Viral Markers - Markov clustering on genomes
• Codon Subtrees - finer scale taxonomy
• Extended Markers - plastids, gene families

Reference Marker Genes
The Monkey - Build Marker Packages

Inputs:
• Mapping File (sequence name, NCBI taxon ID)
• PD cutoff
• Alignment File (marker sequences in FASTA format)

Execute build_marker mode
1. hmmbuild (ssu-build): Create profile HMMs (or CMs for rRNA data) using input sequences
2. Generate unique IDs for input sequences
3. FastTree: Build tree and collapse topology according to a user-specified PD cutoff (e.g. 99%)
4. Tree Reconciliation: Reconcile NCBI taxonomy IDs with phylogenetic topology
   - Quantitative metric (minimum Hamming distance) used to match edges between NCBI taxon tree and molecular phylogeny
5. Clean and package new marker genes
6. Built Marker Packages: New marker gene packages placed into shared PhyloSift marker directory

Execute index mode
7. Index Marker Database: Indexes the marker databases needed for LAST and Bowtie
   - Locally indexed marker packages will not interfere with automatic updates to PhyloSift core markers

NOTE: New marker packages are named according to input filenames (e.g. MarkerAlignment.fasta). Core marker data will be overwritten during new marker builds if input files do not have unique names compared to existing PhyloSift markers.

Built PhyloSift Marker package:
• Tree
• HMM profile (CMs for rRNA)
• Taxon map
• Representative sequences
• Alignment
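The edge-matching idea in the tree reconciliation step can be sketched as follows. This is only an illustration under stated assumptions, not PhyloSift's implementation: each edge is assumed to be encoded as a 0/1 vector over a shared leaf ordering (which leaves sit on one side of the edge), and a split is treated as equivalent to its complement. The function names, leaf names, and clades are hypothetical.

```python
def split_vector(clade, leaves):
    """Encode an edge as a 0/1 vector: 1 if the leaf is inside the clade."""
    return tuple(1 if leaf in clade else 0 for leaf in leaves)


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def split_distance(a, b):
    """Hamming distance between two splits, ignoring which side is labeled 1."""
    flipped = tuple(1 - x for x in b)
    return min(hamming(a, b), hamming(a, flipped))


def match_edges(taxonomy_clades, tree_clades, leaves):
    """For each taxonomy edge, find the tree edge at minimum Hamming distance."""
    tree_vecs = {name: split_vector(c, leaves) for name, c in tree_clades.items()}
    matches = {}
    for taxon, clade in taxonomy_clades.items():
        vec = split_vector(clade, leaves)
        best = min(tree_vecs, key=lambda e: split_distance(vec, tree_vecs[e]))
        matches[taxon] = (best, split_distance(vec, tree_vecs[best]))
    return matches


# Hypothetical example: molecular tree ((A,B),C,(D,E)) versus a taxonomy
# in which "FamilyY" groups A, B and D (a conflict with the tree).
leaves = ["A", "B", "C", "D", "E"]
taxonomy_clades = {"GenusX": {"A", "B"}, "FamilyY": {"A", "B", "D"}}
tree_clades = {"AB_edge": {"A", "B"}, "DE_edge": {"D", "E"}}
matches = match_edges(taxonomy_clades, tree_clades, leaves)
```

A distance of 0 means the taxonomy edge and tree edge agree exactly; larger values indicate how many leaves would have to move for the two to agree.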
The Kangaroo - Simulation Data

Input: Genome Directory
Define the number of genomes to pick (default = 10) and number of reads to generate per file (default = 100,000)

Execute sim mode
• PD on concatenated tree: Determines PD contributions for taxa present in concatenated guide tree in PhyloSift marker directory
• Select Taxa: Two separate approaches used:
  1. Select some number of taxa that contribute to PD (user input, default = 10 taxa)
  2. Sample taxa uniformly without replacement
• Compute metrics between target and remaining taxa: Calculated metrics include the distance to nearest neighbors, connecting branch lengths, and the number of sampled nodes within various PD units of connecting nodes
• Knockout Swaths of Taxa: Workflow plugs into updateDB to remove genomes which have been used to simulate metagenome data, as well as a swath of related taxa
• Generated Simulated Reads: Grinder algorithm randomly generates reads from selected genomes, outputs simulated PE-Illumina and 454 datasets
• Simulation Marker Directory: A new marker directory is created, where simulated genomes have been knocked out from marker packages
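The two taxon-selection approaches in the simulation workflow might look roughly like this. It is a sketch, not the sim mode's actual code: it assumes PD contributions are already available as a name-to-value mapping and reads "taxa that contribute to PD" as taking the largest contributors, which is an interpretation. The function names and PD values are hypothetical.

```python
import random


def pick_by_pd(pd_contribution, n=10):
    """Approach 1 (assumed reading): take the n taxa with the largest PD contribution."""
    ranked = sorted(pd_contribution, key=pd_contribution.get, reverse=True)
    return ranked[:n]


def pick_uniform(taxa, n=10, seed=None):
    """Approach 2: sample n taxa uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(taxa), n)


# Hypothetical PD contributions for four taxa in a guide tree.
pd_contribution = {"taxonA": 0.42, "taxonB": 0.05, "taxonC": 0.31, "taxonD": 0.12}
top_two = pick_by_pd(pd_contribution, n=2)
random_two = pick_uniform(pd_contribution, n=2, seed=1)
```

Passing a seed makes the uniform draw repeatable, which helps when the same simulated dataset has to be regenerated.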
DBupdate - Mining new genomes

Sources: EBI Genomes, Private Genomes, NCBI Genomes, JGI Genomes

Execute phylosift_dbupdate.pl
• Run PhyloSift (search + align): Add new sequences to marker packages
• Infer Updated Tree: Amino Acid Tree, Nucleotide Tree
• Prune Tree: A taxa set is selected with a maxPD cutoff of 0.02 and a new tree is inferred
• Codon Subtrees: PD metric used to split guide tree into smaller subtrees; subsets of taxa are selected such that no branch connecting them has length >0.X for some value of X
• Update reference sequences with new data: New sequences added at 0.25 PD for amino acid tree; higher PD threshold enables more aggressive searches of reference database, since LAST searching is faster with fewer sequences
• Tree Reconciliation: Reconcile NCBI taxonomy IDs with phylogenetic topologies, for both amino acid tree and codon subtrees
• Package Markers
• Automated Download to PhyloSift Users: Users' local marker databases are automatically scanned each time PhyloSift is run and any new updates are automatically downloaded if available
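One way to picture the codon subtree split is to drop every branch longer than a threshold and group the leaves that remain connected. The sketch below does that on a small edge list. The tree, branch lengths, function name, and the threshold of 0.1 are all made up for illustration (the slide leaves the real threshold unspecified), and the actual dbupdate script may work differently.

```python
from collections import defaultdict


def split_into_subtrees(edges, leaves, max_branch):
    """Group leaves that are connected only by branches of length <= max_branch."""
    adjacency = defaultdict(list)
    for a, b, length in edges:
        if length <= max_branch:  # branches above the threshold are cut
            adjacency[a].append(b)
            adjacency[b].append(a)
    leaf_set = set(leaves)
    seen, groups = set(), []
    for leaf in leaves:
        if leaf in seen:
            continue
        # Depth-first walk over the remaining short branches.
        stack, component = [leaf], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node])
        groups.append(sorted(component & leaf_set))
    return groups


# Hypothetical guide tree as (node, node, branch length) triples.
edges = [
    ("root", "n1", 0.30), ("root", "n2", 0.05),
    ("n1", "A", 0.02), ("n1", "B", 0.03),
    ("n2", "C", 0.04), ("n2", "D", 0.50),
]
leaves = ["A", "B", "C", "D"]
subtrees = split_into_subtrees(edges, leaves, max_branch=0.1)
```

Each resulting group is a set of taxa in which no connecting branch exceeds the chosen threshold.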
Tree Reconciliation in PhyloSift
[Figure labels: Environmental Sequences; Named Taxa]
[Figure panels: Great! / Not Bad / Getting Tricky]
Tree Placement
Fat Tree - Guppy
[Figure: Marine Metagenome. Labels: Chemoautotrophic bacteria (oxidize ammonia into nitrite); Alveolate Protists; Common seawater Archaea]

Tree Placement
Tog Tree - Guppy
[Figure: Marine Metagenome]

Tree Placement
Sing Tree - Guppy
[Figure: Marine Metagenome]
Linking with the Fungal ITS community
• How does fungal ITS sequence data relate to your project?
  - PhyloSift has the capability to add any marker gene reference packages that are relevant for specific taxonomic communities
• What fungal ITS data does your project currently provide?
  - None, but we do mine other marker genes from fungal genomes
• What fungal ITS data is your project hoping to provide?
  - We wouldn't provide data, but can work with users to increase support for fungal analyses
Linking with the Fungal ITS community
• Is your project involved with curating fungal ITS sequences?
  - No, but we would curate alignments and marker packages of ITS sequences mined from public databases
• If so, what curation strategies are being implemented for your project?
  - Alignment filtering and masking, pruning reference trees
• What tools for working with fungal ITS sequences does your project currently provide?
  - None so far, but can be implemented if given a reference dataset (e.g. alignment)
Linking with the Fungal ITS community
• What tools are you developing / planning to develop?
  - Current focus is on multi-sample comparisons
  - Gene tree reconciliation
  - Probability distribution over tree topology to delimit OTUs (Phylogenetic OTUs)
• What framework of fungal taxonomy does your project use?
  - NCBI-derived taxonomy (because of tree mapping/reconciliation issues)
SATELLITE MEETING
Eukaryotic Metagenomics
March/April 2013
UC Davis
Acknowledgements
UC Davis
• Jonathan Eisen
• Aaron Darling
• Guillaume Jospin
• Dongying Wu
• David Coil

PhyloSift Software Development on Github:
https://github.com/gjospin/PhyloSift

Google Group for user support:
https://groups.google.com/d/forum/phylosift

Twitter: @PhyloSift
