Hands-on Exercises – Day 1
Sucheta Tripathy, VBI
Genetics of being Phytophthora?
• Objective: Find a coding sequence that is unique
in Phytophthora.
• What is the starting material?
– 16 million RNA-Seq reads were assembled against the P. sojae reference sequence to generate junctions. These junctions were evaluated using some of the best available algorithms.
• http://vmd.vbi.vt.edu/download/data/workshop2010/
– Coverage.wig
– ps1V1.fasta
Transcript discovery
• Sort the coverage file by the number of read hits in column 4.
• Keep the upper quartile (the top 25% by hit count).
• Remove sequences longer than 1000 bases or shorter than 10 bases.
• Fetch the corresponding sequences from the ps1V1.fasta file.
• Split the FASTA file into N equal parts.
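
The steps above can be sketched in Python. This is a minimal sketch, not the workshop's own script. It assumes that Coverage.wig holds four whitespace-separated columns (sequence name, start, end, hit count) with 0-based, half-open coordinates, and that the sequence names match the FASTA headers in ps1V1.fasta. All function names are hypothetical.

```python
import math

def read_coverage(handle):
    """Parse rows of (name, start, end, hits); this column layout is an assumption."""
    rows = []
    for line in handle:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 4 or line.startswith(("#", "track", "browser")):
            continue
        rows.append((fields[0], int(fields[1]), int(fields[2]), float(fields[3])))
    return rows

def top_quartile(rows, min_len=10, max_len=1000):
    """Sort by hits (column 4), keep the top 25%, then apply the length filter."""
    ranked = sorted(rows, key=lambda r: r[3], reverse=True)
    keep = math.ceil(len(ranked) * 0.25)
    return [r for r in ranked[:keep] if min_len <= r[2] - r[1] <= max_len]

def read_fasta(handle):
    """Return {sequence id: sequence} from a FASTA stream."""
    chunks, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {key: "".join(value) for key, value in chunks.items()}

def fetch_regions(rows, genome):
    """Cut each kept interval out of its reference sequence."""
    return [(f"{n}:{s}-{e}", genome[n][s:e]) for n, s, e, _ in rows if n in genome]

def split_records(records, n):
    """Split records into n parts whose sizes differ by at most one."""
    size, extra = divmod(len(records), n)
    parts, pos = [], 0
    for i in range(n):
        step = size + (1 if i < extra else 0)
        parts.append(records[pos:pos + step])
        pos += step
    return parts
```

With the workshop files, the coverage rows would come from `read_coverage(open("Coverage.wig"))` and the reference from `read_fasta(open("ps1V1.fasta"))`. Each part returned by `split_records` can then be written to its own FASTA file for the annotation steps.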
Annotation Steps
• BLAST against the P. sojae gene models (vmd.vbi.vt.edu/toolkit).
• Check coding potential using the P. sojae codon usage tables.
- If a hit is found, get the gene model, compare its splice sites with the junctions, and correct the model where they disagree.
- If no hit is found, BLAST against the P. ramorum / H. arabidopsidis / P. infestans coding sequences.
- Check whether the hit matches the splice junctions correctly; if not, the gene models in those organisms are INCORRECT.
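
One simple way to check coding potential against a codon usage table is a per-codon log-odds score. The sketch below is an illustration rather than the method used in the workshop. It scores only the three forward reading frames against a uniform background of 1/64 per codon. The function names and the pseudo-frequency for unlisted codons are assumptions, and the real frequencies would have to come from a P. sojae codon usage table.

```python
import math

def codon_log_odds(seq, codon_freq, frame=0, pseudo=1e-4):
    """Mean log-odds per codon: table frequency versus a uniform 1/64 background."""
    seq = seq.upper()
    # Take complete codons only, starting at the chosen frame offset.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    # Codons missing from the table get a small pseudo-frequency (assumption).
    total = sum(math.log(codon_freq.get(c, pseudo) * 64) for c in codons)
    return total / len(codons)

def best_frame(seq, codon_freq):
    """Return (frame, score) for the highest-scoring forward reading frame."""
    scores = {f: codon_log_odds(seq, codon_freq, f) for f in range(3)}
    frame = max(scores, key=scores.get)
    return frame, scores[frame]
```

A positive score means the frame uses codons that the table favors over a uniform background; a score near or below zero argues against coding potential in that frame. Scoring the reverse complement as well would cover the opposite strand.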
Annotation
• BLAST against the nr database. If no hit to any coding sequence in nr is found, you have most probably found a new gene.
• Check whether the sequence carries a signal peptide or targeting peptide to determine whether the protein is secreted.
• Run a MEME motif search on the sequence.
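
The "no hit in nr" test in the first bullet can be sketched as a small filter over BLAST results. This is a minimal sketch. It assumes BLAST was run with tabular output (`-outfmt 6`), whose default columns put the query id first and the e-value eleventh. The e-value cutoff of 1e-5 and the function names are assumptions, not values from the workshop.

```python
def queries_with_hits(blast_lines, max_evalue=1e-5):
    """Collect the query ids that have at least one hit at or below the cutoff."""
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Default -outfmt 6 layout: field 0 is the query id, field 10 the e-value.
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits

def candidate_new_genes(query_ids, blast_lines, max_evalue=1e-5):
    """Return the queries with no significant hit, in their original order."""
    hits = queries_with_hits(blast_lines, max_evalue)
    return [q for q in query_ids if q not in hits]
```

Sequences returned by `candidate_new_genes` are the ones to carry forward to the signal peptide and motif checks.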
