This document provides instructions for hands-on genetics exercises to identify coding sequences unique to Phytophthora. Students analyze RNA-Seq reads assembled onto the P. sojae reference sequence to discover transcripts. They then annotate the sequences by running BLAST searches against gene models and databases to determine whether they represent new genes or correct errors in other organisms' gene models. The goal is to find a coding sequence that is unique to Phytophthora.
2. Genetics of being Phytophthora?
• Objective: Find a coding sequence that is unique
to Phytophthora.
• What is the starting material?
– 16 million RNA-Seq reads are assembled onto the P. sojae
reference sequence to generate splice junctions. These
junctions are scored using some of the best available
algorithms.
• http://vmd.vbi.vt.edu/download/data/workshop2010/
– Coverage.wig
– ps1V1.fasta
3. Transcript discovery
• Sort the coverage file by the number of read
hits given in column 4.
• Keep the regions in the upper quartile (top 25%).
• Remove sequences longer than 1000 bases or
shorter than 10 bases.
• Fetch the matching sequences from the ps1V1.fasta file.
• Split the FASTA file into N equal parts.
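The steps above can be sketched in Python. This is a minimal sketch, not the workshop's own script. It assumes each coverage line has four whitespace-separated columns (scaffold, start, end, hits) with 0-based, end-exclusive coordinates, and it reads "upper 25%" as the top quarter of regions ranked by hits. All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int, float]  # (scaffold, start, end, hits); layout is an assumption


def parse_coverage(lines: Iterable[str]) -> List[Region]:
    """Parse four-column coverage lines; header or malformed lines are skipped."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        try:
            regions.append((fields[0], int(fields[1]), int(fields[2]), float(fields[3])))
        except ValueError:
            continue  # e.g. a track line that happens to have four fields
    return regions


def top_quartile(regions: List[Region]) -> List[Region]:
    """Sort by hits (column 4), highest first, and keep the top 25%."""
    ranked = sorted(regions, key=lambda r: r[3], reverse=True)
    return ranked[:math.ceil(len(ranked) * 0.25)]


def length_filter(regions: List[Region], min_len: int = 10, max_len: int = 1000) -> List[Region]:
    """Drop regions shorter than min_len or longer than max_len bases."""
    return [r for r in regions if min_len <= r[2] - r[1] <= max_len]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into {name: sequence}, using the first word of each header."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def fetch_sequences(regions: List[Region], genome: Dict[str, str]) -> List[Tuple[str, str]]:
    """Cut each region out of its scaffold; unknown scaffolds are skipped."""
    return [(f"{s}:{start}-{end}", genome[s][start:end])
            for s, start, end, _hits in regions if s in genome]


def split_records(records: list, n: int) -> List[list]:
    """Split records into n parts whose sizes differ by at most one."""
    size, extra = divmod(len(records), n)
    parts, i = [], 0
    for k in range(n):
        step = size + (1 if k < extra else 0)
        parts.append(records[i:i + step])
        i += step
    return parts
```

A typical call chain would be `length_filter(top_quartile(parse_coverage(open("Coverage.wig"))))`, followed by `fetch_sequences` with the result of `read_fasta(open("ps1V1.fasta"))`. If Coverage.wig uses a different wiggle layout, only `parse_coverage` needs to change.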
4. Annotation Steps
• BLAST against P. sojae gene
models (vmd.vbi.vt.edu/toolkit).
• Check coding potential with P. sojae codon usage
tables.
- If a hit is found, get the gene model, compare
its splice sites with the junctions, and correct it.
- If no hit is found, BLAST against
P. ramorum/H. arabidopsidis/P. infestans coding
sequences.
- Check whether the hit matches the splice junctions; if
not, the gene models in those organisms are
INCORRECT.
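The splice-site comparison in the steps above can be illustrated with a small set comparison. This is a minimal sketch under assumptions the slides do not state: exons and introns are given as 1-based, inclusive (start, end) pairs on the same strand and scaffold, and introns must match exactly at both ends. The names `introns_from_exons`, `compare_junctions` and `JunctionReport` are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; an assumption


class JunctionReport(NamedTuple):
    confirmed: Set[Interval]             # introns in both RNA-Seq and the gene model
    missing_from_model: Set[Interval]    # seen in RNA-Seq, absent from the model
    unsupported_in_model: Set[Interval]  # in the model, not seen in RNA-Seq


def introns_from_exons(exons: Iterable[Interval]) -> Set[Interval]:
    """Derive introns as the gaps between consecutive exons of one gene model."""
    ordered = sorted(exons)
    return {(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])}


def compare_junctions(rnaseq_introns: Iterable[Interval],
                      model_exons: Iterable[Interval]) -> JunctionReport:
    """Compare RNA-Seq-supported introns with the introns implied by a gene model."""
    observed = set(rnaseq_introns)
    modelled = introns_from_exons(model_exons)
    return JunctionReport(observed & modelled,
                          observed - modelled,
                          modelled - observed)


if __name__ == "__main__":
    report = compare_junctions(
        rnaseq_introns=[(201, 300), (401, 480)],
        model_exons=[(100, 200), (301, 400), (501, 600)],
    )
    print(report)
```

Introns in `missing_from_model` are the candidates for correcting a gene model. An intron in `unsupported_in_model` may only reflect low read coverage, so it is weaker evidence on its own.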
5. Annotation
• BLAST against the nr database. If no hit is
found to any coding sequence in the nr
database, you have most probably found a
new gene.
• Check whether the sequence contains a signal
peptide or target peptide to determine whether it is
secreted.
• Run a MEME motif search on the
sequence.
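The "no hit in nr" check can be sketched as a filter over BLAST results. This minimal sketch assumes the search was saved in BLAST+ tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the E-value. The function name and the 1e-5 E-value cutoff are assumptions for illustration, not values from the workshop.

```python
from typing import Iterable, Set


def queries_without_hits(query_ids: Iterable[str],
                         blast_tab_lines: Iterable[str],
                         max_evalue: float = 1e-5) -> Set[str]:
    """Return query IDs with no BLAST hit at or below max_evalue.

    blast_tab_lines are lines of BLAST+ tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hit_ids = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column record
        if float(fields[10]) <= max_evalue:
            hit_ids.add(fields[0])
    return set(query_ids) - hit_ids


if __name__ == "__main__":
    lines = [
        "t1\tgi|1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180",
        "t2\tgi|2\t30.0\t40\t20\t2\t1\t40\t5\t44\t0.5\t25",
    ]
    print(sorted(queries_without_hits(["t1", "t2", "t3"], lines)))
```

Sequences returned by this filter are only candidates for new genes. The signal peptide check and the MEME motif search are then run on them with their own tools.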