RNA 2nd structure prediction
based on multiple alignments
RNA evolution
 Homologous RNAs can have a common 2nd structure
without sharing a significant sequence similarity
 Mutations can lead to compensatory mutations that maintain
the base-pairing complementarity
Comparative sequence analysis
 In a structurally correct multiple alignment of RNAs,
conserved base pairs are often revealed by the presence of
frequent correlated compensatory mutations
 Measure sequence covariation: mutual information
 $f_{x_i}$ is the frequency of one of the four bases observed in column i
 $f_{x_i x_j}$ is the joint frequency of the base pairs observed in
columns i and j
$$M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$$
Covariance method
G U C U U C G G A C
G A C U U C G G U C
G G C U U C G G C C
$$M_{2,9} = 3 \cdot \frac{1}{3} \cdot \log_2 \frac{1/3}{1/9} = \log_2 3 \approx 1.59$$
 Mij varies between 0 and 2 bits
 Mij is 2 when columns i and j each appear completely random (all four
bases equally frequent) but are perfectly correlated
 if i and j are uncorrelated, the mutual information is 0
 if either i or j is a highly conserved position, we also get little or
no mutual information
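A short derivation of these bounds, using the standard entropy form of mutual information. The entropy symbols $H_i$, $H_j$, $H_{ij}$ are notation introduced here, not taken from the slides.

```latex
\begin{align*}
H_i &= -\sum_{x_i} f_{x_i} \log_2 f_{x_i}, \qquad
H_{ij} = -\sum_{x_i, x_j} f_{x_i x_j} \log_2 f_{x_i x_j} \\
M_{ij} &= H_i + H_j - H_{ij} \\
H_{ij} &\ge \max(H_i, H_j)
  \;\Rightarrow\; M_{ij} \le \min(H_i, H_j) \le \log_2 4 = 2 \\
H_i &= 0 \ \text{(fully conserved column)}
  \;\Rightarrow\; H_{ij} = H_j \;\Rightarrow\; M_{ij} = 0
\end{align*}
```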
Covariance method
G U C U U C G G A C
G A C U U C G G U C
G G C U U C G G C C
G C C U U C G G G C
$$M_{1,9} = 4 \cdot \frac{1}{4} \cdot \log_2 \frac{1/4}{1/4} = 0$$
$$M_{2,9} = 4 \cdot \frac{1}{4} \cdot \log_2 \frac{1/4}{1/16} = 2$$
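The two worked examples can be checked with a few lines of Python. This is a minimal sketch: the function name `mutual_information` and the choice to pass 1-based column indices (to match the slides) are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information in bits between 1-based alignment columns i and j."""
    n = len(alignment)
    col_i = [seq[i - 1] for seq in alignment]
    col_j = [seq[j - 1] for seq in alignment]
    count_i = Counter(col_i)               # base counts in column i
    count_j = Counter(col_j)               # base counts in column j
    count_ij = Counter(zip(col_i, col_j))  # joint counts of (base_i, base_j)
    total = 0.0
    for (x, y), count in count_ij.items():
        joint = count / n
        expected = (count_i[x] / n) * (count_j[y] / n)
        total += joint * math.log2(joint / expected)
    return total


# The four-sequence alignment from the slides.
alignment = [
    "GUCUUCGGAC",
    "GACUUCGGUC",
    "GGCUUCGGCC",
    "GCCUUCGGGC",
]

print(mutual_information(alignment, 1, 9))      # conserved column 1
print(mutual_information(alignment, 2, 9))      # covarying columns 2 and 9
print(mutual_information(alignment[:3], 2, 9))  # three-sequence example
```

Only base combinations that actually occur contribute to the sum, so no special handling of zero frequencies is needed.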
Comparative analysis
 Start with a multiple alignment
 Predict 2nd structure based on alignment
 Refine alignment based on 2nd structure
 Repeat
 The sequences to be compared must be sufficiently:
 similar that they can be initially aligned by primary sequence
 dissimilar that a number of covarying substitutions can be
detected
Comparative analysis
 How to build 2nd structure based on alignment?
 Greedy method
 choose the pair of columns that have the highest Mij
 make a base pair
 carry on with the second highest Mij
 problem: columns might end up in more than one base pair
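The greedy method and its problem can be sketched as follows. The score table is made up for illustration, and the `exclusive` flag and `threshold` parameter are assumptions added here: with `exclusive=False` the function follows the plain greedy method, while `exclusive=True` is one simple way to keep each column in at most one pair.

```python
def greedy_pairs(scores, exclusive=True, threshold=0.0):
    """Pick column pairs in order of decreasing Mij.

    scores maps (i, j) column pairs to their mutual information.
    With exclusive=True a column already used in a pair is skipped.
    """
    pairs = []
    used = set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (i, j), m in ranked:
        if m <= threshold:
            break  # remaining pairs carry no covariation signal
        if exclusive and (i in used or j in used):
            continue
        pairs.append((i, j))
        used.update((i, j))
    return pairs


# Hypothetical Mij values, not taken from the slides.
scores = {(2, 9): 2.0, (2, 8): 1.5, (3, 8): 1.0, (1, 10): 0.0}

print(greedy_pairs(scores, exclusive=False))  # columns 2 and 8 are reused
print(greedy_pairs(scores, exclusive=True))   # each column used at most once
```

Even the exclusive variant does not rule out crossing pairs, which is one motivation for the grammar-based approach.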
SCFGs and RNA alignments
 An SCFG could be modified to generate columns of
alignments instead of nucleotides
 Requires a fixed number of sequences in the alignment
 Instead, change it to generate the structure!
[Slide graphic: grammar rules over the structure symbols, including "." and ε]
SCFGs and RNA alignments
 How to determine the probability of a structure for a given
sequence?
 A C G U C G U C
 ( ( ( . ) ) ) .
 Use CYK to calculate the maximum probability of a
structure for a given sequence...
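A minimal sketch of both steps: scoring one given structure, and finding the most probable structure with CYK. It assumes a simple three-rule grammar, S → .S | (S)S | ε, which may differ from the grammar on the slide. The rule probabilities in `RULE` and the emission probabilities `SINGLE` and `PAIR` are hypothetical values chosen only for illustration.

```python
import math
from functools import lru_cache

# Hypothetical parameters for the assumed grammar S -> .S | (S)S | eps.
RULE = {"dot": 0.4, "pair": 0.4, "eps": 0.2}
SINGLE = 0.25                                  # any unpaired base
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
PAIR = 1.0 / len(CANONICAL)                    # any canonical base pair


def pair_logp(a, b):
    """Log emission probability of the pair (a, b); -inf if not canonical."""
    return math.log(PAIR) if a + b in CANONICAL else -math.inf


def structure_logp(seq, struct):
    """Log-probability of one dot-bracket structure for seq.

    The assumed grammar is unambiguous, so a structure has exactly one parse.
    """
    logp, stack, n_pairs = 0.0, [], 0
    for pos, sym in enumerate(struct):
        if sym == ".":
            logp += math.log(RULE["dot"]) + math.log(SINGLE)
        elif sym == "(":
            stack.append(pos)
        else:  # ")" closes the most recent "("
            left = stack.pop()
            logp += math.log(RULE["pair"]) + pair_logp(seq[left], seq[pos])
            n_pairs += 1
    # Every pair rule opens one extra S, so eps is used n_pairs + 1 times.
    return logp + (n_pairs + 1) * math.log(RULE["eps"])


def cyk(seq):
    """Return (log-probability, structure) of the most probable parse."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best parse of the half-open span seq[i:j] from nonterminal S.
        if i == j:
            return math.log(RULE["eps"]), ""
        lp, st = best(i + 1, j)  # rule S -> .S
        top = (math.log(RULE["dot"]) + math.log(SINGLE) + lp, "." + st)
        for k in range(i + 1, j):  # rule S -> (S)S, pairing i with k
            emit = pair_logp(seq[i], seq[k])
            if emit == -math.inf:
                continue
            lp_in, st_in = best(i + 1, k)
            lp_out, st_out = best(k + 1, j)
            score = math.log(RULE["pair"]) + emit + lp_in + lp_out
            if score > top[0]:
                top = (score, "(" + st_in + ")" + st_out)
        return top

    return best(0, len(seq))


seq = "ACGUCGUC"
print(structure_logp(seq, "(((.)))."))  # the structure shown on the slide
print(cyk(seq))                         # most probable structure under the sketch
```

The sketch enforces no minimum hairpin-loop length, so it can pair adjacent bases; a realistic grammar would forbid that.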
SCFGs and RNA alignments
 Use a phylogenetic tree (including branch lengths) to:
 determine the probability of a column being single-stranded (unpaired)
 determine the probability of two columns forming a base pair
 Use the SCFG and the column probabilities to determine the
best secondary structure for the alignment
 CYK and the other SCFG algorithms remain basically the same
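The probability of a single column given a tree can be computed with Felsenstein's pruning algorithm. The sketch below assumes a Jukes-Cantor substitution model with uniform base frequencies, which is a simplification chosen here and not necessarily the model used in the cited work. The tree `TREE`, its leaf names, and its branch lengths are hypothetical.

```python
import math

BASES = "ACGU"


def jc_transition(t):
    """Jukes-Cantor probabilities of keeping or changing a base over length t."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return same, diff


def partial(node, column):
    """P(observed leaves below node | node carries base b), for each b in BASES.

    A leaf is a string name; an internal node is a list of
    (child, branch_length) tuples.
    """
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:
        same, diff = jc_transition(t)
        below = partial(child, column)
        total = sum(below)
        # Sum over the child's base: 'same' if unchanged, 'diff' otherwise.
        result = [
            r * (same * below[a] + diff * (total - below[a]))
            for a, r in enumerate(result)
        ]
    return result


def column_probability(tree, column):
    """Probability of an unpaired column under the tree, uniform root prior."""
    return sum(0.25 * p for p in partial(tree, column))


# Hypothetical three-leaf tree: ((s1:0.1, s2:0.1):0.2, s3:0.3)
TREE = [([("s1", 0.1), ("s2", 0.1)], 0.2), ("s3", 0.3)]

print(column_probability(TREE, {"s1": "A", "s2": "A", "s3": "A"}))
print(column_probability(TREE, {"s1": "A", "s2": "C", "s3": "G"}))
```

A pair of columns can be handled the same way by treating the 16 possible base pairs as the states of the substitution model.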
SCFGs and RNA alignments
Knudsen & Hein 1999

AB-RNA-alignments-2010
