The document discusses RNA secondary structure prediction based on multiple sequence alignments. It explains that homologous RNAs can share a common secondary structure without high sequence similarity due to compensatory mutations. Comparative sequence analysis can reveal conserved base pairs through frequent correlated compensatory mutations detected as high mutual information between columns in an alignment. The mutual information measure is described and secondary structure can be predicted through a greedy approach pairing columns with highest mutual information. Refining the alignment based on predicted structure can improve predictions.
2. RNA evolution
Homologous RNAs can have a common secondary structure without sharing significant sequence similarity
Mutations can be followed by compensatory mutations that maintain base-pairing complementarity
3. Comparative sequence analysis
In a structurally correct multiple alignment of RNAs,
conserved base pairs are often revealed by the
presence of frequent correlated compensatory
mutations
Measure sequence covariation: mutual information
$f_{x_i}$ is the frequency of one of the five possible characters (four nucleotides + gap) observed in column i
$f_{x_i x_j}$ is the joint frequency of the pairs observed in columns i and j
$$M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$$
4. Mutual information
G U C U G G A C
G A C U G G U C
G G C U G G C C
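Columns 2 and 7 show three distinct pairs (U-A, A-U, G-C), each with joint frequency 1/3:
$$M_{2,7} = \sum_{x_2, x_7} f_{x_2 x_7} \log_2 \frac{f_{x_2 x_7}}{f_{x_2} f_{x_7}} = 3 \left( \frac{1}{3} \log_2 \frac{1/3}{1/9} \right) = \log_2 3 \approx 1.585$$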
$M_{ij}$ is maximal if columns i and j each appear completely random but are perfectly correlated
if i and j are uncorrelated, the mutual information is 0
if either i or j is a highly conserved position, we also get little or no mutual information
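The mutual information formula can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `mutual_information` and `column` are hypothetical, columns are assumed to be plain strings with one character per sequence, and the alignment is the three-sequence example from this slide.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length strings, one character per sequence
    (A, C, G, U, or '-' for a gap).
    """
    n = len(col_i)
    f_i = Counter(col_i)               # single-character counts in column i
    f_j = Counter(col_j)               # single-character counts in column j
    f_ij = Counter(zip(col_i, col_j))  # counts of the observed character pairs
    total = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        total += p_xy * log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return total

def column(aln, i):
    """Return column i of the alignment, 1-based as on the slides."""
    return "".join(seq[i - 1] for seq in aln)

aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC"]
m_2_7 = mutual_information(column(aln, 2), column(aln, 7))
m_1_8 = mutual_information(column(aln, 1), column(aln, 8))
print(round(m_2_7, 3), m_1_8)  # prints: 1.585 0.0
```

Only pairs that actually occur in the two columns contribute to the sum, so unobserved pairs (frequency 0) never reach the logarithm. The fully conserved columns 1 and 8 give 0, which matches the last property listed above.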
5. Mutual information
For the four-sequence alignment below:
$$M_{2,7} = 4 \left( \frac{1}{4} \log_2 \frac{1/4}{1/16} \right) = 2$$
$$M_{1,8} = \log_2 \frac{1}{1} = 0$$
G U C U G G A C
G A C U G G U C
G G C U G G C C
G C C U G G G C
6. Comparative analysis
Start with a multiple alignment
Predict the secondary structure based on the alignment
Refine the alignment based on the secondary structure
Repeat
The sequences to be compared must be sufficiently:
similar that they can be initially aligned by primary sequence
dissimilar that a number of co-varying substitutions can be detected
7. Comparative analysis
How to build a secondary structure based on the alignment?
Greedy method
choose the pair of columns that has the highest $M_{ij}$
make a base pair
carry on with the second highest $M_{ij}$
Problem: columns might end up in more than one base pair
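The greedy method can be sketched as follows. This is an illustration under stated assumptions, not the slides' own procedure: `greedy_pairs` and its `min_mi` cutoff are hypothetical, and the sketch sidesteps the problem noted above by skipping any column that is already paired, which is only one possible fix.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    return sum(
        (c / n) * log2((c / n) / ((f_i[x] / n) * (f_j[y] / n)))
        for (x, y), c in Counter(zip(col_i, col_j)).items()
    )

def greedy_pairs(aln, min_mi=0.0):
    """Pair columns greedily by decreasing mutual information.

    Returns a list of (i, j, M_ij) with 1-based column indices.
    Assumption of this sketch: a column that is already in a base pair
    is skipped, so no column is paired twice.
    """
    cols = ["".join(c) for c in zip(*aln)]
    scored = [
        (mutual_information(cols[i], cols[j]), i + 1, j + 1)
        for i, j in combinations(range(len(cols)), 2)
    ]
    # Highest M_ij first; ties are broken by column index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    used, pairs = set(), []
    for mi, i, j in scored:
        if mi <= min_mi:
            break                      # remaining pairs carry no signal
        if i in used or j in used:
            continue                   # column already paired
        used.update((i, j))
        pairs.append((i, j, mi))
    return pairs

aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
pairs = greedy_pairs(aln)
print(pairs)  # prints: [(2, 7, 2.0)]
```

On the four-sequence example only columns 2 and 7 co-vary, so they form the single predicted pair. The sketch does not check that the paired characters are complementary or that the chosen pairs are nested.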
8. Nussinov and alignments
Notations
aln: the RNA alignment
aln_k: the k-th sequence in the alignment
aln[i, j]: the RNA alignment from position i to j
str: the best secondary structure for aln (over the alphabet {(, ), .})
str[i, j]: the best secondary structure for aln[i, j]
score[i, j]: the number of base pairs in str[i, j]
aln[i] · aln[j] (columns i and j can base-pair) if, for all k, aln_k[i] · aln_k[j]
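The column-level pairing test in the last line of the notation can be sketched as a small predicate. The slides do not say which character pairs count as complementary, so the set `PAIRS` below (Watson-Crick pairs plus the G-U wobble pair, with gaps never pairing) is an assumption, and the name `columns_can_pair` is hypothetical.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
# Gaps ('-') are not in the set, so a gapped position never pairs.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
    ("G", "U"), ("U", "G"),
}

def columns_can_pair(aln, i, j):
    """True if aln_k[i] can pair with aln_k[j] for every sequence k.

    i and j are 1-based column indices, as on the slides.
    """
    return all((seq[i - 1], seq[j - 1]) in PAIRS for seq in aln)

aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
print(columns_can_pair(aln, 2, 7))  # prints: True
print(columns_can_pair(aln, 3, 4))  # prints: False
```

Requiring the pair in every sequence is strict: a single non-complementary sequence rules the column pair out.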
9. Nussinov and alignments
str[i, j] is the best of:
i unpaired, followed by str[i+1, j]
str[i, j-1], followed by j unpaired
i paired with j (requires aln[i] · aln[j]) around str[i+1, j-1]
str[i, k] followed by str[k+1, j], for some i < k < j
[Figure: diagrams of the four cases over positions i..j, marking i+1, j-1, and the split point k, k+1]
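The four cases translate into a dynamic program over column intervals. The following is a minimal sketch under stated assumptions: the pairing rule (`PAIRS`, Watson-Crick plus G-U wobble) is not given on the slides, no minimum hairpin loop length is enforced because the slides do not mention one, and the names `can_pair` and `nussinov_alignment` are hypothetical. Indices are 0-based inside the code.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def can_pair(aln, i, j):
    """Columns i and j (0-based) pair if they pair in every sequence."""
    return all((s[i], s[j]) in PAIRS for s in aln)

def nussinov_alignment(aln):
    """Return (number of base pairs, dot-bracket string) for an alignment."""
    n = len(aln[0])
    if n == 0:
        return 0, ""
    # score[i][j] = maximum number of base pairs in columns i..j.
    # Entries with j <= i stay 0 (empty or single-column intervals).
    score = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j], score[i][j - 1])   # i or j unpaired
            if can_pair(aln, i, j):                        # i pairs with j
                best = max(best, score[i + 1][j - 1] + 1)
            for k in range(i + 1, j):                      # bifurcation
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best

    # Traceback: replay the cases to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(aln, i, j) and score[i][j] == score[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return score[0][n - 1], "".join(structure)

aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
print(nussinov_alignment(aln))  # prints: (4, '(((())))')
```

Because no minimum loop length is enforced, adjacent columns may pair (columns 4 and 5 in the example), which is not physically realistic; a practical version would likely add such a constraint. The triple loop makes the run time cubic in the alignment length.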
10. Nussinov and alignments
Scoring base pairs
on one sequence: +1
on an alignment: +1 + $M_{ij}$
Base pairs between columns with high mutual
information are favoured
Other scoring schemes?
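One way to write the alignment scoring into the recursion is sketched below. This is a reading of the slides, not a formula they state: it assumes that score[i, j] now holds the weighted score of the best structure on aln[i, j] rather than a plain count of base pairs, and that score[i, j] = 0 whenever j ≤ i.

```latex
\[
\mathrm{score}[i,j] = \max
\begin{cases}
\mathrm{score}[i+1,j] & \text{($i$ unpaired)} \\
\mathrm{score}[i,j-1] & \text{($j$ unpaired)} \\
\mathrm{score}[i+1,j-1] + 1 + M_{ij} & \text{if } \mathrm{aln}[i] \cdot \mathrm{aln}[j] \\
\max\limits_{i<k<j} \bigl( \mathrm{score}[i,k] + \mathrm{score}[k+1,j] \bigr) & \text{(bifurcation)}
\end{cases}
\]
```

With a single sequence every $M_{ij}$ is 0, so the third case reduces to the +1 used for one sequence.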