RNA secondary structure prediction based on multiple alignments
RNA evolution

Homologous RNAs can share a common secondary structure without sharing significant sequence similarity

A mutation on one side of a base pair can be followed by a compensatory mutation on the other side, which maintains base-pairing complementarity
Comparative sequence analysis

In a structurally correct multiple alignment of RNAs,
conserved base pairs are often revealed by the
presence of frequent correlated compensatory
mutations

Measure sequence covariation: mutual information

$f_{x_i}$ is the frequency of character $x_i$ in column i, where $x_i$ is one of the five possible characters: four nucleotides + gap

$f_{x_i x_j}$ is the joint frequency of the pair $(x_i, x_j)$ observed in columns i and j

$$M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$$
Mutual information
G U C U G G A C
G A C U G G U C
G G C U G G C C
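For columns 2 and 7, each of the three observed pairs (U,A), (A,U), (G,C) has joint frequency 1/3, and each character has frequency 1/3 in its own column:

$$M_{2,7} = 3 \left( \frac{1}{3} \log_2 \frac{1/3}{1/9} \right) = \log_2 3 \approx 1.59$$

$M_{ij}$ is maximal when columns i and j each appear completely random but are perfectly correlated

If i and j are uncorrelated, the mutual information is 0

If either i or j is a highly conserved position, we also get little or no mutual information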
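The definition and the worked example above can be checked with a short Python sketch. The function name `mutual_information` and the 0-based column indices are choices made for this illustration, not part of the slides; the alignment is the three-sequence example from this slide.

```python
from collections import Counter
from math import log2


def mutual_information(aln, i, j):
    """Mutual information (in bits) between columns i and j (0-based) of aln."""
    n = len(aln)
    freq_i = Counter(seq[i] for seq in aln)             # counts of characters in column i
    freq_j = Counter(seq[j] for seq in aln)             # counts of characters in column j
    freq_ij = Counter((seq[i], seq[j]) for seq in aln)  # counts of observed pairs
    mi = 0.0
    for (a, b), count in freq_ij.items():
        f_ab = count / n
        mi += f_ab * log2(f_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC"]
m_2_7 = mutual_information(aln, 1, 6)  # columns 2 and 7 in the slide's 1-based numbering
m_1_8 = mutual_information(aln, 0, 7)  # two fully conserved columns
print(round(m_2_7, 2), m_1_8)
```

Only pairs that actually occur contribute to the sum, so no special handling of zero frequencies is needed. Gap characters are counted like any other character, following the five-character alphabet defined earlier.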
Mutual information

G U C U G G A C
G A C U G G U C
G G C U G G C C
G C C U G G G C

With a fourth sequence, columns 2 and 7 show four distinct pairs, each with joint frequency 1/4:

$$M_{2,7} = 4 \left( \frac{1}{4} \log_2 \frac{1/4}{1/16} \right) = 2$$

Columns 1 and 8 are fully conserved, so they carry no mutual information even though G and C are complementary:

$$M_{1,8} = \log_2 \frac{1}{1 \cdot 1} = 0$$
Comparative analysis

Start with a multiple alignment

Predict the secondary structure based on the alignment

Refine the alignment based on the secondary structure

Repeat

The sequences to be compared must be:

sufficiently similar that they can be initially aligned by primary sequence

sufficiently dissimilar that a number of co-varying substitutions can be detected
Comparative analysis

How to build a secondary structure based on the alignment?

Greedy method

choose the pair of columns that has the highest $M_{ij}$

make a base pair

carry on with the second highest $M_{ij}$

Problem: a column might end up in more than one base pair
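The greedy method can be sketched in Python as follows. Skipping a candidate pair when one of its columns is already paired is one possible way around the problem noted above; it is an assumption of this sketch, not something the slides prescribe. The names `greedy_pairs` and `threshold` are hypothetical, and the sketch checks neither base complementarity nor nesting.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(aln, i, j):
    """Mutual information (in bits) between columns i and j (0-based)."""
    n = len(aln)
    f_i = Counter(s[i] for s in aln)
    f_j = Counter(s[j] for s in aln)
    f_ij = Counter((s[i], s[j]) for s in aln)
    return sum((c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
               for (a, b), c in f_ij.items())


def greedy_pairs(aln, threshold=0.0):
    """Pick column pairs by decreasing mutual information, one pair per column."""
    length = len(aln[0])
    scores = {(i, j): mutual_information(aln, i, j)
              for i, j in combinations(range(length), 2)}
    used, pairs = set(), []
    for (i, j), m in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if m <= threshold:
            break                      # remaining pairs carry no covariation signal
        if i in used or j in used:
            continue                   # assumption: a column may join only one pair
        pairs.append((i, j))
        used.update((i, j))
    return pairs


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
print(greedy_pairs(aln))
```

On the four-sequence example only columns 2 and 7 (indices 1 and 6) covary, so they are the only pair selected. Conserved helices are invisible to this method, which is one motivation for combining mutual information with a Nussinov-style recursion.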
Nussinov and alignments

Notations

aln: the RNA alignment

aln_k: the k-th sequence in the alignment

aln[i, j]: the RNA alignment from position i to j

str: the best secondary structure for aln (over the alphabet {(, ), .})

str[i, j]: the best secondary structure for aln[i, j]

score[i, j]: the number of base pairs in str[i, j]

aln[i] · aln[j] if, for all k, aln_k[i] · aln_k[j], where · denotes complementarity (the two characters can form a base pair)
Nussinov and alignments

str[i, j] is the best of four cases:

i unpaired and str[i+1, j]

j unpaired and str[i, j-1]

i paired with j, if aln[i] · aln[j], and str[i+1, j-1]

str[i, k] and str[k+1, j] for some i < k < j
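[Figure: the four cases drawn on the interval from i to j: i followed by i+1..j; i..j-1 followed by j; i paired with j around i+1..j-1; i..k followed by k+1..j]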
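The four cases above can be sketched as a dynamic program in Python. This is a minimal illustration under stated assumptions: the set `PAIRS` allows Watson-Crick pairs plus G-U wobble pairs, gaps never pair, no minimum hairpin loop length is enforced (the recursion on the slide has none), and each base pair scores 1. The function names are hypothetical.

```python
# Assumption: Watson-Crick pairs plus G-U wobble; a gap ("-") pairs with nothing.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def columns_pair(aln, i, j):
    """aln[i] . aln[j]: columns i and j are complementary in every sequence."""
    return all((seq[i], seq[j]) in PAIRS for seq in aln)


def nussinov_alignment(aln):
    """Return (number of base pairs, dot-bracket structure) for the alignment."""
    n = len(aln[0])
    # score[i][j] = best score for aln[i..j]; cells with i >= j stay 0.
    score = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j], score[i][j - 1])     # i or j unpaired
            if columns_pair(aln, i, j):
                best = max(best, score[i + 1][j - 1] + 1)    # i paired with j
            for k in range(i + 1, j):                        # bifurcation at k
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best

    # Traceback: replay the case that produced each optimal score.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
        elif columns_pair(aln, i, j) and score[i][j] == score[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return score[0][n - 1], "".join(structure)


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
n_pairs, structure = nussinov_alignment(aln)
print(n_pairs, structure)
```

The fill takes cubic time in the alignment length, as in single-sequence Nussinov; the only change is that the pairing test looks at whole columns instead of single characters.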
Nussinov and alignments

Scoring base pairs

on one sequence: +1
on an alignment: +1 + $M_{ij}$

Base pairs between columns with high mutual
information are favoured

Other scoring schemes?
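As a small illustration of the +1 + M_ij scheme, the sketch below computes the score of a candidate base pair. The names `pair_score` and `mutual_information` are hypothetical. One observation the sketch makes explicit: for a single sequence every column is trivially conserved, so the mutual information is 0 and the alignment score reduces to the single-sequence score of +1.

```python
from collections import Counter
from math import log2


def mutual_information(aln, i, j):
    """Mutual information (in bits) between columns i and j (0-based)."""
    n = len(aln)
    f_i = Counter(s[i] for s in aln)
    f_j = Counter(s[j] for s in aln)
    f_ij = Counter((s[i], s[j]) for s in aln)
    return sum((c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
               for (a, b), c in f_ij.items())


def pair_score(aln, i, j):
    """Score for pairing columns i and j: 1 plus their mutual information."""
    return 1.0 + mutual_information(aln, i, j)


aln = ["GUCUGGAC", "GACUGGUC", "GGCUGGCC", "GCCUGGGC"]
covarying = pair_score(aln, 1, 6)         # columns 2 and 7: 1 + 2 bits
conserved = pair_score(aln, 0, 7)         # columns 1 and 8: 1 + 0 bits
single = pair_score(["GUCUGGAC"], 1, 6)   # one sequence: plain +1
print(covarying, conserved, single)
```

In a Nussinov-style recursion this value would replace the constant 1 added when i is paired with j, so pairs supported by covariation outrank pairs that are merely complementary.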
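[Figure: comparison of three structures: the true structure, the Nussinov prediction on the alignment, and the Nussinov prediction on a single sequence]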
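From alignment structure to sequence structure

Aligned sequence (9 columns): A C G - - A A - U
Alignment structure: five unpaired columns (.) and two base pairs, labelled 1 and 2
Sequence without gaps (6 positions): A C G A A U
Sequence structure: four unpaired positions (.) and base pair 1 only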

AB-RNA-alignments-2011
