Protein structure prediction
Prediction in bioinformatics
Important prediction problems:
Protein sequence from genomic DNA.
Protein 3D structure from sequence.
Protein function from structure.
Protein function from sequence.
From DNA to Cell Function
[Diagram: DNA sequence (split into genes) -> amino acid sequence -> protein 3D structure -> protein function -> cell activity; arrow labels: codes for, folds into, dictates, determines, has]
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG
HLLTKSPSLN AAKSELDKAI GRNCNGVITK
DEAEKLFNQD VDAAVRGILR NAKLKPVYDS
LDAVRRCALI NMVFQMGETG VAGFTNSLRM
LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI
TTFRTGTWDA YKNL
?
Protein structure: Limitations
 Not all proteins or parts of proteins assume a well-defined
3D structure in solution.
 Protein structure is not static; different parts of the structure
show different degrees of thermal motion.
 There may be a number of slightly different
conformations in solution.
 Some proteins undergo conformational changes when
interacting with certain substances.
 Expected best residue-by-residue accuracies for secondary
structure prediction from multiple protein sequence
alignment.
 To address detailed functional biological questions.
Experimental Protein Structure Determination
 X-ray crystallography
 the most advanced method available for obtaining high-resolution
structural information about biological macromolecules
 in vitro
 needs crystals
 ~$100-200K per structure
 NMR
 fairly accurate
 in solution
 no need for crystals
 limited to very small proteins
 Cryo-electron microscopy
 imaging technology
 historically low resolution; modern instruments can reach near-atomic resolution
Why predict protein structure?
 Millions of sequences are known, but only 125,309 structures.
 Structural knowledge brings understanding of function and
mechanism of action.
 Predicted structures can be used in structure-based drug design.
 It can help us understand the effects of mutations on structure and
function.
 To analyze the sequence-structure gap.
 Can help in prediction of function.
 It is a very interesting scientific problem, with about 50 years of effort behind it.
 Prediction in one dimension
 Secondary structure prediction
 Surface accessibility prediction
Secondary structure prediction
 Historically, the first structure prediction methods predicted
secondary structure.
 Can be used to improve alignment accuracy.
 Can be used to detect domain boundaries within proteins
with remote sequence homology.
 Often the first step towards 3D structure prediction.
 Informative for mutagenesis studies.
Predicting Secondary Structure From Primary Structure
 Accuracy 64-75%.
 Higher accuracy for α-helices than for β-sheets.
 Accuracy depends on the protein family.
 Predictions for engineered (artificial) proteins are less accurate.
Assumptions
 The entire information for forming secondary structure is contained
in the primary sequence.
 Side groups of residues determine structure.
 Examining windows of 13-17 residues is sufficient to predict secondary
structure.
 α-helices are 5-40 residues long
 β-strands are 5-10 residues long
Why Secondary Structure Prediction?
 Simply an easier problem than 3D structure prediction.
 Accurate secondary structure prediction can provide important
information for tertiary structure prediction.
 Improving alignment accuracy.
 Protein function prediction.
 Protein classification.
Protein structure prediction
 The inference of the three-dimensional structure of
a protein from its amino acid sequence.
 i.e. the prediction of its folding and its secondary and tertiary
structure from its primary structure.
 Structure prediction is fundamentally different from the
inverse problem of protein design.
 Protein structure prediction is one of the most important
goals pursued by bioinformatics and theoretical chemistry.
 It is highly important in medicine (in drug design)
and biotechnology (in the design of novel enzymes).
Methods of structure prediction
Ab initio protein folding approaches
Comparative (homology) modelling
Fold recognition/threading
History of protein secondary structure prediction
First generation
Based on single residue statistics.
Examples: Chou-Fasman method, LIM method, GOR I, etc.
Accuracy: low
Second generation
Based on segment statistics.
Examples: ALB method, GOR III, etc.
Accuracy: ~60%
Third generation
Based on long-range interactions and homology.
Example: PHD
Accuracy: ~70%
First generation methods:
single residue statistics
Chou & Fasman (1974 & 1978):
 Some residues have particular secondary-structure preferences.
 Based on experimental frequencies of residues in α-helices, β-sheets,
and coils.
Examples: Glu: α-helix
Val: β-strand
 Accuracy: Q3 ~50-60%
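The Q3 figure quoted for these methods is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) agrees with the observed one. A minimal sketch, assuming prediction and observation are given as equal-length strings; the function name `q3` and the example strings are illustrative and not taken from the slides:

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues with the correct three-state label (H, E or C)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue example: 8 of the 10 labels agree.
score = q3("HHHHCCEEEC", "HHHCCCEEEE")
print(f"Q3 = {score:.1f}%")
```

A per-state variant (helix only, strand only) would restrict the count to positions observed in that state.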
Chou-Fasman statistics
 R: amino acid; S: secondary structure
 f(R,S): number of occurrences of R in S
 f(R): total number of occurrences of R
 Ns: total number of amino acids in conformation S
 N: total number of amino acids
 P(R,S): propensity of amino acid R to be in structure S
 P(R,S) = (f(R,S)/f(R))/(Ns/N)
Example
 #residues = 20,000
 #helix = 4,000
 #Ala = 2,000
 #Ala in helix = 500
 f(Ala,α) = 500/20,000
 f(Ala) = 2,000/20,000
 p(α) = Nα/N = 4,000/20,000
 P(Ala,α) = (500/2,000) / (4,000/20,000) = 1.25
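The propensity calculation in this example can be reproduced in a few lines. This is a minimal sketch working from raw counts; the function and parameter names are illustrative, not from the slides:

```python
def propensity(n_r_in_s: int, n_r: int, n_s: int, n_total: int) -> float:
    """P(R,S) = (f(R,S) / f(R)) / (Ns / N), computed from raw counts.

    n_r_in_s: occurrences of residue R in structure S
    n_r:      occurrences of residue R overall
    n_s:      residues in structure S
    n_total:  residues in the whole data set
    """
    if min(n_r, n_s, n_total) <= 0:
        raise ValueError("n_r, n_s and n_total must be positive")
    return (n_r_in_s / n_r) / (n_s / n_total)


# Numbers from the slide: 20,000 residues, 4,000 in helices,
# 2,000 alanines, 500 of them in helices.
p_ala_helix = propensity(n_r_in_s=500, n_r=2_000, n_s=4_000, n_total=20_000)
print(f"P(Ala, helix) = {p_ala_helix:.2f}")
```

A value above 1 means the residue occurs in that structure more often than expected from the overall composition; a value below 1 means it occurs less often.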
Second generation methods: segment statistics
 Similar to single-residue methods, but incorporating
additional information (adjacent residues, segmental
statistics).
 Problems:
 Low accuracy: reported Q3 below 66%.
 Q3 of β-strands (E): 28-48%.
 Predicted structures were too short.
The GOR method
 Developed by Garnier, Osguthorpe & Robson.
 Builds on Chou-Fasman Pij values.
 Evaluates each residue plus the adjacent 8 N-terminal and 8
carboxyl-terminal residues.
 Sliding window of 17 residues.
 Underpredicts β-strand regions.
 GOR method accuracy: Q3 ~64%.
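The 17-residue window (8 residues on either side of the one being evaluated) can be illustrated with a short sketch. It shows only the window extraction, not the GOR scoring itself; the padding character and function name are assumptions made for illustration:

```python
def windows(sequence: str, flank: int = 8, pad: str = "-"):
    """Yield (index, residue, window) with `flank` residues on each side.

    Positions that fall off either end of the chain are filled with `pad`,
    so every window has length 2 * flank + 1 (17 for flank = 8).
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        # The central residue of this slice is sequence[i].
        yield i, residue, padded[i : i + 2 * flank + 1]


# First 30 residues of the example sequence shown earlier in the deck.
seq = "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIG"
all_windows = list(windows(seq))
```

A window-based predictor would then score each window and assign the resulting state to its central residue.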
Third generation methods
 Third generation methods reached 77% accuracy.
 They rest on two new ideas:
1. A biological idea:
Using evolutionary information based on
conservation analysis of multiple sequence
alignments.
2. A technological idea:
Using neural networks.
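One simple way to turn a multiple sequence alignment into conservation information is a per-column residue frequency profile. The sketch below is illustrative only: the toy alignment is hypothetical, and this is not the exact input encoding of any particular third-generation method.

```python
from collections import Counter


def column_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Per-column residue frequencies of a multiple sequence alignment.

    Gap characters ('-') are ignored; a column of only gaps gives {}.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append(
            {res: n / total for res, n in counts.items()} if total else {}
        )
    return profile


# Hypothetical toy alignment of four related sequences.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MK-LA"]
profile = column_profile(toy_alignment)
```

Fully conserved columns (here the first and fourth) give a single residue with frequency 1.0, while variable columns spread the frequency over several residues.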
Artificial Neural Networks
An attempt to imitate the human brain (assuming that
this is the way it works).
Neural network models
- machine learning approach
- provide training sets of structures (e.g. α-helices, non-α-helices)
- computers are trained to recognize patterns in known
secondary structures
- provide test set (proteins with known structures)
- accuracy ~70-75%
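Correlation coefficient
True positive: pα
False positive (overpredicted): oα
True negative: nα
False negative (underpredicted): uα
$$C_\alpha = \frac{p_\alpha n_\alpha - u_\alpha o_\alpha}{\sqrt{(n_\alpha + u_\alpha)(n_\alpha + o_\alpha)(p_\alpha + u_\alpha)(p_\alpha + o_\alpha)}}$$
Cα = 1 (= 100%) for a perfect prediction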
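From the four counts named on this slide (true positives p, true negatives n, overpredictions o, underpredictions u), the coefficient is (p·n − u·o) divided by the square root of (n+u)(n+o)(p+u)(p+o). A minimal sketch follows; the function name and the example counts are hypothetical, and returning 0.0 for a zero denominator is a convention chosen here, not something stated in the slides:

```python
import math


def correlation_coefficient(p: int, n: int, o: int, u: int) -> float:
    """Correlation coefficient for one state from the four counts.

    p = true positives, n = true negatives,
    o = false positives (overpredicted), u = false negatives (underpredicted).
    """
    denominator = math.sqrt((n + u) * (n + o) * (p + u) * (p + o))
    if denominator == 0:
        return 0.0  # assumed convention for degenerate cases
    return (p * n - u * o) / denominator


# Hypothetical counts for a 100-residue protein.
perfect = correlation_coefficient(p=30, n=70, o=0, u=0)
partial = correlation_coefficient(p=20, n=60, o=10, u=10)
print(f"perfect: {perfect:.2f}, partial: {partial:.2f}")
```

Unlike a plain percentage of correct residues, this measure penalizes both over- and underprediction of the state.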
Reasons for improved accuracy
 Align the sequence with other related proteins of the
same protein family.
 Find members that have a known structure.
 If there are significant matches between structure and sequence,
assign secondary structures to the corresponding
residues.
New and Improved Third-Generation Methods
Exploit evolutionary information. Based on conservation
analysis of multiple sequence alignments.
 PHD (Q3 ~ 70%)
Rost, B., Sander, C. (1993) J. Mol. Biol. 232, 584-599.
 PSIPRED (Q3 ~ 77%)
Jones, D. T. (1999) J. Mol. Biol. 292, 195-202.
Arguably remains the top secondary structure prediction method.
Secondary Structure Prediction Summary
1st Generation - 1970s
 Q3 = 50-55%
 Chou & Fasman, GOR
2nd Generation - 1980s
 Q3 = 60-65%
 Qian & Sejnowski, GOR III
3rd Generation - 1990s
 Q3 = 70-80%
 PHD, PSIPRED
Many 3rd+ generation methods exist:
PSIPRED - http://bioinf.cs.ucl.ac.uk/psipred/
JPRED - http://www.compbio.dundee.ac.uk/~www-jpred/
PHD - http://www.embl-heidelberg.de/predictprotein/predictprotein.html
NNPRED - http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
Protein 3D structure data
The structure of a protein consists of the 3D (X,Y,Z) coordinates of each
non-hydrogen atom of the protein.
Some protein structures also include coordinates of covalently linked
prosthetic groups, non-covalently linked ligand molecules, or metal ions.
For some purposes (e.g. structural alignment) only the Cα coordinates are
needed.
Example of PDB format (last five columns: X, Y, Z, occupancy, temperature factor):
ATOM 18 N GLY 27 40.315 161.004 11.211 1.00 10.11
ATOM 19 CA GLY 27 39.049 160.737 10.462 1.00 14.18
ATOM 20 C GLY 27 38.729 159.239 10.784 1.00 20.75
ATOM 21 O GLY 27 39.507 158.484 11.404 1.00 21.88
Note: the ATOM records shown provide no information about connectivity between
atoms. The last two numbers (occupancy, temperature factor) relate to
disorder of atomic positions in crystals.
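The four ATOM lines in this example can be read into records with a short parser. This sketch assumes the simplified, whitespace-separated layout shown on the slide; real PDB files use fixed column positions and carry extra fields such as a chain identifier, so this is not a general PDB reader. The `Atom` type and `parse_atom_line` function are illustrative names:

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    residue_number: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one whitespace-separated ATOM line in the simplified slide layout."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "ATOM":
        raise ValueError(f"unexpected line: {line!r}")
    _, serial, name, residue, resnum, x, y, z, occ, temp = fields
    return Atom(int(serial), name, residue, int(resnum),
                float(x), float(y), float(z), float(occ), float(temp))


excerpt = """\
ATOM 18 N GLY 27 40.315 161.004 11.211 1.00 10.11
ATOM 19 CA GLY 27 39.049 160.737 10.462 1.00 14.18
ATOM 20 C GLY 27 38.729 159.239 10.784 1.00 20.75
ATOM 21 O GLY 27 39.507 158.484 11.404 1.00 21.88
"""
atoms = [parse_atom_line(line) for line in excerpt.splitlines()]
# Keep only the C-alpha atoms, as used for structural alignment.
ca_atoms = [atom for atom in atoms if atom.name == "CA"]
```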
Protein structure: Some computational tasks
Building a protein structure model from X-ray data
Building a protein structure model from NMR data
Computing the energy for a given protein structure (conformation)
Energy minimization: finding the structure with the minimal energy according
to some empirical force field
Simulating the protein folding process (molecular dynamics)
Structure visualization
Computing secondary structure from atomic coordinates
Protein superposition, structural alignment
Protein fold classification
Threading: finding a fold (prototype structure) that fits a sequence
Docking: fitting ligands onto a protein surface by molecular dynamics or energy
minimization
Protein 3D structure prediction from sequence
Viewing protein structures
When looking at a protein structure, we may ask the following types of
questions:
 Is a particular residue on the inside or outside of a protein?
 Which amino acids interact with each other?
 Which amino acids are in contact with a ligand (DNA, peptide
hormone, small molecule, etc.)?
 Is an observed mutation likely to disturb the protein structure?
Standard capabilities of protein structure software:
 Display of protein structures in different ways (wireframe, backbone,
sticks, spacefill, ribbon).
 Highlighting of individual atoms, residues or groups of residues.
 Calculation of interatomic distances.
 Advanced feature: superposition of related structures.
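An interatomic distance is simply the Euclidean distance between two coordinate triples. As a small sketch, the snippet below uses the N and CA coordinates of GLY 27 from the PDB excerpt shown earlier in the deck; the variable names are illustrative, and PDB coordinates are in ångströms:

```python
import math

# Coordinates (X, Y, Z) of two atoms of GLY 27 from the PDB excerpt.
n_gly27 = (40.315, 161.004, 11.211)
ca_gly27 = (39.049, 160.737, 10.462)

# math.dist (Python 3.8+) returns the Euclidean distance between two points.
distance = math.dist(n_gly27, ca_gly27)
print(f"N-CA distance: {distance:.2f} angstrom")
```

The same calculation over all atom pairs, with a distance cutoff, is one way to answer questions such as which residues are in contact with a ligand.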
Example: c-abl oncoprotein SH2 domain, display wireframe
Example: c-abl oncoprotein SH2 domain, display sticks
Example: c-abl oncoprotein SH2 domain, display backbone
Example: c-abl oncoprotein SH2 domain, display spacefill
Example: c-abl oncoprotein SH2 domain, display ribbons


protein structure prediction in bioinformatics.ppt

  • 2. Prediction in bioinformatics Important prediction problems: Protein sequence from genomic DNA. Protein 3D structure from sequence. Protein function from structure. Protein function from sequence.
  • 3. From DNA to Cell Function DNA sequence (split into genes) AminoAcid Sequence Protein 3D Structure Protein Function Cell Activity codes for folds into dictates determines has MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL ?
  • 4. Protein structure: Limitations Not all proteins or parts of proteins assume a well-defined 3D structure in solution. Protein structure is not static, there are various degrees of thermal motion for different parts of the structure. There may be a number of slightly different conformations in solution. Some proteins undergo conformational changes when interacting with certain substances. Expected best residue-by-residue accuracies for secondary structure prediction from multiple protein sequence alignment. To address detailed functional biological questions.
  • 5. Experimental Protein Structure Determination X-ray crystallography: the most advanced method available for obtaining high-resolution structural information about biological macromolecules; in vitro; needs crystals; ~$100-200K per structure. NMR: fairly accurate; in vivo; no need for crystals; limited to very small proteins. Cryo-electron microscopy: imaging technology; low resolution.
  • 6. Why predict protein structure? Millions of sequences are known, but only 125,309 structures. Structural knowledge brings understanding of function and mechanism of action. Predicted structures can be used in structure-based drug design. Prediction can help us understand the effects of mutations on structure and function, analyze the sequence-structure gap, and predict function. It is also a very interesting scientific problem, the subject of 50 years of effort. Prediction in one dimension: secondary structure prediction, surface accessibility prediction.
  • 7. Secondary structure prediction Historically, the first structure prediction methods predicted secondary structure. Can be used to improve alignment accuracy. Can be used to detect domain boundaries within proteins with remote sequence homology. Often the first step towards 3D structure prediction. Informative for mutagenesis studies.
  • 8. Predicting Secondary Structure From Primary Structure Accuracy is 64-75%, higher for α-helices than for β-sheets, and dependent on protein family; predictions for engineered (artificial) proteins are less accurate. Assumptions: The entire information for forming secondary structure is contained in the primary sequence. Side groups of residues determine structure. Examining windows of 13-17 residues is sufficient to predict secondary structure. α-helices are 5-40 residues long; β-strands are 5-10 residues long.
  • 9. Why Secondary Structure Prediction? It is simply an easier problem than 3D structure prediction. Accurate secondary structure prediction provides important information for tertiary structure prediction. Improving alignment accuracy. Protein function prediction. Protein classification.
  • 10. Protein structure prediction The inference of the three-dimensional structure of a protein from its amino acid sequence. i.e. the prediction of its folding and its secondary and tertiary structure from its primary structure. Structure prediction is fundamentally different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. It is highly important in medicine (in drug design) and biotechnology (in the design of novel enzymes).
  • 11. Methods of structure prediction Ab initio protein folding approaches Comparative (homology) modelling Fold recognition/threading
  • 12. History of protein secondary structure prediction First generation: based on single-residue statistics. Examples: Chou-Fasman method, Lim method, GOR I, etc. Accuracy: low. Second generation: based on segment statistics. Examples: ALB method, GOR III, etc. Accuracy: ~60%. Third generation: based on long-range interactions, homology based. Example: PHD. Accuracy: ~70%.
  • 13. First generation methods: single residue statistics Chou & Fasman (1974 & 1978): Some residues have particular secondary-structure preferences. Based on experimental frequencies of residues in α-helices, β-sheets, and coils. Examples: Glu → α-helix, Val → β-strand. Accuracy: Q3 ~50-60%.
  • 14. Chou-Fasman statistics R: amino acid. S: secondary structure. f(R,S): number of occurrences of R in S. Ns: total number of amino acids in conformation S. N: total number of amino acids. P(R,S): propensity of amino acid R to be in structure S. P(R,S) = (f(R,S)/f(R)) / (Ns/N)
  • 15. Example #residues = 20,000, #helix = 4,000, #Ala = 2,000, #Ala in helix = 500. f(Ala,α) = 500/20,000. f(Ala) = 2,000/20,000. p(α) = Nα/N = 4,000/20,000. P(Ala,α) = (500/2,000) / (4,000/20,000) = 1.25
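The worked example on slide 15 can be checked with a few lines of Python. This is only a sketch of the propensity formula from slide 14; the function and parameter names are illustrative and do not come from the slides.

```python
def propensity(n_res_in_state, n_res, n_state, n_total):
    """Propensity P(R,S) = (f(R,S)/f(R)) / (Ns/N), computed from raw counts.

    n_res_in_state: occurrences of residue R in structure S
    n_res:          total occurrences of residue R
    n_state:        total residues in structure S (Ns)
    n_total:        total residues in the data set (N)
    """
    fraction_of_residue_in_state = n_res_in_state / n_res
    fraction_of_all_in_state = n_state / n_total
    return fraction_of_residue_in_state / fraction_of_all_in_state


# Counts from slide 15: 500 of 2,000 Ala are helical; 4,000 of 20,000 residues are helical.
p_ala_helix = propensity(n_res_in_state=500, n_res=2000, n_state=4000, n_total=20000)
print(round(p_ala_helix, 2))  # 1.25
```

A value above 1 means the residue occurs in that structure more often than expected from the overall composition, so here Ala counts as a helix former.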
  • 16. Second generation methods: segment statistics Similar to single-residue methods, but incorporating additional information (adjacent residues, segmental statistics). Problems: Low accuracy, with Q3 below 66%. Q3 of β-strands (E): 28%-48%. Predicted structures were too short.
  • 17. The GOR method Developed by Garnier, Osguthorpe & Robson. Builds on Chou-Fasman Pij values. Evaluates each residue plus the 8 adjacent N-terminal and 8 adjacent carboxyl-terminal residues, giving a sliding window of 17 residues. Underpredicts β-strand regions. GOR method accuracy: Q3 = ~64%.
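The 17-residue sliding window described on slide 17 can be sketched as follows. This shows only the windowing step, not the GOR scoring itself. Padding the chain ends with "-" is an assumption made here so that every residue gets a full window; the helper name `windows` is illustrative.

```python
def windows(sequence, flank=8, pad="-"):
    """Yield (centre_residue, window) pairs with `flank` residues on each side.

    The sequence ends are padded with `pad` (an assumption of this sketch),
    so each window has length 2 * flank + 1 (17 for flank=8).
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        yield residue, padded[i : i + 2 * flank + 1]


# First 30 residues of the example sequence on slide 3.
seq = "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIG"
wins = list(windows(seq))
print(wins[0])  # ('M', '--------MNIFEMLRI')
```

A window-based predictor would then score each window and assign a state to its central residue.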
  • 18. Third generation methods Third generation methods reached 77% accuracy. They consist of two new ideas: 1. A biological idea Using evolutionary information based on conservation analysis of multiple sequence alignments. 2. A technological idea Using neural networks.
  • 19. Artificial Neural Networks An attempt to imitate the human brain (assuming that this is the way it works).
  • 20. Neural network models A machine learning approach: provide training sets of structures (e.g. α-helices, non-α-helices); computers are trained to recognize patterns in known secondary structures; provide a test set (proteins with known structures). Accuracy ~70-75%.
  • 21. Correlation coefficient True positives: $p_\alpha$. False positives (overpredicted): $o_\alpha$. True negatives: $n_\alpha$. False negatives (underpredicted): $u_\alpha$. $C_\alpha = \frac{p_\alpha n_\alpha - u_\alpha o_\alpha}{\sqrt{(n_\alpha + u_\alpha)(n_\alpha + o_\alpha)(p_\alpha + u_\alpha)(p_\alpha + o_\alpha)}}$. A perfect prediction gives $C_\alpha = 1$ (= 100%).
  • 22. Reasons for improved accuracy Align the sequence with other related proteins of the same protein family. Find members that have a known structure. If there are significant matches between structure and sequence, assign secondary structures to the corresponding residues.
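The correlation coefficient on slide 21 has the form of the Matthews correlation coefficient and can be computed directly from the four counts. The sketch below uses the slide's letters (p, n, u, o) as parameter names. Returning 0.0 when the denominator is zero is a convention chosen here, not something the slides specify.

```python
import math


def matthews_correlation(p, n, u, o):
    """Correlation coefficient for one state (e.g. helix).

    p: true positives, n: true negatives,
    u: false negatives (underpredicted), o: false positives (overpredicted).
    """
    numerator = p * n - u * o
    denominator = math.sqrt((n + u) * (n + o) * (p + u) * (p + o))
    # Assumed convention: report 0.0 when a row or column of the table is empty.
    return numerator / denominator if denominator else 0.0


print(matthews_correlation(p=50, n=50, u=0, o=0))  # 1.0 (perfect prediction)
```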
  • 23. New and Improved Third-Generation Methods Exploit evolutionary information. Based on conservation analysis of multiple sequence alignments. PHD (Q3 ~ 70%) Rost B, Sander, C. (1993) J. Mol. Biol. 232, 584-599. PSIPRED (Q3 ~ 77%) Jones, D. T. (1999) J. Mol. Biol. 292, 195-202. Arguably remains the top secondary structure prediction method.
  • 24. Secondary Structure Prediction Summary 1st generation (1970s): Q3 = 50-55%; Chou & Fasman, GOR. 2nd generation (1980s): Q3 = 60-65%; Qian & Sejnowski, GOR III. 3rd generation (1990s): Q3 = 70-80%; PHD, PSIPRED. Many 3rd+ generation methods exist: PSI-PRED - http://bioinf.cs.ucl.ac.uk/psipred/ JPRED - http://www.compbio.dundee.ac.uk/~www-jpred/ PHD - http://www.embl-heidelberg.de/predictprotein/predictprotein.html NNPRED - http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
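The Q3 figures quoted throughout are three-state per-residue accuracies: the percentage of residues whose state (helix H, strand E, or coil C) is predicted correctly. A minimal sketch follows; the two strings are made-up toy data, not results from any method named in the slides.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy example (hypothetical labels): 11 of 14 positions agree.
observed = "CHHHHHHCCEEEEC"
predicted = "CHHHHCCCCEEECC"
score = q3(predicted, observed)
print(f"{score:.1f}%")  # 78.6%
```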
  • 25. Protein 3D structure data The structure of a protein consists of the 3D (X,Y,Z) coordinates of each non-hydrogen atom of the protein. Some protein structures also include coordinates of covalently linked prosthetic groups, non-covalently linked ligand molecules, or metal ions. For some purposes (e.g. structural alignment) only the Cα coordinates are needed. Example of PDB format (the last five columns are X, Y, Z, occupancy, and temperature factor):
      ATOM 18 N  GLY 27 40.315 161.004 11.211 1.00 10.11
      ATOM 19 CA GLY 27 39.049 160.737 10.462 1.00 14.18
      ATOM 20 C  GLY 27 38.729 159.239 10.784 1.00 20.75
      ATOM 21 O  GLY 27 39.507 158.484 11.404 1.00 21.88
    Note: the PDB format provides no information about connectivity between atoms. The last two numbers (occupancy, temperature factor) relate to disorder of atomic positions in crystals.
  • 27. Protein structure: Some computational tasks Building a protein structure model from X-ray data. Building a protein structure model from NMR data. Computing the energy for a given protein structure (conformation). Energy minimization: finding the structure with the minimal energy according to some empirical force field. Simulating the protein folding process (molecular dynamics). Structure visualization. Computing secondary structure from atomic coordinates. Protein superposition, structural alignment. Protein fold classification. Threading: finding a fold (prototype structure) that fits a sequence. Docking: fitting ligands onto a protein surface by molecular dynamics or energy minimization. Protein 3D structure prediction from sequence.
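To illustrate how the ATOM records above carry coordinates, here is a small sketch that reads the four example lines and keeps only the Cα atom. It splits on whitespace because the slide shows a simplified layout. Real PDB files use fixed-width columns and carry extra fields such as a chain identifier, so this parser is an assumption that fits the example only.

```python
PDB_LINES = """\
ATOM 18 N GLY 27 40.315 161.004 11.211 1.00 10.11
ATOM 19 CA GLY 27 39.049 160.737 10.462 1.00 14.18
ATOM 20 C GLY 27 38.729 159.239 10.784 1.00 20.75
ATOM 21 O GLY 27 39.507 158.484 11.404 1.00 21.88
"""


def parse_atoms(text):
    """Parse the simplified, whitespace-separated ATOM lines shown on the slide."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10 or fields[0] != "ATOM":
            continue  # skip anything that does not match the simplified layout
        atoms.append({
            "serial": int(fields[1]),
            "name": fields[2],
            "residue": fields[3],
            "residue_number": int(fields[4]),
            "xyz": tuple(float(v) for v in fields[5:8]),
            "occupancy": float(fields[8]),
            "temp_factor": float(fields[9]),
        })
    return atoms


atoms = parse_atoms(PDB_LINES)
ca_coords = [a["xyz"] for a in atoms if a["name"] == "CA"]
print(ca_coords)  # [(39.049, 160.737, 10.462)]
```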
  • 28. Viewing protein structures When looking at a protein structure, we may ask the following types of questions: Is a particular residue on the inside or outside of a protein? Which amino acids interact with each other? Which amino acids are in contact with a ligand (DNA, peptide hormone, small molecule, etc.)? Is an observed mutation likely to disturb the protein structure? Standard capabilities of protein structure software: Display of protein structures in different ways (wireframe, backbone, sticks, spacefill, ribbon). Highlighting of individual atoms, residues or groups of residues. Calculation of interatomic distances. Advanced feature: Superposition of related structures.
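Calculating an interatomic distance amounts to taking the Euclidean distance between two coordinate triples. The sketch below uses the N and Cα coordinates of GLY 27 from the PDB example on slide 25. The variable names are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math

# Coordinates (in Å) of two atoms of GLY 27, copied from the slide 25 example.
n_atom = (40.315, 161.004, 11.211)
ca_atom = (39.049, 160.737, 10.462)

# Euclidean distance between the two points.
distance = math.dist(n_atom, ca_atom)
print(f"N-CA distance: {distance:.2f} Å")  # about 1.50 Å
```

The result is close to the length of a covalent N-Cα bond, which is a quick sanity check on the coordinates.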
  • 29. Example: c-abl oncoprotein SH2 domain, display wireframe
  • 30. Example: c-abl oncoprotein SH2 domain, display sticks
  • 31. Example: c-abl oncoprotein SH2 domain, display backbone
  • 32. Example: c-abl oncoprotein SH2 domain, display spacefill
  • 33. Example: c-abl oncoprotein SH2 domain, display ribbons

Editor's Notes

  • #20: Simulate the brain. Selection of training sets is extremely important: use different protein families, with only one or two representatives from each family.