By
Dr. Abdul Hameed
Chief Scientific Officer
IBGE, Islamabad, Pakistan
DNA SEQUENCING
• DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule.
• It includes any method or technology used to determine the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA.
DNA SEQUENCING METHODS
Historically, there are two main methods of DNA sequencing:
1. Maxam and Gilbert method
2. Sanger method
MAXAM & GILBERT METHOD
• A. M. Maxam and W. Gilbert, 1977
• Chemical sequencing
• Treatment of DNA with certain chemicals → DNA is cut into fragments → monitoring of sequences
Principle: a graphical demonstration (figure)
SANGER METHOD
• Most common approach used for DNA sequencing
• Invented by Frederick Sanger, 1977
• Nobel Prize, 1980
• Also termed the chain termination or dideoxy method
SANGER METHOD
• The chain termination reaction
• Dideoxynucleotide triphosphates (ddNTPs) act as chain terminators, having an H on the 3′ carbon of the deoxyribose sugar (dNTPs normally have an OH there)
• ssDNA + dNTPs → elongation
• ssDNA + ddNTPs → elongation stops
DEOXY VERSUS DIDEOXY
Fluorescent Dyes
• Fluorescent dyes are multicyclic molecules that absorb light and emit fluorescence at specific wavelengths.
• Examples are fluorescein and rhodamine derivatives.
• For sequencing applications, these molecules can be covalently linked to nucleotides.
Dye Terminator Sequencing
• A distinct dye or color is used for each of the four ddNTPs.
• Since the terminating nucleotides can be distinguished by color, all four reactions can be performed in a single tube.
• The fragments are distinguished by size and color.
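The idea that fragments are told apart by size and color can be illustrated with a small simulation. This is only a sketch: the dye colors assigned to each base and the function names are hypothetical, and the example strand ATGT is used purely for illustration. Every position in the newly synthesized strand yields one terminated fragment, and sorting the fragments by length and reading their colors recovers the sequence.

```python
# Hypothetical dye assignment; real chemistries use their own dye sets.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def terminated_fragments(synthesized: str) -> list[tuple[int, str]]:
    """Return (length, dye) for every possible chain-terminated fragment.

    Each fragment ends where a dye-labeled ddNTP was incorporated, so its
    length is the position of that base and its color identifies the base.
    """
    return [(pos, DYE_FOR_BASE[base]) for pos, base in enumerate(synthesized, start=1)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by size, as electrophoresis does, and read the colors in order."""
    return "".join(BASE_FOR_DYE[dye] for _, dye in sorted(fragments))


fragments = terminated_fragments("ATGT")
# The read depends only on fragment size order, not on the order they are supplied in.
print(read_sequence(list(reversed(fragments))))  # prints ATGT
```

Because each fragment carries its own color, the four termination reactions do not need to be kept apart, which is why a single tube is enough.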
DNA Sanger Sequencing
DNA Sequence Analysis
• ABI Sequencing Analysis 5.2 software
• MEGA 7.0.26
The Human Genome Project
• First draft of the human genome in 2001, final version in 2004
• Estimated cost: $3 billion; time: 13 years
• Used Sanger sequencing
• Today:
- Illumina: 1 week, $9,500
- Exome: 6 weeks*, $1,000
• Towards the $1,000 genome
Setia Pramana
The Human Genome Project
• The draft sequence of the HGP was imperfect because of the incomplete coverage of many regions, which left a huge number of gaps
• The International Human Genome Sequencing Consortium (IHGSC) published a finished version of the human genome sequence in 2004, and the HGP was then deemed complete
The Human Genome Project
• This finished version of the genome achieved almost complete coverage of all regions and reduced the number of gaps to 341 from the initial hundreds of thousands
• It initiated a new era in the study of genetic variation and the functional characterization of the human genome
Next (second) Generation Sequencing
• New technologies allow the massive production of tens of millions of short sequencing fragments; hence the alternative name, massively parallel sequencing
• These techniques can be used to
- address problems similar to those handled by microarrays,
- but also many others.
• They raised the promise of personalized medicine
NGS
• The advent of high-throughput sequencing technologies has initiated the personal genome sequencing era for both normal and cancer genomes
• It underpins large-scale international projects such as the 1000 Genomes Project and the International Cancer Genome Consortium
NGS
• NGS technologies have been on the market only since 2004
• They have now largely replaced Sanger sequencing technologies, owing to their ultra-high throughput (hundreds of gigabases)
• They can sequence millions of DNA fragments simultaneously, hence "massively parallel sequencing technologies"
NGS
• NGS has reduced sequencing costs significantly, making large-scale or whole-genome sequencing (WGS) studies much more affordable
• https://www.abmgood.com/marketing/knowledge_base/next_generation_sequencing_introduction.php
Third Generation Sequencing
Bioinformatics Challenges of NGS
Sequencing has gotten Cheaper and Faster
Cost of one human genome
• HGP: $3 billion (13 years)
• 2004: $30,000,000
• 2008: $100,000
• 2010: $30,000
• 2011: $10,000
• 2012-13: $7,000
• 2014: $4,000 (~1 week)
• ???: $1,000
The Race for the $1,000 Genome
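To put the figures on this slide in perspective, the short sketch below computes the overall fold reduction and the average yearly decline between two of the listed years. The numbers are copied from the slide; entering "2012-13" as 2013 and assuming a constant yearly rate are simplifications made here, and the function names are illustrative.

```python
# Cost per human genome as listed on the slide (year -> US dollars).
# "2012-13" is entered as 2013; that choice is an assumption.
COST_BY_YEAR = {
    2004: 30_000_000,
    2008: 100_000,
    2010: 30_000,
    2011: 10_000,
    2013: 7_000,
    2014: 4_000,
}


def fold_reduction(start_year: int, end_year: int) -> float:
    """How many times cheaper a genome was in end_year than in start_year."""
    return COST_BY_YEAR[start_year] / COST_BY_YEAR[end_year]


def annual_decline(start_year: int, end_year: int) -> float:
    """Average fractional cost drop per year, assuming a constant yearly rate."""
    years = end_year - start_year
    ratio = COST_BY_YEAR[end_year] / COST_BY_YEAR[start_year]
    return 1 - ratio ** (1 / years)


print(f"2004 -> 2014: {fold_reduction(2004, 2014):,.0f}-fold cheaper")
print(f"average decline per year: {annual_decline(2004, 2014):.0%}")
```

On these figures the cost fell 7,500-fold in ten years, which corresponds to losing somewhat more than half of the remaining cost every year.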
(Sequencing) Cost is Getting Cheaper
• Reduced sequencing costs significantly, making large-scale or WGS studies much more affordable
NGS Challenges
Huge Data Storage and HPC Demand
NGS Challenges
• The highest cost is (almost) no longer the sequencing but the storage and analysis.
• A standard human whole genome sequenced at 30-40x coverage would create about 100 Gb of data
• Extreme data size causes problems
- Just transferring and storing the data
- Standard all-against-all comparisons fail (N*N)
- Standard tools cannot be used
- Think in terms of fast, parallel programs
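The data volumes on this slide follow from simple arithmetic, sketched below. The genome size of about 3.1 billion base pairs and the 150 bp read length are assumptions introduced here, not values from the slides, and the function names are illustrative. The last function shows why all-against-all (N*N) comparisons break down at NGS scale.

```python
import math

GENOME_SIZE_BP = 3.1e9  # assumed approximate human genome size; not from the slides


def sequenced_bases(coverage: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Total bases sequenced = coverage depth x genome size."""
    return coverage * genome_size_bp


def reads_needed(coverage: float, read_length_bp: int,
                 genome_size_bp: float = GENOME_SIZE_BP) -> int:
    """Number of reads of a given length required to reach the coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items, which grows as N*N."""
    return n * (n - 1) // 2


print(f"30x coverage: {sequenced_bases(30) / 1e9:.0f} gigabases")
print(f"40x coverage: {sequenced_bases(40) / 1e9:.0f} gigabases")
print(f"150 bp reads for 30x: {reads_needed(30, 150):,}")  # 150 bp is an assumed read length
print(f"pairs among 20 million reads: {pairwise_comparisons(20_000_000):,}")
```

Under these assumptions, 30-40x coverage gives roughly 93 to 124 gigabases, in line with the figure of about 100 Gb quoted above, while comparing every read with every other read quickly reaches hundreds of trillions of pairs.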
Bioinformatics Challenges of NGS
• Need for a large amount of CPU power
- Informatics groups must manage compute clusters
- Challenges in parallelizing existing software or redesigning algorithms to work in a parallel environment
- Another level of software complexity and challenges to interoperability
Bioinformatics Challenges of NGS
• VERY large text files (~10 million lines long)
- Can't do "business as usual" with familiar tools such as Perl/Python
- Impossible memory usage and execution time
- Impossible to browse for problems
• Need sequence quality filtering
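One common way around the memory problem is to stream the file record by record instead of loading it whole. The sketch below filters reads by mean base quality in that streaming style. It assumes the usual four-line FASTQ layout and Phred+33 quality encoding; the function names and the threshold of 20 are illustrative choices, not something prescribed by the slides.

```python
import io
from typing import Iterable, Iterator

Record = tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: Iterable[str]) -> Iterator[Record]:
    """Yield one FASTQ record at a time so memory use stays constant.

    Assumes four lines per record; an incomplete trailing record is dropped.
    """
    lines = (line.rstrip("\n") for line in handle)
    # Pulling four items from the same iterator groups the lines into records.
    for header, sequence, _separator, quality in zip(lines, lines, lines, lines):
        yield header, sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_quality(records: Iterable[Record],
                      min_mean_quality: float = 20.0) -> Iterator[Record]:
    """Keep only reads whose mean quality reaches the threshold."""
    for header, sequence, quality in records:
        if quality and mean_phred(quality) >= min_mean_quality:
            yield header, sequence, quality


# Tiny in-memory example; a real run would pass an open file handle instead.
demo = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
kept = [header for header, _, _ in filter_by_quality(read_fastq(io.StringIO(demo)))]
print(kept)  # prints ['@read1']
```

Because both functions are generators, only one record is held in memory at a time, regardless of how many million lines the file contains.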
Data Management Issues
• Raw data are large. How long should they be kept?
• Processed data are manageable for most people
- 20 million reads (50 bp) ~1 Gb
• More of an issue for a facility: HiSeq recommends 32 CPU cores, each with 4 GB RAM
• Certain studies are much more data intensive than others
- Whole genome sequencing: a 30x coverage genome pair (tumor/normal) ~500 GB
- 50 genome pairs ~25 TB
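The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only numbers from the slide (20 million reads of 50 bp, 500 GB per tumor/normal pair, 50 pairs); the function names and the decimal convention of 1 TB = 1000 GB are assumptions made here.

```python
def reads_to_gigabases(n_reads: int, read_length_bp: int) -> float:
    """Total sequenced bases for a run, expressed in gigabases."""
    return n_reads * read_length_bp / 1e9


def cohort_storage_tb(n_pairs: int, gb_per_pair: float = 500.0,
                      gb_per_tb: float = 1000.0) -> float:
    """Storage needed for a cohort of tumor/normal genome pairs, in terabytes.

    The default of 500 GB per pair is the figure quoted on the slide;
    1 TB = 1000 GB is an assumed convention.
    """
    return n_pairs * gb_per_pair / gb_per_tb


print(reads_to_gigabases(20_000_000, 50))  # prints 1.0
print(cohort_storage_tb(50))               # prints 25.0
```

Storage grows linearly with the number of genome pairs, so a facility planning for a few hundred pairs is already in the range of hundreds of terabytes on these figures.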
Data Management
• Primary data are usually discarded soon after the run
• Secondary and tertiary data are kept on fast-access disk during analysis, then moved to slower-access disk afterward
Interpretation Bottleneck
Big Collaboration
• Collaborative expertise (human intelligence and intuition) is required for meaning and interpretation (Bergeron 2002)
• This includes on-demand communication and sharing of protocols, electronic resources, data, and findings among the stakeholders
• Collaboration with other Big Data sources: national registers, BPJS, hospitals, etc.
Summary
• Challenges:
- Still expensive
- Lack of infrastructure (in developing countries)
- Lack of skilled personnel in bioinformatics
- Need for (large-scale) collaborations
- Integrating different technologies and systems
- Making it all clinically relevant