This document discusses the role of bioinformatics in medicine today. It begins by explaining how genomics differs from genetics in studying many genes and genomic features together rather than single genes. It then describes some of the key genomic databases that are used in bioinformatics, including primary sequence databases like GenBank, metadatabases like Entrez, genome databases like Ensembl and UCSC, and pathway and protein databases. The document provides an example of how bioinformatics is used to analyze autism data, including processing sequencing data, identifying copy number variations, mapping genes, building networks, and identifying significant clusters to understand autism better.
The document provides an introduction to the field of bioinformatics, including definitions, history, applications and key concepts. It discusses how bioinformatics uses computer algorithms and databases to analyze biological data like genomes, proteins and genes. Major databases that store DNA sequences are described, such as GenBank, EMBL and DDBJ. Tools for analyzing sequences like BLAST are also introduced.
1. Bioinformatics is the science of using computer hardware and software to analyze biological data such as DNA sequences, protein sequences, and gene expression data.
2. It has three main branches - genomics which analyzes genome sequences, transcriptomics which analyzes gene expression data, and proteomics which analyzes protein sequences and structures.
3. The goals of bioinformatics include acquiring biological data, developing tools and databases, analyzing the data, and integrating different types of biological data to gain new biological insights.
The document summarizes a bioinformatics summer camp, including:
1. The camp will cover basic molecular biology and bioinformatics topics like DNA, proteins, gene expression and the genetic code.
2. Students will work on computational analysis projects involving whole genome sequencing, gene expression profiling, and functional and comparative genomics.
3. The camp will teach techniques for analyzing protein structures and interactions, gene expression data, and identifying pockets on protein surfaces.
WHAT IS BIOINFORMATICS?
Computational biology, or bioinformatics, is the application of computer science and allied technologies to answer biologists' questions about the mysteries of life. It has evolved to serve as the bridge between:
Observations (data) in diverse biologically-related disciplines and
The derivations of understanding (information)
APPLICATIONS OF BIOINFORMATICS
Computer Aided Drug Design
Microarray Bioinformatics
Proteomics
Genomics
Biological Databases
Phylogenetics
Systems Biology
The document provides an introduction to the field of bioinformatics. It discusses how bioinformatics applies computer science to analyze large amounts of biological data from fields like molecular biology, medicine, and biotechnology. It also outlines some of the main topics that will be covered in the course, including biological databases, gene and protein analysis, phylogenetic analysis, and gene prediction.
This document provides an overview of genomics, bioinformatics, and related topics. It discusses:
- The genomics and bioinformatics group members Amit Garg, Lokesh Joshi, and Pankaj Phogat.
- Definitions of genomics, genome, and bioinformatics.
- An overview of the human genome project including its history, goals of identifying and sequencing all human genes, and completion in 2003.
- Other completed genome projects such as for bacteria and yeast.
- The role of bioinformatics in collecting, organizing, analyzing, and sharing biological data through computational modeling, databases, and other tools.
Role of bioinformatics in life sciences research (Anshika Bansal)
1. The document discusses bioinformatics and summarizes some of its key applications and tools. It describes how bioinformatics merges biology and computer science to solve biological problems by applying computational tools to molecular data.
2. It provides examples of common bioinformatics tasks like retrieving sequences from databases, comparing sequences, analyzing genes and proteins, and viewing 3D structures.
3. The document lists several popular databases for nucleotide sequences, protein sequences, literature, and other biological data. It also introduces common bioinformatics tools for tasks like sequence alignment, translation, and structure analysis.
Bioinformatics is the use of computers for storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins. It involves developing computational tools and databases to analyze biological data. Key areas include sequence analysis, structural analysis, functional analysis, biological databases, sequence alignment, protein structure prediction, molecular phylogenetics, and genomics. The goals are to better understand living systems at the molecular level through computational analysis of biological data.
This document provides an overview of the field of bioinformatics. It defines bioinformatics as the intersection of biology and computer science, using computational tools to analyze and distribute biological information like DNA, RNA, and proteins. The goals of bioinformatics are to better understand cells at the molecular level by analyzing sequence and structure data. Key applications include drug design, DNA analysis, and agricultural biotechnology. The document also describes different types of biological databases like primary databases that contain raw sequence data, and secondary databases that provide additional annotation and analysis of sequences.
Bioinformatics is the branch of life science that deals with the use of mathematical, statistical and computer methods to analyze biological and biochemical data.
Types of Bioinformatics (see the slides)
Here are some suggestions for open online bioinformatics lectures and courses from famous universities:
- MIT OpenCourseWare has free bioinformatics course materials and videos from MIT courses.
- edX has massive open online courses (MOOCs) in bioinformatics from universities like Harvard, Berkeley, MIT. Some are free to audit.
- Coursera has bioinformatics courses from top universities like Johns Hopkins, University of Toronto, Peking University.
- YouTube has full lecture videos from bioinformatics courses at universities like Stanford, UC San Diego, University of Cambridge.
- Khan Academy has introductory bioinformatics lectures on topics like sequence alignment, gene finding, protein structure.
- EMBL-
An Introduction to Bioinformatics
Drexel University INFO648-900-200915
A Presentation of Health Informatics Group 5
Cecilia Vernes
Joel Abueg
Kadodjomon Yeo
Sharon McDowell Hall
Terrence Hughes
This document provides an overview of bioinformatics. It begins by explaining how bioinformatics emerged from the need to analyze vast amounts of genetic sequence data produced by projects like the Human Genome Project. It then defines bioinformatics as the field that develops tools and methods for understanding biological data by combining computer science, statistics, and other disciplines. The document outlines several goals and applications of bioinformatics, such as identifying genes and their functions, modeling protein structures, comparing genomes, and its uses in medicine, microbial research, and more. It also provides a brief history of important developments in bioinformatics and DNA sequencing.
This document provides definitions and descriptions of the field of bioinformatics from multiple perspectives:
- Bioinformatics is the use of computers to analyze and interpret massive amounts of biological data, especially related to genomics, through techniques like modeling, algorithm development, and statistics.
- It involves the convergence of biology, biotechnology, computer science, and information technology to address challenges in managing and understanding biological data.
- Bioinformatics encompasses a range of activities from database management and analysis to developing tools that facilitate biological research and applications in fields like medicine.
This document provides an introduction to the field of bioinformatics. It defines bioinformatics as a branch of science that uses computer technology to analyze and integrate biological information that can be applied to gene-based drug discoveries. It discusses the emergence of bioinformatics due to the desire to understand how genetic structure affects traits. It also outlines some common applications of bioinformatics like drug design, gene therapy, and microbial genomic analysis. Finally, it provides examples of some bioinformatics tools, databases, and centers in India.
The EMBL-European Bioinformatics Institute (EBI) is a large bioinformatics research and services institute located in Hinxton, UK. It is part of the European Molecular Biology Laboratory and houses massive biological databases and bioinformatics software tools that are freely available to researchers. Key goals of EBI include building and maintaining biological databases, making data widely accessible, and conducting bioinformatics research to advance biology. EBI coordinates data collection and dissemination internationally and houses over 500 staff from diverse backgrounds.
Bioinformatics emerged as a field in the 1970s-1980s as areas of biology increasingly relied on computational methods. There were two main types of students in bioinformatics - computer scientists interested in biology and biologists skilled in computing. The bioinformatics market continues to grow worldwide and major employers include pharmaceutical and biotech companies. A career in bioinformatics requires strong skills in biology, computing, programming, data analysis, visualization and teamwork. Opportunities exist in areas like sequence assembly, genomic analysis, functional genomics, and database administration.
This document provides an overview of bioinformatics and related topics across 7 parts:
Part I introduces bioinformatics and its areas including genomics, proteomics, computational biology, and databases.
Part II discusses the history of bioinformatics from Darwin's theory of evolution to the human genome project.
Part III focuses on the human genome project, its goals of identifying genes and sequencing DNA, and its benefits like improved medicine.
Part IV explains how the internet plays an important role in bioinformatics for retrieving biological information and resources like databases, tools, and software.
Part V describes different types of biological databases including primary, secondary, and composite databases that combine different sources.
Part VI discusses knowledge discovery
Bioinformatics in present and its future
Bioinformatics is a hybrid science that links biological data with techniques for information storage, distribution, and analysis to support multiple areas of scientific research, including biomedicine.
This document discusses bioinformatics, including its goals and applications. Bioinformatics is defined as applying information technology to store, organize, and analyze vast amounts of biological data, such as sequences and structures of proteins and nucleic acids. It merges biology, mathematics, statistics, computer science, and information technology. Bioinformatics helps analyze gene and protein expression, compare genomic data, and simulate DNA, RNA, and proteins. It has applications in molecular medicine, drug development, microbial genomics, crop improvement, and more. Common bioinformatics tools include BLAST for comparing biological sequences.
Bioinformatics uses techniques from applied mathematics, computer science, and statistics to understand and organize biological information on a large scale, especially regarding molecules like DNA, RNA, and proteins. Functional genomics uses high-throughput methods and bioinformatics to describe gene and protein functions and interactions at a genome-wide level. Key tools for functional genomics include sequence-based tools, microarray-based tools, and Gene Ontology for organizing gene function information. A systems biology approach integrates vast amounts of correlative genomic and proteomic data to help understand complex human diseases.
This document provides an introduction and overview of the field of bioinformatics. It discusses how bioinformatics combines computer science and biology to analyze large amounts of biological data. Specifically, it mentions that bioinformatics uses algorithms and techniques from computer science to solve complex biological problems related to areas like molecular biology, genomics, drug discovery, and more. It also outlines some of the key applications of bioinformatics like sequence analysis, protein structure prediction, genome annotation, and comparative genomics. Finally, it provides brief descriptions of important biological databases and resources that bioinformaticians use to store and analyze genomic and protein sequence data.
This document provides an overview of bioinformatics, including its history, major areas of research, databases, tools, and applications. Bioinformatics is defined as the use of computer science and information technology to analyze and interpret biological data. The document traces the history of bioinformatics from early genetics experiments in the 1860s to advances in computing and molecular biology in the 1970s that enabled the field. It outlines major research areas like sequence analysis, genome annotation, and computational evolutionary biology. It also discusses biological databases, common bioinformatics tools, and applications of bioinformatics in fields like medicine, agriculture, and comparative genomics.
The Ensembl genome browser is a web-based tool that allows researchers to visualize and analyze genomic data. It was launched in 1999 by the Ensembl project, a joint initiative between EMBL's European Bioinformatics Institute and the Wellcome Sanger Institute. Ensembl contains genome data for humans and many other species, allowing users to browse genes, view their molecular functions, and utilize tools for variant effect prediction, data mining, and more. Key features include separate browsing options for domains like fungi, plants, animals, and bacteria.
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek (Data Driven Innovation)
This document summarizes genomic big data management, integration and mining. It discusses the exponential growth of biological data due to advances in sequencing technologies. Next generation sequencing techniques generate large amounts of short DNA reads. Several public databases contain heterogeneous biological data sources. Effective data management and integration methods are needed to analyze these large and complex datasets. Supervised machine learning can be used to extract knowledge and classify samples. Tools like CAMUR apply rule-based classification to problems like analyzing gene expression from cancer datasets. Future work involves advanced integration systems and new big data approaches for biological data.
Aim1: To study the method of genome identification through ENSEMBL browser.
Aim2: To study the method of genome identification through VISTA.
Aim3: To study the method of genome identification through UCSC Genome Browser.
Aim4: To study the method of genome and amino acid sequences through UCSC Genome Browser.
This document provides an introduction to biological databases and bioinformatics tools. It defines biological sequences and databases, and describes the types of bioinformatics databases including primary, secondary, and composite databases. Examples of specific biological databases like GenBank, EMBL, and SwissProt are outlined. Common bioinformatics tools for sequence analysis, structural analysis, protein function analysis, and homology/similarity searches are listed, including BLAST, FASTA, EMBOSS, ClustalW, and RasMol. Finally, important bioinformatics resources on the web are highlighted.
This document provides an overview of the November 2000 issue of JALA (Journal of the Association for Laboratory Automation). It describes the development of a novel robotic system for the New York Cancer Project biorepository in collaboration with the Medical Automation Research Center. The biorepository receives 50-100 blood samples per day, which are processed robotically to extract, quantify, aliquot and store DNA, plasma and RNA so that they are accessible to investigators. The robotic system aims to provide rapid random access to the hundreds of thousands of stored DNA samples for high-throughput analysis in studies of gene-environment interactions and cancer risk.
This document introduces bioinformatics and discusses some of its key concepts and applications. It defines bioinformatics as an interdisciplinary field that combines computer science, statistics and engineering to study and process biological data. It describes some basic cell components like DNA, RNA and proteins, and how genetics and the genetic code work. It also provides a brief history of bioinformatics, highlighting projects like the Human Genome Project. Finally, it outlines several applications of bioinformatics like phylogenetic analysis, drug design, microarray analysis and protein-protein interaction networks.
Bioinformatics is defined as the application of tools of computation and analysis to the capture and interpretation of biological data. It is an interdisciplinary field, which harnesses computer science, mathematics, physics, and biology
World-wide data exchange in metabolomics, Wageningen, October 2016 (Christoph Steinbeck)
Talk given at the Netherlands Institute of Ecology in Wageningen, in which I describe the development of the MetaboLights database and the value of data sharing in metabolomics and molecular biology in general.
Biocuration activities for the International Cancer Genome Consortium (ICGC) (Neuro, McGill University)
The document discusses biocuration activities for the International Cancer Genome Consortium (ICGC). It provides information on the goals of ICGC including comprehensively analyzing 50 different cancer types/subtypes and making the genomic and clinical data publicly available. It describes the types of data being collected, standards being developed for data access and sharing, and current status of datasets released.
Bioinformatics is an interdisciplinary field that uses computational tools to analyze and manage biological data such as genes, genomes, proteins, and medical information. It involves developing mathematical models to understand relationships in complex biological systems. Key areas include analyzing protein and gene sequences, structures, and functions; understanding evolution and molecular interactions; and developing "virtual cells" through integrated modeling. Major challenges include integrating heterogeneous biological data sources and developing robust computational methods.
This document provides an overview of the field of bioinformatics. It defines bioinformatics as using computational techniques to solve biological problems by analyzing large amounts of biological data like DNA sequences, amino acid sequences, and more. It discusses the need for bioinformatics due to the exponential growth of biological data from sequencing projects. Some key applications of bioinformatics mentioned include data management, knowledge discovery, drug discovery, proteomics, personalized medicine, agriculture, and its use in systems biology.
The document discusses several key databases for nucleotide and protein sequences. It describes NCBI, EMBL, DDBJ, PIR, and SWISS-PROT as the primary databases that store nucleotide and protein sequence data. NCBI, EMBL, and DDBJ work together through the International Nucleotide Sequence Database Collaboration to share data daily and provide a comprehensive set of sequence information. The document provides details on the history and role of each database.
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data.
This document provides a review and summary of major scientific events, trends, and publications in translational bioinformatics in 2008 by Russ B. Altman from Stanford University. Some of the key topics covered include the sequencing and analysis of an individual's diploid genome, next-generation sequencing technologies, genome-wide association studies, pharmacogenomics, analysis of high-throughput molecular data, neuroscience datasets, and using molecular information to improve disease detection and treatment. The review highlights over 25 seminal papers from 2008 and provides insights on emerging trends in the field.
The document outlines David Montaner's presentation on the 100,000 Genomes Project at Valencia University on October 6th, 2016. The key points are:
1) The 100,000 Genomes Project aims to sequence 100,000 genomes from NHS patients with rare diseases or cancer to further medical research and genomic medicine in the UK.
2) Genomics England was established to deliver the project, working with the NHS, academics, and industry. Genome sequencing, analysis and interpretation is conducted through various centers and partnerships across the UK.
3) Interpreting the large number of variants identified in whole genomes remains a major challenge for the project. Information on variant frequencies, segregation, inheritance patterns, and
David Montaner is a statistician who works in computational genomics, focusing on massive data analysis and gene set analysis. He has developed methods for multi-dimensional gene set analysis and improving gene set analysis for next generation sequencing data. His ongoing work includes improving software implementation, adjusting methods for NGS data, extending the approach to other genomic features, and investigating topological pathway analysis and metagenomics.
The Biostatistics Unit at the Computational Genomics Institute provides data analysis, develops bioinformatics methods and tools, and offers technical support and training courses. The unit, consisting of David Montaner, Francisco García, and Martina Marb, analyzes microarray, sequencing, and clinical data through statistical modeling, experiment design, and data preprocessing. They also help with organizing data and presenting results.
This document presents a summary of David Montaner's doctoral thesis on the functional analysis of genomic data. It describes the methodological advances made by the author in using logistic regression and multidimensional analysis to interpret genomic data at the level of gene sets. It also proposes a methodology for estimating the relative importance of each gene within the functional sets.
Used for a seminar on business opportunities in the health sector.
Official Master in Entrepreneurship and Business Management; Faculty of Economics of the University of Valencia.
http://www.genometra.com/seminar-on-business-opportunities
Seguimiento y Evaluación OnLine de Trabajos de Prácticas en Asignaturas de Es... David Montaner
III Jornadas de Intercambio de Experiencias de Innovación Educativa en Estadística.
Valencia, 16 and 17 July 2012.
www.dmontaner.es
Bioinformatics Introduction
1. Bioinformatics in medicine today
David Montaner
dmontaner@cipf.es
Centro de Investigación Príncipe Felipe
Institute of Computational Genomics
9 May 2013
in Valencia
David Montaner Bioinformatics in medicine 1/26
2. Genomics
Progress in science depends on new techniques, new
discoveries and new ideas, probably in that order.
Sydney Brenner, 1980
Microarray devices and high-throughput sequencing allow us
to measure thousands or millions of genomic characteristics.
3. Genomics vs. genetics
Genetics:
Single genes are responsible for biological changes.
one gene → one hypothesis → one p-value → conclusions
Genomics:
Genes or genomic features act together to produce
biological changes.
many genes → many hypotheses → many p-values →
more data analysis
Computational support is needed even for drawing
conclusions
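One common way to handle "many p-values" is a multiple-testing correction. The slides do not name a specific method, so the following is only an illustrative sketch using the Benjamini-Hochberg adjustment; the function name and the example p-values are made up for this illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per gene (illustration only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)
for p, q in zip(p_values, adjusted):
    print(f"p = {p:.3f}  adjusted = {q:.5f}")
```

With thousands of genes tested at once, unadjusted p-values below 0.05 are expected by chance alone, which is one reason computational support is needed before drawing conclusions.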
4. Genomic numbers
Microarray:
30,000 genes
2 million SNPs
100 Mb
Measured features:
genes, isoforms
SNPs, Polymorphisms
IN-DELS
loss of heterozygosity
methylation
copy number alterations
NGS:
30,000 genes
30,000 transcripts
20 million SNPs
10-100 GB
Registered information:
Genomic characteristics:
position, chromosome ...
Biological function
Disease association
miRNA targets
5. Genomic databases
Nucleic Acids Research lists more than 1,500 online databases!
http://www.oxfordjournals.org/nar/database/c
Many different databases for each category: which should I
use?
No standards: different IDs, methods, servers, formats, ...
Lack of international initiatives, many local and small
databases
Different gene IDs, more than 50
In vivo vs in silico databases
6. Biological databases (Wikipedia)
1 Primary nucleotide sequence databases
2 Metadatabases
3 Genome databases
4 Protein sequence databases
5 Proteomics databases
6 Protein structure databases
7 Protein model databases
8 RNA databases
9 Carbohydrate structure databases
10 Protein-protein interactions
11 Signal transduction pathway databases
12 Metabolic pathway databases
13 Experimental data repositories (microarrays, NGS, Sanger)
14 Exosomal databases
15 Mathematical model databases
16 PCR / real time PCR primer databases
17 Specialized databases
18 Taxonomic databases
19 Wiki-style databases
7. Primary nucleotide sequence databases
Contain any kind of nucleotide sequences, from genes to
genomes.
The International Nucleotide Sequence Database (INSD) Collaboration:
GenBank: National Center for Biotechnology Information (NCBI)
European Nucleotide Archive (ENA): European Bioinformatics Institute (EBI)
DNA Data Bank of Japan (DDBJ)
8. GenBank
Primary nucleotide sequence databases
available on the NCBI ftp site:
http://www.ncbi.nlm.nih.gov/Ftp/
A new release is made every two months.
3 types of entries:
CoreNucleotide (the main collection)
dbEST (Expressed Sequence Tags)
dbGSS (Genome Survey Sequences)
Access:
Search for sequence identifiers using Entrez Nucleotide:
http://www.ncbi.nlm.nih.gov/nucleotide/
Align GenBank sequences to a query sequence using
BLAST (Basic Local Alignment Search Tool).
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Several other e-utilities (see book)
See an example of a GenBank record.
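Besides the web pages listed above, GenBank records can be requested programmatically through the NCBI E-utilities. The sketch below only builds an `efetch` request URL and does not contact the server; the helper name is made up here, and the accession is an arbitrary illustrative value rather than one taken from the slides.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_genbank_url(accession):
    """Build an efetch URL asking for a nucleotide record in GenBank flat-file format."""
    params = {
        "db": "nuccore",      # Entrez Nucleotide database
        "id": accession,      # accession or identifier of the record
        "rettype": "gb",      # GenBank flat-file format
        "retmode": "text",    # plain text rather than XML
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Illustrative accession only; replace it with the record of interest.
print(efetch_genbank_url("NM_000546"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the record as text; heavy automated use is subject to NCBI's usage guidelines.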
9. Metadatabases
Collect and organize data from primary nucleotide
sequence databases and many other resources.
Make the information available in a convenient format and
provide data handling resources: web pages, application
programming interface (API)
Focus on particular species, diseases
Examples
Entrez: searches through almost all NCBI resources.
http://www.ncbi.nlm.nih.gov/sites/gquery
GeneCards: provides genomic, proteomic, transcriptomic,
genetic and functional information for human genes (known
and predicted)
http://www.genecards.org/
10. Entrez
Metadatabases
Searches through almost all NCBI resources.
Entrez search page: http://www.ncbi.nlm.nih.gov/sites/gquery
Queries can be saved if you have a MyNCBI account
http://www.ncbi.nlm.nih.gov/
11. Genome databases
Collect genome sequences and annotation (specification about
genes) for particular organisms, and try to improve them:
Data curation.
Complete missing information using in silico methods.
Generate new relational organization.
Complement feature IDs.
Provide easy access, visualization
Examples
Ensembl: automatic annotation on selected eukaryote
genomes.
UCSC Genome Browser: reference sequence and working
draft assemblies for a large collection of genomes
WormBase: genome of the model organism C. elegans.
12. Ensembl
Genome databases
Ensembl is a joint project between the European Bioinformatics
Institute (EBI), part of the European Molecular Biology Laboratory
(EMBL), and the Wellcome Trust Sanger Institute.
It develops a software system that produces and maintains
automatic annotation on selected vertebrate and
other eukaryote genomes.
http://www.ensembl.org
13. UCSC Genome Browser
Genome databases
UCSC: University of California, Santa Cruz.
This site contains the reference sequence and working
draft assemblies for a large collection of genomes.
http://genome.ucsc.edu/
14. Protein sequence databases
Proteins are most often the final unit of interest in research.
There is a direct conversion from DNA/RNA sequences to
protein sequences.
Gene IDs and protein IDs are used interchangeably by
researchers (biologists, not bioinformaticians)
Examples
UniProt: Universal Protein Resource (EBI)
Swiss-Prot (Swiss Institute of Bioinformatics)
InterPro: classifies proteins into families and predicts the
presence of domains and sites.
Pfam: protein families database of alignments and HMMs
(Sanger Institute)
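The "direct conversion" from DNA/RNA to protein sequences mentioned on this slide can be illustrated with the standard genetic code. This is a minimal sketch, assuming a coding sequence that starts in frame and contains only unambiguous bases; the function name and the example sequence are made up for illustration.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(sequence):
    """Translate an in-frame DNA or RNA coding sequence up to the first stop codon."""
    seq = sequence.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # prints MAW
```

Real annotation pipelines also have to deal with reading frames, alternative genetic codes, and ambiguous bases, none of which this sketch handles.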
15. RNA databases
Contain information about RNA molecules.
Most of them concern gene regulatory factors (gene
information is usually in other repositories).
Examples
miRBase: microRNAs
http://www.mirbase.org/
TRANSFAC: transcription factors in eukaryotes (proprietary
database).
JASPAR: transcription factor binding sites for eukaryotes
(open access, curated, non-redundant).
http://jaspar.genereg.net/
16. Protein-protein interactions
Proteins are the main functional units.
But they do not work in isolation.
Pretty useless at the moment but promising in the future:
some information is experimental, but most of it is
generated in silico.
Examples
IntAct: protein-small molecule
and protein-nucleic acid
interactions.
BIND: Biomolecular Interaction
Network Database.
17. Signal transduction pathway databases & Metabolic pathway databases
Information about how genes (or proteins) interact with one
another.
Not only physical interactions
Examples
Reactome: free online database of biological pathways.
http://www.reactome.org
KEGG: Kyoto Encyclopedia of Genes and Genomes.
Metabolic pathways.
http://www.genome.jp/kegg/pathway.html
19. Experimental data repositories
Contain microarray, NGS, Sanger, and other experimental
high-throughput data.
GEO: Gene Expression Omnibus (NCBI)
http://www.ncbi.nlm.nih.gov/geo/
ArrayExpress: database of functional genomics
experiments (EBI)
http://www.ebi.ac.uk/arrayexpress/
The Cancer Genome Atlas (TCGA): Data on different
cancer related tissues.
http://cancergenome.nih.gov/
20. Bioinformatics
Training
Biology 1/3
Statistics 1/3
Computer science 1/3
Efficiently combine:
Experimental information
Database registered knowledge
Time and resources:
As in the wet lab
22. Example I
Autistic children
1 (microarray) NGS data processing
data quality control, filtering...
map against reference genome
CNV calling
2 CNV filtering
just 75 rare de novo CNV events (not registered in
databases)
filter out the long ones
keep the ones that contain genes
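The CNV filtering step above can be sketched as three successive filters. This is only an illustration under stated assumptions: "not registered in databases" is approximated as "does not overlap any known CNV", "contain genes" is approximated as "overlaps at least one gene", and the length threshold, coordinates, and gene names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True when two intervals on the same chromosome share at least one position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def filter_cnvs(cnvs, known_cnvs, genes, max_length):
    """Keep CNVs that are novel, not too long, and overlap at least one gene."""
    kept = []
    for cnv in cnvs:
        if any(overlaps(cnv, known) for known in known_cnvs):
            continue  # already registered in a database (assumption: any overlap)
        if cnv.end - cnv.start > max_length:
            continue  # filter out the long ones (threshold is hypothetical)
        if not any(overlaps(cnv, gene) for gene in genes.values()):
            continue  # keep only the ones that contain genes
        kept.append(cnv)
    return kept


def affected_genes(cnvs, genes):
    """Move to the gene level: names of genes overlapped by any retained CNV."""
    return sorted(name for name, gene in genes.items()
                  if any(overlaps(cnv, gene) for cnv in cnvs))


# Toy data (made up for illustration).
cnvs = [Interval("chr1", 100, 200), Interval("chr1", 1000, 9000),
        Interval("chr2", 50, 80), Interval("chr3", 10, 20)]
known_cnvs = [Interval("chr2", 60, 70)]
genes = {"GENE_A": Interval("chr1", 150, 180),
         "GENE_B": Interval("chr1", 5000, 5100),
         "GENE_C": Interval("chr2", 55, 60)}

kept = filter_cnvs(cnvs, known_cnvs, genes, max_length=1000)
print(kept)
print(affected_genes(kept, genes))
```

For genome-scale data, an interval tree or a sorted sweep would replace the nested loops, but the filtering logic stays the same.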
23. Example II
3 move to the gene level
47 loci in total affecting 433 human genes
4 Building the background likelihood network
GO annotations
KEGG pathways
InterPro domains
protein-protein interactions. Databases: BIND, BioGRID,
DIP, HPRD, InNetDB, IntAct, BiGG, MINT, and MIPS
sequence homology between the gene pair (BLAST)
24. Example III
5 Search for high scoring clusters affected by CNVs
6 Evaluating significance of cluster scores:
10,000 simulations
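Evaluating significance by simulation usually means comparing the observed cluster score with scores obtained from random gene sets. The slides do not describe the scoring function or the sampling scheme, so the sketch below assumes a toy score (sum of per-gene weights) and uniform random sampling of genes; all names and numbers other than the 10,000 simulations are hypothetical.

```python
import random


def empirical_p_value(observed, simulated):
    """Fraction of simulated scores at least as large as the observed one.

    The +1 terms count the observed score itself, so the estimate is never zero.
    """
    exceed = sum(1 for score in simulated if score >= observed)
    return (exceed + 1) / (len(simulated) + 1)


def simulate_null_scores(score_fn, gene_pool, cluster_size, n_sim, seed=0):
    """Score n_sim random gene sets of the same size as the observed cluster."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [score_fn(rng.sample(gene_pool, cluster_size)) for _ in range(n_sim)]


# Toy setup: each gene gets an arbitrary weight; a cluster's score is their sum.
weights = {f"g{i}": i % 7 for i in range(200)}
gene_pool = list(weights)


def cluster_score(cluster):
    return sum(weights[gene] for gene in cluster)


observed_cluster = ["g6", "g13", "g20", "g27", "g34"]  # hypothetical cluster
observed = cluster_score(observed_cluster)
null_scores = simulate_null_scores(cluster_score, gene_pool,
                                   cluster_size=len(observed_cluster),
                                   n_sim=10_000)
print(empirical_p_value(observed, null_scores))
```

With 10,000 simulations the smallest p-value this estimator can report is 1/10,001, which sets the resolution of the significance test.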
25. Example IV
7 Functional characterization of the identified network
8 And, finally, draw conclusions