How do you solve a problem like a
biological database?
(BNF 216 - Database Modeling and Design for Bioinformatics)
Arjei Balandra
Software Developer
National Telehealth Center
University of the Philippines Manila
http://bumblebest.net
Database
 A database is a set of data that has a regular
structure and that is organized in such a way
that a computer can easily find the
desired information.
- The Linux Information Project (http://www.linfo.org/database.html)
Biological Database
 Biological databases are libraries of life
sciences information collected from scientific
experiments, published literature, high-throughput
experiment technology, and
computational analyses.
- Wikipedia (en.wikipedia.org/wiki/Biological_database)
NCBI - GenBank
European Nucleotide Archive - EMBL-EBI
DDBJ - DNA Data Bank of Japan
Why a Database?
 Data-intensive techniques such as high-throughput
screening and gene expression
experiments demand methods to correlate
large and diverse datasets.
 Databases integrate information from a
variety of sources allowing faster and more
powerful searches.
Tip #1: DO A GOOD DATABASE DESIGN
Good Database Design
 Provides easy access to previous results.
 Supports both expert- and machine-guided
searches for novel correlations in data.
Bad Database Design
 Obfuscates the correlations for which the user
is searching.
 Makes it difficult for biologists to fit their data
into the database or to find previously stored
data, resulting in user contempt.
 Is brittle (breaks easily when requirements change).
Tip #2: LEARN FROM EXISTING LITERATURE
 Generalizations
 Incorporate existing schema into the database
design
 Use existing structures for common data
Generalizations
aMAZE (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC308873/figure/gkh139f2/)
Tip #3: RESPECT THE UNIQUE NEEDS OF
BIOLOGISTS (AND USERS)
Business rules
 constraints
 based on data derived from the real-world
entities
 specific to the needs of the organization.
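One way to read the slide above is that a business rule can be written into the schema as a constraint, so the database itself refuses data that breaks the rule. The sketch below is a minimal illustration using Python's built-in sqlite3 module. The `sample` table, its columns, and the pH rule are hypothetical examples, not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical business rules for a lab's samples:
#   - every sample must name its organism and collection date (NOT NULL)
#   - a recorded pH must lie on the 0-14 scale (CHECK)
conn.execute(
    """
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        organism  TEXT NOT NULL,
        ph        REAL CHECK (ph BETWEEN 0 AND 14),
        collected TEXT NOT NULL
    )
    """
)


def try_insert(organism, ph, collected):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sample (organism, ph, collected) VALUES (?, ?, ?)",
                (organism, ph, collected),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("Escherichia coli", 7.2, "2013-06-01")         # satisfies the rules
rejected = try_insert("Escherichia coli", 19.0, "2013-06-01")  # pH outside 0-14
print(ok, rejected)
```

Only the first row is stored; the second is rejected by the CHECK constraint before it reaches the table.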
What do they need?
 Use free-text comments
 Create user-specific categories
Dealing with Business Rules
User-Specific Categories
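Free-text comments and user-specific categories can sit beside the structured data instead of being forced into it. The following sketch shows one possible layout in Python with sqlite3. All table names, column names, the user "ana", and the sample rows are hypothetical and only illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript(
    """
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,
        symbol  TEXT NOT NULL UNIQUE
    );
    -- Free-text comments: any user can attach notes to a gene.
    CREATE TABLE gene_comment (
        comment_id INTEGER PRIMARY KEY,
        gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
        author     TEXT NOT NULL,
        body       TEXT NOT NULL
    );
    -- User-specific categories: each user owns a private set of labels.
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        owner       TEXT NOT NULL,
        name        TEXT NOT NULL,
        UNIQUE (owner, name)
    );
    CREATE TABLE gene_category (
        gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
        category_id INTEGER NOT NULL REFERENCES category(category_id),
        PRIMARY KEY (gene_id, category_id)
    );
    """
)

with conn:
    conn.execute("INSERT INTO gene (gene_id, symbol) VALUES (1, 'TP53')")
    conn.execute(
        "INSERT INTO gene_comment (gene_id, author, body) VALUES (?, ?, ?)",
        (1, "ana", "Band looked faint on the second gel; repeat."),
    )
    conn.execute(
        "INSERT INTO category (category_id, owner, name) VALUES (1, 'ana', 'follow-up')"
    )
    conn.execute("INSERT INTO gene_category (gene_id, category_id) VALUES (1, 1)")

# List the genes that user 'ana' has filed under her own categories.
rows = conn.execute(
    """
    SELECT g.symbol, c.name
    FROM gene g
    JOIN gene_category gc ON gc.gene_id = g.gene_id
    JOIN category c ON c.category_id = gc.category_id
    WHERE c.owner = ?
    """,
    ("ana",),
).fetchall()
print(rows)
```

Keeping comments and categories in their own tables lets users record what matters to them without changing the core schema.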
Tip #4: DESIGN THE DATABASE BEFORE
BUILDING IT
Tip #5: USE THE DATABASE TO ENFORCE
DATA INTEGRITY
Normalization
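As a rough illustration of how normalization supports data integrity, the sketch below stores each organism name once and has protein rows refer to it through a foreign key. It uses Python's sqlite3 module. The tables, columns, and accession strings are hypothetical placeholders, not the schema shown in the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript(
    """
    -- Each organism name is stored exactly once...
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    -- ...and proteins point to it instead of repeating the name.
    CREATE TABLE protein (
        protein_id  INTEGER PRIMARY KEY,
        accession   TEXT NOT NULL UNIQUE,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
    );
    """
)

with conn:
    conn.execute("INSERT INTO organism (organism_id, name) VALUES (1, 'Homo sapiens')")
    conn.executemany(
        "INSERT INTO protein (accession, organism_id) VALUES (?, ?)",
        [("ACC0001", 1), ("ACC0002", 1)],  # placeholder accessions
    )

# The database rejects a protein that refers to an organism that does not exist.
try:
    with conn:
        conn.execute(
            "INSERT INTO protein (accession, organism_id) VALUES (?, ?)",
            ("ACC0003", 99),
        )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Correcting the name in one place updates it for every protein.
with conn:
    conn.execute(
        "UPDATE organism SET name = 'Homo sapiens (human)' WHERE organism_id = 1"
    )
names = conn.execute(
    "SELECT DISTINCT o.name FROM protein p JOIN organism o USING (organism_id)"
).fetchall()
print(orphan_rejected, names)
```

Because the name lives in a single row, there is no way for two proteins of the same organism to disagree about how it is spelled.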
Tip #6: KEEP THE DATABASE SCOPE
MANAGEABLE
 In biology, one size does not fit all
 Focus on a subset of biology (e.g., genes,
proteins)
 For large subsets, tackle them one at a time
 Inclusive
Keep the database scope manageable
Tip #7: LISTEN TO THE PEOPLE WHO HAVE TO
WRITE AND USE THE INTERFACE
 A database is successful only when people
use it
Users know what they want and need
+ Developers know what they can do
+ Designers know what must be done
---------------------------------------------------------
= Collaborative approach to develop a
successful database
Tip #8: TEST THE DESIGN WITH REALISTIC
DATA
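One way to act on this tip is to load the schema with data of realistic size and shape, then check how the database plans the queries users will run. The sketch below is only an illustration in Python with sqlite3. The `seq_read` table, the randomly generated rows (a stand-in for a real export, which would be a better test), and the helper `uses_index` are all hypothetical.

```python
import random
import sqlite3


def uses_index(conn, sql, params=()):
    """Return True if SQLite's query plan for `sql` mentions an index lookup."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is a human-readable description.
    return any("INDEX" in row[-1] for row in plan)


rng = random.Random(0)  # fixed seed so the run is repeatable
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seq_read (
        read_id   INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL,
        bases     TEXT NOT NULL
    )
    """
)

# Synthetic stand-in data: 5,000 short reads spread over 500 samples.
rows = [
    (rng.randrange(500), "".join(rng.choice("ACGT") for _ in range(30)))
    for _ in range(5000)
]
with conn:
    conn.executemany("INSERT INTO seq_read (sample_id, bases) VALUES (?, ?)", rows)

query = "SELECT bases FROM seq_read WHERE sample_id = ?"

before = uses_index(conn, query, (42,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_seq_read_sample ON seq_read (sample_id)")
after = uses_index(conn, query, (42,))   # the planner can now use the index
print(before, after)
```

Checks like this are cheap to run during design and tend to surface missing indexes or awkward table layouts before users do.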
Tip #9: MAKE THE DATABASE STRUCTURE
UNDERSTANDABLE AND
EASY TO MAINTAIN
Designing Biological Databases
THANK YOU!
REPLACE(quote,
pagmamahal,
data);
quote
References
 The Linux Information Project
(http://www.linfo.org/database.html)
 Nelson, M.R., Reisinger, S.J., Henry, S. (2003). Designing
databases to store biological information. BIOSILICO,
Vol. 1, No. 4.
 Wikipedia (en.wikipedia.org/wiki/Biological_database)
 Lemer, C., Antezana, E., Couche, F., Fays, F., Santolaria,
X., Janky, R., ... Wodak, S. J. (2004). The aMAZE
LightBench: a web interface to a relational database
of cellular processes. Nucleic Acids
Research, 32(Database issue), D443-D448.
doi:10.1093/nar/gkh139
