際際滷

際際滷Share a Scribd company logo
Overview of Data Management 
Grant Weddell 
David R. Cheriton School of Computer Science 
University of Waterloo 
CS 348 
Introduction to Database Management 
Spring 2014 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 1 / 22 
Notes
Course Logistics 
Webpage  www.cs.uwaterloo.ca/gweddell/cs348 
Text Book  Database Management Systems (3rd Edition). 
Raghu Ramakrishnan and Johannes Gehrke. 
McGraw Hill, 2000. 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 2 / 22 
Notes
Course Content 
Why do we use databases? 
 Functionality provided by a Database Management System 
 Database Models: Relational, Network, OO 
How do we use a DBMS? 
 Relational model, foundational query languages 
 SQL 
 Application programming 
 Transactions and concurrency 
How do we design a database? 
 Entity-Relationship (ER) modeling 
 Dependencies and constraints 
 Redundancy and normal forms 
How do we administer a DBMS? 
 Security and authorization 
 Physical design/tuning 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 3 / 22 
Notes
What is a Database? 
Definition (Database) 
A large and persistent collection of (more-or-less similar) pieces of 
information organized in a way that facilitates efficient retrieval and 
modification. 
Examples: 
 a file cabinet 
 a library system 
 a personnel management system 
Definition (Database Management System (DBMS)) 
A program (or set of programs) that manages details related to storage 
and access for a database. 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 4 / 22 
Notes
Application of Databases 
Original  inventory control 
 payroll 
 banking and financial systems 
 reservation systems 
More recent  computer aided design (CAD) 
 software development (CASE, SDE/SSE) 
 telecommunication systems 
 e-commerce 
 dynamic/personalized web content 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 5 / 22 
Notes
Application of Databases (contd) 
Common Circumstances: 
 There is lots of data (mass storage) 
 Data is formatted 
 Requirements: 
 persistence and reliability 
 efficient and concurrent access 
 Issues: 
 many files with different structure 
 shared files or replicated data 
 need to exchange data (translation programs) 
Note 
The data maintained by the system are much more important and 
valuable than the system itself. 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 6 / 22 
Notes
Brief History of Data Management: Ancient 
2000 BC: Sumerian Records 
350 BC: Syllogisms (Aristotle) 
296 BC: Library of Alexandria 
1879: Modern Logic (Frege) 
1884: U.S. Census (Hollerith) 
1941: Model Theory (Tarski) 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 7 / 22 
Notes
Brief History of Data Management: 1950s 
First generation 50s and 60s 
 batch processing 
 sequential files and tape 
 input on punched cards 
Second generation (60s) 
 disk enabled random access files 
 new access methods (ISAM, hash files) 
 mostly batch with some interactivity 
 independent applications with separate files 
 growing applications base 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 8 / 22 
Notes
Brief History of Data Management: 1960s (contd) 
As the application base grows, we end up with 
 many shared files 
 a multitude of file structures 
 a need to exchange data among applications 
This causes a variety of problems 
 redundancy: multiple copies 
 inconsistency: independent updates 
 inaccuracy: concurrent updates 
 incompatibility: multiple formats 
 insecurity: proliferation 
 inauditability: poor chain of responsibility 
 inflexibility: changes are difficult to apply 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 9 / 22 
Notes
Brief History of Data Management: 1960s (contd) 
 Hierarchical data model 
 IBMs Information Management System (IMS): concurrent access 
 only allows 1:N parent-child relationships (i.e. a tree) 
 hierarchy can be exploited for efficiency 
 queries navigate up and down treesone record at a time 
 data access language embedded in business processing language 
 difficult to express some queries 
 Network data model 
 Charles Bachmans Integrated Data Store (IDS) 
 model standardized by Conference On DAta SYstems Languages 
(CODASYL) 
 data organized as collections of sets of records 
 separation of physical data representation from users view of data 
 pointers between records represent relationships 
 set types encoded as lists 
 queries navigate between recordsstill one record at a time 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 10 / 22 
Notes
Database Management System 
Idea 
Abstracts common functions and creates a uniform well defined 
interface for applications accessing data. 
1 Data Model 
all data stored in a well defined way 
2 Access control 
only authorized people get to see/modify it 
3 Concurrency control 
multiple concurrent applications access data 
4 Database recovery 
nothing gets accidentally lost 
5 Database maintenance 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 11 / 22 
Notes
Brief History of Data Management: 1970s 
 Edgar Codd proposes relational data model (1970) 
 firm mathematical foundation ! declarative queries 
 Charles Bachman wins ACM Turing award (1973) 
 The Programmer as Navigator 
 Peter Chen proposes E-R model (1976) 
 Transaction concepts (Jim Gray and others) 
 IBMs System R and UC Berkeleys Ingres systems demonstrate 
feasibility of relational DBMS (late 1970s) 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 12 / 22 
Notes
Three Level Schema Architecture 
Definition (Schema) 
A schema is a description of the data interface to the database (i.e., 
how the data is organized). 
1 External schema (view): what the application programs and user 
see. May differ for different users of the same database. 
2 Conceptual schema: description of the logical structure of all data 
in the database. 
3 Physical schema: description of physical aspects (selection of files, 
devices, storage algorithms, etc.) 
Definition (Instance) 
A database instance is a database (real data) that conforms to a 
given schema. 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 13 / 22 
Notes
Three-level Schema Architecture (cont.) 
External 
Schema 
Conceptual 
Schema 
Internal 
Schema 
Conceptual 
view 
Internal 
view 
External 
view 
External 
view 
External 
view 
Users/ 
Applications 
DBMS 
Database 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 14 / 22 
Notes
Data Independence 
Idea 
Applications do not access data directly but, rather through an 
abstract data model provided by the DBMS. 
Two kinds of data independence: 
Physical: applications immune to changes in storage structures 
Logical: applications immune to changes in data organization 
Note 
One of the most important reasons to use a DBMS! 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 15 / 22 
Notes
Interfacing to the DBMS 
Data Definition Language (DDL): for specifying schemas 
 may have different DDLs for external schema, conceptual schema, 
internal schema 
 information is stored in the data dictionary, or catalog 
Data Manipulation Language (DML): for specifying queries and 
updates 
 navigational (procedural) 
 non-navigational (declarative) 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 16 / 22 
Notes
Types of Database Users 
End user: 
 Accesses the database indirectly through forms or other 
query-generating applications, or 
 Generates ad-hoc queries using the DML. 
Application developer: 
 Designs and implements applications that access the database. 
Database administrator (DBA): 
 Manages conceptual schema. 
 Assists with application view integration. 
 Monitors and tunes DBMS performance. 
 Defines internal schema. 
 Loads and reformats database. 
 Is responsible for security and reliability. 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 17 / 22 
Notes
Transactions 
When multiple applications access the same data, undesirable results 
occur. 
Example: 
withdraw(AC,1000) withdraw(AC,500) 
Bal := getbal(AC) 
Bal := getbal(AC) 
if (Bal1000) if (Bal500) 
give-money give-money 
setbal(AC,Bal-1000) 
setbal(AC,Bal-500) 
Idea 
Every application may think it is the sole application accessing the 
data. The DBMS should guarantee correct execution. 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 18 / 22 
Notes
Transactions (contd) 
Definition (Transaction) 
An application-specified atomic and durable unit of work. 
Properties of transactions ensured by the DBMS: 
Atomic: a transaction occurs entirely, or not at all 
Consistency: each transaction preserves the consistency 
of the database 
Isolated: concurrent transactions do not interfere 
with each other 
Durable: once completed, a transactions changes 
are permanent 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 19 / 22 
Notes
Brief History of Data Management: 1980s 
 Development of commercial relational technology 
 IBM DB2, Oracle, Informix, Sybase 
 Edgar Codd wins ACM Turing award (1981) 
 SQL standardization efforts through ANSI and ISO 
 Object-oriented DBMSs 
 persistent objects 
 object ids, methods, inheritence 
 navigational interface reminicent of hierarchical model 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 20 / 22 
Notes
Brief History of Data Management: 1990s-Present 
 Continued expansion of SQL and system capabilities 
 New application areas: 
 the Internet 
 On-Line Analytic Processing (OLAP) 
 data warehousing 
 embedded systems 
 multimedia 
 XML 
 data streams 
 Jim Gray wins ACM Turing award (1998) 
 Relational DBMSs incorporate objects (late 1990s) 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 21 / 22 
Notes
Summary 
Using a DBMS to manage data helps: 
 to remove common code from applications 
 to provide uniform access to data 
 to guarantee data integrity 
 to manage concurrent access 
 to protect against system failure 
 to set access policies for data 
CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 22 / 22 
Notes

More Related Content

1 overview-handout-notes

  • 1. Overview of Data Management Grant Weddell David R. Cheriton School of Computer Science University of Waterloo CS 348 Introduction to Database Management Spring 2014 CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 1 / 22 Notes
  • 2. Course Logistics Webpage www.cs.uwaterloo.ca/gweddell/cs348 Text Book Database Management Systems (3rd Edition). Raghu Ramakrishnan and Johannes Gehrke. McGraw Hill, 2000. CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 2 / 22 Notes
  • 3. Course Content Why do we use databases? Functionality provided by a Database Management System Database Models: Relational, Network, OO How do we use a DBMS? Relational model, foundational query languages SQL Application programming Transactions and concurrency How do we design a database? Entity-Relationship (ER) modeling Dependencies and constraints Redundancy and normal forms How do we administer a DBMS? Security and authorization Physical design/tuning CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 3 / 22 Notes
  • 4. What is a Database? Definition (Database) A large and persistent collection of (more-or-less similar) pieces of information organized in a way that facilitates efficient retrieval and modification. Examples: a file cabinet a library system a personnel management system Definition (Database Management System (DBMS)) A program (or set of programs) that manages details related to storage and access for a database. CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 4 / 22 Notes
  • 5. Application of Databases Original inventory control payroll banking and financial systems reservation systems More recent computer aided design (CAD) software development (CASE, SDE/SSE) telecommunication systems e-commerce dynamic/personalized web content CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 5 / 22 Notes
  • 6. Application of Databases (contd) Common Circumstances: There is lots of data (mass storage) Data is formatted Requirements: persistence and reliability efficient and concurrent access Issues: many files with different structure shared files or replicated data need to exchange data (translation programs) Note The data maintained by the system are much more important and valuable than the system itself. CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 6 / 22 Notes
  • 7. Brief History of Data Management: Ancient 2000 BC: Sumerian Records 350 BC: Syllogisms (Aristotle) 296 BC: Library of Alexandria 1879: Modern Logic (Frege) 1884: U.S. Census (Hollerith) 1941: Model Theory (Tarski) CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 7 / 22 Notes
  • 8. Brief History of Data Management: 1950s First generation 50s and 60s batch processing sequential files and tape input on punched cards Second generation (60s) disk enabled random access files new access methods (ISAM, hash files) mostly batch with some interactivity independent applications with separate files growing applications base CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 8 / 22 Notes
  • 9. Brief History of Data Management: 1960s (contd) As the application base grows, we end up with many shared files a multitude of file structures a need to exchange data among applications This causes a variety of problems redundancy: multiple copies inconsistency: independent updates inaccuracy: concurrent updates incompatibility: multiple formats insecurity: proliferation inauditability: poor chain of responsibility inflexibility: changes are difficult to apply CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 9 / 22 Notes
  • 10. Brief History of Data Management: 1960s (contd) Hierarchical data model IBMs Information Management System (IMS): concurrent access only allows 1:N parent-child relationships (i.e. a tree) hierarchy can be exploited for efficiency queries navigate up and down treesone record at a time data access language embedded in business processing language difficult to express some queries Network data model Charles Bachmans Integrated Data Store (IDS) model standardized by Conference On DAta SYstems Languages (CODASYL) data organized as collections of sets of records separation of physical data representation from users view of data pointers between records represent relationships set types encoded as lists queries navigate between recordsstill one record at a time CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 10 / 22 Notes
  • 11. Database Management System Idea Abstracts common functions and creates a uniform well defined interface for applications accessing data. 1 Data Model all data stored in a well defined way 2 Access control only authorized people get to see/modify it 3 Concurrency control multiple concurrent applications access data 4 Database recovery nothing gets accidentally lost 5 Database maintenance CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 11 / 22 Notes
  • 12. Brief History of Data Management: 1970s Edgar Codd proposes relational data model (1970) firm mathematical foundation ! declarative queries Charles Bachman wins ACM Turing award (1973) The Programmer as Navigator Peter Chen proposes E-R model (1976) Transaction concepts (Jim Gray and others) IBMs System R and UC Berkeleys Ingres systems demonstrate feasibility of relational DBMS (late 1970s) CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 12 / 22 Notes
  • 13. Three Level Schema Architecture Definition (Schema) A schema is a description of the data interface to the database (i.e., how the data is organized). 1 External schema (view): what the application programs and user see. May differ for different users of the same database. 2 Conceptual schema: description of the logical structure of all data in the database. 3 Physical schema: description of physical aspects (selection of files, devices, storage algorithms, etc.) Definition (Instance) A database instance is a database (real data) that conforms to a given schema. CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 13 / 22 Notes
  • 14. Three-level Schema Architecture (cont.) External Schema Conceptual Schema Internal Schema Conceptual view Internal view External view External view External view Users/ Applications DBMS Database CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 14 / 22 Notes
  • 15. Data Independence Idea Applications do not access data directly but, rather through an abstract data model provided by the DBMS. Two kinds of data independence: Physical: applications immune to changes in storage structures Logical: applications immune to changes in data organization Note One of the most important reasons to use a DBMS! CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 15 / 22 Notes
  • 16. Interfacing to the DBMS Data Definition Language (DDL): for specifying schemas may have different DDLs for external schema, conceptual schema, internal schema information is stored in the data dictionary, or catalog Data Manipulation Language (DML): for specifying queries and updates navigational (procedural) non-navigational (declarative) CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 16 / 22 Notes
  • 17. Types of Database Users End user: Accesses the database indirectly through forms or other query-generating applications, or Generates ad-hoc queries using the DML. Application developer: Designs and implements applications that access the database. Database administrator (DBA): Manages conceptual schema. Assists with application view integration. Monitors and tunes DBMS performance. Defines internal schema. Loads and reformats database. Is responsible for security and reliability. CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 17 / 22 Notes
  • 18. Transactions When multiple applications access the same data, undesirable results occur. Example: withdraw(AC,1000) withdraw(AC,500) Bal := getbal(AC) Bal := getbal(AC) if (Bal1000) if (Bal500) give-money give-money setbal(AC,Bal-1000) setbal(AC,Bal-500) Idea Every application may think it is the sole application accessing the data. The DBMS should guarantee correct execution. CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 18 / 22 Notes
  • 19. Transactions (contd) Definition (Transaction) An application-specified atomic and durable unit of work. Properties of transactions ensured by the DBMS: Atomic: a transaction occurs entirely, or not at all Consistency: each transaction preserves the consistency of the database Isolated: concurrent transactions do not interfere with each other Durable: once completed, a transactions changes are permanent CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 19 / 22 Notes
  • 20. Brief History of Data Management: 1980s Development of commercial relational technology IBM DB2, Oracle, Informix, Sybase Edgar Codd wins ACM Turing award (1981) SQL standardization efforts through ANSI and ISO Object-oriented DBMSs persistent objects object ids, methods, inheritence navigational interface reminicent of hierarchical model CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 20 / 22 Notes
  • 21. Brief History of Data Management: 1990s-Present Continued expansion of SQL and system capabilities New application areas: the Internet On-Line Analytic Processing (OLAP) data warehousing embedded systems multimedia XML data streams Jim Gray wins ACM Turing award (1998) Relational DBMSs incorporate objects (late 1990s) CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 21 / 22 Notes
  • 22. Summary Using a DBMS to manage data helps: to remove common code from applications to provide uniform access to data to guarantee data integrity to manage concurrent access to protect against system failure to set access policies for data CS 348 (Intro to DB Mgmt) Overview of Data Management Spring 2014 22 / 22 Notes