The document describes part of a school database containing student records and class enrollment information. It identifies several issues with data integrity and redundancy that could impact the database's ability to reliably store and retrieve information. Specifically, it notes typos, duplicate or inconsistent data, and non-key fields being included could lead to incorrect or ambiguous results when querying the data. Relational databases aim to reduce these problems by structuring data into normalized tables linked by unique keys.
2. School Database
Student ID Student Student Form Sex Age Class Class Room
Surname FirstName Name Teacher Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
S10412 CRIST Graeme 10D M 14 GEO101 RW 532
S10412 CRIST Graeme 10D M 14 HIS101 HJD 541
S10101 CRIST Graeme 10D M 14 Maths101 PRJ 651
S10414 HASLETT Oliver 10D M 14 Maths101 PRJ 651
S10101 FUNG JUNE 10A F 15 Maths101 PRJ 651
S10101 FUNG JUNE 10B F 15 GEO101 HJD 532
S10101 FUNF JUNE 10A F 15 HIS101 HJD 541
This is part of a database showing all students and
their classes in a school.
3. Data integrity and Data Redundancy
These issues can have a significant impact on
the ability for a database to actually do the
job that it is intended to do.
Data Integrity is the idea that information
always keeps its meaning. This is particularly
difficult when databases get really big. If you
imagine the database above includes every
student and every class in the school it could
contain over 10000 records.
4. School Database
Student ID Student Student Form Sex Age Class Class Room
Surname FirstName Name Teacher Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
S10412 CRIST Graeme 10D M 14 GEO101 RW 532
S10412 CRIST Graeme 10D M 14 HIS101 HJD 541
S10101 CRIST Graeme 10D M 14 Maths101 PRJ 651
S10414 HASLETT Oliver 10D M 14 Maths101 PRJ 651
S10101 FUNG JUNE 10A F 15 Maths101 PRJ 651
S10101 FUNG JUNE 10B F 15 GEO101 HJD 532
S10101 FUNF JUNE 10A F 15 HIS101 HJD 541
This is part of a database showing all students and their classes in a school.
5. Examples of Data integrity
Examples of Data integrity
A simple typo means that it might mean
that information gets lost if we search for June
in the data. This problem is much more likely
because of the redundancies issues we will look
at later.
6. School Database
Student ID Student Student Form Sex Age Class Class Room
Surname FirstName Name Teacher Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
S10412 CRIST Graeme 10D M 14 GEO101 RW 532
S10412 CRIST Graeme 10D M 14 HIS101 HJD 541
S10101 CRIST Graeme 10D M 14 Maths101 PRJ 651
S10414 HASLETT Oliver 10D M 14 Maths101 PRJ 651
S10101 FUNG JUNE 10A F 15 Maths101 PRJ 651
S10101 FUNG JUNE 10B F 15 GEO101 HJD 532
S10101 FUNF JUNE 10A F 15 HIS101 HJD 541
This is part of a database showing all students and their classes in a school.
7. Examples of Data integrity
Examples of Data integrity
This example can cause real problems it
shows that the same information in the
database can mean two different things is
s10101 Graeme or June.
8. School Database
Student ID Student Student Form Sex Age Class Class Room
Surname FirstName Name Teacher Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
S10412 CRIST Graeme 10D M 14 GEO101 RW 532
S10412 CRIST Graeme 10D M 14 HIS101 HJD 541
S10101 CRIST Graeme 10D M 14 Maths101 PRJ 651
S10414 HASLETT Oliver 10D M 14 Maths101 PRJ 651
S10101 FUNG JUNE 10A F 15 Maths101 PRJ 651
S10101 FUNG JUNE 10B F 15 GEO101 HJD 532
S10101 FUNF JUNE 10A F 15 HIS101 HJD 541
This is part of a database showing all students and their classes in a school.
9. Examples of Data integrity
Examples of Data integrity
Here we have June in two classes? Which
one is right? What if we need to find June in
registration urgently
10. Hard to find?
They can be really hard to find in this Database
there are more integrity errors can you find
them?
11. Student ID Student Student Form Sex Age Class Class Room
Surname FirstName Name Teacher Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
S10412 CRIST Graeme 10D M 14 GEO101 RW 532
S10412 CRIST Graeme 10D M 14 HIS101 HJD 541
S10101 CRIST Graeme 10D M 14 Maths101 PRJ 651
S10414 HASLETT Oliver 10D M 14 Maths101 PRJ 651
S10101 FUNG JUNE 10A F 15 Maths101 PRJ 651
S10101 FUNG JUNE 10B F 15 GEO101 HJD 532
S10101 FUNF JUNE 10A F 15 HIS101 HJD 541
12. Bad or unnecessary Fields Names
Age is an example of a bad field name for the
obvious reason that it can change at anytime.
It would be much better to use DOB as age
can always be calculated and ofcourse it does
not change.
Form may also be unnecessary in this case as
the reference number is CODED to give the
form.
Can you explain the code?
13. Data Redundancy
Data redundancy is the where data needs to be
entered and stored on more than one occasion.
For example if you look at the table you can see
that we have entered Oliver Hasletts first name
and surname every time we put him into a
class. This can cause problems such as
increasing the chance of someone entering his
name wrong. It also means more time is
needed to enter data and more space is needed
when it is saved.
14. School Database
Student ID Student Student Form Sex Age Class Class Room
Surname FirstName Name Teacher Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
S10412 CRIST Graeme 10D M 14 GEO101 RW 532
S10412 CRIST Graeme 10D M 14 HIS101 HJD 541
S10101 CRIST Graeme 10D M 14 Maths101 PRJ 651
S10414 HASLETT Oliver 10D M 14 Maths101 PRJ 651
S10101 FUNG JUNE 10A F 15 Maths101 PRJ 651
S10101 FUNG JUNE 10B F 15 GEO101 HJD 532
S10101 FUNF JUNE 10A F 15 HIS101 HJD 541
Table showing data redundancy .
15. Relational Databases
The simple way to think about this is that
when you create a table of data in a database
you should have fields which directly relate to
the key field of the table.
Student Student Student First Form Sex Age Class Name Class Teacher Room
ID Surname Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
16. Relational Databases
In our table we can see that the Class teacher,
Class Name and Room Number are not related
to the Student. The Student ID is unique so
the first name, surname, form, Sex is all
related to the student but the other fields do
not.
Student Student Student First Form Sex Age Class Name Class Teacher Room
ID Surname Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
17. Relational Databases
We can now look at how to remove redundancy
issues a process called normalization.
This can be very complex and we are only going
to look it is a simple way.
Student Student Student First Form Sex Age Class Name Class Teacher Room
ID Surname Number
S10204 HALL Taighan 10B F 13 Maths101 PRJ 651
S10204 HALL Taighan 10B M 13 GEO101 RW 532
S10204 HALL Taighan 10B F 13 SCI10 CW 161
18. Removing Redundancy
(Normalization)
Instead of one single table we create 2 tables as
shown below.
CLASS STUDENT
ClassID StudentID
Class Name Student First Name
Class Teacher Student Surname
Room Number Student DOB
Student Form
19. Removing Redundancy
(Normalization)
CLASS STUDENT
ClassID StudentID
Class Name Student First Name
Class Teacher Student Surname
Room Number Student DOB
Student Form
Unfortunately these tables have what is called a many to
many relationship. That is there are many students in
each class and there are many classes for each student.
This kind of relationship will not work for what we need.
20. Step 2
To solve this problem we create an extra table which contains
the StudentID and ClassID for each student in each class.
What this does is remove the many to much relationship and
replace it with a one to many or many to one relationship.
That is one student can be in many classes and one class has
many students.
We can use the relationships to find the information we need in
each case making use of something called Queries that we will
look at later.
21. How it looks
STUDENT
CLASS
StudentID
ClassID
FirstName
Class Name
Surname
Class Teacher
DOB
Room Number
Form
STUDENTTAKES
StudentID
ClassID
This goes a long way to solving our problem but there remains
a potential problem with the room number field which could
give use problems in what we want to do with the data. Why ?
22. STEP 3 (optional?)
STUDENT
CLASS
StudentID
ClassID
FirstName STUDENTTAKES Class Name
Surname StudentID Class Teacher
DOB ClassID Room Number
Form
Room number is not actually related to the class. What happens if a
class is in more than one room for example? If you were to do the
normalization properly you would need to remove Room Number
and create another table.
23. Entity Relationship Diagram
This is called an Entity Relationship Diagram and they are
used to show the data structure in a database.
Student StudentTakes Class
Room Class
Room
24. Try it for your self
Lets look at our example we are going to create a database to
store information about videos stored in our library and students
who can borrower from the library. Using the fields listed below
you are going to need to create the table structure for our
Database in Normalized form.
Video Video Video Video Student Student Student Video Borrow
Title Year Description Genre First Name Surname ID ID Data
Titanic 1997 Film about a Romance Kate Winslet 101234 R101 01/12/11
sinking ship
Titanic 1997 Film about a Romance Leonardo Di Caprio 120123 R101 06/12/11
sinking ship
Star 1977 Film about space Space Mark Hamill 130232 s101 05/11/11
Wars
Star 1977 Film about space Space Harrison Ford 101222 101 04/12/11
Wars