This document discusses different file organization methods including sequential files, indexed sequential files, indexed files, and direct/hashed files. Sequential files store records in the order they are entered with each record having a fixed format. Indexed sequential files add an index to allow random access by key fields while maintaining sequential ordering. Indexed files use multiple indexes on different keys to allow searching by different fields. Direct/hashed files directly access records by key values using hashing techniques for fast random access.
1 of 17
Downloaded 685 times
More Related Content
File Organization
1. FILES ORGANIZATION
PRESENTED BY :
KU MAN YI B031210161
MOHAMMAD SAFAR A/L SHARIFF MOHAMED B031210227
ADAM RIDHWAN BIN SUKIMAN B031210083
SOO BOON JIAN B031210199
MUHAMMAD ZULKHAIRI BIN ZAINI B031210140
3. THE PILE
ï‚¢ A form of file organization where data are collected in
the same order they arrived
ï‚¢ This organization simply accumulate mass of data and
save it
ï‚¢ Each field is self-describing, includes a field name and a
value. Length of each field is determined as shown
ï‚— Implicitly indicated by delimiters
ï‚— Explicitly included as subfield or
ï‚— Done by default for that particular field type
ï‚¢ Records may have different fields and there is no visible
structure. Hence, record access is through exhaustive
search where all record or the entire file is examined
ï‚¢ This type of file uses space efficiently and is updated
easily. However, retrieval of a single record could be
tedious
4. THE PILE ORGANIZATION
Variable-length records Delimiter
Delimiter
Delimiter
Delimiter
Delimiter
Delimiter
Variable-length records
Variable-length records
Variable-length records
Variable-length records
5. THE SEQUENTIAL FILE
ï‚¢ Most common form of file structure
ï‚¢ Fixed format are used for records and records are of the
same length
ï‚¢ Field names and lengths are attributes of the file
ï‚¢ One particular field (usually first field) is referred to as
the key field :-
ï‚— Uniquely identifies the record
ï‚— Records are stored in key sequence
ï‚¢ Alphabetical order or numerical order depending on the key field
ï‚¢ It is the simplest structure and easy to implement in any
device. However, in the worst case scenario, accessing
a single record takes a long time.
7. THE SEQUENTIAL FILE ORGANIZATION
ï‚¢ To enable a sequential form of records, new
records are placed in a log file or transaction file.
Then, a batch update is performed to merge the log
file with the master file to produce a new file with
the correct key sequence
1 2 n-1 n
…
Record
Terminators
8. THE INDEXED SEQUENTIAL FILE
ï‚¢ A file management system that allows records to be
accessed either sequentially (in the order they were
entered) or randomly (with an index)
ï‚¢ A secondary set of hash tables known as indexes is
created that contains pointers to the main file
ï‚¢ In indexed sequential file, records are organized in
sequence based on key fields
ï‚¢ Each file has an index to support random search
ï‚¢ Overflow file is added such as each record in
overflow file is located by following a pointer from
its predecessor record in main file
9. DESCRIPTION
ï‚¢ As an example:-
ï‚¢ Firstly, a single level of indexing is used. Hence, the
index is a simple sequential file.
ï‚¢ Each record in the index file consists of:-
ï‚— Key field(same as key field in the main file)
ï‚— Pointer to the main file
ï‚¢ To find a specific field, the index file is searched for
matching key values
ï‚¢ Then, the pointer indicates the record having the
matching key values as depicted by figure below
ï‚¢ This reduces the average search length
11. ï‚¢ There is an addition to the organization where each record in
the main file contains an additional field which is a pointer to
the overflow file
ï‚¢ When a new record is added, it is added into the overflow file.
The record in main file which is immediately before the new
record in a logical sequence to be inserted is updated to
contain a pointer to the new record in the overflow file
ï‚¢ However, if the record before the new record in logical
sequence is itself in the overflow file, then the pointer in that
record is updated
ï‚¢ The processing of entire file sequentially involves processing
of records of main files sequentially until the pointer to the
overflow file is found, then accessing is continued in the
overflow file until a null pointer is encountered
ï‚¢ Secondly, we visualize the organization using multiple levels
of indexes
ï‚¢ The lowest level of index points to the main file, whereas the
index files sitting on the level above points to the index file
below it
ï‚¢ Hence, the efficiency in access is greatly increased and
average length of search is greatly reduced as conveyed in
figure below
13. THE INDEXED FILE
ï‚¢ Uses multiple indexes for different key fields which
may be the subject of a search
ï‚¢ Record accessibility are through their indexes
ï‚¢ May contain an exhaustive index that contains one
entry for every record in the main file. The index
itself is organized in the sequential form for ease of
searching
ï‚¢ May contain a partial index. It contains entries to
records where the desired field exists
14. THE INDEXED FILE ORGANIZATION
Primary file
Exhaustive PartialExhaustive
Index Attributes
Pointer
Indicator
15. THE DIRECT/HASHED FILE
ï‚¢ This file management system that access directly any
block of a known address.
ï‚¢ A key field is required for each record.
ï‚¢ Uses hashing on the key value.
ï‚¢ No concept of sequential ordering implemented.
ï‚¢ Such examples are:-
ï‚— Directories
ï‚— Pricing table
ï‚— Schedules
ï‚— Name lists
ï‚¢ It is used where rapid access is required, where fixed-
length records are used and where records are
accessed one-at-a-time.
ï‚¢ The concept of hashing can be shown as below.
17. HASHING METHODS
ï‚¢ Direct method
ï‚— The key is the address
ï‚— a key value might be between 1-100. The address of a certain
records will be the same as the key.
ï‚¢ Subtraction method
ï‚— Subtract certain amount of numbers from the key
ï‚— a key value might start with 1001 and ends with 1100. A simple
hashing function could subtract 1000 from the key to determine
the address.
ï‚¢ Modulo-division method
ï‚— Key value is divided by the size of records.
ï‚— the remainder becomes the address of the key.
ï‚¢ Digit-extraction method
ï‚— Selected digits are extracted from the key
ï‚— As an example:- a field has 10 digits. The hashing function only
extracts the first 2 digits and the last digit and use them as
address.