This document discusses different types of keyword indexing systems used for information retrieval, including KWIC (Keyword-In-Context), KWOC (Keyword-Out-Of-Context), and KWAC (Keyword-Augmented-In-Context). With examples, it explains how each system works, from selecting keywords in document titles to presenting them with their surrounding context or on their own. The main advantages are speed of production and no need for trained indexing staff; the disadvantages include the scattering of related topics and the need for extensive searching.
Definition of Keyword Indexing:
An indexing system that does not control its vocabulary may be
referred to as Natural Language Indexing or sometimes as Free
Text Indexing, and keyword indexing is also known by these names.
A keyword is a catchword, i.e. a significant or subject-denoting
word taken mainly from the title and sometimes from the abstract
or text of a document for the purpose of indexing. Keyword
indexing is thus based on the natural language of the documents
to generate index entries, and no controlled vocabulary is required.
Keyword indexing is not new. It existed in the nineteenth
century, when it was referred to as catchword indexing.
Computers began to be used to aid information retrieval systems
in the 1950s. H P Luhn and his associates produced machine-generated
permuted title indexes and distributed copies of them at
the International Conference on Scientific Information held in
Washington in 1958. Luhn named this the Keyword-In-Context
(KWIC) index and described the method of generating a
KWIC index in a paper. The American Chemical Society established the
value of KWIC after adopting it in 1961 for its publication
Chemical Titles.
Uses of Keyword Index:
A number of indexing and abstracting services prepare their
subject indexes using keyword indexing techniques. These are
essentially variations of the basic keyword indexing described
above.
Some notable examples are:
1. Chemical Titles;
2. BASIC (Biological Abstracts Subject In Context);
3. Keyword Index of Chemical Abstracts;
4. CBAC (Chemical Biological Activities);
5. KWIT (Keyword-In-Title) of Lawrence Berkeley Laboratory;
6. SWIFT (Selected Words in Full Titles); and
7. SAPIR (System of Automatic Processing and Indexing of
Reports).
Types of Keyword Indexing:
1. KWIC (Keyword-In-Context) Index:
H P Luhn is credited with the development of the KWIC index. This
index was based on the keywords in the title of a paper and was
produced with the help of computers.
Each entry in a KWIC index consists of the following three
parts:
Keyword:
A significant or subject-denoting word that serves as the approach
term.
Context:
The remaining terms of the title, printed around the keyword,
which specify the particular context of the document.
Identification or Location Code:
A code (usually the serial number of the entry in the main
part) that gives the address where the full bibliographic
description of the document is available.
KWIC Indexing Process
The KWIC indexing system consists of three steps:
Step I : Keyword selection
Step II : Entry generation
Step III : Filing
The operational stages of KWIC indexing are as follows:
a) Mark the significant words, or prepare a stop list and keep
it in the computer. A stop list is a list of words that are
considered to have no value for indexing or retrieval. These may
include insignificant words such as articles (a, an, the), prepositions,
conjunctions, pronouns and auxiliary verbs, together with
general words such as aspect, different, very, etc. Each major search
system defines its own stop list.
b) Select keywords from the title and / or abstract and / or full
text of the document.
c) The KWIC routine rotates the title to make it accessible
from each significant term. Manipulate the title or
title-like phrase so that each keyword in turn serves as the
approach term and comes at the beginning (or in the middle) of
the entry, followed by the rest of the title.
d) In each entry, separate the last word of the title from the first
word with a symbol, say a stroke (/) (sometimes an asterisk * is
used). Keywords are usually printed in bold typeface.
e) Put the identification / location code at the right end of each
entry; and finally
f) Arrange the entries alphabetically by keywords.
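The operational stages above can be sketched in Python. This is a minimal illustration, not a production indexer: the stop list is a small hypothetical one (a real system would use its own, as stage a) notes), keywords are taken from the title only, and uppercase stands in for bold type.

```python
# Hypothetical stop list for illustration; each real system defines its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to", "with"}


def kwic_entries(title, code, stop_words=STOP_WORDS):
    """Return KWIC entries for one title, filed alphabetically by keyword."""
    words = title.split()
    entries = []
    for k, word in enumerate(words):
        # Stages a) and b): skip stop words; every other word is a keyword.
        if word.lower() in stop_words:
            continue
        # Stage c): rotate so the keyword comes first, followed by the rest.
        head = " ".join([word.upper()] + words[k + 1:])
        tail = " ".join(words[:k])
        # Stage d): a stroke separates the last word from the first word.
        text = head + "/" + tail if tail else head
        # Stage e): the location code goes at the right end of the entry.
        entries.append((word.upper(), f"{text} {code}"))
    # Stage f): arrange the entries alphabetically by keyword.
    entries.sort()
    return [line for _, line in entries]


for line in kwic_entries("Classification of Books in a University Library", "1279"):
    print(line)
```

Applied to the example title that follows, the usage loop prints the four Step III entries, with every keyword in uppercase.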
Example of KWIC indexing
Title: Classification of Books in a University Library (identification
code 1279)
Step I: Classification, Books, University, Library
Step II: CLASSIFICATION of Books in a University Library 1279
BOOKS in a University Library/Classification of 1279
UNIVERSITY Library/Classification of Books in a 1279
LIBRARY/Classification of Books in a University 1279
Step III: BOOKS in a University Library/Classification of 1279
CLASSIFICATION of Books in a University Library 1279
LIBRARY/Classification of Books in a University 1279
UNIVERSITY Library/Classification of Books in a 1279
The keyword may also be placed in the centre, as follows:
Classification of BOOKS in a University Library 1279
University Library/CLASSIFICATION of Books in a 1279
in a University LIBRARY/Classification of Books 1279
of Books in a UNIV. Library/Classification 1279
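The centred layout can be sketched as a circular window over the title. The keyword starts in a fixed column; the words that follow it (wrapping past the stroke to the start of the title) fill the right-hand field, and the words that precede it (wrapping backwards to the end of the title) fill the left-hand field, with each word used at most once. The field width of 31 characters is an assumption chosen to suit this title, and the function name `centred_kwic` is hypothetical. Unlike the UNIV. line in the example, the sketch never abbreviates a keyword; a word that does not fit is simply dropped.

```python
# Hypothetical stop list for illustration; each real system defines its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to", "with"}


def centred_kwic(title, code, stop_words=STOP_WORDS, width=31):
    """Return KWIC lines in which every keyword starts at column `width`."""
    words = title.split()
    n = len(words)

    def sep_after(i):
        # A stroke joins the last word of the title to the first one.
        return "/" if i == n - 1 else " "

    entries = []
    for k, word in enumerate(words):
        if word.lower() in stop_words:
            continue
        used = {k}
        # Right-hand field: keyword plus following words, wrapping around.
        right, i = word.upper(), k
        while (i + 1) % n not in used:
            nxt = (i + 1) % n
            candidate = right + sep_after(i) + words[nxt]
            if len(candidate) > width:
                break
            right, i = candidate, nxt
            used.add(nxt)
        # Left-hand field: preceding words not already shown, wrapping backwards.
        left, j = "", (k - 1) % n
        while j not in used:
            candidate = words[j] + sep_after(j) + left
            if len(candidate) > width:
                break
            left = candidate
            used.add(j)
            j = (j - 1) % n
        # Left field is right-justified so the keyword lands in a fixed column.
        entries.append((word.upper(), f"{left:>{width}}{right:<{width}} {code}"))
    entries.sort()
    return [line for _, line in entries]


for line in centred_kwic("Classification of Books in a University Library", "1279"):
    print(line)
```

With this width, the BOOKS, CLASSIFICATION and LIBRARY lines match the example above (the CLASSIFICATION line includes the stroke), while the UNIVERSITY line keeps the keyword in full and drops the wrapped "/Classification".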
2. KWOC (Keyword-Out-Of-Context) Index
KWOC is a variant of the KWIC index. Here, each keyword is
taken out and printed separately in the left-hand margin, with the
complete title in its normal order printed to its right.
Example
Title: Computerisation of Libraries in India
Format 1
COMPUTERISATION Computerisation of libraries in India 1289
INDIA Computerisation of libraries in India 1289
LIBRARIES Computerisation of libraries in India 1289
Format 2
COMPUTERISATION
Computerisation of libraries in India 1289
INDIA
Computerisation of libraries in India 1289
LIBRARIES
Computerisation of libraries in India 1289
These entries are then filed in an alphabetical sequence in the file
of the KWOC index.
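Both KWOC formats can be generated from the same keyword list, as in the minimal sketch below. The stop list is again hypothetical, each distinct keyword is taken out only once, and the left margin is simply made one character wider than the longest keyword; the function name `kwoc_entries` and its `layout` parameter are illustrative assumptions.

```python
# Hypothetical stop list for illustration; each real system defines its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to", "with"}


def kwoc_entries(title, code, stop_words=STOP_WORDS, layout=1):
    """Return KWOC lines for one title in Format 1 or Format 2."""
    # Each distinct keyword is taken out of the title once, in uppercase,
    # and the keywords are filed alphabetically.
    keywords = sorted({w.upper() for w in title.split() if w.lower() not in stop_words})
    # Margin wide enough for the longest keyword plus one space.
    margin = max((len(kw) for kw in keywords), default=0) + 1
    lines = []
    for kw in keywords:
        if layout == 1:
            # Format 1: keyword in the left margin, full title to its right.
            lines.append(f"{kw:<{margin}}{title} {code}")
        else:
            # Format 2: keyword on its own line, full title on the next line.
            lines.append(kw)
            lines.append(f"{title} {code}")
    return lines


for line in kwoc_entries("Computerisation of libraries in India", "1289", layout=1):
    print(line)
```

The title itself is never rotated, which is why the change of layout alone does little for retrieval.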
It should be noted that changing the format of the KWOC
index provides only a limited improvement. Since KWOC follows the
same indexing technique as KWIC, there is hardly any difference in
retrieval efficiency.
3. KWAC (Keyword-Augmented-In-Context) Index:
KWAC also stands for keyword-and-context. In many cases, the
title does not represent the thought content of the
document co-extensively, and KWIC and KWOC cannot solve the
resulting problem of retrieving irrelevant documents.
To solve this problem of false drops, KWAC enriches the
keywords of the title with additional keywords taken either from
the abstract or from the original text of the document; these are
inserted into the title or added at its end to give further index
entries. KWAC is therefore also called enriched KWIC or enriched
KWOC. CBAC (Chemical Biological Activities) of BIOSIS uses a
KWAC index in which the title is enriched by another title-like
phrase formulated by the indexer.
Example
Suppose the title of a document is "Expert System". This title does
not clearly express the contents of the document, so the
abstract or even the text itself may be consulted to find the
significant words that should be added to the title to make it
expressive. The above title might, for example, become "Expert
System in Library", and the index is then prepared from this
enriched title in the KWIC or KWOC manner, giving a KWAC index.
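A KWAC entry set can be sketched as KWIC rotation applied to an enriched title. In this minimal sketch, the enrichment phrase is supplied by hand, standing in for the indexer's reading of the abstract or text, and is simply appended to the title. The location code "0001" is a hypothetical placeholder, since the example gives none, and the function name `kwac_entries` is likewise illustrative.

```python
# Hypothetical stop list for illustration; each real system defines its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to", "with"}


def kwac_entries(title, added_terms, code, stop_words=STOP_WORDS):
    """Enrich the title with indexer-supplied terms, then rotate it KWIC-style."""
    # Enrichment: the added phrase is appended to the end of the title.
    words = (title + " " + added_terms).split()
    entries = []
    for k, word in enumerate(words):
        if word.lower() in stop_words:
            continue
        # Keyword first, rest of the enriched title after it, stroke at the wrap.
        head = " ".join([word.upper()] + words[k + 1:])
        tail = " ".join(words[:k])
        text = head + "/" + tail if tail else head
        entries.append((word.upper(), f"{text} {code}"))
    entries.sort()
    return [line for _, line in entries]


for line in kwac_entries("Expert System", "in Library", "0001"):
    print(line)
```

Without enrichment the title yields only EXPERT and SYSTEM entries; the added phrase gives the document a third access point under LIBRARY.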
4. Key-Term Alphabetical (KEYTALPHA) Index
In the Key-Term Alphabetical index, keywords are arranged
side by side without forming a sentence. Each entry contains
only the keywords and the location code, without the context.
Example
Title: Computerisation of libraries in India
The KEYTALPHA index entries are:
COMPUTERISATION, INDIA, LIBRARIES 1289
INDIA, LIBRARIES, COMPUTERISATION 1289
LIBRARIES, COMPUTERISATION, INDIA 1289
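The entries in this example follow a simple pattern: the keywords are put in alphabetical order, and that list is rotated so that each keyword leads exactly once. The sketch below assumes this rotation rule, which is inferred from the example rather than stated in the text, and uses the same kind of hypothetical stop list; the function name is illustrative.

```python
# Hypothetical stop list for illustration; each real system defines its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to", "with"}


def keytalpha_entries(title, code, stop_words=STOP_WORDS):
    """Rotate the alphabetised keyword list so that each keyword leads once."""
    # Keywords only, no context: distinct, uppercase, alphabetical.
    keywords = sorted({w.upper() for w in title.split() if w.lower() not in stop_words})
    return [
        ", ".join(keywords[i:] + keywords[:i]) + f" {code}"
        for i in range(len(keywords))
    ]


for line in keytalpha_entries("Computerisation of libraries in India", "1289"):
    print(line)
```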
Advantages:
1) The principal merit of keyword indexing is the speed with
which it can be produced.
2) The production of a keyword index does not require trained
indexing staff. All that is required is an expressive title coextensive
with the specific subject of the document.
3) Involves minimum intellectual effort.
4) Vocabulary control need not be used.
5) Satisfied
Disadvantages:
1) Most of the terms used in science and technology are
standardized, but this is not the case in the Humanities
and Social Sciences, where title words are therefore less reliable
as index terms.
2) Related topics are scattered. The efficiency of keyword
indexing always depends on how expressive the titles of
documents are, as most such indexes are based on titles.
3) A topic may have to be searched under several keywords.
4) Search time is high.
5) Searches very often lead to high recall and low precision.
6) It fails to meet users' need for an exhaustive approach to a
large collection.
Search Strategy for Keyword Indexes
In keyword indexes, the significant terms of the titles of
documents are arranged alphabetically, each with its context
and identification number. There is no vocabulary control
and, therefore, related or identical subjects are scattered
throughout the index file. There is no reference system to connect
or correlate related or identical topics. These limitations should
be kept in mind while formulating a search strategy. The
user should search under synonyms of the words and also
under related terms. When titles are improved and
supplemented by editors, searches yield better results. Keyword
indexes also do not provide for the coordination of two or
more search words, and this limitation should likewise be kept in
mind. Users of these indexes should be prepared to search under
alternative spellings, singular and plural forms, synonyms and
near-synonyms. Because of the uncontrolled vocabulary, the number
of search terms is considerably enlarged, necessitating more search
effort.
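Because there is no vocabulary control, the searcher has to expand each search word by hand. The sketch below builds a simple keyword-to-code lookup from titles and collects the hits for every variant form of one term. The plural rule is a crude hypothetical one, and the variant table of alternative spellings and synonyms is assumed to be supplied by the searcher; neither comes from any particular indexing service. In keeping with the limitation noted above, the sketch looks up one search word at a time and does not coordinate two words.

```python
from collections import defaultdict

# Hypothetical stop list for illustration; each real system defines its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to", "with"}


def build_index(records, stop_words=STOP_WORDS):
    """Map each uppercase title keyword to the set of location codes."""
    index = defaultdict(set)
    for code, title in records.items():
        for word in title.split():
            if word.lower() not in stop_words:
                index[word.upper()].add(code)
    return index


def variant_forms(term, variants):
    """Expand a term with a crude singular/plural rule and a variant table."""
    term = term.upper()
    forms = {term}
    if term.endswith("IES"):
        forms.add(term[:-3] + "Y")
    elif term.endswith("S"):
        forms.add(term[:-1])
    elif term.endswith("Y"):
        forms.add(term[:-1] + "IES")
    else:
        forms.add(term + "S")
    # Searcher-supplied spellings and synonyms, keyed by the uppercase term.
    forms.update(v.upper() for v in variants.get(term, ()))
    return forms


def search(index, term, variants=None):
    """Collect the codes filed under every variant form of one term."""
    hits = set()
    for form in variant_forms(term, variants or {}):
        hits |= index.get(form, set())
    return hits


records = {
    "1279": "Classification of Books in a University Library",
    "1289": "Computerisation of libraries in India",
}
index = build_index(records)
print(sorted(search(index, "library")))
print(sorted(search(index, "computerization", {"COMPUTERIZATION": ["computerisation"]})))
```

A search for "library" finds both titles only because the plural LIBRARIES is tried as well; without the variant table, the American spelling "computerization" finds nothing.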
Conclusion
Despite its deficiencies, the keyword index has been quite
popular during the last four decades. A number of evaluation
studies have indicated that keyword indexes may offer several
advantages over other indexes. The continued growth of machine-readable
databases has shown that keyword indexes work well.
The problem of unexpressive titles is solved to a
considerable extent by editorial intervention. It is true that keyword
indexes as such will not facilitate comprehensive
search; producing an index that supports comprehensive
search takes time, money and effort. The keyword index was never
envisaged as a comprehensive subject index. It is
a mechanism for providing a quick and specific subject approach to
information, which is what Luhn envisaged it to be.