Keyword Indexing 
University of Calcutta 
11/21/2014 
Name -Sourav Sarkar. 
Roll no- 32
Definition of Keyword Indexing: 
Keyword indexing, also known as Natural Language Indexing or Free
Text Indexing, is an indexing system that does not control its
vocabulary. A keyword is a catchword, that is, a significant or
subject-denoting word taken mainly from the title, and sometimes
from the abstract or text, of a document for the purpose of
indexing. Keyword indexing thus generates index entries from the
natural language of the documents themselves, and no controlled
vocabulary is required. Keyword indexing is not new. It existed in
the nineteenth century, when it was referred to as catchword
indexing. Computers began to be used to aid information retrieval
in the 1950s. H P Luhn and his associates produced machine-generated
permuted title indexes and distributed copies at the International
Conference on Scientific Information held in Washington in 1958.
Luhn named this the Keyword-In-Context (KWIC) index and reported
the method of generating it in a paper. The American Chemical
Society established the value of KWIC by adopting it in 1961 for
its publication Chemical Titles.
Uses of Keyword Index: 
A number of indexing and abstracting services prepare their
subject indexes by using keyword indexing techniques. These
indexes are essentially variations of keyword indexing.
Some notable examples are:
1. Chemical Titles;
2. BASIC (Biological Abstracts Subjects In Context);
3. Keyword Index of Chemical Abstracts;
4. CBAC (Chemical-Biological Activities);
5. KWIT (Keyword-In-Title) of Lawrence Berkeley Laboratory;
6. SWIFT (Selected Words in Full Titles); and
7. SAPIR (System of Automatic Processing and Indexing of Reports).
Types of Keyword Indexing: 
1. KWIC (Keyword-In-Context) Index: 
H P Luhn is credited with the development of the KWIC index. This
index is based on the keywords in the title of a paper and is
produced with the help of computers.
Each entry in a KWIC index consists of the following three parts:
Keywords:
Significant or subject-denoting words which serve as approach
terms.
Context:
The rest of the terms of the title, which specify the particular
context in which the keyword is used in the document.
Identification or Location Code:
A code (usually the serial number of the entry in the main part)
giving the address where the full bibliographic description of the
document is available.
KWIC Indexing Process
The KWIC indexing system consists of three steps:
Step I : Keyword selection
Step II : Entry generation
Step III : Filing
The operational stages of KWIC indexing consist of the following:
a) Mark the significant words, or prepare a stop list and store it
in the computer. A stop list is a list of words considered to have
no value for indexing or retrieval. These include insignificant
words such as articles (a, an, the), prepositions, conjunctions,
pronouns and auxiliary verbs, together with general words such as
aspect, different, very, etc. Each major search system defines its
own stop list.
b) Select keywords from the title and / or abstract and / or full
text of the document.
c) The KWIC routine rotates the title to make it accessible from
each significant term. The title, or title-like phrase, is
manipulated so that each keyword in turn serves as the approach
term and comes at the beginning (or in the middle) of the entry,
followed by the rest of the title.
d) Separate the last word and the first word of the title with a
symbol, say a stroke [/] (sometimes an asterisk * is used), in each
entry. Keywords are usually printed in bold typeface.
e) Put the identification / location code at the right end of each
entry; and finally
f) Arrange the entries alphabetically by keyword.
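The stages above can be sketched in a few lines of Python. This is only a minimal illustration, not Luhn's original routine: the stop list is a short made-up sample, the function names are hypothetical, and the keyword is shown in capitals because plain text cannot show bold type.

```python
# Illustrative stop list (an assumption); real services keep longer lists.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def select_keywords(title):
    """Step I: return (position, word) pairs for words not on the stop list."""
    words = title.split()
    return [(i, w) for i, w in enumerate(words) if w.lower() not in STOP_WORDS]


def kwic_entries(title, code):
    """Steps II and III: rotate the title on each keyword, then file A to Z."""
    words = title.split()
    entries = []
    for i, word in select_keywords(title):
        # The keyword leads the entry, followed by the rest of the title.
        lead = " ".join([word.upper()] + words[i + 1:])
        wrapped = " ".join(words[:i])
        # The stroke marks where the end of the title meets its beginning.
        text = f"{lead}/{wrapped}" if wrapped else lead
        entries.append(f"{text} {code}")
    return sorted(entries, key=str.lower)


for entry in kwic_entries("Classification of Books in a University Library", 1279):
    print(entry)
```

Sorting on the lower-cased entry keeps the filing order independent of the capitalisation used to highlight the keyword.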
Example of KWIC indexing 
Title: Classification of Books in a University Library (with
identification code 1279)
Step I: Classification, Books, University, Library
Step II: CLASSIFICATION of Books in a University Library 1279
BOOKS in a University Library/Classification of 1279
UNIVERSITY Library/Classification of Books in a 1279
LIBRARY/Classification of Books in a University 1279
Step III: BOOKS in a University Library/Classification of 1279
CLASSIFICATION of Books in a University Library 1279
LIBRARY/Classification of Books in a University 1279
UNIVERSITY Library/Classification of Books in a 1279
The keyword may also be in the centre, as follows:
Classification of BOOKS in a University Library 1279
University Library/CLASSIFICATION of Books in a 1279
in a University LIBRARY/Classification of Books 1279
of Books in a UNIVERSITY Library/Classification 1279
2. KWOC (Keyword-Out-of-Context) Index
KWOC is a variant of the KWIC index. Here, each keyword is taken
out and printed separately in the left-hand margin, with the
complete title in its normal order printed to the right.
Examples:
Title: Computerisation of Libraries in India
Format 1
COMPUTERISATION Computerisation of libraries in India 1289
INDIA Computerisation of libraries in India 1289
LIBRARIES Computerisation of libraries in India 1289
Format 2 
COMPUTERISATION 
Computerisation of libraries in India 1289
INDIA 
Computerisation of libraries in India 1289 
LIBRARIES 
Computerisation of libraries in India 1289 
These entries are then filed in alphabetical sequence in the KWOC
index.
It should be noted that the change of format in the KWOC index
provides only limited improvement. Since it follows the same
indexing technique as KWIC, there is hardly any difference in
retrieval efficiency.
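As a rough sketch of the KWOC idea, the Python below pairs each keyword with the unrotated title and prints both formats. The stop list and the function name are assumptions made for illustration only.

```python
# Illustrative stop list (an assumption); real services keep longer lists.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def kwoc_entries(title, code):
    """Pair each keyword with the full title in normal order, filed A to Z."""
    keywords = sorted({w.upper() for w in title.split()
                       if w.lower() not in STOP_WORDS})
    return [(kw, f"{title} {code}") for kw in keywords]


entries = kwoc_entries("Computerisation of libraries in India", 1289)

# Format 1: keyword in the left-hand margin, title to the right.
width = max(len(kw) for kw, _ in entries)
for kw, line in entries:
    print(f"{kw:<{width}}  {line}")

# Format 2: keyword on its own line, title indented beneath it.
for kw, line in entries:
    print(kw)
    print(f"    {line}")
```

The only difference from a KWIC routine is that the title is never rotated, which is why the two indexes retrieve the same documents.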
3. KWAC (Keyword Augmented-in-Context) Index:
KWAC also stands for Keyword-And-Context. In many cases, the title
does not represent the thought content of the document
co-extensively, so KWIC and KWOC cannot solve the problem of the
retrieval of irrelevant documents.
To reduce such false drops, KWAC enriches the keywords of the
title with additional keywords taken from the abstract or from the
original text of the document. These are inserted into the title or
added at the end to give further index entries. KWAC is also called
enriched KWIC or KWOC. CBAC (Chemical-Biological Activities) of
Chemical Abstracts Service uses a KWAC index in which the title is
enriched by another title-like phrase formulated by the indexer.
Example
The title of a document is Expert System. In this case the title
does not clearly express the contents of the document, so the
abstract, or even the text itself, may be consulted to find
significant words to add to the title to make it expressive. For
example, the title above may become Expert System in Library, and
the index is then prepared by either the KWIC or the KWOC system.
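The enrichment step might be sketched as follows. Choosing the extra terms is an intellectual task for the indexer, so the code simply accepts them as input; the function name and the stop list are hypothetical.

```python
# Illustrative stop list (an assumption); real services keep longer lists.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def kwac_keywords(title, extra_terms):
    """Keywords from the title plus indexer-supplied terms, without repeats."""
    seen = set()
    keywords = []
    for word in title.split() + list(extra_terms):
        key = word.lower()
        if key not in STOP_WORDS and key not in seen:
            seen.add(key)
            keywords.append(word.upper())
    return sorted(keywords)


# "Library" stands for a term the indexer took from the abstract or text.
print(kwac_keywords("Expert System", ["Library"]))
```

The enriched keyword list can then be passed to a KWIC or KWOC routine, so that the document is also retrievable under the added term.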
4. Key-Term Alphabetical (KEYTALPHA) Index
In the Key-Term Alphabetical index, keywords are arranged side by
side without forming a sentence. Each entry contains only the
keywords and the location code, excluding the context.
Example
Title: Computerisation of Libraries in India
The KEYTALPHA index entries are:
COMPUTERISATION, INDIA, LIBRARIES 1289
INDIA, LIBRARIES, COMPUTERISATION 1289
LIBRARIES, COMPUTERISATION, INDIA 1289
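One way to read the entries above is as cyclic rotations of the alphabetically sorted keyword list. The sketch below follows that reading, which is an interpretation of the example rather than a documented rule; the stop list and function name are assumptions.

```python
# Illustrative stop list (an assumption); real services keep longer lists.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def keytalpha_entries(title, code):
    """One entry per keyword: the sorted keyword list rotated to lead with it."""
    keywords = sorted({w.upper() for w in title.split()
                       if w.lower() not in STOP_WORDS})
    return [
        ", ".join(keywords[i:] + keywords[:i]) + f" {code}"
        for i in range(len(keywords))
    ]


for entry in keytalpha_entries("Computerisation of libraries in India", 1289):
    print(entry)
```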
Advantages: 
1) The principal merit of keyword indexing is the speed with 
which it can be produced. 
2) The production of a keyword index does not require trained
indexing staff. What is required is an expressive title coextensive
with the specific subject of the document.
3) Involves minimum intellectual effort. 
4) Vocabulary control need not be used. 
5) Satisfies the current approaches of users.
Disadvantages: 
1) Most of the terms used in science and technology are
standardized, but this is not the case in the Humanities and
Social Sciences.
2) Related topics are scattered. The efficiency of keyword
indexing depends on how reliably titles express the content of
documents, since most such indexes are based on titles.
3) A search for a topic may have to be made under several keywords.
4) Search time is high.
5) Searches very often lead to high recall and low precision.
6) Fails to meet the exhaustive approach for a large collection.
Search Strategy for Keyword Indexes 
In keyword indexes, the significant terms of the titles of
documents are arranged alphabetically, each with its context and
identification number. There is no vocabulary control and,
therefore, related or identical subjects are scattered throughout
the index file. There is no reference system to connect or
correlate related or identical topics. These limitations should be
kept in mind while formulating a search strategy. The user should
search under synonyms of the words and also under related terms.
When titles are improved and supplemented by editors, the search
yields better results. Keyword indexes do not provide for the
coordination of two or more search words, and this limitation
should also be kept in mind. Users of these indexes should be
prepared to search under terms with alternative spellings, singular
and plural forms, synonyms and near-synonyms. Because the
vocabulary is uncontrolled, the number of search terms is
considerably enlarged, necessitating more search effort.
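The extra search effort can be illustrated with a small sketch. The index below holds only the keywords and codes of the two example titles in this paper; the table of variants is a hypothetical list that the searcher would have to compile by hand, since the index itself offers no cross-references.

```python
# Keyword index built from the two example titles (codes 1279 and 1289).
INDEX = {
    "books": {1279},
    "classification": {1279},
    "library": {1279},
    "university": {1279},
    "computerisation": {1289},
    "india": {1289},
    "libraries": {1289},
}

# Hypothetical table of plural forms, alternative spellings and synonyms.
VARIANTS = {
    "library": ["libraries"],
    "computerisation": ["computerization", "automation"],
}


def search(index, term, variants):
    """Look up the term and every listed variant, merging the codes found."""
    terms = [term.lower()] + variants.get(term.lower(), [])
    hits = set()
    for t in terms:
        hits |= index.get(t, set())
    return sorted(hits)


print(search(INDEX, "library", VARIANTS))
print(search(INDEX, "library", {}))
```

Without the variants table the second lookup finds only the document filed under the singular form, which is the scattering described above.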
Conclusion 
Despite its deficiencies, the keyword index has been quite
popular during the last four decades. A number of evaluation
studies have indicated that keyword indexes may offer several
advantages over other indexes. The continued growth of
machine-readable databases has shown that keyword indexes work
well. The problem of unexpressive titles is solved to a
considerable extent by editorial intervention. It is true that
keyword indexes as such will not facilitate a comprehensive
search. Producing any index that supports comprehensive search
takes time, money and effort. The keyword index was never
envisaged as a comprehensive subject index. It is a mechanism for
providing a quick and specific subject approach to information,
as Luhn envisaged it to be.
