Catalog Enrichment for RDA: adding relationship designators (in Koha)
35th ADLUG Meeting - Sep 22, 2016 - Stefano Bargioni
Slide 1
I'm very happy to discuss with you part of the project started at the Pontificia Università della Santa
Croce that aims to introduce RDA rules in our library.
We are sharing this goal with the URBE network, i.e. with 17 other ecclesiastical libraries running
on Koha, OliSuite or other Integrated Library Systems.
This work is the result of close cooperation with my staff, especially Luigi Gentile, Michele Caputo, Alberto
Gambardella, and Giampaolo Del Monte.
Introducing RDA in cataloging is a task with many parts. Here we will focus on one of them: the
relationship designators, or relator terms, in bibliographic records.
Slide 2
Tags affected by relator terms are personal, corporate and meeting names.
The slide lists these tags, both for MARC21 and Unimarc. We will focus on MARC21, as this is the
MARC flavor used in our library. However, the following ideas could be applied to Unimarc as well.
Name-titles are not included. We will see why in a few slides.
Slide 3
The subfields involved are "e" for personal and corporate names, or "j" for meeting names (where
the subfield "e" is already in use for the subordinate unit).
In both cases, a text string in the language of the cataloging agency is entered in this subfield.
Subfield "4" can contain the same information using a standardized, language-independent
three-character code.
Note that Unimarc differs from MARC21. It approaches the problem using only numeric codes,
leaving to the software the responsibility for displaying and even searching this information. For display
and cataloguing, I appreciate Unimarc's solution, but for searching, I prefer MARC21's solution.
This is why we decided to use both subfield "e" and subfield "4".
Slide 4
The complete list is very large and, in my opinion, is continuously evolving. It tries to include any
kind of role played by people in "writing" a "book": author, joint author, editor, translator,
cover designer, and many more.
Each of them has a short code, but sometimes the code is shared by more than one role, as in "wit",
used for "witness" and "eye-witness", or, more interestingly, in "aut", used for both "author" and
"joint-author".
Our decision was to simplify the process considerably, and to ensure that the catalog contains information from a
closed list of values, chosen from the official Italian translation of RDA.
Slide 5
This slide shows the new popup menu we added to Koha for subfield "e" of tag "7xx".
Slide 6
Subfield "4" is hidden, and will be filled in automatically when saving the record. It can be useful in
a linked data environment, or for copy cataloging from non-Italian libraries.
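The automatic filling of subfield "4" can be pictured as a lookup in the closed list. What follows is only a minimal sketch in Python, not the code used in Koha (which, as mentioned later, is JavaScript). The function name, the data layout and the English terms are assumptions made for illustration; the real list uses the Italian RDA terms. The three-letter codes shown are standard MARC relator codes.

```python
# Hypothetical closed list: relator term (subfield "e") -> MARC relator code (subfield "4").
# The library's real list uses Italian RDA terms; English terms are used here for illustration.
RELATOR_CODES = {
    "author": "aut",
    "editor": "edt",
    "translator": "trl",
    "illustrator": "ill",
    "honoree": "hnr",
}


def fill_relator_code(subfields):
    """Return a copy of a name field's subfields with "4" derived from "e".

    `subfields` is a list of (code, value) pairs.
    Any existing "4" is dropped and regenerated, so "e" and "4" cannot disagree.
    """
    result = [(code, value) for code, value in subfields if code != "4"]
    for code, value in subfields:
        if code == "e":
            term = value.strip().lower()
            if term not in RELATOR_CODES:
                raise ValueError(f"relator term not in the closed list: {value!r}")
            result.append(("4", RELATOR_CODES[term]))
    return result


field_700 = [("a", "Rossi, Mario"), ("e", "translator")]
print(fill_relator_code(field_700))
```

Rejecting terms outside the list mirrors the idea of a closed vocabulary: the cataloguer only chooses the term, and the code is never typed by hand.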
Slide 7
Up to now, we have described how we will add the relator information to new records. However, global
modifications to cataloging rules may require modifying old records, and often this task is
never accomplished, because it is very difficult to achieve.
We discovered that bibliographic records contain useful information for changing them
automatically, in order to add the relator codes and terms.
Our catalog contains 83 percent of records with only the main author tag "1xx", 35 percent of
records that contain only added authors, i.e. "7xx" tags, and 21 percent of records that have both
main and added authors.
Slide 8
The main author, we can say, always has the term "author". With exceptions, of course.
It depends on your library. So it is probably possible to update records with only "1xx" tags
automatically. We modified about 99,000 records this way.
Note that this operation cannot be applied to records with added authors, since these records would
remain only partially updated, and this can be a problem for any user, both professional and general.
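The rule for records with only a main author could look roughly like this. It is a sketch under stated assumptions, not the batch job actually run on the catalog: a record is modelled as a plain list of (tag, subfields) pairs, the function names are hypothetical, and the English term "author" stands in for the Italian one. Records that also have "7xx" name tags are skipped, so that no record is left partially updated.

```python
MAIN_NAME_TAGS = {"100", "110", "111"}   # personal, corporate, meeting (main entry)
ADDED_NAME_TAGS = {"700", "710", "711"}  # personal, corporate, meeting (added entry)


def relator_subfield(tag):
    """Meeting names (x11) carry the relator term in "j"; other names use "e"."""
    return "j" if tag.endswith("11") else "e"


def add_author_to_main_entry(record):
    """Add term "author" and code "aut" to the 1xx field, only if no 7xx name is present.

    `record` is a list of (tag, subfields) pairs; subfields is a list of (code, value).
    Returns True if the record was changed.
    """
    tags = {tag for tag, _ in record}
    if (tags & ADDED_NAME_TAGS) or not (tags & MAIN_NAME_TAGS):
        return False  # would be only partially updated, or there is nothing to update
    changed = False
    for tag, subfields in record:
        if tag in MAIN_NAME_TAGS:
            term_code = relator_subfield(tag)
            # Do not touch fields that already carry a relator term or code.
            if not any(code in (term_code, "4") for code, _ in subfields):
                subfields.append((term_code, "author"))
                subfields.append(("4", "aut"))
                changed = True
    return changed


record = [("100", [("a", "Rossi, Mario")]), ("245", [("a", "A title")])]
print(add_author_to_main_entry(record), record[0])
```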
Slide 9
The following slides illustrate how to infer the relator code for added authors using the information
stored in other tags, like the statement of responsibility, some kinds of notes, and so on. Of course,
the quality of the catalog plays an important role in this operation, and other ideas could be
applied in other catalogs, depending especially on the type of collections.
Slide 10
If the statement of responsibility or the contents note contains one of these strings, the added author
(when only one 7xx tag is present) is an editor.
In this way we were able to automatically update more than 14,000 records.
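A rule of this kind can be sketched as a simple text match. The trigger strings below are hypothetical examples, since the actual list appears only on the slide; the record layout and function names are also assumptions. In MARC21 the statement of responsibility is 245$c and the formatted contents note is 505$a.

```python
import re

# Hypothetical trigger strings; the actual list is on the slide and is not reproduced here.
EDITOR_PATTERN = re.compile(r"\b(?:edited by|a cura di)\b", re.IGNORECASE)

ADDED_NAME_TAGS = {"700", "710", "711"}


def subfield_values(record, tag, code):
    """All values of subfield `code` in fields `tag` of a record."""
    return [value
            for field_tag, subfields in record if field_tag == tag
            for sub_code, value in subfields if sub_code == code]


def single_added_author_is_editor(record):
    """True if exactly one 7xx name field exists and 245$c or 505$a mentions editing."""
    added = [tag for tag, _ in record if tag in ADDED_NAME_TAGS]
    if len(added) != 1:
        return False
    texts = subfield_values(record, "245", "c") + subfield_values(record, "505", "a")
    return any(EDITOR_PATTERN.search(text) for text in texts)


record = [
    ("245", [("a", "A title /"), ("c", "a cura di Lucia Bianchi")]),
    ("700", [("a", "Bianchi, Lucia")]),
]
print(single_added_author_is_editor(record))
```

Restricting the rule to records with exactly one added name avoids having to decide which of several names the matched string refers to.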
Slide 11
The added author is also an editor if the statement of responsibility contains "critical edition" or one
of its translations in other languages.
Slide 12
The added author is an editor if the remainder of edition statement (250$b) exists.
Slide 13
The added author is "honored" if byte 30 of fixed field 008 is set.
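The two record-level checks just described (presence of 250$b, and byte 30 of field 008) can be written as small predicates. This is only a sketch: the record layout is an assumption (control fields hold a string, data fields hold a list of subfields), the function names are hypothetical, and "set" is read here as the value "1", which in MARC21 book records marks a Festschrift at position 008/30.

```python
def has_remainder_of_edition_statement(record):
    """True if any 250 field has a non-empty subfield "b"."""
    for tag, data in record:
        if tag == "250":
            if any(code == "b" and value.strip() for code, value in data):
                return True
    return False


def is_festschrift(record):
    """True if position 30 (counting from 0) of the 008 control field is "1"."""
    for tag, data in record:
        if tag == "008" and isinstance(data, str):
            return len(data) > 30 and data[30] == "1"
    return False


# A made-up 40-character 008 with only position 30 set, plus a 250 field with $b.
record = [
    ("008", "0" * 30 + "1" + "0" * 9),
    ("250", [("a", "2nd ed. /"), ("b", "revised by Lucia Bianchi")]),
]
print(is_festschrift(record), has_remainder_of_edition_statement(record))
```

When the first check holds, the added author would get the relator code "hnr" (honoree); when the second holds, "edt" (editor). How the two rules are ordered when both apply is not stated in the talk.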
Slide 14
This case is a bit more difficult, but interesting indeed. The added author is a "joint author" if the
statement of responsibility is sufficiently similar to the "7xx" occurrences. Here are some examples.
You may think that the algorithm that discovers this similarity must be very complicated, and this was
my opinion before starting this project. Then, surprisingly, I wrote very few lines of (Perl) code for
this type of update.
Generally speaking, artificial intelligence could help a lot to obtain better results. But you have to
know your records very well, and usually this is true only if you have a lot of help from your cataloging
staff. Is it worth it?
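The Perl code is not shown in the talk, so the following Python sketch is only one possible reading of "sufficiently similar", not the method actually used. It assumes that names in 1xx/7xx are in inverted form ("Surname, Forename"), compares words regardless of order, accents and punctuation, and requires that most words of the statement of responsibility belong to the names. The function names and the 0.7 threshold are hypothetical.

```python
import re
import unicodedata


def tokens(text):
    """Lowercase, strip accents and punctuation, and split into a set of words."""
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return set(re.findall(r"[^\W\d_]+", plain.lower()))


def looks_like_joint_authorship(main_name, added_names, statement, threshold=0.7):
    """True if every added name occurs in the statement and names dominate it."""
    statement_tokens = tokens(statement)
    if not statement_tokens or not added_names:
        return False
    if not all(tokens(name) <= statement_tokens for name in added_names):
        return False
    name_tokens = tokens(main_name)
    for name in added_names:
        name_tokens |= tokens(name)
    # Share of the statement's words that are parts of the names.
    coverage = len(statement_tokens & name_tokens) / len(statement_tokens)
    return coverage >= threshold


print(looks_like_joint_authorship("Rossi, Mario", ["Bianchi, Lucia"],
                                  "Mario Rossi e Lucia Bianchi"))
print(looks_like_joint_authorship("Rossi, Mario", ["Bianchi, Lucia"],
                                  "Mario Rossi ; edited by Lucia Bianchi"))
```

The coverage test is what keeps a statement such as "edited by ..." from being taken as joint authorship: extra keywords lower the share of name words below the threshold.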
Slide 15
Another example is shown in slide 15. The statement of responsibility contains names and some
keywords as well. The relator term and code can be added, and we updated about 2,700 records.
Slide 16
As I said at the beginning of my presentation, records with name-titles caused a discussion among
us, and we involved Tiziana Possemato, prof. Mauro Guerrini, and Casalini Libri.
We think that the relator term of name-titles refers to the work or expression represented by the
"7xx" tag. The subfield "e", in this case, is the role of the author described in subfield "a" with respect to the
work described in subfield "t". If you use it, will OPAC users understand this subtle but important
difference? This is why we prefer not to use it; this affects about 700 records.
Slide 17
The remaining records could require complex algorithms, but often the information needed to write
relator terms and codes is ambiguous or not available at all.
Slide 18
Thus, a remaining group of records will require manual updates. This is a boring task that
requires specific skills. And sometimes the decision for each term is not so simple, leading to
discussions among the cataloging staff...
To facilitate this task, a tool named RP7 with specific functionalities was prepared, thanks to the
Application Programming Interfaces (APIs) available in Koha.
Slide 19
This tool, a single-page web application, allows the cataloguers to navigate the set of
records without relator terms.
Each record is shown in brief or full format. Popup menus are available to the right of each added
name, and when one is filled in, a new occurrence appears.
Keyboard shortcuts are available ("S" for saving and moving to the next record, "B" for toggling the
display format, "G" to go to the next record without saving, and so on).
Slide 20
This table summarizes the percentages of this global update of legacy records. 75% of records were
updated automatically, and the remaining 25% is a work in progress using both the specific RP7 tool
and the Koha cataloging interface. I'm not aware of other catalog enrichment projects where a lot
of work can be saved using algorithms and information already contained in the catalog itself.
Slide 21
Let's take a look at the cataloging interface. Each occurrence of the added name tags has a visible
subfield "e" for the relator term, with a reduced set of values stored in a popup menu, and a hidden
subfield "4" filled in automatically. The programming language is of course JavaScript (jQuery
library).
Slide 22
The new information can be shown in the OPAC, alongside the authors' names. This is valuable
information for researchers and students, since they can quickly understand the relevance of an
author in the book/manifestation.
Slide 23
It is important to ensure that the relator term will be indexed. By default, Koha adds it to the author
index, and we think that this is a good solution. This will allow OPAC users to perform richer
searches, like limiting the results to records where an author is the main author, or is the
translator, and so on. This is also a useful search path for the reference desk.
Slide 24
This part of the presentation shows an example of how to use the relations contained in the catalog.
They are defined by the presence in the same record of more than one name, even a name-title. This,
of course, is not an advantage specific to RDA.
The RDA relator terms qualify existing relations, and this can help to display very interesting paths
and links to navigate the catalog. Students and researchers can immediately discover who studied a
specific author, who worked with him or her, and so on. Think of a thesis about an important
philosopher.
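Relations of this kind can be derived by counting which names occur together in the same record, and keeping the relator term of the related name as a qualifier. The following is a minimal sketch of that idea only; the record layout and the function names are assumptions, and it does not describe how the real application is built.

```python
from collections import Counter, defaultdict

NAME_TAGS = {"100", "110", "111", "700", "710", "711"}


def names_with_roles(record):
    """Yield (name, relator term) for each name field; the term may be None."""
    for tag, subfields in record:
        if tag in NAME_TAGS:
            values = dict(subfields)  # keeps the last value of repeated subfields
            term_code = "j" if tag.endswith("11") else "e"
            if "a" in values:
                yield values["a"], values.get(term_code)


def build_name_relations(records):
    """Map each name to a Counter of (co-occurring name, that name's role)."""
    relations = defaultdict(Counter)
    for record in records:
        entries = list(names_with_roles(record))
        for name, _ in entries:
            for other, role in entries:
                if other != name:
                    relations[name][(other, role)] += 1
    return relations


records = [
    [("100", [("a", "Rossi, Mario"), ("e", "author")]),
     ("700", [("a", "Bianchi, Lucia"), ("e", "translator")])],
    [("100", [("a", "Rossi, Mario"), ("e", "author")]),
     ("700", [("a", "Bianchi, Lucia"), ("e", "translator")])],
]
for (other, role), count in build_name_relations(records)["Rossi, Mario"].items():
    print(other, role, count)
```

The counters give, for a starting name, how many records link it to each related name in each role, which is the kind of figure a list of relations can display next to each link.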
Slide 25
This is why we built a Name Cloud, which we will link to our catalog as soon as possible. It is divided
into two parts. The first part represents the cloud of names around the starting name, while the second
part contains the same information and functionalities as the cloud, with relator terms and some
counters. Let's open the Name Cloud, to see it moving.
Slide 26
This is the second part of the Name Cloud page. Each link can be collapsed or expanded, which is useful
for authors with many relations, and for printing too. When collapsed, a counter is
shown.
Slide 27
Let's try to conclude: I could simply read this slide.
Adding relationship designators to old bibliographic records is possible for a large part of a library
catalog.
A good analysis of the data is required, as well as good software tools and skills to perform batch
updates.
Adding relationship designators to new bibliographic records requires helping the staff by adding some
functionalities to the cataloging module.
Their introduction enables new services in the OPAC, enriches information about authors and
adds properties to the relationships among them.
