Presentation at the workshop "The Challenges of Publishing Finding Aids in a Digitally Joined-Up World" (The Hague, December 2014) about the extraction of structured data from edited books and its conversion to EAD.
Use case: data edited as a book !!!
1. Use case: data edited as a book !!!
CONNECTING
COLLECTIONS
02-12-2014
Kepa J. Rodriguez
(Göttingen State and University Library)
2. Outline
• How do we import books into the portal?
• Case 1: Jewish Archival Guide Belgium (ARA)
• Case 2: Informator (IPN-Poland)
• Some conclusions
5. How do we import a book (3)
From our experience:
• It is important that the structure of the book is well represented in the layout.
• A hierarchical table of contents helps to extract the structure automatically (more on this later in an example).
• Layout and presentation should be consistent.
• Consistency in the use of fonts, no spaces at the end of the lines, etc.
• Fonts and color can be useful if the document is converted/convertible into RTF.
– But... better not to use colors in spreadsheets. Visual arts are beautiful but not useful.
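One of the consistency points above (no spaces at the end of lines) is easy to check before extraction. A minimal sketch in Python, assuming the book has already been exported to plain text; the function name is hypothetical and not part of any tool named in the slides:

```python
def find_trailing_whitespace(text):
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip()
    ]


if __name__ == "__main__":
    sample = "Title of a fonds\nDescription line \nAnother line\t\nLast line"
    print(find_trailing_whitespace(sample))
```

Lines reported by such a check can be cleaned up front, so that layout-based extraction rules do not fail on invisible differences.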
7. Case 1: Jewish Archival Guide Belgium (2)
• The structure of the table of contents corresponds to the hierarchies of record groups.
• This helps us to infer the hierarchies in the EAD.
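The idea of inferring the EAD hierarchy from the table of contents can be sketched as follows. This is an illustration only, not the procedure actually used for the guide: it assumes that each entry carries a dotted section number (such as "1.2") whose depth reflects the record-group level, and the function name and sample titles are hypothetical.

```python
import xml.etree.ElementTree as ET


def toc_to_ead(entries):
    """Build nested EAD <c> components from (number, title) TOC entries.

    The depth of an entry is inferred from its dotted number,
    e.g. "2.1.3" is at depth 3 (an assumption about the book's layout).
    """
    dsc = ET.Element("dsc")
    # stack[d] is the most recent element at depth d; depth 0 is <dsc>.
    stack = [dsc]
    for number, title in entries:
        depth = len(number.strip(".").split("."))
        # If a level is skipped, attach to the deepest known ancestor.
        parent_depth = min(depth - 1, len(stack) - 1)
        del stack[parent_depth + 1:]
        component = ET.SubElement(stack[parent_depth], "c")
        did = ET.SubElement(component, "did")
        ET.SubElement(did, "unittitle").text = title
        stack.append(component)
    return dsc


if __name__ == "__main__":
    toc = [
        ("1", "Example institution A"),
        ("1.1", "Example fonds A1"),
        ("1.2", "Example fonds A2"),
        ("2", "Example institution B"),
    ]
    print(ET.tostring(toc_to_ead(toc), encoding="unicode"))
```

A stack of "current ancestors" is enough here because a table of contents lists entries in document order, so every entry belongs to the most recent entry one level above it.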
9. Case 1: Jewish Archival Guide Belgium (4)
• Descriptions of collections and fonds are compliant with ISAD(G) and other ICA standards.
• Conversion to EAD tags using crosswalks.
• The book provides the identifiers of the fonds in the hosting institutions.
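A crosswalk of this kind can be expressed as a simple lookup table. The sketch below is an assumption about how such a conversion might look, not the project's actual code: the field names of the input record are hypothetical, while the target tags follow the commonly published ISAD(G)-to-EAD mapping (reference code to `unitid`, title to `unittitle`, and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical field name -> (EAD tag, whether it belongs inside <did>)
CROSSWALK = {
    "reference_code": ("unitid", True),           # ISAD(G) 3.1.1
    "title": ("unittitle", True),                 # ISAD(G) 3.1.2
    "dates": ("unitdate", True),                  # ISAD(G) 3.1.3
    "extent": ("physdesc", True),                 # ISAD(G) 3.1.5
    "creator": ("origination", True),             # ISAD(G) 3.2.1
    "history": ("bioghist", False),               # ISAD(G) 3.2.2
    "scope_and_content": ("scopecontent", False), # ISAD(G) 3.3.1
}


def description_to_ead(record, level="fonds"):
    """Convert one ISAD(G)-style description (a dict) into an EAD <c>."""
    component = ET.Element("c", level=level)
    did = ET.SubElement(component, "did")
    for field, (tag, in_did) in CROSSWALK.items():
        value = record.get(field)
        if not value:
            continue  # leave out elements the book does not provide
        if in_did:
            ET.SubElement(did, tag).text = value
        else:
            # Narrative elements wrap their text in a paragraph.
            block = ET.SubElement(component, tag)
            ET.SubElement(block, "p").text = value
    return component


if __name__ == "__main__":
    record = {
        "reference_code": "EXAMPLE 1",
        "title": "Example fonds",
        "scope_and_content": "Example description text.",
    }
    print(ET.tostring(description_to_ead(record), encoding="unicode"))
```

Keeping the mapping in one table makes the use of the standard transparent: anyone reviewing the conversion can see which descriptive element ends up in which tag.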
11. Case 1: Jewish Archival Guide Belgium (5)
• Very good communication with the authors during the editing process.
• Trilingual tagset (EN, NL, FR)
• Use of the identifiers to find the original repositories.
• Help in the selection of data using the subject keywords.
• Mapping of the keywords used to terms of the EHRI thesaurus.
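Mapping trilingual keywords to thesaurus terms can be sketched as a normalised label lookup. Everything concrete below is an assumption for illustration: the concept identifier, the sample labels, and the function names are hypothetical and are not taken from the EHRI thesaurus or the guide.

```python
import unicodedata


def normalize(label):
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(no_accents.lower().split())


def build_index(thesaurus):
    """Map every normalised label (in any language) to its concept id."""
    index = {}
    for concept_id, labels in thesaurus.items():
        for label in labels.values():
            index[normalize(label)] = concept_id
    return index


def map_keywords(keywords, index):
    """Return (mapped, unmapped): matched keywords and leftovers."""
    mapped, unmapped = {}, []
    for keyword in keywords:
        concept_id = index.get(normalize(keyword))
        if concept_id is None:
            unmapped.append(keyword)  # needs manual review
        else:
            mapped[keyword] = concept_id
    return mapped, unmapped


if __name__ == "__main__":
    # Hypothetical thesaurus excerpt with EN, NL and FR labels.
    thesaurus = {
        "concept-001": {"en": "Refugees", "nl": "Vluchtelingen", "fr": "Réfugiés"},
    }
    index = build_index(thesaurus)
    print(map_keywords(["réfugiés", "Vluchtelingen", "unknown term"], index))
```

Keywords that do not match exactly are returned separately rather than guessed at, so that they can be checked by hand or with the authors.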
13. Case 2: IPN – Informator (2)
• The book was written only for humans.
• Part of the structure was extracted by hand.
• Difficult to map the layout and structural information to standards.
• Identifiers of the fonds in the IPN database are not provided.
• In the end... it took a lot of time and effort to produce something meaningful.
15. Some conclusions
• Books and edited material are not the ideal way to share data.
• Still, they can be useful in this case if:
– Archival standards are used
– Use of standards is transparent
– Identifiers are provided
– Structure of the document reproduces the hierarchical organisation of the data in the archives.
– Layout of the document gives information about the different pieces of information
16. NIOD Institute for War, Holocaust and Genocide Studies (NL)
• CEGES-SOMA Centre for Historical Research and Documentation on War and Contemporary Society (BE)
• Jewish Museum in Prague (CZ)
• Institute of Contemporary History Munich – Berlin (DE)
• YAD VASHEM The Holocaust Martyrs’ and Heroes’ Remembrance Authority (IL)
• The Wiener Library – Institute of Contemporary History (UK)
• Holocaust Memorial Center (HU)
• HL-senteret Center for Studies of Holocaust and Religious Minorities (NO)
• NAF National Archives of Finland (FI)
• The Emanuel Ringelblum Jewish Historical Institute (PL)
• King’s College London (UK)
• Georg-August-Universität Göttingen – Göttingen State and University Library (DE)
• Athena RC/IMIS (GR)
• DANS Data Archiving and Networked Services (NL)
• Shoah Memorial, Museum, Center for Contemporary Jewish Documentation (FR)
• ITS International Tracing Service (DE)
• Memorial to the Murdered Jews of Europe (DE)
• Terezín Memorial (CZ)
• Beit Theresienstadt (IL)
• VWI Vienna Wiesenthal Institute for Holocaust Studies (AT)
CONNECTING
KNOWLEDGE