The document discusses strategies for long-term storage and preservation of digital data. It covers topics such as digitization of physical materials like microfilm and tapes, challenges with file formats and storage media becoming obsolete, and the complexity of preserving "born digital" content over long periods of time. The presentation also addresses criteria for prioritizing what content to digitize, technical aspects of digital preservation systems, and challenges around defining what it means to ensure access and usability of stored data in the long run.
Legaspi 02 data_curation_powerpoint
1. Reinhard Feldmann
Data Curation / Digitisation /
iCloud / Long-term storage
of digital data
Legaspi, Aquinas University
10th to 12th April 2014
2. Data curation
• General introduction
• Tape, CD, DVD
• Preserving the past: digitisation strategies in Germany
• Final remarks
4. Data curation
Microfilming or Digitisation?
• Intelligence program during World War II
• Civil microfilming since World War II
• Microfilming has been a success
• Digitising the microfilms
• DAMP: Digitising of Ageing Microfilm Project
7. Data curation
New Complexity
• "Born digital" documents
• Migration vs. emulation
• Digital "dark ages" (example: U.S. elections)
• New formats (books – music)
• Machine-readable texts
• Subscription – preservation – commercial?
• Users – libraries
• Complex matrix of issues
8. Data curation
Audio and Video Tapes
• History:
  - 1888: iron wire
  - 1928: steel and paper tape technology
  - 1935: first non-metallic tape (Berlin: AEG), made of cellulose acetate
  - Compact cassette: 1964 (Philips)
  - Video: 1975 (Sony)
14. Data curation
Audio and Video Tapes
• Summary
• Permanent data carrier vs. permanent data
• Migration (expensive)
• Optimal conditions for storage
15. Data curation
Further information
• http://www.restaumedia.de (!)
• http://www.memoriav.ch
• http://www.forum-bestandserhaltung.de
• http://www.tape-online.net
18. Data curation
Books to be scanned. Calculation by the "German Distributed Library"

Period        Editions    Avg. pages    Total pages
until 1500      27,000       235          6,345,000
1501–1600      140,000       220         30,800,000
1601–1700      265,000       213         56,445,000
1701–1800      600,000       300        180,000,000
1801–1870      511,978       245        125,434,610
1871–1900      525,000       245        128,625,000
Total        2,068,978       255        527,649,610
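
The totals in the table can be re-derived from the number of editions and the average page count per period. The following is a minimal Python sketch using only the figures from the slide; the names ROWS, page_totals and summary are illustrative and do not come from the presentation.

```python
# Rows from the table: (period, editions, average pages per edition).
ROWS = [
    ("until 1500", 27_000, 235),
    ("1501-1600", 140_000, 220),
    ("1601-1700", 265_000, 213),
    ("1701-1800", 600_000, 300),
    ("1801-1870", 511_978, 245),
    ("1871-1900", 525_000, 245),
]


def page_totals(rows):
    """Return the estimated total pages per period (editions * average pages)."""
    return {period: editions * avg for period, editions, avg in rows}


def summary(rows):
    """Return (total editions, total pages, overall average pages per edition)."""
    total_editions = sum(editions for _, editions, _ in rows)
    total_pages = sum(editions * avg for _, editions, avg in rows)
    return total_editions, total_pages, round(total_pages / total_editions)


if __name__ == "__main__":
    for period, pages in page_totals(ROWS).items():
        print(f"{period:<12}{pages:>14,}")
    print(summary(ROWS))  # (2068978, 527649610, 255)
```

The overall average of 255 pages is a weighted mean (total pages divided by total editions), not the mean of the six per-period averages.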
40. Persistent addressing
Example: Erstlich wolgedeutes Böhmisches Glücks vnd VnglücksRath : hernach in Radt/ doch mit der That schad vnd vnrath Wesen. [S.l.] 1621
Shelf mark of the original: Einbl. Xb FM 28
Shelf mark of the digital copy: drucke/einbl-xb-fm-28 (basic element of all identifiers; 7-bit ASCII, one-word conversion from the original shelf mark)
URL: http://diglib.hab.de/wdb.php?dir=drucke/einbl-xb-fm-28 (since 2004)
PURL: http://diglib.hab.de/drucke/einbl-xb-fm-28/start.htm (since 1998)
URN: urn:nbn:de:gbv:23-drucke/kb-53-2f-25 (since 2005)
Resolver at the library (used for URNs and other identifiers):
http://diglib.hab.de/?urn=urn:nbn:de:gbv:23-drucke/einbl-xb-fm-285
Resolver at the German National Library:
http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:gbv:23-drucke/einbl-xb-fm-285
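
The slide describes the digital shelf mark as a 7-bit ASCII, one-word conversion of the original shelf mark that then serves as the base of every identifier. A minimal Python sketch of such a conversion follows. The exact rule used by the library is not given in the slides, so the normalisation here (lower-casing, stripping diacritics, joining with hyphens) is an assumption that merely reproduces the example; the names shelf_mark_to_id and identifiers are hypothetical. The URN is left out on purpose, because its trailing characters cannot be derived from the slide.

```python
import re
import unicodedata

BASE = "http://diglib.hab.de"  # host taken from the URLs on the slide


def shelf_mark_to_id(shelf_mark: str) -> str:
    """Convert a shelf mark to a lower-case, hyphen-joined 7-bit ASCII word.

    Assumed rule: decompose accented letters, drop non-ASCII marks, then
    replace every run of characters other than a-z and 0-9 with one hyphen.
    """
    decomposed = unicodedata.normalize("NFKD", shelf_mark)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def identifiers(shelf_mark: str, collection: str = "drucke") -> dict:
    """Build the URL and PURL patterns shown on the slide from one base path."""
    path = f"{collection}/{shelf_mark_to_id(shelf_mark)}"
    return {
        "path": path,
        "url": f"{BASE}/wdb.php?dir={path}",
        "purl": f"{BASE}/{path}/start.htm",
    }


if __name__ == "__main__":
    for name, value in identifiers("Einbl. Xb FM 28").items():
        print(f"{name}: {value}")
```

Deriving every address from one stable base path means that only the resolver has to change if the server layout changes, while the identifier itself stays the same.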
41. Data curation
Long-term Storage of Digital Data
• How long is "long-term"?
• What is "secure storage"?
• What data are we saving?
• "State of the art"
• Financial remarks
42. Data curation
Semantic problem I: How long is long-term?
• "five years or more" (IFLA 2006)
• "Data should normally be preserved and accessible for not less than 10 years for any projects, and for projects of clinical or major social, environmental or heritage importance, the data should be retained for up to 20 years, and preferably permanently within a national collection, or as required by the funder's data policy." (Research Councils UK 2008)
• "a period of time long enough for there to be concern about the impacts of changing technologies (...) on the information being held in a repository. This period extends into the indefinite future." (CCSDS 2002: 1-11)
• "'Long-term' is a non-specified period during which unknown technological or sociocultural changes may take place." (Nestor 2008)
• Long-term or forever?
43. Data curation
Semantic problem II: What is secure?
• Conservation of a bitstream?
• Usability of data? What does usability mean?
• Storage of the content?
• Data and machines (emulation)
• Data and applications?
• Semantic context?
• Layout?
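
Of the questions above, only the first one, conservation of a bitstream, can be checked mechanically. A common way to do this is a fixity check: store a checksum for each file and recompute it later. The following Python sketch illustrates the idea; the names sha256_of and failed_fixity and the manifest layout are assumptions made for this example and do not come from the presentation. A passing check says nothing about usability, semantic context or layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(manifest):
    """Return the paths whose current checksum differs from the stored one.

    `manifest` maps a file path to the hex digest recorded at ingest.
    Missing files are reported as failures as well.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures
```

In use, the manifest would be written once at ingest and kept apart from the data, and failed_fixity would be run at regular intervals and after every migration to a new storage medium.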
44. Data curation
Semantic problem III: What are we saving?
• Bitstream?
• Data?
• Digital documents?
• Digital representations of analogue documents?
• Contents?
• Information?
• Information networks?
• Knowledge? And how does information create knowledge?
• Everything?
45. Data curation
Financial remarks
• "Like almost all engineering problems, bit preservation is fundamentally a question of budgets." (David S. H. Rosenthal)
• "A quick review of the literature reveals no consensus on metrics or factors for calculating all the costs involved in digitizing a book."
  http://hurstassociates.blogspot.com/2008/04/costs-of-large-scale-digitization.html
46. Data curation
Theses
• Long-term storage must be seen under the conditions of the WWW: international, net-based, distributed.
• "Everything" and "always" are impossible: priorities and decisions!
• It is not the keeping of single objects that matters; information and context create permanent knowledge.
• Analogue or digital?
• Analogue and digital!
47. Data curation
Thanks
Thank you for your attention!
Special thanks to:
Dr. Thomas Stäcker (Duke August Library Wolfenbüttel)
Prof. Dr. Stefan Gradmann (Humboldt University Berlin)