Reinhard Feldmann
Data Curation / Digitisation /
iCloud / Long-term storage
of digital data
Legaspi, Aquinas University
10th to 12th April 2014
Data curation
Reinhard Feldmann
• General introduction
• Tape, CD, DVD
• Preserving the past: digitisation strategies in Germany
• Final remarks
Microfilming or Digitisation?
• Intelligence program during World War II
• Civil microfilming since World War II
• Microfilming has been a success
• Digitising the microfilms
• DAMP: Digitising of Ageing Microfilm Project
Microfilm scanner
New Complexity
• "Born digital" documents
• Migration vs. emulation
• Digital "dark ages" (example: U.S. elections)
• New formats (books – music)
• Machine-readable texts
• Subscription – preservation – commercial?
• Users – libraries
• Complex matrix of issues
Audio- and Videotapes
• History:
  • 1888: iron wire
  • 1928: steel-tape and paper-tape technology
  • 1935: first non-metallic tape (Berlin: AEG), cellulose acetate
  • Compact cassette: 1964 (Philips)
  • Video: 1975 (Sony)
Audio- and Videotapes
• Damage
  • Vinegar syndrome
  • Hydrolysis of the binding agent
  • Tape reel
  • External magnetic fields
Audio- and Videotapes
• Best storage conditions
  • Climate: 8 °C / 25% RH
  • Separate storage
Audio- and Videotapes
• Summary
  • Permanent data carrier vs. permanent data
  • Migration (expensive)
  • Optimal storage conditions
More information
• http://www.restaumedia.de (!)
• http://www.memoriav.ch
• http://www.forum-bestandserhaltung.de
• http://www.tape-online.net
Duke August Library
Main Hall
Books to be scanned: calculation by the "German Distributed Library"

Period        Editions  Avg. pages  Total pages
up to 1500      27,000         235    6,345,000
1501–1600      140,000         220   30,800,000
1601–1700      265,000         213   56,445,000
1701–1800      600,000         300  180,000,000
1801–1870      511,978         245  125,434,610
1871–1900      525,000         245  128,625,000
Total        2,068,978         255  527,649,610
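Each total in the table is simply editions multiplied by the average page count for that period. A short check of that arithmetic, with the figures copied from the table, makes the scale (over half a billion pages) concrete. The names ESTIMATE and page_totals are illustrative only, not part of any project tooling.

```python
# (period, editions, average pages per edition), copied from the table above
ESTIMATE = [
    ("up to 1500", 27_000, 235),
    ("1501-1600", 140_000, 220),
    ("1601-1700", 265_000, 213),
    ("1701-1800", 600_000, 300),
    ("1801-1870", 511_978, 245),
    ("1871-1900", 525_000, 245),
]


def page_totals(rows):
    """Return pages per period, total editions, total pages and overall average."""
    per_period = {period: editions * avg_pages for period, editions, avg_pages in rows}
    total_editions = sum(editions for _, editions, _ in rows)
    total_pages = sum(per_period.values())
    # Weighted average: total pages divided by total editions.
    return per_period, total_editions, total_pages, total_pages / total_editions


per_period, editions, pages, avg = page_totals(ESTIMATE)
print(f"{editions:,} editions, {pages:,} pages, {avg:.0f} pages on average")
# prints "2,068,978 editions, 527,649,610 pages, 255 pages on average"
```

The average of 255 pages in the last row is therefore a weighted average across all periods, not the mean of the six per-period averages.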
www.deutsche-digitale-bibliothek.de
Criteria for digitisation
• Relevant for research
• Rare and precious
• Preservation needs
Preserving the Past
The Wolfenbuettel Book Reflector
45° opening angle
The Wolfenbuettel Book Reflector
90° opening angle
Graz Book Cradle, used in research libraries
ScanRobot 2.0
File server at the Duke August Library
RAID arrays with hard disks
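RAID arrays protect against the failure of a single disk by storing redundancy alongside the data. The slide does not say which RAID level the library uses, so the following is only a generic sketch of RAID 5-style parity: the parity block of a stripe is the byte-wise XOR of its data blocks, which lets any one lost block be rebuilt from the surviving blocks plus the parity. All function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    # zip(*blocks) yields one tuple of byte values per position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks: list[bytes]) -> bytes:
    """Parity for one stripe: XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block of a stripe."""
    return xor_blocks(surviving + [parity])


stripe = [b"scan", b"page", b"0001"]   # three "disks" holding one stripe
p = parity_block(stripe)
print(rebuild_missing([stripe[0], stripe[2]], p))  # the lost middle block
```

Parity only covers the loss of one disk per stripe; it does not replace separate backups or fixity checks, which is why long-term storage concepts usually combine several measures.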
Server for master and derivative files
Persistent addressing
Example: Erstlich wolgedeutes Böhmisches Glücks vnd VnglücksRath : hernach in Radt/ doch mit der That schad vnd vnrath Wesen. [S.l.] 1621
Shelf mark of the original: Einbl. Xb FM 28
Shelf mark of the digital copy: drucke/einbl-xb-fm-28 (basic element of all identifiers; 7-bit ASCII, one-word conversion of the original shelf mark)
URL: http://diglib.hab.de/wdb.php?dir=drucke/einbl-xb-fm-28 (since 2004)
PURL: http://diglib.hab.de/drucke/einbl-xb-fm-28/start.htm (since 1998)
URN: urn:nbn:de:gbv:23-drucke/kb-53-2f-25 (since 2005)
Resolver at the library (used for URNs and other identifiers):
http://diglib.hab.de/?urn=urn:nbn:de:gbv:23-drucke/einbl-xb-fm-285
Resolver at the German National Library:
http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:gbv:23-drucke/einbl-xb-fm-285
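Because every identifier is derived from one base element, the whole scheme can be generated from the shelf mark. The sketch below is an illustration under stated assumptions, not the library's actual software: the exact conversion rules (lowercasing, German umlaut transliteration, collapsing punctuation and spaces into hyphens) are assumed from the single example "Einbl. Xb FM 28" → "drucke/einbl-xb-fm-28", and the URL patterns are copied from the examples above. The URN, including its final check digit, is passed in as given rather than computed.

```python
import re
import unicodedata

# Assumed transliteration for German characters (not confirmed by the source).
_TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

BASE = "http://diglib.hab.de"  # host taken from the examples above


def shelfmark_to_id(shelfmark: str, collection: str = "drucke") -> str:
    """Hypothetical one-word, 7-bit ASCII conversion of a shelf mark."""
    s = unicodedata.normalize("NFC", shelfmark).lower()
    for src, dst in _TRANSLIT.items():
        s = s.replace(src, dst)
    # Drop any remaining diacritics and everything outside 7-bit ASCII.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of punctuation or whitespace into a single hyphen.
    s = re.sub(r"[^a-z0-9]+", "-", s).strip("-")
    return f"{collection}/{s}"


def access_urls(ident: str) -> dict[str, str]:
    """URL and PURL patterns from the example (no percent-encoding applied)."""
    return {
        "url": f"{BASE}/wdb.php?dir={ident}",
        "purl": f"{BASE}/{ident}/start.htm",
    }


def resolver_urls(urn: str) -> dict[str, str]:
    """Wrap an existing URN in the library and national-library resolver patterns."""
    return {
        "library": f"{BASE}/?urn={urn}",
        "dnb": f"http://nbn-resolving.de/urn/resolver.pl?urn={urn}",
    }


ident = shelfmark_to_id("Einbl. Xb FM 28")
print(ident, access_urls(ident)["purl"])
```

Keeping the base element stable is what lets the URL, PURL and URN coexist: the access layer can change (as it did in 2004) while older addresses still resolve.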
Long-term Storage of Digital Data
• How long is "long-term"?
• What is "secure storage"?
• What data are we saving?
• "State of the art"
• Financial remarks
Semantic problem I: How long is long-term?
• "five years or more" (IFLA 2006)
• "Data should normally be preserved and accessible for not less than 10 years for any projects, and for projects of clinical or major social, environmental or heritage importance, the data should be retained for up to 20 years, and preferably permanently within a national collection, or as required by the funder's data policy." (Research Councils UK 2008)
• "a period of time long enough for there to be concern about the impacts of changing technologies (...) on the information being held in a repository. This period extends into the indefinite future." (CCSDS 2002: 1-11)
• "'Long-term' is an unspecified period in which unknown technological or sociocultural changes may take place." (Nestor 2008)
• Long-term or forever?
Semantic problem II: What is secure?
• Conservation of the bitstream?
• Usability of data? What does usability mean?
• Storage of the content?
• Data and machines (emulation)?
• Data and applications?
• Semantic context?
• Layout?
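Of these questions, conservation of the bitstream is the one that can be checked mechanically. A common practice is a fixity check: record a cryptographic checksum (here SHA-256, from Python's standard hashlib) for each file at ingest and recompute it periodically; a mismatch shows that the bits have changed. The manifest format (relative path mapped to hex digest) and the function names are assumptions for illustration. A passing check says nothing about the other questions on the slide, such as usability, context or layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large masters fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file below root (hypothetical manifest format)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the paths that are missing or whose checksum no longer matches."""
    failures = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(rel)
    return failures
```

In practice the manifest itself must be stored separately from the files it describes, otherwise a single storage fault can corrupt both.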
Semantic problem III: What are we saving?
• Bitstream?
• Data?
• Digital documents?
• Digital representations of analogue documents?
• Contents?
• Information?
• Information networks?
• Knowledge? And how does information create knowledge?
• Everything?
Financial remarks
• "Like almost all engineering problems, bit preservation is fundamentally a question of budgets." (David S. H. Rosenthal)
• "A quick review of the literature reveals no consensus on metrics or factors for calculating all the costs involved in digitizing a book."
  http://hurstassociates.blogspot.com/2008/04/costs-of-large-scale-digitization.html
Theses
• Long-term storage must be seen under the conditions of the WWW: international, net-based, distributed.
• "Everything" and "always" are impossible: we need priorities and decisions!
• What matters is not keeping single objects; information and context create permanent knowledge.
• Analogue or digital?
• Analogue and digital!
Thanks
Thank you for your attention!
Special thanks to:
Dr. Thomas Stäcker (Duke August Library, Wolfenbüttel)
Prof. Dr. Stefan Gradmann (Humboldt University Berlin)
