Preservation and Long-Term Access to Data and Records in a Knowledge-Based Society
Reagan W. Moore
San Diego Supercomputer Center
[email_address]
http://www.npaci.edu/DICE/

Data and Knowledge Systems Group
  • Staff: Reagan Moore, Ilkai Altintas, Chaitan Baru, Sheau Yen Chen, Charles Cowart, Amarnath Gupta, George Kremenek, M. Kulrul, Bertram Ludäscher, Richard Marciano, A. Memon, XuFei Qian, Roman Olshanowsky, Arcot Rajasekar, Abe Singer, Michael Wan, Ilya Zaslavsky, Bing Zhu
  • Graduate students: A. Bagchi, S. Bansal, A. Behere, R. Bharath, S. Bharath, L. Sui
  • Undergraduate interns: N. Cotofana, D. Le, J. Trang, L. Yin

Topics
  • Building persistent archives
  • Data grids
  • Authenticity mechanisms
  • Managing technology evolution
  • Knowledge-based access

Archival Processes
  • Appraisal: determine the archivable content.
  • Accession: determine the initial physical location for the data, and the relationship of the new collection to existing collections.
  • Arrangement: add administrative control, describe the information content (provenance, authenticity, structure, administrative), and decompose digital objects into their components as needed.
  • Description: complete the definition of collection attributes by iterating between arrangement, reformatting, and representation.
  • Preservation: build an archivable form of the digital entities, characterize the collection context, and manage their storage.
  • Access: provide query mechanisms for discovering, retrieving, and presenting the digital entities.
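The six stages can be read as an ordered workflow in which Description may loop back to Arrangement. The following Python sketch is a hypothetical illustration and is not taken from any archival system. `Stage`, `allowed` and `run` are names introduced here. The only back-edge it models is Description to Arrangement, which is one reading of "iterating between arrangement, reformatting, and representation".

```python
from enum import IntEnum


class Stage(IntEnum):
    APPRAISAL = 1
    ACCESSION = 2
    ARRANGEMENT = 3
    DESCRIPTION = 4
    PRESERVATION = 5
    ACCESS = 6


def allowed(current: Stage, nxt: Stage) -> bool:
    # Stages normally advance one step at a time.
    if nxt == current + 1:
        return True
    # Assumed back-edge: Description may return to Arrangement for another pass.
    return current == Stage.DESCRIPTION and nxt == Stage.ARRANGEMENT


def run(path):
    """Validate a sequence of stages and return their names, or raise ValueError."""
    for a, b in zip(path, path[1:]):
        if not allowed(a, b):
            raise ValueError(f"illegal transition {a.name} -> {b.name}")
    return [s.name for s in path]
```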

ERA Concept Model

Common Approach (digital library, persistent archive, data grid)
  • Logical name space used to organize digital entities and associate attributes with them
  • Separation of information management from data storage management
  • Definition of abstraction mechanisms for dealing with repositories
  • Emergence of a need for knowledge management
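Here is a minimal Python sketch of a logical name space. It assumes an in-memory dictionary stands in for the catalog. The sketch shows the separation the slide describes: attributes (information management) stay attached to a logical name, while replica locations (storage management) can change underneath it. `LogicalNameSpace`, `register` and `move` are hypothetical names and are not the SRB/MCAT interface.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Information management: descriptive attributes bound to the logical name.
    attributes: dict = field(default_factory=dict)
    # Storage management: (storage resource, physical path) for each replica.
    replicas: list = field(default_factory=list)


class LogicalNameSpace:
    """Hypothetical in-memory catalog keyed by logical name."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, resource, physical_path, **attributes):
        entry = self._entries.setdefault(logical_name, Entry())
        entry.replicas.append((resource, physical_path))
        entry.attributes.update(attributes)

    def move(self, logical_name, old_resource, new_resource, new_path):
        # Relocating data updates only the storage mapping; the logical
        # name and its attributes are untouched.
        entry = self._entries[logical_name]
        entry.replicas = [
            (new_resource, new_path) if resource == old_resource else (resource, path)
            for resource, path in entry.replicas
        ]

    def attributes(self, logical_name):
        return dict(self._entries[logical_name].attributes)

    def replicas(self, logical_name):
        return list(self._entries[logical_name].replicas)
```

Users and applications keep referring to the logical name even after `move` migrates the data to a different storage system.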

SDSC Storage Resource Broker & Meta-data Catalog: Levels of Abstraction
  • Application: C, C++ libraries; Unix shell; Linux I/O; DLL / Python; Java, NT browsers; Web; WSDL; Prolog predicate; HRM
  • Clients / servers
  • Logical name space, latency management, data transport, metadata transport
  • Consistency management / authorization-authentication (prime server)
  • Catalog abstraction: databases (DB2, Oracle, Sybase)
  • Storage abstraction: archives (HPSS, ADSM, UniTree, DMF); databases (DB2, Oracle, Postgres); file systems (Unix, NT, Mac OS X)
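A storage abstraction layer can be pictured as one driver interface that hides heterogeneous back ends. This Python sketch is an assumption-laden stand-in and is not the SRB driver API. `StorageDriver`, `MemoryDriver`, `FileSystemDriver` and `Broker` are hypothetical names. The in-memory driver stands in for back ends such as HPSS or DB2.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageDriver(ABC):
    """Operations every storage back end must provide (hypothetical interface)."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    def __init__(self):
        self._blobs = {}

    def put(self, path, data):
        self._blobs[path] = data

    def get(self, path):
        return self._blobs[path]


class FileSystemDriver(StorageDriver):
    def __init__(self, root):
        self._root = Path(root)

    def put(self, path, data):
        target = self._root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def get(self, path):
        return (self._root / path).read_bytes()


class Broker:
    """Routes requests to a named resource without the caller knowing its type."""

    def __init__(self):
        self._resources = {}

    def add_resource(self, name, driver: StorageDriver):
        self._resources[name] = driver

    def put(self, resource, path, data):
        self._resources[resource].put(path, data)

    def get(self, resource, path):
        return self._resources[resource].get(path)
```

Supporting a new kind of storage system then means writing one more driver, while clients of `Broker` stay unchanged.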

Authenticity
  • Guarantee that the data has not been changed
  • Collection-owned data, accessible only through the data handling system
  • Support roles defining access (curation, owner, annotation, read)
  • Support access controls mapping users to roles
  • Audit trails that record all operations on files
  • Digital signatures: cryptographic checksums
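The following Python sketch combines several of these authenticity mechanisms: collection-owned objects, role-based access, an audit trail, and checksums. The slide names the roles but not their permissions, so the role-to-permission table here is a hypothetical assumption. The checksum is a SHA-256 digest from Python's `hashlib`. A checksum detects change but, unlike a digital signature, does not identify who produced the data.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical permissions per role; the slide names only the roles.
ROLE_PERMISSIONS = {
    "curation": {"read", "write", "annotate", "delete"},
    "owner": {"read", "write", "annotate"},
    "annotation": {"read", "annotate"},
    "read": {"read"},
}


class Collection:
    def __init__(self, user_roles):
        self._user_roles = user_roles  # user -> role
        self._objects = {}             # name -> bytes, owned by the collection
        self._checksums = {}           # name -> SHA-256 hex digest recorded at ingest
        self.audit_trail = []          # (timestamp, user, operation, name, allowed)

    def _check(self, user, op, name):
        role = self._user_roles.get(user)
        ok = role is not None and op in ROLE_PERMISSIONS[role]
        # Every attempted operation is logged, whether or not it is permitted.
        self.audit_trail.append((datetime.now(timezone.utc).isoformat(), user, op, name, ok))
        if not ok:
            raise PermissionError(f"{user} may not {op} {name}")

    def ingest(self, user, name, data: bytes):
        self._check(user, "write", name)
        self._objects[name] = data
        self._checksums[name] = hashlib.sha256(data).hexdigest()

    def read(self, user, name) -> bytes:
        self._check(user, "read", name)
        return self._objects[name]

    def verify(self, name) -> bool:
        """True if the stored bytes still match the checksum taken at ingest."""
        return hashlib.sha256(self._objects[name]).hexdigest() == self._checksums[name]
```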

Managing Technology Evolution
  • Data grids provide interoperability mechanisms to access data in multiple administrative domains and multiple types of storage systems
  • Persistent archives migrate collections from old technology to new technology to support presentation on new systems
  • Both require the ability to access heterogeneous systems

Presentation of Digital Objects
  • Components: storage system, operating system, application, digital object, display system

Technology Management - Emulation
  • Components: new storage system, new operating system, old application, digital object, new display system
  • Approach: wrap the application

Technology Management
  • Components: new storage system, new operating system, old application, digital object, new display system
  • Approach: add an operating system call

Technology Management
  • Components: old storage system, new operating system, old application, digital object, old display system
  • Approach: add two operating system calls

Technology Management - Migration
  • Components: new storage system, new operating system, new application, digital object, new display system
  • Approach: migrate the encoding format

Technology Management - SDSC
  • Components: old storage system, new operating system, new application, digital object, old display system
  • Approach: wrap the storage system, wrap the display system, and migrate the encoding format
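The SDSC approach combines three steps: wrap the old storage system, wrap the old display system, and migrate the encoding format. This adapter-pattern sketch illustrates that combination under stated assumptions. `OldTapeArchive` and `OldTextDisplay` are hypothetical legacy components. The format migration is shown as a Latin-1 to UTF-8 re-encoding purely as an example.

```python
class OldTapeArchive:
    """Hypothetical legacy storage system with its own record interface."""

    def __init__(self):
        self._records = {}

    def write_record(self, key, payload):
        self._records[key] = payload

    def read_record(self, key):
        return self._records[key]


class StorageWrapper:
    """Wraps the legacy storage so new applications can call put()/get()."""

    def __init__(self, legacy):
        self._legacy = legacy

    def put(self, name, data: bytes):
        self._legacy.write_record(name, data)

    def get(self, name) -> bytes:
        return self._legacy.read_record(name)


def migrate_encoding(data: bytes) -> bytes:
    """Example format migration: legacy Latin-1 text re-encoded as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


class OldTextDisplay:
    """Hypothetical legacy display system that accepts only decoded text."""

    def show(self, text: str) -> str:
        return "OLD> " + text


class DisplayWrapper:
    """Lets new code hand migrated UTF-8 bytes to the old display."""

    def __init__(self, legacy):
        self._legacy = legacy

    def render(self, data: bytes) -> str:
        return self._legacy.show(data.decode("utf-8"))
```

Only the encoding migration touches the digital object itself. The two wrappers let the old storage and display systems remain in service.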

Accessing Archived Data
  • Name transparency: access data without knowing the file name
      - Map from attributes to a local file name
  • Location transparency: access data without knowing where it is stored
      - Map from global file name to local file name
  • Collection transparency: access data without knowing the collection attributes
      - Map from concept space to collection attributes
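The three transparencies can be chained into one lookup:

1. A concept maps to a collection attribute.
2. The attribute value maps to global file names.
3. Each global file name maps to a storage system and local file name.

In this sketch, three dictionaries stand in for catalog tables. Their names, keys and paths are hypothetical.

```python
# Collection transparency: discipline concept -> collection attribute name
CONCEPT_TO_ATTRIBUTE = {"author": "dc_creator"}

# Name transparency: (attribute, value) -> global (logical) file names
ATTRIBUTE_INDEX = {("dc_creator", "Moore"): ["/archive/reports/r1"]}

# Location transparency: global file name -> (storage system, local file name)
GLOBAL_TO_LOCAL = {"/archive/reports/r1": ("hpss", "/hpss/u7/000123.dat")}


def find(concept, value):
    """Resolve a concept-level query to physical locations without the user
    knowing attribute names, file names, or storage systems."""
    attribute = CONCEPT_TO_ATTRIBUTE[concept]
    global_names = ATTRIBUTE_INDEX.get((attribute, value), [])
    return [GLOBAL_TO_LOCAL[name] for name in global_names]
```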

Information Management - Logical Name Space
  • Set of attributes describing the digital entities registered into the logical name space
      - SRB metadata: Unix file system semantics
      - Provenance metadata: Dublin Core
      - Resource metadata: user access control lists
      - Discipline metadata: user-defined attributes
  • Each digital entity may have unique attributes
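The metadata categories can be modeled as separate attribute groups on each registered entity. This sketch assumes a plain dictionary for each group. Field names such as `system` and `discipline` are hypothetical, while the provenance keys follow Dublin Core element names (`title`, `creator`, `date`). Discipline attributes differ from entity to entity, so the query skips entities that lack the requested attribute.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalEntity:
    logical_name: str
    system: dict = field(default_factory=dict)      # SRB-style: Unix file system semantics
    provenance: dict = field(default_factory=dict)  # Dublin Core elements
    acl: dict = field(default_factory=dict)         # resource metadata: user -> access level
    discipline: dict = field(default_factory=dict)  # user-defined, may differ per entity


def query(entities, category, key, value):
    """Return logical names whose attribute `key` in group `category` equals
    `value`; entities without that attribute are skipped, not treated as errors."""
    return [e.logical_name for e in entities if getattr(e, category).get(key) == value]
```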

Knowledge Management - Discovery across Collections
  • Mapping from collection attributes to discipline concepts
  • Queries based on discipline concepts
  • Characterization of relationships between attributes
      - Semantic / logical: cross-walks
      - Procedural / temporal: records management
      - Structural / spatial: GIS
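A semantic cross-walk maps one discipline concept onto the differently named attributes of each collection. A single concept-based query can then run across all collections. The collections, attribute names and `CROSSWALK` table in this sketch are hypothetical examples.

```python
# Hypothetical cross-walk: discipline concept -> attribute name in each collection
CROSSWALK = {
    "author": {"library": "dc.creator", "archive": "originator"},
    "date": {"library": "dc.date", "archive": "record_date"},
}

COLLECTIONS = {
    "library": [{"id": "L1", "dc.creator": "Moore", "dc.date": "2002"}],
    "archive": [
        {"id": "A7", "originator": "Moore", "record_date": "1999"},
        {"id": "A8", "originator": "Wan", "record_date": "2001"},
    ],
}


def concept_query(concept, value):
    """Translate a concept into each collection's own attribute and search them all."""
    hits = []
    for name, records in COLLECTIONS.items():
        attribute = CROSSWALK[concept].get(name)
        if attribute is None:
            continue  # this collection has no attribute mapped to the concept
        hits.extend((name, r["id"]) for r in records if r.get(attribute) == value)
    return hits
```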

Knowledge-Based Data Grids
  • Stages: ingest services, management, access services (model-based access); data handling system: SRB
  • Knowledge: relationships between concepts; XTM DTD; Rules - KQL; knowledge repository for rules; knowledge or topic-based query / browse
  • Information: attributes, semantics; XML DTD; SDLIP; information repository; attribute-based query
  • Data: fields, containers, folders; MCAT/HDF; grids; storage (replicas, persistent IDs); feature-based query

Further Information
http://www.npaci.edu/DICE

Editor's Notes

  • The Data Intensive Computing Environment group at the San Diego Supercomputer Center has 16 full-time staff members and 6-10 associated graduate students, working on:
      - data handling systems (Wan, Rajasekar)
      - collection management (Rajasekar)
      - collection building (Kremenek, Zhu)
      - information management (Baru, Ludäscher, Marciano)
      - knowledge management (Ludäscher, Gupta)
      - presentation systems and GIS systems (Zaslavsky)
      - user interfaces (Cowart, Ludäscher, Marciano, Zaslavsky, Zhu)