This document discusses approaches for long-term preservation and access to data and records in a knowledge-based society. It outlines key archival processes like appraisal, arrangement, description, and preservation. It also discusses using a logical name space to organize digital entities and associate attributes. Finally, it proposes that knowledge management, including discovery across collections through semantic relationships, is important for accessing archived data over time as technologies change.
1. Preservation and Long Term Access to Data and Records in a Knowledge-based Society Reagan W. Moore San Diego Supercomputer Center [email_address] http://www.npaci.edu/DICE/
2. Data and Knowledge Systems Group
- Staff: Reagan Moore, Ilkai Altintas, Chaitan Baru, Sheau Yen Chen, Charles Cowart, Amarnath Gupta, George Kremenek, M. Kulrul, Bertram Ludäscher, Richard Marciano, A. Memon, XuFei Qian, Roman Olshanowsky, Arcot Rajasekar, Abe Singer, Michael Wan, Ilya Zaslavsky, Bing Zhu
- Graduate Students: A. Bagchi, S. Bansal, A. Behere, R. Bharath, S. Bharath, L. Sui
- Undergraduate Interns: N. Cotofana, D. Le, J. Trang, L. Yin, +/- NN
3. Topics
- Building persistent archives
- Data grids
- Authenticity mechanisms
- Managing technology evolution
- Knowledge-based access
4. Archival Processes
- Appraisal - determine the archivable content
- Accession - determine the initial physical location for the data, and the relationship of the new collection to existing collections
- Arrangement - add administration control, describe the information content (provenance, authenticity, structure, administrative), and decompose digital objects into their components as needed
- Description - complete the definition of collection attributes by iterating between arrangement, reformatting, and representation
- Preservation - build an archivable form of the digital entities, characterize the collection context, and manage their storage
- Access - provide query mechanisms for discovering, retrieving, and presenting the digital entities
6. Common Approach (digital library, persistent archive, data grid)
- Logical name space used to organize digital entities and associate attributes
- Separation of information management from data storage management
- Definition of abstraction mechanisms for dealing with repositories
- Emergence of need for knowledge management
7. SDSC Storage Resource Broker & Meta-data Catalog: Levels of Abstraction [diagram]
- Application (clients): C, C++, Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; Web; WSDL; Prolog Predicate
- Servers (Prime Server): Logical Name Space; Latency Management; Data Transport; Metadata Transport; Consistency Management / Authorization-Authentication
- Catalog Abstraction: Databases (DB2, Oracle, Sybase)
- Storage Abstraction: Archives (HPSS, ADSM, UniTree, DMF); Databases (DB2, Oracle, Postgres); File Systems (Unix, NT, Mac OSX); HRM
8. Authenticity
- Guarantee that the data has not been changed
- Collection-owned data, only accessible through the data handling system
- Support roles defining access (curation, owner, annotation, read)
- Support access controls mapping users to roles
- Audit trails that record all operations on files
- Digital signatures - cryptographic checksums
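The authenticity mechanisms on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Collection` class, the permission sets attached to each role, and the choice of SHA-256 as the checksum are hypothetical and are not taken from the SRB implementation.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Role names come from the slide; the permissions granted to each are assumptions.
ROLE_PERMISSIONS = {
    "curation": {"read", "annotate", "write"},
    "owner": {"read", "annotate", "write"},
    "annotation": {"read", "annotate"},
    "read": {"read"},
}


@dataclass
class Collection:
    """Hypothetical collection that owns its data and mediates every access."""

    user_roles: dict[str, str]  # access control: user -> role
    objects: dict[str, bytes] = field(default_factory=dict)
    checksums: dict[str, str] = field(default_factory=dict)
    audit_trail: list[tuple[str, str, str, str]] = field(default_factory=list)

    def _check(self, user: str, permission: str) -> None:
        role = self.user_roles.get(user)
        if permission not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"{user} may not {permission}")

    def _log(self, user: str, operation: str, name: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((stamp, user, operation, name))

    def ingest(self, user: str, name: str, data: bytes) -> None:
        self._check(user, "write")
        self.objects[name] = data
        # Record a cryptographic checksum at ingest time.
        self.checksums[name] = hashlib.sha256(data).hexdigest()
        self._log(user, "ingest", name)

    def read(self, user: str, name: str) -> bytes:
        self._check(user, "read")
        self._log(user, "read", name)
        return self.objects[name]

    def verify(self, name: str) -> bool:
        """Return True if the stored bytes still match the ingest checksum."""
        current = hashlib.sha256(self.objects[name]).hexdigest()
        return current == self.checksums[name]


archive = Collection(user_roles={"alice": "owner", "bob": "read"})
archive.ingest("alice", "report.txt", b"original record")
print(archive.verify("report.txt"))
```

A change made behind the back of the data handling system (for example, overwriting `archive.objects["report.txt"]` directly) leaves no audit entry, but `verify` then returns `False`, which is the role the checksum plays on the slide.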
9. Managing Technology Evolution
- Data grids provide interoperability mechanisms to access data in multiple administration domains and multiple types of storage systems.
- Persistent archives migrate collections from old technology to new technology to support presentation on new systems.
- Both require the ability to access heterogeneous systems.
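One way to picture the shared requirement, access to heterogeneous systems, is a common driver interface with one implementation per storage technology. The sketch below is an assumption-laden illustration: `StorageDriver`, `OldTapeArchive`, `NewObjectStore`, and `migrate` are hypothetical names, and both back ends are in-memory stand-ins rather than real storage systems.

```python
import hashlib
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Uniform interface that hides how a particular storage system works."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def names(self) -> list[str]: ...


class OldTapeArchive(StorageDriver):
    """Stand-in for legacy storage; keeps objects internally as hex strings."""

    def __init__(self) -> None:
        self._tapes: dict[str, str] = {}

    def put(self, name: str, data: bytes) -> None:
        self._tapes[name] = data.hex()

    def get(self, name: str) -> bytes:
        return bytes.fromhex(self._tapes[name])

    def names(self) -> list[str]:
        return sorted(self._tapes)


class NewObjectStore(StorageDriver):
    """Stand-in for newer storage; keeps objects as raw bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def names(self) -> list[str]:
        return sorted(self._objects)


def migrate(source: StorageDriver, target: StorageDriver) -> int:
    """Copy every object from source to target, checking each copy."""
    moved = 0
    for name in source.names():
        data = source.get(name)
        target.put(name, data)
        if hashlib.sha256(target.get(name)).digest() != hashlib.sha256(data).digest():
            raise IOError(f"copy of {name} does not match the original")
        moved += 1
    return moved


old, new = OldTapeArchive(), NewObjectStore()
old.put("census-1990", b"record bytes")
print(migrate(old, new), new.names())
```

Because `migrate` only talks to the interface, the same function serves a data grid reading from several storage types and a persistent archive moving a collection to new technology.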
10. Presentation of Digital Objects [diagram]: Storage System; Operating System; Application; Digital Object; Display System
11. Technology Management - Emulation [diagram]: New Storage System; New Operating System; Old Application; Digital Object; New Display System. Annotation: Wrap Application
12. Technology Management [diagram]: New Storage System; New Operating System; Old Application; Digital Object; New Display System. Annotation: Add Operating System Call
13. Technology Management [diagram]: Old Storage System; New Operating System; Old Application; Digital Object; Old Display System. Annotations: Add Operating System Call; Add Operating System Call
14. Technology Management - Migration [diagram]: New Storage System; New Operating System; New Application; Digital Object; New Display System. Annotation: Migrate Encoding Format
15. Technology Management - SDSC [diagram]: Old Storage System; New Operating System; New Application; Digital Object; Old Display System. Annotations: Wrap Storage System; Wrap Display System; Migrate Encoding Format
16. Accessing Archived Data
- Name transparency
  - Access data without knowing the file name
  - Map from attributes to a local file name
- Location transparency
  - Access data without knowing where it is stored
  - Map from global file name to local file name
- Collection transparency
  - Access data without knowing the collection attributes
  - Map from concept space to collection attributes
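Name and location transparency can be read as two lookups applied in sequence: attributes to a global (logical) name, then global name to a local file name. The following Python sketch assumes a tiny in-memory catalog; the paths, resource names, and the functions `find_logical_names`, `locate`, and `resolve` are all hypothetical. Collection transparency would add a third lookup from concepts to attributes, which is not shown here.

```python
# Metadata catalog: attributes registered for each global (logical) name.
CATALOG = {
    "/archive/climate/precip-1999": {"variable": "precipitation", "year": "1999"},
    "/archive/climate/precip-2000": {"variable": "precipitation", "year": "2000"},
    "/archive/climate/temp-1999": {"variable": "temperature", "year": "1999"},
}

# Replica table: the storage resource and local file name behind each logical name.
REPLICAS = {
    "/archive/climate/precip-1999": ("tape-archive", "/vault/07/000131.dat"),
    "/archive/climate/precip-2000": ("disk-cache", "/data/b2/000482.dat"),
    "/archive/climate/temp-1999": ("tape-archive", "/vault/07/000139.dat"),
}


def find_logical_names(**wanted: str) -> list[str]:
    """Name transparency: select entities by attribute values, not by file name."""
    return sorted(
        name
        for name, attrs in CATALOG.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


def locate(logical_name: str) -> tuple[str, str]:
    """Location transparency: map a global name to (resource, local file name)."""
    return REPLICAS[logical_name]


def resolve(**wanted: str) -> list[tuple[str, str]]:
    """Compose both lookups: attributes -> global name -> local file name."""
    return [locate(name) for name in find_logical_names(**wanted)]


print(resolve(variable="precipitation", year="1999"))
```

The caller supplies only attribute values; moving a file to another resource means updating `REPLICAS` without touching any query.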
17. Information Management - Logical Name Space
- Set of attributes to describe digital entities that are registered into the logical name space
  - SRB metadata - Unix file system semantics
  - Provenance metadata - Dublin Core
  - Resource metadata - User access control lists
  - Discipline metadata - User defined attributes
- Each digital entity may have unique attributes
18. Knowledge Management - Discovery across Collections
- Mapping from collection attributes to discipline concepts
- Make queries based on discipline concepts
- Characterization of relationships between attributes
  - Semantic / logical - cross-walks
  - Procedural / temporal - records management
  - Structural / spatial - GIS
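A semantic cross-walk of the kind listed here can be approximated as a per-collection table that maps local attribute names onto shared discipline concepts. The sketch below is illustrative only: the collection names, attribute names, records, and the `query_by_concept` function are invented for the example.

```python
# Cross-walk: each collection's own attribute names mapped to shared concepts.
CROSSWALK = {
    "hydrology_archive": {"author": "creator", "topic": "subject"},
    "survey_records": {"prepared_by": "creator", "keywords": "subject"},
}

# Two collections that describe their records with different attribute names.
COLLECTIONS = {
    "hydrology_archive": [
        {"id": "h1", "author": "Lee", "topic": "streamflow"},
    ],
    "survey_records": [
        {"id": "s1", "prepared_by": "Lee", "keywords": "land use"},
        {"id": "s2", "prepared_by": "Ortiz", "keywords": "streamflow"},
    ],
}


def query_by_concept(concept: str, value: str) -> list[tuple[str, str]]:
    """Return (collection, record id) pairs whose mapped attribute equals value."""
    hits = []
    for collection, records in COLLECTIONS.items():
        # Find which local attributes stand for the requested concept.
        local_attrs = [
            attr for attr, mapped in CROSSWALK[collection].items() if mapped == concept
        ]
        for record in records:
            if any(record.get(attr) == value for attr in local_attrs):
                hits.append((collection, record["id"]))
    return hits


print(query_by_concept("subject", "streamflow"))
```

The query is phrased once in terms of a concept such as `subject`, and the cross-walk translates it for each collection, so the user does not need to know that one collection says `topic` and the other `keywords`.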
19. Knowledge Based Data Grids [diagram]
- Columns: Ingest Services; Management; Access Services
- Rows: Knowledge; Information; Data
- Ingest labels: Relationships Between Concepts; Attributes, Semantics; Fields, Containers, Folders
- Management labels: Knowledge Repository for Rules; Information Repository; Storage (Replicas, Persistent IDs)
- Access labels: Knowledge or Topic-Based Query / Browse; Attribute-based Query; Feature-based Query
- Other labels: (Model-based Access); (Data Handling System - SRB); MCAT/HDF; Grids; XML DTD; SDLIP; XTM DTD; Rules - KQL
#3: The Data Intensive Computing Environment group at the San Diego Supercomputer Center has 16 full-time staff members and 6-10 associated graduate students, working on topics including:
- data handling systems (Wan, Rajasekar)
- collection management (Rajasekar)
- collection building (Kremenek, Zhu)
- information management (Baru, Ludäscher, Marciano)
- knowledge management (Ludäscher, Gupta)
- presentation systems & GIS systems (Zaslavsky)
- user interfaces (Cowart, Ludäscher, Marciano, Zaslavsky, Zhu)