Open-Source Big Data
for Archives and Libraries:

An Artefactual Systems Case Study



          Peter Van Garderen,
         President/Systems Archivist

              MJ Suhonos,
     Systems Librarian/Software Engineer
Access2011 - Van Garderen: Occupy The Memory
Free Beer!
http://archivematica.org

http://ica-atom.org

http://dcb-gcn.canadiana.org


http://qubit-toolkit.org
open-source software for archives
and libraries

digital preservation consulting
services

Peter Van Garderen (MAS)
President / Systems Archivist
http://artefactual.com
@pjvangarderen




Evelyn McLellan (MAS), Systems Archivist
Jessica Bushey (MAS), Systems Archivist
Courtney Mumma (MAS/MLIS), Systems Archivist
MJ Suhonos (MLIS), Systems Librarian / Software Engineer
David Juhasz, Software Engineer
Austin Trask, Software Engineer
Jesús García Crespo, Software Engineer
Joseph Perry, Software Engineer
http://archivesspace.org ?
Artefactual clients and project sponsors

● International Council on Archives
● UNESCO Memory of the World
● UNESCO Archives
● United Nations Archives and Records Management Section
● The World Bank Group
● International Monetary Fund
● NATO Archives
● International Records Management Trust
● Rockefeller Archive Center
● Library and Archives Canada
● Canadian Council of Archives
● Canadiana
● National Archives of the Netherlands
● Dutch Ministry of the Interior and Kingdom Relations
● Dutch Institute for Archival Research and Education (Archiefschool)
● British Commonwealth Secretariat
● United Kingdom Department for International Development
● Direction des Archives de France
● United Arab Emirates Center for Documentation and Research
● Al-Dhakira Al-Arabiyya
● Association of Brazilian Archivists
● Botswana National Archives and Records Service
● Caribbean Regional Branch of the International Council on Archives
● American Institute of Architects
● British Columbia Museum and Archives
● British Columbia Ministry of Management Services
● Provincial Archives of Alberta
● Alberta Government Services Ministry
● Insurance Corporation of British Columbia
● Archives Association of British Columbia
● Archives Society of Alberta
● Archives Association of Ontario
● Association for Manitoba Archives
● University of British Columbia Library
● Simon Fraser University Archives
● Simon Fraser University Library
● University of Victoria Archives
● University of Toronto iSchool Institute
● University of Northern British Columbia Library and Archives
● University of Strathclyde Archives
● British Columbia Electronic Library Network
● University of British Columbia Irving K. Barber Learning Centre
● Diocese of New Westminster - Anglican Church of Canada Archives
● City of Vancouver Archives
● City of Toronto Corporate Information Management Services
● City of Rotterdam Archives
● City of Edmonton Archives
● Squamish Public Library
● West Vancouver Museum and Archives
● Whistler Museum and Archives
● Langley Centennial Museum and National Exhibition Centre
● Stirling Council Archives
Archivists & Librarians:
           Who are we?

Who are we in the face of Google, ebooks,
iTunes, Facebook, Flickr, Internet Archive,
Ancestry.com, History Channel, SharePoint,
Twitter...


Who are we in the face of our traditional
services, our traditional identity? Tight
budgets?
we're space
http://www.vancouverarchives.ca/2011/06/forming-a-new-archives/
we're Trusted Digital Repositories


          we're portals


           we're code
we're context
all creation is connected
in various ways
in a marvelous spatial balance.
Out of the formation of new entities
has emerged
                information
resulting in communication
and memory

Hugh Taylor. “The Archivist, the Letter, and the Spirit”
  Archivaria 43 Association of Canadian Archivists (1997) p6
  http://journals.sfu.ca/archivar
contextualize, authenticate, relate / bind, find

now → future

bitstream: stored, conserved, protected
storage media, storage device, storage driver, file system, operating system,
character encoding, fonts, file format, codec, compression, error correction,
packaging, decryption, metadata, application software, input / output devices,
user interface

Accessible? Usable? Authentic?
Accessible?
      In your scope,                                    Usable?
       I am content                                     Authentic?




      <metadata isa="love note to the future" />
now                                                      future
                           communication             wisdom

                                            memory        consciousness
Doctoral Candidate, Archival Science
we're the 99%
● We the people, helped by our archivists &
  librarians, should be in charge of:
    ● the space
    ● the portals
    ● the Trusted Digital Repositories
    ● the code
    ● the information
        ● the public record
        ● the social network
        ● personal archives
        ● big data
#occupy the memory
● We the people, helped by our archivists &
  librarians, should be in charge of:
    ● the space
    ● the portals
    ● the Trusted Digital Repositories
    ● the code
    ● the information

occupythememory.org
“They’ll never take
       our freedom!”

© 1995 Paramount Pictures & 20th Century Fox
See fair use rationale: http://en.wikipedia.org/wiki/File:Brave_mel.jpg
The open-source ecosystem

Open Source Software: Code, Knowledge, Community

Users: Code, Time, Money, Knowledge
   Lead institutions: Funding, Development
   All users: Bug reports, Enhancement requests, Code patches, Documentation, Promotion

Foundation or Steering Committee: Time, Money, Knowledge
   Governance, Coordination, Funding, Promotion

Service Providers: Code, Time, Money, Knowledge
   Development, Technical Support, Hosting, Training, Promotion
hosting
installation
integration
software development
tech support
training
system analysis
strategy

$125/hr
Annual maintenance program

Community Support
We will try to answer fairly straightforward questions from the open-source
community about installing and configuring our software. When we think a
particular query is beyond these free support parameters (too specific,
in-depth, or time-consuming), we will inform the user that it may be
necessary to address it as paid, commercial support.

Commercial Support
Our software is always free and open source, but with our optional hosting
and support services, the Artefactual development team will assist a client
with more in-depth questions to get the software installed and operating as
required, whether on one of our servers or their own.
Propel ORM    ZSL index
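The slide names the two back-end pieces only as labels: a Propel ORM and a ZSL index (presumably Zend Search Lucene). As a hypothetical sketch of the general pattern those labels suggest, assuming a relational database acts as the system of record while a separate full-text index is kept in step with it, the Python below uses the standard-library `sqlite3` module in place of Propel and a plain in-memory inverted index in place of ZSL. The `DescriptionStore` class, its `save`/`search` methods, and the `description` table are illustrative inventions, not Artefactual's code.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical sketch: sqlite3 stands in for the ORM-managed database
# (system of record); a dict of term -> id sets stands in for the
# full-text search index. Schema and names are illustrative only.

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class DescriptionStore:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE description (id INTEGER PRIMARY KEY, title TEXT, scope TEXT)"
        )
        # Inverted index: each term maps to the ids of records containing it.
        self.index = defaultdict(set)

    def save(self, title, scope):
        """Insert the record into the database, then add its terms to the index."""
        cur = self.db.execute(
            "INSERT INTO description (title, scope) VALUES (?, ?)", (title, scope)
        )
        record_id = cur.lastrowid
        for term in tokenize(title + " " + scope):
            self.index[term].add(record_id)
        return record_id

    def search(self, query):
        """Return titles of records containing every query term (AND search)."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect the id sets from the index; no table scan is needed.
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        if not ids:
            return []
        # Fetch the matching rows from the system of record by primary key.
        placeholders = ",".join("?" * len(ids))
        rows = self.db.execute(
            f"SELECT title FROM description WHERE id IN ({placeholders}) ORDER BY id",
            sorted(ids),
        )
        return [title for (title,) in rows]
```

For example, after `save("City council minutes", "Minutes of Vancouver city council meetings")` and `save("Olympic photographs", "Photographs from the Vancouver Olympic Games")`, a search for `"olympic photographs"` returns only the second title. The trade-off in this sketch is that every write must update two stores, and at millions of records an in-memory index like this one becomes a scaling concern in its own right.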
Big Data in Canadian
        Libraries and Archives: How Big?
●   MemoryBC.ca: <100,000 archival descriptions & authority records
●   Archeion.ca: <100,000 archival descriptions & authority records
●   Canadiana Portal: 1 million items, 4-5 million records
●   Toronto Public Library: 3 million MARC records
●   Library and Archives Canada: 3.5 million MARC records
●   ArchivesCanada.ca: with LAC & BNQ? (<5 million?)
●   City of Vancouver: >25TB of digital files from VANOC
Attribution
Title: Open Source Big Data for Archives and Libraries: An Artefactual Systems Case Study
Creator: Peter Van Garderen & MJ Suhonos
Publisher: Artefactual Systems Inc.
Date: October 20, 2011

The original content in this presentation is Copyright Artefactual Systems Inc. 2011. You may
freely re-use this content under the terms of the Creative Commons Attribution-NonCommercial-
ShareAlike 3.0 license.
