Interoperability and Technical
Collaboration for Web and Social
Media Archiving
Nicholas Taylor (@nullhandle)
Web Archiving Service Manager
Stanford University Libraries
DocNow Advisory Board Meeting
August 22, 2016
Heritrix archival crawler
Screenshot of Heritrix 1.8.0 admin console... by Frank McCown under MPL 1.1
Heritrix is great for archiving the Web of ten years ago
Internet Archive: Stanford University Homepage
newer capture approaches
- headless browsers
  - to prospect content only apparent by executing JavaScript
- archiving proxies
  - to enable more, and more specialized, capture tools to write to WARC
- leveraging APIs
  - to more reliably collect higher-fidelity data from major social media services
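To make "write to WARC" concrete, here is a minimal sketch of how a payload collected from an API could be wrapped as a single WARC/1.0 `resource` record, using only the Python standard library. The function name `build_warc_record` and the endpoint URL are hypothetical, chosen for illustration. The record is simplified: it is uncompressed, carries no digests, and has no paired request record. This is not how any particular tool does it; in practice an archiving proxy or a dedicated WARC library would handle these details.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Serialize one simplified, uncompressed WARC/1.0 'resource' record."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers)
    # Layout: version line and headers, a blank line, the payload,
    # then two CRLFs that terminate the record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


record = build_warc_record(
    "https://api.example.org/statuses/1",  # hypothetical API endpoint
    b'{"id": 1, "text": "hello"}',
    "application/json",
)
```

Because every tool that emits records in this shape produces the same container format, the output of a crawler, a browser-based capture tool, and an API harvester can be stored and replayed together.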
WARC in Social Feed Manager
Justin Littman: Aligning Social Media Harvesting and Web Harvesting
web archiving system APIs (WASAPI)
technical architectures to facilitate contributions by a broad community?
community frameworks to enable broad participation in shaping technologies?
how to build more, and more distributed, capacity?
how to make web and social media archiving more inclusive?