The document discusses interoperability and technical collaboration for web and social media archiving. It describes Heritrix, an archival crawler for web archiving, and newer approaches like headless browsers and archiving proxies that can execute JavaScript and support more capture tools. It also discusses leveraging APIs to reliably collect higher-fidelity social media data and aligning social media harvesting with web archiving. Key questions raised include how to build technical architectures and community frameworks to facilitate broad participation in web and social archiving, increase distributed capacity, and make archiving more inclusive.
Interoperability and Technical Collaboration for Web and Social Media Archiving
1. Interoperability and Technical Collaboration for Web and Social Media Archiving
Nicholas Taylor (@nullhandle)
Web Archiving Service Manager
Stanford University Libraries
DocNow Advisory Board Meeting
August 22, 2016
4. of ten years ago
Internet Archive: Stanford University Homepage
5. newer capture approaches
headless browsers
to prospect content only apparent by executing JavaScript
archiving proxies
to enable more, and more specialized, capture tools to write to WARC
leveraging APIs
to more reliably collect higher-fidelity data from major social media services
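
As an illustration of how API-collected data can end up in the same container format as crawled pages, here is a minimal sketch that serializes one JSON API response as a WARC/1.0 "resource" record using only the Python standard library. The function name build_warc_resource_record, the api.example.org URL, and the sample payload are hypothetical and not from the slides; a production harvester would more likely rely on an archiving proxy or a maintained WARC library than on hand-built records.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_warc_resource_record(
    target_uri: str,
    payload: bytes,
    content_type: str,
    when: Optional[datetime] = None,
) -> bytes:
    """Serialize one payload as a WARC/1.0 'resource' record (hypothetical helper).

    `when` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Version line, named header fields, then a blank line before the block.
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record is terminated by two CRLF pairs after the content block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


# Hypothetical API response standing in for data from a social media service.
api_response = json.dumps({"id": 1, "text": "hello"}).encode("utf-8")
record = build_warc_resource_record(
    "https://api.example.org/statuses/1.json",  # assumed, illustrative URL
    api_response,
    "application/json",
    datetime(2016, 8, 22, tzinfo=timezone.utc),
)
```

Because the result is an ordinary WARC record, it could in principle be appended to the same files that a crawler or archiving proxy writes, which is the interoperability point the slide is making.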
6. WARC in Social Feed Manager
Justin Littman: Aligning Social Media Harvesting and Web Harvesting
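
One practical consequence of storing social media harvests as WARC is that downstream tools can read the API payloads back with the same record-parsing logic used for web captures. The sketch below parses a single uncompressed WARC record and decodes its JSON block. It is an assumption-laden illustration, not Social Feed Manager's code: the sample record, the URL, and the parse_warc_record helper are hypothetical, and real WARC files (often gzip-compressed, with many records) are better handled by a dedicated WARC library.

```python
import json

# Hypothetical, hand-built sample record for illustration only.
SAMPLE_PAYLOAD = b'{"id": 1, "text": "hello"}'
SAMPLE_RECORD = b"".join([
    b"WARC/1.0\r\n",
    b"WARC-Type: resource\r\n",
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n",
    b"WARC-Date: 2016-08-22T00:00:00Z\r\n",
    b"WARC-Target-URI: https://api.example.org/statuses/1.json\r\n",
    b"Content-Type: application/json\r\n",
    b"Content-Length: " + str(len(SAMPLE_PAYLOAD)).encode("ascii") + b"\r\n",
    b"\r\n",
    SAMPLE_PAYLOAD,
    b"\r\n\r\n",
])


def parse_warc_record(raw: bytes):
    """Split one uncompressed WARC record into (version, headers, block)."""
    # The header section ends at the first blank line.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URIs in values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the exact size of the block in bytes.
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


version, headers, block = parse_warc_record(SAMPLE_RECORD)
post = json.loads(block)
```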