The document summarizes work to publish metadata about a live music archive collection as linked data. Key points:
- The metadata from the Internet Archive's Live Music Archive of community-contributed live recordings is published as linked data using semantic technologies like RDF.
- The data is aligned with external resources like MusicBrainz, Geonames, and DBpedia to provide additional context.
- A SPARQL endpoint allows querying the structured data to extract interesting subcollections, such as performances by artists in their home towns.
Linked Data Publication of Live Music Archives
1. Hello Cleveland!
Linked Data Publication of Live Music Archives
Sean Bechhofer*, Kevin Page+, David De Roure+
*School of Computer Science, University of Manchester
+Oxford eResearch Centre, University of Oxford
@seanbechhofer
DMRN+7, QMUL, December 2012
2. The Proposition
- Publication of structured metadata describing an audio collection
- Links to external resources provide additional context and information
- Rich query to allow the extraction of interesting subcollections
3. The Players
The Internet Archive Live Music Archive
Community contributed live audio recordings
Semantic Technologies
RDF, Ontologies, SPARQL and Linked Data
Additional resources
Artist DBs, Geographical Information, Venue information, etc.
Some Ruby scripts.....
4. The etree Collection
Internet Archive Live Music Archive
Community contributed live performance recordings
Legal bootlegs
Approx 4,000 artists,
100,000 performances
Why is it interesting?
Audio available in various formats
mp3, ogg, shn, flac....
Multiple performances by artists
Cover versions
5. Semantic Technologies
Semantic Technologies aim to provide structured, machine-readable representations of content
Unified frameworks for (meta)data
RDF: Resource Description Framework
Triple based representation of information
OWL/SKOS: Ontologies & Vocabularies for content description
Shared vocabularies plus definitional capabilities
SPARQL
A query language for RDF data
A generic API
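As a rough illustration of the triple model and of pattern-based querying, the sketch below stores a few triples as Python tuples and matches one triple pattern against them, which is the basic building block of a SPARQL query. The `ex:` identifiers and property names are invented for illustration and are not taken from the etree data model.

```python
# Minimal sketch: RDF-style triples as (subject, predicate, object) tuples.
# All "ex:" names are hypothetical placeholders, not the real etree vocabulary.
triples = {
    ("ex:performance1", "ex:performer", "ex:artistA"),
    ("ex:performance1", "ex:venue", "ex:venueX"),
    ("ex:performance2", "ex:performer", "ex:artistA"),
    ("ex:performance2", "ex:venue", "ex:venueY"),
    ("ex:performance3", "ex:performer", "ex:artistB"),
}


def match(pattern, data):
    """Return triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# Roughly: SELECT ?perf WHERE { ?perf ex:performer ex:artistA }
performances_by_a = [s for s, _, _ in match((None, "ex:performer", "ex:artistA"), triples)]
```

A real SPARQL engine joins many such patterns and adds filters, ordering and aggregation; this only shows the single-pattern case.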
6. Semantic Technologies
RDF
Triple Based Representation
Common Data Model
Identification via URIs
Easy Integration
Graph Merging
Query via SPARQL
A flexible, generic API
OWL/SKOS
Shared Vocabularies for content description
Facilitating interoperation and exchange
Everybody talks the same language
OWL allows for rich expressions and definitions
SKOS supports simpler thesauri/controlled vocabularies
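The "Easy Integration" and "Graph Merging" points can be sketched as a set union: because both graphs identify things with URIs, combining them needs no schema mapping step. This is a simplified sketch under the assumption that neither graph contains blank nodes, which need extra care in a real merge; all URIs are placeholders.

```python
# Two independently produced graphs, each a set of (subject, predicate, object) triples.
# The example.org URIs are hypothetical placeholders.
archive_graph = {
    ("http://example.org/artist/1", "http://example.org/vocab/name", "Artist A"),
    ("http://example.org/artist/1", "http://example.org/vocab/performedAt",
     "http://example.org/venue/9"),
}
gazetteer_graph = {
    ("http://example.org/venue/9", "http://example.org/vocab/locatedIn", "Cleveland"),
    # The same fact stated by a second source:
    ("http://example.org/artist/1", "http://example.org/vocab/name", "Artist A"),
}

# Merging is set union: duplicate triples collapse, and shared URIs join the graphs.
merged = archive_graph | gazetteer_graph


def objects(graph, subject, predicate):
    """All objects for a given subject and predicate, in sorted order."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Follow a path that crosses the two source graphs: artist -> venue -> place.
venue = objects(merged, "http://example.org/artist/1",
                "http://example.org/vocab/performedAt")[0]
place = objects(merged, venue, "http://example.org/vocab/locatedIn")
```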
7. Linked Data
A set of common principles for data publication
1. Use URIs for identification
2. Use HTTP URIs (that will dereference)
3. Return useful information when dereferenced
4. Include links in that information
Common infrastructure facilitates construction of applications.
Use of content negotiation to supply appropriate
representations
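Content negotiation can be sketched as a function that reads the client's `Accept` header and picks one of the representations the server can produce. The list of available media types and the fallback behaviour are assumptions made for this example, and the parsing is deliberately simplified (for instance, partial wildcards such as `text/*` are not handled).

```python
# Hypothetical representations a linked data server might offer for one resource.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so ties keep the client's original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(header, available=AVAILABLE):
    """Pick the representation to serve; fall back to the first available type."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return available[0]
```

With this sketch, a browser sending `text/html` gets a human-readable page, while a client sending `text/turtle` gets RDF for the same URI. A stricter server might answer `406 Not Acceptable` instead of falling back.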
8. Linked Data Resources
MusicBrainz
RDF conversions of MusicBrainz data
Geonames
Information about locations
DBpedia
Structured representation of Wikipedia content
BBC
Programme information, artist information
9. Data mangling
Download of etree metadata files
Simple data conversion
XML to RDF
etree data model
Alignments
String matching plus bespoke
methods for locations
Explicit capture of alignments
Publication Infrastructure
Fuseki server + Pubby front end
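The "XML to RDF" step might look roughly like the sketch below, which turns each field of one metadata record into an N-Triples line. The slides mention Ruby scripts; Python is used here purely for illustration. The XML field names, the `example.org` namespaces and the one-triple-per-element mapping are all assumptions, not the actual etree data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the real etree metadata files may use other fields.
SAMPLE_XML = """
<metadata>
  <identifier>example-show-2012</identifier>
  <creator>Example Band</creator>
  <venue>Example Hall</venue>
  <date>2012-12-18</date>
</metadata>
"""

BASE = "http://example.org/performance/"  # placeholder namespace for performances
VOCAB = "http://example.org/vocab/"       # placeholder vocabulary namespace


def escape_literal(text):
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def xml_to_ntriples(xml_text):
    """Turn each non-empty child element of the record into one N-Triples line."""
    root = ET.fromstring(xml_text.strip())
    subject = f"<{BASE}{root.findtext('identifier').strip()}>"
    lines = []
    for child in root:
        value = (child.text or "").strip()
        if child.tag == "identifier" or not value:
            continue
        lines.append(f'{subject} <{VOCAB}{child.tag}> "{escape_literal(value)}" .')
    return lines


ntriples = xml_to_ntriples(SAMPLE_XML)
```

The resulting lines could then be loaded into a triple store; a fuller conversion would mint URIs for artists and venues rather than leaving them as plain string literals.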
11. Data Alignment
MusicBrainz
Artist alignment via simple name queries
Geographical Locations
Query against Geonames
Query against last.fm
Combination of string matching and lat/long
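One way to combine string matching with lat/long, sketched under assumptions: accept a candidate only if its name is similar enough and it lies within some distance of the recorded location. The thresholds, record layout and candidate identifiers are hypothetical, the coordinates are approximate, and this is not claimed to be the method used in the project.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def best_match(place, candidates, min_similarity=0.8, max_km=50.0):
    """Return the id of the nearest candidate that passes both tests, or None."""
    scored = []
    for cand in candidates:
        sim = name_similarity(place["name"], cand["name"])
        dist = haversine_km(place["lat"], place["lon"], cand["lat"], cand["lon"])
        if sim >= min_similarity and dist <= max_km:
            scored.append((dist, -sim, cand["id"]))
    return min(scored)[2] if scored else None


# Two places with the same name; only the coordinates tell them apart.
candidates = [
    {"id": "cleveland-tn", "name": "Cleveland", "lat": 35.16, "lon": -84.88},
    {"id": "cleveland-oh", "name": "Cleveland", "lat": 41.50, "lon": -81.69},
]
recorded = {"name": "cleveland", "lat": 41.48, "lon": -81.70}
chosen = best_match(recorded, candidates)
```

Name matching alone cannot separate the two Clevelands, which is why the distance check is needed here.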
12. Layering
Alignments are captured in an additional layer of data on top of
the underlying source facts
Preserving original metadata
Allows clients to make their own judgements
sameAs
Preserves subjectivity
Explicitly exposing the source of the mappings
Use of Provenance vocabularies
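The layering idea can be sketched by keeping source facts and alignments in separate collections and recording how each alignment was produced, so a client can decide which mappings to accept. The method labels, the `ex:` and `mb:` identifiers and the `view` helper are all hypothetical; a real deployment would express the provenance in RDF using a provenance vocabulary.

```python
# Layer 1: facts converted directly from the source metadata (never modified).
source_layer = {
    ("ex:artist1", "ex:name", "Example Band"),
}

# Layer 2: alignments, each recorded together with the method that produced it.
# Method labels and target identifiers are illustrative only.
alignment_layer = [
    {"triple": ("ex:artist1", "owl:sameAs", "mb:artist-123"), "method": "name-query"},
    {"triple": ("ex:artist1", "owl:sameAs", "mb:artist-456"), "method": "manual-check"},
]


def view(trusted_methods):
    """Build the graph a client sees: source facts plus only the alignments it trusts."""
    accepted = {a["triple"] for a in alignment_layer if a["method"] in trusted_methods}
    return source_layer | accepted


cautious = view({"manual-check"})
everything = view({"name-query", "manual-check"})
```

Because the source layer is never rewritten, rejecting or correcting an alignment leaves the original metadata untouched.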
29. Discussion
So far entirely metadata based
No processing of underlying audio
Alignment is a little messy
But has to be automated
Dataset itself is an interesting artefact
Contrasts with some other LD activities.
Is this actually useful?
Do artists really get a better reception when
they play in their home town?
30. The Future
Better alignment
Beyond simple string queries
More alignment
Adding in, e.g. MusicBrainz track/work resources
Other collections?
Modelling questions
Characterising Alignments
Audio Fingerprinting
Identifying further track level matches
Crowdsourcing corrections
Extracting subcollections
What would you want??