Hello Cleveland!



Linked Data Publication of Live Music Archives
      Sean Bechhofer*, Kevin Page+, David De Roure+
    *School of Computer Science, University of Manchester
       +Oxford eResearch Centre, University of Oxford


                      @seanbechhofer

             DMRN+7, QMUL, December 2012
The Proposition
• Publication of structured metadata describing an audio
  collection

• Links to external resources provide additional context
  and information

• Rich query to allow the extraction of interesting
  subcollections
The Players
 The Internet Archive Live Music Archive
  
      Community contributed live audio recordings


 Semantic Technologies
  
      RDF, Ontologies, SPARQL and Linked Data


 Additional resources
  
      Artist DBs, Geographical Information, Venue information, etc.

 Some Ruby scripts...
The etree Collection
 Internet Archive Live Music Archive
 Community contributed live performance recordings
      Legal bootlegs
 Approx 4,000 artists, 100,000 performances
 Why is it interesting?
      Audio available in various formats
            mp3, ogg, shn, flac...
      Multiple performances by artists
      Cover versions
Semantic Technologies
 Semantic Technologies aim to provide structured, machine-readable
  representations of content
      Unified frameworks for (meta)data

 RDF: Resource Description Framework
      Triple based representation of information
 OWL/SKOS: Ontologies & Vocabularies for content description
      Shared vocabularies plus definitional capabilities
 SPARQL
      A query language for RDF data
      A generic API
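To make the "triple based representation" and "generic API" points concrete, here is a minimal sketch in Python. It is not the etree data model: every URI uses a made-up example.org namespace, and the `match` helper is a hypothetical stand-in for a single SPARQL triple pattern.

```python
# A triple is (subject, predicate, object). All URIs below are invented
# for illustration and are not identifiers from the etree dataset.
EX = "http://example.org/"

graph = {
    (EX + "performance/1", EX + "performer", EX + "artist/a"),
    (EX + "performance/1", EX + "venue", EX + "venue/v"),
    (EX + "performance/2", EX + "performer", EX + "artist/a"),
    (EX + "artist/a", EX + "name", "Example Band"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All performances by artist/a, in the spirit of a one-pattern query.
performances = [s for s, _, _ in match(graph, p=EX + "performer", o=EX + "artist/a")]
print(performances)
```

The same `match` call works for any question that can be phrased as a pattern, which is the sense in which a query language can serve as a generic API.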
Semantic Technologies
 RDF
      Triple Based Representation
      Common Data Model
      Identification via URIs
      Easy Integration
            Graph Merging
      Query via SPARQL
            A flexible, generic API

 OWL/SKOS
      Shared Vocabularies for content description
            Facilitating interoperation and exchange
            Everybody talks the same language
      OWL allows for rich expressions and definitions
      SKOS supports simpler thesauri/controlled vocabularies
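The "easy integration via graph merging" point can be sketched as follows, assuming graphs are plain Python sets of triples. The URIs, property names, and the `describe` helper are all hypothetical and chosen only to show the idea.

```python
# Two independently published graphs, each a set of
# (subject, predicate, object) triples. All URIs are hypothetical.
EX = "http://example.org/"

recordings = {
    (EX + "performance/1", EX + "performer", EX + "artist/a"),
    (EX + "performance/1", EX + "date", "2000-01-01"),
}

artist_info = {
    (EX + "artist/a", EX + "name", "Example Band"),
    (EX + "artist/a", EX + "origin", EX + "place/p"),
}

# Merging is set union: shared URIs join the graphs, no schema mapping needed.
merged = recordings | artist_info


def describe(triples, uri):
    """Collect predicate/object pairs for one resource."""
    return {p: o for s, p, o in triples if s == uri}


# Follow the performer link from the recording into the artist data.
performer = describe(merged, EX + "performance/1")[EX + "performer"]
print(describe(merged, performer)[EX + "name"])
```

Because both graphs identify the artist with the same URI, the merged graph can answer a question that neither source could answer alone.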
Linked Data
 A set of common principles for data publication

    1.   Use URIs for identification
    2.   Use HTTP URIs (that will dereference)
    3.   Return useful information when dereferenced
    4.   Include links in that information

 Common infrastructure facilitates construction of applications.
 Use of content negotiation to supply appropriate
  representations

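Content negotiation can be illustrated with a small server-side sketch: given the client's HTTP `Accept` header, pick which representation of a resource to return. This is a simplified illustration, not the behaviour of any particular Linked Data server. The function names are hypothetical, media-type wildcards other than `*/*` are ignored, and a real server might answer 406 instead of falling back to a default.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted is stable, so equal q values keep the client's order
    return sorted(prefs, key=lambda mq: mq[1], reverse=True)


def negotiate(header, available, default="text/html"):
    """Pick the best available representation for the Accept header."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return default


AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]
print(negotiate("text/turtle, text/html;q=0.5", AVAILABLE))
```

A browser asking for HTML and an RDF client asking for Turtle can therefore dereference the same URI and each receive a suitable document.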
Linked Data Resources
 MusicBrainz
  
      RDF conversions of MusicBrainz data
 Geonames
  
      Information about locations
 DBpedia
  
      Structured representation of Wikipedia content
 BBC
  
      Programme information, artist information




Data mangling
 Download of etree metadata files
 Simple data conversion
      XML to RDF
      etree data model
 Alignments
      String matching plus bespoke methods for locations
      Explicit capture of alignments
 Publication Infrastructure
      Fuseki server + Pubby front end
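The "XML to RDF" step might look roughly like the sketch below. The talk mentions Ruby scripts; this is an independent illustration in Python, not a port of them. The sample record, its element names, and the example.org namespaces are assumptions, and the output is a flat set of N-Triples literals, not the richer etree data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the element names and the example.org
# namespaces are assumptions, not the actual file layout or vocabulary.
SAMPLE = """<metadata>
  <identifier>band2000-01-01</identifier>
  <creator>Example Band</creator>
  <date>2000-01-01</date>
  <venue>Example Hall</venue>
</metadata>"""

BASE = "http://example.org/performance/"
VOCAB = "http://example.org/vocab/"


def escape(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def xml_to_ntriples(xml_text):
    """Turn each non-empty child element into one triple about the item."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext("identifier")
    if identifier is None:
        raise ValueError("record has no identifier")
    subject = "<" + BASE + identifier.strip() + ">"
    lines = []
    for child in root:
        value = (child.text or "").strip()
        if child.tag == "identifier" or not value:
            continue
        lines.append('%s <%s%s> "%s" .' % (subject, VOCAB, child.tag, escape(value)))
    return lines


for line in xml_to_ntriples(SAMPLE):
    print(line)
```

Keeping the conversion this literal leaves interpretation, such as deciding which real-world artist or place a string refers to, to the separate alignment step.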
Modelling




Music Ontology
Event Ontology
Data Alignment
 MusicBrainz
  
      Artist alignment via simple name queries


 Geographical Locations
  
      Query against Geonames
  
      Query against last.fm
  
      Combination of string matching and lat/long




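One plausible reading of "combination of string matching and lat/long" is sketched below: accept a gazetteer candidate only if its name is similar enough and it lies near a coordinate hint from a second source. This is an assumption about the general approach, not the authors' actual method. The thresholds, function names, and candidate tuples are hypothetical, and the coordinates are approximate.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1] from matching subsequences."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def align_location(name, hint, candidates, min_name=0.8, max_km=50.0):
    """Return the best candidate that is similarly named and near the hint.

    hint is a (lat, lon) pair from a second source, or None if unavailable.
    candidates are (label, lat, lon) tuples from a gazetteer lookup.
    """
    best, best_score = None, 0.0
    for label, lat, lon in candidates:
        score = name_similarity(name, label)
        if score < min_name:
            continue
        if hint is not None and haversine_km(hint[0], hint[1], lat, lon) > max_km:
            continue
        if score > best_score:
            best, best_score = (label, lat, lon), score
    return best


# Approximate coordinates, for illustration only.
candidates = [
    ("Cleveland", 41.4995, -81.6954),
    ("Cleveland", 35.1595, -84.8766),
    ("Cleveland Heights", 41.5201, -81.5562),
]
print(align_location("cleveland", (41.5, -81.7), candidates))
```

The name alone cannot separate the two places called Cleveland; the coordinate hint does, which is why a purely string-based alignment is "a little messy".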
Layering
 Alignments are captured in an additional layer of data on top of
  the underlying source facts
 Preserving original metadata
      Allows clients to make their own judgements
      Preserves subjectivity
 Explicitly exposing the source of the mappings
      Use of Provenance vocabularies

 [Diagram label: sameAs]
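The layering idea can be sketched as two separate collections: untouched source facts, and alignment records that carry their own provenance. Everything here is hypothetical, including the `Alignment` fields, the confidence scores, and the example.org `sameAs` predicate, which stands in for whatever linking property and provenance vocabulary a real deployment would use.

```python
from dataclasses import dataclass

EX = "http://example.org/"

# Layer 1: source facts as published, never modified. URIs are hypothetical.
source_facts = {
    (EX + "etree/artist/a", EX + "name", "Example Band"),
}


@dataclass(frozen=True)
class Alignment:
    """One proposed link, kept apart from the source facts, with provenance."""
    subject: str
    target: str
    method: str        # how the link was produced
    confidence: float  # score reported by that method


# Layer 2: alignments, each recording where it came from.
alignments = [
    Alignment(EX + "etree/artist/a", EX + "mb/artist/1", "name-query", 0.95),
    Alignment(EX + "etree/artist/a", EX + "mb/artist/2", "name-query", 0.60),
]


def view(facts, links, min_confidence):
    """Build a client-specific graph: source facts plus only trusted links."""
    accepted = {
        (link.subject, EX + "sameAs", link.target)
        for link in links
        if link.confidence >= min_confidence
    }
    return facts | accepted


strict = view(source_facts, alignments, 0.9)
lenient = view(source_facts, alignments, 0.5)
print(len(strict), len(lenient))
```

Two clients with different thresholds see different merged graphs over the same source layer, which is one way to let clients make their own judgements.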
Modelling



Similarity Ontology




Big Picture




Discussion
 So far entirely metadata based
  
      No processing of underlying audio
 Alignment is a little messy
  
      But has to be automated
 Dataset itself is an interesting artefact
  
      Contrasts with some other LD activities.
 Is this actually useful?


             Do artists really get a better reception when
                    they play in their home town?

The Future
 Better alignment
  
      Beyond simple string queries
 More alignment
  
      Adding in, e.g. MusicBrainz track/work resources
  
      Other collections?
  
      Modelling questions
 Characterising Alignments
 Audio Fingerprinting
  
      Identifying further track level matches
 Crowdsourcing corrections
 Extracting subcollections
  
      What would you want??
Thanks! You've been a
   great audience!




http://etree.linkedmusic.org