The Research and Education Space
a pathway to bring our cultural heritage
(including the BBC archive) to life
Dr Chiara Del Vescovo
Data Architect at BBC
Vision
Web-like
Web-based
Interlinking
heterogeneous
resources
Capturing
semantic
interrelations
Reliable,
provably
cleared for
education
Linked Open Data
users / developers
A pathway
BL
BM
BFI
Tate
V&A

BBC
aggregating
platform
RES (BBC, Jisc, BUFVC)
Core Platform: Acropolis
Project RES: Technical Approach
1. The crawler fetches data via HTTP from published sources. Once retrieved, it is indexed by the full-text store and passed to the aggregation engine for evaluation.
2. The results of the aggregation engine's evaluation process are stored in the aggregate store, which contains minimal browse information and information about the similarity of entities.
3. The public face of the core platform is an extremely basic browsing interface (which presents the data in tabular form to aid application developers), and read-write RESTful APIs.
4. Applications may use the APIs to locate information about aggregated entities, and also to store annotations and activity data.
5. Each component employs standard protocols and formats. For example, we can make use of any capable quad-store as our aggregate store.
Linked data crawler: Anansi
Aggregation engine: Spindle
Full-text store
Aggregate store
Minimal browse interface & APIs: Quilt
Activity store
users / developers
Acropolis
(index!)
BL
BM
BFI
Tate
V&A

BBC
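Step 2 above, where the aggregation engine records the similarity of entities, can be illustrated with a toy sketch. This is a hypothetical miniature, not the RES codebase: it merely groups crawled entities joined by owl:sameAs links, the kind of co-reference information an aggregate store would hold. All the example URIs are made up.

```python
# Hypothetical sketch (not RES code): group crawled entities that are
# connected by owl:sameAs, as an aggregation engine might before writing
# co-reference sets to the aggregate store.

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def aggregate(triples):
    """Union-find over subjects and objects joined by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    # Collect each co-reference set under its representative.
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

# Toy data: one artwork described by two hypothetical publishers and a hub.
triples = [
    ("http://example.tate.org/work/1", SAME_AS, "http://example.bm.org/obj/9"),
    ("http://example.bm.org/obj/9", SAME_AS, "http://example.dbpedia.org/resource/X"),
]
print(aggregate(triples))
```

The three URIs end up in a single co-reference set, which is all the "minimal browse information" an index needs to answer "which published resources describe this entity?".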
RES (BBC, Jisc, BUFVC)
Core Platform: Acropolis
informed by
users / developers
planned pilots
BL, BM, BFI, Tate, V&A, …, BBC
beta.acropolis.org.uk
What I do
(with my colleague Alex)
1. Devise a publishing scheme to determine URIs
2. Translate the original metadata into RDF
3. Discover and reconcile links with hubs (e.g., LoC, Geonames, DBpedia)
4. Make the existing schema explicit as a local ontology
5. Match the ontology onto well-established ontologies (e.g., DCMI, FOAF, SKOS, CIDOC-CRM)
6. Advise on how to express machine-readable licences, for both resources and metadata
7. Provide technical support to publish LOD
BL
BM
BFI
Tate
V&A

BBC
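Steps 1, 2 and 6 of the list above can be sketched in miniature. The base URI, record fields and helper function below are purely illustrative assumptions, not any partner's actual publishing scheme; only the Dublin Core term URIs are real.

```python
# Hypothetical sketch of steps 1, 2 and 6: mint a URI for a catalogue
# record, translate its fields into RDF (Dublin Core terms), and attach
# a machine-readable licence. BASE and the record shape are assumptions.

DCT = "http://purl.org/dc/terms/"
BASE = "http://example.org/things/"  # assumed URI publishing scheme

def record_to_ntriples(record):
    subject = f"<{BASE}{record['id']}>"  # step 1: deterministic URI
    lines = [
        f'{subject} <{DCT}title> "{record["title"]}" .',      # step 2: metadata as RDF
        f'{subject} <{DCT}license> <{record["license"]}> .',  # step 6: machine-readable licence
    ]
    return "\n".join(lines)

record = {
    "id": "painting-42",
    "title": "An Example Painting",
    "license": "http://creativecommons.org/licenses/by/4.0/",
}
print(record_to_ntriples(record))
```

In practice the translation would go through a proper RDF library rather than string formatting, but the shape of the work is the same: stable URIs, mapped terms, explicit licences.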
DBPedialite
British Museum
DBPedia
Europeana
• a general Data Model (EDM)
• collection holders are responsible for fitting their resources and metadata into EDM
British Library
Extreme cases
Challenges
Stakeholders go quiet!
1. Which metadata?
• Currently, resource metadata is mostly oriented towards physical proximity
i.e., indexes reflect similarity of authors' surnames, broad subject, format, media, etc.
• Heterogeneous platforms and data models
incompatibility; transformations needed
• Even when RDF is used, there is a proliferation of terms, vocabularies and formats adopted
little (if any) validation
2. Linking
• Systems that do not use RDF do not allow collection holders to express their knowledge as they wish
underspecified knowledge
• Even when RDF is used, information is often provided as literals rather than as links to URIs
ad hoc solutions, unavailable in a machine-readable format
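The literal-versus-URI problem above can be shown with a tiny sketch. The lookup table is an assumed stand-in for querying a real hub such as Geonames, and `reconcile` is a hypothetical helper, not an existing API.

```python
# Sketch of reconciliation (hypothetical): replace a string literal with
# a URI where an authority hub already has an identifier for the value.

PLACE_URIS = {  # tiny assumed stand-in for a hub lookup (e.g., Geonames)
    "London": "http://sws.geonames.org/2643743/",
}

def reconcile(value):
    """Return a hub URI for the literal if one is known, else keep the literal."""
    return PLACE_URIS.get(value, value)

print(reconcile("London"))    # becomes a link to a URI
print(reconcile("Atlantis"))  # stays a literal: underspecified knowledge
```

Anything the hub does not know stays a plain literal, which is exactly the "underspecified knowledge" that blocks machine-readable linking.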
3. Usability
• Reliability
• Lack of tools
developers have little contact with collection holders
• Licensing issues
resource licensing (not always explicit)
metadata licensing
users need to be aware of what that means
(note that in education things are slightly easier: blanket licensing etc.)
Interested?
• Get in touch!
• chiara.delvescovo@bbc.co.uk
• alex.tucker@bbc.co.uk
• Newly advertised position: Junior Data Architect
careershub.bbc.co.uk