Querying 
Cultural Heritage Data 
Dr. Barry Norton, 
Development Manager, ResearchSpace* 
* Funded by the Andrew W. Mellon Foundation 
* Hosted by the Curatorial Directorate, British Museum
Statements and Patterns 
 For one edge in a graph: 
bm-obj:EOC3130 --[crm:P52_has_current_owner]--> bm-id:the-british-museum
 We can declare/retrieve one (N)Triple: 
 Or write this in Turtle: 
@prefix crm: <http://erlangen-crm.org/current/> . 
@prefix bm-obj: <http://collection.britishmuseum.org/id/object/> . 
@prefix bm-id: <http://collection.britishmuseum.org/id/> . 
bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum .
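To make the link between the Turtle prefixes and the single (N)Triple form concrete, here is a minimal Python sketch that expands the prefixed names into full IRIs and prints one N-Triples line. The prefixes are copied from the Turtle above; the helper names `expand` and `to_ntriple` are illustrative only and not part of any library.

```python
# Prefixes copied from the Turtle declaration above.
PREFIXES = {
    "crm": "http://erlangen-crm.org/current/",
    "bm-obj": "http://collection.britishmuseum.org/id/object/",
    "bm-id": "http://collection.britishmuseum.org/id/",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'crm:P52_has_current_owner' to a full IRI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriple(subject, predicate, obj):
    """Serialise one triple of prefixed names as a single N-Triples line."""
    terms = ("<%s>" % expand(term) for term in (subject, predicate, obj))
    return " ".join(terms) + " ."


line = to_ntriple(
    "bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"
)
print(line)
```

Turtle is simply a more compact way of writing the same statement: the prefixes save repeating the long IRIs.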
 And check for it in SPARQL:
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>
PREFIX bm-id: <http://collection.britishmuseum.org/id/>
ASK {bm-obj:EOC3130 crm:P52_has_current_owner bm-id:the-british-museum}
 Result: true
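Conceptually, an ASK over a fully specified triple is a membership test on the graph. The sketch below models the graph as a Python set of (subject, predicate, object) tuples; this is an illustration of the idea, not how a triplestore is implemented, and the `ask` helper and the `ex:someone-else` identifier are hypothetical.

```python
# A graph modelled as a set of (subject, predicate, object) tuples.
graph = {
    ("bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"),
}


def ask(graph, subject, predicate, obj):
    """Return True if the exact triple is present, as ASK would for one triple."""
    return (subject, predicate, obj) in graph


answer = ask(
    graph, "bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"
)
print(answer)
```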
Statements and Patterns 
 For a set of edges: 
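bm-obj:EOC3130 --[crm:P51_has_former_or_current_owner]--> bm-id:the-british-museum
bm-obj:EOC3130 --[crm:P51_has_former_or_current_owner]--> ?
bm-obj:EOC3130 --[crm:P51_has_former_or_current_owner]--> ?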
 We can do the work on the client: 
 Or have the server do it by turning the 
triple into a triple pattern: 
bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner
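What the server does with a triple pattern can be sketched in a few lines of Python: every term starting with `?` is a variable, and each matching triple yields one set of bindings. The two `ex:former-owner-*` identifiers are hypothetical stand-ins for the unknown owners, and `match` is an illustrative helper rather than a real API.

```python
# Hypothetical data: the museum plus two placeholder former owners.
graph = {
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "bm-id:the-british-museum"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-1"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-2"),
}


def match(graph, pattern):
    """Yield one dict of variable bindings per triple that matches the pattern."""
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree on any reuse.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


pattern = ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "?owner")
owners = sorted(b["?owner"] for b in match(graph, pattern))
print(owners)
```

Doing this work on the client would mean downloading every triple first; sending the pattern lets the server return only the bindings.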
Exercise
Questions:
 Why is the answer different?
 Who are the two (other) one-time owners?
Solutions & Exercises
 Why is the answer different?
 Reasoning, part of the work done by the server (a triplestore), means that if two things are related by crm:P52_has_current_owner then they're also related by crm:P51_has_former_or_current_owner (P52 is a sub-property of P51)
 This is part of the work that the server (triplestore) can do for you
 Exercise: query for the (strictly) former owners
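The effect of this reasoning, and one way to think about the "strictly former" exercise, can be sketched in Python. This assumes a single sub-property rule and made-up data (the `ex:former-owner-*` identifiers are hypothetical); a real triplestore applies its own, much richer rule set.

```python
# One sub-property rule: every P52 statement also holds as a P51 statement.
SUBPROPERTY_OF = {
    "crm:P52_has_current_owner": "crm:P51_has_former_or_current_owner",
}

# Hypothetical asserted data: one current owner, two former owners.
asserted = {
    ("bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-1"),
    ("bm-obj:EOC3130", "crm:P51_has_former_or_current_owner", "ex:former-owner-2"),
}


def infer(triples):
    """Return the triples plus those implied by one level of sub-property rules."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUBPROPERTY_OF:
            inferred.add((s, SUBPROPERTY_OF[p], o))
    return inferred


def objects_of(triples, subject, predicate):
    """Collect every object linked to the subject by the predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


with_reasoning = infer(asserted)
all_owners = objects_of(
    with_reasoning, "bm-obj:EOC3130", "crm:P51_has_former_or_current_owner"
)
current = objects_of(with_reasoning, "bm-obj:EOC3130", "crm:P52_has_current_owner")
strictly_former = all_owners - current
print(sorted(strictly_former))
```

The set difference at the end mirrors the idea behind the exercise: take everything matched through P51 and exclude whatever is also matched through P52.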
Solution 1/2 
 Using specific server functionality:
Solution 2/2 
 In pure SPARQL:
Solutions & Exercises
 Who are the two (other) one-time owners?
 Since people and institutions (and places) are treated as concepts, the names of the former owners are attached using skos:prefLabel
 Exercise: if you didn't already, include the names in your query results
Solutions & Exercises
 If you didn't already, include the names in your query results:
Question: 
Why are we back at two answers?
Answer 
 Answer: 
 Just as we can add triples together to make a 
graph in RDF, so we can add triple patterns 
together in SPARQL to make a graph pattern 
 By default all triple patterns must be matched, 
but we can use the OPTIONAL {} pattern to 
allow variation 
 Exercise: 
 Query for the owners and their names, if they 
exist* 
* N.B. this bug in the BM data will be fixed soon
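The difference between requiring a label and making it OPTIONAL can be pictured as an inner join versus a left join. The Python sketch below assumes, purely for illustration, that the museum is the owner without a skos:prefLabel; the other identifiers and the label strings are hypothetical.

```python
# Owners matched by the first triple pattern (hypothetical identifiers).
owners = ["bm-id:the-british-museum", "ex:former-owner-1", "ex:former-owner-2"]

# skos:prefLabel values; the museum is assumed to have none in this sketch.
labels = {
    "ex:former-owner-1": "Hypothetical Owner One",
    "ex:former-owner-2": "Hypothetical Owner Two",
}

# Both patterns required: an owner without a label drops out of the results.
required = [(owner, labels[owner]) for owner in owners if owner in labels]

# Label pattern wrapped in OPTIONAL {}: every owner is kept, and the name
# is simply unbound (None here) when no label exists.
optional = [(owner, labels.get(owner)) for owner in owners]

print(len(required), len(optional))
```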
Solution
Exercise 
 Take a look here: 
 Exercise: copy and run this query
CSV Exercise 
 Type: 
 Observe that one can now paste the query 
including line breaks* 
 Type: 
* N.B. for now you should first replace the "s with 's and change the one occurrence of ecrm: with crm: (we'll fix this)
* N.B. currently the query needs to be simplified as the BBC data is not loaded (this will be available soon)
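As an alternative to typing the request by hand, the same kind of CSV request can be prepared from Python's standard library. This is a sketch under assumptions: the endpoint URL is a placeholder rather than the workshop's server, and it relies on the standard SPARQL protocol convention of passing the query in a `query` parameter and asking for `text/csv` through the Accept header. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: substitute the SPARQL endpoint you are working with.
ENDPOINT = "http://example.org/sparql"

# A multi-line query; the line breaks survive because the text is URL-encoded.
QUERY = """PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX bm-obj: <http://collection.britishmuseum.org/id/object/>
SELECT ?owner
WHERE { bm-obj:EOC3130 crm:P51_has_former_or_current_owner ?owner }
"""


def build_csv_request(endpoint, query):
    """Build (but do not send) a GET request asking for CSV query results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "text/csv"})


request = build_csv_request(ENDPOINT, QUERY)
print(request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the response body as text.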
Data Analysis 
 One can import this CSV file into many 
tools: 
 A spreadsheet can be a good way to carry out 
basic visualisations 
 A scripting environment like (i)python/scipy or 
R can allow more analysis before 
visualisation, but:
 both languages also have libraries to encapsulate interaction via SPARQL (rdflib/SPARQLWrapper and SPARQL/RCurl respectively)
 one should decide whether more analysis should 
first be carried out using SPARQL
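For the scripting route, Python's built-in `csv` module is enough to get started. The sketch below reads a small, made-up result set (the column names `object` and `material` and all values are hypothetical) and counts the rows for one material on the client side.

```python
import csv
import io

# Hypothetical CSV in the shape a SELECT ?object ?material query might return.
CSV_TEXT = """object,material
ex:obj1,basalt
ex:obj2,bronze
ex:obj3,basalt
"""

# DictReader uses the header row as keys, one dict per result row.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Client-side analysis: count the objects made from one material.
basalt_count = sum(1 for row in rows if row["material"] == "basalt")
print(len(rows), basalt_count)
```

With a real export, `io.StringIO(CSV_TEXT)` would be replaced by an open file handle. A count like this can also be pushed back into the query itself, which is the trade-off raised in the last bullet above.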
Exercise
 If you haven't so far, run the main query and click on one of the (HotW) 100 Objects (such as number 70, Hoa Hakananai'a Easter Island Statue)
 Choose a material and observe the query 
for other objects in this material 
 Adapt this query to count how many BM 
objects are made from basalt
Solution & Exercise 
 Exercise: Now count the top ten materials 
and the number of objects for each
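The shape of this exercise (group by material, count, sort descending, keep ten) can be previewed client-side with `collections.Counter`. The object and material values below are made up, and `top_materials` is an illustrative helper; in SPARQL the same steps correspond to GROUP BY, COUNT, ORDER BY DESC and LIMIT.

```python
from collections import Counter

# Hypothetical (object, material) pairs, e.g. loaded from a CSV export.
object_materials = [
    ("ex:obj1", "basalt"),
    ("ex:obj2", "bronze"),
    ("ex:obj3", "basalt"),
    ("ex:obj4", "gold"),
    ("ex:obj5", "bronze"),
    ("ex:obj6", "basalt"),
]


def top_materials(pairs, limit=10):
    """Count objects per material and return the most frequent ones first."""
    counts = Counter(material for _obj, material in pairs)
    return counts.most_common(limit)


top = top_materials(object_materials)
print(top)
```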
Solution
A Last Word 
 SPARQLing a native RDF database 
(often called a triplestore) is not the only 
option before defaulting to programming 
 A native graph database indexes the 
graph in a different way, supporting 
traversal-oriented queries
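To give a feel for what "traversal-oriented" means, the sketch below indexes triples by subject so that following edges outward from a node is cheap, then walks a bounded number of hops. It is a toy illustration, not how any particular graph database is built; the `ex:` identifiers and the `ex:located_in` predicate are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical data: an object, its owner, and where that owner is located.
triples = [
    ("bm-obj:EOC3130", "crm:P52_has_current_owner", "bm-id:the-british-museum"),
    ("bm-id:the-british-museum", "ex:located_in", "ex:london"),
    ("ex:london", "ex:located_in", "ex:england"),
]


def build_index(triples):
    """Index outgoing edges by subject: node -> list of (predicate, object)."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def reachable(index, start, max_hops):
    """Breadth-first walk returning every node within max_hops of start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for _predicate, neighbour in index.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


index = build_index(triples)
print(sorted(reachable(index, "bm-obj:EOC3130", 2)))
```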
