Wikidata for Libraries and Archives
Emw
Exploring Wikidata and the Semantic Web for Libraries
Metropolitan New York Library Council
2015-03-16
Wikidata is a free linked database that can be read and edited by humans and machines.
Wikidata's goals
• Centralize interwiki links
• Centralize Wikipedia infoboxes
• Provide an interface for rich queries
• Structure the sum of all human knowledge
What you'll learn from this talk
• How to edit Wikidata
• Authority control properties
• Ontology
• Wikidata vocabulary
• Querying and other tools
• Wikidata & Commons plans
Elements of a Wikidata statement
Example: New York City (Q60)
Items and properties

Each item and property has its own page
• Items
  • Represent subjects: Douglas Adams, Challenger disaster
  • Have identifiers like Q42, Q921090
  • 13,745,153 items
• Properties
  • Represent attribute names: occupation, cause of
  • Have identifiers like P106, P828
  • 1,429 properties
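The item/property naming convention can be checked mechanically. Below is a minimal Python sketch, assuming only the shape shown above ("Q" or "P" followed by digits); the function name parse_entity_id is hypothetical, and it does not check whether the entity actually exists on Wikidata.

```python
import re
from typing import Tuple

# Hypothetical helper: item IDs are "Q" followed by digits, property IDs "P" followed by digits.
_ENTITY_ID = re.compile(r"([QP])([1-9][0-9]*)")

def parse_entity_id(entity_id: str) -> Tuple[str, int]:
    """Return ("item", n) for IDs like "Q42" and ("property", n) for IDs like "P106"."""
    match = _ENTITY_ID.fullmatch(entity_id.strip().upper())
    if match is None:
        raise ValueError(f"not an item or property ID: {entity_id!r}")
    kind = "item" if match.group(1) == "Q" else "property"
    return kind, int(match.group(2))

for eid in ["Q42", "Q921090", "P106", "P828"]:
    print(eid, parse_entity_id(eid))
```

Only the prefix letter decides the kind; the number itself carries no meaning beyond being unique within its namespace.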
Statements and claims
• Claims
  • Claims are triples
  • Formally: subject, predicate, object
  • In Wikidata: item, property, value
  • Example: Douglas Adams, occupation, author
• Statements
  • A claim is only part of a statement
  • Statements also include:
    • References
    • Ranks
Qualifiers, ranks, references
• Qualifiers
  • Qualifiers are properties used on claims rather than items
  • Yonkers population 12,733, qualified with point in time (P585) 1860
• Ranks
  • Preferred, normal, deprecated
  • Useful to mark outdated claims
• References
  • Source of claim; provenance
  • ... stated in (P248) 1860 United States Census
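To make the parts of a statement concrete, here is a simplified Python sketch of the Yonkers example. It is a hypothetical model, not Wikidata's actual data format: real statements also carry statement IDs, typed values and snak types. The item is given by its label instead of its Q-ID, and the population property P1082 does not appear on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Simplified, hypothetical model of a statement; real Wikidata statements also carry
# statement IDs, typed data values and snak types, which are omitted here.
PropertyValue = Tuple[str, str]  # (property ID, value)
RANKS = ("preferred", "normal", "deprecated")

@dataclass
class Statement:
    item: str                                            # subject of the claim
    prop: str                                            # property ID
    value: str                                           # value of the claim
    qualifiers: List[PropertyValue] = field(default_factory=list)
    references: List[List[PropertyValue]] = field(default_factory=list)  # each reference is a group of pairs
    rank: str = "normal"

    def __post_init__(self) -> None:
        if self.rank not in RANKS:
            raise ValueError(f"unknown rank: {self.rank!r}")

    def claim(self) -> Tuple[str, str, str]:
        """The bare item-property-value triple, without qualifiers, references or rank."""
        return (self.item, self.prop, self.value)

yonkers_1860 = Statement(
    item="Yonkers",                                         # label used in place of the item's Q-ID
    prop="P1082",                                           # population (not shown on the slide)
    value="12733",
    qualifiers=[("P585", "1860")],                          # point in time
    references=[[("P248", "1860 United States Census")]],   # stated in
)
print(yonkers_1860.claim())  # ('Yonkers', 'P1082', '12733')
```

The rank defaults to normal, so a later census figure could be marked preferred without deleting the 1860 statement.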
More on Wikidata vocabulary
https://www.wikidata.org/wiki/Wikidata:Glossary
Wikipedia articles have a Wikidata item link in the left navigation panel.
Wikidata link on Wikipedia
Getting to Wikidata from Wikipedia
Wikidata's instant search suggests items that have labels or aliases matching your keyword.
Wikidata search
Search by label
Search by alias: flu -> influenza
Finding properties
• Is there a property for number of windows?
• What was the ID of that property, again?
• Search
  • In the main site search box, prefix the search term with P:
  • P:number of, P:occupation
  • Instant search doesn't work for properties, only items
• Browse
  • https://www.wikidata.org/wiki/Wikidata:List_of_properties
    ^ bookmark this!
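For scripts, a property lookup can be sketched against the Wikibase API module wbsearchentities, which was not part of the talk. The Python below only builds the request URL (fetching and parsing the JSON response is left out), and the function name property_search_url is hypothetical; check the parameters against the API documentation before relying on them.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"

def property_search_url(term: str, language: str = "en") -> str:
    """Build a wbsearchentities request that matches property labels and aliases."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "property",  # the default type is "item"
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"

print(property_search_url("number of"))
```

Setting type to property is the API counterpart of the P: prefix in the search box.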
Let's edit Wikidata
Waldseemüller map
https://www.wikidata.org/wiki/Q195801
• instance of (P31): map
• creator (P170): Martin Waldseemüller
• publication date (P577): April 1507
• language of work (P407): Latin
• depicts (P180): Americas, Africa, Europe, Asia, Pacific Ocean
• has part (P527): woodcut
• location (P276): Library of Congress
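When editing, each value on the slide becomes its own statement: the depicts (P180) line alone is five statements. A small Python sketch of that flattening, with labels standing in for the Q-IDs a real edit would use (the dictionary and function names are hypothetical):

```python
from typing import Dict, List, Tuple

# Claims for Q195801 as listed on the slide, keyed by property ID.
# Values are labels rather than Q-IDs, to keep the sketch readable.
waldseemuller_map: Dict[str, List[str]] = {
    "P31": ["map"],
    "P170": ["Martin Waldseemüller"],
    "P577": ["April 1507"],
    "P407": ["Latin"],
    "P180": ["Americas", "Africa", "Europe", "Asia", "Pacific Ocean"],
    "P527": ["woodcut"],
    "P276": ["Library of Congress"],
}

def to_triples(item: str, claims: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Flatten to one (item, property, value) triple per value."""
    return [(item, prop, value) for prop, values in claims.items() for value in values]

triples = to_triples("Q195801", waldseemuller_map)
print(len(triples))  # 11: the five depicted regions are five separate statements
```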
Tools
• Querying: Autolist, by Magnus Manske
  • http://tools.wmflabs.org/autolist/autolist1.html
• Batch editing: Widar, by Magnus Manske
  • https://tools.wmflabs.org/autolist/
• Software framework: Wikidata Toolkit, by Markus Kroetzsch et al.
  • https://www.mediawiki.org/wiki/Wikidata_Toolkit
  • https://github.com/Wikidata/Wikidata-Toolkit
Querying in Wikidata
Use case: Get a list of history books
Pseudo-query:
instance of: book AND genre: history
instance of: P31
book: Q571
genre: P136
history: Q309
Wikidata query in Autolist:
claim[31:571] AND claim[136:309]
http://tools.wmflabs.org/autolist/autolist1.html?q=claim[31:571]%20AND%20claim[136:309]
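The step from pseudo-query to claim[...] syntax is mechanical. The Python sketch below builds the query string and the Autolist link shown above, then applies a toy in-memory filter that spells out the AND semantics. The function names and the toy item numbers are made up for illustration; the real evaluation ran on the query servers behind Autolist.

```python
from typing import Dict, List, Set, Tuple
from urllib.parse import quote

# (property number, item number) pairs, mirroring the claim[31:571] notation.
Condition = Tuple[int, int]

def wdq_and(conditions: List[Condition]) -> str:
    """Build the query string: claim[P:Q] terms joined with AND."""
    return " AND ".join(f"claim[{p}:{q}]" for p, q in conditions)

def autolist_url(query: str) -> str:
    """Percent-encode the query as in the Autolist link (spaces -> %20, brackets kept)."""
    return "http://tools.wmflabs.org/autolist/autolist1.html?q=" + quote(query, safe="[]:")

def matches_all(claims: Set[Condition], conditions: List[Condition]) -> bool:
    """AND semantics: an item qualifies only if it has every requested claim."""
    return all(c in claims for c in conditions)

history_books = [(31, 571), (136, 309)]  # instance of: book, genre: history
query = wdq_and(history_books)
print(query)  # claim[31:571] AND claim[136:309]
print(autolist_url(query))

# Toy data with made-up item numbers, only to illustrate the AND filter.
toy_items: Dict[int, Set[Condition]] = {
    1: {(31, 571), (136, 309)},  # a history book
    2: {(31, 571)},              # a book without a genre claim
    3: {(136, 309)},             # something else in the history genre
}
print([n for n, claims in toy_items.items() if matches_all(claims, history_books)])  # [1]
```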
Authority control on Wikidata
• Identifier properties link Wikidata items to entities in external databases
• Also known as authority control numbers
• Examples:
  • VIAF identifier (P214)
  • Freebase identifier (P646)
  • MusicBrainz artist ID (P434)
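Reading identifier values programmatically can be sketched in Python against the entity JSON that Wikidata serves from Special:EntityData (an endpoint not covered in the talk). The function names are hypothetical, the User-Agent string is a placeholder, and the hand-written sample keeps only the fields the sketch reads, with a placeholder identifier value. The network helper is defined but not called.

```python
import json
from typing import Dict, List
from urllib.request import Request, urlopen

ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def fetch_entity(qid: str) -> dict:
    """Download one entity's JSON (needs network access; not called below)."""
    req = Request(ENTITY_DATA_URL.format(qid=qid),
                  headers={"User-Agent": "identifier-sketch/0.1 (example)"})
    with urlopen(req) as response:
        return json.load(response)

def identifier_values(entity_json: dict, qid: str, props: List[str]) -> Dict[str, List[str]]:
    """Collect the string values of identifier properties such as P214 (VIAF)."""
    claims = entity_json["entities"][qid].get("claims", {})
    found: Dict[str, List[str]] = {}
    for prop in props:
        values: List[str] = []
        for statement in claims.get(prop, []):
            snak = statement["mainsnak"]
            if snak.get("snaktype") == "value":  # skip "somevalue" / "novalue" snaks
                values.append(snak["datavalue"]["value"])
        found[prop] = values
    return found

# Hand-written sample in the shape of the entity JSON; the ID value is a placeholder.
sample = {"entities": {"Q42": {"claims": {"P214": [
    {"mainsnak": {"snaktype": "value", "property": "P214",
                  "datavalue": {"value": "0000000", "type": "string"}},
     "rank": "normal"}]}}}}
print(identifier_values(sample, "Q42", ["P214", "P434"]))  # {'P214': ['0000000'], 'P434': []}
```

An item with no value for a requested property yields an empty list rather than an error, which suits batch checks for missing authority links.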
Identifier properties on Wikidata
Top 10 ID properties on Wikidata as of 2015-03-15
1) Freebase identifier (P646) 1,153,786
2) China administrative division code (P442) 742,240
3) VIAF identifier (P214) 527,786
4) GeoNames ID (P1566) 490,444
5) GND identifier (P227) 345,247
6) ITIS TSN (P815) 334,053
7) PlantList-ID (P1070) 331,179
8) Tropicos taxon name identifier (P960) 318,719
9) IMDb identifier (P345) 286,637
10) IPNI taxon name identifier (P961) 276,515
https://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/Top100
Ontology
• A specification of a conceptualization
• Rooted in long-standing philosophical traditions
• Useful for:
  • building concept hierarchies
  • making common knowledge computable
Tree of Porphyry
User:VoiceOfTheCommons, CC-BY-SA 3.0
Classes and instances
• Plato is a human is an animal
• Plato instance of human subclass of animal
• Instance: concrete object, individual
• Class: abstract object
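"Plato is an animal" is never stated directly; it follows from one instance of claim plus the subclass of chain above it. A minimal Python sketch using the slide's toy data, with labels in place of Q-IDs (all names here are hypothetical):

```python
from typing import Dict, Set

# Toy data from the slide, with labels in place of Q-IDs.
instance_of: Dict[str, Set[str]] = {"Plato": {"human"}}   # P31
subclass_of: Dict[str, Set[str]] = {"human": {"animal"}}  # P279

def superclasses(cls: str) -> Set[str]:
    """Every class reachable from cls by following subclass of, one or more steps."""
    seen: Set[str] = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen

def classes_of(entity: str) -> Set[str]:
    """Direct classes of entity plus all of their superclasses."""
    direct = instance_of.get(entity, set())
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls)
    return result

print(sorted(classes_of("Plato")))  # ['animal', 'human']
```

Note that classes_of("human") is empty: human is a class here, not an instance, which is exactly the distinction the slide draws.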
Concept hierarchy: painting
http://tools.wmflabs.org/wikidata-todo/tree.html?q=3305213&rp=279&lang=en
Concept tree diagram: painting
http://tools.wmflabs.org/wikidata-todo/tree.html?q=3305213&rp=279&lang=en&method=d3
Ontology on Wikidata
• instance of (P31)
  • rdf:type in RDF and OWL
  • 12,692,181 usages
  • Most popular Wikidata property
• subclass of (P279)
  • A subclass of B: all instances of A are also instances of B
  • rdfs:subClassOf in RDF and OWL
  • 190,095 usages
Ontology on Wikidata
• Last but not least: part of (P361)
  • Third basic membership property
  • Top-level part-whole relation
• Subclass of and part of are transitive; instance of is not
• Transitive relation:
  A subclass of B
  B subclass of C
  ∴ A subclass of C
https://www.wikidata.org/wiki/Help:Basic_membership_properties
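Written as inference rules, reading P(a, b) as "a has property P with value b", the transitivity pattern and the subclass definition look roughly like this. The notation is a sketch modeled on the RDFS entailment rules for rdfs:subClassOf and rdf:type; Wikidata stores only the asserted statements and does not materialize these inferred ones.

```latex
\begin{aligned}
&\mathrm{P279}(A,B) \land \mathrm{P279}(B,C) \Rightarrow \mathrm{P279}(A,C) && \text{subclass of is transitive}\\
&\mathrm{P361}(A,B) \land \mathrm{P361}(B,C) \Rightarrow \mathrm{P361}(A,C) && \text{part of is transitive}\\
&\mathrm{P31}(x,A) \land \mathrm{P279}(A,B) \Rightarrow \mathrm{P31}(x,B) && \text{instances of A are instances of B}
\end{aligned}
```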
subproperty of (P1647)
• How to link author (P50) and creator (P170)
  • subproperty of (P1647)
  • author subproperty of creator
• Wikidata supports claims about properties
  • subproperty of (P1647) has semantics of rdfs:subPropertyOf
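With rdfs:subPropertyOf semantics, every author (P50) statement also implies a creator (P170) statement. A small Python sketch of that inference; the example work and the names entailed and subproperty_of are illustrative, and only one level of subproperty is followed.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Slide example: author (P50) subproperty of (P1647) creator (P170).
subproperty_of: Dict[str, Set[str]] = {"P50": {"P170"}}

def entailed(triples: List[Triple]) -> Set[Triple]:
    """Add (s, Q, o) for every stated (s, P, o) where P is a direct subproperty of Q.

    Only one level is followed; chains of subproperties would need a transitive walk."""
    result = set(triples)
    for s, p, o in triples:
        for q in subproperty_of.get(p, ()):
            result.add((s, q, o))
    return result

stated = [("The Hitchhiker's Guide to the Galaxy", "P50", "Douglas Adams")]
for triple in sorted(entailed(stated)):
    print(triple)
```

The payoff is that a query for creators also finds authors, without editors having to enter both statements.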
Structured data for images
• Use case:
  • A library, archive or other cultural institution wants to make its images discoverable on Wikidata
• Structured Data on Wikimedia Commons
  • Project of the Wikimedia Foundation (WMF)
Structured Data for Wikimedia Commons
• Purpose
  • use structured data for all media files on Wikimedia sites
  • make it easier for users to read, translate and edit file information
  • enable developers to build better tools, both to contribute to and reuse Commons content
https://commons.wikimedia.org/wiki/Commons:Structured_data/Overview
Structured Data for Commons: FAQ
• How long will it take?
  Completion will take years, but the project will be released in stages.
  The forward-facing part of the project is on hold.
• Is file metadata staying on Commons?
  Yes
• How will file information be structured?
  To be determined
• Will Commons still have templates?
  Yes
• Will Commons still have categories?
  Yes
Wikidata for Commons
Partial support already exists!
Wikidata links to Commons
Commons links to Wikidata
Library ontology on Wikidata
• What is a book?
• Different levels of existence (FRBR)
  • Work
  • Expression - e.g., a translation into a particular language
  • Manifestation - edition, thing with an ISBN
  • Item - physical copy
FRBR example
• Work
  • Bible (Q1845)
• Expression
  • Latin translation of the Bible
• Manifestation
  • Vulgate (Q131175): a particular Latin translation
• Item
  • Gutenberg Bible (Q158075): a physical book in the New York Public Library (!)
Linking FRBR levels in Wikidata
Levels are linked with the property edition or translation of (P629)
• Gutenberg Bible (item)
  • edition or translation of: Vulgate
• Vulgate (manifestation)
  • edition or translation of: Bible
• Bible (work)
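Because each level points to the one above it, the work can be reached by following P629 links until an item has none. A minimal Python sketch with the slide's three items as labels; it assumes one P629 value per item (real items can have several), and the function name is hypothetical.

```python
from typing import Dict, List

# P629 (edition or translation of) links from the slide, using item labels.
edition_or_translation_of: Dict[str, str] = {
    "Gutenberg Bible": "Vulgate",
    "Vulgate": "Bible",
}

def chain_to_work(start: str) -> List[str]:
    """Follow P629 links upward until an item with no P629 value (the work) is reached."""
    chain = [start]
    while chain[-1] in edition_or_translation_of:
        nxt = edition_or_translation_of[chain[-1]]
        if nxt in chain:  # guard against accidental cycles in the data
            raise ValueError(f"cycle at {nxt}")
        chain.append(nxt)
    return chain

print(" -> ".join(chain_to_work("Gutenberg Bible")))  # Gutenberg Bible -> Vulgate -> Bible
```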
Trouble: what is an instance?
• Thing with a unique location in space and time
• Bible instance of religious text: incorrect?
• Gutenberg Bible instance of religious text: all agree
• Solution: Bible as information artifact
Information Artifact Ontology
• IAO sets forth a basic ontological distinction:
  • Entities in the world
    • Actual people: Grace Hopper, Barack Obama
    • That very old tree
    • Gutenberg Bible
  • Information artifacts
    • Entities about something in reality
    • Concretized by some information bearer
Information Artifact Ontology
• Addresses problems in Dublin Core Metadata Initiative (DCMI)
  • Smith (2014)
    http://ncor.buffalo.edu/2014/IAOW/IAO-Tutorial-Smith-Rio-Sep-2014.pptx
• Rooted in Basic Formal Ontology (BFO), which is used in many scientific ontologies
  • Smith et al. (2007). Nature Biotechnology.
    http://www.nature.com/nbt/journal/v25/n11/full/nbt1346.html
Summary
• Wikidata is a free knowledge base that anyone can edit
• Search for properties by prefixing the search term with P:
• Wikidata items link to external databases through identifier properties, e.g. VIAF identifier (P214)
• Wikidata supports FRBR and other ontologies
Thank you!
https://www.wikidata.org/wiki/User:Emw
