A presentation on Wikidata for libraries and archives delivered on March 16, 2015 to the Metropolitan New York Library Council.
Contains minor edits and corrections from version presented.
Released under CC0.
2. Wikidata is a free linked database that can be read and edited by humans and machines.
3. Wikidata's goals
Centralize interwiki links
Centralize Wikipedia infoboxes
Provide an interface for rich queries
Structure the sum of all human knowledge
4. What you'll learn from this talk
How to edit Wikidata
Authority control properties
Ontology
Wikidata vocabulary
Querying and other tools
Wikidata & Commons plans
7. Items and properties
Each item and property has its own page
Items
Represent subjects: Douglas Adams, Challenger disaster
Have identifiers like Q42, Q921090
13,745,153 items
Properties
Represent attribute names: occupation, cause of
Have identifiers like P106, P828
1,429 properties
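Because every item and property has its own page, an ID alone is enough to build a link. The sketch below assumes item pages live at `https://www.wikidata.org/wiki/Q42` (as in the Waldseemüller map URL later in this deck) and property pages in the `Property:` namespace; the helper name `entity_page_url` is hypothetical.

```python
import re

# Wikidata entity IDs: "Q" + digits for items, "P" + digits for properties.
ENTITY_ID = re.compile(r"^([QP])([1-9][0-9]*)$")

def entity_page_url(entity_id: str) -> str:
    """Return the wikidata.org page URL for an item or property ID."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not a Wikidata item or property ID: {entity_id!r}")
    # Assumption: items sit in the main namespace, properties in "Property:".
    prefix = "" if match.group(1) == "Q" else "Property:"
    return f"https://www.wikidata.org/wiki/{prefix}{entity_id}"

for eid in ["Q42", "Q921090", "P106", "P828"]:
    print(eid, entity_page_url(eid))
```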
8. Statements and claims
Claims
Claims are triples
Formally: subject, predicate, object
In Wikidata: item, property, value
Example: Douglas Adams, occupation, author
Statements
A claim is only part of a statement
Statements also include:
References
Ranks
9. Qualifiers, ranks, references
Qualifiers
Qualifiers are properties used on claims rather than items
Yonkers population 12,733, point in time (P585) 1860
Ranks
Preferred, normal, deprecated
Useful to mark outdated claims
References
Source of claim; provenance
... stated in (P248) 1860 United States Census
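The pieces of a statement can be sketched as a small data structure: the claim (item, property, value) plus its qualifiers, a rank and a list of references. The class and helper names are hypothetical, labels stand in for IDs where the slide gives none, and `best_statements` assumes the usual reading of ranks: preferred statements win over normal ones, and deprecated ones are skipped.

```python
from dataclasses import dataclass, field
from enum import Enum

class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"

@dataclass
class Statement:
    # The claim: item, property, value, plus qualifiers on that claim.
    item: str
    prop: str
    value: object
    qualifiers: dict = field(default_factory=dict)
    # Everything else that turns a claim into a statement.
    rank: Rank = Rank.NORMAL
    references: list = field(default_factory=list)

def best_statements(statements: list, prop: str) -> list:
    """Preferred statements for prop if any exist, else normal ones; never deprecated."""
    candidates = [s for s in statements if s.prop == prop and s.rank is not Rank.DEPRECATED]
    preferred = [s for s in candidates if s.rank is Rank.PREFERRED]
    return preferred or candidates

yonkers_1860 = Statement(
    item="Yonkers",
    prop="population",                       # label used in place of a property ID
    value=12733,
    qualifiers={"P585": 1860},               # point in time
    references=[{"P248": "1860 United States Census"}],  # stated in
)
print([s.value for s in best_statements([yonkers_1860], "population")])  # [12733]
```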
10. More on Wikidata vocabulary
https://www.wikidata.org/wiki/Wikidata:Glossary
11. Wikipedia articles have a Wikidata item link in the left navigation panel.
Wikidata link on Wikipedia
16. Finding properties
Is there a property for number of windows?
What was the ID of that property, again?
Search
In main site search box, prefix search term with P:
P:number of, P:occupation
Instant search doesn't work for properties, only items
Browse
https://www.wikidata.org/wiki/Wikidata:List_of_properties
^ bookmark this!
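The P: prefix also works when building a search link programmatically. This sketch assumes the standard MediaWiki `index.php?search=` URL form; the helper name is hypothetical.

```python
from urllib.parse import urlencode

def property_search_url(term: str) -> str:
    """Build a Wikidata search URL restricted to properties via the P: prefix."""
    # "P:" limits the main site search to properties, as described on the slide.
    query = urlencode({"search": f"P:{term}"})
    return f"https://www.wikidata.org/w/index.php?{query}"

print(property_search_url("number of"))
```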
18. Waldseemüller map
https://www.wikidata.org/wiki/Q195801
instance of (P31): map
creator (P170): Martin Waldseemüller
publication date (P577): April 1507
language of work (P407): Latin
depicts (P180): Americas, Africa, Europe, Asia, Pacific Ocean
has part (P527): woodcut
location (P276): Library of Congress
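The map's statements can be held as a mapping from property ID to a list of values, since one property (here, depicts) may carry several values. This is only a sketch: the values are labels copied from the slide, whereas on Wikidata most of them would be links to other items, and the `values` helper is hypothetical.

```python
# Statements from the slide, keyed by property ID. Each property maps to a list
# because a single property can hold several values (e.g. depicts, P180).
waldseemuller_map = {
    "id": "Q195801",
    "statements": {
        "P31": ["map"],
        "P170": ["Martin Waldseemüller"],
        "P577": ["April 1507"],
        "P407": ["Latin"],
        "P180": ["Americas", "Africa", "Europe", "Asia", "Pacific Ocean"],
        "P527": ["woodcut"],
        "P276": ["Library of Congress"],
    },
}

def values(item: dict, prop: str) -> list:
    """All values for a property, or an empty list if the item has none."""
    return item["statements"].get(prop, [])

print(values(waldseemuller_map, "P180"))
print(values(waldseemuller_map, "P50"))  # no author (P50) statement on the slide
```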
19. Tools
Querying: Autolist, by Magnus Manske
http://tools.wmflabs.org/autolist/autolist1.html
Batch editing: Widar, by Magnus Manske
https://tools.wmflabs.org/autolist/
Software framework: Wikidata Toolkit, by Markus Kroetzsch et al.
https://www.mediawiki.org/wiki/Wikidata_Toolkit
https://github.com/Wikidata/Wikidata-Toolkit
20. Querying in Wikidata
Use case: Get a list of history books
Pseudo-query:
instance of: book AND genre: history
instance of: P31
book: Q571
genre: P136
history: Q309
Wikidata query in Autolist:
claim[31:571] AND claim[136:309]
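The Autolist query is just the pseudo-query with labels swapped for numeric IDs. The sketch below builds that string from (property, item) pairs and, to show what the AND means, applies the same filter to a few hypothetical in-memory items. Both function names and the sample items are illustrative assumptions, not Autolist's API or real Wikidata data.

```python
def wdq_and(*pairs: tuple) -> str:
    """Build an Autolist-style query: claim[property:item] terms joined by AND."""
    return " AND ".join(f"claim[{p}:{q}]" for p, q in pairs)

# instance of (P31) = book (Q571), genre (P136) = history (Q309)
query = wdq_and((31, 571), (136, 309))
print(query)  # claim[31:571] AND claim[136:309]

def matches(item_claims: dict, pairs: list) -> bool:
    """True if an item has every (property, value) pair: a local AND filter."""
    return all(q in item_claims.get(p, set()) for p, q in pairs)

# Hypothetical items, not real Wikidata data: property ID -> set of item IDs.
items = {
    "book A": {31: {571}, 136: {309}},
    "book B": {31: {571}},
}
print([name for name, claims in items.items() if matches(claims, [(31, 571), (136, 309)])])
```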
22. Authority control on Wikidata
Identifier properties link Wikidata items to entities in external databases
Also known as authority control numbers
Examples:
VIAF identifier (P214)
Freebase identifier (P646)
MusicBrainz artist ID (P434)
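An identifier value becomes useful once it is turned into a link to the external record. Wikidata keeps such URL patterns on the property pages (the formatter URL property, P1630); the two templates below are assumptions written for illustration and should be checked against the live property pages, and the identifier passed in is a placeholder.

```python
# Assumed URL templates for two identifier properties; "$1" marks the identifier.
FORMATTERS = {
    "P214": "https://viaf.org/viaf/$1",           # VIAF identifier
    "P434": "https://musicbrainz.org/artist/$1",  # MusicBrainz artist ID
}

def external_link(prop: str, identifier: str) -> str:
    """Turn an identifier statement into a link to the external database."""
    template = FORMATTERS.get(prop)
    if template is None:
        raise KeyError(f"no URL template known for {prop}")
    return template.replace("$1", identifier)

print(external_link("P214", "12345"))  # placeholder ID, not a real VIAF record
```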
23. Identifier properties on Wikidata
Top 10 ID properties on Wikidata as of 2015-03-15
1) Freebase identifier (P646) 1,153,786
2) China administrative division code (P442) 742,240
3) VIAF identifier (P214) 527,786
4) GeoNames ID (P1566) 490,444
5) GND identifier (P227) 345,247
6) ITIS TSN (P815) 334,053
7) PlantList-ID (P1070) 331,179
8) Tropicos taxon name identifier (P960) 318,719
9) IMDb identifier (P345) 286,637
10) IPNI taxon name identifier (P961) 276,515
https://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/Top100
Identifier properties on WikidataIdentifier properties on Wikidata
24. Ontology
A specification of a conceptualization
Rooted in long-standing philosophical traditions
Useful for:
building concept hierarchies
making common knowledge computable
26. Classes and instances
Plato is a human is an animal
Plato instance of human; human subclass of animal
Instance: concrete object, individual
Class: abstract object
28. Concept tree diagram: painting
http://tools.wmflabs.org/wikidata-todo/tree.html?q=3305213&rp=279&lang=en&method=d3
29. Ontology on Wikidata
instance of (P31)
rdf:type in RDF and OWL
12,692,181 usages
Most popular Wikidata property
subclass of (P279)
A subclass of B: all instances of A are also instances of B
rdfs:subClassOf in RDF and OWL
190,095 usages
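Taken together, the two properties let a machine infer that Plato is an animal: take one instance of step, then follow subclass of upward as far as it goes, as rdf:type combined with rdfs:subClassOf would. The two-entry graph and the `classes_of` helper are hypothetical illustrations.

```python
# Tiny graph from the Plato example: instance of (P31) and subclass of (P279).
INSTANCE_OF = {"Plato": {"human"}}
SUBCLASS_OF = {"human": {"animal"}}

def classes_of(entity: str) -> set:
    """Direct P31 classes of an entity plus all of their P279 ancestors."""
    result = set()
    stack = list(INSTANCE_OF.get(entity, set()))
    while stack:
        cls = stack.pop()
        if cls not in result:
            result.add(cls)
            # Only subclass of is followed from here on, never instance of again.
            stack.extend(SUBCLASS_OF.get(cls, set()))
    return result

print(sorted(classes_of("Plato")))  # ['animal', 'human']
```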
30. Ontology on Wikidata
Last but not least: part of (P361)
Third basic membership property
Top-level part-whole relation
Subclass of and part of are transitive; instance of is not
Transitive relation:
A subclass of B
B subclass of C
∴ A subclass of C
https://www.wikidata.org/wiki/Help:Basic_membership_properties
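For a transitive property such as subclass of or part of, all implied statements can be computed as a transitive closure: from each node, follow edges as far as they lead. The sketch below uses the slide's A, B, C example; the function name is hypothetical, and the `seen` set keeps it from looping on cyclic data.

```python
def transitive_closure(edges: dict) -> set:
    """All pairs (a, c) reachable by following one or more edges from a."""
    closure = set()
    for start in edges:
        stack = list(edges[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already visited; guards against cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(edges.get(node, ()))
    return closure

subclass_of = {"A": {"B"}, "B": {"C"}}
print(sorted(transitive_closure(subclass_of)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```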
31. subproperty of (P1647)
How to link author (P50) and creator (P170)
subproperty of (P1647)
author subproperty of creator
Wikidata supports claims about properties
subproperty of (P1647) has the semantics of rdfs:subPropertyOf
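With rdfs:subPropertyOf semantics, every author claim also implies a creator claim. The sketch below applies that rule repeatedly until nothing new appears, so chains of subproperties are handled too. The single claim and the function name are hypothetical illustrations; labels stand in for item IDs.

```python
# Hypothetical claim: (item, property, value) triples, labels instead of item IDs.
claims = {("The Hitchhiker's Guide to the Galaxy", "P50", "Douglas Adams")}
# subproperty of (P1647): author (P50) is a subproperty of creator (P170).
subproperty_of = {"P50": {"P170"}}

def infer_superproperty_claims(claims: set, subproperty_of: dict) -> set:
    """Add (s, Q, o) for every (s, P, o) where P is a subproperty of Q,
    repeating until no new triple appears."""
    inferred = set(claims)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for q in subproperty_of.get(p, ()):
                if (s, q, o) not in inferred:
                    inferred.add((s, q, o))
                    changed = True
    return inferred

for triple in sorted(infer_superproperty_claims(claims, subproperty_of)):
    print(triple)
```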
32. Structured data for images
Use case:
A library, archive or other cultural institution wants to make its images discoverable on Wikidata
Structured Data on Wikimedia Commons
Project of the Wikimedia Foundation (WMF)
33. Structured Data for Wikimedia Commons
Purpose
use structured data for all media files on Wikimedia sites
make it easier for users to read, translate and edit file information
enable developers to build better tools, both to contribute and reuse Commons content.
https://commons.wikimedia.org/wiki/Commons:Structured_data/Overview
34. Structured Data for Commons: FAQ
How long will it take?
Completion will take years, but project will be released in stages.
Forward-facing part of project is on hold.
Is file metadata staying on Commons?
Yes
How will file information be structured?
To be determined
Will Commons still have templates?
Yes
Will Commons still have categories?
Yes
38. Library ontology on Wikidata
What is a book?
Different levels of existence (FRBR)
Work
Expression - e.g., a translation into a particular language
Manifestation - edition, thing with an ISBN
Item - physical copy
39. FRBR example
Work
Bible (Q1845)
Expression
Latin translation of the Bible
Manifestation
Vulgate (Q131175): a particular Latin translation
Item
Gutenberg Bible (Q158075): a physical book in the New York Public Library (!)
40. Linking FRBR levels in Wikidata
Done by the property edition or translation of (P629)
Gutenberg Bible (item)
edition or translation of: Vulgate
Vulgate (manifestation)
edition or translation of: Bible
Bible (work)
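Because each level points one step up with edition or translation of (P629), the work behind a physical copy can be found by following those links until none is left. This sketch assumes at most one P629 value per entity (a plain dictionary) and uses the slide's Bible example; the helper name is hypothetical.

```python
# edition or translation of (P629), using the slide's FRBR example.
EDITION_OR_TRANSLATION_OF = {
    "Gutenberg Bible": "Vulgate",
    "Vulgate": "Bible",
}

def chain_to_work(entity: str) -> list:
    """Follow P629 links upward until reaching an entity with no P629 value."""
    path = [entity]
    while path[-1] in EDITION_OR_TRANSLATION_OF:
        nxt = EDITION_OR_TRANSLATION_OF[path[-1]]
        if nxt in path:
            raise ValueError("cycle in edition or translation of links")
        path.append(nxt)
    return path

print(chain_to_work("Gutenberg Bible"))  # ['Gutenberg Bible', 'Vulgate', 'Bible']
```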
41. Trouble: what is an instance?
Thing with a unique location in space and time
Bible instance of religious text: incorrect?
Gutenberg Bible instance of religious text: all agree
Solution: Bible as information artifact
42. Information Artifact Ontology
IAO sets forth a basic ontological distinction:
Entities in the world
Actual people: Grace Hopper, Barack Obama
That very old tree
Gutenberg Bible
Information artifacts
Entities about something in reality
Concretized by some information bearer
43. Information Artifact Ontology
Addresses problems in the Dublin Core Metadata Initiative (DCMI)
Smith (2014)
http://ncor.buffalo.edu/2014/IAOW/IAO-Tutorial-Smith-Rio-Sep-2014.pptx
Rooted in Basic Formal Ontology (BFO), which is used in many scientific ontologies
Smith et al. (2007). Nature Biotechnology.
http://www.nature.com/nbt/journal/v25/n11/full/nbt1346.html
44. Summary
Wikidata is a free knowledge base that anyone can edit
Search for properties by prefixing the search term with P:
Wikidata items link to external databases through identifier properties, e.g. VIAF identifier (P214)
Wikidata supports FRBR and other ontologies