What are the Drivers?
and how do we intend to meaningfully respond to them?
David P. Shorthouse
Canadian Museum of Nature
Agriculture & Agri-Food Canada (April 1)
Is it possible that the lack of recognition in the academic assessment
system of these forms of productivity has contributed to the diminished
status, indeed even the near disappearance from many academic
departments, of traditional systematics?
Collecting
Curating
Identifying
Naming
Natural History Museums Desperately Want
Brand Awareness
Meaningful Measures of Impact
trust in an aggregator is not just a feature of the data signal quality provided by the
sources to the aggregator, but also a consequence of the social design of the
aggregation process and the resulting power balance between individual data
contributors and aggregators.
How Do We Fix This?
recognition for taxonomist
recognition for host institution
recognition for taxonomist's institution
Fully automated
Quantifiable
Ingredients to Make This Happen
Newly digitized specimen
IRI identifiedBy
http://rs.tdwg.org/dwc/iri/identifiedBy
https://orcid.org/0000-0001-9144-2848
institutionCode
ORCID: ringgold, GRID
dateIdentified
ORCID: employment/education
start/end date
GRBIO ?
not sameAs
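The pairing of fields above (identifiedBy with an ORCID, institutionCode with an ORCID organization identifier, dateIdentified with employment start/end dates) suggests a simple date-window check. The following is a minimal sketch under stated assumptions, not the author's implementation: the `Employment` class, the function name, and the organization identifiers are hypothetical, and it assumes the institutionCode has already been mapped to a Ringgold or GRID identifier by some other means.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Employment:
    """One employment/education entry, as might be read from an ORCID record."""
    organization_id: str        # hypothetical Ringgold or GRID identifier
    start: date
    end: Optional[date] = None  # None means the affiliation is ongoing


def identified_while_affiliated(
    date_identified: date,
    institution_org_id: str,
    employments: List[Employment],
) -> bool:
    """Return True if the identification date falls inside an affiliation
    with the organization that holds the specimen."""
    for job in employments:
        if job.organization_id != institution_org_id:
            continue
        started = job.start <= date_identified
        not_ended = job.end is None or date_identified <= job.end
        if started and not_ended:
            return True
    return False


# Hypothetical data: one closed affiliation with a made-up organization id.
jobs = [Employment("grid.example.1", date(2010, 1, 1), date(2015, 12, 31))]
inside = identified_while_affiliated(date(2012, 6, 1), "grid.example.1", jobs)
outside = identified_while_affiliated(date(2016, 1, 1), "grid.example.1", jobs)
```

A check like this would let an identification count toward the host institution only when the identifier was affiliated with it at the time, which is what makes the recognition quantifiable.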
Are There Other Drivers?
Newly digitized specimen
IRI identifiedBy
http://rs.tdwg.org/dwc/iri/identifiedBy
https://orcid.org/0000-0001-9144-2848
Shorthouse - Authority Management of People Names Workshop
https://bloodhound-tracker.net
For the Deceased
For the Living
Cautionary Tale
Retrospective & Prospective Approaches
Retrospective Approach: Layers of Dirt
 Strings to things
 Parsing, e.g. Ruby gems Namae, DwcAgent
 Entity extraction, e.g. Rosette (https://www.rosette.com/), Watson Natural Language
 Similarity scoring, e.g. R.D.M. Page <=> Roderic Page <=> Roderic D.M. Page
 Search logic
 Disambiguation
 Co-author, co-collector networks
 Collector codes
 Hand-crafted heuristics, e.g. birth/death/collection dates, taxa, places
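As an illustration of the similarity step, here is a minimal sketch of a compatibility check between name strings. It is a coarse yes/no test rather than a true similarity score, and it is not what Namae or DwcAgent do: the function names are hypothetical, and it assumes the family name is the last whitespace-separated token, which fails for many real names.

```python
import re
from typing import List, Tuple


def name_parts(name: str) -> Tuple[List[str], str]:
    """Split 'Roderic D.M. Page' into (['roderic', 'd', 'm'], 'page').

    Assumption: the family name is the last whitespace-separated token.
    """
    tokens = name.strip().split()
    family = tokens[-1].lower()
    given: List[str] = []
    for token in tokens[:-1]:
        # 'R.D.M.' becomes ['r', 'd', 'm']; 'Roderic' becomes ['roderic'].
        given.extend(p.lower() for p in re.split(r"\.", token) if p)
    return given, family


def compatible(a: str, b: str) -> bool:
    """True if two strings could refer to the same person: the family names
    are equal and each aligned given name or initial is a prefix of the other."""
    given_a, family_a = name_parts(a)
    given_b, family_b = name_parts(b)
    if family_a != family_b:
        return False
    # zip stops at the shorter list, so missing given names never disqualify.
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(given_a, given_b))


variants = ["R.D.M. Page", "Roderic Page", "Roderic D.M. Page"]
all_compatible = all(compatible(a, b) for a in variants for b in variants)
```

Compatibility only says two strings do not contradict each other. "R. Page" is also compatible with any other R. Page, which is why the disambiguation layers (networks, collector codes, dates, taxa, places) are still needed.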
Prospective Approach: Clean Dirt
RDA / TDWG Metadata Standards for
attribution of physical and digital collections stewardship
Chairs: Anne Thessen, Matt Woodburn, Dimitris Koureas
Final Recommendations: https://github.com/tdwg/attribution/blob/master/RDA_recommendations.md
What Actions Do We Care About?
 authored
 borrowed
 catalogued
 collected
 conserved
 contributed
 created
 curated
 
 georeferenced
 reviewed
https://github.com/tdwg/attribution/issues/5
Wishlist
 Test suite for parsing lists of names: text file with expected JSON response
Charles R. Darwin Esq.
[{"family": "Darwin", "given": "Charles R.", "title": "Esq."}]
leg. A. Chuvilin
[{"family": "Chuvilin", "given": "A."}]
N. Navarro, G. Gómez y A Ferreira
[{"family": "Navarro", "given": "N."},
{"family": "Gómez", "given": "G."},
{"family": "Ferreira", "given": "A"}]
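A shared test suite of this kind could be driven by a very small harness. The sketch below is only an assumption about how that might look: the slides do not specify a file layout, so cases are held in memory as (raw string, expected JSON text) pairs, and `toy_parse` is a deliberately naive stand-in, not a real parser.

```python
import json
from typing import Callable, Dict, List, Tuple

ParsedNames = List[Dict[str, str]]


def run_cases(
    parse: Callable[[str], ParsedNames],
    cases: List[Tuple[str, str]],
) -> List[Tuple[str, ParsedNames, ParsedNames]]:
    """Run `parse` on every raw string and return (raw, expected, actual)
    for each case where the output differs from the expected JSON."""
    failures = []
    for raw, expected_text in cases:
        expected = json.loads(expected_text)
        actual = parse(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


def toy_parse(raw: str) -> ParsedNames:
    """Stand-in parser (hypothetical): last token is the family name,
    everything before it is the given name. It knows nothing about 'leg.'."""
    *given, family = raw.split()
    return [{"family": family, "given": " ".join(given)}]


CASES = [
    ("A. Chuvilin", '[{"family": "Chuvilin", "given": "A."}]'),
    ("leg. A. Chuvilin", '[{"family": "Chuvilin", "given": "A."}]'),
]

failures = run_cases(toy_parse, CASES)
for raw, expected, actual in failures:
    print(f"FAIL {raw!r}: expected {expected}, got {actual}")
```

Because the harness takes the parser as an argument, the same case file could be run against any parser wrapped to return this list-of-dictionaries shape.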
Wishlist
 Common, consistent way to handle search
 Elasticsearch, Solr plugin
 Services
 Input: raw string of name(s), optional parameters
 Output: parsed name, identifiers, likelihood score
 Actions for inclusion in a DwC extension
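The service output described in this wishlist (parsed name, identifiers, likelihood score) could be modelled roughly as follows. This is a sketch under assumptions, not a proposed standard: the class name, field names, threshold parameter, and the placeholder ORCID are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """One possible match for a raw name string."""
    family: str
    given: Optional[str] = None
    identifiers: List[str] = field(default_factory=list)  # e.g. ORCID IRIs
    score: float = 0.0  # likelihood in [0, 1], however the service defines it


def best_candidates(candidates: List[Candidate], min_score: float = 0.5) -> List[Candidate]:
    """Drop candidates below the threshold and sort the rest, best first."""
    kept = [c for c in candidates if c.score >= min_score]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Hypothetical response for the raw string "R. Page"; the ORCID is a placeholder.
response = [
    Candidate("Page", "R.", [], 0.40),
    Candidate("Page", "Roderic D.M.", ["https://orcid.org/0000-0000-0000-0000"], 0.92),
    Candidate("Page", "Robert", [], 0.55),
]
ranked = best_candidates(response)
```

Returning every candidate with a score, rather than a single answer, leaves the accept/reject decision to the calling collection management system.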
