Catalogue of Life Annual Science Symposium
The Use and Limits of Scientific Names in
Biological Informatics
Real world  Artificial World
All information.
Names as descriptive metadata
All accumulated information of a species is tied to
a scientific name, a name that serves as a link
between what has been learned in the past and
what we today add to the body of knowledge.
- Grimaldi & Engel, 2005, Evolution of the Insects
Nomenclature Taxonomy
Semantic Triangle
Concept?
Cardellina
canadensis
Specimen
Name Specimen
Concept
It's new!
I'm famous
Name Specimen
Concept
That's one
of those
evokes
referred to
1
2
evokes
refers to
3
4
Communication of meaning
A B
True Negatives
False Positives
False Negatives
Relevant Elements
What you want
Selected Elements
What you got
True Positives
Relevance in information retrieval
Syntax and Semantics/Precision and Recall
GRAPHIC HERE (MAYBE)
Change in syntax with no change in semantics
Doryteuthis (Amerigo) pealeii
Loligo pealeii
Doryteuthis pealeii
Impacts on Use and Recall
Doryteuthis pealeii Loligo pealei
Conflation of syntax
Agalinus paupercula borealis
Agalinus pauperculum borealis
Agalinis paupercula var. Borealis
Agalinus pauperculum var. borealis
Agalinus paupercula var. borealis
Agalinus paupercula var. borealis Pennell
Agalinus paupercula Britton var. borealis Pennell
Agalinus paupercula (Gray) Britt. var. borealis Pennell
Agalinis paupercula (A.Gray) Britton var. borealis Pennell
Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934
Gerardia paupercula borealis
Gerardia paupercula var. borealis
Gerardia paupercula var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell
Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell
Gerardia paupercula Britton ssp. borealis Pennell
Change in semantics and no change in syntax
Pneumocystis carinii
Change in semantics and no change in syntax
Pneumocystis carinii
Change in semantics and no change in syntax
Pneumocystis carinii
Polysemy
Pneumocystis carinii Pneumocystis carinii Pneumocystis jiroveci
= +
< 2002 > 2002
Change in semantics = change in precision
Pneumocystis carinii Pneumocystis carinii
(the non-human part of..) Pneumocystis carinii (BEFORE 2002) OR Pneumocystis carinii (AFTER 2002)
FALSE
POSITIVE
Three concepts, three identifiers
Pneumocystis carinii sec 1955 Pneumocystis carinii sec 2002 Pneumocystis jiroveci
Mint Distinct Taxon Identifiers
Pneumocystis carinii sec 1955 Pneumocystis carinii sec 2002 Pneumocystis jiroveci
2002
Pj
Halichondria (Halichondria) panicea (Pallas, 1766)
Spongia panicea Pallas, 1766
Alcyonium paniceum (Pallas, 1766)
Halina panicea (Pallas, 1766)
Amorphina panicea (Pallas, 1766)
Halichondria panicea (Pallas, 1766)
Alcyonium medullare Lamarck, 1815
Amorphina appendiculata Schmidt, 1875
Eumastia appendiculata (Schmidt, 1875)
Amorphina grisea Fristedt, 1887
Halichondria grisea (Fristedt, 1887)
Amorphina paciscens Schmidt, 1875
Halichondria paciscens (Schmidt, 1875)
Clathria (Microciona) seriata (Grant, 1826)
Clathria seriata (Grant, 1826)
Seriatula seriata (Grant, 1826)
Spongia seriata Grant, 1826
Halichondria albescens (Rafinesque, 1818)
Spongia albescens Rafinesque, 1818
Hymeniacidon fallaciosus Bowerbank, 1866
Hymeniacidon fragilis Bowerbank, 1866
Hymeniacidon lactea Bowerbank, 1866
Hymeniacidon membrana Bowerbank, 1866
Hymeniacidon parfitti Parfitt, 1868
Hymeniacidon reticulatus Bowerbank, 1866
Hymeniacidon solida Bowerbank, 1874
Hymeniacidon tegeticula Bowerbank, 1874
Hymeniacidon thomasii Bowerbank, 1866
Isodictya crassa Bowerbank, 1882
Isodictya perplexa Bowerbank, 1882
Menanetia minchini Topsent, 1896
Microciona tumulosa Bowerbank, 1882
Pellina bibula Schmidt, 1870
Spongia compacta Sowerby, 1806
Spongia cristata Ellis & Solander, 1786
Spongia tomentosa Linnaeus, 1767
Spongia tubulosa Ellis & Solander, 1786
Spongia urens Ellis & Solander, 1786
Spuma borealis var. convoluta Miklucho-Maclay, 1870
Spuma borealis var. tuberosa Miklucho-Maclay, 1870
Spuma borealis var. velamentosa Miklucho-Maclay, 1870
Trachyopsilla glaberrima Burton, 1931
Halichondria ambigua Bowerbank, 1874
Halichondria bibula (Schmidt, 1870)
Halichondria caduca Bowerbank, 1866
Halichondria coralloides Bowerbank, 1882
Halichondria edusa Bowerbank, 1874
Hymeniacidon firmus Bowerbank, 1874
Halichondria firmus (Bowerbank, 1874)
Halichondria glabra Bowerbank, 1866
Halichondria incerta Bowerbank, 1866
Halichondria lactea (Bowerbank, 1866)
Halichondria membrana (Bowerbank, 1866)
Halichondria pannosus Verrill, 1874
Halichondria papillaris (Linnaeus, 1791)
Halichondria reticulata Lieberkühn, 1859
Halichondria sevosa Johnston, 1842
Halichondria topsenti de Laubenfels, 1936
Halichondriella corticata Burton, 1931
Halina papillaris (Pallas, 1766)
Halispongia papillaris (Pallas, 1766)
Halichondria brettii (Bowerbank, 1866)
Hymeniacidon brettii Bowerbank, 1866
Hymeniacidon coccinea (Bowerbank, 1861)
Halichondria coccinea Bowerbank, 1861

Editor's Notes

  • #2: My intention is to extend the "artificial world" that Rich introduced into a framework that we can use to cast some (not all) of the issues and observations we will see in the next presentations. I also intend to use this framework to better inform the future directions of the CoL. I hope to demonstrate that the information components of the Catalogue of Life provide a critical basis for ensuring that biological information is accessible in units that make biological sense. When it comes to operating within this artificial world, I hope to make the argument that, without taxonomy, it isn't biology.
  • #3: Scientific names serve to label biodiversity information: information related to species. They provide (if not the sole, then the key) biological context for associated content, data, information, etc. This includes not just physical observations but, more importantly, anything we record as data, information, or knowledge related to a species.
  • #4: Scientific names label data objects you might traditionally associate with biodiversity: specimens, surveys, samples, etc. They also, however, provide the sole biological and evolutionary context for gene sequences, scientific publications, images, books, non-scientific articles, news stories, etc.
  • #5: This system of utilizing names for taxa has been in use for over 250 years. As a result, ALL information related to a species is labeled with a name, or, as we will hear today, some sort of identifying label. Names, therefore, serve as identifiers for taxa the same way we use symbols, numbers and labels as identifiers for objects we refer to in other aspects of our lives. This ubiquity would imply a key role for names as identifiers for accessing information related to species since, increasingly, data and information of all stripes are available online. If only we can find it. Names, and their related taxonomic definitions, however, present instabilities that limit their use as identifiers in information retrieval. These problems and their ramifications can impact the integrity of the use and analysis of biological data.
  • #6: Semiotics is the study of meaning-making. It provides a useful model for describing the relationship between symbols such as names, and the objects to which they refer.
  • #7: Semiotics distinguishes syntactics, which governs the rules and relationships among names, from semantics, which represents the relations between those labels and the objects to which they refer.
  • #8: Need Narrative
  • #9: The relationship between syntax and semantics, and how it intersects our discussion on biological taxonomy can be illustrated with the triangle of reference, or the semiotic triangle. In the model, there is no direct relationship between the name and the real-world object, the bird, it represents. Meaning, or the relationship between the name and the object, is conveyed only through a concept that exists in the mind of the user of the name.
  • #10: In taxonomy, a biologist (A) determines that a specimen is sufficiently distinct to constitute a new species, documents the concept or idea of this novelty in a publication, and assigns a name to it. Another person (B), subsequently reading the name, perhaps as a label on a specimen, evokes the concept originally described by the biologist to refer to the specimen. Accurate communication occurs when the concepts held by the writer and the reader are congruent.
  • #11: In order to function as useful identifiers in information retrieval, be it by visiting the library in person and going through the shelves or searching online, the relationship between an identifier and the thing it identifies needs to be stable and unique. It needs to be one to one. This is why your social security number makes a good identifier and your name does not. Not only did my sister's name change when she got married, she is also not the only Linda Richardson in the country. Likewise, thumbs-up can mean "good" in America and it can also mean "may I have a ride," but in some parts of the world it might get you punched in the nose. In biology the relationship between nomenclature and taxonomy is inconsistent. Both syntax and semantics are subject to change. This inconsistency places limits on how names may be used in biological informatics in initially anchoring, and in the subsequent retrieval and integration of, relevant biodiversity information.
  • #12: Relevance in the context of information retrieval has two measures: precision and recall. This model demonstrates how they differ. As any Google search will demonstrate, the results retrieved via a keyword search do not always deliver what you asked for. Furthermore, you have no way of knowing if some relevant content was missed. Precision is the proportion of the objects returned in a search that are relevant. False positives are those items returned that are not relevant. Recall is the proportion of all relevant objects actually available that are returned. False negatives are relevant items that were not returned.
  • #13: Bringing these four items together allows me to now articulate how we must use taxonomic sources like the Catalogue of Life to support the publication (or sharing), access and scientific analysis and use of biological data. I will illustrate where the current system places some limits on this use, and some later presentations will demonstrate how they are pushing these limits. I'll try to illustrate how these issues can threaten the delivery, integrity and scope of biological data and the precise nature of that impact.
  • #14: Many of you are familiar with the Woods Hole squid, Loligo pealeii, and its giant axon, which has been used for decades as a neurophysiological model. This name was originally published in 1821 by Lesueur. Ten or twelve years ago, Michael Vecchione and others published a revision of the loliginids that resulted in this species being transferred to a different genus. It was the same species, so the semantics didn't change. But just like the syntactic conventions that resulted in my sister changing her name when she got married, the nomenclatural rules that compose a species name from a genus part and a species part result in a new name, Doryteuthis pealeii. In this case the genus Doryteuthis was further subdivided, resulting in the more complex compound name Doryteuthis (Amerigo) pealeii.
  • #15: The impact of these nomenclatural changes is not hard to predict. With more than one name referring to the same taxon, a person seeking information about it must utilize both names to retrieve all relevant information. In addition, people who work with this species may not know of, or even agree with, this genus change, so both names continue to be used. Here are two articles published by MBL researchers demonstrating this. It's also easy to imagine these two names being misinterpreted as referring to two different species. John Furfey will provide more details and examples of this in his upcoming talk.
  • #16: The formal rules of nomenclature, combined with latitude in how people record scientific names, can inflate the number of name variants one must account for in order to access relevant information related to a species. It's not hard to see how relying on a single correctly formed name can easily result in a false negative: a failed match against relevant data or information recorded under a variant.
  • #17: Let's look at how a change in semantics with no corresponding change in syntax impacts relevance, and what this means for scientific use of biological information. Pneumocystis carinii is an opportunistic fungal pathogen that causes an often deadly pneumonia in immune-compromised people. It was originally isolated in rats and dogs.
  • #18: In 2002, new molecular evidence led to an assertion that the form that infects humans was distinct from the animal form. This led to a splitting of the traditional concept of the taxon and the creation of a new name, Pneumocystis jiroveci.
  • #19: This did not go unnoticed or unchallenged within the medical research community. It is, however, common and part of the dynamism that is modern taxonomy. Every year, approximately 1% of all scientific names become invalidated, either because a reclassification results in a syntactic change or because two or more taxa are merged (a semantic change) and one of the names is no longer used. In this case a species was split, so let's quickly explore the consequences.
  • #20: In the case of Pneumocystis carinii, the original, pre-2002 form consisted of organisms that infect rats, dogs and humans. Following 2002, Pneumocystis carinii refers only to the taxon that infects non-humans. In taxonomy these two different circumscriptions for the same nominal taxon are known as different taxon concepts. The semiotic term is polysemy, or multiple meanings. Please note that, while this occurred serially, it is not uncommon for different taxon concepts to occur in parallel, with supporters for each circumscription. Putting aside the new taxon P. jiroveci for a moment, let's look at the informatics consequences of polysemy.
  • #21: Polysemy results in a reduction in precision in information retrieval. The name Pneumocystis carinii, because it was conserved and is used identically after the split, is ambiguous. Studies that focus only on the current sense of the taxon cannot rely on the name alone to retrieve relevant data objects. Human-related instances representing false-positive results will dominate. In some cases, where such ambiguity is known, users of these data know to scrutinize at the object level to disambiguate these results. In many cases, however, the details of the taxonomic revision are unknown to users, and data retrieval and subsequent use do not account for this ambiguity. The result is analyses, inferences and possibly conclusions that are less precise than assumed.
  • #22: It's important to recognize that this impact on precision represents a limit to the use of scientific names in their application to taxa. Conserving the same name results in three distinct taxa and two distinct labels. Recall that identifier stability requires a one-to-one relationship between syntax and semantics. One vision under discussion this week is a global taxonomic clearinghouse that catalogs and uniquely identifies these different semantic views and utilizes an updated set of nomenclatural rules to uniquely identify each concept using terms a biologist might actually adopt.
  • #23: This slide discusses how even with the minting of identifiers ambiguity remains until and unless those identifiers are retrospectively applied within objects recorded in the sense of the earlier merged concept. This requires an object by object evaluation.
  • #24: Wrap with an extreme view of the MANY-TO-MANY relationship between SYNTAX and SEMANTICS. Halichondria in the sense of WORMS/SOEST is the result of grouping a large set of previously described taxa for this cosmopolitan species. Many of these taxa include additional combinations, creating an enormous set of both homotypic and heterotypic synonyms. The net result from a scientific use standpoint is that 1) you need to include all these names within searches of heterogeneous data systems in order to ensure high recall, but 2) given the high degree of semantic change, you will have to deal with potentially significant ambiguities in precision. Data objects linked with these names may refer to completely different species today. CAVEAT EMPTOR.
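
The precision and recall effects described in the notes can be made concrete with a small calculation. The following is a minimal sketch in Python, not part of the original talk: the `Record` schema, the record ids, the `host` field and all counts are invented for illustration, and matching is done on exact name strings only. It shows a single-name search losing recall when a species has several names (the squid example) and a name-only search losing precision when one name covers two taxon concepts (the Pneumocystis example).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A toy data object labelled with a scientific name (hypothetical schema)."""
    record_id: str
    name: str        # name string exactly as recorded
    host: str = ""   # illustrative field, used only to decide true relevance


def search_by_names(records: list[Record], names: set[str]) -> set[str]:
    """Return ids of records whose name string exactly matches a query name."""
    return {r.record_id for r in records if r.name in names}


def precision_recall(selected: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = TP / selected; recall = TP / relevant (0.0 if undefined)."""
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented toy records: one squid species recorded under three name strings.
squid = [
    Record("s1", "Loligo pealeii"),
    Record("s2", "Doryteuthis pealeii"),
    Record("s3", "Loligo pealeii"),
    Record("s4", "Doryteuthis (Amerigo) pealeii"),
]
squid_relevant = {r.record_id for r in squid}  # all four are the same species

# Change in syntax, no change in semantics: one name misses relevant records.
one_name = precision_recall(
    search_by_names(squid, {"Doryteuthis pealeii"}), squid_relevant
)
all_names = precision_recall(
    search_by_names(
        squid,
        {"Loligo pealeii", "Doryteuthis pealeii", "Doryteuthis (Amerigo) pealeii"},
    ),
    squid_relevant,
)

# Invented toy records: the name P. carinii used before and after the split.
pneumo = [
    Record("p1", "Pneumocystis carinii", host="rat"),
    Record("p2", "Pneumocystis carinii", host="human"),  # pre-split usage
    Record("p3", "Pneumocystis carinii", host="human"),  # pre-split usage
    Record("p4", "Pneumocystis carinii", host="rat"),
    Record("p5", "Pneumocystis jiroveci", host="human"),
]
# The user wants the current (non-human) sense of P. carinii only.
pneumo_relevant = {r.record_id for r in pneumo if r.host != "human"}

# Change in semantics, no change in syntax: the name returns false positives.
polysemy = precision_recall(
    search_by_names(pneumo, {"Pneumocystis carinii"}), pneumo_relevant
)

print("single name (precision, recall):", one_name)    # (1.0, 0.25)
print("all synonyms (precision, recall):", all_names)  # (1.0, 1.0)
print("polysemous name (precision, recall):", polysemy)  # (0.5, 1.0)
```

In this toy setup, adding synonyms to the query restores recall, but no amount of name expansion restores precision for the polysemous name; that would require a separate identifier for each taxon concept applied to the records themselves.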