ISOcat -> LMF -> TEI (Dictionaries)
Menzo Windhouwer
The Language Archive, MPI-PL
Menzo.Windhouwer@mpi.nl
12 October 2011
TEI Lexical workshop - Würzburg, Germany

Outline
Introduction to ISOcat, an ISO 12620:2009 compliant Data Category Registry (DCR)
ISOcat and the Lexical Markup Framework (LMF; ISO 24613:2008)
ISOcat and TEI (Dictionaries)

ISO 12620:2009
Terminology and other content and language resources: Specification of data categories and management of a Data Category Registry for language resources
An ISO TC 37/SC 3 standard
Replaces ISO 12620:1999, a hardcoded list of Data Categories, with a registry for (standardized) Data Categories

What is a Data Category?
The result of the specification of a given data field
A data category is an elementary descriptor in a linguistic structure or an annotation scheme.
Specification consists of 3 main parts:
Administrative part
Administration and identification
Descriptive part
Documentation in various working languages
Linguistic part
Conceptual domain(s) for various object languages

Data category example
Data category: /grammatical gender/
English definition: Category based on (depending on languages) the natural distinction between sex and formal criteria.
French definition: Catégorie fondée (selon la langue) sur la distinction naturelle entre les sexes ou d'autres critères formels.
Morphosyntax conceptual domain: /male/, /feminine/, /neuter/
French conceptual domain: /male/, /feminine/
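
The three parts of a Data Category specification can be pictured as a small record. The following is only a sketch: the class name `DataCategory`, its field names, and the dictionary keys (`"en"`, `"fr"`, `"morphosyntax"`) are assumptions made for illustration and are not taken from ISO 12620 or from the ISOcat data model. The values are the ones shown in the /grammatical gender/ example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataCategory:
    """Hypothetical record mirroring the three parts of a specification."""

    # Administrative part: administration and identification
    identifier: str
    # Descriptive part: documentation keyed by working language
    definitions: Dict[str, str] = field(default_factory=dict)
    # Linguistic part: conceptual domains keyed by profile or object language
    conceptual_domains: Dict[str, List[str]] = field(default_factory=dict)

    def values_for(self, key: str) -> List[str]:
        """Return the conceptual domain stored under the given key, if any."""
        return self.conceptual_domains.get(key, [])


gender = DataCategory(
    identifier="grammatical gender",
    definitions={
        "en": "Category based on (depending on languages) the natural "
              "distinction between sex and formal criteria.",
        "fr": "Catégorie fondée (selon la langue) sur la distinction "
              "naturelle entre les sexes ou d'autres critères formels.",
    },
    conceptual_domains={
        "morphosyntax": ["male", "feminine", "neuter"],
        "fr": ["male", "feminine"],
    },
)

print(gender.values_for("fr"))
```

Keeping the conceptual domains in a mapping reflects the point that the same Data Category can have a different value set per object language.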
What is a Data Category Registry?
www.isocat.org
A (coherent) set of Data Categories, in our case for linguistic resources
A system to manage this set:
Create and edit Data Categories
Share Data Categories, e.g., resolve PID references
Grass roots approach
ISOcat and LMF
§4.4 ISO 12620 Data Category Registry (DCR)
The designers of an LMF conformant lexicon shall use data categories from the ISO 12620 Data Category Registry (DCR) located at www.isocat.org.
§5.4 LMF data category selection procedures
Create a Data Category Selection
Add Data Categories to ISOcat if needed
Missing: how to refer to ISOcat Data Categories?
Data Category identifiers are ambiguous
<LexicalEntry>
  <feat att="partOfSpeech" val="commonNoun"/>
  ...
</LexicalEntry>
ISOcat contains two exact matches for commonNoun and one close match.
Why are identifiers ambiguous?
Several thematic domains can use the same name for a (slightly) different Data Category
This was already true in the predecessor of ISOcat, SYNTAX (legacy)
There may be multiple versions of the same Data Category
Due to semantic drift or rot the name cannot just point to the latest version
Users can also create Data Categories with the same name
In the future they may even copy a Data Category to extend its conceptual domain
Identifier should have been renamed, e.g., to mnemonic
ISOcat Data Category PIDs are unique
Each ISOcat Data Category (version) has a unique PID
http://www.isocat.org/datcat/DC-1256: /common noun/ by Gil Francopoulo
ISO 12620:2009 Annex A provides a small vocabulary to annotate an XML document with Data Category PID references:
<feat att="partOfSpeech"
      dcr:datcat="http://www.isocat.org/datcat/DC-1345"
      val="commonNoun"
      dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
Preferably annotate the schema of the resource
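
As a rough illustration of how a consumer could read these references, the sketch below parses the annotated `<feat>` element with Python's standard `xml.etree.ElementTree` and returns the PIDs for the attribute and its value. The namespace URI bound to the `dcr:` prefix and the helper name `feature_pids` are assumptions for this example. The slide does not show the namespace declaration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the dcr: prefix (not shown on the slide).
DCR_NS = "http://www.isocat.org/ns/dcr"

SNIPPET = f"""
<LexicalEntry xmlns:dcr="{DCR_NS}">
  <feat att="partOfSpeech"
        dcr:datcat="http://www.isocat.org/datcat/DC-1345"
        val="commonNoun"
        dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
</LexicalEntry>
""".strip()


def feature_pids(xml_text):
    """Map each (att, val) pair to its (attribute PID, value PID) pair."""
    root = ET.fromstring(xml_text)
    result = {}
    for feat in root.iter("feat"):
        key = (feat.get("att"), feat.get("val"))
        # ElementTree addresses namespaced attributes as "{uri}localname".
        result[key] = (
            feat.get(f"{{{DCR_NS}}}datcat"),
            feat.get(f"{{{DCR_NS}}}valueDatcat"),
        )
    return result


print(feature_pids(SNIPPET))
```

With the PIDs attached, two resources that both use the name commonNoun can be compared by reference rather than by name.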
TEI feature structure declarations
<tei:fDecl name="partOfSpeech" dcr:datcat="http://www.isocat.org/datcat/DC-1345">
  <tei:vRange>
    <tei:vAlt>
      <tei:symbol value="commonNoun" dcr:datcat="http://www.isocat.org/datcat/DC-1256"/>
      ...
    </tei:vAlt>
  </tei:vRange>
</tei:fDecl>
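
Annotating the declaration rather than every instance means a tool can resolve PIDs once, from the schema. The sketch below shows one way this might look with Python's standard `xml.etree.ElementTree`. The function name `declared_pids` and the `dcr:` namespace URI are assumptions for this example, and the other alternatives inside `<tei:vAlt>` are left out because the slide elides them.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Assumption: namespace URI bound to the dcr: prefix (not shown on the slide).
DCR_NS = "http://www.isocat.org/ns/dcr"

FDECL = f"""
<tei:fDecl xmlns:tei="{TEI_NS}" xmlns:dcr="{DCR_NS}"
           name="partOfSpeech"
           dcr:datcat="http://www.isocat.org/datcat/DC-1345">
  <tei:vRange>
    <tei:vAlt>
      <tei:symbol value="commonNoun"
                  dcr:datcat="http://www.isocat.org/datcat/DC-1256"/>
      <!-- further alternatives elided on the slide -->
    </tei:vAlt>
  </tei:vRange>
</tei:fDecl>
""".strip()


def declared_pids(fdecl_xml):
    """Return (feature name, feature PID, {symbol value: value PID})."""
    decl = ET.fromstring(fdecl_xml)
    datcat = f"{{{DCR_NS}}}datcat"
    values = {
        sym.get("value"): sym.get(datcat)
        for sym in decl.iter(f"{{{TEI_NS}}}symbol")
    }
    return decl.get("name"), decl.get(datcat), values


print(declared_pids(FDECL))
```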
TEI and ISOcat Data Category PIDs
Is TEI open to attributes from foreign namespaces?
dcr:* attributes can already be used
Or can the dcr:* attributes be part of the global attribute list?
It would make it possible to annotate any TEI element, incl. Dictionary elements, with a Data Category reference
The DCR data model now also includes container Data Categories and can thus also cover inner nodes
Could also (partially?) be done by <equiv/> statements in the ODD files
Scripts to do this (semi-)automatically have already been created
Or can at least the TEI/ISO feature structure part accept dcr:* attributes?
Add a DCR specific attribute list?
Would make the ISO TC 37 standards consistent: ISO 24610-1, ISO 24613:2008 and ISO 12620:2009
Could also be another TEI attribute that expresses equivalence with an external (URI) specification (like <equiv/> in ODD) and which isn't as much bound to ISOcat as the dcr:* attributes imply
Thank you for your attention!
Visit www.isocat.org
Questions? Menzo.Windhouwer@mpi.nl