Medical Informatics Laboratory
Department of Biomedical Engineering
College of Medicine, Seoul National Univ.
Eunsil Yoon
U.S. National Library of Medicine
National Institutes of Health
UMLS
(The Unified Medical Language System)
2012.11.29 Reviewed by Eunsil Yoon
Contents
• Introduction
  - What is the UMLS?
  - UMLS in use
  - www.nlm.nih.gov/research/umls
• The Three UMLS Tools (Knowledge Sources)
  - Metathesaurus
  - Semantic Network
  - SPECIALIST Lexicon
• UMLS in JAMIA papers
What is the UMLS?
• Started in 1986 by the U.S. National Library of Medicine (NLM)
• NLM is a member of the IHTSDO (owner of SNOMED CT)
What is the UMLS?
• Unified Medical Language System® (UMLS®)
• A set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.
• You can use the UMLS to enhance or develop applications, such as electronic health records, classification tools, dictionaries, and language translators.
• The UMLS is not an end-user application.
[Screenshots: NLM main page; NLM > UMLS; NLM > UMLS > UTS; NLM > UMLS > UTS > Metathesaurus Browser]
Metathesaurus Browser > Synonyms
Synonyms (246)
(Acute nasopharyngitis or rhinitis) or (common cold)
(Acute nasopharyngitis or rhinitis) or (common cold) (disorder)
ARNAS IBILBIDE GARAIETAKO ZOLDURA/ HOTZALDI ARRUNTA
Acut nasopharyngitis (meghűlés)
Acut rhinitis
Acute Nasopharyngitis
Acute coryza
Acute infectie bovenste luchtwegen
Acute infective rhinitis
Acute nasal catarrh
Acute nasofaryngitis [verkoudheid]
Acute nasopharyngitis
Acute nasopharyngitis (common cold)
Acute nasopharyngitis [common cold]
Acute nasopharyngitis, NOS
Acute rhinitis
Acute rhinitis (disorder)
Akute Rhinopharyngitis [Erkaeltungsschnupfen]
Akutní nazofaryngitida
Akutní rinitida
Akutní zánět nosohltanu (prosté nachlazení)
COLD
COMMON COLD
CORIZA
CORYZA
ПРОСТУДА
かぜ
かぜひき
かぜ症候群
コリーザ-急性
急性コリーザ
急性鼻咽頭炎
急性鼻咽頭炎(感冒)
急性鼻炎
感冒
感冒-普通
感冒症候群
感染性鼻炎
普通感冒
頭部感冒
風邪
鼻感冒
鼻炎(感染性)
[Screenshots: Metathesaurus Browser > Relations; NLM > UMLS > UTS > Metathesaurus Browser; NLM > UMLS > UTS > Semantic Network Browser]
The Three UMLS Tools (Knowledge Sources)
• Metathesaurus
• Semantic Network
• SPECIALIST Lexicon
Metathesaurus
• The Metathesaurus is a large, multi-purpose, multilingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them.
• Over 100 vocabularies, code sets, and thesauri ("source vocabularies") are brought together to create the Metathesaurus.
• Terms are organized by meaning, and each meaning is assigned a concept unique identifier (CUI).
• 62% of the Metathesaurus source vocabularies are English.
• It also contains terms from 17 other languages.

Ex. "Atrial Fibrillation"

Term                              Source vocabulary
Atrial fibrillation               ICD-9-CM
AF                                NCI Thesaurus
AFib                              MedDRA
Atrial fibrillation (disorder)    SNOMED Clinical Terms
atrium; fibrillation              ICPC2-ICD10 Thesaurus
Metathesaurus Basic organization
• Concepts
  - Synonymous terms are clustered into a concept
  - Properties are attached to concepts, e.g.,
    - Unique identifier
    - Definition
• Relations
  - Concepts are related to other concepts
  - Properties are attached to relations, e.g.,
    - Type of relationship
    - Source
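
The organization above can be sketched as a small data model. This is only an illustration, not the Metathesaurus schema: the class names `Concept` and `Relation`, the identifier values, the "Cardiac arrhythmia" concept, and the relation between the two are all made up for the example. Only the atrial fibrillation names and their source vocabularies come from the earlier slide.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A cluster of synonymous terms with properties attached."""
    cui: str                                    # unique identifier (placeholder values below)
    definition: str = ""
    names: list = field(default_factory=list)   # (term, source vocabulary) pairs


@dataclass
class Relation:
    """A link between two concepts, with its own properties."""
    from_cui: str
    to_cui: str
    rel_type: str   # type of relationship
    source: str     # vocabulary that asserts the relation


# Placeholder identifiers, chosen only for illustration.
afib = Concept(cui="C0000001", definition="(definition text would go here)")
for term, vocab in [
    ("Atrial fibrillation", "ICD-9-CM"),
    ("AF", "NCI Thesaurus"),
    ("AFib", "MedDRA"),
    ("Atrial fibrillation (disorder)", "SNOMED Clinical Terms"),
]:
    afib.names.append((term, vocab))

# Hypothetical second concept and relation, to show where relation properties live.
arrhythmia = Concept(cui="C0000002", names=[("Cardiac arrhythmia", "example")])
rel = Relation(afib.cui, arrhythmia.cui, rel_type="isa", source="example")

synonyms = sorted({term for term, _ in afib.names})
print(synonyms)
```

The point of the sketch is that names from different vocabularies hang off one concept, while the relationship type and its source are properties of the link rather than of either concept.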
Metathesaurus - subsets
• Users create a useful subset, or smaller grouping of concepts, by choosing source vocabularies
• Examples of subsets include
  - Source vocabularies in a language (all Spanish vocabularies)
  - All terms that are free for use within the United States
  - CPT codes to be used for billing purposes
  - Terms with the semantic type 'Clinical Drug'
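
Subsetting of this kind amounts to filtering rows by source vocabulary, language, or semantic type. The sketch below shows the idea on a few hand-written rows. The rows, the source names `EXAMPLE-SPA` and `EXAMPLE-DRUG`, and the function `subset` are hypothetical, and this is not how NLM's own tooling builds subsets.

```python
# Each row: (term, source vocabulary, language code, semantic type).
# All rows are illustrative; EXAMPLE-SPA and EXAMPLE-DRUG are invented source names.
ROWS = [
    ("Atrial fibrillation", "ICD-9-CM", "ENG", "Disease or Syndrome"),
    ("Fibrilación auricular", "EXAMPLE-SPA", "SPA", "Disease or Syndrome"),
    ("Aspirin 81 MG Oral Tablet", "EXAMPLE-DRUG", "ENG", "Clinical Drug"),
]


def subset(rows, sources=None, language=None, semantic_type=None):
    """Keep the rows that satisfy every criterion that is not None."""
    selected = []
    for row in rows:
        term, source, lang, sem_type = row
        if sources is not None and source not in sources:
            continue
        if language is not None and lang != language:
            continue
        if semantic_type is not None and sem_type != semantic_type:
            continue
        selected.append(row)
    return selected


spanish = subset(ROWS, language="SPA")
clinical_drugs = subset(ROWS, semantic_type="Clinical Drug")
icd_only = subset(ROWS, sources={"ICD-9-CM"})
print(spanish, clinical_drugs, icd_only, sep="\n")
```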
Metathesaurus - Unique Identifiers
• Concept Unique Identifiers (CUI)
  - A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms).
• Lexical (term) Unique Identifiers (LUI)
  - LUIs link strings that are lexical variants. Lexical variants are detected using the Lexical Variant Generator (LVG) program, one of the UMLS lexical tools.
• String Unique Identifiers (SUI)
  - Each unique concept name or string in each language in the Metathesaurus has a unique and permanent string identifier (SUI). Any variation in character set, upper-lower case, or punctuation is a separate string, with a separate SUI. A SUI consists of the letter S followed by seven digits. In the example on the right there are four strings with four different SUIs.
• Atom Unique Identifiers (AUI)
  - The basic building blocks or "atoms" from which the Metathesaurus is constructed are the concept names or strings from each of the source vocabularies. Every occurrence of a string in each source vocabulary is assigned a unique atom identifier (AUI).
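
The difference between SUI and LUI can be illustrated with a toy assignment routine. It rests on loud assumptions: `toy_normalize` is a crude stand-in for lexical normalization (lowercase, drop punctuation, sort words) and is not the LVG program, the identifiers are generated on the spot rather than taken from the UMLS, and the LUI format (L plus seven digits) simply mirrors the SUI format described above.

```python
import re
from itertools import count


def toy_normalize(s):
    """Crude stand-in for lexical normalization: lowercase, drop punctuation, sort words."""
    words = re.findall(r"[a-z0-9]+", s.lower())
    return " ".join(sorted(words))


def assign_ids(strings):
    """Give each distinct string its own SUI and each normalized form its own LUI."""
    sui_of, lui_of = {}, {}
    sui_seq, lui_seq = count(1), count(1)
    rows = []
    for s in strings:
        if s not in sui_of:
            sui_of[s] = f"S{next(sui_seq):07d}"
        norm = toy_normalize(s)
        if norm not in lui_of:
            lui_of[norm] = f"L{next(lui_seq):07d}"
        rows.append((s, sui_of[s], lui_of[norm]))
    return rows


rows = assign_ids([
    "Atrial Fibrillation",
    "atrial fibrillation",
    "Fibrillation, Atrial",
    "AFib",
])
for row in rows:
    print(row)
```

Under these assumptions the four strings receive four different SUIs, the first three share one LUI because they differ only in case, punctuation, and word order, and "AFib" gets a LUI of its own.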
Metathesaurus - Unique Identifiers > Atom
[Screenshot: atom details, with the labels "obsolete" and "suppressible"]
Metathesaurus - Data Files
• The Metathesaurus consists of forty data, metadata, and index files.
• The data files listed below contain information obtained from the source vocabularies.
• The table below illustrates what information populates each data file.

File name      Contents
MRCONSO.RRF    Names, Synonyms, Terms, Term Types, Codes
MRREL.RRF      Relationships
MRHIER.RRF     Hierarchies
MRSAT.RRF      Attributes
MRDEF.RRF      Definitions
MRMAP.RRF      Mappings
MRSMAP.RRF     Simplified Mappings
MRSTY.RRF      Semantic Types
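
RRF files are pipe-delimited text, one record per line, so a row of MRCONSO.RRF can be read with a few lines of code. The sketch below assumes the usual 18-column MRCONSO layout and a trailing `|` on every row; check both against the column documentation shipped with your release. The sample line is invented, with placeholder identifiers and a placeholder code.

```python
# Assumed MRCONSO.RRF column order; verify against the release documentation.
MRCONSO_FIELDS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line, fields=MRCONSO_FIELDS):
    """Split one pipe-delimited RRF row into a dict keyed by column name."""
    parts = line.rstrip("\n").split("|")
    if parts and parts[-1] == "":
        parts = parts[:-1]          # drop the empty piece after the trailing "|"
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
    return dict(zip(fields, parts))


# Invented sample row: identifiers and code are placeholders, not real UMLS values.
SAMPLE = (
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||"
    "MSH|MH|D000001|Atrial Fibrillation|0|N||\n"
)

row = parse_rrf_line(SAMPLE)
print(row["CUI"], row["SAB"], row["STR"])
```

Raising on a field-count mismatch is a deliberate choice here: a silently misaligned row would put names in the wrong columns.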
[Screenshot: Metathesaurus - Data Files > RRF]
The Semantic Network
• The Semantic Network consists of
  - Semantic types (high-level categories)
  - Semantic relationships (relationships between semantic types)
• The Semantic Network can be used to categorize any medical vocabulary.
• There are 133 semantic types in the Semantic Network.
• Every Metathesaurus concept is assigned at least one semantic type; very few terms are assigned as many as five semantic types.
The Semantic Network - Type
• Entity
  - A broad type for grouping physical and conceptual entities.
  - Examples of Entity semantic types are:
    - Amphibian
    - Gene or Genome
    - Carbohydrate
• Event
  - A broad type for grouping activities, processes, and states.
  - Examples of Event semantic types are:
    - Social Behavior
    - Laboratory Procedure
    - Mental Process
Semantic Network - Physical Object
[Figure: isa hierarchy under Entity > Physical Object, with four main branches: Organism, Anatomical Structure, Manufactured Object, and Substance]
Semantic Network - Conceptual Entity
[Figure: isa hierarchy under Entity > Conceptual Entity, with branches including Finding, Idea or Concept, Occupation or Discipline, Language, Intellectual Product, Organism Attribute, Group, Group Attribute, and Organization]
Semantic Network - Event
[Figure: isa hierarchy under Event, with two main branches: Activity and Phenomenon or Process]
The Semantic Network - Relationships
• 54 semantic relationships
• The primary link between most semantic types is the 'isa' relationship.
  - Animal isa Entity
  - Carbohydrate isa Chemical
  - Human isa Mammal
[ Relation Label ]
• isa
• part_of
• result_of
• co-occurs_with
• evaluation_of
• location_of
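
Because 'isa' is hierarchical, a statement such as "Animal isa Entity" follows from a chain of direct links. The sketch below walks such a chain. The `ISA` table is a simplified, single-parent subset of links written out by hand for illustration, not an export of the Semantic Network, and the function names are hypothetical.

```python
# Simplified child -> parent links; a hand-written subset for illustration only.
ISA = {
    "Human": "Mammal",
    "Mammal": "Vertebrate",
    "Vertebrate": "Animal",
    "Animal": "Organism",
    "Organism": "Physical Object",
    "Physical Object": "Entity",
}


def ancestors(sem_type, isa=ISA):
    """Return every type reachable from sem_type by following isa links upward."""
    chain = []
    while sem_type in isa:
        sem_type = isa[sem_type]
        chain.append(sem_type)
    return chain


def is_a(child, parent, isa=ISA):
    """True if parent is reachable from child through one or more isa links."""
    return parent in ancestors(child, isa)


print(ancestors("Human"))
print(is_a("Animal", "Entity"), is_a("Entity", "Animal"))
```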
[Figure: The Semantic Network - Relationships]
SPECIALIST Lexicon
• A lexicon is necessarily a core component of any natural language processing system.
• Coverage includes both commonly occurring English words and biomedical vocabulary discovered in the NLM Test Collection and the UMLS Metathesaurus.
• The lexicon entry for each word or term records syntactic, morphological, and graphemic information.
  - Syntactic information includes syntactic category (part of speech) and complementation patterns for verbs, adjectives, and nouns, as well as positional and modification types for adjectives and adverbs.
  - Inflectional morphology is indicated for those syntactic categories which inflect, and spelling variation is recorded for each lexical item known to exhibit such variation.
[Figure: SPECIALIST NLP Tools]
UMLS in JAMIA papers
[1] Wu S.T., Liu H., et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association: JAMIA, 19(e1), e149-e156.
[1] UMLS term occurrences in clinical notes
• Objective
  - To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources.
• Data Sources
  - The data source for the corpus analysis of clinical text was Mayo Clinic clinical notes between 1 January 2001 and 31 December 2010, retrieved from Mayo's Enterprise Data Trust (EDT).
  - 51,945,627 documents
  - 296,167 unique terms
  - 2,319,010,575 case-insensitive exact term matches
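
A "case-insensitive exact term match" can be pictured as counting whole-token occurrences of each term string while ignoring case. The sketch below is one way to do that on a toy scale. It is not the pipeline used in the paper; the function name, the term list, and the two documents are invented for illustration.

```python
import re
from collections import Counter


def count_term_matches(terms, documents):
    """Count case-insensitive, whole-token occurrences of each term across documents."""
    # The lookarounds keep "cold" from matching inside a longer word such as "scolded".
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    }
    counts = Counter()
    for doc in documents:
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(doc))
    return counts


# Invented example notes.
docs = [
    "Patient has atrial fibrillation. AFib noted on ECG.",
    "No history of Atrial Fibrillation or common cold.",
]
counts = count_term_matches(["atrial fibrillation", "AFib", "common cold"], docs)
print(counts)
```

At the scale reported in the paper a per-term regular expression would be far too slow; a dictionary-based matcher would be the more realistic choice, but the counting rule stays the same.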
[1] UMLS term occurrences in clinical notes
• Corpus Analysis - Word Statistics
  - Figure 1 shows histograms for the number of words in the UMLS and in the subset that is empirically found in Mayo Clinic data.
[Figures: Corpus Analysis - Term Frequency; Corpus Analysis - Source Terminology; Corpus Analysis - Syntactic Categories]
• Cross-Institutional analysis
  1. Special characters
  2. Maximum number of words
  3. Maximum number of characters
  4. Language
  5. Source terminology
  6. Semantic group
  7. Empirical occurrence filter
  8. Term frequency
• Source terminologies
  - SNOMED-CT
  - Consumer Health Vocabulary
  - National Cancer Institute (NCI) Thesaurus
  - Medical Subject Headings (MSH)
  - Read Codes
  - Medical Dictionary for Regulatory Activities Terminology (MedDRA)
  - SNOMED International
  - MEDCIN
  - UMLS Metathesaurus
  - National Drug File - Reference Terminology (NDF-RT)
  - The original SNOMED
  - Online Mendelian Inheritance in Man (OMIM)
  - Logical Observation Identifiers Names and Codes (LOINC)
  - Computer Retrieval of Information on Scientific Projects (CRISP)
• Semantic groups
  - Anatomy
  - Chemicals & drugs
  - Concepts & ideas
  - Disorders
  - Living beings
  - Physiology
  - Procedures
Reference
• UMLS; http://www.nlm.nih.gov/research/umls
• UMLS Basics Tutorial; http://www.nlm.nih.gov/research/umls/new_users/online_learning/index.htm
• UTS; https://uts.nlm.nih.gov/
• Wu S.T., Liu H., et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association: JAMIA, 19(e1), e149-e156.
• ???, ???, ???. 'UMLS Metathesaurus 2004? ??? ?? ??- Rich Release Format(RRF)? ??'
