The document describes the Unified Medical Language System (UMLS), which brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems. It discusses the three main UMLS knowledge sources: the Metathesaurus, Semantic Network, and SPECIALIST Lexicon. The Metathesaurus contains over 100 vocabularies and provides concept unique identifiers to link synonymous terms. The Semantic Network categorizes terms into 133 semantic types and 54 relationships. The SPECIALIST Lexicon contains syntactic and morphological information to support natural language processing. An example journal article analyzing UMLS term occurrences in clinical notes is also mentioned.
1. Medical Informatics Laboratory
Department of Biomedical Engineering
College of Medicine, Seoul National Univ.
Eunsil Yoon
U.S. National Library of Medicine
National Institutes of Health
UMLS
(The Unified Medical Language System)
2012.11.29 Reviewed by Eunsil Yoon
Contents
• Introduction
  – What is the UMLS?
  – UMLS in Use
  – www.nlm.nih.gov/research/umls
• The Three UMLS Tools (Knowledge Sources)
  – Metathesaurus
  – Semantic Network
  – SPECIALIST Lexicon
• UMLS in JAMIA papers
What is the UMLS?
• Started in 1986 by the NLM (National Library of Medicine)
• NLM is a member of the IHTSDO (owner of SNOMED CT)
• Unified Medical Language System® (UMLS®)
• A set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.
• You can use the UMLS to enhance or develop applications, such as electronic health records, classification tools, dictionaries, and language translators.
The UMLS is not an end-user application.
The Three UMLS Tools (Knowledge Sources)
• Metathesaurus
• Semantic Network
• SPECIALIST Lexicon
Metathesaurus
• The Metathesaurus is a large, multi-purpose, multilingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them.
• Over 100 vocabularies, code sets, and thesauri ("source vocabularies") are brought together to create the Metathesaurus.
• Names are organized by meaning, and each meaning is assigned a concept unique identifier (CUI).
• 62% of the Metathesaurus source vocabularies are in English.
• It also contains terms from 17 other languages.

Ex. “Atrial Fibrillation”
Name                              Source vocabulary
Atrial fibrillation               ICD-9-CM
AF                                NCI Thesaurus
AFib                              MedDRA
Atrial fibrillation (disorder)    SNOMED Clinical Terms
atrium; fibrillation              ICPC2-ICD10 Thesaurus
Metathesaurus Basic organization
• Concepts
  – Synonymous terms are clustered into a concept
  – Properties are attached to concepts, e.g.,
    • Unique identifier
    • Definition
• Relations
  – Concepts are related to other concepts
  – Properties are attached to relations, e.g.,
    • Type of relationship
    • Source
Metathesaurus - subsets
• Users create a useful subset, or smaller grouping of concepts, by choosing source vocabularies
• Examples of subsets include
  – Source vocabularies in a language (all Spanish vocabularies)
  – All terms that are free for use within the United States
  – CPT codes to be used for billing purposes
  – Terms with the semantic type 'Clinical Drug'
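
The subsetting idea above can be sketched in a few lines of Python. This is only an illustration of filtering by source vocabulary and language, not how NLM's own tooling builds subsets. The `Atom` record, the `subset` helper, the identifiers, and the source abbreviations (`SRC_A`, `SRC_B`, `SRC_C`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One concept name as contributed by one source vocabulary."""
    cui: str       # concept identifier (made-up values below)
    name: str      # the name string
    source: str    # source vocabulary abbreviation (hypothetical)
    language: str  # three-letter language code


# Toy data; none of these identifiers come from a real UMLS release.
ATOMS = [
    Atom("C0000001", "Atrial fibrillation", "SRC_A", "ENG"),
    Atom("C0000001", "Fibrilación auricular", "SRC_B", "SPA"),
    Atom("C0000002", "Aspirin 81 mg tablet", "SRC_C", "ENG"),
]


def subset(atoms, sources=None, languages=None):
    """Keep atoms whose source and language are in the chosen sets.

    A filter left as None is not applied.
    """
    return [
        atom for atom in atoms
        if (sources is None or atom.source in sources)
        and (languages is None or atom.language in languages)
    ]


spanish_only = subset(ATOMS, languages={"SPA"})
two_sources = subset(ATOMS, sources={"SRC_A", "SRC_C"})
print([atom.name for atom in spanish_only])
print([atom.name for atom in two_sources])
```

Because a subset is defined by the chosen source vocabularies, the concepts it contains follow from which atoms survive the filter.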
Metathesaurus – Unique Identifiers
• Concept Unique Identifiers (CUI)
  – A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms).
• Lexical (term) Unique Identifiers (LUI)
  – LUIs link strings that are lexical variants. Lexical variants are detected using the Lexical Variant Generator (LVG) program, one of the UMLS lexical tools.
• String Unique Identifiers (SUI)
  – Each unique concept name or string in each language in the Metathesaurus has a unique and permanent string identifier (SUI). Any variation in character set, upper/lower case, or punctuation is a separate string with a separate SUI. An SUI consists of the letter S followed by seven digits. In the example on the right there are four strings with four different SUIs.
• Atom Unique Identifiers (AUI)
  – The basic building blocks, or "atoms", from which the Metathesaurus is constructed are the concept names or strings from each of the source vocabularies. Every occurrence of a string in each source vocabulary is assigned a unique atom identifier (AUI).
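
The nesting of the four identifiers (a concept groups lexical variants, a lexical variant groups strings, a string groups atoms) can be made concrete with a small sketch. All identifiers and source abbreviations below are made up for illustration, and `build_index` is a hypothetical helper, not part of any UMLS tool.

```python
from collections import defaultdict

# (CUI, LUI, SUI, AUI, source, string); every identifier here is invented.
ATOMS = [
    # Same string from two sources: same SUI, different AUIs.
    ("C0000001", "L0000001", "S0000001", "A0000001", "SRC_A", "Atrial fibrillation"),
    ("C0000001", "L0000001", "S0000001", "A0000002", "SRC_B", "Atrial fibrillation"),
    # Case difference: a different string (new SUI) but the same lexical variant group.
    ("C0000001", "L0000001", "S0000002", "A0000003", "SRC_C", "atrial fibrillation"),
    # A synonym that is not a lexical variant: new LUI, same concept.
    ("C0000001", "L0000002", "S0000003", "A0000004", "SRC_C", "AFib"),
]


def build_index(atoms):
    """Nest atoms as CUI -> LUI -> SUI -> list of (AUI, source, string)."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for cui, lui, sui, aui, source, string in atoms:
        index[cui][lui][sui].append((aui, source, string))
    return index


index = build_index(ATOMS)
concept = index["C0000001"]
for lui, strings in concept.items():
    for sui, atom_list in strings.items():
        print(lui, sui, [aui for aui, _, _ in atom_list])
```

In this toy data the single concept holds two LUIs, three SUIs, and four AUIs, which mirrors the rule that every occurrence of a string in a source vocabulary gets its own atom identifier.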
Metathesaurus – Unique Identifiers > Atom
[Figure: atom example; labels: obsolete, suppressible]
Metathesaurus – Data Files
• The Metathesaurus consists of forty data, metadata, and index files.
• The data files listed below contain information obtained from the source vocabularies.
• The table below illustrates what information populates each data file.

File Name      Contents
MRCONSO.RRF    Names, Synonyms, Terms, Term Types, Codes
MRREL.RRF      Relationships
MRHIER.RRF     Hierarchies
MRSAT.RRF      Attributes
MRDEF.RRF      Definitions
MRMAP.RRF      Mappings
MRSMAP.RRF     Simplified Mappings
MRSTY.RRF      Semantic Types
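
As a rough sketch of how one of these files might be read, the snippet below parses pipe-delimited rows in the style of MRCONSO.RRF. It assumes the 18-column layout listed in `MRCONSO_COLUMNS` and a trailing `|` on each row; check both against the documentation of the release you use. The sample rows, identifiers, and source abbreviations are invented, and `parse_mrconso` and `english_names_by_cui` are hypothetical helper names.

```python
import io
from collections import defaultdict

# Assumed column order for MRCONSO.RRF (verify against your release).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

# Invented rows that only imitate the file's shape.
SAMPLE = (
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SRC_A|PT|X1|Atrial fibrillation|0|N||\n"
    "C0000001|ENG|S|L0000002|PF|S0000002|Y|A0000002||||SRC_B|SY|X2|AFib|0|N||\n"
    "C0000001|SPA|P|L0000003|PF|S0000003|Y|A0000003||||SRC_C|PT|X3|Fibrilación auricular|0|N||\n"
)


def parse_mrconso(lines):
    """Yield one dict per row, keyed by column name."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("|")
        # Each row ends with "|", so splitting leaves one empty trailing element.
        if fields[-1] == "":
            fields = fields[:-1]
        if len(fields) != len(MRCONSO_COLUMNS):
            raise ValueError(f"expected {len(MRCONSO_COLUMNS)} fields, got {len(fields)}")
        yield dict(zip(MRCONSO_COLUMNS, fields))


def english_names_by_cui(rows):
    """Group English name strings under their concept identifier."""
    names = defaultdict(list)
    for row in rows:
        if row["LAT"] == "ENG":
            names[row["CUI"]].append(row["STR"])
    return dict(names)


rows = list(parse_mrconso(io.StringIO(SAMPLE)))
names = english_names_by_cui(rows)
print(names)
```

For a real file, the same generator could be fed an open file handle instead of the in-memory sample, for example `open("MRCONSO.RRF", encoding="utf-8")`, where the path is whatever your installation uses.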
The Semantic Network
• The Semantic Network consists of
  – Semantic types (high-level categories)
  – Semantic relationships (relationships between semantic types)
• The Semantic Network can be used to categorize any medical vocabulary.
• There are 133 semantic types in the Semantic Network.
• Every Metathesaurus concept is assigned at least one semantic type; very few terms are assigned as many as five semantic types.
The Semantic Network - Type
• Entity
  – A broad type for grouping physical and conceptual entities.
  – Examples of Entity semantic types are:
    • Amphibian
    • Gene or Genome
    • Carbohydrate
• Event
  – A broad type for grouping activities, processes, and states.
  – Examples of Event semantic types are:
    • Social Behavior
    • Laboratory Procedure
    • Mental Process
Semantic Network – Physical Object
[Figure: hierarchy of semantic types under Entity > Physical Object, with the branches Organism, Anatomical Structure, Manufactured Object, and Substance and their subtypes]
Semantic Network – Conceptual Entity
[Figure: hierarchy of semantic types under Entity > Conceptual Entity, with the branches Idea or Concept, Finding, Organism Attribute, Intellectual Product, Language, Occupation or Discipline, Organization, Group Attribute, and Group and their subtypes]
Semantic Network – Event
[Figure: hierarchy of semantic types under Event, with the branches Activity and Phenomenon or Process and their subtypes]
The Semantic Network - Relationships
• 54 semantic relationships
• The primary link between most semantic types is the ‘isa’ relationship.
  – Animal isa Entity
  – Carbohydrate isa Chemical
  – Human isa Mammal
[ Relation Label ]
• isa
• part_of
• result_of
• co-occurs_with
• evaluation_of
• location_of
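
Because ‘isa’ links chain together, a statement such as "Animal isa Entity" can be derived by walking up the hierarchy. The sketch below shows that walk on a small hand-written subset of semantic types. The intermediate links in `ISA` are an assumption based on the hierarchy figures in this deck and may not match a given Semantic Network release; `ancestors` and `is_a` are hypothetical helper names.

```python
# Direct 'isa' parent for a small, illustrative subset of semantic types.
# The intermediate links are assumed and may differ between releases.
ISA = {
    "Human": "Mammal",
    "Mammal": "Vertebrate",
    "Vertebrate": "Animal",
    "Animal": "Organism",
    "Organism": "Physical Object",
    "Carbohydrate": "Organic Chemical",
    "Organic Chemical": "Chemical Viewed Structurally",
    "Chemical Viewed Structurally": "Chemical",
    "Chemical": "Substance",
    "Substance": "Physical Object",
    "Physical Object": "Entity",
}


def ancestors(semantic_type):
    """Return all types reachable by following 'isa' links upward, nearest first."""
    chain = []
    current = semantic_type
    while current in ISA:
        current = ISA[current]
        chain.append(current)
    return chain


def is_a(child, ancestor):
    """True if `ancestor` is reachable from `child` through one or more 'isa' links."""
    return ancestor in ancestors(child)


print(ancestors("Human"))
print(is_a("Animal", "Entity"), is_a("Carbohydrate", "Chemical"), is_a("Mammal", "Human"))
```

Storing only the direct parent keeps the table small; inherited statements are computed on demand rather than listed explicitly.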
SPECIALIST Lexicon
• A lexicon is necessarily a core component of any natural language processing system.
• Coverage includes both commonly occurring English words and biomedical vocabulary discovered in the NLM Test Collection and the UMLS Metathesaurus.
• The lexicon entry for each word or term records syntactic, morphological, and graphemic information.
  – Syntactic information includes syntactic category (part of speech) and complementation patterns for verbs, adjectives, and nouns, as well as positional and modification types for adjectives and adverbs.
  – Inflectional morphology is indicated for those syntactic categories that inflect, and spelling variation is recorded for each lexical item known to exhibit such variation.
[1] Wu S.T., Liu H., et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association (JAMIA), 19(e1), e149–e156.
[1] UMLS term occurrences in clinical notes
• Objective
  – To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources.
• Data Sources
  – The data source for the corpus analysis of clinical text was Mayo Clinic clinical notes between 1 January 2001 and 31 December 2010, retrieved from Mayo’s Enterprise Data Trust (EDT).
  – 51,945,627 documents
  – 296,167 unique terms
  – 2,319,010,575 case-insensitive exact term matches
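
To make "case-insensitive exact term match" concrete, here is a minimal sketch of counting such matches over a handful of documents. It is not the method used in the paper, whose implementation is not described in these slides. It assumes non-overlapping matches on word boundaries, with the longest term preferred when several terms start at the same position. The documents, the term list, and `count_term_matches` are hypothetical.

```python
import re
from collections import Counter


def count_term_matches(documents, terms):
    """Count case-insensitive, whole-word, non-overlapping matches of each term."""
    # Longest first, so the alternation prefers longer terms at the same start position.
    ordered = sorted({term.lower() for term in terms}, key=len, reverse=True)
    if not ordered:
        return Counter()
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(term) for term in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    counts = Counter()
    for document in documents:
        for match in pattern.finditer(document):
            counts[match.group(0).lower()] += 1
    return counts


# Toy notes and terms, invented for illustration.
NOTES = [
    "Patient has AFib. History of atrial fibrillation.",
    "AFIB noted; no fibrillation today.",
]
TERMS = ["AFib", "atrial fibrillation", "fibrillation"]

counts = count_term_matches(NOTES, TERMS)
print(counts)
print(sum(counts.values()), "matches,", len(counts), "unique terms found")
```

Under these assumptions "atrial fibrillation" in the first note is counted once as the longer term, and the shorter "fibrillation" is counted only where it stands alone.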
• Corpus Analysis – Word Statistics
• Figure 1 shows histograms for the number of words in the UMLS and in the subset that is empirically found in Mayo Clinic data.
• Corpus Analysis – Term Frequency
• Corpus Analysis – Source Terminology
• Corpus Analysis – Syntactic Categories
• Cross-institutional analysis
  ① Special characters
  ② Maximum number of words
  ③ Maximum number of characters
  ④ Language
  ⑤ Source terminology
  ⑥ Semantic group
  ⑦ Empirical occurrence filter
  ⑧ Term frequency
Source terminologies:
• SNOMED-CT
• Consumer Health Vocabulary
• National Cancer Institute (NCI) Thesaurus
• Medical Subject Headings (MSH)
• Read Codes
• Medical Dictionary for Regulatory Activities Terminology (MedDRA)
• SNOMED International
• MEDCIN
• UMLS Metathesaurus
• National Drug File Reference Terminology (NDF-RT)
• The original SNOMED
• Online Mendelian Inheritance in Man (OMIM)
• Logical Observation Identifiers Names and Codes (LOINC)
• Computer Retrieval of Information on Scientific Projects (CRISP)
Semantic groups:
• Anatomy
• Chemicals & drugs
• Concepts & ideas
• Disorders
• Living beings
• Physiology
• Procedures
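
Several of the criteria listed above (special characters, maximum number of words, maximum number of characters, language) read naturally as simple term filters. The sketch below shows one possible reading. The thresholds, the allowed-character pattern, the `Term` record, and `keep_term` are all hypothetical; the slides do not state the values or rules used in the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    language: str  # three-letter language code


# Hypothetical settings; the actual values are not given in the slides.
MAX_WORDS = 6
MAX_CHARS = 50
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9 '\-]+")


def keep_term(term, language="ENG"):
    """Apply the language, special-character, word-count, and length filters."""
    return (
        term.language == language
        and ALLOWED_CHARS.fullmatch(term.text) is not None
        and len(term.text.split()) <= MAX_WORDS
        and len(term.text) <= MAX_CHARS
    )


# Toy terms, partly taken from the "Atrial Fibrillation" example earlier in the deck.
CANDIDATES = [
    Term("Atrial fibrillation", "ENG"),
    Term("atrium; fibrillation", "ENG"),    # contains a special character
    Term("Fibrilación auricular", "SPA"),   # not English
    Term("a b c d e f g", "ENG"),           # too many words
]

kept = [term.text for term in CANDIDATES if keep_term(term)]
print(kept)
```

Each filter is an independent predicate, so the remaining criteria (source terminology, semantic group, empirical occurrence, term frequency) could be added as further conditions once the needed metadata is attached to each term.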
Reference
• UMLS; http://www.nlm.nih.gov/research/umls
• UMLS Basics Tutorial; http://www.nlm.nih.gov/research/umls/new_users/online_learning/index.htm
• UTS; https://uts.nlm.nih.gov/
• Wu S.T., Liu H., et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association (JAMIA), 19(e1), e149–e156.
• ???, ???, ???. ‘UMLS Metathesaurus 2004? ??? ?? ??- Rich Release Format(RRF)? ??’