際際滷

際際滷Share a Scribd company logo
When Healthcare Meets
Data Science
Anastasiia Kornilova
http://www.slideshare.net/WebCongress/mars-one-bas-lansdorp
http://www.slideshare.net/WebCongress/mars-one-bas-lansdorp
The Medicine of the Future
http://www.healthbizdecoded.com/2013/05/hies-meeting-the-sustainability-challenge/
http://www.wellcentive.com/blog/the-relationship-between-hie-and-population-health-management/
http://graphics.wsj.com/infectious-diseases-and-vaccines/
NLP approach for medical translation task
束One or two patient died per week in a
certain smallish town because of the lack
of information 鍖ow between the
hospitals emergency room and the
nearby mental health clinic損
[束Doing Data Science損, ONeil ]
60% of US doctors still use
paper medical records
Lets create our own EHR standard
Patient
gender
Code
Male 0
Female 1
Patient
gender
Code
Male 1
Female 0
Patient
gender
Code
Male M
Female F
Unknown U
Lets code gender
Standart A
Standart B
Standart C
x
x
[Image Source]
There 5 important data
standards
ICD - diagnostic, billing, world-wide
CPT - procedures, billing, US-speci鍖c, classi鍖cation
LOINC - lab tests and observations, world-wide
NDC - medication, US-speci鍖c, classi鍖cation
SNOMED - medicine
 and a lot of custom dictionaries
Even within one data standard:
ICD-9
174 malignant neoplasm of female breast
174.1 malignant neoplasm of central portion of female breast
ICD-10
C50 malignant neoplasm of breast
C50.1 malignant neoplasm of central portion of breast
C50.111 malignant neoplasm of central portion of right female
breast
C50.112 malignant neoplasm of central portion of left female
breast
You have to be a doctor to handle them
Problem summary
Source 1
Source 2
Source N
medicine expertise
a lot of (expensive) hours
Knowledge
Standards are changing
Arti鍖cial Intelligence Way
Feed a lot of medical texts to
束medical doctor損
Use NLP power
Make it unsupervised
Key idea:
束Semantically similar words occurs in similar
contents損 Harris, 1954
束You shall know a word by the company it
keeps損, Firth, 1957
束It was the year when Udacity, Coursera and edX, the three
leading MOOC companies, took the education world by storm
and promised a lot損 [Huf鍖ngton Post]
束Many places offer MOOCs, and many more will. But
Coursera, Udacity and edX are the leading
providers.損 [NYTimes]
Distributed Vectors
Representation
Two layer neural network
Input: text corpus
Output: set of vectors
Group the vectors of similar
words together in vector
space (detects similarities
matematically)
Predict a word using content
All
you
need
love
is
Resulting vectors
All
you
need
is
love
[0.2, 0.11, 087, 0.9,  , 0.2]
[0.1, 0,98, 01, 0.26, , 0.82]
[0.7, 0.22, 0.3, 0.1, , 0.45]
[0.5, 0.21, 0,67, 0.82,, 0.49]
[0.6, 034, 0.21, 0.45,, 0.2]
Vectors Relationships
Vectors Relationships
http://nlp.stanford.edu/projects/glove/images/company_ceo.jpg
http://nlp.stanford.edu/projects/glove/images/comparative_superlative.jpg
NLP approach for medical translation task
NLP approach for medical translation task
ICD-9
174 malignant neoplasm of female breast
174.1 malignant neoplasm of central
portion of female breast
ICD-10
C50 malignant neoplasm of breast
C50.1 malignant neoplasm of central portion of
breast
C50.111 malignant neoplasm of central portion
of right female breast
C50.112 malignant neoplasm of central portion
of left female breast
Summary
Links
Ef鍖cient Estimation of Word Representation in
Vector Space (Mikolov)
Distributed representation of words and phrases and
their compositionality (Mikolov)
word2vec Parameter Learning Explaining (Rong)
Questions?

More Related Content

NLP approach for medical translation task