How to build a Length of Stay model for a Proof of Concept project
AGENDA
• The difference between POC, Pilot, and Production
• What makes a healthcare POC special?
• Things to watch out for
• The most important step
• The simplified data science process
• Conceptual architecture: how to deploy & consume a model
POC
Stakeholder
Output:
• Model + prediction
• Dashboard
Pilot
End user
Output:
• User-oriented MVP
• Match Azure components to the user flow
Production
Integration
Output:
• Gains: value adds when optimizing
• Explore vs. exploit
WHAT MAKES A HEALTHCARE POC SPECIAL?
• Patient data anonymization
• Hard to control data quality
• Hard to add IoT data sources (can't join them with individual patient data)
• Hard to change the existing way of working
• Hard to find a level of data science application that can be directly used to show impact
THINGS TO WATCH OUT FOR
Generic key elements in data science that you want to get right from the start!
Start
Define clearly – the objective → goals
End
Map carefully – goals → deliverables
Engage frequently – the data science process & sub-activities
The importance of domain expertise
If a patient has blood pressure measurements from every hour, how should the model use those measurements? (Take the average? Take the daily average? Use some weighting function?)
The prediction has the most value if it is made at the beginning of the hospital visit (e.g. when moving the patient from the Emergency Clinic to the Surgery Ward), but obviously more data is available later in the stay (close to the discharge time), so where do we draw the line?
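As a hedged illustration of the first question, here is a minimal R sketch of three aggregation choices for hourly readings; the data frame bp_readings and its columns visit_id, timestamp, and systolic are hypothetical names, not from the deck.

library(dplyr)

# Hypothetical hourly readings: one row per visit_id, timestamp, systolic value
bp_features <- bp_readings %>%
  group_by(visit_id) %>%
  summarise(
    bp_mean_overall = mean(systolic, na.rm = TRUE),       # plain average over the visit
    bp_mean_last_day = mean(systolic[as.Date(timestamp) == max(as.Date(timestamp))],
                            na.rm = TRUE),                 # average over the most recent day
    bp_weighted = weighted.mean(systolic,
                                w = exp(as.numeric(difftime(timestamp, max(timestamp),
                                                            units = "hours"))),
                                na.rm = TRUE),             # recency-weighted average
    .groups = "drop")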
THE MOST IMPORTANT STEP
IT’S ALL ABOUT THE OBJECTIVE
(1) Define the objective
(2) Break the objective down into goals: goal 1, goal 2, goal 3, goal 4, ... goal N
(3) Deliverables: deliverable 1, deliverable 2, deliverable 3, deliverable 4, ... deliverable N
(4) Data science activities
EXAMPLE OF MAPPING (1)+(2) → (3)
Innovation Officer: "Does your model make $ for the company?" "Let's run a POC to understand how data science works."
Goal 1: Set recall as the main measurement of model performance (see the sketch after this slide).
Goal 2: Document all activities, making sure to include:
a. why the model works
b. what it looks like in production
c. how to scale and integrate with IT
Goal 3: Make sure the deliverables include a killer-looking dashboard/app so I can easily show and tell others.
(1)+(2) → (3): Model performance
Mapping Goal 1 to deliverable 1
Mapping Goal 2 to deliverable 2
Mapping Goal 3 to deliverable 3
(4) Project leader / advocate
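Goal 1 above names recall as the headline metric. As a minimal R sketch with made-up predictions and labels (not the deck's data): recall = true positives / (true positives + false negatives).

# Hypothetical binary predictions and ground-truth labels
pred   <- c(1, 1, 0, 1, 0)
actual <- c(1, 0, 0, 1, 1)

tp <- sum(pred == 1 & actual == 1)   # true positives
fn <- sum(pred == 0 & actual == 1)   # false negatives
recall <- tp / (tp + fn)             # 2 / 3 in this toy example
recall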
NOW THE DATA SCIENCE STUFF (4)
The Microsoft Data Science Process
SIMPLIFICATION OF THE DATA SCIENCE PROCESS
Group 1 – Business understanding + data understanding
Group 2 – Data exploration + feature engineering
Group 3 – Model selection + performance evaluation
Group 4 – Model deployment + application development
GROUP 1: INVESTIGATE WHETHER THE DATA SUPPORTS THE OBJECTIVE
(1) Break the objective down into goals and iterate through them to form the data scope and deliverables
(2) Evaluate whether the goals can be achieved with the available data
(3) Establish the data pipeline (on-prem + cloud)
CONSIDER THE DATA PIPELINE: ON-PREM & IN THE CLOUD
On-prem: merge and anonymize the data sources, then push/upload them to a cloud datastore.
In the cloud: data exploration and feature selection, then model training/selection/deployment; the trained model is wrapped into applications.
Back on-prem: the trained model can also be downloaded, saved, and wrapped into applications locally.
EXAMPLE OF ON-PREM DATA MERGING → FLAT TABLE
Join the source tables (patient demographic info, hospital visits, diagnosis codes, lab results, hospital departments info, nurse schedule) step by step into a single flat table.
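A minimal dplyr sketch of that merge; the table names and join keys (pseudo_id, visit_id, department_id, visit_date) are hypothetical, since the real hospital schema will differ.

library(dplyr)

flat_table <- patient_demographics %>%
  left_join(hospital_visits,  by = "pseudo_id") %>%
  left_join(diagnosis_codes,  by = c("pseudo_id", "visit_id")) %>%
  left_join(lab_results,      by = c("pseudo_id", "visit_id")) %>%
  left_join(department_info,  by = "department_id") %>%
  left_join(nurse_schedule,   by = c("department_id", "visit_date"))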
THIS IS THE FLAT TABLE FOR FURTHER USE
Columns: pseudo ID | features | label
CONSIDER TIME . . .
Option A – time as part of the key: pseudo ID + time | features | label
Option B – time as part of the features: pseudo ID | features (including time-derived columns) | label
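A hedged R sketch of the two layouts, assuming a hypothetical measurements frame with pseudo_id, timestamp, and blood_pressure columns.

library(dplyr)

# Option A: time as part of the key -> one row per patient per day
daily <- measurements %>%
  mutate(day = as.Date(timestamp)) %>%
  group_by(pseudo_id, day) %>%
  summarise(mean_bp = mean(blood_pressure, na.rm = TRUE), .groups = "drop")

# Option B: time folded into the features -> one row per patient,
# with time-derived columns such as the latest and the average reading
per_patient <- measurements %>%
  arrange(timestamp) %>%
  group_by(pseudo_id) %>%
  summarise(last_bp = last(blood_pressure),
            mean_bp = mean(blood_pressure, na.rm = TRUE),
            .groups = "drop")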
WHY DO IT IN THE CLOUD AT ALL?
GROUP 2: EXPLORE DATA AND FEATURE ENGINEERING
(1) Archive all statistical plots, the scripts used, and intermediate output data for reproducibility and documentation
(2) So what about the time aspect?
(3) Include domain expertise in all activities (for transparency)
COMMON WAYS TO FIND OUTLIERS
Find outliers with plots; find outliers with statistical methods.
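As a hedged base-R sketch of both approaches side by side, using the hypothetical flat_table and a length_of_stay column.

# Find outliers with a plot
boxplot(flat_table$length_of_stay, main = "Length of stay (days)")

# Find outliers with a statistical rule: flag values outside 1.5 * IQR
q   <- quantile(flat_table$length_of_stay, c(0.25, 0.75), na.rm = TRUE)
iqr <- q[2] - q[1]
is_outlier <- flat_table$length_of_stay < q[1] - 1.5 * iqr |
              flat_table$length_of_stay > q[2] + 1.5 * iqr
sum(is_outlier, na.rm = TRUE)   # how many rows look suspicious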
COMMON WAYS TO DEAL WITH MISSING DATA
Impute with the mean / median / a constant, or impute with k-nearest neighbors:
library(Hmisc)
impute(dataset$column, mean)    # replace missing values with the mean
impute(dataset$column, median)  # replace missing values with the median
library(DMwR)
# replace missing values using each row's k nearest neighbors
knnOutput <- knnImputation(dataset[, !names(dataset) %in% "column"])
DATA EXPLORATION → FEATURE SELECTION
Use statistical plots to find features that might have predictive power, e.g. rcount (recency count) against length of stay (days).
DIMENSION REDUCTION
USE WITH CARE, ESPECIALLY IN THE HEALTHCARE DOMAIN → RISK OF LOSING INTERPRETABILITY!
Cluster features by correlation matrix
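A minimal base-R sketch of this idea (again assuming the hypothetical flat_table): cluster features on 1 - |correlation|, so strongly correlated features land in the same cluster and one representative per cluster can be kept.

num_features  <- flat_table[, sapply(flat_table, is.numeric)]
corr          <- cor(num_features, use = "pairwise.complete.obs")

# Treat 1 - |correlation| as a distance and cluster hierarchically
feat_clusters <- hclust(as.dist(1 - abs(corr)))
plot(feat_clusters)                      # dendrogram of feature similarity
groups <- cutree(feat_clusters, k = 5)   # hypothetical number of clusters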
FEATURE SELECTION – MANUAL
Use correlation
Forward/backward selection
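A hedged base-R sketch of forward/backward (stepwise) selection with step(), assuming a numeric length_of_stay label in the hypothetical flat_table.

full_model <- lm(length_of_stay ~ ., data = flat_table)
null_model <- lm(length_of_stay ~ 1, data = flat_table)

# Backward elimination: start from the full model, drop features while AIC improves
backward <- step(full_model, direction = "backward", trace = FALSE)

# Forward selection: start from the empty model, add features while AIC improves
forward <- step(null_model, direction = "forward",
                scope = formula(full_model), trace = FALSE)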
FEATURE SELECTION – AUTOMATIC (EXAMPLE: GENETIC ALGORITHM)
Crossover
Mutation
FEATURE SELECTION – AUTOMATIC (EXAMPLE: GENETIC ALGORITHM, CONTINUED...)
Each generation is produced by a cycle of fitness evaluation, selection, crossover, and mutation.
Run the genetic algorithm; the result shows which features the genetic algorithm selected.
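The r-bloggers post cited in the end notes drives the genetic algorithm by hand; as a hedged alternative sketch, caret's gafs() wraps the same select → crossover → mutation cycle. The random-forest fitness helper is just an illustrative choice, and flat_table / length_of_stay are hypothetical names.

library(caret)

ga_ctrl <- gafsControl(functions = rfGA,   # random-forest-based fitness helpers
                       method    = "cv",   # cross-validated fitness
                       number    = 5)

set.seed(42)
ga_result <- gafs(x = flat_table[, setdiff(names(flat_table), "length_of_stay")],
                  y = flat_table$length_of_stay,
                  iters = 10,              # number of generations
                  gafsControl = ga_ctrl)

ga_result$optVariables                     # features selected by the genetic algorithm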
GROUP 3: MODEL DEVELOPMENT & PERFORMANCE EVALUATION
(1) What types of models are available to use, and what are the considerations?
(2) Beware of the trade-off between model explainability and model performance
(3) Can it be scaled up and out? (consider production)
TYPES OF MODELS → SELECT THE ONES THAT FIT YOUR CRITERIA
Source: https://www.datasciencecentral.com/profiles/blog/show?id=6448529%3ABlogPost%3A598753&commentId=6448529%3AComment%3A708763&xg_source=activity
Understand what types of models can be used for the specific task.
Source: https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464
ALSO – WHAT'S NORMALLY USED IN HEALTHCARE
(Source in the end notes)
Mainly due to interpretability
MODEL TRAINING: DEVELOP LOCALLY → TRAIN ON REMOTE COMPUTE
When to use which Azure ML services to train & develop the model is detailed in another deck.
Develop the model locally and train it on a remote compute target.
Split the processed dataset into a training set, a validation set, and a test set.
Train the ML models, check them, keep the models that pass on the test set, and select one winning model.
Next step: document the winning model + build a dashboard.
Develop locally (experiment) → remote compute (for training).
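A minimal base-R sketch of this split-train-check flow, assuming the hypothetical flat_table with a length_of_stay label.

set.seed(42)
idx <- sample(c("train", "validation", "test"), nrow(flat_table),
              replace = TRUE, prob = c(0.7, 0.15, 0.15))

train_set <- flat_table[idx == "train", ]
valid_set <- flat_table[idx == "validation", ]
test_set  <- flat_table[idx == "test", ]

# Fit candidate models on train_set, compare them on valid_set,
# and confirm the single winning model on the held-out test_set
candidate <- lm(length_of_stay ~ ., data = train_set)
mean(abs(predict(candidate, valid_set) - valid_set$length_of_stay))   # validation MAE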
MODEL TRAINING: UTILIZING THE CLOUD → FAST DISTRIBUTED TRAINING
EVALUATE MODELS
• Model performance
• Costs vs. gains
• Other considerations
EXAMPLE OF MODEL EVALUATION
Criteria          SGD Regressor  Trees   RF      GBM     Weight
Performance       86.30%         92.20%  91.40%  96.60%  50%
Interpretability  1              0.9     0.8     0.9     20%
Time to compute   1              0.9     0.2     0.2     10%
# of parameters   1              1       0.7     1       10%
Ranking           83%            56%     56%     58%     100%
GROUP 4: APPLY THE MODEL
(1) In a POC, only a prototype is built
(2) In a Pilot, testing the entire data pipeline + architecture is important
(3) In Production, optimizing cost vs. performance and monitoring the model lifecycle (retiring/retraining the model) become important
(4) Application development is usually handed over to the IT department
DATA SCIENCE / MACHINE LEARNING MODEL DEPLOYMENT (ONE-OFF)
Usually used in a POC to show the potential: a one-off deployment as a web API service.
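The deck itself deploys through Azure ML web services (see the end notes). Purely as an illustration of the "model behind a web API" idea, here is a hedged R sketch using the plumber package, with a hypothetical saved model file los_model.rds.

# api.R
library(plumber)
library(jsonlite)

model <- readRDS("los_model.rds")   # hypothetical path to the trained model

#* Predict length of stay for one patient record sent as a JSON body
#* @post /predict
function(req) {
  newdata <- as.data.frame(fromJSON(req$postBody))
  list(predicted_los = predict(model, newdata = newdata))
}

# Start the one-off service with:
#   plumber::pr_run(plumber::pr("api.R"), port = 8000)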
DATA SCIENCE / MACHINE LEARNING PIPELINE
A model train-deploy-manage architecture framework consolidates everything in one pipeline.
ONE-STOP PIPELINE: TRAIN + DEPLOY + MANAGE THE ML MODEL
EMBEDDED DASHBOARD
Thank you

Editor's Notes

• #5: https://www.healthcatalyst.com/success_stories/machine-learning-to-reduce-readmissions-mission-health
• #14: Microsoft: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview
• #23: http://r-statistics.co/Missing-Value-Treatment-With-R.html and https://blog.revolutionanalytics.com/2018/03/outliers.html
• #24: http://r-statistics.co/Missing-Value-Treatment-With-R.html and https://blog.revolutionanalytics.com/2018/03/outliers.html
• #31: https://www.r-bloggers.com/feature-selection-using-genetic-algorithms-in-r/
• #32: https://www.r-bloggers.com/feature-selection-using-genetic-algorithms-in-r/
• #39: https://www.mckinsey.com/featured-insights/artificial-intelligence/notes-from-the-ai-frontier-applications-and-value-of-deep-learning
• #42: https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-more-machine-learning
• #46: Refer to auto-ml-regression_LOS.html
• #52: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb