3. Machine learning process
[Figure: Raw data -> Data preprocessing -> Prepared data -> Apply algorithms -> Candidate model -> Chosen model -> Application. Loops: "Iterate until data is ready", "Iterate for best model"]
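The goal of the machine learning process is to build a model.
The model is built from previously learned data and extracts information from new data.
The application uses the model to obtain that information.
Data scientists work through the machine learning process to build the right model,
deciding on steps such as data preprocessing, algorithm application, and model selection.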
4. Machine learning process
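Questions to ask in the machine learning process
What do you want to find out using machine learning?
For example, an algorithm that makes recommendations.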
5. Machine learning process
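The next step is to obtain prepared data.
Rather than using the given raw data as is, the data is transformed into a new form; this is the preprocessing step.
That is, preprocessing means transforming the data so that it suits the algorithm that machine learning will apply.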
6. Machine learning process
[Figure: Raw data -> Data preprocessing -> Prepared data -> Apply ML algorithms -> Candidate model -> Chosen model -> Application. Loops: "Iterate until data is ready", "Iterate for best model"]
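Once the prepared data is obtained, we apply machine learning algorithms to create a model.
Applying algorithms and testing must be repeated until a model is produced.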
8. What should be done?
[Figure: Data preprocessing -> Prepared data -> Apply ML algorithms -> Candidate model -> Chosen model -> Application ("Iterate for best model"), with an added step: Missing data -> Imputation -> Complete data]
To turn missing data into complete data, an imputation model must be used.
9. Imputation
Imputation is the process of replacing missing data with substitute values.
The imputation model to use depends on the characteristics of the missing data.
[Figure: Missing data -> Imputation -> Complete data]
Imputation methods:
Listwise deletion
Single imputation
- Hot-deck
- Cold-deck
- Mean substitution
- Interpolation
Multiple imputation
Model based approach
[Figure: analysis of missing data characteristics -> imputation model]
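Three of the simpler methods in the list above can be sketched in a few lines. The following is a minimal illustration in Python, not code from the deck: the function names and the sample values are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean


def listwise_deletion(rows):
    """Drop every row that contains at least one missing value (None)."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_substitution(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def linear_interpolation(values):
    """Fill interior gaps on a straight line between the nearest observed
    neighbours; leading or trailing gaps are left as None."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


# Hypothetical sample data for illustration only.
rows = [(5.1, 1.4), (4.9, None), (6.3, 4.7)]
complete_rows = listwise_deletion(rows)
mean_filled = mean_substitution([2.0, None, 4.0])
interpolated = linear_interpolation([1.0, None, None, 4.0])
```

Listwise deletion shrinks the dataset, while the two single-imputation variants keep every row but replace the gaps with one fixed guess each.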
10. Imputation
To choose an imputation model, the characteristics of the missing data must first be understood.
12. Missing data?
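Missing data means that no data value is stored for a variable in an observation [1].
That is, missing data is data that was not recorded for some reason and is therefore absent from the data set.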
1. Graham, John W. "Missing data analysis: Making it work in the real world." Annual review of psychology 60 (2009): 549-576.
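13. Problems caused by missing data?
Increased uncertainty leads to biased analysis results [1]
Statistical analysis results become hard to trust
Statistical properties are distorted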
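Iris data [2]
(a) Complete data
Average Petal length: 3.113
(b) 33% of the Petal length values missing at random
Average Petal length: 3.735
(c) The lowest 33% of the Petal length values missing
Average Petal length: 4.906
The characteristics of the data change depending on the missingness condition.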
1. Stuart, Elizabeth A., et al. "Multiple imputation with large data sets: a case study of the Children's Mental Health Initiative." American journal of epidemiology 169.9 (2009)
2. Fernstad, Sara Johansson. "To identify what is not there: A definition of missingness patterns and evaluation of missing value visualization." Information Visualization (2018)
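The effect on the average described in this slide can be reproduced on any small dataset. The sketch below uses hypothetical measurements (not the Iris data) and a fixed random seed, both chosen only for illustration: it compares the mean of the complete data with the mean after a random third is removed and after the lowest third is removed.

```python
import random
from statistics import mean

# Hypothetical measurements, not the Iris values from the slide.
values = [1.4, 1.5, 1.3, 1.6, 4.5, 4.7, 4.1, 4.9, 5.8, 6.0, 5.5, 5.9]
n_missing = len(values) // 3  # roughly 33% of the values

# (a) Complete data.
mean_complete = mean(values)

# (b) A randomly chosen third of the values is missing.
rng = random.Random(0)
missing_idx = set(rng.sample(range(len(values)), n_missing))
mean_random = mean(v for i, v in enumerate(values) if i not in missing_idx)

# (c) The lowest third of the values is missing.
mean_low_missing = mean(sorted(values)[n_missing:])

print(f"complete: {mean_complete:.3f}")
print(f"random third missing: {mean_random:.3f}")
print(f"lowest third missing: {mean_low_missing:.3f}")
```

When the lowest values are the ones that go missing, the mean of what remains is necessarily pushed upward, which is the kind of shift the slide shows in case (c).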
16. Missing type
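The way missing values should be handled depends on the missing type.
Missing types are broadly divided into three categories [1]:
MCAR (Missing completely at random): values are missing purely at random, unrelated to any observed or unobserved data
MAR (Missing at random): whether a value is missing depends only on data that was observed
NMAR (Not missing at random): values are not missing at random; missingness depends on the unobserved values themselves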
1. Little, Roderick JA, and Donald B. Rubin. Statistical analysis with missing data. Vol. 333. John Wiley & Sons, 2014.
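The difference between the three missing types is easiest to see by simulating them. The following is a minimal sketch under stated assumptions: the records, the probabilities (0.3 and 0.6), the threshold of 50, and the helper names are all hypothetical. `x` is always observed and `y` is the variable that may go missing.

```python
import random

rng = random.Random(42)

# Hypothetical records: x is always observed, y may go missing.
records = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]


def apply_missingness(records, prob_missing):
    """Return a copy of the records in which y is replaced by None
    with probability prob_missing(x, y)."""
    result = []
    for x, y in records:
        drop = rng.random() < prob_missing(x, y)
        result.append((x, None if drop else y))
    return result


def count_missing(data):
    """Number of records whose y value is missing."""
    return sum(1 for _, y in data if y is None)


# MCAR: every record has the same chance of losing y.
mcar = apply_missingness(records, lambda x, y: 0.3)

# MAR: the chance of losing y depends only on the observed x.
mar = apply_missingness(records, lambda x, y: 0.6 if x > 50 else 0.0)

# NMAR: the chance of losing y depends on the value of y itself.
nmar = apply_missingness(records, lambda x, y: 0.6 if y > 50 else 0.0)

print(count_missing(mcar), count_missing(mar), count_missing(nmar))
```

In the MAR case the missingness can be explained from `x` alone, whereas in the NMAR case the information needed to explain it is exactly what was lost.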
26. Regression imputation R code
R-code
Source: Templ, Matthias, and Peter Filzmoser. "Visualization of missing values using the R-package VIM." Research report cs-2008-1, Department of Statistics and Probability Theory, Vienna University of Technology (2008).
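The R listing for this slide did not survive in the text. As a language-neutral illustration of the idea behind regression imputation (this is not the slide's code and does not use the VIM package), the sketch below fits a least-squares line on the complete pairs and fills each missing `y` with the predicted value. The function name and sample data are hypothetical.

```python
from statistics import mean


def regression_impute(xs, ys):
    """Fill missing ys (None) with predictions from a least-squares line
    fitted on the complete (x, y) pairs."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    mean_x = mean(x for x, _ in complete)
    mean_y = mean(y for _, y in complete)
    sxx = sum((x - mean_x) ** 2 for x, _ in complete)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


# Hypothetical data: y is roughly 2 * x, with one value missing.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, None, 8.0]
imputed = regression_impute(xs, ys)
```

Because every imputed value lies exactly on the fitted line, this deterministic variant understates the natural scatter of the data; stochastic regression adds a random residual to address that.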
27. K-NN imputation
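Imputation using the K-NN (K-nearest neighbor) algorithm
Assume K=6:
Draw a circle centered on the missing value and expand it until 6 data points fall inside it.
Once 6 data points are inside, the missing value's class is imputed with the class that appears most often among them.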
X Y Class
35 62 a
57 11 a
98 46 b
52 24 a
33 19 a
40 70 missing
28 56 a
21 89 a
94 17 b
10 37 a
73 88 b
97 77 b
37 37 a
95 72
36 9 a
25 93 a
[Figure: scatter plot of the data (x against y); the point with the missing class is marked "?" and a circle around it encloses its K=6 nearest neighbors]
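The procedure on this slide can be traced with the table above. The sketch below is an illustration, not the deck's code: it uses Euclidean distance and a simple majority vote, and it treats the row `95 72`, whose class is blank in the table, as unlabelled (an assumption, since the reason for the blank is not stated).

```python
from collections import Counter
from math import dist

# (x, y, class) rows from the table; None marks a row without a class label.
rows = [
    (35, 62, "a"), (57, 11, "a"), (98, 46, "b"), (52, 24, "a"),
    (33, 19, "a"), (28, 56, "a"), (21, 89, "a"), (94, 17, "b"),
    (10, 37, "a"), (73, 88, "b"), (97, 77, "b"), (37, 37, "a"),
    (95, 72, None), (36, 9, "a"), (25, 93, "a"),
]
query = (40, 70)  # the row whose class is marked "missing"
K = 6


def knn_impute_class(query, rows, k):
    """Return the most common class among the k labelled rows nearest to query."""
    labelled = [(x, y, c) for x, y, c in rows if c is not None]
    nearest = sorted(labelled, key=lambda r: dist(query, (r[0], r[1])))[:k]
    votes = Counter(c for _, _, c in nearest)
    return votes.most_common(1)[0][0]


imputed_class = knn_impute_class(query, rows, K)
print(imputed_class)  # prints "a"
```

For this table, five of the six nearest labelled points belong to class a and one to class b, so the missing class is imputed as a.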
28. K-NN imputation R code
R-code
Source: Templ, Matthias, and Peter Filzmoser. "Visualization of missing values using the R-package VIM." Research report cs-2008-1, Department of Statistics and Probability Theory, Vienna University of Technology (2008).
31. Multiple imputation R code
R-code
Source: Templ, Matthias, and Peter Filzmoser. "Visualization of missing values using the R-package VIM." Research report cs-2008-1, Department of Statistics and Probability Theory, Vienna University of Technology (2008).
32. Multiple imputation
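1. Repeat a single imputation method n times to create n datasets
2. Compute the value and variance of the estimated missing values in each of the n datasets
3. Use Rubin's rule to combine the values and variances of the missing values across the n datasets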
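[Figure: one incomplete dataset is imputed into several complete datasets; legend: missing value, estimated value; each dataset is estimated separately]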
Rubin's rule
- the estimate computed separately for each dataset
- the standard error of each estimate
- W: within-imputation variance
- B: between-imputation variance
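The formula itself did not survive on this slide, so the sketch below uses the standard form of Rubin's rules rather than the slide's own notation. With n per-dataset estimates and their standard errors, the pooled estimate is the mean of the estimates, W is the mean of the squared standard errors, B is the sample variance of the estimates, and the total variance is W + (1 + 1/n) * B. The function name and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool per-dataset estimates with Rubin's rules.

    Returns (pooled estimate, W, B, total variance)."""
    n = len(estimates)
    pooled = mean(estimates)
    within = mean(se ** 2 for se in standard_errors)  # W: within-imputation variance
    between = variance(estimates)  # B: between-imputation variance (divides by n - 1)
    total = within + (1 + 1 / n) * between
    return pooled, within, between, total


# Hypothetical estimates and standard errors from n = 3 imputed datasets.
pooled, W, B, T = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(pooled, W, B, T)
```

The between-imputation term is what a single imputation cannot provide: it carries the extra uncertainty caused by not knowing the missing values.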
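34. Without investigation, it is hard to understand the missing values in each attribute
[Figure: overview of missing data handling]
Assumption: MCAR, MAR, NMAR
Missing data
- Single imputation
- Multiple imputation
Models that impute data using statistical methods
Single imputation
- Explicit modeling: Mean, Regression, Stochastic regression
- Implicit modeling: Hot deck, Cold deck, Substitution, Deletion
Missing pattern investigation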
35. Missing pattern analysis tool/package
Tool
Tableau: Interactive data exploration software
R Package
VIM: Visualization and imputation of missing values
Amelia2: Bootstrap EM imputation
37. VIM(Visualization and imputation of missing values) package
An R package that visualizes missing values and provides imputation models
Main features
Visualization
- Marginplot
- Matrixplot
- Histogram
Imputation model
- kNN
- Hotdeck
- Regression
38. VIM Package
Aggregations for missing/imputed values
Calculate or plot the amount of missing/imputed values in each variable and the amount of
missing/imputed values in certain combinations of variables.
[Figure: aggregation plots with Variables on the x-axes; legend: Missing data, Observed data]
NonD, Dream, and Span are missing together in 1.6% of the observations.
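The two quantities this slide describes, the amount of missing values per variable and per combination of variables, are simple counts. The sketch below computes them in plain Python as an illustration of what the aggregation reports; it is not the VIM implementation. The variable names are taken from the slide, and the rows are hypothetical.

```python
from collections import Counter

# Variable names from the slide; the rows are hypothetical, None marks a missing value.
columns = ["NonD", "Dream", "Span"]
rows = [
    (6.3, 2.0, 4.5),
    (None, None, 14.0),
    (2.1, 1.8, None),
    (None, None, None),
    (9.1, 0.7, 27.0),
]

# Amount of missing values in each variable.
missing_per_variable = {
    name: sum(1 for row in rows if row[i] is None)
    for i, name in enumerate(columns)
}

# Amount of missing values in each combination of variables:
# every row is reduced to the tuple of variables that are missing in it.
pattern_counts = Counter(
    tuple(name for i, name in enumerate(columns) if row[i] is None)
    for row in rows
)
pattern_share = {pattern: count / len(rows) for pattern, count in pattern_counts.items()}

print(missing_per_variable)
print(pattern_share)
```

The empty tuple stands for fully observed rows, and the share of the pattern containing all three names corresponds to the kind of percentage quoted on the slide.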
39. VIM Package
Margin plot: Scatterplot with additional information in the margins
[Figure legend: Missing data, Observed data]
40. VIM Package
Matrix plot
In a matrix plot, all cells of a data matrix are visualized by rectangles.
Available data is coded according to a continuous color scheme.
Missing values can easily be distinguished by using a color such as red/orange.
41. Visualization technique of missing data
Song, Hayeong, and Danielle Albers Szafir. "Where's My Data? Evaluating Visualizations with Missing Data." IEEE transactions on visualization and computer graphics (2018).