7. Data analysis process frameworks

KDD (Knowledge Discovery in Databases) (Fayyad, 1996):
Selection (data acquisition) -> Preprocessing -> Transformation (data processing) -> Data Mining -> Interpretation/Evaluation

CRISP-DM (Cross Industry Standard Process for Data Mining) (Chapman, P. et al, 2000):
Business Understanding (understanding the business problem) -> Data Understanding (data acquisition and understanding) -> Data Preparation -> Modeling -> Evaluation -> Deployment (putting measures into action)

SEMMA (Sample, Explore, Modify, Model, and Assess) (SAS Enterprise Miner, 2008):
Sample (data acquisition) -> Explore (data understanding) -> Modify (data processing) -> Model -> Assess (evaluation and putting measures into action)
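To make the stage ordering concrete, the five KDD stages can be sketched as a chain of small functions. This is only an illustrative toy, not part of any of the cited frameworks: the records in `RAW`, the field names, and the use of a per-region mean as a stand-in for the "data mining" step are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing measurement.
RAW = [
    {"id": 1, "region": "A", "sales": 120.0},
    {"id": 2, "region": "A", "sales": None},
    {"id": 3, "region": "B", "sales": 80.0},
    {"id": 4, "region": "B", "sales": 100.0},
]


def selection(records):
    """Selection: keep only the fields the analysis needs."""
    return [{"region": r["region"], "sales": r["sales"]} for r in records]


def preprocessing(records):
    """Preprocessing: drop records whose sales value is missing."""
    return [r for r in records if r["sales"] is not None]


def transformation(records):
    """Transformation: group the sales values by region."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["region"], []).append(r["sales"])
    return grouped


def data_mining(grouped):
    """Data Mining: a stand-in 'model' that is just the mean per region."""
    return {region: mean(values) for region, values in grouped.items()}


def interpretation(pattern):
    """Interpretation/Evaluation: rank regions by mean sales, highest first."""
    return sorted(pattern, key=pattern.get, reverse=True)


def run_kdd(records):
    """Run the stages in KDD order and return the pattern and its ranking."""
    pattern = data_mining(transformation(preprocessing(selection(records))))
    return pattern, interpretation(pattern)


pattern, ranking = run_kdd(RAW)
```

CRISP-DM and SEMMA could be laid out the same way by renaming the stages; CRISP-DM would add a business-understanding step before any data is touched and a deployment step at the end.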
13. Three steps for handling missing values
- Step 1. Identify the missing data and visualize it.
- Step 2. Understand the missingness mechanism: examine the causes of the missing data.
- Step 3. Depending on the type of missingness, delete the cases containing missing data or replace (impute) the missing values with reasonable alternative data values.
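The three steps above can be sketched on a tiny in-memory data set. This is a minimal illustration under stated assumptions: the `rows` data, the column names, and the helper `impute_mean` are hypothetical, and mean imputation is only one simple replacement strategy among many. The Step 2 comparison is a crude first look at whether missingness relates to another observed variable, not a formal test of the mechanism.

```python
from statistics import mean

# Hypothetical data set; None marks a missing value.
rows = [
    {"age": 34, "income": 52.0},
    {"age": None, "income": 48.0},
    {"age": 29, "income": None},
    {"age": 41, "income": 62.0},
    {"age": 38, "income": None},
]
columns = ["age", "income"]

# Step 1: identify the missing data (count per column, and which rows are affected).
missing_count = {c: sum(r[c] is None for r in rows) for c in columns}
missing_rows = [i for i, r in enumerate(rows) if any(r[c] is None for c in columns)]

# Step 2: examine the causes. As a first check, compare the observed ages of
# cases with and without a missing income; a large gap hints that the
# missingness may depend on age rather than being purely random.
age_income_missing = [
    r["age"] for r in rows if r["income"] is None and r["age"] is not None
]
age_income_present = [
    r["age"] for r in rows if r["income"] is not None and r["age"] is not None
]
age_gap = mean(age_income_present) - mean(age_income_missing)

# Step 3a: delete the cases containing missing data (listwise deletion).
complete_cases = [r for r in rows if all(r[c] is not None for c in columns)]


# Step 3b: replace (impute) missing values, here with the column mean.
def impute_mean(data, column):
    """Return a copy of data with missing values in column set to the observed mean."""
    observed = [r[column] for r in data if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in data]


imputed = impute_mean(rows, "income")
```

Deletion keeps the analysis simple but shrinks the sample (here from five cases to two), while imputation keeps every case at the cost of introducing estimated values; which is appropriate depends on what Step 2 suggests about why the data are missing.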
53. References.
- RESAS -地域経済分析システム- [Regional Economy Analysis System]. Accessed from https://resas.go.jp/ on Nov. 2015
- Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27-34.
- Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37.
- SAS Enterprise Miner - SEMMA. SAS Institute (2008). Accessed from http://www.sas.com/technologies/analytics/datamining/miner/semma.html on May 2008
- Chapman, P. et al. (2000). CRISP-DM 1.0 - Step-by-step data mining guide. Accessed from http://www.crisp-dm.org/CRISPWP-0800.pdf on Nov. 2015
- Likit, P. (2015). Data preprocessing. Accessed from http://www.slideshare.net/LikitPreeyanon/data-preprocessing-43972402 on May 2008
- Luengo, J. (2011). Missing Values in Data Mining [Online]. Accessed from http://sci2s.ugr.es/MVDM/index.php on Nov. 2015
- 福島辰太郎 (2015). データ分析プロセス [The Data Analysis Process]. シリーズ Useful R 2. 共立出版.
- Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581-592.
- Kabacoff, R. (2015). R in Action: Data analysis and graphics with R.
- Daisuke, I. (2015). Maeshori missing. Accessed from http://www.slideshare.net/dichika/maeshori-missing on Nov. 2015
- Kriegel, H. P., & Zimek, A. (2008, August). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 444-452). ACM.
- Aggarwal, C. C., & Yu, P. S. (2000). Finding generalized projected clusters in high dimensional spaces (Vol. 29, No. 2, pp. 70-81). ACM.
- Kriegel, H. P., Kröger, P., & Zimek, A. (2010). Outlier detection techniques. In Tutorial at the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington, DC.
- Keller, F., Müller, E., & Böhm, K. (2012, April). HiCS: high contrast subspaces for density-based outlier ranking. In Data Engineering (ICDE), 2012 IEEE 28th International Conference on (pp. 1037-1048). IEEE.
- Kriegel, H. P., Kröger, P., Schubert, E., & Zimek, A. (2012, December). Outlier detection in arbitrarily oriented subspaces. In Data Mining (ICDM), 2012 IEEE 12th International Conference on (pp. 379-388). IEEE.
- Lazarevic, A., & Kumar, V. (2005, August). Feature bagging for outlier detection. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (pp. 157-166). ACM.
- 高橋将宜, 伊藤孝之 (2014, March). 様々な多重代入法アルゴリズムの比較: 大規模経済系データを用いた分析 [A comparison of various multiple imputation algorithms: An analysis using large-scale economic data]. 統計研究彙報, 71, 39-82.
- Kandel, S., Heer, J., Plaisant, C., Kennedy, J., van Ham, F., Riche, N. H., ... & Buono, P. (2011). Research directions in data wrangling: Visualizations and transformations for usable and credible data. Information Visualization, 10(4), 271-288.
- dplyr と tidyr を使ったデータラングリングチートシート [Data wrangling cheat sheet using dplyr and tidyr]. (2015). Accessed from https://www.rstudio.com/wp-content/uploads/2015/09/data-wrangling-japanese.pdf on Nov. 2015
- Ozaki, T. (2013). 最新業界事情から見るデータサイエンティストの「実像」 [The "real picture" of data scientists as seen from the latest industry developments]. Accessed from http://www.slideshare.net/takashijozaki1/21-21583073?related=2 on Nov. 2015
- Kitajima, S. (2015). データサイエンティスト必見!M-1グランプリ [A must-see for data scientists! M-1 Grand Prix]. Accessed from http://www.slideshare.net/SatoshiKitajima2/m1-38513054 on Nov. 2015