????? ??? ??
?????
2016. 12. 06 (?)
????? ????????
? ? ? ???
bhkim@bi.snu.ac.kr
Contents
? ????? ???? ??
? ?? ??? ??
? Part 1: ???? ?? ?? ???
? Part 2: ???? ??: Weka? ?? ?? ??
? Part 3: ??? ????: Neural Network Playground
? ??
© 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/
????? ???? ??
Intro
Slides by Jiqiong Qiu at DevFest 2016
http://www.slideshare.net/SfeirGroup/first-step-deep-learning-by-jiqiong-qiu-devfest-2016
????
????
(????)
???
Image source: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
????(AI)
? ^???? ???? ???? ???? ?? ̄(???, SW)? ??
? ??? ???? ? ?? ?? ??? ? ? ??? ?? ??
? ??? ??? ?? ?? ??? ? ? ??? ?? ??
? 1950: Turing's paper, 1956: "Artificial Intelligence (AI)"
(KIPS 2016 Conference Keynote by Byoung-Tak Zhang)
???? ??? ??: ??
? AI ??? ??? `???` ???/??? ??? ?
? ???? ??? ????(intelligent), ??? ?? ???
???(adaptive), ???(robust)?? ??
? ??? ??? ????? ???? ?? ???
? ??? ????? ??? ??? ??? ?? ????,
??? ????? ????? ??!
? ??? AI ?? ?? ? `??(learning)` ??
cat car
(Sam Roweis, MLSS'05 Lecture Note)
??? ??? ??? ???? ?? ????
??(world)? ???? ??? ??? ??
???? ?? ? ?? ??
?? ??
?? ??
??? ??
Computer Vision
???? ??/??
Natural Language
Processing / Understanding
?? ??
Speech Recognition
+ ???? ??? ?? ?? ??? ????
Intelligent Agent
??? ????
Intelligent Agent
?? ??? ??
??? ??? ??? ???? ?? ????
???? ???? ??? ???
??(world)? ???? ??? ??? ??
+ ???? ??? ?? ??
(??) ??
(recognition)
(??) ??
(recognition)
(??) ??
(generation)
?
?? ???? ?? ???? ?? ??
(??) ??
(recognition)
(??) ??
(recognition)
(??) ??
(generation)
?
?? ?? ??? ????? ??
?? ??? ??? ?? ??? ???? ??
??? ??? ???? ??? ??? ??
??? ??(data analysis)? ??
? ?? ??(define the question)
? ??? ??(dataset)
? ???? ???? ??(define the ideal data set)
? ??? ??? ??(determine what data you can access)
? ??? ??(obtain the data)
? ??? ??(clean the data)
? ??? ??? ??(exploratory data analysis)
? ???/??? ???(Clustering / Data visualization)
? (???) ??/???((statistical) prediction/modeling)
? ??/??(Classification / Prediction)
? ?? ??(interpret results)
? ??? ?? ?? ? ??(evaluation), ?? ?? ?? ??(model selection)
? ?? ?? ? ??? ?? ?? ?? ? ??(challenge results)
? ?? ?? ? ??? ??(synthesize/write up results)
? ?? ?? ??? ???? ??(create reproducible code)
(c) 2008-2015, B.-H. Kim
J. Leek, Data Analysis - Structure of a Data Analysis, Lecture at Coursera, 2013
?? ?? ?? ?? ??
Data mining
Data Science
Analytics
…
?????
????
?? ???
???? ??
???? & ???? ??? ?? ??
Problem &
Dataset
Tool
Process &
Algorithm
(Programming)
???? & ???? ??? ?? ?? ???
1. ?? ??
1. ?? ??? ?? ??? ??? ?? ????
2. ?? ?? ?? ? ??? ??? ?? ??? ???
3. ?? ???? ??? ??? ?? ??? ??? ??? ??
2. ??? ?? ??? ???? (??? ??)
1. Weka? ???? ?? (NO ?????!)
2. NN Playground? ??? ?? (NO ?????!)
3. ?? ??/???? ????
1. '??? ?? ????/??? ??': http://hunkim.github.io/ml/
2. ??????????(http://nacsi.kr/) ???? ????
?? ?????.
????? ?? ?? ???
Part 1
Machine Learning
???? (Machine Learning, ML)
Q. If the season is dry and the pavement is slippery, did it rain?
A. Unlikely, it is more likely that the sprinkler was ON
??? ????
??? ?????
??? ??? ??? ????
????, ?? ?? ???? ??
?? ??
?? ??
??, ????
?? ?? ??
?? ?? ??
?? ?? ?? ??
????? ? ?? ?? ?? ??
? Supervised Learning (????, ????, ????)
? ???(?? ???? ??) ?? ??? ????
? ??? ?? ?? ?? ???? ??? ????
? ?) ??(classification): ???? ????(discrete) ??.
? ?) ??(regression): ???? ??? ?? ??? ??
? Unsupervised Learning (?????, ?????, ?????)
? ???? ??? ??, ??, ??? ??? ?? ??. ???? ????
??
? ?? ???? ??? ?? ?? ????? ??? D={(x)}
? ?: ?? ??(dimension reduction), ???(clustering)
? Reinforcement Learning (????)
? ???? ??? ???(right/wrong)? ?? ???? ?? ??
? ????? ?????, ??(environment) ??? ??(rewards)? ???
?? ??? ??(action)? ????? ???? ??
? ??? ??, ????? ??, ?? ?? ?? ? ??, ?? ??? ???
??
? Action selection, planning, policy learning
Img source: http://nkonst.com/machine-learning-explained-simple-words/
????(connectionism)
? ????, ???? ??? ???? ?? ?? ??(neural
information processing)? ?? ??? ?? ? ??
? ??? ??? ??? ?(network)?? ?? ??? ??? ???
??
? ?????(artificial neural networks)
? ????(??)? ??? ??? ?? ???? ?? ??? ??
?? ??
??? ??(feedforward)
???(Deep Learning)
(??: Y. Bengio, Deep Learning Summer School 2015)
? ??: ??? ??, ?? ??: ???? ???? ??
GoogLeNet
DeepFace
Convolutional NN
Stacked RBM
Neural
Machine Translator
? ?? ??? ?? ? ?? ??/??, ?? ??, ???? ??? ??,
?? ?? ? ????? ???? ???? ?? ??? ??? ??.
? (?? ??)?????? ?? ??? ??? ????
? ?? ??? ?? ??? ???? ????
?? ???? ????? ??
Deep Q Network
???? ???:
WEKA? ?? ?? ??
Part 2
[ Problem &
Dataset ]
?? ?? ??
[ Tool ]
Weka
[ Process &
Algorithm ]
????
?????
(No
Programming)
Outline
?Part 2-1: Weka ??
?Part 2-2: Weka? ?? ?? ??
?Weka ????? ??? ??/?? ????
????, ?? ??? ??? ?????
????? ???????.
(c)2008-2016, SNU Biointelligence Laboratory, http://bi.snu.ac.kr
Outline
?Weka Explorer? ??? ??? ? ?? ??
? Filter, Visualize
? Dataset: diabetes
?Weka Explorer? ??? Classification
? Dataset: Iris, diabetes
? Classifier: ????(ID3, J48, SimpleCart), Random Forest
?Weka Experimenter? ??? ?? ??
? Dataset: diabetes
? T-test? ??? ?? ?? ??
?? 1 ??
? ?? ?? ?? ????
? ??: ?? ?? ??, ??? ??
? ??: ???
? ?? ??: ??? ??/???, ?? ??
? ??: Weka? Explorer, Experimenter
? ?? ??? ? ? ???? ?? ?? ?? ??
? ??: ???
? ?? ??: ??? ??? ??
? ?? ??
? ??? ???
? (??? ?? ??? ?? ? ??)
? ??? ???
? ??: Weka? Explorer
Weka ??
? ???? ???? ???? ??, ??? ??? ??
? Weka? ?? ??
? ??? ???(data pre-processing), ??? ??(feature selection)
? ???(clustering), ???(visualization)
? ??(classification), ????(regression), ??? ??(forecast)
? ?? ?? ??(association rules)
? S/W ??
? ?? ? ?? ?? ?????(free & open source GNU General Public License)
? ?? analytics S/W? ??? ?: RapidMiner, MOA
? Java? ??. ??? ????? ?? ??
? Python, C, R, Matlab ? ?? ??? ?? ??
? ????
? Google?? Weka? ??, ? ?? ?? ??
? http://www.cs.waikato.ac.nz/ml/weka/
Weka (bird): http://www.arkive.org/weka/gallirallus-australis/video-au00.html
Top 20 Most Popular Tools for
Big Data, Data Mining, and Data Science
Source: http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
Red: Free/Open Source tools
Green: Commercial tools
Fuchsia: Hadoop/Big Data tools
Weka? ???? ?????
Explorer: ??? ?? ??? ? ???
?? ?? ? ?? ?? ??.
?????, ?? ?? ??
KnowledgeFlow:
??? ?? ??? ?? ??? ????
????? ???? ?? ??
Experimenter: ?? ? ?? ???
?? ??. ?? ?? ??.
- ??? ???? ? ???? ??
- ?? ???-???? ?? ?? ??
- ?? ?? ? ??? ??
- ??? ??? ?? ??
Simple CLI: ?? ??????
????? ???? ???. Weka?
?? ??? ???? ?? ??
? ? ?? ??
Workbench: ?? ?????? ???
?? ????? (3.8.0?? ??)
??: ??(iris) ??(classification)
Iris virginica, Iris versicolor, Iris setosa
??? ??? ???
??: ??(iris) ??(classification)
? ??? ??(Define features or attributes)
? Sepal length, sepal width, petal length, petal width
? ?? ??(Class label): ??(iris)? ? ??. Setosa, versicolor, ? virginica
? ?? ?? ? ??? ??
? ? ?? ?? ?? 50?? ?? ?? (1935?)
? Data table : 150 samples (or instances) * 5 attributes
? R. Fisher ?? 1936? ?? ???? ? ???? linear discriminant model ? ???
? ??: ?? ???? ?? ? ???? ??
? ? ?? ?? ?????? ??: ???, ?? ??, SVM
? ? ???? ? ???? ??? ???? ??
? ?? ?? ?? ? ?? ??
? ??? ?? ?? ??
? ?? ??? ???? ?? ?? ??(algorithm + parameter setting) ??
? ??
??: ??(iris) ????
? Just open "iris.arff" in the 'data' folder
Weka ??? ?? (.ARFF)
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth real
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth numeric
@ATTRIBUTE class {Iris-setosa, Iris-versicolor, Iris-virginica}
@DATA
5.1, 3.5, 1.4, 0.2, Iris-setosa
4.9, 3.0, 1.4, 0.2, Iris-setosa
4.7, 3.2, 1.3, 0.2, Iris-setosa
...
7.0, 3.2, 4.7, 1.4, Iris-versicolor
6.4, 3.2, 4.5, 1.5, Iris-versicolor
6.9, 3.1, 4.9, 1.5, Iris-versicolor
...
???
(CSV format)
??
Note: Excel? ???? CSV ?? ?? ?, ??? ???? ?? arff ??? ?? ?? ??
Dataset name Attribute name Attribute type
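The slide labels the @DATA body as CSV format, so an ARFF file can be viewed as a CSV body preceded by a header. A minimal sketch of that idea in Python (standard library only) follows. The function name `csv_to_arff` and the `nominal` argument are hypothetical helpers for illustration, not part of Weka. The sketch assumes the first CSV row holds the column names and that every column not listed as nominal is numeric.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal=()):
    """Convert CSV text (first row = column names) into ARFF text.

    Columns named in `nominal` become nominal attributes whose value set is
    collected from the data in order of first appearance; every other column
    is declared NUMERIC (an assumption of this sketch).
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = [f"@RELATION {relation}", ""]
    for idx, name in enumerate(header):
        if name in nominal:
            values = []
            for row in data:
                if row[idx] not in values:
                    values.append(row[idx])
            lines.append(f"@ATTRIBUTE {name} {{{', '.join(values)}}}")
        else:
            lines.append(f"@ATTRIBUTE {name} NUMERIC")

    lines += ["", "@DATA"]
    lines += [", ".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Two rows copied from the iris example above.
sample_csv = """sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""
arff_text = csv_to_arff(sample_csv, "iris", nominal=("class",))
print(arff_text)
```

The sketch does not quote names containing spaces and does not handle missing values; Weka's own CSV loader is the safer route for real files.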
ARFF Example
%
% ARFF file for weather data with some numeric features
%
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play? {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
...
Slide from Data Mining: Practical Machine Learning Tools and Techniques (Chapter 2)
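To make the structure of the weather example concrete, here is a rough Python sketch that reads a small ARFF text into its three parts: relation name, attribute declarations, and data rows. The function `parse_arff` is a hypothetical helper covering only the subset shown on the slide ('%' comments, @relation, @attribute, @data, comma-separated rows). Quoted names, sparse rows, and '?' missing values are not handled.

```python
def parse_arff(text):
    """Parse a minimal ARFF text into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            # '@attribute <name> <type or {nominal values}>'
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


weather_arff = """%
% ARFF file for weather data with some numeric features
%
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play? {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
"""

relation, attributes, rows = parse_arff(weather_arff)
print(relation, len(attributes), len(rows))
```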
Preprocess ??? ??? ??? ?? ??
? ??? ??(current relation)
? ??? ??(remove attributes)
? ??? ? ??? ?? ??(selected attribute)
? ?? ???? ???? class label ?? ???(Visualize All)
? `Filter¨? ??? preprocessing
?? ?? ?? ? ?? ?? ??
[Figure: data analysis pipeline]
- Dataset: data matrix (row: instance, col: feature)
- ??/Cleaning: fill missing values, normalization, standardization
- Feature Manipulation: feature selection, PCA
- Classification / Regression: SVM, decision tree, neural networks
- Evaluation: accuracy + cross-validation, ROC curve, AUC (area under ROC curve)
?? ??? ?? ??? ???? ?? ? ???
Approaching (Almost) Any Machine Learning Problem
Figure from: A. Thakur and A. Krohn-Grimberghe, AutoCompete: A Framework for Machine Learning Competitions, AutoML Workshop, International Conference on Machine Learning 2015.
http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
??: ?? ??? ??? ??
- Description
  - Pima Indians have the highest prevalence of diabetes in the world
  - We will build classification models that diagnose whether a patient shows signs of diabetes
  - http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
- Configuration of the data set
  - 768 instances
  - 8 attributes
    - age, number of times pregnant, results of medical tests/analysis
    - all numeric (integer or real-valued)
  - Class label = 1 (positive example)
    - Interpreted as "tested positive for diabetes"
    - 268 instances
  - Class label = 0 (negative example)
    - 500 instances
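Because the classes are imbalanced (268 positive vs. 500 negative), it helps to know the accuracy of a trivial model that always predicts the majority class before judging any classifier. The short Python calculation below uses only the counts from the slide. In Weka this baseline corresponds to the ZeroR classifier.

```python
# Class counts taken from the slide above.
positives = 268   # class label = 1
negatives = 500   # class label = 0

total = positives + negatives
# Accuracy of always predicting the majority class.
majority_baseline = max(positives, negatives) / total

print(f"instances: {total}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any model worth keeping on this data set should beat this baseline, which is roughly 65%.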
WEKA? ??? ??? ???
? ??? ??(DESCRIPTIVE ANALYSIS)
Part
Preprocess ??? ??? ?? ??
? ??? ??(current relation)
? ??? ??(remove attributes)
? ??? ? ??? ?? ??(selected attribute)
? ?? ???? ???? class label ?? ???(Visualize All)
? `Filter¨? ??? preprocessing: Part V?? ??
Weka - Explorer - Preprocess - Filter
? ??? ??? ???
? ?? ?? ?? ????
? Fill in missing values
? weka.filters.unsupervised.attribute.ReplaceMissingValues ??
? Standardization for all the attributes: x? z ? ??
? weka.filters.unsupervised.attribute.Standardize ??
? Data reduction using PCA
? weka.filters.unsupervised.attribute.PrincipalComponents ??
? ???? ? maximumAttributes? 10~50 ?? ? ???
??? ??? ??
? Check the effect of PCA using 'Visualize-Plot Matrix' ??? ?
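What the first two filters do to a single numeric attribute can be sketched in plain Python. This is an illustration of the idea, not Weka's implementation. It assumes missing values are replaced by the attribute mean and that standardization uses the sample standard deviation; Weka's exact choices may differ in detail. The function names and the example column are hypothetical.

```python
import statistics


def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Map x to z = (x - mean) / std so the column has mean 0 and std 1."""
    mean = statistics.fmean(column)
    std = statistics.stdev(column)  # sample standard deviation (assumption)
    return [(v - mean) / std for v in column]


values = [2.0, None, 4.0, 6.0]          # hypothetical attribute with one gap
filled = fill_missing_with_mean(values)
z_scores = standardize(filled)

print(filled)
print([round(z, 3) for z in z_scores])
```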
Weka - Explorer - Visualize ? ???
??? ?? (descriptive analysis)
? Check the effect of PCA using 'Visualize-Plot Matrix'
? PCA ?? ?/? ? plot matrix? ???? PCA? ???
???? ??
? PCA ?? ?(401?? ???) ? ?(???? ??? ?? ???
???)? ?? Plot Matrix ??? ?? ????, ?? ? ???
??
WEKA EXPLORER? ???
CLASSIFICATION
Part Å
?? ???? - ????(Decision Trees)
?J48 (C4.5? Java ?? ??)
? ?? ?? ???? ?? ??? `??¨ ??? ?? ? ??
? Weka?? ????: classifiers-trees-J48
??: ???(Neural Networks)
?MLP (Multilayer Perceptron)
? ????? ?? ??? ??? ??? ?? ????
? Weka?? ????: classifiers-functions-MultilayerPerceptron
Figure from Andrew Ng's Machine Learning Lecture Notes, on Coursera, 2013-1
Weka? ?? ???? - ???? ?
1. Load a file that contains the training data by clicking the 'Open file' button ('ARFF' or 'CSV' formats are readable)
2. Click the 'Classify' tab
3. Click the 'Choose' button
4. Select 'weka - classifiers - functions - MultilayerPerceptron'
5. Click 'MultilayerPerceptron'
6. Set parameters for MLP
7. Set parameters for Test
8. Click 'Start' for learning
?? ????? ???? ??
? ???? ??(Parameter Setting) = ??? ??(Car Tuning)
? ?? ?? ?? ???? ??
? ???? ??? ?? ??? ??????? ???? ??? ???
?? ?? ?? ??
? ????? ?? ???? (J48, SimpleCart in Weka)
? ??? ??? ??? ??? ?? ????: confidenceFactor, pruning,
minNumObj ?
? Random Forest? ?? ???? (RandomForest in Weka)
? numTrees: ?? ? ??? ??? tree? ?? ??. ??? ?? ??
???, overfitting? ???? ??.
? ??: ???? ?? ???? (MultilayerPerceptron in Weka)
? ?? ??: hiddenLayers,
? ?? ?? ??: learningRate, momentum, trainingTime (epoch), seed
Test Options and Classifier Output
There are various metrics for evaluation
Setting the data set used for evaluation
Classifier Output
- Run information
- Classifier model (full training set)
- Evaluation results
  - General summary
  - Detailed accuracy by class
  - Confusion matrix
The output depends on the classifier
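The summary metrics in the classifier output can all be derived from the confusion matrix. The Python sketch below shows the arithmetic for a two-class problem. The four counts are hypothetical numbers chosen for illustration, not results from any data set in these slides. The function `binary_metrics` is likewise an illustrative helper, and it does not guard against zero denominators.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute common metrics from a 2x2 confusion matrix.

    tp/fn: positive instances predicted as positive/negative
    fp/tn: negative instances predicted as positive/negative
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion matrix counts.
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```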
????? (ANN) ??
- ANN? ?? ???? ??(functions-MultilayerPerceptron ??)
  - learningRate -- The amount the weights are updated.
  - momentum -- Momentum applied to the weights during updating.
  - hiddenLayers -- This defines the hidden layers of the neural network. This is a list of positive whole numbers, one for each hidden layer, comma separated.
    - Ex) 3: one hidden layer with 3 hidden nodes
    - Ex) 5,3: two hidden layers with 5 and 3 hidden nodes, respectively
    - To have no hidden layers, put a single 0 here. This will only be used if autobuild is set. There are also wildcard values: 'a' = (attribs + classes) / 2, 'i' = attribs, 'o' = classes, 't' = attribs + classes.
  - trainingTime -- The number of epochs to train through. If the validation set is non-zero then it can terminate the network early.
? Experiments
? ?? ??: ?? ????? ??? ???? ?? ???? ?? ???? ??
?? ??
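The hiddenLayers wildcards are easier to read once expanded for a concrete data set. The Python sketch below resolves a specification string into layer sizes. `resolve_hidden_layers` is a hypothetical helper, not a Weka function, and it assumes 'a' is computed with integer division.

```python
def resolve_hidden_layers(spec, num_attributes, num_classes):
    """Expand a hiddenLayers string such as '5,3', 'a' or '0' into sizes."""
    wildcards = {
        "a": (num_attributes + num_classes) // 2,  # integer division assumed
        "i": num_attributes,
        "o": num_classes,
        "t": num_attributes + num_classes,
    }
    sizes = []
    for token in spec.split(","):
        token = token.strip()
        size = wildcards[token] if token in wildcards else int(token)
        if size > 0:          # a single 0 means "no hidden layer"
            sizes.append(size)
    return sizes


# Diabetes data: 8 attributes, 2 classes.
print(resolve_hidden_layers("a", 8, 2))
print(resolve_hidden_layers("5,3", 8, 2))
print(resolve_hidden_layers("0", 8, 2))
```

For the diabetes data this means the wildcard 'a' stands for a single hidden layer of 5 nodes under the stated assumption.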
SVM ??
- SVM? ?? ???? ??(functions-SMO ??)
  - c -- The complexity parameter C.
  - kernel -- The kernel to use.
    - PolyKernel -- The polynomial kernel: K(x, y) = <x, y>^p or K(x, y) = (<x, y> + 1)^p.
      - "exponent" represents p in the equations.
    - RBFKernel -- K(x, y) = e^-(gamma * <x-y, x-y>), i.e. exp(-gamma * ||x - y||^2)
      - gamma (γ) controls the width (range of neighborhood) of the kernel
- Experiments
  - ?? ??: ??? ?? ??. ??? ?? ???? ?? ??
  - PolyKernel: testing several exponents, {1, 2, 5}
  - RBF kernel: "grid search" on C and γ using cross-validation
    - C = {0.1, 1, 10}, γ = {0.1, 1, 10}
- Reference
  - A practical guide to SVM classification (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf)
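The two kernels and the size of the suggested grid can be written out in a few lines of Python. This is a sketch of the formulas, not Weka's SMO code. The RBF kernel is written in its standard form exp(-gamma * ||x - y||^2). The function names and the `lower_order` flag (which selects the "+1" variant of the polynomial kernel) are naming choices made for this example.

```python
import math
from itertools import product


def dot(x, y):
    """Inner product <x, y> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, exponent=1.0, lower_order=False):
    """K(x, y) = <x, y>^p, or (<x, y> + 1)^p when lower_order is True."""
    base = dot(x, y) + (1.0 if lower_order else 0.0)
    return base ** exponent


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = [a - b for a, b in zip(x, y)]
    return math.exp(-gamma * dot(diff, diff))


# Grid from the slide: every (C, gamma) pair is evaluated with cross-validation.
c_values = [0.1, 1, 10]
gamma_values = [0.1, 1, 10]
grid = list(product(c_values, gamma_values))

print(poly_kernel([1, 2], [3, 4], exponent=2))
print(rbf_kernel([0, 0], [1, 0], gamma=1))
print(len(grid), "parameter combinations")
```

A larger gamma makes the RBF kernel fall off faster with distance, which is the "width" effect mentioned on the slide.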
WEKA EXPERIMENTER?
??? CLASSIFICATION
Part ‰
Using Experimenter in Weka
- Tool for 'Batch' experiments
1. Click 'New'
2. Set experiment type/iteration control
3. Set datasets / algorithms
4. Select the 'Run' tab and click 'Start'
5. If it has finished successfully, click the 'Analyse' tab and see the summary
Usages of the Experimenter
- Model selection for classification/regression
  - Various approaches
    - Repeated training/test set split
    - Repeated cross-validation (cf. double cross-validation)
    - Averaging
  - Comparison between models / algorithms
    - Paired t-test
    - On various metrics: accuracies / RMSE / etc.
- Batch and/or distributed processing
  - Load/save experiment settings
  - http://weka.wikispaces.com/Remote+Experiment
  - Multi-core support: utilize all the cores on a multi-core machine
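The paired t-test compares two models evaluated on the same runs by looking at the per-run differences. The Python sketch below shows the textbook statistic. The accuracy values are hypothetical, and the function `paired_t_statistic` is an illustrative helper. The Experimenter's own tester may apply corrections that this sketch leaves out.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (std(d) / sqrt(n)), where d are the per-run differences.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff / (std_diff / math.sqrt(n)), n - 1


# Hypothetical accuracies (%) of two models on the same three runs.
model_a = [76.0, 74.0, 78.0]
model_b = [74.0, 71.0, 74.0]

t_value, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

The t value is then compared against the t distribution with the given degrees of freedom to decide whether the difference is significant.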
Experimenter ??
Experimenter ??
Experimenter ??
? ?? ?? ?? ??(Analyse ??? 1~4 ???? ??)
? Accuracy: percent_correct ??
? F1-measure: F_measure ??
? ROC Area: Area Under ROC ??
??: Package Manager
? Explorer? ??? ?? ?? ??
??? ????:
NEURAL NETWORK PLAYGROUND
Part 3
[ Problem &
Dataset ]
?? ?? ??
[ Tool ]
Web Demo
[ Process &
Algorithm ]
?????
(No
Programming)
Neural Network ??
http://playground.tensorflow.org/
??, ??? ??: https://cloud.google.com/blog/big-data/2016/07/understanding-neural-networks-with-tensorflow-playground
Neural Network Activation Functions
(A. Graves, 2012)
???? y = max(x, 0)
Rectified Linear Unit
[??]
? Hidden unit? sparsity? ????
? Gradient vanishing ??? ??
? ??? ???? ??? ????
?? ???? ??
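The Rectified Linear Unit is simple enough to state directly in code. The Python sketch below defines the activation and its derivative; the function names are chosen for this example. It illustrates the two properties listed on the slide: negative inputs map to exactly zero (sparse hidden units), and the derivative is 1 for positive inputs, so gradients do not shrink as they pass through active units.

```python
def relu(x):
    """Rectified Linear Unit: max(x, 0)."""
    return max(x, 0.0)


def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, otherwise 0 (0 chosen at x = 0)."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 1.5]
outputs = [relu(v) for v in inputs]
grads = [relu_grad(v) for v in inputs]

print(outputs)
print(grads)
```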
??
? Data Mining, Data Science & Machine Learning
DATA MINING,
DATA SCIENCE &
MACHINE LEARNING
What is Data?
?`data¨? ??
"Data is a set of values of qualitative or quantitative variables, belonging to a set of items."
Variables: A measurement or characteristic of an item.
Qualitative: Country of origin, gender, treatment
Quantitative: Height, weight, blood pressure
What Is Data Mining?
- Data mining (knowledge discovery from data)
  - Extraction of 'interesting' patterns or knowledge from huge amounts of data
  - 'Interesting' means: non-trivial, implicit, previously unknown and potentially useful
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
- Watch out: Is everything "data mining"?
  - Simple search and query processing
  - (Deductive) expert systems
Slide from Lecture Slides of Ch. 1 by J. Han, et al., for Data Mining: Concepts and Techniques
Related Research Fields
[Figure: related fields: Data Mining, Artificial Intelligence (AI), Machine Learning (ML), Deep Learning, Data Science, Information Retrieval (IR), Knowledge Discovery from Data (KDD), Big Data Analytics, Business Intelligence]
Machine Learning & Data Mining
Slide from GECCO 2009 Tutorial on 'Large Scale Data Mining using Genetics-Based Machine Learning', by Jaume Bacardit and Xavier Llorà
Data Science
? ??? ????
? ??? ??
?? ?? ?
?? ???
???? ??
??? ??
??
Data Science
Figure source: http://nirvacana.com/thoughts/becoming-a-data-scientist/
??? ??? ?? ??? ??? ??
? ???(descriptive)
? Describe a set of data
? ???(exploratory)
? Find relationships you didn't know about
? Correlation does not imply causation
? ???(inferential)
? Use a relatively small sample of data to say something about a bigger population
? Inference is commonly the goal of statistical models
? ???(predictive)
? To use the data on some objects to predict values for another object
? Accurate prediction depends heavily on measuring the right variables
? ???(causal)
? To find out what happens to one variable when you make another variable change
? ????(mechanistic)
? Understand the exact changes in variables that lead to changes in other variables for
individual objects
J. Leek, Data Analysis - Structure of a Data Analysis, Lecture at Coursera, 2013
?? ??? ?? ???? ??
? ???(descriptive)
? ?? ?? ??(a whole population)
? ???(exploratory)
? ??? ?? ? ??? ?? ??(a random sample with many variables
measured)
? ???(inferential)
? ???? ??? ?? ? ??? ??(the right population, randomly
sampled)
? ???(predictive)
? ??? ????? ?? ???? ??? ??? ??(a training and test
data set from the same population)
? ???(causal)
? ???? ??? ??? ???? ??? ??(data from a randomized
study)
? ????(mechanistic)
? ???? ?? ??? ???? ??? ??(data about all components of the system)
J. Leek, Data Analysis - Structure of a Data Analysis, Lecture at Coursera, 2013
??? ??(data analysis)? ??
? ?? ??(define the question)
? ??? ??(dataset)
? ???? ???? ??(define the ideal data set)
? ??? ??? ??(determine what data you can access)
? ??? ??(obtain the data)
? ??? ??(clean the data)
? ??? ??? ??(exploratory data analysis)
? ?????/??? ???(Clustering / Data visualization)
? ??? ??/???(statistical prediction/modeling)
? ??/??(Classification / Prediction)
? ?? ??(interpret results)
? ??? ?? ?? ? ??(evaluation), ?? ?? ?? ??(model selection)
? ?? ?? ? ??? ?? ?? ?? ? ??(challenge results)
? ?? ?? ? ??? ??(synthesize/write up results)
? ?? ?? ??? ???? ??(create reproducible code)
J. Leek, Data Analysis - Structure of a Data Analysis, Lecture at Coursera, 2013
Raw versus processed data
- Raw data
  - The original source of the data
  - Often hard to use for data analyses
  - Data analysis includes processing
  - Raw data may only need to be processed once
- Processed data
  - Data that is ready for analysis
  - Processing can include merging, subsetting, transforming, etc.
  - There may be standards for processing
  - All steps should be recorded
Raw versus processed data
?Raw data ?Processed data

More Related Content

????? ??? ??

  • 1. ????? ??? ?? ????? 2016. 12. 06 (?) ????? ???????? ? ? ? ??? bhkim@bi.snu.ac.kr
  • 2. Contents ? ????? ???? ?? ? ?? ??? ?? ? Part 1: ???? ?? ?? ??? ? Part 2: ???? ??: Weka? ?? ?? ?? ? Part 3: ??? ????: Neural Network Playground ? ?? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 2
  • 3. ????? ???? ?? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 3
  • 4. Intro ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 4 際際滷 by Jiqiong Qiu at DevFest 2016 http://www.slideshare.net/SfeirGroup/first-step-deep-learning-by-jiqiong-qiu-devfest-2016 ???? ???? (????) ???
  • 5. ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 5 Image source: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
  • 6. ????(AI) ? ^???? ???? ???? ???? ?? ̄(???, SW)? ?? ? ??? ???? ? ?? ?? ??? ? ? ??? ?? ?? ? ??? ??? ?? ?? ??? ? ? ??? ?? ?? ? 1950: Turing¨s Paper, 1956: ^Artificial Intelligence (AI) ̄ 6(KIPS 2016 Conference Keynote by Byoung-Tak Zhang)
  • 7. ???? ??? ??: ?? ? AI ??? ??? `???` ???/??? ??? ? ? ???? ??? ????(intelligent), ??? ?? ??? ???(adaptive), ???(robust)?? ?? ? ??? ??? ????? ???? ?? ??? ? ??? ????? ??? ??? ??? ?? ????, ??? ????? ????? ??! ? ??? AI ?? ?? ? `??(learning)` ?? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 7 cat car (Sam Roweis, MLSS¨05 Lecture Note)
  • 8. ??? ??? ??? ???? ?? ???? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 8 ??(world)? ???? ??? ??? ?? ???? ?? ? ?? ?? ?? ?? ?? ?? ??? ?? Computer Vision ???? ??/?? Natural Language Processing / Understanding ?? ?? Speech Recognition + ???? ??? ?? ?? ??? ???? Intelligent Agent ??? ???? Intelligent Agent
  • 9. ?? ??? ?? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 9
  • 10. ??? ??? ??? ???? ?? ???? ???? ???? ??? ??? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 10 ??(world)? ???? ??? ??? ?? + ???? ??? ?? ?? (??) ?? (recognition) (??) ?? (recognition) (??) ?? (generation) ?
  • 11. ?? ???? ?? ???? ?? ?? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 11 (??) ?? (recognition) (??) ?? (recognition) (??) ?? (generation) ? ?? ?? ??? ????? ?? ?? ??? ??? ?? ??? ???? ?? ??? ??? ???? ??? ??? ??
  • 12. ??? ??(data analysis)? ?? ? ?? ??(define the question) ? ??? ??(dataset) ? ???? ???? ??(define the ideal data set) ? ??? ??? ??(determine what data you can access) ? ??? ??(obtain the data) ? ??? ??(clean the data) ? ??? ??? ??(exploratory data analysis) ? ???/??? ???(Clustering / Data visualization) ? (???) ??/???((statistical) prediction/modeling) ? ??/??(Classification / Prediction) ? ?? ??(interpret results) ? ??? ?? ?? ? ??(evaluation), ?? ?? ?? ??(model selection) ? ?? ?? ? ??? ?? ?? ?? ? ??(challenge results) ? ?? ?? ? ??? ??(synthesize/write up results) ? ?? ?? ??? ???? ??(create reproducible code) (c) 2008-2015, B.-H. Kim 12 J. Leek, Data Analysis C Structure of a Data Analysis, Lecture at Coursera, 2013 ?? ?? ?? ?? ?? Data mining Data Science Analytics ´ ????? ???? ?? ??? ???? ??
  • 13. ???? & ???? ??? ?? ?? Problem & Dataset Tool Process & Algorithm (Programming) ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 13
  • 14. ???? & ???? ??? ?? ?? ??? 1. ?? ?? 1. ?? ??? ?? ??? ??? ?? ???? 2. ?? ?? ?? ? ??? ??? ?? ??? ??? 3. ?? ???? ??? ??? ?? ??? ??? ??? ?? 2. ??? ?? ??? ???? (??? ??) 1. Weka? ???? ?? (NO ?????!) 2. NN Playground? ??? ?? (NO ?????!) 3. ?? ??/???? ???? 1. `??? ?? ????/??? ??¨: http://hunkim.github.io/ml/ 2. ??????????(http://nacsi.kr/) ???? ???? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 14
  • 15. ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 15 ?? ?????.
  • 16. ????? ?? ?? ??? Part 1 ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 16
  • 17. ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 17 Machine Learning
  • 18. ???? (Machine Learning, ML) ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 18 Q. If the season is dry and the pavement is slippery, did it rain? A. Unlikely, it is more likely that the sprinkler was ON ??? ???? ??? ????? ??? ??? ??? ???? ????, ?? ?? ???? ?? ?? ?? ?? ?? ??, ???? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
  • 19. ????? ? ?? ?? ?? ?? ? Supervised Learning (????, ????, ????) ? ???(?? ???? ??) ?? ??? ???? ? ??? ?? ?? ?? ???? ??? ???? ? ?) ??(classification): ???? ????(discrete) ??. ? ?) ??(regression): ???? ??? ?? ??? ?? ? Unsupervised Learning (?????, ?????, ?????) ? ???? ??? ??, ??, ??? ??? ?? ??. ???? ???? ?? ? ?? ???? ??? ?? ?? ????? ??? D={(x)} ? ?: ?? ??(dimension reduction), ???(clustering) ? Reinforcement Learning (????) ? ???? ??? ???(right/wrong)? ?? ???? ?? ?? ? ????? ?????, ??(environment) ??? ??(rewards)? ??? ?? ??? ??(action)? ????? ???? ?? ? ??? ??, ????? ??, ?? ?? ?? ? ??, ?? ??? ??? ?? ? Action selection, planning, policy learning ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 19
  • 20. ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 20 Img source: http://nkonst.com/machine-learning-explained-simple-words/
  • 21. ????(connectionism) ? ????, ???? ??? ???? ?? ?? ??(neural information processing)? ?? ??? ?? ? ?? ? ??? ??? ??? ?(network)?? ?? ??? ??? ??? ?? ? ?????(artificial neural networks) ? ????(??)? ??? ??? ?? ???? ?? ??? ?? ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 21 ?? ?? ??? ??(feedforward)
  • 22. ???(Deep Learning) ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 22 (??: Y. Bengio, Deep Learning Summer School 2015) ? ??: ??? ??, ?? ??: ???? ???? ?? GoogLeNet DeepFace Convolutional NN Stacked RBM Neural Machine Translator ? ?? ??? ?? ? ?? ??/??, ?? ??, ???? ??? ??, ?? ?? ? ????? ???? ???? ?? ??? ??? ??. ? (?? ??)?????? ?? ??? ??? ???? ? ?? ??? ?? ??? ???? ???? ?? ???? ????? ?? Deep Q Network
  • 23. ???? ???: WEKA? ?? ?? ?? Part 2 ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 23
  • 24. ? 2016 SNU Biointelligence Laboratory, http://bi.snu.ac.kr/ 24 [ Problem & Dataset ] ?? ?? ?? [ Tool ] Weka [ Process & Algorithm ] ???? ????? (No Programming)
  • 25. Outline ?Part 2-1: Weka ?? ?Part 2-2: Weka? ?? ?? ?? ?Weka ????? ??? ??/?? ???? ????, ?? ??? ??? ????? ????? ???????. (c)2008-2016, SNU Biointelligence Laboratory, http://bi.snu.ac.kr 25
  • 26. Outline ?Weka Explorer? ??? ??? ? ?? ?? ? Filter, Visualize ? Dataset: diabetes ?Weka Explorer? ??? Classification ? Dataset: Iris, diabetes ? Classifier: ????(ID3, J48, SimpleCart), Random Forest ?Weka Experimenter? ??? ?? ?? ? Dataset: diabetes ? T-test? ??? ?? ?? ?? (c)2008-2016, SNU Biointelligence Laboratory, http://bi.snu.ac.kr 26
  • 27. ?? 1 ?? ? ?? ?? ?? ???? ? ??: ?? ?? ??, ??? ?? ? ??: ??? ? ?? ??: ??? ??/???, ?? ?? ? ??: Weka? Explorer, Experimenter ? ?? ??? ? ? ???? ?? ?? ?? ?? ? ??: ??? ? ?? ??: ??? ??? ?? ? ?? ?? ? ??? ??? ? (??? ?? ??? ?? ? ??) ? ??? ??? ? ??: Weka? Explorer (c)2008-2016, SNU Biointelligence Laboratory, http://bi.snu.ac.kr 27
  • 28. Weka ?? ? ???? ???? ???? ??, ??? ??? ?? ? Weka? ?? ?? ? ??? ???(data pre-processing), ??? ??(feature selection) ? ???(clustering), ???(visualization) ? ??(classification), ????(regression), ??? ??(forecast) ? ?? ?? ??(association rules) ? S/W ?? ? ?? ? ?? ?? ?????(free & open source GNU General Public License) ? ?? analytics S/W? ??? ?: RapidMiner, MOA ? Java? ??. ??? ????? ?? ?? ? Python, C, R, Matlab ? ?? ??? ?? ?? ? ???? ? Google?? Weka? ??, ? ?? ?? ?? ? http://www.cs.waikato.ac.nz/ml/weka/ (c)2008-2016, SNU Biointelligence Lab. 28Weka (bird): http://www.arkive.org/weka/gallirallus-australis/video-au00.html
  • 29. Top 20 Most Popular Tools for Big Data, Data Mining, and Data Science (c)2008-2016, SNU Biointelligence Lab. 29Source: http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html Red: Free/Open Source tools Green: Commercial tools Fuchsia: Hadoop/Big Data tools
  • 30. Weka? ???? ????? (c)2008-2016, SNU Biointelligence Lab. 30 Explorer: ??? ?? ??? ? ??? ?? ?? ? ?? ?? ??. ?????, ?? ?? ?? KnowledgeFlow: ??? ?? ??? ?? ??? ???? ????? ???? ?? ?? Experimenter: ?? ? ?? ??? ?? ??. ?? ?? ??. - ??? ???? ? ???? ?? - ?? ???-???? ?? ?? ?? - ?? ?? ? ??? ?? - ??? ??? ?? ?? Simple CLI: ?? ?????? ????? ???? ???. Weka? ?? ??? ???? ?? ?? ? ? ?? ?? Workbench: ?? ?????? ??? ?? ????? (3.8.0?? ??)
  • 31. ??: ??(iris) ??(classification) (c)2008-2016, SNU Biointelligence Lab. 31 Iris virginicaIris versicolorIris setosa ??? ??? ???
  • 32. ??: ??(iris) ??(classification) ? ??? ??(Define features or attributes) ? Sepal length, sepal width, petal length, petal width ? ?? ??(Class label): ??(iris)? ? ??. Setosa, versicolor, ? virginica ? ?? ?? ? ??? ?? ? ? ?? ?? ?? 50?? ?? ?? (1935?) ? Data table : 150 samples (or instances) * 5 attributes ? R. Fisher ?? 1936? ?? ???? ? ???? linear discriminant model ? ??? ? ??: ?? ???? ?? ? ???? ?? ? ? ?? ?? ?????? ??: ???, ?? ??, SVM ? ? ???? ? ???? ??? ???? ?? ? ?? ?? ?? ? ?? ?? ? ??? ?? ?? ?? ? ?? ??? ???? ?? ?? ??(algorithm + parameter setting) ?? ? ?? (c)2008-2016, SNU Biointelligence Lab. 32
  • 33. ??: ??(iris) ???? ? Just open "iris.arff" in the 'data' folder
  • 34. Weka ??? ?? (.ARFF)
      @RELATION iris
      @ATTRIBUTE sepallength REAL
      @ATTRIBUTE sepalwidth real
      @ATTRIBUTE petallength NUMERIC
      @ATTRIBUTE petalwidth numeric
      @ATTRIBUTE class {Iris-setosa, Iris-versicolor, Iris-virginica}
      @DATA
      5.1, 3.5, 1.4, 0.2, Iris-setosa
      4.9, 3.0, 1.4, 0.2, Iris-setosa
      4.7, 3.2, 1.3, 0.2, Iris-setosa
      …
      7.0, 3.2, 4.7, 1.4, Iris-versicolor
      6.4, 3.2, 4.5, 1.5, Iris-versicolor
      6.9, 3.1, 4.9, 1.5, Iris-versicolor
      …
    Figure annotations: dataset name (@RELATION), attribute name and attribute type (@ATTRIBUTE), ??? (CSV format) ?? (@DATA).
    Note: Excel? ???? CSV ?? ?? ?, ??? ???? ?? arff ??? ?? ?? ??
  • 35. ARFF Example
      %
      % ARFF file for weather data with some numeric features
      %
      @relation weather
      @attribute outlook {sunny, overcast, rainy}
      @attribute temperature numeric
      @attribute humidity numeric
      @attribute windy {true, false}
      @attribute play? {yes, no}
      @data
      sunny, 85, 85, false, no
      sunny, 80, 90, true, no
      overcast, 83, 86, false, yes
      ...
    Slide from Data Mining: Practical Machine Learning Tools and Techniques (Chapter 2)
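The ARFF layout shown on the two slides above (a header of `@relation` and `@attribute` lines, then comma-separated rows after `@data`) can be read with a few lines of code. The following is only a minimal sketch, assuming dense rows, no quoted names, and no special handling of missing values; the function name `parse_arff` is hypothetical and this is not how Weka itself loads files.

```python
def parse_arff(text):
    """Parse a small, dense ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []        # each row is a list of raw string values
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()  # keywords are case-insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


WEATHER = """\
% ARFF file for weather data with some numeric features
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play? {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
"""

relation, attributes, rows = parse_arff(WEATHER)
print(relation)       # name given on the @relation line
print(attributes[0])  # first (name, type) pair
print(rows[2])        # third data row as a list of strings
```

Values are kept as strings on purpose: converting them to numbers or nominal labels depends on the declared attribute type, which a fuller reader would handle per column.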
  • 36. Preprocess ??? ??? ??? ?? ?? ? ??? ??(current relation) ? ??? ??(remove attributes) ? ??? ? ??? ?? ??(selected attribute) ? ?? ???? ???? class label ?? ???(Visualize All) ? 'Filter'? ??? preprocessing
  • 37. ?? ?? ?? ? ?? ?? ??. Pipeline figure, stages: Dataset → ??/Cleaning → Feature Manipulation → Classification / Regression → Evaluation. Labels in the figure: data matrix (row: instance, col: feature); fill missing values, normalization, standardization; feature selection, PCA; SVM, decision tree, neural networks; accuracy + cross-validation, ROC curve, AUC (area under ROC curve). ?? ??? ?? ??? ???? ?? ? ???
  • 38. Approaching (Almost) Any Machine Learning Problem. Figure from: A. Thakur and A. Krohn-Grimberghe, AutoCompete: A Framework for Machine Learning Competitions, AutoML Workshop, International Conference on Machine Learning 2015. http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
  • 39. ??: ?? ??? ??? ??
      • Description
          • Pima Indians have the highest prevalence of diabetes in the world
          • We will build classification models that diagnose whether a patient shows signs of diabetes
          • http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
      • Configuration of the data set
          • 768 instances
          • 8 attributes: age, number of times pregnant, results of medical tests/analysis; all numeric (integer or real-valued)
          • Class label = 1 (positive example): interpreted as "tested positive for diabetes"; 268 instances
          • Class label = 0 (negative example): 500 instances
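The class counts on this slide (268 positive, 500 negative) already fix a useful reference point: a model that always predicts the majority class. The short calculation below is a sketch of that baseline, not part of the original slides; in Weka a comparable baseline is what the ZeroR classifier gives.

```python
# Class counts taken from the slide.
positives = 268  # class label 1, "tested positive for diabetes"
negatives = 500  # class label 0

total = positives + negatives
# Accuracy of always predicting the more frequent class.
majority_baseline = max(positives, negatives) / total

print(total)                        # 768 instances in all
print(round(majority_baseline, 3))  # 0.651
```

Any classifier trained on this data set should be judged against roughly 65% accuracy rather than against 50%.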
  • 40. WEKA? ??? ??? ??? ? ??? ??(DESCRIPTIVE ANALYSIS) Part
  • 41. Preprocess ??? ??? ?? ?? ? ??? ??(current relation) ? ??? ??(remove attributes) ? ??? ? ??? ?? ??(selected attribute) ? ?? ???? ???? class label ?? ???(Visualize All) ? 'Filter'? ??? preprocessing: Part V?? ??
  • 42. Weka - Explorer - Preprocess - Filter ? ??? ??? ??? ? ?? ?? ?? ???? ? Fill in missing values ? weka.filters.unsupervised.attribute.ReplaceMissingValues ?? ? Standardization for all the attributes: x → z ? ?? ? weka.filters.unsupervised.attribute.Standardize ?? ? Data reduction using PCA ? weka.filters.unsupervised.attribute.PrincipalComponents ?? ? ???? ? maximumAttributes? 10~50 ?? ? ??? ??? ??? ?? ? Check the effect of PCA using 'Visualize - Plot Matrix' ??? ?
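The standardization step named on this slide maps each attribute value x to a z-score, z = (x - mean) / standard deviation, so that every attribute ends up with zero mean and unit variance. The sketch below shows the arithmetic for one column; the function name `standardize` is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption rather than a statement about Weka's filter.

```python
import statistics


def standardize(column):
    """Return z-scores for one numeric attribute (a list of floats)."""
    mean = statistics.fmean(column)
    std = statistics.stdev(column)  # sample standard deviation (assumption)
    return [(x - mean) / std for x in column]


z_scores = standardize([2.0, 4.0, 6.0])
print(z_scores)  # values are centred on 0 and scaled by the spread
```

Standardizing before PCA matters because PCA is driven by variance: without it, an attribute measured on a large scale dominates the leading components.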
  • 43. Weka - Explorer - Visualize ? ??? ??? ?? (descriptive analysis) ? Check the effect of PCA using 'Visualize - Plot Matrix' ? PCA ?? ?/? ? plot matrix? ???? PCA? ??? ???? ?? ? PCA ?? ?(401?? ???) ? ?(???? ??? ?? ??? ???)? ?? Plot Matrix ??? ?? ????, ?? ? ??? ??
  • 45. ?? ???? - ????(Decision Trees) ? J48 (C4.5? Java ?? ??) ? ?? ?? ???? ?? ??? '??' ??? ?? ? ?? ? Weka?? ????: classifiers-trees-J48
  • 46. ??: ???(Neural Networks) ? MLP (Multilayer Perceptron) ? ????? ?? ??? ??? ??? ?? ???? ? Weka?? ????: classifiers-functions-MultilayerPerceptron. Figure from Andrew Ng's Machine Learning Lecture Notes, on Coursera, 2013-1
  • 47. Weka? ?? ???? - ???? ?
      • Load a file that contains the training data by clicking the 'Open file' button ('ARFF' or 'CSV' formats are readable)
      • Click the 'Classify' tab
      • Click the 'Choose' button
      • Select 'weka - function - MultilayerPerceptron'
      • Click 'MultilayerPerceptron'
      • Set parameters for MLP
      • Set parameters for Test
      • Click 'Start' for learning
  • 48. 48 ?? ????? ???? ?? ? ???? ??(Parameter Setting) = ??? ??(Car Tuning) ? ?? ?? ?? ???? ?? ? ???? ??? ?? ??? ??????? ???? ??? ??? ?? ?? ?? ?? ? ????? ?? ???? (J48, SimpleCart in Weka) ? ??? ??? ??? ??? ?? ????: confidenceFactor, pruning, minNumObj ? ? Random Forest? ?? ???? (RandomForest in Weka) ? numTrees: ?? ? ??? ??? tree? ?? ??. ??? ?? ?? ???, overfitting? ???? ??. ? ??: ???? ?? ???? (MultilayerPerceptron in Weka) ? ?? ??: hiddenLayers, ? ?? ?? ??: learningRate, momentum, trainingTime (epoch), seed (c)2008-2016, SNU Biointelligence Laboratory, http://bi.snu.ac.kr
  • 49. Test Options and Classifier Output. Figure annotations: "Setting the data set used for evaluation"; "There are various metrics for evaluation".
  • 50. Classifier Output (the output depends on the classifier)
      • Run information
      • Classifier model (full training set)
      • Evaluation results
          • General summary
          • Detailed accuracy by class
          • Confusion matrix
  • 51. ????? (ANN) ??
      • ANN? ?? ???? ??(functions-MultilayerPerceptron ??)
          • learningRate: the amount the weights are updated.
          • momentum: momentum applied to the weights during updating.
          • hiddenLayers: defines the hidden layers of the neural network as a comma-separated list of positive whole numbers, one per hidden layer.
              • Ex) 3: one hidden layer with 3 hidden nodes
              • Ex) 5,3: two hidden layers with 5 and 3 hidden nodes, respectively
              • To have no hidden layers, put a single 0 here. This will only be used if autobuild is set. There are also wildcard values: 'a' = (attribs + classes) / 2, 'i' = attribs, 'o' = classes, 't' = attribs + classes.
          • trainingTime: the number of epochs to train through. If the validation set is non-zero, training can terminate early.
      • Experiments
          • ?? ??: ?? ????? ??? ???? ?? ???? ?? ???? ?? ?? ??
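The hiddenLayers notation described above (a comma-separated list of layer sizes, a single 0 for no hidden layer, and the wildcards 'a', 'i', 'o', 't') can be made concrete with a small resolver. This is an illustrative sketch only: the function `resolve_hidden_layers` is hypothetical, it is not Weka code, and rounding 'a' down with integer division is an assumption.

```python
def resolve_hidden_layers(spec, n_attribs, n_classes):
    """Turn a hiddenLayers string such as "5,3" or "a" into a list of layer sizes."""
    wildcards = {
        "a": (n_attribs + n_classes) // 2,  # integer division is an assumption
        "i": n_attribs,
        "o": n_classes,
        "t": n_attribs + n_classes,
    }
    sizes = []
    for token in spec.split(","):
        token = token.strip()
        sizes.append(wildcards[token] if token in wildcards else int(token))
    if sizes == [0]:  # a single 0 means "no hidden layers"
        return []
    return sizes


# Iris has 4 input attributes and 3 classes.
print(resolve_hidden_layers("3", 4, 3))    # one hidden layer
print(resolve_hidden_layers("5,3", 4, 3))  # two hidden layers
print(resolve_hidden_layers("a", 4, 3))    # wildcard resolved from the data
print(resolve_hidden_layers("0", 4, 3))    # no hidden layer
```

Seen this way, the wildcards are just data-dependent defaults, which is convenient when the same experiment setup is reused across data sets of different widths.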
  • 52. SVM ??
      • SVM? ?? ???? ??(functions-SMO ??)
          • c: the complexity parameter C.
          • kernel: the kernel to use.
              • PolyKernel: the polynomial kernel, K(x, y) = <x, y>^p or K(x, y) = (<x, y> + 1)^p; "exponent" represents p in the equations.
              • RBFKernel: K(x, y) = exp(-gamma * <x-y, x-y>), i.e. exp(-gamma * ||x - y||^2); gamma (γ) controls the width (range of neighborhood) of the kernel.
      • Experiments
          • ?? ??: ??? ?? ??. ??? ?? ???? ?? ??
          • PolyKernel: test several exponents, {1, 2, 5}
          • RBF kernel: "grid search" on C and γ using cross-validation, with C = {0.1, 1, 10}, γ = {0.1, 1, 10}
      • Reference
          • A practical guide to SVM classification (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf)
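To make the two kernels and the parameter grid on this slide concrete, here is a small sketch in plain Python. It uses the standard RBF form exp(-gamma * ||x - y||^2), where the squared distance is the inner product <x-y, x-y>. The function names and the `add_one` flag are hypothetical and chosen for illustration; this is not the SMO implementation.

```python
import math
from itertools import product


def dot(x, y):
    """Inner product <x, y> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, exponent=1, add_one=False):
    """Polynomial kernel: <x, y>^p, or (<x, y> + 1)^p when add_one is True."""
    base = dot(x, y) + 1 if add_one else dot(x, y)
    return base ** exponent


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    diff = [a - b for a, b in zip(x, y)]
    return math.exp(-gamma * dot(diff, diff))


# The 3 x 3 grid of (C, gamma) pairs from the slide; each pair would be
# scored by cross-validation and the best-scoring pair kept.
grid = list(product([0.1, 1, 10], [0.1, 1, 10]))

print(poly_kernel([1, 2], [3, 4], exponent=2))
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))
print(len(grid))
```

A larger gamma makes the RBF kernel fall off faster with distance, so each training point influences a smaller neighborhood; that is why gamma and C are usually tuned together.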
  • 54. Using Experimenter in Weka
      • Tool for 'Batch' experiments
      • Click 'New'
      • Set experiment type / iteration control
      • Set datasets / algorithms
      • Select the 'Run' tab and click 'Start'
      • If it has finished successfully, click the 'Analyse' tab and see the summary
  • 55. Usages of the Experimenter
      • Model selection for classification/regression
          • Various approaches: repeated training/test set split; repeated cross-validation (cf. double cross-validation); averaging
          • Comparison between models / algorithms: paired t-test on various metrics (accuracy, RMSE, etc.)
      • Batch and/or distributed processing
          • Load/save experiment settings
          • http://weka.wikispaces.com/Remote+Experiment
          • Multi-core support: utilize all the cores on a multi-core machine
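The paired t-test mentioned above compares two algorithms on the same train/test splits by testing whether the mean of their per-split score differences is zero. The sketch below is the plain textbook version; the Experimenter may apply a corrected variant for resampled data, so treat this as an illustration of the idea only. The accuracy values are made up for the example.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-split scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies of two classifiers on the same five splits.
accuracy_a = [0.80, 0.82, 0.78, 0.81, 0.79]
accuracy_b = [0.75, 0.80, 0.76, 0.78, 0.74]

t_value, dof = paired_t_statistic(accuracy_a, accuracy_b)
print(t_value, dof)
```

The resulting t value is compared against the t distribution with the given degrees of freedom to decide whether the difference is significant at the chosen level.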
  • 56. Experimenter ??
  • 57. Experimenter ??
  • 58. Experimenter ?? ? ?? ?? ?? ??(Analyse ??? 1~4 ???? ??) ? Accuracy: percent_correct ?? ? F1-measure: F_measure ?? ? ROC Area: Area Under ROC ?? (C) 2014-2015, B.-H. Kim
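Two of the metrics listed here, accuracy and the F1-measure, follow directly from the four cells of a binary confusion matrix. The sketch below shows the arithmetic with made-up counts; the function name and the returned keys are hypothetical, with accuracy also given on a 0 to 100 scale to mirror a percent-correct style field.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of predicted positives, how many are correct
    recall = tp / (tp + fn)     # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "percent_correct": 100 * accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Accuracy alone can look good on imbalanced data, which is why F1 and the area under the ROC curve are reported alongside it.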
  • 59. ??: Package Manager ? Explorer? ??? ?? ?? ??
  • 60. ??? ????: NEURAL NETWORK PLAYGROUND Part 3
  • 61. [ Problem & Dataset ] ?? ?? ?? [ Tool ] Web Demo [ Process & Algorithm ] ????? (No Programming)
  • 62. Neural Network ??: http://playground.tensorflow.org/ ??, ??? ??: https://cloud.google.com/blog/big-data/2016/07/understanding-neural-networks-with-tensorflow-playground
  • 63. Neural Network Activation Functions (A. Graves, 2012) ???? ReLU(x) = max(x, 0) Rectified Linear Unit [??] ? Hidden unit? sparsity? ???? ? Gradient vanishing ??? ?? ? ??? ???? ??? ???? ?? ???? ??
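The rectified linear unit on this slide is simple enough to write out directly. The sketch below shows the function and its gradient, which relates to the two properties the slide mentions: negative inputs map to exactly zero (sparse hidden units), and the gradient is 1 for positive inputs, so it does not shrink as it passes through the unit. Taking the gradient at exactly 0 to be 0 is a convention assumed here.

```python
def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return max(x, 0.0)


def relu_grad(x):
    """Derivative of ReLU; the value at x == 0 is set to 0 by convention."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, 0.0, 3.5]
outputs = [relu(v) for v in inputs]
grads = [relu_grad(v) for v in inputs]
print(outputs)  # negative inputs are clipped to zero
print(grads)    # gradient is 1 only where the unit is active
```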
  • 64. ?? ? Data Mining, Data Science & Machine Learning
  • 66. DATA MINING, DATA SCIENCE & MACHINE LEARNING (c) 2008-2015, B.-H. Kim 66
  • 67. What is Data? ?'data'? ?? "Data is a set of values of qualitative or quantitative variables, belonging to a set of items." Variables: a measurement or characteristic of an item. Qualitative: country of origin, gender, treatment. Quantitative: height, weight, blood pressure.
  • 68. What Is Data Mining?
      • Data mining (knowledge discovery from data)
          • Extraction of 'interesting' patterns or knowledge from huge amounts of data
          • 'Interesting' means: non-trivial, implicit, previously unknown, and potentially useful
      • Alternative names
          • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
      • Watch out: is everything "data mining"?
          • Simple search and query processing
          • (Deductive) expert systems
      Slide from the lecture slides of Ch. 1 by J. Han, et al., for Data Mining: Concepts and Techniques
  • 69. Related Research Fields: Data Mining, Artificial Intelligence (AI), Machine Learning (ML), Deep Learning, Data Science, Information Retrieval (IR), Knowledge Discovery from Data (KDD), Big Data Analytics, Business Intelligence
  • 70. Machine Learning & Data Mining. Slide from GECCO 2009 Tutorial on 'Large Scale Data Mining using Genetics-Based Machine Learning', by Jaume Bacardit and Xavier Llorà
  • 71. Data Science ? ??? ???? ? ??? ?? ?? ?? ? ?? ??? ???? ?? ??? ?? ??
  • 72. Data Science. Figure source: http://nirvacana.com/thoughts/becoming-a-data-scientist/
  • 73. ??? ??? ?? ??? ??? ??
      • ???(descriptive): describe a set of data
      • ???(exploratory): find relationships you didn't know about; correlation does not imply causation
      • ???(inferential): use a relatively small sample of data to say something about a bigger population; inference is commonly the goal of statistical models
      • ???(predictive): use the data on some objects to predict values for another object; accurate prediction depends heavily on measuring the right variables
      • ???(causal): find out what happens to one variable when you make another variable change
      • ????(mechanistic): understand the exact changes in variables that lead to changes in other variables for individual objects
      J. Leek, Data Analysis - Structure of a Data Analysis, Lecture at Coursera, 2013
  • 74. ?? ??? ?? ???? ?? ? ???(descriptive) ? ?? ?? ??(a whole population) ? ???(exploratory) ? ??? ?? ? ??? ?? ??(a random sample with many variables measured) ? ???(inferential) ? ???? ??? ?? ? ??? ??(the right population, randomly sampled) ? ???(predictive) ? ??? ????? ?? ???? ??? ??? ??(a training and test data set from the same population) ? ???(causal) ? ???? ??? ??? ???? ??? ??(data from a randomized study) ? ????(mechanistic) ? ???? ?? ??? ???? ??? ??(data about all components of the system). J. Leek, Data Analysis - Structure of a Data Analysis, Lecture at Coursera, 2013
  • 75. ??? ??(data analysis)? ?? ? ?? ??(define the question) ? ??? ??(dataset) ? ???? ???? ??(define the ideal data set) ? ??? ??? ??(determine what data you can access) ? ??? ??(obtain the data) ? ??? ??(clean the data) ? ??? ??? ??(exploratory data analysis) ? ?????/??? ???(clustering / data visualization) ? ??? ??/???(statistical prediction/modeling) ? ??/??(classification / prediction) ? ?? ??(interpret results) ? ??? ?? ?? ? ??(evaluation), ?? ?? ?? ??(model selection) ? ?? ?? ? ??? ?? ?? ?? ? ??(challenge results) ? ?? ?? ? ??? ??(synthesize/write up results) ? ?? ?? ??? ???? ??(create reproducible code). J. Leek, Data Analysis - Structure of a Data Analysis, Lecture at Coursera, 2013
  • 76. Raw versus processed data
      • Raw data
          • The original source of the data
          • Often hard to use for data analyses
          • Data analysis includes processing
          • Raw data may only need to be processed once
      • Processed data
          • Data that is ready for analysis
          • Processing can include merging, subsetting, transforming, etc.
          • There may be standards for processing
          • All steps should be recorded
  • 77. Raw versus processed data (figure): raw data, processed data