Machine/Deep Learning
with Theano
Softmax classification : Multinomial classification
Application & Tips : Learning rate, data preprocessing, overfitting
Deep Neural Nets for Everyone
Multinomial Classification
Softmax classification
Logistic Regression
$H_L(X) = WX$
$H_L(X) = Z$
$g(Z) = \dfrac{1}{1 + e^{-Z}}$
$H_R(X) = g(H_L(X))$
[Diagram: X → W → Z → Ŷ]
$\hat{Y}$ : Prediction ( 0 ~ 1 )
$Y$ : Real Value ( 0 or 1 )
Binomial Classification
μ™Όμͺ½μ˜ 그림은 원 일까?
yes/no
Binomial Classification
π‘₯1 β‡’ λͺ¨μ„œλ¦¬μ˜ κ²½ν–₯μ„±
π‘₯2 β‡’ μ§μ„ μ˜ κ²½ν–₯μ„±
π‘₯1
π‘₯2
원
𝑋
π‘Š
𝑍 π‘Œ
λ‹€κ°ν˜•
Multinomial Classification
Which of A / B / C is the figure on the left?
[Figure: samples on the $x_1$, $x_2$ axes; one binary decision boundary is drawn per class]
[Diagram: three binary classifiers, one per class, each of the form X → W → Z → Ŷ]
Multinomial Classification
$Z = \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = w_1 x_1 + w_2 x_2 + w_3 x_3$
Multinomial Classification
$Z_A = \begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3$
$Z_B = \begin{bmatrix} w_{B1} & w_{B2} & w_{B3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3$
$Z_C = \begin{bmatrix} w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3$
Multinomial Classification
Stacking the three classifiers into one weight matrix:
$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1}x_1 + w_{A2}x_2 + w_{A3}x_3 \\ w_{B1}x_1 + w_{B2}x_2 + w_{B3}x_3 \\ w_{C1}x_1 + w_{C2}x_2 + w_{C3}x_3 \end{bmatrix} = \begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix}$
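The point of stacking the weight vectors is that all three hypotheses come out of a single matrix-vector product. A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

# Hypothetical weight matrix: one row per class A, B, C.
W = np.array([[ 1.0,  0.5, -0.2],   # w_A1, w_A2, w_A3
              [ 0.3, -1.1,  0.8],   # w_B1, w_B2, w_B3
              [-0.5,  0.2,  0.9]])  # w_C1, w_C2, w_C3
X = np.array([1.0, 2.0, 3.0])       # x_1, x_2, x_3

Z = W @ X   # [H_A(X), H_B(X), H_C(X)] in one product
print(Z)
```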
Multinomial Classification
Example:
$\begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix} = \begin{bmatrix} 150 \\ 5 \\ -0.1 \end{bmatrix}$
Multinomial Classification : Softmax Function
Score → Probability
$H_A(X) = Z_A,\; H_B(X) = Z_B,\; H_C(X) = Z_C \;\longrightarrow\; \hat{Y}_A,\; \hat{Y}_B,\; \hat{Y}_C$
$\mathrm{softmax}(Z_i) = \dfrac{e^{Z_i}}{\sum_j e^{Z_j}}$
(1) $0 \le \hat{Y}_i \le 1$   (2) $\sum_i \hat{Y}_i = 1$
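A minimal sketch of the softmax function, applied to the example scores from the previous slide (shifting by the maximum score is a standard trick to avoid overflow in $e^{Z}$):

```python
import numpy as np

def softmax(z):
    # softmax(Z_i) = e^{Z_i} / sum_j e^{Z_j}; subtracting max(z) avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([150.0, 5.0, -0.1])  # Z_A, Z_B, Z_C from the example
probs = softmax(scores)
print(probs)        # each value in [0, 1] ...
print(probs.sum())  # ... and they sum to 1
```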
Multinomial Classification
[Diagram: X → W_A → Z_A, X → W_B → Z_B, X → W_C → Z_C]
softmax: $(\hat{Y}_A, \hat{Y}_B, \hat{Y}_C) = (0.8,\ 0.15,\ 0.05)$
one-hot encoding (find maximum): $(1.0,\ 0.0,\ 0.0)$
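The final hard decision is just an arg-max over the softmax output; a small sketch using the probabilities shown above:

```python
import numpy as np

probs = np.array([0.8, 0.15, 0.05])  # softmax output for classes A, B, C

one_hot = np.zeros_like(probs)       # "one-hot encoding": keep only the maximum
one_hot[np.argmax(probs)] = 1.0
print(one_hot)                       # [1. 0. 0.]
```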
Cost Function
Cross Entropy Function
Entropy Function
$H(p) = -\sum_x p(x) \log p(x)$
• An index of the uncertainty contained in the probability distribution p
• The larger this value, the more chaotic the distribution: no consistent direction or regularity
• The amount of information (bits) needed to represent p
Cross Entropy Function
𝐻 𝑝, π‘ž = βˆ’ 𝑝(π‘₯) log π‘ž(π‘₯)
β€’ 두 ν™•λ₯  뢄포 p, q 사이에 μ‘΄μž¬ν•˜λŠ” μ •λ³΄λŸ‰μ„ κ³„μ‚°ν•˜λŠ” 방법
β€’ p->q둜 정보λ₯Ό λ°”κΎΈκΈ° μœ„ν•΄ ν•„μš”ν•œ μ •λ³΄λŸ‰(bit)
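Both quantities can be checked numerically. A small sketch using the coin-betting example from the editor's notes (log base 2, so the result is in bits):

```python
import numpy as np

def entropy(p):
    # H(p) = -sum_x p(x) log p(x), with 0 * log 0 taken as 0
    p = np.asarray(p)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) log q(x)
    p, q = np.asarray(p), np.asarray(q)
    keep = p > 0
    return -(p[keep] * np.log2(q[keep])).sum()

print(entropy([0.5, 0.5]))                    # fair coin: 1.0 bit (maximum uncertainty)
print(entropy([0.8, 0.2]))                    # biased coin: ~0.72 bits (less uncertainty)
print(cross_entropy([0.8, 0.2], [0.5, 0.5]))  # bits needed if we assume the wrong model
```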
Cross Entropy Cost Function
[Diagram: X → W_A → Z_A → Ŷ_A, X → W_B → Z_B → Ŷ_B, X → W_C → Z_C → Ŷ_C]
$\hat{Y}$ : Prediction ( 0 ~ 1 )
$Y$ : Real Value ( 0 or 1 )
$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$
Cross Entropy Cost Function
$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$
$\hat{Y} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad Y = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$
$-\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \odot \log \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = -\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \odot \begin{bmatrix} 0 \\ -\infty \\ -\infty \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \;\Rightarrow\; \text{cost} = 0$
(taking $0 \cdot \log 0 = 0$)
Cross Entropy Cost Function
$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$
$\hat{Y} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad Y = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$
$-\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \odot \log \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = -\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \odot \begin{bmatrix} 0 \\ -\infty \\ -\infty \end{bmatrix} = \begin{bmatrix} 0 \\ \infty \\ 0 \end{bmatrix} \;\Rightarrow\; \text{cost} = \infty$
A correct prediction costs 0; a wrong prediction costs infinity, which is what makes cross entropy usable as a cost function.
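Both worked examples can be reproduced directly; a minimal sketch (the $0 \cdot \log 0 = 0$ convention from the slides is made explicit by skipping terms where $Y_i = 0$):

```python
import numpy as np

def D(Y_hat, Y):
    # D(Y_hat, Y) = -sum_i Y_i log(Y_hat_i), skipping terms where Y_i = 0
    keep = Y > 0
    with np.errstate(divide="ignore"):   # allow log(0) = -inf
        return -(Y[keep] * np.log(Y_hat[keep])).sum()

print(D(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])))  # correct prediction -> 0.0
print(D(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])))  # wrong prediction   -> inf
```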
Logistic Cost VS Cross Entropy
In binomial classification, the real data and $H(x)$ can each take only two kinds of values:
$Y \in \left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}$
These vectors can be written as
$\hat{Y} = \begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix}, \qquad Y = \begin{bmatrix} y \\ 1 - y \end{bmatrix}, \quad y \in \{0, 1\}$
Logistic Cost VS Cross Entropy
Substituting these into the Cross Entropy Cost Function:
$H(H(x), y) = -\begin{bmatrix} y \\ 1 - y \end{bmatrix} \cdot \log \begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix}$
$= -\big( y \log H(x) + (1 - y) \log(1 - H(x)) \big)$
$= -y \log H(x) - (1 - y) \log(1 - H(x))$
$= C(H(x), y)$
That is, the logistic cost is exactly the cross entropy cost for the two-class case.
Cross Entropy Cost Function
$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) = -\frac{1}{N} \sum_n \sum_i Y_i \log \hat{Y}_i$
The cost over the whole training set is the average of the per-example costs over all N examples.
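Since the deck is built around Theano, here is a minimal sketch of this cost as a Theano expression (the 3-feature, 3-class shapes are assumptions for illustration):

```python
import numpy as np
import theano
import theano.tensor as T

X = T.matrix("X")                        # N x 3 inputs
Y = T.matrix("Y")                        # N x 3 one-hot labels
W = theano.shared(np.zeros((3, 3)), name="W")

Y_hat = T.nnet.softmax(T.dot(X, W))      # row-wise softmax: N x 3 predictions
L = -T.mean(T.sum(Y * T.log(Y_hat), axis=1))  # average cross entropy over the N examples

cost_fn = theano.function([X, Y], L)     # compiled function that evaluates L
```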
Application & Tips
Learning Rate
Data Preprocessing
Overfitting
Gradient Descent Function
$W := W - \alpha \frac{\partial}{\partial W} \mathrm{Cost}(W)$
$\alpha$ : Learning Rate
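In Theano, this update can be built directly from the cost via symbolic differentiation. A minimal sketch, with a squared-error cost standing in for $\mathrm{Cost}(W)$:

```python
import numpy as np
import theano
import theano.tensor as T

W = theano.shared(np.zeros(3), name="W")
X = T.vector("X")
Y = T.scalar("Y")

cost = (T.dot(X, W) - Y) ** 2            # a stand-in differentiable Cost(W)
alpha = 0.01                             # learning rate

grad = T.grad(cost, W)                   # dCost/dW, derived symbolically
train = theano.function([X, Y], cost,
                        updates=[(W, W - alpha * grad)])  # W := W - alpha * dCost/dW
```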
Learning rate : Overshooting
[Figure: $L(W)$ vs $W$; too large a learning rate makes the updates overshoot the minimum and diverge]
Learning rate : Too small
[Figure: $L(W)$ vs $W$; too small a learning rate makes progress toward the minimum very slow]
Data Preprocessing
[Figure: contour plot of $L(W)$ over $w_1$ and $w_2$; without preprocessing the contours are strongly elongated]
$W := W - \alpha \frac{\partial}{\partial W} \mathrm{Cost}(W)$
When the same $\alpha$ affects each weight very differently (because the features, and hence the loss surface, are on very different scales), it becomes hard to find a single appropriate learning rate.
Data Preprocessing : Standardization
$w_i' = \dfrac{w_i - \mu_i}{\sigma_i}$
$\mu_i$ : mean of $w$
$\sigma_i$ : standard deviation of $w$
Overfitting
• The model becomes excessively optimized to the training data
• and consequently does not perform well on real data
Overfitting
[Figure: two decision boundaries on the $x_1$, $x_2$ "circle" data: a smooth boundary that generalizes vs. a contorted boundary that fits every training point]
Overfitting
Solution:
• Train with a larger amount of training data.
• Reduce the number of features ($x_i$).
• Regularization
Overfitting : Regularization
$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) + \lambda \sum W^2$
$\lambda$ : Regularization Strength
• Prevents the weights from taking very large values ⇒ keeps the fitted function from bending too sharply.
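A minimal sketch of adding the L2 penalty to an existing data cost (the weight values and lambda are made up):

```python
import numpy as np

def regularized_cost(data_cost, W, lam):
    # L = (1/N) sum_n D_n + lambda * sum(W^2)
    return data_cost + lam * np.sum(W ** 2)

W = np.array([0.5, -3.0, 1.2])              # hypothetical weights
print(regularized_cost(0.8, W, lam=0.0))    # no penalty: 0.8
print(regularized_cost(0.8, W, lam=0.1))    # large weights now add to the cost
```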
Application & Tips
Learning and Test data sets
Training, validation and test sets
• The model has already memorized the answers for the training data, so the training data cannot tell us whether it works well on real data. ⇒ Test data needed!
• To find an appropriate learning rate and regularization strength for the trained machine, a validation step is needed. ⇒ Validation data needed!
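A minimal sketch of carving one dataset into the three sets (the 60/20/20 proportions are an assumption, not a rule from the slides):

```python
import numpy as np

def split(data, train=0.6, val=0.2):
    # Shuffle, then cut into training / validation / test sets.
    idx = np.random.permutation(len(data))
    n_tr = int(len(data) * train)
    n_val = int(len(data) * val)
    return (data[idx[:n_tr]],              # fit the weights on this
            data[idx[n_tr:n_tr + n_val]],  # tune learning rate / lambda on this
            data[idx[n_tr + n_val:]])      # touch only once, at the very end

train_set, val_set, test_set = split(np.arange(100))
print(len(train_set), len(val_set), len(test_set))   # 60 20 20
```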
Online Learning
[Diagram: Data → Model]
• When there is too much data to train on at once, split it up and train on the pieces one after another.
• Also used when data keeps flowing in continuously.
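A minimal sketch of the idea: the model is updated chunk by chunk as data arrives, instead of on the whole dataset at once (the stream, the linear model, and the update rule are all illustrative assumptions):

```python
import numpy as np

def data_stream(n_chunks=100, chunk_size=32):
    # Hypothetical source that yields one chunk of (X, y) at a time.
    for _ in range(n_chunks):
        X = np.random.randn(chunk_size, 3)
        y = X @ np.array([1.0, -2.0, 0.5])   # hidden "true" weights
        yield X, y

W = np.zeros(3)
alpha = 0.05
for X, y in data_stream():
    grad = 2 * X.T @ (X @ W - y) / len(X)    # mean-squared-error gradient on this chunk
    W -= alpha * grad                        # one update per incoming chunk
print(W)                                     # approaches [1, -2, 0.5]
```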
Editor's Notes
  • #20: Suppose you are gambling and guess outcomes based on probability values. For an ordinary coin toss, H and T are 0.5 / 0.5, so guessing based on the probability data is meaningless ⇒ entropy 1. For a particular coin with H, T at 0.8 / 0.2, choosing H based on this probability data raises the chance of being right ⇒ the entropy becomes smaller. In other words, entropy is highest when all probabilities are equal, and becomes smaller when the probability is concentrated on particular outcomes.