Deep Learning Basics
Keywords: pooling, ReLU, Momentum, dropout, Tanh, Softmax, Learning rate, VGG, Sigmoid, backpropagation
Contents
A brief history of deep learning
The structure of artificial / deep neural networks
How a neural network learns: Backpropagation
Three problems: Overfitting, Vanishing Gradient, Too slow
CNN, RNN
"Deep learning: the model learns from data on its own."
A brief history of deep learning
1957: the Perceptron, the first artificial neural network model, is introduced.
1957 - 1986: the first winter (about 30 years).
1969: the limits of the perceptron are pointed out.
It is shown that there are problems a single perceptron can never solve (the classic example is XOR).
Multi-layer perceptrons could solve them, but at the time there was no known way to learn their parameters.
A brief history of deep learning
1986: the backpropagation algorithm appears.
Backpropagation lets a model learn from data on its own:
multi-layer networks can now be trained, i.e. their weights and biases can be learned.
A brief history of deep learning
1986 - 1990s: BOOM.
Neural networks even reach pop culture: Terminator 2: Judgment Day (1991).
A brief history of deep learning
1990s: the second winter (about 10 years).
Deeper networks were expected to solve harder problems,
but in practice training them ran into three issues, and the field stalled for roughly a decade:
1. Overfitting
2. Vanishing gradient
3. Too slow
Meanwhile SVMs, Random Forests and other methods took over.
A brief history of deep learning
2000s: the three problems begin to be solved.
1. Overfitting
2. Vanishing Gradient
3. Too slow
With faster GPUs and the arrival of big data, neural networks BOOM again as deep learning.
Timeline recap
1957: the Perceptron, the first artificial neural network
1969: its limits are pointed out
1986: error backpropagation (Backpropagation)
2000s: the three problems are solved
[Diagram: a network with layers 1 through N, showing weights W, activations a, outputs y, and Backpropagation]
An overview of the pieces we will cover:
Activation functions f(a): step, tanh, Sigmoid, ReLU, ELU, etc., and Softmax(a)
Dropout
Weight updates / optimization: SGD, AdaGrad, Momentum, Adam, etc.
Normalization
Loss function
Batch size, learning rate, number of epochs, number of layers (layer size), number of units (unit size)
The structure of artificial / deep neural networks
The basic unit of a neural network
Keywords: #input #weight #bias #output #activation_function
Output = x1*w1 + x2*w2 + bias
[Diagram: inputs x1, x2 with weights w1, w2 and a bias feeding into Output]
Adjusting the weights and the bias is what "learning" means!
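A minimal sketch of this weighted sum in Python/NumPy; the input values, weights and bias below are made-up numbers, chosen only to show the computation.

```python
import numpy as np

def perceptron(x, w, bias):
    """Weighted sum of the inputs plus the bias: x1*w1 + x2*w2 + bias."""
    return np.dot(x, w) + bias

# hypothetical values, chosen only to show the computation
x = np.array([1.0, 2.0])       # inputs x1, x2
w = np.array([0.5, -0.25])     # weights w1, w2
print(perceptron(x, w, bias=0.25))   # 1.0*0.5 + 2.0*(-0.25) + 0.25 = 0.25
```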
The structure of artificial / deep neural networks
Adding a hidden layer
Keywords: #hidden_layer #activation_function
[Diagram: inputs x1, x2 and a bias feed a hidden layer, whose output feeds y]
The hidden layer computes h(f(x)), e.g. h(x1*w1 + x2*w2 + bias*1)
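A sketch of the same idea with one hidden layer, h(f(x)); the layer sizes, the random weights and the choice of sigmoid for h are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def layer(x, W, b, h):
    """One layer: weighted sum plus bias, then the activation h."""
    return h(np.dot(W, x) + b)

# hypothetical sizes: 2 inputs -> 3 hidden units -> 1 output
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer

hidden = layer(x, W1, b1, sigmoid)   # h(f(x)) in the hidden layer
y = layer(hidden, W2, b2, sigmoid)
print(y)
```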
The structure of artificial / deep neural networks
When the network gets deeper: Deep Learning
Two or more hidden layers = Deep Learning
Activation functions: step, tanh, Sigmoid, ReLU, ELU, etc.; softmax at the output
Keywords: #input #weight #bias #output #activation_function
Output = h(x1*w1 + x2*w2 + x3*w3 + bias)
[Diagram: the weighted sum a is passed through the activation function h( ) to give y]
Why do we need an activation function at all?
The structure of artificial / deep neural networks
Keywords: #input #weight #bias #output #activation_function
Output = h(x1*w1 + x2*w2 + x3*w3 + bias)
Without an activation function, the network is just linear regression
(i.e. it has the form y = A·X + b).
The structure of artificial / deep neural networks
Adding a hidden layer does not help if the layers stay linear: #hidden_layer
x1*w11 + x2*w12 + bias1
x1*w21 + x2*w22 + bias2
combine into x1*(w11 + w21) + x2*(w12 + w22) + (bias1 + bias2),
which is still a single linear function of the inputs.
However many linear layers you stack, the result is still linear regression
(the form y = A·X + b).
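A quick numerical check of this claim: two stacked layers with no activation give exactly the same outputs as one linear layer whose weights are the product of the two. All weights below are arbitrary random numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2)                                   # inputs x1, x2

W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)     # "hidden" layer, no activation
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=1)     # output layer, no activation

two_linear_layers = W2 @ (W1 @ x + b1) + b2

# the same function written as one linear layer y = A.X + b
A = W2 @ W1
b = W2 @ b1 + b2
one_linear_layer = A @ x + b

print(np.allclose(two_linear_layers, one_linear_layer))  # True
```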
The structure of artificial / deep neural networks
Keywords: #input #weight #bias #output #activation_function
Output = h(x1*w1 + x2*w2 + x3*w3 + bias)
h( ) must be a non-linear function:
only then does stacking layers let the network express functions that a single linear layer cannot.
The structure of artificial / deep neural networks
Keywords: #input #weight #bias #output #activation_function
Q. In the network below, the hidden units are linear and only the output uses a sigmoid. How deep is it really?
[Diagram: x1, x2, x3 → three linear units → sigmoid → y]
(Since the linear layers collapse, it is equivalent to a single sigmoid unit, i.e. logistic regression.)
The structure of artificial / deep neural networks
Why non-linear activations matter:
with them, each added layer lets the network represent more complex functions.
[Figure: comparing a NN with 3 hidden layers and a NN with 50 hidden layers]
Three problems
1. Overfitting
2. Vanishing Gradient
3. Too slow
Three problems
Keywords: #overfitting #vanishing_gradient
What is overfitting?
Which of the two fitted models is actually better? CASE 1 vs. CASE 2
Overfitting: the model is tuned too tightly to the training data, so it performs poorly on new data.
Three problems — how to fix overfitting: Dropout
Keywords: #dropout
[Diagram, three passes: at each training pass a different random subset of units is switched off before computing the Output]
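A minimal sketch of (inverted) dropout as applied during training; the keep probability of 0.5 and the activation values are illustrative.

```python
import numpy as np

def dropout(activations, keep_prob=0.5, rng=None):
    """Randomly zero out units during training (inverted dropout)."""
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob
    # rescale the survivors so the expected total activation stays the same
    return activations * mask / keep_prob

h = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
print(dropout(h))   # some units are zeroed, the rest are scaled up
# at test time dropout is simply turned off (all units are used)
```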
Next problem: 2. Vanishing Gradient.
How a neural network learns
Keywords: #backpropagation
Backpropagation
[Diagram: input → hidden layers → Output]
1. Forward propagation
2. Compute the error (the loss)
3. Backpropagation: the "prediction error" is passed backwards through the network
4. Update the weights and biases
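A tiny numeric walk-through of the four steps for a single sigmoid neuron; the squared-error loss, the example values and the learning rate are assumptions made for this sketch, not something the slides specify.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x, target = np.array([1.0, 2.0]), 1.0         # one made-up training example
w, b, lr = np.array([0.5, -0.25]), 0.0, 0.1

# 1. forward propagation
a = np.dot(w, x) + b
y = sigmoid(a)

# 2. compute the prediction error (squared-error loss)
loss = 0.5 * (y - target) ** 2

# 3. backpropagation: chain rule back through the sigmoid to w and b
dL_dy = y - target
dy_da = y * (1.0 - y)          # derivative of the sigmoid
dL_dw = dL_dy * dy_da * x
dL_db = dL_dy * dy_da

# 4. update the weights and bias (one gradient step)
w -= lr * dL_dw
b -= lr * dL_db
print(loss, w, b)
```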
Backpropagation: what is the "gradient of the prediction error"?
The gradient passed backwards is built from the activation-function derivatives and the weights along the way.
Vanishing gradient: because these derivative terms are small, the gradient shrinks towards 0 layer by layer,
so the error never reaches the early layers and their weights stop learning.
Three problems
What is the Vanishing Gradient?
The derivative of the Sigmoid is at most 0.25,
so with many sigmoid layers the backpropagated gradient shrinks towards 0.
[Plots: Sigmoid (range 0 to 1) and Tanh (range -1 to 1)]
How to fix the vanishing gradient: ReLU (Rectified Linear Unit)
h(x) = x (x ≥ 0)
h(x) = 0 (x < 0)
Its derivative is 1 everywhere in the positive range, so the gradient passes through undiminished.
(Caveat: "dead neurons" — units stuck at 0; more than six ReLU-style variants exist to address this.)
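A short sketch comparing the two derivatives: it shows the 0.25 ceiling of the sigmoid gradient (and how quickly 0.25 compounds across layers) versus the flat gradient of 1 that ReLU gives on positive inputs.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def d_sigmoid(a):
    s = sigmoid(a)
    return s * (1.0 - s)          # maximum value 0.25, reached at a = 0

def relu(a):
    return np.maximum(0.0, a)

def d_relu(a):
    return (a > 0).astype(float)  # 1 for positive inputs, 0 otherwise

a = np.linspace(-5, 5, 1001)
print(d_sigmoid(a).max())             # 0.25 -> gradients shrink layer after layer
print(0.25 ** 10)                     # ~1e-6 after ten sigmoid layers
print(d_relu(np.array([3.0, -2.0])))  # [1. 0.] -> gradient passes through unchanged
```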
Next problem: 3. Too slow.
Three problems — gradient descent
Keywords: #cost_function #gradient_descent #learning_rate
Optimization means finding the weights and biases that minimise the prediction error (the cost).
[Diagram: forward propagation, error, backpropagation, weight/bias update, as before]
At each update we follow the gradient of the cost function in the direction that makes the cost
descend, scaled by the step size (the learning rate):
W = W - α∇J(W,b)
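A sketch of the update rule W = W - α∇J on a toy one-parameter cost J(W) = (W - 3)²; the cost function, starting point and learning rate are arbitrary choices for the demonstration.

```python
def grad_J(W):
    return 2.0 * (W - 3.0)      # gradient of J(W) = (W - 3)^2, minimum at W = 3

W, alpha = 10.0, 0.1            # arbitrary start and learning rate
for step in range(50):
    W = W - alpha * grad_J(W)   # move against the gradient, scaled by the learning rate
print(W)                         # close to 3.0
```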
Reading the gradient:
if the derivative is positive, move left (decrease W); if it is negative, move right (increase W);
we are looking for the point where the derivative is 0.
Take a step, recompute the gradient, and repeat!
Putting it together:
Step 1. Start from initial parameters and make predictions on the train set.
Step 2. Compare the predictions with the answers and compute the error.
Step 3. Propagate the error backwards to get the gradients.
Step 4. Update the parameters in the direction that reduces the error.
Step 5. With the updated parameters, go over the train set and predict again.
Step 6. Repeat.
This loop is how the network's biases and weights are found.
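The six steps written as a loop, fitting a made-up one-parameter linear model y ≈ w·x + b; the data, learning rate and epoch count are all assumptions for this sketch.

```python
import numpy as np

# toy train set generated from y = 2x + 1 (made-up data)
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0

w, b, lr = 0.0, 0.0, 0.05        # Step 1: start from initial parameters

for epoch in range(200):          # Step 6: repeat
    for x, y in zip(xs, ys):
        pred = w * x + b          # Step 1: predict on a training example
        err = pred - y            # Step 2: compare prediction with the answer
        dw, db = err * x, err     # Step 3: gradients (backpropagation)
        w -= lr * dw              # Step 4: update parameters so the error shrinks
        b -= lr * db
    # Step 5: go over the train set again with the updated parameters

print(w, b)                       # close to 2.0 and 1.0
```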
Three problems — gradient descent
Keywords: #cost_function #gradient_descent #learning_rate #SGD
SGD (Stochastic Gradient Descent)
Instead of the whole train set, compute the gradient on "one randomly drawn example" per update.
https://www.coursera.org/learn/machine-learning/lecture/DoRHJ/stochastic-gradient-descent
Three problems — Stochastic Gradient Descent
Gradient Descent:            W = W - α∇J(W,b)                       (all examples per update)
Stochastic Gradient Descent: W = W - α∇J(W,b, x(z), y(z))            (one example per update)
Mini-batch Gradient Descent: W = W - α∇J(W,b, x(z:z+bs), y(z:z+bs))  (bs examples per update)
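A sketch of the three variants side by side; they differ only in how many examples feed each update. The toy data, the mean-squared-error gradient and the batch size bs = 16 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # made-up train set
y = X @ np.array([1.0, -2.0, 0.5])

def grad(W, Xb, yb):
    """Gradient of the mean squared error on a batch."""
    return 2.0 * Xb.T @ (Xb @ W - yb) / len(yb)

W, alpha, bs = np.zeros(3), 0.1, 16

# Gradient Descent: one update using all examples
W = W - alpha * grad(W, X, y)

# Stochastic Gradient Descent: one update using a single example x(z), y(z)
z = rng.integers(len(X))
W = W - alpha * grad(W, X[z:z+1], y[z:z+1])

# Mini-batch Gradient Descent: one update using the slice x(z:z+bs), y(z:z+bs)
z = 0
W = W - alpha * grad(W, X[z:z+bs], y[z:z+bs])
print(W)
```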
Keywords: #SGD
Three problems — the drawbacks of SGD
With plain SGD:
the updates zigzag, so reaching the minimum takes many steps;
the learning rate has to be chosen carefully;
and the search can get stuck in a local minimum instead of the global minimum.
Keywords: #learning_rate #momentum
Three problems — improving on SGD
A range of optimizers has appeared to patch SGD's weak points.
Step direction ("which way should I go?"): Momentum (keep some inertia from previous gradients), NAG (like Momentum, but evaluate the gradient ahead of the current position first).
Step size ("how far should I go?"): Adagrad, Adadelta, RMSProp (adapt the step size per parameter based on how much it has already moved).
Both at once: Adam, which considers the step direction and the step size together.
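A sketch contrasting plain SGD with a Momentum update on the same toy cost as before; the momentum coefficient 0.9 is a common default but still an assumption here.

```python
def grad_J(W):
    return 2.0 * (W - 3.0)           # same toy cost J(W) = (W - 3)^2

# plain SGD: step straight down the current gradient
W, alpha = 10.0, 0.05
for _ in range(100):
    W -= alpha * grad_J(W)

# Momentum: keep a velocity, so past gradients carry the update forward
Wm, v, beta = 10.0, 0.0, 0.9
for _ in range(100):
    v = beta * v - alpha * grad_J(Wm)
    Wm += v

print(W, Wm)                          # both approach 3.0; momentum builds speed early on
```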
Putting it all together
[Diagram: input → ReLU → ReLU → ... → ReLU → Softmax]
How the network learns: Backpropagation, with a cross-entropy loss
1. Overfitting: Dropout
2. Activation functions: ReLU, softmax
3. Vanishing gradient: ReLU
4. Optimization (too slow): Adam
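A minimal sketch of this summary as a model definition, assuming TensorFlow/Keras is available; the layer widths, dropout rate, 784-dimensional input and 10-class softmax output are arbitrary choices, not something the slides prescribe.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),  # hidden layers: ReLU
    tf.keras.layers.Dropout(0.5),                                       # overfitting: Dropout
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),                    # output layer: softmax
])

model.compile(
    optimizer="adam",                    # optimization: Adam
    loss="categorical_crossentropy",     # loss: cross entropy
    metrics=["accuracy"],
)
model.summary()
# the weights are then learned by backpropagation via model.fit(x_train, y_train, ...)
```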
Coming Soon
Thank you.