8. Mora Chen
f: sigmoid function (= logistic function)
- Activation function
Change activation function
Input layer, hidden layer, output layer
Bias term: b (constant input 1)
Rain / sunny
Step function
Non-linear
Probability
f(·)
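The slide contrasts a step function with the sigmoid (logistic) function, whose output can be read as a probability. A minimal sketch of the two activations follows; the input values z are made up for illustration, and reading the sigmoid output as "probability of rain" is an assumption based on the rain/sunny labels.

```python
import math

def step(z: float) -> int:
    """Step activation: hard 0/1 decision with a jump at z = 0."""
    return 1 if z >= 0 else 0

def sigmoid(z: float) -> float:
    """Sigmoid (logistic) activation: smooth, output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activation values z = w*x + b.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.3f}")
```

The step function only says "rain" or "sunny"; the sigmoid gives a graded value that changes smoothly with z, which is what makes gradient-based training possible.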
18. Mora Chen
Training data, model, output, correct answer
Initial value
Forward propagation
Cost function
Model 1
Model 2
Model 3
…
Select the best model:
the one whose cost function is smallest on average
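Selecting the model with the smallest average cost can be sketched as below. The data, the three candidate (w, b) pairs, and the choice of mean squared error as the cost function are all assumptions made for illustration; the slide does not name a specific cost.

```python
def mean_squared_error(predict, data):
    """Average squared error of `predict` over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

# Toy training data, roughly y = 2x + 1 (made up).
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]

# Hypothetical candidates: each model is y = w*x + b with fixed (w, b).
candidates = {
    "Model 1": (1.0, 0.0),
    "Model 2": (2.0, 1.0),
    "Model 3": (3.0, -1.0),
}

costs = {
    name: mean_squared_error(lambda x, w=w, b=b: w * x + b, data)
    for name, (w, b) in candidates.items()
}
best = min(costs, key=costs.get)  # model with the smallest average cost
print(best, costs[best])
```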
flowchart
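- Find w, b such that the cost function = 0
- Find w, b such that the cost function is minimized
(Gradient descent: finds a local minimum)
Choose the parameters from the perspective of the overall cost function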
Derivative: scalar-valued
Gradient: a multi-variable generalization of the derivative
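As a rough illustration of finding w and b by gradient descent, here is a sketch on a one-variable linear model with a mean-squared-error cost. The model, data, learning rate, and step count are assumptions chosen for the example, not values from the slides.

```python
def cost(w, b, data):
    """Mean squared error of y = w*x + b over the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gradient(w, b, data):
    """Gradient of the cost: the pair of partial derivatives (d/dw, d/db)."""
    n = len(data)
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    db = sum(2 * (w * x + b - y) for x, y in data) / n
    return dw, db

def gradient_descent(data, w=0.0, b=0.0, lr=0.05, steps=2000):
    """Start from an initial value and repeatedly step against the gradient."""
    for _ in range(steps):
        dw, db = gradient(w, b, data)
        w -= lr * dw
        b -= lr * db
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # exactly y = 2x + 1
w, b = gradient_descent(data)
print(w, b, cost(w, b, data))
```

This toy cost is convex, so the local minimum found is also the global one. For a neural network the cost is generally non-convex, and gradient descent is only expected to reach a local minimum.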
36. Mora Chen
How many parameters? (kernel size)
C5: 120 x (5 x 5 x 16 + 1) = 48,120
F6: (120 + 1) x 84 = 10,164
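The two counts can be checked with a few lines of arithmetic. The helper names below are hypothetical; the sketch assumes each C5 map connects to all 16 input maps and that every output map or unit has one bias, which is what the formulas on the slide imply.

```python
def conv_params(out_maps, kernel_h, kernel_w, in_maps):
    """Kernel weights plus one bias per output feature map."""
    return out_maps * (kernel_h * kernel_w * in_maps + 1)

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

c5 = conv_params(out_maps=120, kernel_h=5, kernel_w=5, in_maps=16)
f6 = dense_params(in_units=120, out_units=84)
print(c5, f6)
```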
The activation function is A tanh(Sx),
where A and S are hyperparameters
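A small sketch of this scaled tanh follows. The values A = 1.7159 and S = 2/3 are the ones I recall from the LeNet-5 paper and should be treated as an assumption here, since the slide only says that A and S are hyperparameters.

```python
import math

A = 1.7159    # amplitude (assumed value, recalled from the LeNet-5 paper)
S = 2.0 / 3   # slope parameter (assumed value, recalled from the LeNet-5 paper)

def scaled_tanh(x: float, a: float = A, s: float = S) -> float:
    """Scaled hyperbolic tangent f(x) = a * tanh(s * x)."""
    return a * math.tanh(s * x)

for x in (-1.0, 0.0, 1.0):
    print(f"f({x:+.0f}) = {scaled_tanh(x):+.4f}")
```

With these values f(1) is very close to 1 and f(-1) very close to -1, so target values of +1 and -1 sit inside the non-saturated part of the curve.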
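84 corresponds to a flattened 7x12 image (white = -1, black = +1).
This is mainly useful for recognizing strings of ASCII characters rather than isolated digits. Why?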
37. Mora Chen
How many parameters?
Output layer:
Euclidean Radial Basis Function (RBF)
: the unnormalized negative log-likelihood of a Gaussian distribution in the space of F6 outputs (why?)
Forces the F6 units to operate in their maximally non-linear range.
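One way to read the "why?": for a Gaussian with identity covariance centered on a prototype w, the negative log-density of x is ||x - w||^2 / 2 plus a constant, so the squared Euclidean distance is that negative log-likelihood up to scale and normalization. The sketch below computes such RBF outputs and picks the class with the smallest one. It uses made-up 4-dimensional prototypes with entries of +1 and -1 instead of the real 84-dimensional ones, and the function names are hypothetical.

```python
def rbf_output(x, w):
    """Squared Euclidean distance between F6 activations x and prototype w."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum((xj - wj) ** 2 for xj, wj in zip(x, w))

def classify(x, prototypes):
    """Return (index of the closest prototype, list of all RBF outputs)."""
    outputs = [rbf_output(x, w) for w in prototypes]
    label = min(range(len(outputs)), key=outputs.__getitem__)
    return label, outputs

# Toy prototypes (made up); in the real network each has 84 entries.
prototypes = [
    [1, 1, -1, -1],
    [-1, 1, 1, -1],
    [1, -1, 1, -1],
]
x = [0.9, 0.8, -1.0, -0.7]  # hypothetical F6 activations
label, outputs = classify(x, prototypes)
print(label, outputs)

# One weight per (F6 unit, class) pair and no bias: 84 * 10 parameters.
output_params = 84 * 10
print(output_params)
```

Because the prototype entries are +1 or -1, training pushes the F6 activations toward those values, which lie in the strongly curved part of the scaled tanh rather than in its flat tails.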
39. Mora Chen
The paper has many more details; more work is still needed
- Saturation of the sigmoids must be avoided because it is known to lead to slow convergence and ill-conditioning of the loss function.
- Saturation of the sigmoids? The flat region where the slope barely changes (the gradient is close to zero)
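Saturation can be seen numerically by evaluating the slope of the sigmoid at increasingly large inputs. This sketch uses the plain logistic sigmoid as a stand-in; the sample inputs are arbitrary.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z={z:5.1f}  sigmoid={sigmoid(z):.5f}  slope={sigmoid_grad(z):.6f}")
```

The slope is largest at z = 0 and shrinks toward zero as |z| grows, so weight updates that pass through a saturated unit become very small, which is the slow convergence the slide refers to.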