The document discusses FactorVAE, a method for disentangling latent representations in variational autoencoders (VAEs). It introduces Total Correlation (TC) as a penalty term that encourages independence between latent variables. TC is added to the standard VAE objective function to guide the model to learn disentangled representations. The document provides details on how TC is defined and computed based on the density-ratio trick from generative adversarial networks. It also discusses how FactorVAE uses TC to learn disentangled representations and can be evaluated using a disentanglement metric.
This document discusses methods for automated machine learning (AutoML) and optimization of hyperparameters. It focuses on accelerating the Nelder-Mead method for hyperparameter optimization using predictive parallel evaluation. Specifically, it proposes using a Gaussian process to model the objective function and perform predictive evaluations in parallel to reduce the number of actual function evaluations needed by the Nelder-Mead method. The results show this approach reduces evaluations by 49-63% compared to baseline methods.
This document discusses generative adversarial networks (GANs) and their relationship to reinforcement learning. It begins with an introduction to GANs, explaining how they can generate images without explicitly defining a probability distribution by using an adversarial training process. The second half discusses how GANs are related to actor-critic models and inverse reinforcement learning in reinforcement learning. It explains how GANs can be viewed as training a generator to fool a discriminator, similar to how policies are trained in reinforcement learning.
[DL輪読会] Recent Advances in Autoencoder-Based Representation Learning (Deep Learning JP)
1. Recent advances in autoencoder-based representation learning include incorporating meta-priors to encourage disentanglement and using rate-distortion and rate-distortion-usefulness tradeoffs to balance compression and reconstruction.
2. Variational autoencoders introduce priors to disentangle latent factors, but recent work aggregates posteriors to directly encourage disentanglement.
3. The rate-distortion framework balances the rate of information transmission against reconstruction distortion, while rate-distortion-usefulness also considers downstream task usefulness.
1. The document discusses probabilistic modeling and variational inference. It introduces concepts like Bayes' rule, marginalization, and conditioning.
2. An equation for the evidence lower bound is derived, which decomposes the log likelihood of data into the Kullback-Leibler divergence between an approximate and true posterior plus an expected log likelihood term.
3. Variational autoencoders are discussed, where the approximate posterior is parameterized by a neural network and optimized to maximize the evidence lower bound. Latent variables are modeled as Gaussian distributions.
[Paper Reading] Causal Bandits: Learning Good Interventions via Causal Inference (Daiki Tanaka)
Paper reading: [NIPS 2016] Causal Bandits: Learning Good Interventions via Causal Inference
https://papers.nips.cc/paper/6195-causal-bandits-learning-good-interventions-via-causal-inference.pdf
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S... (Daiki Tanaka)
This document proposes two new algorithms, L-SHAPLEY and C-SHAPLEY, for interpreting black-box machine learning models in an instance-wise and model-agnostic manner. L-SHAPLEY and C-SHAPLEY are approximations of the SHAPLEY value that take graph structure between features into account to improve computational efficiency. The algorithms were evaluated on text and image classification tasks and were shown to outperform baselines like KERNELSHAP and LIME, providing more accurate feature importance scores according to both automatic metrics and human evaluation.
Selective inference is a statistical framework that accounts for selection bias when using feature selection methods like Lasso. When features are selected from a larger set for inclusion in a model, directly interpreting p-values from fitting that model can be misleading without correcting for the selection process. Selective inference provides adjusted confidence intervals to correctly assess whether selected features have statistically significant effects while controlling for the selection bias introduced by the feature selection method.
Anomaly Detection with VAEGAN and Attention [JSAI2019 report] (Daiki Tanaka)
Daiki Tanaka from Kyoto University proposes a method to detect anomaly images using deep generative models while correcting for noisy areas that could be misidentified as anomalies. The method trains an autoencoder with a GAN discriminator that learns to focus on major image areas rather than noise. At test time, it calculates the reconstruction error between the original and reconstructed image, weighted by the discriminator's attention weights to discount noisy pixels. On MNIST data with added noise, the method outperforms other deep generative models in anomaly detection as measured by ROC-AUC scores.
[Paper Reading] Attention is All You Need (Daiki Tanaka)
The document summarizes the "Attention Is All You Need" paper, which introduced the Transformer model for natural language processing. The Transformer uses attention mechanisms rather than recurrent or convolutional layers, allowing for more parallelization. It achieved state-of-the-art results in machine translation tasks using techniques like multi-head attention, positional encoding, and beam search decoding. The paper demonstrated the Transformer's ability to draw global dependencies between input and output with constant computational complexity.
Local Outlier Detection with Interpretation (Daiki Tanaka)
This paper proposes a method called Local Outlier Detection with Interpretation (LODI) that detects outliers and explains their anomalousness simultaneously. LODI first selects a neighboring set for each outlier candidate using entropy measures. It then computes an anomaly degree for each object based on its deviation from neighbors in a learned 1D subspace. Finally, LODI interprets outliers by identifying a small set of influential features. Experiments on synthetic and real-world data show LODI outperforms other methods in outlier detection and provides intuitive feature-based explanations. However, LODI's computation is expensive and it assumes linear separability, which are limitations for future work.
1) The document discusses LIME (Local Interpretable Model-Agnostic Explanations), a method for explaining the predictions of any machine learning model. LIME works by training an interpretable model locally around predictions to approximate the original model.
2) Experiments show that LIME explanations help human subjects select better performing classifiers, identify features to improve classifiers, and gain insights into how classifiers work.
3) SP-LIME is introduced to select a representative set of predictions to provide a global view of a model, by maximizing coverage of important features.
The Million Domain Challenge: Broadcast Email Prioritization by Cross-domain ... (Daiki Tanaka)
This document summarizes a research paper presented at Kyoto University. The paper proposes a framework called CBEP to prioritize broadcast emails. CBEP addresses three challenges: sampling user feedback, selecting optimal source domains to transfer knowledge from, and predicting email priority. It uses a matrix factorization technique called alternating least squares to model user and item latent factors from feedback data. The method was tested on a dataset of emails and view logs from Samsung mailing lists.
The Limits of Popularity-Based Recommendations, and the Role of Social Ties (Daiki Tanaka)
This document summarizes a research paper that models how recommender systems can influence product popularity in markets. It presents a model that simulates user purchases based on personal preferences and recommendations from social connections. Experiments on this model using real social network data found that the recommender system did not significantly distort the market shares of different products. However, adding a "super-node" that strongly recommends one product to all users did substantially distort the market in favor of that product.
Learning Deep Representation from Big and Heterogeneous Data for Traffic Acci... (Daiki Tanaka)
The document describes a study that used deep learning to predict traffic accident risk levels based on human mobility data. The researchers trained a stacked denoising autoencoder model on GPS records from 1.6 million people to learn representations of human mobility patterns. They then used these representations along with 300,000 records of past traffic accidents to predict accident risk levels on a grid map. The model outperformed baseline methods like decision trees and logistic regression in predicting traffic accident risk levels.
9. 6.1.1 Negative Definite Kernels
[Proof]
Suppose ψ is negative definite. Take arbitrary c_i ∈ ℂ (i = 1, …, n) and set c_0 := −∑_{i=1}^n c_i. Then, by the negative definiteness of ψ, for any x_0, x_1, …, x_n ∈ X,

\sum_{i,j=0}^{n} c_i \bar{c}_j \psi(x_i, x_j) \le 0.

Separating out the terms with index 0 on the left-hand side gives

\sum_{i,j=0}^{n} c_i \bar{c}_j \psi(x_i, x_j)
 = \sum_{i,j=1}^{n} c_i \bar{c}_j \psi(x_i, x_j)
   + \bar{c}_0 \sum_{i=1}^{n} c_i \psi(x_i, x_0)
   + c_0 \sum_{j=1}^{n} \bar{c}_j \psi(x_0, x_j)
   + |c_0|^2 \psi(x_0, x_0)
 = \sum_{i,j=1}^{n} c_i \bar{c}_j \psi(x_i, x_j)
   - \sum_{i,j=1}^{n} c_i \bar{c}_j \psi(x_i, x_0)
   - \sum_{i,j=1}^{n} c_i \bar{c}_j \psi(x_0, x_j)
   + \sum_{i,j=1}^{n} c_i \bar{c}_j \psi(x_0, x_0)
 = -\sum_{i,j=1}^{n} c_i \bar{c}_j \varphi(x_i, x_j),

where φ(x, y) = ψ(x, x_0) + ψ(x_0, y) − ψ(x, y) − ψ(x_0, x_0) is the kernel in the proposition. Hence \sum_{i,j=1}^{n} c_i \bar{c}_j \varphi(x_i, x_j) \ge 0, and φ is positive definite. □
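The algebra above can be checked numerically. The sketch below is not part of the original slides; it assumes NumPy and uses the standard negative definite kernel ψ(x, y) = ‖x − y‖² as an example. It builds the Gram matrix of φ(x, y) = ψ(x, x_0) + ψ(x_0, y) − ψ(x, y) − ψ(x_0, x_0) and confirms it is positive semidefinite.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 3))     # points x_1, ..., x_n
    x0 = rng.normal(size=3)         # reference point x_0

    def psi(a, b):
        # psi(x, y) = ||x - y||^2, a standard negative definite kernel
        return float(np.sum((a - b) ** 2))

    # phi(x, y) = psi(x, x0) + psi(x0, y) - psi(x, y) - psi(x0, x0)
    Phi = np.array([[psi(xi, x0) + psi(x0, xj) - psi(xi, xj) - psi(x0, x0)
                     for xj in x] for xi in x])

    # positive definiteness of the real symmetric Gram matrix: all eigenvalues >= 0
    print(np.linalg.eigvalsh(Phi).min() >= -1e-9)   # expected: True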
13. 6.1.2 Operations That Generate Kernels
Proposition 6.8
Let ψ : X × X → ℂ be a negative definite kernel on a set X. If ψ(x, y) ≥ 0 for all x, y ∈ X, then for every a > 0,

\log(a + \psi(x, y))

is a negative definite kernel. Moreover, when ψ(x, y) > 0,

\log(\psi(x, y))

is a negative definite kernel.
[Proof]: By the integral representation

\log(1 + \psi(x, y)) = \int_0^{\infty} \bigl(1 - e^{-t\,\psi(x, y)}\bigr) \frac{e^{-t}}{t}\, dt,

the integrand is a negative definite kernel, as in Proposition 6.6, so log(1 + ψ(x, y)) is negative definite. Consequently

\log(a + \psi) = \log\bigl(1 + \tfrac{1}{a}\,\psi\bigr) + \log a

is also negative definite.
[Remark]:
By Proposition 6.1(3), "for any function f, ψ(x, y) = f(x) + f(y) is a negative definite kernel," ψ(x, y) := x + y is a negative definite kernel on ℝ, and hence ψ(x, y) = log(x + y) is a negative definite kernel on (0, ∞).
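The Remark can be illustrated numerically (a sketch assuming NumPy, not part of the slides): a real symmetric matrix M with entries ψ(x_i, x_j) defines a negative definite kernel on the sample exactly when c^T M c ≤ 0 for every c with ∑_i c_i = 0, i.e. when P M P is negative semidefinite for the centering matrix P = I − (1/n)11^T.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.1, 5.0, size=10)            # points in (0, infinity)
    M = np.log(x[:, None] + x[None, :])           # psi(x, y) = log(x + y)

    n = len(x)
    P = np.eye(n) - np.ones((n, n)) / n           # projects onto {c : sum(c) = 0}
    print(np.linalg.eigvalsh(P @ M @ P).max() <= 1e-9)   # expected: True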
14. 6.1.2 Operations That Generate Kernels
The following proposition lets us generate positive definite kernels from negative definite ones.
Proposition 6.9 (Generating a positive definite kernel from a negative definite one)
If a negative definite kernel ψ satisfies Re ψ(x, y) ≥ 0, then

\frac{1}{\psi(x, y) + a}

is a positive definite kernel, where a is a positive constant.
[Proof]: By the integral representation

\frac{1}{\psi(x, y) + a} = \int_0^{\infty} e^{-t(\psi(x, y) + a)}\, dt,

the integrand is positive definite, as in Proposition 6.6, and the positive definiteness of 1/(ψ(x, y) + a) follows. □
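A quick numerical illustration of Proposition 6.9 (a sketch assuming NumPy; the choices ψ(x, y) = x + y and a = 1 are mine, not the slide's): since ψ(x, y) = x + y is negative definite with Re ψ ≥ 0 on [0, ∞), the Gram matrix of 1/(ψ(x, y) + a) should be positive semidefinite.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0.0, 5.0, size=10)
    a = 1.0
    K = 1.0 / (x[:, None] + x[None, :] + a)       # 1 / (psi(x, y) + a) with psi(x, y) = x + y
    print(np.linalg.eigvalsh(K).min() >= -1e-9)   # expected: True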
16. 6.2 Bochner's Theorem
A kernel k on ℝⁿ is shift-invariant if there is a function φ on ℝⁿ such that k(x, y) = φ(x − y), i.e. the kernel depends only on the difference of its two arguments (e.g. the RBF kernel). Shift invariance is equivalent to k(x, y) = k(x + z, y + z) for all z ∈ ℝⁿ.
Definition (function of positive type)
A function φ on ℝⁿ is of positive type if the kernel defined by

k(x, y) := \varphi(x - y)

is positive definite.
Theorem 6.10 (Bochner's theorem)
Let φ be a continuous complex-valued function on ℝⁿ. Then φ is of positive type if and only if there exists a finite non-negative Borel measure Λ on ℝⁿ such that

\varphi(x) = \int e^{\sqrt{-1}\,\omega^{\top} x}\, d\Lambda(\omega). \qquad (6.1)
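A standard worked instance of (6.1), added here for concreteness (stated in one dimension to keep the constants simple): the Gaussian RBF function is of positive type, with spectral measure the normal distribution N(0, 1/σ²),

e^{-x^2/(2\sigma^2)} = \int_{\mathbb{R}} e^{\sqrt{-1}\,\omega x}\, \sqrt{\frac{\sigma^2}{2\pi}}\, e^{-\sigma^2 \omega^2 / 2}\, d\omega,

which is just the characteristic function of N(0, 1/σ²) evaluated at x.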
17. 6.2 Bochner's Theorem: Proof
• Sufficiency:
Suppose φ is represented as

\varphi(x) = \int e^{\sqrt{-1}\,\omega^{\top} x}\, d\Lambda(\omega).

Since

e^{\sqrt{-1}\,\omega^{\top}(x - y)} = e^{\sqrt{-1}\,\omega^{\top} x}\, e^{-\sqrt{-1}\,\omega^{\top} y} = e^{\sqrt{-1}\,\omega^{\top} x}\, \overline{e^{\sqrt{-1}\,\omega^{\top} y}}

(using that \bar{z} = -z for purely imaginary z and that \overline{\exp(z)} = \exp(\bar{z})), the kernel

K(x, y) := \varphi(x - y) = \int e^{\sqrt{-1}\,\omega^{\top} x}\, \overline{e^{\sqrt{-1}\,\omega^{\top} y}}\, d\Lambda(\omega)

has an integrand that is a positive definite kernel by Proposition 2.5(2). Hence K, obtained as its integral, is also a positive definite kernel, and φ is of positive type. □
• Necessity: omitted.
Bochner's theorem asserts that every continuous function of positive type can be expressed as a non-negative combination of the functions {e^{\sqrt{-1}\,\omega^{\top} x} | ω ∈ ℝⁿ}.
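The sufficiency direction also underlies a Monte Carlo approximation of shift-invariant kernels (random Fourier features). The sketch below is an illustration under my own choices (NumPy, the RBF kernel, and its Gaussian spectral measure ω ~ N(0, I/σ²)); it is not part of the slides. It samples ω from the spectral measure and averages cos ω^T(x − y).

    import numpy as np

    rng = np.random.default_rng(3)
    sigma = 1.0
    d, n_features = 3, 20000

    x, y = rng.normal(size=d), rng.normal(size=d)

    # sample omega from the spectral measure of the RBF kernel: N(0, I / sigma^2)
    omega = rng.normal(scale=1.0 / sigma, size=(n_features, d))

    # Monte Carlo estimate of phi(x - y) = E[exp(sqrt(-1) omega^T (x - y))];
    # the imaginary part vanishes by symmetry of the measure
    approx = np.mean(np.cos(omega @ (x - y)))
    exact = np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))
    print(approx, exact)                           # the two values should be close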
18. 6.2 The RKHS Viewed in the Frequency Domain (Proposition 2.19)
A shift-invariant positive definite kernel admits an explicit expression in the frequency domain (e.g. the RBF kernel and the Laplace kernel).
Assume the shift-invariant kernel K has the form

K(x, y) = \int e^{\sqrt{-1}\,\omega^{\top}(x - y)}\, \rho(\omega)\, d\omega,

where ρ is continuous, ρ(ω) > 0, and \int \rho(\omega)\, d\omega < \infty.
Then the RKHS H_K with reproducing kernel K is

H_K = \left\{ f \in L^2(\mathbb{R}^m, dx) \;\middle|\; \int \frac{|\hat{f}(\omega)|^2}{\rho(\omega)}\, d\omega < \infty \right\},
\qquad
\langle f, g \rangle = \int \frac{\hat{f}(\omega)\, \overline{\hat{g}(\omega)}}{\rho(\omega)}\, d\omega,

where \hat{f} is the Fourier transform of f: \hat{f}(\omega) = \frac{1}{(2\pi)^m} \int f(x)\, e^{-\sqrt{-1}\,\omega^{\top} x}\, dx.
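A concrete instance of this frequency-domain description (a standard example, not written out on the slide): for the Laplace kernel on ℝ the spectral density is a Cauchy density, so, up to the Fourier normalization used above, H_K is the first-order Sobolev space:

e^{-|x - y|} = \int_{\mathbb{R}} e^{\sqrt{-1}\,\omega (x - y)}\, \frac{1}{\pi(1 + \omega^2)}\, d\omega,
\qquad
\|f\|_{H_K}^2 = \pi \int_{\mathbb{R}} |\hat{f}(\omega)|^2 (1 + \omega^2)\, d\omega < \infty,

i.e. H_K consists of the f ∈ L² whose weak derivative also lies in L².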
41. An Explicit Representation of the RKHS of a Positive Definite Kernel
Using Mercer's theorem, we can give an explicit representation of the RKHS corresponding to a positive definite kernel, together with its inner product (this extends the representation of an RKHS on a finite set given in 2.2.2.b).
Under the same assumptions as Mercer's theorem, take the unit eigenvectors corresponding to the nonzero eigenvalues of the integral operator T_K, add an orthonormal basis of N(T_K), and form a complete orthonormal basis {φ_i}_{i=1}^∞ of L²(Ω, μ). Then {φ_i}_{i=1}^∞ is a Schauder basis, and every f ∈ L²(Ω, μ) can be written as a linear combination of these basis elements:

f = \sum_{i=1}^{\infty} a_i \varphi_i \qquad (a_i \in \mathbb{R}).

Using this, define the linear subspace H of L²(Ω, μ) by

H := \left\{ f \in L^2(\Omega, \mu) \;\middle|\; f = \sum_{i=1}^{\infty} a_i \varphi_i,\ \sum_{i=1}^{\infty} \frac{|a_i|^2}{\lambda_i} < \infty \right\} \qquad (6.11)

and, for f = \sum_{i=1}^{\infty} a_i \varphi_i \in H and g = \sum_{i=1}^{\infty} b_i \varphi_i \in H, define the inner product

\langle f, g \rangle_H := \sum_{i=1}^{\infty} \frac{a_i b_i}{\lambda_i}. \qquad (6.12)

We now show that the H defined in this way is the RKHS with reproducing kernel K.
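On a finite set, Mercer's expansion reduces to the eigendecomposition of the Gram matrix, so (6.11) and (6.12) can be checked directly. A minimal Python sketch (assuming NumPy; the RBF kernel and the sample points are my own choices, not the slides'), verifying the reproducing property ⟨f, K(·, x)⟩_H = f(x) under the inner product (6.12):

    import numpy as np

    rng = np.random.default_rng(4)
    pts = rng.normal(size=(6, 2))
    sq = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq)                         # RBF Gram matrix on the finite set

    lam, Phi = np.linalg.eigh(K)            # eigenvalues lambda_i, orthonormal eigenvectors phi_i

    a = rng.normal(size=len(pts))           # coefficients of f = sum_i a_i phi_i
    f = Phi @ a

    def inner(af, ag):
        # inner product (6.12): <f, g>_H = sum_i a_i b_i / lambda_i
        return np.sum(af * ag / lam)

    # reproducing property: <f, K(., x_j)>_H = f(x_j) for every sample point x_j
    b = Phi.T @ K                           # column j holds the coefficients of K(., x_j)
    print(np.allclose([inner(a, b[:, j]) for j in range(len(pts))], f))   # expected: True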
42. Proof That H Is the RKHS of the Positive Definite Kernel, Part 1
First we show that H is a Hilbert space (i.e. complete).
Let {f_n}_{n=1}^∞ be a Cauchy sequence in H. By (6.11) we can write f_n = \sum_{i=1}^{\infty} \alpha_{n,i} \varphi_i with \sum_{i=1}^{\infty} |\alpha_{n,i}|^2 / \lambda_i < \infty, so

t_n := \left\{ \frac{\alpha_{n,i}}{\sqrt{\lambda_i}} \right\}_{i=1}^{\infty}

is a Cauchy sequence in the sequence space ℓ². Since ℓ² is complete, there exists t = \{\beta_i\}_{i=1}^{\infty} \in \ell^2 with t_n → t (n → ∞). Setting \alpha_i^* := \sqrt{\lambda_i}\, \beta_i, we have

\sum_{i=1}^{\infty} \frac{|\alpha_i^*|^2}{\lambda_i} = \sum_{i=1}^{\infty} |\beta_i|^2 < \infty \qquad \bigl(\text{since } \{\beta_i\}_{i=1}^{\infty} \in \ell^2\bigr)

and

\sum_{i=1}^{\infty} \frac{|\alpha_{n,i} - \alpha_i^*|^2}{\lambda_i} \to 0.

Hence, for f := \sum_{i=1}^{\infty} \alpha_i^* \varphi_i \in H, we obtain \|f_n - f\|_H \to 0.
43. Proof That H Is the RKHS of the Positive Definite Kernel, Part 2
Next we show that K is a reproducing kernel for H.
In the expansion K(x, ·) = \sum_{i=1}^{\infty} \lambda_i \varphi_i(\cdot)\, \varphi_i(x), Mercer's theorem gives K(·, x) ∈ H. For every f = \sum_{i=1}^{\infty} a_i \varphi_i \in H,

\langle f, K(\cdot, x) \rangle_H = \sum_{i=1}^{\infty} \frac{a_i\, \lambda_i \varphi_i(x)}{\lambda_i} = \sum_{i=1}^{\infty} a_i \varphi_i(x) = f(x),

which confirms the reproducing property. □
Therefore the RKHS corresponding to the positive definite kernel coincides with the space H given by (6.11), and its inner product is given by the series (6.12).
44. The Mean of a Random Variable on a Hilbert Space
Let H be a Hilbert space (a function space) and F a random variable taking values in H, with E[‖F‖] < ∞.
For f ∈ H, define the linear functional ℓ_F : H → ℝ on H by

\ell_F(f) := E[\langle f, F \rangle].

By the Riesz representation theorem there exists m_F ∈ H such that \langle f, m_F \rangle = \ell_F(f) for every f ∈ H. Hence

\ell_F(f) = E[\langle f, F \rangle] = \langle f, m_F \rangle. \qquad (8.1)

This m_F is called the mean of the random variable F and is written E[F]. Then

E[\langle f, F \rangle] = \langle f, m_F \rangle = \langle f, E[F] \rangle,

so taking the mean and taking the inner product commute.
45. The Mean in an RKHS
Let (𝒳, B) be a measurable space, X a random variable taking values in 𝒳, and (H_k, k) an RKHS. Assume E[\sqrt{k(X, X)}] < ∞.
For the feature map Φ(x) = k(·, x), the reproducing property gives

\|\Phi(X)\|^2 = \langle k(\cdot, X), k(\cdot, X) \rangle = \langle k_X, k_X \rangle = k_X(X) = k(X, X),

so E‖Φ(X)‖ < ∞, the assumption of the previous slide is satisfied, and the mean m_X^k of the random variable Φ(X) exists. This m_X^k is called the mean of X in H_k. By (8.1) and the reproducing property, for every f ∈ H_k,

\langle f, m_X^k \rangle = E[\langle f, \Phi(X) \rangle] = E[f(X)], \qquad (8.2)

so for every f the expectation E[f(X)] is expressed as the inner product of f with m_X^k.
We now derive an explicit expression for the mean m_X^k. Since m_X^k ∈ H_k, the reproducing property gives, for any y ∈ 𝒳,

m_X^k(y) = \langle m_X^k, k(\cdot, y) \rangle = \langle E[\Phi(X)], k(\cdot, y) \rangle = E[\langle k(\cdot, X), k(\cdot, y) \rangle] = E[k(X, y)], \qquad (8.8)

so the mean m_X^k is given by the expectation of the kernel function.
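An empirical counterpart of (8.2), added as a sketch (assuming NumPy; the RBF kernel, the sample size, and the expansion points are arbitrary choices of mine): replacing the expectation by a sample average gives the empirical kernel mean m̂ = (1/n) ∑_i k(·, X_i), and for any f = ∑_j α_j k(·, z_j) in H_k the inner product ⟨f, m̂⟩ equals the sample mean of f(X_i).

    import numpy as np

    rng = np.random.default_rng(5)

    def k(a, b, sigma=1.0):
        # RBF kernel between the rows of a and the rows of b
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * sigma**2))

    X = rng.normal(size=(200, 2))          # samples X_1, ..., X_n of the random variable X
    Z = rng.normal(size=(5, 2))            # expansion points defining f = sum_j alpha_j k(., z_j)
    alpha = rng.normal(size=5)

    # <f, m_hat> with m_hat = (1/n) sum_i k(., X_i);
    # by the reproducing property this equals (1/n) sum_i f(X_i)
    lhs = alpha @ k(Z, X).mean(axis=1)
    rhs = np.mean(k(X, Z) @ alpha)
    print(np.isclose(lhs, rhs))            # expected: True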