This document discusses methods for automated machine learning (AutoML) and hyperparameter optimization. It focuses on accelerating the Nelder-Mead method for hyperparameter optimization using predictive parallel evaluation: a Gaussian process models the objective function, and predictive evaluations are run in parallel to reduce the number of actual function evaluations the Nelder-Mead method needs. The reported results show this approach reduces evaluations by 49-63% compared to baseline methods.
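A minimal sketch of the general idea, assuming scikit-learn's GaussianProcessRegressor and a toy objective (none of the names, thresholds, or structure below come from the paper): a GP surrogate screens candidate points so that clearly unpromising ones never receive a real evaluation.

```python
# Sketch: a Gaussian-process surrogate screens Nelder-Mead candidates
# so only promising points receive a real (expensive) evaluation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    """Stand-in for an expensive evaluation (e.g., training a model)."""
    return float(np.sum((x - 0.3) ** 2))

X_hist, y_hist = [], []                           # all real evaluations so far
gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)

def evaluate(x):
    """Real evaluation; also refit the GP surrogate on the history."""
    y = objective(x)
    X_hist.append(x)
    y_hist.append(y)
    gp.fit(np.array(X_hist), np.array(y_hist))
    return y

def screened_evaluate(x, best_so_far):
    """Consult the GP first; spend a real evaluation only if the candidate
    plausibly improves on the best value seen so far."""
    if len(X_hist) >= 3:
        mu, sigma = gp.predict(np.array([x]), return_std=True)
        if mu[0] - 2.0 * sigma[0] > best_so_far:  # almost surely worse
            return float(mu[0])                   # use the prediction instead
    return evaluate(x)

# A Nelder-Mead step would call screened_evaluate for its reflection,
# expansion, and contraction candidates, e.g.:
print(screened_evaluate(np.array([0.0, 0.0]), best_so_far=np.inf))
```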
The document discusses control as inference in Markov decision processes (MDPs) and partially observable MDPs (POMDPs). It introduces optimality variables that represent whether a state-action pair is optimal or not. It formulates the optimal action-value function Q* and optimal value function V* in terms of these optimality variables and the reward and transition distributions. Q* is defined as the log probability of a state-action pair being optimal, and V* is defined as the log probability of a state being optimal. Bellman equations are derived relating Q* and V* to the reward and next state value.
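Concretely, in the standard control-as-inference formulation (e.g., Levine's 2018 tutorial, which matches this description; the uniform action prior below is an assumption on my part), the quantities are

$$Q^*(s_t, a_t) = \log p(\mathcal{O}_{t:T} = 1 \mid s_t, a_t), \qquad V^*(s_t) = \log p(\mathcal{O}_{t:T} = 1 \mid s_t),$$

with $p(\mathcal{O}_t = 1 \mid s_t, a_t) = \exp\big(r(s_t, a_t)\big)$, and the Bellman-style backups

$$V^*(s_t) = \log \int \exp\!\big(Q^*(s_t, a_t)\big)\, da_t, \qquad Q^*(s_t, a_t) = r(s_t, a_t) + \log \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}\!\big[\exp\!\big(V^*(s_{t+1})\big)\big].$$

The log-sum-exp acts as a soft maximum, recovering the usual Bellman equations in the deterministic limit.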
1. The document discusses probabilistic modeling and variational inference. It introduces concepts like Bayes' rule, marginalization, and conditioning.
2. An equation for the evidence lower bound (ELBO) is derived: the log likelihood of the data decomposes into the Kullback-Leibler divergence between the approximate and true posteriors plus an expected log likelihood term, the ELBO itself (see the identity after this list).
3. Variational autoencoders are discussed, where the approximate posterior is parameterized by a neural network and optimized to maximize the evidence lower bound. Latent variables are modeled as Gaussian distributions.
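The decomposition referred to in item 2 is the standard identity (notation assumed, not taken from the document):

$$\log p_\theta(x) = D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big) + \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big]}_{\text{ELBO}}.$$

Since the KL term is nonnegative, the ELBO lower-bounds $\log p_\theta(x)$; a VAE maximizes it jointly over the decoder parameters $\theta$ and the encoder (approximate posterior) parameters $\phi$.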
This document discusses causal discovery and its application to analyzing predictive models. It introduces causal discovery as the unsupervised learning of causal relations from data, estimating causal structures such as directed acyclic graphs under certain assumptions. It then discusses using causal discovery to analyze the mechanisms of predictive models: a causal model over the features is combined with the predictive model, so that the effect of intervening on a feature can be propagated through to the model's predictions. An example using an auto MPG dataset demonstrates how this approach can suggest which variable has the greatest intervention effect on MPG predictions.
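A minimal sketch of that combination, assuming the open-source `lingam` Python package and scikit-learn; the feature names, coefficients, and one-step propagation are illustrative, not taken from the document:

```python
# Sketch: fit a causal model over the features, fit a predictive model,
# then estimate how an intervention on one feature shifts the prediction.
# Assumes the open-source `lingam` package (pip install lingam).
import numpy as np
import lingam
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
weight = rng.uniform(1500, 4500, n)                  # illustrative features
horsepower = 0.03 * weight + 10 * rng.standard_normal(n)
X = np.column_stack([weight, horsepower])
mpg = 45 - 0.008 * weight - 0.05 * horsepower + rng.standard_normal(n)

causal = lingam.DirectLiNGAM()
causal.fit(X)                            # causal structure over the features
B = causal.adjacency_matrix_             # B[i, j]: effect of feature j on i
pred = LinearRegression().fit(X, mpg)    # ordinary predictive model

def mean_prediction_under_intervention(j, value):
    """Force feature j to `value`, propagate the change to its children
    through B (one step suffices for this two-variable example), and feed
    the modified features to the predictive model."""
    Xi = X.copy()
    delta = value - Xi[:, j]
    Xi[:, j] = value
    for i in range(Xi.shape[1]):
        if i != j:
            Xi[:, i] = Xi[:, i] + B[i, j] * delta
    return pred.predict(Xi).mean()

print(mean_prediction_under_intervention(0, 3000.0))
```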
A non-Gaussian model for causal discovery in the presence of hidden common causes (Shiga University, RIKEN)
1) The document proposes a Bayesian linear non-Gaussian structural equation model (SEM) approach for estimating causal direction between observed variables in the presence of hidden common causes.
2) Rather than explicitly modeling the hidden common causes, the approach transforms the model into one without hidden causes by introducing observation-specific intercepts that represent the sums of the hidden causes (sketched after this list).
3) The approach compares marginal likelihoods of the transformed models under different causal directions to select the most likely direction, without needing to specify the number or distributions of hidden causes. It was shown to successfully estimate causal directions on a sociology data set.
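A hedged sketch of the transformed model for the x → y direction, with notation assumed rather than taken from the paper: writing the hidden common causes for observation $i$ as $f_{i1}, \dots, f_{iQ}$,

$$x_i = \mu_i^{(x)} + e_i^{(x)}, \qquad y_i = \mu_i^{(y)} + b\,x_i + e_i^{(y)}, \qquad \mu_i^{(x)} = \sum_q \lambda_q^{(x)} f_{iq}, \quad \mu_i^{(y)} = \sum_q \lambda_q^{(y)} f_{iq},$$

so the observation-specific intercepts $\mu_i^{(x)}, \mu_i^{(y)}$ absorb all hidden common causes, and the marginal likelihoods of the two directions ($x \to y$ versus $y \to x$) can be compared without ever specifying $Q$ or the distributions of the $f_{iq}$.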
1. The document discusses estimating causal direction between two variables in the presence of hidden common causes.
2. A key challenge is that the hidden common causes introduce dependence between the error terms, making regression coefficients an unreliable guide to causal direction.
3. The author proposes a non-Gaussian structural equation model that can estimate causal direction without specifying the number of hidden common causes by exploiting the fact that different causal directions imply different distributions over the data, even when the error terms are dependent.
Discovery of Linear Acyclic Models Using Independent Component Analysis (Shiga University, RIKEN)
This document discusses the discovery of linear acyclic models from non-experimental data using independent component analysis (ICA). Existing methods assume Gaussian disturbances and can therefore only identify an equivalence class of models, whereas the proposed LiNGAM approach assumes non-Gaussian disturbances, which makes the connection strengths and the structure identifiable without equivalent models. The LiNGAM algorithm estimates the matrix B using ICA and post-processing, finds a causal order, and prunes non-significant edges. Examples show LiNGAM can correctly estimate networks, and the document concludes that this is an important topic, with code available online.
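A minimal illustrative sketch of the ICA-based estimation just described (my own toy reconstruction, not the authors' code), using scikit-learn's FastICA on a two-variable example:

```python
# ICA-LiNGAM sketch: x = Bx + e implies e = (I - B)x, so ICA's unmixing
# matrix equals I - B up to row permutation and scaling.
import numpy as np
from itertools import permutations
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 5000
e = rng.laplace(size=(n, 2))               # non-Gaussian disturbances
x1 = e[:, 0]
x2 = 0.8 * x1 + e[:, 1]                    # true model: x2 = 0.8*x1 + e2
X = np.column_stack([x1, x2])

ica = FastICA(n_components=2, random_state=0, max_iter=2000)
ica.fit(X)
W = ica.components_                        # unmixing matrix, rows permuted/scaled

# Post-processing: pick the row permutation that minimizes sum(1/|diag|),
# i.e. avoids zeros on the diagonal (brute force is fine for small d).
d = W.shape[0]
best = min(permutations(range(d)),
           key=lambda p: np.sum(1.0 / np.abs(W[list(p), range(d)])))
Wp = W[list(best)]
Wp = Wp / np.diag(Wp)[:, None]             # rescale rows so the diagonal is 1
B = np.eye(d) - Wp                         # estimated connection strengths
print(np.round(B, 2))                      # B[1, 0] should be close to 0.8
```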
DirectLiNGAM is a non-Gaussian estimation method for the LiNGAM model that directly estimates the variable ordering without using ICA. It iteratively identifies exogenous variables using independence tests between each candidate variable and the residuals obtained by regressing the other variables on it. This lets it estimate the ordering in a fixed number of steps, with none of the algorithmic parameters, convergence issues, or scale dependence of previous ICA-based methods. It was shown to estimate the correct causal ordering on a real-world socioeconomic dataset, matching domain knowledge better than alternatives such as ICA-LiNGAM, the PC algorithm, and GES.
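DirectLiNGAM ships in the open-source `lingam` Python package; a minimal usage sketch on synthetic data (package API as I recall it, worth double-checking against the docs):

```python
# Minimal DirectLiNGAM usage with the open-source `lingam` package
# (pip install lingam); the data here are synthetic.
import numpy as np
import lingam

rng = np.random.default_rng(0)
n = 2000
e = rng.uniform(-1, 1, size=(n, 3))     # non-Gaussian disturbances
x0 = e[:, 0]
x1 = 1.5 * x0 + e[:, 1]
x2 = -0.7 * x1 + e[:, 2]
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)              # expected: [0, 1, 2]
print(np.round(model.adjacency_matrix_, 2))
```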
This document summarizes linear non-Gaussian structural equation modeling (SEM). It introduces linear SEM and its key limitation: distinct models can be indistinguishable from the covariance structure alone. Linear non-Gaussian SEM uses non-Gaussian distributions of the external influences to distinguish between such models. Independent component analysis (ICA) is then used to estimate the model, relating it to the linear non-Gaussian acyclic model (LiNGAM). The key steps of LiNGAM are using ICA to estimate the model, finding the row permutation that removes zeros from the diagonal, and pruning the estimated model matrix to identify the paths that are actually zero. Simulations demonstrate accurate estimation of the model matrix B.
Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I (Shiga University, RIKEN)
This document provides an overview of a tutorial on non-Gaussian methods for learning linear structural equation models (SEMs). The tutorial will cover how linear SEMs can be used to model data generating processes and review a new approach that utilizes non-Gaussianity of data for model identification. The tutorial is divided into two parts, with the first part providing an overview of linear SEMs and the identifiability problems of conventional methods. The second part will discuss recent advances in applying these non-Gaussian methods to time series data and models with latent confounders.
16. Computing E(y | do(x = c))

Pre-intervention data-generating process (left to nature), with a common cause f of x and y:

    x = f + e_x
    y = b x + f + e_y

The intervention do(x = c) forces the value of x to c, cutting the arrows into x:

    x = c
    y = b c + f + e_y

E(y | do(x = c)) is E(y) under this intervened model:

    E(y | do(x = c)) = b c + E(f) + E(e_y)
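A quick numeric check of E(y | do(x = c)) = b c + E(f) + E(e_y) for the model above (a sketch with assumed distributions and coefficients, not from the slides):

```python
# Monte Carlo check of the intervened model on slide 16.
import numpy as np

rng = np.random.default_rng(0)
n, b, c = 1_000_000, 2.0, 1.5
f = rng.uniform(0, 1, n)                 # hidden common cause, E(f) = 0.5
e_y = rng.laplace(0, 1, n)               # E(e_y) = 0

y_do = b * c + f + e_y                   # intervened model: x forced to c
print(y_do.mean())                       # ~ b*c + 0.5 = 3.5
```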
17. Conversely, what happens if we change the value of y?

Same model as before:

    x = f + e_x
    y = b x + f + e_y

Under do(y = c) the equation for x is unchanged, since y does not appear in it, so the average causal effect of y on x is

    E(x | do(y = d)) - E(x | do(y = c)) = {E(f) + E(e_x)} - {E(f) + E(e_x)} = 0

It is "properly" zero.
(Note: regressing x on y still gives a regression coefficient ≠ 0.)
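A sketch demonstrating the note above: the causal effect of y on x is zero, yet the coefficient from regressing x on y is not (distributions and coefficients assumed by me):

```python
# Regression vs. intervention for the model on slide 17.
import numpy as np

rng = np.random.default_rng(0)
n, b = 100_000, 2.0
f = rng.standard_normal(n)
x = f + rng.laplace(size=n)
y = b * x + f + rng.laplace(size=n)

slope = np.cov(x, y)[0, 1] / np.var(y)   # OLS slope of x regressed on y
print(slope)          # clearly nonzero, even though do(y=...) leaves x alone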
18. Identifiability of causal effects

• Assume the causal graph is known.
• For a directed acyclic graph:
– Sufficient condition (Pearl, 1995): adjust by observing the parents of x:

    E(y | do(x)) = E_{parents of x}[ E(y | x, parents of x) ]

– In the linear case:

    E(y | do(x = d)) - E(y | do(x = c)) = (d - c) × (the partial regression coefficient of x)

– The causal graph (causal structure) must be known.
• The outcome variable q and the intermediate variable u must not be included among the explanatory variables.

(Example graph over the variables x, y, z, w, u, v, q.)
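A sketch of the linear adjustment: regressing y on x together with x's parents recovers the causal coefficient, while regressing on x alone does not (toy graph and coefficients assumed by me):

```python
# Back-door adjustment in the linear case (slide 18).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.standard_normal(n)               # parent of x (z -> x, z -> y)
x = 1.2 * z + rng.standard_normal(n)
y = 0.5 * x + 2.0 * z + rng.standard_normal(n)

X1 = np.column_stack([x, np.ones(n)])            # y on x alone (biased)
Xz = np.column_stack([x, z, np.ones(n)])         # y on x and parent z
print(np.linalg.lstsq(X1, y, rcond=None)[0][0])  # biased, far from 0.5
print(np.linalg.lstsq(Xz, y, rcond=None)[0][0])  # ~0.5 = causal effect
```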
26. Constraint-based estimation

• PC algorithm (Spirtes+91)
– Estimate the skeleton:
  no edge between x and y if there is some conditioning set s (possibly empty) such that x and y are independent given s.
– Orient the remaining edges:
  v-structures first, then the directed edges implied by the structure (Meek95UAI): complete.
  Example: orient edges so that the graph stays acyclic.
• Independence is judged by statistical tests.
• Consistency: the test threshold must be adapted to unknown quantities.
• If the graph is sparse, 1000 variables take about 5 minutes (Kalisch+07JMLR).

(Figure: in the initial graph, x and y are independent, so the edge x - y is removed; conditioning on z makes x and y dependent, so the remaining edges are oriented as the v-structure x → z ← y.)
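A sketch of the v-structure logic in the figure: x and y are marginally independent, but become dependent once we condition on the collider z (data-generating values assumed by me):

```python
# Conditioning on a collider induces dependence (slide 26).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
z = x + y + 0.5 * rng.standard_normal(n)   # collider: x -> z <- y

print(np.corrcoef(x, y)[0, 1])             # ~0: marginally independent
# Partial correlation of x and y given z, via residuals:
rx = x - np.polyval(np.polyfit(z, x, 1), z)
ry = y - np.polyval(np.polyfit(z, y, 1), z)
print(np.corrcoef(rx, ry)[0, 1])           # clearly negative: dependent given z
```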
29. The LiNGAM model

• Linear Non-Gaussian Acyclic Model (LiNGAM) (Shimizu+06JMLR)
• Causal directions and coefficients are identifiable from the data X
• Faithfulness is not required

    x_i = Σ_{j ≠ i} b_ij x_j + e_i

Assumptions:
- Acyclic
- Non-Gaussian exogenous variables (errors) e_i
- The e_i are mutually independent (no latent common causes)

Matrix form:

    x = B x + e

(Example graph: x1, x2, x3 with coefficients b_21, b_13, b_23 and errors e_1, e_2, e_3.)
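A sketch of generating data from the matrix form x = B x + e, i.e. x = (I - B)^{-1} e, for a three-variable graph (the coefficient values are hypothetical, not from the slide):

```python
# Simulating the LiNGAM matrix form x = Bx + e (slide 29).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
B = np.array([[0.0, 0.0, 0.4],    # x1 = b13*x3 + e1
              [0.7, 0.0, 0.3],    # x2 = b21*x1 + b23*x3 + e2
              [0.0, 0.0, 0.0]])   # x3 is exogenous (acyclic: x3 -> x1 -> x2)
e = rng.uniform(-1, 1, size=(n, 3))          # non-Gaussian, independent
X = e @ np.linalg.inv(np.eye(3) - B).T       # solve x = Bx + e per sample
print(X.shape)
```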
47. Revisited: causation for an individual (Pearl, 2000)

• y_{x←d}(Zeus) is Zeus's y in the do(x = d) model:

    y_{x←d}(Zeus) = f_y(d, e_y^Zeus) = b d + e_y^Zeus

• The values of the exogenous variables e_x^Zeus, e_y^Zeus identify the individual (+ situation).
  (e_y contains all factors other than x that determine the value of y.)

Model 1:

    x = e_x
    y = b x + e_y

Model 1' under do(x = d):

    x = d
    y = b x + e_y

e_y^Zeus is the value of e_y that was used when Zeus's data were generated.
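A sketch of the individual-level counterfactual above: recover Zeus's e_y from his observed (x, y), then evaluate y under do(x = d) (the numbers are toy values of mine, not from the slide):

```python
# Individual-level counterfactual for Model 1 (slide 47).
b = 2.0
x_zeus, y_zeus = 1.0, 3.5        # hypothetical observation for Zeus
e_y_zeus = y_zeus - b * x_zeus   # abduction: e_y^Zeus = y - b*x = 1.5

d = 5.0                          # intervention do(x = d)
y_counterfactual = b * d + e_y_zeus
print(y_counterfactual)          # 11.5 = Zeus's y in the do(x = d) model
```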