The document discusses distances between data and similarity measures in data analysis. It introduces the distance between data as a quantitative measure of how different two data points are: the smaller the distance, the more similar the points. Distances are useful for tasks such as clustering, anomaly detection, data recognition, and measuring approximation error. The most common measure, Euclidean distance, is explained for vectors of any dimension using the geometric concept of a norm. Caution is advised when computing distances between data whose features have different scales.
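As a minimal sketch of the Euclidean distance and the scale caveat described above: the data (height in cm, annual income in yen) and the helper names `euclidean_distance` and `standardize` are hypothetical and do not come from the slides. Standardizing each feature is one common remedy for differing scales, not necessarily the one the deck recommends.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance: the norm of the difference vector x - y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def standardize(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical records: (height in cm, annual income in yen).
people = [
    [170.0, 4_000_000.0],
    [150.0, 4_010_000.0],
    [171.0, 5_000_000.0],
]

# On the raw data the income column dominates, because its numbers are larger.
raw_01 = euclidean_distance(people[0], people[1])
raw_02 = euclidean_distance(people[0], people[2])

# After standardization both features contribute on a comparable scale.
scaled = standardize(people)
scaled_01 = euclidean_distance(scaled[0], scaled[1])
scaled_02 = euclidean_distance(scaled[0], scaled[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

On the raw values, the 20 cm height gap between the first two people is almost invisible next to the income differences. After standardization the two distances are of the same order.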
This slide deck was presented at NagoyaStat #1 on August 6, 2016. It covers the way of thinking behind statistical modeling and how to apply maximum likelihood estimation to data that follow a Poisson distribution.
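A minimal sketch of maximum likelihood estimation for Poisson data follows. The counts are hypothetical, and the function names are illustrative rather than taken from the slides. For i.i.d. Poisson observations the maximum likelihood estimate of the rate is the sample mean; the grid search is included only as a numerical check of that closed form.

```python
import math


def poisson_log_likelihood(lam, counts):
    """Log-likelihood of i.i.d. Poisson counts for a rate lam > 0."""
    # log P(k | lam) = k * log(lam) - lam - log(k!), and log(k!) = lgamma(k + 1)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def poisson_mle(counts):
    """Closed-form maximum likelihood estimate of the rate: the sample mean."""
    return sum(counts) / len(counts)


# Hypothetical count data (e.g. events observed per day).
counts = [2, 3, 1, 4, 0, 2, 3, 5]

lam_hat = poisson_mle(counts)

# Numerical check: pick the rate on a grid that maximizes the log-likelihood.
grid = [0.5 + 0.01 * i for i in range(500)]
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(lam_hat, lam_grid)
```

Maximizing the log-likelihood instead of the likelihood gives the same estimate and avoids numerical underflow from multiplying many small probabilities.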
* Satoshi Hara and Kohei Hayashi. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. AISTATS'18 (to appear).
arXiv ver.: https://arxiv.org/abs/1606.09066#
* GitHub
https://github.com/sato9hara/defragTrees
1) Canonical correlation analysis (CCA) is a statistical method that analyzes the correlation between two sets of multidimensional variables.
2) CCA finds linear transformations of the two sets of variables so that their correlation is maximized. This can be formulated as a generalized eigenvalue problem.
3) The number of dimensions of the transformed variables is determined using Bartlett's test, which tests the eigenvalues against a chi-squared distribution.
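The first two points can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the deck's implementation: instead of calling a generalized eigenvalue solver directly, it whitens both blocks with Cholesky factors and takes a singular value decomposition, which yields the same canonical correlations. The function name `cca`, the small ridge term `reg`, and the synthetic data are all hypothetical. Bartlett's test is not included.

```python
import numpy as np


def cca(X, Y, reg=1e-8):
    """Canonical correlations and weight matrices for data blocks X and Y.

    Solves Sxy Syy^{-1} Syx a = rho^2 Sxx a by whitening with Cholesky
    factors and taking an SVD of the whitened cross-covariance.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # reg is a small ridge term (an assumption) that keeps the factors stable.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)  # Syy = Ly @ Ly.T
    # Whitened cross-covariance: Lx^{-1} Sxy Ly^{-T}
    K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(K, full_matrices=False)

    A = np.linalg.solve(Lx.T, U)     # weights for X, one column per component
    B = np.linalg.solve(Ly.T, Vt.T)  # weights for Y, one column per component
    return s, A, B


# Synthetic data: both blocks share one latent signal z plus independent noise.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([rng.normal(size=(500, 1)), z + 0.1 * rng.normal(size=(500, 1))])

rho, A, B = cca(X, Y)
u = (X - X.mean(axis=0)) @ A[:, 0]  # first canonical variate of X
v = (Y - Y.mean(axis=0)) @ B[:, 0]  # first canonical variate of Y
print(rho)
print(np.corrcoef(u, v)[0, 1])
```

The singular values in `rho` are the canonical correlations in descending order, and the correlation between the first pair of projected variables should match `rho[0]` up to the effect of the ridge term.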