The document discusses hyperparameter optimization in machine learning models. It introduces various hyperparameters that affect model performance and notes that as models grow more complex, the number of hyperparameters increases, making manual tuning impractical. It formulates hyperparameter optimization as a black-box optimization problem that minimizes validation loss, and discusses challenges such as the high cost of each function evaluation and the lack of gradient information.
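To make that black-box formulation concrete, here is a minimal random-search sketch; random search is only the simplest baseline, and the names train_and_validate and space are our own illustrative placeholders, not from the document:

import random

def random_search(train_and_validate, space, n_trials=50):
    # treat the learner as a black box: sample hyperparameters,
    # pay for one expensive evaluation (no gradients available),
    # and keep the configuration with the lowest validation loss
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = dict((name, random.uniform(lo, hi))
                      for name, (lo, hi) in space.items())
        loss = train_and_validate(params)  # one costly evaluation
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

The same interface (parameters in, validation loss out) also fits Bayesian optimization and other black-box methods mentioned above.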
1) Canonical correlation analysis (CCA) is a statistical method that analyzes correlations between two sets of multidimensional variables.
2) CCA finds linear transformations of the two sets of variables that maximize their correlation. This can be formulated as a generalized eigenvalue problem (see the sketch after this list).
3) The number of dimensions to retain in the transformed variables is chosen with Bartlett's test, which compares a statistic built from the eigenvalues against a chi-squared distribution.
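A minimal sketch of both steps, assuming the usual sample-covariance formulation Sxy Syy^{-1} Syx a = rho^2 Sxx a and full-rank data; the function names are ours, not from the document:

import numpy as np
from scipy import linalg, stats

def canonical_correlations(X, Y):
    # center both sets of variables
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T.dot(X) / (n - 1)
    Syy = Y.T.dot(Y) / (n - 1)
    Sxy = X.T.dot(Y) / (n - 1)
    # generalized eigenvalue problem: Sxy Syy^{-1} Syx a = rho^2 Sxx a
    A = Sxy.dot(linalg.solve(Syy, Sxy.T))
    evals, evecs = linalg.eigh(A, Sxx)          # ascending eigenvalues
    rho2 = np.clip(evals[::-1], 0.0, 1.0)       # descending, in [0, 1]
    d = min(X.shape[1], Y.shape[1])             # number of canonical pairs
    return np.sqrt(rho2[:d]), evecs[:, ::-1][:, :d]

def bartlett_n_dims(rho, n, p, q, alpha=0.05):
    # Bartlett's chi-squared test: keep adding dimensions while the
    # canonical correlations from k onward are still jointly significant
    for k in range(len(rho)):
        stat = -(n - 1 - (p + q + 1) / 2.0) * np.sum(np.log(1.0 - rho[k:] ** 2))
        df = (p - k) * (q - k)
        if stats.chi2.sf(stat, df) > alpha:
            return k  # correlations from k on are not significant
    return len(rho)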
40. Mini Python Lesson: List Comprehensions
• List comprehension: build a list from a rule
– No need to write a for loop
– Comparable to R's apply() family of functions
a = []
for x in xrange(10):
    a.append(x * x)
A list comprehension is concise!
a = [x * x for x in xrange(10)]
※ Strictly speaking there are various caveats (omitted)
41. If we use a "list comprehension"……
φ_m(x) = x^m  (m = 0, …, M − 1)

phi = [
    lambda x: 1,
    lambda x: x,
    lambda x: x ** 2,
    lambda x: x ** 3
]
It seems like we should be able to write it like this
phi = [lambda x: x ** m for m in xrange(M)]
• Much simpler!
42. It didn't work……
• Try printing 2^0, 2^1, 2^2, 2^3
– Expecting the output "1 2 4 8"
M = 4
phi = [lambda x: x ** m for m in xrange(M)]
print phi[0](2), phi[1](2), phi[2](2), phi[3](2)
• But what this actually prints is "8 8 8 8"
– Huh, they're all the same!? Why???
43. The reason it doesn't work……
• It comes down to "lexical scope"
– Each lambda refers to the same variable m, looked up at call time; the loop leaves m == 3, so every call computes 2 ** 3 == 8
– A bit tricky
• There is a trick to work around it, though……
– Even trickier
M = 4
# the default argument c=m is evaluated right away, freezing the current m
phi = [lambda x, c=m: x ** c for m in xrange(M)]
print phi[0](2), phi[1](2), phi[2](2), phi[3](2)
# => prints "1 2 4 8" (???
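To make the "lexical scope" point concrete, here is a small Python 2 sketch (matching the xrange/print style of the slides) showing that every lambda reads the same variable m when it is called; the rebinding of m is our own illustration, not from the slides:

M = 4
phi = [lambda x: x ** m for m in xrange(M)]
# in Python 2 the comprehension variable m leaks into this scope,
# and each lambda looks m up here at *call* time, not creation time
print m            # => 3 (left over from the loop), hence "8 8 8 8"
m = 10             # rebinding m changes every lambda at once
print phi[0](2)    # => 1024 (2 ** 10), not 1

The default-argument trick on the previous slide works because default values are evaluated once, at definition time, so each lambda gets its own frozen copy of m.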