This document discusses methods for automated machine learning (AutoML) and optimization of hyperparameters. It focuses on accelerating the Nelder-Mead method for hyperparameter optimization using predictive parallel evaluation. Specifically, it proposes using a Gaussian process to model the objective function and perform predictive evaluations in parallel to reduce the number of actual function evaluations needed by the Nelder-Mead method. The results show this approach reduces evaluations by 49-63% compared to baseline methods.
* Satoshi Hara and Kohei Hayashi. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. AISTATS'18 (to appear).
arXiv ver.: https://arxiv.org/abs/1606.09066#
* GitHub
https://github.com/sato9hara/defragTrees
This document discusses various methods for calculating Wasserstein distance between probability distributions, including:
- Sliced Wasserstein distance, which projects distributions onto lower-dimensional spaces to enable efficient 1D optimal transport calculations.
- Max-sliced Wasserstein distance, which focuses sampling on the most informative projection directions.
- Generalized sliced Wasserstein distance, which uses more flexible projection functions than simple slicing, like the Radon transform.
- Augmented sliced Wasserstein distance, which applies a learned transformation to distributions before projecting, allowing more expressive matching between distributions.
These sliced/generalized Wasserstein distances have been used as loss functions for generative models with promising
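This slide deck was used by Umemoto of our company at an internal technical study session.
It explains the Transformer, an architecture that has attracted considerable attention in recent years.
"Arithmer Seminar" is held weekly; professionals from inside and outside our company give lectures in their respective areas of expertise.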
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer is a mathematics company that originated at the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to bring new, advanced AI systems to solutions in a wide range of fields. Our job is to work out how to use AI well to make work more efficient and to produce results that are useful to people.
Statistical Genetics Summer School 2023
http://www.sg.med.osaka-u.ac.jp/school_2023.html
Aug 25-27th 2023, Osaka University, The University of Tokyo, RIKEN, Japan
Predicting protein–protein interactions based only on sequences information
Juwen Shen, Jian Zhang, Xiaomin Luo, Weiliang Zhu, Kunqian Yu, Kaixian Chen, Yixue Li and Hualiang Jiang
Proc Natl Acad Sci USA, 2007, 104(11), 4337-4341.
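3. Genetic markers and SNPs
- Repetitive sequences
  Minisatellite / VNTR (Variable Number of Tandem Repeat)
    10-100 bp repeat unit, repeated 20-50 times
  Microsatellite / STR (Short Tandem Repeat)
    several hundred thousand loci, 1-9 bp repeat unit, repeated 5-60 times, multiple alleles
- SNP (Single Nucleotide Polymorphism)
  about 12 million loci, basically biallelic (two alleles)
  besides serving as markers, SNPs are in some cases directly involved in the variation itself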
4. Genetic markers and SNPs
Haplotype
= a combination of linked SNPs (SNPs close together on the same chromosome)
SNP1, SNP2, SNP3: the three positions at which the sequences below differ
Person A
  ACACAGGATCACTTGAGGCCAGGAGTT  Haplotype 1
  ACACATGATCAATTGAGGCCAGGAGGT  Haplotype 2
Person B
  ACACAGGATCACTTGAGGCCAGGAGTT  Haplotype 1
  ACACAGGATCACTTGAGGCCAGGAGTT  Haplotype 1
Examining only one of the SNPs (the tag SNP) is sufficient
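The tag SNP idea on this slide can be checked directly on the two sequences. The following is a minimal sketch, not part of the original slides: the helper names `variable_sites` and `alleles_at` are hypothetical, and the code only compares the two haplotype strings shown above.

```python
# Haplotype sequences copied from the slide.
HAP1 = "ACACAGGATCACTTGAGGCCAGGAGTT"
HAP2 = "ACACATGATCAATTGAGGCCAGGAGGT"


def variable_sites(a: str, b: str) -> list[int]:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def alleles_at(hap: str, sites: list[int]) -> str:
    """Return the alleles a haplotype carries at the given 1-based positions."""
    return "".join(hap[i - 1] for i in sites)


sites = variable_sites(HAP1, HAP2)
print("SNP positions:", sites)
print("Haplotype 1 alleles:", alleles_at(HAP1, sites))
print("Haplotype 2 alleles:", alleles_at(HAP2, sites))
```

With only these two haplotypes present, the allele at any one of the three sites determines the alleles at the other two, which is why typing a single tag SNP is enough in this example.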
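22. 1. PED file
Format
PED file (extension ".ped")
Column 1            Family ID
Column 2            Individual ID
Column 3            Paternal ID
Column 4            Maternal ID
Column 5            SEX
Column 6            affection status
Column 7 onward     Genotype (several hundred thousand columns, depending on the number of SNPs)
Point: when the samples are unrelated
- Family ID = family ID
- Individual ID = individual ID
- Paternal ID = individual ID of the father
- Maternal ID = individual ID of the mother
study1.ped
Other
- SEX = 1: male, 2: female
- affection status: Control = 1, Case = 2
  (continuous values (QT) such as expression levels or clinical information are also allowed)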
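The column layout above can be read with a few lines of code. This is a minimal sketch under stated assumptions: the sample row is invented for illustration, `PedRecord` and `parse_ped_line` are hypothetical names, and each genotype is assumed to occupy two whitespace-separated allele fields, as in the standard PLINK text format.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str          # 1 = male, 2 = female
    phenotype: str    # 1 = control, 2 = case, or a quantitative value
    genotypes: list   # one (allele1, allele2) pair per SNP


def parse_ped_line(line: str) -> PedRecord:
    """Split one PED row into its six leading columns and the allele pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 columns followed by an even number of alleles")
    alleles = fields[6:]
    pairs = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], pairs)


# Hypothetical row: an unrelated male case genotyped at three SNPs.
sample = "FAM1 IND1 0 0 1 2 A A G T C C"
rec = parse_ped_line(sample)
print(rec.individual_id, rec.sex, rec.phenotype, rec.genotypes)
```

In the sample row the parental IDs are written as 0, the usual PLINK code for "unknown", which is an assumption about how unrelated samples would be encoded here.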
23. 2. MAP file
Format
MAP file (extension ".map")
Column 1    Chromosome
Column 2    SNP identifier
Column 3    Genetic distance
Column 4    Base-pair position
Point: give the MAP and PED files the same name before the extension
study1.map
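A small sketch of reading the four MAP columns and checking the naming rule stated above. The marker rows are invented for illustration, and `parse_map` and `same_basename` are hypothetical helper names.

```python
from pathlib import Path


def parse_map(text: str) -> list[tuple[str, str, float, int]]:
    """Parse MAP rows into (chromosome, SNP id, genetic distance, bp position)."""
    markers = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, snp_id, distance, position = line.split()
        markers.append((chrom, snp_id, float(distance), int(position)))
    return markers


def same_basename(ped_path: str, map_path: str) -> bool:
    """Check that the two files share the same name before the extension."""
    return Path(ped_path).stem == Path(map_path).stem


# Hypothetical MAP content with two markers on chromosome 1.
example = """\
1 rs0000001 0 10583
1 rs0000002 0 10611
"""
markers = parse_map(example)
print(markers)
print(same_basename("study1.ped", "study1.map"))
```

Each PED row should then carry two allele fields per marker listed here, so a PED row for this example would have 6 + 2 * 2 = 10 fields.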
25. 4. Running the GWAS
plink --noweb --bfile study1 --assoc --out study1
The following file is created:
- study1.assoc
study1.assoc
Column 1    CHR     Chromosome
Column 2    SNP     SNP identifier
Column 3    BP      Physical position (base-pair)
Column 4    A1      Code for allele 1 (the minor, rare allele based on the entire sample frequencies)
Column 5    F_A     The frequency of this variant in cases
Column 6    F_U     The frequency of this variant in controls
Column 7    A2      Code for the other allele
Column 8    CHISQ   The chi-squared statistic for this test (1 df)
Column 9    P       The asymptotic significance value for this test
Column 10   OR      The odds ratio for this test
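To make the F_A, F_U, CHISQ, P and OR columns concrete, the sketch below computes them for one SNP from allele counts. It assumes a plain Pearson chi-squared test on the 2x2 table of allele counts (cases vs. controls, A1 vs. A2) with no continuity correction; whether this reproduces PLINK's output digit for digit is not verified here. The counts and the function name `allelic_test` are hypothetical.

```python
import math


def allelic_test(case_a1: int, case_a2: int, ctrl_a1: int, ctrl_a2: int) -> dict:
    """Pearson chi-squared test (1 df) and odds ratio on a 2x2 allele-count table."""
    a, b, c, d = case_a1, case_a2, ctrl_a1, ctrl_a2
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0 or b * c == 0:
        raise ValueError("table has an empty margin or the odds ratio is undefined")
    chisq = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, the upper tail of the chi-squared
    # distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return {
        "F_A": a / (a + b),          # A1 frequency in cases
        "F_U": c / (c + d),          # A1 frequency in controls
        "CHISQ": chisq,
        "P": p_value,
        "OR": (a * d) / (b * c),
    }


# Hypothetical counts: 50 cases and 50 controls, i.e. 100 alleles per group.
result = allelic_test(case_a1=60, case_a2=40, ctrl_a1=40, ctrl_a2=60)
print(result)
```

An odds ratio above 1 means A1 is more frequent in cases than in controls; the P column is what is usually ranked across all SNPs in the study.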
29. 6. QTL analysis
Preparing the phenotype file
Phenotype file (extension ".phe")
Column 1    Family ID
Column 2    Individual ID
Column 3    Quantitative Trait
30. 6. QTL analysis
Running the QTL analysis
plink --noweb --bfile study1 --assoc --pheno param.phe --out study1_qtl
The following file is created:
- study1_qtl.qassoc
Format of the QTL analysis results
study1_qtl.qassoc
Column 1    Chr     Chromosome number
Column 2    SNP     SNP identifier
Column 3    BP      Physical position (base-pair)
Column 4    NMISS   Number of non-missing genotypes
Column 5    BETA    Regression coefficient
Column 6    SE      Standard error
Column 7    R2      Regression r-squared
Column 8    T       Wald test (based on t-distribution)
Column 9    P       Wald test asymptotic p-value
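The NMISS, BETA, SE, R2 and T columns can be illustrated with an ordinary least-squares fit of the trait on the genotype. The sketch below assumes each genotype is coded as the number of copies of one allele (0, 1 or 2) and uses simple linear regression with an intercept; the data and the function name `qassoc_row` are hypothetical, and the P column is omitted because it needs the t-distribution, which the Python standard library does not provide.

```python
import math


def qassoc_row(dosages: list[int], trait: list[float]) -> dict:
    """Regress a quantitative trait on allele dosage and return summary statistics."""
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or trait has no variation")
    beta = sxy / sxx
    rss = max(syy - beta * sxy, 0.0)            # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)         # standard error of the slope
    t_stat = beta / se if se > 0 else math.inf
    return {"NMISS": n, "BETA": beta, "SE": se, "R2": 1.0 - rss / syy, "T": t_stat}


# Hypothetical data: six individuals, two per genotype class.
dosages = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
row = qassoc_row(dosages, trait)
print(row)
```

BETA is read as the expected change in the trait per additional copy of the counted allele, and T is BETA divided by SE.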