Empirical Values of the Policy Outcome and the Divergence
The Monte-Carlo Estimator for the Source Policy Outcome
The Empirical H-divergence
Let … denote the empirical distribution.
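A minimal sketch of both quantities, assuming the standard definitions from Swaminathan & Joachims (counterfactual risk minimization) and Ben-David et al. / Ganin et al. (empirical H-divergence); the symbols \hat{V}, \pi_0, \pi_w, \eta and the sample sizes n, n' are illustrative and may differ from the slide's original notation.

Given logged data \{(x_i, a_i, r_i)\}_{i=1}^{n} collected under the logging (source) policy \pi_0, the Monte-Carlo importance-sampling estimator of the outcome of a policy \pi_w on the source data is

    \hat{V}(\pi_w) = \frac{1}{n} \sum_{i=1}^{n} r_i \, \frac{\pi_w(a_i \mid x_i)}{\pi_0(a_i \mid x_i)} .

Given n source samples \{x_i\} and n' target samples \{x'_j\}, the empirical H-divergence is estimated through the best domain classifier \eta in the hypothesis class \mathcal{H}:

    \hat{d}_{\mathcal{H}} = 2 \left( 1 - \min_{\eta \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[\eta(x_i) = 0] + \frac{1}{n'} \sum_{j=1}^{n'} \mathbf{1}[\eta(x'_j) = 1] \right] \right) ,

i.e. the harder it is for any classifier in \mathcal{H} to separate the two empirical samples, the smaller the estimated divergence.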