15. Histogram-based algorithm
● In the histogram-based algorithm, categorical data is handled as follows:
(https://github.com/Microsoft/LightGBM/issues/1279)
“So when #category is smaller than max_bin, the #bin is smaller than max_bin.
otherwise it use the most frequent categories and stop when use 99% data.”
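The rule quoted above can be sketched roughly as follows. This is a minimal illustration, not LightGBM's actual implementation: the function name `bin_categories`, the `coverage` parameter, and the choice to leave rare categories unmapped are all assumptions made for this example.

```python
from collections import Counter


def bin_categories(values, max_bin, coverage=0.99):
    """Assign a bin index to each category (hypothetical helper).

    If the number of distinct categories fits within max_bin, every
    category gets its own bin. Otherwise, categories are taken in order
    of decreasing frequency until max_bin bins are used or the chosen
    categories cover `coverage` of the rows.
    """
    counts = Counter(values)
    total = len(values)

    # Case 1: #category <= max_bin, so #bin == #category.
    if len(counts) <= max_bin:
        return {cat: i for i, (cat, _) in enumerate(counts.most_common())}

    # Case 2: keep only the most frequent categories.
    mapping = {}
    covered = 0
    for cat, count in counts.most_common():
        if len(mapping) >= max_bin or covered >= coverage * total:
            break
        mapping[cat] = len(mapping)
        covered += count
    return mapping  # categories missing from the mapping are the rare ones


# Three categories, max_bin=5: each category gets its own bin.
small = bin_categories(["a"] * 50 + ["b"] * 30 + ["c"] * 20, max_bin=5)

# Three categories, max_bin=2: only the two most frequent are kept.
capped = bin_categories(["a"] * 50 + ["b"] * 30 + ["c"] * 20, max_bin=2)

# One dominant category covers more than 99% of rows: binning stops early.
skewed = bin_categories(["a"] * 995 + ["b"] * 3 + ["c"] * 2, max_bin=2)
```

How the real library treats the categories that fall outside the mapping is not covered by the quote, so this sketch simply omits them.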
34. Overall Time Cost Comparison
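● Comparing lgb_baseline with EFB_Only shows that EFB is highly effective on sparse data
(LETOR is dense, so it makes little difference there)
● GOSS is especially effective on large-scale datasets such as the KDD data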
out of memory
43. Reference
1. Ke, Guolin, et al. "LightGBM: A highly efficient gradient boosting decision tree." Advances in Neural Information Processing Systems. 2017.
2. Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
3. Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Vol. 1. No. 10. New York, NY, USA: Springer series in statistics, 2001.
4. Friedman, Jerome H. "Greedy function approximation: a gradient boosting machine." Annals of statistics (2001): 1189-1232.