The document discusses XGBoost, an effective and scalable gradient boosting system for machine learning. XGBoost has achieved success in many real-world applications and Kaggle competitions thanks to its regularized learning objective, sparsity awareness, and cache-aware design, which allow it to process large datasets efficiently. The document outlines the history of boosting techniques, describes XGBoost's algorithmic features and system design, and evaluates its performance on several benchmark datasets.
XGBoost (System Overview)
1. XGBOOST: A SCALABLE
TREE BOOSTING SYSTEM
(T. CHEN, C. GUESTRIN, 2016)
NATALLIE BAIKEVICH
HARDWARE ACCELERATION FOR
DATA PROCESSING SEMINAR
ETH ZÜRICH
4. A BIT OF HISTORY
AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001
5. A BIT OF HISTORY
AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001
Various improvements in tree boosting
XGBoost package
6. A BIT OF HISTORY
AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001
Various improvements in tree boosting
XGBoost package
1st Kaggle success: Higgs Boson Challenge
17/29 winning solutions in 2015
7. WHY DOES XGBOOST WIN "EVERY" MACHINE LEARNING COMPETITION?
(MASTER THESIS, D. NIELSEN, 2016)
Source: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions
9. REGULARIZED LEARNING OBJECTIVE
Prediction of the ensemble: $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$
Objective: $\mathcal{L} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$ (loss + regularization)
Regularization term: $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \|w\|^2$, where $T$ = # of leaves
Source: http://xgboost.readthedocs.io/en/latest/model.html
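As a toy illustration of this objective (a minimal sketch, not part of the slides and not XGBoost internals; the function name and numbers are made up), the snippet below evaluates $\mathcal{L}$ for a small ensemble with squared-error loss:

```python
# Evaluate L = sum_i l(y_hat_i, y_i) + sum_k Omega(f_k)
# with l = squared error and Omega(f) = gamma * T + 0.5 * lambda * ||w||^2.
import numpy as np

def regularized_objective(y, y_hat, leaf_weights_per_tree, gamma=1.0, lam=1.0):
    """leaf_weights_per_tree: one array of leaf weights w per tree."""
    loss = np.sum((y - y_hat) ** 2)                       # sum_i l(y_hat_i, y_i)
    penalty = sum(gamma * len(w) + 0.5 * lam * np.sum(w ** 2)
                  for w in leaf_weights_per_tree)         # sum_k Omega(f_k)
    return loss + penalty

# Toy example: 2 trees with 3 and 2 leaves respectively.
y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.8, 0.1, 0.7])
leaves = [np.array([0.2, -0.1, 0.3]), np.array([0.1, -0.2])]
print(regularized_objective(y, y_hat, leaves))
```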
11. ALGORITHM FEATURES
Regularized objective
Shrinkage and column subsampling
Split finding: exact & approximate, global & local
Weighted quantile sketch
Sparsity-awareness
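As a rough sketch of how these features surface in practice, the following uses the XGBoost Python API (assuming the standard xgboost package); the parameter values are placeholders for illustration, not recommendations from the slides:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
X[X < 0.1] = np.nan            # missing entries: sparsity-aware split finding handles these
y = np.random.rand(100)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "eta": 0.1,                # shrinkage (learning rate)
    "colsample_bytree": 0.8,   # column subsampling
    "lambda": 1.0,             # L2 penalty on leaf weights (regularized objective)
    "gamma": 0.5,              # penalty per leaf
    "tree_method": "approx",   # approximate split finding; "exact" for exact greedy
    "max_depth": 6,
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```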
12. SYSTEM DESIGN: BLOCK STRUCTURE
Split-finding cost: $O(Kd\,\|x\|_0 \log n)$ without blocks vs $O(Kd\,\|x\|_0 + \|x\|_0 \log B)$ with the sorted block structure
($K$ = # trees, $d$ = max depth, $\|x\|_0$ = # non-missing entries, $B$ = max # rows per block)
Blocks can be:
distributed across machines
stored on disk in the out-of-core setting
Sorted structure → linear scan (sketched below)
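To make "sorted structure → linear scan" concrete, here is a hedged NumPy sketch (not the actual C++ implementation) of split finding over one pre-sorted feature column, using the gain formula that follows from the regularized objective; in the real block structure the sort is performed once per block and reused across nodes and trees, whereas here it is done inline for brevity:

```python
import numpy as np

def best_split_linear_scan(feature, grad, hess, lam=1.0, gamma=0.0):
    """Enumerate split candidates on one feature with a single linear scan."""
    order = np.argsort(feature)          # pre-sorting: done once per block in XGBoost
    f, g, h = feature[order], grad[order], hess[order]
    G, H = g.sum(), h.sum()              # total gradient/hessian statistics
    gl = hl = 0.0
    best_gain, best_thr = 0.0, None
    for i in range(len(f) - 1):
        gl += g[i]; hl += h[i]           # accumulate left statistics
        gr, hr = G - gl, H - hl
        if f[i] == f[i + 1]:
            continue                     # cannot split between equal feature values
        gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, (f[i] + f[i + 1]) / 2
    return best_thr, best_gain
```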
14. SYSTEM DESIGN: BLOCK STRUCTURE
Compression by columns (CSC): decompression cost vs disk-reading cost
Block sharding: use multiple disks
Block size: too large → cache misses; too small → inefficient parallelization
Prefetch in an independent thread (pattern sketched below)
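The prefetching idea can be sketched generically with plain Python threading (an illustration of the pattern only, not XGBoost's internal prefetcher): a background thread reads blocks ahead into a bounded queue while the main thread consumes them.

```python
import queue
import threading

def prefetch_blocks(read_block, num_blocks, depth=2):
    """Yield blocks while a background thread keeps `depth` blocks read ahead."""
    q = queue.Queue(maxsize=depth)

    def worker():
        for i in range(num_blocks):
            q.put(read_block(i))   # blocks when the queue is full
        q.put(None)                # sentinel: no more blocks

    threading.Thread(target=worker, daemon=True).start()
    while (block := q.get()) is not None:
        yield block

# Example usage with a dummy reader:
# for block in prefetch_blocks(lambda i: f"block-{i}", num_blocks=5):
#     process(block)
```

The queue bound plays the role of the prefetch depth: a larger value hides more I/O latency at the cost of memory.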