XGBOOST: A SCALABLE
TREE BOOSTING SYSTEM
(T. CHEN, C. GUESTRIN, 2016)
NATALLIE BAIKEVICH
HARDWARE ACCELERATION FOR
DATA PROCESSING SEMINAR
ETH ZÜRICH
MOTIVATION
• Effective statistical models
• Scalable system
• Successful real-world applications
XGBoost = eXtreme Gradient Boosting
BIAS-VARIANCE TRADEOFF
• Random Forest: reduces variance
• Boosting: reduces bias
• Both combine many trees by voting / summing their predictions
A BIT OF HISTORY
• AdaBoost, 1996
• Random Forests, 1999
• Gradient Boosting Machine, 2001
• Various improvements in tree boosting
• XGBoost package
• 1st Kaggle success: Higgs Boson Challenge
• 17/29 winning solutions in 2015
WHY DOES XGBOOST WIN "EVERY" MACHINE LEARNING COMPETITION?
(Master thesis, D. Nielsen, 2016)
Source: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions
TREE ENSEMBLE
REGULARIZED LEARNING
OBJECTIVE
Prediction of the ensemble of K trees:
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)

Objective = loss + regularization:
L = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)

\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2
(T = number of leaves, w = leaf weights)

Source: http://xgboost.readthedocs.io/en/latest/model.html
SCORE CALCULATION
Per-leaf statistics built from 1st order and 2nd order gradients give each leaf its weight and the tree its score.
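Restated from the paper: the per-leaf gradient statistics, the optimal leaf weight, the structure score, and the split gain are

g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}), \qquad
h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})

G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i

w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad
\text{obj}^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T

\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma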
ALGORITHM FEATURES
• Regularized objective
• Shrinkage and column subsampling
• Split finding: exact & approximate, global & local
• Weighted quantile sketch
• Sparsity-awareness
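Not from the slides, but as a rough pointer: most of these features map directly to hyperparameters of the xgboost Python package. A minimal sketch with placeholder data and arbitrary parameter values:

import numpy as np
import xgboost as xgb

# toy data, only so the snippet runs end to end
X = np.random.rand(1000, 20)
y = (X[:, 0] > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "eta": 0.1,               # shrinkage (learning rate)
    "colsample_bytree": 0.8,  # column subsampling
    "gamma": 1.0,             # min loss reduction to split (gamma in the objective)
    "lambda": 1.0,            # L2 regularization of leaf weights (lambda in the objective)
    "max_depth": 6,
    "tree_method": "approx",  # "exact" = exact greedy, "approx" = quantile-sketch splits
}
bst = xgb.train(params, dtrain, num_boost_round=100)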
SYSTEM DESIGN:
BLOCK STRUCTURE
Split-finding cost:
O(K d \|x\|_0 \log n) \;\to\; O(K d \|x\|_0 + \|x\|_0 \log B) with the block structure
(K = # trees, d = max depth, \|x\|_0 = # non-missing entries, B = max rows per block)

Blocks can be
• Distributed across machines
• Stored on disk in the out-of-core setting
Sorted column structure → split finding becomes a linear scan
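A toy illustration (not the actual C++ implementation) of why the pre-sorted column blocks help: with a precomputed sort order, the best split of one feature is found by a single linear scan over accumulated gradient statistics. Gathering g and h into contiguous buffers first also mirrors the cache-aware buffering idea on the next slide. Function and variable names here are hypothetical:

import numpy as np

def best_split_presorted(values, sort_idx, g, h, lam=1.0, gamma=0.0):
    # values: one feature column; g, h: per-row 1st/2nd order gradients
    # sort_idx: precomputed argsort of the column (stored once, like a block)
    fv = values[sort_idx]
    gs = g[sort_idx]          # gather into contiguous buffers (cache-friendly)
    hs = h[sort_idx]
    G, H = gs.sum(), hs.sum()
    GL = HL = 0.0
    best_gain, best_thr = 0.0, None
    for i in range(len(fv) - 1):
        GL += gs[i]
        HL += hs[i]
        if fv[i] == fv[i + 1]:            # only split between distinct values
            continue
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL**2 / (HL + lam)
                      + GR**2 / (HR + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, 0.5 * (fv[i] + fv[i + 1])
    return best_gain, best_thr

# usage: sort once, then each candidate split is a linear scan
x = np.random.rand(1000)
g = np.random.randn(1000)    # placeholder gradient statistics
h = np.ones(1000)
print(best_split_presorted(x, np.argsort(x), g, h))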
SYSTEM DESIGN:
CACHE-AWARE ACCESS
Problem: fetching gradient statistics by row index causes non-continuous memory access
Improved split finding:
• Allocate an internal buffer
• Prefetch gradient statistics into it
The effect is larger on bigger datasets than on smaller ones
SYSTEM DESIGN:
BLOCK STRUCTURE
• Block compression by columns (CSC): trades decompression work for reduced disk reading
• Block sharding: data split across multiple disks, prefetched by an independent thread per disk
• Block size trade-off: too large → cache misses; too small → inefficient parallelization
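To connect the out-of-core design to the package: the Python API exposes it by pointing DMatrix at an on-disk libsvm file with a cache-file suffix, as described in the xgboost external-memory documentation (file names below are placeholders, and the exact syntax varies between versions):

import xgboost as xgb

# the '#dtrain.cache' suffix asks xgboost to stream the data from disk,
# caching preprocessed blocks next to the source file
dtrain = xgb.DMatrix("train.libsvm#dtrain.cache")
bst = xgb.train({"objective": "binary:logistic", "tree_method": "approx"},
                dtrain, num_boost_round=10)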
EVALUATION
• Single machine: AWS c3.8xlarge, 32 virtual cores, 2x 320 GB SSD, 60 GB RAM
• Distributed: 32 m3.2xlarge machines, each with 8 virtual cores, 2x 80 GB SSD, 30 GB RAM
DATASETS
Dataset      | n (examples) | m (features) | Task
Allstate     | 10M          | 4227         | Insurance claim classification
Higgs Boson  | 10M          | 28           | Event classification
Yahoo LTRC   | 473K         | 700          | Learning to rank
Criteo       | 1.7B         | 67           | Click-through rate prediction
WHAT'S NEXT?
• Model extensions: DART (+ dropouts), LinXGBoost
• Parallel processing: GPU, FPGA
• Tuning: hyperparameter optimization
• More applications

XGBoost:
• Scalability
• Weighted quantiles
• Sparsity-awareness
• Cache-awareness
• Data compression
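As a hedged sketch of two of the "what's next" items using the Python package: current xgboost releases ship a DART booster (boosted trees with dropout), and a crude hyperparameter search can be driven by the built-in cross validation. Parameter values below are arbitrary placeholders:

import numpy as np
import xgboost as xgb

X = np.random.rand(2000, 30)
y = (X[:, :3].sum(axis=1) > 1.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# DART booster: gradient boosting with dropped trees
dart_params = {
    "booster": "dart",
    "objective": "binary:logistic",
    "eta": 0.1,
    "rate_drop": 0.1,   # fraction of trees dropped each round
    "skip_drop": 0.5,   # probability of skipping dropout in a round
}

# crude hyperparameter search via built-in cross validation
best = None
for max_depth in (4, 6, 8):
    cv = xgb.cv({**dart_params, "max_depth": max_depth}, dtrain,
                num_boost_round=100, nfold=5,
                early_stopping_rounds=10, metrics="logloss")
    score = cv["test-logloss-mean"].min()
    if best is None or score < best[0]:
        best = (score, max_depth)
print("best max_depth:", best[1], "cv logloss:", best[0])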