A 20min presentation on the why and how of variable selection with just a touch of feature creation.
Feature and Variable Selection in Classification
1. Feature and Variable Selection in Classification
Aaron Karper
University of Bern
2. Why?
Why not use all the features?
Interpretability
Overfitting
Computational Complexity
[Figure: training error and test error plotted against model complexity]
3. What are the options?
Ranking
Measure relevance for each feature separately.
The good: fast.
The bad: the XOR problem.
5. What are the options?
Xor problem
[Figure: the class is the XOR of two features, so each feature on its own carries no information about the class, while the pair determines it perfectly.]
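The XOR failure is easy to reproduce. Below is a minimal sketch of ranking, assuming scikit-learn and NumPy (neither is named in the slides): each feature is scored on its own with mutual information, and on XOR data both relevant features score near zero.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=1000)     # binary feature 1
x2 = rng.integers(0, 2, size=1000)     # binary feature 2
y = x1 ^ x2                            # class label: XOR of the two features
noise = rng.normal(size=1000)          # an irrelevant feature, for contrast

X = np.column_stack([x1, x2, noise])
scores = mutual_info_classif(
    X, y, discrete_features=[True, True, False], random_state=0)
print(dict(zip(["x1", "x2", "noise"], scores.round(3))))
# x1 and x2 each score roughly zero: per-feature ranking cannot see that the
# pair determines the class perfectly.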
6. What are the options?
Filters
Walk the feature subset space, evaluating a proxy measure; the classifier is trained only on the final subset.
The good: flexibility.
The bad: suboptimal performance.
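A rough sketch of a filter, assuming a simple relevance-minus-redundancy proxy over subsets; the helper names merit and greedy_filter are illustrative, not taken from the slides. No classifier is trained during the search.

import numpy as np

def merit(X, y, subset):
    """Proxy score: mean |corr(feature, y)| minus mean |corr| between chosen features."""
    relevance = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if len(subset) < 2:
        return relevance
    redundancy = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                          for i in subset for j in subset if i < j])
    return relevance - redundancy

def greedy_filter(X, y, k):
    """Greedy forward walk in feature-subset space, guided only by the proxy."""
    chosen = []
    while len(chosen) < k:
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        chosen.append(max(remaining, key=lambda j: merit(X, y, chosen + [j])))
    return chosen

The selected subset is then handed to whatever classifier you like, which is where the flexibility comes from, and also why the result can be suboptimal for that particular classifier.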
7. What are the options?
Wrappers
Walk the feature subset space, training the classifier on each candidate subset.
The good: accuracy.
The bad: slow training.
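A wrapper can reuse the same greedy walk, but the proxy is replaced by the classifier itself. A sketch, again assuming scikit-learn; greedy_wrapper, logistic regression, and 5-fold cross-validation are illustrative choices, not prescribed by the slides.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_wrapper(X, y, k):
    """Greedy forward walk where every candidate subset is scored by the classifier itself."""
    chosen = []
    while len(chosen) < k:
        remaining = [j for j in range(X.shape[1]) if j not in chosen]

        def cv_score(j):
            clf = LogisticRegression(max_iter=1000)
            return cross_val_score(clf, X[:, chosen + [j]], y, cv=5).mean()

        chosen.append(max(remaining, key=cv_score))
    return chosen

The selection criterion is the classifier's own cross-validated accuracy, which is why wrappers tend to be accurate, but every candidate subset costs a full training run.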
8. What are the options?
Embedded methods
Integrate feature selection into the classifier.
The good: accuracy and training time.
The bad: lacks flexibility.
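As a concrete illustration of an embedded method, a sketch assuming scikit-learn and synthetic data: L1-regularized logistic regression drives the coefficients of unhelpful features to exactly zero while it trains, so selection and fitting happen in a single pass.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
kept = np.flatnonzero(clf.coef_[0] != 0)   # features that survived the L1 penalty
print("kept features:", kept)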
9. What should I use?
What is the best one?
Accuracy-wise: embedded or wrapper.
Complexity-wise: ranking, filters.
Why not both?
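One way to combine them, sketched with scikit-learn (an assumption, not named in the slides): a cheap univariate filter prunes most features first, and an embedded L1 model does the fine-grained selection on the survivors.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=200, n_informative=10,
                           random_state=0)
pipe = make_pipeline(
    SelectKBest(f_classif, k=50),                                 # cheap filter step
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),  # embedded step
)
pipe.fit(X, y)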
10. Examples
Probabilistic feature selection
For a model p(c|x) ∝ p(c) p(x|c).
Can be retrofitted with p(c) = p(M) p(c|M) for a model M.
More degrees of freedom spread the model thin.
Standard optimizations apply.
[Figure: probability assigned to possible data, for a specific model versus a widely spread model]
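The figure's argument can be spelled out with the marginal likelihood; this is the standard Bayesian Occam's razor derivation, written in my own notation rather than taken from the slides:

% Marginal likelihood of model M: the parameters are integrated out.
p(x \mid M) = \int p(x \mid \theta, M)\, p(\theta \mid M)\, d\theta,
\qquad \sum_{x} p(x \mid M) = 1.

Because the sum over all possible data is fixed at one, a model with many degrees of freedom must spread that probability mass over many datasets and so assigns less of it to the data actually observed; a more specific model concentrates its mass and wins when it happens to fit.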
12. Examples
Probabilistic feature selection
Akaike information criterion: every additional variable needs to explain e times as much data.
Bayesian information criterion: unused parameters are marginalized.
Minimum description length: prefer the feature set that gives the shortest combined encoding of model and data.
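A small sketch of applying the criteria, assuming scikit-learn and a synthetic dataset; the helper criteria() and the chosen subsets are illustrative. With AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, an extra variable only pays for itself under AIC if it raises the log-likelihood by more than 1, i.e. makes the data e times more likely, which is the rule of thumb on the slide.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           random_state=0)

def criteria(cols):
    """AIC and BIC for a logistic model fitted on the given feature columns."""
    clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
    p = clf.predict_proba(X[:, cols])[np.arange(len(y)), y]
    log_lik = np.sum(np.log(p))
    k, n = len(cols) + 1, len(y)           # +1 for the intercept
    return 2 * k - 2 * log_lik, k * np.log(n) - 2 * log_lik

for cols in ([0, 1, 2], list(range(10))):
    print(cols, criteria(cols))
# The larger subset only wins if its likelihood gain outweighs the per-variable
# penalty; BIC demands even more than AIC as n grows.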