Caret and zoon: machine learning, ecology and domain specific package systems
Tim C.D. Lucas
Malaria Atlas Project, BDI, Oxford
@timcdlucas @statsforbios
timcdlucas@gmail.com
Who am I?
Malaria Atlas Project at BDI
Malaria, maps, geostatistics
Who am I?
R packages
Zoon
INLAutils
palettetown - my greatest ever achievement
Talk overview
caret
General package for machine learning.
Introduction to the package.
A domain specific package ecosystem?
zoon
General package for species distribution modelling.
What are SDMs?
Package overview.
Domain specific ecosystems
Other examples.
Are they a good thing?
caret
https://topepo.github.io/caret/model-training-and-tuning.html
What is machine learning?
[Figure, two panels: a class (Cat / Not cat) plotted against a feature / predictor variable, and a continuous response plotted against a feature / predictor variable.]
Cross-validation
Hyperparameters
Hyperparameters
Number of PCA coordinates
Cut-offs for variable selection
x + x^2 + x^3 + x^4 + ...
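As a minimal sketch of how a hyperparameter such as the polynomial degree might be chosen by cross-validation, the base R code below fits polynomials of degree 1 to 6 to simulated data and compares their 5-fold cross-validated error. The data, the number of folds and the helper `cv_rmse` are illustrative assumptions, not part of the talk.

```r
# Simulated data: a noisy cubic relationship (purely illustrative).
set.seed(1)
n <- 200
x <- runif(n)
y <- 6 * x^3 + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Assign each row to one of k folds at random.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: cross-validated RMSE for a polynomial of a given degree.
cv_rmse <- function(degree) {
  errs <- sapply(seq_len(k), function(i) {
    train_set <- dat[fold != i, ]   # fit on all folds except fold i
    test_set  <- dat[fold == i, ]   # evaluate on the held-out fold
    fit <- lm(y ~ poly(x, degree), data = train_set)
    sqrt(mean((test_set$y - predict(fit, newdata = test_set))^2))
  })
  mean(errs)
}

# The degree is the hyperparameter: try several values and keep the best.
degrees <- 1:6
rmse <- sapply(degrees, cv_rmse)
best_degree <- degrees[which.min(rmse)]
print(data.frame(degree = degrees, rmse = rmse))
```

The held-out error, rather than the error on the training data, is what is compared across degrees; a straight line underfits the simulated curve, while higher degrees gain little once the curve is captured.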
No free lunch
There is no such thing as a universal "best" machine learning model.
What does caret do?
What does caret do?
Training a model
m1 <- train(Species ~ .,
            iris,
            method = "gbm")
[Figure: bootstrap accuracy against number of boosting iterations (50 to 150), by max tree depth (1, 2, 3).]
Training a different model
m2 <- train(Species ~ .,
            iris,
            method = "nnet")
[Figure: bootstrap accuracy against number of hidden units (1 to 5), by weight decay (0, 1e-01, 1e-04).]
Controlling cross-validation
tr <- trainControl(method = "cv", number = 5)
m3 <- train(Species ~ .,
            iris,
            trControl = tr,
            method = "nnet")
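A short sketch of how the fitted object might be inspected, assuming the caret and nnet packages are installed. The seed and the `trace = FALSE` argument (passed through to `nnet()` to silence its fitting output) are additions for illustration and are not in the slides.

```r
library(caret)

set.seed(42)  # resampling is random, so fix the seed for reproducibility

# 5-fold cross-validation instead of caret's default bootstrap resampling.
tr <- trainControl(method = "cv", number = 5)

m3 <- train(Species ~ .,
            data = iris,
            method = "nnet",
            trControl = tr,
            trace = FALSE)  # assumption: forwarded to nnet() to quiet output

m3$results                          # one row per hyperparameter combination tried
m3$bestTune                         # the combination with the best resampled accuracy
head(predict(m3, newdata = iris))   # predictions from the final model
```

Calling `plot(m3)` draws the kind of accuracy-against-hyperparameter figure shown on the surrounding slides.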
Try more hyperparameter values
m4 <- train(Species ~ .,
            iris,
            tuneLength = 10,
            method = "nnet")
[Figure: bootstrap accuracy against number of hidden units, by weight decay (0, 0.0001, 0.0002371374, 0.0005623413, 0.0013335214, 0.0031622777, 0.0074989421, 0.0177827941, 0.0421696503, 0.1).]
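The ten weight decay values listed for this figure look like zero followed by nine values evenly spaced on a log10 scale between 1e-04 and 1e-01. The base R sketch below reproduces numbers of that form; it is an observation about the values shown, not a statement about how caret builds its grid internally.

```r
# Zero, followed by nine values evenly spaced on the log10 scale
# from 1e-4 to 1e-1 (exponents -4, -3.625, ..., -1).
decay <- c(0, 10^seq(-4, -1, length.out = 9))
print(signif(decay, 7))
```

Spacing candidate values on a log scale is a common choice for regularisation parameters, because their effect tends to vary over orders of magnitude.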
Use chosen hyperparameter values
m5 <- train(Species ~ .,
            iris,
            tuneGrid = expand.grid(size = c(1, 5, 10, 20),
                                   decay = seq(0, 0.1, 0.01)),
            method = "nnet")
[Figure: bootstrap accuracy against weight decay (0 to 0.1), by number of hidden units (1, 5, 10, 20).]
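To see what the `tuneGrid` argument above receives, the grid can be built on its own in base R. `expand.grid` returns one row per combination of the supplied values, so every row is one model configuration to be resampled.

```r
# Every combination of 4 network sizes and 11 decay values.
grid <- expand.grid(size  = c(1, 5, 10, 20),
                    decay = seq(0, 0.1, 0.01))

nrow(grid)   # 4 sizes x 11 decay values = 44 combinations
head(grid)   # size varies fastest, then decay
```

The cost of tuning grows with the number of rows, since each row is refitted once per resample.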
Contributions
Add your own models.
Share by github pull request.
But the aim is for the developers to keep the package up to date.
2020 Science
ZOÖN
Who develops zoon?
Tom August
Me
Nick Golding
Emiel van Loon
David Gavaghan
Greg McInerny
What does caret do?
[Figure: class (Sp. Absent / Sp. Present) plotted against a feature / predictor variable.]
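The figure frames species distribution modelling as a two-class problem: species present or absent, given a predictor. As a minimal sketch of that idea, the base R code below simulates presence/absence along a single hypothetical environmental covariate and fits a logistic regression. The data and all variable names are invented for illustration, and logistic regression is just one of many models that could be used here.

```r
set.seed(1)
n <- 300
covariate <- runif(n)                      # hypothetical environmental predictor
p_present <- plogis(-4 + 8 * covariate)    # simulated true probability of presence
present   <- rbinom(n, size = 1, prob = p_present)  # 1 = present, 0 = absent

# Logistic regression: probability of presence as a function of the covariate.
fit <- glm(present ~ covariate, family = binomial)

# Predicted probability of presence at three new sites.
new_sites <- data.frame(covariate = c(0.1, 0.5, 0.9))
pred <- predict(fit, newdata = new_sites, type = "response")
print(round(pred, 2))
```

In a real analysis the covariate values would come from environmental rasters at the survey locations, and the predictions would be made for every cell of a map.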
A basic workflow
work1 <- workflow(
  occurrence = UKAnophelesPlumbeus,
  covariate = UKAir,
  process = OneHundredBackground,
  model = RandomForest,
  output = PrintMap
)
A different workflow
work2 <- workflow(
  occurrence = UKAnophelesPlumbeus,
  covariate = UKBioclim,
  process = Background(n = 500),
  model = RandomForest,
  output = Appify
)
A different model
work3 <- workflow(
  occurrence = UKAnophelesPlumbeus,
  covariate = UKAir,
  process = OneHundredBackground,
  model = MaxEnt,
  output = PrintMap
)
caret in zoon
work4 <- workflow(
  occurrence = UKAnophelesPlumbeus,
  covariate = UKAir,
  process = OneHundredBackground,
  model = MachineLearn(method = "nnet",
                       tuneLength = 8),
  output = PrintMap
)
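The workflows above swap one module at a time while leaving the rest unchanged. The toy base R sketch below illustrates that design with plain functions; it is not how zoon is implemented, every `toy_*` name is hypothetical, and the covariate step is left out for brevity.

```r
set.seed(1)

# Each "module" is an ordinary function with an agreed input and output.
toy_occurrence <- function() {
  data.frame(x = rbeta(100, 4, 2), present = 1)        # presence records only
}
toy_background <- function(occ, n = 100) {
  rbind(occ, data.frame(x = runif(n), present = 0))    # add background points
}
toy_glm <- function(dat) {
  glm(present ~ x, family = binomial, data = dat)      # one possible model module
}
toy_glm_quadratic <- function(dat) {
  glm(present ~ x + I(x^2), family = binomial, data = dat)  # an alternative
}
toy_output <- function(fit) {
  predict(fit, newdata = data.frame(x = c(0, 0.5, 1)), type = "response")
}

# The workflow only chains the modules; it knows nothing about their internals.
toy_workflow <- function(occurrence, process, model, output) {
  output(model(process(occurrence())))
}

res1 <- toy_workflow(toy_occurrence, toy_background, toy_glm, toy_output)
res2 <- toy_workflow(toy_occurrence, toy_background, toy_glm_quadratic, toy_output)
print(rbind(res1, res2))
```

Because each module only has to honour its input and output, a contributed model can be dropped in without touching the other steps.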
Contributions
Add your own methods.
Share by web form or github.
It is not the aim for the developers to keep the package up to date.
Package ecosystems
CRAN
Bioconductor
zoon
caret
dismo
User contribution
Extendability
Any questions?
Tim C.D. Lucas
