Adding tree and tree 
@avibryant
Brushfire: Distributed, Generic Decision Tree Learning in Scala
(using Hadoop) 
@avibryant 
Open source: Real Soon Now
Vun!
Two! 
+
Tree!
Do you like cookies? 
{height: 5, color: blue, wears: fur} ? 
{height: 7, color: yellow, wears: feathers} ? 
{height: 3, color: green, wears: garbage} ? 
{height: 5, color: yellow, wears: stripes} ? 
{height: 4, color: orange, wears: stripes} ?
Do you like cookies?
Does Cookie Monster like cookies?
Is Cookie Monster blue?
Cooooookie!

color = blue  -> cookie!
color != blue

Do you like cookies?

color = blue  -> cookie!
color != blue
  wears = stripes  -> ok
  wears != stripes -> yuck

The same tree, with a generic target T at each leaf:

color = blue  -> T
color != blue
  wears = stripes  -> T
  wears != stripes -> T
Do you like cookies? (binary classification)
How many cookies will you eat? (regression)
What's your favorite kind of cookie? (multi-class classification)
Bootstrap or k-fold?
Chi-square or entropy?
Wow!
Classification or regression?
Binary splits or multiway?
Out-of-bag or out-of-time?
One tree or many?
Binary or multi-class?
trait Evaluator[V,T] 
trait Tree[V,T] 
trait Splitter[V,T] 
trait Error[T,E] 
Wow! 
Such types! 
case class Instance[V,T]
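Reading V as the feature type and T as the target type, a minimal Scala sketch of a tree over those types might look like the following. The `Leaf` and `Node` constructors, the fields of `Instance`, and `predict` are hypothetical names chosen for illustration; they are not Brushfire's actual API.

```scala
object TreeSketch {
  // Hypothetical: a training example pairs features V with a target T.
  case class Instance[V, T](id: String, features: V, target: T)

  // Hypothetical binary tree: internal nodes test V, leaves hold a T.
  sealed trait Tree[V, T]
  case class Leaf[V, T](target: T) extends Tree[V, T]
  case class Node[V, T](test: V => Boolean, ifTrue: Tree[V, T], ifFalse: Tree[V, T])
      extends Tree[V, T]

  // Follow the tests from the root down to a leaf and return its T.
  @annotation.tailrec
  def predict[V, T](tree: Tree[V, T], features: V): T = tree match {
    case Leaf(t)             => t
    case Node(test, yes, no) => predict(if (test(features)) yes else no, features)
  }

  // The cookie tree from the earlier slides, with String features and targets.
  type Features = Map[String, String]
  val cookieTree: Tree[Features, String] =
    Node[Features, String](
      f => f("color") == "blue",
      Leaf[Features, String]("cookie!"),
      Node[Features, String](
        f => f("wears") == "stripes",
        Leaf[Features, String]("ok"),
        Leaf[Features, String]("yuck")))
}
```

With the cookie tree encoded this way, `predict(cookieTree, Map("color" -> "blue", "wears" -> "fur"))` returns `"cookie!"`.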
Binary classification
Leaves as labels: false, true, false, true
Leaves as probabilities: 0.1, 0.4, 0.0, 0.9
Leaves as sums of the training targets that reach them: T+T+T+T, T+T+T+T+T, T+T+T+T+T, T+T+T
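Writing a leaf as T+T+...+T only requires that targets can be added associatively, starting from an empty value (a monoid; Algebird provides these for Scalding). A minimal sketch for binary classification, assuming the target is a map of label counts and using a hand-rolled `Monoid` trait in place of Algebird's so the block stands alone. All names here are illustrative:

```scala
object LeafSums {
  // Hand-rolled stand-in for Algebird's Monoid: an associative plus with a zero.
  trait Monoid[T] {
    def zero: T
    def plus(a: T, b: T): T
  }

  // Assumed binary-classification target: how many false/true labels were seen.
  type Counts = Map[Boolean, Long]

  implicit val countsMonoid: Monoid[Counts] = new Monoid[Counts] {
    val zero: Counts = Map.empty
    def plus(a: Counts, b: Counts): Counts =
      (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap
  }

  // Each training instance contributes a one-hot count for its own label.
  def target(label: Boolean): Counts = Map(label -> 1L)

  // A leaf's value is the monoid sum T+T+...+T of the targets that reach it.
  def sum[T](ts: Seq[T])(implicit m: Monoid[T]): T =
    ts.foldLeft(m.zero)((acc, t) => m.plus(acc, t))

  // Turn summed counts into a predicted probability of `true`.
  def probTrue(c: Counts): Double = {
    val total = c.values.sum
    if (total == 0L) 0.0 else c.getOrElse(true, 0L).toDouble / total
  }
}
```

A leaf reached by four instances labelled true, true, false, true sums to `Map(true -> 3, false -> 1)`, which corresponds to a probability of 0.75.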
Bigger (data) = Better (models)
Generic != Fast 
Why do you rob banks?
Learning a tree in Scalding 
11 passes through the data 
21 MapReduce steps
Step 1/21 
Start with a single leaf, T, at the root.
{height: 5, color: blue, wears: fur}
{height: 7, color: yellow, wears: feathers}
{height: 3, color: green, wears: garbage}
{height: 5, color: yellow, wears: stripes}
{height: 4, color: orange, wears: stripes}
Map: each Instance[V,T] emits its target T.
Reduce: the T values are summed into the root's T.
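Because the sum is associative, Step 1 can be split across machines: each mapper pre-sums its own instances and the reducer adds the partial sums, roughly what a Hadoop combiner does. A sketch under that assumption, simulating partitions with nested `Seq`s and using label counts as T (the names `rootTarget` and `toTarget` are hypothetical):

```scala
object StepOne {
  // Assumed target type for binary classification: label -> count.
  type Counts = Map[Boolean, Long]

  // Associative addition of two count maps; Map.empty is the zero.
  def plus(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  // "Map": each instance's label becomes a one-hot count, its T.
  def toTarget(label: Boolean): Counts = Map(label -> 1L)

  // Each partition (standing in for one mapper's input) is pre-summed locally,
  // then the "Reduce" adds those partial sums into the root's T.
  def rootTarget(partitions: Seq[Seq[Boolean]]): Counts = {
    val partials = partitions.map(p => p.map(toTarget).foldLeft(Map.empty: Counts)(plus))
    partials.foldLeft(Map.empty: Counts)(plus)
  }
}
```

Because `plus` is associative, how the instances are partitioned does not change the result.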
Candidate splits, each side with its own T:
color != blue   | color = blue
color != yellow | color = yellow
height < 5      | height >= 5
Which one?

Step 2/21
Map: each Instance[V,T] emits a statistic S built from its feature values (here the colors blue, yellow, green, yellow, orange).
Reduce: the S values are summed.
Other options for S: CountMinSketch, QTree
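For a categorical feature such as color, one plausible S is a map from each value to the summed targets of the instances with that value; S then adds key by key, so it reduces just like T. The sketch below assumes that representation, uses label counts as T, and derives the "color = v" versus "color != v" candidates from a single S. All names are illustrative:

```scala
object StepTwo {
  // Assumed target type for binary classification: label -> count.
  type Counts = Map[Boolean, Long]

  def plus(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  // Hypothetical S for one categorical feature:
  // feature value -> summed targets of the instances with that value.
  type Stats = Map[String, Counts]

  // "Map": a single instance becomes a one-entry S.
  def create(value: String, label: Boolean): Stats = Map(value -> Map(label -> 1L))

  // "Reduce": two S values add key by key, so S can be summed like T.
  def merge(a: Stats, b: Stats): Stats =
    (a.keySet ++ b.keySet).map { k =>
      k -> plus(a.getOrElse(k, Map.empty[Boolean, Long]), b.getOrElse(k, Map.empty[Boolean, Long]))
    }.toMap

  // One candidate per value v: (v, T where feature = v, T where feature != v).
  def candidateSplits(stats: Stats): Seq[(String, Counts, Counts)] = {
    val total = stats.values.foldLeft(Map.empty: Counts)(plus)
    stats.toSeq.sortBy(_._1).map { case (v, matching) =>
      val rest = total
        .map { case (label, n) => label -> (n - matching.getOrElse(label, 0L)) }
        .filter(_._2 > 0L)
      (v, matching, rest)
    }
  }
}
```

In this sketch only one pass over the instances is needed to build S; every candidate split for the feature is then read off S without touching the instances again.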
Each candidate split is a set of predicates V => Boolean, one per branch, and each branch gets its own T.
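For a numeric feature such as height, the candidates are thresholds like `height < 5`. A sketch assuming an exact S (a sorted map from value to summed label counts) that produces one `Int => Boolean` predicate per distinct value, along with the T on each side; with many distinct values an approximate structure such as a QTree would stand in for the exact map. `Split`, `stats`, and `thresholdSplits` are hypothetical names:

```scala
import scala.collection.immutable.SortedMap

object NumericSplits {
  // Assumed target type for binary classification: label -> count.
  type Counts = Map[Boolean, Long]

  def plus(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  // Hypothetical exact S for a numeric feature: value -> summed targets.
  type Stats = SortedMap[Int, Counts]

  // Build S from (value, label) pairs; in MapReduce terms this is map + sum.
  def stats(rows: Seq[(Int, Boolean)]): Stats =
    rows.foldLeft(SortedMap.empty[Int, Counts]) { case (acc, (v, label)) =>
      acc.updated(v, plus(acc.getOrElse(v, Map.empty[Boolean, Long]), Map(label -> 1L)))
    }

  // A candidate split: a predicate V => Boolean plus the T on each side.
  case class Split(name: String, predicate: Int => Boolean, below: Counts, atOrAbove: Counts)

  // One threshold per distinct value except the smallest (which would leave "below" empty).
  def thresholdSplits(feature: String, hist: Stats): Seq[Split] = {
    val total = hist.values.foldLeft(Map.empty: Counts)(plus)
    hist.keys.toSeq.drop(1).map { x =>
      val below = hist.filter(_._1 < x).values.foldLeft(Map.empty: Counts)(plus)
      val rest = total
        .map { case (label, n) => label -> (n - below.getOrElse(label, 0L)) }
        .filter(_._2 > 0L)
      Split(s"$feature < $x", v => v < x, below, rest)
    }
  }
}
```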
Step 3/21
Map: each Instance[V,T] emits its S values, which are summed as before.
Each summed S is turned into candidate Split[V,T]s, and the candidates are compared to pick a split.
As the tree grows, more S values and more candidate Split[V,T]s are produced at each step.
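Choosing among candidate splits needs an evaluator. The slides leave the choice open (chi-square or entropy); this sketch assumes entropy, scoring each candidate by the size-weighted entropy of its children's label counts and keeping the lowest. `entropy`, `score`, and `best` are hypothetical names:

```scala
object StepThree {
  // Assumed target type for binary classification: label -> count.
  type Counts = Map[Boolean, Long]

  // Shannon entropy, in bits, of a label distribution.
  def entropy(c: Counts): Double = {
    val total = c.values.sum.toDouble
    if (total == 0.0) 0.0
    else c.values.filter(_ > 0L).map { n =>
      val p = n / total
      -p * math.log(p) / math.log(2)
    }.sum
  }

  // Score a split by the size-weighted entropy of its children (lower is purer).
  def score(children: Seq[Counts]): Double = {
    val sizes = children.map(_.values.sum.toDouble)
    val total = sizes.sum
    children.zip(sizes).map { case (c, n) =>
      if (total == 0.0) 0.0 else n / total * entropy(c)
    }.sum
  }

  // Keep the candidate whose children are purest.
  def best(candidates: Seq[(String, Seq[Counts])]): (String, Double) =
    candidates.map { case (name, kids) => name -> score(kids) }.minBy(_._2)
}
```

Tie-breaking and a minimum-improvement threshold are left out for brevity.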
Forests!
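The slides do not say how the trees in a forest get different data. One common single-pass approach, shown here as an assumption rather than as Brushfire's method, is an online bootstrap: each instance draws an independent Poisson(1) count per tree and is weighted by it, which approximates sampling with replacement without a global view of the data. An instance whose count is zero for a tree is out-of-bag for that tree. `poisson` and `bootstrapWeights` are hypothetical names:

```scala
import scala.util.Random

object ForestSketch {
  // Draw from Poisson(lambda) with Knuth's multiplication method (fine for small lambda).
  def poisson(lambda: Double, rng: Random): Int = {
    val limit = math.exp(-lambda)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // For each instance, how many copies each of `trees` trees sees.
  // A weight of 0 means the instance is out-of-bag for that tree.
  def bootstrapWeights(numInstances: Int, trees: Int, seed: Long): Vector[Vector[Int]] = {
    val rng = new Random(seed)
    Vector.fill(numInstances)(Vector.fill(trees)(poisson(1.0, rng)))
  }
}
```

Since each weight depends only on the instance and a random draw, this can run inside the same map step that emits T or S, with each emitted value scaled by the instance's weight for the relevant tree.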
V? 
{height: 5, color: blue, wears: fur} ? 
{height: 7, color: yellow, wears: feathers} ? 
{height: 3, color: green, wears: garbage} ? 
{height: 5, color: yellow, wears: stripes} ? 
{height: 4, color: orange, wears: stripes} ?
PLANET
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36296.pdf
Scalding + Algebird
http://github.com/twitter/scalding
http://github.com/twitter/algebird
Coming soon
http://github.com/stripe/brushfire
