Trulia Estimates v2.0
Motivation
• Trulia Estimates launched in 2011
• The public records snowball has evolved since then, but the valuation algorithm has not
• Valuations already have a lot of visibility (valuation heatmaps, etc.), and we plan to give them even more in the near future (valuation history)
• Brilliant Basics: improve estimates before surfacing them everywhere
Us vs. Competition
[Bar chart: Median Error %, Trulia Estimates vs. Zestimate, axis from 0 to 15; bar values not recoverable]
Our Work
• Location-specific and temporal features
   • Crime Safety
   • School Proximity
   • Stats and Trends
• New Geoscopes
   • Solve the problem of geographic boundaries
• Model Learning Improvements
   • Explicit modeling of location hierarchies
   • Better learned parameters
   • Better feature representation and normalization
New Features
Improvement by Individual Features (Median Error Percentage)

Configuration                 Median error %
Baseline                      8.97
Add CrimeScore only           8.78
Add SchoolScore only          8.82
Add avg ppsqft/hood only      8.84
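The deck does not define how "Median Error Percentage" is computed. A minimal sketch, assuming it means the median absolute percentage error of an estimate against the eventual sale price, is shown below. The function names `median_error_pct` and `relative_improvement_pct` are hypothetical; the four error figures are the ones from the chart.

```python
from statistics import median


def median_error_pct(estimates, sale_prices):
    """Median absolute percentage error (assumed definition, not from the deck)."""
    if not estimates or len(estimates) != len(sale_prices):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [
        abs(est - actual) / actual * 100.0
        for est, actual in zip(estimates, sale_prices)
    ]
    return median(errors)


def relative_improvement_pct(baseline_error, new_error):
    """Reduction in error, expressed as a percentage of the baseline error."""
    return (baseline_error - new_error) / baseline_error * 100.0


# Median error figures taken from the chart.
BASELINE = 8.97
SINGLE_FEATURE_RUNS = {
    "Add CrimeScore only": 8.78,
    "Add SchoolScore only": 8.82,
    "Add avg ppsqft/hood only": 8.84,
}

if __name__ == "__main__":
    for name, err in SINGLE_FEATURE_RUNS.items():
        gain = relative_improvement_pct(BASELINE, err)
        print(f"{name}: {err:.2f}% median error, {gain:.2f}% relative improvement")
```

Reporting the relative improvement next to the absolute figure makes the single-feature runs directly comparable with the "~3% relative improvement" quoted for the model learning changes.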
New Geoscopes
• After the initial pass:
   • Coverage improved by 1.67% (~1.15 million properties nationwide)
   • 330 more counties valued
   • For San Mateo, median error goes from 8.97% to 8.85%
Model Learning Improvements
• Each geography is different; a static set of model parameters is not always ideal
• Use cross-validation to learn the parameters of each location model from data
• Median error improves from 8.97% to 8.69% (~3% relative improvement)
• Hierarchical modeling
   • Explicitly model location hierarchies to get smoother estimates using higher-level information
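Learning parameters per location through cross-validation can be sketched as follows. This is an illustration under stated assumptions, not the Trulia implementation: the model is a one-feature ridge regression (price as a multiple of square footage), the tuned parameter is the ridge penalty `lam`, and the location names and sales data are made up.

```python
from statistics import median


def fit_ridge_slope(rows, lam):
    """Closed-form ridge fit of price = w * sqft (no intercept)."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)


def cv_median_error(rows, lam, k=3):
    """Median absolute percentage error of held-out predictions over k folds."""
    errors = []
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        held_out = [r for i, r in enumerate(rows) if i % k == fold]
        w = fit_ridge_slope(train, lam)
        errors += [abs(w * x - y) / y * 100.0 for x, y in held_out]
    return median(errors)


def learn_params_per_location(sales_by_location, lam_grid, k=3):
    """For each location, pick the penalty with the lowest cross-validated error."""
    return {
        loc: min(lam_grid, key=lambda lam: cv_median_error(rows, lam, k))
        for loc, rows in sales_by_location.items()
    }


# Made-up sales: (sqft in thousands, price in $100k).
SALES = {
    "location_a": [(1.0, 5.0), (1.5, 7.5), (2.0, 10.0),
                   (2.5, 12.5), (3.0, 15.0), (1.2, 6.0)],
    "location_b": [(1.0, 9.0), (1.4, 8.0), (2.1, 16.0),
                   (2.4, 13.0), (3.2, 21.0), (1.1, 6.5)],
}

if __name__ == "__main__":
    chosen = learn_params_per_location(SALES, lam_grid=[0.0, 0.1, 1.0, 10.0])
    for loc, lam in chosen.items():
        print(f"{loc}: lam={lam}")
```

Each location gets its own value from the same grid, which is the point of the slide: a single static setting is replaced by one chosen from that location's own held-out error.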
What's Next?
• Spend more time optimizing new features. Optimization is everything!
• Add price trends data to the hedonic model and simplify our learning process
• Make per-model parameter optimization scalable
• Incorporate hierarchical models into the existing mix
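The deck does not say what form the hierarchical models take. One common way to get "smoother estimates using higher-level information" is partial pooling, where a statistic from a narrow geography is shrunk toward its parent geography when it rests on few sales. The sketch below assumes that approach. The level names, the `prior_strength` value, and all numbers are hypothetical.

```python
def shrink_to_parent(child_mean, child_n, parent_mean, prior_strength=20.0):
    """Blend a child-level statistic with its parent's, weighted by sample size.

    With few observations the result stays close to the parent; with many
    it approaches the child's own mean.
    """
    weight = child_n / (child_n + prior_strength)
    return weight * child_mean + (1.0 - weight) * parent_mean


def hierarchical_estimate(levels, prior_strength=20.0):
    """Fold a chain of (mean, n) pairs ordered from broadest to narrowest level."""
    estimate = levels[0][0]  # the broadest level is taken as is
    for mean, n in levels[1:]:
        estimate = shrink_to_parent(mean, n, estimate, prior_strength)
    return estimate


if __name__ == "__main__":
    # Hypothetical average price per sqft: county, then ZIP, then neighborhood.
    chain = [(400.0, 5000), (450.0, 20), (600.0, 3)]
    print(f"smoothed ppsqft: {hierarchical_estimate(chain):.1f}")
```

A neighborhood with only a handful of sales then inherits most of its value from the ZIP and county levels, which avoids abrupt jumps between adjacent small geographies.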
