Trulia is improving its home valuation estimates (Trulia Estimates) by adding new location-specific data features, refining its modeling techniques, and expanding its geographic coverage. Key updates include crime safety and school proximity scores as new features, cross-validation to learn location-specific model parameters, and hierarchical modeling that draws on information from higher geographic levels. The goal is to make valuations more accurate before surfacing them more prominently across the Trulia platform. Initial results are promising: after integrating new geographic boundaries called "geoscopes" into the models, the median error for San Mateo fell from 8.97% to 8.85%.
3. Motivation
Trulia Estimates launched in 2011
Public records snowball has evolved since then, but the valuation algorithm has not
Valuations already have a lot of visibility (e.g., valuation heatmaps), and we plan to give them even more in the near future (valuation history)
Brilliant Basics: improve estimates before surfacing them everywhere
4. Us vs. Competition
[Bar chart: Median Error % (axis 0 to 15) for Trulia Estimates vs. Zestimate]
5. Our Work
Location-specific and temporal features
Crime Safety
School Proximity
Stats and Trends
New Geoscopes
Solve the problem of geographic boundaries
Model Learning Improvements
Explicit modeling of location hierarchies
Better learned parameters
Better feature representation and normalization
9. New Geoscopes
After the initial pass
Coverage improved by 1.67% (~1.15 million properties nationwide)
330 more counties valued
For San Mateo, median error goes from 8.97% to 8.85%
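To put the San Mateo numbers in relative terms, the short sketch below computes the relative reduction in error. It assumes that "median error %" means the median absolute percentage error between estimated and actual sale prices; that definition and the function names are assumptions for illustration, not confirmed by the slides.

```python
import statistics


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent (assumed definition of 'median error %')."""
    errors = [abs(p - a) * 100.0 / a for a, p in zip(actual, predicted)]
    return statistics.median(errors)


def relative_improvement(before, after):
    """Relative reduction of an error metric, in percent of the starting value."""
    return (before - after) / before * 100.0


# Toy check of the metric: per-home errors are 10%, 5% and 0%.
print(median_ape([100, 200, 400], [110, 190, 400]))  # 5.0

# San Mateo, before vs. after geoscopes (figures from the slide above).
print(round(relative_improvement(8.97, 8.85), 2))  # 1.34
```

So the geoscope change corresponds to roughly a 1.3% relative reduction in median error for San Mateo.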
10. Model Learning Improvements
Each geography is different; a static set of model parameters is not always ideal
Use cross-validation to learn the parameters of each location model from data
Median error improves from 8.97% to 8.69% (~3% relative improvement)
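The following is a minimal sketch of learning per-location parameters with k-fold cross-validation. It is not Trulia's model: as a hypothetical stand-in for a "location model" it uses a one-feature linear regression (price vs. living area) with a ridge penalty `lam`, and it picks `lam` separately for each location by minimizing held-out median absolute percentage error. The candidate penalties, fold scheme, location IDs, and toy data are all assumptions.

```python
import statistics


# Hypothetical stand-in for a location model: one feature (e.g. living area)
# and an L2 (ridge) penalty `lam` on the slope; the intercept is unpenalized.
def fit_ridge_1d(xs, ys, lam):
    """Fit y = a + b*x minimizing squared error + lam * b**2."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)
    a = y_bar - b * x_bar
    return a, b


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent (actual values must be > 0)."""
    return statistics.median(abs(p - a) * 100.0 / a for a, p in zip(actual, predicted))


def cv_error(xs, ys, lam, k=5):
    """Mean over k folds of the held-out median APE; requires len(xs) >= k."""
    n = len(xs)
    fold_errors = []
    for f in range(k):
        held_out = set(range(f, n, k))  # round-robin fold assignment
        pairs = list(enumerate(zip(xs, ys)))
        train = [xy for i, xy in pairs if i not in held_out]
        test = [xy for i, xy in pairs if i in held_out]
        a, b = fit_ridge_1d([x for x, _ in train], [y for _, y in train], lam)
        fold_errors.append(median_ape([y for _, y in test], [a + b * x for x, _ in test]))
    return statistics.fmean(fold_errors)


def select_penalty(xs, ys, candidates=(0.1, 1.0, 10.0, 100.0, 1000.0), k=5):
    """Return the candidate penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(xs, ys, lam, k))


def learn_location_params(data_by_location, **kwargs):
    """Map {location_id: (xs, ys)} to {location_id: selected penalty}."""
    return {loc: select_penalty(xs, ys, **kwargs) for loc, (xs, ys) in data_by_location.items()}


if __name__ == "__main__":
    # Hypothetical toy data: living area (1000s of sq ft) vs. sale price ($1000s).
    data = {
        "zip_A": ([1.0, 1.2, 1.5, 1.8, 2.0, 2.4, 2.6, 3.0, 3.2, 3.5],
                  [820, 910, 1150, 1300, 1480, 1700, 1850, 2100, 2200, 2450]),
        "zip_B": ([0.9, 1.1, 1.3, 1.4, 1.6, 1.7, 2.0, 2.2, 2.5, 2.8],
                  [400, 390, 450, 410, 470, 440, 500, 480, 530, 520]),
    }
    print(learn_location_params(data))
```

The point of the design is that each location gets its own penalty: a location where the feature tracks price closely can keep a light penalty, while a noisy one may prefer stronger shrinkage. A production system would tune many parameters per model, which is why scaling this search comes up again under next steps.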
Hierarchical Modeling
Explicitly model location hierarchies to get smoother estimates using higher-level information
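One common way to use higher-level information is partial pooling, sketched below. This is a swapped-in illustration, not the hierarchical model described in the talk: each level's estimate is blended with its parent's, weighting the local value by `n / (n + k)`, so a neighborhood with few sales leans on its city or county while data-rich areas keep their own estimate. The constant `k`, the $/sq ft figures, and the geography levels are hypothetical.

```python
def shrink_toward_parent(local_mean, n_local, parent_mean, k=20.0):
    """Blend a local estimate with its parent-level estimate.

    The weight on the local estimate is n / (n + k): with no local sales the
    result equals the parent, and it approaches the local mean as n grows.
    """
    w = n_local / (n_local + k)
    return w * local_mean + (1.0 - w) * parent_mean


def estimate_through_hierarchy(levels, k=20.0):
    """levels: [(mean, n_sales), ...] ordered from the broadest geography to the narrowest."""
    estimate = levels[0][0]
    for mean, n in levels[1:]:
        estimate = shrink_toward_parent(mean, n, estimate, k)
    return estimate


if __name__ == "__main__":
    # Hypothetical $/sq ft: county -> city -> neighborhood with only 5 sales.
    levels = [(600.0, 20000), (700.0, 800), (900.0, 5)]
    print(round(estimate_through_hierarchy(levels), 1))
```

With only five sales, the neighborhood's raw 900 $/sq ft is pulled most of the way toward the city-level figure, which is the "smoother estimates" effect the slide describes.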
11. What's Next?
Spend more time optimizing new features: optimization is everything!
Add price trends data to the hedonic model and simplify our learning process
Make per-model parameter optimization scalable
Incorporate hierarchical models into the existing mix