REAL ESTATE DATA ANALYSIS & INSIGHTS
USING CLUSTERING TECHNIQUE
RECOMMENDATIONS
1. Cluster 3 appears to be the best segment for investment.
2. The primary reason is its high rental yield combined with a low rental share, which leaves room for rents to rise.
3. Drilling further into Cluster 3 on rental yield, rental share and population, we can shortlist the areas below.
State      Place
Michigan   Genesee, Macomb and Ingham counties
Texas      Corpus Christi, Fort Worth and Arlington cities
           Nueces, Bell, Bexar and Tarrant counties
Ohio       Montgomery County
Illinois   St. Clair County
Missouri   Jackson County
1. THE CHART BELOW EXPLAINS HOW RENTAL YIELD & RENTAL SHARE PARAMETERS FARE IN THE AREAS
SELECTED.
2. THE DATA HAS BEEN ORDERED IN DECREASING VALUE OF RENTAL YIELD AND THE TREND HAS BEEN GIVEN.
[Chart: Cluster chart - Rental Yield & Rental Share. Series: Rent Yield, Rent Share, Linear (Rent Yield). Vertical axis: 0 to 25. Areas, left to right: Genesee County, Corpus Christi city, Nueces County, Fort Worth city, Macomb County, Bell County, Bexar County, St. Clair County, Montgomery County, Ingham County, Jackson County, Tarrant County, Arlington city.]
DETAILED SUMMARY OF ANALYSIS
CLUSTERING FOR REAL ESTATE DATA
Methodologies & Insights
AGENDA
• Synopsis
• Recommendations
• Appendix – SAS code
OBJECTIVE & APPROACH
• Goal:
Recommend a good place / zip code in which to buy property for investment.
• K-means clustering:
The algorithm forms clusters by minimizing the distance between each point and its cluster centroid. It is effective for large datasets.
The SAS procedure PROC FASTCLUS was used for this method.
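The idea of minimizing point-to-centroid distance can be sketched as follows. This is a minimal, plain Python illustration of the textbook k-means loop, not the PROC FASTCLUS implementation used in the analysis; the function name `kmeans` and its parameters are assumptions made for this example.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Textbook k-means on a list of equal-length numeric tuples.

    Illustrative only: PROC FASTCLUS has its own seeding and update rules.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return assignments, centroids
```

Calling `kmeans(rows, 3)` on a list of (rental yield, rental share, population) tuples would return one cluster label per row plus the three centroids. In practice the variables would likely need to be put on a comparable scale first, since population is orders of magnitude larger than the percentages.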
ANALYSIS STEPS
• We can apply cluster analysis to the given dataset to segment the records on critical factors such as rental yield, rental share of income, place type and size of the place.
• With this approach we can split the data into high, medium and low investment-return segments.
• The goal of clustering is to find similarities and differences within the data by creating homogeneous groups in which within-group similarity is maximized and between-group similarity is minimized.
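One common way to quantify "similar within groups, dissimilar between groups" is to compare the within-cluster and between-cluster sums of squares. The sketch below is a plain Python illustration under that assumption; the deck does not state which quality measure was used, and the function name `sum_of_squares` is hypothetical.

```python
def sum_of_squares(points, labels):
    """Return (within, between) sums of squared distances for a labeling.

    within:  squared distance of each point to its own cluster mean
    between: squared distance of each cluster mean to the overall mean,
             weighted by cluster size
    """
    dims = len(points[0])
    overall = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    within = 0.0
    between = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        center = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        within += sum((p[d] - center[d]) ** 2 for p in members for d in range(dims))
        between += len(members) * sum((center[d] - overall[d]) ** 2 for d in range(dims))
    return within, between
```

A good segmentation has a small within value and a large between value. The two always add up to the total sum of squares of the data, so reducing one increases the other.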
CLUSTER SUMMARY - PROFILING
CLUSTER 1 PROFILE
Variable       Mean      Pop mean   Std dev   Z score
Rental share   26%       21%        4%        1.25
Population     3597926   260474     2144874   1.6
Rental yield   5%        6%         2%        0.5
1. 19 data points fall in this cluster.
2. Rental share has a high z-score (1.25) and differentiates this cluster.
3. Because rental share has a high z-score, we can conclude that this cluster comprises low-income groups and offers less scope for yield on investment.
4. This is further borne out by the rental yield z-score and the population mean.
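The z-scores in the profile table can be reproduced from the other three columns. The sketch below assumes the usual definition, (cluster mean - population mean) / standard deviation, and that the table reports magnitudes only, since the rental yield value comes out negative; the function name `z_score` is an assumption for this example.

```python
def z_score(cluster_mean, population_mean, std_dev):
    """Standardized distance of a cluster mean from the population mean."""
    return (cluster_mean - population_mean) / std_dev

# Cluster 1 values taken from the table above (percentages as whole numbers).
rental_share_z = z_score(26, 21, 4)                # 1.25
population_z = z_score(3597926, 260474, 2144874)   # about 1.56, shown as 1.6
rental_yield_z = z_score(5, 6, 2)                  # -0.5, shown as 0.5
```

The sign matters for interpretation: cluster 1 sits above the population mean on rental share and population, but below it on rental yield.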
CLUSTER 2 PROFILE
Variable       Mean      Pop mean   Std dev   Z score
Rental share   20%       21%        4%        1.25
Rental yield   5%        6%         1%        1
Population     219706    260474     241058    0.17
1. 1176 data points fall in this cluster, roughly 73% of the total data. Cluster 2 is the biggest cluster.
2. Rental yield is marginally higher than in cluster 1.
3. In cluster 2 as well, rental share has the highest z-score.
4. Cluster 2 is not an ideal investment option.
CLUSTER 3 PROFILE
Variable       Mean      Pop mean   Std dev   Z score
Rental yield   9%        6%         2%        1.5
Population     222090    260474     317575    0.75
Rental share   23%       21%        4%        0.5
1. 403 data points fall in this cluster, about 25% of the total data.
2. Rental yield has the highest z-score, 1.5, and this differentiates the cluster.
3. The rental share z-score indicates that this cluster has the potential to pay more rent, as its rental share z-score is low relative to clusters 1 and 2.
4. The cluster's population size is also large enough, relative to the population mean, to support an investment decision.
5. Cluster 3 has all the ingredients of an ideal investment option.
RECOMMENDATIONS
• Cluster 3 appears to be the ideal investment option.
• The reason is its high rental yield and low rental share, which leave good potential for rents to rise.
• On further analysis of the cluster 3 data by rental yield, population size and propensity to pay more rent, we can shortlist the areas below.
State      Place
Michigan   Genesee, Macomb and Ingham counties
Texas      Corpus Christi, Fort Worth and Arlington cities
           Nueces, Bell, Bexar and Tarrant counties
Ohio       Montgomery County
Illinois   St. Clair County
Missouri   Jackson County
APPENDIX – SAS CODE
• SAS code location in WPS:
Y:USERSUSER169ProgrammesClusteringReal Estate_clustering.sas


Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Balaji Athreya
