MSCI 446: Data Mining
Targeting potential Complete Solar customers
using data mining algorithms
Wendy D'Souza, Jesse Feld, Sharan Gurkar, Merisa Lee
DATA 
Collected through . 
260,358 records for homeowners in California.
125 records for Complete Solar customers.
UNBALANCED DATA 
125 customers vs. 125 x 7 non-customers
Because of this large discrepancy, classification
algorithms ignored the small subset of customers.
For each algorithm, 125 non-customers were randomly
sampled 5 to 7 times, and each sample was combined with
the 125 customers to form a balanced data set.
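The balancing step described above can be sketched as random undersampling. This is a minimal illustration, not the authors' actual procedure: the function name `balanced_samples`, the record layout, and the size of the toy non-customer pool are all assumptions made for the example.

```python
import random

def balanced_samples(customers, non_customers, n_samples=7, seed=0):
    """Build n_samples balanced data sets.

    Each data set contains every customer plus an equally sized random
    draw (without replacement) from the non-customer pool.
    """
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    samples = []
    for _ in range(n_samples):
        drawn = rng.sample(non_customers, k=len(customers))
        samples.append(customers + drawn)
    return samples

# Hypothetical toy records; the real pool held 260,358 homeowner records.
customers = [{"id": i, "customer": True} for i in range(125)]
non_customers = [{"id": i, "customer": False} for i in range(125, 2000)]

samples = balanced_samples(customers, non_customers)
```

Each element of `samples` holds 250 records, half customers and half non-customers, so a classifier can no longer score well by ignoring the customer class.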
DATA SETS 
Name 
Address 
City 
Net worth 
Pool owner? 
Age 
Education 
Marital status 
Household income 
Length of residence 
Home value 
Credit rating 
Class variable: Customer / Not a customer
PRISM ON NON-CUSTOMERS 
PRISM was run on 7 data sets and generated approximately
70 rules for each class value. Because of this volume, only
the top 6 rules were kept for each sample test.
Rule | Occurrences
If household income is between $50,000 and $54,999, then household does not have solar power | 7
If home market value is between $225,000 and $249,999, then household does not have solar power | 6
If household income is between $35,000 and $39,999, then household does not have solar power | 6
If city of residence is Vista, then household does not have solar power | 5
If city of residence is Coronado, then household does not have solar power | 5
If city of residence is Novato, then household does not have solar power | 4
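Rules of this form come from PRISM's greedy search, which repeatedly picks the attribute-value test with the highest precision p/t for the target class (t = instances covered by the test, p = covered instances in the target class). The sketch below shows only that first selection step, not the full algorithm. The function name `best_condition` and the toy rows are hypothetical and are not taken from the project data.

```python
from collections import Counter

def best_condition(rows, attributes, target, cls):
    """Return the (attribute, value) test with the highest precision p/t
    for class `cls`; ties are broken in favour of larger coverage p."""
    covered = Counter()  # t: instances matching the test
    correct = Counter()  # p: matching instances that belong to cls
    for row in rows:
        for attr in attributes:
            key = (attr, row[attr])
            covered[key] += 1
            if row[target] == cls:
                correct[key] += 1
    return max(covered, key=lambda k: (correct[k] / covered[k], correct[k]))

# Hypothetical toy data, for illustration only.
rows = [
    {"income": "50k-54k", "city": "Vista", "solar": "no"},
    {"income": "50k-54k", "city": "San Jose", "solar": "no"},
    {"income": "250k+", "city": "San Jose", "solar": "yes"},
    {"income": "250k+", "city": "San Jose", "solar": "no"},
]

first_test = best_condition(rows, ["income", "city"], "solar", "no")
```

A full PRISM run would keep adding tests to the rule until it covers only the target class, then remove the covered instances and start the next rule.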
PRISM ON NON-CUSTOMERS 
Average Kappa: 0.303 
On average, 52.5% of the instances were classified 
correctly. 
PRISM was also run on the set of Complete Solar
customers, but the results were not as promising,
likely due to the variety in that data set.
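For reference, the kappa statistic compares observed accuracy with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation from a confusion matrix. The function name `cohen_kappa` and the example counts are assumptions for illustration and are not results from this project.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of instances whose actual class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n)
    ) / (total * total)
    # Note: undefined (division by zero) when expected agreement is 1.
    return (observed - expected) / (1 - expected)

# Hypothetical counts for a balanced 250-instance sample.
confusion = [[90, 35],
             [40, 85]]
kappa = cohen_kappa(confusion)
```

With these made-up counts, 70% of instances are classified correctly and chance agreement is 50%, so kappa comes out at about 0.4.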
1R 
Ran on the same 7 data sets.
Best predictor returned by 1R:
City in 4 out of 7 sets
Home market value in 2 out of 7 sets
Household income in 1 out of 7 sets
When the city attribute was removed, the kappa value
increased almost every time, with home market value
as the best predictor.
When the home market value attribute was removed, the
kappa value decreased every time. This shows the
importance of home market value.
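The 1R experiments above can be sketched in a few lines: for each attribute, build a one-level rule that maps every value to its majority class, then keep the attribute whose rule makes the fewest training errors. This is a minimal illustration with hypothetical names (`one_r`) and toy rows, not the tool or data used in the project.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the single attribute whose
    value -> majority-class rule has the fewest training errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical toy data, for illustration only.
rows = [
    {"city": "A", "home_value": "high", "customer": "yes"},
    {"city": "A", "home_value": "low", "customer": "no"},
    {"city": "B", "home_value": "high", "customer": "yes"},
    {"city": "B", "home_value": "low", "customer": "no"},
]

with_all = one_r(rows, ["city", "home_value"], "customer")
without_home_value = one_r(rows, ["city"], "customer")
```

Dropping an attribute from the `attributes` list mirrors the experiments on these slides: rerun 1R without city, or without home market value, and compare the resulting error or kappa.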
1R 
[Chart: 1R kappa value for Tests 1-7, comparing three runs: with city, without city, and without home market value. Y-axis from -0.1 to 0.5.]
1R 
[Chart: 1R percentage of correctly classified instances for Tests 1-7, comparing three runs: with city, without city, and without home market value. Y-axis from 0 to 80.]
CLUSTERING 
Weak attributes: marital status, gender, age, pool 
and education 
Strong attributes: income, home market value and 
city 
Attribute | Cluster 1 | Cluster 2 | Cluster 3
City | SAN DIEGO | SAN JOSE | SAN DIEGO
Pool Owner? | No | No | No
Age | 44.5-53.4 | 44.5-53.4 | 44.5-53.4
Education Level | Unknown | Unknown | Grad School
Marital Status | Married | Married | Married
Length of Residence | 13.5+ years | 1.5-3 years | 13.5+ years
Gender | Male | Male | Male
Income | 100k-149k | 250k+ | 100k-149k
Home Market Value | 500k-749k | 1M+ | 500k-749k
Credit Rating | 750-799 | 700-749 | 750-799
Solar Customer? | No | Yes | No
# of Points in Cluster | 70 | 87 | 93
CONCLUSION 
In its marketing initiatives, Complete Solar should
target consumers in San Jose and San Dimas,
consumers with medium to high income, and
consumers with high home market values.
Thank you for your time! 
Questions?

Targeting solar customers using Data Mining Techniques
