Three classmates and I completed this as our MSCI 446 final project. We applied data mining techniques such as PRISM, the 1R algorithm, and clustering to better understand Complete Solar's customers and to strategically target prospects with attributes similar to those of its existing customer base.
Targeting solar customers using Data Mining Techniques
1. MSCI 446 Data Mining
Targeting potential Complete Solar customers
using data mining algorithms
Wendy DSouza, Jesse Feld, Sharan Gurkar, Merisa Lee
2. DATA
Collected through .
260,358 records for homeowners in California.
125 records for Complete Solar customers.
3. UNBALANCED DATA
125 customers vs. 125 × 7 non-customers
Because of the large discrepancy in class sizes,
classification algorithms ignored the small subset of customers.
For each algorithm, 125 non-customers were randomly sampled
5-7 times, and each sample was combined with the 125 customers
to create a full, balanced data set.
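The balancing step described above can be sketched as random undersampling of the majority class. This is a minimal illustration, not the team's actual procedure: the record format, the function name `balanced_samples`, the pool of 1,000 stand-in non-customers, and the choice to let separate samples overlap are all assumptions.

```python
import random

def balanced_samples(customers, non_customers, n_samples=7, seed=0):
    """Return n_samples data sets, each pairing all customers with an
    equally sized random draw of non-customers."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Sampling is without replacement inside one data set;
        # different data sets may share non-customers (an assumption).
        drawn = rng.sample(non_customers, k=len(customers))
        samples.append(customers + drawn)
    return samples

# Hypothetical stand-in records: (id, class label)
customers = [(f"c{i}", "Customer") for i in range(125)]
non_customers = [(f"n{i}", "Not a customer") for i in range(1000)]

data_sets = balanced_samples(customers, non_customers)
print(len(data_sets), len(data_sets[0]))
```

Each resulting data set holds 250 records with a 50/50 class split, so a classifier can no longer score well by ignoring the customer class.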
4. DATA SETS
Name
Address
City
Net worth
Pool owner?
Age
Education
Marital status
Household income
Length of residence
Home value
Credit rating
Class variable: Customer / Not a customer
5. PRISM ON NON-CUSTOMERS
PRISM was run on the 7 data sets, generating approximately
70 rules for each class value. Because of this volume, only
the top 6 rules were chosen for each sample test.
Rule | Occurrences
If household income is between $50,000 and $54,999, then household does not have solar power | 7
If home market value is between $225,000 and $249,999, then household does not have solar power | 6
If household income is between $35,000 and $39,999, then household does not have solar power | 6
If city of residence is Vista, then household does not have solar power | 5
If city of residence is Coronado, then household does not have solar power | 5
If city of residence is Novato, then household does not have solar power | 4
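For readers unfamiliar with how PRISM arrives at rules like these, the covering loop can be sketched as follows. The deck does not say which implementation was used, so this is a simplified illustration on hypothetical toy data: the function name `prism`, the attribute names, and the values are assumptions, and refinements such as handling missing values are omitted.

```python
def prism(rows, attrs, target):
    """Learn PRISM-style rules as (conditions, class) pairs."""
    rules = []
    for cls in sorted({r[target] for r in rows}):
        remaining = list(rows)
        # Keep adding rules until every instance of cls is covered.
        while any(r[target] == cls for r in remaining):
            conds, covered = {}, remaining
            # Specialise the rule until it is perfect or attributes run out.
            while any(r[target] != cls for r in covered) and len(conds) < len(attrs):
                best = None
                for a in attrs:
                    if a in conds:
                        continue
                    for v in sorted({r[a] for r in covered}):
                        subset = [r for r in covered if r[a] == v]
                        p = sum(r[target] == cls for r in subset)
                        score = (p / len(subset), p)  # accuracy first, then coverage
                        if best is None or score > best[0]:
                            best = (score, a, v, subset)
                _, a, v, covered = best
                conds[a] = v
            rules.append((conds, cls))
            # Drop the instances this rule covers before learning the next one.
            remaining = [r for r in remaining if r not in covered]
    return rules

# Hypothetical toy data, not taken from the project.
toy = [
    {"income": "low", "city": "Vista", "solar": "no"},
    {"income": "low", "city": "San Jose", "solar": "no"},
    {"income": "high", "city": "San Jose", "solar": "yes"},
    {"income": "high", "city": "Vista", "solar": "yes"},
]
rules = prism(toy, ["income", "city"], "solar")
for conds, cls in rules:
    print(conds, "->", cls)
```

On this toy data, income alone separates the classes, so each class gets a single one-condition rule.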
6. PRISM ON NON-CUSTOMERS
Average Kappa: 0.303
On average, 52.5% of the instances were classified
correctly.
PRISM was also run on the set of Complete Solar
customers
-> results were not as promising.
-> likely due to variety in the data set.
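As a reminder of what the kappa statistic measures, here is a small sketch of Cohen's kappa, which compares observed agreement between actual and predicted labels against the agreement expected by chance. The function name `cohen_kappa` and the label lists are hypothetical and are not the project's results.

```python
def cohen_kappa(actual, predicted):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(actual)
    labels = set(actual) | set(predicted)
    # Fraction of instances where prediction and truth agree.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Agreement expected if predictions were independent of the truth.
    expected = sum(
        (actual.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )
    return (observed - expected) / (1 - expected)

# Hypothetical labels for 8 instances, balanced across two classes.
actual = ["yes"] * 4 + ["no"] * 4
predicted = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
kappa = cohen_kappa(actual, predicted)
print(round(kappa, 3))  # 0.5
```

A kappa of 0 means the classifier does no better than chance, and 1 means perfect agreement. The sketch does not guard against the degenerate case where expected agreement equals 1.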
7. 1R
Ran on the same 7 data sets.
Best predictor returned:
city in 4 out of 7 sets
home market value in 2 out of 7 sets
household income in 1 out of 7 sets
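The way 1R picks a single best predictor can be sketched as below: for each attribute, predict the majority class for each of its values, count the training errors, and keep the attribute with the fewest. This is an illustrative version for nominal attributes only; the function name `one_r`, the attribute names, and the toy rows are assumptions rather than project data.

```python
from collections import Counter, defaultdict

def one_r(rows, attrs, target):
    """Return (best attribute, value -> predicted class, training errors)."""
    best = None
    for a in attrs:
        # Count class labels separately for each value of this attribute.
        by_value = defaultdict(Counter)
        for r in rows:
            by_value[r[a]][r[target]] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        # Everything outside the majority class is a training error.
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Hypothetical toy rows, not taken from the project.
toy = [
    {"city": "San Jose", "home_value": "1M+", "class": "Customer"},
    {"city": "San Jose", "home_value": "500k-749k", "class": "Customer"},
    {"city": "Vista", "home_value": "500k-749k", "class": "Not a customer"},
    {"city": "Vista", "home_value": "1M+", "class": "Not a customer"},
]
attribute, rule, errors = one_r(toy, ["city", "home_value"], "class")
print(attribute, rule, errors)
```

Dropping an attribute from the `attrs` list and rerunning is one way to reproduce the kind of with/without comparison reported on the following slides.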
8. 1R
With the city attribute removed, the kappa value
increased almost every time, and home market value
became the best predictor.
9. 1R
With the home market value attribute removed, the
kappa value decreased every time. This shows the
importance of home market value.
10. 1R
[Chart: kappa value for Tests 1-7 under three conditions: with city, without city, and without home market value; y-axis from -0.1 to 0.5]
11. 1R
[Chart: % correctly classified instances for Tests 1-7 under three conditions: with city, without city, and without home market value; y-axis from 0 to 80]
12. CLUSTERING
Weak attributes: marital status, gender, age, pool
and education
Strong attributes: income, home market value and
city
Attribute | Cluster 1 | Cluster 2 | Cluster 3
City | SAN DIEGO | SAN JOSE | SAN DIEGO
Pool Owner? | No | No | No
Age | 44.5-53.4 | 44.5-53.4 | 44.5-53.4
Education Level | Unknown | Unknown | Grad School
Marital Status | Married | Married | Married
Length of Residence | 13.5+ years | 1.5-3 years | 13.5+ years
Gender | Male | Male | Male
Income | 100k-149k | 250k+ | 100k-149k
Home Market Value | 500k-749k | 1M+ | 500k-749k
Credit Rating | 750-799 | 700-749 | 750-799
Solar Customer? | No | Yes | No
# of Points in Cluster | 70 | 87 | 93
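A per-cluster profile like the table above can be summarised by taking the most frequent value of each attribute within each cluster. The deck does not state which clustering algorithm or tool produced the clusters, so the sketch below assumes cluster labels are already available; the function name `cluster_modes`, the attribute names, and the rows are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_modes(rows, labels, attrs):
    """Summarise each cluster by the modal value of every attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    summary = {}
    for label, members in groups.items():
        profile = {
            a: Counter(m[a] for m in members).most_common(1)[0][0]
            for a in attrs
        }
        profile["size"] = len(members)  # number of points in the cluster
        summary[label] = profile
    return summary

# Hypothetical rows and cluster assignments, not project data.
rows = [
    {"city": "SAN DIEGO", "income": "100k-149k"},
    {"city": "SAN DIEGO", "income": "100k-149k"},
    {"city": "SAN JOSE", "income": "100k-149k"},
    {"city": "SAN JOSE", "income": "250k+"},
    {"city": "SAN JOSE", "income": "250k+"},
]
labels = [1, 1, 1, 2, 2]
summary = cluster_modes(rows, labels, ["city", "income"])
print(summary)
```

Attributes whose modal value is the same in every cluster (pool owner, age, marital status, and gender in the table) do little to tell the clusters apart, which is one way to read the weak/strong split above.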
13. CONCLUSION
In its marketing initiatives, Complete Solar should
target consumers in San Jose and San Dimas,
consumers with medium to high income, and
consumers with large homes.