Atom D Sciences
Building machine learning applications for enterprises
Industry: Finance/Micro-Finance/Loans
Case study: Predicting which applicants will default on a loan
Problem Statement
Bank Indessa has performed poorly over the last three quarters. Its NPAs (non-performing assets) have reached an
all-time high. After careful analysis, loan defaulters were found to account for the majority of the NPAs.
We developed a machine learning model that predicts an applicant's default status with almost 90%
accuracy.
Data set
Number of rows (customers): 532,428
Number of columns (variables): 45
TARGET: 0 means default; 1 means no default
Train-test split: 85% / 15%
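The slides give only the 85% / 15% ratio, not how the rows were divided. A minimal sketch of one common approach, a stratified split that keeps the share of each target class roughly equal in both parts, is shown below. The function name `stratified_split`, the seed, and the toy labels are illustrative assumptions, not details from the case study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction from every class so class shares are preserved.
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only (assumption): 76 zeros and 24 ones.
labels = [0] * 76 + [1] * 24
train_idx, test_idx = stratified_split(labels)
```

Splitting per class matters when one class is much rarer than the other, because a purely random 15% sample could otherwise contain too few rows of the rare class to evaluate the model on it.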
Sl. no. Variable                     Class    Levels  Missing (%)  Zeroes (%)
1       acc_now_delinq               integer  0       0.003052     99.533740
2       addr_state                   factor   51      0            0.000000
3       annual_inc                   numeric  425943  0.000704     0.000000
4       application_type             factor   2       0            0.000000
5       batch_enrolled               factor   104     19.920506    0.000000
6       collection_recovery_fee      numeric  425943  0            97.360914
7       collections_12_mths_ex_med   integer  0       0.018547     98.664375
8       delinq_2yrs                  integer  0       0.003052     80.722773
9       desc                         factor   70639   0            0.000000
10      dti                          numeric  425943  0            0.047189
11      emp_length                   factor   11      5.045276     0.000000
12      emp_title                    factor   190125  0.00047      0.000000
13      funded_amnt                  integer  0       0            0.000000
14      funded_amnt_inv              numeric  425943  0            0.026999
15      grade                        factor   7       0            0.000000
16      home_ownership               factor   6       0            0.000000
17      initial_list_status          factor   2       0            0.000000
18      inq_last_6mths               integer  0       0.003052     56.156340
19      int_rate                     numeric  425943  0            0.000000
20      last_week_pay                factor   98      0            0.000000
21      loan_amnt                    integer  0       0            0.000000
22      mths_since_last_delinq       integer  0       51.138533    0.189697
23      mths_since_last_major_derog  integer  0       75.024827    0.016904
24      mths_since_last_record       integer  0       84.580566    0.137108
25      open_acc                     integer  0       0.003052     0.000939
26      pub_rec                      integer  0       0.003052     84.713917
27      purpose                      factor   14      0            0.000000
28      pymnt_plan                   factor   2       0            0.000000
29      recoveries                   numeric  425943  0            97.230615
30      revol_bal                    numeric  425943  0            0.373524
31      revol_util                   numeric  425943  0.054467     0.394419
32      sub_grade                    factor   35      0            0.000000
33      target                       integer  0       0            76.343783
34      term                         factor   2       0            0.000000
35      title                        factor   39694   0            0.000000
36      tot_coll_amt                 integer  0       7.898944     79.062926
37      tot_cur_bal                  numeric  425943  7.898944     0.014086
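The two percentage columns in this summary can be reproduced per variable with a few lines of code. The sketch below assumes both percentages are taken over all rows of the column and that a missing value is represented as `None`. The slides do not state the exact convention, so treat this as one plausible reading. The helper `profile_column` and the toy values are hypothetical.

```python
def profile_column(values):
    """Return (missing_percent, zeroes_percent) for one column.

    None marks a missing value; both percentages use the full row count.
    """
    n = len(values)
    if n == 0:
        return 0.0, 0.0
    missing = sum(1 for v in values if v is None)
    zeroes = sum(1 for v in values if v is not None and v == 0)
    return 100.0 * missing / n, 100.0 * zeroes / n


# Hypothetical toy column (assumption), not taken from the real data.
pub_rec = [0, 0, None, 1, 0, 2, 0, 0, None, 0]
missing_pct, zeroes_pct = profile_column(pub_rec)
print(f"missing: {missing_pct:.1f}%  zeroes: {zeroes_pct:.1f}%")
```

Columns with a very high missing share, such as mths_since_last_record, or a very high zero share, such as acc_now_delinq, are the ones this kind of profile flags for closer inspection before modelling.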
Applied machine learning algorithms and
built a predictive model on 4.32 lakh (432,000)
customers
"Building a machine learning model that Predicts a loan defaulter"
68% of defaulters predicted correctly
90% overall accuracy
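The two figures measure different things: the first is the share of actual defaulters that the model catches (recall on the default class), and the second is the share of all predictions that are correct. A minimal sketch of both calculations follows, using the slide's target coding (0 = default). The counts are made up, chosen only so that the arithmetic lands on the same two percentages. They are not the model's real confusion matrix.

```python
def recall_and_accuracy(y_true, y_pred, positive):
    """Return (recall for the `positive` class, overall accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, correct / len(pairs)


# Made-up illustration (assumption): 25 defaulters (0) and 75 non-defaulters (1).
y_true = [0] * 25 + [1] * 75
# 17 defaulters caught, 8 missed; 73 non-defaulters correct, 2 false alarms.
y_pred = [0] * 17 + [1] * 8 + [1] * 73 + [0] * 2

recall, accuracy = recall_and_accuracy(y_true, y_pred, positive=0)

# Baseline that never predicts a default: decent accuracy, zero recall.
base_recall, base_accuracy = recall_and_accuracy(y_true, [1] * 100, positive=0)
```

In this toy setting the baseline that never flags anyone still reaches 75% accuracy while catching no defaulters, which is why the recall figure is worth reporting alongside accuracy.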
Which variables are important for finding
the defaulters?
Dictionary
last_week_pay, total revolving credit limit,
interest received till date,
total current balance of all accounts,
total collection amount ever owed,
sub_grade, addr_state
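The slides do not say how these variables were ranked. One model-agnostic technique that could produce such a ranking is permutation importance: shuffle one column, re-score the model, and record how much accuracy drops. The sketch below is an assumption about method, not the authors' procedure. `toy_predict` is a hypothetical stand-in for a trained model and the rows are toy data.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows for which predict(row) equals the label."""
    return sum(1 for r, y in zip(rows, labels) if predict(r) == y) / len(rows)


def permutation_importance(predict, rows, labels, col, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling the values of column `col`."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def toy_predict(row):
    """Hypothetical model: predicts default (0) when column 0 exceeds 0.5."""
    return 0 if row[0] > 0.5 else 1


# Toy data (assumption): column 0 drives the label, column 1 is ignored.
rows = [[i / 10, (i * 7) % 10] for i in range(10)]
labels = [toy_predict(r) for r in rows]

imp_used = permutation_importance(toy_predict, rows, labels, col=0)
imp_ignored = permutation_importance(toy_predict, rows, labels, col=1)
```

A column the model never reads scores exactly zero, while a column it depends on loses accuracy when shuffled. Sorting variables by this drop gives an importance ranking.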
Contact :
V Raviteja Valluri,
Founder & Data Scientist,
Atom D Sciences & Analytics Pvt Ltd,
raviteja@atomdsciences.com,
+91-8501903007
www.atomdsciences.com