This document proposes a hybrid statistical and machine learning solution to detect phone fraud in real time with minimal false positives. It uses statistical analysis and anomaly detection on live streaming phone data to identify anomalous phone numbers. Machine learning with random forests is then used to evaluate additional call features. Graph analysis methods like triangle counting and PageRank are applied to uncover outliers. Confirmed fraudsters are used to train an ensemble machine learning model to progressively improve fraud identification. The system incorporates active learning to enhance detection over time as it remains in use.
Phone Fraud Detection
1. 1
Phone Fraudsters in a Haystack
Sri Kanajan, Prasad Telekuntla, Mijail Gomez
3rd place in Tata Telecommunications Global Hackathon
2. 2
Example of Phone Fraud
Leaves International Missed Call
Unknowingly Calls Premium Number or Manipulative Advertisement
$2 BILLION OF LOST REVENUE FROM TELECOM PROVIDERS
3. 3
Motivations
Current statistical solutions have low specificity and sensitivity
Human fraud analysts have to continually update their heuristic-based rules and thresholds
Need an adaptive solution that works in real time with minimal false positives
4. 4
Hybrid Statistical and Machine Learning Solution
Live Streaming Phone Data
Statistical Analysis, Anomaly Detection: Number of Callers/Callee/Cumulative Call Duration
Machine Learning (Random Forests): Evaluation of other features in the call log such as answer indicator, area code, pricing
Used Hackathon De-identified Phone Log Dataset (16 GB)
5. 5
Anomaly Detection Through Statistical Analysis
# of Unique Callers per Phone Number
# of Unique Callees per Phone Number
Cumulative Duration of Calls to Specific Phone Numbers
ANOMALOUS Phone Numbers!!
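The three per-number statistics on this slide can be sketched in a few lines. The slides do not say which anomaly test was used, so the modified z-score below (median and MAD, with the conventional 0.6745 constant and 3.5 cutoff) is an assumption chosen for illustration, as are the function names and the `(caller, callee, duration)` record layout.

```python
from collections import defaultdict
from statistics import median


def per_number_stats(calls):
    """Aggregate call records given as (caller, callee, duration_seconds)."""
    callers_of = defaultdict(set)     # callee -> distinct callers
    callees_of = defaultdict(set)     # caller -> distinct callees
    duration_to = defaultdict(float)  # callee -> cumulative call duration
    for caller, callee, duration in calls:
        callers_of[callee].add(caller)
        callees_of[caller].add(callee)
        duration_to[callee] += duration
    unique_callers = {n: len(s) for n, s in callers_of.items()}
    unique_callees = {n: len(s) for n, s in callees_of.items()}
    return unique_callers, unique_callees, dict(duration_to)


def robust_outliers(values, threshold=3.5):
    """Return keys whose modified z-score is above `threshold`.

    Assumption: only unusually HIGH values are of interest, so the
    test is one-sided.
    """
    xs = list(values.values())
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    if mad == 0:
        # More than half the values are identical; no spread to measure against.
        return set()
    return {k for k, v in values.items() if 0.6745 * (v - med) / mad > threshold}
```

Running `robust_outliers` on each of the three dictionaries returned by `per_number_stats` gives one candidate set per statistic; how those sets are combined (union, intersection, voting) is a design choice the slides leave open.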
7. 7
Fraud Detection Using Graph Metrics
Triangle Counting
PageRank
Others
Note: Goal is to uncover the callers that are very different from the large majority
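As a rough illustration of the two named metrics, the sketch below computes per-node triangle counts and PageRank on a call graph using only the standard library. It assumes each call is an edge from caller to callee, treats the graph as undirected for triangle counting, and uses a damping factor of 0.85; none of these choices is stated in the slides, and a 16 GB log would need a distributed graph engine rather than this in-memory version.

```python
from itertools import combinations


def build_graph(calls):
    """calls: iterable of (caller, callee) pairs."""
    out_edges = {}   # directed: caller -> set of callees
    neighbors = {}   # undirected adjacency, used for triangles
    for caller, callee in calls:
        if caller == callee:
            continue
        out_edges.setdefault(caller, set()).add(callee)
        out_edges.setdefault(callee, set())
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)
    return out_edges, neighbors


def triangle_counts(neighbors):
    """Number of triangles each node participates in."""
    return {
        node: sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        for node, nbrs in neighbors.items()
    }


def pagerank(out_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_edges[v]:
                new_rank[w] += damping * rank[v] / len(out_edges[v])
        rank = new_rank
    return rank
```

Each phone number then gets a small feature vector (triangle count, PageRank, and any other metrics), and numbers whose vectors sit far from the bulk are the ones of interest.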
8. 8
Fraud Detection Using Graph Metrics
Using Principal Component Analysis to uncover the outliers in the graph metrics
Possible Fraudsters!
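One way to read the PCA step is sketched below: standardize the per-number graph metrics, find the first principal component by power iteration, and score each number by the size of its projection onto that component. This is a simplified assumption for illustration. The slides do not say how many components were kept or how outliers were picked from the projection, and the function names are hypothetical.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # constant column -> 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def first_component(z, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    # Deliberately asymmetric start so it is unlikely to be orthogonal
    # to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def pca_outlier_scores(rows):
    """Absolute projection of each standardized row onto the first component."""
    z = standardize(rows)
    v = first_component(z)
    return [abs(sum(a * b for a, b in zip(row, v))) for row in z]
```

Rows with the largest scores are the candidates to hand to an analyst; a fuller version would keep two or more components and look at distance in that reduced space.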
12. 12
Ensemble Model: Machine Learning and Statistical
With labeled data, the classifier can progressively identify patterns beyond the graph metrics (uses all other features in the raw call log)
E.g. patterns in area codes or specific pricing plans used by fraudsters
Active learning is done online while the system is active, i.e. the longer the system is in use, the better it gets
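The online active-learning loop can be illustrated with a toy example. The sketch below swaps in a small online logistic regression for the ensemble classifier (the slides name random forests, which are not updated this way) and uses uncertainty sampling to decide which numbers to send to an analyst. The class, the `ask_analyst` callback, and the learning rate are all hypothetical.

```python
import math


class OnlineLogistic:
    """Tiny logistic regression updated one labeled example at a time."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        """One stochastic gradient step on the log-loss; label is 0 or 1."""
        err = self.predict_proba(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def most_uncertain(model, pool, budget=1):
    """Pick the numbers whose predicted fraud probability is closest to 0.5."""
    return sorted(pool, key=lambda k: abs(model.predict_proba(pool[k]) - 0.5))[:budget]


def active_learning_round(model, pool, ask_analyst, budget=1):
    """Query the analyst on the most uncertain numbers and learn from the answers."""
    for number in most_uncertain(model, pool, budget):
        label = ask_analyst(number)  # hypothetical callback returning 0 or 1
        model.update(pool.pop(number), label)
```

Each round spends the analyst's limited attention on the cases the model is least sure about, which is one concrete sense in which the system can improve the longer it stays in use.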