1
Phone Fraudsters in a Haystack
Sri Kanajan, Prasad Telekuntla, Mijail Gomez
3rd place in Tata Telecommunications Global Hackathon
2
Leaves International Missed Call
Unknowingly Calls Premium Number or
Manipulative Advertisement
$2 BILLION OF LOST REVENUE FROM TELECOM PROVIDERS
Example of Phone Fraud
3
Motivations
 Current statistical solutions have low specificity and sensitivity
 Human fraud analysts have to continually update their heuristic-based rules and thresholds
 Need an adaptive solution that works in real time with minimal false positives
4
Hybrid Statistical and Machine Learning Solution
Live Streaming Phone Data
Statistical Analysis Anomaly Detection: Number of Callers/Callees/Cumulative Call Duration
Machine Learning (Random Forests): Evaluation of other features in the call log such as answer indicator, area code, pricing
Used Hackathon De-identified Phone Log Dataset (16 GB)
5
Anomaly Detection Through Statistical Analysis
# of Unique Callers per Phone Number
# of Unique Callees per Phone Number
Cumulative Duration of Calls to Specific Phone Numbers
ANOMALOUS Phone Numbers!!
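The three statistics on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the `(caller, callee, duration)` record layout, the function names, the sample calls, and the median/MAD outlier rule with a 3.5 cutoff are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

def per_number_stats(calls):
    """calls: iterable of (caller, callee, duration_seconds) tuples (assumed layout)."""
    callers_of = defaultdict(set)     # callee -> distinct callers
    callees_of = defaultdict(set)     # caller -> distinct callees
    duration_to = defaultdict(float)  # callee -> cumulative seconds received
    for caller, callee, duration in calls:
        callers_of[callee].add(caller)
        callees_of[caller].add(callee)
        duration_to[callee] += duration
    unique_callers = {n: len(s) for n, s in callers_of.items()}
    unique_callees = {n: len(s) for n, s in callees_of.items()}
    return unique_callers, unique_callees, dict(duration_to)

def robust_outliers(values, threshold=3.5):
    """Flag keys whose modified z-score (median/MAD based) exceeds the threshold.

    One-sided on purpose: only unusually HIGH values are flagged.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return set()  # no spread to measure against; a real system needs a fallback
    return {k for k, v in values.items() if 0.6745 * (v - med) / mad > threshold}

# Made-up sample: eight callers each spend 10 minutes on one "premium" number.
calls = [(f"a{i}", "premium", 600) for i in range(1, 9)] + [
    ("a1", "b1", 60), ("a2", "b1", 30),
    ("a3", "b2", 45), ("a1", "b2", 20), ("a2", "b2", 50),
    ("a4", "b3", 10), ("a5", "b4", 90), ("a6", "b4", 15),
]
unique_callers, unique_callees, durations = per_number_stats(calls)
print(robust_outliers(unique_callers))  # numbers with unusually many distinct callers
print(robust_outliers(durations))       # numbers with unusually long cumulative duration
```

A median-based rule is used here because a single extreme number inflates the mean and standard deviation enough to hide itself in small samples; any other thresholding scheme could be substituted.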
6
Hybrid Statistical and Machine Learning Solution
Live Streaming Phone Data
Statistical Analysis Anomaly Detection
Graph Analysis Anomaly Detection
Machine Learning (Random Forests)
Predicted Anomalies
7
Fraud Detection Using Graph Metrics
 Triangle Counting
 PageRank
 Others
Note: Goal is to uncover the callers that are very different from the large majority
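As a rough illustration of the two named metrics, the sketch below computes triangle counts and PageRank over a call graph using only the standard library. The `(caller, callee)` edge layout, the function names, the damping factor of 0.85, and the sample calls are assumptions; the slides do not say which graph library or parameters the team used.

```python
from itertools import combinations

def build_graph(calls):
    """Return undirected neighbor sets and directed out-links from (caller, callee) pairs."""
    neighbors, out_links = {}, {}
    for caller, callee in calls:
        if caller == callee:
            continue  # ignore self-calls
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)
        out_links.setdefault(caller, set()).add(callee)
        out_links.setdefault(callee, set())  # make sure every node has an entry
    return neighbors, out_links

def triangle_counts(neighbors):
    """Number of triangles each node takes part in (undirected view of the graph)."""
    return {
        node: sum(1 for u, v in combinations(sorted(nbrs), 2) if v in neighbors[u])
        for node, nbrs in neighbors.items()
    }

def pagerank(out_links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling nodes spread their rank uniformly."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank

# Made-up sample: three friends who call each other, plus a number that
# only ever receives calls from otherwise unconnected callers.
calls = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "hub"), ("e", "hub"), ("f", "hub")]
neighbors, out_links = build_graph(calls)
triangles = triangle_counts(neighbors)
ranks = pagerank(out_links)
print(triangles)
print(ranks)
```

In this toy graph the socially connected callers form a triangle, while the receive-only number sits in none; that kind of contrast is what the per-caller metrics are meant to expose.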
8
Using Principal Component Analysis to uncover the outliers in the graph metrics
Fraud Detection Using Graph Metrics
Possible Fraudsters!
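One way to read this slide is: standardize the per-caller graph metrics, project them onto the leading principal component, and treat the callers that land farthest from the center as candidates. The sketch below follows that reading with a standard-library power iteration. The metric columns, the sample values, and the use of only the first component are assumptions; the slide's plot may well use two components.

```python
import math

def standardize(rows):
    """Z-score each column; constant columns are left at zero."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def first_component(data, iterations=100):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n, d = len(data), len(data[0])
    cov = [[sum(r[i] * r[j] for r in data) / n for j in range(d)] for i in range(d)]
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # all-zero covariance: nothing to project onto
        vec = [x / norm for x in nxt]
    return vec

def pca_outlier_scores(rows):
    """Absolute projection of each standardized row onto the first principal component."""
    data = standardize(rows)
    pc1 = first_component(data)
    return [abs(sum(x * w for x, w in zip(row, pc1))) for row in data]

# Made-up metric rows: [triangle count, rank-like score, degree] per caller.
metrics = [
    [1, 0.10, 2], [0, 0.09, 1], [1, 0.11, 2],
    [2, 0.10, 3], [1, 0.10, 2], [0, 0.10, 1],
    [30, 0.90, 40],  # a caller very unlike the rest
]
scores = pca_outlier_scores(metrics)
most_unusual = max(range(len(scores)), key=scores.__getitem__)
print(most_unusual, scores)
```

The scores only rank callers by how unusual they look; deciding where to cut the list is left to the analyst, in line with the "Possible Fraudsters" wording.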
9
Hybrid Statistical and Machine Learning Solution
Live Streaming Phone Data
Statistical Analysis Anomaly Detection
Graph Analysis Anomaly Detection
Machine Learning (Random Forests)
Predicted Anomalies
Human Fraud Analyst
Observed
Possible Fraud
10
Human Fraud Analyst Confirmation of Fraudster
www.fraud-detector.net
Fraud Detection Using Graph Metrics
11
Hybrid Statistical and Machine Learning Solution
Live Streaming Phone Data
Statistical Analysis Anomaly Detection
Graph Analysis Anomaly Detection
Machine Learning (Random Forests)
Predicted Anomalies
Human Fraud Analyst
Observed
Possible Fraud
Confirmed Fraudsters
12
Ensemble Model: Machine Learning and Statistical
 With labeled data, the classifier can progressively identify patterns beyond the graph metrics (it uses all other features in the raw call log)
 E.g., patterns in area codes or specific pricing plans used by fraudsters
 Active learning is done online while the system is active, i.e., the longer the system is in use, the better it gets
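The feedback loop described above can be sketched as: fit a classifier on analyst-labeled calls, then refit whenever a new confirmation arrives. The code below is a toy stand-in, not the team's model: it bags depth-1 decision trees with random feature selection (a heavily simplified random forest), refits from scratch instead of updating incrementally, and uses a hypothetical feature encoding `[answered, international_area_code, price_per_minute]` with made-up values.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, features):
    """Best single-threshold split over the given feature indices (a depth-1 tree)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback when no split exists: always predict the majority label.
    best_score, best = float("inf"), (features[0], float("inf"), majority, majority)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_score = score
                best = (f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagged stumps, each trained on a bootstrap sample and a random feature subset."""
    if len(set(labels)) < 2:
        raise ValueError("need at least one example of each class")
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    while len(forest) < n_trees:
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # redraw bootstrap samples that contain a single class
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx], sample_labels, features))
    return forest

def predict(forest, row):
    """Majority vote across all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled calls: [answered, international_area_code, price_per_minute]
rows = [[1, 0, 0.05], [1, 0, 0.10], [1, 0, 0.20], [1, 0, 0.30],
        [0, 1, 2.00], [0, 1, 2.50], [0, 1, 4.00], [0, 1, 3.50]]
labels = ["legit"] * 4 + ["fraud"] * 4
forest = fit_forest(rows, labels)

# Analyst feedback: a newly confirmed fraudster is appended and the model is refit.
rows.append([0, 1, 2.20])
labels.append("fraud")
forest = fit_forest(rows, labels)
print(predict(forest, [0, 1, 3.00]), predict(forest, [1, 0, 0.10]))
```

Refitting on every confirmation keeps the sketch short; a production system would more likely batch the updates or use a learner that supports incremental training.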
14
Conclusion
Possible False Positive
Possible Fraudster
16
Acknowledgements
Zipfian Academy
Technologies Used
D3
Python
