February 2010

Assessing the predictive capacity measure of a Fraud
Management System
By Marco Scattareggia

Summary
When looking for a single index of FMS predictive performance, one of the
best choices is the area under the ROC curve (AUC). ROC curves provide more
attractive metrics than Precision/Recall graphs because they are not
sensitive to changes in the skew of the Fraud/Not Fraud class distribution.
This matters because the fraud class skew changes both from country to
country and over time for the same telecommunication operator in the same
country.
Was the information contained in this article useful? Your feedback to email
is appreciated!

Marco Scattareggia holds a degree in Electronic Engineering, works in Rome
(Italy) for HP EMEA, and runs the CoE (Center of Excellence) that develops
fraud management solutions for telecommunication operators.

Acknowledgement
I would like to acknowledge the efforts of Luigi Catzola, associate researcher
at SEMEION, for his support and help in reviewing this KB. Very special thanks
go to Flavio Riciniello, Executive Officer of Eidos Consult and former Fraud
Manager of Telecom Italia, for his authoritative approval and his suggestions
about skewness and kurtosis factors.


Introduction
When people use communication services without paying for them, they steal
from Telecom Operators or Service Providers and commit fraud. There is no
difference between stealing another's property and stealing services: in
both cases, something of value is taken without compensation. According to
the Communications Fraud Control Association (CFCA), fraud losses for Network
& Service Providers (NSP) are in the range of $35-40 billion U.S. dollars
worldwide. Operators are aware of the problem and often use an automated
Fraud Management System (FMS) to fight fraudsters, but they still do not
grasp the true complexity of fraud. Many factors affect the accuracy of fraud
detection, and the detection threshold can be set very high, very low, or
anywhere in between, depending on the FMS settings and data sources.



Every FMS collects demographic information about subscribers from the
Customer Care or Billing department and monitors subscriber events (calls,
messages, etc.) on the network. The conceptual FMS architecture in Figure 1
shows the fraud intelligence capabilities: scoring new subscribers,
clustering subscribers into segments to induce patterns, and applying rules
that form an on-line filter for detecting anomalous events. After detecting
anomalous events, the FMS generates alarms and, when there are sufficient
indications of fraud, flags fraud cases to the analysts who decide how to
handle them. From an overall perspective, the function of the FMS is to
classify network events in order to identify those that could be fraudulent.
The case scoring function isolates only those cases with a high enough
probability of being True Positives, so that the analysts and the Fraud
Manager are not overloaded with False Positives.




                         Figure 1: FMS Architecture
This Knowledge Brief describes how to achieve accuracy in the classification
of fraudulent cases and how to identify the best parameters to use as a
predictive performance index of an FMS. Key Performance Indicators (KPIs)
such as Accuracy, Misclassification Rate, and Hit Rate are popular with
operators, but they are not adequate for evaluating fraud detection. The
Precision/Recall pair works reasonably well in information retrieval
applications and is a good candidate for our purposes, but Sensitivity (the
percent of Frauds classified as Fraud) and 1-Specificity (the percent of Not
Frauds classified as Fraud) fit the fraud fighting process in the
telecommunication arena better, because they remain meaningful even when the
proportions of Fraud to Not Fraud instances vary significantly from period to
period and from country to country. Receiver Operating Characteristic (ROC)
curves, which plot Sensitivity against 1-Specificity at many cut-points, are
insensitive to class skew, and the area under these curves (AUC) measures the
ability of an FMS to separate Frauds from Not Frauds. AUC reflects, at the
same time, how many true frauds are detected every day, week, or month
(effectiveness) and how many false alarms slow down the fraud management
process (efficiency).
The concepts presented here are based on recent data mining literature on
predictive and classification models; see, for example, the tutorial "The
Many Faces of ROC Analysis in Machine Learning" presented at ICML 2004, the
Twenty-First International Conference on Machine Learning, July 4-8, 2004
(http://www.aicml.cs.ualberta.ca/_banff04/icml/). However, this Knowledge
Brief relies mostly on years of experience implementing FMS solutions for HP
customers.

KPIs for FMS
Once Fraud Managers have analyzed what they want to do and have defined
their department goals, they need a way to measure progress toward these
goals via some Key Performance Indicators (KPIs). It is very important to
choose KPIs that are SMART:
           Specific: linked to a particular activity, clear and unambiguous
           Measurable: objective
           Attainable: incrementally challenging but realistic
           Relevant: meaningful to the telecom operator
           Time bound: regularly measured and reviewed
Best practice is to measure a KPI at regular intervals and against specific
benchmarks. Benchmarking enables you to improve performance in a systematic
and logical way by measuring and comparing one operator's performance
against the others, and then using lessons learned from the best of them to
make targeted improvements. It involves answering these questions:
           Who performs better?
           Why are they better?
           What actions do we need to take in order to improve our
            performance?
The purpose of our analysis is to discover which KPI best meets the needs of
fraud management.

Notation
The red color represents Frauds, while blue represents Not Frauds; black is
used when Frauds and Not Frauds are mixed.
The totality of Frauds (100% of observed fraud cases) equals the Total
Positive cases p, the sum of frauds hit (True Positives) and frauds missed
(False Negatives) by FMS detection. The true positive rate of an FMS, TP, is
obtained by dividing the number of True Positives by the total number of
frauds p. The false negative rate, FN, is obtained by dividing the number of
False Negatives by the same p; the two rates are complementary because
TP + FN = 100% = 1.
Similarly, the totality of Not Frauds equals the Total Negative cases n, the
sum of false alarms (False Positives) and remaining honest subscribers (True
Negatives); the associated rates FP and TN are complementary:
FP + TN = 100% = 1.
The four rates TP, FN, FP and TN can be summarized in a cross-tabulation
matrix whose columns are the observed Frauds and Not Frauds and whose rows
are the FMS predictions (see Table 1). This classification table is also
called a Confusion Matrix because it shows where the FMS confuses one class
with the other; it provides a measure of how well the FMS performs.


                     Frauds                              Not Frauds
Predicted Positive   True Positive                       False Positive
                     TP = True Positive / p              FP = False Positive / n
Predicted Negative   False Negative                      True Negative
                     FN = False Negative / p             TN = True Negative / n
Total Cases          Total Positive                      Total Negative
                     p = True Positive + False Negative  n = False Positive + True Negative
                     TP + FN = 1                         FP + TN = 1

                        Table 1: Classification Matrix
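The cross-tabulation in Table 1 can be sketched in Python as below. This is a minimal illustration, not part of any FMS product: the names ClassificationMatrix and build_matrix are hypothetical, and the input is assumed to be one boolean FMS prediction and one boolean observed outcome per case.

```python
from dataclasses import dataclass


@dataclass
class ClassificationMatrix:
    """Counts of Table 1 (hypothetical helper, not an FMS API)."""
    true_positive: int   # frauds flagged by the FMS
    false_negative: int  # frauds missed by the FMS
    false_positive: int  # Not Frauds flagged (false alarms)
    true_negative: int   # Not Frauds correctly left unflagged

    @property
    def p(self) -> int:
        """Total Positive: all observed Frauds."""
        return self.true_positive + self.false_negative

    @property
    def n(self) -> int:
        """Total Negative: all observed Not Frauds."""
        return self.false_positive + self.true_negative

    def rates(self) -> dict[str, float]:
        """Column-wise rates, so that TP + FN = 1 and FP + TN = 1."""
        return {
            "TP": self.true_positive / self.p,
            "FN": self.false_negative / self.p,
            "FP": self.false_positive / self.n,
            "TN": self.true_negative / self.n,
        }


def build_matrix(predicted: list[bool], observed: list[bool]) -> ClassificationMatrix:
    """Cross-tabulate FMS predictions (rows) against observed outcomes (columns)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    counts = {(pred, obs): 0 for pred in (True, False) for obs in (True, False)}
    for pred, obs in zip(predicted, observed):
        counts[(pred, obs)] += 1
    return ClassificationMatrix(
        true_positive=counts[(True, True)],
        false_negative=counts[(False, True)],
        false_positive=counts[(True, False)],
        true_negative=counts[(False, False)],
    )


# Hypothetical outcomes for eight cases (True = Fraud).
predicted = [True, True, True, False, True, False, False, False]
observed = [True, True, False, True, False, False, False, False]
m = build_matrix(predicted, observed)
print(m, m.rates())
```

Because each rate is divided by its own column total (p or n), the Fraud column and the Not Fraud column can be read independently of each other.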
The four rates should be derived from frequency distributions measured along
a delivered service value. In a telecommunication context, call time in
minutes or data transferred in bytes can serve as the service value. If
available, the corresponding monetary value (SDRs, dollars, or euros)
charged by the billing department works even better and allows benchmarking
across different operators and countries.
In the following paragraphs, the Fraud and Not Fraud frequency distributions
are drawn as symmetric Gaussian curves (see Figure 2). This is only for
illustration: real Fraud and Not Fraud distributions are asymmetric, with
negative skewness for Frauds and positive skewness for Not Frauds. It is
also important to analyze kurtosis, which describes how flat or peaked the
distribution is near its maximum. Kurtosis measures whether a curve is
peaked or flat relative to a normal distribution. Data sets with high
kurtosis tend to have a distinct peak near the mean, decline rather rapidly,
and have heavy tails; data sets with low kurtosis tend to have a flat top
near the mean rather than a sharp peak (see Figure 3).
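The skewness and kurtosis factors described above can be estimated from a sample of delivered service values. The sketch below uses the population moment formulas and the common "excess kurtosis" convention, in which a normal distribution scores 0; this choice of estimator is an assumption, and the sample values are hypothetical.

```python
import statistics


def skewness(values: list[float]) -> float:
    """Population skewness: mean of cubed z-scores (values must not all be equal)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)


def excess_kurtosis(values: list[float]) -> float:
    """Population excess kurtosis: mean of z**4 minus 3 (0 for a normal curve)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** 4 for x in values) / len(values) - 3.0


# Hypothetical samples of delivered service value.
honest = [10, 10, 10, 20, 100]      # long right tail: positive skewness
fraud = [200, 380, 390, 390, 390]   # long left tail: negative skewness
flat = [1, 2, 3, 4, 5]              # flat top: negative excess kurtosis
peaked = [0] * 8 + [-10, 10]        # sharp peak, heavy tails: positive excess kurtosis

print(skewness(honest), skewness(fraud))
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```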
Comparing the distributions of absolute Fraud and Not Fraud values reveals a
very strong skew (class imbalance) between them, because in the
telecommunication context there may be 1 Fraud for every 1,000 or even
10,000 Not Frauds. Figure 4 shows this skew, compressed by a logarithmic
vertical scale, and summarizes the basic parameters needed to analyze FMS
predictive capabilities. The area under the red distribution represents the
p cases belonging to the Fraud class, while the blue distribution represents
the n cases belonging to the Not Fraud class.




                  Figure 2: Gaussian probability distributions




                 Figure 3: Skewness and Kurtosis Factors
For any given threshold (the green line in Figure 4), we can estimate the
probability that an alarm is correct or incorrect by computing the fraction
of cases in each class that would be properly classified if that threshold
were applied, with an alarm raised for every case at or above it. In the
Figure 4 example, the threshold delivered service value is 200: 81% of
fraudsters would be correctly reported as fraudsters (True Positives), while
16% of honest subscribers would be incorrectly flagged (False Positives). At
the same time, 19% of real fraudsters would be incorrectly classified as
honest subscribers (False Negatives) and, finally, 84% of honest subscribers
would be correctly classified (True Negatives). These four values are
reported in a Classification Matrix, as shown in the lower right corner of
Figure 4.
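The threshold computation above can be sketched with Python's standard statistics.NormalDist, under the same simplifying Gaussian assumption as Figure 2. The means and standard deviations below are hypothetical values chosen so that, at a threshold of 200, the rates land close to the 81%/16% of the Figure 4 example; they are not the data behind Figure 4, and classify_at_threshold is an illustrative name.

```python
from statistics import NormalDist


def classify_at_threshold(fraud: NormalDist, not_fraud: NormalDist,
                          threshold: float) -> dict[str, float]:
    """Rates obtained when every case at or above `threshold` raises an alarm."""
    tp = 1.0 - fraud.cdf(threshold)      # share of Frauds at or above the threshold
    fp = 1.0 - not_fraud.cdf(threshold)  # share of Not Frauds at or above the threshold
    return {"TP": tp, "FN": 1.0 - tp, "FP": fp, "TN": 1.0 - fp}


# Hypothetical delivered-service distributions (e.g., minutes of traffic).
fraud = NormalDist(mu=290, sigma=100)
not_fraud = NormalDist(mu=100, sigma=100)

for threshold in (150, 200, 250):
    rates = classify_at_threshold(fraud, not_fraud, threshold)
    print(threshold, {k: round(v, 3) for k, v in rates.items()})
```

Moving the threshold down raises both TP and FP, and moving it up lowers both; this is the tradeoff that a ROC curve traces across all cut-points.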
The FMS Case Scoring function interprets such frequencies as probabilities
and predicts positive or negative cases according to the percentages of true
or false alarms at different cut-points given by the threshold values of the
delivered service.
Fraud analysts verify the cases predicted positive and classify them as True
Positives or False Positives (resolution phase). All other cases, predicted
negative, are initially classified as True Negatives, but some of them may
later turn out to be False Negatives when the credit and risk department
accounts for unpaid invoices.



Figure 4: Fraud Case Scoring
Fraud Managers should carefully compile Classification Matrices and derive
from them powerful indexes that combine the four basic variables TP, FN, FP
and TN into different KPIs:

      p = total Frauds
      True Positive rate  = True Positive / p  = TP = 1 - FN
      False Negative rate = False Negative / p = FN = 1 - TP
      n = total Not Frauds
      False Positive rate = False Positive / n = FP = 1 - TN
      True Negative rate  = True Negative / n  = TN = 1 - FP
      Accuracy = total correctly classified / total cases
               = (True Positive + True Negative) / (p + n)
      Misclassification Rate = total incorrectly classified / total cases
                             = (False Negative + False Positive) / (p + n)
      Accuracy = 1 - Misclassification Rate
      Precision = True Positive / (True Positive + False Positive)
      Recall = True Positive / p = True Positive rate
      Hit Rate 1 = Precision
      Hit Rate 2 = Recall
      Sensitivity = True Positive rate = Recall
      Specificity = True Negative rate
      1 - Specificity = False Positive rate
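The KPI list above translates directly into code. The sketch below computes every index from the four counts of a Classification Matrix; the function name fms_kpis is hypothetical, and the counts in the example are made up to echo the Figure 4 rates (81% TP, 16% FP) for 100 Frauds and 1,000 Not Frauds. Hit Rate 1 and Hit Rate 2 are simply the "precision" and "recall" entries.

```python
def fms_kpis(true_pos: int, false_neg: int, false_pos: int, true_neg: int) -> dict[str, float]:
    """KPIs from the four counts of a Classification Matrix (hypothetical helper)."""
    p = true_pos + false_neg   # total Frauds
    n = false_pos + true_neg   # total Not Frauds
    predicted_positive = true_pos + false_pos
    return {
        "TP": true_pos / p,    # True Positive rate = Sensitivity = Recall
        "FN": false_neg / p,   # False Negative rate = 1 - TP
        "FP": false_pos / n,   # False Positive rate = 1 - Specificity
        "TN": true_neg / n,    # True Negative rate = Specificity
        "accuracy": (true_pos + true_neg) / (p + n),
        "misclassification_rate": (false_neg + false_pos) / (p + n),
        # Precision is undefined when the FMS raises no alarm at all.
        "precision": true_pos / predicted_positive if predicted_positive else float("nan"),
        "recall": true_pos / p,
    }


# Hypothetical counts: 100 Frauds and 1,000 Not Frauds.
kpis = fms_kpis(true_pos=81, false_neg=19, false_pos=160, true_neg=840)
for name, value in kpis.items():
    print(f"{name:>24}: {value:.3f}")
```

Note that Accuracy and Misclassification Rate are computed from raw counts over all p + n cases, not from the four rates, which is why they mix the two columns of the matrix.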

Accuracy and Misclassification Rate
Accuracy maximization is very popular within the analyst community, but it
is not appropriate for an FMS because it assumes equal misclassification
costs for False Positive and False Negative errors. In fraud detection, the
cost of missing a fraud can be much higher than the cost of a false alarm.
Moreover, the fraud class is comparatively rare: when fraudulent activity
involves only 0.01% of a population, predicting every event as Not Fraud
achieves 99.99% Accuracy. That looks excellent from a global perspective,
but it is completely unacceptable for fraud prediction because, despite the
high Accuracy, we would miss all the fraud! In conclusion, adopting Accuracy
as an FMS predictive evaluation metric would wrongly assume that the
distribution between Frauds and Not Frauds is constant and balanced.
The Misclassification Rate is not advisable for evaluating an FMS for
exactly the same reasons we discussed for discarding Accuracy: it is simply
the complement of Accuracy (Misclassification Rate = 1 - Accuracy).
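The 0.01% example above can be checked with a few lines of arithmetic. The population size of 1,000,000 events is a hypothetical choice; the "classifier" never raises an alarm, so every event is predicted Not Fraud.

```python
def accuracy_and_recall(true_pos: int, false_neg: int,
                        false_pos: int, true_neg: int) -> tuple[float, float]:
    """Accuracy over all cases and Recall over the Fraud column only."""
    total = true_pos + false_neg + false_pos + true_neg
    accuracy = (true_pos + true_neg) / total
    recall = true_pos / (true_pos + false_neg)
    return accuracy, recall


# Hypothetical population: 1,000,000 events, 0.01% of them fraudulent.
frauds, not_frauds = 100, 999_900

# Predict every event as Not Fraud: all frauds become False Negatives.
acc, rec = accuracy_and_recall(true_pos=0, false_neg=frauds,
                               false_pos=0, true_neg=not_frauds)
print(f"Accuracy = {acc:.4%}, Recall = {rec:.0%}")  # Accuracy = 99.9900%, Recall = 0%
```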

Precision/Recall and Hit-Rate
Precision and Recall are the basic measures used in evaluating search
strategies. Precision is the ratio of the number of relevant records
retrieved to the total number of records retrieved, relevant and irrelevant:
         Precision = positives correctly classified / total predicted positive
Recall is the ratio of the number of relevant records retrieved to the total
number of relevant records in the database:
           Recall = positives correctly classified / total existing positive
As in information retrieval practice, we have to analyze Precision
(efficiency) together with Recall (effectiveness) to understand the tradeoff
between them:
without adding information content, we cannot simultaneously reach higher
                        Precision and wider Recall.
The Precision-Recall graph plotted in Figure 5 visualizes this concept; in
the fraud detection process the definitions of Precision and Recall become:
      Precision = True Positive / (True Positive + False Positive)
      Recall    = True Positive / p = True Positive rate
We cannot be satisfied with Precision/Recall as an FMS predictive
performance metric, because when the class distribution changes, the metric
changes too. In other words, Precision/Recall is sensitive to changes in the
skew of the Fraud/Not Fraud class distribution. The minority class (Frauds)
has much lower Precision and Recall than the prevalent class (Not Frauds),
and many practitioners have observed that for extremely skewed classes the
Recall of Frauds is often 0 and no classification rules can be generated for
it.
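The sensitivity of Precision to the class mix can be illustrated by holding an FMS's rates fixed and changing only the Fraud to Not Fraud ratio. The sketch below reuses the 81% TP and 16% FP rates of the Figure 4 example and works with expected counts (rate times class size), which is a simplification; precision_from_rates is a hypothetical name.

```python
def precision_from_rates(tp_rate: float, fp_rate: float, p: int, n: int) -> float:
    """Precision implied by fixed TP/FP rates for a given class mix."""
    true_pos = tp_rate * p    # expected True Positives
    false_pos = fp_rate * n   # expected False Positives
    return true_pos / (true_pos + false_pos)


TP_RATE, FP_RATE = 0.81, 0.16  # the same FMS in every scenario

for p, n in [(1_000, 1_000), (10, 10_000), (1, 10_000)]:
    precision = precision_from_rates(TP_RATE, FP_RATE, p, n)
    print(f"Frauds:Not Frauds = {p}:{n}  recall = {TP_RATE:.2f}  precision = {precision:.4f}")
```

Recall (the TP rate) is identical in all three scenarios, while Precision collapses as the Not Fraud class grows, even though the underlying FMS has not changed.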
FMS suppliers often propose Hit Rate as the sole parameter representing the
predictive performance of their system.
Among fraud managers, Hit Rate is usually computed as Precision, although it
is sometimes defined like Recall. In either case, Hit Rate alone cannot
estimate both the effectiveness, in terms of how many real frauds are
detected compared with their total (i.e., Recall or True Positive rate), and
the efficiency, in terms of how few false alarms slow down the resolution
phase of the fraud management process (i.e., Precision).




          Figure 5: Adding more Information for Retrieval

Sensitivity/Specificity and ROC curves
The important tradeoff between Precision and Recall discussed earlier is
similar to the one between Sensitivity, the percent of Frauds classified as
Frauds, and Specificity, the percent of Not Frauds classified as Not Frauds,
or its complement 1-Specificity, the percent of Not Frauds classified as
Frauds.
To analyze Sensitivity and Specificity, it is advisable to adopt ROC
(Receiver Operating Characteristic) curves, which plot, for each potential
threshold value, the frequency of true positive cases (Sensitivity) against
the frequency of false positives (1-Specificity). The diagonal straight line,
where Sensitivity equals 1-Specificity, represents a system that raises
alarms at random, no better than flipping a coin.
Figure 6 shows two ROC curves plotted with the ROC graph tool of SPSS 13.0
for Windows. The curves are computed from the threshold scores of a Neural
Network and a C5.0 Decision Tree, both trained with SPSS/Clementine 8.0 on an
HP-FMS 7.0-3 case archive containing a sample of 1,387 Frauds and 25,884
Not Frauds.
The coordinates of the ROC curves are the true and false positive rates
(frequencies):
   Sensitivity = Positives correctly classified / total Frauds
   Sensitivity = True Positive rate = TP
   1 - Specificity = 1 - True Negative rate = 1 - True Negative / n
   1 - Specificity = False Positive rate = FP
In telecommunication systems, the proportions of Fraud to Not Fraud
instances vary significantly from period to period and from country to
country. ROC curves, however, are insensitive to class skew: when the
absolute size of each class changes, the skew changes too, but TP and FP,
being rates computed within a single class, do not.




                            Figure 6: ROC curves
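The invariance just described can be checked with a small ROC sketch. The function below (roc_points, a hypothetical name) sweeps every distinct score as a threshold and returns (1-Specificity, Sensitivity) pairs; the scores are made-up FMS case scores. Replicating every Not Fraud ten times changes the class skew but leaves the curve unchanged.

```python
def roc_points(scores: list[float], is_fraud: list[bool]) -> list[tuple[float, float]]:
    """(1 - Specificity, Sensitivity) pairs; an alarm is raised when score >= threshold."""
    p = sum(is_fraud)              # total Frauds
    n = len(is_fraud) - p          # total Not Frauds
    points = [(0.0, 0.0)]          # threshold above every score: no alarms
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, f in zip(scores, is_fraud) if f and s >= threshold)
        fp = sum(1 for s, f in zip(scores, is_fraud) if not f and s >= threshold)
        points.append((fp / n, tp / p))
    return points


# Hypothetical case scores produced by some FMS scoring model.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
is_fraud = [True, True, False, True, False, False, True, False]

# Same cases with every Not Fraud replicated 10 times (a 1:10 class skew).
negatives = [s for s, f in zip(scores, is_fraud) if not f]
skewed_scores = scores + negatives * 9
skewed_labels = is_fraud + [False] * (len(negatives) * 9)

print(roc_points(scores, is_fraud))
print(roc_points(skewed_scores, skewed_labels) == roc_points(scores, is_fraud))  # True
```

The loop is quadratic in the number of cases, which keeps the logic obvious; a production version would sort the cases once and accumulate counts.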
Any performance metric that uses values from both columns of the
Classification Matrix is inherently sensitive to changes in the Fraud/Not
Fraud proportion. Metrics such as Accuracy and Precision use values from
both columns, so when the class distribution changes, these measures change
as well, even if the fundamental FMS performance does not. In our notation,
Sensitivity has the red color of Frauds and 1-Specificity has the blue color
of Not Frauds, but the same is not true for Accuracy and Precision (e.g.,
Precision is defined by the counts of True Positives and False Positives,
which come from different columns of the Classification Matrix).
To analyze the tradeoff between TP and FP, we can consider how they vary
with the threshold. Figure 7 and Figure 8 show the effect of two different
threshold values, which change the true and false positive frequencies from
TP=0.489, FP=0.088 to TP=0.882, FP=0.446.




                       Figure 7: ROC curves analysis




                       Figure 8: ROC curves analysis
Then, by adding more information content (e.g., a black list or a
qualitative rule), we see how the increased distance between the means of
the Fraud and Not Fraud distributions enables a higher TP at the same FP. In
Figure 9, TP improves from 0.882 to 0.985 while FP stays at 0.446.




                       Figure 9: ROC curves analysis
Lowering the information content reduces the distance between the means of
the distributions, so the two distributions overlap considerably more and
performance degrades. In this last case, Figure 10 shows TP dropping from
0.882 to 0.595, while FP stays at 0.446.




                       Figure 10: ROC curves analysis
Comparing the four figures above, we should also note the area under the
ROC curve (AUC). In Figure 7 and Figure 8 the AUC is about 0.827; adding
information content (Figure 9) raises the AUC to 0.949, while removing it
(Figure 10) reduces the AUC to 0.607.
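The link between the distance of the means and the AUC can be made explicit under the Gaussian assumption of Figure 2. For two independent normal distributions, the difference between a Fraud value and a Not Fraud value is itself normal, so the AUC (the probability that the Fraud value is larger) has the closed form Phi((mu_fraud - mu_not_fraud) / sqrt(sigma_fraud^2 + sigma_not_fraud^2)). The parameters below are hypothetical and are not intended to reproduce the AUC values of Figures 7 to 10; binormal_auc is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def binormal_auc(fraud: NormalDist, not_fraud: NormalDist) -> float:
    """AUC = P(Fraud value > Not Fraud value) for independent normal distributions."""
    # The difference of two independent normals is normal with mean mu_f - mu_n
    # and variance sigma_f**2 + sigma_n**2; AUC is the probability it is positive.
    z = (fraud.mean - not_fraud.mean) / sqrt(fraud.variance + not_fraud.variance)
    return NormalDist().cdf(z)


not_fraud = NormalDist(mu=100, sigma=100)   # hypothetical Not Fraud distribution
for fraud_mean in (100, 150, 290, 400):     # growing distance between the means
    auc = binormal_auc(NormalDist(mu=fraud_mean, sigma=100), not_fraud)
    print(f"fraud mean {fraud_mean}: AUC = {auc:.3f}")
```

Identical means give an AUC of 0.5 (the coin-flip diagonal), and the AUC climbs toward 1 as more information content pushes the two means apart.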
AUC measures the ability of an FMS to separate Frauds from Not Frauds. It is
representative, in relative terms (i.e., as percentages), of both FMS
effectiveness, in terms of how many actual fraudulent events might be
detected every day, week, or month, and FMS efficiency, in terms of how few
false alarms would slow down the fraud management process.
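A well-known equivalence makes the "ability to separate" reading concrete: the AUC equals the probability that a randomly chosen Fraud receives a higher score than a randomly chosen Not Fraud (the Mann-Whitney statistic), with ties counted as one half. The sketch below computes it by brute force over all pairs, which is fine for illustration but quadratic in the number of cases; empirical_auc is a hypothetical name and the scores are made up.

```python
def empirical_auc(fraud_scores: list[float], not_fraud_scores: list[float]) -> float:
    """Share of (Fraud, Not Fraud) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for f in fraud_scores:
        for nf in not_fraud_scores:
            if f > nf:
                wins += 1.0
            elif f == nf:
                wins += 0.5
    return wins / (len(fraud_scores) * len(not_fraud_scores))


# Hypothetical FMS scores for four Frauds and four Not Frauds.
fraud_scores = [0.9, 0.8, 0.6, 0.3]
not_fraud_scores = [0.7, 0.55, 0.4, 0.2]
print(empirical_auc(fraud_scores, not_fraud_scores))  # 0.75
```

Because every pair mixes one case from each class with equal weight, the result does not depend on how many Frauds there are relative to Not Frauds, which is the same class-skew insensitivity that makes ROC analysis attractive for an FMS.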
Telecom Operators can use ROC curve analysis to evaluate the FMS products
available on the market and buy the best performer; while configuring an
FMS, the Knowledge Manager can test different parameters or rules and tune
FMS performance in terms of TP and FP rates. Note that the frequencies of
cases predicted negative, FN and TN, must be available; otherwise it is not
possible to plot ROC curves and compute the AUC.




                                   Page: 11

More Related Content

Assessing the predictive capacity measur marco scattareggia

  • 1. February 2010 Assessing the predictive capacity measure of a Fraud Management System By Marco Scattareggia Summary Looking for a unique FMS predictive performance index, one of the best choices is to adopt the area under the ROC curve AUC. ROC curves give us more attractive metrics, if compared to Precision/Recall graphs, because they are not sensitive to Fraud/Not Fraud class distributions skewness variability. In fact, the fraud class distribution skewness changes both from country to country and over time for the same telecommunication operator in the same country. Was the information contained in this article useful? Your feedback to email is appreciated! Marco Scattareggia has a degree on Electronic Engineer, works in Rome (Italy) for EMEA HP and runs the CoE (Center of Excellence) to develop fraud management solutions for telecommunication operators. Acknowledgement I would like to acknowledge the efforts of Luigi Catzola associated researcher at SEMEION for his support and help in reviewing the KB. Very special thanks go to Flavio Riciniello, Executive Officer of Eidos Consult and former Fraud Manager of Telecom Italia, for his authoritative approval and suggestions about skewness and kurtosis factors. Introduction When people use communications without paying for the service, they steal from the Telecom Operators or Service Providers and commit a fraud. There is no difference between stealing the property of another and stealing services, as in both cases, something of value is taken without compensation. According to the Communications Fraud Control Association (CFCA), fraud losses for Network & Service Providers (NSP) are in the range of $35 - $40 billion U.S. dollars worldwide. 
Operators are aware of the problem and often use an automated Fraud Management System (FMS) to fight fraudsters, but they still do not grasp the true complexity of fraud, because there are many factors that can affect the accuracy of fraud detection and it is possible to make the threshold for detection very high, very low, or anywhere in-between, depending upon the FMS settings and data sources. Page: 1
  • 2. Every FMS collects demographic information about subscribers from the Customer Care or Billing department, and monitors the events (calls, messages, etc.) of subscribers over the network. The conceptual FMS architecture in Figure 1 shows the fraud intelligence capabilities for scoring new subscribers, clustering subscribers in different segments to induct patterns or to allow rules to be applied to create an on-line filter for the detection of anomalous events. After detecting anomalous events, the FMS generates alarms and, when sufficient indications of fraud exist, flags fraud cases to the analysts who are responsible for deciding what to do with these cases. From an overall perspective, the function of the FMS is the classification of network events to identify those that could be fraudulent. The case scoring function enables the isolation of only those cases that have enough probability to be in the True Positive category and avoids overloading the analysts and the Fraud Manager with many False Positives. Figure 1: FMS Architecture This Knowledge Brief describes how to achieve accuracy in the classification of fraudulent cases and how to to identify the best parameters to use as a predictive performance index of a FMS. Key Performance Indicators (KPIs) such as Accuracy, Misclassification Rate, and Hit Rate are popular with operators, but are not enough to detect fraud. The couplet of Precision/Recall works reasonably well in information retrieval applications and is a good candidate for our purposes, but Sensitivity (the percent of Frauds classified as fraud) and 1-Specificity (the percent of Not Frauds classified as fraud) fit the fraud fighting process in the telecommunication arena better, even if the proportions of Fraud to Not Fraud instances vary significantly from period to period and country to country. 
Receiver Operating Characteristic (ROC) curves, the plot of the Sensitivity against 1-Specificity at many cut-points, are insensitive to class skews and the area under these curves (AUC) Page: 2
  • 3. measures the ability of an FMS to separate Frauds from Not Frauds. AUC is representative, at the same time, of how many true frauds are detected every day, week or month (effectiveness) and of how many false alarms are slowing down the fraud management process (efficiency). The concepts presented here are based on recent data mining literature on predictive and classification models. For example, see the tutorial "The Many Faces of ROC Analysis in Machine Learning" issued during the ICML Twenty- First International Conference on Machine Learning on July 4-8, 2004 (http://www.aicml.cs.ualberta.ca/_banff04/icml/). However, this Knowledge Brief relies more on years of experience implementing FMS solutions for HP Customers. KPIs for FMS Once Fraud Managers have analyzed what they want to do and have defined their department goals, they need a way to measure progress toward these goals via some Key Performance Indicators (KPIs). It is very important to choose KPIs that are SMART: Specific Linked to a particular activity, clear and unambiguous Measurable objective Attainable incrementally challenging but realistic Relevant meaningful to the telecom operator Time bound regularly measured and reviewed Best practice is to measure a KPI at regular intervals and against specific benchmarks. Benchmarking enables you to improve performance in a systematic and logical way by measuring and comparing one operators performance against the others, and then using lessons learned from the best of them to make targeted improvements. It involves answering the questions: Who performs better? Why are they better? What actions do we need to take in order to improve our performance? The purpose of our analysis is discovering which KPI might better cover fraud management necessities. Notation The red color represents Frauds while the blue represents Not Frauds. When there is a mixture of both fraud and not fraud, the black color is used. 
The totality of Frauds, 100% of observed fraud cases, matches the Total Positive p cases given by the sum of frauds hit (True Positives) and frauds missed (False Negatives) by FMS detection. The true positive rate of detection of an FMS, TP, can be obtained dividing the number of True Positives by the totality of frauds p. The false negative rate of FMS Page: 3
  • 4. detection, FN, is given by dividing the number of False Negatives by the same p; they are complementary because TP + FN = 100% = 1. Similarly, the totality of Not Frauds corresponds to the Total Negative n cases given by the sum of false alarms (False Positives) and residual honest subscribers (True Negatives); the associated rates FP and TN are complementary: FP + TN = 100% = 1. The four rates TP, FN, FP and TN can be summarized in a cross tabulation matrix with columns of observed Frauds and Not Frauds and rows reporting FMS predictions (see Table 1). This classification table has been also called a Confusion Matrix because it helps in getting rid of confusion during classification activity; it provides a measure of how well the FMS performs. Frauds Not Frauds True Positive False Positive Predicted Positive TP = True Positive/p FP = False Positive/n False Negative True Negative Predicted Negative FN = False Negative /p TN = True Negative/n Total Positive Total Negative Total Cases True Positive + False Negative = p False Positive + True Negative = n TP + FN = 1 FP + TN = 1 Table 1: Classification Matrix The distribution of the four rates should be plotted by frequencies measured along a delivered service value. Phone time elapsed in minutes or data transferred in bytes can represent the proper service value in a telecommunication context. If available, the corresponding value in local currency: SDRs, dollars, or euros charged by the billing department, works even better and allows benchmarking across different operators and countries. In the following paragraphs, Fraud and Not Fraud frequency distributions are symmetric and look like Gaussian curves (see Figure 2). This is only for purposes of illustration, because Fraud and Not Fraud distributions are asymmetrical with a skewness factor negative for frauds and positive for not frauds. Besides, it is important to analyze the kurtosis factor flattening the distribution near the maximum of the curve. 
Kurtosis is a measure of whether the curves are peaked or flat relative to a normal distribution. Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails; data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak (see Figure 3). Confronting distributions of absolute Fraud and Not Fraud values, we see a very strong skewness between them because in the telecommunication context there could be a ratio of 1 Fraud for perhaps as many as every 1,000 or even 10,000 Not Frauds. Figure 4 shows this skewness, lowered by a logarithmic vertical scale, and synthesizes the basic parameters necessary to Page: 4
  • 5. analyze FMS predictive capabilities. The area under the red distribution represents the p cases belonging to the Fraud class, while the blue one gives the distribution of Not Fraud class to which the n cases belong. Figure 2 Gauss probability distributions Figure 3: Skewness and Kurtosis Factors We might calculate the probability at or above any given threshold (represented by a green line in Figure 4), for an alarm to be correct or incorrect, by determining the fractions of cases properly classified if that threshold were applied. In Figure 4 example, the threshold delivered service value is 200 and 81% of subscribers would be correctly reported (True Positives) as fraudsters, while 16% of honest subscribers would be incorrectly classified (False Positives). At the same time, 19% of real fraudsters would be incorrectly classified as honest subscribers (False Negatives) and, finally, 84% of honest subscribers would be correctly classified (True Negatives). The series of four values so computed should be reported by Classification Matrixes as showed on the lower right corner of the Figure 4. The FMS Case Scoring function interprets such frequencies as probabilities and predicts positive or negative cases according to the percentages of true or false alarms at different cut-points given by the threshold values of the delivered service. Fraud analysts verify cases predicted positive and classify them as True Positive or False Positive (resolution phase). All the other cases, predicted negative, will be initially classified as True Negatives, but some of them might turn out as False Negatives when accounting for the unpaid invoices in the credit and risk department. Page: 5
Figure 4: Fraud Case Scoring

Fraud Managers should carefully compile Classification Matrices and derive from them KPIs that combine the four basic variables TP, FN, FP and TN in different ways:

p = total Frauds
True Positive rate = True Positive / p = TP = 1 - FN
False Negative rate = False Negative / p = FN = 1 - TP
n = total Not Frauds
False Positive rate = False Positive / n = FP = 1 - TN
True Negative rate = True Negative / n = TN = 1 - FP
Accuracy = Total correctly classified / Total cases = (True Positive + True Negative) / (p + n)
Misclassification Rate = Total not correctly classified / Total cases = (False Negative + False Positive) / (p + n)
Accuracy = 1 - Misclassification Rate
Precision = True Positive / (True Positive + False Positive)
Recall = True Positive / p = True Positive rate
Hit Rate 1 = Precision
Hit Rate 2 = Recall
Sensitivity = True Positive rate = Recall
Specificity = True Negative rate
1 - Specificity = False Positive rate

Accuracy and Misclassification Rate

Accuracy maximization is very popular within the analyst community, but it is not appropriate for an FMS because it assumes equal misclassification costs for False Positive and False Negative errors. In fraud detection, the cost of missing a case of fraud can be much higher than the cost of a false alarm. Moreover, fraud events are comparatively rare: when fraudulent activity involves only 0.01% of a population, predicting every event as Not Fraud achieves 99.99% Accuracy. That figure looks excellent from a global perspective, but it is completely unacceptable for fraud prediction because, despite the high Accuracy, we would miss every fraud! In short, adopting Accuracy as an FMS predictive evaluation metric means wrongly assuming that the distribution between Frauds and Not Frauds is constant and balanced.

The Misclassification Rate is not advisable for evaluating an FMS either, for exactly the same reasons: it is simply the complement of Accuracy (Misclassification Rate = 1 - Accuracy).

Precision/Recall and Hit Rate

Precision and Recall are the basic measures used to evaluate information retrieval (search) strategies.
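The KPI list above, together with the 0.01% Accuracy example, can be checked with a short Python sketch. It assumes that the inputs are Classification Matrix counts (numbers of cases), not the rates that the shorthand TP, FN, FP and TN denote in the list; the function and key names are illustrative, and undefined ratios such as Precision with no alarms are returned as NaN by choice of this sketch.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_kpis(tp, fn, fp, tn):
    """KPIs from Classification Matrix counts (numbers of cases, not rates)."""
    p = tp + fn          # total Frauds
    n = fp + tn          # total Not Frauds
    total = p + n
    return {
        "tp_rate": _ratio(tp, p),            # Sensitivity = Recall
        "fn_rate": _ratio(fn, p),
        "fp_rate": _ratio(fp, n),            # 1 - Specificity
        "tn_rate": _ratio(tn, n),            # Specificity
        "accuracy": _ratio(tp + tn, total),
        "misclassification_rate": _ratio(fn + fp, total),
        "precision": _ratio(tp, tp + fp),    # NaN if no alarm was raised
        "recall": _ratio(tp, p),
    }


# 1,000,000 cases with 0.01% fraud (100 Frauds); the "system" never raises an alarm.
kpis = classification_kpis(tp=0, fn=100, fp=0, tn=999_900)
print(kpis["accuracy"], kpis["recall"], kpis["precision"])
```

The trivial "never alarm" system scores 99.99% Accuracy while its Recall is 0 and its Precision is undefined, which is exactly why Accuracy is discarded as an FMS metric.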
Precision is the ratio of the number of relevant records retrieved to the total number of relevant and irrelevant records retrieved:
Precision = correctly classified / total predicted positive
Recall is the ratio of the number of relevant records retrieved to the total number of relevant records in the database:
Recall = correctly classified / total positive existing

As in information retrieval best practice, we have to analyze Precision (efficiency) together with Recall (effectiveness) to understand the tradeoff between them: without adding information content, we cannot achieve higher Precision and wider Recall at the same time. The Precision-Recall graph in Figure 5 visualizes this concept. In the fraud detection process, the definitions of Precision and Recall become:
Precision = True Positive / (True Positive + False Positive)
Recall = True Positive / p = True Positive rate

We cannot rely on Precision/Recall as an FMS predictive performance metric, because when the class distribution changes, the metric changes too. In other words, Precision/Recall is sensitive to variability in the skewness of the Fraud/Not Fraud class distribution. The minority class (Frauds) has much lower Precision and Recall than the prevalent class (Not Frauds), and many practitioners have observed that, for extremely skewed classes, the Recall of Frauds is often 0 and no classification rules can be generated for that class.

FMS suppliers often propose Hit Rate as the sole parameter representing the predictive performance of their system. Among fraud managers, Hit Rate is usually computed as Precision, while at other times it is described as Recall. In either case, Hit Rate alone is not sufficient to estimate, at the same time, the effectiveness in terms of how many real frauds are detected out of all frauds (i.e., Recall or True Positive rate) and the efficiency in terms of how few false alarms will slow down the resolution phase of the fraud management process (i.e., Precision).

Figure 5: Adding more Information for Retrieval

Sensitivity/Specificity and ROC curves

The important tradeoff between Precision and Recall discussed earlier is similar to the one between Sensitivity, the percentage of Frauds classified as Frauds, and Specificity, the percentage of Not Frauds classified as Not Frauds, or its complement 1 - Specificity, the percentage of Not Frauds classified as Frauds. To analyze Sensitivity and Specificity, it is advisable to use ROC (Receiver Operating Characteristic) curves, which plot, for each potential threshold value, the frequency of true positive cases (Sensitivity) against the frequency of false positives (1 - Specificity). A ROC curve lying on the diagonal straight line would signify that the system is no more likely to raise an alarm on a Fraud than on a Not Fraud, i.e., it does no better than flipping a coin.

Figure 6 shows two ROC curves plotted with the ROC graph tool of SPSS 13.0 for Windows. The curves are computed from the threshold scores of a Neural Network and a C5.0 Decision Tree, both trained with SPSS/Clementine 8.0 on an HP-FMS 7.0-3 Case Archive sample containing 1,387 Frauds and 25,884 Not Frauds. The coordinates of ROC curves are the true and false positive rates (frequencies):
Sensitivity = Positives correctly classified / total Frauds
Sensitivity = True Positive rate = TP
1 - Specificity = 1 - True Negative rate = 1 - True Negative / total Not Frauds
1 - Specificity = False Positive rate = FP

In telecommunications systems, the proportion of Fraud to Not Fraud instances varies significantly from period to period and from country to country. ROC curves, however, are insensitive to class skew: when the absolute size of each class changes, the skewness changes as well, but TP and FP, being percentages computed within a single class, do not.

Figure 6: ROC curves

Any performance metric that uses values from both columns of the Classification Matrix is inherently sensitive to changes in the Fraud/Not Fraud proportion. Metrics such as Accuracy and Precision use values from both columns, so when the class distribution changes, these measures change as well, even if the fundamental FMS performance does not. In our notation, Sensitivity has the red color of Frauds and 1 - Specificity has the blue color of Not Frauds, but the same is not true for Accuracy and Precision (e.g., Precision is defined by TP and FP, which are taken from different columns of the Classification Matrix).

To analyze the tradeoff between TP and FP, we can consider how they vary with the threshold. Figure 7 and Figure 8 show the effect of two different threshold values, which change the true and false positive frequencies from TP = 0.489, FP = 0.088 to TP = 0.882, FP = 0.446.

Figure 7: ROC curves analysis
Figure 8: ROC curves analysis

Adding more information content (e.g., a black list or a qualitative rule) increases the distance between the means of the Fraud and Not Fraud distributions, which lets us obtain a higher TP at the same FP. In Figure 9, TP improves from 0.882 to 0.985 while FP stays at 0.446.

Figure 9: ROC curves analysis

Conversely, lowering the information content reduces the distance between the means of the distributions, so the two distributions overlap considerably more and performance degrades. In this case, Figure 10 shows TP falling from 0.882 to 0.595, while FP still stays at 0.446.

Figure 10: ROC curves analysis

Comparing the four figures above, we should also note the area under the ROC curve (AUC). In Figure 7 and Figure 8 the AUC is about 0.827; adding information content raises it to 0.949 (Figure 9), while removing information content reduces it to 0.607 (Figure 10). AUC measures the ability of an FMS to separate Frauds from Not Frauds. Being based on percentages, it reflects in relative terms both FMS effectiveness, i.e., how many actual fraudulent events might be detected every day, week, or month, and FMS efficiency, i.e., how few false alarms would slow down the fraud management process.

Telecom Operators can use ROC curve analysis to evaluate the FMS products available on the market and buy the best performer; likewise, while configuring an FMS, the Knowledge Manager can test different parameters or rules and tune FMS performance in terms of TP and FP rates. Note that the frequencies of cases predicted negative, FN and TN, must be available; otherwise it is not possible to plot ROC curves or compute the AUC.
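Plotting a ROC curve and computing its AUC from case scores can be sketched as follows. This is a minimal Python illustration, not the SPSS procedure used for Figure 6: it assumes that every case, including those predicted negative, has a score and a resolved label, and that an alarm is raised when the score is at or above the threshold. The scores and function names are hypothetical. It computes the AUC twice, by the trapezoidal rule over the ROC points and as the probability that a randomly chosen Fraud outscores a randomly chosen Not Fraud (ties counted as one half); the two coincide, which is a standard result and gives a concrete meaning to "the ability of an FMS to separate Frauds from Not Frauds".

```python
def roc_points(fraud_scores, not_fraud_scores):
    """(FP rate, TP rate) for every distinct threshold; alarm when score >= threshold."""
    p, n = len(fraud_scores), len(not_fraud_scores)
    thresholds = sorted(set(fraud_scores) | set(not_fraud_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: no alarms at all
    for t in thresholds:
        tp_rate = sum(1 for s in fraud_scores if s >= t) / p
        fp_rate = sum(1 for s in not_fraud_scores if s >= t) / n
        points.append((fp_rate, tp_rate))
    return points


def auc_trapezoid(points):
    """Area under a ROC polyline whose points are sorted by FP rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_rank(fraud_scores, not_fraud_scores):
    """Probability that a random Fraud outscores a random Not Fraud (ties count 1/2)."""
    wins = 0.0
    for f in fraud_scores:
        for nf in not_fraud_scores:
            if f > nf:
                wins += 1.0
            elif f == nf:
                wins += 0.5
    return wins / (len(fraud_scores) * len(not_fraud_scores))


# Hypothetical scores for resolved cases.
frauds = [0.9, 0.8, 0.4]
not_frauds = [0.7, 0.3, 0.2, 0.1]
curve = roc_points(frauds, not_frauds)
print(curve)
print(auc_trapezoid(curve), auc_rank(frauds, not_frauds))
```

Both functions need the scores of the Not Frauds that never raised an alarm; without the predicted-negative cases (FN and TN) the curve cannot reach the (1, 1) corner and the area cannot be computed, in line with the remark above.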
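The claim that Precision depends on the class distribution while the ROC coordinates do not can be illustrated arithmetically. In this minimal Python sketch, the operating point TP = 0.882, FP = 0.446 quoted for Figure 8 is used purely as an example input, and the class sizes are hypothetical, chosen to match the 1 Fraud per 1,000 and per 10,000 Not Frauds ratios mentioned earlier; the printed Precision values are computed, not measured results.

```python
def expected_precision(tp_rate, fp_rate, p, n):
    """Expected Precision when p Frauds and n Not Frauds meet a fixed ROC operating point."""
    true_alarms = tp_rate * p
    false_alarms = fp_rate * n
    return true_alarms / (true_alarms + false_alarms)


# Same operating point, two class mixes (1:1,000 and 1:10,000 Fraud to Not Fraud).
TP_RATE, FP_RATE = 0.882, 0.446
for frauds, not_frauds in [(1_000, 1_000_000), (100, 1_000_000)]:
    precision = expected_precision(TP_RATE, FP_RATE, frauds, not_frauds)
    print(f"{frauds}:{not_frauds} -> TP rate {TP_RATE}, FP rate {FP_RATE}, Precision {precision:.5f}")
```

The ROC point, and therefore Recall, is identical in both mixes, yet Precision drops roughly tenfold when Frauds become ten times rarer.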
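The effect of moving the Fraud and Not Fraud means apart, as in Figures 9 and 10, can be sketched with an idealized model. Assuming, purely for illustration, that Not Fraud scores follow a standard normal distribution and Fraud scores a normal distribution with the same variance shifted by a hypothetical separation d (in standard deviations), the TP rate at a fixed FP rate is 1 - Φ(Φ⁻¹(1 - FP) - d) and the AUC is Φ(d/√2). This equal-variance binormal model is a simplification in the spirit of the Gauss distributions of Figure 2; the article does not state that its figures follow it, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def tp_rate_at_fp_rate(separation, fp_rate):
    """TP rate at a fixed FP rate, with Not Frauds ~ N(0, 1) and Frauds ~ N(separation, 1)."""
    threshold = STANDARD_NORMAL.inv_cdf(1.0 - fp_rate)  # cut-point that yields fp_rate
    return 1.0 - STANDARD_NORMAL.cdf(threshold - separation)


def binormal_auc(separation):
    """AUC of the equal-variance binormal model: Phi(separation / sqrt(2))."""
    return STANDARD_NORMAL.cdf(separation / sqrt(2.0))


for d in (0.5, 1.0, 2.0):  # hypothetical distances between the class means, in SD units
    print(d, round(tp_rate_at_fp_rate(d, 0.446), 3), round(binormal_auc(d), 3))
```

With a separation of 0 the model reduces to the diagonal (TP rate equal to FP rate, AUC 0.5); as the separation grows, both the TP rate at FP = 0.446 and the AUC increase, which is the qualitative behavior described for added information content.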