Statistics and big data for justice and fairness
Lecture by Asst. Prof. Dr. Arnond Sakworawich
Director, Ph.D. and M.Sc. programs in Business Analytics and Data Science
Faculty member, Actuarial Science and Risk Management program
Graduate School of Applied Statistics, National Institute of Development Administration
Lecture for civil servants from many agencies attending a training course of the Office of Justice Affairs, Ministry of Justice, 17 August 2018
Statistics and big data for justice and fairness
1. Statistics for justice and fairness
Asst. Prof. Dr. Arnond Sakworawich
Director, Ph.D. and M.Sc. programs in Business Analytics and Data Science
Faculty member, Actuarial Science and Risk Management program
Graduate School of Applied Statistics, National Institute of Development Administration
6. Roles of statistics in fairness and justice
Facilitate fairness
Detect anomalies and fraud
Prevent crime and anomalies
Regulatory Impact Assessment
15. There is no crime without any trace!
-Large deviation from normal or average man or cluster.
-Large deviation from past behavior.
-Inconsistency with themselves and surroundings.
-Repeated anomaly pattern.
-Caution on statistical detection of cheating and anomalies
Anomaly Detection
17. Large deviation from normal or average man or cluster.
[Figure: scatter plot with axes Severity58 and Frequency58]
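One simple way to read "large deviation from the cluster" is a z-score check. The sketch below is only an illustration: the function name `flag_large_deviations`, the threshold of 3 standard deviations, and the claim counts are all hypothetical and do not come from the lecture.

```python
from statistics import mean, stdev

def flag_large_deviations(values, threshold=3.0):
    """Return the indices whose distance from the cluster mean
    exceeds `threshold` sample standard deviations."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value is identical, nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical claim frequencies for one cluster of insured members.
claims = [2, 3, 1, 2, 4, 3, 2, 3, 2, 1, 3, 2, 40]
flagged = flag_large_deviations(claims)
print(flagged)  # index of the member with 40 claims
```

A single extreme value also inflates the mean and standard deviation it is compared against, so a robust centre and spread (median, MAD) may be a safer choice on real data.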
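18. $\text{Loss}_{58} = f(\text{Frequency}_{57}, \text{Severity}_{57}, \text{ICD-10}_{57}, \text{ICD-9}_{57}, \text{ICD-10}_{58}, \text{ICD-9}_{58}, \text{age}, \text{gender})$
[Figure: scatter plot of Loss58 against the predictors; annotated region: Under Predict (Fraud or abuse)]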
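The idea on this slide, predicting the loss from known predictors and treating cases the model badly under-predicts as candidates for fraud or abuse, can be sketched with a one-predictor least-squares fit. This is a simplified assumption: the slide's function f takes several predictors and its form is not given. The names `fit_line` and `under_predicted`, the cutoff of 2 residual standard errors, and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def under_predicted(x, y, k=2.0):
    """Indices where the observed value exceeds the prediction
    by more than k residual standard errors."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    n = len(residuals)
    rse = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [i for i, r in enumerate(residuals) if r > k * rse]

# Hypothetical data: last year's loss (x) and this year's loss (y).
prior_loss = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
current_loss = [11, 19, 32, 41, 48, 62, 150, 79, 91, 102, 108, 121]
suspects = under_predicted(prior_loss, current_loss)
print(suspects)  # the case whose loss is far above what the fit predicts
```

Only positive residuals are flagged, because the slide's concern is losses far above the prediction. The same check applies to the next slide's comparison of TOEFL time 2 against TOEFL time 1.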
19. Large deviation from past behavior.
[Figure: scatter plot with axes TOEFL time 1 and TOEFL time 2; annotated region: Under Predict (Fraud or abuse)]
20. Inconsistency with themselves and surroundings.
-A low-ability test taker answers a difficult item correctly.
-K-index for copying! Eight dimensions
-Scoring test with contaminated response vector
-Influence function + Robust estimators
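The first bullet can be made concrete with an item response model. The sketch below assumes a Rasch (one-parameter logistic) model, which the slide does not name. The ability and difficulty values, the 0.05 cutoff, and the function names are hypothetical. It does not implement the K-index or the robust estimators mentioned above.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability that an examinee of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def unexpected_correct(theta, items, cutoff=0.05):
    """items is a list of (difficulty, answered_correctly) pairs.
    Return indices of correct answers the model considers very unlikely."""
    return [i for i, (b, ok) in enumerate(items)
            if ok and p_correct(theta, b) < cutoff]

# Hypothetical low-ability examinee (theta = -1) and five items.
items = [(-2.0, True), (-1.0, True), (0.0, False), (2.5, True), (3.0, True)]
surprising = unexpected_correct(theta=-1.0, items=items)
print(surprising)  # the two hardest items, both answered correctly
```

A few surprising hits can come from lucky guessing, so a result like this is a prompt for closer review, not a conclusion.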
22. [Figure: histogram titled "Pseudovalue Distribution for an Optima Examinee"; x-axis: Proficiency Estimate (-5 to 3); y-axis: Frequency (0 to 20); legend: From Incorrect Responses, From Correct Responses]
28. Positive Predictive Value: PPV
Caution on statistical detection of cheating
64.76 % 99.30%
29. Statistical evidence as a red flag or warning
Physical evidence is always needed.
Early detection, protection, and prevention.
Bayesian flip is needed.
Caution on statistical detection of cheating
P(Cheating=Yes|Detection=Yes)
P(Detection=Yes|Cheating=Yes)
P(Cheating=No|Detection=No)
P(Detection=No|Cheating=No)
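$$P(\text{Cheating}=\text{Yes} \mid \text{Detection}=\text{Yes}) = \frac{P(\text{Detection}=\text{Yes} \mid \text{Cheating}=\text{Yes}) \, P(\text{Cheating}=\text{Yes})}{P(\text{Detection}=\text{Yes})}$$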
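The Bayesian flip can be worked through numerically. In the sketch below the sensitivity, specificity, and prevalence are hypothetical and are not the values behind the percentages on the earlier slide. The denominator P(Detection=Yes) is expanded with the law of total probability.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(Cheating=Yes | Detection=Yes) by Bayes' rule.

    sensitivity = P(Detection=Yes | Cheating=Yes)
    specificity = P(Detection=No  | Cheating=No)
    prevalence  = P(Cheating=Yes)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)  # denominator is P(Detection=Yes)

# Hypothetical detector: 95% sensitive, 95% specific, 1% of examinees cheat.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.95, prevalence=0.01)
print(round(ppv, 3))
```

With these assumed numbers, only about one flagged examinee in six is actually cheating, even though the detector is right 95% of the time in each direction. That is why statistical evidence is best treated as a red flag.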
35. LOF = local density of the k neighbors / local density of the point itself
The higher the LOF, the more extreme the local outlier.
Determine sigma (the radius, or reachable distance, around a point) so that k neighbors fall within it.
Local density of a point = number of points within the reachable distance / sum of the distances between the point and all k neighbors
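The description above can be turned into a small sketch. It follows the slide's simplified wording: density is k divided by the summed distance to the k nearest neighbors, and the neighbors' densities are averaged (an assumption on my part). It is not the full LOF algorithm, which uses reachability distances. The function names and sample points are hypothetical, and the points are assumed to be distinct.

```python
import math

def k_nearest(points, i, k):
    """(distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return dists[:k]

def local_density(points, i, k):
    """k divided by the summed distance to the k nearest neighbours."""
    total = sum(d for d, _ in k_nearest(points, i, k))
    return k / total  # assumes distinct points, so total > 0

def lof_scores(points, k=3):
    """Mean neighbour density divided by the point's own density."""
    dens = [local_density(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        neighbours = [j for _, j in k_nearest(points, i, k)]
        scores.append(sum(dens[j] for j in neighbours) / (k * dens[i]))
    return scores

# Hypothetical data: a tight cluster plus one distant point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(pts, k=3)
print([round(s, 2) for s in scores])  # the last point scores far above 1
```

Points inside the cluster score close to 1, while the isolated point scores several times higher, which matches the rule that a higher LOF means a more extreme local outlier.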