Using fairness metrics to
solve ethical dilemmas of
machine learning
Kovács László
laszlo@peak1014.com
Trust & Fairness
• Machine learning systems are not inherently fair
• Data requirements from Probably Approximately Correct (PAC) learning theory: plenty of representative data
• Bias in → bias out
• Algorithms are objective, success criteria are not
Bias vs bias
The letter of the law
European Union: Policy and investment recommendations for trustworthy Artificial Intelligence (guideline)
...
5. Measure and monitor the societal impact of AI
28. Consider the need for new regulation to ensure adequate protection from adverse impacts
12. Safeguard fundamental rights in AI-based public services and protect societal infrastructures
United States of America: Algorithmic Accountability Act of 2019
A bill to direct the Federal Trade Commission to require entities that use, store, or share personal
information to conduct automated decision system impact assessments (ADSIA). [... an ADSIA] means a
study evaluating an automated decision system and the automated decision system's development process,
including the design and training data of the automated decision system, for impacts on accuracy, fairness,
bias, discrimination, privacy
The spirit of the law
Be fair, do not discriminate along sensitive attributes.
Be transparent, explainable, interpretable
Explainability (local scope)
What features, and by what amount, lead to the prediction for a single case?
Gains the trust of the user.
Interpretability (global scope)
What features, and by what amount, does the model generally use to make its predictions?
Gains the trust of the analyst/regulator.
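To make the local/global distinction concrete, here is a minimal sketch (not from the original deck) using a linear model, where a feature's contribution to a single prediction is simply its coefficient times the feature value, and averaging the absolute contributions over the data gives a global view. The data and feature names are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # hypothetical features
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
names = ["age", "income", "tenure"]                # made-up feature names

# Explainability (local scope): contribution of each feature to ONE prediction
x_single = X[0]
local = model.coef_[0] * x_single
print("local contributions:", dict(zip(names, local.round(3))))

# Interpretability (global scope): how much the model relies on each feature overall
global_imp = np.abs(model.coef_[0] * X).mean(axis=0)
print("global importance:", dict(zip(names, global_imp.round(3))))
```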
This is not true. (caption shown on two consecutive example slides)
The problem of fairness: error distribution
                    Reality
                    Positive          Negative
Prediction   Pos.   True Positive     False Positive
             Neg.   False Negative    True Negative
Discrimination or unfairness:
systematic over- or underestimation for members of a group
Fairness metrics
[Confusion matrix (Reality × Prediction) for the whole population, plus a separate confusion matrix for Group A and for Group B]
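A small sketch of what this slide's diagram shows, assuming we have hypothetical labels, predictions and a group attribute: compute the population confusion matrix, then the same matrix per group to see how the errors are distributed.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, predictions and group membership
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

# Population-level matrix; with labels=[1, 0] the layout is [[TP, FN], [FP, TN]]
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))

# The same matrix computed separately for each group
for g in ["A", "B"]:
    mask = group == g
    print(g)
    print(confusion_matrix(y_true[mask], y_pred[mask], labels=[1, 0]))
```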
Fairness metrics: not just for sensitive decisions
COMPAS
A critique by ProPublica
Overall, Northpointe's assessment tool correctly predicts recidivism 61 percent of the time. But blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend. It makes the opposite mistake among whites: They are much more likely than blacks to be labeled lower risk but go on to commit other crimes.
Results from ProPublica
• They calculated the False Positive and False Negative rates
These are not the definitions of the FPR/FNR!
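A toy illustration of why the exact definition matters (the counts are invented, not ProPublica's or COMPAS numbers): dividing the same false positives by the real negatives (FPR) or by the predicted positives (FDR) gives different quantities with different interpretations.

```python
# Hypothetical counts for a single group (not ProPublica's actual numbers)
tp, fn, fp, tn = 300, 200, 150, 350

fpr = fp / (fp + tn)   # False Positive Rate: share of real negatives flagged as positive
fdr = fp / (fp + tp)   # False Discovery Rate: share of flagged people who were actually negative
fnr = fn / (fn + tp)   # False Negative Rate
fomr = fn / (fn + tn)  # False Omission Rate

print(f"FPR={fpr:.2f}  FDR={fdr:.2f}  FNR={fnr:.2f}  FOR={fomr:.2f}")
# -> FPR=0.30  FDR=0.33  FNR=0.40  FOR=0.36
```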
Fairness metrics
[Confusion matrix (Reality × Prediction) as above]
False Negative Rate = False Negatives / (True Positives + False Negatives)
Among the people who turned out to recidivate,
how many did we mark as unlikely to reoffend?
Fairness metrics
[Confusion matrix (Reality × Prediction) as above]
False Positive Rate = False Positives / (False Positives + True Negatives)
Among the people who turned out to be innocent,
how many did we mark as likely to reoffend?
Fairness metrics
[Confusion matrix (Reality × Prediction) as above]
Precision = True Positives / (True Positives + False Positives)
Among the people whom we marked as likely to reoffend,
how many actually did so?
Precision is also known as Positive Predictive Value.
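Putting the three definitions together, a hedged sketch (hypothetical counts, not COMPAS data) that computes each metric per group and reports the gap between groups; parity on a metric means the gap is close to zero.

```python
# Hypothetical per-group confusion-matrix counts: (TP, FN, FP, TN)
groups = {"A": (40, 10, 20, 30), "B": (25, 25, 10, 40)}

def rates(tp, fn, fp, tn):
    return {
        "FNR": fn / (tp + fn),   # False Negative Rate
        "FPR": fp / (fp + tn),   # False Positive Rate
        "PPV": tp / (tp + fp),   # Precision / Positive Predictive Value
    }

per_group = {g: rates(*counts) for g, counts in groups.items()}
for m in ["FNR", "FPR", "PPV"]:
    gap = per_group["A"][m] - per_group["B"][m]
    print(f"{m}: A={per_group['A'][m]:.2f}  B={per_group['B'][m]:.2f}  gap={gap:+.2f}")
```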
Results from ProPublica
With proper False Discovery Rate (1 − Precision) & False Omission Rate calculations:
[table comparing the two groups: the direction of the inequality is shown for each metric]
• There was a mistake in the calculation.
• Can we have parity for all metrics?
The impossibility of fairness metrics
In real-life situations where
• the rates of positive cases in the groups are not equal, and
• the model is not perfect,
it is impossible to satisfy all of these:
• False Negative Rate parity
• False Positive Rate parity
• Positive Predictive Value (precision) parity
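One way to see why (following the Chouldechova paper listed in the sources): within a group with base rate p (the share of truly positive cases), the three quantities are linked by

FPR = [p / (1 − p)] × [(1 − PPV) / PPV] × (1 − FNR)

so if two groups have different base rates, equalizing FNR and PPV across them forces their FPRs to differ, unless the model is perfect.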
No One True Fairness Metric
• Feature importance and explainability are not enough
• Decide on the metric first, then do the analysis
• Cannot prove correctness, but can shed light on problems
Tools for calculating fairness
• Aequitas Bias report
• AI Fairness 360 by IBM
What can I do?
Fair models: use models which apply a penalty for unfair behaviour. Drawback: it is a hard optimization task, and some existing models cannot be modified.
Pre-processing: transform the features so that the new features are not correlated with the sensitive variable. Drawback: the newly created features will be meaningless.
Post-processing: use different thresholds for the different groups (see the sketch below). Drawback: it only makes corrections at the marginal cases.
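A minimal sketch of the post-processing option, assuming per-group thresholds have already been tuned offline (for example, so that a chosen metric such as the False Positive Rate matches across groups); every number below is made up.

```python
import numpy as np

# Hypothetical model scores and group membership
scores = np.array([0.31, 0.65, 0.48, 0.82, 0.55, 0.40, 0.71, 0.52])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Post-processing: a different decision threshold per group,
# chosen offline so that the selected fairness metric matches across groups.
thresholds = {"A": 0.60, "B": 0.50}

y_pred = np.array([int(s >= thresholds[g]) for s, g in zip(scores, group)])
print(y_pred)   # -> [0 1 0 1 1 0 1 1]
```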
Transparency can help
• Audit decision-making systems
• Do not store or use sensitive data
• Rethink contract strategy: some algorithms are proprietary
Thank you.
hello@peak1014.com
Image credits
• Mountains under white mist at daytime by Ivana Cajina, https://unsplash.com/photos/HDd-NQ_AMNQ
• Overfitting by Chabacano, own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=3610704
• In an antique shop window by Pattie, https://www.flickr.com/photos/piratealice/3082374723
• Motorcycle wheel and tire by Oliver Walthard, https://unsplash.com/photos/mbL0XQy-d3o
• Woman wearing white and red blouse buying some veggies by Renate Vanaga, https://unsplash.com/photos/2pV2LwPVP9A
• Turned-on monitor by Blake Wisz, https://unsplash.com/photos/tE6th1h6Bfk
• https://worldvectorlogo.com/logo/the-washington-post, https://worldvectorlogo.com/logo/the-guardian-new-2018
• Shallow focus photo of compass by Aaron Burden, https://unsplash.com/photos/NXt5PrOb_7U
Other Sources
• https://www.congress.gov/bill/116th-congress/house-bill/2231/text
• https://eur-lex.europa.eu/eli/reg/2016/679/oj
• https://www.theguardian.com/technology/2019/nov/10/apple-card-issuer-investigated-after-claims-of-sexist-credit-checks
• https://www.washingtonpost.com/health/2019/10/24/racial-bias-medical-algorithm-favors-white-patients-over-sicker-black-patients/
• https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
• Alexandra Chouldechova: Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, https://arxiv.org/pdf/1610.07524.pdf
