Bias in algorithms does not only hurt performance or accuracy. We will discuss cases where machine-learning-based decision support systems learned our racial, gender, and other biases (which are implicitly embedded in everyday data) and thus re-created or even amplified unfair, discriminating, or non-inclusive behaviour.
These cases point out that transparent processes and interpretability are not only tools for understanding and sanity-checking the inner logic and reasoning of a machine learning system. They are also crucial for building trust and for protecting our social infrastructure from the erosion caused by hidden unfair, unethical models.
We will review metrics designed to measure algorithmic fairness and discuss their practical and theoretical limitations, to remind ourselves that ethical behaviour needs to be assessed case by case, not only with technical tools but also with human empathy.
1. Using fairness metrics to solve ethical dilemmas of machine learning
Kovács László
laszlo@peak1014.com
3. Trust & Fairness
Machine learning systems are not inherently fair
Data requirements from the Probably Approximately Correct (PAC) learning theory: plenty of representative data
Bias in, bias out
Algorithms are objective, success criteria are not
5. The letter of the law
European Union: Policy and investment recommendations for trustworthy Artificial Intelligence (guideline)
...
5. Measure and monitor the societal impact of AI
28. Consider the need for new regulation to ensure adequate protection from adverse impacts
12. Safeguard fundamental rights in AI-based public services and protect societal infrastructures
United States of America: Algorithmic Accountability Act of 2019
A bill to direct the Federal Trade Commission to require entities that use, store, or share personal information to conduct automated decision system impact assessments (ADSIA). [... an ADSIA] means a study evaluating an automated decision system and the automated decision system's development process, including the design and training data of the automated decision system, for impacts on accuracy, fairness, bias, discrimination, privacy
6. The spirit of the law
Be fair, do not discriminate along sensitive attributes.
Be transparent, explainable, and interpretable.
7. Explainability (local scope): what features, and by what amount, lead to the prediction for a single case? Gaining the trust of the user.
Interpretability (global scope): what features, and by what amount, does the model generally use to make the predictions? Gaining the trust of the analyst/regulator.
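To make the local/global distinction concrete, here is a minimal sketch (not from the slides; the data, the model, and the crude mean-replacement attribution are my own illustration, dedicated tools such as SHAP or LIME do the local part properly):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy data and model, purely for illustration
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Local scope (explainability): for a single case, how does the predicted
# probability change when each feature is replaced by its dataset mean?
case = X[0:1]
baseline = model.predict_proba(case)[0, 1]
for j in range(X.shape[1]):
    perturbed = case.copy()
    perturbed[0, j] = X[:, j].mean()
    delta = baseline - model.predict_proba(perturbed)[0, 1]
    print(f"feature {j}: local contribution ~ {delta:+.3f}")

# Global scope (interpretability): which features does the model rely on overall?
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("global importances:", np.round(imp.importances_mean, 3))
```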
10. The problem of fairness: error distribution
Confusion matrix (rows: prediction, columns: reality):
                   Reality: Positive   Reality: Negative
Prediction: Pos.   True Positive       False Positive
Prediction: Neg.   False Negative      True Negative
Discrimination or unfairness:
systematic over- or underestimation for members of a group
11. Fairness metrics
The same confusion matrix is computed separately for Group A and Group B within the population; fairness metrics then compare the error distributions of the two groups.
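A minimal sketch (the labels, predictions, and groups below are made-up placeholders) of how such group-wise confusion matrices can be computed:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth, predictions, and sensitive attribute for the population
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# One confusion matrix per group; fairness metrics then compare the groups
for g in np.unique(group):
    mask = group == g
    tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
    print(g, dict(TP=tp, FP=fp, FN=fn, TN=tn))
```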
16. A critique by ProPublica
"Overall, Northpointe's assessment tool correctly predicts recidivism 61 percent of the time. But blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend. It makes the opposite mistake among whites: They are much more likely than blacks to be labeled lower risk but go on to commit other crimes."
17. Results from ProPublica
They calculated the False Positive and False Negative rate
These are not the definitions of the FPR/FNR!
18. Fairness metrics
False Negative Rate = False Negatives / (True Positives + False Negatives)
Among the people who turned out to recidivate, how many did we mark as unlikely to re-offend?
19. Fairness metrics
False Positive Rate = False Positives / (False Positives + True Negatives)
Among the people who turned out to be innocent, how many did we mark as likely to re-offend?
20. Fairness metrics
Precision = True Positives / (True Positives + False Positives)
Among the people whom we marked as likely to re-offend, how many actually did so?
Precision is also known as Positive Predictive Value.
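Putting the three definitions above into code, a minimal sketch (the function name and example counts are my own) that turns raw confusion-matrix counts into the metrics for one group:

```python
# The three metrics from slides 18-20, computed from raw counts of one group
def fairness_metrics(tp, fp, fn, tn):
    return {
        "FNR": fn / (tp + fn),  # re-offenders we marked as unlikely to re-offend
        "FPR": fp / (fp + tn),  # innocent people we marked as likely to re-offend
        "PPV": tp / (tp + fp),  # people marked as likely to re-offend who did so
    }

# Comparing two hypothetical groups
print(fairness_metrics(tp=30, fp=20, fn=10, tn=40))  # {'FNR': 0.25, 'FPR': 0.333..., 'PPV': 0.6}
print(fairness_metrics(tp=15, fp=10, fn=15, tn=60))  # {'FNR': 0.5, 'FPR': 0.142..., 'PPV': 0.6}
```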
21. Results from ProPublica
With proper False Discovery Rate (1 − Precision) & False Omission Rate calculations:
(group-wise comparison table: >, <, <, >)
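For reference, the two rates used in this re-analysis follow from the same confusion-matrix counts; a short sketch with the standard definitions (not taken verbatim from the slides):

```python
# False Discovery Rate: among those predicted to re-offend, the share who did not
# (equivalently 1 - Precision)
def false_discovery_rate(tp, fp):
    return fp / (tp + fp)

# False Omission Rate: among those predicted NOT to re-offend, the share who did
def false_omission_rate(fn, tn):
    return fn / (fn + tn)
```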
22. There was a mistake in the calculation.
Can we have parity for all metrics?
23. The impossibility of fairness metrics
In real-life situations where
the rate of positive cases in the groups is not equal, and
the model is not perfect,
it is impossible to satisfy all of these at once:
False Negative Rate Parity
False Positive Rate Parity
Positive Predictive Value (Precision) Parity
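A small numeric illustration of this impossibility result (my own sketch; the rates and prevalences are arbitrary): if two groups share the same FPR and FNR but have different base rates of positives, their Positive Predictive Value cannot also be equal.

```python
# PPV implied by a group's prevalence, FPR, and FNR:
# expected TP = prevalence * (1 - FNR), expected FP = (1 - prevalence) * FPR
def implied_ppv(prevalence, fpr, fnr):
    tp = prevalence * (1 - fnr)
    fp = (1 - prevalence) * fpr
    return tp / (tp + fp)

fpr, fnr = 0.2, 0.3                 # assume FPR parity and FNR parity both hold
for prevalence in (0.3, 0.5):       # but the groups have different base rates
    print(prevalence, round(implied_ppv(prevalence, fpr, fnr), 3))
# prints 0.3 0.6 and 0.5 0.778 -> Positive Predictive Value parity is violated
```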
24. No One True Fairness Metric
Feature importance and explainability are not enough
Decide on the metric first, then do the analysis
Cannot prove correctness, but can shed light on problems
25. Tools for calculating fairness
Aequitas Bias report
AI Fairness 360 by IBM
26. What can I do?
Pre-processing: transform features in a way so that the new features are not correlated with the sensitive variable. Drawback: the newly created features will be meaningless.
Fair models: use models which apply a penalty for unfair behaviour. Drawback: it's a hard optimization task, and some existing models cannot be modified.
Post-processing: use different thresholds for the different groups (see the sketch below). Drawback: it only makes corrections at the marginal cases.
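As a concrete illustration of the post-processing option, a minimal sketch (the scores, groups, and thresholds are hypothetical; in practice the thresholds would be tuned on a validation set to equalize the chosen fairness metric):

```python
import numpy as np

# Apply a different decision threshold to each group's model scores
def predict_with_group_thresholds(scores, group, thresholds):
    scores = np.asarray(scores, dtype=float)
    cutoffs = np.array([thresholds[g] for g in group])
    return (scores >= cutoffs).astype(int)

scores = [0.62, 0.41, 0.55, 0.70]
group  = ["A", "A", "B", "B"]
print(predict_with_group_thresholds(scores, group, {"A": 0.5, "B": 0.6}))
# -> [1 0 0 1]
```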
27. Transparency can help
Audit decision-making systems
Do not store or use sensitive data
Rethink contract strategy: some algorithms are proprietary
29. Image credits
Mountains under white mist at daytime by Ivana Cajina, https://unsplash.com/photos/HDd-NQ_AMNQ
Overfitting by Chabacano, own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=3610704
In an antique shop window by Pattie, https://www.flickr.com/photos/piratealice/3082374723
Motorcycle wheel and tire by Oliver Walthard, https://unsplash.com/photos/mbL0XQy-d3o
Woman wearing white and red blouse buying some veggies by Renate Vanaga, https://unsplash.com/photos/2pV2LwPVP9A
Turned-on monitor by Blake Wisz, https://unsplash.com/photos/tE6th1h6Bfk
The Washington Post and The Guardian logos: https://worldvectorlogo.com/logo/the-washington-post, https://worldvectorlogo.com/logo/the-guardian-new-2018
Shallow focus photo of compass by Aaron Burden, https://unsplash.com/photos/NXt5PrOb_7U
30. Other Sources
https://www.congress.gov/bill/116th-congress/house-bill/2231/text
https://eur-lex.europa.eu/eli/reg/2016/679/oj
https://www.theguardian.com/technology/2019/nov/10/apple-card-issuer-investigated-after-claims-of-sexist-credit-checks
https://www.washingtonpost.com/health/2019/10/24/racial-bias-medical-algorithm-favors-white-patients-over-sicker-black-patients/
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Alexandra Chouldechova: Fair prediction with disparate impact: A study of bias in recidivism prediction instruments https://arxiv.org/pdf/1610.07524.pdf