Let's explore how statistics can trick us into believing something that's not true. This isn't always done on purpose. We'll look at how focusing on recent events, choosing specific data to look at, and making assumptions about the size of a group can lead us to the wrong conclusions. We'll show how graphs and numbers can be used in misleading ways. The presentation aims to teach you how to look at statistics critically, understand their limits, and avoid fooling yourself with numbers.
OSMC 2024 | The Subtle Art of Lying with Statistics Dave McAllister.pdf
1. The Subtle Art of Misleading with Statistics
Dave McAllister
Sr. Open Source Technologist, NGINX
4. The Power of Statistics
©2024 F5
It's a data-driven world
Evidence-Based Decisions
Performance Tracking
Resource Allocation
Identifying Patterns
Setting Realistic Goals
Data Interpretation and Decision-making
Performance Metrics and KPIs
Cultural Impact
Risk Management
It is the mark of a truly intelligent person to be moved by statistics. - George Bernard Shaw
5. But the Numbers Game Is Just That, a Game
Misinterpretation
Cherry-Picking Data
Correlation vs. Causation
Visualization
Ignoring Margin of Error
Spurious Precision
Survivorship Bias
Figures don't lie, but liars can figure. - Darrell Huff
6. #1: Say What You Mean
The Meaning of Averages
Your KPI for downtime is 1 hour per month trailing.
Which one would you rather report?
1.56 hours/month or 0.53 hours/month
It is now proved beyond doubt that smoking is one of the leading causes of statistics. - Fletcher Knebel
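One plausible reading of the two downtime figures is that they are the mean and the median of the same trailing data. The sketch below illustrates that reading only. The twelve monthly values are hypothetical, invented so that they reproduce the slide's two numbers; they are not data from the talk.

```python
from statistics import mean, median

# Hypothetical monthly downtime in hours for a trailing 12 months.
# These values are invented for illustration; they are not from the talk.
downtime_hours = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.56, 0.6, 0.7, 0.8, 1.0, 13.06]

# The mean is pulled upward by the single long outage.
mean_downtime = mean(downtime_hours)
# The median describes the typical month and barely notices the outlier.
median_downtime = median(downtime_hours)

print(f"mean:   {mean_downtime:.2f} hours/month")
print(f"median: {median_downtime:.2f} hours/month")
```

Under these assumed values, one figure misses a 1 hour/month KPI and the other meets it, even though both are legitimate "averages" of the same months.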
7. #1: Say What You Mean
The Meaning of Averages
Current average income per person in US = 59,384 USD
Average household size in US = 2.63 people
Household income obviously = $156,179.92
(It's not)
To earn a million dollars, you just need to have 14.8 kids, right?
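The household arithmetic on this slide can be reproduced in a few lines. This is only a sketch of the naive calculation the slide is poking fun at, not a correct way to estimate household income. The assumption that two adults are already in the household is mine, inferred from the "14.8 kids" figure, and is not stated in the talk.

```python
# Figures quoted on the slide.
income_per_person = 59_384   # USD
household_size = 2.63        # people

# The tempting (and, per the slide, wrong) step: multiply two averages.
naive_household_income = income_per_person * household_size

# Assumption (not stated in the talk): two adults already in the household.
adults = 2
people_needed = 1_000_000 / income_per_person
kids_needed = people_needed - adults

print(f"naive household income: ${naive_household_income:,.2f}")
print(f"kids needed for $1M:    {kids_needed:.1f}")
```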
8. #2: Never Deviate
A statistical point without range is meaningless
The mean temperature is 61°F/16.1°C
In Death Valley, this is in a range of 15°F - 104°F
A statistical distribution without expressing deviation is worthless
Sample response times from server
Sample A: [100ms, 105ms, 110ms, 102ms, 108ms]
Sample B: [50ms, 100ms, 150ms, 90ms, 160ms]
Std Deviation A = 4.1ms (sample standard deviation)
Std Deviation B = 45.3ms (sample standard deviation)
[Figures: Normal Distribution; Paranormal Distribution. Image source: How Sample Size Affects Standard Error - dummies]
In ancient times they had no statistics so they had to fall back on lies. - Stephen Leacock
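The spread of the two response-time samples on this slide can be checked with Python's standard library. The sketch prints the mean, the range, and both the population and the sample standard deviation, since the value reported depends on which formula is used.

```python
from statistics import mean, pstdev, stdev

# Response times from the slide, in milliseconds.
sample_a = [100, 105, 110, 102, 108]
sample_b = [50, 100, 150, 90, 160]

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(
        f"Sample {name}: mean={mean(sample):.1f}ms "
        f"range={min(sample)}-{max(sample)}ms "
        f"population sd={pstdev(sample):.1f}ms "  # divides by n
        f"sample sd={stdev(sample):.1f}ms"        # divides by n - 1
    )
```

The two means are close to each other, yet the deviation differs by roughly an order of magnitude, which is the slide's point about reporting a distribution without its spread.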
9. #3: Sample for Your Desired Results
TANSTAAFS (with apologies to Robert Heinlein)
Results are only as good as the sample data.
But it can be really convincing
Size matters (4 out of 5 SREs prefer OpenTofu)
Bias is the hidden trap
If you can't find the bias, look harder
If you sample only iPhone users, Android doesn't exist
So is access
If you sample in a first class airline lounge, well
Questions (and interviewers) can be their own bias
Accuracy of observation is the equivalent of accuracy of thinking. - Wallace Stevens
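To see why "4 out of 5" says so little, it can help to put an interval around the proportion. The sketch below uses the Wilson score interval, which is one common choice and my own pick rather than anything named in the talk. The 400-out-of-500 survey is hypothetical and is there only for contrast.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Same 80% headline, very different amounts of evidence.
small = wilson_interval(4, 5)      # "4 out of 5"
large = wilson_interval(400, 500)  # hypothetical larger survey with the same ratio

print(f"n=5:   {small[0]:.0%} to {small[1]:.0%}")
print(f"n=500: {large[0]:.0%} to {large[1]:.0%}")
```

With five respondents the interval spans well over half of the possible range, so the headline number is compatible with both a minority and a near-unanimous preference.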
10. #4: Don't Fear Proving Your Predictions
Cognitive bias
Selection bias
Recency bias
Status Quo bias
If there have been no recent outages or incidents
Perception that the system is very stable
Complacency
Inadequate preparation for potential failures
The type of measure used placed constraints on which statistics can be used. - Stanley Smith Stevens
11. #5: Don't Sweat the Bias
You are trialing a new deployment tool for speed and reliability
You sample via survey the senior engineers who were invited to try the new tool
1. Self-selection Bias: Do the respondents cover the entire group?
2. Confirmation Bias: Is the leading team wording questions to get skewed answers?
3. Recency Bias: Is the last experience what is being reported?
4. Status Quo Bias: Is change scary?
Definition of Statistics: The science of producing unreliable facts from reliable figures. - Evan Esar
12. #5: Don't Sweat the Bias
You are trialing a new deployment tool for speed and reliability
Q1: How happy are you with the new and improved tool?
Q2: Would you agree that the new tool has streamlined your deployment?
Q3: How beneficial has the new tool been?
Confirmation Bias: Is the wording of questions leading to skewed answers?
13. DevOps Survey and Bias {Tape|Tires|Ply|Cut}
You are trialing a new deployment tool for speed and reliability
Q1: How happy are you with the new and improved tool based on your most recent deployment?
Q2: How much did the tool help in your last deployment?
Q3: Most recently, how beneficial has the new tool been?
Recency Bias: Is the last experience what is being reported?
14. DevOps Survey and Bias
You are trialing a new deployment tool for speed and reliability
Q1: How disruptive was the new tool?
Q2: Is the existing process more reliable?
Q3: How much effort did you spend on using the new tool?
Status Quo Bias: Is change scary?
22. #8: Obviously If B Follows A, Then A Causes B
You want to prove you have too much automation.
Your DevOps team notices that as the number of deployment automation scripts increases, the number of incidents in production increases.
Those who use 'Correlation is not the same as causation' as a magic incantation to dismiss all fact-using professions are fools holding a lit match in one hand and an open gas can in the other, screaming, 'One has nothing to do with the other!'
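One way to probe the automation example is to ask whether a third factor could drive both numbers. The simulation below is a sketch under invented assumptions: a hypothetical confounder (the number of services a team runs) increases both the script count and the incident count, while scripts have no effect on incidents at all. None of the coefficients come from the talk.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical confounder: number of services each of 200 teams runs.
services = [random.uniform(10, 100) for _ in range(200)]
# Both quantities depend on services, not on each other.
scripts = [2.0 * s + random.gauss(0, 10) for s in services]
incidents = [0.5 * s + random.gauss(0, 5) for s in services]

raw = pearson(scripts, incidents)
adjusted = pearson(residuals(scripts, services), residuals(incidents, services))

print(f"scripts vs incidents:                r = {raw:.2f}")
print(f"same, after accounting for services: r = {adjusted:.2f}")
```

In this made-up setup the raw correlation is strong, and it largely disappears once the confounder is accounted for. A real analysis would still need to check which direction, if any, the causation runs.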
25. #9: Find (or Make) the Data You Want
Data Dredging and p-hacking
Data dredging: The practice of searching through large volumes of data to find patterns or relationships that can be presented as statistically significant.
P-hacking: Manipulating the data analysis process to achieve statistically significant results.
Instead of starting with a hypothesis and testing it, I instead abused the data to see what correlations shake out. It's a dangerous way to go about analysis, because any sufficiently large dataset will yield strong correlations completely at random. - Tyler Vigen
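The claim that a large enough search will turn up strong correlations by chance can be demonstrated with pure noise. The sketch below is an illustration under assumed sizes (20 data points, 1000 unrelated candidate metrics), all of them hypothetical. The 0.44 cutoff is an approximation of the conventional two-sided p < 0.05 threshold for 20 points.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(7)  # fixed seed so the run is repeatable

n_points = 20        # e.g. 20 months of observations
n_candidates = 1000  # unrelated metrics to search through

# Everything here is independent random noise: no real relationship exists.
target = [random.gauss(0, 1) for _ in range(n_points)]
candidates = [[random.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_candidates)]

correlations = [pearson(c, target) for c in candidates]
best = max(correlations, key=abs)
# |r| > 0.44 is roughly the p < 0.05 cutoff for 20 points (two-sided).
hits = sum(1 for r in correlations if abs(r) > 0.44)

print(f"strongest correlation found in noise: r = {best:.2f}")
print(f"'significant' results out of {n_candidates}: {hits}")
```

Reporting only the strongest of those results, without mentioning the other candidates that were tried, is the dredging step the slide warns about.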
28. #10: Post Hoc, Ergo Propter Hoc
After this, therefore because of this
Initial source: Public Religion Research Institute.
Publish date: 01-Nov-2021
(they did a good job, btw)
Total sample: 2508
People trusting far-right sources completely: 90
People trusting Fox News completely: ~66
So at 82%, 54 people out of 66 believe
Or 54 out of 2508
2.15%
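The subgroup arithmetic on this slide is easy to verify. The sketch uses only the figures quoted on the slide; the variable names are mine.

```python
# Figures quoted on the slide.
total_sample = 2508
trust_fox_completely = 66   # approximate, per the slide
share_who_believe = 0.82    # the headline percentage

# Convert the headline percentage back into a head count.
believers = round(trust_fox_completely * share_who_believe)
share_of_total = believers / total_sample

print(f"{believers} of {trust_fox_completely} is "
      f"{believers / trust_fox_completely:.0%} of the subgroup")
print(f"{believers} of {total_sample} is "
      f"{share_of_total:.2%} of the full sample")
```

The same head count reads as a large majority or a small fraction depending on which denominator is reported.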
29. The answer is pointless without the question
What do you get if you multiply 6 x 9?
Which of course is 42 (in base 13)
30. A Simple Way to Mislead?
Statistics are informative, enlightening and pretty cool.
But always apply the ultimate test: Does this make sense?
Look out for:
Conscious and Unconscious bias
Data Dredging
Unclear or incomplete information
Confounding variables
Pretty pictures
Numbers don't lie. People do. - Hugh Barker
32. Some suggested reading
Statistics for the Rest of Us - Albert Rutherford
How to Lie with Statistics - Darrell Huff
Lying Numbers - Hugh Barker
Naked Statistics - Charles Wheelan
Statistics for Dummies - Deborah J Rumsey
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics - Gary Smith
Misleading Statistics Real Life Examples Of Data Misuse (datapine.com)
Spurious Correlations (tylervigen.com)
A History of Bayes' Theorem
Statistics are history. And history is always written by the winners. - Dave McAllister
33. NGINX and Open Source
We will continue to offer our open source projects under OSI-approved licenses
We will not remove features from our open source technology to move them to a paid feature
We will not impose artificial limits on the use of our open source projects
We will commit to consistency and transparency in our acceptance of contributions