This document discusses risks and mitigations when releasing data. It defines risk as the probability of something happening multiplied by the resulting cost or benefit. There are risks of physical harm, legal harm, reputational harm, and privacy breaches to data subjects, collectors, processors, and those releasing the data. Risk levels can be low, medium, or high. The document provides strategies for mitigating risks such as considering partial data releases, including locals to assess risks in local languages/contexts, and being aware of how data may interact with other datasets. It also discusses handling personally identifiable information by learning to spot red flags like names, addresses, exact locations, codes, and uncommon terms that could identify individuals.
1. Risks and mitigations of releasing data
Risk analysis and complexity in de-identifying and releasing data.
Sara-Jayne Terp
RDF Discussion
2. First, Do No Harm
“If you make a dataset public, you have a responsibility, to the best of your knowledge, skills, and advice, to do no harm to the people connected to that dataset. You balance making data available to people who can do good with it and protecting the data subjects, sources, and managers.”
4. RISK
“The probability of something happening
multiplied by the resulting cost or benefit
if it does” (Oxford English Dictionary)
Three parts:
• Cost/benefit
• Probability
• Subject (to what/whom)
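The three parts above can be sketched in code. This is a minimal illustration, not a method from the slides: the `Risk` class, the sign convention for `impact`, and the example numbers are all assumptions made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One risk, with the three parts named on the slide."""
    subject: str        # to what/whom
    probability: float  # chance of the event happening, between 0 and 1
    impact: float       # assumed convention: positive = cost, negative = benefit

    def score(self) -> float:
        """Probability multiplied by the resulting cost or benefit."""
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        return self.probability * self.impact


# Hypothetical numbers, for illustration only.
risks = [
    Risk("data collectors", probability=0.5, impact=10.0),
    Risk("data subjects", probability=0.25, impact=100.0),
]

# Rank the risks so the largest expected cost is reviewed first.
ranked = sorted(risks, key=Risk.score, reverse=True)
for risk in ranked:
    print(f"{risk.subject}: {risk.score()}")
```

A single number hides a lot (a rare but catastrophic harm and a frequent minor one can score the same), so a ranking like this is probably best treated as a prompt for discussion rather than a decision rule.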
5. Subjects: Physical
“Witnesses told us that
a helicopter had been
circling around the
area for hours by the
time the bakery opened
in the afternoon. It
had, perhaps, 200
people lined up to get
bread. Suddenly, the
helicopter dropped a
bomb that hit a building
11. Risk to Whom?
• Data subjects (elections example)
• Data collectors (conflict example)
• Data processing team (military equipment example)
• Person releasing the data (corruption example)
• Person using the data
14. PII
“Personally identifiable information (PII) is any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII.”
15. Learn to spot Red Flags
• Names, addresses, phone numbers
• Locations: lat/long, GIS traces, locality (e.g. home + work as an identifier)
• Members of small populations
• Untranslated text
• Codes (e.g. “41”)
• Slang terms
• Can be combined with other datasets to produce PII
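Some of these red flags can be partly screened for automatically before a human review. The sketch below is only a rough first pass under stated assumptions: the function names, the list of column-name tokens, and the regular expressions are hypothetical, and they will produce both false positives and misses. Untranslated text, codes, and slang are not covered at all and still need a human reader.

```python
import re
from collections import Counter

# Hypothetical column-name tokens that suggest direct identifiers.
IDENTIFIER_TOKENS = {"name", "address", "phone", "lat", "lon", "latitude", "longitude"}

# Rough heuristics only: a run of digits that looks like a phone number,
# and a pair of decimal numbers that looks like a lat/long coordinate.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")
LATLONG_RE = re.compile(r"-?\d{1,3}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}")


def find_red_flags(rows):
    """Return {column: [reasons]} for columns that look like they hold PII."""
    flags = {}
    for row in rows:
        for column, value in row.items():
            reasons = flags.setdefault(column, set())
            # Split the column name into tokens so "full_name" matches "name".
            tokens = set(re.split(r"[^a-z0-9]+", column.lower()))
            if tokens & IDENTIFIER_TOKENS:
                reasons.add("column name")
            text = str(value)
            if LATLONG_RE.search(text):
                reasons.add("lat/long")
            elif PHONE_RE.search(text):
                reasons.add("phone number")
    return {column: sorted(reasons) for column, reasons in flags.items() if reasons}


def rare_values(rows, column, min_count=5):
    """Values seen fewer than min_count times: possible small populations."""
    counts = Counter(row[column] for row in rows if column in row)
    return sorted(value for value, n in counts.items() if n < min_count)


rows = [
    {"full_name": "A. Example", "notes": "call +1 555 010 9999",
     "where": "12.3456, -1.2345", "district": "North"},
]
print(find_red_flags(rows))
```

The last red flag on the slide, combination with other datasets, cannot be checked by looking at one dataset alone, so a clean result here does not mean the data is safe to release.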
16. Consider Partial Release
Release to only some groups
• Academics
• People in your organisation
• Data subjects
Release at lower granularity
• Town/district level, not street
• Subset or sample of data ‘rows’
• Subset of data ‘columns’
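The lower-granularity options above might look like the following in practice. This is a hedged sketch: `partial_release` and `coarsen_location` are hypothetical helper names, and rounding coordinates is just one assumed way of moving from street level toward town/district level. Neither step guarantees that the released data is de-identified.

```python
import random


def coarsen_location(lat, lon, decimals=1):
    """Round coordinates so they point to a wide area, not a street.

    One decimal place is very roughly 10 km of latitude; the right
    precision depends on how densely populated the area is.
    """
    return round(lat, decimals), round(lon, decimals)


def partial_release(rows, keep_columns, sample_fraction, seed=0):
    """Release a sample of the rows and only the listed columns."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample_size = round(len(rows) * sample_fraction)
    sampled = rng.sample(rows, sample_size)
    return [
        {column: row[column] for column in keep_columns if column in row}
        for row in sampled
    ]


# Hypothetical records, for illustration only.
rows = [
    {"street": f"{i} Example Road", "district": "North", "reports": i}
    for i in range(10)
]
released = partial_release(rows, keep_columns=["district", "reports"], sample_fraction=0.5)
print(released)
print(coarsen_location(51.5074, -0.1278))
```

Dropping the `street` column here plays the role of "town/district level, not street"; in a real dataset the remaining columns would still need checking against the red flags before release.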
17. Include locals
Locals can spot:
• Local languages
• Local slang
• Innocent-looking phrases
Locals might also choose the risk