Presentation by Anne-Lise Bouyer (Journalism++) at the Digital Journalism Day conference (21 May 2015) at SWPS
Anne-Lise Bouyer - Data-driven agencies for visualisation
1. Anne-Lise Bouyer
COO & project manager of Journalism++
Data-driven agencies
for visualisation
@annelisebouyer | 2015
2. Anne-Lise Bouyer
COO and project manager of Journalism++
Anne-Lise Bouyer | 2015
We are an agency for data-driven storytelling
We accompany newsrooms in their transitions towards
the web of data
We are building tools to help journalists to work with data
9. // Body counting
No NGO or public body had an answer.
The Greek administration counts the number
of migrants who cross.
But not the number of migrants who do not
make it.
10. // Body counting
We gathered all available information
mentioning migrants and refugees who died on
their way to or in Europe.
Lists, news archives, PDF files, databases...
14. // Body counting
After deduplication, we had structured
information about approximately 2,800 events.
We then geocoded them using a simple
heuristic.
19. // Body counting
With this information, we could assess how
dangerous each migration route was.
The wall made the journey to Greece 10 times
more dangerous for refugees.
21. // Costs / Benefits analysis
Data-driven projects take time.
Investment in time is huge and is best split
across a team.
For each story, you still need to analyze the
collected data.
22. // Costs / Benefits analysis
Investment in time is largest at the beginning.
With time, information flows to you and your
listening capabilities increase (social media
searches, Google alerts).
#7: In early 2013, together with a group of 10 European journalists, we wanted to assess the impact of this wall. It's a 10-kilometer-long, 4-meter-high razor-wire wall between Greece and Turkey. Almost the whole border is formed by the Evros river; this stretch of land was where most refugees crossed into Greece and then into the European Union. The wall was built in December 2012.
#8: OSINT = open source intelligence, i.e., using information that has already been published. It's a term from intelligence, as opposed to HUMINT (humans gathering information, the equivalent of shoe-leather journalism) and SIGINT (signals intelligence, which you cannot do in journalism unless you are breaking the law).
#12: This PDF is from United Against Racism, an NGO that aggregates data on the subject. But it is very badly structured. You can see that the dates don't have a common format, and neither does the name column, which also contains information about gender and age. Some events are spread over 2 lines, others are on just one. There are also definition problems: United Against Racism counted foetuses as victims, for instance.
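As an illustration of the date problem only, here is a minimal Python sketch of normalizing mixed date formats. The talk does not list the formats found in the PDF, so `CANDIDATE_FORMATS` and the helper `parse_date` are hypothetical and would need to be adapted to the real file.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical formats: the talk does not say which ones the PDF used.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%d.%m.%y", "%Y-%m-%d")


def parse_date(raw: str) -> Optional[date]:
    """Try each candidate format in turn; return None if none matches."""
    cleaned = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # not this format, try the next one
    return None  # leave unparseable dates for manual review
```

Returning `None` rather than guessing keeps ambiguous rows visible, so a person can check them by hand.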
#13: There is a project to automatically extract this kind of information from news reports: PULS, by the University of Helsinki. It crawls over 80,000 news sources in 8 languages almost in real time and extracts what it considers relevant information from each, in a Big Data fashion. It was also the least useful data source: the duplicates, the errors and the missing data points made it useless.
#14: Here is an example of why it's hard to automate data gathering on such a topic. First of all, most of the articles were not in English. We had to work with all the languages of the Mediterranean and Europe. Then, information can be hidden. Interest in refugee deaths varies greatly over time. Sometimes you'll have 20 articles for a single death; sometimes (most of the time), no article at all. Here, a short blurb from 2001 says that one migrant died and that another had died a few weeks before (bolded sentence).
#15: The simple heuristic we used was to extract all geographical information from the data into one list (the list contained words like Great Britain, Evros river, French, etc.). We ordered the list from the least precise descriptor (an adjective of a country) to the most precise (a city). We assigned the location of each event based on the most precise point we had for it. This very fast method proved 90% effective.
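The heuristic described in this note can be sketched in Python roughly as follows. This is not the team's actual code: the `GAZETTEER` entries, the city "Calais", the approximate coordinates and the function name `geocode` are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

Coords = Tuple[float, float]  # (latitude, longitude)

# Hypothetical list, ordered from least precise to most precise.
# Coordinates are rough placeholders, not values from the project.
GAZETTEER = [
    ("French", (46.2, 2.2)),          # adjective of a country
    ("Great Britain", (54.0, -2.0)),  # country / island
    ("Evros river", (41.0, 26.4)),    # geographic feature
    ("Calais", (50.95, 1.86)),        # city
]


def geocode(description: str) -> Optional[Tuple[str, Coords]]:
    """Return the most precise place term found in the text, if any."""
    best = None
    for term, coords in GAZETTEER:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, description, re.IGNORECASE):
            # Later entries are more precise, so they overwrite earlier ones.
            best = (term, coords)
    return best
```

For example, `geocode("French police found a body near Calais")` matches both "French" and "Calais" and keeps "Calais", because it comes later in the ordered list.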
#20: The number of people who died purely because of the wall is around 50. The extra number of people who crossed by sea after the wall was built was about 5,000 (there were 5,000 crossings by sea in 2012 and 10,000 in 2013). With a mortality rate of 1%, that means 50 people died.
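The arithmetic behind this estimate can be written out as a short Python sketch. The figures are the rounded ones quoted in this note; the variable names are ours.

```python
# Rounded figures quoted in the speaker note.
sea_crossings_2012 = 5_000
sea_crossings_2013 = 10_000
mortality_rate_percent = 1  # 1% of sea crossings end in death

# Crossings that shifted to the sea route after the wall was built.
extra_crossings = sea_crossings_2013 - sea_crossings_2012

# Integer arithmetic avoids floating-point rounding noise.
estimated_deaths = extra_crossings * mortality_rate_percent // 100

print(extra_crossings, estimated_deaths)  # 5000 50
```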
#21: We published the story with a team of journalists throughout Europe and published the data. It is now used by the International Organization for Migration, which used it as the basis for their recent report. In other words, our OSINT data is now the official data.
#22: The data in itself is not valuable; you still need to make a story out of it.
#23: Investment is larger at the beginning because, as time goes by, what's needed is simply to update the database. With a Google Alert, it takes just a few minutes to structure the information about an event in the database.
For The Migrants' Files, it took about 100 hours to compile the data for everything until 2013. Once we were running, in 2014, it was quicker to add data. It was still valuable: we showed that the Mare Nostrum operation, by which the Italian navy tried to fetch every single boat leaving Libya, went from a great success to a disaster between May and October.