One definition of investigative journalism – as given in UNESCO’s investigative journalism manual – says that it is about exposing the truth about public interest issues, issues whose details are kept under wraps (deliberately or otherwise) by the people involved.
Using this definition, data and the stories extracted from it can constitute a central pillar of investigative journalism. Data is the ‘raw material’ that a journalist uses to cast light into darkness, clear up ambiguities and resolve apparent contradictions in their story. Investigative journalism and data-driven journalism overlap considerably.
They both engage in in-depth research and sift through information to exclude any impurities (fake news or misleading data). Data thus plays an important role in the various stages of an investigation, including how it is presented within the story.
You should state the importance of the data clearly within your report. You should also be careful to distinguish data from facts: data does not necessarily mean fact. There may be biases in the way data is collected, and you should always be careful to test your data and establish how it is linked to the incident you are investigating.
When you first start working on an investigation, look up the data already available, whether official or unofficial, in order to answer as many of your questions as possible. Only then should you move on to posing questions whose answers are unknown and to coming up with hypotheses and possibilities.
A story begins with looking up data. You will need to have a way of collecting and presenting data, which is exactly what data-driven journalism provides. Data will help make your story measurable. It will allow you to render your "hows" as measurable "how muchs", allowing the reader to more clearly see the scope of the problem alongside the added value of any new information that you obtain from private sources of data or from data not initially included.
Let’s take a look at the different stages involved in data-driven journalism:
Stage 1: Looking for sources
First of all, review the available open source data relevant to the investigation. Before doing anything else, make sure you know how often it is published. You should also review any private, verifiable sources of data that you or your employer might gain access to.
Open source data includes reports from the World Bank, the World Health Organisation (WHO) and the Food and Agriculture Organisation (FAO), annual government statistical reports, and social media websites.
Stage 2: Handling the data
There are several programmes that may be of use in handling data:
• Microsoft Excel (spreadsheets)
• OpenRefine (data refining)
• Fusion Tables (verification; discontinued by Google in December 2019)
• MySQL (databases)
• Apache Solr (search and indexing) and Microsoft Access (databases)
There are several new techniques from data-driven journalism that may be of assistance in investigations:
• Analysing data taken from the social media profiles of perpetrators or influential people. This can help you tease out lines of investigation or access information from non-traditional sources (Donald Trump’s tweets about a particular incident, for example, predating his presidency by many years).
• Analysing audience reactions to prominent public issues.
• Accessing historical data relevant to your investigation. For example, working out the dates on which something happened can provide you with new ways of understanding present problems (the date of a famine in a particular country with chronic water supply problems...)
• Working out where something happened (a military operation in a particular country, for example) or where a photo or video clip was taken. You can also recover material that has been deleted by using archiving tools like the Internet Archive's Wayback Machine.
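As a minimal sketch of the archiving point above: the Wayback Machine serves snapshots at addresses of the form https://web.archive.org/web/<timestamp>/<original address>, and normally redirects to the capture closest to that timestamp. The helper name wayback_url and the example page are hypothetical, and the code only builds the address without contacting the archive.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build the address of the archived copy closest to `when`.

    Hypothetical helper: it only formats the URL and makes no network request.
    """
    # The Wayback Machine uses timestamps in YYYYMMDDhhmmss form.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


if __name__ == "__main__":
    # Hypothetical example: a page as it looked around 1 August 2018.
    print(wayback_url("https://example.com/statement", datetime(2018, 8, 1)))
```

Opening the resulting address in a browser shows whether a copy of the page exists for that period. If no capture was ever made, there is nothing to recover.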
When refining large quantities of data, you can analyse and compare using a particular chronological or geographical filter. This can give your story new dimensions that may not have been immediately clear. If you go deep into the data, you may even find new stories.
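The kind of chronological and geographical filter described above can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the column names (date, region, incidents), the sample figures and the helper names load_rows and filter_rows are all hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical sample data standing in for a refined dataset.
SAMPLE_CSV = """date,region,incidents
2019-01-15,North,4
2019-03-02,South,7
2019-06-20,North,9
2020-02-11,North,3
"""


def load_rows(text):
    """Parse CSV text into dictionaries with typed values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": date.fromisoformat(row["date"]),
            "region": row["region"],
            "incidents": int(row["incidents"]),
        })
    return rows


def filter_rows(rows, region=None, start=None, end=None):
    """Keep rows matching the region and falling between start and end (inclusive)."""
    kept = []
    for row in rows:
        if region is not None and row["region"] != region:
            continue
        if start is not None and row["date"] < start:
            continue
        if end is not None and row["date"] > end:
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    north_2019 = filter_rows(rows, region="North",
                             start=date(2019, 1, 1), end=date(2019, 12, 31))
    print(len(north_2019), sum(r["incidents"] for r in north_2019))
```

Narrowing the same dataset to a different region or year only requires changing the arguments, which makes it easy to compare slices of the data side by side.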
Following the riots that took place across England in August 2011, the Guardian was able to establish who had been responsible for the looting. Its Reading the Riots project, conducted in cooperation with the London School of Economics (LSE), was heavily data-driven.
The Panama Papers project drew on more than 11.5 million documents making up 2.6 terabytes of data dating from 1977 to 2015 and concerning about 214,000 corporate entities.
The International Consortium of Investigative Journalists (ICIJ) incorporated the data into a database that makes sifting through it and searching it much easier.
The Paradise Papers project likewise incorporates about 13.4 million documents, obtained by the Süddeutsche Zeitung, that show how the world’s super-rich invest their money (ICIJ).
Stage 3: Analysis
After collating and refining the data, there are several methods you can use to analyse it:
• Descriptive analysis: answers the questions “what?”, “who?”, “how?”, “where?” and “when?”
• Diagnostic analysis: answers the question “why?”
• Advanced analysis: makes predictions about future scenarios. A successful example is provided by Noun Post’s report Golden Generals.
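To illustrate the first of these methods, here is a minimal sketch of a descriptive analysis using Python's standard statistics module. The district names, the figures and the describe helper are hypothetical and serve only to show the idea.

```python
from statistics import mean, median

# Hypothetical figures: reported cases per district.
CASES = {
    "District A": 12,
    "District B": 30,
    "District C": 18,
    "District D": 20,
}


def describe(values_by_name):
    """Summarise the data: how much in total, what is typical, where is the peak."""
    values = list(values_by_name.values())
    # The name whose value is largest answers the "where?" question.
    highest = max(values_by_name, key=values_by_name.get)
    return {
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "highest": highest,
    }


if __name__ == "__main__":
    for label, value in describe(CASES).items():
        print(f"{label}: {value}")
```

A summary like this only describes what the data contains. Explaining why District B stands out would be a matter for diagnostic analysis and further reporting.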
Stage 4: Preparing a data-driven investigation
When putting together your story, there are various tools you can use to present data in a way that is easier to understand:
• Charts and graphs
• Interactive maps
Tools that may be of interest include Tableau Public and Many Eyes, which will allow you to present data visually in a range of different ways, and Geocommons and Google Fusion Tables, which will allow you to produce maps using coordinates. The AJMI has produced a guidebook to data-driven journalism that provides detailed instructions on how to go about doing this.
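Many mapping tools can read coordinates supplied in the GeoJSON format. As a rough sketch, assuming you have a list of place names with latitudes and longitudes, the following Python code turns it into a GeoJSON document. The to_geojson helper and the sample sites are hypothetical. Note that GeoJSON lists longitude before latitude.

```python
import json


def to_geojson(places):
    """Convert (name, latitude, longitude) tuples into a GeoJSON FeatureCollection."""
    features = []
    for name, lat, lon in places:
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Hypothetical locations, given as (name, latitude, longitude).
    sites = [("Site 1", 10.5, 14.2), ("Site 2", 11.0, 13.9)]
    print(json.dumps(to_geojson(sites), indent=2))
```

The printed text can be saved to a file and loaded into whichever mapping tool you are using, provided that tool accepts GeoJSON.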
Saving data and documents
There are various programmes you can use to store data:
Google Drive: Google Drive is linked to your personal email account. It can be used as a digital folder for saving data. You can also work on files in it directly, whether through the Google Docs interface or through Google Sheets (a spreadsheet tool similar to Excel).
Xperia Companion: Download this programme to produce backup copies of your data. It allows you to transfer files easily from one device to another and store them safely.
Dropbox: Dropbox allows you to keep your files safe in a cloud folder. You can then access them wherever you are in the world.
Verifying open source material
Traditional methods may seem like a better bet when trying to expose difficult facts, but the development of advanced techniques for gathering news from open source and user-generated content is playing an ever-bigger role in investigative journalism.
In 2018, the BBC conducted an investigation in Cameroon which proved that, contrary to what many had believed, government forces had been committing war crimes against civilians. The investigation took months of research and drew on a video clip taken with a mobile phone camera and published on social media, showing armed men assaulting and then executing two women and two children. The video clip was verified and analysed scientifically.
The armed men, the place where the incident took place and the type and source of the weapons used were all identified. By comparing the scene in the video with Google Maps imagery, the team were able to prove that government forces, not Boko Haram, had carried out the executions, and that they had taken place in Cameroon, not Mali. We will look at some of the details of the incident, and the digital tools used to analyse it, later on.
In Sudan, BBC journalists collected and reviewed more than 300 videos shot by activists on the ground, allowing them to reconstruct a scene showing that the Rapid Support Forces had fired live ammunition at protesters. The investigation was published in July 2019.
In both of these cases, journalists were able to dispense with traditional methods and with teams on the ground while conducting their investigations. Thousands of videos shared on social media websites, carefully verified, were the deciding factor in the investigation. Fact-checking began with the investigation of fabricated photos or decontextualised video clips.
But these techniques have improved and their use has become more sophisticated, creating a space for a new type of open source journalism. These modern techniques can be combined with traditional techniques to produce high-quality investigations.
An earlier version of this article first appeared in the AJMI publication, Investigative Journalism Handbook