Tuesday, 21 April 2020
By Ronald Holtshausen and Helena Tran

Our role as independent Forensic Data Analysts is to stand aside and let the data tell the story. In doing so, our goal is to ensure the story is true, impartial and in context. It is the last part, ‘in context’, that really matters, because although data doesn’t lie, it can be skewed to tell a variety of truths.

The COVID-19 pandemic is frightening and is affecting all our lives, not to mention awakening our natural curiosity to learn more about something we don’t yet fully understand. This drive to seek out and consume as much information as possible should not be confused with being informed. In a world where data is always at our fingertips, information is one thing, but using that information to generate insights requires context.

For example, on a day-to-day basis we are confronted by new buzzwords and statistics in the media such as ‘cases’, ‘tests conducted’, ‘patients recovered’ and ‘deaths’. Of that list, it’s the ‘cases’ figure that many of us (and the media) look at to judge how countries (including our own) are managing, containing and recovering from the virus. But considering these statistics out of context can lead to panic, as we have already seen, so it is crucial to read them in the context of how many people have been tested and of the country’s population and density.

The effectiveness of countries at containing the virus, with context

To illustrate this, we have selected five countries and first charted their number of confirmed cases (Figure A), then compared that with charts using additional contextual data (as at 15 April 2020).

Cases per country

Looking at just the number of cases per country, a simple visual comparison quickly suggests that the United States is significantly less effective than other countries at managing the spread of the virus, which is a common media headline.

Cases as a percentage of the population

However, once we introduce some contextual data on each country’s population and calculate the number of cases as a percentage of the population (Figure B), our initial rankings change significantly: the United States now moves to third (of our five examples) and Iceland moves from appearing the most effective to the least effective at managing the virus.

Cases as a percentage of COVID-19 testing conducted

We can extend this comparison by including another contextual data set: the extent of COVID-19 testing conducted. Using this, we can calculate the number of confirmed cases as a percentage of those tested (Figure C). Interestingly, the United Kingdom now appears the least effective, despite having fewer cases than the United States and Italy.

The number of COVID-19 tests conducted in a country can also give us a level of confidence in how well the reported cases represent that country’s population as a whole. Table 1 below shows the percentage of each country’s population tested for COVID-19.

As seen above, Iceland has done the most widespread testing of the five, giving us the highest level of confidence that its reported cases reflect the population as a whole. To illustrate this further, we have taken the same chart as Figure B and coloured each country according to our proposed confidence level, from green (confident) to red (not confident) (Figure D).

Population density

Finally, let’s consider one last contextual data set: population density. Population density is relevant given the apparently high person-to-person transmission rate of the virus (i.e. the closer people are to one another, the easier it may be for the virus to spread).

Taking into account the population density (Table 2) and considering our analysis of cases as a percentage of tests completed (Figure C), we note that the results for the United Kingdom, Italy, Iceland and Australia all correlate closely with their population density. The exception is the United States, which seems to be battling to suppress the virus despite having a significantly lower population density than the United Kingdom and Italy. Again, putting this into context, a potential explanation is New York, which represents approximately a third of all cases in the United States (at the time of writing), and where Manhattan has a (staggering) density of 27,346 people per square kilometre.

Conclusion

As seen above, comparing countries purely by the number of cases, or by any single statistic alone, does not tell the whole story. Through this article and the simplified analysis conducted, we have seen the same data suggest three different countries (the United States, Iceland and the United Kingdom) as being the least effective at dealing with the spread of COVID-19.

The above may not surprise you, given the extent to which COVID-19 has been covered by the media. But what if the topic were not COVID-19? What if the data related to a corporate dispute, legal case or class action regarding an oil spill, warranty claims on motor vehicles, a construction dispute or any more specific and esoteric issue where the media coverage or context may not be so public or well understood?

It is in those cases that an independent Forensic Data Analyst can help, using a forensic analysis methodology, technology and analytic algorithms to let the data tell its story without bias, and referencing other data sources to contextualise the findings and uncover insights that would otherwise be hidden.
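
As a worked illustration of the comparisons described in this article, the short Python sketch below ranks three fictional countries by raw cases, by cases as a percentage of population and by cases as a percentage of tests. Everything in it is an assumption made for illustration: the country names, the figures and the helper names (`CountryStats`, `pct`, `rank_worst_first`) are hypothetical and are not the data or method behind Figures A to D.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryStats:
    name: str
    cases: int       # confirmed cases
    tests: int       # tests conducted
    population: int  # total population


def pct(numerator: int, denominator: int) -> float:
    """Return numerator as a percentage of denominator."""
    return 100.0 * numerator / denominator


def rank_worst_first(stats, metric):
    """Return country names ordered from highest metric value to lowest."""
    return [s.name for s in sorted(stats, key=metric, reverse=True)]


# Placeholder figures for fictional countries, NOT the article's data.
SAMPLE = [
    CountryStats("Alpha", cases=500_000, tests=2_500_000, population=300_000_000),
    CountryStats("Beta", cases=2_000, tests=40_000, population=400_000),
    CountryStats("Gamma", cases=100_000, tests=400_000, population=50_000_000),
]

# Raw case counts (the style of comparison in Figure A).
by_cases = rank_worst_first(SAMPLE, lambda s: s.cases)

# Cases as a percentage of population (the style of comparison in Figure B).
by_population = rank_worst_first(SAMPLE, lambda s: pct(s.cases, s.population))

# Cases as a percentage of tests conducted (the style of comparison in Figure C).
by_tests = rank_worst_first(SAMPLE, lambda s: pct(s.cases, s.tests))

# Share of the population tested (the style of measure in Table 1).
coverage = {s.name: pct(s.tests, s.population) for s in SAMPLE}

if __name__ == "__main__":
    print("By cases:          ", by_cases)
    print("By % of population:", by_population)
    print("By % of tests:     ", by_tests)
    print("% population tested:", coverage)
```

With these placeholder numbers, each of the three rankings puts a different country first, which is the same effect the article observes with real data: the choice of denominator changes the story.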

Sources
• https://www.worldometers.info/world-population/population-by-country/
• https://ourworldindata.org/covid-testing
• https://coronavirus.jhu.edu/map.html
• https://worldpopulationreview.com/boroughs/manhattan-population/
• https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200415-sitrep-86-covid-19.pdf