The power of “information”


I have always been a fan of graphs and visualization, and during the last months I have certainly seen my share of them. It seems everyone is now a statistician or data scientist, building regression models and their own predictions of the pandemic. With the data available, you too can become a homemade corona statistician. However, think before you start crunching data and end up visualizing information in the same way as everyone else: days since something on one axis and count of something on the other.

What data are we actually converting to “information”?

The underlying data that most analyses rely on is the count of confirmed cases. But doesn’t the count of confirmed cases depend on how eager a country is to test people and confirm a case? Some countries are eager to test and some are not. There is speculation that the actual count of cases is 25-30 times bigger. What! Then what is the figure we are actually monitoring on a daily basis? I guess this applies to everyday problems of analysis: how reliable and consistent is the data, and why does the viewer not care about its validity?

Comparing countries and regions against each other using these counts seems to be the typical way to display corona “information”. Although I like absolute figures, I am more of a relative measure man myself. I find it hard to figure out the state of Japan when it is compared to the Diamond Princess (confirmed cases March 11th: Japan 639 vs. Diamond Princess 706). Or, in what relation should I compare Iran to New York? Logarithmic scales, where the slope of the curve reflects the doubling speed, are a good idea. They at least allow for some comparability of the situation.
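To make the doubling speed concrete, here is a minimal sketch of how it could be estimated from two counts taken some days apart. It assumes steady exponential growth between the two observations, which real data rarely follows exactly. The function name `doubling_time` and the example numbers are made up for illustration.

```python
import math


def doubling_time(count_start, count_end, days):
    """Estimate the doubling time in days, assuming steady exponential growth.

    On a logarithmic scale the curve is a straight line whose slope is the
    growth rate; the doubling time is ln(2) divided by that slope.
    """
    if count_start <= 0 or days <= 0:
        raise ValueError("count_start and days must be positive")
    if count_end <= count_start:
        raise ValueError("count_end must exceed count_start for growth")
    growth_rate = math.log(count_end / count_start) / days  # slope on a log scale
    return math.log(2) / growth_rate


if __name__ == "__main__":
    # Hypothetical region: 100 cases growing to 400 cases over 10 days.
    print(f"Doubling time: {doubling_time(100, 400, 10):.1f} days")
```

Because the result depends only on the ratio between the two counts, a large country and a cruise ship can be compared on the same footing.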

What could be a relevant measure?

A more relevant data set to use could be the count of people hospitalized due to corona. One measure could be the percentage of people in the region or country who are hospitalized. That is a ratio relative to inhabitants, and therefore comparable across countries. Combining this information with care capacity, namely suitable ventilators, could be the next step. I have already seen some figures on ventilators per capita, which were certainly interesting. There are clear differences in the starting point for this ratio, as some countries are clearly better prepared (when it comes to equipment at least). As almost everyone globally now needs more ventilators simultaneously, the global supply chain is choked. It seems the countries with the most domestic production capacity will be best placed to satisfy their needs. Note that the number of ventilators is therefore gradually increasing, which should be considered in the measure.

The current track record shows that 20-30% of the hospitalized require intensive care. Of those, 50-70% will require artificial breathing aid. If I do the math correctly (20% × 50% at the low end, 30% × 70% at the high end), roughly 10-20% of the hospitalized require ventilators. Therefore, taking the count of hospitalized people (not the assumed confirmed cases), multiplying it by 10-20% and comparing that figure against the available ventilators should tell how severe the situation is per region/country. Even though I am all for transparency, I am not sure I would like to see that number for my country or region…
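The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the shares are the rough ranges quoted in the text, while the function names and the region in the example are hypothetical.

```python
# Shares taken from the text; treat them as rough, early estimates.
ICU_SHARE = (0.20, 0.30)          # share of hospitalized needing intensive care
VENT_SHARE_OF_ICU = (0.50, 0.70)  # share of ICU patients needing a ventilator


def ventilator_share():
    """Return the (low, high) share of hospitalized patients needing a ventilator."""
    low = ICU_SHARE[0] * VENT_SHARE_OF_ICU[0]
    high = ICU_SHARE[1] * VENT_SHARE_OF_ICU[1]
    return low, high


def ventilator_load(hospitalized, ventilators_available):
    """Return (low, high) estimated ventilator demand divided by supply.

    A value above 1.0 means estimated demand exceeds the available ventilators.
    """
    if ventilators_available <= 0:
        raise ValueError("ventilators_available must be positive")
    low, high = ventilator_share()
    return (hospitalized * low / ventilators_available,
            hospitalized * high / ventilators_available)


if __name__ == "__main__":
    # Hypothetical region: 2,000 hospitalized, 500 ventilators (made-up numbers).
    low, high = ventilator_load(hospitalized=2000, ventilators_available=500)
    print(f"Ventilator demand / supply: {low:.2f} to {high:.2f}")
```

Multiplying the ends of the two ranges gives 10% to 21%, which is where the rounded 10-20% comes from. Since the number of ventilators grows over time, the supply figure would need to be updated along with the hospitalization count.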

Power of analysis and visualization

I am grateful for the sheer amount of publicity the graphs are receiving. It has been a great way to spread the news that we are facing a tragic pandemic. Even though the underlying data might be misleading, this information has most certainly triggered rapid action by governments, and the death numbers convince even doubtful people of the evident threat. These times have been an education in data analytics and the power of visualization for many. Even now we are starting to see the first positive global pollution heat maps as a result of the reduced travel and manufacturing in China. Go China!

I would, however, predict that economics will triumph over the environment, and soon we will go from viewing ‘confirmed cases and deaths’ to aftermath analyses of economic consequences such as ‘number of bankruptcies’. When those analyses start popping up, I hope to see the information in ratios, not in plain figures. I also hope the underlying data used for the analyses will be more valid, e.g. ‘companies bankrupt’ instead of ‘companies confirmed to be in financial distress’.