What is the one COVID-19 graph everyone should follow?

It seems the only thing multiplying faster than COVID-19 is the data and analysis about it. How can we cut through the noise and focus on the crucial facts? 

Check out our Q&A below with Webster Gova, Umuzi’s head of data science, as he introduces his favourite analysis, the one we should all be following to track how quickly COVID-19 is spreading and to judge whether we can lift the lockdown and safely return to places of work and learning.

If you enjoy Webster’s analysis and want to go deeper, check out his Medium post on COVID-19 visualisations.

If you’d like to learn more about becoming a data scientist with Umuzi, check out Umuzi’s data science careers page as well as our great list of data science self study resources.

Question: If you had to pick just one COVID-19 analysis or graph for us to follow, which would it be and why?

Webster: We would like a graph that shows not just how many people are infected, but how fast the virus is spreading at any given point in time in the local communities where infections are detected, such as districts and municipalities. This will help us understand in real time which interventions, other than total lockdowns, can help slow the spread of the virus.

Question: Help us understand this graph. The horizontal axis is time and we can see the key dates when South Africa implemented travel restrictions and a general lockdown. The vertical axis is some measure of how COVID-19 is spreading, R(t). What is R(t) and why do you think we should pay attention to it?

Webster: The people who study the spread of infectious diseases, epidemiologists, like to talk of an R0 value, which shows how quickly an infectious disease spreads. If a virus has an R0 of 1, every infected person will, on average, infect 1 additional person. An R0 greater than 1 means each infected person will, on average, infect more than 1 other person, which suggests that the virus spreads rapidly and could quickly infect everyone in the population. A value below 1 suggests that it would spread more slowly and disappear before everyone gets infected. R0 is a static figure which allows us to compare the infectiousness of different diseases.

Rt is the real-time version of R0, also known as the effective reproductive number, or the reproductive number at time t. Rt shows how the infectiousness of a disease changes over time, depending on the circumstances. It helps us identify sources of new infection earlier, as well as the actual local transmission rate for those infections at any given moment.

The coronavirus spreads really quickly, and it can take over a week before we know a person is infected, during which time they can infect other people. We are implementing various measures to try to slow its spread, so we need a graph that can show us, in real time, how changes in behaviour and testing in different places affect the spread of the virus.

This graph shows the Rt for South Africa. The local spread of the virus has been declining over time since the initial cases were detected in travellers who had returned from high-risk countries. In my opinion, those initial infections are the ones that spread very quickly, since they could only be detected 10 days into the infection. As a result of the government’s quick response to test travellers and limit new infections, local transmission slowed down even before the lockdown. The graph shows how the rate at which the infection spread remained almost constant after flight restrictions, as no new infections were being introduced from outside South Africa. After the lockdown started, this graph suggests that infections in local communities have been minimal, since the infections detectable in the first week of April have not increased as much as they did before the lockdown. It suggests that testing, contact tracing, social distancing, self-isolation and other social behavioural changes before and after the lockdown have played a huge role in reducing the spread of the virus in South Africa.
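
To make the reproduction number described above concrete, here is a minimal Python sketch of what a value below, at, or above 1 means for case counts. It is not Webster's model: it assumes a constant reproduction number and counts infections per "generation" (each round of people infected by the previous round). The function name `project_cases` and the starting figure of 100 cases are hypothetical, chosen only for illustration.

```python
def project_cases(initial_cases, r, generations):
    """Expected new infections in each generation, assuming a constant
    reproduction number r (a simplification: in reality Rt changes over time)."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        # Each infected person infects r others on average.
        cases.append(cases[-1] * r)
    return cases


if __name__ == "__main__":
    # Compare a shrinking (r < 1), steady (r = 1) and growing (r > 1) outbreak.
    for r in (0.8, 1.0, 1.5):
        trajectory = project_cases(initial_cases=100, r=r, generations=5)
        print(f"R = {r}: " + ", ".join(f"{c:.0f}" for c in trajectory))
```

With a value below 1 each generation is smaller than the last, at exactly 1 the number of new infections stays flat, and above 1 it grows with every generation.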

Question: Why does the graph show three different levels of R(t) - high, mid, low? 

Webster: The three levels reflect statistical uncertainty. The calculation relies on some estimates, and the reported data may not be fully accurate. The most likely value is the average, while the “low” and “high” values are the “best case” and “worst case” estimates of the value, respectively.
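
The interview does not say how the three levels were calculated. One common way to produce such a band, shown here purely as a sketch, is to generate many plausible estimates of Rt for a given day and then report their average together with a low and a high percentile. The function name `summarise_rt`, the 5th and 95th percentile cut-offs and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def summarise_rt(samples, lower_pct=5, upper_pct=95):
    """Summarise a collection of Rt estimates as low / mid / high.

    lower_pct and upper_pct are whole-number percentiles (assumed here to be
    5 and 95); "mid" is the average, matching the description in the interview.
    """
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one Rt estimate is required")

    def percentile(p):
        # Nearest-rank percentile using integer arithmetic: rank = ceil(p * n / 100).
        rank = max(1, -(-p * n // 100))
        return ordered[rank - 1]

    return {
        "low": percentile(lower_pct),
        "mid": mean(ordered),
        "high": percentile(upper_pct),
    }


if __name__ == "__main__":
    # Hypothetical Rt estimates for a single day.
    estimates = [0.9, 1.0, 1.1, 1.2, 1.3]
    print(summarise_rt(estimates))
```

The wider the gap between "low" and "high", the less certain the estimate for that day.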

Question: When is it safe for our communities to lift lockdown and when is it safe for us to return to places of work and learning?

Webster: The shortest answer is when the Rt value has dropped below 1, but we all know that might take a long time. We must remain vigilant, or the Rt could shoot up after lockdown is lifted. We need to accept that the coronavirus will remain in our population for a long time to come. We simply need to avoid situations where we unnecessarily put vulnerable people at risk through multiple re-introductions from workplaces, recreational centres, schools and churches. In my humble opinion, those working in environments where social distancing is possible can return to work, provided they continue practising all the precautions needed to slow down the spread of the virus. Otherwise, the risk of transmission still exists. Taking precautions, getting tested and self-isolating have proven to be reasonably effective at slowing the spread.

Question: If we want to learn more about your COVID-19 analysis, where can we find it?

Webster: I will post updates on LinkedIn and Medium as soon as anything changes in my analysis, and I will also have some charts on Datawrapper.

Question: How did you become a data scientist and what advice do you have for anyone wanting to explore this career path?

Webster: I was working as a chemical analyst at Plascon, and my then manager encouraged me to attempt a problem related to multivariate analysis of some chemical samples, better known at the time as chemometrics. This basically means analysing more than 2 variables: in that case, spectral data and close to 10 variables for all historical samples from 6 months to a year. The only way to solve that problem was to apply data science. I have gone through a combination of self-learning (on DataCamp, Coursera and AWS) and formal education (I completed a Master’s and am now busy with my PhD) to keep up to date with the right combination of skills.
