Apr 20 2020
Even with 75% of the population in New York’s Capital Region staying at home, COVID-19 infections are likely to peak locally in late May. The prediction comes from a machine learning model designed to estimate the impact of the pandemic even in smaller cities.
If the share of individuals staying at home falls to 50%, the local peak would instead arrive in early June.
Malik Magdon-Ismail, a computer scientist at Rensselaer Polytechnic Institute, has tailored his models to work with sparse data, like the data available for smaller cities or during the initial phase of a pandemic. Such sparse data normally makes trend-spotting difficult.
There are no simple, robust, general tools that, for example, officials in Albany could use to make projections. These models show that the projections vary enormously from one city to another. This knowledge could relieve some of the uncertainty that is around in developing policy.
Malik Magdon-Ismail, Professor of Computer Science, Rensselaer Polytechnic Institute
Magdon-Ismail is also an expert in pattern recognition, data mining, and machine learning.
To develop the models, Magdon-Ismail used county data available from the New York State Department of Health and Mental Hygiene. The models can forecast local features of the pandemic, such as its infectious force, the rate of infections over time, estimates of asymptomatic infections, and the rate at which mild infections become serious.
The research is ongoing and, because the work is time-sensitive, earlier versions of the models have been posted on the arXiv preprint server, which is moderated but not peer reviewed.
Magdon-Ismail’s model for the Capital Region incorporates data through April 10th, 2020, from Rensselaer, Albany, Schenectady, and Saratoga counties. Using a total at-risk population of 855,000, the model predicts that daily confirmed infections will peak at 1,490 on June 8th, 2020, with 50% of the population remaining at home, or at 750 on May 28th, 2020, with 75% of the population remaining at home. Total infections would reach 58,000 or 29,000, respectively.
As of April 10th, 2020, there were about 1,000 confirmed infections, and according to the model, there could have been as many as 14,000 asymptomatic cases at that time.
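To put these figures in proportion, the short calculation below expresses each projected total as a share of the at-risk population and compares estimated asymptomatic cases with confirmed ones. It uses only the numbers quoted above; the variable and function names are illustrative and do not come from the researcher's model.

```python
# Figures quoted in the article for the Capital Region model.
AT_RISK_POPULATION = 855_000

# Projected total infections under each stay-at-home scenario.
PROJECTED_TOTALS = {
    "50% at home": 58_000,
    "75% at home": 29_000,
}

CONFIRMED_APRIL_10 = 1_000      # approximate confirmed infections
ASYMPTOMATIC_APRIL_10 = 14_000  # upper estimate from the model


def share_of_population(total, population=AT_RISK_POPULATION):
    """Return a projected total as a fraction of the at-risk population."""
    return total / population


def asymptomatic_ratio(asymptomatic=ASYMPTOMATIC_APRIL_10,
                       confirmed=CONFIRMED_APRIL_10):
    """Return estimated asymptomatic cases per confirmed infection."""
    return asymptomatic / confirmed


for scenario, total in PROJECTED_TOTALS.items():
    print(f"{scenario}: {share_of_population(total):.1%} of at-risk population")
print(f"Asymptomatic cases per confirmed infection: {asymptomatic_ratio():.0f}")
```

On these numbers, the 50% scenario corresponds to roughly 6.8% of the at-risk population and the 75% scenario to roughly 3.4%, with up to 14 estimated asymptomatic cases for each confirmed infection.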
Smaller cities are difficult to model with machine learning because the available data points are limited and updated less often than those for an epicenter like New York City or for the country as a whole. Generic machine learning applied to such data would most likely produce wrong predictions. To overcome this problem, Magdon-Ismail works with simple models and employs “robust” algorithms that consider solutions beyond the single mathematically optimal one.
The machine gives you the model that best fits the data, but it turns out the best is usually a very fragile principle. There are lots of different models, lots of different explanations that are essentially as good. To make the output robust, we consider the collection of models that have near-optimal levels of consistency with the data. I find a variety of models that fit the data, and then I use all of those models together to predict.
Malik Magdon-Ismail, Professor of Computer Science, Rensselaer Polytechnic Institute
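The idea in the quote above, keeping every model that fits nearly as well as the best one and predicting with all of them, can be sketched in a few lines. This is a minimal illustration only, not the researcher's actual method: it assumes a simple exponential growth model, a small hypothetical grid of candidate parameters, made-up daily counts, and an arbitrary tolerance.

```python
import math

# Hypothetical (day, confirmed cases) observations; not real data.
OBSERVATIONS = [(0, 10), (1, 13), (2, 15), (3, 21), (4, 26)]

# Hypothetical candidate models: (initial cases, daily growth rate).
CANDIDATES = [
    (c0, rate)
    for c0 in (8, 9, 10, 11, 12)
    for rate in (0.15, 0.20, 0.25, 0.30, 0.35)
]


def predict(params, day):
    """Cases on a given day under simple exponential growth (an assumption)."""
    c0, rate = params
    return c0 * math.exp(rate * day)


def fit_error(params, observations):
    """Mean squared error in log space between the model and the data."""
    return sum(
        (math.log(predict(params, day)) - math.log(count)) ** 2
        for day, count in observations
    ) / len(observations)


def near_optimal_models(observations, candidates, tolerance):
    """Keep every candidate whose error is within `tolerance` of the best."""
    scored = [(fit_error(p, observations), p) for p in candidates]
    best_error = min(err for err, _ in scored)
    return [p for err, p in scored if err <= best_error + tolerance]


def ensemble_forecast(models, day):
    """Return (lowest, mean, highest) prediction across the kept models."""
    predictions = [predict(p, day) for p in models]
    return min(predictions), sum(predictions) / len(predictions), max(predictions)


kept = near_optimal_models(OBSERVATIONS, CANDIDATES, tolerance=0.01)
low, mean, high = ensemble_forecast(kept, day=10)
print(f"{len(kept)} near-optimal models; day-10 forecast "
      f"between {low:.0f} and {high:.0f} (mean {mean:.0f})")
```

Reporting the spread across the kept models, rather than a single best-fit number, shows how much the forecast depends on which of the near-equivalent explanations is chosen.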
According to Magdon-Ismail, creating analogous models for other smaller cities in New York State would be as simple as “running the numbers.”
In earlier work, also posted on arXiv, Magdon-Ismail applied this algorithm to data from the very outset of the pandemic in the United States. With only a few infections reported from January 20th to March 14th, 2020, that data was as limited as the data available in smaller cities. It also offered another insight: a forecast of what the virus would do if left unchecked.
Early data is captured in the analogy: if you want to learn about a lion, you don’t observe the lion in the zoo, you have to observe the lion on the savannah. And basically what that means is early dynamics of the pandemic. Nobody really knows what’s going on, nobody really knows whether it’s serious, so nobody’s really done anything. And that’s where you see how it will really behave.
Malik Magdon-Ismail, Professor of Computer Science, Rensselaer Polytechnic Institute
Machine Learning Models Predict COVID-19 Impact in Smaller Cities
Video Credit: Rensselaer Polytechnic Institute.