When I was 6 or 7, my initial interest in birds lay in sorting them by their characteristics – size, color, behaviors. Later on, the thrill was in discovering new exotic species in my birding books – imagining that giant condors and eagles inhabited the tops of the eucalyptus trees in our neighborhood. More recently, in my high school years, my interest has moved to questions of “why” or “how” – as I explored in an earlier episode, how the iridescence we see in hummingbirds’ gorgets can be applied to camera lens technology, or how the relationships between woodpeckers could point to new ancestor species.
I recently participated in a science writing contest. The following post is a bit different from what I have done in the past – it is a research hypothesis that explores using statistical analysis tools to build a warning system that predicts high-risk zoonotic viruses. Articles in the Economist and Bloomberg BusinessWeek have referenced similar research efforts underway in labs and institutions worldwide. This research is a way for me to combine my interests in ornithology and statistics, and to see how I might apply them to solving a real-world problem!
Several pandemics have been zoonotic – involving animal-transmitted viruses. These animal carriers have exhibited some common features, forming a repetitive data pattern. This study uses past data to predict the future – by analyzing data about past animal transmitters, we can hypothesize about at-risk animal populations.
There have been multiple instances of zoonotic pandemics, including COVID-19, Swine Flu, Ebola, MERS and Avian Flu. In each case, some attributes of the animal carrier increased its likelihood of causing a pandemic. This data can be fed into a statistical model.
The first step in the study is to identify a set of variables – the animal traits most relevant to causing a pandemic. These variables should be causative, unique and measurable. Key variables include numerical values such as population size and population density, boolean values such as domestication, and categorical values such as diet type. Other relevant variables could be whether the animal is a known pathogen host (boolean), its hibernation pattern (categorical) and its range (categorical). Traits such as plumage and nesting structure, however, would not be as relevant. The matrix below illustrates hypothetical data of two pandemic animals.
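To make the variable types concrete, here is a minimal Python sketch of how one row of such a trait matrix could be typed. The class name `AnimalTraits`, the field names, the density unit and the two placeholder rows are all assumptions made for illustration; none of the values come from real measurements.

```python
from dataclasses import dataclass
from enum import Enum


class Diet(Enum):
    """Categorical variable: diet type (assumed set of levels)."""
    HERBIVORE = "herbivore"
    CARNIVORE = "carnivore"
    OMNIVORE = "omnivore"


@dataclass
class AnimalTraits:
    """One row of the trait matrix: a species and its candidate variables."""
    species: str
    population: int             # numerical
    population_density: float   # numerical (individuals per square km, assumed unit)
    domesticated: bool          # boolean
    diet: Diet                  # categorical
    caused_pandemic: bool       # outcome (the y-value)


# Placeholder rows with made-up values, for illustration only.
rows = [
    AnimalTraits("Species A", 1_000_000, 250.0, True, Diet.OMNIVORE, True),
    AnimalTraits("Species B", 40_000, 3.5, False, Diet.HERBIVORE, False),
]

for row in rows:
    print(row.species, row.diet.value, row.domesticated)
```

Keeping the type of each variable explicit at this stage makes the later encoding step (0/1 for booleans, dummy columns for categories) mechanical.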
Next, we can use data reduction techniques to remove irrelevant and confounding variables. A principal component analysis would combine correlated variables into a few components that capture most of the variation in the data; ranking the variables by their r² against the outcome (the y-value) would then narrow the list to only the most relevant ones.
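The r² screening mentioned above could be sketched roughly as follows. This is a plain per-variable correlation filter in Python, not a full principal component analysis, and the column names and numbers are made up for six hypothetical species.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between one variable and the outcome."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov ** 2 / (var_x * var_y)


# Made-up columns for six hypothetical species (not real measurements).
outcome = [1, 1, 1, 0, 0, 0]  # y: 1 = linked to a past pandemic
variables = {
    "population_density": [90, 80, 70, 20, 30, 10],
    "domesticated": [1, 1, 0, 0, 0, 1],
    "plumage_brightness": [5, 5, 5, 5, 5, 5],
}

# Rank variables from most to least associated with the outcome.
ranked = sorted(
    variables,
    key=lambda name: r_squared(variables[name], outcome),
    reverse=True,
)
print(ranked)
```

With these placeholder numbers, the density column ranks first and the constant plumage column last, which mirrors the intuition that plumage would not be a relevant trait.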
The best predictive model would be a binary logistic regression, which uses numerical, categorical and boolean data to estimate the probability that a given animal causes a pandemic. Before creating the model, we must put the data into a consistent numerical form. Numerical data such as population is left unchanged, boolean values such as domestication are expressed as 0 or 1, and categorical data such as diet type is represented with dummy binary (0 or 1) variables.
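As a rough illustration of that encoding step, the Python sketch below turns one trait record into a numeric vector. The function name `encode`, the dictionary keys and the list of diet levels are assumptions for the example.

```python
DIET_LEVELS = ["herbivore", "carnivore", "omnivore"]  # assumed category list


def encode(animal):
    """Turn one trait record into a numeric feature vector."""
    features = [
        float(animal["population_density"]),      # numerical: left unchanged
        1.0 if animal["domesticated"] else 0.0,   # boolean: 0 or 1
    ]
    # Categorical: one dummy (0/1) column per diet level except the first,
    # which serves as the baseline and avoids a redundant column.
    features += [1.0 if animal["diet"] == level else 0.0 for level in DIET_LEVELS[1:]]
    return features


example = {"population_density": 120.0, "domesticated": True, "diet": "omnivore"}
print(encode(example))  # [120.0, 1.0, 0.0, 1.0]
```

Dropping one diet level as the baseline is a common design choice for regression models: a herbivore is simply the row where both dummy columns are 0.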
Computer software like MATLAB can then fit a logistic regression that estimates each animal’s risk level. Applied to a large collection of species, the model ranks the likeliest candidates, allowing scientists to monitor only the top few. Data collected about current animals might resemble this:
This data shows that Rock Doves have characteristics of a potential pandemic animal – a high value for population density, and true values for human consumption and human domestication. Similarly, the Common Raven has a high value for population density and a true value for omnivore. The regression uses this data to estimate the probabilities for both animals.
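To show how such probabilities might be estimated and ranked, here is a minimal sketch in plain Python rather than MATLAB. It fits a logistic regression by simple gradient descent on made-up training rows, then scores two hypothetical candidates. The feature order (density, domesticated, omnivore), the rescaling of density to a 0–1 range (done here only to keep gradient descent stable), the learning rate and all numbers are assumptions, not results.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            row = [1.0] + features  # prepend 1.0 for the bias term
            error = sigmoid(sum(w * v for w, v in zip(weights, row))) - label
            for j, v in enumerate(row):
                grad[j] += error * v
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


def predict(weights, features):
    """Estimated probability that an animal with these traits is high-risk."""
    return sigmoid(sum(w * v for w, v in zip(weights, [1.0] + features)))


# Made-up training rows: [density scaled to 0-1, domesticated, omnivore].
X = [[0.9, 1, 1], [0.8, 1, 0], [0.7, 0, 1], [0.2, 0, 0], [0.3, 0, 1], [0.1, 1, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = linked to a past pandemic (hypothetical)

weights = fit_logistic(X, y)

# Hypothetical candidates to score; names and traits are placeholders.
candidates = {"Candidate X": [0.85, 1, 1], "Candidate Y": [0.15, 0, 0]}
ranked = sorted(
    ((name, predict(weights, traits)) for name, traits in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, risk in ranked:
    print(f"{name}: estimated risk {risk:.2f}")
```

In practice a statistics package would do the fitting; the point of the sketch is only that the output is a probability per species, which can be sorted to pick the few animals worth monitoring.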
Could applying this research make a difference? If a model like this can be designed, governments and scientists could target research and monitoring resources more effectively and roll out preventive measures faster. Predicting which animals could cause the next zoonotic pandemic could greatly reduce the risk of a similar pandemic in the future.