Early Data Sharing Can Contain Pandemics: A Call For Collaboration

Apr 03, 2020

The coronavirus pandemic is creating unprecedented acts of technological collaboration around the world. As just one example, the Covid-19 Open Research Dataset is making more than 24,000 coronavirus papers available, and searchable, in one place.

It is inspiring to see the world’s best scientists and researchers working together, and separately, to accelerate the creation of new treatments, assess the efficacy of existing ones, and rapidly investigate novel anti-contagion and prophylactic innovations.

But to create a safer world, we need to turn our attention to the full epidemiological continuum. That continuum starts with the earliest stages of a future pandemic, when the warning signals are present but often scattered and indistinct. Detecting them requires extensive collection of both structured and unstructured data, and machine learning to analyze it.

Nothing we can do has a greater impact on preventing the spread of a pathogen than sophisticated detection and extrapolation. Right now, in cities around the world, the scary math of doubling is showing us the incredible value of time.

With the current focus on suppression and containment, what has been overlooked is that a small company called GeoSure (disclosure: I am an advisor and shareholder) was able to recognize the dangerous signals in Wuhan earlier than just about anyone. GeoSure is a travel safety company that analyzes data from thousands of sources simultaneously and then applies predictive analytics to create risk scores. In this case, it even analyzed unstructured data, such as changes to airline and train schedules and their capacities, for valuable insights and signals.

Based on that analysis, on January 24th GeoSure raised the Health & Medical Risk score for Wuhan to 100, the highest possible. This was at a time when the Chinese government was reporting fewer than 8,000 cases, a number GeoSure found to be understated by a factor of 10. At the same time, GeoSure raised risk levels for Seattle, Washington, and other gateway cities by 20 percent or more, a sharp escalation.

Moreover, the predictive signals screamed that in the coming weeks the virus would spread, and that the actual number of cases was likely, at 95 percent confidence, to be higher by a factor of at least 10! GeoSure also recognized that the virus was more widely distributed geographically than had been asserted. In fact, GeoSure predicted substantial near-term global impact across more than 130 cities, far ahead of the WHO’s official categorization of COVID-19 as a pandemic on March 11.

That a tiny risk-modeling tech start-up was able to generate such timely, accurate and forward-looking data, ahead of agencies like the CDC and the WHO, NGOs, and massive for-profit corporations with domain expertise, demonstrates the imperative of data sharing that brings the world’s best minds in epidemiology and data science together. Such sharing also frees us from having to rely on any one country’s representation of the facts.

The sources of the raw data available to GeoSure, and to others, include government agencies; independent researchers and academics; large corporations that share data (many, unfortunately, do not); and purpose-driven startups in the data and analytics space. These startups are the heat-seeking data missiles society needs.

Imagine, though, if more data sources were opened up and more people were able to innovate on top of them: the equivalent of the open research dataset I described earlier. We need an Open Dataset for location-based signals, which will democratize access to a safer world.

We need a collaboration revolution. Industry leaders in all sectors must transform the way they gather and process data by opening up both their processes and their sources. In a recent National Public Radio story on 3M, which makes tens of millions of surgical masks, one of its executives casually noted that the company had picked up “strange disease patterns coming out of China.” The world had a right to know immediately what those patterns were; it is hard to imagine any long-term corporate benefit from keeping that data proprietary. Companies can avoid regulatory handcuffs by sharing such signals without releasing material non-public information.

Since all eyes are on the future of COVID-19, what GeoSure’s data is saying with confidence is that the pathogen will remain a health risk for at least six to twelve months. GeoSure does not expect it to abate for four to six months, even if we do everything right.

The world is visibly under siege from the invisible, and with no time to lose, we must call on all those researching the coronavirus to open their data sources and come together to tackle this enormous global challenge. Just as one person can end up infecting 134 others, one data point can be magnified to save the lives of thousands.

The pandemic is showing that borders are arbitrary, and our biology is no different no matter where we live. If we don’t share our data, our planet will share the burden.