COVID-19 resources for Data Scientists

Berkeley, and the UC System in general, have many projects researching COVID-19. There is no end of research directions that can and should be explored for us to understand the health, societal, and economic implications of the disease. Here is a collection of resources for ongoing projects at Berkeley and a few other institutions.


Published on March 13, 2020, this op-ed by Justin Lessler captures what is still considered an accurate overall prediction of Covid and society: “Coronavirus will linger after the pandemic ends. But it won’t be as bad. We have a long, painful process ahead of us before it’s just a part of normal life, though.”

Berkeley Engineering has a collection of research efforts on Covid.

This and many other conversations about Covid at Berkeley (ongoing, frequent YouTube videos) are starting points for research, complete with Berkeley experts you can contact for guidance.

Videos at the above link which are sponsored by our Division:


Berkeley, MIT, CMU, Illinois, Chicago, Princeton, Microsoft, and a couple of other places have teamed up with C3.ai to create the C3.ai Digital Transformation Institute, an AI research consortium (press release). The first call for proposals (due May 1, sadly) for Covid + AI research has gone out. There will be others.

Most useful for us is the list of topics for research awards. These areas should spur your thoughts:

  1. Applying machine learning/AI methods to mitigate the spread of the Covid pandemic
  2. Genome-specific Covid medical protocols, including precision medicine of host responses
  3. Biomedical informatics methods for drug design and repurposing
  4. Design and sharing of clinical trials for collecting and analyzing data on medications, therapies, and interventions
  5. Modeling, simulation, prediction of Covid propagation and efficacy of interventions
  6. Logistics and optimization analysis for design of public health strategies and interventions
  7. Rigorous approaches to designing sampling and testing strategies
  8. Data analytics for Covid research harnessing private and sensitive data, including the role of edge computing/IoT for gathering data
  9. Improving societal resilience in response to the spread of the Covid pandemic
  10. Broader efforts in biomedicine, infectious disease modeling, response logistics and optimization, public health efforts, tools, and methodologies around the containment of rising infectious diseases, and response to pandemics so as to be better prepared for future infectious diseases.

Down the street from Berkeley, the Stanford Institute for Human-Centered AI has a list of open research projects related to Covid.

Johns Hopkins’ Center for Health Security has a fantastic array of coverage, including research, media, and a collection of “Fact Sheets” about Covid. For example, there are many social media and mobile apps available for tracking users’ contact with Covid.

The New England Journal of Medicine (NEJM), The Lancet (and its sister journal, The Lancet Infectious Diseases), and the Journal of the American Medical Association (JAMA) are the mainstays of medical research.

The New England Journal of Medicine’s high-level editorial series called “Covid-19 Notes” discusses responses and recent changes by healthcare providers and suppliers for Covid.


Most of the official, trustworthy references for Covid are raw numbers from the CDC, the WHO, and various state and federal websites:

  • https://www.who.int/emergencies/diseases/novel-coronavirus-2019
  • https://www.cdc.gov/coronavirus/2019-ncov
  • http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Epidemiology, the study of how disease is distributed in human populations and what drives it, is a mature field. Understanding a little about epidemiology will help you offer insights that are useful to practitioners. Consider questions such as:

  • How do we know 5-10% of Americans are diabetic? Did we count them all?
  • How do we know what the proximal cause of death is for an individual? For a country?
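The short answer to the first question is that nobody counts everyone: prevalence is estimated from a sample, with an interval that expresses the uncertainty. Here is a minimal sketch of that arithmetic using the Wilson score interval for a proportion. The survey numbers are hypothetical, and the function name `prevalence_interval` is ours, not from any library. It also assumes a simple random sample, which real health surveys rarely are (they use weighting and stratification).

```python
import math


def prevalence_interval(positives, sample_size, z=1.96):
    """Estimate prevalence and a Wilson score interval from a simple random sample.

    z=1.96 corresponds to an approximate 95% confidence level.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= positives <= sample_size:
        raise ValueError("positives must be between 0 and sample_size")

    p_hat = positives / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    center = (p_hat + z2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1 - p_hat) / sample_size + z2 / (4 * sample_size**2))
        / denom
    )
    return p_hat, center - half_width, center + half_width


# Hypothetical survey: 80 of 1,000 randomly sampled adults have the condition.
estimate, low, high = prevalence_interval(positives=80, sample_size=1000)
print(f"estimated prevalence: {estimate:.1%} (95% interval {low:.1%} to {high:.1%})")
```

With a sample this size the interval spans a few percentage points, which is why prevalence is often quoted as a range rather than a single number.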

A great reference for understanding disease surveillance and tests for disease: Leon Gordis, Epidemiology (any edition).

Watch for several fundamental challenges that popular media miss about as often as they get them right.

  • Different countries report different things, so compare two countries’ reports only with extreme caution. First, it’s well established that diseases affect populations differently (at least slightly), based on race and country of origin. Second, reporting bodies use different standards for what is or is not reported: for example, some cities in China reported people who died of the disease in the same category as those who survived, since neither group is still actively transmitting the disease.
  • In particular, you must distinguish the case fatality rate (deaths divided by confirmed cases, which is known) from the infection fatality rate (deaths divided by all infections, including undiagnosed ones, which is unknown and must be estimated).
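To see why the two fatality rates differ, here is a small sketch of the arithmetic. The case fatality rate divides deaths by confirmed cases; the infection fatality rate divides deaths by all infections, detected or not. Every number below is hypothetical, and the detection fraction in particular is an assumption that in practice has to be estimated (for example, from serological surveys).

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Deaths divided by confirmed (diagnosed) cases."""
    return deaths / confirmed_cases


def infection_fatality_rate(deaths, estimated_infections):
    """Deaths divided by all infections, including undiagnosed ones."""
    return deaths / estimated_infections


# Hypothetical figures, for illustration only.
deaths = 50
confirmed_cases = 1_000
detection_fraction = 0.2  # ASSUMPTION: only 1 in 5 infections is ever confirmed

estimated_infections = confirmed_cases / detection_fraction

cfr = case_fatality_rate(deaths, confirmed_cases)
ifr = infection_fatality_rate(deaths, estimated_infections)

print(f"CFR: {cfr:.1%}")
print(f"IFR: {ifr:.1%} (depends entirely on the assumed detection fraction)")
```

The same death count yields very different rates depending on the denominator, so a country that tests widely can report a lower case fatality rate than one that tests only the severely ill, even if the disease is equally deadly in both.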

Finally, Kaggle has ongoing projects and public databases.
