COVID-19 Data Repository

The NIST COVID19-DATA repository is being made available to aid in meeting the White House Call to Action for the Nation’s artificial intelligence experts to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19.

This repository provides direct access to features within the CORD-19 dataset so that researchers can interactively browse and search it by keyword, author, or institution. AI researchers can query the data directly, including both the relevant keywords and the full text of the articles. We hope these contributions give the research community easy ways to access the underlying information in the collection. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date.


Note: If you are just looking for the static data files from this site, they are available in XML, JSON and CSV formats here:

Format        Size     # of Files
XML (tar)     4.2 G    144,241
XML (zip)     1.3 G    144,241
JSON (tar)    4.2 G    144,241
JSON (zip)    1.4 G    144,241
CSV (tar)     4.1 G    144,241
CSV (zip)     1.3 G    144,241
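
For readers who download one of the static archives listed above, the following is a minimal Python sketch of scanning a JSON tar archive for a keyword without unpacking it to disk. It is an illustration, not part of the repository's own tooling. The function names (iter_json_records, mentions, search_archive) and the archive filename in the usage comment are hypothetical, and because the per-article JSON schema is not described here, the sketch makes no assumptions about field names and simply searches every string value in each record.

```python
import json
import tarfile
from typing import Any, Iterator, List, Tuple


def iter_json_records(archive_path: str) -> Iterator[Tuple[str, Any]]:
    """Yield (member_name, parsed_json) for every .json file in a tar archive."""
    # "r:*" lets tarfile detect compression (none, gzip, bz2, xz) automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile() or not member.name.lower().endswith(".json"):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            with handle:
                try:
                    yield member.name, json.load(handle)
                except (json.JSONDecodeError, UnicodeDecodeError):
                    # Skip files that are not valid JSON rather than abort the scan.
                    continue


def mentions(value: Any, keyword: str) -> bool:
    """Return True if any string nested anywhere in value contains keyword."""
    if isinstance(value, str):
        return keyword.lower() in value.lower()
    if isinstance(value, dict):
        return any(mentions(item, keyword) for item in value.values())
    if isinstance(value, list):
        return any(mentions(item, keyword) for item in value)
    return False


def search_archive(archive_path: str, keyword: str) -> List[str]:
    """List the archive members whose JSON content mentions keyword."""
    return [
        name
        for name, record in iter_json_records(archive_path)
        if mentions(record, keyword)
    ]


# Example usage (the filename is a placeholder for whatever you downloaded):
# matches = search_archive("covid19-json.tar", "spike protein")
# print(len(matches), "articles mention the keyword")
```

Searching every string value is slow on a collection of this size, but it works regardless of how the records are structured. Once the actual field names are known, restricting the search to specific fields would be considerably faster.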

Additional features provided here include:
  • Ability to browse/search all articles
  • Full text of documents for AI researchers
  • Keyword analysis, with intuitive links for common terms
  • Query by field: select which fields to search from the schema that represents the papers
  • Expanded formats in CSV, JSON, and XML
  • Direct links to articles, authors, institutions, and licenses
  • Programmatic access via a REST API
  • See this help file for examples of how to use this system