Title:
|
Sampling bias and incorrect rooting make phylogenetic network tracing of SARS-COV-2 infections unreliable |
Published:
|
2020-06-09 |
Journal:
|
Proc Natl Acad Sci U S A |
DOI:
|
10.1073/pnas.2007295117 |
DOI_URL:
|
http://doi.org/10.1073/pnas.2007295117 |
Author Name:
|
Mavian Carla |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/mavian_carla |
Author Name:
|
Pond Sergei Kosakovsky |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/pond_sergei_kosakovsky |
Author Name:
|
Marini Simone |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/marini_simone |
Author Name:
|
Magalis Brittany Rife |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/magalis_brittany_rife |
Author Name:
|
Vandamme Anne Mieke |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/vandamme_anne_mieke |
Author Name:
|
Dellicour Simon |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/dellicour_simon |
Author Name:
|
Scarpino Samuel V |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/scarpino_samuel_v |
Author Name:
|
Houldcroft Charlotte |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/houldcroft_charlotte |
Author Name:
|
Villabona Arenas Julian |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/villabona_arenas_julian |
Author Name:
|
Paisie Taylor K |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/paisie_taylor_k |
Author Name:
|
Trovão Nídia S |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/trovo_ndia_s |
Author Name:
|
Boucher Christina |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/boucher_christina |
Author Name:
|
Zhang Yun |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/zhang_yun |
Author Name:
|
Scheuermann Richard H |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/scheuermann_richard_h |
Author Name:
|
Gascuel Olivier |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/gascuel_olivier |
Author Name:
|
Lam Tommy Tsan Yuk |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/lam_tommy_tsan_yuk |
Author Name:
|
Suchard Marc A |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/suchard_marc_a |
Author Name:
|
Abecasis Ana |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/abecasis_ana |
Author Name:
|
Wilkinson Eduan |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/wilkinson_eduan |
Author Name:
|
de Oliveira Tulio |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/de_oliveira_tulio |
Author Name:
|
Bento Ana I |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/bento_ana_i |
Author Name:
|
Schmidt Heiko A |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/schmidt_heiko_a |
Author Name:
|
Martin Darren |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/martin_darren |
Author Name:
|
Hadfield James |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/hadfield_james |
Author Name:
|
Faria Nuno |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/faria_nuno |
Author Name:
|
Grubaugh Nathan D |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/grubaugh_nathan_d |
Author Name:
|
Neher Richard A |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/neher_richard_a |
Author Name:
|
Baele Guy |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/baele_guy |
Author Name:
|
Lemey Philippe |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/lemey_philippe |
Author Name:
|
Stadler Tanja |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/stadler_tanja |
Author Name:
|
Albert Jan |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/albert_jan |
Author Name:
|
Crandall Keith A |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/crandall_keith_a |
Author Name:
|
Leitner Thomas |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/leitner_thomas |
Author Name:
|
Stamatakis Alexandros |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/stamatakis_alexandros |
Author Name:
|
Prosperi Mattia |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/prosperi_mattia |
Author Name:
|
Salemi Marco |
Author link:
|
https://covid19-data.nist.gov/pid/rest/local/author/salemi_marco |
sha:
|
3b3794fc48e257c10d1130a51f9fef688c28c215 |
license:
|
cc-by |
license_url:
|
https://creativecommons.org/licenses/by/4.0/ |
source_x:
|
Medline; PMC |
source_x_url:
|
https://www.medline.com/; https://www.ncbi.nlm.nih.gov/pubmed/ |
pubmed_id:
|
32381734 |
pubmed_id_url:
|
https://www.ncbi.nlm.nih.gov/pubmed/32381734 |
pmcid:
|
PMC7293693 |
pmcid_url:
|
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7293693 |
url:
|
https://doi.org/10.1073/pnas.2007295117
https://www.ncbi.nlm.nih.gov/pubmed/32381734/ |
has_full_text:
|
TRUE |
Keywords Extracted from Text Content:
|
SARS-CoV-2's
C.
human
bats
coronavirus
humans
clade/lineage
Wuhan
root
nucleotide
https://www.gisaid.org/
Nextstrain
Americans
coronavirus disease 2019
SARS-CoV-2
B
COVID-19
SI Appendix
network
solid
Wuhan B-type virus |
Extracted Text Content in Record:
|
First 5000 Characters: There is obvious interest in gaining insights into the epidemiology and evolution of the virus that has recently emerged in humans as the cause of the coronavirus disease 2019 (COVID-19) pandemic. The recent paper by Forster et al. (1) analyzed 160 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) full genomes available (https://www.gisaid.org/) in early March 2020. The central claim is the identification of three main SARS-CoV-2 types, named A, B, and C, circulating in different proportions among Europeans and Americans (types A and C) and East Asians (type B). According to a median-joining network analysis, variant A is proposed to be the ancestral type because it links to the sequence of a coronavirus from bats, used as an outgroup to trace the ancestral origin of the human strains. The authors further suggest that the "ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia." There are several serious flaws with their findings and interpretation. First, and most obviously, the sequence identity between SARS-CoV-2 and the bat virus is only 96.2%, implying that these viral genomes (which are nearly 30,000 nucleotides long) differ by more than 1,000 mutations. Such a distant outgroup is unlikely to provide a reliable root for the network. Yet, strangely, the branch to the bat virus, in figure 1 of their paper, is only 16 or 17 mutations in length. Indeed, the network seems to be misrooted, because (see their SI Appendix, figure S4) a virus from Wuhan from week 0 (24 December 2019) is portrayed as a descendant of a clade of viruses collected in weeks 1 to 9 (presumably from many places outside China), which makes no evolutionary (2) or epidemiological sense (3).
As for the finding of three main SARS-CoV-2 types, we must underline that finding different lineages in different countries and regions is expected with any RNA virus experiencing founder effects (2). According to Forster et al.'s (1) own analysis, a single synonymous mutation (nucleotide change in a gene that does not result in a modified protein) distinguishes type A from type B, while one nonsynonymous mutation (resulting in a protein with a single amino acid change) separates types A and C, and another one separates types B and C. Given SARS-CoV-2's fast evolutionary rate, random emergence of new mutations is entirely expected, even in a relatively short timeframe (4). When a viral strain is introduced and spreads in a new population, such random mutations can be propagated without them being selected or advantageous, due to founder effects. The fact that SARS-CoV-2 sequences show some geographical clustering is not new and is nicely and interactively shown on Nextstrain (5), but this cannot be used as a proof of biological differences unless backed by solid experimental data (6). This is particularly true for the work of Forster et al., since their findings are based on a nonrepresentative dataset of 160 genomes, with no significant correlation between prevalence of confirmed cases and number of sequenced strains per country (7, 8). The essential role of representative sampling is well documented in the literature (9), but was not acknowledged by the authors, who, instead, claim that their "network faithfully traces routes of infections for documented cases," without taking into consideration missing viral diversity, or evaluating multiple transmission hypotheses that would be consistent with sequence data, or even providing any support on the robustness of the branching pattern in their network. Ultimately, no firm conclusion should be drawn without evaluating the probability of alternative dissemination routes.
The inappropriate application and interpretation of phylogenetic methods to analyze limited and unevenly sampled datasets begs for restraint about origin, directionality, and early clade/lineage inference of SARS-CoV-2. We feel the urgency to reframe the current debate in more rigorous scientific terms, given the dangerous implications of misunderstanding the true dispersal dynamics of SARS-CoV-2 and the COVID-19 pandemic.
We are grateful to Paul Sharp, Andrew Rambaut |
Keywords Extracted from PMC Text:
|
https://www.gisaid.org/
SARS-CoV-2
solid
COVID-19
bats
root
Wuhan B-type virus
Wuhan
Americans
Nextstrain
C.
SI Appendix
B
nucleotide
human
coronavirus disease 2019
humans
coronavirus
SARS-CoV-2's |
Extracted PMC Text Content in Record:
|
First 5000 Characters: There is obvious interest in gaining insights into the epidemiology and evolution of the virus that has recently emerged in humans as the cause of the coronavirus disease 2019 (COVID-19) pandemic. The recent paper by Forster et al. (1) analyzed 160 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) full genomes available (https://www.gisaid.org/) in early March 2020. The central claim is the identification of three main SARS-CoV-2 types, named A, B, and C, circulating in different proportions among Europeans and Americans (types A and C) and East Asians (type B). According to a median-joining network analysis, variant A is proposed to be the ancestral type because it links to the sequence of a coronavirus from bats, used as an outgroup to trace the ancestral origin of the human strains. The authors further suggest that the "ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia." There are several serious flaws with their findings and interpretation. First, and most obviously, the sequence identity between SARS-CoV-2 and the bat virus is only 96.2%, implying that these viral genomes (which are nearly 30,000 nucleotides long) differ by more than 1,000 mutations. Such a distant outgroup is unlikely to provide a reliable root for the network. Yet, strangely, the branch to the bat virus, in figure 1 of their paper, is only 16 or 17 mutations in length. Indeed, the network seems to be misrooted, because (see their SI Appendix, figure S4) a virus from Wuhan from week 0 (24 December 2019) is portrayed as a descendant of a clade of viruses collected in weeks 1 to 9 (presumably from many places outside China), which makes no evolutionary (2) or epidemiological sense (3).
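The sequence-identity argument above can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the original letter: it uses the figures quoted in the text, rounds the genome length up to 30,000 nucleotides (an assumption), and ignores indels and repeated substitutions at the same site. The function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the outgroup distance quoted in the text.
GENOME_LENGTH_NT = 30_000     # "nearly 30,000 nucleotides", rounded up (assumption)
SEQUENCE_IDENTITY = 0.962     # 96.2% identity between SARS-CoV-2 and the bat virus
REPORTED_BRANCH = (16, 17)    # branch length to the bat virus in figure 1 of Forster et al.


def expected_differences(length_nt: int, identity: float) -> int:
    """Approximate count of differing sites implied by a pairwise identity."""
    return round(length_nt * (1.0 - identity))


diffs = expected_differences(GENOME_LENGTH_NT, SEQUENCE_IDENTITY)
ratio = diffs / max(REPORTED_BRANCH)

print(f"Implied differences: {diffs}")
print(f"Ratio to the longest reported branch: {ratio:.0f}x")
```

Under these assumptions the implied number of differences is well above 1,000, far more than the 16 or 17 mutations drawn on the branch to the bat virus, which is the mismatch the authors point out.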
As for the finding of three main SARS-CoV-2 types, we must underline that finding different lineages in different countries and regions is expected with any RNA virus experiencing founder effects (2). According to Forster et al.'s (1) own analysis, a single synonymous mutation (nucleotide change in a gene that does not result in a modified protein) distinguishes type A from type B, while one nonsynonymous mutation (resulting in a protein with a single amino acid change) separates types A and C, and another one separates types B and C. Given SARS-CoV-2's fast evolutionary rate, random emergence of new mutations is entirely expected, even in a relatively short timeframe (4). When a viral strain is introduced and spreads in a new population, such random mutations can be propagated without them being selected or advantageous, due to founder effects. The fact that SARS-CoV-2 sequences show some geographical clustering is not new and is nicely and interactively shown on Nextstrain (5), but this cannot be used as a proof of biological differences unless backed by solid experimental data (6). This is particularly true for the work of Forster et al., since their findings are based on a nonrepresentative dataset of 160 genomes, with no significant correlation between prevalence of confirmed cases and number of sequenced strains per country (7, 8). The essential role of representative sampling is well documented in the literature (9), but was not acknowledged by the authors, who, instead, claim that their "network faithfully traces routes of infections for documented [COVID-19] cases," without taking into consideration missing viral diversity, or evaluating multiple transmission hypotheses that would be consistent with sequence data, or even providing any support on the robustness of the branching pattern in their network. Ultimately, no firm conclusion should be drawn without evaluating the probability of alternative dissemination routes.
The inappropriate application and interpretation of phylogenetic methods to analyze limited and unevenly sampled datasets begs for restraint about origin, directionality, and early clade/lineage inference of SARS-CoV-2. We feel the urgency to reframe the current debate in more rigorous scientific terms, given the dangerous implications of misunderstanding the true dispersal dynamics of SARS-CoV-2 and the COVID-19 pandemic. |
PDF JSON Files:
|
document_parses/pdf_json/3b3794fc48e257c10d1130a51f9fef688c28c215.json |
PMC JSON Files:
|
document_parses/pmc_json/PMC7293693.xml.json |
G_ID:
|
sampling_bias_and_incorrect_rooting_make_phylogenetic_network_tracing_of_sars_cov_2 |