a graph based evidence synthesis approach to detecting outbreak clusters an application CORD-Papers-2022-06-02 (Version 1)

Title: A graph-based evidence synthesis approach to detecting outbreak clusters: An application to dog rabies
Abstract: Early assessment of infectious disease outbreaks is key to implementing timely and effective control measures. In particular, rapidly recognising whether infected individuals stem from a single outbreak sustained by local transmission, or from repeated introductions, is crucial to adopt effective interventions. In this study, we introduce a new framework for combining several data streams, e.g. temporal, spatial and genetic data, to identify clusters of related cases of an infectious disease. Our method explicitly accounts for underreporting, and allows incorporating preexisting information about the disease, such as its serial interval, spatial kernel, and mutation rate. We define, for each data stream, a graph connecting all cases, with edges weighted by the corresponding pairwise distance between cases. Each graph is then pruned by removing distances greater than a given cutoff, defined based on preexisting information on the disease and assumptions on the reporting rate. The pruned graphs corresponding to different data streams are then merged by intersection to combine all data types; connected components define clusters of cases related for all types of data. Estimates of the reproduction number (the average number of secondary cases infected by an infectious individual in a large population), and the rate of importation of the disease into the population, are also derived. We test our approach on simulated data and illustrate it using data on dog rabies in Central African Republic. We show that the outbreak clusters identified using our method are consistent with structures previously identified by more complex, computationally intensive approaches.
Published: 2018-12-17
Journal: PLoS Comput Biol
DOI: 10.1371/journal.pcbi.1006554
DOI_URL: http://doi.org/10.1371/journal.pcbi.1006554
Author Name: Cori Anne
Author link: https://covid19-data.nist.gov/pid/rest/local/author/cori_anne
Author Name: Nouvellet Pierre
Author link: https://covid19-data.nist.gov/pid/rest/local/author/nouvellet_pierre
Author Name: Garske Tini
Author link: https://covid19-data.nist.gov/pid/rest/local/author/garske_tini
Author Name: Bourhy Hervé
Author link: https://covid19-data.nist.gov/pid/rest/local/author/bourhy_herv
Author Name: Nakouné Emmanuel
Author link: https://covid19-data.nist.gov/pid/rest/local/author/nakoun_emmanuel
Author Name: Jombart Thibaut
Author link: https://covid19-data.nist.gov/pid/rest/local/author/jombart_thibaut
sha: 1b08674379f9805e1ab55ce13c056b716433b8a2
license: cc-by
license_url: https://creativecommons.org/licenses/by/4.0/
source_x: PMC
source_x_url: https://www.ncbi.nlm.nih.gov/pubmed/
pubmed_id: 30557340
pubmed_id_url: https://www.ncbi.nlm.nih.gov/pubmed/30557340
pmcid: PMC6312344
pmcid_url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6312344
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6312344/
has_full_text: TRUE
Keywords Extracted from Text Content: stem dog rabies nodes rabies brain samples dog rabies https://www.gov.uk/government/ organisations/department-for-internationaldevelopment solid black lines Bourhy bovine Fig 4D https://www T n G HPRU-2012-10080 TJ B left Pruned (A-C 5.6.6 [4, 5] human Fig 4A-4C Fig 5A West-African Ebola virus canine rabies Fig 2 canine Ypma rabies dogs −4 −5 human rabies green https://doi.org/10.1371/journal.pcbi.1006554.g003 inner colours PREDEMICS TPR https://www.usaid.gov Trizol reagent PHE j 2 N}. vertices [6, 7] S3-S6 Figs Fig 1A nodes N G humans nodes https://www.ncbi.nlm.nih.gov/nuccore/JQ685977.1 nihr.ac.uk S1 Fig. https://mrc outer colours MR/K010174/1 Fig 1G Fig 1 1D MR/R015600/1 PN sections patients PNG left panel https://doi.org/10.1371/journal.pcbi.1006554.g002 UK Beugin avian influenza left column dogs https://doi.org/10.1371/journal.pcbi.1006554.g005 5,221 https://ec.europa.eu/ RABV genome Fig 5C rabies rabies
Extracted Text Content in Record: First 5000 Characters:Early assessment of infectious disease outbreaks is key to implementing timely and effective control measures. In particular, rapidly recognising whether infected individuals stem from a single outbreak sustained by local transmission, or from repeated introductions, is crucial to adopt effective interventions. In this study, we introduce a new framework for combining several data streams, e.g. temporal, spatial and genetic data, to identify clusters of related cases of an infectious disease. Our method explicitly accounts for underreporting, and allows incorporating preexisting information about the disease, such as its serial interval, spatial kernel, and mutation rate. We define, for each data stream, a graph connecting all cases, with edges weighted by the corresponding pairwise distance between cases. Each graph is then pruned by removing distances greater than a given cutoff, defined based on preexisting information on the disease and assumptions on the reporting rate. The pruned graphs corresponding to different data streams are then merged by intersection to combine all data types; connected components define clusters of cases related for all types of data. Estimates of the reproduction number (the average number of secondary cases infected by an infectious individual in a large population), and the rate of importation of the disease into the population, are also derived. We test our approach on simulated data and illustrate it using data on dog rabies in Central African Republic. We show that the outbreak clusters identified using our method are consistent with structures previously identified by more complex, computationally intensive approaches. Early assessment of infectious disease outbreaks is key to implementing timely and effective control measures. 
In particular, rapidly recognising whether infected individuals stem from a single outbreak sustained by local transmission, or from repeated introductions, is crucial to adopt effective interventions. In this study, we introduce a new approach which combines different types of data to identify clusters of related cases of an infectious disease. This approach relies on representing each type of data (e.g. temporal, spatial, or genetic) as a graph where nodes are cases, and two nodes are connected if the corresponding cases are closely related for this data. Our method then identifies clusters of cases which likely stem from the same introduction. Furthermore, we can use the size of these clusters to infer transmissibility of the disease and the number of importations of the pathogen into the population. We apply this approach to analyse dog rabies epidemics in Central African Republic. We show that outbreak clusters identified using our method are consistent with structures previously identified by more complex and computationally intensive approaches. Using simulated rabies epidemics, we show that our method has excellent potential for optimally detecting outbreak clusters. We also identify promising areas of research for transforming our method into a routine analysis tool for processing disease surveillance data. Infectious disease outbreaks are a recurring threat to humans and animals, with potentially disastrous impacts on human health, economy, and biodiversity. 
Over the last decade, major epidemics such as the 2009 influenza pandemic [1], the emergence of the Middle-East Respiratory Syndrome (MERS, [2, 3]) and the West-African Ebola virus disease outbreak (EVD, [4, 5]) have re-emphasized the need for assessing outbreaks at an early stage. Indeed, rapid identification of clusters of cases linked by transmission and subsequent intervention remain our best chance of containing, or at least mitigating disease epidemics. The identification of clusters of cases is also key to adapting the response in ongoing epidemics. Indeed, early assessment of the relative contributions of local transmission versus case importation is essential for designing appropriate intervention strategies. For instance, a nosocomial outbreak may be driven by within-hospital transmissions or repeated introductions from the community, calling for different control measures [6, 7]. Similarly at a country level, local transmissions and importations of cases from other countries will require different control measures, e.g. social distancing and prevention versus border closing [8, 9]. In the case of zoonotic infections, it is also crucial to identify the extent to which within-species transmission and spill-over from the reservoir (i.e. cross-species transmission) contribute to the observed incidence, as illustrated in the case of avian influenza
Keywords Extracted from PMC Text: G Fig 5C Fig 1C RABV genome humans dogs [40,41 Fig 1 Gn [4,5] line nodes N 95%1/3 ⋂n canine rabies j ≤ ∀ i avian influenza [10,11 TPR within-host Fig 5A Bourhy Ypma j ∈ N}. rabies Fig 2 5.6.6 human dni fn Pobs fn,π, in the presence of underreporting (S1 Text) R bovine S1 Fig. patients dogs nodes 1D ∀ n}. 's fn,π = fn prune 5,221 j (Fig 1B and 1E border n. brain samples [6,7] dog rabies Fig 4D Fig 1G rabies dogs human rabies Beugin Trizol reagent Fig 1A West-African Ebola virus
Extracted PMC Text Content in Record: First 5000 Characters:Infectious disease outbreaks are a recurring threat to humans and animals, with potentially disastrous impacts on human health, economy, and biodiversity. Over the last decade, major epidemics such as the 2009 influenza pandemic [1], the emergence of the Middle-East Respiratory Syndrome (MERS, [2,3]) and the West-African Ebola virus disease outbreak (EVD, [4,5]) have re-emphasized the need for assessing outbreaks at an early stage. Indeed, rapid identification of clusters of cases linked by transmission and subsequent intervention remain our best chance of containing, or at least mitigating disease epidemics. The identification of clusters of cases is also key to adapting the response in ongoing epidemics. Indeed, early assessment of the relative contributions of local transmission versus case importation is essential for designing appropriate intervention strategies. For instance, a nosocomial outbreak may be driven by within-hospital transmissions or repeated introductions from the community, calling for different control measures [6,7]. Similarly at a country level, local transmissions and importations of cases from other countries will require different control measures, e.g. social distancing and prevention versus border closing [8,9]. In the case of zoonotic infections, it is also crucial to identify the extent to which within-species transmission and spill-over from the reservoir (i.e. cross-species transmission) contribute to the observed incidence, as illustrated in the case of avian influenza [10,11], bovine tuberculosis [12], or MERS [3,8,9]. Methodologically, the identification of clusters of cases linked by transmission (i.e. cases belonging to the same transmission tree or stemming from a single introduction, here referred to as 'outbreak clusters') is strongly related to other fields which have received considerable attention from statisticians and modellers over the last decades. 
First, it is closely linked to outbreak detection methods, which generally aim to identify excesses of cases compared to a reference 'baseline' in incidence time series [13–19], and are for instance routinely used to detect the beginning of seasonal influenza epidemics from surveillance data [20–22]. Some extensions to spatiotemporal data [23–26] have shown that geographic information can be a useful complement to temporal data [27,28], but little effort has been devoted to generalising these approaches to other types of data such as pathogen whole genome sequences (WGS). Second, the identification of outbreak clusters is also closely related to outbreak reconstruction methods, which infer transmission chains using complex outbreak models integrating multiple sources of data such as time, space, and WGS [29–32]. Lastly, population genetics has had a long-standing tradition of developing clustering methods [33–35], some of which have proved useful for studying pathogen populations [36,37]. Unfortunately, despite these connections to well-developed methodological fields, the integration of multiple data sources (e.g. epidemiological and genomic data) to identify outbreak clusters remains in its infancy [38]. In this study, we introduce a new, simple and intuitive framework for combining various sources of information to identify clusters of related cases of a disease. This evidence synthesis approach can combine various data streams such as the timing and location of the cases, as well as WGS of the pathogen, to identify such outbreak clusters. Our method relies on the observation that, in an outbreak, individuals who infect one another are likely to be closely related with respect to various characteristics; for instance, their symptom onsets appear within the same time period and in neighbouring locations, and they bear genetically similar pathogen strains. 
We consider that, to be part of the same outbreak cluster, two cases must be closely related in all relevant data streams. These data sources describe relationships between cases in intrinsically different spaces (e.g., temporal, spatial, genetic), but they can all be used to compute pairwise distances between cases (e.g. number of days between dates of onset, geographic distance between locations, number of mutations between pathogen WGS). We define, for each data stream, a weighted graph, where nodes correspond to cases, and the edge between two cases is weighted by the pairwise distance (for this data stream) between these two cases, so that 'heavy' edges indicate pairs of cases unlikely to have infected one another. To retain only relevant connections, each graph is then pruned by removing heavy edges, defined as edges whose weight exceeds a predefined cutoff distance (Fig 1). Defining the adequate cutoff is central to identifying clusters of related cases. We develop a framework for defining cutoffs based on the expected distance distributions between observed cases in an outbreak. This enables us to incorporate pre-existing information about the disease, e.g.
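Illustrative sketch (not part of the original record): the prune, intersect and connected-components steps described in the text above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (pruned_edges, connected_components, outbreak_clusters), the toy cases and both cutoff values are hypothetical; in the paper the cutoffs are derived from expected distance distributions and the assumed reporting rate, which is not reproduced here.

```python
from itertools import combinations


def pruned_edges(cases, distance, cutoff):
    """Keep only the case pairs whose pairwise distance does not exceed the cutoff."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(cases), 2)
        if distance(cases[a], cases[b]) <= cutoff
    }


def connected_components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for edge in edges:
        a, b = tuple(edge)
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def outbreak_clusters(cases, streams):
    """streams: list of (distance_function, cutoff) pairs, one per data stream."""
    merged = None
    for distance, cutoff in streams:
        edges = pruned_edges(cases, distance, cutoff)
        # Intersection: a pair stays linked only if it is close in every stream.
        merged = edges if merged is None else merged & edges
    return connected_components(sorted(cases), merged or set())


def days_apart(a, b):
    """Temporal distance: days between onset dates."""
    return abs(a["onset_day"] - b["onset_day"])


def km_apart(a, b):
    """Spatial distance on a one-dimensional toy coordinate."""
    return abs(a["km"] - b["km"])


# Hypothetical toy cases, invented for illustration only.
cases = {
    "A": {"onset_day": 0, "km": 0.0},
    "B": {"onset_day": 5, "km": 1.0},
    "C": {"onset_day": 8, "km": 1.5},
    "D": {"onset_day": 6, "km": 40.0},  # close in time, far in space
    "E": {"onset_day": 90, "km": 0.5},  # close in space, far in time
}

# Hypothetical cutoffs: 10 days and 5 km.
clusters = outbreak_clusters(cases, [(days_apart, 10), (km_apart, 5.0)])
print(clusters)  # [['A', 'B', 'C'], ['D'], ['E']]
```

In this toy example, case D is linked to A, B and C in the temporal graph only, and case E is linked to them in the spatial graph only, so both end up as singleton clusters after intersection. A genetic data stream could be added as a third (distance, cutoff) pair in the same way.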
PDF JSON Files: document_parses/pdf_json/1b08674379f9805e1ab55ce13c056b716433b8a2.json
PMC JSON Files: document_parses/pmc_json/PMC6312344.xml.json
G_ID: a_graph_based_evidence_synthesis_approach_to_detecting_outbreak_clusters_an_application