ABSTRACT
The serial interval of an infectious disease is an important variable in epidemiology. It is defined as the period of time between the symptom onset times of the infector and infectee in a direct transmission pair. Under partially sampled data, purported infector-infectee pairs may actually be separated by one or more unsampled intermediate cases. Mistaking such pairs for direct transmissions leads to overestimation of the serial interval. On the other hand, two cases that are infected by an unseen third case (known as coprimary transmission) may be classified as a direct transmission pair, leading to an underestimation of the serial interval. Here, we introduce a method to jointly estimate the distribution of serial intervals factoring in these two sources of error. We simultaneously estimate the distribution of the number of unsampled intermediate cases between purported infector-infectee pairs, as well as the fraction of such pairs that are coprimary. We also extend our method to situations where each infectee has multiple possible infectors, and show how to factor this additional source of uncertainty into our estimates. We assess our method's performance on simulated data sets and find that our method provides consistent and robust estimates. We also apply our method to data from real-life outbreaks of four infectious diseases and compare our results with published results. With similar accuracy, our method of estimating the serial interval distribution provides unique advantages, allowing its application in settings of low sampling rates and large population sizes, such as widespread community transmission tracked by routine public health surveillance.
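The two opposing biases described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the estimation method of the abstract: serial intervals are assumed to follow a gamma distribution with a hypothetical shape of 2.5 and scale of 2.0 days (mean 5 days), and all function names are illustrative.

```python
import random
from statistics import mean

# Hypothetical gamma parameters (assumption): mean = shape * scale = 5 days.
SHAPE, SCALE = 2.5, 2.0

def draw_serial_interval(rng):
    """One true serial interval for a direct transmission pair."""
    return rng.gammavariate(SHAPE, SCALE)

def interval_with_intermediates(rng, n_unsampled):
    """Onset gap for a purported pair separated by n_unsampled hidden cases.

    The gap is the sum of n_unsampled + 1 consecutive serial intervals.
    """
    return sum(draw_serial_interval(rng) for _ in range(n_unsampled + 1))

def interval_coprimary(rng):
    """Onset gap for two cases infected by the same unseen infector.

    Each onset is the shared infector's onset plus one serial interval,
    so the gap is the absolute difference of two serial intervals.
    """
    return abs(draw_serial_interval(rng) - draw_serial_interval(rng))

def simulate(n_pairs=20000, seed=1):
    """Return mean gaps for direct, one-intermediate, and coprimary pairs."""
    rng = random.Random(seed)
    direct = [draw_serial_interval(rng) for _ in range(n_pairs)]
    one_missing = [interval_with_intermediates(rng, 1) for _ in range(n_pairs)]
    coprimary = [interval_coprimary(rng) for _ in range(n_pairs)]
    return mean(direct), mean(one_missing), mean(coprimary)
```

Under these assumptions, the mean gap for pairs with one hidden intermediate is larger than the true mean serial interval, while the mean gap for coprimary pairs is smaller, which is why treating either kind of pair as a direct transmission biases the estimate.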
Subject(s)
COVID-19, Humans, COVID-19/epidemiology, Disease Outbreaks, Time Factors
ABSTRACT
In the management of infectious disease outbreaks, grouping cases into clusters and understanding their underlying epidemiology are fundamental tasks. In genomic epidemiology, clusters are typically identified either using pathogen sequences alone or with sequences in combination with epidemiological data such as location and time of collection. However, it may not be feasible to culture and sequence all pathogen isolates, so sequence data may not be available for all cases. This presents challenges for identifying clusters and understanding epidemiology, because these cases may be important for transmission. Demographic, clinical and location data are likely to be available for unsequenced cases, and comprise partial information about their clustering. Here, we use statistical modelling to assign unsequenced cases to clusters already identified by genomic methods, assuming that a more direct method of linking individuals, such as contact tracing, is not available. We build our model on pairwise similarity between cases to predict whether cases cluster together, in contrast to using individual case data to predict the cases' clusters. We then develop methods that allow us to determine whether a pair of unsequenced cases are likely to cluster together, to group them into their most probable clusters, to identify which are most likely to be members of a specific (known) cluster, and to estimate the true size of a known cluster given a set of unsequenced cases. We apply our method to tuberculosis data from Valencia, Spain. Among other applications, we find that clustering can be predicted successfully using spatial distance between cases and whether nationality is the same. We can identify the correct cluster for an unsequenced case, among 38 possible clusters, with an accuracy of approximately 35%, higher than both direct multinomial regression (17%) and random selection (< 5%).
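The pairwise idea can be sketched as follows. This is a minimal illustration and not the model of the abstract: the field names (`x_km`, `y_km`, `nationality`, `cluster`), the logistic coefficients passed in `coef`, and the rule of scoring each cluster by its mean pairwise probability are all assumptions made for the example.

```python
import math
from collections import defaultdict

def pair_features(case_a, case_b):
    """Pairwise covariates: spatial distance (km) and same-nationality flag."""
    dist = math.hypot(case_a["x_km"] - case_b["x_km"],
                      case_a["y_km"] - case_b["y_km"])
    same_nat = 1.0 if case_a["nationality"] == case_b["nationality"] else 0.0
    return dist, same_nat

def pair_probability(case_a, case_b, coef):
    """Logistic probability that two cases belong to the same cluster.

    coef holds hypothetical, already-fitted coefficients (assumption).
    """
    dist, same_nat = pair_features(case_a, case_b)
    z = (coef["intercept"]
         + coef["distance_km"] * dist
         + coef["same_nationality"] * same_nat)
    return 1.0 / (1.0 + math.exp(-z))

def most_probable_cluster(new_case, sequenced_cases, coef):
    """Assign an unsequenced case to the cluster with the highest mean
    pairwise probability (an illustrative scoring rule, not the paper's)."""
    scores = defaultdict(list)
    for case in sequenced_cases:
        scores[case["cluster"]].append(pair_probability(new_case, case, coef))
    return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))
```

The design point is that the model is trained and applied on pairs of cases, so the same fitted coefficients can score an unsequenced case against any cluster, however many clusters there are.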
Subject(s)
Disease Outbreaks, Genomics, Humans, Cluster Analysis, Logistic Models
ABSTRACT
Serial intervals - the time between symptom onset in infector and infectee - are a fundamental quantity in infectious disease control. However, their estimation requires knowledge of individuals' exposures, typically obtained through resource-intensive contact tracing efforts. We introduce an alternative framework using virus sequences to inform who infected whom and thereby estimate serial intervals. We apply our technique to SARS-CoV-2 sequences from case clusters in the first two COVID-19 waves in Victoria, Australia. We find that our approach offers high-resolution, cluster-specific serial interval estimates that are comparable with those obtained from contact data, despite requiring no knowledge of who infected whom and relying on incompletely sampled data. Compared to a published serial interval, cluster-specific serial intervals can change estimates of the effective reproduction number by a factor of 2-3. We find that serial interval estimates in settings such as schools and meat processing/packing plants are shorter than those in healthcare facilities.
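One way to see why the assumed serial interval matters for the reproduction number is the standard relation between the epidemic growth rate and a gamma-distributed generation interval, R = (1 + r * scale)^shape. The sketch below uses that generic relation, with the serial interval standing in for the generation interval; it is an assumption for illustration and not necessarily how the abstract's estimates were obtained. The function name and inputs are hypothetical.

```python
def reproduction_number(growth_rate, si_mean, si_sd):
    """Reproduction number implied by an exponential growth rate (per day)
    and a gamma-distributed interval with the given mean and SD (days).

    Uses R = 1 / M(-r), where M is the moment generating function of the
    interval distribution; for a gamma this is (1 + r * scale) ** shape.
    Valid for growth_rate > -1 / scale.
    """
    shape = (si_mean / si_sd) ** 2
    scale = si_sd ** 2 / si_mean
    return (1.0 + growth_rate * scale) ** shape
```

For a fixed positive growth rate, a shorter mean interval gives a smaller implied reproduction number, so replacing a published serial interval with a shorter cluster-specific one lowers the estimate.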