Results 1 - 20 of 28
1.
Cell ; 186(26): 5690-5704.e20, 2023 12 21.
Article in English | MEDLINE | ID: mdl-38101407

ABSTRACT

The maturation of genomic surveillance in the past decade has enabled tracking of the emergence and spread of epidemics at an unprecedented level. During the COVID-19 pandemic, for example, genomic data revealed that local epidemics varied considerably in the frequency of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) lineage importation and persistence, likely due to a combination of COVID-19 restrictions and changing connectivity. Here, we show that local COVID-19 epidemics are driven by regional transmission, including across international boundaries, but can become increasingly connected to distant locations following the relaxation of public health interventions. By integrating genomic, mobility, and epidemiological data, we find abundant transmission occurring between both adjacent and distant locations, supported by dynamic mobility patterns. We find that changing connectivity significantly influences local COVID-19 incidence. Our findings demonstrate a complex meaning of "local" when investigating connected epidemics and emphasize the importance of collaborative interventions for pandemic prevention and mitigation.


Subject(s)
COVID-19, Humans, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Genomics, Pandemics/prevention & control, Public Health, SARS-CoV-2/genetics, Infection Control, Geography
2.
Nature ; 609(7925): 101-108, 2022 09.
Article in English | MEDLINE | ID: mdl-35798029

ABSTRACT

As SARS-CoV-2 continues to spread and evolve, detecting emerging variants early is critical for public health interventions. Inferring lineage prevalence by clinical testing is infeasible at scale, especially in areas with limited resources, participation, or testing and/or sequencing capacity, which can also introduce biases [1-3]. SARS-CoV-2 RNA concentration in wastewater successfully tracks regional infection dynamics and provides less biased abundance estimates than clinical testing [4,5]. Tracking virus genomic sequences in wastewater would improve community prevalence estimates and detect emerging variants. However, two factors limit wastewater-based genomic surveillance: low-quality sequence data and inability to estimate relative lineage abundance in mixed samples. Here we resolve these critical issues to perform a high-resolution, 295-day wastewater and clinical sequencing effort, in the controlled environment of a large university campus and the broader context of the surrounding county. We developed and deployed improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. We detected emerging variants of concern up to 14 days earlier in wastewater samples, and identified multiple instances of virus spread not captured by clinical genomic surveillance. Our study provides a scalable solution for wastewater genomic surveillance that allows early detection of SARS-CoV-2 variants and identification of cryptic transmission.


Subject(s)
COVID-19, SARS-CoV-2, Wastewater-Based Epidemiological Monitoring, Wastewater, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Humans, Viral RNA/analysis, Viral RNA/genetics, SARS-CoV-2/classification, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, RNA Sequence Analysis, Wastewater/virology
3.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38200583

ABSTRACT

MOTIVATION: The genomic surveillance of viral pathogens such as SARS-CoV-2 and HIV-1 has been critical to modern epidemiology and public health, but the use of sequence analysis pipelines requires computational expertise, and web-based platforms require sending potentially sensitive raw sequence data to remote servers. RESULTS: We introduce ViralWasm, a user-friendly graphical web application suite for viral genomics. All ViralWasm tools utilize WebAssembly to execute the original command line tools client-side directly in the web browser without any user setup, with a cost of just 2-3x slowdown with respect to their command line counterparts. AVAILABILITY AND IMPLEMENTATION: The ViralWasm tool suite can be accessed at: https://niema-lab.github.io/ViralWasm.


Subject(s)
Genomics, Software, Humans, Genomics/methods, Web Browser, Viral Genome
4.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37171896

ABSTRACT

MOTIVATION: In viral molecular epidemiology, reconstruction of consensus genomes from sequence data is critical for tracking mutations and variants of concern. However, as the number of samples that are sequenced grows rapidly, compute resources needed to reconstruct consensus genomes can become prohibitively large. RESULTS: ViralConsensus is a fast and memory-efficient tool for calling viral consensus genome sequences directly from read alignment data. ViralConsensus is orders of magnitude faster and more memory-efficient than existing methods. Further, unlike existing methods, ViralConsensus can pipe data directly from a read mapper via standard input and performs viral consensus calling on-the-fly, making it an ideal tool for viral sequencing pipelines. AVAILABILITY AND IMPLEMENTATION: ViralConsensus is freely available at https://github.com/niemasd/ViralConsensus as an open-source software project.


Subject(s)
High-Throughput Nucleotide Sequencing, Software, DNA Sequence Analysis/methods, Consensus, Viral Genome, Algorithms
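
As a rough illustration of the on-the-fly consensus calling described in the ViralConsensus abstract above, the following Python sketch tallies per-position base counts from aligned reads as they arrive and then emits the most frequent base at each position. It is not ViralConsensus's actual implementation: the function name `call_consensus`, the `(start, bases)` read representation, the gapless-alignment simplification, and the `min_depth` threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def call_consensus(
    reads: Iterable[Tuple[int, str]],
    ref_length: int,
    min_depth: int = 1,
    ambiguous: str = "N",
) -> str:
    """Majority-vote consensus from gapless aligned reads (illustrative only).

    Each read is (0-based start position on the reference, read bases).
    Reads are consumed one at a time, so they may come from a stream
    (for example, a read mapper's output parsed line by line).
    """
    # One counter per reference position; memory depends on genome length,
    # not on the number of reads.
    counts: List[Counter] = [Counter() for _ in range(ref_length)]
    for start, bases in reads:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < ref_length:
                counts[pos][base] += 1

    consensus = []
    for column in counts:
        depth = sum(column.values())
        if depth < min_depth:
            # Too little coverage to call a base (hypothetical threshold).
            consensus.append(ambiguous)
        else:
            # Highest count wins; ties are broken alphabetically so the
            # output is deterministic.
            best = min(column.items(), key=lambda kv: (-kv[1], kv[0]))[0]
            consensus.append(best)
    return "".join(consensus)
```

Because `reads` can be any iterable, a generator that parses alignments from standard input could be passed directly, which mirrors the piping use case mentioned in the abstract.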
5.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37369033

ABSTRACT

MOTIVATION: Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. RESULTS: We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data to hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes two stages of existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA's tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools. AVAILABILITY AND IMPLEMENTATION: HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.


Subject(s)
Software, Tandem Mass Spectrometry, Tandem Mass Spectrometry/methods, Protein Databases, Peptides/chemistry, Search Engine, Algorithms, Peptide Library
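
To make the idea of encoding a mass spectrum as a hypervector more concrete, here is a minimal Python sketch of one common hyperdimensional computing scheme: each m/z bin and each quantized intensity level gets a random bipolar hypervector, a peak is represented by their element-wise product (binding), and the spectrum is the sign of the sum over peaks (bundling). This is a generic textbook-style scheme, not the encoding used by HOMS-TC; the dimensionality, bin width, number of levels, and all function names are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

DIM = 1024  # hypervector dimensionality (illustrative choice)


def random_hypervector(rng: random.Random, dim: int = DIM) -> List[int]:
    """Random bipolar (+1/-1) hypervector."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def make_codebooks(num_bins: int, num_levels: int, seed: int = 0):
    """Hypothetical codebooks: one ID vector per m/z bin, one per intensity level."""
    rng = random.Random(seed)
    bin_ids = [random_hypervector(rng) for _ in range(num_bins)]
    levels = [random_hypervector(rng) for _ in range(num_levels)]
    return bin_ids, levels


def encode_spectrum(
    peaks: Sequence[Tuple[float, float]],
    bin_ids: List[List[int]],
    levels: List[List[int]],
    bin_width: float = 1.0,
) -> List[int]:
    """Encode (m/z, intensity in [0, 1]) peaks into one bipolar hypervector."""
    acc = [0] * DIM
    for mz, intensity in peaks:
        b = min(int(mz / bin_width), len(bin_ids) - 1)
        lvl = min(int(intensity * len(levels)), len(levels) - 1)
        id_hv, lvl_hv = bin_ids[b], levels[lvl]
        # Each dimension is updated independently of the others, which is
        # what makes this kind of encoding easy to parallelize.
        for d in range(DIM):
            acc[d] += id_hv[d] * lvl_hv[d]
    return [1 if v >= 0 else -1 for v in acc]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized dot product in [-1, 1]; 1.0 means identical hypervectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)
```

A search would then compare the query hypervector against every library hypervector with `similarity` and keep the best matches; on a GPU these comparisons reduce to large matrix products.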
6.
J Proteome Res ; 22(6): 1639-1648, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37166120

ABSTRACT

As current shotgun proteomics experiments can produce gigabytes of mass spectrometry data per hour, processing these massive data volumes has become progressively more challenging. Spectral clustering is an effective approach to speed up downstream data processing by merging highly similar spectra to minimize data redundancy. However, because state-of-the-art spectral clustering tools fail to achieve optimal runtimes, this simply moves the processing bottleneck. In this work, we present a fast spectral clustering tool, HyperSpec, based on hyperdimensional computing (HDC). HDC shows promising clustering capability while only requiring lightweight binary operations with high parallelism that can be optimized using low-level hardware architectures, making it possible to run HyperSpec on graphics processing units to achieve extremely efficient spectral clustering performance. Additionally, HyperSpec includes optimized data preprocessing modules to reduce the spectrum preprocessing time, which is a critical bottleneck during spectral clustering. Based on experiments using various mass spectrometry data sets, HyperSpec produces results with clustering quality comparable to that of state-of-the-art spectral clustering tools while achieving speedups by orders of magnitude, shortening the clustering runtime of over 21 million spectra from 4 h to only 24 min.


Subject(s)
Algorithms, Peptides, Peptides/analysis, Mass Spectrometry/methods, Proteomics/methods, Cluster Analysis
7.
Bioinformatics ; 37(5): 714-716, 2021 05 05.
Article in English | MEDLINE | ID: mdl-32814953

ABSTRACT

MOTIVATION: In molecular epidemiology, the identification of clusters of transmissions typically requires the alignment of viral genomic sequence data. However, existing methods of multiple sequence alignment (MSA) scale poorly with respect to the number of sequences. RESULTS: ViralMSA is a user-friendly reference-guided MSA tool that leverages the algorithmic techniques of read mappers to enable the MSA of ultra-large viral genome datasets. It scales linearly with the number of sequences, and it is able to align tens of thousands of full viral genomes in seconds. However, alignments produced by ViralMSA omit insertions with respect to the reference genome. AVAILABILITY AND IMPLEMENTATION: ViralMSA is freely available at https://github.com/niemasd/ViralMSA as an open-source software project. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Software, Viral Genome, Molecular Epidemiology, Sequence Alignment
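
The ViralMSA abstract above notes that reference-guided alignment omits insertions relative to the reference. The Python sketch below shows one way such a projection can work: each sequence's pairwise alignment to the reference, given as a SAM-style CIGAR string, is written into a row of reference length, so all rows line up column by column and inserted bases have nowhere to go. It is a simplified illustration under stated assumptions, not ViralMSA's code; the function name and its parameters are hypothetical.

```python
import re

# SAM-style CIGAR operations: a run length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def project_onto_reference(seq: str, ref_start: int, cigar: str, ref_length: int) -> str:
    """Place an aligned sequence into reference coordinates (illustrative only).

    ref_start is the 0-based reference position of the first aligned base.
    The returned row always has length ref_length, so rows from many
    sequences form a multiple sequence alignment.
    """
    row = ["-"] * ref_length
    ref_pos = ref_start
    seq_pos = 0
    for length_str, op in CIGAR_RE.findall(cigar):
        n = int(length_str)
        if op in "M=X":
            # Aligned bases (match or mismatch): copy into the row.
            for i in range(n):
                if 0 <= ref_pos + i < ref_length:
                    row[ref_pos + i] = seq[seq_pos + i]
            ref_pos += n
            seq_pos += n
        elif op in "IS":
            # Insertions and soft clips consume sequence but have no
            # reference column, so they are dropped from the row.
            seq_pos += n
        elif op in "DN":
            # Deletions consume reference positions; the row keeps '-'.
            ref_pos += n
        # 'H' and 'P' consume neither the sequence nor the reference.
    return "".join(row)
```

Since each sequence is processed independently of the others, the total work grows linearly with the number of sequences, which is consistent with the scaling behavior the abstract describes.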
8.
BMC Med Inform Decis Mak ; 21(1): 177, 2021 06 03.
Article in English | MEDLINE | ID: mdl-34082739

ABSTRACT

BACKGROUND: The ability to prioritize people living with HIV (PLWH) by risk of future transmissions could aid public health officials in optimizing epidemiological intervention. While methods exist to perform such prioritization based on molecular data, their effectiveness and accuracy are poorly understood, and it is unclear how one can directly compare the accuracy of different methods. We introduce SEPIA (Simulation-based Evaluation of PrIoritization Algorithms), a novel simulation-based framework for determining the effectiveness of prioritization algorithms. SEPIA expands upon prior related work by defining novel metrics of effectiveness with which to compare prioritization techniques, as well as by creating a simulation-based tool with which to perform such effectiveness comparisons. Under several metrics of effectiveness that we propose, we compare two existing prioritization approaches: one phylogenetic (ProACT) and one distance-based (growth of HIV-TRACE transmission clusters). RESULTS: Using all proposed metrics, ProACT consistently slightly outperformed the transmission cluster growth approach. However, both methods consistently performed just marginally better than random, suggesting that there is significant room for improvement in prioritization tools. CONCLUSION: We hope that, by providing ways to quantify the effectiveness of prioritization methods in simulation, SEPIA will aid researchers in developing novel risk prioritization tools for PLWH.


Subject(s)
HIV Infections, Sepia, Algorithms, Animals, Computer Simulation, Humans, Phylogeny
9.
Bioinformatics ; 35(11): 1852-1861, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30395173

ABSTRACT

MOTIVATION: The ability to simulate epidemics as a function of model parameters allows insights that are unobtainable from real datasets. Further, reconstructing transmission networks for fast-evolving viruses like Human Immunodeficiency Virus (HIV) may have the potential to greatly enhance epidemic intervention, but transmission network reconstruction methods have been inadequately studied, largely because it is difficult to obtain 'truth' sets on which to test them and properly measure their performance. RESULTS: We introduce FrAmework for VIral Transmission and Evolution Simulation (FAVITES), a robust framework for simulating realistic datasets for epidemics that are caused by fast-evolving pathogens like HIV. FAVITES creates a generative model to produce contact networks, transmission networks, phylogenetic trees and sequence datasets, and to add error to the data. FAVITES is designed to be extensible by dividing the generative model into modules, each of which is expressed as a fixed API that can be implemented using various models. We use FAVITES to simulate HIV datasets and study the realism of the simulated datasets. We then use the simulated data to study the impact of the increased treatment efforts on epidemiological outcomes. We also study two transmission network reconstruction methods and their effectiveness in detecting fast-growing clusters. AVAILABILITY AND IMPLEMENTATION: FAVITES is available at https://github.com/niemasd/FAVITES, and a Docker image can be found on DockerHub (https://hub.docker.com/r/niemasd/favites). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
HIV Infections, Phylogeny, Humans
10.
Syst Biol ; 67(3): 475-489, 2018 May 01.
Article in English | MEDLINE | ID: mdl-29165679

ABSTRACT

Models of tree evolution have mostly focused on capturing the cladogenesis processes behind speciation. Processes that drive the evolution of genomic elements, such as repeats, are not necessarily captured by these existing models. In this article, we design a model of tree evolution that we call the dual-birth model, and we show how it can be useful in studying the evolution of short Alu repeats found in the human genome in abundance. The dual-birth model extends the traditional birth-only model to have two rates of propagation: one for active nodes, which propagate often, and another for inactive nodes, which activate at a lower rate and then start propagating. Adjusting the ratio of the rates controls the expected tree balance. We present several theoretical results under the dual-birth model, introduce parameter estimation techniques, and study the properties of the model in simulations. We then use the dual-birth model to estimate the number of active Alu elements and their rates of propagation and activation in the human genome based on a large phylogenetic tree that we build from close to one million Alu sequences.


Subject(s)
Alu Elements/genetics, Classification/methods, Molecular Evolution, Human Genome/genetics, Biological Models, Phylogeny, Humans
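
The two-rate process described in the dual-birth abstract above can be sketched as a small event-driven simulation. The Python code below tracks only the numbers of active and inactive lineages, not the tree itself. It assumes, as a simplification that the abstract does not spell out, that every event replaces the chosen lineage with one active and one inactive child; the function name, the parameter names, and the choice to start from a single active lineage are likewise hypothetical.

```python
import random
from typing import Optional, Tuple


def simulate_dual_birth(
    rate_active: float,
    rate_inactive: float,
    num_leaves: int,
    seed: Optional[int] = None,
) -> Tuple[int, int, float]:
    """Simulate lineage counts under a two-rate birth process (illustrative).

    Returns (active lineages, inactive lineages, elapsed time) once the
    total number of lineages reaches num_leaves.
    """
    if rate_active <= 0 or rate_inactive < 0:
        raise ValueError("rate_active must be > 0 and rate_inactive >= 0")
    rng = random.Random(seed)
    active, inactive = 1, 0  # assumption: start from one active lineage
    elapsed = 0.0
    while active + inactive < num_leaves:
        total_rate = active * rate_active + inactive * rate_inactive
        # Waiting time to the next event is exponential in the total rate.
        elapsed += rng.expovariate(total_rate)
        if rng.random() < (active * rate_active) / total_rate:
            # An active lineage propagates: it is replaced by one active
            # and one inactive child, so only the inactive count grows.
            inactive += 1
        else:
            # An inactive lineage activates: it is replaced by one active
            # and one inactive child, so only the active count grows.
            active += 1
    return active, inactive, elapsed
```

With `rate_inactive` set to 0 no inactive lineage ever activates, so a single active lineage keeps budding off inactive ones, which corresponds to a fully unbalanced (ladder-like) tree; raising the ratio of the inactive rate to the active rate lets more lineages propagate and should yield more balanced trees.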
13.
bioRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-37873270

ABSTRACT

Coronaviruses exhibit many mechanisms of genetic innovation [1-5], including the acquisition of accessory genes that originate by capture of cellular genes or through duplication of existing viral genes [6,7]. Accessory genes influence viral host range and cellular tropism, but little is known about how selection acts on these variable regions of virus genomes. We used experimental evolution of mouse hepatitis virus (MHV) encoding a cellular AKAP7 phosphodiesterase and an inactive native phosphodiesterase, NS2 [8], to simulate the capture of a host gene and analyze its evolution. After courses of serial infection, the gene encoding inactive NS2, ORF2, unexpectedly remained intact, suggesting it is under cryptic constraint uncoupled from the function of NS2. In contrast, AKAP7 was retained under strong selection but rapidly lost under relaxed selection. Guided by the retention of ORF2 and similar patterns in related betacoronaviruses, we analyzed ORF8 of SARS-CoV-2, which arose via gene duplication [6] and contains premature stop codons in several globally successful lineages. As with MHV ORF2, the coding-defective SARS-CoV-2 ORF8 gene remains largely intact, mirroring patterns observed during MHV experimental evolution, challenging assumptions on the dynamics of gene loss in virus genomes and extending these findings to viruses currently adapting to humans.

14.
AIDS ; 38(2): 245-254, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37890471

ABSTRACT

OBJECTIVES: This study investigates primary peer-referral engagement (PRE) strategies to assess which strategy results in engaging higher numbers of people with HIV (PWH) who are virally unsuppressed. DESIGN: We develop a modeling study that simulates an HIV epidemic (transmission, disease progression, and viral evolution) over 6 years using an agent-based model followed by simulating PRE strategies. We investigate two PRE strategies where referrals are based on social network strategies (SNS) or sexual partner contact tracing (SPCT). METHODS: We parameterize, calibrate, and validate our study using data from Chicago on Black sexual minority men to assess these strategies for a population with high incidence and prevalence of HIV. For each strategy, we calculate the number of PWH recruited who are undiagnosed or out-of-care (OoC) and the number of direct or indirect transmissions. RESULTS: SNS and SPCT identified 256.5 [95% confidence interval (CI) 234-279] and 15 (95% CI 7-27) PWH, respectively. Of these, SNS identified 159 (95% CI 142-177) PWH OoC and 32 (95% CI 21-43) PWH undiagnosed compared with 9 (95% CI 3-18) and 2 (95% CI 0-5) for SPCT. SNS identified 15.5 (95% CI 6-25) and 7.5 (95% CI 2-11) indirect and direct transmission pairs, whereas SPCT identified 6 (95% CI 0-8) and 5 (95% CI 0-8), respectively. CONCLUSION: With no testing constraints, SNS is the more effective strategy to identify undiagnosed and OoC PWH. Neither strategy is successful at identifying sufficient indirect or direct transmission pairs to investigate transmission networks.


Subject(s)
HIV Infections, Sexual and Gender Minorities, Male, Humans, HIV Infections/epidemiology, Sexual Partners, Social Networking, Contact Tracing
15.
bioRxiv ; 2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37745602

ABSTRACT

Zoonotic spillovers of viruses have occurred through the animal trade worldwide. The start of the COVID-19 pandemic was traced epidemiologically to the Huanan Wholesale Seafood Market, the site with the most reported wildlife vendors in the city of Wuhan, China. Here, we analyze publicly available qPCR and sequencing data from environmental samples collected in the Huanan market in early 2020. We demonstrate that the SARS-CoV-2 genetic diversity linked to this market is consistent with market emergence, and find increased SARS-CoV-2 positivity near and within a particular wildlife stall. We identify wildlife DNA in all SARS-CoV-2 positive samples from this stall. This includes species such as civets, bamboo rats, porcupines, hedgehogs, and one species, raccoon dogs, known to be capable of SARS-CoV-2 transmission. We also detect other animal viruses that infect raccoon dogs, civets, and bamboo rats. Combining metagenomic and phylogenetic approaches, we recover genotypes of market animals and compare them to those from other markets. This analysis provides the genetic basis for a short list of potential intermediate hosts of SARS-CoV-2 to prioritize for retrospective serological testing and viral sampling.

16.
medRxiv ; 2023 Jan 25.
Article in English | MEDLINE | ID: mdl-34704096

ABSTRACT

Background: Schools are high-risk settings for SARS-CoV-2 transmission, but necessary for children's educational and social-emotional wellbeing. Previous research suggests that wastewater monitoring can detect SARS-CoV-2 infections in controlled residential settings with high levels of accuracy. However, its effective accuracy, cost, and feasibility in non-residential community settings are unknown. Methods: The objective of this study was to determine the effectiveness and accuracy of community-based passive wastewater and surface (environmental) surveillance to detect SARS-CoV-2 infection in neighborhood schools compared to weekly diagnostic (PCR) testing. We implemented an environmental surveillance system in nine elementary schools with 1700 regularly present staff and students in southern California. The system was validated from November 2020 to March 2021. Findings: In 447 data collection days across the nine sites, 89 individuals tested positive for COVID-19, and SARS-CoV-2 was detected in 374 surface samples and 133 wastewater samples. Ninety-three percent of identified cases were associated with an environmental sample (95% CI: 88%-98%); 67% were associated with a positive wastewater sample (95% CI: 57%-77%), and 40% were associated with a positive surface sample (95% CI: 29%-52%). The techniques we utilized allowed for near-complete genomic sequencing of wastewater and surface samples. Interpretation: Passive environmental surveillance can detect the presence of COVID-19 cases in non-residential community school settings with a high degree of accuracy. Funding: County of San Diego, Health and Human Services Agency, National Institutes of Health, National Science Foundation, Centers for Disease Control.

17.
Lancet Reg Health Am ; 19: 100449, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36844610

ABSTRACT

Background: Schools are high-risk settings for SARS-CoV-2 transmission, but necessary for children's educational and social-emotional wellbeing. Previous research suggests that wastewater monitoring can detect SARS-CoV-2 infections in controlled residential settings with high levels of accuracy. However, its effective accuracy, cost, and feasibility in non-residential community settings are unknown. Methods: The objective of this study was to determine the effectiveness and accuracy of community-based passive wastewater and surface (environmental) surveillance to detect SARS-CoV-2 infection in neighborhood schools compared to weekly diagnostic (PCR) testing. We implemented an environmental surveillance system in nine elementary schools with 1700 regularly present staff and students in southern California. The system was validated from November 2020 to March 2021. Findings: In 447 data collection days across the nine sites, 89 individuals tested positive for COVID-19, and SARS-CoV-2 was detected in 374 surface samples and 133 wastewater samples. Ninety-three percent of identified cases were associated with an environmental sample (95% CI: 88%-98%); 67% were associated with a positive wastewater sample (95% CI: 57%-77%), and 40% were associated with a positive surface sample (95% CI: 29%-52%). The techniques we utilized allowed for near-complete genomic sequencing of wastewater and surface samples. Interpretation: Passive environmental surveillance can detect the presence of COVID-19 cases in non-residential community school settings with a high degree of accuracy. Funding: County of San Diego, Health and Human Services Agency, National Institutes of Health, National Science Foundation, Centers for Disease Control.

18.
GigaByte ; 2022: gigabyte37, 2022.
Article in English | MEDLINE | ID: mdl-36968795

ABSTRACT

Epidemic simulations require the ability to sample contact networks from various random graph models. Existing methods can simulate city-scale or even country-scale contact networks, but they are unable to feasibly simulate global-scale contact networks due to high memory consumption. NiemaGraphGen (NGG) is a memory-efficient graph generation tool that enables the simulation of global-scale contact networks. NGG avoids storing the entire graph in memory and is instead intended to be used in a data streaming pipeline, resulting in memory consumption that is orders of magnitude smaller than existing tools. NGG provides a massively-scalable solution for simulating social contact networks, enabling global-scale epidemic simulation studies.
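
The streaming approach described in the NiemaGraphGen abstract above can be illustrated with a short Python generator that yields the edges of an Erdős–Rényi random graph one at a time instead of building the graph in memory. This is a minimal sketch of the general idea, not NGG's implementation; the function name and parameters are assumptions, and the simple pairwise loop is chosen for clarity rather than speed.

```python
import random
from typing import Iterator, Optional, Tuple


def stream_erdos_renyi_edges(
    num_nodes: int,
    edge_prob: float,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield edges (u, v) with u < v of a G(n, p) graph, one at a time.

    Only the loop counters and the random generator state are kept in
    memory, so memory use does not grow with the number of edges.
    """
    if not 0.0 <= edge_prob <= 1.0:
        raise ValueError("edge_prob must be between 0 and 1")
    rng = random.Random(seed)
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            # Each possible edge is included independently with
            # probability edge_prob.
            if rng.random() < edge_prob:
                yield (u, v)
```

A consumer can write each edge to standard output as soon as it is produced, for example `for u, v in stream_erdos_renyi_edges(1000, 0.01): print(u, v, sep="\t")`, so a downstream epidemic simulator can read the network as a stream.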

19.
Viruses ; 14(4)2022 04 08.
Article in English | MEDLINE | ID: mdl-35458504

ABSTRACT

The use of viral sequence data to inform public health intervention has become increasingly common in the realm of epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error prone, yet the impacts of such imperfections on downstream epidemiological inferences are poorly understood. To address this, we executed multiple commonly used viral phylogenetic analysis workflows on simulated viral sequence data, modeling Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV), and Ebolavirus, and we computed multiple measures of accuracy, motivated by transmission-clustering techniques. For multiple sequence alignment, MAFFT consistently outperformed MUSCLE and Clustal Omega, in both accuracy and runtime. For phylogenetic inference, FastTree 2, IQ-TREE, RAxML-NG, and PhyML had similar topological accuracies, but branch lengths and pairwise distances were consistently most accurate in phylogenies inferred by RAxML-NG. However, FastTree 2 was the fastest, by orders of magnitude, and when the other tools were used to optimize branch lengths along a fixed FastTree 2 topology, the resulting phylogenies had accuracies that were indistinguishable from their original counterparts, but with a fraction of the runtime.


Subject(s)
Phylogeny, Humans, Molecular Epidemiology, Sequence Alignment, Workflow
20.
J Chem Theory Comput ; 18(7): 4047-4069, 2022 Jul 12.
Article in English | MEDLINE | ID: mdl-35710099

ABSTRACT

Atomistic Molecular Dynamics (MD) simulations provide researchers the ability to model biomolecular structures such as proteins and their interactions with drug-like small molecules with greater spatiotemporal resolution than is otherwise possible using experimental methods. MD simulations are notoriously expensive computational endeavors that have traditionally required massive investment in specialized hardware to access biologically relevant spatiotemporal scales. Our goal is to summarize the fundamental algorithms that are employed in the literature to then highlight the challenges that have affected accelerator implementations in practice. We consider three broad categories of accelerators: Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs). These categories are comparatively studied to facilitate discussion of their relative trade-offs and to gain context for the current state of the art. We conclude by providing insights into the potential of emerging hardware platforms and algorithms for MD.


Subject(s)
Algorithms, Molecular Dynamics Simulation, Computers