ABSTRACT
As SARS-CoV-2 continues to spread and evolve, detecting emerging variants early is critical for public health interventions. Inferring lineage prevalence by clinical testing is infeasible at scale, especially in areas with limited resources, participation, or testing and/or sequencing capacity, which can also introduce biases [1-3]. SARS-CoV-2 RNA concentration in wastewater successfully tracks regional infection dynamics and provides less biased abundance estimates than clinical testing [4,5]. Tracking virus genomic sequences in wastewater would improve community prevalence estimates and detect emerging variants. However, two factors limit wastewater-based genomic surveillance: low-quality sequence data and inability to estimate relative lineage abundance in mixed samples. Here we resolve these critical issues to perform a high-resolution, 295-day wastewater and clinical sequencing effort, in the controlled environment of a large university campus and the broader context of the surrounding county. We developed and deployed improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. We detected emerging variants of concern up to 14 days earlier in wastewater samples, and identified multiple instances of virus spread not captured by clinical genomic surveillance. Our study provides a scalable solution for wastewater genomic surveillance that allows early detection of SARS-CoV-2 variants and identification of cryptic transmission.
Subject(s)
COVID-19 , SARS-CoV-2 , Wastewater-Based Epidemiological Monitoring , Wastewater , COVID-19/epidemiology , COVID-19/transmission , COVID-19/virology , Humans , Viral RNA/analysis , Viral RNA/genetics , SARS-CoV-2/classification , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , RNA Sequence Analysis , Wastewater/virology
ABSTRACT
Bacteroides fragilis is a Gram-negative commensal bacterium commonly found in the human colon, which differentiates into two genomospecies termed divisions I and II. Through a comprehensive collection of 694 B. fragilis whole genome sequences, we identify novel features distinguishing these divisions. Our study reveals a distinct geographic distribution with division I strains predominantly found in North America and division II strains in Asia. Additionally, division II strains are more frequently associated with bloodstream infections, suggesting a distinct pathogenic potential. We report differences between the two divisions in gene abundance related to metabolism, virulence, stress response, and colonization strategies. Notably, division II strains harbor more antimicrobial resistance (AMR) genes than division I strains. These findings offer new insights into the functional roles of division I and II strains, indicating specialized niches within the intestine and potential pathogenic roles in extraintestinal sites. IMPORTANCE: Understanding the distinct functions of microbial species in the gut microbiome is crucial for deciphering their impact on human health. Classifying division II strains as Bacteroides fragilis can lead to erroneous associations, as researchers may mistakenly attribute characteristics observed in division II strains to the more extensively studied division I B. fragilis. Our findings underscore the necessity of recognizing these divisions as separate species with distinct functions. We unveil new findings of differential gene prevalence between division I and II strains in genes associated with intestinal colonization and survival strategies, potentially influencing their role as gut commensals and their pathogenicity in extraintestinal sites. 
Despite the significant niche overlap and colonization patterns between these groups, our study highlights the complex dynamics that govern strain distribution and behavior, emphasizing the need for a nuanced understanding of these microorganisms.
Subject(s)
Bacteroides fragilis , Genetic Variation , Bacterial Genome , Bacteroides fragilis/genetics , Bacteroides fragilis/pathogenicity , Bacteroides fragilis/isolation & purification , Humans , Bacterial Genome/genetics , Gastrointestinal Microbiome/genetics , Phylogeny , Bacteroides Infections/microbiology , Whole Genome Sequencing , Bacterial Drug Resistance/genetics
ABSTRACT
Bacteroides fragilis is a prominent member of the human gut microbiota, playing crucial roles in maintaining gut homeostasis and host health. Although it primarily functions as a beneficial commensal, B. fragilis can become pathogenic. To determine the genetic basis of its duality, we conducted a comparative genomic analysis of 813 B. fragilis strains, representing both commensal and pathogenic origins. Our findings reveal that pathogenic strains emerge across diverse phylogenetic lineages, due in part to rapid gene exchange and the adaptability of the accessory genome. We identified 16 phylogenetic groups, differentiated by genes associated with capsule composition, interspecies competition, and host interactions. A microbial genome-wide association study identified 44 genes linked to extra-intestinal survival and pathogenicity. These findings reveal how genomic diversity within commensal species can lead to the emergence of pathogenic traits, broadening our understanding of microbial evolution in the gut.
ABSTRACT
Despite extensive efforts, extracting information on medication exposure from clinical records remains challenging. To complement this approach, we developed the tandem mass spectrometry (MS/MS) based GNPS Drug Library. This resource integrates MS/MS data for drugs and their metabolites/analogs with controlled vocabularies on exposure sources, pharmacologic classes, therapeutic indications, and mechanisms of action. It enables direct analysis of drug exposure and metabolism from untargeted metabolomics data independent of clinical records. Our library facilitates stratification of individuals in clinical studies based on the empirically detected medications, exemplified by drug-dependent microbiota-derived N-acyl lipid changes in a cohort with human immunodeficiency virus. The GNPS Drug Library holds potential for broader applications in drug discovery and precision medicine.
ABSTRACT
Next-generation sequencing technologies have enabled many advances across diverse areas of biology, with many benefiting from increased sample size. Although the cost of running next-generation sequencing instruments has dropped substantially over time, the cost of sample preparation methods has lagged behind. To counter this, researchers have adapted library miniaturization protocols and large sample pools to maximize the number of samples that can be prepared with a given amount of reagents and sequenced in a single run. However, due to high variability of sample quality, over- and underrepresentation of samples in a sequencing run has become a major issue in high-throughput sequencing. This leads to misinterpretation of results due to increased noise, and to additional time and cost for rerunning underrepresented samples. To overcome this problem, we present a normalization method that uses shallow iSeq sequencing to accurately inform pooling volumes based on read distribution. This method is superior to the widely used fluorometry methods, which cannot specifically target adapter-ligated molecules that contribute to sequencing output. Our normalization method not only quantifies adapter-ligated molecules but also allows normalization of feature space; for example, we can normalize to reads of interest such as non-ribosomal reads. As a result, this normalization method improves the efficiency of high-throughput next-generation sequencing by reducing noise and producing higher average reads per sample with more even sequencing depth. IMPORTANCE: High-throughput next-generation sequencing (NGS) has significantly contributed to the field of genomics; however, further improvements can maximize the potential of this important tool. Uneven sequencing of samples in a multiplexed run is a common issue that leads to unexpected extra costs or low-quality data. To mitigate this problem, we introduce a normalization method based on read counts rather than library concentration.
This method allows for an even distribution of features of interest across samples, improving the statistical power of data sets and preventing the financial loss associated with resequencing libraries. This method optimizes NGS, which already has huge importance across many areas of biology.
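The idea of setting pooling volumes from shallow-run read counts can be illustrated with a minimal sketch. The abstract does not give the formula, so the version below rests on two stated assumptions: all libraries were pooled at equal volume for the shallow run, and read yield scales linearly with pooled volume. The function name `pooling_volumes`, its parameters, and the sample names are hypothetical.

```python
def pooling_volumes(read_counts, total_volume_ul, min_reads=1):
    """Return per-sample pooling volumes (in uL) inversely proportional to read counts.

    read_counts: dict mapping sample name -> reads observed in the shallow run.
        The counts may be restricted to reads of interest (for example,
        non-ribosomal reads) to normalize on that feature space.
    total_volume_ul: total volume of the final pool.
    min_reads: floor applied to counts to avoid division by zero (assumption).
    """
    # A sample that yielded few reads gets a proportionally larger weight.
    weights = {
        sample: 1.0 / max(count, min_reads)
        for sample, count in read_counts.items()
    }
    weight_sum = sum(weights.values())
    # Scale the weights so that the volumes add up to the requested pool volume.
    return {
        sample: total_volume_ul * weight / weight_sum
        for sample, weight in weights.items()
    }


# Illustrative (made-up) shallow-run counts for three libraries.
shallow_counts = {"sample_A": 1000, "sample_B": 2000, "sample_C": 4000}
volumes = pooling_volumes(shallow_counts, total_volume_ul=70.0)
for sample, volume in volumes.items():
    print(f"{sample}: {volume:.1f} uL")
```

With these made-up counts the sketch assigns 40, 20, and 10 uL, so the product of count and volume is the same for every sample. A real protocol would also need to handle minimum pipetting volumes and libraries that fail the shallow run, which this sketch ignores.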
Subject(s)
Genomics , Software , Genomics/methods , DNA Sequence Analysis , Gene Library , High-Throughput Nucleotide Sequencing
ABSTRACT
Bacteroides fragilis is a Gram-negative commensal bacterium commonly found in the human colon that differentiates into two genomospecies termed division I and II. We leverage a comprehensive collection of 694 B. fragilis whole genome sequences and report differential gene abundance to further support the recent proposal that divisions I and II represent separate species. In division I strains, we identify an increased abundance of genes related to complex carbohydrate degradation, colonization, and host niche occupancy, confirming the role of division I strains as gut commensals. In contrast, division II strains display an increased prevalence of plant cell wall degradation genes and exhibit a distinct geographic distribution, primarily originating from Asian countries, suggesting dietary influences. Notably, division II strains have an increased abundance of genes linked to virulence, survival in toxic conditions, and antimicrobial resistance, consistent with a higher incidence of these strains in bloodstream infections. This study provides new evidence supporting a recent proposal for classifying divisions I and II B. fragilis strains as distinct species, and our comparative genomic analysis reveals their niche-specific roles.
ABSTRACT
As SARS-CoV-2 continues to spread and evolve, detecting emerging variants early is critical for public health interventions. Inferring lineage prevalence by clinical testing is infeasible at scale, especially in areas with limited resources, participation, or testing/sequencing capacity, which can also introduce biases. SARS-CoV-2 RNA concentration in wastewater successfully tracks regional infection dynamics and provides less biased abundance estimates than clinical testing. Tracking virus genomic sequences in wastewater would improve community prevalence estimates and detect emerging variants. However, two factors limit wastewater-based genomic surveillance: low-quality sequence data and inability to estimate relative lineage abundance in mixed samples. Here, we resolve these critical issues to perform a high-resolution, 295-day wastewater and clinical sequencing effort, in the controlled environment of a large university campus and the broader context of the surrounding county. We develop and deploy improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. We detect emerging variants of concern up to 14 days earlier in wastewater samples, and identify multiple instances of virus spread not captured by clinical genomic surveillance. Our study provides a scalable solution for wastewater genomic surveillance that allows early detection of SARS-CoV-2 variants and identification of cryptic transmission.
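Estimating relative lineage abundance in a mixed sample can be framed as a small constrained regression. The sketch below is not the authors' deconvolution software, whose algorithm the abstract does not describe. It assumes that each lineage is defined by a binary mutation signature, that the observed frequency of each mutation is the abundance-weighted sum of those signatures, and that abundances are non-negative and sum to one. It uses projected gradient descent on a least-squares objective as a stand-in technique. All names and data are hypothetical.

```python
def project_to_simplex(values):
    """Euclidean projection onto {x : x >= 0, sum(x) == 1}."""
    ordered = sorted(values, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(ordered, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the last valid index
    return [max(v - theta, 0.0) for v in values]


def estimate_abundances(signatures, observed, iterations=2000):
    """Least-squares lineage abundances under simplex constraints.

    signatures: rows are mutations, columns are lineages (1 = lineage carries it).
    observed: mutation frequencies measured in the mixed sample.
    """
    n_mutations = len(observed)
    n_lineages = len(signatures[0])
    # Step size from a bound on the gradient's Lipschitz constant
    # (twice the squared Frobenius norm of the signature matrix).
    step = 1.0 / (2.0 * sum(a * a for row in signatures for a in row))
    x = [1.0 / n_lineages] * n_lineages  # start from a uniform mixture
    for _ in range(iterations):
        residual = [
            sum(a * x_k for a, x_k in zip(row, x)) - b
            for row, b in zip(signatures, observed)
        ]
        gradient = [
            2.0 * sum(signatures[i][k] * residual[i] for i in range(n_mutations))
            for k in range(n_lineages)
        ]
        x = project_to_simplex([x_k - step * g for x_k, g in zip(x, gradient)])
    return x


# Made-up example: five mutations, three hypothetical lineages.
signatures = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
# Frequencies generated from a 60% / 30% / 10% mixture of the three lineages.
observed = [0.6, 0.9, 0.3, 0.1, 0.1]
estimated = estimate_abundances(signatures, observed)
print([round(value, 3) for value in estimated])
```

On this noise-free toy input the estimate should recover the generating mixture closely. Real wastewater data add sequencing noise, uneven coverage, and shared mutations between lineages, so a production tool would need depth weighting and a more robust loss than this sketch provides.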