ABSTRACT
MOTIVATION: Beyond identifying genetic variants, we introduce a set of Boolean relations that allows a comprehensive classification of the relation between every pair of variants by taking all minimal alignments into account. We present an efficient algorithm to compute these relations, including a novel way of efficiently computing all minimal alignments within the best theoretical complexity bounds. RESULTS: We show that these relations are common, and many are non-trivial, for variants of the CFTR gene in dbSNP. Finally, we present an approach for storing and indexing variants in a database that enables efficient querying for all these relations. AVAILABILITY AND IMPLEMENTATION: A Python implementation is available at https://github.com/mutalyzer/algebra/tree/v0.2.0 as well as an interface at https://mutalyzer.nl/algebra.
Subject(s)
Algorithms , Data Management , Databases, Factual , Software
ABSTRACT
It has long been known that biological species can be identified from mass spectrometry data alone. Ten years ago, we described a method and software tool, compareMS2, for calculating a distance between sets of tandem mass spectra, as routinely collected in proteomics. This method has seen use in species identification and mixture characterization in food and feed products, as well as other applications. Here, we present the first major update of this software, including a new metric, a graphical user interface and additional functionality. The data have been deposited to ProteomeXchange with dataset identifier PXD034932.
Subject(s)
Software , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Algorithms
ABSTRACT
BACKGROUND & AIMS: Patients with multiple recurrent Clostridioides difficile infection (rCDI) have a disturbed gut microbiota that can be restored by fecal microbiota transplantation (FMT). Despite extensive screening, healthy feces donors may carry bacteria in their intestinal tract that could have long-term health effects, such as potentially procarcinogenic polyketide synthase-positive (pks+) Escherichia coli. Here, we aim to determine whether pks abundance and the persistence of pks+ E coli are influenced by the pks status of the donor feces. METHODS: In a cohort of 49 patients with rCDI treated with FMT and their matching donor samples (to our knowledge, the largest cohort of its kind), we retrospectively screened fecal metagenomes for pks+ E coli and compared the presence of pks in patients before and after treatment and with their respective donors. RESULTS: The pks island was more prevalent (P = .026) and more abundant (P < .001) in patients with rCDI (pre-FMT, 27 of 49 [55%]; median, 0.46 reads per kilobase per million [RPKM] pks) than in healthy donors (3 of 8 donors [37.5%], 11 of 38 samples [29%]; median, 0.01 RPKM pks). The pks status of patients post-FMT depended on the pks status of the donor suspension with which the patient was treated (P = .046). In particular, persistence (8 of 9 cases) or clearance (13 of 18 cases) of pks+ E coli in pks+ patients was correlated with the pks status of the donor (P = .004). CONCLUSIONS: We conclude that FMT contributes to pks+ E coli persistence or eradication in patients with rCDI but that donor-to-patient transmission of pks+ E coli is unlikely.
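The RPKM values reported above normalize a read count by the length of the target region (in kilobases) and by sequencing depth (in millions of reads). A minimal sketch of that standard normalization follows; the function name and the example numbers are hypothetical, and the abstract does not state which mapping or counting pipeline produced the pks read counts.

```python
def rpkm(mapped_reads: int, region_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of target region per million sequenced reads."""
    if region_length_bp <= 0 or total_reads <= 0:
        raise ValueError("region length and total read count must be positive")
    kilobases = region_length_bp / 1_000        # normalize for region length
    million_reads = total_reads / 1_000_000     # normalize for sequencing depth
    return mapped_reads / (kilobases * million_reads)
```

For example, 50 reads mapping to a hypothetical 10 kb region in a sample of 5 million reads give an RPKM of 1.0, which makes abundances comparable between samples sequenced to different depths.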
Subject(s)
Clostridioides difficile/pathogenicity , Clostridium Infections/therapy , Escherichia coli/growth & development , Fecal Microbiota Transplantation , Gastrointestinal Microbiome , Adult , Aged , Aged, 80 and over , Clostridium Infections/diagnosis , Clostridium Infections/microbiology , Dysbiosis , Escherichia coli/enzymology , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Fecal Microbiota Transplantation/adverse effects , Female , Humans , Male , Metagenome , Metagenomics , Middle Aged , Polyketide Synthases/genetics , Polyketide Synthases/metabolism , Reinfection , Retrospective Studies , Time Factors , Treatment Outcome
ABSTRACT
MOTIVATION: Unambiguous variant descriptions are of utmost importance in clinical genetic diagnostics, scientific literature and genetic databases. The Human Genome Variation Society (HGVS) publishes a comprehensive set of guidelines on how variants should be correctly and unambiguously described. We present the implementation of the Mutalyzer 2 tool suite, designed to apply the HGVS guidelines automatically so that users can check and correct their variant descriptions without having to deal with the intricacies of HGVS explicitly. RESULTS: Mutalyzer is used extensively by the community and has processed over 133 million descriptions since its launch. Over a five-year period, Mutalyzer reported a correct input in ~50% of cases. In 41% of cases, either a syntactic or a semantic error was identified, and for ~7% of cases Mutalyzer was able to correct the description automatically. AVAILABILITY AND IMPLEMENTATION: Mutalyzer is an Open Source project under the GNU Affero General Public License. The source code is available on GitHub (https://github.com/mutalyzer/mutalyzer) and a running instance is available at https://mutalyzer.nl.
Subject(s)
Genetic Variation , Software , Humans , Genome, Human
ABSTRACT
Each year, diagnostic laboratories in the Netherlands profile thousands of individuals for heritable disease using next-generation sequencing (NGS). This requires pathogenicity classification of millions of DNA variants on the standard 5-tier scale. To reduce time spent on data interpretation and to increase data quality and reliability, the nine Dutch labs decided to share their classifications publicly. Variant classifications of nearly 100,000 unique variants were catalogued and compared in a centralized MOLGENIS database. Variants classified by more than one center were labeled "consensus" when classifications agreed, and were shared internationally with LOVD and ClinVar. When classifications were opposed (LB/B vs. LP/P, that is, (likely) benign vs. (likely) pathogenic), variants were labeled "conflicting", while other nonconsensus observations were labeled "no consensus". We assessed our classifications against the ACMG 2015 guidelines using the InterVar software, finding 99.7% overall consistency and only 0.3% discrepancies. Differences in classification between Dutch labs, or between Dutch labs and ACMG, occurred mainly in genes with low penetrance or for late-onset disorders, and highlight limitations of the current 5-tier classification system. Data sharing boosted the quality of DNA diagnostics in Dutch labs, an initiative we hope will be followed internationally. Recently, a positive match with a case from outside our consortium resulted in a more definitive disease diagnosis.
Subject(s)
Genetic Diseases, Inborn/diagnosis , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Information Dissemination/methods , Data Accuracy , Databases, Genetic , Genetic Diseases, Inborn/genetics , Guidelines as Topic , Humans , Laboratories , Netherlands , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Bacteria carry a wide array of genes, some of which have multiple alleles. These different alleles are often responsible for distinct types of virulence and can determine classification at the subspecies level (e.g., housekeeping genes for Multi Locus Sequence Typing, MLST). Therefore, it is important to rapidly detect not only the gene of interest but also the relevant allele. Current sequencing-based methods are limited to mapping reads to each known allele reference, which is a time-consuming procedure. RESULTS: To address this limitation, we developed BacTag, a pipeline that rapidly and accurately detects which genes are present in a sequencing dataset and reports the allele of each identified gene. We exploit the fact that different alleles of the same gene are highly similar. Instead of mapping the reads to each allele reference sequence, we preprocess the database prior to the analysis, which makes the subsequent gene and allele identification efficient. During preprocessing, we determine a representative reference sequence for each gene and store the differences between all alleles and this chosen reference. During the analysis, we estimate whether the gene is present in the sequencing data by mapping the reads to this reference sequence; if the gene is found, we compare the detected variants to those in the preprocessed database. This allows us to detect which specific allele is present in the sequencing data. Our pipeline was successfully tested on artificial WGS data from E. coli, S. pseudintermedius, P. gingivalis, M. bovis, Borrelia spp. and Streptomyces spp., and on real WGS data from E. coli and K. pneumoniae, to report alleles of MLST housekeeping genes. CONCLUSIONS: We developed a new pipeline for fast and accurate gene and allele recognition based on database preprocessing and parallel computing, which performed better than or comparably to current popular tools.
We believe that our approach can be useful for a wide range of projects, including bacterial subspecies classification, clinical diagnostics of bacterial infections, and epidemiological studies.
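The preprocessing idea described above (one representative reference per gene, with each allele stored as its differences from that reference) can be sketched in a few lines. This is not BacTag's code. It assumes substitution-only alleles of equal length, picks the reference as the allele with the smallest summed Hamming distance to all others (an assumption; the abstract does not say how the reference is chosen), and takes the variants observed in the reads as given instead of calling them from mapped reads. The allele names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A variant set: (0-based position, alternative base) pairs relative to the reference.
Diff = FrozenSet[Tuple[int, str]]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def choose_reference(alleles: Dict[str, str]) -> str:
    """Assumed heuristic: the allele with the smallest summed distance to all others."""
    return min(alleles, key=lambda name: sum(hamming(alleles[name], s) for s in alleles.values()))


def diff_against(reference: str, allele: str) -> Diff:
    """Substitutions that turn the reference into the allele."""
    return frozenset((i, b) for i, (a, b) in enumerate(zip(reference, allele)) if a != b)


def preprocess(alleles: Dict[str, str]) -> Tuple[str, Dict[Diff, str]]:
    """One-off step: pick a reference and index every allele by its variant set."""
    reference = alleles[choose_reference(alleles)]
    # Identical allele sequences would share a key; the last one wins in this sketch.
    index = {diff_against(reference, seq): name for name, seq in alleles.items()}
    return reference, index


def call_allele(index: Dict[Diff, str], observed: Diff) -> Optional[str]:
    """Look up the allele whose stored variants exactly match the observed ones."""
    return index.get(observed)


if __name__ == "__main__":
    alleles = {"adk_1": "ACGTACGT", "adk_2": "ACGTTCGT", "adk_3": "ACCTACGT"}
    reference, index = preprocess(alleles)
    print(call_allele(index, frozenset({(4, "T")})))  # adk_2
```

Because the index is keyed on variant sets, identifying the allele after mapping is a single dictionary lookup rather than one alignment per allele, which is the efficiency gain the abstract attributes to preprocessing.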
Subject(s)
Bacteria/classification , Bacteria/genetics , High-Throughput Nucleotide Sequencing/methods , Molecular Typing/methods , Sequence Analysis, DNA/methods , Alleles , Databases, Genetic , Genes, Bacterial , Genome, Bacterial
ABSTRACT
Although previous studies have documented a bottleneck in the transmission of mtDNA genomes from mothers to offspring, several aspects remain unclear, including the size and nature of the bottleneck. Here, we analyze the dynamics of mtDNA heteroplasmy transmission in the Genomes of the Netherlands (GoNL) data, which consist of complete mtDNA genome sequences from 228 trios, eight dizygotic (DZ) twin quartets, and 10 monozygotic (MZ) twin quartets. Using a minor allele frequency (MAF) threshold of 2%, we identified 189 heteroplasmies in the trio mothers, of which 59% were transmitted to offspring, and 159 heteroplasmies in the trio offspring, of which 70% were inherited from the mothers. MZ twin pairs exhibited greater similarity in MAF at heteroplasmic sites than DZ twin pairs, suggesting that the heteroplasmy MAF in the oocyte is the major determinant of the heteroplasmy MAF in the offspring. We used a likelihood method to estimate the effective number of mtDNA genomes transmitted to offspring under different bottleneck models; a variable bottleneck size model provided the best fit to the data, with an estimated mean of nine individual mtDNA genomes transmitted. We also found evidence for negative selection during transmission against novel heteroplasmies (those in which the minor allele has never been observed in polymorphism data). These novel heteroplasmies are enriched in tRNA and rRNA genes, in which mutations associated with mtDNA diseases frequently occur. Our results thus suggest that the female germ line is able to recognize and select against deleterious heteroplasmies.
Subject(s)
DNA, Mitochondrial , Family , Genetic Heterogeneity , Inheritance Patterns , White People/genetics , Alleles , Female , Gene Frequency , Humans , Male , Models, Genetic , Models, Statistical , Mutation , Netherlands , Polymorphism, Genetic , Selection, Genetic , Twins
ABSTRACT
Next-generation sequencing is radically changing how DNA diagnostic laboratories operate. What started as a single-gene profession is now developing into gene panel sequencing and whole-exome and whole-genome sequencing (WES/WGS) analyses. With further advances in sequencing technology and concomitant price reductions, WGS will soon become the standard and be routinely offered. Here, we focus on the critical steps involved in performing WGS, with a particular emphasis on points where WGS differs from WES, the important variables that should be taken into account, and the quality control measures that can be taken to monitor the process. The points discussed here, combined with recent publications on guidelines for reporting variants, will facilitate the routine implementation of WGS into a diagnostic setting.
Subject(s)
Genome, Human/genetics , Exome/genetics , High-Throughput Nucleotide Sequencing , Humans , Methyl-CpG-Binding Protein 2/genetics , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Alternative splicing is a powerful mechanism in eukaryotic cells for obtaining a wide range of transcripts and protein isoforms from a relatively small number of genes. The mechanisms regulating (alternative) splicing and the paradigm of consecutive splicing have recently been challenged, especially for genes with a large number of introns. RNA-Seq, a powerful technology that uses deep sequencing to determine transcript structure and expression levels, is usually performed on mature mRNA and therefore does not allow detailed analysis of splicing progression. Sequencing pre-mRNA at different stages of splicing potentially provides insight into mRNA maturation. Although the number of tools that analyze total and cytoplasmic RNA to elucidate transcriptome composition is growing rapidly, there are no tools specifically designed for the analysis of nuclear RNA (which contains mixtures of pre- and mature mRNA). We developed dedicated algorithms to investigate the splicing process. In this paper, we present a new classification of RNA-Seq reads based on three major stages of splicing: pre-, intermediate- and post-splicing. Applying this novel classification, we demonstrate that the order of splicing can be analyzed. Furthermore, we show its potential for investigating the multi-step nature of splicing by assessing various types of recursive splicing events. We provide data that give biological insight into the order of splicing, and show that non-sequential splicing of certain introns is reproducible and occurs consistently across multiple cell lines. We validated our observations with independent experimental technologies, confirming the reliability of our method. The pipeline, named SplicePie, is freely available at: https://github.com/pulyakhina/splicing_analysis_pipeline. The example data can be found at: https://barmsijs.lumc.nl/HG/irina/example_data.tar.gz.
Subject(s)
Alternative Splicing , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Algorithms , Cell Line , Humans , Introns
ABSTRACT
BACKGROUND: Activation of mast cells through FcεRI plays an important role in acute allergic reactions. However, little is known about the function of mast cells in patients with chronic allergic inflammation or about the effect of the repeated FcεRI triggering that occurs in such responses. OBJECTIVE: We aimed to identify changes in mast cell function after repeated FcεRI triggering and to correlate these changes with chronic allergic responses in tissue. METHODS: Human cord blood-derived mast cells were treated for 2 weeks with anti-IgE. The function of naive or treated mast cells was analyzed by means of RNA sequencing, quantitative RT-PCR, flow cytometry, and functional assays. Protein secretion was measured with ELISAs and multiplex assays. RESULTS: We observed several changes in mast cell function after repeated anti-IgE triggering. Although the acute response was dampened, we identified 289 genes that were significantly upregulated after repeated anti-IgE triggering. Most of these genes (84%) were not upregulated after a single anti-IgE stimulus, indicating a significantly different response mode characterized by increased antigen presentation, response to bacteria, and chemotaxis. Changes in mast cell function were related to changes in the expression of transcription factors, including RXRA and BATF. Importantly, we found a substantial overlap between genes upregulated after repeated anti-IgE triggering and genes upregulated in tissue from patients with chronic allergy, in particular patients with chronic rhinosinusitis. CONCLUSION: Our study provides evidence for intrinsic modulation of mast cell function upon repeated FcεRI-mediated activation. The overlap with gene expression in tissues suggests a direct link between repeated IgE-mediated activation of mast cells and chronic allergy.
Subject(s)
Hypersensitivity/immunology , Mast Cells/immunology , Receptors, IgE/immunology , Antibodies, Anti-Idiotypic/pharmacology , Chronic Disease , Gene Expression , Humans , Hypersensitivity/genetics , Immunoglobulin E/immunology , Mast Cells/drug effects , Transcription Factors/genetics
ABSTRACT
MOTIVATION: Unambiguous sequence variant descriptions are important in reporting the outcome of clinical diagnostic DNA tests. The standard nomenclature of the Human Genome Variation Society (HGVS) describes the observed variant sequence relative to a given reference sequence. We propose an efficient algorithm for the extraction of HGVS descriptions from two sequences with three main requirements in mind: minimizing the length of the resulting descriptions, minimizing the computation time, and keeping the unambiguous descriptions biologically meaningful. RESULTS: Our algorithm is able to compute the HGVS descriptions of complete chromosomes or other large DNA strings in a reasonable amount of computation time, and its resulting descriptions are relatively short. Additional applications include updating the contents of gene variant databases and reference sequence liftovers. AVAILABILITY: The algorithm is accessible as an experimental service in the Mutalyzer program suite (https://mutalyzer.nl). The C++ source code and Python interface are accessible at: https://github.com/mutalyzer/description-extractor. CONTACT: j.k.vis@lumc.nl.
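To illustrate what extracting a description from two sequences involves, the naive sketch below trims the longest common prefix, then the longest common suffix, and reports the remaining difference as a single HGVS-style variant. It is not the published algorithm, which handles complete chromosomes and descriptions with multiple variants. This sketch reports at most one contiguous change, uses 1-based positions, omits the coordinate prefix (such as "g."), does not produce dup or inv descriptions, and rejects insertions at either sequence end. Trimming the prefix first places a deletion inside a repeat at its 3'-most position, in line with the HGVS 3' rule.

```python
def extract(reference: str, observed: str) -> str:
    """Describe `observed` relative to `reference` as at most one HGVS-style variant.

    Naive sketch: a single contiguous change, 1-based positions, no "g." prefix,
    and no dup/inv detection.
    """
    # Longest common prefix first, so changes inside repeats shift 3'-wards.
    limit = min(len(reference), len(observed))
    prefix = 0
    while prefix < limit and reference[prefix] == observed[prefix]:
        prefix += 1
    # Longest common suffix that does not overlap the prefix.
    suffix = 0
    while suffix < limit - prefix and reference[-1 - suffix] == observed[-1 - suffix]:
        suffix += 1
    deleted = reference[prefix:len(reference) - suffix]
    inserted = observed[prefix:len(observed) - suffix]
    if not deleted and not inserted:
        return "="  # sequences are identical
    if not deleted:
        # Insertion between reference positions `prefix` and `prefix + 1`.
        if not 0 < prefix < len(reference):
            raise ValueError("insertion at a sequence end is not handled in this sketch")
        return f"{prefix}_{prefix + 1}ins{inserted}"
    start, end = prefix + 1, prefix + len(deleted)  # 1-based, inclusive
    span = str(start) if start == end else f"{start}_{end}"
    if not inserted:
        return f"{span}del"
    if len(deleted) == 1 and len(inserted) == 1:
        return f"{start}{deleted}>{inserted}"
    return f"{span}delins{inserted}"
```

For example, deleting one A from the run in GAAAT yields "4del" rather than "2del", because the common prefix GAA is consumed before the suffix is considered.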
Subject(s)
Algorithms , Genetic Variation , Sequence Analysis, DNA/methods , Genome, Human , Humans
ABSTRACT
The DMD gene, which encodes the dystrophin protein, is the longest human gene. The 2.2 Mb human dystrophin transcript takes 16 hours to be transcribed and is spliced co-transcriptionally. It contains long introns (24 longer than 10 kb, 5 longer than 100 kb), and this heterogeneity in intron size makes it an ideal transcript to study different aspects of the human splicing process. Splicing is a complex process, and much is unknown about the splicing of long introns in human genes. Here, we used ultra-deep transcript sequencing to characterize splicing of the dystrophin transcripts in 3 different human skeletal muscle cell lines, and explored the order of intron removal and multi-step splicing. Coverage and read pair analyses showed that around 40% of the introns were not always removed sequentially. Additionally, we report for the first time that non-consecutive intron removal resulted in 3 or more joined exons flanked by unspliced introns, which we defined as an exon block. Lastly, computational and experimental data revealed that, for the majority of dystrophin introns, multi-step splicing events are used to splice out a single intron. Overall, our data show, for the first time in a human transcript, that multi-step intron removal is a general feature of mRNA splicing.
Subject(s)
Dystrophin/genetics , RNA Splice Sites , RNA Splicing , Cell Line , Computational Biology/methods , Gene Library , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, RNA/methods
ABSTRACT
MOTIVATION: Advances in sequencing technologies and computational algorithms have enabled the study of genomic variants to dissect their functional consequences. Despite this unprecedented progress, current tools fail to reliably detect and characterize more complex allelic variants, such as short tandem repeats (STRs). We developed TSSV as an efficient and sensitive tool to specifically profile all allelic variants present in targeted loci. Because its design requires only two short flanking sequences, TSSV can work without a complete reference sequence to reliably profile highly polymorphic, repetitive or uncharacterized regions. RESULTS: We show that TSSV can accurately determine allelic STR structures in mixtures with 10% representation of minor alleles, or in complex mixtures in which a single STR allele is shared. Furthermore, we show the broader utility of TSSV in two other independent studies: characterizing de novo mutations introduced by transcription activator-like effector nucleases (TALENs) and profiling the noise and systematic errors in an IonTorrent sequencing experiment. TSSV complements existing tools by aiding the study of highly polymorphic and complex regions and provides a high-resolution map that can be used in a wide range of applications, from personal genomics to forensic analysis and clinical diagnostics. AVAILABILITY AND IMPLEMENTATION: We have implemented TSSV as a Python package that can be installed from the command line with the pip install TSSV command. Its source code and documentation are available at https://pypi.python.org/pypi/tssv and http://www.lgtc.nl/tssv.
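The flank-based idea described above (profiling an allele using only two short flanking sequences, with no full reference) can be illustrated with a minimal sketch: for each read and its reverse complement, locate the left flank, then the right flank after it, and tally the sequence in between as an allele. The function names are hypothetical and flank matching here is exact; the abstract does not describe how TSSV tolerates sequencing errors in the flanks, so this should not be read as TSSV's actual implementation.

```python
from collections import Counter
from typing import Iterable

# Complement table for uppercase DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def profile_alleles(reads: Iterable[str], left: str, right: str) -> Counter:
    """Count the distinct sequences found between a left and a right flank.

    Exact flank matching only; each read is counted at most once, trying the
    forward strand first and then the reverse complement.
    """
    counts: Counter = Counter()
    for read in reads:
        for seq in (read, reverse_complement(read)):
            start = seq.find(left)
            if start == -1:
                continue
            middle_start = start + len(left)
            end = seq.find(right, middle_start)
            if end == -1:
                continue
            counts[seq[middle_start:end]] += 1
            break
    return counts
```

With left flank GATC and right flank TTAG, the reads GATCAGAGAGTTAG, CCGATCAGAGTTAGCC and CTAACTCTGATC (a reverse-strand read) yield one observation of AGAGAG and two of AGAG; the relative counts are what allow minor alleles in a mixture to be quantified.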
Subject(s)
Alleles , Genomics/methods , Microsatellite Repeats , Software , Algorithms , Deoxyribonucleases/metabolism , Dystrophin/genetics , Female , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Male , Mutation , Sequence Analysis, DNA
ABSTRACT
AIMS/HYPOTHESIS: Not all obese individuals develop type 2 diabetes. Why some obese individuals retain normal glucose tolerance (NGT) is not well understood. We hypothesise that the biochemical mechanisms that underlie the function of adipose tissue can help explain the difference between obese individuals with NGT and those with type 2 diabetes. METHODS: RNA sequencing was used to analyse the transcriptome of samples extracted from visceral adipose tissue (VAT) and subcutaneous adipose tissue (SAT) of obese women with NGT or type 2 diabetes who were undergoing bariatric surgery. The gene expression data were analysed using bioinformatic visualisation and statistical analysis techniques. RESULTS: A network-based approach to distinguish obese individuals with NGT from obese individuals with type 2 diabetes identified downregulation of the acetyl-CoA metabolic network as an important feature in the pathophysiology of type 2 diabetes in obese individuals. In general, genes within two reaction steps of acetyl-CoA were found to be downregulated in the VAT and SAT of individuals with type 2 diabetes. Upon weight loss and amelioration of metabolic abnormalities three months after bariatric surgery, the expression levels of these genes recovered to those seen in individuals with NGT. We report four novel genes associated with type 2 diabetes and recovery upon weight loss: ACAT1 (encoding acetyl-CoA acetyltransferase 1), ACACA (encoding acetyl-CoA carboxylase α), ALDH6A1 (encoding aldehyde dehydrogenase 6 family, member A1) and MTHFD1 (encoding methylenetetrahydrofolate dehydrogenase). CONCLUSIONS/INTERPRETATION: Downregulation of the acetyl-CoA network in VAT and SAT is an important feature in the pathophysiology of type 2 diabetes in obese individuals. ACAT1, ACACA, ALDH6A1 and MTHFD1 represent novel biomarkers in adipose tissue associated with type 2 diabetes in obese individuals.
Subject(s)
Acetyl Coenzyme A/metabolism , Adipose Tissue/metabolism , Diabetes Mellitus, Type 2/metabolism , Obesity/enzymology , Acetyl-CoA C-Acetyltransferase/genetics , Acetyl-CoA Carboxylase/genetics , Adipocytes/metabolism , Adult , Female , Humans , Intra-Abdominal Fat/metabolism , Male , Methylenetetrahydrofolate Dehydrogenase (NADP)/genetics , Middle Aged , Minor Histocompatibility Antigens , Obesity/metabolism , Sequence Analysis, RNA , Weight Loss/physiology
ABSTRACT
Duchenne and Becker muscular dystrophies are caused by out-of-frame and in-frame mutations, respectively, in the dystrophin-encoding DMD gene. Molecular therapies targeting the precursor mRNA are in clinical trials and show promising results. These approaches will depend on the stability and expression levels of dystrophin mRNA in skeletal muscle and heart. We report that the DMD gene is more highly expressed in heart than in skeletal muscle, in both mice and humans. The transcript mutated in the mdx mouse model shows a 5' to 3' imbalance compared with that of its wild-type counterpart, and reading frame restoration via antisense-mediated exon skipping does not correct this imbalance. We also report significant transcript instability in 22 patients with Becker dystrophy, indicating that transcript imbalance is not caused by premature nonsense mutations. Finally, we demonstrate that transcript stability, rather than transcriptional rate, is an important determinant of dystrophin protein levels in patients with Becker dystrophy. We suggest that the availability of the complete transcript is a key factor in determining protein abundance and thus will influence the outcome of mRNA-targeting therapies.
Subject(s)
Dystrophin/genetics , RNA, Messenger/metabolism , 3' Untranslated Regions , 5' Untranslated Regions , Animals , Codon, Nonsense , Dystrophin/metabolism , Ecthyma, Contagious , Exons , Humans , Mice , Mice, Inbred mdx , Muscle, Skeletal/metabolism , Muscular Dystrophy, Duchenne/metabolism , Myocardium/metabolism , Nonsense Mediated mRNA Decay , Transcription, Genetic
ABSTRACT
Wastewater-based epidemiological surveillance at municipal wastewater treatment plants has proven to play an important role in COVID-19 surveillance. Because international passenger hubs contribute extensively to the global transmission of viruses, wastewater surveillance at this type of location may be of added value as well. The aim of this study is to explore the potential of long-term wastewater surveillance at a large passenger hub as an additional tool for public health surveillance during different stages of a pandemic. Here, we present an analysis of SARS-CoV-2 viral loads in airport wastewater by reverse-transcription quantitative polymerase chain reaction (RT-qPCR) from the beginning of the COVID-19 pandemic in Feb 2020, and an analysis of SARS-CoV-2 variants by whole-genome next-generation sequencing from Sep 2020, both until Sep 2022, in the Netherlands. Results are contextualized using (inter)national measures and data sources such as passenger numbers, clinical surveillance data and national wastewater surveillance data. Our findings show that wastewater surveillance was possible throughout the study period, irrespective of the measures in place, as viral loads were detected and quantified in 98.6% (273/277) of samples. Emergence of SARS-CoV-2 variants, identified in 91.0% (161/177) of sequenced samples, coincided with increases in viral loads. Furthermore, trends in viral load and variant detection in airport wastewater closely followed, and in some cases preceded, trends in the national daily average viral load in wastewater and in variants detected in clinical surveillance. Wastewater-based epidemiology at a large international airport is a valuable addition to classical COVID-19 surveillance, and the developed expertise can be applied in pandemic preparedness plans for other (emerging) pathogens in the future.
Subject(s)
Airports , COVID-19 , SARS-CoV-2 , Viral Load , Wastewater , COVID-19/epidemiology , Wastewater/virology , Netherlands/epidemiology , Humans , Wastewater-Based Epidemiological Monitoring , Environmental Monitoring/methods
ABSTRACT
It has been postulated that aging is the consequence of an accelerated accumulation of somatic DNA mutations and that subsequent errors in the primary structure of proteins ultimately reach levels sufficient to affect organismal functions. The technical limitations of detecting somatic changes and the lack of insight into the minimum level of erroneous proteins needed to cause an error catastrophe have hampered any firm conclusions on these theories. In this study, we sequenced the whole genomes of DNA from whole blood of two pairs of monozygotic (MZ) twins, 40 and 100 years old, using two independent next-generation sequencing (NGS) platforms (Illumina and Complete Genomics). Potentially discordant single-base substitutions supported by both platforms were validated extensively by Sanger, Roche 454, and Ion Torrent sequencing. We demonstrate that the genomes of the two twin pairs are germ-line identical between co-twins, and that the genomes of the 100-year-old MZ twins differ by eight confirmed somatic single-base substitutions, five of which are within introns. Putative somatic variation between the 40-year-old twins was not confirmed in the validation phase. We conclude from this systematic effort that somatic single nucleotide substitutions can be detected by using two independent NGS platforms, and that a century of life did not result in a large number of detectable somatic mutations in blood. The low number of somatic variants observed using two NGS platforms might provide a framework for detecting disease-related somatic variants in phenotypically discordant MZ twins.
Subject(s)
Aging/genetics , Blood Cells/physiology , Genome, Human , High-Throughput Nucleotide Sequencing , Mutation/genetics , Twins, Monozygotic/genetics , Adult , Aged, 80 and over , Female , Humans , Male , Middle Aged
ABSTRACT
Norovirus is the primary cause of viral gastroenteritis (GE). To investigate norovirus epidemiology, there is a need for whole-genome sequencing and reference sets consisting of complete genomes. To investigate the potential of shotgun metagenomic sequencing on the Illumina platform for whole-genome sequencing, 71 norovirus-positive fecal samples (reverse transcriptase quantitative PCR [RT-qPCR] threshold cycle [CT] <30) from norovirus surveillance within The Netherlands were subjected to metagenomic sequencing. Data were analyzed with an in-house next-generation sequencing (NGS) analysis workflow. Additionally, we assessed the potential of metagenomic sequencing for the surveillance of off-target viruses that are of importance for public health, e.g., sapovirus, rotavirus A, enterovirus, parechovirus, aichivirus, adenovirus, and bocaparvovirus. A total of 60 complete and 10 partial norovirus genomes were generated, representing 7 genogroup I capsid genotypes and 12 genogroup II capsid genotypes. In addition to the norovirus genomes, the metagenomic approach yielded partial or complete genomes of other viruses for 39% of samples from children and 6.7% of samples from adults, including adenovirus 41 (N = 1); aichivirus 1 (N = 1); coxsackievirus A2 (N = 2), A4 (N = 2), A5 (N = 1), and A16 (N = 1); bocaparvovirus 1 (N = 1) and 3 (N = 1); human parechovirus 1 (N = 2) and 3 (N = 1); rotavirus A (N = 1); and a sapovirus GI.7 (N = 1). The sapovirus GI.7 was initially not detected by RT-qPCR, which warranted an update of the primer and probe set. Metagenomic sequencing on the Illumina platform robustly determines complete norovirus genomes and may be used to broaden gastroenteritis surveillance by capturing off-target enteric viruses. IMPORTANCE: Viral gastroenteritis results in significant morbidity and mortality in vulnerable individuals and is primarily caused by norovirus.
To investigate norovirus epidemiology, there is a need for whole-genome sequencing and reference sets consisting of full genomes. Using surveillance samples sent to the Dutch National Institute for Public Health and the Environment (RIVM), we compared metagenomics against conventional techniques, such as RT-qPCR and Sanger sequencing, with norovirus as the target pathogen. We determined that metagenomics is a robust method to generate complete norovirus genomes in parallel with many off-target pathogenic enteric virus genomes, thereby broadening our surveillance efforts. Moreover, we detected a sapovirus that was not detected by our validated gastroenteritis RT-qPCR panel, which exemplifies the strength of metagenomics. Our study shows that metagenomics can be used for public health gastroenteritis surveillance and for generating reference sets for molecular epidemiology, and shows how it compares with current surveillance strategies.
Subject(s)
Adenoviridae Infections , Adenovirus Infections, Human , Enteritis , Enterovirus Infections , Enterovirus , Gastroenteritis , Norovirus , Rotavirus , Sapovirus , Viruses , Child , Adult , Humans , Infant , Public Health , Metagenomics , RNA, Viral/genetics , Gastroenteritis/epidemiology , Rotavirus/genetics , Viruses/genetics , Norovirus/genetics , Adenoviridae/genetics , Sapovirus/genetics , Enterovirus/genetics , Feces
ABSTRACT
The implementation and integration of wastewater-based epidemiology constitute a valuable addition to existing pathogen surveillance systems, such as clinical surveillance for SARS-CoV-2. In the Netherlands, SARS-CoV-2 variant circulation is monitored by performing whole-genome sequencing on wastewater samples. In this manuscript, we describe the detection of an AY.43 lineage (Delta variant) amid a period of BA.5 (Omicron variant) dominance in wastewater samples from two wastewater treatment plants (WWTPs) during August and September 2022. Our results describe a temporary emergence that was absent in samples from other WWTPs and that coincided with peaks in viral load. We show how these lineage estimates can be traced back to lineage-specific substitution patterns. The absence of this variant from reported clinical data, despite the high associated viral loads, suggests cryptic transmission. Our findings highlight the additional value of wastewater surveillance for generating insights into circulating pathogens.