ABSTRACT
PMS2 is one of the DNA-mismatch repair genes included in routine genetic testing for Lynch syndrome and colorectal, ovarian, and endometrial cancers. PMS2 is also included in the American College of Medical Genetics and Genomics' List of Secondary Findings Genes in the context of clinical exome and genome sequencing. However, sequencing of PMS2 by short-read-based next-generation sequencing technologies is complicated by the presence of the pseudogene PMS2CL, and is often supplemented by long-range-based approaches, such as long-range PCR or long-read-based next-generation sequencing, which increases the complexity and cost. This article describes a bioinformatics homology triage workflow that can eliminate the need for long-read-based testing for PMS2 in the vast majority of patients undergoing exome sequencing, thus simplifying PMS2 testing and reducing the associated cost.
Subject(s)
Exome Sequencing , Exons , High-Throughput Nucleotide Sequencing , Mismatch Repair Endonuclease PMS2 , Humans , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/diagnosis , Computational Biology/methods , Exome/genetics , Exome Sequencing/methods , Exons/genetics , Genetic Testing/methods , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Mismatch Repair Endonuclease PMS2/genetics
ABSTRACT
Lymphoid malignancies are a heterogeneous group of hematological disorders characterized by a diverse range of morphologic, immunophenotypic, and clinical features. Next-generation sequencing (NGS) is increasingly being applied to delineate the complex nature of these malignancies and identify high-value biomarkers with diagnostic, prognostic, or therapeutic benefit. However, there are various challenges in using NGS routinely to characterize lymphoid malignancies, including pre-analytic issues, such as sequencing DNA from formalin-fixed, paraffin-embedded tissue, and optimizing the bioinformatic workflow for accurate variant calling and filtering. This study reports the clinical validation of a custom capture-based NGS panel to test for molecular markers in a range of lymphoproliferative diseases and histiocytic neoplasms. The fully validated clinical assay represents an accurate and sensitive tool for detection of single-nucleotide variants and small insertion/deletion events to facilitate the characterization and management of patients with hematologic cancers specifically of lymphoid origin.
Subject(s)
High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Biomarkers, Tumor/genetics , Lymphoma/genetics , Lymphoma/diagnosis , Reproducibility of Results , Polymorphism, Single Nucleotide , Female , Male , Lymphoproliferative Disorders/genetics , Lymphoproliferative Disorders/diagnosis , Mutation , INDEL Mutation
ABSTRACT
Diffuse midline glioma, H3 K27-altered (DMG-H3 K27) is an aggressive group of diffuse gliomas that predominantly occurs in pediatric patients, involves midline structures, and displays loss of H3 p.K28me3 (K27me3) expression by immunohistochemistry and a characteristic genetic/epigenetic profile. Rare examples of a diffuse glioma with an H3 p.K28M (K27M) mutation and without involvement of the midline structures, so-called "diffuse hemispheric glioma with H3 p.K28M (K27M) mutation" (DHG-H3 K27), have been reported. Herein, we describe 2 additional cases of radiologically confirmed DHG-H3 K27 and summarize previously reported cases. We performed histological, immunohistochemical, molecular, and DNA methylation analysis and provided clinical follow-up in both cases. Overall, DHG-H3 K27 is an unusual group of diffuse gliomas that shows clinical, histopathological, genomic, and epigenetic features similar to those of DMG-H3 K27, as well as enrichment for activating alterations in MAPK pathway genes. These findings suggest that DHG-H3 K27 is closely related to DMG-H3 K27 and may represent an unusual presentation of DMG-H3 K27 without apparent midline involvement and with frequent MAPK pathway activation. Detailed reports of additional cases with clinical follow-up will be important to expand our understanding of this unusual group of diffuse gliomas and to better define the clinical outcome and how to classify DHG-H3 K27.
Subject(s)
Brain Neoplasms , Glioma , Humans , Child , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Histones/genetics , Glioma/genetics , Glioma/pathology , Mutation/genetics , Epigenomics
ABSTRACT
BACKGROUND: DNA methylation is one of the most stable and well-characterized epigenetic alterations in humans. Accordingly, it has already found clinical utility as a molecular biomarker in a variety of disease contexts. Existing methods for clinical diagnosis of methylation-related disorders focus on outlier detection in a small number of CpG sites using standardized cutoffs which differentiate healthy from abnormal methylation levels. The standardized cutoff values used in these methods do not take into account methylation patterns which are known to differ between the sexes and with age. RESULTS: Here we profile genome-wide DNA methylation from blood samples drawn from within a cohort composed of healthy controls of different age and sex alongside patients with Prader-Willi syndrome (PWS), Beckwith-Wiedemann syndrome, Fragile-X syndrome, Angelman syndrome, and Silver-Russell syndrome. We propose a Generalized Additive Model to perform age- and sex-adjusted outlier analysis of around 700,000 CpG sites throughout the human genome. Utilizing z-scores among the cohort for each site, we deployed an ensemble-based machine learning pipeline and achieved a combined prediction accuracy of 0.96 (Binomial 95% Confidence Interval 0.868-0.995). CONCLUSION: We demonstrate a method for age- and sex-adjusted outlier detection of differentially methylated loci based on a large cohort of healthy individuals. We present a custom machine learning pipeline utilizing this outlier analysis to classify samples for potential methylation-associated congenital disorders. These methods are able to achieve high accuracy when used with machine learning methods to classify abnormal methylation patterns.
Subject(s)
Beckwith-Wiedemann Syndrome , Silver-Russell Syndrome , Humans , Genomic Imprinting , DNA Methylation , Beckwith-Wiedemann Syndrome/diagnosis , Beckwith-Wiedemann Syndrome/genetics , Silver-Russell Syndrome/diagnosis , Silver-Russell Syndrome/genetics , Supervised Machine Learning
ABSTRACT
Innovation in sequencing instrumentation is increasing the per-batch data volumes and decreasing the per-base costs. Multiplexed chemistry protocols after the addition of index tags have further contributed to efficient and cost-effective sequencer utilization. With these pooled processing strategies, however, comes an increased risk of sample contamination. Sample contamination poses a risk of missing critical variants in a patient sample or wrongly reporting variants derived from the contaminant, which are particularly relevant issues in oncology specimen testing in which low variant allele frequencies have clinical relevance. Small custom-targeted next-generation sequencing (NGS) panels yield limited variants and pose challenges in delineating true somatic variants versus contamination calls. A number of popular contamination identification tools have the ability to perform well in whole-genome/exome sequencing data; however, in smaller gene panels, there are fewer variant candidates for the tools to perform accurately. To prevent clinical reporting of potentially contaminated samples in small next-generation sequencing panels, we have developed MICon (Microhaplotype Contamination detection), a novel contamination detection model that uses microhaplotype site variant allele frequencies. In a heterogeneous hold-out test cohort of 210 samples, the model displayed state-of-the-art performance with an area under the receiver-operating characteristic curve of 0.995.
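One reason microhaplotype sites are informative for contamination can be shown with a small sketch. It rests on a stated assumption: a single-source diploid sample carries at most two haplotypes at a locus, so a third haplotype with read support above the noise floor hints at a second contributor. This illustrates that principle only. It is not the MICon model, and the function name, threshold, and read data are hypothetical.

```python
from collections import Counter


def extra_haplotype_loci(reads_by_locus, min_fraction=0.02):
    """Return loci that show more than two supported haplotypes.

    reads_by_locus: {locus_name: [haplotype string observed on each read]}
    min_fraction: hypothetical noise floor; haplotypes seen in a smaller
                  fraction of reads are treated as sequencing error.
    """
    flagged = {}
    for locus, haplotypes in reads_by_locus.items():
        counts = Counter(haplotypes)
        total = sum(counts.values())
        supported = sorted(h for h, n in counts.items() if n / total >= min_fraction)
        if len(supported) > 2:  # more than a diploid sample can carry
            flagged[locus] = supported
    return flagged


# Hypothetical read-level haplotype calls at three microhaplotype loci.
example_reads = {
    "mh01": ["AC"] * 50 + ["GT"] * 50,               # two haplotypes: clean
    "mh02": ["AC"] * 48 + ["GT"] * 47 + ["AT"] * 5,  # third haplotype at 5%
    "mh03": ["AC"] * 99 + ["GT"] * 100 + ["AT"],     # third haplotype below noise floor
}
example_flagged = extra_haplotype_loci(example_reads)
```

In practice, tumor copy-number changes and sequencing artifacts complicate this picture, which is presumably why a trained model rather than a fixed rule is used.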
Subject(s)
High-Throughput Nucleotide Sequencing , Laboratories , Humans , Workflow , High-Throughput Nucleotide Sequencing/methods , Supervised Machine Learning
ABSTRACT
Capture-based library preparation for next-generation sequencing (NGS) offers a balance between sequencing depth and bioinformatics cost of analysis. Liquid handling automation enhances the reliability of the library preparation process by reducing sample-to-sample variation and substantially enhances throughput, particularly when it can be employed in a 'walk-away' fashion with limited hands-on interaction. This requires the complex series of mixing and heating steps, like those utilized in capture chemistries, to happen on the liquid handler. While developing liquid handling automation for Integrated DNA Technologies (IDT) xGen Exome, Illumina TruSight Oncology 500, and Personal Genome Diagnostics (PGDx) elio Plasma Resolve chemistries on the PerkinElmer Sciclone liquid handler, we found that applying the capture temperatures recommended for manual library preparation results in low yield on automation. To restore the final library yield, we reduced bead binding and/or heated wash temperatures of the Peltier heaters on the liquid handlers by about 10°C. Since this applied across three unique capture-based chemistries, we consider this a generalizable principle of automating capture on the Sciclone. We hypothesize that this is driven by the very different thermodynamic environments represented by a sealed plate on a thermal cycler and a plate with a lid on a Peltier heater. This phenomenon should be considered when automating NGS library preparation on PerkinElmer Sciclone instruments.
Subject(s)
High-Throughput Nucleotide Sequencing , Automation , Gene Library , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Temperature
ABSTRACT
Detecting gene fusions involving driver oncogenes is pivotal in clinical diagnosis and treatment of cancer patients. Recent developments in next-generation sequencing (NGS) technologies have enabled improved assays for bioinformatics-based gene fusion detection. In clinical applications, where a small number of fusions are clinically actionable, targeted polymerase chain reaction (PCR)-based NGS chemistries, such as the QIAseq RNAscan assay, aim to improve accuracy compared to standard RNA sequencing. Existing informatics methods for gene fusion detection in NGS-based RNA sequencing assays traditionally use a transcriptome-based spliced alignment approach or a de-novo assembly approach. Transcriptome-based spliced alignment methods face challenges with short read mapping yielding low quality alignments. De-novo assembly-based methods yield longer contigs from short reads that can be more sensitive for genomic rearrangements, but face performance and scalability challenges. Consequently, there exists a need for a method to efficiently and accurately detect fusions in targeted PCR-based NGS chemistries. We describe SeekFusion, a highly accurate and computationally efficient pipeline enabling identification of gene fusions from PCR-based NGS chemistries. Utilizing biological samples processed with the QIAseq RNAscan assay and in-silico simulated data, we demonstrate that SeekFusion gene fusion detection accuracy outperforms popular existing methods such as STAR-Fusion, TOPHAT-Fusion and JAFFA-hybrid. We also present results from 4,484 patient samples tested for neurological tumors and sarcoma, encompassing details on some novel fusions identified.
ABSTRACT
BACKGROUND: COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches on a data set of this nature enable patient stratification and provide methods to guide clinical treatment. OBJECTIVE: Here, we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population. METHODS: We retrospectively constructed one of the largest reported and most geographically diverse laboratory information system and electronic health record COVID-19 data sets in the published literature, which included 11,807 patients residing in 41 states of the United States of America and treated at medical sites across 5 states in 3 time zones. Traditional machine learning models were evaluated independently as well as in a stacked learner approach by using AutoGluon, and various recurrent neural network architectures were considered. The traditional machine learning models were implemented using the AutoGluon-Tabular framework, whereas the recurrent neural networks utilized the TensorFlow Keras framework. We trained these models to operate solely using routine laboratory measurements and clinical covariates available within 72 hours of a patient's first positive COVID-19 nucleic acid test result. RESULTS: The GRU-D recurrent neural network achieved peak cross-validation performance with 0.938 (SE 0.004) as the area under the receiver operating characteristic (AUROC) curve. This model retained strong performance when the follow-up time was reduced to 12 hours (0.916 [SE 0.005] AUROC), and the leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson comorbidity index, minimum oxygen saturation, fibrinogen level, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901 and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive, 95% CI 0.043-0.106). CONCLUSIONS: Our deep learning approach using GRU-D provides an alert system to flag mortality for COVID-19-positive patients by using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test result.
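The AUROC figures reported above measure how often a model ranks a patient who died above one who survived. A minimal sketch of that rank-based definition follows, using only the standard library. The function name and the toy labels and scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. death), 0 for the negative class.
    scores: predicted risk, higher meaning more likely positive.
    Equals the probability that a random positive outranks a random
    negative, with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two deaths, two survivors.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auroc = auroc(example_labels, example_scores)
```

In the toy data, three of the four positive-negative pairs are ranked correctly, so the value is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.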
Subject(s)
COVID-19 , Clinical Laboratory Information Systems , Deep Learning , Algorithms , Electronic Health Records , Humans , Retrospective Studies , SARS-CoV-2
ABSTRACT
Glycosylation is an important protein modification that involves enzymatic attachment of sugars to amino acid residues. Understanding the structure of these sugars and the effects of glycosylation is vital for developing indicators of disease development and progression. Although computational methods based on mass spectrometric data have proven to be effective in monitoring changes in the glycome, developing such methods for the glycoproteome is challenging, largely due to the inherent complexity in simultaneously studying glycan structures with their corresponding glycosylation sites. This paper introduces a computational framework for identifying intact N-linked glycopeptides, i.e. glycopeptides with N-linked glycans attached to their glycosylation sites, in complex proteome samples. Scoring algorithms are presented for tandem mass spectra of glycopeptides resulting from collision-induced dissociation (CID), higher-energy C-trap dissociation (HCD), and electron transfer dissociation (ETD) fragmentation modes. An empirical false-discovery rate estimation method, based on a target-decoy search approach, is derived for assigning confidence. The power of our method is further enhanced when multiple data sets are pooled together to increase identification confidence. Using this framework, 103 highly confident N-linked glycopeptides from 53 sites across 33 glycoproteins were identified in complex human serum proteome samples using conventional proteomic platforms with standard depletion of the 7 most abundant proteins. These results indicate that our method is ready to be used for characterizing site-specific protein glycosylation in complex samples.
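The target-decoy idea behind the false-discovery rate estimate can be sketched in a few lines. The abstract does not give the estimator, so the code assumes the simplest common form: at a score cutoff, FDR is approximated by the number of decoy matches divided by the number of target matches. The function names, scores, and the 5% limit are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a cutoff as decoy hits / target hits (assumed form)."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing falsely discovered
    return min(1.0, decoys / targets)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest target score cutoff whose estimated FDR is within max_fdr.

    Returns None when no cutoff satisfies the limit.
    """
    for cutoff in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, cutoff) <= max_fdr:
            return cutoff
    return None


# Hypothetical match scores from a target search and a decoy search.
example_targets = [10, 9, 8, 7, 6, 5]
example_decoys = [6, 5, 4]
example_cutoff = threshold_for_fdr(example_targets, example_decoys, max_fdr=0.05)
```

Decoy matches are wrong by construction, so their count above a cutoff serves as a proxy for the number of incorrect target matches that pass the same cutoff.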