ABSTRACT
MOTIVATION: Next-generation sequencing is rapidly improving diagnostic rates in rare Mendelian diseases, but even with whole genome or whole exome sequencing, the majority of cases remain unsolved. Increasingly, RNA sequencing is being used to solve many cases that evade diagnosis through sequencing alone. Specifically, the detection of aberrant splicing in many rare disease patients suggests that identifying RNA splicing outliers is particularly useful for determining causal Mendelian disease genes. However, there is as yet a paucity of statistical methodologies to detect splicing outliers. RESULTS: We developed LeafCutterMD, a new statistical framework that significantly improves the previously published LeafCutter in the context of detecting outlier splicing events. Through simulations and analysis of real patient data, we demonstrate that LeafCutterMD has better power than the state-of-the-art methodology while controlling false-positive rates. When applied to a cohort of disease-affected probands from the Mayo Clinic Center for Individualized Medicine, LeafCutterMD recovered all aberrantly spliced genes that had previously been identified by manual curation efforts. AVAILABILITY AND IMPLEMENTATION: The source code for this method is available under the open-source Apache 2.0 license in the latest release of the LeafCutter software package available online at http://davidaknowles.github.io/leafcutter. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genome; Rare Diseases; Algorithms; High-Throughput Nucleotide Sequencing; Humans; RNA Splicing; Rare Diseases/diagnosis; Rare Diseases/genetics; Sequence Analysis, RNA; Software
ABSTRACT
Lymphoid malignancies are a heterogeneous group of hematological disorders characterized by a diverse range of morphologic, immunophenotypic, and clinical features. Next-generation sequencing (NGS) is increasingly being applied to delineate the complex nature of these malignancies and identify high-value biomarkers with diagnostic, prognostic, or therapeutic benefit. However, there are various challenges in using NGS routinely to characterize lymphoid malignancies, including pre-analytic issues, such as sequencing DNA from formalin-fixed, paraffin-embedded tissue, and optimizing the bioinformatic workflow for accurate variant calling and filtering. This study reports the clinical validation of a custom capture-based NGS panel to test for molecular markers in a range of lymphoproliferative diseases and histiocytic neoplasms. The fully validated clinical assay represents an accurate and sensitive tool for detection of single-nucleotide variants and small insertion/deletion events to facilitate the characterization and management of patients with hematologic cancers specifically of lymphoid origin.
Subject(s)
High-Throughput Nucleotide Sequencing; Humans; High-Throughput Nucleotide Sequencing/methods; Biomarkers, Tumor/genetics; Lymphoma/genetics; Lymphoma/diagnosis; Reproducibility of Results; Polymorphism, Single Nucleotide; Female; Male; Lymphoproliferative Disorders/genetics; Lymphoproliferative Disorders/diagnosis; Mutation; INDEL Mutation
ABSTRACT
PMS2 is one of the DNA-mismatch repair genes included in routine genetic testing for Lynch syndrome and colorectal, ovarian, and endometrial cancers. PMS2 is also included in the American College of Medical Genetics and Genomics' List of Secondary Findings Genes in the context of clinical exome and genome sequencing. However, sequencing of PMS2 by short-read-based next-generation sequencing technologies is complicated by the presence of the pseudogene PMS2CL, and is often supplemented by long-range-based approaches, such as long-range PCR or long-read-based next-generation sequencing, which increases the complexity and cost. This article describes a bioinformatics homology triage workflow that can eliminate the need for long-read-based testing for PMS2 in the vast majority of patients undergoing exome sequencing, thus simplifying PMS2 testing and reducing the associated cost.
Subject(s)
Exome Sequencing; Exons; High-Throughput Nucleotide Sequencing; Mismatch Repair Endonuclease PMS2; Humans; Colorectal Neoplasms, Hereditary Nonpolyposis/genetics; Colorectal Neoplasms, Hereditary Nonpolyposis/diagnosis; Computational Biology/methods; Exome/genetics; Exome Sequencing/methods; Exons/genetics; Genetic Testing/methods; Genetic Variation; High-Throughput Nucleotide Sequencing/methods; Mismatch Repair Endonuclease PMS2/genetics
ABSTRACT
Innovation in sequencing instrumentation is increasing the per-batch data volumes and decreasing the per-base costs. Multiplexed chemistry protocols after the addition of index tags have further contributed to efficient and cost-effective sequencer utilization. With these pooled processing strategies, however, comes an increased risk of sample contamination. Sample contamination poses a risk of missing critical variants in a patient sample or wrongly reporting variants derived from the contaminant, which are particularly relevant issues in oncology specimen testing in which low variant allele frequencies have clinical relevance. Small custom-targeted next-generation sequencing (NGS) panels yield limited variants and pose challenges in delineating true somatic variants versus contamination calls. A number of popular contamination identification tools have the ability to perform well in whole-genome/exome sequencing data; however, in smaller gene panels, there are fewer variant candidates for the tools to perform accurately. To prevent clinical reporting of potentially contaminated samples in small next-generation sequencing panels, we have developed MICon (Microhaplotype Contamination detection), a novel contamination detection model that uses microhaplotype site variant allele frequencies. In a heterogeneous hold-out test cohort of 210 samples, the model displayed state-of-the-art performance with an area under the receiver-operating characteristic curve of 0.995.
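As a rough illustration of why variant allele frequencies (VAFs) carry a contamination signal, the sketch below uses a deliberately simplified single-SNP view. It is not the MICon model, which the abstract describes only as using microhaplotype-site VAFs. The function names, the example VAF values, and the 10% contamination fraction are all hypothetical. The idea shown is only that germline VAFs in a pure diploid sample cluster near 0, 0.5 and 1, and that admixed reads pull them away from those values.

```python
from statistics import mean

# Expected germline VAFs in a pure diploid sample:
# homozygous reference, heterozygous, homozygous alternate.
EXPECTED_VAFS = (0.0, 0.5, 1.0)


def vaf_deviation(vafs):
    """Mean distance of each VAF from its nearest expected genotype VAF."""
    if not vafs:
        raise ValueError("need at least one VAF")
    return mean(min(abs(v - e) for e in EXPECTED_VAFS) for v in vafs)


def mix(host_vafs, contaminant_vafs, fraction):
    """Expected VAFs when `fraction` of the reads come from a contaminant."""
    return [(1 - fraction) * h + fraction * c
            for h, c in zip(host_vafs, contaminant_vafs)]


# Hypothetical per-site VAFs for a host sample and a contaminating sample.
host = [0.0, 0.5, 1.0, 0.5, 0.0]
contaminant = [0.5, 0.0, 0.0, 1.0, 1.0]

clean_score = vaf_deviation(host)
contaminated_score = vaf_deviation(mix(host, contaminant, 0.10))
```

In this toy setting the deviation score rises once contaminant reads are mixed in. With only a handful of sites, as in a small panel, such a score is noisy, which is consistent with the difficulty the abstract describes for small gene panels.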
Subject(s)
High-Throughput Nucleotide Sequencing; Laboratories; Humans; Workflow; High-Throughput Nucleotide Sequencing/methods; Supervised Machine Learning
ABSTRACT
Expansion of CTG trinucleotide repeats (TNR) in the transcription factor 4 (TCF4) gene is highly associated with Fuchs Endothelial Corneal Dystrophy (FECD). Due to limitations in the availability of DNA from diseased corneal endothelium, sizing of CTG repeats in FECD patients has typically been determined using DNA samples isolated from peripheral blood leukocytes. However, it is non-feasible to extract enough DNA from surgically isolated FECD corneal endothelial tissue to determine repeat length based on current technology. To circumvent this issue, total RNA was isolated from FECD corneal endothelium and sequenced using long-read sequencing. Southern blotting of DNA samples isolated from primary cultures of corneal endothelium from these same affected individuals was also assessed. Both long read sequencing and Southern blot analysis showed significantly longer CTG TNR expansion (>1000 repeats) in the corneal endothelium from FECD patients than those characterized in leukocytes from the same individuals (<90 repeats). Our findings suggest that the TCF4 CTG repeat expansions in the FECD corneal endothelium are much longer than those found in leukocytes.
Subject(s)
DNA/genetics; Endothelium, Corneal/pathology; Fuchs' Endothelial Dystrophy/pathology; Leukocytes/pathology; Transcription Factor 4/genetics; Trinucleotide Repeat Expansion; Aged; Aged, 80 and over; Blotting, Southern; Child; DNA/analysis; Endothelium, Corneal/metabolism; Female; Fuchs' Endothelial Dystrophy/epidemiology; Fuchs' Endothelial Dystrophy/genetics; Genetic Predisposition to Disease; Genotype; Humans; Leukocytes/metabolism; Male; Middle Aged; Polymerase Chain Reaction
ABSTRACT
Trichorhinophalangeal syndrome type I (TRPSI) is a rare disorder that causes distinctive ectodermal, facial, and skeletal features affecting the hair (tricho-), nose (rhino-), and fingers and toes (phalangeal) and is inherited in an autosomal dominant pattern. TRPSI is caused by loss-of-function variants in TRPS1, involved in the regulation of chondrocyte and perichondrium development. Pathogenic variants in TRPS1 include missense mutations and deletions with variable breakpoints, with only a single instance of an intragenic duplication reported to date. Here we report an affected individual presenting with a classic TRPSI phenotype who is heterozygous for a de novo intragenic ~36.3-kbp duplication affecting exons 2-4 of TRPS1. Molecular analysis revealed the duplication to be in direct tandem orientation affecting the splicing of TRPS1. The aberrant transcripts are predicted to produce a truncated TRPS1 missing the nuclear localization signal and the GATA and IKAROS-like zinc-finger domains resulting in functional TRPS1 haploinsufficiency. Our study identifies a novel intragenic tandem duplication of TRPS1 and highlights the importance of molecular characterization of intragenic duplications.
Subject(s)
Fingers/abnormalities; Hair Diseases/genetics; Langer-Giedion Syndrome/genetics; Nose/abnormalities; Repressor Proteins/genetics; Aged; Child; DNA-Binding Proteins/genetics; Exons/genetics; Family; Female; Gene Duplication/genetics; Hair Diseases/etiology; Humans; Langer-Giedion Syndrome/etiology; Male; Middle Aged; Mutation; Mutation, Missense/genetics; Pedigree; Phenotype; RNA Splicing/genetics; Repressor Proteins/metabolism; Sequence Deletion/genetics; Transcription Factors/genetics; Zinc Fingers/genetics
ABSTRACT
Amplification of a CAG trinucleotide motif (CTG18.1) within the TCF4 gene has been strongly associated with Fuchs Endothelial Corneal Dystrophy (FECD). Nevertheless, a small minority of clinically unaffected elderly patients who have expanded CTG18.1 sequences have been identified. To test the hypothesis that the CAG expansions in these patients are protected from FECD because they have interruptions within the CAG repeats, we utilized a combination of an amplification-free, long-read sequencing method and a new target-enrichment sequence analysis tool developed by Pacific Biosciences to interrogate the sequence structure of expanded repeats. The sequencing was successful in identifying a previously described interruption within an unexpanded allele and provided sequence data on expanded alleles greater than 2000 bases in length. The data revealed considerable heterogeneity in the size distribution of expanded repeats within each patient. Detailed analysis of the long sequence reads did not reveal any instances of interruptions to the expanded CAG repeats, but did reveal novel variants within the AGG repeats that flank the CAG repeats in two of the five samples from clinically unaffected patients with expansions. This first examination of the sequence structure of CAG repeats in CTG18.1 suggests that factors other than interruptions to the repeat structure account for the absence of disease in some elderly patients with repeat expansions in the TCF4 gene.
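To make the notion of an "interruption" within a repeat tract concrete, here is a minimal sketch that walks a sequence in motif-sized steps and reports any in-frame unit that differs from the motif. It is not the Pacific Biosciences analysis tool mentioned in the abstract. The function name `scan_repeat` and the example sequences are hypothetical. It also assumes the read has already been trimmed to the repeat region and that interruptions are substitutions, not insertions or deletions that shift the frame.

```python
def scan_repeat(seq, motif="CAG"):
    """Count in-frame repeat units and list units that differ from `motif`.

    Returns (n_units, interruptions), where `interruptions` holds
    (unit_index, unit_sequence) pairs for non-motif units that lie
    between motif units. Trailing non-motif units are treated as
    flanking sequence and dropped.
    """
    start = seq.find(motif)
    if start < 0:
        return 0, []
    k = len(motif)
    # Split the sequence into consecutive motif-sized units from `start`.
    units = [seq[i:i + k] for i in range(start, len(seq) - k + 1, k)]
    # Drop trailing units that are not the motif (assumed to be flank).
    while units and units[-1] != motif:
        units.pop()
    interruptions = [(i, u) for i, u in enumerate(units) if u != motif]
    return len(units), interruptions


# Hypothetical examples: an uninterrupted tract, a tract with a single
# CAA interruption, and a tract followed by a flanking AGG unit.
pure = scan_repeat("CAG" * 4)
interrupted = scan_repeat("CAGCAGCAACAGCAG")
flanked = scan_repeat("TTCAGCAGCAGAGG")
```

Because the scan is frame-based, a chance in-frame motif in the flanking sequence would be counted as part of the tract, so real data would need the flanks identified first.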
Subject(s)
Fuchs' Endothelial Dystrophy/genetics; Gene Amplification; Genetic Predisposition to Disease; Transcription Factor 4/genetics; Trinucleotide Repeat Expansion; Alleles; Computational Biology/methods; Fuchs' Endothelial Dystrophy/diagnosis; Gene Editing; Genetic Association Studies; Genomics/methods; Genotype; Humans; Phenotype; RNA, Guide, Kinetoplastida; Trinucleotide Repeats
ABSTRACT
BACKGROUND: RNA sequencing has been proposed as a means of increasing diagnostic rates in studies of undiagnosed rare inherited disease. Recent studies have reported diagnostic improvements in the range of 7.5-35% by profiling splicing, gene expression quantification and allele-specific expression. To date, however, no study has systematically assessed the presence of gene-fusion transcripts in cases of germline disease. Fusion transcripts are routinely identified in cancer studies and are increasingly recognized as having diagnostic, prognostic or therapeutic relevance. Isolated reports exist of fusion transcripts being detected in cases of developmental and neurological phenotypes, and thus, systematic application of fusion detection to germline conditions may further increase diagnostic rates. However, current fusion detection methods are unsuited to the investigation of germline disease due to performance biases arising from their development using tumor, cell-line or in-silico data. METHODS: We describe a tailored approach to fusion candidate identification and prioritization in a cohort of 47 undiagnosed, suspected inherited disease patients. We modify an existing fusion transcript detection algorithm by eliminating its cell line-derived filtering steps, and instead, prioritize candidates using a custom workflow that integrates genomic and transcriptomic sequence alignment, biological and technical annotations, customized categorization logic, and phenotypic prioritization. RESULTS: We demonstrate that our approach to fusion transcript identification and prioritization detects genuine fusion events excluded by standard analyses and efficiently removes phenotypically unimportant candidates and false positive events, resulting in a reduced candidate list enriched for events with potential phenotypic relevance. We describe the successful genetic resolution of two previously undiagnosed disease cases through the detection of pathogenic fusion transcripts. Furthermore, we report the experimental validation of five additional cases of fusion transcripts with potential phenotypic relevance. CONCLUSIONS: The approach we describe can be implemented to enable the detection of phenotypically relevant fusion transcripts in studies of rare inherited disease. Fusion transcript detection has the potential to increase diagnostic rates in rare inherited disease and should be included in RNA-based analytical pipelines aimed at genetic diagnosis.