ABSTRACT
MOTIVATION: Detection of germline variants in next-generation sequencing data is an essential component of modern genomics analysis. Variant detection tools typically rely on statistical algorithms such as de Bruijn graphs or Hidden Markov Models, and are often coupled with heuristic techniques and thresholds to maximize accuracy. Despite significant progress in recent years, current methods still generate thousands of false positive detections in a typical human whole genome, creating a significant manual review burden. RESULTS: We introduce a new approach that replaces the handcrafted statistical techniques of previous methods with a single deep generative model. Using a standard transformer-based encoder and double-decoder architecture, our model learns to construct diploid germline haplotypes in a generative fashion identical to modern Large Language Models (LLMs). We train our model on 37 Whole Genome Sequences (WGS) from Genome-in-a-Bottle samples, and demonstrate that our method learns to produce accurate haplotypes with correct phase and genotype for all classes of small variants. We compare our method, called Jenever, to FreeBayes, GATK HaplotypeCaller, Clair3 and DeepVariant, and demonstrate that our method has superior overall accuracy compared to other methods. At F1-maximizing quality thresholds, our model delivers the highest sensitivity, precision, and the fewest genotyping errors for insertion and deletion variants. For single nucleotide variants our model demonstrates the highest sensitivity but at somewhat lower precision, and achieves the highest overall F1 score among all callers we tested. AVAILABILITY AND IMPLEMENTATION: Jenever is implemented as a Python-based command line tool. Source code is available at https://github.com/ARUP-NGS/jenever/.
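The "F1-maximizing quality threshold" used in the comparison above can be illustrated with a minimal sketch. This is not Jenever's evaluation code: the function names and the toy call set are hypothetical, and it assumes each call has already been labelled true or false against a truth set.

```python
def precision_sensitivity_f1(tp, fp, fn):
    """Return (precision, sensitivity, F1) from TP, FP and FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f1


def best_f1_threshold(calls, n_truth):
    """Scan every observed quality value as a cutoff (keep quality >= cutoff)
    and return the (cutoff, F1) pair with the highest F1.

    calls   -- list of (quality, is_true_variant) tuples
    n_truth -- number of variants in the truth set
    """
    best_cutoff, best_f1 = None, -1.0
    for cutoff in sorted({quality for quality, _ in calls}):
        kept = [is_true for quality, is_true in calls if quality >= cutoff]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = n_truth - tp  # truth variants that were missed or filtered out
        _, _, f1 = precision_sensitivity_f1(tp, fp, fn)
        if f1 > best_f1:
            best_cutoff, best_f1 = cutoff, f1
    return best_cutoff, best_f1


# Hypothetical toy data: seven calls scored against a truth set of five variants.
toy_calls = [(50, True), (45, True), (40, True), (30, False),
             (25, True), (10, False), (5, False)]
threshold, f1 = best_f1_threshold(toy_calls, n_truth=5)
```

Raising the cutoff removes false positives but eventually discards true variants as well, so F1 peaks at an intermediate quality value; comparing callers at their own F1-maximizing cutoffs avoids penalizing a caller for a poorly chosen default filter.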
ABSTRACT
PURPOSE: Variants of uncertain significance (VUS) are a common result of diagnostic genetic testing and can be difficult to manage, with potential misinterpretation and downstream costs, including time investment by clinicians. We investigated the rate of VUS reported on diagnostic testing via multi-gene panels (MGPs) and exome and genome sequencing (ES/GS) to measure the magnitude of uncertain results and explore ways to reduce their potentially detrimental impact. METHODS: Rates of inconclusive results due to VUS were collected from over 1.5 million sequencing test results from 19 clinical laboratories in North America from 2020 to 2021. RESULTS: We found a lower rate of inconclusive test results due to VUS from ES/GS (22.5%) compared with MGPs (32.6%; P < .0001). For MGPs, the rate of inconclusive results correlated with panel size. The use of trios reduced inconclusive rates (18.9% vs 27.6%; P < .0001), whereas the use of GS compared with ES had no impact (22.2% vs 22.6%; P = ns). CONCLUSION: The high rate of VUS observed in diagnostic MGP testing warrants examining current variant reporting practices. We propose several approaches to reduce reported VUS rates, while directing clinician resources toward important VUS follow-up.
Subject(s)
Genetic Predisposition to Disease; Genetic Testing; Humans; Genetic Testing/methods; Genomics; Exome/genetics; North America
ABSTRACT
BACKGROUND: Copy number variants (CNVs) play a significant role in human heredity and disease. However, sensitive and specific characterization of germline CNVs from NGS data has remained challenging, particularly for hybridization-capture data in which read counts are the primary source of copy number information. RESULTS: We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs with sizes spanning a single exon to a full chromosome. CONCLUSIONS: In both the simulation and previously detected CNV studies Cobalt shows similar sensitivity but significantly fewer false positive detections compared to other callers. Overall sensitivity is 80-90% for deletion CNVs spanning 1-4 targets and 90-100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs.
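The difference between the two decoders named above can be made concrete with a generic textbook sketch. This is not Cobalt's implementation: the two-state model, the discretized depth observations and every parameter value are hypothetical. Viterbi returns the single most probable state path, whereas pointwise MAP (PMAP) picks, at each target independently, the state with the highest posterior probability from the forward-backward algorithm.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most probable joint state path (all probabilities must be non-zero)."""
    k = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(k)]
    back = []
    for o in obs[1:]:
        step, new_score = [], []
        for s in range(k):
            prev = max(range(k), key=lambda p: score[p] + math.log(trans[p][s]))
            step.append(prev)
            new_score.append(score[prev] + math.log(trans[prev][s])
                             + math.log(emit[s][o]))
        back.append(step)
        score = new_score
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for step in reversed(back):  # trace back pointers from the last position
        state = step[state]
        path.append(state)
    return path[::-1]


def posteriors(obs, start, trans, emit):
    """Per-position state posteriors via rescaled forward-backward."""
    k, n = len(start), len(obs)
    cur = [start[s] * emit[s][obs[0]] for s in range(k)]
    fwd = [[c / sum(cur) for c in cur]]
    for o in obs[1:]:
        cur = [emit[s][o] * sum(fwd[-1][p] * trans[p][s] for p in range(k))
               for s in range(k)]
        fwd.append([c / sum(cur) for c in cur])
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[s][q] * emit[q][obs[t + 1]] * bwd[t + 1][q]
                   for q in range(k)) for s in range(k)]
        bwd[t] = [c / sum(cur) for c in cur]
    post = []
    for f, b in zip(fwd, bwd):
        raw = [fi * bi for fi, bi in zip(f, b)]
        post.append([r / sum(raw) for r in raw])  # renormalize per position
    return post


def pmap(obs, start, trans, emit):
    """Pointwise MAP: argmax of the posterior at each position."""
    return [max(range(len(p)), key=lambda s: p[s])
            for p in posteriors(obs, start, trans, emit)]


# Hypothetical toy model: state 0 = normal copy number, state 1 = deletion;
# observation 0 = typical depth, observation 1 = reduced depth.
start = [0.99, 0.01]
trans = [[0.99, 0.01], [0.10, 0.90]]
emit = [[0.95, 0.05], [0.10, 0.90]]
depth_obs = [0, 0, 1, 1, 1, 0, 0]

viterbi_path = viterbi(depth_obs, start, trans, emit)
pmap_path = pmap(depth_obs, start, trans, emit)
```

On this clear three-target event both decoders agree. They can diverge on short or weakly supported events, because Viterbi must pay two transition penalties to enter and leave a state along one path, while PMAP sums evidence over all paths passing through each position.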
Subject(s)
DNA Copy Number Variations; High-Throughput Nucleotide Sequencing; Algorithms; Computer Simulation; Exons; Germ Cells; High-Throughput Nucleotide Sequencing/methods; Humans
Subject(s)
Class I Phosphatidylinositol 3-Kinases/genetics; GTP-Binding Protein alpha Subunits/genetics; MAP Kinase Kinase 1/genetics; Mosaicism; Proto-Oncogene Proteins p21(ras)/genetics; Cell-Free Nucleic Acids/blood; Cell-Free Nucleic Acids/genetics; Child; Child, Preschool; Class I Phosphatidylinositol 3-Kinases/blood; Female; GTP-Binding Protein alpha Subunits/blood; Genetic Predisposition to Disease; Genetic Variation/genetics; High-Throughput Nucleotide Sequencing; Humans; MAP Kinase Kinase 1/blood; Male; Mutation/genetics; Proto-Oncogene Proteins p21(ras)/blood
ABSTRACT
A 13-year-old child presented with three simultaneous malignancies: glioblastoma multiforme, Burkitt lymphoma, and colonic adenocarcinoma. She was treated for her diseases without success and died 8 months after presentation. Genetic analysis revealed a homozygous mutation in the PMS2 gene, consistent with constitutional mismatch repair deficiency. Her siblings and parents were screened: three of four siblings and both parents were heterozygous for this mutation; the fourth sibling did not have the mutation.
Subject(s)
Adenosine Triphosphatases/genetics; Brain Neoplasms/genetics; Colorectal Neoplasms/genetics; DNA Repair Enzymes/genetics; DNA-Binding Proteins/genetics; Neoplasms, Multiple Primary/genetics; Neoplastic Syndromes, Hereditary/genetics; Adenocarcinoma/genetics; Adolescent; Burkitt Lymphoma/genetics; Colonic Neoplasms/genetics; Female; Glioblastoma/genetics; Humans; Mismatch Repair Endonuclease PMS2; Mutation; Pedigree
ABSTRACT
Mendelian disorders are prevalent in neonatal and pediatric intensive care units and are a leading cause of morbidity and mortality in these settings. Current diagnostic pipelines that integrate phenotypic and genotypic data are expert-dependent and time-intensive. Artificial intelligence (AI) tools may help address these challenges. Dx29 is an open-source AI tool designed for use by clinicians. It analyzes the patient's phenotype and genotype to generate a ranked differential diagnosis. We used Dx29 to retrospectively analyze 25 acutely ill infants who had been diagnosed with a Mendelian disorder, using a targeted panel of ~5000 genes. For each case, a trio (proband and both parents) file containing gene variant information was analyzed, alongside patient phenotype, which was provided to Dx29 by three approaches: (1) AI extraction from medical records, (2) AI extraction with manual review/editing, and (3) manual entry. We then identified the rank of the correct diagnosis in Dx29's differential diagnosis. With these three approaches, Dx29 ranked the correct diagnosis in the top 10 in 92-96% of cases. These results suggest that non-expert use of Dx29's automated phenotyping and subsequent data analysis may compare favorably to standard workflows utilized by bioinformatics experts to analyze genomic data and diagnose Mendelian diseases.
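The top-10 figure reported above is a top-k hit rate over the ranked differential. A minimal sketch of that calculation follows; the function name and the rank list are hypothetical (the abstract does not give per-case ranks), and it assumes all 25 cases were scored under each approach, in which case 92% and 96% correspond to 23 and 24 cases.

```python
def top_k_rate(ranks, k=10):
    """Fraction of cases whose correct diagnosis appears within the top k.

    ranks -- 1-based rank of the correct diagnosis per case,
             or None when it is absent from the differential.
    """
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


# Hypothetical ranks for 25 cases: 23 fall inside the top 10, one is ranked
# 14th and one correct diagnosis is missing from the list.
example_ranks = [1] * 15 + [2, 3, 3, 5, 7, 8, 9, 10] + [14, None]
rate = top_k_rate(example_ranks, k=10)
```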
ABSTRACT
BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase 2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
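Why a VAF reporting threshold lowers sensitivity can be sketched with a simple binomial read-sampling model. This is an illustrative assumption, not the study's analysis: the function name, the 1000x depth and the thresholds are hypothetical, and real assays have additional sources of VAF variability.

```python
from math import comb


def prob_reported(true_vaf, depth, min_alt_reads):
    """P(at least min_alt_reads variant-supporting reads) when each of `depth`
    reads independently carries the variant with probability true_vaf."""
    below = sum(
        comb(depth, k) * true_vaf ** k * (1 - true_vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - below


# Hypothetical scenario: a variant with a true VAF of 5% sequenced at 1000x.
# A 5% reporting threshold requires >= 50 variant reads; 2% requires >= 20.
at_5_percent = prob_reported(0.05, 1000, 50)
at_2_percent = prob_reported(0.05, 1000, 20)
```

Under this model a variant whose true VAF sits exactly at the threshold is reported only about half the time, because sampling noise pushes the observed VAF below the cutoff in roughly every second replicate, while the same variant clears a 2% threshold almost always.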
Subject(s)
Biomarkers, Tumor; Genetic Testing/methods; Genomics/methods; Neoplasms/genetics; Oncogenes; DNA Copy Number Variations; Genetic Testing/standards; Genomics/standards; Humans; Molecular Diagnostic Techniques/methods; Molecular Diagnostic Techniques/standards; Mutation; Neoplasms/diagnosis; Polymorphism, Single Nucleotide; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments and proficiency testing on standardized, cell-line-derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas, below this limit, detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false negatives) were more common than erroneous candidates (false positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best practice guidelines and provides a resource for precision oncology.
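The sampling argument above can be illustrated with a back-of-the-envelope calculation. It assumes, as a simplification not taken from the study, that each genome copy in the input carries the mutation independently with probability equal to the VAF; the function name and the copy numbers are hypothetical.

```python
def prob_mutant_sampled(vaf, genome_copies):
    """P(at least one mutant fragment is present among the sampled copies),
    assuming independent sampling with per-copy probability `vaf`."""
    return 1.0 - (1.0 - vaf) ** genome_copies


# Hypothetical input amounts (genome copies covering the locus) at two VAFs.
sampling_table = {
    (vaf, copies): prob_mutant_sampled(vaf, copies)
    for vaf in (0.001, 0.005)
    for copies in (300, 3000, 30000)
}
```

With 300 copies, a 0.1% VAF mutation is absent from the tube most of the time, so no downstream sequencing depth or error correction can recover it. This is consistent with false negatives dominating when input material is limited.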
Subject(s)
Circulating Tumor DNA/genetics; Medical Oncology; Neoplasms/genetics; Precision Medicine; Sequence Analysis, DNA/standards; High-Throughput Nucleotide Sequencing/methods; Humans; Limit of Detection; Practice Guidelines as Topic; Reproducibility of Results
ABSTRACT
OBJECTIVE: To perform genotype-phenotype, clinical and molecular analysis in a large 3-generation family with autosomal dominant congenital spinal muscular atrophy. METHODS: Using a combined genetic approach including whole genome scanning, next generation sequencing-based multigene panel, whole genome sequencing, and targeted variant Sanger sequencing, we studied the proband and multiple affected individuals of this family who presented with bilateral proximal lower limb muscle weakness and atrophy. RESULTS: We identified a novel heterozygous variant, c.1826T>C; p.Ile609Thr, in the DYNC1H1 gene, localized within the common haplotype in the 14q32.3 chromosomal region, which cosegregated with disease in this large family. Within the family, affected individuals showed wide clinical variability. Although some individuals presented with the typical lower motor neuron phenotype with areflexia and denervation, others presented with muscle weakness and atrophy, hyperreflexia, and absence of denervation, suggesting a predominant upper motor neuron disease. In addition, some affected individuals presented with an intermediate phenotype characterized by hyperreflexia and denervation, expressing a combination of lower and upper motor neuron defects. CONCLUSION: Our study demonstrates the wide clinical variability associated with a single disease-causing variant in the DYNC1H1 gene; this variant demonstrated high penetrance within this large family.
Subject(s)
Cytoplasmic Dyneins/genetics; Muscular Atrophy, Spinal/genetics; Mutation, Missense; Adolescent; Adult; Child; Child, Preschool; Female; Heterozygote; Humans; Lower Extremity/physiopathology; Male; Middle Aged; Motor Neurons/physiology; Muscle, Skeletal/physiopathology; Muscular Atrophy, Spinal/pathology; Pedigree; Phenotype; Reflex; Upper Extremity/physiopathology
ABSTRACT
Hereditary hemochromatosis is an inherited disorder of iron metabolism, characterized by high absorption of iron by the gastrointestinal tract leading to a toxic accumulation of iron in various organs and impaired organ function. Three variants in the HFE gene (p.C282Y, p.H63D, and p.S65C) are commonly associated with the development of the disease. Of these, p.C282Y homozygotes are at the highest risk. Compound heterozygotes of p.C282Y along with p.H63D or p.S65C have reduced penetrance. Furthermore, p.H63D homozygotes are not at an increased risk and little is known about the risk associated with homozygosity for p.S65C. Our current clinical assay for the three common HFE variants utilizes the LightCycler platform and paired probes employing fluorescence resonance energy transfer. To increase throughput and decrease costs, we developed a method whereby automated extraction was combined with unlabeled probes and differential melt profiles to detect these variants using the LightCycler 480 instrument. Using this approach, 43 samples extracted with three different extraction platforms were correctly genotyped. These data demonstrate that the newly developed assay to genotype the HFE mutations p.C282Y, p.H63D, and p.S65C, combined with high-throughput extraction platforms, is accurate and reproducible and represents an alternative to previously described tests.
Subject(s)
DNA Probes/genetics; Genotyping Techniques/methods; Hemochromatosis/genetics; Histocompatibility Antigens Class I/genetics; Membrane Proteins/genetics; Mutation, Missense; Amino Acid Substitution; DNA Probes/chemistry; Female; Hemochromatosis/diagnosis; Hemochromatosis Protein; Heterozygote; Homozygote; Humans; Male
ABSTRACT
Legius syndrome (LS) is an autosomal dominant disorder caused by germline loss-of-function mutations in the sprouty-related, EVH1 domain containing 1 (SPRED1) gene. The phenotype of LS is multiple café au lait macules (CALM) with other commonly reported manifestations, including intertriginous freckling, lipomas, macrocephaly, and learning disabilities including ADHD and developmental delays. Since the earliest signs of LS and neurofibromatosis type 1 (NF1) syndrome are pigmentary findings, the two are indistinguishable and individuals with LS may meet the National Institutes of Health diagnostic criteria for NF1 syndrome. However, individuals with LS are not known to have an increased risk of developing tumors (compared with NF1 patients). It is therefore important to fully characterize the phenotypic differences between NF1 and LS because the prognoses of these two disorders differ greatly. We have developed a mutation database that characterizes the known variants in the SPRED1 gene in an effort to facilitate this process for testing and interpreting results. This database is free to the public and will be updated quarterly.