Search | VHL Regional Portal

Improving variant calling using population data and deep learning.

Chen, Nae-Chyun; Kolesnikov, Alexey; Goel, Sidharth; Yun, Taedong; Chang, Pi-Chuan; Carroll, Andrew.

BMC Bioinformatics ; 24(1): 197, 2023 May 12.

Article in English | MEDLINE | ID: mdl-37173615

ABSTRACT

Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering, which trades recall for precision. In this study, we develop population-aware DeepVariant models with a new channel encoding allele frequencies from the 1000 Genomes Project. This model reduces variant calling errors, improving both precision and recall in single samples, and reduces rare homozygous and pathogenic ClinVar calls cohort-wide. We assess the use of population-specific or diverse reference panels, finding the greatest accuracy with diverse panels, suggesting that large, diverse panels are preferable to individual populations, even when the population matches sample ancestry. Finally, we show that this benefit generalizes to samples with different ancestry from the training data even when the ancestry is also excluded from the reference panel.

Subject(s)

Deep Learning , Humans , Gene Frequency , Whole Genome Sequencing , Genome-Wide Association Study , Human Genome , Single Nucleotide Polymorphism , High-Throughput Nucleotide Sequencing
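
The abstract above notes that post hoc filtering on population data "trades recall for precision". The sketch below illustrates that trade-off on invented toy data; it is not the DeepVariant method. The `precision_recall` helper, the variant tuples, and the allele-frequency values are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def precision_recall(calls: Iterable[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of a call set measured against a truth set."""
    call_set = set(calls)
    true_positives = len(call_set & truth)
    precision = true_positives / len(call_set) if call_set else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy truth set (invented): three real variants in the sample.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
}

# Toy calls (invented), each with an assumed population allele frequency.
calls_with_af = {
    ("chr1", 100, "A", "G"): 0.25,  # real, common in the panel
    ("chr1", 200, "C", "T"): 0.0,   # real, but absent from the panel (rare)
    ("chr1", 400, "T", "C"): 0.0,   # false positive, absent from the panel
    ("chr1", 500, "G", "C"): 0.0,   # false positive, absent from the panel
}

# No filtering: every call is kept.
unfiltered = precision_recall(calls_with_af, truth)

# Post hoc filter: keep only calls that were observed in the panel.
kept = [variant for variant, af in calls_with_af.items() if af > 0.0]
filtered = precision_recall(kept, truth)

print("unfiltered precision/recall:", unfiltered)
print("filtered precision/recall:  ", filtered)
```

In this toy case the filter removes both false positives, so precision rises, but it also discards the real rare variant, so recall falls. The abstract describes a different approach: population information is supplied to the caller itself as an input channel, not applied as a filter afterwards.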

PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions.

Olson, Nathan D; Wagner, Justin; McDaniel, Jennifer; Stephens, Sarah H; Westreich, Samuel T; Prasanna, Anish G; Johanson, Elaine; Boja, Emily; Maier, Ezekiel J; Serang, Omar; Jáspez, David; Lorenzo-Salazar, José M; Muñoz-Barrera, Adrián; Rubio-Rodríguez, Luis A; Flores, Carlos; Kyriakidis, Konstantinos; Malousi, Andigoni; Shafin, Kishwar; Pesout, Trevor; Jain, Miten; Paten, Benedict; Chang, Pi-Chuan; Kolesnikov, Alexey; Nattestad, Maria; Baid, Gunjan; Goel, Sidharth; Yang, Howard; Carroll, Andrew; Eveleigh, Robert; Bourgey, Mathieu; Bourque, Guillaume; Li, Gen; Ma, ChouXian; Tang, LinQi; Du, YuanPing; Zhang, ShaoWei; Morata, Jordi; Tonda, Raúl; Parra, Genís; Trotta, Jean-Rémi; Brueffer, Christian; Demirkaya-Budak, Sinem; Kabakci-Zorlu, Duygu; Turgut, Deniz; Kalay, Özem; Budak, Gungor; Narci, Kübra; Arslan, Elif; Brown, Richard; Johnson, Ivan J.

Cell Genom ; 2(5), 2022 May 11.

Article in English | MEDLINE | ID: mdl-35720974

ABSTRACT

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.

Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads.

Shafin, Kishwar; Pesout, Trevor; Chang, Pi-Chuan; Nattestad, Maria; Kolesnikov, Alexey; Goel, Sidharth; Baid, Gunjan; Kolmogorov, Mikhail; Eizenga, Jordan M; Miga, Karen H; Carnevali, Paolo; Jain, Miten; Carroll, Andrew; Paten, Benedict.

Nat Methods ; 18(11): 1322-1332, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34725481

ABSTRACT

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).

Subject(s)

Genes , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Nanopores , Single Nucleotide Polymorphism , DNA Sequence Analysis/methods , Software , Human Genome , Humans , Molecular Sequence Annotation
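
The assembly accuracies quoted in the abstract above ("Q35+" and "Q40+") can be read as error rates if one assumes the usual Phred-scaled convention, Q = -10 * log10(error rate). The abstract does not spell out the scale, so treat this as an assumption. The helper names below are hypothetical.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(error_rate: float) -> float:
    """Convert a per-base error probability (0 < p <= 1) to a Phred-scaled quality."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


for q in (35, 40):
    p = phred_to_error_rate(q)
    # 1 / p is the expected number of bases per error at this quality.
    print(f"Q{q}: error rate {p:.2e}, about one error per {1 / p:,.0f} bases")
```

Under this convention, Q40 corresponds to roughly one error per 10,000 bases and Q35 to roughly one error per 3,200 bases.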
