
Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis.

Tetikol, H Serhat; Turgut, Deniz; Narci, Kubra; Budak, Gungor; Kalay, Ozem; Arslan, Elif; Demirkaya-Budak, Sinem; Dolgoborodov, Alexey; Kabakci-Zorlu, Duygu; Semenyuk, Vladimir; Jain, Amit; Davis-Dusenbery, Brandi N.

Nat Commun ; 13(1): 4384, 2022 08 04.

Article in English | MEDLINE | ID: mdl-35927245

ABSTRACT

Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps.

Subjects

Data Analysis; High-Throughput Nucleotide Sequencing; Genome, Human/genetics; Genomics/methods; Humans; Sequence Analysis, DNA/methods; Software

PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions.

Olson, Nathan D; Wagner, Justin; McDaniel, Jennifer; Stephens, Sarah H; Westreich, Samuel T; Prasanna, Anish G; Johanson, Elaine; Boja, Emily; Maier, Ezekiel J; Serang, Omar; Jáspez, David; Lorenzo-Salazar, José M; Muñoz-Barrera, Adrián; Rubio-Rodríguez, Luis A; Flores, Carlos; Kyriakidis, Konstantinos; Malousi, Andigoni; Shafin, Kishwar; Pesout, Trevor; Jain, Miten; Paten, Benedict; Chang, Pi-Chuan; Kolesnikov, Alexey; Nattestad, Maria; Baid, Gunjan; Goel, Sidharth; Yang, Howard; Carroll, Andrew; Eveleigh, Robert; Bourgey, Mathieu; Bourque, Guillaume; Li, Gen; Ma, ChouXian; Tang, LinQi; Du, YuanPing; Zhang, ShaoWei; Morata, Jordi; Tonda, Raúl; Parra, Genís; Trotta, Jean-Rémi; Brueffer, Christian; Demirkaya-Budak, Sinem; Kabakci-Zorlu, Duygu; Turgut, Deniz; Kalay, Özem; Budak, Gungor; Narci, Kübra; Arslan, Elif; Brown, Richard; Johnson, Ivan J.

Cell Genom ; 2(5), 2022 May 11.

Article in English | MEDLINE | ID: mdl-35720974

ABSTRACT

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
