Pesquisa | BVS CLAP/SMR-OPAS/OMS

SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads.

Denti, Luca; Khorsand, Parsoa; Bonizzoni, Paola; Hormozdiari, Fereydoun; Chikhi, Rayan.

Nat Methods ; 20(4): 550-558, 2023 04.

Artigo em Inglês | MEDLINE | ID: mdl-36550274

RESUMO

Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and by the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS)-a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.

Assuntos

Genômica , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Análise de Sequência de DNA/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Genômica/métodos , Genoma Humano , Sequências Repetitivas de Ácido Nucleico

Nebula: ultra-efficient mapping-free structural variant genotyper.

Khorsand, Parsoa; Hormozdiari, Fereydoun.

Nucleic Acids Res ; 49(8): e47, 2021 05 07.

Artigo em Inglês | MEDLINE | ID: mdl-33503255

RESUMO

Large scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second and third generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to only specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We are proposing an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method Nebula utilizes the changes in the count of k-mers to predict the genotype of structural variants. We have shown that not only Nebula is an order of magnitude faster than mapping based approaches for genotyping structural variants, but also has comparable accuracy to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.

Assuntos

Genômica/métodos , Técnicas de Genotipagem/métodos , Simulação por Computador , Bases de Dados Genéticas , Genótipo , Sequenciamento de Nucleotídeos em Larga Escala , Mutação INDEL , Polimorfismo de Nucleotídeo Único , Software , Sequenciamento Completo do Genoma

Comparative genome analysis using sample-specific string detection in accurate long reads.

Khorsand, Parsoa; Denti, Luca; Bonizzoni, Paola; Chikhi, Rayan; Hormozdiari, Fereydoun.

Bioinform Adv ; 1(1): vbab005, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-36700094

RESUMO

Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can dive into studying repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('samples-specific' strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach will give us the ability to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach is capable of accurately finding sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). Availability and implementation: Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA