Search | PAHO/WHO Uruguay

SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads.

Denti, Luca; Khorsand, Parsoa; Bonizzoni, Paola; Hormozdiari, Fereydoun; Chikhi, Rayan.

Nat Methods ; 20(4): 550-558, 2023 04.

Article in English | MEDLINE | ID: mdl-36550274

ABSTRACT

Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome and to the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS), a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.

Subject(s)

Genomics; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Genomics/methods; Genome, Human; Repetitive Sequences, Nucleic Acid

Nebula: ultra-efficient mapping-free structural variant genotyper.

Khorsand, Parsoa; Hormozdiari, Fereydoun.

Nucleic Acids Res ; 49(8): e47, 2021 05 07.

Article in English | MEDLINE | ID: mdl-33503255

ABSTRACT

Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, uses changes in the count of k-mers to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.

Subject(s)

Genomics/methods; Genotyping Techniques/methods; Computer Simulation; Databases, Genetic; Genotype; High-Throughput Nucleotide Sequencing; INDEL Mutation; Polymorphism, Single Nucleotide; Software; Whole Genome Sequencing
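
The Nebula abstract above describes predicting genotypes from changes in k-mer counts. The Python sketch below is only a toy illustration of that general idea and is not Nebula's actual model or API: the function names, the assumption that allele-specific k-mers are supplied by the caller, and the 0.2/0.8 thresholds are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k, wanted):
    """Count occurrences in the reads of only the k-mers listed in `wanted`."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in wanted:
                counts[km] += 1
    return counts


def genotype(reads, ref_kmers, alt_kmers, k, low=0.2, high=0.8):
    """Call a genotype from the share of allele-specific k-mer hits that support alt.

    ref_kmers / alt_kmers: sets of k-mers assumed to occur only on the
    reference or only on the alternate allele (hypothetical inputs).
    low / high: arbitrary illustrative cut-offs, not values from the paper.
    Reverse complements are ignored to keep the sketch short.
    """
    counts = count_kmers(reads, k, ref_kmers | alt_kmers)
    ref = sum(counts[km] for km in ref_kmers)
    alt = sum(counts[km] for km in alt_kmers)
    if ref + alt == 0:
        return "./."  # no informative k-mers observed
    frac = alt / (ref + alt)
    if frac < low:
        return "0/0"
    if frac > high:
        return "1/1"
    return "0/1"
```

For instance, with k = 4, ref_kmers = {"ACGT"} and alt_kmers = {"TTTT"}, reads that contain each k-mer equally often give an alt fraction of 0.5, which this sketch reports as heterozygous. No read mapping is involved at any step, which is the property the abstract emphasizes.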

Comparative genome analysis using sample-specific string detection in accurate long reads.

Khorsand, Parsoa; Denti, Luca; Bonizzoni, Paola; Chikhi, Rayan; Hormozdiari, Fereydoun.

Bioinform Adv ; 1(1): vbab005, 2021.

Article in English | MEDLINE | ID: mdl-36700094

ABSTRACT

Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can dive into studying repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('sample-specific' strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach will give us the ability to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach is capable of accurately finding sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). Availability and implementation: Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
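
The notion of a string that is specific to one sample can be illustrated with a much simpler stand-in than the method the abstract describes. The Python sketch below swaps in a fixed-length k-mer set difference between two groups of reads; it is not the authors' algorithm, and every function name and the choice of canonical k-mers are assumptions made for illustration.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(reads, k):
    """Collect the canonical k-mers of every read into one set."""
    return {
        canonical(read[i:i + k])
        for read in reads
        for i in range(len(read) - k + 1)
    }


def specific_kmers(target_reads, background_reads, k):
    """Return canonical k-mers seen in the target reads but never in the background.

    This is a plain set difference on fixed-length substrings, used here as a
    simplified stand-in for sample-specific strings.
    """
    return kmer_set(target_reads, k) - kmer_set(background_reads, k)
```

Calling specific_kmers(child_reads, parent_reads, k) on two read sets returns the substrings present only in the first group, without mapping either group to a reference. Sequencing errors would also show up as specific k-mers in such a naive comparison, which is one reason the abstract stresses accurate long reads.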
