Search | BVS Bolivia

Phasing Diploid Genome Assembly Graphs with Single-Cell Strand Sequencing.

Henglin, Mir; Ghareghani, Maryam; Harvey, William; Porubsky, David; Koren, Sergey; Eichler, Evan E; Ebert, Peter; Marschall, Tobias.

bioRxiv ; 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38529499

ABSTRACT

Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de-novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de-novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio-phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Ebert, Peter; Audano, Peter A; Zhu, Qihui; Rodriguez-Martin, Bernardo; Porubsky, David; Bonder, Marc Jan; Sulovari, Arvis; Ebler, Jana; Zhou, Weichen; Serra Mari, Rebecca; Yilmaz, Feyza; Zhao, Xuefang; Hsieh, PingHsun; Lee, Joyce; Kumar, Sushant; Lin, Jiadong; Rausch, Tobias; Chen, Yu; Ren, Jingwen; Santamarina, Martin; Höps, Wolfram; Ashraf, Hufsah; Chuang, Nelson T; Yang, Xiaofei; Munson, Katherine M; Lewis, Alexandra P; Fairley, Susan; Tallon, Luke J; Clarke, Wayne E; Basile, Anna O; Byrska-Bishop, Marta; Corvelo, André; Evani, Uday S; Lu, Tsung-Yu; Chaisson, Mark J P; Chen, Junjie; Li, Chong; Brand, Harrison; Wenger, Aaron M; Ghareghani, Maryam; Harvey, William T; Raeder, Benjamin; Hasenfeld, Patrick; Regier, Allison A; Abel, Haley J; Hall, Ira M; Flicek, Paul; Stegle, Oliver; Gerstein, Mark B; Tubio, Jose M C.

Science ; 372(6537), 2021 04 02.

Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.

Subject(s)

Genetic Variation , Human Genome , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , DNA Sequence Analysis , Sequence Inversion , Whole Genome Sequencing
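
The contiguity figure quoted in the abstract above (the minimum contig length needed to cover 50% of the genome, commonly reported as contig N50) can be illustrated with a minimal sketch. The function name `n50` and the toy contig lengths are hypothetical, and the sketch assumes the 50% threshold is taken relative to the total assembled length rather than an external genome-size estimate.

```python
def n50(contig_lengths):
    """Return the contig N50 of a list of contig lengths.

    Assumption for this sketch: "50% of the genome" means 50% of the
    summed contig lengths (the assembly size), not a reference genome size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units, not real data).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
```

Here the two longest toy contigs (80 + 70 = 150) already cover at least half of the 290 total units, so the N50 is the length of the second contig.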

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads.

Porubsky, David; Ebert, Peter; Audano, Peter A; Vollger, Mitchell R; Harvey, William T; Marijon, Pierre; Ebler, Jana; Munson, Katherine M; Sorensen, Melanie; Sulovari, Arvis; Haukness, Marina; Ghareghani, Maryam; Lansdorp, Peter M; Paten, Benedict; Devine, Scott E; Sanders, Ashley D; Lee, Charles; Chaisson, Mark J P; Korbel, Jan O; Eichler, Evan E; Marschall, Tobias.

Nat Biotechnol ; 39(3): 302-308, 2021 03.

Article in English | MEDLINE | ID: mdl-33288906

ABSTRACT

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.

Subject(s)

Human Genome , High-Throughput Nucleotide Sequencing/methods , Parents , DNA Sequence Analysis/methods , Single-Cell Analysis/methods , Algorithms , Haplotypes , Humans , Puerto Rico/ethnology

Single-cell analysis of structural variations and complex rearrangements with tri-channel processing.

Sanders, Ashley D; Meiers, Sascha; Ghareghani, Maryam; Porubsky, David; Jeong, Hyobin; van Vliet, M Alexandra C C; Rausch, Tobias; Richter-Pechanska, Paulina; Kunz, Joachim B; Jenni, Silvia; Bolognini, Davide; Longo, Gabriel M C; Raeder, Benjamin; Kinanen, Venla; Zimmermann, Jürgen; Benes, Vladimir; Schrappe, Martin; Mardin, Balca R; Kulozik, Andreas E; Bornhauser, Beat; Bourquin, Jean-Pierre; Marschall, Tobias; Korbel, Jan O.

Nat Biotechnol ; 38(3): 343-354, 2020 03.

Article in English | MEDLINE | ID: mdl-31873213

ABSTRACT

Structural variation (SV), involving deletions, duplications, inversions and translocations of DNA segments, is a major source of genetic variability in somatic cells and can dysregulate cancer-related pathways. However, discovering somatic SVs in single cells has been challenging, with copy-number-neutral and complex variants typically escaping detection. Here we describe single-cell tri-channel processing (scTRIP), a computational framework that integrates read depth, template strand and haplotype phase to comprehensively discover SVs in individual cells. We surveyed SV landscapes of 565 single cells, including transformed epithelial cells and patient-derived leukemic samples, to discover abundant SV classes, including inversions, translocations and complex DNA rearrangements. Analysis of the leukemic samples revealed four times more somatic SVs than cytogenetic karyotyping, submicroscopic copy-number alterations, oncogenic copy-neutral rearrangements and a subclonal chromothripsis event. Advancing current methods, single-cell tri-channel processing can directly measure SV mutational processes in individual cells, such as breakage-fusion-bridge cycles, facilitating studies of clonal evolution, genetic mosaicism and SV formation mechanisms, which could improve disease classification for precision medicine.

Subject(s)

Computational Biology/methods , Genomic Structural Variation , Leukemia/genetics , Single-Cell Analysis/methods , Cell Line , Chromothripsis , Clonal Evolution , Gene Rearrangement , Humans , INDEL Mutation , Sequence Inversion , Genetic Translocation

Strand-seq enables reliable separation of long reads by chromosome via expectation maximization.

Ghareghani, Maryam; Porubsky, David; Sanders, Ashley D; Meiers, Sascha; Eichler, Evan E; Korbel, Jan O; Marschall, Tobias.

Bioinformatics ; 34(13): i115-i123, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29949971

ABSTRACT

Motivation: Current sequencing technologies are able to produce reads orders of magnitude longer than ever possible before. Such long reads have sparked a new interest in de novo genome assembly, which removes reference biases inherent to re-sequencing approaches and allows for a direct characterization of complex genomic variants. However, even with the latest algorithmic advances, assembling a mammalian genome from long error-prone reads incurs a significant computational burden and does not preclude occasional misassemblies. Both problems could potentially be mitigated if assembly could commence for each chromosome separately. Results: To address this, we show how single-cell template strand sequencing (Strand-seq) data can be leveraged for this purpose. We introduce a novel latent variable model and a corresponding Expectation Maximization algorithm, termed SaaRclust, and demonstrate its ability to reliably cluster long reads by chromosome. For each long read, this approach produces a posterior probability distribution over all chromosomes of origin and read directionalities. In this way, it allows the uncertainty inherent to sparse Strand-seq data to be assessed at the level of individual reads. Among the reads that our algorithm confidently assigns to a chromosome, we observed more than 99% correct assignments on a subset of Pacific Biosciences reads with 30.1× coverage. To our knowledge, SaaRclust is the first approach for the in silico separation of long reads by chromosome prior to assembly. Availability and implementation: https://github.com/daewoooo/SaaRclust.

Subject(s)

Human Chromosomes , Computer Simulation , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Female , Human Genome , Humans , DNA Sequence Analysis/methods
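
The idea in the abstract above, an Expectation Maximization procedure that yields a posterior distribution over clusters for each long read, can be sketched with a generic binomial mixture. This is a simplified illustration and not SaaRclust's actual model: the function `em_binomial_mixture`, the per-cell Watson/Crick count layout, the pseudocount, and the toy data are all assumptions made for this example.

```python
import math


def em_binomial_mixture(counts, theta, n_iter=50, pseudo=1.0):
    """Toy EM for a mixture of per-cell binomials (illustrative only).

    counts[i][j] = (watson, crick): hypothetical numbers of Strand-seq reads
    aligned to long read i in cell j, split by orientation.
    theta[k][j]  = initial guess of P(Watson) for cluster k in cell j.
    Returns post[i][k] = posterior probability that read i is in cluster k.
    """
    n_reads, n_cells, n_clusters = len(counts), len(counts[0]), len(theta)
    theta = [row[:] for row in theta]  # do not modify the caller's list
    weights = [1.0 / n_clusters] * n_clusters
    post = []
    for _ in range(n_iter):
        # E-step: posterior over clusters for each read, computed in log space.
        post = []
        for read in counts:
            logp = []
            for k in range(n_clusters):
                lp = math.log(weights[k])
                for j, (w, c) in enumerate(read):
                    lp += w * math.log(theta[k][j])
                    lp += c * math.log(1.0 - theta[k][j])
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            z = sum(unnorm)
            post.append([u / z for u in unnorm])
        # M-step: re-estimate mixing weights and Watson probabilities,
        # smoothed with a pseudocount so every value stays strictly in (0, 1).
        for k in range(n_clusters):
            resp = sum(p[k] for p in post)
            weights[k] = (resp + pseudo) / (n_reads + pseudo * n_clusters)
            for j in range(n_cells):
                w_sum = sum(p[k] * counts[i][j][0] for i, p in enumerate(post))
                c_sum = sum(p[k] * counts[i][j][1] for i, p in enumerate(post))
                theta[k][j] = (w_sum + pseudo) / (w_sum + c_sum + 2.0 * pseudo)
    return post


# Hypothetical data: two groups of long reads with opposite strand patterns
# across two cells (not real Strand-seq counts).
group_a = [[(9, 1), (1, 9)] for _ in range(3)]
group_b = [[(1, 9), (9, 1)] for _ in range(3)]
posteriors = em_binomial_mixture(group_a + group_b,
                                 theta=[[0.3, 0.7], [0.7, 0.3]])
labels = [max(range(2), key=lambda k: p[k]) for p in posteriors]
```

Each row of `posteriors` sums to one, so a read whose largest entry is far below 1 can be flagged as uncertain instead of being forced into a cluster. The initial `theta` must differ between clusters, since identical starting rows can never be separated by the updates.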
