Search | VHL CLAP/SMR-PAHO/WHO

Resolving single-cell copy number profiling for large datasets.

Ruohan, Wang; Yuwei, Zhang; Mengbo, Wang; Xikang, Feng; Jianping, Wang; Shuai Cheng, Li.

Brief Bioinform; 23(4), 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35801503

ABSTRACT

Advances in single-cell DNA sequencing (scDNA-seq) enable us to characterize the genetic heterogeneity of cancer cells. However, the high noise and low coverage of scDNA-seq data impede the estimation of copy number variations (CNVs). In addition, existing tools have long execution times and often fail on large datasets. Here, we propose SeCNV, an efficient method that leverages structural entropy to profile copy numbers. SeCNV adopts a local Gaussian kernel to construct a matrix, the depth congruent map (DCM), that captures the similarity between every pair of bins along the genome. SeCNV then partitions the genome into segments by minimizing the structural entropy of the DCM. Given this partition, SeCNV estimates the copy number of each segment in each cell. To benchmark SeCNV, we simulate nine datasets with various breakpoint distributions and noise amplitudes. SeCNV performs robustly, with F1-scores above 0.95 for breakpoint detection, significantly outperforming state-of-the-art methods. SeCNV processes large datasets (>50 000 cells) within 4 min, whereas other tools fail to finish within the 120 h time limit. We apply SeCNV to single-nucleus sequencing datasets from two breast cancer patients and acoustic cell tagmentation sequencing datasets from eight breast cancer patients. SeCNV reproduces the distinct subclones and infers tumor heterogeneity. SeCNV is available at https://github.com/deepomicslab/SeCNV.
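
The abstract does not give the exact form of the kernel, so the following is only a minimal sketch of how a DCM-like matrix could be built, under stated assumptions: the input is a bins x cells matrix of normalized read depths, the distance between two bins is the Euclidean distance between their depth profiles across cells, and "local" means each bin b gets its own bandwidth sigma_b, taken here as the distance to its k-th nearest other bin (a self-tuning Gaussian kernel). The similarity of bins i and j is then exp(-dist(i, j)^2 / (sigma_i * sigma_j)). The function name `depth_congruent_map`, the parameter `k`, and the toy depths are hypothetical; SeCNV's own implementation may normalize, restrict, or weight bins differently.

```python
import math


def depth_congruent_map(depth, k=2):
    """Hypothetical sketch: bins x bins similarity from depth[b][c] (bin b, cell c).

    Assumes at least two bins. Uses Euclidean distance between bin profiles and
    a per-bin bandwidth sigma_b = distance to the k-th nearest other bin.
    """
    n = len(depth)
    # Pairwise Euclidean distances between the depth profiles of two bins.
    dist = [[math.dist(depth[i], depth[j]) for j in range(n)] for i in range(n)]
    # Local bandwidth per bin; the floor avoids division by zero when
    # several bins have identical profiles.
    sigma = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        sigma.append(max(others[min(k, len(others)) - 1], 1e-12))
    # Local Gaussian kernel; the diagonal (self-similarity) is left at 0.
    dcm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dcm[i][j] = math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return dcm


# Toy input (hypothetical values): 6 bins x 3 cells, two depth levels.
depth = [
    [2.0, 2.0, 2.0], [2.0, 2.1, 2.0], [2.0, 2.0, 2.1],  # bins at the lower level
    [4.0, 4.0, 4.0], [4.0, 4.1, 4.0], [4.0, 4.0, 4.1],  # bins at the higher level
]
dcm = depth_congruent_map(depth)
print(dcm[0][1], dcm[0][3])
```

Bins with similar depth profiles receive a high similarity, and bins from different depth levels receive values near zero, giving the block structure a segmentation step can exploit. The per-bin bandwidth adapts the kernel to local noise; the dense pure-Python loops are for clarity, not speed.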

Subjects

Breast Neoplasms; DNA Copy Number Variations; Algorithms; Breast Neoplasms/genetics; Female; Genome; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Analysis, DNA/methods
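
The segmentation step in the SeCNV abstract (partitioning the genome by minimizing structural entropy) can be illustrated with a sketch that treats a DCM-like similarity matrix W as a weighted graph over bins and searches for the contiguous partition X_1, ..., X_m with minimal two-level structural entropy

H = -sum_j [ sum_{i in X_j} (d_i / vol) log2(d_i / V_j) + (g_j / vol) log2(V_j / vol) ],

where d_i is the weighted degree of bin i, vol is the sum of all degrees, V_j is the summed degree of segment j, and g_j is the weight of edges leaving segment j. That SeCNV uses exactly this objective, this tree depth, and no extra constraints is an assumption here; `segment_by_structural_entropy` is a hypothetical name, and the diagonal of W is ignored.

```python
import math


def segment_by_structural_entropy(W):
    """Hypothetical sketch: contiguous segments [(start, end), ...] of bins
    0..n-1 minimizing two-level structural entropy of W (diagonal ignored)."""
    n = len(W)
    if n == 0:
        return []
    deg = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    vol = sum(deg)
    if vol == 0:
        return [(0, n)]

    def xlog2x(x):
        return x * math.log2(x) if x > 0 else 0.0

    # Prefix sums of degrees and of d*log2(d) give O(1) per-segment sums.
    D, L = [0.0], [0.0]
    for d in deg:
        D.append(D[-1] + d)
        L.append(L[-1] + xlog2x(d))
    # 2D prefix sums of off-diagonal weights: P[a][b] = sum of W[i][j], i < a, j < b.
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            w = W[i][j] if i != j else 0.0
            P[i + 1][j + 1] = P[i][j + 1] + P[i + 1][j] - P[i][j] + w

    def cost(a, b):
        """Entropy contributed by segment [a, b) acting as one module."""
        V = D[b] - D[a]
        if V <= 0:
            return 0.0
        internal = P[b][b] - P[a][b] - P[b][a] + P[a][a]
        g = max(V - internal, 0.0)  # weight of edges leaving the segment
        leaf_part = -((L[b] - L[a]) - xlog2x(V)) / vol
        module_part = -(g / vol) * math.log2(V / vol)
        return leaf_part + module_part

    # best[b]: minimal entropy of bins [0, b); back[b]: start of the last segment.
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for b in range(1, n + 1):
        for a in range(b):
            c = best[a] + cost(a, b)
            if c < best[b]:
                best[b], back[b] = c, a
    # Walk the back-pointers to recover the segments in genome order.
    segments, b = [], n
    while b > 0:
        segments.append((back[b], b))
        b = back[b]
    return segments[::-1]


# Toy similarity matrix: two 3-bin blocks, strong inside, weak across.
W = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01) for j in range(6)]
     for i in range(6)]
print(segment_by_structural_entropy(W))  # [(0, 3), (3, 6)]
```

Because H is a sum of independent per-segment terms, dynamic programming over segment end points finds the exact optimum among contiguous partitions with O(n^2) segment evaluations, each O(1) thanks to the prefix sums.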

SuperTAD-Fast: Accelerating Topologically Associating Domains Detection Through Discretization.

Ling, Zhao; Zhang, Yu Wei; Li, Shuai Cheng.

J Comput Biol; 31(9): 784-796, 2024 Sep.

Article in English | MEDLINE | ID: mdl-39047029

ABSTRACT

High-throughput chromosome conformation capture (Hi-C) technology captures spatial interactions between DNA sequences as matrices, and software tools have been developed to identify topologically associating domains (TADs) from these Hi-C matrices. Building on structural information theory, SuperTAD adopted a dynamic programming approach to find the TAD hierarchy with minimal structural entropy. However, the algorithm has high time complexity. To accelerate it, we designed and implemented an approximation algorithm with a theoretical performance guarantee, released as the package SuperTAD-Fast. Using Hi-C matrices and simulated data, we demonstrated that SuperTAD-Fast achieves a substantial runtime improvement over SuperTAD. On Hi-C data from human cell lines, SuperTAD-Fast shows high consistency and significant enrichment of structural proteins compared with six existing hierarchical TAD detection methods.
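
For reference, structural information theory scores a hierarchy (an encoding tree T) of a graph by summing, over every non-root node a, the term -(g_a / vol) log2(V_a / V_parent(a)), where V is a node's volume (summed degree), g its cut weight, and vol the volume of the whole graph. The sketch below evaluates that score for a TAD hierarchy written as nested bin intervals on a contact matrix. It only scores a given hierarchy; it does not reproduce SuperTAD's dynamic program or SuperTAD-Fast's approximation. Assumptions: the matrix diagonal is ignored, children tile their parent exactly, and a node with no listed children is expanded into single-bin leaves; `tree_entropy`, `volume_and_cut`, and the (start, end, children) format are hypothetical.

```python
import math


def volume_and_cut(W, start, end):
    """Volume (sum of degrees) of bins [start, end) and the weight of edges
    leaving that interval; the diagonal of W is ignored."""
    vol_seg = 0.0
    internal = 0.0
    for i in range(start, end):
        for j in range(len(W)):
            if i != j:
                vol_seg += W[i][j]
                if start <= j < end:
                    internal += W[i][j]
    return vol_seg, vol_seg - internal


def tree_entropy(W, root):
    """Structural entropy of contact matrix W under an encoding tree.

    Each node is (start, end, children) over half-open bin intervals.
    """
    total_vol, _ = volume_and_cut(W, 0, len(W))

    def visit(node, parent_vol):
        start, end, children = node
        vol_node, cut = volume_and_cut(W, start, end)
        h = 0.0
        # Every non-root node contributes -(g / vol(G)) * log2(V / V_parent).
        if parent_vol is not None and vol_node > 0:
            h -= (cut / total_vol) * math.log2(vol_node / parent_vol)
        # Expand an unlisted multi-bin node into single-bin leaves.
        if not children and end - start > 1:
            children = [(i, i + 1, []) for i in range(start, end)]
        for child in children:
            h += visit(child, vol_node)
        return h

    return visit(root, None)


# Toy contact matrix: two 3-bin domains with strong internal contacts.
W = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01) for j in range(6)]
     for i in range(6)]
flat = tree_entropy(W, (0, 6, []))                          # root -> 6 leaves
nested = tree_entropy(W, (0, 6, [(0, 3, []), (3, 6, [])]))  # root -> 2 TADs -> leaves
print(flat, nested)
```

In the toy matrix, the hierarchy that groups bins into the two domains scores lower than the flat tree, which is the property a minimal-entropy TAD search relies on. Recomputing volumes by full row scans costs O(n^2) per node and suits small examples only.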

Subjects

Chromatin; Genetic Techniques; Software; Chromatin/chemistry; Chromatin/genetics; Computer Simulation; Algorithms; Entropy; Genome
