Search | Biblioteca Virtual en Salud Odontología. Uruguay

Comprehensive assessment of 11 de novo HiFi assemblers on complex eukaryotic genomes and metagenomes.

Yu, Wenjuan; Luo, Haohui; Yang, Jinbao; Zhang, Shengchen; Jiang, Heling; Zhao, Xianjia; Hui, Xingqi; Sun, Da; Li, Liang; Wei, Xiu-Qing; Lonardi, Stefano; Pan, Weihua.

Genome Res; 34(2): 326-340, 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-38428994

ABSTRACT

Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (<0.01% sequencing error). Although several de novo assembly tools are available for HiFi reads, there are no comprehensive studies on the evaluation of these assemblers. We evaluated the performance of 11 de novo HiFi assemblers on (1) real data for three eukaryotic genomes; (2) 34 synthetic data sets with different ploidy, sequencing coverage levels, heterozygosity rates, and sequencing error rates; (3) one real metagenomic data set; and (4) five synthetic metagenomic data sets with different composition abundance and heterozygosity rates. The 11 assemblers were evaluated using quality assessment tool (QUAST) and benchmarking universal single-copy ortholog (BUSCO). We also used several additional criteria, namely, completion rate, single-copy completion rate, duplicated completion rate, average proportion of largest category, average distance difference, quality value, run-time, and memory utilization. Results show that hifiasm and hifiasm-meta should be the first choice for assembling eukaryotic genomes and metagenomes with HiFi data. We performed a comprehensive benchmarking study of commonly used assemblers on complex eukaryotic genomes and metagenomes. Our study will help the research community to choose the most appropriate assembler for their data and identify possible improvements in assembly algorithms.

Subject(s)

Metagenome; Software; Sequence Analysis, DNA/methods; Algorithms; Metagenomics/methods; High-Throughput Nucleotide Sequencing/methods

lncRNASNP v3: an updated database for functional variants in long non-coding RNAs.

Yang, Yanbo; Wang, Dongyang; Miao, Ya-Ru; Wu, Xiaohong; Luo, Haohui; Cao, Wen; Yang, Wenqian; Yang, Jianye; Guo, An-Yuan; Gong, Jing.

Nucleic Acids Res; 51(D1): D192-D198, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36350671

ABSTRACT

Long non-coding RNAs (lncRNAs) act as versatile regulators of many biological processes and play vital roles in various diseases. lncRNASNP is dedicated to providing a comprehensive repository of single nucleotide polymorphisms (SNPs) and somatic mutations in lncRNAs and their impacts on lncRNA structure and function. Since the last release in 2018, there has been a huge increase in the number of variants and lncRNAs. Thus, we updated the lncRNASNP to version 3 by expanding the species to eight eukaryotic species (human, chimpanzee, pig, mouse, rat, chicken, zebrafish, and fruitfly), updating the data and adding several new features. SNPs in lncRNASNP have increased from 11 181 387 to 67 513 785. The human mutations have increased from 1 174 768 to 2 387 685, including 1 031 639 TCGA mutations and 1 356 046 CosmicNCVs. Compared with the last release, updated and new features in lncRNASNP v3 include (i) SNPs in lncRNAs and their impacts on lncRNAs for eight species, (ii) SNP effects on miRNA-lncRNA interactions for eight species, (iii) lncRNA expression profiles for six species, (iv) disease & GWAS-associated lncRNAs and variants, (v) experimental & predicted lncRNAs and drug target associations and (vi) SNP effects on lncRNA expression (eQTL) across tumor & normal tissues. The lncRNASNP v3 is freely available at http://gong_lab.hzau.edu.cn/lncRNASNP3/.

Subject(s)

Databases, Nucleic Acid; Polymorphism, Single Nucleotide; RNA, Long Noncoding; Animals; Humans; MicroRNAs/genetics; MicroRNAs/metabolism; RNA, Long Noncoding/metabolism

Pancan-MNVQTLdb: systematic identification of multi-nucleotide variant quantitative trait loci in 33 cancer types.

Wang, Dongyang; Cao, Wen; Yang, Wenqian; Jin, Weiwei; Luo, Haohui; Niu, Xiaohui; Gong, Jing.

NAR Cancer; 4(4): zcac043, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36568962

ABSTRACT

Multi-nucleotide variants (MNVs) are defined as clusters of two or more nearby variants existing on the same haplotype in an individual. Recent studies have identified millions of MNVs in human populations, but their functions remain largely unknown. Numerous studies have demonstrated that single-nucleotide variants could serve as quantitative trait loci (QTLs) by affecting molecular phenotypes. Therefore, we propose that MNVs can also affect molecular phenotypes by influencing regulatory elements. Using the genotype data from The Cancer Genome Atlas (TCGA), we first identified 223 759 unique MNVs in 33 cancer types. Then, to decipher the functions of these MNVs, we investigated the associations between MNVs and six molecular phenotypes, including coding gene expression, miRNA expression, lncRNA expression, alternative splicing, DNA methylation and alternative polyadenylation. As a result, we identified 1 397 821 cis-MNVQTLs and 402 381 trans-MNVQTLs. We further performed survival analysis and identified 46 173 MNVQTLs associated with patient overall survival. We also linked the MNVQTLs to genome-wide association studies (GWAS) data and identified 119 762 MNVQTLs that overlap with existing GWAS loci. Finally, we developed Pancan-MNVQTLdb (http://gong_lab.hzau.edu.cn/mnvQTLdb/) for data retrieval and download. Pancan-MNVQTLdb will help decipher the functions of MNVs in different cancer types and be an important resource for genetic and cancer research.
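The MNV definition in the abstract above (two or more nearby variants on the same haplotype in one individual) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `PhasedVariant` record, the `find_mnvs` function, and the `max_gap` distance threshold are hypothetical names and values chosen for illustration, since the abstract does not state how "nearby" is defined. The sketch also assumes each variant has already been phased and assigned to a single haplotype.

```python
from typing import List, NamedTuple


class PhasedVariant(NamedTuple):
    """Hypothetical record for one phased variant in one individual."""
    chrom: str
    pos: int
    ref: str
    alt: str
    haplotype: int  # 0 or 1: the haplotype that carries the alt allele


def find_mnvs(variants: List[PhasedVariant], max_gap: int = 10) -> List[List[PhasedVariant]]:
    """Group variants into MNV candidates.

    Variants are clustered when they share a chromosome and haplotype and
    each one lies within `max_gap` bp of the previous one. `max_gap` is an
    assumed threshold, not a value taken from the abstract.
    Only clusters with two or more variants are returned.
    """
    ordered = sorted(variants, key=lambda v: (v.chrom, v.haplotype, v.pos))
    clusters: List[List[PhasedVariant]] = []
    current: List[PhasedVariant] = []
    for v in ordered:
        if current:
            prev = current[-1]
            same_group = v.chrom == prev.chrom and v.haplotype == prev.haplotype
            if not same_group or v.pos - prev.pos > max_gap:
                # Close the running cluster; keep it only if it has >= 2 variants.
                if len(current) >= 2:
                    clusters.append(current)
                current = []
        current.append(v)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


example = [
    PhasedVariant("chr1", 100, "A", "G", 0),
    PhasedVariant("chr1", 101, "C", "T", 0),  # adjacent, same haplotype
    PhasedVariant("chr1", 102, "G", "A", 1),  # nearby, other haplotype
    PhasedVariant("chr1", 500, "T", "C", 0),  # same haplotype, too far away
    PhasedVariant("chr2", 100, "A", "C", 0),  # different chromosome
]
mnvs = find_mnvs(example)
```

In this toy input only the variants at positions 100 and 101 form a cluster, because the variant at 102 sits on the other haplotype and the remaining ones are too distant or on another chromosome.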

HICANCER: accurate and complete cancer genome phasing with Hi-C reads.

Pan, Weihua; Gong, Desheng; Sun, Da; Luo, Haohui.

Sci Rep; 11(1): 6609, 2021 Mar 23.

Article in English | MEDLINE | ID: mdl-33758310

ABSTRACT

Because of the high complexity of the cancer genome, it has so far been too difficult to generate a complete cancer genome map that contains the sequence of every DNA molecule. Nevertheless, phasing each chromosome in the cancer genome into two haplotypes according to germline mutations provides a suboptimal solution for understanding the cancer genome. However, phasing the cancer genome is also a challenging problem because of limits in experimental and computational technologies. Hi-C data has been widely used in phasing in recent years because of its long-range linkage information, and it provides an opportunity to solve the problem of phasing the cancer genome. The existing Hi-C based phasing methods cannot be applied to the cancer genome directly, because somatic mutations in the cancer genome, such as somatic SNPs, copy number variations and structural variations, greatly reduce the correctness and completeness of phasing. Here, we propose a new Hi-C based pipeline for phasing the cancer genome, called HICANCER. HICANCER handles different kinds of somatic mutations and variations, and takes advantage of allelic copy number imbalance and linkage disequilibrium to improve the correctness and completeness of phasing. According to our experiments in K562 and KBM-7 cell lines, HICANCER is able to generate very high-quality chromosome-level haplotypes for the cancer genome with only Hi-C data.

Subject(s)

Genome-Wide Association Study/methods; Neoplasms/genetics; Polymorphism, Single Nucleotide; Software; Genome, Human; Humans; K562 Cells
