A systematic evaluation of bioinformatics tools for identification of long noncoding RNAs.

Duan, You; Zhang, Wanting; Cheng, Yingyin; Shi, Mijuan; Xia, Xiao-Qin

Affiliation

Duan Y; Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
Zhang W; University of Chinese Academy of Sciences, Beijing 100049, China.
Cheng Y; Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
Shi M; The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China.
Xia XQ; Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.

RNA; 27(1): 80-98, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33055239

ABSTRACT

High-throughput RNA sequencing has revealed the complexity of the transcriptome and greatly increased the number of recorded long noncoding RNAs (lncRNAs), which have been reported to participate in a variety of biological processes. Identification of lncRNAs is a key step in lncRNA analysis, and many bioinformatics tools have been developed for this purpose in recent years. While these tools allow lncRNAs to be identified more efficiently and accurately, they may produce inconsistent results, which makes choosing among them difficult. We compared the performance of 41 analysis models based on 14 software packages on different data sets, including high-quality and low-quality data from 33 species. In addition, the computational efficiency, robustness, and joint prediction of the models were explored. As practical guidance, key points for lncRNA identification under different situations are summarized. In this investigation, no single model outperformed the others under all test conditions. The performance of a model depended to a great extent on the source of the transcripts and the quality of the assemblies. As general references, FEELnc_all_cl, CPC, and CPAT_mouse work well in most species, while COME, CNCI, and lncScore are good choices for model organisms. Since these tools are sensitive to factors such as the species involved and the quality of the assembly, researchers must carefully select the appropriate tool based on their actual data. Alternatively, our tests suggest that joint prediction can outperform any single model if suitable models are chosen. All scripts and data used in this research can be accessed at http://bioinfo.ihb.ac.cn/elit.
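The abstract does not say how the joint prediction was computed. As an illustration only, the sketch below assumes a simple voting rule: a transcript is labeled lncRNA when at least `min_votes` models call it noncoding. The function name `joint_predict`, the model names, and the transcript IDs are hypothetical and are not taken from the paper or from any of the tools it evaluates.

```python
from collections import Counter


def joint_predict(calls_by_model, min_votes):
    """Combine per-model noncoding calls by voting (assumed rule, not the paper's).

    calls_by_model maps a model name to a dict of transcript ID -> bool,
    where True means the model called the transcript noncoding.
    Returns a dict of transcript ID -> bool for the joint call.
    """
    votes = Counter()
    all_ids = set()
    for calls in calls_by_model.values():
        all_ids.update(calls)
        for transcript_id, is_noncoding in calls.items():
            if is_noncoding:
                votes[transcript_id] += 1
    # A transcript missing from a model's output counts as no vote from that model.
    return {tid: votes[tid] >= min_votes for tid in sorted(all_ids)}


# Hypothetical calls from three unnamed models on three transcripts.
example_calls = {
    "model_a": {"tx1": True, "tx2": True, "tx3": False},
    "model_b": {"tx1": True, "tx2": False, "tx3": False},
    "model_c": {"tx1": True, "tx2": True, "tx3": True},
}

majority = joint_predict(example_calls, min_votes=2)
intersection = joint_predict(example_calls, min_votes=3)
union = joint_predict(example_calls, min_votes=1)
```

Setting `min_votes` to the number of models gives the intersection of the calls (stricter, fewer false positives), while `min_votes=1` gives the union (more permissive). Which threshold and which models to combine would depend on the species and assembly quality, as the abstract notes.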

Subjects

Computational Biology/methods; Genome; RNA, Long Noncoding/genetics; RNA, Messenger/genetics; Software; Animals; Benchmarking; Datasets as Topic; High-Throughput Nucleotide Sequencing; Humans; Mice; Models, Genetic; Molecular Sequence Annotation; Plants/genetics; RNA, Long Noncoding/classification; RNA, Long Noncoding/metabolism; RNA, Messenger/classification; RNA, Messenger/metabolism; Species Specificity; Transcriptome

Keywords

joint prediction; long noncoding RNA identification; non-model species; simulated and biological data sets; tools comparison


Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Software / RNA, Messenger / Genome / Computational Biology / RNA, Long Noncoding | Type of study: Diagnostic_studies / Guideline / Prognostic_studies | Limits: Animals / Humans | Language: En | Journal: RNA | Journal subject: MOLECULAR BIOLOGY | Year of publication: 2021 | Document type: Article | Country of affiliation: China
