FexRNA: Exploratory Data Analysis and Feature Selection of Non-Coding RNA.
IEEE/ACM Trans Comput Biol Bioinform; 18(6): 2795-2801, 2021.
Article in En | MEDLINE | ID: mdl-33539302
Non-coding RNA (ncRNA) is involved in many biological processes and diseases in all species. Many datasets provide ncRNA data in FASTA format, which is well suited for biomedical purposes. However, for ncRNA analysis and classification, statistical learning methods require numerical features that are hidden in the sequence data and must first be extracted. Furthermore, a wealth of sequence-intrinsic features has been proposed in the literature for ncRNA identification. The extraction of hidden features, their analysis, and the use of a suitable feature set are crucial for the performance of any statistical learning method. To address these challenges, we generated 96 feature datasets from widely used ncRNA features. The feature datasets are based on RNACentral, cover species, ncRNA types, and expert databases, and are available on the FexRNA platform. Additionally, the feature datasets are explored and analysed to provide statistical information as well as univariate and bivariate analyses. We sought to determine which of the 17 features would be most appropriate for developing ncRNA classification approaches. For feature selection (FS), a two-phase hierarchical FS framework based on correlation and majority voting is proposed and evaluated on 5 species. The FexRNA platform provides information about ncRNA feature analysis and selection.
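The abstract does not spell out how the two FS phases work, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that phase 1 drops, within each species dataset, any feature whose absolute Pearson correlation with an already kept feature reaches a threshold, and that phase 2 keeps the features retained by a majority of species. The function names, the 0.9 threshold, the feature names, and the toy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_filter(columns, threshold=0.9):
    """Assumed phase 1: greedily keep a feature only if it is not strongly
    correlated (|r| >= threshold) with any feature kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def majority_vote(selections):
    """Assumed phase 2: keep features retained in more than half of the groups."""
    min_votes = len(selections) // 2 + 1
    votes = {}
    for selection in selections:
        for name in set(selection):
            votes[name] = votes.get(name, 0) + 1
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Toy, made-up feature tables for three hypothetical species.
datasets = {
    "species_a": {
        "length": [100, 200, 300, 400],
        "length_x2": [200, 400, 600, 800],     # redundant copy of length
        "gc_content": [0.50, 0.40, 0.60, 0.45],
    },
    "species_b": {
        "length": [120, 180, 260, 390],
        "length_x2": [240, 360, 520, 780],
        "gc_content": [0.30, 0.50, 0.35, 0.55],
    },
    "species_c": {
        "length": [100, 200, 300, 400],
        "length_x2": [200, 400, 600, 800],
        "gc_content": [0.30, 0.40, 0.50, 0.61],  # tracks length closely here
    },
}

per_species = {name: correlation_filter(cols) for name, cols in datasets.items()}
selected = majority_vote(list(per_species.values()))
print(per_species)
print(selected)
```

In this toy run the redundant `length_x2` column is filtered out in every species, while `gc_content` is dropped only where it happens to track `length`, so it still survives the vote. The greedy filter depends on column order; a real pipeline would need a principled rule for which member of a correlated pair to keep.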
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Software / Sequence Analysis, RNA / Computational Biology / RNA, Untranslated / Machine Learning
Language: En
Journal: ACM Trans Comput Biol Bioinform
Publication year: 2021
Document type: Article