Deep Neural Architectures for Highly Imbalanced Data in Bioinformatics.
IEEE Trans Neural Netw Learn Syst; 31(8): 2857-2867, 2020 Aug.
Article in En | MEDLINE | ID: mdl-31170082
In the postgenome era, many problems in bioinformatics have arisen due to the generation of large amounts of imbalanced data. In particular, the computational classification of precursor microRNA (pre-miRNA) involves a high imbalance in the classes. For this task, a classifier is trained to identify RNA sequences having the highest chance of being miRNA precursors. The main difficulty is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in highly imbalanced data. This imbalance has a strong influence on most standard classifiers and, if it is not properly addressed, the classifier cannot work properly in a real-life scenario. This work provides a comparative assessment of recent deep neural architectures for dealing with the large imbalanced data issue in the classification of pre-miRNAs. We present and analyze recent architectures in a benchmark framework with genomes of animals and plants, with increasing imbalance ratios up to 1:2000. We also propose a new graphical way of comparing classifier performance in the context of high class imbalance. The comparative results obtained show that, at a very high imbalance, deep belief neural networks can provide the best performance.
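To make the effect of a 1:2000 imbalance ratio concrete, the following minimal sketch scores a trivial classifier that labels every candidate sequence as negative. The labels are synthetic and purely illustrative, not data from the paper. The helper names (`confusion_counts`, `summarize`) and the choice of the geometric mean of sensitivity and specificity as a summary metric are assumptions made here for illustration; the abstract does not say which metrics the authors use.

```python
# Illustrative sketch only: synthetic labels, not data or code from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = pre-miRNA."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and their geometric mean."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = (sensitivity * specificity) ** 0.5
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Assumed toy set: 10 positives and 20000 negatives, i.e. a 1:2000 ratio.
n_pos, n_neg = 10, 20000
y_true = [1] * n_pos + [0] * n_neg

# A degenerate classifier that never predicts the positive class.
always_negative = [0] * len(y_true)

result = summarize(y_true, always_negative)
print(result)
```

Under these assumptions the accuracy is above 0.999 even though no pre-miRNA is recovered, which is one reason why plain accuracy is usually replaced by class-aware measures when the imbalance is this severe.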
Full text: 1
Database: MEDLINE
Main subject: Plants / Databases, Factual / Neural Networks, Computer / Computational Biology / Deep Learning
Limits: Animals / Humans
Language: En
Publication year: 2020
Document type: Article