Search | Portal Regional da BVS

Computational Analysis of Transposable Elements and CircRNAs in Plants.

Oliveira, Liliane Santana; Patera, Andressa Caroline; Domingues, Douglas Silva; Sanches, Danilo Sipoli; Lopes, Fabricio Martins; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Maracaja-Coutinho, Vinicius; Durham, Alan Mitchell; Paschoal, Alexandre Rossi.

Methods Mol Biol ; 2362: 147-172, 2021.

Article in English | MEDLINE | ID: mdl-34195962

ABSTRACT

This chapter provides two main contributions: (1) a description of computational tools and databases used to identify and analyze transposable elements (TEs) and circRNAs in plants; and (2) an analysis of public TE and circRNA data. Our goal is to highlight the primary information available in the literature on circular noncoding RNAs and transposable elements in plants. The exploratory analysis of publicly available circRNA and TE data helps us discuss four sequence features. Finally, we investigate the association between circRNAs and TEs in the model plant Arabidopsis thaliana.
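
The abstract does not say how the circRNA:TE association is computed. One common way to frame the question is as a coordinate-overlap test between circRNA loci and TE annotations. The sketch below is a minimal illustration under that assumption. It uses half-open, 0-based coordinates. The names `Interval`, `overlaps` and `circrna_te_pairs` are hypothetical, and so are the toy loci; none of them come from the chapter.

```python
from typing import List, NamedTuple, Tuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def circrna_te_pairs(circs: List[Interval], tes: List[Interval]) -> List[Tuple[str, str]]:
    """Return (circRNA, TE) name pairs whose coordinates overlap (naive O(n*m) scan)."""
    return [(c.name, t.name) for c in circs for t in tes if overlaps(c, t)]

if __name__ == "__main__":
    # Hypothetical loci, for illustration only.
    circs = [Interval("Chr1", 100, 500, "circ_A"), Interval("Chr2", 1000, 1200, "circ_B")]
    tes = [Interval("Chr1", 450, 900, "TE_1"), Interval("Chr2", 5000, 6000, "TE_2")]
    print(circrna_te_pairs(circs, tes))  # [('circ_A', 'TE_1')]
```

For genome-scale annotations, a sorted sweep or an interval tree would replace the quadratic scan. The overlap predicate stays the same.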

Subjects

Arabidopsis , DNA Transposable Elements , Arabidopsis/genetics , Computational Biology , DNA Transposable Elements/genetics , Plants/genetics , RNA, Circular

TERL: classification of transposable elements by convolutional neural networks.

da Cruz, Murilo Horacio Pereira; Domingues, Douglas Silva; Saito, Priscila Tiemi Maeda; Paschoal, Alexandre Rossi; Bugatti, Pedro Henrique.

Brief Bioinform ; 22(3)2021 05 20.

Article in English | MEDLINE | ID: mdl-34020551

ABSTRACT

Transposable elements (TEs) are the most abundant sequences in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, which could provide more detailed information about them. Most methods that classify TE sequences rely on handcrafted features, such as k-mers, and on homology-based search, which can be inefficient for classifying non-homologous sequences. Here we propose an approach called Transposable Elements Representation Learner (TERL). TERL preprocesses one-dimensional sequences, transforms them into two-dimensional data (i.e., image-like representations of the sequences), and feeds them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data so that it can classify it correctly. We conducted six experiments to compare the performance of TERL with that of other methods. For sequences from RepBase, our approach obtained macro mean accuracies and F1-scores of 96.4% and 85.8%, respectively, at the superfamily level and 95.7% and 91.5%, respectively, at the order level. For sequences from seven databases, we obtained macro mean accuracies and F1-scores of 95.0% and 70.6% at the superfamily level and 89.3% and 73.9% at the order level, respectively. We surpassed the accuracy, recall and specificity of other methods in the experiment classifying order-level sequences from seven databases. TERL also required far less time than any other method in all experiments. Therefore, TERL can learn to predict any hierarchical level of the TE classification system. It is about 20 times faster than TEclass and three orders of magnitude faster than PASTEC. Code: https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br.
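
TERL is described as turning one-dimensional sequences into two-dimensional, image-like input for a convolutional network. The sketch below shows one such transformation. It assumes one-hot encoding over a hypothetical `ACGT` alphabet, with zero padding or truncation to a fixed length `max_len`. The abstract does not state TERL's exact preprocessing, so the function `one_hot_2d` is illustrative only; see the TERL repository for the actual implementation.

```python
from typing import List

ALPHABET = "ACGT"  # assumed nucleotide alphabet; other symbols (e.g. N) become all-zero columns

def one_hot_2d(seq: str, max_len: int) -> List[List[int]]:
    """Encode a DNA sequence as a len(ALPHABET) x max_len binary matrix.

    Row i marks the positions holding ALPHABET[i]. Positions beyond len(seq)
    stay zero (padding), and sequences longer than max_len are truncated.
    """
    matrix = [[0] * max_len for _ in ALPHABET]
    for pos, base in enumerate(seq.upper()[:max_len]):
        row = ALPHABET.find(base)  # -1 for symbols outside the alphabet
        if row >= 0:
            matrix[row][pos] = 1
    return matrix

if __name__ == "__main__":
    for row in one_hot_2d("ACGTN", 6):
        print(row)
```

Padding every sequence to a common `max_len` gives all inputs the same shape, which a fixed-size convolutional network requires.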

Subjects

DNA Transposable Elements , Neural Networks, Computer , Datasets as Topic

Exploring Active Learning Based on Representativeness and Uncertainty for Biomedical Data Classification.

Bressan, Rafael S; Camargo, Guilherme; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda.

IEEE J Biomed Health Inform ; 23(6): 2238-2244, 2019 11.

Article in English | MEDLINE | ID: mdl-30442623

ABSTRACT

Nowadays, there is an abundance of biomedical data, such as images and genetic sequences, among others. However, much of this data lacks annotation because annotating it is costly. It is therefore essential to develop techniques that ease the burden of human annotation, and active learning strategies can serve this goal. However, state-of-the-art active learning methods are generally not feasible for real-world datasets. These methods also tend to neglect another important issue: the classifier learns more at each iteration, yet their selection criteria do not properly exploit its growing knowledge. Therefore, in this paper, we propose an active learning approach that leverages the learning process, including a novel active learning strategy. The main difference of our strategy is that the classifier participates very actively in its own learning process. This lets us better maximize and prioritize the knowledge the classifier obtains at each iteration and use it more appropriately when selecting the most informative samples. To do so, our selection criteria give significant weight to the classifications suggested by the classifier. In addition to the participation and knowledge of the classifier, we consider both uncertainty and representativeness criteria through a fine-grained analysis of the samples. Experimental results show that our novel active learning approach outperforms state-of-the-art active learning methods across several supervised classifiers. Hence, it handles real-world datasets better, balancing the trade-off between annotation effort and accuracy.
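
The sketch below makes the two named criteria concrete. It is a generic example that scores unlabeled samples by a weighted sum of margin-based uncertainty and distance-based representativeness, then picks the top `k`. It is not the authors' strategy. The margin measure, the inverse-mean-distance representativeness, the weight `alpha` and all function names are assumptions. The classifier-suggested labels that the abstract emphasizes are not modeled.

```python
import math
from typing import List, Sequence

def margin_uncertainty(probs: Sequence[float]) -> float:
    """Return 1 - (p_top1 - p_top2); near 1 when the classifier is torn between two classes."""
    top = sorted(probs, reverse=True)
    second = top[1] if len(top) > 1 else 0.0
    return 1.0 - (top[0] - second)

def representativeness(i: int, pool: List[Sequence[float]]) -> float:
    """Return 1 / (1 + mean Euclidean distance from pool[i] to the other pool samples)."""
    others = [x for j, x in enumerate(pool) if j != i]
    if not others:
        return 1.0
    mean_dist = sum(math.dist(pool[i], x) for x in others) / len(others)
    return 1.0 / (1.0 + mean_dist)

def select_batch(pool: List[Sequence[float]], probs: List[Sequence[float]],
                 k: int, alpha: float = 0.5) -> List[int]:
    """Return indices of the k unlabeled samples with the highest combined score.

    alpha weights uncertainty; (1 - alpha) weights representativeness.
    """
    scores = [alpha * margin_uncertainty(p) + (1 - alpha) * representativeness(i, pool)
              for i, p in enumerate(probs)]
    return sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:k]

if __name__ == "__main__":
    pool = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]   # toy feature vectors
    probs = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]  # toy class probabilities
    print(select_batch(pool, probs, k=2))  # [0, 2]
```

In the toy run, samples 0 and 2 are equally uncertain. Sample 0 lies closer to the rest of the pool, so the representativeness term ranks it first.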

Subjects

Diagnosis, Computer-Assisted/methods , Machine Learning , Medical Informatics/methods , Algorithms , Databases, Factual , Humans , Knowledge Discovery , Neoplasms/classification

Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants.

Negri, Tatianne da Costa; Alves, Wonder Alexandre Luz; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Domingues, Douglas Silva; Paschoal, Alexandre Rossi.

Brief Bioinform ; 20(2): 682-689, 2019 03 25.

Article in English | MEDLINE | ID: mdl-29697740

ABSTRACT

MOTIVATION: Long noncoding RNAs (lncRNAs) are a class of eukaryotic noncoding RNA that has gained great attention in recent years as an additional layer of gene expression regulation in cells. There is, however, a lack of computational approaches designed specifically to predict lncRNAs reliably in plants, in contrast to the variety of prediction tools available for mammalian lncRNAs. This gap is notable, given that the biological features and mechanisms that generate lncRNAs in the cell are likely to differ between animals and plants. Considering this, we present a machine learning analysis and a classifier approach called RNAplonc (https://github.com/TatianneNegri/RNAplonc/) to identify lncRNAs in plants. RESULTS: Our feature selection analysis started from 5468 features and retained only 16 to identify lncRNAs robustly with the REPTree algorithm. These features formed the basis of the model, which we trained on lncRNA and mRNA data from five plant species (thale cress, cucumber, soybean, poplar and Asian rice). We compared RNAplonc extensively with other tools widely used in plants (CPC, CPC2, CPAT and PLncPRO). The tests covered eight species from the GreeNC database and four independent studies in monocotyledonous (Brachypodium) and eudicotyledonous (Populus and Gossypium) species. RNAplonc produced more reliable lncRNA predictions from plant transcripts, achieving the best result in 87.5% of the eight tests.

Subjects

Computational Biology/methods , Plants/genetics , RNA, Long Noncoding/genetics , RNA, Plant/genetics , Gene Expression Regulation, Plant , Machine Learning , Plants/classification , Species Specificity
