Search | VHL Regional Portal

Secondary structure prediction of long noncoding RNA: review and experimental comparison of existing approaches.

Bugnon, L A; Edera, A A; Prochetto, S; Gerard, M; Raad, J; Fenoy, E; Rubiolo, M; Chorostecki, U; Gabaldón, T; Ariel, F; Di Persia, L E; Milone, D H; Stegmayer, G.

Brief Bioinform ; 23(4)2022 07 18.

Article in English | MEDLINE | ID: mdl-35692094

ABSTRACT

MOTIVATION: In contrast to messenger RNAs, the function of the wide range of existing long noncoding RNAs (lncRNAs) largely depends on their structure, which determines their interactions with partner molecules. Thus, determining or predicting the secondary structure of lncRNAs is critical to uncovering their function. Classical approaches for predicting RNA secondary structure have been based on dynamic programming and thermodynamic calculations. In the last 4 years, a growing number of machine learning (ML)-based models, including deep learning (DL), have achieved breakthrough performance in structure prediction of biomolecules such as proteins and have outperformed classical methods in folding short transcripts. Nevertheless, accurate prediction for lncRNAs remains far from solved. Notably, the myriad of new proposals has not been systematically and experimentally evaluated. RESULTS: In this work, we compare the performance of classical methods and of the most recently proposed approaches for secondary structure prediction of RNA sequences using a unified and consistent experimental setup. We use the publicly available structural profiles of 3023 yeast RNA sequences and a novel benchmark of well-characterized lncRNA structures from different species. Moreover, we propose a novel metric to assess the predictive performance of methods, based exclusively on the chemical probing data commonly used for profiling RNA structures, which avoids any potential bias introduced by computational predictions when dot-bracket references are used. Our results provide a comprehensive comparative assessment of existing methodologies and a novel public benchmark resource to aid in the development and comparison of future approaches. AVAILABILITY: Full source code and benchmark datasets are available at: https://github.com/sinc-lab/lncRNA-folding. CONTACT: lbugnon@sinc.unl.edu.ar.

Subjects

RNA, Long Noncoding , Computational Biology/methods , Protein Structure, Secondary , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger , Software
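Background for the dot-bracket references mentioned in this abstract: in dot-bracket notation, each base pair is written as a matching `(` and `)` and each unpaired nucleotide as `.`. The Python sketch below is illustrative only and is not taken from the authors' repository. It parses such a string into base pairs and scores a prediction against a reference with base-pair F1, a common comparison score. It is not the probing-based metric proposed in the article; the function names are hypothetical, and pseudoknot brackets such as `[` and `]` are rejected for simplicity.

```python
def dot_bracket_to_pairs(structure: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string into 0-based (i, j) base pairs with i < j.

    Only '(', ')' and '.' are accepted in this sketch; pseudoknot brackets
    such as '[' and ']' raise ValueError.
    """
    open_positions: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((open_positions.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {j}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


def base_pair_f1(predicted: str, reference: str) -> float:
    """Harmonic mean of base-pair precision and recall.

    Convention used here: 0.0 whenever no pair is shared, including the case
    where both structures are fully unpaired.
    """
    pred_pairs = dot_bracket_to_pairs(predicted)
    ref_pairs = dot_bracket_to_pairs(reference)
    shared = len(pred_pairs & ref_pairs)
    if shared == 0:
        return 0.0
    precision = shared / len(pred_pairs)
    recall = shared / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)


print(sorted(dot_bracket_to_pairs("((..))")))  # [(0, 5), (1, 4)]
# The prediction recovers one of the two reference pairs: F1 = 2/3.
print(base_pair_f1("(....)", "((..))"))
```

Base-pair F1 rewards exact pair matches only; a pair shifted by one nucleotide counts as wrong, which is one reason reference-free or probing-based evaluations are of interest.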

Deep Learning for the discovery of new pre-miRNAs: Helping the fight against COVID-19.

Bugnon, L A; Raad, J; Merino, G A; Yones, C; Ariel, F; Milone, D H; Stegmayer, G.

Mach Learn Appl ; 6: 100150, 2021 Dec 15.

Article in English | MEDLINE | ID: mdl-34939043

ABSTRACT

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has recently been found responsible for the pandemic outbreak of a novel coronavirus disease (COVID-19). In this work, a novel deep learning approach is proposed for identifying precursors of small active RNA molecules named microRNAs (miRNAs) in the genome of the novel coronavirus. Viral miRNA-like molecules have been shown to modulate the host transcriptome during the progression of infection; thus, their identification is crucial for aiding the diagnosis or medical treatment of the disease. The existence of mature miRNAs derived from computationally predicted miRNA precursors (pre-miRNAs) in the novel coronavirus was validated with small RNA-seq data from SARS-CoV-2-infected human cells. The results demonstrate that computational models can provide accurate and useful predictions of pre-miRNAs in the SARS-CoV-2 genome, underscoring the relevance of machine learning in the response to a global health emergency. Moreover, the interpretability of our model shed light on the molecular mechanisms underlying the viral infection, thus contributing to the fight against the COVID-19 pandemic and to the fast development of new treatments. Our study shows how recent advances in machine learning can be used effectively in response to public health emergencies. The approach developed in this work could be of great help in similar future emergencies, to accelerate the understanding of the singularities of any viral agent and the development of novel therapies. Data and source code are available at: https://sourceforge.net/projects/sourcesinc/files/aicovid/.

High precision in microRNA prediction: A novel genome-wide approach with convolutional deep residual networks.

Yones, C; Raad, J; Bugnon, L A; Milone, D H; Stegmayer, G.

Comput Biol Med ; 134: 104448, 2021 07.

Article in English | MEDLINE | ID: mdl-33979731

ABSTRACT

MicroRNAs (miRNAs) are small non-coding RNAs that play a key role in the regulation of gene expression. The importance of miRNAs is now widely acknowledged by the community, and computational methods are needed for the precise prediction of novel miRNA candidates. This task can be done by searching for homologs with sequence alignment tools, but the results are restricted to sequences that are very similar to the known miRNA precursors (pre-miRNAs). Moreover, these methods do not take into account a very important property of pre-miRNAs: their secondary structure. To fill this gap, many machine learning approaches have been proposed in recent years. However, these methods are generally tested under very controlled conditions. When they are applied under real conditions, false positives increase and precision falls well below the published values. This work provides a novel approach for the computational prediction of pre-miRNAs: a convolutional deep residual neural network (mirDNN). This model was tested on the full genomes of several animals and plants, achieving a precision up to 5 times higher than other approaches at the same recall rates. Furthermore, a novel validation methodology was used to ensure that the performance reported in this study can be effectively achieved when mirDNN is used in novel species. To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. The demo can process FASTA files with multiple sequences to calculate the prediction scores and generate the nucleotide importance plots. FULL SOURCE CODE: http://sourceforge.net/projects/sourcesinc/files/mirdnn and https://github.com/cyones/mirDNN. CONTACT: gstegmayer@sinc.unl.edu.ar.

Subjects

MicroRNAs , Animals , Computational Biology , Genome , MicroRNAs/genetics , Sequence Alignment , Software
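The mirDNN abstract compares methods by precision "at the same recall rates". A common way to read off that number is to rank all candidate hairpins by classifier score and take the precision of the shortest top-ranked list that recovers the requested fraction of known pre-miRNAs. Because genome-wide data contain far more non-miRNA hairpins than known precursors, the same recall can come with very different precision. The following is a minimal Python sketch under that assumption, with a hypothetical function name and made-up scores; it is not mirDNN's evaluation code.

```python
def precision_at_recall(scores: list[float], labels: list[int], target_recall: float) -> float:
    """Precision of the shortest score-ranked list whose recall reaches target_recall.

    labels: 1 for a known pre-miRNA, 0 for any other hairpin.
    Tied scores keep their input order (Python's sort is stable, also with reverse=True).
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive example is required")
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_positives >= target_recall:
            return true_positives / k
    raise ValueError("target_recall is not reachable (must be <= 1.0)")


# Hypothetical classifier scores for six hairpins, three of them known pre-miRNAs.
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]
labels = [1, 0, 1, 0, 0, 1]
print(precision_at_recall(scores, labels, 0.5))  # two of the top three are positives
print(precision_at_recall(scores, labels, 1.0))  # 0.5
```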

DL4papers: a deep learning approach for the automatic interpretation of scientific articles.

Bugnon, L A; Yones, C; Raad, J; Gerard, M; Rubiolo, M; Merino, G; Pividori, M; Di Persia, L; Milone, D H; Stegmayer, G.

Bioinformatics ; 36(11): 3499-3506, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32091584

ABSTRACT

MOTIVATION: In precision medicine, next-generation sequencing and novel preclinical reports have led to an increasingly large amount of results published in the scientific literature. However, identifying novel treatments or predicting a drug response in, for example, cancer patients from the huge number of available papers remains laborious and challenging. This task can be considered a text mining problem that requires reading many academic documents to identify a small set of papers describing specific relations between key terms. Because manual curation of these relations is infeasible, computational methods that can automatically identify them from the available literature are urgently needed. RESULTS: We present DL4papers, a new deep learning method capable of analyzing and interpreting papers in order to automatically extract relevant relations between specific keywords. DL4papers receives as input a query with the desired keywords and returns a ranked list of papers that contain meaningful associations between the keywords. In a comparison against related methods on a cancer corpus, our proposal outperformed them. The reliability of the DL4papers output list was also measured, revealing that, on average, 100% of the first two documents retrieved for a particular search contain relevant relations. This indicates that, on average, the relation can be effectively found in the top-2 papers of the ranked list. Furthermore, the model is capable of highlighting, within each document, the specific fragments that contain the associations of the input keywords. This allows the reader to focus only on the highlighted text instead of reading the full paper. We believe that our proposal could be used as an accurate tool for rapidly identifying relationships between genes and their mutations, drug responses and treatments in the context of a given disease. This new approach can be a very useful and valuable resource for the advancement of the precision medicine field. AVAILABILITY AND IMPLEMENTATION: A web demo is available at: http://sinc.unl.edu.ar/web-demo/dl4papers/. Full source code and data are available at: https://sourceforge.net/projects/sourcesinc/files/dl4papers/. CONTACT: lbugnon@sinc.unl.edu.ar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Deep Learning , Software , Data Mining , Humans , Precision Medicine , Reproducibility of Results
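The DL4papers reliability figure (on average, 100% of the first two retrieved documents contain relevant relations) corresponds to a mean precision at k = 2 over queries. A small Python sketch of that computation, assuming each query's ranked list has already been annotated with per-document relevance; the data layout and function names are hypothetical, and the relevance judgements below are made up.

```python
def precision_at_k(relevance: list[bool], k: int) -> float:
    """Fraction of the top-k documents that are relevant (missing ranks count as non-relevant)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def mean_precision_at_k(rankings: list[list[bool]], k: int) -> float:
    """Average of precision@k over queries; one relevance list per query, in ranked order."""
    if not rankings:
        raise ValueError("at least one query is required")
    return sum(precision_at_k(ranking, k) for ranking in rankings) / len(rankings)


# Hypothetical relevance judgements for three queries.
rankings = [
    [True, True, False, True],
    [True, True, True],
    [True, True, False],
]
print(mean_precision_at_k(rankings, 2))  # 1.0
print(mean_precision_at_k(rankings, 3))  # 7/9, about 0.78
```

A mean precision@2 of 1.0 means every query had two relevant documents in its top two positions; it says nothing about documents further down the list.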

Genome-wide hairpins datasets of animals and plants for novel miRNA prediction.

Bugnon, L A; Yones, C; Raad, J; Milone, D H; Stegmayer, G.

Data Brief ; 25: 104209, 2019 Aug.

Article in English | MEDLINE | ID: mdl-31453279

ABSTRACT

This article makes available several genome-wide datasets that can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences, and a set of computed features for predictions. Each sequence has one label: i) "positive", meaning that it is a well-known pre-miRNA according to miRBase v21; or ii) "unlabeled", indicating that the sequence does not (yet) have a known function and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is also provided for each genome. This feature set contains typical information about sequence, topology and structure. The datasets are publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/.
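The feature set described above covers sequence, topology and structure. As a minimal illustration of the simplest group, sequence-level features, the Python sketch below computes GC content and the 16 dinucleotide frequencies of a hairpin. This is an assumption for illustration only: it does not reproduce the published feature set, whose exact definitions are not given in this abstract, and the function name is hypothetical.

```python
from itertools import product


def sequence_features(sequence: str) -> dict[str, float]:
    """GC content plus the 16 dinucleotide frequencies of an RNA sequence.

    DNA input is accepted (T is read as U); dinucleotides containing other
    symbols, e.g. N, are skipped but still count in the denominator.
    """
    seq = sequence.upper().replace("T", "U")
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    features = {"gc_content": (seq.count("G") + seq.count("C")) / len(seq)}
    counts = {"".join(pair): 0 for pair in product("ACGU", repeat=2)}
    for i in range(len(seq) - 1):
        dinucleotide = seq[i:i + 2]
        if dinucleotide in counts:
            counts[dinucleotide] += 1
    n_windows = len(seq) - 1
    for dinucleotide, count in counts.items():
        features[f"freq_{dinucleotide}"] = count / n_windows
    return features


features = sequence_features("GGGAAACCC")
print(features["gc_content"])  # 0.6666666666666666
print(features["freq_GG"])     # 0.25
```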

Extreme learning machines for reverse engineering of gene regulatory networks from expression time series.

Rubiolo, M; Milone, D H; Stegmayer, G.

Bioinformatics ; 34(7): 1253-1260, 2018 04 01.

Article in English | MEDLINE | ID: mdl-29182723

ABSTRACT

Motivation: The reconstruction of gene regulatory networks (GRNs) from gene profiles is of growing interest in bioinformatics for understanding the complex regulatory mechanisms in cellular systems. GRNs explicitly represent the cause-effect relations of regulation among a group of genes, and their reconstruction is today a challenging computational problem. Several methods have been proposed, but most of them require different input sources to provide an acceptable prediction. Thus, reconstructing a GRN only from temporal gene expression data remains a great challenge. Results: The Extreme Learning Machine (ELM) is a new supervised neural model that has gained interest in recent years because of its faster learning and better predictive power than existing supervised models. This work proposes a novel approach for GRN reconstruction in which ELMs are used to model the relationships between gene expression time series. Artificial datasets generated with the well-known benchmark tool used in the DREAM competitions were used. Real datasets with well-known GRNs underlying the time series were used to validate this novel proposal. The impact of increasing the size of the GRNs was analyzed in detail for the compared methods. The results obtained confirm the superiority of the ELM approach over very recent state-of-the-art methods under the same experimental conditions. Availability and implementation: The web demo can be found at http://sinc.unl.edu.ar/web-demo/elm-grnnminer/. The source code is available at https://sourceforge.net/projects/sourcesinc/files/elm-grnnminer. Contact: mrubiolo@santafe-conicet.gov.ar. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , Software , Supervised Machine Learning , Escherichia coli/genetics , Models, Genetic , Saccharomyces cerevisiae/genetics
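The abstract uses ELMs, supervised neural models, to relate gene expression time series. The standard ELM training scheme draws the input weights and biases of a single hidden layer at random and fits only the output weights, in closed form, by least squares. A minimal, generic version is sketched below in Python with NumPy. It is fitted to predict every gene's expression at time t+1 from all genes at time t on synthetic data. This illustrates the ELM idea only and is not the authors' GRN reconstruction pipeline: how regulatory links are derived from the fitted models is not described in the abstract and is not shown. The class name, sigmoid activation and hidden-layer size are assumptions.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer extreme learning machine (generic sketch).

    Input weights and biases are drawn at random and never trained; only the
    output weights are fitted, in closed form, with the Moore-Penrose
    pseudoinverse of the hidden-layer output matrix.
    """

    def __init__(self, n_hidden: int = 30, seed: int = 0) -> None:
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, Y: np.ndarray) -> "ELMRegressor":
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y  # least-squares output weights
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta


# Synthetic expression matrix: 21 time points x 3 genes (made-up data).
rng = np.random.default_rng(1)
expression = rng.uniform(0.0, 1.0, size=(21, 3))
X, Y = expression[:-1], expression[1:]  # predict time t+1 from time t
model = ELMRegressor(n_hidden=30, seed=0).fit(X, Y)
training_mse = float(np.mean((model.predict(X) - Y) ** 2))
print(f"training MSE: {training_mse:.2e}")
```

With more hidden units (30) than training samples (20), the fit typically interpolates the training data, so a low training error here says nothing about generalization; held-out time points would be needed for that.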

Genome-wide pre-miRNA discovery from few labeled examples.

Yones, C; Stegmayer, G; Milone, D H.

Bioinformatics ; 34(4): 541-549, 2018 02 15.

Article in English | MEDLINE | ID: mdl-29028911

ABSTRACT

Motivation: Although many machine learning techniques have been proposed for distinguishing miRNA hairpins from other stem-loop sequences, most current methods use supervised learning, which requires a very good set of positive and negative examples. These methods have important practical limitations when they have to be applied to a real prediction task. First, there is the challenge of dealing with a scarce number of positive (well-known) pre-miRNA examples. Second, it is very difficult to build a good set of negative examples representing the full spectrum of non-miRNA sequences. Third, in any genome there is a huge class imbalance (1:10,000), which is well known to particularly affect supervised classifiers. Results: To enable efficient and fast genome-wide prediction of novel miRNAs, we present miRNAss, a novel method based on semi-supervised learning. It takes advantage of the information provided by the unlabeled stem-loops, thereby improving the prediction rates even when the number of labeled examples is low and not representative of the classes. An automatic method for searching for negative examples to initialize the algorithm is also proposed, to spare the user this difficult task. MiRNAss obtained better prediction rates and shorter execution times than state-of-the-art supervised methods. It was validated with genome-wide data from three model species, with more than one million hairpin sequences each, thereby demonstrating its applicability to a real prediction task. Availability and implementation: An R package can be downloaded from https://cran.r-project.org/package=miRNAss. In addition, a web demo for testing the algorithm is available at http://fich.unl.edu.ar/sinc/web-demo/mirnass. All the datasets used in this study and the sets of predicted pre-miRNAs are available at http://sourceforge.net/projects/sourcesinc/files/mirnass. Contact: cyones@sinc.unl.edu.ar. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology/methods , Eukaryota/metabolism , Genome , MicroRNAs/metabolism , Supervised Machine Learning , Animals , Anopheles/genetics , Anopheles/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Eukaryota/genetics , Genomics/methods , MicroRNAs/chemistry , Nucleic Acid Conformation , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
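The miRNAss abstract does not give the method's formulation, so the sketch below does not reproduce it. To illustrate the general semi-supervised idea of letting unlabeled stem-loops carry information between labeled ones, it substitutes a generic, well-known technique: graph-based label spreading in the style of Zhou et al., written in Python with NumPy. The function name, the Gaussian affinity and the parameters `alpha` and `gamma` are assumptions. The dense affinity matrix needs O(n²) memory, so this only works for toy data; genome-wide sets with over a million hairpins would require a sparse nearest-neighbour graph.

```python
import numpy as np


def label_spreading(X: np.ndarray, y: np.ndarray, alpha: float = 0.9, gamma: float = 1.0) -> np.ndarray:
    """Generic graph-based label spreading (Zhou et al. style), not miRNAss.

    X: feature matrix (n_examples x n_features).
    y: +1 for known positives, -1 for negatives, 0 for unlabeled examples.
    Returns a real-valued score per example; its sign is the predicted class.
    """
    # Dense Gaussian affinity graph; O(n^2) memory, suitable for toy data only.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Closed-form fixed point of F <- alpha * S F + (1 - alpha) * y, up to scale.
    n = X.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, y.astype(float))


# Two well-separated toy clusters with one labelled example in each.
cluster = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
X = np.vstack([cluster, cluster + 5.0])
y = np.array([1, 0, 0, 0, -1, 0, 0, 0])
scores = label_spreading(X, y)
print(np.sign(scores))
```

In this toy case, each unlabeled point takes the sign of the single labelled example in its own cluster, because the affinity between the two clusters is negligible.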

Monitoring and assessment of ingestive chewing sounds for prediction of herbage intake rate in grazing cattle.

Galli, J R; Cangiano, C A; Pece, M A; Larripa, M J; Milone, D H; Utsumi, S A; Laca, E A.

Animal ; 12(5): 973-982, 2018 May.

Article in English | MEDLINE | ID: mdl-28994354

ABSTRACT

Accurate measurement of herbage intake rate is critical to advance knowledge of the ecology of grazing ruminants. This experiment tested the integration of behavioral and acoustic measurements of chewing and biting to estimate herbage dry matter intake (DMI) in dairy cows offered micro-swards of contrasting plant structure. Micro-swards constructed with plastic pots were offered to three lactating Holstein cows (608±24.9 kg of BW) in individual grazing sessions (n=48). Treatments were a factorial combination of two forage species (alfalfa and fescue) and two plant heights (tall=25±3.8 cm and short=12±1.9 cm), and were offered on a gradient of increasing herbage mass (10 to 30 pots) and number of bites (~10 to 40 bites). During each grazing session, sounds of biting and chewing were recorded with a wireless microphone placed on the cow's forehead, together with a digital video camera, to obtain synchronized audio and video recordings. Dry matter intake rate was higher in tall alfalfa than in the other three treatments (32±1.6 v. 19±1.2 g/min). A high proportion of jaw movements in every grazing session (23 to 36%) were compound jaw movements (chew-bites), which appeared to be a key component of chewing and biting efficiency and of the cows' ability to regulate intake rate. Dry matter intake was accurately predicted from easily observable behavioral and acoustic variables. Chewing sound energy, measured as energy flux density (EFD), was linearly related to DMI, with 74% of EFD variation explained by DMI. Total chewing EFD, number of chew-bites and plant height (tall v. short) were the most important predictors of DMI. The best model explained 91% of the variation in DMI with a coefficient of variation of 17%. Ingestive sounds integrate valuable information to remotely monitor feeding behavior and predict DMI in grazing cows.

Subjects

Cattle/physiology , Eating , Feeding Behavior , Mastication , Acoustics , Animals , Female , Lactation , Medicago sativa , Poaceae
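The grazing abstract reports two kinds of statistics: a share of variation explained (74% of EFD variation explained by DMI, 91% for the best model) and a coefficient of variation (17%). The NumPy sketch below shows how these statistics are commonly computed for a simple linear regression, using made-up numbers. The CV definition used here, residual standard error divided by the mean response, is an assumption, and the article's multi-predictor model is not reproduced.

```python
import numpy as np


def linear_fit_summary(x: np.ndarray, y: np.ndarray) -> dict[str, float]:
    """Least-squares line y = slope * x + intercept with R^2 and CV (%).

    CV is taken here as the residual standard error (n - 2 degrees of freedom)
    divided by the mean of y, times 100; other definitions exist.
    """
    if len(x) < 3:
        raise ValueError("at least three observations are needed")
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    residual_se = np.sqrt(ss_res / (len(y) - 2))
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r_squared": 1.0 - ss_res / ss_tot,
        "cv_percent": float(100.0 * residual_se / np.mean(y)),
    }


# Made-up DMI values (predictor) and chewing EFD values (response), for illustration only.
dmi = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
efd = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
summary = linear_fit_summary(dmi, efd)
print(summary)
```

Here EFD is regressed on DMI, matching the direction of the "EFD variation explained by DMI" statement; predicting DMI from acoustic variables, as in the article's best model, would swap the roles and add predictors.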

MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data.

Kamenetzky, L; Stegmayer, G; Maldonado, L; Macchiaroli, N; Yones, C; Milone, D H.

Genomics ; 107(6): 274-80, 2016 06.

Article in English | MEDLINE | ID: mdl-27107656

Subjects

Echinococcosis/genetics , Echinococcus multilocularis/genetics , MicroRNAs/genetics , Animals , Echinococcosis/parasitology , Echinococcus multilocularis/pathogenicity , Genome , Humans , MicroRNAs/isolation & purification
