Results 1 - 20 of 27
1.
Brief Bioinform ; 22(3), 2021 05 20.
Article in English | MEDLINE | ID: mdl-34020551

ABSTRACT

Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and applies them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracies and F1-scores of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for orders on sequences from RepBase, respectively. We also obtained macro mean accuracies and F1-scores of 95.0% and 70.6% at the superfamily level and 89.3% and 73.9% at the order level, respectively, for sequences from seven databases. TERL surpassed the accuracy, recall and specificity obtained by other methods in the experiment classifying order-level sequences from seven databases, and it was far faster than any other method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. TERL is available at https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br.
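
The abstract does not say how the one-dimensional sequence is turned into image-like data. As a rough illustration only, the sketch below assumes one-hot encoding over a five-symbol alphabet with zero padding to a fixed width; the function name `one_hot_2d`, the alphabet and the padding rule are hypothetical and are not taken from TERL.

```python
# Hypothetical sketch: one-hot encoding is an assumption made for illustration,
# not the preprocessing confirmed by the TERL abstract.
ALPHABET = "ACGTN"  # assumed symbol set; each symbol is one row of the matrix


def one_hot_2d(seq, width):
    """Return a len(ALPHABET) x width matrix of 0/1 ints.

    The sequence is truncated to `width` columns; shorter sequences are
    zero-padded on the right.
    """
    seq = seq.upper()[:width]
    matrix = [[0] * width for _ in ALPHABET]
    for col, base in enumerate(seq):
        row = ALPHABET.find(base)
        if row == -1:
            row = ALPHABET.index("N")  # treat unknown symbols as N
        matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    # Each printed row is one symbol channel of the "image".
    for row in one_hot_2d("ACGTTN", width=8):
        print(row)
```

A matrix like this has a fixed shape for every input, which is what a convolutional network needs; the choice of width trades truncation of long elements against padding of short ones.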


Subjects
DNA Transposable Elements , Neural Networks, Computer , Datasets as Topic
2.
Brief Bioinform ; 20(2): 682-689, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29697740

ABSTRACT

MOTIVATION: Long noncoding RNAs (lncRNAs) correspond to a eukaryotic noncoding RNA class that has gained great attention in the past years as a higher layer of regulation for gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNAs in plants, which contrasts with the variety of prediction tools available for mammalian lncRNAs. This distinction is not that obvious, given that biological features and mechanisms generating lncRNAs in the cell are likely different between animals and plants. Considering this, we present a machine learning analysis and a classifier approach called RNAplonc (https://github.com/TatianneNegri/RNAplonc/) to identify lncRNAs in plants. RESULTS: Our feature selection analysis considered 5468 features, and it used only 16 features to robustly identify lncRNAs with the REPTree algorithm. That was the basis for creating the model and training it with lncRNA and mRNA data from five plant species (thale cress, cucumber, soybean, poplar and Asian rice). After an extensive comparison with other tools widely used in plants (CPC, CPC2, CPAT and PLncPRO), we found that RNAplonc produced more reliable lncRNA predictions from plant transcripts, with the best result in 87.5% of eight tests in eight species from the GreeNC database and four independent studies in monocotyledonous (Brachypodium) and eudicotyledonous (Populus and Gossypium) species.


Subjects
Computational Biology/methods , Plants/genetics , RNA, Long Noncoding/genetics , RNA, Plant/genetics , Gene Expression Regulation, Plant , Machine Learning , Plants/classification , Species Specificity
3.
Brief Bioinform ; 19(6): 1273-1289, 2018 11 27.
Article in English | MEDLINE | ID: mdl-28575144

ABSTRACT

The competing endogenous RNA hypothesis has gained increasing attention as a potential global regulatory mechanism of microRNAs (miRNAs), and as a powerful tool to predict the function of many noncoding RNAs, including miRNAs themselves. Most studies have focused on animals, although the discovery of target mimics (TMs) as well as important computational and experimental advances have been made in plants over the past decade. Thus, our contribution summarizes recent progress in computational approaches for research on miRNA:TM interactions. We divided this article into three main contributions. First, a general overview of research on TMs in plants is presented with practical descriptions of the available literature, tools, data, databases and computational reports. Second, we describe a common protocol for the computational and experimental analyses of TMs. Third, we provide a bioinformatics approach for the prediction of TM motifs potentially cross-targeting both members within the same or from different miRNA families, based on the identification of consensus miRNA-binding sites from known TMs across sequenced genomes, transcriptomes and known miRNAs. This computational approach is promising because, in contrast to animals, miRNA families in plants are large with identical or similar members, several of which are also highly conserved. Of the three consensus TM motifs found with our approach (MIM166, MIM171 and MIM159/319), the last one has found strong support in the recent experimental work by Reichel and Millar [Specificity of plant microRNA TMs: cross-targeting of mir159 and mir319. J Plant Physiol 2015;180:45-8]. Finally, we discuss the major computational and associated experimental challenges that have to be faced in future ceRNA studies.


Subjects
Computational Biology , Molecular Mimicry , Plants/genetics , RNA, Plant/genetics , MicroRNAs/genetics
4.
Bioinformatics ; 35(19): 3873-3874, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30874795

ABSTRACT

MOTIVATION: Mirtrons arise from short introns with atypical cleavage by using the splicing mechanism. In the current literature, there is no repository centralizing and organizing the data available to the public. To fill this gap, we developed mirtronDB, the first knowledge database dedicated to mirtrons, available at http://mirtrondb.cp.utfpr.edu.br/. MirtronDB currently contains a total of 1407 mirtron precursors and 2426 mirtron mature sequences in 18 species. RESULTS: Through a user-friendly interface, users can now browse and search mirtrons by organism, organism group, type and name. MirtronDB is a specialized resource that provides free and user-friendly access to knowledge on mirtron data. AVAILABILITY AND IMPLEMENTATION: MirtronDB is available at http://mirtrondb.cp.utfpr.edu.br/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Knowledge Bases , Introns , MicroRNAs , RNA Splicing , Software
5.
BMC Genomics ; 19(1): 556, 2018 Jul 28.
Article in English | MEDLINE | ID: mdl-30055586

ABSTRACT

BACKGROUND: Streptococcus agalactiae, also known as Group B Streptococcus (GBS), is a Gram-positive bacterium that colonizes the gastrointestinal and genitourinary tract of humans. This bacterium has also been isolated from various animals, such as fish and cattle. Non-coding RNAs (ncRNAs) can act as regulators of gene expression in bacteria, such as Streptococcus pneumoniae and Streptococcus pyogenes. However, little is known about the genomic distribution of ncRNAs and RNA families in S. agalactiae. RESULTS: Comparative genome analysis of 27 S. agalactiae strains identified more than 5 thousand genomic regions, which were classified as Core, Exclusive, and Shared genome sequences. We identified 27 to 89 RNA families per genome distributed over these regions; of these, 25 were in Core regions, while Shared and Exclusive regions showed variations amongst strains. We propose that the amount and type of ncRNA present in each genome can provide a pattern that contributes to the identification of the clonal types. CONCLUSIONS: The identification of RNA families provides insight into the function of ncRNAs, sRNAs and ribozymes, which can be further explored as targets for antibiotic development or studied in gene regulation of cellular processes. RNA families could be considered as markers to determine the infection capabilities of different strains. Lastly, pan-genome analysis of GBS including the full range of functional transcripts provides a broader approach to understanding this pathogen.
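
To make the Core / Shared / Exclusive split concrete, the sketch below labels regions by how many strains carry them. The definitions used here (Core = present in every strain, Exclusive = present in exactly one, Shared = anything in between) are an assumption for illustration; the abstract does not state the criteria the authors applied, and the strain and region names are made up.

```python
from collections import Counter


def classify_regions(genomes):
    """Label each region by presence across strains.

    `genomes` maps a strain name to an iterable of region identifiers.
    Assumed definitions (not taken from the paper): Core = in all strains,
    Exclusive = in exactly one strain, Shared = everything else.
    """
    counts = Counter(
        region for regions in genomes.values() for region in set(regions)
    )
    total = len(genomes)
    labels = {}
    for region, n_strains in counts.items():
        if n_strains == total:
            labels[region] = "Core"
        elif n_strains == 1:
            labels[region] = "Exclusive"
        else:
            labels[region] = "Shared"
    return labels


if __name__ == "__main__":
    # Hypothetical toy input: three strains, four regions.
    toy = {
        "strain1": {"r1", "r2", "r3"},
        "strain2": {"r1", "r2"},
        "strain3": {"r1", "r4"},
    }
    for region, label in sorted(classify_regions(toy).items()):
        print(region, label)
```

Counting RNA families per label then reduces to grouping family annotations by the label of the region that contains them.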


Subjects
Genome, Bacterial , RNA, Untranslated/genetics , Streptococcus agalactiae/genetics , Molecular Sequence Annotation , RNA, Untranslated/classification
6.
Mem Inst Oswaldo Cruz ; 113(6): e180053, 2018.
Article in English | MEDLINE | ID: mdl-29846381

ABSTRACT

The mosquito Aedes aegypti is the main vector of several arthropod-borne diseases that have global impacts. In a previous meta-analysis, our group identified a vector gene set containing 110 genes strongly associated with infections of dengue, West Nile and yellow fever viruses. Of these 110 genes, four genes allowed a highly accurate classification of infected status. More recently, a new study of Ae. aegypti infected with Zika virus (ZIKV) was published, providing new data to investigate whether this "infection" gene set is also altered during a ZIKV infection. Our hypothesis is that the infection-associated signature may also serve as a proxy to classify the ZIKV infection in the vector. Raw data associated with the NCBI/BioProject were downloaded and re-analysed. A total of 18 paired-end replicates corresponding to three ZIKV-infected samples and three controls were included in this study. The nMDS technique with a logistic regression was used to obtain the probabilities of belonging to a given class. Thus, to compare both gene sets, we used the area under the curve and performed a comparison using the bootstrap method. Our meta-signature was able to separate the infected mosquitoes from the controls with good predictive power to classify the Zika-infected mosquitoes.
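
The comparison step described above (area under the curve plus a bootstrap) can be sketched as follows. This is a minimal illustration, assuming each gene set has already produced one class probability per sample; it does not reproduce the nMDS or logistic regression steps, and the function names, the percentile interval and the toy scores are hypothetical.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Percentile 95% interval for AUC(a) - AUC(b) over resampled samples."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with one class has no defined AUC
        a = auc(ys, [scores_a[i] for i in idx])
        b = auc(ys, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


if __name__ == "__main__":
    # Hypothetical toy scores: 1 = infected, 0 = control.
    y = [0, 0, 0, 1, 1, 1]
    set_a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    set_b = [0.4, 0.6, 0.2, 0.5, 0.3, 0.9]
    print(auc(y, set_a), auc(y, set_b))
    print(bootstrap_auc_diff(y, set_a, set_b))
```

Resampling the same indices for both score sets keeps the comparison paired, so the interval reflects the difference between gene sets rather than sampling noise in each one separately.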


Subjects
Aedes/virology , Mosquito Vectors/virology , Transcriptome , Zika Virus/genetics , Animals , Zika Virus/isolation & purification , Zika Virus Infection/transmission
7.
Comput Struct Biotechnol J ; 23: 22-33, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38075396

ABSTRACT

The Rubiaceae plant family, comprising 3 subfamilies and over 13,000 species, is known for producing significant bioactive compounds such as caffeine and monoterpene indole alkaloids. Despite an increase in available genomes from the Rubiaceae family over the past decade, a systematic analysis of the metabolic gene clusters (MGCs) encoded by these genomes has been lacking. In this study, we aim to identify and analyze metabolic gene clusters within complete Rubiaceae genomes through a comparative analysis of eight species. Applying two bioinformatics pipelines, we identified 2372 candidate MGCs, organized into 549 gene cluster families (GCFs). To enhance the reliability of these findings, we developed coexpression networks and conducted orthology analyses. Using genomic data from Solanum lycopersicum (Solanaceae) for comparative purposes, we provided a detailed view of predicted metabolic enzymes, pathways, and coexpression networks. We present examples of MGCs and GCFs involved in the biological pathways of terpenes, saccharides and alkaloids. Such insights lay the groundwork for discovering new compounds and associated MGCs within the Rubiaceae family, with potential implications for developing more robust crop species and expanding the understanding of plant metabolism. This large-scale exploration also provides a new perspective on the evolution and structure-function relationship of these clusters, offering opportunities for the highly efficient utilization of these unique metabolites. The outcome of this study contributes to a broader comprehension of the biosynthetic pathways, elucidating multiple aspects of specialized metabolism and offering innovative avenues for biotechnological applications.

8.
Adv Protein Chem Struct Biol ; 139: 289-334, 2024.
Article in English | MEDLINE | ID: mdl-38448139

ABSTRACT

Studies focusing on characterizing circRNAs with the potential to translate into peptides are quickly advancing. These studies are helping to elucidate the roles played by circRNAs in several biological processes, especially in the emergence and development of diseases. While various tools are accessible for predicting coding regions within linear sequences, none have demonstrated accurate open reading frame detection in circular sequences, such as circRNAs. Here, we present cirCodAn, a novel tool designed to predict coding regions in circRNAs. We evaluated the performance of cirCodAn using datasets of circRNAs with strong translation evidence and showed that cirCodAn outperformed the other tools available to perform a similar task. Our findings demonstrate the applicability of cirCodAn to identify coding regions in circRNAs, which reveals the potential use of cirCodAn in future research focusing on elucidating the biological roles of circRNAs and their encoded proteins. cirCodAn is freely available at https://github.com/denilsonfbar/cirCodAn.
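
The difficulty with circular sequences can be illustrated with a naive scan: an open reading frame may start near the end of the stored sequence and continue across the back-splice junction, so a linear scan misses it. The sketch below is not cirCodAn's method; it is a simple ATG-to-stop search on a doubled sequence, limited to one lap around the circle, with hypothetical names and a made-up example sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}


def circular_orfs(seq, min_codons=2):
    """Find ATG...stop ORFs in a circular sequence (DNA alphabet).

    Simplifications (assumptions of this sketch): forward strand only,
    at most one lap around the circle, first in-frame stop ends the ORF.
    Returns a list of (start_index, orf_sequence_including_stop).
    """
    seq = seq.upper()
    n = len(seq)
    doubled = seq + seq  # lets a slice run past the back-splice junction
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Read codon by codon without passing the start position again.
        for offset in range(3, n - 2, 3):
            codon = doubled[start + offset:start + offset + 3]
            if codon in STOPS:
                if offset // 3 >= min_codons:
                    orfs.append((start, doubled[start:start + offset + 3]))
                break
    return orfs


if __name__ == "__main__":
    # The ORF ATG-AAA-TAA wraps around the end of this stored sequence.
    print(circular_orfs("ATAAGGATGAA"))
```

A real predictor also has to handle reading frames that continue for more than one lap when the circle length is not a multiple of three, which this sketch deliberately leaves out.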


Subjects
RNA, Circular , Open Reading Frames/genetics
9.
Noncoding RNA ; 10(4), 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39195572

ABSTRACT

This is a mini-review capturing the views and opinions of selected participants at the 2021 IEEE BIBM 3rd Annual LncRNA Workshop, held in Dubai, UAE. The views and opinions are expressed on five broad themes related to problems in lncRNA, namely, challenges in the computational analysis of lncRNAs, lncRNAs and cancer, lncRNAs in sports, lncRNAs and COVID-19, and lncRNAs in human brain activity.

10.
Nat Genet ; 56(4): 721-731, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38622339

ABSTRACT

Coffea arabica, an allotetraploid hybrid of Coffea eugenioides and Coffea canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ~30.5 thousand years ago, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.


Subjects
Coffea , Coffea/genetics , Coffee , Genome, Plant/genetics , Metagenomics , Plant Breeding