1.
BMC Genomics ; 18(1): 804, 2017 Oct 18.
Article in English | MEDLINE | ID: mdl-29047334

ABSTRACT

BACKGROUND: In recent years, thousands of sequencing projects around the world have generated a rapidly increasing number of RNA transcripts, creating enormous volumes of transcript data to be analyzed. An important problem when analyzing these data is distinguishing long non-coding RNAs (lncRNAs) from protein coding transcripts (PCTs). We present a Support Vector Machine (SVM) based method to distinguish lncRNAs from PCTs, using features based on frequencies of nucleotide patterns and ORF lengths in transcripts. METHODS: The proposed method is based on SVM and uses the first ORF relative length and frequencies of nucleotide patterns selected by PCA as features. FASTA files were used as input to calculate all possible features. These features were divided into two sets: (i) 336 frequencies of nucleotide patterns; and (ii) 4 features derived from ORFs. PCA was applied to the first set to identify 6 groups of frequencies that could most contribute to the distinction. Twenty-four experiments using the 6 groups from the first set and the features from the second set were built to create the best model to distinguish lncRNAs from PCTs. RESULTS: This method was trained and tested with human (Homo sapiens), mouse (Mus musculus) and zebrafish (Danio rerio) data, achieving 98.21%, 98.03% and 96.09% accuracy, respectively. Our method was compared to other tools available in the literature (CPAT, CPC, iSeeRNA, lncRNApred, lncRScan-SVM and FEELnc), and showed an improvement in accuracy of ≈3.00%. In addition, to validate our model, the mouse data were classified with the human model, and vice versa, achieving ≈97.80% accuracy in both cases, showing that the model is not overfit. The SVM models were validated with data from rat (Rattus norvegicus), pig (Sus scrofa) and fruit fly (Drosophila melanogaster), and obtained more than 84.00% accuracy in all these organisms. Our results also showed that 81.2% of human pseudogenes and 91.7% of mouse pseudogenes were classified as non-coding. Moreover, our method was capable of re-annotating two uncharacterized sequences of the Swiss-Prot database as having a high probability of being lncRNAs. Finally, in order to use the method to annotate transcripts derived from RNA-seq, previously identified lncRNAs of human, gorilla (Gorilla gorilla) and rhesus macaque (Macaca mulatta) were analyzed, and 98.62%, 80.8% and 91.9% of them, respectively, were successfully classified. CONCLUSIONS: The SVM method proposed in this work shows high performance in distinguishing lncRNAs from PCTs, as shown in the results. To build the model, besides using ORF features known in the literature, we used PCA to identify the nucleotide pattern frequencies that contribute the most to distinguishing lncRNAs from PCTs in reference data sets. Interestingly, models created with two evolutionarily distant species could distinguish lncRNAs of even more distant species.
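
The two feature families named in the METHODS paragraph can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that the 336 pattern frequencies are all di-, tri- and tetranucleotides (16 + 64 + 256 = 336), and that the "first ORF" is the first ATG that has an in-frame stop codon. The function names are hypothetical, and the PCA selection and SVM training steps are not shown.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k_values=(2, 3, 4)):
    """Relative frequency of every k-mer over ACGT, for each k in k_values.

    Assumption: k = 2, 3, 4 is only a guess that happens to give 336 features.
    """
    seq = seq.upper()
    features = {}
    for k in k_values:
        windows = max(len(seq) - k + 1, 0)
        counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
        for i in range(windows):
            kmer = seq[i:i + k]
            if kmer in counts:  # windows containing N or other symbols are skipped
                counts[kmer] += 1
        for kmer, count in counts.items():
            features[kmer] = count / windows if windows else 0.0
    return features


def first_orf_relative_length(seq):
    """Length of the first ATG..stop ORF divided by the transcript length.

    Returns 0.0 when no ATG is followed by an in-frame stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return (i + 3 - start) / len(seq)
        start = seq.find("ATG", start + 1)
    return 0.0
```

A feature vector for one transcript would then combine the values of `kmer_frequencies(seq)` with `first_orf_relative_length(seq)` before any dimensionality reduction or classification.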


Subjects
Computational Biology/methods , Open Reading Frames/genetics , Untranslated RNA/genetics , Support Vector Machine , Animals , Humans , Mice , Molecular Sequence Annotation , Messenger RNA/genetics , Zebrafish/genetics
2.
BMC Bioinformatics ; 17(Suppl 18): 464, 2016 Dec 15.
Article in English | MEDLINE | ID: mdl-28105919

ABSTRACT

BACKGROUND: snoReport uses RNA secondary structure prediction combined with machine learning as the basis to identify the two main classes of small nucleolar RNAs, the box H/ACA snoRNAs and the box C/D snoRNAs. Here, we present snoReport 2.0, which substantially improves and extends the original method by: extracting new features for both box C/D and box H/ACA snoRNAs; developing a more sophisticated technique in the SVM training phase, with recent data from vertebrate organisms and a careful choice of the SVM parameters C and γ; and using updated versions of the tools and databases used to construct the original version of snoReport. To validate the new version and to demonstrate its improved performance, we tested snoReport 2.0 in different organisms. RESULTS: Results of the training and test phases for box H/ACA and box C/D snoRNAs, in both versions of snoReport, are discussed. Validation on real data was performed to evaluate the predictions of snoReport 2.0. Our program was applied to a set of previously annotated sequences, some of them experimentally confirmed, of humans, nematodes, drosophilids, platypus, chickens and leishmania. We significantly improved the predictions for vertebrates, since the training phase used information from these organisms, but H/ACA box snoRNA identification was also improved for the other organisms. CONCLUSION: We presented snoReport 2.0, a method to predict H/ACA box and C/D box snoRNAs that efficiently finds true positives and avoids false positives in vertebrate organisms. The H/ACA box snoRNA classifier showed an F-score of 93 % (an improvement of 10 % over the previous version), while the C/D box snoRNA classifier showed an F-score of 94 % (an improvement of 14 %). Besides, both classifiers exhibited performance measures above 90 %. These results show that snoReport 2.0 avoids false positives and false negatives, allowing snoRNAs to be predicted with high quality. In the validation phase, snoReport 2.0 predicted 67.43 % of the vertebrate snoRNAs for both classes. For nematodes and drosophilids, 69 % and 76.67 % of H/ACA box snoRNAs were predicted, respectively, showing that snoReport 2.0 identifies snoRNAs well in vertebrates and also identifies H/ACA box snoRNAs in invertebrate organisms.
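
For readers unfamiliar with the F-score quoted in the CONCLUSION, the following Python sketch shows the standard definition as the harmonic mean of precision and recall. The function name and the counts in the usage note are hypothetical; they are not taken from snoReport or its evaluation data.

```python
def f_score(true_pos, false_pos, false_neg):
    """F-score (F1): harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    predicted = true_pos + false_pos   # everything the classifier called positive
    actual = true_pos + false_neg      # everything that really is positive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With made-up counts of 90 true positives, 10 false positives and 10 false negatives, precision and recall are both 0.9, so `f_score(90, 10, 10)` is about 0.9. A high F-score therefore requires few false positives and few false negatives at the same time.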


Subjects
Computational Biology/methods , Eukaryota/genetics , Small Nucleolar RNA/chemistry , Support Vector Machine , Animals , Base Sequence , Computational Biology/instrumentation , Eukaryota/chemistry , Humans , Molecular Sequence Data , Small Nucleolar RNA/genetics , Vertebrates/genetics
3.
PLoS Genet ; 7(10): e1002345, 2011 Oct.
Article in English | MEDLINE | ID: mdl-22046142

ABSTRACT

Paracoccidioides is a fungal pathogen and the cause of paracoccidioidomycosis, a health-threatening human systemic mycosis endemic to Latin America. Infection by Paracoccidioides, a dimorphic fungus in the order Onygenales, is coupled with a thermally regulated transition from a soil-dwelling filamentous form to a yeast-like pathogenic form. To better understand the genetic basis of growth and pathogenicity in Paracoccidioides, we sequenced the genomes of two strains of Paracoccidioides brasiliensis (Pb03 and Pb18) and one strain of Paracoccidioides lutzii (Pb01). These genomes range in size from 29.1 Mb to 32.9 Mb and encode 7,610 to 8,130 genes. To enable genetic studies, we mapped 94% of the P. brasiliensis Pb18 assembly onto five chromosomes. We characterized gene family content across Onygenales and related fungi, and within Paracoccidioides we found expansions of the fungal-specific kinase family FunK1. Additionally, the Onygenales have lost many genes involved in carbohydrate metabolism and fewer genes involved in protein metabolism, resulting in a higher ratio of proteases to carbohydrate active enzymes in the Onygenales than their relatives. To determine if gene content correlated with growth on different substrates, we screened the non-pathogenic onygenale Uncinocarpus reesii, which has orthologs for 91% of Paracoccidioides metabolic genes, for growth on 190 carbon sources. U. reesii showed growth on a limited range of carbohydrates, primarily basic plant sugars and cell wall components; this suggests that Onygenales, including dimorphic fungi, can degrade cellulosic plant material in the soil. In addition, U. reesii grew on gelatin and a wide range of dipeptides and amino acids, indicating a preference for proteinaceous growth substrates over carbohydrates, which may enable these fungi to also degrade animal biomass. 
These capabilities for degrading plant and animal substrates suggest a duality in lifestyle that could enable pathogenic species of Onygenales to transfer from soil to animal hosts.


Subjects
Onygenales/genetics , Paracoccidioides/genetics , Paracoccidioidomycosis/microbiology , Protein Kinases/genetics , Carbohydrate Metabolism/genetics , Drug Delivery Systems , Molecular Evolution , Fungal Genome , Mitochondrial Genome/genetics , Humans , Multigene Family/genetics , Onygenales/enzymology , Paracoccidioides/enzymology , Phylogeny , Proteolysis , Nucleic Acid Repetitive Sequences/genetics , DNA Sequence Analysis
4.
BMC Bioinformatics ; 14 Suppl 11: S6, 2013.
Article in English | MEDLINE | ID: mdl-24564294

ABSTRACT

In this work, we used the PROV-DM model to manage data provenance in workflows of genome projects. This provenance model allows the storage of details of one workflow execution, e.g., raw and produced data and computational tools, their versions and parameters. Using this model, biologists can access details of one particular execution of a workflow, compare results produced by different executions, and plan new experiments more efficiently. In addition, a provenance simulator was created, which facilitates the inclusion of provenance data from one genome project workflow execution. Finally, we discuss one case study, which aims to identify genes involved in specific metabolic pathways of Bacillus cereus, as well as to compare this isolate with other phylogenetically related bacteria from the Bacillus group. B. cereus is an extremophilic bacterium; this isolate was collected in warm water in the Midwestern Region of Brazil, and its DNA samples were sequenced with an NGS machine.


Subjects
Computational Biology/methods , Software , Bacillus cereus/genetics , Genome , Workflow
5.
Algorithms Mol Biol ; 17(1): 1, 2022 Jan 15.
Article in English | MEDLINE | ID: mdl-35033127

ABSTRACT

BACKGROUND: SORTING BY TRANSPOSITIONS (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be [Formula: see text]-hard, and the best approximation algorithm, with a 1.375 ratio, was proposed in 2006 by Elias and Hartman (EH algorithm). Their algorithm employs simplification, a technique used to transform an input permutation [Formula: see text] into a simple permutation [Formula: see text], presumably easier to handle. The permutation [Formula: see text] is obtained by inserting new symbols into [Formula: see text] in a way that the lower bound of the transposition distance of [Formula: see text] is preserved in [Formula: see text]. The simplification is guaranteed to keep the lower bound, not the transposition distance. A sequence of operations sorting [Formula: see text] can be mimicked to sort [Formula: see text]. RESULTS AND CONCLUSIONS: First, using an algebraic approach, we propose a new upper bound for the transposition distance, which holds for all [Formula: see text]. Next, motivated by a problem identified in the EH algorithm, which causes it, in scenarios involving how the input permutation is simplified, to require one extra transposition above the 1.375-approximation ratio, we propose a new approximation algorithm to solve SBT ensuring the 1.375-approximation ratio for all [Formula: see text]. We implemented our algorithm and EH's. Regarding the implementation of the EH algorithm, two other issues were identified and needed to be fixed. We tested both algorithms against all permutations of size n, [Formula: see text]. The results show that the EH algorithm exceeds the approximation ratio of 1.375 for permutations with a size greater than 7. The percentages of computed distances that equal the transposition distance, as obtained by the implemented algorithms, are also compared with others available in the literature. Finally, we investigate the performance of both implementations on longer permutations of maximum length 500. From the experiments, we conclude that the maximum and average distances computed by our algorithm are slightly better than the ones computed by the EH algorithm, and that the running times of both algorithms are similar, despite the time complexity of our algorithm being higher.
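
As background for the abstract above, the following Python sketch shows what a single transposition does to a permutation, together with the simple breakpoint-based lower bound on the transposition distance. This is neither the EH algorithm nor the algorithm proposed in the paper, and the breakpoint bound is a weaker, simpler bound than the cycle-graph bound that simplification is designed to preserve. The function names are hypothetical.

```python
from math import ceil


def transposition(perm, i, j, k):
    """Swap the adjacent blocks perm[i:j] and perm[j:k] (0-based, i < j < k)."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("indices must satisfy 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count adjacent pairs of the extended permutation that are not consecutive.

    The permutation of 1..n is extended with 0 at the front and n + 1 at the end.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b != a + 1)


def breakpoint_lower_bound(perm):
    """A transposition removes at most 3 breakpoints, so at least
    ceil(breakpoints / 3) transpositions are needed to sort perm."""
    return ceil(breakpoints(perm) / 3)
```

For example, `[3, 4, 1, 2]` has three breakpoints, so at least one transposition is needed, and `transposition([3, 4, 1, 2], 0, 2, 4)` indeed sorts it in one step.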

6.
Front Oncol ; 11: 681579, 2021.
Article in English | MEDLINE | ID: mdl-34178670

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) is a heterogeneous cancer. Its treatment depends on its anatomical site and distinguishes between colon, rectum, and rectosigmoid junction cancer. This study aimed to identify diagnostic and prognostic biomarkers using networks of CRC-associated transcripts that can be built based on competing endogenous RNAs (ceRNA). METHODS: RNA expression and clinical information data of patients with colon, rectum, and rectosigmoid junction cancer were obtained from The Cancer Genome Atlas (TCGA). The RNA expression profiles were assessed through bioinformatics analysis, and a ceRNA network was constructed for each CRC site. A functional enrichment analysis was performed to assess the functional roles of the ceRNA networks in the prognosis of colon, rectum, and rectosigmoid junction cancer. Finally, to verify the ceRNA impact on prognosis, an overall survival analysis was performed. RESULTS: The study identified various CRC site-specific prognosis biomarkers: hsa-miR-1271-5p, NRG1, hsa-miR-130a-3p, SNHG16, and hsa-miR-495-3p in the colon; E2F8 in the rectum; and DMD and hsa-miR-130b-3p in the rectosigmoid junction. We also identified different biological pathways that highlight differences in CRC behavior at different anatomical sites, thus reinforcing the importance of correctly identifying the tumor site. CONCLUSIONS: Several potential prognostic markers for colon, rectum, and rectosigmoid junction cancer were found. CeRNA networks could provide a better understanding of the differences between, and common factors in, the prognosis of colon, rectum, and rectosigmoid junction cancer.

7.
Theory Biosci ; 139(4): 349-359, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33219910

ABSTRACT

Many small nucleolar RNAs and many of the hairpin precursors of miRNAs are processed from long non-protein-coding host genes. In contrast to their highly conserved and heavily structured payload, the host genes feature poorly conserved sequences. Nevertheless, there is mounting evidence that the host genes have biological functions beyond their primary task of carrying a ncRNA as payload. So far, no connections between the function of the host genes and the function of their payloads have been reported. Here we investigate whether there is evidence for an association of host gene function or mechanisms with the type of payload. To assess this hypothesis we test whether the miRNA host genes (MIRHGs), snoRNA host genes (SNHGs), and other lncRNA host genes can be distinguished based on sequence and/or structure features unrelated to their payload. A positive answer would imply a functional and mechanistic correlation between host genes and their payload, provided the classification does not depend on the presence and type of the payload. A negative answer would indicate that to the extent that secondary functions are acquired, they are not strongly constrained by the prior, primary function of the payload. We find that the three classes can be distinguished reliably when the classifier is allowed to extract features from the payloads. They become virtually indistinguishable, however, as soon as only sequence and structure of parts of the host gene distal from the snoRNAs or miRNA payload is used for classification. This indicates that the functions of MIRHGs and SNHGs are largely independent of the functions of their payloads. Furthermore, there is no evidence that the MIRHGs and SNHGs form coherent classes of long non-coding RNAs distinguished by features other than their payloads.


Subjects
MicroRNAs , Long Noncoding RNA , MicroRNAs/genetics , Long Noncoding RNA/genetics , Small Nucleolar RNA/genetics
8.
J Fungi (Basel) ; 6(4)2020 Nov 24.
Article in English | MEDLINE | ID: mdl-33255176

ABSTRACT

Most people infected with the fungus Paracoccidioides spp. do not get sick, but approximately 5% develop paracoccidioidomycosis. Understanding how host immunity determinants influence disease development could lead to novel preventative or therapeutic strategies; hence, we used two mouse strains that are resistant (A/J) or susceptible (B10.A) to P. brasiliensis to study how dendritic cells (DCs) respond to the infection. RNA sequencing analysis showed that the susceptible strain DCs remodeled their transcriptomes much more intensely than those from the resistant strain, agreeing with a previous model of more intense innate immunity response in the susceptible strain. Contrastingly, these cells also repress genes/processes involved in antigen processing and presentation, such as lysosomal activity and autophagy. After the interaction with P. brasiliensis, both DCs and macrophages from the susceptible mouse reduced the autophagy marker LC3-II recruitment to the fungal phagosome compared to the resistant strain cells, confirming this pathway's repression. These results suggest that impairment in antigen processing and presentation processes might be partially responsible for the inefficient activation of the adaptive immune response in this model.

9.
Evol Bioinform Online ; 15: 1176934319889974, 2019.
Article in English | MEDLINE | ID: mdl-31839702

ABSTRACT

Scientific workflows can be understood as arrangements of managed activities executed by different processing entities. Applying workflows is a common Bioinformatics approach to solving problems in Molecular Biology, notably those related to sequence analyses. Due to the nature of the raw data and the in silico environment of Molecular Biology experiments, apart from the research subject, 2 practical and closely related problems have been studied: reproducibility and computational environment. When aiming to enhance the reproducibility of Bioinformatics experiments, various aspects should be considered. The reproducibility requirements comprise the data provenance, which enables the acquisition of knowledge about the trajectory of data over a defined workflow, the settings of the programs, and the entire computational environment. Cloud computing is a booming alternative that can provide this computational environment, hiding technical details and delivering a more affordable, accessible, and configurable on-demand environment for researchers. Considering this specific scenario, we proposed a solution to improve the reproducibility of Bioinformatics workflows in a cloud computing environment using both Infrastructure as a Service (IaaS) and Not only SQL (NoSQL) database systems. To meet this goal, we built 3 typical Bioinformatics workflows and ran them on 1 private and 2 public clouds, using different types of NoSQL database systems to persist the provenance data according to the Provenance Data Model (PROV-DM). We present here the results and a guide for the deployment of a cloud environment for Bioinformatics, exploring the characteristics of various NoSQL database systems to persist provenance data.

10.
Genes (Basel) ; 9(8)2018 Jul 26.
Article in English | MEDLINE | ID: mdl-30049970

ABSTRACT

The telomerase RNA in yeasts is large, usually >1000 nt, and contains functional elements that have been extensively studied experimentally in several disparate species. Nevertheless, these RNAs are very difficult to detect by homology-based methods and so far have escaped annotation in the majority of the genomes of Saccharomycotina. This is a consequence of sequences that evolve rapidly at the nucleotide level, are subject to large variations in size, and are highly plastic with respect to their secondary structures. Here, we report on a survey that was aimed at closing this gap in RNA annotation. Despite considerable efforts and the combination of a variety of different methods, it was only partially successful. While 27 new telomerase RNAs were identified, we had to restrict our efforts to the subgroup Saccharomycetaceae because even this narrow subgroup was diverse enough to require different search models for different phylogenetic subgroups. More distant branches of the Saccharomycotina remain without annotated telomerase RNA.

11.
Noncoding RNA ; 3(1)2017 Mar 04.
Article in English | MEDLINE | ID: mdl-29657283

ABSTRACT

Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large number of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods considered are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques. Particularly, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms.

12.
Noncoding RNA ; 3(4)2017 Dec 20.
Article in English | MEDLINE | ID: mdl-29657296

ABSTRACT

Studies have highlighted the importance of non-coding RNA regulation in plant-microbe interaction. However, the roles of sugarcane microRNAs (miRNAs) in the regulation of disease responses have not been investigated. First, we screened the sRNA transcriptome of sugarcane infected with Acidovorax avenae. Conserved and novel miRNAs were identified. Additionally, small interfering RNAs (siRNAs) were aligned to differentially expressed sequences from the sugarcane transcriptome. Interestingly, many siRNAs aligned to a transcript encoding a copper-transporter gene whose expression was induced in the presence of A. avenae, while the siRNAs were repressed in the presence of A. avenae. Moreover, a long intergenic non-coding RNA was identified as a potential target or decoy of miR408. To extend the bioinformatics analysis, we carried out independent inoculations, and the expression patterns of six miRNAs were validated by quantitative reverse transcription-PCR (qRT-PCR). Among these miRNAs, miR408, a copper-microRNA, was downregulated. The cleavage of a putative miR408 target, a laccase, was confirmed by a modified 5'RACE (rapid amplification of cDNA ends) assay. MiR408 was also downregulated in samples infected with other pathogens, but it was upregulated in the presence of a beneficial diazotrophic bacterium. Our results suggest that regulation by miR408 is important for sugarcane to sense whether microorganisms are pathogenic or beneficial, triggering specific miRNA-mediated regulatory mechanisms accordingly.

13.
Rev Iberoam Micol ; 22(4): 203-12, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16499412

ABSTRACT

Paracoccidioides brasiliensis is a dimorphic and thermo-regulated fungus which is the causative agent of paracoccidioidomycosis, an endemic disease widespread in Latin America that affects 10 million individuals. Pathogenicity is assumed to be a consequence of the dimorphic transition from mycelium to yeast cells during human infection. This review shows the results of the P. brasiliensis transcriptome project which generated 6,022 assembled groups from mycelium and yeast phases. Computer analysis using the tools of bioinformatics revealed several aspects from the transcriptome of this pathogen such as: general and differential metabolism in mycelium and yeast cells; cell cycle, DNA replication, repair and recombination; RNA biogenesis apparatus; translation and protein fate machineries; cell wall; hydrolytic enzymes; proteases; GPI-anchored proteins; molecular chaperones; insights into drug resistance and transporters; oxidative stress response and virulence. The present analysis has provided a more comprehensive view of some specific features considered relevant for the understanding of basic and applied knowledge of P. brasiliensis.


Subjects
Fungal Genome , Paracoccidioides/genetics , Cell Wall/metabolism , Chitosan/metabolism , Fungal Drug Resistance/genetics , Fungal Proteins/genetics , Gene Expression Profiling , Fungal Genes , Humans , Latin America/epidemiology , Molecular Chaperones/genetics , Oxidative Stress/genetics , Paracoccidioides/ultrastructure , Paracoccidioidomycosis/epidemiology , Paracoccidioidomycosis/microbiology , Genetic Transcription , Virulence/genetics
14.
Genet Mol Res ; 4(3): 590-8, 2005 Sep 30.
Article in English | MEDLINE | ID: mdl-16342044

ABSTRACT

InterPro is a widely used tool for protein annotation in genome sequencing projects; it demands a large amount of computation and is a very time-consuming step. We present a strategy to execute programs using the Pfam, PROSITE and ProDom databases of InterPro in a distributed environment using a Java-based messaging system. We developed a two-layer scheduling architecture for the distributed infrastructure. Then, we ran experiments and analyzed the results. Our distributed system gave much better results than InterPro Pfam, PROSITE and ProDom running on a centralized platform. This approach seems to be appropriate and promising for highly demanding computational tools used in biological applications.


Subjects
Computational Biology/methods , Database Management Systems , Protein Databases , Human Genome Project , Protein Sequence Analysis/methods , Factual Databases , Humans , Sequence Alignment
15.
Genet Mol Res ; 4(2): 203-15, 2005 Jun 30.
Article in English | MEDLINE | ID: mdl-16110442

ABSTRACT

Paracoccidioides brasiliensis is the etiological agent of paracoccidioidomycosis, an endemic mycosis of Latin America. This fungus presents a dimorphic character; it grows as a mycelium at room temperature, but it is isolated as yeast from infected individuals. It is believed that the transition from mycelium to yeast is important for the infective process. The Functional and Differential Genome of Paracoccidioides brasiliensis Project (PbGenome Project) was developed to study the infection process by analyzing expressed sequence tags (ESTs) isolated from both mycelial and yeast forms. The PbGenome Project was executed by a consortium that included 70 researchers (professors and students) from two sequencing laboratories of the midwest region of Brazil; this project produced 25,741 ESTs, 19,718 of which had sufficient quality to be analyzed. We describe the computational procedures used to receive, process, and analyze these ESTs, and to help with their functional annotation; we also detail the services that were used for sequence data exploration. Various programs were compared for filtering and grouping the sequences, and they were adapted to a user-friendly interface. This system made the analysis of the differential transcriptome of P. brasiliensis possible.


Subjects
Computational Biology/methods , Expressed Sequence Tags , Fungal Genome/genetics , Paracoccidioides/genetics , Genetic Transcription/genetics , Brazil , Fungal Gene Expression Regulation/genetics , User-Computer Interface
16.
Int J Genomics ; 2015: 502795, 2015.
Article in English | MEDLINE | ID: mdl-26558254

ABSTRACT

Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

17.
J Bioinform Comput Biol ; 13(6): 1550021, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26223200

ABSTRACT

Noncoding RNAs (ncRNAs) have been the focus of intense research over the last few years. Since the characteristics and signals of ncRNAs are not entirely known, researchers use different computational tools together with their biological knowledge to predict putative ncRNAs. In this context, this work presents ncRNA-Agents, a multi-agent system to annotate ncRNAs based on the output of different tools, using inference rules to simulate biologists' reasoning. Experiments with data from the fungus Saccharomyces cerevisiae allowed us to measure the performance of ncRNA-Agents, which showed better sensitivity than Infernal, a widely used tool for annotating ncRNA. In addition, analysis of data from the fungi Schizosaccharomyces pombe and Paracoccidioides brasiliensis identified novel putative ncRNAs, which demonstrated the usefulness of our approach. NcRNA-Agents can be found at: http://www.biomol.unb.br/ncrna-agents.


Subjects
Computational Biology/methods , Untranslated RNA/genetics , Software , Genetic Databases , Molecular Sequence Annotation/methods , Paracoccidioides/genetics , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics
19.
J Comput Biol ; 20(1): 30-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23294270

ABSTRACT

The live phylogeny problem generalizes the phylogeny problem by admitting the existence of living ancestors among the taxonomic objects. This problem suits the case of fast-evolving species, such as viruses, and the construction of phylogenies for nonbiological objects such as documents, images, and database records. In this article, we formalize the live phylogeny problem for distances and character states and introduce polynomial-time algorithms for particular versions of the problems. We believe that more general versions of the problems are NP-hard and that many heuristic and approximation approaches may be developed as solution strategies.


Subjects
Algorithms , Phylogeny , Computational Biology , Molecular Evolution , Mathematical Concepts
20.
Genes (Basel) ; 3(3): 378-90, 2012 Jul 05.
Article in English | MEDLINE | ID: mdl-24704975

ABSTRACT

The Rfam database contains information about non-coding RNAs emphasizing their secondary structures and organizing them into families of homologous RNA genes or functional RNA elements. Recently, a higher order organization of Rfam in terms of the so-called clans was proposed along with its "decimal release". In this proposition, some of the families have been assigned to clans based on experimental and computational data in order to find related families. In the present work we investigate an alternative classification for the RNA families based on tree edit distance. The resulting clustering recovers some of the Rfam clans. The majority of clans, however, are not recovered by the structural clustering. Instead, they get dispersed into larger clusters, which correspond roughly to well-described RNA classes such as snoRNAs, miRNAs, and CRISPRs. In conclusion, a structure-based clustering can contribute to the elucidation of the relationships among the Rfam families beyond the realm of clans and classes.
