Results 1 - 16 of 16
1.
Algorithms Mol Biol ; 17(1): 1, 2022 Jan 15.
Article in English | MEDLINE | ID: mdl-35033127

ABSTRACT

BACKGROUND: SORTING BY TRANSPOSITIONS (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be [Formula: see text]-hard, and the best approximation algorithm, with a 1.375 ratio, was proposed in 2006 by Elias and Hartman (EH algorithm). Their algorithm employs simplification, a technique used to transform an input permutation [Formula: see text] into a simple permutation [Formula: see text], presumably easier to handle. The permutation [Formula: see text] is obtained by inserting new symbols into [Formula: see text] in such a way that the lower bound of the transposition distance of [Formula: see text] is kept on [Formula: see text]. The simplification is guaranteed to keep the lower bound, not the transposition distance. A sequence of operations sorting [Formula: see text] can be mimicked to sort [Formula: see text]. RESULTS AND CONCLUSIONS: First, using an algebraic approach, we propose a new upper bound for the transposition distance, which holds for all [Formula: see text]. Next, motivated by a problem identified in the EH algorithm, which, depending on how the input permutation is simplified, can require one extra transposition above the 1.375-approximation ratio, we propose a new approximation algorithm to solve SBT ensuring the 1.375-approximation ratio for all [Formula: see text]. We implemented our algorithm and EH's. Regarding the implementation of the EH algorithm, two other issues were identified and needed to be fixed. We tested both algorithms against all permutations of size n, [Formula: see text]. The results show that the EH algorithm exceeds the approximation ratio of 1.375 for permutations with a size greater than 7. The percentages of computed distances that are equal to the transposition distance, as computed by the implemented algorithms, are also compared with others available in the literature. Finally, we investigate the performance of both implementations on longer permutations of maximum length 500.
From the experiments, we conclude that the maximum and average distances computed by our algorithm are slightly better than the ones computed by the EH algorithm, and that the running times of both algorithms are similar, despite the time complexity of our algorithm being higher.
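
To make the evaluation described in this abstract concrete, the following is a minimal Python sketch of how exact transposition distances can be obtained for every permutation of a small size n, and how a computed distance can be checked against the 1.375 ratio. It is a plain breadth-first search, not the authors' algorithm and not the EH algorithm; the function names (`apply_transposition`, `all_transposition_distances`, `within_ratio`) are hypothetical, and the approach is only practical for small n because it enumerates all n! permutations.

```python
from collections import deque
from itertools import combinations


def apply_transposition(perm, i, j, k):
    """Swap the adjacent blocks perm[i:j] and perm[j:k], with 0 <= i < j < k <= len(perm)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def all_transposition_distances(n):
    """Exact transposition distance of every permutation of 1..n.

    Breadth-first search from the identity. Because the inverse of a
    transposition is itself a transposition, the number of steps from the
    identity to a permutation equals the number of steps needed to sort it.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    cuts = list(combinations(range(n + 1), 3))  # every (i, j, k) with i < j < k
    while queue:
        perm = queue.popleft()
        for i, j, k in cuts:
            nxt = apply_transposition(perm, i, j, k)
            if nxt not in dist:
                dist[nxt] = dist[perm] + 1
                queue.append(nxt)
    return dist


def within_ratio(computed, exact, ratio=1.375):
    """True if an algorithm's computed distance respects the approximation ratio."""
    return computed <= ratio * exact


if __name__ == "__main__":
    dist = all_transposition_distances(4)
    print(len(dist), "permutations; largest exact distance:", max(dist.values()))
```

An approximation algorithm's output for each permutation could then be passed to `within_ratio` together with the exact value from the table, which is the kind of exhaustive check the abstract reports for small sizes.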

2.
Theory Biosci ; 139(4): 349-359, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33219910

ABSTRACT

Many small nucleolar RNAs and many of the hairpin precursors of miRNAs are processed from long non-protein-coding host genes. In contrast to their highly conserved and heavily structured payload, the host genes feature poorly conserved sequences. Nevertheless, there is mounting evidence that the host genes have biological functions beyond their primary task of carrying a ncRNA as payload. So far, no connections between the function of the host genes and the function of their payloads have been reported. Here we investigate whether there is evidence for an association of host gene function or mechanisms with the type of payload. To assess this hypothesis, we test whether the miRNA host genes (MIRHGs), snoRNA host genes (SNHGs), and other lncRNA host genes can be distinguished based on sequence and/or structure features unrelated to their payload. A positive answer would imply a functional and mechanistic correlation between host genes and their payload, provided the classification does not depend on the presence and type of the payload. A negative answer would indicate that, to the extent that secondary functions are acquired, they are not strongly constrained by the prior, primary function of the payload. We find that the three classes can be distinguished reliably when the classifier is allowed to extract features from the payloads. They become virtually indistinguishable, however, as soon as only the sequence and structure of parts of the host gene distal from the snoRNA or miRNA payload are used for classification. This indicates that the functions of MIRHGs and SNHGs are largely independent of the functions of their payloads. Furthermore, there is no evidence that the MIRHGs and SNHGs form coherent classes of long non-coding RNAs distinguished by features other than their payloads.


Subject(s)
MicroRNAs , Long Noncoding RNA , MicroRNAs/genetics , Long Noncoding RNA/genetics , Small Nucleolar RNA/genetics
3.
Genes (Basel) ; 9(8)2018 Jul 26.
Article in English | MEDLINE | ID: mdl-30049970

ABSTRACT

The telomerase RNA in yeasts is large, usually >1000 nt, and contains functional elements that have been extensively studied experimentally in several disparate species. Nevertheless, these RNAs are very difficult to detect by homology-based methods and so far have escaped annotation in the majority of the genomes of Saccharomycotina. This is a consequence of sequences that evolve rapidly at the nucleotide level, are subject to large variations in size, and are highly plastic with respect to their secondary structures. Here, we report on a survey that was aimed at closing this gap in RNA annotation. Despite considerable efforts and the combination of a variety of different methods, it was only partially successful. While 27 new telomerase RNAs were identified, we had to restrict our efforts to the subgroup Saccharomycetaceae because even this narrow subgroup was diverse enough to require different search models for different phylogenetic subgroups. More distant branches of the Saccharomycotina remain without annotated telomerase RNA.

4.
BMC Genomics ; 18(1): 804, 2017 Oct 18.
Article in English | MEDLINE | ID: mdl-29047334

ABSTRACT

BACKGROUND: In recent years, a rapidly increasing number of RNA transcripts has been generated by thousands of sequencing projects around the world, creating enormous volumes of transcript data to be analyzed. An important problem to be addressed when analyzing this data is distinguishing between long non-coding RNAs (lncRNAs) and protein coding transcripts (PCTs). Thus, we present a Support Vector Machine (SVM) based method to distinguish lncRNAs from PCTs, using features based on frequencies of nucleotide patterns and ORF lengths in transcripts. METHODS: The proposed method is based on SVM and uses the first ORF relative length and frequencies of nucleotide patterns selected by PCA as features. FASTA files were used as input to calculate all possible features. These features were divided into two sets: (i) 336 frequencies of nucleotide patterns; and (ii) 4 features derived from ORFs. PCA was applied to the first set to identify 6 groups of frequencies that could most contribute to the distinction. Twenty-four experiments using the 6 groups from the first set and the features from the second set were built to create the best model to distinguish lncRNAs from PCTs. RESULTS: This method was trained and tested with human (Homo sapiens), mouse (Mus musculus) and zebrafish (Danio rerio) data, achieving 98.21%, 98.03% and 96.09% accuracy, respectively. Our method was compared to other tools available in the literature (CPAT, CPC, iSeeRNA, lncRNApred, lncRScan-SVM and FEELnc), and showed an improvement in accuracy of ≈3.00%. In addition, to validate our model, the mouse data was classified with the human model, and vice-versa, achieving ≈97.80% accuracy in both cases, showing that the model is not overfit. The SVM models were validated with data from rat (Rattus norvegicus), pig (Sus scrofa) and fruit fly (Drosophila melanogaster), and obtained more than 84.00% accuracy in all these organisms.
Our results also showed that 81.2% of human pseudogenes and 91.7% of mouse pseudogenes were classified as non-coding. Moreover, our method was capable of re-annotating two uncharacterized sequences of the Swiss-Prot database as having a high probability of being lncRNAs. Finally, in order to use the method to annotate transcripts derived from RNA-seq, previously identified lncRNAs of human, gorilla (Gorilla gorilla) and rhesus macaque (Macaca mulatta) were analyzed, of which 98.62%, 80.8% and 91.9%, respectively, were successfully classified. CONCLUSIONS: The SVM method proposed in this work presents high performance in distinguishing lncRNAs from PCTs, as shown in the results. To build the model, besides using features known in the literature regarding ORFs, we used PCA to identify the features among nucleotide pattern frequencies that contribute the most to distinguishing lncRNAs from PCTs in reference data sets. Interestingly, models created with two evolutionarily distant species could distinguish lncRNAs of even more distant species.


Subject(s)
Computational Biology/methods , Open Reading Frames/genetics , Untranslated RNA/genetics , Support Vector Machine , Animals , Humans , Mice , Molecular Sequence Annotation , Messenger RNA/genetics , Zebrafish/genetics
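
The two feature families named in the METHODS of entry 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say which nucleotide patterns make up the 336 frequencies, so the sketch assumes di-, tri- and tetranucleotides (16 + 64 + 256 = 336), and it assumes the "first ORF" runs from the first ATG to the first in-frame stop codon. The function names are hypothetical, and the PCA selection and SVM training steps are omitted.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, ks=(2, 3, 4)):
    """Relative frequency of every k-mer over ACGT for each k in ks.

    Assumption: k = 2, 3, 4 gives 16 + 64 + 256 = 336 features, which matches
    the count in the abstract but is not confirmed by it.
    """
    seq = seq.upper().replace("U", "T")
    features = {}
    for k in ks:
        counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
        windows = 0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # windows with ambiguous bases such as N are skipped
                counts[kmer] += 1
                windows += 1
        for kmer, count in counts.items():
            features[kmer] = count / windows if windows else 0.0
    return features


def first_orf_relative_length(seq):
    """Length of the first ORF divided by the transcript length.

    Assumption: the first ORF starts at the first ATG and ends at the first
    in-frame stop codon; 0.0 is returned when no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return 0.0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i + 3 - start) / len(seq)
    return 0.0


if __name__ == "__main__":
    transcript = "CCATGAAATAGGG"  # toy sequence, not real data
    vector = kmer_frequencies(transcript)
    print(len(vector), first_orf_relative_length(transcript))
```

A feature vector built this way could then be reduced and passed to any SVM implementation; which library and kernel the authors used is not stated in the abstract.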
5.
Noncoding RNA ; 3(1)2017 Mar 04.
Article in English | MEDLINE | ID: mdl-29657283

ABSTRACT

Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them is a large class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods available are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques. In particular, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms.

6.
J Bioinform Comput Biol ; 13(6): 1550021, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26223200

ABSTRACT

Noncoding RNAs (ncRNAs) have been the focus of intense research over the last few years. Since the characteristics and signals of ncRNAs are not entirely known, researchers use different computational tools together with their biological knowledge to predict putative ncRNAs. In this context, this work presents ncRNA-Agents, a multi-agent system to annotate ncRNAs based on the output of different tools, using inference rules to simulate biologists' reasoning. Experiments with data from the fungus Saccharomyces cerevisiae allowed us to measure the performance of ncRNA-Agents, which showed better sensitivity than Infernal, a widely used tool for annotating ncRNA. In addition, novel putative ncRNAs were identified in data from the fungi Schizosaccharomyces pombe and Paracoccidioides brasiliensis, which demonstrated the usefulness of our approach. NcRNA-Agents can be found at: http://www.biomol.unb.br/ncrna-agents.


Subject(s)
Computational Biology/methods , Untranslated RNA/genetics , Software , Genetic Databases , Molecular Sequence Annotation/methods , Paracoccidioides/genetics , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics
8.
J Comput Biol ; 20(1): 30-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23294270

ABSTRACT

The live phylogeny problem generalizes the phylogeny problem by admitting the existence of living ancestors among the taxonomic objects. This problem suits the case of fast-evolving species, such as viruses, and the construction of phylogenies for nonbiological objects such as documents, images, and database records. In this article, we formalize the live phylogeny problem for distances and character states and introduce polynomial-time algorithms for particular versions of the problems. We believe that more general versions of the problems are NP-hard and that many heuristic and approximation approaches may be developed as solution strategies.


Subject(s)
Algorithms , Phylogeny , Computational Biology , Molecular Evolution , Mathematical Concepts
9.
BMC Bioinformatics ; 14 Suppl 11: S6, 2013.
Article in English | MEDLINE | ID: mdl-24564294

ABSTRACT

In this work, we used the PROV-DM model to manage data provenance in workflows of genome projects. This provenance model allows the storage of details of one workflow execution, e.g., raw and produced data and computational tools, their versions and parameters. Using this model, biologists can access details of one particular execution of a workflow, compare results produced by different executions, and plan new experiments more efficiently. In addition, a provenance simulator was created, which facilitates the inclusion of provenance data of one genome project workflow execution. Finally, we discuss one case study, which aims to identify genes involved in specific metabolic pathways of Bacillus cereus, as well as to compare this isolate with other phylogenetically related bacteria from the Bacillus group. B. cereus is an extremophilic bacterium, collected in warm water in the Midwestern Region of Brazil, and its DNA samples were sequenced with an NGS machine.


Subject(s)
Computational Biology/methods , Software , Bacillus cereus/genetics , Genome , Workflow
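
As a rough illustration of what entry 9 describes, the sketch below records one workflow execution using the three core PROV-DM types (entity, activity, agent) and the relations used, wasGeneratedBy and wasAssociatedWith, then traces which inputs a result depends on. It uses plain Python dataclasses rather than any provenance library, it is not the authors' system or simulator, and every identifier, tool name, version and parameter in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """prov:Entity, e.g. a raw read file or a produced assembly."""
    id: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Agent:
    """prov:Agent, e.g. the researcher or tool responsible for a step."""
    id: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Activity:
    """prov:Activity, one executed workflow step.

    `used` and `generated` hold entity ids (the used / wasGeneratedBy
    relations); `associated_with` holds agent ids (wasAssociatedWith).
    """
    id: str
    used: list = field(default_factory=list)
    generated: list = field(default_factory=list)
    associated_with: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)


def lineage(entity_id, activities):
    """Ids of every entity that the given entity was directly or indirectly produced from."""
    ancestors, seen, stack = [], {entity_id}, [entity_id]
    while stack:
        current = stack.pop()
        for activity in activities:
            if current in activity.generated:
                for source in activity.used:
                    if source not in seen:
                        seen.add(source)
                        ancestors.append(source)
                        stack.append(source)
    return ancestors


if __name__ == "__main__":
    # Hypothetical two-step execution: assembly followed by annotation.
    steps = [
        Activity("assemble", used=["raw_reads"], generated=["contigs"],
                 associated_with=["assembler"],
                 attrs={"version": "x.y", "params": "default"}),
        Activity("annotate", used=["contigs", "reference_db"],
                 generated=["annotation"], associated_with=["annotator"]),
    ]
    print(lineage("annotation", steps))
```

Keeping tool versions and parameters as attributes of each activity is what makes it possible to compare two executions of the same workflow, which is the use case the abstract emphasizes.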
10.
Genes (Basel) ; 3(3): 378-90, 2012 Jul 05.
Article in English | MEDLINE | ID: mdl-24704975

ABSTRACT

The Rfam database contains information about non-coding RNAs emphasizing their secondary structures and organizing them into families of homologous RNA genes or functional RNA elements. Recently, a higher order organization of Rfam in terms of the so-called clans was proposed along with its "decimal release". In this proposition, some of the families have been assigned to clans based on experimental and computational data in order to find related families. In the present work we investigate an alternative classification for the RNA families based on tree edit distance. The resulting clustering recovers some of the Rfam clans. The majority of clans, however, are not recovered by the structural clustering. Instead, they get dispersed into larger clusters, which correspond roughly to well-described RNA classes such as snoRNAs, miRNAs, and CRISPRs. In conclusion, a structure-based clustering can contribute to the elucidation of the relationships among the Rfam families beyond the realm of clans and classes.

11.
Toxicon ; 53(4): 427-36, 2009 Mar 15.
Article in English | MEDLINE | ID: mdl-19708221

ABSTRACT

Bothrops atrox is a highly dangerous pit viper in the Brazilian Amazon region. We produced a global catalogue of gene transcripts to identify the main toxin and other protein families present in the B. atrox venom gland. We prepared a directional cDNA library, from which a set of 610 high quality expressed sequence tags (ESTs) were generated by bioinformatics processing. Our data indicated a predominance of transcripts encoding mainly metalloproteinases (59% of the toxins). The expression pattern of the B. atrox venom was similar to Bothrops insularis, Bothrops jararaca and Bothrops jararacussu in terms of toxin type, although some differences were observed. B. atrox showed a higher amount of the PIII class of metalloproteinases which correlates well with the observed intense hemorrhagic action of its toxin. Also, the PLA2 content was the second highest in this sample compared to the other three Bothrops transcriptomes. To our knowledge, this work is the first transcriptome analysis of an Amazonian rain forest pit viper and it will contribute to the body of knowledge regarding the gene diversity of the venom gland of members of the Bothrops genus. Moreover, our results can be used for future studies with other snake species from the Amazon region to investigate differences in gene patterns or phylogenetic relationships.


Subject(s)
Bothrops/physiology , Crotalid Venoms/metabolism , Expressed Sequence Tags , Gene Expression Profiling , Animals , Crotalid Venoms/genetics , Male
12.
Toxicon ; 53(4): 427-436, Jan 19, 2009.
Article in English | Sec. Est. Saúde SP, SESSP-IBPROD, Sec. Est. Saúde SP, SESSP-IBACERVO | ID: biblio-1068238

ABSTRACT

Bothrops atrox is a highly dangerous pit viper in the Brazilian Amazon region. We produced a global catalogue of gene transcripts to identify the main toxin and other protein families present in the B. atrox venom gland. We prepared a directional cDNA library, from which a set of 610 high quality expressed sequence tags (ESTs) were generated by bioinformatics processing. Our data indicated a predominance of transcripts encoding mainly metalloproteinases (59% of the toxins). The expression pattern of the B. atrox venom was similar to Bothrops insularis, Bothrops jararaca and Bothrops jararacussu in terms of toxin type, although some differences were observed. B. atrox showed a higher amount of the PIII class of metalloproteinases which correlates well with the observed intense hemorrhagic action of its toxin. Also, the PLA2 content was the second highest in this sample compared to the other three Bothrops transcriptomes. To our knowledge, this work is the first transcriptome analysis of an Amazonian rain forest pit viper and it will contribute to the body of knowledge regarding the gene diversity of the venom gland of members of the Bothrops genus. Moreover, our results can be used for future studies with other snake species from the Amazon region to investigate differences in gene patterns or phylogenetic relationships.


Subject(s)
Animals , Bothrops/classification , Expressed Sequence Tags , Metalloproteases/analysis , Transcriptome , Snake Venoms/toxicity , Genetic Variation/genetics
13.
Genet Mol Res ; 4(3): 590-8, 2005 Sep 30.
Article in English | MEDLINE | ID: mdl-16342044

ABSTRACT

Interpro is a widely used tool for protein annotation in genome sequencing projects; it demands a large amount of computation and is a very time-consuming step. We present a strategy to execute programs using the Pfam, PROSITE and ProDom databases of Interpro in a distributed environment using a Java-based messaging system. We developed a two-layer scheduling architecture for the distributed infrastructure. Then, we ran experiments and analyzed the results. Our distributed system gave much better results than Interpro Pfam, PROSITE and ProDom running on a centralized platform. This approach seems to be appropriate and promising for highly demanding computational tools used for biological applications.


Subject(s)
Computational Biology/methods , Database Management Systems , Protein Databases , Human Genome Project , Protein Sequence Analysis/methods , Factual Databases , Humans , Sequence Alignment
14.
Genet Mol Res ; 4(2): 203-15, 2005 Jun 30.
Article in English | MEDLINE | ID: mdl-16110442

ABSTRACT

Paracoccidioides brasiliensis is the etiological agent of paracoccidioidomycosis, an endemic mycosis of Latin America. This fungus presents a dimorphic character; it grows as a mycelium at room temperature, but it is isolated as yeast from infected individuals. It is believed that the transition from mycelium to yeast is important for the infective process. The Functional and Differential Genome of Paracoccidioides brasiliensis Project (PbGenome Project) was developed to study the infection process by analyzing expressed sequence tags (ESTs) isolated from both mycelial and yeast forms. The PbGenome Project was executed by a consortium that included 70 researchers (professors and students) from two sequencing laboratories of the midwest region of Brazil; this project produced 25,741 ESTs, 19,718 of which were of sufficient quality to be analyzed. We describe the computational procedures used to receive, process, and analyze these ESTs, and to help with their functional annotations; we also detail the services that were used for sequence data exploration. Various programs were compared for filtering and grouping the sequences, and they were adapted to a user-friendly interface. This system made the analysis of the differential transcriptome of P. brasiliensis possible.


Subject(s)
Computational Biology/methods , Expressed Sequence Tags , Fungal Genome/genetics , Paracoccidioides/genetics , Genetic Transcription/genetics , Brazil , Fungal Gene Expression Regulation/genetics , User-Computer Interface
15.
J Biol Chem ; 280(26): 24706-14, 2005 Jul 01.
Article in English | MEDLINE | ID: mdl-15849188

ABSTRACT

Paracoccidioides brasiliensis is the causative agent of paracoccidioidomycosis, a disease that affects 10 million individuals in Latin America. This report depicts the results of the analysis of 6,022 assembled groups from mycelium and yeast phase expressed sequence tags, covering about 80% of the estimated genome of this dimorphic, thermo-regulated fungus. The data provide a comprehensive view of the fungal metabolism, including overexpressed transcripts, stage-specific genes, and also those that are up- or down-regulated as assessed by in silico electronic subtraction and cDNA microarrays. Also, a significant differential expression pattern in mycelium and yeast cells was detected, which was confirmed by Northern blot analysis, providing insights into differential metabolic adaptations. The overall transcriptome analysis provided information about sequences related to the cell cycle, stress response, drug resistance, and signal transduction pathways of the pathogen. Novel P. brasiliensis genes have been identified, probably corresponding to proteins that should be addressed as virulence factor candidates and potential new drug targets.


Subject(s)
Fungal Gene Expression Regulation , Fungal Genome , Mycelium/metabolism , Paracoccidioides/metabolism , Genetic Transcription , Northern Blotting , Complementary DNA/metabolism , Down-Regulation , Expressed Sequence Tags , Gene Library , Internet , Biological Models , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Paracoccidioides/genetics , Messenger RNA/metabolism , DNA Sequence Analysis , Signal Transduction , Up-Regulation
16.
Rev Iberoam Micol ; 22(4): 203-12, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16499412

ABSTRACT

Paracoccidioides brasiliensis is a dimorphic and thermo-regulated fungus which is the causative agent of paracoccidioidomycosis, an endemic disease widespread in Latin America that affects 10 million individuals. Pathogenicity is assumed to be a consequence of the dimorphic transition from mycelium to yeast cells during human infection. This review shows the results of the P. brasiliensis transcriptome project which generated 6,022 assembled groups from mycelium and yeast phases. Computer analysis using the tools of bioinformatics revealed several aspects from the transcriptome of this pathogen such as: general and differential metabolism in mycelium and yeast cells; cell cycle, DNA replication, repair and recombination; RNA biogenesis apparatus; translation and protein fate machineries; cell wall; hydrolytic enzymes; proteases; GPI-anchored proteins; molecular chaperones; insights into drug resistance and transporters; oxidative stress response and virulence. The present analysis has provided a more comprehensive view of some specific features considered relevant for the understanding of basic and applied knowledge of P. brasiliensis.


Subject(s)
Fungal Genome , Paracoccidioides/genetics , Cell Wall/metabolism , Chitosan/metabolism , Fungal Drug Resistance/genetics , Fungal Proteins/genetics , Gene Expression Profiling , Fungal Genes , Humans , Latin America/epidemiology , Molecular Chaperones/genetics , Oxidative Stress/genetics , Paracoccidioides/ultrastructure , Paracoccidioidomycosis/epidemiology , Paracoccidioidomycosis/microbiology , Genetic Transcription , Virulence/genetics