Results 1 - 20 of 26
1.
Nucleic Acids Res ; 51(D1): D760-D766, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36408900

ABSTRACT

The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/.


Subjects
Genome , Prokaryotic Cells , Genetic Databases , Genomics , Molecular Sequence Annotation , Bacteria/classification , Bacteria/genetics
2.
Plant Cell ; 34(9): 3214-3232, 2022 08 25.
Article in English | MEDLINE | ID: mdl-35689625

ABSTRACT

Fungal interactions with plant roots, either beneficial or detrimental, have a crucial impact on agriculture and ecosystems. The cosmopolitan plant pathogen Fusarium oxysporum (Fo) provokes vascular wilts in more than a hundred different crops. Isolates of this fungus exhibit host-specific pathogenicity, which is conferred by lineage-specific Secreted In Xylem (SIX) effectors encoded on accessory genomic regions. However, such isolates also can colonize the roots of other plants asymptomatically as endophytes or even protect them against pathogenic strains. The molecular determinants of endophytic multihost compatibility are largely unknown. Here, we characterized a set of Fo candidate effectors from tomato (Solanum lycopersicum) root apoplastic fluid; these early root colonization (ERC) effectors are secreted during early biotrophic growth on main and alternative plant hosts. In contrast to SIX effectors, ERCs have homologs across the entire Fo species complex as well as in other plant-interacting fungi, suggesting a conserved role in fungus-plant associations. Targeted deletion of ERC genes in a pathogenic Fo isolate resulted in reduced virulence and rapid activation of plant immune responses, while ERC deletion in a nonpathogenic isolate led to impaired root colonization and biocontrol ability. Strikingly, some ERCs contribute to Fo infection on the nonvascular land plant Marchantia polymorpha, revealing an evolutionarily conserved mechanism for multihost colonization by root infecting fungi.


Subjects
Fusarium , Solanum lycopersicum , Ecosystem , Plant Diseases
3.
Mol Plant Microbe Interact ; 35(1): 39-48, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34546764

ABSTRACT

Albugo candida is an obligate oomycete pathogen that infects many plants in the Brassicaceae family. We resequenced the genome of isolate Ac2V using PacBio long reads and constructed an assembly augmented by Illumina reads. The Ac2VPB genome assembly is 10% larger and more contiguous compared with a previous version. Our annotation of the new assembly, aided by RNA-sequencing information, revealed a 175% expansion (40 to 110) in the CHxC effector class, which we redefined as "CCG" based on motif analysis. This class of effectors consists of arrays of phylogenetically related paralogs residing in gene-sparse regions, and shows signatures of positive selection and presence/absence polymorphism. This work provides a resource that allows the dissection of the genomic components underlying A. candida adaptation and, particularly, the role of CCG effectors in virulence and avirulence on different hosts. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.


Subjects
Brassicaceae , Oomycetes , Candida/genetics , Genome , Oomycetes/genetics , Plant Diseases
4.
Genome Biol ; 22(1): 349, 2021 12 21.
Article in English | MEDLINE | ID: mdl-34930397

ABSTRACT

We have developed an efficient and inexpensive pipeline for streamlining large-scale collection and genome sequencing of bacterial isolates. Evaluation of this method involved a worldwide research collaboration focused on the model organism Salmonella enterica, the 10KSG consortium. Following the optimization of a logistics pipeline that involved shipping isolates as thermolysates in ambient conditions, the project assembled a diverse collection of 10,419 isolates from low- and middle-income countries. The genomes were sequenced using the LITE pipeline for library construction, with a total reagent cost of less than USD$10 per genome. Our method can be applied to other large bacterial collections to underpin global collaborations.


Subjects
Bacterial Genome , Whole Genome Sequencing/methods , Bacterial DNA/isolation & purification , Genome , Humans , Salmonella enterica/genetics , Whole Genome Sequencing/economics
5.
Plant Cell ; 32(7): 2158-2177, 2020 07.
Article in English | MEDLINE | ID: mdl-32409319

ABSTRACT

Plant innate immunity relies on nucleotide binding leucine-rich repeat receptors (NLRs) that recognize pathogen-derived molecules and activate downstream signaling pathways. We analyzed the variation in NLR gene copy number and identified plants with a low number of NLR genes relative to sister species. We specifically focused on four plants from two distinct lineages, one monocot lineage (Alismatales) and one eudicot lineage (Lentibulariaceae). In these lineages, the loss of NLR genes coincides with loss of the well-known downstream immune signaling complex ENHANCED DISEASE SUSCEPTIBILITY 1 (EDS1)/PHYTOALEXIN DEFICIENT 4 (PAD4). We expanded our analysis across whole proteomes and found that other characterized immune genes were absent only in Lentibulariaceae and Alismatales. Additionally, we identified genes of unknown function that were convergently lost together with EDS1/PAD4 in five plant species. Gene expression analyses in Arabidopsis (Arabidopsis thaliana) and Oryza sativa revealed that several homologs of the candidates are differentially expressed during pathogen infection, drought, and abscisic acid treatment. Our analysis provides evolutionary evidence for the rewiring of plant immunity in some plant lineages, as well as the coevolution of the EDS1/PAD4 pathway and drought responses.


Subjects
Alismatales/genetics , NLR Proteins/genetics , Plant Immunity/genetics , Plant Proteins/genetics , Alismatales/immunology , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Carboxylic Ester Hydrolases/genetics , DNA-Binding Proteins/genetics , Disease Resistance/genetics , Disease Resistance/immunology , Droughts , Molecular Evolution , Gene Dosage , Plant Gene Expression Regulation , Magnoliopsida/genetics , Magnoliopsida/immunology , Oryza/genetics , Phylogeny , Signal Transduction , Synteny
6.
PLoS One ; 14(2): e0211598, 2019.
Article in English | MEDLINE | ID: mdl-30811422

ABSTRACT

Molecular tools adapted from bacterial CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems for adaptive immunity have become widely used for plant genome engineering, both to investigate gene functions and to engineer desirable traits. A number of different Cas (CRISPR-associated) nucleases are now used but, as most studies performed to date have engineered different targets using a variety of plant species and molecular tools, it has been difficult to draw conclusions about the comparative performance of different nucleases. Due to the time and effort required to regenerate engineered plants, efficiency is critical. In addition, there have been several reports of mutations at sequences with less than perfect identity to the target. While in some plant species it is possible to remove these so-called 'off-targets' by backcrossing to a parental line, the specificity of genome engineering tools is important when targeting specific members of closely-related gene families, especially when recent paralogues are co-located in the genome and unlikely to segregate. Specificity is also important for species that take years to reach sexual maturity or that are clonally propagated. Here, we directly compare the efficiency and specificity of Cas nucleases from different bacterial species together with engineered variants of Cas9. We find that the nucleotide content of the target correlates with efficiency and that Cas9 from Staphylococcus aureus (SaCas9) is comparatively most efficient at inducing mutations. We also demonstrate that 'high-fidelity' variants of Cas9 can reduce off-target mutations in plants. We present these molecular tools as standardised DNA parts to facilitate their re-use.


Subjects
CRISPR-Cas Systems/genetics , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Plant Genome/genetics , Plants/genetics , Endonucleases/genetics , Gene Editing/methods , Genetic Engineering/methods
7.
Nat Commun ; 9(1): 3735, 2018 10 03.
Article in English | MEDLINE | ID: mdl-30282993

ABSTRACT

Yellow rust, caused by Puccinia striiformis f. sp. tritici (Pst), is a devastating fungal disease threatening much of global wheat production. Race-specific resistance (R)-genes are used to control rust diseases, but the rapid emergence of virulent Pst races has prompted the search for a more durable resistance. Here, we report the cloning of Yr15, a broad-spectrum R-gene derived from wild emmer wheat, which encodes a putative kinase-pseudokinase protein, designated as wheat tandem kinase 1, comprising a unique R-gene structure in wheat. The existence of a similar gene architecture in 92 putative proteins across the plant kingdom, including the barley RPG1 and a candidate for Ug8, suggests that they are members of a distinct family of plant proteins, termed here tandem kinase-pseudokinases (TKPs). The presence of kinase-pseudokinase structure in both plant TKPs and the animal Janus kinases sheds light on the molecular evolution of immune responses across these two kingdoms.


Subjects
Basidiomycota/pathogenicity , Disease Resistance/genetics , Plant Genes/physiology , Plant Diseases/immunology , Plant Proteins/genetics , Triticum/physiology , Animals , Chromosome Mapping , Molecular Evolution , Hordeum/genetics , Janus Kinases/genetics , Mutagenesis , Plant Diseases/microbiology , Genetically Modified Plants , Protein Domains/genetics , Protein Domains/physiology , Triticum/microbiology
8.
Genome Biol ; 19(1): 23, 2018 02 19.
Article in English | MEDLINE | ID: mdl-29458393

ABSTRACT

BACKGROUND: The plant immune system is innate and encoded in the germline. Using it efficiently, plants are capable of recognizing a diverse range of rapidly evolving pathogens. A recently described phenomenon shows that plant immune receptors are able to recognize pathogen effectors through the acquisition of exogenous protein domains from other plant genes. RESULTS: We show that plant immune receptors with integrated domains are distributed unevenly across their phylogeny in grasses. Using phylogenetic analysis, we uncover a major integration clade, whose members underwent repeated independent integration events producing diverse fusions. This clade is ancestral in grasses with members often found on syntenic chromosomes. Analyses of these fusion events reveal that homologous receptors can be fused to diverse domains. Furthermore, we discover a 43 amino acid long motif associated with this dominant integration clade which is located immediately upstream of the fusion site. Sequence analysis reveals that DNA transposition and/or ectopic recombination are the most likely mechanisms of formation for nucleotide binding leucine rich repeat proteins with integrated domains. CONCLUSIONS: The identification of this subclass of plant immune receptors that is naturally adapted to new domain integration will inform biotechnological approaches for generating synthetic receptors with novel pathogen "baits."


Subjects
Gene Fusion , Genetic Loci , NLR Proteins/genetics , Plant Proteins/genetics , Poaceae/genetics , Poaceae/immunology , Immunologic Receptors/genetics , Amino Acid Motifs , Plant Chromosomes , Gene Duplication , Plant Genes , NLR Proteins/chemistry , Phylogeny , Plant Proteins/chemistry , Poaceae/classification , Protein Domains/genetics , Immunologic Receptors/chemistry , Synteny , Genetic Translocation
9.
Genome Res ; 27(5): 885-896, 2017 05.
Article in English | MEDLINE | ID: mdl-28420692

ABSTRACT

Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome with a scaffold N50 of 88.8 kb that has a high fidelity to the input data. Our new annotation combines strand-specific Illumina RNA-seq and Pacific Biosciences (PacBio) full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 noncoding RNA genes. We confirmed three known genome rearrangements and identified one novel rearrangement. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop.
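
For readers unfamiliar with the metric quoted in this abstract, scaffold N50 is the length L such that scaffolds of length L or longer together cover at least half of the total assembly length. The following is a minimal Python sketch of that standard definition; the function name and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths, for illustration only.
toy_lengths = [100, 80, 50, 30, 20, 10]
print(n50(toy_lengths))
```

Because the scan runs over lengths sorted in descending order, a few very long scaffolds raise N50 sharply, which is why the value is normally read alongside the fraction of the genome represented.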


Subjects
Contig Mapping/methods , Plant Genome , Molecular Sequence Annotation/methods , Plant Proteins/genetics , Genetic Translocation , Triticum/genetics , Algorithms , Contig Mapping/standards , Molecular Sequence Annotation/standards , Genetic Polymorphism , Polyploidy
10.
Elife ; 6, 2017 03 06.
Article in English | MEDLINE | ID: mdl-28262094

ABSTRACT

Cell surface receptors govern a multitude of signalling pathways in multicellular organisms. In plants, prominent examples are the receptor kinases FLS2 and BRI1, which activate immunity and steroid-mediated growth, respectively. Intriguingly, despite inducing distinct signalling outputs, both receptors employ common downstream signalling components, which exist in plasma membrane (PM)-localised protein complexes. An important question is thus how these receptor complexes maintain signalling specificity. Live-cell imaging revealed that FLS2 and BRI1 form PM nanoclusters. Using single-particle tracking we could discriminate both cluster populations and we observed spatiotemporal separation between immune and growth signalling platforms. This finding was confirmed by visualising FLS2 and BRI1 within distinct PM nanodomains marked by specific remorin proteins and differential co-localisation with the cytoskeleton. Our results thus suggest that signalling specificity between these pathways may be explained by the spatial separation of FLS2 and BRI1 with their associated signalling components within dedicated PM nanodomains.


Subjects
Arabidopsis Proteins/analysis , Arabidopsis/chemistry , Cell Membrane/chemistry , Protein Kinases/analysis , Cell Surface Receptors/analysis , Intravital Microscopy , Spatio-Temporal Analysis
12.
BMC Res Notes ; 9: 130, 2016 Feb 27.
Article in English | MEDLINE | ID: mdl-26922376

ABSTRACT

BACKGROUND: To cope with the ever-increasing amount of sequence data generated in the field of genomics, the demand for efficient and fast database searches that drive functional and structural annotation in both large- and small-scale genome projects is on the rise. The tools of the BLAST+ suite are the most widely employed bioinformatic method for these database searches. Recent trends in bioinformatics application development show an increasing number of JavaScript apps that are based on modern frameworks such as Node.js. Until now, there has been no way of using database searches with the BLAST+ suite from a Node.js codebase. RESULTS: We developed blastjs, a Node.js library that wraps the search tools of the BLAST+ suite and thus makes it easy to add significant functionality to any Node.js-based application. CONCLUSION: blastjs is a library that allows the incorporation of BLAST+ functionality into bioinformatics applications based on JavaScript and Node.js. The library was designed to be as user-friendly as possible and therefore requires only a minimal amount of code in the client application. The library is freely available under the MIT license at https://github.com/teammaclean/blastjs.
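
The general idea of wrapping a BLAST+ command-line tool can be sketched as follows. This is not the blastjs API, which is not reproduced here; it is a hedged Python illustration in which all function names are hypothetical. It relies only on the `blastn` options `-query`, `-db`, `-evalue` and `-outfmt 6` and on the default twelve columns of tabular output, and `run_blastn` assumes that `blastn` is installed and on the PATH.

```python
import subprocess

# Default column names of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def build_blastn_command(query_fasta, database, evalue=1e-5):
    """Assemble a blastn command line that requests tabular output."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def parse_outfmt6(text):
    """Turn tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        if line.strip():
            hits.append(dict(zip(OUTFMT6_FIELDS, line.split("\t"))))
    return hits


def run_blastn(query_fasta, database, evalue=1e-5):
    """Run blastn as a subprocess and return its parsed hits."""
    result = subprocess.run(
        build_blastn_command(query_fasta, database, evalue),
        capture_output=True, text=True, check=True,
    )
    return parse_outfmt6(result.stdout)
```

Keeping command construction and output parsing separate from the subprocess call means both can be checked without a BLAST+ installation.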


Subjects
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Software , Genetic Databases , Humans , DNA Sequence Analysis , Protein Sequence Analysis , RNA Sequence Analysis
13.
Plant Methods ; 11: 34, 2015.
Article in English | MEDLINE | ID: mdl-26052341

ABSTRACT

BACKGROUND: The genus Cuscuta is a group of parasitic plants that are distributed world-wide. The process of parasitization starts with a Cuscuta plant coiling around the host stem. The parasite's haustorial organs then establish a vascular connection allowing for access to the phloem content. The host and the parasite form new cellular connections, suggesting coordination of developmental and biochemical processes. Simultaneous monitoring of gene expression in the parasite's and host's tissues may shed light on the complex events occurring between the parasitic and host cells and may help to overcome experimental limitations (i.e. how to separate host tissue from Cuscuta tissue at the haustorial connection). A novel approach is to use bioinformatic analysis to classify sequencing reads as either belonging to the host or to the parasite and to characterize the expression patterns. Owing to the lack of a comprehensive genomic dataset from Cuscuta spp., such a classification has not been performed previously. RESULTS: We first classified RNA-Seq reads from an interface region between the non-model parasitic plant Cuscuta japonica and the non-model host plant Impatiens balsamina. Without established reference sequences, we classified reads as originating from either of the plants by stepwise similarity search against de novo assembled transcript sets of C. japonica and I. balsamina, unigene sets of the same genus, and cDNA sequences of the same family. We then assembled de novo transcriptomes from the classified read sets. We assessed the quality of the classification by mapping reads to contigs of both plants, achieving a misclassification rate low enough (0.22-0.39%) to be used reliably for differential gene expression analysis. Finally, we applied our read classification method to RNA-Seq data from the interface between the non-model parasitic plant C. japonica and the model host plant Glycine max. 
Analysis of gene expression profiles at 5 parasitizing stages revealed differentially expressed genes from both C. japonica and G. max, and uncovered the coordination of cellular processes between the two plants. CONCLUSIONS: We demonstrated that reliable identification of differentially expressed transcripts in the undissected interface region of the parasite-host association is feasible and informative with respect to differential-expression patterns.

14.
Funct Plant Biol ; 42(7): 655-667, 2015 Jun.
Article in English | MEDLINE | ID: mdl-32480709

ABSTRACT

Climate models predict an increased likelihood of seasonal droughts for many areas of the world. Breeding for drought tolerance could be accelerated by marker-assisted selection. As a basis for marker identification, we studied the genetic variance, predictability of field performance and potential costs of tolerance in potato (Solanum tuberosum L.). Potato produces high calories per unit of water invested, but is drought-sensitive. In 14 independent pot or field trials, 34 potato cultivars were grown under optimal and reduced water supply to determine starch yield. In an artificial dataset, we tested several stress indices for their power to distinguish tolerant and sensitive genotypes independent of their yield potential. We identified the deviation of relative starch yield from the experimental median (DRYM) as the most efficient index. DRYM corresponded qualitatively to the partial least square model-based metric of drought stress tolerance in a stress effect model. The DRYM identified significant tolerance variation in the European potato cultivar population to allow tolerance breeding and marker identification. Tolerance results from pot trials correlated with those from field trials but predicted field performance worse than field growth parameters. Drought tolerance correlated negatively with yield under optimal conditions in the field. The distribution of yield data versus DRYM indicated that tolerance can be combined with average yield potentials, thus circumventing potential yield penalties in tolerance breeding.
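
The abstract names the DRYM index but does not spell out its formula. One plausible reading, stated here as an assumption rather than as the authors' definition, is that each cultivar's relative starch yield (yield under reduced water supply divided by yield under optimal supply) is compared with the median relative yield of the same experiment. A minimal Python sketch under that assumption, with hypothetical cultivar names and yields:

```python
from statistics import median


def drym(stress_yield, control_yield):
    """Deviation of relative yield from the experimental median.

    Assumption: relative yield = stress yield / control yield, per cultivar.
    """
    relative = {
        cultivar: stress_yield[cultivar] / control_yield[cultivar]
        for cultivar in stress_yield
    }
    experiment_median = median(relative.values())
    return {
        cultivar: value - experiment_median
        for cultivar, value in relative.items()
    }


# Hypothetical starch yields from one trial, for illustration only.
stress = {"A": 50.0, "B": 80.0, "C": 30.0}
control = {"A": 100.0, "B": 100.0, "C": 100.0}
print(drym(stress, control))
```

Under this reading a positive value marks a cultivar that keeps more of its yield under drought than the typical cultivar in the same trial, regardless of its absolute yield potential.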

15.
Nat Plants ; 1(4): 15025, 2015 Mar 23.
Article in English | MEDLINE | ID: mdl-27247031

ABSTRACT

The concept that proteins and small RNAs can move to and function in distant body parts is well established. However, non-cell-autonomy of small RNA molecules raises the question: To what extent are protein-coding messenger RNAs (mRNAs) exchanged between tissues in plants? Here we report the comprehensive identification of 2,006 genes producing mobile RNAs in Arabidopsis thaliana. The analysis of variant ecotype transcripts that were present in heterografted plants allowed the identification of mRNAs moving between various organs under normal or nutrient-limiting conditions. Most of these mobile transcripts seem to follow the phloem-dependent allocation pathway transporting sugars from photosynthetic tissues to roots via the vasculature. Notably, a high number of transcripts also move in the opposite, root-to-shoot direction and are transported to specific tissues including flowers. Proteomic data on grafted plants indicate the presence of proteins from mobile RNAs, allowing the possibility that they may be translated at their destination site. The mobility of a high number of mRNAs suggests that a postulated tissue-specific gene expression profile might not be predictive for the actual plant body part in which a transcript exerts its function.


Subjects
Arabidopsis/genetics , Messenger RNA/genetics , Arabidopsis/growth & development , Ecotype , Flowers/genetics , Plant Gene Expression Regulation , Plant Roots/genetics , Plant Shoots/genetics , Plant RNA/genetics , Plant RNA/metabolism
16.
Front Plant Sci ; 3: 272, 2012.
Article in English | MEDLINE | ID: mdl-23233858

ABSTRACT

The specific recognition of miRNAs by Argonaute (AGO) proteins, the effector proteins of the RNA-induced silencing complex, constitutes the final step of the biogenesis of miRNAs and is crucial for their target interaction. In the genome of Arabidopsis thaliana (Ath), 10 different AGO proteins are encoded and the sorting decision, which miRNA associates with which AGO protein, was reported to depend exclusively on the identity of the 5'-sequence position of mature miRNAs. Hence, with only four different bases possible, a 5'-position-only sorting signal would not suffice to specifically target all 10 different AGOs individually or would suggest redundant AGO action. Alternatively, other and as of yet unidentified sorting signals may exist. We analyzed a dataset comprising 117 Ath-miRNAs with clear sorting preference to either AGO1, AGO2, or AGO5 as identified in co-immunoprecipitation experiments combined with sequencing. While mutual information analysis did not identify any other single position but the 5'-nucleotide to be informative for the sorting at sufficient statistical significance, significantly better than random classification results using Random Forests nonetheless suggest that additional positions and combinations thereof also carry information with regard to the AGO sorting. Positions 2, 6, 9, and 13 appear to be of particular importance. Furthermore, uracil bases at defined positions appear to be important for the sorting to AGO2 and AGO5, in particular. No predictive value was associated with miRNA length or base pair binding pattern in the miRNA:miRNA* duplex. From inspecting available AGO gene expression data in Arabidopsis, we conclude that the temporal and spatial expression profile may also contribute to the fine-tuning of miRNA sorting and function.
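
The per-position mutual information analysis mentioned in this abstract can be illustrated with the textbook estimator I(X;Y) = sum over (x, y) of p(x,y) log2[p(x,y) / (p(x) p(y))], where X is the base at one sequence position and Y is the AGO class. The Python sketch below uses invented toy sequences and labels; it is not the authors' pipeline and includes no significance testing.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two label lists."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Hypothetical miRNA sequences and AGO assignments, for illustration only.
sequences = ["UAGC", "UCGA", "AAGC", "ACGA"]
ago_labels = ["AGO1", "AGO1", "AGO2", "AGO2"]

for position in range(4):
    bases = [seq[position] for seq in sequences]
    print(position + 1, round(mutual_information(bases, ago_labels), 3))
```

In this toy data only the first position separates the two classes, so it carries one bit of information while the other positions carry none.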

17.
Methods Mol Biol ; 918: 127-50, 2012.
Article in English | MEDLINE | ID: mdl-22893290

ABSTRACT

Molecular biomarkers are molecules whose concentrations in a biological system inform about the current phenotypical state and, more importantly, may also be predictive of future phenotypic trait endpoints. The identification of biomarkers has gained much attention in targeted plant breeding since technologies have become available that measure many molecules across different levels of molecular organization and at decreasing costs. In this chapter, we outline the general strategy and workflow of conducting biomarker discovery studies. Critical aspects of study design as well as the statistical data analysis and model building will be highlighted.


Subjects
Breeding/methods , Plants/genetics , Plants/metabolism , Biomarkers/metabolism , Statistical Data Interpretation
18.
Trends Plant Sci ; 17(11): 666-74, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22784824

ABSTRACT

Directed plant cell growth is governed by deposition and alterations of cell wall components under turgor pressure. A key regulatory element of anisotropic growth, and hence cell shape, is the directional deposition of cellulose microfibrils. The microfibrils are synthesized by plasma membrane-located cellulose synthase complexes that co-align with and move along cortical microtubules. That the parallel relation between cortical microtubules and extracellular microfibrils is causal has been named the alignment hypothesis. Three recent studies revealed that the previously identified pom2 mutant codes for a large cellulose synthases interacting (CSI1) protein which also binds cortical microtubules. This review summarizes these findings, provides structure-function models and discusses the inferred mechanisms in the context of plant growth.


Subjects
Cell Wall/metabolism , Glucosyltransferases/metabolism , Microtubules/metabolism , Cell Membrane/enzymology , Cell Shape , Cellulose/metabolism , Glucosyltransferases/chemistry , Microtubules/ultrastructure , Molecular Models , Plant Cells/physiology , Plant Development , Plant Physiological Phenomena , Plant Proteins/chemistry , Plant Proteins/metabolism , Plant Roots/growth & development , Plant Roots/metabolism , Plant Roots/physiology , Plants/metabolism
19.
Funct Plant Biol ; 39(11): 948-957, 2012 Nov.
Article in English | MEDLINE | ID: mdl-32480844

ABSTRACT

In plant breeding, plants have to be characterised precisely, consistently and rapidly by different people at several field sites within defined time spans. For a meaningful data evaluation and statistical analysis, standardised data storage is required. Data access must be provided on a long-term basis and be independent of organisational barriers without endangering data integrity or intellectual property rights. We discuss the associated technical challenges and demonstrate adequate solutions exemplified in a data management pipeline for a project to identify markers for drought tolerance in potato. This project involves 11 groups from academia and breeding companies, 11 sites and four analytical platforms. Our data warehouse concept combines central data storage in databases and a file server and integrates existing and specialised database solutions for particular data types with new, project-specific databases. The strict use of controlled vocabularies and the application of web-access technologies proved vital to the successful data exchange between diverse institutes and data management concepts and infrastructures. By presenting our data management system and making the software available, we aim to support related phenotyping projects.

20.
BMC Bioinformatics ; 12: 429, 2011 Nov 03.
Article in English | MEDLINE | ID: mdl-22051375

ABSTRACT

BACKGROUND: Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about the effects of heuristic assumptions and model simplifications used by the programs on the outcome of the analysis. RESULTS: We extract four different models of the thermodynamic folding space which underlie the programs RNAFOLD, RNASHAPES, and RNASUBOPT. Their differences lie within the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the shape probability shift as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences. CONCLUSIONS: We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similarly enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. On the side, we observe that the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape in a rather small number of predicted high probability shapes. This calls for an investigation of new strategies to extract high probability members from the (very large) level 2 shape space of an RNA sequence. We provide implementations of all four models, written in a declarative style that makes them easy to modify. Based on our study, future work on thermodynamic RNA folding may choose a model based on our empirical data. It can take our implementations as a starting point for further program development.
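
As background to the Boltzmann probabilities of structures and shapes discussed in this abstract: in the thermodynamic model, a structure with free energy E has weight exp(-E/RT), and the probability of an abstract shape is the summed weight of all structures with that shape divided by the partition function. The Python sketch below illustrates only this bookkeeping on an explicitly enumerated toy list. The shape strings and energies are invented, and the programs named in the abstract obtain these quantities by dynamic programming rather than by enumeration.

```python
from collections import defaultdict
from math import exp

# Gas constant in kcal/(mol*K) times 37 degrees Celsius in kelvin.
RT = 0.0019872 * 310.15


def shape_probabilities(structures):
    """Map each shape to its Boltzmann probability.

    structures: iterable of (shape, free_energy_in_kcal_per_mol) pairs.
    """
    weights = defaultdict(float)
    for shape, energy in structures:
        weights[shape] += exp(-energy / RT)
    partition_function = sum(weights.values())
    return {shape: w / partition_function for shape, w in weights.items()}


# Hypothetical structures, for illustration only.
toy_structures = [("[]", -5.0), ("[]", -4.0), ("[][]", -4.5)]
for shape, probability in shape_probabilities(toy_structures).items():
    print(shape, round(probability, 3))
```

Because weights depend exponentially on energy, a small over- or underestimate of free energy can shift shape probabilities noticeably.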


Subjects
Algorithms , RNA Folding , RNA/chemistry , Base Sequence , Computational Biology , Probability , RNA Sequence Analysis , Thermodynamics