ABSTRACT
Background: Plants differ in their ability to cope with external stresses such as drought. Genome duplications are an important mechanism enabling plant adaptation, and they leave characteristic footprints in the genome, such as protein family expansion. We explore genetic diversity and uncover evolutionary adaptation to stresses by combining genome comparisons between stress-tolerant and stress-sensitive species with RNA-Seq data sets from stress experiments. Expanded gene families that are stress-responsive according to differential expression analysis could hint at species- or clade-specific adaptation, making these gene families promising candidates for follow-up tolerance studies and crop improvement. Software: Integrating such cross-species omics data is challenging and requires several transformation and filtering steps; visualization is then crucial for quality control and interpretation. To address this, we developed A2TEA (Automated Assessment of Trait-specific Evolutionary Adaptations), a Snakemake workflow for detecting adaptation footprints in silico. It functions as a one-stop processing pipeline, integrating protein family, phylogeny, expression, and protein function analyses. The pipeline is accompanied by an R Shiny web application for exploring, highlighting, and exporting the results interactively, which allows the user to formulate hypotheses about the genomic adaptations of one or a subset of the investigated species to a given stress. Conclusions: Although our research focuses on crops, the pipeline is entirely independent of the underlying species and can be used with any set of species. We demonstrate pipeline efficiency on real-world datasets and discuss the implementation and limits of our analysis workflow as well as planned extensions.
The A2TEA workflow and web application are publicly available at: https://github.com/tgstoecker/A2TEA.Workflow and https://github.com/tgstoecker/A2TEA.WebApp, respectively.
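The core idea of the abstract above (gene families that are both expanded in a tolerant species and stress-responsive) can be illustrated with a minimal sketch. This is not A2TEA's implementation, which is a Snakemake workflow. The `Family` type, the function name, the copy-number ratio threshold, and all family, species, and gene identifiers below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Family:
    """A gene family (hypothetical structure): species name -> member gene IDs."""
    name: str
    members: Dict[str, List[str]]


def expanded_and_responsive(
    families: List[Family],
    tolerant: str,
    sensitive: str,
    de_genes: Set[str],
    min_ratio: float = 2.0,  # assumed threshold, not taken from the source
) -> List[str]:
    """Return names of families expanded in `tolerant` that contain a DE gene."""
    hits = []
    for fam in families:
        tol_genes = fam.members.get(tolerant, [])
        n_tol = len(tol_genes)
        n_sen = len(fam.members.get(sensitive, []))
        # Expanded: the tolerant species carries at least `min_ratio` times
        # as many copies as the sensitive one (absent families count as 1).
        expanded = n_tol >= min_ratio * max(n_sen, 1)
        # Responsive: at least one tolerant-species member is differentially expressed.
        responsive = any(gene in de_genes for gene in tol_genes)
        if expanded and responsive:
            hits.append(fam.name)
    return hits


# Toy input with made-up identifiers.
families = [
    Family("famA", {"tolerant_sp": ["t1", "t2", "t3", "t4"], "sensitive_sp": ["s1", "s2"]}),
    Family("famB", {"tolerant_sp": ["t5"], "sensitive_sp": ["s3"]}),
    Family("famC", {"tolerant_sp": ["t6", "t7", "t8"], "sensitive_sp": ["s4"]}),
]
de_genes = {"t2", "t5"}

candidates = expanded_and_responsive(families, "tolerant_sp", "sensitive_sp", de_genes)
print(candidates)
```

In this toy input only `famA` meets both criteria: `famB` is not expanded, and `famC` is expanded but has no differentially expressed member. A real analysis would use statistical tests of expansion and of differential expression rather than a fixed ratio and a set lookup.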
Subjects
Biological Evolution, Agricultural Crops, Phylogeny, Drought Resistance, Genomics
ABSTRACT
BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Subjects
Molecular Sequence Annotation/trends, Animals, Biofilms, Candida albicans/genetics, Drosophila melanogaster/genetics, Bacterial Genome, Fungal Genome, Humans, Locomotion, Long-Term Memory, Molecular Sequence Annotation/methods, Pseudomonas aeruginosa/genetics
ABSTRACT
To date, no satisfactory tool exists in tissue microarray (TMA) data analysis for analyzing the cooperative behavior of all measured markers in a multifactorial TMA approach. The tool developed here, TMAinspiration, not only offers an analysis option to close this gap but also provides an ecosystem of quality control concepts and supporting scripts that makes the approach a platform for informed practice and further research. The TMAinspiration method addresses the specific demands of TMA analysis by controlling errors and noise with a generalized regression scheme while avoiding the introduction of too many a priori constraints into the analysis of the data. To this end, we test partitions of a proximity table to find optimal support for a ranking scheme of molecular dependencies. The idea of combining several partitions into one ensemble, which balances the optimization process, rests on the main assumption that all these perspectives on the cellular network need to be self-consistent. Several application examples in breast cancer and one in squamous cell carcinoma demonstrate that this procedure confirms a priori knowledge of the expression characteristics of protein markers while also integrating many new results found in the data of a larger TMA experiment. The code and software are freely available at: http://complex-systems.uni-muenster.de/tma_inspiration.html.