ABSTRACT
BACKGROUND: During the last decade, a great number of extremely valuable large-scale genomics and proteomics datasets have become available to the research community. In addition, falling costs for high-throughput sequencing experiments and the option to outsource them contribute considerably to an increasing number of researchers becoming active in this field. Even though various computational approaches have been developed to analyze these data, the analysis is still a laborious task that involves careful integration of many heterogeneous and frequently updated data sources, which creates a barrier for interested scientists who want to carry out their own analysis.
RESULTS: We have implemented Dintor, a data integration framework that provides a set of over 30 tools to assist researchers in the exploration of genomics and proteomics datasets. Each tool solves a particular task, and several tools can be combined into data processing pipelines. Dintor covers a wide range of frequently required functionalities, from gene identifier conversions and orthology mappings, through functional annotation of proteins and genetic variants, to candidate gene prioritization and Gene Ontology-based gene set enrichment analysis. Since the tools operate on constantly changing datasets, we provide a mechanism to unambiguously link tools with different versions of archived datasets, which guarantees reproducible results for future tool invocations. We demonstrate a selection of Dintor's capabilities by analyzing datasets from four representative publications. The open source software can be downloaded and installed on a local Unix machine. For reasons of data privacy it can be configured to retrieve local data only. In addition, the Dintor tools are available on our public Galaxy web service at http://dintor.eurac.edu.
CONCLUSIONS: Dintor is a computational annotation framework for the analysis of genomic and proteomic datasets, providing a rich set of tools that cover the most frequently encountered tasks. A major advantage is its capability to consistently handle multiple versions of tool-associated datasets, supporting the researcher in delivering reproducible results.
Subjects
Data Curation/methods , Genomics/methods , Proteomics/methods , Databases, Genetic , Software
ABSTRACT
Several lines of evidence emphasize the pathogenic roles of B cells in multiple sclerosis (MS). We performed transcriptome analyses on peripheral B cells from therapy-free patients and age/sex-matched controls. Down-regulation of two transcripts belonging to the same pathway, interferon regulatory factor 1 (IRF1) and C-X-C motif chemokine 10 (CXCL10), was validated by RT-PCR in 26 patients and 21 controls. The IRF1 and CXCL10 transcripts share potential seeding sequences for hsa-miR-424, which was found to be up-regulated in MS patients. We confirmed this interaction and its functional effect by transfection experiments. Consistent findings indicate down-regulation of the IRF1/CXCL10 axis, which may plausibly contribute to a pro-survival status of B cells in MS.
Subjects
B-Lymphocytes/metabolism , Gene Expression Profiling/methods , Interferon Regulatory Factor-1/biosynthesis , Multiple Sclerosis/metabolism , Signal Transduction/physiology , Transcriptome/physiology , Adult , Cell Line, Tumor , Female , Humans , Interferon Regulatory Factor-1/genetics , Male , Middle Aged , Multiple Sclerosis/diagnosis , Multiple Sclerosis/genetics
ABSTRACT
Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms first calculates the GO term semantic similarity (SS) between all the terms annotating the two proteins and then, in a second step described as a "mixing strategy", combines the SS values to yield the final FS value. Due to the variability of protein annotation caused, for example, by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies, we demonstrate a moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs and, in this context, discuss the impact of the choice of annotation corpus. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
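
The two-step scheme and the z-score normalization described in this abstract can be sketched roughly as follows. This is a minimal illustration and not Frela's implementation: the abstract does not say which mixing strategies are used, how the background distribution is sampled, or how the two per-protein z-scores are combined. The sketch assumes a best-match average as the mixing strategy, a background made of FS values between one protein and a sample of other proteins, and a simple mean of the two z-scores. All function names, term labels, and similarity values are hypothetical.

```python
import statistics

# Hypothetical pairwise semantic similarity (SS) values between toy terms.
TOY_SS = {
    frozenset(("T1", "T2")): 0.5,
    frozenset(("T1", "T3")): 0.2,
    frozenset(("T2", "T3")): 0.4,
}


def toy_ss(term_a, term_b):
    """Toy SS measure: 1.0 for identical terms, otherwise a table lookup."""
    if term_a == term_b:
        return 1.0
    return TOY_SS.get(frozenset((term_a, term_b)), 0.0)


def best_match_average(ss, terms_a, terms_b):
    """Mixing strategy (assumed): average of each term's best match
    in the other protein's annotation set."""
    row_best = [max(ss(a, b) for b in terms_b) for a in terms_a]
    col_best = [max(ss(a, b) for a in terms_a) for b in terms_b]
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


def z_score(value, background):
    """Standardize an FS value against one protein's background FS values."""
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        return 0.0  # degenerate background: no spread to normalize against
    return (value - mean) / sd


def similarity_z_score(fs_ab, background_a, background_b):
    """Combine the two per-protein z-scores by averaging (an assumption)."""
    return 0.5 * (z_score(fs_ab, background_a) + z_score(fs_ab, background_b))


if __name__ == "__main__":
    fs = best_match_average(toy_ss, ["T1", "T2"], ["T2", "T3"])
    # Background FS values would come from comparing each protein with
    # many other proteins; these numbers are made up for illustration.
    print(fs, similarity_z_score(fs, [0.2, 0.4, 0.6], [0.1, 0.3, 0.5]))
```

The point of the normalization is that the same raw FS value can be unremarkable for a heavily annotated protein and exceptional for a sparsely annotated one. Standardizing against each protein's own background puts the two cases on a comparable scale.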