RESUMO
The "Protein Abundances Across Organisms" database (PaxDb) is an integrative metaresource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDb focuses on computing best-estimate abundances for proteins in normal/healthy contexts and expresses abundance values for each protein in "parts per million" in relation to all other protein molecules in the cell. The uniform data reprocessing, quality scoring, and integrated orthology relations have made PaxDb one of the preferred tools for comparisons between individual datasets, tissues, or organisms. In describing the latest version 5.0 of PaxDb, we particularly emphasize the data integration from various types of raw data and how we expanded the number of organisms and tissue groups as well as the proteome coverage. The current collection of PaxDb includes 831 original datasets from 170 species, including 22 Archaea, 81 Bacteria, and 67 Eukaryota. Apart from detailing the data update, we also present a comparative analysis of the human proteome subset of PaxDb against the two most widely used human proteome data resources: Human Protein Atlas and Genotype-Tissue Expression. Lastly, through our protein abundance data, we reveal an evolutionary trend in the usage of sulfur-containing amino acids in the proteomes of Fungi.
RESUMO
Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.
Assuntos
Genômica/métodos , Mapeamento de Interação de Proteínas/métodos , Software , Animais , Bases de Dados Genéticas , Ontologia Genética , HumanosRESUMO
A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.
Assuntos
Biologia Computacional/métodos , Bases de Dados de Proteínas , Software , Modelos Moleculares , Ligação Proteica , Conformação Proteica , Mapeamento de Interação de Proteínas , Mapas de Interação de Proteínas , Proteínas/química , Proteínas/metabolismo , Relação Estrutura-Atividade , Interface Usuário-Computador , NavegadorRESUMO
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.
Assuntos
Bases de Dados de Proteínas , Mapeamento de Interação de Proteínas , Perfilação da Expressão Gênica , Internet , Proteínas/classificação , Proteínas/genética , Proteínas/metabolismo , SoftwareRESUMO
Protein quantification at proteome-wide scale is an important aim, enabling insights into fundamental cellular biology and serving to constrain experiments and theoretical models. While proteome-wide quantification is not yet fully routine, many datasets approaching proteome-wide coverage are becoming available through biophysical and MS techniques. Data of this type can be accessed via a variety of sources, including publication supplements and online data repositories. However, access to the data is still fragmentary, and comparisons across experiments and organisms are not straightforward. Here, we describe recent updates to our database resource "PaxDb" (Protein Abundances Across Organisms). PaxDb focuses on protein abundance information at proteome-wide scope, irrespective of the underlying measurement technique. Quantification data is reprocessed, unified, and quality-scored, and then integrated to build a meta-resource. PaxDb also allows evolutionary comparisons through precomputed gene orthology relations. Recently, we have expanded the scope of the database to include cell-line samples, and more systematically scan the literature for suitable datasets. We report that a significant fraction of published experiments cannot readily be accessed and/or parsed for quantitative information, requiring additional steps and efforts. The current update brings PaxDb to 414 datasets in 53 organisms, with (semi-) quantitative abundance information covering more than 300,000 proteins.
Assuntos
Bases de Dados de Proteínas , Proteoma/análise , Proteômica/métodos , Software , Animais , HumanosRESUMO
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made-particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
Assuntos
Bases de Dados de Proteínas , Mapeamento de Interação de Proteínas , Algoritmos , Interpretação Estatística de Dados , Mineração de Dados , Internet , Integração de Sistemas , Interface Usuário-ComputadorRESUMO
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.
Assuntos
Bases de Dados de Proteínas , Mapeamento de Interação de Proteínas/métodos , Integração de Sistemas , Interface Usuário-ComputadorRESUMO
Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein-protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein-protein interactions currently available. STRING can be reached at http://string-db.org/.