ABSTRACT
The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.
Subject(s)
Databases, Protein , Protein Interaction Mapping , Proteins/metabolism , Periodicals as Topic , Protein Binding , Proteins/chemistry , Quality Control
ABSTRACT
IntAct is an open-source, open-data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As of September 2011, IntAct contains approximately 275,000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.
Subject(s)
Databases, Protein , Protein Interaction Mapping , Computer Graphics , Genes , Internet , Molecular Sequence Annotation , Sequence Analysis, Protein , Software
ABSTRACT
Thalamoreticular circuitry plays a key role in arousal, attention, cognition, and sleep spindles, and is linked to several brain disorders. A detailed computational model of mouse somatosensory thalamus and thalamic reticular nucleus has been developed to capture the properties of over 14,000 neurons connected by 6 million synapses. The model recreates the biological connectivity of these neurons, and simulations of the model reproduce multiple experimental findings in different brain states. The model shows that inhibitory rebound produces frequency-selective enhancement of thalamic responses during wakefulness. We find that thalamic interactions are responsible for the characteristic waxing and waning of spindle oscillations. In addition, we find that changes in thalamic excitability control spindle frequency and incidence. The model is made openly available to provide a new tool for studying the function and dysfunction of the thalamoreticular circuitry in various brain states.
Subject(s)
Thalamus , Wakefulness , Mice , Animals , Thalamus/physiology , Sleep/physiology , Thalamic Nuclei/physiology , Perception , Cerebral Cortex/physiology
ABSTRACT
The use of computational modeling to describe and analyze biological systems is at the heart of systems biology. Model structures, simulation descriptions and numerical results can be encoded in structured formats, but there is an increasing need to provide an additional semantic layer. Semantic information adds meaning to components of structured descriptions to help identify and interpret them unambiguously. Ontologies are one of the tools frequently used for this purpose. We describe here three ontologies created specifically to address the needs of the systems biology community. The Systems Biology Ontology (SBO) provides semantic information about the model components. The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of systems biology models, their characterization and interrelationships. The Terminology for the Description of Dynamics (TEDDY) categorizes dynamical features of the simulation results and general systems behavior. The provision of semantic information extends a model's longevity and facilitates its reuse. It provides useful insight into the biology of modeled processes, and may be used to make informed decisions on subsequent simulation experiments.
Subject(s)
Computational Biology , Semantics , Systems Biology , Vocabulary, Controlled , Algorithms , Computer Simulation , Information Storage and Retrieval , Models, Biological
ABSTRACT
Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome-scale applications. This situation has triggered several initiatives involving large-scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of the Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data.
Further information and documentation are available on the PSI-PAR web site.
Subject(s)
Databases, Protein/standards , Proteome/analysis , Database Management Systems/standards , Humans , International Cooperation , Proteomics/methods , Terminology as Topic
ABSTRACT
SARS-CoV-2 started spreading toward the end of 2019, causing COVID-19, a disease that reached pandemic proportions among the human population within months. The reasons for the spectrum of differences in the severity of the disease across the population, and in particular why the disease more severely affects the aging population and those with specific preconditions, are unclear. We developed machine learning models to mine 240,000 scientific articles openly accessible in the CORD-19 database, and constructed knowledge graphs to synthesize the extracted information and navigate the collective knowledge in an attempt to search for a potential common underlying reason for disease severity. The machine-driven framework we developed repeatedly pointed to elevated blood glucose as a key facilitator in the progression of COVID-19. Indeed, when we systematically retraced the steps of the SARS-CoV-2 infection, we found evidence linking elevated glucose to each major step of the life-cycle of the virus, progression of the disease, and presentation of symptoms. Specifically, elevations of glucose provide ideal conditions for the virus to evade and weaken the first level of the immune defense system in the lungs, gain access to deep alveolar cells, bind to the ACE2 receptor and enter the pulmonary cells, accelerate replication of the virus within cells increasing cell death and inducing a pulmonary inflammatory response, which overwhelms an already weakened innate immune system to trigger an avalanche of systemic infections, inflammation and cell damage, a cytokine storm and thrombotic events. We tested the feasibility of the hypothesis by manually reviewing the literature referenced by the machine-generated synthesis, reconstructing atomistically the virus at the surface of the pulmonary airways, and performing quantitative computational modeling of the effects of glucose levels on the infection process.
We conclude that elevation in glucose levels can facilitate the progression of the disease through multiple mechanisms and can explain much of the differences in disease severity seen across the population. The study provides diagnostic considerations, new areas of research and potential treatments, and cautions on treatment strategies and critical care conditions that induce elevations in blood glucose levels.
Subject(s)
COVID-19 , Aged , Blood Glucose , Cytokine Release Syndrome , Humans , Inflammation , SARS-CoV-2
ABSTRACT
A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.
Subject(s)
Databases, Protein/standards , Guidelines as Topic , Information Storage and Retrieval/standards , Protein Interaction Mapping/standards , Proteomics/standards , Research/standards , Humans , Internationality
ABSTRACT
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the suitable level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard-compliant document. The framework presented here tackles the issue of checking that experimental data reported using a specific format, CVs and public bio-ontologies (e.g. Gene Ontology, NCBI taxonomy) are compliant with the Minimum Information About a Proteomics Experiment recommendations. The semantic validator not only checks the XML syntax but also enforces rules regarding the use of an ontology class or CV terms by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to a common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator.
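The two checks described above (a term must exist in the vocabulary, and it must appear in a permitted location) can be illustrated with a minimal Python sketch. This is not the PSI validator itself: the element names, the `ac` attribute, the two-entry vocabulary and the rule table are all assumptions made for illustration, and a real mapping file is far richer.

```python
import xml.etree.ElementTree as ET

# Toy controlled vocabulary (assumption): accession -> term name.
CV_TERMS = {
    "MI:0018": "two hybrid",
    "MI:0326": "protein",
}

# Toy mapping rules (assumption): element tag -> accessions allowed there.
MAPPING_RULES = {
    "detectionMethod": {"MI:0018"},
    "interactorType": {"MI:0326"},
}


def validate(xml_text):
    """Return a list of messages describing CV usage problems."""
    messages = []
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    for element in root.iter():
        allowed = MAPPING_RULES.get(element.tag)
        if allowed is None:
            continue  # no rule is defined for this element
        accession = element.get("ac")
        if accession not in CV_TERMS:
            # Check 1: the term must exist in the vocabulary.
            messages.append(f"{element.tag}: unknown term {accession!r}")
        elif accession not in allowed:
            # Check 2: the term must be permitted at this location.
            messages.append(f"{element.tag}: term {accession!r} not allowed here")
    return messages


# Hypothetical document: one valid use, one misplaced term, one unknown term.
SAMPLE = """
<entry>
  <detectionMethod ac="MI:0018"/>
  <interactorType ac="MI:0018"/>
  <detectionMethod ac="MI:9999"/>
</entry>
"""

if __name__ == "__main__":
    for message in validate(SAMPLE):
        print(message)
```

Keeping the vocabulary and the rule table as data rather than code mirrors the abstract's point that the same engine can be adapted to another standard by swapping its parameters.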
Subject(s)
Computational Biology/methods , Proteomics/standards , Humans , Reproducibility of Results
ABSTRACT
MOTIVATION: The IntAct repository is one of the largest and most widely used databases for the curation and storage of molecular interaction data. These datasets need to be analyzed by computational methods. Software packages in the statistical environment R provide powerful tools for conducting such analyses. RESULTS: We introduce Rintact, a Bioconductor package that allows users to transform PSI-MI XML2.5 interaction data files from IntAct into R graph objects. On these, they can use methods from R and Bioconductor for a variety of tasks: determining cohesive subgraphs, computing summary statistics, fitting mathematical models to the data or rendering graphical layouts. Rintact provides a programmatic interface to the IntAct repository and allows the use of the analytic methods provided by R and Bioconductor. AVAILABILITY: Rintact is freely available at http://bioconductor.org.
Subject(s)
Databases, Protein , Information Storage and Retrieval/methods , Programming Languages , Protein Interaction Mapping/methods , Software , User-Computer Interface , Computer Graphics , Database Management Systems
ABSTRACT
MOTIVATION: Protein-protein interaction networks provide insights into the relationships between the proteins of an organism thereby contributing to a better understanding of cellular processes. Nevertheless, large-scale interaction networks are available for only a few model organisms. Thus, interologs are useful for a systematic transfer of protein interaction networks between organisms. However, no standard tool is available so far for that purpose. RESULTS: In this study, we present an automated prediction tool developed for all sequenced genomes available in Integr8. We also have developed a second method to predict protein-protein interactions in the widely used cyanobacterium Synechocystis. Using these methods, we have constructed a new network of 8783 inferred interactions for Synechocystis. AVAILABILITY: InteroPORC is open-source, downloadable and usable through a web interface at http://biodev.extra.cea.fr/interoporc/.
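The general interolog idea behind such a transfer is that if two proteins interact in a source organism and each has an ortholog in the target organism, an interaction between the orthologs is inferred. The Python sketch below shows that idea only; it is not the InteroPORC pipeline, and the function name, the input shapes and the protein names are hypothetical.

```python
def infer_interologs(source_interactions, orthologs):
    """Transfer interactions from a source organism to a target organism.

    source_interactions: iterable of (protein_a, protein_b) pairs.
    orthologs: dict mapping a source protein to a list of target orthologs.
    Returns a set of inferred target pairs, each stored in sorted order.
    """
    inferred = set()
    for a, b in source_interactions:
        for target_a in orthologs.get(a, []):
            for target_b in orthologs.get(b, []):
                # Sort so that (x, y) and (y, x) count as the same pair.
                inferred.add(tuple(sorted((target_a, target_b))))
    return inferred


if __name__ == "__main__":
    # Hypothetical toy data: srcC has no ortholog, so srcB-srcC is not transferred.
    interactions = [("srcA", "srcB"), ("srcB", "srcC")]
    ortholog_map = {"srcA": ["tgt1"], "srcB": ["tgt2", "tgt3"]}
    for pair in sorted(infer_interologs(interactions, ortholog_map)):
        print(pair)
```

A source protein with several orthologs yields several inferred pairs, which is one reason inferred networks usually need filtering or scoring before use.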
Subject(s)
Computational Biology/methods , Cyanobacteria/genetics , Protein Interaction Mapping , Proteins/chemistry , Synechocystis/metabolism , Algorithms , Automation , Computer Simulation , Cyanobacteria/metabolism , Genome, Bacterial , Internet , Models, Biological , Protein Structure, Tertiary , Signal Processing, Computer-Assisted , Software , Synechocystis/genetics , Systems Biology
ABSTRACT
BACKGROUND: Molecular interaction information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to these data very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions. RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format, MITAB2.5, has been developed for the benefit of users who require only minimal information in an easy-to-access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
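To show why a tab-delimited layout is easy to consume, here is a minimal parsing sketch in Python rather than Perl. It assumes the 15-column MITAB2.5 layout, with `|` separating multiple values and `-` marking an empty field. The short column names are labels chosen for this sketch, and the sample row uses placeholder identifiers, not real records.

```python
import csv
import io

# Short labels (chosen for this sketch) for the 15 MITAB2.5 columns, in order.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence",
]


def parse_mitab25(handle):
    """Yield one dict per interaction row; every field becomes a list."""
    # QUOTE_NONE keeps double quotes such as psi-mi:"MI:0018" as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        record = {}
        for name, value in zip(MITAB25_COLUMNS, row):
            # Naive split: assumes no "|" occurs inside a quoted value.
            record[name] = [] if value == "-" else value.split("|")
        yield record


# Placeholder row for illustration only (identifiers are made up).
SAMPLE = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe J (2000)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "-",
]) + "\n"

if __name__ == "__main__":
    for record in parse_mitab25(io.StringIO(SAMPLE)):
        print(record["id_a"], record["id_b"], record["detection_methods"])
```

Returning every field as a list keeps single-valued and multi-valued columns uniform, at the cost of an extra index when only one value is expected.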
Subject(s)
Databases, Protein/standards , Natural Language Processing , Protein Interaction Mapping/methods , Proteomics/methods , Computational Biology , Computer Graphics , Database Management Systems , Proteomics/standards , User-Computer Interface
ABSTRACT
The ever-increasing generation of, and corresponding interest in, molecular interaction data has led to the establishment of a number of high-quality molecular interaction databases which manually curate interaction data extracted from the literature. In order to effectively share the curation load, and ensure that data are stored in and accessible from multiple sources, these databases have united to form the IMEx consortium. All of the IMEx databases also accept direct deposition of interaction data from authors prior to publication, thus both assisting the scientist in preparing the dataset for publication and ensuring that its subsequent representation in the public domain databases is fully accurate. This article walks the potential submitter through the various routes by which data may be deposited with the databases and describes the tools which have been developed to assist in this process.
Subject(s)
Databases, Genetic/standards , Protein Interaction Mapping/methods , Proteomics/standards , User-Computer Interface , Access to Information , Internet , Publishing/standards , Software , Vocabulary, Controlled
ABSTRACT
BACKGROUND: Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. RESULTS: We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. CONCLUSION: We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. 
The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.
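The core idea of mapping identifiers through 100% sequence identity can be sketched with a small in-memory index keyed by a checksum of the sequence. The Python below is a toy illustration, not the PICR service or its SOAP/REST API: the function names, the SHA-256 key and the database labels are assumptions, and UniParc's actual storage scheme is not shown.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence):
    """Return a checksum for a sequence, ignoring whitespace and letter case."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_index(entries):
    """Group (database, identifier) pairs that share an identical sequence."""
    index = defaultdict(set)
    for database, identifier, sequence in entries:
        index[sequence_key(sequence)].add((database, identifier))
    return index


def cross_references(index, sequence, databases=None):
    """List identifiers whose sequence is identical to the query.

    databases: optional set of database names used to limit the result.
    """
    hits = index.get(sequence_key(sequence), set())
    if databases is not None:
        hits = {hit for hit in hits if hit[0] in databases}
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical entries: the first two share one sequence, the third differs.
    index = build_index([
        ("DB_ONE", "ID1", "MKTAYIAK"),
        ("DB_TWO", "X9", "mktayiak"),
        ("DB_TWO", "X10", "MKTAYIAR"),
    ])
    print(cross_references(index, "MKT AYIAK"))
    print(cross_references(index, "MKTAYIAK", databases={"DB_TWO"}))
```

Keying on the sequence rather than on any one database's accession is what lets unstable or retired identifiers still resolve to their equivalents elsewhere.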
Subject(s)
Database Management Systems , Databases, Protein , Internet , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein/methods , User-Computer Interface , Algorithms , Amino Acid Sequence , Information Storage and Retrieval/methods , Molecular Sequence Data , Sequence Alignment/methods
ABSTRACT
IntAct provides an open source database and toolkit for the storage, presentation and analysis of protein interactions. The web interface provides both textual and graphical representations of protein interactions, and allows exploring interaction networks in the context of the GO annotations of the interacting proteins. A web service allows direct computational access to retrieve interaction networks in XML format. IntAct currently contains approximately 2200 binary and complex interactions imported from the literature and curated in collaboration with the Swiss-Prot team, making intensive use of controlled vocabularies to ensure data consistency. All IntAct software, data and controlled vocabularies are available at http://www.ebi.ac.uk/intact.
Subject(s)
Databases, Protein , Protein Binding , Proteins/metabolism , Animals , Computational Biology , Humans , Information Storage and Retrieval , Internet , Software , User-Computer Interface , Vocabulary, Controlled
ABSTRACT
The complex biological processes that control cellular function are mediated by intricate networks of molecular interactions. Accumulating evidence indicates that these interactions are often interdependent, thus acting cooperatively. Cooperative interactions are prevalent in and indispensable for reliable and robust control of cell regulation, as they underlie the conditional decision-making capability of large regulatory complexes. Despite an increased focus on experimental elucidation of the molecular details of cooperative binding events, as evidenced by their growing occurrence in the literature, they are currently lacking from the main bioinformatics resources. One of the contributing factors to this deficiency is the lack of a computer-readable standard representation and exchange format for cooperative interaction data. To tackle this shortcoming, we added functionality to the widely used PSI-MI interchange format for molecular interaction data by defining new controlled vocabulary terms that allow annotation of different aspects of cooperativity without making structural changes to the underlying XML schema. As a result, we are able to capture cooperative interaction data in a structured format that is backward compatible with PSI-MI-based data and applications. This will facilitate the storage, exchange and analysis of cooperative interaction data, which in turn will advance experimental research on this fundamental principle in biology.
Subject(s)
Databases, Protein , Protein Interaction Mapping , Proteomics , Allosteric Regulation , Cell Cycle Proteins/chemistry , Cyclin A/chemistry , Cyclin-Dependent Kinase 2/chemistry , Humans , Models, Molecular , Molecular Sequence Annotation , Phosphorylation , Protein Binding
ABSTRACT
Molecular interactions are crucial components of the cellular process. In order to understand this complex machinery, one needs to gather published data from various sources. Many projects have initiated the collection of interaction data for this purpose since 2002. However, the lack of standardisation previously made the task of aggregating datasets difficult. This issue has been resolved by the creation of the Molecular Interaction standard in 2004 by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO). Furthermore, major database providers have come together with the goal of exchanging data in order to optimise laborious curation tasks. Finally, tools and frameworks have been created based on PSI-MI standards to facilitate the visualisation and analysis of molecular interaction data.
Subject(s)
Databases, Protein/standards , Protein Interaction Mapping/standards , Proteome/standards , Proteomics/standards , Database Management Systems , Humans , Protein Interaction Mapping/methods , Proteomics/methods
ABSTRACT
BACKGROUND: In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publications and to assist in the initial information extraction could help to improve the efficiency of the curation process and, as a consequence, the database coverage of data available in the literature. The 2006 BioCreative competition was aimed at evaluating text-mining procedures in comparison with manual annotation of protein-protein interactions. RESULTS: To aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from both databases are comparable because they were curated according to the same standards. During the manual curation process, the major cause of data loss in mining the articles for information was ambiguity in the mapping of the gene names to stable UniProtKB database identifiers. It was also observed that most of the information about interactions was contained only within the full-text of the publication; hence, text mining of protein-protein interaction data will require the analysis of the full-text of the articles and cannot be restricted to the abstract. CONCLUSION: The development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, databases will highlight those sentences within the articles that describe the interactions. These will supply data-miners with a high quality dataset for algorithm development. 
Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which is used by both databases to annotate their data content.
Subject(s)
Computational Biology/methods , Computational Biology/standards , Databases, Bibliographic , Societies, Scientific , Computational Biology/instrumentation , Protein Interaction Mapping , Proteomics/standards , Vocabulary, Controlled
ABSTRACT
Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.