ABSTRACT
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (e.g. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata, which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.
Subjects
Data Curation/methods; Databases, Protein; Multiprotein Complexes/chemistry; Coronavirus/chemistry; Data Visualization; Databases, Chemical; Enzymes/chemistry; Enzymes/metabolism; Escherichia coli/chemistry; Humans; International Cooperation; Molecular Sequence Annotation; Multiprotein Complexes/metabolism; User-Computer Interface
ABSTRACT
The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
Subjects
Databases, Protein; Protein Interaction Maps/genetics; Software; Humans; Protein Interaction Mapping/methods
ABSTRACT
The EMBL-EBI Complex Portal is a knowledgebase of macromolecular complexes providing persistent stable identifiers. Entries are linked to literature evidence and provide details of complex membership, function, structure and complex-specific Gene Ontology annotations. Data are freely available and downloadable in HUPO-PSI community standards and missing entries can be requested for curation. In collaboration with Saccharomyces Genome Database and UniProt, the yeast complexome, a compendium of all known heteromeric assemblies from the model organism Saccharomyces cerevisiae, was curated. This expansion of knowledge and scope has led to a 50% increase in curated complexes compared to the previously published dataset, CYC2008. The yeast complexome is used as a reference resource for the analysis of complexes from large-scale experiments. Our analysis showed that genes coding for proteins in complexes tend to have more genetic interactions, are co-expressed with more genes, are more multifunctional, localize more often in the nucleus, and are more often involved in nucleic acid-related metabolic processes and processes where large machineries are the predominant functional drivers. A comparison to genetic interactions showed that about 40% of expanded co-complex pairs also have genetic interactions, suggesting strong functional links between complex members.
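The comparison of co-complex pairs with genetic interactions can be illustrated with a minimal sketch. It assumes that "expanded" means enumerating every unordered pair of members within each complex (the abstract does not spell out the expansion model), and all complex and gene names below are made up for illustration.

```python
from itertools import combinations


def expand_co_complex_pairs(complexes):
    """Return the set of unordered member pairs found within any complex."""
    pairs = set()
    for members in complexes.values():
        # Sorting makes ("A", "B") and ("B", "A") collapse to one pair.
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def fraction_with_genetic_interaction(pairs, genetic_interactions):
    """Fraction of co-complex pairs that also appear as genetic interactions."""
    gi = {tuple(sorted(p)) for p in genetic_interactions}
    if not pairs:
        return 0.0
    return len(pairs & gi) / len(pairs)


# Hypothetical toy data, not taken from the Complex Portal.
complexes = {"complex_1": ["A", "B", "C"], "complex_2": ["C", "D"]}
genetic = [("B", "A"), ("D", "C"), ("A", "D")]

pairs = expand_co_complex_pairs(complexes)
fraction = fraction_with_genetic_interaction(pairs, genetic)
print(sorted(pairs), fraction)
```

In this toy case two of the four expanded pairs are also genetic interactions, so the overlap fraction is 0.5.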
Subjects
Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/metabolism; Datasets as Topic; Gene Ontology; Knowledge Bases; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
SUMMARY: IntAct App is a Cytoscape 3 application that grants in-depth access to IntAct's molecular interaction data. It builds networks where nodes are interacting molecules (mainly proteins, but also genes, RNA, chemicals) and edges represent evidence of interaction. Users can query a network by providing its molecules, identified by different fields, and can optionally include all their interacting partners in the resulting network. The app offers three visualizations: one only displaying interactions, another representing every evidence and the last one emphasizing evidence where mutated versions of proteins were used. Users can also filter networks and click on nodes and edges to access all their related details. Finally, the application supports automation of its main features via Cytoscape commands. AVAILABILITY AND IMPLEMENTATION: Implementation available at https://apps.cytoscape.org/apps/intactapp, while the source code is available at https://github.com/EBI-IntAct/IntactApp.
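The three visualizations described above can be thought of as three projections of the same list of interaction evidence. The following is only a sketch of that idea in Python; the `Evidence` class, its field names and the sample records are hypothetical and do not reflect the app's actual Java data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One piece of interaction evidence (hypothetical structure)."""
    molecule_a: str
    molecule_b: str
    detection_method: str
    uses_mutant: bool = False


def pair_key(ev):
    """Order-independent key for the two interacting molecules."""
    return tuple(sorted((ev.molecule_a, ev.molecule_b)))


def summary_view(evidences):
    """One edge per interacting pair, with the number of supporting evidences."""
    return Counter(pair_key(ev) for ev in evidences)


def evidence_view(evidences):
    """One edge per evidence, labelled with its detection method."""
    return [(*pair_key(ev), ev.detection_method) for ev in evidences]


def mutation_view(evidences):
    """One edge per evidence, flagged when a mutated molecule was used."""
    return [(*pair_key(ev), ev.detection_method, ev.uses_mutant) for ev in evidences]


# Made-up records for illustration only.
EVIDENCES = [
    Evidence("P1", "P2", "two hybrid"),
    Evidence("P2", "P1", "pull down", uses_mutant=True),
    Evidence("P2", "P3", "pull down"),
]

print(summary_view(EVIDENCES))
print([edge for edge in mutation_view(EVIDENCES) if edge[3]])
```

Keeping the evidence list as the single source and deriving each view from it means that filters applied to the evidence carry over to all three views.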
ABSTRACT
MOTIVATION: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called 'causal interaction' takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. RESULTS: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. AVAILABILITY AND IMPLEMENTATION: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Software; Causality; Humans
ABSTRACT
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database that collates and summarizes information on stable, macromolecular complexes of known function. It captures complex composition, topology and function and links out to a large range of domain-specific resources that hold more detailed data, such as PDB or Reactome. We have made several significant improvements since our last update, including improving compliance with the FAIR data principles by providing complex-specific, stable identifiers that include versioning. Protein complexes are now available from 20 species for download in standards-compliant formats such as PSI-XML, MI-JSON and ComplexTAB or can be accessed via an improved REST API. A component-based JS front-end framework has been implemented to drive a new website and this has allowed the use of APIs from linked services to import and visualize information such as the 3D structure of protein complexes, their role in reactions and pathways and the co-expression of complex components in the tissues of multi-cellular organisms. A first draft of the complete complexome of Saccharomyces cerevisiae is now available to browse and download.
Subjects
Databases, Protein; Multiprotein Complexes/chemistry; Animals; Computer Graphics; Humans; Macromolecular Substances/chemistry; Mice; Multiprotein Complexes/metabolism; Nucleic Acids/chemistry; Protein Conformation
ABSTRACT
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use large-scale spectrum clustering to shed light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
ABSTRACT
The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data before data submission or already publicly available in the Proteomics Identifications (PRIDE) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and direct access to private (password protected) and public experiments in PRIDE. The ProteomeXchange (PX) Consortium has been set up to enable a better integration of existing public proteomics repositories, maximizing their benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new Proteomics Standards Initiative (PSI) data standards such as mzIdentML and mzTab, and the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that would build upon the success of the original PRIDE Inspector and would enable users to visualize and validate PX "complete" submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML, and the most popular peak list formats) and peptide and protein identification results (mzIdentML, PRIDE XML, mzTab) to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/.
Subjects
Computational Biology/methods; Databases, Protein; Proteome/metabolism; Proteomics/methods; Software; Internet; Reproducibility of Results; Tandem Mass Spectrometry
ABSTRACT
The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) has been the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE-related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we give a brief update on the resources under development, 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.
Subjects
Databases, Protein; Mass Spectrometry; Proteomics; Peptides/chemistry; Proteins/chemistry; Proteins/metabolism; Software; User-Computer Interface
ABSTRACT
The PRIDE (PRoteomics IDEntifications) database is one of the world-leading public repositories of mass spectrometry (MS)-based proteomics data and it is a founding member of the ProteomeXchange Consortium of proteomics resources. In the original PRIDE database system, users could access data programmatically by accessing the web services provided by the PRIDE BioMart interface. New REST (REpresentational State Transfer) web services have been developed to serve the most popular functionality provided by BioMart (now discontinued due to data scalability issues) and address the data access requirements of the newly developed PRIDE Archive. Using the API (Application Programming Interface) it is now possible to programmatically query for and retrieve peptide and protein identifications, project and assay metadata and the originally submitted files. Searching and filtering is also possible by metadata information, such as sample details (e.g. species and tissues), instrumentation (mass spectrometer), keywords and other provided annotations. The PRIDE Archive web services were first made available in April 2014. The API has already been adopted by a few applications and standalone tools such as PeptideShaker, PRIDE Inspector, the Unipept web application and the Python-based BioServices package. This application is free and open to all users with no login requirement and can be accessed at http://www.ebi.ac.uk/pride/ws/archive/.
Subjects
Databases, Protein; Proteomics; Internet; Mass Spectrometry; Proteins/chemistry
ABSTRACT
The current catalogue of the human proteome is not yet complete, as experimental proteomics evidence is still elusive for a group of proteins known as the missing proteins. The Human Proteome Project (HPP) has been successfully using technology and bioinformatic resources to improve the characterization of such challenging proteins. In this manuscript, we propose a pipeline starting with the mining of the PRIDE database to select a group of data sets potentially enriched in missing proteins that are subsequently analyzed for protein identification with a method based on the statistical analysis of proteotypic peptides. Spermatozoa and the HEK293 cell line were found to be a promising source of missing proteins and clearly merit further attention in future studies. After the analysis of the selected samples, we found 342 PSMs, suggesting the presence of 97 missing proteins in human spermatozoa or the HEK293 cell line, while only 36 missing proteins were potentially detected in the retina, frontal cortex, aorta thoracica, or placenta. The functional analysis of the missing proteins detected confirmed their tissue specificity, and the validation of a selected set of peptides using targeted proteomics (SRM/MRM assays) further supports the utility of the proposed pipeline. As illustrative examples, DNAH3 and TEPP in spermatozoa, and UNCX and ATAD3C in HEK293 cells were some of the more robust and remarkable identifications in this study. We provide evidence of the value of carefully analyzing the ever-increasing MS/MS data available from PRIDE and other repositories as sources for the detection of missing proteins in specific biological matrices, as revealed for HEK293 cells.
Subjects
Computational Biology/methods; Databases, Protein; Proteome/analysis; Aorta/chemistry; Female; Frontal Lobe/chemistry; HEK293 Cells; Humans; Male; Placenta/chemistry; Pregnancy; Proteomics/methods; Retina/chemistry; Spermatozoa/chemistry; Tandem Mass Spectrometry
ABSTRACT
The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to peptide/protein identifications to quantitative results. The library contains readers for three of the most used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra data formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). Also, it can be used to read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library. AVAILABILITY AND IMPLEMENTATION: The software is freely available at https://github.com/PRIDE-Utilities/ms-data-core-api. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: juan@ebi.ac.uk.
Subjects
Algorithms; Computational Biology/methods; Databases, Protein; Mass Spectrometry/methods; Proteins/analysis; Proteomics/methods; Software; Humans; Peptide Fragments/analysis; Workflow
ABSTRACT
The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, at present, there is no format to describe the final qualitative and quantitative results for proteomics and metabolomics experiments in a simple tabular format. Many downstream analysis use cases are only concerned with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R. We developed the mzTab file format for MS-based proteomics and metabolomics results to meet this need. mzTab is intended as a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML), providing a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files can contain protein, peptide, and small molecule identifications together with experimental metadata and basic quantitative information. The format is not intended to store the complete experimental evidence but provides mechanisms to report results at different levels of detail. These range from a simple summary of the final results to a representation of the results including the experimental design. This format is ideally suited to make MS-based proteomics and metabolomics results available to a wider biological community outside the field of MS. Several software tools for proteomics and metabolomics have already adopted the format as an output format. The comprehensive mzTab specification document and extensive additional documentation can be found online.
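Because mzTab is a plain tab-separated format in which each line starts with a short line-type prefix (for example MTD for metadata, PRH for the protein header and PRT for protein rows), it can be read with very little code. The sketch below is a simplified reader written for illustration, not a validating parser, and the sample content is a toy example rather than a real result file.

```python
from collections import defaultdict

# Header prefix -> prefix of the data rows that the header describes.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab text into a metadata dict and per-section lists of row dicts."""
    metadata = {}
    headers = {}
    tables = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sections
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "COM":
            continue  # comment line
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(tables)


# Toy content for illustration only; accessions and scores are made up.
SAMPLE = "\n".join([
    "COM\tToy example, not real data",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tToy experiment",
    "",
    "PRH\taccession\tdescription\tbest_search_engine_score[1]",
    "PRT\tP12345\tExample protein\t0.99",
    "PRT\tQ67890\tAnother protein\t0.87",
])

metadata, tables = parse_mztab(SAMPLE)
print(metadata["mzTab-version"], [row["accession"] for row in tables["PRT"]])
```

The same row dictionaries can be handed to a spreadsheet or data-frame tool, which is the kind of downstream use the format is meant to make easy.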
Subjects
Databases, Protein; Software; Access to Information; Mass Spectrometry; Metabolomics; Proteomics; User-Computer Interface
ABSTRACT
IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org).
Subjects
Databases, Protein; Protein Interaction Mapping; Internet; Software
ABSTRACT
The Proteomics Standard Initiative Common QUery InterfaCe (PSICQUIC) specification was created by the Human Proteome Organization Proteomics Standards Initiative (HUPO-PSI) to enable computational access to molecular-interaction data resources by means of a standard Web Service and query language. Currently providing >150 million binary interaction evidences from 28 servers globally, the PSICQUIC interface allows the concurrent search of multiple molecular-interaction information resources using a single query. Here, we present an extension of the PSICQUIC specification (version 1.3), which has been released to be compliant with the enhanced standards in molecular interactions. The new release also includes a new reference implementation of the PSICQUIC server available to the data providers. It offers augmented web service capabilities and improves the user experience. PSICQUIC has been running for almost 5 years, with a user base growing from only 4 data providers to 28 (April 2013) allowing access to 151 310 109 binary interactions. The power of this web service is shown in PSICQUIC View web application, an example of how to simultaneously query, browse and download results from the different PSICQUIC servers. This application is free and open to all users with no login requirement (http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml).
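The idea of sending one query to several PSICQUIC servers can be sketched as follows. This is an illustration under stated assumptions: the `/search/query/<MIQL>` path and the `format`, `firstResult` and `maxResults` parameters follow the PSICQUIC REST convention as I recall it and should be checked against the specification, and the server names and base URLs are placeholders, not real providers. The code only builds URLs and performs no network calls.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url, miql, fmt="tab25", first_result=0, max_results=50):
    """Build a REST query URL for one server (path layout is an assumption)."""
    path = f"{base_url.rstrip('/')}/search/query/{quote(miql, safe='')}"
    params = urlencode(
        {"format": fmt, "firstResult": first_result, "maxResults": max_results}
    )
    return f"{path}?{params}"


def fan_out(miql, servers):
    """Return one query URL per server for the same MIQL query."""
    return {name: build_query_url(base, miql) for name, base in servers.items()}


# Placeholder servers for illustration; real base URLs come from the registry.
SERVERS = {
    "server_a": "https://example.org/psicquic-a/webservices/current",
    "server_b": "https://example.org/psicquic-b/webservices/current/",
}

urls = fan_out("species:human AND brca2", SERVERS)
for name, url in urls.items():
    print(name, url)
```

Each URL could then be fetched concurrently and the tab-delimited results merged, which is what a client such as PSICQUIC View does on the user's behalf.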
Subjects
Proteomics/standards; Software; Internet
ABSTRACT
SUMMARY: BioJS is an open-source project whose main objective is the visualization of biological data in JavaScript. BioJS provides an easy-to-use consistent framework for bioinformatics application programmers. It follows a community-driven standard specification that includes a collection of components purposely designed to require a very simple configuration and installation. In addition to the programming framework, BioJS provides a centralized repository of components available for reutilization by the bioinformatics community. AVAILABILITY AND IMPLEMENTATION: http://code.google.com/p/biojs/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computer Graphics; Software; Programming Languages
ABSTRACT
The International Molecular Exchange Consortium (IMEx) has evolved into a vital partnership of open resources dedicated to curating molecular interaction data from the scientific literature. This consortium, which includes IntAct, MINT, MatrixDB, and DIP, is a collaborative effort with a central mission of aggregating detailed molecular interaction experimental evidence in a machine-readable format, supported by controlled vocabularies and standard ontologies. The IntAct molecular interaction database (www.ebi.ac.uk/intact), as an IMEx partner, serves as a valuable portal for accessing IMEx data through user-friendly search options and an array of interactive filters. The resource currently hosts an extensive repository of 1,293,508 binary interactions meticulously captured from 75,098 experiments documented in 23,366 publications (as of the February 2024 release), with this corpus being extended by regular data releases. IMEx curation policy has consistently prioritized a fine-grained data and curation model, with a focus on capturing the relevant experimental details essential for interpreting molecular interaction data effectively. Our curation process is designed to support the generation of interactomes tailored to contexts such as disease-specific or tissue-/cell-type-specific interactomes. These interactions are ranked according to a scoring system based on the Proteomics Standard Initiative Molecular Interaction (PSI MI) standards. This scoring system allows users to assess the degree of confidence in binary interactions, enhancing the value of the data. The resource provides insights into the nature of relationships among interacting partners as defined by the experimental setup and the associated biological context. Interactive filters enable users to navigate these rich, multilayered data, promoting a deeper understanding of biological complexity.
Additionally, the IntAct website fosters the creation of networks for collaborative analyses by the scientific community. The recent transformation of the IntAct website, supported by a graph-type database, empowers users to execute custom queries tailored to their specific research interests. This article illustrates the diverse levels of annotations available for interactions and the multiple search options at users' disposal to access data of interest. © 2024 European Molecular Biology Laboratory, European Bioinformatics Institute. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Using Quick Search, network visualization, and filters
Support Protocol: Accessing fine annotations from IntAct: Unlocking the molecular details
Alternate Protocol: Using batch search: Querying multiple interactors
Basic Protocol 2: Using advanced search: Precision and customization
Subjects
Metadata; Humans; Databases, Protein; Databases, Factual; Computational Biology/methods; User-Computer Interface; Protein Interaction Mapping/methods
ABSTRACT
Interacting proteins tend to have similar functions, influencing the same organismal traits. Interaction networks can be used to expand the list of candidate trait-associated genes from genome-wide association studies. Here, we performed network-based expansion of trait-associated genes for 1,002 human traits showing that this recovers known disease genes or drug targets. The similarity of network expansion scores identifies groups of traits likely to share an underlying genetic and biological process. We identified 73 pleiotropic gene modules linked to multiple traits, enriched in genes involved in processes such as protein ubiquitination and RNA processing. In contrast to gene deletion studies, pleiotropy as defined here captures specifically multicellular-related processes. We show examples of modules linked to human diseases enriched in genes with known pathogenic variants that can be used to map targets of approved drugs for repurposing. Finally, we illustrate the use of network expansion scores to study genes at inflammatory bowel disease genome-wide association study loci, and implicate inflammatory bowel disease-relevant genes with strong functional and genetic support.
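Network-based expansion can be illustrated with a small propagation sketch. The abstract does not name the propagation algorithm, so the example below uses random walk with restart as a stand-in; the graph, the seed gene and the restart probability are all made up for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, iterations=100):
    """Propagate seed scores over an undirected graph.

    Assumes every node has at least one neighbour, so probability mass
    is conserved at each step.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # Restart term: a fixed share of the mass returns to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        # Walk term: the rest is split evenly among each node's neighbours.
        for n in nodes:
            share = (1.0 - restart) * scores[n] / len(adjacency[n])
            for neighbour in adjacency[n]:
                updated[neighbour] += share
        scores = updated
    return scores


# Toy path graph A - B - C - D with a single trait-associated seed gene "A".
GRAPH = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(GRAPH, seeds={"A"})

# Rank the non-seed genes as expansion candidates.
candidates = sorted((g for g in GRAPH if g != "A"), key=scores.get, reverse=True)
print(candidates)
```

Genes closer to the seed in the network receive higher scores, so on this toy graph the candidates are ranked by their distance from "A".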
Subjects
Cell Biology; Cells; Disease; Genetic Association Studies; Genetic Pleiotropy; Genetic Association Studies/methods; Humans; Ubiquitination/genetics; RNA Processing, Post-Transcriptional/genetics; Cells/metabolism; Cells/pathology; Drug Repositioning/methods; Drug Repositioning/trends; Disease/genetics; Inflammatory Bowel Diseases/genetics; Inflammatory Bowel Diseases/pathology; Genome-Wide Association Study; Phenotype; Autoimmune Diseases/genetics; Autoimmune Diseases/pathology
ABSTRACT
The International Molecular Exchange (IMEx) Consortium provides scientists with a single body of experimentally verified protein interactions curated in rich contextual detail to an internationally agreed standard. In this update to the work of the IMEx Consortium, we discuss how this initiative has been working in practice, how it has ensured database sustainability, and how it is meeting emerging annotation challenges through the introduction of new interactor types and data formats. Additionally, we provide examples of how IMEx data are being used by biomedical researchers and integrated in other bioinformatic tools and resources.