RESUMO
The Reactome Knowledgebase (https://reactome.org), an Elixir and GCBR core biological data resource, provides manually curated molecular details of a broad range of normal and disease-related biological processes. Processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Here we review progress towards annotation of the entire human proteome, targeted annotation of disease-causing genetic variants of proteins and of small-molecule drugs in a pathway context, and towards supporting explicit annotation of cell- and tissue-specific pathways. Finally, we briefly discuss issues involved in making Reactome more fully interoperable with other related resources such as the Gene Ontology and maintaining the resulting community resource network.
Assuntos
Bases de Conhecimento , Redes e Vias Metabólicas , Transdução de Sinais , Humanos , Redes e Vias Metabólicas/genética , Proteoma/genéticaRESUMO
MOTIVATION: Chemical reaction networks (CRNs) play a pivotal role in diverse fields such as systems biology, biochemistry, chemical engineering, and epidemiology. High-level definitions of CRNs enables to use various simulation approaches, including deterministic and stochastic methods, from the same model. However, existing Python tools for simulation of CRN typically wrap external C/C++ libraries for model definition, translation into equations and/or numerically solving them, limiting their extensibility and integration with the broader Python ecosystem. RESULTS: In response, we developed Poincaré and SimBio, two novel Python packages for simulation of dynamical systems and CRNs. Poincaré serves as a foundation for dynamical systems modeling, while SimBio extends this functionality to CRNs, including support for the Systems Biology Markup Language (SBML). Poincaré and SimBio are developed as pure Python packages enabling users to easily extend their simulation capabilities by writing new or leveraging other Python packages. Moreover, this does not compromise the performance, as code can be just-in-time compiled with Numba. Our benchmark tests using curated models from the BioModels repository demonstrate that these tools may provide a potentially superior performance advantage compared to other existing tools. In addition, to ensure a user-friendly experience, our packages use standard typed modern Python syntax that provides a seamless integration with integrated development environments. Our Python-centric approach significantly enhances code analysis, error detection, and refactoring capabilities, positioning Poincaré and SimBio as valuable tools for the modeling community. AVAILABILITY AND IMPLEMENTATION: Poincaré and SimBio are released under the MIT license. Their source code is available on GitHub (https://github.com/maurosilber/poincare and https://github.com/hgrecco/simbio) and can be installed from PyPI or conda-forge.
Assuntos
Linguagens de Programação , Software , Biologia de Sistemas , Biologia de Sistemas/métodos , Simulação por Computador , Modelos BiológicosRESUMO
MOTIVATION: ReactomeGSA is part of the Reactome knowledgebase and one of the leading multi-omics pathway analysis platforms. ReactomeGSA provides access to quantitative pathway analysis methods supporting different 'omics data types. Additionally, ReactomeGSA can process different datasets simultaneously, leading to a comparative pathway analysis that can also be performed across different species. RESULTS: We present a major update to the ReactomeGSA analysis platforms that greatly simplifies the reuse and direct integration of public data. In order to increase the number of available datasets, we developed the new grein_loader Python application that can directly fetch experiments from the GREIN resource. This enabled us to support both EMBL-EBI's Expression Atlas and GEO RNA-seq Experiments Interactive Navigator within ReactomeGSA. To further increase the visibility and simplify the reuse of public datasets, we integrated a novel search function into ReactomeGSA that enables users to search for public datasets across both supported resources. Finally, we completely re-developed ReactomeGSA's web-frontend and R/Bioconductor package to support the new search and loading features, and greatly simplify the use of ReactomeGSA. AVAILABILITY AND IMPLEMENTATION: The new ReactomeGSA web frontend is available at https://www.reactome.org/gsa with an built-in, interactive tutorial. The ReactomeGSA R package (https://bioconductor.org/packages/release/bioc/html/ReactomeGSA.html) is available through Bioconductor and shipped with detailed documentation and vignettes. The grein_loader Python application is available through the Python Package Index (pypi). The complete source code for all applications is available on GitHub at https://github.com/grisslab/grein_loader and https://github.com/reactome.
Assuntos
Software , Humanos , Biologia Computacional/métodos , Bases de ConhecimentoRESUMO
Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.
Assuntos
Estudo de Associação Genômica Ampla , Trombose , Humanos , Bancos de Espécimes Biológicos , Hemostasia , Hemorragia/genética , Doenças RarasRESUMO
Computational models are often employed in systems biology to study the dynamic behaviours of complex systems. With the rise in the number of computational models, finding ways to improve the reusability of these models and their ability to reproduce virtual experiments becomes critical. Correct and effective model annotation in community-supported and standardised formats is necessary for this improvement. Here, we present recent efforts toward a common framework for annotated, accessible, reproducible and interoperable computational models in biology, and discuss key challenges of the field.
Assuntos
Biologia Computacional , Biologia de Sistemas , Simulação por Computador , Reprodutibilidade dos TestesRESUMO
The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platform provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful-styled Elastic Search engine can query millions of records within one second. Also, several new features, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, RESTful Web Service API, and a high-efficiency reanalysis pipeline, have been added to iProX for better open data sharing. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast growing field of proteomics.
Assuntos
Bases de Dados de Proteínas , Proteoma/genética , Proteômica , Software , Big Data , Biologia Computacional/normas , Disseminação de InformaçãoRESUMO
The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been developed to EMBL-EBI's online training offering.
Assuntos
Biologia Computacional/educação , Biologia Computacional/métodos , Bases de Dados Factuais , Academias e Institutos , Inteligência Artificial , COVID-19 , Bases de Dados Factuais/economia , Bases de Dados Factuais/estatística & dados numéricos , Bases de Dados de Produtos Farmacêuticos , Bases de Dados de Proteínas , Europa (Continente) , Genoma Humano , Humanos , Armazenamento e Recuperação da Informação , RNA não Traduzido/genética , SARS-CoV-2/genéticaRESUMO
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (i.e. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.
Assuntos
Curadoria de Dados/métodos , Bases de Dados de Proteínas , Complexos Multiproteicos/química , Coronavirus/química , Visualização de Dados , Bases de Dados de Compostos Químicos , Enzimas/química , Enzimas/metabolismo , Escherichia coli/química , Humanos , Cooperação Internacional , Anotação de Sequência Molecular , Complexos Multiproteicos/metabolismo , Interface Usuário-ComputadorRESUMO
The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
Assuntos
Bases de Dados de Proteínas , Mapas de Interação de Proteínas/genética , Software , Humanos , Mapeamento de Interação de Proteínas/métodosRESUMO
The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.
Assuntos
Antivirais/farmacologia , Bases de Conhecimento , Proteínas/metabolismo , COVID-19/metabolismo , Curadoria de Dados , Genoma Humano , Interações Hospedeiro-Patógeno , Humanos , Proteínas/genética , Transdução de Sinais , SoftwareRESUMO
Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.
Assuntos
Simulação por Computador , Software , Humanos , Bioengenharia , Modelos Biológicos , Sistema de Registros , PesquisadoresRESUMO
Understanding autoimmunity to endogenous proteins is crucial in diagnosing and treating autoimmune diseases. In this work, we developed a user-friendly AAgAtlas portal (http://biokb.ncpsb.org.cn/aagatlas_portal/index.php#), which can be used to search for 8045 non-redundant autoantigens (AAgs) and 47 post-translationally modified AAgs against 1073 human diseases that are prioritized by a credential score developed by multisource evidence. Using AAgAtlas, the immunogenic properties of human AAgs was systematically elucidated according to their genetic, biophysical, cytological, expression profile, and evolutionary characteristics. The results indicated that human AAgs are evolutionally conserved in protein sequence and enriched in three hydrophilic and polar amino acid residues (K, D, and E) that are located at the protein surface. AAgs are enriched in proteins that are involved in nucleic acid binding, transferase, and the cytoskeleton. Genome, transcriptome, and proteome analyses further indicated that AAb production is associated with gene variance and abnormal protein expression related to the pathological activities of different tumors. Collectively, our data outlines the hallmarks of human AAgs that facilitate the understanding of humoral autoimmunity and the identification of biomarkers of human diseases.
Assuntos
Autoantígenos , Doenças Autoimunes , Humanos , Autoantígenos/genética , Autoanticorpos , Doenças Autoimunes/genética , Autoimunidade , Sequência de AminoácidosRESUMO
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Assuntos
Proteoma , Proteômica , Humanos , Padrões de Referência , Vocabulário Controlado , Espectrometria de Massas , Bases de Dados de ProteínasRESUMO
The EMBL-EBI Complex Portal is a knowledgebase of macromolecular complexes providing persistent stable identifiers. Entries are linked to literature evidence and provide details of complex membership, function, structure and complex-specific Gene Ontology annotations. Data are freely available and downloadable in HUPO-PSI community standards and missing entries can be requested for curation. In collaboration with Saccharomyces Genome Database and UniProt, the yeast complexome, a compendium of all known heteromeric assemblies from the model organism Saccharomyces cerevisiae, was curated. This expansion of knowledge and scope has led to a 50% increase in curated complexes compared to the previously published dataset, CYC2008. The yeast complexome is used as a reference resource for the analysis of complexes from large-scale experiments. Our analysis showed that genes coding for proteins in complexes tend to have more genetic interactions, are co-expressed with more genes, are more multifunctional, localize more often in the nucleus, and are more often involved in nucleic acid-related metabolic processes and processes where large machineries are the predominant functional drivers. A comparison to genetic interactions showed that about 40% of expanded co-complex pairs also have genetic interactions, suggesting strong functional links between complex members.
Assuntos
Proteínas de Saccharomyces cerevisiae/metabolismo , Saccharomyces cerevisiae/metabolismo , Conjuntos de Dados como Assunto , Ontologia Genética , Bases de Conhecimento , Saccharomyces cerevisiae/genética , Proteínas de Saccharomyces cerevisiae/genéticaRESUMO
MOTIVATION: Curation is essential for any data platform to maintain the quality of the data it provides. Today, more effective curation tools are often vital to keep up with the rapid growth of existing, maintenance-requiring databases and the amount of newly published information that needs to be surveyed. However, curation interfaces are often complex and challenging to be further developed. Therefore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources or a reluctance to change sensitive production systems. RESULTS: We propose a decoupled, modular and scriptable architecture to build new curation tools on top of existing platforms. Our architecture treats the existing platform as a black box. It, therefore, only relies on its public application programming interfaces and web application instead of requiring any changes to the existing infrastructure. As a case study, we have implemented this architecture in cmd-iaso, a curation tool for the identifiers.org registry. With cmd-iaso, we also show that the proposed design's flexibility can be utilized to streamline and enhance the curator's workflow with the platform's existing web interface. AVAILABILITYAND IMPLEMENTATION: The cmd-iaso curation tool is implemented in Python 3.7+ and supports Linux, macOS and Windows. Its source code and documentation are freely available from https://github.com/identifiers-org/cmd-iaso. It is also published as a Docker container at https://hub.docker.com/r/identifiersorg/cmd-iaso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
RESUMO
SUMMARY: IntAct App is a Cytoscape 3 application that grants in-depth access to IntAct's molecular interaction data. It build networks where nodes are interacting molecules (mainly proteins, but also genes, RNA, chemicals ) and edges represent evidence of interaction. Users can query a network by providing its molecules, identified by different fields and optionally include all their interacting partners in the resulting network. The app offers three visualizations: one only displaying interactions, another representing every evidence and the last one emphasizing evidence where mutated versions of proteins were used. Users can also filter networks and click on nodes and edges to access all their related details. Finally, the application supports automation of its main features via Cytoscape commands. AVAILABILITY AND IMPLEMENTATION: Implementation available at https://apps.cytoscape.org/apps/intactapp, while the source code is available at https://github.com/EBI-IntAct/IntactApp.
RESUMO
MOTIVATION: Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix: accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers. Since then, we have developed and improved services to support the growing need to create, reference and resolve CIDs, in systems ranging from human readable text to cloud-based e-infrastructures, by providing high availability and low-latency cloud-based services, backed by a high-quality, manually curated resource. RESULTS: We describe a set of services that can be used to construct and resolve CIDs in Life Sciences and beyond. We have developed a new front end for accessing the Identifiers.org registry data and APIs to simplify integration of Identifiers.org CID services with third-party applications. We have also deployed the new Identifiers.org infrastructure in a commercial cloud environment, bringing our services closer to the data. AVAILABILITYAND IMPLEMENTATION: https://identifiers.org.
Assuntos
Disciplinas das Ciências Biológicas , Computação em Nuvem , HumanosRESUMO
MOTIVATION: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called 'causal interaction' takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. RESULTS: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. AVAILABILITY AND IMPLEMENTATION: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Assuntos
Software , Causalidade , HumanosRESUMO
Reproducibility of scientific results is a key element of science and credibility. The lack of reproducibility across many scientific fields has emerged as an important concern. In this piece, we assess mathematical model reproducibility and propose a scorecard for improving reproducibility in this field.
Assuntos
Biologia de Sistemas/métodos , Curadoria de Dados , Humanos , Modelos Teóricos , Reprodutibilidade dos TestesRESUMO
Pathway analyses are key methods to analyze 'omics experiments. Nevertheless, integrating data from different 'omics technologies and different species still requires considerable bioinformatics knowledge.Here we present the novel ReactomeGSA resource for comparative pathway analyses of multi-omics datasets. ReactomeGSA can be used through Reactome's existing web interface and the novel ReactomeGSA R Bioconductor package with explicit support for scRNA-seq data. Data from different species is automatically mapped to a common pathway space. Public data from ExpressionAtlas and Single Cell ExpressionAtlas can be directly integrated in the analysis. ReactomeGSA greatly reduces the technical barrier for multi-omics, cross-species, comparative pathway analyses.We used ReactomeGSA to characterize the role of B cells in anti-tumor immunity. We compared B cell rich and poor human cancer samples from five of the Cancer Genome Atlas (TCGA) transcriptomics and two of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteomics studies. B cell-rich lung adenocarcinoma samples lacked the otherwise present activation through NFkappaB. This may be linked to the presence of a specific subset of tumor associated IgG+ plasma cells that lack NFkappaB activation in scRNA-seq data from human melanoma. This showcases how ReactomeGSA can derive novel biomedical insights by integrating large multi-omics datasets.