ABSTRACT
WikiPathways (wikipathways.org) is an open-source biological pathway database. Collaboration and open science are pivotal to the success of WikiPathways. Here we highlight the continuing efforts supporting WikiPathways, content growth and collaboration among pathway researchers. As an evolving database, there is a growing need for WikiPathways to address and overcome technical challenges. In this direction, WikiPathways has undergone major restructuring, enabling a renewed approach for sharing and curating pathway knowledge, thus providing stability for the future of community pathway curation. The website has been redesigned to improve and enhance user experience. This next generation of WikiPathways continues to support existing features while improving maintainability of the database and facilitating community input by providing new functionality and leveraging automation.
Subject(s)
Databases, Factual
ABSTRACT
To grasp the complexity of biological processes, the biological knowledge is often translated into schematic diagrams of, for example, signalling and metabolic pathways. These pathway diagrams describe relevant connections between biological entities and incorporate domain knowledge in a visual format making it easier for humans to interpret. Still, these diagrams can be represented in machine readable formats, as done in the KEGG, Reactome, and WikiPathways databases. However, while humans are good at interpreting the message of the creators of diagrams, algorithms struggle when the diversity in drawing approaches increases. WikiPathways supports multiple drawing styles which need harmonizing to offer semantically enriched access. Particularly challenging, here, are the interactions between the biological entities that underlie the biological causality. These interactions provide information about the biological process (metabolic conversion, inhibition, etc.), the direction, and the participating entities. Availability of the interactions in a semantic and harmonized format is essential for searching the full network of biological interactions. We here study how the graphically-modelled biological knowledge in diagrams can be semantified and harmonized, and exemplify how the resulting data is used to programmatically answer biological questions. We find that we can translate graphically modelled knowledge to a sufficient degree into a semantic model and discuss some of the current limitations. We then use this to show that reproducible notebooks can be used to explore up- and downstream targets of MECP2 and to analyse the sphingolipid metabolism. Our results demonstrate that most of the graphical biological knowledge from WikiPathways is modelled into the semantic layer with the semantic information intact and connectivity information preserved. 
Being able to evaluate how biological elements affect each other is useful and allows, for example, the identification of up or downstream targets that will have a similar effect when modified.
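Finding up- or downstream targets relies on the interactions being directed and harmonized. The minimal Python sketch below is a plain breadth-first traversal over (source, target, type) triples. It assumes the interactions have already been extracted into that shape. The entity names and edges are hypothetical placeholders, not WikiPathways content, and the sketch is not the notebook code used in the study.

```python
from collections import deque

# One harmonized interaction: (source, target, interaction type).
Interaction = tuple[str, str, str]


def reachable(interactions: list[Interaction], start: str, downstream: bool = True) -> set[str]:
    """Breadth-first search for every entity reachable from `start`.

    downstream=True follows edges source -> target; downstream=False walks them
    in reverse (target -> source) to collect upstream entities. `start` itself
    is never included, even if a cycle leads back to it.
    """
    adjacency: dict[str, list[str]] = {}
    for source, target, _kind in interactions:
        head, tail = (source, target) if downstream else (target, source)
        adjacency.setdefault(head, []).append(tail)

    found: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour != start and neighbour not in found:
                found.add(neighbour)
                queue.append(neighbour)
    return found


# Hypothetical placeholder edges; not taken from WikiPathways.
edges: list[Interaction] = [
    ("MECP2", "GENE_A", "inhibition"),
    ("GENE_A", "GENE_B", "stimulation"),
    ("GENE_C", "MECP2", "stimulation"),
]
print(sorted(reachable(edges, "MECP2")))                    # ['GENE_A', 'GENE_B']
print(sorted(reachable(edges, "MECP2", downstream=False)))  # ['GENE_C']
```

The traversal ignores the interaction type. You could filter edges by type (for example, keeping only stimulations) before calling `reachable`. Whether an inhibition should flip the sign of a downstream effect is a modelling decision that this sketch leaves out.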
Subject(s)
Biological Phenomena , Signal Transduction , Algorithms , Databases, Factual , Humans , Metabolic Networks and Pathways , Signal Transduction/physiology
ABSTRACT
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (e.g. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata, which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.
Subject(s)
Data Curation/methods , Databases, Protein , Multiprotein Complexes/chemistry , Coronavirus/chemistry , Data Visualization , Databases, Chemical , Enzymes/chemistry , Enzymes/metabolism , Escherichia coli/chemistry , Humans , International Cooperation , Molecular Sequence Annotation , Multiprotein Complexes/metabolism , User-Computer Interface
ABSTRACT
BACKGROUND: Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting where there are a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions. RESULTS: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. How this model can be used to make data between various resources interoperable is demonstrated by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications or bots were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates. CONCLUSIONS: Although this workflow is developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43).
Subject(s)
COVID-19/pathology , Genomics/methods , Knowledge Bases , Proteomics/methods , SARS-CoV-2/physiology , COVID-19/metabolism , COVID-19/virology , Coronavirus/genetics , Coronavirus/physiology , Coronavirus Infections/metabolism , Coronavirus Infections/pathology , Coronavirus Infections/virology , Genome, Viral , Humans , Internet , Pandemics , SARS-CoV-2/genetics , Viral Proteins/genetics , Viral Proteins/metabolism , Workflow
ABSTRACT
WikiPathways (https://www.wikipathways.org) is a biological pathway database known for its collaborative nature and open science approaches. With the core idea of the scientific community developing and curating biological knowledge in pathway models, WikiPathways lowers all barriers for accessing and using its content. A growing number of content creators, initiatives, projects and tools have started using WikiPathways. Central in this growth and increased use of WikiPathways are the various communities that focus on particular subsets of molecular pathways, such as those for rare diseases and lipid metabolism. Knowledge extracted from published pathway figures with optical character recognition and named entity recognition helps prioritize pathway development. We show the growth of WikiPathways over the last three years, highlight the new communities and collaborations of pathway authors and curators, and describe various technologies to connect to external resources and initiatives. The road toward a sustainable, community-driven pathway database goes through integration with other resources such as Wikidata and allowing more use, curation and redistribution of WikiPathways content.
Subject(s)
Databases, Factual , COVID-19/pathology , Data Curation , Humans , Publications , User-Computer Interface
ABSTRACT
Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.
Subject(s)
Biological Science Disciplines , Computational Biology , Databases, Factual , Genomics , Proteomics , Humans , Pattern Recognition, Automated
ABSTRACT
Open PHACTS is a pre-competitive project to answer scientific questions developed recently by the pharmaceutical industry. Having high quality biological interaction information in the Open PHACTS Discovery Platform is needed to answer multiple pathway related questions. To address this, updated WikiPathways data has been added to the platform. This data includes information about biological interactions, such as stimulation and inhibition. The platform's Application Programming Interface (API) was extended with appropriate calls to reference these interactions. These new methods of the Open PHACTS API are available now.
Subject(s)
Antineoplastic Agents/pharmacology , Biomedical Research , Computational Biology/methods , Drug Discovery , Information Storage and Retrieval/methods , Signal Transduction , Software , Drug Industry , Humans , Hypertrophy/drug therapy , Hypertrophy/pathology , Myocytes, Cardiac/cytology , Myocytes, Cardiac/drug effects , Neoplasms/drug therapy , Neoplasms/pathology
ABSTRACT
WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing its database in a curated, machine-readable form, WikiPathways enables omics data analysis and visualization. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.
Subject(s)
Databases, Chemical , Metabolomics , Animals , Data Curation , Data Mining , Databases, Chemical/standards , Databases, Genetic , Humans , Metabolic Networks and Pathways , Quality Control , Search Engine , Software
ABSTRACT
With the advancement of genome-sequencing technologies, new genomes are being sequenced daily. Although these sequences are deposited in publicly available data warehouses, their functional and genomic annotations (beyond genes which are predicted automatically) mostly reside in the text of primary publications. Professional curators are hard at work extracting those annotations from the literature for the most studied organisms and depositing them in structured databases. However, the resources don't exist to fund the comprehensive curation of the thousands of newly sequenced organisms in this manner. Here, we describe WikiGenomes (wikigenomes.org), a web application that facilitates the consumption and curation of genomic data by the entire scientific community. WikiGenomes is based on Wikidata, an openly editable knowledge graph with the goal of aggregating published knowledge into a free and open database. WikiGenomes empowers the individual genomic researcher to contribute their expertise to the curation effort and integrates the knowledge into Wikidata, enabling it to be accessed by anyone without restriction. Database URL: www.wikigenomes.org.
Subject(s)
Databases, Nucleic Acid , Genome , Internet , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/standards
ABSTRACT
The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.
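The minimal Python sketch below shows SPARQL access to the pathway triples. It sends a SELECT query over HTTP GET, following the SPARQL 1.1 Protocol, and flattens the standard SPARQL 1.1 JSON results into plain rows. Several parts are assumptions to check against the current WikiPathways documentation:

* the vocabulary terms (`wp:Metabolite`, `wp:Pathway`, `wp:organismName`, `dcterms:isPartOf`, `dc:title`), which are based on the published WikiPathways RDF;
* the endpoint path, since the abstract gives only the host.

The function names `run_query` and `rows` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Host as given in the abstract; the exact query path (for example a /sparql
# suffix) is an assumption to verify before use.
ENDPOINT = "http://sparql.wikipathways.org"

# Assumed WikiPathways RDF terms (wp:) plus Dublin Core. The query lists human
# pathways that contain at least one metabolite node.
QUERY = """
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?pathway ?title WHERE {
  ?metabolite a wp:Metabolite ;
              dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway ;
           dc:title ?title ;
           wp:organismName "Homo sapiens"^^xsd:string .
}
LIMIT 10
"""


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> dict:
    """Send a SPARQL SELECT via HTTP GET and decode the JSON result document."""
    params = urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one {variable: value} dict per solution.

    Variables left unbound in a solution are omitted from that row.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]
```

With network access, `rows(run_query(QUERY))` would return one dictionary per matching pathway. Parsing is kept separate from the HTTP call, so the same `rows` helper works for any endpoint that returns the standard JSON results format.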
Subject(s)
Biological Ontologies , Computational Biology/methods , Information Storage and Retrieval/methods , Internet , Semantics , Biomedical Research , Humans
ABSTRACT
Open biological data are distributed over many resources, making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. To improve the state of biological data and to facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins, into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swiss-Prot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.
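This hedged Python sketch shows how the imported genes could be queried through the Wikidata SPARQL endpoint. It builds a Wikidata Query Service query for genes that have an Entrez Gene ID (P351) and are found in a given taxon (P703). The label language is a parameter, which reflects Wikidata's multilingual labels. The item and property IDs are standard Wikidata identifiers as we understand them (Q15978631 for Homo sapiens, Q83310 for Mus musculus) and should be verified. The query relies on the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes that the Query Service predefines. The function names are hypothetical and are not part of the project's import bots.

```python
import json
import re
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
HUMAN = "Q15978631"  # Homo sapiens (verify)
MOUSE = "Q83310"     # Mus musculus (verify)


def gene_query(taxon: str, language: str = "en", limit: int = 10) -> str:
    """Build a query for genes with an Entrez Gene ID (P351) found in a taxon (P703).

    Inputs are validated so that they cannot inject arbitrary SPARQL.
    """
    if not re.fullmatch(r"Q\d+", taxon):
        raise ValueError(f"not a Wikidata item ID: {taxon!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", language):
        raise ValueError(f"unexpected language code: {language!r}")
    return f"""
SELECT ?gene ?geneLabel ?entrez WHERE {{
  ?gene wdt:P351 ?entrez ;
        wdt:P703 wd:{taxon} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}
}}
LIMIT {int(limit)}
"""


def fetch(query: str, user_agent: str, timeout: float = 60.0) -> dict:
    """Run a query against the Wikidata Query Service and decode the JSON results.

    The service asks clients to identify themselves with a descriptive User-Agent.
    """
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json", "User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `gene_query(MOUSE, language="de")` asks for mouse genes with German labels where they exist. Passing the result to `fetch` requires network access.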
Subject(s)
Databases, Nucleic Acid , Semantics , Animals , Humans , Mice , Models, Theoretical , Search Engine
ABSTRACT
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and scholarly publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
Subject(s)
Data Collection , Data Curation , Research Design , Database Management Systems , Guidelines as Topic , Reproducibility of Results
ABSTRACT
The last 20 years of advancement in sequencing technologies have led to sequencing thousands of microbial genomes, creating mountains of genetic data. While efficiency in generating the data improves almost daily, applying meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources, from government-funded institutions such as the National Center for Biotechnology Information (NCBI) and UniProt, to topic-focused databases like the ODB3 database of prokaryotic operons, to the supplemental table of a primary publication. A major drawback to large-scale, expert-curated databases is the expense of maintaining and extending them over time. No entity apart from a major institution with stable long-term funding can consider this, and their scope is limited considering the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbial-specific data model, based on Wikidata's semantic web compatibility, which represents bacterial species, strains and the genes and gene products that define them. Currently, we have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogenic bacterium Chlamydia trachomatis. Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their genes and gene products, totaling ~900,000 additional entities.
This aggregation of knowledge will be a platform for community-driven collaboration, allowing the networking of microbial genetic data through the sharing of knowledge by both the data and domain expert.
Subject(s)
Data Curation , Genome, Microbial , Models, Theoretical , Female , Gene Ontology , Genes, Bacterial , Humans , Molecular Sequence Annotation , Operon/genetics , Search Engine
ABSTRACT
WikiPathways (http://www.wikipathways.org) is an open, collaborative platform for capturing and disseminating models of biological pathways for data visualization and analysis. Since our last NAR update, 4 years ago, WikiPathways has experienced massive growth in content, which continues to be contributed by hundreds of individuals each year. New aspects of the diversity and depth of the collected pathways are described from the perspective of researchers interested in using pathway information in their studies. We provide updates on extensions and services to support pathway analysis and visualization via popular standalone tools, i.e. PathVisio and Cytoscape, web applications and common programming environments. We introduce the Quick Edit feature for pathway authors and curators, in addition to new means of publishing pathways and maintaining custom pathway collections to serve specific research topics and communities. In addition to the latest milestones in our pathway collection and curation effort, we also highlight the latest means to access the content as publishable figures, as standard data files, and as linked data, including bulk and programmatic access.
Subject(s)
Databases, Chemical , Models, Biological , Gene Expression Profiling , Genes , Humans , Metabolomics
ABSTRACT
Integration of open access, curated, high-quality information from multiple disciplines in the Life and Biomedical Sciences provides a holistic understanding of the domain. Additionally, the effective linking of diverse data sources can unearth hidden relationships and guide potential research strategies. However, given the lack of consistency between descriptors and identifiers used in different resources and the absence of a simple mechanism to link them, gathering and combining relevant, comprehensive information from diverse databases remains a challenge. The Open Pharmacological Concepts Triple Store (Open PHACTS) is an Innovative Medicines Initiative project that uses semantic web technology approaches to enable scientists to easily access and process data from multiple sources to solve real-world drug discovery problems. The project draws together sources of publicly available pharmacological, physicochemical and biomolecular data, represents them in a stable infrastructure and provides well-defined information exploration and retrieval methods. Here, we highlight the utility of this platform in conjunction with workflow tools to solve pharmacological research questions that require interoperability between target, compound, and pathway data. Use cases presented herein cover 1) the comprehensive identification of chemical matter for a dopamine receptor drug discovery program, 2) the identification of compounds active against all targets in the Epidermal growth factor receptor (ErbB) signaling pathway that have a relevance to disease, and 3) the evaluation of established targets in the Vitamin D metabolism pathway to aid novel Vitamin D analogue design. The example workflows presented illustrate how the Open PHACTS Discovery Platform can be used to exploit existing knowledge and generate new hypotheses in the process of drug discovery.
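The second use case combines pathway, target and compound data. Its set logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Open PHACTS workflow or API:

* the pathway targets, the disease-relevance filter and the activity data have already been retrieved into plain sets and dicts;
* the compound identifiers are hypothetical placeholders;
* the function names are hypothetical.

```python
def compounds_per_target(
    pathway_targets: set[str],
    disease_relevant: set[str],
    activities: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each disease-relevant target in a pathway to the compounds reported active on it.

    Keys are inserted in sorted order; targets without activity data map to an empty set.
    """
    return {
        target: set(activities.get(target, set()))
        for target in sorted(pathway_targets & disease_relevant)
    }


def shared_compounds(per_target: dict[str, set[str]]) -> set[str]:
    """Compounds active against every target in the mapping (empty if there are no targets)."""
    if not per_target:
        return set()
    return set.intersection(*per_target.values())


# Hypothetical inputs: gene symbols from a pathway, a disease-relevance filter,
# and placeholder compound IDs standing in for activity data from a platform API.
targets = {"EGFR", "ERBB2", "ERBB3"}
relevant = {"EGFR", "ERBB2"}
activity = {"EGFR": {"CPD_1", "CPD_2"}, "ERBB2": {"CPD_2"}, "ERBB3": {"CPD_3"}}

per_target = compounds_per_target(targets, relevant, activity)
print({t: sorted(c) for t, c in per_target.items()})  # {'EGFR': ['CPD_1', 'CPD_2'], 'ERBB2': ['CPD_2']}
print(shared_compounds(per_target))                   # {'CPD_2'}
```

The per-target mapping answers "which compounds hit each relevant target". The intersection is one stricter reading of "active against all targets". Which reading a workflow needs depends on the research question.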