Results 1 - 16 of 16
1.
J Biomed Semantics ; 14(1): 20, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38066573

ABSTRACT

BACKGROUND: Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences or molecular graphs. Other works incorporate such data, but assume that all entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. OBJECTIVE: We aim to understand how to incorporate multimodal data into biomedical KG embeddings, and analyze the resulting performance in comparison with traditional methods. We propose a modular framework for learning embeddings in KGs with entity attributes that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. RESULTS: In the standard link prediction evaluation, the proposed method results in competitive, yet lower, performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. Further analyses show that incorporating attribute data does outperform baselines over entities below a certain node degree, comprising approximately 75% of the diseases in the graph. We also observe that optimizing attribute encoders is a challenging task that increases optimization costs. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. CONCLUSION: BioBLP makes it possible to investigate different ways of incorporating multimodal biomedical data for learning representations in KGs. With a particular implementation, we find that incorporating attribute data does not consistently outperform baselines, but improvements are obtained on a comparatively large subset of entities below a specific node degree. Our results indicate a potential for improved performance in scientific discovery tasks where understudied areas of the KG would benefit from link prediction methods.
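To make the modular design concrete, here is a minimal sketch of an attribute-aware KG embedding model in PyTorch: a modality-specific encoder (here, an LSTM over protein sequences) produces embeddings for entities that have attribute data, a lookup table serves as the fallback for entities without attributes, and a TransE-style function scores triples. All module choices and dimensions are illustrative assumptions, not the BioBLP implementation.

```python
# Sketch of attribute-aware KG embedding; hypothetical architecture.
import torch
import torch.nn as nn

class AttributeKGEmbedding(nn.Module):
    def __init__(self, num_entities, num_relations, dim=128, vocab_size=25):
        super().__init__()
        # Fallback lookup table for entities with no attribute data.
        self.entity_emb = nn.Embedding(num_entities, dim)
        self.relation_emb = nn.Embedding(num_relations, dim)
        # Modality-specific encoder, e.g. for protein sequences.
        self.seq_encoder = nn.LSTM(vocab_size, dim, batch_first=True)

    def encode_entity(self, entity_id, seq_onehot=None):
        # seq_onehot: (batch, seq_len, vocab_size) one-hot residues, or None.
        if seq_onehot is None:
            return self.entity_emb(entity_id)    # attribute missing
        _, (h, _) = self.seq_encoder(seq_onehot)  # encode the attribute
        return h[-1]

    def score(self, head, rel_id, tail):
        # TransE-style plausibility: higher (less negative) is more plausible.
        return -torch.norm(head + self.relation_emb(rel_id) - tail, p=1, dim=-1)
```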


Subjects
Pattern Recognition, Automated
2.
PeerJ Comput Sci ; 8: e1073, 2022.
Article in English | MEDLINE | ID: mdl-36426239

ABSTRACT

In this article, we describe a reproduction of the Relational Graph Convolutional Network (RGCN). Using our reproduction, we explain the intuition behind the model. Our reproduction results empirically validate the correctness of our implementations on benchmark knowledge graph datasets for node classification and link prediction tasks. Our explanation provides an accessible understanding of the different components of the RGCN for both users and researchers extending the RGCN approach. Furthermore, we introduce two new configurations of the RGCN that are more parameter-efficient. The code and datasets are available at https://github.com/thiviyanT/torch-rgcn.
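For readers who want the intuition in code, the following is a minimal sketch of a single RGCN message-passing layer, assuming dense per-relation adjacency matrices; the torch-rgcn repository above contains the authors' actual, sparse and more parameter-efficient implementations.

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    def __init__(self, num_relations, in_dim, out_dim):
        super().__init__()
        # One weight matrix per relation, plus one for the self-loop.
        self.w_rel = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.01)
        self.w_self = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)

    def forward(self, h, adj):
        # h: (num_nodes, in_dim) node features.
        # adj: (num_relations, num_nodes, num_nodes), ideally row-normalised
        # so that each relation averages incoming messages.
        out = h @ self.w_self
        for r in range(adj.shape[0]):
            out = out + adj[r] @ (h @ self.w_rel[r])
        return torch.relu(out)

# Toy usage: 4 nodes, 2 relations (normalisation skipped for brevity).
h = torch.randn(4, 8)
adj = torch.randint(0, 2, (2, 4, 4)).float()
print(RGCNLayer(2, 8, 16)(h, adj).shape)  # torch.Size([4, 16])
```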

3.
J Biomed Semantics ; 13(1): 4, 2022 Jan 31.
Article in English | MEDLINE | ID: mdl-35101121

ABSTRACT

BACKGROUND: Electronic Laboratory Notebooks (ELNs) are used to document experiments and investigations in the wet lab. Protocols in ELNs contain a detailed description of the conducted steps, including the information necessary to understand the procedure and the resulting research data, as well as to reproduce the research investigation. The purpose of this study is to investigate whether such ELN protocols can be used to create semantic documentation of the provenance of research data through ontologies and linked data methodologies. METHODS: Based on an ELN protocol of a biomedical wet-lab experiment, a retrospective provenance model of the resulting research data, describing the details of the experiment in a machine-interpretable way, is manually engineered. Furthermore, an automated approach for knowledge acquisition from ELN protocols is derived from these results. This structure-based approach exploits the structure in the experiment's description, such as headings, tables, and links, to translate the ELN protocol into a semantic knowledge representation. To satisfy the Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles, a ready-to-publish bundle is created that contains the research data together with their semantic documentation. RESULTS: While the manual modelling effort serves as a proof of concept based on one protocol, the automated structure-based approach demonstrates its potential to generalise across seven ELN protocols. For each of those protocols, a ready-to-publish bundle is created, and, using the SPARQL query language, it is illustrated that questions about the processes and the obtained research data can be answered. CONCLUSIONS: The semantic documentation of research data obtained from ELN protocols allows the retrospective provenance of research data to be represented in a machine-interpretable way. Research Object Crate (RO-Crate) bundles that include these models enable researchers not only to easily share the research data together with the corresponding documentation, but also to search for experiments and relate them to each other.
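As an illustration of the kind of semantic documentation described, here is a minimal rdflib sketch that records retrospective provenance with PROV-O and answers a question via SPARQL; the example namespace, step, and file names are hypothetical.

```python
from rdflib import Graph, Namespace

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/eln/")  # hypothetical namespace

g = Graph()
g.bind("prov", PROV)

# "measurement.csv was generated by the incubation step, which used a sample."
g.add((EX["measurement.csv"], PROV.wasGeneratedBy, EX["incubation-step"]))
g.add((EX["incubation-step"], PROV.used, EX["cell-culture-sample"]))

# Ask which research data each experimental step produced.
results = g.query("""
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT ?data ?step WHERE { ?data prov:wasGeneratedBy ?step . }
""")
for data, step in results:
    print(data, "was generated by", step)
```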


Subjects
Documentation; Knowledge Bases; Documentation/methods; Electronics; Retrospective Studies; Semantic Web
4.
Patterns (N Y) ; 2(12): 100397, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34950910

ABSTRACT

Many computational models rely on real-world data, and the steps required in moving from data collection, to data preparation, to model calibration and input are becoming increasingly complex. Errors in data can lead to errors in model output that, in extreme cases, might invalidate conclusions. While the challenge of errors in data collection has been analyzed in the literature, here we highlight the importance of data handling in the modeling and simulation process, and how particular data handling errors can lead to errors in model output. We develop a framework for assessing the impact of potential data errors for models of spreading processes on networks, a broad class of models that capture many important real-world phenomena (e.g., epidemics and rumor spread). We focus on the susceptible-infected-removed (SIR) and Threshold models and examine how systematic errors in data handling impact the predicted spread of a virus (or of information). Our results demonstrate that data handling errors can have a significant impact on model conclusions, especially in critical regions of a system.
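The following minimal sketch illustrates the kind of experiment described: simulating a discrete-time SIR process on a network with networkx, then repeating it on data corrupted by a systematic handling error (here, hypothetically, 10% of edges silently dropped) and comparing outbreak sizes. Parameter values and the error model are illustrative, not those of the paper.

```python
import random
import networkx as nx

def sir_final_size(G, beta=0.1, seed_node=0, seed=42):
    """Run a discrete-time SIR process; return the total number ever infected."""
    rng = random.Random(seed)
    infected, removed = {seed_node}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in G.neighbors(u):
                if v not in infected and v not in removed and rng.random() < beta:
                    newly_infected.add(v)
        removed |= infected  # infected nodes recover after one step
        infected = newly_infected
    return len(removed)

G = nx.barabasi_albert_graph(1000, 3, seed=1)

# Systematic data-handling error: 10% of edges silently dropped.
G_err = G.copy()
edges = list(G_err.edges())
G_err.remove_edges_from(random.Random(7).sample(edges, k=len(edges) // 10))

print("clean network    :", sir_final_size(G))
print("corrupted network:", sir_final_size(G_err))
```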

5.
F1000Res ; 10: 897, 2021.
Article in English | MEDLINE | ID: mdl-34804501

ABSTRACT

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies, and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition, and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
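As a toy illustration of the six-stage life cycle, the sketch below encodes the stages as an enum with a linear transition function; the encoding is our own illustrative assumption, not an artifact of the workshop.

```python
from enum import Enum

class Stage(Enum):
    QUESTION = 1     # scientific question or hypothesis
    CONCEPTUAL = 2   # conceptual workflow
    ABSTRACT = 3     # abstract workflow
    CONCRETE = 4     # concrete workflow
    PRODUCTION = 5   # production workflow
    RESULTS = 6      # scientific results

def next_stage(stage: Stage) -> Stage:
    # Each transition is facilitated by tools, methods, and domain knowledge.
    return stage if stage is Stage.RESULTS else Stage(stage.value + 1)

print(next_stage(Stage.ABSTRACT))  # Stage.CONCRETE
```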


Assuntos
Disciplinas das Ciências Biológicas , Biologia Computacional , Benchmarking , Software , Fluxo de Trabalho
6.
Patterns (N Y) ; 1(8): 100136, 2020 Nov 13.
Article in English | MEDLINE | ID: mdl-33294873

ABSTRACT

The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes one dataset more reusable than others, and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study on datasets on GitHub, a popular open platform for sharing code and data. We describe a corpus of more than 1.4 million data files from over 65,000 repositories. Using GitHub's engagement metrics as proxies for dataset reuse, we relate them to reuse features from the literature and devise an initial model, using deep neural networks, to predict a dataset's reusability. This demonstrates the practical gap between proposed principles and the actionable insights that would allow data publishers and tool designers to implement functionalities that provably facilitate reuse.
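A minimal sketch of the kind of predictive model described, assuming a small set of hypothetical per-dataset reuse features and an engagement-based target; the real study derives its features from the literature and trains on GitHub metrics at much larger scale.

```python
import torch
import torch.nn as nn

# Hypothetical per-dataset features:
# [has_license, has_readme, n_columns, description_length, repo_age_days]
model = nn.Sequential(
    nn.Linear(5, 32),
    nn.ReLU(),
    nn.Linear(32, 1),  # predicted reuse proxy, e.g. normalised fork count
)

features = torch.tensor([[1.0, 1.0, 12.0, 340.0, 800.0]])
print(model(features))  # untrained output; training would fit engagement metrics
```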

7.
J Assoc Inf Sci Technol ; 70(5): 419-432, 2019 May.
Article in English | MEDLINE | ID: mdl-31763358

ABSTRACT

A cross-disciplinary examination of the user behaviors involved in seeking and evaluating data is surprisingly absent from the research data discussion. This review explores the data retrieval literature to identify commonalities in how users search for and evaluate observational research data in selected disciplines. Two analytical frameworks, rooted in information retrieval and science and technology studies, are used to identify key similarities in practices as a first step toward developing a model describing data retrieval.

9.
PeerJ ; 5: e3997, 2017.
Article in English | MEDLINE | ID: mdl-29134146

ABSTRACT

Robotic labs, in which experiments are carried out entirely by robots, have the potential to provide a reproducible and transparent foundation for performing basic biomedical laboratory experiments. In this article, we investigate whether these labs could be applicable in current experimental practice. We do this by text mining 1,628 papers for occurrences of methods that are supported by commercial robotic labs. Using two different concept recognition tools, we find that 86%-89% of the papers have at least one of these methods. This and our other results provide indications that robotic labs can serve as the foundation for performing many lab-based experiments.
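A minimal sketch of the counting step described, using naive string matching over a hypothetical method list; the study itself used two dedicated concept recognition tools rather than substring search.

```python
# Hypothetical list of methods supported by commercial robotic labs.
robot_methods = {"pcr", "elisa", "western blot", "cell culture"}

def mentions_supported_method(paper_text: str) -> bool:
    """Flag a paper that mentions at least one robot-supported method."""
    text = paper_text.lower()
    return any(m in text for m in robot_methods)

papers = [
    "We performed qPCR and Western blot analysis of the lysates...",
    "A purely computational study of protein folding dynamics.",
]
coverage = sum(mentions_supported_method(p) for p in papers) / len(papers)
print(f"{coverage:.0%} of papers mention a supported method")
```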

10.
PeerJ ; 4: e2331, 2016.
Article in English | MEDLINE | ID: mdl-27602295

ABSTRACT

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
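A minimal rdflib sketch of a dataset description that reuses existing vocabularies (DCAT and Dublin Core terms) in the spirit of the guideline; the dataset IRI and values are placeholders, and the chosen properties are illustrative rather than the guideline's full element set.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
ds = URIRef("http://example.org/dataset/clinical-trials-v2")  # placeholder IRI

g = Graph()
g.bind("dcat", DCAT)
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Clinical trials dataset")))        # description
g.add((ds, DCTERMS.creator, URIRef("https://orcid.org/0000-0000-0000-0000")))  # attribution
g.add((ds, DCTERMS.hasVersion, Literal("2.0")))                       # versioning

print(g.serialize(format="turtle"))
```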

11.
Sci Data ; 3: 160018, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26978244

ABSTRACT

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and scholarly publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them and some exemplar implementations in the community.


Subjects
Data Collection; Data Curation; Research Design; Database Management Systems; Guidelines as Topic; Reproducibility of Results
12.
Drug Discov Today ; 20(4): 399-405, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25463038

ABSTRACT

Modern data-driven drug discovery requires integrated resources to support decision-making and enable new discoveries. The Open PHACTS Discovery Platform (http://dev.openphacts.org) was built to address this requirement by focusing on drug discovery questions that are of high priority to the pharmaceutical industry. Although complex, most of these frequently asked questions (FAQs) revolve around the combination of data concerning compounds, targets, pathways and diseases. Computational drug discovery using workflow tools and the integrated resources of Open PHACTS can deliver answers to most of these questions. Here, we report on a selection of workflows used for solving these use cases and discuss some of the research challenges. The workflows are accessible online from myExperiment (http://www.myexperiment.org) and are available for reuse by the scientific community.
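A minimal sketch of the kind of workflow step described: a SPARQL query linking compounds to targets over an integrated RDF store, issued with SPARQLWrapper. The endpoint URL, prefixes, and predicates are hypothetical, not the actual Open PHACTS API.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")  # hypothetical endpoint
sparql.setQuery("""
    PREFIX ex: <http://example.org/schema#>
    SELECT ?compound ?target WHERE {
        ?compound ex:interactsWith ?target .
        ?target   ex:involvedInPathway ex:apoptosis .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["compound"]["value"], "->", row["target"]["value"])
```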


Assuntos
Biologia Computacional , Bases de Dados de Compostos Químicos , Bases de Dados de Produtos Farmacêuticos , Técnicas de Apoio para a Decisão , Descoberta de Drogas/métodos , Preparações Farmacêuticas/química , Fluxo de Trabalho , Acesso à Informação , Mineração de Dados , Humanos , Estrutura Molecular , Transdução de Sinais/efeitos dos fármacos , Relação Estrutura-Atividade , Integração de Sistemas
15.
Drug Discov Today ; 17(21-22): 1188-98, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22683805

ABSTRACT

Open PHACTS is a public-private partnership between academia, publishers, small and medium-sized enterprises, and pharmaceutical companies. The goal of the project is to deliver and sustain an 'open pharmacological space' (OPS) using and enhancing state-of-the-art semantic web standards and technologies. It is focused on practical and robust applications to solve specific questions in drug discovery research. OPS is intended to facilitate improvements in drug discovery in academia and industry and to support both open innovation and in-house, non-public drug discovery research. This paper lays out the challenges involved and how the Open PHACTS project aims to address them, both technically and socially.


Subjects
Drug Discovery/organization & administration; Drug Industry/organization & administration; Public-Private Sector Partnerships/organization & administration; Drug Design; Humans; Information Storage and Retrieval/methods; Internet; Organizational Innovation; Research/organization & administration; Semantics
16.
Nat Genet ; 43(4): 281-3, 2011 Mar 29.
Article in English | MEDLINE | ID: mdl-21445068

ABSTRACT

Data citation and the derivation of semantic constructs directly from datasets have now both found their place in scientific communication. The social challenge facing us is to maintain the value of traditional narrative publications and their relationship to the datasets they report upon while at the same time developing appropriate metrics for citation of data and data constructs.


Subjects
Databases, Genetic; Communication; Genetic Variation; Humans; Knowledge Bases; Publishing