Pesquisa | BVS Integralidade em Saúde

1.

BioSimulators: a central registry of simulation engines and services for recommending specific tools.

Shaikh, Bilal; Smith, Lucian P; Vasilescu, Dan; Marupilla, Gnaneswara; Wilson, Michael; Agmon, Eran; Agnew, Henry; Andrews, Steven S; Anwar, Azraf; Beber, Moritz E; Bergmann, Frank T; Brooks, David; Brusch, Lutz; Calzone, Laurence; Choi, Kiri; Cooper, Joshua; Detloff, John; Drawert, Brian; Dumontier, Michel; Ermentrout, G Bard; Faeder, James R; Freiburger, Andrew P; Fröhlich, Fabian; Funahashi, Akira; Garny, Alan; Gennari, John H; Gleeson, Padraig; Goelzer, Anne; Haiman, Zachary; Hasenauer, Jan; Hellerstein, Joseph L; Hermjakob, Henning; Hoops, Stefan; Ison, Jon C; Jahn, Diego; Jakubowski, Henry V; Jordan, Ryann; Kalas, Matús; König, Matthias; Liebermeister, Wolfram; Sheriff, Rahuman S Malik; Mandal, Synchon; McDougal, Robert; Medley, J Kyle; Mendes, Pedro; Müller, Robert; Myers, Chris J; Naldi, Aurelien; Nguyen, Tung V N; Nickerson, David P.

Nucleic Acids Res ; 50(W1): W108-W114, 2022 07 05.

Artigo em Inglês | MEDLINE | ID: mdl-35524558

RESUMO

Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.

Assuntos

Simulação por Computador , Software , Humanos , Bioengenharia , Modelos Biológicos , Sistema de Registros , Pesquisadores

2.

Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy.

Sun, Chang; van Soest, Johan; Dumontier, Michel.

J Biomed Inform ; 143: 104404, 2023 07.

Artigo em Inglês | MEDLINE | ID: mdl-37268168

RESUMO

A large amount of personal health data that is highly valuable to the scientific community is still not accessible or requires a lengthy request process due to privacy concerns and legal restrictions. As a solution, synthetic data has been studied and proposed to be a promising alternative to this issue. However, generating realistic and privacy-preserving synthetic personal health data retains challenges such as simulating the characteristics of the patients' data that are in the minority classes, capturing the relations among variables in imbalanced data and transferring them to the synthetic data, and preserving individual patients' privacy. In this paper, we propose a differentially private conditional Generative Adversarial Network model (DP-CGANS) consisting of data transformation, sampling, conditioning, and network training to generate realistic and privacy-preserving personal data. Our model distinguishes categorical and continuous variables and transforms them into latent space separately for better training performance. We tackle the unique challenges of generating synthetic patient data due to the special data characteristics of personal health data. For example, patients with a certain disease are typically the minority in the dataset and the relations among variables are crucial to be observed. Our model is structured with a conditional vector as an additional input to present the minority class in the imbalanced data and maximally capture the dependency between variables. Moreover, we inject statistical noise into the gradients in the networking training process of DP-CGANS to provide a differential privacy guarantee. We extensively evaluate our model with state-of-the-art generative models on personal socio-economic datasets and real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement. We demonstrate that our model outperforms other comparable models, especially in capturing the dependence between variables. Finally, we present the balance between data utility and privacy in synthetic data generation considering the different data structures and characteristics of real-world personal health data such as imbalanced classes, abnormal distributions, and data sparsity.

Assuntos

Aprendizado de Máquina , Privacidade , Humanos , Grupos Minoritários

3.

Studying the association of diabetes and healthcare cost on distributed data from the Maastricht Study and Statistics Netherlands using a privacy-preserving federated learning infrastructure.

Sun, Chang; van Soest, Johan; Koster, Annemarie; Eussen, Simone J P M; Schram, Miranda T; Stehouwer, Coen D A; Dagnelie, Pieter C; Dumontier, Michel.

J Biomed Inform ; 134: 104194, 2022 10.

Artigo em Inglês | MEDLINE | ID: mdl-36064113

RESUMO

The mining of personal data collected by multiple organizations remains challenging in the presence of technical barriers, privacy concerns, and legal and/or organizational restrictions. While a number of privacy-preserving and data mining frameworks have recently emerged, much remains to show their practical utility. In this study, we implement and utilize a secure infrastructure using data from Statistics Netherlands and the Maastricht Study to learn the association between Type 2 Diabetes Mellitus (T2DM) and healthcare expenses considering the impact of lifestyle, physical activities, and complications of T2DM. Through experiments using real-world distributed personal data, we present the feasibility and effectiveness of the secure infrastructure for practical use cases of linking and analyzing vertically partitioned data across multiple organizations. We discovered that individuals diagnosed with T2DM had significantly higher expenses than those with prediabetes, while participants with prediabetes spent more than those without T2DM in all the included healthcare categories to different degrees. We further discuss a joint effort from technical, ethical-legal, and domain-specific experts that is highly valued for applying such a secure infrastructure to real-life use cases to protect data privacy.

Assuntos

Diabetes Mellitus Tipo 2 , Estado Pré-Diabético , Diabetes Mellitus Tipo 2/terapia , Custos de Cuidados de Saúde , Humanos , Países Baixos , Privacidade

4.

Ten simple rules for making training materials FAIR.

Garcia, Leyla; Batut, Bérénice; Burke, Melissa L; Kuzak, Mateusz; Psomopoulos, Fotis; Arcila, Ricardo; Attwood, Teresa K; Beard, Niall; Carvalho-Silva, Denise; Dimopoulos, Alexandros C; Del Angel, Victoria Dominguez; Dumontier, Michel; Gurwitz, Kim T; Krause, Roland; McQuilton, Peter; Le Pera, Loredana; Morgan, Sarah L; Rauste, Päivi; Via, Allegra; Kahlem, Pascal; Rustici, Gabriella; van Gelder, Celia W G; Palagi, Patricia M.

PLoS Comput Biol ; 16(5): e1007854, 2020 05.

Artigo em Inglês | MEDLINE | ID: mdl-32437350

RESUMO

Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.

Assuntos

Instrução por Computador/normas , Guias como Assunto , Biologia/educação , Biologia Computacional , Humanos , Armazenamento e Recuperação da Informação

5.

Relation extraction from DailyMed structured product labels by optimally combining crowd, experts and machines.

Shingjergji, Krist; Celebi, Remzi; Scholtes, Jan; Dumontier, Michel.

J Biomed Inform ; 122: 103902, 2021 10.

Artigo em Inglês | MEDLINE | ID: mdl-34481057

RESUMO

The effectiveness of machine learning models to provide accurate and consistent results in drug discovery and clinical decision support is strongly dependent on the quality of the data used. However, substantive amounts of open data that drive drug discovery suffer from a number of issues including inconsistent representation, inaccurate reporting, and incomplete context. For example, databases of FDA-approved drug indications used in computational drug repositioning studies do not distinguish between treatments that simply offer symptomatic relief from those that target the underlying pathology. Moreover, drug indication sources often lack proper provenance and have little overlap. Consequently, new predictions can be of poor quality as they offer little in the way of new insights. Hence, work remains to be done to establish higher quality databases of drug indications that are suitable for use in drug discovery and repositioning studies. Here, we report on the combination of weak supervision (i.e., programmatic labeling and crowdsourcing) and deep learning methods for relation extraction from DailyMed text to create a higher quality drug-disease relation dataset. The generated drug-disease relation data shows a high overlap with DrugCentral, a manually curated dataset. Using this dataset, we constructed a machine learning model to classify relations between drugs and diseases from text into four categories; treatment, symptomatic relief, contradiction, and effect, exhibiting an improvement of 15.5% with Bi-LSTM (F1 score of 71.8%) over the best performing discrete method. Access to high quality data is crucial to building accurate and reliable drug repurposing prediction models. Our work suggests how the combination of crowds, experts, and machine learning methods can go hand-in-hand to improve datasets and predictive models.

Assuntos

Crowdsourcing , Aprendizado de Máquina , Reposicionamento de Medicamentos

6.

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data.

McMurry, Julie A; Juty, Nick; Blomberg, Niklas; Burdett, Tony; Conlin, Tom; Conte, Nathalie; Courtot, Mélanie; Deck, John; Dumontier, Michel; Fellows, Donal K; Gonzalez-Beltran, Alejandra; Gormanns, Philipp; Grethe, Jeffrey; Hastings, Janna; Hériché, Jean-Karim; Hermjakob, Henning; Ison, Jon C; Jimenez, Rafael C; Jupp, Simon; Kunze, John; Laibe, Camille; Le Novère, Nicolas; Malone, James; Martin, Maria Jesus; McEntyre, Johanna R; Morris, Chris; Muilu, Juha; Müller, Wolfgang; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Sariyar, Murat; Snoep, Jacky L; Soiland-Reyes, Stian; Stanford, Natalie J; Swainston, Neil; Washington, Nicole; Williams, Alan R; Wimalaratne, Sarala M; Winfree, Lilly M; Wolstencroft, Katherine; Goble, Carole; Mungall, Christopher J; Haendel, Melissa A; Parkinson, Helen.

PLoS Biol ; 15(6): e2001414, 2017 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-28662064

RESUMO

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

Assuntos

Disciplinas das Ciências Biológicas/métodos , Biologia Computacional/métodos , Mineração de Dados/métodos , Design de Software , Software , Disciplinas das Ciências Biológicas/estatística & dados numéricos , Disciplinas das Ciências Biológicas/tendências , Biologia Computacional/tendências , Mineração de Dados/estatística & dados numéricos , Mineração de Dados/tendências , Bases de Dados Factuais/estatística & dados numéricos , Bases de Dados Factuais/tendências , Previsões , Humanos , Internet

7.

Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction in realistic settings.

Celebi, Remzi; Uyar, Huseyin; Yasar, Erkan; Gumus, Ozgur; Dikenelli, Oguz; Dumontier, Michel.

BMC Bioinformatics ; 20(1): 726, 2019 Dec 18.

Artigo em Inglês | MEDLINE | ID: mdl-31852427

RESUMO

BACKGROUND: Current approaches to identifying drug-drug interactions (DDIs), include safety studies during drug development and post-marketing surveillance after approval, offer important opportunities to identify potential safety issues, but are unable to provide complete set of all possible DDIs. Thus, the drug discovery researchers and healthcare professionals might not be fully aware of potentially dangerous DDIs. Predicting potential drug-drug interaction helps reduce unanticipated drug interactions and drug development costs and optimizes the drug design process. Methods for prediction of DDIs have the tendency to report high accuracy but still have little impact on translational research due to systematic biases induced by networked/paired data. In this work, we aimed to present realistic evaluation settings to predict DDIs using knowledge graph embeddings. We propose a simple disjoint cross-validation scheme to evaluate drug-drug interaction predictions for the scenarios where the drugs have no known DDIs. RESULTS: We designed different evaluation settings to accurately assess the performance for predicting DDIs. The settings for disjoint cross-validation produced lower performance scores, as expected, but still were good at predicting the drug interactions. We have applied Logistic Regression, Naive Bayes and Random Forest on DrugBank knowledge graph with the 10-fold traditional cross validation using RDF2Vec, TransE and TransD. RDF2Vec with Skip-Gram generally surpasses other embedding methods. We also tested RDF2Vec on various drug knowledge graphs such as DrugBank, PharmGKB and KEGG to predict unknown drug-drug interactions. The performance was not enhanced significantly when an integrated knowledge graph including these three datasets was used. CONCLUSION: We showed that the knowledge embeddings are powerful predictors and comparable to current state-of-the-art methods for inferring new DDIs. We addressed the evaluation biases by introducing drug-wise and pairwise disjoint test classes. Although the performance scores for drug-wise and pairwise disjoint seem to be low, the results can be considered to be realistic in predicting the interactions for drugs with limited interaction information.

Assuntos

Interações Medicamentosas , Teorema de Bayes , Conhecimento , Modelos Logísticos , Reconhecimento Automatizado de Padrão

8.

Drug-drug interaction extraction via hierarchical RNNs on sequence and shortest dependency paths.

Zhang, Yijia; Zheng, Wei; Lin, Hongfei; Wang, Jian; Yang, Zhihao; Dumontier, Michel.

Bioinformatics ; 34(5): 828-835, 2018 03 01.

Artigo em Inglês | MEDLINE | ID: mdl-29077847

RESUMO

Motivation: Adverse events resulting from drug-drug interactions (DDI) pose a serious health issue. The ability to automatically extract DDIs described in the biomedical literature could further efforts for ongoing pharmacovigilance. Most of neural networks-based methods typically focus on sentence sequence to identify these DDIs, however the shortest dependency path (SDP) between the two entities contains valuable syntactic and semantic information. Effectively exploiting such information may improve DDI extraction. Results: In this article, we present a hierarchical recurrent neural networks (RNNs)-based method to integrate the SDP and sentence sequence for DDI extraction task. Firstly, the sentence sequence is divided into three subsequences. Then, the bottom RNNs model is employed to learn the feature representation of the subsequences and SDP, and the top RNNs model is employed to learn the feature representation of both sentence sequence and SDP. Furthermore, we introduce the embedding attention mechanism to identify and enhance keywords for the DDI extraction task. We evaluate our approach using the DDI extraction 2013 corpus. Our method is competitive or superior in performance as compared with other state-of-the-art methods. Experimental results show that the sentence sequence and SDP are complementary to each other. Integrating the sentence sequence with SDP can effectively improve the DDI extraction performance. Availability and implementation: The experimental data is available at https://github.com/zhangyijia1979/hierarchical-RNNs-model-for-DDI-extraction. Contact: zhyj@dlut.edu.cn or michel.dumontier@maastrichtuniversity.nl. Supplementary information: Supplementary data are available at Bioinformatics online.

Assuntos

Mineração de Dados/métodos , Interações Medicamentosas , Redes Neurais de Computação , Farmacovigilância , Humanos , Publicações

9.

The digital revolution in phenotyping.

Oellrich, Anika; Collier, Nigel; Groza, Tudor; Rebholz-Schuhmann, Dietrich; Shah, Nigam; Bodenreider, Olivier; Boland, Mary Regina; Georgiev, Ivo; Liu, Hongfang; Livingston, Kevin; Luna, Augustin; Mallon, Ann-Marie; Manda, Prashanti; Robinson, Peter N; Rustici, Gabriella; Simon, Michelle; Wang, Liqin; Winnenburg, Rainer; Dumontier, Michel.

Brief Bioinform ; 17(5): 819-30, 2016 09.

Artigo em Inglês | MEDLINE | ID: mdl-26420780

RESUMO

Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and the data storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state-of-the-art in digital phenotyping, by means of representing, acquiring and analyzing phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.

Assuntos

Fenótipo , Humanos , Armazenamento e Recuperação da Informação , Projetos de Pesquisa , Pesquisa Translacional Biomédica

10.

Finding our way through phenotypes.

Deans, Andrew R; Lewis, Suzanna E; Huala, Eva; Anzaldo, Salvatore S; Ashburner, Michael; Balhoff, James P; Blackburn, David C; Blake, Judith A; Burleigh, J Gordon; Chanet, Bruno; Cooper, Laurel D; Courtot, Mélanie; Csösz, Sándor; Cui, Hong; Dahdul, Wasila; Das, Sandip; Dececchi, T Alexander; Dettai, Agnes; Diogo, Rui; Druzinsky, Robert E; Dumontier, Michel; Franz, Nico M; Friedrich, Frank; Gkoutos, George V; Haendel, Melissa; Harmon, Luke J; Hayamizu, Terry F; He, Yongqun; Hines, Heather M; Ibrahim, Nizar; Jackson, Laura M; Jaiswal, Pankaj; James-Zorn, Christina; Köhler, Sebastian; Lecointre, Guillaume; Lapp, Hilmar; Lawrence, Carolyn J; Le Novère, Nicolas; Lundberg, John G; Macklin, James; Mast, Austin R; Midford, Peter E; Mikó, István; Mungall, Christopher J; Oellrich, Anika; Osumi-Sutherland, David; Parkinson, Helen; Ramírez, Martín J; Richter, Stefan; Robinson, Peter N.

PLoS Biol ; 13(1): e1002033, 2015 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-25562316

RESUMO

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.

Assuntos

Estudos de Associação Genética , Animais , Biologia Computacional , Curadoria de Dados , Bases de Dados Factuais/normas , Interação Gene-Ambiente , Genômica , Humanos , Fenótipo , Padrões de Referência , Reprodutibilidade dos Testes , Terminologia como Assunto

11.

Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.

Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel.

BMC Bioinformatics ; 18(1): 415, 2017 Sep 18.

Artigo em Inglês | MEDLINE | ID: mdl-28923003

RESUMO

BACKGROUND: The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some such as the Gene Expression Omnibus (GEO) allows users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, consequently, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to hone in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. METHODS: In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. RESULTS: Our agglomerative cluster algorithm identified metadata keys that were similar, based on (i) name, (ii) core concept and (iii) value similarities, to each other and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). In the 18 clusters, there were keys that were identified correctly to be related to that cluster, but there were 13 keys which were not related to that cluster. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0.63). CONCLUSION: Our algorithm identified keys that were similar to each other and grouped them together. Our intuition that underpins cleaning by clustering is that, dividing keys into different clusters resolves the scalability issues for data observation and cleaning, and keys in the same cluster with duplicates and errors can easily be found. Our algorithm can also be applied to other biomedical data types.

Assuntos

Algoritmos , Metadados/normas , Análise por Conglomerados , Confiabilidade dos Dados

12.

Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier.

J Biomed Inform ; 72: 132-139, 2017 08.

Artigo em Inglês | MEDLINE | ID: mdl-28625880

RESUMO

A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithm increases due of the decreasing of dimensionality of the unique values of these elements (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse.

Assuntos

Algoritmos , Bases de Dados Genéticas , Expressão Gênica , Metadados , Humanos , Estudos Prospectivos , Estudos Retrospectivos

13.

Developing a framework for digital objects in the Big Data to Knowledge (BD2K) commons: Report from the Commons Framework Pilots workshop.

Jagodnik, Kathleen M; Koplev, Simon; Jenkins, Sherry L; Ohno-Machado, Lucila; Paten, Benedict; Schurer, Stephan C; Dumontier, Michel; Verborgh, Ruben; Bui, Alex; Ping, Peipei; McKenna, Neil J; Madduri, Ravi; Pillai, Ajay; Ma'ayan, Avi.

J Biomed Inform ; 71: 49-57, 2017 07.

Artigo em Inglês | MEDLINE | ID: mdl-28501646

RESUMO

The volume and diversity of data in biomedical research have been rapidly increasing in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges including: the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.

Assuntos

Conjuntos de Dados como Assunto , Disseminação de Informação , National Institutes of Health (U.S.) , Pesquisa Biomédica , Humanos , Conhecimento , Pesquisa Translacional Biomédica , Estados Unidos

14.

SPARQL-enabled identifier conversion with Identifiers.org.

Wimalaratne, Sarala M; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille.

Bioinformatics ; 31(11): 1875-7, 2015 Jun 01.

Artigo em Inglês | MEDLINE | ID: mdl-25638809

RESUMO

MOTIVATION: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. RESULTS: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. AVAILABILITY AND IMPLEMENTATION: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql.

Assuntos

Bases de Dados Factuais , Disciplinas das Ciências Biológicas , Internet , Semântica , Integração de Sistemas

15.

Finding the Evidence Base Using Citation Networks: Do 300 to 400 US Physicians Die by Suicide Annually?

Leung, Tiffany I; Pendharkar, Sima; Chen, Chwen-Yuen Angie; Dumontier, Michel.

J Gen Intern Med ; 36(4): 1129-1131, 2021 04.

Artigo em Inglês | MEDLINE | ID: mdl-32462565

Assuntos

Médicos , Prevenção do Suicídio , Humanos

16.

Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology.

Mortensen, Jonathan M; Telis, Natalie; Hughey, Jacob J; Fan-Minogue, Hua; Van Auken, Kimberly; Dumontier, Michel; Musen, Mark A.

J Biomed Inform ; 60: 199-209, 2016 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-26873781

RESUMO

Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement.

Assuntos

Crowdsourcing/métodos , Ontologia Genética , Systematized Nomenclature of Medicine , Algoritmos , Análise de Variância , Área Sob a Curva , Biologia Computacional/métodos , Humanos , Internet , Ferramenta de Busca , Software , Análise e Desempenho de Tarefas

17.

An evidence-based approach to identify aging-related genes in Caenorhabditis elegans.

Callahan, Alison; Cifuentes, Juan José; Dumontier, Michel.

BMC Bioinformatics ; 16: 40, 2015 Feb 07.

Artigo em Inglês | MEDLINE | ID: mdl-25888240

RESUMO

BACKGROUND: Extensive studies have been carried out on Caenorhabditis elegans as a model organism to elucidate mechanisms of aging and the effects of perturbing known aging-related genes on lifespan and behavior. This research has generated large amounts of experimental data that is increasingly difficult to integrate and analyze with existing databases and domain knowledge. To address this challenge, we demonstrate a scalable and effective approach for automatic evidence gathering and evaluation that leverages existing experimental data and literature-curated facts to identify genes involved in aging and lifespan regulation in C. elegans. RESULTS: We developed a semantic knowledge base for aging by integrating data about C. elegans genes from WormBase with data about 2005 human and model organism genes from GenAge and 149 genes from GenDR, and with the Bio2RDF network of linked data for the life sciences. Using HyQue (a Semantic Web tool for hypothesis-based querying and evaluation) to interrogate this knowledge base, we examined 48,231 C. elegans genes for their role in modulating lifespan and aging. HyQue identified 24 novel but well-supported candidate aging-related genes for further experimental validation. CONCLUSIONS: We use semantic technologies to discover candidate aging genes whose effects on lifespan are not yet well understood. Our customized HyQue system, the aging research knowledge base it operates over, and HyQue evaluations of all C. elegans genes are freely available at http://hyque.semanticscience.org .

Assuntos

Envelhecimento/genética , Proteínas de Caenorhabditis elegans/genética , Caenorhabditis elegans/genética , Biologia Computacional/métodos , Bases de Dados Factuais , Software , Animais , Perfilação da Expressão Gênica , Ontologia Genética , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Armazenamento e Recuperação da Informação

18.

Analysis of In Vitro Aptamer Selection Parameters.

McKeague, Maureen; McConnell, Erin M; Cruz-Toledo, Jose; Bernard, Elyse D; Pach, Amanda; Mastronardi, Emily; Zhang, Xueru; Beking, Michael; Francis, Tariq; Giamberardino, Amanda; Cabecinha, Ashley; Ruscito, Annamaria; Aranda-Rodriguez, Rocio; Dumontier, Michel; DeRosa, Maria C.

J Mol Evol ; 81(5-6): 150-61, 2015 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-26530075

RESUMO

Nucleic acid aptamers are novel molecular recognition tools that offer many advantages compared to their antibody and peptide-based counterparts. However, challenges associated with in vitro selection, characterization, and validation have limited their wide-spread use in the fields of diagnostics and therapeutics. Here, we extracted detailed information about aptamer selection experiments housed in the Aptamer Base, spanning over two decades, to perform the first parameter analysis of conditions used to identify and isolate aptamers de novo. We used information from 492 published SELEX experiments and studied the relationships between the nucleic acid library, target choice, selection methods, experimental conditions, and the affinity of the resulting aptamer candidates. Our findings highlight that the choice of target and selection template made the largest and most significant impact on the success of a de novo aptamer selection. Our results further emphasize the need for improved documentation and more thorough experimentation of SELEX criteria to determine their correlation with SELEX success.

Assuntos

Aptâmeros de Nucleotídeos , Técnica de Seleção de Aptâmeros/métodos

19.

Evaluation of research in biomedical ontologies.

Hoehndorf, Robert; Dumontier, Michel; Gkoutos, Georgios V.

Brief Bioinform ; 14(6): 696-712, 2013 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-22962340

RESUMO

Ontologies are now pervasive in biomedicine, where they serve as a means to standardize terminology, to enable access to domain knowledge, to verify data consistency and to facilitate integrative analyses over heterogeneous biomedical data. For this purpose, research on biomedical ontologies applies theories and methods from diverse disciplines such as information management, knowledge representation, cognitive science, linguistics and philosophy. Depending on the desired applications in which ontologies are being applied, the evaluation of research in biomedical ontologies must follow different strategies. Here, we provide a classification of research problems in which ontologies are being applied, focusing on the use of ontologies in basic and translational research, and we demonstrate how research results in biomedical ontologies can be evaluated. The evaluation strategies depend on the desired application and measure the success of using an ontology for a particular biomedical problem. For many applications, the success can be quantified, thereby facilitating the objective evaluation and comparison of research in biomedical ontology. The objective, quantifiable comparison of research results based on scientific applications opens up the possibility for systematically improving the utility of ontologies in biomedical research.

Assuntos

Ontologias Biológicas , Pesquisa Biomédica

20.

Mouse model phenotypes provide information about human drug targets.

Hoehndorf, Robert; Hiebert, Tanya; Hardy, Nigel W; Schofield, Paul N; Gkoutos, Georgios V; Dumontier, Michel.

Bioinformatics ; 30(5): 719-25, 2014 Mar 01.

Artigo em Inglês | MEDLINE | ID: mdl-24158600

RESUMO

MOTIVATION: Methods for computational drug target identification use information from diverse information sources to predict or prioritize drug targets for known drugs. One set of resources that has been relatively neglected for drug repurposing is animal model phenotype. RESULTS: We investigate the use of mouse model phenotypes for drug target identification. To achieve this goal, we first integrate mouse model phenotypes and drug effects, and then systematically compare the phenotypic similarity between mouse models and drug effect profiles. We find a high similarity between phenotypes resulting from loss-of-function mutations and drug effects resulting from the inhibition of a protein through a drug action, and demonstrate how this approach can be used to suggest candidate drug targets. AVAILABILITY AND IMPLEMENTATION: Analysis code and supplementary data files are available on the project Web site at https://drugeffects.googlecode.com.

Assuntos

Reposicionamento de Medicamentos/métodos , Fenótipo , Proteínas/antagonistas & inibidores , Animais , Inibidores de Ciclo-Oxigenase 2/farmacologia , Diclofenaco/farmacologia , Humanos , Camundongos , Camundongos Knockout , Modelos Animais , Proteínas/classificação , Proteínas/efeitos dos fármacos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa