1.
BMC Bioinformatics ; 25(1): 213, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872097

ABSTRACT

BACKGROUND: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale. RESULTS: This paper presents Dyport, a novel benchmarking framework for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This assesses not only the accuracy of hypotheses but also their potential impact in biomedical research, which significantly extends traditional link prediction benchmarks. The applicability of our benchmarking process is demonstrated on several link prediction systems applied to biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. CONCLUSIONS: Dyport is an open-source benchmarking framework designed for the evaluation of biomedical hypothesis generation systems, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport .


Subjects
Benchmarking , Benchmarking/methods , Algorithms , Biomedical Research/methods , Software , Machine Learning , Databases, Factual , Computational Biology/methods , Semantics
2.
BMC Med Inform Decis Mak ; 24(Suppl 2): 114, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689287

ABSTRACT

BACKGROUND: Traditional literature based discovery is based on connecting knowledge pairs extracted from separate publications via a common mid point to derive previously unseen knowledge pairs. To avoid the over generation often associated with this approach, we explore an alternative method based on word evolution. Word evolution examines the changing contexts of a word to identify changes in its meaning or associations. We investigate the possibility of using changing word contexts to detect drugs suitable for repurposing. RESULTS: Word embeddings, which represent a word's context, are constructed from chronologically ordered publications in MEDLINE at bi-monthly intervals, yielding a time series of word embeddings for each word. Focusing on clinical drugs only, any drugs repurposed in the final time segment of the time series are annotated as positive examples. The decision regarding the drug's repurposing is based either on the Unified Medical Language System (UMLS), or semantic triples extracted using SemRep from MEDLINE. CONCLUSIONS: The annotated data allows deep learning classification, with a 5-fold cross validation, to be performed and multiple architectures to be explored. Performance of 65% using UMLS labels, and 81% using SemRep labels is attained, indicating the technique's suitability for the detection of candidate drugs for repurposing. The investigation also shows that different architectures are linked to the quantities of training data available and therefore that different models should be trained for every annotation approach.


Subjects
Drug Repositioning , Humans , Unified Medical Language System , MEDLINE , Deep Learning , Natural Language Processing , Semantics
3.
BMC Bioinformatics ; 23(Suppl 9): 570, 2023 Mar 14.
Article in English | MEDLINE | ID: mdl-36918777

ABSTRACT

BACKGROUND: Automatic literature based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of [Formula: see text] and [Formula: see text] relations can be simply connected to deduce [Formula: see text]. However, using this approach, the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject[Formula: see text](predicate)[Formula: see text]object triples as the [Formula: see text] relations, but too many proposed connections remain for manual verification. RESULTS: Based on the hypothesis that only a small number of subject-predicate-object triples extracted from a publication represent the paper's novel contribution(s), we explore using BERT embeddings to identify these before literature based discovery is performed utilizing only these important triples. While the method exploits the availability of full texts of publications in the CORD-19 dataset (making use of the fact that a novel contribution is likely to be mentioned in both an abstract and the body of a paper) to build a training set, the resulting tool can be applied to papers with only abstracts available. Candidate hidden knowledge pairs generated from unfiltered triples and those built from important triples only are compared using a variety of timeslicing gold standards. CONCLUSIONS: The quantity of proposed knowledge pairs is reduced by a factor of [Formula: see text], and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases up to a factor of 10. We argue that the gold standard needs to be carefully considered, and release as yet undiscovered candidate knowledge pairs based on important triples alongside this work.


Subjects
Knowledge Discovery , Knowledge
4.
BMC Bioinformatics ; 24(1): 412, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37915001

ABSTRACT

BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. RESULTS: We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. 
Finally, we provide a simple and intuitive open-source web interface ( https://skim.morgridge.org ) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. CONCLUSIONS: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.


Subjects
Algorithms , Neoplasms , Humans , PubMed , Knowledge , Knowledge Discovery
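
The A-B-C pattern described in the abstract above can be illustrated with a minimal sketch. This is not SKiM's implementation (which, per the abstract, looks for statistically significant links); it only counts co-occurrences in a toy corpus. The function names `cooccurrence` and `abc_candidates` and the sample documents are hypothetical, loosely modelled on Swanson's classic fish oil and Raynaud disease example.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(docs):
    """Map each term to the set of terms it shares at least one document with."""
    neighbours = defaultdict(set)
    for terms in docs:
        for x, y in combinations(sorted(set(terms)), 2):
            neighbours[x].add(y)
            neighbours[y].add(x)
    return neighbours

def abc_candidates(docs, a_term):
    """Return {C: linking B terms} for C terms never seen directly with a_term."""
    neighbours = cooccurrence(docs)
    b_terms = set(neighbours.get(a_term, set()))
    candidates = defaultdict(set)
    for b in b_terms:
        for c in neighbours[b]:
            # Keep only indirect links: C must not already co-occur with A.
            if c != a_term and c not in b_terms:
                candidates[c].add(b)
    return dict(candidates)

# Toy corpus (hypothetical): each document is a set of terms.
docs = [
    {"raynaud disease", "blood viscosity"},
    {"raynaud disease", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
]
result = abc_candidates(docs, "raynaud disease")
```

In this toy corpus, `result` links "fish oil" to the A term through two B terms and "aspirin" through one. A real system would still need to decide which of these links are significant rather than incidental.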
5.
J Biomed Inform ; 140: 104341, 2023 04.
Article in English | MEDLINE | ID: mdl-36933632

ABSTRACT

BACKGROUND: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS: We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS: The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. 
Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. CONCLUSION: NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.


Subjects
Biological Ontologies , Biological Products , Pattern Recognition, Automated , Drug Interactions , Semantics , Pharmaceutical Preparations
6.
J Biomed Inform ; 145: 104474, 2023 09.
Article in English | MEDLINE | ID: mdl-37572825

ABSTRACT

Inferring knowledge from known relationships between drugs, proteins, genes, and diseases has great potential for clinical impact, such as predicting which existing drugs could be repurposed to treat rare diseases. Incorporating key biological context such as cell type or tissue of action into representations of extracted biomedical knowledge is essential for principled pharmacological discovery. Existing global, literature-derived knowledge graphs of interactions between drugs, proteins, genes, and diseases lack this essential information. In this study, we frame the task of associating biological context with protein-protein interactions extracted from text as a classification task using syntactic, semantic, and novel meta-discourse features. We introduce the Insider corpora, which are automatically generated PubMed-scale corpora for training classifiers for the context association task. These corpora are created by searching for precise syntactic cues of cell type and tissue relevancy to extracted regulatory relations. We report F1 scores of 0.955 and 0.862 for identifying relevant cell types and tissues, respectively, for our identified relations. By classifying with this framework, we demonstrate that the problem of context association can be addressed using intuitive, interpretable features. We demonstrate the potential of this approach to enrich text-derived knowledge bases with biological detail by incorporating cell type context into a protein-protein network for dengue fever.


Subjects
Data Mining , Knowledge Bases , Humans , PubMed , Rare Diseases
7.
J Biomed Inform ; 145: 104464, 2023 09.
Article in English | MEDLINE | ID: mdl-37541406

ABSTRACT

OBJECTIVE: We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. METHODS: We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. RESULTS: We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. 
A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. CONCLUSION: Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. AVAILABILITY: Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.


Subjects
Alzheimer Disease , Knowledge Discovery , Humans , Knowledge Discovery/methods , Alzheimer Disease/diagnosis , Neural Networks, Computer , Learning , Phenotype
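
The time-sliced evaluation with a variable prediction window discussed in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: edges carry the year they were published, the function name `time_slice` is hypothetical, and the toy edge list uses placeholder node labels.

```python
def time_slice(edges, cutoff_year, window):
    """Split (u, v, year) edges into a training graph and future positives.

    Training edges first appear up to and including cutoff_year. Positives
    first appear within `window` years after the cutoff and connect nodes
    that are already present in the training graph.
    """
    first_seen = {}
    for u, v, year in edges:
        key = frozenset((u, v))  # undirected edge
        first_seen[key] = min(year, first_seen.get(key, year))
    train = {e for e, y in first_seen.items() if y <= cutoff_year}
    known_nodes = set().union(*train) if train else set()
    positives = {
        e for e, y in first_seen.items()
        if cutoff_year < y <= cutoff_year + window and e <= known_nodes
    }
    return train, positives

# Toy edge list (hypothetical): (node, node, year of first publication).
edges = [
    ("a", "b", 2000), ("b", "c", 2001), ("a", "d", 2002),
    ("a", "c", 2003), ("c", "d", 2010), ("b", "d", 2015),
]
train, short_window = time_slice(edges, cutoff_year=2002, window=1)
_, long_window = time_slice(edges, cutoff_year=2002, window=10)
```

With the same training graph, the one-year window yields one positive link and the ten-year window yields two. This is one way to see why results can differ so much when only the window length changes.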
8.
J Biomed Inform ; 142: 104383, 2023 06.
Article in English | MEDLINE | ID: mdl-37196989

ABSTRACT

OBJECTIVE: To demonstrate and develop an approach enabling individual researchers or small teams to create their own ad-hoc, lightweight knowledge bases tailored for specialized scientific interests, using text-mining over scientific literature, and demonstrate the effectiveness of these knowledge bases in hypothesis generation and literature-based discovery (LBD). METHODS: We propose a lightweight process using an extractive search framework to create ad-hoc knowledge bases, which require minimal training and no background in bio-curation or computer science. These knowledge bases are particularly effective for LBD and hypothesis generation using Swanson's ABC method. The personalized nature of the knowledge bases allows for a somewhat higher level of noise than "public facing" ones, as researchers are expected to have prior domain experience to separate signal from noise. Fact verification is shifted from exhaustive verification of the knowledge base to post-hoc verification of specific entries of interest, allowing researchers to assess the correctness of relevant knowledge base entries by considering the paragraphs in which the facts were introduced. RESULTS: We demonstrate the methodology by constructing several knowledge bases of different kinds: three knowledge bases that support lab-internal hypothesis generation: Drug Delivery to Ovarian Tumors (DDOT); Tissue Engineering and Regeneration; Challenges in Cancer Research; and an additional comprehensive, accurate knowledge base designated as a public resource for the wider community on the topic of Cell Specific Drug Delivery (CSDD). In each case, we show the design and construction process, along with relevant visualizations for data exploration, and hypothesis generation. For CSDD and DDOT we also show meta-analysis, human evaluation, and in vitro experimental evaluation. 
CONCLUSION: Our approach enables researchers to create personalized, lightweight knowledge bases for specialized scientific interests, effectively facilitating hypothesis generation and literature-based discovery (LBD). By shifting fact verification efforts to post-hoc verification of specific entries, researchers can focus on exploring and generating hypotheses based on their expertise. The constructed knowledge bases demonstrate the versatility and adaptability of our approach to diverse research interests. The web-based platform, available at https://spike-kbc.apps.allenai.org, provides researchers with a valuable tool for rapid construction of knowledge bases tailored to their needs.


Subjects
Data Mining , Knowledge Discovery , Humans , Data Mining/methods , Knowledge Discovery/methods , Publications
9.
J Biomed Inform ; 143: 104362, 2023 07.
Article in English | MEDLINE | ID: mdl-37146741

ABSTRACT

Scientific literature presents a wealth of information yet to be explored. As the number of researchers increases with each passing year and publications are released, this contributes to an era where specialized fields of research are becoming more prevalent. As this trend continues, this further propagates the separation of interdisciplinary publications and makes keeping up to date with literature a laborious task. Literature-based discovery (LBD) aims to mitigate these concerns by promoting information sharing among non-interacting literature while extracting potentially meaningful information. Furthermore, recent advances in neural network architectures and data representation techniques have fueled their respective research communities in achieving state-of-the-art performance in many downstream tasks. However, studies of neural network-based methods for LBD remain to be explored. We introduce and explore a deep learning neural network-based approach for LBD. Additionally, we investigate various approaches to represent terms as concepts and analyze the effect of feature scaling representations into our model. We compare the evaluation performance of our method on five hallmarks of cancer datasets utilized for closed discovery. Our results show the chosen representation as input into our model affects evaluation performance. We found feature scaling our input representations increases evaluation performance and decreases the necessary number of epochs needed to achieve model generalization. We also explore two approaches to represent model output. We found reducing the model's output to capturing a subset of concepts improved evaluation performance at the cost of model generalizability. We also compare the efficacy of our method on the five hallmarks of cancer datasets to a set of randomly chosen relations between concepts. We found these experiments confirm our method's suitability for LBD.


Subjects
Deep Learning , Neoplasms , Humans , Neural Networks, Computer , Knowledge Discovery/methods , Publications
10.
Synthese ; 201(1): 24, 2023.
Article in English | MEDLINE | ID: mdl-36643731

ABSTRACT

An important part of research is situating one's work in a body of existing literature, thereby connecting to existing ideas. Despite this, the various kinds of relationships that might exist among academic literature do not appear to have been formally studied. Here I present a graphical representation of academic work in terms of entities and relations, drawing on structure-mapping theory (used in the study of analogies). I then use this representation to present a typology of operations that could relate two pieces of academic work. I illustrate the various types of relationships with examples from medicine, physics, psychology, history and philosophy of science, machine learning, education, and neuroscience. The resulting typology not only gives insights into the relationships that might exist between static publications, but also the rich process whereby an ongoing research project evolves through interactions with the research literature.

11.
J Biomed Inform ; 115: 103696, 2021 03.
Article in English | MEDLINE | ID: mdl-33571675

ABSTRACT

OBJECTIVE: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. METHODS: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from PubMed and other COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative and accurate subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant. We used this subset to construct a knowledge graph, and applied five state-of-the-art, neural knowledge graph completion algorithms (i.e., TransE, RotatE, DistMult, ComplEx, and STELP) to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. RESULTS: The accuracy classifier based on PubMedBERT achieved the best performance (F1 = 0.854) in identifying accurate semantic predications. Among the five knowledge graph completion models, TransE outperformed the others (MR = 0.923, Hits@1 = 0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as others that have not yet been studied. Discovery patterns enabled identification of additional candidate drugs and generation of plausible hypotheses regarding the links between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (i.e., paclitaxel, SB 203580, alpha 2-antiplasmin, metoclopramide, and oxymatrine) and the mechanistic explanations for their potential use are further discussed. CONCLUSION: We showed that an LBD approach can be feasible not only for discovering drug candidates for COVID-19, but also for generating mechanistic explanations. 
Our approach can be generalized to other diseases as well as to other clinical questions. Source code and data are available at https://github.com/kilicogluh/lbd-covid.


Subjects
COVID-19 Drug Treatment , Drug Repositioning , Knowledge Discovery , Algorithms , Antiviral Agents/therapeutic use , COVID-19/virology , Humans , Neural Networks, Computer , SARS-CoV-2/isolation & purification
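
To make the knowledge graph completion step above more concrete, here is a minimal sketch of how a TransE-style model scores a (head, relation, tail) triple and how a ranking metric such as Hits@k is computed from the resulting ranks. It is not the authors' pipeline: the embeddings are hand-picked two-dimensional vectors rather than trained ones, and every entity and relation name is a hypothetical placeholder.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_of(true_tail, head, relation, entities, relations):
    """1-based rank of the true tail among all candidate tail entities."""
    h, r = entities[head], relations[relation]
    ordered = sorted(
        entities, key=lambda e: transe_score(h, r, entities[e]), reverse=True
    )
    return ordered.index(true_tail) + 1

def hits_at_k(ranks, k):
    """Fraction of test triples whose true tail is ranked in the top k."""
    return sum(rank <= k for rank in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    return sum(1 / rank for rank in ranks) / len(ranks)

# Hand-picked toy embeddings (assumption: a trained model would supply these).
entities = {"drug_x": (0.0, 0.0), "disease_y": (1.0, 0.0), "disease_z": (0.0, 2.0)}
relations = {"TREATS": (1.0, 0.0)}

ranks = [
    rank_of("disease_y", "drug_x", "TREATS", entities, relations),
    rank_of("disease_z", "drug_x", "TREATS", entities, relations),
]
hits1 = hits_at_k(ranks, 1)
mrr = mean_reciprocal_rank(ranks)
```

Here `drug_x + TREATS` lands exactly on `disease_y`, so that triple is ranked first, while `disease_z` is ranked last of the three candidates. In a time-sliced setup, the test triples would be those that appear only after the training cutoff.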
12.
BMC Bioinformatics ; 20(1): 425, 2019 Aug 15.
Article in English | MEDLINE | ID: mdl-31416434

ABSTRACT

BACKGROUND: Literature Based Discovery (LBD) produces more potential hypotheses than can be manually reviewed, making automatically ranking these hypotheses critical. In this paper, we introduce the indirect association measures of Linking Term Association (LTA), Minimum Weight Association (MWA), and Shared B to C Set Association (SBC), and compare them to Linking Set Association (LSA), concept embeddings vector cosine, Linking Term Count (LTC), and direct co-occurrence vector cosine. Our proposed indirect association measures extend traditional association measures to quantify indirect rather than direct associations while preserving valuable statistical properties. RESULTS: We perform a comparison between several different hypothesis ranking methods for LBD, and compare them against our proposed indirect association measures. We intrinsically evaluate each method's performance using its ability to estimate semantic relatedness on standard evaluation datasets. We extrinsically evaluate each method's ability to rank hypotheses in LBD using a time-slicing dataset based on co-occurrence information, and another time-slicing dataset based on SemRep extracted-relationships. Precision and recall curves are generated by ranking term pairs and applying a threshold at each rank. CONCLUSIONS: Results differ depending on the evaluation methods and datasets, but it is unclear if this is a result of biases in the evaluation datasets or if one method is truly better than another. We conclude that LTC and SBC are the best suited methods for hypothesis ranking in LBD, but there is value in having a variety of methods to choose from.


Subjects
Knowledge Discovery , Models, Theoretical , Area Under Curve , Databases as Topic , Humans , ROC Curve , Semantics , Statistics, Nonparametric
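
The ranking and evaluation procedure described in the abstract above can be sketched in a few lines. This assumes that Linking Term Count is the number of distinct B terms shared by an A term and a C term, which is a simplified reading of the measure rather than the paper's exact definition. The helper names and the toy neighbour sets are hypothetical.

```python
def linking_term_count(neighbours, a, c):
    """Number of distinct B terms that co-occur with both a and c."""
    return len(neighbours.get(a, set()) & neighbours.get(c, set()))

def precision_recall_at_ranks(ranked_pairs, gold):
    """(precision, recall) obtained by cutting the ranked list at each rank."""
    points, hits = [], 0
    for rank, pair in enumerate(ranked_pairs, start=1):
        hits += pair in gold
        points.append((hits / rank, hits / len(gold)))
    return points

# Toy co-occurrence neighbourhoods (hypothetical).
neighbours = {
    "a": {"b1", "b2", "b3"},
    "c1": {"b1", "b2"},
    "c2": {"b3"},
    "c3": {"b9"},
}
candidates = ["c1", "c2", "c3"]
ranked = sorted(
    candidates, key=lambda c: linking_term_count(neighbours, "a", c), reverse=True
)
ranked_pairs = [("a", c) for c in ranked]

# Gold standard: pairs that appear in the later time slice (hypothetical).
gold = {("a", "c1"), ("a", "c3")}
points = precision_recall_at_ranks(ranked_pairs, gold)
```

Each entry of `points` corresponds to one threshold position in the ranked list, which is how a precision and recall curve is traced out for a given ranking measure.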
13.
BMC Bioinformatics ; 20(Suppl 10): 251, 2019 May 29.
Article in English | MEDLINE | ID: mdl-31138105

ABSTRACT

BACKGROUND: The quantity of documents being published requires researchers to specialize to a narrower field, meaning that inferable connections between publications (particularly from different domains) can be missed. This has given rise to automatic literature based discovery (LBD). However, unless heavily filtered, LBD generates more potential new knowledge than can be manually verified and another form of selection is required before the results can be passed onto a user. Since a large proportion of the automatically generated hidden knowledge is valid but generally known, we investigate the hypothesis that non trivial, interesting, hidden knowledge can be treated as an anomaly and identified using anomaly detection approaches. RESULTS: Two experiments are conducted: (1) to avoid errors arising from incorrect extraction of relations, the hypothesis is validated using manually annotated relations appearing in a thesaurus, and (2) automatically extracted relations are used to investigate the hypothesis on publication abstracts. These allow an investigation of a potential upper bound and the detection of limitations yielded by automatic relation extraction. CONCLUSION: We apply one-class SVM and isolation forest anomaly detection algorithms to a set of hidden connections to rank connections by identifying outlying (interesting) ones and show that the approach increases the F1 measure by a factor of 10 while greatly reducing the quantity of hidden knowledge to manually verify. We also demonstrate the statistical significance of this result.


Subjects
Knowledge , Algorithms , Automation , Humans , Knowledge Discovery , Publications , Semantics
14.
J Biomed Inform ; 93: 103141, 2019 05.
Article in English | MEDLINE | ID: mdl-30857950

ABSTRACT

Literature Based Discovery (LBD) refers to the problem of inferring new and interesting knowledge by logically connecting independent fragments of information units through explicit or implicit means. This area of research, which incorporates techniques from Natural Language Processing (NLP), Information Retrieval and Artificial Intelligence, has significant potential to reduce discovery time in biomedical research fields. Formally introduced in 1986, LBD has grown to be a significant and a core task for text mining practitioners in the biomedical domain. Together with its inter-disciplinary nature, this has led researchers across domains to contribute in advancing this field of study. This survey attempts to consolidate and present the evolution of techniques in this area. We cover a variety of techniques and provide a detailed description of the problem setting, the intuition, the advantages and limitations of various influential papers. We also list the current bottlenecks in this field and provide a general direction of research activities for the future. In an effort to be comprehensive and for ease of reference for off-the-shelf users, we also list many publicly available tools for LBD. We hope this survey will act as a guide to both academic and industry (bio)-informaticians, introduce the various methodologies currently employed and also the challenges yet to be tackled.


Subjects
Knowledge Discovery , Natural Language Processing , Data Mining/methods , Humans , Surveys and Questionnaires
15.
BMC Med Inform Decis Mak ; 19(Suppl 2): 59, 2019 04 09.
Article in English | MEDLINE | ID: mdl-30961599

ABSTRACT

BACKGROUND: Drug development is an expensive and time-consuming process. Literature-based discovery has played a critical role in drug development and may be a supplementary method to help scientists speed up the discovery of drugs. METHODS: Here, we propose a relation path features embedding based convolutional neural network model with attention mechanism for drug discovery from literature, which we denote as PACNN. First, we use predications from biomedical abstracts to construct a biomedical knowledge graph, and then apply a path ranking algorithm to extract drug-disease relation path features on the biomedical knowledge graph. After that, we use these drug-disease relation features to train a convolutional neural network model combined with the attention mechanism. Finally, we employ the trained models to mine drugs for treating diseases. RESULTS: The experiment shows that the proposed model achieved promising results compared to several random walk algorithms. CONCLUSIONS: In this paper, we propose a relation path features embedding based convolutional neural network with attention mechanism for discovering potential drugs from literature. Our method could be an auxiliary method for drug discovery, which can speed up the discovery of new drugs for incurable diseases.


Subjects
Drug Discovery , Knowledge Bases , Neural Networks, Computer , Algorithms , Humans , Research Design
16.
Radiol Med ; 124(6): 495-504, 2019 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-30725395

RESUMO

INTRODUCTION: In the last decade, several journal's editors decided to publish alternative bibliometric indices parallel to the impact factor (IF): Scimago Journal Rank (SJR), Source Normalized Impact per Paper (SNIP), Eigenfactor Score (ES) and CiteScore™ (CiteScore); however, there is scarce information about the correlations among them. In this study, we aimed to evaluate the associations between this bibliometrics in the Radiology, Nuclear Medicine & Medical Imaging category of the Web of Knowledge. We hypothesized the IF did not show the best correlation with other metrics. METHODS: Retrospective study. We used bibliometrics recorded from the 2017 publicly available versions of the Journal Citation Reports (JCR), SJR ( www.scimagojr.com ), SNIP ( www.journalindicators.com ), and CiteScore ( www.scopus.com ); we also included the Total Cites. We measured the correlations using the Spearman correlation coefficients (RS) for all combinations of the bivariate pair, performed pairwise comparisons of the RS values, and calculated the coefficients of determination. We also tested the statistical significance of the difference between r coefficients between groups. All analyses were conducted with the JMP Pro software. RESULTS: The stronger bivariate correlations were represented by the ES↔Total Cites RS = 0.968, p < 0.001, R2 = 0.937; and the CiteScore↔SJR RS = 0.911, p < 0.001, R2 = 0.829. From 105 possible combinations of pairwise comparisons, 38 depicted a p value > 0.050 which would suggest interchangeability among bivariate correlations. CONCLUSIONS: Our findings support our hypothesis that the IF does not show the best correlation between other metrics. Radiologists, interventional radiologist, or nuclear medicine doctors should have a clear understanding of the associations among the journal's bibliometrics for their decision-making during the manuscript submission phase.


Subjects
Bibliometrics , Diagnostic Imaging , Nuclear Medicine , Radiology , Humans , Journal Impact Factor , Models, Statistical , Retrospective Studies
17.
BMC Bioinformatics ; 19(1): 193, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29843590

ABSTRACT

BACKGROUND: Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and they have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments. It could support experts in biomedicine on their way towards new discoveries. METHODS: Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph from the relations extracted from biomedical abstracts. A logistic regression model is then trained by learning the semantic types of the paths of known drug therapies existing in the biomedical knowledge graph. Finally, the learned model is used to discover drug therapies for new diseases. RESULTS: The experimental results show that our method could not only effectively discover new drug therapies for new diseases, but could also provide the potential mechanism of action of the candidate drugs. CONCLUSIONS: In this paper we propose a novel knowledge graph-based literature mining method for drug discovery. It could be a supplementary method for current drug discovery methods.


Subjects
Data Mining/methods , Drug Discovery/methods , Drug Therapy , Humans , Knowledge Bases , Logistic Models , Publications
18.
BMC Bioinformatics ; 19(1): 176, 2018 05 21.
Article in English | MEDLINE | ID: mdl-29783926

ABSTRACT

BACKGROUND: Link prediction in biomedical graphs has several important applications, including predicting Drug-Target Interactions (DTI), Protein-Protein Interaction (PPI) prediction and Literature-Based Discovery (LBD). It can be done using a classifier to output the probability of link formation between nodes. Recently, several works have used neural networks to create node representations which allow rich inputs to neural classifiers. Preliminary work along these lines reports promising results. However, these studies did not use realistic settings such as time-slicing, evaluate performance with comprehensive metrics, or explain when or why neural network methods outperform. We investigated how inputs from four node representation algorithms affect the performance of a neural link predictor on random- and time-sliced biomedical graphs of real-world sizes (∼ 6 million edges) containing information relevant to DTI, PPI and LBD. We compared the performance of the neural link predictor to that of established baselines and report performance across five metrics. RESULTS: In random- and time-sliced experiments, when the neural network methods were able to learn good node representations and there was a negligible number of disconnected nodes, those approaches outperformed the baselines. In the smallest graph (∼ 15,000 edges) and in larger graphs with approximately 14% disconnected nodes, baselines such as Common Neighbours proved a justifiable choice for link prediction. At low recall levels (∼ 0.3) the approaches were mostly equal, but at higher recall levels across all nodes, and in average performance at individual nodes, neural network approaches were superior. Analysis showed that neural network methods performed well on links between nodes with no previous common neighbours, which are potentially the most interesting links. Additionally, while neural network methods benefit from large amounts of data, they require considerable computational resources to utilise them. CONCLUSIONS: Our results indicate that when there is enough data for the neural network methods to use and there is a negligible number of disconnected nodes, those approaches outperform the baselines. At low recall levels the approaches are mostly equal, but at higher recall levels, and in average performance at individual nodes, neural network approaches are superior. Performance at nodes without common neighbours, which indicate more unexpected and perhaps more useful links, accounts for this.
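The Common Neighbours baseline and the time-sliced setting mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the study's code: the toy edges, the cutoff year, and the helper names (build_adjacency, recall_at_k) are all hypothetical. Edges dated up to the cutoff form the training graph, unlinked node pairs are scored by how many neighbours they share, and later edges serve as the positives to recover.

```python
from itertools import combinations

# Hypothetical toy edges: (node_a, node_b, year the link first appeared).
edges = [
    ("drugA", "protein1", 2010),
    ("drugB", "protein1", 2011),
    ("drugA", "protein2", 2012),
    ("drugB", "protein2", 2012),
    ("drugC", "protein2", 2013),
    ("drugA", "drugB", 2016),     # future link, shares two neighbours
    ("drugC", "protein1", 2017),  # future link, shares no neighbours
]
CUTOFF = 2014  # assumed time slice: train on <= CUTOFF, test on later edges


def build_adjacency(edge_list):
    """Undirected adjacency sets from (a, b) pairs."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


train = [(a, b) for a, b, year in edges if year <= CUTOFF]
future = {tuple(sorted((a, b))) for a, b, year in edges if year > CUTOFF}
adj = build_adjacency(train)

# Score every currently unlinked pair by its number of shared neighbours.
scores = {}
for u, v in combinations(sorted(adj), 2):
    if v not in adj[u]:
        scores[(u, v)] = len(adj[u] & adj[v])

# Highest score first; ties broken alphabetically for determinism.
ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))


def recall_at_k(ranked_pairs, positives, k):
    """Fraction of future links found among the top-k ranked pairs."""
    hits = sum(1 for pair in ranked_pairs[:k] if pair in positives)
    return hits / len(positives)


print(ranked[:2], recall_at_k(ranked, future, 2))
```

In this toy graph the future link between drugC and protein1 receives a score of zero because the two nodes share no neighbours at training time. That is the kind of link the abstract describes as out of reach for this baseline and potentially the most interesting.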


Subjects
Neural Networks, Computer , Algorithms , Drug Discovery , Knowledge Discovery , Protein Interaction Mapping
19.
Rev Panam Salud Publica ; 42: e35, 2018.
Article in English | MEDLINE | ID: mdl-31093064

ABSTRACT

OBJECTIVE: To determine the level of stability or change in topic areas published by public health journals in Latin America and the Caribbean, using keywords and co-word analysis, in order to support evidence-based research planning. METHODS: Keywords were extracted from papers indexed in Scopus® that were published by the Revista de Salud Pública (RSP; Colombia), the Salud Pública de México (SPM; Mexico), and the Revista Peruana de Medicina Experimental y Salud Pública (RPMESP; Peru) for three periods: 2005 - 2007, 2008 - 2010, and 2011 - 2013. Co-word analysis was used to examine the extracted keywords. Textual information was analyzed using centrality measures (betweenness and closeness). The hypothesis of stability/change of thematic coverage was tested using Spearman's rho correlation coefficient. VOSviewer was used to visualize the co-word maps. RESULTS: A moderate level of change in thematic coverage was observed in 2005 - 2010, as evidenced by the correlation coefficients for two of the 3-year periods, 2005 - 2007 and 2008 - 2010: 0.545 for RSP and 0.593 for SPM. However, in 2008 - 2013, more keywords remained constant from one period to the next, given the size of the correlation coefficients for the last two 3-year periods, 2008 - 2010 and 2011 - 2013: 0.727 for RSP and 0.605 for SPM. CONCLUSION: The research hypothesis was partially accepted, given that just two consecutive 3-year periods showed a statistically significant degree of stability in thematic coverage in public health studies. In that sense, this study provides compelling evidence of the effectiveness of using a combined approach for examining the dynamics of thematic coverage: centrality measures for identifying the main keywords and visual inspection for detecting the structure of textual information.
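The co-word analysis and closeness centrality described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the study's workflow (which used VOSviewer): the keyword lists are hypothetical, and the closeness helper is a plain breadth-first-search implementation written for this example. It counts how often two keywords appear on the same paper, builds a keyword graph from those co-occurrences, and computes closeness as the number of reachable keywords divided by the sum of shortest-path distances to them.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical keyword lists, one per paper (not data from the study).
papers = [
    ["dengue", "epidemiology", "public health"],
    ["dengue", "vector control"],
    ["public health", "health policy"],
    ["dengue", "epidemiology"],
]

# Co-word counts: how many papers list both keywords of a pair.
cooccurrence = Counter()
for keywords in papers:
    for pair in combinations(sorted(set(keywords)), 2):
        cooccurrence[pair] += 1

# Unweighted keyword graph: an edge wherever two keywords co-occur.
graph = {}
for a, b in cooccurrence:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def closeness(graph, source):
    """Closeness centrality of source via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


for keyword in sorted(graph):
    print(f"{keyword}: closeness = {closeness(graph, keyword):.2f}")
```

Keywords with high closeness sit near the centre of the co-word map. Comparing keyword rankings between two periods with a rank correlation such as Spearman's rho would then indicate how stable the thematic coverage is.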

20.
BMC Bioinformatics ; 18(Suppl 7): 249, 2017 May 31.
Article in English | MEDLINE | ID: mdl-28617217

ABSTRACT

BACKGROUND: Literature based discovery (LBD) automatically infers missed connections between concepts in literature. It is often assumed that LBD generates more information than can be reasonably examined. METHODS: We present a detailed analysis of the quantity of hidden knowledge produced by an LBD system and the effect of various filtering approaches upon this. The investigation of filtering combined with single or multi-step linking term chains is carried out on all articles in PubMed. RESULTS: The evaluation is carried out using both replication of existing discoveries, which provides justification for multi-step linking chain knowledge in specific cases, and using timeslicing, which gives a large scale measure of performance. CONCLUSIONS: While the quantity of hidden knowledge generated by LBD can be vast, we demonstrate that (a) intelligent filtering can greatly reduce the number of hidden knowledge pairs generated, (b) for a specific term, the number of single step connections can be manageable, and


Subjects
Data Mining , Algorithms , Humans , Knowledge , Knowledge Discovery