1.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37594310

ABSTRACT

Omics data from clinical samples are the predominant source of target discovery and drug development. Typically, hundreds or thousands of differentially expressed genes or proteins can be identified from omics data. This scale of possibilities is overwhelming for target discovery and validation using biochemical or cellular experiments. Most of these proteins and genes have no corresponding drugs or even active compounds. Moreover, a proportion of them may have been previously reported as being relevant to the disease of interest. To facilitate translational drug discovery from omics data, we have developed a new classification tool named Omics and Text driven Translational Medicine (OTTM). This tool can markedly narrow the range of proteins or genes that merit further validation via drug availability assessment and literature mining. For the 4489 candidate proteins identified in our previous proteomics study, OTTM recommended 40 FDA-approved or clinical trial drugs. Of these, 15 are available commercially and were tested on hepatocellular carcinoma Hep-G2 cells. Two drugs, tafenoquine succinate (an FDA-approved antimalarial drug targeting CYC1) and branaplam (a Phase 3 clinical drug targeting SMN1 for the treatment of spinal muscular atrophy), showed potent inhibitory activity against Hep-G2 cell viability, suggesting that CYC1 and SMN1 may be potential therapeutic target proteins for hepatocellular carcinoma. In summary, OTTM is an efficient classification tool that can accelerate the discovery of effective drugs and targets using thousands of candidate proteins identified from omics data. The online and local versions of OTTM are available at http://otter-simm.com/ottm.html.


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Translational Science, Biomedical , Proteomics , Drug Discovery
2.
Genet Med ; 26(4): 101083, 2024 04.
Article in English | MEDLINE | ID: mdl-38281099

ABSTRACT

PURPOSE: The American College of Medical Genetics and Genomics and the Association for Molecular Pathology have outlined a schema that allows for systematic classification of variant pathogenicity. Although gnomAD is generally accepted as a reliable source of population frequency data and ClinGen has provided guidance on the utility of specific bioinformatic predictors, there is no consensus source for identifying publications relevant to a variant. Multiple tools are available to aid in the identification of relevant variant literature, including manually curated databases and literature search engines. We set out to determine the utility of 4 literature mining tools used for ascertainment to inform the discussion of the use of these tools. METHODS: Four literature mining tools including the Human Gene Mutation Database, Mastermind, ClinVar, and LitVar 2.0 were used to identify relevant variant literature for 50 RYR1 variants. Sensitivity and precision were determined for each tool. RESULTS: Sensitivity among the 4 tools ranged from 0.332 to 0.687. Precision ranged from 0.389 to 0.906. No single tool retrieved all relevant publications. CONCLUSION: At the current time, the use of multiple tools is necessary to completely identify the literature relevant to curate a variant.
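Note on the metrics reported in this abstract: sensitivity is the fraction of relevant publications that a tool retrieved, and precision is the fraction of retrieved publications that were relevant. The following is a minimal sketch of that calculation, not the authors' code; the tool names and publication identifiers are made-up placeholders, and the function name `sensitivity_precision` is an assumption for illustration. It also shows why pooling tools can raise sensitivity, in line with the abstract's conclusion that multiple tools are needed.

```python
def sensitivity_precision(retrieved, relevant):
    """Return (sensitivity, precision) for one set of retrieved publications.

    retrieved: set of publication IDs a tool returned for a variant
    relevant:  set of publication IDs judged relevant (the reference set)
    """
    true_positives = len(retrieved & relevant)
    sensitivity = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


# Toy, invented data: four relevant papers and two hypothetical tools.
relevant = {"pmid_a", "pmid_b", "pmid_c", "pmid_d"}
tool_hits = {
    "tool_1": {"pmid_a", "pmid_b", "pmid_x"},
    "tool_2": {"pmid_a", "pmid_c"},
}

for name, hits in tool_hits.items():
    sens, prec = sensitivity_precision(hits, relevant)
    print(f"{name}: sensitivity={sens:.3f} precision={prec:.3f}")

# Pooling the tools: the union of everything any tool retrieved.
combined = set().union(*tool_hits.values())
sens, prec = sensitivity_precision(combined, relevant)
print(f"combined: sensitivity={sens:.3f} precision={prec:.3f}")
```

In this toy case neither tool alone finds more than half of the relevant papers, while the union finds three of four.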


Subject(s)
Data Mining , Genetic Variation , Ryanodine Receptor Calcium Release Channel , Humans , Gene Frequency , Genetic Testing , Genetic Variation/genetics , Mutation , Ryanodine Receptor Calcium Release Channel/genetics
3.
Int Ophthalmol ; 44(1): 244, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38904678

ABSTRACT

OBJECTIVE: Keratoconus (KC) is a condition characterized by progressive corneal steepening and thinning. However, its pathophysiological mechanism remains vague. We mainly performed literature mining to extract bioinformatic and related data on KC at the RNA level. The objective of this study was to explore the potential pathological mechanisms of KC by identifying hub genes and key molecular pathways at the RNA level. METHODS: We performed an exhaustive search of the PubMed database and identified studies that pertained to gene transcripts derived from diverse corneal layers in patients with KC. The identified differentially expressed genes were intersected, and overlapping genes were extracted for further analyses. Significantly enriched genes were screened using "Gene Ontology" (GO) and "Kyoto Encyclopedia of Genes and Genomes" (KEGG) analysis with the "Database for Annotation, Visualization, and Integrated Discovery" (DAVID) database. A protein-protein interaction (PPI) network was constructed for the significantly enriched genes using the STRING database. The PPI network was visualized using the Cytoscape software, and hub genes were screened via betweenness centrality values. Pathways that play a critical role in the pathophysiology of KC were discovered using the GO and KEGG analyses of the hub genes. RESULTS: Sixty-eight overlapping genes were obtained. Fifty genes were significantly enriched in 67 biological processes, and 16 genes were identified in 7 KEGG pathways. Moreover, 14 nodes and 32 edges were identified via the PPI network constructed using the STRING database. Multiple analyses identified 4 hub genes, 12 enriched biological processes, and 6 KEGG pathways. GO enrichment analysis showed that the hub genes are mainly involved in the positive regulation of apoptotic process, and KEGG analysis showed that the hub genes are primarily associated with the interleukin-17 (IL-17) and tumor necrosis factor (TNF) pathways. Overall, matrix metalloproteinase 9, IL-6, estrogen receptor 1, and prostaglandin-endoperoxide synthase 2 were the potentially important genes associated with KC. CONCLUSION: Four genes, matrix metalloproteinase 9, IL-6, estrogen receptor 1, and prostaglandin-endoperoxide synthase 2, as well as the IL-17 and TNF pathways, are critical in the development of KC. Inflammation and apoptosis may contribute to the pathogenesis of KC.
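Note on the hub-gene screening step: betweenness centrality scores a node by how many shortest paths between other node pairs pass through it. The study used Cytoscape for this; the sketch below is only an illustration of the measure itself, written as a plain-Python version of Brandes' algorithm for an unweighted, undirected graph. The node labels and edges are invented toy data, not the STRING network from the abstract.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours (both directions listed).
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                        # nodes in non-decreasing distance
        preds = {v: [] for v in adj}      # shortest-path predecessors
        n_paths = {v: 0 for v in adj}     # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                      # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate from the farthest nodes
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


# Toy network (invented): "hub" links a, b and c; a and b are also linked.
toy_ppi = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
}
scores = betweenness_centrality(toy_ppi)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores)
```

Only "hub" lies on a shortest path between other nodes here (a to c and b to c), so it is the single candidate a betweenness-based screen would pick.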


Subject(s)
Computational Biology , Data Mining , Gene Regulatory Networks , Keratoconus , Keratoconus/genetics , Keratoconus/metabolism , Keratoconus/diagnosis , Humans , Computational Biology/methods , Data Mining/methods , Protein Interaction Maps/genetics , Gene Expression Profiling/methods , RNA/genetics , Gene Expression Regulation , Gene Ontology , Databases, Genetic
4.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32422651

ABSTRACT

Recent years have witnessed a rapid increase in the number of scientific articles in the biomedical domain. This literature is mostly available and readily accessible in electronic format. The domain knowledge hidden in it is critical for biomedical research and applications, which puts biomedical literature mining (BLM) techniques in high demand. Numerous efforts have been made on this topic by both the biomedical informatics (BMI) and computer science (CS) communities. The BMI community focuses more on concrete application problems and thus prefers more interpretable and descriptive methods, while the CS community pursues superior performance and generalization ability and thus develops more sophisticated and universal models. The goal of this paper is to provide a review of the recent advances in BLM from both communities and to inspire new research directions.


Subject(s)
Biomedical Research , Data Mining/methods , Publishing , Algorithms , Medical Informatics
5.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32770181

ABSTRACT

MOTIVATION: To obtain key information for personalized medicine and cancer research, clinicians and researchers in the biomedical field now need, more than ever before, to search the biomedical literature for genomic variant information. Because genomic variants are written in many different forms, however, it is difficult to locate the right information when using a general literature search system. To address this difficulty, researchers have suggested various solutions based on automated literature-mining techniques. There is, however, no study that summarizes and compares existing tools for genomic variant literature mining in terms of how easily they let users search the literature for information on genomic variants. RESULTS: In this article, we systematically compared currently available genomic variant recognition and normalization tools as well as the literature search engines that adopted these literature-mining techniques. First, we explain the problems that are caused by the use of non-standard formats of genomic variants in the PubMed literature by considering examples from the literature and show the prevalence of the problem. Second, we review literature-mining tools that address the problem by recognizing and normalizing the various forms of genomic variants in the literature and systematically compare them. Third, we present and compare existing literature search engines that are designed for genomic variant search using these literature-mining techniques. We expect this work to be helpful for researchers who seek information about genomic variants from the literature and for developers who integrate genomic variant information from the literature and beyond.


Subject(s)
Data Mining , Genetic Variation , Precision Medicine , Search Engine , PubMed , Publications
6.
Mycopathologia ; 188(3): 183-202, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36976442

ABSTRACT

Dermatophytosis is one of the most common superficial infections of the skin, affecting nearly one-fifth of the world population at any given time. With nearly 30% of worldwide terbinafine-resistance cases in Trichophyton mentagrophytes/Trichophyton interdigitale and Trichophyton rubrum reported from India in recent years, the emerging drug resistance epidemic places a significant burden on India. Here, we carry out a comprehensive retrospective analysis of dermatophytosis in India using 1038 research articles pertaining to 161,245 cases reported from 1939 to 2021. We find that dermatophytosis is prevalent in all parts of the country despite variable climatic conditions in different regions. Our results show T. rubrum as the most prevalent species until 2015, with a sudden change in dermatophyte spectrum towards the T. mentagrophytes/T. interdigitale complex since then. We also carried out an 18S rRNA-based phylogenetic analysis and an average nucleotide identity- and single nucleotide polymorphism-based analysis of available whole genomes and find very high relatedness among the prevalent dermatophytes, suggesting geographic specificity. The comprehensive epidemiological and phylogenomics analysis of dermatophytosis in India over the last 80 years, presented here, would help in region-specific prevention, control and treatment of dermatophyte infections, especially considering the large number of emerging resistance cases.


Subject(s)
Arthrodermataceae , Tinea , Humans , Arthrodermataceae/genetics , Tinea/epidemiology , Tinea/drug therapy , Trichophyton , Retrospective Studies , India/epidemiology
7.
Zhongguo Zhong Yao Za Zhi ; 48(18): 5091-5101, 2023 Sep.
Article in Chinese | MEDLINE | ID: mdl-37802851

ABSTRACT

This study explored the prescription and medication rules of traditional Chinese medicine (TCM) in the prevention and treatment of diabetic microangiopathy based on literature mining. Relevant literature on TCM against diabetic microangiopathy was searched and prescriptions were collected. Microsoft Excel 2021 software was used to establish a prescription database, and an analysis was conducted on the frequency, properties, flavors, meridian tropism, and efficacy classifications of drugs. Association rule analysis, cluster analysis, and factor analysis were performed using SPSS Modeler 18.0 and SPSS Statistics 26.0 software. The characteristic active components and mechanisms of action of medium-high frequency drugs in the analysis of medication rules were explored through literature mining. A total of 1,327 prescriptions were included in this study, involving 411 drugs, with a total frequency reaching 19,154 times. The top five high-frequency drugs were Astragali Radix, Angelicae Sinensis Radix, Poria, Salviae Miltiorrhizae Radix et Rhizoma, and Rehmanniae Radix. The cold and warm drugs were used in combination. Drugs were mainly sweet, followed by bitter and pungent, and acted on the liver meridian. The majority of drugs were effective in tonifying deficiency, clearing heat, activating blood, and resolving stasis. Association rule analysis identified the highly supported drug pair of Astragali Radix-Angelicae Sinensis Radix and the highly confident drug combination of Poria-Alismatis Rhizoma-Corni Fructus. The strongest correlation was found among Astragali Radix, Angelicae Sinensis Radix, Poria, and Salviae Miltiorrhizae Radix et Rhizoma through the complex network analysis. Cluster analysis identified nine categories of drug combinations, while factor analysis identified 16 common factors. The analysis of active components in high-frequency drugs for the treatment of diabetic microangiopathy revealed that these effective components mainly exerted their effects by inhibiting oxidative stress and suppressing inflammatory reactions. The study found that the pathogenesis of diabetic microangiopathy was primarily characterized by deficiency in origin, with a combination of deficiency and excess. Deficiency was manifested as Qi deficiency and blood deficiency, while excess as phlegm-heat and blood stasis. The key organ involved in the pathological changes was the liver. The treatment mainly focused on supplementing Qi and nourishing blood, supplemented by clearing heat, cooling blood, activating blood, and dredging collaterals. Commonly used formulas included Danggui Buxue Decoction, Liuwei Dihuang Pills, Erzhi Pills, and Buyang Huanwu Decoction. The mechanisms of action of high-frequency drugs in the treatment of diabetic microangiopathy were often related to the inhibition of oxidative stress and suppression of inflammatory reactions. These findings can provide references for the clinical treatment of diabetic microangiopathy and the development of targeted drugs.
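Note on the association rule terms used in this abstract: the support of a drug set is the share of prescriptions that contain every drug in the set, and the confidence of a rule A -> B is the share of prescriptions containing A that also contain B. The study ran this analysis in SPSS Modeler; the sketch below only illustrates the two definitions. The four prescriptions are invented toy data that reuse drug names from the abstract, and the function names are assumptions for illustration.

```python
def support(prescriptions, items):
    """Fraction of prescriptions that contain every drug in `items`."""
    items = set(items)
    if not prescriptions:
        return 0.0
    return sum(1 for p in prescriptions if items <= p) / len(prescriptions)


def confidence(prescriptions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(prescriptions, antecedent)
    if base == 0:
        return 0.0
    both = support(prescriptions, set(antecedent) | set(consequent))
    return both / base


# Invented toy prescriptions; not data from the study.
toy = [
    {"Astragali Radix", "Angelicae Sinensis Radix", "Poria"},
    {"Astragali Radix", "Angelicae Sinensis Radix"},
    {"Astragali Radix", "Poria"},
    {"Poria"},
]

pair = {"Astragali Radix", "Angelicae Sinensis Radix"}
print("support:", support(toy, pair))
print("confidence:", confidence(toy, {"Astragali Radix"}, {"Angelicae Sinensis Radix"}))
```

A "highly supported" pair is one that simply occurs often, while a "highly confident" combination is one whose consequent nearly always accompanies its antecedent, so the two can pick out different drug sets.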


Subject(s)
Diabetes Mellitus , Diabetic Angiopathies , Drugs, Chinese Herbal , Humans , Medicine, Chinese Traditional , Drugs, Chinese Herbal/therapeutic use , Prescriptions , Drug Combinations , Diabetic Angiopathies/drug therapy , Data Mining , Diabetes Mellitus/drug therapy
8.
BMC Bioinformatics ; 23(Suppl 6): 407, 2022 Sep 30.
Article in English | MEDLINE | ID: mdl-36180861

ABSTRACT

BACKGROUND: To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide comprehensive and semantic representation for heterogeneous data, and have been successfully leveraged in many biomedical applications including drug repurposing. Our objective is to construct a knowledge graph from literature to study the relations between Alzheimer's disease (AD) and chemicals, drugs and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train with knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. RESULTS: Among three knowledge graph completion models, TransE outperformed the other two (MR = 10.53, Hits@1 = 0.28). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most highly ranked candidates predicted by our model which indicates that our approach can inform reliable new knowledge. CONCLUSION: This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses.
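Note on the model and metrics named in this abstract: TransE represents each entity and relation as a vector and treats a triple (head, relation, tail) as plausible when head + relation lies close to tail. Mean rank (MR) and Hits@k summarize where the true tail lands when all candidate tails are sorted by that score. The following is a minimal sketch under those standard definitions, not the authors' pipeline; the two-dimensional vectors and entity labels are invented, and real embeddings are learned from the triples.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_of_tail(head, relation, true_tail, candidates):
    """1-based rank of the true tail among all candidate tails."""
    scores = {name: transe_score(head, relation, vec) for name, vec in candidates.items()}
    target = scores[true_tail]
    return 1 + sum(1 for s in scores.values() if s > target)


def mean_rank(ranks):
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Invented toy embeddings.
head = [1.0, 0.0]        # e.g. a disease entity
relation = [0.0, 1.0]    # e.g. a "treated by" relation
candidates = {
    "entity_1": [1.0, 1.0],   # exactly head + relation
    "entity_2": [0.0, 0.0],
    "entity_3": [1.0, 2.0],
}

ranks = [rank_of_tail(head, relation, name, candidates) for name in candidates]
print("ranks:", ranks)
print("MR:", mean_rank(ranks), "Hits@1:", hits_at_k(ranks, 1))
```

Lower MR and higher Hits@1 both indicate that true tails are ranked nearer the top.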


Subject(s)
Alzheimer Disease , Semantics , Alzheimer Disease/drug therapy , Drug Repositioning , Humans , Knowledge , Pattern Recognition, Automated
9.
Molecules ; 27(15)2022 Jul 23.
Article in English | MEDLINE | ID: mdl-35897884

ABSTRACT

Noncoding RNAs (ncRNAs) are transcripts without protein-coding potential that play fundamental regulatory roles in diverse cellular processes and diseases. The application of deep sequencing experiments in ncRNA research has generated massive omics datasets, which require rapid examination, interpretation and validation based on existing knowledge resources. Thus, text-mining methods have been increasingly adapted for automatic extraction of relations between an ncRNA and its target or a disease condition from biomedical literature. These bioinformatics tools can also assist in more complex research, such as database curation of candidate ncRNAs and hypothesis generation with respect to pathophysiological mechanisms. In this concise review, we first introduced basic concepts and workflow of literature mining systems. Then, we compared available bioinformatics tools tailored for ncRNA studies, including the tasks, applicability, and limitations. Their powerful utilities and flexibility are demonstrated by examples in a variety of diseases, such as Alzheimer's disease, atherosclerosis and cancers. Finally, we outlined several challenges from the viewpoints of both system developers and end users. We concluded that the application of text-mining techniques will boost disease-associated ncRNA discoveries in the biomedical literature and enable integrative biology in the current omics era.


Subject(s)
Data Mining , RNA, Untranslated , Computational Biology/methods , Data Mining/methods , Publications , RNA, Untranslated/genetics
10.
J Transl Med ; 19(1): 274, 2021 06 26.
Article in English | MEDLINE | ID: mdl-34174885

ABSTRACT

BACKGROUND: There is a huge body of scientific literature describing the relation between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually. METHODS: In order to cope with the large amount of literature we applied an automated text mining approach to assess the relations between the 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches: classical text mining based on named entity recognition and an AI-based approach employing word embeddings. The consistency of literature mining results was validated with 3 independent methods: first, using data from FDA approvals; second, using experimentally measured IC-50 cell line data; and third, using clinical patient survival data. RESULTS: We demonstrated that the automated text mining was able to successfully assess the relation between cancer types and anti-cancer drugs. All validation methods showed a good correspondence between the results from literature mining and independent confirmatory approaches. The relations between the most frequent cancer types and the drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base using the following link: https://knowledgebase.microdiscovery.de/heatmap. CONCLUSIONS: Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.


Subject(s)
Antineoplastic Agents , Neoplasms , Data Mining , Humans , Knowledge Bases , Neoplasms/drug therapy , Publications
11.
FEMS Yeast Res ; 21(7)2021 12 02.
Article in English | MEDLINE | ID: mdl-34864983

ABSTRACT

Functional genomic screening of genetic mutant libraries enables the characterization of gene function in diverse organisms. For the fungal pathogen Candida albicans, several genetic mutant libraries have been generated and screened for diverse phenotypes, including tolerance to environmental stressors and antifungal drugs, and pathogenic traits such as cellular morphogenesis, biofilm formation and host-pathogen interactions. Here, we compile and organize C. albicans functional genomic screening data from ∼400 screens, to generate a data library of genetic mutant strains analyzed under diverse conditions. For quantitative screening data, we normalized these results to enable quantitative and comparative analysis of different genes across different phenotypes. Together, this provides a unique C. albicans genetic database, summarizing abundant phenotypic data from functional genomic screens in this critical fungal pathogen.


Subject(s)
Antifungal Agents , Candida albicans , Candida albicans/genetics , Fungal Proteins/genetics , Gene Library , Genomics , Phenotype
12.
BMC Bioinformatics ; 21(1): 432, 2020 Oct 02.
Article in English | MEDLINE | ID: mdl-33008309

ABSTRACT

BACKGROUND: In systems biology, it is of great interest to identify previously unreported associations between genes. Recently, biomedical literature has been considered as a valuable resource for this purpose. While classical clustering algorithms have popularly been used to investigate associations among genes, they are not tuned for literature mining data and are also based on strong assumptions, which are often violated in this type of data. For example, these approaches often assume homogeneity and independence among observations. However, these assumptions are often violated due to both redundancies in functional descriptions and biological functions shared among genes. Latent block models can be alternatives in this case, but they also often show suboptimal performance, especially when signals are weak. In addition, they do not allow the use of valuable prior biological knowledge, such as that available in existing databases. RESULTS: In order to address these limitations, here we propose PALMER, a constrained latent block model that allows indirect relationships among genes to be identified from biomedical literature mining data. By automatically associating relevant Gene Ontology terms, PALMER facilitates biological interpretation of novel findings without laborious downstream analyses. PALMER also allows researchers to utilize prior biological knowledge about known gene-pathway relationships to guide identification of gene-gene associations. We evaluated PALMER with simulation studies and applications to studies of pathway-modulating genes relevant to cancer signaling pathways, while utilizing biological pathway annotations available in the KEGG database as prior knowledge. CONCLUSIONS: We showed that PALMER outperforms traditional latent block models and that it provides reliable identification of novel gene-gene associations by utilizing prior biological knowledge, especially when signals are weak in the biomedical literature mining dataset. We believe that PALMER and its relevant user-friendly software will be powerful tools that can be used to improve existing pathway annotations and identify novel pathway-modulating genes.


Subject(s)
Algorithms , Data Mining , Models, Theoretical , Molecular Sequence Annotation , Publications , Computer Simulation , Gene Ontology , Gene Regulatory Networks , Humans , Multigene Family , Systems Biology
13.
BMC Bioinformatics ; 21(Suppl 5): 250, 2020 Oct 26.
Article in English | MEDLINE | ID: mdl-33106154

ABSTRACT

Biological contextual information helps in understanding various phenomena occurring in biological systems consisting of complex molecular relations. The construction of context-specific relational resources relies heavily on laborious manual extraction from unstructured literature. In this paper, we propose COMMODAR, a machine learning-based literature mining framework for context-specific molecular relations using multimodal representations. The main idea of COMMODAR is feature augmentation through the cooperation of multimodal representations for relation extraction. We leveraged biomedical domain knowledge as well as canonical linguistic information for more comprehensive representations of textual sources. The models based on multiple modalities outperformed those solely based on the linguistic modality. We applied COMMODAR to 14 million PubMed abstracts and extracted 9214 context-specific molecular relations. All corpora, extracted data, evaluation results, and the implementation code are downloadable at https://github.com/jae-hyun-lee/commodar. CCS CONCEPTS: • Computing methodologies~Information extraction • Computing methodologies~Neural networks • Applied computing~Biological networks.


Subject(s)
Data Mining/methods , Machine Learning , PubMed , Publications
14.
Cancer Immunol Immunother ; 69(12): 2425-2439, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32556496

ABSTRACT

Cancer immunotherapy is a rapidly growing field that is completely transforming oncology care. Mining this knowledge base for biomedically important information is becoming increasingly challenging, due to the expanding number of scientific publications, and the dynamic evolution of this subject with time. In this study, we have employed a literature-mining approach that was used to analyze the cancer immunotherapy-related publications listed in PubMed and quantify emerging trends. A total of 93,033 publications published in 5055 journals have been retrieved, and 141 meaningful topics have been identified, which were further classified into eight distinct categories. Statistical analysis indicates a mean annual increase in the number of published papers of approximately 8% in the last 20 years. The research topics that exhibited the highest trends included "immune checkpoint inhibitors," "tumor microenvironment," "HPV vaccination," "CAR T-cells," and "gene mutations/tumor profiling." The top identified cancer types included "lung," "colorectal," and "breast cancer," and a shift in popularity from hematological to solid tumors was observed. As regards clinical research, a transition from early phase clinical trials to randomized control trials was recorded, indicating that the field is entering a more advanced phase of development. Overall, this mining approach provided an unbiased analysis of the cancer immunotherapy literature in a time-conserving and scale-efficient manner.


Subject(s)
Bibliometrics , Immunotherapy/trends , Neoplasms/therapy , Antineoplastic Agents, Immunological/therapeutic use , Cancer Vaccines/therapeutic use , Data Mining , Humans , Immunotherapy/methods , Mutation , Neoplasms/genetics , Neoplasms/immunology , Papillomavirus Vaccines/therapeutic use , PubMed/statistics & numerical data , Randomized Controlled Trials as Topic
15.
Drug Dev Res ; 81(8): 1004-1018, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32657473

ABSTRACT

Faced with the current large-scale public health emergency, collecting, sorting, and analyzing biomedical information related to the "SARS-CoV-2" should be done as quickly as possible to gain a global perspective, which is a basic requirement for strengthening epidemic control capacity. However, for human researchers studying viruses and hosts, the vast amount of information available cannot be processed effectively and in a timely manner, particularly if our scientific understanding is also limited, which further lowers the information processing efficiency. We present TWIRLS (Topic-wise inference engine of massive biomedical literatures), a method that can deal with various scientific problems, such as liver cancer, acute myeloid leukemia, and so forth, which can automatically acquire, organize, and classify information. Additionally, this information can be combined with independent functional data sources to build an inference system via a machine-based approach, which can provide relevant knowledge to help human researchers quickly establish subject cognition and to make more effective decisions. Using TWIRLS, we automatically analyzed more than three million words in more than 14,000 literature articles in only 4 hr. We found that an important regulatory factor angiotensin-converting enzyme 2 (ACE2) may be involved in host pathological changes on binding to the coronavirus after infection. On triggering functional changes in ACE2/AT2R, the cytokine homeostasis regulation axis becomes imbalanced via the Renin-Angiotensin System and IP-10, leading to a cytokine storm. Through a preliminary analysis of blood indices of COVID-19 patients with a history of hypertension, we found that non-ARB (Angiotensin II receptor blockers) users had more symptoms of severe illness than ARB users. This suggests ARBs could potentially be used to treat acute lung injury caused by coronavirus infection.

16.
BMC Bioinformatics ; 19(1): 193, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29843590

ABSTRACT

BACKGROUND: Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and they have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments. It could support experts in biomedicine on their way towards new discoveries. METHODS: Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts. A logistic regression model is then trained by learning the semantic types of the paths of known drug therapies existing in the biomedical knowledge graph. Finally, the learned model is used to discover drug therapies for new diseases. RESULTS: The experimental results show that our method could not only effectively discover new drug therapies for new diseases, but could also provide the potential mechanism of action of the candidate drugs. CONCLUSIONS: In this paper we propose a novel knowledge graph-based literature mining method for drug discovery. It could be a supplementary method for current drug discovery methods.


Subject(s)
Data Mining/methods , Drug Discovery/methods , Drug Therapy , Humans , Knowledge Bases , Logistic Models , Publications
17.
BMC Bioinformatics ; 19(Suppl 17): 495, 2018 Dec 28.
Article in English | MEDLINE | ID: mdl-30591010

ABSTRACT

BACKGROUND: Due to recent technology advancements, disease related knowledge is growing rapidly. It becomes nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and life style factors, disease symptoms, and treatment strategies. Here we report DLAD4U (Disease List Automatically Derived For You), an efficient, accurate and easy-to-use disease search engine based on PubMed literature. RESULTS: DLAD4U uses the eSearch and eFetch APIs from the National Center for Biotechnology Information (NCBI) to find publications related to a query and to identify diseases from the retrieved publications. The hypergeometric test was used to prioritize identified diseases for displaying to users. DLAD4U accepts any valid queries for PubMed, and the output results include a ranked disease list, information associated with each disease, chronologically-ordered supporting publications, a summary of the run, and links for file export. DLAD4U outperformed other disease search engines in our comparative evaluation using selected genes and drugs as query terms and manually curated data as "gold standard". For 100 genes that are associated with only one disease in the gold standard, the Mean Average Precision (MAP) measure from DLAD4U was 0.77, which clearly outperformed other tools. For 10 genes that are associated with multiple diseases in the gold standard, the mean precision, recall and F-measure scores from DLAD4U were always higher than those from other tools. The superior performance of DLAD4U was further confirmed using 100 drugs as queries, with an MAP of 0.90. CONCLUSIONS: DLAD4U is a new, intuitive disease search engine that takes advantage of existing resources at NCBI to provide computational efficiency and uses statistical analyses to ensure accuracy. DLAD4U is publicly available at http://dlad4u.zhang-lab.org .
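Note on the prioritization step: a hypergeometric test asks how surprising it is that a disease is mentioned in k of the n publications retrieved for a query, given that it is mentioned in K of all N publications. The abstract does not spell out how DLAD4U sets up the test, so the sketch below is only an illustration of the upper-tail probability under that framing; the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement.

    population: total number of publications (N)
    successes:  publications mentioning the disease (K)
    draws:      publications retrieved for the query (n)
    k:          retrieved publications that mention the disease
    """
    upper = min(successes, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


# Hypothetical counts: 10 publications overall, 3 mention the disease,
# the query retrieves 2, and at least 1 of those mentions the disease.
p_value = hypergeom_upper_tail(1, population=10, successes=3, draws=2)
print(round(p_value, 4))
```

Diseases would then be ordered by ascending p-value, so the ones most over-represented among the retrieved publications appear first.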


Subject(s)
Information Storage and Retrieval , PubMed , Publications , Search Engine , Disease/genetics , Genetic Association Studies , Humans , Internet , Nitric Oxide Synthase Type III/metabolism , Tumor Necrosis Factor-alpha/metabolism
18.
J Proteome Res ; 17(4): 1383-1396, 2018 04 06.
Article in English | MEDLINE | ID: mdl-29505266

ABSTRACT

There are more than 3.7 million published articles on the biological functions or disease implications of proteins, constituting an important resource of proteomics knowledge. However, it is difficult to summarize the millions of proteomics findings in the literature manually and quantify their relevance to the biology and diseases of interest. We developed a fully automated bioinformatics framework to identify and prioritize proteins associated with any biological entity. We used the 22 targeted areas of the Biology/Disease-driven (B/D)-Human Proteome Project (HPP) as examples, prioritized the relevant proteins through their Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores, validated the relevance of the score by comparing the protein prioritization results with a curated database, computed the scores of proteins across the topics of B/D-HPP, and characterized the top proteins in the common model organisms. We further extended the bioinformatics workflow to identify the relevant proteins in all organ systems and human diseases and deployed a cloud-based tool to prioritize proteins related to any custom search terms in real time. Our tool can facilitate the prioritization of proteins for any organ system or disease of interest and can contribute to the development of targeted proteomic studies for precision medicine.


Subject(s)
Computational Biology/methods , Proteomics/methods , Animals , Human Genome Project , Humans , Precision Medicine/methods , Research , Search Engine
19.
J Proteome Res ; 17(12): 4345-4357, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30094994

ABSTRACT

Targeted metabolomics and biochemical studies complement the ongoing investigations led by the Human Proteome Organization (HUPO) Biology/Disease-Driven Human Proteome Project (B/D-HPP). However, it is challenging to identify and prioritize metabolite and chemical targets. Literature-mining-based approaches have been proposed for target proteomics studies, but text mining methods for metabolite and chemical prioritization are hindered by a large number of synonyms and nonstandardized names of each entity. In this study, we developed a cloud-based literature mining and summarization platform that maps metabolites and chemicals in the literature to unique identifiers and summarizes the copublication trends of metabolites/chemicals and B/D-HPP topics using Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores. We successfully prioritized metabolites and chemicals associated with the B/D-HPP targeted fields and validated the results by checking against expert-curated associations and enrichment analyses. Compared with existing algorithms, our system achieved better precision and recall in retrieving chemicals related to B/D-HPP focused areas. Our cloud-based platform enables queries on all biological terms in multiple species, which will contribute to B/D-HPP and targeted metabolomics/chemical studies.


Subject(s)
Cloud Computing , Metabolomics , Proteome , Algorithms , Data Mining/methods , Humans , Search Engine
20.
Epilepsia ; 59(2): 492-501, 2018 02.
Article in English | MEDLINE | ID: mdl-29341109

ABSTRACT

OBJECTIVE: Current antiepileptic drugs (AEDs) have several shortcomings. For example, they fail to control seizures in 30% of patients. Hence, there is a need to identify new AEDs. Drug repurposing is the discovery of new indications for approved drugs. This drug "recycling" offers the potential of significant savings in the time and cost of drug development. Many drugs licensed for other indications exhibit antiepileptic efficacy in animal models. Our aim was to create a database of "prescribable" drugs, approved for other conditions, with published evidence of efficacy in animal models of epilepsy, and to collate data that would assist in choosing the most promising candidates for drug repurposing. METHODS: The database was created by the following: (1) computational literature-mining using novel software that identifies Medline abstracts containing the name of a prescribable drug, a rodent model of epilepsy, and a phrase indicating seizure reduction; then (2) crowdsourced manual curation of the identified abstracts. RESULTS: The final database includes 173 drugs and 500 abstracts. It is made freely available at www.liverpool.ac.uk/D3RE/PDE3. The database is reliable: 94% of the included drugs have corroborative evidence of efficacy in animal models (for example, evidence from multiple independent studies). The database includes many drugs that are appealing candidates for repurposing, as they are widely accepted by prescribers and patients (the database includes half of the 20 most commonly prescribed drugs in England), and they target many proteins involved in epilepsy but not targeted by current AEDs. It is important to note that the drugs are of potential relevance to human epilepsy: the database is highly enriched with drugs that target proteins of known causal human epilepsy genes (Fisher's exact test P-value < 3 × 10⁻⁵). We present data to help prioritize the most promising candidates for repurposing from the database. SIGNIFICANCE: The PDE3 database is an important new resource for drug repurposing research in epilepsy.
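
Step (1) of the METHODS describes a three-part co-occurrence filter: an abstract is kept only if it names a prescribable drug, a rodent model of epilepsy, and a phrase indicating seizure reduction. The sketch below illustrates that idea with simple regular expressions. It is not the authors' software; the drug names are placeholders, and the model and effect patterns are small hypothetical samples chosen for illustration.

```python
import re

# Placeholder drug names; a real run would load a list of prescribable drugs.
DRUGS = {"drugalpha", "drugbeta"}

# Hypothetical sample of patterns naming rodent seizure models.
MODEL_PATTERNS = [r"\bpentylenetetrazol\b", r"\bkindling\b", r"\bpilocarpine\b"]

# Hypothetical sample of phrases indicating seizure reduction.
EFFECT_PATTERNS = [
    r"\breduc\w*\s+(?:the\s+)?seizure",
    r"\bsuppress\w*\s+(?:the\s+)?seizure",
    r"\banticonvulsant\b",
]


def candidate_drugs(abstract):
    """Return the drugs named in an abstract that passes all three checks.

    An empty list means the abstract would not be passed on to manual curation.
    """
    text = abstract.lower()
    drugs = sorted(d for d in DRUGS if re.search(rf"\b{re.escape(d)}\b", text))
    has_model = any(re.search(p, text) for p in MODEL_PATTERNS)
    has_effect = any(re.search(p, text) for p in EFFECT_PATTERNS)
    return drugs if (has_model and has_effect) else []


if __name__ == "__main__":
    kept = "Drugalpha reduced seizure frequency in the pentylenetetrazol rat model."
    dropped = "Drugalpha lowered blood pressure in patients."
    print(candidate_drugs(kept))
    print(candidate_drugs(dropped))
```

A filter of this kind favors recall over precision, which is consistent with the abstract's second step: the matched abstracts still go through manual curation.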


Subject(s)
Anticonvulsants/therapeutic use , Databases, Pharmaceutical , Drug Repositioning , Epilepsy/drug therapy , Animals , Biomedical Research , Data Mining , Disease Models, Animal , England , Humans , Software