1.
BMC Bioinformatics ; 25(1): 213, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872097

ABSTRACT

BACKGROUND: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale. RESULTS: This paper presents a novel benchmarking framework, Dyport, for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This assesses not only the accuracy of hypotheses but also their potential impact in biomedical research, which significantly extends traditional link prediction benchmarks. The applicability of our benchmarking process is demonstrated on several link prediction systems applied to biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. CONCLUSIONS: Dyport is an open-source benchmarking framework designed for the evaluation of biomedical hypothesis generation systems, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport.


Subject(s)
Benchmarking , Benchmarking/methods , Algorithms , Biomedical Research/methods , Software , Machine Learning , Databases, Factual , Computational Biology/methods , Semantics
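
The evaluation idea in the Dyport abstract above (a graph that changes over time, plus a weight for how important each discovery is) can be illustrated with a minimal sketch. This is not Dyport's implementation: the function names, the year-based cutoff, the common-neighbors scorer, and the `importance` dictionary are all assumptions made for illustration. The sketch trains on edges dated up to a cutoff, ranks the node pairs that are not yet connected, and reports importance-weighted recall over edges that first appear after the cutoff.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    """Undirected edge key."""
    return frozenset((u, v))


def neighbors(train: Set[Edge], node: str) -> Set[str]:
    """All nodes sharing a training edge with `node`."""
    return {n for e in train if node in e for n in e if n != node}


def common_neighbors(train: Set[Edge], u: str, v: str) -> float:
    """Toy link-prediction score (a stand-in for a real HG system)."""
    return float(len(neighbors(train, u) & neighbors(train, v)))


def weighted_recall_at_k(
    dated_edges: Iterable[Tuple[str, str, int]],
    cutoff_year: int,
    score: Callable[[Set[Edge], str, str], float],
    importance: Dict[Edge, float],
    k: int,
) -> float:
    """Importance-weighted recall@k on edges first seen after the cutoff."""
    dated = list(dated_edges)
    train = {edge(u, v) for u, v, year in dated if year <= cutoff_year}
    future = {edge(u, v) for u, v, year in dated if year > cutoff_year} - train
    nodes = sorted({n for e in train for n in e})
    # Candidate hypotheses: pairs of known nodes not connected before the cutoff.
    candidates = [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if edge(u, v) not in train
    ]
    ranked = sorted(
        candidates, key=lambda pair: score(train, pair[0], pair[1]), reverse=True
    )[:k]
    # Edges missing from `importance` default to weight 1.0 (an assumption).
    found = sum(
        importance.get(edge(u, v), 1.0) for u, v in ranked if edge(u, v) in future
    )
    total = sum(importance.get(e, 1.0) for e in future)
    return found / total if total else 0.0
```

With uniform importance weights this reduces to ordinary recall@k, which is the sense in which an importance-aware benchmark extends a plain link prediction benchmark.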
2.
bioRxiv ; 2024 Jun 09.
Article in English | MEDLINE | ID: mdl-38895485

ABSTRACT

Neurodegenerative pathologies such as Alzheimer's disease, Parkinson's disease, Huntington's disease, Amyotrophic lateral sclerosis, Multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems. While there are no cures yet, researchers worldwide are actively working on the development of novel treatments that have the potential to slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. Huge volumes of new scientific information necessitate new analytical approaches for meaningful hypothesis generation. To enable the automatic analysis of biomedical data we introduced AGATHA, an effective AI-based literature mining tool that can navigate massive scientific literature databases, such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into multidimensional space where each gene and health condition are represented by specific metrics. We implemented advanced statistical analysis to reveal distinct clusters of scientific terms within the virtual space created using AGATHA-calculated parameters for selected health conditions and genes. Partial Least Squares Discriminant Analysis was employed for categorizing and predicting samples (122 diseases and 20889 genes) fitted to specific classes. Advanced statistics were employed to build a discrimination model and extract lists of genes specific to each disease class. Here we focus on drugs that can be repurposed for dementia treatment as an outcome of neurodegenerative diseases. Therefore, we determined dementia-associated genes statistically highly ranked in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. 
These sets of genes were classified based on their presence in biological pathways, aiding in selecting candidates and biological processes that are exploitable with drug repurposing.

Author Summary: This manuscript outlines our project involving the application of AGATHA, an AI-based literature mining tool, to discover drugs with the potential for repurposing in the context of neurocognitive disorders. The primary objective is to identify connections between approved medications and specific health conditions through advanced statistical analysis, including techniques like Partial Least Squares Discriminant Analysis (PLSDA) and unsupervised clustering. The methodology involves grouping scientific terms related to different health conditions and genes, followed by building discrimination models to extract lists of disease-specific genes. These genes are then analyzed through pathway analysis to select candidates for drug repurposing.

3.
medRxiv ; 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38293017

ABSTRACT

More than one million people in the United States and over 38 million people worldwide are living with human immunodeficiency virus (HIV) infection. Antiretroviral therapy (ART) greatly improves the health of people living with HIV (PLWH); however, the increased longevity of PLWH has revealed consequences of HIV-associated comorbidities. HIV can enter the brain and cause inflammation even in individuals with well-controlled HIV infection. The quality of life for PLWH can be compromised by cognitive deficits and memory loss, termed HIV-associated neurological disorders (HAND). HIV-associated dementia is a related but distinct diagnosis. Common causes of dementia in PLWH are similar to the general population and can affect cognition. There is an urgent need to identify treatments for the aging PLWH population. We previously developed AI-based biomedical literature mining systems to uncover a potential novel connection between HAND and the renin-angiotensin system (RAAS), which is a pharmacological target for hypertension. RAAS-targeting anti-hypertensives are gaining attention for their protective benefits in several neurocognitive disorders. To our knowledge, the effect of RAAS-targeting drugs on cognition and on the development of dementia in PLWH has not previously been analyzed. We hypothesized that exposure to angiotensin-converting enzyme inhibitors (ACEi) that cross the blood-brain barrier (BBB) reduces the risk/occurrence of dementia in PLWH. We report a retrospective cohort study of electronic health records (EHRs) to examine the proposed hypothesis using data from the United States Department of Veterans Affairs, in which a primary outcome of dementia was measured in controlled cohorts of patients exposed to BBB-penetrant ACEi versus those unexposed to BBB-penetrant ACEi. The results reveal a statistically significant reduction in dementia diagnosis for PLWH exposed to BBB-penetrant ACEi. These results suggest there is a potential protective effect of BBB ACE inhibitor exposure against dementia in PLWH that warrants further investigation.

4.
PLoS One ; 16(7): e0253905, 2021.
Article in English | MEDLINE | ID: mdl-34228754

ABSTRACT

Biomedical research papers often combine disjoint concepts in novel ways, such as when describing a newly discovered relationship between an understudied gene and an important disease. These concepts are often explicitly encoded as metadata keywords, such as the author-provided terms included with many documents in the MEDLINE database. While substantial recent work has addressed the problem of text generation in a more general context, applications such as scientific writing assistants or hypothesis generation systems could benefit from the capacity to select the specific set of concepts that underpin a generated biomedical text. We propose a conditional language model following the transformer architecture. This model uses the "encoder stack" to encode concepts that a user wishes to discuss in the generated text. The "decoder stack" then follows the masked self-attention pattern to perform text generation, using both prior tokens as well as the encoded condition. We demonstrate that this approach provides significant control, while still producing reasonable biomedical text.


Subject(s)
Biomedical Research , Data Mining/methods , Medical Writing , Humans , MEDLINE , Publications
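
The decoder pattern described in the abstract above (masked self-attention over prior tokens, combined with attention over the encoded condition) can be sketched in a few lines. This is a deliberately stripped-down illustration and not the authors' model: there are no learned projections, no multiple heads, and no layer normalization, and the names `attend` and `decoder_step` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(
    queries: List[Vector], keys: List[Vector], values: List[Vector], causal: bool
) -> List[Vector]:
    """Scaled dot-product attention; with causal=True, position i sees only keys 0..i."""
    out = []
    for i, q in enumerate(queries):
        limit = i + 1 if causal else len(keys)
        weights = softmax([dot(q, k) / math.sqrt(len(q)) for k in keys[:limit]])
        out.append(
            [
                sum(w * v[d] for w, v in zip(weights, values))
                for d in range(len(values[0]))
            ]
        )
    return out


def decoder_step(tokens: List[Vector], condition: List[Vector]) -> List[Vector]:
    """Masked self-attention over tokens, then cross-attention to the encoded condition."""
    self_out = attend(tokens, tokens, tokens, causal=True)
    cross_out = attend(self_out, condition, condition, causal=False)
    # Residual-style sum of the two signals (a simplification).
    return [[s + c for s, c in zip(a, b)] for a, b in zip(self_out, cross_out)]
```

The causal mask is what lets generation proceed one token at a time: the output at position i never depends on later tokens, while every position may consult the full set of encoded concepts.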
5.
J Neuroimmune Pharmacol ; 15(2): 209-223, 2020 06.
Article in English | MEDLINE | ID: mdl-31802418

ABSTRACT

HIV-1 Associated Neurocognitive Disorder (HAND) is a common and clinically detrimental complication of HIV infection. Viral proteins, including Tat, released from infected cells, cause neuronal toxicity. Substance abuse in HIV-infected patients greatly influences the severity of neuronal damage. To repurpose small molecule inhibitors for anti-HAND therapy, we employed MOLIERE, an AI-based literature mining system that we developed. All human genes were analyzed and prioritized by MOLIERE to find previously unknown targets connected to HAND. From the identified high priority genes, we narrowed the list to those with known small molecule ligands developed for other applications and lacking systemic toxicity in animal models. To validate the AI-based process, the selective small molecule inhibitor of DDX3 helicase activity, RK-33, was chosen and tested for neuroprotective activity. The compound, previously developed for cancer treatment, was tested for the prevention of combined neurotoxicity of HIV Tat and cocaine. Rodent cortical cultures were treated with 6 or 60 ng/ml of HIV Tat and 10 or 25 µM of cocaine, which caused substantial toxicity. RK-33 at doses as low as 1 µM greatly reduced the neurotoxicity of Tat and cocaine. Transcriptome analysis showed that most Tat-activated transcripts are microglia-specific genes and that RK-33 blocks their activation. Treatment with RK-33 inhibits the Tat and cocaine-dependent increase in the number and size of microglia and the proinflammatory cytokines IL-6, TNF-α, MCP-1/CCL2, MIP-2, IL-1α and IL-1β. These findings reveal that inhibition of DDX3 may have the potential to treat not only HAND but other neurodegenerative diseases.

Graphical Abstract: RK-33, a selective inhibitor of Dead Box RNA helicase 3 (DDX3), protects neurons from combined Tat and cocaine neurotoxicity by inhibition of microglia activation and production of proinflammatory cytokines.


Subject(s)
Azepines/pharmacology , Cocaine/toxicity , DEAD-box RNA Helicases/antagonists & inhibitors , Imidazoles/pharmacology , Microglia/drug effects , tat Gene Products, Human Immunodeficiency Virus/toxicity , AIDS Dementia Complex/drug therapy , AIDS Dementia Complex/enzymology , Animals , Azepines/therapeutic use , Cells, Cultured , DEAD-box RNA Helicases/metabolism , Dopamine Uptake Inhibitors/toxicity , Dose-Response Relationship, Drug , Female , Imidazoles/therapeutic use , Male , Microglia/enzymology , Rats , Rats, Sprague-Dawley
6.
Proc IEEE Int Conf Big Data ; 2018: 1494-1503, 2018 Dec.
Article in English | MEDLINE | ID: mdl-35789222

ABSTRACT

The first step of many research projects is to define and rank a short list of candidates for study. Given the rapid pace of modern scientific progress, some turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack thorough validation. Without any standard numerical evaluation method, many validate general-purpose HG systems by rediscovering a handful of historical findings, and some wishing to be more thorough may run laboratory experiments based on automatic suggestions. These methods are expensive, time consuming, and cannot scale. Thus, we present a numerical evaluation framework for the purpose of validating HG systems that leverages thousands of validation hypotheses. This method evaluates an HG system by its ability to rank hypotheses by plausibility, a process reminiscent of human candidate selection. Because some HG systems, specifically those that produce topic models, do not produce a ranking criterion, we additionally present novel metrics to quantify the plausibility of hypotheses given topic model system output. Finally, we demonstrate that our proposed validation method aligns with real-world research goals by deploying our method within MOLIERE, our recent topic-driven HG system, in order to automatically generate a set of candidate genes related to HIV-associated neurodegenerative disease (HAND). By performing laboratory experiments based on this candidate set, we discover a new connection between HAND and Dead Box RNA Helicase 3 (DDX3). Reproducibility: code, validation data, and results can be found at sybrandt.com/2018/validation.
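
Evaluating a system "by its ability to rank hypotheses by plausibility" can be made concrete with a ranking metric. The sketch below uses ROC AUC, computed as the fraction of (plausible, implausible) pairs that the system orders correctly. The abstract does not state which metric the framework reports, so ROC AUC and the function name `ranking_auc` are assumptions chosen for illustration.

```python
from typing import Sequence


def ranking_auc(
    plausible_scores: Sequence[float], implausible_scores: Sequence[float]
) -> float:
    """Probability that a plausible hypothesis outscores an implausible one.

    Ties count as half a win. 0.5 means random ranking, 1.0 a perfect one.
    """
    if not plausible_scores or not implausible_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in plausible_scores:
        for n in implausible_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(plausible_scores) * len(implausible_scores))
```

The pairwise loop is quadratic in the number of hypotheses. For thousands of validation hypotheses, a sort-based computation would be the usual choice, but the pairwise form makes the meaning of the score explicit.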

7.
Appl Netw Sci ; 2(1): 36, 2017.
Article in English | MEDLINE | ID: mdl-30533515

ABSTRACT

Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks including verification and simulation studies. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica; (c) use ReCoN to produce networks much larger than the original exemplar; and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size.

8.
KDD ; 2017: 1633-1642, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29430330

ABSTRACT

Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI). These objects include but are not limited to scientific papers, keywords, genes, proteins, diseases, and diagnoses. We model hypotheses using Latent Dirichlet Allocation applied on abstracts found near shortest paths discovered within this network, and demonstrate the effectiveness of MOLIERE by performing hypothesis generation on historical data. Our network, implementation, and resulting data are all publicly available for the broad scientific community.
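
The first half of the pipeline described above (find a shortest path between two concepts in the network, then gather the abstracts lying near that path) can be sketched as follows. This is an illustration under stated assumptions, not MOLIERE's code: the graph is treated as unweighted and searched with breadth-first search, the node naming scheme (`gene:`, `pmid:`, `kw:`) is hypothetical, and the topic modeling step that would follow is omitted.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set

Graph = Dict[str, List[str]]


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Breadth-first shortest path in an unweighted graph, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


def abstracts_near_path(
    graph: Graph, path: List[str], is_abstract: Callable[[str], bool]
) -> Set[str]:
    """Abstract nodes that are on the path or adjacent to a node on it."""
    near: Set[str] = set()
    for node in path:
        if is_abstract(node):
            near.add(node)
        near.update(nbr for nbr in graph.get(node, ()) if is_abstract(nbr))
    return near
```

The set returned by `abstracts_near_path` is the kind of document collection on which a topic model such as Latent Dirichlet Allocation would then be fitted.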

9.
PLoS One ; 11(5): e0155119, 2016.
Article in English | MEDLINE | ID: mdl-27195952

ABSTRACT

This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: it is noisy, has missing entries, and is imbalanced in the classes of interest, all of which lead to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized data-preprocessing and classification techniques. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expectation maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results.


Subject(s)
Data Mining/methods , Medical Informatics/instrumentation , Medical Informatics/methods , Support Vector Machine , Algorithms , Benchmarking , Databases, Factual , Electronic Health Records , False Positive Reactions , Health Services Research , Humans , Models, Theoretical , Regression Analysis , Software
10.
FEBS Lett ; 580(6): 1672-6, 2006 Mar 06.
Article in English | MEDLINE | ID: mdl-16497302

ABSTRACT

Partitioning of aminoacyl-tRNA synthetases and their associated amino acids into two classes allows us to distinguish between thermophilic and mesophilic species based only on amino acid composition. The CLASSDB program has been developed for amino acid content analysis in organisms treated individually or pooled together to form a pattern of characteristic properties. A strong correlation has been observed between optimal growth temperature (OGT) of organisms and class II amino acid content. Amino acid composition in organisms closely related phylogenetically but dissimilar in their OGT testifies that thermo-adaptation happens rather rapidly on the time scale of evolution.


Subject(s)
Amino Acids/analysis , Archaea/chemistry , Archaea/growth & development , Bacteria/chemistry , Bacteria/growth & development , Adaptation, Biological , Evolution, Molecular , Software , Temperature
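
The quantity at the center of the abstract above, the share of class II amino acids in a proteome, is simple to compute. The sketch below is not the CLASSDB program. It assumes the commonly cited partition of the twenty standard amino acids by aminoacyl-tRNA synthetase class, with lysine placed in class II; the partition used in the paper may differ in such details.

```python
# One-letter codes grouped by aminoacyl-tRNA synthetase class.
# Assumption: the commonly cited partition, with lysine (K) in class II.
CLASS_I = frozenset("RCQEILMWYV")
CLASS_II = frozenset("ANDGHKFPST")


def class_ii_fraction(sequence: str) -> float:
    """Fraction of standard residues in `sequence` that belong to class II.

    Characters outside the twenty standard amino acids (gaps, X, stops)
    are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in CLASS_I or aa in CLASS_II]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(aa in CLASS_II for aa in residues) / len(residues)
```

Applied to concatenated protein sequences of an organism, this yields the single composition value that the abstract relates to optimal growth temperature.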