Results 1 - 20 of 48
1.
Information (Basel) ; 15(1)2024 Jan.
Article in English | MEDLINE | ID: mdl-38665395

ABSTRACT

The ability to translate Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) into different modalities and data types is essential to improve Deep Learning (DL) for predictive medicine. This work presents DACMVA, a novel framework to conduct data augmentation in a cross-modal dataset by translating between modalities and oversampling imputations of missing data. DACMVA was inspired by previous work on the alignment of latent spaces in Autoencoders. DACMVA is a DL data augmentation pipeline that improves the performance in a downstream prediction task. The unique DACMVA framework leverages a cross-modal loss to improve the imputation quality and employs training strategies to enable regularized latent spaces. Oversampling of augmented data is integrated into the prediction training. It is empirically demonstrated that the new DACMVA framework is effective in the often-neglected scenario of DL training on tabular data with continuous labels. Specifically, DACMVA is applied towards cancer survival prediction on tabular gene expression data where there is a portion of missing data in a given modality. DACMVA significantly (p << 0.001, one-sided Wilcoxon signed-rank test) outperformed the non-augmented baseline and competing augmentation methods with varying percentages of missing data (4%, 90%, 95% missing). As such, DACMVA provides significant performance improvements, even in very-low-data regimes, over existing state-of-the-art methods, including TDImpute and oversampling alone.
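
The abstract reports a one-sided Wilcoxon signed-rank test. As a rough illustration only (not the authors' evaluation code), the sketch below computes that test for paired scores, assuming one DACMVA score and one baseline score per evaluation run. It uses the large-sample normal approximation without tie or continuity corrections; in practice scipy.stats.wilcoxon(x, y, alternative="greater") would normally be used instead.

```python
import math


def wilcoxon_signed_rank_one_sided(x, y):
    """One-sided Wilcoxon signed-rank test of H1: x tends to exceed y.

    Returns (W_plus, p_value). The p-value uses the large-sample normal
    approximation (no tie or continuity correction), so treat it as a
    rough guide for small samples.
    """
    # Paired differences; zero differences are discarded (Wilcoxon's convention).
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| ascending, giving tied magnitudes their average (1-based) rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    # W+ is the sum of ranks belonging to positive differences.
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    # Upper-tail probability of a standard normal variate.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return w_plus, p
```

For example, wilcoxon_signed_rank_one_sided([5, 6, 7, 8, 9], [1, 1, 1, 1, 1]) gives W+ = 15, the maximum possible for five pairs.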

2.
Int J Mol Sci ; 25(8)2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38674089

ABSTRACT

Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. This study's goal was to identify the signaling drivers and pathways that modulate glomerular endothelial dysfunction in DKD via artificial intelligence-enabled literature-based discovery. Cross-domain text mining of 33+ million PubMed articles was performed with SemNet 2.0 to identify and rank multi-scalar and multi-factorial pathophysiological concepts related to DKD. Relevant genes and proteins identified as regulators of different pathological events associated with DKD were analyzed and ranked using normalized mean HeteSim scores. High-ranking genes and proteins intersected three domains: DKD, the immune response, and glomerular endothelial cells. The top 10% of ranked concepts were mapped to the following biological functions: angiogenesis, apoptotic processes, cell adhesion, chemotaxis, growth factor signaling, vascular permeability, the nitric oxide response, oxidative stress, the cytokine response, macrophage signaling, NFκB factor activity, the TLR pathway, glucose metabolism, the inflammatory response, the ERK/MAPK signaling response, the JAK/STAT pathway, the T-cell-mediated response, the WNT/β-catenin pathway, the renin-angiotensin system, and NADPH oxidase activity. High-ranking genes and proteins were used to generate a protein-protein interaction network. The study results prioritized interactions or molecules involved in dysregulated signaling in DKD, which can be further assessed through biochemical network models or experiments.


Subject(s)
Data Mining , Diabetic Nephropathies , Diabetic Nephropathies/metabolism , Diabetic Nephropathies/genetics , Diabetic Nephropathies/pathology , Humans , Signal Transduction , Protein Interaction Maps
3.
J Alzheimers Dis Rep ; 8(1): 371-385, 2024.
Article in English | MEDLINE | ID: mdl-38549638

ABSTRACT

Background: Amyloid-β plaques (Aβ) are associated with Alzheimer's disease (AD). Pooled assessment of amyloid reduction in transgenic AD mice is critical for expediting anti-amyloid AD therapeutic research. Objective: The mean threshold of Aβ reduction necessary to achieve cognitive improvement was measured via pooled assessment (n = 594 mice) of Morris water maze (MWM) escape latency of transgenic AD mice treated with substances intended to reduce Aβ via reduction of beta-secretase cleaving enzyme (BACE). Methods: Machine learning and statistical methods identified necessary amyloid reduction levels using mouse data (e.g., APP/PS1, LPS, Tg2576, 3xTg-AD, control, wild type, treated, untreated) curated from 22 published studies. Results: K-means clustering identified 4 clusters that primarily corresponded to the level of Aβ: untreated transgenic AD control mice, wild type mice, and two clusters of transgenic AD mice treated with BACE inhibitors that had either an average 25% "medium reduction" of Aβ or 50% "high reduction" of Aβ compared to untreated control. A 25% Aβ reduction achieved a 28% cognitive improvement, and a 50% Aβ reduction resulted in a significant 32% improvement compared to untreated transgenic mice (p < 0.05). Comparatively, wild type mice had a mean 41% MWM latency improvement over untreated transgenic mice (p < 0.05). BACE reduction had a lesser impact on the ratio of Aβ42 to Aβ40. Supervised learning with an 80%-20% train-test split confirmed Aβ reduction was a key feature for predicting MWM escape latency (R² = 0.8 to 0.95). Conclusions: Results suggest a 25% reduction in Aβ as a meaningful treatment threshold for improving transgenic AD mouse cognition.
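
The abstract reports that k-means clustering separated the pooled mouse data into 4 clusters. The following is a generic sketch of Lloyd's k-means algorithm in plain Python, not the study's pipeline: the feature vectors, the choice of k, and any scaling used in the study are not given here, and the toy points in the usage example are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels). Initial centroids are k distinct input
    points drawn at random, so results can depend on the seed.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

On two well-separated hypothetical groups such as [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)] with k = 2, the first three and last three points end up in different clusters.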

4.
J Clin Med ; 13(6)2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38542012

ABSTRACT

Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either "high risk" or "low risk" in pediatric ALL (n = 580) and AML (n = 132) with accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.

5.
Cancers (Basel) ; 15(17)2023 Aug 31.
Article in English | MEDLINE | ID: mdl-37686630

ABSTRACT

Chronic myeloid leukemia (CML) is treated with tyrosine kinase inhibitors (TKIs) that target the pathological BCR-ABL1 fusion oncogene. The objective of this statistical meta-analysis was to assess the prevalence of other hematological adverse events (AEs) that occur during or after predominantly first-line treatment with TKIs. Data from seventy peer-reviewed, published studies were included in the analysis. Hematological AEs were assessed as a function of TKI drug type (dasatinib, imatinib, bosutinib, nilotinib) and CML phase (chronic, accelerated, blast). AE prevalence aggregated across all severities and phases was significantly different (p < 0.05) among TKIs for anemia: dasatinib (54.5%), bosutinib (44.0%), imatinib (32.8%), nilotinib (11.2%); neutropenia: dasatinib (51.2%), imatinib (29.8%), bosutinib (14.1%), nilotinib (14.1%); thrombocytopenia: dasatinib (62.2%), imatinib (30.4%), bosutinib (35.3%), nilotinib (22.3%). AE prevalence aggregated across all severities and TKIs was significantly different (p < 0.05) among CML phases for anemia: chronic (28.4%), accelerated (66.9%), blast (55.8%); neutropenia: chronic (26.7%), accelerated (63.8%), blast (36.4%); thrombocytopenia: chronic (33.3%), accelerated (65.6%), blast (37.9%). An odds ratio (OR) with 95% confidence interval was used to compare the hematological AE prevalence of each TKI with that of imatinib, the most common first-line TKI therapy. For anemia: dasatinib OR = 1.65 [1.51, 1.83]; bosutinib OR = 1.34 [1.16, 1.54]; nilotinib OR = 0.34 [0.30, 0.39]. For neutropenia: dasatinib OR = 1.72 [1.53, 1.92]; bosutinib OR = 0.47 [0.38, 0.58]; nilotinib OR = 0.47 [0.42, 0.54]. For thrombocytopenia: dasatinib OR = 2.04 [1.82, 2.30]; bosutinib OR = 1.16 [0.97, 1.39]; nilotinib OR = 0.73 [0.65, 0.82]. Nilotinib had the greatest fraction of severe (grade 3/4) hematological AEs (30%). In conclusion, the overall prevalence of hematological AEs by TKI type was: dasatinib > bosutinib > imatinib > nilotinib.
Study limitations include inability to normalize for dosage and treatment duration.
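
The abstract compares each TKI with imatinib using odds ratios and 95% confidence intervals but does not state how the intervals were computed. As a hedged illustration, the sketch below uses the common Woolf (log odds ratio) method on a single 2x2 table; the counts in the example are hypothetical, and a pooled meta-analytic estimate may have been computed differently.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (Wald, log-scale) confidence interval from a 2x2 table.

    a: patients with the AE on the drug of interest
    b: patients without the AE on the drug of interest
    c: patients with the AE on the reference drug (e.g., imatinib)
    d: patients without the AE on the reference drug
    All four counts must be positive; no zero-cell correction is applied.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

With hypothetical counts a = 20, b = 80, c = 10, d = 90, the OR is 2.25 and the 95% interval just barely includes 1, showing how a large OR can still be non-significant at small sample sizes.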

6.
Biology (Basel) ; 12(9)2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37759668

ABSTRACT

Multiple studies have reported new or exacerbated persistent or resistant hypertension in patients previously infected with COVID-19. We used literature-based discovery to identify and prioritize multi-scalar explanatory biology that relates resistant hypertension to COVID-19. Cross-domain text mining of 33+ million PubMed articles within a comprehensive knowledge graph was performed using SemNet 2.0. Unsupervised rank aggregation determined which concepts were most relevant utilizing the normalized HeteSim score. A series of simulations identified concepts directly related to COVID-19 and resistant hypertension or connected via one of three renin-angiotensin-aldosterone system hub nodes (mineralocorticoid receptor, epithelial sodium channel, angiotensin I receptor). The top-ranking concepts relating COVID-19 to resistant hypertension included: cGMP-dependent protein kinase II, MAP3K1, haspin, ral guanine nucleotide exchange factor, N-(3-Oxododecanoyl)-L-homoserine lactone, aspartic endopeptidases, metabotropic glutamate receptors, choline-phosphate cytidylyltransferase, protein tyrosine phosphatase, tat genes, MAP3K10, uridine kinase, dicer enzyme, CMD1B, USP17L2, FLNA, exportin 5, somatotropin releasing hormone, beta-melanocyte stimulating hormone, pegylated leptin, beta-lipoprotein, corticotropin, growth hormone-releasing peptide 2, pro-opiomelanocortin, alpha-melanocyte stimulating hormone, prolactin, thyroid hormone, poly-beta-hydroxybutyrate depolymerase, CR 1392, BCR-ABL fusion gene, high density lipoprotein sphingomyelin, pregnancy-associated murine protein 1, recQ4 helicase, immunoglobulin heavy chain variable domain, aglycotransferrin, host cell factor C1, ATP6V0D1, imipramine demethylase, TRIM40, H3C2 gene, COL1A1+COL1A2 gene, QARS gene, VPS54, TPM2, MPST, EXOSC2, ribosomal protein S10, TAP-144, gonadotropins, human gonadotropin releasing hormone 1, beta-lipotropin, octreotide, salmon calcitonin, des-n-octanoyl ghrelin, liraglutide, gastrins. 
Concepts were mapped to six physiological themes: altered endocrine function, 23.1%; inflammation or cytokine storm, 21.3%; lipid metabolism and atherosclerosis, 17.6%; sympathetic input to blood pressure regulation, 16.7%; altered entry of COVID-19 virus, 14.8%; and unknown, 6.5%.

7.
Int J Mol Sci ; 24(15)2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37569714

ABSTRACT

Parkinson's disease (PD) is a movement disorder caused by a dopamine deficit in the brain. Current therapies primarily focus on dopamine modulators or replacements, such as levodopa. Although dopamine replacement can help alleviate PD symptoms, therapies targeting the underlying neurodegenerative process are limited. The study objective was to use artificial intelligence to rank the most promising repurposed drug candidates for PD. Natural language processing (NLP) techniques were used to extract text relationships from 33+ million biomedical journal articles from PubMed and map relationships between genes, proteins, drugs, diseases, etc., into a knowledge graph. Cross-domain text mining, hub network analysis, and unsupervised learning rank aggregation were performed in SemNet 2.0 to predict the most relevant drug candidates to levodopa and PD using relevance-based HeteSim scores. The top predicted adjuvant PD therapies included ebastine, an antihistamine for perennial allergic rhinitis; levocetirizine, another antihistamine; vancomycin, a powerful antibiotic; captopril, an angiotensin-converting enzyme (ACE) inhibitor; and neramexane, an N-methyl-D-aspartate (NMDA) receptor antagonist. Cross-domain text mining predicted that antihistamines exhibit the capacity to synergistically alleviate Parkinsonian symptoms when used with dopamine modulators like levodopa or levodopa-carbidopa. The relationship patterns among the identified adjuvant candidates suggest that the likely therapeutic mechanism(s) of action of antihistamines for combatting the multi-factorial PD pathology include counteracting oxidative stress, amending the balance of neurotransmitters, and decreasing the proliferation of inflammatory mediators. Finally, and interestingly, cross-domain text mining predicted a strong relationship between PD and liver disease.


Subject(s)
Parkinson Disease , Humans , Parkinson Disease/drug therapy , Levodopa/therapeutic use , Antiparkinson Agents/pharmacology , Dopamine/therapeutic use , Artificial Intelligence , Angiotensin-Converting Enzyme Inhibitors/therapeutic use , Histamine Antagonists/therapeutic use
8.
Bioengineering (Basel) ; 10(8)2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37627803

ABSTRACT

This work presents SeizFt, a novel seizure detection framework that utilizes machine learning to automatically detect seizures using wearable SensorDot EEG data. Inspired by interpretable sleep staging, our novel approach employs a unique combination of data augmentation, meaningful feature extraction, and an ensemble of decision trees to improve resilience to variations in EEG and to increase the capacity to generalize to unseen data. Fourier Transform (FT) surrogates were utilized to increase sample size and improve the class balance between labeled non-seizure and seizure epochs. To enhance model stability and accuracy, SeizFt utilizes an ensemble of decision trees through the CatBoost classifier to classify each second of EEG recording as seizure or non-seizure. The SeizIt1 dataset was used for training, and the SeizIt2 dataset for validation and testing. Model performance for seizure detection was evaluated using two primary metrics: sensitivity using the any-overlap method (OVLP) and False Alarm (FA) rate using epoch-based scoring (EPOCH). Notably, SeizFt placed first among an array of state-of-the-art seizure detection algorithms as part of the Seizure Detection Grand Challenge at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP). SeizFt outperformed state-of-the-art black-box models in accurate seizure detection and minimized false alarms, obtaining a total score of 40.15, combining OVLP and EPOCH across two tasks and representing an improvement of ~30% over the next best approach. The interpretability of SeizFt is a key advantage, as it fosters trust and accountability among healthcare professionals. The most predictive seizure detection features extracted from SeizFt were: delta wave, interquartile range, standard deviation, total absolute power, theta wave, the ratio of delta to theta, binned entropy, Hjorth complexity, delta + theta, and Higuchi fractal dimension.
In conclusion, the successful application of SeizFt to wearable SensorDot data suggests its potential for real-time, continuous monitoring to improve personalized medicine for epilepsy.
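
The abstract states that Fourier transform surrogates were used to enlarge the training set and balance seizure and non-seizure epochs. The sketch below shows the generic phase-randomization idea on one hypothetical single-channel epoch: Fourier amplitudes are kept and phases are randomized with Hermitian symmetry so the result stays real. It is an assumption-laden illustration, not SeizFt's code; the naive O(n²) DFT stands in for numpy.fft only to keep the example dependency-free.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short examples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(X):
    """Inverse DFT, keeping only the real part (imaginary part is ~0 here)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def ft_surrogate(x, seed=0):
    """Phase-randomized Fourier-transform surrogate of a real-valued epoch.

    Every Fourier amplitude (hence the power spectrum) is preserved; phases
    of the positive-frequency bins are drawn uniformly and mirrored as
    complex conjugates so that the inverse transform is real.
    """
    rng = random.Random(seed)
    n = len(x)
    X = dft(x)
    Y = list(X)
    for k in range(1, (n + 1) // 2):  # positive, non-Nyquist frequencies
        phi = rng.uniform(0, 2 * math.pi)
        Y[k] = abs(X[k]) * cmath.exp(1j * phi)
        Y[n - k] = Y[k].conjugate()  # Hermitian symmetry -> real output
    # DC (k = 0) and, for even n, the Nyquist bin (k = n/2) are left unchanged.
    return idft_real(Y)
```

Because every amplitude is kept, the surrogate has the same power spectrum as the input epoch, which is what makes it plausible as an augmented sample for spectral features such as the delta and theta band powers listed above.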

9.
Res Sq ; 2023 Feb 28.
Article in English | MEDLINE | ID: mdl-36909654

ABSTRACT

Alzheimer's disease (AD) progresses through a lengthy asymptomatic period during which pathological changes accumulate prior to development of clinical symptoms. As disease-modifying treatments are developed, tools to stratify risk of clinical disease will be required to guide their use. In this study, we examine the relationship of AD biomarkers in healthy middle-aged individuals to health history, family history, and neuropsychological measures and identify cerebrospinal fluid (CSF) biomarkers to stratify risk of progression from asymptomatic to symptomatic AD. CSF samples from cognitively normal (CN) individuals (N=1149) in the Emory Healthy Brain Study were assayed for Aβ42, total Tau (tTau), and phospho181-Tau (pTau), and a subset of 134 cognitively normal, but biomarker-positive, individuals were identified with asymptomatic AD (AsymAD) based on a locally determined cutoff value for the ratio of tTau to Aβ42. These AsymAD cases were matched for demographic features with 134 biomarker-negative controls (CN/BM-) and compared for differences in medical comorbidities and family history. Dyslipidemia emerged as a distinguishing feature between AsymAD and CN/BM- groups with significant association with personal and family history of dyslipidemia. A weaker relationship was seen with diabetes, but there was no association with hypertension. Examination of the full cohort by median regression revealed a significant relationship of CSF Aβ42 (but not tTau or pTau) with dyslipidemia and diabetes. On neuropsychological tests, CSF Aβ42 was not correlated with performance on any measures, but tTau and pTau were strongly correlated with visuospatial perception and visual episodic memory. In addition to traditional CSF AD biomarkers, a panel of AD biomarker peptides derived from integrating brain and CSF proteomes were evaluated using machine learning strategies to identify a set of 8 peptides that accurately classified CN/BM- and symptomatic AD CSF samples with AUC of 0.982.
Using these 8 peptides in a low dimensional t-distributed Stochastic Neighbor Embedding analysis and k-Nearest Neighbor (k=5) algorithm, AsymAD cases were stratified into "Control-like" and "AD-like" subgroups based on their proximity to CN/BM- or AD CSF profiles. Independent analysis of these cases using a Joint Mutual Information algorithm selected a set of 5 peptides with 81% accuracy in stratifying cases into AD-like and Control-like subgroups. Performance of both sets of peptides was evaluated and validated in an independent data set from the Alzheimer's Disease Neuroimaging Initiative. Based on our findings, we conclude that there is an important role of lipid metabolism in asymptomatic stages of AD. Visuospatial perception and visual episodic memory may be more sensitive than language-based abilities to earliest stages of cognitive decline in AD. Finally, candidate CSF peptides show promise as next generation biomarkers for predicting progression from asymptomatic to symptomatic stages of AD.
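
The abstract stratifies AsymAD cases with a k-nearest-neighbor rule (k = 5) applied in a t-SNE embedding. The sketch below shows only the majority-vote k-NN step on hypothetical 2-D coordinates with Euclidean distance; the embedding itself, the distance metric, and any tie handling used in the study are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Return the majority label among the k training points nearest to `query`.

    Distances are Euclidean; with an odd k and two classes, a tie is impossible.
    """
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Here a query point near the hypothetical "Control" reference profiles would be called "Control-like", and one near the "AD" profiles "AD-like".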

10.
J Alzheimers Dis ; 92(2): 411-424, 2023.
Article in English | MEDLINE | ID: mdl-36776048

ABSTRACT

BACKGROUND: The complex and not yet fully understood etiology of Alzheimer's disease (AD) shows important proteopathic signs which are unlikely to be linked to a single protein. However, protein subsets from deep proteomic datasets can be useful in stratifying patient risk, identifying stage dependent disease markers, and suggesting possible disease mechanisms. OBJECTIVE: The objective was to identify protein subsets that best classify subjects into control, asymptomatic Alzheimer's disease (AsymAD), and AD. METHODS: Data comprised 6 cohorts; 620 subjects; 3,334 proteins. Brain tissue-derived predictive protein subsets for classifying AD, AsymAD, or control were identified and validated with label-free quantification and machine learning. RESULTS: A 29-protein subset accurately classified AD (AUC = 0.94). However, an 88-protein subset best predicted AsymAD (AUC = 0.92) or Control (AUC = 0.92) from AD (AUC = 0.98). AD versus Control: APP, DHX15, NRXN1, PBXIP1, RABEP1, STOM, and VGF. AD versus AsymAD: ALDH1A1, BDH2, C4A, FABP7, GABBR2, GNAI3, PBXIP1, and PRKAR1B. AsymAD versus Control: APP, C4A, DMXL1, EXOC2, PITPNB, RABEP1, and VGF. Additional predictors: DNAJA3, PTBP2, SLC30A9, VAT1L, CROCC, PNP, SNCB, ENPP6, HAPLN2, PSMD4, and CMAS. CONCLUSION: Biomarkers were dynamically separable across disease stages. Predictive proteins were significantly enriched for sugar metabolism.
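
Results in this abstract are reported as areas under the ROC curve (AUC). As a small, generic illustration (not the authors' evaluation code), AUC can be computed directly as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, counting ties as one half; the scores below are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC via the Mann-Whitney formulation.

    Counts, over all positive/negative pairs, how often the positive case
    scores higher (ties count 0.5). O(P * N), which is fine for small sets.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For hypothetical scores roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]), eight of the nine pairs are ordered correctly, so the AUC is 8/9.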


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/metabolism , Proteomics , Brain/metabolism , Machine Learning , Sugars/metabolism , HSP40 Heat-Shock Proteins/metabolism , Hydroxybutyrate Dehydrogenase/metabolism , Proteins/metabolism
11.
Proc Conf Empir Methods Nat Lang Process ; 2023: 14462-14478, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38756862

ABSTRACT

Biomedical entity linking (BioEL) is the process of connecting entities referenced in documents to entries in biomedical databases such as the Unified Medical Language System (UMLS) or Medical Subject Headings (MeSH). The study objective was to comprehensively evaluate nine recent state-of-the-art biomedical entity linking models under a unified framework. We compare these models along axes of (1) accuracy, (2) speed, (3) ease of use, (4) generalization, and (5) adaptability to new ontologies and datasets. We additionally quantify the impact of various preprocessing choices such as abbreviation detection. Systematic evaluation reveals several notable gaps in current methods. In particular, current methods struggle to correctly link genes and proteins and often have difficulty effectively incorporating context into linking decisions. To expedite future development and baseline testing, we release our unified evaluation framework and all included models on GitHub at https://github.com/davidkartchner/biomedical-entity-linking.

12.
Article in English | MEDLINE | ID: mdl-38682049

ABSTRACT

Seizure detection using machine learning is a critical problem for the timely intervention and management of epilepsy. We propose SeizFt, a robust seizure detection framework using EEG from a wearable device. It uses features paired with an ensemble of trees, thus enabling further interpretation of the model's results. The efficacy of the underlying augmentation and class-balancing strategy is also demonstrated. This study was performed for the Seizure Detection Challenge 2023, an ICASSP Grand Challenge.
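
SeizFt pairs hand-crafted EEG features with an ensemble of trees, and the companion SeizFt abstract above lists Hjorth complexity among its most predictive features. The sketch below computes the standard Hjorth parameters (activity, mobility, complexity) for one hypothetical single-channel window, approximating derivatives with first differences; SeizFt's actual feature extraction, windowing, and sampling rate are not described here.

```python
import math


def _var(x):
    """Population variance of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def hjorth(x):
    """Hjorth activity, mobility, and complexity of a 1-D signal.

    activity   = var(x)
    mobility   = sqrt(var(x') / var(x))
    complexity = mobility(x') / mobility(x)
    Derivatives are approximated by first differences.
    """
    dx = [b - a for a, b in zip(x, x[1:])]
    ddx = [b - a for a, b in zip(dx, dx[1:])]
    activity = _var(x)
    mobility = math.sqrt(_var(dx) / activity)
    complexity = math.sqrt(_var(ddx) / _var(dx)) / mobility
    return activity, mobility, complexity
```

A pure sinusoid has complexity close to 1; broadband or irregular activity pushes the value higher, which is why it can help separate rhythmic from non-rhythmic EEG segments.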

13.
Inf Process Med Imaging ; 13939: 208-221, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38680427

ABSTRACT

The Event Based Model (EBM) is a probabilistic generative model to explore biomarker changes occurring as a disease progresses. Disease progression is hypothesized to occur through a sequence of biomarker dysregulation "events". The EBM estimates the biomarker dysregulation event sequence. It computes the data likelihood for a given dysregulation sequence and subsequently evaluates the posterior distribution on the dysregulation sequence. Since the posterior distribution is intractable, Markov chain Monte Carlo (MCMC) is employed to generate samples under the posterior distribution. However, the set of possible sequences grows as N!, where N is the number of biomarkers (data dimension), and quickly becomes prohibitively large for effective sampling via MCMC. This work proposes the "scaled EBM" (sEBM) to enable event based modeling on large biomarker sets (i.e., high-dimensional data). First, sEBM implicitly selects a subset of biomarkers useful for modeling disease progression and infers the event sequence only for that subset. Second, sEBM clusters biomarkers with similar positions in the event sequence and only orders the "clusters", with each successive cluster corresponding to the next stage in disease progression. Together, these two modifications provably reduce the space of possible event sequences by multiple orders of magnitude. The modifications are supported by theory, and experiments on synthetic and real clinical data validate sEBM in higher-dimensional settings. Results on synthetic data with known ground truth show that sEBM outperforms previous EBM variants as data dimensions increase. sEBM was successfully implemented with up to 300 biomarkers, which is a 6-fold increase over previous EBM applications.
A real-world clinical application of sEBM is performed using 119 neuroimaging markers from publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) data to stratify subjects into 6 stages of disease progression. Subjects included cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer's Disease (AD) groups. sEBM stage differs significantly among the 3 groups (χ² p-value < 4.6e-32). For MCI subjects, increased sEBM stage is a strong predictor of conversion risk to AD (p-value < 2.3e-14), as verified with a Cox proportional-hazards model adjusted for age, sex, education, and APOE4 status. Like EBM, sEBM does not rely on a priori defined diagnostic labels and only uses cross-sectional data.
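
The EBM likelihood and the N! search space described above can be made concrete with a toy example. This sketch assumes Gaussian "normal" and "abnormal" models with hypothetical means shared by every biomarker and a uniform prior over stages, and it replaces MCMC with exhaustive search, which is only feasible for a handful of biomarkers. It is not the sEBM method, and all names and parameters are illustrative.

```python
import itertools
import math

# Hypothetical per-biomarker models: normal ~ N(0, 1), abnormal ~ N(3, 1).
NORMAL_MU, ABNORMAL_MU, SIGMA = 0.0, 3.0, 1.0


def gauss_pdf(x, mu, sigma=SIGMA):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def log_likelihood(data, sequence):
    """EBM log-likelihood of cross-sectional data under one event sequence.

    Each subject's likelihood marginalizes over stages k = 0..N with a
    uniform prior: events sequence[:k] have occurred (abnormal model),
    the remaining ones have not (normal model).
    """
    n = len(sequence)
    total = 0.0
    for x in data:
        subject = 0.0
        for k in range(n + 1):
            term = 1.0 / (n + 1)
            for pos, biomarker in enumerate(sequence):
                mu = ABNORMAL_MU if pos < k else NORMAL_MU
                term *= gauss_pdf(x[biomarker], mu)
            subject += term
        total += math.log(subject)
    return total


def best_sequence(data, n_biomarkers):
    """Exhaustive maximum-likelihood search over all N! orderings."""
    return max(itertools.permutations(range(n_biomarkers)),
               key=lambda s: log_likelihood(data, s))
```

Exhaustive search visits N! orderings: math.factorial(10) is 3,628,800, and for the 300 biomarkers mentioned above the count is astronomically large, which is why EBM relies on MCMC and why sEBM shrinks the search space.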

14.
Cancers (Basel) ; 14(19)2022 Sep 26.
Article in English | MEDLINE | ID: mdl-36230609

ABSTRACT

Tyrosine kinase inhibitors (TKIs) are prescribed for chronic myeloid leukemia (CML) and some other cancers. The objective was to predict and rank TKI-related adverse events (AEs), including under-reported or preclinical AEs, using novel text mining. First, k-means clustering of 2575 clinical CML TKI abstracts separated TKIs by significant (p < 0.05) AE type: gastrointestinal (bosutinib); edema (imatinib); pulmonary (dasatinib); diabetes (nilotinib); cardiovascular (ponatinib). Next, we propose a novel cross-domain text mining method utilizing a knowledge graph, link prediction, and hub node network analysis to predict new relationships. Cross-domain text mining of 30+ million articles via SemNet predicted and ranked known and novel TKI AEs. Three physiology-based tiers were formed using unsupervised rank aggregation feature importance. Tier 1 ranked in the top 1%: hematology (anemia, neutropenia, thrombocytopenia, hypocellular marrow); glucose (diabetes, insulin resistance, metabolic syndrome); iron (deficiency, overload, metabolism); cardiovascular (hypertension, heart failure, vascular dilation); thyroid (hypothyroidism, hyperthyroidism, parathyroid). Tier 2 ranked in the top 5%: inflammation (chronic inflammatory disorder, autoimmune, periodontitis); kidney (glomerulonephritis, glomerulopathy, toxic nephropathy). Tier 3 ranked in the top 10%: gastrointestinal (bowel regulation, hepatitis, pancreatitis); neuromuscular (autonomia, neuropathy, muscle pain); others (secondary cancers, vitamin deficiency, edema). Results suggest proactive TKI patient AE surveillance levels: regular surveillance for tier 1, infrequent surveillance for tier 2, and symptom-based surveillance for tier 3.

15.
Big Data Cogn Comput ; 6(1)2022 Mar.
Article in English | MEDLINE | ID: mdl-35936510

ABSTRACT

Literature-based discovery (LBD) summarizes information and generates insight from large text corpora. The SemNet framework utilizes a large heterogeneous information network or "knowledge graph" of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is a comprehensive open-source software for significantly faster, more effective, and user-friendly means of automated biomedical LBD. An example case is performed to rank relationships between Alzheimer's disease and metabolic co-morbidities.
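
SemNet ranks concepts with the HeteSim metapath relevance measure. As a hedged sketch of the published HeteSim definition (not SemNet 2.0's randomized implementation, normalization, or rank aggregation), the code below scores a source node a and a target node c along a length-2 metapath A to B to C: a walks forward to the middle node type, c walks backward along the reversed path, and the score is the cosine similarity of the two resulting probability distributions over B nodes. The adjacency matrices and function names are hypothetical.

```python
import math


def row_normalize(adj):
    """Transition-probability matrix: each nonzero row of a 0/1 adjacency sums to 1."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def hetesim_len2(adj_ab, adj_bc, a, c):
    """HeteSim(a, c) along the metapath A -> B -> C.

    adj_ab[i][j] = 1 if A-node i links to B-node j; adj_bc likewise for B -> C.
    a reaches B forward (A -> B); c reaches B backward (C -> B, the reverse
    of B -> C). The score is the cosine of the two reach distributions.
    """
    u_ab = row_normalize(adj_ab)             # A -> B transition probabilities
    u_cb = row_normalize(transpose(adj_bc))  # C -> B transition probabilities
    pa, pc = u_ab[a], u_cb[c]
    dot = sum(x * y for x, y in zip(pa, pc))
    norm = math.sqrt(sum(x * x for x in pa)) * math.sqrt(sum(y * y for y in pc))
    return dot / norm if norm else 0.0
```

Scores lie in [0, 1]: two nodes that reach exactly the same middle nodes with the same probabilities score 1, and nodes whose paths never meet score 0.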

16.
Big Data Cogn Comput ; 6(2)2022 Jun.
Article in English | MEDLINE | ID: mdl-35847767

ABSTRACT

Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define breadth and size of data effectively visualized. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, Neo4j, NodeXL, and Gephi.

17.
Artif Intell ; 3(1): 211-228, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35845102

ABSTRACT

A major bottleneck preventing the extension of deep learning systems to new domains is the prohibitive cost of acquiring sufficient training labels. Alternatives such as weak supervision, active learning, and fine-tuning of pretrained models reduce this burden but require substantial human input to select a highly informative subset of instances or to curate labeling functions. REGAL (Rule-Enhanced Generative Active Learning) is an improved framework for weakly supervised text classification that performs active learning over labeling functions rather than individual instances. REGAL interactively creates high-quality labeling patterns from raw text, enabling a single annotator to accurately label an entire dataset after initialization with three keywords for each class. Experiments demonstrate that REGAL extracts up to 3 times as many high-accuracy labeling functions from text as current state-of-the-art methods for interactive weak supervision, enabling REGAL to dramatically reduce the annotation burden of writing labeling functions for weak supervision. Statistical analysis reveals that REGAL performs as well as or significantly better than interactive weak supervision on five of six commonly used natural language processing (NLP) baseline datasets.
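
REGAL performs active learning over labeling functions seeded with a few keywords per class. The sketch below illustrates only the generic weak-supervision building block it relies on, keyword labeling functions combined by a simple majority vote; REGAL's rule generation, active-learning loop, and label model are not shown, and all names and keywords are hypothetical.

```python
from collections import Counter

ABSTAIN = None


def keyword_lf(keyword, label):
    """Build a labeling function that votes `label` when `keyword` appears."""
    def lf(text):
        return label if keyword in text.lower() else ABSTAIN
    return lf


def majority_label(text, lfs):
    """Combine labeling-function votes by simple majority.

    Returns ABSTAIN when no function fires or when the top two labels tie.
    """
    votes = Counter(v for v in (lf(text) for lf in lfs) if v is not ABSTAIN)
    if not votes:
        return ABSTAIN
    top = votes.most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]
```

A learned label model would normally replace the plain majority vote, weighting each labeling function by its estimated accuracy.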

18.
Front Neurosci ; 16: 1111763, 2022.
Article in English | MEDLINE | ID: mdl-36741054

ABSTRACT

Introduction: Amyotrophic Lateral Sclerosis (ALS) is a paralyzing, multifactorial neurodegenerative disease with limited therapeutics and no known cure. The study goal was to determine which pathophysiological treatment targets appear most beneficial. Methods: A big data approach was used to analyze high-copy SOD1 G93A experimental data. The secondary data set comprised 227 published studies and 4,296 data points. Treatments were classified by pathophysiological target: apoptosis, axonal transport, cellular chemistry, energetics, neuron excitability, inflammation, oxidative stress, proteomics, or systemic function. Outcome assessment modalities included onset delay, health status (rotarod performance, body weight, grip strength), and survival duration. Pairwise statistical analysis (two-tailed t-test with Bonferroni correction) of normalized fold change (treatment/control) assessed significant differences in treatment efficacy. Cohen's d quantified pathophysiological treatment category effect size compared to "all" (i.e., all pathophysiological treatment categories combined). Results: Inflammation treatments were best at delaying onset (d = 0.42, p > 0.05). Oxidative stress treatments were significantly better for prolonging survival duration (d = 0.18, p < 0.05). Excitability treatments were significantly better for prolonging overall health status (d = 0.22, p < 0.05). However, the absolute best pathophysiological treatment category for prolonging health status varied with disease progression: oxidative stress was best for pre-onset health (d = 0.18, p > 0.05); excitability was best for prolonging function near onset (d = 0.34, p < 0.05); inflammation was best for prolonging post-onset function (d = 0.24, p > 0.05); and apoptosis was best for prolonging end-stage function (d = 0.49, p > 0.05).
Finally, combination treatments simultaneously targeting multiple pathophysiological categories (i.e., polytherapy) performed significantly (p < 0.05) better than monotherapies at end-stage. Discussion: In summary, the most effective pathophysiological treatments change as a function of assessment modality and disease progression. The shift in treatment-category efficacy with disease progression supports the homeostatic instability theory of ALS disease progression.
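The statistics named in the Methods can be made concrete with a small sketch: normalized fold change as treatment/control, Cohen's d with a pooled standard deviation, and a Bonferroni-adjusted threshold. The numbers are invented toy values, not study data, and the count of 36 comparisons assumes all pairs among the nine categories were tested, which the abstract does not state.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def fold_change(treatment: float, control: float) -> float:
    """Normalized fold change (treatment / control); > 1 means the outcome increased."""
    return treatment / control


def cohens_d(group: Sequence[float], reference: Sequence[float]) -> float:
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(group), len(reference)
    s1, s2 = stdev(group), stdev(reference)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(group) - mean(reference)) / pooled_sd


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Toy survival durations in days (invented): treated vs. control animals.
print(fold_change(150, 125))  # 1.2

# Toy fold changes (invented): one treatment category vs. a reference group.
category = [1.2, 1.3, 1.1, 1.4]
reference = [1.0, 1.1, 1.05, 0.95]
print(round(cohens_d(category, reference), 2))  # 2.2

# Assumed design: all pairs among nine categories -> C(9, 2) = 36 comparisons.
print(bonferroni_alpha(0.05, math.comb(9, 2)))
```

Working on fold changes rather than raw values lets outcomes measured in different units (days, grams, seconds on a rotarod) be pooled across studies.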

19.
Brain Sci ; 11(8)2021 Jul 23.
Article in English | MEDLINE | ID: mdl-34439596

ABSTRACT

Heterogeneity among Alzheimer's disease (AD) patients confounds clinical trial patient selection and therapeutic efficacy evaluation. This work defines separable AD clinical sub-populations using unsupervised machine learning. Clustering (t-SNE followed by k-means) of patient features and association rule mining (ARM) were performed on the ADNIMERGE dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Patient sociodemographics, brain imaging, biomarkers, cognitive tests, and medication usage were included for analysis. Four AD clinical sub-populations were identified using between-cluster mean fold changes [cognitive performance, brain volume]: cluster-1 represented the least severe disease [+17.3, +13.3]; cluster-0 [-4.6, +3.8] and cluster-3 [+10.8, -4.9] represented mid-severity sub-populations; cluster-2 represented the most severe disease [-18.4, -8.4]. ARM assessed frequently occurring pharmacologic substances within the four sub-populations. No drug class was associated with the least severe AD (cluster-1), likely due to lesser antecedent disease. Anti-hyperlipidemia drugs were associated with cluster-0 (mid-severity, higher volume). Interestingly, the antioxidants vitamins C and E were associated with cluster-3 (mid-severity, higher cognition). Antidepressants such as Zoloft were associated with the most severe disease (cluster-2). Vitamin D is protective for AD, but ARM identified significant underutilization across all AD sub-populations. Identification and feature characterization of four distinct AD sub-population "clusters" using standard clinical features enhances future clinical trial selection criteria and cross-study comparative analysis.
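Association rule mining scores a rule such as "vitamin E -> cluster-3" by its support, confidence, and lift. The abstract does not state which ARM algorithm or thresholds were used, so the sketch below only computes these three standard metrics over a handful of invented patient "transactions"; the item names and data are illustrative, not ADNIMERGE records.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def rule_metrics(transactions: List[Transaction],
                 antecedent: Transaction,
                 consequent: Transaction) -> Tuple[float, float, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n                          # P(A and C)
    confidence = n_both / n_a if n_a else 0.0     # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


# Invented toy patients: each transaction holds medication items plus a cluster label.
patients = [frozenset(p) for p in (
    {"vitamin_c", "vitamin_e", "cluster-3"},
    {"vitamin_e", "cluster-3"},
    {"anti_hyperlipidemic", "cluster-0"},
    {"anti_hyperlipidemic", "cluster-0"},
    {"zoloft", "cluster-2"},
    {"vitamin_e", "cluster-0"},
)]

supp, conf, lift = rule_metrics(patients, frozenset({"vitamin_e"}), frozenset({"cluster-3"}))
print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
# support=0.33 confidence=0.67 lift=2.00
```

A lift above 1 means the drug occurs in that cluster more often than the cluster's overall frequency would predict, which is the sense in which a substance is "associated" with a sub-population.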

20.
J Alzheimers Dis ; 83(1): 435-450, 2021.
Article in English | MEDLINE | ID: mdl-34334405

ABSTRACT

BACKGROUND: Some apolipoprotein E (APOE) genotypes increase the risk of amyloid-β deposition and onset of clinical Alzheimer's disease (AD). However, cognitive assessments in APOE transgenic AD mice have yielded discordant results. OBJECTIVE: Analysis of 31 peer-reviewed AD APOE mouse publications (n = 3,045 mice) uncovered aggregate trends among age, APOE genotype, gender, modulatory treatments, and cognition. METHODS: T-tests with Bonferroni correction (significance = p < 0.002) compared age-normalized Morris water maze (MWM) escape latencies in wild type (WT), APOE2 knock-in (KI2), APOE3 knock-in (KI3), APOE4 knock-in (KI4), and APOE knock-out (KO) mice. Mice given positive treatments (t+), which favorably modulate APOE to improve cognition, negative treatments (t-), which perturb etiology and diminish cognition, or no treatment (t0) were compared. Machine learning with random forest modeling predicted MWM escape latency performance based on 12 features: mouse genotype (WT, KI2, KI3, KI4, KO), modulatory treatment (t+, t-, t0), mouse age, and mouse gender (male = g_m; female = g_f, mixed gender = g_mi). RESULTS: KI3 mice performed significantly better than WT in the MWM, whereas KI4 and KO mice performed significantly worse. KI2 performed similarly to WT. KI4 performed significantly worse than every other genotype. Positive treatments significantly improved cognition in WT, KI4, and KO mice compared to untreated mice. Interestingly, negative treatments in KI4 also significantly improved mean MWM escape latency. Random forest modeling resulted in the following feature importance for predicting superior MWM performance: [KI3, age, g_m, KI4, t0, t+, KO, WT, g_mi, t-, g_f, KI2] = [0.270, 0.094, 0.092, 0.088, 0.077, 0.074, 0.069, 0.061, 0.058, 0.054, 0.038, 0.023]. CONCLUSION: APOE3, age, and male gender were the most important features for predicting superior mouse cognitive performance.
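The count of 12 random forest features is consistent with one-hot encoding the three categorical variables and keeping age as a single numeric column: 5 genotypes + 3 treatments + 1 age + 3 genders = 12. The abstract does not state the encoding or the age units, so the `encode` helper below is a hypothetical sketch; the importance values are copied from the abstract.

```python
from typing import List

GENOTYPES = ["WT", "KI2", "KI3", "KI4", "KO"]
TREATMENTS = ["t+", "t-", "t0"]
GENDERS = ["g_m", "g_f", "g_mi"]
FEATURE_NAMES = GENOTYPES + TREATMENTS + ["age"] + GENDERS


def encode(genotype: str, treatment: str, age: float, gender: str) -> List[float]:
    """Hypothetical encoding: one-hot categorical variables, age as one numeric column."""
    row = [1.0 if g == genotype else 0.0 for g in GENOTYPES]
    row += [1.0 if t == treatment else 0.0 for t in TREATMENTS]
    row.append(float(age))  # age units are not stated in the abstract
    row += [1.0 if g == gender else 0.0 for g in GENDERS]
    return row


# Feature importances as reported in the abstract.
IMPORTANCES = {
    "KI3": 0.270, "age": 0.094, "g_m": 0.092, "KI4": 0.088, "t0": 0.077, "t+": 0.074,
    "KO": 0.069, "WT": 0.061, "g_mi": 0.058, "t-": 0.054, "g_f": 0.038, "KI2": 0.023,
}

print(len(FEATURE_NAMES))                      # 12
print(set(IMPORTANCES) == set(FEATURE_NAMES))  # True
print(round(sum(IMPORTANCES.values()), 3))     # 0.998
```

The reported importances sum to 0.998, which matches importances normalized to 1 (as with scikit-learn's `feature_importances_`) up to rounding, although the abstract does not name the library used.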


Subject(s)
Alzheimer Disease/genetics , Apolipoproteins E/genetics , Cognition , Mice, Knockout, ApoE , Animals , Apolipoprotein E2/genetics , Apolipoprotein E3/genetics , Apolipoprotein E4/genetics , Female , Humans , Male , Mice , Morris Water Maze Test , Sex Factors