ABSTRACT
Modeling biological mechanisms is key to understanding disease and identifying drug targets. However, formulating quantitative models in the field of Alzheimer's disease is challenged by a lack of detailed knowledge of relevant biochemical processes. Additionally, fitting differential equation systems usually requires time-resolved data and the possibility of performing intervention experiments, which is difficult in neurological disorders. This work addresses these challenges by employing the recently published Variational Autoencoder Modular Bayesian Networks (VAMBN) method, which we trained here on combined clinical and patient-level gene expression data while incorporating a disease-focused knowledge graph. Our approach, called iVAMBN, resulted in a quantitative model that allowed us to simulate down-expression of the putative drug target CD33, including its potential impact on cognitive impairment and brain pathophysiology. Experimental validation demonstrated a high overlap between the molecular mechanisms predicted to be altered by CD33 perturbation and cell line data. Altogether, our modeling approach may help to select promising drug targets.
Subjects
Alzheimer Disease, Cognitive Dysfunction, Humans, Bayes Theorem, Artificial Intelligence, Sialic Acid Binding Ig-like Lectin 3/chemistry, Sialic Acid Binding Ig-like Lectin 3/genetics, Sialic Acid Binding Ig-like Lectin 3/metabolism
ABSTRACT
Parkinson's disease (PD) is characterized by a long prodromal phase with a multitude of markers indicating an increased PD risk prior to clinical diagnosis based on motor symptoms. Current PD prediction models do not consider interdependencies of single predictors, lack differentiation by subtypes of prodromal PD, and may be limited and potentially biased by confounding factors, nonspecific assessment methods and restricted access to comprehensive marker data of prospective cohorts. We used prospective data of 18 established risk and prodromal markers of PD in 1178 healthy, PD-free individuals and 24 incident PD cases collected longitudinally in the Tübingen evaluation of Risk factors for Early detection of NeuroDegeneration (TREND) study at 4 visits over up to 10 years. We employed artificial intelligence (AI) to learn and quantify PD marker interdependencies via a Bayesian network (BN) with probabilistic confidence estimation using bootstrapping. The BN was employed to generate a synthetic cohort and individual marker profiles. Robust interdependencies were observed for BN edges from age to subthreshold parkinsonism and urinary dysfunction; sex to substantia nigra hyperechogenicity, depression, non-smoking and constipation; depression to symptomatic hypotension and excessive daytime somnolence; solvent exposure to cognitive deficits and to physical inactivity; and non-smoking to physical inactivity. Conversion to PD was interdependent with prior subthreshold parkinsonism, sex and substantia nigra hyperechogenicity. Several additional interdependencies with lower probabilistic confidence were identified. Synthetic subjects generated via the BN-based representation of the TREND study were realistic, as assessed through multiple approaches for comparing real and synthetic data.
Altogether, our work demonstrates the potential of modern AI approaches (specifically BNs) both for modelling and understanding interdependencies between PD risk and prodromal markers, which are so far not accounted for in PD prediction models, and for generating realistic synthetic data.
Subjects
Parkinson Disease, Parkinsonian Disorders, Humans, Prospective Studies, Artificial Intelligence, Bayes Theorem, Prodromal Symptoms
ABSTRACT
Individual organizations, such as hospitals, pharmaceutical companies, and health insurance providers, are currently limited in their ability to collect data that are fully representative of a disease population. This can, in turn, negatively impact the generalization ability of statistical models and scientific insights. However, sharing data across different organizations is highly restricted by legal regulations. While federated data access concepts exist, they are technically and organizationally difficult to realize. An alternative approach would be to exchange synthetic patient data instead. In this work, we introduce Multimodal Neural Ordinary Differential Equations (MultiNODEs), a hybrid, multimodal AI approach that generates highly realistic synthetic patient trajectories on a continuous time scale, hence enabling smooth interpolation and extrapolation of clinical studies. Our proposed method can integrate both static and longitudinal data, and implicitly handles missing values. We demonstrate the capabilities of MultiNODEs by applying them to real patient-level data from two independent clinical studies and to simulated epidemiological data of an infectious disease.
ABSTRACT
BACKGROUND: Functional decline in Alzheimer's disease (AD) is typically measured using single-time point subjective rating scales, which rely on direct observation or (caregiver) recall. Remote monitoring technologies (RMTs), such as smartphone applications, wearables, and home-based sensors, can change these periodic subjective assessments to more frequent, or even continuous, objective monitoring. The aim of the RADAR-AD study is to assess the accuracy and validity of RMTs in measuring functional decline in a real-world environment across preclinical-to-moderate stages of AD compared to standard clinical rating scales. METHODS: This study includes three tiers. For the main study, we will include participants (n = 220) with preclinical AD, prodromal AD, mild-to-moderate AD, and healthy controls, classified by MMSE and CDR scores, from clinical sites equally distributed over 13 European countries. Participants will undergo extensive neuropsychological testing and physical examination. The RMT assessments, performed over an 8-week period, include walk tests, financial management tasks, an augmented reality game, two activity trackers, and two smartphone applications installed on the participants' phones. In the first sub-study, fixed sensors will be installed in the homes of a representative sub-sample of 40 participants. In the second sub-study, 10 participants will stay in a smart home for 1 week. The primary outcome of this study is the difference in functional domain profiles assessed using RMTs between the four study groups. The four participant groups will be compared for each RMT outcome measure separately. Each RMT outcome will be compared to a standard clinical test which measures the same functional or cognitive domain. Finally, multivariate prediction models will be developed.
Data collection and privacy are important aspects of the project, which will be managed using the RADAR-base data platform running on specifically designed biomedical research computing infrastructure. RESULTS: First results are expected to be disseminated in 2022. CONCLUSION: Our study is well placed to evaluate the clinical utility of RMT assessments. Leveraging modern-day technology may deliver new and improved methods for accurately monitoring functional decline in all stages of AD. We anticipate that these methods could lead to objective, real-life functional endpoints with increased sensitivity for detecting the signal of pharmacological agents.
Subjects
Alzheimer Disease, Alzheimer Disease/diagnosis, Caregivers, Europe, Humans, Neuropsychological Tests, Technology
ABSTRACT
In the area of Big Data, one of the major obstacles to the progress of biomedical research is the existence of data "silos", because legal and ethical constraints often do not allow for sharing sensitive patient data from clinical studies across institutions. While federated machine learning now allows for building models from scattered data of the same format, there is still the need to investigate, mine, and understand data of separate and very differently designed clinical studies that can only be accessed within each of the data-hosting organizations. Simulation of sufficiently realistic virtual patients based on the data within each individual organization could be a way to fill this gap. In this work, we propose a new machine learning approach [Variational Autoencoder Modular Bayesian Network (VAMBN)] to learn a generative model of longitudinal clinical study data. VAMBN considers typical key aspects of such data, namely limited sample size coupled with comparably many variables of different numerical scales and statistical properties, and many missing values. We show that with VAMBN, we can simulate virtual patients in a sufficiently realistic manner while providing theoretical guarantees on data privacy. In addition, VAMBN allows for simulating counterfactual scenarios. Hence, VAMBN could facilitate data sharing as well as the design of clinical trials.
ABSTRACT
Translational research in many disease areas requires a longitudinal understanding of disease development and progression across all biologically relevant scales. Several corresponding studies are now available. However, to compile a comprehensive picture of a specific disease, multiple studies need to be analyzed and compared. A large number of clinical studies are nowadays conducted in the context of drug development in pharmaceutical research. However, legal and ethical constraints typically do not allow for sharing sensitive patient data. As a consequence, data "silos" exist, which slow down overall scientific progress in translational research. In this paper, we suggest the idea of a virtual cohort (VC) to address this limitation. Our key idea is to describe a longitudinal patient cohort with the help of a generative statistical model, namely a modular Bayesian Network, in which individual modules are represented as sparse autoencoder networks. We show that with the help of such a model we can simulate subjects that are highly similar to real ones. Our approach allows for incorporating arbitrary multi-scale, multi-modal data without making specific distribution assumptions. Moreover, we demonstrate the possibility of simulating interventions (e.g. via a treatment) in the VC. Overall, our proposed approach opens the possibility of building sufficiently realistic VCs for multiple disease areas in the future.
Subjects
Bayes Theorem, Deep Learning, Biomedical Translational Research/methods, Alzheimer Disease/diagnostic imaging, Alzheimer Disease/genetics, Brain/diagnostic imaging, Cohort Studies, Computer Simulation, Factual Databases/statistics & numerical data, Disease Progression, Humans, Longitudinal Studies, Statistical Models, Parkinson Disease/diagnosis, Single Nucleotide Polymorphism, Biomedical Translational Research/statistics & numerical data, User-Computer Interface
ABSTRACT
One of the visions of precision medicine has been to redefine disease taxonomies based on molecular characteristics rather than on phenotypic evidence. However, achieving this goal is highly challenging, specifically in neurology. Our contribution is a machine learning-based joint molecular subtyping of Alzheimer's disease (AD) and Parkinson's disease (PD), based on the genetic burden of 15 molecular mechanisms comprising 27 proteins (e.g. APOE) that have been described in both diseases. We demonstrate that our joint AD/PD clustering using a combination of sparse autoencoders and sparse non-negative matrix factorization is reproducible and can be associated with significant differences between AD and PD patient subgroups on a clinical, pathophysiological and molecular level. Hence, clusters are disease-associated. To our knowledge, this work is the first demonstration of a mechanism-based stratification in the field of neurodegenerative diseases. Overall, we thus see this work as an important step towards a molecular mechanism-based taxonomy of neurological disorders, which could help in developing better targeted therapies in the future by going beyond classical phenotype-based disease definitions.
Subjects
Alzheimer Disease/classification, Alzheimer Disease/genetics, Parkinson Disease/classification, Parkinson Disease/genetics, Aged, Aged 80 and over, Alzheimer Disease/metabolism, Amyloid beta-Peptides/cerebrospinal fluid, Brain/diagnostic imaging, Cluster Analysis, Cohort Studies, Drug Development, Epigenome, Female, Genotype, Humans, Male, Middle Aged, Neuroimaging, Health Care Outcome Assessment, Parkinson Disease/metabolism, Single Nucleotide Polymorphism, Precision Medicine, Transcriptome, Unsupervised Machine Learning
ABSTRACT
BACKGROUND: Precision medicine requires a stratification of patients by disease presentation that is sufficiently informative to allow for selecting treatments on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem translates into a complex problem of clustering multivariate and relatively short time series because (i) these diseases are multifactorial and not well described by single clinical outcome variables and (ii) disease progression needs to be monitored over time. Additionally, clinical data are often hindered by the presence of many missing values, further complicating any clustering attempts. FINDINGS: The problem of clustering multivariate short time series with many missing values is generally not well addressed in the literature. In this work, we propose a deep learning-based method to address this issue, variational deep embedding with recurrence (VaDER). VaDER relies on a Gaussian mixture variational autoencoder framework, which is further extended to (i) model multivariate time series and (ii) directly deal with missing values. We validated VaDER by accurately recovering clusters from simulated and benchmark data with known ground truth clustering, while varying the degree of missingness. We then used VaDER to successfully stratify patients with Alzheimer disease and patients with Parkinson disease into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected known underlying aspects of Alzheimer disease and Parkinson disease. CONCLUSIONS: We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate time-series clustering in general.
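The abstract above says the model deals with missing values directly but does not say how. One common way to do this in autoencoder-style models, shown here purely as an illustration and not as a description of VaDER's actual implementation, is to compute the reconstruction loss over observed entries only. The function name `masked_mse`, the use of NaN as the missing-value marker, and the toy numbers are all assumptions made for this sketch.

```python
import math

def masked_mse(observed, reconstructed):
    """Mean squared error over observed entries only.

    NaN in `observed` marks a missing measurement (an assumption of this
    sketch); such entries are skipped so they do not influence the loss.
    """
    total, count = 0.0, 0
    for row_obs, row_rec in zip(observed, reconstructed):
        for x, x_hat in zip(row_obs, row_rec):
            if math.isnan(x):
                continue  # missing value: contributes nothing to the loss
            total += (x - x_hat) ** 2
            count += 1
    return total / count if count else 0.0

# Hypothetical short time series (3 visits x 2 variables) with gaps.
nan = math.nan
series = [[1.0, nan], [2.0, 4.0], [nan, 5.0]]
recon = [[1.5, 9.9], [2.0, 3.0], [7.7, 5.0]]

loss = masked_mse(series, recon)
print(loss)
```

Because the missing positions are skipped, the reconstructed values at those positions (9.9 and 7.7 here) have no effect on the loss, so no explicit imputation is needed before training.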
Subjects
Alzheimer Disease/physiopathology, Factual Databases, Deep Learning, Disease Progression, Neurological Models, Parkinson Disease/physiopathology, Precision Medicine, Female, Humans, Male
ABSTRACT
The healthcare sector generates a large amount of information on the diagnosis, disease identification and treatment of individuals. Mining knowledge from clinical datasets to support scientific decision-making in the diagnosis and treatment of disease is therefore becoming increasingly necessary. The aim of this study was to assess the applicability of knowledge discovery in a brain tumor data warehouse by applying data mining techniques to investigate clinical parameters that can be associated with the occurrence of brain tumor. In this study, a brain tumor warehouse was developed comprising clinical data for 550 patients. The Apriori association rule algorithm was applied to discover association rules among the clinical parameters. The rules discovered in the study suggest that high values of creatinine, blood urea nitrogen (BUN), SGOT and SGPT are directly associated with tumor occurrence for patients in the primary stage, with at least 85% confidence and more than 50% support. A normalized regression model is proposed based on these parameters along with haemoglobin content, alkaline phosphatase and serum bilirubin to predict the outcome STATE (brain tumor) as 0 (absent) or 1 (present). The results indicate that the methodology followed will be of good value for the diagnostic procedure of brain tumor, especially when large data volumes are involved. Screening based on the discovered parameters would allow clinicians to detect tumors at an early stage of development.
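The support and confidence thresholds quoted above can be made concrete with a small sketch. It computes the two measures for a single candidate rule using their standard definitions; it does not perform the candidate-generation step of the full Apriori algorithm. The records, item names and the helper `passes` are hypothetical and are not taken from the study's data.

```python
# Hypothetical toy records: each set lists the flags raised for one patient.
# Illustrative values only, not data from the study.
records = [
    {"high_creatinine", "high_bun", "tumor"},
    {"high_creatinine", "high_bun", "tumor"},
    {"high_creatinine", "tumor"},
    {"high_bun"},
]

def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """support(antecedent and consequent) divided by support(antecedent)."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), records) / base

def passes(antecedent, consequent, records,
           min_support=0.5, min_confidence=0.85):
    """Check a rule against 'more than 50% support, at least 85% confidence'."""
    s = support(set(antecedent) | set(consequent), records)
    c = confidence(antecedent, consequent, records)
    return s > min_support and c >= min_confidence

print(passes({"high_creatinine"}, {"tumor"}, records))
print(passes({"high_bun"}, {"tumor"}, records))
```

In this toy data the rule from `high_creatinine` to `tumor` has support 0.75 and confidence 1.0, so it clears both thresholds, whereas the rule from `high_bun` to `tumor` has support 0.5 and confidence of about 0.67 and is rejected.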