Results 1 - 20 of 139
1.
Stud Health Technol Inform ; 317: 21-29, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234703

ABSTRACT

Individual health data is crucial for scientific advancements, particularly in developing Artificial Intelligence (AI); however, sharing real patient information is often restricted due to privacy concerns. A promising solution to this challenge is synthetic data generation. This technique creates entirely new datasets that mimic the statistical properties of real data, while preserving confidential patient information. In this paper, we present the workflow and different services developed in the context of Germany's National Data Infrastructure project NFDI4Health. First, two state-of-the-art AI tools (namely, VAMBN and MultiNODEs) for generating synthetic health data are outlined. Further, we introduce SYNDAT (a public web-based tool) which allows users to visualize and assess the quality and risk of synthetic data provided by desired generative models. Additionally, the utility of the proposed methods and the web-based tool is showcased using data from Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Center for Cancer Registry Data of the Robert Koch Institute (RKI).


Subject(s)
Workflow , Humans , Germany , Risk Management , Artificial Intelligence , Alzheimer Disease
2.
Stud Health Technol Inform ; 317: 270-279, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234731

ABSTRACT

INTRODUCTION: A modern approach to ensuring privacy when sharing datasets is the use of synthetic data generation methods, which often claim to outperform classic anonymization techniques in the trade-off between data utility and privacy. Recently, it was demonstrated that various deep learning-based approaches are able to generate useful synthesized datasets, often based on domain-specific analyses. However, evaluating the privacy implications of releasing synthetic data remains a challenging problem, especially when the goal is to conform with data protection guidelines. METHODS: Therefore, the recent privacy risk quantification framework Anonymeter has been built for evaluating multiple possible vulnerabilities, which are specifically based on privacy risks that are considered by the European Data Protection Board, i.e. singling out, linkability, and attribute inference. This framework was applied to a synthetic data generation study from the epidemiological domain, where the synthesization replicates time and age trends previously found in data collected during the DONALD cohort study (1312 participants, 16 time points). The conducted privacy analyses are presented, which place a focus on the vulnerability of outliers. RESULTS: The resulting privacy scores are discussed, which vary greatly between the different types of attacks. CONCLUSION: Challenges encountered during their implementation and during the interpretation of their results are highlighted, and it is concluded that privacy risk assessment for synthetic data remains an open problem.


Subject(s)
Computer Security , Risk Assessment , Humans , Longitudinal Studies , Confidentiality , Privacy
3.
NPJ Digit Med ; 7(1): 235, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39242660

ABSTRACT

Parkinson's disease (PD) presents diverse symptoms and comorbidities, complicating its diagnosis and management. The primary objective of this cross-sectional, monocentric study was to assess digital gait sensor data's utility for monitoring and diagnosis of motor and gait impairment in PD. As a secondary objective, for the more challenging tasks of detecting comorbidities, non-motor outcomes, and disease progression subgroups, we evaluated for the first time the integration of digital markers with metabolomics and clinical data. Using shoe-attached digital sensors, we collected gait measurements from 162 patients and 129 controls in a single visit. Machine learning models showed significant diagnostic power, with AUC scores of 83-92% for PD vs. control and up to 75% for motor severity classification. Integrating gait data with metabolomics and clinical data improved predictions for challenging-to-detect comorbidities such as hallucinations. Overall, this approach using digital biomarkers and multimodal data integration can assist in objective disease monitoring, diagnosis, and comorbidity detection.

4.
Clin Pharmacokinet ; 63(9): 1221-1237, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39153056

ABSTRACT

INTRODUCTION: In the last decade, various Machine Learning techniques have been proposed aiming to individualise the dose of anticancer drugs mostly based on a presumed drug effect or measured effect biomarkers. The aim of this scoping review was to comprehensively summarise the research status on the use of Machine Learning for precision dosing in anticancer drug therapy. METHODS: This scoping review was conducted in accordance with the interim guidance by Cochrane and the Joanna Briggs Institute. We systematically searched the databases Medline (via PubMed), Embase and the Cochrane Library for research articles and reviews including results published after 2016. Results were reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist. RESULTS: A total of 17 relevant studies was identified. In 12 of the included studies, Reinforcement Learning methods were used, including Classical, Deep, Double Deep and Conservative Q-Learning and Fuzzy Reinforcement Learning. Furthermore, classical Machine Learning methods were compared in terms of their performance and an artificial intelligence platform based on parabolic equations was used to guide dosing prospectively and retrospectively, albeit only in a limited number of patients. Due to the significantly different algorithm structures, a meaningful comparison between the various Machine Learning approaches was not possible. CONCLUSION: Overall, this review emphasises the clinical relevance of Machine Learning methods for anticancer drug dose optimisation, as many algorithms have shown promising results enabling model-free predictions with the potential to maximise efficacy and minimise toxicity when compared to standard protocols.


Subject(s)
Antineoplastic Agents , Machine Learning , Neoplasms , Precision Medicine , Humans , Antineoplastic Agents/administration & dosage , Antineoplastic Agents/pharmacokinetics , Precision Medicine/methods , Neoplasms/drug therapy , Dose-Response Relationship, Drug
5.
Database (Oxford) ; 2024, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39104284

ABSTRACT

MicroRNAs (miRNAs) play important roles in post-transcriptional processes and regulate major cellular functions. The abnormal regulation of expression of miRNAs has been linked to numerous human diseases such as respiratory diseases, cancer, and neurodegenerative diseases. Latest miRNA-disease associations are predominantly found in unstructured biomedical literature. Retrieving these associations manually can be cumbersome and time-consuming due to the continuously expanding number of publications. We propose a deep learning-based text mining approach that extracts normalized miRNA-disease associations from biomedical literature. To train the deep learning models, we build a new training corpus that is extended by distant supervision utilizing multiple external databases. A quantitative evaluation shows that the workflow achieves an area under receiver operator characteristic curve of 98% on a holdout test set for the detection of miRNA-disease associations. We demonstrate the applicability of the approach by extracting new miRNA-disease associations from biomedical literature (PubMed and PubMed Central). We have shown through quantitative analysis and evaluation on three different neurodegenerative diseases that our approach can effectively extract miRNA-disease associations not yet available in public databases. Database URL: https://zenodo.org/records/10523046.


Subject(s)
Data Mining , MicroRNAs , MicroRNAs/genetics , Humans , Data Mining/methods , Neural Networks, Computer , Neurodegenerative Diseases/genetics , Deep Learning , Databases, Genetic
6.
PLOS Glob Public Health ; 4(8): e0003058, 2024.
Article in English | MEDLINE | ID: mdl-39172923

ABSTRACT

During the COVID-19 pandemic, many hospitals reached their capacity limits and could no longer guarantee treatment of all patients. At the same time, governments endeavored to take sensible measures to stop the spread of the virus while trying to keep the economy afloat. Many models extrapolating confirmed cases and hospitalization rates over short periods of time have been proposed, including several from the field of machine learning. However, the highly dynamic nature of the pandemic, with rapidly introduced interventions and new circulating variants, imposed non-trivial challenges for the generalizability of such models. In this paper, we propose the use of ensemble models, which are allowed to change in their composition or weighting of base models over time and could thus better adapt to highly dynamic pandemic or epidemic situations. In that regard, we also explored the use of secondary metadata (Google searches) to inform the ensemble model. We tested our approach using surveillance data from COVID-19, Influenza, and hospital syndromic surveillance of severe acute respiratory infections (SARI). In general, we found ensembles to be more robust than the individual models. Altogether, we see our work as a contribution to enhancing the preparedness for future pandemic situations.
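
The idea of an ensemble whose weighting of base models changes over time can be illustrated with a minimal sketch. The abstract does not state how the weights are obtained, so inverse-error weighting is an assumption here, and the function names `inverse_error_weights` and `ensemble_forecast` as well as the numbers in the usage lines are hypothetical.

```python
def inverse_error_weights(recent_errors, eps=1e-9):
    """Turn each base model's recent forecast error into a normalized weight.

    Assumption: a smaller recent error earns a larger weight. Recomputing the
    weights on every new window is what lets the ensemble change over time.
    """
    inverse = [1.0 / (err + eps) for err in recent_errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def ensemble_forecast(base_forecasts, recent_errors):
    """Weighted average of the base-model forecasts for one time point."""
    weights = inverse_error_weights(recent_errors)
    return sum(w * f for w, f in zip(weights, base_forecasts))


# Hypothetical example: two base models forecast next week's case count.
# The model with the smaller recent error dominates the combined forecast.
forecasts = [100.0, 200.0]
errors_last_window = [1.0, 3.0]
print(ensemble_forecast(forecasts, errors_last_window))
```

Dropping a base model from the ensemble corresponds to giving it a weight of zero, so a change in composition can be handled by the same mechanism.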

7.
BMC Med Inform Decis Mak ; 24(1): 214, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39075407

ABSTRACT

Deep neural networks (DNNs) have fundamentally revolutionized the artificial intelligence (AI) field. The transformer model is a type of DNN that was originally used for natural language processing tasks and has since gained more and more attention for processing various kinds of sequential data, including biological sequences and structured electronic health records. Along with this development, transformer-based models such as BioBERT, MedBERT, and MassGenie have been trained and deployed by researchers to answer various scientific questions originating in the biomedical domain. In this paper, we review the development and application of transformer models for analyzing various biomedical datasets such as biomedical textual data, protein sequences, structured longitudinal medical data, and biomedical images as well as graphs. We also look at explainable AI strategies that help to comprehend the predictions of transformer-based models. Finally, we discuss the limitations and challenges of current models, and point out emerging novel research directions.


Subject(s)
Neural Networks, Computer , Humans , Artificial Intelligence , Deep Learning , Natural Language Processing , Biomedical Research
8.
Sci Rep ; 14(1): 14412, 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38909025

ABSTRACT

Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data, i.e., data generated through a randomised process that have similar statistical properties as the original data, but do not have a one-to-one correspondence with the original individual-level records. In this study, we use a state-of-the-art synthetic data generation method and perform in-depth quality analyses of the generated data for a specific use case in the field of nutrition. We demonstrate the need for careful analyses of synthetic data that go beyond descriptive statistics and provide valuable insights into how to realise the full potential of synthetic datasets. By extending the methods, but also by thoroughly analysing the effects of sampling from a trained model, we are able to largely reproduce significant real-world analysis results in the chosen use case.


Subject(s)
Data Analysis , Humans , Longitudinal Studies , Artificial Intelligence
9.
EPMA J ; 15(2): 275-287, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38841617

ABSTRACT

Background: Huntington's disease (HD) is a progressive neurodegenerative disease caused by a CAG trinucleotide expansion in the huntingtin gene. The length of the CAG repeat is inversely correlated with disease onset. HD is characterized by hyperkinetic movement disorder, psychiatric symptoms, and cognitive deficits, which greatly impact patients' quality of life. Despite this clear genetic course, high variability of HD patients' symptoms can be observed. Current clinical diagnosis of HD solely relies on the presence of motor signs, disregarding the other important aspects of the disease. By incorporating a broader approach that encompasses motor as well as non-motor aspects of HD, predictive, preventive, and personalized (3P) medicine can enhance diagnostic accuracy and improve patient care. Methods: Multisymptom disease trajectories of HD patients collected from the Enroll-HD study were first aligned on a common disease timescale to account for heterogeneity in disease symptom onset and diagnosis. Following this, the aligned disease trajectories were clustered using the previously published Variational Deep Embedding with Recurrence (VaDER) algorithm, and the resulting progression subtypes were clinically characterized. Lastly, an AI/ML model was trained to predict the progression subtype from only first visit data or with data from additional follow-up visits. Results: Results demonstrate two distinct subtypes, one large cluster (n = 7122) showing a relatively stable disease progression and a second, smaller cluster (n = 411) showing a dramatically more progressive disease trajectory. Clinical characterization of the two subtypes correlates with CAG repeat length, as well as several neurobehavioral, psychiatric, and cognitive scores. In fact, cognitive impairment was found to be the major difference between the two subtypes. Additionally, a prognostic model shows the ability to predict HD subtypes from patients' first visit only.
Conclusion: In summary, this study aims towards the paradigm shift from reactive to preventive and personalized medicine by showing that non-motor symptoms are of vital importance for predicting and categorizing each patient's disease progression pattern, as cognitive decline is oftentimes more reflective of HD progression than its motor aspects. Considering these aspects during counseling and therapy definition will personalize each individual's treatment. The ability to provide patients with an objective assessment of their disease progression and thus a perspective for their life with HD is the key to improving their quality of life. By conducting additional analysis on biological data from both subtypes, it is possible to gain a deeper understanding of these subtypes and uncover the underlying biological factors of the disease. This greatly aligns with the goal of shifting towards 3P medicine. Supplementary Information: The online version contains supplementary material available at 10.1007/s13167-024-00368-2.

10.
Front Immunol ; 15: 1343900, 2024.
Article in English | MEDLINE | ID: mdl-38720902

ABSTRACT

Alzheimer's disease has an increasing prevalence in the population world-wide, yet current diagnostic methods based on recommended biomarkers are only available in specialized clinics. Due to these circumstances, Alzheimer's disease is usually diagnosed late, which contrasts with the currently available treatment options that are only effective for patients at an early stage. Blood-based biomarkers could fill in the gap of easily accessible and low-cost methods for early diagnosis of the disease. In particular, immune-based blood-biomarkers might be a promising option, given the recently discovered cross-talk of immune cells of the central nervous system with those in the peripheral immune system. Here, we give a background on recent advances in research on brain-immune system cross-talk in Alzheimer's disease and review machine learning approaches, which can combine multiple biomarkers with further information (e.g. age, sex, APOE genotype) into predictive models supporting an earlier diagnosis. In addition, mechanistic modeling approaches, such as agent-based modeling open the possibility to model and analyze cell dynamics over time. This review aims to provide an overview of the current state of immune-system related blood-based biomarkers and their potential for the early diagnosis of Alzheimer's disease.


Subject(s)
Alzheimer Disease , Biomarkers , Early Diagnosis , Alzheimer Disease/diagnosis , Alzheimer Disease/immunology , Alzheimer Disease/blood , Humans , Biomarkers/blood , Machine Learning , Animals
11.
NPJ Parkinsons Dis ; 10(1): 95, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698004

ABSTRACT

The progression of Parkinson's disease (PD) is heterogeneous across patients, affecting counseling and inflating the number of patients needed to test potential neuroprotective treatments. Moreover, disease subtypes might require different therapies. This work uses a data-driven approach to investigate how observed heterogeneity in PD can be explained by the existence of distinct PD progression subtypes. To derive stable PD progression subtypes in an unbiased manner, we analyzed multimodal longitudinal data from three large PD cohorts and performed extensive cross-cohort validation. A latent time joint mixed-effects model (LTJMM) was used to align patients on a common disease timescale. Progression subtypes were identified by variational deep embedding with recurrence (VaDER). In each cohort, we identified a fast-progressing and a slow-progressing subtype, reflected by different patterns of motor and non-motor symptoms progression, survival rates, treatment response, features extracted from DaTSCAN imaging and digital gait assessments, education, and Alzheimer's disease pathology. Progression subtypes could be predicted with ROC-AUC up to 0.79 for individual patients when a one-year observation period was used for model training. Simulations demonstrated that enriching clinical trials with fast-progressing patients based on these predictions can reduce the required cohort size by 43%. Our results show that heterogeneity in PD can be explained by two distinct subtypes of PD progression that are stable across cohorts. These subtypes align with the brain-first vs. body-first concept, which potentially provides a biological explanation for subtype differences. Our predictive models will enable clinical trials with significantly lower sample sizes by enriching fast-progressing patients.

12.
EPMA J ; 15(1): 1-23, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38463624

ABSTRACT

Worldwide, stroke is the second leading cause of death and the third leading cause of death and disability combined. The estimated global economic burden of stroke is over US$891 billion per year. Within three decades (1990-2019), the incidence increased by 70%, deaths by 43%, prevalence by 102%, and DALYs by 143%. Of over 100 million people affected by stroke, about 76% are ischemic stroke (IS) patients recorded worldwide. Contextually, ischemic stroke moves into particular focus of multi-professional groups including researchers, healthcare industry, economists, and policy-makers. Risk factors of ischemic stroke demonstrate sufficient space for cost-effective prevention interventions in primary (suboptimal health) and secondary (clinically manifested collateral disorders contributing to stroke risks) care. These risks are interrelated. For example, sedentary lifestyle and toxic environment both cause mitochondrial stress, systemic low-grade inflammation and accelerated ageing; inflammageing is a low-grade inflammation associated with accelerated ageing and poor stroke outcomes. Stress overload, decreased mitochondrial bioenergetics and hypomagnesaemia are associated with systemic vasospasm and ischemic lesions in heart and brain of all age groups including teenagers. Imbalanced dietary patterns poor in folate but rich in red and processed meat, refined grains, and sugary beverages are associated with hyperhomocysteinaemia, systemic inflammation, small vessel disease, and increased IS risks. Ongoing 3PM research towards vulnerable groups in the population promoted by the European Association for Predictive, Preventive and Personalised Medicine (EPMA) demonstrates promising results for the holistic patient-friendly non-invasive approach utilising tear fluid-based health risk assessment, mitochondria as a vital biosensor and AI-based multi-professional data interpretation as reported here by the EPMA expert group.
Collected data demonstrate that IS-relevant risks and corresponding molecular pathways are interrelated. For example, there is an evident overlap between the molecular patterns involved in IS and those involved in diabetic retinopathy, an early indicator of IS risk in diabetic patients. One such pattern is the 5-aminolevulinic acid/pathway, which is also characteristic of altered mitophagy patterns, insomnia, stress regulation and modulation of microbiota-gut-brain crosstalk. Further, ceramides are considered mediators of oxidative stress and inflammation in cardiometabolic disease, negatively affecting mitochondrial respiratory chain function and fission/fusion activity, altered sleep-wake behaviour, vascular stiffness and remodelling. Xanthine/pathway regulation is involved in mitochondrial homeostasis and stress-driven anxiety-like behaviour as well as molecular mechanisms of arterial stiffness. In order to assess individual health risks, applying machine learning (an AI tool) is essential for accurate interpretation of the data produced by the multiparametric analysis. Aspects presented in the paper include the needs of young populations and the elderly, personalised risk assessment in primary and secondary care, cost-efficacy, application of innovative technologies and screening programmes, and advanced education measures for professionals and the general population; all are essential pillars for the paradigm change from reactive medical services to 3PM in the overall IS management promoted by the EPMA.

14.
Infect Dis Model ; 9(2): 501-518, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38445252

ABSTRACT

In July 2023, the Center of Excellence in Respiratory Pathogens organized a two-day workshop on infectious diseases modelling and the lessons learnt from the Covid-19 pandemic. This report summarizes the rich discussions that occurred during the workshop. The workshop participants discussed multisource data integration and highlighted the benefits of combining traditional surveillance with more novel data sources like mobility data, social media, and wastewater monitoring. Significant advancements were noted in the development of predictive models, with examples from various countries showcasing the use of machine learning and artificial intelligence in detecting and monitoring disease trends. The role of open collaboration between various stakeholders in modelling was stressed, advocating for the continuation of such partnerships beyond the pandemic. A major gap identified was the absence of a common international framework for data sharing, which is crucial for global pandemic preparedness. Overall, the workshop underscored the need for robust, adaptable modelling frameworks and the integration of different data sources and collaboration across sectors, as key elements in enhancing future pandemic response and preparedness.

15.
CPT Pharmacometrics Syst Pharmacol ; 13(1): 41-53, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37843389

ABSTRACT

Recently, the use of machine-learning (ML) models for pharmacokinetic (PK) modeling has grown significantly. Although most of the current approaches use ML techniques as black boxes, only a few have proposed interpretable architectures that integrate mechanistic knowledge. In this work, we use a one-compartment PK model as the test case within a scientific machine learning (SciML) framework and consider learning an unknown absorption using neural networks, while simultaneously estimating other parameters of drug distribution and elimination. We generate simulated data with different sampling strategies to show that our model can accurately predict concentrations in extrapolation tasks, including new dosing regimens with different sparsity levels, and produce reliable forecasts even for new patients. By using a scenario of fitting PK data with complex absorption, we demonstrate that including known physiological structure in a SciML model allows us to obtain highly accurate predictions while preserving the interpretability of classical compartmental models.


Subject(s)
Machine Learning , Neural Networks, Computer , Humans
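
For orientation, the structure described in the abstract above can be written down as a short set of equations. This is a sketch under stated assumptions, not the authors' exact formulation: the symbols $A_c$ (drug amount in the central compartment), $A_g$ (amount at the absorption site), $k_a$, $k_e$, $V$ and the network $f_\theta$ are notation introduced here for illustration.

```latex
% Classical one-compartment model with first-order absorption
\frac{dA_g}{dt} = -k_a A_g(t), \qquad
\frac{dA_c}{dt} = k_a A_g(t) - k_e A_c(t), \qquad
C(t) = \frac{A_c(t)}{V}

% Hybrid variant: the unknown absorption input is replaced by a neural
% network f_\theta, while elimination (k_e) and distribution (V) keep
% their mechanistic form and are estimated together with \theta
\frac{dA_c}{dt} = f_\theta(t) - k_e A_c(t), \qquad
C(t) = \frac{A_c(t)}{V}
```

Only the absorption term is learned, so $k_e$ and $V$ keep their usual interpretation as elimination rate constant and volume of distribution.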
16.
Med Klin Intensivmed Notfmed ; 119(2): 123-128, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37380812

ABSTRACT

BACKGROUND: There is an ongoing debate as to whether death with sepsis is primarily caused by sepsis or, more often, by the underlying disease. There are no data on the influence of a researcher's background on such an assessment. Therefore, the aim of this analysis was to assess the cause of death in sepsis and the influence of an investigator's professional background on such an assessment. MATERIALS AND METHODS: We performed a retrospective observational cohort study of sepsis patients treated in the medical intensive care unit (ICU) of a tertiary care center. For deceased patients, comorbidities and severity of illness were documented. The cause of death (sepsis or comorbidities or both combined) was independently assessed by four assessors with different professional backgrounds (medical student, senior physician in the medical ICU, anesthesiological intensivist, and senior physician specialized in the predominant comorbidity). RESULTS: In all, 78 of 235 patients died in hospital. Agreement between assessors about cause of death was low (κ 0.37, 95% confidence interval 0.29-0.44). Depending on the assessor, sepsis was the sole cause of death in 6-12% of cases, sepsis and comorbidities in 54-76%, and comorbidities alone in 18-40%. CONCLUSIONS: In a relevant proportion of patients with sepsis treated in the medical ICU, comorbidities contribute significantly to mortality, and death from sepsis without relevant comorbidities is a rare event. Designation of the cause of death in sepsis patients is highly subjective and may be influenced by the professional background of the assessor.


Subject(s)
Sepsis , Shock, Septic , Humans , Pilot Projects , Retrospective Studies , Cause of Death , Sepsis/therapy , Intensive Care Units , Comorbidity , Hospital Mortality , Shock, Septic/therapy
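
The agreement figure reported in the abstract above (κ 0.37 across four assessors) is a chance-corrected agreement statistic. The abstract does not say which multi-rater variant was used; the sketch below assumes Fleiss' kappa, and the ratings in the usage lines are made up for illustration rather than taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per subject.

    ratings: list of per-subject lists, one category label per rater.
    categories: all labels that a rater may assign.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(subject) for subject in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Expected agreement by chance, from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters)
        for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings: four assessors classify the cause of death of three patients.
labels = ["sepsis", "comorbidity", "both"]
example = [
    ["both", "both", "comorbidity", "both"],
    ["sepsis", "both", "both", "comorbidity"],
    ["comorbidity", "comorbidity", "comorbidity", "both"],
]
print(round(fleiss_kappa(example, labels), 3))
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance, which is why 0.37 is read as low agreement.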
17.
Sci Rep ; 13(1): 20780, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38012282

ABSTRACT

The COVID-19 pandemic has pointed out the need for new technical approaches to increase the preparedness of healthcare systems. One important measure is to develop innovative early warning systems. Along those lines, we first compiled a corpus of relevant COVID-19 related symptoms with the help of a disease ontology, text mining and statistical analysis. Subsequently, we applied statistical and machine learning (ML) techniques to time series data of symptom related Google searches and tweets spanning the time period from March 2020 to June 2022. In conclusion, we found that a long short-term memory (LSTM) network jointly trained on COVID-19 symptom related Google Trends and Twitter data was able to accurately forecast up-trends in classical surveillance data (confirmed cases and hospitalization rates) 14 days ahead. F1 scores were above 98% and 97%, respectively, hence demonstrating the potential of using digital traces for building an early alert system for pandemics in Germany.


Subject(s)
COVID-19 , Social Media , Humans , Pandemics , COVID-19/epidemiology , Machine Learning , Data Mining/methods , Records
18.
Heliyon ; 9(9): e19441, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37681175

ABSTRACT

Adverse drug events constitute a major challenge for the success of clinical trials. Several computational strategies have been suggested to estimate the risk of adverse drug events in preclinical drug development. While these approaches have demonstrated high utility in practice, they are at the same time limited to specific information sources. Thus, many current computational approaches neglect a wealth of information which results from the integration of different data sources, such as biological protein function, gene expression, chemical compound structure, cell-based imaging and others. In this work we propose an integrative and explainable multi-modal Graph Machine Learning approach (MultiGML), which fuses knowledge graphs with multiple further data modalities to predict drug related adverse events and general drug target-phenotype associations. MultiGML demonstrates excellent prediction performance compared to alternative algorithms, including various traditional knowledge graph embedding techniques. MultiGML distinguishes itself from alternative techniques by providing in-depth explanations of model predictions, which point towards biological mechanisms associated with predictions of an adverse drug event. Hence, MultiGML could be a versatile tool to support decision making in preclinical drug development.

19.
BMC Genom Data ; 24(1): 50, 2023 Sep 04.
Article in English | MEDLINE | ID: mdl-37667186

ABSTRACT

BACKGROUND: A relevant part of the genetic architecture of complex traits is still unknown, despite the discovery of many disease-associated common variants. Polygenic risk score (PRS) models are based on the evaluation of the additive effects attributable to common variants and have been successfully implemented to assess the genetic susceptibility for many phenotypes. In contrast, burden tests are often used to identify an enrichment of rare deleterious variants in specific genes. Both kinds of genetic contributions are typically analyzed independently. Many studies suggest that complex phenotypes are influenced by both low effect common variants and high effect rare deleterious variants. The aim of this paper is to integrate the effect of both common and rare functional variants for a more comprehensive genetic risk modeling. METHODS: We developed a framework combining gene-based scores based on the enrichment of rare functionally relevant variants with genome-wide PRS based on common variants for association analysis and prediction models. We applied our framework to the UK Biobank dataset with genotyping and exome data and considered the levels of 28 blood biomarkers as target phenotypes. For each biomarker, an association analysis was performed on the full cohort using gene-based scores (GBS). The cohort was then split into 3 subsets for PRS construction and feature selection, predictive model training, and independent evaluation, respectively. Prediction models were generated including either PRS, GBS or both (combined). RESULTS: Association analyses of the cohort were able to detect significant genes that were previously known to be associated with different biomarkers. Interestingly, the analyses also revealed heterogeneous effect sizes and directionality, highlighting the complexity of blood biomarker regulation. However, the combined models for many biomarkers show little or no improvement in prediction accuracy compared to the PRS models.
CONCLUSION: This study shows that rare variants play an important role in the genetic architecture of complex multifactorial traits such as blood biomarkers. However, while rare deleterious variants play a strong role at an individual level, our results indicate that classical common variant based PRS might be more informative to predict the genetic susceptibility at the population level.


Subject(s)
Exome , Genetic Predisposition to Disease , Humans , Genetic Predisposition to Disease/genetics , Biomarkers , Phenotype , Multifactorial Inheritance/genetics
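
The additive construction behind a polygenic risk score, and the way it might be joined with gene-based scores in a combined model, can be sketched as follows. The abstract gives no formula, so this uses the standard weighted-sum form as an assumption. The function names, effect sizes, dosages and gene-based scores are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over common variants of effect size times allele dosage.

    dosages: number of effect alleles carried at each variant (0, 1 or 2).
    effect_sizes: per-variant weights, e.g. taken from association statistics.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))


def combined_features(prs, gene_based_scores):
    """Feature vector for a 'combined' model: the PRS followed by the GBS values."""
    return [prs, *gene_based_scores]


# Hypothetical individual with three common variants and two gene-based scores.
prs = polygenic_risk_score(dosages=[0, 1, 2], effect_sizes=[0.10, -0.20, 0.05])
features = combined_features(prs, gene_based_scores=[0.0, 1.5])
print(features)
```

A prediction model trained on `features` instead of `prs` alone corresponds to the combined setting compared against the PRS-only models in the abstract.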
20.
IEEE J Biomed Health Inform ; 27(9): 4548-4558, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37347632

ABSTRACT

In situations like the COVID-19 pandemic, healthcare systems are under enormous pressure as they can rapidly collapse under the burden of the crisis. Machine learning (ML) based risk models could lift the burden by identifying patients with a high risk of severe disease progression. Electronic Health Records (EHRs) provide crucial sources of information to develop these models because they rely on routinely collected healthcare data. However, EHR data is challenging for training ML models because it contains irregularly timestamped diagnosis, prescription, and procedure codes. For such data, transformer-based models are promising. We extended the previously published Med-BERT model by including age, sex, medications, quantitative clinical measures, and state information. After pre-training on approximately 988 million EHRs from 3.5 million patients, we developed models to predict Acute Respiratory Manifestations (ARM) risk using the medical history of 80,211 COVID-19 patients. Compared to Random Forests, XGBoost, and RETAIN, our transformer-based models more accurately forecast the risk of developing ARM after COVID-19 infection. We used Integrated Gradients and Bayesian networks to understand the link between the essential features of our model. Finally, we evaluated adapting our model to Austrian in-patient data. Our study highlights the promise of predictive transformer-based models for precision medicine.


Subject(s)
COVID-19 , Humans , Pandemics , Bayes Theorem , Machine Learning , Disease Progression , Electronic Health Records
...