ABSTRACT
Genome technologies have defined a complex genetic architecture in major infectious, inflammatory, and autoimmune disorders. High density marker arrays and Immunochips have powered genome-wide association studies (GWAS) that have mapped nearly 450 genetic risk loci in 22 major inflammatory diseases, including a core of common genes that play a central role in pathological inflammation. Whole-exome and whole-genome sequencing have identified more than 265 genes in which mutations cause primary immunodeficiencies and rare forms of severe inflammatory bowel disease. Combined analysis of inflammatory disease GWAS and primary immunodeficiencies points to shared proteins and pathways that are required for immune cell development and protection against infections and are also associated with pathological inflammation. Finally, sequencing of chromatin immunoprecipitates containing specific transcription factors, with parallel RNA sequencing, has charted epigenetic regulation of gene expression by proinflammatory transcription factors in immune cells, providing complementary information to characterize morbid genes at infectious and inflammatory disease loci.
Subject(s)
Autoimmune Diseases/genetics , Immunologic Deficiency Syndromes/genetics , Infections/genetics , Inflammation/genetics , Vaccines/immunology , Animals , Epigenesis, Genetic , Exome/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Immunity/genetics , Infections/immunology , Risk
ABSTRACT
Genome-wide association studies (GWAS) have revealed risk alleles for ulcerative colitis (UC). To understand their cell type specificities and pathways of action, we generate an atlas of 366,650 cells from the colon mucosa of 18 UC patients and 12 healthy individuals, revealing 51 epithelial, stromal, and immune cell subsets, including BEST4+ enterocytes, microfold-like cells, and IL13RA2+IL11+ inflammatory fibroblasts, which we associate with resistance to anti-TNF treatment. Inflammatory fibroblasts, inflammatory monocytes, microfold-like cells, and T cells that co-express CD8 and IL-17 expand with disease, forming intercellular interaction hubs. Many UC risk genes are cell type specific and co-regulated within relatively few gene modules, suggesting convergence onto limited sets of cell types and pathways. Using this observation, we nominate and infer functions for specific risk genes across GWAS loci. Our work provides a framework for interrogating complex human diseases and mapping risk variants to cell types and pathways.
Subject(s)
Colitis, Ulcerative/pathology , Colon/metabolism , Adult , Aged , Antibodies, Monoclonal/therapeutic use , Bestrophins/metabolism , CD8 Antigens/metabolism , Case-Control Studies , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/metabolism , Colon/pathology , Enterocytes/cytology , Enterocytes/metabolism , Female , Genetic Loci , Genome-Wide Association Study , Humans , Interleukin-17/metabolism , Male , Middle Aged , Risk Factors , T-Lymphocytes/cytology , T-Lymphocytes/metabolism , Thrombospondins/metabolism , Tumor Necrosis Factor-alpha/immunology , Tumor Necrosis Factor-alpha/metabolism , Young Adult
ABSTRACT
Age-related hearing loss (ARHL) is a prevalent concern in the elderly population. Recent genome-wide and phenome-wide association studies (GWASs and PheWASs) have delved into the identification of causative variants and the understanding of pleiotropy, highlighting the polygenic intricacies of this complex condition. While recent large-scale GWASs have pinpointed significant SNPs and risk variants associated with ARHL, the detailed mechanisms, encompassing both genetic and epigenetic modifications, remain to be fully elucidated. This review presents the latest advances in association studies, integrating findings from both human studies and model organisms. By juxtaposing historical perspectives with contemporary genomics, we aim to catalyze innovative research and foster the development of novel therapeutic strategies for ARHL.
Subject(s)
Presbycusis , Humans , Aged , Presbycusis/genetics , Presbycusis/epidemiology , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Mass coral bleaching is one of the clearest threats of climate change to the persistence of marine biodiversity. Despite the negative impacts of bleaching on coral health and survival, some corals may be able to rapidly adapt to warming ocean temperatures. Thus, a significant focus in coral research is identifying the genes and pathways underlying coral heat adaptation. Here, we review state-of-the-art methods that may enable the discovery of heat-adaptive loci in corals and identify four main knowledge gaps. To fill these gaps, we describe an experimental approach combining seascape genomics with CRISPR/Cas9 gene editing to discover and validate heat-adaptive loci. Finally, we discuss how information on adaptive genotypes could be used in coral reef conservation and management strategies.
Subject(s)
Anthozoa , Animals , Anthozoa/genetics , Coral Reefs , Temperature , Genotype , Climate Change
ABSTRACT
Genome-wide association studies (GWASs) have identified numerous genetic loci associated with human traits and diseases. However, pinpointing the causal genes remains a challenge, which impedes the translation of GWAS findings into biological insights and medical applications. In this review, we provide an in-depth overview of the methods and technologies used for prioritizing genes from GWAS loci, including gene-based association tests, integrative analysis of GWAS and molecular quantitative trait loci (xQTL) data, linking GWAS variants to target genes through enhancer-gene connection maps, and network-based prioritization. We also outline strategies for generating context-dependent xQTL data and their applications in gene prioritization. We further highlight the potential of gene prioritization in drug repurposing. Lastly, we discuss future challenges and opportunities in this field.
Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Humans , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide/genetics , Gene Regulatory Networks/genetics
ABSTRACT
Healthy sleep is vital for humans to achieve optimal health and longevity. Poor sleep and sleep disorders are strongly associated with increased morbidity and mortality. However, the importance of good sleep continues to be underrecognized. Mechanisms regulating sleep and its functions in humans remain mostly unclear even after decades of dedicated research. Advancements in gene sequencing techniques and computational methodologies have paved the way for various genetic analysis approaches, which have provided some insights into human sleep genetics. This review summarizes our current knowledge of the genetic basis underlying human sleep traits and sleep disorders. We also highlight the use of animal models to validate genetic findings from human sleep studies and discuss potential molecular mechanisms and signaling pathways involved in the regulation of human sleep.
Subject(s)
Sleep Wake Disorders , Sleep , Humans , Sleep Wake Disorders/genetics , Sleep/genetics , Animals , Signal Transduction/genetics
ABSTRACT
The function of some genetic variants associated with brain-relevant traits has been explained through colocalization with expression quantitative trait loci (eQTL) conducted in bulk postmortem adult brain tissue. However, many brain-trait associated loci have unknown cellular or molecular function. These genetic variants may exert context-specific function on different molecular phenotypes including post-transcriptional changes. Here, we identified genetic regulation of RNA editing and alternative polyadenylation (APA) within a cell-type-specific population of human neural progenitors and neurons. More RNA editing and isoforms utilizing longer polyadenylation sequences were observed in neurons, likely due to higher expression of genes encoding the proteins mediating these post-transcriptional events. We also detected hundreds of cell-type-specific editing quantitative trait loci (edQTLs) and alternative polyadenylation QTLs (apaQTLs). We found colocalizations of a neuron edQTL in CCDC88A with educational attainment and a progenitor apaQTL in EP300 with schizophrenia, suggesting that genetically mediated post-transcriptional regulation during brain development leads to differences in brain function.
Subject(s)
Neurogenesis , Neurons , Quantitative Trait Loci , Humans , Neurogenesis/genetics , Neurons/metabolism , RNA Editing/genetics , Polyadenylation/genetics , Schizophrenia/genetics , Gene Expression Regulation , Neural Stem Cells/metabolism , Neural Stem Cells/cytology , Brain/metabolism , RNA Processing, Post-Transcriptional/genetics
ABSTRACT
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Phenotype , Diabetes Mellitus, Type 1/genetics , Polymorphism, Single Nucleotide , Machine Learning
ABSTRACT
Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
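The benchmark quantity in this abstract, the true R2 of a variant, is the squared Pearson correlation between imputed dosages and sequenced genotypes. A minimal Python sketch of that calculation follows. The helper name `true_r2` and the dosage values are made up for illustration; this is not MagicalRsq-X itself, which is a trained calibration model.

```python
def true_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages (0..2) and true genotypes (0, 1, 2)."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))
    if sxx == 0 or syy == 0:
        # No variation in this sample: the correlation is undefined,
        # so this sketch reports the variant as uninformative.
        return 0.0
    return (sxy * sxy) / (sxx * syy)

# Illustrative (made-up) data for one lower-frequency variant in 8 samples.
imputed = [0.02, 0.10, 0.95, 0.05, 0.00, 1.10, 0.03, 0.01]
truth = [0, 0, 1, 0, 0, 1, 0, 0]
r2 = true_r2(imputed, truth)
```

In a QC setting, a variant would be kept when its quality estimate exceeds a chosen threshold; the threshold itself is a study-level decision and is not specified in the abstract.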
Subject(s)
Gene Frequency , Genotype , Polymorphism, Single Nucleotide , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Genome, Human , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods
ABSTRACT
Single-nucleotide polymorphisms in ETS1 are associated with systemic lupus erythematosus (SLE). Ets1-/- mice develop SLE-like symptoms, suggesting that dysregulation of this transcription factor is important to the onset or progression of SLE. We used conditional deletion approaches to examine the impact of Ets1 expression in different immune cell types. Ets1 deletion in CD4+ T cells, but not B cells or dendritic cells, resulted in SLE-like autoimmunity, and this was associated with the spontaneous expansion of T follicular helper type 2 (Tfh2) cells. Ets1-/- Tfh2 cells exhibited increased expression of GATA-3 and interleukin-4 (IL-4), which induced IgE isotype switching in B cells. Neutralization of IL-4 reduced Tfh2 cell frequencies and ameliorated disease parameters. Mechanistically, Ets1 suppressed signature Tfh and Th2 cell genes, including Cxcr5, Bcl6, and Il4ra, thus curbing the terminal Tfh2 cell differentiation process. Tfh2 cell frequencies in SLE patients correlated with disease parameters, providing evidence for the relevance of these findings to human disease.
Subject(s)
Cell Differentiation/immunology , Lupus Erythematosus, Systemic/immunology , Proto-Oncogene Protein c-ets-1/immunology , Th2 Cells/immunology , Animals , Autoimmunity/genetics , Autoimmunity/immunology , B-Lymphocytes/immunology , B-Lymphocytes/metabolism , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/metabolism , Cell Differentiation/genetics , Cell Proliferation/drug effects , Cell Proliferation/genetics , Gene Expression/immunology , Gene Expression Profiling , Humans , Lupus Erythematosus, Systemic/genetics , Lupus Erythematosus, Systemic/metabolism , Mice, Inbred C57BL , Mice, Knockout , Mice, Transgenic , Proto-Oncogene Protein c-ets-1/genetics , Proto-Oncogene Protein c-ets-1/metabolism , Th2 Cells/metabolism
ABSTRACT
As their statistical power grows, genome-wide association studies (GWAS) have identified an increasing number of loci underlying quantitative traits of interest. These loci are scattered throughout the genome and are individually responsible only for small fractions of the total heritable trait variance. The recently proposed omnigenic model provides a conceptual framework to explain these observations by postulating that numerous distant loci contribute to each complex trait via effect propagation through intracellular regulatory networks. We formalize this conceptual framework by proposing the "quantitative omnigenic model" (QOM), a statistical model that combines prior knowledge of the regulatory network topology with genomic data. By applying our model to gene expression traits in yeast, we demonstrate that QOM achieves similar gene expression prediction performance to traditional GWAS with hundreds of times fewer parameters, while simultaneously extracting candidate causal and quantitative chains of effect propagation through the regulatory network for every individual gene. We estimate the fraction of heritable trait variance in cis- and in trans-, break the latter down by effect propagation order, assess the trans- variance not attributable to transcriptional regulation, and show that QOM correctly accounts for the low-dimensional structure of gene expression covariance. We furthermore demonstrate the relevance of QOM for systems biology, by employing it as a statistical test for the quality of regulatory network reconstructions, and linking it to the propagation of nontranscriptional (including environmental) effects.
Subject(s)
Gene Regulatory Networks , Genome-Wide Association Study , Models, Genetic , Quantitative Trait Loci , Genome-Wide Association Study/methods , Saccharomyces cerevisiae/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome-based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Subject(s)
Biomarkers , Genome-Wide Association Study , Inflammation , Precision Medicine , Whole Genome Sequencing , Humans , Precision Medicine/methods , Inflammation/genetics , Genome-Wide Association Study/methods , Whole Genome Sequencing/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Genetic Predisposition to Disease , Female , Interleukin-6/genetics
ABSTRACT
Distilling insomnia genome-wide association study (GWAS) variants, Palermo and colleagues identified several genes that participate in sleep regulation in two different model organisms. This workflow sets off an innovative strategy to extract biological relevance from large human genomic databases.
Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Humans , Phenotype , Sleep/genetics , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Polygenic risk score (PRS) has demonstrated its great utility in biomedical research through identifying high-risk individuals for different diseases from their genotypes. However, the broader application of PRS to the general population is hindered by the limited transferability of PRS developed in Europeans to non-European populations. To improve PRS prediction accuracy in non-European populations, we develop a statistical method called SDPRX that can effectively integrate genome wide association study summary statistics from different populations. SDPRX automatically adjusts for linkage disequilibrium differences between populations and characterizes the joint distribution of the effect sizes of a variant in two populations to be both null, population specific, or shared with correlation. Through simulations and applications to real traits, we show that SDPRX improves the prediction performance over existing methods in non-European populations.
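Whatever method estimates the per-variant weights, a polygenic risk score is in the end a weighted sum of an individual's effect-allele dosages. A minimal Python sketch of that final scoring step follows. The variant IDs, weights, and the helper name `polygenic_score` are hypothetical, and the SDPRX estimation procedure itself is not shown.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over variants present in both mappings."""
    shared = dosages.keys() & weights.keys()
    return sum(dosages[variant] * weights[variant] for variant in shared)

# Hypothetical variant IDs and effect sizes (for example, log odds ratios); illustrative only.
weights = {"rs_example_1": 0.12, "rs_example_2": -0.05, "rs_example_3": 0.30}

# Effect-allele dosages (0, 1, or 2) for one individual.
# rs_example_4 has no weight, so it is ignored by the score.
person = {"rs_example_1": 2, "rs_example_2": 1, "rs_example_3": 0, "rs_example_4": 1}

score = polygenic_score(person, weights)
```

Limited transferability across populations arises upstream of this step: weights estimated in one population reflect that population's linkage disequilibrium and allele frequencies, which is the problem the abstract describes.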
Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Risk Factors , Genotype
ABSTRACT
Response to the anti-IL17 monoclonal antibody secukinumab is heterogeneous, and not all participants respond to treatment. Understanding whether this heterogeneity is driven by genetic variation is a key aim of pharmacogenetics and could influence precision medicine approaches in inflammatory diseases. Using changes in disease activity scores across 5,218 genotyped individuals from 19 clinical trials across four indications (psoriatic arthritis, psoriasis, ankylosing spondylitis, and rheumatoid arthritis), we tested whether genetics predicted response to secukinumab. We did not find any evidence of association between treatment response and common variants, imputed HLA alleles, polygenic risk scores of disease susceptibility, or cross-disease components of shared genetic risk. This suggests that anti-IL17 therapy is equally effective regardless of an individual's genetic background, a finding that has important implications for future genetic studies of biological therapy response in inflammatory diseases.
Subject(s)
Arthritis, Psoriatic , Arthritis, Rheumatoid , Psoriasis , Humans , Arthritis, Psoriatic/drug therapy , Arthritis, Psoriatic/genetics , Psoriasis/drug therapy , Psoriasis/genetics , Arthritis, Rheumatoid/drug therapy , Arthritis, Rheumatoid/genetics , Genotype
ABSTRACT
Despite extensive research on global heritability estimation for complex traits, few methods accurately dissect local heritability. A precise local heritability estimate is crucial for high-resolution mapping in genetics. Here, we report the effective heritability estimator (EHE) that can use p values from genome-wide association studies (GWASs) for local heritability estimation by directly converting marginal heritability estimates of SNPs to a non-redundant heritability estimate of a gene or a small genomic region. EHE provides higher accuracy and precision for local heritability estimation among seven compared methods. Importantly, EHE can be applied to estimate the conditional heritability of nearby genes, where redundant heritability among the genes can also be removed further. The conditional estimation can be guided by tissue-specific expression profiles (or other functional scores) to prioritize and quantify more functionally important genes of complex phenotypes. Applying EHE to 42 complex phenotypes from the UK Biobank, we revealed the existence of two types of distinct genetic architectures for various complex phenotypes and found that highly pleiotropic genes are not enriched for more heritability compared to other candidate susceptibility genes. EHE provides an accurate and robust way to dissect the genetic architecture of complex phenotypes.
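To illustrate what converting a GWAS p value into a marginal heritability estimate can look like, the Python sketch below uses a textbook approximation: for a single SNP, the 1-df chi-square statistic has expectation of roughly 1 + n * h2 when h2 is small. This is an assumption chosen for illustration, not the EHE estimator. The step that removes redundancy caused by linkage disequilibrium among SNPs is not shown, and the function name `marginal_h2_from_p` is hypothetical.

```python
from statistics import NormalDist

def marginal_h2_from_p(p_value, n_samples):
    """Approximate per-SNP variance explained from a two-sided GWAS p value."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must be strictly between 0 and 1")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    # |z| for a two-sided test; working in the lower tail keeps precision for tiny p values.
    z = -NormalDist().inv_cdf(p_value / 2.0)
    chi2 = z * z  # 1-df chi-square statistic
    # Assumed approximation: E[chi2] ~ 1 + n * h2, truncated at zero.
    return max(0.0, (chi2 - 1.0) / n_samples)

# A genome-wide significant SNP in a hypothetical study of 100,000 samples.
h2_hit = marginal_h2_from_p(5e-8, 100_000)
# An unremarkable SNP: the estimate is truncated to zero.
h2_null = marginal_h2_from_p(0.5, 100_000)
```

Summing such marginal values over a gene would double count signal shared between correlated SNPs, which is why a non-redundant estimate of the kind described in the abstract is needed.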
Subject(s)
Genome-Wide Association Study , Genomics , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Phenotype , Algorithms , Research Design
ABSTRACT
The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach to conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ by treating the trait as quantitative or by binarizing the trait can cause inflated type I error rates or power loss. In this study, we propose a scalable and accurate method for rare-variant association tests, POLMM-GENE, in which we used a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of phenotypes and thus can well control type I error rates while remaining powerful. In the analyses of UK Biobank 450k whole-exome-sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.
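A proportional odds logistic model describes an ordinal trait through cumulative probabilities that share one linear predictor across all category thresholds. The Python sketch below shows only that core relationship, under the assumed sign convention P(Y <= j) = logistic(theta_j - eta). The cutpoints and effect size are made up, and the random effect for sample relatedness and the set-based test of POLMM-GENE are not shown.

```python
from math import exp

def logistic(t):
    return 1.0 / (1.0 + exp(-t))

def category_probabilities(cutpoints, eta):
    """P(Y = j) for an ordinal trait under a proportional odds model.

    cutpoints: strictly increasing thresholds theta_1 < ... < theta_{J-1}
    eta: linear predictor (covariates and genotype effect; a mixed model
         would also add a random effect here)
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    probabilities, previous = [], 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities

cutpoints = [-1.0, 0.5, 2.0]  # four ordered categories; illustrative values
baseline = category_probabilities(cutpoints, eta=0.0)
carrier = category_probabilities(cutpoints, eta=0.8)  # hypothetical genotype effect
```

Because a single eta shifts every cumulative probability, the model uses the full ordering of the categories, whereas binarizing the trait discards the information between the remaining categories.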
Subject(s)
Exome , Genome-Wide Association Study , Genome-Wide Association Study/methods , Exome/genetics , Biological Specimen Banks , Phenotype , Data Analysis , United Kingdom
ABSTRACT
Common genetic variants and susceptibility loci associated with Alzheimer's disease (AD) have been discovered through large-scale genome-wide association studies (GWAS), GWAS by proxy (GWAX) and meta-analysis of GWAS and GWAX (GWAS+GWAX). However, due to the very low repeatability of AD susceptibility loci and the low heritability of AD, these AD genetic findings have been questioned. We summarize AD genetic findings from the past 10 years and provide a new interpretation of these findings in the context of statistical heterogeneity. We discovered that only 17% of AD risk loci demonstrated reproducibility with a genome-wide significance of P < 5.00E-08 across all AD GWAS and GWAS+GWAX datasets. We highlighted that the AD GWAS+GWAX with the largest sample size failed to identify the most significant signals, the maximum number of genome-wide significant genetic variants or maximum heritability. Additionally, we identified widespread statistical heterogeneity in AD GWAS+GWAX datasets, but not in AD GWAS datasets. We consider that statistical heterogeneity may have attenuated the statistical power in AD GWAS+GWAX and may contribute to explaining the low repeatability (17%) of genome-wide significant AD susceptibility loci and the decrease in AD heritability (from 40% to 2%) as the sample size increased. Importantly, evidence supports the idea that a decrease in statistical heterogeneity facilitates the identification of genome-wide significant genetic loci and contributes to an increase in AD heritability. Collectively, current AD GWAX and GWAS+GWAX findings should be meticulously assessed and warrant additional investigation, and AD GWAS+GWAX should employ multiple meta-analysis methods, such as random-effects inverse variance-weighted meta-analysis, which is designed specifically for statistical heterogeneity.
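The random-effects inverse variance-weighted meta-analysis mentioned at the end of this abstract can be sketched for a single variant as follows. The sketch assumes the DerSimonian-Laird estimator of between-study variance, which is one common choice and is not stated in the abstract. The effect sizes and standard errors are made up, and the function name `random_effects_meta` is hypothetical.

```python
from math import sqrt

def random_effects_meta(betas, ses):
    """Fixed- and random-effects pooling with Cochran's Q, I2, and DerSimonian-Laird tau2."""
    k = len(betas)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching effect sizes and standard errors")
    w = [1.0 / (s * s) for s in ses]  # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation due to heterogeneity
    w_star = [1.0 / (s * s + tau2) for s in ses]  # weights widened by tau2
    sum_w_star = sum(w_star)
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum_w_star
    return {
        "fixed": fixed,
        "se_fixed": sqrt(1.0 / sum_w),
        "random": pooled,
        "se_random": sqrt(1.0 / sum_w_star),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }

# Made-up effect sizes and standard errors for one variant in three studies.
result = random_effects_meta([0.10, 0.02, 0.25], [0.03, 0.03, 0.05])
```

When the studies disagree, tau2 is positive and the random-effects standard error is larger than the fixed-effects one, so heterogeneity is reflected in the pooled estimate's uncertainty rather than ignored.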
Subject(s)
Alzheimer Disease , Genetic Predisposition to Disease , Genome-Wide Association Study , Alzheimer Disease/genetics , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genetic Heterogeneity
ABSTRACT
Ischemic stroke (IS) is a leading cause of adult disability that can severely compromise the quality of life for patients. Accurately predicting the IS functional outcome is crucial for precise risk stratification and effective therapeutic interventions. We developed a predictive model integrating genetic, environmental, and clinical factors using data from 7819 IS patients in the Third China National Stroke Registry. Employing an 80:20 split, we randomly divided the dataset into development and internal validation cohorts. In the internal validation cohort, model discrimination was evaluated using the area under the receiver operating characteristic curve (AUC), and calibration was evaluated using the Brier score with a calibration curve. We conducted genome-wide association studies (GWAS) in the development cohort, identifying rs11109607 (ANKS1B) as the most significant variant associated with IS functional outcome. We employed principal component analysis to reduce dimensionality on the top 100 significant variants identified by the GWAS, incorporating them as genetic factors in the predictive model. We employed a machine learning algorithm capable of identifying nonlinear relationships to establish predictive models for IS patient functional outcome. The optimal model was the XGBoost model, which outperformed the logistic regression model (AUC 0.818 versus 0.756, P < .05) and significantly improved reclassification efficiency. Our study innovatively incorporated genetic, environmental, and clinical factors for predicting the IS functional outcome in East Asian populations, thereby offering novel insights into IS functional outcome.
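The evaluation steps named in this abstract (a random 80:20 split, AUC for discrimination, and the Brier score for calibration) can be sketched in plain Python as follows. The labels and predicted probabilities are made up, the helper names are hypothetical, and the study's actual models (XGBoost and logistic regression) are not reproduced here.

```python
import random

def split_80_20(records, seed=0):
    """Randomly divide records into development (80%) and validation (20%) sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)

# Made-up outcomes (1 = poor functional outcome) and predicted probabilities.
labels = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
probabilities = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.65, 0.3, 0.2, 0.05]

development, validation = split_80_20(range(10))
model_auc = auc(labels, probabilities)
model_brier = brier_score(labels, probabilities)
```

The pairwise AUC used here is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable for cohorts of several thousand patients.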