ABSTRACT
Coronary artery disease (CAD) is the leading cause of death among adults worldwide. Accurate risk stratification can support optimal lifetime prevention. Current methods lack the ability to incorporate new information throughout the life course or to combine innate genetic risk factors with acquired lifetime risk. We designed a general multistate model (MSGene) to estimate age-specific transitions across 10 cardiometabolic states, dependent on clinical covariates and a CAD polygenic risk score. This model is designed to handle longitudinal data over the lifetime to address this unmet need and support clinical decision-making. We analyzed longitudinal data from 480,638 UK Biobank participants and compared predicted lifetime risk with the 30-year Framingham risk score. MSGene improved discrimination (C-index 0.71 vs 0.66), age of high-risk detection (C-index 0.73 vs 0.52), and overall prediction (RMSE 1.1% vs 10.9%) in held-out data. We also used MSGene to refine estimates of lifetime absolute risk reduction from statin initiation. Our findings underscore our multistate model's potential public health value for accurate lifetime CAD risk estimation using clinical factors and increasingly available genetics toward earlier, more effective prevention.
Subject(s)
Coronary Artery Disease , Electronic Health Records , Humans , Coronary Artery Disease/genetics , Coronary Artery Disease/epidemiology , Male , Female , Middle Aged , Electronic Health Records/statistics & numerical data , Aged , Risk Assessment/methods , Risk Factors , Adult , Genetic Predisposition to Disease , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , United Kingdom/epidemiology , Longitudinal Studies , Multifactorial Inheritance/genetics
ABSTRACT
Tumor mutational burden (TMB), the total number of somatic mutations in the tumor, and copy number burden (CNB), the corresponding measure of aneuploidy, are established fundamental somatic features and emerging biomarkers for immunotherapy. However, the genetic and non-genetic influences on TMB/CNB and, critically, the manner by which they influence patient outcomes remain poorly understood. Here, we present a large germline-somatic study of TMB/CNB with >23,000 individuals across 17 cancer types, of whom 12,000 also have extensive clinical, treatment, and overall survival (OS) measurements available. We report dozens of clinical associations with TMB/CNB, observing older age and male sex to have a strong effect on TMB and a weaker impact on CNB. We additionally identified significant germline influences on TMB/CNB, including fine-scale European ancestry and germline polygenic risk scores (PRSs) for smoking, tanning, white blood cell counts, and educational attainment. We quantify the causal effect of exposures on somatic mutational processes using Mendelian randomization. Many of the identified features associated with TMB/CNB were additionally associated with OS for individuals treated at a single tertiary cancer center. For individuals receiving immunotherapy, we observed a complex relationship between PRSs for educational attainment, self-reported college attainment, TMB, and survival, suggesting that the influence of this biomarker may be substantially modified by socioeconomic status. While the accumulation of somatic alterations is a stochastic process, our work demonstrates that it can be shaped by host characteristics including germline genetics.
Subject(s)
Neoplasms , Humans , Male , Mutation/genetics , Neoplasms/genetics , Neoplasms/pathology , Immunotherapy , Biomarkers, Tumor/genetics , Germ Cells/pathology
ABSTRACT
Currently, coronary artery disease (CAD) is the leading cause of death among adults worldwide. Accurate risk stratification can support optimal lifetime prevention. We designed a novel and general multistate model (MSGene) to estimate age-specific transitions across 10 cardiometabolic states, dependent on clinical covariates and a CAD polygenic risk score. MSGene supports decision making about CAD prevention related to any of these states. We analyzed longitudinal data from 480,638 UK Biobank participants and compared predicted lifetime risk with the 30-year Framingham risk score. MSGene improved discrimination (C-index 0.71 vs 0.66), age of high-risk detection (C-index 0.73 vs 0.52), and overall prediction (RMSE 1.1% vs 10.9%), with external validation. We also used MSGene to refine estimates of lifetime absolute risk reduction from statin initiation. Our findings underscore the potential public health value of our novel multistate model for accurate lifetime CAD risk estimation using clinical factors and increasingly available genetics.
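The C-index values quoted above summarize how well predicted risk ranks individuals by time to event. As a minimal sketch of that idea, the block below computes Harrell's concordance index for right-censored data. The abstract does not state which C-index estimator was used, so Harrell's definition is an assumption here, and the function name `concordance_index` and the toy data are purely illustrative.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time for each individual
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk (higher means the event is expected sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Let `a` be the individual with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.71 vs 0.66 comparison should be read.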
ABSTRACT
PURPOSE: To describe an artificial intelligence platform that detects thyroid eye disease (TED). DESIGN: Development of a deep learning model. METHODS: 1944 photographs from a clinical database were used to train a deep learning model. 344 additional images ('test set') were used to calculate performance metrics. Receiver operating characteristic, precision-recall curves and heatmaps were generated. From the test set, 50 images were randomly selected ('survey set') and used to compare model performance with ophthalmologist performance. 222 images obtained from a separate clinical database were used to assess model recall and to quantitate model performance with respect to disease stage and grade. RESULTS: The model achieved test set accuracy of 89.2%, specificity 86.9%, recall 93.4%, precision 79.7% and an F1 score of 86.0%. Heatmaps demonstrated that the model identified pixels corresponding to clinical features of TED. On the survey set, the ensemble model achieved accuracy, specificity, recall, precision and F1 score of 86%, 84%, 89%, 77% and 82%, respectively. 27 ophthalmologists achieved mean performance of 75%, 82%, 63%, 72% and 66%, respectively. On the second test set, the model achieved recall of 91.9%, with higher recall for moderate to severe (98.2%, n=55) and active disease (98.3%, n=60), as compared with mild (86.8%, n=68) or stable disease (85.7%, n=63). CONCLUSIONS: The deep learning classifier is a novel approach to identify TED and is a first step in the development of tools to improve diagnostic accuracy and lower barriers to specialist evaluation.
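The metrics reported above are related by simple formulas, and the F1 score in particular is the harmonic mean of precision and recall. The sketch below, with illustrative function names that are not from the paper, defines the standard metrics from confusion-matrix counts and recomputes F1 from the reported test-set precision (79.7%) and recall (93.4%).

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the test set in the abstract.
reported_f1 = f1_score(0.797, 0.934)
print(f"{reported_f1:.3f}")  # prints 0.860
```

The recomputed value agrees with the F1 score of 86.0% stated in the abstract.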
ABSTRACT
We introduce the pleiotropic association test (PAT) for joint analysis of multiple traits using genome-wide association study (GWAS) summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per-trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to three multi-trait methods, HIPO, MTAG, and ASSET, show PAT identified 15.3% more omnibus associations than the next best method. When these associations were interpreted on a per-trait level using m-values, PAT had 37.5% more true per-trait interpretations with a 0.92% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT discovered 22,095 novel variants. Through the m-values interpretation framework, the number of per-trait associations was almost tripled for two traits and nearly doubled for another trait relative to the original single-trait GWAS.
Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genetic Pleiotropy , Genome-Wide Association Study/methods , Phenotype , Polymorphism, Single Nucleotide/genetics , Meta-Analysis as Topic
ABSTRACT
The immune checkpoint inhibitor (ICI) pembrolizumab is US FDA approved for treatment of solid tumors with high tumor mutational burden (TMB-high; ≥10 variants/Mb). However, the extent to which TMB-high generalizes as an accurate biomarker in diverse patient populations is largely unknown. Using two clinical cohorts, we investigated the interplay between genetic ancestry, TMB, and tumor-only versus tumor-normal paired sequencing in solid tumors. TMB estimates from tumor-only panels substantially overclassified individuals into the clinically important TMB-high group due to germline contamination, and this bias was particularly pronounced in patients with Asian/African ancestry. Among patients with non-small cell lung cancer treated with ICIs, those misclassified as TMB-high from tumor-only panels did not associate with improved outcomes. TMB-high was significantly associated with improved outcomes only in European ancestries and merits validation in non-European ancestry populations. Ancestry-aware tumor-only TMB calibration and ancestry-diverse biomarker studies are critical to ensure that existing disparities are not exacerbated in precision medicine.
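To make the TMB-high cutoff concrete, the following minimal sketch applies the ≥10 variants/Mb threshold quoted above and shows how unfiltered germline variants can push an estimate over it. The function names, the 1.2 Mb panel size, and the variant counts are hypothetical and are not taken from the study.

```python
TMB_HIGH_THRESHOLD = 10.0  # variants per megabase, as quoted in the abstract


def tmb_per_mb(n_variants, panel_size_mb):
    """Tumor mutational burden as counted variants per megabase sequenced."""
    if panel_size_mb <= 0:
        raise ValueError("panel_size_mb must be positive")
    return n_variants / panel_size_mb


def is_tmb_high(n_variants, panel_size_mb, threshold=TMB_HIGH_THRESHOLD):
    """True when the TMB estimate meets or exceeds the threshold."""
    return tmb_per_mb(n_variants, panel_size_mb) >= threshold


# Hypothetical example: a 1.2 Mb panel with 9 true somatic variants.
panel_mb = 1.2
somatic_only = 9
# Assumption for illustration: 4 germline variants escape filtering
# in tumor-only sequencing and are counted as somatic.
with_germline = somatic_only + 4

print(tmb_per_mb(somatic_only, panel_mb), is_tmb_high(somatic_only, panel_mb))
print(tmb_per_mb(with_germline, panel_mb), is_tmb_high(with_germline, panel_mb))
```

In this toy case the same tumor moves from below to above the cutoff solely because of the residual germline calls, which is the kind of overclassification the abstract describes for tumor-only panels.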
Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Biomarkers, Tumor/genetics , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Humans , Immune Checkpoint Inhibitors/pharmacology , Immune Checkpoint Inhibitors/therapeutic use , Lung Neoplasms/genetics , Mutation , Tumor Burden
ABSTRACT
BACKGROUND: Hundreds of thousands of cancer patients have had targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational domains. However, the targeted nature of clinical tumor sequencing has a limited scope, especially for germline genetics. In this work, we assess the utility of discarded, off-target reads from tumor-only panel sequencing for the recovery of genome-wide germline genotypes through imputation. METHODS: We developed a framework for inference of germline variants from tumor panel sequencing, including imputation, quality control, inference of genetic ancestry, germline polygenic risk scores, and HLA alleles. We benchmarked our framework on 833 individuals with tumor sequencing and matched germline SNP array data. We then applied our approach to a prospectively collected panel sequencing cohort of 25,889 tumors. RESULTS: We demonstrate high to moderate accuracy of each inferred feature relative to direct germline SNP array genotyping: individual common variants were imputed with a mean accuracy (correlation) of 0.86, genetic ancestry was inferred with a correlation of > 0.98, polygenic risk scores were inferred with a correlation of > 0.90, and individual HLA alleles were inferred with a correlation of > 0.80. We demonstrate a minimal influence on the accuracy of somatic copy number alterations and other tumor features. We showcase the feasibility and utility of our framework by analyzing 25,889 tumors and identifying the relationships between genetic ancestry, polygenic risk, and tumor characteristics that could not be studied with conventional on-target tumor data. CONCLUSIONS: We conclude that targeted tumor sequencing can be leveraged to build rich germline research cohorts from existing data, and we make our analysis pipeline publicly available to facilitate this effort.
Subject(s)
Genetic Predisposition to Disease/genetics , Germ Cells , Neoplasms/genetics , Sequence Analysis, DNA , Alleles , Computational Biology , DNA Copy Number Variations , Gene Frequency , Genome-Wide Association Study , Genotype , Genotyping Techniques , High-Throughput Nucleotide Sequencing , Humans , Mutation , Polymorphism, Single Nucleotide
ABSTRACT
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of "fine mapping" methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
Subject(s)
Genome-Wide Association Study , Causality , Chromosome Mapping/methods , Humans , Linkage Disequilibrium , Lipoproteins, HDL/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.