ABSTRACT
BACKGROUND AND AIMS: Despite the substantial impact of environmental factors, individuals with a family history of liver cancer have an increased risk of hepatocellular carcinoma (HCC). However, genetic factors have not been studied systematically by genome-wide approaches in large numbers of individuals from European descent populations (EDP). APPROACH AND RESULTS: We conducted a 2-stage genome-wide association study (GWAS) of HCC unrelated to HBV infection. A total of 1872 HCC cases and 2907 controls were included in the discovery stage, and 1200 HCC cases and 1832 controls in the validation stage. We analyzed the discovery and validation samples separately and then conducted a meta-analysis. All analyses were conducted in the presence and absence of HCV. The liability-scale heritability was 24.4% for overall HCC. Five regions with significant ORs (95% CI) were identified for nonviral HCC: 3p22.1, MOBP, rs9842969 (0.51 [0.40-0.65]); 5p15.33, TERT, rs2242652 (0.70 [0.62-0.79]); 19q13.11, TM6SF2, rs58542926 (1.49 [1.29-1.72]); 19p13.11, MAU2, rs58489806 (1.53 [1.33-1.75]); and 22q13.31, PNPLA3, rs738409 (1.66 [1.51-1.83]). One region was identified for HCV-induced HCC: 6p21.31, human leukocyte antigen DQ beta 1, rs9275224 (0.79 [0.74-0.84]). Individuals carrying a combination of homozygous variants of PNPLA3 and TERT showed a 6.5-fold higher risk of nonviral HCC than individuals lacking these genotypes. This observation suggests that gene-gene interactions may identify individuals at elevated risk of developing HCC. CONCLUSIONS: Our GWAS highlights novel genetic susceptibility loci for nonviral HCC, which shows substantial heritability, among European descent populations from North America. Selected genetic influences were observed for HCV-positive HCC. Our findings indicate the importance of genetic susceptibility to HCC development.
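The ORs and 95% CIs above express per-allele effects. As a minimal sketch, assuming a crude 2×2 table of allele counts and a Wald interval on the log scale (not necessarily how the study estimated its covariate-adjusted ORs), an OR and its CI can be computed as follows; the counts are hypothetical.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval (default 95%).

    a: risk alleles in cases     b: other alleles in cases
    c: risk alleles in controls  d: other alleles in controls
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical allele counts, for illustration only
or_, lower, upper = odds_ratio_wald_ci(a=300, b=700, c=200, d=800)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The interval is symmetric on the log scale, which is why a reported CI such as 0.51 [0.40-0.65] looks asymmetric around the OR itself.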
Subject(s)
Carcinoma, Hepatocellular , Genetic Predisposition to Disease , Genome-Wide Association Study , Liver Neoplasms , Aged , Female , Humans , Male , Middle Aged , Carcinoma, Hepatocellular/genetics , Case-Control Studies , Genetic Loci , Liver Neoplasms/genetics , North America/epidemiology , Polymorphism, Single Nucleotide , White People/genetics , North American People
ABSTRACT
Fourteen years after the first genome-wide association study (GWAS) of lung cancer was published, approximately 45 genomic loci have been significantly associated with lung cancer risk. While functional characterization has been performed for several of these loci, a comprehensive summary of the current molecular understanding of lung cancer risk has been lacking. Furthermore, many novel computational and experimental tools have become available to accelerate the functional assessment of disease-associated variants, moving beyond locus-by-locus approaches. In this review, we first highlight the heterogeneity of lung cancer GWAS findings across histological subtypes, ancestries and smoking status, which poses unique challenges to follow-up studies. We then summarize the published lung cancer post-GWAS studies for each risk-associated locus to assess the current understanding of biological mechanisms beyond the initial statistical association. We further summarize strategies for GWAS functional follow-up studies that take advantage of cutting-edge functional genomics tools, and we provide a catalog of available resources relevant to lung cancer. Overall, we aim to highlight the importance of integrating computational and experimental approaches to draw biological insights from lung cancer GWAS results beyond association.
Subject(s)
Genome-Wide Association Study , Lung Neoplasms , Humans , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Genomics/methods , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Lung/pathology , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Sex differences in lung cancer incidence and mortality have been reported that cannot be fully explained by sex differences in smoking behavior, implying the existence of a genetic and molecular basis for sex disparity in lung cancer development. However, information about sex dimorphism in lung cancer risk is quite limited despite the great success of lung cancer association studies. Adopting a stringent two-stage analysis strategy, we performed a genome-wide gene-sex interaction analysis using genotypes from a lung cancer cohort of ~47 000 individuals of European ancestry. Three low-frequency variants (minor allele frequency < 0.05), rs17662871 (odds ratio [OR] = 0.71, P = 4.29 × 10⁻⁸), rs79942605 (OR = 2.17, P = 2.81 × 10⁻⁸) and rs208908 (OR = 0.70, P = 4.54 × 10⁻⁸), were identified as having different effects on lung cancer risk in men and women. Further expression quantitative trait loci and functional annotation analyses suggested that rs208908 affects lung cancer risk through differential regulation of Coxsackie virus and adenovirus receptor gene expression in lung tissue between men and women. Our study is among the first to provide novel insights into the genetic and molecular basis for sex disparity in lung cancer development.
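A gene-sex interaction means that the per-allele odds ratio differs between men and women. The study's exact interaction model is not detailed here; as a simplified, hypothetical sketch, sex-specific log odds ratios can be estimated from stratified allele counts and their difference tested with a Wald z statistic. This is a stratified heterogeneity test, not a full logistic regression with a SNP × sex term and covariates.

```python
import math

def log_or_and_se(a, b, c, d):
    """ln(OR) and its standard error from a 2x2 table of allele counts
    (a, b: risk/other alleles in cases; c, d: risk/other alleles in controls)."""
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def sex_difference_test(men, women):
    """Wald test for a difference in ln(OR) between two independent strata.

    Returns the ratio of odds ratios (women / men) and a two-sided P value.
    """
    beta_m, se_m = log_or_and_se(*men)
    beta_w, se_w = log_or_and_se(*women)
    z = (beta_w - beta_m) / math.sqrt(se_m ** 2 + se_w ** 2)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(beta_w - beta_m), p_two_sided

# Hypothetical counts (a, b, c, d) for men and women, for illustration only
ratio, p = sex_difference_test(men=(120, 880, 100, 900), women=(80, 920, 110, 890))
print(f"ratio of ORs = {ratio:.2f}, P = {p:.3g}")
```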
Subject(s)
Genome-Wide Association Study , Lung Neoplasms , Case-Control Studies , Female , Genetic Predisposition to Disease , Humans , Lung , Lung Neoplasms/epidemiology , Lung Neoplasms/genetics , Male , Polymorphism, Single Nucleotide/genetics
ABSTRACT
A major difference between amyloid precursor protein (APP) isoforms (APP695 and APP751) is the presence of a Kunitz-type protease inhibitor (KPI) domain, which has a significant impact on the homo- and heterodimerization of APP isoforms. However, the exact molecular mechanisms of dimer formation remain elusive. To characterize the role of the KPI domain in APP dimerization, we performed a single-molecule pull-down (SiMPull) assay, in which homodimerization between tethered APP molecules and soluble APP molecules was strongly preferred regardless of isoform, whereas heterodimerization between tethered APP751 molecules and soluble APP695 molecules was limited. We further investigated domain-level APP-APP interactions using coarse-grained models with the Martini force field. Starting from initial ternary complexes (KPI-E1, KPI-KPI, KPI-E2, E1-E1, E2-E2, and E1-E2) modeled with HADDOCK (HD) and AlphaFold2 (AF2), we investigated the binding free energy profiles and binding affinities of the domain combinations via umbrella sampling with the Martini force field. Additionally, membrane-bound microenvironments were modeled at the domain level. The results revealed that the KPI domain has a stronger attractive interaction with itself than do the E1 and E2 domains, consistent with previous reports. Thus, the KPI domain of APP751 may form additional attractive interactions with E1, E2 and the KPI domain itself, whereas this domain is absent in APP695. In conclusion, we found that homodimer formation predominates in APP751 over APP695 and is facilitated by the presence of the KPI domain.
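Binding free energies from umbrella-sampling profiles are often converted to dissociation constants for comparison. A minimal sketch, assuming the profile has already been turned into a standard binding free energy ΔG° (1 M standard state; the volume and orientational corrections this requires are outside this sketch) and a temperature of 310 K chosen for illustration, uses ΔG° = RT ln K_d:

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)

def kd_from_dg(dg_kj_per_mol, temperature_k=310.0):
    """Dissociation constant (mol/L, 1 M standard state) from dG° = R*T*ln(Kd).

    A more negative dG° gives a smaller Kd, i.e. tighter binding.
    """
    return math.exp(dg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))

# Hypothetical free energies for two domain pairs, for illustration only
for pair, dg in [("domain pair A", -30.0), ("domain pair B", -20.0)]:
    print(pair, f"Kd = {kd_from_dg(dg):.2e} M")
```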
Subject(s)
Amyloid beta-Protein Precursor , Protease Inhibitors , Amyloid beta-Protein Precursor/metabolism , Dimerization , Protein Isoforms/metabolism , Protein Domains
ABSTRACT
KEY MESSAGE: The novel gene CaAN3 encodes an R2R3 MYB transcription factor that regulates fruit-specific anthocyanin accumulation. The key regulatory gene CaAN2 encodes an R2R3 MYB transcription factor that regulates anthocyanin biosynthesis in various tissues in pepper (Capsicum annuum). However, CaAN2 is not expressed in certain pepper accessions showing fruit-specific anthocyanin accumulation. In this study, we identified the novel locus CaAN3 as a regulator of fruit-specific anthocyanin biosynthesis, using an F2 population derived from a hybrid cultivar with purple immature fruits and segregating for CaAN3. We extracted total RNA, assembled two RNA pools according to fruit color, and carried out bulked segregant RNA sequencing. We aligned the raw reads to the pepper reference genome Dempsey and identified 6,672 significant single nucleotide polymorphisms (SNPs) by calculating the Δ(SNP-index) between the two pools. We then conducted molecular mapping to delimit the target region of CaAN3 to the interval 184.6-186.4 Mbp on chromosome 10. We focused on Dem.v1.00043895, encoding an R2R3 MYB transcription factor, as the strongest candidate gene. Sequence analysis revealed four insertion/deletion polymorphisms in the promoter region of the green CaAN3 allele. We employed virus-induced gene silencing and transient overexpression assays to characterize the function of the candidate gene. When Dem.v1.00043895 was silenced in pepper, anthocyanin accumulation decreased in the pericarp, while the transient overexpression of Dem.v1.00043895 in Nicotiana benthamiana leaves resulted in the accumulation of anthocyanins around the infiltration sites. These results showed that Dem.v1.00043895 is CaAN3, an activator of anthocyanin biosynthesis in pepper fruits.
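The Δ(SNP-index) used above compares allele frequencies between the two RNA pools. As a sketch, assuming the SNP-index of a pool is the fraction of reads carrying the non-reference allele and Δ is taken as the purple pool minus the green pool (the sign convention is an assumption), the per-SNP calculation is:

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads in one pool that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("SNP has no read coverage in this pool")
    return alt_reads / total_reads

def delta_snp_index(purple_pool, green_pool):
    """Δ(SNP-index) for one SNP; each pool is an (alt_reads, total_reads) pair.

    Unlinked SNPs have values near 0 in expectation; SNPs linked to the
    trait locus drift away from 0 because the pools differ in genotype there.
    """
    return snp_index(*purple_pool) - snp_index(*green_pool)

# Hypothetical read counts keyed by made-up positions, for illustration only
snps = {
    "chr10:185100000": ((28, 30), (9, 27)),
    "chr03:12000000": ((12, 25), (11, 24)),
}
for position, (purple, green) in snps.items():
    print(position, round(delta_snp_index(purple, green), 3))
```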
Subject(s)
Capsicum , Anthocyanins , Capsicum/genetics , Capsicum/metabolism , Fruit/genetics , Fruit/metabolism , Gene Expression Regulation, Plant , Plant Proteins/genetics , Plant Proteins/metabolism , RNA , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Hierarchical modeling is the preferred approach for modeling neighborhood effects. When both residential and workplace neighborhoods are known, a bivariate (residential-workplace) neighborhood random effect that quantifies the extent to which a neighborhood's residential and workplace effects are correlated may be modeled. However, standard statistical software for hierarchical models does not easily allow correlations between the random effects of distinct clustering variables to be incorporated. To overcome this challenge, we develop a Bayesian model and an accompanying estimation procedure that allow for correlated bivariate neighborhood effects, individuals who reside or work in multiple neighborhoods, cross-sectional and longitudinal heterogeneity between individuals, and serial correlation between repeated observations over time. Simulation studies that vary key model parameters evaluate how well each aspect of the model is identified by the data. We apply the model to the motivating Framingham Heart Study linked food establishment data to examine whether (i) proximity to fast-food establishments is associated with body mass index, (ii) workplace neighborhood exposure associations are larger than those for residential neighborhood exposure, and (iii) residential neighborhood exposure associations correlate with workplace neighborhood exposure associations. Comparisons of the full model with models having restricted versions of the covariance structure illustrate the impact of including each feature of the covariance structure.
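The bivariate neighborhood effect described above can be written as a pair of random effects per neighborhood drawn from a bivariate normal distribution with correlation ρ. The sketch below only simulates that data-generating structure with NumPy, under assumed standard deviations and a single residential and workplace neighborhood per person; it is not the authors' Bayesian estimation procedure and omits the longitudinal and serial-correlation components. The function `mean_bmi` and all indices are hypothetical.

```python
import numpy as np

def simulate_neighborhood_effects(n_neighborhoods, sd_res, sd_work, rho, seed=0):
    """Draw (residential, workplace) random effects for each neighborhood.

    Column 0 holds residential effects and column 1 workplace effects;
    within a neighborhood the two are correlated with coefficient rho.
    """
    cov = np.array([[sd_res ** 2, rho * sd_res * sd_work],
                    [rho * sd_res * sd_work, sd_work ** 2]])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(2), cov, size=n_neighborhoods)

def mean_bmi(beta0, home_ids, work_ids, effects):
    """Hypothetical linear predictor: intercept + home effect + work effect."""
    return beta0 + effects[home_ids, 0] + effects[work_ids, 1]

effects = simulate_neighborhood_effects(5000, sd_res=1.0, sd_work=0.5, rho=0.6)
people_home = np.array([0, 1, 2])   # hypothetical neighborhood indices
people_work = np.array([2, 2, 4])
print(mean_bmi(27.0, people_home, people_work, effects))
```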
Subject(s)
Body Mass Index , Fast Foods/statistics & numerical data , Models, Statistical , Residence Characteristics/statistics & numerical data , Workplace/statistics & numerical data , Bayes Theorem , Cross-Sectional Studies , Fast Foods/adverse effects , Humans , Longitudinal Studies , Software
ABSTRACT
To identify genetic variation associated with lung cancer risk, we performed a genome-wide association analysis of 685 lung cancer cases with a family history of lung cancer in two or more first- or second-degree relatives and 744 controls without lung cancer, all genotyped on an Illumina Human OmniExpressExome-8v1 array. To ensure robust results, we further evaluated these findings using data from six additional studies assembled through the Transdisciplinary Research on Cancer of the Lung Consortium, comprising 1993 familial cases and 33 690 controls. We performed a meta-analysis after imputing all variants using the 1000 Genomes Project Phase 1 reference (version 3, released September 2013). Analyses were conducted for 9 327 222 SNPs integrating data from the two sources. A novel variant on chromosome 4p15.31 near the LCORL gene and an imputed rare variant in the intergenic region between CDKN2A and IFNA8 on chromosome 9p21.3 were associated with squamous cell carcinoma at genome-wide significance. Additionally, associations at CHRNA3 and CHRNA5 on chromosome 15q25.1, previously reported for sporadic lung cancer, were confirmed at genome-wide significance in familial lung cancer. Previously identified variants in or near CHRNA2, BRCA2 and CYP2A6 for overall lung cancer, TERT, SECISBP2L and RTEL1 for adenocarcinoma, and RAD52 and MHC for squamous cell carcinoma were also significantly associated with lung cancer.
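Combining per-study association results is typically done with an inverse-variance-weighted meta-analysis. The abstract does not state which meta-analysis model was used, so this fixed-effect sketch is an assumption: each study contributes a log odds ratio and its standard error, weighted by 1/SE².

```python
import math

def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns the pooled log OR, its standard error and a two-sided P value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(pooled / pooled_se) / math.sqrt(2))
    return pooled, pooled_se, p

# Hypothetical per-study estimates for one SNP, for illustration only
beta, se, p = ivw_fixed_effect([0.18, 0.25, 0.12], [0.06, 0.09, 0.05])
print(f"OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}")
```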
Subject(s)
Adenocarcinoma/genetics , Carcinoma, Squamous Cell/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Lung Neoplasms/epidemiology , Lung Neoplasms/genetics , Case-Control Studies , Chromosomes, Human, Pair 15/genetics , Chromosomes, Human, Pair 4 , Chromosomes, Human, Pair 9/genetics , Humans , Lung/pathology , Medical History Taking , Polymorphism, Single Nucleotide/genetics
ABSTRACT
PARK2, a gene associated with Parkinson disease, is a tumor suppressor in human malignancies. Here, we show that c.823C>T (p.Arg275Trp), a germline mutation in PARK2, is present in a family with eight cases of lung cancer. The resulting amino acid change, p.Arg275Trp, is located in the highly conserved RING finger 1 domain of the E3 ubiquitin ligase encoded by PARK2. Upon further analysis, the c.823C>T mutation was detected in three additional families affected by lung cancer. The effect size for PARK2 c.823C>T (odds ratio = 5.24) in white individuals was larger than those reported for variants from lung cancer genome-wide association studies. These data implicate this PARK2 germline mutation as a genetic susceptibility factor for lung cancer. Our results provide a rationale for further investigation of this specific mutation and gene, to evaluate whether targeted therapies that compensate for the loss-of-function effect of the variant could be developed against lung cancer in individuals carrying PARK2 variants.
Subject(s)
Genetic Predisposition to Disease/genetics , Lung Neoplasms/genetics , Ubiquitin-Protein Ligases/genetics , Base Sequence , DNA Primers/genetics , Exome/genetics , Female , Germ-Line Mutation/genetics , Humans , Male , Molecular Sequence Data , Mutation, Missense/genetics , Odds Ratio , Pedigree , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Accurate inference of genetic ancestry is of fundamental interest to many biomedical, forensic, and anthropological research areas. Genetic ancestry membership may relate to genetic disease risk. In a genetic association study, failing to account for differences in genetic ancestry between cases and controls may also lead to false-positive results. Although a number of strategies for inferring and accounting for the confounding effects of genetic ancestry are available, applying them to large studies (tens of thousands of samples) is challenging. The goal of this study is to develop an approach for inferring the genetic ancestry of samples with unknown ancestry among closely related populations and to provide accurate estimates of ancestry for application to large-scale studies. METHODS: We developed a novel distance-based approach, Ancestry Inference using Principal component analysis and Spatial analysis (AIPS), that incorporates an Inverse Distance Weighted (IDW) interpolation method from spatial analysis to assign individuals to population memberships. RESULTS: We demonstrate the benefits of AIPS in analyzing population substructure, in comparison with the four most commonly used tools (EIGENSTRAT, STRUCTURE, fastSTRUCTURE, and ADMIXTURE), using genotype data from various intra-European panels and European Americans. While these commonly used tools performed poorly in inferring ancestry from a large number of subpopulations, AIPS accurately distinguished variation between and within subpopulations. CONCLUSIONS: Our results show that AIPS can be applied to large-scale data sets to discriminate the modest variability among intra-continental populations as well as to characterize inter-continental variation. The method we developed will protect against spurious associations when mapping the genetic basis of a disease. Our approach is a more accurate and computationally efficient method for inferring genetic ancestry in large-scale genetic studies.
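AIPS assigns memberships by inverse distance weighting in principal-component space. The published algorithm's details (number of PCs, power parameter, neighbor selection) are not given here, so this is a minimal sketch under assumed choices: Euclidean distance over the supplied PC coordinates, a power of 2, and weights summed per reference population. The population labels and coordinates are hypothetical.

```python
import math

def idw_membership(sample_pcs, reference, power=2.0, eps=1e-12):
    """Inverse-distance-weighted population membership for one sample.

    sample_pcs: principal-component coordinates of the sample.
    reference: (population_label, pc_coordinates) pairs for reference individuals.
    Returns {population: membership}, with memberships summing to 1.
    """
    totals = {}
    for label, pcs in reference:
        distance = math.dist(sample_pcs, pcs)
        weight = 1.0 / (distance + eps) ** power  # nearer references weigh more
        totals[label] = totals.get(label, 0.0) + weight
    grand_total = sum(totals.values())
    return {label: w / grand_total for label, w in totals.items()}

# Hypothetical two-PC reference panel, for illustration only
panel = [("POP_A", (0.0, 0.0)), ("POP_A", (0.1, 0.0)),
         ("POP_B", (5.0, 5.0)), ("POP_B", (5.1, 5.0))]
print(idw_membership((0.05, 0.02), panel))
```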
Subject(s)
Genetics, Population/methods , Europe , Genome, Human/genetics , Humans , Phylogeny , Principal Component Analysis
ABSTRACT
Results from genome-wide association studies (GWAS) have indicated that strong single-gene effects are the exception, not the rule, for most diseases. We assessed the joint effects of germline genetic variations through a pathway-based approach that considers the tissue-specific contexts of GWAS findings. From GWAS meta-analyses of lung cancer (12 160 cases/16 838 controls), breast cancer (15 748 cases/18 084 controls) and prostate cancer (14 160 cases/12 724 controls) in individuals of European ancestry, we determined the tissue-specific interaction networks of proteins expressed from genes that are likely to be affected by disease-associated variants. Reactome pathways exhibiting enrichment of proteins from each network were compared across the cancers. Our results show that pathways associated with all three cancers tend to be broad cellular processes required for growth and survival. Significant examples include the nerve growth factor (P = 7.86 × 10⁻³³), epidermal growth factor (P = 1.18 × 10⁻³¹) and fibroblast growth factor (P = 2.47 × 10⁻³¹) signaling pathways. However, within these shared pathways, the genes that influence risk largely differ by cancer. Pathways found to be unique to a single cancer focus on more specific cellular functions, such as interleukin signaling in lung cancer (P = 1.69 × 10⁻¹⁵), apoptosis initiation by Bad in breast cancer (P = 3.14 × 10⁻⁹) and cellular responses to hypoxia in prostate cancer (P = 2.14 × 10⁻⁹). We present the largest comparative cross-cancer pathway analysis of GWAS to date. Our approach can also be applied to the study of inherited mechanisms underlying risk across multiple diseases in general.
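Pathway enrichment of a protein network is commonly assessed with a one-sided hypergeometric test. The abstract does not name the test, so this sketch assumes it: given a universe of N proteins, K of them in a Reactome pathway, and a disease network of n proteins of which k fall in the pathway, it returns P(X ≥ k). The counts in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n_network, n_pathway, n_universe):
    """One-sided hypergeometric P value, P(X >= k).

    k: network proteins that belong to the pathway
    n_network: proteins in the disease network (the "draws")
    n_pathway: proteins annotated to the pathway
    n_universe: all proteins that could have been drawn
    """
    total = comb(n_universe, n_network)
    upper = min(n_network, n_pathway)
    tail = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_network - i)
               for i in range(k, upper + 1))
    return tail / total  # exact integer arithmetic until this final division

# Hypothetical counts, for illustration only
print(hypergeom_enrichment_p(k=40, n_network=300, n_pathway=200, n_universe=10000))
```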
Subject(s)
Genome-Wide Association Study/methods , Breast Neoplasms/genetics , Female , Genetic Predisposition to Disease , Genetic Variation/genetics , Humans , Lung Neoplasms/genetics , Male , Polymorphism, Single Nucleotide/genetics , Prostatic Neoplasms/genetics
ABSTRACT
BACKGROUND: Identifying subpopulations within a study and inferring the intercontinental ancestry of samples are important steps in genome-wide association studies. Two software packages are widely used for substructure analysis: Structure and Eigenstrat. Structure assigns each individual to a population using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and cannot create scores that could be used as covariates. Eigenstrat uses principal component analysis to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the eigenvectors generated by Eigenstrat are sample specific and thus cannot be generalized to other individuals. RESULTS: We developed FastPop, an efficient R package that fills the gap between Structure and Eigenstrat. It can (1) generate PCA scores that identify ancestral origins and can be used across multiple studies and (2) infer ancestry information for data arising from two or more intercontinental origins. We demonstrate the use of FastPop with 2318 SNP markers selected from the genome on the basis of high variability among European, Asian and West African (African) populations. We analyzed 505 HapMap samples of European, African or Asian ancestry along with 19661 additional samples of unknown ancestry. The results from FastPop are highly consistent with those obtained by Structure across the 19661 samples studied: the correlations between FastPop and Structure results are 0.99, 0.97 and 0.99 for European, African and Asian ancestry scores, respectively. FastPop is also more efficient than Structure, completing ancestry inference for 19661 samples in 16 min compared with the 21-24 h required by Structure. FastPop also provides scores based on SNP weights, so scores derived from a reference population can be applied to other studies, provided the same set of markers is used. We also present an application of the method to four continental populations (European, Asian, African, and Native American). CONCLUSIONS: We developed an algorithm that can infer ancestry in data involving two or more intercontinental origins. It is efficient for analyzing large datasets. Additionally, the PCA-derived scores can be applied to multiple data sets to ensure that the same ancestry analysis is applied to all studies.
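The key idea above is that PCA scores derived from a reference panel can be reused in other studies through SNP weights, provided the same markers are genotyped. A minimal NumPy sketch of that idea, assuming genotypes coded 0/1/2, per-SNP standardization by the reference mean and SD, and loadings from a singular value decomposition (FastPop's exact scaling and R implementation may differ); the simulated panel is hypothetical.

```python
import numpy as np

def fit_snp_weights(ref_genotypes, n_components=2):
    """PCA on a reference panel (individuals x SNPs, genotypes coded 0/1/2).

    Returns per-SNP means, standard deviations and loadings (SNP weights)
    so that any sample typed on the same SNPs can be projected later.
    """
    mu = ref_genotypes.mean(axis=0)
    sd = ref_genotypes.std(axis=0)
    sd[sd == 0] = 1.0                     # guard against monomorphic SNPs
    _, _, vt = np.linalg.svd((ref_genotypes - mu) / sd, full_matrices=False)
    return mu, sd, vt[:n_components].T    # loadings: SNPs x components

def project(genotypes, mu, sd, loadings):
    """PCA scores for new samples using the reference panel's SNP weights."""
    return ((genotypes - mu) / sd) @ loadings

# Hypothetical simulated panel: two populations with different allele frequencies
rng = np.random.default_rng(1)
ref = np.vstack([rng.binomial(2, 0.1, size=(50, 200)),
                 rng.binomial(2, 0.9, size=(50, 200))]).astype(float)
mu, sd, w = fit_snp_weights(ref)
new_a = project(rng.binomial(2, 0.1, size=(20, 200)).astype(float), mu, sd, w)
new_b = project(rng.binomial(2, 0.9, size=(20, 200)).astype(float), mu, sd, w)
print(new_a[:, 0].mean(), new_b[:, 0].mean())
```

Because the weights, means and SDs come only from the reference panel, a new study's samples land on the same axes, which is what makes scores comparable across data sets.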
Subject(s)
Algorithms , Ethnicity/genetics , Genetics, Population , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Racial Groups/genetics , Software , Bayes Theorem , Genotype , HapMap Project , Humans
ABSTRACT
BACKGROUND: Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes differentially expressed between normal and tumorous tissues are usually considered to be cancer associated. We recently demonstrated that analysis of interindividual variation in gene expression can be useful for identifying cancer-associated genes. The goal of this study was to identify the best microarray-derived predictor of known cancer-associated genes. RESULTS: We found that the traditional approach to finding cancer genes, namely identifying differentially expressed genes, is not very efficient. Analysis of interindividual variation of gene expression in tumor samples identifies cancer-associated genes more effectively. The results were consistent across 4 major types of cancer: breast, colorectal, lung, and prostate. We used recently reported cancer-associated genes (2011-2012) for validation and found that novel cancer-associated genes can be best identified by elevated variance of gene expression in tumor samples. CONCLUSIONS: The observation that high interindividual variation of gene expression in tumor tissues is the best predictor of cancer-associated genes is likely a result of tumor heterogeneity at the gene level. Computer simulation demonstrates that, in the presence of heterogeneity, assessing variance in tumors identifies cancer genes better than comparing expression between normal and tumor tissues. Our results thus challenge the current paradigm that comparing mean expression between normal and tumorous tissues is the best approach to identifying cancer-associated genes; we found that high interindividual variation in expression is a better approach, and that using variation would improve our chances of identifying cancer-associated genes.
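The contrast drawn above, between ranking genes by mean differences and by interindividual variance in tumors, can be illustrated on toy data. This sketch assumes the variance criterion is the ratio of tumor to normal variance; the study's actual predictor and statistical model are not reproduced, and the gene names and values are hypothetical.

```python
from statistics import mean, variance

def rank_genes(tumor, normal):
    """Rank genes by |mean difference| and by tumor/normal variance ratio.

    tumor, normal: dict mapping gene name -> expression values across individuals.
    Returns two lists of gene names, each sorted from highest to lowest score.
    """
    mean_diff = {g: abs(mean(tumor[g]) - mean(normal[g])) for g in tumor}
    var_ratio = {g: variance(tumor[g]) / variance(normal[g]) for g in tumor}
    by_mean = sorted(mean_diff, key=mean_diff.get, reverse=True)
    by_variance = sorted(var_ratio, key=var_ratio.get, reverse=True)
    return by_mean, by_variance

# GENE_HET is up in some tumors and down in others (heterogeneity), so its
# mean barely changes; GENE_SHIFT is uniformly higher in every tumor.
normal = {"GENE_HET": [5.0, 5.1, 4.9, 5.0, 5.05, 4.95],
          "GENE_SHIFT": [5.0, 5.1, 4.9, 5.0, 5.05, 4.95]}
tumor = {"GENE_HET": [8.0, 2.0, 8.1, 1.9, 8.0, 2.0],
         "GENE_SHIFT": [6.0, 6.1, 5.9, 6.0, 6.05, 5.95]}
by_mean, by_variance = rank_genes(tumor, normal)
print(by_mean[0], by_variance[0])
```

A heterogeneous gene like GENE_HET is invisible to the mean comparison but ranks first by variance, which is the intuition behind the abstract's conclusion.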
Subject(s)
Genomics , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Computer Simulation , Gene Expression Regulation, Neoplastic , Genome, Human , Humans , Logistic Models , Neoplasms/pathology
ABSTRACT
Esophageal adenocarcinoma (EAC) is the most common histological subtype of esophageal cancer in Western countries; it grows rapidly and has a poor prognosis. EAC is characterized by a strong male predominance and racial disparity: it is up to fivefold more common among Whites than Blacks, yet Black patients with EAC have poorer survival rates. The basis of this racial disparity remains largely unknown, and knowledge of mutations in EAC in relation to race is limited. We used whole-exome sequencing to characterize somatic mutation profiles in tumor samples from 18 male patients with EAC. We identified three molecular subgroups based on pre-defined esophageal cancer-specific mutational signatures. Group 1 is associated with age and NTHL1 deficiency-related signatures. Group 2 occurs primarily in Black patients and is associated with signatures related to DNA damage from oxidative stress and NTHL1 deficiency. Group 3 is associated with defective homologous recombination-based DNA repair, often caused by BRCA mutations, in White patients. We also observed significantly mutated race-related genes (LCE2B in Black patients, SDR39U1 in White patients; q-value < 0.1). Our findings underscore the possibility of distinct molecular mutation patterns in EAC among different races. Further studies are needed to validate our findings, which could contribute to precision medicine in EAC.
Subject(s)
Adenocarcinoma , Esophageal Neoplasms , Female , Humans , Male , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Esophageal Neoplasms/genetics , Esophageal Neoplasms/pathology , Mutation , Black or African American , White , Exome Sequencing
ABSTRACT
OBJECTIVE: Educational offerings that fill the bioinformatics knowledge gap are a key component of enhancing access to and use of health data from the All of Us Research Program. We developed an innovative training series based on the Train the Trainer model, including project-based learning, modular on-demand demonstrations, and unstructured tutorial time, as a model for educational engagement in the All of Us community. MATERIALS AND METHODS: We highlight our training modules and content, with training survey data informing cycles of development in the creation of a 6-module training series with modular demonstrations. RESULTS: We have conducted 2 public iterations of the Train the Trainer (Tx3) Series, refined on the basis of survey feedback, and have trained over 300 registered researchers to access and analyze data on the All of Us Researcher Workbench. DISCUSSION AND CONCLUSION: Future directions of the Tx3 Series include an enhanced focus on project-based learning and responses to learner requests for modularity and asynchronous access to materials.
ABSTRACT
While respiratory diseases such as COPD and asthma share many risk factors, most studies investigate them in isolation and in predominantly European-ancestry populations. Here, we conducted the most powerful multi-trait and multi-ancestry genetic analysis of respiratory diseases and auxiliary traits to date. Our approach improves the power of genetic discovery across traits and ancestries, identifying 44 novel loci associated with lung function in individuals of East Asian ancestry. Using these results, we developed PRSxtra (cross TRait and Ancestry), a multi-trait and multi-ancestry polygenic risk score approach that leverages shared components of heritable risk via pleiotropic effects. PRSxtra significantly improved the prediction of asthma, COPD, and lung cancer compared with trait- and ancestry-matched PRS in a multi-ancestry cohort from the All of Us Research Program, especially in diverse populations. Individuals in the top PRSxtra decile had over fourfold higher odds of asthma and COPD than those in the bottom decile. Our results present a new framework for multi-trait and multi-ancestry studies of respiratory diseases to improve genetic discovery and polygenic prediction.
ABSTRACT
Background: Lung cancer and tobacco use pose significant global health challenges, necessitating a comprehensive translational roadmap for improved prevention strategies. Polygenic risk scores (PRSs) are powerful tools for patient risk stratification but have not yet been widely used in primary care for lung cancer, particularly in diverse patient populations. Methods: We propose the GREAT care paradigm, which employs PRSs to stratify disease risk and personalize interventions. We developed PRSs using large-scale multi-ancestry genome-wide association studies and standardized PRS distributions across all ancestries. We applied our PRSs to 796 individuals from the GISC Trial, 350,154 from UK Biobank (UKBB), and 210,826 from All of Us Research Program (AoU), totaling 561,776 individuals of diverse ancestry. Results: Significant odds ratios (ORs) for lung cancer and difficulty quitting smoking were observed in both UKBB and AoU. For lung cancer, the ORs for individuals in the highest risk group (top 20% versus bottom 20%) were 1.85 (95% CI: 1.58 - 2.18) in UKBB and 2.39 (95% CI: 1.93 - 2.97) in AoU. For difficulty quitting smoking, the ORs (top 33% versus bottom 33%) were 1.36 (95% CI: 1.32 - 1.41) in UKBB and 1.32 (95% CI: 1.28 - 1.36) in AoU. Conclusion: Our PRS-based intervention model leverages large-scale genetic data for robust risk assessment across populations. This model will be evaluated in two cluster-randomized clinical trials aimed at motivating health behavior changes in high-risk patients of diverse ancestry. This pioneering approach integrates genomic insights into primary care, promising improved outcomes in cancer prevention and tobacco treatment.
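Two steps mentioned above, standardizing PRS distributions across ancestries and comparing the top with the bottom 20% of the distribution, can be sketched in plain Python. The sketch assumes standardization means z-scoring raw scores within each ancestry group and computes a crude, unadjusted odds ratio; the study's actual standardization and covariate adjustment may differ, and the data are hypothetical.

```python
from statistics import mean, pstdev

def standardize_within_ancestry(scores, ancestries):
    """Z-score raw PRS values separately within each ancestry group.

    Assumes every group has at least two distinct scores (nonzero SD).
    """
    groups = {}
    for score, ancestry in zip(scores, ancestries):
        groups.setdefault(ancestry, []).append(score)
    params = {a: (mean(v), pstdev(v)) for a, v in groups.items()}
    return [(s - params[a][0]) / params[a][1] for s, a in zip(scores, ancestries)]

def top_vs_bottom_or(z_scores, is_case, fraction=0.2):
    """Crude OR comparing the top and bottom `fraction` of the PRS distribution.

    Assumes each tail contains at least one case and one control.
    """
    order = sorted(range(len(z_scores)), key=lambda i: z_scores[i])
    k = int(len(order) * fraction)
    bottom, top = order[:k], order[-k:]

    def odds(indices):
        cases = sum(is_case[i] for i in indices)
        return cases / (len(indices) - cases)

    return odds(top) / odds(bottom)

# Hypothetical raw scores on different scales in two ancestry groups
z = standardize_within_ancestry([1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
                                ["A", "A", "A", "B", "B", "B"])
print([round(v, 2) for v in z])
```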
ABSTRACT
Genome-wide association studies (GWAS) have identified over fifty loci associated with lung cancer risk. However, the underlying mechanisms and target genes are largely unknown, as most risk-associated variants might regulate gene expression in a context-specific manner. Here, we generate a barcode-shared transcriptome and chromatin accessibility map of 117,911 human lung cells from age- and sex-matched ever- and never-smokers to profile context-specific gene regulation. The identified candidate cis-regulatory elements (cCREs) are largely cell type-specific, with 37% detected in a single cell type. Colocalization of lung cancer candidate causal variants (CCVs) with these cCREs, combined with transcription factor footprinting, prioritizes variants for 68% of the GWAS loci. CCV colocalization and trait relevance scores indicate that epithelial and immune cell categories, including rare cell types, contribute most to lung cancer susceptibility. A multi-level cCRE-gene linking system identifies candidate susceptibility genes at 57% of the loci, most of which display cell-category-specific target genes, suggesting context-specific susceptibility gene function.
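The first step of the colocalization described above is checking which candidate causal variants fall inside cell type-specific cCREs. A brute-force sketch, assuming 1-based closed intervals and hypothetical variant and cCRE records (genome-scale work would use an interval tree or a tool such as `bedtools intersect`, and BED files use 0-based half-open coordinates):

```python
def variants_in_ccres(variants, ccres):
    """Map each variant to the cCREs (and their cell types) that contain it.

    variants: (variant_id, chrom, pos) tuples, 1-based positions.
    ccres: (ccre_id, cell_type, chrom, start, end) tuples, 1-based closed intervals.
    """
    hits = {}
    for variant_id, chrom, pos in variants:
        hits[variant_id] = [(ccre_id, cell_type)
                            for ccre_id, cell_type, c_chrom, start, end in ccres
                            if c_chrom == chrom and start <= pos <= end]
    return hits

# Hypothetical records, for illustration only
ccres = [("cCRE_1", "AT2", "chr5", 1000, 1500),
         ("cCRE_2", "alveolar macrophage", "chr5", 1400, 1800),
         ("cCRE_3", "AT2", "chr15", 500, 900)]
variants = [("var_1", "chr5", 1450), ("var_2", "chr15", 950)]
print(variants_in_ccres(variants, ccres))
```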
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Lung Neoplasms , Single-Cell Analysis , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Single-Cell Analysis/methods , Transcriptome , Gene Expression Regulation, Neoplastic , Polymorphism, Single Nucleotide , Chromatin/genetics , Chromatin/metabolism , Male , Female , Quantitative Trait Loci , Regulatory Sequences, Nucleic Acid/genetics , Multiomics
ABSTRACT
Lung cancer remains the leading cause of cancer mortality, despite declining smoking rates. Previous lung cancer GWAS have identified numerous loci, but separating the genetic risks of lung cancer and smoking behavioral susceptibility remains challenging. Here, we perform multi-ancestry GWAS meta-analyses of lung cancer using the Million Veteran Program cohort (approximately 95% male cases) and a previous study of European-ancestry individuals, jointly comprising 42,102 cases and 181,270 controls, followed by replication in an independent cohort of 19,404 cases and 17,378 controls. We then carry out conditional meta-analyses on cigarettes per day and identify two novel, replicated loci, including the 19p13.11 pleiotropic cancer locus in squamous cell lung carcinoma. Overall, we report twelve novel risk loci for overall lung cancer, lung adenocarcinoma, and squamous cell lung carcinoma, nine of which are externally replicated. Finally, we perform PheWAS on polygenic risk scores for lung cancer, with and without conditioning on smoking. The unconditioned lung cancer polygenic risk score is associated with smoking status in controls, illustrating a reduced predictive utility in non-smokers. Additionally, our polygenic risk score demonstrates smoking-independent pleiotropy of lung cancer risk across neoplasms and metabolic traits.
Subject(s)
Genetic Predisposition to Disease , Lung Neoplasms , Smoking , Aged , Female , Humans , Male , Middle Aged , Adenocarcinoma of Lung/genetics , Carcinoma, Squamous Cell/genetics , Case-Control Studies , Genetic Loci , Genome-Wide Association Study , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Polymorphism, Single Nucleotide , Risk Factors , Smoking/genetics , White , Ethnicity/genetics , Genetic Risk Score
ABSTRACT
BACKGROUND: Clinical, molecular, and genetic epidemiology studies have revealed remarkable differences between lung cancer in ever-smokers and in never-smokers. METHODS: We conducted a stratified multi-population (European, East Asian, and African descent) association study of 44,823 ever-smokers and 20,074 never-smokers to identify novel variants that were missed in non-stratified analyses. Functional analyses, including expression quantitative trait loci (eQTL) colocalization and DNA damage assays, and annotation studies were conducted to evaluate the functional roles of the variants. We further evaluated the impact of smoking quantity on lung cancer risk for the variants associated with ever-smoking lung cancer. RESULTS: Five novel independent loci, GABRA4, the intergenic region 12q24.33, LRRC4C, LINC01088, and LCNL1, were identified, with associations in two or three populations (P < 5 × 10⁻⁸). Further functional analyses provided multiple lines of evidence suggesting that the variants affect lung cancer risk through excessive DNA damage (GABRA4) or cis-regulation of gene expression (LCNL1). The risks of variants from 12 independent regions associated with ever-smoking lung cancer, including the well-known CHRNA5, were evaluated in never-smokers, light smokers (pack-years ≤ 20), and moderate-to-heavy smokers (pack-years > 20). The variants showed different risk patterns across these smoking groups. CONCLUSIONS: We identified novel variants associated with lung cancer only in ever-smokers or only in never-smokers that were missed by prior main-effect association studies. IMPACT: Our study highlights the genetic heterogeneity between ever- and never-smoking lung cancer and provides etiologic insights into the complicated genetic architecture of this deadly cancer.