ABSTRACT
The prostate cancer (PCa) risk-associated SNP rs11672691 is positively associated with aggressive disease at diagnosis. We showed that rs11672691 maps to the promoter of a short isoform of long noncoding RNA PCAT19 (PCAT19-short), which is in the third intron of the long isoform (PCAT19-long). The risk variant is associated with decreased and increased levels of PCAT19-short and PCAT19-long, respectively. Mechanistically, the risk SNP region is bifunctional with both promoter and enhancer activity. The risk variants of rs11672691 and its LD SNP rs887391 decrease binding of transcription factors NKX3.1 and YY1 to the promoter of PCAT19-short, resulting in weaker promoter but stronger enhancer activity that subsequently activates PCAT19-long. PCAT19-long interacts with HNRNPAB to activate a subset of cell-cycle genes associated with PCa progression, thereby promoting PCa tumor growth and metastasis. Taken together, these findings reveal a risk SNP-mediated promoter-enhancer switching mechanism underlying both initiation and progression of aggressive PCa.
Subject(s)
Prostatic Neoplasms/genetics , RNA, Long Noncoding/genetics , Alleles , Cell Line, Tumor , Enhancer Elements, Genetic/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Frequency/genetics , Genetic Predisposition to Disease/genetics , Homeodomain Proteins/metabolism , Humans , Male , Polymorphism, Single Nucleotide/genetics , Promoter Regions, Genetic/genetics , Protein Binding , RNA Isoforms/genetics , Risk Factors , Transcription Factors/metabolism , YY1 Transcription Factor/metabolism
ABSTRACT
Polygenic risk scores (PRSs) for a variety of diseases have recently been shown to have relative risks that depend on age, and genetic relative risks decrease with increasing age. A refined understanding of the age dependency of PRSs for a disease is important for personalized risk predictions and risk stratification. To further evaluate how the PRS relative risk for prostate cancer depends on age, we refined analyses for a validated PRS for prostate cancer by using 64,274 prostate cancer cases and 46,432 controls of diverse ancestry (82.8% European, 9.8% African American, 3.8% Latino, 2.8% Asian, and 0.8% Ghanaian). Our strategy applied a novel weighted proportional hazards model to case-control data to fully utilize age to refine how the relative risk decreased with age. We found significantly greater relative risks for younger men (age 30-55 years) compared with older men (70-88 years) for both relative risk per standard deviation of the PRS and dichotomized according to the upper 90th percentile of the PRS distribution. For the largest European ancestral group that could provide reliable resolution, the log-relative risk decreased approximately linearly from age 50 to age 75. Despite strong evidence of age-dependent genetic relative risk, our results suggest that absolute risk predictions differed little from predictions that assumed a constant relative risk over ages, from short-term to long-term predictions, simplifying implementation of risk discussions into clinical practice.
Subject(s)
Genetic Predisposition to Disease , Prostatic Neoplasms , Adult , Aged , Genome-Wide Association Study , Ghana , Humans , Male , Middle Aged , Multifactorial Inheritance/genetics , Prostatic Neoplasms/genetics , Risk Factors
ABSTRACT
The familial recurrence risk is the probability a person will have disease, given a reported family history. When family histories are obtained as simple counts of disease among family members, as often obtained in cancer registries or surveys, we propose methods to estimate recurrence risks based on truncated binomial distributions. By this approach, we are able to obtain unbiased estimates of risk for a person with at least k affected relatives, where k can be specified to determine how risk varies with k. We also derive robust variances of the recurrence risk estimate, to account for correlations within families, such as those induced by shared genes or shared environment, without explicitly modeling the factors that cause familial correlations. Furthermore, we illustrate how mixture models can be used to account for a sample composed of low- and high-risk families. Using simulations, we illustrate the properties of the proposed methods. Application of our methods to a family history survey of prostate cancer shows that the recurrence risk for prostate cancer increased from 16%, when there was at least one affected relative, to 52%, when there were at least five affected relatives.
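To make the truncated-binomial idea concrete, the Python sketch below fits a single disease probability p to family counts that enter the sample only when at least k members are affected. It is an illustration under stated assumptions, not the authors' estimator: the function names, the single common p shared by all families, and the ternary search are hypothetical choices, and the robust variances and mixture models mentioned in the abstract are not implemented.

```python
import math

def truncated_binomial_loglik(p, families, k):
    """Log-likelihood of (n, a) family counts, each observed only if a >= k."""
    total = 0.0
    for n, a in families:
        # Untruncated binomial log-probability of a affected among n members.
        log_pmf = (math.log(math.comb(n, a))
                   + a * math.log(p) + (n - a) * math.log(1.0 - p))
        # Probability that a family passes the "at least k affected" filter.
        tail = sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j)
                   for j in range(k, n + 1))
        total += log_pmf - math.log(tail)
    return total

def fit_truncated_binomial(families, k, lo=1e-6, hi=1.0 - 1e-6, iters=200):
    """Maximize the log-likelihood over p by ternary search (assumes one peak)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (truncated_binomial_loglik(m1, families, k)
                < truncated_binomial_loglik(m2, families, k)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0
```

With k = 0 there is no truncation and the fit reduces to the pooled proportion of affected members. With k >= 1 the fitted p is pulled below that naive proportion, because the likelihood accounts for families having been selected for carrying disease.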
Subject(s)
Family , Medical History Taking , Models, Genetic , Prostatic Neoplasms/epidemiology , Prostatic Neoplasms/genetics , Binomial Distribution , Genetic Predisposition to Disease , Humans , Incidence , Male , Medical History Taking/statistics & numerical data , Registries , Risk , Risk Factors , Surveys and Questionnaires
ABSTRACT
The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
Subject(s)
Disease/genetics , Mutation, Missense/genetics , Software , Area Under Curve , DNA Mutational Analysis , Exome/genetics , Gene Frequency , Humans , ROC Curve
ABSTRACT
Frontotemporal lobar degeneration with neuronal inclusions of the TAR DNA-binding protein 43 (FTLD-TDP) represents the most common pathological subtype of FTLD. We established the international FTLD-TDP whole-genome sequencing consortium to thoroughly characterize the known genetic causes of FTLD-TDP and identify novel genetic risk factors. Through the study of 1131 unrelated Caucasian patients, we estimated that C9orf72 repeat expansions and GRN loss-of-function mutations account for 25.5% and 13.9% of FTLD-TDP patients, respectively. Mutations in TBK1 (1.5%) and other known FTLD genes (1.4%) were rare, and the disease in 57.7% of FTLD-TDP patients was unexplained by the known FTLD genes. To unravel the contribution of common genetic factors to the FTLD-TDP etiology in these patients, we conducted a two-stage association study comprising the analysis of whole-genome sequencing data from 517 FTLD-TDP patients and 838 controls, followed by targeted genotyping of the most associated genomic loci in 119 additional FTLD-TDP patients and 1653 controls. We identified three genome-wide significant FTLD-TDP risk loci: one new locus at chromosome 7q36 within the DPP6 gene led by rs118113626 (p value = 4.82e-08, OR = 2.12), and two known loci: UNC13A, led by rs1297319 (p value = 1.27e-08, OR = 1.50) and HLA-DQA2 led by rs17219281 (p value = 3.22e-08, OR = 1.98). While HLA represents a locus previously implicated in clinical FTLD and related neurodegenerative disorders, the association signal in our study is independent from previously reported associations. Through inspection of our whole-genome sequence data for genes with an excess of rare loss-of-function variants in FTLD-TDP patients (n ≥ 3) as compared to controls (n = 0), we further discovered a possible role for genes functioning within the TBK1-related immune pathway (e.g., DHX58, TRIM21, IRF7) in the genetic etiology of FTLD-TDP.
Together, our study based on the largest cohort of unrelated FTLD-TDP patients assembled to date provides a comprehensive view of the genetic landscape of FTLD-TDP, nominates novel FTLD-TDP risk loci, and strongly implicates the immune pathway in FTLD-TDP pathogenesis.
Subject(s)
Nerve Tissue Proteins/genetics , TDP-43 Proteinopathies/genetics , Aged , DNA Repeat Expansion , Dipeptidyl-Peptidases and Tripeptidyl-Peptidases/genetics , Female , Frontal Lobe/metabolism , Frontotemporal Lobar Degeneration/genetics , Frontotemporal Lobar Degeneration/immunology , Genetic Predisposition to Disease , Genome-Wide Association Study , HLA-DQ Antigens/genetics , Humans , Intracellular Signaling Peptides and Proteins , Loss of Function Mutation , Male , Middle Aged , Nerve Tissue Proteins/physiology , Potassium Channels/genetics , Progranulins/genetics , Progranulins/physiology , Protein Serine-Threonine Kinases/genetics , Protein Serine-Threonine Kinases/physiology , Proteins/genetics , Proteins/physiology , RNA, Messenger/biosynthesis , Risk Factors , Sequence Analysis, RNA , Societies, Scientific , TDP-43 Proteinopathies/immunology , White People/genetics
ABSTRACT
BACKGROUND: After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research of complex diseases has shifted to sequencing-based rare variants discovery. This requires large sample sizes for statistical power and has brought up questions about whether the current variant calling practices are adequate for large cohorts. It is well-known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants by one pipeline due to computational cost and assume that false negative calls are a small percentage of the total. RESULTS: We analyzed 10,000 exomes from the Alzheimer's Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50, 100, 200, 500, 1000, and 1952 samples; and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50, 100, 500, 2000, 5000 and 10,000 samples. We found that using a single pipeline missed increasing numbers of high-quality variants correlated with sample sizes. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low frequency (minor allele frequency [MAF] 1-5%) and rare (MAF < 1%) variants, which are the very type of variants of interest. In 660 Alzheimer's disease cases with earlier onset ages of ≤65, 4 out of 13 (31%) previously-published rare pathogenic and protective mutations in APP, PSEN1, and PSEN2 genes were undetected by the default one-pipeline approach but recovered by the multi-pipeline approach.
CONCLUSIONS: Identification of the complete variant set from sequencing data is the prerequisite of genetic association analyses. The current analytic practice of calling genetic variants from sequencing data using a single bioinformatics pipeline is no longer adequate for increasingly large projects. The number and percentage of quality variants that passed quality filters but are missed by the one-pipeline approach rapidly increased with sample size.
Subject(s)
Computational Biology/methods , Genetic Variation , Alzheimer Disease/genetics , Base Composition/genetics , Drug Discovery , Genome , Genotype , Genotyping Techniques , Humans , Sample Size , Sequence Alignment
ABSTRACT
MOTIVATION: Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. RESULTS: We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. AVAILABILITY AND IMPLEMENTATION: FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/. CONTACT: nilah@stanford.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Gene Expression Regulation , Genetic Variation , Software , Genomics , Humans , Quantitative Trait Loci
ABSTRACT
MicroRNAs (miRNAs) regulate up to one-third of all protein-coding genes including genes relevant to cancer. Variants within miRNAs have been reported to be associated with prognosis, survival, response to chemotherapy across cancer types, in vitro parameters of cell growth, and altered risks for development of cancer. Five miRNA variants have been reported to be associated with risk for development of colorectal cancer (CRC). In this study, we evaluated germline genetic variation in 1,123 miRNAs in 899 individuals with CRCs categorized by clinical subtypes and in 204 controls. The role of common miRNA variation in CRC was investigated using single variant and miRNA-level association tests. Twenty-nine miRNAs and 30 variants exhibited some marginal association with CRC in at least one subtype of CRC. Previously reported associations were not confirmed (n = 4) or could not be evaluated (n = 1). The variants noted for the CRCs with deficient mismatch repair showed little overlap with the variants noted for CRCs with proficient mismatch repair, consistent with our evolving understanding of the distinct biology underlying these two groups. © 2016 The Authors Genes, Chromosomes & Cancer Published by Wiley Periodicals, Inc.
Subject(s)
Biomarkers, Tumor/genetics , Colorectal Neoplasms/genetics , Genetic Variation/genetics , Germ-Line Mutation/genetics , MicroRNAs/genetics , Case-Control Studies , Follow-Up Studies , Humans , Neoplasm Staging , Prognosis , Risk Factors
ABSTRACT
Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions; 35 of which are now described by a novel more significantly associated lead SNP, while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.
Subject(s)
Chromosome Mapping/methods , Polymorphism, Single Nucleotide , Prostatic Neoplasms/genetics , White People/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Linkage Disequilibrium , Male
ABSTRACT
Previous genome-wide association studies (GWAS) of prostate cancer risk focused on cases unselected for family history and have reported over 100 significant associations. The International Consortium for Prostate Cancer Genetics (ICPCG) has now performed a GWAS of 2511 (unrelated) familial prostate cancer cases and 1382 unaffected controls from 12 member sites. All samples were genotyped on the Illumina 5M+exome single nucleotide polymorphism (SNP) platform. The GWAS identified significant evidence for association for SNPs in six regions previously associated with prostate cancer in population-based cohorts, including 3q26.2, 6q25.3, 8q24.21, 10q11.23, 11q13.3, and 17q12. Of note, SNP rs138042437 (p = 1.7e-8) at 8q24.21 achieved a large estimated effect size in this cohort (odds ratio = 13.3). 116 previously sampled affected relatives of 62 risk-allele carriers from the GWAS cohort were genotyped for this SNP, identifying 78 additional affected carriers in 62 pedigrees. A test for an excess number of affected carriers among relatives exhibited strong evidence for co-segregation of the variant with disease (p = 8.5e-11). The majority (92 %) of risk-allele carriers at rs138042437 had a consistent estimated haplotype spanning approximately 100 kb of 8q24.21 that contained the minor alleles of three rare SNPs (dosage minor allele frequencies <1.7 %), rs183373024 (PRNCR1), previously associated SNP rs188140481, and rs138042437 (CASC19). Strong evidence for co-segregation of a SNP on the haplotype further characterizes the haplotype as a prostate cancer predisposition locus.
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Prostatic Neoplasms/genetics , RNA, Long Noncoding/genetics , Tumor Suppressor Proteins/genetics , Aged , Gene Frequency , Genotype , Haplotypes/genetics , Heterozygote , Humans , Male , Middle Aged , Pedigree , Polymorphism, Single Nucleotide , Prostatic Neoplasms/pathology , Risk Factors
ABSTRACT
BACKGROUND: Polygenic risk scores comprising established susceptibility variants have been shown to be informative classifiers for several complex diseases including prostate cancer. For prostate cancer it is unknown if inclusion of genetic markers that have so far not been associated with prostate cancer risk at a genome-wide significant level will improve disease prediction. METHODS: We built polygenic risk scores in a large training set comprising over 25,000 individuals. Initially 65 established prostate cancer susceptibility variants were selected. After LD pruning additional variants were prioritized based on their association with prostate cancer. Six-fold cross validation was performed to assess genetic risk scores and optimize the number of additional variants to be included. The final model was evaluated in an independent study population including 1,370 cases and 1,239 controls. RESULTS: The polygenic risk score with 65 established susceptibility variants provided an area under the curve (AUC) of 0.67. Adding an additional 68 novel variants significantly increased the AUC to 0.68 (P = 0.0012) and the net reclassification index by 0.21 (P = 8.5E-08). All novel variants were located in genomic regions established as associated with prostate cancer risk. CONCLUSIONS: Inclusion of additional genetic variants from established prostate cancer susceptibility regions improves disease prediction.
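As a rough illustration of the two quantities this abstract reports, the Python sketch below computes a polygenic risk score as a weighted sum of risk-allele dosages and evaluates it with the AUC. The abstract does not state how variants were weighted, so the weighted-sum form, the function names, and the toy weights and genotypes are assumptions made only for this example.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 per variant)."""
    return sum(w * d for w, d in zip(weights, dosages))

def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy data: made-up per-variant log odds ratios and genotype dosages.
weights = [0.10, 0.25, 0.05]
cases = [[2, 1, 1], [1, 2, 0], [2, 2, 2]]
controls = [[0, 1, 1], [1, 0, 2], [2, 1, 1]]

case_scores = [polygenic_risk_score(g, weights) for g in cases]
control_scores = [polygenic_risk_score(g, weights) for g in controls]
print(auc(case_scores, control_scores))
```

The pairwise comparison is quadratic in sample size, which is fine for a toy example; a rank-based computation would be the usual choice for a study population of several thousand individuals.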
Subject(s)
Genetic Markers , Genetic Predisposition to Disease , Prostatic Neoplasms/genetics , Genetic Variation , Humans , Linkage Disequilibrium , Male , Risk Factors
ABSTRACT
Genetic studies have identified single nucleotide polymorphisms (SNPs) associated with the risk of prostate cancer (PC). It remains unclear whether such genetic variants are associated with disease aggressiveness. The NCI-SPORE Genetics Working Group retrospectively collected clinicopathologic information and genotype data for 36 SNPs which at the time had been validated to be associated with PC risk from 25,674 cases with PC. Cases were grouped according to race, Gleason score (Gleason ≤ 6, 7, ≥ 8) and aggressiveness (non-aggressive, intermediate, and aggressive disease). Statistical analyses were used to compare the frequency of the SNPs between different disease cohorts. After adjusting for multiple testing, only PC-risk SNP rs2735839 (G) was significantly and inversely associated with aggressive (OR = 0.77; 95 % CI 0.69-0.87) and high-grade disease (OR = 0.77; 95 % CI 0.68-0.86) in European men. Similar associations with aggressive (OR = 0.72; 95 % CI 0.58-0.89) and high-grade disease (OR = 0.69; 95 % CI 0.54-0.87) were documented in African-American subjects. The G allele of rs2735839 was associated with disease aggressiveness even at low PSA levels (<4.0 ng/mL) in both European and African-American men. Our results provide further support that a PC-risk SNP rs2735839 near the KLK3 gene on chromosome 19q13 may be associated with aggressive and high-grade PC. Future prospectively designed, case-case GWAS are needed to identify additional SNPs associated with PC aggressiveness.
Subject(s)
Polymorphism, Single Nucleotide , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Adult , Aged , Aged, 80 and over , Cohort Studies , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Male , Middle Aged , National Cancer Institute (U.S.) , Neoplasm Invasiveness , Risk Factors , United States
ABSTRACT
Searching for rare genetic variants associated with complex diseases can be facilitated by enriching for diseased carriers of rare variants by sampling cases from pedigrees enriched for disease, possibly with related or unrelated controls. This strategy, however, complicates analyses because of shared genetic ancestry, as well as linkage disequilibrium among genetic markers. To overcome these problems, we developed broad classes of "burden" statistics and kernel statistics, extending commonly used methods for unrelated case-control data to allow for known pedigree relationships, for autosomes and the X chromosome. Furthermore, by replacing pedigree-based genetic correlation matrices with estimates of genetic relationships based on large-scale genomic data, our methods can be used to account for population-structured data. By simulations, we show that the type I error rates of our developed methods are near the asymptotic nominal levels, allowing rapid computation of P-values. Our simulations also show that a linear weighted kernel statistic is generally more powerful than a weighted "burden" statistic. Because the proposed statistics are rapid to compute, they can be readily used for large-scale screening of the association of genomic sequence data with disease status.
Subject(s)
Data Interpretation, Statistical , Genetic Association Studies/statistics & numerical data , Genetic Variation , Pedigree , Computer Simulation , Genome-Wide Association Study , Humans
ABSTRACT
BACKGROUND: Family history is a major risk factor for prostate cancer (PCa), suggesting a genetic component to the disease. However, traditional linkage and association studies have failed to fully elucidate the underlying genetic basis of familial PCa. METHODS: Here, we use a candidate gene approach to identify potential PCa susceptibility variants in whole exome sequencing data from familial PCa cases. Six hundred ninety-seven candidate genes were identified based on function, location near a known chromosome 17 linkage signal, and/or previous association with prostate or other cancers. Single nucleotide variants (SNVs) in these candidate genes were identified in whole exome sequence data from 33 PCa cases from 11 multiplex PCa families (3 cases/family). RESULTS: Overall, 4,856 candidate gene SNVs were identified, including 1,052 missense and 10 nonsense variants. Twenty missense variants were shared by all three family members in each family in which they were observed. Additionally, 15 missense variants were shared by two of three family members and predicted to be deleterious by five different algorithms. Four missense variants, BLM Gln123Arg, PARP2 Arg283Gln, LRCC46 Ala295Thr and KIF2B Pro91Leu, and one nonsense variant, CYP3A43 Arg441Ter, showed complete co-segregation with PCa status. Twelve additional variants displayed partial co-segregation with PCa. CONCLUSIONS: Forty-three nonsense and shared, missense variants were identified in our candidate genes. Further research is needed to determine the contribution of these variants to PCa susceptibility.
Subject(s)
Codon, Nonsense , Mutation, Missense , Prostatic Neoplasms/genetics , Aged , Exome , Genetic Predisposition to Disease , Humans , Male , Middle Aged
ABSTRACT
Previous GWAS studies have reported significant associations between various common SNPs and prostate cancer risk using cases unselected for family history. How these variants influence risk in familial prostate cancer is not well studied. Here, we analyzed 25 previously reported SNPs across 14 loci from prior prostate cancer GWAS. The International Consortium for Prostate Cancer Genetics (ICPCG) previously validated some of these using a family-based association method (FBAT). However, this approach suffered reduced power due to the conditional statistics implemented in FBAT. Here, we use a case-control design with an empirical analysis strategy to analyze the ICPCG resource for association between these 25 SNPs and familial prostate cancer risk. Fourteen sites contributed 12,506 samples (9,560 prostate cancer cases, 3,368 with aggressive disease, and 2,946 controls from 2,283 pedigrees). We performed association analysis with Genie software which accounts for relationships. We analyzed all familial prostate cancer cases and the subset of aggressive cases. For the familial prostate cancer phenotype, 20 of the 25 SNPs were at least nominally associated with prostate cancer and 16 remained significant after multiple testing correction (p ≤ 1E-3) occurring on chromosomal bands 6q25, 7p15, 8q24, 10q11, 11q13, 17q12, 17q24, and Xp11. For aggressive disease, 16 of the SNPs had at least nominal evidence and 8 were statistically significant including 2p15. The results indicate that the majority of common, low-risk alleles identified in GWAS studies for all prostate cancer also contribute risk for familial prostate cancer, and that some may contribute risk to aggressive disease.
Subject(s)
Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Prostatic Neoplasms/genetics , Alleles , Case-Control Studies , Follow-Up Studies , Genome-Wide Association Study , Genotype , Humans , Male , Meta-Analysis as Topic , Pedigree , Phenotype , Risk Factors
ABSTRACT
We performed a meta-analysis of 3 genome-wide association studies to identify additional common variants influencing chronic lymphocytic leukemia (CLL) risk. The discovery phase was composed of genome-wide association study data from 1121 cases and 3745 controls. Replication analysis was performed in 861 cases and 2033 controls. We identified a novel CLL risk locus at 6p21.33 (rs210142; intronic to the BAK1 gene, BCL2 antagonist killer 1; P = 9.47 × 10⁻¹⁶). A strong relationship between risk genotype and reduced BAK1 expression was shown in lymphoblastoid cell lines. This finding provides additional support for polygenic inheritance to CLL and provides further insight into the biologic basis of disease development.
Subject(s)
Antineoplastic Agents/adverse effects , Chromosome Aberrations/chemically induced , Leukemia, Myeloid, Chronic-Phase/drug therapy , Leukemia, Myeloid, Chronic-Phase/genetics , Philadelphia Chromosome , Piperazines/adverse effects , Pyrimidines/adverse effects , Adolescent , Adult , Aged , Aged, 80 and over , Benzamides , Cytogenetic Analysis , Female , Humans , Imatinib Mesylate , In Situ Hybridization, Fluorescence , Leukemia, Myeloid, Chronic-Phase/mortality , Male , Middle Aged , Prognosis , Prospective Studies , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Survival Rate , Young Adult
ABSTRACT
5-methylcytosine (5mC) is the most common chemical modification occurring on the CpG sites across the human genome. Bisulfite conversion combined with short-read whole genome sequencing can capture and quantify the modification at single nucleotide resolution. However, the PCR amplification process could lead to duplicative methylation patterns and introduce 5mC detection bias. Additionally, the limited read length also restricts co-methylation analysis between distant CpG sites. The bisulfite conversion process presents a significant challenge for detecting variant-specific methylation due to the destruction of allele information in the sequencing reads. To address these issues, we sought to characterize human methylation profiles with nanopore long-read sequencing, aiming to demonstrate its potential for long-range co-methylation analysis with native modification calls and intact allele information retained. In this regard, we first analyzed the nanopore demo data from an adaptive sampling sequencing run targeting all human CpG islands. We applied the linkage disequilibrium (LD) R² to calculate the co-methylation in nanopore data, and further identified 27,875, 50,481, 26,542 and 51,189 methylation haplotype blocks (MHB) in COLO829, COLO829BL, HCC1395 and HCC1395BL cell lines, respectively. Interestingly, while we found that the majority of co-methylation was short-range (≤200 bp), a small portion (1-3%) was long-range (≥1,000 bp), suggesting potential remote regulatory mechanisms across the genome. To further characterize the epigenetic changes related to transcription factor binding, we profiled the 5mC percentage changes surrounding various motif sites in the JASPAR collection and found that CTCF and KLF5 binding sites showed reduced methylation, while FOXE1 and ZNF354A sites showed increased methylation.
To further investigate the allele-specific 5mCG in the prostate genome, we designed a target region covering methylation quantitative trait loci (mQTL) and genome-wide association study (GWAS) risk germline variants and generated long reads with an adaptive sampling run in the 22Rv1 cell line. To identify the allele-specific methylation in the 22Rv1 cell line, we performed long-read based phasing and compared the 5mCG signals between the two haplotypes. As a result, we identified 6,390 haplotype-specific methylated regions in the 22Rv1 cell line (p-MWU ≤ 1e-5 and delta ≥ 50%). By examining haplotype-specific methylated regions near the phasing variants, we identified examples of allele-specific methylated regions that showed allele-specific accessibility in the ATAC-seq data. By further integrating the ATAC-seq data of 22Rv1, we found that methylation levels were negatively correlated with chromatin accessibility at the genome-wide scale. Our study demonstrates native methylome profiling that preserves haplotype information, offering a novel approach to uncovering the regulatory mechanisms of the human prostate genome.
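The abstract applies the LD R² statistic to co-methylation but does not spell out the computation. The Python sketch below shows one plausible reading, assuming each long read is treated like a haplotype and each CpG call like a biallelic marker (1 = methylated, 0 = unmethylated); the function name and input layout are hypothetical, and this is not taken from the authors' pipeline.

```python
def co_methylation_r2(site_a, site_b):
    """LD-style r^2 between two CpG sites from paired per-read calls.

    site_a[i] and site_b[i] are 1 (methylated) or 0 (unmethylated) on read i.
    Returns None when either site shows no variation across reads.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("need the same non-zero number of calls at both sites")
    n = len(site_a)
    p_a = sum(site_a) / n  # methylated fraction at site A
    p_b = sum(site_b) / n  # methylated fraction at site B
    # Fraction of reads methylated at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # the usual LD coefficient D
    return d * d / denom
```

Only reads that span both CpG sites can contribute a pair of calls, which is why read length limits how far apart two sites can be while still being compared.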
ABSTRACT
Frontotemporal lobar degeneration with neuronal inclusions of the TAR DNA-binding protein 43 (FTLD-TDP) is a fatal neurodegenerative disorder with only a limited number of risk loci identified. We report our comprehensive genome-wide association study as part of the International FTLD-TDP Whole-Genome Sequencing Consortium, including 985 cases and 3,153 controls, and meta-analysis with the Dementia-seq cohort, compiled from 26 institutions/brain banks in the United States, Europe and Australia. We confirm UNC13A as the strongest overall FTLD-TDP risk factor and identify TNIP1 as a novel FTLD-TDP risk factor. In subgroup analyses, we further identify for the first time genome-wide significant loci specific to each of the three main FTLD-TDP pathological subtypes (A, B and C), as well as enrichment of risk loci in distinct tissues, brain regions, and neuronal subtypes, suggesting distinct disease aetiologies in each of the subtypes. Rare variant analysis confirmed TBK1 and identified VIPR1, RBPJL, and L3MBTL1 as novel subtype-specific FTLD-TDP risk genes, further highlighting the role of innate and adaptive immunity and the Notch signalling pathway in FTLD-TDP, with potential diagnostic and novel therapeutic implications.
ABSTRACT
Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome-wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses.