ABSTRACT
The HostSeq initiative recruited 10,059 Canadians infected with SARS-CoV-2 between March 2020 and March 2023, obtained clinical information on their disease experience, and performed whole genome sequencing (WGS) of their DNA. We analyzed the WGS data for genetic contributors to severe COVID-19 (considering 3,499 hospitalized cases and 4,975 non-hospitalized individuals after quality control). We investigated the evidence for replication of loci reported by the International Host Genetics Initiative (HGI), analyzed the X chromosome, and conducted rare variant gene-based analysis and polygenic risk score testing. We adjusted for population stratification by meta-analysis across ancestry groups. We replicated two loci identified by the HGI for COVID-19 severity: the LZTFL1/SLC6A20 locus on chromosome 3 and the FOXP4 locus on chromosome 6 (the latter with a variant significant at P < 5E-8). We found novel significant associations with MRAS and WDR89 in gene-based analyses, and constructed a polygenic risk score that explained 1.01% of the variance in severe COVID-19. This study provides independent evidence confirming the robustness of COVID-19 severity loci previously identified by the HGI and identifies novel genes for further investigation.
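To make the polygenic risk score mentioned in this abstract concrete, here is a minimal Python sketch of the usual weighted-sum form of such a score. It is not the HostSeq pipeline: the function name, the per-variant weights, and the dosages are all hypothetical, and the abstract does not say how the study's score was actually built.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant weights (e.g. log odds ratios); not from the study.
weights = [0.5, -0.25, 0.125]

# Hypothetical effect-allele dosages for three individuals.
cohort = {"id1": [0, 1, 2], "id2": [2, 0, 0], "id3": [1, 1, 1]}

scores = {pid: polygenic_score(d, weights) for pid, d in cohort.items()}
```

In practice the resulting scores would be entered as a covariate in a regression on hospitalization status to estimate the variance explained.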
Subject(s)
COVID-19, North American People, Humans, COVID-19/genetics, SARS-CoV-2/genetics, Genetic Predisposition to Disease, Polymorphism, Single Nucleotide, Canada/epidemiology, Genome-Wide Association Study, Membrane Transport Proteins, Forkhead Transcription Factors
ABSTRACT
INTRODUCTION: When a study sample includes a large proportion of long-term survivors, mixture cure (MC) models that separately assess biomarker associations with long-term recurrence-free survival and time to disease recurrence are preferred to proportional-hazards models. However, in samples with few recurrences, standard maximum likelihood (ML) can be biased. OBJECTIVE AND METHODS: We extend Firth-type penalized likelihood (FT-PL) developed for bias reduction in the exponential family to the Weibull-logistic MC, using the Jeffreys invariant prior. Via simulation studies based on a motivating cohort study, we compare parameter estimates of the FT-PL method with those from ML, as well as type 1 error (T1E) and power obtained using likelihood ratio statistics. RESULTS: In samples with relatively few events, the Firth-type penalized likelihood estimates (FT-PLEs) have mean bias closer to zero and smaller mean squared error than maximum likelihood estimates (MLEs), and can be obtained in samples where the MLEs are infinite. Under similar T1E rates, FT-PL consistently exhibits higher statistical power than ML in samples with few events. In addition, we compare FT-PL estimation with two other penalization methods (a log-F prior method and a modified Firth-type method) based on the same simulations. DISCUSSION: Consistent with findings for logistic and Cox regressions, FT-PL under MC regression yields finite estimates under stringent conditions, and better bias-and-variance balance than the other two penalizations. The practicality and strength of FT-PL for MC analysis are illustrated in a cohort study of breast cancer prognosis with long-term follow-up for recurrence-free survival.
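For orientation, the general form of a Firth-type penalty with the Jeffreys invariant prior can be written as below. This is the standard textbook expression, not the Weibull-logistic MC likelihood itself, which the abstract does not give. The symbols are notation assumed here: \theta for the parameter vector, \ell for the log-likelihood, and I for the Fisher information matrix.

```latex
\ell^{*}(\theta) \;=\; \ell(\theta) \;+\; \tfrac{1}{2}\,\log\bigl|\,I(\theta)\,\bigr|
```

The penalized estimate maximizes \ell^{*} rather than \ell. The added term typically keeps estimates finite in sparse samples where the unpenalized likelihood has no finite maximum.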
Subject(s)
Neoplasm Recurrence, Local, Humans, Cohort Studies, Likelihood Functions, Computer Simulation, Proportional Hazards Models
ABSTRACT
Next generation sequencing technologies have made it possible to investigate the role of rare variants (RVs) in disease etiology. Because RVs associated with disease susceptibility tend to be enriched in families with affected individuals, study designs based on affected sib pairs (ASP) can be more powerful than case-control studies. We construct tests of RV-set association in ASPs for single genomic regions as well as for multiple regions. Single-region tests can efficiently detect a gene region harboring susceptibility variants, while multiple-region extensions are meant to capture signals dispersed across a biological pathway, potentially as a result of locus heterogeneity. Within ascertained ASPs, the test statistics contrast the frequencies of duplicate rare alleles (usually appearing on a shared haplotype) against frequencies of a single rare allele copy (appearing on a nonshared haplotype); we call these allelic parity tests. Incorporation of minor allele frequency estimates from reference populations can markedly improve test efficiency. Under various genetic penetrance models, application of the tests in simulated ASP data sets demonstrates good type I error properties as well as power gains over approaches that regress ASP rare allele counts on sharing state, especially in small samples. We discuss robustness of the allelic parity methods to the presence of genetic linkage, misspecification of reference population allele frequencies, sequencing error and de novo mutations, and population stratification. As proof of principle, we apply single- and multiple-region tests in a motivating study data set consisting of whole exome sequencing of sisters ascertained with early onset breast cancer.
Subject(s)
Genetic Variation, Models, Genetic, Alleles, Breast Neoplasms/genetics, Breast Neoplasms/pathology, Chromosomes, Human, Pair 1, Female, Gene Frequency, Genetic Heterogeneity, Genetic Linkage, Haplotypes, High-Throughput Nucleotide Sequencing/methods, Humans, Proportional Hazards Models
ABSTRACT
In this work, we propose a single nucleotide polymorphism set association test for survival phenotypes in the presence of a non-susceptible fraction. We consider a mixture model with a logistic regression for the susceptibility indicator and a proportional hazards regression to model survival in the susceptible group. We propose a joint test to assess the significance of the genetic variant in both logistic and survival regressions simultaneously. We adopt the spirit of SKAT and conduct a variance-component test treating the genetic effects of multiple variants as random. We derive score-type test statistics, and we investigate several approaches to compute their p-values. The finite-sample properties of the proposed tests are assessed and compared to existing approaches by simulations, and their use is illustrated through an application to ovarian cancer data from the Consortium of Investigators of Modifiers of BRCA1 and BRCA2.
Subject(s)
Disease Susceptibility, Models, Genetic, Models, Statistical, Survival Analysis, BRCA2 Protein/genetics, Female, Humans, Ovarian Neoplasms/genetics, Ovarian Neoplasms/mortality, Polymorphism, Single Nucleotide, Ubiquitin-Protein Ligases/genetics
ABSTRACT
Post-GWAS analysis, in many cases, focuses on fine-mapping targeted genetic regions discovered at GWAS-stage; that is, the aim is to pinpoint potential causal variants and susceptibility genes for complex traits and disease outcomes using next-generation sequencing (NGS) technologies. Large-scale GWAS cohorts are necessary to identify target regions given the typically modest genetic effect sizes. In this context, two-phase sampling design and analysis is a cost-reduction technique that utilizes data collected during phase 1 GWAS to select an informative subsample for phase 2 sequencing. The main goal is to make inference for genetic variants measured via NGS by efficiently combining data from phases 1 and 2. We propose two approaches for selecting a phase 2 design under a budget constraint. The first method identifies sampling fractions that select a phase 2 design yielding an asymptotic variance covariance matrix with certain optimal characteristics, for example, smallest trace, via Lagrange multipliers (LM). The second relies on a genetic algorithm (GA) with a defined fitness function to identify exactly a phase 2 subsample. We perform comprehensive simulation studies to evaluate the empirical properties of the proposed designs for a genetic association study of a quantitative trait. We compare our methods against two ranked designs: residual-dependent sampling and a recently identified optimal design. Our findings demonstrate that the proposed designs, GA in particular, can render competitive power in combined phase 1 and 2 analysis compared with alternative designs while preserving type 1 error control. These results are especially evident under the more practical scenario where design values need to be defined a priori and are subject to misspecification. We illustrate the proposed methods in a study of triglyceride levels in the North Finland Birth Cohort of 1966. R code to reproduce our results is available at github.com/egosv/TwoPhase_postGWAS.
Subject(s)
Genome-Wide Association Study, Polymorphism, Single Nucleotide, Genetic Association Studies, Genotype, Humans, Phenotype
ABSTRACT
SUMMARY: For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package gpart which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image. The gpart functions facilitate construction of LD blocks and SNP partitions for vast amounts of genome sequencing data within reasonable time and memory limits in personal computing environments. AVAILABILITY AND IMPLEMENTATION: The R package is available at https://bioconductor.org/packages/gpart. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genome, Human, Polymorphism, Single Nucleotide, Haplotypes, Humans, Linkage Disequilibrium, Software
ABSTRACT
BACKGROUND: Although diabetic kidney disease demonstrates both familial clustering and single nucleotide polymorphism heritability, the specific genetic factors influencing risk remain largely unknown. METHODS: To identify genetic variants predisposing to diabetic kidney disease, we performed genome-wide association study (GWAS) analyses. Through collaboration with the Diabetes Nephropathy Collaborative Research Initiative, we assembled a large collection of type 1 diabetes cohorts with harmonized diabetic kidney disease phenotypes. We used a spectrum of ten diabetic kidney disease definitions based on albuminuria and renal function. RESULTS: Our GWAS meta-analysis included association results for up to 19,406 individuals of European descent with type 1 diabetes. We identified 16 genome-wide significant risk loci. The variant with the strongest association (rs55703767) is a common missense mutation in the collagen type IV alpha 3 chain (COL4A3) gene, which encodes a major structural component of the glomerular basement membrane (GBM). Mutations in COL4A3 are implicated in heritable nephropathies, including the progressive inherited nephropathy Alport syndrome. The rs55703767 minor allele (Asp326Tyr) is protective against several definitions of diabetic kidney disease, including albuminuria and ESKD, and demonstrated a significant association with GBM width; protective allele carriers had thinner GBM before any signs of kidney disease, and its effect was dependent on glycemia. Three other loci are in or near genes with known or suggestive involvement in this condition (BMP7) or renal biology (COLEC11 and DDR1). CONCLUSIONS: The 16 diabetic kidney disease-associated loci may provide novel insights into the pathogenesis of this condition and help identify potential biologic targets for prevention and treatment.
Subject(s)
Autoantigens/genetics, Collagen Type IV/genetics, Diabetes Mellitus, Type 1/genetics, Diabetic Nephropathies/genetics, Genome-Wide Association Study, Glomerular Basement Membrane, Mutation, Cohort Studies, Female, Humans, Male
ABSTRACT
We evaluate two-phase designs to follow-up findings from genome-wide association study (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation-maximization-based inference under a semiparametric maximum likelihood formulation tailored for post-GWAS inference. A GWAS-SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT-SNP-dependent sampling and analysis under alternative sample allocations by simulations. Joint allocation balanced on SNP genotype and extreme-QT strata yields significant power improvements compared to marginal QT- or SNP-based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure.
Subject(s)
Genome-Wide Association Study, Genotype, Likelihood Functions, Quantitative Trait, Heritable, Sequence Analysis, DNA, Algorithms, Blood Pressure/genetics, Humans, Models, Genetic, Phenotype, Polymorphism, Single Nucleotide/genetics
ABSTRACT
Motivation: Linkage disequilibrium (LD) block construction is required for research in population genetics and genetic epidemiology, including specification of sets of single nucleotide polymorphisms (SNPs) for analysis of multi-SNP based association and identification of haplotype blocks in high density sequencing data. Existing methods based on a narrow-sense definition do not allow intermediate regions of low LD between strongly associated SNP pairs and tend to split high density SNP data into small blocks having high between-block correlation. Results: We present Big-LD, a block partition method based on interval graph modeling of LD bins, which are clusters of strong pairwise LD SNPs, not necessarily physically consecutive. Big-LD uses an agglomerative approach that starts by identifying small communities of SNPs, i.e. the SNPs in each LD bin region, and proceeds by merging these communities. We determine the number of blocks using a method to find a maximum-weight independent set. Big-LD produces larger LD blocks compared to existing methods such as MATILDE, Haploview, MIG++, or S-MIG++, and the LD blocks better agree with recombination hotspot locations determined by sperm-typing experiments. The observed average runtime of Big-LD for 13 288 240 non-monomorphic SNPs from 1000 Genomes Project autosome data (286 East Asians) is about 5.83 h, which is a significant improvement over the existing methods. Availability and implementation: Source code and documentation are available for download at http://github.com/sunnyeesl/BigLD. Contact: yyoo@snu.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics online.
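The maximum-weight independent set step named in this abstract can be illustrated with the classic dynamic program for weighted, non-overlapping intervals. This Python sketch is generic and is not taken from the Big-LD source code. The candidate intervals, their weights, and the rule that intervals sharing an endpoint count as overlapping are all assumptions made for illustration.

```python
from bisect import bisect_left


def max_weight_independent_intervals(intervals):
    """Pick non-overlapping closed intervals (start, end, weight) of maximum total weight.

    Returns (total_weight, chosen_intervals), with chosen intervals ordered by end point.
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])  # order by end point
    ends = [iv[1] for iv in ivs]
    n = len(ivs)
    best = [0] * (n + 1)  # best[i]: optimum using only the first i intervals
    prev = [0] * n        # prev[i]: how many intervals end strictly before ivs[i] starts
    for i, (start, _end, weight) in enumerate(ivs):
        prev[i] = bisect_left(ends, start)
        best[i + 1] = max(best[i], best[prev[i]] + weight)

    # Walk back through the table to recover one optimal selection.
    chosen = []
    i = n
    while i > 0:
        weight = ivs[i - 1][2]
        if best[prev[i - 1]] + weight > best[i - 1]:
            chosen.append(ivs[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


# Hypothetical candidate blocks as (first SNP index, last SNP index, weight).
candidates = [(1, 5, 4), (4, 9, 6), (6, 12, 5), (10, 15, 3)]
total, blocks = max_weight_independent_intervals(candidates)
```

On an interval graph this runs in O(n log n) time, which is why an interval-graph model makes the otherwise hard independent-set problem tractable.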
Subject(s)
Genetics, Population/methods, Genome, Human, Haplotypes, Polymorphism, Single Nucleotide, Sequence Analysis, DNA/methods, Algorithms, Asian People/genetics, Humans, Linkage Disequilibrium, Models, Genetic
ABSTRACT
AIMS/HYPOTHESIS: The aim of this study was to identify genetic variants associated with beta cell function in type 1 diabetes, as measured by serum C-peptide levels, through meta-genome-wide association studies (meta-GWAS). METHODS: We performed a meta-GWAS to combine the results from five studies in type 1 diabetes with cross-sectionally measured stimulated, fasting or random C-peptide levels, including 3479 European participants. The p values across studies were combined, taking into account sample size and direction of effect. We also performed separate meta-GWAS for stimulated (n = 1303), fasting (n = 2019) and random (n = 1497) C-peptide levels. RESULTS: In the meta-GWAS for stimulated/fasting/random C-peptide levels, a SNP on chromosome 1, rs559047 (Chr1:238753916, T>A, minor allele frequency [MAF] 0.24-0.26), was associated with C-peptide (p = 4.13 × 10^-8), meeting the genome-wide significance threshold (p < 5 × 10^-8). In the same meta-GWAS, a locus in the MHC region (rs9260151) was close to the genome-wide significance threshold (Chr6:29911030, C>T, MAF 0.07-0.10, p = 8.43 × 10^-8). In the stimulated C-peptide meta-GWAS, rs61211515 (Chr6:30100975, T/-, MAF 0.17-0.19) in the MHC region was associated with stimulated C-peptide (β [SE] = -0.39 [0.07], p = 9.72 × 10^-8). rs61211515 was also associated with the rate of stimulated C-peptide decline over time in a subset of individuals (n = 258) with annual repeated measures for up to 6 years (p = 0.02). In the meta-GWAS of random C-peptide, another MHC region SNP, rs3135002 (Chr6:32668439, C>A, MAF 0.02-0.06), was associated with C-peptide (p = 3.49 × 10^-8). Conditional analyses suggested that the three identified variants in the MHC region were independent of each other. rs9260151 and rs3135002 have been associated with type 1 diabetes, whereas rs559047 and rs61211515 have not been associated with a risk of developing type 1 diabetes.
CONCLUSIONS/INTERPRETATION: We identified a locus on chromosome 1 and multiple variants in the MHC region, at least some of which were distinct from type 1 diabetes risk loci, that were associated with C-peptide, suggesting partly non-overlapping mechanisms for the development and progression of type 1 diabetes. These associations need to be validated in independent populations. Further investigations could provide insights into mechanisms of beta cell loss and opportunities to preserve beta cell function.
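The abstract states that p values were combined across studies "taking into account sample size and direction of effect". One common scheme matching that description is a sample-size-weighted Z-score combination, sketched below in Python. Whether the study used exactly this weighting is an assumption; the function name and the example inputs are hypothetical and are not study data.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def combine_pvalues(studies):
    """Combine two-sided p values with a sample-size-weighted Z score.

    studies: iterable of (two_sided_p, effect_sign, sample_size), where
    effect_sign is +1 or -1 for the direction of the estimated effect.
    Returns (combined_z, combined_two_sided_p).
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for p, sign, n in studies:
        # Convert the two-sided p value to a signed Z score.
        z = -_STD_NORMAL.inv_cdf(p / 2.0) * (1 if sign >= 0 else -1)
        w = sqrt(n)  # weight each study by the square root of its sample size
        numerator += w * z
        sum_sq_weights += w * w
    z_meta = numerator / sqrt(sum_sq_weights)
    p_meta = 2.0 * _STD_NORMAL.cdf(-abs(z_meta))
    return z_meta, p_meta


# Hypothetical inputs: two concordant studies reinforce each other.
z_meta, p_meta = combine_pvalues([(0.01, +1, 1000), (0.01, +1, 500)])
```

Studies with effects in opposite directions partly cancel under this scheme, which is the practical meaning of accounting for direction of effect.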
Subject(s)
C-Peptide/blood, Chromosomes, Human, Pair 1/genetics, Diabetes Mellitus, Type 1/genetics, Genome-Wide Association Study, Histocompatibility Antigens Class I/genetics, Adolescent, Adult, Alleles, Cross-Sectional Studies, Diabetes Mellitus, Type 1/blood, Female, Gene Frequency, Genetic Predisposition to Disease, Genotype, Humans, Insulin-Secreting Cells/metabolism, Male, Polymorphism, Single Nucleotide, Young Adult
ABSTRACT
By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations.
Subject(s)
Genetic Markers/genetics, Haplotypes/genetics, Linear Models, Linkage Disequilibrium, Models, Genetic, Polymorphism, Single Nucleotide/genetics, Humans, Phenotype, Quantitative Trait Loci
ABSTRACT
BACKGROUND: We previously observed that T-bet+ tumor-infiltrating T lymphocytes (T-bet+ TILs) in primary breast tumors were associated with adverse clinicopathological features, yet favorable clinical outcome. We identified BRD4 (Bromodomain-Containing Protein 4), a member of the Bromodomain and Extra Terminal domain (BET) family, as a gene that distinguished T-bet+/high and T-bet-/low tumors. In clinical studies, BET inhibitors have been shown to suppress inflammation in various cancers, suggesting a potential link between BRD4 and immune infiltration in cancer. Hence, we examined the BRD4 expression and clinicopathological features of breast cancer. METHODS: The cohort consisted of a prospectively ascertained consecutive series of women with axillary node-negative breast cancer with long follow-up. Gene expression microarray data were used to detect mRNAs differentially expressed between T-bet+/high (n = 6) and T-bet-/low (n = 41) tumors. Tissue microarrays (TMAs) constructed from tumors of 612 women were used to quantify expression of BRD4 by immunohistochemistry, which was analyzed for its association with T-bet+ TILs, Jagged1, clinicopathological features, and disease-free survival. RESULTS: Microarray analysis indicated that BRD4 mRNA expression was up to 44-fold higher in T-bet+/high tumors compared to T-bet-/low tumors (p = 5.38E-05). Immunohistochemical expression of BRD4 in cancer cells was also shown to be associated with T-bet+ TILs (p = 0.0415) as well as with Jagged1 mRNA and protein expression (p = 0.0171, 0.0010 respectively). BRD4 expression correlated with larger tumor size (p = 0.0049), pre-menopausal status (p = 0.0018), and high Ki-67 proliferative index (p = 0.0009). Women with high tumoral BRD4 expression in the absence of T-bet+ TILs exhibited a significantly poorer outcome (log rank test p = 0.0165) relative to other subgroups. 
CONCLUSIONS: The association of BRD4 expression with T-bet+ TILs, and T-bet+ TIL-dependent disease-free survival suggests a potential link between BRD4-mediated tumor development and tumor immune surveillance, possibly through BRD4's regulation of Jagged1 signaling pathways. Further understanding BRD4's role in different immune contexts may help to identify an appropriate subset of breast cancer patients who may benefit from BET inhibitors without the risk of diminishing the anti-tumoral immune activity.
Subject(s)
Breast Neoplasms/mortality, Lymphocytes, Tumor-Infiltrating/immunology, Nuclear Proteins/physiology, T-Box Domain Proteins/analysis, Transcription Factors/physiology, Breast Neoplasms/immunology, Breast Neoplasms/pathology, Cell Cycle Proteins, Disease-Free Survival, Female, Humans, Immunohistochemistry, Jagged-1 Protein/physiology, Lymph Nodes/pathology, Nuclear Proteins/analysis, Nuclear Proteins/genetics, Prospective Studies, Transcription Factors/analysis, Transcription Factors/genetics
ABSTRACT
The "winner's curse" is a subtle and difficult problem in interpretation of genetic association, in which association estimates from large-scale gene detection studies are larger in magnitude than those from subsequent replication studies. This is practically important because use of a biased estimate from the original study will yield an underestimate of sample size requirements for replication, leaving the investigators with an underpowered study. Motivated by investigation of the genetics of type 1 diabetes complications in a longitudinal cohort of participants in the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) Genetics Study, we apply a bootstrap resampling method in analysis of time to nephropathy under a Cox proportional hazards model, examining 1,213 single-nucleotide polymorphisms (SNPs) in 201 candidate genes custom genotyped in 1,361 white probands. Among 15 top-ranked SNPs, bias reduction in log hazard ratio estimates ranges from 43.1% to 80.5%. In simulation studies based on the observed DCCT/EDIC genotype data, genome-wide bootstrap estimates for false-positive SNPs and for true-positive SNPs with low-to-moderate power are closer to the true values than uncorrected naïve estimates, but tend to overcorrect SNPs with high power. This bias-reduction technique is generally applicable for complex trait studies including quantitative, binary, and time-to-event traits.
Subject(s)
Genome-Wide Association Study/methods, Bias, Diabetes Mellitus, Type 1/complications, Diabetes Mellitus, Type 1/epidemiology, Diabetes Mellitus, Type 1/genetics, Diabetes Mellitus, Type 1/therapy, False Positive Reactions, Female, Genotype, Humans, Kidney Diseases/complications, Kidney Diseases/genetics, Kidney Diseases/pathology, Male, Models, Genetic, Phenotype, Polymorphism, Single Nucleotide/genetics, Proportional Hazards Models, Risk, Sample Size, Time Factors
ABSTRACT
Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
Subject(s)
Genome-Wide Association Study/methods, High-Throughput Nucleotide Sequencing/methods, Models, Theoretical, Polymorphism, Single Nucleotide/genetics, Breast Neoplasms/genetics, Female, Genotype, Humans, Male, Prostatic Neoplasms/genetics, Sample Size
ABSTRACT
In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies.
Subject(s)
Genome-Wide Association Study, Bayes Theorem, Chromosome Mapping, Computer Simulation, Genome, Human, Genotype, Humans, Models, Genetic, Phenotype, Polymorphism, Single Nucleotide, Probability
ABSTRACT
We investigated the association of signals from previous GWAS and candidate gene meta-analyses for diabetic retinopathy (DR) or nephropathy (DN), as well as an EPO variant, in meta-analyses of severe (SDR) and mild diabetic retinopathy (MDR). Meta-analyses of SDR (≥severe non-proliferative diabetic retinopathy (NPDR) or history of panretinal photocoagulation) and MDR (≥mild NPDR), defined based on seven-field stereoscopic fundus photographs, were performed in two well-characterized type 1 diabetes (T1D) cohorts: the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC, n = 1,304) and the Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR, n = 603). Among 34 previous signals for DR, after controlling for multiple testing, no association was replicated in our meta-analyses. rs1571942 and rs12219125 at the PLXDC2 locus showed nominally significant (P < 0.05) association with SDR in the same direction as the previous report, as did rs1801282 in the PPARG gene with MDR. Among 55 loci previously associated with DN, three showed suggestive associations with SDR in our study without maintaining significance after correction for multiple testing. Of particular interest, rs1617640 (EPO) was not significantly associated with DR status, combined SDR-DN phenotype, time to SDR or time to DN (all P > 0.05). Lack of replication of previous DR hits and EPO despite reasonable statistical power implies that many of these may be false positives. Consistent with pleiotropy, we provide suggestive collective evidence for association between DR and variants previously associated with DN without reaching statistical significance at any single locus.
Subject(s)
Diabetes Mellitus, Type 1/genetics, Diabetic Retinopathy/genetics, Erythropoietin/genetics, Genetic Loci, Polymorphism, Genetic, Receptors, Cell Surface/genetics, Clinical Trials as Topic, Female, Humans, Male
ABSTRACT
BACKGROUND: Menacalc is an immunofluorescence-based, quantitative method in which expression of the non-invasive Mena protein isoform (Mena11a) is subtracted from total Mena protein expression. Previous work has found a significant positive association between Menacalc and risk of death from breast cancer. Our goal was to determine if Menacalc could be used as an independent prognostic marker for axillary node-negative (ANN) breast cancer. METHODS: Analysis of the association of Menacalc with overall survival (death from any cause) was performed for 403 ANN tumors using Kaplan Meier survival curves and the univariate Cox proportional hazards (PH) model with the log-rank or the likelihood ratio test. Cox PH models were used to estimate hazard ratios (HRs) for the association of Menacalc with risk of death after adjustment for HER2 status and clinicopathological tumor features. RESULTS: High Menacalc was associated with increased risk of death from any cause (P=0.0199, HR (CI)=2.18 (1.19, 4.00)). A similarly elevated risk of death was found in the subset of the Menacalc cohort which did not receive hormone or chemotherapy (n=142) (P=0.0052, HR (CI)=3.80 (1.58, 9.97)). There was a trend toward increased risk of death with relatively high Menacalc in the HER2, basal and luminal molecular subtypes. CONCLUSIONS: Menacalc may serve as an independent prognostic biomarker for the ANN breast cancer patient population.
Subject(s)
Biomarkers, Tumor/biosynthesis , Breast Neoplasms/genetics , Microfilament Proteins/biosynthesis , Aged , Biomarkers, Tumor/genetics , Breast Neoplasms/mortality , Breast Neoplasms/pathology , Female , Gene Expression Regulation, Neoplastic , Humans , Kaplan-Meier Estimate , Microfilament Proteins/genetics , Middle Aged , Neoplasm Metastasis , Prognosis , Protein Isoforms/biosynthesis , Protein Isoforms/genetics , Receptor, ErbB-2/genetics
ABSTRACT
The objective of this study was to determine the prognostic significance of subgrouping estrogen receptor (ER)-positive breast tumors into low- and high-risk luminal categories using Ki67 index, TP53, or progesterone receptor (PR) status. The study group comprised 540 patients with lymph node-negative, invasive breast carcinoma. Luminal A subtype was defined as being ER positive, HER2 negative, and Ki67 low (<14% cells positive) and luminal B subtype as being ER positive, HER2 negative, and Ki67 high (≥14% cells positive). Luminal tumors were also subgrouped into risk categories based on PR and TP53 status. Survival analysis was performed. Patients with luminal B tumors (n=173) had significantly worse disease-free survival than those with luminal A tumors (n=186) (log-rank P-value=0.0164; univariate Cox regression relative risk 2.00; 95% CI, 1.12-3.58; P=0.0187). Luminal subtype remained an independent prognostic indicator on multivariate analysis including traditional prognostic factors (relative risk 2.12; 95% CI, 1.16-3.88; P=0.0151). Using TP53 status or PR negativity rather than Ki67 to classify ER-positive luminal tumors gave outcome results similar to those obtained using the proliferation index. However, the combination of the three markers proved the most powerful prognostically. Ki67 index, TP53 status, or PR negativity can be used to segregate ER-positive, HER2-negative tumors into prognostically meaningful subgroups with significantly different clinical outcomes. These biomarkers, particularly in combination, may potentially be used clinically to guide patient management.
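The luminal A/B definition stated above reduces to a simple rule on three markers. The sketch below encodes only that rule; the function and constant names are hypothetical, and the TP53- and PR-based subgroupings are omitted because the abstract does not give their cutoffs.

```python
from typing import Optional

KI67_CUTOFF_PERCENT = 14.0  # threshold stated in the abstract


def luminal_subtype(er_positive: bool, her2_positive: bool, ki67_percent: float) -> Optional[str]:
    """Classify a tumor as luminal A or B using the ER/HER2/Ki67 rule from the abstract.

    Returns None for tumors outside the luminal definitions (ER negative or HER2 positive).
    """
    if not er_positive or her2_positive:
        return None
    if ki67_percent < KI67_CUTOFF_PERCENT:
        return "luminal A"
    return "luminal B"  # Ki67 at or above the cutoff
```

For example, `luminal_subtype(True, False, 14.0)` falls in the luminal B group because the cutoff is inclusive on the high side.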
Subject(s)
Breast Neoplasms/chemistry , Carcinoma/chemistry , Ki-67 Antigen/analysis , Receptors, Progesterone/analysis , Tumor Suppressor Protein p53/analysis , Breast Neoplasms/classification , Breast Neoplasms/mortality , Breast Neoplasms/pathology , Breast Neoplasms/therapy , Carcinoma/classification , Carcinoma/mortality , Carcinoma/pathology , Carcinoma/therapy , Chi-Square Distribution , Diagnosis, Differential , Disease-Free Survival , Female , Humans , Immunohistochemistry , Kaplan-Meier Estimate , Multivariate Analysis , Neoplasm Invasiveness , Ontario , Predictive Value of Tests , Proportional Hazards Models , Prospective Studies , Receptor, ErbB-2/analysis , Receptors, Estrogen/analysis , Risk Factors , Time Factors
ABSTRACT
OBJECTIVES: We conducted a case-control study to determine the association between KIR2D and KIR3D gene polymorphisms and their interaction with HLA alleles in PsA. METHODS: A total of 678 subjects with PsA and 688 healthy controls were studied. Differences between cases and controls in the frequency of individual KIR polymorphisms were tested for significance by an asymptotic χ² test and Fisher's exact test. Trends for increasing susceptibility to PsA from combined genotypes (HLA-KIR and HLA) were evaluated by the Cochran-Armitage trend test. Multigene logistic regression analysis was conducted to identify independent associations and interactions. RESULTS: In univariate analyses, KIR2DL2 and KIR2DS2 polymorphisms were significantly associated with PsA. Only KIR2DS2 was associated with PsA compared with healthy controls in multivariate analysis [odds ratio (OR) 1.25, 95% CI 1.01, 1.54, P = 0.044]. The presence of HLA-C group 2 alleles was associated with a higher risk of PsA (trend test P = 0.006). The risk of PsA is higher when KIR2DS2 is present with the HLA-C ligands (C group 1) for the corresponding inhibitory KIRs, and is highest when KIR2DS2 is present in the absence of HLA-C ligands for homologous inhibitory KIRs, compared with the state when KIR2DS2 is absent (trend test P = 0.027). The presence of HLA-C alleles that have high cell surface expression was also associated with a higher risk of PsA (trend test P < 0.001). HLA-B Bw4 and HLA-B Bw4 80ile allele groups were associated with a higher PsA risk (trend test P < 0.0001 for both analyses). CONCLUSION: This study confirms the association of the KIR2DS gene, especially KIR2DS2, with PsA.
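The Cochran-Armitage trend test named in the methods can be sketched as below. This is the standard asymptotic form of the test with user-supplied ordinal scores; the scoring of the combined genotype categories is not given in the abstract, so the scores 0, 1, 2 and the counts used here are invented for illustration only.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage(cases: Sequence[int], controls: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Cochran-Armitage test for a linear trend in case proportion across ordered groups.

    Returns the Z statistic and a two-sided normal P value.
    Assumes at least two distinct scores and both cases and controls present.
    """
    totals = [a + b for a, b in zip(cases, controls)]
    n = sum(totals)
    p_case = sum(cases) / n  # overall case proportion under the null
    t_stat = sum(s * (r - m * p_case) for s, r, m in zip(scores, cases, totals))
    sum_s2 = sum(m * s * s for s, m in zip(scores, totals))
    sum_s = sum(m * s for s, m in zip(scores, totals))
    variance = p_case * (1.0 - p_case) * (sum_s2 - sum_s ** 2 / n)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts in three ordered genotype categories (not study data).
z, p = cochran_armitage(cases=[10, 20, 30], controls=[30, 20, 10], scores=[0, 1, 2])
```

A positive Z indicates that the proportion of cases rises with the score, which corresponds to the "trend for increasing susceptibility" wording in the abstract.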
Subject(s)
Arthritis, Psoriatic/genetics , Receptors, KIR2DL2/genetics , Receptors, KIR/genetics , Adult , Alleles , Case-Control Studies , Female , Genetic Predisposition to Disease/genetics , Genotype , HLA-C Antigens/genetics , Humans , Male , Middle Aged , Multivariate Analysis , Polymorphism, Genetic , Regression Analysis
ABSTRACT
By systematic examination of common tag single-nucleotide polymorphisms (SNPs) across the genome, the genome-wide association study (GWAS) has proven to be a successful approach to identify genetic variants that are associated with complex diseases and traits. Although the per-base-pair cost of sequencing has dropped dramatically with the advent of next-generation technologies, it may still be feasible to obtain DNA sequence data for only a portion of available study subjects due to financial constraints. Two-phase sampling designs have been used frequently in large-scale surveys and epidemiological studies where certain variables are too costly to be measured on all subjects. We consider two-phase stratified sampling designs for genetic association, in which tag SNPs for candidate genes or regions are genotyped on all subjects in phase 1, and a proportion of subjects are selected into phase 2 based on genotypes at one or more tag SNPs. Deep sequencing in the region is then applied to genotype phase 2 subjects at sequence SNPs. We investigate alternative sampling designs for selection of phase 2 subjects within strata defined by tag SNP genotypes and develop methods of inference for sequence SNP variant associations using data from both phases. In comparison to methods that use data from phase 2 alone, the combined analysis improves efficiency.
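The phase 2 selection step described above can be sketched as a stratified random draw within tag SNP genotype strata. The abstract does not specify the sampling fractions or the inference method, so the per-stratum sample sizes, the function name, and the inverse-probability weights returned here are assumptions for illustration, not the authors' design.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def select_phase2(genotypes: Dict[str, int], n_per_stratum: Dict[int, int],
                  seed: int = 0) -> Tuple[List[str], Dict[int, float]]:
    """Draw a phase 2 subsample within strata defined by a tag SNP genotype (0, 1 or 2).

    Returns the selected subject IDs and, for each sampled stratum, the inverse
    sampling fraction (stratum size divided by number selected).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject_id, genotype in genotypes.items():
        strata[genotype].append(subject_id)

    selected: List[str] = []
    weights: Dict[int, float] = {}
    for genotype, members in sorted(strata.items()):
        k = min(n_per_stratum.get(genotype, 0), len(members))  # cannot exceed stratum size
        selected.extend(rng.sample(members, k))
        if k > 0:
            weights[genotype] = len(members) / k
    return selected, weights


# Hypothetical phase 1 cohort of 100 subjects with a rare minor-allele homozygote stratum.
phase1 = {f"s{i}": (0 if i < 70 else 1 if i < 95 else 2) for i in range(100)}
phase2_ids, phase2_weights = select_phase2(phase1, {0: 10, 1: 10, 2: 10})
```

Requesting equal numbers per stratum oversamples the rarer genotypes relative to their phase 1 frequency, which is one of several designs such a study might compare.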