ABSTRACT
Importance: Pathogenic variants (PVs) in ATM, BRCA1, BRCA2, CHEK2, and PALB2 are associated with increased breast cancer risk. However, it is unknown whether breast cancer risk differs by PV type or location in carriers ascertained from the general population. Objective: To evaluate breast cancer risks associated with PV type and location in ATM, BRCA1, BRCA2, CHEK2, and PALB2. Design: Age-adjusted case-control association analysis for all participants, subsets of PV carriers, and women with no breast cancer family history in population-based and clinical testing cohorts. Setting: Twelve US population-based studies within the Cancer Risk Estimates Related to Susceptibility (CARRIERS) Consortium, and breast cancer cases from the UK Biobank (UKBB) and an Ambry Genetics clinical testing cohort. Participants: 32,247 women with and 32,544 age-matched women without a breast cancer diagnosis from CARRIERS; 237 and 1,351 women with BRCA2 PVs and breast cancer from the UKBB and Ambry Genetics, respectively. Exposures: PVs in ATM, BRCA1, BRCA2, CHEK2, and PALB2. Main Outcomes and Measures: PVs were grouped by type and location within genes and assessed for risks of breast cancer (odds ratios [ORs], 95% confidence intervals [CIs], and p-values) using logistic regression. Mean ages at diagnosis were compared using linear regression. Results: Compared to women carrying BRCA2 exon 11 protein truncating variants (PTVs) in the CARRIERS population-based study, women with BRCA2 ex13-27 PTVs (OR=2.7, 95%CI 1.1-7.9) and ex1-10 PTVs (OR=1.6, 95%CI 0.8-3.5) had higher breast cancer risks, lower rates of estrogen receptor (ER)-negative breast cancer (ex13-27 OR=0.5, 95%CI 0.2-0.9; ex1-10 OR=0.5, 95%CI 0.1-1.0), and earlier age of breast cancer diagnosis (ex13-27 5.5 years, p<0.001; ex1-10 2.4 years, p=0.17). These associations with ER-negative breast cancer and age were replicated in a high-risk clinical cohort and the population-based UK Biobank cohort.
No differences in risk or age at diagnosis by gene region were observed for PTVs in other predisposition genes. Conclusions and Relevance: Population-based and clinical high-risk cohorts establish that PTVs in exon 11 of BRCA2 are associated with reduced risk of breast cancer, later age at diagnosis, and greater risk of ER-negative disease. These differential risks may improve individualized risk prediction and clinical management for women carrying BRCA2 PTVs. Key Points: Question: Does ATM, BRCA1, BRCA2, CHEK2, and PALB2 pathogenic variant type and location influence breast cancer risk in population-based studies? Findings: Breast cancer risk and estrogen receptor status differ based on the type and location of pathogenic variants in BRCA2. Women carrying protein truncating variants in exon 11 have a lower breast cancer risk in the population-based cohorts, older age at diagnosis and higher rates of estrogen receptor negative breast cancer than women with exon 1-10 or exon 13-27 truncation variants in population-based and clinical testing cohorts. Meaning: Incorporating pathogenic variant type and location in cancer risk models may improve individualized risk prediction.
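As a rough illustration of the odds ratios and confidence intervals reported above, the sketch below computes an unadjusted odds ratio with a Wald (Woolf) 95% confidence interval from a 2x2 table. It is a simplification: the study used age-adjusted logistic regression, and the function name and counts here are hypothetical, not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: cases in the comparison group    b: controls in the comparison group
    c: cases in the reference group     d: controls in the reference group
    All four counts must be positive.
    """
    or_est = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_est) - z * se_log_or)
    upper = math.exp(math.log(or_est) + z * se_log_or)
    return or_est, lower, upper

# Hypothetical counts, for illustration only.
or_est, lower, upper = odds_ratio_ci(20, 10, 40, 40)
print(f"OR={or_est:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that spans 1, as in this hypothetical table, would not by itself indicate a difference in risk between the two groups.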
ABSTRACT
Clinical genetic testing identifies variants causal for hereditary cancer, information that is used for risk assessment and clinical management. Unfortunately, some variants identified are of uncertain clinical significance (VUS), complicating patient management. Case-control data is one evidence type used to classify VUS, and previous findings indicate that case-control likelihood ratios (LRs) outperform odds ratios for variant classification. As an initiative of the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) Analytical Working Group, we analyzed germline sequencing data of BRCA1 and BRCA2 from 96,691 female breast cancer cases and 303,925 unaffected controls from three studies: the BRIDGES study of the Breast Cancer Association Consortium, the Cancer Risk Estimates Related to Susceptibility consortium, and the UK Biobank. We observed 11,227 BRCA1 and BRCA2 variants, with 6,921 being coding, covering 23.4% of BRCA1 and BRCA2 VUS in ClinVar and 19.2% of ClinVar curated (likely) benign or pathogenic variants. Case-control LR evidence was highly consistent with ClinVar assertions for (likely) benign or pathogenic variants, exhibiting 99.1% sensitivity and 95.4% specificity for BRCA1 and 92.2% sensitivity and 86.6% specificity for BRCA2. This approach provides case-control evidence for 785 unclassified variants that can serve as a valuable element for clinical classification.
ABSTRACT
Genetic studies have identified numerous regions associated with plasma fibrinogen levels in Europeans, yet missing heritability and limited inclusion of non-Europeans necessitate further studies with improved power and sensitivity. Compared with array-based genotyping, whole genome sequencing (WGS) data provides better coverage of the genome and better representation of non-European variants. To better understand the genetic landscape regulating plasma fibrinogen levels, we meta-analyzed WGS data from the NHLBI's Trans-Omics for Precision Medicine (TOPMed) program (n=32,572), with array-based genotype data from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium (n=131,340) imputed to the TOPMed or Haplotype Reference Consortium panel. We identified 18 loci that have not been identified in prior genetic studies of fibrinogen. Of these, four are driven by common variants of small effect with reported minor allele frequency (MAF) at least 10 percentage points higher in African populations. Three signals (SERPINA1, ZFP36L2, and TLR10) contain predicted deleterious missense variants. Two loci, SOCS3 and HPN, each harbor two conditionally distinct, non-coding variants. The gene region encoding the fibrinogen protein chain subunits (FGG;FGB;FGA) contains 7 distinct signals, including one novel signal driven by rs28577061, a variant common in African ancestry populations but extremely rare in Europeans (MAF_AFR=0.180; MAF_EUR=0.008). Through phenome-wide association studies in the VA Million Veteran Program, we found associations between fibrinogen polygenic risk scores and thrombotic and inflammatory disease phenotypes, including an association with gout. Our findings demonstrate the utility of WGS to augment genetic discovery in diverse populations and offer new insights for putative mechanisms of fibrinogen regulation.
ABSTRACT
DNA methylation (DNAm)-based deconvolution estimates contain relative data, forming a composition, that standard methods (testing directly on cell proportions) are ill-suited to handle. In this study we examined the performance of an alternative method, analysis of compositions of microbiomes (ANCOM), for the analysis of DNAm-based deconvolution estimates. We performed two different simulation studies comparing ANCOM to a standard approach (two-sample t-test performed directly on cell proportions) and analyzed real-world data from the Women's Health Initiative to evaluate the applicability of ANCOM to DNAm-based deconvolution estimates. Our findings indicate that ANCOM can effectively account for the compositional nature of DNAm-based deconvolution estimates. ANCOM adequately controls the false discovery rate while maintaining statistical power comparable to that of standard methods.
DNA methylation (DNAm)-based deconvolution provides highly accurate estimates of the proportion of each cell type in a mixed-cell type biological sample (e.g., whole blood). These estimates can be used for examining the association between cell type proportions and biological or clinical end points; for example, comparing the estimated neutrophil proportion in whole blood between smokers and non-smokers. Cell proportion data have unique features that present challenges for traditional and widely used statistical methods. In response to this issue, our work presents two simulation studies and a real-world analysis that benchmark the performance of current standard statistical methods against an alternative method called analysis of compositions of microbiomes (ANCOM), which was originally developed for the analysis of microbiome data. In our real-world analysis we used DNAm data collected from Women's Health Initiative Long Life Study I and compared the results of each method against a gold standard that is typically not available for these analyses. In each of our simulation studies, ANCOM was able to detect true differences in cell proportions between the groups being compared but had a much lower rate of false discovery compared with the standard statistical methods. Our real-world analysis demonstrated similar findings. Overall, our study highlights the potential of ANCOM as a powerful and robust method for analyzing DNAm-derived deconvolution estimates when the interest is in comparing cell type proportions across biological or clinical end points. ANCOM's ability to minimize false discovery while maintaining robust statistical power positions it as a valuable addition to the epigenomic analysis toolkit.
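The compositional problem described above can be made concrete with a small sketch. It assumes hypothetical absolute cell counts in which only the first cell type changes between two groups, and it shows why log-ratios between cell types, the quantity that ANCOM-style methods build on, are not distorted by the sum-to-one constraint. It is not an implementation of ANCOM itself.

```python
import math

def closure(counts):
    """Convert absolute counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical absolute counts for three cell types.
# Only the first cell type differs between the groups.
group_a = [50.0, 30.0, 20.0]
group_b = [100.0, 30.0, 20.0]

prop_a = closure(group_a)
prop_b = closure(group_b)

# The proportions of the unchanged cell types shift anyway,
# which is what can mislead a test applied directly to proportions.
shift_type2 = prop_b[1] - prop_a[1]

# The log-ratio between the two unchanged cell types is identical
# in both groups, so it is not affected by the change in type 1.
log_ratio_a = math.log(prop_a[1] / prop_a[2])
log_ratio_b = math.log(prop_b[1] / prop_b[2])

print(shift_type2, log_ratio_a, log_ratio_b)
```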
Subjects
DNA Methylation; Humans; Female; Microbiota/genetics; Computer Simulation
ABSTRACT
Polygenic risk score (PRS) prediction of complex diseases can be improved by leveraging related phenotypes. This has motivated the development of several multi-trait PRS methods that jointly model information from genetically correlated traits. However, these methods do not account for vertical pleiotropy between traits, in which one trait acts as a mediator for another. Here, we introduce endoPRS, a weighted lasso model that incorporates information from relevant endophenotypes to improve disease risk prediction without making assumptions about the genetic architecture underlying the endophenotype-disease relationship. Through extensive simulation analysis, we demonstrate the robustness of endoPRS in a variety of complex genetic frameworks. We also apply endoPRS to predict the risk of childhood onset asthma in UK Biobank by leveraging a paired GWAS of eosinophil count, a relevant endophenotype. We find that endoPRS significantly improves prediction compared to many existing PRS methods, including multi-trait PRS methods, MTAG and wMT-BLUP, which suggests advantages of endoPRS in real-life clinical settings.
ABSTRACT
Clonal hematopoiesis of indeterminate potential (CHIP), whereby somatic mutations in hematopoietic stem cells confer a selective advantage and drive clonal expansion, not only correlates with age but also confers increased risk of morbidity and mortality. Here, we leverage genetically predicted traits to identify factors that determine CHIP clonal expansion rate. We used the passenger-approximated clonal expansion rate method to quantify the clonal expansion rate for 4,370 individuals in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) cohort and calculated polygenic risk scores for DNA methylation aging, inflammation-related measures and circulating protein levels. Clonal expansion rate was significantly associated with both genetically predicted and measured epigenetic clocks. No associations were identified between inflammation-related lab values or diseases and CHIP expansion rate overall. A proteome-wide search identified predicted circulating levels of myeloid zinc finger 1 and anti-Müllerian hormone as associated with an increased CHIP clonal expansion rate and tissue inhibitor of metalloproteinase 1 and glycine N-methyltransferase as associated with decreased CHIP clonal expansion rate. Together, our findings identify epigenetic and proteomic patterns associated with the rate of hematopoietic clonal expansion.
Subjects
Clonal Hematopoiesis; Epigenesis, Genetic; Proteomics; Clonal Hematopoiesis/genetics; Humans; DNA Methylation; Female; Male; Hematopoietic Stem Cells/metabolism; Middle Aged; Proteome/metabolism; Proteome/genetics; Tissue Inhibitor of Metalloproteinase-1/genetics; Aged
ABSTRACT
Clonal hematopoiesis (CH) is characterized by the acquisition of a somatic mutation in a hematopoietic stem cell that results in a clonal expansion. These driver mutations can be single nucleotide variants in cancer driver genes or larger structural rearrangements called mosaic chromosomal alterations (mCAs). The factors that influence the variations in mCA fitness and ultimately result in different clonal expansion rates are not well understood. We used the Passenger-Approximated Clonal Expansion Rate (PACER) method to estimate clonal expansion rate as PACER scores for 6,381 individuals in the NHLBI TOPMed cohort with gain, loss, and copy-neutral loss of heterozygosity mCAs. Our mCA fitness estimates, derived by aggregating per-individual PACER scores, were correlated (R2 = 0.49) with an alternative approach that estimated fitness of mCAs in the UK Biobank using population-level distributions of clonal fraction. Among individuals with JAK2 V617F clonal hematopoiesis of indeterminate potential or mCAs affecting the JAK2 gene on chromosome 9, PACER score was strongly correlated with erythrocyte count. In a cross-sectional analysis, genome-wide association study of estimates of mCA expansion rate identified a TCL1A locus variant associated with mCA clonal expansion rate, with suggestive variants in NRIP1 and TERT.
Subjects
Chromosome Aberrations; Clonal Hematopoiesis; Mosaicism; Humans; Clonal Hematopoiesis/genetics; Male; Female; Genome-Wide Association Study; Janus Kinase 2/genetics; Telomerase/genetics; Telomerase/metabolism; Loss of Heterozygosity; Cross-Sectional Studies; Mutation; Middle Aged; Hematopoietic Stem Cells/metabolism; Polymorphism, Single Nucleotide; Aged
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight, which can be provided by a multi-ancestry, whole-genome based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits - E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin - that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Subjects
Biomarkers; Genome-Wide Association Study; Inflammation; Precision Medicine; Whole Genome Sequencing; Humans; Precision Medicine/methods; Inflammation/genetics; Genome-Wide Association Study/methods; Whole Genome Sequencing/methods; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Genetic Predisposition to Disease; Female; Interleukin-6/genetics
ABSTRACT
Genome-wide association studies (GWAS) have become well-powered to detect loci associated with telomere length. However, no prior work has validated genes nominated by GWAS to examine their role in telomere length regulation. We conducted a multi-ancestry meta-analysis of 211,369 individuals and identified five novel association signals. Enrichment analyses of chromatin state and cell-type heritability suggested that blood/immune cells are the most relevant cell type to examine telomere length association signals. We validated specific GWAS associations by overexpressing KBTBD6 or POP5 and demonstrated that both lengthened telomeres. CRISPR/Cas9 deletion of the predicted causal regions in K562 blood cells reduced expression of these genes, demonstrating that these loci are related to transcriptional regulation of KBTBD6 and POP5. Our results demonstrate the utility of telomere length GWAS in the identification of telomere length regulation mechanisms and validate KBTBD6 and POP5 as genes affecting telomere length regulation.
Subjects
Genome-Wide Association Study; Telomere Homeostasis; Telomere; Humans; Telomere/genetics; Telomere/metabolism; K562 Cells; Telomere Homeostasis/genetics; Polymorphism, Single Nucleotide; Gene Expression Regulation; CRISPR-Cas Systems
ABSTRACT
Polygenic risk scores (PRS) have shown successes in clinics, but most PRS methods focus only on participants with distinct primary continental ancestry without accommodating recently-admixed individuals with mosaic continental ancestry backgrounds for different segments of their genomes. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulation and real data analyses for traits with associated variants exhibiting ancestral-differential effects. Leveraging data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by > 64% compared to alternative methods, and even outperforms PRS-CSx with large European GWAS for some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
Subjects
Black or African American; Genetic Risk Score; Software; Humans; Black or African American/genetics; Computer Simulation; Genetic Predisposition to Disease; Genome, Human; Genome-Wide Association Study/methods; Phenotype; Risk Factors
ABSTRACT
Overall survival probability for MDS patients who underwent allo-HCT and were matched to donors that are wild-type (red) and heterozygous (blue) for the rs111224634 SNP.
Subjects
Graft vs Host Disease; Hematopoietic Stem Cell Transplantation; Myelodysplastic Syndromes; Humans; Myelodysplastic Syndromes/genetics; Myelodysplastic Syndromes/therapy; Tissue Donors; Transplantation Conditioning; Retrospective Studies
ABSTRACT
Clonal hematopoiesis (CH) is characterized by the acquisition of a somatic mutation in a hematopoietic stem cell that results in a clonal expansion. These driver mutations can be single nucleotide variants in cancer driver genes or larger structural rearrangements called mosaic chromosomal alterations (mCAs). The factors that influence the variations in mCA fitness and ultimately result in different clonal expansion rates are not well understood. We used the Passenger-Approximated Clonal Expansion Rate (PACER) method to estimate clonal expansion rate for 6,381 individuals in the NHLBI TOPMed cohort with gain, loss, and copy-neutral loss of heterozygosity mCAs. Our estimates of mCA fitness were correlated (R2 = 0.49) with an alternative approach that estimated fitness of mCAs in the UK Biobank using a theoretical probability distribution. Individuals with lymphoid-associated mCAs had a significantly higher white blood cell count and faster clonal expansion rate. In a cross-sectional analysis, genome-wide association study of estimates of mCA expansion rate identified TCL1A, NRIP1, and TERT locus variants as modulators of mCA clonal expansion rate.
ABSTRACT
Mitochondria carry their own circular genome and disruption of the mitochondrial genome is associated with various aging-related diseases. Unlike the nuclear genome, mitochondrial DNA (mtDNA) can be present at 1000s to 10,000s of copies in somatic cells and variants may exist in a state of heteroplasmy, where only a fraction of the DNA molecules harbors a particular variant. We quantify mtDNA heteroplasmy in 194,871 participants in the UK Biobank and find that heteroplasmy is associated with a 1.5-fold increased risk of all-cause mortality. Additionally, we functionally characterize mtDNA single nucleotide variants (SNVs) using a constraint-based score, mitochondrial local constraint score sum (MSS), and find it associated with all-cause mortality, and with the prevalence and incidence of cancer and cancer-related mortality, particularly leukemia. These results indicate that mitochondria may have a functional role in certain cancers, and mitochondrial heteroplasmic SNVs may serve as a prognostic marker for cancer, especially for leukemia.
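To make the notion of heteroplasmy concrete, the sketch below computes the fraction of sequencing reads carrying a variant at one mtDNA position and flags it as heteroplasmic when that fraction falls between two cutoffs. The function names, the read counts, and the 5% and 95% cutoffs are assumptions for illustration; the abstract does not state the thresholds the study used.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads

def is_heteroplasmic(vaf, lower=0.05, upper=0.95):
    """True when the variant is present in some but not all mtDNA copies.

    The default cutoffs are hypothetical, chosen only for this example.
    """
    return lower <= vaf <= upper

# Hypothetical read counts at two positions.
vaf_mixed = variant_allele_fraction(150, 1000)   # variant in a subset of copies
vaf_fixed = variant_allele_fraction(999, 1000)   # variant in nearly all copies
print(vaf_mixed, is_heteroplasmic(vaf_mixed))
print(vaf_fixed, is_heteroplasmic(vaf_fixed))
```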
Subjects
Leukemia; Mitochondria; Humans; Mitochondria/genetics; DNA, Mitochondrial/genetics; Heteroplasmy; Leukemia/genetics; Mutation
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight, which can be provided by a multi-ancestry, whole-genome based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program. We identified 22 distinct single-variant associations across 6 traits - E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin - that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
ABSTRACT
The effects of assortative mating (AM) on estimates from genetic studies have been receiving increasing attention in recent years. We extend existing AM theory to more general models of sorting and conclude that correct theory-based AM adjustments require knowledge of complicated, unknown historical sorting patterns. We propose a simple, general-purpose approach using polygenic indexes (PGIs). Our approach can estimate the fraction of genetic variance and genetic correlation that is driven by AM. Our approach is less effective when applied to Mendelian randomization (MR) studies for two reasons: AM can induce a form of selection bias in MR studies that remains after our adjustment; and, in the MR context, the adjustment is particularly sensitive to PGI estimation error. Using data from the UK Biobank, we find that AM inflates genetic correlation estimates between health traits and education by 14% on average. Our results suggest caution in interpreting genetic correlations or MR estimates for traits subject to AM.
ABSTRACT
BACKGROUND: Sickle cell trait affects approximately 8% of Black individuals in the United States, along with many other individuals with ancestry from malaria-endemic regions worldwide. While traditionally considered a benign condition, recent evidence suggests that sickle cell trait is associated with lower eGFR and higher risk of kidney diseases, including kidney failure. The mechanisms underlying these associations remain poorly understood. We used proteomic profiling to gain insight into the pathobiology of sickle cell trait. METHODS: We measured proteomics (N=1285 proteins assayed by Olink Explore) using baseline plasma samples from 592 Black participants with sickle cell trait and 1:1 age-matched Black participants without sickle cell trait from the prospective Women's Health Initiative cohort. Age-adjusted linear regression was used to assess the association between protein levels and sickle cell trait. RESULTS: In age-adjusted models, 35 proteins were significantly associated with sickle cell trait after correction for multiple testing. Several of the sickle cell trait-protein associations were replicated in Black participants from two independent cohorts (Atherosclerosis Risk in Communities study and Jackson Heart Study) assayed using an orthogonal aptamer-based proteomic platform (SomaScan). Many of the validated sickle cell trait-associated proteins are known biomarkers of kidney function or injury (e.g., hepatitis A virus cellular receptor 1 [HAVCR1]/kidney injury molecule-1 [KIM-1], uromodulin [UMOD], ephrins), related to red cell physiology or hemolysis (erythropoietin [EPO], heme oxygenase 1 [HMOX1], and α-hemoglobin stabilizing protein) and/or inflammation (fractalkine, C-C motif chemokine ligand 2/monocyte chemoattractant protein-1 [MCP-1], and urokinase plasminogen activator surface receptor [PLAUR]).
A protein risk score constructed from the top sickle cell trait-associated biomarkers was associated with incident kidney failure among those with sickle cell trait during Women's Health Initiative follow-up (odds ratio, 1.32; 95% confidence interval, 1.10 to 1.58). CONCLUSIONS: We identified and replicated the association of sickle cell trait with a number of plasma proteins related to hemolysis, kidney injury, and inflammation.
Subjects
Renal Insufficiency; Sickle Cell Trait; Humans; Female; United States; Proteome; Prospective Studies; Hemolysis; Proteomics; Biomarkers; Inflammation
ABSTRACT
BACKGROUND: Genome-wide studies of gene-environment interactions (G×E) may identify variants associated with disease risk in conjunction with lifestyle/environmental exposures. We conducted a genome-wide G×E analysis of ~7.6 million common variants and seven lifestyle/environmental risk factors for breast cancer risk overall and for estrogen receptor positive (ER+) breast cancer. METHODS: Analyses were conducted using 72,285 breast cancer cases and 80,354 controls of European ancestry from the Breast Cancer Association Consortium. Gene-environment interactions were evaluated using standard unconditional logistic regression models and likelihood ratio tests for breast cancer risk overall and for ER+ breast cancer. Bayesian False Discovery Probability was employed to assess the noteworthiness of each SNP-risk factor pair. RESULTS: Assuming a 1 × 10^-5 prior probability of a true association for each SNP-risk factor pair and a Bayesian False Discovery Probability < 15%, we identified two independent SNP-risk factor pairs: rs80018847(9p13)-LINGO2 and adult height in association with overall breast cancer risk (OR_int = 0.94, 95% CI 0.92-0.96), and rs4770552(13q12)-SPATA13 and age at menarche for ER+ breast cancer risk (OR_int = 0.91, 95% CI 0.88-0.94). CONCLUSIONS: Overall, the contribution of G×E interactions to the heritability of breast cancer is very small. At the population level, multiplicative G×E interactions do not make an important contribution to risk prediction in breast cancer.
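The Bayesian False Discovery Probability mentioned above can be sketched as follows. This assumes Wakefield's approximate Bayes factor formulation, in which the log odds ratio has a normal prior centered at zero under the alternative. The prior variance, the effect estimates, and the function names are hypothetical choices for illustration and are not taken from the study; only the 1 × 10^-5 prior probability matches the value quoted in the abstract.

```python
import math

def approximate_bayes_factor(beta_hat, se, prior_var):
    """Approximate Bayes factor for the null versus the alternative.

    beta_hat: estimated log odds ratio; se: its standard error;
    prior_var: variance W of the N(0, W) prior on the log odds ratio.
    Values below 1 favor the alternative (a true association).
    """
    v = se ** 2
    z_squared = (beta_hat / se) ** 2
    shrinkage = prior_var / (v + prior_var)
    return math.sqrt((v + prior_var) / v) * math.exp(-0.5 * z_squared * shrinkage)

def bfdp(beta_hat, se, prior_var, prior_prob_assoc):
    """Posterior probability of the null given the data and the prior."""
    abf = approximate_bayes_factor(beta_hat, se, prior_var)
    prior_odds_null = (1.0 - prior_prob_assoc) / prior_prob_assoc
    weighted = abf * prior_odds_null
    return weighted / (1.0 + weighted)

# Hypothetical prior: 95% prior belief that the odds ratio lies below 1.5.
prior_var = (math.log(1.5) / 1.96) ** 2

# Hypothetical estimates with the same standard error but different strength.
strong = bfdp(beta_hat=-0.06, se=0.01, prior_var=prior_var, prior_prob_assoc=1e-5)
weak = bfdp(beta_hat=-0.03, se=0.01, prior_var=prior_var, prior_prob_assoc=1e-5)
null_like = bfdp(beta_hat=0.0, se=0.01, prior_var=prior_var, prior_prob_assoc=1e-5)
print(strong, weak, null_like)
```

With a prior probability this small, only estimates many standard errors from zero reach a low BFDP, which is consistent with so few pairs being declared noteworthy.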
Subjects
Breast Neoplasms; Gene-Environment Interaction; Adult; Female; Humans; Genetic Predisposition to Disease; Breast Neoplasms/etiology; Breast Neoplasms/genetics; Bayes Theorem; Genome-Wide Association Study; Risk Factors; Polymorphism, Single Nucleotide; Case-Control Studies
ABSTRACT
In this study, the asymptotic distributions of the likelihood ratio test (LRT), the restricted likelihood ratio test (RLRT), the F, and the sequence kernel association test (SKAT) statistics for testing an additive effect of the expected familial relatedness (FR) in a linear mixed model (LMM) are examined based on an eigenvalue approach. First, the covariance structure for modeling the FR effect in an LMM is presented. Then, the multiplicity of eigenvalues for the log-likelihood and restricted log-likelihood is established under a replicate family setting and extended to a more general replicate family setting (GRFS) as well. After that, the asymptotic null distributions of LRT, RLRT, F and SKAT statistics under GRFS are derived. The asymptotic null distribution of SKAT for testing genetic rare variants is also constructed. In addition, a simple formula for sample size calculation is provided based on the restricted maximum likelihood estimate of the effect size for the expected FR. Finally, a power comparison of these test statistics for hypothesis testing of the expected FR effect is made via simulation. The four test statistics are also applied to a data set from the UK Biobank.
Subjects
Models, Genetic; Humans; Likelihood Functions; Computer Simulation; Linear Models
ABSTRACT
BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach which is implemented using the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method to an unconditional MI and two additional approaches that have been shown to demonstrate excellent performance: regression with dosages and a mixture of regression models (MRM). RESULTS: Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power, including for low frequency variants, compared to unconditional MI while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than using Dosage. CONCLUSIONS: The unconditional MI approach for association testing is overly conservative and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF [Formula: see text] 0.001 and Rsq [Formula: see text] 0.3.
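For context, the Dosage approach replaces each uncertain genotype with its expected alternate-allele count and then runs an ordinary regression on that value. The sketch below assumes each imputed genotype is supplied as posterior probabilities for 0, 1, or 2 copies of the alternate allele. The function names and data are hypothetical, and the plain least-squares slope stands in for whatever regression model a real analysis would fit.

```python
import math

def genotype_dosage(probs):
    """Expected alternate-allele count from (P(0 copies), P(1 copy), P(2 copies))."""
    p0, p1, p2 = probs
    if not math.isclose(p0 + p1 + p2, 1.0, abs_tol=1e-6):
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2

def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical imputed genotype probabilities for four individuals.
imputed = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8), (0.7, 0.3, 0.0)]
dosages = [genotype_dosage(p) for p in imputed]

# Hypothetical quantitative phenotype for the same individuals.
phenotype = [1.0, 2.1, 2.9, 1.2]
effect_estimate = ols_slope(dosages, phenotype)
print(dosages, effect_estimate)
```

Because the dosage is a single number per individual, this approach costs no more than testing a directly genotyped variant, which fits the speed advantage noted in the abstract.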
Subjects
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genome-Wide Association Study/methods; Genotype; Gene Frequency; Models, Statistical
ABSTRACT
Ever larger structural variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50 bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SVs on disease development and progression.