ABSTRACT
Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.
Subject(s)
Black People/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , Genomics , Female , Gene Frequency/genetics , Genome-Wide Association Study , Humans , Male , Polymorphism, Single Nucleotide/genetics , Uganda/epidemiology , Whole Genome Sequencing
ABSTRACT
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)1. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Genomics , National Heart, Lung, and Blood Institute (U.S.) , Precision Medicine , Cytochrome P-450 CYP2D6/genetics , Haplotypes/genetics , Heterozygote , Humans , INDEL Mutation , Loss of Function Mutation , Mutagenesis , Phenotype , Polymorphism, Single Nucleotide , Population Density , Precision Medicine/standards , Quality Control , Sample Size , United States , Whole Genome Sequencing/standards
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome-based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Subject(s)
Biomarkers , Genome-Wide Association Study , Inflammation , Precision Medicine , Whole Genome Sequencing , Humans , Precision Medicine/methods , Inflammation/genetics , Genome-Wide Association Study/methods , Whole Genome Sequencing/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Genetic Predisposition to Disease , Female , Interleukin-6/genetics
ABSTRACT
Genetic studies have identified numerous regions associated with plasma fibrinogen levels in Europeans, yet missing heritability and limited inclusion of non-Europeans necessitate further studies with improved power and sensitivity. Compared with array-based genotyping, whole genome sequencing (WGS) data provides better coverage of the genome and better representation of non-European variants. To better understand the genetic landscape regulating plasma fibrinogen levels, we meta-analyzed WGS data from the NHLBI's Trans-Omics for Precision Medicine (TOPMed) program (n=32,572), with array-based genotype data from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium (n=131,340) imputed to the TOPMed or Haplotype Reference Consortium panel. We identified 18 loci that have not been identified in prior genetic studies of fibrinogen. Of these, four are driven by common variants of small effect with reported MAF at least 10 percentage points higher in African populations. Three signals (SERPINA1, ZFP36L2, and TLR10) contain predicted deleterious missense variants. Two loci, SOCS3 and HPN, each harbor two conditionally distinct, non-coding variants. The gene region encoding the fibrinogen protein chain subunits (FGG, FGB, FGA) contains 7 distinct signals, including one novel signal driven by rs28577061, a variant common in African ancestry populations but extremely rare in Europeans (MAF_AFR = 0.180; MAF_EUR = 0.008). Through phenome-wide association studies in the VA Million Veteran Program, we found associations between fibrinogen polygenic risk scores and thrombotic and inflammatory disease phenotypes, including an association with gout. Our findings demonstrate the utility of WGS to augment genetic discovery in diverse populations and offer new insights for putative mechanisms of fibrinogen regulation.
ABSTRACT
Current publicly available tools that allow rapid exploration of linkage disequilibrium (LD) between markers (e.g., HaploReg and LDlink) are based on whole-genome sequence (WGS) data from 2,504 individuals in the 1000 Genomes Project. Here, we present TOP-LD, an online tool to explore LD inferred with high-coverage (~30×) WGS data from 15,578 individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. TOP-LD provides a significant upgrade compared to current LD tools, as the TOPMed WGS data provide a more comprehensive representation of genetic variation than the 1000 Genomes data, particularly for rare variants and in the specific populations that we analyzed. For example, TOP-LD encompasses LD information for 150.3, 62.2, and 36.7 million variants for European, African, and East Asian ancestral samples, respectively, offering 2.6- to 9.1-fold increase in variant coverage compared to HaploReg 4.0 or LDlink. In addition, TOP-LD includes tens of thousands of structural variants (SVs). We demonstrate the value of TOP-LD in fine-mapping at the GGT1 locus associated with gamma glutamyltransferase in the African ancestry participants in UK Biobank. Beyond fine-mapping, TOP-LD can facilitate a wide range of applications that are based on summary statistics and estimates of LD. TOP-LD is freely available online.
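As a reader's aid, the pairwise LD statistic that tools of this kind report can be illustrated with the standard r² definition for two biallelic variants. The sketch below is a minimal illustration, assuming phased haplotypes coded 0/1; the function name `ld_r2` and the toy data are hypothetical and are not taken from TOP-LD.

```python
import math

def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic variants.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per
    phased haplotype (hypothetical toy input, not TOP-LD data).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal length")
    p_a = sum(hap_a) / n                               # allele frequency at variant A
    p_b = sum(hap_b) / n                               # allele frequency at variant B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                               # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                                # undefined if either variant is monomorphic
    return d * d / denom

r2_linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])    # identical patterns
r2_unlinked = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])  # independent patterns
```

With unphased genotypes, r² is usually estimated instead (for example from genotype correlations or an EM haplotype-frequency estimate), so this haplotype-based form is only the simplest case.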
Subject(s)
Genome-Wide Association Study , Precision Medicine , Asian People , Humans , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Whole Genome Sequencing
ABSTRACT
Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.
Subject(s)
Genome-Wide Association Study , Genome , Humans , Genome-Wide Association Study/methods , Whole Genome Sequencing/methods , Phenotype , Genetic Variation
ABSTRACT
Plasma levels of fibrinogen, coagulation factors VII and VIII and von Willebrand factor (vWF) are four intermediate phenotypes that are heritable and have been associated with the risk of clinical thrombotic events. To identify rare and low-frequency variants associated with these hemostatic factors, we conducted whole-exome sequencing in 10,860 individuals of European ancestry (EA) and 3,529 African Americans (AAs) from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium and the National Heart, Lung and Blood Institute's Exome Sequencing Project. Gene-based tests demonstrated significant associations with rare variation (minor allele frequency < 5%) in fibrinogen gamma chain (FGG) (with fibrinogen, P = 9.1 × 10⁻¹³), coagulation factor VII (F7) (with factor VII, P = 1.3 × 10⁻⁷²; seven novel variants) and VWF (with factor VIII and vWF; P = 3.2 × 10⁻¹⁴; one novel variant). These eight novel rare variant associations were independent of the known common variants at these loci and tended to have much larger effect sizes. In addition, one of the rare novel variants in F7 was significantly associated with an increased risk of venous thromboembolism in AAs (Ile200Ser; rs141219108; P = 4.2 × 10⁻⁵). After restricting gene-based analyses to only loss-of-function variants, a novel significant association was detected and replicated between factor VIII levels and a stop-gain mutation exclusive to AAs (rs3211938) in CD36 molecule (CD36). This variant has previously been linked to dyslipidemia but not with the levels of a hemostatic factor. These efforts represent the largest integration of whole-exome sequence data from two national projects to identify genetic variation associated with plasma hemostatic factors.
Subject(s)
Factor VIII , Hemostatics , Factor VII/genetics , Factor VIII/genetics , Fibrinogen/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Exome Sequencing , von Willebrand Factor/analysis , von Willebrand Factor/genetics
ABSTRACT
BACKGROUND: Understanding the impact of clonal hematopoiesis of indeterminate potential (CHIP) and mosaic chromosomal alterations (mCAs) on solid tumor risk and mortality can shed light on novel cancer pathways. METHODS: The authors analyzed whole genome sequencing data from the Trans-Omics for Precision Medicine Women's Health Initiative study (n = 10,866). They investigated the presence of CHIP and mCA and their association with the development and mortality of breast, lung, and colorectal cancers. RESULTS: CHIP was associated with higher risk of breast (hazard ratio [HR], 1.30; 95% confidence interval [CI], 1.03-1.64; p = .02) but not colorectal (p = .77) or lung cancer (p = .32). CHIP carriers who developed colorectal cancer also had a greater risk for advanced-stage (p = .01), but this was not seen in breast or lung cancer. CHIP was associated with increased colorectal cancer mortality both with (HR, 3.99; 95% CI, 2.41-6.62; p < .001) and without adjustment (HR, 2.50; 95% CI, 1.32-4.72; p = .004) for advanced-stage and a borderline higher breast cancer mortality (HR, 1.53; 95% CI, 0.98-2.41; p = .06). Conversely, mCA (cell fraction [CF] >3%) did not correlate with cancer risk. With higher CFs (mCA >5%), autosomal mCA was associated with increased breast cancer risk (HR, 1.39; 95% CI, 1.06-1.83; p = .01). There was no association of mCA (>3%) with breast, colorectal, or lung mortality except higher colon cancer mortality (HR, 2.19; 95% CI, 1.11-4.3; p = .02) with mCA >5%. CONCLUSIONS: CHIP and mCA (CF >5%) were associated with higher breast cancer risk and colorectal cancer mortality individually. These data could inform on novel pathways that impact cancer risk and lead to better risk stratification.
Subject(s)
Breast Neoplasms , Chromosome Aberrations , Clonal Hematopoiesis , Colorectal Neoplasms , Mosaicism , Humans , Female , Clonal Hematopoiesis/genetics , Aged , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Breast Neoplasms/pathology , Middle Aged , Colorectal Neoplasms/genetics , Colorectal Neoplasms/mortality , Colorectal Neoplasms/pathology , Incidence , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Lung Neoplasms/pathology , Male , Neoplasms/genetics , Neoplasms/mortality , Neoplasms/pathology , Neoplasms/epidemiology , Whole Genome Sequencing
ABSTRACT
BACKGROUND: Population-based estimates of the risk of breast cancer associated with germline pathogenic variants in cancer-predisposition genes are critically needed for risk assessment and management in women with inherited pathogenic variants. METHODS: In a population-based case-control study, we performed sequencing using a custom multigene amplicon-based panel to identify germline pathogenic variants in 28 cancer-predisposition genes among 32,247 women with breast cancer (case patients) and 32,544 unaffected women (controls) from population-based studies in the Cancer Risk Estimates Related to Susceptibility (CARRIERS) consortium. Associations between pathogenic variants in each gene and the risk of breast cancer were assessed. RESULTS: Pathogenic variants in 12 established breast cancer-predisposition genes were detected in 5.03% of case patients and in 1.63% of controls. Pathogenic variants in BRCA1 and BRCA2 were associated with a high risk of breast cancer, with odds ratios of 7.62 (95% confidence interval [CI], 5.33 to 11.27) and 5.23 (95% CI, 4.09 to 6.77), respectively. Pathogenic variants in PALB2 were associated with a moderate risk (odds ratio, 3.83; 95% CI, 2.68 to 5.63). Pathogenic variants in BARD1, RAD51C, and RAD51D were associated with increased risks of estrogen receptor-negative breast cancer and triple-negative breast cancer, whereas pathogenic variants in ATM, CDH1, and CHEK2 were associated with an increased risk of estrogen receptor-positive breast cancer. Pathogenic variants in 16 candidate breast cancer-predisposition genes, including the c.657_661del5 founder pathogenic variant in NBN, were not associated with an increased risk of breast cancer. CONCLUSIONS: This study provides estimates of the prevalence and risk of breast cancer associated with pathogenic variants in known breast cancer-predisposition genes in the U.S. population. 
These estimates can inform cancer testing and screening and improve clinical management strategies for women in the general population with inherited pathogenic variants in these genes. (Funded by the National Institutes of Health and the Breast Cancer Research Foundation.).
Subject(s)
Breast Neoplasms/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Adult , Aged , Aged, 80 and over , Case-Control Studies , Female , Humans , Middle Aged , Mutation , Odds Ratio , Risk , Sequence Analysis, DNA , Young Adult
ABSTRACT
Overall survival probability for MDS patients who underwent allo-HCT and were matched to donors that are wild-type (red) and heterozygous (blue) for the rs111224634 SNP.
Subject(s)
Graft vs Host Disease , Hematopoietic Stem Cell Transplantation , Myelodysplastic Syndromes , Humans , Myelodysplastic Syndromes/genetics , Myelodysplastic Syndromes/therapy , Tissue Donors , Transplantation Conditioning , Retrospective Studies
ABSTRACT
BACKGROUND: Analysis of imputed genotypes is an important and routine component of genome-wide association studies and the increasing size of imputation reference panels has facilitated the ability to impute and test low-frequency variants for associations. In the context of genotype imputation, the true genotype is unknown and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach which is implemented using the Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method to an unconditional MI and two additional approaches that have been shown to demonstrate excellent performance: regression with dosages and a mixture of regression models (MRM). RESULTS: Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that the unconditional MI was computationally costly and overly conservative across a wide range of settings. Analyzing data with Dosage, MRM, or MI SMCFCS resulted in greater power, including for low frequency variants, compared to unconditional MI while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than Dosage. CONCLUSIONS: The unconditional MI approach for association testing is overly conservative and we do not recommend its use in the context of imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF [Formula: see text] 0.001 and Rsq [Formula: see text] 0.3.
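To make "regression with dosages" concrete, the sketch below shows the usual idea under simple assumptions: each imputed genotype is summarized by its expected alternate-allele count, and the phenotype is regressed on that value. The function names `dosage` and `ols_slope`, the single-predictor model without covariates, and the toy numbers are all illustrative assumptions, not the authors' implementation.

```python
def dosage(p_hom_ref, p_het, p_hom_alt):
    """Expected alternate-allele count from imputed genotype probabilities.

    The three probabilities are assumed to sum to (approximately) 1.
    """
    total = p_hom_ref + p_het + p_hom_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_het + 2.0 * p_hom_alt   # 0*P(ref/ref) + 1*P(het) + 2*P(alt/alt)

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (single predictor, with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Toy data (hypothetical): genotype probabilities for three individuals.
probs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
dosages = [dosage(*p) for p in probs]        # expected allele counts 0, 1, 2
phenotype = [1.0, 3.0, 5.0]
beta = ols_slope(dosages, phenotype)         # per-allele effect estimate
```

A real analysis would add covariates, use an appropriate link for binary traits, and apply the MAF and imputation-quality filters discussed in the abstract.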
Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Genotype , Gene Frequency , Models, Statistical
ABSTRACT
BACKGROUND: Genome-wide studies of gene-environment interactions (G×E) may identify variants associated with disease risk in conjunction with lifestyle/environmental exposures. We conducted a genome-wide G×E analysis of ~7.6 million common variants and seven lifestyle/environmental risk factors for breast cancer risk overall and for estrogen receptor positive (ER+) breast cancer. METHODS: Analyses were conducted using 72,285 breast cancer cases and 80,354 controls of European ancestry from the Breast Cancer Association Consortium. Gene-environment interactions were evaluated using standard unconditional logistic regression models and likelihood ratio tests for breast cancer risk overall and for ER+ breast cancer. Bayesian False Discovery Probability was employed to assess the noteworthiness of each SNP-risk factor pair. RESULTS: Assuming a 1 × 10⁻⁵ prior probability of a true association for each SNP-risk factor pair and a Bayesian False Discovery Probability < 15%, we identified two independent SNP-risk factor pairs: rs80018847(9p13)-LINGO2 and adult height in association with overall breast cancer risk (OR_int = 0.94, 95% CI 0.92-0.96), and rs4770552(13q12)-SPATA13 and age at menarche for ER+ breast cancer risk (OR_int = 0.91, 95% CI 0.88-0.94). CONCLUSIONS: Overall, the contribution of G×E interactions to the heritability of breast cancer is very small. At the population level, multiplicative G×E interactions do not make an important contribution to risk prediction in breast cancer.
Subject(s)
Breast Neoplasms , Gene-Environment Interaction , Adult , Female , Humans , Genetic Predisposition to Disease , Breast Neoplasms/etiology , Breast Neoplasms/genetics , Bayes Theorem , Genome-Wide Association Study , Risk Factors , Polymorphism, Single Nucleotide , Case-Control Studies
ABSTRACT
Whole-genome sequencing (WGS) can improve assessment of low-frequency and rare variants, particularly in non-European populations that have been underrepresented in existing genomic studies. The genetic determinants of C-reactive protein (CRP), a biomarker of chronic inflammation, have been extensively studied, with existing genome-wide association studies (GWASs) conducted in >200,000 individuals of European ancestry. In order to discover novel loci associated with CRP levels, we examined a multi-ancestry population (n = 23,279) with WGS (~38× coverage) from the Trans-Omics for Precision Medicine (TOPMed) program. We found evidence for eight distinct associations at the CRP locus, including two variants that have not been identified previously (rs11265259 and rs181704186), both of which are non-coding and more common in individuals of African ancestry (~10% and ~1% minor allele frequency, respectively, and rare or monomorphic in 1000 Genomes populations of East Asian, South Asian, and European ancestry). We show that the minor (G) allele of rs181704186 is associated with lower CRP levels and decreased transcriptional activity and protein binding in vitro, providing a plausible molecular mechanism for this African ancestry-specific signal. The individuals homozygous for rs181704186-G have a mean CRP level of 0.23 mg/L, in contrast to individuals heterozygous for rs181704186 with mean CRP of 2.97 mg/L and major allele homozygotes with mean CRP of 4.11 mg/L. This study demonstrates the utility of WGS in multi-ethnic populations to drive discovery of complex trait associations of large effect and to identify functional alleles in noncoding regulatory regions.
Subject(s)
Asian People/genetics , Black People/genetics , C-Reactive Protein/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , White People/genetics , Whole Genome Sequencing/methods , Cohort Studies , Gene Frequency , Genome-Wide Association Study , Humans , Linkage Disequilibrium
ABSTRACT
Malignant progression of normal tissue is typically driven by complex networks of somatic changes, including genetic mutations, copy number aberrations, epigenetic changes, and transcriptional reprogramming. To delineate aberrant multi-omic tumor features that correlate with clinical outcomes, we present a novel pathway-centric tool based on the multiple factor analysis framework called padma. Using a multi-omic consensus representation, padma quantifies and characterizes individualized pathway-specific multi-omic deviations and their underlying drivers, with respect to the sampled population. We demonstrate the utility of padma to correlate patient outcomes with complex genetic, epigenetic, and transcriptomic perturbations in clinically actionable pathways in breast and lung cancer.
Subject(s)
Neoplasms , Factor Analysis, Statistical , Humans , Neoplasms/genetics , Transcriptome
ABSTRACT
People hospitalized with COVID-19 often exhibit altered hematological traits associated with disease prognosis (e.g., lower lymphocyte and platelet counts). We investigated whether inter-individual variability in baseline hematological traits influences risk of acute SARS-CoV-2 infection or progression to severe COVID-19. We report inconsistent associations of blood cell traits with incident SARS-CoV-2 infection and severe COVID-19 in UK Biobank and the Vanderbilt University Medical Center Synthetic Derivative (VUMC SD). Since genetically determined blood cell measures better represent cell abundance across the lifecourse, we also assessed the shared genetic architecture of baseline blood cell traits and COVID-19 related outcomes by Mendelian randomization (MR) analyses. We found significant relationships between COVID-19 severity and mean sphered cell volume after adjusting for multiple testing. However, MR results differed significantly across different freezes of COVID-19 summary statistics and genetic correlation between these traits was modest (0.1), decreasing our confidence in these results. We observed overlapping genetic association signals between other hematological and COVID-19 traits at specific loci such as MAPT and TYK2. In conclusion, we did not find convincing evidence of relationships between the genetic architecture of blood cell traits and either SARS-CoV-2 infection or COVID-19 hospitalization, though we do see evidence of shared signals at specific loci.
Subject(s)
COVID-19 , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Genetic Testing , Phenotype , Academic Medical Centers , Genome-Wide Association Study
ABSTRACT
In this study, the asymptotic distributions of the likelihood ratio test (LRT), the restricted likelihood ratio test (RLRT), the F and the sequence kernel association test (SKAT) statistics for testing an additive effect of the expected familial relatedness (FR) in a linear mixed model (LMM) are examined based on an eigenvalue approach. First, the covariance structure for modeling the FR effect in an LMM is presented. Then, the multiplicity of eigenvalues for the log-likelihood and restricted log-likelihood is established under a replicate family setting and extended to a more general replicate family setting (GRFS) as well. After that, the asymptotic null distributions of LRT, RLRT, F and SKAT statistics under GRFS are derived. The asymptotic null distribution of SKAT for testing genetic rare variants is also constructed. In addition, a simple formula for sample size calculation is provided based on the restricted maximum likelihood estimate of the effect size for the expected FR. Finally, a power comparison of these test statistics for hypothesis testing of the expected FR effect is made via simulation. The four test statistics are also applied to a data set from the UK Biobank.
Subject(s)
Models, Genetic , Humans , Likelihood Functions , Computer Simulation , Linear Models
ABSTRACT
Height is a highly heritable, classic polygenic trait with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1-4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, STC2, AR and CRISPLD2), greater than ten times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (giving an increase of 1-2 centimetres per allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes that are mutated in monogenic growth disorders and highlight new biological candidates (such as ADAMTS3, IL11RA and NOX4) and pathways (such as proteoglycan and glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate-to-large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.
Subject(s)
Body Height/genetics , Gene Frequency/genetics , Genetic Variation/genetics , ADAMTS Proteins/genetics , Adult , Alleles , Cell Adhesion Molecules/genetics , Female , Genome, Human/genetics , Glycoproteins/genetics , Glycoproteins/metabolism , Glycosaminoglycans/biosynthesis , Hedgehog Proteins/genetics , Humans , Intercellular Signaling Peptides and Proteins/genetics , Intercellular Signaling Peptides and Proteins/metabolism , Interferon Regulatory Factors/genetics , Interleukin-11 Receptor alpha Subunit/genetics , Male , Multifactorial Inheritance/genetics , NADPH Oxidase 4 , NADPH Oxidases/genetics , Phenotype , Pregnancy-Associated Plasma Protein-A/metabolism , Procollagen N-Endopeptidase/genetics , Proteoglycans/biosynthesis , Proteolysis , Receptors, Androgen/genetics , Somatomedins/metabolism
ABSTRACT
Breast cancer risk is influenced by rare coding variants in susceptibility genes, such as BRCA1, and many common, mostly non-coding variants. However, much of the genetic contribution to breast cancer risk remains unknown. Here we report the results of a genome-wide association study of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry. We identified 65 new loci that are associated with overall breast cancer risk at P < 5 × 10⁻⁸. The majority of credible risk single-nucleotide polymorphisms in these loci fall in distal regulatory elements, and by integrating in silico data to predict target genes in breast cells at each locus, we demonstrate a strong overlap between candidate target genes and somatic driver genes in breast tumours. We also find that heritability of breast cancer due to all single-nucleotide polymorphisms in regulatory features was 2-5-fold enriched relative to the genome-wide average, with strong enrichment for particular transcription factor binding sites. These results provide further insight into genetic susceptibility to breast cancer and will improve the use of genetic risk scores for individualized screening and prevention.
Subject(s)
Breast Neoplasms/genetics , Genetic Loci , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Asia/ethnology , Asian People/genetics , Binding Sites/genetics , Breast Neoplasms/diagnosis , Computer Simulation , Europe/ethnology , Female , Humans , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Regulatory Sequences, Nucleic Acid , Risk Assessment , Transcription Factors/metabolism , White People/genetics
ABSTRACT
Mendelian randomization (MR) is an established approach for assessing the causal effects of heritable exposures on outcomes. Outcomes of interest often include binary clinical endpoints, but may also include censored survival times. We explore the implications of both the Cox proportional hazard model and the additive hazard model in the context of MR, with a specific emphasis on two-stage methods. We show that naive application of standard MR approaches to censored survival times may induce significant bias. Through simulations and analysis of data from the Women's Health Initiative, we provide practical advice on modeling survival outcomes in MRs.
Subject(s)
Mendelian Randomization Analysis , Models, Genetic , Bias , Causality , Female , Humans , Proportional Hazards Models
ABSTRACT
Familial relatedness (FR) and population structure (PS) are two major sources for genetic correlation. In the human population, both FR and PS can further break down into additive and dominant components to account for potential additive and dominant genetic effects. In this study, besides the classical additive genomic relationship matrix, a dominant genomic relationship matrix is introduced. A link between the additive/dominant genomic relationship matrices and the coancestry (or kinship)/double coancestry coefficients is also established. In addition, a way to separate the FR and PS correlations based on the estimates of coancestry and double coancestry coefficients from the genomic relationship matrices is proposed. A unified linear mixed model is also developed, which can account for both the additive and dominance effects of FR and PS correlations as well as their possible random interactions. Finally, this unified linear mixed model is applied to analyze two study cohorts from UK Biobank.
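For orientation, one widely used form of the additive genomic relationship matrix (often attributed to VanRaden) can be sketched as follows. This is an assumption about what "classical" refers to here; the abstract does not give the formula, and the dominant matrix and the FR/PS separation it proposes are not reproduced. The function name `additive_grm` and the toy genotypes are hypothetical.

```python
def additive_grm(genotypes):
    """Additive genomic relationship matrix G = Z Z^T / (2 * sum_j p_j (1 - p_j)).

    genotypes: list of rows (one per individual), each a list of
    alternate-allele counts (0, 1 or 2) per variant.
    Z is the genotype matrix centered by twice the sample allele frequency.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample allele frequency of each variant.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all variants are monomorphic")
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(row_i, row_k)) / scale for row_k in centered]
        for row_i in centered
    ]

# Toy example (hypothetical): three individuals, two variants.
toy = [[0, 2], [2, 0], [1, 1]]
grm = additive_grm(toy)
```

In practice the allele frequencies would come from a reference or base population rather than the analysis sample, and a dominance counterpart would use a different genotype coding.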