ABSTRACT
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Subjects
Calcium Channels , Colorectal Neoplasms , Eukaryotic Initiation Factor-3 , Glaucoma , Minisatellite Repeats , Humans , Calcium Channels/genetics , Colorectal Neoplasms/genetics , Human Genome , Glaucoma/genetics , Genetic Polymorphism , Eukaryotic Initiation Factor-3/genetics
ABSTRACT
The human genome contains hundreds of thousands of regions harboring copy-number variants (CNVs). However, the phenotypic effects of most such polymorphisms are unknown because only larger CNVs have been ascertainable from SNP-array data generated by large biobanks. We developed a computational approach leveraging haplotype sharing in biobank cohorts to more sensitively detect CNVs. Applied to UK Biobank, this approach accounted for approximately half of all rare gene inactivation events produced by genomic structural variation. This CNV call set enabled a detailed analysis of associations between CNVs and 56 quantitative traits, identifying 269 independent associations (p < 5 × 10⁻⁸) likely to be causally driven by CNVs. Putative target genes were identifiable for nearly half of the loci, enabling insights into dosage sensitivity of these genes and uncovering several gene-trait relationships. These results demonstrate the ability of haplotype-informed analysis to provide insights into the genetic basis of human complex traits.
Subjects
Multifactorial Inheritance , Quantitative Trait Loci , Humans , DNA Copy Number Variations , Phenotype , Human Genome , Single Nucleotide Polymorphism/genetics
ABSTRACT
Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes¹. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel²) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
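The coverage figure in this abstract can be sanity-checked by multiplying the number of segments by their mean size and dividing by the genome length. The sketch below assumes a total genome length of about 3.1 Gb, which the abstract does not state; `GENOME_BP` and `covered_fraction` are illustrative names only.

```python
# Rough check: share of the genome spanned by the height-associated segments.
# GENOME_BP is an assumption (about 3.1 Gb); the abstract does not give it.
GENOME_BP = 3.1e9


def covered_fraction(n_segments: int, mean_size_bp: float,
                     genome_bp: float = GENOME_BP) -> float:
    """Return the fraction of the genome covered by non-overlapping segments."""
    return n_segments * mean_size_bp / genome_bp


# 7,209 segments with a mean size of around 90 kb, as reported in the abstract.
fraction = covered_fraction(7_209, 90_000)
print(f"{fraction:.1%}")
```

Under that assumed genome length the result lands close to the reported "about 21%"; a slightly different reference length shifts it by well under a percentage point.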
Subjects
Body Height , Chromosome Mapping , Single Nucleotide Polymorphism , Humans , Body Height/genetics , Gene Frequency/genetics , Human Genome/genetics , Genome-Wide Association Study , Haplotypes/genetics , Linkage Disequilibrium/genetics , Single Nucleotide Polymorphism/genetics , Europe/ethnology , Sample Size , Phenotype
ABSTRACT
Recent work has found increasing evidence of mitigated, incompletely penetrant phenotypes in heterozygous carriers of recessive Mendelian disease variants. We leveraged whole-exome imputation within the full UK Biobank cohort (n ~ 500K) to extend such analyses to 3,475 rare variants curated from ClinVar and OMIM. Testing these variants for association with 58 quantitative traits yielded 102 significant associations involving variants previously implicated in 34 different diseases. Notable examples included a POR missense variant implicated in Antley-Bixler syndrome that associated with a 1.76 (SE 0.27) cm increase in height and an ABCA3 missense variant implicated in interstitial lung disease that associated with reduced FEV1/FVC ratio. Association analyses with 1,134 disease traits yielded five additional variant-disease associations. We also observed contrasting levels of recessiveness between two more-common, classical Mendelian diseases. Carriers of cystic fibrosis variants exhibited increased risk of several mitigated disease phenotypes, whereas carriers of spinal muscular atrophy alleles showed no evidence of altered phenotypes. Incomplete penetrance of cystic fibrosis carrier phenotypes did not appear to be mediated by common allelic variation on the functional haplotype. Our results show that many disease-associated recessive variants can produce mitigated phenotypes in heterozygous carriers and motivate further work exploring penetrance mechanisms.
Subjects
Antley-Bixler Syndrome Phenotype , Cystic Fibrosis , Interstitial Lung Diseases , Alleles , Antley-Bixler Syndrome Phenotype/genetics , Cystic Fibrosis/genetics , Factual Databases , Genetic Predisposition to Disease , Humans , Interstitial Lung Diseases/genetics , Spinal Muscular Atrophy/genetics , Penetrance , Phenotype , United Kingdom
ABSTRACT
Retrotransposons comprise about 45% of the human genome¹, but their contributions to human trait variation and evolution are only beginning to be explored²,³. Here, we find that a sequence of SVA retrotransposon insertions in an early intron of the ASIP (agouti signaling protein) gene has probably shaped human pigmentation several times. In the UK Biobank (n = 169,641), a recent 3.3-kb SVA insertion polymorphism associated strongly with lighter skin pigmentation (0.22 [0.21-0.23] s.d.; P = 2.8 × 10⁻³⁵¹) and increased skin cancer risk (odds ratio = 1.23 [1.18-1.27]; P = 1.3 × 10⁻²⁸), appearing to underlie one of the strongest common genetic influences on these phenotypes within European populations⁴⁻⁶. ASIP expression in skin displayed the same association pattern, with the SVA insertion allele exhibiting 2.2-fold (1.9-2.6) increased expression. This effect had an unusual apparent mechanism: an earlier, nonpolymorphic, human-specific SVA retrotransposon 3.9 kb upstream appeared to have caused ASIP hypofunction by nonproductive splicing, which the new (polymorphic) SVA insertion largely eliminated. Extended haplotype homozygosity indicated that the insertion allele has risen to allele frequencies up to 11% in European populations over the past several thousand years. These results indicate that a sequence of retrotransposon insertions contributed to a species-wide increase, then a local decrease, of human pigmentation.
Subjects
Agouti Signaling Protein , Retroelements , Skin Pigmentation , Humans , Retroelements/genetics , Agouti Signaling Protein/genetics , Skin Pigmentation/genetics , Insertional Mutagenesis , Alleles , Skin Neoplasms/genetics , Human Genome , Introns/genetics
ABSTRACT
Copy number variants (CNVs) are among the largest genetic variants, yet CNVs have not been effectively ascertained in most genetic association studies. Here we ascertained protein-altering CNVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting subexonic CNVs and variation within segmental duplications. Incorporating CNVs into analyses of rare variants predicted to cause gene loss of function (LOF) identified 100 associations of predicted LOF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 conferred one of the strongest protective effects of gene LOF on hypertension risk (odds ratio = 0.86 (0.82-0.90)). Protein-coding variation in rapidly evolving gene families within segmental duplications (previously invisible to most analysis methods) generated some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Subjects
DNA Copy Number Variations , Type 2 Diabetes Mellitus , Humans , DNA Copy Number Variations/genetics , Type 2 Diabetes Mellitus/genetics , Phenotype , Genetic Association Studies , Exons
ABSTRACT
Structural variants (SVs) comprise the largest genetic variants, altering from 50 base pairs to megabases of DNA. However, SVs have not been effectively ascertained in most genetic association studies, leaving a key gap in our understanding of human complex trait genetics. We ascertained protein-altering SVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting sub-exonic SVs and variation within segmental duplications. Incorporating SVs into analyses of rare variants predicted to cause gene loss-of-function (pLoF) identified 100 associations of pLoF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 appeared to confer one of the strongest protective effects of gene LoF on hypertension risk (OR = 0.86 [0.82-0.90]). Protein-coding variation in rapidly-evolving gene families within segmental duplications (previously invisible to most analysis methods) appeared to generate some of the human genome's largest contributions to variation in type 2 diabetes risk, chronotype, and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
ABSTRACT
Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R² > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10⁻⁸) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.
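To make the scale of "MAF ~0.00005" concrete, the expected number of minor-allele copies in a cohort of N diploid individuals is 2 × N × MAF. The sketch below applies that textbook relation to the cohort sizes quoted in the abstract and also recomputes the 578/1,189 share; the function name is illustrative, and the calculation is a back-of-envelope check rather than part of the study's method.

```python
def expected_minor_allele_count(n_individuals: int, maf: float) -> float:
    """Expected copies of the minor allele among 2N chromosomes."""
    return 2 * n_individuals * maf


MAF = 0.00005  # lowest frequency with imputation accuracy R^2 > 0.5, per the abstract

# Exome-sequenced reference subset versus the (approximate) full cohort.
in_sequenced = expected_minor_allele_count(49_960, MAF)
in_full_cohort = expected_minor_allele_count(500_000, MAF)

# Share of associations falling in genes with two or more hits.
share_multi_hit = 578 / 1_189

print(f"expected copies among sequenced participants: {in_sequenced:.1f}")
print(f"expected copies in the full cohort:           {in_full_cohort:.0f}")
print(f"share of associations in multi-hit genes:     {share_multi_hit:.0%}")
```

At this frequency a variant would be expected in only about five chromosomes among the sequenced participants but about fifty across the full cohort, which illustrates why imputing into the remainder of the cohort adds power.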
Subjects
Biological Specimen Banks , Exome Sequencing/statistics & numerical data , Gene Frequency , Proteins/genetics , Blood Pressure/genetics , Chromosome Mapping/methods , Chromosome Mapping/statistics & numerical data , Genetic Markers , Genome-Wide Association Study/statistics & numerical data , Humans , Linkage Disequilibrium , Membrane Proteins/genetics , Genetic Models , Phenotype , Single Nucleotide Polymorphism , Proteins/metabolism , Atrial Natriuretic Factor Receptors/genetics , United Kingdom , Exome Sequencing/methods
ABSTRACT
Many human proteins contain domains that vary in size or copy number because of variable numbers of tandem repeats (VNTRs) in protein-coding exons. However, the relationships of VNTRs to most phenotypes are unknown because of difficulties in measuring such repetitive elements. We developed methods to estimate VNTR lengths from whole-exome sequencing data and impute VNTR alleles into single-nucleotide polymorphism haplotypes. Analyzing 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 786 phenotypes identified some of the strongest associations of common variants with human phenotypes, including height, hair morphology, and biomarkers of health. Accounting for large-effect VNTRs further enabled fine-mapping of associations to many more protein-coding mutations in the same genes. These results point to cryptic effects of highly polymorphic common structural variants that have eluded molecular analyses to date.
Subjects
Human Genome , Minisatellite Repeats/genetics , Phenotype , Genetic Polymorphism , Aggrecans/genetics , Antigens/genetics , Black People , Body Height/genetics , Genetic Association Studies , Hair , Haplotypes , Humans , Intermediate Filament Proteins/genetics , Kidney/physiology , Lipoprotein(a)/blood , Lipoprotein(a)/genetics , Mucin-1/genetics , Single Nucleotide Polymorphism , Polynucleotide Adenylyltransferase/genetics , White People/genetics , Exome Sequencing
ABSTRACT
Hundreds of the proteins encoded in human genomes contain domains that vary in size or copy number due to variable numbers of tandem repeats (VNTRs) in protein-coding exons. VNTRs have eluded analysis by the molecular methods (SNP arrays and high-throughput sequencing) used in large-scale human genetic studies to date; thus, the relationships of VNTRs to most human phenotypes are unknown. We developed ways to estimate VNTR lengths from whole-exome sequencing data, identify the SNP haplotypes on which VNTR alleles reside, and use imputation to project these haplotypes into abundant SNP data. We analyzed 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 791 phenotypes. Analysis revealed some of the strongest associations of common variants with human phenotypes including height, hair morphology, and biomarkers of human health; for example, a VNTR encoding 13-44 copies of a 19-amino-acid repeat in the chondroitin sulfate domain of aggrecan (ACAN) associated with height variation of 3.4 centimeters (s.e. 0.3 cm). Incorporating large-effect VNTRs into analysis also made it possible to map many additional effects at the same loci: for the blood biomarker lipoprotein(a), for example, analysis of the kringle IV-2 VNTR within the LPA gene revealed that 18 coding SNPs and the VNTR in LPA explained 90% of lipoprotein(a) heritability in Europeans, enabling insights about population differences and epidemiological significance of this clinical biomarker. These results point to strong, cryptic effects of highly polymorphic common structural variants that have largely eluded molecular analyses to date.
ABSTRACT
Although germline de novo copy number variants (CNVs) are known causes of autism spectrum disorder (ASD), the contribution of mosaic (early-developmental) copy number variants (mCNVs) has not been explored. In this study, we assessed the contribution of mCNVs to ASD by ascertaining mCNVs in genotype array intensity data from 12,077 probands with ASD and 5,500 unaffected siblings. We detected 46 mCNVs in probands and 19 mCNVs in siblings, affecting 2.8-73.8% of cells. Probands carried a significant burden of large (>4-Mb) mCNVs, which were detected in 25 probands but only one sibling (odds ratio = 11.4, 95% confidence interval = 1.5-84.2, P = 7.4 × 10⁻⁴). Event size positively correlated with severity of ASD symptoms (P = 0.016). Surprisingly, we did not observe mosaic analogues of the short de novo CNVs recurrently observed in ASD (e.g., 16p11.2). We further experimentally validated two mCNVs in postmortem brain tissue from 59 additional probands. These results indicate that mCNVs contribute a previously unexplained component of ASD risk.
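The odds ratio quoted for large mCNVs can be recomputed from the carrier counts in the abstract. The sketch below builds a 2×2 table from 25 carriers among 12,077 probands and 1 carrier among 5,500 siblings, then applies the standard log-odds (Woolf) confidence interval. The abstract does not say which interval or test the authors used, so the choice of method and the function name are assumptions; the P value is not reproduced here.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-scale Wald) interval for a 2x2 table.

    a, b: carriers and non-carriers in group 1
    c, d: carriers and non-carriers in group 2
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract; using all genotyped individuals as the
# denominators is an assumption about how the table was constructed.
probands, proband_carriers = 12_077, 25
siblings, sibling_carriers = 5_500, 1

or_, ci_low, ci_high = odds_ratio_wald_ci(
    proband_carriers, probands - proband_carriers,
    sibling_carriers, siblings - sibling_carriers,
)
print(f"OR = {or_:.1f}, 95% CI = {ci_low:.1f}-{ci_high:.1f}")
```

With these inputs the estimate and interval agree with the reported odds ratio of 11.4 and interval of 1.5-84.2 after rounding. The very wide interval follows from the single carrier among siblings, which dominates the standard error.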