Results 1 - 20 of 117
1.
Am J Hum Genet ; 111(5): 990-995, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38636510

ABSTRACT

Since genotype imputation was introduced, researchers have relied on the imputation quality estimated by imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration that leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for individuals of predominantly European and African ancestry, as inferred from global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest that MagicalRsq-X outperforms Rsq in almost every setting, with a 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to gains of 85K-218K variants. We further developed a metric to quantify the genetic distance of a target cohort relative to a reference cohort and showed that this metric largely explained the performance of MagicalRsq-X models. Finally, we found that MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would have been missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well-imputed from poorly imputed lower-frequency variants.


Subject(s)
Gene Frequency , Genotype , Polymorphism, Single Nucleotide , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Genome, Human , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods
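The evaluation above scores a quality metric by its squared Pearson correlation with true R², where true R² is itself the squared correlation between imputed dosages and sequenced genotypes at one variant. A minimal sketch of both quantities, assuming per-variant dosage and WGS genotype arrays are already aligned by individual; the function names are hypothetical and not part of MagicalRsq-X:

```python
import numpy as np


def true_r2(dosages, wgs_genotypes):
    """True R2 for one variant: squared Pearson correlation between
    imputed dosages and WGS genotypes (0/1/2) of the same individuals."""
    r = np.corrcoef(np.asarray(dosages, float), np.asarray(wgs_genotypes, float))[0, 1]
    return float(r ** 2)


def metric_agreement(quality_metric, true_r2_values):
    """Squared Pearson correlation between a per-variant quality metric
    (e.g. Rsq) and the per-variant true R2 values."""
    r = np.corrcoef(np.asarray(quality_metric, float), np.asarray(true_r2_values, float))[0, 1]
    return float(r ** 2)


# Hypothetical toy data: one variant observed in five individuals.
dosages = [0.1, 1.0, 1.9, 0.9, 0.0]
genotypes = [0, 1, 2, 1, 0]
print(true_r2(dosages, genotypes))
```

A monomorphic variant has zero variance, so its correlation is undefined (NumPy returns nan); such variants would need to be filtered before computing agreement.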
2.
Hum Mol Genet ; 33(14): 1207-1214, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38643062

ABSTRACT

Genotype imputation is widely used in genome-wide association studies (GWAS). However, both the genotyping chips and imputation reference panels are dependent on next-generation sequencing (NGS). Due to the nature of NGS, some regions of the genome are inaccessible to sequencing. To date, there has been no complete evaluation of these regions, and their impact on the identification of associations in GWAS remains unclear. In this study, we systematically assess the extent to which variants in inaccessible regions are underrepresented on genotyping chips, in imputation reference panels, in GWAS results, and in variant databases. We also determine the proportion of genes located in inaccessible regions and compare the results across variant masks defined by the 1000 Genomes Project and the TOPMed program. Overall, fewer variants were observed in inaccessible regions in all categories analyzed. Depending on the mask used and normalized for region size, only 4%-17% of the genotyped variants are located in inaccessible regions, and 52 to 581 genes were almost completely inaccessible. From the Cooperative Health Research in South Tyrol (CHRIS) study, we present a case study of an association located in an inaccessible region that is driven by genotyped variants and cannot be reproduced by imputation in GRCh37. We conclude that genotyping, NGS, genotype imputation and downstream analyses such as GWAS and fine mapping are systematically biased in inaccessible regions, due to missed variants and spurious associations. To help researchers assess gene and variant accessibility, we provide an online application (https://gab.gm.eurac.edu).


Subject(s)
Genome, Human , Genome-Wide Association Study , Genotype , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics
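Normalizing for region size means comparing variant density (variants per base pair) inside and outside the inaccessible regions rather than raw counts. A minimal sketch, assuming the inaccessible regions are given as sorted, non-overlapping half-open intervals on one chromosome; the interval list below is hypothetical, stands in for a real 1000 Genomes or TOPMed mask, and the paper's exact normalization may differ:

```python
from bisect import bisect_right


def in_intervals(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping half-open
    intervals [start, end)."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def density_ratio(positions, inaccessible, chrom_length):
    """Variant density inside inaccessible intervals divided by the density
    outside them; values below 1 indicate underrepresentation."""
    intervals = sorted(inaccessible)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    masked_bp = sum(e - s for s, e in intervals)
    n_in = sum(in_intervals(p, starts, ends) for p in positions)
    n_out = len(positions) - n_in
    return (n_in / masked_bp) / (n_out / (chrom_length - masked_bp))


# Hypothetical 1 kb "chromosome" with 200 bp marked inaccessible.
ratio = density_ratio([50, 200, 300, 400, 700, 800, 900, 950],
                      [(0, 100), (500, 600)], 1000)
print(ratio)
```

The binary search keeps each lookup logarithmic in the number of mask intervals, which matters when masks contain hundreds of thousands of intervals.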
3.
Nucleic Acids Res ; 52(W1): W70-W77, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38709879

ABSTRACT

Polygenic scores (PGS) enable the prediction of genetic predisposition for a wide range of traits and diseases by calculating the weighted sum of allele dosages for genetic variants associated with the trait or disease in question. Present approaches for calculating PGS from genotypes are often inefficient and labor-intensive, limiting transferability into clinical applications. Here, we present 'Imputation Server PGS', an extension of the Michigan Imputation Server designed to automate a standardized calculation of polygenic scores based on imputed genotypes. This extends the widely used Michigan Imputation Server with new functionality, bringing the simplicity and efficiency of modern imputation to the PGS field. The service currently supports over 4489 published polygenic scores from publicly available repositories and provides extensive quality control, including ancestry estimation to report population stratification. An interactive report empowers users to screen and compare thousands of scores in a fast and intuitive way. Imputation Server PGS provides a user-friendly web service, facilitating the application of polygenic scores to a wide range of genetic studies and is freely available at https://imputationserver.sph.umich.edu.


Subject(s)
Genetic Predisposition to Disease , Multifactorial Inheritance , Software , Multifactorial Inheritance/genetics , Humans , Internet , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genotype , Alleles , Genetic Risk Score
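The abstract defines a polygenic score as the weighted sum of allele dosages. A minimal sketch of that arithmetic, assuming dosages are already aligned to each score's effect allele; it is not Imputation Server PGS code, and it omits the allele matching, strand checks, and variant-coverage reporting a real service needs:

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: (n_individuals, n_variants) dosages in [0, 2], aligned to the
             effect allele of each weight.
    weights: (n_variants,) per-variant effect sizes from a published score.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("each variant needs exactly one weight")
    return dosages @ weights


# Two hypothetical individuals scored on three variants.
scores = polygenic_score([[0, 1, 2], [2, 0, 1]], [0.1, -0.2, 0.05])
print(scores)
```

Using imputed dosages rather than hard genotype calls carries imputation uncertainty into the score instead of discarding it.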
4.
Am J Hum Genet ; 109(6): 1007-1015, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35508176

ABSTRACT

Genotype imputation is an integral tool in genome-wide association studies, in which it facilitates meta-analysis, increases power, and enables fine-mapping. With the increasing availability of whole-genome-sequence datasets, investigators have access to a multitude of reference-panel choices for genotype imputation. In principle, combining all sequenced whole genomes into a single large panel would provide the best imputation performance, but this is often cumbersome or impossible due to privacy restrictions. Here, we describe meta-imputation, a method that allows imputation results generated using different reference panels to be combined into a consensus imputed dataset. Our meta-imputation method requires small changes to the output of existing imputation tools to produce necessary inputs, which are then combined using dynamically estimated weights that are tailored to each individual and genome segment. In the scenarios we examined, the method consistently outperforms imputation using a single reference panel and achieves accuracy comparable to imputation using a combined reference panel.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome , Genome-Wide Association Study/methods , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Research Design
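The meta-imputation abstract describes combining panel-specific dosages with weights tailored to each individual and genome segment. The sketch below shows only the combination step. Its weight estimator is a simplified stand-in (inverse mean squared error of leave-one-out predictions at typed markers within one segment), an assumption for illustration and not the authors' method; all names are hypothetical:

```python
import numpy as np


def segment_weights(loo_dosages, typed_genotypes, eps=1e-6):
    """Per-panel weights for one individual and one genome segment.

    loo_dosages: (n_panels, n_typed) dosages each panel predicts at typed
                 markers when those markers are masked from the input.
    typed_genotypes: (n_typed,) observed array genotypes (0/1/2).
    """
    err = np.asarray(loo_dosages, float) - np.asarray(typed_genotypes, float)
    mse = np.mean(err ** 2, axis=1)
    inv = 1.0 / (mse + eps)  # smaller error -> larger weight
    return inv / inv.sum()   # normalize so the weights sum to 1


def meta_impute(panel_dosages, weights):
    """Consensus dosages for one individual and segment: weighted average
    of each panel's dosages, given as (n_panels, n_variants)."""
    return np.asarray(weights, float) @ np.asarray(panel_dosages, float)


# Panel 0 predicts the typed markers perfectly, panel 1 does not.
w = segment_weights([[0, 1, 2], [1, 1, 1]], [0, 1, 2])
consensus = meta_impute([[0.0, 2.0], [2.0, 0.0]], w)
print(w, consensus)
```

Because weights sum to 1, the consensus dosage stays in [0, 2] whenever each panel's dosages do.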
5.
Am J Hum Genet ; 109(11): 1986-1997, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36198314

ABSTRACT

Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP) and whole-exome sequence data from UK Biobank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R² than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained from 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where the net gain is the number of additional variants correctly distinguished by MagicalRsq relative to standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from poorly imputed ones. MagicalRsq is freely available on GitHub.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Calibration , Genotype , Machine Learning
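The net-gain figures count variants that one metric classifies correctly as well or poorly imputed, relative to the other. A minimal sketch of that bookkeeping, assuming "correctly distinguished" means the metric and true R² fall on the same side of one threshold; the 0.8 threshold and the toy values are hypothetical, since the abstract does not state them:

```python
import numpy as np


def n_correct(metric, true_r2, threshold):
    """Variants whose well/poorly imputed call from `metric` matches the
    call made from true R2 at the same threshold."""
    predicted_good = np.asarray(metric) >= threshold
    truly_good = np.asarray(true_r2) >= threshold
    return int(np.sum(predicted_good == truly_good))


def net_gain(magical, standard, true_r2, threshold=0.8):
    """Correct calls under MagicalRsq minus correct calls under standard Rsq
    (threshold 0.8 is an assumed example value)."""
    return n_correct(magical, true_r2, threshold) - n_correct(standard, true_r2, threshold)


true_r2 = [0.9, 0.2, 0.85, 0.1]
standard = [0.9, 0.85, 0.5, 0.1]
magical = [0.88, 0.3, 0.82, 0.05]
print(net_gain(magical, standard, true_r2))  # prints 2
```

Counting agreement in both directions rewards a metric for keeping well-imputed variants as well as for discarding poorly imputed ones.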
6.
Am J Hum Genet ; 109(9): 1653-1666, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35981533

ABSTRACT

Understanding the genetic basis of human diseases and traits is dependent on the identification and accurate genotyping of genetic variants. Deep whole-genome sequencing (WGS), the gold standard technology for SNP and indel identification and genotyping, remains very expensive for most large studies. Here, we quantify the extent to which array genotyping followed by genotype imputation can approximate WGS in studies of individuals of African, Hispanic/Latino, and European ancestry in the US and of Finnish ancestry in Finland (a population isolate). For each study, we performed genotype imputation by using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. Using the Omni 2.5M array and the TOPMed panel, ≥90% of bi-allelic single-nucleotide variants (SNVs) are well imputed (r² > 0.8) down to minor-allele frequencies (MAFs) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. There was little difference in TOPMed-based imputation quality among the arrays with >700k variants. Individual-level imputation quality varied widely between and within the three US studies. Imputation quality also varied across genomic regions, producing regions where even common (MAF > 5%) variants were consistently not well imputed across ancestries. The extent to which array genotyping and imputation can approximate WGS therefore depends on reference panel, genotype array, sample ancestry, and genomic location. Imputation quality by variant or genomic region can be queried with our new tool, RsqBrowser, now deployed on the Michigan Imputation Server.


Subject(s)
High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Gene Frequency/genetics , Genome-Wide Association Study , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Whole Genome Sequencing
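A summary such as "≥90% of SNVs are well imputed (r² > 0.8) down to MAF x" can be computed by binning variants by MAF and walking from common to rare bins until the well-imputed fraction drops below 90%. A minimal sketch; the binning scheme, function name, and toy values are assumptions, as the abstract does not give the exact procedure:

```python
import numpy as np


def maf_floor(maf, r2, bin_edges, r2_cut=0.8, min_fraction=0.9):
    """Lowest MAF-bin lower edge such that this bin and every more common
    bin has at least `min_fraction` of variants with r2 > r2_cut.

    bin_edges: increasing MAF boundaries, e.g. [0.001, 0.01, 0.1, 0.5].
    Returns None if even the most common non-empty bin fails.
    """
    maf = np.asarray(maf, float)
    r2 = np.asarray(r2, float)
    floor = None
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    for lo, hi in reversed(bins):  # walk from common to rare
        in_bin = (maf >= lo) & (maf < hi)
        if not in_bin.any():
            continue  # skip empty bins
        if np.mean(r2[in_bin] > r2_cut) < min_fraction:
            break
        floor = lo
    return floor


maf = [0.2, 0.3, 0.05, 0.02, 0.005, 0.002]
r2 = [0.95, 0.9, 0.85, 0.81, 0.5, 0.9]
print(maf_floor(maf, r2, [0.001, 0.01, 0.1, 0.5]))
```

The resulting floor depends on bin width: finer bins give a more precise threshold but noisier per-bin fractions.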
7.
Am J Hum Genet ; 109(10): 1727-1741, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36055244

ABSTRACT

Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.


Subject(s)
Genome-Wide Association Study , Transcriptome , Bilirubin , Carnitine , Glycerophospholipids , Humans , Male , Metabolomics , Quantitative Trait Loci/genetics , Solute Carrier Family 22 Member 5/genetics , Transcriptome/genetics
8.
Nature ; 570(7759): 71-76, 2019 06.
Article in English | MEDLINE | ID: mdl-31118516

ABSTRACT

Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10⁻³) and candidate genes from knockout mice (P = 5.2 × 10⁻³). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000-185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Exome Sequencing , Exome/genetics , Animals , Case-Control Studies , Decision Support Techniques , Female , Gene Frequency , Genome-Wide Association Study , Humans , Male , Mice , Mice, Knockout
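Gene-level rare-variant association starts by collapsing each individual's rare alleles in a gene into one burden value. A minimal sketch of that collapsing step, using the abstract's MAF < 0.5% cutoff and a carrier/non-carrier indicator; this is a generic burden illustration, not the study's test, and the association statistic applied to the resulting counts is left out:

```python
import numpy as np


def burden_carriers(genotypes, maf, maf_cut=0.005):
    """1 if an individual carries at least one minor allele at a variant
    with MAF below `maf_cut` (0.5%, as in the abstract), else 0.

    genotypes: (n_individuals, n_variants) minor-allele counts for one gene.
    """
    rare = np.asarray(maf, float) < maf_cut
    counts = np.asarray(genotypes)[:, rare].sum(axis=1)
    return (counts > 0).astype(int)


def carrier_counts(carriers, is_case):
    """2x2 counts: (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    carriers = np.asarray(carriers, bool)
    is_case = np.asarray(is_case, bool)
    return (int(np.sum(carriers & is_case)), int(np.sum(~carriers & is_case)),
            int(np.sum(carriers & ~is_case)), int(np.sum(~carriers & ~is_case)))


# Hypothetical gene with one common and two rare variants in four people.
carriers = burden_carriers([[0, 1, 0], [0, 0, 0], [1, 0, 1], [0, 0, 1]],
                           [0.2, 0.001, 0.004])
print(carriers, carrier_counts(carriers, [True, True, False, False]))
```

Collapsing trades per-variant resolution for power: individually too-rare alleles are pooled so that a single gene-level comparison can reach significance.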
9.
Chem Senses ; 49, 2024 01 01.
Article in English | MEDLINE | ID: mdl-38452143

ABSTRACT

The sense of smell allows for the assessment of the chemical composition of volatiles in our environment. Several factors are associated with reduced olfactory function, including age, sex, and health and lifestyle conditions. However, most studies that aimed to identify the variables that drive olfactory function in the population suffered from methodological weaknesses in study design and participant selection, such as the inclusion of convenience samples or of only certain age groups, or recruitment biases. We aimed to overcome these issues by investigating the Cooperative Health Research in South Tyrol (CHRIS) cohort, a population-based cohort, using a validated odor identification test. Specifically, we hypothesized that a series of medical, demographic, and lifestyle variables is associated with odor identification abilities. In addition, our goal was to provide clinicians and researchers with normative values for the Sniffin' Sticks identification set, after exclusion of individuals with impaired nasal patency. We included 6,944 participants without acute nasal obstruction and assessed several biological, social, and medical parameters. A basic model showed that age, sex, years of education, and smoking status together explained roughly 13% of the total variance in the data. We further observed that variables related to medical conditions (positive screening for cognitive impairment and for Parkinson's disease, history of skull fracture, stage 2 hypertension) and lifestyle conditions (alcohol abstinence) had a negative effect on odor identification scores. Finally, we provide clinicians with normative values for both versions of the Sniffin' Sticks odor identification test, i.e., with 16 items and with 12 items.


Subject(s)
Cognitive Dysfunction , Olfaction Disorders , Parkinson Disease , Adult , Humans , Olfaction Disorders/diagnosis , Olfaction Disorders/epidemiology , Smell , Odorants , Sensory Thresholds
10.
Arterioscler Thromb Vasc Biol ; 43(7): e254-e269, 2023 07.
Article in English | MEDLINE | ID: mdl-37128921

ABSTRACT

BACKGROUND: Antithrombin, PC (protein C), and PS (protein S) are circulating natural anticoagulant proteins that regulate hemostasis; partial deficiencies of these proteins are causes of venous thromboembolism. Previous genetic association studies involving antithrombin, PC, and PS were limited by modest sample sizes or by being restricted to candidate genes. Within the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium, we meta-analyzed across ancestries the results from 10 genome-wide association studies of plasma levels of antithrombin, PC, PS free, and PS total. METHODS: Study participants were of European and African ancestries, and genotype data were imputed to TOPMed, a dense multiancestry reference panel. Each of the 10 studies conducted a genome-wide association study for each phenotype, and summary results were meta-analyzed, stratified by ancestry. The antithrombin analysis included 25 243 European ancestry and 2688 African ancestry participants, the PC analysis included 16 597 European ancestry and 2688 African ancestry participants, and the PS free (PSF) and PS total (PST) analyses included 4113 and 6409 European ancestry participants, respectively. We also conducted transcriptome-wide association analyses and multiphenotype analysis to discover additional associations. Novel findings from the genome-wide association studies and transcriptome-wide association analyses were validated by in vitro functional experiments. Mendelian randomization was performed to assess the causal relationship between these proteins and cardiovascular outcomes. RESULTS: Genome-wide association study meta-analyses identified 4 newly associated loci: 3 with antithrombin levels (GCKR, BAZ1B, and HP-TXNL4B) and 1 with PS levels (ORM1-ORM2). Transcriptome-wide association analyses identified 3 newly associated genes: 1 with antithrombin level (FCGRT), 1 with PC (GOLM2), and 1 with PS (MYL7). In addition, we replicated 7 independent loci reported in previous studies.
Functional experiments provided evidence for the involvement of GCKR, SNX17, and HP genes in antithrombin regulation. CONCLUSIONS: The use of larger sample sizes, diverse populations, and a denser imputation reference panel allowed the detection of 7 novel genomic loci associated with plasma antithrombin, PC, and PS levels.


Subject(s)
Protein C , Protein S , Protein C/genetics , Protein S/genetics , Genome-Wide Association Study , Antithrombins , Transcriptome , Anticoagulants , Antithrombin III/genetics , Polymorphism, Single Nucleotide
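The abstract does not name the meta-analysis model used to combine the 10 studies' summary results. A common choice is fixed-effect inverse-variance weighting; the sketch below assumes that model for a single variant within one ancestry stratum, with hypothetical inputs:

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of per-study
    effect estimates (betas) and standard errors (ses) for one variant."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Two hypothetical studies with equal precision.
beta, se, z, p = ivw_meta([0.1, 0.3], [0.1, 0.1])
print(beta, se, z, p)
```

Weighting by 1/SE² lets larger, more precise studies dominate the pooled estimate, and the pooled SE shrinks as studies are added.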
11.
PLoS Genet ; 16(9): e1009019, 2020 09.
Article in English | MEDLINE | ID: mdl-32915782

ABSTRACT

Loci identified in genome-wide association studies (GWAS) can include multiple distinct association signals. We sought to identify the molecular basis of multiple association signals for adiponectin, a hormone involved in glucose regulation secreted almost exclusively from adipose tissue, identified in the Metabolic Syndrome in Men (METSIM) study. With GWAS data for 9,262 men, four loci were significantly associated with adiponectin: ADIPOQ, CDH13, IRS1, and PBRM1. We performed stepwise conditional analyses to identify distinct association signals, a subset of which are also nearly independent (lead variant pairwise r² < 0.01). Two loci exhibited allelic heterogeneity, ADIPOQ and CDH13. Of seven association signals at the ADIPOQ locus, two signals colocalized with adipose tissue expression quantitative trait loci (eQTLs) for three transcripts: trait-increasing alleles at one signal were associated with increased ADIPOQ and LINC02043, while trait-increasing alleles at the other signal were associated with decreased ADIPOQ-AS1. In reporter assays, adiponectin-increasing alleles at two signals showed corresponding directions of effect on transcriptional activity. Putative mechanisms for the seven ADIPOQ signals include a missense variant (ADIPOQ G90S), a splice variant, a promoter variant, and four enhancer variants. Of two association signals at the CDH13 locus, the first signal consisted of promoter variants, including the lead adipose tissue eQTL variant for CDH13, while a second signal included a distal intron 1 enhancer variant that showed ~2-fold allelic differences in transcriptional reporter activity. Fine-mapping and experimental validation demonstrated that multiple, distinct association signals at these loci can influence multiple transcripts through multiple molecular mechanisms.


Subject(s)
Adiponectin/genetics , Adiponectin/metabolism , Adipose Tissue/metabolism , Alleles , Cadherins/genetics , Cadherins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Gene Frequency/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Insulin Receptor Substrate Proteins/genetics , Insulin Receptor Substrate Proteins/metabolism , Male , Metabolic Syndrome/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Regulatory Sequences, Nucleic Acid , Transcription Factors/genetics , Transcription Factors/metabolism
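The "nearly independent" criterion compares lead variants by pairwise LD r². A minimal sketch, assuming unphased genotype counts (0/1/2) for the same individuals and using the squared genotype correlation as the r² estimate, a common approximation; the function names are hypothetical:

```python
import numpy as np


def ld_r2(g1, g2):
    """Squared Pearson correlation between two variants' genotype vectors
    (0/1/2 per individual), a common unphased estimate of LD r2."""
    r = np.corrcoef(np.asarray(g1, float), np.asarray(g2, float))[0, 1]
    return float(r ** 2)


def nearly_independent(g1, g2, cutoff=0.01):
    """Pairwise r2 below the cutoff used in the abstract (0.01)."""
    return ld_r2(g1, g2) < cutoff


print(nearly_independent([0, 0, 2, 2], [0, 2, 0, 2]))  # r2 = 0
```

Conditionally distinct signals need not pass this check; the r² filter picks out the subset whose lead variants are also nearly uncorrelated.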
12.
Kidney Int ; 102(3): 624-639, 2022 09.
Article in English | MEDLINE | ID: mdl-35716955

ABSTRACT

Estimated glomerular filtration rate (eGFR) reflects kidney function. Progressive eGFR decline can lead to kidney failure, necessitating dialysis or transplantation. Hundreds of loci from genome-wide association studies (GWAS) for eGFR help explain population cross-sectional variability. Because the contribution of these or other loci to eGFR decline remains largely unknown, we derived GWAS for annual eGFR decline and meta-analyzed 62 longitudinal studies with eGFR assessed twice over time in all 343,339 individuals and in high-risk groups. We also explored different covariate adjustments. We identified twelve genome-wide significant independent variants for eGFR decline, unadjusted or adjusted for baseline eGFR (11 novel, one known for this phenotype), including nine variants robustly associated across models. All loci for eGFR decline were known for cross-sectional eGFR and thus distinguished a subgroup of eGFR loci. Seven of the nine variants showed variant-by-age interaction on cross-sectional eGFR (in about 350,000 further individuals), which linked genetic associations for eGFR decline with the age dependency of cross-sectional genetic associations. Of clinical importance, genetic effects on eGFR decline were two- to four-fold greater in high-risk subgroups. Five variants were also associated with chronic kidney disease progression and mapped to genes with functional in-silico evidence (UMOD, SPATA7, GALNTL5, TPPP). An unfavorable versus favorable nine-variant genetic profile showed increased risk, with odds ratios of 1.35 for kidney failure (95% confidence interval 1.03-1.77) and 1.27 for acute kidney injury (95% confidence interval 1.08-1.50), in over 2000 cases each with matched controls. Thus, we provide a large data resource, genetic loci, and prioritized genes for kidney function decline, which help inform drug development pipelines and reveal important insights into the age dependency of kidney function genetics.


Subject(s)
N-Acetylgalactosaminyltransferases , Renal Insufficiency, Chronic , Renal Insufficiency , Cross-Sectional Studies , Genetic Loci , Genome-Wide Association Study , Glomerular Filtration Rate/genetics , Humans , Kidney , Longitudinal Studies , N-Acetylgalactosaminyltransferases/genetics , Renal Insufficiency/genetics
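The profile comparison is reported as odds ratios with 95% confidence intervals. The abstract does not state the estimation model, so the sketch below uses a simplified illustration: an unadjusted 2x2 table with a Woolf (log-scale Wald) interval. The counts are hypothetical and unrelated to the study's data:

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio with a Woolf (log-scale Wald) 95% confidence interval.

    a: cases with the unfavorable profile, b: controls with it,
    c: cases with the favorable profile, d: controls with it.
    All four counts must be positive.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


print(odds_ratio_ci(20, 10, 10, 20))  # hypothetical counts
```

The interval is symmetric on the log scale, which is why the reported bounds (e.g. 1.03-1.77 around 1.35) sit asymmetrically around the odds ratio.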
13.
Am J Hum Genet ; 105(4): 773-787, 2019 10 03.
Article in English | MEDLINE | ID: mdl-31564431

ABSTRACT

Genome-wide association studies (GWASs) have identified thousands of genetic loci associated with cardiometabolic traits including type 2 diabetes (T2D), lipid levels, body fat distribution, and adiposity, although most causal genes remain unknown. We used subcutaneous adipose tissue RNA-seq data from 434 Finnish men from the METSIM study to identify 9,687 primary and 2,785 secondary cis-expression quantitative trait loci (eQTL; <1 Mb from TSS, FDR < 1%). Compared to primary eQTL signals, secondary eQTL signals were located further from transcription start sites, had smaller effect sizes, and were less enriched in adipose tissue regulatory elements. Among 2,843 cardiometabolic GWAS signals, 262 colocalized by LD and conditional analysis with 318 transcripts as primary and conditionally distinct secondary cis-eQTLs, including some across ancestries. Of cardiometabolic traits examined for adipose tissue eQTL colocalizations, waist-hip ratio (WHR) and circulating lipid traits had the highest percentage of colocalized eQTLs (15% and 14%, respectively). Among alleles associated with increased cardiometabolic GWAS risk, approximately half (53%) were associated with decreased gene expression level. Mediation analyses of colocalized genes and cardiometabolic traits within the 434 individuals provided further evidence that gene expression influences variant-trait associations. These results identify hundreds of candidate genes that may act in adipose tissue to influence cardiometabolic traits.


Subject(s)
Adipose Tissue/metabolism , Diabetes Mellitus, Type 2/genetics , Gene Expression , Obesity/genetics , Alleles , Body Mass Index , Finland , Genome-Wide Association Study , Humans , Male , Quantitative Trait Loci , Waist-Hip Ratio
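The eQTL definition above combines a cis window (<1 Mb from the TSS) with an FDR threshold (1%). eQTL studies often control FDR with permutation-based gene-level procedures; the sketch below is a simplified stand-in that applies plain Benjamini-Hochberg to one gene's cis variants. Function names and inputs are hypothetical:

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvalues, float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downward.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def cis_eqtl_hits(variant_pos, tss, pvalues, window=1_000_000, fdr=0.01):
    """Flag variants within `window` bp of the TSS whose BH q-value,
    computed over the cis variants only, is below `fdr`."""
    pos = np.asarray(variant_pos)
    p = np.asarray(pvalues, float)
    cis = np.abs(pos - tss) < window
    hits = np.zeros(p.size, dtype=bool)
    if cis.any():
        hits[cis] = benjamini_hochberg(p[cis]) < fdr
    return hits


print(cis_eqtl_hits([100, 2_000_000, 500_000], 0, [1e-6, 1e-9, 0.5]))
```

The second variant has the smallest p-value but lies outside the cis window, so it is never tested; restricting the window before correction keeps the multiple-testing burden tied to the cis hypothesis.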
14.
Nature ; 536(7614): 41-47, 2016 08 04.
Article in English | MEDLINE | ID: mdl-27398621

ABSTRACT

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Alleles , DNA Mutational Analysis , Europe/ethnology , Exome , Genome-Wide Association Study , Genotyping Techniques , Humans , Sample Size
15.
Genet Epidemiol ; 44(1): 41-51, 2020 01.
Article in English | MEDLINE | ID: mdl-31520493

ABSTRACT

Individual sequencing studies often have limited sample sizes and therefore limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single-study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep-coverage (~82×) exome and low-coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta-analysis has similar power to joint analysis in deep-coverage sequence data but can be less powerful in low-coverage sequence data. Given similar data processing and quality control steps, we recommend single-study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep-coverage data.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Gene Frequency/genetics , Polymorphism, Single Nucleotide/genetics , Exome/genetics , Genotype , High-Throughput Nucleotide Sequencing/methods , Humans
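Nonreference concordance is not defined in the abstract. One common definition counts only sites where at least one of the two call sets is non-reference, so that the many shared homozygous-reference sites do not inflate agreement. A minimal sketch under that assumed definition; the function name and genotypes are hypothetical:

```python
def nonreference_concordance(calls, truth):
    """Fraction of matching genotypes among sites where at least one call
    set is non-reference. Genotypes are alt-allele counts (0/1/2);
    None marks a missing call, and sites missing in either set are skipped."""
    considered = matched = 0
    for c, t in zip(calls, truth):
        if c is None or t is None:
            continue
        if c == 0 and t == 0:
            continue  # shared homozygous-reference calls are excluded
        considered += 1
        matched += int(c == t)
    return matched / considered if considered else float("nan")


print(nonreference_concordance([0, 1, 2, 1, 0, None], [0, 1, 2, 0, 1, 2]))  # prints 0.5
```

For rare variants nearly every site is homozygous reference, so plain concordance would be close to 1 regardless of call quality; excluding those sites makes the metric sensitive to errors at the variant calls that matter.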
16.
Genet Epidemiol ; 44(6): 537-549, 2020 09.
Article in English | MEDLINE | ID: mdl-32519380

ABSTRACT

A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.


Subject(s)
Genome, Human, Genome-Wide Association Study, Whole Genome Sequencing, Cost-Benefit Analysis, Gene Frequency/genetics, Genome-Wide Association Study/economics, Genotype, Humans, Phenotype, Polymorphism, Single Nucleotide/genetics, Whole Genome Sequencing/economics
17.
Hum Mol Genet ; 28(24): 4161-4172, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31691812

Abstract

Integration of genome-wide association study (GWAS) signals with expression quantitative trait loci (eQTL) studies enables identification of candidate genes. However, evaluating whether nearby signals may share causal variants, termed colocalization, is complicated by allelic heterogeneity (different variants at the same locus affecting the same phenotype). We previously identified eQTL in subcutaneous adipose tissue from 770 participants in the Metabolic Syndrome in Men (METSIM) study and detected 15 eQTL signals that colocalized with GWAS signals for waist-hip ratio adjusted for body mass index (WHRadjBMI) from the Genetic Investigation of Anthropometric Traits consortium. Here, we reevaluated evidence of colocalization using two approaches, conditional analysis and the Bayesian test COLOC, and show that providing COLOC with approximate conditional summary statistics at multi-signal GWAS loci can reconcile disagreements in colocalization classification between the two tests. Next, we performed conditional analysis on the METSIM subcutaneous adipose tissue data to identify conditionally distinct, or secondary, eQTL signals. We used the two approaches to test for colocalization with WHRadjBMI GWAS signals and evaluated the differences in colocalization classification between the two tests. Through these analyses, we identified four GWAS signals that colocalized with secondary eQTL signals for FAM13A, SSR3, GRB14 and FMO1. Thus, at loci with multiple eQTL and/or GWAS signals, analyzing each signal independently enabled additional candidate genes to be identified.
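COLOC compares five hypotheses for two traits at one locus (H0: no association; H1 and H2: association with only one trait; H3: both traits, distinct causal variants; H4: both traits, one shared causal variant) using Wakefield approximate Bayes factors computed from per-SNP effect estimates and standard errors. The sketch below reimplements that calculation in plain Python for illustration only; it is not the coloc R package. The priors p1 = p2 = 1e-4 and p12 = 1e-5 mirror coloc's defaults, and the prior effect variance of 0.15² assumes effects on a standardized quantitative-trait scale. Like the original method, it assumes at most one causal variant per trait, which is why supplying approximate conditional summary statistics at multi-signal loci can change the classification.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_var: float) -> float:
    """Wakefield log approximate Bayes factor (association vs. null) for one SNP."""
    z = beta / se
    v = se * se
    r = prior_var / (prior_var + v)
    # log(1 - r) written as log(v) - log(prior_var + v) for numerical accuracy
    return 0.5 * (math.log(v) - math.log(prior_var + v) + r * z * z)


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coloc_posteriors(beta1: Sequence[float], se1: Sequence[float],
                     beta2: Sequence[float], se2: Sequence[float],
                     prior_var: float = 0.15 ** 2,
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> List[float]:
    """Posterior probabilities [H0, H1, H2, H3, H4] over the same ordered SNPs."""
    l1 = [log_abf(b, s, prior_var) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s, prior_var) for b, s in zip(beta2, se2)]
    n = len(l1)
    if n < 2 or n != len(l2):
        raise ValueError("need the same >= 2 SNPs for both traits")
    log_h = [
        0.0,                                   # H0: neither trait associated
        math.log(p1) + logsumexp(l1),          # H1: trait 1 only
        math.log(p2) + logsumexp(l2),          # H2: trait 2 only
        # H3: both traits, different causal SNPs (all ordered pairs i != j)
        math.log(p1) + math.log(p2) + logsumexp(
            [l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]),
        # H4: both traits, one shared causal SNP
        math.log(p12) + logsumexp([a + b for a, b in zip(l1, l2)]),
    ]
    total = logsumexp(log_h)
    return [math.exp(x - total) for x in log_h]
```

When both traits have their strongest signal (z = 10) at the same SNP, nearly all posterior mass goes to H4; moving one trait's peak to a different SNP shifts it to H3. H3 is summed directly over ordered pairs of distinct SNPs rather than by subtracting two large sums, which avoids cancellation when one Bayes factor dominates.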


Subject(s)
Adipose Tissue/physiology, Body Fat Distribution, Genome-Wide Association Study/methods, Metabolic Syndrome/genetics, Quantitative Trait Loci, Adult, Bayes Theorem, Body Mass Index, Female, Genetic Predisposition to Disease, Humans, Linkage Disequilibrium, Male, Phenotype, Polymorphism, Single Nucleotide, Subcutaneous Fat/metabolism, Waist-Hip Ratio/methods
18.
Am J Hum Genet ; 102(4): 620-635, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625024

Abstract

Genome-wide association studies (GWASs) and functional genomics approaches implicate enhancer disruption in islet dysfunction and type 2 diabetes (T2D) risk. We applied genetic fine-mapping and functional (epi)genomic approaches to a T2D- and proinsulin-associated 15q22.2 locus to identify a most likely causal variant, determine its direction of effect, and elucidate plausible target genes. Fine-mapping and conditional analyses of proinsulin levels of 8,635 non-diabetic individuals from the METSIM study support a single association signal represented by a cluster of 16 strongly associated (p < 10⁻¹⁷) variants in high linkage disequilibrium (r² > 0.8) with the GWAS index SNP rs7172432. These variants reside in an evolutionarily and functionally conserved islet and β cell stretch or super enhancer; the most strongly associated variant (rs7163757, p = 3 × 10⁻¹⁹) overlaps a conserved islet open chromatin site. DNA sequence containing the rs7163757 risk allele displayed 2-fold higher enhancer activity than the non-risk allele in reporter assays (p < 0.01) and was differentially bound by β cell nuclear extract proteins. Transcription factor NFAT specifically potentiated risk-allele enhancer activity and altered patterns of nuclear protein binding to the risk allele in vitro, suggesting that it could be a factor mediating risk-allele effects. Finally, the rs7163757 proinsulin-raising and T2D risk allele (C) was associated with increased expression of C2CD4B, and possibly C2CD4A, both of which were induced by inflammatory cytokines, in human islets. Together, these data suggest that rs7163757 contributes to genetic risk of islet dysfunction and T2D by increasing NFAT-mediated islet enhancer activity and modulating C2CD4B, and possibly C2CD4A, expression in (patho)physiologic states.
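The fine-mapping step above groups variants in high linkage disequilibrium (r² > 0.8) with the GWAS index SNP. A minimal sketch of that filter follows, assuming r² is estimated as the squared correlation of unphased allele counts across individuals, which only approximates haplotype-based r²; the function names, SNP identifiers, and genotypes are hypothetical.

```python
from typing import Dict, List, Sequence


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared correlation of allele counts; 0.0 if either SNP is monomorphic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # r^2 undefined; treated as "not in LD" for filtering
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_proxies(index_snp: str, genotypes: Dict[str, Sequence[int]],
               threshold: float = 0.8) -> List[str]:
    """SNPs whose r^2 with the index SNP exceeds the threshold, sorted by ID."""
    ref = genotypes[index_snp]
    return sorted(snp for snp, g in genotypes.items()
                  if snp != index_snp and genotype_r2(ref, g) > threshold)
```

With an index SNP coded `[0, 1, 2, 0, 1, 2]`, a SNP with identical counts and one with mirrored counts `[2, 1, 0, 2, 1, 0]` both have r² = 1 and are returned, while `[0, 0, 1, 1, 0, 0]` has r² = 0 and is not. Because r² ignores the sign of the correlation, proxies tagging the index allele on the opposite strand or allele coding are still captured.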


Subject(s)
Calcium-Binding Proteins/genetics, Conserved Sequence, Enhancer Elements, Genetic/genetics, Evolution, Molecular, Islets of Langerhans/pathology, Mutation/genetics, Nuclear Proteins/genetics, Transcription Factors/genetics, Aged, Alleles, Animals, Base Sequence, Calcium-Binding Proteins/metabolism, Cell Line, Chromatin/metabolism, Chromosomes, Human, Pair 15/genetics, Cytokines/metabolism, DNA, Intergenic/genetics, Humans, Inflammation Mediators/metabolism, Mice, Middle Aged, NFATC Transcription Factors/metabolism, Physical Chromosome Mapping, Polymorphism, Single Nucleotide/genetics, Proinsulin/metabolism, Rats, Risk Factors
19.
Proc Natl Acad Sci U S A ; 115(2): 379-384, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29279374

Abstract

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.


Subject(s)
Diabetes Mellitus, Type 2/genetics, Genetic Predisposition to Disease/genetics, Genetic Variation, Mexican Americans/genetics, Diabetes Mellitus, Type 2/ethnology, Diabetes Mellitus, Type 2/pathology, Family Health, Female, Gene Frequency, Genetic Predisposition to Disease/ethnology, Genome-Wide Association Study/methods, Genotype, Humans, Male, Pedigree, Phenotype, Quantitative Trait Loci/genetics, Whole Genome Sequencing/methods
20.
Hum Mol Genet ; 27(9): 1664-1674, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29481666

Abstract

Comprehensive metabolite profiling captures many highly heritable traits, including amino acid levels, which are potentially sensitive biomarkers for disease pathogenesis. To better understand the contribution of genetic variation to amino acid levels, we performed single-variant and gene-based tests of association between nine serum amino acids (alanine, glutamine, glycine, histidine, isoleucine, leucine, phenylalanine, tyrosine, and valine) and 16.6 million genotyped and imputed variants in 8,545 non-diabetic Finnish men from the METabolic Syndrome In Men (METSIM) study, with replication in the Northern Finland Birth Cohort (NFBC1966). We identified five novel loci associated with amino acid levels (P < 5×10⁻⁸): LOC157273/PPP1R3B with glycine (rs9987289, P = 2.3×10⁻²⁶); ZFHX3 (chr16:73326579, minor allele frequency (MAF) = 0.42%, P = 3.6×10⁻⁹), LIPC (rs10468017, P = 1.5×10⁻⁸), and WWOX (rs9937914, P = 3.8×10⁻⁸) with alanine; and TRIB1 with tyrosine (rs28601761, P = 8×10⁻⁹). Gene-based tests identified two novel genes harboring missense variants of MAF < 1% that show aggregate association with amino acid levels: PYCR1 with glycine (P_gene = 1.5×10⁻⁶) and BCAT2 with valine (P_gene = 7.4×10⁻⁷); neither gene was implicated by single-variant association tests. These findings are among the first applications of gene-based tests to identify new loci for amino acid levels. In addition to the seven novel gene associations, we identified five independent signals at established amino acid loci, including two rare-variant signals at GLDC (rs138640017, MAF = 0.95%, P_conditional = 5.8×10⁻⁴⁰) with glycine levels and HAL (rs141635447, MAF = 0.46%, P_conditional = 9.4×10⁻¹¹) with histidine levels. Examination of all single-variant association results in our data revealed a strong inverse relationship between effect size and MAF (P_trend < 0.001). These novel signals provide further insight into the molecular mechanisms of amino acid metabolism and, potentially, their perturbations in disease.
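The abstract does not name the gene-based tests it used. One common form is a burden test: collapse qualifying rare variants (here MAF < 1%, matching the abstract's threshold) into a per-individual count and regress the trait on that count. The sketch below is illustrative, not the study's pipeline; it assumes variant annotation (for example, restricting to missense variants) was done upstream, and it uses a normal approximation for the p-value, which is only reasonable in large samples. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]], mafs: Sequence[float],
                  maf_threshold: float = 0.01) -> List[int]:
    """Per-individual minor-allele count summed over variants with MAF < threshold.

    genotypes[i][j] is the minor-allele count (0, 1, 2) of individual i at variant j.
    """
    keep = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(row[j] for j in keep) for row in genotypes]


def burden_test(burden: Sequence[float],
                trait: Sequence[float]) -> Tuple[float, float, float]:
    """Simple linear regression of trait on burden; returns (beta, se, two-sided p)."""
    n = len(trait)
    mx = sum(burden) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in burden)
    if sxx == 0:
        raise ValueError("no variation in burden scores (no carriers?)")
    sxy = sum((x - mx) * (y - my) for x, y in zip(burden, trait))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(burden, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0, 0.0  # perfect fit: p-value degenerates to 0
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p
```

Collapsing variants trades single-variant resolution for power: a gene such as PYCR1 or BCAT2 can reach significance in aggregate even when no individual variant does, as the abstract reports. A burden test assumes most qualifying variants act in the same direction; variance-component tests relax that assumption.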


Subject(s)
Amino Acids/metabolism, Genome-Wide Association Study/methods, Finland, Gene Frequency/genetics, Genotype, Humans, Male, Middle Aged