Results 1 - 20 of 61
1.
Nat Commun ; 10(1): 5121, 2019 Nov 12.
Article in English | MEDLINE | ID: mdl-31719535

ABSTRACT

Both short and long sleep are associated with an adverse lipid profile, likely through different biological pathways. To elucidate the biology of sleep-associated adverse lipid profile, we conduct multi-ancestry genome-wide sleep-SNP interaction analyses on three lipid traits (HDL-c, LDL-c and triglycerides). In the total study sample (discovery + replication) of 126,926 individuals from 5 different ancestry groups, when considering either long or short total sleep time interactions in joint analyses, we identify 49 previously unreported lipid loci, and 10 additional previously unreported lipid loci in a restricted sample of European-ancestry cohorts. In addition, we identify new gene-sleep interactions for known lipid loci such as LPL and PCSK9. The previously unreported lipid loci have a modest explained variance in lipid levels: most notably, gene-short-sleep interactions explain 4.25% of the variance in triglyceride level. Collectively, these findings contribute to our understanding of the biological mechanisms involved in sleep-associated adverse lipid profiles.

2.
Nat Commun ; 10(1): 4788, 2019 Oct 21.
Article in English | MEDLINE | ID: mdl-31636271

ABSTRACT

Genetic studies of metabolites have identified thousands of variants, many of which are associated with downstream metabolic and obesogenic disorders. However, these studies have relied on univariate analyses, reducing power and limiting context-specific understanding. Here we aim to provide an integrated perspective of the genetic basis of metabolites by leveraging the Finnish Metabolic Syndrome In Men (METSIM) cohort, a unique genetic resource which contains metabolic measurements, mostly lipids, across distinct time points as well as information on statin usage. We increase effective sample size by an average of two-fold by applying the Covariates for Multi-phenotype Studies (CMS) approach, identifying 588 significant SNP-metabolite associations, including 228 new associations. Our analysis pinpoints a small number of master metabolic regulator genes, balancing the relative proportion of dozens of metabolite levels. We further identify associations to changes in metabolic levels across time as well as genetic interactions with statin at both the master metabolic regulator and genome-wide level.

3.
Bioinformatics ; 35(22): 4837-4839, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31173064

ABSTRACT

MOTIVATION: Multi-trait analyses using public summary statistics from genome-wide association studies (GWASs) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. Although methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect size. This is benign for univariate analyses where only variants with large effect size are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves on the existing imputation methods and reaches a precision suitable for multi-trait analyses. RESULTS: We fine-tuned parameters to obtain a very high accuracy imputation from summary statistics. We demonstrate this accuracy for variants of all effect sizes on real data of 28 GWAS. We implemented the resulting methodology in a Python package specifically designed to efficiently impute multiple GWAS in parallel. AVAILABILITY AND IMPLEMENTATION: The Python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss; its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Hum Mol Genet ; 2019 Apr 10.
Article in English | MEDLINE | ID: mdl-31127295

ABSTRACT

Elevated blood pressure (BP), a leading cause of global morbidity and mortality, is influenced by both genetic and lifestyle factors. Cigarette smoking is one such lifestyle factor. Across five ancestries, we performed a genome-wide gene-smoking interaction study of mean arterial pressure (MAP) and pulse pressure (PP) in 129 913 individuals in stage 1 and follow-up analysis in 480 178 additional individuals in stage 2. We report here 136 loci significantly associated with MAP and/or PP. Of these, 61 were previously published through main-effect analysis of BP traits, 37 were recently reported by us for systolic BP and/or diastolic BP through gene-smoking interaction analysis and 38 were newly identified (P < 5 × 10⁻⁸, false discovery rate < 0.05). We also identified nine new signals near known loci. Of the 136 loci, 8 showed significant interaction with smoking status. They include CSMD1, previously reported for insulin resistance and BP in spontaneously hypertensive rats. Many of the 38 new loci show biologic plausibility for a role in BP regulation. SLC26A7 encodes a chloride/bicarbonate exchanger expressed in the renal outer medullary collecting duct. AVPR1A is widely expressed, including in vascular smooth muscle cells, kidney, myocardium and brain. FHAD1 is a long non-coding RNA overexpressed in heart failure. TMEM51 was associated with contractile function in cardiomyocytes. CASP9 plays a central role in cardiomyocyte apoptosis. Thirty novel loci were identified only in African ancestry. Our findings highlight the value of multi-ancestry investigations, particularly in studies of interaction with lifestyle factors, where genomic and lifestyle differences may contribute to novel findings.

5.
Am J Ophthalmol ; 2019 May 20.
Article in English | MEDLINE | ID: mdl-31121135

ABSTRACT

PURPOSE: A genetic correlation is the proportion of phenotypic variance between traits that is shared on a genetic basis. Here we explore genetic correlations between diabetes- and glaucoma-related traits. DESIGN: Cross-sectional study. METHODS: We assembled genome-wide association study summary statistics from European-derived participants regarding diabetes-related traits like fasting blood sugar (FBS) and type 2 diabetes (T2D) and glaucoma-related traits (intraocular pressure (IOP), central corneal thickness (CCT), corneal hysteresis (CH), corneal resistance factor (CRF), cup-disc ratio (CDR), and primary open-angle glaucoma (POAG)). We included data from the National Eye Institute Glaucoma Human Genetics Collaboration Heritable Overall Operational Database, the UK Biobank and the International Glaucoma Genetics Consortium. We calculated genetic correlation (rg) between traits using linkage disequilibrium score regression. We also calculated genetic correlations between IOP, CCT and selected diabetes-related traits based on individual level phenotype data in two Northern European population-based samples using pedigree information and Sequential Oligogenic Linkage Analysis Routines (SOLAR). RESULTS: Overall, there was little rg between diabetes- and glaucoma-related traits. Specifically, we found a non-significant negative correlation between T2D and POAG (rg=-0.14; p=0.16). Using SOLAR, the genetic correlations between measured IOP, CCT, FBS, fasting insulin and hemoglobin A1c, were null. In contrast, genetic correlations between IOP and POAG (rg ≥0.45; p≤3.0E-04) and between CDR and POAG were high (rg =0.57; p=2.8E-10). However, genetic correlations between corneal properties (CCT, CRF and CH) and POAG were low (rg range: -0.18 - 0.11) and non-significant (p≥0.07). CONCLUSION: These analyses suggest there is limited genetic correlation between diabetes- and glaucoma-related traits.

6.
PLoS Genet ; 15(3): e1008018, 2019 03.
Article in English | MEDLINE | ID: mdl-30849075

ABSTRACT

Several bacteria in the gut microbiota have been shown to be associated with inflammatory bowel disease (IBD), and dozens of IBD genetic variants have been identified in genome-wide association studies. However, the role of the microbiota in the etiology of IBD in terms of host genetic susceptibility remains unclear. Here, we studied the association between four major genetic variants associated with an increased risk of IBD and bacterial taxa in up to 633 IBD cases. We performed systematic screening for associations, identifying and replicating associations between NOD2 variants and two taxa: the Roseburia genus and the Faecalibacterium prausnitzii species. By exploring the overall association patterns between genes and bacteria, we found that IBD risk alleles were significantly enriched for associations concordant with bacteria-IBD associations. To understand the significance of this pattern in terms of the study design and known effects from the literature, we used counterfactual principles to assess the fitness of a few parsimonious gene-bacteria-IBD causal models. Our analyses showed evidence that the disease risk of these genetic variants was likely to be partially mediated by the microbiome. We confirmed these results in extensive simulation studies and sensitivity analyses using the association between NOD2 and F. prausnitzii as a case study.


Subjects
Gastrointestinal Microbiome/genetics, Host Microbial Interactions/genetics, Inflammatory Bowel Diseases/genetics, Inflammatory Bowel Diseases/microbiology, Adult, CARD Signaling Adaptor Proteins/genetics, Clostridiales/genetics, Clostridiales/isolation & purification, Clostridiales/pathogenicity, Faecalibacterium prausnitzii/genetics, Faecalibacterium prausnitzii/isolation & purification, Faecalibacterium prausnitzii/pathogenicity, Female, Genetic Association Studies, Genetic Predisposition to Disease, Genetic Variation, Humans, Inflammatory Bowel Diseases/etiology, Male, Middle Aged, Genetic Models, Nod2 Signaling Adaptor Protein/genetics, Single Nucleotide Polymorphism
7.
Nat Genet ; 51(4): 636-648, 2019 04.
Article in English | MEDLINE | ID: mdl-30926973

ABSTRACT

The concentrations of high- and low-density-lipoprotein cholesterol and triglycerides are influenced by smoking, but it is unknown whether genetic associations with lipids may be modified by smoking. We conducted a multi-ancestry genome-wide gene-smoking interaction study in 133,805 individuals with follow-up in an additional 253,467 individuals. Combined meta-analyses identified 13 new loci associated with lipids, some of which were detected only because association differed by smoking status. Additionally, we demonstrate the importance of including diverse populations, particularly in studies of interactions with lifestyle factors, where genomic and lifestyle differences by ancestry may contribute to novel findings.


Subjects
Lipids/blood, Lipids/genetics, Smoking/blood, Smoking/genetics, Adolescent, Adult, Aged, Aged 80 and Over, Female, Genome-Wide Association Study/methods, Genotype, Humans, Life Style, Linkage Disequilibrium/genetics, Male, Middle Aged, Young Adult
8.
Genetics ; 212(1): 65-74, 2019 05.
Article in English | MEDLINE | ID: mdl-30808621

ABSTRACT

Polygenic Risk Scores (PRS) combine genotype information across many single-nucleotide polymorphisms (SNPs) to give a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify high-genetic risk individuals for a given disease. The "Clumping+Thresholding" (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association studies (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS as compared to C+T. In this paper, we present an efficient method for the joint estimation of SNP effects using individual-level data, allowing for practical application of penalized logistic regression (PLR) on modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. We also provide an implementation of penalized linear regression for quantitative traits. We compare the performance of PLR, C+T and a derivation of random forests using both real and simulated data. Overall, we find that PLR achieves equal or higher predictive performance than C+T in most scenarios considered, while being scalable to biobank data. In particular, we find that improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs; for instance, in simulations, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR. We confirm these results in a data analysis of a case-control study for celiac disease where PLR and the standard C+T method achieve AUC values of 89% and of 82.5%. 
Applying penalized linear regression to 350,000 individuals of the UK Biobank, we predict height with a larger correlation than with the best prediction of C+T (∼65% instead of ∼55%), further demonstrating its scalability and strong predictive power, even for highly polygenic traits. Moreover, using 150,000 individuals of the UK Biobank, we are able to predict breast cancer better than C+T, fitting PLR in a few minutes only. In conclusion, this paper demonstrates the feasibility and relevance of using penalized regression for PRS computation when large individual-level datasets are available, thanks to the efficient implementation available in our R package bigstatsr.
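
As a rough illustration of the thresholding half of "Clumping+Thresholding", the sketch below keeps SNPs whose GWAS p-value falls under a cutoff and sums their effect-weighted allele dosages. It is a minimal sketch, not the bigstatsr implementation: the clumping (LD pruning) step is omitted, and the SNP identifiers, effect sizes, and the helper names `threshold_snps` and `polygenic_score` are hypothetical.

```python
def threshold_snps(summary_stats, p_max):
    """Keep the effect size of every SNP whose GWAS p-value is below p_max."""
    return {snp: beta for snp, (beta, p) in summary_stats.items() if p < p_max}


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) over the retained SNPs.

    A SNP missing from `dosages` is counted as 0, which is a simplification.
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in weights.items())


# Hypothetical summary statistics: SNP id -> (effect size, p-value).
summary_stats = {"rsA": (0.20, 1e-9), "rsB": (-0.10, 3e-4), "rsC": (0.05, 0.40)}
weights = threshold_snps(summary_stats, p_max=1e-3)

# Hypothetical individual: SNP id -> number of effect alleles carried.
person = {"rsA": 2, "rsB": 1, "rsC": 0}
score = polygenic_score(person, weights)
```

A penalized regression, by contrast, estimates all SNP weights jointly from individual-level data rather than taking them one at a time from univariate summary statistics.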


Subjects
Algorithms, Genetic Predisposition to Disease, Genome-Wide Association Study/methods, Genetic Models, Multifactorial Inheritance, Single Nucleotide Polymorphism, Celiac Disease/genetics, Female, Humans, Male
9.
Nat Commun ; 10(1): 376, 2019 01 22.
Article in English | MEDLINE | ID: mdl-30670697

ABSTRACT

Many genetic loci affect circulating lipid levels, but it remains unknown whether lifestyle factors, such as physical activity, modify these genetic effects. To identify lipid loci interacting with physical activity, we performed genome-wide analyses of circulating HDL cholesterol, LDL cholesterol, and triglyceride levels in up to 120,979 individuals of European, African, Asian, Hispanic, and Brazilian ancestry, with follow-up of suggestive associations in an additional 131,012 individuals. We find four loci, in/near CLASP1, LHX1, SNTA1, and CNTNAP2, that are associated with circulating lipid levels through interaction with physical activity; higher levels of physical activity enhance the HDL cholesterol-increasing effects of the CLASP1, LHX1, and SNTA1 loci and attenuate the LDL cholesterol-increasing effect of the CNTNAP2 locus. The CLASP1, LHX1, and SNTA1 regions harbor genes linked to muscle function and lipid metabolism. Our results elucidate the role of physical activity interactions in the genetic contribution to blood lipid levels.


Subjects
Exercise, Genetic Loci/genetics, Lipids/blood, Lipids/genetics, Adolescent, Adult, African Continental Ancestry Group/genetics, Aged, Aged 80 and Over, Asian Continental Ancestry Group/genetics, Brazil, Calcium-Binding Proteins/genetics, Cholesterol/blood, HDL Cholesterol/blood, HDL Cholesterol/genetics, LDL Cholesterol/blood, LDL Cholesterol/genetics, European Continental Ancestry Group/genetics, Female, Genome-Wide Association Study, Genotype, Hispanic Americans/genetics, Humans, LIM-Homeodomain Proteins/genetics, Lipid Metabolism/genetics, Male, Membrane Proteins/genetics, Microtubule-Associated Proteins/genetics, Middle Aged, Muscle Proteins/genetics, Nerve Tissue Proteins/genetics, Transcription Factors/genetics, Triglycerides/blood, Triglycerides/genetics, Young Adult
10.
Am J Epidemiol ; 188(6): 1033-1054, 2019 Jun 01.
Article in English | MEDLINE | ID: mdl-30698716

ABSTRACT

A person's lipid profile is influenced by genetic variants and alcohol consumption, but the contribution of interactions between these exposures has not been studied. We therefore incorporated gene-alcohol interactions into a multiancestry genome-wide association study of levels of high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides. We included 45 studies in stage 1 (genome-wide discovery) and 66 studies in stage 2 (focused follow-up), for a total of 394,584 individuals from 5 ancestry groups. Analyses covered the period July 2014-November 2017. Genetic main effects and interaction effects were jointly assessed by means of a 2-degrees-of-freedom (df) test, and a 1-df test was used to assess the interaction effects alone. Variants at 495 loci were at least suggestively associated (P < 1 × 10⁻⁶) with lipid levels in stage 1 and were evaluated in stage 2, followed by combined analyses of stage 1 and stage 2. In the combined analysis of stages 1 and 2, a total of 147 independent loci were associated with lipid levels at P < 5 × 10⁻⁸ using 2-df tests, of which 18 were novel. No genome-wide-significant associations were found testing the interaction effect alone. The novel loci included several genes (proprotein convertase subtilisin/kexin type 5 (PCSK5), vascular endothelial growth factor B (VEGFB), and apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1 (APOBEC1) complementation factor (A1CF)) that have a putative role in lipid metabolism on the basis of existing evidence from cellular and experimental models.
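
The difference between the 1-df interaction test and the 2-df joint test can be sketched with Wald statistics. This is a minimal sketch under the assumption that both tests take the Wald form and that the main-effect estimate, the interaction estimate, and their covariance are already available from a fitted regression; the abstract does not state which estimator or meta-analysis procedure the study used, and the function names are hypothetical.

```python
import math


def wald_1df(beta_int, se_int):
    """1-df Wald test of the interaction effect alone."""
    stat = (beta_int / se_int) ** 2
    # Survival function of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def wald_2df(beta_g, beta_int, var_g, var_int, cov):
    """2-df Wald test of the main and interaction effects jointly.

    Computes b' S^-1 b for b = (beta_g, beta_int) and the 2x2 covariance
    matrix S = [[var_g, cov], [cov, var_int]].
    """
    det = var_g * var_int - cov * cov
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    stat = (beta_g ** 2 * var_int
            - 2.0 * beta_g * beta_int * cov
            + beta_int ** 2 * var_g) / det
    # Survival function of a chi-square with 2 df.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

A variant with a strong main effect and no interaction can pass the 2-df test while failing the 1-df test, which is consistent with loci being detected jointly even though no interaction effect alone reached genome-wide significance.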

11.
Genetics ; 211(4): 1179-1189, 2019 04.
Article in English | MEDLINE | ID: mdl-30692194

ABSTRACT

High-throughput measurements of molecular phenotypes provide an unprecedented opportunity to model cellular processes and their impact on disease. These highly structured datasets are usually strongly confounded, creating false positives and reducing power. This has motivated many approaches based on principal components analysis (PCA) to estimate and correct for confounders, which have become indispensable elements of association tests between molecular phenotypes and both genetic and nongenetic factors. Here, we show that these correction approaches induce a bias, and that it persists for large sample sizes and replicates out-of-sample. We prove this theoretically for PCA by deriving an analytic, deterministic, and intuitive bias approximation. We assess other methods with realistic simulations, which show that perturbing any of several basic parameters can cause false positive rate (FPR) inflation. Our experiments show the bias depends on covariate and confounder sparsity, effect sizes, and their correlation. Surprisingly, when the covariate and confounder have [Formula: see text], standard two-step methods all have [Formula: see text]-fold FPR inflation. Our analysis informs best practices for confounder correction in genomic studies, and suggests many false discoveries have been made and replicated in some differential expression analyses.


Subjects
Genome-Wide Association Study/methods, Phenotype, Principal Component Analysis/methods, Animals, Genome-Wide Association Study/standards, Humans, Genetic Models, Principal Component Analysis/standards, Quantitative Trait Loci, Reproducibility of Results
12.
Genetics ; 211(2): 483-494, 2019 02.
Article in English | MEDLINE | ID: mdl-30578273

ABSTRACT

With growing human genetic and epidemiologic data, there has been increased interest in the study of gene-by-environment (G-E) interaction effects. Still, major questions remain on how to test jointly a large number of interactions between multiple SNPs and multiple exposures. In this study, we first compared the relative performance of four fixed-effect joint analysis approaches using simulated data, considering up to 10 exposures and 300 SNPs: (1) omnibus test, (2) multi-exposure and genetic risk score (GRS) test, (3) multi-SNP and environmental risk score (ERS) test, and (4) GRS-ERS test. Our simulations explored both linear and logistic regression while considering three statistics: the Wald test, the Score test, and the likelihood ratio test (LRT). We further applied the approaches to three large sets of human cohort data (n = 37,664), focusing on type 2 diabetes (T2D), obesity, hypertension, and coronary heart disease with smoking, physical activity, diets, and total energy intake. Overall, GRS-based approaches were the most robust, and had the highest power, especially when the G-E interaction effects were correlated with the marginal genetic and environmental effects. We also observed severe miscalibration of joint statistics in logistic models when the number of events per variable was too low with either the Wald test or the LRT. Finally, our real data application detected nominally significant interaction effects for three outcomes (T2D, obesity, and hypertension), mainly from the GRS-ERS approach. In conclusion, this study provides guidelines for testing multiple interaction parameters in modern human cohorts including extensive genetic and environmental data.


Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study/methods, Genetic Models, Algorithms, Genome-Wide Association Study/standards, Humans, Single Nucleotide Polymorphism
13.
Genome Biol ; 19(1): 222, 2018 12 18.
Article in English | MEDLINE | ID: mdl-30563547

ABSTRACT

BACKGROUND: DNA methylation is influenced by both environmental and genetic factors and is increasingly thought to affect variation in complex traits and diseases. Yet, the extent of ancestry-related differences in DNA methylation, their genetic determinants, and their respective causal impact on immune gene regulation remain elusive. RESULTS: We report extensive population differences in DNA methylation between 156 individuals of African and European descent, detected in primary monocytes that are used as a model of a major innate immunity cell type. Most of these differences (~70%) are driven by DNA sequence variants near CpG sites, which account for ~60% of the variance in DNA methylation. We also identify several master regulators of DNA methylation variation in trans, including a regulatory hub near the transcription factor-encoding CTCF gene, which contributes markedly to ancestry-related differences in DNA methylation. Furthermore, we establish that variation in DNA methylation is associated with varying gene expression levels following mostly, but not exclusively, a canonical model of negative associations, particularly in enhancer regions. Specifically, we find that DNA methylation highly correlates with transcriptional activity of 811 and 230 genes, at the basal state and upon immune stimulation, respectively. Finally, using a Bayesian approach, we estimate causal mediation effects of DNA methylation on gene expression in ~20% of the studied cases, indicating that DNA methylation can play an active role in immune gene regulation. CONCLUSION: Using a system-level approach, our study reveals substantial ancestry-related differences in DNA methylation and provides evidence for their causal impact on immune gene regulation.


Subjects
African Continental Ancestry Group/genetics, DNA Methylation, European Continental Ancestry Group/genetics, Gene Expression Regulation, Innate Immunity, Adult, Genetic Epigenesis, Humans, Male, Monocytes, Quantitative Trait Loci
14.
PLoS One ; 13(6): e0198166, 2018.
Article in English | MEDLINE | ID: mdl-29912962

ABSTRACT

Heavy alcohol consumption is an established risk factor for hypertension; however, the mechanism by which alcohol consumption impacts blood pressure (BP) regulation remains unknown. We hypothesized that a genome-wide association study accounting for gene-alcohol consumption interaction for BP might identify additional BP loci and contribute to the understanding of alcohol-related BP regulation. We conducted a large two-stage investigation incorporating joint testing of main genetic effects and single nucleotide variant (SNV)-alcohol consumption interactions. In Stage 1, genome-wide discovery meta-analyses in ≈131K individuals across several ancestry groups yielded 3,514 SNVs (245 loci) with suggestive evidence of association (P < 1.0 × 10⁻⁵). In Stage 2, these SNVs were tested for independent external replication in ≈440K individuals across multiple ancestries. We identified and replicated (at Bonferroni correction threshold) five novel BP loci (380 SNVs in 21 genes) and 49 previously reported BP loci (2,159 SNVs in 109 genes) in European ancestry, and in multi-ancestry meta-analyses (P < 5.0 × 10⁻⁸). For African ancestry samples, we detected 18 potentially novel BP loci (P < 5.0 × 10⁻⁸) in Stage 1 that warrant further replication. Additionally, correlated meta-analysis identified eight novel BP loci (11 genes). Several genes in these loci (e.g., PINX1, GATA4, BLK, FTO and GABBR2) have been previously reported to be associated with alcohol consumption. These findings provide insights into the role of alcohol consumption in the genetic architecture of hypertension.

15.
Bioinformatics ; 34(19): 3412-3414, 2018 Oct 01.
Article in English | MEDLINE | ID: mdl-29726908

ABSTRACT

Summary: Many genome-wide association studies and genome-wide screening for gene-environment (GxE) interactions have been performed to elucidate the underlying mechanisms of human traits and diseases. When the analyzed outcome is quantitative, the overall contribution of identified genetic variants to the outcome is often expressed as the percentage of phenotypic variance explained. This is commonly done using individual-level genotype data but it is challenging when results are derived through meta-analyses. Here, we present an R package, 'VarExp', that allows for the estimation of the percentage of phenotypic variance explained using summary statistics only. It allows for a range of models to be evaluated, including marginal genetic effects, GxE interaction effects and both effects jointly. Its implementation integrates all recent methodological developments and does not need external data to be uploaded by users. Availability and implementation: The R package is available at https://gitlab.pasteur.fr/statistical-genetics/VarExp.git. Supplementary information: Supplementary data are available at Bioinformatics online.
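
For intuition only, the simplest version of "variance explained from summary statistics" can be written down directly. The sketch below assumes independent SNPs (no linkage disequilibrium), additive 0/1/2 genotype coding under Hardy-Weinberg equilibrium, marginal effects only, and a known outcome variance. It is not the VarExp API and does not cover the correlated-variant or GxE cases the package addresses; the function name and inputs are hypothetical.

```python
def percent_variance_explained(snps, outcome_variance):
    """Percentage of outcome variance explained by independent SNPs.

    `snps` is an iterable of (effect_allele_frequency, effect_size) pairs.
    Under Hardy-Weinberg equilibrium a 0/1/2 genotype has variance
    2 * f * (1 - f), so each SNP contributes 2 * f * (1 - f) * beta**2.
    """
    if outcome_variance <= 0:
        raise ValueError("outcome_variance must be positive")
    genetic_variance = sum(2.0 * f * (1.0 - f) * beta ** 2 for f, beta in snps)
    return 100.0 * genetic_variance / outcome_variance
```

For example, one SNP with allele frequency 0.5 and effect 0.1 on a unit-variance outcome contributes 2 × 0.5 × 0.5 × 0.01 = 0.005 of the variance, that is 0.5%.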

16.
Bioinformatics ; 34(16): 2781-2787, 2018 Aug 15.
Article in English | MEDLINE | ID: mdl-29617937

ABSTRACT

Motivation: Genome-wide datasets produced for association studies have dramatically increased in size over the past few years, with modern datasets commonly including millions of variants measured in tens of thousands of individuals. This increase in data size is a major challenge severely slowing down genomic analyses, leading to some software becoming obsolete and researchers having limited access to diverse analysis tools. Results: Here we present two R packages, bigstatsr and bigsnpr, allowing for the analysis of large-scale genomic data to be performed within R. To address large data size, the packages use memory-mapping for accessing data matrices stored on disk instead of in RAM. To perform data pre-processing and data analysis, the packages integrate most of the tools that are commonly used, either through transparent system calls to existing software, or through updated or improved implementation of existing methods. In particular, the packages implement fast and accurate computations of principal component analysis and association studies, functions to remove single nucleotide polymorphisms in linkage disequilibrium and algorithms to learn polygenic risk scores on millions of single nucleotide polymorphisms. We illustrate applications of the two R packages by analyzing a case-control genomic dataset for celiac disease, performing an association study and computing polygenic risk scores. Finally, we demonstrate the scalability of the R packages by analyzing a simulated genome-wide dataset including 500 000 individuals and 1 million markers on a single desktop computer. Availability and implementation: https://privefl.github.io/bigstatsr/ and https://privefl.github.io/bigsnpr/. Supplementary information: Supplementary data are available at Bioinformatics online.
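
The memory-mapping idea can be illustrated outside R with Python's standard `mmap` module. This is a sketch of the general technique, not of how bigstatsr stores its matrices: the one-byte-per-genotype, row-major file layout and the helper names `write_matrix` and `read_row` are assumptions made for the example.

```python
import mmap
import os
import tempfile


def write_matrix(path, rows):
    """Store a genotype matrix on disk, one byte per entry, row by row."""
    with open(path, "wb") as handle:
        for row in rows:
            handle.write(bytes(row))


def read_row(path, n_cols, row_index):
    """Read one row through a memory map without loading the whole file."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = row_index * n_cols
            return list(mapped[start:start + n_cols])


# Toy matrix: 2 individuals x 3 SNPs, coded as allele counts.
with tempfile.TemporaryDirectory() as tmp_dir:
    matrix_path = os.path.join(tmp_dir, "genotypes.bin")
    write_matrix(matrix_path, [[0, 1, 2], [2, 2, 0]])
    second_row = read_row(matrix_path, n_cols=3, row_index=1)
```

Only the pages backing the requested slice need to be brought into memory by the operating system, which is what lets a matrix larger than RAM be analysed on a single desktop computer.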

17.
BMC Bioinformatics ; 19(1): 68, 2018 02 27.
Article in English | MEDLINE | ID: mdl-29486711

ABSTRACT

BACKGROUND: Quantitative trait locus (QTL) mapping in genetic data often involves analysis of correlated observations, which need to be accounted for to avoid false association signals. This is commonly performed by modeling such correlations as random effects in linear mixed models (LMMs). The R package lme4 is a well-established tool that implements major LMM features using sparse matrix methods; however, it is not fully adapted for QTL mapping association and linkage studies. In particular, two LMM features are lacking in the base version of lme4: the definition of random effects by custom covariance matrices; and parameter constraints, which are essential in advanced QTL models. Apart from applications in linkage studies of related individuals, such functionalities are of high interest for association studies in situations where multiple covariance matrices need to be modeled, a scenario not covered by many genome-wide association study (GWAS) software packages. RESULTS: To address the aforementioned limitations, we developed a new R package lme4qtl as an extension of lme4. First, lme4qtl contributes new models for genetic studies within a single tool integrated with lme4 and its companion packages. Second, lme4qtl offers a flexible framework for scenarios with multiple levels of relatedness and becomes efficient when covariance matrices are sparse. We showed the value of our package using real family-based data in the Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) project. CONCLUSIONS: Our software lme4qtl enables QTL mapping models with a versatile structure of random effects and efficient computation for sparse covariances. lme4qtl is available at https://github.com/variani/lme4qtl.


Subjects
Genome-Wide Association Study, Genetic Models, Software, Humans, Linear Models, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Thrombophilia/genetics
18.
Genet Epidemiol ; 42(3): 250-264, 2018 04.
Article in English | MEDLINE | ID: mdl-29424028

ABSTRACT

The identification of gene-environment interactions in relation to risk of human diseases has been challenging. One difficulty has been that measurement error in the exposure can lead to massive reductions in the power of the test, as well as bias toward the null in the interaction effect estimates. Leveraging previous work on linear discriminant analysis, we develop a new test of interaction between genetic variants and a continuous exposure that mitigates these detrimental impacts of exposure measurement error in ExG testing by reversing the role of exposure and the disease status in the fitted model, thus transforming the analysis to standard linear regression. Through simulation studies, we show that the proposed approach is valid in the presence of classical exposure measurement error as well as when there is correlation between the exposure and the genetic variant. Simulations also demonstrated that the reverse test has greater power compared to logistic regression. Finally, we confirmed that our approach eliminates bias from exposure measurement error in estimation. Computing times are reduced by as much as fivefold in this new approach. For illustrative purposes, we applied the new approach to an ExGWAS study of interactions with alcohol and body mass index among 1,145 cases with invasive breast cancer and 1,142 controls from the Cancer Genetic Markers of Susceptibility study.


Subjects
Gene-Environment Interaction , Genetic Models , Bias , Breast Neoplasms/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Logistic Models , Reproducibility of Results
19.
Invest Ophthalmol Vis Sci ; 59(2): 629-636, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29392307

ABSTRACT

Purpose: Sex hormones may be associated with primary open-angle glaucoma (POAG), although the mechanisms are unclear. We previously observed that gene variants involved with estrogen metabolism were collectively associated with POAG in women but not men; here we assessed the collective association between gene variants related to testosterone metabolism and POAG risk. Methods: We used two datasets: one from the United States (3853 cases and 33,480 controls) and another from Australia (1155 cases and 1992 controls). Both datasets contained densely called genotypes imputed to the 1000 Genomes reference panel. We used pathway- and gene-based approaches with Pathway Analysis by Randomization Incorporating Structure (PARIS) software to assess the overall association between a panel of single nucleotide polymorphisms (SNPs) in testosterone metabolism genes and POAG. In sex-stratified analyses, we evaluated POAG overall and POAG subtypes defined by maximum intraocular pressure (IOP): high-tension glaucoma (HTG) or normal-tension glaucoma (NTG). Results: In the US dataset, the SNP panel was not associated with POAG (permuted P = 0.77), although there was an association in the Australian sample (permuted P = 0.018). In both datasets, the SNP panel was associated with POAG in men (permuted P ≤ 0.033) and not women (permuted P ≥ 0.42), but in gene-based analyses, there was no consistency in the main genes responsible for these findings. In both datasets, the testosterone pathway association with HTG was significant (permuted P ≤ 0.011), but again, gene-based analyses showed no consistent driver gene associations. Conclusions: Collectively, testosterone metabolism pathway SNPs were consistently associated with the high-tension subtype of POAG in two datasets.


Subjects
Open-Angle Glaucoma/genetics , Metabolic Networks and Pathways/genetics , Single Nucleotide Polymorphism , Testosterone/metabolism , Datasets as Topic , Female , Gene Frequency , Genome-Wide Association Study , Genotype , Humans , Intraocular Pressure/physiology , Low Tension Glaucoma/genetics , Male , Middle Aged
20.
Am J Hum Genet ; 102(3): 375-400, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29455858

ABSTRACT

Genome-wide association analysis has advanced understanding of blood pressure (BP), a major risk factor for vascular conditions such as coronary heart disease and stroke. Accounting for smoking behavior may help identify BP loci and extend our knowledge of its genetic architecture. We performed genome-wide association meta-analyses of systolic and diastolic BP incorporating gene-smoking interactions in 610,091 individuals. Stage 1 analysis examined ~18.8 million SNPs and small insertion/deletion variants in 129,913 individuals from four ancestries (European, African, Asian, and Hispanic) with follow-up analysis of promising variants in 480,178 additional individuals from five ancestries. We identified 15 loci that were genome-wide significant (p < 5 × 10⁻⁸) in stage 1 and formally replicated in stage 2. A combined stage 1 and 2 meta-analysis identified 66 additional genome-wide significant loci (13, 35, and 18 loci in European, African, and trans-ancestry, respectively). A total of 56 known BP loci were also identified by our results (p < 5 × 10⁻⁸). Of the newly identified loci, ten showed significant interaction with smoking status, but none of them were replicated in stage 2. Several loci were identified in African ancestry, highlighting the importance of genetic studies in diverse populations. The identified loci show strong evidence for regulatory features and support shared pathophysiology with cardiometabolic and addiction traits. They also highlight a role in BP regulation for biological candidates such as modulators of vascular structure and function (CDKN1B, BCAR1-CFDP1, PXDN, EEA1), ciliopathies (SDCCAG8, RPGRIP1L), telomere maintenance (TNKS, PINX1, AKTIP), and central dopaminergic signaling (MSRA, EBF2).
