ABSTRACT
SUMMARY: flashfm-ivis provides a suite of interactive visualization plots for viewing potential causal genetic variants that underlie associations shared or distinct between multiple quantitative traits, and for comparing results between single- and multi-trait fine-mapping. Unique features include network diagrams that show joint effects between variants for each trait and regional association plots that integrate fine-mapping results, all with user-controlled zoom features for interactive exploration of potential causal variants across traits. AVAILABILITY AND IMPLEMENTATION: flashfm-ivis is open-source software released under the MIT license. It is available as an interactive web-based tool (http://shiny.mrc-bsu.cam.ac.uk/apps/flashfm-ivis/) and as an R package. Code and documentation are available at https://github.com/fz-cambridge/flashfm-ivis and https://zenodo.org/record/6376244#.YjnarC-l2X0. Additional features can be downloaded as standalone R libraries to encourage reuse. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
Subjects
Data Visualization , Software
ABSTRACT
Whilst thousands of genetic variants have been associated with human traits, identifying the subset of those variants that are causal requires a further 'fine-mapping' step. We review the basic fine-mapping approach, which is computationally fast and requires only summary data, but depends on an assumption of a single causal variant per associated region which is recognized as biologically unrealistic. We discuss different ways that the approach has been built upon to accommodate multiple causal variants in a region and to incorporate additional layers of functional annotation data. We further review methods for simultaneous fine-mapping of multiple datasets, either exploiting different linkage disequilibrium (LD) structures across ancestries or borrowing information between distinct but related traits. Finally, we look to the future and the opportunities that will be offered by increasingly accurate maps of causal variants for a multitude of human traits.
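As a minimal illustration of the basic single-causal-variant approach described above, the Python sketch below converts per-variant summary statistics into posterior probabilities. It assumes Wakefield's approximate Bayes factor, an equal prior for every variant, and exactly one causal variant in the region. The function names, the prior standard deviation of 0.15 and the example effect sizes are illustrative assumptions, not values taken from the review.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor (association vs. null) from summary data.

    Assumes the effect estimate is normal with variance se**2 and the true
    effect has a N(0, prior_sd**2) prior (Wakefield-style approximation).
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    shrink = w / (v + w)
    return 0.5 * (math.log(1.0 - shrink) + shrink * z * z)


def single_causal_posteriors(betas, ses, prior_sd=0.15):
    """Posterior probability that each variant is the one causal variant.

    With equal priors and exactly one causal variant assumed, the posterior
    for a variant is its Bayes factor divided by the sum over all variants.
    """
    log_bfs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(log_bfs)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical region with three variants (effect estimates and standard errors).
example_pp = single_causal_posteriors([0.02, 0.10, 0.09], [0.02, 0.02, 0.02])
print(example_pp)
```

Only effect estimates and standard errors are needed, which is why this basic approach is fast and works from summary data. The one-causal-variant assumption is what the extensions discussed in the review relax.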
Subjects
Chromosome Mapping/methods , Disease/genetics , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Genome, Human , Humans , Linkage Disequilibrium , Models, Genetic
ABSTRACT
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Subjects
Genetic Variation/genetics , Genetics, Medical/trends , Genome, Human/genetics , Genomics/trends , Africa , Africa South of the Sahara , Asia/ethnology , Europe/ethnology , Humans , Risk Factors , Selection, Genetic/genetics
ABSTRACT
To gain insight into potential regulatory mechanisms through which the effects of variants at four established type 2 diabetes (T2D) susceptibility loci (CDKAL1, CDKN2A-B, IGF2BP2 and KCNQ1) are mediated, we undertook transancestral fine-mapping in 22 086 cases and 42 539 controls of East Asian, European, South Asian, African American and Mexican American descent. Through high-density imputation and conditional analyses, we identified seven distinct association signals at these four loci, each with allelic effects on T2D susceptibility that were homogenous across ancestry groups. By leveraging differences in the structure of linkage disequilibrium between diverse populations, and increased sample size, we localised the variants most likely to drive each distinct association signal. We demonstrated that integration of these genetic fine-mapping data with genomic annotation can highlight potential causal regulatory elements in T2D-relevant tissues. These analyses provide insight into the mechanisms through which T2D association signals are mediated, and suggest future routes to understanding the biology of specific disease susceptibility loci.
Subjects
Chromosome Mapping , Diabetes Mellitus, Type 2/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Black or African American/genetics , Alleles , Asian People/genetics , Cyclin-Dependent Kinase Inhibitor p16 , Cyclin-Dependent Kinase Inhibitor p18/genetics , Diabetes Mellitus, Type 2/pathology , Female , Humans , KCNQ1 Potassium Channel/genetics , Linkage Disequilibrium , Male , Polymorphism, Single Nucleotide , RNA-Binding Proteins/genetics , Regulatory Elements, Transcriptional/genetics , White People/genetics , tRNA Methyltransferases/genetics
ABSTRACT
There has been increasing interest in rare variants and their association with disease, and several rare variant-disease associations have already been detected. The usual association tests for common variants are underpowered for detecting variants of lower frequency, so alternative approaches are required. In addition to reviewing the association analysis methods for rare variants, we discuss the limitations of genome-wide association studies in identifying rare variants and the problems that arise in the imputation of rare variants.
Subjects
Disease/genetics , Genetic Association Studies/methods , Mutation , Animals , Genome-Wide Association Study/methods , Humans
ABSTRACT
Diseases co-occur in individuals more often than expected by chance, which may be explained by shared underlying genetic etiology. A common approach to genetic overlap analyses is to use summary genome-wide association study data to identify single-nucleotide polymorphisms (SNPs) that are associated with multiple traits at a selected P-value threshold. However, P-values do not account for differences in power, whereas Bayes' factors (BFs) do, and may be approximated using summary statistics. We use simulation studies to compare the power of frequentist and Bayesian approaches to overlap analyses, and to decide on appropriate thresholds for comparison between the two methods. It is empirically illustrated that BFs have the advantage over P-values of a decreasing type I error rate as study size increases for single-disease associations. Consequently, the overlap analysis of traits from different-sized studies encounters issues in fair P-value threshold selection, whereas BFs are adjusted automatically. Extensive simulations show that Bayesian overlap analyses tend to have higher power than those that assess association strength with P-values, particularly in low-power scenarios. Calibration tables between BFs and P-values are provided for a range of sample sizes, as well as an approximation approach for sample sizes that are not in the calibration table. Although P-values are sometimes thought more intuitive, these tables help to make Bayesian thresholds less opaque and may also be used in the selection of a BF threshold to meet a certain type I error rate. Our methods are applied to identify variants associated with both obesity and osteoarthritis.
Subjects
Genome-Wide Association Study/methods , Obesity/epidemiology , Osteoarthritis/epidemiology , Quantitative Trait, Heritable , Bayes Theorem , Body Mass Index , Computer Simulation , Data Interpretation, Statistical , Humans , Models, Genetic , Obesity/genetics , Osteoarthritis/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Probability , Sample Size
ABSTRACT
Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.
Subjects
Disease/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Alleles , Genotype , Humans , Models, Genetic , Phenotype
ABSTRACT
The role of rare variants has become a focus in the search for association with complex traits. Imputation is a powerful and cost-efficient tool to access variants that have not been directly typed, but there are several challenges when imputing rare variants, most notably reference panel selection. Extensions to rare variant association tests to incorporate genotype uncertainty from imputation are discussed, as well as the use of imputed low-frequency and rare variants in the study of population isolates.
Subjects
Genetic Variation , Data Interpretation, Statistical , Genome-Wide Association Study , Humans
ABSTRACT
OBJECTIVES: There is increasing evidence that rare variants play a role in some complex traits, but their analysis is not straightforward. Locus-based tests become necessary due to low power in rare variant single-point association analyses. In addition, variant quality scores are available for sequencing data, but are rarely taken into account. Here, we propose two locus-based methods that incorporate variant quality scores: a regression-based collapsing approach and an allele-matching method. METHODS: Using simulated sequencing data we compare 4 locus-based tests of trait association under different scenarios of data quality. We test two collapsing-based approaches and two allele-matching-based approaches, taking into account variant quality scores and ignoring variant quality scores. We implement the collapsing and allele-matching approaches accounting for variant quality in the freely available ARIEL and AMELIA software. RESULTS: The incorporation of variant quality scores in locus-based association tests has power advantages over weighting each variant equally. The allele-matching methods are robust to the presence of both protective and risk variants in a locus, while collapsing methods exhibit a dramatic loss of power in this scenario. CONCLUSIONS: The incorporation of variant quality scores should be a standard protocol when performing locus-based association analysis on sequencing data. The ARIEL and AMELIA software implement collapsing and allele-matching locus association analysis methods, respectively, that allow the incorporation of variant quality scores.
Subjects
Genetic Association Studies , Genetic Variation , Software , Alleles , Computer Simulation , Genotype , Humans , Logistic Models
ABSTRACT
AIMS: Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious or neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools: PolyPhen-2 and SIFT, in order to improve the prediction of the effect of non-synonymous coding variants. METHODS: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined 2 dataset pairs to train and test CAROL using information from the dbSNP: 'HGMD-PUBLIC' and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls. RESULTS: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than each individual annotation tool (PolyPhen-2 and SIFT) and benefits from higher coverage. CONCLUSION: The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.
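The weighted Z method mentioned above can be sketched generically in Python as follows. This is the standard Stouffer-style weighted combination of one-sided p-values, not CAROL's own weighting scheme, which the abstract does not specify. Treating each tool's output as a p-value in (0, 1), the function name and the example weights are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def weighted_z(p_values, weights):
    """Combine one-sided p-values with the weighted Z (Stouffer) method.

    Each p-value (strictly between 0 and 1) is mapped to a normal score,
    the scores are averaged with the given weights, and the combined score
    is converted back to a one-sided p-value.
    """
    nd = NormalDist()
    scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    return z_combined, 1.0 - nd.cdf(z_combined)


# Two hypothetical tool scores expressed as p-values, weighted equally.
z, p = weighted_z([0.05, 0.05], [1.0, 1.0])
print(z, p)
```

Two moderately small p-values that agree yield a smaller combined p-value than either one alone, which is the general reason a combined score can outperform each individual tool.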
Subjects
Genomics/methods , Molecular Sequence Annotation/methods , Software , Algorithms , Humans , Polymorphism, Single Nucleotide , ROC Curve
ABSTRACT
Statistical fine-mapping helps to pinpoint likely causal variants underlying genetic association signals. Its resolution can be improved by (i) leveraging information between traits; and (ii) exploiting differences in linkage disequilibrium structure between diverse population groups. Using association summary statistics, MGflashfm jointly fine-maps signals from multiple traits and population groups; MGfm uses an analogous framework to analyse each trait separately. We also provide a practical approach to fine-mapping with out-of-sample reference panels. In simulation studies we show that MGflashfm and MGfm are well-calibrated and that the mean proportion of causal variants with posterior probability (PP) > 0.80 is above 0.75 (MGflashfm) and 0.70 (MGfm). In our analysis of four lipid traits across five population groups, MGflashfm gives a median 99% credible set reduction of 10.5% over MGfm. MGflashfm and MGfm only require summary-level data, making them very useful fine-mapping tools in consortia efforts where individual-level data cannot be shared.
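To make the credible-set comparison concrete, the Python sketch below builds a 99% credible set by adding variants in decreasing order of posterior probability until the cumulative probability reaches the coverage level, then reports the relative reduction in set size between two analyses. This is a common generic construction and is not claimed to be the exact procedure used by MGflashfm or MGfm. The function name and both posterior probability vectors are hypothetical.

```python
def credible_set(posteriors, coverage=0.99):
    """Return (variant indices, cumulative probability) of a credible set.

    Variants are added from highest to lowest posterior probability until
    their cumulative probability is at least `coverage`.
    """
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += posteriors[i]
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posterior probabilities for the same five variants.
single_trait_pp = [0.60, 0.20, 0.10, 0.05, 0.05]
multi_trait_pp = [0.70, 0.25, 0.045, 0.004, 0.001]

single_set, _ = credible_set(single_trait_pp)
multi_set, _ = credible_set(multi_trait_pp)
reduction = (len(single_set) - len(multi_set)) / len(single_set)
print(len(single_set), len(multi_set), reduction)
```

A more concentrated posterior distribution needs fewer variants to reach 99% coverage, so a smaller credible set indicates higher fine-mapping resolution.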
Subjects
Genome-Wide Association Study , Population Groups , Humans , Chromosome Mapping , Polymorphism, Single Nucleotide , Linkage Disequilibrium
ABSTRACT
Most genome-wide association studies (GWAS) for lipid traits focus on the separate analysis of lipid traits. Moreover, there are limited GWASs evaluating the genetic variants associated with multiple lipid traits in African ancestry. To further identify and localize loci with pleiotropic effects on lipid traits, we conducted a genome-wide meta-analysis, multi-trait analysis of GWAS (MTAG), and multi-trait fine-mapping (flashfm) in 125,000 individuals of African ancestry. Our meta-analysis and MTAG identified four and 14 novel loci associated with lipid traits, respectively. flashfm yielded an 18% mean reduction in the 99% credible set size compared to single-trait fine-mapping with JAM. Moreover, we identified more genetic variants with a posterior probability of causality >0.9 with flashfm than with JAM. In conclusion, we identified additional novel loci associated with lipid traits, and flashfm reduced the 99% credible set size to identify causal genetic variants associated with multiple lipid traits in African ancestry.
Subjects
Genome-Wide Association Study , Lipids , Humans , Black People , Lipids/genetics , Phenotype
ABSTRACT
Background: Genome-wide association studies for glycemic traits have identified hundreds of loci associated with these biomarkers of glucose homeostasis. Despite this success, the challenge remains to link variant associations to genes, and underlying biological pathways. Methods: To identify coding variant associations which may pinpoint effector genes at both novel and previously established genome-wide association loci, we performed meta-analyses of exome-array studies for four glycemic traits: glycated hemoglobin (HbA1c, up to 144,060 participants), fasting glucose (FG, up to 129,665 participants), fasting insulin (FI, up to 104,140) and 2hr glucose post-oral glucose challenge (2hGlu, up to 57,878). In addition, we performed network and pathway analyses. Results: Single-variant and gene-based association analyses identified coding variant associations at more than 60 genes, which when combined with other datasets may be useful to nominate effector genes. Network and pathway analyses identified pathways related to insulin secretion, zinc transport and fatty acid metabolism. HbA1c associations were strongly enriched in pathways related to blood cell biology. Conclusions: Our results provided novel glycemic trait associations and highlighted pathways implicated in glycemic regulation. Exome-array summary statistic results are being made available to the scientific community to enable further discoveries.
ABSTRACT
Large-scale meta-analyses of genome-wide association scans (GWAS) have been successful in discovering common risk variants with modest and small effects. The detection of lower frequency signals will undoubtedly require concerted efforts of at least similar scale. We investigate the sample size-dictated power limits of GWAS meta-analyses, in the presence and absence of modest levels of heterogeneity and across a range of different allelic architectures. We find that data combination through large-scale collaboration is vital in the quest for complex trait susceptibility loci, but that effect size heterogeneity across meta-analyzed studies drawn from similar populations does not appear to have a profound effect on sample size requirements.
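The dependence of power on sample size and allele frequency can be illustrated with a simplified Python sketch. It is not the authors' simulation design: it assumes a single variant tested for additive association with a standardized quantitative trait, no heterogeneity between studies, and a non-centrality parameter approximated by n * 2 * maf * (1 - maf) * beta**2. The function name and all parameter values are assumptions.

```python
import math
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a two-sided single-variant association test.

    n is the total sample size, maf the minor allele frequency, and beta
    the per-allele effect in trait standard deviation units. The test
    statistic is treated as normal with mean sqrt(ncp) and unit variance.
    """
    nd = NormalDist()
    ncp = n * 2.0 * maf * (1.0 - maf) * beta ** 2
    critical = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(ncp)
    return nd.cdf(shift - critical) + nd.cdf(-shift - critical)


# Hypothetical low-frequency variant (maf = 0.01, beta = 0.1) at two sample sizes.
for n in (10_000, 100_000):
    print(n, gwas_power(n, maf=0.01, beta=0.1))
```

Under these assumptions, power for a low-frequency variant of modest effect is negligible at the smaller sample size and rises substantially at the larger one, which is consistent with the abstract's emphasis on data combination through large-scale collaboration.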
Subjects
Genome-Wide Association Study/methods , Meta-Analysis as Topic , Gene Frequency , Genetic Heterogeneity , Genetic Predisposition to Disease , Genome-Wide Association Study/statistics & numerical data , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Sample Size
ABSTRACT
Early studies of breast cancer microarray data used linear models to quantify the relationship between measures of gene expression (GE) and copy number (CN) obtained from tumour samples. Motivated by a study of women with axillary node-negative breast cancer, we propose a regression-based scan statistic to identify within-chromosome clusters of genetic probes that exhibit association between GE and CN, while accounting for tumour characteristics known to be prognostic for clinical outcome. As a measure of the association between GE and CN, for each genetic probe available from a microarray we regress GE on CN, and include subject-specific covariates. In the development of the scan statistic, the within-chromosome spatial distribution of the subset of probes with a statistically significant association is approximated by a Poisson process. By incorporating the distance between the probe positions, the scan statistic accounts for the spatial nature of CN alterations. Regions identified as clusters of significant associations are hypothesized to harbour genes involved in breast cancer progression. Using simulations, we examine the sensitivity of the method to certain factors, and to address issues of repeatability, we consider reappearance probabilities for each probe within detected regions and assess the utility of a quantity estimated by bootstrap sample frequencies. Applications of the proposed method to joint analysis of GE and CN in breast tumours, with and without an informative covariate, and comparisons with alternative methods suggest that inclusion of covariates and the use of a regional test statistic can serve to refine regions for further investigation including the analysis of their association with outcome.
Subjects
Breast Neoplasms/genetics , Gene Dosage , Gene Expression , Models, Genetic , Regression Analysis , Computer Simulation , Disease Progression , Female , Humans , Oligonucleotide Array Sequence Analysis
ABSTRACT
With rapid advances in genotyping technologies in recent years and the growing number of available markers, genome-wide association studies are emerging as promising approaches for the study of complex diseases and traits. However, there are several challenges with analysis and interpretation of such data. First, there is a massive multiple testing problem, due to the large number of markers that need to be analyzed, leading to an increased risk of false positives and decreased ability for association studies to detect truly associated markers. In particular, the ability to detect modest genetic effects can be severely compromised. Second, a genetic association of a given single-nucleotide polymorphism as determined by univariate statistical analyses does not typically explain biologically interesting features, and often requires subsequent interpretation using a higher unit, such as a gene or region, for example, as defined by haplotype blocks. Third, missing genotypes in the data set and other data quality issues can pose challenges when comparisons across platforms and replications are planned. Finally, depending on the type of univariate analysis, computational burden can arise as the number of markers continues to grow into the millions. One way to deal with these and related challenges is to consider higher units for the analysis, such as genes or regions. This article summarizes analytical methods and strategies that have been proposed and applied by Group 16 to two genome-wide association data sets made available through the Genetic Analysis Workshop 16.
Subjects
Genome-Wide Association Study/methods , Arthritis, Rheumatoid/epidemiology , Arthritis, Rheumatoid/genetics , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics , Genome-Wide Association Study/statistics & numerical data , Genotype , Haplotypes , Humans , Molecular Epidemiology , Polymorphism, Single Nucleotide
ABSTRACT
Thousands of genetic variants are associated with human disease risk, but linkage disequilibrium (LD) hinders fine-mapping the causal variants. Both lack of power, and joint tagging of two or more distinct causal variants by a single non-causal SNP, lead to inaccuracies in fine-mapping, with stochastic search more robust than stepwise. We develop a computationally efficient multinomial fine-mapping (MFM) approach that borrows information between diseases in a Bayesian framework. We show that MFM has greater accuracy than single disease analysis when shared causal variants exist, and negligible loss of precision otherwise. MFM analysis of six immune-mediated diseases reveals causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes.
Subjects
Autoimmunity/genetics , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Models, Genetic , Alleles , Bayes Theorem , CD4-Positive T-Lymphocytes , CTLA-4 Antigen/genetics , Chromosome Mapping , Gene Expression Regulation , Genotype , Humans , Interleukin-2 Receptor alpha Subunit/genetics , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Shared genetic aetiology may explain the co-occurrence of diseases in individuals more often than expected by chance. On identifying associated variants shared between two traits, one objective is to determine whether such overlap may be explained by specific genomic characteristics (eg, functional annotation). In clinical studies, inter-rater agreement approaches assess concordance among expert opinions on the presence/absence of a complex disease for each subject. We adapt a two-stage inter-rater agreement model to the genetic association setting to identify features predictive of overlap variants, while accounting for their marginal trait associations. The resulting corrected overlap and marginal enrichment test (COMET) also assesses enrichment at the individual trait level. Multiple categories may be tested simultaneously and the method is computationally efficient, not requiring permutations to assess significance. In an extensive simulation study, COMET identifies features predictive of enrichment with high power and has well-calibrated type I error. In contrast, testing for overlap with a single-trait enrichment test has inflated type I error. COMET is applied to three glycaemic traits using a set of functional annotation categories as predictors, followed by further analyses that focus on tissue-specific regulatory variants. The results support previous findings that regulatory variants in pancreatic islets are enriched for fasting glucose-associated variants, and give insight into differences/similarities between characteristics of variants associated with glycaemic traits. Also, despite regulatory variants in pancreatic islets being enriched for variants that are marginally associated with fasting glucose and fasting insulin, there is no enrichment of shared variants between the traits.
Subjects
Blood Glucose/genetics , Models, Genetic , Mutation , Genetic Predisposition to Disease , Humans , Quantitative Trait, Heritable
ABSTRACT
The genetic features of isolated populations can boost power in complex-trait association studies, and an in-depth understanding of how their genetic variation has been shaped by their demographic history can help leverage these advantageous characteristics. Here, we perform a comprehensive investigation using 3,059 newly generated low-depth whole-genome sequences from eight European isolates and two matched general populations, together with published data from the 1000 Genomes Project and UK10K. Sequencing data give deeper and richer insights into population demography and genetic characteristics than genotype-chip data, distinguishing related populations more effectively and allowing their functional variants to be studied more fully. We demonstrate relaxation of purifying selection in the isolates, leading to enrichment of rare and low-frequency functional variants, using novel statistics, DVxy and SVxy. We also develop an isolation-index (Isx) that predicts the overall level of such key genetic characteristics and can thus help guide population choice in future complex-trait association studies.