ABSTRACT
Genome-wide association studies (GWAS) have benefited greatly from advances in high-throughput technology in recent decades. GWAS meta-analysis has become increasingly popular for elucidating the genetic architecture of complex traits, informing about the replicability and variability of effect estimates across human ancestries. A wealth of GWAS meta-analysis methodologies has been developed, depending on the input data and the outcome information of interest. We present a survey of current approaches, from SNP-based to pathway-based meta-analysis, acknowledging the range of resources and methodologies in the field, and we provide a comprehensive review of the different categories of genome-wide meta-analysis methods employed. These methods highlight the different levels at which GWAS meta-analysis may be done, including single nucleotide polymorphisms (SNPs), genes and pathways, and we outline the framework for each. We also discuss the strengths and pitfalls of each approach and make suggestions regarding each of them.
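As an illustration of the SNP-level case only, the following is a minimal sketch of an inverse-variance-weighted fixed-effect meta-analysis for a single SNP, with Cochran's Q and I² as heterogeneity summaries. It is not taken from the review itself; the function name `fixed_effect_meta` and the example effect sizes are hypothetical, and the per-study estimates are assumed to be independent and on the same scale (for example, log odds ratios for the same effect allele).

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas: per-study effect estimates (hypothetical inputs, same allele and scale)
    ses:   per-study standard errors
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))

    # Cochran's Q and I^2 summarise between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}


if __name__ == "__main__":
    # Two hypothetical cohorts reporting the same SNP.
    print(fixed_effect_meta([0.10, 0.20], [0.05, 0.05]))
```

A large I² would suggest that a random-effects or ancestry-aware model may be more appropriate than the fixed-effect estimate shown here.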
Subject(s)
Genome-Wide Association Study ; Polymorphism, Single Nucleotide ; Humans ; Genome-Wide Association Study/methods ; Genetic Association Studies ; Multifactorial Inheritance
ABSTRACT
Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which may lead to misleading conclusions in the prioritization of mutations and in the clinical relevance and actionability of genes. The most prominent question is which tool or pipeline achieves high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic whole genome sequencing (WGS) samples, mimicking the genetic profiles of African and European subjects at different coverage levels (high/low), have been generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants with the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African data than when analysing European data. This highlights the need to develop VC approaches with high sensitivity and precision tailored to populations characterized by high genetic variation and low linkage disequilibrium.
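The evaluation metrics named above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `compare_calls`, the variant keys and the `n_sites` parameter are assumptions. True negatives are defined here as candidate sites that are neither in the simulated truth set nor called, which is one possible convention for making MCC computable.

```python
import math


def compare_calls(truth, called, n_sites):
    """Compare called variants with a simulated truth set.

    truth, called: sets of hashable variant keys, e.g. (chrom, pos, ref, alt)
    n_sites:       assumed total number of candidate sites evaluated
    """
    tp = len(truth & called)   # variants present in truth and called
    fp = len(called - truth)   # called but not in truth
    fn = len(truth - called)   # in truth but missed
    tn = n_sites - tp - fp - fn
    if tn < 0:
        raise ValueError("n_sites is smaller than the number of observed variants")

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical toy data: four true variants, four calls, ten candidate sites.
    truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"),
             ("1", 300, "G", "A"), ("1", 400, "T", "C")}
    called = {("1", 100, "A", "G"), ("1", 200, "C", "T"),
              ("1", 300, "G", "A"), ("1", 500, "A", "T")}
    print(compare_calls(truth, called, n_sites=10))
```

Because true negatives vastly outnumber variants in a real genome, the MCC value depends on how the set of candidate sites is defined, so comparisons across tools are only meaningful when the same `n_sites` convention is used throughout.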
Subject(s)
Black People/genetics ; Databases, Nucleic Acid ; Genetic Variation ; Genome, Human ; White People/genetics ; Whole Genome Sequencing ; Humans ; Linkage Disequilibrium
ABSTRACT
Advances in human sequencing technologies, coupled with statistical and computational tools, have fostered the development of methods for dating admixture events. These methods have merits and drawbacks in estimating admixture events in multi-way admixed populations. Here, we first provide a comprehensive review and comparison of current methods for dating admixture events. Second, we assess various admixture dating tools through simulations. Third, we apply the two best-performing methods to real data from a uniquely admixed population from South Africa. The results reveal that current admixture dating models are not sufficiently equipped to estimate ancient admixture events or to identify multi-faceted admixture events in complex multi-way admixed populations. We conclude with a discussion of research areas where further work on admixture dating methods is needed.
ABSTRACT
MOTIVATION: Recent technological advances in high-throughput sequencing and genotyping have facilitated an improved understanding of genomic structure and disease-associated genetic factors. In this context, simulation models can play a critical role in revealing various evolutionary and demographic effects on genomic variation, enabling researchers to assess existing and design novel analytical approaches. Although various simulation frameworks have been suggested, they do not account for natural selection in admixture processes. Most are tailored to a single chromosome or a genomic region, very few capture large-scale genomic data, and most are not accessible for genomic communities. RESULTS: Here we develop a multi-scenario genome-wide medical population genetics simulation framework called 'FractalSIM'. FractalSIM has the capability to accurately mimic and generate genome-wide data under various genetic models on genetic diversity, genomic variation affecting diseases and DNA sequence patterns of admixed and/or homogeneous populations. Moreover, the framework accounts for natural selection in both homogeneous and admixture processes. The outputs of FractalSIM have been assessed using popular tools, and the results demonstrated its capability to accurately mimic real scenarios. They can be used to evaluate the performance of a range of genomic tools from ancestry inference to genome-wide association studies. AVAILABILITY AND IMPLEMENTATION: The FractalSIM package is available at http://www.cbio.uct.ac.za/FractalSIM. CONTACT: emile.chimusa@uct.ac.za. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genetics, Population/methods ; Genomics/methods ; Genetic Variation ; Genome ; Genome-Wide Association Study ; High-Throughput Nucleotide Sequencing ; Humans ; Polymorphism, Single Nucleotide ; Selection, Genetic ; Sequence Analysis, DNA ; Software
ABSTRACT
Background: Suicidal behaviour (SB) refers to behaviours ranging from non-fatal suicidal behaviour, such as suicidal ideation and attempt, to completed suicide. Despite recent advancements in genomic technology and statistical methods, it is unclear to what extent the spectrum of suicidal behaviour is explained by shared genetic aetiology. Methods: We identified nine sets of genome-wide association summary statistics for suicidal behaviour (sample sizes, n, ranging from 62,648 to 125,844), ten psychiatric traits [n up to 386,533] and, collectively, nine summary datasets of anthropometric, behavioural and socioeconomic-related traits [n ranging from 58,610 to 941,280]. We calculated the genetic correlations among these traits and modelled them using genomic structural equation modelling, identified shared biological processes and pathways between suicidal behaviour and psychiatric disorders, and evaluated potential causal associations using Mendelian randomisation. Results: Among populations of European ancestry, we observed strong positive genetic correlations between suicide ideation, attempt and self-harm (rg range, 0.71-1.09) and moderate to strong genetic correlations between suicidal behaviour traits and a range of psychiatric disorders, most notably major depressive disorder (rg = 0.86, p = 1.62 × 10^-36). Multivariate analysis revealed a common factor structure for suicidal behaviour traits, major depression, attention deficit hyperactivity disorder (ADHD) and alcohol use disorder. The derived common factor explained 38.7% of the shared variance across the traits. We identified 2,951 genes and 98 sub-network hub genes associated with the common factor, including pathways associated with developmental biology, signal transduction and RNA degradation. We found suggestive evidence for a protective effect of higher household income level on suicide attempt [OR = 0.55 (0.44-0.70), p = 1.29 × 10^-5] and, while further investigation is needed, a nominally significant effect of smoking on suicide attempt [OR = 1.24 (1.04-1.44), p = 0.026]. Conclusion: Our findings provide evidence of shared aetiology between suicidal behaviour and psychiatric disorders and indicate potential common molecular mechanisms contributing to the overlapping pathophysiology. These findings provide a better understanding of the complex genetic architecture of suicidal behaviour and have implications for the prevention and treatment of suicidal behaviour.
ABSTRACT
Over the past decades, advanced high-throughput technologies have continuously contributed to genome-wide association studies (GWASs). GWAS meta-analysis has been increasingly adopted, offers cross-ancestry replicability, and has the power to illuminate the genetic architecture of complex traits, informing about the reliability of effect estimates and their variability across human ancestries. However, detecting genetic variants that confer low disease risk still poses a challenge. Designing a meta-analysis approach that combines the effects of multiple SNPs within genes, or of genes within pathways, from multiple independent population GWASs may help identify associations with small effect sizes and increase association power. Here, we propose ancMETA, a Bayesian graph-based framework that performs gene- and pathway-specific meta-analysis by combining the effect sizes of multiple SNPs within genes, and of genes within subnetworks/pathways, across multiple independent population GWASs, in order to deconvolute the interactions between genes underlying the pathogenesis of complex diseases across human populations. We assessed the proposed framework on simulated datasets, and the results show that the model holds promise for increasing statistical power in the meta-analysis of genetic variants underlying the pathogenesis of complex diseases. To illustrate the proposed meta-analysis framework, we leverage seven different European bipolar disorder (BD) cohorts and identify variants in the angiotensinogen (AGT) gene that are significantly associated with BD across all seven studies. We detect a commonly significant BD-specific subnetwork with the ESR1 gene as its main hub, associated with the neurotrophin signaling (p = 4e-14) and myometrial relaxation and contraction (p = 3e-08) pathways. ancMETA provides a new contribution to post-GWAS methodologies and holds promise for comprehensively examining interactions between genes underlying the pathogenesis of genetic diseases and also underlying ethnic differences.