Results 1 - 20 of 122
1.
Am J Hum Genet ; 110(11): 1888-1902, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37890495

ABSTRACT

Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R² = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R² = 0.115), which exceeds the best predictive performance for the White British group (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.


Subjects
Multifactorial Inheritance, White People, Humans, Multifactorial Inheritance/genetics, White People/genetics, Phenotype, Black People/genetics, Asian People/genetics, Genome-Wide Association Study/methods
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36585781

ABSTRACT

Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and by the application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe different scenarios that can create numerical pitfalls and lead to incorrect conclusions in some instances. We consider scenarios in which PS is assessed based on loci that are located across the genome ('globally') and based on loci from a specific genomic region ('locally'). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (5-0.5%) and rare (<0.5%) single-nucleotide variations (SNVs). Overall, we observe that all approaches provide the best clustering performance when computed based on rare SNVs. The performance of the similarity matrices is very similar for common and low-frequency variants, but for rare variants, the unweighted Jaccard matrix provides preferable clustering features. Based on visual inspection and in terms of standard clustering metrics, its clusters are the densest and the best separated in the principal component analysis of variants with rare SNVs compared with the other methods and different allele frequency cutoffs. In an application, we assessed the role of rare variants on local and global PS, using WGS data from multiethnic Alzheimer's disease data sets and European or East Asian populations from the 1000 Genomes Project.


Subjects
Genome, Genomics, Principal Component Analysis, Gene Frequency, Computer Simulation, Genome-Wide Association Study, Single Nucleotide Polymorphism
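
As a rough illustration of the unweighted Jaccard similarity matrix evaluated in entry 2, the sketch below treats each individual as the set of variant positions at which they carry a minor allele and computes intersection over union for every pair. The function name `jaccard_matrix`, the set-based input, and the convention that two empty sets get similarity 0 are assumptions made for this example, not details taken from the paper.

```python
def jaccard_matrix(carriers):
    """Unweighted Jaccard similarity between individuals.

    carriers[i] is the set of variant indices at which individual i
    carries at least one minor allele (hypothetical input format).
    """
    n = len(carriers)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            union = carriers[i] | carriers[j]
            shared = carriers[i] & carriers[j]
            # Assumed convention: an empty union yields similarity 0.
            value = len(shared) / len(union) if union else 0.0
            sim[i][j] = sim[j][i] = value
    return sim


# Toy data: individuals 0 and 1 share two rare variants, individual 2 shares none.
samples = [{0, 1, 2}, {1, 2, 3}, {7, 8}]
sim = jaccard_matrix(samples)
```

With rare variants the sets are small and mostly disjoint, so a few shared alleles move the similarity a great deal, which is one intuition for why rare SNVs can separate clusters sharply.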
3.
Stat Appl Genet Mol Biol ; 23(1)2024 01 01.
Article in English | MEDLINE | ID: mdl-38235525

ABSTRACT

Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.


Subjects
Genetic Models, Single Nucleotide Polymorphism, Haplotypes, Bayes Theorem, Phenotype, Genome-Wide Association Study
4.
BMC Genomics ; 25(1): 69, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38233755

ABSTRACT

BACKGROUND: The yak is a symbol of the Qinghai-Tibet Plateau and provides important basic resources for human life on the plateau. Domestic yaks have been subjected to strong artificial selection and environmental pressures over the long term. Understanding the molecular mechanisms of phenotypic differences in yak populations can reveal key functional genes involved in the domestication process and improve genetic breeding. MATERIAL AND METHOD: Here, we re-sequenced 80 yaks (Maiwa, Yushu, and Huanhu populations) to identify single-nucleotide polymorphisms (SNPs) as genetic variants. After filtering and quality control, the remaining SNPs were kept to identify the genome-wide regions of selective sweeps associated with domestic traits. Four methods (π, XPEHH, iHS, and XP-nSL) were used to detect the population genetic separation. RESULTS: By comparing the differences in the population stratification, linkage disequilibrium decay rate, and characteristic selective sweep signals, we identified 203 putative selective regions of domestic traits, 45 of which were mapped to 27 known genes. They were clustered into 4 major GO biological process terms. All known genes were associated with seven major domestication traits, such as dwarfism (ANKRD28), milk (HECW1, HECW2, and OSBPL2), meat (SPATA5 and GRHL2), fertility (BTBD11 and ARFIP1), adaptation (NCKAP5, ANTXR1, LAMA5, OSBPL2, AOC2, and RYR2), growth (GRHL2, GRID2, SMARCAL1, and EPHB2), and the immune system (INPP5D and ADCYAP1R1). CONCLUSIONS: We show that there are clear genetic differences in the domestication process among these three yak populations. Our findings improve the understanding of the major genetic switches and domestic processes among yak populations.


Subjects
ATPases Associated with Diverse Cellular Activities, Domestication, Steroid Receptors, Animals, Humans, Cattle/genetics, Genome, DNA Sequence Analysis, Tibet, Population Genetics, Microfilament Proteins, Cell Surface Receptors, DNA Helicases, Nerve Tissue Proteins, Ubiquitin-Protein Ligases
5.
Am J Epidemiol ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38806447

ABSTRACT

Polygenic risk scores (PRS) are rapidly emerging as a way to measure disease risk by aggregating multiple genetic variants. Understanding the interplay of PRS with environmental factors is critical for interpreting and applying PRS in a wide variety of settings. We develop an efficient method for simultaneously modeling gene-environment correlations and interactions using PRS in case-control studies. We use a logistic-normal regression modeling framework to specify the disease risk and PRS distribution in the underlying population and propose joint inference across the two models using the retrospective likelihood of the case-control data. Extensive simulation studies demonstrate the flexibility of the method in trading off bias and efficiency for the estimation of various model parameters compared to the standard logistic regression or a case-only analysis for gene-environment interactions, or a control-only analysis for gene-environment correlations. Finally, using simulated case-control data sets within the UK Biobank study, we demonstrate the power of our method for its ability to recover results from the full prospective cohort for the detection of an interaction between long-term oral contraceptive use and PRS on the risk of breast cancer. This method is computationally efficient and implemented in a user-friendly R package.

6.
Am J Hum Genet ; 108(7): 1270-1282, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34157305

ABSTRACT

Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.


Subjects
Statistical Data Interpretation, Metagenomics/methods, Pedigree, Racial Groups/genetics, Alleles, Computer Simulation, Gene Frequency, Humans, Inheritance Patterns, Software
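
The idea behind estimating ancestry proportions from summary allele frequencies (entry 6) can be illustrated with a deliberately simplified two-ancestry case: model the observed frequencies as a mixture π·A + (1 − π)·B of two reference panels and pick the π in [0, 1] that minimizes the squared error. This is only a sketch under that assumption; it is not the Summix implementation, which handles more reference groups, and the function name and toy frequencies are hypothetical.

```python
def two_way_ancestry_proportion(observed, ref_a, ref_b):
    """Least-squares proportion of ancestry A in a two-way mixture.

    Minimizes sum((o - (pi*a + (1 - pi)*b))**2) over pi, then clips
    the closed-form solution to the interval [0, 1].
    """
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference panels are identical; proportion is not identifiable")
    return min(1.0, max(0.0, num / den))


# Toy allele frequencies for four variants (made-up numbers).
ref_a = [0.10, 0.50, 0.90, 0.30]
ref_b = [0.40, 0.20, 0.60, 0.70]
# Build a summary-level sample that is an exact 84% / 16% mixture.
observed = [0.84 * a + 0.16 * b for a, b in zip(ref_a, ref_b)]
pi_hat = two_way_ancestry_proportion(observed, ref_a, ref_b)
```

Because the objective is a one-dimensional convex quadratic, clipping the unconstrained solution gives the constrained optimum; with more than two reference ancestries a constrained optimizer would be needed instead.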
7.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35021184

ABSTRACT

With the increasing volume of human sequencing data available, analysis incorporating external controls becomes a popular and cost-effective approach to boost statistical power in disease association studies. To prevent spurious association due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio and might become anti-conservative when case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to control for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type 1 errors under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow for different case-control ratios across strata, a full-matching scheme can be employed to efficiently utilize all available cases and controls to accelerate the discovery of disease associated genes.


Subjects
Genetic Models, Case-Control Studies, Computer Simulation, Humans, Logistic Models
8.
Hum Genomics ; 17(1): 46, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37268996

ABSTRACT

BACKGROUND: The Million Veteran Program (MVP) participants represent 100 years of US history, including significant social and demographic changes over time. Our study assessed two aspects of the MVP: (i) longitudinal changes in population diversity and (ii) how these changes can be accounted for in genome-wide association studies (GWAS). To investigate these aspects, we divided MVP participants into five birth cohorts (N-range = 123,888 [born from 1943 to 1947] to 136,699 [born from 1948 to 1953]). RESULTS: Ancestry groups were defined by (i) HARE (harmonized ancestry and race/ethnicity) and (ii) a random-forest clustering approach using the 1000 Genomes Project and the Human Genome Diversity Project (1kGP + HGDP) reference panels (77 world populations representing six continental groups). In these groups, we performed GWASs of height, a trait potentially affected by population stratification. Birth cohorts demonstrate important trends in ancestry diversity over time. More recent HARE-assigned Europeans, Africans, and Hispanics had lower European ancestry proportions than older birth cohorts (0.010 < Cohen's d < 0.259, p < 7.80 × 10⁻⁴). Conversely, HARE-assigned East Asians showed an increase in European ancestry proportion over time. In GWAS of height using HARE assignments, genomic inflation due to population stratification was prevalent across all birth cohorts (linkage disequilibrium score regression intercept = 1.08 ± 0.042). The 1kGP + HGDP-based ancestry assignment significantly reduced the population stratification (mean intercept reduction = 0.045 ± 0.007, p < 0.05) confounding in the GWAS statistics. CONCLUSIONS: This study provides a characterization of ancestry diversity of the MVP cohort over time and compares two strategies to infer genetically defined ancestry groups by assessing differences in controlling population stratification in genome-wide association studies.


Subjects
Ethnicity, Racial Groups, Veterans, Humans, Ethnicity/genetics, Genome-Wide Association Study, Single Nucleotide Polymorphism/genetics, Racial Groups/genetics
9.
Int J Mol Sci ; 25(2)2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38256224

ABSTRACT

Numerous type 2 diabetes (T2D) polygenic risk scores (PGSs) have been developed to predict individuals' predisposition to the disease. An independent assessment and verification of the best-performing PGS are warranted to allow for a rapid application of developed models. To date, only 3% of T2D PGSs have been evaluated. In this study, we assessed all (n = 102) presently published T2D PGSs in an independent cohort of 3718 individuals, which has not been included in the construction or fine-tuning of any T2D PGS so far. We further chose the best-performing PGS, assessed its performance across major population principal component analysis (PCA) clusters, and compared it with newly developed population-specific T2D PGS. Our findings revealed that 88% of the published PGSs were significantly associated with T2D; however, their performance was lower than what had been previously reported. We found a positive association of PGS improvement over the years (p-value = 8.01 × 10⁻⁴), with PGS002771 currently showing the best discriminatory power (area under the receiver operating characteristic curve (AUROC) = 0.669) and PGS003443 exhibiting the strongest association (odds ratio (OR) = 1.899). Further investigation revealed no difference in PGS performance across major population PCA clusters and when compared with newly developed population-specific PGS. Our findings revealed a positive trend in T2D PGS performance, consistently identifying high-T2D-risk individuals in an independent European population.


Subjects
Type 2 Diabetes Mellitus, Humans, Type 2 Diabetes Mellitus/genetics, Genetic Risk Score, Genotype, Odds Ratio, Principal Component Analysis
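
Entry 9 ranks scores by AUROC, which can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are hypothetical and have nothing to do with the cohort in the study.

```python
def auroc(case_scores, control_scores):
    """AUROC as the fraction of (case, control) pairs ranked correctly.

    Ties contribute 0.5. Quadratic in sample size, which is fine for a
    small illustration but not for large cohorts.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up polygenic scores for three cases and three controls.
cases = [0.9, 0.4, 0.7]
controls = [0.1, 0.5, 0.4]
area = auroc(cases, controls)
```

On this reading, a value of 0.5 means the score orders cases and controls no better than chance, and 1.0 means every case outranks every control.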
10.
BMC Bioinformatics ; 24(1): 135, 2023 Apr 05.
Article in English | MEDLINE | ID: mdl-37020193

ABSTRACT

BACKGROUND: Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affect prediction accuracy. The methods commonly used for solving these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Currently, many tools and software are available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines perform such analyses in a single workflow and visualize all the various results in a single interactive web application. RESULTS: We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP is responsible for executing all steps of data filtering and analysis and contains an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, an R-based interactive web application. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data. CONCLUSIONS: The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology. The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP.


Subjects
Genome-Wide Association Study, Software, Animals, Genome-Wide Association Study/methods, Genomics/methods, Genome, Workflow
11.
Genet Epidemiol ; 46(8): 604-614, 2022 12.
Article in English | MEDLINE | ID: mdl-35766057

ABSTRACT

Over the past years, genome-wide association studies (GWAS) have generated a wealth of new information. Summary data from many GWAS are now publicly available, promoting the development of many statistical methods for association studies based on GWAS summary statistics, which avoids the increasing challenges associated with individual-level genotype and phenotype data sharing. However, for population-based association studies such as GWAS, it has been long recognized that population stratification can seriously confound association results. For large GWAS, it is very likely that there exist population stratification and cryptic relatedness, which will result in inflated Type I error in association testing. Although many methods have been developed to control for population stratification, only two of these approaches can be used to control population stratification without individual-level data: one is based on genomic control (GC) and the other one is based on linkage disequilibrium score regression (LDSC). However, the performance of these two approaches is currently unknown. In this study, we use extensive simulation studies including populations with subpopulations, spatially structured populations, and populations with cryptic relatedness to compare the performance of these two approaches to control for population stratification using only GWAS summary statistics without individual-level data. Data sets from the genetic analysis workshop 19 and UK Biobank are also used to evaluate these two approaches. We demonstrate that the intercept of LDSC can be used as a more accurate correction factor than GC. The results from this study will provide very useful information for researchers using GWAS summary statistics while trying to control for population stratification.


Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Humans, Genome-Wide Association Study/methods, Genetic Models, Genetic Association Studies, Linkage Disequilibrium, Phenotype
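
Genomic control, one of the two summary-statistics corrections compared in entry 11, rescales 1-df chi-square association statistics by an inflation factor λ, conventionally the median observed statistic divided by the median of the null chi-square distribution (about 0.455). The sketch below shows that arithmetic only. The function name is hypothetical, and flooring λ at 1 is a common convention assumed here rather than something stated in the abstract.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda_gc, corrected statistics) for 1-df chi-square tests."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: never inflate statistics when lambda < 1.
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]


# Toy statistics whose median is exactly twice the null median.
stats = [0.2, 0.6, 2 * CHI2_1DF_MEDIAN, 3.0, 5.0]
lam, corrected = genomic_control(stats)
```

A correction based on the LDSC intercept would have the same shape, dividing each statistic by the intercept instead of λ; estimating that intercept requires LD scores and is outside this sketch.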
12.
Ann Hum Genet ; 87(6): 302-315, 2023 11.
Article in English | MEDLINE | ID: mdl-37771252

ABSTRACT

INTRODUCTION: Population stratification (PS) is a major source of confounding in population-based genetic association studies of quantitative traits. Principal component regression (PCR) and linear mixed model (LMM) are two commonly used approaches to account for PS in association studies. Previous studies have shown that LMM can be interpreted as including all principal components (PCs) as random-effect covariates. However, including all PCs in LMM may dilute the influence of relevant PCs in some scenarios, while including only a few preselected PCs in PCR may fail to fully capture the genetic diversity. MATERIALS AND METHODS: To address these shortcomings, we introduce Bayestrat, a method to detect associated variants with PS correction under the Bayesian LASSO framework. To adjust for PS, Bayestrat accommodates a large number of PCs and utilizes appropriate shrinkage priors to shrink the effects of nonassociated PCs. RESULTS: Simulation results show that Bayestrat consistently controls type I error rates and achieves higher power compared to its non-shrinkage counterparts, especially when the number of PCs included in the model is large. As a demonstration of the utility of Bayestrat, we apply it to the Multi-Ethnic Study of Atherosclerosis (MESA). Variants and genes associated with serum triglyceride or HDL cholesterol are identified in our analyses. DISCUSSION: The automatic and self-selection features of Bayestrat make it particularly suited in situations with complex underlying PS scenarios, where it is unknown a priori which PCs are potential confounders, yet the number that needs to be considered could be large in order to fully account for PS.


Subjects
Genome-Wide Association Study, Genetic Models, Humans, Bayes Theorem, Genetic Association Studies, Computer Simulation, Linear Models, Phenotype
13.
Am J Hum Genet ; 107(1): 60-71, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32533944

ABSTRACT

Adult height is one of the earliest putative examples of polygenic adaptation in humans. However, this conclusion was recently challenged because residual uncorrected stratification from large-scale consortium studies was considered responsible for the previously noted genetic difference. It thus remains an open question whether height loci exhibit signals of polygenic adaptation in any human population. We re-examined this question, focusing on one of the shortest European populations, the Sardinians, in addition to mainland European populations. We utilized height-associated loci from the Biobank Japan (BBJ) dataset to further alleviate concerns of biased ascertainment of GWAS loci and showed that the Sardinians remain significantly shorter than expected under neutrality (∼0.22 standard deviation shorter than Utah residents with ancestry from northern and western Europe [CEU] on the basis of polygenic height scores, p = 3.89 × 10⁻⁴). We also found the trajectory of polygenic height scores between the Sardinian and the British populations diverged over at least the last 10,000 years (p = 0.0082), consistent with a signature of polygenic adaptation driven primarily by the Sardinian population. Although the polygenic score-based analysis showed a much subtler signature in mainland European populations, we found a clear and robust adaptive signature in the UK population by using a haplotype-based statistic, the trait singleton density score (tSDS), driven by the height-increasing alleles (p = 9.1 × 10⁻⁴). In summary, by ascertaining height loci in a distant East Asian population, we further supported the evidence of polygenic adaptation at height-associated loci among the Sardinians. In mainland Europeans, the adaptive signature was detected in haplotype-based analysis but not in polygenic score-based analysis.


Subjects
Physiological Adaptation/genetics, Body Height/genetics, Multifactorial Inheritance/genetics, Alleles, Asian People/genetics, Biological Specimen Banks, Population Genetics/methods, Human Genome/genetics, Genome-Wide Association Study/methods, Haplotypes/genetics, Humans, Italy, Japan, Phenotype, Single Nucleotide Polymorphism/genetics, Genetic Selection/genetics, White People/genetics
14.
Genet Epidemiol ; 45(5): 427-444, 2021 07.
Article in English | MEDLINE | ID: mdl-33998038

ABSTRACT

Many genetic studies that aim to identify genetic variants associated with complex phenotypes are subject to unobserved confounding factors arising from environmental heterogeneity. This poses a challenge to detecting associations of interest and is known to induce spurious associations when left unaccounted for. Penalized linear mixed models (LMMs) are an attractive method to correct for unobserved confounding. These methods correct for varying levels of relatedness and population structure by modeling it as a random effect with a covariance structure estimated from observed genetic data. Despite an extensive literature on penalized regression and LMMs separately, the two are rarely discussed together. The aim of this review is to do so while examining the statistical properties of penalized LMMs in the genetic association setting. Specifically, the ability of penalized LMMs to accurately estimate genetic effects in the presence of environmental confounding has not been well studied. To clarify the important yet subtle distinction between population structure and environmental heterogeneity, we present a detailed review of relevant concepts and methods. In addition, we evaluate the performance of penalized LMMs and competing methods in terms of estimation and selection accuracy in the presence of a number of confounding structures.


Subjects
Genetic Models, Single Nucleotide Polymorphism, Humans, Linear Models, Phenotype
15.
Genet Epidemiol ; 45(8): 821-829, 2021 12.
Article in English | MEDLINE | ID: mdl-34402542

ABSTRACT

Many methods for rare variant association studies require permutations to assess the significance of tests. Standard permutations assume that all individuals are exchangeable and do not take population stratification (PS), a known confounding factor in genetic studies, into account. We propose a novel strategy, LocPerm, in which individual phenotypes are permuted only with their closest ancestry-based neighbors. We performed a simulation study, focusing on small samples, to evaluate and compare LocPerm with standard permutations and classical adjustment on first principal components. Under the null hypothesis, LocPerm was the only method providing an acceptable type I error, regardless of sample size and level of stratification. The power of LocPerm was similar to that of standard permutation in the absence of PS, and remained stable in different PS scenarios. We conclude that LocPerm is a method of choice for taking PS and/or small sample size into account in rare variant association studies.


Subjects
Population Genetics, Genetic Models, Computer Simulation, Genetic Association Studies, Humans, Sample Size
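
The core idea in entry 15, permuting phenotypes only among individuals of similar ancestry, can be illustrated with a much cruder stand-in: sort individuals along one ancestry coordinate, cut the ordering into blocks, and shuffle phenotypes within each block. This block scheme is a substitute chosen for brevity and is not the LocPerm neighbor-selection procedure; the function name, the single coordinate, and the block size are all assumptions.

```python
import random


def block_permute(phenotypes, ancestry_coord, block_size, rng):
    """Shuffle phenotypes only within blocks of ancestry-adjacent individuals.

    ancestry_coord[i] is a single ancestry coordinate for individual i
    (for example a first principal component; hypothetical input).
    """
    order = sorted(range(len(phenotypes)), key=lambda i: ancestry_coord[i])
    permuted = list(phenotypes)
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        shuffled = block[:]
        rng.shuffle(shuffled)
        # Each individual in the block receives the phenotype of another block member.
        for target, source in zip(block, shuffled):
            permuted[target] = phenotypes[source]
    return permuted


# Two well-separated ancestry clusters: individuals 0-2 and 3-5.
phenotypes = [1, 0, 1, 0, 1, 0]
pc1 = [-2.1, -1.9, -2.0, 3.0, 3.2, 2.9]
permuted = block_permute(phenotypes, pc1, block_size=3, rng=random.Random(0))
```

Because phenotypes never cross blocks, any difference in case fraction between the two clusters is preserved in every permutation, which is what keeps stratification from leaking into the null distribution.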
16.
Genet Epidemiol ; 45(1): 82-98, 2021 02.
Article in English | MEDLINE | ID: mdl-32929743

ABSTRACT

locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region on the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on single cores, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.


Subjects
Genome-Wide Association Study, Genome, Algorithms, Genomics, Humans, Single Nucleotide Polymorphism, Whole Genome Sequencing
17.
BMC Genomics ; 23(1): 98, 2022 Feb 04.
Article in English | MEDLINE | ID: mdl-35120458

ABSTRACT

BACKGROUND: Mixed models are used to correct for confounding due to population stratification and hidden relatedness in genome-wide association studies. This class of models includes linear mixed models and generalized linear mixed models. Existing mixed model approaches to correct for population substructure have been previously investigated with both continuous and case-control response variables. However, they have not been investigated in the context of extreme phenotype sampling (EPS), where genetic covariates are only collected on samples having extreme response variable values. In this work, we compare the performance of existing binary trait mixed model approaches (GMMAT, LEAP and CARAT) on EPS data. Since linear mixed models are commonly used even with binary traits, we also evaluate the performance of a popular linear mixed model implementation (GEMMA). RESULTS: We used simulation studies to estimate the type I error rate and power of all approaches assuming a population with substructure. Our simulation results show that for a common candidate variant, both LEAP and GMMAT control the type I error rate while CARAT's rate remains inflated. We applied all methods to a real dataset from a Québec, Canada, case-control study that is known to have population substructure. We observe similar type I error control with the analysis on the Québec dataset. For rare variants, the false positive rate remains inflated even after correction with mixed model approaches. For methods that control the type I error rate, the estimated power is comparable. CONCLUSIONS: The methods compared in this study differ in their type I error control. Therefore, when data are from an EPS study, care should be taken to ensure that the models underlying the methodology are suitable to the sampling strategy and to the minor allele frequency of the candidate SNPs.


Subjects
Genome-Wide Association Study , Genetic Models , Case-Control Studies , Computer Simulation , Linear Models , Phenotype , Single Nucleotide Polymorphism
18.
Am J Hum Genet ; 104(6): 1169-1181, 2019 06 06.
Article in English | MEDLINE | ID: mdl-31155286

ABSTRACT

Polygenic scores (PSs) are becoming a useful tool to identify individuals with high genetic risk for complex diseases, and several projects are currently testing their utility for translational applications. It is also tempting to use PSs to assess whether genetic variation can explain a part of the geographic distribution of a phenotype. However, it is not well known how the population genetic properties of the training and target samples affect the geographic distribution of PSs. Here, we evaluate geographic differences, and related biases, of PSs in Finland in a geographically well-defined sample of 2,376 individuals from the National FINRISK study. First, we detect geographic differences in PSs for coronary artery disease (CAD), rheumatoid arthritis, schizophrenia, waist-hip ratio (WHR), body-mass index (BMI), and height, but not for Crohn disease or ulcerative colitis. Second, we use height as a model trait to thoroughly assess the possible population genetic biases in PSs and apply similar approaches to the other phenotypes. Most importantly, we detect suspiciously large accumulations of geographic differences for CAD, WHR, BMI, and height, suggesting bias arising from the population's genetic structure rather than from a direct genotype-phenotype association. This work demonstrates how sensitive the geographic patterns of current PSs are for small biases even within relatively homogeneous populations and provides simple tools to identify such biases. A thorough understanding of the effects of population genetic structure on PSs is essential for translational applications of PSs.


Subjects
Genetic Markers , Population Genetics , Multifactorial Inheritance/genetics , Single Nucleotide Polymorphism , Heritable Quantitative Trait , Adult , Aged , Rheumatoid Arthritis/epidemiology , Rheumatoid Arthritis/genetics , Body Mass Index , Ulcerative Colitis/epidemiology , Ulcerative Colitis/genetics , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Crohn Disease/epidemiology , Crohn Disease/genetics , Female , Finland/epidemiology , Genetic Association Studies , Geography , Humans , Male , Middle Aged , Risk Factors , Schizophrenia/epidemiology , Schizophrenia/genetics , Waist-Hip Ratio
19.
Brief Bioinform ; 21(3): 753-761, 2020 05 21.
Article in English | MEDLINE | ID: mdl-30863848

ABSTRACT

Population stratification is usually corrected relying on principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. The need to genotype only a small number of genetic variants that show large differences in allele frequency among subpopulations, so-called ancestry-informative markers (AIMs), instead of the whole genome for stratification adjustment could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure ($IN$-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify type I error rate and statistical power in different case-control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increasing type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although inflated) type I error, followed at some distance by the first eight $IN$-AIMs.


Subjects
Genome-Wide Association Study , Genetic Selection , White People/genetics , Europe , Population Genetics , Humans
20.
Behav Brain Sci ; 46: e207, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35551690

ABSTRACT

The sociogenomics revolution is upon us, we are told. Whether revolutionary or not, sociogenomics is poised to flourish given the ease of incorporating polygenic scores (or PGSs) as "genetic propensities" for complex traits into social science research. Pointing to evidence of ubiquitous heritability and the accessibility of genetic data, scholars have argued that social scientists not only have an opportunity but a duty to add PGSs to social science research. Social science research that ignores genetics is, some proponents argue, at best partial and likely scientifically flawed, misleading, and wasteful. Here, I challenge arguments about the value of genetics for social science and with it the claimed necessity of incorporating PGSs into social science models as measures of genetic influences. In so doing, I discuss the impracticability of distinguishing genetic influences from environmental influences because of non-causal gene-environment correlations, especially population stratification, familial confounding, and downward causation. I explain how environmental effects masquerade as genetic influences in PGSs, which undermines their raison d'être as measures of genetic propensity, especially for complex socially contingent behaviors that are the subject of sociogenomics. Additionally, I draw attention to the partial, unknown biology, while highlighting the persistence of an implicit, unavoidable reductionist genes versus environments approach. Leaving sociopolitical and ethical concerns aside, I argue that the potential scientific rewards of adding PGSs to social science are few and greatly overstated and the scientific costs, which include obscuring structural disadvantages and cultural influences, outweigh these meager benefits for most social science applications.


Subjects
Multifactorial Inheritance , Social Sciences , Humans , Multifactorial Inheritance/genetics , Biology