1.
Am J Hum Genet ; 109(5): 812-824, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35417677

ABSTRACT

The application of genetic relationships among individuals, characterized by a genetic relationship matrix (GRM), has far-reaching effects in human genetics. However, the current standard to calculate the GRM treats linked markers as independent and does not explicitly model the underlying genealogical history of the study sample. Here, we propose a coalescent-informed framework, namely the expected GRM (eGRM), to infer the expected relatedness between pairs of individuals given an ancestral recombination graph (ARG) of the sample. Through extensive simulations, we show that the eGRM is an unbiased estimate of latent pairwise genome-wide relatedness and is robust when computed with ARG inferred from incomplete genetic data. As a result, the eGRM better captures the structure of a population than the canonical GRM, even when using the same genetic information. More importantly, our framework allows a principled approach to estimate the eGRM at different time depths of the ARG, thereby revealing the time-varying nature of population structure in a sample. When applied to SNP array genotypes from a population sample from Northern and Eastern Finland, we find that clustering analysis with the eGRM reveals population structure driven by subpopulations that would not be apparent via the canonical GRM and that temporally the population model is consistent with recent divergence and expansion. Taken together, our proposed eGRM provides a robust tree-centric estimate of relatedness with wide application to genetic studies.
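For orientation, the canonical GRM that this abstract contrasts with the eGRM is commonly computed from standardized genotypes as A = ZZ^T / M, which treats every marker as independent. The following is a minimal sketch under that assumption, in plain Python; the function name `canonical_grm` and the toy input are hypothetical, and the code does not implement the eGRM or use an ARG.

```python
def canonical_grm(genotypes):
    """Canonical GRM sketch: A = Z Z^T / M from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per SNP.
    Monomorphic SNPs are dropped because they cannot be standardized.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample allele frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    keep = [j for j in range(m) if 0.0 < freqs[j] < 1.0]
    if not keep:
        raise ValueError("no polymorphic SNPs to standardize")
    # Center by 2p and scale by sqrt(2p(1-p)), SNP by SNP.
    z = [
        [(row[j] - 2 * freqs[j]) / (2 * freqs[j] * (1 - freqs[j])) ** 0.5
         for j in keep]
        for row in genotypes
    ]
    # Average the per-SNP products over the retained markers.
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / len(keep) for k in range(n)]
        for i in range(n)
    ]


toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
grm = canonical_grm(toy)
```

Because each SNP column is centered on the sample frequency, every row of the resulting matrix sums to zero, which is one quick sanity check on an implementation.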


Subjects
Genome ; Models, Genetic ; Finland ; Genetics, Population ; Genotype ; Humans
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36585781

ABSTRACT

Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and by the application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe different scenarios that can create numerical pitfalls and lead to incorrect conclusions in some instances. We consider scenarios in which PS is assessed based on loci that are located across the genome ('globally') and based on loci from a specific genomic region ('locally'). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (0.5-5%) and rare (<0.5%) single-nucleotide variations (SNVs). Overall, we observe that all approaches provide the best clustering performance when computed based on rare SNVs. The performance of the similarity matrices is very similar for common and low-frequency variants, but for rare variants, the unweighted Jaccard matrix provides preferable clustering features. Based on visual inspection and in terms of standard clustering metrics, its clusters are the densest and the best separated in the principal component analysis of variants with rare SNVs compared with the other methods and different allele frequency cutoffs. In an application, we assessed the role of rare variants on local and global PS, using WGS data from multiethnic Alzheimer's disease data sets and European or East Asian populations from the 1000 Genomes Project.
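As a rough illustration of the unweighted Jaccard similarity discussed in this abstract, the sketch below scores each pair of individuals by the number of variants both carry divided by the number either carries. It assumes that "carrying" means a nonzero minor-allele count and that an empty union scores 0.0; both choices, and the name `jaccard_matrix`, are assumptions made for the example rather than details taken from the paper.

```python
def jaccard_matrix(genotypes):
    """Unweighted Jaccard similarity between individuals (sketch).

    genotypes: list of individuals, each a list of minor-allele counts.
    A variant counts as carried when its count is greater than zero.
    """
    carried = [{j for j, g in enumerate(row) if g > 0} for row in genotypes]
    n = len(carried)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            union = carried[i] | carried[k]
            # Assumption: two individuals with no carried variants score 0.0.
            value = len(carried[i] & carried[k]) / len(union) if union else 0.0
            sim[i][k] = sim[k][i] = value
    return sim


sim = jaccard_matrix([[1, 0, 2, 0], [0, 0, 1, 1], [0, 0, 0, 0]])
```

The empty-union case is one concrete example of the kind of numerical pitfall the abstract mentions: with rare variants, some individuals may carry none of the selected loci.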


Subjects
Genome ; Genomics ; Principal Component Analysis ; Gene Frequency ; Computer Simulation ; Genome-Wide Association Study ; Polymorphism, Single Nucleotide
3.
Am J Hum Genet ; 108(5): 825-839, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33836139

ABSTRACT

In genome-wide association studies, ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, because of the lack of analysis tools, methods designed for binary or quantitative traits are commonly used inappropriately to analyze categorical phenotypes. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, proportional odds logistic mixed model (POLMM). POLMM is computationally efficient to analyze large datasets with hundreds of thousands of samples, can control type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than alternative methods. In contrast, the standard linear mixed model approaches cannot control type I error rates for rare variants when the phenotypic distribution is unbalanced, although they performed well when testing common variants. We applied POLMM to 258 ordinal categorical phenotypes on array genotypes and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which, 424 variants (7.2%) are rare variants with MAF < 0.01.
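To make the proportional odds idea concrete, the sketch below turns a set of ordered cutpoints and one linear predictor into category probabilities, using the common form logit P(Y <= k) = cutpoint_k - eta. The sign convention, the function name `category_probabilities`, and the example values are assumptions for illustration; the random effects and the association test that define POLMM are not shown.

```python
import math


def category_probabilities(cutpoints, eta):
    """Category probabilities under a proportional odds model (sketch).

    cutpoints: increasing thresholds, one fewer than the number of categories.
    eta: linear predictor for one individual (covariates, genotype, etc.).
    """
    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints]
    cumulative.append(1.0)
    probs = []
    previous = 0.0
    for value in cumulative:
        # Each category gets the mass between consecutive cumulative values.
        probs.append(value - previous)
        previous = value
    return probs


probs = category_probabilities([-1.0, 1.0], eta=0.0)
```

Raising `eta` shifts probability mass toward the higher categories while the cutpoints stay fixed, which is the "proportional odds" assumption in this parameterization.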


Subjects
Computer Simulation ; Genome-Wide Association Study ; Models, Genetic ; Phenotype ; Biological Specimen Banks ; Child ; Female ; Humans ; Male ; Research Design ; United Kingdom
4.
Curr Genomics ; 17(5): 439-443, 2015 Oct.
Article in English | MEDLINE | ID: mdl-28217000

ABSTRACT

Analytical models usually assume an additive sex effect, treating sex as a covariate when identifying genetic associations with sex-influenced traits. This assumption is violated when interactions between sex and genetic factors, or heterogeneous genetic effects by sex, are ignored. Methods for dealing with these problems are compared and discussed in this article. In particular, heterogeneity of genetic variance by sex can be assessed using a mixed model with a genetic relationship matrix constructed from genome-wide nucleotide variant information. Estimating the genetic architecture of each sex would help explain differences in the prevalence, course, and severity of complex diseases between women and men in the era of personalized medicine.

5.
Adv Genet (Hoboken) ; 3(3): 2100066, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36620199

ABSTRACT

Recent advances in sequencing technologies enable genome-wide analyses for thousands of individuals. The sequence kernel association test (SKAT) is a widely used method to test for associations between a phenotype and a set of rare variants. As the sample size of human genetics studies increases, the computational time required to calculate a kernel is becoming increasingly problematic. In this study, a new method to obtain kernel statistics without calculating a kernel matrix is proposed. The method covers two kernel statistics: one based on a genetic relationship matrix (GRM) and one based on an identity-by-state (IBS) matrix. With this method, the kernel statistics can be calculated using vector operations rather than matrix operations. The proposed method enables SKAT to be conducted for large samples in human genetics.
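The general idea of avoiding the kernel matrix can be illustrated with a weighted linear kernel K = G W G^T, for which the statistic Q = r^T K r equals the weighted sum of squared per-variant scores. The sketch below compares both routes on toy data; it is an illustration of that algebraic identity under stated assumptions, not the paper's algorithm, and the function names, residuals `r`, and weights `w` are hypothetical.

```python
def kernel_stat_via_matrix(G, r, w):
    """Q = r^T (G W G^T) r, building the full n-by-n kernel matrix."""
    n = len(G)
    m = len(w)
    K = [
        [sum(w[j] * G[i][j] * G[k][j] for j in range(m)) for k in range(n)]
        for i in range(n)
    ]
    return sum(r[i] * K[i][k] * r[k] for i in range(n) for k in range(n))


def kernel_stat_via_vectors(G, r, w):
    """Same Q from per-variant scores, never forming the kernel matrix."""
    total = 0.0
    for j, weight in enumerate(w):
        # Score for variant j: genotype column dotted with the residuals.
        score = sum(G[i][j] * r[i] for i in range(len(G)))
        total += weight * score * score
    return total


G = [[0, 1], [1, 2], [2, 0]]   # 3 individuals, 2 variants
r = [0.5, -1.0, 0.5]           # phenotype residuals (hypothetical)
w = [1.0, 2.0]                 # variant weights (hypothetical)
q_matrix = kernel_stat_via_matrix(G, r, w)
q_vector = kernel_stat_via_vectors(G, r, w)
```

The matrix route needs memory that grows with the square of the sample size, whereas the vector route needs only one pass over each genotype column, which is why the distinction matters for large cohorts.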

6.
Biol Psychiatry ; 83(7): 598-606, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29100628

ABSTRACT

BACKGROUND: Recent analyses of trait-disorder overlap suggest that psychiatric dimensions may relate to distinct sets of genes that exert maximum influence during different periods of development. This includes analyses of social communication difficulties that share, depending on their developmental stage, stronger genetic links with either autism spectrum disorder or schizophrenia. We developed a multivariate analysis framework in unrelated individuals to model directly the developmental profile of genetic influences contributing to complex traits, such as social communication difficulties, during an approximately 10-year period spanning childhood and adolescence. METHODS: Longitudinally assessed quantitative social communication problems (N ≤ 5551) were studied in participants from a United Kingdom birth cohort (Avon Longitudinal Study of Parents and Children; age range, 8-17 years). Using standardized measures, genetic architectures were investigated with novel multivariate genetic-relationship-matrix structural equation models incorporating whole-genome genotyping information. Analogous to twin research, genetic-relationship-matrix structural equation models included Cholesky decomposition, common pathway, and independent pathway models. RESULTS: A two-factor Cholesky decomposition model described the data best. One genetic factor was common to Social Communication Disorder Checklist measures across development; the other accounted for independent variation at 11 years and later, consistent with distinct developmental profiles in trait-disorder overlap. Importantly, genetic factors operating at 8 years explained only approximately 50% of genetic variation at 17 years. CONCLUSIONS: Using latent factor models, we identified developmental changes in the genetic architecture of social communication difficulties that enhance the understanding of autism spectrum disorder- and schizophrenia-related dimensions. 
More generally, genetic-relationship-matrix structural equation models present a framework for modeling shared genetic etiologies between phenotypes and can provide prior information with respect to patterns and continuity of trait-disorder overlap.


Subjects
Adolescent Development/physiology ; Child Development/physiology ; Genetic Variation ; Genome-Wide Association Study ; Models, Statistical ; Social Communication Disorder/genetics ; Social Communication Disorder/physiopathology ; Adolescent ; Child ; Female ; Humans ; Longitudinal Studies ; Male ; Multivariate Analysis ; United Kingdom
7.
BioData Min ; 11: 23, 2018.
Article in English | MEDLINE | ID: mdl-30410580

ABSTRACT

BACKGROUND: ReliefF is a nearest-neighbor-based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding. METHODS: We introduce a new two-dimensional transition/transversion genotype encoding for ReliefF, and we implement three ReliefF attribute metrics: (1) genotype mismatch (GM), which is the ReliefF standard; (2) allele mismatch (AM), which accounts for heterozygous differences and has not been used previously in ReliefF; and (3) the new transition/transversion metric. We incorporate these attribute metrics into the ReliefF nearest-neighbor calculation with a Manhattan metric, and we introduce GRM as a new ReliefF nearest-neighbor metric to adjust for allele frequency heterogeneity. RESULTS: We apply ReliefF with each metric to a GWAS of major depressive disorder and compare the detection of genes in pathways implicated in depression, including Axon Guidance, Neuronal System, and G Protein-Coupled Receptor Signaling. We also compare with detection by Random Forest and Lasso as well as random/null selection to assess pathway size bias. CONCLUSIONS: Our results suggest that using more genetically motivated encodings, such as transition/transversion, and metrics that adjust for allele frequency heterogeneity, such as GRM, leads to ReliefF attribute scores with improved pathway enrichment.
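The difference between the genotype mismatch and allele mismatch metrics can be sketched as two per-attribute difference functions summed into a Manhattan distance. The definitions below assume genotypes coded as 0/1/2 allele counts, GM as a binary mismatch, and AM as the absolute count difference divided by 2; these forms and all function names are assumptions for illustration, and the transition/transversion encoding and GRM metric are not shown.

```python
def genotype_mismatch(a, b):
    """GM sketch: 1 when the two genotypes differ at all, else 0."""
    return 0.0 if a == b else 1.0


def allele_mismatch(a, b):
    """AM sketch (assumed form): fraction of the two alleles that differ."""
    return abs(a - b) / 2.0


def manhattan_distance(x, y, diff):
    """Sum of per-attribute differences between two individuals."""
    return sum(diff(a, b) for a, b in zip(x, y))


x = [0, 1, 2, 2]
y = [2, 1, 1, 0]
d_gm = manhattan_distance(x, y, genotype_mismatch)
d_am = manhattan_distance(x, y, allele_mismatch)
```

Under these assumed definitions, a heterozygote is half as far from either homozygote as the two homozygotes are from each other, which GM cannot express.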

8.
Evol Bioinform Online ; 13: 1176934316688663, 2017.
Article in English | MEDLINE | ID: mdl-28469375

ABSTRACT

We introduce software, Numericware i, to compute an identical-by-state (IBS) matrix based on genotypic data. Calculating an IBS matrix with a large dataset requires a large amount of computer memory and lengthy processing time. Numericware i addresses these challenges with 2 algorithmic methods: multithreading and forward chopping. Multithreading allows computational routines to run concurrently on multiple central processing unit (CPU) processors. Forward chopping addresses memory limitations by dividing a dataset into appropriately sized subsets. Numericware i allows calculation of the IBS matrix for a large genotypic dataset using a laptop or a desktop computer. For comparison with different software, we calculated genetic relationship matrices using Numericware i, SPAGeDi, and TASSEL with the same genotypic dataset. Numericware i calculates IBS coefficients between 0 and 2, whereas SPAGeDi and TASSEL produce different ranges of values, including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high at 0.9972, whereas SPAGeDi showed low correlation with Numericware i (0.0505) and TASSEL (0.0587). With a high-dimensional dataset of 500 entities by 10 000 000 SNPs, Numericware i took 382 minutes using 19 CPU threads and 64 GB of memory by dividing the dataset into 3 pieces, whereas SPAGeDi and TASSEL failed with the same dataset. Numericware i is freely available for Windows and Linux under the CC-BY 4.0 license at https://figshare.com/s/f100f33a8857131eb2db.
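The idea of bounding memory by processing the data in subsets can be sketched as an IBS computation that accumulates pairwise sums one block of SNPs at a time. The sketch assumes 0/1/2 genotype coding, an IBS coefficient equal to the mean of 2 - |g_i - g_k| over SNPs (which lies between 0 and 2), and chunking along the SNP axis; these are assumptions for illustration and not a description of how Numericware i implements forward chopping. The name `ibs_matrix` is hypothetical, and multithreading is omitted.

```python
def ibs_matrix(genotypes, chunk_size):
    """IBS coefficients in [0, 2], accumulated over chunks of SNPs (sketch)."""
    n = len(genotypes)
    m = len(genotypes[0])
    totals = [[0] * n for _ in range(n)]
    for start in range(0, m, chunk_size):
        stop = min(start + chunk_size, m)
        # Only SNPs in [start, stop) are touched in this pass.
        for i in range(n):
            for k in range(i, n):
                shared = sum(
                    2 - abs(genotypes[i][j] - genotypes[k][j])
                    for j in range(start, stop)
                )
                totals[i][k] += shared
                if k != i:
                    totals[k][i] += shared
    # Average the shared-allele counts over all SNPs.
    return [[t / m for t in row] for row in totals]


toy = [[0, 1, 2], [2, 1, 0], [0, 1, 1]]
ibs = ibs_matrix(toy, chunk_size=2)
```

Because the per-pair totals are simple sums, the result does not depend on the chunk size; only the amount of genotype data that has to be held at once changes.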
