Results 1 - 13 of 13
1.
Nat Commun ; 15(1): 5229, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898015

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has been widely used to characterize cell types based on their average gene expression profiles. However, most studies do not consider cell type-specific variation across donors. Modelling this cell type-specific inter-individual variation could help elucidate cell type-specific biology and identify the genes and cell types underlying complex traits. We therefore develop CTMM (Cell Type-specific linear Mixed Model), a new model to detect and quantify cell type-specific variation across individuals. We use extensive simulations to show that CTMM is powerful and unbiased in realistic settings. We also derive calibrated tests for cell type-specific inter-individual variation, which is challenging given the modest sample sizes in scRNA-seq. We apply CTMM to scRNA-seq data from human induced pluripotent stem cells to characterize the transcriptomic variation across donors as cells differentiate into endoderm. We find that almost 100% of transcriptome-wide variability between donors is differentiation stage-specific. CTMM also identifies individual genes with statistically significant stage-specific variability across samples, including 85 genes that do not have significant stage-specific mean expression. Finally, we extend CTMM to partition inter-individual covariance between stages, which recapitulates the overall differentiation trajectory. Overall, CTMM is a powerful tool to illuminate cell type-specific biology in scRNA-seq.


Subjects
Cell Differentiation, Induced Pluripotent Stem Cells, Sequence Analysis, RNA, Single-Cell Analysis, Transcriptome, Humans, Single-Cell Analysis/methods, Sequence Analysis, RNA/methods, Induced Pluripotent Stem Cells/metabolism, Induced Pluripotent Stem Cells/cytology, Cell Differentiation/genetics, Gene Expression Profiling/methods, RNA-Seq/methods, Endoderm/cytology, Endoderm/metabolism
2.
Nat Genet ; 55(12): 2269-2276, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37985819

ABSTRACT

Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or 'fill-in' missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.


Subjects
Deep Learning, Genome-Wide Association Study, Humans, Genome-Wide Association Study/methods, Genotype, Biological Specimen Banks, Polymorphism, Single Nucleotide, Phenotype
3.
Am J Hum Genet ; 110(11): 1875-1887, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37922884

ABSTRACT

Epistasis is central in many domains of biology, but it has not yet been proven useful for understanding the etiology of complex traits. This is partly because complex-trait epistasis involves polygenic interactions that are poorly captured in current models. To address this gap, we developed a model called Epistasis Factor Analysis (EFA). EFA assumes that polygenic epistasis can be factorized into interactions between a few epistasis factors (EFs), which represent latent polygenic components of the observed complex trait. The statistical goals of EFA are to improve polygenic prediction and to increase power to detect epistasis, while the biological goal is to unravel genetic effects into more-homogeneous units. We mathematically characterize EFA and use simulations to show that EFA outperforms current epistasis models when its assumptions approximately hold. Applied to predicting yeast growth rates, EFA outperforms the additive model for several traits with large epistasis heritability and uniformly outperforms the standard epistasis model. We replicate these prediction improvements in a second dataset. We then apply EFA to four previously characterized traits in the UK Biobank and find statistically significant epistasis in all four, including two that are robust to scale transformation. Moreover, we find that the inferred EFs partly recover pre-defined biological pathways for two of the traits. Our results demonstrate that more realistic models can identify biologically and statistically meaningful epistasis in complex traits, indicating that epistasis has potential for precision medicine and characterizing the biology underlying GWAS results.


Subjects
Epistasis, Genetic, Multifactorial Inheritance, Humans, Multifactorial Inheritance/genetics, Polymorphism, Single Nucleotide, Phenotype, Models, Genetic
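A minimal Python sketch of the factorization idea in the EFA abstract, under assumptions that are ours rather than the paper's: each epistasis factor is taken to be a weighted sum of SNP dosages, the phenotype is modelled as the sum of the factors plus pairwise products of factors, and `factorized_epistasis_prediction` and `epistasis_parameter_counts` are hypothetical names. The published EFA model and its fitting procedure may differ. What the sketch shows is the parameter saving: with p SNPs and K factors, interactions are described by K*p weights plus K*(K-1)/2 factor-pair coefficients instead of p*(p-1)/2 SNP-pair coefficients.

```python
def factorized_epistasis_prediction(genotypes, factor_weights, interactions):
    """Illustrative phenotype prediction: sum of factors plus factor-pair products.

    genotypes: per-individual lists of p SNP dosages.
    factor_weights: K lists of p SNP weights, one per (hypothetical) epistasis factor.
    interactions: K x K matrix; only entries with k < l are used.
    """
    n_factors = len(factor_weights)
    predictions = []
    for dosages in genotypes:
        # Each factor is a latent polygenic score: a weighted sum of dosages.
        factors = [sum(w * x for w, x in zip(weights, dosages)) for weights in factor_weights]
        additive = sum(factors)
        # Epistasis enters only through products of factors, not of individual SNPs.
        epistatic = sum(
            interactions[k][l] * factors[k] * factors[l]
            for k in range(n_factors)
            for l in range(k + 1, n_factors)
        )
        predictions.append(additive + epistatic)
    return predictions


def epistasis_parameter_counts(p, k):
    """Coefficients needed by an all-SNP-pairs model vs. this factorized sketch."""
    snp_pairs = p * (p - 1) // 2
    factorized = k * p + k * (k - 1) // 2  # factor weights + factor-pair coefficients
    return snp_pairs, factorized
```

For example, `epistasis_parameter_counts(10_000, 3)` returns `(49995000, 30003)`: the factorized form needs orders of magnitude fewer coefficients, at the cost of assuming that interactions act only through a few latent factors.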
4.
bioRxiv ; 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36909553

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) offers opportunities to characterize cellular heterogeneity at unprecedented resolution. Although scRNA-seq has been widely used to identify and characterize gene expression variation across cell types and cell states based on their average gene expression profiles, most studies ignore variation across individual donors. Modelling this inter-individual variation could improve statistical power to detect cell type-specific biology and help identify the genes and cell types that underlie complex traits. We therefore develop CTMM (Cell Type-specific linear Mixed Model), a new model to detect and quantify cell type-specific variation across individuals. CTMM operates on cell type-specific pseudobulk expression and is fit with efficient methods that scale to hundreds of samples. We use extensive simulations to show that CTMM is powerful and unbiased in realistic settings. We also derive calibrated tests for cell type-specific inter-individual variation, which is challenging given the modest sample sizes in scRNA-seq data. We apply CTMM to scRNA-seq data from human induced pluripotent stem cells to characterize the transcriptomic variation across donors as cells differentiate into endoderm. We find that almost 100% of transcriptome-wide variability between donors is differentiation stage-specific. CTMM also identifies individual genes with statistically significant stage-specific variability across samples, including 61 genes that do not have significant stage-specific mean expression. Finally, we extend CTMM to partition inter-individual covariance between stages, which recapitulates the overall differentiation trajectory. Overall, CTMM is a powerful tool to characterize a novel dimension of cell type-specific biology in scRNA-seq.
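The CTMM abstract notes that the model operates on cell type-specific pseudobulk expression. As a sketch of that input step only, assuming a hypothetical per-gene input of `(donor, cell_type, expression)` tuples, the Python below averages expression within each donor and cell type, then reports the naive across-donor variance per cell type. That variance is a descriptive summary, not CTMM: it ignores noise from finite cell counts and covariance between cell types, which a mixed model is meant to account for.

```python
from collections import defaultdict
from statistics import mean, variance


def pseudobulk(cells):
    """Average one gene's expression over cells within each (donor, cell type) pair.

    cells: iterable of (donor, cell_type, expression) tuples (hypothetical format).
    Returns {(donor, cell_type): mean expression}.
    """
    groups = defaultdict(list)
    for donor, cell_type, expression in cells:
        groups[(donor, cell_type)].append(expression)
    return {key: mean(values) for key, values in groups.items()}


def naive_between_donor_variance(pseudobulk_means):
    """Sample variance of pseudobulk means across donors, separately per cell type.

    Descriptive only: ignores sampling noise from finite cell counts and any
    covariance between cell types, both of which a mixed model would handle.
    """
    by_cell_type = defaultdict(list)
    for (_donor, cell_type), value in pseudobulk_means.items():
        by_cell_type[cell_type].append(value)
    # Cell types observed in fewer than two donors have no defined variance.
    return {ct: variance(vals) for ct, vals in by_cell_type.items() if len(vals) > 1}
```

For a toy input in which donors `d1` and `d2` both average 2.0 at `stage0` but average 5.0 and 1.0 at `stage1`, the naive variance is 0.0 at `stage0` and 8.0 at `stage1`, so all of the between-donor variation in this toy example is stage-specific.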

5.
Science ; 378(6621): 754-761, 2022 11 18.
Article in English | MEDLINE | ID: mdl-36395242

ABSTRACT

The observation of genetic correlations between disparate human traits has been interpreted as evidence of widespread pleiotropy. Here, we introduce cross-trait assortative mating (xAM) as an alternative explanation. We observe that xAM affects many phenotypes and that phenotypic cross-mate correlation estimates are strongly associated with genetic correlation estimates (R² = 74%). We demonstrate that existing xAM plausibly accounts for substantial fractions of genetic correlation estimates and that previously reported genetic correlation estimates between some pairs of psychiatric disorders are congruent with xAM alone. Finally, we provide evidence for a history of xAM at the genetic level using cross-trait even/odd chromosome polygenic score correlations. Together, our results demonstrate that previous reports have likely overestimated the true genetic similarity between many phenotypes.


Subjects
Genome-Wide Association Study, Multifactorial Inheritance, Humans, Cell Communication, Phenotype
6.
Nat Commun ; 12(1): 3505, 2021 06 09.
Article in English | MEDLINE | ID: mdl-34108472

ABSTRACT

Hundreds of thousands of genetic variants have been reported to cause severe monogenic diseases, but the probability that a variant carrier develops the disease (termed penetrance) is unknown for virtually all of them. Additionally, the clinical utility of common polygenic variation remains uncertain. Using exome sequencing from 77,184 adult individuals (38,618 multi-ancestral individuals from a type 2 diabetes case-control study and 38,566 participants from the UK Biobank, for whom genotype array data were also available), we apply clinical standard-of-care gene variant curation for eight monogenic metabolic conditions. Rare variants causing monogenic diabetes and dyslipidemias display effect sizes significantly larger than the top 1% of the corresponding polygenic scores. Nevertheless, penetrance estimates for monogenic variant carriers average 60% or lower for most conditions. We assess epidemiologic and genetic factors contributing to risk prediction in monogenic variant carriers, demonstrating that inclusion of polygenic variation significantly improves biomarker estimation for two monogenic dyslipidemias.


Subjects
Diabetes Mellitus, Type 2/genetics, Dyslipidemias/genetics, Genetic Predisposition to Disease/genetics, Adult, Biological Variation, Population, Biomarkers/metabolism, Diabetes Mellitus, Type 2/metabolism, Dyslipidemias/metabolism, Exome/genetics, Genotype, Humans, Multifactorial Inheritance, Penetrance, Risk Assessment
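The penetrance abstract defines penetrance as the probability that a variant carrier develops the disease. The simplest estimate is the fraction of carriers who are affected; the Python sketch below adds a Wilson score interval, a standard binomial confidence interval chosen here for illustration rather than taken from the study. It assumes carriers are an unselected sample. In a case-control design such as the diabetes study described, cases are over-sampled, so this raw fraction would overstate penetrance unless ascertainment is corrected for. The function name is hypothetical.

```python
import math


def penetrance_with_wilson_interval(affected_carriers, total_carriers, z=1.96):
    """Raw penetrance (affected / carriers) with a Wilson score interval.

    Assumes carriers are an unselected sample; z=1.96 gives roughly 95% coverage.
    Returns (estimate, lower bound, upper bound).
    """
    if total_carriers <= 0 or not 0 <= affected_carriers <= total_carriers:
        raise ValueError("need 0 <= affected_carriers <= total_carriers and total_carriers > 0")
    n = total_carriers
    p = affected_carriers / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For example, 30 affected among 50 carriers gives an estimate of 0.6 with an interval of roughly 0.46 to 0.72.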
7.
Proc Natl Acad Sci U S A ; 118(15)2021 04 13.
Article in English | MEDLINE | ID: mdl-33833052

ABSTRACT

Interactions between genetic variants (epistasis) are pervasive in model systems and can profoundly affect evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models that assume epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis, on balance skewing the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.


Subjects
Epistasis, Genetic, Models, Genetic, Multifactorial Inheritance/genetics, Evolution, Molecular, Genetic Predisposition to Disease, Humans, Quantitative Trait, Heritable
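The coordinated-epistasis abstract does not spell out the form of the EO test. Assuming it regresses a phenotype on polygenic scores computed separately from even- and odd-numbered chromosomes plus their product, and reads coordinated epistasis off the product's coefficient, a point-estimate sketch in plain Python looks like this. Splitting by chromosome parity means the two scores are built from unlinked variants. The function names are hypothetical, and a usable test would also need a standard error and a calibrated null distribution, which this sketch omits.

```python
def ols(design, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan, partial pivoting)."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def fit_even_odd_model(score_even, score_odd, phenotype):
    """Fit y ~ 1 + even + odd + even*odd; returns [intercept, even, odd, interaction]."""
    design = [[1.0, e, o, e * o] for e, o in zip(score_even, score_odd)]
    return ols(design, phenotype)


def even_odd_interaction_coefficient(score_even, score_odd, phenotype):
    """Point estimate of the even-by-odd product term (no standard error)."""
    return fit_even_odd_model(score_even, score_odd, phenotype)[3]
```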
8.
Annu Rev Genomics Hum Genet ; 21: 413-435, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32873077

ABSTRACT

Disease classification, or nosology, was historically driven by careful examination of clinical features of patients. As technologies to measure and understand human phenotypes advanced, so too did classifications of disease, and the advent of genetic data has led to a surge in genetic subtyping in the past decades. Although the fundamental process of refining disease definitions and subtypes is shared across diverse fields, each field is driven by its own goals and technological expertise, leading to inconsistent and conflicting definitions of disease subtypes. Here, we review several classical and recent subtypes and subtyping approaches and provide concrete definitions to delineate subtypes. In particular, we focus on subtypes with distinct causal disease biology, which are of primary interest to scientists, and subtypes with pragmatic medical benefits, which are of primary interest to physicians. We propose genetic heterogeneity as a gold standard for establishing biologically distinct subtypes of complex polygenic disease. We focus especially on methods to find and validate genetic subtypes, emphasizing common pitfalls and how to avoid them.


Subjects
Biomarkers/analysis, Genetic Diseases, Inborn/genetics, Genetic Predisposition to Disease, Multifactorial Inheritance, Mutation, Neoplasms/genetics, Gene Expression Regulation, Neoplastic, Genetic Association Studies, Genetic Diseases, Inborn/classification, Genetic Diseases, Inborn/pathology, Humans, Neoplasms/classification, Neoplasms/pathology
9.
Genetics ; 215(2): 343-357, 2020 06.
Article in English | MEDLINE | ID: mdl-32291292

ABSTRACT

We consider the problem of interpreting negative maximum likelihood estimates of heritability that sometimes arise from popular statistical models of additive genetic variation. These may result from random noise acting on estimates of genuinely positive heritability, but we argue that they may also arise from misspecification of the standard additive mechanism that is supposed to justify the statistical procedure. Researchers should be open to the possibility that negative heritability estimates could reflect a real physical feature of the biological process from which the data were sampled.


Subjects
Genetic Variation, Models, Genetic, Models, Statistical, Multifactorial Inheritance, Phenotype, Quantitative Trait, Heritable, Humans
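To see how an unconstrained variance-component estimate can fall below zero, consider a simpler method-of-moments analogue rather than the maximum likelihood setting the negative-heritability abstract discusses: the one-way ANOVA estimate of between-family variance from a balanced family design, (MSB - MSW)/k, where k is the family size. Whenever the between-family mean square is smaller than the within-family mean square, the estimate is negative, and clamping it to zero would hide that. The family data layout and the function name are illustrative assumptions.

```python
from statistics import mean


def anova_between_family_variance(families):
    """Unconstrained one-way ANOVA estimate (MSB - MSW) / k for a balanced design.

    families: list of equal-length lists of phenotype values, one list per family.
    Returns (between-family variance estimate, within-family mean square).
    """
    k = len(families[0])
    a = len(families)
    if a < 2 or k < 2 or any(len(f) != k for f in families):
        raise ValueError("need >= 2 families of equal size >= 2")
    grand_mean = mean(v for f in families for v in f)
    family_means = [mean(f) for f in families]
    msb = k * sum((m - grand_mean) ** 2 for m in family_means) / (a - 1)
    msw = sum((v - m) ** 2 for f, m in zip(families, family_means) for v in f) / (a * (k - 1))
    # Deliberately not truncated at zero: negative whenever MSB < MSW.
    return (msb - msw) / k, msw
```

For families `[1.0, 3.0]`, `[2.0, 2.0]` and `[3.0, 1.0]`, every family mean equals the grand mean, so MSB = 0, MSW = 4/3, and the between-family estimate is -2/3.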
10.
Am J Hum Genet ; 106(1): 71-91, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31901249

ABSTRACT

Gene-environment interactions (GxE) can be fundamental in applications ranging from functional genomics to precision medicine and are a conjectured source of substantial heritability. However, unbiased methods to profile GxE genome-wide are nascent and, as we show, cannot accommodate general environment variables, modest sample sizes, heterogeneous noise, and binary traits. To address this gap, we propose a simple, unifying mixed model for gene-environment interaction (GxEMM). In simulations and theory, we show that GxEMM can dramatically improve estimates and eliminate false positives when the assumptions of existing methods fail. We apply GxEMM to a range of human and model organism datasets and find broad evidence of context-specific genetic effects, including GxSex, GxAdversity, and GxDisease interactions across thousands of clinical and molecular phenotypes. Overall, GxEMM is broadly applicable for testing and quantifying polygenic interactions, which can be useful for explaining heritability and invaluable for determining biologically relevant environments.


Subjects
Gene-Environment Interaction, Genetic Markers, Mental Disorders/genetics, Mental Disorders/pathology, Models, Genetic, Multifactorial Inheritance/genetics, Adult, Animals, Computer Simulation, Female, Genome-Wide Association Study, Humans, Male, Middle Aged, Phenomics, Phenotype, Rats
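A common way to encode polygenic gene-environment interaction in a mixed model is a second genetic kernel formed as the elementwise (Hadamard) product of the genetic relatedness matrix K with the outer product of an environment vector e, so that the GxE contribution to the covariance between individuals i and j scales with both their relatedness K[i][j] and the product of their environment values. The Python sketch below builds such a covariance with per-individual noise variances, reflecting the GxEMM abstract's point about heterogeneous noise. It is a generic construction under assumed inputs (a single environment vector `env`, variance components supplied by the caller, hypothetical function names), not GxEMM's published parameterization or estimation code.

```python
def gxe_kernel(K, env):
    """Elementwise product of K with e e^T: entry (i, j) is K[i][j] * env[i] * env[j]."""
    n = len(K)
    return [[K[i][j] * env[i] * env[j] for j in range(n)] for i in range(n)]


def phenotype_covariance(K, env, s2_genetic, s2_gxe, noise_variances):
    """Covariance s2_genetic*K + s2_gxe*(K elementwise e e^T) + diag(noise_variances).

    noise_variances holds one residual variance per individual, allowing
    heterogeneous noise (for example, different variances per environment).
    """
    kernel = gxe_kernel(K, env)
    n = len(K)
    return [
        [
            s2_genetic * K[i][j]
            + s2_gxe * kernel[i][j]
            + (noise_variances[i] if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
```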
11.
PLoS Genet ; 15(4): e1008009, 2019 04.
Article in English | MEDLINE | ID: mdl-30951530

ABSTRACT

Recent and classical work has revealed biologically and medically significant subtypes in complex diseases and traits. However, relevant subtypes are often unknown, unmeasured, or actively debated, making automated statistical approaches to subtype definition valuable. We propose reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. Unlike existing approaches relying on off-the-shelf clustering methods, RGWAS uses a novel decomposition, MFMR, to model covariates, binary traits, and population structure. We use extensive simulations to show that modelling these features can be crucial for power and calibration. We validate RGWAS in practice by recovering a recently discovered stress subtype in major depression. We then show the utility of RGWAS by identifying three novel subtypes of metabolic traits. We biologically validate these metabolic subtypes with SNP-level tests and a novel polygenic test: the former recover known metabolic GxE SNPs; the latter suggests subtypes may explain substantial missing heritability. Crucially, statins, which are widely prescribed and theorized to increase diabetes risk, have opposing effects on blood glucose across metabolic subtypes, suggesting the subtypes have potential translational value.


Subjects
Genome-Wide Association Study/methods, Models, Genetic, Multifactorial Inheritance, Phenotype, Algorithms, Blood Glucose/drug effects, Blood Glucose/genetics, Cluster Analysis, Computer Simulation, Coronary Disease/blood, Coronary Disease/drug therapy, Coronary Disease/genetics, Depressive Disorder, Major/classification, Depressive Disorder, Major/genetics, Diabetes Mellitus, Type 2/blood, Diabetes Mellitus, Type 2/drug therapy, Diabetes Mellitus, Type 2/genetics, Genome-Wide Association Study/statistics & numerical data, Humans, Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology, Lipids/blood, Polymorphism, Single Nucleotide, Prediabetic State/genetics, Quantitative Trait Loci
12.
Genetics ; 211(4): 1179-1189, 2019 04.
Article in English | MEDLINE | ID: mdl-30692194

ABSTRACT

High-throughput measurements of molecular phenotypes provide an unprecedented opportunity to model cellular processes and their impact on disease. These highly structured datasets are usually strongly confounded, creating false positives and reducing power. This has motivated many approaches based on principal components analysis (PCA) to estimate and correct for confounders, which have become indispensable elements of association tests between molecular phenotypes and both genetic and nongenetic factors. Here, we show that these correction approaches induce a bias, and that it persists for large sample sizes and replicates out-of-sample. We prove this theoretically for PCA by deriving an analytic, deterministic, and intuitive bias approximation. We assess other methods with realistic simulations, which show that perturbing any of several basic parameters can cause false positive rate (FPR) inflation. Our experiments show the bias depends on covariate and confounder sparsity, effect sizes, and their correlation. Surprisingly, when the covariate and confounder have [Formula: see text], standard two-step methods all have [Formula: see text]-fold FPR inflation. Our analysis informs best practices for confounder correction in genomic studies, and suggests many false discoveries have been made and replicated in some differential expression analyses.


Subjects
Genome-Wide Association Study/methods, Phenotype, Principal Component Analysis/methods, Animals, Genome-Wide Association Study/standards, Humans, Models, Genetic, Principal Component Analysis/standards, Quantitative Trait Loci, Reproducibility of Results
13.
Am J Psychiatry ; 175(6): 545-554, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29495898

ABSTRACT

OBJECTIVE: The extent to which major depression is the outcome of a single biological mechanism or represents a final common pathway of multiple disease processes remains uncertain. Genetic approaches can potentially identify etiologic heterogeneity in major depression by classifying patients on the basis of their experience of major adverse events. METHOD: Data are from the China, Oxford, and VCU Experimental Research on Genetic Epidemiology (CONVERGE) project, a study of Han Chinese women with recurrent major depression aimed at identifying genetic risk factors for major depression in a rigorously ascertained cohort carefully assessed for key environmental risk factors (N=9,599). To detect etiologic heterogeneity, genome-wide association studies, heritability analyses, and gene-by-environment interaction analyses were performed. RESULTS: Genome-wide association studies stratified by exposure to adversity revealed three novel loci associated with major depression only in study participants with no history of adversity. Significant gene-by-environment interactions were seen between adversity and genotype at all three loci, and 13.2% of major depression liability can be attributed to genome-wide interaction with adversity exposure. The genetic risk in major depression for participants who reported major adverse life events (27%) was partially shared with that in participants who did not (73%; genetic correlation=+0.64). Together with results from simulation studies, these findings suggest etiologic heterogeneity within major depression as a function of environmental exposures. CONCLUSIONS: The genetic contributions to major depression may differ between women with and those without major adverse life events. These results have implications for the molecular dissection of major depression and other complex psychiatric and biomedical diseases.


Subjects
Depressive Disorder, Major/genetics, Adult, Adult Survivors of Child Abuse, Adult Survivors of Child Adverse Events, China/epidemiology, Depressive Disorder, Major/etiology, Female, Genetic Association Studies, Genome-Wide Association Study, Humans, Linkage Disequilibrium/genetics, Logistic Models, Middle Aged, Multifactorial Inheritance/genetics, Polymorphism, Single Nucleotide/genetics, Risk Factors