Search | Virtual Health Library

1.

Developmental isoform diversity in the human neocortex informs neuropsychiatric risk mechanisms.

Patowary, Ashok; Zhang, Pan; Jops, Connor; Vuong, Celine K; Ge, Xinzhou; Hou, Kangcheng; Kim, Minsoo; Gong, Naihua; Margolis, Michael; Vo, Daniel; Wang, Xusheng; Liu, Chunyu; Pasaniuc, Bogdan; Li, Jingyi Jessica; Gandal, Michael J; de la Torre-Ubieta, Luis.

Science ; 384(6698): eadh7688, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38781356

ABSTRACT

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific splicing and transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.6% were novel (not previously annotated in Gencode version 33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA-binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to reprioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.

Subjects

Neocortex , Protein Isoforms , Single-Cell Analysis , Transcriptome , Humans , Neocortex/metabolism , Neocortex/embryology , Protein Isoforms/genetics , Protein Isoforms/metabolism , Mental Disorders/genetics , RNA Splicing , Genetic Predisposition to Disease , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Alternative Splicing , Molecular Sequence Annotation

2.

Admix-kit: an integrated toolkit and pipeline for genetic analyses of admixed populations.

Hou, Kangcheng; Gogarten, Stephanie; Kim, Joohyun; Hua, Xing; Dias, Julie-Alexia; Sun, Quan; Wang, Ying; Tan, Taotao; Atkinson, Elizabeth G; Martin, Alicia; Shortt, Jonathan; Hirbo, Jibril; Li, Yun; Pasaniuc, Bogdan; Zhang, Haoyu.

Bioinformatics ; 40(4), 2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38490256

ABSTRACT

SUMMARY: Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic studies of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations. AVAILABILITY AND IMPLEMENTATION: Admix-kit package is open-source and available at https://github.com/KangchengHou/admix-kit. Additionally, users can use the pipeline designed for admixed genotype simulation available at https://github.com/UW-GAC/admix-kit_workflow.

Subjects

Software , Genotype , Phenotype

3.

Cell-type deconvolution of bulk-blood RNA-seq reveals biological insights into neuropsychiatric disorders.

Boltz, Toni; Schwarz, Tommer; Bot, Merel; Hou, Kangcheng; Caggiano, Christa; Lapinska, Sandra; Duan, Chenda; Boks, Marco P; Kahn, Rene S; Zaitlen, Noah; Pasaniuc, Bogdan; Ophoff, Roel.

Am J Hum Genet ; 111(2): 323-337, 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38306997

ABSTRACT

Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms of the link between genetic variation and disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.

Subjects

Genome-Wide Association Study , Lithium , Humans , Genome-Wide Association Study/methods , RNA-Seq , Quantitative Trait Loci/genetics , Phenotype , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease

4.

Admix-kit: An Integrated Toolkit and Pipeline for Genetic Analyses of Admixed Populations.

Hou, Kangcheng; Gogarten, Stephanie; Kim, Joohyun; Hua, Xing; Dias, Julie-Alexia; Sun, Quan; Wang, Ying; Tan, Taotao; Atkinson, Elizabeth G; Martin, Alicia; Shortt, Jonathan; Hirbo, Jibril; Li, Yun; Pasaniuc, Bogdan; Zhang, Haoyu.

bioRxiv ; 2023 Oct 02.

Article in English | MEDLINE | ID: mdl-37873338

ABSTRACT

Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic study of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations.

5.

Calibrated prediction intervals for polygenic scores across diverse contexts.

Hou, Kangcheng; Xu, Ziqi; Ding, Yi; Harpak, Arbel; Pasaniuc, Bogdan.

medRxiv ; 2023 Jul 27.

Article in English | MEDLINE | ID: mdl-37546999

ABSTRACT

Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields from agriculture to personalized medicine. We analyze data from two large biobanks in the US (All of Us) and the UK (UK Biobank) to find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, impact PGS accuracies with similar magnitudes as genetic ancestry. PGSs trained in single versus multi-ancestry cohorts show similar context-specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well-calibrated (contain the trait value with 90% probability in all contexts), whereas methods that ignore context are mis-calibrated. We show that prediction intervals need to be adjusted for all considered traits ranging from 10% for diastolic blood pressure to 80% for waist circumference. Adjustment of prediction intervals depends on the dataset; for example, prediction intervals for education years need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path forward towards utilization of PGS as a prediction tool across all individuals regardless of their contexts while highlighting the importance of comprehensive profile of context information in study design and data collection.
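The calibration idea in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: it assumes normally distributed residuals, two hypothetical contexts (`context_A`, `context_B`) with made-up residual standard deviations, and a "pooled" interval that ignores context. The function names are illustrative.

```python
import random
from statistics import NormalDist


def prediction_interval(point_prediction, residual_sd, coverage=0.90):
    """Symmetric normal interval centred on the PGS-based point prediction."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for 90% coverage
    return point_prediction - z * residual_sd, point_prediction + z * residual_sd


def coverage_by_context(n_per_context=20000, seed=0):
    """Return {context: (pooled-interval coverage, context-specific coverage)}."""
    rng = random.Random(seed)
    # Hypothetical residual SDs per context (assumption, not from the source).
    context_sd = {"context_A": 1.0, "context_B": 2.0}
    # A single SD that ignores context: root of the average residual variance.
    pooled_sd = (sum(sd ** 2 for sd in context_sd.values()) / len(context_sd)) ** 0.5
    result = {}
    for name, sd in context_sd.items():
        hits_pooled = hits_specific = 0
        for _ in range(n_per_context):
            prediction = rng.gauss(0.0, 1.0)          # simulated PGS-based prediction
            trait = prediction + rng.gauss(0.0, sd)   # simulated observed trait
            lo, hi = prediction_interval(prediction, pooled_sd)
            hits_pooled += lo <= trait <= hi
            lo, hi = prediction_interval(prediction, sd)
            hits_specific += lo <= trait <= hi
        result[name] = (hits_pooled / n_per_context, hits_specific / n_per_context)
    return result


if __name__ == "__main__":
    for context, (pooled, specific) in coverage_by_context().items():
        print(f"{context}: pooled={pooled:.3f}, context-specific={specific:.3f}")
```

In this toy setting the pooled interval over-covers the low-variance context and under-covers the high-variance one, while the context-specific interval stays close to the nominal 90% in both.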

6.

Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring.

Petter, Ella; Ding, Yi; Hou, Kangcheng; Bhattacharya, Arjun; Gusev, Alexander; Zaitlen, Noah; Pasaniuc, Bogdan.

Am J Hum Genet ; 110(8): 1319-1329, 2023 Aug 03.

Article in English | MEDLINE | ID: mdl-37490908

ABSTRACT

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low-coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.

Subjects

Genomics , Polymorphism, Single Nucleotide , Uncertainty , Genotype , Genomics/methods , Whole Genome Sequencing , Polymorphism, Single Nucleotide/genetics
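One simple way to picture how genotype uncertainty propagates into a polygenic score, as discussed in the abstract of this entry, is sketched below. It is an illustration under explicit assumptions rather than the published method: each SNP has hypothetical genotype probabilities (P(0), P(1), P(2)), SNPs are treated as independent, effect sizes are taken as known, and the interval uses a normal approximation. All names are illustrative.

```python
from statistics import NormalDist


def dosage_mean_var(probs):
    """Mean and variance of the allele count given (P(0), P(1), P(2))."""
    _p0, p1, p2 = probs
    mean = p1 + 2 * p2
    var = p1 + 4 * p2 - mean ** 2      # E[g^2] - E[g]^2
    return mean, max(var, 0.0)         # guard against tiny negative rounding error


def pgs_with_uncertainty(weights, genotype_probs, coverage=0.90):
    """Expected PGS and a normal-approximation interval (independence assumed)."""
    pgs_mean = pgs_var = 0.0
    for weight, probs in zip(weights, genotype_probs):
        mean, var = dosage_mean_var(probs)
        pgs_mean += weight * mean
        pgs_var += weight ** 2 * var
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * pgs_var ** 0.5
    return pgs_mean, (pgs_mean - half_width, pgs_mean + half_width)


if __name__ == "__main__":
    weights = [0.2, -0.1, 0.05]                   # hypothetical effect sizes
    confident = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]  # genotypes known exactly
    uncertain = [(0.25, 0.5, 0.25), (0.6, 0.3, 0.1), (0.1, 0.3, 0.6)]
    print(pgs_with_uncertainty(weights, confident))
    print(pgs_with_uncertainty(weights, uncertain))
```

When every genotype is known with certainty the interval collapses to a point, which corresponds to the traditional PGS that ignores genotyping error.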

7.

Cell type deconvolution of bulk blood RNA-Seq to reveal biological insights of neuropsychiatric disorders.

Boltz, Toni; Schwarz, Tommer; Bot, Merel; Hou, Kangcheng; Caggiano, Christa; Lapinska, Sandra; Duan, Chenda; Boks, Marco P; Kahn, Rene S; Zaitlen, Noah; Pasaniuc, Bogdan; Ophoff, Roel.

bioRxiv ; 2023 May 25.

Article in English | MEDLINE | ID: mdl-37293101

ABSTRACT

Genome-wide association studies (GWAS) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms linking genetic variation to disease risk are unknown. Expression quantitative trait loci (eQTL) analysis of bulk tissue is a common approach to decipher underlying mechanisms, though this can obscure cell-type-specific signals, thus masking trait-relevant mechanisms. While single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell type proportions and cell type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-Seq from 1,730 samples derived from whole blood in a cohort ascertained for individuals with BP and SCZ, this study estimated cell type proportions and their relation with disease status and medication. We found between 2,875 and 4,629 eGenes for each cell type, including 1,211 eGenes that are not found using bulk expression alone. We performed a colocalization test between cell type eQTLs and various traits and identified hundreds of associations between cell type eQTLs and GWAS loci that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on cell type expression regulation and found examples of genes that are differentially regulated depending on lithium use. Our study suggests that computational methods can be applied to large bulk RNA-Seq datasets of non-brain tissue to identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.

8.

Polygenic scoring accuracy varies across the genetic ancestry continuum.

Ding, Yi; Hou, Kangcheng; Xu, Ziqi; Pimplaskar, Aditya; Petter, Ella; Boulier, Kristin; Privé, Florian; Vilhjálmsson, Bjarni J; Olde Loohuis, Loes M; Pasaniuc, Bogdan.

Nature ; 618(7966): 774-781, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37198491

ABSTRACT

Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.

Subjects

Multifactorial Inheritance , Racial Groups , Humans , Europe/ethnology , Hispanic or Latino/genetics , Multifactorial Inheritance/genetics , Racial Groups/genetics , United Kingdom , White People/genetics , European People/genetics , Los Angeles , Databases, Genetic

9.

Impact of cross-ancestry genetic architecture on GWASs in admixed populations.

Mester, Rachel; Hou, Kangcheng; Ding, Yi; Meeks, Gillian; Burch, Kathryn S; Bhattacharya, Arjun; Henn, Brenna M; Pasaniuc, Bogdan.

Am J Hum Genet ; 110(6): 927-939, 2023 Jun 01.

Article in English | MEDLINE | ID: mdl-37224807

ABSTRACT

Genome-wide association studies (GWASs) have identified thousands of variants for disease risk. These studies have predominantly been conducted in individuals of European ancestries, which raises questions about their transferability to individuals of other ancestries. Of particular interest are admixed populations, usually defined as populations with recent ancestry from two or more continental sources. Admixed genomes contain segments of distinct ancestries that vary in composition across individuals in the population, allowing for the same allele to induce risk for disease on different ancestral backgrounds. This mosaicism raises unique challenges for GWASs in admixed populations, such as the need to correctly adjust for population stratification. In this work we quantify the impact of differences in estimated allelic effect sizes for risk variants between ancestry backgrounds on association statistics. Specifically, while the possibility of estimated allelic effect-size heterogeneity by ancestry (HetLanc) can be modeled when performing a GWAS in admixed populations, the extent of HetLanc needed to overcome the penalty from an additional degree of freedom in the association statistic has not been thoroughly quantified. Using extensive simulations of admixed genotypes and phenotypes, we find that controlling for and conditioning effect sizes on local ancestry can reduce statistical power by up to 72%. This finding is especially pronounced in the presence of allele frequency differentiation. We replicate simulation results using 4,327 African-European admixed genomes from the UK Biobank for 12 traits to find that for most significant SNPs, HetLanc is not large enough for GWASs to benefit from modeling heterogeneity in this way.

Subjects

Genetics, Population , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Gene Frequency/genetics , Genotype , Phenotype , Polymorphism, Single Nucleotide/genetics

10.

Developmental isoform diversity in the human neocortex informs neuropsychiatric risk mechanisms.

Patowary, Ashok; Zhang, Pan; Jops, Connor; Vuong, Celine K; Ge, Xinzhou; Hou, Kangcheng; Kim, Minsoo; Gong, Naihua; Margolis, Michael; Vo, Daniel; Wang, Xusheng; Liu, Chunyu; Pasaniuc, Bogdan; Li, Jingyi Jessica; Gandal, Michael J; de la Torre-Ubieta, Luis.

bioRxiv ; 2023 Oct 11.

Article in English | MEDLINE | ID: mdl-36993726

ABSTRACT

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders, yet the role of cell-type-specific splicing or transcript-isoform diversity during human brain development has not been systematically investigated. Here, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone (GZ) and cortical plate (CP) regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 unique isoforms, of which 72.6% are novel (unannotated in Gencode-v33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to re-prioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.

One-Sentence Summary: A cell-specific atlas of gene isoform expression helps shape our understanding of brain development and disease.

Structured Abstract:

INTRODUCTION: The development of the human brain is regulated by precise molecular and genetic mechanisms driving spatio-temporal and cell-type-specific transcript expression programs. Alternative splicing, a major mechanism increasing transcript diversity, is highly prevalent in the human brain, influences many aspects of brain development, and has strong links to neuropsychiatric disorders. Despite this, the cell-type-specific transcript-isoform diversity of the developing human brain has not been systematically investigated.

RATIONALE: Understanding splicing patterns and isoform diversity across the developing neocortex has translational relevance and can elucidate genetic risk mechanisms in neurodevelopmental disorders. However, short-read sequencing, the prevalent technology for transcriptome profiling, is not well suited to capturing alternative splicing and isoform diversity. To address this, we employed third-generation long-read sequencing, which enables capture and sequencing of complete individual RNA molecules, to deeply profile the full-length transcriptome of the germinal zone (GZ) and cortical plate (CP) regions of the developing human neocortex at tissue and single-cell resolution.

RESULTS: We profiled microdissected GZ and CP regions of post-conception week (PCW) 15-17 human neocortex in bulk and at single-cell resolution across six subjects using high-fidelity long-read sequencing (PacBio IsoSeq). We identified 214,516 unique isoforms, of which 72.6% were novel (unannotated in Gencode), and >7,000 novel exons, expanding the proteome by 92,422 putative proteoforms. We uncovered thousands of isoform switches during cortical neurogenesis predicted to impact RNA regulatory domains or protein structure and implicating previously uncharacterized RNA-binding proteins in cellular identity and neuropsychiatric disease. At the single-cell level, early-stage excitatory neurons exhibited the greatest isoform diversity, and isoform-centric single-cell clustering led to the identification of previously uncharacterized cell states. We systematically assessed the contribution of transcriptomic features, and localized cell and spatio-temporal transcript expression signatures across neuropsychiatric disorders, revealing predominant enrichments in dynamic isoform expression and utilization patterns and that the number and complexity of isoforms per gene is strongly predictive of disease. Leveraging this resource, we re-prioritized thousands of rare de novo risk variants associated with autism spectrum disorders (ASD), intellectual disability (ID), and neurodevelopmental disorders (NDDs), more broadly, to potentially more severe consequences and revealed a larger proportion of cryptic splice variants with the expanded transcriptome annotation provided in this study.

CONCLUSION: Our study offers a comprehensive landscape of isoform diversity in the human neocortex during development. This extensive cataloging of novel isoforms and splicing events sheds light on the underlying mechanisms of neurodevelopmental disorders and presents an opportunity to explore rare genetic variants linked to these conditions. The implications of our findings extend beyond fundamental neuroscience, as they provide crucial insights into the molecular basis of developmental brain disorders and pave the way for targeted therapeutic interventions. To facilitate exploration of this dataset we developed an online portal (https://sciso.gandallab.org/).

11.

Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals.

Hou, Kangcheng; Ding, Yi; Xu, Ziqi; Wu, Yue; Bhattacharya, Arjun; Mester, Rachel; Belbin, Gillian M; Buyske, Steve; Conti, David V; Darst, Burcu F; Fornage, Myriam; Gignoux, Chris; Guo, Xiuqing; Haiman, Christopher; Kenny, Eimear E; Kim, Michelle; Kooperberg, Charles; Lange, Leslie; Manichaikul, Ani; North, Kari E; Peters, Ulrike; Rasmussen-Torvik, Laura J; Rich, Stephen S; Rotter, Jerome I; Wheeler, Heather E; Wojcik, Genevieve L; Zhou, Ying; Sankararaman, Sriram; Pasaniuc, Bogdan.

Nat Genet ; 55(4): 549-558, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36941441

ABSTRACT

Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate correlation of causal genetic effects (r_admix) across local ancestries and analyze 38 complex traits in African-European admixed individuals (N = 53,001) to observe very high correlations (meta-analysis r_admix = 0.95, 95% credible interval 0.93-0.97), much higher than correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects, and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.

Subjects

Genetics, Population , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Racial Groups/genetics , Black or African American/genetics , Polymorphism, Single Nucleotide/genetics

12.

Impact of cross-ancestry genetic architecture on GWAS in admixed populations.

Mester, Rachel; Hou, Kangcheng; Ding, Yi; Meeks, Gillian; Burch, Kathryn S; Bhattacharya, Arjun; Henn, Brenna M; Pasaniuc, Bogdan.

bioRxiv ; 2023 Jan 24.

Article in English | MEDLINE | ID: mdl-36747759

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of variants for disease risk. These studies have predominantly been conducted in individuals of European ancestries, which raises questions about their transferability to individuals of other ancestries. Of particular interest are admixed populations, usually defined as populations with recent ancestry from two or more continental sources. Admixed genomes contain segments of distinct ancestries that vary in composition across individuals in the population, allowing for the same allele to induce risk for disease on different ancestral backgrounds. This mosaicism raises unique challenges for GWAS in admixed populations, such as the need to correctly adjust for population stratification to balance type I error with statistical power. In this work we quantify the impact of differences in estimated allelic effect sizes for risk variants between ancestry backgrounds on association statistics. Specifically, while the possibility of estimated allelic effect-size heterogeneity by ancestry (HetLanc) can be modeled when performing GWAS in admixed populations, the extent of HetLanc needed to overcome the penalty from an additional degree of freedom in the association statistic has not been thoroughly quantified. Using extensive simulations of admixed genotypes and phenotypes we find that modeling HetLanc in its absence reduces statistical power by up to 72%. This finding is especially pronounced in the presence of allele frequency differentiation. We replicate simulation results using 4,327 African-European admixed genomes from the UK Biobank for 12 traits to find that for most significant SNPs HetLanc is not large enough for GWAS to benefit from modeling heterogeneity.

13.

Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data.

Zhang, Martin Jinye; Hou, Kangcheng; Dey, Kushal K; Sakaue, Saori; Jagadeesh, Karthik A; Weinand, Kathryn; Taychameekiatchai, Aris; Rao, Poorvi; Pisco, Angela Oliveira; Zou, James; Wang, Bruce; Gandal, Michael; Raychaudhuri, Soumya; Pasaniuc, Bogdan; Price, Alkes L.

Nat Genet ; 54(10): 1572-1580, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36050550

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce single-cell disease relevance score (scDRS), an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs). We applied scDRS to 74 diseases/traits and 1.3 million single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type-disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels. Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.

Subjects

Genome-Wide Association Study , Single-Cell Analysis , Gene Expression Profiling/methods , Multifactorial Inheritance/genetics , RNA-Seq , Single-Cell Analysis/methods , Triglycerides

14.

Powerful eQTL mapping through low-coverage RNA sequencing.

Schwarz, Tommer; Boltz, Toni; Hou, Kangcheng; Bot, Merel; Duan, Chenda; Loohuis, Loes Olde; Boks, Marco P; Kahn, René S; Ophoff, Roel A; Pasaniuc, Bogdan.

HGG Adv ; 3(3): 100103, 2022 Jul 14.

Article in English | MEDLINE | ID: mdl-35519825

ABSTRACT

Mapping genetic variants that regulate gene expression (eQTL mapping) in large-scale RNA sequencing (RNA-seq) studies is often employed to understand functional consequences of regulatory variants. However, the high cost of RNA-seq limits sample size, sequencing depth, and, therefore, discovery power in eQTL studies. In this work, we demonstrate that, given a fixed budget, eQTL discovery power can be increased by lowering the sequencing depth per sample and increasing the number of individuals sequenced in the assay. We perform RNA-seq of whole-blood tissue across 1,490 individuals at low coverage (5.9 million reads/sample) and show that the effective power is higher than that of an RNA-seq study of 570 individuals at moderate coverage (13.9 million reads/sample). Next, we leverage synthetic datasets derived from real RNA-seq data (50 million reads/sample) to explore the interplay of coverage and number of individuals in eQTL studies, and show that a 10-fold reduction in coverage leads to only a 2.5-fold reduction in statistical power to identify eQTLs. Our work suggests that lowering coverage while increasing the number of individuals in RNA-seq is an effective approach to increase discovery power in eQTL studies.
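The fixed-budget comparison in this abstract can be checked with simple arithmetic. The sketch below uses only the sample sizes and read depths quoted above and assumes, as a simplification not stated in the source, that cost scales with the total number of reads (per-sample library preparation costs are ignored). The function names are illustrative.

```python
def total_reads_millions(n_individuals, reads_per_sample_millions):
    """Total sequencing output of a study design, in millions of reads."""
    return n_individuals * reads_per_sample_millions


def individuals_for_budget(budget_millions, reads_per_sample_millions):
    """Largest whole number of individuals a fixed read budget can cover."""
    return int(budget_millions // reads_per_sample_millions)


# Designs quoted in the abstract.
low_coverage = total_reads_millions(1490, 5.9)       # 1,490 individuals at 5.9M reads
moderate_coverage = total_reads_millions(570, 13.9)  # 570 individuals at 13.9M reads

if __name__ == "__main__":
    print(f"low coverage:      {low_coverage:.0f} million reads in total")
    print(f"moderate coverage: {moderate_coverage:.0f} million reads in total")
    # How many individuals the moderate-coverage read budget would cover at 5.9M reads each.
    print(individuals_for_budget(moderate_coverage, 5.9))
```

The two designs consume read totals of the same order (about 8,791 versus 7,923 million reads), so under this simplified cost model the gain in power comes from spreading a similar budget over more individuals.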

15.

Partitioning gene-level contributions to complex-trait heritability by allele frequency identifies disease-relevant genes.

Burch, Kathryn S; Hou, Kangcheng; Ding, Yi; Wang, Yifei; Gazal, Steven; Shi, Huwenbo; Pasaniuc, Bogdan.

Am J Hum Genet ; 109(4): 692-709, 2022 Apr 07.

Article in English | MEDLINE | ID: mdl-35271803

ABSTRACT

Recent works have shown that SNP heritability, which is dominated by low-effect common variants, may not be the most relevant quantity for localizing high-effect/critical disease genes. Here, we introduce methods to estimate the proportion of phenotypic variance explained by a given assignment of SNPs to a single gene ("gene-level heritability"). We partition gene-level heritability by minor allele frequency (MAF) to find genes whose gene-level heritability is explained exclusively by "low-frequency/rare" variants (0.5% ≤ MAF < 1%). Applying our method to ~16K protein-coding genes and 25 quantitative traits in the UK Biobank (N = 290K "White British"), we find that, on average across traits, ~2.5% of nonzero-heritability genes have a rare-variant component and only ~0.8% (327 gene-trait pairs) have heritability exclusively from rare variants. Of these 327 gene-trait pairs, 114 (35%) were not detected by existing gene-level association testing methods. The additional genes we identify are significantly enriched for known disease genes, and we find several examples of genes that have been previously implicated in phenotypically related Mendelian disorders. Notably, the rare-variant component of gene-level heritability exhibits trends different from those of common-variant gene-level heritability. For example, while total gene-level heritability increases with gene length, the rare-variant component is significantly larger among shorter genes; the cumulative distributions of gene-level heritability also vary across traits and reveal differences in the relative contributions of rare/common variants to overall gene-level polygenicity. While nonzero gene-level heritability does not imply causality, if interpreted in the correct context, gene-level heritability can reveal useful insights into complex-trait genetic architecture.

Subjects

Genome-Wide Association Study , Multifactorial Inheritance , Gene Frequency/genetics , Genome-Wide Association Study/methods , Humans , Multifactorial Inheritance/genetics , Phenotype , Single Nucleotide Polymorphism/genetics

16.

A multi-dimensional integrative scoring framework for predicting functional variants in the human genome.

Li, Xihao; Yung, Godwin; Zhou, Hufeng; Sun, Ryan; Li, Zilin; Hou, Kangcheng; Zhang, Martin Jinye; Liu, Yaowu; Arapoglou, Theodore; Wang, Chen; Ionita-Laza, Iuliana; Lin, Xihong.

Am J Hum Genet ; 109(3): 446-456, 2022 03 03.

Article in English | MEDLINE | ID: mdl-35216679

ABSTRACT

Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.

Subjects

Human Genome , Genome-Wide Association Study , Human Genome/genetics , Genome-Wide Association Study/methods , Genomics , Humans , Molecular Sequence Annotation , Single Nucleotide Polymorphism/genetics , Probability

17.

Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification.

Ding, Yi; Hou, Kangcheng; Burch, Kathryn S; Lapinska, Sandra; Privé, Florian; Vilhjálmsson, Bjarni; Sankararaman, Sriram; Pasaniuc, Bogdan.

Nat Genet ; 54(1): 30-39, 2022 01.

Article in English | MEDLINE | ID: mdl-34931067

ABSTRACT

Although the cohort-level accuracy of polygenic risk scores (PRSs), which are estimates of genetic value at the individual level, has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British'), we observe large variances in individual PRS estimates which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
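
The posterior-sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the conservative rank rule used for the quantiles, and the toy genotype and effect draws are all assumptions made for illustration. Given posterior draws of SNP effects, one PRS value is computed per draw, an equal-tailed 95% credible interval is read off the sorted values, and the interval is compared with a stratification threshold such as a top-decile cutoff.

```python
import math
import random


def prs_samples(genotype, effect_draws):
    """Return one PRS value per posterior draw of SNP effects."""
    return [sum(g * b for g, b in zip(genotype, draw)) for draw in effect_draws]


def credible_interval(values, level=0.95):
    """Equal-tailed interval from sorted draws (conservative rank rule)."""
    s = sorted(values)
    n = len(s)
    tail = (1.0 - level) / 2.0
    lo = s[math.floor(tail * (n - 1))]
    hi = s[math.ceil((1.0 - tail) * (n - 1))]
    return lo, hi


def interval_above(values, threshold, level=0.95):
    """True if the whole credible interval lies above the threshold."""
    lo, _ = credible_interval(values, level)
    return lo > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    genotype = [0, 1, 2, 1]  # hypothetical allele counts for one individual
    # Hypothetical posterior draws; a real analysis would take these from
    # the sampler of a Bayesian PRS method.
    draws = [[rng.gauss(0.1, 0.05) for _ in genotype] for _ in range(1000)]
    values = prs_samples(genotype, draws)
    print(credible_interval(values))
    print(interval_above(values, threshold=0.0))
```

An individual whose point estimate exceeds the cutoff but whose interval does not lie entirely above it corresponds to the uncertain top-decile case the abstract describes.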

Subjects

Genetic Predisposition to Disease , Multifactorial Inheritance , Risk Assessment , Uncertainty , Genetic Association Studies , Genome-Wide Association Study , Humans , Genetic Models , Statistical Models

18.

On powerful GWAS in admixed populations.

Hou, Kangcheng; Bhattacharya, Arjun; Mester, Rachel; Burch, Kathryn S; Pasaniuc, Bogdan.

Nat Genet ; 53(12): 1631-1633, 2021 12.

Article in English | MEDLINE | ID: mdl-34824480

Subjects

Genome-Wide Association Study , Genetic Models , Linkage Disequilibrium

19.

Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits.

Johnson, Ruth; Burch, Kathryn S; Hou, Kangcheng; Paciuc, Mario; Pasaniuc, Bogdan; Sankararaman, Sriram.

PLoS Comput Biol ; 17(10): e1009483, 2021 10.

Article in English | MEDLINE | ID: mdl-34673766

ABSTRACT

The number of variants that have a non-zero effect on a trait (i.e. polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs, and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.

Subjects

Human Genome/genetics , Genome-Wide Association Study/methods , Genomics/methods , Multifactorial Inheritance/genetics , Algorithms , Blood Pressure/genetics , Humans , Single Nucleotide Polymorphism/genetics

20.

Integrative genomic analyses identify susceptibility genes underlying COVID-19 hospitalization.

Pathak, Gita A; Singh, Kritika; Miller-Fleming, Tyne W; Wendt, Frank R; Ehsan, Nava; Hou, Kangcheng; Johnson, Ruth; Lu, Zeyun; Gopalan, Shyamalika; Yengo, Loic; Mohammadi, Pejman; Pasaniuc, Bogdan; Polimanti, Renato; Davis, Lea K; Mancuso, Nicholas.

Nat Commun ; 12(1): 4569, 2021 07 27.

Article in English | MEDLINE | ID: mdl-34315903

ABSTRACT

Despite rapid progress in characterizing the role of host genetics in SARS-CoV-2 infection, there is limited understanding of genes and pathways that contribute to COVID-19. Here, we integrate a genome-wide association study of COVID-19 hospitalization (7,885 cases and 961,804 controls from COVID-19 Host Genetics Initiative) with mRNA expression, splicing, and protein levels (n = 18,502). We identify 27 genes related to inflammation and coagulation pathways whose genetically predicted expression was associated with COVID-19 hospitalization. We functionally characterize the 27 genes using phenome- and laboratory-wide association scans in Vanderbilt Biobank (n = 85,460) and identify coagulation-related clinical symptoms, immunologic, and blood-cell-related biomarkers. We replicate these findings across trans-ethnic studies and observe consistent effects in individuals of diverse ancestral backgrounds in Vanderbilt Biobank, pan-UK Biobank, and Biobank Japan. Our study highlights and reconfirms putative causal genes impacting COVID-19 severity and symptomology through the host inflammatory response.

Subjects

COVID-19/metabolism , COVID-19/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Hospitalization , Humans , Single Nucleotide Polymorphism/genetics , Risk Factors
