Results 1 - 20 of 27
1.
Nature ; 618(7966): 774-781, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37198491

ABSTRACT

Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.


Subject(s)
Multifactorial Inheritance, Racial Groups, Humans, Europe/ethnology, Hispanic or Latino/genetics, Multifactorial Inheritance/genetics, Racial Groups/genetics, United Kingdom, White People/genetics, European People/genetics, Los Angeles, Genetic Databases
2.
Am J Hum Genet ; 111(2): 323-337, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38306997

ABSTRACT

Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms linking genetic variation to disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.


Subject(s)
Genome-Wide Association Study, Lithium, Humans, Genome-Wide Association Study/methods, RNA-Seq, Quantitative Trait Loci/genetics, Phenotype, Single Nucleotide Polymorphism, Genetic Predisposition to Disease
3.
Am J Hum Genet ; 110(8): 1319-1329, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37490908

ABSTRACT

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.


Subject(s)
Genomics, Single Nucleotide Polymorphism, Uncertainty, Genotype, Genomics/methods, Whole Genome Sequencing, Single Nucleotide Polymorphism/genetics
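
Note on record 3: one way to picture how genotype uncertainty can propagate into a PGS is to treat each genotype as a probability distribution over 0, 1, or 2 alternate alleles and carry both the expected dosage and its variance into the score. The Python sketch below does this under two simplifying assumptions that are not taken from the abstract: SNPs are independent and effect sizes are known exactly. The function name `pgs_mean_and_variance` and its inputs are hypothetical; this is an illustration, not the authors' method.

```python
def pgs_mean_and_variance(weights, genotype_probs):
    """Expected PGS and its variance under uncertain genotypes.

    weights        -- per-SNP effect sizes (assumed known, hypothetical input)
    genotype_probs -- per-SNP tuples (p0, p1, p2): probability of carrying
                      0, 1, or 2 alternate alleles (each tuple sums to 1)
    Assumes SNPs are independent, so per-SNP variances simply add up.
    """
    mean = 0.0
    variance = 0.0
    for weight, (p0, p1, p2) in zip(weights, genotype_probs):
        dosage = p1 + 2.0 * p2            # E[g]
        second_moment = p1 + 4.0 * p2     # E[g^2]
        mean += weight * dosage
        variance += weight ** 2 * (second_moment - dosage ** 2)
    return mean, variance
```

A confidently called genotype such as `(0.0, 0.0, 1.0)` contributes no variance, while a diffuse call such as `(0.25, 0.5, 0.25)` widens the uncertainty of the score in proportion to the squared effect size.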
4.
Am J Hum Genet ; 110(6): 927-939, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37224807

ABSTRACT

Genome-wide association studies (GWASs) have identified thousands of variants for disease risk. These studies have predominantly been conducted in individuals of European ancestries, which raises questions about their transferability to individuals of other ancestries. Of particular interest are admixed populations, usually defined as populations with recent ancestry from two or more continental sources. Admixed genomes contain segments of distinct ancestries that vary in composition across individuals in the population, allowing for the same allele to induce risk for disease on different ancestral backgrounds. This mosaicism raises unique challenges for GWASs in admixed populations, such as the need to correctly adjust for population stratification. In this work we quantify the impact of differences in estimated allelic effect sizes for risk variants between ancestry backgrounds on association statistics. Specifically, while the possibility of estimated allelic effect-size heterogeneity by ancestry (HetLanc) can be modeled when performing a GWAS in admixed populations, the extent of HetLanc needed to overcome the penalty from an additional degree of freedom in the association statistic has not been thoroughly quantified. Using extensive simulations of admixed genotypes and phenotypes, we find that controlling for and conditioning effect sizes on local ancestry can reduce statistical power by up to 72%. This finding is especially pronounced in the presence of allele frequency differentiation. We replicate simulation results using 4,327 African-European admixed genomes from the UK Biobank for 12 traits to find that for most significant SNPs, HetLanc is not large enough for GWASs to benefit from modeling heterogeneity in this way.


Subject(s)
Population Genetics, Genome-Wide Association Study, Humans, Genome-Wide Association Study/methods, Gene Frequency/genetics, Genotype, Phenotype, Single Nucleotide Polymorphism/genetics
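
Note on record 4: the "penalty from an additional degree of freedom" can be illustrated with generic chi-square arithmetic. The Python sketch below compares the power of a 1-df test with that of a 2-df test when both carry the same noncentrality parameter, that is, when the extra degree of freedom adds no signal. It is a textbook calculation under that assumption, not the simulation framework of the paper; the function names and the example noncentrality value are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with noncentrality parameter ncp."""
    crit_z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = sqrt(ncp)
    # The statistic is (Z + shift)^2; reject when |Z + shift| > crit_z.
    return (1 - _STD_NORMAL.cdf(crit_z - shift)) + _STD_NORMAL.cdf(-crit_z - shift)


def power_2df(ncp, alpha=0.05, terms=200):
    """Power of a 2-df chi-square test with the same noncentrality ncp.

    Uses the Poisson-mixture form of the noncentral chi-square: with
    probability Poisson(j; ncp/2) the statistic is central chi-square with
    2 + 2j df, whose survival function has a closed form for even df.
    """
    half_crit = -log(alpha)            # critical value / 2, as P(chi2_2 > x) = exp(-x/2)
    poisson_weight = exp(-ncp / 2)     # Poisson probability for j = 0
    series_term = 1.0                  # half_crit**i / i! for i = 0
    partial_sum = 1.0                  # sum of series terms for i = 0..j
    power = 0.0
    for j in range(terms):
        power += poisson_weight * exp(-half_crit) * partial_sum
        poisson_weight *= (ncp / 2) / (j + 1)
        series_term *= half_crit / (j + 1)
        partial_sum += series_term
    return power


if __name__ == "__main__":
    ncp = 10.0  # hypothetical signal strength
    print(power_1df(ncp), power_2df(ncp))
```

At equal noncentrality the 1-df test is always the more powerful of the two, so modeling heterogeneity only pays off when ancestry-specific effects differ enough to raise the 2-df noncentrality above this gap.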
5.
Am J Hum Genet ; 109(4): 692-709, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35271803

ABSTRACT

Recent works have shown that SNP heritability, which is dominated by low-effect common variants, may not be the most relevant quantity for localizing high-effect/critical disease genes. Here, we introduce methods to estimate the proportion of phenotypic variance explained by a given assignment of SNPs to a single gene ("gene-level heritability"). We partition gene-level heritability by minor allele frequency (MAF) to find genes whose gene-level heritability is explained exclusively by "low-frequency/rare" variants (0.5% ≤ MAF < 1%). Applying our method to ∼16K protein-coding genes and 25 quantitative traits in the UK Biobank (N = 290K "White British"), we find that, on average across traits, ∼2.5% of nonzero-heritability genes have a rare-variant component and only ∼0.8% (327 gene-trait pairs) have heritability exclusively from rare variants. Of these 327 gene-trait pairs, 114 (35%) were not detected by existing gene-level association testing methods. The additional genes we identify are significantly enriched for known disease genes, and we find several examples of genes that have been previously implicated in phenotypically related Mendelian disorders. Notably, the rare-variant component of gene-level heritability exhibits trends different from those of common-variant gene-level heritability. For example, while total gene-level heritability increases with gene length, the rare-variant component is significantly larger among shorter genes; the cumulative distributions of gene-level heritability also vary across traits and reveal differences in the relative contributions of rare/common variants to overall gene-level polygenicity. While nonzero gene-level heritability does not imply causality, if interpreted in the correct context, gene-level heritability can reveal useful insights into complex-trait genetic architecture.


Subject(s)
Genome-Wide Association Study, Multifactorial Inheritance, Gene Frequency/genetics, Genome-Wide Association Study/methods, Humans, Multifactorial Inheritance/genetics, Phenotype, Single Nucleotide Polymorphism/genetics
6.
Am J Hum Genet ; 109(3): 446-456, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35216679

ABSTRACT

Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.


Subject(s)
Human Genome, Genome-Wide Association Study, Human Genome/genetics, Genome-Wide Association Study/methods, Genomics, Humans, Molecular Sequence Annotation, Single Nucleotide Polymorphism/genetics, Probability
7.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38490256

ABSTRACT

SUMMARY: Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic studies of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations. AVAILABILITY AND IMPLEMENTATION: Admix-kit package is open-source and available at https://github.com/KangchengHou/admix-kit. Additionally, users can use the pipeline designed for admixed genotype simulation available at https://github.com/UW-GAC/admix-kit_workflow.


Subject(s)
Software, Genotype, Phenotype
8.
PLoS Comput Biol ; 17(10): e1009483, 2021 10.
Article in English | MEDLINE | ID: mdl-34673766

ABSTRACT

The number of variants that have a non-zero effect on a trait (i.e. polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs, and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.


Subject(s)
Human Genome/genetics, Genome-Wide Association Study/methods, Genomics/methods, Multifactorial Inheritance/genetics, Algorithms, Blood Pressure/genetics, Humans, Single Nucleotide Polymorphism/genetics
9.
Nat Genet ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886587

ABSTRACT

Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can impact PGS accuracy with similar magnitudes as genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (include the trait with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need of accounting for context information in PGS-based prediction across diverse populations.
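
Note on record 9: calibration in the sense used above (the interval includes the trait with 90% probability) can be checked empirically by counting how often observed trait values fall inside their prediction intervals within each context. The Python sketch below assumes symmetric Gaussian intervals built from a predicted mean and standard deviation; that choice, the function name `interval_coverage_by_context`, and all argument names are assumptions made for illustration and do not describe how CalPred itself is implemented.

```python
from collections import defaultdict
from statistics import NormalDist


def interval_coverage_by_context(y_true, y_pred, pred_sd, contexts, level=0.90):
    """Fraction of individuals whose trait lies inside the prediction interval.

    y_true, y_pred, pred_sd -- observed trait, predicted mean, predicted s.d.
    contexts                -- one label per individual (e.g. an age or income bin)
    level                   -- nominal coverage of the interval
    Returns a dict mapping each context label to its empirical coverage.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for a 90% interval
    hits = defaultdict(int)
    counts = defaultdict(int)
    for y, mu, sd, label in zip(y_true, y_pred, pred_sd, contexts):
        counts[label] += 1
        if abs(y - mu) <= z * sd:
            hits[label] += 1
    return {label: hits[label] / counts[label] for label in counts}
```

A well-calibrated predictor would return values close to `level` for every label; a context whose coverage falls well below it signals intervals that are too narrow there.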

10.
HGG Adv ; : 100320, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38902927

ABSTRACT

KRAS mutation is the most common oncogenic driver in patients with non-small cell lung cancer (NSCLC). However, how self-reported race and/or ethnicity (SIRE), genetically inferred ancestry (GIA), and their interaction affect KRAS mutation remains largely unknown. Here, we investigated the associations between SIRE, quantitative GIA, and KRAS mutation and its allele-specific subtypes in a multi-ethnic cohort of 3918 patients from the Boston Lung Cancer Survival cohort and the Chinese OrigiMed cohort with an independent validation cohort of 1450 patients with NSCLC. This comprehensive analysis incorporated detailed covariates, including age at diagnosis, sex, clinical stage, cancer histology, and smoking status. We report that SIRE is significantly associated with KRAS mutations, modified by sex, with SIRE-Asian patients showing lower rates of KRAS mutation, transversion substitution, and the allele-specific subtype KRAS G12C compared to SIRE-White patients, after adjusting for potential confounders. Moreover, GIA was found to correlate with KRAS mutations, where patients with a higher proportion of European ancestry had an increased risk of KRAS mutations, especially more transition substitutions and KRAS G12D. Notably, among SIRE-White patients, an increase in European ancestry was linked to a higher likelihood of KRAS mutations, whereas an increase in Admixed American ancestry was associated with a reduced likelihood, suggesting that quantitative GIA offers additional information beyond SIRE. The association of SIRE, GIA, and their interplay with KRAS driver mutations in NSCLC highlights the importance of incorporating both into population-based cancer research, aiming to refine clinical decision-making processes and mitigate health disparities.

11.
Science ; 384(6698): eadh7688, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781356

ABSTRACT

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific splicing and transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.6% were novel (not previously annotated in Gencode version 33), and uncovered a substantial contribution of transcript-isoform diversity (regulated by RNA binding proteins) in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to reprioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.


Subject(s)
Mental Disorders, Neocortex, Neurogenesis, Protein Isoforms, RNA Splicing, Single-Cell Analysis, Transcriptome, Humans, Alternative Splicing, Genetic Predisposition to Disease, Mental Disorders/genetics, Molecular Sequence Annotation, Neocortex/metabolism, Neocortex/embryology, Protein Isoforms/genetics, Protein Isoforms/metabolism, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Neurogenesis/genetics
12.
medRxiv ; 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37546999

ABSTRACT

Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields from agriculture to personalized medicine. We analyze data from two large biobanks in the US (All of Us) and the UK (UK Biobank) to find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, impact PGS accuracies with similar magnitudes as genetic ancestry. PGSs trained in single versus multi-ancestry cohorts show similar context-specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well-calibrated (contain the trait value with 90% probability in all contexts), whereas methods that ignore context are mis-calibrated. We show that prediction intervals need to be adjusted for all considered traits ranging from 10% for diastolic blood pressure to 80% for waist circumference. Adjustment of prediction intervals depends on the dataset; for example, prediction intervals for education years need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path forward towards utilization of PGS as a prediction tool across all individuals regardless of their contexts while highlighting the importance of comprehensive profile of context information in study design and data collection.

13.
bioRxiv ; 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36747759

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of variants for disease risk. These studies have predominantly been conducted in individuals of European ancestries, which raises questions about their transferability to individuals of other ancestries. Of particular interest are admixed populations, usually defined as populations with recent ancestry from two or more continental sources. Admixed genomes contain segments of distinct ancestries that vary in composition across individuals in the population, allowing for the same allele to induce risk for disease on different ancestral backgrounds. This mosaicism raises unique challenges for GWAS in admixed populations, such as the need to correctly adjust for population stratification to balance type I error with statistical power. In this work we quantify the impact of differences in estimated allelic effect sizes for risk variants between ancestry backgrounds on association statistics. Specifically, while the possibility of estimated allelic effect-size heterogeneity by ancestry (HetLanc) can be modeled when performing GWAS in admixed populations, the extent of HetLanc needed to overcome the penalty from an additional degree of freedom in the association statistic has not been thoroughly quantified. Using extensive simulations of admixed genotypes and phenotypes we find that modeling HetLanc in its absence reduces statistical power by up to 72%. This finding is especially pronounced in the presence of allele frequency differentiation. We replicate simulation results using 4,327 African-European admixed genomes from the UK Biobank for 12 traits to find that for most significant SNPs HetLanc is not large enough for GWAS to benefit from modeling heterogeneity.

14.
bioRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873338

ABSTRACT

Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic study of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations.

15.
bioRxiv ; 2023 May 25.
Article in English | MEDLINE | ID: mdl-37293101

ABSTRACT

Genome-wide association studies (GWAS) have uncovered susceptibility loci associated with psychiatric disorders like bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome with unknown causal mechanisms of the link between genetic variation and disease risk. Expression quantitative trait loci (eQTL) analysis of bulk tissue is a common approach to decipher underlying mechanisms, though this can obscure cell-type specific signals thus masking trait-relevant mechanisms. While single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell type proportions and cell type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-Seq from 1,730 samples derived from whole blood in a cohort ascertained for individuals with BP and SCZ this study estimated cell type proportions and their relation with disease status and medication. We found between 2,875 and 4,629 eGenes for each cell type, including 1,211 eGenes that are not found using bulk expression alone. We performed a colocalization test between cell type eQTLs and various traits and identified hundreds of associations between cell type eQTLs and GWAS loci that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on cell type expression regulation and found examples of genes that are differentially regulated dependent on lithium use. Our study suggests that computational methods can be applied to large bulk RNA-Seq datasets of non-brain tissue to identify disease-relevant, cell type specific biology of psychiatric disorders and psychiatric medication.

16.
bioRxiv ; 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-36993726

ABSTRACT

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders, yet the role of cell-type-specific splicing or transcript-isoform diversity during human brain development has not been systematically investigated. Here, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone (GZ) and cortical plate (CP) regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 unique isoforms, of which 72.6% are novel (unannotated in Gencode-v33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to re-prioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.
One-Sentence Summary: A cell-specific atlas of gene isoform expression helps shape our understanding of brain development and disease.
Structured Abstract:
INTRODUCTION: The development of the human brain is regulated by precise molecular and genetic mechanisms driving spatio-temporal and cell-type-specific transcript expression programs. Alternative splicing, a major mechanism increasing transcript diversity, is highly prevalent in the human brain, influences many aspects of brain development, and has strong links to neuropsychiatric disorders. Despite this, the cell-type-specific transcript-isoform diversity of the developing human brain has not been systematically investigated.
RATIONALE: Understanding splicing patterns and isoform diversity across the developing neocortex has translational relevance and can elucidate genetic risk mechanisms in neurodevelopmental disorders. However, short-read sequencing, the prevalent technology for transcriptome profiling, is not well suited to capturing alternative splicing and isoform diversity. To address this, we employed third-generation long-read sequencing, which enables capture and sequencing of complete individual RNA molecules, to deeply profile the full-length transcriptome of the germinal zone (GZ) and cortical plate (CP) regions of the developing human neocortex at tissue and single-cell resolution.
RESULTS: We profiled microdissected GZ and CP regions of post-conception week (PCW) 15-17 human neocortex in bulk and at single-cell resolution across six subjects using high-fidelity long-read sequencing (PacBio IsoSeq). We identified 214,516 unique isoforms, of which 72.6% were novel (unannotated in Gencode), and >7,000 novel exons, expanding the proteome by 92,422 putative proteoforms. We uncovered thousands of isoform switches during cortical neurogenesis predicted to impact RNA regulatory domains or protein structure and implicating previously uncharacterized RNA-binding proteins in cellular identity and neuropsychiatric disease. At the single-cell level, early-stage excitatory neurons exhibited the greatest isoform diversity, and isoform-centric single-cell clustering led to the identification of previously uncharacterized cell states. We systematically assessed the contribution of transcriptomic features, and localized cell and spatio-temporal transcript expression signatures across neuropsychiatric disorders, revealing predominant enrichments in dynamic isoform expression and utilization patterns and that the number and complexity of isoforms per gene is strongly predictive of disease. Leveraging this resource, we re-prioritized thousands of rare de novo risk variants associated with autism spectrum disorders (ASD), intellectual disability (ID), and neurodevelopmental disorders (NDDs), more broadly, to potentially more severe consequences and revealed a larger proportion of cryptic splice variants with the expanded transcriptome annotation provided in this study.
CONCLUSION: Our study offers a comprehensive landscape of isoform diversity in the human neocortex during development. This extensive cataloging of novel isoforms and splicing events sheds light on the underlying mechanisms of neurodevelopmental disorders and presents an opportunity to explore rare genetic variants linked to these conditions. The implications of our findings extend beyond fundamental neuroscience, as they provide crucial insights into the molecular basis of developmental brain disorders and pave the way for targeted therapeutic interventions. To facilitate exploration of this dataset we developed an online portal (https://sciso.gandallab.org/).

17.
Nat Genet ; 55(4): 549-558, 2023 04.
Article in English | MEDLINE | ID: mdl-36941441

ABSTRACT

Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate correlation of causal genetic effects (r_admix) across local ancestries and analyze 38 complex traits in African-European admixed individuals (N = 53,001) to observe very high correlations (meta-analysis r_admix = 0.95, 95% credible interval 0.93-0.97), much higher than correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects, and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.


Subject(s)
Population Genetics, Multifactorial Inheritance, Humans, Multifactorial Inheritance/genetics, Genome-Wide Association Study/methods, Racial Groups/genetics, Black or African American/genetics, Single Nucleotide Polymorphism/genetics
18.
Nat Genet ; 54(1): 30-39, 2022 01.
Article in English | MEDLINE | ID: mdl-34931067

ABSTRACT

Although the cohort-level accuracy of polygenic risk scores (PRSs), estimates of genetic value at the individual level, has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British'), we observe large variances in individual PRS estimates which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.


Subject(s)
Genetic Predisposition to Disease, Multifactorial Inheritance, Risk Assessment, Uncertainty, Genetic Association Studies, Genome-Wide Association Study, Humans, Genetic Models, Statistical Models
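
Note on record 18: the idea of obtaining an individual's credible interval via posterior sampling can be sketched as follows. Given a set of posterior draws of the SNP effect sizes (produced by some Bayesian PRS method, not shown here), each draw yields one PRS value for the individual, and the interval is read off the empirical quantiles of those values. The function names, the input layout, and the linear-interpolation quantile rule are assumptions for illustration, not the implementation used in the study.

```python
from statistics import fmean


def _quantile(sorted_values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def prs_credible_interval(genotypes, effect_samples, level=0.95):
    """Posterior mean PRS and an equal-tailed credible interval.

    genotypes      -- allele counts of one individual, one entry per SNP
    effect_samples -- posterior draws; each draw is a list of per-SNP effects
    level          -- credible level of the interval
    Returns (posterior_mean, lower_bound, upper_bound).
    """
    scores = sorted(
        sum(g * b for g, b in zip(genotypes, effects))
        for effects in effect_samples
    )
    tail = (1 - level) / 2
    return fmean(scores), _quantile(scores, tail), _quantile(scores, 1 - tail)
```

An individual would count as confidently in the top decile only if the lower bound returned here exceeds the decile threshold, which is the comparison the abstract reports as rarely satisfied.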
19.
HGG Adv ; 3(3): 100103, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35519825

ABSTRACT

Mapping genetic variants that regulate gene expression (eQTL mapping) in large-scale RNA sequencing (RNA-seq) studies is often employed to understand functional consequences of regulatory variants. However, the high cost of RNA-seq limits sample size, sequencing depth, and, therefore, discovery power in eQTL studies. In this work, we demonstrate that, given a fixed budget, eQTL discovery power can be increased by lowering the sequencing depth per sample and increasing the number of individuals sequenced in the assay. We perform RNA-seq of whole-blood tissue across 1,490 individuals at low coverage (5.9 million reads/sample) and show that the effective power is higher than that of an RNA-seq study of 570 individuals at moderate coverage (13.9 million reads/sample). Next, we leverage synthetic datasets derived from real RNA-seq data (50 million reads/sample) to explore the interplay of coverage and number of individuals in eQTL studies, and show that a 10-fold reduction in coverage leads to only a 2.5-fold reduction in statistical power to identify eQTLs. Our work suggests that lowering coverage while increasing the number of individuals in RNA-seq is an effective approach to increase discovery power in eQTL studies.

20.
Nat Genet ; 54(10): 1572-1580, 2022 10.
Article in English | MEDLINE | ID: mdl-36050550

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce single-cell disease relevance score (scDRS), an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs). We applied scDRS to 74 diseases/traits and 1.3 million single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type-disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels. Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.


Subject(s)
Genome-Wide Association Study, Single-Cell Analysis, Gene Expression Profiling/methods, Multifactorial Inheritance/genetics, RNA-Seq, Single-Cell Analysis/methods, Triglycerides