Results 1 - 20 of 27
1.
bioRxiv ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38746154

ABSTRACT

Functional enhancer annotation is a valuable first step for understanding tissue-specific transcriptional regulation and prioritizing disease-associated non-coding variants for investigation. However, unbiased enhancer discovery in physiologically relevant contexts remains a major challenge. To discover regulatory elements pertinent to diabetes, we conducted a CRISPR interference (CRISPRi) screen in the human pluripotent stem cell (hPSC) pancreatic differentiation system. Among the enhancers uncovered, we focused on a long-range enhancer ∼664 kb from the ONECUT1 promoter, as coding mutations in ONECUT1 cause pancreatic hypoplasia and neonatal diabetes. Homozygous enhancer deletion in hPSCs was associated with a near-complete loss of ONECUT1 gene expression and compromised pancreatic differentiation. This enhancer contains a confidently fine-mapped type 2 diabetes (T2D) associated variant (rs528350911) which disrupts a GATA motif. Introduction of the risk variant into hPSCs revealed substantially reduced binding of key pancreatic transcription factors (GATA4, GATA6 and FOXA2) on the edited allele, accompanied by a slight reduction of ONECUT1 transcription, supporting a causal role for this risk variant in metabolic disease. This work expands our knowledge about transcriptional regulation in pancreatic development through the characterization of a long-range enhancer and highlights the utility of enhancer discovery in disease-relevant settings for understanding monogenic and complex disease.
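
The abstract does not say how hits from the CRISPRi screen were called. A common first step in analyzing a pooled CRISPRi screen is to compare guide read counts between a sorted population and a reference population, so the sketch below shows that step as an illustration only. It normalizes counts to counts per million and computes a per-guide log2 fold change. It is not the authors' pipeline: the guide names, the counts, and the pseudocount of 1 are all hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw guide read counts so that the sample sums to one million."""
    total = sum(counts.values())
    return {guide: 1e6 * n / total for guide, n in counts.items()}


def guide_log2_fold_changes(sorted_counts, reference_counts, pseudocount=1.0):
    """Per-guide log2(sorted / reference) on CPM-normalized counts.

    The pseudocount avoids division by zero for guides with no reads.
    Every guide in reference_counts must also appear in sorted_counts.
    """
    sorted_cpm = counts_per_million(sorted_counts)
    ref_cpm = counts_per_million(reference_counts)
    return {
        guide: math.log2((sorted_cpm[guide] + pseudocount) / (ref_cpm[guide] + pseudocount))
        for guide in reference_counts
    }


if __name__ == "__main__":
    # Hypothetical counts: guides tiling a candidate enhancer vs. non-targeting controls.
    reference = {"enh_g1": 500, "enh_g2": 450, "ctrl_g1": 520, "ctrl_g2": 530}
    low_expression_bin = {"enh_g1": 1500, "enh_g2": 1300, "ctrl_g1": 510, "ctrl_g2": 490}
    for guide, lfc in guide_log2_fold_changes(low_expression_bin, reference).items():
        print(f"{guide}\t{lfc:+.2f}")
```

In this toy example, guides whose target enhancer is needed for expression become enriched in a low-expression bin. Controls do not.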

3.
Nat Genet ; 56(4): 615-626, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38594305

ABSTRACT

Translating genome-wide association study (GWAS) loci into causal variants and genes requires accurate cell-type-specific enhancer-gene maps from disease-relevant tissues. Building enhancer-gene maps is essential but challenging with current experimental methods in primary human tissues. Here we developed a nonparametric statistical method, SCENT (single-cell enhancer target gene mapping), that models association between enhancer chromatin accessibility and gene expression in single-cell or nucleus multimodal RNA sequencing and ATAC sequencing data. We applied SCENT to 9 multimodal datasets including >120,000 single cells or nuclei and created 23 cell-type-specific enhancer-gene maps. These maps were highly enriched for causal variants in expression quantitative trait loci (eQTLs) and GWAS for 1,143 diseases and traits. We identified likely causal genes for both common and rare diseases and linked somatic mutation hotspots to target genes. We demonstrate that application of SCENT to multimodal data from disease-relevant human tissue enables the scalable construction of accurate cell-type-specific enhancer-gene maps, essential for defining noncoding variant function.


Subject(s)
Genome-Wide Association Study , Regulatory Sequences, Nucleic Acid , Humans , Alleles , Genome-Wide Association Study/methods , Chromosome Mapping , Phenotype , Chromatin/genetics , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease/genetics
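
The SCENT abstract calls the method a nonparametric model of association between enhancer accessibility and gene expression across cells, but it does not give the model itself. The sketch below uses a permutation test on the Pearson correlation as a generic example of a nonparametric association test. This is a swapped-in technique, not SCENT's procedure. The per-cell toy values are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(accessibility, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the accessibility-expression correlation.

    Shuffling expression across cells breaks any true enhancer-gene link
    while preserving both marginal distributions.
    """
    rng = random.Random(seed)
    observed = abs(pearson(accessibility, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(accessibility, shuffled)) >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical cells: enhancer accessible (1) or not (0), with gene counts.
    acc = [1] * 20 + [0] * 20
    expr = [10 + (i % 3) for i in range(20)] + [1 + (i % 3) for i in range(20)]
    print(permutation_pvalue(acc, expr))
```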
4.
Nat Genet ; 56(4): 627-636, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38514783

ABSTRACT

We present a gene-level regulatory model, single-cell ATAC + RNA linking (SCARlink), which predicts single-cell gene expression and links enhancers to target genes using multi-ome (scRNA-seq and scATAC-seq co-assay) sequencing data. The approach uses regularized Poisson regression on tile-level accessibility data to jointly model all regulatory effects at a gene locus, avoiding the limitations of pairwise gene-peak correlations and dependence on peak calling. SCARlink outperformed existing gene scoring methods for imputing gene expression from chromatin accessibility across high-coverage multi-ome datasets while giving comparable to improved performance on low-coverage datasets. Shapley value analysis on trained models identified cell-type-specific gene enhancers that are validated by promoter capture Hi-C and are 11× to 15× and 5× to 12× enriched in fine-mapped eQTLs and fine-mapped genome-wide association study (GWAS) variants, respectively. We further show that SCARlink-predicted and observed gene expression vectors provide a robust way to compute a chromatin potential vector field to enable developmental trajectory analysis.


Subject(s)
Chromatin , Genome-Wide Association Study , Chromatin/genetics , Regulatory Sequences, Nucleic Acid , Gene Expression Regulation , Promoter Regions, Genetic/genetics , RNA , Single-Cell Analysis/methods
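
The SCARlink abstract states that gene expression is modeled by regularized Poisson regression on tile-level accessibility. The sketch below shows that model class in its simplest form: an L2-penalized Poisson regression with a log link, fit by batch gradient descent in pure Python. The toy data, penalty strength, learning rate and iteration count are all assumptions. The published method includes details the abstract does not cover, and those are not reproduced here.

```python
import math


def fit_poisson_l2(X, y, lam=0.1, lr=0.05, n_iter=3000):
    """Fit log E[y_i] = b0 + sum_j w_j * X[i][j] with an L2 penalty on w.

    Minimizes the mean Poisson negative log-likelihood plus (lam / 2) * ||w||^2
    by batch gradient descent. The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = math.log(max(sum(y) / n, 1e-8))  # start at the mean rate
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            mu = math.exp(b0 + sum(wj * xij for wj, xij in zip(w, xi)))
            resid = mu - yi  # derivative of the NLL w.r.t. the linear predictor
            grad_b += resid
            for j in range(p):
                grad_w[j] += resid * xi[j]
        b0 -= lr * grad_b / n
        for j in range(p):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
    return b0, w


def predict(b0, w, xi):
    """Predicted expected expression for one cell's tile accessibility vector."""
    return math.exp(b0 + sum(wj * xij for wj, xij in zip(w, xi)))


if __name__ == "__main__":
    # Hypothetical cells with two tiles: tile 0 drives expression, tile 1 does not.
    X = [[a, b] for a in range(3) for b in range(3)]
    y = [round(math.exp(0.2 + 0.8 * a)) for a, _ in X]
    b0, w = fit_poisson_l2(X, y)
    print(f"intercept={b0:.3f} tile weights={[round(v, 3) for v in w]}")
```

The fitted weight on the informative tile is positive. The weight on the uninformative tile stays near zero.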
5.
Nat Commun ; 15(1): 563, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38233398

ABSTRACT

Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and studies integrating GWAS with scRNA-seq have shown promise, but studies integrating GWAS with scATAC-seq have been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases/traits (average N = 298 K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified disease-critical fetal (respectively adult) brain cell types for 22 (respectively 23) of 28 traits using scATAC-seq, and for 8 (respectively 17) of 28 traits using scRNA-seq. Significant scATAC-seq enrichments included fetal photoreceptor cells for major depressive disorder, fetal ganglion cells for BMI, fetal astrocytes for ADHD, and adult VGLUT2 excitatory neurons for schizophrenia. Our findings improve our understanding of brain-related diseases/traits and inform future analyses.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Depressive Disorder, Major , Humans , RNA-Seq , Genome-Wide Association Study , Chromatin/genetics , Brain , Single-Cell Analysis
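
The abstract reports disease-critical cell types but does not give its enrichment statistic. A standard quantity in this line of work is heritability enrichment. It is the share of SNP heritability carried by the SNPs in an annotation, divided by the share of SNPs that fall in that annotation. The sketch below computes it from hypothetical per-SNP heritability values and a hypothetical cell-type annotation. Real analyses estimate these values (for example, with stratified LD score regression) rather than taking them as given.

```python
def heritability_enrichment(per_snp_h2, in_annotation):
    """Enrichment = (share of h2 in the annotation) / (share of SNPs in it)."""
    if len(per_snp_h2) != len(in_annotation):
        raise ValueError("inputs must have one entry per SNP")
    total_h2 = sum(per_snp_h2)
    annot_h2 = sum(h for h, flag in zip(per_snp_h2, in_annotation) if flag)
    frac_h2 = annot_h2 / total_h2
    frac_snps = sum(1 for flag in in_annotation if flag) / len(in_annotation)
    return frac_h2 / frac_snps


if __name__ == "__main__":
    # Hypothetical: 2 of 10 SNPs lie in cell-type-specific accessible chromatin.
    h2 = [0.02, 0.02] + [0.0025] * 8
    flags = [True, True] + [False] * 8
    print(round(heritability_enrichment(h2, flags), 3))
```

Here 20% of SNPs carry two thirds of the heritability, which gives an enrichment of about 3.3.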
6.
bioRxiv ; 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-38014075

ABSTRACT

Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication, including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.
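
The benchmarking pipeline described above scores predictors against element-gene pairs whose effects were measured by CRISPR perturbation. The abstract does not name its metrics. A common summary for this kind of comparison is the area under the precision-recall curve, and the sketch below computes average precision, one standard estimator of that area. The input is a ranked list of hypothetical predicted pairs with binary labels.

```python
def average_precision(scores, labels):
    """Average precision: the mean of precision@k taken at each true positive.

    Pairs are ranked by descending predictor score. Labels are 1 for pairs
    with a validated regulatory effect and 0 otherwise. Ties keep input
    order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive pair")
    hits = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / k
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical predictor scores for four element-gene pairs and CRISPR labels.
    print(average_precision([0.9, 0.8, 0.3, 0.1], [0, 1, 0, 1]))
```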

7.
bioRxiv ; 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36747789

ABSTRACT

E3 ligases regulate key processes, but many of their roles remain unknown. Using Perturb-seq, we interrogated the function of 1,130 E3 ligases, partners and substrates in the inflammatory response in primary dendritic cells (DCs). Dozens impacted the balance of DC1, DC2, migratory DC and macrophage states and a gradient of DC maturation. Family members grouped into co-functional modules that were enriched for physical interactions and impacted specific programs through substrate transcription factors. E3s and their adaptors co-regulated the same processes, but partnered with different substrate recognition adaptors to impact distinct aspects of the DC life cycle. Genetic interactions were more prevalent within than between modules, and a deep learning model, comβVAE, predicts the outcome of new combinations by leveraging modularity. The E3 regulatory network was associated with heritable variation and aberrant gene expression in immune cells in human inflammatory diseases. Our study provides a general approach to dissect gene function.

8.
Nature ; 614(7948): 492-499, 2023 02.
Article in English | MEDLINE | ID: mdl-36755099

ABSTRACT

Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes [1-3]. However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear [4]. Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes [5]. Rare coding variants (allele frequency < 1 × 10^-3) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average (much less than common variants), and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10^-5). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder [6,7] is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification.


Subject(s)
Exome , Gene Frequency , Genetic Variation , Multifactorial Inheritance , Humans , Exome/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Risk Factors , United Kingdom , Genetic Loci/genetics , Schizophrenia/genetics , Bipolar Disorder/genetics
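
The abstract defines burden heritability over the gene-wise burden of rare coding variants. To illustrate what a burden score is, the sketch below sums each individual's alternate-allele counts across the variants in one gene whose allele frequency is below a threshold. The default threshold is 1 × 10^-3, the rare-variant cutoff quoted above. The genotype matrix and frequencies are hypothetical. The abstract does not describe how the paper annotates or weights variants, so none of that is modeled here.

```python
def burden_scores(genotypes, allele_freqs, max_af=1e-3):
    """Per-individual rare-variant burden for one gene.

    genotypes[i][v] is the alternate-allele count (0, 1 or 2) of individual i
    at variant v, and allele_freqs[v] is that variant's allele frequency.
    Only variants with frequency below max_af contribute.
    """
    rare = [v for v, af in enumerate(allele_freqs) if af < max_af]
    return [sum(row[v] for v in rare) for row in genotypes]


if __name__ == "__main__":
    # Hypothetical gene with three variants; the middle one is too common to count.
    G = [[1, 1, 0], [0, 2, 1], [0, 0, 0]]
    afs = [5e-4, 2e-2, 8e-6]
    print(burden_scores(G, afs))                # rare-variant burden
    print(burden_scores(G, afs, max_af=1e-5))   # ultrarare-only burden
```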
10.
Nat Genet ; 54(10): 1572-1580, 2022 10.
Article in English | MEDLINE | ID: mdl-36050550

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce single-cell disease relevance score (scDRS), an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs). We applied scDRS to 74 diseases/traits and 1.3 million single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type-disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels. Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.


Subject(s)
Genome-Wide Association Study , Single-Cell Analysis , Gene Expression Profiling/methods , Multifactorial Inheritance/genetics , RNA-Seq , Single-Cell Analysis/methods , Triglycerides
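
scDRS identifies cells that show excess expression across GWAS-implicated genes. The sketch below is a simplified illustration of that idea. Each cell's raw score is the mean normalized expression of the disease genes. That score is compared with the same statistic for randomly drawn control gene sets of equal size, which gives a per-cell Monte Carlo p-value. The published method also weights genes and draws control genes that are matched on expression properties; both steps are omitted here. The gene names and expression values are hypothetical.

```python
import random
import statistics


def cell_disease_scores(expr, disease_genes, n_ctrl=200, seed=0):
    """Per-cell (score, Monte Carlo p-value) for excess disease-gene expression.

    expr maps gene -> list of normalized expression values, one per cell.
    Control gene sets are drawn uniformly from non-disease genes. This is a
    simplification: there is no matching on expression mean or variance.
    """
    rng = random.Random(seed)
    genes = list(expr)
    n_cells = len(expr[genes[0]])
    disease_set = set(disease_genes)
    background = [g for g in genes if g not in disease_set]
    k = len(disease_genes)

    def set_score(gene_set, cell):
        return statistics.fmean(expr[g][cell] for g in gene_set)

    control_sets = [rng.sample(background, k) for _ in range(n_ctrl)]
    results = []
    for cell in range(n_cells):
        observed = set_score(disease_genes, cell)
        exceed = sum(set_score(cs, cell) >= observed for cs in control_sets)
        results.append((observed, (exceed + 1) / (n_ctrl + 1)))
    return results


if __name__ == "__main__":
    # Hypothetical: cell 0 strongly expresses both disease genes, cell 1 does not.
    expr = {f"g{i}": [0.5 + 0.1 * i, 1.0] for i in range(10)}
    expr["D1"] = [5.0, 1.0]
    expr["D2"] = [4.0, 1.0]
    for cell, (score, p) in enumerate(cell_disease_scores(expr, ["D1", "D2"])):
        print(cell, score, round(p, 4))
```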
11.
Nat Genet ; 54(10): 1466-1469, 2022 10.
Article in English | MEDLINE | ID: mdl-36138231

ABSTRACT

Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) ≤ 1%, but inflation is observed in variance component set-based tests when restricting to variants with MAF ≤ 0.1% or 0.01%. Here, we propose SAIGE-GENE+ with greatly improved type I error control and computational efficiency to facilitate rare variant tests in large-scale data. We further show that incorporating multiple MAF cutoffs and functional annotations can improve power and thus uncover new gene-phenotype associations. In the analysis of UKBB whole exome sequencing data for 30 quantitative and 141 binary traits, SAIGE-GENE+ identified 551 gene-phenotype associations.


Subject(s)
Genome-Wide Association Study , Gene Frequency/genetics , Genome-Wide Association Study/methods , Phenotype , Exome Sequencing
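
The SAIGE-GENE+ abstract says that testing several MAF cutoffs and functional annotations can improve power. Doing so requires combining several correlated p-values for one gene into a single test. A widely used tool for this is the Cauchy combination test (ACAT), which is approximately valid under arbitrary dependence, especially for small p-values. The sketch below implements it with equal weights by default. The abstract does not state whether or how SAIGE-GENE+ applies this test, so treat the sketch as an illustration of the general approach. The inputs are hypothetical.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum
    to 1; the combined p-value is 0.5 - arctan(T) / pi. Extremely small
    p-values (below about 1e-15) would need a numerically safer tail form.
    """
    if not all(0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    total_w = sum(weights)
    weights = [w / total_w for w in weights]
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values for one gene under three MAF-cutoff/annotation masks.
    print(round(cauchy_combination([0.01, 0.5, 0.9]), 4))
```

A single strong signal dominates the combination, while null-looking tests dilute it only moderately.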
12.
Nat Genet ; 54(10): 1479-1492, 2022 10.
Article in English | MEDLINE | ID: mdl-36175791

ABSTRACT

Genome-wide association studies provide a powerful means of identifying loci and genes contributing to disease, but in many cases, the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. In the present study, we introduce sc-linker, a framework for integrating single-cell RNA-sequencing, epigenomic SNP-to-gene maps and genome-wide association study summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. The inferred disease enrichments recapitulated known biology and highlighted notable cell-disease relationships, including γ-aminobutyric acid-ergic neurons in major depressive disorder, a disease-dependent M-cell program in ulcerative colitis and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease-dependent immune cell-type programs were associated, whereas only disease-dependent epithelial cell programs were prominent, suggesting a role in disease response rather than initiation. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.


Subject(s)
Depressive Disorder, Major , Genome-Wide Association Study , Depressive Disorder, Major/genetics , Genetic Predisposition to Disease , Genetics, Human , Humans , Polymorphism, Single Nucleotide/genetics , RNA , gamma-Aminobutyric Acid
13.
Cell Genom ; 2(7), 2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35873673

ABSTRACT

We assess contributions to autoimmune disease of genes whose regulation is driven by enhancer regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using several SNP-to-gene (S2G) strategies and apply heritability analyses to draw three conclusions about 11 autoimmune/blood-related diseases/traits. First, several characterizations of enhancer-related genes using functional genomics data are informative for autoimmune disease heritability after conditioning on a broad set of regulatory annotations. Second, candidate master-regulator genes defined using trans-eQTL in blood are also conditionally informative for autoimmune disease heritability. Third, integrating enhancer-related and master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced >2-fold stronger heritability signal and >2-fold stronger enrichment for drug targets, compared with the recently proposed enhancer domain score. In each case, functionally informed S2G strategies produced 4.1- to 13-fold stronger disease signals than conventional window-based strategies.

14.
Nat Genet ; 54(6): 827-836, 2022 06.
Article in English | MEDLINE | ID: mdl-35668300

ABSTRACT

Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk. Our optimal combined S2G strategy (cS2G) included seven constituent S2G strategies and achieved a precision of 0.75 and a recall of 0.33, more than doubling the recall of any individual strategy. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 5,095 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. We further applied cS2G to provide an empirical assessment of disease omnigenicity; we determined that the top 1% of genes explained roughly half of the SNP heritability linked to all genes and that gene-level architectures vary with variant allele frequency.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Phenotype , Polymorphism, Single Nucleotide/genetics
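
The cS2G abstract reports performance as the precision and recall of SNP-to-gene links. The sketch below uses a generic definition to compute both numbers for any S2G strategy against a gold-standard set of SNP-gene pairs. It also shows how a naive combination, the union of two strategies' links, gains recall at the cost of precision. The union rule is only an illustration. The abstract describes a heritability-based framework for combining strategies, and this is not that framework. All identifiers are hypothetical.

```python
def precision_recall(predicted_links, gold_links):
    """Precision and recall of predicted (snp, gene) links against a gold standard."""
    predicted, gold = set(predicted_links), set(gold_links)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical gold standard and links from two hypothetical S2G strategies.
    gold = {("rs1", "GENE_A"), ("rs2", "GENE_B"), ("rs3", "GENE_C"), ("rs4", "GENE_D")}
    strategy_1 = {("rs1", "GENE_A"), ("rs2", "GENE_B"), ("rs5", "GENE_E")}
    strategy_2 = {("rs3", "GENE_C"), ("rs6", "GENE_F"), ("rs7", "GENE_G")}
    for name, links in [("strategy_1", strategy_1), ("strategy_2", strategy_2),
                        ("union", strategy_1 | strategy_2)]:
        p, r = precision_recall(links, gold)
        print(f"{name}: precision={p:.2f} recall={r:.2f}")
```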
15.
bioRxiv ; 2021 Nov 23.
Article in English | MEDLINE | ID: mdl-34845454

ABSTRACT

Genome-wide association studies (GWAS) provide a powerful means to identify loci and genes contributing to disease, but in many cases the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. Here, we introduce sc-linker, a framework for integrating single-cell RNA-seq (scRNA-seq), epigenomic maps and GWAS summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. We analyzed 1.6 million scRNA-seq profiles from 209 individuals spanning 11 tissue types and 6 disease conditions, and constructed gene programs capturing cell types, disease progression, and cellular processes both within and across cell types. We evaluated these gene programs for disease enrichment by transforming them to SNP annotations with tissue-specific epigenomic maps and computing enrichment scores across 60 diseases and complex traits (average N = 297K). Cell type, disease progression, and cellular process programs captured distinct heritability signals even within the same cell type, as we show in multiple complex diseases that affect the brain (Alzheimer’s disease, multiple sclerosis), colon (ulcerative colitis) and lung (asthma, idiopathic pulmonary fibrosis, severe COVID-19). The inferred disease enrichments recapitulated known biology and highlighted novel cell-disease relationships, including GABAergic neurons in major depressive disorder (MDD), a disease progression M cell program in ulcerative colitis, and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease progression immune cell type programs were associated, whereas for epithelial cells, disease progression programs were most prominent, perhaps suggesting a role in disease progression over initiation.
Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.
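
The sc-linker abstract describes transforming gene programs into SNP annotations using tissue-specific epigenomic maps. The sketch below shows one way to perform that transformation. Each SNP receives the maximum program score among the genes an SNP-to-gene map links it to, and 0 if it is linked to no program gene. The max rule, the program scores and the SNP-gene links are assumptions for illustration; the abstract does not specify them.

```python
def program_to_snp_annotation(gene_scores, snp_to_genes):
    """Map a gene program (gene -> score in [0, 1]) onto SNPs.

    snp_to_genes maps each SNP to the genes it is linked to, for example by a
    tissue-specific enhancer-gene map. Each SNP takes the maximum score over
    its linked genes. Unlinked SNPs, and genes absent from the program,
    contribute 0.
    """
    return {
        snp: max((gene_scores.get(g, 0.0) for g in genes), default=0.0)
        for snp, genes in snp_to_genes.items()
    }


if __name__ == "__main__":
    # Hypothetical program scores and SNP-gene links.
    scores = {"GENE_A": 0.9, "GENE_B": 0.2}
    links = {"rs1": ["GENE_A", "GENE_B"], "rs2": ["GENE_B"], "rs3": [], "rs4": ["GENE_Z"]}
    print(program_to_snp_annotation(scores, links))
```

The resulting per-SNP values form a continuous annotation that can then be tested for heritability enrichment.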

16.
Nature ; 595(7865): 107-113, 2021 07.
Article in English | MEDLINE | ID: mdl-33915569

ABSTRACT

COVID-19, which is caused by SARS-CoV-2, can result in acute respiratory distress syndrome and multiple organ failure [1-4], but little is known about its pathophysiology. Here we generated single-cell atlases of 24 lung, 16 kidney, 16 liver and 19 heart autopsy tissue samples and spatial atlases of 14 lung samples from donors who died of COVID-19. Integrated computational analysis uncovered substantial remodelling in the lung epithelial, immune and stromal compartments, with evidence of multiple paths of failed tissue regeneration, including defective alveolar type 2 differentiation and expansion of fibroblasts and putative TP63+ intrapulmonary basal-like progenitor cells. Viral RNAs were enriched in mononuclear phagocytic and endothelial lung cells, which induced specific host programs. Spatial analysis in lung distinguished inflammatory host responses in lung regions with and without viral RNA. Analysis of the other tissue atlases showed transcriptional alterations in multiple cell types in heart tissue from donors with COVID-19, and mapped cell types and genes implicated with disease severity based on COVID-19 genome-wide association studies. Our foundational dataset elucidates the biological effect of severe SARS-CoV-2 infection across the body, a key step towards new treatments.


Subject(s)
COVID-19/pathology , COVID-19/virology , Kidney/pathology , Liver/pathology , Lung/pathology , Myocardium/pathology , SARS-CoV-2/pathogenicity , Adult , Aged , Aged, 80 and over , Atlases as Topic , Autopsy , Biological Specimen Banks , COVID-19/genetics , COVID-19/immunology , Endothelial Cells , Epithelial Cells/pathology , Epithelial Cells/virology , Female , Fibroblasts , Genome-Wide Association Study , Heart/virology , Humans , Inflammation/pathology , Inflammation/virology , Kidney/virology , Liver/virology , Lung/virology , Male , Middle Aged , Organ Specificity , Phagocytes , Pulmonary Alveoli/pathology , Pulmonary Alveoli/virology , RNA, Viral/analysis , Regeneration , SARS-CoV-2/immunology , Single-Cell Analysis , Viral Load
17.
Glob Ecol Biogeogr ; 30(3): 685-696, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33776580

ABSTRACT

AIM: Biogeographical regions (realms) reflect patterns of co-distributed species (biotas) across space. Their boundaries are set by dispersal barriers and difficulties of establishment in new locations. We extend new methods to assess these two contributions by quantifying the degree to which realms intergrade across geographical space and the contributions of individual species to the delineation of those realms. As our example, we focus on Wallace's Line, the most enigmatic partitioning of the world's faunas, where climate is thought to have little effect and the majority of dispersal barriers are short water gaps.
LOCATION: Indo-Pacific.
TIME PERIOD: Present day.
MAJOR TAXA STUDIED: Birds and mammals.
METHODS: Terrestrial bird and mammal assemblages were established in 1-degree map cells using range maps. Assemblage structure was modelled using latent Dirichlet allocation, a continuous clustering method that simultaneously establishes the likely partitioning of species into biotas and the contribution of biotas to each map cell. Phylogenetic trees were used to assess the contribution of deep historical processes. Spatial segregation between biotas was evaluated across time and space in comparison with numerous hard realm boundaries drawn by various workers.
RESULTS: We demonstrate that the strong turnover between biotas coincides with the north-western extent of the region not connected to the mainland during the Pleistocene, although the Philippines contains mixed contributions. At deeper taxonomic levels, Sulawesi and the Philippines shift to primarily Asian affinities, resulting from transgressions of a few Asian-derived lineages across the line. The partitioning of biotas sometimes produces fragmented regions that reflect habitat. Differences in partitions between birds and mammals reflect differences in dispersal ability.
MAIN CONCLUSIONS: Permanent water barriers have selected for a dispersive archipelago fauna, excluded by an incumbent continental fauna on the Sunda shelf. Deep history, such as plate movements, is relatively unimportant in setting boundaries. The analysis implies a temporally dynamic interaction between a species' intrinsic dispersal ability, physiographic barriers, and recent climate change in the genesis of Earth's biotas.
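
The METHODS section models assemblage structure with latent Dirichlet allocation. In that model each map cell plays the role of a "document", each species present is a "word", and each topic is a biota. The sketch below is a minimal collapsed Gibbs sampler for LDA in pure Python that returns each cell's estimated mixture over biotas. The toy presence lists, the number of biotas, the hyperparameters (alpha, beta) and the iteration count are assumptions. The abstract does not give the study's implementation or settings.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=500, seed=0):
    """Collapsed Gibbs sampling for latent Dirichlet allocation.

    docs: list of documents, each a list of word ids in range(vocab_size).
    Here a document is a map cell and a word is a species present in it.
    Returns theta: per-document topic (biota) proportions.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # cell-biota counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # biota-species counts
    n_k = [0] * n_topics                                # tokens per biota
    z = []
    for d, doc in enumerate(docs):                      # random initial assignments
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


if __name__ == "__main__":
    # Hypothetical: three "western" cells share species 0-4, three "eastern" cells share 5-9.
    docs = [[0, 1, 2, 3, 4]] * 3 + [[5, 6, 7, 8, 9]] * 3
    for row in lda_gibbs(docs, n_topics=2, vocab_size=10):
        print([round(v, 2) for v in row])
```

Unlike a hard clustering, each cell receives a continuous mixture over biotas. This is what lets the method quantify how realms intergrade.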

18.
bioRxiv ; 2021 Feb 25.
Article in English | MEDLINE | ID: mdl-33655247

ABSTRACT

The SARS-CoV-2 pandemic has caused over 1 million deaths globally, mostly due to acute lung injury and acute respiratory distress syndrome, or direct complications resulting in multiple-organ failures. Little is known about the host tissue immune and cellular responses associated with COVID-19 infection, symptoms, and lethality. To address this, we collected tissues from 11 organs during the clinical autopsy of 17 individuals who succumbed to COVID-19, resulting in a tissue bank of approximately 420 specimens. We generated comprehensive cellular maps capturing COVID-19 biology related to patients' demise through single-cell and single-nucleus RNA-Seq of lung, kidney, liver and heart tissues, and further contextualized our findings through spatial RNA profiling of distinct lung regions. We developed a computational framework that incorporates removal of ambient RNA and automated cell type annotation to facilitate comparison with other healthy and diseased tissue atlases. In the lung, we uncovered significantly altered transcriptional programs within the epithelial, immune, and stromal compartments and cell intrinsic changes in multiple cell types relative to lung tissue from healthy controls. We observed evidence of alveolar type 2 (AT2) differentiation replacing depleted alveolar type 1 (AT1) lung epithelial cells, as previously seen in fibrosis; a concomitant increase in myofibroblasts reflective of defective tissue repair; and putative TP63+ intrapulmonary basal-like progenitor (IPBLP) cells, similar to cells identified in H1N1 influenza, that may serve as an emergency cellular reserve for severely damaged alveoli. Together, these findings suggest the activation and failure of multiple avenues for regeneration of the epithelium in these terminal lungs. SARS-CoV-2 RNA reads were enriched in lung mononuclear phagocytic cells and endothelial cells, and these cells expressed distinct host response transcriptional programs.
We corroborated the compositional and transcriptional changes in lung tissue through spatial analysis of RNA profiles in situ and distinguished unique tissue host responses between regions with and without viral RNA, and in COVID-19 donor tissues relative to healthy lung. Finally, we analyzed genetic regions implicated in COVID-19 GWAS with transcriptomic data to implicate specific cell types and genes associated with disease severity. Overall, our COVID-19 cell atlas is a foundational dataset to better understand the biological impact of SARS-CoV-2 infection across the human body and empowers the identification of new therapeutic interventions and prevention strategies.

19.
Nat Commun ; 11(1): 6258, 2020 12 07.
Article in English | MEDLINE | ID: mdl-33288751

ABSTRACT

Despite considerable progress on pathogenicity scores prioritizing variants for Mendelian disease, little is known about the utility of these scores for common disease. Here, we assess the informativeness of Mendelian disease-derived pathogenicity scores for common disease and improve upon existing scores. We first apply stratified linkage disequilibrium (LD) score regression to evaluate published pathogenicity scores across 41 common diseases and complex traits (average N = 320K). Several of the resulting annotations are informative for common disease, even after conditioning on a broad set of functional annotations. We then improve upon published pathogenicity scores by developing AnnotBoost, a machine learning framework to impute and denoise pathogenicity scores using a broad set of functional annotations. AnnotBoost substantially increases the informativeness for common disease of both previously uninformative and previously informative pathogenicity scores, implying that Mendelian and common disease variants share similar properties. The boosted scores also produce improvements in heritability model fit and in classifying disease-associated, fine-mapped SNPs. Our boosted scores may improve fine-mapping and candidate gene discovery for common disease.


Subject(s)
Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease/genetics , Linkage Disequilibrium , Mutation, Missense , Polymorphism, Single Nucleotide , Alleles , Genome-Wide Association Study/methods , Humans , Machine Learning , Mendelian Randomization Analysis/methods
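
The AnnotBoost abstract evaluates pathogenicity scores with stratified LD score regression (S-LDSC). For reference, S-LDSC models the expected chi-square statistic of SNP j in terms of that SNP's LD with each functional annotation c. In this standard published form of the model, N is the GWAS sample size, τ_c is the per-SNP heritability contribution of annotation c, a_c(k) is the value of annotation c at SNP k, r_jk is the correlation between SNPs j and k, and a captures confounding biases. Whether an annotation is "informative after conditioning" is commonly judged by its standardized effect size τ*_c. Here sd_c is the standard deviation of annotation c, h²_g is SNP heritability, and M_{h²} is the number of reference SNPs used to compute h²_g. These formulas are background on the method, not results reported in the abstract.

```latex
\mathbb{E}\left[\chi^2_j\right] = N \sum_{c} \tau_c \, \ell(j, c) + N a + 1,
\qquad
\ell(j, c) = \sum_{k} a_c(k) \, r_{jk}^2,
\qquad
\tau^{*}_c = \frac{M_{h^2} \, \mathrm{sd}_c}{h^2_g} \, \tau_c
```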
20.
Nat Genet ; 52(12): 1346-1354, 2020 12.
Article in English | MEDLINE | ID: mdl-33257898

ABSTRACT

Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.


Subject(s)
Asian People/genetics , Enhancer Elements, Genetic/genetics , Genetic Predisposition to Disease/genetics , Polymorphism, Single Nucleotide/genetics , White People/genetics , Base Sequence , Computational Biology/methods , Gene Expression Regulation/genetics , Genome-Wide Association Study , Humans , Models, Genetic , Molecular Sequence Annotation , Multifactorial Inheritance/genetics
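
The IMPACT abstract reports a 49.9% mean relative increase in R² for polygenic risk scores. To illustrate these quantities, the sketch below computes three things: a polygenic score as a weighted sum of allele dosages, the R² between a score and a phenotype (the squared Pearson correlation), and the relative increase in R² of one score over a baseline. The genotypes, effect sizes and phenotypes are hypothetical. How IMPACT annotations are used to prioritize variants is not reproduced here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0-2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x, y):
    """Squared Pearson correlation between a score and a phenotype."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def relative_r2_increase(r2_new, r2_baseline):
    """Relative gain in R^2; for example, 0.15 vs 0.10 is about 0.5 (a 50% increase)."""
    if r2_baseline <= 0:
        raise ValueError("baseline R^2 must be positive")
    return (r2_new - r2_baseline) / r2_baseline


if __name__ == "__main__":
    # Hypothetical dosages at three variants for four individuals, two weight sets.
    genotypes = [[2, 1, 0], [1, 1, 1], [0, 2, 1], [0, 0, 2]]
    phenotype = [0.9, 0.5, 0.4, 0.1]
    baseline_w, prioritized_w = [0.1, 0.1, 0.1], [0.3, 0.05, -0.1]
    r2_base = r_squared([polygenic_score(g, baseline_w) for g in genotypes], phenotype)
    r2_new = r_squared([polygenic_score(g, prioritized_w) for g in genotypes], phenotype)
    print(round(r2_base, 3), round(r2_new, 3))
```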