Results 1 - 20 of 95
1.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38645217

ABSTRACT

Differential expression (DE) analysis is a widely used method for identifying genes that are functionally relevant for an observed phenotype or biological response. However, typical DE analysis includes selection of genes based on a threshold of fold change in expression under the implicit assumption that all genes are equally sensitive to dosage changes of their transcripts. This tends to favor highly variable genes over more constrained genes where even small changes in expression may be biologically relevant. To address this limitation, we have developed a method to recalibrate each gene's differential expression fold change based on genetic expression variance observed in the human population. The newly established metric ranks statistically differentially expressed genes not by nominal change of expression, but by relative change in comparison to natural dosage variation for each gene. We apply our method to RNA sequencing datasets from rare disease and in-vitro stimulus response experiments. Compared to the standard approach, our method adjusts the bias in discovery towards highly variable genes, and enriches for pathways and biological processes related to metabolic and regulatory activity, indicating a prioritization of functionally relevant driver genes. With that, our method provides a novel view on DE and contributes towards bridging the existing gap between statistical and biological significance. We believe that this approach will simplify the identification of disease causing genes and enhance the discovery of therapeutic targets.
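
The recalibration idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so the scoring rule here (absolute log2 fold change divided by a per-gene standard deviation of natural expression variation) is an assumption, and the function name, gene names, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_relative_change(
    log2_fc: Dict[str, float],
    natural_sd: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank genes by |log2 fold change| relative to natural variation.

    ASSUMPTION: the score is |log2FC| / SD; the published metric may differ.
    """
    scores = []
    for gene, fc in log2_fc.items():
        sd = natural_sd.get(gene)
        if sd is None or sd <= 0:
            continue  # skip genes without a usable variance estimate
        scores.append((gene, abs(fc) / sd))
    # Largest relative change first
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical values: GENE_A changes more in nominal terms,
# but GENE_B changes more relative to its natural variation.
ranking = rank_by_relative_change(
    {"GENE_A": 2.0, "GENE_B": 0.5},
    {"GENE_A": 1.0, "GENE_B": 0.1},
)
```

Under this assumed rule, a tightly constrained gene with a small nominal change can outrank a highly variable gene with a large one, which is the shift in prioritization the abstract describes.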

2.
PLoS One ; 19(3): e0291960, 2024.
Article in English | MEDLINE | ID: mdl-38478511

ABSTRACT

Common variants affecting mRNA splicing are typically identified through splicing quantitative trait locus (sQTL) mapping and have been shown to be enriched for GWAS signals by a similar degree to eQTLs. However, the specific splicing changes induced by these variants have been difficult to characterize, making it more complicated to analyze the effect size and direction of sQTLs, and to determine downstream splicing effects on protein structure. In this study, we catalogue sQTLs using exon percent spliced in (PSI) scores as a quantitative phenotype. PSI is an interpretable metric for identifying exon skipping events and has some advantages over other methods for quantifying splicing from short read RNA sequencing. In our set of sQTL variants, we find evidence of selective effects based on splicing effect size and effect direction, as well as exon symmetry. Additionally, we utilize AlphaFold2 to predict changes in protein structure associated with sQTLs overlapping GWAS traits, highlighting a potential new use-case for this technology for interpreting genetic effects on traits and disorders.


Subject(s)
Alternative Splicing; Polymorphism, Single Nucleotide; RNA Splicing/genetics; Proteins/genetics; Exons/genetics
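
For readers unfamiliar with PSI, the quantity can be sketched as the fraction of reads supporting exon inclusion. This is the common textbook definition, not necessarily the exact estimator used in the paper; the function name and the unnormalized read counts are assumptions made for illustration.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Return PSI = inclusion / (inclusion + exclusion), or None if no reads.

    ASSUMPTION: raw read counts are used directly; real pipelines often
    normalize inclusion reads for the number of supporting junctions.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # PSI is undefined without coverage
    return inclusion_reads / total


# Hypothetical exon: 30 reads support inclusion, 10 support skipping.
psi = percent_spliced_in(30, 10)
```

A PSI near 1 means the exon is almost always included; a value near 0 means it is mostly skipped.
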
3.
bioRxiv ; 2024 Mar 03.
Article in English | MEDLINE | ID: mdl-38464330

ABSTRACT

Genomic loci associated with common traits and diseases are typically non-coding and likely impact gene expression, sometimes coinciding with rare loss-of-function variants in the target gene. However, our understanding of how gradual changes in gene dosage affect molecular, cellular, and organismal traits is currently limited. To address this gap, we induced gradual changes in gene expression of four genes using CRISPR activation and inactivation. Downstream transcriptional consequences of dosage modulation of three master trans-regulators associated with blood cell traits (GFI1B, NFE2, and MYB) were examined using targeted single-cell multimodal sequencing. We showed that guide tiling around the TSS is the most effective way to modulate cis gene expression across a wide range of fold-changes, with further effects from chromatin accessibility and histone marks that differ between the inhibition and activation systems. Our single-cell data allowed us to precisely detect subtle to large gene expression changes in dozens of trans genes, revealing that many responses to dosage changes of these three TFs are non-linear, including non-monotonic behaviours, even when constraining the fold-changes of the master regulators to a copy number gain or loss. We found that the dosage properties are linked to gene constraint and that some of these non-linear responses are enriched for disease and GWAS genes. Overall, our study provides a straightforward and scalable method to precisely modulate gene expression and gain insights into its downstream consequences at high resolution.

4.
Cell ; 187(5): 1059-1075, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38428388

ABSTRACT

Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.


Subject(s)
Human Genetics; Humans; Genetic Variation; Multifactorial Inheritance; Phenotype
5.
Ann Am Thorac Soc ; 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38335160

ABSTRACT

RATIONALE: Chronic obstructive pulmonary disease (COPD) and emphysema are associated with endothelial damage and altered pulmonary microvascular perfusion. Molecular mechanisms underlying these changes are poorly understood in patients due, in part, to the inaccessibility of the pulmonary vasculature. Peripheral blood mononuclear cells (PBMC) interact with the pulmonary endothelium. OBJECTIVE: To test the association between gene expression in PBMCs and pulmonary microvascular perfusion in COPD. METHODS: The Multi-Ethnic Study of Atherosclerosis (MESA) COPD Study recruited two independent samples of COPD cases and controls with 10 or more pack-years. In both samples, pulmonary microvascular blood flow, pulmonary microvascular blood volume (PMBV), and mean transit time were assessed on contrast-enhanced MRI, and PBMC gene expression was assessed by microarray. Additional replication was performed in a third sample with PMBV measures on contrast-enhanced, dual-energy CT. Differential expression analyses were adjusted for age, gender, race-ethnicity, educational attainment, height, weight, smoking status, and pack-years. RESULTS: The 79 participants in the discovery sample had mean age of 69±6 years, 44% were female, 25% were non-white, 34% were current smokers and 66% had COPD. There were large PBMC gene expression signatures associated with pulmonary microvascular perfusion traits, with several replicated in the replication sets with MRI (n=47) or dual-energy CT scan (n=157) measures. Many of the identified genes are involved in inflammatory processes, including NF-κB and chemokine signaling pathways. CONCLUSIONS: PBMC gene expression in NF-κB, inflammatory and chemokine signaling pathways was associated with pulmonary microvascular perfusion in COPD, potentially offering new targetable candidates for novel therapies.

6.
bioRxiv ; 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38260658

ABSTRACT

Understanding the role of transcription and transcription factors in cellular identity and disease, such as cancer and autoimmunity, is essential. However, comprehensive data resources for cell line-specific transcription factor-to-target gene annotations are currently limited. To address this, we developed a straightforward method to define regulons that capture the cell-specific aspects of TF binding and transcript expression levels. By integrating cellular transcriptome and transcription factor binding data, we generated regulons for four common cell lines comprising both proximal and distal cell line-specific regulatory events. Through systematic benchmarking involving transcription factor knockout experiments, we demonstrated performance on par with state-of-the-art methods, with our method being easily applicable to other cell types of interest. We present case studies using three cancer single-cell datasets to showcase the utility of these cell-type-specific regulons in exploring transcriptional dysregulation. In summary, this study provides a valuable tool and a resource for systematically exploring cell line-specific transcriptional regulations, emphasizing the utility of network analysis in deciphering disease mechanisms.

7.
Am J Hum Genet ; 111(1): 133-149, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38181730

ABSTRACT

Bulk-tissue molecular quantitative trait loci (QTLs) have been the starting point for interpreting disease-associated variants, and context-specific QTLs show particular relevance for disease. Here, we present the results of mapping interaction QTLs (iQTLs) for cell type, age, and other phenotypic variables in multi-omic, longitudinal data from the blood of individuals of diverse ancestries. By modeling the interaction between genotype and estimated cell-type proportions, we demonstrate that cell-type iQTLs could be considered as proxies for cell-type-specific QTL effects, particularly for the most abundant cell type in the tissue. The interpretation of age iQTLs, however, warrants caution because the moderation effect of age on the genotype and molecular phenotype association could be mediated by changes in cell-type composition. Finally, we show that cell-type iQTLs contribute to cell-type-specific enrichment of diseases that, in combination with additional functional data, could guide future functional studies. Overall, this study highlights the use of iQTLs to gain insights into the context specificity of regulatory effects.


Subject(s)
Gene Expression Regulation; Quantitative Trait Loci; Humans; Quantitative Trait Loci/genetics; Genotype; Phenotype
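
The interaction model described in this abstract can be written, as a sketch, as a linear regression with a genotype-by-context product term. The notation below is assumed for illustration and is not taken from the paper; covariates are omitted.

```latex
y_i = \beta_0 + \beta_g\, g_i + \beta_c\, c_i + \beta_{g \times c}\,(g_i \cdot c_i) + \varepsilon_i
```

Here $y_i$ is the molecular phenotype of individual $i$, $g_i$ the genotype dosage, and $c_i$ the context variable, such as an estimated cell-type proportion or age. In this notation an iQTL corresponds to a nonzero $\beta_{g \times c}$, and the genotype effect at a given context value $c$ is $\beta_g + \beta_{g \times c}\, c$.
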
8.
BMC Genomics ; 24(1): 790, 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38114913

ABSTRACT

Transcriptome studies disentangle functional mechanisms of gene expression regulation and may elucidate the underlying biology of disease processes. However, the types of tissues currently collected typically assay a single post-mortem timepoint or are limited to investigating cell types found in blood. Noninvasive tissues may improve disease-relevant discovery by enabling more complex longitudinal study designs, by capturing different and potentially more applicable cell types, and by increasing sample sizes due to reduced collection costs and possible higher enrollment from vulnerable populations. Here, we develop methods for sampling noninvasive biospecimens, investigate their performance across commercial and in-house library preparations, characterize their biology, and assess the feasibility of using noninvasive tissues in a multitude of transcriptomic applications. We collected buccal swabs, hair follicles, saliva, and urine cell pellets from 19 individuals over three to four timepoints, for a total of 300 unique biological samples, which we then prepared with replicates across three library preparations, for a final tally of 472 transcriptomes. Of the four tissues we studied, we found hair follicles and urine cell pellets to be most promising due to the consistency of sample quality, the cell types and expression profiles we observed, and their performance in disease-relevant applications. This is the first study to thoroughly delineate biological and technical features of noninvasive samples and demonstrate their use in a wide array of transcriptomic and clinical analyses. We anticipate future use of these biospecimens will facilitate discovery and development of clinical applications.


Subject(s)
Gene Expression Profiling; Transcriptome; Humans; Longitudinal Studies; Gene Expression Regulation; Saliva
9.
Am J Hum Genet ; 110(12): 1996-2002, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-37995684

ABSTRACT

In this perspective we discuss the current lack of genetic and environmental diversity in functional genomics datasets. There is a well-described Eurocentric bias in genetic and functional genomic research that has a clear impact on the benefit this research can bring to underrepresented populations. Current research focused on genetic variant-to-function experiments aims to identify molecular QTLs, but the lack of data from genetically diverse individuals has limited analyses to mostly populations of European ancestry. Although some efforts have been established to increase diversity in functional genomic studies, much remains to be done to consistently generate data for underrepresented populations from now on. We discuss the major barriers for this continuity and suggest actionable insights, aiming to empower research and researchers from underserved populations.


Subject(s)
Genomics; Population Groups; Humans
10.
Cell Genom ; 3(10): 100401, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868038

ABSTRACT

Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how use of personal multi-omics can enable identification of impactful rare variants by using the Multi-Ethnic Study of Atherosclerosis, which included several hundred individuals, with whole-genome sequencing, transcriptomes, methylomes, and proteomes collected across two time points, 10 years apart. We evaluated each multi-omics phenotype's ability to separately and jointly inform functional rare variation. By combining expression and protein data, we observed rare stop variants 62 times and rare frameshift variants 216 times as frequently as controls, compared to 13-27 times as frequently for expression or protein effects alone. We extended a Bayesian hierarchical model, "Watershed," to prioritize specific rare variants underlying multi-omics signals across the regulatory cascade. With this approach, we identified rare variants that exhibited large effect sizes on multiple complex traits including height, schizophrenia, and Alzheimer's disease.

11.
bioRxiv ; 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37905013

ABSTRACT

Inference of directed biological networks is an important but notoriously challenging problem. We introduce inverse sparse regression (inspre), an approach to learning causal networks that leverages large-scale intervention-response data. Applied to 788 genes from the genome-wide perturb-seq dataset, inspre helps elucidate the network architecture of blood traits.

12.
Cell Genom ; 3(9): 100382, 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37719147

ABSTRACT

Genetic variants affecting gene expression levels in humans have been mapped in the Genotype-Tissue Expression (GTEx) project. Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach. The transparent peer review record is available.

13.
Cell Genom ; 3(8): 100359, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37601969

ABSTRACT

Multi-omics datasets are becoming more common, necessitating better integration methods to realize their revolutionary potential. Here, we introduce multi-set correlation and factor analysis (MCFA), an unsupervised integration method tailored to the unique challenges of high-dimensional genomics data that enables fast inference of shared and private factors. We used MCFA to integrate methylation markers, protein expression, RNA expression, and metabolite levels in 614 diverse samples from the Trans-Omics for Precision Medicine/Multi-Ethnic Study of Atherosclerosis multi-omics pilot. Samples cluster strongly by ancestry in the shared space, even in the absence of genetic information, while private spaces frequently capture dataset-specific technical variation. Finally, we integrated genetic data by conducting a genome-wide association study (GWAS) of our inferred factors, observing that several factors are enriched for GWAS hits and trans-expression quantitative trait loci. Two of these factors appear to be related to metabolic disease. Our study provides a foundation and framework for further integrative analysis of ever larger multi-modal genomic datasets.

14.
bioRxiv ; 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37425716

ABSTRACT

Bulk tissue molecular quantitative trait loci (QTLs) have been the starting point for interpreting disease-associated variants, while context-specific QTLs show particular relevance for disease. Here, we present the results of mapping interaction QTLs (iQTLs) for cell type, age, and other phenotypic variables in multi-omic, longitudinal data from blood of individuals of diverse ancestries. By modeling the interaction between genotype and estimated cell type proportions, we demonstrate that cell type iQTLs could be considered as proxies for cell type-specific QTL effects. The interpretation of age iQTLs, however, warrants caution as the moderation effect of age on the genotype and molecular phenotype association may be mediated by changes in cell type composition. Finally, we show that cell type iQTLs contribute to cell type-specific enrichment of diseases that, in combination with additional functional data, may guide future functional studies. Overall, this study highlights iQTLs to gain insights into the context-specificity of regulatory effects.

15.
Thorax ; 78(11): 1067-1079, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37268414

ABSTRACT

BACKGROUND: Treatment and preventative advances for chronic obstructive pulmonary disease (COPD) have been slow due, in part, to limited subphenotypes. We tested if unsupervised machine learning on CT images would discover CT emphysema subtypes with distinct characteristics, prognoses and genetic associations. METHODS: New CT emphysema subtypes were identified by unsupervised machine learning on only the texture and location of emphysematous regions on CT scans from 2853 participants in the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS), a COPD case-control study, followed by data reduction. Subtypes were compared with symptoms and physiology among 2949 participants in the population-based Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study and with prognosis among 6658 MESA participants. Associations with genome-wide single-nucleotide-polymorphisms were examined. RESULTS: The algorithm discovered six reproducible (interlearner intraclass correlation coefficient, 0.91-1.00) CT emphysema subtypes. The most common subtype in SPIROMICS, the combined bronchitis-apical subtype, was associated with chronic bronchitis, accelerated lung function decline, hospitalisations, deaths, incident airflow limitation and a gene variant near DRD1, which is implicated in mucin hypersecretion (p = 1.1×10⁻⁸). The second, the diffuse subtype, was associated with lower weight, respiratory hospitalisations and deaths, and incident airflow limitation. The third was associated with age only. The fourth and fifth visually resembled combined pulmonary fibrosis emphysema and had distinct symptoms, physiology, prognosis and genetic associations. The sixth visually resembled vanishing lung syndrome. CONCLUSION: Large-scale unsupervised machine learning on CT scans defined six reproducible, familiar CT emphysema subtypes that suggest paths to specific diagnosis and personalised therapies in COPD and pre-COPD.


Subject(s)
Emphysema; Pulmonary Disease, Chronic Obstructive; Pulmonary Emphysema; Humans; Pulmonary Emphysema/diagnostic imaging; Pulmonary Emphysema/genetics; Case-Control Studies; Unsupervised Machine Learning; Lung; Tomography, X-Ray Computed
16.
Genetics ; 224(4), 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37348055

ABSTRACT

Exonic variants present some of the strongest links between genotype and phenotype. However, these variants can have significant inter-individual pathogenicity differences, known as variable penetrance. In this study, we propose a model where genetically controlled mRNA splicing modulates the pathogenicity of exonic variants. By first cataloging exonic inclusion from RNA-sequencing data in GTEx V8, we find that pathogenic alleles are depleted on highly included exons. Using large-scale phased whole genome sequencing data from the TOPMed consortium, we observe that this effect may be driven by common splice-regulatory genetic variants, and that natural selection acts on haplotype configurations that reduce the transcript inclusion of putatively pathogenic variants, especially when limiting to haploinsufficient genes. Finally, we test if this effect may be relevant for autism risk using families from the Simons Simplex Collection, but find that splicing of pathogenic alleles has a penetrance-reducing effect here as well. Overall, our results indicate that common splice-regulatory variants may play a role in reducing the damaging effects of rare exonic variants.


Subject(s)
RNA Splice Sites; RNA Splicing; Penetrance; Exons; Genotype; RNA, Messenger/genetics; Alternative Splicing
17.
PLoS Genet ; 19(6): e1010445, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37352370

ABSTRACT

Hyper-secretion and/or hyper-concentration of mucus is a defining feature of multiple obstructive lung diseases, including chronic obstructive pulmonary disease (COPD). Mucus itself is composed of a mixture of water, ions, salt and proteins, of which the gel-forming mucins, MUC5AC and MUC5B, are the most abundant. Recent studies have linked the concentrations of these proteins in sputum to COPD phenotypes, including chronic bronchitis (CB) and acute exacerbations (AE). We sought to determine whether common genetic variants influence sputum mucin concentrations and whether these variants are also associated with COPD phenotypes, specifically CB and AE. We performed a GWAS to identify quantitative trait loci for sputum mucin protein concentration (pQTL) in the Sub-Populations and InteRmediate Outcome Measures in COPD Study (SPIROMICS, n = 708 for total mucin, n = 215 for MUC5AC, MUC5B). Subsequently, we tested for associations of mucin pQTL with CB and AE using regression modeling (n = 822-1300). Replication analysis was conducted using data from COPDGene (n = 5740) and by examining results from the UK Biobank. We identified one genome-wide significant pQTL for MUC5AC (rs75401036) and two for MUC5B (rs140324259, rs10001928). The strongest association for MUC5B, with rs140324259 on chromosome 11, explained 14% of variation in sputum MUC5B. Despite being associated with lower MUC5B, the C allele of rs140324259 conferred increased risk of CB (odds ratio (OR) = 1.42; 95% confidence interval (CI): 1.10-1.80) as well as AE ascertained over three years of follow up (OR = 1.41; 95% CI: 1.02-1.94). Associations between rs140324259 and CB or AE did not replicate in COPDGene. However, in the UK Biobank, rs140324259 was associated with phenotypes that define CB, namely chronic mucus production and cough, again with the C allele conferring increased risk. We conclude that sputum MUC5AC and MUC5B concentrations are associated with common genetic variants, and the top locus for MUC5B may influence COPD phenotypes, in particular CB.


Subject(s)
Mucins; Pulmonary Disease, Chronic Obstructive; Humans; Mucins/genetics; Mucins/metabolism; Sputum/metabolism; Pulmonary Disease, Chronic Obstructive/genetics; Mucus/metabolism; Phenotype
18.
Science ; 380(6646): eadh7699, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37141313

ABSTRACT

Most variants associated with complex traits and diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown effects. Using ancestrally diverse, biobank-scale GWAS data, massively parallel CRISPR screens, and single-cell transcriptomic and proteomic sequencing, we discovered 124 cis-target genes of 91 noncoding blood trait GWAS loci. Using precise variant insertion through base editing, we connected specific variants with gene expression changes. We also identified trans-effect networks of noncoding loci when cis target genes encoded transcription factors or microRNAs. Networks were themselves enriched for GWAS variants and demonstrated polygenic contributions to complex traits. This platform enables massively parallel characterization of the target genes and mechanisms of human noncoding variants in both cis and trans.


Subject(s)
Disease; Genome-Wide Association Study; Multifactorial Inheritance; Quantitative Trait Loci; Single-Cell Analysis; Humans; Clustered Regularly Interspaced Short Palindromic Repeats; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide; Proteomics; Blood Cells; RNA-Seq; Disease/genetics
19.
PLoS Genet ; 19(5): e1010517, 2023 May.
Article in English | MEDLINE | ID: mdl-37216410

ABSTRACT

Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic system biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features, referred to as canonical variables (CVs), within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance in MESA, explaining 39.0%-50.0% of variation in JHS and 38.9%-49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.


Subject(s)
Canonical Correlation Analysis; Proteomics; Humans; Proteomics/methods; Multiomics; Cohort Studies
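
The Gram-Schmidt step mentioned in this abstract can be sketched in isolation. The code below is a generic Gram-Schmidt orthonormalization in pure Python; how the authors couple it with SMCCA is not described in the abstract, so this is only an illustration of the underlying algorithm, and the function names and example vectors are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors: List[List[float]], tol: float = 1e-12) -> List[List[float]]:
    """Return an orthonormal basis spanning the input vectors.

    Each vector has its projections onto the already accepted basis
    vectors removed, then is scaled to unit length. Vectors that are
    (numerically) linear combinations of earlier ones are dropped.
    """
    basis: List[List[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = dot(w, b)  # projection coefficient onto unit vector b
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


# Two hypothetical, correlated score vectors (stand-ins for CVs).
orthonormal = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
```

After this step the returned vectors have unit length and zero pairwise inner products, which is the orthogonality property the abstract says the GS adaptation is meant to improve.
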
20.
bioRxiv ; 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36778406

ABSTRACT

Exonic variants present some of the strongest links between genotype and phenotype. However, these variants can have significant inter-individual pathogenicity differences, known as variable penetrance. In this study, we propose a model where genetically controlled mRNA splicing modulates the pathogenicity of exonic variants. By first cataloging exonic inclusion from RNA-seq data in GTEx v8, we find that pathogenic alleles are depleted on highly included exons. Using large-scale phased WGS data from the TOPMed consortium, we observe that this effect may be driven by common splice-regulatory genetic variants, and that natural selection acts on haplotype configurations that reduce the transcript inclusion of putatively pathogenic variants, especially when limiting to haploinsufficient genes. Finally, we test if this effect may be relevant for autism risk using families from the Simons Simplex Collection, but find that splicing of pathogenic alleles has a penetrance-reducing effect here as well. Overall, our results indicate that common splice-regulatory variants may play a role in reducing the damaging effects of rare exonic variants.
