ABSTRACT
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.
Subject(s)
Disease/genetics , Multifactorial Inheritance/genetics , Population/genetics , RNA, Long Noncoding/genetics , Transcriptome , Coronary Artery Disease/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Gene Expression Profiling , Genetic Variation , Humans , Inflammatory Bowel Diseases/genetics , Organ Specificity/genetics , Quantitative Trait Loci
ABSTRACT
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
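As a rough illustration of the general idea, a PRS can be written as a weighted sum of allele dosages, with a rare-variant burden added as a second term. The sketch below is not the authors' method: the effect weights, the burden definition (a simple count of expression outlier-associated rare alleles), and the name `burden_weight` are all hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) using per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def combined_score(common_dosages, common_weights, rare_outlier_alleles, burden_weight):
    """Common-variant PRS plus a weighted rare-variant burden (hypothetical form)."""
    prs = polygenic_score(common_dosages, common_weights)
    # Burden here is just the number of outlier-associated rare alleles carried.
    burden = sum(rare_outlier_alleles)
    return prs + burden_weight * burden


# Toy individual: three common variants and three candidate rare variants.
score = combined_score(
    common_dosages=[0, 1, 2],
    common_weights=[0.10, -0.05, 0.20],
    rare_outlier_alleles=[1, 0, 1],
    burden_weight=0.5,
)
print(round(score, 2))
```

Keeping the burden as a separate term makes it straightforward to ask how much it adds once the common-variant score is controlled for, which is the comparison the abstract describes.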
Subject(s)
Multifactorial Inheritance , Obesity , Body Mass Index , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Obesity/genetics , Phenotype , Risk Factors
ABSTRACT
Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
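To make the allelic-imbalance measurement concrete: if NMD degrades transcripts carrying the PTV, fewer RNA-seq reads should carry the variant allele than the reference allele at a heterozygous site. The sketch below computes the variant-allele read fraction and an exact two-sided binomial p-value. It assumes a null fraction of 0.5 and uses made-up read counts; the study's actual NMD efficiency metric and model may differ.

```python
from math import comb


def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site that carry the variant allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def binomial_two_sided_p(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed variant-allele read count."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[alt_reads]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Hypothetical site: 30 reference reads, 10 reads carrying the PTV.
fraction = alt_allele_fraction(30, 10)
p_value = binomial_two_sided_p(30, 10)
print(fraction, p_value)
```

A fraction well below 0.5 with a small p-value is consistent with NMD acting on the variant transcript, while a fraction near 0.5 suggests escape from NMD.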
Subject(s)
Codon, Nonsense/genetics , Gene Expression Regulation , Genetic Diseases, Inborn/pathology , Genetic Variation , Mutation , Nonsense Mediated mRNA Decay , RNA, Messenger/genetics , Gene Frequency , Genetic Diseases, Inborn/genetics , Humans
ABSTRACT
Recombinant adeno-associated virus (rAAV) vectors have the unique ability to promote targeted integration of transgenes via homologous recombination at specified genomic sites, reaching frequencies of 0.1%-1%. We studied genomic parameters that influence targeting efficiencies on a large scale. To do this, we generated more than 1,000 engineered, doxycycline-inducible target sites in the human HAP1 cell line and infected this polyclonal population with a library of AAV-DJ targeting vectors, with each carrying a unique barcode. The heterogeneity of barcode integration at each target site provided an assessment of targeting efficiency at that locus. We compared targeting efficiency with and without target site transcription for identical chromosomal positions. Targeting efficiency was enhanced by target site transcription, while chromatin accessibility was associated with an increased likelihood of targeting. ChromHMM chromatin states characterizing transcription and enhancers in wild-type K562 cells were also associated with increased AAV-HR efficiency with and without target site transcription, respectively. Furthermore, the amenability of a site to targeting was influenced by the endogenous transcriptional level of intersecting genes. These results define important parameters that may not only assist in designing optimal targeting vectors for genome editing, but also provide new insights into the mechanism of AAV-mediated homologous recombination.
Subject(s)
Chromatin/genetics , Dependovirus/genetics , Gene Targeting/methods , Gene Transfer Techniques/statistics & numerical data , Genetic Vectors/genetics , Homologous Recombination , Transgenes , Genetic Vectors/administration & dosage , Humans , K562 Cells
ABSTRACT
BACKGROUND: Maternal-fetal medicine is a rapidly growing field requiring collaboration from many subspecialties. We provide an evidence-based estimate of capacity needs for our clinic, as well as demonstrate how simulation can aid in capacity planning in similar environments. METHODS: A Discrete Event Simulation of the Center for Fetal Diagnosis and Treatment and Special Delivery Unit at The Children's Hospital of Philadelphia was designed and validated. This model was then used to determine the time until demand overwhelms inpatient bed availability under increasing capacity. FINDINGS: No significant deviation was found between historical inpatient censuses and simulated censuses for the validation phase (p = 0.889). Prospectively increasing capacity was found to delay time to balk (the inability of the center to provide bed space for a patient in need of admission). With current capacity, the model predicts mean time to balk of 276 days. Adding three beds delays mean time to first balk to 762 days; an additional six beds to 1,335 days. CONCLUSION: Providing sufficient access is a patient safety issue, and good planning is crucial for targeting infrastructure investments appropriately. Computer-simulated analysis can provide an evidence base for both medical and administrative decision making in a complex clinical environment.
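The time-to-balk measure can be illustrated with a very small discrete event simulation. The sketch below is not the validated hospital model: it assumes Poisson arrivals at a constant rate and exponentially distributed lengths of stay, and the arrival rate, mean stay, and bed counts are hypothetical values chosen only to show the mechanics.

```python
import heapq
import random


def time_to_first_balk(beds, arrival_rate, mean_stay, rng, horizon=10_000.0):
    """Simulated day of the first balk (an arrival that finds every bed full),
    or None if no balk occurs before the horizon."""
    discharges = []  # min-heap of discharge times, one entry per occupied bed
    now = 0.0
    while True:
        now += rng.expovariate(arrival_rate)  # time of the next arrival
        if now > horizon:
            return None
        # Free the beds of patients discharged before this arrival.
        while discharges and discharges[0] <= now:
            heapq.heappop(discharges)
        if len(discharges) >= beds:
            return now  # no bed available: this arrival balks
        heapq.heappush(discharges, now + rng.expovariate(1.0 / mean_stay))


def mean_time_to_balk(beds, arrival_rate, mean_stay, runs=200, seed=0):
    """Average time to first balk over repeated runs with a fixed seed."""
    rng = random.Random(seed)
    times = [time_to_first_balk(beds, arrival_rate, mean_stay, rng) for _ in range(runs)]
    observed = [t for t in times if t is not None]
    return sum(observed) / len(observed) if observed else None


# Hypothetical unit: one admission per day on average, four-day mean stay.
for beds in (6, 9):
    print(beds, mean_time_to_balk(beds, arrival_rate=1.0, mean_stay=4.0))
```

Rerunning the same function with more beds shows the qualitative effect the abstract reports: added capacity pushes the expected first balk further into the future. A model closer to the study would also need demand that grows over time and empirically fitted arrival and stay distributions.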
Subject(s)
Computer Simulation , Hospital Bed Capacity/statistics & numerical data , Models, Statistical , Delivery of Health Care , Humans
ABSTRACT
Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.
Subject(s)
Genetic Variation , Induced Pluripotent Stem Cells/physiology , Quantitative Trait Loci , Bardet-Biedl Syndrome/genetics , Calcium Channels/genetics , Cell Line , Cerebellar Ataxia/genetics , DNA Methylation , Gene Expression , Humans , Induced Pluripotent Stem Cells/cytology , Polymorphism, Single Nucleotide , Proteins/genetics , Rare Diseases/genetics , Regulatory Sequences, Nucleic Acid , Sequence Analysis, RNA , Whole Genome Sequencing
ABSTRACT
Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.
Subject(s)
Genetic Variation , Genome, Human , Multifactorial Inheritance , Transcriptome , Humans , Organ Specificity
ABSTRACT
It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene [1]. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches [2-5]. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases [6-8]. This includes muscle biopsies from patients with undiagnosed rare muscle disorders [6,9], and cultured fibroblasts from patients with mitochondrial disorders [7]. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.
Subject(s)
Rare Diseases/genetics , Acid Ceramidase/genetics , Case-Control Studies , Child , Child, Preschool , Cohort Studies , Female , Genetic Variation , Humans , Male , Models, Genetic , Mutation , Oxidoreductases Acting on CH-CH Group Donors/genetics , Potassium Channels/genetics , RNA/blood , RNA/genetics , RNA Splicing/genetics , Rare Diseases/blood , Sequence Analysis, RNA , Exome Sequencing
ABSTRACT
Medical emergency preparedness has been an issue of medical relevance since the advent of hospital care. Studies have simulated emergency department (ED) overcrowding but have not yet characterized the effects of large-scale, planned events that drastically alter a city's demography, such as the 2015 World Meeting of Families in Philadelphia, Pennsylvania. A discrete event simulation of the ED at the Children's Hospital of Philadelphia was designed and validated using past data. The model was used to predict the patient length of stay (LOS) and number of admitted patients if the arrival stream to the ED were to change by 50% from typical arrivals in either direction. We compared the model's estimations with data produced during the papal visit, which had 39.65% fewer patient arrivals. For validation, the simulated mean LOS was 226.1 ± 173.3 minutes (mean ± SD) for all patients and 352.1 ± 170.3 minutes for admitted patients. Real-world mean LOSs for the fiscal year 2014 were 230.6 ± 134.8 minutes for all patients and 345.0 ± 147.7 minutes for admitted patients. For the World Meeting of Families, the simulation estimated the LOS of both patients overall and admitted patients to within 10%. These results show that it is possible to use simulations to project the patient flow effects in EDs in case of large-scale events. Providing efficient care is essential to emergency operations, and projections of demand are crucial for targeting appropriate changes during large-scale events. Analysis of validated computer simulations allows for evidence-based decision making in a complex clinical environment.
ABSTRACT
Macrophages, the primary cell of the innate immune system, act on a spectrum of phenotypes that correspond to diverse functions. Dysregulation of macrophage phenotype is associated with many diseases. In particular, defective transition from pro-inflammatory (M1) to anti-inflammatory (M2) behavior has been implicated as a potential source of sustained inflammation that prevents healing of chronic wounds such as diabetic ulcers. In order to design effective treatments, an understanding of the relative presence of macrophage phenotypes during tissue repair is necessary. Inferring the relative phenotype composition is currently challenging due to the heterogeneous nature of the macrophages themselves and also of tissue samples. We propose here a method to deconvolute gene expression from heterogeneous tissue samples into the composition of two primary macrophage phenotypes (M1 and M2). Our final method uses gene expression signatures for each phenotype cultivated in vitro as input to a predictive model that infers sample composition with an average error of 0.16, and whose predictions fit known compositions prepared in vitro with an R² value of 0.90. Finally, we apply this model to describe macrophage behavior in human diabetic ulcer healing using clinically isolated ulcer tissue samples. The model predicted that non-healing diabetic ulcers contained higher proportions of M1 macrophages compared to healing diabetic ulcers, in agreement with numerous studies that have implicated a dysfunctional M1-to-M2 transition in the impaired healing of diabetic ulcers. These results show proof of concept that the model holds utility in making predictions regarding macrophage behavior in heterogeneous samples, with potential application as a wound healing diagnostic.
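One simple way to picture two-phenotype deconvolution is a linear mixing model, in which each gene's expression in a sample is a weighted average of its M1 and M2 signature values. The sketch below estimates the M1 fraction by ordinary least squares under that assumption. It is an illustration only, not the predictive model described in the abstract, and the signature values are invented.

```python
def estimate_m1_fraction(sample, m1_signature, m2_signature):
    """Least-squares estimate of f in: sample ≈ f * M1 + (1 - f) * M2,
    clipped to the interval [0, 1]."""
    if not (len(sample) == len(m1_signature) == len(m2_signature)):
        raise ValueError("all expression vectors must cover the same genes")
    diffs = [a - b for a, b in zip(m1_signature, m2_signature)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("signatures are identical; the fraction is not identifiable")
    # Project (sample - M2) onto the direction (M1 - M2).
    numer = sum((s - b) * d for s, b, d in zip(sample, m2_signature, diffs))
    return min(1.0, max(0.0, numer / denom))


# Invented three-gene signatures and a sample mixed at 70% M1, 30% M2.
m1 = [10.0, 2.0, 8.0]
m2 = [1.0, 9.0, 3.0]
mixed = [0.7 * a + 0.3 * b for a, b in zip(m1, m2)]
estimate = estimate_m1_fraction(mixed, m1, m2)
print(round(estimate, 3))
```

Real tissue samples also contain other cell types and measurement noise, so a practical model would need signature genes chosen to be specific to macrophage phenotype and some treatment of residual error.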
Subject(s)
Cell Culture Techniques , Gene Expression Regulation , Macrophages/cytology , Wound Healing , Aged , Diabetes Complications/pathology , Female , Gene Expression Profiling , Humans , Immunity, Innate , Inflammation , Macrophages/metabolism , Male , Middle Aged , Phenotype , Regression Analysis , Sequence Analysis, RNA , Ulcer/pathology
ABSTRACT
Alternatively activated "M2" macrophages are believed to function during late stages of wound healing, behaving in an anti-inflammatory manner to mediate the resolution of the pro-inflammatory response caused by "M1" macrophages. However, the differences between two main subtypes of M2 macrophages, namely interleukin-4 (IL-4)-stimulated "M2a" macrophages and IL-10-stimulated "M2c" macrophages, are not well understood. M2a macrophages are characterized by their ability to inhibit inflammation and contribute to the stabilization of angiogenesis. However, the role and temporal profile of M2c macrophages in wound healing are not known. Therefore, we performed next generation sequencing (RNA-seq) to identify biological functions and gene expression signatures of macrophages polarized in vitro with IL-10 to the M2c phenotype in comparison to M1 and M2a macrophages and an unactivated control (M0). We then explored the expression of these gene signatures in a publicly available data set of human wound healing. RNA-seq analysis showed that hundreds of genes were upregulated in M2c macrophages compared to the M0 control, with thousands of alternative splicing events. Following validation by Nanostring, 39 genes were found to be upregulated by M2c macrophages compared to the M0 control, and 17 genes were significantly upregulated relative to the M0, M1, and M2a phenotypes (using an adjusted p-value cutoff of 0.05 and fold change cutoff of 1.5). Many of the identified M2c-specific genes are associated with angiogenesis, matrix remodeling, and phagocytosis, including CD163, MMP8, TIMP1, VCAN, SERPINA1, MARCO, PLOD2, PCOLCE2 and F5. Analysis of the macrophage-conditioned media for secretion of matrix-remodeling proteins showed that M2c macrophages secreted higher levels of MMP7, MMP8, and TIMP1 compared to the other phenotypes.
Interestingly, temporal gene expression analysis of a publicly available microarray data set of human wound healing showed that M2c-related genes were upregulated at early times after injury, similar to M1-related genes, while M2a-related genes appeared at later stages or were downregulated after injury. While further studies are required to confirm the timing and role of M2c macrophages in vivo, these results suggest that M2c macrophages may function at early stages of wound healing. Identification of markers of the M2c phenotype will allow more detailed investigations into the role of M2c macrophages in vivo.