ABSTRACT
Sarcoidosis is a complex systemic disease. Our study aimed to (1) identify novel alleles associated with sarcoidosis susceptibility; (2) provide an in-depth evaluation of HLA alleles and sarcoidosis susceptibility; and (3) integrate genetic and transcriptional data to identify risk loci that may more directly affect disease pathogenesis. We report a genome-wide association study of 1,335 sarcoidosis cases and 1,264 controls of European descent (EA) and investigate the associated alleles in African Americans (AA: 1,487 cases and 1,504 controls). The EA and AA cohorts were recruited from multiple United States sites. HLA alleles were imputed and tested for association with sarcoidosis susceptibility. Expression quantitative trait locus and colocalization analyses were performed in a subset of subjects with transcriptome data. Forty-nine SNPs in the HLA region, in the HLA-DRA, -DRB9, -DRB5 and -DQA1 and BRD2 genes, were significantly associated with sarcoidosis susceptibility in EA; rs3129888 was also a risk variant for sarcoidosis in AA. The classical HLA alleles DRB1*0101, DQA1*0101 and DQB1*0501, which are highly correlated, were also associated with sarcoidosis. rs3135287 near HLA-DRA was associated with HLA-DRA expression in peripheral blood mononuclear cells and bronchoalveolar lavage from study subjects, and in lung tissue and whole blood from GTEx. We identified six novel SNPs (of the seven SNPs representing the 49 significant SNPs) and nine HLA alleles associated with sarcoidosis susceptibility in the largest EA population. We also replicated our findings in an AA population. Our study reiterates the potential role in sarcoidosis pathogenesis of HLA class II genes involved in antigen recognition and/or presentation.
Subject(s)
Genome-Wide Association Study; Sarcoidosis; Humans; Genetic Predisposition to Disease; HLA-DR alpha-Chains/genetics; Leukocytes, Mononuclear; Sarcoidosis/genetics; HLA-DRB1 Chains/genetics; Alleles
ABSTRACT
Rationale: Relatives of patients with familial interstitial pneumonia (FIP) are at increased risk for pulmonary fibrosis and develop preclinical pulmonary fibrosis (PrePF). Objectives: We defined the incidence and progression of new-onset PrePF and its relationship to survival among first-degree relatives in families with FIP. Methods: In this cohort study, family members of patients with FIP were initially screened with a health questionnaire and a chest high-resolution computed tomography (HRCT) scan, and the evaluation was repeated approximately 4 years later. A total of 493 asymptomatic first-degree relatives of patients with FIP were evaluated at baseline, and 296 (60%) of the original subjects participated in the subsequent evaluation. Measurements and Main Results: The median interval between HRCTs was 3.9 years (interquartile range, 3.5-4.4 yr). Of the 252 subjects who agreed to repeat evaluation and had been determined not to have PrePF at baseline, 16 developed PrePF. A conservative estimate of the annual incidence of PrePF is 1,023 per 100,000 person-years (95% confidence interval, 511-1,831 per 100,000 person-years). Of 44 subjects with PrePF at baseline, 38.4% had worsening dyspnea, compared with 15.4% of those without PrePF (P = 0.002). Usual interstitial pneumonia on HRCT (P < 0.0002) and the baseline quantitative fibrosis score (P < 0.001) were also associated with worsening dyspnea. PrePF at the initial screen was associated with decreased survival (P < 0.001). Conclusions: The incidence of PrePF in this at-risk population is at least 100-fold higher than that reported for sporadic idiopathic pulmonary fibrosis (IPF). Although PrePF and IPF represent distinct entities, our study demonstrates that PrePF, like IPF, is progressive and associated with decreased survival.
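The incidence above is a rate per 100,000 person-years with a 95% confidence interval. The abstract does not give the person-time denominator behind the "conservative" estimate, and it does not name the interval method. The sketch below therefore assumes an exact (Garwood) Poisson interval and is not expected to reproduce 1,023 (511-1,831) exactly. The event count of 16 comes from the abstract. The person-time value is hypothetical.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term (fine for modest k and mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def _bisect(g, target: float, lo: float, hi: float, iters: int = 200) -> float:
    """Solve g(mu) = target for a monotone g on [lo, hi] by bisection."""
    increasing = g(hi) > g(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2

def exact_poisson_ci(events: int, alpha: float = 0.05):
    """Exact (Garwood) confidence limits for an observed Poisson count."""
    hi = events + 10 * math.sqrt(events) + 10  # generous upper search bound
    lower = 0.0 if events == 0 else _bisect(
        lambda mu: 1 - poisson_cdf(events - 1, mu), alpha / 2, 0.0, hi)
    upper = _bisect(lambda mu: poisson_cdf(events, mu), alpha / 2, 0.0, hi)
    return lower, upper

def incidence_per_100k(events: int, person_years: float, alpha: float = 0.05):
    """Incidence rate and its exact CI, both per 100,000 person-years."""
    lo, hi = exact_poisson_ci(events, alpha)
    scale = 100_000 / person_years
    return events * scale, lo * scale, hi * scale

# 16 incident PrePF cases (from the abstract); the person-time here is hypothetical.
rate, lo, hi = incidence_per_100k(16, person_years=1_000.0)
```

Only the denominator changes the rate. The width of the interval is driven by the small number of events, which is why 16 cases give a wide interval.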
Subject(s)
Idiopathic Pulmonary Fibrosis; Lung Diseases, Interstitial; Humans; Cohort Studies; Incidence; Dyspnea; Lung; Retrospective Studies
ABSTRACT
Rationale: In addition to rare genetic variants and the MUC5B locus, common genetic variants contribute to idiopathic pulmonary fibrosis (IPF) risk. The predictive power of common variants outside the MUC5B locus for IPF and interstitial lung abnormalities (ILAs) is unknown. Objectives: We tested the predictive value of IPF polygenic risk scores (PRSs) with and without the MUC5B region for IPF, ILA, and ILA progression. Methods: Using an IPF genome-wide association study, we developed PRSs that included (PRS-M5B) and excluded (PRS-NO-M5B) the MUC5B region (a 500-kb window around rs35705950-T). We assessed PRS associations and area under the receiver operating characteristic curve (AUC) metrics for IPF, ILA, and ILA progression. Measurements and Main Results: We included 14,650 participants (1,970 IPF; 1,068 ILA) from six multi-ancestry population-based and case-control cohorts. In cases excluded from the genome-wide association study, the PRS-M5B (odds ratio [OR] per SD of the score, 3.1; P = 7.1 × 10⁻⁹⁵) and PRS-NO-M5B (OR per SD, 2.8; P = 2.5 × 10⁻⁸⁷) were associated with IPF. Participants in the top PRS-NO-M5B quintile had approximately sevenfold higher odds of IPF than those in the first quintile. A clinical model predicted IPF (AUC, 0.61); rs35705950-T and PRS-NO-M5B demonstrated higher AUCs (0.73 and 0.70, respectively), and adding both genetic predictors to the clinical model yielded the highest performance (AUC, 0.81). The PRS-NO-M5B was associated with ILA (OR, 1.25) and ILA progression (OR, 1.16) in European ancestry participants. Conclusions: A common genetic variant risk score complements the MUC5B variant to identify individuals at high risk of interstitial lung abnormalities and pulmonary fibrosis.
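The PRS-M5B / PRS-NO-M5B contrast can be sketched as follows. A score is a weighted sum of allele dosages, with one version dropping every variant inside a window around the MUC5B variant. Effects are then expressed per SD of the score, and discrimination is measured with a rank-based AUC. This is a minimal illustration under several assumptions:

* The "500-kb window" is read as ±250 kb around the lead variant.
* The positions, weights and dosages are hypothetical.
* LD clumping and weight shrinkage, which real pipelines (for example PLINK's `--score`) typically combine with scoring, are not shown.

```python
import statistics
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int      # base-pair position (same genome build as the window centre)
    beta: float   # per-allele log-odds weight from a discovery GWAS

def in_window(v: Variant, chrom: str, centre: int, half_width: int) -> bool:
    """True if v lies within +/- half_width bp of centre on chrom."""
    return v.chrom == chrom and abs(v.pos - centre) <= half_width

def polygenic_score(dosages, variants, exclude=None):
    """Weighted sum of allele dosages (0-2) per individual.

    exclude: optional (chrom, centre, half_width); variants inside that window
    are dropped, mimicking a "score without the MUC5B region".
    """
    keep = [i for i, v in enumerate(variants)
            if exclude is None or not in_window(v, *exclude)]
    return [sum(row[i] * variants[i].beta for i in keep) for row in dosages]

def per_sd(scores):
    """Standardise so that effects can be reported per SD of the score."""
    mean, sd = statistics.fmean(scores), statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

def auc(scores, labels):
    """Rank-based AUC: P(random case outscores random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical toy data: the first variant sits in the excluded window.
variants = [Variant("11", 1_000_000, 1.0), Variant("11", 2_000_000, 0.5),
            Variant("2", 500, 0.2)]
dosages = [[2, 0, 1], [0, 1, 0], [1, 2, 2], [0, 0, 0]]
labels = [1, 0, 1, 0]
with_region = polygenic_score(dosages, variants)
without_region = polygenic_score(dosages, variants, exclude=("11", 1_000_000, 250_000))
```

In this toy example the large-effect variant in the window carries most of the discrimination. Dropping it lowers the AUC from 1.0 to 0.75, a small-scale analogue of comparing PRS-M5B with PRS-NO-M5B.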
Subject(s)
Genome-Wide Association Study; Idiopathic Pulmonary Fibrosis; Humans; Idiopathic Pulmonary Fibrosis/genetics; Risk Factors; Lung; Mucin-5B/genetics; Genetic Predisposition to Disease
ABSTRACT
It is now common to have a moderate to large number of features measured on individuals with complex diseases. Unsupervised analyses, such as clustering with and without preprocessing by principal component analysis (PCA), are widely used in practice to uncover subgroups in a sample. However, in many modern studies, features are highly correlated and noisy (e.g., SNPs, -omics, quantitative imaging markers, and electronic health record data). The practical performance of clustering approaches in these settings remains unclear. Through extensive simulations and empirical examples applying Gaussian mixture models and related clustering methods, we show that these approaches (including variants of k-means, VarSelLCM, HDClassifier, and Fisher-EM) can perform very poorly in many settings. We also show that the poor performance is often driven by an explicit or implicit assumption of the clustering algorithm that high-variance features are relevant while lower-variance features are irrelevant, which we call the variance-as-relevance assumption. We develop practical preprocessing approaches that improve performance in some cases. This work offers practical guidance on the strengths and limitations of unsupervised clustering approaches in modern data analysis applications.
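A toy illustration of the variance-as-relevance problem follows. The true groups differ only in a low-variance feature, while a high-variance feature is pure noise. Plain k-means minimizes total within-cluster sum of squares, so on the raw data it splits along the noisy feature. Here, standardizing each feature to unit variance is enough to recover the groups. This is a generic preprocessing step chosen for the sketch, not necessarily one of the approaches developed in the study. It is also no universal fix, since standardizing can just as easily inflate uninformative low-variance features. The simulation settings are hypothetical.

```python
import numpy as np

def kmeans(X, k=2, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with several random starts; returns the best labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Re-assign against the final centres before scoring this start.
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels

def simulate(n=400, seed=1):
    """Groups differ only in a LOW-variance feature; a HIGH-variance feature is noise."""
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, 2, size=n)
    informative = np.where(truth == 1, 1.0, -1.0) + rng.normal(0.0, 0.3, n)
    noise = rng.normal(0.0, 10.0, n)
    return np.column_stack([informative, noise]), truth

def agreement(labels, truth):
    """Fraction correctly grouped, allowing for label switching (k = 2)."""
    a = np.mean(labels == truth)
    return max(a, 1 - a)

X, truth = simulate()
raw = agreement(kmeans(X), truth)
scaled = agreement(kmeans((X - X.mean(axis=0)) / X.std(axis=0)), truth)
```

Agreement on the raw data stays near chance, while the standardized run recovers the simulated groups almost perfectly.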
ABSTRACT
Idiopathic pulmonary fibrosis (IPF) is an incurable complex genetic disorder that is associated with sequence changes in 7 genes (MUC5B, TERT, TERC, RTEL1, PARN, SFTPC, and SFTPA2) and with variants in at least 11 novel loci. We have previously found that 1) a common gain-of-function promoter variant in MUC5B rs35705950 is the strongest risk factor (genetic and otherwise), accounting for 30-35% of the risk of developing IPF, a disease that was previously considered idiopathic; 2) the MUC5B promoter variant can potentially be used to identify individuals with preclinical pulmonary fibrosis and is predictive of radiologic progression of preclinical pulmonary fibrosis; and 3) MUC5B may be involved in the pathogenesis of pulmonary fibrosis with MUC5B message and protein expressed in bronchiolo-alveolar epithelia of IPF and the characteristic IPF honeycomb cysts. Based on these considerations, we hypothesize that excessive production of MUC5B either enhances injury due to reduced mucociliary clearance or impedes repair consequent to disruption of normal regenerative mechanisms in the distal lung. In aggregate, these novel considerations should have broad impact, resulting in specific etiologic targets, early detection of disease, and novel biologic pathways for use in the design of future intervention, prevention, and mechanistic studies of IPF.
Subject(s)
Bronchioles/physiopathology; Idiopathic Pulmonary Fibrosis/genetics; Mucin-5B/genetics; Mucociliary Clearance/genetics; Pulmonary Alveoli/physiopathology; Animals; Genetic Predisposition to Disease; Humans; Idiopathic Pulmonary Fibrosis/physiopathology; Respiratory Mucosa/physiopathology
ABSTRACT
Aggregate tests of rare variants are often used to identify associated regions, rather than testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are "driving" the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33), and both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT with comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study in idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight variants in TERT and seven in FAM13A. In summary, the vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
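The "Accuracy" importance measures above rest on permutation importance, also called mean decrease in accuracy. A variant's importance is how much a fitted classifier's accuracy drops when that variant's column is randomly shuffled. The sketch below is model-agnostic: `predict` stands in for any fitted classifier, such as a random forest. It assumes a single held-out dataset rather than Breiman's per-tree out-of-bag version, and it does not implement the vi-RF weighting or the RIFT package. scikit-learn offers a comparable routine as `sklearn.inspection.permutation_importance`. The genotype data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[int]], int]

def accuracy(predict: Predictor, X, y) -> float:
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict: Predictor, X, y,
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each column of X is randomly permuted."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the outcome
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical carrier indicators: column 0 drives the outcome, column 1 is inert.
X = [[1, 0], [0, 1], [1, 1], [0, 0]] * 10
y = [row[0] for row in X]
importances = permutation_importance(lambda row: row[0], X, y)
```

Shuffling the inert column never changes a prediction, so its importance is exactly zero. Shuffling the causal column drops accuracy to roughly chance.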
Subject(s)
Idiopathic Pulmonary Fibrosis; Random Forest; Humans; Gene Frequency; Sequence Analysis, DNA; GTPase-Activating Proteins
ABSTRACT
BACKGROUND: Idiopathic pulmonary fibrosis (IPF) is a heterogeneous disease that is pathologically characterized by areas of normal-appearing lung parenchyma, active fibrosis (transition zones including fibroblastic foci) and dense fibrosis. Defining transcriptional differences between these pathologically heterogeneous regions of the IPF lung is critical to understanding the distribution and extent of fibrotic lung disease and identifying potential therapeutic targets. Application of a spatial transcriptomics platform would provide more detailed spatial resolution of transcriptional signals compared to previous single cell or bulk RNA-Seq studies. METHODS: We performed spatial transcriptomics using GeoMx Nanostring Digital Spatial Profiling on formalin-fixed paraffin-embedded (FFPE) tissue from 32 IPF and 12 control subjects and identified 231 regions of interest (ROIs). We compared normal-appearing lung parenchyma and airways between IPF and controls with histologically normal lung tissue, as well as histologically distinct regions within IPF (normal-appearing lung parenchyma, transition zones containing fibroblastic foci, areas of dense fibrosis, and honeycomb epithelium metaplasia). RESULTS: We identified 254 differentially expressed genes (DEGs) between IPF and controls in histologically normal-appearing regions of lung parenchyma; pathway analysis identified disease processes such as EIF2 signaling (important for cap-dependent mRNA translation), epithelial adherens junction signaling, HIF1α signaling, and integrin signaling. Within IPF, we identified 173 DEGs between transition and normal-appearing lung parenchyma and 198 DEGs between dense fibrosis and normal lung parenchyma; pathways dysregulated in both transition and dense fibrotic areas include EIF2 signaling pathway activation (upstream of endoplasmic reticulum (ER) stress proteins ATF4 and CHOP) and wound healing signaling pathway deactivation. 
Through cell deconvolution of transcriptome data and immunofluorescence staining, we confirmed loss of alveolar parenchymal signals (AGER, SFTPB, SFTPC), gain of secretory cell markers (SCGB3A2, MUC5B) as well as dysregulation of the upstream regulator ATF4, in histologically normal-appearing tissue in IPF. CONCLUSIONS: Our findings demonstrate that histologically normal-appearing regions from the IPF lung are transcriptionally distinct when compared to similar lung tissue from controls with histologically normal lung tissue, and that transition zones and areas of dense fibrosis within the IPF lung demonstrate activation of ER stress and deactivation of wound healing pathways.
Subject(s)
Eukaryotic Initiation Factor-2; Idiopathic Pulmonary Fibrosis; Humans; Eukaryotic Initiation Factor-2/genetics; Eukaryotic Initiation Factor-2/metabolism; Idiopathic Pulmonary Fibrosis/metabolism; Lung/metabolism; Transcriptome; Fibrosis
ABSTRACT
Rationale: Common genetic variants have been associated with idiopathic pulmonary fibrosis (IPF). Objectives: To determine the functional relevance of the 10 IPF-associated common genetic variants we previously identified. Methods: We performed expression quantitative trait loci (eQTL) and methylation quantitative trait loci (mQTL) mapping, followed by co-localization of eQTL and mQTL with genetic association signals and functional validation by luciferase reporter assays. Illumina multi-ethnic genotyping arrays, mRNA sequencing, and Illumina 850k methylation arrays were used to profile lung tissue from participants with IPF (234 RNA and 345 DNA samples) and non-diseased controls (188 RNA and 202 DNA samples). Measurements and Main Results: Focusing on genetic variants within 10 IPF-associated genetic loci, we identified 27 eQTLs in controls and 24 eQTLs in cases (false-discovery-rate-adjusted P < 0.05). Among these signals, we identified associations of the lead variant rs35705950 with expression of MUC5B and of rs2076295 with expression of DSP in both cases and controls. mQTL analysis identified CpGs in the gene bodies of MUC5B (cg17589883) and DSP (cg08964675) associated with the lead variants at these two loci. We also demonstrated strong co-localization of eQTL/mQTL and genetic signals at MUC5B (rs35705950) and DSP (rs2076295). Functional validation of the mQTL in MUC5B using luciferase reporter assays demonstrated that the CpG resides within a putative internal repressor element. Conclusions: We have established a relationship of the common IPF genetic risk variants rs35705950 and rs2076295 with respective changes in MUC5B and DSP expression and methylation. These results provide additional evidence that both MUC5B and DSP are involved in the etiology of IPF.
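The eQTL counts above use a false-discovery-rate-adjusted P < 0.05. The abstract does not name the FDR procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up adjustment and applies it to hypothetical p-values from a handful of variant-gene tests.

```python
from typing import List, Sequence

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        # p * m / rank, made monotone by carrying the minimum downward
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical eQTL p-values for four variant-gene pairs.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [qi < 0.05 for qi in q]
```

Carrying the running minimum down from the largest p-value keeps the adjusted values monotone in the original ranking. A pair is then called significant when its adjusted value falls below the 0.05 threshold.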
Subject(s)
Idiopathic Pulmonary Fibrosis; Humans; DNA; DNA Methylation/genetics; Gene Expression; Genetic Predisposition to Disease/genetics; Idiopathic Pulmonary Fibrosis/genetics; Mucin-5B/genetics; Quantitative Trait Loci/genetics; RNA
ABSTRACT
BACKGROUND: Refractory asthma (RA) remains poorly controlled, resulting in high health care utilization despite guideline-based therapies. Patients with RA manifest greater neutrophilia as a result of increased airway inflammation and subclinical infection, the underlying mechanisms of which remain unclear. OBJECTIVE: We sought to characterize gene expression differences between refractory and nonrefractory (NR) asthma, and their clinical correlates, to uncover molecular mechanisms driving group distinctions. METHODS: Microarray gene expression of paired airway epithelial brush and endobronchial biopsy samples was compared between 60 RA and 30 NR subjects. Subjects were hierarchically clustered to identify subgroups of RA, and biochemical and clinical traits (airway inflammatory molecules, respiratory pathogens, chest imaging) were compared between groups. Weighted gene correlation network analysis was used to identify coexpressed gene modules. Module expression scores were compared between groups using linear regression, controlling for age, sex, and body mass index. RESULTS: Differential gene expression analysis showed upregulation of proneutrophilic genes/pathways and downregulation of ciliary function genes/pathways in RA compared with NR. A subgroup of RA with downregulated ciliary gene expression had more subclinical infections, greater airway neutrophilia and eosinophilia, and a higher mucus burden on chest imaging than other RA subjects, mirroring the dominant differences between RA and NR. Weighted gene correlation network analysis identified gene modules related to ciliary function that were downregulated in RA and were associated with lower pulmonary function and higher airway wall thickness/inflammation, markers of poorer asthma control.
CONCLUSIONS: Identification of a novel ciliary-deficient subgroup of RA suggests that diminished mucociliary clearance may underlie repeated asthma exacerbations despite adequate treatment, necessitating further exploration of function, mechanism, and therapeutics.
Subject(s)
Asthma; Asthma/metabolism; Biomarkers; Bronchoscopy; Humans; Inflammation/metabolism; Lung/pathology; Mucociliary Clearance
ABSTRACT
Chronic beryllium disease (CBD) is a Th1 granulomatous lung disease preceded by sensitization to beryllium (BeS). We profiled the methylome, transcriptome, and selected proteins in the lung to identify molecular signatures and networks associated with BeS and CBD. BAL cell DNA and RNA were profiled using microarrays from CBD (n = 30), BeS (n = 30), and control subjects (n = 12). BAL fluid proteins were measured using Olink Immune Response Panel proteins from CBD (n = 22) and BeS (n = 22) subjects. Linear models identified features associated with CBD, adjusting for covariation and batch effects. Multiomic integration methods identified correlated features between datasets. We identified 1,546 differentially expressed genes in CBD versus control subjects and 204 in CBD versus BeS. Of the 101 shared transcripts, 24 have significant cis relationships between gene expression and DNA methylation, assessed using expression quantitative trait methylation analysis, including genes not previously identified in CBD. A multiomic model of top DNA methylation and gene expression features demonstrated that the first component separated CBD from other samples and the second component separated control subjects from remaining samples. The top features on component one were enriched for T-lymphocyte function, and the top features on component two were enriched for innate immune signaling. We identified six differentially abundant proteins in CBD versus BeS, with two (SIT1 and SH2D1A) selected as important RNA features in the multiomic model. Our integrated analysis of DNA methylation, gene expression, and proteins in the lung identified multiomic signatures of CBD that differentiated it from BeS and control subjects.
Subject(s)
Berylliosis; Humans; Berylliosis/genetics; T-Lymphocytes; Bronchoalveolar Lavage; Bronchoalveolar Lavage Fluid; Immunity, Innate/genetics; RNA; Chronic Disease
ABSTRACT
BACKGROUND: Most phenotyping paradigms in sarcoidosis are based on expert opinion; however, no paradigm has been widely adopted because of the subjectivity in classification. We hypothesized that cluster analysis could be performed on common clinical variables to define more objective sarcoidosis phenotypes. METHODS: We performed a retrospective cohort study of 554 sarcoidosis cases to identify distinct phenotypes of sarcoidosis based on 29 clinical features. Model-based clustering was performed using the VarSelLCM R package, and the Integrated Completed Likelihood (ICL) criterion was used to estimate the number of clusters. To identify features associated with cluster membership, features were ranked by variable importance scores from the VarSelLCM model, and additional univariate tests (Fisher's exact test and one-way ANOVA) were performed, with q-values used to correct for multiple testing. The Wasfi severity score was also compared between clusters. RESULTS: Cluster analysis resulted in 6 sarcoidosis phenotypes. Salient characteristics of each cluster are as follows: phenotype (1) supranormal lung function and majority Scadding stage 2/3; phenotype (2) supranormal lung function and majority Scadding stage 0/1; phenotype (3) normal lung function and Scadding stages split between 0/1 and 2/3; phenotype (4) obstructive lung function and majority Scadding stage 2/3; phenotype (5) restrictive lung function and majority Scadding stage 2/3; phenotype (6) mixed obstructive and restrictive lung function and mostly Scadding stage 4. Although the percentages differed, every phenotype encompassed all Scadding stages, except phenotype 1, which included no Scadding stage 4 subjects. Clusters 4, 5 and 6 were significantly more likely to have ever received immunosuppressive treatment and had higher Wasfi disease severity scores. CONCLUSIONS: Cluster analysis produced 6 sarcoidosis phenotypes that separated less severe from severe disease.
Phenotypes 1, 2 and 3 had fewer lung function abnormalities, a lower percentage on immunosuppressive treatment and lower Wasfi severity scores. Phenotypes 4, 5 and 6 were characterized by lung function abnormalities, more parenchymal abnormalities, a higher percentage on immunosuppressive treatment and higher Wasfi severity scores. These data support cluster analysis as an objective and clinically useful way to phenotype sarcoidosis subjects and to help clinicians distinguish patients with more severe disease from those with less severe disease, independent of Scadding stage.
Subject(s)
Sarcoidosis; Cluster Analysis; Humans; Phenotype; Retrospective Studies; Sarcoidosis/diagnosis; Sarcoidosis/epidemiology; Sarcoidosis/genetics; Severity of Illness Index
ABSTRACT
BACKGROUND: As the cost of RNA-sequencing decreases, complex study designs, including paired, longitudinal, and other correlated designs, become increasingly feasible. These studies often involve multiple hypotheses, so multiple-degree-of-freedom tests, which evaluate several hypotheses jointly, are useful for filtering the gene list to a set of interesting features for further exploration while controlling the false discovery rate. Although several methods have been proposed for analyzing correlated RNA-sequencing data, little research has evaluated and compared the performance of multiple-degree-of-freedom tests across methods. METHODS: We evaluated 11 methods for modelling correlated RNA-sequencing data in a simulation study comparing the false discovery rate, power, and model convergence rate across several hypothesis tests and sample size scenarios. We also applied each method to a real longitudinal RNA-sequencing dataset. RESULTS: Linear mixed modelling of transformed data had the best false discovery rate control while maintaining relatively high power. However, this method had a high rate of model non-convergence, particularly at small sample sizes. No method had high power at the lowest sample size. The other methods showed a mix of conservative and anti-conservative behavior, which depended on the sample size and the hypothesis being evaluated. The patterns observed in the simulation study were largely replicated in the analysis of a longitudinal study of intensive care unit patients experiencing cardiogenic or septic shock. CONCLUSIONS: Multiple-degree-of-freedom testing is a valuable tool in longitudinal and other correlated RNA-sequencing experiments. Of the methods we investigated, linear mixed modelling had the best overall combination of power and false discovery rate control. Other methods may also be appropriate in some scenarios.
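A multiple-degree-of-freedom test asks whether several coefficients are jointly zero, for example whether expression changes at any of several follow-up times. For a linear mixed model on transformed counts, one generic way to write this is as a Wald test of a q-row contrast. This is a sketch under that assumption: the evaluated methods may instead use likelihood-ratio or F tests with approximate denominator degrees of freedom. The per-gene p-values would then go into FDR control.

```latex
% Linear mixed model for gene g, subject i, observation j (e.g. time point),
% on a transformed expression scale, with a subject-level random intercept
y_{gij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_g + b_{gi} + \varepsilon_{gij},
\qquad b_{gi} \sim \mathcal{N}(0, \sigma^{2}_{b,g}), \quad
\varepsilon_{gij} \sim \mathcal{N}(0, \sigma^{2}_{g})

% q hypotheses evaluated jointly through a q x p contrast matrix L
H_0 : \mathbf{L}\boldsymbol{\beta}_g = \mathbf{0}, \qquad \mathbf{L} \in \mathbb{R}^{q \times p}

% Wald statistic, approximately chi-square with q degrees of freedom under H_0
W_g = (\mathbf{L}\hat{\boldsymbol{\beta}}_g)^{\top}
      \left(\mathbf{L}\,\widehat{\operatorname{Var}}(\hat{\boldsymbol{\beta}}_g)\,\mathbf{L}^{\top}\right)^{-1}
      (\mathbf{L}\hat{\boldsymbol{\beta}}_g)
      \;\dot{\sim}\; \chi^{2}_{q}
```

With three time points and baseline as reference, for instance, L has q = 2 rows picking out the two time-effect coefficients.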
Subject(s)
RNA; Research Design; Humans; Longitudinal Studies; RNA/genetics; Sample Size; Sequence Analysis, RNA/methods
ABSTRACT
OBJECTIVES: Human leukocyte antigen-DP beta 1 (HLA-DPB1) with a glutamic acid at the 69th position of the β chain (E69) genotype and inhalational beryllium exposure individually contribute to risk of chronic beryllium disease (CBD) and beryllium sensitisation (BeS) in exposed individuals. This retrospective nested case-control study assessed the contribution of genetics and exposure in the development of BeS and CBD. METHODS: Workers with BeS (n=444), CBD (n=449) and beryllium-exposed controls (n=890) were enrolled from studies conducted at nuclear weapons and primary beryllium manufacturing facilities. Lifetime-average beryllium exposure estimates were based on workers' job questionnaires and historical and industrial hygienist exposure estimates, blinded to genotype and case status. Genotyping was performed using sequence-specific primer-PCR. Logistic regression models were developed allowing for over-dispersion, adjusting for workforce, race, sex and ethnicity. RESULTS: Having no E69 alleles was associated with lower odds of both CBD and BeS; every additional E69 allele increased odds for CBD and BeS. Increasing exposure was associated with lower odds of BeS. CBD was not associated with exposure as compared to controls, yet the per cent of individuals with CBD versus BeS increased with increasing exposure. No evidence of a gene-by-exposure interaction was found for CBD or BeS. CONCLUSIONS: Risk of CBD increases with E69 allele frequency and increasing exposure, although no gene by environment interaction was found. A decreased risk of BeS with increasing exposure and lack of exposure response in CBD cases may be due to the limitations of reconstructed exposure estimates. Although reducing exposure may not prevent BeS, it may reduce CBD and the associated health effects, especially in those carrying E69 alleles.
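The "every additional E69 allele increased odds" finding and the gene-by-exposure test fit an additive (per-allele) logistic model with an interaction term. The form below is a sketch under assumptions the abstract does not spell out:

* G is coded as the count of E69 alleles.
* Exposure enters on the scale shown; it may well have been transformed.
* Over-dispersion is handled by a quasi-likelihood scale parameter.

```latex
% G_i in {0, 1, 2}: number of E69 alleles; E_i: lifetime-average beryllium exposure;
% Z_i: workforce, race, sex and ethnicity
\operatorname{logit} \Pr(Y_i = 1)
  = \beta_0 + \beta_1 G_i + \beta_2 E_i + \beta_3 \, (G_i \times E_i)
    + \boldsymbol{\gamma}^{\top}\mathbf{Z}_i

% With no interaction (beta_3 = 0), odds ratios relative to zero E69 alleles:
\mathrm{OR}(G = 1 \text{ vs } 0) = e^{\beta_1}, \qquad
\mathrm{OR}(G = 2 \text{ vs } 0) = e^{2\beta_1}

% Gene-by-exposure interaction test
H_0 : \beta_3 = 0

% Over-dispersion via a scale parameter phi (standard errors inflated by sqrt(phi))
\operatorname{Var}(Y_i) = \phi \, \mu_i (1 - \mu_i)
```

Under this coding, "no evidence of a gene-by-exposure interaction" corresponds to failing to reject β₃ = 0.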
Subject(s)
Berylliosis/genetics; Beryllium/toxicity; HLA-DP beta-Chains/genetics; Occupational Exposure/adverse effects; Berylliosis/epidemiology; Case-Control Studies; Chronic Disease; Female; Genotype; Humans; Male; Polymorphism, Genetic; Retrospective Studies
ABSTRACT
INTRODUCTION: Studies that examine the role of rare variants in both simple and complex disease are increasingly common. Although the usual approach of testing rare variants in aggregate sets is more powerful than testing individual variants, it is of interest to identify the variants that are plausible drivers of the association. We present a novel method for prioritizing rare variants after a significant aggregate test by quantifying each variant's influence on the aggregate test of association. METHODS: In addition to providing a measure used to rank variants, we use outlier detection methods to present the computationally efficient Rare Variant Influential Filtering Tool (RIFT), which identifies a subset of variants that influence the disease association. We evaluated several outlier detection methods that differ in the underlying variance measure: interquartile range (Tukey fences), median absolute deviation, and SD. We performed 1,000 simulations for 50 regions of size 3 kb and compared the true and false positive rates. We compared RIFT using the inner Tukey fence with 2 existing methods: adaptive combination of p values (ADA) and a Bayesian hierarchical model (BeviMed). Finally, we applied this method to data from our targeted resequencing study in idiopathic pulmonary fibrosis (IPF). RESULTS: All outlier detection methods had higher sensitivity to detect uncommon variants (0.001 < minor allele frequency [MAF] < 0.03) than very rare variants (MAF < 0.001). For uncommon variants, RIFT had a lower median false positive rate than ADA. ADA and RIFT had significantly higher true positive rates than BeviMed. When applied to 2 regions previously found to be associated with IPF, which included 100 rare variants, we identified 6 polymorphisms with the greatest evidence for influencing the association with IPF.
DISCUSSION: In summary, RIFT has a high true positive rate while maintaining a low false positive rate for identifying polymorphisms influencing rare variant association tests. This work provides an approach to obtain greater resolution of the rare variant signals within significant aggregate sets; this information can provide an objective measure to prioritize variants for follow-up experimental studies and insight into the biological pathways involved.
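The RIFT workflow, influence scoring followed by outlier detection, can be sketched as two steps. First, each variant gets a score for how much it moves the aggregate statistic. Second, outlying scores are flagged with the three rules compared in the study: Tukey's inner fence (1.5 × IQR), a MAD rule and an SD rule. The exact influence measure used by RIFT is not given in the abstract. This sketch assumes a leave-one-variant-out change and uses a hypothetical toy "burden" statistic in place of a real aggregate test. It is not the RIFT R package API.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical variant record: (carriers among cases, carriers among controls).
Variant = Tuple[int, int]

def toy_burden_stat(variants: Sequence[Variant]) -> float:
    """Hypothetical stand-in for an aggregate statistic: case minus control carriers."""
    return float(sum(cases - controls for cases, controls in variants))

def loo_influence(variants: Sequence[Variant],
                  stat: Callable[[Sequence[Variant]], float]) -> List[float]:
    """Influence = change in the aggregate statistic when each variant is dropped."""
    full = stat(variants)
    return [full - stat(list(variants[:i]) + list(variants[i + 1:]))
            for i in range(len(variants))]

def tukey_inner_fence(scores: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    return [i for i, s in enumerate(scores) if s < q1 - k * iqr or s > q3 + k * iqr]

def mad_rule(scores: Sequence[float], cutoff: float = 3.0) -> List[int]:
    """Indices more than `cutoff` scaled MADs from the median."""
    med = statistics.median(scores)
    mad = 1.4826 * statistics.median([abs(s - med) for s in scores])  # normal-consistent
    if mad == 0:
        return []
    return [i for i, s in enumerate(scores) if abs(s - med) / mad > cutoff]

def sd_rule(scores: Sequence[float], cutoff: float = 3.0) -> List[int]:
    """Indices more than `cutoff` sample SDs from the mean."""
    mean, sd = statistics.fmean(scores), statistics.stdev(scores)
    return [i for i, s in enumerate(scores) if abs(s - mean) / sd > cutoff]

variants = [(1, 1), (2, 1), (1, 2), (1, 1), (2, 2), (1, 0), (0, 1), (15, 0)]
scores = loo_influence(variants, toy_burden_stat)
```

In this toy example the Tukey and MAD rules both flag the last variant. The 3-SD rule does not, because the outlier inflates the SD it is judged against. This is one reason robust spread measures are attractive for this step.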
ABSTRACT
Molecular patterns and pathways in idiopathic pulmonary fibrosis (IPF) have been extensively investigated, but few studies have combined multiomic platforms to provide an integrative understanding of molecular patterns that are relevant in IPF. Herein, we combine the coding and noncoding transcriptomes, DNA methylomes, and proteomes from IPF and healthy lung tissue to identify molecules and pathways associated with this disease. RNA sequencing, Illumina MethylationEPIC array, and liquid chromatography-mass spectrometry proteomic data were collected on lung tissue from 24 subjects with IPF and 14 control subjects. Significant differential features were identified using linear models adjusting for age and sex, and for inflation and bias when appropriate. Data Integration Analysis for Biomarker Discovery Using a Latent Component Method for Omics Studies was used for integrative multiomic analysis. We identified 4,643 differentially expressed transcripts aligning to 3,439 genes, 998 differentially abundant proteins, 2,500 differentially methylated regions, and 1,269 differentially expressed long noncoding RNAs (lncRNAs) that were significant after correcting for multiple tests (false discovery rate < 0.05). Unsupervised hierarchical clustering using the 20 coding mRNA, protein, methylation, and lncRNA features with the highest loadings on the top latent variable from the four data sets demonstrated perfect separation of IPF and control lungs. Our analysis confirmed previously validated molecules and pathways known to be dysregulated in disease and implicated novel molecular features as potential drivers and modifiers of disease. For example, 4 proteins, 18 differentially methylated regions, and 10 lncRNAs were found to have strong correlations (|r| > 0.8) with MMP7 (matrix metalloproteinase 7). Therefore, using a systems biology approach, we have identified novel molecular relationships in IPF.
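The MMP7 example selects features from other omics layers whose correlation with MMP7 exceeds |r| > 0.8. The abstract does not state the correlation type. The sketch below assumes Pearson correlation across samples, and the feature names and values are hypothetical.

```python
import math
from typing import Dict, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strongly_correlated(anchor: Sequence[float],
                        features: Dict[str, Sequence[float]],
                        threshold: float = 0.8) -> Dict[str, float]:
    """Features whose |r| with the anchor exceeds the threshold, mapped to r."""
    hits = {}
    for name, values in features.items():
        r = pearson(anchor, values)
        if abs(r) > threshold:  # keep strong positive AND negative correlations
            hits[name] = r
    return hits

mmp7 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical per-sample MMP7 levels
features = {
    "protein_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "dmr_b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "lncrna_c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
hits = strongly_correlated(mmp7, features)
```

The absolute value in the threshold is why an inversely related feature such as "dmr_b" is kept alongside a positively related one.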
Subject(s)
Idiopathic Pulmonary Fibrosis/metabolism; Lung/metabolism; RNA, Long Noncoding/genetics; Transcriptome/physiology; Aged; Case-Control Studies; Female; Gene Expression Profiling/methods; Humans; Male; Matrix Metalloproteinase 7/metabolism; Middle Aged; RNA, Messenger/metabolism
ABSTRACT
Rationale: Idiopathic pulmonary fibrosis (IPF) is a complex lung disease characterized by scarring of the lung that is believed to result from an atypical response to injury of the epithelium. Genome-wide association studies have reported signals of association implicating multiple pathways, including host defense, telomere maintenance, signaling, and cell-cell adhesion. Objectives: To improve our understanding of factors that increase IPF susceptibility by identifying previously unreported genetic associations. Methods: We conducted genome-wide analyses across three independent studies and meta-analyzed these results to generate the largest genome-wide association study of IPF to date (2,668 IPF cases and 8,591 controls). We performed replication in two independent studies (1,456 IPF cases and 11,874 controls) and functional analyses (including statistical fine-mapping, investigations into gene expression, and testing for enrichment of IPF susceptibility signals in regulatory regions) to determine putatively causal genes. Polygenic risk scores were used to assess the collective effect of variants not reported as associated with IPF. Measurements and Main Results: We identified and replicated three new genome-wide significant (P < 5 × 10⁻⁸) signals of association with IPF susceptibility (associated with altered gene expression of KIF15, MAD1L1, and DEPTOR) and confirmed associations at 11 previously reported loci. Polygenic risk score analyses showed that the combined effect of many thousands of as yet unreported IPF susceptibility variants contributes to IPF susceptibility. Conclusions: The observation that decreased DEPTOR expression is associated with increased susceptibility to IPF supports recent studies demonstrating the importance of mTOR signaling in lung fibrosis. New signals of association implicating KIF15 and MAD1L1 suggest a possible role of mitotic spindle-assembly genes in IPF susceptibility.
Subject(s)
Idiopathic Pulmonary Fibrosis/genetics , Aged , Case-Control Studies , Cell Cycle Proteins/genetics , Female , Gene Expression , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Intracellular Signaling Peptides and Proteins/genetics , Kinesins/genetics , Male , Middle Aged , Risk Assessment , Signal Transduction , Spindle Apparatus , TOR Serine-Threonine Kinases/metabolism
ABSTRACT
Rationale: Interstitial lung abnormalities (ILAs) are associated with the highest genetic risk locus for idiopathic pulmonary fibrosis (IPF); however, the extent to which there are unique associations among individuals with ILAs or additional overlap with IPF is not known. Objectives: To perform a genome-wide association study (GWAS) of ILAs. Methods: ILAs and a subpleural-predominant subtype were assessed on chest computed tomography (CT) scans in the AGES (Age Gene/Environment Susceptibility), COPDGene (Genetic Epidemiology of Chronic Obstructive Pulmonary Disease [COPD]), Framingham Heart, ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-points), MESA (Multi-Ethnic Study of Atherosclerosis), and SPIROMICS (Subpopulations and Intermediate Outcome Measures in COPD Study) studies. We performed a GWAS of ILAs in each cohort and combined the results using a meta-analysis. We assessed for overlapping associations in independent GWASs of IPF. Measurements and Main Results: Genome-wide genotyping data were available for 1,699 individuals with ILAs and 10,274 control subjects. The MUC5B (mucin 5B) promoter variant rs35705950 was significantly associated with both ILAs (P = 2.6 × 10⁻²⁷) and subpleural ILAs (P = 1.6 × 10⁻²⁹). We discovered novel genome-wide associations near IPO11 (rs6886640, P = 3.8 × 10⁻⁸) and FCF1P3 (rs73199442, P = 4.8 × 10⁻⁸) with ILAs, and near HTRE1 (rs7744971, P = 4.2 × 10⁻⁸) with subpleural-predominant ILAs. These novel associations were not associated with IPF. Among 12 previously reported IPF GWAS loci, five (DPP9, DSP, FAM13A, IVD, and MUC5B) were significantly associated (P < 0.05/12) with ILAs. Conclusions: In a GWAS of ILAs in six studies, we confirmed the association with a MUC5B promoter variant and found strong evidence for an effect of previously described IPF loci; however, novel ILA associations were not associated with IPF. These findings highlight common genetically driven biologic pathways between ILAs and IPF, and also suggest distinct ones.
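The lookup of 12 previously reported IPF loci used a Bonferroni-corrected threshold, P < 0.05/12 (about 0.0042). A minimal sketch of that check follows; the locus names and P values in the usage example are hypothetical placeholders, not results from the study.

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Return (sorted loci passing, threshold) for a Bonferroni correction.

    p_values: mapping of locus name -> P value for every locus tested.
    The threshold is alpha divided by the number of loci tested, so with
    12 loci and alpha = 0.05 it is 0.05 / 12.
    """
    if not p_values:
        raise ValueError("no loci to test")
    threshold = alpha / len(p_values)
    hits = sorted(locus for locus, p in p_values.items() if p < threshold)
    return hits, threshold


# Hypothetical P values for 12 loci (placeholders, not the published values).
example = {f"locus_{i:02d}": p for i, p in enumerate(
    [1e-6, 0.003, 0.004, 0.005, 0.01, 0.02, 0.04, 0.2, 0.3, 0.5, 0.7, 0.9], start=1)}
hits, threshold = bonferroni_hits(example)
print(f"threshold={threshold:.5f} hits={hits}")
```

Note that a locus with P = 0.005 would count as nominally significant at 0.05 but fails the corrected threshold; the correction guards against false positives from testing 12 loci at once.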
Subject(s)
Genetic Predisposition to Disease/genetics , Idiopathic Pulmonary Fibrosis/genetics , Lung Diseases, Interstitial/genetics , Aged , Case-Control Studies , Female , Genetic Loci , Genome-Wide Association Study , Humans , Male , Middle Aged , Mucin-5B/genetics , Polymorphism, Single Nucleotide/genetics , Promoter Regions, Genetic/genetics , TATA Box Binding Protein-Like Proteins , beta Karyopherins/genetics
ABSTRACT
INTRODUCTION: When analyzing data from large-scale genetic association studies, such as targeted or genome-wide resequencing studies, it is common to assume a single genetic model, such as dominant or additive, for all tests of association between a given genetic variant and the phenotype. However, for many variants, the chosen model will fit poorly and may lack statistical power due to model misspecification. OBJECTIVE: We develop power and sample size calculations for tests of gene and gene × environment interaction, allowing for misspecification of the true mode of genetic susceptibility. METHODS: The power calculations are based on a likelihood ratio test framework and are implemented in an open-source R package ("genpwr"). RESULTS: We use these methods to develop an analysis plan for a resequencing study in idiopathic pulmonary fibrosis and show that using a 2-degree-of-freedom test can increase power to detect recessive genetic effects while maintaining power to detect dominant and additive effects. CONCLUSIONS: Understanding the impact of model misspecification can aid in study design and in developing analysis plans that maximize power to detect a range of true underlying genetic effects. In particular, these calculations help identify when a multiple-degree-of-freedom test or other robust test of association may be advantageous.
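The power calculations rest on a likelihood ratio test (LRT). A minimal sketch, assuming the LRT statistic under the alternative follows a noncentral chi-square distribution with a known noncentrality parameter `ncp`, shows how power depends on the test's degrees of freedom. Deriving `ncp` under a misspecified genetic model is the part genpwr handles and is not reproduced here; the function names are hypothetical and are not the genpwr API. Only the standard library is used, so this sketch supports 1 or 2 degrees of freedom.

```python
import math
from statistics import NormalDist


def chi2_crit(df, alpha):
    """Upper-alpha critical value of a central chi-square with df = 1 or 2."""
    if df == 1:
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return z * z
    if df == 2:
        return -2.0 * math.log(alpha)  # chi-square(2) survival function is exp(-x/2)
    raise ValueError("this sketch handles df = 1 or 2 only")


def lrt_power(df, ncp, alpha=0.05, terms=200):
    """Power of an LRT whose statistic is noncentral chi-square(df, ncp)."""
    crit = chi2_crit(df, alpha)
    if df == 1:
        # chi-square(1, ncp) is the square of a Normal(sqrt(ncp), 1) variable
        shift, root, nd = math.sqrt(ncp), math.sqrt(crit), NormalDist()
        return (1 - nd.cdf(root - shift)) + nd.cdf(-root - shift)
    # df == 2: Poisson(ncp / 2) mixture of central chi-squares with 2 + 2j df;
    # chi-square(2m) survival at x is exp(-x/2) * sum_{k<m} (x/2)^k / k!
    half_x, mean = crit / 2.0, ncp / 2.0
    weight = math.exp(-mean)  # Poisson probability of j = 0
    term = partial = 1.0      # (x/2)^0 / 0! and its running sum
    power = 0.0
    for j in range(terms):
        power += weight * math.exp(-half_x) * partial
        weight *= mean / (j + 1)   # Poisson probability of j + 1
        term *= half_x / (j + 1)   # (x/2)^(j+1) / (j+1)!
        partial += term
    return power


for ncp in (0.0, 7.849, 9.635):
    print(ncp, round(lrt_power(1, ncp), 3), round(lrt_power(2, ncp), 3))
```

For the same noncentrality, the 2-df test has somewhat lower power than the 1-df test; that is the price of the extra degree of freedom. The abstract's argument is that when the true effect is recessive, a 1-df additive test loses enough noncentrality through misspecification that the 2-df test can come out ahead.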
ABSTRACT
Epigenetic marks are likely to explain variability of response to antigen in granulomatous lung disease. The objective of this study was to identify DNA methylation and gene expression changes associated with chronic beryllium disease (CBD) and sarcoidosis in lung cells obtained by BAL. BAL cells from subjects with CBD (n = 8), beryllium sensitization (n = 8), and sarcoidosis (n = 8), plus additional subjects with progressive (n = 9) and remitting (n = 15) sarcoidosis, were profiled on the Illumina 450k methylation and Affymetrix/Agilent gene expression microarrays. Statistical analyses were performed to identify DNA methylation and gene expression changes associated with CBD, sarcoidosis, and disease progression in sarcoidosis. DNA methylation array findings were validated by pyrosequencing. We identified 52,860 significant (P < 0.005 and q < 0.05) CpGs associated with CBD; 2,726 CpGs near 1,944 unique genes have greater than 25% methylation change. A total of 69% of differentially methylated genes are significantly (q < 0.05) differentially expressed in CBD, with many canonical inverse relationships of methylation and expression in genes critical to T-helper cell type 1 differentiation, chemokines and their receptors, and other genes involved in immunity. Testing these CBD-associated CpGs in sarcoidosis shows that the methylation changes only approach significance but occur in the same direction, suggesting similarities between the two diseases, with greater heterogeneity in sarcoidosis limiting power at the current sample size. Analysis of progressive versus remitting sarcoidosis identified 15,215 CpGs (P < 0.005 and q < 0.05), but only 801 of them have greater than 5% methylation change, demonstrating that DNA methylation changes associated with disease progression are more subtle. Our study highlights the significance of epigenetic marks in lung immune response in granulomatous lung disease.
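The CpG selection combines three filters: P < 0.005, q < 0.05, and a minimum methylation change (25% for CBD, 5% for progressive versus remitting sarcoidosis). The abstract does not say how q-values were computed; a minimal sketch, assuming Benjamini-Hochberg adjusted P values and methylation change expressed as a difference in beta values (so 25% is 0.25), could apply the filters like this. The CpG identifiers and numbers in the example are hypothetical.

```python
def bh_qvalues(p_values):
    """Benjamini-Hochberg adjusted P values (q-values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q-values are monotone in P.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def select_cpgs(cpgs, p_cut=0.005, q_cut=0.05, min_delta_beta=0.25):
    """cpgs: list of (cpg_id, p_value, delta_beta) tuples.

    Returns IDs passing the P, q, and absolute methylation-change filters.
    Use min_delta_beta=0.05 for the 5% threshold in the progression analysis.
    """
    q = bh_qvalues([p for _, p, _ in cpgs])
    return [cpg_id for (cpg_id, p, delta), qv in zip(cpgs, q)
            if p < p_cut and qv < q_cut and abs(delta) > min_delta_beta]


# Hypothetical CpGs: (identifier, P value, change in mean beta between groups).
example = [("cpg_1", 0.0001, 0.30), ("cpg_2", 0.0001, 0.10), ("cpg_3", 0.2, 0.40)]
print(select_cpgs(example))
```

Requiring a minimum effect size on top of the significance filters explains why only a fraction of significant CpGs are reported as large changes, as with the 801 of 15,215 in the progression analysis.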