ABSTRACT
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
Subject(s)
Genome, Human , Genomics , Precision Medicine , Diabetes Mellitus, Type 2/genetics , Female , Gene Expression Profiling , Humans , Male , Metabolomics , Middle Aged , Mutation , Proteomics , Respiratory Syncytial Viruses/isolation & purification , Rhinovirus/isolation & purification
ABSTRACT
SUMMARY: PyIOmica is an open-source Python package focusing on integrating longitudinal multiple omics datasets, characterizing and categorizing temporal trends. The package includes multiple bioinformatics tools including data normalization, annotation, categorization, visualization and enrichment analysis for gene ontology terms and pathways. Additionally, the package includes an implementation of visibility graphs to visualize time series as networks. AVAILABILITY AND IMPLEMENTATION: PyIOmica is implemented as a Python package (pyiomica), available for download and installation through the Python Package Index (https://pypi.python.org/pypi/pyiomica), and can be deployed using the Python import function following installation. PyIOmica has been tested on Mac OS X, Unix/Linux and Microsoft Windows. The application is distributed under an MIT license. Source code for each release is also available for download on Zenodo (https://doi.org/10.5281/zenodo.3548040). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics.
Subject(s)
Computational Biology , Software , Gene Ontology
ABSTRACT
BACKGROUND: Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects. OBJECTIVE: We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families. METHODS: We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA. RESULTS: Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA. CONCLUSIONS: We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.
Subject(s)
Immunologic Deficiency Syndromes/genetics , Intestinal Atresia/genetics , Intestines/abnormalities , Proteins/genetics , Animals , Child, Preschool , Exome/genetics , Female , Humans , Infant , Infant, Newborn , Male , Mice , Mutation , Oligonucleotide Array Sequence Analysis , RNA, Messenger/metabolism , Thymus Gland/metabolism , Tissue Array Analysis
ABSTRACT
NUT carcinoma (NC) is an aggressive cancer with no effective treatment. About 70% of NUT carcinomas are associated with chromosome translocation events that lead to the formation of a BRD4::NUTM1 fusion gene. Because the BRD4::NUTM1 gene is unequivocally cytotoxic when ectopically expressed in cell lines, questions remain about whether the fusion gene can initiate NC. Here, we report the first genetically engineered mouse model for NUT carcinoma that recapitulates the human t(15;19) chromosome translocation in mice. We demonstrated that the mouse t(2;17) syntenic chromosome translocation, forming the Brd4::Nutm1 fusion gene, could induce aggressive carcinomas in mice. The tumors present histopathological and molecular features similar to human NC, with enrichment of undifferentiated cells. Consistent with reports of human NC incidence, Brd4::Nutm1 can induce NC in a broad range of tissues with strong phenotypic variability. The consistent induction of poorly differentiated carcinoma demonstrated a strong reprogramming activity of BRD4::NUTM1. The new mouse model provides a critical preclinical model for NC that will lead to better understanding of NC and to therapy development.
Subject(s)
Bromodomain Containing Proteins , Neoplasm Proteins , Nuclear Proteins , Oncogene Proteins, Fusion , Transcription Factors , Animals , Mice , Carcinoma/genetics , Carcinoma/metabolism , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Disease Models, Animal , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Oncogene Proteins, Fusion/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Translocation, Genetic/genetics
ABSTRACT
From the early days of spaceflight to current missions, astronauts continue to be exposed to multiple hazards that affect human health, including low gravity, high radiation, isolation during long-duration missions, a closed environment and distance from Earth. Their effects can lead to adverse physiological changes and necessitate countermeasure development and/or longitudinal monitoring. A time-resolved analysis of biological signals can detect and better characterize potential adverse events during spaceflight, ideally preventing them and maintaining astronauts' wellness. Here we provide a time-resolved assessment of the impact of spaceflight on multiple astronauts (n=27) by studying multiple biochemical and immune measurements before, during, and after long-duration orbital spaceflight. We reveal space-associated changes of astronauts' physiology on both the individual level and across astronauts, including associations with bone resorption and kidney function, as well as immune-system dysregulation.
ABSTRACT
Longitudinal deep multiomics profiling, which combines biomolecular, physiological, environmental and clinical measures, shows great promise for precision health. However, integrating and understanding the complexity of such data remains a major challenge. Here we utilize an individual-focused bottom-up approach that first assesses single individuals' multiomics time series and then uses the individual-level responses to group individuals directly by the similarity of their longitudinal deep multiomics profiles. We used this individual-focused approach to analyze profiles from a study of longitudinal responses in type 2 diabetes mellitus. After generating periodograms for individual subject omics signals, we constructed within-person omics networks and analyzed personal-level immune changes. The results identified both individual-level responses to immune perturbation and clusters of individuals with similar immune-response behavior, which were associated with measures of their diabetic status.
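As an illustration of the periodogram step mentioned in this abstract, the following is a minimal Python sketch of a classical periodogram for an evenly sampled signal. The abstract does not say which periodogram variant was used (unevenly sampled data would typically call for something like a Lomb-Scargle estimate), so the function names and the even-sampling assumption here are illustrative only.

```python
import cmath
import math


def periodogram(signal):
    """Classical periodogram of an evenly sampled signal (assumed setup).

    Returns (frequency, power) pairs for frequencies k/n cycles per sample,
    k = 1 .. n//2, computed after removing the signal mean.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    result = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at frequency k/n.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        result.append((k / n, abs(coeff) ** 2 / n))
    return result


def dominant_frequency(signal):
    """Frequency (cycles per sample) carrying the largest power."""
    return max(periodogram(signal), key=lambda pair: pair[1])[0]


# Hypothetical signal: one cycle every 8 samples, 32 samples in total.
example = [math.sin(2 * math.pi * t / 8) for t in range(32)]
print(dominant_frequency(example))
```

Signals from the same individual could then be compared by the similarity of their periodograms, which is one plausible way to build the within-person networks described above.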
Subject(s)
Diabetes Mellitus, Type 2 , Prediabetic State , Diabetes Mellitus, Type 2/genetics , Humans , Prediabetic State/genetics
ABSTRACT
Differential Network (DN) analysis is a method that has long been used to interpret changes in gene expression data and provide biological insights. The method identifies the rewiring of gene networks in response to external perturbations. Our study applies the DN method to the analysis of RNA-sequencing (RNA-seq) time series datasets. We focus on expression changes: (i) in saliva of a human subject after pneumococcal vaccination (PPSV23) and (ii) in primary B cells treated ex vivo with a monoclonal antibody drug (Rituximab). The DN method enabled us to identify the activation of biological pathways consistent with the mechanisms of action of the PPSV23 vaccine and target pathways of Rituximab. The community detection algorithm on the DN revealed clusters of genes characterized by collective temporal behavior. All saliva and some B cell DN communities showed characteristic time signatures, outlining a chronological order in pathway activation in response to the perturbation. Moreover, we identified early and delayed responses within network modules in the saliva dataset and three temporal patterns in the B cell data.
ABSTRACT
Recent clinical studies report that chromosomal 12q24.31 microdeletions are associated with autism spectrum disorder (ASD) and intellectual disability (ID). However, the causality and underlying mechanisms linking 12q24.31 microdeletions to ASD/ID remain undetermined. Here we show Kdm2b, one gene located in chromosomal 12q24.31, plays a critical role in maintaining neural stem cells (NSCs) in the mouse brain. Loss of the CxxC-ZF domain of KDM2B impairs its function in recruiting Polycomb repressive complex 1 (PRC1) to chromatin, resulting in de-repression of genes involved in cell apoptosis, cell-cycle arrest, NSC senescence, and loss of NSC populations in the brain. Of importance, the Kdm2b mutation is sufficient to induce ASD/ID-like behavioral and memory deficits. Thus, our study reveals a critical role of KDM2B in normal brain development, a causality between the Kdm2b mutation and ASD/ID-like phenotypes in mice, and potential molecular mechanisms linking the function of KDM2B-PRC1 in transcriptional regulation to the 12q24.31 microdeletion-associated ASD/ID.
ABSTRACT
Temporal behavior is an essential aspect of all biological systems. Time series have been previously represented as networks. Such representations must address two fundamental problems: (1) how to create appropriate networks that reflect the characteristics of biological time series, and (2) how to detect characteristic dynamic patterns or events as network temporal communities. General community detection methods use metrics comparing the connectivity within a community to random models, or are based on the betweenness centrality of edges or nodes. However, such methods were not designed for network representations of time series. We introduce a visibility-graph-based method to build networks from time series and detect temporal communities within these networks. To characterize unevenly sampled time series (typical of biological experiments), and simultaneously capture events associated with peaks and troughs, we introduce the Weighted Dual-Perspective Visibility Graph (WDPVG). To detect temporal communities in individual signals, we first find the shortest path of the network between start and end nodes, identifying high-intensity nodes as the main stem of our community detection algorithm; these nodes act as hubs for each community. Then, we aggregate nodes outside the shortest path to the closest nodes on the main stem based on path length, thereby assigning every node to a temporal community based on proximity to the stem nodes/hubs. We demonstrate the validity and effectiveness of our method through simulation and biological applications.
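The two building blocks named in this abstract, a visibility graph and a start-to-end shortest path, can be sketched in Python as follows. This is a simplified illustration, not the WDPVG itself: it builds the plain (unweighted, single-perspective) natural visibility graph, supports uneven sampling through explicit time values, and finds a hop-count shortest path by breadth-first search. The edge weighting, the dual perspective for troughs, and the community aggregation step are omitted, and all names are hypothetical.

```python
from collections import deque


def natural_visibility_graph(times, values):
    """Adjacency sets of the natural visibility graph.

    Points i and j are linked when the straight line between them passes
    strictly above every intermediate point.
    """
    n = len(times)
    adjacency = {i: set() for i in range(n)}
    for i in range(n - 1):
        for j in range(i + 1, n):
            slope = (values[j] - values[i]) / (times[j] - times[i])
            visible = all(
                values[k] < values[i] + slope * (times[k] - times[i])
                for k in range(i + 1, j)
            )
            if visible:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def shortest_path(adjacency, start, end):
    """Unweighted shortest path from start to end (breadth-first search)."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical, unevenly sampled series with peaks at indices 1, 3 and 5.
times = [0, 1, 2.5, 3, 5, 6]
values = [1, 3, 2, 5, 1, 4]
graph = natural_visibility_graph(times, values)
print(shortest_path(graph, 0, len(times) - 1))
```

In this toy series the path runs through the high-intensity points, which matches the intuition that peaks "see" farther and therefore serve as the stem of the path.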
Subject(s)
Algorithms , Residence Characteristics , Computer Simulation , Databases as Topic , Humans , Time Factors
ABSTRACT
ASH1L and MLL1 are two histone methyltransferases that facilitate transcriptional activation during normal development. However, the roles of ASH1L and its enzymatic activity in the development of MLL-rearranged leukemias are not fully elucidated in Ash1L gene knockout animal models. In this study, we used an Ash1L conditional knockout mouse model to show that loss of ASH1L in hematopoietic progenitor cells impaired the initiation of MLL-AF9-induced leukemic transformation in vitro. Furthermore, genetic deletion of ASH1L in the MLL-AF9-transformed cells impaired the maintenance of leukemic cells in vitro and largely blocked the leukemia progression in vivo. Importantly, the loss of ASH1L function in the Ash1L-deleted cells could be rescued by wild-type but not the catalytic-dead mutant ASH1L, suggesting the enzymatic activity of ASH1L was required for its function in promoting MLL-AF9-induced leukemic transformation. At the molecular level, ASH1L enhanced the MLL-AF9 target gene expression by directly binding to the gene promoters and modifying the local histone H3K36me2 levels. Thus, our study revealed the critical functions of ASH1L in promoting the MLL-AF9-induced leukemogenesis, which provides a molecular basis for targeting ASH1L and its enzymatic activity to treat MLL-AF9-induced leukemias.
ABSTRACT
Autism spectrum disorder (ASD) is a neurodevelopmental disease associated with various gene mutations. Recent genetic and clinical studies report that mutations of the epigenetic gene ASH1L are highly associated with human ASD and intellectual disability (ID). However, the causality and underlying molecular mechanisms linking ASH1L mutations to genesis of ASD/ID remain undetermined. Here we show loss of ASH1L in the developing mouse brain is sufficient to cause multiple developmental defects, core autistic-like behaviors, and impaired cognitive memory. Gene expression analyses uncover critical roles of ASH1L in regulating gene expression during neural cell development. Thus, our study establishes an ASD/ID mouse model revealing the critical function of an epigenetic factor ASH1L in normal brain development, a causality between Ash1L mutations and ASD/ID-like behaviors in mice, and potential molecular mechanisms linking Ash1L mutations to brain functional abnormalities.
Subject(s)
Autism Spectrum Disorder/genetics , Brain/growth & development , Brain/metabolism , DNA-Binding Proteins/genetics , Histone-Lysine N-Methyltransferase/genetics , Intellectual Disability/genetics , Animals , Autism Spectrum Disorder/metabolism , Disease Models, Animal , Embryonic Development/genetics , Humans , Mice , Mice, Inbred C57BL , Mice, Knockout
ABSTRACT
Saliva omics has immense potential for non-invasive diagnostics, including monitoring very young or elderly populations, or individuals in remote locations. In this study, multiple saliva omics from an individual were monitored over three periods (100 timepoints) involving: (1) hourly sampling over 24 h without intervention, (2) hourly sampling over 24 h including immune system activation using the standard 23-valent pneumococcal polysaccharide vaccine, (3) daily sampling for 33 days profiling the post-vaccination response. At each timepoint total saliva transcriptome and proteome, and small RNA from salivary extracellular vesicles were profiled, including mRNA, miRNA, piRNA and bacterial RNA. The two 24-h periods were used in a paired analysis to remove daily variation and reveal vaccination responses. Over 18,000 omics longitudinal series had statistically significant temporal trends compared to a healthy baseline. Various immune response and regulation pathways were activated following vaccination, including interferon and cytokine signaling, and MHC antigen presentation. Immune response timeframes were concordant with innate and adaptive immunity development, and coincided with vaccination and reported fever. Overall, mRNA results appeared more specific and sensitive (timewise) to vaccination compared to other omics. The results suggest saliva omics can be consistently assessed for non-invasive personalized monitoring and immune response diagnostics.
Subject(s)
Pneumococcal Infections/immunology , Pneumococcal Vaccines/administration & dosage , Proteome/drug effects , Saliva/metabolism , Sinusitis/immunology , Streptococcus pneumoniae/immunology , Transcriptome/drug effects , Adult , Humans , Immunity , Longitudinal Studies , Male , Pneumococcal Infections/drug therapy , Pneumococcal Infections/microbiology , Saliva/drug effects , Sinusitis/drug therapy , Sinusitis/microbiology , Time Factors , Vaccination
ABSTRACT
MathIOmica is a package for bioinformatics, written in the Wolfram language, that provides multiple utilities to facilitate the analysis of longitudinal data generated from omics experiments, including transcriptomics, proteomics, and metabolomics data, as well as any generalized time series. MathIOmica uses Mathematica's notebook interface, wherein users can import longitudinal datasets, carry out quality control and normalization, generate time series, and classify temporal trends. MathIOmica provides spectral methods based on periodograms and autocorrelations for automatically detecting classes of temporal behavior and allowing the user to visualize collective temporal behavior, and also assess biological significance through Gene Ontology and pathway enrichment analyses. MathIOmica's time-series classification methods address common issues including missing data and uneven sampling in measurements. As such, the software is ideally suited for the analysis of experimental data from individualized profiling of subjects, can facilitate analysis of data from the emerging field of individualized health monitoring, and can detect temporal trends that may be associated with adverse health events. In this article, we import a transcriptomics (RNA-sequencing) dataset collected over multiple timepoints and generate time series for each transcript represented in the data. We classify the time series to identify classes of significant temporal trends (using autocorrelations). We assess statistical significance cutoffs in the classification by generating null distributions using randomly resampled time series. We then visualize the significant trends in heatmaps and assess biological significance using enrichment analyses. Finally, we visualize pathway results for statistically significant pathways of interest. © 2019 by John Wiley & Sons, Inc. Basic Protocol: Time series analysis of transcriptomics expression dataset.
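The classification idea described in this abstract, comparing a time series' autocorrelation against a null distribution built from randomly resampled series, can be sketched as follows. MathIOmica itself is written in the Wolfram language; this Python stand-in does not reproduce its API, and the choice of lag 1, shuffling as the resampling scheme, and the 0.95 quantile cutoff are assumptions made for illustration.

```python
import random


def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denominator = sum(c * c for c in centered)
    if denominator == 0:
        return 0.0  # constant series: no temporal structure to measure
    numerator = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return numerator / denominator


def null_cutoff(series, lag=1, n_resamples=1000, quantile=0.95, seed=0):
    """Cutoff from a null distribution of autocorrelations.

    The null is built by shuffling the series in time (an assumed
    resampling scheme), which destroys temporal order but keeps the values.
    """
    rng = random.Random(seed)
    shuffled = list(series)
    null_values = []
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        null_values.append(autocorrelation(shuffled, lag))
    null_values.sort()
    index = min(int(quantile * n_resamples), n_resamples - 1)
    return null_values[index]


def has_significant_trend(series, lag=1, n_resamples=1000, quantile=0.95, seed=0):
    """True when the observed autocorrelation exceeds the null cutoff."""
    observed = autocorrelation(series, lag)
    return observed > null_cutoff(series, lag, n_resamples, quantile, seed)


# Hypothetical steadily increasing expression profile over 20 timepoints.
print(has_significant_trend(list(range(20))))
```

A fixed seed keeps the cutoff reproducible between runs; in practice the cutoff would be estimated once per dataset and applied to every transcript's series.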
Subject(s)
Databases, Factual , Genomics/methods , Software , Gene Expression Regulation , Humans , NF-kappa B/metabolism , Necroptosis/genetics , Signal Transduction , Time Factors , Transcriptome/genetics
ABSTRACT
Cells release nanometer-scale, lipid bilayer-enclosed biomolecular packages (extracellular vesicles; EVs) into their surrounding environment. EVs are hypothesized to be intercellular communication agents that regulate physiological states by transporting biomolecules between near and distant cells. The research community has consistently advocated for the importance of RNA contents in EVs by demonstrating that: (1) EV-related RNA contents can be detected in a liquid biopsy, (2) disease states significantly alter EV-related RNA contents, and (3) sensitive and specific liquid biopsies can be implemented in precision medicine settings by measuring EV-derived RNA contents. Furthermore, EVs have medical potential beyond diagnostics. Both natural and engineered EVs are being investigated for therapeutic applications such as regenerative medicine and as drug delivery agents. This review focuses specifically on EV characterization, analysis of their RNA content, and their functional implications. The NIH extracellular RNA communication (ERC) program has catapulted human EV research from an RNA profiling standpoint by standardizing the pipeline for working with EV transcriptomics data, and creating a centralized database for the scientific community. There are currently thousands of RNA-sequencing profiles hosted on the Extracellular RNA Atlas alone (Murillo et al., 2019), encompassing a variety of human biofluid types and health conditions. While a number of significant discoveries have been made through these studies individually, integrative analyses of these data have thus far been limited. A primary focus of the ERC program over the next five years is to bring higher resolution tools to the EV research community so that investigators can isolate and analyze EV sub-populations, and ultimately single EVs sourced from discrete cell types, tissues, and complex biofluids. 
Higher resolution techniques will be essential for evaluating the roles of circulating EVs at a level which impacts clinical decision making. We expect that advances in microfluidic technologies will drive near-term innovation and discoveries about the diverse RNA contents of EVs. Long-term translation of EV-based RNA profiling into a mainstay medical diagnostic tool will depend upon identifying robust patterns of circulating genetic material that correlate with a change in health status.
ABSTRACT
The recruitment of Polycomb repressive complex 2 (PRC2) to gene promoters is critical for its function in repressing gene expression in murine embryonic stem cells (mESCs). However, previous studies have demonstrated that although the expression of early lineage-specific genes is largely repressed, the genome-wide PRC2 occupancy is unexpectedly reduced in naive mESCs. In this study, we provide evidence that fibroblast growth factor/extracellular signal-regulated kinase signaling determines the global PRC2 occupancy through regulating the expression of PRC2-recruiting factor JARID2 in naive mESCs. At the transcriptional level, the de-repression of bivalent genes is predominantly determined by the presence of cell signaling-associated transcription factors but not the status of PRC2 occupancy at gene promoters. Hence, this study not only reveals a key molecular mechanism by which cell signaling regulates the PRC2 occupancy in mESCs but also elucidates the functional roles of transcription factors and Polycomb-mediated epigenetic mechanisms in transcriptional regulation.
ABSTRACT
It is estimated that in 2019 more than 21,000 new acute myeloid leukemia (AML) patients will be diagnosed in the United States, and nearly 11,000 are expected to die from the disease. AML is primarily diagnosed among the elderly (median 68 years old at diagnosis). Prognoses have significantly improved for younger patients, but as many as 70% of patients over 60 years old will die within a year of diagnosis. In this study, we conducted a reanalysis of 2,213 acute myeloid leukemia patients compared to 548 healthy individuals, using curated publicly available microarray gene expression data. We carried out an analysis of normalized, batch-corrected data, using a linear model that included considerations for disease, age, sex, and tissue. We identified 974 differentially expressed probe sets and 4 significant pathways associated with AML. Additionally, we identified 375 age- and 70 sex-related probe set expression signatures relevant to AML. Finally, we trained a k-nearest-neighbors model to classify AML and healthy subjects with 90.9% accuracy. Our findings provide a new reanalysis of public datasets, which enabled the identification of new gene sets relevant to AML that can potentially be used in future experiments and possible stratified disease diagnostics.
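The k-nearest-neighbors classification mentioned at the end of this abstract can be illustrated with a minimal Python sketch. The abstract does not report the value of k, the distance metric, or the features used, so Euclidean distance, k = 3, and the two-dimensional toy points below are assumptions chosen only to show the mechanics; none of the numbers come from the study.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up expression values for two hypothetical probe sets per subject.
train_points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
                (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
train_labels = ["healthy", "healthy", "healthy", "AML", "AML", "AML"]

print(knn_predict(train_points, train_labels, (1.0, 1.0)))
print(knn_predict(train_points, train_labels, (3.0, 3.0)))
```

Accuracy figures such as the one reported above are normally estimated on held-out samples, not on the samples used for training.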
Subject(s)
Databases, Nucleic Acid , Gene Expression Profiling , Gene Expression Regulation, Leukemic , Leukemia, Myeloid, Acute , Transcriptome , Adult , Aged , Female , Humans , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/metabolism , Male , Middle Aged , Oligonucleotide Array Sequence Analysis , United States
ABSTRACT
Alzheimer's disease (AD) has been categorized by the Centers for Disease Control and Prevention (CDC) as the 6th leading cause of death in the United States. AD is a significant health-care burden because of its increased occurrence (specifically in the elderly population), and the lack of effective treatments and preventive methods. With an increase in life expectancy, the CDC expects AD cases to rise to 15 million by 2060. Aging has been previously associated with susceptibility to AD, and there are ongoing efforts to effectively differentiate between normal and AD age-related brain degeneration and memory loss. AD targets neuronal function and can cause neuronal loss due to the buildup of amyloid-beta plaques and intracellular neurofibrillary tangles. Our study aims to identify temporal changes within gene expression profiles of healthy controls and AD subjects. We conducted a meta-analysis using publicly available microarray expression data from AD and healthy cohorts. For our meta-analysis, we selected datasets that reported donor age and gender, and used Affymetrix and Illumina microarray platforms (8 datasets, 2,088 samples). Raw microarray expression data were re-analyzed, and normalized across arrays. We then performed an analysis of variance, using a linear model that incorporated age, tissue type, sex, and disease state as effects, as well as study to account for batch effects, and included binary interactions between factors. Our results identified 3,735 statistically significant (Bonferroni adjusted p < 0.05) gene expression differences between AD and healthy controls, which we filtered for biological effect (10% two-tailed quantiles of mean differences between groups) to obtain 352 genes. 
Enriched pathways of interest included neurodegenerative disease pathways (including AD), as well as mitochondrial translation and dysfunction, synaptic vesicle cycle and GABAergic synapse. Enriched gene ontology terms included neuronal system, transmission across chemical synapses and mitochondrial translation. Overall, our approach allowed us to effectively combine multiple available microarray datasets and identify gene expression differences between AD and healthy individuals, including full age and tissue type considerations. Our findings provide potential gene and pathway associations that can be targeted to improve AD diagnostics and potentially treatment or prevention.
ABSTRACT
Chronic obstructive pulmonary disease (COPD) was classified by the Centers for Disease Control and Prevention in 2014 as the 3rd leading cause of death in the United States (US). The main cause of COPD is exposure to tobacco smoke and air pollutants. Problems associated with COPD include under-diagnosis of the disease and an increase in the number of smokers worldwide. The goal of our study is to identify disease variability in the gene expression profiles of COPD subjects compared to controls, by reanalyzing pre-existing, publicly available microarray expression datasets. Our inclusion criteria selected microarray datasets that reported the smoking status, age and sex of blood donors. The datasets used Affymetrix and Agilent microarray platforms (7 datasets, 1,262 samples). We re-analyzed the curated raw microarray expression data using R packages, and used Box-Cox power transformations to normalize datasets. To identify significant differentially expressed genes, we used generalized least squares models with disease state, age, sex, smoking status and study as effects, which also included binary interactions, followed by likelihood ratio tests (LRT). We found 3,315 statistically significant (Storey-adjusted q-value <0.05) differentially expressed genes with respect to disease state (COPD or control). We further filtered these genes for biological effect (LRT q-value <0.05 and the 10% two-tailed quantiles of the model-estimated mean differences between COPD and control) to identify 679 genes. Through analysis of disease, sex, age, and also smoking status and disease interactions, we identified differentially expressed genes involved in a variety of immune responses and cell processes in COPD. We also trained a logistic regression model using the common array genes as features, which enabled prediction of disease status with 81.7% accuracy.
Our results show potential for improving the diagnosis of COPD through blood and highlight novel gene expression disease signatures.
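The Box-Cox power transformation named in this abstract has a simple closed form, sketched here in Python: for a positive value x and parameter lambda, the transformed value is (x^lambda - 1) / lambda, with the natural logarithm used when lambda is 0. The abstract does not state how lambda was chosen (it is commonly estimated by maximum likelihood), so the lambda values below are arbitrary illustrations.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transformation of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # Limiting case of the power transform as lambda approaches 0.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical intensities; lambda = 0.5 is a square-root-like transform.
print(box_cox([1.0, 4.0, 9.0], 0.5))
print(box_cox([1.0, math.e], 0))
```

The transform compresses large values more strongly as lambda decreases, which is why it is used to bring skewed intensity distributions closer to normality before fitting linear models.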
Subject(s)
Data Mining , Pulmonary Disease, Chronic Obstructive/epidemiology , Transcriptome/genetics , Age Factors , Air Pollutants/adverse effects , Biomarkers/metabolism , Datasets as Topic , Down-Regulation , Female , Gene Expression Profiling/statistics & numerical data , Humans , Logistic Models , Machine Learning , Male , Models, Genetic , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Pulmonary Disease, Chronic Obstructive/diagnosis , Pulmonary Disease, Chronic Obstructive/etiology , Pulmonary Disease, Chronic Obstructive/genetics , Risk Assessment/methods , Risk Factors , Sex Factors , Smoking/adverse effects , Smoking/epidemiology , United States/epidemiology , Up-Regulation
ABSTRACT
Influenza, a communicable disease, affects thousands of people worldwide. Young children, the elderly, immunocompromised individuals and pregnant women are at higher risk of being infected by the influenza virus. Our study aims to highlight differentially expressed genes in influenza disease compared to influenza vaccination, including variability due to age and sex. To accomplish our goals, we conducted a meta-analysis using publicly available microarray expression data. Our inclusion criteria included subjects with influenza, subjects who received the influenza vaccine and healthy controls. We curated 18 microarray datasets for a total of 3,481 samples (1,277 controls, 297 influenza infection, 1,907 influenza vaccination). We pre-processed the raw microarray expression data in R using packages available to pre-process Affymetrix and Illumina microarray platforms. We used a Box-Cox power transformation of the data prior to our downstream analysis to identify differentially expressed genes. Statistical analyses were based on a linear mixed-effects model with all study factors and successive likelihood ratio tests (LRT) to identify differentially expressed genes. We filtered LRT results by disease (Bonferroni adjusted p < 0.05) and used a two-tailed 10% quantile cutoff to identify biologically significant genes. Furthermore, we assessed age and sex effects on the disease genes by filtering for genes with a statistically significant (Bonferroni adjusted p < 0.05) interaction between disease and age, and disease and sex. We identified 4,889 statistically significant genes when we filtered the LRT results by disease factor, and gene enrichment analysis (gene ontology and pathways) included innate immune response, viral process, defense response to virus, hematopoietic cell lineage and NF-kappa B signaling pathway. Our quantile-filtered gene lists comprised 978 genes each, associated with influenza infection and with vaccination.
We also identified 907 and 48 genes with statistically significant (Bonferroni adjusted p < 0.05) disease-age and disease-sex interactions, respectively. Our meta-analysis approach highlights key gene signatures and their associated pathways for both influenza infection and vaccination. We also were able to identify genes with an age and sex effect. This gives potential for improving current vaccines and exploring genes that are expressed equally across ages when considering universal vaccinations for influenza.
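The Bonferroni adjustment used as the significance filter in this abstract can be sketched in a few lines of Python: each raw p-value is multiplied by the number of tests and capped at 1, and a gene passes when the adjusted value is below 0.05. The p-values below are invented for illustration, and the function names are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: p * (number of tests), capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant_indices(p_values, alpha=0.05):
    """Indices of tests whose adjusted p-value is below alpha."""
    adjusted = bonferroni_adjust(p_values)
    return [i for i, p in enumerate(adjusted) if p < alpha]


# Made-up raw p-values for three hypothetical genes.
raw = [0.01, 0.5, 0.001]
print(bonferroni_adjust(raw))
print(significant_indices(raw))
```

Because the adjustment scales with the number of tests, it is conservative for genome-scale analyses, which is consistent with pairing it with a separate effect-size (quantile) filter as described above.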