ABSTRACT
The study of gene expression variability has become important, especially in cancer and cell differentiation research. Here, we investigate the transcriptome-wide scatter of 23 cell types and conditions across different levels of biological complexity. We focused on genes that act like toggle switches between pairwise replicates of the same cell type, i.e. genes expressed in one replicate and not expressed in the other, sometimes also referred to as ON/OFF genes. The proportion of these toggle genes increases dramatically from unicellular to multicellular organization, especially for development and cancer cells. A notable portion of toggle switches are non-coding genes: in unicellular systems the most represented classes are tRNA and rRNA, while multicellular systems more frequently show lncRNA, sncRNA and pseudogenes. Notably, disease-associated microRNAs (miRNAs), pseudogenes and numerous uncharacterized transcripts are present in both development and cancer cells. On top of the known intrinsic and extrinsic factors, our work identifies toggle genes as a novel collective component that creates transcriptome-wide variability. This requires further investigation to elucidate both evolutionary and disease processes.
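The toggle-gene definition above (expressed in one replicate, not expressed in the other) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the gene labels and the choice of "expression > 0" as the cutoff for "expressed" are assumptions made only for illustration.

```python
def toggle_genes(rep_a, rep_b, threshold=0.0):
    """Return genes expressed (> threshold) in exactly one of two replicates.

    rep_a, rep_b: dicts mapping gene name -> expression value.
    threshold: assumed cutoff for calling a gene "expressed" (hypothetical).
    """
    shared = rep_a.keys() & rep_b.keys()
    return sorted(
        g for g in shared
        if (rep_a[g] > threshold) != (rep_b[g] > threshold)
    )


def toggle_proportion(rep_a, rep_b, threshold=0.0):
    """Fraction of shared genes that toggle between the two replicates."""
    shared = rep_a.keys() & rep_b.keys()
    if not shared:
        return 0.0
    return len(toggle_genes(rep_a, rep_b, threshold)) / len(shared)


# Made-up expression values for two replicates of the same cell type.
rep1 = {"geneA": 12.0, "geneB": 0.0, "geneC": 3.5, "geneD": 0.0}
rep2 = {"geneA": 10.5, "geneB": 4.2, "geneC": 0.0, "geneD": 0.0}

toggles = toggle_genes(rep1, rep2)
proportion = toggle_proportion(rep1, rep2)
```

Here geneB and geneC are ON in one replicate and OFF in the other, while geneD is OFF in both and therefore does not count as a toggle gene.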
Subjects
MicroRNAs , Neoplasms , Long Noncoding RNA , Cell Differentiation , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasms/genetics , Transcriptome
ABSTRACT
Gene expression profiling techniques, such as DNA microarray and RNA-Sequencing, have had a significant impact on our understanding of biological systems. They contribute to almost all aspects of biomedical research, including developmental biology, host-parasite relationships, disease progression and drug effects. However, high-throughput data generation makes it challenging for many wet-lab experimentalists to analyze and take full advantage of such rich and complex data. Here we present GeneCloudOmics, an easy-to-use web server for high-throughput gene expression analysis that extends the functionality of our previous ABioTrans with several new tools, including protein dataset analysis, and a web interface. GeneCloudOmics supports both microarray and RNA-Seq data analysis with a comprehensive range of data analytics tools in one package, a combination that no other current standalone software or web-based tool offers. In total, GeneCloudOmics gives the user access to 23 different data analytical and bioinformatics tasks, including read normalization, scatter plots, linear/non-linear correlations, PCA, clustering (hierarchical, k-means, t-SNE, SOM), differential expression analyses, pathway enrichments, evolutionary analyses, pathological analyses, and protein-protein interaction (PPI) identification. Furthermore, GeneCloudOmics allows the direct import of gene expression data from the NCBI Gene Expression Omnibus database. The user can perform all tasks rapidly through an intuitive graphical user interface that removes the hassle of coding, installing tools/packages/libraries and dealing with operating system compatibility and version issues, complications that make data analysis challenging for biologists. Thus, GeneCloudOmics is a one-stop open-source tool for gene expression data analysis and visualization. It is freely available at http://combio-sifbi.org/GeneCloudOmics.
ABSTRACT
Differentially expressed (DE) gene analysis is valuable for understanding comparative transcriptomics between cells, conditions or time points. However, the predominant way of identifying DE genes is to use an arbitrary fold-change or expression-change threshold as the cutoff. Here, we developed a more objective method, Scatter Overlay or ScatLay, to extract and graphically visualize DE genes across any two samples by utilizing their pair-wise scatter, or transcriptome-wide noise, while factoring in replicate variability. We tested ScatLay on 3 cell types: between time points for Escherichia coli aerobiosis and Saccharomyces cerevisiae hypoxia, and between untreated and Etomoxir-treated Mus musculus embryonic stem cells. As a result, we obtained 1194, 2061 and 2932 DE genes, respectively. Next, we compared these data with two widely used current approaches (DESeq2 and NOISeq) using the typical twofold expression change threshold, and show that ScatLay reveals a significantly larger number of DE genes. Hence, our method provides wider coverage of DE genes, and will likely pave the way for finding more novel regulatory genes in future work.
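For contrast, the threshold-based convention that the abstract describes as the predominant approach can be sketched in a few lines of Python. This is not ScatLay, DESeq2 or NOISeq: the function name, the pseudocount of 1 and the example values are assumptions introduced only to show what a fixed twofold cutoff does.

```python
import math


def fold_change_de(sample_x, sample_y, fold=2.0, pseudocount=1.0):
    """Flag genes whose expression changes by at least `fold` between samples.

    sample_x, sample_y: dicts mapping gene name -> expression value.
    pseudocount: assumed offset that avoids division by zero (hypothetical).
    Returns a dict of gene -> log2 fold change for the flagged genes.
    """
    log_cut = math.log2(fold)
    flagged = {}
    for gene in sample_x.keys() & sample_y.keys():
        lfc = math.log2(
            (sample_y[gene] + pseudocount) / (sample_x[gene] + pseudocount)
        )
        if abs(lfc) >= log_cut:
            flagged[gene] = lfc
    return flagged


# Made-up expression values for two conditions.
before = {"g1": 7.0, "g2": 3.0, "g3": 10.0}
after = {"g1": 31.0, "g2": 4.0, "g3": 1.0}

de = fold_change_de(before, after)
```

With these values g1 rises fourfold and g3 falls more than fivefold, so both pass the cutoff, while g2 changes by a factor of 1.25 and is not flagged. The cutoff itself is a free parameter, which is the arbitrariness the abstract points to.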
Subjects
Computational Biology/methods , Gene Expression Regulation , Transcriptome , Animals , Cell Hypoxia , Computer Graphics , Embryonic Stem Cells/metabolism , Enzyme Inhibitors/pharmacology , Epoxy Compounds/pharmacology , Escherichia coli/metabolism , Gene Expression Profiling , Male , Mice , Inbred C57BL Mice , Principal Component Analysis , Programming Languages , Saccharomyces cerevisiae/metabolism , Radiation Scattering , Systems Biology
ABSTRACT
For any dynamical system, such as a living organism, an attractor state is a set of variables or mechanisms that converge towards a stable system behavior despite a wide variety of initial conditions. Here, using multi-dimensional statistics, we investigate the global gene expression attractor mechanisms shaping the early anaerobic to aerobic state transition (AAT) of Escherichia coli in a bioreactor. Out of 3,389 RNA-Seq expression changes over time, we identified 100 sharply changing genes that are key for guiding 1,700 genes into the AAT attractor basin. Collectively, these genes were named attractor genes and comprise 6 dynamic clusters. Apart from the expected anaerobic (glycolysis), aerobic (TCA cycle) and fermentation (succinate pathways) processes, sulphur metabolism, ribosome assembly and amino acid transport mechanisms, together with 332 uncharacterised genes, are also key for AAT. Overall, our work highlights the importance of multi-dimensional statistical analyses for revealing novel processes shaping AAT.
Subjects
Aerobiosis/genetics , Escherichia coli/metabolism , Transcriptome , Aerobiosis/physiology , Anaerobiosis/genetics , Anaerobiosis/physiology , Escherichia coli/genetics , Escherichia coli/physiology , Gene Expression Profiling , Bacterial Gene Expression Regulation/genetics , Bacterial Gene Expression Regulation/physiology , Bacterial Genes/physiology , Transcriptome/genetics
ABSTRACT
Here we report a bio-statistical/informatics tool, ABioTrans, developed in R for gene expression analysis. The tool allows the user to directly read RNA-Seq data files deposited in the Gene Expression Omnibus (GEO) database. Operated from any web browser, ABioTrans provides easy options for fitting multiple statistical distributions, Pearson and Spearman rank correlations, PCA, k-means and hierarchical clustering, differential expression (DE) analysis, Shannon entropy and noise (square of the coefficient of variation) analyses, as well as Gene Ontology classifications.
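Two of the quantities listed above, noise (the square of the coefficient of variation) and Shannon entropy, can be sketched as follows. This is a Python illustration rather than ABioTrans code, which is written in R. The function names, the use of the population variance and the equal-width binning used to turn expression values into a probability distribution are all assumptions.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def noise_cv2(values):
    """Noise as the squared coefficient of variation: variance / mean^2.

    Uses the population variance (an assumption; a sample variance
    would work equally well for this illustration).
    """
    m = mean(values)
    if m == 0:
        raise ValueError("mean expression is zero; CV^2 is undefined")
    return pvariance(values) / (m * m)


def shannon_entropy(values, n_bins=4):
    """Shannon entropy (bits) of values after equal-width binning.

    n_bins is a hypothetical choice; the entropy depends on it.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: a single occupied bin
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


# Made-up expression values for one gene across eight samples.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

noise = noise_cv2(expression)            # variance 4, mean 5
entropy_uniform = shannon_entropy([0.0, 1.0, 2.0, 3.0])
```

For the example values the variance is 4 and the mean is 5, giving a noise of 4/25. Four values spread evenly over four bins give the maximum entropy for that binning, 2 bits.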
ABSTRACT
BACKGROUND: There is a paucity of data regarding risk factors associated with suboptimal breastfeeding practices in urbanized areas of low- and middle-income countries (LMICs). METHODS: Through a large prospective birth cohort, which enrolled 6706 infants in Vietnam between 2009 and 2013, we investigated the practice of exclusive breastfeeding during hospital stay in urban and semi-rural populations and aimed to identify factors associated with suboptimal breastfeeding practices. Univariate and multivariable logistic regression were performed to determine factors associated with non-exclusive breastfeeding during hospital stay. RESULTS: Of 6706 mothers, 33% (2187) breastfed their infant exclusively before hospital discharge; 9% (364/4248) in urban and 74% (1823/2458) in semi-rural areas. Exclusive breastfeeding up to 4 months was recorded in 15% (959/6210) of participants; this declined to < 1% (56/6093) at 6 months. Delivery by Caesarean section (Odds Ratio [OR] 0.07; 95% Confidence Interval [CI] 0.04, 0.11 and OR 0.05; 95% CI 0.03, 0.08) and neonatal complications (OR 0.2; 95% CI 0.07, 0.47 and OR 0.25; 95% CI 0.14, 0.46) were common and highly significant risk factors associated with a lack of exclusive breastfeeding during hospital stay in urban and semi-rural settings, respectively. CONCLUSIONS: To our knowledge, this is the first large-scale investigation aimed at identifying factors associated with exclusive breastfeeding during hospital stay in Vietnam. Breastfeeding promotion strategies should prioritize common in-hospital risk factors, such as Caesarean section and neonatal complications, and other location-specific factors associated with socioeconomics.
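The proportions reported in RESULTS follow from the stated counts by simple arithmetic, which a short Python check makes explicit. Only the counts come from the abstract; the variable and function names are illustrative.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


# (exclusively breastfed before discharge, mothers) as reported per setting.
urban = (364, 4248)
semi_rural = (1823, 2458)

total_exclusive = urban[0] + semi_rural[0]
total_mothers = urban[1] + semi_rural[1]

urban_pct = round(percent(*urban))
semi_rural_pct = round(percent(*semi_rural))
overall_pct = round(percent(total_exclusive, total_mothers))
```

The two settings together account for 2187 exclusively breastfed infants among 6706 mothers, matching the enrolled cohort size, and the rounded percentages are 9% (urban), 74% (semi-rural) and 33% (overall).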