ABSTRACT
MOTIVATION: The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology. RESULTS: In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems that arise when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method computes correlations between regression residuals and (c) the 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods, yielding the most predictive partial correlation estimates and the highest sensitivity in inferring conditional dependencies between genes, even when only a few observations were available. Applying this method to infer gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana highlighted a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data for a signature of the HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) directly interacting with HRAS and sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling. AVAILABILITY: Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip.
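The PINV idea mentioned in this abstract can be illustrated with a short Python sketch: when n < p the sample covariance matrix is singular, so its Moore-Penrose pseudoinverse is used as a stand-in for the inverse covariance, and partial correlations follow from the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). This is only a minimal sketch under stated assumptions, not the authors' Matlab implementation: the function name `partial_correlations_pinv` and the random example data are hypothetical, and any further processing the authors may apply is not reproduced.

```python
import numpy as np


def partial_correlations_pinv(X):
    """Estimate partial correlations from an (n samples x p genes) matrix.

    Hypothetical helper for illustration. Assumes no gene is constant
    across samples (a constant gene would give a zero diagonal entry).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)            # center each gene
    S = Xc.T @ Xc / (n - 1)            # sample covariance, singular when n < p
    omega = np.linalg.pinv(S)          # Moore-Penrose pseudoinverse
    omega = (omega + omega.T) / 2.0    # remove tiny numerical asymmetry
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)     # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Example with fewer observations (n = 5) than genes (p = 8); data are synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
P = partial_correlations_pinv(X)
print(P.shape)
```

An edge between genes i and j would then typically be declared when the estimated partial correlation differs significantly from zero; the choice of significance test is left open here.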
Subjects
Gene Regulatory Networks , Genetic Models , Genetic Selection , Arabidopsis/genetics , Plant Genes
ABSTRACT
BACKGROUND: The identification of protein-coding elements in sets of mammalian conserved elements is one of the major challenges in current molecular biology research. Many features have been proposed for automatically distinguishing coding from non-coding conserved sequences, which makes a systematic statistical assessment of their differences necessary. A comprehensive study should comprise an association study, i.e. a comparison of the distributions of the features in the two classes, and a prediction study in which the prediction accuracies of classifiers trained on single features and on groups of features are analyzed, conditional on the compared species and on the sequence lengths. RESULTS: In this paper we compared the distributions of a set of comparative and non-comparative features and evaluated the prediction accuracy of classifiers trained to discriminate sequence elements conserved among human, mouse and rat. The association study showed that the analyzed features are statistically different in the two classes. To study the influence of sequence length on feature performance, a predictive study was performed on different data sets composed of equal numbers of coding and non-coding alignments of equal length, with increasing average length. We found that the most discriminant feature was a comparative measure indicating the proportion of synonymous nucleotide substitutions per synonymous site. Moreover, linear discriminant classifiers trained on comparative features in general outperformed classifiers based on intrinsic ones. Finally, the prediction accuracy of classifiers trained on comparative features increased significantly when intrinsic features were added to the set of input variables, independently of sequence length (Kolmogorov-Smirnov P-value
Subjects
Conserved Sequence , Open Reading Frames , Proteins/chemistry , Animals , Base Sequence , Genomics , Humans , Mice , Rats , Sequence Analysis , Species Specificity
ABSTRACT
BACKGROUND: Ulcerative colitis (UC) and Crohn's disease (CD) share some pathogenetic features. To gain new insight into the role of altered gene expression, and the involvement of gene networks, in the pathogenesis of these diseases, we performed a genome-wide analysis in 15 patients with CD and 14 patients with UC by comparing the RNA from inflamed and noninflamed colonic mucosa. METHODS: Two hundred ninety-eight differentially expressed genes in CD and 520 genes in UC were identified. Bioinformatic analyses also highlighted 34 pathways for CD, 6 of them enriched in noninflamed and 28 in inflamed tissues, and 19 pathways for UC, 17 in noninflamed and 2 in inflamed tissues. RESULTS: In CD, the pathways included genes associated with cytokine and cytokine receptor interactions, response to external stimuli, activation of cell proliferation or differentiation, cell migration, apoptosis, and immune regulation. In UC, the pathways were associated with genes related to metabolic and catabolic processes, biosynthesis and interconversion processes, leukocyte migration, regulation of cell proliferation, and epithelial-to-mesenchymal transition. CONCLUSIONS: In UC, the pattern of inflammation of the colonic mucosa is due to a complex interaction network between host, gut microbiome, and diet, suggesting that bacterial products or endogenous synthetic/catabolic molecules contribute to impairment of the immune response, to breakdown of the epithelial barrier, and to enhancement of the inflammatory process. In patients with CD, genes encoding a large variety of proteins, growth factors, cytokines, chemokines, and adhesion molecules may lead to uncontrolled inflammation with ensuing destruction of epithelial cells, inappropriate stimulation of antimicrobial and T-cell differentiation, and inflammasome events.
Subjects
Colon/metabolism , Inflammatory Bowel Diseases/genetics , Intestinal Mucosa/metabolism , Adult , Cell Differentiation/genetics , Cell Proliferation/genetics , Cytokines/metabolism , Epithelial-Mesenchymal Transition/genetics , Female , Gene Expression , Genome-Wide Association Study , Humans , Male , Middle Aged , RNA/analysis , Signal Transduction
ABSTRACT
Differential gene expression profiling studies have led to the identification of several disease biomarkers. However, oncogenic alterations in coding regions can modify gene functions without affecting the expression profiles of the altered genes themselves. Moreover, post-translational modifications can modify the activity of the encoded protein without altering the expression level of the coding gene, while still changing the expression levels of the genes it regulates. These considerations motivate the study of the rewiring of networks of co-expressed genes as a consequence of the aforementioned alterations, in order to complement the information provided by differential expression. We analyzed 339 mRNAomes of five distinct cancer types to find single genes whose co-expression patterns differed strongly between normal and tumor phenotypes. Our analysis of differentially connected genes indicates the loss of connectivity as a common topological trait of cancer networks, and unveils novel candidate cancer genes. Moreover, our integrated approach, which combines differential expression with differential connectivity, improves classic pathway enrichment analysis, providing novel insights into putative cancer gene biosystems that have not yet been fully investigated.
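The notion of a differentially connected gene can be sketched in Python as follows. The abstract does not specify how connectivity was measured, so this example makes loud assumptions: connectivity is taken to be the number of other genes whose absolute Pearson correlation with a given gene reaches a threshold (0.8 here), and the function names and toy expression values are hypothetical. A negative result for a gene means it has fewer co-expression partners in tumor samples than in normal samples, i.e. a loss of connectivity.

```python
import numpy as np


def connectivity(expr, threshold=0.8):
    """Count co-expression partners per gene.

    expr: (samples x genes) array. Threshold and measure are assumptions
    made for this illustration, not values taken from the study.
    """
    corr = np.corrcoef(expr, rowvar=False)   # gene-by-gene Pearson correlation
    adjacent = np.abs(corr) >= threshold     # thresholded co-expression network
    np.fill_diagonal(adjacent, False)        # a gene is not its own partner
    return adjacent.sum(axis=0)


def differential_connectivity(normal, tumor, threshold=0.8):
    """Tumor connectivity minus normal connectivity, per gene."""
    return connectivity(tumor, threshold) - connectivity(normal, threshold)


# Toy data: three genes (columns), five samples (rows) per phenotype.
normal = np.array([[1, 2, 5], [2, 4, 4], [3, 6, 3], [4, 8, 2], [5, 10, 1]], dtype=float)
tumor = np.array([[3, 1, 2], [1, 2, 4], [4, 3, 6], [1, 4, 8], [5, 5, 10]], dtype=float)
print(differential_connectivity(normal, tumor))
```

In the toy data the first gene is tightly co-expressed with the other two in the normal samples but decoupled in the tumor samples, so it shows the largest loss of connectivity.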
Subjects
Neoplastic Gene Expression Regulation , Gene Regulatory Networks , Neoplasms/genetics , Tumor Biomarkers/metabolism , Gene Expression Profiling , Neoplasm Genes , Humans , Signal Transduction/genetics
ABSTRACT
At present, 51 genes are known to be responsible for Non-Syndromic hereditary Hearing Loss (NSHL), but the knowledge of 121 NSHL-linked chromosomal regions leads to the hypothesis that a number of disease genes have still to be uncovered. To help scientists find new NSHL genes, we built a gene-scoring system, integrating the Gene Ontology, NCBI Gene and Map Viewer databases, which prioritizes the candidate genes according to their probability of causing NSHL. We defined a set of candidates and measured their functional similarity with respect to the disease gene set, computing a score (SSMavg) that relies on the assumption that functionally related genes might contribute to the same (disease) phenotype. A Kolmogorov-Smirnov test, comparing the pair-wise distribution on the disease gene set with the distribution on the remaining human genes, provided a statistical assessment of this assumption. We found, with a p-value < 2.2 × 10^-16, that the former pair-wise distribution is greater than the latter, justifying a prioritization strategy based on the functional similarity of candidate genes with respect to the disease gene set. A cross-validation test measured to what extent the SSMavg ranking for NSHL differs from a random ordering: when 15% of the disease genes were added to the candidate gene set, the ranking of the disease genes in the first eight positions was statistically different from a hypergeometric distribution, with a p-value = 2.04 × 10^-5 and a power > 0.99. The twenty top-scored genes were finally examined to evaluate their possible involvement in NSHL. We found that half of them are known to be expressed in human inner ear or cochlea and are mainly involved in remodeling and organization of actin formation and maintenance of the cilia and the endocochlear potential. These findings strongly indicate that our metric was able to suggest excellent NSHL candidates to be screened in patients and controls for causative mutations.
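The comparison against a random ordering described in this abstract can be sketched with a hypergeometric tail probability in Python. Under a random ranking of N candidates containing K known disease genes, the number of disease genes among the top m positions follows a hypergeometric distribution. The function name `hypergeom_tail` and the numbers in the example (N = 20, K = 3, m = 8) are hypothetical and chosen only for illustration; they are not the values used in the study, and the authors' exact test procedure is not reproduced here.

```python
from math import comb


def hypergeom_tail(N, K, m, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, m).

    N: total number of ranked candidate genes
    K: number of known disease genes hidden among the candidates
    m: number of top-ranked positions inspected
    k: number of disease genes observed among those top positions
    """
    upper = min(K, m)
    favourable = sum(comb(K, i) * comb(N - K, m - i) for i in range(k, upper + 1))
    return favourable / comb(N, m)


# Hypothetical example: all 3 hidden disease genes land in the top 8 of 20.
p_value = hypergeom_tail(N=20, K=3, m=8, k=3)
print(f"{p_value:.4f}")
```

A small tail probability indicates that placing that many disease genes near the top of the ranking would be unlikely under a random ordering.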