ABSTRACT
Gene and protein expression is controlled so that cells can react to changing intra- and extracellular signals by modulating biochemical networks and pathways. We have previously shown that gene expression and the properties of expressed proteins are dynamically correlated. Here we investigated correlations between gene-related parameters and gene expression patterns, and found statistically significant correlations in microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, cell cycle in HeLa cells, infection in intestinal epithelial cells, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. We applied our method to time course datasets separately at each time point. From sequence information we derived numerous parameters for nucleotide composition, two-base composition, codon usage, skew parameters, and codon bias. In addition to coding regions, we also investigated correlations for complete genes and introns. Significant dynamic correlations were identified in each of the analyses. Our method also proved useful for detecting dynamic shifts in gene expression profiles, such as in the D. melanogaster dataset. Detecting changes in the properties of expressed genes and proteins might be useful for predicting or following biological processes, responses, growth, differentiation and possibly related disorders.
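The time point-wise analysis described above can be illustrated with a minimal sketch. It assumes two sequence-derived parameters (GC content and GC skew) and a plain Pearson correlation; the abstract does not state which statistic was used, and the function names, gene names and values below are hypothetical.

```python
from math import sqrt

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_skew(seq):
    """(G - C) / (G + C); returns 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient (assumed statistic, not from the source)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_per_time_point(sequences, expression, parameter):
    """Correlate one sequence parameter with expression, one time point at a time.

    sequences:  dict gene -> nucleotide sequence
    expression: dict gene -> list of expression values, one per time point
    parameter:  function mapping a sequence to a number
    """
    genes = sorted(sequences)
    values = [parameter(sequences[g]) for g in genes]
    n_points = len(expression[genes[0]])
    return [
        pearson(values, [expression[g][t] for g in genes])
        for t in range(n_points)
    ]

# Hypothetical toy data: three genes measured at two time points.
sequences = {"geneA": "ATGGCGCGCC", "geneB": "ATGATATTAA", "geneC": "ATGGCATTAC"}
expression = {"geneA": [5.0, 1.0], "geneB": [1.0, 4.0], "geneC": [3.0, 2.5]}
print(correlation_per_time_point(sequences, expression, gc_content))
```

A change in the sign or size of the coefficient from one time point to the next is the kind of dynamic shift the abstract refers to. A real analysis would also need a significance test and correction for the number of parameters examined.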
Subjects
Gene Expression; Genome; Animals; Cell Cycle; Drosophila melanogaster/genetics; Saccharomyces cerevisiae/cytology; Saccharomyces cerevisiae/genetics
ABSTRACT
When complex data are distributed in a biased manner between disease classes, classification accuracy can be increased with a set of perceptron neural networks that we developed. A novel projection method is also introduced for the visual classification of the data to elucidate its features and disease class distribution. The set of perceptron neural networks and the projection method were tested with otoneurological data; they improved average sensitivity and positive predictive value by at least 10%, to 85% and 83%, respectively, compared to our earlier neural network classifications with the same data. The methods were also tested with two additional data sets, which included diagnostically very difficult cases.
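For reference, the two reported measures can be computed per disease class from a confusion matrix. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), positive predictive value = TP / (TP + FP)); the matrix values and the function name are hypothetical and are not taken from the study.

```python
def sensitivity_and_ppv(confusion, cls):
    """Return (sensitivity, positive predictive value) for one class.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                  # true class cls, predicted otherwise
    fp = sum(row[cls] for row in confusion) - tp   # predicted cls, true class otherwise
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv

# Hypothetical two-class example with unevenly sized classes.
confusion = [
    [80, 20],  # true class 0
    [5, 15],   # true class 1
]
for cls in range(len(confusion)):
    sens, ppv = sensitivity_and_ppv(confusion, cls)
    print(f"class {cls}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Averaging these values over all classes gives figures comparable in kind to the average sensitivity and positive predictive value quoted above. With biased class distributions, small classes can have low values even when overall accuracy looks high.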
Subjects
Ear Diseases; Nervous System Diseases; Neural Networks, Computer; Algorithms; Humans
ABSTRACT
It is often useful to investigate not only the classification efficacy of neural networks, but also the reasons for misclassification and the relations between input variables and output classes. We have developed novel techniques to address these questions: a network structure and learning strategy for biased output class distributions, a method to measure the classification information incorporated in variables and variable groups, and methods to express properties learned by a network from its structure. We tested these techniques with otoneurological data associated with vertiginous diseases, which we have explored in our previous neural network studies.
Subjects
Bias; Electronic Data Processing/methods; Neural Networks, Computer; Neurology; Otolaryngology; Finland; Humans
ABSTRACT
BACKGROUND: Cells react to changing intra- and extracellular signals by dynamically modulating complex biochemical networks. Cellular responses to extracellular signals lead to changes in gene and protein expression. Since the majority of genes encode proteins, we investigated possible correlations between protein parameters and gene expression patterns to identify proteome-wide characteristics indicative of trends common to expressed proteins. RESULTS: Numerous bioinformatics methods were used to filter and merge information regarding gene and protein annotations. A new statistical time point-oriented analysis was developed for the study of dynamic correlations in large time series data. The method was applied to investigate microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. CONCLUSION: We show that the properties of proteins synthesized correlate dynamically with the gene expression profile, indicating that not only is the actual identity and function of expressed proteins important for cellular responses but that several physicochemical and other protein properties correlate with gene expression as well. Gene expression correlates strongly with amino acid composition, composition- and sequence-derived variables, functional, structural, localization and gene ontology parameters. Thus, our results suggest that a dynamic relationship exists between proteome properties and gene expression in many biological systems, and therefore this relationship is fundamental to understanding cellular mechanisms in health and disease.
Subjects
Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Proteome/classification; Animals; B-Lymphocytes/physiology; Cell Cycle/genetics; Computational Biology/methods; Data Display; Drosophila melanogaster/genetics; Electronic Data Processing; Gene Frequency; Humans; Information Storage and Retrieval/methods; Lymphocyte Activation/genetics; Markov Chains; Models, Biological; Saccharomyces cerevisiae/genetics; Sequence Analysis, Protein/methods; Signal Transduction/genetics; Software; T-Lymphocytes/physiology
ABSTRACT
Although protein-coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related functions and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is, and in what kinds of functions the end products of these genes are involved, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed in which a fixed number of genes is analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed; the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but in all the studied species irrespective of their complexity. There are some differences between species, but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave the way for further studies of the mechanisms that shape genome structure and of the evolutionary forces related to them.
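The fixed-gene-count window described above can be sketched as follows. This is an illustration under stated assumptions, not the published method: it scores each window of consecutive genes with a one-sided hypergeometric test, which the abstract does not name, and the function names and the GO term label are hypothetical.

```python
from math import comb

def hypergeom_tail(k, window, annotated, total):
    """P(X >= k) when `window` genes are drawn without replacement from
    `total` genes, of which `annotated` carry the term of interest."""
    upper = min(window, annotated)
    favourable = sum(
        comb(annotated, i) * comb(total - annotated, window - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, window)

def go_term_windows(gene_terms, term, window):
    """Slide a window of `window` consecutive genes along the genome.

    gene_terms: list of sets of GO terms, one set per gene, in chromosomal order.
    Returns a list of (start_index, genes_with_term, p_value) tuples.
    """
    total = len(gene_terms)
    annotated = sum(term in terms for terms in gene_terms)
    results = []
    for start in range(total - window + 1):
        k = sum(term in terms for terms in gene_terms[start:start + window])
        results.append((start, k, hypergeom_tail(k, window, annotated, total)))
    return results

# Hypothetical genome of 12 genes; "GO:EXAMPLE" is a placeholder term.
genome = [{"GO:EXAMPLE"}] * 4 + [set()] * 8
for start, k, p in go_term_windows(genome, "GO:EXAMPLE", window=5):
    print(start, k, round(p, 4))
```

Counting genes rather than base pairs keeps the window comparable across regions of different gene length and density, which is the motivation given in the abstract. Scanning many windows and terms would require a multiple-testing correction that this sketch omits.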
Subjects
Computational Biology/methods; Gene Ontology/statistics & numerical data; Genome; Multigene Family; Animals; Arabidopsis/genetics; Caenorhabditis elegans/genetics; Chromosome Mapping; Cluster Analysis; Computational Biology/statistics & numerical data; Drosophila melanogaster/genetics; Escherichia coli K12/genetics; Humans; Mice; Saccharomyces cerevisiae/genetics
ABSTRACT
Many genes and proteins are required to carry out the processes of innate and adaptive immunity. For many studies, including systems biology, it is necessary to have a clear and comprehensive definition of the immune system, including the genes and proteins that take part in immunological processes. We have identified and cataloged a large portion of the human immunology-related genes, which we call the essential immunome. The 847 identified genes and proteins were annotated, and their chromosomal localizations were compared to the mouse genome. Relation to disease was also taken into account. We identified numerous pseudogenes, many of which are expressed, and found two putative new genes. We also carried out an evolutionary analysis of immune processes based on gene orthologs to gain an overview of the evolutionary past and molecular present of the human immune system. A list of genes and proteins was compiled. A comprehensive characterization of the member genes and proteins, including the corresponding pseudogenes, is presented. Immunome genes were found to have three types of emergence in independent studies of their ontologies, domains, and functions.
Subjects
Genome, Human; Immunity/genetics; Proteins/genetics; Animals; Chromosome Mapping; Chromosomes, Human/genetics; Computational Biology; Evolution, Molecular; Humans; Immune System/immunology; Mice; Protein Structure, Tertiary; Proteins/classification; Pseudogenes
ABSTRACT
BACKGROUND: The immune system is a complex machinery based on the highly coordinated expression of a wide array of genes and proteins. The evolutionary history of the human immune system is not well characterised. Although several studies related to the development and evolution of immunological processes have been published, a full-scale genome-based analysis is still missing. A database focused on the evolutionary relationships of immune-related genes would contribute to and facilitate research on immunology and evolutionary biology. RESULTS: An Internet resource called ImmTree (http://bioinf.uta.fi/ImmTree) was constructed for studying the evolution and evolutionary trees of the human immune system. ImmTree contains information about orthologs in 80 species collected from the HomoloGene, OrthoMCL and EGO databases. In addition to phylogenetic trees, the service provides data for the comparison of human-mouse ortholog pairs, including synonymous and non-synonymous mutation rates, Z values, and Ka/Ks quotients. A versatile search engine allows complex queries of the database. Currently, data are available for 847 human immune system-related genes and proteins. CONCLUSION: ImmTree provides a unique data set of genes and proteins from the human immune system, their phylogenetics, and information for comparisons of human-mouse ortholog pairs, synonymous and non-synonymous mutation rates, as well as other statistical information.