ABSTRACT
Identifying human genes that predispose to severe forms of infectious diseases is important for understanding the mechanisms of pathogenesis and for detecting risk groups, which in turn allows targeted vaccination and preventive therapy. The most common approaches to genetic risk estimation are association studies, in which groups of patients and control individuals are compared using either preselected candidate genes or genome-wide analysis. To search for genetic variants that predispose to severe forms of infectious diseases, it is expedient to form a control group of patients with clinically proven infection who have asymptomatic or mild forms of the disease. This review considers examples of the use of these approaches to identify genetic factors that predispose to severe forms of infections caused by viruses of the Flaviviridae family. A number of genetic markers associated with predisposition to tick-borne encephalitis, West Nile fever, and dengue fever have already been detected. These associations must be confirmed in independent samples. Genetic variants associated with spontaneous recovery from hepatitis C virus infection, with patient response to antiviral drugs, and with the development of liver fibrosis have also been detected. Gene variants with more pronounced phenotypic effects will probably be found in further studies; they could be used in clinical practice as prognostic markers of the course and outcome of Flaviviridae infections, as well as of the response to treatment.
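The case-control comparison described in this abstract can be illustrated with a minimal sketch, assuming a single biallelic marker. The function name `allele_odds_ratio`, the allele counts, and the choice of a Woolf-type 95% confidence interval are illustrative assumptions and are not taken from the review. "Cases" stand for patients with severe disease, "controls" for infected patients with mild or asymptomatic disease.

```python
import math


def allele_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, lower 95% CI, upper 95% CI) for a 2x2 allele-count table.

    All four counts must be positive; this sketch applies no continuity correction.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) by the Woolf (logit) method.
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - 1.96 * se_log_or)
    upper = math.exp(log_or + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: severe cases vs. mild/asymptomatic controls.
or_value, ci_low, ci_high = allele_odds_ratio(case_alt=40, case_ref=60,
                                              ctrl_alt=20, ctrl_ref=80)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 would suggest an association in this toy table; a real study would also need correction for multiple testing and, as the abstract notes, replication in independent samples.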
Subject(s)
Flaviviridae Infections/genetics , Flaviviridae Infections/metabolism , Flaviviridae , Genetic Predisposition to Disease , Flaviviridae Infections/virology , Genome-Wide Association Study , Humans
ABSTRACT
The review describes integrated experimental and computer approaches to the investigation of the mechanisms of transcriptional regulation and of the organization of eukaryotic genes and their transcription regulatory regions. These include (a) an analysis of the factors affecting the affinity of TBP (TATA-binding protein) for the TATA box; (b) research on the patterns of chromatin mark distributions and their role in the regulation of gene expression; (c) a study of 3D chromatin organization; (d) an estimation of the effects of polymorphisms on gene expression via high-resolution ChIP-seq and DNase-seq techniques. It was demonstrated that integrated experimental and computer approaches are very important for the current understanding of transcription regulatory mechanisms and the structural and functional organization of the regulatory regions controlling transcription.
Subject(s)
Chromatin Assembly and Disassembly/physiology , Computer Simulation , Genomics/methods , Response Elements/physiology , Sequence Analysis, DNA/methods , Transcription, Genetic/physiology
ABSTRACT
Telomeres are the terminal regions of chromosomes that ensure their stability during cell division. Telomere shortening initiates cellular senescence, which can lead to degeneration and atrophy of tissues, so the process is associated with a reduction in life expectancy and predisposition to a number of diseases. An accelerated rate of telomere attrition can serve as a predictor of the life expectancy and health status of an individual. Telomere length is a complex phenotypic trait that is determined by many factors, including genetic ones. Numerous studies (including genome-wide association studies, GWAS) indicate the polygenic nature of telomere length control. The objective of the present study was to characterize the genetic basis of telomere length regulation using the GWAS data obtained in studies of various human and other animal populations. To do so, a compilation of the genes associated with telomere length in GWAS experiments was collected, which included information on 270 human genes, as well as 23, 22, and 9 genes identified in cattle, sparrow, and nematode, respectively. Among them were two orthologous genes encoding a shelterin protein (POT1 in humans and pot-2 in C. elegans). Functional analysis has shown that telomere length can be influenced by genetic variants in the genes encoding: (1) structural components of telomerase; (2) the protein components of telomeric regions (shelterin and CST complexes); (3) the proteins involved in telomerase biogenesis and regulating its activity; (4) the proteins that regulate the functional activity of the shelterin components; (5) the proteins involved in telomere replication and/or capping; (6) the proteins involved in alternative telomere lengthening; (7) the proteins that respond to DNA damage and are responsible for DNA repair; (8) RNA-exosome components.
The human genes identified by several research groups in populations of different ethnic origins are the genes encoding telomerase components, such as TERC and TERT, as well as STN1, which encodes a component of the CST complex. Apparently, the polymorphic loci affecting the functions of these genes may be the most reliable susceptibility markers for telomere-related diseases. The systematized data about the genes and their functions can serve as a basis for the development of prognostic criteria for telomere length-associated diseases in humans. Information about the genes and processes that control telomere length can be used for marker-assisted and genomic selection in farm animals, aimed at increasing the duration of their productive lifetime.
ABSTRACT
Genes encoding cell surface receptors make up a significant portion of the human genome (more than a thousand genes) and play an important role in gene networks. Cell surface receptors are transmembrane proteins that interact with molecules (ligands) located outside the cell. This interaction activates signal transduction pathways in the cell. A large number of exogenous ligands of various origins, including drugs, are known for cell surface receptors, which accounts for the interest in them from biomedical researchers. Appetite (the desire of the animal organism to consume food) is one of the most primitive instincts that contribute to survival. However, when the supply of nutrients is stable, the mechanism of adaptation to adverse factors acquired in the course of evolution turns out to be excessive, and therefore obesity has become one of the most serious public health problems of the twenty-first century. Pathological human conditions characterized by appetite disorders include both hyperphagia, which inevitably leads to obesity, and anorexia nervosa induced by psychosocial stimuli, as well as decreased appetite caused by neurodegeneration, inflammation or cancer. Understanding the evolutionary mechanisms of human diseases, especially those related to lifestyle changes that have occurred over the past 100-200 years, is of fundamental and applied importance. It is also very important to identify relationships between the evolutionary characteristics of genes in gene networks and the resistance of these networks to changes caused by mutations. The aim of the current study is to identify the distinctive features of human genes encoding cell surface receptors involved in appetite regulation using the phylostratigraphic age index (PAI) and divergence index (DI). The values of PAI and DI were analyzed for 64 human genes encoding cell surface receptors, the orthologs of which were involved in the regulation of appetite in model animal species.
It turned out that the set of genes under consideration contains an increased number of genes with the same phylostratigraphic age (PAI = 5, the stage of vertebrate divergence), and almost all of these genes (28 out of 31) belong to the superfamily of G-protein coupled receptors. Apparently, the synchronized evolution of such a large group of genes (31 genes out of 64) is associated with the development of the brain as a separate organ in the first vertebrates. When the distribution of genes from the same set by DI values was studied, a significant enrichment with genes having low DIs was revealed: eight genes (GPR26, NPY1R, GHSR, ADIPOR1, DRD1, NPY2R, GPR171, NPBWR1) had extremely low DIs (less than 0.05). Such low DI values indicate that these genes are most likely subject to stabilizing selection. It was also found that the group of genes with low DIs was enriched with genes that had brain-specific patterns of expression. In particular, GPR26, which had the lowest DI, is in the group of brain-specific genes. Because the endogenous ligand for the GPR26 receptor has not yet been identified, this gene seems to be an extremely interesting object for further theoretical and experimental research. We believe that the features of the genes encoding cell surface receptors that we have identified using the evolutionary metrics PAI and DI can be a starting point for further evolutionary analysis of the gene network regulating appetite.
ABSTRACT
The task of automatic extraction of the hierarchical structure of eukaryotic gene regulatory regions lies at the junction of biology, mathematics and information technology. Solving the problem involves understanding the sophisticated mechanisms of eukaryotic gene regulation and applying advanced data mining technologies. The paper discusses an integrated system that implements a powerful method for relation mining of biological data. The system makes it possible to take into account prior information about the gene regulatory regions known to the biologist, to perform the analysis at each hierarchical level, and to search for a solution from a simple hypothesis to a complex one. The integration of the ExpertDiscovery system into the UGENE toolkit provides a convenient environment for conducting complex research and automating the work of a biologist. For demonstration, the system has been applied to the recognition of SF1, SREBP and HNF4 vertebrate binding sites and to the analysis of the human gene regulatory regions that promote liver-specific transcription.
Subject(s)
Computational Biology/methods , Regulatory Sequences, Nucleic Acid , Software , Algorithms , Base Sequence , Binding Sites , Data Mining , Molecular Sequence Data , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Whole genome and whole exome sequencing technologies play a very important role in studies of the genetic aspects of the pathogenesis of various diseases. The wide use of genome-wide and exome-wide association study methodology (GWAS and EWAS) has made it possible to identify a large number of genetic variants associated with diseases. This information is accumulated in databases such as GWAS Central, GWAS Catalog, OMIM, ClinVar, etc. Most of the variants identified by the GWAS technique are located in the noncoding regions of the human genome. According to the ENCODE project, the fraction of regions in the human genome potentially involved in transcriptional control is many times greater than the fraction of coding regions. Thus, genetic variation in noncoding regions of the genome can increase the susceptibility to diseases by disrupting various regulatory elements (promoters, enhancers, silencers, insulator regions, etc.). However, identifying the mechanisms by which pathogenic genetic variants influence disease risk is difficult because of the wide variety of regulatory elements. The present review focuses on the molecular genetic mechanisms by which pathogenic genetic variants affect gene expression. Attention is concentrated on the transcriptional level of regulation as the initial step in the expression of any gene. A triggering event mediating the effect of a pathogenic genetic variant on the level of gene expression can be, for example, a change in the functional activity of transcription factor binding sites (TFBSs) or a change in DNA methylation, which, in turn, affects the functional activity of promoters or enhancers. Dissecting the regulatory roles of polymorphic loci would be impossible without close integration of modern experimental approaches with computer analysis of the growing wealth of genetic and biological data obtained using omics technologies.
The review provides a brief description of a number of the most well-known public genomic information resources containing data obtained using omics technologies, including (1) resources that accumulate data on the chromatin states and the regions of transcription factor binding derived from ChIP-seq experiments; (2) resources containing data on genomic loci, for which allele-specific transcription factor binding was revealed based on ChIP-seq technology; (3) resources containing in silico predicted data on the potential impact of genetic variants on the transcription factor binding sites.
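One common way to predict in silico how a variant might affect a transcription factor binding site is to score the reference and alternative alleles against a position weight matrix. The following is a minimal sketch under stated assumptions: the count matrix `COUNTS`, the pseudocount, the uniform background, and all function names are hypothetical and do not come from any of the resources described in the review. Only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a 4-bp motif (10 aligned sites per column).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 8},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position counts to log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({nt: math.log2((n + pseudocount) / total / background)
                    for nt, n in column.items()})
    return pwm


def best_window_score(seq, pwm):
    """Return the highest PWM score over all forward-strand windows of seq."""
    width = len(pwm)
    return max(sum(pwm[i][seq[start + i]] for i in range(width))
               for start in range(len(seq) - width + 1))


def allele_effect(ref_seq, alt_seq, pwm):
    """Score difference (alt minus ref); negative values suggest a weakened site."""
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)


PWM = log_odds_matrix(COUNTS)
# Hypothetical variant: G>A at the third motif position.
delta = allele_effect("CAGGTC", "CAGATC", PWM)
print(round(delta, 2))
```

A score difference alone is only a prediction; the allele-specific ChIP-seq data mentioned in item (2) are the kind of experimental evidence that could support or contradict it.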
ABSTRACT
The local DNA conformation in the region of transcription factor binding sites, determined by context, is one of the factors underlying the specificity of DNA-protein interactions. Analysis of the local conformation of a set of functional DNA sequences may allow for determination of the conservative conformational and physicochemical parameters reflecting molecular mechanisms of interaction. The web resource SITECON is designed to detect conservative conformational and physicochemical properties in transcription factor binding sites, contains a knowledge base of conservative properties for >100 high-quality sample sites and allows for recognition of potential transcription factor binding sites based on conservative properties from both the knowledge base and the results of analysis of a sample proposed by a user. The resource SITECON is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/sitecon/.
Subject(s)
DNA/chemistry , Regulatory Sequences, Nucleic Acid , Software , Transcription Factors/metabolism , Binding Sites , DNA/metabolism , Internet , Nucleic Acid Conformation , Sequence Alignment , Sequence Analysis, DNA , User-Computer Interface
ABSTRACT
The GeneNet database is designed for accumulation of information on gene networks. The original technology applied in GeneNet enables description of not only gene network structure and functional relationships between components, but also metabolic and signal transduction pathways. Specialised software, GeneNet Viewer, automatically displays a graphical diagram of the gene networks described in the database. The current release 3.0 of the GeneNet database contains descriptions of 25 gene networks, 945 proteins, 567 genes, 151 other substances and 1364 relationships between components of gene networks. The information, distributed between 14 interlinked tables, was obtained by annotating 968 scientific publications. The SRS version of the GeneNet database is freely available (http://wwwmgs.bionet.nsc.ru/mgs/systems/genenet/).
Subject(s)
Databases, Genetic , Metabolism/genetics , Signal Transduction/genetics , Animals , Computer Graphics , Forecasting , Genes , Humans , Information Storage and Retrieval , Internet , Proteins/genetics , Proteins/physiology , RNA/genetics , User-Computer Interface
ABSTRACT
The Transcription Regulatory Regions Database (TRRD) is an informational resource containing an integrated description of gene transcription regulation. An entry of the database corresponds to a gene and contains data on the localization and functions of the transcription regulatory regions as well as gene expression patterns. TRRD contains only experimental data, which are entered into the database by annotating scientific publications. TRRD release 6.0 comprises information on 1167 genes, 5537 transcription factor binding sites, 1714 regulatory regions, 14 locus control regions and 5335 expression patterns obtained by annotating 3898 scientific papers. This information is arranged in seven databases: TRRDGENES (general gene description), TRRDLCR (locus control regions), TRRDUNITS (regulatory regions: promoters, enhancers, silencers, etc.), TRRDSITES (transcription factor binding sites), TRRDFACTORS (transcription factors), TRRDEXP (expression patterns) and TRRDBIB (experimental publications). The Sequence Retrieval System (SRS) is used as the basic tool for navigating and searching TRRD and integrating it with external informational and software resources. The visualization tool, TRRD Viewer, represents the information in the form of maps of gene regulatory regions. An option for searching nucleotide sequences by homology using BLAST is also included. TRRD is available at http://www.bionet.nsc.ru/trrd/.
Subject(s)
Databases, Nucleic Acid , Transcription, Genetic , Animals , Binding Sites , Computer Graphics , DNA-Binding Proteins/metabolism , Gene Silencing , Humans , Information Storage and Retrieval , Internet , Quality Control , Regulatory Sequences, Nucleic Acid , Sequence Homology, Nucleic Acid , Structure-Activity Relationship , Transcription Factors/metabolism , Transcriptional Activation
ABSTRACT
Steroidogenic factor 1 (SF-1) belongs to a small group of transcription factors that bind DNA only as a monomer. Three different approaches (Sitecon, SiteGA, and oPWM), constructed using the same training sample of experimentally confirmed SF-1 binding sites, have been used to recognize these sites. Appropriate prediction thresholds for the recognition models have been selected. Namely, thresholds concordant in false positive or false negative rates across the methods were used to optimize the discrimination of steroidogenic gene promoters from datasets of non-specific promoters. After experimental verification, the models were used to analyze the ChIP-seq data for SF-1. It has been shown that the sets of sites recognized by different models overlap only partially and that an integration of these models allows for identification of SF-1 sites in up to 80% of the ChIP-seq loci. The structures of the sites detected using the three recognition models in the ChIP-seq peaks falling within the [-5000, +5000] region relative to the transcription start sites (TSS) extracted from the FANTOM5 project have been analyzed. MATLIGN classified the frequency matrices for the sites predicted by oPWM, Sitecon, and SiteGA into two groups. The first group is described by oPWM/Sitecon and the second by SiteGA. Gene ontology (GO) analysis has been used to clarify the differences between the sets of genes carrying different variants of SF-1 binding sites. Although this analysis in general revealed a considerable overlap in GO terms for the genes carrying the binding sites predicted by oPWM, Sitecon, or SiteGA, only the last method showed a notable trend toward terms related to negative regulation and apoptosis. The results suggest that the SF-1 binding sites predicted by oPWM+Sitecon and by SiteGA differ in both their structure and the functional annotation of their target genes. Further application of the Homer software for de novo identification of enriched motifs in the SF-1 ChIP-seq dataset gave results similar to those of oPWM+Sitecon.
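The partial overlap between models and the gain from combining them can be quantified with simple set operations. The sketch below is an assumption-laden illustration: the peak identifiers, the per-model hit sets, and the function names are invented toy data and do not reproduce the study's results or its actual integration procedure.

```python
from itertools import combinations

# Hypothetical ChIP-seq peaks and, per model, the peaks in which it predicts a site.
ALL_PEAKS = {f"p{i}" for i in range(1, 11)}
HITS = {
    "oPWM": {"p1", "p2", "p3", "p4"},
    "Sitecon": {"p3", "p4", "p5", "p6"},
    "SiteGA": {"p6", "p7"},
}


def coverage(hit_peaks, all_peaks):
    """Fraction of all peaks that contain at least one predicted site."""
    return len(hit_peaks & all_peaks) / len(all_peaks)


def jaccard(a, b):
    """Jaccard similarity of two peak sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Coverage of each model alone and of the three models combined (set union).
per_model = {name: coverage(peaks, ALL_PEAKS) for name, peaks in HITS.items()}
combined = coverage(set().union(*HITS.values()), ALL_PEAKS)

# Pairwise overlap between models.
pairwise = {(m1, m2): jaccard(HITS[m1], HITS[m2])
            for m1, m2 in combinations(HITS, 2)}

print(per_model, combined)
print(pairwise)
```

In this toy data each model alone covers at most 40% of the peaks while their union covers 70%, which is the kind of complementarity the abstract reports for real SF-1 data.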
Subject(s)
Steroidogenic Factor 1/metabolism , Animals , Binding Sites , Chromatin Immunoprecipitation , Electrophoretic Mobility Shift Assay , Male , Rats , Rats, Wistar , Steroidogenic Factor 1/chemistry
ABSTRACT
The review describes several modules of the GeneExpress integrated computer system concerning the regulation of gene expression in eukaryotes. Approaches to the presentation of experimental data in databases are considered. The employment of GeneExpress in computer analysis and modeling of the organization and function of genetic systems is illustrated with examples. GeneExpress is available at http://wwwmgs.bionet.nsc.ru/mgs/gnw/.
Subject(s)
Gene Expression Regulation , Systems Integration , Animals , Databases, Genetic , Evolution, Molecular , Promoter Regions, Genetic , RNA, Messenger/genetics , Vertebrates/genetics
ABSTRACT
The development of computer-assisted methods for recognition of transcription factor binding sites (TFBS) is necessary for studying the regulatory code of transcription in DNA. There are a great number of experimental methods that enable TFBS identification in genome sequences. The experimental data can be used to elaborate multiple computer approaches to TFBS recognition, each of which has its own advantages and limitations. A short review of the characteristics of computer methods of TFBS prediction based on various principles is presented. Methods used for experimental verification of predicted sites are analyzed. Data concerning DNA regulatory potential and its realization at the chromatin level, obtained using these methods, are discussed along with approaches to recognition of the target genes of particular transcription factors in genome sequences.
Subject(s)
Computational Biology , Transcription Factors/metabolism , Vertebrates/genetics , Vertebrates/metabolism , Animals , Binding Sites/genetics , Computer Simulation , DNA/genetics , DNA/metabolism , Genome , Humans
ABSTRACT
Using gel retardation of DNA samples and specific antibodies, binding sites for the transcription factor SF-1 were found in positions -53/-44 and -285/-270 in the promoter region of the mouse Cyp17 gene and in position -117/-108 of the promoter region of the mouse 3betaHSDI gene.
Subject(s)
17-Hydroxysteroid Dehydrogenases/genetics , 3-Hydroxysteroid Dehydrogenases/genetics , Homeodomain Proteins/metabolism , Promoter Regions, Genetic/physiology , Receptors, Cytoplasmic and Nuclear/metabolism , Transcription Factors/metabolism , 17-Hydroxysteroid Dehydrogenases/metabolism , 3-Hydroxysteroid Dehydrogenases/metabolism , Animals , Antibody Specificity , Base Sequence , Binding Sites/genetics , DNA/chemistry , Electrophoresis/methods , Gene Expression Regulation, Enzymologic , Mice , Steroidogenic Factor 1
ABSTRACT
The effects of the thymic hormone thymosin (fraction 5) and tactivin on the adrenal glucocorticoid function were compared in BALB/c mice. An elevation in plasma corticosterone level was found 3 h after i.p. injection of thymosin (1 microgram/mouse), which was possibly caused by an activation of neuroendocrine structures. This appeared plausible because pretreatment with dexamethasone (10 micrograms/mouse) abolished the effect of thymosin. In contrast, tactivin produced a decrease in plasma corticosterone if administered to mice with a high basal level of the hormone. Tactivin added at doses from 0.00064 to 2 micrograms/ml together with ACTH (1.6 microIU/ml) to isolated adrenal cells hindered the stimulatory influence of ACTH on the production of corticosterone by the adrenal cells. Thus, the thymic hormone thymosin and tactivin showed opposite influences on the adrenal glucocorticoid function, which appeared to be mediated through different mechanisms.
Subject(s)
Peptides/physiology , Pituitary-Adrenal System/physiology , Thymosin/physiology , Thymus Extracts/physiology , Adrenal Glands/cytology , Animals , Corticosterone/blood , Female , Male , Mice , Mice, Inbred BALB C
ABSTRACT
Using 42 nucleotide sequences containing SF-1 transcription factor binding sites, extracted from the Transcription Regulatory Regions Database (TRRD), we have determined the decanucleotide consensus sequence (GTCAAGGTCA) for SF-1 binding. In the frequency matrix of this sequence, nucleotides between the 3rd and the 7th position had the highest frequency, and guanine nucleotides at the 6th and the 7th positions were found in all nucleotide sequences. The latter suggests a crucial role of these guanines in the interaction of DNA with the SF-1 protein. The determined consensus and frequency matrix were used to search for putative SF-1 binding sites in the regulatory regions of two genes, encoding mouse Cyp17 (17alpha-hydroxylase/17-20-lyase) and 3betaHSDI (3beta-hydroxysteroid dehydrogenase/4delta-5delta-isomerase I), the microsomal enzymes involved in steroidogenesis. The 5'-flanking regions of the genes encoding Cyp17 and 3betaHSDI were shown to contain six and five such binding sites, respectively. The presence of the putative SF-1 binding sites in the regulatory regions of mouse Cyp17 and 3betaHSDI suggests that the SF-1 gene could be one of the putative genes which (as we predicted earlier) determine coordinated inheritable variability of hormonal activity in mouse Leydig cells.
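The step from aligned binding sites to a frequency matrix, a consensus, and a sequence search can be sketched as follows. This is a minimal illustration under stated assumptions: the five aligned 10-mers are invented toy sequences (not the 42 TRRD sites), the function names are hypothetical, and the search uses a simple mismatch count against the consensus rather than the authors' matrix-based procedure.

```python
from collections import Counter

# Hypothetical aligned 10-bp sites; NOT the TRRD sample used in the study.
SITES = [
    "GTCAAGGTCA",
    "TTCAAGGTCA",
    "GCCAAGGTCG",
    "GTCAAGGCCA",
    "ATCAAGGTTA",
]


def frequency_matrix(sites):
    """Return one Counter of nucleotide counts per alignment column."""
    return [Counter(column) for column in zip(*sites)]


def consensus(matrix):
    """Pick the most frequent nucleotide in each column."""
    return "".join(col.most_common(1)[0][0] for col in matrix)


def invariant_positions(matrix):
    """1-based positions where every site carries the same nucleotide."""
    return [i + 1 for i, col in enumerate(matrix) if len(col) == 1]


def find_sites(seq, motif, max_mismatches=1):
    """0-based start positions where seq matches motif within max_mismatches."""
    width = len(motif)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


matrix = frequency_matrix(SITES)
motif = consensus(matrix)
print(motif, invariant_positions(matrix))
print(find_sites("AAAGTCAAGGTCATTT", motif, max_mismatches=0))
```

In practice the search would be run on both strands of the 5'-flanking region, and a matrix score with a chosen threshold would usually replace the mismatch count.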
Subject(s)
3-Hydroxysteroid Dehydrogenases/genetics , DNA-Binding Proteins/metabolism , Steroid 17-alpha-Hydroxylase/genetics , Transcription Factors/metabolism , 3-Hydroxysteroid Dehydrogenases/metabolism , Animals , Base Sequence , Binding Sites/genetics , Consensus Sequence/genetics , DNA-Binding Proteins/genetics , Fushi Tarazu Transcription Factors , Mice , Molecular Sequence Data , Regulatory Sequences, Nucleic Acid/genetics , Steroid 17-alpha-Hydroxylase/metabolism , Transcription Factors/genetics
ABSTRACT
TRANSFAC, TRRD (Transcription Regulatory Region Database) and COMPEL are databases that store information about transcriptional regulation in eukaryotic cells. The three databases provide distinct views on the components involved in transcription: transcription factors and their binding sites and binding profiles (TRANSFAC), the regulatory hierarchy of whole genes (TRRD), and the structural and functional properties of composite elements (COMPEL). The quantitative and qualitative changes of all three databases and connected programs are described. The databases are accessible via WWW: http://transfac.gbf.de/TRANSFAC or http://www.bionet.nsc.ru/TRRD
Subject(s)
Databases, Factual , Gene Expression Regulation , Transcription, Genetic , Animals , Computer Communication Networks , Humans , Software , Transcription Factors , User-Computer Interface
ABSTRACT
Transcription Regulatory Regions Database (TRRD) has been developed for accumulation of experimental information on the structure-function features of regulatory regions of eukaryotic genes. Each entry in TRRD corresponds to a particular gene and contains a description of structure-function features of its regulatory regions (transcription factor binding sites, promoters, enhancers, silencers, etc.) and gene expression regulation patterns. The current release, TRRD 4.2.5, comprises the description of 760 genes, 3403 expression patterns, and >4600 regulatory elements including 3604 transcription factor binding sites, 600 promoters and 152 enhancers. This information was obtained through annotation of 2537 scientific publications. TRRD 4.2.5 is available through the WWW at http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/
Subject(s)
Databases, Factual , Transcription, Genetic , Enhancer Elements, Genetic , Internet , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid
ABSTRACT
The Transcription Regulatory Regions Database (TRRD) is a curated database designed for accumulation of experimental data on extended regulatory regions of eukaryotic genes, the regulatory elements they contain, i.e., transcription factor binding sites, promoters, enhancers, silencers, etc., and expression patterns of the genes. Release 4.1 of TRRD offers a number of significant improvements, in particular, a more detailed description of transcription factor binding sites, transcription factors per se, and gene expression patterns in a computer-readable format. In addition, the new TRRD release provides considerably more references to other molecular biological databases. TRRD 4.1 is installed under SRS and is available through the WWW at http://www.bionet.nsc.ru/trrd/
Subject(s)
Databases, Factual , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Animals , Base Sequence , Binding Sites , Cell Line , Databases, Factual/trends , Enhancer Elements, Genetic/genetics , Eukaryotic Cells , Gene Expression Regulation/genetics , Glutathione Peroxidase/genetics , Information Storage and Retrieval , Internet , Mice , Organ Specificity , Promoter Regions, Genetic/genetics , Response Elements/genetics , Russia , Transcription Factors/genetics , User-Computer Interface
ABSTRACT
The GeneExpress system has been designed to integrate description, analysis, and recognition of eukaryotic regulatory sequences. The system includes 5 basic units: (1) GeneNet contains an object-oriented database for accumulation of data on gene networks and signal transduction pathways and a Java-based viewer that allows exploration and visualization of the GeneNet information; (2) Transcription Regulation combines the database on transcription regulatory regions of eukaryotic genes (TRRD) and TRRD Viewer; (3) Transcription Factor Binding Site Recognition contains a compilation of transcription factor binding sites (TFBSC) and programs for their analysis and recognition; (4) mRNA Translation is designed for analysis of structural and contextual features of mRNA 5'UTRs and prediction of their translation efficiency; and (5) ACTIVITY is the module for analysis and prediction of the site activity of a given nucleotide sequence. Integration of the databases in GeneExpress is based on the Sequence Retrieval System (SRS) created at the European Bioinformatics Institute.
Subject(s)
Computer Systems , Genes, Regulator , Genome , Artificial Intelligence , Binding Sites , Databases, Factual , Eukaryotic Cells , Gene Expression Regulation , Protein Biosynthesis , RNA, Messenger/genetics , Software , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic
ABSTRACT
MOTIVATION: The goal of the work was to develop a WWW-oriented computer system providing a maximal integration of informational and software resources on the regulation of gene expression and navigation through them. Rapid growth of the variety and volume of information accumulated in the databases on regulation of gene expression necessarily requires the development of computer systems for automated discovery of the knowledge that can be further used for analysis of regulatory genomic sequences. RESULTS: The GeneExpress system developed includes the following major informational and software modules: (1) Transcription Regulation (TRRD) module, which contains the databases on transcription regulatory regions of eukaryotic genes and TRRD Viewer for data visualization; (2) Site Activity Prediction (ACTIVITY), the module for analysis of functional site activity and its prediction; (3) Site Recognition module, which comprises (a) B-DNA-VIDEO system for detecting the conformational and physicochemical properties of DNA sites significant for their recognition, (b) Consensus and Weight Matrices (ConsFrec) and (c) Transcription Factor Binding Sites Recognition (TFBSR) systems for detecting conservative contextual regions of functional sites and their recognition; (4) Gene Networks (GeneNet), which contains an object-oriented database accumulating the data on gene networks and signal transduction pathways, and the Java-based Viewer for exploration and visualization of the GeneNet information; (5) mRNA Translation (Leader mRNA), designed to analyze structural and contextual properties of mRNA 5'-untranslated regions (5'-UTRs) and predict their translation efficiency; (6) other program modules designed to study the structure-function organization of regulatory genomic sequences and regulatory proteins. AVAILABILITY: GeneExpress is available at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ and the links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html.