1.
Environ Microbiol ; 21(2): 784-799, 2019 02.
Article in English | MEDLINE | ID: mdl-30536693

ABSTRACT

Bacterial genes for molybdenum-containing and tungsten-containing enzymes are often differentially regulated depending on the metal availability in the environment. Here, we describe a new family of transcription factors with an unusual DNA-binding domain related to excisionases of bacteriophages. These transcription factors are associated with genes for various molybdate and tungstate-specific transporting systems as well as molybdo/tungsto-enzymes in a wide range of bacterial genomes. We used a combination of computational and experimental techniques to study a member of the TF family, named TaoR (for tungsten-containing aldehyde oxidoreductase regulator). In Desulfovibrio vulgaris Hildenborough, a model bacterium for sulfate reduction studies, TaoR activates expression of aldehyde oxidoreductase aor and represses tungsten-specific ABC-type transporter tupABC genes under tungsten-replete conditions. TaoR binding sites at the aor promoter were identified by electrophoretic mobility shift assay and DNase I footprinting. We also reconstructed TaoR regulons in 45 Deltaproteobacteria by a comparative genomics approach and predicted target genes for TaoR family members in other Proteobacteria and Firmicutes.


Subject(s)
ATP-Binding Cassette Transporters/genetics; Bacterial Proteins/metabolism; Desulfovibrio vulgaris/genetics; Desulfovibrio vulgaris/metabolism; Molybdenum/metabolism; Transcription Factors/metabolism; Tungsten Compounds/metabolism; ATP-Binding Cassette Transporters/metabolism; Bacterial Proteins/genetics; Binding Sites; Biological Transport; Desulfovibrio vulgaris/isolation & purification; Gene Expression Regulation, Bacterial; Gene Expression Regulation, Enzymologic; Multigene Family; Promoter Regions, Genetic; Regulon; Transcription Factors/genetics
2.
J Bacteriol ; 197(1): 29-39, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25313388

ABSTRACT

Although the enzymes for dissimilatory sulfate reduction by microbes have been studied, the mechanisms for transcriptional regulation of the encoding genes remain unknown. In a number of bacteria the transcriptional regulator Rex has been shown to play a key role as a repressor of genes producing proteins involved in energy conversion. In the model sulfate-reducing microbe Desulfovibrio vulgaris Hildenborough, the gene DVU_0916 was observed to resemble other known Rex proteins. Therefore, the DVU_0916 protein has been predicted to be a transcriptional repressor of genes encoding proteins that function in the process of sulfate reduction in D. vulgaris Hildenborough. Examination of the deduced DVU_0916 protein identified two domains, one a winged helix DNA-binding domain common for transcription factors, and the other a Rossmann fold that could potentially interact with pyridine nucleotides. A deletion of the putative rex gene was made in D. vulgaris Hildenborough, and transcript expression studies of sat, encoding sulfate adenylyl transferase, showed increased levels in the D. vulgaris Hildenborough Rex (RexDvH) mutant relative to the parental strain. The RexDvH-binding site upstream of sat was identified, confirming RexDvH to be a repressor of sat. We established in vitro that the presence of elevated NADH disrupted the interaction between RexDvH and DNA. Examination of the 5' transcriptional start site for the sat mRNA revealed two unique start sites, one for respiring cells that correlated with the RexDvH-binding site and a second for fermenting cells. Collectively, these data support the role of RexDvH as a transcription repressor for sat that senses the redox status of the cell.


Subject(s)
Bacterial Proteins/metabolism; Desulfovibrio vulgaris/metabolism; Gene Expression Regulation, Enzymologic/physiology; NAD/metabolism; Sulfate Adenylyltransferase/metabolism; Bacterial Proteins/genetics; Base Sequence; Binding Sites; Desulfovibrio vulgaris/genetics; Gene Deletion; Gene Expression Regulation, Bacterial/physiology; Sulfate Adenylyltransferase/antagonists & inhibitors; Sulfate Adenylyltransferase/genetics
3.
Science ; 313(5793): 1596-604, 2006 Sep 15.
Article in English | MEDLINE | ID: mdl-16973872

ABSTRACT

We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.


Subject(s)
Gene Duplication; Genome, Plant; Populus/genetics; Sequence Analysis, DNA; Arabidopsis/genetics; Chromosome Mapping; Computational Biology; Evolution, Molecular; Expressed Sequence Tags; Gene Expression; Genes, Plant; Oligonucleotide Array Sequence Analysis; Phylogeny; Plant Proteins/chemistry; Plant Proteins/genetics; Polymorphism, Single Nucleotide; Populus/growth & development; Populus/metabolism; Protein Structure, Tertiary; RNA, Plant/analysis; RNA, Untranslated/analysis
4.
J Theor Biol ; 212(2): 129-39, 2001 Sep 21.
Article in English | MEDLINE | ID: mdl-11531380

ABSTRACT

Automatic identification of sub-structures in multi-aligned sequences is of great importance for effective and objective structural/functional domain annotation, phylogenetic treeing and other molecular analyses. We present a segmentation algorithm that optimally partitions a given multi-alignment into a set of potentially biologically significant blocks, or segments. This algorithm applies dynamic programming and progressive optimization to the statistical profile of a multi-alignment in order to optimally demarcate relatively homogenous sub-regions. Using this algorithm, a large multi-alignment of eukaryotic 16S rRNA was analyzed. Three types of sequence patterns were identified automatically and efficiently: shared conserved domain; shared variable motif; and rare signature sequence. Results were consistent with the patterns identified through independent phylogenetic and structural approaches. This algorithm facilitates the automation of sequence-based molecular structural and evolutionary analyses through statistical modeling and high performance computation.


Subject(s)
Algorithms; Computational Biology/methods; Models, Genetic; Sequence Alignment; Animals; Conserved Sequence; RNA, Ribosomal, 16S
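The abstract of entry 4 describes partitioning an alignment profile into relatively homogeneous blocks by dynamic programming. The following is only a minimal sketch of that general idea, not the published algorithm: it assumes the profile has already been reduced to one numeric score per alignment column, that the number of segments `k` is fixed in advance, and that homogeneity is measured by within-segment squared error. The function name `segment` and all three assumptions are illustrative.

```python
def segment(values, k):
    """Split values into k contiguous segments minimizing within-segment squared error.

    Illustrative assumptions: one score per alignment column, fixed k,
    squared-error homogeneity. Returns a list of (start, end) half-open ranges.
    """
    n = len(values)
    # Prefix sums of v and v*v give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # Squared deviation of values[i:j] from its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[s][j]: minimal cost of covering values[:j] with s segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                c = best[s - 1][i] + cost(i, j)
                if c < best[s][j]:
                    best[s][j] = c
                    back[s][j] = i

    # Follow the back-pointers from the end to recover the boundaries.
    bounds, j = [], n
    for s in range(k, 0, -1):
        i = back[s][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


if __name__ == "__main__":
    # Hypothetical per-column conservation scores: high, low, high.
    print(segment([9, 9, 9, 2, 2, 9, 9], 3))
```

The triple loop runs in O(k·n²) time, which is adequate for a sketch; a real profile would also need a principled way to choose `k`.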
5.
Nucleic Acids Res ; 29(19): 3928-38, 2001 Oct 01.
Article in English | MEDLINE | ID: mdl-11574674

ABSTRACT

Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing experiments for bacteria and 90-99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.


Subject(s)
Computational Biology/methods; Genes, Archaeal; Genes, Bacterial; RNA, Untranslated/genetics; Escherichia coli/genetics; Forecasting; Genome, Archaeal; Genome, Bacterial; Neural Networks, Computer; RNA, Messenger/genetics
6.
Nucleic Acids Res ; 29(11): 2338-48, 2001 Jun 01.
Article in English | MEDLINE | ID: mdl-11376152

ABSTRACT

Alternative pre-mRNA splicing is a major cellular process by which functionally diverse proteins can be generated from the primary transcript of a single gene, often in tissue-specific patterns. The current study investigates the hypothesis that splicing of tissue-specific alternative exons is regulated in part by control sequences in adjacent introns and that such elements may be recognized via computational analysis of exons sharing a highly specific expression pattern. We have identified 25 brain-specific alternative cassette exons, compiled a dataset of genomic sequences encompassing these exons and their adjacent introns and used word contrast algorithms to analyze key features of these nucleotide sequences. By comparison to a control group of constitutive exons, brain-specific exons were often found to possess the following: divergent 5' splice sites; highly pyrimidine-rich upstream introns; a paucity of GGG motifs in the downstream intron; a highly statistically significant over-representation of the hexanucleotide UGCAUG in the proximal downstream intron. UGCAUG was also found at a high frequency downstream of a smaller group of muscle-specific exons. Intriguingly, UGCAUG has been identified previously in a few intron splicing enhancers. Our results indicate that this element plays a much wider role than previously appreciated in the regulated tissue-specific splicing of many alternative exons.


Subject(s)
Alternative Splicing; Brain/metabolism; Introns/genetics; RNA Precursors/genetics; Regulatory Sequences, Nucleic Acid; Algorithms; Base Sequence; DNA/genetics; Exons/genetics; Genes/genetics; Humans
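The abstract of entry 6 reports over-representation of the hexanucleotide UGCAUG in introns downstream of brain-specific exons relative to a control set. A rough sketch of how such a comparison could be computed is given below. It is not the word contrast algorithm used in the study: the function names, the conversion of T to U, and the pseudocount smoothing are all assumptions made for illustration, and no statistical significance test is included.

```python
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer across a list of sequences (T read as U)."""
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(target_seqs, control_seqs, word, pseudocount=1.0):
    """Ratio of the word's smoothed frequency in target vs. control sequences.

    The pseudocount is an illustrative choice that avoids division by zero
    when the word is absent from the control set.
    """
    k = len(word)
    t = kmer_counts(target_seqs, k)
    c = kmer_counts(control_seqs, k)
    t_freq = (t[word] + pseudocount) / (sum(t.values()) + pseudocount)
    c_freq = (c[word] + pseudocount) / (sum(c.values()) + pseudocount)
    return t_freq / c_freq


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the study.
    target = ["AAUGCAUGAA", "TGCATGTT"]
    control = ["AAAAAAAAAA"]
    print(kmer_counts(target)["UGCAUG"], enrichment(target, control, "UGCAUG"))
```

A ratio well above 1 suggests the word is more frequent in the target set, but a real analysis would need a background model and a multiple-testing correction across all words.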
7.
Bioinformatics ; 17(4): 349-58, 2001 Apr.
Article in English | MEDLINE | ID: mdl-11301304

ABSTRACT

MOTIVATION: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We studied this approach with new multi-class classification methods and examined many issues important for a practical recognition system. RESULTS: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training.


Subject(s)
Neural Networks, Computer; Protein Folding; Proteins/chemistry; Discriminant Analysis; Proteins/classification
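The abstract of entry 7 contrasts one-against-others classification with an all-against-all scheme and mentions majority voting. The sketch below shows the generic all-against-all pattern only: one binary decision per pair of classes, with the most-voted class winning. It does not reproduce the paper's classifiers or its exact voting rules. The nearest-centroid rule stands in for the SVM or neural network base classifiers, and the class names and centroid values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def all_against_all_predict(x, classes, pairwise_decide):
    """Return the class that wins the most pairwise contests for sample x.

    pairwise_decide(x, a, b) must return either a or b. Ties are broken by
    the order in which classes first receive a vote.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(x, a, b)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional class centroids, for illustration only.
CENTROIDS = {"alpha": 0.0, "beta": 5.0, "barrel": 10.0}


def nearest_centroid(x, a, b):
    """Toy binary classifier: pick whichever of the two centroids is closer."""
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


if __name__ == "__main__":
    print(all_against_all_predict(4.2, list(CENTROIDS), nearest_centroid))
```

With N classes this scheme needs N(N-1)/2 binary classifiers, compared with N for one-against-others, which is the usual trade-off between the two designs.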
8.
Genome Res ; 10(9): 1304-6, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10984448

ABSTRACT

Human and mouse genomic sequence comparisons are being increasingly used to search for evolutionarily conserved gene regulatory elements. Large-scale human-mouse DNA comparison studies have discovered numerous conserved noncoding sequences of which only a fraction has been functionally investigated. A question therefore remains as to whether most of these noncoding sequences are conserved because of functional constraints or are the result of a lack of divergence time.


Subject(s)
Conserved Sequence/genetics; Sequence Alignment; Untranslated Regions/genetics; Animals; Dogs; Humans; Mice; Molecular Sequence Data; Species Specificity; Untranslated Regions/isolation & purification
9.
Bioinformatics ; 16(11): 1046-7, 2000 Nov.
Article in English | MEDLINE | ID: mdl-11159318

ABSTRACT

SUMMARY: VISTA is a program for visualizing global DNA sequence alignments of arbitrary length. It has a clean output, allowing for easy identification of similarity, and is easily configurable, enabling the visualization of alignments of various lengths at different levels of resolution. It is currently available on the web, thus allowing for easy access by all researchers. AVAILABILITY: VISTA server is available on the web at http://www-gsd.lbl.gov/vista. The source code is available upon request. CONTACT: vista@lbl.gov


Subject(s)
DNA/genetics; Sequence Alignment/statistics & numerical data; Software; Animals; Computational Biology; Humans; Internet; Mice; Rabbits
10.
J Comput Biol ; 7(6): 849-62, 2000.
Article in English | MEDLINE | ID: mdl-11382366

ABSTRACT

This work addresses the issues of data representation and incorporation of domain knowledge into the design of learning systems for reasoning about protein families. Given the limited expressive capacity of a particular method, a mixture of protein annotation and fold recognition experts, each implementing a different underlying representation, should provide a robust method for assigning sequences to families. These ideas are illustrated using two data-driven learning methods that make use of different prior information and employ independent, yet complementary, projections of a family: hidden Markov models (HMMs) based on a multiple sequence alignment and neural networks (NNs) based on global sequence descriptors of proteins. Examination of seven protein families indicates that combining a generative (HMM) and a discriminative (NN) method is better than either method on its own. Biologically, human 4-hydroxyphenylpyruvic acid dioxygenase, involved in tyrosinemia type 3, is predicted to be structurally and functionally related to the glyoxalase I family.


Subject(s)
Models, Molecular; Protein Folding; Proteins/chemistry; Amino Acid Sequence; Markov Chains; Molecular Sequence Data; Protein Structure, Tertiary; Proteins/metabolism; Sequence Alignment/methods
11.
Nucleic Acids Res ; 28(1): 296-7, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592252

ABSTRACT

Version 2.1 of ASDB (Alternative Splicing Data Base) contains 1922 protein and 2486 DNA sequences. The protein entries from SWISS-PROT are joined into clusters corresponding to alternatively spliced variants of one gene. The DNA division consists of complete genes with alternative splicing mentioned or annotated in GenBank. The search engine allows one to search over SWISS-PROT and GenBank fields and then follow the links to all variants. The database can be accessed at the URL http://cbcg.nersc.gov/asdb


Subject(s)
Alternative Splicing/genetics; DNA/genetics; Databases, Factual; Proteins/chemistry; Internet; Proteins/genetics
12.
Proteins ; 35(4): 401-7, 1999 Jun 01.
Article in English | MEDLINE | ID: mdl-10382667

ABSTRACT

A computational method has been developed for the assignment of a protein sequence to a folding class in the Structural Classification of Proteins (SCOP). This method uses global descriptors of a primary protein sequence in terms of the physical, chemical, and structural properties of the constituent amino acids. Neural networks are utilized to combine these descriptors in a way to discriminate members of a given fold from members of all other folds. An extensive testing of the method has been performed to evaluate its prediction accuracy. The method is applicable for the fold assignment of any protein sequence with or without significant sequence homology to known proteins. A WWW page for predicting protein folds is available at URL http://cbcg.lbl.gov/.


Subject(s)
Protein Folding; Proteins/chemistry; Amino Acids/chemistry; Databases, Factual
13.
Nucleic Acids Res ; 27(1): 301-2, 1999 Jan 01.
Article in English | MEDLINE | ID: mdl-9847209

ABSTRACT

A database of alternatively spliced genes (ASDB) has been constructed based on (i) the results of the analysis of Swiss-Prot entries containing products of these genes and (ii) a clustering procedure joining proteins that could arise by alternative splicing of the same gene. ASDB incorporates information about alternatively spliced genes, their products and expression patterns. It can be searched in order to find all products of alternative splicing produced in a particular tissue or a given organism, or all variants generated by a particular transcript. ASDB currently contains about 1700 protein sequences and can be accessed via the Internet at URL http://cbcg.nersc.gov/asdb


Subject(s)
Alternative Splicing; Databases, Factual; Protein Isoforms/genetics; Animals; Databases, Factual/trends; Humans; Information Storage and Retrieval; Internet; Mutation; Protein Isoforms/chemistry; Sequence Alignment
14.
Article in English | MEDLINE | ID: mdl-10786312

ABSTRACT

We present an analysis of multi-aligned eukaryotic and procaryotic small subunit rRNA sequences using a novel segmentation and clustering procedure capable of extracting subsets of sequences that share common sequence features. This procedure consists of: i) segmentation of aligned sequences using a dynamic programming procedure, and subsequent identification of likely conserved segments; ii) for each putative conserved segment, extraction of a locally homogeneous cluster using a novel polynomial procedure; and iii) intersection of clusters associated with each conserved segment. Aside from their utility in processing large gap-filled multi-alignments, these algorithms can be applied to a broad spectrum of rRNA analysis functions such as subalignment, phylogenetic subtree extraction and construction, and organism tree-placement, and can serve as a framework to organize sequence data in an efficient and easily searchable manner. The sequence classification we obtained using the method presented here shows a remarkable consistency with the independently constructed eukaryotic phylogenetic tree.


Subject(s)
Cluster Analysis; Combinatorial Chemistry Techniques; RNA, Ribosomal/genetics; Sequence Analysis, RNA/methods; Algorithms; Animals; Eukaryota/genetics; Genes, Fungal; Genes, Protozoan; Models, Statistical; Phylogeny; RNA, Ribosomal, 18S/genetics
15.
Microb Comp Genomics ; 3(3): 171-5, 1998.
Article in English | MEDLINE | ID: mdl-9775387

ABSTRACT

Analysis of DNA sequences of several microbial genomes has revealed that a large fraction of predicted coding regions has no known protein function. Information about the three-dimensional folds of these proteins may provide insight into their possible functions. To predict the folds for protein sequences with little or no homology to proteins of known function, we used computational neural networks trained on the database of proteins with known three-dimensional structures. Global descriptions of protein sequences based on physical and structural properties of the constituent amino acids were used as inputs for neural networks. Of the 131, 498, and 868 protein sequences of unknown function from Mycoplasma genitalium, Haemophilus influenzae, and Methanococcus jannaschii (Fleischmann et al. 1995), we have made high-confidence fold assignments for 4, 10, and 19 sequences, respectively.


Subject(s)
Bacterial Proteins/genetics; Protein Folding; Amino Acid Sequence; Computational Biology/classification; Databases, Factual/classification; Genome, Bacterial; Haemophilus influenzae/genetics; Methanococcus/genetics; Molecular Sequence Data; Mycoplasma/genetics
16.
Article in English | MEDLINE | ID: mdl-9322023

ABSTRACT

This work demonstrates new techniques developed for the prediction of protein folding class in the context of the most comprehensive Structural Classification of Proteins (SCOP). The prediction method uses global descriptors of a protein in terms of the physical, chemical and structural properties of its constituent amino acids. Neural networks are utilized to combine these descriptors in a specific way to discriminate members of a given folding class from members of all other classes. It is shown that a specific amino acid's properties work completely differently on different folding classes. This creates the possibility of finding an individual set of descriptors that works best on a particular folding class.


Subject(s)
Artificial Intelligence; Protein Folding; Algorithms; Amino Acids/chemistry; Databases, Factual; Evaluation Studies as Topic; Neural Networks, Computer; Protein Conformation; Proteins/chemistry; Proteins/classification
17.
Proc Natl Acad Sci U S A ; 92(19): 8700-4, 1995 Sep 12.
Article in English | MEDLINE | ID: mdl-7568000

ABSTRACT

We present a method for predicting protein folding class based on global protein chain description and a voting process. Selection of the best descriptors was achieved by a computer-simulated neural network trained on a data base consisting of 83 folding classes. Protein-chain descriptors include overall composition, transition, and distribution of amino acid attributes, such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. The test shows that proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction of proteins as not belonging to a particular class (correct negative prediction) was 90-95% accurate. When tested on 254 structures used in this study, the top two predictions contained the correct class in 91% of the cases.


Subject(s)
Amino Acid Sequence; Computer Simulation; Models, Chemical; Protein Folding; Amino Acids/chemistry; Databases, Factual; Neural Networks, Computer; Protein Structure, Secondary; Proteins/chemistry; Proteins/classification; Reproducibility of Results; Solvents
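The abstract of entry 17 names composition and transition of amino acid attributes among its protein-chain descriptors. As a hedged illustration of what such descriptors can look like, the sketch below encodes a sequence into three attribute classes and computes the fraction of residues in each class (composition) and the fraction of adjacent residue pairs that switch between two given classes (transition). The three-way grouping, the labels P/N/H, and the function names are assumptions for this example, not the grouping used in the paper, and the distribution descriptor is omitted.

```python
from itertools import combinations

# Hypothetical three-way hydrophobicity grouping, for illustration only.
GROUPS = {
    "P": "RKEDQN",   # polar
    "N": "GASTPHY",  # neutral
    "H": "CLVIMFW",  # hydrophobic
}
CLASS_OF = {aa: label for label, members in GROUPS.items() for aa in members}


def encode(seq):
    """Map each residue to its class label (raises KeyError on unknown letters)."""
    return [CLASS_OF[aa] for aa in seq.upper()]


def composition(classes):
    """Fraction of residues falling in each class."""
    n = len(classes)
    return {label: classes.count(label) / n for label in GROUPS}


def transition(classes):
    """Fraction of adjacent pairs that switch between each pair of classes.

    Requires at least two residues; direction of the switch is ignored.
    """
    pairs = list(zip(classes, classes[1:]))
    result = {}
    for a, b in combinations(GROUPS, 2):
        hits = sum(1 for x, y in pairs if {x, y} == {a, b})
        result[a + b] = hits / len(pairs)
    return result


if __name__ == "__main__":
    encoded = encode("RKLLGA")  # hypothetical toy peptide
    print(composition(encoded))
    print(transition(encoded))
```

The resulting fixed-length vector is independent of sequence length, which is what makes descriptors of this kind usable as inputs to a classifier.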
18.
Article in English | MEDLINE | ID: mdl-7584443

ABSTRACT

A method of quantitative comparison of two classification rules applied to the protein folding problem is presented. Classification of proteins based on sequence homology and based on amino acid composition were compared and analyzed according to this approach. The coefficient of correlation between these classification methods and the procedure of estimation of robustness of the coefficient are discussed.


Subject(s)
Amino Acids/analysis; Protein Conformation; Protein Folding; Proteins/chemistry; Sequence Homology, Amino Acid; Amino Acid Sequence; Databases, Factual; Molecular Sequence Data; Software
19.
Biotechniques ; 14(6): 984-9, 1993 Jun.
Article in English | MEDLINE | ID: mdl-8333967

ABSTRACT

A computer program, PROBE, has been designed for the prediction of protein structural features from amino acid sequence. This program integrates a variety of computer-simulated neural networks, each predicting an aspect of protein structure, into a single, easy-to-use package. The surface accessibility of each residue, the presence of disulfide bonds, the overall secondary structure composition and the residue secondary structures, including beta-turn type, are predicted. In addition, the overall amino acid composition and relative hydrophobicity are used to determine whether a protein belongs to one of four common folding motifs. PROBE is able to compare and synergistically improve the predictions by allowing communication between the different networks.


Subject(s)
Neural Networks, Computer; Protein Structure, Secondary; Software; Amino Acid Sequence; Molecular Sequence Data
20.
Proteins ; 16(1): 79-91, 1993 May.
Article in English | MEDLINE | ID: mdl-8497486

ABSTRACT

An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer-simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4 alpha-helical bundles, (2) parallel (alpha/beta)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class.


Subject(s)
Models, Molecular; Protein Conformation; Protein Folding; Amino Acid Sequence; Databases, Factual; Neural Networks, Computer