Results 1 - 20 of 21
1.
Environ Microbiol ; 21(2): 784-799, 2019 02.
Article in English | MEDLINE | ID: mdl-30536693

ABSTRACT

Bacterial genes for molybdenum-containing and tungsten-containing enzymes are often differentially regulated depending on the metal availability in the environment. Here, we describe a new family of transcription factors with an unusual DNA-binding domain related to excisionases of bacteriophages. These transcription factors are associated with genes for various molybdate and tungstate-specific transporting systems as well as molybdo/tungsto-enzymes in a wide range of bacterial genomes. We used a combination of computational and experimental techniques to study a member of the TF family, named TaoR (for tungsten-containing aldehyde oxidoreductase regulator). In Desulfovibrio vulgaris Hildenborough, a model bacterium for sulfate reduction studies, TaoR activates expression of aldehyde oxidoreductase aor and represses tungsten-specific ABC-type transporter tupABC genes under tungsten-replete conditions. TaoR binding sites at the aor promoter were identified by electrophoretic mobility shift assay and DNase I footprinting. We also reconstructed TaoR regulons in 45 Deltaproteobacteria by a comparative genomics approach and predicted target genes for TaoR family members in other Proteobacteria and Firmicutes.


Subject(s)
ATP-Binding Cassette Transporters/genetics , Bacterial Proteins/metabolism , Desulfovibrio vulgaris/genetics , Desulfovibrio vulgaris/metabolism , Molybdenum/metabolism , Transcription Factors/metabolism , Tungsten Compounds/metabolism , ATP-Binding Cassette Transporters/metabolism , Bacterial Proteins/genetics , Binding Sites , Biological Transport , Desulfovibrio vulgaris/isolation & purification , Gene Expression Regulation, Bacterial , Gene Expression Regulation, Enzymologic , Multigene Family , Promoter Regions, Genetic , Regulon , Transcription Factors/genetics
2.
J Bacteriol ; 197(1): 29-39, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25313388

ABSTRACT

Although the enzymes for dissimilatory sulfate reduction by microbes have been studied, the mechanisms for transcriptional regulation of the encoding genes remain unknown. In a number of bacteria the transcriptional regulator Rex has been shown to play a key role as a repressor of genes producing proteins involved in energy conversion. In the model sulfate-reducing microbe Desulfovibrio vulgaris Hildenborough, the gene DVU_0916 was observed to resemble other known Rex proteins. Therefore, the DVU_0916 protein has been predicted to be a transcriptional repressor of genes encoding proteins that function in the process of sulfate reduction in D. vulgaris Hildenborough. Examination of the deduced DVU_0916 protein identified two domains, one a winged helix DNA-binding domain common for transcription factors, and the other a Rossmann fold that could potentially interact with pyridine nucleotides. A deletion of the putative rex gene was made in D. vulgaris Hildenborough, and transcript expression studies of sat, encoding sulfate adenylyl transferase, showed increased levels in the D. vulgaris Hildenborough Rex (RexDvH) mutant relative to the parental strain. The RexDvH-binding site upstream of sat was identified, confirming RexDvH to be a repressor of sat. We established in vitro that the presence of elevated NADH disrupted the interaction between RexDvH and DNA. Examination of the 5' transcriptional start site for the sat mRNA revealed two unique start sites, one for respiring cells that correlated with the RexDvH-binding site and a second for fermenting cells. Collectively, these data support the role of RexDvH as a transcription repressor for sat that senses the redox status of the cell.


Subject(s)
Bacterial Proteins/metabolism , Desulfovibrio vulgaris/metabolism , Gene Expression Regulation, Enzymologic/physiology , NAD/metabolism , Sulfate Adenylyltransferase/metabolism , Bacterial Proteins/genetics , Base Sequence , Binding Sites , Desulfovibrio vulgaris/genetics , Gene Deletion , Gene Expression Regulation, Bacterial/physiology , Sulfate Adenylyltransferase/antagonists & inhibitors , Sulfate Adenylyltransferase/genetics
3.
Nucleic Acids Res ; 29(19): 3928-38, 2001 Oct 01.
Article in English | MEDLINE | ID: mdl-11574674

ABSTRACT

Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing experiments for bacteria and 90-99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.


Subject(s)
Computational Biology/methods , Genes, Archaeal , Genes, Bacterial , RNA, Untranslated/genetics , Escherichia coli/genetics , Forecasting , Genome, Archaeal , Genome, Bacterial , Neural Networks, Computer , RNA, Messenger/genetics
4.
Nucleic Acids Res ; 29(11): 2338-48, 2001 Jun 01.
Article in English | MEDLINE | ID: mdl-11376152

ABSTRACT

Alternative pre-mRNA splicing is a major cellular process by which functionally diverse proteins can be generated from the primary transcript of a single gene, often in tissue-specific patterns. The current study investigates the hypothesis that splicing of tissue-specific alternative exons is regulated in part by control sequences in adjacent introns and that such elements may be recognized via computational analysis of exons sharing a highly specific expression pattern. We have identified 25 brain-specific alternative cassette exons, compiled a dataset of genomic sequences encompassing these exons and their adjacent introns and used word contrast algorithms to analyze key features of these nucleotide sequences. By comparison to a control group of constitutive exons, brain-specific exons were often found to possess the following: divergent 5' splice sites; highly pyrimidine-rich upstream introns; a paucity of GGG motifs in the downstream intron; a highly statistically significant over-representation of the hexanucleotide UGCAUG in the proximal downstream intron. UGCAUG was also found at a high frequency downstream of a smaller group of muscle-specific exons. Intriguingly, UGCAUG has been identified previously in a few intron splicing enhancers. Our results indicate that this element plays a much wider role than previously appreciated in the regulated tissue-specific splicing of many alternative exons.


Subject(s)
Alternative Splicing , Brain/metabolism , Introns/genetics , RNA Precursors/genetics , Regulatory Sequences, Nucleic Acid , Algorithms , Base Sequence , DNA/genetics , Exons/genetics , Genes/genetics , Humans
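
The over-representation analysis described in the abstract of entry 4 can be illustrated with a small sketch. It is not the word contrast algorithm used in the study; it only counts overlapping hexanucleotides in a target set and a control set and compares their frequencies. The function names, the pseudocount, and the toy sequences are assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer across a list of RNA sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment(target_seqs, control_seqs, k=6, pseudocount=1.0):
    """Return {kmer: frequency in target / frequency in control}.

    The pseudocount (an assumption, not from the abstract) avoids division
    by zero for k-mers that never occur in the control set.
    """
    target = kmer_counts(target_seqs, k)
    control = kmer_counts(control_seqs, k)
    target_total = sum(target.values())
    control_total = sum(control.values())
    ratios = {}
    for kmer, n in target.items():
        f_target = (n + pseudocount) / (target_total + pseudocount)
        f_control = (control[kmer] + pseudocount) / (control_total + pseudocount)
        ratios[kmer] = f_target / f_control
    return ratios


# Toy, made-up intron fragments (not data from the study).
brain_introns = ["AAUGCAUGCC", "GGUGCAUGAA"]
control_introns = ["AACCGGUUAA", "GGAACCUUGG"]
scores = enrichment(brain_introns, control_introns)
top = max(scores, key=scores.get)  # k-mer with the highest ratio
```

A real analysis would also need a significance test and a much larger control set; the ratio alone says nothing about statistical support.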
5.
J Comput Biol ; 7(6): 849-62, 2000.
Article in English | MEDLINE | ID: mdl-11382366

ABSTRACT

This work addresses the issues of data representation and incorporation of domain knowledge into the design of learning systems for reasoning about protein families. Given the limited expressive capacity of a particular method, a mixture of protein annotation and fold recognition experts, each implementing a different underlying representation, should provide a robust method for assigning sequences to families. These ideas are illustrated using two data-driven learning methods that make use of different prior information and employ independent, yet complementary, projections of a family: hidden Markov models (HMMs) based on a multiple sequence alignment and neural networks (NNs) based on global sequence descriptors of proteins. Examination of seven protein families indicates that combining a generative (HMM) and a discriminative (NN) method is better than either method on its own. Biologically, human 4-hydroxyphenylpyruvic acid dioxygenase, involved in tyrosinemia type 3, is predicted to be structurally and functionally related to the glyoxalase I family.


Subject(s)
Models, Molecular , Protein Folding , Proteins/chemistry , Amino Acid Sequence , Markov Chains , Molecular Sequence Data , Protein Structure, Tertiary , Proteins/metabolism , Sequence Alignment/methods
6.
Biotechniques ; 14(6): 984-9, 1993 Jun.
Article in English | MEDLINE | ID: mdl-8333967

ABSTRACT

A computer program, PROBE, has been designed for the prediction of protein structural features from amino acid sequence. This program integrates a variety of computer-simulated neural networks, each predicting an aspect of protein structure, into a single, easy-to-use package. The surface accessibility of each residue, the presence of disulfide bonds, the overall secondary structure composition and the residue secondary structures, including beta-turn type, are predicted. In addition, the overall amino acid composition and relative hydrophobicity are used to determine whether a protein belongs to one of four common folding motifs. PROBE is able to compare and synergistically improve the predictions by allowing communication between the different networks.


Subject(s)
Neural Networks, Computer , Protein Structure, Secondary , Software , Amino Acid Sequence , Molecular Sequence Data
7.
Science ; 313(5793): 1596-604, 2006 Sep 15.
Article in English | MEDLINE | ID: mdl-16973872

ABSTRACT

We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.


Subject(s)
Gene Duplication , Genome, Plant , Populus/genetics , Sequence Analysis, DNA , Arabidopsis/genetics , Chromosome Mapping , Computational Biology , Evolution, Molecular , Expressed Sequence Tags , Gene Expression , Genes, Plant , Oligonucleotide Array Sequence Analysis , Phylogeny , Plant Proteins/chemistry , Plant Proteins/genetics , Polymorphism, Single Nucleotide , Populus/growth & development , Populus/metabolism , Protein Structure, Tertiary , RNA, Plant/analysis , RNA, Untranslated/analysis
8.
Article in English | MEDLINE | ID: mdl-7584443

ABSTRACT

A method of quantitative comparison of two classification rules applied to the protein folding problem is presented. Classifications of proteins based on sequence homology and on amino acid composition were compared and analyzed according to this approach. The coefficient of correlation between these classification methods and the procedure for estimating the robustness of the coefficient are discussed.


Subject(s)
Amino Acids/analysis , Protein Conformation , Protein Folding , Proteins/chemistry , Sequence Homology, Amino Acid , Amino Acid Sequence , Databases, Factual , Molecular Sequence Data , Software
9.
Bioinformatics ; 17(4): 349-58, 2001 Apr.
Article in English | MEDLINE | ID: mdl-11301304

ABSTRACT

MOTIVATION: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We studied this approach with new multi-class classification methods and examined many issues important for a practical recognition system. RESULTS: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training.


Subject(s)
Neural Networks, Computer , Protein Folding , Proteins/chemistry , Discriminant Analysis , Proteins/classification
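
The majority voting mentioned in the abstract of entry 9 can be sketched as follows. The abstract does not say how ties were broken or how scores were combined before voting, so the tie rule here (earliest-listed label wins) and the fold labels are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest-listed label.

    Raises ValueError if predictions is empty.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical fold labels predicted from three different parameter sets.
votes_per_protein = {
    "protein_1": ["TIM barrel", "TIM barrel", "Rossmann fold"],
    "protein_2": ["globin", "ferredoxin", "globin"],
}
consensus = {name: majority_vote(v) for name, v in votes_per_protein.items()}
```

Each classifier contributes one label per protein, so a single noisy parameter set is outvoted when the others agree.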
10.
Nucleic Acids Res ; 27(1): 301-2, 1999 Jan 01.
Article in English | MEDLINE | ID: mdl-9847209

ABSTRACT

A database of alternatively spliced genes (ASDB) has been constructed based on (i) the results of the analysis of Swiss-Prot entries containing products of these genes and (ii) a clustering procedure joining proteins that could arise by alternative splicing of the same gene. ASDB incorporates information about alternatively spliced genes, their products and expression patterns. It can be searched in order to find all products of alternative splicing produced in a particular tissue or a given organism, or all variants generated by a particular transcript. ASDB currently contains about 1700 protein sequences and can be accessed via the Internet at URL http://cbcg.nersc.gov/asdb


Subject(s)
Alternative Splicing , Databases, Factual , Protein Isoforms/genetics , Animals , Databases, Factual/trends , Humans , Information Storage and Retrieval , Internet , Mutation , Protein Isoforms/chemistry , Sequence Alignment
11.
Proteins ; 16(1): 79-91, 1993 May.
Article in English | MEDLINE | ID: mdl-8497486

ABSTRACT

An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4 alpha-helical bundles, (2) parallel (alpha/beta)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class.


Subject(s)
Models, Molecular , Protein Conformation , Protein Folding , Amino Acid Sequence , Databases, Factual , Neural Networks, Computer
12.
Article in English | MEDLINE | ID: mdl-9322023

ABSTRACT

This work demonstrates new techniques developed for the prediction of protein folding class in the context of the most comprehensive Structural Classification of Proteins (SCOP). The prediction method uses global descriptors of a protein in terms of the physical, chemical and structural properties of its constituent amino acids. Neural networks are utilized to combine these descriptors in a specific way to discriminate members of a given folding class from members of all other classes. It is shown that a specific amino acid's properties work completely differently on different folding classes. This creates the possibility of finding an individual set of descriptors that works best on a particular folding class.


Subject(s)
Artificial Intelligence , Protein Folding , Algorithms , Amino Acids/chemistry , Databases, Factual , Evaluation Studies as Topic , Neural Networks, Computer , Protein Conformation , Proteins/chemistry , Proteins/classification
13.
Microb Comp Genomics ; 3(3): 171-5, 1998.
Article in English | MEDLINE | ID: mdl-9775387

ABSTRACT

Analysis of DNA sequences of several microbial genomes has revealed that a large fraction of predicted coding regions has no known protein function. Information about the three-dimensional folds of these proteins may provide insight into their possible functions. To predict the folds for protein sequences with little or no homology to proteins of known function, we used computational neural networks trained on the database of proteins with known three-dimensional structures. Global descriptions of protein sequences based on physical and structural properties of the constituent amino acids were used as inputs for neural networks. Of the 131, 498, and 868 protein sequences of unknown function from Mycoplasma genitalium, Haemophilus influenzae, and Methanococcus jannaschii (Fleischmann et al. 1995), we have made high-confidence fold assignments for 4, 10, and 19 sequences, respectively.


Subject(s)
Bacterial Proteins/genetics , Protein Folding , Amino Acid Sequence , Computational Biology/classification , Databases, Factual/classification , Genome, Bacterial , Haemophilus influenzae/genetics , Methanococcus/genetics , Molecular Sequence Data , Mycoplasma/genetics
14.
Article in English | MEDLINE | ID: mdl-7584326

ABSTRACT

We have designed, trained and tested two types of neural networks for the prediction of protein folding pattern from sequence. Here we describe the differences in the networks and compare their performance on a variety of proteins. Both network representations are generally successful in predicting protein fold and can also be used together to confirm a prediction.


Subject(s)
Neural Networks, Computer , Protein Folding , Sequence Analysis/methods , Amino Acids/chemistry , Databases, Factual , Models, Molecular , Proteins/chemistry , Reproducibility of Results
15.
Proc Natl Acad Sci U S A ; 92(19): 8700-4, 1995 Sep 12.
Article in English | MEDLINE | ID: mdl-7568000

ABSTRACT

We present a method for predicting protein folding class based on global protein chain description and a voting process. Selection of the best descriptors was achieved by a computer-simulated neural network trained on a data base consisting of 83 folding classes. Protein-chain descriptors include overall composition, transition, and distribution of amino acid attributes, such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. The test shows that proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction of proteins as not belonging to a particular class (correct negative prediction) was 90-95% accurate. When tested on 254 structures used in this study, the top two predictions contained the correct class in 91% of the cases.


Subject(s)
Amino Acid Sequence , Computer Simulation , Models, Chemical , Protein Folding , Amino Acids/chemistry , Databases, Factual , Neural Networks, Computer , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Reproducibility of Results , Solvents
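
The composition, transition, and distribution descriptors named in the abstract of entry 15 can be sketched for one attribute. This is a simplified illustration, not the published descriptor set: the three-way hydrophobicity grouping below is an assumption (the abstract does not list residue classes), and the distribution part records only the first, middle, and last occurrence of each class.

```python
# Hypothetical three-way hydrophobicity grouping; the abstract does not list
# the residue classes, so this assignment is an assumption for illustration.
GROUPS = {
    "P": "RKEDQN",    # polar
    "N": "GASTPHY",   # neutral
    "H": "CLVIMFW",   # hydrophobic
}
CLASS_OF = {aa: g for g, members in GROUPS.items() for aa in members}


def ctd(sequence):
    """Return (composition, transition, distribution) for one attribute."""
    classes = [CLASS_OF[aa] for aa in sequence.upper() if aa in CLASS_OF]
    n = len(classes)
    if n < 2:
        raise ValueError("need at least two classifiable residues")
    labels = sorted(GROUPS)
    # Composition: fraction of residues in each class.
    composition = {g: classes.count(g) / n for g in labels}
    # Transition: fraction of adjacent pairs that switch between two classes.
    pairs = list(zip(classes, classes[1:]))
    transition = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            switches = sum(1 for x, y in pairs if {x, y} == {a, b})
            transition[a + b] = switches / (n - 1)
    # Distribution: relative chain position of the first, middle, and last
    # residue of each class (1-based positions divided by chain length).
    distribution = {}
    for g in labels:
        pos = [i + 1 for i, c in enumerate(classes) if c == g]
        if pos:
            picks = (pos[0], pos[len(pos) // 2], pos[-1])
            distribution[g] = tuple(p / n for p in picks)
    return composition, transition, distribution


comp, trans, dist = ctd("MKLVGAERD")  # made-up nine-residue peptide
```

The resulting fixed-length vector does not depend on chain length, which is what lets a neural network compare proteins of different sizes.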
16.
Proteins ; 35(4): 401-7, 1999 Jun 01.
Article in English | MEDLINE | ID: mdl-10382667

ABSTRACT

A computational method has been developed for the assignment of a protein sequence to a folding class in the Structural Classification of Proteins (SCOP). This method uses global descriptors of a primary protein sequence in terms of the physical, chemical, and structural properties of the constituent amino acids. Neural networks are utilized to combine these descriptors in a way to discriminate members of a given fold from members of all other folds. An extensive testing of the method has been performed to evaluate its prediction accuracy. The method is applicable for the fold assignment of any protein sequence with or without significant sequence homology to known proteins. A WWW page for predicting protein folds is available at URL http://cbcg.lbl.gov/.


Subject(s)
Protein Folding , Proteins/chemistry , Amino Acids/chemistry , Databases, Factual
17.
Nucleic Acids Res ; 28(1): 296-7, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592252

ABSTRACT

Version 2.1 of ASDB (Alternative Splicing Data Base) contains 1922 protein and 2486 DNA sequences. The protein entries from SWISS-PROT are joined into clusters corresponding to alternatively spliced variants of one gene. The DNA division consists of complete genes with alternative splicing mentioned or annotated in GenBank. The search engine allows one to search over SWISS-PROT and GenBank fields and then follow the links to all variants. The database can be accessed at the URL http://cbcg.nersc.gov/asdb


Subject(s)
Alternative Splicing/genetics , DNA/genetics , Databases, Factual , Proteins/chemistry , Internet , Proteins/genetics
18.
Article in English | MEDLINE | ID: mdl-10786312

ABSTRACT

We present an analysis of multi-aligned eukaryotic and procaryotic small subunit rRNA sequences using a novel segmentation and clustering procedure capable of extracting subsets of sequences that share common sequence features. This procedure consists of: i) segmentation of aligned sequences using a dynamic programming procedure, and subsequent identification of likely conserved segments; ii) for each putative conserved segment, extraction of a locally homogeneous cluster using a novel polynomial procedure; and iii) intersection of clusters associated with each conserved segment. Aside from their utility in processing large gap-filled multi-alignments, these algorithms can be applied to a broad spectrum of rRNA analysis functions such as subalignment, phylogenetic subtree extraction and construction, and organism tree-placement, and can serve as a framework to organize sequence data in an efficient and easily searchable manner. The sequence classification we obtained using the method presented here shows a remarkable consistency with the independently constructed eukaryotic phylogenetic tree.


Subject(s)
Cluster Analysis , Combinatorial Chemistry Techniques , RNA, Ribosomal/genetics , Sequence Analysis, RNA/methods , Algorithms , Animals , Eukaryota/genetics , Genes, Fungal , Genes, Protozoan , Models, Statistical , Phylogeny , RNA, Ribosomal, 18S/genetics
19.
J Theor Biol ; 212(2): 129-39, 2001 Sep 21.
Article in English | MEDLINE | ID: mdl-11531380

ABSTRACT

Automatic identification of sub-structures in multi-aligned sequences is of great importance for effective and objective structural/functional domain annotation, phylogenetic treeing and other molecular analyses. We present a segmentation algorithm that optimally partitions a given multi-alignment into a set of potentially biologically significant blocks, or segments. This algorithm applies dynamic programming and progressive optimization to the statistical profile of a multi-alignment in order to optimally demarcate relatively homogenous sub-regions. Using this algorithm, a large multi-alignment of eukaryotic 16S rRNA was analyzed. Three types of sequence patterns were identified automatically and efficiently: shared conserved domain; shared variable motif; and rare signature sequence. Results were consistent with the patterns identified through independent phylogenetic and structural approaches. This algorithm facilitates the automation of sequence-based molecular structural and evolutionary analyses through statistical modeling and high performance computation.


Subject(s)
Algorithms , Computational Biology/methods , Models, Genetic , Sequence Alignment , Animals , Conserved Sequence , RNA, Ribosomal, 16S
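
The idea in the abstract of entry 19, dynamic programming that partitions an alignment profile into homogeneous blocks, can be sketched with a generic stand-in. This is not the authors' algorithm: it uses a standard least-squares segmentation of a one-dimensional score profile into a fixed number of segments, and the per-column scores, the function name, and the choice of k are all assumptions.

```python
def segment(values, k):
    """Split values into k contiguous segments minimizing squared error.

    Returns a list of (start, end) half-open index pairs.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums allow O(1) cost evaluation for any segment.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        """Sum of squared deviations from the mean over values[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i
    # Trace back the segment boundaries from the end of the profile.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


# Made-up per-column conservation scores for a toy alignment.
scores = [0.9, 0.95, 0.9, 0.2, 0.25, 0.2, 0.85, 0.9]
blocks = segment(scores, 3)
```

The triple loop runs in O(k n^2) time, which is fine for a sketch but would need pruning for very long alignments.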
20.
Genome Res ; 10(9): 1304-6, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10984448

ABSTRACT

Human and mouse genomic sequence comparisons are being increasingly used to search for evolutionarily conserved gene regulatory elements. Large-scale human-mouse DNA comparison studies have discovered numerous conserved noncoding sequences of which only a fraction has been functionally investigated. A question therefore remains as to whether most of these noncoding sequences are conserved because of functional constraints or are the result of a lack of divergence time.


Subject(s)
Conserved Sequence/genetics , Sequence Alignment , Untranslated Regions/genetics , Animals , Dogs , Humans , Mice , Molecular Sequence Data , Species Specificity , Untranslated Regions/isolation & purification