1.
Environ Microbiol ; 21(2): 784-799, 2019 02.
Article in English | MEDLINE | ID: mdl-30536693

ABSTRACT

Bacterial genes for molybdenum-containing and tungsten-containing enzymes are often differentially regulated depending on the metal availability in the environment. Here, we describe a new family of transcription factors with an unusual DNA-binding domain related to excisionases of bacteriophages. These transcription factors are associated with genes for various molybdate and tungstate-specific transporting systems as well as molybdo/tungsto-enzymes in a wide range of bacterial genomes. We used a combination of computational and experimental techniques to study a member of the TF family, named TaoR (for tungsten-containing aldehyde oxidoreductase regulator). In Desulfovibrio vulgaris Hildenborough, a model bacterium for sulfate reduction studies, TaoR activates expression of aldehyde oxidoreductase aor and represses tungsten-specific ABC-type transporter tupABC genes under tungsten-replete conditions. TaoR binding sites at aor promoter were identified by electrophoretic mobility shift assay and DNase I footprinting. We also reconstructed TaoR regulons in 45 Deltaproteobacteria by comparative genomics approach and predicted target genes for TaoR family members in other Proteobacteria and Firmicutes.


Subjects
ATP-Binding Cassette Transporters/genetics; Bacterial Proteins/metabolism; Desulfovibrio vulgaris/genetics; Desulfovibrio vulgaris/metabolism; Molybdenum/metabolism; Transcription Factors/metabolism; Tungsten Compounds/metabolism; ATP-Binding Cassette Transporters/metabolism; Bacterial Proteins/genetics; Binding Sites; Biological Transport; Desulfovibrio vulgaris/isolation & purification; Gene Expression Regulation, Bacterial; Gene Expression Regulation, Enzymologic; Multigene Family; Promoter Regions, Genetic; Regulon; Transcription Factors/genetics
2.
J Bacteriol ; 197(1): 29-39, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25313388

ABSTRACT

Although the enzymes for dissimilatory sulfate reduction by microbes have been studied, the mechanisms for transcriptional regulation of the encoding genes remain unknown. In a number of bacteria the transcriptional regulator Rex has been shown to play a key role as a repressor of genes producing proteins involved in energy conversion. In the model sulfate-reducing microbe Desulfovibrio vulgaris Hildenborough, the gene DVU_0916 was observed to resemble other known Rex proteins. Therefore, the DVU_0916 protein has been predicted to be a transcriptional repressor of genes encoding proteins that function in the process of sulfate reduction in D. vulgaris Hildenborough. Examination of the deduced DVU_0916 protein identified two domains, one a winged helix DNA-binding domain common for transcription factors, and the other a Rossmann fold that could potentially interact with pyridine nucleotides. A deletion of the putative rex gene was made in D. vulgaris Hildenborough, and transcript expression studies of sat, encoding sulfate adenylyl transferase, showed increased levels in the D. vulgaris Hildenborough Rex (RexDvH) mutant relative to the parental strain. The RexDvH-binding site upstream of sat was identified, confirming RexDvH to be a repressor of sat. We established in vitro that the presence of elevated NADH disrupted the interaction between RexDvH and DNA. Examination of the 5' transcriptional start site for the sat mRNA revealed two unique start sites, one for respiring cells that correlated with the RexDvH-binding site and a second for fermenting cells. Collectively, these data support the role of RexDvH as a transcription repressor for sat that senses the redox status of the cell.


Subjects
Bacterial Proteins/metabolism; Desulfovibrio vulgaris/metabolism; Gene Expression Regulation, Enzymologic/physiology; NAD/metabolism; Sulfate Adenylyltransferase/metabolism; Bacterial Proteins/genetics; Base Sequence; Binding Sites; Desulfovibrio vulgaris/genetics; Gene Deletion; Gene Expression Regulation, Bacterial/physiology; Sulfate Adenylyltransferase/antagonists & inhibitors; Sulfate Adenylyltransferase/genetics
3.
Science ; 313(5793): 1596-604, 2006 Sep 15.
Article in English | MEDLINE | ID: mdl-16973872

ABSTRACT

We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.


Subjects
Gene Duplication; Genome, Plant; Populus/genetics; Sequence Analysis, DNA; Arabidopsis/genetics; Chromosome Mapping; Computational Biology; Evolution, Molecular; Expressed Sequence Tags; Gene Expression; Genes, Plant; Oligonucleotide Array Sequence Analysis; Phylogeny; Plant Proteins/chemistry; Plant Proteins/genetics; Polymorphism, Single Nucleotide; Populus/growth & development; Populus/metabolism; Protein Structure, Tertiary; RNA, Plant/analysis; RNA, Untranslated/analysis
4.
J Theor Biol ; 212(2): 129-39, 2001 Sep 21.
Article in English | MEDLINE | ID: mdl-11531380

ABSTRACT

Automatic identification of sub-structures in multi-aligned sequences is of great importance for effective and objective structural/functional domain annotation, phylogenetic treeing and other molecular analyses. We present a segmentation algorithm that optimally partitions a given multi-alignment into a set of potentially biologically significant blocks, or segments. This algorithm applies dynamic programming and progressive optimization to the statistical profile of a multi-alignment in order to optimally demarcate relatively homogenous sub-regions. Using this algorithm, a large multi-alignment of eukaryotic 16S rRNA was analyzed. Three types of sequence patterns were identified automatically and efficiently: shared conserved domain; shared variable motif; and rare signature sequence. Results were consistent with the patterns identified through independent phylogenetic and structural approaches. This algorithm facilitates the automation of sequence-based molecular structural and evolutionary analyses through statistical modeling and high performance computation.


Subjects
Algorithms; Computational Biology/methods; Models, Genetic; Sequence Alignment; Animals; Conserved Sequence; RNA, Ribosomal, 16S
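
The dynamic-programming segmentation described in entry 4 can be illustrated with a minimal sketch. It assumes the multi-alignment has already been reduced to one score per column (for example, a conservation value) and uses the within-segment squared deviation as the homogeneity cost. Both choices, along with the names `segment_profile` and `k`, are illustrative assumptions and not the authors' published algorithm, which the abstract does not specify in detail.

```python
def segment_profile(profile, k):
    """Split `profile` into k contiguous segments minimizing the total
    within-segment squared deviation from each segment's mean.
    Returns a list of (start, end) index pairs, end exclusive."""
    n = len(profile)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(profile)")

    # Prefix sums let us compute any segment's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(profile):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Sum of squared deviations of profile[i:j] from its mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting profile[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Trace the segment boundaries back from the end of the profile.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]
```

This plain version runs in O(k·n²) time, which is adequate for a sketch; the number of segments `k` is fixed by the caller here, whereas a practical tool would likely choose it from the data.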
5.
Nucleic Acids Res ; 29(19): 3928-38, 2001 Oct 01.
Article in English | MEDLINE | ID: mdl-11574674

ABSTRACT

Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing experiments for bacteria and 90-99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.


Subjects
Computational Biology/methods; Genes, Archaeal; Genes, Bacterial; RNA, Untranslated/genetics; Escherichia coli/genetics; Forecasting; Genome, Archaeal; Genome, Bacterial; Neural Networks, Computer; RNA, Messenger/genetics
6.
Nucleic Acids Res ; 29(11): 2338-48, 2001 Jun 01.
Article in English | MEDLINE | ID: mdl-11376152

ABSTRACT

Alternative pre-mRNA splicing is a major cellular process by which functionally diverse proteins can be generated from the primary transcript of a single gene, often in tissue-specific patterns. The current study investigates the hypothesis that splicing of tissue-specific alternative exons is regulated in part by control sequences in adjacent introns and that such elements may be recognized via computational analysis of exons sharing a highly specific expression pattern. We have identified 25 brain-specific alternative cassette exons, compiled a dataset of genomic sequences encompassing these exons and their adjacent introns and used word contrast algorithms to analyze key features of these nucleotide sequences. By comparison to a control group of constitutive exons, brain-specific exons were often found to possess the following: divergent 5' splice sites; highly pyrimidine-rich upstream introns; a paucity of GGG motifs in the downstream intron; a highly statistically significant over-representation of the hexanucleotide UGCAUG in the proximal downstream intron. UGCAUG was also found at a high frequency downstream of a smaller group of muscle-specific exons. Intriguingly, UGCAUG has been identified previously in a few intron splicing enhancers. Our results indicate that this element plays a much wider role than previously appreciated in the regulated tissue-specific splicing of many alternative exons.


Subjects
Alternative Splicing; Brain/metabolism; Introns/genetics; RNA Precursors/genetics; Regulatory Sequences, Nucleic Acid; Algorithms; Base Sequence; DNA/genetics; Exons/genetics; Genes/genetics; Humans
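
The kind of word contrast described in entry 6 (finding hexanucleotides such as UGCAUG that are over-represented near one group of exons relative to a control group) can be sketched as follows. The scoring rule used here, a smoothed ratio of the fraction of sequences containing each k-mer, and the function names are assumptions made for illustration; the abstract does not state which statistics the authors applied.

```python
from collections import Counter

def kmer_presence(seqs, k=6):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper().replace("T", "U")  # treat DNA and RNA alike
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def contrast(test_seqs, control_seqs, k=6, pseudocount=1.0):
    """Rank k-mers by the ratio of their smoothed presence frequency in
    the test sequences to that in the control sequences."""
    test = kmer_presence(test_seqs, k)
    ctrl = kmer_presence(control_seqs, k)
    scores = {}
    for kmer in test:
        f_test = (test[kmer] + pseudocount) / (len(test_seqs) + 2 * pseudocount)
        f_ctrl = (ctrl[kmer] + pseudocount) / (len(control_seqs) + 2 * pseudocount)
        scores[kmer] = f_test / f_ctrl
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Called with a list of downstream-intron sequences and a list of control sequences, `contrast` returns (k-mer, score) pairs with the most enriched word first. A real analysis would also need a significance test and a correction for base composition.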
7.
Bioinformatics ; 17(4): 349-58, 2001 Apr.
Article in English | MEDLINE | ID: mdl-11301304

ABSTRACT

MOTIVATION: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. RESULTS: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training.


Subjects
Neural Networks, Computer; Protein Folding; Proteins/chemistry; Discriminant Analysis; Proteins/classification
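
The all-against-all scheme named in entry 7 trains one binary classifier per pair of classes and lets them vote. A minimal sketch of the voting step follows. The binary classifiers here are toy stand-ins (nearest of two made-up one-dimensional centres) where a trained SVM or neural network would normally sit, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def all_against_all_predict(x, classes, pairwise):
    """Predict a class by majority vote over one binary classifier per
    pair of classes. `pairwise[(a, b)]` is a callable returning a or b."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    # Ties are broken by the order of `classes`.
    return max(classes, key=lambda c: votes[c])

# Toy stand-ins for trained classifiers: each pair picks the class whose
# (made-up) one-dimensional centre is closer to x.
centres = {"alpha": 0.0, "beta": 5.0, "alpha/beta": 10.0}
classes = list(centres)
pairwise = {
    (a, b): (lambda x, a=a, b=b:
             a if abs(x - centres[a]) <= abs(x - centres[b]) else b)
    for a, b in combinations(classes, 2)
}
```

With N classes this needs N(N-1)/2 binary classifiers, compared with N for one-against-others, but each pairwise problem is smaller and no single classifier has to separate one fold from all the rest.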
8.
Genome Res ; 10(9): 1304-6, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10984448

ABSTRACT

Human and mouse genomic sequence comparisons are being increasingly used to search for evolutionarily conserved gene regulatory elements. Large-scale human-mouse DNA comparison studies have discovered numerous conserved noncoding sequences, of which only a fraction has been functionally investigated. A question therefore remains as to whether most of these noncoding sequences are conserved because of functional constraints or are the result of a lack of divergence time.


Subjects
Conserved Sequence/genetics; Sequence Alignment; Untranslated Regions/genetics; Animals; Dogs; Humans; Mice; Molecular Sequence Data; Species Specificity; Untranslated Regions/isolation & purification
9.
Nucleic Acids Res ; 28(1): 296-7, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592252

ABSTRACT

Version 2.1 of ASDB (Alternative Splicing Data Base) contains 1922 protein and 2486 DNA sequences. The protein entries from SWISS-PROT are joined into clusters corresponding to alternatively spliced variants of one gene. The DNA division consists of complete genes with alternative splicing mentioned or annotated in GenBank. The search engine allows one to search over SWISS-PROT and GenBank fields and then follow the links to all variants. The database can be accessed at the URL http://cbcg.nersc.gov/asdb


Subjects
Alternative Splicing/genetics; DNA/genetics; Databases, Factual; Proteins/chemistry; Internet; Proteins/genetics
10.
Bioinformatics ; 16(11): 1046-7, 2000 Nov.
Article in English | MEDLINE | ID: mdl-11159318

ABSTRACT

SUMMARY: VISTA is a program for visualizing global DNA sequence alignments of arbitrary length. It has a clean output, allowing for easy identification of similarity, and is easily configurable, enabling the visualization of alignments of various lengths at different levels of resolution. It is currently available on the web, thus allowing for easy access by all researchers. AVAILABILITY: VISTA server is available on the web at http://www-gsd.lbl.gov/vista. The source code is available upon request. CONTACT: vista@lbl.gov


Subjects
DNA/genetics; Sequence Alignment/statistics & numerical data; Software; Animals; Computational Biology; Humans; Internet; Mice; Rabbits
11.
J Comput Biol ; 7(6): 849-62, 2000.
Article in English | MEDLINE | ID: mdl-11382366

ABSTRACT

This work addresses the issues of data representation and incorporation of domain knowledge into the design of learning systems for reasoning about protein families. Given the limited expressive capacity of a particular method, a mixture of protein annotation and fold recognition experts, each implementing a different underlying representation, should provide a robust method for assigning sequences to families. These ideas are illustrated using two data-driven learning methods that make use of different prior information and employ independent, yet complementary, projections of a family: hidden Markov models (HMMs) based on a multiple sequence alignment and neural networks (NNs) based on global sequence descriptors of proteins. Examination of seven protein families indicates that combining a generative (HMM) and a discriminative (NN) method is better than either method on its own. Biologically, human 4-hydroxyphenylpyruvic acid dioxygenase, involved in tyrosinemia type 3, is predicted to be structurally and functionally related to the glyoxalase I family.


Subjects
Models, Molecular; Protein Folding; Proteins/chemistry; Amino Acid Sequence; Markov Chains; Molecular Sequence Data; Protein Structure, Tertiary; Proteins/metabolism; Sequence Alignment/methods
12.
Proteins ; 35(4): 401-7, 1999 Jun 01.
Article in English | MEDLINE | ID: mdl-10382667

ABSTRACT

A computational method has been developed for the assignment of a protein sequence to a folding class in the Structural Classification of Proteins (SCOP). This method uses global descriptors of a primary protein sequence in terms of the physical, chemical, and structural properties of the constituent amino acids. Neural networks are utilized to combine these descriptors in a way to discriminate members of a given fold from members of all other folds. An extensive testing of the method has been performed to evaluate its prediction accuracy. The method is applicable for the fold assignment of any protein sequence with or without significant sequence homology to known proteins. A WWW page for predicting protein folds is available at URL http://cbcg.lbl.gov/.


Subjects
Protein Folding; Proteins/chemistry; Amino Acids/chemistry; Databases, Factual
13.
Nucleic Acids Res ; 27(1): 301-2, 1999 Jan 01.
Article in English | MEDLINE | ID: mdl-9847209

ABSTRACT

A database of alternatively spliced genes (ASDB) has been constructed based on (i) the results of the analysis of Swiss-Prot entries containing products of these genes and (ii) clustering procedure joining proteins that could arise by alternative splicing of the same gene. ASDB incorporates information about alternatively spliced genes, their products and expression patterns. It can be searched in order to find all products of alternative splicing produced in a particular tissue or a given organism, or all variants generated by a particular transcript. ASDB currently contains about 1700 protein sequences and can be accessed via the Internet at URL http://cbcg.nersc.gov/asdb


Subjects
Alternative Splicing; Databases, Factual; Protein Isoforms/genetics; Animals; Databases, Factual/trends; Humans; Information Storage and Retrieval; Internet; Mutation; Protein Isoforms/chemistry; Sequence Alignment
14.
Article in English | MEDLINE | ID: mdl-10786312

ABSTRACT

We present an analysis of multi-aligned eukaryotic and prokaryotic small subunit rRNA sequences using a novel segmentation and clustering procedure capable of extracting subsets of sequences that share common sequence features. This procedure consists of: i) segmentation of aligned sequences using a dynamic programming procedure, and subsequent identification of likely conserved segments; ii) for each putative conserved segment, extraction of a locally homogeneous cluster using a novel polynomial procedure; and iii) intersection of clusters associated with each conserved segment. Aside from their utility in processing large gap-filled multi-alignments, these algorithms can be applied to a broad spectrum of rRNA analysis functions such as subalignment, phylogenetic subtree extraction and construction, and organism tree-placement, and can serve as a framework to organize sequence data in an efficient and easily searchable manner. The sequence classification we obtained using the method presented here shows a remarkable consistency with the independently constructed eukaryotic phylogenetic tree.


Subjects
Cluster Analysis; Combinatorial Chemistry Techniques; RNA, Ribosomal/genetics; Sequence Analysis, RNA/methods; Algorithms; Animals; Eukaryota/genetics; Genes, Fungal; Genes, Protozoan; Models, Statistical; Phylogeny; RNA, Ribosomal, 18S/genetics
15.
Microb Comp Genomics ; 3(3): 171-5, 1998.
Article in English | MEDLINE | ID: mdl-9775387

ABSTRACT

Analysis of DNA sequences of several microbial genomes has revealed that a large fraction of predicted coding regions has no known protein function. Information about the three-dimensional folds of these proteins may provide insight into their possible functions. To predict the folds for protein sequences with little or no homology to proteins of known function, we used computational neural networks trained on the database of proteins with known three-dimensional structures. Global descriptions of protein sequences based on physical and structural properties of the constituent amino acids were used as inputs for neural networks. Of the 131, 498, and 868 protein sequences of unknown function from Mycoplasma genitalium, Haemophilus influenzae, and Methanococcus jannaschii (Fleischmann et al. 1995), we have made high-confidence fold assignments for 4, 10, and 19 sequences, respectively.


Subjects
Bacterial Proteins/genetics; Protein Folding; Amino Acid Sequence; Computational Biology/classification; Databases, Factual/classification; Genome, Bacterial; Haemophilus influenzae/genetics; Methanococcus/genetics; Molecular Sequence Data; Mycoplasma/genetics
16.
Article in English | MEDLINE | ID: mdl-9322023

ABSTRACT

This work demonstrates new techniques developed for the prediction of protein folding class in the context of the most comprehensive Structural Classification of Proteins (SCOP). The prediction method uses global descriptors of a protein in terms of the physical, chemical and structural properties of its constituent amino acids. Neural networks are utilized to combine these descriptors in a specific way to discriminate members of a given folding class from members of all other classes. It is shown that a specific amino acid's properties work completely differently on different folding classes. This creates the possibility of finding an individual set of descriptors that works best on a particular folding class.


Subjects
Artificial Intelligence; Protein Folding; Algorithms; Amino Acids/chemistry; Databases, Factual; Evaluation Studies as Topic; Neural Networks, Computer; Protein Conformation; Proteins/chemistry; Proteins/classification
17.
Proc Natl Acad Sci U S A ; 92(19): 8700-4, 1995 Sep 12.
Article in English | MEDLINE | ID: mdl-7568000

ABSTRACT

We present a method for predicting protein folding class based on global protein chain description and a voting process. Selection of the best descriptors was achieved by a computer-simulated neural network trained on a data base consisting of 83 folding classes. Protein-chain descriptors include overall composition, transition, and distribution of amino acid attributes, such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. The test shows that proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction of proteins as not belonging to a particular class (correct negative prediction) was 90-95% accurate. When tested on 254 structures used in this study, the top two predictions contained the correct class in 91% of the cases.


Subjects
Amino Acid Sequence; Computer Simulation; Models, Chemical; Protein Folding; Amino Acids/chemistry; Databases, Factual; Neural Networks, Computer; Protein Structure, Secondary; Proteins/chemistry; Proteins/classification; Reproducibility of Results; Solvents
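
The composition, transition, and distribution descriptors named in entry 17 can be illustrated for a single amino acid attribute. The three-way hydrophobicity grouping below and the exact definitions used (class fractions, fraction of adjacent residue pairs that switch between two classes, and relative positions of the first and last residue of each class) are simplifying assumptions for illustration, not the parameters used in the paper.

```python
# Hypothetical three-class hydrophobicity grouping (an assumption).
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}

def encode(seq):
    """Map each residue to its class label; unknown residues are skipped."""
    lookup = {aa: name for name, aas in GROUPS.items() for aa in aas}
    return [lookup[aa] for aa in seq.upper() if aa in lookup]

def ctd(seq):
    """Return (composition, transition, distribution) dictionaries."""
    labels = encode(seq)
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two recognised residues")
    names = list(GROUPS)

    # Composition: fraction of residues in each class.
    composition = {c: labels.count(c) / n for c in names}

    # Transition: fraction of adjacent pairs that switch between two classes.
    transition = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            switches = sum(
                1 for x, y in zip(labels, labels[1:]) if {x, y} == {a, b}
            )
            transition[(a, b)] = switches / (n - 1)

    # Distribution: relative position of the first and last residue of
    # each class, or (0.0, 0.0) if the class is absent.
    distribution = {}
    for c in names:
        pos = [i + 1 for i, lab in enumerate(labels) if lab == c]
        distribution[c] = (pos[0] / n, pos[-1] / n) if pos else (0.0, 0.0)
    return composition, transition, distribution
```

Flattening the three dictionaries gives a fixed-length vector regardless of sequence length, which is what makes descriptors of this kind usable as neural network inputs.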
18.
Article in English | MEDLINE | ID: mdl-7584443

ABSTRACT

A method for quantitative comparison of two classification rules applied to the protein folding problem is presented. Classifications of proteins based on sequence homology and on amino acid composition were compared and analyzed according to this approach. The coefficient of correlation between these classification methods and the procedure for estimating the robustness of the coefficient are discussed.


Subjects
Amino Acids/analysis; Protein Conformation; Protein Folding; Proteins/chemistry; Sequence Homology, Amino Acid; Amino Acid Sequence; Databases, Factual; Molecular Sequence Data; Software
19.
Biotechniques ; 14(6): 984-9, 1993 Jun.
Article in English | MEDLINE | ID: mdl-8333967

ABSTRACT

A computer program, PROBE, has been designed for the prediction of protein structural features from amino acid sequence. This program integrates a variety of computer-simulated neural networks, each predicting an aspect of protein structure, into a single, easy-to-use package. The surface accessibility of each residue, the presence of disulfide bonds, the overall secondary structure composition and the residue secondary structures, including beta-turn type, are predicted. In addition, the overall amino acid composition and relative hydrophobicity are used to determine whether a protein belongs to one of four common folding motifs. PROBE is able to compare and synergistically improve the predictions by allowing communication between the different networks.


Subjects
Neural Networks, Computer; Protein Structure, Secondary; Software; Amino Acid Sequence; Molecular Sequence Data
20.
Proteins ; 16(1): 79-91, 1993 May.
Article in English | MEDLINE | ID: mdl-8497486

ABSTRACT

An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer-simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4 alpha-helical bundles, (2) parallel (alpha/beta)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class.


Subjects
Models, Molecular; Protein Conformation; Protein Folding; Amino Acid Sequence; Databases, Factual; Neural Networks, Computer