Results 1 - 20 of 39
1.
Nucleic Acids Res ; 36(Database issue): D196-201, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18158298

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes, as well as the classification of individual sequences (SIMAP, PEDANT and FunCat), and (iii) the compilation of manually curated databases for protein interactions, based on scrutinized information from the literature, to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described, as well as the detailed descriptions of our projects, can be accessed through the MIPS web server (http://mips.gsf.de).


Subjects
Databases, Protein , Fungal Proteins/chemistry , Fungal Proteins/genetics , Plant Proteins/chemistry , Plant Proteins/genetics , Fungal Proteins/metabolism , Genome, Fungal , Genome, Plant , Genomics , Internet , Plant Proteins/metabolism , Protein Interaction Mapping , Sequence Analysis, Protein , Software , User-Computer Interface
2.
Nucleic Acids Res ; 34(Database issue): D169-72, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381839

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained, several of which are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system is maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene-centered information to information on interactions within a generic, comprehensive framework. We have concentrated our efforts along three lines: (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information, enabling the representation of complex information such as functional modules or networks (Genome Research Environment System), (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix, and the CABiNet network analysis framework, and (iii) the compilation and manual annotation of information related to interactions, such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).


Subjects
Databases, Genetic , Genomics , Proteins/genetics , Animals , Computational Biology/methods , Evolution, Molecular , Internet , Mice , Models, Genetic , Protein Interaction Mapping , User-Computer Interface
3.
Nucleic Acids Res ; 30(1): 31-4, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752246

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project-specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).


Subjects
Databases, Genetic , Databases, Protein , Genome , Amino Acid Sequence , Arabidopsis/genetics , Base Sequence , Expressed Sequence Tags , Genome, Fungal , Genome, Human , Genome, Plant , Germany , Humans , Internet , Mitochondrial Proteins/genetics , Neurospora crassa/genetics , Yeasts/genetics
4.
Nucleic Acids Res ; 32(Database issue): D41-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681354

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is the systematic organization of sequence-related attributes gathered by a variety of algorithms, primary information from experimental data, and information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database and the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes), are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).


Subjects
Databases, Protein , Genome , Proteomics , Animals , Computational Biology , DNA, Complementary/genetics , Fungi/genetics , Humans , Internet , Models, Biological , Protein Binding , Sequence Homology
5.
BMC Bioinformatics ; 6: 266, 2005 Nov 07.
Article in English | MEDLINE | ID: mdl-16274476

ABSTRACT

BACKGROUND: Alternative splicing is a major mechanism of generating protein diversity in higher eukaryotes. Although at least half, and probably more, of mammalian genes are alternatively spliced, it was not clear whether the frequency of alternative splicing is the same in different functional categories. The problem is obscured by uneven coverage of genes by ESTs and a large number of artifacts in the EST data. RESULTS: We have developed a method that generates possible mRNA isoforms for human genes contained in the EDAS database, taking into account the effects of nonsense-mediated decay and translation initiation rules, and a procedure for offsetting the effects of uneven EST coverage. We then computed the number of mRNA isoforms for genes from different functional categories. Genes encoding ribosomal proteins and genes in the category "Small GTPase-mediated signal transduction" tend to have fewer isoforms than the average, whereas genes in the category "DNA replication and chromosome cycle" have more isoforms than the average. Genes encoding proteins involved in protein-protein interactions tend to be alternatively spliced more often than genes encoding non-interacting proteins, although there is no significant difference in the number of isoforms of alternatively spliced genes. CONCLUSION: Filtering for functional isoforms satisfying biological constraints and accounting for uneven EST coverage allowed us to describe differences in alternative splicing of genes from different functional categories. The observations seem to be consistent with expectations based on current biological knowledge: fewer isoforms for ribosomal and signal transduction proteins, and more alternative splicing of interacting and cell cycle proteins.
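The isoform-filtering step can be illustrated with a minimal Python sketch. It assumes two simplified rules that the abstract does not spell out: translation starts at the first ATG (a first-AUG rule), and an isoform is treated as a nonsense-mediated decay (NMD) target when its stop codon ends more than 50 nt upstream of the last exon-exon junction (the commonly cited 50-55 nt rule). The function names and the threshold are hypothetical, not taken from the EDAS pipeline.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(mrna: str) -> Optional[Tuple[int, int]]:
    """First-ATG rule (an assumption): return (start, stop_start) for the ORF
    beginning at the first ATG, or None if there is no ATG or no in-frame stop."""
    start = mrna.find("ATG")
    if start < 0:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i
    return None


def is_nmd_target(exon_lengths: List[int], stop_start: int, threshold: int = 50) -> bool:
    """True if the stop codon ends more than `threshold` nt upstream of the
    last exon-exon junction (0-based mRNA coordinates)."""
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no junction
    last_junction = sum(exon_lengths[:-1])
    return last_junction - (stop_start + 3) > threshold


def is_functional_isoform(mrna: str, exon_lengths: List[int], threshold: int = 50) -> bool:
    """Keep an isoform only if it has an ORF and escapes the assumed NMD rule."""
    orf = first_orf(mrna)
    if orf is None:
        return False
    return not is_nmd_target(exon_lengths, orf[1], threshold)
```

For example, with `mrna = "CC" + "ATG" + "AAA" * 3 + "TAA" + "G" * 100` the ORF spans positions 2-16; the exon structure `[20, 97]` keeps the isoform, whereas `[90, 27]` places the stop codon 73 nt upstream of the last junction and flags it as an NMD target.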


Subjects
Algorithms , Alternative Splicing/physiology , Chromosome Mapping/methods , Codon, Initiator , Computers, Molecular , Humans , Protein Biosynthesis , Protein Isoforms/classification , RNA, Messenger/chemistry , RNA, Messenger/classification , Software
6.
Prog Biophys Mol Biol ; 72(1): 1-17, 1999.
Article in English | MEDLINE | ID: mdl-10446500

ABSTRACT

Spectacular achievements in whole genome sequencing open up new possibilities for structural research. Protein structures can now be studied in their natural genomic context. On the other hand, structure prediction algorithms can be improved using species-specific tendencies in folding patterns. Finally, efficient strategies to select targets for structure determination can be devised. In this review we consider new computational approaches and results in protein structure analysis stemming from the availability of complete genomes.


Subjects
Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Bacteria/genetics , Fungi/genetics , Internet , Models, Molecular , Molecular Structure , Protein Folding , Sequence Analysis
7.
J Mol Biol ; 228(3): 951-62, 1992 Dec 05.
Article in English | MEDLINE | ID: mdl-1469726

ABSTRACT

A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps. (1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with a preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment. (2) The M most conserved regions are regarded as motifs, and the oligopeptides within each are used to train M individual neural networks intensively. (3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with the strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with the highest similarity scores can then be used to retrain the M neural nets, which can subsequently be used for further searches in the databank, thus providing even greater sensitivity for recognizing distantly related members of a protein family. This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families.
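Step (1), the leave-one-out conservation measure, can be sketched in Python. To keep the example short and self-contained, it swaps the neural networks for a much simpler scorer, a position-specific frequency profile with pseudocounts and a uniform background, and it assumes an ungapped alignment of equal-length strings. Only the jack-knife structure and the sliding window mirror the procedure described above; all names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # uniform background frequency (assumption)


def profile_score(train: List[str], query: str, pseudocount: float = 1.0) -> float:
    """Log-odds score of `query` against a position-specific frequency profile
    built from `train` (equal-length strings). Stand-in for a trained network."""
    total = len(train) + pseudocount * len(AMINO_ACIDS)
    score = 0.0
    for pos, aa in enumerate(query):
        counts = Counter(seq[pos] for seq in train)
        p = (counts[aa] + pseudocount) / total
        score += math.log(p / BACKGROUND)
    return score


def jackknife_conservation(window: List[str]) -> float:
    """Average held-out score over N trials: train on N-1 spans, test on the i-th."""
    n = len(window)
    held_out = [profile_score(window[:i] + window[i + 1:], window[i]) for i in range(n)]
    return sum(held_out) / n


def rank_windows(alignment: List[str], width: int) -> List[Tuple[float, int]]:
    """Score every span of `width` columns; return (conservation, start) pairs,
    most conserved first."""
    length = len(alignment[0])
    ranked = []
    for start in range(length - width + 1):
        window = [seq[start:start + width] for seq in alignment]
        ranked.append((jackknife_conservation(window), start))
    return sorted(ranked, reverse=True)
```

Taking `rank_windows(alignment, width)[:M]` then gives the M most conserved windows that step (2) would use as motifs.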


Subjects
Neural Networks, Computer , Proteins/genetics , Sequence Alignment/methods , Aldehyde Dehydrogenase/genetics , Amino Acid Sequence , Computers , Conserved Sequence , DNA Nucleotidyltransferases/genetics , DNA-Directed DNA Polymerase/genetics , Databases, Factual , Immunoglobulins/genetics , Integrases , Molecular Sequence Data , Sequence Homology, Amino Acid , Software
8.
J Mol Biol ; 311(4): 639-56, 2001 Aug 24.
Article in English | MEDLINE | ID: mdl-11518521

ABSTRACT

We describe a computational approach for finding genes that are functionally related but do not possess any noticeable sequence similarity. Our method, which we call SNAP (similarity-neighborhood approach), reveals the conservation of gene order on bacterial chromosomes based on both cross-genome comparison and context information. The novel feature of this method is that it does not rely on detection of conserved colinear gene strings. Instead, we introduce the notion of a similarity-neighborhood graph (SN-graph), which is constructed from the chains of similarity and neighborhood relationships between orthologous genes in different genomes and adjacent genes in the same genome, respectively. An SN-cycle is defined as a closed path on the SN-graph and is postulated to preferentially join functionally related gene products that participate in the same biochemical or regulatory process. We demonstrate the substantial non-randomness and functional significance of SN-cycles derived from real genome data and estimate the prediction accuracy of SNAP in assigning broad function to uncharacterized proteins. Examples of practical application of SNAP for improving the quality of genome annotation are described.
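The SN-graph construction and a search for SN-cycles can be sketched in Python. This is an illustration under stated assumptions, not the SNAP implementation. Genes are encoded as hypothetical `(genome, position)` pairs. Neighborhood (N) edges join adjacent positions on one chromosome, and similarity (S) edges come from a supplied list of ortholog pairs. A cycle is required to alternate S and N edges, which is our reading of "chains of similarity and neighborhood relationships".

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Gene = Tuple[str, int]  # hypothetical encoding: (genome name, gene position)


def build_sn_graph(genomes: Dict[str, int],
                   similar: List[Tuple[Gene, Gene]]) -> Dict[Gene, Set[Tuple[Gene, str]]]:
    """Undirected graph: node -> set of (neighbor, edge type 'N' or 'S')."""
    adj: Dict[Gene, Set[Tuple[Gene, str]]] = defaultdict(set)
    for name, n_genes in genomes.items():
        for i in range(n_genes - 1):  # adjacent genes in the same genome
            a, b = (name, i), (name, i + 1)
            adj[a].add((b, "N"))
            adj[b].add((a, "N"))
    for a, b in similar:  # orthologous genes in different genomes
        adj[a].add((b, "S"))
        adj[b].add((a, "S"))
    return adj


def sn_cycles_through(adj: Dict[Gene, Set[Tuple[Gene, str]]], start: Gene,
                      max_len: int = 6) -> List[List[Gene]]:
    """Simple closed paths through `start` whose edges alternate S and N,
    with at most `max_len` edges. Each cycle is reported once per direction."""
    cycles: List[List[Gene]] = []

    def dfs(node: Gene, path: List[Gene], last_type: str) -> None:
        for nxt, etype in sorted(adj[node]):
            if etype == last_type:
                continue  # enforce alternation of edge types
            if nxt == start:
                # closing edge gives len(path) edges; an even count keeps alternation
                if len(path) >= 4 and len(path) % 2 == 0:
                    cycles.append(list(path))
                continue
            if nxt in path or len(path) >= max_len:
                continue
            path.append(nxt)
            dfs(nxt, path, etype)
            path.pop()

    dfs(start, [start], "")
    return cycles
```

With two three-gene genomes A and B and orthologs A0~B0 and A1~B1, the only SN-cycle through A0 is A0-A1-B1-B0, found once in each direction; a gene such as A2 with no second similarity link lies on no SN-cycle.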


Subjects
Bacteria/genetics , Gene Order/genetics , Genes, Bacterial/genetics , Genome, Bacterial , Genomics/methods , Algorithms , Bacteria/metabolism , Computational Biology/methods , Conserved Sequence/genetics , Databases as Topic , Multigene Family/genetics
9.
Gene ; 234(2): 257-65, 1999 Jul 08.
Article in English | MEDLINE | ID: mdl-10395898

ABSTRACT

Exact mapping of gene starts is an important problem in the computer-assisted functional analysis of newly sequenced prokaryotic genomes. We describe an algorithm for finding ribosomal binding sites without a learning sample. This algorithm is particularly useful for analysis of genomes with little or no experimentally mapped genes. There is a clear correlation between the ribosomal binding site (RBS) properties of a given genome and the potential gene start prediction accuracy. This correlation is of considerable predictive power and may be useful for estimating the expected success of future genome analysis efforts. We also demonstrate that the RBS properties depend on the phylogenetic position of a genome.
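A simple way to look for a ribosomal binding site motif without a training set of experimentally mapped genes is to compare k-mer frequencies in the regions just upstream of putative gene starts with their frequencies in background sequence. The Python sketch below does only that. It is a deliberately simplified stand-in, not the algorithm of the paper, and the upstream windows, k, and pseudocount are all assumptions.

```python
from collections import Counter
from typing import List, Tuple


def kmer_presence(seqs: List[str], k: int) -> Counter:
    """Count, for each k-mer, the number of sequences that contain it."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def top_upstream_motifs(upstream: List[str], background: List[str],
                        k: int = 4, top: int = 3) -> List[Tuple[float, str]]:
    """Rank k-mers by the ratio of their frequency in upstream windows
    to their (pseudocounted) frequency in background sequences."""
    up = kmer_presence(upstream, k)
    bg = kmer_presence(background, k)
    scored = []
    for kmer, cnt in up.items():
        f_up = cnt / len(upstream)
        f_bg = (bg[kmer] + 1) / (len(background) + 1)  # pseudocount avoids division by zero
        scored.append((f_up / f_bg, kmer))
    scored.sort(reverse=True)
    return scored[:top]
```

On toy upstream windows that each contain a Shine-Dalgarno-like `GGAGG`, the top-ranked 4-mers are `GGAG` and `GAGG`. In a real genome, the candidate starts would come from an initial gene prediction rather than from mapped genes.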


Subjects
Genes, Bacterial/genetics , Algorithms , Base Sequence , Binding Sites , Codon, Initiator/genetics , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Evolution, Molecular , Phylogeny , RNA, Bacterial/genetics , RNA, Ribosomal/genetics , Reproducibility of Results , Ribosomes/metabolism , Sequence Alignment , Software
10.
Mol Biol (Mosk) ; 24(5): 1241-5, 1990.
Article in Russian | MEDLINE | ID: mdl-2127071

ABSTRACT

Local homology was found between a conserved region of the family of alpha-subunits of GTP-binding proteins and the ganglioside-binding site of influenza virus hemagglutinins. Both families of proteins have similar patterns of distribution of hydrophilic and hydrophobic amino acid residues. GTP-binding proteins and hemagglutinins are proposed to share a common molecular mechanism underlying their attachment to the cell membrane.
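Comparing patterns of hydrophilic and hydrophobic residues, rather than the residues themselves, can be sketched in Python by reducing each sequence to a binary H/P string and scanning for the best-matching ungapped windows. The sketch assumes the Kyte-Doolittle hydropathy scale with a simple sign cutoff. The abstract does not state which scale or window length the authors used.

```python
from typing import Tuple

# Kyte-Doolittle hydropathy values (the scale is an assumption for this sketch)
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hp_pattern(seq: str) -> str:
    """Map each residue to H (positive hydropathy) or P (hydrophilic)."""
    return "".join("H" if KYTE_DOOLITTLE[aa] > 0 else "P" for aa in seq.upper())


def best_pattern_match(a: str, b: str, width: int) -> Tuple[float, int, int]:
    """Return (fraction of matching H/P classes, start in a, start in b)
    for the best ungapped pair of windows of length `width`."""
    pa, pb = hp_pattern(a), hp_pattern(b)
    best = (0.0, -1, -1)
    for i in range(len(pa) - width + 1):
        for j in range(len(pb) - width + 1):
            same = sum(x == y for x, y in zip(pa[i:i + width], pb[j:j + width]))
            if same / width > best[0]:
                best = (same / width, i, j)
    return best
```

For instance, `hp_pattern("IVKD")` is `"HHPP"`, and two sequences can match perfectly at this level even when they share no identical residues.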


Subjects
GTP-Binding Proteins/genetics , Gangliosides/metabolism , Hemagglutinins, Viral/genetics , Orthomyxoviridae/metabolism , Amino Acid Sequence , Animals , Binding Sites , Cattle , Molecular Sequence Data , Sequence Homology, Nucleic Acid , Transducin/genetics
11.
Zh Evol Biokhim Fiziol ; 26(1): 14-29, 1990.
Article in Russian | MEDLINE | ID: mdl-2113755

ABSTRACT

A sequence comparison of signal receptor proteins (SR) was carried out using computer techniques based on the physicochemical characteristics of amino acids. A new method for determining conserved regions in a protein family is described. Visual pigments have four such regions in the cytoplasmic loops, and all SRs have three. The possible functional significance of these regions is discussed. We also report that the SR family is similar to the family of G-proteins involved in extracellular signal transduction. Both families have similar regions of 7-8 amino acids and a number of identical amino acids distributed over a considerable part of the polypeptide chain. These facts may indicate that the whole ensemble of proteins participating in transmembrane signalling pathways (or some part of it) could have evolved from a common progenitor. At the same time, similar structural elements of members of these protein families may be functionally important for protein-protein interaction.


Subjects
Biological Evolution , GTP-Binding Proteins/analysis , Protein Sorting Signals/analysis , Amino Acid Sequence , Animals , Chemical Phenomena , Chemistry, Physical , Mathematics , Molecular Sequence Data , Protein Binding , Signal Transduction
12.
Zh Evol Biokhim Fiziol ; 24(6): 797-807, 1988.
Article in Russian | MEDLINE | ID: mdl-2854348

ABSTRACT

A computer analysis was made of the primary structure of 6 different types of receptor proteins: rhodopsin, adrenoreceptor, muscarinic acetylcholine receptor, insulin receptor, nicotinic cholinoreceptor, and bacteriorhodopsin. The aim of the present investigation was to elucidate, at least partially, to what extent the weak similarity in the primary structure of rhodopsin, muscarinic cholinoreceptor and adrenoreceptor is due to divergent rather than convergent evolution. Nicotinic cholinoreceptor, bacteriorhodopsin and insulin receptor were chosen for comparison with rhodopsin, adrenoreceptor and muscarinic cholinoreceptor because each of them shares some structural or functional property with rhodopsin, adrenoreceptor or muscarinic cholinoreceptor; on the other hand, nicotinic cholinoreceptor, bacteriorhodopsin and insulin receptor differ from the other receptor proteins in their molecular mechanisms. Comparison of the primary structures of rhodopsin, adrenoreceptor and muscarinic cholinoreceptor on the one hand, and of insulin receptor, nicotinic cholinoreceptor and bacteriorhodopsin on the other, indicates that only the former are similar in primary structure, whereas insulin receptor, nicotinic cholinoreceptor and bacteriorhodopsin show no primary-structure similarity either to one another or to rhodopsin and the other receptor proteins that resemble it in their mode of action. The data obtained indicate that the similarity in primary structure between rhodopsin, muscarinic cholinoreceptor and adrenoreceptor is a consequence of divergent, not convergent, evolution; in other words, these receptor proteins are homologous.


Subjects
Receptors, Adrenergic, beta/analysis , Receptors, Muscarinic/analysis , Retinal Pigments/analysis , Rhodopsin/analysis , Amino Acid Sequence , Animals , Bacteriorhodopsins/analysis , Biological Evolution , Humans , Molecular Sequence Data , Receptor, Insulin/analysis , Receptors, Nicotinic/analysis , Structure-Activity Relationship
15.
J Mol Microbiol Biotechnol ; 16(1-2): 81-90, 2009.
Article in English | MEDLINE | ID: mdl-18957864

ABSTRACT

Anaerobranca gottschalkii strain LBS3T is an extremophile living at high temperature (up to 65 degrees C) and in alkaline environments (up to pH 10.5). An assembly of 696 DNA contigs representing about 96% of the 2.26-Mbp genome of A. gottschalkii has been generated with a low-sequence-coverage shotgun-sequencing strategy. The chosen sequencing strategy provided rapid and economical access to genes encoding key enzymes of mono- and polysaccharide metabolism, without diverting limited resources to extensive sequencing of genes lacking potential economic value. Five of these amylolytic enzymes, of considerable commercial interest for biotechnological applications, were expressed and characterized in more detail after their genes had been identified in the partial genome sequence: type I pullulanase, cyclodextrin glycosyltransferase (CGTase), two alpha-amylases (AmyA and AmyB), and an alpha-1,4-glucan-branching enzyme.
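The figures in the abstract allow a quick back-of-the-envelope check of what "low sequence coverage" means here. The Python sketch below uses only the numbers stated above plus one assumption: the idealized Lander-Waterman model, in which the expected fraction of a genome covered at mean read depth c is 1 - e^(-c). Real assemblies deviate from this model, so the implied depth is indicative only and is not a figure reported by the authors.

```python
import math

GENOME_BP = 2.26e6        # genome size stated in the abstract
FRACTION_COVERED = 0.96   # "about 96%" of the genome
N_CONTIGS = 696           # number of assembled contigs


def lander_waterman_depth(fraction_covered: float) -> float:
    """Invert the idealized Lander-Waterman expectation f = 1 - exp(-c) for c."""
    return -math.log(1.0 - fraction_covered)


assembled_bp = GENOME_BP * FRACTION_COVERED
mean_contig_bp = assembled_bp / N_CONTIGS
coverage = lander_waterman_depth(FRACTION_COVERED)

print(f"assembled ~ {assembled_bp / 1e6:.2f} Mbp, "
      f"mean contig ~ {mean_contig_bp:,.0f} bp, "
      f"implied depth ~ {coverage:.1f}x")
```

Running it reports about 2.17 Mbp assembled, a mean contig length of about 3,117 bp, and an implied depth of roughly 3.2x. That depth is consistent with the description of a deliberately shallow shotgun strategy.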


Subjects
Biotechnology , Enzymes/genetics , Genes, Bacterial/genetics , Genome, Bacterial/genetics , Gram-Positive Bacteria/enzymology , Gram-Positive Bacteria/genetics , alpha-Amylases/chemistry , alpha-Amylases/genetics , alpha-Amylases/isolation & purification , alpha-Amylases/metabolism
16.
Biochem Biophys Res Commun ; 219(3): 686-9, 1996 Feb 27.
Article in English | MEDLINE | ID: mdl-8645242

ABSTRACT

Prediction of the DsbC protein secondary structure has been performed using a novel prediction technique which is based on consideration of both local and long-range interactions between amino acid residues. The C-terminal portion of the protein is shown to contain the thioredoxin folding motif. The N-terminal part represents a yet unknown structural domain.


Subjects
Isomerases/chemistry , Protein Structure, Secondary , Amino Acid Sequence , Binding Sites , Disulfides , Erwinia/enzymology , Escherichia coli/enzymology , Glutathione Peroxidase/chemistry , Glutathione Transferase/chemistry , Models, Structural , Molecular Sequence Data , Protein Disulfide-Isomerases , Sequence Homology, Amino Acid
17.
Proteins ; 23(4): 566-79, 1995 Dec.
Article in English | MEDLINE | ID: mdl-8749853

ABSTRACT

We have developed an automatic algorithm, STRIDE, for protein secondary structure assignment from atomic coordinates, based on the combined use of hydrogen bond energy and statistically derived backbone torsional angle information. Parameters of the pattern recognition procedure were optimized using the designations provided by the crystallographers as a standard of truth. Comparison with the currently most widely used technique, DSSP by Kabsch and Sander (Biopolymers 22:2577-2637, 1983), shows that, of the 226 protein chains in our data sample, STRIDE's assignments agree more closely with the specific residue-by-residue definitions provided by the discoverers of the structures in 58% of the chains and DSSP's in 31%, while in the remaining 11% the assignments are the same. STRIDE delineates every 11th helix and every 32nd strand more in accord with published assignments.
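For comparison, the hydrogen-bond criterion of DSSP, the method STRIDE is benchmarked against, can be written in a few lines of Python. DSSP places partial charges of 0.42e and 0.20e on the C=O and N-H groups, multiplies by f = 332, and counts a bond when the energy falls below -0.5 kcal/mol. STRIDE's own combination of a hydrogen-bond energy term with torsion-angle statistics is not reproduced here. The function names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]  # Cartesian coordinates in Angstrom

# DSSP point-charge model: q1 = 0.42e, q2 = 0.20e, f = 332 (kcal/mol * Angstrom / e^2)
DSSP_FACTOR = 0.42 * 0.20 * 332.0  # = 27.888
HBOND_CUTOFF = -0.5                # kcal/mol


def dssp_hbond_energy(o: Point, c: Point, n: Point, h: Point) -> float:
    """Electrostatic energy (kcal/mol) between the C=O group of one residue
    and the N-H group of another."""
    return DSSP_FACTOR * (
        1.0 / math.dist(o, n) + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h) - 1.0 / math.dist(c, n)
    )


def is_hbond(o: Point, c: Point, n: Point, h: Point) -> bool:
    """DSSP criterion: a hydrogen bond exists if the energy is below the cutoff."""
    return dssp_hbond_energy(o, c, n, h) < HBOND_CUTOFF
```

Placing C, O, H and N on a straight line with an O...N distance of 2.9 Angstrom gives about -2.9 kcal/mol, well below the cutoff. Moving the N-H group 10 Angstrom further away brings the energy close to zero.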


Subjects
Algorithms , Models, Molecular , Protein Structure, Secondary , Proteins/chemistry , Amino Acid Sequence , Hydrogen Bonding , Macromolecular Substances , Models, Theoretical , Probability , Software
18.
Protein Eng ; 9(2): 133-42, 1996 Feb.
Article in English | MEDLINE | ID: mdl-9005434

ABSTRACT

Existing approaches to protein secondary structure prediction from the amino acid sequence usually rely on the statistics of local residue interactions within a sliding window and the secondary structural state of the central residue. The practically achieved accuracy limit of such single-residue and single-sequence prediction methods is 65% in three structural states (alpha-helix, beta-strand and coil). Further improvement in prediction quality is likely to require exploitation of various aspects of three-dimensional protein architecture. Here we make such an attempt and present an accurate algorithm for secondary structure prediction based on recognition of potentially hydrogen-bonded residues in a single amino acid sequence. The unique feature of our approach involves database-derived statistics on residue type occurrences in different classes of beta-bridges to delineate interacting beta-strands. Alpha-helical structures are also recognized on the basis of amino acid occurrences in hydrogen-bonded pairs (i, i + 4). The algorithm has a prediction accuracy of 68% in three structural states, relies only on a single protein sequence as input, and has the potential to be improved by 5-7% if homologous aligned sequences are also considered.
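Accuracy "in three structural states" is conventionally the Q3 score: the percentage of residues whose predicted state (helix H, strand E, coil C) matches the observed one. A minimal Python version, assuming both assignments have already been reduced to the three-letter H/E/C alphabet:

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted H/E/C state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed assignments must have equal length")
    if not observed:
        raise ValueError("empty assignment")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `q3("HHHEECCC", "HHHEEECC")` returns 87.5, because 7 of the 8 residues agree. A 68% figure means roughly two residues in three are assigned the correct state.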


Subjects
Amino Acid Sequence , Protein Structure, Secondary , Algorithms , Computer Communication Networks , Hydrogen Bonding , Information Systems , Models, Molecular , Software
19.
Fold Des ; 2(3): 159-62, 1997.
Article in English | MEDLINE | ID: mdl-9218953

ABSTRACT

BACKGROUND: The accuracy of secondary structure prediction for a protein from knowledge of its sequence has been significantly improved by about 7% to the 70-75% range by inclusion of information residing in sequences similar to the query sequence. The scientific literature has been inconsistent, if not negative, regarding chances for further improvement from the vast knowledge to be provided by genome sequencing efforts. RESULTS: By applying a prediction technique that is particularly sensitive to added sequence information to a standard set of query sequences with related primary structures taken from chronologically successive releases of the SWISS-PROT database, it is shown that prediction accuracy can be expected to reach 80-85% with a large 10-fold increase in present sequence knowledge. CONCLUSIONS: Even with present prediction approaches, improvement in prediction accuracy can still be expected, albeit limited to no more than 10%.


Subjects
Protein Structure, Secondary , Proteins/chemistry , Algorithms , Amino Acid Sequence , Databases, Factual , Proteins/genetics , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Software
20.
Proteins ; 27(3): 329-35, 1997 Mar.
Article in English | MEDLINE | ID: mdl-9094735

ABSTRACT

In this study we present an accurate secondary structure prediction procedure that uses a query sequence and related sequences. The most novel aspect of our approach is its reliance on local pairwise alignment of the sequence to be predicted with each related sequence, rather than on a multiple alignment. The residue-by-residue accuracy of the method is 75% in three structural states in jack-knife tests. The gain in prediction accuracy over existing techniques, which reach at best 72%, is achieved through secondary structure propensities based on both local and long-range effects, the use of similar-sequence information in the form of carefully selected pairwise alignment fragments, and reliance on a large collection of known protein primary structures. The method is especially appropriate for large-scale sequence analysis efforts such as genome characterization, where precise and significant multiple sequence alignments are not available or achievable.
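The core operation, a local pairwise alignment of the query with one related sequence, can be illustrated with a textbook Smith-Waterman alignment in Python. The scoring parameters (match +2, mismatch -1, linear gap -2) are assumptions chosen for readability. Practical protein work uses substitution matrices and affine gap penalties, and the paper's fragment selection is not reproduced here.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.
    Returns (score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Aligning `"ACDEFGHIK"` with `"ACDEGHIK"` yields a score of 14 and the fragments `ACDEFGHIK` / `ACDE-GHIK`: eight matches outweigh one gap. A prediction method would repeat such an alignment for each related sequence and keep only the well-scoring fragments.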


Subjects
Algorithms , Models, Molecular , Proteins/chemistry , Sequence Alignment/methods , Amino Acid Sequence , Carrier Proteins/chemistry , Databases, Factual , Integration Host Factors , Molecular Sequence Data , Protein Conformation , Protein Structure, Secondary , Sequence Homology, Amino Acid