Results 1-20 of 22
1.
Pac Symp Biocomput ; : 375-86, 2004.
Article in English | MEDLINE | ID: mdl-14992518

ABSTRACT

Structural genomics (large-scale determination of macromolecular 3-dimensional structure) is unique in that major participants report scientific progress on a weekly basis. The target database (TargetDB) maintained by the Protein Data Bank (http://targetdb.pdb.org) reports this progress through the status of each protein sequence (target) under consideration by the major structural genomics centers worldwide. Hence, TargetDB provides a unique opportunity to analyze the potential impact of this major initiative on scientists interested in the sequence-structure-function-disease paradigm. Here we report such an analysis with a focus on: (i) temporal characteristics: how is the project doing and what can we expect in the future? (ii) target characteristics: what are the predicted functions of the proteins targeted by structural genomics, and how biased is the target set compared with the PDB and with predictions across complete genomes? (iii) structures solved: what are the characteristics of the structures solved thus far, and what do they contribute? The analysis required a more extensive database of structure predictions, using different methods, integrated with data from other sources. This database, associated tools and related data sources are available from http://spam.sdsc.edu.


Subjects
Computational Biology , Genomics/statistics & numerical data , Databases, Protein , Models, Molecular , Proteins/chemistry , Proteins/genetics , Proteomics/statistics & numerical data
2.
Bioinformatics ; 20(12): 1940-7, 2004 Aug 12.
Article in English | MEDLINE | ID: mdl-15044237

ABSTRACT

MOTIVATION: Analysis of large biological data sets using a variety of parallel processor computer architectures is a common task in bioinformatics. The efficiency of the analysis can be significantly improved by properly handling redundancy present in these data combined with taking advantage of the unique features of these compute architectures. RESULTS: We describe a generalized approach to this analysis, but present specific results using the program CEPAR, an efficient implementation of the Combinatorial Extension algorithm in a massively parallel (PAR) mode for finding pairwise protein structure similarities and aligning protein structures from the Protein Data Bank. CEPAR design and implementation are described and results provided for the efficiency of the algorithm when run on a large number of processors. AVAILABILITY: Source code is available by contacting one of the authors.


Subjects
Algorithms , Computational Biology/methods , Information Storage and Retrieval/methods , Proteins/chemistry , Proteins/classification , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Computing Methodologies , Models, Molecular , Protein Conformation , Sequence Homology, Amino Acid
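The abstract does not describe how CEPAR handles redundancy or schedules work. As a hedged illustration of the general idea only, the Python sketch below collapses chains that have identical sequences (an assumed redundancy criterion chosen for simplicity, not necessarily CEPAR's). It then enumerates each remaining unordered pair once and deals the pairs round-robin to workers. `unique_pairs` and `distribute` are hypothetical names.

```python
from itertools import combinations

def unique_pairs(chains):
    """Collapse chains with identical sequences, then list each unordered pair once.

    `chains` is a list of (chain_id, sequence) tuples. Treating identical
    sequences as redundant is an assumption made for illustration.
    """
    representatives = {}  # sequence -> first chain id seen with that sequence
    for chain_id, seq in chains:
        representatives.setdefault(seq, chain_id)
    ids = sorted(representatives.values())
    return list(combinations(ids, 2))

def distribute(pairs, n_workers):
    """Assign pairwise comparison tasks to workers in round-robin order."""
    buckets = [[] for _ in range(n_workers)]
    for k, pair in enumerate(pairs):
        buckets[k % n_workers].append(pair)
    return buckets

chains = [("1abcA", "MKVL"), ("1abcB", "MKVL"), ("2xyzA", "GGSA"), ("3defA", "PLTE")]
pairs = unique_pairs(chains)  # 1abcB is dropped as a duplicate of 1abcA
print(pairs)
print(distribute(pairs, n_workers=2))
```

Round-robin is the simplest policy. Because comparison cost plausibly grows with chain length, a practical scheduler would more likely balance workers by an estimated cost per pair.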
3.
Pac Symp Biocomput ; : 275-86, 2001.
Article in English | MEDLINE | ID: mdl-11262947

ABSTRACT

We have developed a new algorithm for the alignment of multiple protein structures based on a Monte Carlo optimization technique. The algorithm uses pairwise structural alignments as a starting point. Four different types of moves were designed to generate random changes in the alignment. A distance-based score is calculated for each trial move, and moves are accepted or rejected based on the improvement in the alignment score until the alignment converges. Initial tests on 66 protein structural families show promising results: the score increases by 69% on average. The increase in score is accompanied by a 12% increase in the number of residue positions incorporated into the alignment. Two specific families, protein kinases and aspartic proteinases, were tested and compared against curated alignments from HOMSTRAD and against manual alignments. The algorithm increased the overall number of aligned residues while preserving key catalytic residues. Further refinement of the method and its application to generate multiple alignments for all protein families in the PDB are currently in progress.


Subjects
Algorithms , Proteins/genetics , Sequence Alignment/statistics & numerical data , Amino Acid Sequence , Aspartic Acid Endopeptidases/chemistry , Aspartic Acid Endopeptidases/genetics , Databases, Factual , Molecular Sequence Data , Monte Carlo Method , Protein Kinases/chemistry , Protein Kinases/genetics , Proteins/chemistry , Sequence Homology, Amino Acid
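The accept/reject loop described above can be sketched on a deliberately tiny toy problem. This is not the authors' method. The "structures" are 1D coordinate lists, an alignment is one integer shift per member relative to `structs[0]`, and there is a single move type (shift one member by one residue) instead of the paper's four. The score counts partner positions within a distance cutoff. Moves that do not lower the score are kept, so the search can drift across flat regions. `score` and `monte_carlo_align` are hypothetical names.

```python
import random

def score(shifts, structs, cutoff=0.5):
    """Toy distance-based score: count positions of structs[0] whose partner in
    each other member (offset by that member's shift) lies within `cutoff`."""
    ref = structs[0]
    total = 0
    for s in range(1, len(structs)):
        for i, x in enumerate(ref):
            j = i + shifts[s]
            if 0 <= j < len(structs[s]) and abs(structs[s][j] - x) < cutoff:
                total += 1
    return total

def monte_carlo_align(structs, n_steps=2000, max_shift=3, seed=0):
    rng = random.Random(seed)
    shifts = [0] * len(structs)          # structs[0] is the fixed reference
    current = score(shifts, structs)
    for _ in range(n_steps):
        trial = list(shifts)
        s = rng.randrange(1, len(structs))   # pick one member to perturb
        trial[s] += rng.choice((-1, 1))      # the single move type: shift by one
        if abs(trial[s]) > max_shift:
            continue                         # reject moves outside the search window
        new = score(trial, structs)
        if new >= current:                   # keep moves that do not lower the score
            shifts, current = trial, new
    return shifts, current

structs = [[0, 1, 2, 3, 4, 5], [9, 9, 0, 1, 2, 3, 4, 5], [7, 0, 1, 2, 3, 4, 5]]
print(monte_carlo_align(structs))
```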
4.
Proteins ; 42(2): 148-63, 2001 Feb 01.
Article in English | MEDLINE | ID: mdl-11119639

ABSTRACT

An all-against-all protein structure comparison using the Combinatorial Extension (CE) algorithm, applied to a representative set of PDB structures, revealed a gallery of common substructures in proteins (http://cl.sdsc.edu/ce.html). These substructures represent commonly identified folds, domains, or components thereof. Most of the subsequences forming these similar substructures have no significant sequence similarity. We present a method, starting from structure alignments, to identify conserved amino acid positions and residue-dependent property clusters within these subsequences. Each subsequence is aligned to its homologues in SWALL, a nonredundant protein sequence database. The most similar sequences are purged into a common frequency matrix, and weighted homologues of each subsequence are used to score conserved key amino acid positions (CKAAPs). The top 20% of high-scoring positions in each substructure are designated CKAAPs. It is hypothesized that CKAAPs may be responsible for the common folding patterns in either a local or a global view of the protein-folding pathway. Where a significant number of structures exist, CKAAPs have also been identified in structure alignments of complete polypeptide chains from the same protein family or superfamily. Evidence supporting the presence of CKAAPs comes from other computational approaches and from experimental studies of mutation and protein folding, notably the Paracelsus challenge. Finally, the structural environment of CKAAPs versus non-CKAAPs is examined for solvent accessibility, hydrogen bonding, and secondary structure. The identification of CKAAPs has important implications for protein engineering, fold recognition, modeling, and structure prediction studies, and depends on the availability of structures and an accurate structure alignment methodology. Proteins 2001;42:148-163.


Subjects
Amino Acids/chemistry , Conserved Sequence , Immunoglobulins/chemistry , Protein Conformation , Algorithms , Amino Acid Motifs , Amino Acid Sequence , Bacterial Proteins/chemistry , Calcium/metabolism , Models, Molecular , Molecular Sequence Data , Protein Engineering , Protein Folding , Repressor Proteins/chemistry , Sequence Homology, Amino Acid , Troponin C/chemistry , Viral Proteins/chemistry , Viral Regulatory and Accessory Proteins
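The "top 20% of high-scoring positions" rule can be illustrated with a simpler conservation score than the paper's weighted frequency matrix. The Python sketch below scores each column of an aligned set of homologues as 1 minus its normalized Shannon entropy, ignoring gaps and using no sequence weighting (both simplifications are assumptions). It then selects the top fraction of columns. `column_conservation` and `top_fraction` are hypothetical names.

```python
import math
from collections import Counter

def column_conservation(alignment, alphabet_size=20):
    """Score each column as 1 - (Shannon entropy / max entropy); gaps ('-') are ignored."""
    max_entropy = math.log(alphabet_size)
    scores = []
    for c in range(len(alignment[0])):
        counts = Counter(seq[c] for seq in alignment if seq[c] != "-")
        total = sum(counts.values())
        if total == 0:
            scores.append(0.0)  # an all-gap column carries no conservation signal
            continue
        entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores

def top_fraction(scores, fraction=0.2):
    """Indices of the top `fraction` of columns by score, returned in alignment order."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

alignment = ["ACDEFGHIKL", "ASNQYGWVRM", "ATDEFGHLKL", "ACEEWGHIRI"]
print(top_fraction(column_conservation(alignment)))
```

In this made-up alignment only columns 0 (A) and 5 (G) are invariant, so they are the two positions (20% of 10) that are selected.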
5.
Nucleic Acids Res ; 29(1): 228-9, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125099

ABSTRACT

The database reported here is derived using the Combinatorial Extension (CE) algorithm, which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure-structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred, one not necessarily detectable from sequence comparison alone. Establishing structure-structure relationships in this way is of great importance as we enter an era of structural genomics, in which an increasing number of structures of unknown function are likely to be determined. The CE database is thus a useful tool for annotating protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software can be downloaded from the site for local use.


Subjects
Databases, Factual , Proteins/chemistry , Algorithms , Internet , Models, Molecular , Protein Conformation , Proteins/classification , Sequence Alignment/methods , Structure-Activity Relationship
6.
Nucleic Acids Res ; 29(1): 329-31, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125128

ABSTRACT

The Conserved Key Amino Acid Positions DataBase (CKAAPs DB) provides access to an analysis of structurally similar proteins with dissimilar sequences, in which key residues within a common fold are identified. The derivation and significance of CKAAPs starting from pairwise structure alignments are described fully in Reddy et al. [Reddy,B.V.B., Li,W.W., Shindyalov,I.N. and Bourne,P.E. (2000) PROTEINS:, in press]. The CKAAPs identified from this theoretical analysis are provided to experimentalists and theoreticians for potential use in protein engineering and modeling. It has been suggested that CKAAPs may be crucial features for protein folding, structural stability and function. Over 170 substructures, as defined by the Combinatorial Extension (CE) database and found in approximately 3000 representative polypeptide chains, have been analyzed and are available in the CKAAPs DB. CKAAPs DB also provides CKAAPs of the representative set of proteins derived from the CE and FSSP databases. Thus the database contains over 5000 representative polypeptide chains, covering all known structures in the PDB. A web interface to a relational database permits fast retrieval of structure-sequence alignments, CKAAPs and associated statistics. Users may query by PDB ID, protein name, function and Enzyme Classification number. Users may also submit their own protein alignments to obtain CKAAPs. An interface to display CKAAPs on each structure from a web browser is also being implemented. CKAAPs DB is maintained by the San Diego Supercomputer Center and is accessible at http://ckaaps.sdsc.edu.


Subjects
Databases, Factual , Proteins/chemistry , Algorithms , Amino Acid Sequence , Conserved Sequence , Internet , Protein Conformation , Proteins/genetics , Sequence Alignment , Structure-Activity Relationship
7.
Proteins ; 38(3): 247-60, 2000 Feb 15.
Article in English | MEDLINE | ID: mdl-10713986

ABSTRACT

Comparing and subsequently classifying protein structure information has received significant attention concurrent with the increase in the number of experimentally derived 3-dimensional structures. Classification schemes have focused on biological function found within protein domains and on structure classification based on topology. Here an alternative view is presented that groups substructures. Substructures are long (50-150 residue), highly repetitive, near-contiguous pieces of polypeptide chain that occur frequently in a set of proteins from the PDB defined as structurally non-redundant over the complete polypeptide chain. The substructure classification is based on a previously reported Combinatorial Extension (CE) algorithm that provides a significantly different set of structure alignments from those previously described, having, for example, only a 40% overlap with FSSP. Qualitatively, the algorithm provides longer contiguous aligned segments at the price of a slightly higher root-mean-square deviation (rmsd). Clustering these alignments gives a discrete and highly repetitive set of substructures not detectable by sequence similarity alone. In some cases different substructures represent all or different parts of well-known folds, indicative of the Russian doll effect (the continuity of protein fold space). In other cases they fall into different structure and functional classifications. It is too early to determine whether these newly classified substructures offer new insights into the evolution of a structural framework important to many proteins. What is apparent from ongoing work is that these substructures have the potential to be useful probes in finding remote sequence homology and in structure prediction studies. The characteristics of the complete all-by-all comparison of the polypeptide chains present in the PDB, and details of the filtering procedure by pairwise structure alignment that led to the emergent substructure gallery, are discussed. Substructure classification, alignments, and tools to analyze them are available at http://cl.sdsc.edu/ce.html.


Subjects
Protein Folding , Proteins/chemistry , Sequence Alignment/methods , Algorithms , Amino Acid Sequence , Databases, Factual , Internet , Models, Molecular , Molecular Sequence Data , Protein Structure, Secondary , Proteins/classification
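The trade-off noted above (longer aligned segments at a slightly higher rmsd) can be made concrete with the standard RMSD formula, $\mathrm{RMSD} = \sqrt{\tfrac{1}{N}\sum_i \lVert a_i - b_i \rVert^2}$, over the $N$ aligned residue pairs. The Python sketch below assumes the two coordinate sets are already optimally superposed; structure alignment programs compute RMSD after superposition, and that step is not shown. The coordinates are made up for illustration. `rmsd` is a hypothetical helper name.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between paired points; assumes the structures are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical numbers: a 4-pair core whose pairs deviate by 1.0 A each,
# extended by 2 more pairs that deviate by 2.0 A each.
core_a = [(float(i), 0.0, 0.0) for i in range(4)]
core_b = [(float(i), 0.0, 1.0) for i in range(4)]
ext_a = core_a + [(4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
ext_b = core_b + [(4.0, 0.0, 2.0), (5.0, 0.0, 2.0)]
print(rmsd(core_a, core_b), rmsd(ext_a, ext_b))
```

Extending the alignment from 4 to 6 pairs raises the RMSD from 1.0 to sqrt(2), about 1.41. This is the kind of cost that a longer contiguous alignment can carry.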
8.
Protein Sci ; 9(1): 180-5, 2000 Jan.
Article in English | MEDLINE | ID: mdl-10739260

ABSTRACT

Comparisons of protein sequence via cyclic training of Hidden Markov Models (HMMs) in conjunction with alignments of three-dimensional structure, using the Combinatorial Extension (CE) algorithm, reveal two putative EF-hand metal binding domains in acetylcholinesterase. Based on sequence similarity, putative EF-hands are also predicted for the neuroligin family of cell surface proteins. These predictions are supported by experimental evidence. In the acetylcholinesterase crystal structure from Torpedo californica, the first putative EF-hand region binds the Zn2+ found in the heavy metal replacement structure. Further, the interaction of neuroligin 1 with its cognate receptor neurexin depends on Ca2+. Thus, members of the alpha,beta hydrolase fold family of proteins contain potential Ca2+ binding sites, which in some family members may be critical for heterologous cell associations.


Subjects
Calcium-Binding Proteins/chemistry , Cholinesterases/chemistry , EF Hand Motifs , Membrane Proteins/chemistry , Nerve Tissue Proteins/chemistry , Amino Acid Sequence , Binding Sites , Markov Chains , Models, Molecular , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Alignment
9.
Nucleic Acids Res ; 28(1): 235-42, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592235

ABSTRACT

The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.


Subjects
Databases, Factual , Proteins/chemistry , Information Storage and Retrieval , Internet , Magnetic Resonance Spectroscopy , Protein Conformation
10.
Protein Eng ; 11(9): 739-47, 1998 Sep.
Article in English | MEDLINE | ID: mdl-9796821

ABSTRACT

A new algorithm is reported which builds an alignment between two protein structures. The algorithm involves a combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs) rather than the more conventional techniques using dynamic programming and Monte Carlo optimization. AFPs, as the name suggests, are pairs of fragments, one from each protein, which confer structure similarity. AFPs are based on local geometry, rather than global features such as orientation of secondary structures and overall topology. Combinations of AFPs that represent possible continuous alignment paths are selectively extended or discarded thereby leading to a single optimal alignment. The algorithm is fast and accurate in finding an optimal structure alignment and hence suitable for database scanning and detailed analysis of large protein families. The method has been tested and compared with results from Dali and VAST using a representative sample of similar structures. Several new structural similarities not detected by these other methods are reported. Specific one-on-one alignments and searches against all structures as found in the Protein Data Bank (PDB) can be performed via the Web at http://cl.sdsc.edu/ce.html.


Subjects
Proteins/chemistry , Sequence Alignment , Algorithms , Amino Acid Sequence , Molecular Sequence Data , Monte Carlo Method , Protein Conformation , Protein Folding , Sequence Homology, Amino Acid
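The abstract says AFPs rest on local geometry. One way to compare two fragments using only local geometry is to compare their intra-fragment distance matrices, which are unchanged by rotation and translation. The Python sketch below does this with the mean absolute difference of the two m x m matrices, modelled on the CE publication. The fragment length `m=8` and `cutoff=3.0` (in Å) are assumed defaults. The later step, in which CE combines and extends AFPs into an alignment path, is not shown. `find_afps` and its helpers are hypothetical names.

```python
import math
import random

def distance_matrix(coords):
    """All pairwise distances between C-alpha coordinates of one chain."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def fragment_dissimilarity(dA, dB, i, j, m):
    """Mean |dA - dB| over the m x m intra-fragment distances of fragments at i and j."""
    total = 0.0
    for k in range(m):
        for l in range(m):
            total += abs(dA[i + k][i + l] - dB[j + k][j + l])
    return total / (m * m)

def find_afps(coords_a, coords_b, m=8, cutoff=3.0):
    """Return (i, j, dissimilarity) for every fragment pair below `cutoff`."""
    dA, dB = distance_matrix(coords_a), distance_matrix(coords_b)
    afps = []
    for i in range(len(coords_a) - m + 1):
        for j in range(len(coords_b) - m + 1):
            d = fragment_dissimilarity(dA, dB, i, j, m)
            if d < cutoff:
                afps.append((i, j, d))
    return afps

# Toy check: a random 12-point chain and a rigidly translated copy of it.
rng = random.Random(1)
coords_a = [(rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(0, 20)) for _ in range(12)]
coords_b = [(x + 5.0, y - 3.0, z + 1.0) for x, y, z in coords_a]
afps = find_afps(coords_a, coords_b)
print([(i, j, round(d, 3)) for i, j, d in afps])
```

A rigid translation leaves every intra-chain distance unchanged, so each fragment pairs with its own copy at dissimilarity essentially zero.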
11.
Acta Crystallogr D Biol Crystallogr ; 54(Pt 6 Pt 1): 1085-94, 1998 Nov 01.
Article in English | MEDLINE | ID: mdl-10089484

ABSTRACT

Databases containing macromolecular structure data provide a crystallographer with important tools for use in solving, refining and understanding the functional significance of their protein structures. Given this importance, this paper briefly summarizes past progress by outlining the features of the significant number of relevant databases developed to date. One recent database, PDB+, containing all current and obsolete structures deposited with the Protein Data Bank (PDB) is discussed in more detail. PDB+ has been used to analyze the self-consistency of the current (1 January 1998) corpus of over 7000 structures. A summary of those findings is presented (a full discussion will appear elsewhere) in the form of global and temporal trends within the data. These trends indicate that challenges exist if crystallographers are to provide the community with complete and consistent structural results in the future. It is argued that better information management practices are required to meet these challenges.


Subjects
Databases, Factual , Protein Conformation , Database Management Systems , Information Storage and Retrieval/trends
13.
Comput Appl Biosci ; 13(5): 487-96, 1997 Oct.
Article in English | MEDLINE | ID: mdl-9367122

ABSTRACT

MOTIVATION: To provide data management tools that efficiently maintain and query experimental and derived protein data, with the goal of providing new insights into structure-function relationships. The tools should be portable, extensible, and accessible locally or via the World Wide Web, providing data that would not otherwise be available. RESULTS: We report the initial phase of the work: the data representation and query of all available macromolecular structure data, including real-time access to complex property patterns based on the amino acid sequence. Protein structure data taken from the Protein Data Bank (PDB) are decomposed into native and derived elementary properties and represented as compact indexed objects, minimizing storage requirements and query time for selected types of query. In addition, collections of indices representing a particular property are maintained and can be queried for specific property patterns found across the whole database. The approach is proving applicable to a wide variety of data available on specific protein families.


Subjects
Databases, Factual , Proteins/chemistry , Algorithms , Computer Communication Networks , Data Collection , Data Display , Electronic Data Processing , Programming Languages , Software , Structure-Activity Relationship
14.
Article in English | MEDLINE | ID: mdl-7584438

ABSTRACT

A computer tool has been developed for revealing sets of oligonucleotides invariant for isofunctional families of DNA (RNA) and for using these in the functional identification of nucleotide sequences. The tool allows one to: build vocabularies of invariant oligonucleotides for families of isofunctional nucleotide sequences; assess the significance of the vocabularies; identify nucleotide sequences with the vocabularies of invariant oligonucleotides; determine the most effective identification parameters to minimize first-type (false-negative) and second-type (false-positive) errors; assess the efficiency of identification of individual isofunctional families with the oligonucleotide vocabularies; and determine the evolutionary characteristics of the families of isofunctional sequences on which vocabulary volume depends. Using this system, we analyzed a total of 322 protein-encoding gene families and built sets of invariant oligonucleotides, i.e., oligonucleotide vocabularies characteristic of gene families and subfamilies. We show that nucleotide sequences belonging to these families can be identified with the revealed sets of invariant oligonucleotides. Under the most effective identification parameters, the first-type error (false negatives) on control (independent) data was 10-15%, and the second-type error (false positives) was only 1-2 redundant sequences per sequence examined. The volume of a vocabulary of invariant oligonucleotides was shown to depend on the percentage of variable positions in the multiple alignment within a family.


Subjects
Base Sequence , DNA, Complementary/chemistry , Databases, Factual , Oligodeoxyribonucleotides/chemistry , Proteins/genetics , Software , DNA/chemistry , Genes , Mathematics , Models, Statistical , Molecular Sequence Data , Multigene Family , Nucleoproteins/chemistry , Nucleoproteins/genetics , Phosphogluconate Dehydrogenase/chemistry , Phosphogluconate Dehydrogenase/genetics , RNA/chemistry , Sequence Homology, Nucleic Acid
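A minimal reading of "vocabulary of invariant oligonucleotides" is the set of length-k words present in every member of a family. A query is then assigned to a family when it contains enough of that vocabulary. The Python sketch below implements that reading. The exact-match invariance rule and the `min_fraction` threshold are assumptions, not the authors' significance-based parameters, and the sequences are made up. `kmers`, `build_vocabulary` and `identify` are hypothetical names.

```python
def kmers(seq, k):
    """Set of overlapping k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_vocabulary(family, k):
    """Oligonucleotides of length k present in every sequence of the family."""
    vocab = kmers(family[0], k)
    for seq in family[1:]:
        vocab &= kmers(seq, k)
    return vocab

def identify(query, vocabularies, k, min_fraction=0.5):
    """Families whose vocabulary the query covers at least `min_fraction` of."""
    q = kmers(query, k)
    hits = {}
    for name, vocab in vocabularies.items():
        if vocab:
            covered = len(vocab & q) / len(vocab)
            if covered >= min_fraction:
                hits[name] = covered
    return hits

vocabularies = {
    "famA": build_vocabulary(["ATGGCCATTGTA", "ATGGCCATTCTA", "TTGGCCATTGTA"], k=4),
    "famB": build_vocabulary(["CCCCAAAAGGGG", "CCCCAAAAGGGT"], k=4),
}
print(identify("GGTGGCCATTAA", vocabularies, k=4))
```

Raising `min_fraction` trades false positives for false negatives. This mirrors the abstract's search for the most effective identification parameters.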
15.
J Mol Evol ; 39(6): 625-30, 1994 Dec.
Article in English | MEDLINE | ID: mdl-7807551

ABSTRACT

A combinatorial sequence space (CSS) model was introduced to represent sequences as sets of overlapping k-tuples of fixed length, which correspond to points in the CSS. The aim was to analyze the clustering of protein sequences in the CSS and to test various hypotheses about its possible evolutionary basis. The authors developed an easy-to-use technique that can reveal and analyze such clustering in a multidimensional CSS. Applying the technique revealed unexpectedly strong clustering of the points in the CSS corresponding to k-tuples from known proteins. The clustering could not be explained by nonuniform amino acid frequencies or by the influence of homologous data. Nor could any of the tested evolutionary and structural factors explain the observed clustering. It appeared as if certain protein sequence variants arose and were fixed early in evolution. Subsequent (predominantly neutral) evolution allowed only a limited number of changes and permitted new variants, which led to the preservation of certain k-tuples over the course of evolution. This was consistent with the theory of exon shuffling and protein block structure evolution. Possible applications of the sequence space features found were also discussed.


Subjects
Biological Evolution , Models, Genetic , Multigene Family , Peptides/genetics , Proteins/genetics
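The abstract's first step, decomposing sequences into overlapping k-tuples (the CSS points), is easy to sketch. So is the composition-only baseline that, according to the abstract, cannot account for the observed clustering. The Python sketch below reports observed/expected counts for each k-tuple, where "expected" assumes independent residues drawn with the pooled amino acid composition. It does not reproduce the authors' clustering analysis. `ktuple_counts` and `enrichment` are hypothetical names.

```python
from collections import Counter

def ktuple_counts(seqs, k):
    """Count every overlapping k-tuple (a point in the combinatorial sequence space)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(seqs, k):
    """Observed / expected k-tuple counts, where 'expected' assumes independent
    residues drawn with the pooled amino acid composition."""
    composition = Counter("".join(seqs))
    total = sum(composition.values())
    freqs = {aa: n / total for aa, n in composition.items()}
    counts = ktuple_counts(seqs, k)
    n_tuples = sum(counts.values())
    ratios = {}
    for t, observed in counts.items():
        expected = n_tuples
        for aa in t:
            expected *= freqs[aa]
        ratios[t] = observed / expected
    return ratios

print(enrichment(["MKVLAAGMKVL", "GSMKVLPT"], k=3))
```

Ratios well above 1 mark k-tuples that occur more often than residue composition alone would predict.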
16.
Comput Appl Biosci ; 10(6): 575-86, 1994 Dec.
Article in English | MEDLINE | ID: mdl-7704656

ABSTRACT

PDBlib is an extensible object-oriented class library written in C++ for representing the three-dimensional structure of biological macromolecules. The software design strategy, features of many of the 129 classes currently distributed with the library, and two sample applications which use the library are described. Version 1.0 of the library represents the structural features of proteins, DNA, RNA and complexes thereof, at a level of detail on a par with that which can be parsed from a Protein Data Bank (PDB) entry. However, the memory-resident representation of the macromolecule is independent of the PDB entry and can be obtained from other sources, e.g. relational and object-oriented databases. PDBlib classes are organized into four categories: (i) classes that model the macromolecule; (ii) classes that enhance the extensibility of the library; (iii) classes that provide navigation facilities of the object-oriented macromolecular structure representation; and (iv) a class that loads a PDB file into the memory-resident object-oriented representation. A number of general-purpose procedures that return features of this representation and that are relevant to all biological disciplines are included in (i). The library has been used to develop PDBtool, a prototype structure verification tool, and PDBview, a structure rendering tool that requires no specialized graphics hardware and software. Current work centers on making the macromolecular structures represented by PDBlib persistent using a commercial object-oriented database and providing an additional class library, MMQLlib, to query those structures.


Subjects
Databases, Factual , Molecular Structure , Programming Languages , Proteins/chemistry , Software , Information Storage and Retrieval , Molecular Conformation , Protein Conformation , Software Design
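PDBlib itself is a C++ library whose classes are not listed here. The Python sketch below only illustrates the idea of category (iv): a loader that turns PDB ATOM/HETATM records into a memory-resident Chain, Residue and Atom hierarchy. It reads the fixed-column PDB fields (serial, atom name, residue name, chain ID, residue number, insertion code, x/y/z) and ignores alternate locations, multiple models and all other record types, which is a simplification. The class and function names are hypothetical and are not PDBlib's API.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    serial: int
    name: str
    x: float
    y: float
    z: float

@dataclass
class Residue:
    name: str
    seq_num: int
    atoms: list = field(default_factory=list)

@dataclass
class Chain:
    chain_id: str
    residues: dict = field(default_factory=dict)  # (resSeq, iCode) -> Residue

def load_pdb_atoms(lines):
    """Build Chain -> Residue -> Atom objects from ATOM/HETATM records (fixed PDB columns)."""
    chains = {}
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        chain = chains.setdefault(line[21], Chain(line[21]))
        key = (int(line[22:26]), line[26])  # residue number plus insertion code
        if key not in chain.residues:
            chain.residues[key] = Residue(line[17:20].strip(), key[0])
        chain.residues[key].atoms.append(Atom(
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            x=float(line[30:38]), y=float(line[38:46]), z=float(line[46:54]),
        ))
    return chains

sample = ["ATOM      1  N   MET A   1      38.198  19.582  28.998"]
model = load_pdb_atoms(sample)
print(model["A"].residues[(1, " ")])
```

Because the loader only produces plain objects, the same hierarchy could be filled from another back end, which is the kind of source independence the abstract describes.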
17.
Protein Eng ; 7(11): 1311-22, 1994 Nov.
Article in English | MEDLINE | ID: mdl-7700863

ABSTRACT

Macromolecular query language (MMQL) is an extensible interpretive language in which to pose questions concerning the experimental or derived features of the 3-D structure of biological macromolecules. MMQL aims to be intuitive, with a simple syntax, so that from a user's perspective complex queries are easily written. A number of basic queries and a more complex query, the determination of structures containing a five-strand Greek key motif, are presented to illustrate the strengths and weaknesses of the language. The predominant features of MMQL are a filter and pattern grammar, which are combined to express a wide range of interesting biological queries. Filters permit the selection of object attributes, for example compound name and resolution, whereas the patterns currently implemented query primary sequence, close contacts, hydrogen bonding, secondary structure, conformation and amino acid properties (volume, polarity, isoelectric point, hydrophobicity and different forms of exposure). MMQL queries are processed by MMQLlib, a C++ class library to which new query methods and pattern types are easily added. The prototype implementation described uses PDBlib, another C++-based class library for representing the features of biological macromolecules at the level of detail parsable from a PDB file. Since PDBlib can represent data stored in relational and object-oriented databases as well as in PDB files, once these data are loaded they too can be queried by MMQL. Performance metrics are given for queries of PDB files, for which all derived data are calculated at run time, and compared with a preliminary version of OOPDB, a prototype object-oriented database with a schema based on a persistent version of PDBlib, which offers more efficient data access and the potential to maintain derived information. MMQLlib, PDBlib and associated software are available via anonymous ftp from cuhhca.hhmi.columbia.edu.


Subjects
Programming Languages , Protein Structure, Tertiary , Amino Acid Sequence , Databases, Factual , Hydrogen Bonding , Isoelectric Point , Protein Structure, Secondary , Proteins/classification , Statistics as Topic
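The combination of a filter on object attributes with a pattern over the structure can be sketched in a few lines. This is not MMQL syntax. In the Python sketch below, the filter is a resolution cutoff and the pattern is a regular expression over the one-letter sequence: a run of three or more residues from an assumed hydrophobic class `[AILMFVW]`. The `Entry` record, its fields, the PDB IDs and the data are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Entry:
    pdb_id: str
    compound: str
    resolution: float   # in angstroms
    sequence: str

def query(entries, max_resolution, sequence_pattern):
    """Filter step selects entries by attribute; pattern step scans their sequences."""
    rx = re.compile(sequence_pattern)
    results = []
    for e in entries:
        if e.resolution <= max_resolution:            # filter on an object attribute
            for m in rx.finditer(e.sequence):         # pattern over primary sequence
                results.append((e.pdb_id, m.start(), m.group()))
    return results

entries = [
    Entry("1AAA", "lysozyme", 1.8, "KVFGRCELAAAMKR"),
    Entry("2BBB", "kinase", 3.2, "GGLLLIVSK"),
    Entry("3CCC", "protease", 2.0, "SSTTPP"),
]
print(query(entries, max_resolution=2.5, sequence_pattern="[AILMFVW]{3,}"))
```

Here `2BBB` contains a hydrophobic run but is removed by the resolution filter before the pattern is tried, which shows how the two parts of a query compose.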
18.
Protein Eng ; 7(3): 349-58, 1994 Mar.
Article in English | MEDLINE | ID: mdl-8177884

ABSTRACT

A method has been developed to detect pairs of positions with correlated mutations in protein multiple sequence alignments. The method is based on reconstruction of the phylogenetic tree for a set of sequences and statistical analysis of the distribution of mutations in the branches of the tree. The database of homology-derived protein structures (HSSP) is used as the source of multiple sequence alignments for proteins of known three-dimensional structure. We analyse pairs of positions with correlated mutations in 67 protein families and show quantitatively that the presence of such positions is a typical feature of protein families. A significant but weak tendency is observed for correlated residue pairs to be close in the three-dimensional structure. With further improvements, methods of this type may be useful for the prediction of residue-residue contacts and subsequent prediction of protein structure using distance geometry algorithms. In conclusion, we suggest a new experimental approach to protein structure determination in which selection of functional mutants after random mutagenesis and analysis of correlated mutations provide sufficient proximity constraints for calculation of the protein fold.


Subjects
DNA-Binding Proteins , Mutation , Proteins/chemistry , Sequence Alignment , Amino Acid Sequence , Amino Acids/chemistry , Biological Evolution , Chemical Phenomena , Chemistry, Physical , Cytochrome c Group/chemistry , Molecular Sequence Data , Molecular Structure , Mutagenesis , Protein Structure, Secondary , Repressor Proteins/chemistry , Trypsin/chemistry , Viral Proteins , Viral Regulatory and Accessory Proteins
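The paper's method counts correlated mutations along the branches of a reconstructed phylogenetic tree, and that is not reproduced here. As a plainly different and simpler illustration of scoring column pairs for covariation, the Python sketch below ranks pairs of alignment columns by mutual information. Unlike the tree-based method, mutual information does not correct for shared ancestry. `mutual_information` and `ranked_pairs` are hypothetical names, and the alignment is made up.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in p_ij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

def ranked_pairs(alignment):
    """All column pairs (i < j) sorted by decreasing mutual information."""
    cols = list(zip(*alignment))
    scores = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            scores.append((mutual_information(cols[i], cols[j]), i, j))
    return sorted(scores, reverse=True)

alignment = ["AKDE", "AKDE", "GRDQ", "GRDE"]
print(ranked_pairs(alignment)[0])
```

In this toy alignment, columns 0 and 1 change together (A/K versus G/R), so they rank first with a mutual information of ln 2. The invariant column 2 scores zero against every other column.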
19.
Article in English | MEDLINE | ID: mdl-7584420

ABSTRACT

PDBlib is an extensible object-oriented class library written in C++ for representing the 3-dimensional structure of biological macromolecules. PDBlib forms the kernel of a larger software framework being developed to assist in knowledge discovery from macromolecular structure data. The software design strategy used by PDBlib, how the library may be used, and several prototype applications that use the library are summarized. PDBlib represents the structural features of proteins, DNA, RNA, and complexes thereof, at a level of detail on a par with that which can be parsed from a Protein Data Bank (PDB) entry. However, the memory-resident representation of the macromolecule is independent of the PDB entry and can be obtained from other back-end data sources, for example existing relational databases and our own object-oriented database (OOPDB), built on top of the commercial object-oriented database ObjectStore. At the front end are several prototype applications that use the library: Macromolecular Query Language (MMQL) is based on a separate class library (MMQLlib) for building complex queries pertaining to macromolecular structure; PDBtool is an interactive structure verification tool; and PDBview is a structure rendering tool used either standalone or as part of another application. Each of these software components is described. All software is available via anonymous ftp from cuhhca.hhmi.columbia.edu.


Subjects
Molecular Structure , Software , Animals , Computer Simulation , Humans , Macromolecular Substances
20.
Comput Appl Biosci ; 8(6): 529-34, 1992 Dec.
Article in English | MEDLINE | ID: mdl-1468007

ABSTRACT

We present a new pairwise alignment algorithm that uses iterative statistical analysis of homologous subsequences. Unlike the classical Needleman-Wunsch algorithm (NW), which processes the entire DOT-matrix, we use only those matrix elements that correspond to the most non-random subsequence homologies. The most reliable elements of the DOT-matrix are written to compact competition matrices, and the algorithm then searches for an alignment on the basis of these matrix elements only. Our algorithm has low storage and memory requirements, yet provides a reliable alignment for sequences of weak homology (or at least for the regions of homology). In such cases, classical NW algorithms often produce unreliable results at the level of statistical noise, owing to the accumulation of random matches throughout the aligned sequences.


Subjects
Algorithms , Sequence Alignment/methods , Sequence Homology, Amino Acid , Computers
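The core idea above is to align using only a sparse set of reliable dot-matrix elements rather than the full matrix. The Python sketch below illustrates that idea under a stated substitution: exact word matches of length k stand in for the authors' iterative statistical selection of non-random subsequence homologies. A simple O(n²) chaining step over the kept dots then picks the highest-scoring set of non-overlapping hits that increase in both sequences. `word_hits` and `best_chain` are hypothetical names.

```python
def word_hits(a, b, k):
    """Dot-matrix cells kept for alignment: (i, j) where a[i:i+k] == b[j:j+k]."""
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    return [(i, j) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], [])]

def best_chain(hits, k):
    """Highest-scoring chain of non-overlapping hits that increase in both sequences."""
    if not hits:
        return []
    hits = sorted(hits)
    score = [k] * len(hits)      # each hit contributes k matched residues
    prev = [-1] * len(hits)      # back-pointer to the previous hit in the chain
    for x, (i, j) in enumerate(hits):
        for y in range(x):
            pi, pj = hits[y]
            if pi + k <= i and pj + k <= j and score[y] + k > score[x]:
                score[x], prev[x] = score[y] + k, y
    x = max(range(len(hits)), key=score.__getitem__)
    chain = []
    while x != -1:
        chain.append(hits[x])
        x = prev[x]
    return chain[::-1]

a, b = "ACGTQQQQTTGCA", "ACGTWWTTGCAWW"
print(best_chain(word_hits(a, b, 4), 4))
```

Only three dots survive the word filter here, so the chaining step never touches the rest of the 13 x 13 matrix. This is the storage saving the abstract emphasizes. Merging consecutive diagonal hits such as (8, 6) and (9, 7) into longer segments is omitted for brevity.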