Results 1 - 20 of 22
1.
Comput Biol Chem ; 33(2): 121-36, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19152793

ABSTRACT

Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies, and it plays an important role in processing the information generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly programs. We describe the basic principles of computational assembly along with the main concerns, such as repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and download directions for assembly programs at http://genome.ku.dk/resources/assembly/methods.html.


Subjects
Genomics/methods; Computational Biology/methods; DNA/chemistry; Expressed Sequence Tags; Genome; Polymorphism, Single Nucleotide; Repetitive Sequences, Nucleic Acid
2.
Anim Genet ; 39(2): 193-5, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18261187

ABSTRACT

The Sino-Danish pig genome project produced 685 851 ESTs (Gorodkin et al. 2007), of which 41 499 originated from the mitochondrial genome. In this study, the mitochondrial ESTs were assembled, and 374 putative SNPs were found. Chromatograms for the ESTs containing SNPs were manually inspected, and 112 total (52 non-synonymous) SNPs were found to be of high confidence (five of them are close to disease-causing SNPs in humans). Nine of the high-confidence SNPs were tested experimentally, and eight were confirmed. The SNPs can be accessed online at http://pigest.ku.dk/more/mito.


Subjects
Expressed Sequence Tags; Mitochondria/genetics; Polymorphism, Single Nucleotide; Swine/genetics; Animals; Confidence Intervals; Gene Frequency; Genome; Humans; Species Specificity
3.
Anim Genet ; 38(4): 401-5, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17559553

ABSTRACT

A total of 10 882 porcine microsatellite repeats were identified in genomic shotgun sequences from the Sino-Danish Pig Genome Sequencing Consortium (http://www.piggenome.dk). Of these, 4528 microsatellites were placed on a pig-human comparative map by BLAST analysis of porcine sequences against the human genome (BLAST cut-off threshold = 1 × 10^-5). All microsatellite sequences placed on the comparative map are accessible at http://www.animalgenome.org/QTLdb/pig.html. These sequences increase the number of identified microsatellites in the porcine genome by several orders of magnitude. They are a new resource of microsatellite sequences for generating markers to be used in linkage studies and in fine mapping and positional cloning of quantitative trait loci.


Subjects
Microsatellite Repeats; Swine/genetics; Animals; Chromosome Mapping; Computational Biology; Genetic Linkage; Genetic Markers; Genome; Humans
4.
Comput Biol Chem ; 30(4): 249-54, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16798093

ABSTRACT

The processing of microRNAs (miRNAs) from their stem-loop precursors has revealed asymmetry in the processing of the mature sequence and its star sequence. Furthermore, the miRNA processing system differs between organisms. To assess this at the sequence level, we have investigated mature miRNAs in their genomic contexts. We have compared profiles of mature miRNAs within the genomic context of the 5' and 3' stem-loop precursor arms, and we find asymmetry between mature sequences of the 5' and 3' arms. The main observation is that vertebrate organisms have a characteristic motif on the 5' arm, in contrast to the 3' arm motif, which mainly shows the conserved U at the position of the mature start. The vertebrate 5' arm motif also shows a semi-conserved G 13 nucleotides upstream from the first position. We compared the 5' and 3' arm profiles using the average log likelihood ratio (ALLR) score, as defined by Wang and Stormo (2003) [Wang T., Stormo, G.D., 2003. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 2369-2380.]. By computing a p-value, we find that the two profiles differ significantly at their 3' end, where the 5' arm motif (in contrast to the 3' arm motif) has a semi-conserved GU-rich region. Similar findings are also obtained for other organisms, such as fly, worm and plants. The observed similarities and differences between closely and distantly related organisms are discussed and related to current knowledge of miRNA processing.


Subjects
MicroRNAs/chemistry; Animals; Arabidopsis/genetics; Base Sequence; Computational Biology; Conserved Sequence; Databases, Nucleic Acid; Genomics; Humans
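
The ALLR comparison named in the abstract above can be illustrated with a small sketch. It scores two profile columns with the average log likelihood ratio in the form usually attributed to Wang and Stormo (2003): the counts of each column are scored against the frequencies of the other, and the sum is divided by the total count. The function name `allr`, the pseudocount scheme and the uniform background are assumptions made for this illustration; the abstract does not specify them.

```python
import math

BASES = "ACGU"

def allr(counts_i, counts_j, background=None, pseudocount=1.0):
    """Average log likelihood ratio (natural log) between two profile columns.

    counts_i, counts_j: dicts mapping base -> observed count in each column.
    background: dict base -> prior probability (uniform if omitted; an assumption).
    pseudocount: total pseudocount, spread over bases according to the background.
                 Keep it positive unless every counted base has a nonzero
                 frequency in the other column.
    """
    if background is None:
        background = {b: 1.0 / len(BASES) for b in BASES}

    def freqs(counts):
        total = sum(counts.get(b, 0) for b in BASES)
        return {
            b: (counts.get(b, 0) + pseudocount * background[b]) / (total + pseudocount)
            for b in BASES
        }

    f_i, f_j = freqs(counts_i), freqs(counts_j)
    numerator = 0.0
    for b in BASES:
        n_i, n_j = counts_i.get(b, 0), counts_j.get(b, 0)
        # Counts of one column are scored against the frequencies of the other.
        if n_j:
            numerator += n_j * math.log(f_i[b] / background[b])
        if n_i:
            numerator += n_i * math.log(f_j[b] / background[b])
    total_counts = sum(counts_i.get(b, 0) + counts_j.get(b, 0) for b in BASES)
    return numerator / total_counts
```

Under these assumptions, two columns with the same dominant base score positive and two columns dominated by different bases score negative, which is the property that makes the score usable for comparing the 5' and 3' arm profiles position by position.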
5.
Anim Genet ; 37(3): 199-204, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16734676

ABSTRACT

Single nucleotide polymorphisms (SNPs) were discovered in porcine expressed sequence tags (ESTs) orthologous to genes from human chromosome 13 (HSA13) and predicted to be located on pig chromosome 11 (SSC11). The SNPs were identified as sequence variants in clusters of EST sequences from pig cDNA libraries constructed in the Sino-Danish pig genome project. In total, 312 human gene sequences from HSA13 were used for similarity searches in our pig EST database. Pig ESTs showing significant similarity with HSA13 genes were clustered and candidate SNPs were identified. Allele frequencies for 26 SNPs were estimated in a group of 80 unrelated pigs from Danish commercial pig breeds: Duroc, Hampshire, Landrace and Large White. Eighteen of the 26 SNPs genotyped in the PiGMaP Reference Families were mapped by linkage analysis to SSC11. The EST-based SNPs published here are new genetic markers useful for linkage and association studies in commercial and experimental pig populations. This study represents the first gene-associated SNP linkage map of pig chromosome 11 and adds new comparative mapping information between SSC11 and HSA13. Furthermore, our data facilitate future studies aimed at the identification of interesting regions on pig chromosome 11, positional cloning and fine mapping of quantitative trait loci in pig.


Subjects
Chromosome Mapping; Chromosomes, Mammalian; Genetic Linkage; Polymorphism, Single Nucleotide; Swine/genetics; Animals; Breeding; Denmark; Expressed Sequence Tags; Gene Frequency; Genotype; Swine/classification
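
The allele-frequency estimation mentioned in the abstract above amounts to counting allele copies across genotyped animals. The sketch below shows that arithmetic for a single biallelic SNP; the function name and the genotype counts are hypothetical and are not data from the study.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies from diploid genotypes at one SNP.

    genotypes: iterable of two-character strings such as "AG" (one per animal).
    Returns a dict allele -> frequency among all observed allele copies.
    """
    allele_counts = Counter()
    for genotype in genotypes:
        allele_counts.update(genotype)  # each animal contributes two allele copies
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}

# Hypothetical genotypes for 80 animals (illustrative only).
example = ["AA"] * 50 + ["AG"] * 20 + ["GG"] * 10
freqs = allele_frequencies(example)
```

With these made-up counts, allele A accounts for 120 of 160 copies and allele G for the remaining 40.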
6.
Comput Biol Chem ; 28(5-6): 367-74, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15556477

ABSTRACT

Predicted assignments of biological sequences are often evaluated by the Matthews correlation coefficient. However, the Matthews correlation coefficient applies only to cases where the assignments belong to two categories, and cases with more than two categories are often artificially forced into two categories by considering what belongs and what does not belong to one of the categories, leading to a loss of information. Here, an extended correlation coefficient that applies to K categories is proposed, and this measure is shown to be highly applicable for evaluating prediction of RNA secondary structure in cases where some predicted pairs go into the category "unknown" due to lack of reliability in predicted pairs or unpaired residues. Hence, predicting base pairs of RNA secondary structure can be a three-category problem. The measure is further shown to be in good agreement with existing performance measures used for ranking protein secondary structure predictions. Server and software are available at http://rk.kvl.dk/.


Subjects
Computational Biology; Databases, Factual; RNA/chemistry; RNA/classification; Sequence Alignment; Base Pairing; Molecular Sequence Data; Protein Structure, Secondary; Proteins/chemistry; Proteins/classification
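
A K-category correlation coefficient of the kind described in the abstract above can be computed directly from a confusion matrix. The sketch below uses the confusion-matrix form that is commonly given for the multi-class generalization of the Matthews correlation coefficient; the abstract does not state the formula, so the notation, the function name and the zero-denominator convention are assumptions.

```python
import math

def rk_coefficient(confusion):
    """K-category correlation coefficient from a K x K confusion matrix.

    confusion[a][b] = number of items whose observed category is a and
    whose predicted category is b. For K = 2 this reduces to the Matthews
    correlation coefficient.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[a][a] for a in range(k))
    observed = [sum(confusion[a]) for a in range(k)]  # row sums
    predicted = [sum(confusion[a][b] for a in range(k)) for b in range(k)]  # column sums

    numerator = correct * total - sum(o * p for o, p in zip(observed, predicted))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in predicted))
        * (total ** 2 - sum(o * o for o in observed))
    )
    # Convention chosen here (an assumption): return 0.0 when undefined.
    return numerator / denominator if denominator else 0.0
```

For the three-category case in the abstract, the rows and columns would be "paired", "unpaired" and "unknown", so predictions placed in "unknown" are scored without being forced into one of the other two categories.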
7.
Comput Biol Chem ; 28(3): 219-26, 2004 Jul.
Article in English | MEDLINE | ID: mdl-15261152

ABSTRACT

Predicting RNA secondary structure using evolutionary history can be carried out by using an alignment of related RNA sequences with conserved structure. Accurately determining evolutionary substitution rates for base pairs and single stranded nucleotides is a concern for methods based on this type of approach. Determining these rates can be hard to do reliably without a large and accurate initial alignment, which ideally also has structural annotation. Hence, one must often apply rates extracted from other RNA families with trusted alignments and structures. Here, we investigate this problem by applying rates derived from tRNA and rRNA to the prediction of the much more rapidly evolving 5'-region of HIV-1. We find that the HIV-1 prediction is in agreement with experimental data, even though the relative evolutionary rate between A and G is significantly increased, both in stem and loop regions. In addition we obtained an alignment of the 5' HIV-1 region that is more consistent with the structure than that currently in the database. We added randomized noise to the original values of the rates to investigate the stability of predictions to rate matrix deviations. We find that changes within a fairly large range still produce reliable predictions and conclude that using rates from a limited set of RNA sequences is valid over a broader range of sequences.


Subjects
Evolution, Molecular; Nucleic Acid Conformation; RNA/chemistry; Algorithms; Base Pairing/genetics; Databases, Nucleic Acid; HIV-1/chemistry; HIV-1/genetics; Kinetics; Models, Genetic; Point Mutation/genetics; RNA/genetics; RNA, Ribosomal/chemistry; RNA, Ribosomal/genetics; RNA, Transfer/chemistry; RNA, Transfer/genetics; RNA, Viral/chemistry; RNA, Viral/genetics; Sequence Alignment/methods
8.
Bioinformatics ; 17(7): 642-5, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11448882

ABSTRACT

We have developed a series of programs which assist in maintenance of structural RNA databases. A main program BLASTs the RNA database against GenBank and automatically extends and realigns the sequences to include the entire range of the RNA query sequences. After manual update of the database, other programs can examine base pair consistency and phylogenetic support. The output can be applied iteratively to refine the structural alignment of the RNA database. Using these tools, the number of potential misannotations per sequence was reduced from 20 to 3 in the Signal Recognition Particle RNA database. AVAILABILITY: A quick-server and programs are available at http://www.bioinf.au.dk/rnadbtool/


Subjects
Databases as Topic; RNA/genetics; Sequence Alignment/statistics & numerical data; Base Sequence; Computational Biology; Molecular Sequence Data; RNA, Messenger/genetics; Sequence Analysis, RNA/statistics & numerical data; Sequence Homology, Nucleic Acid; Software
9.
Comput Chem ; 25(3): 301-7, 2001 May.
Article in English | MEDLINE | ID: mdl-11339412

ABSTRACT

Through computational analysis of high-performance liquid chromatography (HPLC) traces we find correlations between secondary metabolites and growth conditions of six varieties of barley. Using artificial neural networks, it was possible to classify chromatograms for which the varieties were fertilized by nitrogen and treated by fungicide. For each variety of barley we could also differentiate it from the others. Surprisingly, all these classification tasks could be solved successfully by a simple network with no hidden units. When adding to the methodology pruning of the network weights, we were able to reduce the set of peaks in the chromatograms and obtain a necessary subset from which the growth conditions and differentiation may be decided. In some instances, more complex networks with hidden units could lead to a further reduction of the number of peaks used. In most cases, far more than half of the peaks are redundant. We find that it requires fewer information-rich peaks to perform the variety differentiation tasks than to recognize any of the growth conditions. Analysis of the network weights reveals correlations between weighted combinations of peaks.


Subjects
Hordeum/chemistry; Hordeum/genetics; Neural Networks, Computer; Phenols/chemistry; Chromatography/methods; Chromatography, High Pressure Liquid; Fertilizers/analysis; Fungicides, Industrial/analysis; Hordeum/growth & development; Nitrates/analysis; Species Specificity
10.
Nucleic Acids Res ; 29(10): 2135-44, 2001 May 15.
Article in English | MEDLINE | ID: mdl-11353083

ABSTRACT

Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation in the positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem-loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets, one a section of rRNA genes with randomly truncated ends so that a global alignment is not possible, and the other a hyper-variable collection of IRE-like elements that were inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable length hairpin loop. Those automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem-Loop Align SearcH (SLASH), which will perform stem-loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/.


Subjects
Computational Biology; Nucleic Acid Conformation; RNA/chemistry; RNA/genetics; Software; Algorithms; Base Sequence; Databases as Topic; Internet; Molecular Sequence Data; RNA/metabolism; RNA, Archaeal/chemistry; RNA, Archaeal/genetics; RNA, Archaeal/metabolism; RNA, Ribosomal/chemistry; RNA, Ribosomal/genetics; RNA, Ribosomal/metabolism; Regulatory Sequences, Nucleic Acid/genetics; Sensitivity and Specificity; Sequence Alignment; Untranslated Regions/chemistry; Untranslated Regions/genetics; Untranslated Regions/metabolism
11.
Genome Inform ; 12: 184-93, 2001.
Article in English | MEDLINE | ID: mdl-11791237

ABSTRACT

When a set of coregulated genes share a common structural RNA motif, e.g. a hairpin, most motif search approaches fail to locate the covarying but structurally conserved motif. There do exist methods that can locate structural RNA motifs, like FOLDALIGN, but the main problem with these methods is that they are computationally expensive. In FOLDALIGN, a major contribution to this is the use of a greedy algorithm to construct the multiple alignment. To ensure good quality, many redundant computations must be made. However, by applying the greedy algorithm on a carefully selected subset of sequences, near full greedy quality can be obtained. The basic idea is to estimate the order in which the sequences entered a good greedy alignment. If such a ranking, found from all pairwise alignments, is in good agreement with the order of appearance in the multiple alignment, the core structural motif can be found by performing the greedy algorithm on just the top sequences in the ranking. The ranking used in this mini-greedy algorithm is found by using two complementary approaches: 1) When interpreting the FOLDALIGN score as an inner product (kernel), the sequences can be ranked according to their distance to their center of mass; 2) We construct an algorithm that attempts to find the K closest sequences in the vector space associated with the inner product, and the remaining sequences can be ranked by their minimum distance to any of the sequences, or to the center of mass in this set. The two approaches are compared and merged, and the results are discussed. We also show that structural alignments of near full greedy quality can be found in significantly reduced time using these methods. The algorithm is being included in the SLASH (Stem-Loop Align SearcH) server available at http://www.bioinf.au.dk/slash.


Subjects
Algorithms; RNA/chemistry; RNA/genetics; Base Sequence; Computational Biology; Databases, Nucleic Acid; Nucleic Acid Conformation; Sequence Alignment/statistics & numerical data
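
The first ranking approach in the abstract above, ordering sequences by their distance to the center of mass when the pairwise score is read as an inner product, can be sketched with the standard kernel identity for that distance. This is an illustration only: the function name is hypothetical, and real FOLDALIGN scores need not form a valid (positive semi-definite) kernel, so the computed values may not behave like true squared distances.

```python
def rank_by_center_distance(kernel):
    """Rank items by squared distance to the center of mass in kernel space.

    kernel: symmetric N x N matrix (list of lists) of inner products,
            e.g. pairwise alignment scores interpreted as a kernel.
    Returns indices sorted from closest to farthest, plus the squared distances.
    """
    n = len(kernel)
    grand_mean = sum(sum(row) for row in kernel) / (n * n)
    distances = []
    for i in range(n):
        row_mean = sum(kernel[i]) / n
        # ||phi(x_i) - m||^2 = K_ii - (2/N) sum_j K_ij + (1/N^2) sum_jk K_jk
        distances.append(kernel[i][i] - 2.0 * row_mean + grand_mean)
    order = sorted(range(n), key=lambda i: distances[i])
    return order, distances

# Toy check with a linear kernel on 1-D points (hypothetical values).
points = [0.0, 1.0, 2.0, 10.0]
toy_kernel = [[x * y for y in points] for x in points]
order, d2 = rank_by_center_distance(toy_kernel)
```

In the toy example the outlying point at 10.0 ranks last, which mirrors the idea of running the greedy alignment only on the top-ranked sequences.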
12.
Nucleic Acids Res ; 29(1): 169-70, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125080

ABSTRACT

Signal recognition particle (SRP) is a stable cytoplasmic ribonucleoprotein complex that serves to translocate secretory proteins across membranes during translation. The SRP Database (SRPDB) provides compilations of SRP components, ordered alphabetically and phylogenetically. Alignments emphasize phylogenetically-supported base pairs in SRP RNA and conserved residues in the proteins. Data are provided in various formats including a column arrangement for improved access and simplified computational usability. Included are motifs for identification of new sequences, SRP RNA secondary structure diagrams, 3-D models and links to high-resolution structures. This release includes 11 new SRP RNA sequences (total of 129), two protein SRP9 sequences (total of seven), two protein SRP14 sequences (total of 10), two protein SRP19 sequences (total of 16), 10 new SRP54 (ffh) sequences (total of 66), two protein SRP68 sequences (total of seven) and two protein SRP72 sequences (total of nine). Seven sequences of the SRP receptor alpha-subunit and its FtsY homolog (total of 51) are new. Also considered are the beta-subunit of SRP receptor, Flhf, Hbsu, CaM kinase II and cpSRP43. Access to SRPDB is at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html and the European mirror http://www.medkem.gu.se/dbs/SRPDB/SRPDB.html


Subjects
Databases, Factual; Signal Recognition Particle/genetics; Internet; Proteins/genetics; RNA/genetics
13.
Nucleic Acids Res ; 29(1): 171-2, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125081

ABSTRACT

The tmRNA database (tmRDB) is maintained at the University of Texas Health Science Center at Tyler, Texas, and accessible on the World Wide Web at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html. Mirror sites are located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Institute of Biological Sciences, Aarhus, Denmark (http://www.bioinf.au.dk/tmRDB/). The tmRDB provides information and citation links about tmRNA, a molecule that combines functions of tRNA and mRNA in trans-translation. tmRNA is likely to be present in all bacteria and has been found in algae chloroplasts, the cyanelle of Cyanophora paradoxa and the mitochondrion of the flagellate Reclinomonas americana. This release adds 26 new sequences and corresponding predicted tmRNA-encoded tag peptides for a total of 86 tmRNAs, ordered alphabetically and phylogenetically. Secondary structures and three-dimensional models in PDB format for representative molecules are being made available. tmRNA alignments prove individual base pairs and are generated manually, assisted by computational tools. The alignments with their corresponding structural annotation can be obtained in various formats, including a new column format designed to improve and simplify computational usability of the data.


Subjects
Databases, Factual; RNA, Messenger/genetics; RNA, Transfer/genetics; Internet; Phylogeny; Prokaryotic Cells/metabolism; Sequence Alignment
14.
Bioinformatics ; 15(9): 769-70, 1999 Sep.
Article in English | MEDLINE | ID: mdl-10498780

ABSTRACT

MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. AVAILABILITY: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. CONTACT: gorodkin@cbs.dtu.dk


Subjects
Proteins/chemistry; Sequence Alignment; Software; Nucleic Acids/chemistry; Sequence Analysis, Protein
15.
Article in English | MEDLINE | ID: mdl-10786291

ABSTRACT

Correlations between sequence separation (in residues) and distance (in angstroms) of any pair of amino acids in polypeptide chains are investigated. For each sequence separation we define a distance threshold. For pairs of amino acids where the distance between C alpha atoms is smaller than the threshold, a characteristic sequence (logo) motif is found. The motifs change as the sequence separation increases: for small separations they consist of one peak located in between the two residues, then additional peaks at these residues appear, and finally the center peak smears out for very large separations. We also find correlations between the residues in the center of the motif. This and other statistical analyses are used to design neural networks with enhanced performance compared to earlier work. Importantly, the statistical analysis explains why neural networks perform better than simple statistical data-driven approaches such as pair probability density functions. The statistical results also explain characteristics of the network performance for increasing sequence separation. The improvement of the new network design is significant in the sequence separation range 10-30 residues. Finally, we find that the performance curve for increasing sequence separation is directly correlated to the corresponding information content. A WWW server, distanceP, is available at http://www.cbs.dtu.dk/services/distanceP/.


Subjects
Amino Acid Motifs; Neural Networks, Computer; Proteins/chemistry; Algorithms; Databases, Factual; Entropy; Models, Statistical
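
The thresholding step described in the abstract above, labelling residue pairs at a given sequence separation by whether their C alpha distance falls below a cut-off, can be sketched as follows. The function name, the coordinates and the 8.0 angstrom threshold are hypothetical; the abstract does not give the thresholds used.

```python
import math

def contact_labels(ca_coords, separation, threshold):
    """Label residue pairs at a fixed sequence separation by C-alpha distance.

    ca_coords: list of (x, y, z) coordinates in angstroms, one per residue.
    separation: sequence separation |i - j| in residues.
    threshold: distance cut-off in angstroms.
    Returns a list of (i, j, distance, is_below_threshold).
    """
    labels = []
    for i in range(len(ca_coords) - separation):
        j = i + separation
        distance = math.dist(ca_coords[i], ca_coords[j])  # Python 3.8+
        labels.append((i, j, distance, distance < threshold))
    return labels

# Hypothetical coordinates for a straight four-residue chain (3.8 angstrom spacing).
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
pairs = contact_labels(chain, separation=2, threshold=8.0)
```

Labels of this kind are the binary targets a predictor would be trained on, one set per sequence separation.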
16.
Article in English | MEDLINE | ID: mdl-9783207

ABSTRACT

We study from a computational standpoint several different physical scales associated with structural features of DNA sequences, including dinucleotide scales such as base stacking energy and propeller twist, and trinucleotide scales such as bendability and nucleosome positioning. We show that these scales provide an alternative or complementary compact representation of DNA sequences. As an example, we construct a strand-invariant representation of DNA sequences. The scales can also be used to analyze and discover new DNA structural patterns, especially in combination with hidden Markov models (HMMs). The scales are applied to HMMs of human promoter sequences, revealing a number of significant differences between regions upstream and downstream of the transcriptional start point. Finally, we show, with some qualifications, that such scales are by and large independent, and therefore complement each other.


Subjects
DNA/chemistry; Artificial Intelligence; Base Sequence; DNA/genetics; Humans; Markov Chains; Molecular Structure; Oligodeoxyribonucleotides/chemistry; Oligodeoxyribonucleotides/genetics; Pattern Recognition, Automated; Promoter Regions, Genetic; TATA Box; Thermodynamics
17.
Nucleic Acids Res ; 25(18): 3724-32, 1997 Sep 15.
Article in English | MEDLINE | ID: mdl-9278497

ABSTRACT

We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.


Subjects
Computer Simulation; RNA/genetics; Sequence Analysis; Algorithms; Animals; Databases, Factual; Humans
18.
Article in English | MEDLINE | ID: mdl-9322025

ABSTRACT

We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The overall method has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.


Subjects
Algorithms; RNA/chemistry; RNA/genetics; Sequence Alignment/methods; Base Sequence; Databases, Factual; Molecular Structure; Nucleic Acid Conformation; Software
19.
Comput Appl Biosci ; 13(6): 583-6, 1997 Dec.
Article in English | MEDLINE | ID: mdl-9475985

ABSTRACT

MOTIVATION: We extend the standard 'Sequence Logo' method of Schneider and Stephens (Nucleic Acids Res., 18, 6097-6100, 1990) to incorporate prior frequencies on the bases, allow for gaps in the alignments, and indicate the mutual information of base-paired regions in RNA. RESULTS: Given an alignment of RNA sequences with the base pairings indicated, the program will calculate the information at each position, including the mutual information of the base pairs, and display the results in a 'Structure Logo'. Alignments without base pairing can also be displayed in a 'Sequence Logo', still allowing gaps and incorporating prior frequencies if desired. AVAILABILITY: The code is available from, and an Internet server can be used to run the program at, http://www.cbs.dtu.dk/gorodkin/appl/slogo.html.


Subjects
Computational Biology/methods; RNA/chemistry; Sequence Alignment/methods; Algorithms; Base Composition/genetics; Computer Simulation; Mathematics; Nucleic Acid Conformation; Nucleic Acid Hybridization/genetics
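
The two quantities named in the abstract above, per-position information relative to prior base frequencies and mutual information between base-paired positions, can be illustrated with a short sketch. It uses the standard relative-entropy and mutual-information definitions in bits. Skipping gap characters is a simplification chosen for this example and may differ from how the published program treats gaps; the function names are hypothetical.

```python
import math
from collections import Counter

def column_information(column, prior=None, alphabet="ACGU"):
    """Relative entropy (bits) of one alignment column against prior base frequencies.

    Gap characters are simply skipped here; that is a simplification, not
    necessarily how the published method treats gaps.
    """
    if prior is None:
        prior = {b: 1.0 / len(alphabet) for b in alphabet}
    symbols = [s for s in column if s in alphabet]
    if not symbols:
        return 0.0
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2((c / n) / prior[b]) for b, c in counts.items())

def mutual_information(column_i, column_j, alphabet="ACGU"):
    """Mutual information (bits) between two columns, using rows without gaps."""
    pairs = [(a, b) for a, b in zip(column_i, column_j) if a in alphabet and b in alphabet]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )
```

A fully conserved column carries 2 bits under a uniform prior, while two columns that covary perfectly between two base pairs (for example A-U and C-G in equal proportion) share 1 bit of mutual information even though neither column is conserved on its own.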
20.
Protein Eng ; 10(11): 1241-8, 1997 Nov.
Article in English | MEDLINE | ID: mdl-9514112

ABSTRACT

We predict interatomic C alpha distances by two independent data-driven methods. The first method uses statistically derived probability distributions of the pairwise distance between two amino acids, whilst the second method consists of a neural network prediction approach equipped with windows taking the context of the two residues into account. These two methods are used to predict whether distances in independent test sets were above or below given thresholds. We investigate which distance thresholds produce the most information-rich constraints and, in turn, the optimal performance of the two methods. The predictions are based on a data set derived using a new threshold which defines when sequence similarity implies structural similarity. We show that distances in proteins are predicted more accurately by neural networks than by probability density functions. We show that the accuracy of the predictions can be further increased by using sequence profiles. A threading method based on the predicted distances is presented. A homepage with software, predictions and data related to this paper is available at http://www.cbs.dtu.dk/services/CPHmodels/.


Subjects
Neural Networks, Computer; Probability; Proteins/chemistry; Amino Acids/chemistry; Chemical Phenomena; Chemistry, Physical; Databases, Factual