Results 1-6 of 6
1.
J Comput Biol ; 6(2): 219-35, 1999.
Article in English | MEDLINE | ID: mdl-10421524

ABSTRACT

We introduce a minimal-risk method for estimating the frequencies of amino acids at conserved positions in a protein family. Our method, called minimal-risk estimation, finds the optimal weighting between a set of observed amino acid counts and a set of pseudofrequencies, which represent prior information about the frequencies. We compute the optimal weighting by minimizing the expected distance between the estimated frequencies and the true population frequencies, measured by either a squared-error or a relative-entropy metric. Our method accounts for the source of the pseudofrequencies, which arise either from the background distribution of amino acids or from applying a substitution matrix to the observed data. Our frequency estimates therefore depend on the size and composition of the observed data as well as the source of the pseudofrequencies. We convert our frequency estimates into minimal-risk scoring matrices for sequence analysis. A large-scale cross-validation study, involving 48 variants of seven methods, shows that the best performing method is minimal-risk estimation using the squared-error metric. Our method is implemented in the package EMATRIX, which is available on the Internet at http://motif.stanford.edu/ematrix.
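The squared-error trade-off can be illustrated with a simplified calculation. Suppose the estimate is λf + (1 − λ)g, where f holds the observed frequencies from n counts, g holds fixed pseudofrequencies, and p holds the true frequencies. Because f is unbiased for p, the expected squared error splits into λ²V + (1 − λ)²B, with V = E‖f − p‖² = (1 − Σp_i²)/n and B = ‖g − p‖², and is minimized at λ = B/(B + V). The Python sketch below plugs sample estimates into that formula and turns the result into log-odds scores. It is an illustration under stated assumptions, not the EMATRIX implementation: it treats g as a fixed background distribution (so pseudofrequencies derived from a substitution matrix are not modeled), covers only the squared-error metric, and uses a hypothetical four-letter alphabet with made-up counts.

```python
import math


def minimal_risk_sketch(counts, pseudo):
    """Blend observed counts with fixed pseudofrequencies.

    Returns (lam, estimate), where estimate = lam * f + (1 - lam) * pseudo and
    lam minimizes a plug-in estimate of the squared-error risk
    lam**2 * V + (1 - lam)**2 * B.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    f = [c / n for c in counts]  # observed frequencies
    # V = (1 - sum p_i^2) / n; (1 - sum f_i^2) / (n - 1) is unbiased for it.
    var = (1.0 - sum(x * x for x in f)) / (n - 1)
    # E||f - g||^2 = B + V, so subtract V to estimate B (clipped at zero).
    bias2 = max(sum((x - y) ** 2 for x, y in zip(f, pseudo)) - var, 0.0)
    denom = bias2 + var
    lam = bias2 / denom if denom > 0 else 1.0
    estimate = [lam * x + (1.0 - lam) * y for x, y in zip(f, pseudo)]
    return lam, estimate


def log_odds_scores(estimate, background):
    """One scoring-matrix column in bits: log2(estimated / background)."""
    return [math.log2(p / b) if p > 0 else float("-inf")
            for p, b in zip(estimate, background)]


# Toy four-letter alphabet with made-up counts at one conserved position.
counts = [8, 1, 1, 0]
background = [0.25, 0.25, 0.25, 0.25]
lam, est = minimal_risk_sketch(counts, background)
scores = log_odds_scores(est, background)
```

With few observations V is large and the weight shifts toward the pseudofrequencies; as n grows, V shrinks and the observed frequencies dominate, which is the dependence on sample size the abstract describes.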


Subjects
Likelihood Functions; Proteins/chemistry; Sequence Analysis/methods; Amino Acids/analysis; Conserved Sequence/genetics; Entropy; Markov Chains; Reproducibility of Results; Risk; Sensitivity and Specificity; Software
2.
Psychon Bull Rev ; 3(2): 208-14, 1996 Jun.
Article in English | MEDLINE | ID: mdl-24213869

ABSTRACT

Counterfactual imaginings are known to have far-reaching implications. In the present experiment, we ask if imagining events from one's past can affect memory for childhood events. We draw on the social psychology literature showing that imagining a future event increases the subjective likelihood that the event will occur. The concepts of cognitive availability and the source-monitoring framework provide reasons to expect that imagination may inflate confidence that a childhood event occurred. However, people routinely produce myriad counterfactual imaginings (i.e., daydreams and fantasies) but usually do not confuse them with past experiences. To determine the effects of imagining a childhood event, we pretested subjects on how confident they were that a number of childhood events had happened, asked them to imagine some of those events, and then gathered new confidence measures. For each of the target items, imagination inflated confidence that the event had occurred in childhood. We discuss implications for situations in which imagination is used as an aid in searching for presumably lost memories.

3.
Article in English | MEDLINE | ID: mdl-10977073

ABSTRACT

Position-specific scoring matrices have been used extensively to recognize highly conserved protein regions. We present a method for accelerating these searches using a suffix tree data structure computed from the sequences to be searched. Building on earlier work that allows evaluation of a scoring matrix to be stopped early, the suffix tree-based method excludes many protein segments from consideration at once by pruning entire subtrees. Although suffix trees are usually expensive in space, the fact that scoring matrix evaluation requires an in-order traversal allows nodes to be stored more compactly without loss of speed, and our implementation requires only 17 bytes of primary memory per input symbol. Searches are accelerated by up to a factor of ten.
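The subtree-pruning idea can be sketched in Python. As a simplifying assumption, the sketch indexes the sequences with a suffix trie truncated at the matrix width (a hypothetical helper, `build_window_trie`) rather than a compact suffix tree, so it does not reproduce the 17-bytes-per-symbol layout. During a depth-first traversal, a node's partial score plus the best score still obtainable from the remaining columns bounds every segment below it; when that bound falls below the threshold, the whole subtree is skipped at once. The matrix, sequences, and threshold are toy values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # residue -> Node
    hits: list = field(default_factory=list)      # (sequence index, start) at full depth


def build_window_trie(sequences, width):
    """Hypothetical helper: a suffix trie truncated at depth `width`."""
    root = Node()
    for sid, seq in enumerate(sequences):
        for start in range(len(seq) - width + 1):
            node = root
            for ch in seq[start:start + width]:
                node = node.children.setdefault(ch, Node())
            node.hits.append((sid, start))
    return root


def trie_search(root, pssm, threshold):
    """Return (sequence index, start, score) for every window scoring >= threshold,
    pruning a subtree as soon as its best possible completion falls short."""
    width = len(pssm)
    # best[d] = largest score obtainable from columns d .. width - 1
    best = [0.0] * (width + 1)
    for d in range(width - 1, -1, -1):
        best[d] = best[d + 1] + max(pssm[d].values())
    results = []

    def visit(node, depth, score):
        if depth == width:
            results.extend((sid, start, score) for sid, start in node.hits)
            return
        for ch, child in node.children.items():
            s = score + pssm[depth].get(ch, float("-inf"))
            if s + best[depth + 1] >= threshold:  # otherwise skip the whole subtree
                visit(child, depth + 1, s)

    visit(root, 0, 0.0)
    return sorted(results)


pssm = [
    {"A": 2, "C": -1, "D": -1},
    {"A": -1, "C": 2, "D": -1},
    {"A": -1, "C": -1, "D": 2},
]
root = build_window_trie(["ACDAC", "CDACD"], len(pssm))
hits = trie_search(root, pssm, threshold=4)
```

Here the subtrees starting with C and D are discarded after a single column, because even perfect scores in the two remaining columns cannot lift them to the threshold.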


Subjects
Proteins/classification; Proteins/genetics; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Animals; Humans
4.
Article in English | MEDLINE | ID: mdl-9322037

ABSTRACT

Discrete motifs that discriminate functional classes of proteins are useful for classifying new sequences, capturing structural constraints, and identifying protein subclasses. Despite the fact that the space of such motifs can grow exponentially with sequence length and number, we show that in practice it usually does not, and we describe a technique that infers motifs from aligned protein sequences by exhaustively searching this space. Our method generates sequence motifs over a wide range of recall and precision, and chooses a representative motif based on a score that we derive from both statistical and information-theoretic frameworks. Finally, we show that the selected motifs perform well in practice, classifying unseen sequences with extremely high precision, and infer protein subclasses that correspond to known biochemical classes.
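A minimal Python sketch of an exhaustive motif search, under assumptions not taken from the abstract: a motif is a sequence of allowed residue sets, one per aligned column; candidate sets come from a hypothetical list of residue groups (`GROUPS`); a group is kept for a column if it covers at least `min_cover` of that column's residues; and candidates are ranked by (recall, precision) as a stand-in for the paper's statistical and information-theoretic score, which is not reproduced here. All sequences are toy data.

```python
from itertools import product

# Hypothetical residue groups; a real search would use a curated set.
GROUPS = [frozenset(g) for g in ("ILV", "ILVM", "G", "S", "ST", "DE", "KR")]


def candidate_motifs(alignment, groups, min_cover):
    """Yield every combination of per-column groups that cover >= min_cover."""
    per_col = []
    for column in zip(*alignment):
        keep = [g for g in groups
                if sum(r in g for r in column) / len(column) >= min_cover]
        if not keep:
            return  # some column has no covering group: no motifs
        per_col.append(keep)
    yield from product(*per_col)


def matches(motif, seq):
    """True if any window of seq has each residue in the motif's allowed set."""
    width = len(motif)
    return any(all(seq[i + j] in motif[j] for j in range(width))
               for i in range(len(seq) - width + 1))


def recall_precision(motif, positives, negatives):
    tp = sum(matches(motif, s) for s in positives)
    fp = sum(matches(motif, s) for s in negatives)
    recall = tp / len(positives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


alignment = ["IGS", "VGT", "LGS"]            # aligned conserved segments
positives = ["MKIGSA", "PVGTQ", "ALGSW"]      # family members
negatives = ["AAAAA", "IGA", "MGTK"]          # non-members
scored = {m: recall_precision(m, positives, negatives)
          for m in candidate_motifs(alignment, GROUPS, 0.5)}
best = max(scored, key=scored.get)
```

Each candidate lands at a different point on the recall/precision curve: [ILV]G[ST] matches all three family members and no non-member, while the looser [ILVM]G[ST] also matches the non-member MGTK.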


Subjects
Algorithms; Protein Conformation; Amino Acid Sequence; Amino Acids/chemistry; Artificial Intelligence; Databases, Factual; Molecular Sequence Data; Proteins/chemistry; Proteins/classification; Proteins/genetics; Sequence Alignment; Software; Tubulin/chemistry; Tubulin/genetics
5.
Bioinformatics ; 16(3): 233-44, 2000 Mar.
Article in English | MEDLINE | ID: mdl-10869016

ABSTRACT

MOTIVATION: We present techniques for increasing the speed of sequence analysis using scoring matrices. Our techniques are based on calculating, for a given scoring matrix, the quantile function, which assigns a probability, or p, value to each segmental score. Our techniques also permit the user to specify a p threshold to indicate the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow scoring matrices to be used more widely in large-scale sequencing and annotation projects.

RESULTS: We develop three techniques for increasing the speed of sequence analysis: probability filtering, lookahead scoring, and permuted lookahead scoring. In probability filtering, we compute the score threshold that corresponds to the user-specified p threshold. We use the score threshold to limit the number of segments that are retained in the search process. In lookahead scoring, we test intermediate scores to determine whether they can still reach the score threshold. In permuted lookahead scoring, we score each segment in a particular order designed to maximize the likelihood of early termination. Our two lookahead scoring techniques substantially reduce the number of residues that must be examined. The fraction of residues examined ranges from 62% to 6%, depending on the p threshold chosen by the user. These techniques permit sequence analysis with scoring matrices at speeds several times faster than existing programs. On a database of 12,177 alignment blocks, our techniques permit sequence analysis at a speed of 225 residues/s for a p threshold of 10⁻⁶, and 541 residues/s for a p threshold of 10⁻²⁰. To compute the quantile function, we may use either an independence assumption or a Markov assumption. We measure the effect of first- and second-order Markov assumptions and find that, compared with the independence assumption, they tend to raise the p value of segments by average ratios of 1.30 and 1.69, respectively. We also compare our technique with the empirical 99.5th percentile scores compiled in the BLOCKSPLUS database, and find that they correspond on average to a p value of 1.5 × 10⁻⁵.

AVAILABILITY: The techniques described above are implemented in a software package called EMATRIX. This package is available from the authors for free academic use or for licensed commercial use. The EMATRIX set of programs is also available on the Internet at http://motif.stanford.edu/ematrix.
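Probability filtering and lookahead scoring can be sketched together in Python, assuming integer matrix scores, the independence assumption, and a uniform background over a hypothetical four-letter alphabet. `score_distribution` convolves the columns to obtain the exact score distribution, `score_threshold` converts a p threshold into a score threshold, and `lookahead_scan` abandons a segment as soon as its partial score plus the best achievable remaining score falls below that threshold. The column ordering in `permuted_order` is a hypothetical heuristic (largest gap between maximum and expected score first), not the ordering used by EMATRIX.

```python
from collections import defaultdict


def score_distribution(pssm, background):
    """Exact segment-score distribution under the independence assumption.
    pssm: list of dicts residue -> integer score; background: residue -> probability."""
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for res, q in background.items():
                nxt[s + column[res]] += p * q
        dist = dict(nxt)
    return dist


def score_threshold(dist, p_threshold):
    """Smallest score t with P(S >= t) <= p_threshold, or None if none qualifies."""
    tail, threshold = 0.0, None
    for s in sorted(dist, reverse=True):
        tail += dist[s]  # tail = P(S >= s)
        if tail > p_threshold:
            break
        threshold = s
    return threshold


def permuted_order(pssm, background):
    """Hypothetical heuristic: columns with the largest max-minus-expected gap first."""
    def gap(j):
        col = pssm[j]
        return max(col.values()) - sum(background[r] * col[r] for r in background)
    return sorted(range(len(pssm)), key=gap, reverse=True)


def lookahead_scan(seq, pssm, threshold, order=None):
    """Return ((start, score) hits, number of residues examined)."""
    width = len(pssm)
    order = list(range(width)) if order is None else list(order)
    # best[k] = largest score obtainable from the columns order[k:]
    best = [0] * (width + 1)
    for k in range(width - 1, -1, -1):
        best[k] = best[k + 1] + max(pssm[order[k]].values())
    hits, examined = [], 0
    for start in range(len(seq) - width + 1):
        score = 0
        for k, j in enumerate(order):
            score += pssm[j][seq[start + j]]
            examined += 1
            if score + best[k + 1] < threshold:
                break  # this segment can no longer reach the threshold
        else:
            hits.append((start, score))
    return hits, examined


pssm = [
    {"A": 3, "C": -1, "D": -1, "E": -1},
    {"A": -1, "C": 3, "D": -1, "E": -1},
    {"A": -1, "C": -1, "D": 3, "E": -1},
]
background = {r: 0.25 for r in "ACDE"}
dist = score_distribution(pssm, background)
threshold = score_threshold(dist, 0.2)
hits, examined = lookahead_scan("ACDACEEACD", pssm, threshold,
                                order=permuted_order(pssm, background))
```

In this toy run the p threshold of 0.2 maps to a score threshold of 5, and the scan performs 19 of the 24 possible residue lookups; lowering the p threshold raises the score threshold and makes early termination more frequent.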


Subjects
Sequence Analysis/methods; Software; Markov Chains; Probability; Time Factors
6.
Proc Natl Acad Sci U S A ; 95(11): 5865-71, 1998 May 26.
Article in English | MEDLINE | ID: mdl-9600885

ABSTRACT

We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called EMOTIF (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, EMOTIF generates a set of motifs with a wide range of specificities and sensitivities. EMOTIF can also generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs can often represent the entire superfamily with high specificity and sensitivity. We have used EMOTIF to generate sets of motifs from all 7,000 protein alignments in the BLOCKS and PRINTS databases. The resulting database, called IDENTIFY (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs whose probabilities of matching a false positive range from 10⁻¹⁰ to 10⁻⁵. Highly specific motifs are well suited for searching entire proteomes while generating very few false predictions. IDENTIFY assigns biological functions to 25-30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, IDENTIFY assigned functions to 172 proteins of unknown function in the yeast genome.
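One way to see how a discrete motif reaches such low false-positive probabilities: under an independence assumption, the chance that a random segment matches is the product, over motif positions, of the background probability mass of the allowed residues. The Python sketch below assumes a uniform background over the 20 amino acids and a hypothetical four-position motif; a realistic estimate would use observed residue frequencies, and this is not necessarily how EMOTIF computes its probabilities.

```python
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background; real analyses would use observed composition.
UNIFORM = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}


def match_probability(motif, background):
    """Chance that one random segment matches, assuming independent positions:
    product over positions of the background mass of the allowed residues."""
    return prod(sum(background[a] for a in allowed) for allowed in motif)


def expected_chance_matches(motif, background, n_positions):
    """Expected number of chance matches when n_positions segments are scanned."""
    return n_positions * match_probability(motif, background)


# Hypothetical motif [ILV]-G-[ST]-[DE].
motif = [set("ILV"), set("G"), set("ST"), set("DE")]
p = match_probability(motif, UNIFORM)  # 3/20 * 1/20 * 2/20 * 2/20
expected = expected_chance_matches(motif, UNIFORM, 1_000_000)
```

Every additional restrictive position multiplies the match probability by a factor of at most one, which is why longer or more tightly constrained motifs can push the false-positive probability toward the 10⁻¹⁰ end of the range.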


Subjects
Genome; Proteins/genetics; Sequence Analysis/methods; Software; Amino Acid Sequence; Animals; Computer Simulation; Humans; Models, Genetic; Molecular Sequence Data