Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 54
Filtrar
Mais filtros

Base de dados
País/Região como assunto
Tipo de documento
Intervalo de ano de publicação
1.
J Math Biol ; 67(5): 1261-78, 2013 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-23053535

RESUMO

To an RNA pseudoknot structure is naturally associated a topological surface, which has its associated genus, and structures can thus be classified by the genus. Based on earlier work of Harer-Zagier, we compute the generating function Dg,σ (z) = ∑n dg,σ (n)zn for the number dg,σ (n) of those structures of fixed genus g and minimum stack size σ with n nucleotides so that no two consecutive nucleotides are basepaired and show that Dg,σ (z) is algebraic. In particular, we prove that dg,2(n) ∼ kg n3(g−1/2 )γ n2, where γ2 ≈ 1.9685. Thus, for stack size at least two, the genus only enters through the sub-exponential factor, and the slow growth rate compared to the number of RNA molecules implies the existence of neutral networks of distinct molecules with the same structure of any genus. Certain RNA structures called shapes are shown to be in natural one-to-one correspondence with the cells in the Penner-Strebel decomposition of Riemann's moduli space of a surface of genus g with one boundary component, thus providing a link between RNA enumerative problems and the geometry of Riemann's moduli space.


Assuntos
RNA/química , Conformação de Ácido Nucleico , RNA/classificação
2.
J Mol Biol ; 235(1): 1-12, 1994 Jan 07.
Artigo em Inglês | MEDLINE | ID: mdl-8289235

RESUMO

Alignment algorithms to compare DNA or amino acid sequences are widely used tools in molecular biology. The algorithms depend on the setting of various parameters, most notably gap penalties. The effect that such parameters have on the resulting alignments is still poorly understood. This paper begins by reviewing two recent advances in algorithms and probability that enable us to take a new approach to this question. The first tool we introduce is a newly developed method to delineate efficiently all optimal alignments arising under all choices of parameters. The second tool comprises insights into the statistical behavior of optimal alignment scores. From this we gain a better understanding of the dependence of alignments on parameters in general. We propose novel criteria to detect biologically good alignments and highlight some specific features about the interaction between similarity matrices and gap penalties. To illustrate our analysis we present a detailed study of the comparison of two immunoglobulin sequences.


Assuntos
Sequência de Aminoácidos , Sequência de Bases , DNA/química , Biologia Molecular/métodos , Proteínas/química , Algoritmos , Animais , Humanos , Cadeias Pesadas de Imunoglobulinas/química , Cadeias Leves de Imunoglobulina/química , Região Variável de Imunoglobulina/química , Matemática , Dados de Sequência Molecular , Estatística como Assunto
3.
J Mol Biol ; 197(4): 723-8, 1987 Oct 20.
Artigo em Inglês | MEDLINE | ID: mdl-2448477

RESUMO

The algorithm of Smith & Waterman for identification of maximally similar subsequences is extended to allow identification of all non-intersecting similar subsequences with similarity score at or above some preset level. The resulting alignments are found in order of score, with the highest scoring alignment first. In the case of single gaps or multiple gaps weighted linear with gap length, the algorithm is extremely efficient, taking very little time beyond that of the initial calculation of the matrix. The algorithm is applied to comparisons of tRNA-rRNA sequences from Escherichia coli. A statistical analysis is important for proper evaluation of the results, which differ substantially from the results of an earlier analysis of the same sequences by Bloch and colleagues.


Assuntos
Algoritmos , RNA Bacteriano , RNA Ribossômico , RNA de Transferência , Sequência de Bases , Escherichia coli/genética , Dados de Sequência Molecular , RNA Ribossômico 16S , RNA de Transferência de Alanina
4.
J Mol Biol ; 186(1): 117-28, 1985 Nov 05.
Artigo em Inglês | MEDLINE | ID: mdl-3908689

RESUMO

The basic nature of the sequence features that define a promoter sequence for Escherichia coli RNA polymerase have been established by a variety of biochemical and genetic methods. We have developed rigorous analytical methods for finding unknown patterns that occur imperfectly in a set of several sequences, and have used them to examine a set of bacterial promoters. The algorithm easily discovers the "consensus" sequences for the -10 and -35 regions, which are essentially identical to the results of previous analyses, but requires no prior assumptions about the common patterns. By explicitly specifying the nature of the search for consensus sequences, we give a rigorous definition to this concept that should be widely applicable. We also have provided estimates for the statistical significance of common patterns discovered in sets of sequences. In addition to providing a rigorous basis for defining known consensus regions, we have found additional features in these promoters that may have functional significance. These added features were located on either side of the -35 region. The pattern 5', or upstream, from the -35 region was found using the standard alphabet (A, G, C and T), but the pattern between the -10 and the -35 regions was detectable only in a sub-alphabet. Recent results relating DNA sequence to helix conformation suggest that the former (upstream) pattern may have a functional significance. Possible roles in promoter function are discussed in this light, and an observation of altered promoter function involving the upstream region is reported that appears to support the suggestion of function in at least one case.


Assuntos
DNA Bacteriano/genética , Reconhecimento Automatizado de Padrão , Regiões Promotoras Genéticas , Sequência de Bases , Escherichia coli/genética , Métodos , Homologia de Sequência do Ácido Nucleico , Transcrição Gênica
5.
J Mol Biol ; 258(4): 650-60, 1996 May 17.
Artigo em Inglês | MEDLINE | ID: mdl-8636999

RESUMO

We construct a mathematical model for in vitro molecular selection with amplification. Using DNA-protein binding as the illustrative example, we obtain an expression for the probability that a randomly selected molecule from the final in vitro selection products is a molecule with the highest binding affinity. Experiments of this type have been reported for several examples of DNA binding proteins. Our study requires a model of the DNA-protein binding constant between DNA molecules and the target protein. The relationship between binding constants and selection probabilities is presented under simplifying but reasonable assumptions. From our analysis, we find that for successful in vitro selection experiments there should be a certain relationship between the number of polymerase chain reaction cycles and the concentration of free protein. The results obtained should be widely applicable to a variety of selection-amplification procedures.


Assuntos
Proteínas de Ligação a DNA/metabolismo , DNA/metabolismo , Modelos Teóricos , Reação em Cadeia da Polimerase , Seleção Genética , Proteínas de Ligação a DNA/genética , Ligação Proteica
6.
J Comput Biol ; 1(2): 93-104, 1994.
Artigo em Inglês | MEDLINE | ID: mdl-8790456

RESUMO

Profiles, which are summaries of multiple alignments of a sequence family, are used to find new instances of the family in databases. In this paper, we study the maximum score M obtained when the profile is aligned without indels at all possible positions of a random sequence. The main theorem gives an approximation to the distribution function of M with an explicit bound on the error. This theorem implies that M has a limiting extreme value distribution.


Assuntos
Simulação por Computador , Modelos Estatísticos , Alinhamento de Sequência/métodos , Homologia de Sequência , Bases de Dados Factuais , Matemática , Homologia de Sequência do Ácido Nucleico
7.
J Comput Biol ; 2(2): 291-306, 1995.
Artigo em Inglês | MEDLINE | ID: mdl-7497130

RESUMO

Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.


Assuntos
Algoritmos , Sequência de Bases , DNA/química , Matemática , Modelos Teóricos , Oligodesoxirribonucleotídeos , Dados de Sequência Molecular , Conformação de Ácido Nucleico , Hibridização de Ácido Nucleico , Software
8.
J Comput Biol ; 7(1-2): 1-46, 2000.
Artigo em Inglês | MEDLINE | ID: mdl-10890386

RESUMO

In the following, an overview is given on statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take the error made by estimating the Markovian transition probabilities into account. The main tools involved are moment generating functions, martingales, Stein's method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns, and, as an example, the problem of unique recoverability of a sequence from SBH chip data is discussed. Special emphasis lies on disentangling the complicated dependence structure between word occurrences, due to self-overlap as well as due to overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests.


Assuntos
Biometria , Análise de Sequência de DNA/estatística & dados numéricos , Sequência de Bases , Cadeias de Markov , Modelos Estatísticos , Hibridização de Ácido Nucleico , Reconhecimento Automatizado de Padrão
9.
J Comput Biol ; 5(3): 505-15, 1998.
Artigo em Inglês | MEDLINE | ID: mdl-9773346

RESUMO

A fundamentally new molecular-biology approach in constructing restriction maps, Optical Mapping, has been developed by Schwartz et al. (1993). Using this method restriction maps are constructed by measuring the relevant fluorescence intensity and length measurements. However, it is difficult to directly estimate the restriction site locations of single DNA molecules based on these optical mapping data because of the precision of length measurements and the unknown number of true restriction sites in the data. We propose the use of a hierarchical Bayes model based on a mixture model with normals and random noise. In this model we explicitly consider the missing observation structure of the data, such as the orientations of molecules, the allocations of cutting sites to restriction sites, and the indicator variables of whether observed cut sites are true or false. Because of the complexity of the model, the large number of missing data, and the unknown number of restriction sites, we use Reversible-Jump Markov Chain Monte Carlo (MCMC) to estimate the number and the locations of the restriction sites. Since there exists a high multimodality due to unknown orientations of molecules, we also use a combination of our MCMC approach and the flipping algorithm suggested by Dancík and Waterman (1997). The study is highly computer-intensive and the development of an efficient algorithm is required.


Assuntos
Algoritmos , Mapeamento por Restrição/métodos , Cadeias de Markov , Modelos Genéticos , Método de Monte Carlo
10.
J Comput Biol ; 3(3): 425-63, 1996.
Artigo em Inglês | MEDLINE | ID: mdl-8891959

RESUMO

Sequencing by hybridization is a tool to determine a DNA sequence from the unordered list of all l-tuples contained in this sequence; typical numbers for l are l = 8, 10, 12. For theoretical purposes we assume that the multiset of all l-tuples is known. This multiset determines the DNA sequence uniquely if none of the so-called Ukkonen transformations are possible. These transformations require repeats of (l-1)-tuples in the sequence, with these repeats occurring in certain spatial patterns. We model DNA as an i.i.d. sequence. We first prove Poisson process approximations for the process of indicators of all leftmost long repeats allowing self-overlap and for the process of indicators of all left-most long repeats without self-overlap. Using the Chen-Stein method, we get bounds on the error of these approximations. As a corollary, we approximate the distribution of longest repeats. In the second step we analyze the spatial patterns of the repeats. Finally we combine these two steps to prove an approximation for the probability that a random sequence is uniquely recoverable from its list of l-tuples. For all our results we give some numerical examples including error bounds.


Assuntos
Hibridização de Ácido Nucleico/métodos , Distribuição de Poisson , Análise de Sequência de DNA/métodos , Algoritmos , Projeto Genoma Humano , Humanos , Sequências Repetitivas de Ácido Nucleico
12.
J Mol Biol ; 147(1): 195-7, 1981 Mar 25.
Artigo em Inglês | MEDLINE | ID: mdl-7265238
16.
Bull Math Biol ; 56(4): 743-67, 1994 Jul.
Artigo em Inglês | MEDLINE | ID: mdl-8054893

RESUMO

Recently algorithms for parametric alignment (Waterman et al., 1992, Natl Acad. Sci. USA 89, 6090-6093; Gusfield et al., 1992, Proceedings of the Third Annual ACM-SIAM Discrete Algorithms) find optimal scores for all penalty parameters, both for global and local sequence alignment. This paper reviews those techniques. Then in the main part of this paper dynamic programming methods are used to compute ensemble alignment, finding all alignment scores for all parameters. Both global and local ensemble alignments are studied, and parametric alignment is used to compute near optimal ensemble alignments.


Assuntos
Algoritmos , Sequência de Bases , DNA/química , Homologia de Sequência do Ácido Nucleico , DNA/genética , Dados de Sequência Molecular
17.
Nucleic Acids Res ; 14(22): 9095-102, 1986 Nov 25.
Artigo em Inglês | MEDLINE | ID: mdl-3786145

RESUMO

An algorithm for multiple sequence alignment is given that matches words of length and degree of mismatch chosen by the user. The alignment maximizes an alignment scoring function. The method is based on a novel extension of our consensus sequence methods. The algorithm works for both DNA and protein sequences, and from earlier work on consensus sequences, it is possible to estimate statistical significance.


Assuntos
Sequência de Bases , DNA , Proteínas , Algoritmos , Sequência de Aminoácidos
18.
Nucleic Acids Res ; 11(24): 8951-6, 1983 Dec 20.
Artigo em Inglês | MEDLINE | ID: mdl-6324109

RESUMO

Restriction sites or other sequence patterns are usually assumed to occur according to a Poisson distribution with mean equal to the reciprocal of the probability of the given site or pattern. For situations where non-overlapping occurrences of patterns, such as restriction sites, are the objects of interest, this note shows that the Poisson assumption is frequently misleading. Both the case of base composition (independent bases) and of dinucleotide frequencies (Markov chains) are treated. Moreover, a new technique is presented which allows treatment of collections of patterns, where the departure from the Poisson assumption is even more striking. This later case includes double digests, and an example of a five enzyme digest is included.


Assuntos
Sequência de Bases , Enzimas de Restrição do DNA/metabolismo , DNA/genética , Matemática , Modelos Genéticos , Especificidade por Substrato
19.
J Theor Biol ; 108(3): 333-7, 1984 Jun 07.
Artigo em Inglês | MEDLINE | ID: mdl-6748696

RESUMO

Sequence alignments are becoming more important with the increase of nucleic acid data. Fitch and Smith have recently given an example where multiple insertion/deletions (rather than a series of adjacent single insertion/deletions) are necessary to achieve the correct alignment. Multiple insertion/deletions are known to increase computation time from O(n2) to O(n3) although Gotoh has presented an O(n2) algorithm in the case the multiple insertion/deletion weighting function is linear. It is argued in this paper that it could be desirable to use concave weighting functions. For that case, an algorithm is derived that is conjectured to be O(n2).


Assuntos
Sequência de Bases , Matemática
20.
Proc Natl Acad Sci U S A ; 80(10): 3123-4, 1983 May.
Artigo em Inglês | MEDLINE | ID: mdl-16593315

RESUMO

When applying dynamic programming techniques to obtain optimal sequence alignments, a set of weights must be assigned to mismatches, insertion/deletions, etc. These weights are not predetermined, although efforts are being made to deduce biologically meaningful values from data. In addition, there are sometimes unknown constraints on the sequences that cause the "true" alignment to disagree with the optimum (computer) solution. To assist in overcoming these difficulties, an algorithm has been developed to produce all alignments within a specified distance of the optimum. The distance can be chosen after the optimum is computed, and the algorithm can be repeated at will. Earlier algorithms to solve this problem were very complex and not practical for any case involving sequences with significant time or storage requirements. The algorithm presented here overcomes these difficulties and has application to general, discrete dynamic programming problems.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA