Results 1 - 20 of 33
1.
Genetics ; 141(2): 771-83, 1995 Oct.
Article in English | MEDLINE | ID: mdl-8647409

ABSTRACT

A model is introduced describing nucleotide substitution in ribosomal RNA (rRNA) genes. In this model, substitution in the stem and loop regions of rRNA is modeled with 16- and four-state continuous time Markov chains, respectively. The mean substitution rates at nucleotide sites are assumed to follow gamma distributions that are different for the two types of regions. The simplest formulation of the model allows for explicit expressions for transition probabilities of the Markov processes to be found. These expressions were used to analyze several 16S-like rRNA genes from higher eukaryotes with the maximum likelihood method. Although the observed proportion of invariable sites was only slightly higher in the stem regions, the estimated average substitution rates in the stem regions were almost two times as high as in the loop regions. Therefore, the degree of site heterogeneity of substitution rates in the stem regions seems to be higher than in the loop regions of animal 16S-like rRNAs due to the presence of a few rapidly evolving sites. The model appears to be helpful in understanding the regularities of nucleotide substitution in rRNAs and probably in minimizing errors in recovering phylogeny for distantly related taxa from these genes.


Subjects
Biological Evolution ; DNA, Ribosomal/genetics ; Models, Genetic ; Point Mutation ; RNA, Ribosomal, 16S/genetics ; RNA, Ribosomal/genetics ; Animals ; Arthropods/genetics ; Base Composition ; Hominidae/genetics ; Humans ; Mammals ; Markov Chains ; Mathematics ; Nematoda/genetics ; Nucleic Acid Conformation ; RNA, Ribosomal/chemistry ; RNA, Ribosomal, 16S/chemistry ; Saccharomyces cerevisiae/genetics
2.
Genetics ; 159(3): 1291-8, 2001 Nov.
Article in English | MEDLINE | ID: mdl-11729170

ABSTRACT

Regulatory networks provide control over complex cell behavior in all kingdoms of life. Here we describe a statistical model, based on representing proteins as collections of domains or motifs, which predicts unknown molecular interactions within these biological networks. Using known protein-protein interactions of Saccharomyces cerevisiae as training data, we were able to predict the links within this network with only 7% false-negative and 10% false-positive error rates. We also use Markov chain Monte Carlo simulation for the prediction of networks with maximum probability under our model. This model can be applied across species, where interaction data from one (or several) species can be used to infer interactions in another. In addition, the model is extensible and can be analogously applied to other molecular data (e.g., DNA sequences).


Subjects
Genetic Techniques ; Models, Genetic ; Saccharomyces cerevisiae/genetics ; Signal Transduction ; Amino Acid Motifs ; Models, Statistical ; Monte Carlo Method ; Plants/genetics ; Protein Binding ; Protein Structure, Tertiary ; Sequence Analysis, DNA ; Software
3.
Genetics ; 154(1): 381-95, 2000 Jan.
Article in English | MEDLINE | ID: mdl-10628997

ABSTRACT

We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series, and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed and in their most parameter-rich versions, both Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. These models are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.


Subjects
Drosophila/genetics ; Genetic Variation ; Models, Genetic ; Alcohol Dehydrogenase/genetics ; Animals ; Confidence Intervals ; Fourier Analysis ; Genes, Immunoglobulin ; Genetic Vectors ; Humans ; Likelihood Functions ; Mammals
4.
Genetics ; 144(4): 1975-83, 1996 Dec.
Article in English | MEDLINE | ID: mdl-8978080

ABSTRACT

We propose a simple algorithm for estimating the number of nucleotide differences between a pair of RNA or DNA sequences through comparison of their RNAse A mismatch cleavage patterns. In the RNAse A mismatch cleavage technique two or more sample sequences are hybridized to the same RNA probe, the hybrids are partially digested with RNAse A, and the digestion products are compared on an electrophoretic gel. Here we provide an algorithm for converting the numbers of unique and matching electrophoretic bands into an estimate of the number of nucleotide differences between the sequences. Computer simulation indicates that the proposed method yields a robust estimate of the genetic distance despite stochastic errors and occasional violation of certain assumptions. Our study suggests that the method performs best when the distance between the sequences is < 15 differences. When the sequences under analysis are likely to have larger distances, we advise replacing one long riboprobe with a set of shorter nonoverlapping probes. The new algorithm is applied to infer the proximity of several strains of pseudorabies virus.


Subjects
DNA/genetics ; Genes ; RNA/genetics ; Sequence Analysis ; Algorithms ; Animals ; Humans
5.
Gene ; 259(1-2): 235-44, 2000 Dec 23.
Article in English | MEDLINE | ID: mdl-11163981

ABSTRACT

We describe a graphical editor designed specifically to facilitate analysis and visualization of complex signal-transduction pathways. The editor provides automatic layout of complex regulatory graphs and enables users to easily maintain, edit, and exchange publication-quality images of regulatory networks.


Subjects
Computer Graphics ; Programming Languages ; Signal Transduction/physiology ; Algorithms ; Apoptosis/physiology ; Cell Cycle/physiology ; Humans ; Image Processing, Computer-Assisted ; Software Design
6.
Gene ; 259(1-2): 245-52, 2000 Dec 23.
Article in English | MEDLINE | ID: mdl-11163982

ABSTRACT

We describe a system which automatically identifies gene and protein names in journal articles, an important and non-trivial first step in knowledge extraction of protein and gene actions. Our system uses a database of gene and protein names and is based on BLAST [Altschul et al., Nucleic Acids Res. 25 (1997) 3389-3402], a popular tool for DNA and protein sequence comparison. We describe a method that consists of mapping sequences of text characters into sequences of nucleotides that can be processed by BLAST. We demonstrate that this approach is feasible: the system matches gene and protein names with a recall of 78.8% and a precision of 71.7%, which includes names that are not part of the system database. An analysis of the results suggests techniques that can be used to improve performance further.


Subjects
Algorithms ; Genes ; Information Storage and Retrieval/methods ; Proteins ; Base Sequence ; Databases as Topic ; Molecular Sequence Data ; Sequence Alignment ; Sequence Homology, Nucleic Acid ; Software
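
The abstract in record 6 describes mapping sequences of text characters into nucleotide sequences that BLAST can process, but it does not give the mapping itself. The sketch below shows one hypothetical, reversible encoding (each UTF-8 byte becomes four nucleotides); the function names and the encoding scheme are assumptions for illustration only and are not taken from the paper.

```python
# Hypothetical text-to-nucleotide encoding. The abstract does not specify
# the mapping used by the authors; this is only one possible scheme.
ALPHABET = "ACGT"  # base-4 digits 0..3


def text_to_nucleotides(text: str) -> str:
    """Encode each UTF-8 byte as four nucleotides, most significant digit first."""
    out = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            out.append(ALPHABET[(byte >> shift) & 0b11])
    return "".join(out)


def nucleotides_to_text(seq: str) -> str:
    """Invert text_to_nucleotides; expects a length that is a multiple of four."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    data = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | ALPHABET.index(base)
        data.append(value)
    return data.decode("utf-8")
```

With a fixed-width encoding of this kind, a gene name such as "ABCC11" always maps to the same 24-nucleotide string, so near-identical names yield near-identical nucleotide sequences that an alignment tool could compare.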
7.
Gene ; 208(1): 31-5, 1998 Feb 16.
Article in English | MEDLINE | ID: mdl-9479040

ABSTRACT

We describe two Java applets which are useful for insightful presentation of intermediate experimental data in gene discovery projects involving large scale sequencing. One of these applets provides a physical map of a genomic region and provides easy access to the second applet, which furnishes a detailed map of sequence contigs associated with clones on the physical map. In particular, the second applet displays all the known information about each contig, including the presence of exons, database homology 'hits', repetitive elements and other features; the graphics are linked to other World Wide Web pages, providing detailed information on each feature. These applets should be useful to other research groups working on large sequencing projects.


Subjects
Chromosome Mapping ; Computer Communication Networks ; Databases, Factual ; Genes ; Sequence Analysis, DNA ; Software ; Chromosomes, Human, Pair 13/genetics ; Cosmids ; DNA, Complementary ; Exons ; Genetic Diseases, Inborn/genetics ; Humans ; Leukemia, Lymphocytic, Chronic, B-Cell/genetics ; Programming Languages ; Repetitive Sequences, Nucleic Acid
8.
Gene ; 273(1): 89-96, 2001 Jul 25.
Article in English | MEDLINE | ID: mdl-11483364

ABSTRACT

Several years ago, we initiated a long-term project of cloning new human ATP-binding cassette (ABC) transporters and linking them to various disease phenotypes. As one of the results of this project, we present two new members of the human ABCC subfamily, ABCC11 and ABCC12. These two new human ABC transporters were fully characterized and mapped to the human chromosome 16q12. With the addition of these two genes, the complete human ABCC subfamily has 12 identified members (ABCC1-12), nine from the multidrug resistance-like subgroup, two from the sulfonylurea receptor subgroup, and the CFTR gene. Phylogenetic analysis determined that ABCC11 and ABCC12 are derived by duplication, and are most closely related to the ABCC5 gene. Genetic variation in some ABCC subfamily members is associated with human inherited diseases, including cystic fibrosis (CFTR/ABCC7), Dubin-Johnson syndrome (ABCC2), pseudoxanthoma elasticum (ABCC6) and familial persistent hyperinsulinemic hypoglycemia of infancy (ABCC8). Since ABCC11 and ABCC12 were mapped to a region harboring gene(s) for paroxysmal kinesigenic choreoathetosis, the two genes represent positional candidates for this disorder.


Subjects
ATP-Binding Cassette Transporters/genetics ; Chromosomes, Human, Pair 16 ; Amino Acid Sequence ; Base Sequence ; Cell Line ; Chromosome Mapping ; Cloning, Molecular ; Humans ; Molecular Sequence Data ; Multidrug Resistance-Associated Protein 2 ; Phylogeny
9.
Curr Med Res Opin ; 16(2): 88-93, 2000.
Article in English | MEDLINE | ID: mdl-10893652

ABSTRACT

The recent cloning of two cDNAs (Clone 1 and Clone 5) that encode novel hypothetical proteins, combining an N-terminal Ig kappa-like domain with features that occur in microfibril-associated glycoproteins (MAGPs) and fibrinogen, raises the question of whether the Ig fold may have originated in association with functions that may be more primitive than soluble immunity. Pairwise alignments were performed to compare similarities of fibrinogen-beta, Clone 1 and an Ig kappa sequence. Clone 1 had two regions in its Ig domain with > 50% similarity to fibrinogen, while Ig kappa was virtually non-homologous to fibrinogen. This result suggests that Clone 1 is closer to their common ancestor. A neighbour-joining tree was computed, and it supported this interpretation. Three-dimensional modelling of the most highly conserved sequence revealed two antiparallel beta strands connected by a helix. These observations suggest that the ancestral gene for the immunoglobulin superfamily may have originated as a primitive sandwich-like fold, possibly used in matrix/cell communications.


Subjects
Evolution, Molecular ; Extracellular Matrix Proteins ; Immunoglobulins/genetics ; Sea Cucumbers/genetics ; Animals ; Contractile Proteins/genetics ; Fibrinogen/genetics ; Models, Genetic ; Protein Folding ; RNA Splicing Factors
10.
DNA Seq ; 9(4): 189-204, 1998.
Article in English | MEDLINE | ID: mdl-10520750

ABSTRACT

Multiple neoplasias including B-cell non-Hodgkin's lymphoma, breast carcinoma, and ovarian carcinoma, have been associated with frequent deletions of the distal region on the long arm of human chromosome 6, suggesting the presence of one or more tumor suppressor gene(s) at this locus. Loss of heterozygosity analysis of breast and ovarian tumors has further restricted the minimal region of loss within 6q27. To further characterize this genomic region for gene content including putative tumor suppressor genes as well as other elements that may contribute to tumorigenesis, a 68940-bp contiguous sequence, encompassing markers D6S193 and D6S297, was generated by random shotgun sequencing of a cosmid, P1, and PAC contig. In addition, exon trapping was performed utilizing a subset of these clones. Sixteen trapped exons, ranging in size from 44 to 399 bp, span this approximately 69-kb region. Many other putative exons have been identified computationally. Further analysis has identified 13 potential promoters and 13 putative polyadenylation sites in the region. Northern analysis identified a transcript mapping within this interval that is expressed in ovarian, breast, and lymphoid-derived tumor cell lines. Consideration of these data, together with the demonstration of several regions of high CpG content, suggests the possibility of several genes at this locus.


Subjects
Chromosomes, Human, Pair 6/genetics ; Genes, Tumor Suppressor ; Alu Elements ; Base Sequence ; Breast Neoplasms/genetics ; Chromosome Deletion ; Chromosome Mapping ; Cloning, Molecular ; DNA/genetics ; Exons ; Female ; GC Rich Sequence ; Genetic Variation ; Genome, Human ; Humans ; Lymphoma, B-Cell/genetics ; Molecular Sequence Data ; Ovarian Neoplasms/genetics ; Promoter Regions, Genetic ; Repetitive Sequences, Nucleic Acid ; Tumor Cells, Cultured
12.
J Mol Evol ; 42(2): 183-93, 1996 Feb.
Article in English | MEDLINE | ID: mdl-8919870

ABSTRACT

The evolutionary relationships of four eukaryotic kingdoms--Animalia, Plantae, Fungi, and Protista--remain unclear. In particular, statistical support for the closeness of animals to fungi rather than to plants is lacking, and a preferred branching order of these and other eukaryotic lineages is still controversial even though molecular sequences from diverse eukaryotic taxa have been analyzed. We report a statistical analysis of 214 sequences of nuclear small-subunit ribosomal RNA (srRNA) gene undertaken to clarify these evolutionary relationships. We have considered the variability of substitution rates and the nonindependence of nucleotide substitution across sites in the srRNA gene in testing alternative hypotheses regarding the branching patterns of eukaryote phylogeny. We find that the rates of evolution among sites in the srRNA sequences vary substantially and are approximately gamma distributed with size and shape parameter equal to 0.76. Our results suggest that (1) the animals and true fungi are indeed closer to each other than to any other "crown" group in the eukaryote tree, (2) red algae are the closest relatives of animals, true fungi, and green plants, and (3) the heterokonts and alveolates probably evolved prior to the divergence of red algae and animal-fungus-green-plant lineages. Furthermore, our analyses indicate that the branching order of the eukaryotic lineages that diverged prior to the evolution of alveolates may be generally difficult to resolve with the srRNA sequence data.


Subjects
Evolution, Molecular ; RNA, Ribosomal/genetics ; Animals ; Eukaryotic Cells ; Humans ; Models, Genetic ; Phylogeny
13.
J Mol Evol ; 35(4): 367-75, 1992 Oct.
Article in English | MEDLINE | ID: mdl-1404422

ABSTRACT

Statistical properties of the ordinary least-squares (OLS), generalized least-squares (GLS), and minimum-evolution (ME) methods of phylogenetic inference were studied by considering the case of four DNA sequences. Analytical study has shown that all three methods are statistically consistent in the sense that as the number of nucleotides examined (m) increases they tend to choose the true tree as long as the evolutionary distances used are unbiased. When evolutionary distances (dij's) are large and sequences under study are not very long, however, the OLS criterion is often biased and may choose an incorrect tree more often than expected under random choice. It is also shown that the variance-covariance matrix of dij's becomes singular as dij's approach zero and thus the GLS may not be applicable when dij's are small. The ME method suffers from neither of these problems, and the ME criterion is statistically unbiased. Computer simulation has shown that the ME method is more efficient in obtaining the true tree than the OLS and GLS methods and that the OLS is more efficient than the GLS when dij's are small, but otherwise the GLS is more efficient.


Subjects
Least-Squares Analysis ; Phylogeny ; Biological Evolution
14.
J Mol Evol ; 38(3): 295-9, 1994 Mar.
Article in English | MEDLINE | ID: mdl-8006996

ABSTRACT

When the number of nucleotides examined is relatively small, the estimators of nucleotide substitutions between DNA sequences often introduce systematic error even if the data used fit the mathematical model underlying the estimation formula. The systematic error of this kind is especially large for models that allow variation in substitution rate among different sites. In the present paper we present a number of formulas that produce virtually bias-free estimates of evolutionary distances for these models.


Subjects
Biological Evolution ; DNA/genetics ; Models, Genetic ; Amino Acid Sequence ; Bias ; Biometry
15.
Mol Biol Evol ; 13(9): 1255-65, 1996 Nov.
Article in English | MEDLINE | ID: mdl-8896378

ABSTRACT

The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.


Subjects
Models, Biological ; Phylogeny ; Base Sequence ; Computer Simulation ; Models, Genetic ; Models, Theoretical
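
The two quantities named in the abstract of record 15, the p-distance and the Jukes-Cantor model, can be illustrated with a minimal sketch. The p-distance is the observed proportion of differing sites; the Jukes-Cantor correction shown here is the standard textbook formula d = -(3/4) ln(1 - 4p/3). The function names are illustrative assumptions and the code is not the authors' software.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


def jukes_cantor_distance(p: float) -> float:
    """Standard Jukes-Cantor correction; defined only for 0 <= p < 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For small p the two values are nearly equal, and the corrected distance grows faster than p as sequences diverge, which is the gap between the simplest estimator and a model-based one that the abstract discusses.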
16.
Pac Symp Biocomput ; : 203-14, 2001.
Article in English | MEDLINE | ID: mdl-11262941

ABSTRACT

We suggest a method implemented in a computer program, immodestly dubbed TSUNAMI, that allows us to compare two homologous protein subfamilies with respect to the distribution of substitution rates along sequences. This study furthers our earlier work on a wavelet model of rate variation (1). The current approach allows sensitive detection of subtle discordances in the selection patterns between two protein subfamilies. In addition to performing fast computation of the maximum posterior probability estimates of the relative substitution rates, the method can select the most appropriate number of wavelet parameters for a particular dataset. TSUNAMI is based on a Markov chain Monte Carlo technique, and appears to be more applicable to larger datasets than is the full likelihood-based approach.


Subjects
Amino Acid Substitution ; Proteins/genetics ; Computer Simulation ; Confidence Intervals ; Data Interpretation, Statistical ; Evolution, Molecular ; Likelihood Functions ; Markov Chains ; Models, Genetic ; Monte Carlo Method ; Software
17.
Mol Biol Evol ; 10(5): 1073-95, 1993 Sep.
Article in English | MEDLINE | ID: mdl-8412650

ABSTRACT

The minimum-evolution (ME) method of phylogenetic inference is based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one. In the past this assumption has been used without mathematical proof. Here we present the theoretical basis of this method by showing that the expectation of the sum of branch length estimates for the true tree is smallest among all possible trees, provided that the evolutionary distances used are statistically unbiased and that the branch lengths are estimated by the ordinary least-squares method. We also present simple mathematical formulas for computing branch length estimates and their standard errors for any unrooted bifurcating tree, with the least-squares approach. As a numerical example, we have analyzed mtDNA sequence data obtained by Vigilant et al. and have found the ME tree for 95 human and 1 chimpanzee (outgroup) sequences. The tree was somewhat different from the neighbor-joining tree constructed by Tamura and Nei, but there was no statistically significant difference between them.


Subjects
Algorithms ; Models, Genetic ; Phylogeny ; Animals ; DNA, Mitochondrial/genetics ; Ethnicity/genetics ; Hominidae/genetics ; Humans ; Racial Groups/genetics
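
The minimum-evolution criterion in the abstract of record 17 can be sketched for the smallest nontrivial case, four sequences. For a topology ((a,b),(c,d)), the ordinary least-squares sum of branch lengths reduces to a closed form in the six pairwise distances, and the ME tree is the topology with the smallest sum. The function names and the nested-list distance matrix are assumptions for illustration; the general formulas for larger trees are in the paper and are not reproduced here.

```python
def quartet_tree_length(d, pair1, pair2):
    """OLS sum of branch lengths for the four-taxon topology (pair1, pair2).

    d is a symmetric 4x4 matrix of pairwise distances (list of lists).
    """
    (a, b), (c, e) = pair1, pair2
    # Distances that cross the interior branch.
    cross = d[a][c] + d[a][e] + d[b][c] + d[b][e]
    # Distances within each pair of neighbors.
    within = d[a][b] + d[c][e]
    return cross / 4 + within / 2


def minimum_evolution_quartet(d):
    """Return (tree length, topology) for the topology with the smallest sum."""
    topologies = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    scored = [(quartet_tree_length(d, p, q), (p, q)) for p, q in topologies]
    return min(scored)
```

When the distances are exactly additive on one topology, that topology's sum equals the true total branch length and the other two sums are larger, which is consistent with the expectation argument summarized in the abstract.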
18.
Comput Appl Biosci ; 10(4): 409-12, 1994 Jul.
Article in English | MEDLINE | ID: mdl-7804873

ABSTRACT

The METREE program package for estimating phylogenetic trees with the minimum evolution method is written in Turbo C 2.0 and is intended to be used on any IBM-compatible personal computer that has a mathematical coprocessor. The package is simple to use and is menu driven. A program for visualizing and printing out the final tree is also included.


Subjects
Biological Evolution ; Genetic Techniques ; Software ; Animals ; Computer Systems ; Phylogeny
19.
Mol Biol Evol ; 12(1): 131-51, 1995 Jan.
Article in English | MEDLINE | ID: mdl-7877488

ABSTRACT

Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.


Subjects
Base Sequence ; Biological Evolution ; DNA/genetics ; Genetic Variation ; Models, Genetic ; Models, Statistical ; Computer Simulation ; DNA/chemistry ; Models, Theoretical
20.
Mol Biol Evol ; 12(5): 823-33, 1995 Sep.
Article in English | MEDLINE | ID: mdl-7476128

ABSTRACT

To estimate approximate divergence times of species or species groups with molecular data, we have developed a method of constructing a linearized tree under the assumption of a molecular clock. We present two tests of the molecular clock for a given topology: two-cluster test and branch-length test. The two-cluster test examines the hypothesis of the molecular clock for the two lineages created by an interior node of the tree, whereas the branch-length test examines the deviation of the branch length between the tree root and a tip from the average length. Sequences evolving excessively fast or slow at a high significance level may be eliminated. A linearized tree will then be constructed for a given topology for the remaining sequences under the assumption of rate constancy. We have used these methods to analyze hominoid mitochondrial DNA and drosophilid Adh gene sequences.


Subjects
Biological Evolution ; Decision Trees ; Mathematics ; Models, Molecular ; Phylogeny ; Alcohol Dehydrogenase/genetics ; Animals ; DNA, Mitochondrial/genetics ; Drosophila/genetics ; Genes, Insect ; Gorilla gorilla/genetics ; Hominidae/genetics ; Humans ; Pan troglodytes/genetics ; Pongo pygmaeus/genetics