1.
Nucleic Acids Res ; 31(13): 3507-9, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824355

ABSTRACT

SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for rapid alignment and gene prediction in user-submitted sequences. Along with annotations and alignments for the submitted sequences, users obtain a list of predicted conserved non-coding sequences (and their associated alignments). The web site also links to whole-genome annotations of the human, mouse and rat genomes produced with the SLAM program. The server can be accessed at http://bio.math.berkeley.edu/slam.


Subjects
Genomics/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Amino Acid Sequence; Animals; Base Sequence; Conserved Sequence; Gene Components; Humans; Internet; Markov Chains; Mice; Peptides/chemistry; RNA, Messenger/chemistry; RNA, Untranslated/chemistry; Rats
2.
J Comput Biol ; 10(3-4): 509-20, 2003.
Article in English | MEDLINE | ID: mdl-12935341

ABSTRACT

The application of Needleman-Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product of the lengths of sequences, and the difficulty in obtaining suitable parameters that produce meaningful alignments. The running time problem is often corrected by reducing the search space, using techniques such as banding, or chaining of high-scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model, which Needleman-Wunsch is equivalent to, does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated nonconserved segments. We present a solution to the problem of designing efficient search spaces for pair hidden Markov models that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem, for which we obtain a 2-approximation algorithm, and that is based on the construction of Manhattan networks, which are close relatives of Steiner trees. We describe the underlying theory and show how our methods can be applied to alignment of DNA sequences in practice, successfully reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.
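
To illustrate the banding technique named in the abstract, here is a minimal sketch of a Needleman-Wunsch score restricted to a diagonal band. It is not the Manhattan-network construction the paper proposes. The function name `banded_nw_score`, the linear gap penalty, and the scoring values are all assumptions chosen for illustration.

```python
def banded_nw_score(a, b, band, match=1, mismatch=-1, gap=-2):
    """Global alignment score using only cells (i, j) with |i - j| <= band.

    Full Needleman-Wunsch fills len(a) * len(b) cells; the band cuts this
    to roughly len(a) * (2 * band + 1) cells.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    neg_inf = float("-inf")

    # Row 0: only leading gaps in `a`, limited to the band.
    prev = {j: j * gap for j in range(0, min(m, band) + 1)}

    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap  # only leading gaps in `b`
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev.get(j - 1, neg_inf) + sub   # align a[i-1] with b[j-1]
            up = prev.get(j, neg_inf) + gap         # gap in `b`
            left = cur.get(j - 1, neg_inf) + gap    # gap in `a`
            cur[j] = max(diag, up, left)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(banded_nw_score("GATTACA", "GATTACA", band=1))
    print(banded_nw_score("AC", "AGC", band=1))
```

Cells outside the band are treated as unreachable, so the result equals the unrestricted optimum only when an optimal alignment stays inside the band.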


Subjects
Computational Biology/methods; Data Interpretation, Statistical; Sequence Alignment/methods; Algorithms; Animals; CD4 Antigens/genetics; Humans; Markov Chains; Mice
3.
J Comput Biol ; 9(2): 389-99, 2002.
Article in English | MEDLINE | ID: mdl-12015888

ABSTRACT

Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper, we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding and describe applications to DNA-cDNA and DNA-protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.
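
As background for the pair HMMs that the GPHMM extends, the following is a minimal sketch of Viterbi decoding for a textbook three-state pair HMM (match, gap in y, gap in x). It is not the GPHMM itself and models no exon lengths. The function name and the parameters `delta`, `eps`, and `p_match` are illustrative assumptions, and the end-state transition is omitted for brevity.

```python
import math


def pair_hmm_viterbi(x, y, delta=0.1, eps=0.3, p_match=0.9):
    """Log-probability of the most likely alignment path for DNA x and y.

    States: M emits an aligned pair, X emits a base of x against a gap,
    Y emits a base of y against a gap. All values are illustrative.
    """
    neg_inf = float("-inf")
    n, m = len(x), len(y)

    def log_pair(a, b):
        # 4 identical pairs share p_match; 12 mismatched pairs share the rest.
        return math.log(p_match / 4) if a == b else math.log((1 - p_match) / 12)

    log_q = math.log(0.25)               # uniform single-base emission
    l_mm = math.log(1 - 2 * delta)       # M -> M
    l_mg = math.log(delta)               # M -> X or M -> Y (gap open)
    l_gg = math.log(eps)                 # X -> X or Y -> Y (gap extend)
    l_gm = math.log(1 - eps)             # X -> M or Y -> M (gap close)

    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    X = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Y = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0  # start in the match state with probability 1

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i][j] = log_pair(x[i - 1], y[j - 1]) + max(
                    l_mm + M[i - 1][j - 1],
                    l_gm + X[i - 1][j - 1],
                    l_gm + Y[i - 1][j - 1],
                )
            if i > 0:
                X[i][j] = log_q + max(l_mg + M[i - 1][j], l_gg + X[i - 1][j])
            if j > 0:
                Y[i][j] = log_q + max(l_mg + M[i][j - 1], l_gg + Y[i][j - 1])

    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(pair_hmm_viterbi("ACGT", "ACGT"))
    print(pair_hmm_viterbi("ACGT", "TGCA"))
```

Working in log space avoids numerical underflow on long sequences; the table fill is still proportional to the product of the two sequence lengths.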


Subjects
Markov Chains; Sequence Alignment/statistics & numerical data; Algorithms; Computational Biology; DNA/genetics; Models, Statistical; Proteins/genetics
4.
Genome Res ; 13(3): 496-502, 2003 Mar.
Article in English | MEDLINE | ID: mdl-12618381

ABSTRACT

Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.


Subjects
Genes/genetics; Markov Chains; Sequence Alignment/statistics & numerical data; Software; Animals; Computational Biology/methods; Computational Biology/statistics & numerical data; Conserved Sequence/genetics; DNA/genetics; DNA, Protozoan/genetics; Genes, Protozoan/genetics; Humans; Mice; Plasmodium falciparum/genetics; Plasmodium vivax/genetics; Sequence Alignment/methods; Software Design; Species Specificity
5.
Glycobiology ; 14(6): 521-7, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15044386

ABSTRACT

Mucins are large glycoproteins characterized by mucin domains that show little sequence conservation and are rich in the amino acids Ser, Thr, and Pro. To effectively predict mucins from genomic and protein sequences obtained from genome projects, we developed a strategy based on the amino acid compositional bias characteristic of the mucin domains. This strategy is combined with an analysis of other features commonly found in mucins. Our method has now been used to predict mucins in the puffer fish Fugu rubripes that were previously not identified or annotated. At least three gel-forming mucins were found with the same general domain structure as the human MUC2 mucin. In addition one transmembrane mucin was identified with SEA and EGF domains as found in the mammalian transmembrane mucins. These results suggest that the number of gel-forming mucins has been conserved during evolution of the vertebrates, whereas the family of transmembrane mucins has been markedly expanded in the higher vertebrates.
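
The compositional-bias idea described above can be sketched as a sliding-window scan for regions rich in Ser, Thr, and Pro. This is only an illustration of the general principle, not the authors' published procedure. The function names, window size, and threshold are hypothetical values chosen for the example.

```python
def stp_fraction(segment):
    """Fraction of residues in `segment` that are Ser (S), Thr (T), or Pro (P)."""
    return sum(segment.count(residue) for residue in "STP") / len(segment)


def stp_rich_regions(protein, window=20, threshold=0.5):
    """Return merged half-open (start, end) spans covered by STP-rich windows.

    A window qualifies when its S/T/P fraction is at least `threshold`.
    Overlapping or adjacent qualifying windows are merged into one span.
    """
    protein = protein.upper()
    regions = []
    for start in range(0, len(protein) - window + 1):
        if stp_fraction(protein[start:start + window]) >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous span
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    toy = "M" * 30 + "STP" * 10 + "K" * 30  # synthetic sequence, not real data
    print(stp_rich_regions(toy))
```

Because mucin domains show little sequence conservation, a composition-based scan of this kind can flag candidates that similarity searches miss; any hit would still need the additional feature analysis the abstract mentions.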


Subjects
Biopolymers/chemistry; Computational Biology; Membrane Proteins/chemistry; Mucins/chemistry; Animals; Takifugu
6.
Genome Res ; 14(4): 661-4, 2004 Apr.
Article in English | MEDLINE | ID: mdl-15060007

ABSTRACT

We describe a new method for simultaneously identifying novel homologous genes with identical structure in the human, mouse, and rat genomes by combining pairwise predictions made with the SLAM gene-finding program. Using this method, we found 3698 gene triples in the human, mouse, and rat genomes which are predicted with exactly the same gene structure. We show, both computationally and experimentally, that the introns of these triples are predicted accurately as compared with the introns of other ab initio gene prediction sets. Computationally, we compared the introns of these gene triples, as well as those from other ab initio gene finders, with known intron annotations. We show that a unique property of SLAM, namely that it predicts gene structures simultaneously in two organisms, is key to producing sets of predictions that are highly accurate in intron structure when combined with other programs. Experimentally, we performed reverse transcription-polymerase chain reaction (RT-PCR) in both the human and rat to test the exon pairs flanking introns from a subset of the gene triples for which the human gene had not been previously identified. By performing RT-PCR on orthologous introns in both the human and rat genomes, we additionally explore the validity of using RT-PCR as a method for confirming gene predictions.
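
One way to picture combining pairwise predictions into triples is a join on the shared human gene structure. The sketch below is a guess at the general shape of such a step, not the procedure used in the paper. The data layout (tuples of exon coordinates paired with a gene identifier), the function name, and the identifiers are all hypothetical.

```python
def gene_triples(human_mouse, human_rat):
    """Join two sets of pairwise predictions on an identical human structure.

    Each input is a list of (human_exons, partner_gene_id) pairs, where
    human_exons is a tuple of (start, end) exon coordinates. A triple is
    reported only when both pairwise runs predict exactly the same human
    exon coordinates.
    """
    rat_by_structure = {exons: rat_id for exons, rat_id in human_rat}
    return [
        (exons, mouse_id, rat_by_structure[exons])
        for exons, mouse_id in human_mouse
        if exons in rat_by_structure
    ]


if __name__ == "__main__":
    # Made-up coordinates and identifiers, for illustration only.
    hm = [(((100, 200), (300, 420)), "mouse_g1"), (((900, 990),), "mouse_g2")]
    hr = [(((100, 200), (300, 420)), "rat_g7"), (((900, 995),), "rat_g8")]
    print(gene_triples(hm, hr))
```

In this toy input the second gene is dropped because the two pairwise runs disagree on an exon boundary, which mirrors the abstract's requirement of exactly matching gene structure.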


Subjects
Genes/genetics; Animals; Chromosome Mapping/methods; Computational Biology/methods; Databases, Genetic; Exons/genetics; Genome; Genome, Human; Humans; Introns/genetics; Mice; Predictive Value of Tests; Rats; Sequence Homology, Nucleic Acid; Software
7.
Nature ; 420(6915): 520-62, 2002 Dec 05.
Article in English | MEDLINE | ID: mdl-12466850

ABSTRACT

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.


Subjects
Chromosomes, Mammalian/genetics; Evolution, Molecular; Genome; Mice/genetics; Physical Chromosome Mapping; Animals; Base Composition; Conserved Sequence/genetics; CpG Islands/genetics; Gene Expression Regulation; Genes/genetics; Genetic Variation/genetics; Genome, Human; Genomics; Humans; Mice/classification; Mice, Knockout; Mice, Transgenic; Models, Animal; Multigene Family/genetics; Mutagenesis; Neoplasms/genetics; Proteome/genetics; Pseudogenes/genetics; Quantitative Trait Loci/genetics; RNA, Untranslated/genetics; Repetitive Sequences, Nucleic Acid/genetics; Selection, Genetic; Sequence Analysis, DNA; Sex Chromosomes/genetics; Species Specificity; Synteny