Results 1 - 7 of 7
1.
Genome Res ; 14(4): 661-4, 2004 Apr.
Article in English | MEDLINE | ID: mdl-15060007

ABSTRACT

We describe a new method for simultaneously identifying novel homologous genes with identical structure in the human, mouse, and rat genomes by combining pairwise predictions made with the SLAM gene-finding program. Using this method, we found 3698 gene triples in the human, mouse, and rat genomes which are predicted with exactly the same gene structure. We show, both computationally and experimentally, that the introns of these triples are predicted accurately as compared with the introns of other ab initio gene prediction sets. Computationally, we compared the introns of these gene triples, as well as those from other ab initio gene finders, with known intron annotations. We show that a unique property of SLAM, namely that it predicts gene structures simultaneously in two organisms, is key to producing sets of predictions that are highly accurate in intron structure when combined with other programs. Experimentally, we performed reverse transcription-polymerase chain reaction (RT-PCR) in both the human and rat to test the exon pairs flanking introns from a subset of the gene triples for which the human gene had not been previously identified. By performing RT-PCR on orthologous introns in both the human and rat genomes, we additionally explore the validity of using RT-PCR as a method for confirming gene predictions.
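
The idea of combining pairwise predictions into gene triples can be illustrated with a minimal sketch. The abstract does not define how the predictions are joined or what "exactly the same gene structure" means in coordinates, so everything here is an assumption: genes are tuples of hypothetical `(start, end)` exons, the human-mouse and human-rat predictions are joined on an identical human gene, and "same structure" is taken to mean identical exon lengths in all three species. It is not the published procedure.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]              # (start, end), end exclusive; hypothetical format
Gene = Tuple[Exon, ...]             # ordered exons of one predicted gene
PairPrediction = Tuple[Gene, Gene]  # (human gene, gene in the other species)


def exon_lengths(gene: Gene) -> Tuple[int, ...]:
    """Length of each exon, in order."""
    return tuple(end - start for start, end in gene)


def gene_triples(
    human_mouse: List[PairPrediction],
    human_rat: List[PairPrediction],
) -> List[Tuple[Gene, Gene, Gene]]:
    """Join two pairwise prediction sets on the human gene.

    Assumption: a triple is kept only when the human, mouse, and rat
    genes have the same number of exons with the same lengths.
    """
    rat_by_human: Dict[Gene, Gene] = {human: rat for human, rat in human_rat}
    triples = []
    for human, mouse in human_mouse:
        rat = rat_by_human.get(human)
        if rat is None:
            continue  # human gene was not predicted in the human-rat run
        if exon_lengths(human) == exon_lengths(mouse) == exon_lengths(rat):
            triples.append((human, mouse, rat))
    return triples
```

Joining on the shared species keeps the lookup linear in the number of predictions; a real pipeline would also have to handle overlapping or partially matching predictions, which this sketch ignores.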


Subject(s)
Genes/genetics; Animals; Chromosome Mapping/methods; Computational Biology/methods; Databases, Genetic; Exons/genetics; Genome; Genome, Human; Humans; Introns/genetics; Mice; Predictive Value of Tests; Rats; Sequence Homology, Nucleic Acid; Software
2.
Glycobiology ; 14(6): 521-7, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15044386

ABSTRACT

Mucins are large glycoproteins characterized by mucin domains that show little sequence conservation and are rich in the amino acids Ser, Thr, and Pro. To effectively predict mucins from genomic and protein sequences obtained from genome projects, we developed a strategy based on the amino acid compositional bias characteristic of the mucin domains. This strategy is combined with an analysis of other features commonly found in mucins. Our method has now been used to predict mucins in the puffer fish Fugu rubripes that were previously not identified or annotated. At least three gel-forming mucins were found with the same general domain structure as the human MUC2 mucin. In addition one transmembrane mucin was identified with SEA and EGF domains as found in the mammalian transmembrane mucins. These results suggest that the number of gel-forming mucins has been conserved during evolution of the vertebrates, whereas the family of transmembrane mucins has been markedly expanded in the higher vertebrates.
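
A compositional-bias scan of the kind described above can be sketched as a sliding window over a protein sequence. The window length, the threshold, and the function name `pts_rich_windows` are illustrative assumptions; the abstract does not give the parameters or the additional features the authors combined with the composition test.

```python
def pts_rich_windows(seq, window=100, threshold=0.4):
    """Return (start, end, fraction) for every window whose combined
    Ser/Thr/Pro fraction is at least `threshold`.

    `window` and `threshold` are hypothetical values chosen for illustration.
    """
    seq = seq.upper()
    hits = []
    if window <= 0 or len(seq) < window:
        return hits
    # Count S/T/P residues in the first window, then update incrementally.
    count = sum(1 for aa in seq[:window] if aa in "STP")
    for start in range(len(seq) - window + 1):
        if start > 0:
            count -= seq[start - 1] in "STP"           # residue leaving the window
            count += seq[start + window - 1] in "STP"  # residue entering the window
        fraction = count / window
        if fraction >= threshold:
            hits.append((start, start + window, fraction))
    return hits
```

Overlapping hits would normally be merged into candidate domains before any further feature analysis; that step is omitted here.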


Subject(s)
Biopolymers/chemistry; Computational Biology; Membrane Proteins/chemistry; Mucins/chemistry; Animals; Takifugu
3.
J Comput Biol ; 10(3-4): 509-20, 2003.
Article in English | MEDLINE | ID: mdl-12935341

ABSTRACT

The application of Needleman-Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product of the lengths of sequences, and the difficulty in obtaining suitable parameters that produce meaningful alignments. The running time problem is often corrected by reducing the search space, using techniques such as banding, or chaining of high-scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model, which Needleman-Wunsch is equivalent to, does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated nonconserved segments. We present a solution to the problem of designing efficient search spaces for pair hidden Markov models that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem, for which we obtain a 2-approximation algorithm, and that is based on the construction of Manhattan networks, which are close relatives of Steiner trees. We describe the underlying theory and show how our methods can be applied to alignment of DNA sequences in practice, successfully reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.
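
The running-time issue and the banding remedy mentioned above can be made concrete with a small sketch. It computes a Needleman-Wunsch global alignment score and counts how many dynamic-programming cells are filled, with an optional fixed-width band around the main diagonal. The scoring values and the function name `nw_score` are assumptions for illustration, and a fixed band is a much simpler search-space restriction than the Manhattan-network construction the paper proposes.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2, band=None):
    """Global alignment score with linear gap cost.

    If `band` is given, only cells with |i - j| <= band are filled.
    Returns (score, number_of_cells_filled).
    """
    n, m = len(a), len(b)
    if band is not None and abs(n - m) > band:
        raise ValueError("band is too narrow to reach the final cell")
    neg_inf = float("-inf")

    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    hi0 = m if band is None else min(m, band)
    prev = [j * gap if j <= hi0 else neg_inf for j in range(m + 1)]
    cells = hi0 + 1

    for i in range(1, n + 1):
        cur = [neg_inf] * (m + 1)
        lo = 0 if band is None else max(0, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            cells += 1
            if j == 0:
                cur[j] = i * gap
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Cells outside the band hold -inf and never win the max.
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m], cells
```

Without a band the cell count is (n + 1)(m + 1), the product-of-lengths scaling noted in the abstract; with a band of width w it grows roughly as n(2w + 1), at the price of missing alignments that stray outside the band.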


Subject(s)
Computational Biology/methods; Data Interpretation, Statistical; Sequence Alignment/methods; Algorithms; Animals; CD4 Antigens/genetics; Humans; Markov Chains; Mice
4.
Nucleic Acids Res ; 31(13): 3507-9, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824355

ABSTRACT

SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for rapid alignment and gene prediction in user submitted sequences. Along with annotations and alignments for the submitted sequences, users obtain a list of predicted conserved non-coding sequences (and their associated alignments). The web site also links to whole genome annotations of the human, mouse and rat genomes produced with the SLAM program. The server can be accessed at http://bio.math.berkeley.edu/slam.


Subject(s)
Genomics/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Amino Acid Sequence; Animals; Base Sequence; Conserved Sequence; Gene Components; Humans; Internet; Markov Chains; Mice; Peptides/chemistry; RNA, Messenger/chemistry; RNA, Untranslated/chemistry; Rats
5.
Genome Res ; 13(3): 496-502, 2003 Mar.
Article in English | MEDLINE | ID: mdl-12618381

ABSTRACT

Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.


Subject(s)
Genes/genetics; Markov Chains; Sequence Alignment/statistics & numerical data; Software; Animals; Computational Biology/methods; Computational Biology/statistics & numerical data; Conserved Sequence/genetics; DNA/genetics; DNA, Protozoan/genetics; Genes, Protozoan/genetics; Humans; Mice; Plasmodium falciparum/genetics; Plasmodium vivax/genetics; Sequence Alignment/methods; Software Design; Species Specificity
6.
Nature ; 420(6915): 520-62, 2002 Dec 05.
Article in English | MEDLINE | ID: mdl-12466850

ABSTRACT

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.


Subject(s)
Chromosomes, Mammalian/genetics; Evolution, Molecular; Genome; Mice/genetics; Physical Chromosome Mapping; Animals; Base Composition; Conserved Sequence/genetics; CpG Islands/genetics; Gene Expression Regulation; Genes/genetics; Genetic Variation/genetics; Genome, Human; Genomics; Humans; Mice/classification; Mice, Knockout; Mice, Transgenic; Models, Animal; Multigene Family/genetics; Mutagenesis; Neoplasms/genetics; Proteome/genetics; Pseudogenes/genetics; Quantitative Trait Loci/genetics; RNA, Untranslated/genetics; Repetitive Sequences, Nucleic Acid/genetics; Selection, Genetic; Sequence Analysis, DNA; Sex Chromosomes/genetics; Species Specificity; Synteny
7.
J Comput Biol ; 9(2): 389-99, 2002.
Article in English | MEDLINE | ID: mdl-12015888

ABSTRACT

Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper, we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding and describe applications to DNA-cDNA and DNA-protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.
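
For orientation, the pair HMM component that GPHMMs extend can be sketched as a three-state Viterbi recursion: a match state M that emits an aligned pair, and two gap states X and Y that each emit a single symbol. This is a plain textbook-style pair HMM, not the generalized model of the paper (it has no explicit state durations), and all parameter values and the function name `pair_hmm_viterbi` are assumptions chosen for illustration. The end-state transition is ignored for simplicity.

```python
import math


def pair_hmm_viterbi(x, y, delta=0.1, epsilon=0.3,
                     p_same=0.2, p_diff=0.2 / 12, q=0.25):
    """Log-probability of the best alignment path under a 3-state pair HMM.

    delta   : probability of opening a gap from M (hypothetical value)
    epsilon : probability of extending a gap (hypothetical value)
    p_same / p_diff : pair emission for identical / different DNA symbols
    q       : single-symbol emission in the gap states
    """
    neg_inf = float("-inf")
    n, m = len(x), len(y)
    log_mm = math.log(1 - 2 * delta)   # M -> M
    log_gm = math.log(1 - epsilon)     # X -> M or Y -> M
    log_open = math.log(delta)         # M -> X or M -> Y
    log_ext = math.log(epsilon)        # X -> X or Y -> Y
    log_q = math.log(q)

    v_m = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    v_x = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    v_y = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    v_m[0][0] = 0.0  # start in M with probability 1

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = math.log(p_same if x[i - 1] == y[j - 1] else p_diff)
                v_m[i][j] = emit + max(log_mm + v_m[i - 1][j - 1],
                                       log_gm + v_x[i - 1][j - 1],
                                       log_gm + v_y[i - 1][j - 1])
            if i > 0:  # x[i-1] aligned to a gap
                v_x[i][j] = log_q + max(log_open + v_m[i - 1][j],
                                        log_ext + v_x[i - 1][j])
            if j > 0:  # y[j-1] aligned to a gap
                v_y[i][j] = log_q + max(log_open + v_m[i][j - 1],
                                        log_ext + v_y[i][j - 1])
    return max(v_m[n][m], v_x[n][m], v_y[n][m])
```

A generalized pair HMM would replace the single-symbol emissions with variable-length segment emissions per state (for example, whole exons), which is what lets it model exon lengths explicitly.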


Subject(s)
Markov Chains; Sequence Alignment/statistics & numerical data; Algorithms; Computational Biology; DNA/genetics; Models, Statistical; Proteins/genetics