Results 1 - 8 of 8
1.
Genome Res ; 19(7): 1316-23, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19498102

ABSTRACT

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.


Subject(s)
Consensus Sequence, Genome, Open Reading Frames/genetics, Animals, Humans, Mice, Sequence Alignment
2.
Genome Res ; 14(5): 976-87, 2004 May.
Article in English | MEDLINE | ID: mdl-15123595

ABSTRACT

We describe a novel algorithm for deriving the minimal set of nonredundant transcripts compatible with the splicing structure of a set of ESTs mapped on a genome. Sets of ESTs with compatible splicing are represented by a special type of graph. We describe the algorithms for building the graphs and for deriving the minimal set of transcripts from the graphs that are compatible with the evidence. These algorithms are part of the Ensembl automatic gene annotation system, and its results, using ESTs, are provided at www.ensembl.org as ESTgenes for the mosquito, Caenorhabditis briggsae, C. elegans, zebrafish, human, mouse, and rat genomes. Here we also report on the results of this method applied to the human and mouse genomes.


Subject(s)
Alternative Splicing/genetics, Expressed Sequence Tags, Software, Animals, Caenorhabditis/genetics, Caenorhabditis elegans/genetics, Computational Biology, Culicidae/genetics, Helminth DNA/genetics, Genes, Helminth Genes, Insect Genes, Humans, Mice, Predictive Value of Tests, Rats, Reproducibility of Results, Genetic Transcription, Zebrafish/genetics
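
As a rough illustration of the redundancy idea in the abstract of record 2, the sketch below keeps only those splicing patterns that are not already contained in a longer one. It is a simplification and not the Ensembl implementation: the published method builds a graph of ESTs with compatible splicing, whereas this sketch assumes each EST has already been reduced to its ordered chain of introns and treats an EST as redundant when its chain is a contiguous sub-chain of a retained one. The function names and the coordinate values are hypothetical.

```python
from typing import List, Sequence, Tuple

Intron = Tuple[int, int]          # (start, end) genomic coordinates of one intron
Chain = Tuple[Intron, ...]        # ordered introns of one mapped EST


def is_contiguous_subchain(short: Chain, long: Chain) -> bool:
    """True if `short` appears as an uninterrupted run of introns inside `long`."""
    n, m = len(short), len(long)
    return any(long[i:i + n] == short for i in range(m - n + 1))


def nonredundant_chains(chains: Sequence[Sequence[Intron]]) -> List[Chain]:
    """Return the intron chains that are not contained in any longer chain.

    Assumption (not from the source): containment of intron chains is used
    as a stand-in for the paper's graph-based notion of compatibility.
    """
    # Deduplicate, then visit longest chains first so that shorter,
    # contained chains are compared against everything already kept.
    unique = sorted({tuple(c) for c in chains}, key=lambda c: (-len(c), c))
    kept: List[Chain] = []
    for chain in unique:
        if not any(is_contiguous_subchain(chain, k) for k in kept):
            kept.append(chain)
    return kept


if __name__ == "__main__":
    ests = [
        [(100, 200), (300, 400)],
        [(100, 200)],
        [(300, 400), (500, 600)],
        [(100, 200), (300, 400), (500, 600)],
        [(100, 250)],             # alternative splice site: not contained
    ]
    for chain in nonredundant_chains(ests):
        print(chain)
```

With the hypothetical input above, two chains survive: the three-intron chain, which absorbs the three shorter compatible ESTs, and the single-intron chain with the alternative splice site.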
3.
Genome Res ; 14(5): 942-50, 2004 May.
Article in English | MEDLINE | ID: mdl-15123590

ABSTRACT

As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences. The gene-building system rests on top of the core Ensembl (MySQL) database schema and Perl Application Programming Interface (API), and the data generated are accessible through the Ensembl genome browser (http://www.ensembl.org). To date, the Ensembl predicted gene sets are available for the A. gambiae, C. briggsae, zebrafish, mouse, rat, and human genomes and have been heavily relied upon in the publication of the human, mouse, rat, and A. gambiae genome sequence analysis. Here we describe in detail the gene-building system and the algorithms involved. All code and data are freely available from http://www.ensembl.org.


Subject(s)
Automation, Computational Biology/methods, Genes/physiology, Animals, Anopheles/genetics, Caenorhabditis/genetics, DNA/genetics, Helminth DNA/genetics, Expressed Sequence Tags, Gene Dosage, Helminth Genes/physiology, Insect Genes/physiology, Genome, Human Genome, Helminth Proteins/genetics, Humans, Insect Proteins/genetics, Mice, Predictive Value of Tests, Proteins/genetics, Pseudogenes/genetics, Rats, Sequence Alignment/methods, Amino Acid Sequence Homology, Software, Tandem Repeat Sequences/genetics, Untranslated Regions/genetics
4.
Genome Res ; 14(5): 934-41, 2004 May.
Article in English | MEDLINE | ID: mdl-15123589

ABSTRACT

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.


Subject(s)
Computational Biology/methods, Base Sequence/genetics, DNA/genetics, Genetic Databases/standards, Programming Languages, Proteins/classification, Software, Software Design
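
The two-part design described in the abstract of record 4 (wrapper modules sharing a common interface, plus a rule-driven job manager) can be sketched as follows. This is a minimal Python analogy under stated assumptions, not the Ensembl Perl API: the class names, the in-memory dictionary standing in for the relational database, and the `run_with_rules` helper are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Runnable(ABC):
    """Common interface: fetch a sequence, run an analysis, store the result."""

    name: str = "unnamed"

    def __init__(self, db: Dict[str, dict]) -> None:
        # `db` is a plain dict standing in for a relational database.
        self.db = db

    def execute(self, seq_id: str) -> None:
        sequence = self.db["sequences"][seq_id]          # retrieve input
        result = self.run(sequence)                      # run the analysis
        self.db["results"].setdefault(seq_id, {})[self.name] = result  # write back

    @abstractmethod
    def run(self, sequence: str) -> Any:
        """Analysis-specific logic; the only method a new wrapper must supply."""


class Length(Runnable):
    name = "length"

    def run(self, sequence: str) -> int:
        return len(sequence)


class GCContent(Runnable):
    name = "gc_content"

    def run(self, sequence: str) -> float:
        return sum(base in "GC" for base in sequence.upper()) / len(sequence)


def run_with_rules(analyses: Dict[str, Runnable],
                   rules: Dict[str, List[str]],
                   seq_id: str) -> List[str]:
    """Run each analysis once all of its prerequisite analyses have finished.

    `rules` maps an analysis name to the names it depends on. Returns the
    order in which analyses were executed.
    """
    done: List[str] = []
    pending = dict(analyses)
    while pending:
        ready = [n for n in pending if all(p in done for p in rules.get(n, []))]
        if not ready:
            raise ValueError("rules can never be satisfied: " + ", ".join(pending))
        for n in ready:
            pending.pop(n).execute(seq_id)
            done.append(n)
    return done
```

In this sketch a new analysis only needs to subclass `Runnable` and implement `run`, which mirrors the abstract's point that a shared interface simplifies writing new wrappers. A real installation would dispatch the ready jobs to a compute farm rather than run them in a loop.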
5.
Brain ; 125(Pt 6): 1337-47, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12023322

ABSTRACT

Linkage analysis in multiplex families has provisionally identified several genomic regions where genes influencing susceptibility to multiple sclerosis are likely to be located. It is anticipated that association mapping will provide a higher degree of resolution, but this more powerful approach is limited by the substantial genotyping effort required. Here, we describe the first use of DNA pooling to screen the whole genome for association in multiple sclerosis based on a 0.5 cM map of microsatellite markers and using four DNA pools derived from cases (n = 216), controls (n = 219) and trio families (n = 745 affected individuals and their 1490 parents). The 10 markers showing the greatest evidence for association with multiple sclerosis that emerge from this analysis include three from the HLA region on chromosome 6p (D6S1615, D6S2444 and TNFa), providing a positive control for the method, four from regions previously identified by linkage analysis in UK multiplex families (two mapping to chromosome 17q GCT6E11 and D17S1535; one to chromosome 1p GGAA30B06; and one to 19q D19S585), and three from novel sites with respect to linkage analysis (D1S1590 at 1q; D2S2739 at 2p; and D4S416 at 4q). Our results thus provide further supporting evidence for the candidature of 6p, 17q, 19q and 1p as regions encoding susceptibility genes for multiple sclerosis. The protocol used in this UK-based study is now being extended to 18 additional sites in Europe in order to search for susceptibility genes shared between populations of common ancestry, as well as those that exert ethnically more restricted effects.


Subject(s)
Genetic Predisposition to Disease/genetics, Human Genome, Linkage Disequilibrium/genetics, Multiple Sclerosis/genetics, Adult, Chi-Square Distribution, Female, Genetic Markers/genetics, Genetic Testing/methods, Humans, Male, Microsatellite Repeats/genetics, Pilot Projects
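
To make the case-control comparison in the abstract of record 5 concrete, the sketch below computes a Pearson chi-square statistic for one marker from allele frequencies estimated in a case pool and a control pool. This is an assumption-laden illustration and not the authors' analysis: the abstract does not state which statistic was used, the function name is hypothetical, and pooled frequencies are simply converted to estimated allele counts (two chromosomes per individual). Only the pool sizes (216 cases, 219 controls) come from the abstract.

```python
from typing import Sequence, Tuple


def pooled_chi_square(case_freqs: Sequence[float],
                      control_freqs: Sequence[float],
                      n_cases: int,
                      n_controls: int) -> Tuple[float, int]:
    """Pearson chi-square on a 2 x k table of estimated allele counts.

    Returns (statistic, degrees_of_freedom). Frequencies are per-allele
    estimates from each pool and are assumed to sum to 1 within a pool.
    """
    # Each individual contributes two chromosomes to the pool.
    case_counts = [f * 2 * n_cases for f in case_freqs]
    control_counts = [f * 2 * n_controls for f in control_freqs]
    case_total = sum(case_counts)
    control_total = sum(control_counts)
    grand_total = case_total + control_total

    statistic = 0.0
    informative_alleles = 0
    for c, d in zip(case_counts, control_counts):
        column_total = c + d
        if column_total == 0:
            continue                      # allele absent from both pools
        informative_alleles += 1
        for observed, row_total in ((c, case_total), (d, control_total)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, max(informative_alleles - 1, 0)


if __name__ == "__main__":
    # Hypothetical three-allele microsatellite marker.
    stat, dof = pooled_chi_square([0.50, 0.30, 0.20], [0.40, 0.35, 0.25], 216, 219)
    print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

Because pooled frequencies are estimates with measurement error, a statistic computed this way is better suited to ranking markers for follow-up than to formal significance testing.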
6.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subject(s)
Computational Biology/trends
7.
Science ; 298(5591): 129-49, 2002 Oct 04.
Article in English | MEDLINE | ID: mdl-12364791

ABSTRACT

Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.


Subject(s)
Anopheles/genetics, Insect Genes, Genome, DNA Sequence Analysis, Animals, Anopheles/classification, Anopheles/parasitology, Anopheles/physiology, Biological Evolution, Blood, Chromosome Inversion, Bacterial Artificial Chromosomes, Computational Biology, DNA Transposable Elements, Digestion, Drosophila melanogaster/genetics, Enzymes/chemistry, Enzymes/genetics, Enzymes/metabolism, Expressed Sequence Tags, Feeding Behavior, Gene Expression Regulation, Genetic Variation, Haplotypes, Humans, Insect Proteins/chemistry, Insect Proteins/genetics, Insect Proteins/physiology, Insect Vectors/genetics, Insect Vectors/parasitology, Insect Vectors/physiology, Falciparum Malaria/transmission, Molecular Sequence Data, Mosquito Control, Physical Chromosome Mapping, Plasmodium falciparum/growth & development, Single Nucleotide Polymorphism, Proteome, Species Specificity, Transcription Factors/chemistry, Transcription Factors/genetics, Transcription Factors/physiology
8.
Nature ; 420(6915): 520-62, 2002 Dec 05.
Article in English | MEDLINE | ID: mdl-12466850

ABSTRACT

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.


Subject(s)
Mammalian Chromosomes/genetics, Molecular Evolution, Genome, Mice/genetics, Physical Chromosome Mapping, Animals, Base Composition, Conserved Sequence/genetics, CpG Islands/genetics, Gene Expression Regulation, Genes/genetics, Genetic Variation/genetics, Human Genome, Genomics, Humans, Mice/classification, Knockout Mice, Transgenic Mice, Animal Models, Multigene Family/genetics, Mutagenesis, Neoplasms/genetics, Proteome/genetics, Pseudogenes/genetics, Quantitative Trait Loci/genetics, Untranslated RNA/genetics, Repetitive Nucleic Acid Sequences/genetics, Genetic Selection, DNA Sequence Analysis, Sex Chromosomes/genetics, Species Specificity, Synteny