1.
Genome Res ; 19(7): 1316-23, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19498102

ABSTRACT

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.
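The core idea of tracking identically annotated coding regions across annotation sources can be sketched as follows. This is a minimal illustration and not the CCDS pipeline itself, which also applies quality checks and manual review. The data model (chromosome, strand, and a tuple of coding-exon intervals), the function name `consensus_cds`, and all identifiers in the example are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a CDS is (chromosome, strand, coding exon intervals).
CDS = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def consensus_cds(set_a: Dict[str, CDS], set_b: Dict[str, CDS]) -> List[Tuple[str, str]]:
    """Return (id_a, id_b) pairs whose coding structures are exactly identical."""
    # Index the second annotation set by its coding structure.
    index: Dict[CDS, List[str]] = {}
    for id_b, cds in set_b.items():
        index.setdefault(cds, []).append(id_b)

    pairs: List[Tuple[str, str]] = []
    for id_a, cds in sorted(set_a.items()):
        for id_b in sorted(index.get(cds, [])):
            pairs.append((id_a, id_b))
    return pairs


# Made-up annotations from two hypothetical sources.
source_a = {
    "A1": ("chr1", "+", ((100, 200), (300, 400))),
    "A2": ("chr1", "+", ((100, 200), (300, 450))),
}
source_b = {
    "B1": ("chr1", "+", ((100, 200), (300, 400))),
    "B2": ("chr2", "-", ((500, 650),)),
}
matches = consensus_cds(source_a, source_b)
```

Only annotations that agree at every coding-exon boundary are paired; a one-base disagreement, as between `A2` and `B1`, would leave the annotation out of the consensus set and make it a candidate for review.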


Subjects
Consensus Sequence, Genome, Open Reading Frames/genetics, Animals, Humans, Mice, Sequence Alignment
2.
Genome Res ; 14(5): 976-87, 2004 May.
Article in English | MEDLINE | ID: mdl-15123595

ABSTRACT

We describe a novel algorithm for deriving the minimal set of nonredundant transcripts compatible with the splicing structure of a set of ESTs mapped on a genome. Sets of ESTs with compatible splicing are represented by a special type of graph. We describe the algorithms for building the graphs and for deriving the minimal set of transcripts from the graphs that are compatible with the evidence. These algorithms are part of the Ensembl automatic gene annotation system, and its results, using ESTs, are provided at www.ensembl.org as ESTgenes for the mosquito, Caenorhabditis briggsae, C. elegans, zebrafish, human, mouse, and rat genomes. Here we also report on the results of this method applied to the human and mouse genomes.
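To make the notion of redundancy among spliced ESTs concrete, here is a deliberately simplified sketch. It is not the algorithm of the paper: it does not build the graph described above or merge partially overlapping ESTs into longer transcripts. It only discards an EST whose intron chain is a contiguous sub-chain of another EST's intron chain. The representation (an intron as a `(donor, acceptor)` pair), the function names, and the example coordinates are assumptions.

```python
from typing import Dict, List, Tuple

Intron = Tuple[int, int]      # (donor position, acceptor position)
Chain = Tuple[Intron, ...]    # ordered introns of one mapped EST


def is_contiguous_subchain(short: Chain, long: Chain) -> bool:
    """True if `short` occurs as an uninterrupted run of introns inside `long`."""
    n, m = len(short), len(long)
    if n == 0 or n > m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def nonredundant(ests: Dict[str, Chain]) -> List[str]:
    """Keep ESTs whose splicing structure is not contained in a kept EST."""
    kept: List[str] = []
    # Visit longer chains first so that possible containers are kept first.
    for name in sorted(ests, key=lambda k: (-len(ests[k]), k)):
        chain = ests[name]
        if not chain:
            # Unspliced ESTs carry no splicing evidence; this sketch skips them.
            continue
        if any(is_contiguous_subchain(chain, ests[k]) for k in kept):
            continue
        kept.append(name)
    return kept


# Made-up ESTs mapped to one locus.
example = {
    "est1": ((100, 200), (300, 400), (500, 600)),
    "est2": ((300, 400), (500, 600)),   # contained in est1
    "est3": ((100, 200), (300, 450)),   # alternative acceptor, not contained
    "est4": (),                         # unspliced
}
representatives = nonredundant(example)
```

Because containment is transitive, comparing each EST only against the ones already kept is sufficient in this simplified setting.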


Subjects
Alternative Splicing/genetics, Expressed Sequence Tags, Software, Animals, Caenorhabditis/genetics, Caenorhabditis elegans/genetics, Computational Biology, Culicidae/genetics, Helminth DNA/genetics, Genes, Helminth Genes, Insect Genes, Humans, Mice, Predictive Value of Tests, Rats, Reproducibility of Results, Genetic Transcription, Zebrafish/genetics
3.
Genome Res ; 14(5): 942-50, 2004 May.
Article in English | MEDLINE | ID: mdl-15123590

ABSTRACT

As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences. The gene-building system rests on top of the core Ensembl (MySQL) database schema and Perl Application Programming Interface (API), and the data generated are accessible through the Ensembl genome browser (http://www.ensembl.org). To date, the Ensembl predicted gene sets are available for the A. gambiae, C. briggsae, zebrafish, mouse, rat, and human genomes and have been heavily relied upon in the publication of the human, mouse, rat, and A. gambiae genome sequence analysis. Here we describe in detail the gene-building system and the algorithms involved. All code and data are freely available from http://www.ensembl.org.


Subjects
Automation, Computational Biology/methods, Genes/physiology, Animals, Anopheles/genetics, Caenorhabditis/genetics, DNA/genetics, Helminth DNA/genetics, Expressed Sequence Tags, Gene Dosage, Helminth Genes/physiology, Insect Genes/physiology, Genome, Human Genome, Helminth Proteins/genetics, Humans, Insect Proteins/genetics, Mice, Predictive Value of Tests, Proteins/genetics, Pseudogenes/genetics, Rats, Sequence Alignment/methods, Amino Acid Sequence Homology, Software, Tandem Repeat Sequences/genetics, Untranslated Regions/genetics
4.
Genome Res ; 14(5): 934-41, 2004 May.
Article in English | MEDLINE | ID: mdl-15123589

ABSTRACT

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.
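The wrapper-plus-rules pattern described in this abstract can be sketched in Python as follows. The actual pipeline is written in Perl and submits jobs to a compute farm; the sketch below runs everything serially against an in-memory dictionary standing in for the relational database. The class names `GCContent` and `Length`, the function `run_rules`, and the rule format are all hypothetical, chosen only to mirror the fetch, run, and write-back cycle.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class RunnableDB(ABC):
    """Wrapper: read input from the store, run an analysis, write results back."""

    name = "base"

    def __init__(self, db: Dict[str, dict]) -> None:
        self.db = db
        self.sequence = ""
        self.output = None

    def fetch_input(self, input_id: str) -> None:
        self.sequence = self.db["sequence"][input_id]

    @abstractmethod
    def run(self) -> None:
        """Perform the analysis and store the result in self.output."""

    def write_output(self, input_id: str) -> None:
        self.db.setdefault(self.name, {})[input_id] = self.output


class Length(RunnableDB):
    name = "length"

    def run(self) -> None:
        self.output = len(self.sequence)


class GCContent(RunnableDB):
    name = "gc_content"

    def run(self) -> None:
        seq = self.sequence.upper()
        self.output = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def run_rules(db: Dict[str, dict], input_ids: List[str],
              analyses: Dict[str, Type[RunnableDB]],
              rules: Dict[str, List[str]]) -> List[str]:
    """Run each analysis once its prerequisite analyses (per `rules`) are done."""
    done: set = set()
    order: List[str] = []
    while len(done) < len(analyses):
        progressed = False
        for name, cls in analyses.items():
            if name in done or not all(dep in done for dep in rules.get(name, [])):
                continue
            for input_id in input_ids:
                job = cls(db)  # a real system would submit this job to a farm
                job.fetch_input(input_id)
                job.run()
                job.write_output(input_id)
            done.add(name)
            order.append(name)
            progressed = True
        if not progressed:
            raise ValueError("rules cannot be satisfied")
    return order


store = {"sequence": {"contig1": "GGCCAT"}}
order = run_rules(store, ["contig1"],
                  {"gc_content": GCContent, "length": Length},
                  {"gc_content": ["length"]})
```

Adding a new analysis then only requires a subclass that implements `run`, which is the simplification the common interface is meant to provide.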


Subjects
Computational Biology/methods, Base Sequence/genetics, DNA/genetics, Genetic Databases/standards, Programming Languages, Proteins/classification, Software, Software Design
5.
Brain ; 125(Pt 6): 1337-47, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12023322

ABSTRACT

Linkage analysis in multiplex families has provisionally identified several genomic regions where genes influencing susceptibility to multiple sclerosis are likely to be located. It is anticipated that association mapping will provide a higher degree of resolution, but this more powerful approach is limited by the substantial genotyping effort required. Here, we describe the first use of DNA pooling to screen the whole genome for association in multiple sclerosis based on a 0.5 cM map of microsatellite markers and using four DNA pools derived from cases (n = 216), controls (n = 219) and trio families (n = 745 affected individuals and their 1490 parents). The 10 markers showing the greatest evidence for association with multiple sclerosis that emerge from this analysis include three from the HLA region on chromosome 6p (D6S1615, D6S2444 and TNFa), providing a positive control for the method, four from regions previously identified by linkage analysis in UK multiplex families (two mapping to chromosome 17q, GCT6E11 and D17S1535; one to chromosome 1p, GGAA30B06; and one to 19q, D19S585), and three from novel sites with respect to linkage analysis (D1S1590 at 1q; D2S2739 at 2p; and D4S416 at 4q). Our results thus provide further supporting evidence for the candidature of 6p, 17q, 19q and 1p as regions encoding susceptibility genes for multiple sclerosis. The protocol used in this UK-based study is now being extended to 18 additional sites in Europe in order to search for susceptibility genes shared between populations of common ancestry, as well as those that exert ethnically more restricted effects.


Subjects
Genetic Predisposition to Disease/genetics, Human Genome, Linkage Disequilibrium/genetics, Multiple Sclerosis/genetics, Adult, Chi-Square Distribution, Female, Genetic Markers/genetics, Genetic Testing/methods, Humans, Male, Microsatellite Repeats/genetics, Pilot Projects
6.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subjects
Computational Biology/trends
7.
Science ; 298(5591): 129-49, 2002 Oct 04.
Article in English | MEDLINE | ID: mdl-12364791

ABSTRACT

Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.


Subjects
Anopheles/genetics, Insect Genes, Genome, DNA Sequence Analysis, Animals, Anopheles/classification, Anopheles/parasitology, Anopheles/physiology, Biological Evolution, Blood, Chromosome Inversion, Bacterial Artificial Chromosomes, Computational Biology, DNA Transposable Elements, Digestion, Drosophila melanogaster/genetics, Enzymes/chemistry, Enzymes/genetics, Enzymes/metabolism, Expressed Sequence Tags, Feeding Behavior, Gene Expression Regulation, Genetic Variation, Haplotypes, Humans, Insect Proteins/chemistry, Insect Proteins/genetics, Insect Proteins/physiology, Insect Vectors/genetics, Insect Vectors/parasitology, Insect Vectors/physiology, Falciparum Malaria/transmission, Molecular Sequence Data, Mosquito Control, Physical Chromosome Mapping, Plasmodium falciparum/growth & development, Single Nucleotide Polymorphism, Proteome, Species Specificity, Transcription Factors/chemistry, Transcription Factors/genetics, Transcription Factors/physiology
8.
Nature ; 420(6915): 520-62, 2002 Dec 05.
Article in English | MEDLINE | ID: mdl-12466850

ABSTRACT

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.


Subjects
Mammalian Chromosomes/genetics, Molecular Evolution, Genome, Mice/genetics, Physical Chromosome Mapping, Animals, Base Composition, Conserved Sequence/genetics, CpG Islands/genetics, Gene Expression Regulation, Genes/genetics, Genetic Variation/genetics, Human Genome, Genomics, Humans, Mice/classification, Knockout Mice, Transgenic Mice, Animal Models, Multigene Family/genetics, Mutagenesis, Neoplasms/genetics, Proteome/genetics, Pseudogenes/genetics, Quantitative Trait Loci/genetics, Untranslated RNA/genetics, Repetitive Nucleic Acid Sequences/genetics, Genetic Selection, DNA Sequence Analysis, Sex Chromosomes/genetics, Species Specificity, Synteny