Results 1 - 11 of 11
2.
BMC Bioinformatics ; 11: 240, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459813

ABSTRACT

BACKGROUND: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. RESULTS: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. CONCLUSIONS: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
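The abstract above describes a blackboard design: a central database holds jobs, autonomous agents query it and claim work, and dataflow rules create downstream jobs. The following is a minimal sketch of that general pattern, not eHive's implementation. It assumes an in-memory SQLite database in place of MySQL and Python in place of Perl; the `job` table layout, the helper functions and the `chain_and_net` downstream analysis are hypothetical. Seeding one job per unordered species pair also illustrates the roughly quadratic growth: n species give n(n-1)/2 pairs, so 50 species give 1,225 pairwise jobs.

```python
import sqlite3
from itertools import combinations


def make_blackboard():
    # In-memory SQLite stands in for the MySQL blackboard (assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE job (
               job_id   INTEGER PRIMARY KEY,
               analysis TEXT NOT NULL,
               input    TEXT NOT NULL,
               status   TEXT NOT NULL DEFAULT 'READY',
               worker   TEXT)"""
    )
    return db


def seed_pairwise_jobs(db, species):
    # One job per unordered species pair: n*(n-1)/2 jobs, i.e. quadratic growth.
    rows = [("pairwise_alignment", f"{a}|{b}") for a, b in combinations(species, 2)]
    db.executemany("INSERT INTO job (analysis, input) VALUES (?, ?)", rows)
    db.commit()
    return len(rows)


# Hypothetical dataflow rule: each finished job of the key analysis
# creates one new job of the value analysis with the same input.
DATAFLOW = {"pairwise_alignment": "chain_and_net"}


def claim_job(db, worker):
    # Claim one READY job. The UPDATE only succeeds while the job is still
    # READY, so with several connections two workers cannot take the same job.
    while True:
        row = db.execute(
            "SELECT job_id, analysis, input FROM job WHERE status = 'READY' LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = db.execute(
            "UPDATE job SET status = 'CLAIMED', worker = ? "
            "WHERE job_id = ? AND status = 'READY'",
            (worker, row[0]),
        )
        db.commit()
        if cur.rowcount == 1:
            return row


def run_worker(db, worker):
    # Autonomous agent: keep claiming and running jobs until none are READY.
    done = 0
    while (job := claim_job(db, worker)) is not None:
        job_id, analysis, payload = job
        # The real computation (e.g. running an aligner) would happen here.
        db.execute("UPDATE job SET status = 'DONE' WHERE job_id = ?", (job_id,))
        downstream = DATAFLOW.get(analysis)
        if downstream is not None:
            db.execute(
                "INSERT INTO job (analysis, input) VALUES (?, ?)", (downstream, payload)
            )
        db.commit()
        done += 1
    return done


if __name__ == "__main__":
    board = make_blackboard()
    n_jobs = seed_pairwise_jobs(board, [f"species_{i}" for i in range(50)])
    print(n_jobs, run_worker(board, "worker_1"))  # 1225 seeded, 2450 run in total
```

With a single in-memory connection one worker simply drains the queue, including the downstream jobs it creates along the way; running several workers against a shared file-backed database is what would actually exercise the conditional `UPDATE` guard.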


Subjects
Genome , Genomics/methods , Software , Databases, Genetic
3.
Nucleic Acids Res ; 38(Database issue): D570-6, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19783817

ABSTRACT

The laboratory mouse is the premier animal model for studying human disease and thousands of mutants have been identified or produced, most recently through gene-specific mutagenesis approaches. High throughput strategies by the International Knockout Mouse Consortium (IKMC) are producing mutants for all protein coding genes. Generating a knock-out line involves huge monetary and time costs so capture of both the data describing each mutant alongside archiving of the line for distribution to future researchers is critical. The European Mouse Mutant Archive (EMMA) is a leading international network infrastructure for archiving and worldwide provision of mouse mutant strains. It operates in collaboration with the other members of the Federation of International Mouse Resources (FIMRe), EMMA being the European component. Additionally EMMA is one of four repositories involved in the IKMC, and therefore the current figure of 1700 archived lines will rise markedly. The EMMA database gathers and curates extensive data on each line and presents it through a user-friendly website. A BioMart interface allows advanced searching including integrated querying with other resources e.g. Ensembl. Other resources are able to display EMMA data by accessing our Distributed Annotation System server. EMMA database access is publicly available at http://www.emmanet.org.


Subjects
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Animals , Chromosomes , Computational Biology/trends , Databases, Protein , Information Storage and Retrieval/methods , Internet , Mice , Mice, Inbred C57BL , Mice, Knockout , Models, Genetic , Protein Structure, Tertiary , Software , User-Computer Interface
4.
Genome Res ; 19(2): 327-35, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19029536

ABSTRACT

We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.


Subjects
Algorithms , Computational Biology/methods , Gene Duplication , Phylogeny , Vertebrates/classification , Animals , Humans , Models, Biological , Multigene Family , Sequence Homology , Software , Synteny , Vertebrates/genetics
5.
Nucleic Acids Res ; 36(Database issue): D735-40, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18056084

ABSTRACT

TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14,351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.


Subjects
Databases, Genetic , Phylogeny , Animals , Genomics , Internet , Software , User-Computer Interface
6.
Genome Res ; 17(6): 760-74, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567995

ABSTRACT

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.


Subjects
Evolution, Molecular , Genome, Human , Mammals/genetics , Open Reading Frames , Phylogeny , Sequence Alignment , Animals , Human Genome Project , Humans
7.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subjects
Computational Biology/trends
8.
Genome Res ; 14(3): 463-71, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14962985

ABSTRACT

A collection of 90,000 human cDNA clones generated to increase the fraction of "full-length" cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionarily conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, CpG islands were observed at the 5' end of 75%. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that approximately 380 gene models described in LocusLink could be extended at their 5' end by at least one new exon. Finally, this cDNA resource provided experimental support for annotations based exclusively on predictions, thus representing a resource substantially improving the human genome annotation.


Subjects
5' Untranslated Regions/genetics , DNA, Complementary/genetics , Genome, Human , Adult , Amino Acid Sequence/genetics , Animals , Cell Line, Tumor , DNA, Complementary/classification , DNA, Neoplasm/classification , DNA, Neoplasm/genetics , HeLa Cells/chemistry , HeLa Cells/metabolism , Humans , Jurkat Cells/chemistry , Jurkat Cells/metabolism , Mice , Models, Genetic , Molecular Sequence Data , Open Reading Frames/genetics , Organ Specificity/genetics , Proteins/chemistry , Proteins/genetics , Sequence Alignment/classification , Sequence Alignment/methods , Sequence Homology, Nucleic Acid , Tetraodontiformes/genetics
9.
Nat Rev Genet ; 4(4): 251-62, 2003 Apr.
Article in English | MEDLINE | ID: mdl-12671656

ABSTRACT

The increasing number of complete and nearly complete metazoan genome sequences provides a significant amount of material for large-scale comparative genomic analysis. Finding new effective methods to analyse such enormous datasets has been the object of intense research. Three main areas in comparative genomics have recently shown important developments: whole-genome alignment, gene prediction and regulatory-region prediction. Each of these areas improves the methods of deciphering long genomic sequences and uncovering what lies hidden in them.


Subjects
Genomics , Animals , Evolution, Molecular , Genomics/statistics & numerical data , Humans , Internet , Sequence Alignment/statistics & numerical data , Software
10.
Nature ; 421(6923): 601-7, 2003 Feb 06.
Article in English | MEDLINE | ID: mdl-12508121

ABSTRACT

Chromosome 14 is one of five acrocentric chromosomes in the human genome. These chromosomes are characterized by a heterochromatic short arm that contains essentially ribosomal RNA genes, and a euchromatic long arm in which most, if not all, of the protein-coding genes are located. The finished sequence of human chromosome 14 comprises 87,410,661 base pairs, representing 100% of its euchromatic portion, in a single continuous segment covering the entire long arm with no gaps. Two loci of crucial importance for the immune system, as well as more than 60 disease genes, have been localized so far on chromosome 14. We identified 1,050 genes and gene fragments, and 393 pseudogenes. On the basis of comparisons with other vertebrate genomes, we estimate that more than 96% of the chromosome 14 genes have been annotated. From an analysis of the CpG island occurrences, we estimate that 70% of these annotated genes are complete at their 5' end.


Subjects
Chromosomes, Human, Pair 14/genetics , Physical Chromosome Mapping , Sequence Analysis, DNA , 5' Untranslated Regions/genetics , Animals , Base Composition , Chromosomes, Artificial/genetics , CpG Islands/genetics , DNA, Mitochondrial/genetics , DNA, Ribosomal/genetics , Genes/genetics , Genomics , Humans , Immunity/genetics , Mice , Microsatellite Repeats/genetics , Molecular Sequence Data , Open Reading Frames/genetics , Pseudogenes/genetics , Reproducibility of Results , Synteny/genetics
11.
J Gen Virol ; 80 ( Pt 12): 3083-3088, 1999 Dec.
Article in English | MEDLINE | ID: mdl-10567638

ABSTRACT

We investigated the serological, epidemiological and molecular aspects of human T-cell lymphotropic virus type I and II (HTLV-I/II) infection in the Amerindian populations of French Guiana by testing 847 sera. No HTLV-II antibodies were detected, but five individuals (0.59%) were seropositive for HTLV-I. Analysis of the nucleotide sequences of 522 bp of the env gene and the complete LTR showed that all of the strains from French Guiana belonged to the cosmopolitan subtype A. The similarities were greater between Amerindian and Creole strains than between Amerindian and Noir-Marron strains or than between Creole and Noir-Marron strains. Phylogenetic analysis showed two clusters: one of strains from Amerindians and Creoles, which belong to the transcontinental subgroup, and the other of strains from Noirs-Marrons, belonging to the West African subgroup. Our results suggest that the Amerindian HTLV-I strains are of African origin.


Subjects
HTLV-I Antibodies/blood , HTLV-I Infections/ethnology , HTLV-II Infections/ethnology , Indians, South American , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , French Guiana/epidemiology , Genes, env , HTLV-I Infections/virology , HTLV-II Antibodies/blood , HTLV-II Infections/virology , Human T-lymphotropic virus 1/genetics , Human T-lymphotropic virus 1/isolation & purification , Human T-lymphotropic virus 2/genetics , Human T-lymphotropic virus 2/isolation & purification , Humans , Male , Middle Aged , Molecular Epidemiology , Molecular Sequence Data , Phylogeny , Sequence Analysis, DNA , Seroepidemiologic Studies , Terminal Repeat Sequences/genetics