Results 1 - 6 of 6
1.
BMC Bioinformatics ; 11: 238, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459810

ABSTRACT

BACKGROUND: Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data necessitating the redesign of existing genome-wide bioinformatics resources. RESULTS: Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. CONCLUSIONS: Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.


Subject(s)
Databases, Factual; Genomics/methods; Genotype; Sequence Analysis, DNA/methods; Genome; Phenotype
2.
Proc Natl Acad Sci U S A ; 102(12): 4542-7, 2005 Mar 22.
Article in English | MEDLINE | ID: mdl-15761058

ABSTRACT

Homozygous deletions of recessive cancer genes and fragile sites are known to occur in human cancers. We identified 281 homozygous deletions in 636 cancer cell lines. Of these deletions, 86 were homozygous deletions of known recessive cancer genes, 17 were of sequenced common fragile sites, and 178 were in genomic regions that do not overlap known recessive oncogenes or fragile sites ("unexplained" homozygous deletions). Some cancer cell lines have multiple homozygous deletions whereas others have none, suggesting intrinsic variation in the tendency to develop this type of genetic abnormality (P < 0.001). The 178 unexplained homozygous deletions clustered into 131 genomic regions, 27 of which exhibit homozygous deletions in more than one cancer cell line. This degree of clustering indicates that the genomic positions of the unexplained homozygous deletions are not randomly determined (P < 0.001). Many homozygous deletions, including those that are in multiple clusters, do not overlap known genes and appear to be in intergenic DNA. Therefore, to elucidate further the pathogenesis of homozygous deletions in cancer, we investigated the genome landscape within unexplained homozygous deletions. The gene count within homozygous deletions is low compared with the rest of the genome. There are also fewer short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), and low-copy-number repeats (LCRs). However, DNA within homozygous deletions has higher flexibility. These features may signal the presence of currently unrecognized zones of susceptibility to DNA rearrangement. They may also reflect a tendency to reduce the adverse effects of homozygous deletions by minimizing the number of genes removed.
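The abstract reports that the 178 unexplained deletions fall into 131 genomic regions, 27 of which are deleted in more than one cell line, but it does not state the grouping rule. The Python sketch below assumes one simple rule: deletions on the same chromosome whose coordinates overlap are merged into a single region, and a region counts as recurrent when its deletions come from more than one cell line. The function names and the toy input are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def cluster_deletions(deletions):
    """Merge overlapping deletions into regions, chromosome by chromosome.

    deletions: iterable of (cell_line, chrom, start, end) with closed intervals.
    Returns a list of (chrom, region_start, region_end, sorted_cell_lines).
    Assumption: two deletions belong to one region when they overlap.
    """
    by_chrom = defaultdict(list)
    for cell_line, chrom, start, end in deletions:
        by_chrom[chrom].append((start, end, cell_line))

    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])  # sweep left to right by start
        cur_start, cur_end, first_line = intervals[0]
        lines = {first_line}
        for start, end, cell_line in intervals[1:]:
            if start <= cur_end:
                # Overlaps the open region: extend it and record the cell line.
                cur_end = max(cur_end, end)
                lines.add(cell_line)
            else:
                # Gap: close the current region and open a new one.
                regions.append((chrom, cur_start, cur_end, sorted(lines)))
                cur_start, cur_end, lines = start, end, {cell_line}
        regions.append((chrom, cur_start, cur_end, sorted(lines)))
    return regions


def recurrent(regions):
    """Regions deleted in more than one cell line."""
    return [r for r in regions if len(r[3]) > 1]


if __name__ == "__main__":
    toy = [  # made-up coordinates for illustration only
        ("lineA", "chr1", 100, 200),
        ("lineB", "chr1", 150, 300),
        ("lineA", "chr1", 500, 600),
        ("lineC", "chr2", 10, 20),
    ]
    regions = cluster_deletions(toy)
    print(regions)
    print(recurrent(regions))
```

A sweep over sorted start coordinates keeps the merge linear after sorting; a real analysis would also need to decide whether near-adjacent (non-overlapping) deletions should be merged.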


Subject(s)
Gene Deletion; Genes, Recessive; Genome, Human; Homozygote; Neoplasms/genetics; Cell Line, Tumor; Chromosome Fragile Sites/genetics; DNA, Neoplasm/genetics; Humans; Long Interspersed Nucleotide Elements; Repetitive Sequences, Nucleic Acid; Short Interspersed Nucleotide Elements
3.
Genome Res ; 14(5): 929-33, 2004 May.
Article in English | MEDLINE | ID: mdl-15123588

ABSTRACT

Systems for managing genomic data must store a vast quantity of information. Ensembl stores these data in several MySQL databases. The core software libraries provide a practical and effective means for programmers to access these data. By encapsulating the underlying database structure, the libraries present end users with a simple, abstract interface to a complex data model. Programs that use the libraries rather than SQL to access the data are unaffected by most schema changes. The architecture of the core software libraries, the schema, and the factors influencing their design are described. All code and data are freely available.
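The encapsulation idea in this abstract can be illustrated with a small Python sketch. Ensembl's core libraries are written in Perl and their actual interface is not reproduced here; `GeneAdapter`, its methods, the `gene` table and the example rows are all hypothetical. The point shown is that callers receive plain objects and never write SQL, so a schema change touches only the adapter's queries.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    """Plain object handed to callers; it carries no SQL or schema details."""
    stable_id: str
    name: str
    chrom: str
    start: int
    end: int


class GeneAdapter:
    """Hypothetical adapter: the only code that knows the table layout."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def fetch_by_stable_id(self, stable_id: str) -> Optional[Gene]:
        # If columns are renamed or split across tables, only this query
        # changes; callers keep working with Gene objects.
        row = self._conn.execute(
            "SELECT stable_id, name, chrom, seq_start, seq_end "
            "FROM gene WHERE stable_id = ?",
            (stable_id,),
        ).fetchone()
        return Gene(*row) if row else None

    def fetch_all_by_region(self, chrom: str, start: int, end: int) -> List[Gene]:
        # A gene overlaps [start, end] if it starts before the region ends
        # and ends after the region starts.
        rows = self._conn.execute(
            "SELECT stable_id, name, chrom, seq_start, seq_end FROM gene "
            "WHERE chrom = ? AND seq_start <= ? AND seq_end >= ? "
            "ORDER BY seq_start",
            (chrom, end, start),
        ).fetchall()
        return [Gene(*r) for r in rows]


def build_demo_db() -> sqlite3.Connection:
    """Toy in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (stable_id TEXT PRIMARY KEY, name TEXT, "
        "chrom TEXT, seq_start INTEGER, seq_end INTEGER)"
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
        [
            ("GENE0001", "geneA", "1", 1000, 5000),
            ("GENE0002", "geneB", "1", 8000, 9000),
            ("GENE0003", "geneC", "2", 100, 900),
        ],
    )
    return conn


if __name__ == "__main__":
    adapter = GeneAdapter(build_demo_db())
    print(adapter.fetch_by_stable_id("GENE0002"))
    print([g.name for g in adapter.fetch_all_by_region("1", 4000, 8500)])
```

Returning immutable dataclasses rather than database rows is one way to keep callers independent of column order and naming.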


Subject(s)
Computational Biology; Software; Animals; Databases, Genetic; Humans; Software Design
4.
Genome Res ; 14(5): 934-41, 2004 May.
Article in English | MEDLINE | ID: mdl-15123589

ABSTRACT

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.
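The split between wrapper modules and a shared interface can be sketched in Python, although the abstract states that the real modules are Perl. The class and method names below, and the toy GC-content "analysis" standing in for a wrapped external tool, are assumptions for illustration only. `submit_all` simply runs jobs one after another; it is a placeholder for a job submission system such as the RuleManager, which sends jobs to a compute farm.

```python
from abc import ABC, abstractmethod


class RunnableDB(ABC):
    """Hypothetical common interface: fetch input, run analysis, write results.

    Concrete wrappers implement the three steps; run_job() fixes their order,
    which is what makes adding a new wrapper straightforward.
    """

    def __init__(self, db: dict, input_id: str) -> None:
        self.db = db              # stand-in for a relational database
        self.input_id = input_id
        self.output = None

    @abstractmethod
    def fetch_input(self) -> None: ...

    @abstractmethod
    def run(self) -> None: ...

    @abstractmethod
    def write_output(self) -> None: ...

    def run_job(self) -> None:
        self.fetch_input()
        self.run()
        self.write_output()


class GCContent(RunnableDB):
    """Toy analysis standing in for a wrapped external tool."""

    def fetch_input(self) -> None:
        self.sequence = self.db["sequences"][self.input_id]

    def run(self) -> None:
        seq = self.sequence.upper()
        gc = sum(base in "GC" for base in seq)
        self.output = gc / len(seq) if seq else 0.0

    def write_output(self) -> None:
        self.db.setdefault("results", {})[self.input_id] = self.output


def submit_all(db: dict, runnable_cls: type, input_ids: list) -> None:
    """Minimal stand-in for a scheduler: runs each job serially, in order."""
    for input_id in input_ids:
        runnable_cls(db, input_id).run_job()


if __name__ == "__main__":
    db = {"sequences": {"contig1": "ATGCGC", "contig2": "AATT"}}
    submit_all(db, GCContent, ["contig1", "contig2"])
    print(db["results"])
```

Because every wrapper exposes the same three steps, the scheduler needs no knowledge of what each analysis does.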


Subject(s)
Computational Biology/methods; Base Sequence/genetics; DNA/genetics; Databases, Genetic/standards; Programming Languages; Proteins/classification; Software; Software Design
5.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subject(s)
Computational Biology/trends
6.
Genome Res ; 13(8): 1904-15, 2003 Aug.
Article in English | MEDLINE | ID: mdl-12869579

ABSTRACT

We identify several challenges facing bioinformatics analysis today. Firstly, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Secondly, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Thirdly, the ever-evolving state of bioinformatics presents new algorithms and paradigms in conducting analysis. This means that any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for introducing an explicit protocol-based approach to bioinformatics analysis that will lend rigorousness to the analysis. This makes it easier for experimentation and replication of results by external parties. Biopipe is designed in an effort to meet these goals. It aims to allow researchers to focus on protocol design. At the same time, it is designed to work over a compute farm and thus provides high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way in which to distribute and reproduce results. This will enable researchers to discuss and interpret the data better as the once implicit assumptions are now explicitly defined within the Biopipe framework.
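A protocol record that bundles analysis modules, parameters and data versions, as this abstract describes, might look like the following Python sketch. Biopipe's actual exchange format is not reproduced here; the JSON serialization, the class names, the module names and the hash-based fingerprint are all assumptions, chosen to show why one canonical document makes a protocol easy to share, rerun and compare.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisStep:
    module: str                                   # hypothetical module name
    parameters: dict = field(default_factory=dict)


@dataclass
class Protocol:
    name: str
    steps: list = field(default_factory=list)          # ordered AnalysisStep objects
    data_versions: dict = field(default_factory=dict)  # dataset -> version label

    def to_json(self) -> str:
        # sort_keys gives a deterministic serialization, so two parties holding
        # the same protocol produce identical text.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Protocol":
        raw = json.loads(text)
        steps = [AnalysisStep(**s) for s in raw["steps"]]
        return cls(raw["name"], steps, raw["data_versions"])

    def fingerprint(self) -> str:
        # Any change to modules, parameters or data versions changes this hash.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


if __name__ == "__main__":
    proto = Protocol(
        name="demo_protocol",
        steps=[
            AnalysisStep("sequence_search", {"evalue": 1e-5}),
            AnalysisStep("tree_builder", {"method": "nj"}),
        ],
        data_versions={"protein_db": "release_1"},
    )
    text = proto.to_json()
    assert Protocol.from_json(text) == proto  # round trip preserves everything
    print(proto.fingerprint()[:12])
```

Recording data versions alongside parameters is what lets an outside group tell whether a differing result comes from the method or from the input data.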


Subject(s)
Computational Biology/methods; Software; Amino Acid Sequence; Animals; Databases, Protein; Drosophila Proteins/genetics; Humans; Phylogeny; Proteins/genetics; Research Design/trends; Software Design; Takifugu/genetics