Results 1 - 13 of 13
1.
J Bioinform Comput Biol ; 10(2): 1241005, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22809341

ABSTRACT

The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. In SNP calling in particular, a large number of false-positive SNPs (FP-SNPs) may be obtained, so putative SNPs must be distinguished from sequencing and other errors. We found that distinguishing an FP-SNP depends not only on the probability of a sequencing error (i.e. the quality value) but also on the conditional probability of "correcting" this error (the "second-best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the FP-SNP rate is to retrieve DNA motifs that appear prone to sequencing errors and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequencing errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggest a simple method to correct the majority of mismatches, based on the conditional probability of their second-best intensity call, and we attach to each mismatch a corresponding second-call confidence (quality value) of being corrected.
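The second-call idea in the abstract above can be illustrated with a minimal sketch. It assumes, hypothetically, that each base position comes with four per-base probabilities (for example, normalized intensities) that sum to 1. The function names, the `min_conf` threshold of 0.8, and the renormalization rule P(second | first wrong) = p2 / (1 - p1) are illustrative assumptions, not the authors' published method. The standard Phred scale, Q = -10 log10(P(error)), converts the probability that a correction is itself wrong into a quality value.

```python
import math
from typing import Dict, Tuple


def phred(p_error: float) -> float:
    """Phred-scaled quality value: Q = -10 * log10(P(error))."""
    if p_error <= 0.0:
        return math.inf
    return -10.0 * math.log10(p_error)


def second_call(probs: Dict[str, float]) -> Tuple[str, str, float]:
    """Rank the per-base probabilities at one position.

    Returns (first call, second call, P(second | first is wrong)).
    `probs` is a hypothetical per-base distribution assumed to sum to 1.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    first, second = ranked[0], ranked[1]
    p_first_wrong = 1.0 - probs[first]
    if p_first_wrong <= 0.0:
        return first, second, 0.0
    # Assumption: if the first call is an error, the true base is one of the
    # other three, so renormalize the second call's probability over them.
    return first, second, probs[second] / p_first_wrong


def classify_mismatch(probs: Dict[str, float], ref_base: str, min_conf: float = 0.8):
    """Label a position 'match', 'corrected' or 'candidate_snp'.

    min_conf is an illustrative threshold, not a value from the paper.
    Returns (label, Phred quality of the second-call correction or None).
    """
    first, second, conf = second_call(probs)
    if first == ref_base:
        return "match", None
    q2 = phred(1.0 - conf)  # quality of "the second call is the true base"
    if second == ref_base and conf >= min_conf:
        return "corrected", q2  # mismatch explained by the second-best call
    return "candidate_snp", q2


if __name__ == "__main__":
    probs = {"A": 0.60, "G": 0.35, "C": 0.03, "T": 0.02}
    label, q = classify_mismatch(probs, ref_base="G")
    print(label, round(q, 1))  # corrected 9.0
```

In this sketch a mismatch is only treated as a sequencing error when the reference base is the second-best call and carries most of the residual probability mass; every other mismatch stays a candidate SNP, together with its second-call quality value for downstream filtering.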


Subjects
Sequence Analysis, DNA/methods , Algorithms , Nucleotide Motifs , Polymorphism, Single Nucleotide , Research Design
2.
Nucleic Acids Res ; 40(Database issue): D43-7, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22080548

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed dramatic growth resulting from advances in sequencing technology and the ever-broadening application of the methodology. During 2011, we continued to operate and extend the broad range of ENA services. In particular, we released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.


Subjects
Databases, Nucleic Acid , Sequence Analysis, DNA , Sequence Analysis, RNA , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation , Software , User-Computer Interface
3.
Nucleic Acids Res ; 39(Database issue): D28-31, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20972220

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.


Subjects
Base Sequence , Databases, Nucleic Acid , Europe , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation
4.
Nucleic Acids Res ; 38(Database issue): D39-45, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19906712

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL-EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.


Subjects
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Access to Information , Algorithms , Animals , Computational Biology/trends , DNA/genetics , Europe , Humans , Information Storage and Retrieval/methods , Internet , Software
5.
Nucleic Acids Res ; 37(Database issue): D19-25, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18978013

ABSTRACT

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next-generation sequence data and a brief summary of the contents of the ENA, and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.


Subjects
Databases, Nucleic Acid , Sequence Analysis/trends , Internet , Systems Integration
6.
Nat Biotechnol ; 26(5): 541-7, 2008 May.
Article in English | MEDLINE | ID: mdl-18464787

ABSTRACT

With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.


Subjects
Chromosome Mapping/methods , Chromosome Mapping/standards , Databases, Factual/standards , Information Dissemination/methods , Information Storage and Retrieval/standards , Information Theory , Internationality
7.
Nucleic Acids Res ; 36(Database issue): D5-12, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18039715

ABSTRACT

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.


Subjects
Databases, Nucleic Acid , Sequence Analysis, DNA , Animals , Archives , Genomics , Internet
8.
Nucleic Acids Res ; 35(Database issue): D16-20, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17148479

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) at the EMBL European Bioinformatics Institute, UK, offers a large and freely accessible collection of nucleotide sequences and accompanying annotation. The database is maintained in collaboration with DDBJ and GenBank. Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation, alignments and bulk data. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. In 2006, the volume of data continued to grow exponentially. Access to the data is provided via SRS, FTP and a variety of other methods. Extensive external and internal cross-references enable users to search for related information across other databases and within the database. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk/. Changes over the past year include changes to the file format, further development of the EMBLCDS dataset and developments to the XML format.


Subjects
Databases, Nucleic Acid , Base Sequence , Databases, Nucleic Acid/trends , Internet , User-Computer Interface
9.
OMICS ; 10(2): 127-37, 2006.
Article in English | MEDLINE | ID: mdl-16901217

ABSTRACT

Fundamental biological processes can now be studied by applying the full range of OMICS technologies (genomics, transcriptomics, proteomics, metabolomics, and beyond) to the same biological sample. Clearly, it would be desirable if the concept of sample were shared among these technologies, especially as up until the time a biological sample is prepared for use in a specific OMICS assay, its description is inherently technology independent. Sharing a common informatic representation would encourage data sharing (rather than data replication), thereby reducing redundant data capture and the potential for error. This would result in a significant degree of harmonization across different OMICS data standardization activities, a task that is critical if we are to integrate data from these different data sources. Here, we review the current concept of sample in OMICS technologies as it is being dealt with by different OMICS standardization initiatives and discuss the special role that the newly formed Genomic Standards Consortium (GSC) might have to play in this domain.


Subjects
Databases, Nucleic Acid/standards , Genome, Human , Genome , Genomics/standards , Proteome/genetics , Proteomics/standards , Animals , Humans , Oligonucleotide Array Sequence Analysis/standards
10.
Nucleic Acids Res ; 34(Database issue): D10-5, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381823

ABSTRACT

The EMBL Nucleotide Sequence Database (www.ebi.ac.uk/embl) at the EMBL European Bioinformatics Institute, UK, offers a comprehensive set of publicly available nucleotide sequences and annotation, freely accessible to all. Maintained in collaboration with partners DDBJ and GenBank, its coverage includes whole-genome sequencing project data, directly submitted sequences, sequences recorded in support of patent applications and much more. The database continues to offer submission tools, data retrieval facilities and user support. In 2005, the volume of data offered continued to grow exponentially. In addition to the newly presented data, the database encompasses a range of new data types generated by novel technologies, offers enhanced presentation and searchability of the data and has greater integration with other data resources offered at the EBI and elsewhere. In step with these developing data types, the database has continued to develop submission and retrieval tools to maximise the information content of submitted data and to offer the simplest possible submission routes for data producers. New developments, the submission process, data retrieval and access to support are presented in this paper, along with links to sources of further information.


Subjects
Databases, Nucleic Acid , Animals , Base Sequence , Genomics , Internet , Software , User-Computer Interface
11.
Nucleic Acids Res ; 33(Database issue): D29-33, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608199

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.


Subjects
Databases, Nucleic Acid , Base Sequence , Databases, Nucleic Acid/trends , Internet , User-Computer Interface
12.
Nucleic Acids Res ; 33(Database issue): D297-302, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608201

ABSTRACT

Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.


Subjects
Databases, Genetic , Genomics , Proteomics , DNA, Archaeal/chemistry , DNA, Bacterial/chemistry , Internet , Systems Integration , User-Computer Interface
13.
Nucleic Acids Res ; 32(Database issue): D27-30, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681351

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates, organizes and distributes nucleotide sequences from public sources. The database is a part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. The web-based tool, Webin, is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data. Automatic submission procedures are used for submission of data from large-scale genome sequencing centres and from the European Patent Office. Database releases are produced quarterly. The latest data collection can be accessed via FTP, email and WWW interfaces. The EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database, the complete genomic component subsection of the database, the WGS data sets and other databases. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.


Subjects
Databases, Nucleic Acid , Animals , Europe , Genomics , Humans , Information Storage and Retrieval , Internet