
1.

Content discovery and retrieval services at the European Nucleotide Archive.

Silvester, Nicole; Alako, Blaise; Amid, Clara; Cerdeño-Tárraga, Ana; Cleland, Iain; Gibson, Richard; Goodgame, Neil; Ten Hoopen, Petra; Kay, Simon; Leinonen, Rasko; Li, Weizhong; Liu, Xin; Lopez, Rodrigo; Pakseresht, Nima; Pallreddy, Swapna; Plaister, Sheila; Radhakrishnan, Rajesh; Rossello, Marc; Senf, Alexander; Smirnov, Dmitriy; Toribio, Ana Luisa; Vaughan, Daniel; Zalunin, Vadim; Cochrane, Guy.

Nucleic Acids Res ; 43(Database issue): D23-9, 2015 Jan.

Article En | MEDLINE | ID: mdl-25404130

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary resource for nucleotide sequence information. With the growing volume and diversity of public sequencing data comes the need for increased sophistication in data organisation, presentation and search services so as to maximise its discoverability and usability. In response to this, ENA has been introducing and improving checklists for use during submission and expanding its search facilities to provide targeted search results. Here, we give a brief update on ENA content and some major developments undertaken in data submission services during 2014. We then describe in more detail the services we offer for data discovery and retrieval.

Databases, Nucleic Acid , Base Sequence , Genomics , Molecular Sequence Annotation , Sequence Analysis

2.

Assembly information services in the European Nucleotide Archive.

Pakseresht, Nima; Alako, Blaise; Amid, Clara; Cerdeño-Tárraga, Ana; Cleland, Iain; Gibson, Richard; Goodgame, Neil; Gur, Tamer; Jang, Mikyung; Kay, Simon; Leinonen, Rasko; Li, Weizhong; Liu, Xin; Lopez, Rodrigo; McWilliam, Hamish; Oisel, Arnaud; Pallreddy, Swapna; Plaister, Sheila; Radhakrishnan, Rajesh; Rivière, Stephane; Rossello, Marc; Senf, Alexander; Silvester, Nicole; Smirnov, Dmitriy; Squizzato, Silvano; ten Hoopen, Petra; Toribio, Ana Luisa; Vaughan, Daniel; Zalunin, Vadim; Cochrane, Guy.

Nucleic Acids Res ; 42(Database issue): D38-43, 2014 Jan.

Article En | MEDLINE | ID: mdl-24214989

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the world's public-domain nucleotide sequence data output. ENA content covers a spectrum of data types including raw reads, assembly data and functional annotation. ENA has faced dramatic growth in genome assembly submission rates, data volumes and dataset complexity. This has prompted a broad reworking of assembly submission services; this major programme of work is now reaching its end, and many enhancements to components of the submission service have already been made available over the year. In this article, we briefly review ENA content and growth over 2013, describe our rapidly developing services for genome assembly information and outline further major developments over the last year.

Databases, Nucleic Acid , Genomics , Europe , Internet

3.

Facing growth in the European Nucleotide Archive.

Cochrane, Guy; Alako, Blaise; Amid, Clara; Bower, Lawrence; Cerdeño-Tárraga, Ana; Cleland, Iain; Gibson, Richard; Goodgame, Neil; Jang, Mikyung; Kay, Simon; Leinonen, Rasko; Lin, Xiu; Lopez, Rodrigo; McWilliam, Hamish; Oisel, Arnaud; Pakseresht, Nima; Pallreddy, Swapna; Park, Youngmi; Plaister, Sheila; Radhakrishnan, Rajesh; Rivière, Stephane; Rossello, Marc; Senf, Alexander; Silvester, Nicole; Smirnov, Dmitriy; Ten Hoopen, Petra; Toribio, Ana; Vaughan, Daniel; Zalunin, Vadim.

Nucleic Acids Res ; 41(Database issue): D30-5, 2013 Jan.

Article En | MEDLINE | ID: mdl-23203883

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) collects, maintains and presents comprehensive nucleic acid sequence and related information as part of the permanent public scientific record. Here, we provide brief updates on ENA content developments and major service enhancements in 2012 and describe in more detail two important areas of development and policy that are driven by ongoing growth in sequencing technologies. First, we describe the ENA data warehouse, a resource for which we provide a programmatic entry point to integrated content across the breadth of ENA. Second, we detail our plans for the deployment of CRAM data compression technology in ENA.

Base Sequence , Databases, Nucleic Acid , Data Compression , Genomics , High-Throughput Nucleotide Sequencing , Internet , User-Computer Interface

4.

Major submissions tool developments at the European Nucleotide Archive.

Amid, Clara; Birney, Ewan; Bower, Lawrence; Cerdeño-Tárraga, Ana; Cheng, Ying; Cleland, Iain; Faruque, Nadeem; Gibson, Richard; Goodgame, Neil; Hunter, Christopher; Jang, Mikyung; Leinonen, Rasko; Liu, Xin; Oisel, Arnaud; Pakseresht, Nima; Plaister, Sheila; Radhakrishnan, Rajesh; Reddy, Kethi; Rivière, Stephane; Rossello, Marc; Senf, Alexander; Smirnov, Dimitriy; Ten Hoopen, Petra; Vaughan, Daniel; Vaughan, Robert; Zalunin, Vadim; Cochrane, Guy.

Nucleic Acids Res ; 40(Database issue): D43-7, 2012 Jan.

Article En | MEDLINE | ID: mdl-22080548

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed a dramatic growth resulting from advances in sequencing technology and ever broadening application of the methodology. During 2011, we have continued to operate and extend the broad range of ENA services. In particular, we have released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.

Databases, Nucleic Acid , Sequence Analysis, DNA , Sequence Analysis, RNA , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation , Software , User-Computer Interface

5.

The European Nucleotide Archive.

Leinonen, Rasko; Akhtar, Ruth; Birney, Ewan; Bower, Lawrence; Cerdeño-Tárraga, Ana; Cheng, Ying; Cleland, Iain; Faruque, Nadeem; Goodgame, Neil; Gibson, Richard; Hoad, Gemma; Jang, Mikyung; Pakseresht, Nima; Plaister, Sheila; Radhakrishnan, Rajesh; Reddy, Kethi; Sobhany, Siamak; Ten Hoopen, Petra; Vaughan, Robert; Zalunin, Vadim; Cochrane, Guy.

Nucleic Acids Res ; 39(Database issue): D28-31, 2011 Jan.

Article En | MEDLINE | ID: mdl-20972220

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.

Base Sequence , Databases, Nucleic Acid , Europe , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation

6.

Improvements to services at the European Nucleotide Archive.

Leinonen, Rasko; Akhtar, Ruth; Birney, Ewan; Bonfield, James; Bower, Lawrence; Corbett, Matt; Cheng, Ying; Demiralp, Fehmi; Faruque, Nadeem; Goodgame, Neil; Gibson, Richard; Hoad, Gemma; Hunter, Christopher; Jang, Mikyung; Leonard, Steven; Lin, Quan; Lopez, Rodrigo; Maguire, Michael; McWilliam, Hamish; Plaister, Sheila; Radhakrishnan, Rajesh; Sobhany, Siamak; Slater, Guy; Ten Hoopen, Petra; Valentin, Franck; Vaughan, Robert; Zalunin, Vadim; Zerbino, Daniel; Cochrane, Guy.

Nucleic Acids Res ; 38(Database issue): D39-45, 2010 Jan.

Article En | MEDLINE | ID: mdl-19906712

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL-EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.

Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Access to Information , Algorithms , Animals , Computational Biology/trends , DNA/genetics , Europe , Humans , Information Storage and Retrieval/methods , Internet , Software

7.

Petabyte-scale innovations at the European Nucleotide Archive.

Cochrane, Guy; Akhtar, Ruth; Bonfield, James; Bower, Lawrence; Demiralp, Fehmi; Faruque, Nadeem; Gibson, Richard; Hoad, Gemma; Hubbard, Tim; Hunter, Christopher; Jang, Mikyung; Juhos, Szilveszter; Leinonen, Rasko; Leonard, Steven; Lin, Quan; Lopez, Rodrigo; Lorenc, Dariusz; McWilliam, Hamish; Mukherjee, Gaurab; Plaister, Sheila; Radhakrishnan, Rajesh; Robinson, Stephen; Sobhany, Siamak; Hoopen, Petra Ten; Vaughan, Robert; Zalunin, Vadim; Birney, Ewan.

Nucleic Acids Res ; 37(Database issue): D19-25, 2009 Jan.

Article En | MEDLINE | ID: mdl-18978013

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.

Databases, Nucleic Acid , Sequence Analysis/trends , Internet , Systems Integration

8.

Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database.

Cochrane, Guy; Akhtar, Ruth; Aldebert, Philippe; Althorpe, Nicola; Baldwin, Alastair; Bates, Kirsty; Bhattacharyya, Sumit; Bonfield, James; Bower, Lawrence; Browne, Paul; Castro, Matias; Cox, Tony; Demiralp, Fehmi; Eberhardt, Ruth; Faruque, Nadeem; Hoad, Gemma; Jang, Mikyung; Kulikova, Tamara; Labarga, Alberto; Leinonen, Rasko; Leonard, Steven; Lin, Quan; Lopez, Rodrigo; Lorenc, Dariusz; McWilliam, Hamish; Mukherjee, Gaurab; Nardone, Francesco; Plaister, Sheila; Robinson, Stephen; Sobhany, Siamak; Vaughan, Robert; Wu, Dan; Zhu, Weimin; Apweiler, Rolf; Hubbard, Tim; Birney, Ewan.

Nucleic Acids Res ; 36(Database issue): D5-12, 2008 Jan.

Article En | MEDLINE | ID: mdl-18039715

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.

Databases, Nucleic Acid , Sequence Analysis, DNA , Animals , Archives , Genomics , Internet

9.

EMBL Nucleotide Sequence Database in 2006.

Kulikova, Tamara; Akhtar, Ruth; Aldebert, Philippe; Althorpe, Nicola; Andersson, Mikael; Baldwin, Alastair; Bates, Kirsty; Bhattacharyya, Sumit; Bower, Lawrence; Browne, Paul; Castro, Matias; Cochrane, Guy; Duggan, Karyn; Eberhardt, Ruth; Faruque, Nadeem; Hoad, Gemma; Kanz, Carola; Lee, Charles; Leinonen, Rasko; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; Lorenc, Dariusz; McWilliam, Hamish; Mukherjee, Gaurab; Nardone, Francesco; Pastor, Maria Pilar Garcia; Plaister, Sheila; Sobhany, Siamak; Stoehr, Peter; Vaughan, Robert; Wu, Dan; Zhu, Weimin; Apweiler, Rolf.

Nucleic Acids Res ; 35(Database issue): D16-20, 2007 Jan.

Article En | MEDLINE | ID: mdl-17148479

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) at the EMBL European Bioinformatics Institute, UK, offers a large and freely accessible collection of nucleotide sequences and accompanying annotation. The database is maintained in collaboration with DDBJ and GenBank. Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation, alignments and bulk data. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. In 2006, the volume of data continued to grow exponentially. Access to the data is provided via SRS, ftp and a variety of other methods. Extensive external and internal cross-references enable users to search for related information across other databases and within the database. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk/. Changes over the past year include changes to the file format, further development of the EMBLCDS dataset and developments to the XML format.

Databases, Nucleic Acid , Base Sequence , Databases, Nucleic Acid/trends , Internet , User-Computer Interface