Search | Portal Regional de la BVS (VHL Regional Portal)

Rapid automated validation, annotation and publication of SARS-CoV-2 sequences to GenBank.

Underwood, Beverly A; Yankie, Linda; Nawrocki, Eric P; Palanigobu, Vasuki; Gotvyanskyy, Sergiy; Calhoun, Vincent C; Kornbluh, Michael; Smith, Thomas G; Fleischmann, Lydia; Sinyakov, Denis; Bollin, Colleen J; Karsch-Mizrachi, Ilene.

Database (Oxford) ; 2022 ; 2022 03 01.

Article in English | MEDLINE | ID: mdl-35230423

ABSTRACT

Rapid response to the current coronavirus disease 2019 (COVID-19) pandemic requires fast dissemination of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequence data to align diagnostic tests and vaccines with the natural evolution of the virus as it spreads through the world. To facilitate this, the National Library of Medicine's National Center for Biotechnology Information developed an automated pipeline for the deposition and quick processing of SARS-CoV-2 genome assemblies into GenBank for the user community. The pipeline ensures the collection of contextual information about the virus source, assesses sequence quality and annotates descriptive biological features, such as protein-coding regions and mature peptides. The process promotes standardized nomenclature and creates and publishes fully processed GenBank files within minutes of deposition. As of 21 October 2021, the software had processed and published 982 454 annotated SARS-CoV-2 sequences. This development addresses the needs of the scientific community as the sequencing of SARS-CoV-2 genomes increases and will facilitate unrestricted access to and usability of SARS-CoV-2 genomic sequence data, providing important reagents for scientific and public health activities in response to the COVID-19 pandemic. Database URL: https://submit.ncbi.nlm.nih.gov/sarscov2/genbank/.

Subject(s)

COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , COVID-19/genetics , Databases, Nucleic Acid , Genome, Viral/genetics , Humans , Pandemics , SARS-CoV-2/genetics

NCBI's Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements.

Connor, Ryan; Brister, Rodney; Buchmann, Jan P; Deboutte, Ward; Edwards, Rob; Martí-Carreras, Joan; Tisza, Mike; Zalunin, Vadim; Andrade-Martínez, Juan; Cantu, Adrian; D'Amour, Michael; Efremov, Alexandre; Fleischmann, Lydia; Forero-Junco, Laura; Garmaeva, Sanzhima; Giluso, Melissa; Glickman, Cody; Henderson, Margaret; Kellman, Benjamin; Kristensen, David; Leubsdorf, Carl; Levi, Kyle; Levi, Shane; Pakala, Suman; Peddu, Vikas; Ponsero, Alise; Ribeiro, Eldred; Roy, Farrah; Rutter, Lindsay; Saha, Surya; Shakya, Migun; Shean, Ryan; Miller, Matthew; Tully, Benjamin; Turkington, Christopher; Youens-Clark, Ken; Vanmechelen, Bert; Busby, Ben.

Genes (Basel) ; 10(9) ; 2019 09 16.

Article in English | MEDLINE | ID: mdl-31527408

ABSTRACT

A wealth of viral data sits untapped in publicly available metagenomic data sets, although it could be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, and its contigs were further filtered for a minimal length of 1 kb. This resulted in 4.2 million contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Of the 4.2 million contigs, 360,000 were labeled with domains, and an additional subset of 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. The main lessons were: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure facilitates their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
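As an illustration of the length-filtering step mentioned in the abstract (contigs filtered for a minimal length of 1 kb), the following is a minimal sketch only. It is not the hackathon's actual code: the function names `read_fasta` and `filter_contigs`, the FASTA input format, and the reading of "minimal length of 1 kb" as "at least 1000 bases" are all assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: "minimal length of 1 kb" is read as >= 1000 bases.
MIN_CONTIG_LENGTH = 1000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines (hypothetical helper)."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(
    records: Iterable[Tuple[str, str]],
    min_length: int = MIN_CONTIG_LENGTH,
) -> Iterator[Tuple[str, str]]:
    """Keep only contigs whose sequence has at least min_length bases."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


if __name__ == "__main__":
    # Tiny in-memory example; real input would be a contig FASTA file.
    example = [">contig_1", "A" * 600, "C" * 400, ">contig_2", "ACGT" * 10]
    for name, seq in filter_contigs(read_fasta(example)):
        print(name, len(seq))
```

Because both functions are generators, a file with millions of contigs could be streamed record by record rather than loaded into memory at once.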

Subject(s)

Cloud Computing/standards , Genome, Viral , Metagenome , Metagenomics/methods , Big Data , Genome, Human , Humans , Metagenomics/standards , Software
