Search | BVS Regional Portal

1.

Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows.

Connor, Ryan; Shakya, Migun; Yarmosh, David A; Maier, Wolfgang; Martin, Ross; Bradford, Rebecca; Brister, J Rodney; Chain, Patrick S G; Copeland, Courtney A; di Iulio, Julia; Hu, Bin; Ebert, Philip; Gunti, Jonathan; Jin, Yumi; Katz, Kenneth S; Kochergin, Andrey; LaRosa, Tré; Li, Jiani; Li, Po-E; Lo, Chien-Chi; Rashid, Sujatha; Maiorova, Evguenia S; Xiao, Chunlin; Zalunin, Vadim; Purcell, Lisa; Pruitt, Kim D.

Viruses ; 16(3), 2024 Mar 11.

Article in English | MEDLINE | ID: mdl-38543795

ABSTRACT

Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes with differences in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence data as input. We compare and characterize such discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Pandemics , Workflow , Computational Biology

2.

The ATCC genome portal: 3,938 authenticated microbial reference genomes.

Nguyen, Scott V; Puthuveetil, Nikhita P; Petrone, Joseph R; Kirkland, Jade L; Gaffney, Kaitlyn; Tabron, Corina L; Wax, Noah; Duncan, James; King, Stephen; Marlow, Robert; Reese, Amy L; Yarmosh, David A; McConnell, Hannah H; Fernandes, Ana S; Bagnoli, John; Benton, Briana; Jacobs, Jonathan L.

Microbiol Resour Announc ; 13(2): e0104523, 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38289057

ABSTRACT

The ATCC Genome Portal (AGP, https://genomes.atcc.org/) is a database of authenticated genomes for bacteria, fungi, protists, and viruses held in ATCC's biorepository. It now includes 3,938 assemblies (253% increase) produced under ISO 9000 by ATCC. Here, we present new features and content added to the AGP for the research community.

3.

Towards increased accuracy and reproducibility in SARS-CoV-2 next generation sequence analysis for public health surveillance.

Connor, Ryan; Yarmosh, David A; Maier, Wolfgang; Shakya, Migun; Martin, Ross; Bradford, Rebecca; Brister, J Rodney; Chain, Patrick S G; Copeland, Courtney A; di Iulio, Julia; Hu, Bin; Ebert, Philip; Gunti, Jonathan; Jin, Yumi; Katz, Kenneth S; Kochergin, Andrey; LaRosa, Tré; Li, Jiani; Li, Po-E; Lo, Chien-Chi; Rashid, Sujatha; Maiorova, Evguenia S; Xiao, Chunlin; Zalunin, Vadim; Pruitt, Kim D.

bioRxiv ; 2022 Nov 03.

Article in English | MEDLINE | ID: mdl-36380755

ABSTRACT

During the COVID-19 pandemic, SARS-CoV-2 surveillance efforts integrated genome sequencing of clinical samples to identify emergent viral variants and to support rapid experimental examination of genome-informed vaccine and therapeutic designs. Given the broad range of methods applied to generate new viral genomes, it is critical that consensus and variant calling tools yield consistent results across disparate pipelines. Here we examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative, a public-private partnership established to address the COVID-19 outbreak. Our results indicate that bioinformatic workflows can yield consensus genomes with different single nucleotide polymorphisms, insertions, and/or deletions even when using the same raw sequence input datasets. We introduce the use of a specific suite of parameters and protocols that greatly improves the agreement among pipelines developed by diverse organizations. Such consistency among bioinformatic pipelines is fundamental to SARS-CoV-2 and future pathogen surveillance efforts. The application of analysis standards is necessary to more accurately document phylogenomic trends and support data-driven public health responses.

4.

Comparative Analysis and Data Provenance for 1,113 Bacterial Genome Assemblies.

Yarmosh, David A; Lopera, Juan G; Puthuveetil, Nikhita P; Combs, Patrick Ford; Reese, Amy L; Tabron, Corina; Pierola, Amanda E; Duncan, James; Greenfield, Samuel R; Marlow, Robert; King, Stephen; Riojas, Marco A; Bagnoli, John; Benton, Briana; Jacobs, Jonathan L.

mSphere ; 7(3): e0007722, 2022 Jun 29.

Article in English | MEDLINE | ID: mdl-35491842

ABSTRACT

The availability of public genomics data has become essential for modern life sciences research, yet the quality, traceability, and curation of these data have significant impacts on a broad range of microbial genomics research. While microbial genome databases such as NCBI's RefSeq database leverage the scalability of crowd sourcing for growth, genomics data provenance and authenticity of the source materials used to produce data are not strict requirements. Here, we describe the de novo assembly of 1,113 bacterial genome references produced from authenticated materials sourced from the American Type Culture Collection (ATCC), each with full genomics data provenance relating to bioinformatics methods, quality control, and passage history. Comparative genomics analysis of ATCC standard reference genomes (ASRGs) revealed significant issues with regard to NCBI's RefSeq bacterial genome assemblies related to completeness, mutations, structure, strain metadata, and gaps in traceability to the original biological source materials. Nearly half of RefSeq assemblies lack details on sample source information, sequencing technology, or bioinformatics methods. Deep curation of these records is not within the scope of NCBI's core mission in supporting open science, which aims to collect sequence records that are submitted by the public. Nonetheless, we propose that gaps in metadata accuracy and data provenance represent an "elephant in the room" for microbial genomics research. Effectively addressing these issues will require raising the level of accountability for data depositors and acknowledging the need for higher expectations of quality among the researchers whose research depends on accurate and attributable reference genome data.

IMPORTANCE The traceability of microbial genomics data to authenticated physical biological materials is not a requirement for depositing these data into public genome databases. This creates significant risks for the reliability and data provenance of these important genomics research resources, the impact of which is not well understood. We sought to investigate this by carrying out a comparative genomics study of 1,113 ATCC standard reference genomes (ASRGs) produced by ATCC from authenticated and traceable materials using the latest sequencing technologies. We found widespread discrepancies in genome assembly quality, genetic variability, and the quality and completeness of the associated metadata among hundreds of reference genomes for ATCC strains found in NCBI's RefSeq database. We present a comparative analysis of de novo-assembled ASRGs, their respective metadata, and variant analysis using RefSeq genomes as a reference. Although assembly quality in RefSeq has generally improved over time, we found that significant quality issues remain, especially as related to genomic data and metadata provenance. Our work highlights the importance of data authentication and provenance for the microbial genomics community, and underscores the risks of ignoring this issue in the future.

Subjects

Genetic Databases , Genomics , Bacterial Genome , Microbial Genome , Reproducibility of Results

5.

The ATCC Genome Portal: Microbial Genome Reference Standards with Data Provenance.

Benton, Briana; King, Stephen; Greenfield, Samuel R; Puthuveetil, Nikhita; Reese, Amy L; Duncan, James; Marlow, Robert; Tabron, Corina; Pierola, Amanda E; Yarmosh, David A; Combs, Patrick Ford; Riojas, Marco A; Bagnoli, John; Jacobs, Jonathan L.

Microbiol Resour Announc ; 10(47): e0081821, 2021 Nov 24.

Article in English | MEDLINE | ID: mdl-34817215

ABSTRACT

Lack of data provenance negatively impacts scientific reproducibility and the reliability of genomic data. The ATCC Genome Portal (https://genomes.atcc.org) addresses this by providing data provenance information for microbial whole-genome assemblies originating from authenticated biological materials. To date, we have sequenced 1,579 complete genomes, including 466 type strains and 1,156 novel genomes.
