Search | BVS Integralidade em Saúde

GenBank 2024 Update.

Sayers, Eric W; Cavanaugh, Mark; Clark, Karen; Pruitt, Kim D; Sherry, Stephen T; Yankie, Linda; Karsch-Mizrachi, Ilene.

Nucleic Acids Res ; 52(D1): D134-D137, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37889039

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 25 trillion base pairs from over 3.7 billion nucleotide sequences for 557 000 formally described species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. Recent updates include policies for including spatio-temporal metadata, clarified documentation for GenBank data processing, enhanced foreign contamination screening tools, new processes in the Submission Portal, migration of Entrez Genome and Assembly displays into NCBI Datasets, and the impending retirement of tbl2asn, replaced by table2asn.

Subjects

Nucleic Acid Databases, Genomics, Base Sequence, Internet, Humans

GenBank 2023 update.

Sayers, Eric W; Cavanaugh, Mark; Clark, Karen; Pruitt, Kim D; Sherry, Stephen T; Yankie, Linda; Karsch-Mizrachi, Ilene.

Nucleic Acids Res ; 51(D1): D141-D144, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36350640

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 19.6 trillion base pairs from over 2.9 billion nucleotide sequences for 504 000 formally described species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. Recent updates include resources for data from the SARS-CoV-2 virus, NCBI Datasets, BLAST ClusteredNR, the Submission Portal, table2asn, a Foreign Contamination Screening tool and BioSample.

Subjects

Nucleic Acid Databases, Humans, COVID-19/genetics, Genomics, SARS-CoV-2/genetics

VADR: validation and annotation of virus sequence submissions to GenBank.

Schäffer, Alejandro A; Hatcher, Eneida L; Yankie, Linda; Shonkwiler, Lara; Brister, J Rodney; Karsch-Mizrachi, Ilene; Nawrocki, Eric P.

BMC Bioinformatics ; 21(1): 211, 2020 May 24.

Article in English | MEDLINE | ID: mdl-32448124

ABSTRACT

BACKGROUND: GenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions. RESULTS: We developed a system called VADR (Viral Annotation DefineR) that validates and annotates viral sequences in GenBank submissions. The annotation system is based on the analysis of the input nucleotide sequence using models built from curated RefSeqs. Hidden Markov models are used to classify sequences by determining the RefSeq they are most similar to, and feature annotation from the RefSeq is mapped based on a nucleotide alignment of the full sequence to a covariance model. Predicted proteins encoded by the sequence are validated with nucleotide-to-protein alignments using BLAST. The system identifies 43 types of "alerts" that (unlike the previous BLAST-based system) provide deterministic and rigorous feedback to researchers who submit sequences with unexpected characteristics. VADR has been integrated into GenBank's submission processing pipeline allowing for viral submissions passing all tests to be accepted and annotated automatically, without the need for any human (GenBank indexer) intervention. Unlike the previous submission-checking system, VADR is freely available (https://github.com/nawrockie/vadr) for local installation and use. VADR has been used for Norovirus submissions since May 2018 and for Dengue virus submissions since January 2019. Since March 2020, VADR has also been used to check SARS-CoV-2 sequence submissions. Other viruses with high numbers of submissions will be added incrementally. CONCLUSION: VADR improves the speed with which non-flu virus submissions to GenBank can be checked and improves the content and quality of the GenBank annotations. The availability and portability of the software allow researchers to run the GenBank checks prior to submitting their viral sequences, and thereby gain confidence that their submissions will be accepted immediately without the need to correspond with GenBank staff. Reciprocally, the adoption of VADR frees GenBank staff to spend more time on services other than checking routine viral sequence submissions.

Subjects

Betacoronavirus, Coronavirus Infections, Nucleic Acid Databases, Molecular Sequence Annotation, Pandemics, Viral Pneumonia, Software, Betacoronavirus/genetics, COVID-19, Coronavirus Infections/genetics, DNA Viruses, Genomics, Humans, Molecular Sequence Annotation/standards, Viral Pneumonia/genetics, SARS-CoV-2, Viruses

Influenza sequence validation and annotation using VADR.

Calhoun, Vincent C; Hatcher, Eneida L; Yankie, Linda; Nawrocki, Eric P.

bioRxiv ; 2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38712272

ABSTRACT

Tens of thousands of influenza sequences are deposited into the GenBank database each year. The software tool FLAN has been used by GenBank since 2007 to validate and annotate incoming influenza sequence submissions, and has been publicly available as a webserver but not as a standalone tool. VADR is a general sequence validation and annotation software package used by GenBank for Norovirus, Dengue virus and SARS-CoV-2 virus sequence processing that is available as a standalone tool. We have created VADR influenza models based on the FLAN reference sequences and adapted VADR to accurately annotate influenza sequences. VADR and FLAN show consistent results on the vast majority of influenza sequences, and when they disagree VADR is usually correct. VADR can also accurately process influenza D sequences as well as influenza A H17, H18, H19, N10 and N11 subtype sequences, which FLAN cannot. VADR 1.6.3 and the associated influenza models are now freely available for users to download and use.

The completion of the Mammalian Gene Collection (MGC).

Temple, Gary; Gerhard, Daniela S; Rasooly, Rebekah; Feingold, Elise A; Good, Peter J; Robinson, Cristen; Mandich, Allison; Derge, Jeffrey G; Lewis, Jeanne; Shoaf, Debonny; Collins, Francis S; Jang, Wonhee; Wagner, Lukas; Shenmen, Carolyn M; Misquitta, Leonie; Schaefer, Carl F; Buetow, Kenneth H; Bonner, Tom I; Yankie, Linda; Ward, Ming; Phan, Lon; Astashyn, Alex; Brown, Garth; Farrell, Catherine; Hart, Jennifer; Landrum, Melissa; Maidak, Bonnie L; Murphy, Michael; Murphy, Terence; Rajput, Bhanu; Riddick, Lillian; Webb, David; Weber, Janet; Wu, Wendy; Pruitt, Kim D; Maglott, Donna; Siepel, Adam; Brejova, Brona; Diekhans, Mark; Harte, Rachel; Baertsch, Robert; Kent, Jim; Haussler, David; Brent, Michael; Langton, Laura; Comstock, Charles L G; Stevens, Michael; Wei, Chaochun; van Baren, Marijke J; Salehi-Ashtiani, Kourosh.

Genome Res ; 19(12): 2324-33, 2009 Dec.

Article in English | MEDLINE | ID: mdl-19767417

ABSTRACT

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

Subjects

Molecular Cloning/methods, Computational Biology/methods, Complementary DNA/genetics, Gene Library, Genes/genetics, Mammals/genetics, Animals, DNA/biosynthesis, Humans, Mice, National Institutes of Health (U.S.), Rats, Reverse Transcriptase Polymerase Chain Reaction, United States

Rapid automated validation, annotation and publication of SARS-CoV-2 sequences to GenBank.

Underwood, Beverly A; Yankie, Linda; Nawrocki, Eric P; Palanigobu, Vasuki; Gotvyanskyy, Sergiy; Calhoun, Vincent C; Kornbluh, Michael; Smith, Thomas G; Fleischmann, Lydia; Sinyakov, Denis; Bollin, Colleen J; Karsch-Mizrachi, Ilene.

Database (Oxford) ; 2022, 2022 Mar 01.

Article in English | MEDLINE | ID: mdl-35230423

ABSTRACT

Rapid response to the current coronavirus disease 2019 (COVID-19) pandemic requires fast dissemination of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequence data in order to align diagnostic tests and vaccines with the natural evolution of the virus as it spreads through the world. To facilitate this, the National Library of Medicine's National Center for Biotechnology Information developed an automated pipeline for the deposition and quick processing of SARS-CoV-2 genome assemblies into GenBank for the user community. The pipeline ensures the collection of contextual information about the virus source, assesses sequence quality and annotates descriptive biological features, such as protein-coding regions and mature peptides. The process promotes standardized nomenclature and creates and publishes fully processed GenBank files within minutes of deposition. The software has processed and published 982 454 annotated SARS-CoV-2 sequences, as of 21 October 2021. This development addresses the needs of the scientific community as the sequencing of SARS-CoV-2 genomes increases and will facilitate unrestricted access to and usability of SARS-CoV-2 genomic sequence data, providing important reagents for scientific and public health activities in response to the COVID-19 pandemic. Database URL https://submit.ncbi.nlm.nih.gov/sarscov2/genbank/.

Subjects

COVID-19, SARS-CoV-2, COVID-19/epidemiology, COVID-19/genetics, Nucleic Acid Databases, Viral Genome/genetics, Humans, Pandemics, SARS-CoV-2/genetics
