Search | Secretaria de Estado da Saúde

CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database.

Alcock, Brian P; Huynh, William; Chalil, Romeo; Smith, Keaton W; Raphenya, Amogelang R; Wlodarski, Mateusz A; Edalatmand, Arman; Petkau, Aaron; Syed, Sohaib A; Tsang, Kara K; Baker, Sheridan J C; Dave, Mugdha; McCarthy, Madeline C; Mukiri, Karyn M; Nasir, Jalees A; Golbon, Bahar; Imtiaz, Hamna; Jiang, Xingjian; Kaur, Komal; Kwong, Megan; Liang, Zi Cheng; Niu, Keyu C; Shan, Prabakar; Yang, Jasmine Y J; Gray, Kristen L; Hoad, Gemma R; Jia, Baofeng; Bhando, Timsy; Carfrae, Lindsey A; Farha, Maya A; French, Shawn; Gordzevich, Rodion; Rachwalski, Kenneth; Tu, Megan M; Bordeleau, Emily; Dooley, Damion; Griffiths, Emma; Zubyk, Haley L; Brown, Eric D; Maguire, Finlay; Beiko, Robert G; Hsiao, William W L; Brinkman, Fiona S L; Van Domselaar, Gary; McArthur, Andrew G.

Nucleic Acids Res ; 51(D1): D690-D699, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36263822

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated AMR gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs has been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community-submitted isolate resistome data and the introduction of standardized 15-character CARD Short Names for ARGs to support machine learning efforts.

Subjects

Data Curation , Factual Databases , Microbial Drug Resistance , Machine Learning , Anti-Bacterial Agents/pharmacology , Bacterial Genes , Likelihood Functions , Software , Molecular Sequence Annotation

PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training.

Griffiths, Emma J; Mendes, Inês; Maguire, Finlay; Guthrie, Jennifer L; Wee, Bryan A; Schmedes, Sarah; Holt, Kathryn; Yadav, Chanchal; Cameron, Rhiannon; Barclay, Charlotte; Dooley, Damion; MacCannell, Duncan; Chindelevitch, Leonid; Karsch-Mizrachi, Ilene; Waheed, Zahra; Katz, Lee; Petit III, Robert; Dave, Mugdha; Oluniyi, Paul; Nasar, Muhammad Ibtisam; Raphenya, Amogelang; Hsiao, William W L; Timme, Ruth E.

Microb Genom ; 10(6), 2024 Jun.

Article in English | MEDLINE | ID: mdl-38860884

ABSTRACT

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing of useful, but sub-optimal datasets requires careful annotation and documentation of known issues to enable appropriate interpretation, avoid being mistaken for better quality information, and for these data (and their derivatives) to be easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability. 
The contextual data tags were developed through consultations with the community including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies - community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA's GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.

Subjects

Computational Biology , Public Health , Quality Control , Humans , Computational Biology/methods , Information Dissemination/methods , Reproducibility of Results , Molecular Sequence Annotation/methods , Genomics/methods , Software
