1.
Nucleic Acids Res ; 52(D1): D762-D769, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962425

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.


Subjects
Archaea; Bacteria; Databases, Nucleic Acid; Metagenome; Archaea/genetics; Bacteria/genetics; Databases, Nucleic Acid/standards; Databases, Nucleic Acid/trends; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Internet; Molecular Sequence Annotation; Proteins/genetics
2.
Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.


Subjects
Databases, Protein; Humans; Amino Acid Sequence; Artificial Intelligence; Internet; Proteins/chemistry; Software
3.
J Bacteriol ; 206(1): e0017323, 2024 01 25.
Article in English | MEDLINE | ID: mdl-38084967

ABSTRACT

The LPXTG protein-sorting signal, found in surface proteins of various Gram-positive pathogens, was the founding member of a growing panel of prokaryotic small C-terminal sorting domains. Sortase A cleaves LPXTG, exosortases (XrtA and XrtB) cleave the PEP-CTERM sorting signal, archaeosortase A cleaves PGF-CTERM, and rhombosortase cleaves GlyGly-CTERM domains. Four sorting signal domains without previously known processing proteases are the MYXO-CTERM, JDVT-CTERM, Synerg-CTERM, and CGP-CTERM domains. These exhibit the standard tripartite architecture of a short signature motif, a hydrophobic transmembrane segment, and an Arg-rich cluster. Each has an invariant cysteine in its signature motif. Computational evidence strongly suggests that each of these four Cys-containing sorting signals is processed, at least in part, by a cognate family of glutamic-type intramembrane endopeptidases related to the eukaryotic type II CAAX-processing protease Rce1. For the MYXO-CTERM sorting signals of different lineages, their sorting enzymes, called myxosortases, include MrtX (MXAN_2755 in Myxococcus xanthus), MrtC, and MrtP, all with radically different N-terminal domains but with a conserved core. Related predicted sorting enzymes were also identified for JDVT-CTERM (MrtJ), Synerg-CTERM (MrtS), and CGP-CTERM (MrtA). This work establishes a major new family of protein-sorting housekeeping endopeptidases contributing to the surface attachment of proteins in prokaryotes.
IMPORTANCE Homologs of the eukaryotic type II CAAX-box protease Rce1, a membrane-embedded endopeptidase found in yeast and human ER and involved in sorting proteins to their proper cellular locations, are abundant in prokaryotes but not well understood there. This bioinformatics paper identifies several subgroups of the family as cognate endopeptidases for four protein-sorting signals processed by previously unknown machinery. Sorting signals with newly identified processing enzymes include three novel ones, but also MYXO-CTERM, which had been the focus of previous experimental work in the model fruiting and gliding bacterium Myxococcus xanthus. The new findings will substantially improve our understanding of Cys-containing C-terminal protein-sorting signals and of protein trafficking generally in bacteria and archaea.
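
The tripartite architecture named in this abstract (a short signature motif, then a hydrophobic transmembrane segment, then an Arg-rich cluster) can be illustrated with a toy check on the C-terminal end of a protein sequence. This is only a sketch under stated assumptions: the abstract describes computational evidence for these domains but gives no rule of this kind, and the function name, the placeholder motif `CG`, the hydrophobic residue set, and every threshold below are hypothetical choices made for illustration. The sequences are synthetic.

```python
import re

# Assumption: a simple hydrophobic residue set; not taken from the abstract.
HYDROPHOBIC = set("AILMFVW")

# Assumption: a made-up placeholder motif, NOT the real MYXO-CTERM signature.
TOY_MOTIF = r"CG"


def has_tripartite_tail(seq: str, motif: str, window: int = 40,
                        tm_len: int = 12, min_hydrophobic: float = 0.75,
                        min_arg: int = 2) -> bool:
    """Toy test for motif -> hydrophobic stretch -> Arg-rich tail.

    Only the last `window` residues are inspected. All thresholds are
    illustrative assumptions.
    """
    tail = seq[-window:]
    match = re.search(motif, tail)
    if match is None:
        return False
    after_motif = tail[match.end():]
    # Slide a fixed-length window looking for a mostly hydrophobic segment.
    for start in range(0, len(after_motif) - tm_len + 1):
        segment = after_motif[start:start + tm_len]
        fraction = sum(res in HYDROPHOBIC for res in segment) / tm_len
        if fraction >= min_hydrophobic:
            # Whatever follows the segment must contain enough arginines.
            remainder = after_motif[start + tm_len:]
            if remainder.count("R") >= min_arg:
                return True
    return False


# Synthetic sequences: a polar filler, the toy motif, a hydrophobic stretch,
# and then either an arginine cluster or an acidic tail.
toy_positive = "MSTNDQ" * 5 + "DPCGS" + "LALLLGVALLLA" + "RRRR"
toy_negative = "MSTNDQ" * 5 + "DPCGS" + "LALLLGVALLLA" + "DEDE"

print(has_tripartite_tail(toy_positive, TOY_MOTIF))
print(has_tripartite_tail(toy_negative, TOY_MOTIF))
```

A rule this crude would misclassify many real proteins; it is meant only to make the three-part layout concrete.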


Subjects
Cysteine; Peptide Hydrolases; Humans; Cysteine/metabolism; Protein Transport; Peptide Hydrolases/metabolism; Membrane Proteins/metabolism; Bacteria/metabolism; Saccharomyces cerevisiae
4.
J Bacteriol ; 205(1): e0025922, 2023 01 26.
Article in English | MEDLINE | ID: mdl-36598231

ABSTRACT

The bioinformatics of a nine-gene locus, designated selenocysteine-assisted organometallic (SAO), was investigated after identifying six new selenoprotein families and constructing hidden Markov models (HMMs) that find and annotate members of those families. Four are selenoproteins in most SAO loci, including Clostridium difficile. They include two ABC transporter subunits, namely, permease SaoP, with selenocysteine (U) at the channel-gating position, and substrate-binding subunit SaoB. Cytosolic selenoproteins include SaoL, homologous to MerB organomercurial lyases from mercury resistance loci, and SaoT, related to thioredoxins. SaoL, SaoB, and surface protein SaoC (an occasional selenoprotein) share an unusual CU dipeptide motif, which is rare in selenoproteins but found in selenoprotein variants of mercury resistance transporter subunit MerT. A nonselenoprotein, SaoE, shares homology with Cu/Zn efflux and arsenical efflux pumps. The organization of the SAO system suggests substrate interaction with surface-exposed selenoproteins, followed by import, metabolism that may cleave a carbon-to-heavy metal bond, and finally metal efflux. A novel type of mercury resistance is possible, but SAO instead may support fermentative metabolism, with selenocysteine-mediated formation of organometallic intermediates, followed by import, degradation, and metal efflux. Phylogenetic profiling shows SAO loci consistently co-occur with Stickland fermentation markers but even more consistently with 8Fe-9S cofactor-type double-cubane proteins. Hypothesizing that the SAO system forms organometallic intermediates, we investigated the known methylmercury formation protein families HgcA and HgcB. Both families contained overlooked selenoproteins. Most HgcAs have a CU motif N terminal to their previously accepted start sites. Seeking additional rare and overlooked selenoproteins may help reveal more cryptic aspects of microbial biochemistry.
IMPORTANCE This work adds 8 novel prokaryotic selenoproteins to the 80 or so families previously known. It describes the SAO (selenocysteine-assisted organometallic) locus, with the most selenoproteins of any known system. The rare CU motif recurs throughout, suggesting the formation and degradation of organometallic compounds. That suggestion triggered a reexamination of HgcA and HgcB, which are methylmercury formation proteins that can adversely impact food safety. Both are selenoproteins, once corrected, with HgcA again showing a CU motif. The SAO system is plausibly a mercury resistance locus for selenium-dependent anaerobes. But instead, it may exploit heavy metals as cofactors in organometallic intermediate-forming pathways that circumvent high activation energies and facilitate the breakdown of otherwise poorly accessible nutrients. SAO could provide an edge that helps Clostridium difficile, an important pathogen, establish disease.


Subjects
Clostridioides difficile; Mercury; Methylmercury Compounds; Clostridioides difficile/genetics; Clostridioides difficile/metabolism; Selenocysteine/metabolism; Phylogeny; Selenoproteins/genetics; Selenoproteins/metabolism
5.
Nucleic Acids Res ; 49(D1): D1020-D1028, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270901

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subjects
Computational Biology/methods; Databases, Genetic; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Molecular Sequence Annotation/methods; Proteins/genetics; Data Curation/methods; Data Mining/methods; Genomics/methods; Internet; Proteins/classification; User-Computer Interface
6.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subjects
Databases, Protein; Proteins/chemistry; Amino Acid Sequence; COVID-19/metabolism; Internet; Molecular Sequence Annotation; Protein Domains; Protein Interaction Maps; SARS-CoV-2/metabolism; Sequence Alignment
7.
Antimicrob Agents Chemother ; 66(4): e0033322, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35380458

ABSTRACT

Assigning names to β-lactamase variants has been inconsistent and has led to confusion in the published literature. The common availability of whole genome sequencing has resulted in an exponential growth in the number of new β-lactamase genes. In November 2021 an international group of β-lactamase experts met virtually to develop a consensus for the way naturally-occurring β-lactamase genes should be named. This document formalizes the process for naming novel β-lactamases, followed by their subsequent publication.


Subjects
beta-Lactamase Inhibitors; beta-Lactamases; Consensus; beta-Lactamases/genetics
8.
Article in English | MEDLINE | ID: mdl-31712217

ABSTRACT

Unlike for classes A and B, a standardized amino acid numbering scheme has not been proposed for the class C (AmpC) β-lactamases, which complicates communication in the field. Here, we propose a scheme developed through a collaborative approach that considers both sequence and structure, preserves traditional numbering of catalytically important residues (Ser64, Lys67, Tyr150, and Lys315), is adaptable to new variants or enzymes yet to be discovered and includes a variation for genetic and epidemiological applications.


Subjects
Bacterial Proteins/classification; Gram-Negative Bacteria/genetics; Gram-Positive Bacteria/genetics; Mutation; Terminology as Topic; beta-Lactam Resistance/genetics; beta-Lactamases/classification; Amino Acid Sequence; Anti-Bacterial Agents/chemistry; Anti-Bacterial Agents/pharmacology; Bacterial Proteins/antagonists & inhibitors; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Gene Expression; Gram-Negative Bacteria/drug effects; Gram-Negative Bacteria/enzymology; Gram-Positive Bacteria/drug effects; Gram-Positive Bacteria/enzymology; International Cooperation; Protein Structure, Secondary; Sequence Alignment; Sequence Homology, Amino Acid; beta-Lactamase Inhibitors/chemistry; beta-Lactamase Inhibitors/pharmacology; beta-Lactamases/genetics; beta-Lactamases/metabolism; beta-Lactams/chemistry; beta-Lactams/pharmacology
9.
Nucleic Acids Res ; 46(D1): D851-D860, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29112715

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule: BlastRules. Continual curation of supporting evidence, and propagation of improved names onto RefSeq proteins ensures that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subjects
Data Curation; Databases, Nucleic Acid; Genome; Molecular Sequence Annotation; Prokaryotic Cells; Archaea/genetics; Bacteria/genetics; Databases, Protein; Eukaryota/genetics; Forecasting; Humans; Sequence Homology; Software; Viruses/genetics
10.
Article in English | MEDLINE | ID: mdl-31427293

ABSTRACT

Antimicrobial resistance (AMR) is a major public health problem that requires publicly available tools for rapid analysis. To identify AMR genes in whole-genome sequences, the National Center for Biotechnology Information (NCBI) has produced AMRFinder, a tool that identifies AMR genes using a high-quality curated AMR gene reference database. The Bacterial Antimicrobial Resistance Reference Gene Database consists of up-to-date gene nomenclature, a set of hidden Markov models (HMMs), and a curated protein family hierarchy. Currently, it contains 4,579 antimicrobial resistance proteins and more than 560 HMMs. Here, we describe AMRFinder and its associated database. To assess the predictive ability of AMRFinder, we measured the consistency between predicted AMR genotypes from AMRFinder and resistance phenotypes of 6,242 isolates from the National Antimicrobial Resistance Monitoring System (NARMS). This included 5,425 Salmonella enterica, 770 Campylobacter spp., and 47 Escherichia coli isolates phenotypically tested against various antimicrobial agents. Of 87,679 susceptibility tests performed, 98.4% were consistent with predictions. To assess the accuracy of AMRFinder, we compared its gene symbol output with that of a 2017 version of ResFinder, another publicly available resistance gene detection system. Most gene calls were identical, but there were 1,229 gene symbol differences (8.8%) between them, with differences due to both algorithmic differences and database composition. AMRFinder missed 16 loci that ResFinder found, while ResFinder missed 216 loci that AMRFinder identified. Based on these results, AMRFinder appears to be a highly accurate AMR gene detection system.
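
The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative, and the derived figures (number of consistent tests, number of gene calls compared) are approximations because the source percentages are rounded. They are not values reported by the authors.

```python
# Counts quoted in the abstract.
isolates_by_taxon = {
    "Salmonella enterica": 5425,
    "Campylobacter spp.": 770,
    "Escherichia coli": 47,
}
total_isolates = 6242
susceptibility_tests = 87679
consistent_fraction = 0.984   # "98.4% were consistent with predictions"
symbol_differences = 1229
difference_fraction = 0.088   # "1,229 gene symbol differences (8.8%)"

# The per-taxon counts should add up to the stated number of isolates.
taxon_sum = sum(isolates_by_taxon.values())

# Derived estimates only: the percentages in the source are rounded.
approx_consistent_tests = round(susceptibility_tests * consistent_fraction)
approx_inconsistent_tests = susceptibility_tests - approx_consistent_tests
approx_gene_calls_compared = round(symbol_differences / difference_fraction)

print("taxon counts match total:", taxon_sum == total_isolates)
print("approx. consistent tests:", approx_consistent_tests)
print("approx. inconsistent tests:", approx_inconsistent_tests)
print("approx. gene calls compared:", approx_gene_calls_compared)
```

The per-taxon counts do sum to the stated 6,242 isolates, and the 8.8% figure implies that roughly 14,000 gene calls were compared between the two tools.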
