Results 1 - 20 of 84
1.
J Bacteriol ; 206(1): e0017323, 2024 01 25.
Article in English | MEDLINE | ID: mdl-38084967

ABSTRACT

The LPXTG protein-sorting signal, found in surface proteins of various Gram-positive pathogens, was the founding member of a growing panel of prokaryotic small C-terminal sorting domains. Sortase A cleaves LPXTG, exosortases (XrtA and XrtB) cleave the PEP-CTERM sorting signal, archaeosortase A cleaves PGF-CTERM, and rhombosortase cleaves GlyGly-CTERM domains. Four sorting signal domains without previously known processing proteases are the MYXO-CTERM, JDVT-CTERM, Synerg-CTERM, and CGP-CTERM domains. These exhibit the standard tripartite architecture of a short signature motif, a hydrophobic transmembrane segment, and an Arg-rich cluster. Each has an invariant cysteine in its signature motif. Computational evidence strongly suggests that each of these four Cys-containing sorting signals is processed, at least in part, by a cognate family of glutamic-type intramembrane endopeptidases related to the eukaryotic type II CAAX-processing protease Rce1. For the MYXO-CTERM sorting signals of different lineages, their sorting enzymes, called myxosortases, include MrtX (MXAN_2755 in Myxococcus xanthus), MrtC, and MrtP, all with radically different N-terminal domains but with a conserved core. Related predicted sorting enzymes were also identified for JDVT-CTERM (MrtJ), Synerg-CTERM (MrtS), and CGP-CTERM (MrtA). This work establishes a major new family of protein-sorting housekeeping endopeptidases contributing to the surface attachment of proteins in prokaryotes.
IMPORTANCE Homologs of the eukaryotic type II CAAX-box protease Rce1, a membrane-embedded endopeptidase found in yeast and human ER and involved in sorting proteins to their proper cellular locations, are abundant in prokaryotes but not well understood there. This bioinformatics paper identifies several subgroups of the family as cognate endopeptidases for four protein-sorting signals processed by previously unknown machinery. Sorting signals with newly identified processing enzymes include three novel ones, but also MYXO-CTERM, which had been the focus of previous experimental work in the model fruiting and gliding bacterium Myxococcus xanthus. The new findings will substantially improve our understanding of Cys-containing C-terminal protein-sorting signals and of protein trafficking generally in bacteria and archaea.


Subjects
Cysteine , Peptide Hydrolases , Humans , Cysteine/metabolism , Protein Transport , Peptide Hydrolases/metabolism , Membrane Proteins/metabolism , Bacteria/metabolism , Saccharomyces cerevisiae
2.
Nucleic Acids Res ; 52(D1): D762-D769, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962425

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.


Subjects
Archaea , Bacteria , Nucleic Acid Databases , Metagenome , Archaea/genetics , Bacteria/genetics , Nucleic Acid Databases/standards , Nucleic Acid Databases/trends , Archaeal Genome/genetics , Bacterial Genome/genetics , Internet , Molecular Sequence Annotation , Proteins/genetics
3.
J Bacteriol ; 205(1): e0025922, 2023 01 26.
Article in English | MEDLINE | ID: mdl-36598231

ABSTRACT

The bioinformatics of a nine-gene locus, designated selenocysteine-assisted organometallic (SAO), was investigated after identifying six new selenoprotein families and constructing hidden Markov models (HMMs) that find and annotate members of those families. Four are selenoproteins in most SAO loci, including Clostridium difficile. They include two ABC transporter subunits, namely, permease SaoP, with selenocysteine (U) at the channel-gating position, and substrate-binding subunit SaoB. Cytosolic selenoproteins include SaoL, homologous to MerB organomercurial lyases from mercury resistance loci, and SaoT, related to thioredoxins. SaoL, SaoB, and surface protein SaoC (an occasional selenoprotein) share an unusual CU dipeptide motif, which is rare in selenoproteins but found in selenoprotein variants of mercury resistance transporter subunit MerT. A nonselenoprotein, SaoE, shares homology with Cu/Zn efflux and arsenical efflux pumps. The organization of the SAO system suggests substrate interaction with surface-exposed selenoproteins, followed by import, metabolism that may cleave a carbon-to-heavy metal bond, and finally metal efflux. A novel type of mercury resistance is possible, but SAO instead may support fermentative metabolism, with selenocysteine-mediated formation of organometallic intermediates, followed by import, degradation, and metal efflux. Phylogenetic profiling shows SAO loci consistently co-occur with Stickland fermentation markers but even more consistently with 8Fe-9S cofactor-type double-cubane proteins. Hypothesizing that the SAO system forms organometallic intermediates, we investigated the known methylmercury formation protein families HgcA and HgcB. Both families contained overlooked selenoproteins. Most HgcAs have a CU motif N terminal to their previously accepted start sites. Seeking additional rare and overlooked selenoproteins may help reveal more cryptic aspects of microbial biochemistry.
IMPORTANCE This work adds 8 novel prokaryotic selenoproteins to the 80 or so families previously known. It describes the SAO (selenocysteine-assisted organometallic) locus, with the most selenoproteins of any known system. The rare CU motif recurs throughout, suggesting the formation and degradation of organometallic compounds. That suggestion triggered a reexamination of HgcA and HgcB, which are methylmercury formation proteins that can adversely impact food safety. Both are selenoproteins, once corrected, with HgcA again showing a CU motif. The SAO system is plausibly a mercury resistance locus for selenium-dependent anaerobes. But instead, it may exploit heavy metals as cofactors in organometallic intermediate-forming pathways that circumvent high activation energies and facilitate the breakdown of otherwise poorly accessible nutrients. SAO could provide an edge that helps Clostridium difficile, an important pathogen, establish disease.


Subjects
Clostridioides difficile , Mercury , Methylmercury Compounds , Clostridioides difficile/genetics , Clostridioides difficile/metabolism , Selenocysteine/metabolism , Phylogeny , Selenoproteins/genetics , Selenoproteins/metabolism
4.
Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.


Subjects
Protein Databases , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software
5.
Microb Genom ; 8(6)2022 06.
Article in English | MEDLINE | ID: mdl-35675101

ABSTRACT

Antimicrobial resistance (AMR) is a significant public health threat. Low-cost whole-genome sequencing, which is often used in surveillance programmes, provides an opportunity to assess AMR gene content in these genomes using in silico approaches. A variety of bioinformatic tools have been developed to identify these genomic elements. Most of those tools rely on reference databases of nucleotide or protein sequences and collections of models and rules for analysis. While the tools are critical for the identification of AMR genes, the databases themselves also provide significant utility for researchers, for applications ranging from sequence analysis to information about AMR phenotypes. Additionally, these databases can be evaluated by domain experts and others to ensure their accuracy. Here we describe how we curate the genes, point mutations and BLAST rules, and hidden Markov models used in NCBI's AMRFinderPlus, along with the quality-control steps we take to ensure database quality. We also describe the web interfaces that display the full structure of the database and their newly developed cross-browser relationships. Then, using the Reference Gene Catalog as an example, we detail how the databases, rules and models are made publicly available, as well as how to access the software. In addition, as part of the Pathogen Detection system, we have analysed over 1 million publicly available genomes using AMRFinderPlus and its databases. We discuss how the computed analyses generated by those tools can be accessed through a web interface. Finally, we conclude with NCBI's plans to make these databases accessible over the long term.


Subjects
Computational Biology , Software , Amino Acid Sequence , Whole Genome Sequencing
6.
Antimicrob Agents Chemother ; 66(4): e0033322, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35380458

ABSTRACT

Assigning names to β-lactamase variants has been inconsistent and has led to confusion in the published literature. The common availability of whole genome sequencing has resulted in an exponential growth in the number of new β-lactamase genes. In November 2021 an international group of β-lactamase experts met virtually to develop a consensus for the way naturally-occurring β-lactamase genes should be named. This document formalizes the process for naming novel β-lactamases, followed by their subsequent publication.


Subjects
beta-Lactamase Inhibitors , beta-Lactamases , Consensus , beta-Lactamases/genetics
7.
Bioinform Adv ; 2(1): vbab043, 2022.
Article in English | MEDLINE | ID: mdl-36699409

ABSTRACT

Motivation: The release of AlphaFold 2.0 has revolutionized our ability to determine protein structures from sequences. This tool also inadvertently opens up many unanticipated opportunities. In this article, we investigate the AntiFam resource, which contains 250 protein sequence families that we believe to be spurious protein translations. We would not expect proteins belonging to these families to fold into well-ordered globular structures. To test this hypothesis, we have attempted to computationally determine the structure of a representative sequence from all AntiFam 6.0 families. Results: Although the large majority of families showed no evidence of globular structure, we have identified one example for which a globular structure is predicted. Proteins in this AntiFam entry indeed seem likely to be bona fide proteins, based on additional considerations, and thus AlphaFold provides a useful quality control for the AntiFam database. Conversely, known spurious proteins offer a useful set of quality controls for AlphaFold. We have identified a trend that the mean structure prediction confidence score pLDDT is higher for shorter sequences. Of the 131 AntiFam representative sequences <100 amino acids in length, AlphaFold predicts a mean pLDDT of 80 or greater for six of them. Thus, particular care should be taken when applying AlphaFold to short protein sequences. Availability and implementation: The AlphaFold predictions for representative sequences can be found at the following URL: https://drive.google.com/drive/folders/1u9OocRIAabGQn56GljoG1JTDAxjkY1ro. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
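
The screening criterion described in this abstract (sequences under 100 residues whose mean pLDDT is 80 or greater) can be sketched as a small filter. This is a minimal illustration only, not the authors' code: the function name, the input layout (a dictionary of per-residue pLDDT lists), and the example values are all assumptions made for demonstration. For reference, six of 131 short sequences is about 4.6%.

```python
from statistics import mean
from typing import Dict, List


def confident_short_sequences(plddt_by_seq: Dict[str, List[float]],
                              max_length: int = 100,
                              threshold: float = 80.0) -> List[str]:
    """Return IDs of sequences shorter than max_length with mean pLDDT >= threshold."""
    flagged = []
    for seq_id, per_residue in plddt_by_seq.items():
        # Skip empty entries; apply the length cutoff before the confidence cutoff.
        if per_residue and len(per_residue) < max_length and mean(per_residue) >= threshold:
            flagged.append(seq_id)
    return flagged


# Hypothetical per-residue pLDDT values for three made-up sequences.
example = {
    "short_confident": [85.0] * 40,
    "short_uncertain": [45.0] * 60,
    "long_confident": [90.0] * 150,
}
flagged = confident_short_sequences(example)
print(flagged)
```

Only sequences that pass both the length and the confidence cutoff are reported, which mirrors the abstract's point that high confidence on a short sequence deserves a second look.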

8.
Sci Rep ; 11(1): 12728, 2021 06 16.
Article in English | MEDLINE | ID: mdl-34135355

ABSTRACT

Antimicrobial resistance (AMR) is a significant public health threat. With the rise of affordable whole genome sequencing, in silico approaches to assessing AMR gene content can be used to detect known resistance mechanisms and potentially identify novel mechanisms. To enable accurate assessment of AMR gene content, as part of a multi-agency collaboration, NCBI developed a comprehensive AMR gene database, the Bacterial Antimicrobial Resistance Reference Gene Database, and the AMR gene detection tool AMRFinder. Here, we describe the expansion of the Reference Gene Database, now called the Reference Gene Catalog, to include putative acid, biocide, metal, and stress resistance genes, in addition to virulence genes and species-specific point mutations. Genes and point mutations are classified by broad functions, as well as more detailed functions. As we have expanded both the functional repertoire of identified genes and functionality, NCBI released a new version of AMRFinder, known as AMRFinderPlus. This new tool allows users the option to utilize only the core set of AMR elements, or include stress response and virulence genes, too. AMRFinderPlus can detect acquired genes and point mutations in both protein and nucleotide sequence. In addition, the evidence used to identify the gene has been expanded to include whether nucleotide or protein sequence was used, its location in the contig, and presence of an internal stop codon. These database improvements and functional expansions will enable increased precision in identifying AMR genes, linking AMR genotypes and phenotypes, and determining possible relationships between AMR, virulence, and stress response.


Subjects
Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Genetic Databases , Bacterial Drug Resistance/genetics , Bacterial Genes , Bacteria/genetics , Bacteria/pathogenicity , Multiple Bacterial Drug Resistance/genetics , Bacterial Genome , Mercury/pharmacology , Plasmids , Salmonella/drug effects , Salmonella/genetics , Virulence/genetics
9.
Nucleic Acids Res ; 49(D1): D1020-D1028, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270901

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subjects
Computational Biology/methods , Genetic Databases , Archaeal Genome/genetics , Bacterial Genome/genetics , Molecular Sequence Annotation/methods , Proteins/genetics , Data Curation/methods , Data Mining/methods , Genomics/methods , Internet , Proteins/classification , User-Computer Interface
10.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subjects
Protein Databases , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
12.
Nat Rev Microbiol ; 18(2): 67-83, 2020 02.
Article in English | MEDLINE | ID: mdl-31857715

ABSTRACT

The number and diversity of known CRISPR-Cas systems have substantially increased in recent years. Here, we provide an updated evolutionary classification of CRISPR-Cas systems and cas genes, with an emphasis on the major developments that have occurred since the publication of the latest classification, in 2015. The new classification includes 2 classes, 6 types and 33 subtypes, compared with 5 types and 16 subtypes in 2015. A key development is the ongoing discovery of multiple, novel class 2 CRISPR-Cas systems, which now include 3 types and 17 subtypes. A second major novelty is the discovery of numerous derived CRISPR-Cas variants, often associated with mobile genetic elements that lack the nucleases required for interference. Some of these variants are involved in RNA-guided transposition, whereas others are predicted to perform functions distinct from adaptive immunity that remain to be characterized experimentally. The third highlight is the discovery of numerous families of ancillary CRISPR-linked genes, often implicated in signal transduction. Together, these findings substantially clarify the functional diversity and evolutionary history of CRISPR-Cas.


Subjects
Archaea/genetics , Bacteria/genetics , CRISPR-Cas Systems/genetics , Molecular Evolution , Archaeal Gene Expression Regulation/physiology , Bacterial Gene Expression Regulation/physiology , CRISPR-Cas Systems/physiology
13.
Article in English | MEDLINE | ID: mdl-31712217

ABSTRACT

Unlike for classes A and B, a standardized amino acid numbering scheme has not been proposed for the class C (AmpC) β-lactamases, which complicates communication in the field. Here, we propose a scheme developed through a collaborative approach that considers both sequence and structure, preserves traditional numbering of catalytically important residues (Ser64, Lys67, Tyr150, and Lys315), is adaptable to new variants or enzymes yet to be discovered and includes a variation for genetic and epidemiological applications.


Subjects
Bacterial Proteins/classification , Gram-Negative Bacteria/genetics , Gram-Positive Bacteria/genetics , Mutation , Terminology as Topic , beta-Lactam Resistance/genetics , beta-Lactamases/classification , Amino Acid Sequence , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/antagonists & inhibitors , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Gene Expression , Gram-Negative Bacteria/drug effects , Gram-Negative Bacteria/enzymology , Gram-Positive Bacteria/drug effects , Gram-Positive Bacteria/enzymology , International Cooperation , Protein Secondary Structure , Sequence Alignment , Amino Acid Sequence Homology , beta-Lactamase Inhibitors/chemistry , beta-Lactamase Inhibitors/pharmacology , beta-Lactamases/genetics , beta-Lactamases/metabolism , beta-Lactams/chemistry , beta-Lactams/pharmacology
14.
Article in English | MEDLINE | ID: mdl-31427293

ABSTRACT

Antimicrobial resistance (AMR) is a major public health problem that requires publicly available tools for rapid analysis. To identify AMR genes in whole-genome sequences, the National Center for Biotechnology Information (NCBI) has produced AMRFinder, a tool that identifies AMR genes using a high-quality curated AMR gene reference database. The Bacterial Antimicrobial Resistance Reference Gene Database consists of up-to-date gene nomenclature, a set of hidden Markov models (HMMs), and a curated protein family hierarchy. Currently, it contains 4,579 antimicrobial resistance proteins and more than 560 HMMs. Here, we describe AMRFinder and its associated database. To assess the predictive ability of AMRFinder, we measured the consistency between predicted AMR genotypes from AMRFinder and resistance phenotypes of 6,242 isolates from the National Antimicrobial Resistance Monitoring System (NARMS). This included 5,425 Salmonella enterica, 770 Campylobacter spp., and 47 Escherichia coli isolates phenotypically tested against various antimicrobial agents. Of 87,679 susceptibility tests performed, 98.4% were consistent with predictions. To assess the accuracy of AMRFinder, we compared its gene symbol output with that of a 2017 version of ResFinder, another publicly available resistance gene detection system. Most gene calls were identical, but there were 1,229 gene symbol differences (8.8%) between them, with differences due to both algorithmic differences and database composition. AMRFinder missed 16 loci that ResFinder found, while ResFinder missed 216 loci that AMRFinder identified. Based on these results, AMRFinder appears to be a highly accurate AMR gene detection system.
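
The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the two derived totals (number of consistent tests and number of compared gene calls) are back-calculated approximations and are not values reported by the authors. Variable names are illustrative.

```python
# Figures quoted in the abstract.
isolates = {
    "Salmonella enterica": 5425,
    "Campylobacter spp.": 770,
    "Escherichia coli": 47,
}
total_isolates = 6242
susceptibility_tests = 87679
consistent_fraction = 0.984   # 98.4% of tests consistent with predictions
symbol_differences = 1229
difference_fraction = 0.088   # 8.8% of gene symbol calls differed

# The per-organism counts should add up to the reported total.
isolates_sum = sum(isolates.values())

# Back-calculated approximations (assumptions, not reported in the abstract).
approx_consistent_tests = round(susceptibility_tests * consistent_fraction)
approx_compared_calls = round(symbol_differences / difference_fraction)

print(isolates_sum == total_isolates)
print(approx_consistent_tests, approx_compared_calls)
```

Because the percentages are rounded to one decimal place in the abstract, the back-calculated totals are only accurate to within that rounding.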

15.
J Antimicrob Chemother ; 73(10): 2625-2630, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30053115

ABSTRACT

The initial report of the mcr-1 (mobile colistin resistance) gene has led to many reports of mcr-1 variants and other mcr genes from different bacterial species originating from human, animal and environmental samples in different geographical locations. Resistance gene nomenclature is complex and unfortunately problems such as different names being used for the same gene/protein or the same name being used for different genes/proteins are not uncommon. Registries exist for some families, such as bla (β-lactamase) genes, but there is as yet no agreed nomenclature scheme for mcr genes. The National Center for Biotechnology Information (NCBI) recently took over assigning bla allele numbers from the longstanding Lahey β-lactamase website and has agreed to do the same for mcr genes. Here, we propose a nomenclature scheme that we hope will be acceptable to researchers in this area and that will reduce future confusion.


Subjects
Alleles , Anti-Bacterial Agents/pharmacology , Bacteria/genetics , Colistin/pharmacology , Bacterial Drug Resistance/genetics , MDR Genes , Bacteria/drug effects , Escherichia coli/drug effects , Escherichia coli Proteins/genetics , Microbial Sensitivity Tests , Terminology as Topic , Whole Genome Sequencing , beta-Lactamases/genetics
16.
Environ Microbiol ; 20(5): 1677-1692, 2018 05.
Article in English | MEDLINE | ID: mdl-29473278

ABSTRACT

Bacterial floc formation plays a central role in the activated sludge (AS) process, which has been widely utilized for sewage and wastewater treatment. The formation of AS flocs has long been known to require exopolysaccharide biosynthesis. This study demonstrates an additional requirement for a PEP-CTERM protein in Zoogloea resiniphila, a dominant AS bacterium harboring a large exopolysaccharide biosynthesis gene cluster. Two members of a widespread family of high copy number-per-genome PEP-CTERM genes are transcriptionally regulated by the RpoN sigma factor and the accessory PrsK-PrsR two-component system, and at least one of these, pepA, must be expressed for Zoogloea to build the floc structures that allow gravitational sludge settling and recycling. Without PrsK or PrsR, Zoogloea cells were planktonic rather than flocculated, and secreted exopolysaccharides were released into the growth broth in soluble form. Overexpression of PepA could circumvent the requirement of rpoN, prsK and prsR for the floc-forming phenotype by fixing the exopolysaccharides to bacterial cells. However, overexpression of PepA, which underwent post-translational modifications, could not rescue the long-rod morphology of the rpoN mutant. Consistently, PEP-CTERM genes and exopolysaccharide biosynthesis gene clusters are present in the genomes of the floc-forming Nitrospira comammox and Mitsuaria strains as well as many other AS bacteria.


Subjects
Sewage/microbiology , Wastewater/microbiology , Zoogloea/physiology , Bacterial Proteins/metabolism , Flocculation , Sigma Factor/metabolism , Fluid Waste Disposal , Wastewater/chemistry
17.
Nucleic Acids Res ; 46(D1): D851-D860, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29112715

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule, BlastRules. Continual curation of supporting evidence, and propagation of improved names onto RefSeq proteins ensures that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subjects
Data Curation , Nucleic Acid Databases , Genome , Molecular Sequence Annotation , Prokaryotic Cells , Archaea/genetics , Bacteria/genetics , Protein Databases , Eukaryota/genetics , Forecasting , Humans , Sequence Homology , Software , Viruses/genetics
18.
PLoS One ; 12(2): e0171758, 2017.
Article in English | MEDLINE | ID: mdl-28182651

ABSTRACT

In functionally diverse protein families, conservation in short signature regions may outperform full-length sequence comparisons for identifying proteins that belong to a subgroup within which one specific aspect of their function is conserved. The SIMBAL workflow (Sites Inferred by Metabolic Background Assertion Labeling) is a data-mining procedure for finding such signature regions. It begins by using clues from genomic context, such as co-occurrence or conserved gene neighborhoods, to build a useful training set from a large number of uncharacterized but mutually homologous proteins. When training set construction is successful, the YES partition is enriched in proteins that share function with the user's query sequence, while the NO partition is depleted. A selected query sequence is then mined for short signature regions whose closest matches overwhelmingly favor proteins from the YES partition. High-scoring signature regions typically contain key residues critical to functional specificity, so proteins with the highest sequence similarity across these regions tend to share the same function. The SIMBAL algorithm was described previously, but significant manual effort, expertise, and a supporting software infrastructure were required to prepare the requisite training sets. Here, we describe a new, distributable software suite that speeds up and simplifies the process for using SIMBAL, most notably by providing tools that automate training set construction. These tools have broad utility for comparative genomics, allowing for flexible collection of proteins or protein domains based on genomic context as well as homology, a capability that can greatly assist in protein family construction. Armed with this new software suite, SIMBAL can serve as a fast and powerful in silico alternative to direct experimentation for characterizing proteins and their functional interactions.
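
The idea of mining a query for short regions whose closest matches favor the YES partition can be illustrated with a toy sketch. This is a deliberate simplification and not the SIMBAL software: it assumes all sequences are already aligned to the query at equal length, and it replaces the similarity search with a plain identity count per window. The function names, the `width` and `top_k` parameters, and the example motif "CHW" are hypothetical.

```python
from typing import List, Tuple


def window_identity(a: str, b: str) -> int:
    """Count identical positions between two equal-length windows."""
    return sum(1 for x, y in zip(a, b) if x == y)


def score_windows(query: str,
                  training: List[Tuple[str, bool]],
                  width: int,
                  top_k: int) -> List[Tuple[int, float]]:
    """Return (start, YES fraction among the top_k closest matches) per window.

    `training` holds (aligned_sequence, is_yes) pairs; every sequence is
    assumed to be pre-aligned to the query and of the same length.
    """
    scores = []
    for start in range(len(query) - width + 1):
        q = query[start:start + width]
        # Rank training sequences by identity to the query within this window.
        ranked = sorted(
            training,
            key=lambda item: window_identity(q, item[0][start:start + width]),
            reverse=True,
        )
        top = ranked[:top_k]
        yes_fraction = sum(1 for _, is_yes in top if is_yes) / len(top)
        scores.append((start, yes_fraction))
    return scores


# Toy data: the two YES sequences share the made-up motif "CHW" at positions 3-5.
query = "AAACHWAAA"
training = [
    ("AAGCHWAAG", True),
    ("ATACHWATA", True),
    ("AAAGGGAAA", False),
    ("AAASSSAAA", False),
]
scores = score_windows(query, training, width=3, top_k=2)
print(scores)
```

In this toy case the window starting at position 3, which covers the motif, has only YES sequences among its closest matches, while the flanking windows at positions 0 and 6 have none. Ties in identity are broken by input order here, so a real implementation would need a more careful ranking and a statistical score.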


Subjects
Protein Databases , Multigene Family , Proteins/chemistry , Proteins/physiology , Software , Algorithms , Amino Acid Sequence , Animals , Gene Ontology , Humans , Protein Domains , Protein Interaction Maps , Proteins/genetics , Proteins/metabolism , Amino Acid Sequence Homology
19.
Sci Rep ; 7: 41074, 2017 01 25.
Article in English | MEDLINE | ID: mdl-28120876

ABSTRACT

During human infection, Mycobacterium tuberculosis (Mtb) survives the normally bactericidal phagosome of macrophages. Mtb and related species may be able to combat this harsh acidic environment which contains reactive oxygen species due to the mycobacterial genomes encoding a large number of dehydrogenases. Typically, dehydrogenase cofactor binding sites are open to solvent, which allows NAD/NADH exchange to support multiple turnover. Interestingly, mycobacterial short chain dehydrogenases/reductases (SDRs) within family TIGR03971 contain an insertion at the NAD binding site. Here we present crystal structures of 9 mycobacterial SDRs in which the insertion buries the NAD cofactor except for a small portion of the nicotinamide ring. Line broadening and STD-NMR experiments did not show NAD or NADH exchange on the NMR timescale. STD-NMR demonstrated binding of the potential substrate carveol, the potential product carvone, the inhibitor tricyclazole, and an external redox partner 2,6-dichloroindophenol (DCIP). Therefore, these SDRs appear to contain a non-exchangeable NAD cofactor and may rely on an external redox partner, rather than cofactor exchange, for multiple turnover. Incidentally, these genes always appear in conjunction with the mftA gene, which encodes the short peptide MftA, and with other genes proposed to convert MftA into the external redox partner mycofactocin.


Subjects
Coenzymes/chemistry , Coenzymes/metabolism , Mycobacterium tuberculosis/enzymology , NAD/chemistry , NAD/metabolism , Oxidoreductases/chemistry , Oxidoreductases/metabolism , 2,6-Dichloroindophenol/metabolism , Binding Sites , X-Ray Crystallography , Cyclohexane Monoterpenes , Humans , Magnetic Resonance Spectroscopy , Molecular Models , Monoterpenes/metabolism , Insertional Mutagenesis , Oxidoreductases/genetics , Protein Binding , Protein Conformation , Thiazoles/metabolism