Results 1 - 20 of 43
1.
Nucleic Acids Res ; 50(D1): D553-D559, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850923

ABSTRACT

The Structural Classification of Proteins-extended (SCOPe, https://scop.berkeley.edu) knowledgebase aims to provide an accurate, detailed, and comprehensive description of the structural and evolutionary relationships amongst the majority of proteins of known structure, along with resources for analyzing the protein structures and their sequences. Structures from the PDB are divided into domains and classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.08, we have developed search and display tools for analysis of genetic variants we mapped to structures classified in SCOPe. In order to improve the utility of SCOPe to automated methods such as deep learning classifiers that rely on multiple alignment of sequences of homologous proteins, we have introduced new machine-parseable annotations that indicate aberrant structures as well as domains that are distinguished by a smaller repeat unit. We also classified structures from 74 of the largest Pfam families not previously classified in SCOPe, and we improved our algorithm to remove N- and C-terminal cloning, expression and purification sequences from SCOPe domains. SCOPe 2.08-stable classifies 106 976 PDB entries (about 60% of PDB entries).


Subjects
Computational Biology, Protein Databases, Proteins/classification, Algorithms, Chemical Databases, Gene Expression Regulation/genetics, Machine Learning, Proteins/genetics
2.
Environ Microbiol ; 24(11): 5546-5560, 2022 11.
Article in English | MEDLINE | ID: mdl-36053980

ABSTRACT

Bacillus cereus strain CPT56D-587-MTF (CPTF) was isolated from the highly contaminated Oak Ridge Reservation (ORR) subsurface. This site is contaminated with high levels of nitric acid and multiple heavy metals. Amplicon sequencing of the 16S rRNA genes (V4 region) in sediment from this area revealed an amplicon sequence variant (ASV) with 100% identity to the CPTF 16S rRNA sequence. Notably, this CPTF-matching ASV had the highest relative abundance in this community survey, with a median relative abundance of 3.77%, and comprised 20%-40% of reads in some samples. Pangenomic analysis revealed that strain CPTF has expanded genomic content compared to other B. cereus species, largely due to plasmid acquisition and expansion of transposable elements. This suggests that these features are important for rapid adaptation to native environmental stressors. We connected genotype to phenotype in the context of the unique geochemistry of the site. These analyses revealed that certain genes (e.g. nitrate reductase, heavy metal efflux pumps) that allow this strain to successfully occupy the geochemically heterogeneous microniches of its native site are characteristic of the B. cereus species, while others, such as acid tolerance genes, are associated with mobile genetic elements and are generally unique to strain CPTF.


Subjects
Bacillus cereus, Heavy Metals, 16S Ribosomal RNA/genetics, Bacillus cereus/genetics, Genomics, Phylogeny
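The abstract reports the CPTF-matching ASV by its median relative abundance across samples. As a generic illustration of that metric (not the study's pipeline), an ASV's relative abundance in a sample is its read count divided by the sample's total reads, and the median is then taken across samples. The function name and read counts below are hypothetical.

```python
from statistics import median

def relative_abundances(asv_counts, total_reads):
    """Per-sample relative abundance (%) of one ASV.

    asv_counts[i] is the number of reads assigned to the ASV in sample i;
    total_reads[i] is the total number of reads in sample i.
    """
    return [100.0 * c / t for c, t in zip(asv_counts, total_reads)]

# Hypothetical read counts for illustration only (not data from the study).
counts = [50, 300, 1200, 10]
totals = [10_000, 10_000, 4_000, 5_000]

ra = relative_abundances(counts, totals)
print(ra)          # [0.5, 3.0, 30.0, 0.2]
print(median(ra))  # 1.75
```

Taking the median rather than the mean keeps a few samples where the ASV dominates (such as the 20%-40% samples mentioned above) from inflating the summary value.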
3.
Nucleic Acids Res ; 47(D1): D475-D481, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30500919

ABSTRACT

The SCOPe (Structural Classification of Proteins-extended, https://scop.berkeley.edu) database hierarchically classifies domains from the majority of proteins of known structure according to their structural and evolutionary relationships. SCOPe also incorporates and updates the ASTRAL compendium, which provides multiple databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. Protein structures are classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.07, we have focused our manual curation efforts on larger protein structures, including the spliceosome, proteasome and RNA polymerase I, as well as many other Pfam families that had not previously been classified. Domains from these large protein complexes are distinctive in several ways: novel non-globular folds are more common, and domains from previously observed protein families often have N- or C-terminal extensions that were disordered or not present in previous structures. The current monthly release update, SCOPe 2.07-2018-10-18, classifies 90 992 PDB entries (about two thirds of PDB entries).


Subjects
Protein Databases, Protein Domains, Multiprotein Complexes/chemistry, Proteasome Endopeptidase Complex/chemistry, Spliceosomes/chemistry
4.
Mol Cell Proteomics ; 15(6): 2186-202, 2016 06.
Article in English | MEDLINE | ID: mdl-27099342

ABSTRACT

Identifying protein-protein interactions (PPIs) at an acceptable false discovery rate (FDR) is challenging. Previously we identified several hundred PPIs from affinity purification-mass spectrometry (AP-MS) data for the bacteria Escherichia coli and Desulfovibrio vulgaris. These two interactomes have lower FDRs than any of the nine interactomes proposed previously for bacteria and are more enriched in PPIs validated by other data than the nine earlier interactomes. To more thoroughly determine the accuracy of our interactomes and others, and to discover further PPIs de novo, here we present a quantitative tagless method that employs iTRAQ MS to measure the copurification of endogenous proteins through orthogonal chromatography steps. A total of 5273 fractions from a four-step fractionation of a D. vulgaris protein extract were assayed, resulting in the detection of 1242 proteins. Protein partners from our D. vulgaris and E. coli AP-MS interactomes copurify as frequently as pairs belonging to three benchmark data sets of well-characterized PPIs. In contrast, the protein pairs from the nine other bacterial interactomes copurify two- to 20-fold less often. We also identify 200 high-confidence D. vulgaris PPIs based on tagless copurification and colocalization in the genome. These PPIs are as strongly validated by other data as our AP-MS interactomes and overlap with our AP-MS interactome for D. vulgaris within 3% of expectation, once FDRs and false negative rates are taken into account. Finally, we reanalyzed data from two quantitative tagless screens of human cell extracts. We estimate that the novel PPIs reported in these studies have an FDR of at least 85% and find that less than 7% of the novel PPIs identified in each screen overlap. Our results establish that a quantitative tagless method can be used to validate and identify PPIs, but that such data must be analyzed carefully to minimize the FDR.


Subjects
Bacterial Proteins/metabolism, Desulfovibrio vulgaris/metabolism, Escherichia coli/metabolism, Proteomics/methods, Affinity Chromatography/methods, Mass Spectrometry/methods, Protein Interaction Mapping/methods, Protein Interaction Maps
5.
Mol Cell Proteomics ; 15(5): 1539-55, 2016 05.
Article in English | MEDLINE | ID: mdl-26873250

ABSTRACT

Numerous affinity purification-mass spectrometry (AP-MS) and yeast two-hybrid screens have each defined thousands of pairwise protein-protein interactions (PPIs), most of which are between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here, we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial yeast two-hybrid and AP-MS screens. We have identified 459 high-confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared with the nine published interactomes, our two networks are smaller, are much less highly connected, and have significantly lower false discovery rates. In addition, our interactomes are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays than the pairs reported in prior studies. Our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested.


Subjects
Bacterial Proteins/metabolism, Computational Biology/methods, Desulfovibrio vulgaris/metabolism, Escherichia coli/metabolism, Affinity Chromatography, Protein Databases, Mass Spectrometry, Protein Interaction Mapping, Protein Interaction Maps, Proteomics/methods, Two-Hybrid System Techniques
6.
Hum Mutat ; 38(9): 1155-1168, 2017 09.
Article in English | MEDLINE | ID: mdl-28397312

ABSTRACT

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication.


Subjects
Computational Biology/methods, DNA Sequence Analysis/methods, Genetic Databases, Genetic Predisposition to Disease, Genetic Testing, Humans, Phenotype
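The CAGI-4 abstract reports its success rates both as counts and as rounded percentages. A short Python check, using only the counts stated in the abstract, confirms the arithmetic and that the two patient subsets together make up the 106 patients in the challenge; the helper name is illustrative.

```python
def percent(correct, total):
    """Share of patients correctly classified, as a whole-number percentage."""
    return round(100 * correct / total)

# Counts reported in the abstract: (correctly classified, patients in subset).
with_variant = (36, 43)     # Hopkins laboratory reported a variant
without_variant = (39, 63)  # Hopkins laboratory did not find a variant

print(percent(*with_variant))     # 84
print(percent(*without_variant))  # 62

# The two subsets account for all 106 patients in the challenge.
assert with_variant[1] + without_variant[1] == 106
```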
7.
Environ Sci Technol ; 51(5): 2879-2889, 2017 03 07.
Article in English | MEDLINE | ID: mdl-28112946

ABSTRACT

Temporal variability complicates testing the influences of environmental variability on microbial community structure and thus function. An in-field bioreactor system was developed to assess oxic versus anoxic manipulations on in situ groundwater communities. Each sample was sequenced (16S SSU rRNA genes, average 10,000 reads), and biogeochemical parameters were monitored by quantifying 53 metals, 12 organic acids, 14 anions, and 3 sugars. Changes in dissolved oxygen (DO), pH, and other variables were similar across bioreactors. Sequencing revealed a complex community that fluctuated in step with the groundwater community and responded to DO. DO also directly influenced the pH, so the biotic impacts of DO and pH shifts are correlated. A null model demonstrated that bioreactor communities were driven in part not only by experimental conditions but also by stochastic variability, and did not accurately capture alterations in diversity during perturbations. We identified two groups of abundant OTUs important to this system: one was abundant at high DO and pH and contained heterotrophs and oxidizers of iron, nitrite, and ammonium, whereas the other was abundant at low DO and had the capability to reduce nitrate. In-field bioreactors are a powerful tool for capturing natural microbial community responses to alterations in geochemical factors beyond the bulk phase.


Subjects
Bacteria/genetics, Bioreactors, Groundwater/chemistry, Nitrites, 16S Ribosomal RNA/genetics
8.
Nucleic Acids Res ; 42(Database issue): D304-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304899

ABSTRACT

Structural Classification of Proteins-extended (SCOPe, http://scop.berkeley.edu) is a database of protein structural relationships that extends the SCOP database. SCOP is a manually curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. Development of the SCOP 1.x series concluded with SCOP 1.75. The ASTRAL compendium provides several databases and tools to aid in the analysis of the protein structures classified in SCOP, particularly through the use of their sequences. SCOPe extends version 1.75 of the SCOP database, using automated curation methods to classify many structures released since SCOP 1.75. We have rigorously benchmarked our automated methods to ensure that they are as accurate as manual curation, though there are many proteins to which our methods cannot be applied. SCOPe is also partially manually curated to correct some errors in SCOP. SCOPe aims to be backward compatible with SCOP, providing the same parseable files and a history of changes between all stable SCOP and SCOPe releases. SCOPe also incorporates and updates the ASTRAL database. The latest release of SCOPe, 2.03, contains 59 514 Protein Data Bank (PDB) entries, increasing the number of structures classified in SCOP by 55% and including more than 65% of the protein structures in the PDB.


Subjects
Protein Databases, Protein Tertiary Structure, Internet, Proteins/classification, Systems Integration
9.
Proteins ; 83(11): 2025-38, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26313554

ABSTRACT

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.


Subjects
Proteins/chemistry, Proteins/classification, Algorithms, Computational Biology, Protein Databases, Protein Conformation
10.
Environ Microbiome ; 19(1): 26, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38671539

ABSTRACT

Castellaniella species have been isolated from a variety of mixed-waste environments including the nitrate and multiple metal-contaminated subsurface at the Oak Ridge Reservation (ORR). Previous studies examining microbial community composition and nitrate removal at ORR during biostimulation efforts reported increased abundances of members of the Castellaniella genus concurrent with increased denitrification rates. Thus, we asked how genomic and abiotic factors control the Castellaniella biogeography at the site to understand how these factors may influence nitrate transformation in an anthropogenically impacted setting. We report the isolation and characterization of several Castellaniella strains from the ORR subsurface. Five of these isolates match at 100% identity (at the 16S rRNA gene V4 region) to two Castellaniella amplicon sequence variants (ASVs), ASV1 and ASV2, that have persisted in the ORR subsurface for at least 2 decades. However, ASV2 has consistently higher relative abundance in samples taken from the site and was also the dominant blooming denitrifier population during a prior biostimulation effort. We found that the ASV2 representative strain has greater resistance to mixed metal stress than the ASV1 representative strains. We attribute this resistance, in part, to the large number of unique heavy metal resistance genes identified on a genomic island in the ASV2 representative genome. Additionally, we suggest that the relatively lower fitness of ASV1 may be connected to the loss of the nitrous oxide reductase (nos) operon (and associated nitrous oxide reductase activity) due to the insertion at this genomic locus of a mobile genetic element carrying copper resistance genes. This study demonstrates the value of integrating genomic, environmental, and phenotypic data to characterize the biogeography of key microorganisms in contaminated sites.

11.
Front Microbiol ; 14: 1095191, 2023.
Article in English | MEDLINE | ID: mdl-37065130

ABSTRACT

Sulfate-reducing bacteria (SRB) are obligate anaerobes that can couple their growth to the reduction of sulfate. Despite the importance of SRB to global nutrient cycles and the damage they cause to the petroleum industry, our molecular understanding of their physiology remains limited. To systematically provide new insights into SRB biology, we generated a randomly barcoded transposon mutant library in the model SRB Desulfovibrio vulgaris Hildenborough (DvH) and used this genome-wide resource to assay the importance of its genes under a range of metabolic and stress conditions. In addition to defining the essential gene set of DvH, we identified a conditional phenotype for 1,137 non-essential genes. Through examination of these conditional phenotypes, we gained a number of novel insights into the molecular biology of DvH, including how this bacterium synthesizes vitamins. For example, we identified DVU0867 as an atypical L-aspartate decarboxylase required for the synthesis of pantothenic acid, provided the first experimental evidence that biotin synthesis in DvH occurs via a specialized acyl carrier protein and without methyl esters, and demonstrated that the uncharacterized dehydrogenase DVU0826:DVU0827 is necessary for the synthesis of pyridoxal phosphate. In addition, we used the mutant fitness data to identify genes involved in the assimilation of diverse nitrogen sources and gained insights into the mechanism of inhibition by chlorate and molybdate. Our large-scale fitness dataset and RB-TnSeq mutant library are community-wide resources that can be used to generate further testable hypotheses into the gene functions of this environmentally and industrially important group of bacteria.
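The "mutant fitness data" above come from counting each mutant's DNA barcode before and after growth under a condition. The Python sketch below shows only the basic log2-ratio idea behind such fitness scores, under simplifying assumptions: a fixed pseudocount, equal weighting of strains, and a median across the insertions in a gene. The published RB-TnSeq analysis uses additional weighting and normalization steps not shown here. The function names and counts are hypothetical.

```python
import math
from statistics import median

def strain_fitness(before, after, total_before, total_after, pseudocount=1.0):
    """log2 change in one barcode's share of reads between the starting
    sample (before) and the sample grown under a condition (after)."""
    ratio = ((after + pseudocount) * total_before) / (
        (before + pseudocount) * total_after
    )
    return math.log2(ratio)

def gene_fitness(strain_counts, total_before, total_after):
    """Assumed aggregation: median fitness of all mutant strains
    (barcodes) carrying an insertion in the same gene."""
    return median(
        strain_fitness(b, a, total_before, total_after) for b, a in strain_counts
    )

# Hypothetical (before, after) barcode counts for three insertions in one gene;
# both samples were sequenced to the same total depth.
counts = [(99, 24), (199, 49), (99, 99)]
print(gene_fitness(counts, 1_000_000, 1_000_000))  # -2.0
```

A negative score means mutants in that gene became relatively rarer under the condition, which is how a gene can show a conditional phenotype while still being non-essential in the starting culture.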

12.
J Proteome Res ; 11(12): 5720-35, 2012 Dec 07.
Article in English | MEDLINE | ID: mdl-23098413

ABSTRACT

Cell membranes represent the "front line" of cellular defense and the interface between a cell and its environment. To determine the range of proteins and protein complexes that are present in the cell membranes of a target organism, we have utilized a "tagless" process for the system-wide isolation and identification of native membrane protein complexes. As an initial subject for study, we have chosen the Gram-negative sulfate-reducing bacterium Desulfovibrio vulgaris. With this tagless methodology, we have identified about two-thirds of the outer-membrane-associated proteins anticipated. Approximately three-fourths of these appear to form homomeric complexes. Statistical and machine-learning methods used to analyze data compiled over multiple experiments revealed networks of additional protein-protein interactions providing insight into heteromeric contacts made between proteins across this region of the cell. Taken together, these results establish a D. vulgaris outer membrane protein data set that will be essential for the detection and characterization of environment-driven changes in the outer membrane proteome and in the modeling of stress response pathways. The workflow utilized here should be effective for the global characterization of membrane protein complexes in a wide range of organisms.


Subjects
Bacterial Outer Membrane Proteins/isolation & purification, Desulfovibrio vulgaris/chemistry, High-Throughput Screening Assays/methods, Membrane Proteins/isolation & purification, Multiprotein Complexes/isolation & purification, Bacterial Outer Membrane Proteins/chemistry, Cell Membrane/chemistry, Ion Exchange Chromatography, Desulfovibrio vulgaris/enzymology, Detergents/chemistry, Polyacrylamide Gel Electrophoresis, Escherichia coli/chemistry, Mass Spectrometry, Membrane Proteins/chemistry, Molecular Weight, Multiprotein Complexes/chemistry, Periplasm/chemistry, Periplasm/enzymology, Protein Interaction Mapping/methods, Protein Interaction Maps, Proteome/chemistry, Proteomics/methods, Amino Acid Sequence Homology, Solubility
13.
Bioinformatics ; 27(3): 437-8, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-21258060

ABSTRACT

Workflow Information Storage Toolkit (WIST) is a set of application programming interfaces and web applications that allow for the rapid development of customized laboratory information management systems (LIMS). WIST provides common LIMS input components, and allows them to be arranged and configured using a flexible language that specifies each component's visual and semantic characteristics. WIST includes a complete set of web applications for adding, editing and viewing data, as well as a powerful setup tool that can build new LIMS modules by analyzing existing database schema. AVAILABILITY AND IMPLEMENTATION: WIST is implemented in Perl and may be obtained from http://vimss.sf.net under the BSD license.


Subjects
Clinical Laboratory Information Systems, Computational Biology/methods, Software, Factual Databases, Internet, User-Computer Interface
14.
Proc Natl Acad Sci U S A ; 106(39): 16580-5, 2009 Sep 29.
Article in English | MEDLINE | ID: mdl-19805340

ABSTRACT

An unbiased survey has been made of the stable, most abundant multi-protein complexes in Desulfovibrio vulgaris Hildenborough (DvH) that are larger than Mr ≈ 400 k. The quaternary structures for 8 of the 16 complexes purified during this work were determined by single-particle reconstruction of negatively stained specimens, a success rate approximately 10 times greater than that of previous "proteomic" screens. In addition, the subunit compositions and stoichiometries of the remaining complexes were determined by biochemical methods. Our data show that the structures of only two of these large complexes, out of the 13 in this set that have recognizable functions, can be modeled with confidence based on the structures of known homologs. These results indicate that there is significantly greater variability in the way that homologous prokaryotic macromolecular complexes are assembled than has generally been appreciated. As a consequence, we suggest that relying solely on previously determined quaternary structures for homologous proteins may not be sufficient to properly understand their role in another cell of interest.


Subjects
Bacterial Proteins/chemistry, Desulfovibrio vulgaris/metabolism, Bacterial Proteins/isolation & purification, X-Ray Crystallography, Protein Databases, Desulfovibrio vulgaris/chemistry, Molecular Models, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Protein Conformation
15.
Gigascience ; 11, 2022 10 17.
Article in English | MEDLINE | ID: mdl-36251274

ABSTRACT

BACKGROUND: Many organizations face challenges in managing and analyzing data, especially when relevant datasets arise from multiple sources and methods. Analyzing heterogeneous datasets and additional derived data requires rigorous tracking of their interrelationships and provenance. This task has long been a Grand Challenge of data science and has more recently been formalized in the FAIR principles: that all data objects be Findable, Accessible, Interoperable, and Reusable, both for machines and for people. Adherence to these principles is necessary for proper stewardship of information, for testing regulatory compliance, for measuring the efficiency of processes, and for facilitating reuse of data-analytical frameworks. FINDINGS: We present the Contextual Ontology-based Repository Analysis Library (CORAL), a platform that greatly facilitates adherence to all 4 of the FAIR principles, including the especially difficult challenge of making heterogeneous datasets Interoperable and Reusable across all parts of a large, long-lasting organization. To achieve this, CORAL's data model requires that data generators extensively document the context for all data, and our tools maintain that context throughout the entire analysis pipeline. CORAL also features a web interface for data generators to upload and explore data, as well as a Jupyter notebook interface for data analysts, both backed by a common API. CONCLUSIONS: CORAL enables organizations to build FAIR data types on the fly as they are needed, avoiding the expense of bespoke data modeling. CORAL provides a uniquely powerful platform to enable integrative cross-dataset analyses, generating deeper insights than are possible using traditional analysis tools.


Subjects
Anthozoa, Data Analysis, Animals
16.
Microbiol Resour Announc ; 11(5): e0014522, 2022 May 19.
Article in English | MEDLINE | ID: mdl-35475637

ABSTRACT

Bacillus cereus strain CPT56D-587-MTF was isolated from nitrate- and toxic metal-contaminated subsurface sediment at the Oak Ridge Reservation (ORR) (Oak Ridge, TN, USA). Here, we report the complete genome sequence of this strain to provide genomic insight into its strategies for survival at this mixed-waste site.

17.
mSystems ; : e0053721, 2021 Jun 29.
Article in English | MEDLINE | ID: mdl-34184913

ABSTRACT

Viruses are ubiquitous microbiome components, shaping ecosystems via strain-specific predation, horizontal gene transfer and redistribution of nutrients through host lysis. Viral impacts are important in groundwater ecosystems, where microbes drive many nutrient fluxes and metabolic processes; however, little is known about the diversity of viruses in these environments. We analyzed four groundwater plasmidomes (the entire plasmid content of an environment) and identified 200 viral sequences, which clustered into 41 genus-level viral clusters (approximately equivalent to viral genera) including 9 known and 32 putative new genera. We used publicly available bacterial whole-genome sequences (WGS) and WGS from 261 bacterial isolates from this groundwater environment to identify potential viral hosts. We linked 76 of the 200 viral sequences to a range of bacterial phyla, the majority associated with Proteobacteria, followed by Firmicutes, Bacteroidetes, and Actinobacteria. The publicly available WGS enabled mapping bacterial hosts to several viral sequences. The WGS of groundwater isolates increased the depth of host prediction by allowing host identification at the strain level. The latter included 4 viruses that were almost entirely (>99% query coverage, >99% identity) identified as integrated in the genomes of Pseudomonas, Acidovorax, and Castellaniella strains, resulting in high-confidence host assignments. Lastly, 21 of these viruses carried putative auxiliary metabolic genes for metal and antibiotic resistance, which might drive their infection cycles and/or provide selective advantage to infected hosts. Exploring the groundwater virome provides a necessary foundation for integration of viruses into ecosystem models where they are key players in microbial adaptation to environmental stress.
IMPORTANCE: To our knowledge, this is the first study to identify the bacteriophage distribution in a groundwater ecosystem, shedding light on their prevalence and distribution across metal-contaminated and background sites. Our study is uniquely based on selective sequencing of solely the extrachromosomal elements of a microbiome followed by analysis for viral signatures, thus establishing a more focused approach for phage identification. Using this method, we detected several novel phage genera along with those previously established. Our approach of using the whole-genome sequences of hundreds of bacterial isolates from the same site enabled us to make host assignments with high confidence, several at the strain level. Certain phage genes suggest that they provide an environment-specific selective advantage to their bacterial hosts. Our study lays the foundation for future research on directed phage isolations using specific bacterial host strains to further characterize groundwater phages, their life cycles, and their effects on groundwater microbiome and biogeochemistry.

18.
mSystems ; 6(1), 2021 02 23.
Article in English | MEDLINE | ID: mdl-33622857

ABSTRACT

Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community. To understand how these standards are being adopted, or the barriers to adoption, across research domains, institutions, and funding agencies, the National Microbiome Data Collaborative (NMDC) hosted a workshop in October 2019. This report provides a summary of discussions that took place throughout the workshop, as well as outcomes of the working groups initiated at the workshop.

19.
PLoS Biol ; 5(3): e16, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17355171

ABSTRACT

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.


Subjects
Proteins/chemistry, Expressed Sequence Tags, Oceans and Seas, Proteins/genetics, Water Microbiology
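The GOS abstract relies on "sequence similarity clustering" but does not spell out the procedure. As an assumed, generic illustration (not the study's actual pipeline), one simple form is single-linkage clustering: treat every pair of sequences whose similarity score passes a threshold as an edge, and report the connected components, here found with a union-find structure. Sequence IDs, scores, and the threshold are hypothetical.

```python
def cluster(ids, pairs, threshold):
    """Single-linkage clusters: connected components of the graph whose edges
    are pairs (a, b, score) with score >= threshold."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two clusters

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pairwise similarity scores between five sequences.
ids = ["p1", "p2", "p3", "p4", "p5"]
pairs = [("p1", "p2", 0.9), ("p2", "p3", 0.7), ("p4", "p5", 0.3)]
print(cluster(ids, pairs, 0.5))  # [['p1', 'p2', 'p3'], ['p4'], ['p5']]
```

Single linkage chains p1 and p3 together through p2 even though no p1-p3 score is given, which is why real pipelines often add stricter linkage or coverage criteria.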
20.
Nucleic Acids Res ; 36(Database issue): D419-25, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18000004

ABSTRACT

The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.


Subjects
Protein Databases, Protein Tertiary Structure, Proteins/classification, Protein Databases/trends, Molecular Evolution, Genomics, Internet, Proteins/genetics
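The SCOP abstract lists the hierarchy levels Species, Protein, Family, Superfamily, Fold and Class. SCOP and SCOPe label a domain's position in the top four levels with a dot-separated concise classification string (sccs) such as "a.1.1.2" (class a, fold 1, superfamily 1, family 2). The Python sketch below parses such a string and reports the most specific level two domains share. The class and function names are hypothetical, and the Protein and Species levels are omitted because the sccs does not encode them.

```python
from dataclasses import dataclass

# Top four SCOP levels, from most general to most specific.
LEVELS = ("Class", "Fold", "Superfamily", "Family")

@dataclass(frozen=True)
class Sccs:
    """Concise classification string, e.g. 'a.1.1.2'."""
    class_id: str
    fold: int
    superfamily: int
    family: int

    @classmethod
    def parse(cls, text):
        c, f, sf, fa = text.split(".")
        return cls(c, int(f), int(sf), int(fa))

    def node(self, level):
        """Identifier of the node this domain falls under at `level`."""
        parts = [self.class_id, str(self.fold), str(self.superfamily), str(self.family)]
        return ".".join(parts[: LEVELS.index(level) + 1])

def shared_level(a, b):
    """Most specific level at which two domains fall under the same node,
    or None if they are in different classes."""
    common = None
    for level in LEVELS:
        if a.node(level) != b.node(level):
            break
        common = level
    return common

x, y = Sccs.parse("a.1.1.2"), Sccs.parse("a.1.1.1")
print(x.node("Fold"))      # a.1
print(shared_level(x, y))  # Superfamily
```

Because each level's identifier is a prefix of the next, "same node at level L" reduces to comparing string prefixes, mirroring how membership is nested in the hierarchy.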