1.
Bioinformatics ; 36(3): 713-720, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31424527

ABSTRACT

MOTIVATION: The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for the co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false-positive and false-negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels). RESULTS: Herein, we illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible and supports several major histocompatibility complex binding affinity prediction tools. AVAILABILITY AND IMPLEMENTATION: neoepiscope is available on GitHub at https://github.com/pdxgx/neoepiscope under the MIT license. Scripts for reproducing results described in the text are available at https://github.com/pdxgx/neoepiscope-paper under the MIT license. Additional data from this study, including summaries of variant phasing incidence and benchmarking wallclock times, are available in Supplementary Files 1, 2 and 3. Supplementary File 1 contains Supplementary Table 1, Supplementary Figures 1 and 2, and descriptions of Supplementary Tables 2-8. Supplementary File 2 contains Supplementary Tables 2-6 and 8. Supplementary File 3 contains Supplementary Table 7. 
Raw sequencing data used for the analyses in this manuscript are available from the Sequence Read Archive under accessions PRJNA278450, PRJNA312948, PRJNA307199, PRJNA343789, PRJNA357321, PRJNA293912, PRJNA369259, PRJNA305077, PRJNA306070, PRJNA82745 and PRJNA324705; from the European Genome-phenome Archive under accessions EGAD00001004352 and EGAD00001002731; and by direct request to the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
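
The effect of variant phasing described in this abstract can be illustrated with a small sketch. This is not neoepiscope's code or API; the reference sequence, variant positions and helper names (`apply_snvs`, `translate`) are hypothetical and chosen only to show why two SNVs on the same transcript must be applied together rather than one at a time.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(seq, snvs):
    """Return seq with each (0-based position, alt base) substitution applied."""
    bases = list(seq)
    for pos, alt in snvs:
        bases[pos] = alt
    return "".join(bases)


def translate(cds):
    """Translate an in-frame coding sequence into a one-letter peptide string."""
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


# Hypothetical reference coding sequence: ATG GCT GAA AAG CTG
ref = "ATGGCTGAAAAGCTG"
snv_a = (6, "A")  # first base of the third codon, G>A
snv_b = (7, "C")  # second base of the same codon, A>C

unphased = {translate(apply_snvs(ref, [v])) for v in (snv_a, snv_b)}
phased = translate(apply_snvs(ref, [snv_a, snv_b]))

print(translate(ref))    # reference peptide
print(sorted(unphased))  # peptides predicted if each SNV is treated alone
print(phased)            # peptide when both SNVs sit on the same transcript
```

In this toy case, treating each variant independently yields two peptides that the tumor would not express if both variants are in cis, and misses the peptide that it would, which mirrors the false-positive and false-negative risk the abstract describes.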


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Genome , Humans , INDEL Mutation , Sequence Analysis, DNA
2.
BMC Bioinformatics ; 19(1): 339, 2018 Sep 25.
Article in English | MEDLINE | ID: mdl-30253747

ABSTRACT

BACKGROUND: Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile. RESULTS: To determine how to create subsets of predictions for validation that maximize accuracy of global error profile inference, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates. We evaluated these selection strategies on one simulated and two experimental datasets. CONCLUSIONS: Valection is implemented in multiple programming languages, available at: http://labs.oicr.on.ca/boutros-lab/software/valection.
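
As a rough illustration of what choosing verification candidates under a fixed budget might look like, the sketch below contrasts two simple strategies. It is an assumption-laden example, not Valection's implementation: the function names, the `(chrom, pos)` call format and the two strategies shown are hypothetical.

```python
import random


def random_rows(calls_by_caller, budget, seed=0):
    """Sample up to `budget` calls uniformly from the union of all callers' calls."""
    union = sorted(set().union(*calls_by_caller.values()))
    return random.Random(seed).sample(union, min(budget, len(union)))


def equal_per_caller(calls_by_caller, budget, seed=0):
    """Give each caller an equal share of the budget, skipping calls already chosen."""
    rng = random.Random(seed)
    per_caller = budget // len(calls_by_caller)
    chosen = set()
    for caller in sorted(calls_by_caller):
        remaining = sorted(calls_by_caller[caller] - chosen)
        chosen.update(rng.sample(remaining, min(per_caller, len(remaining))))
    return sorted(chosen)


# Hypothetical predictions from two callers, keyed by (chromosome, position).
calls = {
    "caller_A": {("chr1", 100), ("chr1", 200), ("chr2", 50)},
    "caller_B": {("chr1", 100), ("chr3", 7)},
}

print(random_rows(calls, budget=3))
print(equal_per_caller(calls, budget=4))
```

Uniform sampling favors callers that make many predictions, while a per-caller quota keeps low-volume callers represented; which of these gives the better estimate of the global error profile is the kind of question the abstract evaluates.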


Subject(s)
Sequence Analysis, DNA/methods , Software Validation
3.
BMC Bioinformatics ; 19(1): 28, 2018 01 31.
Article in English | MEDLINE | ID: mdl-29385983

ABSTRACT

BACKGROUND: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. RESULTS: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. CONCLUSIONS: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
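
Counting germline leakage in a set of somatic calls can be sketched as a set intersection. The abstract does not describe how GermlineFilter works internally, so the code below is only a minimal illustration under stated assumptions: the `(chrom, pos, ref, alt)` key format, the function name and the example variants are all hypothetical.

```python
def germline_leakage(somatic_calls, germline_variants):
    """Return the somatic calls that coincide with the patient's germline variants.

    Variants are (chrom, pos, ref, alt) tuples; this key format is an assumption.
    """
    return set(somatic_calls) & set(germline_variants)


# Made-up example data, not taken from the study.
somatic = {
    ("chr1", 12345, "A", "G"),
    ("chr2", 500, "C", "T"),
    ("chr7", 42, "G", "A"),
}
germline = {
    ("chr2", 500, "C", "T"),
    ("chrX", 9, "T", "C"),
}

leaked = germline_leakage(somatic, germline)
leak_fraction = len(leaked) / len(somatic)
print(len(leaked), leak_fraction)
```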


Subject(s)
Genome, Human , Germ Cells/metabolism , Polymorphism, Single Nucleotide , Algorithms , Humans , Internet , Neoplasms/genetics , Neoplasms/pathology , User-Computer Interface , Whole Genome Sequencing
4.
Nat Methods ; 12(7): 623-30, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25984700

ABSTRACT

The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.
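
The abstract does not say how its ensemble of pipelines was constructed. One common way to combine call sets is a majority vote, sketched here as an assumption rather than as the Challenge's method; the function names and the toy call sets are hypothetical and were constructed so that the ensemble happens to beat each individual caller.

```python
from collections import Counter


def majority_vote(call_sets, min_votes=None):
    """Keep variants reported by at least `min_votes` callers (default: strict majority)."""
    if min_votes is None:
        min_votes = len(call_sets) // 2 + 1
    votes = Counter(v for calls in call_sets for v in set(calls))
    return {v for v, n in votes.items() if n >= min_votes}


def precision_recall(predicted, truth):
    """Return (precision, recall) of a predicted call set against the truth set."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy data: integers stand in for variant identifiers.
truth = {1, 2, 3, 4}
caller_a = {1, 2, 3, 9}
caller_b = {1, 2, 4, 8}
caller_c = {2, 3, 4, 7}

ensemble = majority_vote([caller_a, caller_b, caller_c])
for name, calls in [("A", caller_a), ("B", caller_b), ("C", caller_c), ("ensemble", ensemble)]:
    print(name, precision_recall(calls, truth))
```

Voting helps when callers make different false-positive errors, since uncorrelated mistakes rarely reach the vote threshold; this is consistent with the algorithm-specific error profiles the abstract reports.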


Subject(s)
Benchmarking , Crowdsourcing , Genome , Neoplasms/genetics , Polymorphism, Single Nucleotide , Algorithms , Humans
5.
BMC Cancer ; 18(1): 414, 2018 04 13.
Article in English | MEDLINE | ID: mdl-29653567

ABSTRACT

BACKGROUND: Tumor neoantigens are drivers of cancer immunotherapy response; however, current prediction tools produce many candidates requiring further prioritization. Additional filtration criteria and population-level understanding may assist with prioritization. Herein, we show neoepitope immunogenicity is related to measures of peptide novelty and report population-level behavior of these and other metrics. METHODS: We propose four peptide novelty metrics to refine predicted neoantigenicity: tumor vs. paired normal peptide binding affinity difference, tumor vs. paired normal peptide sequence similarity, tumor vs. closest human peptide sequence similarity, and tumor vs. closest microbial peptide sequence similarity. We apply these metrics to neoepitopes predicted from somatic missense mutations in The Cancer Genome Atlas (TCGA) and a cohort of melanoma patients, and to a group of peptides with neoepitope-specific immune response data using an extension of pVAC-Seq (Hundal et al., pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med 8:11, 2016). RESULTS: We show neoepitope burden varies across TCGA diseases and HLA alleles, with surprisingly low repetition of neoepitope sequences across patients or neoepitope preferences among sets of HLA alleles. Only 20.3% of predicted neoepitopes across TCGA patients displayed novel binding change based on our binding affinity difference criteria. Similarity of amino acid sequence was typically high between paired tumor-normal epitopes, but in 24.6% of cases, neoepitopes were more similar to other human peptides, or bacterial (56.8% of cases) or viral peptides (15.5% of cases), than their paired normal counterparts. 
Applied to peptides with neoepitope-specific immune response, a linear model incorporating neoepitope binding affinity, protein sequence similarity between neoepitopes and their closest viral peptides, and paired binding affinity difference was able to predict immunogenicity (AUROC = 0.66). CONCLUSIONS: Our proposed prioritization criteria emphasize neoepitope novelty and refine patient neoepitope predictions for focus on biologically meaningful candidate neoantigens. We have demonstrated that neoepitopes should be considered not only with respect to their paired normal epitope, but to the entire human proteome, and bacterial and viral peptides, with potential implications for neoepitope immunogenicity and personalized vaccines for cancer treatment. We conclude that putative neoantigens are highly variable across individuals as a function of cancer genetics and personalized HLA repertoire, while the overall behavior of filtration criteria reflects predictable patterns.
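
Two of the novelty metrics named in this abstract, the paired binding affinity difference and the tumor vs. normal sequence similarity, can be sketched as follows. The exact definitions used in the paper are not given in the abstract, so both formulas here are assumptions: affinity difference is taken as normal IC50 minus tumor IC50 (in nM), and similarity as the fraction of identical positions between equal-length peptides. The peptides and IC50 values are invented for illustration.

```python
def binding_affinity_difference(normal_ic50_nm, tumor_ic50_nm):
    """Assumed definition: positive values mean the tumor peptide binds more tightly."""
    return normal_ic50_nm - tumor_ic50_nm


def sequence_similarity(a, b):
    """Assumed definition: fraction of identical residues between equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_peptide(query, candidates):
    """Return the candidate most similar to `query`, with its similarity score."""
    best = max(candidates, key=lambda c: sequence_similarity(query, c))
    return best, sequence_similarity(query, best)


# Hypothetical 8-mer neoepitope, its paired normal peptide and a small comparison set.
tumor, normal = "SIINFEKM", "SIINFEKL"
others = ["AAINFEKM", "SIINFEKL", "GGGGGGGG"]

print(binding_affinity_difference(normal_ic50_nm=5000.0, tumor_ic50_nm=50.0))
print(sequence_similarity(tumor, normal))
print(closest_peptide(tumor, others))
```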


Subject(s)
Antigens, Neoplasm/immunology , Epitopes/immunology , Neoplasms/immunology , Alleles , Amino Acid Sequence , Antigens, Neoplasm/genetics , Epitope Mapping , Epitopes/chemistry , Epitopes/genetics , Genomics/methods , Humans , Immunotherapy , Neoplasms/genetics , Neoplasms/therapy , Peptides/chemistry , Peptides/genetics , Peptides/immunology , ROC Curve
6.
Nucleic Acids Res ; 41(Database issue): D949-54, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23109555

ABSTRACT

The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) is a set of web-based tools to display, investigate and analyse cancer genomics data and its associated clinical information. The browser provides whole-genome to base-pair level views of several different types of genomics data, including some next-generation sequencing platforms. The ability to view multiple datasets together allows users to make comparisons across different data and cancer types. Biological pathways, collections of genes, genomic or clinical information can be used to sort, aggregate and zoom into a group of samples. We currently display an expanding set of data from various sources, including 201 datasets from 22 TCGA (The Cancer Genome Atlas) cancers as well as data from Cancer Cell Line Encyclopedia and Stand Up To Cancer. New features include a completely redesigned user interface with an interactive tutorial and updated documentation. We have also added data downloads, additional clinical heatmap features, and an updated Tumor Image Browser based on Google Maps. New security features allow authenticated users access to private datasets hosted by several different consortia through the public website.


Subject(s)
Databases, Genetic , Genomics , Neoplasms/genetics , Cell Line, Tumor , Humans , Internet
7.
Nat Biotechnol ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862616

ABSTRACT

Subclonal reconstruction algorithms use bulk DNA sequencing data to quantify parameters of tumor evolution, allowing an assessment of how cancers initiate, progress and respond to selective pressures. We launched the ICGC-TCGA (International Cancer Genome Consortium-The Cancer Genome Atlas) DREAM Somatic Mutation Calling Tumor Heterogeneity and Evolution Challenge to benchmark existing subclonal reconstruction algorithms. This 7-year community effort used cloud computing to benchmark 31 subclonal reconstruction algorithms on 51 simulated tumors. Algorithms were scored on seven independent tasks, leading to 12,061 total runs. Algorithm choice influenced performance substantially more than tumor features but purity-adjusted read depth, copy-number state and read mappability were associated with the performance of most algorithms on most tasks. No single algorithm was a top performer for all seven tasks and existing ensemble strategies were unable to outperform the best individual methods, highlighting a key research need. All containerized methods, evaluation code and datasets are available to support further assessment of the determinants of subclonal reconstruction accuracy and development of improved methods to understand tumor evolution.

8.
Nucleic Acids Res ; 39(Database issue): D494-6, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20961957

ABSTRACT

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.


Subject(s)
Databases, Protein , Protein Conformation , Genomics , Proteins/chemistry , Proteins/genetics , User-Computer Interface
9.
Bioinform Adv ; 3(1): vbad020, 2023.
Article in English | MEDLINE | ID: mdl-36874953

ABSTRACT

Summary: Thousands of DNA methylation (DNAm) array samples from human blood are publicly available on the Gene Expression Omnibus (GEO), but they remain underutilized for experiment planning, replication and cross-study and cross-platform analyses. To facilitate these tasks, we augmented our recountmethylation R/Bioconductor package with 12 537 uniformly processed EPIC and HM450K blood samples on GEO as well as several new features. We subsequently used our updated package in several illustrative analyses, finding (i) study ID bias adjustment increased variation explained by biological and demographic variables, (ii) most variation in autosomal DNAm was explained by genetic ancestry and CD4+ T-cell fractions and (iii) the dependence of power to detect differential methylation on sample size was similar for each of peripheral blood mononuclear cells (PBMC), whole blood and umbilical cord blood. Finally, we used PBMC and whole blood to perform independent validations, and we recovered 38-46% of differentially methylated probes between sexes from two previously published epigenome-wide association studies. Availability and implementation: Source code to reproduce the main results is available on GitHub (repo: recountmethylation_flexible-blood-analysis_manuscript; url: https://github.com/metamaden/recountmethylation_flexible-blood-analysis_manuscript). All data was publicly available and downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). Compilations of the analyzed public data can be accessed from the website recount.bio/data (preprocessed HM450K array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/; preprocessed EPIC array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/). Supplementary information: Supplementary data are available at Bioinformatics Advances online.

10.
Cell Genom ; 2(1), 2022 Jan 12.
Article in English | MEDLINE | ID: mdl-35199087

ABSTRACT

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement while also adding security measures for active threat detection and monitoring and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.

11.
PLoS Comput Biol ; 6(6): e1000798, 2010 Jun 03.
Article in English | MEDLINE | ID: mdl-20532204

ABSTRACT

The microbes that inhabit particular environments must be able to perform molecular functions that provide them with a competitive advantage to thrive in those environments. As most molecular functions are performed by proteins and are conserved between related proteins, we can expect that organisms successful in a given environmental niche would contain protein families that are specific for functions that are important in that environment. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain a highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These not only include many families described previously but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide us with new information about an environment critical to our health and well-being.


Subject(s)
Bacterial Proteins/genetics , Computational Biology/methods , Gastrointestinal Tract/microbiology , Genome, Bacterial , Metagenome , Cluster Analysis , Databases, Protein , Humans
12.
Cell Syst ; 12(8): 827-838.e5, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34146471

ABSTRACT

The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.


Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Protein Isoforms/genetics , RNA/genetics , RNA-Seq , Sequence Analysis, RNA
13.
Cell Genom ; 1(2), 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-35072136

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.

14.
J Biol Chem ; 284(37): 25268-79, 2009 Sep 11.
Article in English | MEDLINE | ID: mdl-19567872

ABSTRACT

SsgA-like proteins (SALPs) are a family of homologous cell division-related proteins that occur exclusively in morphologically complex actinomycetes. We show that SsgB, a subfamily of SALPs, is the archetypal SALP that is functionally conserved in all sporulating actinomycetes. Sporulation-specific cell division of Streptomyces coelicolor ssgB mutants is restored by introduction of distant ssgB orthologues from other actinomycetes. Interestingly, the number of septa (and spores) of the complemented null mutants is dictated by the specific ssgB orthologue that is expressed. The crystal structure of the SsgB from Thermobifida fusca was determined at 2.6 Å resolution and represents the first structure for this family. The structure revealed similarities to a class of eukaryotic "whirly" single-stranded DNA/RNA-binding proteins. However, the electro-negative surface of the SALPs suggests that neither SsgB nor any of the other SALPs are likely to interact with nucleotide substrates. Instead, we show that a conserved hydrophobic surface is likely to be important for SALP function and suggest that proteins are the likely binding partners.


Subject(s)
Actinobacteria/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/physiology , Amino Acid Sequence , Binding Sites , Cell Division , Cryoelectron Microscopy , Crystallography, X-Ray/methods , Escherichia coli/metabolism , Genetic Complementation Test , Microscopy, Fluorescence/methods , Microscopy, Phase-Contrast/methods , Molecular Sequence Data , Mutation , Sequence Homology, Amino Acid , Spores, Bacterial
15.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1174-81, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944208

ABSTRACT

Proteins with the DUF2063 domain constitute a new Pfam family, PF09836. The crystal structure of a member of this family, NGO1945 from Neisseria gonorrhoeae, has been determined and reveals that the N-terminal DUF2063 domain is likely to be a DNA-binding domain. In conjunction with the rest of the protein, NGO1945 is likely to be involved in transcriptional regulation, which is consistent with genomic neighborhood analysis. Of the 216 currently known proteins that contain a DUF2063 domain, the most significant sequence homologs of NGO1945 (∼40-99% sequence identity) are from various Neisseria and Haemophilus species. As these are important human pathogens, NGO1945 represents an interesting candidate for further exploration via biochemical studies and possible therapeutic intervention.


Subject(s)
Bacterial Proteins/chemistry , Gene Expression Regulation , Neisseria gonorrhoeae/chemistry , Transcription, Genetic , Amino Acid Sequence , Bacterial Proteins/genetics , Crystallography, X-Ray , Genome, Bacterial , Models, Molecular , Molecular Sequence Data , Neisseria gonorrhoeae/genetics , Protein Structure, Quaternary , Protein Structure, Tertiary , Structural Homology, Protein
16.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1182-9, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944209

ABSTRACT

The crystal structures of BB2672 and SPO0826 were determined to resolutions of 1.7 and 2.1 Å by single-wavelength anomalous dispersion and multiple-wavelength anomalous dispersion, respectively, using the semi-automated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG) as part of the NIGMS Protein Structure Initiative (PSI). These proteins are the first structural representatives of the PF06684 (DUF1185) Pfam family. Structural analysis revealed that both structures adopt a variant of the Bacillus chorismate mutase fold (BCM). The biological unit of both proteins is a hexamer and analysis of homologs indicates that the oligomer interface residues are highly conserved. The conformation of the critical regions for oligomerization appears to be dependent on pH or salt concentration, suggesting that this protein might be subject to environmental regulation. Structural similarities to BCM and genome-context analysis suggest a function in amino-acid synthesis.


Subject(s)
Amino Acids/metabolism , Bordetella bronchiseptica/enzymology , Chorismate Mutase/chemistry , Protein Folding , Rhodobacteraceae/enzymology , Amino Acid Sequence , Bacillus/enzymology , Chorismate Mutase/metabolism , Crystallography, X-Ray , Models, Molecular , Molecular Sequence Data , Protein Structure, Quaternary , Protein Structure, Tertiary , Structural Homology, Protein
17.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1254-60, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944219

ABSTRACT

KPN03535 (gi|152972051) is a putative lipoprotein of unknown function that is secreted by Klebsiella pneumoniae MGH 78578. The crystal structure reveals that despite a lack of any detectable sequence similarity to known structures, it is a novel variant of the OB-fold and structurally similar to the bacterial Cpx-pathway protein NlpE, single-stranded DNA-binding (SSB) proteins and toxins. K. pneumoniae MGH 78578 forms part of the normal human skin, mouth and gut flora and is an opportunistic pathogen that is linked to about 8% of all hospital-acquired infections in the USA. This structure provides the foundation for further investigations into this divergent member of the OB-fold family.


Subject(s)
Bacterial Proteins/chemistry , Klebsiella pneumoniae/chemistry , Lipoproteins/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Models, Molecular , Molecular Sequence Data , Protein Folding , Protein Structure, Tertiary
18.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1265-73, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944221

ABSTRACT

Proteins that contain the DUF2874 domain constitute a new Pfam family PF11396. Members of this family have predominantly been identified in microbes found in the human gut and oral cavity. The crystal structure of one member of this family, BVU2987 from Bacteroides vulgatus, has been determined, revealing a β-lactamase inhibitor protein-like structure with a tandem repeat of domains. Sequence analysis and structural comparisons reveal that BVU2987 and other DUF2874 proteins are related to β-lactamase inhibitor protein, PepSY and SmpA_OmlA proteins and hence are likely to function as inhibitory proteins.


Subject(s)
Bacteroides/chemistry , Periplasmic Proteins/chemistry , Amino Acid Sequence , Bacteroides/metabolism , Conserved Sequence , Crystallography, X-Ray , Models, Molecular , Molecular Sequence Data , Periplasmic Proteins/metabolism , Protein Binding , Protein Structure, Tertiary , Sequence Alignment , Structural Homology, Protein
19.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1274-80, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944222

ABSTRACT

The crystal structure of the Bacteroides thetaiotaomicron protein BT_3984 was determined to a resolution of 1.7 Å and was the first structure to be determined from the extensive SusD family of polysaccharide-binding proteins. SusD is an essential component of the sus operon that defines the paradigm for glycan utilization in dominant members of the human gut microbiota. Structural analysis of BT_3984 revealed an N-terminal region containing several tetratricopeptide repeats (TPRs), while the signature C-terminal region is less structured and contains extensive loop regions. Sequence and structure analysis of BT_3984 suggests the presence of binding interfaces for other proteins from the polysaccharide-utilization complex.


Subject(s)
Bacterial Proteins/chemistry , Bacteroides/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Models, Molecular , Molecular Sequence Data , Protein Structure, Tertiary , Structural Homology, Protein
20.
Acta Crystallogr Sect F Struct Biol Cryst Commun ; 66(Pt 10): 1281-6, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20944223

ABSTRACT

BT1062 from Bacteroides thetaiotaomicron is a homolog of Mfa2 (PGN0288 or PG0179), which is a component of the minor fimbriae in Porphyromonas gingivalis. The crystal structure of BT1062 revealed a conserved fold that is widely adopted by fimbrial components.


Subject(s)
Bacteroides/chemistry , Fimbriae Proteins/chemistry , Fimbriae, Bacterial/chemistry , Protein Folding , Amino Acid Sequence , Bacteroides/genetics , Crystallography, X-Ray , Fimbriae Proteins/genetics , Models, Molecular , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Alignment , Structural Homology, Protein