Results 1 - 20 of 172
1.
Cell ; 165(3): 690-703, 2016 Apr 21.
Article in English | MEDLINE | ID: mdl-27062925

ABSTRACT

Pili are proteinaceous polymers of linked pilins that protrude from the cell surface of many bacteria and often mediate adherence and virulence. We investigated a set of 20 Bacteroidia pilins from the human microbiome whose structures and mechanism of assembly were unknown. Crystal structures and biochemical data revealed a diverse protein superfamily with a common Greek-key β sandwich fold with two transthyretin-like repeats that polymerize into a pilus through a strand-exchange mechanism. The assembly mechanism of the central, structural pilins involves proteinase-assisted removal of their N-terminal β strand, creating an extended hydrophobic groove that binds the C-terminal donor strands of the incoming pilin. Accessory pilins at the tip and base have unique structural features specific to their location, allowing initiation or termination of the assembly. The Bacteroidia pilus, therefore, has a biogenesis mechanism that is distinct from other known pili and likely represents a different type of bacterial pilus.


Subject(s)
Fimbriae Proteins/chemistry , Fimbriae, Bacterial , Gastrointestinal Microbiome , Amino Acid Sequence , Crystallography, X-Ray , Fimbriae Proteins/genetics , Fimbriae Proteins/metabolism , Humans , Lipoproteins/chemistry , Lipoproteins/metabolism , Models, Molecular , Molecular Sequence Data , Sequence Alignment
2.
Nature ; 605(7911): 640-652, 2022 05.
Article in English | MEDLINE | ID: mdl-35361968

ABSTRACT

The global emergence of many severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants jeopardizes the protective antiviral immunity induced after infection or vaccination. To address the public health threat caused by the increasing SARS-CoV-2 genomic diversity, the National Institute of Allergy and Infectious Diseases within the National Institutes of Health established the SARS-CoV-2 Assessment of Viral Evolution (SAVE) programme. This effort was designed to provide a real-time risk assessment of SARS-CoV-2 variants that could potentially affect the transmission, virulence, and resistance to infection- and vaccine-induced immunity. The SAVE programme is a critical data-generating component of the US Government SARS-CoV-2 Interagency Group to assess implications of SARS-CoV-2 variants on diagnostics, vaccines and therapeutics, and for communicating public health risk. Here we describe the coordinated approach used to identify and curate data about emerging variants, their impact on immunity and effects on vaccine protection using animal models. We report the development of reagents, methodologies, models and notable findings facilitated by this collaborative approach and identify future challenges. This programme is a template for the response to rapidly evolving pathogens with pandemic potential by monitoring viral evolution in the human population to identify variants that could reduce the effectiveness of countermeasures.


Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Biological Evolution , COVID-19 Vaccines , Humans , National Institute of Allergy and Infectious Diseases (U.S.) , Pandemics/prevention & control , Pharmacogenomic Variants , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , United States/epidemiology , Virulence
3.
Int J Mol Sci ; 25(11)2024 May 29.
Article in English | MEDLINE | ID: mdl-38892129

ABSTRACT

This study focuses on understanding the transcriptional heterogeneity of activated platelets and its impact on diseases such as sepsis, COVID-19, and systemic lupus erythematosus (SLE). Recognizing the limited knowledge in this area, our research aims to dissect the complex transcriptional profiles of activated platelets to aid in developing targeted therapies for abnormal and pathogenic platelet subtypes. We analyzed single-cell transcriptional profiles from 47,977 platelets derived from 413 samples of patients with these diseases, utilizing Deep Neural Network (DNN) and eXtreme Gradient Boosting (XGB) to distinguish transcriptomic signatures predictive of fatal or survival outcomes. Our approach included source data annotations and platelet markers, along with SingleR and Seurat for comprehensive profiling. Additionally, we employed Uniform Manifold Approximation and Projection (UMAP) for effective dimensionality reduction and visualization, aiding in the identification of various platelet subtypes and their relation to disease severity and patient outcomes. Our results highlighted distinct platelet subpopulations that correlate with disease severity, revealing that changes in platelet transcription patterns can intensify endotheliopathy, increasing the risk of coagulation in fatal cases. Moreover, these changes may impact lymphocyte function, indicating a more extensive role for platelets in inflammatory and immune responses. This study identifies crucial biomarkers of platelet heterogeneity in serious health conditions, paving the way for innovative therapeutic approaches targeting platelet activation, which could improve patient outcomes in diseases characterized by altered platelet function.


Subject(s)
Blood Platelets , COVID-19 , Lupus Erythematosus, Systemic , Machine Learning , SARS-CoV-2 , Sepsis , Single-Cell Analysis , Transcriptome , Humans , COVID-19/blood , COVID-19/genetics , COVID-19/virology , Lupus Erythematosus, Systemic/genetics , Lupus Erythematosus, Systemic/blood , Blood Platelets/metabolism , Single-Cell Analysis/methods , Sepsis/genetics , Sepsis/blood , Gene Expression Profiling/methods , Platelet Activation/genetics
4.
J Struct Biol ; 215(3): 108011, 2023 09.
Article in English | MEDLINE | ID: mdl-37562586

ABSTRACT

Leucine Rich Repeat (LRR) domains are present in hundreds of thousands of proteins across all kingdoms of life and are typically involved in protein-protein interactions and ligand recognition. LRR domains are classified into eight classes and, when examined in three dimensions, seven of them form curved solenoid-like super-helices, also described as toruses, with a beta sheet on the concave (inside) and stacked alpha-helices on the convex (outside) of the torus. Here we present an overview of the least characterized 8th class of LRR proteins, the TpLRR-like LRRs, named after the Treponema pallidum protein Tp0225. Proteins from the TpLRR class differ from the proteins in all other known LRR classes by having a flipped curvature, with the beta sheet on the convex side of the torus and irregular secondary structure instead of helices on the opposite, now concave side. TpLRR proteins also present a highly divergent sequence pattern of individual repeats and can associate with specific types of additional domains. Several of the characterized proteins from this class, specifically the BspA-like proteins, were found in human bacterial and protozoan pathogens, playing an important role in the interactions between the pathogens and the host immune system. In this paper we surveyed all existing experimental structures and selected AlphaFold models of the best-known proteins containing this class of LRR repeats, analyzing the relation between the pattern of conserved residues, specific structural features and functions of these proteins.


Subject(s)
Leucine-Rich Repeat Proteins , Proteins , Humans , Proteins/chemistry , Protein Domains , Protein Structure, Secondary , Bacteria/chemistry
5.
Emerg Infect Dis ; 29(5)2023 05.
Article in English | MEDLINE | ID: mdl-37054986

ABSTRACT

Since late 2020, SARS-CoV-2 variants have regularly emerged with competitive and phenotypic differences from previously circulating strains, sometimes with the potential to escape from immunity produced by prior exposure and infection. The Early Detection group is one of the constituent groups of the US National Institutes of Health National Institute of Allergy and Infectious Diseases SARS-CoV-2 Assessment of Viral Evolution program. The group uses bioinformatic methods to monitor the emergence, spread, and potential phenotypic properties of emerging and circulating strains to identify the most relevant variants for experimental groups within the program to phenotypically characterize. Since April 2021, the group has prioritized variants monthly. Prioritization successes include rapidly identifying most major variants of SARS-CoV-2 and providing experimental groups within the National Institutes of Health program easy access to regularly updated information on the recent evolution and epidemiology of SARS-CoV-2 that can be used to guide phenotypic investigations.


Subject(s)
COVID-19 , SARS-CoV-2 , United States/epidemiology , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , National Institutes of Health (U.S.)
6.
Arch Biochem Biophys ; 739: 109579, 2023 05 01.
Article in English | MEDLINE | ID: mdl-36933758

ABSTRACT

Both gender and smoking are correlated with prevalence and outcomes in many types of cancers. Tobacco smoke is a known carcinogen through its genotoxicity but can also affect cancer progression through its effect on the immune system. In this study, we aim to evaluate the hypothesis that the effects of smoking on the tumor immune microenvironment will be influenced differently by gender using large-scale analysis of publicly available cancer datasets. We used The Cancer Genomic Atlas (TCGA) datasets (n = 2724) to analyze effects of smoking on different cancer immune subtypes and the relative abundance of immune cell types between male and female cancer patients. We further validated our results by analyzing additional datasets, including Expression Project for Oncology (expO) bulk RNA-seq dataset (n = 1118) and single-cell RNA-seq dataset (n = 14). Results of our study indicate that in female patients, two immune subtypes, C1 and C2, are respectively over and under abundant in smokers vs. never smokers. In males, the only significant difference is underabundance of the C6 subtype in smokers. We identified gender-specific differences in the population of immune cell types between smokers and never smokers in all TCGA and expO cancer types. Increased plasma cell population was identified as the most consistent feature distinguishing smokers and never smokers, especially in current female smokers based on both TCGA and expO data. Our analysis of existing single-cell RNA-seq data further revealed that smoking differentially affects the gene expression profile of cancer patients based on the immune cell type and gender. In our analysis, female and male smokers show different smoking-induced patterns of immune cells in tumor microenvironment. Besides, our results suggest cancer tissues directly exposed to tobacco smoke undergo the most significant changes, but all other tissue types are affected as well. 
Findings of current study also indicate that changes in the populations of plasma cells and their correlations to survival outcomes are stronger in female current smokers, with implications for cancer immunotherapy of women smokers. In conclusion, results of this study can be used to develop personalized treatment plans for cancer patients who smoke, particularly women smokers, taking into account the unique immune cell profile of their tumors.


Subject(s)
Lung Neoplasms , Tobacco Smoke Pollution , Humans , Male , Female , Tumor Microenvironment , Sex Factors , Smoking/adverse effects , Lung Neoplasms/pathology
7.
J Bacteriol ; 204(5): e0055521, 2022 05 17.
Article in English | MEDLINE | ID: mdl-35435721

ABSTRACT

Alpha-pore-forming toxins (α-PFTs) are secreted by many species of bacteria, including Escherichia coli, Aeromonas hydrophila, and Bacillus thuringiensis, as part of their arsenal of virulence factors, and are often cytotoxic. In particular, for α-PFTs, the membrane-spanning channel they form is composed of hydrophobic α-helices. These toxins oligomerize at the surface of target cells and transition from a soluble to a protomer state in which they expose their hydrophobic regions and insert into the membrane to form a pore. The pores may be composed of homooligomers of one component or heterooligomers with two or three components, resulting in bi- or tripartite toxins. The multicomponent α-PFTs are often expressed from a single operon. Recently, motility-associated killing factor A (MakA), an α-PFT, was discovered in Vibrio cholerae. We report that makA is found on the V. cholerae GI-10 genomic island within an operon containing genes for two other potential α-PFTs, MakB and MakE. We determined the X-ray crystal structures for MakA, MakB, and MakE and demonstrated that all three are structurally related to the α-PFT family in the soluble state, and we modeled their protomer state based on the α-PFT AhlB from A. hydrophila. We found that MakA alone is cytotoxic at micromolar concentrations. However, combining MakA with MakB and MakE is cytotoxic at nanomolar concentrations, with specificity for J774 macrophage cells. Our data suggest that MakA, -B, and -E are α-PFTs that potentially act as a tripartite pore-forming toxin with specificity for phagocytic cells. IMPORTANCE The bacterium Vibrio cholerae causes gastrointestinal, wound, and skin infections. The motility-associated killing factor A (MakA) was recently shown to be cytotoxic against colon, prostate, and other cancer cells. However, at the outset of this study, the capacity of MakA to damage cells in combination with other Mak proteins encoded in the same operon had not been elucidated. 
We determined the structures of three Mak proteins and established that they are structurally related to the α-PFTs. Compared to MakA alone, the combination of all three toxins was more potent specifically in mouse macrophages. This study highlights the idea that the Mak toxins are selectively cytotoxic and thus may function as a tripartite toxin with cell type specificity.


Subject(s)
Vibrio cholerae , Animals , Cytotoxins/genetics , Cytotoxins/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Genomic Islands , Mice , Pore Forming Cytotoxic Proteins , Protein Subunits/metabolism , Vibrio cholerae/metabolism , Virulence Factors/metabolism
8.
Proteins ; 90(2): 504-511, 2022 02.
Article in English | MEDLINE | ID: mdl-34553433

ABSTRACT

Several plastic degrading enzymes have been described in the literature, most notably PETases that are capable of hydrolyzing polyethylene terephthalate (PET) plastic. One of them, the PETase from Ideonella sakaiensis, a bacterium isolated from environmental samples within a PET bottle recycling site, was a subject of extensive studies. To test how widespread PETase functionality is in other bacterial communities, we used a cascade of BLAST searches in the JGI metagenomic datasets and showed that close homologs of I. sakaiensis PETase can also be found in other metagenomic environmental samples from both human-affected and relatively pristine sites. To confirm their classification as putative PETases, we verified that the newly identified proteins have the PETase sequence signatures common to known PETases and that phylogenetic analyses group them with the experimentally characterized PETases. Additionally, docking analysis was performed in order to further confirm the functional assignment of the putative environmental PETases.


Subject(s)
Biodegradation, Environmental , Burkholderiales/enzymology , Plastics/metabolism , Polyethylene Terephthalates/metabolism , Bacterial Proteins/metabolism
9.
PLoS Comput Biol ; 17(7): e1009147, 2021 07.
Article in English | MEDLINE | ID: mdl-34237054

ABSTRACT

The unprecedented pace of the sequencing of the SARS-CoV-2 virus genomes provides us with unique information about the genetic changes in a single pathogen during ongoing pandemic. By the analysis of close to 200,000 genomes we show that the patterns of the SARS-CoV-2 virus mutations along its genome are closely correlated with the structural and functional features of the encoded proteins. Requirements of foldability of proteins' 3D structures and the conservation of their key functional regions, such as protein-protein interaction interfaces, are the dominant factors driving evolutionary selection in protein-coding genes. At the same time, avoidance of the host immunity leads to the abundance of mutations in other regions, resulting in high variability of the missense mutation rate along the genome. "Unexplained" peaks and valleys in the mutation rate provide hints on function for yet uncharacterized genomic regions and specific protein structural and functional features they code for. Some of these observations have immediate practical implications for the selection of target regions for PCR-based COVID-19 tests and for evaluating the risk of mutations in epitopes targeted by specific antibodies and vaccine design strategies.


Subject(s)
Biological Evolution , SARS-CoV-2/physiology , Genes, Viral , Mutation , SARS-CoV-2/genetics , Viral Proteins/physiology
10.
Nucleic Acids Res ; 48(W1): W60-W64, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32469061

ABSTRACT

The FATCAT 2.0 server (http://fatcat.godziklab.org/) provides access to a flexible protein structure alignment algorithm developed in our group. In such an alignment, rotations and translations between elements in the structure are allowed to minimize the overall root mean square deviation (RMSD) between the compared structures. This makes it possible to effectively compare protein structures even if they underwent structural rearrangements in different functional forms, different crystallization conditions or as a result of mutations. The major update for the server introduces a new graphical interface, much faster database searches and several new options for visualization of the structural differences between proteins.


Subject(s)
Software , Structural Homology, Protein , Algorithms , Databases, Protein , Models, Molecular , Proteins/chemistry
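
The idea behind flexible alignment in the FATCAT abstract above can be illustrated with a toy calculation. The sketch below is not the FATCAT algorithm; it only computes RMSD over paired coordinates that are assumed to be already superposed, and shows how treating a displaced segment as a separate rigid piece lowers the value. The function name `rmsd` and the coordinates are hypothetical, chosen for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points.

    Assumes the points are already paired and superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical four-atom chain; in the second structure the last two atoms
# are displaced together by 2 units along y, like a shifted rigid segment.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0), (3.0, 2.0, 0.0)]

# Rigid comparison: the displaced segment dominates the deviation.
rigid_rmsd = rmsd(structure_a, structure_b)

# "Flexible" comparison: translate the second segment back before comparing.
structure_b_moved = structure_b[:2] + [(x, y - 2.0, z) for x, y, z in structure_b[2:]]
flexible_rmsd = rmsd(structure_a, structure_b_moved)

print(rigid_rmsd, flexible_rmsd)
```

In this toy case the single translation removes the whole deviation; a real flexible aligner also has to decide where to place such hinges and how many to allow.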
11.
Bioinformatics ; 36(15): 4360-4362, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32470119

ABSTRACT

MOTIVATION: As the COVID-19 pandemic is spreading around the world, the SARS-CoV-2 virus is evolving with mutations that potentially change and fine-tune functions of the proteins coded in its genome. RESULTS: Coronavirus3D website integrates data on the SARS-CoV-2 virus mutations with information about 3D structures of its proteins, allowing users to visually analyze the mutations in their 3D context. AVAILABILITY AND IMPLEMENTATION: Coronavirus3D server is freely available at https://coronavirus3d.org.


Subject(s)
Coronavirus Infections , Genome, Viral , Pandemics , Pneumonia, Viral , Betacoronavirus , COVID-19 , Genomics , Humans , SARS-CoV-2
12.
Nucleic Acids Res ; 47(D1): D895-D899, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407596

ABSTRACT

Our knowledge of cancer genomics exploded in the last several years, providing us with detailed knowledge of genetic alterations in almost all cancer types. Analysis of this data gave us new insights into molecular aspects of cancer, the most important being the amazing diversity of molecular abnormalities in individual cancers. The most important question in cancer research today is how to classify this diversity to identify subtypes that are most relevant for treatment and outcome prediction for individual patients. The Cancer3D database at http://www.cancer3d.org gives an open and user-friendly way to analyze cancer missense mutations in the context of structures of proteins they are found in and in relation to patients' clinical data. This approach allows users to find novel candidate driver regions for specific subgroups that often cannot be found when similar analyses are done on the whole gene level and for large, diverse cohorts. The interactive interface allows users to visualize the distribution of mutations in subgroups defined by cancer type and stage, gender and age brackets, or patient's ethnicity, or, vice versa, to find the dominant cancer type, gender or age groups for specific three-dimensional mutation patterns.


Subject(s)
Databases, Protein , Mutation, Missense , Neoplasms/genetics , Protein Conformation , Proteins/genetics , Humans , Protein Domains
13.
Proc Natl Acad Sci U S A ; 113(32): E4639-47, 2016 08 09.
Article in English | MEDLINE | ID: mdl-27385826

ABSTRACT

The "canonical" proteasomal degradation signal is a substrate-anchored polyubiquitin chain. However, a handful of proteins were shown to be targeted following monoubiquitination. In this study, we established-in both human and yeast cells-a systematic approach for the identification of monoubiquitination-dependent proteasomal substrates. The cellular wild-type polymerizable ubiquitin was replaced with ubiquitin that cannot form chains. Using proteomic analysis, we screened for substrates that are nevertheless degraded under these conditions compared with those that are stabilized, and therefore require polyubiquitination for their degradation. For randomly sampled representative substrates, we confirmed that their cellular stability is in agreement with our screening prediction. Importantly, the two groups display unique features: monoubiquitinated substrates are smaller than the polyubiquitinated ones, are enriched in specific pathways, and, in humans, are structurally less disordered. We suggest that monoubiquitination-dependent degradation is more widespread than assumed previously, and plays key roles in various cellular processes.


Subject(s)
Proteasome Endopeptidase Complex/physiology , Proteins/metabolism , Ubiquitination , Humans , MCF-7 Cells , Proteasome Endopeptidase Complex/chemistry , Proteomics
14.
Nucleic Acids Res ; 44(D1): D423-8, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26615193

ABSTRACT

The PDBFlex database, available freely and with no login requirements at http://pdbflex.org, provides information on flexibility of protein structures as revealed by the analysis of variations between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects information on all instances of such depositions, identifying them by a 95% sequence identity threshold, performs analysis of their structural differences and clusters them according to their structural similarities for easy analysis. The PDBFlex contains tools and viewers enabling in-depth examination of structural variability including: 2D-scaling visualization of RMSD distances between structures of the same protein, graphs of average local RMSD in the aligned structures of protein chains, graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues), difference distance maps between all sets of coordinates and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using JSMol visualization software.


Subject(s)
Databases, Protein , Protein Conformation , Ligands , Models, Molecular
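
One of the views listed in the PDBFlex abstract above, the difference distance map, can be sketched in a few lines. This is a minimal illustration and not PDBFlex code: it takes two conformations with the same atoms in the same order, builds each internal distance matrix, and subtracts them. Because only internal distances are used, no superposition is needed. The function names and the three-point coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances within one conformation."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def difference_distance_map(coords_a, coords_b):
    """Absolute difference between the internal distance matrices of two conformations."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must contain the same number of points")
    dist_a = distance_matrix(coords_a)
    dist_b = distance_matrix(coords_b)
    return [[abs(x - y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(dist_a, dist_b)]

# Hypothetical three-point chain: straight in one conformation, bent in the other.
conformation_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
conformation_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

ddm = difference_distance_map(conformation_a, conformation_b)
for row in ddm:
    print(["%.3f" % value for value in row])
```

Neighbouring points keep their spacing, so those entries are zero; only the end-to-end entry changes, which is how such a map localizes a bend.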
15.
Bioinformatics ; 32(4): 602-4, 2016 Feb 15.
Article in English | MEDLINE | ID: mdl-26515826

ABSTRACT

UNLABELLED: Protael is a JavaScript library for creating interactive visualizations of biological sequences and various associated data. It allows users to generate high-quality vector graphics (SVG) and integrate it into web pages. AVAILABILITY AND IMPLEMENTATION: Protael distribution, documentation and examples are freely available at http://protael.org; source code is hosted at https://github.com/sanshu/protaeljs.


Subject(s)
Computer Graphics , Internet , Proteins/chemistry , Software , Humans , Programming Languages
16.
Bioinformatics ; 32(18): 2776-82, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27334472

ABSTRACT

MOTIVATION: Repeat proteins, which contain multiple repeats of short sequence motifs, form a large but seldom-studied group of proteins. Methods focusing on the analysis of 3D structures of such proteins identified many subtle effects in length distribution of individual motifs that are important for their functions. However, similar analysis has not yet been applied to the vast majority of repeat proteins with unknown 3D structures, mostly because of the extreme diversity of the underlying motifs and the resulting difficulty of detecting them. RESULTS: We developed FAIT, a sequence-based algorithm for the precise assignment of individual repeats in repeat proteins and introduced a framework to classify and compare aperiodicity patterns for large protein families. FAIT extracts repeat positions by post-processing FFAS alignment matrices with image processing methods. On examples of proteins with Leucine Rich Repeat (LRR) domains and other solenoid-like proteins, we show that the automated analysis with FAIT correctly identifies exact lengths of individual repeats based entirely on sequence information. AVAILABILITY AND IMPLEMENTATION: https://github.com/GodzikLab/FAIT CONTACT: adam@godziklab.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Proteins , Amino Acid Motifs , Repetitive Sequences, Amino Acid , Sequence Analysis, Protein
17.
Bioinformatics ; 31(13): 2098-105, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25701568

ABSTRACT

MOTIVATION: Most proteins consist of multiple domains, independent structural and evolutionary units that are often reshuffled in genomic rearrangements to form new protein architectures. Template-based modeling methods can often detect homologous templates for individual domains, but templates that could be used to model the entire query protein are often not available. RESULTS: We have developed a fast docking algorithm ab initio domain assembly (AIDA) for assembling multi-domain protein structures, guided by the ab initio folding potential. This approach can be extended to discontinuous domains (i.e. domains with 'inserted' domains). When tested on experimentally solved structures of multi-domain proteins, the relative domain positions were accurately found among top 5000 models in 86% of cases. AIDA server can use domain assignments provided by the user or predict them from the provided sequence. The latter approach is particularly useful for automated protein structure prediction servers. The blind test consisting of 95 CASP10 targets shows that domain boundaries could be successfully determined for 97% of targets. AVAILABILITY AND IMPLEMENTATION: The AIDA package as well as the benchmark sets used here are available for download at http://ffas.burnham.org/AIDA/. CONTACT: adam@sanfordburnham.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Models, Theoretical , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Software , Algorithms , Humans , Internet , Protein Interaction Domains and Motifs , Protein Structure, Tertiary , Sequence Analysis, Protein
18.
Nucleic Acids Res ; 42(Web Server issue): W430-5, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24957597

ABSTRACT

PubServer, available at http://pubserver.burnham.org/, is a tool to automatically collect, filter and analyze publications associated with groups of homologous proteins. Protein entries in databases such as Entrez Protein database at NCBI contain information about publications associated with a given protein. The scope of these publications varies a lot: they include studies focused on biochemical functions of individual proteins, but also reports from genome sequencing projects that introduce tens of thousands of proteins. Collecting and analyzing publications related to sets of homologous proteins help in functional annotation of novel protein families and in improving annotations of well-studied protein families or individual genes. However, performing such collection and analysis manually is a tedious and time-consuming process. PubServer automatically collects identifiers of homologous proteins using PSI-Blast, retrieves literature references from corresponding database entries and filters out publications unlikely to contain useful information about individual proteins. It also prepares simple vocabulary statistics from titles, abstracts and MeSH terms to identify the most frequently occurring keywords, which may help to quickly identify common themes in these publications. The filtering criteria applied to collected publications are user-adjustable. The results of the server are presented as an interactive page that allows re-filtering and different presentations of the output.


Subject(s)
Data Mining/methods , Sequence Homology, Amino Acid , Software , Internet , Molecular Sequence Annotation , Protein Structure, Tertiary , Proteins/classification , Proteins/genetics , PubMed , Sequence Analysis, Protein
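
The "simple vocabulary statistics" step mentioned in the PubServer abstract above could look roughly like the following. This is a sketch under stated assumptions, not the server's implementation: the stop-word list, the tokenizing pattern, the choice to count each word once per title, and the example titles are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "for", "to", "from", "by", "with", "on"}

def keyword_counts(texts):
    """Count in how many texts each non-stop-word occurs (document frequency)."""
    counts = Counter()
    for text in texts:
        # Lower-case words of at least two characters; each counted once per text.
        words = set(re.findall(r"[a-z][a-z0-9-]+", text.lower()))
        counts.update(words - STOPWORDS)
    return counts

# Made-up titles standing in for publications linked to a set of homologous proteins.
titles = [
    "Crystal structure of a putative hydrolase",
    "Hydrolase activity in a bacterial protein family",
    "Complete genome sequence of a soil bacterium",
]

counts = keyword_counts(titles)
print(counts.most_common(3))
```

Counting per title rather than per occurrence keeps a single long abstract from dominating the ranking; whether PubServer makes the same choice is not stated in the abstract.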
19.
Nucleic Acids Res ; 42(Web Server issue): W308-13, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24831546

ABSTRACT

AIDA: ab initio domain assembly server, available at http://ffas.burnham.org/AIDA/ is a tool that can identify domains in multi-domain proteins and then predict their 3D structures and relative spatial arrangements. The server is free and open to all users, and there is an option for a user to provide an e-mail to get the link to result page. Domains are evolutionary conserved and often functionally independent units in proteins. Most proteins, especially eukaryotic ones, consist of multiple domains while at the same time, most experimentally determined protein structures contain only one or two domains. As a result, often structures of individual domains in multi-domain proteins can be accurately predicted, but the mutual arrangement of different domains remains unknown. To address this issue we have developed AIDA program, which combines steps of identifying individual domains, predicting (separately) their structures and assembling them into multiple domain complexes using an ab initio folding potential to describe domain-domain interactions. AIDA server not only supports the assembly of a large number of continuous domains, but also allows the assembly of domains inserted into other domains. Users can also provide distance restraints to guide the AIDA energy minimization.


Subject(s)
Protein Structure, Tertiary , Software , Internet , Sequence Analysis, Protein
20.
Bioinformatics ; 30(5): 660-7, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24130308

ABSTRACT

MOTIVATION: Homology detection enables grouping proteins into families and prediction of their structure and function. The range of application of homology-based predictions can be significantly extended by using sequence profiles and incorporation of local structural features. However, incorporation of the latter terms varies a lot between existing methods, and together with many examples of distant relations not recognized even by the best methods, suggests that further improvements are still possible. RESULTS: Here we describe recent improvements to the fold and function assignment system (FFAS) method, including adding optimized structural features (experimental or predicted), 'symmetrical' Z-score calculation and re-ranking the templates with a neural network. The alignment accuracy in the new FFAS-3D is now 11% higher than the original and comparable with the most accurate template-based structure prediction algorithms. At the same time, FFAS-3D has high success rate at the Structural Classification of Proteins (SCOP) family, superfamily and fold levels. Importantly, FFAS-3D results are not highly correlated with other programs suggesting that it may significantly improve meta-predictions. FFAS-3D does not require 3D structures of the templates, as using predicted features instead of structure-derived does not lead to the decrease of accuracy. Because of that, FFAS-3D can be used for databases other than Protein Data Bank (PDB) such as Protein families database or Clusters of orthologous groups thus extending its applications to functional annotations of genomes and protein families. AVAILABILITY AND IMPLEMENTATION: FFAS-3D is available at http://ffas.godziklab.org.


Subject(s)
Protein Conformation , Sequence Homology, Amino Acid , Algorithms , Databases, Protein , Neural Networks, Computer , Protein Folding , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Sequence Alignment