
1.

ColocQuiaL: a QTL-GWAS colocalization pipeline.

Chen, Brian Y; Bone, William P; Lorenz, Kim; Levin, Michael; Ritchie, Marylyn D; Voight, Benjamin F.

Bioinformatics ; 38(18): 4409-4411, 2022 Sep 15.

Article in English | MEDLINE | ID: mdl-35894642

ABSTRACT

SUMMARY: Identifying genomic features responsible for genome-wide association study (GWAS) signals has proven to be a difficult challenge; many researchers have turned to colocalization analysis of GWAS signals with expression quantitative trait loci (eQTL) and splicing quantitative trait loci (sQTL) to connect GWAS signals to candidate causal genes. The ColocQuiaL pipeline provides a framework to perform these colocalization analyses at scale across the genome and returns summary files and locus visualization plots to allow for detailed review of the results. As an example, we used ColocQuiaL to perform colocalization between a recent type 2 diabetes GWAS and Genotype-Tissue Expression (GTEx) v8 single-tissue eQTL and sQTL data. AVAILABILITY AND IMPLEMENTATION: ColocQuiaL is primarily written in R and is freely available on GitHub: https://github.com/bvoightlab/ColocQuiaL.

Subject(s)

Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Quantitative Trait Loci , Diabetes Mellitus, Type 2/genetics , Genomics , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease

2.

Analysis of Protein-Protein Interactions for Intermolecular Bond Prediction.

Tam, Justin Z; Palumbo, Talulla; Miwa, Julie M; Chen, Brian Y.

Molecules ; 27(19), 2022 Sep 21.

Article in English | MEDLINE | ID: mdl-36234723

ABSTRACT

Protein-protein interactions often involve a complex system of intermolecular interactions between residues and atoms at the binding site. A comprehensive exploration of these interactions can help reveal key residues involved in protein-protein recognition that are not obvious using other protein analysis techniques. This paper presents and extends DiffBond, a novel method for identifying and classifying intermolecular bonds while applying standard definitions of bonds in chemical literature to explain protein interactions. DiffBond predicted intermolecular bonds from four protein complexes: Barnase-Barstar, Rap1a-raf, SMAD2-SMAD4, and a subset of complexes formed from three-finger toxins and nAChRs. Based on validation through manual literature search and through comparison of two protein complexes from the SKEMPI dataset, DiffBond was able to identify intermolecular ionic bonds and hydrogen bonds with high precision and recall, and identify salt bridges with high precision. DiffBond predictions on bond existence were also strongly correlated with observations of Gibbs free energy change and electrostatic complementarity in mutational experiments. DiffBond can be a powerful tool for predicting and characterizing influential residues in protein-protein interactions, and its predictions can support research in mutational experiments and drug design.

Subject(s)

Hydrogen Bonding , Binding Sites , Biophysical Phenomena , Static Electricity

3.

Exploring Protein Cavities through Rigidity Analysis.

Mason, Stephanie; Chen, Brian Y; Jagodzinski, Filip.

Molecules ; 23(2), 2018 Feb 07.

Article in English | MEDLINE | ID: mdl-29414909

ABSTRACT

The geometry of cavities in the surfaces of proteins facilitates a variety of biochemical functions. To better understand the biochemical nature of protein cavities, the shape, size, chemical properties, and evolutionary nature of functional and nonfunctional surface cavities have been exhaustively surveyed in protein structures. The rigidity of surface cavities, however, is not immediately available as a characteristic of structure data, and is thus more difficult to examine. Using rigidity analysis for assessing and analyzing molecular rigidity, this paper performs the first survey of the relationships between cavity properties, such as size and residue content, and how they correspond to cavity rigidity. Our survey measured a variety of rigidity metrics on 120,323 cavities from 12,785 sequentially non-redundant protein chains. We used VASP-E, a volume-based algorithm for analyzing cavity geometry. Our results suggest that rigidity properties of protein cavities are dependent on cavity surface area.

Subject(s)

Models, Theoretical , Proteins/chemistry , Algorithms

4.

VASP-E: specificity annotation with a volumetric analysis of electrostatic isopotentials.

Chen, Brian Y.

PLoS Comput Biol ; 10(8): e1003792, 2014 Aug.

Article in English | MEDLINE | ID: mdl-25166865

ABSTRACT

Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins.

Subject(s)

Computational Biology/methods , Molecular Sequence Annotation/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Amino Acids/chemistry , Animals , Cattle , Cluster Analysis , Humans , Protein Binding , Software , Static Electricity , Surface Properties

5.

Multi-Modal Diagnosis of Alzheimer's Disease using Interpretable Graph Convolutional Networks.

Zhou, Houliang; He, Lifang; Chen, Brian Y; Shen, Li; Zhang, Yu.

IEEE Trans Med Imaging ; PP, 2024 Jul 23.

Article in English | MEDLINE | ID: mdl-39042528

ABSTRACT

The interconnection between brain regions in neurological disease encodes vital information for the advancement of biomarkers and diagnostics. Although graph convolutional networks are widely applied for discovering brain connection patterns that point to disease conditions, the potential of connection patterns that arise from multiple imaging modalities has yet to be fully realized. In this paper, we propose a multi-modal sparse interpretable GCN framework (SGCN) for the detection of Alzheimer's disease (AD) and its prodromal stage, known as mild cognitive impairment (MCI). In our experimentation, SGCN learned the sparse regional importance probability to find signature regions of interest (ROIs), and the connective importance probability to reveal disease-specific brain network connections. We evaluated SGCN on the Alzheimer's Disease Neuroimaging Initiative database with multi-modal brain images and demonstrated that the ROI features learned by SGCN were effective for enhancing AD status identification. The identified abnormalities were significantly correlated with AD-related clinical symptoms. We further interpreted the identified brain dysfunctions at the level of large-scale neural systems and sex-related connectivity abnormalities in AD/MCI. The salient ROIs and the prominent brain connectivity abnormalities interpreted by SGCN are considerably important for developing novel biomarkers. These findings contribute to a better understanding of the network-based disorder via multi-modal diagnosis and offer the potential for precision diagnostics. The source code is available at https://github.com/Houliang-Zhou/SGCN.

6.

An aggregate analysis of many predicted structures to reduce errors in protein structure comparison caused by conformational flexibility.

Godshall, Brian G; Tang, Yisheng; Yang, Wenjie; Chen, Brian Y.

BMC Struct Biol ; 13 Suppl 1: S10, 2013.

Article in English | MEDLINE | ID: mdl-24564934

ABSTRACT

BACKGROUND: Conformational flexibility creates errors in the comparison of protein structures. Even small changes in backbone or sidechain conformation can radically alter the shape of ligand binding cavities. These changes can cause structure comparison programs to overlook functionally related proteins with remote evolutionary similarities, and cause others to incorrectly conclude that closely related proteins have different binding preferences, when their specificities are actually similar. Towards the latter effort, this paper applies protein structure prediction algorithms to enhance the classification of homologous proteins according to their binding preferences, despite radical conformational differences. METHODS: Specifically, structure prediction algorithms can be used to "remodel" existing structures against the same template. This process can return proteins in very different conformations to similar, objectively comparable states. Operating on close homologs exploits the accuracy of structure predictions on closely related proteins, but structure prediction is often a nondeterministic process. Identical inputs can generate subtly different models with very different binding cavities that make structure comparison difficult. We present a first method to mitigate such errors, called "medial remodeling", that examines a large number of predicted structures to eliminate extreme models of the same binding cavity. RESULTS: Our results, on the enolase and tyrosine kinase superfamilies, demonstrate that remodeling can enable proteins in very different conformations to be returned to states that can be objectively compared. Structures that would have been erroneously classified as having different binding preferences were often correctly classified after remodeling, while structures that would have been correctly classified as having different binding preferences almost always remained distinct. The enolase superfamily, which exhibited less sequential diversity than the tyrosine kinase superfamily, was classified more accurately after remodeling than the tyrosine kinases. Medial remodeling reduced errors from models with unusual perturbations that distort the shape of the binding site, enhancing classification accuracy. CONCLUSIONS: This paper demonstrates that protein structure prediction can compensate for conformational variety in the comparison of protein-ligand binding sites. While protein structure prediction introduces new uncertainties into the structure comparison problem, our results indicate that unusual models can be ignored through an analysis of many models, using techniques like medial remodeling. These results point to applications of protein structure comparison that extend beyond existing crystal structures.

Subject(s)

Phosphopyruvate Hydratase/chemistry , Protein-Tyrosine Kinases/chemistry , Algorithms , Binding Sites , Models, Molecular , Phosphopyruvate Hydratase/metabolism , Protein Binding , Protein Conformation , Protein-Tyrosine Kinases/metabolism , Structural Homology, Protein

7.

MarkUs: a server to navigate sequence-structure-function space.

Fischer, Markus; Zhang, Qiangfeng Cliff; Dey, Fabian; Chen, Brian Y; Honig, Barry; Petrey, Donald.

Nucleic Acids Res ; 39(Web Server issue): W357-61, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21672961

ABSTRACT

We describe MarkUs, a web server for analysis and comparison of the structural and functional properties of proteins. In contrast to a 'structure in/function out' approach to protein function annotation, the server is designed to be highly interactive and to allow flexibility in the examination of possible functions, suggested either automatically by various similarity measures or specified by a user directly. This is combined with tools that allow a user to assess independently whether or not a suggested function is consistent with the bioinformatic and biophysical properties of a given query structure, further allowing the user to generate testable hypotheses. The server is available at http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:Mark-Us.

Subject(s)

Proteins/chemistry , Software , Bacterial Proteins/chemistry , Internet , Protein Conformation , Proteins/physiology , Structure-Activity Relationship

8.

Modeling regionalized volumetric differences in protein-ligand binding cavities.

Chen, Brian Y; Bandyopadhyay, Soutir.

Proteome Sci ; 10 Suppl 1: S6, 2012 Jun 21.

Article in English | MEDLINE | ID: mdl-22759583

ABSTRACT

Identifying elements of protein structures that create differences in protein-ligand binding specificity is an essential method for explaining the molecular mechanisms underlying preferential binding. In some cases, influential mechanisms can be visually identified by experts in structural biology, but subtler mechanisms, whose significance may only be apparent from the analysis of many structures, are harder to find. To assist this process, we present a geometric algorithm and two statistical models for identifying significant structural differences in protein-ligand binding cavities. We demonstrate these methods in an analysis of sequentially nonredundant structural representatives of the canonical serine proteases and the enolase superfamily. Here, we observed that statistically significant structural variations identified experimentally established determinants of specificity. We also observed that an analysis of individual regions inside cavities can reveal areas where small differences in shape can correspond to differences in specificity.

9.

Sparse Interpretation of Graph Convolutional Networks for Multi-Modal Diagnosis of Alzheimer's Disease.

Zhou, Houliang; Zhang, Yu; Chen, Brian Y; Shen, Li; He, Lifang.

Med Image Comput Comput Assist Interv ; 13438: 469-478, 2022 Sep.

Article in English | MEDLINE | ID: mdl-36827208

ABSTRACT

The interconnected quality of brain regions in neurological disease has immense importance for the development of biomarkers and diagnostics. While Graph Convolutional Network (GCN) methods are fundamentally compatible with discovering the connected role of brain regions in disease, current methods apply limited consideration for node features and their connectivity in brain network analysis. In this paper, we propose a sparse interpretable GCN framework (SGCN) for the identification and classification of Alzheimer's disease (AD) using brain imaging data with multiple modalities. SGCN applies an attention mechanism with sparsity to identify the most discriminative subgraph structure and important node features for the detection of AD. The model learns the sparse importance probabilities for each node feature and edge with entropy, ℓ1, and mutual information regularization. We then utilized this information to find signature regions of interest (ROIs), and emphasize the disease-specific brain network connections by detecting the significant difference of connectives between regions in healthy control (HC) and AD groups. We evaluated SGCN on the ADNI database with imaging data from three modalities, including VBM-MRI, FDG-PET, and AV45-PET, and observed that the importance probabilities it learned are effective for disease status identification and the sparse interpretability of disease-specific ROI features and connections. The salient ROIs detected and the most discriminative network connections interpreted by our method show a high correspondence with previous neuroimaging evidence associated with AD.
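The entropy and ℓ1 regularization mentioned in this abstract can be illustrated with a minimal sketch. It assumes the importance probabilities are plain numbers in [0, 1]; the function name `sparsity_penalty`, the weights, and the averaging are hypothetical choices made for illustration, not taken from the SGCN source code, and the mutual information term is omitted.

```python
import math

def sparsity_penalty(probs, l1_weight=1.0, entropy_weight=1.0, eps=1e-12):
    """Illustrative penalty on a list of importance probabilities in [0, 1].

    The l1 term rewards masks with few active entries; the binary-entropy
    term rewards probabilities that sit near 0 or 1 rather than near 0.5.
    Both weights are hypothetical hyperparameters.
    """
    n = len(probs)
    l1 = sum(abs(p) for p in probs) / n
    entropy = sum(
        -p * math.log(p + eps) - (1 - p) * math.log(1 - p + eps)
        for p in probs
    ) / n
    return l1_weight * l1 + entropy_weight * entropy

# A nearly binary, mostly-zero mask is penalized less than an undecided one.
decided = sparsity_penalty([0.0, 1.0, 0.0])
undecided = sparsity_penalty([0.5, 0.5, 0.5])
```

In a training loop a term of this kind would typically be added to the classification loss, so that the learned mask stays both sparse and close to binary.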

10.

DeepVASP-E: A Flexible Analysis of Electrostatic Isopotentials for Finding and Explaining Mechanisms that Control Binding Specificity.

Quintana, Felix M; Kong, Zhaoming; He, Lifang; Chen, Brian Y.

Pac Symp Biocomput ; 27: 56-67, 2022.

Article in English | MEDLINE | ID: mdl-34890136

ABSTRACT

Amino acids that play a role in binding specificity can be identified with many methods, but few techniques identify the biochemical mechanisms by which they act. To address a part of this problem, we present DeepVASP-E, an algorithm that can suggest electrostatic mechanisms that influence specificity. DeepVASP-E uses convolutional neural networks to classify an electrostatic representation of ligand binding sites into specificity categories. It also uses class activation mapping to identify regions of electrostatic potential that are salient for classification. We hypothesize that electrostatic regions that are salient for classification are also likely to play a biochemical role in achieving specificity. Our findings, on two families of proteins with electrostatic influences on specificity, suggest that large salient regions can identify amino acids that have an electrostatic role in binding, and that DeepVASP-E is an effective classifier of ligand binding sites.

Subject(s)

Computational Biology , Proteins , Binding Sites , Humans , Neural Networks, Computer , Protein Binding , Static Electricity

11.

HBcompare: Classifying Ligand Binding Preferences with Hydrogen Bond Topology.

Tam, Justin Z; Kong, Zhaoming; Ahmed, Omar; He, Lifang; Chen, Brian Y.

Biomolecules ; 12(11), 2022 Oct 28.

Article in English | MEDLINE | ID: mdl-36358939

ABSTRACT

This paper presents HBcompare, a method that classifies protein structures according to ligand binding preference categories by analyzing hydrogen bond topology. HBcompare excludes other characteristics of protein structure so that, in the event of accurate classification, it can implicate the involvement of hydrogen bonds in selective binding. This approach contrasts with methods that represent many aspects of protein structure, because holistic representations cannot associate classification with just one characteristic. To our knowledge, HBcompare is the first technique with this capability. On five datasets of proteins that catalyze similar reactions with different preferred ligands, HBcompare correctly categorized proteins with similar ligand binding preferences 89.5% of the time. Using only hydrogen bond topology, classification accuracy with HBcompare surpassed standard structure-based comparison algorithms that use atomic coordinates. As a tool for implicating the role of hydrogen bonds in protein function categories, HBcompare represents a first step towards the automatic explanation of biochemical mechanisms.

Subject(s)

Algorithms , Proteins , Hydrogen Bonding , Ligands , Models, Molecular , Proteins/chemistry , Protein Binding , Binding Sites

12.

VASP: a volumetric analysis of surface properties yields insights into protein-ligand binding specificity.

Chen, Brian Y; Honig, Barry.

PLoS Comput Biol ; 6(8), 2010 Aug 12.

Article in English | MEDLINE | ID: mdl-20814581

ABSTRACT

Many algorithms that compare protein structures can reveal similarities that suggest related biological functions, even at great evolutionary distances. Proteins with related function often exhibit differences in binding specificity, but few algorithms identify structural variations that affect specificity. To address this problem, we describe the Volumetric Analysis of Surface Properties (VASP), a novel volumetric analysis tool for the comparison of binding sites in aligned protein structures. VASP uses solid volumes to represent protein shape and the shape of surface cavities, clefts and tunnels that are defined with other methods. Our approach, inspired by techniques from constructive solid geometry, enables the isolation of volumetrically conserved and variable regions within three dimensionally superposed volumes. We applied VASP to compute a comparative volumetric analysis of the ligand binding sites formed by members of the steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains and the serine proteases. Within both families, VASP isolated individual amino acids that create structural differences between ligand binding cavities that are known to influence differences in binding specificity. Also, VASP isolated cavity subregions that differ between ligand binding cavities which are essential for differences in binding specificity. As such, VASP should prove a valuable tool in the study of protein-ligand binding specificity.

Subject(s)

Algorithms , Phosphoproteins/chemistry , Protein Interaction Domains and Motifs , Serine Proteases/chemistry , Amino Acid Motifs , Amino Acids/chemistry , Binding Sites , Humans , Ligands , Models, Molecular , Protein Binding , Substrate Specificity , Surface Properties
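
The constructive solid geometry idea in this abstract, isolating conserved and variable regions of superposed volumes, can be illustrated with a toy sketch. It assumes each cavity is approximated by a set of integer grid points, which is a simplification chosen here and not necessarily how VASP represents solids; `sphere_voxels` and the two example cavities are hypothetical.

```python
def sphere_voxels(center, radius):
    """Integer grid points inside a sphere: a crude stand-in for a cavity volume."""
    cx, cy, cz = center
    r = int(radius)
    return {
        (x, y, z)
        for x in range(cx - r, cx + r + 1)
        for y in range(cy - r, cy + r + 1)
        for z in range(cz - r, cz + r + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
    }

# Two hypothetical cavities, already superposed in a common coordinate frame.
cavity_a = sphere_voxels((0, 0, 0), 3)
cavity_b = sphere_voxels((2, 0, 0), 3)

conserved = cavity_a & cavity_b   # CSG intersection: region shared by both
only_in_a = cavity_a - cavity_b   # CSG difference: region unique to cavity A
only_in_b = cavity_b - cavity_a   # CSG difference: region unique to cavity B
```

With this representation, the size of each region is its voxel count, and a residue could be linked to a variable region by checking which of its atoms lie near the voxels in `only_in_a` or `only_in_b`.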

13.

DiffBond: A Method for Predicting Intermolecular Bond Formation.

Tam, Justin; Palumbo, Talulla; Miwa, Julie M; Chen, Brian Y.

Proceedings (IEEE Int Conf Bioinformatics Biomed) ; 2021: 2574-2586, 2021 Dec.

Article in English | MEDLINE | ID: mdl-35378834

ABSTRACT

Many tools that explore models of protein complexes are also able to analyze interactions between specific residues and atoms. A comprehensive exploration of these interactions can often uncover aspects of protein-protein recognition that are not obvious using other protein analysis techniques. This paper describes DiffBond, a novel method for searching for intermolecular interactions between protein complexes while differentiating between three different types of interaction: hydrogen bonds, ionic bonds, and salt bridges. DiffBond incorporates textbook definitions of these three interactions while contending with uncertainties that are inherent in computational models of interacting proteins. We used it to examine the barnase-barstar, Rap1a-raf, and Smad2-Smad4 complexes, as well as a subset of protein complexes formed between three-finger toxins and nAChRs. Based on electrostatic interactions established by previous experimental studies, DiffBond was able to identify ionic and hydrogen bonds with high precision and recall, and identify salt bridges with high precision. In combination with other electrostatic analysis methods, DiffBond can be a useful tool in helping predict influential amino acids in protein-protein interactions and characterizing the type of interaction.

14.

Multi-Trait Genome-Wide Association Study of Atherosclerosis Detects Novel Pleiotropic Loci.

Bellomo, Tiffany R; Bone, William P; Chen, Brian Y; Gawronski, Katerina A B; Zhang, David; Park, Joseph; Levin, Michael; Tsao, Noah; Klarin, Derek; Lynch, Julie; Assimes, Themistocles L; Gaziano, J Michael; Wilson, Peter W; Cho, Kelly; Vujkovic, Marijana; O'Donnell, Christopher J; Chang, Kyong-Mi; Tsao, Philip S; Rader, Daniel J; Ritchie, Marylyn D; Damrauer, Scott M; Voight, Benjamin F.

Front Genet ; 12: 787545, 2021.

Article in English | MEDLINE | ID: mdl-35186008

ABSTRACT

Although affecting different arterial territories, the related atherosclerotic vascular diseases coronary artery disease (CAD) and peripheral artery disease (PAD) share similar risk factors and have shared pathobiology. To identify novel pleiotropic loci associated with atherosclerosis, we performed a joint analysis of their shared genetic architecture, along with that of common risk factors. Using summary statistics from genome-wide association studies of nine traits, comprising two atherosclerotic diseases (CAD, PAD) and seven atherosclerosis risk factors (body mass index, smoking initiation, type 2 diabetes, low density lipoprotein, high density lipoprotein, total cholesterol, and triglycerides), we performed 15 separate multi-trait genetic association scans, which resulted in 25 novel pleiotropic loci not yet reported as genome-wide significant for their respective traits. Colocalization with single-tissue eQTLs identified candidate causal genes at 14 of the detected signals. Notably, the signal between PAD and LDL-C at the PCSK6 locus affects PCSK6 splicing in human liver tissue and induced pluripotent derived hepatocyte-like cells. These results show that joint analysis of related atherosclerotic disease traits and their risk factors allowed identification of unified biology that may offer the opportunity for therapeutic manipulation. The signal at PCSK6 represents possible shared causal biology where existing inhibitors may be able to be leveraged for novel therapies.

15.

Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction.

Bryant, Drew H; Moll, Mark; Chen, Brian Y; Fofanov, Viacheslav Y; Kavraki, Lydia E.

BMC Bioinformatics ; 11: 242, 2010 May 11.

Article in English | MEDLINE | ID: mdl-20459833

ABSTRACT

BACKGROUND: Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels. RESULTS: This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine Substructural Clusters (SCs). SCs characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, segregation by conformation, and organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative motif for each protein cluster among the SCs determined by FASST to build motif ensembles that are shown through a series of function prediction experiments to improve the function prediction power of existing motifs. CONCLUSIONS: FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used for the thorough investigation of structure-function relationships. The application of MESH allows for an automated, statistically rigorous procedure for incorporating structural variation data into protein function prediction pipelines. Our work provides an unbiased, automated assessment of the structural variability of identified binding site substructures among protein structure families and a technique for exploring the relation of substructural variation to protein function. As available proteomic data continues to expand, the techniques proposed will be indispensable for the large-scale analysis and interpretation of structural data.

Subject(s)

Enzymes/chemistry , Proteins/chemistry , Proteomics/methods , Amino Acid Motifs , Binding Sites , Databases, Protein , Enzymes/metabolism , Models, Molecular , Protein Conformation , Protein Folding , Proteins/metabolism , Sequence Analysis, Protein

16.

Mapping of ligand-binding cavities in proteins.

Andersson, C David; Chen, Brian Y; Linusson, Anna.

Proteins ; 78(6): 1408-22, 2010 May 01.

Article in English | MEDLINE | ID: mdl-20034113

ABSTRACT

The complex interactions between proteins and small organic molecules (ligands) are intensively studied because they play key roles in biological processes and drug activities. Here, we present a novel approach to characterize and map the ligand-binding cavities of proteins without direct geometric comparison of structures, based on Principal Component Analysis of cavity properties (related mainly to size, polarity, and charge). This approach can provide valuable information on the similarities and dissimilarities of binding cavities due to mutations, between-species differences, and flexibility upon ligand binding. The presented results show that information on ligand-binding cavity variations can complement information on protein similarity obtained from sequence comparisons. The predictive aspect of the method is exemplified by successful predictions of serine proteases that were not included in the model construction. The presented strategy to compare ligand-binding cavities of related and unrelated proteins has many potential applications within protein and medicinal chemistry, for example in the characterization and mapping of "orphan structures", selection of protein structures for docking studies in structure-based design, and identification of proteins for selectivity screens in drug design programs.

Subject(s)

Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Binding Sites , Cluster Analysis , Databases, Protein , Humans , Ligands , Peptide Hydrolases/chemistry , Principal Component Analysis , Protein Structure, Secondary , Sequence Homology, Amino Acid

17.

Precise parallel volumetric comparison of molecular surfaces and electrostatic isopotentials.

Georgiev, Georgi D; Dodd, Kevin F; Chen, Brian Y.

Algorithms Mol Biol ; 15: 11, 2020.

Article in English | MEDLINE | ID: mdl-32489400

ABSTRACT

Geometric comparisons of binding sites and their electrostatic properties can identify subtle variations that select different binding partners and subtle similarities that accommodate similar partners. Because subtle features are central for explaining how proteins achieve specificity, algorithmic efficiency and geometric precision are central to algorithmic design. To address these concerns, this paper presents pClay, the first algorithm to perform parallel and arbitrarily precise comparisons of molecular surfaces and electrostatic isopotentials as geometric solids. pClay was presented at the 2019 Workshop on Algorithms in Bioinformatics (WABI 2019) and is described in expanded detail here, especially with regard to the comparison of electrostatic isopotentials. Earlier methods have generally used parallelism to enhance computational throughput; pClay is the first algorithm to use parallelism to make arbitrarily high precision comparisons practical. It is also the first method to demonstrate that high precision comparisons of geometric solids can yield more precise structural inferences than algorithms that use existing standards of precision. One advantage of added precision is that statistical models can be trained with more accurate data. Using structural data from an existing method, a model of steric variations between binding cavities can overlook 53% of authentic steric influences on specificity, whereas a model trained with data from pClay overlooks none. Our results also demonstrate the parallel performance of pClay on both workstation CPUs and a 61-core Xeon Phi. While slower on one core, additional processor cores rapidly outpaced single core performance and existing methods. Based on these results, it is clear that pClay has applications in the automatic explanation of binding mechanisms and in the rational design of protein binding preferences.

18.

Prediction of enzyme function based on 3D templates of evolutionarily important amino acids.

Kristensen, David M; Ward, R Matthew; Lisewski, Andreas Martin; Erdin, Serkan; Chen, Brian Y; Fofanov, Viacheslav Y; Kimmel, Marek; Kavraki, Lydia E; Lichtarge, Olivier.

BMC Bioinformatics ; 9: 17, 2008 Jan 11.

Article in English | MEDLINE | ID: mdl-18190718

ABSTRACT

BACKGROUND: Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. RESULTS: Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the same first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required both to geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than that of either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. CONCLUSION: These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
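
The plurality vote mentioned in the abstract above can be illustrated with a minimal sketch. This is not the ETA pipeline's code: the function name `plurality_vote`, the use of three-digit Enzyme Commission prefixes as strings, and the choice to return no prediction on a tie are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def plurality_vote(match_annotations: Sequence[str]) -> Optional[str]:
    """Return the annotation carried by the most matches.

    Hypothetical helper: each element is the annotation (for example a
    three-digit EC prefix such as "3.4.21") of one protein matched by a
    template. Returning None when the top count is tied, or when there
    are no matches, is an assumption of this sketch.
    """
    counts = Counter(match_annotations)
    if not counts:
        return None
    ranked = counts.most_common(2)
    # A tie for first place means there is no single most likely function.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

A template whose matches carry the annotations "3.4.21", "3.4.21", and "1.1.1" would be assigned "3.4.21" under this rule, while a template with one match each for two different annotations would receive no prediction.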

Subject(s)

Amino Acid Motifs/genetics , Enzymes , Evolution, Molecular , Artificial Intelligence , Databases, Protein , Enzymes/chemistry , Enzymes/genetics , Enzymes/metabolism , Likelihood Functions , Models, Biological , Pattern Recognition, Automated , Protein Conformation , Proteome , Sequence Alignment , Sequence Homology, Amino Acid , Structural Homology, Protein , Structure-Activity Relationship

19.

The MASH pipeline for protein function prediction and an algorithm for the geometric refinement of 3D motifs.

Chen, Brian Y; Fofanov, Viacheslav Y; Bryant, Drew H; Dodson, Bradley D; Kristensen, David M; Lisewski, Andreas M; Kimmel, Marek; Lichtarge, Olivier; Kavraki, Lydia E.

J Comput Biol ; 14(6): 791-816, 2007.

Article in English | MEDLINE | ID: mdl-17691895

ABSTRACT

The development of new and effective drugs is strongly affected by the need to identify drug targets and to reduce side effects. Resolving these issues depends partially on a thorough understanding of the biological function of proteins. Unfortunately, the experimental determination of protein function is expensive and time consuming. To support and accelerate the determination of protein functions, algorithms for function prediction are designed to gather evidence indicating functional similarity with well studied proteins. One such approach is the MASH pipeline, described in the first half of this paper. MASH identifies matches of geometric and chemical similarity between motifs, representing known functional sites, and substructures of functionally uncharacterized proteins (targets). Observations from several research groups concur that statistically significant matches can indicate functionally related active sites. One major subproblem is the design of effective motifs, which have many matches to functionally related targets (sensitive motifs), and few matches to functionally unrelated targets (specific motifs). Current techniques select and combine structural, physical, and evolutionary properties to generate motifs that mirror functional characteristics in active sites. This approach ignores incidental similarities that may occur with functionally unrelated proteins. To address this problem, we have developed Geometric Sieving (GS), a parallel distributed algorithm that efficiently refines motifs, designed by existing methods, into optimized motifs with maximal geometric and chemical dissimilarity from all known protein structures. In exhaustive comparison of all possible motifs based on the active sites of 10 well-studied proteins, we observed that optimized motifs were among the most sensitive and specific.

Subject(s)

Algorithms , Computational Biology , Proteins/chemistry , Proteins/metabolism , Amino Acid Motifs , Binding Sites , Databases, Protein , Models, Molecular , Protein Structure, Tertiary , Software

20.

Cavity scaling: automated refinement of cavity-aware motifs in protein function prediction.

Chen, Brian Y; Bryant, Drew H; Fofanov, Viacheslav Y; Kristensen, David M; Cruess, Amanda E; Kimmel, Marek; Lichtarge, Olivier; Kavraki, Lydia E.

J Bioinform Comput Biol ; 5(2a): 353-82, 2007 Apr.

Article in English | MEDLINE | ID: mdl-17589966

ABSTRACT

Algorithms for geometric and chemical comparison of protein substructure can be useful for many applications in protein function prediction. These motif matching algorithms identify matches of geometric and chemical similarity between well-studied functional sites (motifs) and substructures of functionally uncharacterized proteins (targets). For the purpose of function prediction, the accuracy of motif matching algorithms can be evaluated with the number of statistically significant matches to functionally related proteins, true positives (TPs), and the number of statistically significant matches to functionally unrelated proteins, false positives (FPs). Our earlier work developed cavity-aware motifs which use motif points to represent functionally significant atoms and C-spheres to represent functionally significant volumes. We observed that cavity-aware motifs match significantly fewer FPs than motifs containing only motif points. We also observed that high-impact C-spheres, which significantly contribute to the reduction of FPs, can be isolated automatically with a technique we call Cavity Scaling. This paper extends our earlier work by demonstrating that C-spheres can be used to accelerate point-based geometric and chemical comparison algorithms, maintaining accuracy while reducing runtime. We also demonstrate that the placement of C-spheres can significantly affect the number of TPs and FPs identified by a cavity-aware motif. While the optimal placement of C-spheres remains a difficult open problem, we compared two logical placement strategies to better understand C-sphere placement.

Subject(s)

Algorithms , Amino Acid Motifs , Models, Chemical , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Structure-Activity Relationship

