2.
Bioinformatics ; 39(Suppl 1): i357-i367, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387189

ABSTRACT

The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (〈ϕ,ψ,χ1,χ2,…〉) of any amino acid by a mixture of a product of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. The continuous space it uses to specify the dihedral angles provides an alternative to the commonly used rotamer libraries. These rotamer libraries discretize the space of dihedral angles into coarse angular bins, and cluster combinations of sidechain dihedral angles (〈χ1,χ2,…〉) as a function of backbone 〈ϕ,ψ〉 conformations. A 'good' model is one that is both concise and explains (compresses) observed data. Competing models can be compared directly and in particular our model is shown to outperform the Dunbrack rotamer library in terms of model complexity (by three orders of magnitude) and its fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under/over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, ranging from experimental structure refinement and de novo protein design to protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). AVAILABILITY AND IMPLEMENTATION: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.


Subject(s)
Data Compression , Libraries , Amino Acids , Gene Library , Information Theory
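
The mixture-of-products form described in the abstract above can be illustrated with a minimal sketch. It assumes each component is a weight plus one (mean, concentration) pair per dihedral angle, with angles in radians; the function names, the data layout, and any parameter values are hypothetical and are not taken from PhiSiCal itself.

```python
import math
import random

def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution on the circle (angles in radians)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def mixture_pdf(angles, components):
    """Density of a mixture of products of von Mises distributions.

    components: list of (weight, [(mu, kappa), ...]) with one (mu, kappa)
    pair per dihedral angle. This layout is an illustrative assumption.
    """
    density = 0.0
    for weight, params in components:
        prod = weight
        for theta, (mu, kappa) in zip(angles, params):
            prod *= von_mises_pdf(theta, mu, kappa)
        density += prod
    return density

def sample(components, rng=random):
    """Draw one angle vector: pick a component by weight, then sample each angle."""
    weights = [w for w, _ in components]
    _, params = rng.choices(components, weights=weights, k=1)[0]
    return [rng.vonmisesvariate(mu, kappa) for mu, kappa in params]
```

Because each component factorises over the angles, both evaluating the density and sampling cost time proportional to the number of components times the number of angles. The power series is adequate for moderate concentrations; very large kappa values would need a scaled Bessel function to avoid overflow.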
3.
Curr Opin Struct Biol ; 79: 102539, 2023 04.
Article in English | MEDLINE | ID: mdl-36753924

ABSTRACT

Sequence alignment is fundamental for analyzing protein structure and function. For all but closely-related proteins, alignments based on structures are more accurate than alignments based purely on amino-acid sequences. However, the disparity between the large amount of sequence data and the relative paucity of experimentally-determined structures has precluded the general applicability of structure alignment. Based on the success of AlphaFold (and its likes) in producing high-quality structure predictions, we suggest that when aligning homologous proteins, lacking experimental structures, better results can be obtained by a structural alignment of predicted structures than by an alignment based only on amino-acid sequences. We present a quantitative evaluation, based on pairwise alignments of sequences and structures (both predicted and experimental) to support this hypothesis.


Subject(s)
Algorithms , Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment
4.
Proteins ; 90(12): 2144-2147, 2022 12.
Article in English | MEDLINE | ID: mdl-35754316

ABSTRACT

The basic operation in analysis of protein evolution is alignment: the specification of residue-residue correspondences. A structural alignment is a specification of residue-residue correspondences based on the atomic positions in the structures of two or more proteins. It is well-known that structural alignments are more accurate, over a much wider range of divergence, than pairwise alignments based solely on sequences, for instance those computed with the Needleman-Wunsch algorithm with affine gap penalties. Given the amino-acid sequences of two proteins, alignments based solely on the sequences fall into "daylight", "twilight", and "midnight" zones, in which the fidelity of the correspondences diminishes in accuracy, and in strength of ability to distinguish true homology from noise. The success of AlphaFold2 in template-free modeling of three-dimensional structures from one-dimensional amino-acid sequence information implies that: given the amino-acid sequences of two or more proteins, in the absence of experimentally determined structures, reliable alignments (even for very highly diverged proteins) could in many cases be achieved by applying AlphaFold2 to the sequences, and performing structural alignments of the models.


Subject(s)
Algorithms , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry
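
For readers unfamiliar with the sequence-only baseline named in the abstract above, the following is a minimal sketch of global alignment scoring with affine gap penalties (the three-matrix formulation usually attributed to Gotoh). The match, mismatch, and gap values are placeholder assumptions; practical protein alignment would use a substitution matrix instead of a flat match/mismatch score.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best global alignment score; a gap of length k costs gap_open + (k-1)*gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

The sketch returns only the score; recovering the residue-residue correspondences themselves would require a traceback through the three matrices.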
5.
Bioinformatics ; 38(Suppl 1): i255-i263, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758808

ABSTRACT

MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Amino Acids , Proteins , Algorithms , Amino Acid Sequence , Proteins/chemistry , Reproducibility of Results , Sequence Alignment , Sequence Homology, Amino Acid
6.
Methods Mol Biol ; 2449: 43-91, 2022.
Article in English | MEDLINE | ID: mdl-35507259

ABSTRACT

Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become more and more important after the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including providing computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.


Subject(s)
Proteins , Software , Computational Biology , Databases, Protein , Protein Conformation , Proteins/chemistry
7.
J Mol Biol ; 433(21): 167181, 2021 10 15.
Article in English | MEDLINE | ID: mdl-34339724

ABSTRACT

We analyse paths through the regulatory networks that control gene-expression patterns in Yeast, in five different physiological states: cell cycle, DNA damage, stress response, diauxic shift, and sporulation. The network in each state is specified as a directed graph, containing different sets of edges connecting pairs selected from a combined set of 1475 nodes. Each network contains some nodes that have no parents, and others that have no children. We call these, respectively, 'source' and 'sink' nodes. For each network we enumerate paths between source and sink nodes. In a previous paper (Lesk and Konagurthu, 2020), we defined, extracted and compared the neighbourhoods of each transcription factor in different physiological states, and how the system reconfigures itself. Here we compare the usage of nodes and edges by different networks, and how they are assembled into paths. The picture that emerges is that the networks are not disjoint but show substantial sharing of nodes and edges; however, they assemble these materials into different sets of paths. Four of the networks, other than the cell-cycle network, contain paths between only a small fraction (<13%) of possible source-sink pairs. Although the cell-cycle network is not an outlier in terms of total number of nodes and edges, and number of sink nodes, it is very much an outlier in having a greater proportion of source-to-sink paths than the other networks.


Subject(s)
Cell Cycle/genetics , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Stress, Physiological/genetics , Transcription Factors/genetics , Computational Biology/methods , DNA Damage , Gene Expression Profiling , Gene Expression Regulation, Fungal , Gene Ontology , Molecular Sequence Annotation , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/classification , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , Spores, Fungal/genetics , Spores, Fungal/growth & development , Spores, Fungal/metabolism , Transcription Factors/classification , Transcription Factors/metabolism
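
The source/sink definitions and the path enumeration described in the abstract above can be sketched as follows. The sketch assumes a network given as a list of directed (parent, child) edges and enumerates simple paths only (no node visited twice), which is an assumption made here to keep cycles finite; the function names are hypothetical.

```python
def sources_and_sinks(edges):
    """Sources have no parents; sinks have no children."""
    nodes, has_parent, has_child = set(), set(), set()
    for u, v in edges:
        nodes.update((u, v))
        has_child.add(u)
        has_parent.add(v)
    return sorted(nodes - has_parent), sorted(nodes - has_child)

def enumerate_paths(edges):
    """List every simple path that starts at a source and ends at a sink."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    sources, sinks = sources_and_sinks(edges)
    sink_set = set(sinks)
    paths = []

    def walk(node, path, on_path):
        if node in sink_set:
            paths.append(list(path))
            return
        for nxt in adjacency.get(node, []):
            if nxt not in on_path:  # skip nodes already on the path (cycle guard)
                path.append(nxt)
                on_path.add(nxt)
                walk(nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for s in sources:
        walk(s, [s], {s})
    return paths

def connected_fraction(edges):
    """Fraction of possible source-sink pairs joined by at least one path."""
    sources, sinks = sources_and_sinks(edges)
    total = len(sources) * len(sinks)
    joined = {(p[0], p[-1]) for p in enumerate_paths(edges)}
    return len(joined) / total if total else 0.0
```

The number of paths can grow exponentially with network size, so for large graphs a reachability search per source would be a cheaper way to obtain only the connected fraction.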
8.
J Biol Chem ; 296: 100421, 2021.
Article in English | MEDLINE | ID: mdl-33609524

ABSTRACT

Intracellular organelles do not, as thought for a long time, act in isolation but are dynamically tethered together by entire machines responsible for interorganelle trafficking and positioning. Among the proteins responsible for tethering is the family of VAMP-associated proteins (VAPs) that appear in all eukaryotes and are localized primarily in the endoplasmic reticulum. The major functional role of VAPs is to tether the endoplasmic reticulum with different organelles and regulate lipid metabolism and transport. VAPs have gained increasing attention because of their role in human pathology where they contribute to infections by viruses and bacteria and participate in neurodegeneration. In this review, we discuss the structure, evolution, and functions of VAPs, focusing more specifically on VAP-B for its relationship with amyotrophic lateral sclerosis and other neurodegenerative diseases.


Subject(s)
Communicable Diseases/metabolism , Neurodegenerative Diseases/metabolism , Vesicular Transport Proteins/metabolism , Animals , Humans , Lipid Metabolism , Mutation , Vesicular Transport Proteins/genetics
9.
Bioinformatics ; 37(4): 551-558, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32976569

ABSTRACT

MOTIVATION: The gene expression regulatory network in yeast controls the selective implementation of the information contained in the genome sequence. We seek to understand how, in different physiological states, the network reconfigures itself to produce a different proteome. RESULTS: This article analyses this reconfiguration, focussing on changes in the local structure of the network. In particular, we define, extract and compare the 1-neighbourhoods of each transcription factor, where a 1-neighbourhood of a node in a network is the minimal subgraph of the network containing all nodes connected to the central node by an edge. We report the similarities and differences in the topologies and connectivities of these neighbourhoods in five physiological states for which data are available: cell cycle, DNA damage, stress response, diauxic shift and sporulation. Based on our analysis, it seems apt to regard the components of the regulatory network as 'software', and the responses to changes in state, 'reprogramming'.


Subject(s)
Gene Regulatory Networks , Saccharomyces cerevisiae , Cell Cycle , Saccharomyces cerevisiae/genetics , Software , Transcription Factors/genetics
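
The 1-neighbourhood defined in the abstract above can be sketched in a few lines. Two points are assumptions made here, not statements from the article: edge direction is ignored when deciding which nodes are neighbours, and by default only edges touching the central node are kept (set `induced=True` to keep edges among the neighbours as well). The Jaccard index is one possible way to compare neighbourhoods across states, chosen for illustration.

```python
def one_neighbourhood(edges, centre, induced=False):
    """Return (nodes, edges) of the 1-neighbourhood of `centre`.

    edges: list of directed (regulator, target) pairs.
    """
    nodes = {centre}
    for u, v in edges:
        if u == centre:
            nodes.add(v)
        elif v == centre:
            nodes.add(u)
    if induced:
        kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    else:
        kept = [(u, v) for u, v in edges if centre in (u, v)]
    return nodes, kept

def jaccard(a, b):
    """Similarity of two node sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Comparing the node sets returned for the same transcription factor in two physiological states then gives a simple measure of how much its local wiring changes.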
10.
Proteins ; 88(12): 1557-1558, 2020 12.
Article in English | MEDLINE | ID: mdl-32662915

ABSTRACT

We have modeled modifications of a known ligand to the SARS-CoV-2 (COVID-19) protease, that can form a covalent adduct, plus additional ligand-protein hydrogen bonds.


Subject(s)
Antiviral Agents , Aphids , Coronavirus Infections , Insecticides , Pandemics , Pneumonia, Viral , Acetylcholinesterase , Animals , Betacoronavirus , COVID-19 , Cysteine Endopeptidases , Humans , Molecular Docking Simulation , Protease Inhibitors , SARS-CoV-2 , Viral Nonstructural Proteins
11.
Front Mol Biosci ; 7: 65, 2020.
Article in English | MEDLINE | ID: mdl-32373628
12.
Front Mol Biosci ; 7: 612920, 2020.
Article in English | MEDLINE | ID: mdl-33996891

ABSTRACT

What is the architectural "basis set" of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures, called concepts, typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence-structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.

13.
Methods Mol Biol ; 1958: 123-131, 2019.
Article in English | MEDLINE | ID: mdl-30945216

ABSTRACT

We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE data compression conference proceedings (DCC), 340-349, 2017). Specifically, this methodology uses the information-theoretic framework of minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the most (lossless) compression on the source collection of folding patterns represented as tableaux (matrix representations that capture the essence of protein folding patterns (Lesk, J Mol Graph. 13:159-164, 1995)). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.


Subject(s)
Amino Acid Motifs , Computational Biology/methods , Proteins/chemistry , Algorithms , Bayes Theorem , Data Compression , Humans , Models, Molecular , Protein Folding
14.
Nat Struct Mol Biol ; 25(6): 538-545, 2018 06.
Article in English | MEDLINE | ID: mdl-29872229

ABSTRACT

Arrestins regulate the signaling of ligand-activated, phosphorylated G-protein-coupled receptors (GPCRs). Different patterns of receptor phosphorylation (phosphorylation barcode) can modulate arrestin conformations, resulting in distinct functional outcomes (for example, desensitization, internalization, and downstream signaling). However, the mechanism of arrestin activation and how distinct receptor phosphorylation patterns could induce different conformational changes on arrestin are not fully understood. We analyzed how each arrestin amino acid contributes to its different conformational states. We identified a conserved structural motif that restricts the mobility of the arrestin finger loop in the inactive state and appears to be regulated by receptor phosphorylation. Distal and proximal receptor phosphorylation sites appear to selectively engage with distinct arrestin structural motifs (that is, micro-locks) to induce different arrestin conformations. These observations suggest a model in which different phosphorylation patterns of the GPCR C terminus can combinatorially modulate the conformation of the finger loop and other phosphorylation-sensitive structural elements to drive distinct arrestin conformation and functional outcomes.


Subject(s)
Arrestin/chemistry , Arrestin/metabolism , Receptors, G-Protein-Coupled/metabolism , Humans , Phosphorylation , Protein Conformation , Signal Transduction
15.
Bioinformatics ; 33(7): 1005-1013, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28065899

ABSTRACT

Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. Contact: arun.konagurthu@monash.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Compression , Models, Statistical , Proteins/chemistry , Sequence Alignment , Algorithms , Bayes Theorem , Reproducibility of Results , Software
16.
J Comput Biol ; 22(6): 487-97, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25695500

ABSTRACT

The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from their constituent vector sets under addition or deletion operations, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.


Subject(s)
Proteins/chemistry , Algorithms , Least-Squares Analysis , Models, Molecular , Protein Conformation , Sequence Alignment/methods
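
The additivity described in the abstract above can be illustrated with a small sketch. The particular statistics used here (count, coordinate sums, sum of outer products, and sum of squared norms) are an assumption chosen for illustration and are not necessarily the set derived in the article; the function names are hypothetical. The final step of extracting the optimal rotation from the centred cross-covariance matrix (for example by singular value decomposition, as in Kabsch's method) is deliberately omitted.

```python
def pair_stats(pairs):
    """Accumulate statistics over corresponding 3D vector pairs (x, y).

    Returns a flat list of 17 numbers:
    [n, sum x (3), sum y (3), sum of x*y^T (9, row-major), sum(|x|^2 + |y|^2)].
    """
    s = [0.0] * 17
    for x, y in pairs:
        s[0] += 1.0
        for a in range(3):
            s[1 + a] += x[a]
            s[4 + a] += y[a]
            s[16] += x[a] * x[a] + y[a] * y[a]
            for b in range(3):
                s[7 + 3 * a + b] += x[a] * y[b]
    return s

def combine(s, t, sign=1.0):
    """Add (sign=1) or delete (sign=-1) a constituent set in constant time."""
    return [p + sign * q for p, q in zip(s, t)]

def centred(s):
    """Centred cross-covariance matrix and centred sum of squares from the statistics."""
    n = s[0]
    sx, sy = s[1:4], s[4:7]
    cov = [[s[7 + 3 * a + b] - sx[a] * sy[b] / n for b in range(3)] for a in range(3)]
    e0 = s[16] - sum(sx[a] ** 2 + sy[a] ** 2 for a in range(3)) / n
    return cov, e0
```

The point of the design is that `combine` touches a fixed number of values regardless of how many vector pairs each constituent set contains, so extending or trimming an already-superposed fragment does not require revisiting its coordinates.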
17.
Bioinformatics ; 30(17): i512-8, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161241

ABSTRACT

MOTIVATION: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments must be assessed has been identified as the main cause for the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field. RESULTS: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field. AVAILABILITY: http://lcb.infotech.monash.edu.au/I-value. SUPPLEMENTARY INFORMATION: Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html.


Subject(s)
Structural Homology, Protein , Algorithms , Data Compression , Data Interpretation, Statistical
18.
Acta Crystallogr D Biol Crystallogr ; 70(Pt 3): 904-6, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24598758

ABSTRACT

Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally reported to greater precision than the experimental structure determinations have actually achieved. By using information theory and data compression to study the compressibility of protein atomic coordinates, it is possible to quantify the amount of randomness in the coordinate data and thereby to determine the realistic precision of the reported coordinates. On average, the value of each Cα coordinate in a set of selected protein structures solved at a variety of resolutions is good to about 0.1 Å.


Subject(s)
Databases, Protein/standards , User-Computer Interface , Crystallography, X-Ray/standards , Dictionaries, Chemical as Topic , Magnetic Resonance Spectroscopy/standards , Microscopy, Electron/standards , Predictive Value of Tests , Random Allocation
19.
Proteins ; 82(3): 349-53, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24105818

ABSTRACT

Eph receptors comprise the largest known family of receptor tyrosine kinases in mammals. They bind members of a second family, the ephrins. As both Eph receptors and ephrins are membrane bound, interactions permit unusual bidirectional cell-cell signaling. Eph receptors and ephrins each form two classes, A and B, based on sequences, structures, and patterns of affinity: Class A Eph receptors bind class A ephrins, and class B Eph receptors bind class B ephrins. The only known exceptions are the receptor EphA4, which can bind ephrinB2 and ephrinB3 in addition to the ephrin-As (Bowden et al., Structure 2009;17:1386-1397); and EphB2, which can bind ephrin-A5 in addition to the ephrin-Bs (Himanen et al., Nat Neurosci 2004;7:501-509). A crystal structure is available of the interacting domains of the EphA4-ephrin B2 complex (wwPDB entry 2WO2) (Bowden et al., Structure 2009;17:1386-1397). In this complex, the ligand-binding domain of EphA4 adopts an EphB-like conformation. To understand why other cross-class EphA receptor-ephrinB complexes do not form, we modeled hypothetical complexes between (1) EphA4-ephrinB1, (2) EphA4-ephrinB3, and (3) EphA2-ephrinB2. We identify particular residues in the interface region, the size variations of which cause steric clashes that prevent formation of the unobserved complexes. The sizes of the sidechains of residues at these positions correlate with the pattern of binding affinity.


Subject(s)
Ephrins/chemistry , Ephrins/metabolism , Receptor, EphA4/chemistry , Receptor, EphA4/metabolism , Amino Acid Sequence , Humans , Models, Molecular , Molecular Sequence Data , Sequence Alignment , Surface Properties
20.
Methods Mol Biol ; 932: 51-9, 2013.
Article in English | MEDLINE | ID: mdl-22987346

ABSTRACT

We have developed a concise tableau representation of protein folding patterns, based on the order and contact patterns of elements of secondary structure: helices and strands of sheet. The tableaux provide a database, derived from the protein data bank, minable for studies on the general principles of protein architecture, including investigation of the relationship between local supersecondary structure of proteins and the complete folding topology. This chapter outlines the tableaux representation of protein folding patterns and methods to use them to identify structural and substructural similarities.


Subject(s)
Models, Molecular , Protein Folding , Proteins/chemistry , Software , Algorithms , Protein Structure, Secondary