Results 1 - 20 of 49
2.
Bioinformatics ; 39(39 Suppl 1): i357-i367, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387189

ABSTRACT

The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (〈ϕ,ψ,χ1,χ2,…〉) of any amino acid by a mixture of a product of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. The continuous space it uses to specify the dihedral angles provides an alternative to the commonly used rotamer libraries. These rotamer libraries discretize the space of dihedral angles into coarse angular bins, and cluster combinations of sidechain dihedral angles (〈χ1,χ2,…〉) as a function of backbone 〈ϕ,ψ〉 conformations. A 'good' model is one that is both concise and explains (compresses) observed data. Competing models can be compared directly and in particular our model is shown to outperform the Dunbrack rotamer library in terms of model complexity (by three orders of magnitude) and its fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under/over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, ranging from experimental structure refinement and de novo protein design to protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). AVAILABILITY AND IMPLEMENTATION: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.
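Illustrative note: the density described in this abstract, a mixture whose components are products of independent von Mises distributions over the dihedral angles, can be sketched as follows. This is a minimal sketch under stated assumptions, not the PhiSiCal implementation: the function names, the parameter layout, and the use of NumPy are all choices made here for illustration, and parameter estimation (the information-theoretic part of the work) is not shown.

```python
import numpy as np


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution on the circle (angles in radians)."""
    # np.i0 is the modified Bessel function of the first kind, order 0.
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))


def mixture_pdf(angles, weights, means, kappas):
    """Density of a mixture of products of von Mises distributions.

    angles  : (d,)   dihedral angles, e.g. (phi, psi, chi1), in radians
    weights : (k,)   mixing proportions, assumed to sum to 1
    means   : (k, d) mean angle of each component in each dimension
    kappas  : (k, d) concentration of each component in each dimension
    (All names and shapes are hypothetical, chosen for this sketch.)
    """
    angles = np.asarray(angles, dtype=float)
    means = np.asarray(means, dtype=float)
    kappas = np.asarray(kappas, dtype=float)
    # One density value per component and per angle, shape (k, d).
    per_angle = von_mises_pdf(angles[None, :], means, kappas)
    # Product over angles gives each component's density on the torus;
    # the weighted sum over components gives the mixture density.
    return float(np.dot(weights, per_angle.prod(axis=1)))
```

Because every factor depends on the angle only through a cosine, the density is periodic in each dihedral angle, which is what makes the model live on a torus rather than in ordinary Euclidean space.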


Subject(s)
Data Compression , Libraries , Amino Acids , Gene Library , Information Theory
3.
Curr Opin Struct Biol ; 79: 102539, 2023 04.
Article in English | MEDLINE | ID: mdl-36753924

ABSTRACT

Sequence alignment is fundamental for analyzing protein structure and function. For all but closely-related proteins, alignments based on structures are more accurate than alignments based purely on amino-acid sequences. However, the disparity between the large amount of sequence data and the relative paucity of experimentally-determined structures has precluded the general applicability of structure alignment. Based on the success of AlphaFold (and similar methods) in producing high-quality structure predictions, we suggest that when aligning homologous proteins that lack experimental structures, better results can be obtained by a structural alignment of predicted structures than by an alignment based only on amino-acid sequences. We present a quantitative evaluation, based on pairwise alignments of sequences and structures (both predicted and experimental), to support this hypothesis.


Subject(s)
Algorithms , Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment
4.
Bioinformatics ; 38(Suppl 1): i255-i263, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758808

ABSTRACT

MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Amino Acids , Proteins , Algorithms , Amino Acid Sequence , Proteins/chemistry , Reproducibility of Results , Sequence Alignment , Sequence Homology, Amino Acid
5.
Proteins ; 90(12): 2144-2147, 2022 12.
Article in English | MEDLINE | ID: mdl-35754316

ABSTRACT

The basic operation in analysis of protein evolution is alignment: the specification of residue-residue correspondences. A structural alignment is a specification of residue-residue correspondences based on the atomic positions in the structures of two or more proteins. It is well-known that structural alignments are more accurate, over a much wider range of divergence, than pairwise alignments based solely on sequences (for instance, computed with the Needleman-Wunsch algorithm with affine gap penalties). Given the amino-acid sequences of two proteins, alignments based solely on the sequences fall into "daylight", "twilight", and "midnight" zones, in which the fidelity of the correspondences diminishes in accuracy, and in strength of ability to distinguish true homology from noise. The success of AlphaFold2 in template-free modeling of three-dimensional structures from one-dimensional amino-acid sequence information implies that: given the amino-acid sequences of two or more proteins, in the absence of experimentally determined structures, reliable alignments, even for very highly diverged proteins, could in many cases be achieved by applying AlphaFold2 to the sequences, and performing structural alignments of the models.
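Illustrative note: the sequence-only baseline this abstract mentions, global alignment with affine gap penalties, is commonly computed with a three-matrix dynamic programme (the Gotoh formulation of Needleman-Wunsch). The sketch below returns only the optimal score. The scoring values are placeholder assumptions picked for readability; real protein alignments would normally use a substitution matrix, and nothing here is taken from the paper itself.

```python
def affine_global_score(a, b, match=1.0, mismatch=-1.0,
                        gap_open=-2.0, gap_extend=-0.5):
    """Optimal global alignment score with affine gap costs.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All default scores are illustrative assumptions.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap (gap in b)
    # Y: b[j-1] aligned to a gap (gap in a)
    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    X = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Y = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap or extend the one already running.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

Keeping separate matrices for "in a gap" and "not in a gap" is what lets opening a gap cost more than extending one, while the running time stays proportional to the product of the two sequence lengths.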


Subject(s)
Algorithms , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry
6.
Methods Mol Biol ; 2449: 43-91, 2022.
Article in English | MEDLINE | ID: mdl-35507259

ABSTRACT

Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become more and more important after the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including providing computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.


Subject(s)
Proteins , Software , Computational Biology , Databases, Protein , Protein Conformation , Proteins/chemistry
7.
J Mol Biol ; 433(21): 167181, 2021 10 15.
Article in English | MEDLINE | ID: mdl-34339724

ABSTRACT

We analyse paths through the regulatory networks that control gene-expression patterns in Yeast, in five different physiological states: cell cycle, DNA damage, stress response, diauxic shift, and sporulation. The network in each state is specified as a directed graph, containing different sets of edges connecting pairs selected from a combined set of 1475 nodes. Each network contains some nodes that have no parents, and others that have no children. We call these, respectively, 'source' and 'sink' nodes. For each network we enumerate paths between source and sink nodes. In a previous paper (Lesk and Konagurthu, 2020), we defined, extracted and compared the neighbourhoods of each transcription factor in different physiological states, and how the system reconfigures itself. Here we compare the usage of nodes and edges by different networks, and how they are assembled into paths. The picture that emerges is that the networks are not disjoint but show substantial sharing of nodes and edges; however, they assemble these materials into different sets of paths. Four of the networks, other than the cell-cycle network, contain paths between only a small fraction (<13%) of possible source-sink pairs. Although the cell-cycle network is not an outlier in terms of total number of nodes and edges, and number of sink nodes, it is very much an outlier in having a greater proportion of source-to-sink paths than the other networks.
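Illustrative note: the notions used in this abstract (source nodes with no parents, sink nodes with no children, and paths between them) can be sketched on a toy directed graph as below. The function names and the edge-list input format are assumptions made for this sketch; the paper's own software and data are not reproduced here. Enumerating all simple paths can take exponential time, so this is only suitable for small examples.

```python
from collections import defaultdict


def sources_and_sinks(edges):
    """Return (sources, sinks, children) for a list of (parent, child) edges."""
    children = defaultdict(set)
    has_parent, nodes = set(), set()
    for u, v in edges:
        children[u].add(v)
        has_parent.add(v)
        nodes.update((u, v))
    sources = sorted(n for n in nodes if n not in has_parent)
    sinks = sorted(n for n in nodes if not children.get(n))
    return sources, sinks, children


def simple_paths(children, start, goal, path=None):
    """Yield every simple (cycle-free) path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in sorted(children.get(start, ())):
        if nxt not in path:  # never revisit a node already on this path
            yield from simple_paths(children, nxt, goal, path)


def connected_fraction(edges):
    """Fraction of (source, sink) pairs joined by at least one path."""
    sources, sinks, children = sources_and_sinks(edges)
    pairs = [(s, t) for s in sources for t in sinks]
    linked = sum(
        1 for s, t in pairs
        if next(simple_paths(children, s, t), None) is not None
    )
    return linked / len(pairs) if pairs else 0.0
```

For the toy edge list `[("A", "C"), ("B", "C"), ("C", "D"), ("B", "E")]`, the sources are A and B, the sinks are D and E, and three of the four source-sink pairs are connected; `connected_fraction` is the kind of quantity the abstract reports as a percentage of possible source-sink pairs.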


Subject(s)
Cell Cycle/genetics , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Stress, Physiological/genetics , Transcription Factors/genetics , Computational Biology/methods , DNA Damage , Gene Expression Profiling , Gene Expression Regulation, Fungal , Gene Ontology , Molecular Sequence Annotation , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/classification , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , Spores, Fungal/genetics , Spores, Fungal/growth & development , Spores, Fungal/metabolism , Transcription Factors/classification , Transcription Factors/metabolism
8.
J Biol Chem ; 296: 100421, 2021.
Article in English | MEDLINE | ID: mdl-33609524

ABSTRACT

Intracellular organelles do not, as thought for a long time, act in isolation but are dynamically tethered together by entire machines responsible for interorganelle trafficking and positioning. Among the proteins responsible for tethering is the family of VAMP-associated proteins (VAPs) that appear in all eukaryotes and are localized primarily in the endoplasmic reticulum. The major functional role of VAPs is to tether the endoplasmic reticulum with different organelles and regulate lipid metabolism and transport. VAPs have gained increasing attention because of their role in human pathology where they contribute to infections by viruses and bacteria and participate in neurodegeneration. In this review, we discuss the structure, evolution, and functions of VAPs, focusing more specifically on VAP-B for its relationship with amyotrophic lateral sclerosis and other neurodegenerative diseases.


Subject(s)
Communicable Diseases/metabolism , Neurodegenerative Diseases/metabolism , Vesicular Transport Proteins/metabolism , Animals , Humans , Lipid Metabolism , Mutation , Vesicular Transport Proteins/genetics
9.
Bioinformatics ; 37(4): 551-558, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32976569

ABSTRACT

MOTIVATION: The gene expression regulatory network in yeast controls the selective implementation of the information contained in the genome sequence. We seek to understand how, in different physiological states, the network reconfigures itself to produce a different proteome. RESULTS: This article analyses this reconfiguration, focussing on changes in the local structure of the network. In particular, we define, extract and compare the 1-neighbourhoods of each transcription factor, where a 1-neighbourhood of a node in a network is the minimal subgraph of the network containing all nodes connected to the central node by an edge. We report the similarities and differences in the topologies and connectivities of these neighbourhoods in five physiological states for which data are available: cell cycle, DNA damage, stress response, diauxic shift and sporulation. Based on our analysis, it seems apt to regard the components of the regulatory network as 'software', and the responses to changes in state, 'reprogramming'.
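Illustrative note: the 1-neighbourhood defined in this abstract (a central node plus every node joined to it by an edge) can be extracted from an edge list as sketched below. Two points are assumptions of this sketch rather than statements from the paper: whether edges between the neighbours themselves belong to the subgraph (controlled here by a hypothetical `induced` flag), and the use of Jaccard similarity of node sets as one simple way to compare a transcription factor's neighbourhoods across physiological states.

```python
def one_neighbourhood(edges, centre, induced=False):
    """Return (nodes, edges) of the 1-neighbourhood of `centre`.

    edges   : iterable of directed (regulator, target) pairs
    induced : if True, also keep edges running between the neighbours;
              if False, keep only edges that touch the centre.
    """
    edges = list(edges)
    nodes = {centre}
    for u, v in edges:
        if u == centre:
            nodes.add(v)
        elif v == centre:
            nodes.add(u)
    if induced:
        kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    else:
        kept = [(u, v) for u, v in edges if centre in (u, v)]
    return nodes, kept


def jaccard(a, b):
    """Jaccard similarity of two node sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0
```

With toy networks `[("TF1", "G1"), ("TF1", "G2"), ("TF2", "TF1"), ("G1", "G2")]` and `[("TF1", "G2"), ("TF1", "G3"), ("TF2", "G3")]`, the neighbourhoods of TF1 share two of five nodes, so their Jaccard similarity is 0.4.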


Subject(s)
Gene Regulatory Networks , Saccharomyces cerevisiae , Cell Cycle , Saccharomyces cerevisiae/genetics , Software , Transcription Factors/genetics
10.
Proteins ; 88(12): 1557-1558, 2020 12.
Article in English | MEDLINE | ID: mdl-32662915

ABSTRACT

We have modeled modifications of a known ligand to the SARS-CoV-2 (COVID-19) protease, that can form a covalent adduct, plus additional ligand-protein hydrogen bonds.


Subject(s)
Antiviral Agents , Aphids , Coronavirus Infections , Insecticides , Pandemics , Pneumonia, Viral , Acetylcholinesterase , Animals , Betacoronavirus , COVID-19 , Cysteine Endopeptidases , Humans , Molecular Docking Simulation , Protease Inhibitors , SARS-CoV-2 , Viral Nonstructural Proteins
11.
Front Mol Biosci ; 7: 65, 2020.
Article in English | MEDLINE | ID: mdl-32373628
12.
Front Mol Biosci ; 7: 612920, 2020.
Article in English | MEDLINE | ID: mdl-33996891

ABSTRACT

What is the architectural "basis set" of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures (called concepts), typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence-structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.

13.
Methods Mol Biol ; 1958: 123-131, 2019.
Article in English | MEDLINE | ID: mdl-30945216

ABSTRACT

We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE data compression conference proceedings (DCC), 340-349, 2017). Specifically, this methodology uses the information-theoretic framework of minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the most (lossless) compression on the source collection of folding patterns represented as tableaux (matrix representations that capture the essence of protein folding patterns (Lesk, J Mol Graph. 13:159-164, 1995)). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.


Subject(s)
Amino Acid Motifs , Computational Biology/methods , Proteins/chemistry , Algorithms , Bayes Theorem , Data Compression , Humans , Models, Molecular , Protein Folding
14.
Nat Struct Mol Biol ; 25(6): 538-545, 2018 06.
Article in English | MEDLINE | ID: mdl-29872229

ABSTRACT

Arrestins regulate the signaling of ligand-activated, phosphorylated G-protein-coupled receptors (GPCRs). Different patterns of receptor phosphorylation (phosphorylation barcode) can modulate arrestin conformations, resulting in distinct functional outcomes (for example, desensitization, internalization, and downstream signaling). However, the mechanism of arrestin activation and how distinct receptor phosphorylation patterns could induce different conformational changes on arrestin are not fully understood. We analyzed how each arrestin amino acid contributes to its different conformational states. We identified a conserved structural motif that restricts the mobility of the arrestin finger loop in the inactive state and appears to be regulated by receptor phosphorylation. Distal and proximal receptor phosphorylation sites appear to selectively engage with distinct arrestin structural motifs (that is, micro-locks) to induce different arrestin conformations. These observations suggest a model in which different phosphorylation patterns of the GPCR C terminus can combinatorially modulate the conformation of the finger loop and other phosphorylation-sensitive structural elements to drive distinct arrestin conformation and functional outcomes.


Subject(s)
Arrestin/chemistry , Arrestin/metabolism , Receptors, G-Protein-Coupled/metabolism , Humans , Phosphorylation , Protein Conformation , Signal Transduction
15.
Bioinformatics ; 33(7): 1005-1013, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28065899

ABSTRACT

Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. Contact: arun.konagurthu@monash.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Compression , Models, Statistical , Proteins/chemistry , Sequence Alignment , Algorithms , Bayes Theorem , Reproducibility of Results , Software
16.
J Comput Biol ; 22(6): 487-97, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25695500

ABSTRACT

The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from its constituent vector sets under addition or deletion operation, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
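Illustrative note: the idea of additive sufficient statistics for least-squares superposition can be sketched as follows. The sketch assumes one natural choice of statistics (the count, the coordinate sums, the sums of squared norms, and the 3×3 sum of outer products) and recovers the minimum RMSD from them with an SVD in the style of the Kabsch method. The paper's own derivation and its C++ library may differ, and the class and attribute names here are hypothetical.

```python
import numpy as np


class SuperpositionStats:
    """Additive summary of paired 3D points (x_i, y_i); illustrative only."""

    def __init__(self, X=None, Y=None):
        self.n = 0
        self.sx, self.sy = np.zeros(3), np.zeros(3)  # coordinate sums
        self.sxx = self.syy = 0.0                    # sums of squared norms
        self.sxy = np.zeros((3, 3))                  # sum of outer products
        if X is not None:
            X, Y = np.asarray(X, float), np.asarray(Y, float)
            self.n = len(X)
            self.sx, self.sy = X.sum(axis=0), Y.sum(axis=0)
            self.sxx, self.syy = float((X * X).sum()), float((Y * Y).sum())
            self.sxy = X.T @ Y

    def _combine(self, other, sign):
        out = SuperpositionStats()
        out.n = self.n + sign * other.n
        out.sx = self.sx + sign * other.sx
        out.sy = self.sy + sign * other.sy
        out.sxx = self.sxx + sign * other.sxx
        out.syy = self.syy + sign * other.syy
        out.sxy = self.sxy + sign * other.sxy
        return out

    def __add__(self, other):   # merge two already-summarised sets
        return self._combine(other, +1)

    def __sub__(self, other):   # delete a previously added subset
        return self._combine(other, -1)

    def rmsd(self):
        """Minimum RMSD over rigid-body superpositions (proper rotations)."""
        n = self.n
        # Centred cross-covariance and centred total squared norm.
        H = self.sxy - np.outer(self.sx, self.sy) / n
        e0 = (self.sxx - self.sx @ self.sx / n) + (self.syy - self.sy @ self.sy / n)
        U, S, Vt = np.linalg.svd(H)
        # Flip the smallest singular value if the best fit would be a reflection.
        S[-1] *= np.sign(np.linalg.det(U) * np.linalg.det(Vt))
        return float(np.sqrt(max((e0 - 2.0 * S.sum()) / n, 0.0)))
```

Merging or removing fragments costs a fixed number of additions regardless of how many points each fragment contains, and the RMSD then needs only a 3×3 SVD. That is the sense in which updates take constant time once the constituent statistics are known.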


Subject(s)
Proteins/chemistry , Algorithms , Least-Squares Analysis , Models, Molecular , Protein Conformation , Sequence Alignment/methods
17.
Bioinformatics ; 30(17): i512-8, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161241

ABSTRACT

MOTIVATION: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments must be assessed has been identified as the main cause for the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field. RESULTS: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field. AVAILABILITY: http://lcb.infotech.monash.edu.au/I-value. SUPPLEMENTARY INFORMATION: Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html.


Subject(s)
Structural Homology, Protein , Algorithms , Data Compression , Data Interpretation, Statistical
18.
Acta Crystallogr D Biol Crystallogr ; 70(Pt 3): 904-6, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24598758

ABSTRACT

Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally reported to greater precision than the experimental structure determinations have actually achieved. By using information theory and data compression to study the compressibility of protein atomic coordinates, it is possible to quantify the amount of randomness in the coordinate data and thereby to determine the realistic precision of the reported coordinates. On average, the value of each Cα coordinate in a set of selected protein structures solved at a variety of resolutions is good to about 0.1 Å.


Subject(s)
Databases, Protein/standards , User-Computer Interface , Crystallography, X-Ray/standards , Dictionaries, Chemical as Topic , Magnetic Resonance Spectroscopy/standards , Microscopy, Electron/standards , Predictive Value of Tests , Random Allocation
19.
Proteins ; 82(3): 349-53, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24105818

ABSTRACT

Eph receptors comprise the largest known family of receptor tyrosine kinases in mammals. They bind members of a second family, the ephrins. As both Eph receptors and ephrins are membrane bound, interactions permit unusual bidirectional cell-cell signaling. Eph receptors and ephrins each form two classes, A and B, based on sequences, structures, and patterns of affinity: Class A Eph receptors bind class A ephrins, and class B Eph receptors bind class B ephrins. The only known exceptions are the receptor EphA4, which can bind ephrinB2 and ephrinB3 in addition to the ephrin-As (Bowden et al., Structure 2009;17:1386-1397); and EphB2, which can bind ephrin-A5 in addition to the ephrin-Bs (Himanen et al., Nat Neurosci 2004;7:501-509). A crystal structure is available of the interacting domains of the EphA4-ephrin B2 complex (wwPDB entry 2WO2) (Bowden et al., Structure 2009;17:1386-1397). In this complex, the ligand-binding domain of EphA4 adopts an EphB-like conformation. To understand why other cross-class EphA receptor-ephrinB complexes do not form, we modeled hypothetical complexes between (1) EphA4-ephrinB1, (2) EphA4-ephrinB3, and (3) EphA2-ephrinB2. We identify particular residues in the interface region, the size variations of which cause steric clashes that prevent formation of the unobserved complexes. The sizes of the sidechains of residues at these positions correlate with the pattern of binding affinity.


Subject(s)
Ephrins/chemistry , Ephrins/metabolism , Receptor, EphA4/chemistry , Receptor, EphA4/metabolism , Amino Acid Sequence , Humans , Models, Molecular , Molecular Sequence Data , Sequence Alignment , Surface Properties
20.
Methods Mol Biol ; 932: 51-9, 2013.
Article in English | MEDLINE | ID: mdl-22987346

ABSTRACT

We have developed a concise tableau representation of protein folding patterns, based on the order and contact patterns of elements of secondary structure: helices and strands of sheet. The tableaux provide a database, derived from the Protein Data Bank, minable for studies on the general principles of protein architecture, including investigation of the relationship between local supersecondary structure of proteins and the complete folding topology. This chapter outlines the tableau representation of protein folding patterns and methods to use it to identify structural and substructural similarities.


Subject(s)
Models, Molecular , Protein Folding , Proteins/chemistry , Software , Algorithms , Protein Structure, Secondary