Results 1 - 20 of 51
2.
Bioinformatics ; 39(39 Suppl 1): i357-i367, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387189

Abstract

The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (〈ϕ,ψ,χ1,χ2,…〉) of any amino acid by a mixture of products of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. Because it specifies the dihedral angles in a continuous space, it provides an alternative to the commonly used rotamer libraries, which discretize the space of dihedral angles into coarse angular bins and cluster combinations of sidechain dihedral angles (〈χ1,χ2,…〉) as a function of backbone 〈ϕ,ψ〉 conformations. A 'good' model is one that is both concise and explains (compresses) the observed data. Competing models can be compared directly; in particular, our model is shown to outperform the Dunbrack rotamer library in terms of model complexity (by three orders of magnitude) and fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under- and over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, including experimental structure refinement, de novo protein design, and protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). AVAILABILITY AND IMPLEMENTATION: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.

Subjects
Data Compression , Libraries , Amino Acids , Gene Library , Information Theory
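To illustrate what "a mixture of products of von Mises distributions" over dihedral angles means in practice, here is a minimal Python sketch that draws angle vectors from such a mixture using the standard library's `random.vonmisesvariate`. The two components and all their weights, means and concentrations are made-up placeholders, not PhiSiCal parameters, and the function is not the PhiSiCal sampling program. Within each component the angles are treated as independent, which is what "product" means here.

```python
import math
import random

# Hypothetical mixture over (phi, psi, chi1); NOT PhiSiCal parameters.
# Each component is (weight, [(mu, kappa), ...]) with one (mu, kappa) per angle.
# Angles are in radians.
MIXTURE = [
    (0.6, [(-1.1, 8.0), (-0.8, 6.0), (-1.05, 10.0)]),
    (0.4, [(-2.1, 5.0), (2.3, 4.0), (3.1, 12.0)]),
]

def wrap(angle):
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def sample(mixture, rng=random):
    """Draw one dihedral-angle vector from a mixture of products of von Mises."""
    weights = [w for w, _ in mixture]
    # Pick a component with probability proportional to its weight.
    _, params = rng.choices(mixture, weights=weights, k=1)[0]
    # Angles are independent within a component, so sample each separately.
    return [wrap(rng.vonmisesvariate(mu, kappa)) for mu, kappa in params]
```

For reproducible draws, pass a seeded generator, e.g. `sample(MIXTURE, random.Random(0))`. Large `kappa` values give sharply peaked angles; `kappa` near zero approaches a uniform distribution on the circle.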
3.
Curr Opin Struct Biol ; 79: 102539, 2023 04.
Article in English | MEDLINE | ID: mdl-36753924

Abstract

Sequence alignment is fundamental for analyzing protein structure and function. For all but closely related proteins, alignments based on structures are more accurate than alignments based purely on amino-acid sequences. However, the disparity between the large amount of sequence data and the relative paucity of experimentally determined structures has precluded the general applicability of structure alignment. Based on the success of AlphaFold (and similar methods) in producing high-quality structure predictions, we suggest that when aligning homologous proteins that lack experimental structures, better results can be obtained by a structural alignment of predicted structures than by an alignment based only on amino-acid sequences. We present a quantitative evaluation, based on pairwise alignments of sequences and structures (both predicted and experimental), to support this hypothesis.

Subjects
Algorithms , Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment
4.
Proteins ; 90(12): 2144-2147, 2022 12.
Article in English | MEDLINE | ID: mdl-35754316

Abstract

The basic operation in the analysis of protein evolution is alignment: the specification of residue-residue correspondences. A structural alignment is a specification of residue-residue correspondences based on the atomic positions in the structures of two or more proteins. It is well known that structural alignments are more accurate, over a much wider range of divergence, than pairwise alignments based solely on sequences, for instance those computed with the Needleman-Wunsch algorithm with affine gap penalties. Given the amino-acid sequences of two proteins, alignments based solely on the sequences fall into "daylight", "twilight", and "midnight" zones, in which the correspondences progressively diminish in accuracy and in their ability to distinguish true homology from noise. The success of AlphaFold2 in template-free modeling of three-dimensional structures from one-dimensional amino-acid sequence information implies the following: given the amino-acid sequences of two or more proteins, in the absence of experimentally determined structures, reliable alignments (even for very highly diverged proteins) could in many cases be achieved by applying AlphaFold2 to the sequences and performing structural alignments of the models.

Subjects
Algorithms , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry
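The abstract above cites Needleman-Wunsch with affine gap penalties as the sequence-only baseline. A common way to compute it is Gotoh's three-state dynamic programme, sketched below in Python for the optimal global score only (no traceback). The match, mismatch and gap values are illustrative assumptions; real protein aligners use a substitution matrix such as BLOSUM62 instead of a single match/mismatch pair.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Optimal global alignment score with affine gaps (Gotoh's recurrence).

    A gap of length L scores gap_open + (L - 1) * gap_extend.
    Scoring values are placeholders, not a recommended protein scheme.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend   # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend   # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, `affine_global_score("ACGT", "AT")` prefers one gap of length 2 (cost -4) between two matches over two separate gaps, because opening a gap is penalised more heavily than extending one.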
5.
Bioinformatics ; 38(Suppl 1): i255-i263, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758808

Abstract

MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, we weight every possible sequence alignment by its posterior probability to derive a formal mathematical expectation, and we develop an efficient algorithm for computing the distance between alternative alignments, allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments as a function of (the Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects
Amino Acids , Proteins , Algorithms , Amino Acid Sequence , Proteins/chemistry , Reproducibility of Results , Sequence Alignment , Sequence Homology, Amino Acid
6.
Methods Mol Biol ; 2449: 43-91, 2022.
Article in English | MEDLINE | ID: mdl-35507259

Abstract

Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata, for instance information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools that allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences and of protein function, and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become increasingly important following the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.

Subjects
Proteins , Software , Computational Biology , Databases, Protein , Protein Conformation , Proteins/chemistry
7.
J Mol Biol ; 433(21): 167181, 2021 10 15.
Article in English | MEDLINE | ID: mdl-34339724

Abstract

We analyse paths through the regulatory networks that control gene-expression patterns in yeast in five different physiological states: cell cycle, DNA damage, stress response, diauxic shift, and sporulation. The network in each state is specified as a directed graph containing a different set of edges connecting pairs of nodes selected from a combined set of 1475 nodes. Each network contains some nodes that have no parents and others that have no children; we call these 'source' and 'sink' nodes, respectively. For each network we enumerate paths between source and sink nodes. In a previous paper (Lesk and Konagurthu, 2020), we defined, extracted and compared the neighbourhoods of each transcription factor in different physiological states, and examined how the system reconfigures itself. Here we compare the usage of nodes and edges by different networks, and how they are assembled into paths. The picture that emerges is that the networks are not disjoint but share substantial numbers of nodes and edges; however, they assemble these materials into different sets of paths. The four networks other than the cell-cycle network contain paths between only a small fraction (<13%) of possible source-sink pairs. Although the cell-cycle network is not an outlier in terms of its total numbers of nodes and edges, or its number of sink nodes, it is very much an outlier in having a greater proportion of source-to-sink paths than the other networks.

Subjects
Cell Cycle/genetics , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Stress, Physiological/genetics , Transcription Factors/genetics , Computational Biology/methods , DNA Damage , Gene Expression Profiling , Gene Expression Regulation, Fungal , Gene Ontology , Molecular Sequence Annotation , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/classification , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , Spores, Fungal/genetics , Spores, Fungal/growth & development , Spores, Fungal/metabolism , Transcription Factors/classification , Transcription Factors/metabolism
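The source/sink definitions above translate directly into code. The Python sketch below, assuming a network given as a list of hypothetical (parent, child) edge pairs, finds the source and sink nodes and determines which source-sink pairs are joined by at least one directed path, which is the quantity behind the "<13% of possible source-sink pairs" comparison. It checks reachability by breadth-first search; it does not enumerate every individual path, as the article does, and it is not the authors' code.

```python
from collections import deque

def sources_and_sinks(edges):
    """Return (sources, sinks) of a directed graph given as (parent, child) pairs."""
    nodes = {u for e in edges for u in e}
    has_parent = {v for _, v in edges}
    has_child = {u for u, _ in edges}
    # Sources have no parents; sinks have no children.
    return nodes - has_parent, nodes - has_child

def connected_source_sink_pairs(edges):
    """Set of (source, sink) pairs joined by at least one directed path."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    sources, sinks = sources_and_sinks(edges)
    pairs = set()
    for s in sources:
        # Breadth-first search over everything reachable from this source.
        seen = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in children.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        pairs.update((s, t) for t in seen & sinks)
    return pairs
```

The fraction of connected pairs is then `len(pairs) / (len(sources) * len(sinks))`; for the toy edge list `[("TF1", "TF2"), ("TF2", "G1"), ("TF3", "G2"), ("TF1", "G3")]` (hypothetical node names) three of the six possible pairs are connected.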
8.
J Biol Chem ; 296: 100421, 2021.
Article in English | MEDLINE | ID: mdl-33609524

Abstract

Intracellular organelles do not act in isolation, as was long thought, but are dynamically tethered together by entire machineries responsible for interorganelle trafficking and positioning. Among the proteins responsible for tethering is the family of VAMP-associated proteins (VAPs), which appear in all eukaryotes and are localized primarily in the endoplasmic reticulum. The major functional role of VAPs is to tether the endoplasmic reticulum to different organelles and to regulate lipid metabolism and transport. VAPs have gained increasing attention because of their role in human pathology, where they contribute to infections by viruses and bacteria and participate in neurodegeneration. In this review, we discuss the structure, evolution, and functions of VAPs, focusing in particular on VAP-B because of its relationship with amyotrophic lateral sclerosis and other neurodegenerative diseases.

Subjects
Communicable Diseases/metabolism , Neurodegenerative Diseases/metabolism , Vesicular Transport Proteins/metabolism , Animals , Humans , Lipid Metabolism , Mutation , Vesicular Transport Proteins/genetics
9.
Bioinformatics ; 37(4): 551-558, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32976569

Abstract

MOTIVATION: The gene expression regulatory network in yeast controls the selective implementation of the information contained in the genome sequence. We seek to understand how, in different physiological states, the network reconfigures itself to produce a different proteome. RESULTS: This article analyses this reconfiguration, focussing on changes in the local structure of the network. In particular, we define, extract and compare the 1-neighbourhoods of each transcription factor, where the 1-neighbourhood of a node in a network is the minimal subgraph of the network containing all nodes connected to the central node by an edge. We report the similarities and differences in the topologies and connectivities of these neighbourhoods in five physiological states for which data are available: cell cycle, DNA damage, stress response, diauxic shift and sporulation. Based on our analysis, it seems apt to regard the components of the regulatory network as 'software', and the responses to changes in state as 'reprogramming'.

Subjects
Gene Regulatory Networks , Saccharomyces cerevisiae , Cell Cycle , Saccharomyces cerevisiae/genetics , Software , Transcription Factors/genetics
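The 1-neighbourhood defined above can be extracted with a few set operations. In the Python sketch below, a neighbour is any node joined to the centre by an edge in either direction. Whether edges among the neighbours themselves belong to the "minimal subgraph" is our interpretation, not something the abstract settles, so the hypothetical `induced` flag lets you keep them (induced subgraph) or keep only the edges touching the centre.

```python
def one_neighbourhood(edges, centre, induced=True):
    """Nodes and edges of the 1-neighbourhood of `centre` in a directed graph.

    `edges` is a list of (parent, child) pairs. Neighbours are nodes joined to
    `centre` by an edge in either direction. With induced=True, edges among the
    neighbours are also kept; with induced=False, only edges touching `centre`
    are kept. Both readings are assumptions about the article's definition.
    """
    nbrs = {v for u, v in edges if u == centre} | {u for u, v in edges if v == centre}
    nodes = nbrs | {centre}
    if induced:
        kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    else:
        kept = [(u, v) for u, v in edges if centre in (u, v)]
    return nodes, kept
```

Comparing the same transcription factor across two physiological states then amounts to calling this function on each state's edge list and comparing the returned node and edge sets.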
10.
Proteins ; 88(12): 1557-1558, 2020 12.
Article in English | MEDLINE | ID: mdl-32662915

Abstract

We have modeled modifications of a known ligand of the SARS-CoV-2 (COVID-19) protease that can form a covalent adduct as well as additional ligand-protein hydrogen bonds.

Subjects
Antiviral Agents , Aphids , Coronavirus Infections , Insecticides , Pandemics , Pneumonia, Viral , Acetylcholinesterase , Animals , Betacoronavirus , COVID-19 , Cysteine Endopeptidases , Humans , Molecular Docking Simulation , Protease Inhibitors , SARS-CoV-2 , Viral Nonstructural Proteins
11.
Front Mol Biosci ; 7: 65, 2020.
Article in English | MEDLINE | ID: mdl-32373628
12.
Front Mol Biosci ; 7: 612920, 2020.
Article in English | MEDLINE | ID: mdl-33996891

Abstract

What is the architectural "basis set" of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures, called concepts, typically at the subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence-structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic, provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.

13.
Methods Mol Biol ; 1958: 123-131, 2019.
Article in English | MEDLINE | ID: mdl-30945216

Abstract

We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE Data Compression Conference proceedings (DCC), 340-349, 2017). Specifically, this methodology uses the information-theoretic framework of the minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the most (lossless) compression of the source collection of folding patterns, represented as tableaux: matrix representations that capture the essence of protein folding patterns (Lesk, J Mol Graph. 13:159-164, 1995). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.

Subjects
Amino Acid Motifs , Computational Biology/methods , Proteins/chemistry , Algorithms , Bayes Theorem , Data Compression , Humans , Models, Molecular , Protein Folding
14.
Nat Struct Mol Biol ; 25(6): 538-545, 2018 06.
Article in English | MEDLINE | ID: mdl-29872229

Abstract

Arrestins regulate the signaling of ligand-activated, phosphorylated G-protein-coupled receptors (GPCRs). Different patterns of receptor phosphorylation (phosphorylation barcode) can modulate arrestin conformations, resulting in distinct functional outcomes (for example, desensitization, internalization, and downstream signaling). However, the mechanism of arrestin activation, and how distinct receptor phosphorylation patterns could induce different conformational changes in arrestin, are not fully understood. We analyzed how each arrestin amino acid contributes to its different conformational states. We identified a conserved structural motif that restricts the mobility of the arrestin finger loop in the inactive state and appears to be regulated by receptor phosphorylation. Distal and proximal receptor phosphorylation sites appear to selectively engage with distinct arrestin structural motifs (that is, micro-locks) to induce different arrestin conformations. These observations suggest a model in which different phosphorylation patterns of the GPCR C terminus can combinatorially modulate the conformation of the finger loop and other phosphorylation-sensitive structural elements to drive distinct arrestin conformations and functional outcomes.

Subjects
Arrestin/chemistry , Arrestin/metabolism , Receptors, G-Protein-Coupled/metabolism , Humans , Phosphorylation , Protein Conformation , Signal Transduction
15.
Bioinformatics ; 33(7): 1005-1013, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28065899

Abstract

Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved in explaining the protein coordinates using that alignment. Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. Contact: arun.konagurthu@monash.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects
Data Compression , Models, Statistical , Proteins/chemistry , Sequence Alignment , Algorithms , Bayes Theorem , Reproducibility of Results , Software
16.
J Comput Biol ; 22(6): 487-97, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25695500

Abstract

The problem of superposing two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant-time) computation of superpositions (and sufficient statistics) of vector sets that are composed from their constituent vector sets under addition or deletion operations, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of methods that commonly superpose vector sets under addition or deletion operations, which previously carried out these operations ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.

Subjects
Proteins/chemistry , Algorithms , Least-Squares Analysis , Models, Molecular , Protein Conformation , Sequence Alignment/methods
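The additivity described above can be seen with the standard ingredients of least-squares superposition: the pair count, the coordinate sums, the sum of outer products x_i y_i^T, and the sums of squared norms. The Python sketch below keeps these sums, combines them by addition (merging disjoint sets of pairs) or subtraction (deleting a previously added subset), and recovers the centred cross-covariance matrix from which the Kabsch (SVD) or Horn (quaternion) method obtains the optimal rotation. It is a sketch of the general idea under these assumptions, not the article's derivation or its C++ library, and it stops before the rotation step.

```python
from dataclasses import dataclass, field

def _zeros3x3():
    return [[0.0] * 3 for _ in range(3)]

@dataclass
class SuperpositionStats:
    """Additive sums for least-squares superposition of paired 3-D vectors (sketch)."""
    n: int = 0
    sx: list = field(default_factory=lambda: [0.0] * 3)   # sum of x_i
    sy: list = field(default_factory=lambda: [0.0] * 3)   # sum of y_i
    sxy: list = field(default_factory=_zeros3x3)          # sum of outer products x_i y_i^T
    sxx: float = 0.0                                      # sum of |x_i|^2
    syy: float = 0.0                                      # sum of |y_i|^2

    @classmethod
    def from_pairs(cls, xs, ys):
        """Accumulate the sums from corresponding point lists xs[i] <-> ys[i]."""
        st = cls()
        for x, y in zip(xs, ys):
            st.n += 1
            for a in range(3):
                st.sx[a] += x[a]
                st.sy[a] += y[a]
                for b in range(3):
                    st.sxy[a][b] += x[a] * y[b]
            st.sxx += sum(c * c for c in x)
            st.syy += sum(c * c for c in y)
        return st

    def _combine(self, other, sign):
        return SuperpositionStats(
            self.n + sign * other.n,
            [p + sign * q for p, q in zip(self.sx, other.sx)],
            [p + sign * q for p, q in zip(self.sy, other.sy)],
            [[p + sign * q for p, q in zip(r, s)] for r, s in zip(self.sxy, other.sxy)],
            self.sxx + sign * other.sxx,
            self.syy + sign * other.syy,
        )

    def __add__(self, other):
        # Statistics of the union of two disjoint sets of pairs: constant time.
        return self._combine(other, 1)

    def __sub__(self, other):
        # Statistics after deleting a previously added subset: constant time.
        return self._combine(other, -1)

    def centred_cross_covariance(self):
        """sum_i (x_i - xbar)(y_i - ybar)^T, the input to the rotation step."""
        return [[self.sxy[a][b] - self.sx[a] * self.sy[b] / self.n for b in range(3)]
                for a in range(3)]
```

Because every field is a plain sum, extending an alignment by one fragment pair costs a fixed number of additions regardless of how many correspondences are already in the set, which is the source of the constant-time update the abstract describes.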
17.
Bioinformatics ; 30(17): i512-8, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161241

Abstract

MOTIVATION: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments should be assessed has been identified as the main cause of the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field. RESULTS: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field. AVAILABILITY: http://lcb.infotech.monash.edu.au/I-value. SUPPLEMENTARY INFORMATION: Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html.

Subjects
Structural Homology, Protein , Algorithms , Data Compression , Data Interpretation, Statistical
18.
Acta Crystallogr D Biol Crystallogr ; 70(Pt 3): 904-6, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24598758

Abstract

Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally reported to greater precision than the experimental structure determinations have actually achieved. By using information theory and data compression to study the compressibility of protein atomic coordinates, it is possible to quantify the amount of randomness in the coordinate data and thereby to determine the realistic precision of the reported coordinates. On average, the value of each Cα coordinate in a set of selected protein structures solved at a variety of resolutions is good to about 0.1 Å.

Subjects
Databases, Protein/standards , User-Computer Interface , Crystallography, X-Ray/standards , Dictionaries, Chemical as Topic , Magnetic Resonance Spectroscopy/standards , Microscopy, Electron/standards , Predictive Value of Tests , Random Allocation
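The link between compressibility and precision in the abstract above can be made concrete with a back-of-the-envelope calculation (ours, not the article's). Stating a value that lies in a range of width R to a precision ε takes roughly log2(R/ε) bits. The legacy PDB format writes coordinates with three decimal places, i.e. to 0.001 Å, so if only about 0.1 Å is meaningful, the extra digits carry roughly 6.6 bits of noise per coordinate, independent of R:

```latex
% Approximate bits needed to state a value in a range of width R to precision \epsilon
b(\epsilon) \approx \log_2 \frac{R}{\epsilon}

% Excess bits per coordinate when reporting to 0.001 \text{\AA} but precise only to 0.1 \text{\AA}
b(0.001\,\text{\AA}) - b(0.1\,\text{\AA})
  = \log_2 \frac{R}{0.001} - \log_2 \frac{R}{0.1}
  = \log_2 \frac{0.1}{0.001}
  = \log_2 100 \approx 6.64 \ \text{bits}
```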
19.
Proteins ; 82(3): 349-53, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24105818

Abstract

Eph receptors comprise the largest known family of receptor tyrosine kinases in mammals. They bind members of a second family, the ephrins. As both Eph receptors and ephrins are membrane bound, their interactions permit unusual bidirectional cell-cell signaling. Eph receptors and ephrins each form two classes, A and B, based on sequences, structures, and patterns of affinity: class A Eph receptors bind class A ephrins, and class B Eph receptors bind class B ephrins. The only known exceptions are the receptor EphA4, which can bind ephrinB2 and ephrinB3 in addition to the ephrin-As (Bowden et al., Structure 2009;17:1386-1397), and EphB2, which can bind ephrin-A5 in addition to the ephrin-Bs (Himanen et al., Nat Neurosci 2004;7:501-509). A crystal structure is available of the interacting domains of the EphA4-ephrinB2 complex (wwPDB entry 2WO2) (Bowden et al., Structure 2009;17:1386-1397). In this complex, the ligand-binding domain of EphA4 adopts an EphB-like conformation. To understand why other cross-class EphA receptor-ephrinB complexes do not form, we modeled hypothetical complexes between (1) EphA4-ephrinB1, (2) EphA4-ephrinB3, and (3) EphA2-ephrinB2. We identify particular residues in the interface region whose size variations cause steric clashes that prevent formation of the unobserved complexes. The sizes of the sidechains of residues at these positions correlate with the pattern of binding affinity.

Subjects
Ephrins/chemistry , Ephrins/metabolism , Receptor, EphA4/chemistry , Receptor, EphA4/metabolism , Amino Acid Sequence , Humans , Models, Molecular , Molecular Sequence Data , Sequence Alignment , Surface Properties
20.
Methods Mol Biol ; 932: 51-9, 2013.
Article in English | MEDLINE | ID: mdl-22987346

Abstract

We have developed a concise tableau representation of protein folding patterns, based on the order and contact patterns of elements of secondary structure: helices and strands of sheet. The tableaux provide a database, derived from the Protein Data Bank, that can be mined for studies of the general principles of protein architecture, including investigation of the relationship between the local supersecondary structure of proteins and the complete folding topology. This chapter outlines the tableau representation of protein folding patterns and methods for using it to identify structural and substructural similarities.

Subjects
Models, Molecular , Protein Folding , Proteins/chemistry , Software , Algorithms , Protein Structure, Secondary