Results 1 - 20 of 24
1.
Bioinformatics ; 38(13): 3395-3406, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35575379

ABSTRACT

MOTIVATION: Protein function prediction, based on the patterns of connection in a protein-protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein-protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein-protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. RESULTS: GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein-protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein-protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein-protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson's Disease GWAS genes, rediscover many genes which have known involvement in Parkinson's disease pathways, plus suggest some new genes to study. AVAILABILITY AND IMPLEMENTATION: All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Parkinson Disease , Humans , Computational Biology/methods , Algorithms , Proteins/metabolism
2.
Annu Rev Biomed Data Sci ; 5: 205-231, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35537462

ABSTRACT

Coral reefs are home to over two million species and provide habitat for roughly 25% of all marine animals, but they are being severely threatened by pollution and climate change. A large amount of genomic, transcriptomic, and other omics data is becoming increasingly available from different species of reef-building corals, the unicellular dinoflagellates, and the coral microbiome (bacteria, archaea, viruses, fungi, etc.). Such new data present an opportunity for bioinformatics researchers and computational biologists to contribute to a timely, compelling, and urgent investigation of critical factors that influence reef health and resilience.


Subject(s)
Anthozoa , Microbiota , Animals , Anthozoa/genetics , Computational Biology , Coral Reefs , Microbiota/genetics , Symbiosis/genetics
3.
Cell Mol Life Sci ; 79(2): 78, 2022 Jan 19.
Article in English | MEDLINE | ID: mdl-35044538

ABSTRACT

Three-dimensional (3D) in vitro culture systems using human induced pluripotent stem cells (hiPSCs) are useful tools to model neurodegenerative disease biology in physiologically relevant microenvironments. Though many successful biomaterials-based 3D model systems have been established for other neurodegenerative diseases, such as Alzheimer's disease, relatively few exist for Parkinson's disease (PD) research. We employed tissue engineering approaches to construct a 3D silk scaffold-based platform for the culture of hiPSC-dopaminergic (DA) neurons derived from healthy individuals and PD patients harboring LRRK2 G2019S or GBA N370S mutations. We then compared results from protein, gene expression, and metabolic analyses obtained from two-dimensional (2D) and 3D culture systems. The 3D platform enabled the formation of dense dopamine neuronal network architectures and developed biological profiles both similar to and distinct from those of 2D culture systems in healthy and PD lines. PD cultures developed in 3D platforms showed elevated levels of α-synuclein and alterations in purine metabolite profiles. Furthermore, computational network analysis of transcriptomic networks nominated several novel molecular interactions occurring in neurons from patients with mutations in LRRK2 and GBA. We conclude that the brain-like 3D system presented here is a realistic platform to interrogate molecular mechanisms underlying PD biology.


Subject(s)
Dopaminergic Neurons/pathology , Parkinson Disease/pathology , Bioengineering , Three-Dimensional Cell Culture Techniques , Cultured Cells , Dopaminergic Neurons/cytology , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/pathology , Neurogenesis , Silk/chemistry , Tissue Scaffolds/chemistry
4.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 1933-1945, 2022.
Article in English | MEDLINE | ID: mdl-33591921

ABSTRACT

A method to improve protein function prediction for sparsely annotated PPI networks is introduced. The method extends the DSD majority vote algorithm introduced by Cao et al. to give confidence scores on predicted labels and to use predictions of high confidence to predict the labels of other nodes in subsequent rounds. We call this a majority vote cascade. Several cascade variants are tested in a stringent cross-validation experiment on PPI networks from S. cerevisiae and D. melanogaster, and we show that for many different settings with several alternative confidence functions, cascading improves the accuracy of the predictions. A list of the most confident new label predictions in the two networks is also reported. Code and networks for the cross-validation experiments appear at http://bcb.cs.tufts.edu/cascade.


Subject(s)
Drosophila melanogaster , Saccharomyces cerevisiae , Algorithms , Animals , Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
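
The cascading idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the confidence function (the winning label's share of all neighbors), the `threshold` value, and the use of plain graph neighbors instead of DSD-nearest nodes are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote_cascade(neighbors, labels, threshold=0.6, max_rounds=10):
    """Label propagation by majority vote, run in rounds.

    neighbors: dict mapping each node to the set of nodes that vote for it
    labels:    dict of initially known labels (node -> label)
    threshold: hypothetical minimum confidence needed to accept a prediction
    """
    known = dict(labels)
    for _ in range(max_rounds):
        accepted = {}
        for node in neighbors:
            if node in known:
                continue
            # Only neighbors that already carry a label get a vote.
            votes = Counter(known[n] for n in neighbors[node] if n in known)
            if not votes:
                continue
            label, count = votes.most_common(1)[0]
            # Assumed confidence: winning votes over all neighbors.
            confidence = count / len(neighbors[node])
            if confidence >= threshold:
                accepted[node] = label
        if not accepted:
            break
        # Confident predictions become voters in the next round.
        known.update(accepted)
    return known
```

In a single round only nodes adjacent to annotated proteins can be labeled; later rounds let accepted predictions reach nodes further away, which is the behavior the abstract describes for sparsely annotated networks.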
5.
Bioinform Adv ; 2(1): vbab025, 2022.
Article in English | MEDLINE | ID: mdl-36699351

ABSTRACT

Motivation: Leveraging cross-species information in protein function prediction can add significant power to network-based protein function prediction methods, because so much functional information is conserved across at least close scales of evolution. We introduce MUNDO, a new cross-species co-embedding method that combines a single-network embedding method with a co-embedding method to predict functional annotations in a target species, leveraging also functional annotations in a model species network. Results: Across a wide range of parameter choices, MUNDO performs best at predicting annotations in the mouse network, when trained on mouse and human protein-protein interaction (PPI) networks, in the human network, when trained on human and mouse PPIs, and in Baker's yeast, when trained on Fission and Baker's yeast, as compared to competitor methods. MUNDO also outperforms all the cross-species methods when predicting in Fission yeast when trained on Fission and Baker's yeast; however, in this single case, discarding the information from the other species and using annotations from the Fission yeast network alone usually performs best. Availability and implementation: All code is available and can be accessed here: github.com/v0rtex20k/MUNDO. Supplementary information: Supplementary data are available at Bioinformatics Advances online. Additional experimental results are on our github site.

6.
Pac Symp Biocomput ; 26: 336-340, 2021.
Article in English | MEDLINE | ID: mdl-33691030

ABSTRACT

Coral reefs are home to over 2 million species and provide habitat for roughly 25% of all marine animals, but they are being severely threatened by pollution and climate change. A large amount of genomic, transcriptomic and other -omics data is becoming increasingly available from different species of reef-building corals, the unicellular dinoflagellates, and the coral microbiome (corals have possibly the most complex microbiome yet discovered, consisting of over 20,000 different species). These new data present an opportunity for bioinformatics researchers and computational biologists to contribute to a timely, compelling, and urgent investigation of critical factors that influence reef health and resilience. This paper summarizes the content of the Bioinformatics of Corals workshop, which is being held as part of PSB 2021. It is particularly relevant for this workshop to occur at PSB, given the abundance of and reliance on coral reefs in Hawaii and the conference's traditional association with the region.


Subject(s)
Anthozoa , Microbiota , Animals , Anthozoa/genetics , Computational Biology , Coral Reefs
7.
Bioinformatics ; 36(Suppl_1): i464-i473, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657369

ABSTRACT

MOTIVATION: One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein-protein interaction (PPI) network data that might mean that alternative methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross-validation experiments. RESULTS: We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE's global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn's disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. AVAILABILITY AND IMPLEMENTATION: GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Saccharomyces cerevisiae , Diffusion , Humans , Protein Interaction Mapping
8.
Nat Methods ; 16(9): 843-852, 2019 09.
Article in English | MEDLINE | ID: mdl-31471613

ABSTRACT

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.


Subject(s)
Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Genome-Wide Association Study , Biological Models , Single Nucleotide Polymorphism , Quantitative Trait Loci , Algorithms , Gene Expression Profiling , Humans , Phenotype , Protein Interaction Maps
9.
BMC Syst Biol ; 12(1): 113, 2018 11 19.
Article in English | MEDLINE | ID: mdl-30453938

ABSTRACT

The authors have retracted this article [1]. After publication they discovered a technical error in the Louvain algorithm with bounded cluster sizes. Correction of this error substantially changed the results for this algorithm and the conclusions drawn in the article were found to be incorrect. The authors will submit a new manuscript for peer review.

10.
BMC Syst Biol ; 12(Suppl 3): 24, 2018 03 21.
Article in English | MEDLINE | ID: mdl-29589565

ABSTRACT

BACKGROUND: Decomposing a protein-protein interaction network (PPI network) into non-overlapping clusters or communities, sometimes called "network modules," is an important way to explore functional roles of sets of genes. When the method to accomplish this decomposition is solely based on purely graph-theoretic measures of the interconnection structure of the network, this is often called unsupervised clustering or community detection. In this study, we compare unsupervised computational methods for decomposing a PPI network into non-overlapping modules. A method is preferred if it results in a large proportion of nodes being assigned to functionally meaningful modules, as measured by functional enrichment over terms from the Gene Ontology (GO). RESULTS: We compare the performance of three popular community detection algorithms with the same algorithms run after the network is pre-processed by removing and reweighting based on the diffusion state distance (DSD) between pairs of nodes in the network. We call this "detangling" the network. In almost all cases, we find that detangling the network based on the DSD distance reweighting provides more meaningful clusters. CONCLUSIONS: Re-embedding using the DSD distance metric, before applying standard community detection algorithms, can assist in uncovering GO functionally enriched clusters in the yeast PPI network.

11.
Pac Symp Biocomput ; 22: 15-26, 2017.
Article in English | MEDLINE | ID: mdl-27896958

ABSTRACT

Current automated computational methods to assign functional labels to unstudied genes often involve transferring annotation from orthologous or paralogous genes; however, such genes can evolve divergent functions, making such transfer inappropriate. We consider the problem of determining when it is correct to make such an assignment between paralogs. We construct a benchmark dataset of two types of similar paralogous pairs of genes in the well-studied model organism S. cerevisiae: one set of pairs where single deletion mutants have very similar phenotypes (implying similar functions), and another set of pairs where single deletion mutants have very divergent phenotypes (implying different functions). State-of-the-art methods for this problem determine the evolutionary history of the paralogs with reference to multiple related species. Here, we ask a first and simpler question: we explore to what extent any computational method with access only to data from a single species can solve this problem. We consider divergence data (at both the amino acid and nucleotide levels), and network data (based on the yeast protein-protein interaction network, as captured in BioGRID), and ask if we can extract features from these data that can distinguish between these sets of paralogous gene pairs. We find that the best features come from measures of sequence divergence; however, simple network measures based on degree, centrality, shortest path, diffusion state distance (DSD), or shared neighborhood in the yeast protein-protein interaction (PPI) network also contain some signal. One should, in general, not transfer function if sequence divergence is too high. Further improvements in classification will need to come from more computationally expensive but much more powerful evolutionary methods that incorporate ancestral states and measure evolutionary divergence over multiple species based on evolutionary trees.


Subject(s)
Molecular Sequence Annotation , Algorithms , Computational Biology , Gene Duplication , Fungal Genes , Protein Interaction Maps/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Amino Acid Sequence Homology , Supervised Machine Learning
12.
Article in English | MEDLINE | ID: mdl-26357074

ABSTRACT

We introduce MRFy, a tool for protein remote homology detection that captures beta-strand dependencies in the Markov random field. Over a set of 11 SCOP beta-structural superfamilies, MRFy shows a 14 percent improvement in mean Area Under the Curve for the motif recognition problem as compared to HMMER, a 25 percent improvement as compared to RAPTOR, a 14 percent improvement as compared to HHpred, and an 18 percent improvement as compared to CNFPred and RaptorX. MRFy was implemented in the Haskell functional programming language, and parallelizes well on multi-core systems. MRFy is available, as source code as well as an executable, from http://mrfy.cs.tufts.edu/.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Amino Acid Sequence Homology , Algorithms , Amino Acid Motifs , Markov Chains , Statistical Models , Stochastic Processes
13.
Bioinformatics ; 30(12): i219-27, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931987

ABSTRACT

MOTIVATION: It has long been hypothesized that incorporating models of network noise as well as edge directions and known pathway information into the representation of protein-protein interaction (PPI) networks might improve their utility for functional inference. However, a simple way to do this has not been obvious. We find that diffusion state distance (DSD), our recent diffusion-based metric for measuring dissimilarity in PPI networks, has natural extensions that incorporate confidence and directions, and can even express coherent pathways by calculating DSD on an augmented graph. RESULTS: We define three incremental versions of DSD which we term cDSD, caDSD and capDSD, where the capDSD matrix incorporates confidence, known directed edges, and pathways into the measure of how similar each pair of nodes is according to the structure of the PPI network. We test four popular function prediction methods (majority vote, weighted majority vote, multi-way cut and functional flow) using these different matrices on the Baker's yeast PPI network in cross-validation. The best performing method is weighted majority vote using capDSD. We then test the performance of our augmented DSD methods on an integrated heterogeneous set of protein association edges from the STRING database. The superior performance of capDSD in this context confirms that treating the pathways as probabilistic units is more powerful than simply incorporating pathway edges independently into the network. AVAILABILITY: All source code for calculating the confidences, for extracting pathway information from KEGG XML files, and for calculating the cDSD, caDSD and capDSD matrices is available from http://dsd.cs.tufts.edu/capdsd


Subject(s)
Protein Interaction Mapping/methods , Algorithms , Saccharomyces cerevisiae Proteins/metabolism
14.
PLoS One ; 8(10): e76339, 2013.
Article in English | MEDLINE | ID: mdl-24194834

ABSTRACT

In protein-protein interaction (PPI) networks, functional similarity is often inferred based on the function of directly interacting proteins, or more generally, some notion of interaction network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as the shortest-path distance in the network, but this has only a limited ability to capture fine-grained neighborhood distinctions, because most proteins are close to each other, and there are many ties in proximity. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for transfer of functional annotation in PPI networks. We present a tool that, given a PPI network as input, outputs the DSD distances between every pair of proteins. We show that replacing the shortest-path metric by DSD improves the performance of classical function prediction methods across the board.


Subject(s)
Algorithms , Genetic Models , Protein Interaction Maps/genetics , Proteins/metabolism
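
The abstract does not spell out the metric, so the following is only a sketch under a stated assumption: each node is described by the expected number of visits a `k`-step random walk starting there makes to every node, and DSD is taken as the L1 distance between two such vectors. The function names, the value of `k`, and the adjacency-dict input are illustrative choices, not the published tool's interface.

```python
def expected_visits(adj, start, k):
    """Expected visits to each node by a k-step random walk from `start`.

    adj: dict mapping node -> set of neighbors (undirected graph in which
         every node has at least one neighbor)
    """
    dist = {n: 0.0 for n in adj}
    dist[start] = 1.0
    visits = dict(dist)  # step 0 counts as a visit to the start node
    for _ in range(k):
        nxt = {n: 0.0 for n in adj}
        for node, prob in dist.items():
            if prob:
                # The walk moves to a uniformly chosen neighbor.
                share = prob / len(adj[node])
                for nb in adj[node]:
                    nxt[nb] += share
        dist = nxt
        for n in adj:
            visits[n] += dist[n]
    return visits


def dsd(adj, u, v, k=5):
    """L1 distance between the expected-visit vectors of u and v (assumed form)."""
    hu = expected_visits(adj, u, k)
    hv = expected_visits(adj, v, k)
    return sum(abs(hu[n] - hv[n]) for n in adj)
```

Because the distance compares whole visit profiles and not hop counts, two pairs of proteins at the same shortest-path distance can receive different values, which is how a measure of this kind can break the ties the abstract mentions.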
15.
Bioinformatics ; 29(13): i283-90, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812995

ABSTRACT

MOTIVATION: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will speed up not only homology search directly but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools. RESULTS: We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate with all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP's runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants, which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks, such as protein structure prediction and orthology mapping, which rely heavily on homology search. AVAILABILITY: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/ CONTACT: bab@mit.edu.


Subject(s)
Algorithms , Data Compression/methods , Protein Databases , Sequence Alignment/methods , Amino Acid Sequence Homology , Genomics/methods
16.
BMC Bioinformatics ; 14: 23, 2013 Jan 18.
Article in English | MEDLINE | ID: mdl-23331614

ABSTRACT

BACKGROUND: New technology has resulted in high-throughput screens for pairwise genetic interactions in yeast and other model organisms. For each pair in a collection of non-essential genes, an epistasis score is obtained, representing how much sicker (or healthier) the double-knockout organism will be compared to what would be expected from the sickness of the component single knockouts. Recent algorithmic work has identified graph-theoretic patterns in this data that can indicate functional modules, and even sets of genes that may occur in compensatory pathways, such as a BPM-type schema first introduced by Kelley and Ideker. However, to date, any algorithms for finding such patterns in the data were implemented internally, with no software being made publicly available. RESULTS: Genecentric is a new package that implements a parallelized version of the Leiserson et al. algorithm (J Comput Biol 18:1399-1409, 2011) for generating generalized BPMs from high-throughput genetic interaction data. Given a matrix of weighted epistasis values for a set of double knock-outs, Genecentric returns a list of generalized BPMs that may represent compensatory pathways. Genecentric also has an extension, GenecentricGO, to query FuncAssociate (Bioinformatics 25:3043-3044, 2009) to retrieve GO enrichment statistics on generated BPMs. Python is the only dependency, and our web site provides working examples and documentation. CONCLUSION: We find that Genecentric can be used to find coherent functional and perhaps compensatory gene sets from high-throughput genetic interaction data. Genecentric is made freely available for download under the GPLv2 from http://bcb.cs.tufts.edu/genecentric.


Subject(s)
Genetic Epistasis , Software , Algorithms , Computational Biology/methods , Fungal Genes , Genetic Models , Saccharomyces cerevisiae/genetics
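
As a small illustration of the epistasis score described in this abstract, the sketch below assumes the multiplicative fitness model, one common convention that the abstract itself does not specify. The function name and the fitness values in the usage note are hypothetical, and this is not Genecentric's code.

```python
def epistasis_score(fitness_a, fitness_b, fitness_ab):
    """Deviation of double-knockout fitness from the multiplicative expectation.

    fitness_a, fitness_b: relative fitness of each single knockout (wild type = 1.0)
    fitness_ab:           relative fitness of the double knockout
    Negative values mean the double knockout is sicker than expected;
    positive values mean it is healthier than expected.
    """
    expected = fitness_a * fitness_b  # assumed multiplicative model
    return fitness_ab - expected
```

For example, single knockouts with fitness 0.8 and 0.5 would be expected to give a double knockout with fitness 0.4 under this model, so an observed value of 0.2 yields a negative score.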
17.
BMC Bioinformatics ; 13: 259, 2012 Oct 06.
Article in English | MEDLINE | ID: mdl-23039758

ABSTRACT

BACKGROUND: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly use protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. RESULTS: We present Formatt, a multiple structure alignment program based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. CONCLUSIONS: Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.


Subject(s)
Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Algorithms , Amino Acid Sequence , Proteins/chemistry
18.
Bioinformatics ; 28(9): 1216-22, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22408192

ABSTRACT

MOTIVATION: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, the MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif. RESULTS: We test SMURFLite on all propeller and barrel folds in the mainly-beta class of the SCOP hierarchy in stringent cross-validation experiments. We show a mean 26% (median 16%) improvement in area under curve (AUC) for beta-structural motif recognition as compared with HMMER (a well-known HMM method) and a mean 33% (median 19%) improvement as compared with RAPTOR (a well-known threading method) and even a mean 18% (median 10%) improvement in AUC over HHpred (a profile-profile HMM method), despite HHpred's use of extensive additional training data. We demonstrate SMURFLite's ability to scale to whole genomes by running a SMURFLite library of 207 beta-structural SCOP superfamilies against the entire genome of Thermotoga maritima, and make over 100 new fold predictions. AVAILABILITY AND IMPLEMENTATION: A webserver that runs SMURFLite is available at: http://smurf.cs.tufts.edu/smurflite/


Subject(s)
Markov Chains , Protein Secondary Structure , Proteins/chemistry , Software , Amino Acid Sequence , Bacterial Genome , Humans , Molecular Models , Protein Tertiary Structure , Proteins/genetics , Thermotoga maritima/genetics
19.
Proteins ; 80(2): 410-20, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22095906

ABSTRACT

The supersecondary structure of amyloids and prions, proteins of intense clinical and biological interest, is difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Previous work has demonstrated that probability-based prediction of discrete β-strand pairs can offer insight into these structures. Here, we devise a system of energetic rules that can be used to dynamically assemble these discrete β-strand pairs into complete amyloid β-structures. The STITCHER algorithm progressively 'stitches' strand-pairs into full β-sheets based on a novel free-energy model, incorporating experimentally observed amino-acid side-chain stacking contributions, entropic estimates, and steric restrictions for amyloidal parallel β-sheet construction. A dynamic program computes the top 50 structures and returns both the highest scoring structure and a consensus structure taken by polling this list for common discrete elements. Putative structural heterogeneity can be inferred from sequence regions that compose poorly. Predictions show agreement with experimental models of Alzheimer's amyloid beta peptide and the Podospora anserina Het-s prion. Predictions of the HET-s homolog HET-S also reflect experimental observations of poor amyloid formation. We put forward predicted structures for the yeast prion Sup35, suggesting N-terminal structural stability enabled by tyrosine ladders, and C-terminal heterogeneity. Predictions for the Rnq1 prion and alpha-synuclein are also given, identifying a similar mix of homogeneous and heterogeneous secondary structure elements. STITCHER provides novel insight into the energetic basis of amyloid structure, provides accurate structure predictions, and can help guide future experimental studies.


Subject(s)
Algorithms , Amyloid beta-Peptides/chemistry , Prions/chemistry , Protein Folding , Amyloid/chemistry , Entropy , Fungal Proteins/chemistry , Intermediate Filament Proteins/chemistry , Peptide Termination Factors/chemistry , Protein Secondary Structure , Saccharomyces cerevisiae Proteins/chemistry
20.
Article in English | MEDLINE | ID: mdl-21464511

ABSTRACT

Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question as to whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative, by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level, and demonstrates qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.


Subject(s)
Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Cluster Analysis , Computational Biology , Molecular Models , Protein Conformation , Protein Folding