Results 1 - 20 of 44
1.
Proc Natl Acad Sci U S A ; 120(24): e2220778120, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37289807

ABSTRACT

Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor (KD = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu.


Subjects
Drug Discovery , Proteins , Humans , Proteins/chemistry , Drug Discovery/methods , Preclinical Drug Evaluation , Language
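
The ConPLex abstract above says that binding predictions are based on the distance between learned representations. The following is a minimal sketch of that idea only, not the ConPLex code or API: it assumes that protein and compound embeddings already live in a shared space and uses cosine similarity as the closeness measure. The embedding values, the compound names, and the helper names `cosine_similarity` and `rank_compounds` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_compounds(target_vec, compound_vecs):
    """Return compound names sorted from most to least similar to the target."""
    scores = {
        name: cosine_similarity(target_vec, vec)
        for name, vec in compound_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional embeddings (assumption: real models use many more dimensions).
target = [1.0, 0.0, 0.5]
compounds = {
    "compound_a": [0.9, 0.1, 0.4],
    "compound_b": [-1.0, 0.2, 0.0],
    "compound_c": [0.0, 1.0, 0.0],
}
ranking = rank_compounds(target, compounds)
```

Because scoring a pair reduces to one vector comparison, each protein and each compound only needs to be embedded once. That is plausibly what makes screening at library scale feasible.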
2.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37897686

ABSTRACT

MOTIVATION: High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di). RESULTS: We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein-protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein-protein interaction predictions across all protein pairs can be made genome-wide. AVAILABILITY AND IMPLEMENTATION: TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.


Subjects
Proteins , Software , Amino Acid Sequence , Proteins/chemistry
3.
Bioinformatics ; 38(Suppl 1): i264-i272, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758793

ABSTRACT

SUMMARY: Computational methods to predict protein-protein interaction (PPI) typically segregate into sequence-based 'bottom-up' methods that infer properties from the characteristics of the individual protein sequences, or global 'top-down' methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms. AVAILABILITY AND IMPLEMENTATION: https://topsyturvy.csail.mit.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Protein Interaction Mapping , Proteins , Amino Acid Sequence , Protein Interaction Mapping/methods , Proteins/genetics , Proteins/metabolism
4.
Bioinformatics ; 38(13): 3395-3406, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35575379

ABSTRACT

MOTIVATION: Protein function prediction, based on the patterns of connection in a protein-protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein-protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein-protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. RESULTS: GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein-protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein-protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein-protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson's Disease GWAS genes, rediscover many genes which have known involvement in Parkinson's disease pathways, plus suggest some new genes to study. AVAILABILITY AND IMPLEMENTATION: All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Parkinson Disease , Humans , Computational Biology/methods , Algorithms , Proteins/metabolism
5.
Nat Rev Genet ; 18(9): 551-562, 2017 09.
Article in English | MEDLINE | ID: mdl-28607512

ABSTRACT

Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease. Fundamental to network analysis is the concept that genes underlying the same phenotype tend to interact; this principle can be used to combine and to amplify signals from individual genes. Recently, numerous bioinformatic techniques have been proposed for genetic analysis using networks, based on random walks, information diffusion and electrical resistance. These approaches have been applied successfully to identify disease genes, genetic modules and drug targets. In fact, all these approaches are variations of a unifying mathematical machinery - network propagation - suggesting that it is a powerful data transformation method of broad utility in genetic research.


Subjects
Computational Biology , Disease/genetics , Gene Regulatory Networks , Genetic Association Studies , Software , Algorithms , Humans , Protein Interaction Maps , Proteins/metabolism
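
The abstract above describes network propagation as a way to spread and amplify signal from individual genes across a network. One common formulation is a random walk with restart, sketched below under stated assumptions: the four-node path graph, the restart probability of 0.5, the fixed iteration count, and the function name `propagate` are illustrative choices, not taken from the review.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart on an undirected, unweighted graph.

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: dict mapping seed nodes (e.g. known disease genes) to prior scores.
    Returns a dict of propagated scores that sum to 1.
    """
    nodes = list(adjacency)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        # Every step, a fraction `restart` of the mass jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # The remaining mass is split evenly among each node's neighbours.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for nb in adjacency[n]:
                nxt[nb] += share
        p = nxt
    return p


# Hypothetical path graph A - B - C - D with a single seed gene A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = propagate(graph, {"A": 1.0})
```

Nodes closer to the seed end up with higher scores, which is the "guilt by association" behaviour the abstract alludes to. With several seeds, scores from different seeds add up on nodes that lie between them.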
6.
Cell Mol Life Sci ; 79(2): 78, 2022 Jan 19.
Article in English | MEDLINE | ID: mdl-35044538

ABSTRACT

Three-dimensional (3D) in vitro culture systems using human induced pluripotent stem cells (hiPSCs) are useful tools to model neurodegenerative disease biology in physiologically relevant microenvironments. Though many successful biomaterials-based 3D model systems have been established for other neurodegenerative diseases, such as Alzheimer's disease, relatively few exist for Parkinson's disease (PD) research. We employed tissue engineering approaches to construct a 3D silk scaffold-based platform for the culture of hiPSC-dopaminergic (DA) neurons derived from healthy individuals and PD patients harboring LRRK2 G2019S or GBA N370S mutations. We then compared results from protein, gene expression, and metabolic analyses obtained from two-dimensional (2D) and 3D culture systems. The 3D platform enabled the formation of dense dopamine neuronal network architectures and developed biological profiles both similar to and distinct from those of 2D culture systems in healthy and PD lines. PD cultures developed in 3D platforms showed elevated levels of α-synuclein and alterations in purine metabolite profiles. Furthermore, computational network analysis of transcriptomic networks nominated several novel molecular interactions occurring in neurons from patients with mutations in LRRK2 and GBA. We conclude that the brain-like 3D system presented here is a realistic platform to interrogate molecular mechanisms underlying PD biology.


Subjects
Dopaminergic Neurons/pathology , Parkinson Disease/pathology , Bioengineering , Three-Dimensional Cell Culture Techniques , Cultured Cells , Dopaminergic Neurons/cytology , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/pathology , Neurogenesis , Silk/chemistry , Tissue Scaffolds/chemistry
7.
Nat Methods ; 16(9): 843-852, 2019 09.
Article in English | MEDLINE | ID: mdl-31471613

ABSTRACT

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.


Subjects
Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Genome-Wide Association Study , Biological Models , Single Nucleotide Polymorphism , Quantitative Trait Loci , Algorithms , Gene Expression Profiling , Humans , Phenotype , Protein Interaction Maps
8.
Bioinformatics ; 36(Suppl_1): i464-i473, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657369

ABSTRACT

MOTIVATION: One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein-protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross-validation experiments. RESULTS: We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE's global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn's disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. AVAILABILITY AND IMPLEMENTATION: GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Saccharomyces cerevisiae , Diffusion , Humans , Protein Interaction Mapping
9.
Bioinformatics ; 30(12): i219-27, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931987

ABSTRACT

MOTIVATION: It has long been hypothesized that incorporating models of network noise as well as edge directions and known pathway information into the representation of protein-protein interaction (PPI) networks might improve their utility for functional inference. However, a simple way to do this has not been obvious. We find that diffusion state distance (DSD), our recent diffusion-based metric for measuring dissimilarity in PPI networks, has natural extensions that incorporate confidence and directions, and that can even express coherent pathways, by calculating DSD on an augmented graph. RESULTS: We define three incremental versions of DSD which we term cDSD, caDSD and capDSD, where the capDSD matrix incorporates confidence, known directed edges, and pathways into the measure of how similar each pair of nodes is according to the structure of the PPI network. We test four popular function prediction methods (majority vote, weighted majority vote, multi-way cut and functional flow) using these different matrices on the Baker's yeast PPI network in cross-validation. The best performing method is weighted majority vote using capDSD. We then test the performance of our augmented DSD methods on an integrated heterogeneous set of protein association edges from the STRING database. The superior performance of capDSD in this context confirms that treating the pathways as probabilistic units is more powerful than simply incorporating pathway edges independently into the network. AVAILABILITY: All source code for calculating the confidences, for extracting pathway information from KEGG XML files, and for calculating the cDSD, caDSD and capDSD matrices is available from http://dsd.cs.tufts.edu/capdsd


Subjects
Protein Interaction Mapping/methods , Algorithms , Saccharomyces cerevisiae Proteins/metabolism
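
The abstract above builds on diffusion state distance (DSD) but does not define it. The sketch below is therefore an assumption about the basic, unweighted form: each node is described by the expected number of visits a short random walk starting from it makes to every node, and two nodes are compared by the L1 distance between those visit profiles. The star-shaped toy graph, the five-step walk length, and the helper names `visit_counts` and `dsd` are hypothetical. The confidence, direction, and pathway extensions (cDSD, caDSD, capDSD) are not modelled here.

```python
def visit_counts(adjacency, start, steps):
    """Expected visits to each node by a random walk of `steps` steps from `start`."""
    nodes = list(adjacency)
    dist = {n: 1.0 if n == start else 0.0 for n in nodes}
    visits = dict(dist)  # step 0 counts as one visit to the start node
    for _ in range(steps):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # The walker moves to a uniformly chosen neighbour.
            share = dist[n] / len(adjacency[n])
            for nb in adjacency[n]:
                nxt[nb] += share
        dist = nxt
        for n in nodes:
            visits[n] += dist[n]
    return visits


def dsd(adjacency, u, v, steps=5):
    """L1 distance between the visit profiles of nodes u and v."""
    hu = visit_counts(adjacency, u, steps)
    hv = visit_counts(adjacency, v, steps)
    return sum(abs(hu[n] - hv[n]) for n in adjacency)


# Hypothetical star graph: a hub H connected to three leaves.
star = {"H": ["L1", "L2", "L3"], "L1": ["H"], "L2": ["H"], "L3": ["H"]}
leaf_to_leaf = dsd(star, "L1", "L2")
leaf_to_hub = dsd(star, "L1", "H")
```

In this toy graph, two leaves attached to the same hub differ only in where the walk starts, so their distance stays small even though they are not directly connected.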
10.
Bioinformatics ; 29(13): i283-90, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812995

ABSTRACT

MOTIVATION: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will not only speed up homology search directly but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools. RESULTS: We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate with all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP's runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants, which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks, such as protein structure prediction and orthology mapping, which rely heavily on homology search. AVAILABILITY: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/ CONTACT: bab@mit.edu.


Subjects
Algorithms , Data Compression/methods , Protein Databases , Sequence Alignment/methods , Amino Acid Sequence Homology , Genomics/methods
11.
PeerJ ; 12: e16804, 2024.
Article in English | MEDLINE | ID: mdl-38313028

ABSTRACT

Once thought to be a unique capability of the Langerhans islets in the pancreas of mammals, insulin (INS) signaling is now recognized as an evolutionarily ancient function going back to prokaryotes. INS is ubiquitously present not only in humans but also in unicellular eukaryotes, fungi, worms, and Drosophila. Remote homologue identification also supports the presence of INS and INS receptor in corals where the availability of glucose is largely dependent on the photosynthetic activity of the symbiotic algae. The cnidarian animal host of corals operates together with a 20,000-sized microbiome, in direct analogy to the human gut microbiome. In humans, aberrant INS signaling is the hallmark of metabolic disease, and is thought to play a major role in aging, and age-related diseases, such as Alzheimer's disease. We here would like to argue that a broader view of INS beyond its human homeostasis function may help us understand other organisms, and in turn, studying those non-model organisms may enable a novel view of the human INS signaling system. To this end, we here review INS signaling from a new angle, by drawing analogies between humans and corals at the molecular level.


Subjects
Anthozoa , Islets of Langerhans , Animals , Humans , Anthozoa/metabolism , Insulin/metabolism , Islets of Langerhans/metabolism , Pancreas/metabolism , Signal Transduction
12.
PeerJ ; 12: e16654, 2024.
Article in English | MEDLINE | ID: mdl-38313033

ABSTRACT

Anthropogenic activities increase sediment suspended in the water column and deposition on reefs can be largely dependent on colony morphology. Massive and plating corals have a high capacity to trap sediments, and active removal mechanisms can be energetically costly. Branching corals trap less sediment but are more susceptible to light limitation caused by suspended sediment. Despite deleterious effects of sediments on corals, few studies have examined the molecular response of corals with different morphological characteristics to sediment stress. To address this knowledge gap, this study assessed the transcriptomic responses of branching and massive corals in Florida and Hawai'i to varying levels of sediment exposure. Gene expression analysis revealed a molecular responsiveness to sediments across species and sites. Differential Gene Expression followed by Gene Ontology (GO) enrichment analysis identified that branching corals had the largest transcriptomic response to sediments, in developmental processes and metabolism, while significantly enriched GO terms were highly variable between massive corals, despite similar morphologies. Comparison of DEGs within orthogroups revealed that while all corals had DEGs in response to sediment, there was not a concerted gene set response by morphology or location. These findings illuminate the species specificity and genetic basis underlying coral susceptibility to sediments.


Subjects
Anthozoa , Animals , Anthozoa/genetics , Coral Reefs , Gene Expression Profiling , Transcriptome/genetics , Water
13.
BMC Bioinformatics ; 14: 23, 2013 Jan 18.
Article in English | MEDLINE | ID: mdl-23331614

ABSTRACT

BACKGROUND: New technology has resulted in high-throughput screens for pairwise genetic interactions in yeast and other model organisms. For each pair in a collection of non-essential genes, an epistasis score is obtained, representing how much sicker (or healthier) the double-knockout organism will be compared to what would be expected from the sickness of the component single knockouts. Recent algorithmic work has identified graph-theoretic patterns in this data that can indicate functional modules, and even sets of genes that may occur in compensatory pathways, such as a BPM-type schema first introduced by Kelley and Ideker. However, to date, any algorithms for finding such patterns in the data were implemented internally, with no software being made publicly available. RESULTS: Genecentric is a new package that implements a parallelized version of the Leiserson et al. algorithm (J Comput Biol 18:1399-1409, 2011) for generating generalized BPMs from high-throughput genetic interaction data. Given a matrix of weighted epistasis values for a set of double knock-outs, Genecentric returns a list of generalized BPMs that may represent compensatory pathways. Genecentric also has an extension, GenecentricGO, to query FuncAssociate (Bioinformatics 25:3043-3044, 2009) to retrieve GO enrichment statistics on generated BPMs. Python is the only dependency, and our web site provides working examples and documentation. CONCLUSION: We find that Genecentric can be used to find coherent functional and perhaps compensatory gene sets from high-throughput genetic interaction data. Genecentric is made freely available for download under the GPLv2 from http://bcb.cs.tufts.edu/genecentric.


Subjects
Genetic Epistasis , Software , Algorithms , Computational Biology/methods , Fungal Genes , Genetic Models , Saccharomyces cerevisiae/genetics
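
The abstract above describes an epistasis score as the gap between the observed sickness of a double knockout and what the two single knockouts would predict. The sketch below works through that arithmetic under one common assumption, a multiplicative expectation for independent genes; the abstract does not say which expectation model the screens used. The fitness values and the function name `epistasis_score` are hypothetical and do not come from Genecentric.

```python
def epistasis_score(fitness_a, fitness_b, fitness_ab):
    """Observed double-knockout fitness minus the multiplicative expectation.

    Fitness values are relative to wild type (wild type = 1.0).
    Negative scores mean the double knockout is sicker than expected,
    positive scores mean it is healthier than expected.
    """
    expected = fitness_a * fitness_b  # assumption: independent genes multiply
    return fitness_ab - expected


# Hypothetical relative fitness values for two single knockouts and three
# possible double-knockout outcomes.
aggravating = epistasis_score(0.5, 0.5, 0.0)   # much sicker than expected
alleviating = epistasis_score(0.5, 0.5, 0.5)   # no sicker than one knockout
neutral = epistasis_score(0.5, 0.5, 0.25)      # exactly as expected
```

A matrix of such scores over all gene pairs is the kind of input the abstract says Genecentric consumes.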
14.
Bioinformatics ; 28(9): 1216-22, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22408192

ABSTRACT

MOTIVATION: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, the MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif. RESULTS: We test SMURFLite on all propeller and barrel folds in the mainly-beta class of the SCOP hierarchy in stringent cross-validation experiments. We show a mean 26% (median 16%) improvement in area under curve (AUC) for beta-structural motif recognition as compared with HMMER (a well-known HMM method) and a mean 33% (median 19%) improvement as compared with RAPTOR (a well-known threading method) and even a mean 18% (median 10%) improvement in AUC over HHpred (a profile-profile HMM method), despite HHpred's use of extensive additional training data. We demonstrate SMURFLite's ability to scale to whole genomes by running a SMURFLite library of 207 beta-structural SCOP superfamilies against the entire genome of Thermotoga maritima, and make over 100 new fold predictions. Availability and implementation: A webserver that runs SMURFLite is available at: http://smurf.cs.tufts.edu/smurflite/


Subjects
Markov Chains , Secondary Protein Structure , Proteins/chemistry , Software , Amino Acid Sequence , Bacterial Genome , Humans , Molecular Models , Tertiary Protein Structure , Proteins/genetics , Thermotoga maritima/genetics
15.
Proc Natl Acad Sci U S A ; 107(9): 4069-74, 2010 Mar 02.
Article in English | MEDLINE | ID: mdl-20147619

ABSTRACT

The recent explosion in newly sequenced bacterial genomes is outpacing the capacity of researchers to try to assign functional annotation to all the new proteins. Hence, computational methods that can help predict structural motifs provide increasingly important clues in helping to determine how these proteins might function. We introduce a Markov Random Field approach tailored for recognizing proteins that fold into mainly beta-structural motifs, and apply it to build recognizers for the beta-propeller shapes. As an application, we identify a potential class of hybrid two-component sensor proteins that we predict contain a double-propeller domain.


Subjects
Bacterial Proteins/chemistry , Histidine Kinase , Markov Chains , Protein Conformation , Protein Kinases/chemistry
16.
PLoS One ; 18(2): e0270965, 2023.
Article in English | MEDLINE | ID: mdl-36735673

ABSTRACT

With the ease of gene sequencing and the technology available to study and manipulate non-model organisms, the extension of the methodological toolbox required to translate our understanding of model organisms to non-model organisms has become an urgent problem. For example, mining of large coral and their symbiont sequence data is a challenge, but also provides an opportunity for understanding functionality and evolution of these and other non-model organisms. Much more information than for any other eukaryotic species is available for humans, especially related to signal transduction and diseases. However, the coral cnidarian host and human have diverged over 700 million years ago and homologies between proteins in the two species are therefore often in the gray zone, or at least often undetectable with traditional BLAST searches. We introduce a two-stage approach to identifying putative coral homologues of human proteins. First, through remote homology detection using Hidden Markov Models, we identify candidate human homologues in the cnidarian genome. However, for many proteins, the human genome alone contains multiple family members with similar or even more divergence in sequence. In the second stage, therefore, we filter the remote homology results based on the functional and structural plausibility of each coral candidate, shortlisting the coral proteins likely to have conserved some of the functions of the human proteins. We demonstrate our approach with a pipeline for mapping membrane receptors in humans to membrane receptors in corals, with specific focus on the stony coral, P. damicornis. More than 1000 human membrane receptors mapped to 335 coral receptors, including 151 G protein coupled receptors (GPCRs). 
To validate specific sub-families, we chose opsin proteins, representative GPCRs that confer light sensitivity, and Toll-like receptors, representative non-GPCRs, which function in the immune response, and their ability to communicate with microorganisms. Through detailed structure-function analysis of their ligand-binding pockets and downstream signaling cascades, we selected those candidate remote homologues likely to carry out related functions in the corals. This pipeline may prove generally useful for other non-model organisms, such as to support the growing field of synthetic biology.


Subjects
Anthozoa , G-Protein-Coupled Receptors , Signal Transduction , Animals , Humans , Anthozoa/genetics , Anthozoa/physiology , Genome , G-Protein-Coupled Receptors/genetics , G-Protein-Coupled Receptors/metabolism , Animal Models
17.
BMC Bioinformatics ; 13: 259, 2012 Oct 06.
Article in English | MEDLINE | ID: mdl-23039758

ABSTRACT

BACKGROUND: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly utilize protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. RESULTS: We present Formatt, a multiple structure alignment program based on the purely geometric Matt multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. CONCLUSIONS: Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.


Subjects
Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Algorithms , Amino Acid Sequence , Proteins/chemistry
18.
Proteins ; 80(2): 410-20, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22095906

ABSTRACT

The supersecondary structure of amyloids and prions, proteins of intense clinical and biological interest, is difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Previous work has demonstrated that probability-based prediction of discrete β-strand pairs can offer insight into these structures. Here, we devise a system of energetic rules that can be used to dynamically assemble these discrete β-strand pairs into complete amyloid β-structures. The STITCHER algorithm progressively 'stitches' strand-pairs into full β-sheets based on a novel free-energy model, incorporating experimentally observed amino-acid side-chain stacking contributions, entropic estimates, and steric restrictions for amyloidal parallel β-sheet construction. A dynamic program computes the top 50 structures and returns both the highest scoring structure and a consensus structure taken by polling this list for common discrete elements. Putative structural heterogeneity can be inferred from sequence regions that compose poorly. Predictions show agreement with experimental models of Alzheimer's amyloid beta peptide and the Podospora anserina Het-s prion. Predictions of the HET-s homolog HET-S also reflect experimental observations of poor amyloid formation. We put forward predicted structures for the yeast prion Sup35, suggesting N-terminal structural stability enabled by tyrosine ladders, and C-terminal heterogeneity. Predictions for the Rnq1 prion and alpha-synuclein are also given, identifying a similar mix of homogeneous and heterogeneous secondary structure elements. STITCHER provides novel insight into the energetic basis of amyloid structure, provides accurate structure predictions, and can help guide future experimental studies.


Subjects
Algorithms , Amyloid beta-Peptides/chemistry , Prions/chemistry , Protein Folding , Amyloid/chemistry , Entropy , Fungal Proteins/chemistry , Intermediate Filament Proteins/chemistry , Peptide Termination Factors/chemistry , Protein Structure, Secondary , Saccharomyces cerevisiae Proteins/chemistry
19.
Annu Rev Biomed Data Sci ; 5: 205-231, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35537462

ABSTRACT

Coral reefs are home to over two million species and provide habitat for roughly 25% of all marine animals, but they are being severely threatened by pollution and climate change. A large amount of genomic, transcriptomic, and other omics data is becoming increasingly available from different species of reef-building corals, the unicellular dinoflagellates, and the coral microbiome (bacteria, archaea, viruses, fungi, etc.). Such new data present an opportunity for bioinformatics researchers and computational biologists to contribute to a timely, compelling, and urgent investigation of critical factors that influence reef health and resilience.


Subjects
Anthozoa , Microbiota , Animals , Anthozoa/genetics , Computational Biology , Coral Reefs , Microbiota/genetics , Symbiosis/genetics
20.
PLoS One ; 17(1): e0261811, 2022.
Article in English | MEDLINE | ID: mdl-34995299

ABSTRACT

Understanding the spread of false or dangerous beliefs (often called misinformation or disinformation) through a population has never seemed so urgent. Network science researchers have often taken a page from epidemiologists and modeled the spread of false beliefs much like the spread of a disease through a social network. However, those disease-inspired models lack an internal model of an individual's current beliefs, even though cognitive science has increasingly documented that the interaction between mental models and incoming messages is crucial to whether a message is adopted or rejected. Some computational social science modelers analyze agent-based models in which individuals do have simulated cognition, but these often lack the strengths of network science, namely empirically driven network structures. We introduce a cognitive cascade model, a public opinion diffusion (POD) model that combines a network science belief cascade approach with an internal cognitive model of the individual agents, as in opinion diffusion models, and adds media institutions as agents that begin opinion cascades. We show that the model, even with a very simplistic belief function capturing cognitive effects cited in disinformation studies (dissonance and exposure), adds expressive power over existing cascade models. We analyze the cognitive cascade model with our simple cognitive function across various graph topologies and institutional messaging patterns. We argue from our results that population-level aggregate outcomes of the model qualitatively match what has been reported in COVID-related public opinion polls, and that the model dynamics lend insight into how to address the spread of problematic beliefs. The overall model sets up a framework with which social science misinformation researchers and computational opinion diffusion modelers can join forces to understand, and hopefully learn how best to counter, the spread of disinformation and "alternative facts."


Subjects
COVID-19 , Disinformation , Models, Theoretical , Public Opinion , SARS-CoV-2 , Social Media , Humans