1.
Proc Natl Acad Sci U S A ; 120(24): e2220778120, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37289807

ABSTRACT

Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor (KD = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu.


Subject(s)
Drug Discovery , Proteins , Humans , Proteins/chemistry , Drug Discovery/methods , Preclinical Drug Evaluation , Language
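
The abstract above states that ConPLex predicts binding from the distance between learned representations. A minimal sketch of that idea follows. It is not the ConPLex implementation: the use of cosine similarity, the function names, and the hand-made 3-dimensional vectors are all assumptions made for illustration, and real embeddings would come from trained models.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_compounds(target_vec, compound_vecs):
    """Rank compounds by similarity to one target in a shared embedding space.

    Hypothetical helper: higher similarity is read as a stronger
    predicted interaction.
    """
    scores = {
        name: cosine_similarity(target_vec, vec)
        for name, vec in compound_vecs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical embeddings, invented for this example only.
target = [1.0, 0.0, 0.0]
library = {
    "binder": [0.9, 0.1, 0.0],  # points in nearly the same direction
    "decoy": [0.0, 1.0, 0.0],   # orthogonal to the target
}
ranking = rank_compounds(target, library)
```

Because each compound is scored with a single vector comparison, a scheme of this shape is cheap to run over a very large library, which is consistent with the scalability claim in the abstract.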
2.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37897686

ABSTRACT

MOTIVATION: High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di). RESULTS: We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein-protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein-protein interaction predictions across all protein pairs can be made genome-wide. AVAILABILITY AND IMPLEMENTATION: TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.


Subject(s)
Proteins , Software , Amino Acid Sequence , Proteins/chemistry
3.
Bioinformatics ; 38(Suppl 1): i264-i272, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758793

ABSTRACT

SUMMARY: Computational methods to predict protein-protein interaction (PPI) typically segregate into sequence-based 'bottom-up' methods that infer properties from the characteristics of the individual protein sequences, or global 'top-down' methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlocks a more comprehensive map of protein interaction and organization in both model and non-model organisms. AVAILABILITY AND IMPLEMENTATION: https://topsyturvy.csail.mit.edu. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Interaction Mapping , Proteins , Amino Acid Sequence , Protein Interaction Mapping/methods , Proteins/genetics , Proteins/metabolism
4.
Bioinformatics ; 38(13): 3395-3406, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35575379

ABSTRACT

MOTIVATION: Protein function prediction, based on the patterns of connection in a protein-protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein-protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein-protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. RESULTS: GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein-protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein-protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein-protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson's Disease GWAS genes, rediscover many genes which have known involvement in Parkinson's disease pathways, plus suggest some new genes to study. AVAILABILITY AND IMPLEMENTATION: All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Parkinson Disease , Humans , Computational Biology/methods , Algorithms , Proteins/metabolism
5.
Nat Rev Genet ; 18(9): 551-562, 2017 09.
Article in English | MEDLINE | ID: mdl-28607512

ABSTRACT

Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease. Fundamental to network analysis is the concept that genes underlying the same phenotype tend to interact; this principle can be used to combine and to amplify signals from individual genes. Recently, numerous bioinformatic techniques have been proposed for genetic analysis using networks, based on random walks, information diffusion and electrical resistance. These approaches have been applied successfully to identify disease genes, genetic modules and drug targets. In fact, all these approaches are variations of a unifying mathematical machinery - network propagation - suggesting that it is a powerful data transformation method of broad utility in genetic research.


Subject(s)
Computational Biology , Disease/genetics , Gene Regulatory Networks , Genetic Association Studies , Software , Algorithms , Humans , Protein Interaction Maps , Proteins/metabolism
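
The review above describes network propagation as the machinery shared by random-walk and diffusion methods. One common variant, random walk with restart, can be sketched as follows. The restart probability, the iteration count, the function name, and the four-gene path graph are illustrative assumptions, not values taken from the article.

```python
def propagate(adjacency, seed_scores, restart=0.5, iterations=50):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbors
               (assumes every node has at least one neighbor).
    seed_scores: dict mapping each node to its prior score, e.g. 1.0
                 for a known disease gene and 0.0 otherwise.
    """
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    scores = dict(seed_scores)
    for _ in range(iterations):
        updated = {}
        for node, neighbors in adjacency.items():
            # Each neighbor passes on its score, split evenly over its edges.
            incoming = sum(scores[m] / degree[m] for m in neighbors)
            updated[node] = restart * seed_scores[node] + (1 - restart) * incoming
        scores = updated
    return scores


# Hypothetical path graph A - B - C - D with a single seed gene A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
seeds = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
result = propagate(graph, seeds)
```

In this toy graph the propagated score decays with distance from the seed, which is the behavior that lets signals from individual genes be combined and amplified across their network neighborhoods.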
6.
Cell Mol Life Sci ; 79(2): 78, 2022 Jan 19.
Article in English | MEDLINE | ID: mdl-35044538

ABSTRACT

Three-dimensional (3D) in vitro culture systems using human induced pluripotent stem cells (hiPSCs) are useful tools to model neurodegenerative disease biology in physiologically relevant microenvironments. Though many successful biomaterials-based 3D model systems have been established for other neurodegenerative diseases, such as Alzheimer's disease, relatively few exist for Parkinson's disease (PD) research. We employed tissue engineering approaches to construct a 3D silk scaffold-based platform for the culture of hiPSC-dopaminergic (DA) neurons derived from healthy individuals and PD patients harboring LRRK2 G2019S or GBA N370S mutations. We then compared results from protein, gene expression, and metabolic analyses obtained from two-dimensional (2D) and 3D culture systems. The 3D platform enabled the formation of dense dopamine neuronal network architectures and developed biological profiles both similar to and distinct from those of 2D culture systems in healthy and PD lines. PD cultures developed in 3D platforms showed elevated levels of α-synuclein and alterations in purine metabolite profiles. Furthermore, computational network analysis of transcriptomic networks nominated several novel molecular interactions occurring in neurons from patients with mutations in LRRK2 and GBA. We conclude that the brain-like 3D system presented here is a realistic platform to interrogate molecular mechanisms underlying PD biology.


Subject(s)
Dopaminergic Neurons/pathology , Parkinson Disease/pathology , Bioengineering , Three-Dimensional Cell Culture Techniques , Cultured Cells , Dopaminergic Neurons/cytology , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/pathology , Neurogenesis , Silk/chemistry , Tissue Scaffolds/chemistry
7.
Nat Methods ; 16(9): 843-852, 2019 09.
Article in English | MEDLINE | ID: mdl-31471613

ABSTRACT

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.


Subject(s)
Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Genome-Wide Association Study , Biological Models , Single Nucleotide Polymorphism , Quantitative Trait Loci , Algorithms , Gene Expression Profiling , Humans , Phenotype , Protein Interaction Maps
8.
Bioinformatics ; 36(Suppl_1): i464-i473, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657369

ABSTRACT

MOTIVATION: One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein-protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross-validation experiments. RESULTS: We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE's global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn's disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. AVAILABILITY AND IMPLEMENTATION: GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Saccharomyces cerevisiae , Diffusion , Humans , Protein Interaction Mapping
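
The abstract above refers to simple local network measures for link prediction. As an illustration of what such a measure can look like, the sketch below scores unlinked node pairs by their number of common neighbors. This is one generic local measure chosen for the example; it is not GLIDE's scoring function, and the function name and toy network are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(adjacency):
    """Score each unlinked node pair by its number of shared neighbors.

    adjacency: dict mapping each node to the set of its neighbors.
    Returns a dict {(u, v): count} for pairs with at least one shared neighbor.
    """
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # the link already exists, so there is nothing to predict
        shared = len(adjacency[u] & adjacency[v])
        if shared:
            scores[(u, v)] = shared
    return scores


# Hypothetical network: A and D are not linked but share neighbors B and C.
network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
predictions = common_neighbor_scores(network)
```

A local score of this kind only sees immediate neighborhoods, which is why the abstract pairs local measures with a global, embedding-based measure for the sparser parts of the network.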
9.
Bioinformatics ; 30(12): i219-27, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931987

ABSTRACT

MOTIVATION: It has long been hypothesized that incorporating models of network noise as well as edge directions and known pathway information into the representation of protein-protein interaction (PPI) networks might improve their utility for functional inference. However, a simple way to do this has not been obvious. We find that diffusion state distance (DSD), our recent diffusion-based metric for measuring dissimilarity in PPI networks, has natural extensions that incorporate confidence and edge directions, and can even express coherent pathways, by calculating DSD on an augmented graph. RESULTS: We define three incremental versions of DSD which we term cDSD, caDSD and capDSD, where the capDSD matrix incorporates confidence, known directed edges, and pathways into the measure of how similar each pair of nodes is according to the structure of the PPI network. We test four popular function prediction methods (majority vote, weighted majority vote, multi-way cut and functional flow) using these different matrices on the Baker's yeast PPI network in cross-validation. The best performing method is weighted majority vote using capDSD. We then test the performance of our augmented DSD methods on an integrated heterogeneous set of protein association edges from the STRING database. The superior performance of capDSD in this context confirms that treating the pathways as probabilistic units is more powerful than simply incorporating pathway edges independently into the network. AVAILABILITY: All source code for calculating the confidences, for extracting pathway information from KEGG XML files, and for calculating the cDSD, caDSD and capDSD matrices is available from http://dsd.cs.tufts.edu/capdsd


Subject(s)
Protein Interaction Mapping/methods , Algorithms , Saccharomyces cerevisiae Proteins/metabolism
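
The abstract above builds on diffusion state distance (DSD) without defining it. The sketch below assumes the usual formulation of plain DSD: for each node, count the expected number of visits to every other node by a random walk of a fixed number of steps, then take the L1 distance between two nodes' visit vectors. That definition, the step count, the function names, and the star-shaped toy graph are assumptions for illustration. The confidence, direction, and pathway extensions (cDSD, caDSD, capDSD) are not modeled here.

```python
def expected_visits(adjacency, start, steps):
    """Expected visits to each node by a `steps`-step random walk from `start`.

    The starting position counts as one visit. Assumes an unweighted,
    undirected graph in which every node has at least one neighbor.
    """
    position = {node: 1.0 if node == start else 0.0 for node in adjacency}
    visits = dict(position)
    for _ in range(steps):
        following = {node: 0.0 for node in adjacency}
        for node, neighbors in adjacency.items():
            share = position[node] / len(neighbors)
            for neighbor in neighbors:
                following[neighbor] += share
        position = following
        for node in adjacency:
            visits[node] += position[node]
    return visits


def dsd(adjacency, u, v, steps=5):
    """L1 distance between the expected-visit vectors of u and v."""
    visits_u = expected_visits(adjacency, u, steps)
    visits_v = expected_visits(adjacency, v, steps)
    return sum(abs(visits_u[node] - visits_v[node]) for node in adjacency)


# Hypothetical star graph: one hub protein with three leaf partners.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
leaf_to_leaf = dsd(star, "a", "b")
```

Two nodes are close under this measure when random walks started from them visit the rest of the network in similar proportions, not merely when few edges separate them.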
10.
Bioinformatics ; 29(13): i283-90, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812995

ABSTRACT

MOTIVATION: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will not only speed-up homology search directly but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools. RESULTS: We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate with all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP's runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants, which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed-up many tasks, such as protein structure prediction and orthology mapping, which rely heavily on homology search. AVAILABILITY: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/ CONTACT: bab@mit.edu.


Subject(s)
Algorithms , Data Compression/methods , Protein Databases , Sequence Alignment/methods , Amino Acid Sequence Homology , Genomics/methods
11.
PeerJ ; 12: e16804, 2024.
Article in English | MEDLINE | ID: mdl-38313028

ABSTRACT

Once thought to be a unique capability of the Langerhans islets in the pancreas of mammals, insulin (INS) signaling is now recognized as an evolutionarily ancient function going back to prokaryotes. INS is ubiquitously present not only in humans but also in unicellular eukaryotes, fungi, worms, and Drosophila. Remote homologue identification also supports the presence of INS and INS receptor in corals where the availability of glucose is largely dependent on the photosynthetic activity of the symbiotic algae. The cnidarian animal host of corals operates together with a 20,000-sized microbiome, in direct analogy to the human gut microbiome. In humans, aberrant INS signaling is the hallmark of metabolic disease, and is thought to play a major role in aging, and age-related diseases, such as Alzheimer's disease. We here would like to argue that a broader view of INS beyond its human homeostasis function may help us understand other organisms, and in turn, studying those non-model organisms may enable a novel view of the human INS signaling system. To this end, we here review INS signaling from a new angle, by drawing analogies between humans and corals at the molecular level.


Subject(s)
Anthozoa , Islets of Langerhans , Animals , Humans , Anthozoa/metabolism , Insulin/metabolism , Islets of Langerhans/metabolism , Pancreas/metabolism , Signal Transduction
12.
PeerJ ; 12: e16654, 2024.
Article in English | MEDLINE | ID: mdl-38313033

ABSTRACT

Anthropogenic activities increase sediment suspended in the water column and deposition on reefs can be largely dependent on colony morphology. Massive and plating corals have a high capacity to trap sediments, and active removal mechanisms can be energetically costly. Branching corals trap less sediment but are more susceptible to light limitation caused by suspended sediment. Despite deleterious effects of sediments on corals, few studies have examined the molecular response of corals with different morphological characteristics to sediment stress. To address this knowledge gap, this study assessed the transcriptomic responses of branching and massive corals in Florida and Hawai'i to varying levels of sediment exposure. Gene expression analysis revealed a molecular responsiveness to sediments across species and sites. Differential Gene Expression followed by Gene Ontology (GO) enrichment analysis identified that branching corals had the largest transcriptomic response to sediments, in developmental processes and metabolism, while significantly enriched GO terms were highly variable between massive corals, despite similar morphologies. Comparison of DEGs within orthogroups revealed that while all corals had DEGs in response to sediment, there was not a concerted gene set response by morphology or location. These findings illuminate the species specificity and genetic basis underlying coral susceptibility to sediments.


Subject(s)
Anthozoa , Animals , Anthozoa/genetics , Coral Reefs , Gene Expression Profiling , Transcriptome/genetics , Water
13.
Bioinform Adv ; 4(1): vbae099, 2024.
Article in English | MEDLINE | ID: mdl-39143982

ABSTRACT

Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.

14.
BMC Bioinformatics ; 14: 23, 2013 Jan 18.
Article in English | MEDLINE | ID: mdl-23331614

ABSTRACT

BACKGROUND: New technology has resulted in high-throughput screens for pairwise genetic interactions in yeast and other model organisms. For each pair in a collection of non-essential genes, an epistasis score is obtained, representing how much sicker (or healthier) the double-knockout organism will be compared to what would be expected from the sickness of the component single knockouts. Recent algorithmic work has identified graph-theoretic patterns in this data that can indicate functional modules, and even sets of genes that may occur in compensatory pathways, such as a BPM-type schema first introduced by Kelley and Ideker. However, to date, any algorithms for finding such patterns in the data were implemented internally, with no software being made publicly available. RESULTS: Genecentric is a new package that implements a parallelized version of the Leiserson et al. algorithm (J Comput Biol 18:1399-1409, 2011) for generating generalized BPMs from high-throughput genetic interaction data. Given a matrix of weighted epistasis values for a set of double knockouts, Genecentric returns a list of generalized BPMs that may represent compensatory pathways. Genecentric also has an extension, GenecentricGO, to query FuncAssociate (Bioinformatics 25:3043-3044, 2009) to retrieve GO enrichment statistics on generated BPMs. Python is the only dependency, and our web site provides working examples and documentation. CONCLUSION: We find that Genecentric can be used to find coherent functional and perhaps compensatory gene sets from high-throughput genetic interaction data. Genecentric is made freely available for download under the GPLv2 from http://bcb.cs.tufts.edu/genecentric.


Subject(s)
Genetic Epistasis , Software , Algorithms , Computational Biology/methods , Fungal Genes , Genetic Models , Saccharomyces cerevisiae/genetics
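
The epistasis score described in the abstract above compares an observed double-knockout fitness with the fitness expected from the two single knockouts. A minimal worked sketch follows. The abstract does not say how the expectation is computed, so the multiplicative model used here (expected fitness is the product of the single-knockout fitnesses) is an assumption, as are the function name and the fitness values.

```python
def epistasis_score(fitness_a, fitness_b, fitness_ab):
    """Observed double-knockout fitness minus the expected fitness.

    Assumes a multiplicative expectation: fitness_a * fitness_b.
    Fitness is relative to wild type (1.0 means no defect).
    Negative scores mean the double knockout is sicker than expected;
    positive scores mean it is healthier than expected.
    """
    expected = fitness_a * fitness_b
    return fitness_ab - expected


# Hypothetical single-knockout fitnesses of 0.8 and 0.5 give an expected
# double-knockout fitness of 0.4 under the multiplicative assumption.
no_interaction = epistasis_score(0.8, 0.5, 0.4)
sicker_than_expected = epistasis_score(0.8, 0.5, 0.1)
healthier_than_expected = epistasis_score(0.8, 0.5, 0.6)
```

A matrix of such scores over all gene pairs is the kind of input the abstract describes Genecentric taking when it searches for generalized BPMs.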
15.
Bioinformatics ; 28(9): 1216-22, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22408192

ABSTRACT

MOTIVATION: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, the MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif. RESULTS: We test SMURFLite on all propeller and barrel folds in the mainly-beta class of the SCOP hierarchy in stringent cross-validation experiments. We show a mean 26% (median 16%) improvement in area under curve (AUC) for beta-structural motif recognition as compared with HMMER (a well-known HMM method) and a mean 33% (median 19%) improvement as compared with RAPTOR (a well-known threading method) and even a mean 18% (median 10%) improvement in AUC over HHpred (a profile-profile HMM method), despite HHpred's use of extensive additional training data. We demonstrate SMURFLite's ability to scale to whole genomes by running a SMURFLite library of 207 beta-structural SCOP superfamilies against the entire genome of Thermotoga maritima, and make over 100 new fold predictions. AVAILABILITY AND IMPLEMENTATION: A webserver that runs SMURFLite is available at: http://smurf.cs.tufts.edu/smurflite/


Subject(s)
Markov Chains , Secondary Protein Structure , Proteins/chemistry , Software , Amino Acid Sequence , Bacterial Genome , Humans , Molecular Models , Tertiary Protein Structure , Proteins/genetics , Thermotoga maritima/genetics
16.
Proc Natl Acad Sci U S A ; 107(9): 4069-74, 2010 Mar 02.
Article in English | MEDLINE | ID: mdl-20147619

ABSTRACT

The recent explosion in newly sequenced bacterial genomes is outpacing the capacity of researchers to assign functional annotation to all the new proteins. Hence, computational methods that can help predict structural motifs provide increasingly important clues in helping to determine how these proteins might function. We introduce a Markov Random Field approach tailored for recognizing proteins that fold into mainly beta-structural motifs, and apply it to build recognizers for the beta-propeller shapes. As an application, we identify a potential class of hybrid two-component sensor proteins that we predict contain a double-propeller domain.


Subject(s)
Bacterial Proteins/chemistry , Histidine Kinase , Markov Chains , Protein Conformation , Protein Kinases/chemistry
17.
PLoS One ; 18(2): e0270965, 2023.
Article in English | MEDLINE | ID: mdl-36735673

ABSTRACT

With the ease of gene sequencing and the technology available to study and manipulate non-model organisms, the extension of the methodological toolbox required to translate our understanding of model organisms to non-model organisms has become an urgent problem. For example, mining of large coral and their symbiont sequence data is a challenge, but also provides an opportunity for understanding functionality and evolution of these and other non-model organisms. Much more information than for any other eukaryotic species is available for humans, especially related to signal transduction and diseases. However, the coral cnidarian host and humans diverged over 700 million years ago, and homologies between proteins in the two species are therefore often in the gray zone, or at least often undetectable with traditional BLAST searches. We introduce a two-stage approach to identifying putative coral homologues of human proteins. First, through remote homology detection using Hidden Markov Models, we identify candidate human homologues in the cnidarian genome. However, for many proteins, the human genome alone contains multiple family members with similar or even more divergence in sequence. In the second stage, therefore, we filter the remote homology results based on the functional and structural plausibility of each coral candidate, shortlisting the coral proteins likely to have conserved some of the functions of the human proteins. We demonstrate our approach with a pipeline for mapping membrane receptors in humans to membrane receptors in corals, with specific focus on the stony coral, P. damicornis. More than 1000 human membrane receptors mapped to 335 coral receptors, including 151 G protein coupled receptors (GPCRs).
To validate specific sub-families, we chose opsin proteins, representative GPCRs that confer light sensitivity, and Toll-like receptors, representative non-GPCRs, which function in the immune response, and their ability to communicate with microorganisms. Through detailed structure-function analysis of their ligand-binding pockets and downstream signaling cascades, we selected those candidate remote homologues likely to carry out related functions in the corals. This pipeline may prove generally useful for other non-model organisms, such as to support the growing field of synthetic biology.


Subject(s)
Anthozoa , G-Protein-Coupled Receptors , Signal Transduction , Animals , Humans , Anthozoa/genetics , Anthozoa/physiology , Genome , G-Protein-Coupled Receptors/genetics , G-Protein-Coupled Receptors/metabolism , Animal Models
18.
BMC Bioinformatics ; 13: 259, 2012 Oct 06.
Article in English | MEDLINE | ID: mdl-23039758

ABSTRACT

BACKGROUND: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly utilize protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. RESULTS: We present Formatt, a multiple structure alignment program based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. CONCLUSIONS: Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.


Subject(s)
Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Proteins/chemistry
19.
Proteins ; 80(2): 410-20, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22095906

ABSTRACT

The supersecondary structure of amyloids and prions, proteins of intense clinical and biological interest, is difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Previous work has demonstrated that probability-based prediction of discrete β-strand pairs can offer insight into these structures. Here, we devise a system of energetic rules that can be used to dynamically assemble these discrete β-strand pairs into complete amyloid β-structures. The STITCHER algorithm progressively 'stitches' strand-pairs into full β-sheets based on a novel free-energy model, incorporating experimentally observed amino-acid side-chain stacking contributions, entropic estimates, and steric restrictions for amyloidal parallel β-sheet construction. A dynamic program computes the top 50 structures and returns both the highest scoring structure and a consensus structure taken by polling this list for common discrete elements. Putative structural heterogeneity can be inferred from sequence regions that compose poorly. Predictions show agreement with experimental models of Alzheimer's amyloid beta peptide and the Podospora anserina Het-s prion. Predictions of the HET-s homolog HET-S also reflect experimental observations of poor amyloid formation. We put forward predicted structures for the yeast prion Sup35, suggesting N-terminal structural stability enabled by tyrosine ladders, and C-terminal heterogeneity. Predictions for the Rnq1 prion and alpha-synuclein are also given, identifying a similar mix of homogeneous and heterogeneous secondary structure elements. STITCHER provides novel insight into the energetic basis of amyloid structure, provides accurate structure predictions, and can help guide future experimental studies.
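The consensus step described in the abstract, polling a list of top-scoring structures for common discrete elements, might be sketched as follows. This is not the STITCHER implementation: the representation of a structure as a set of hashable elements, the function name `consensus_elements`, and the majority threshold `min_fraction=0.5` are all assumptions made for illustration.

```python
from collections import Counter


def consensus_elements(structures, min_fraction=0.5):
    """Return the elements that occur in more than min_fraction of the structures.

    Assumption for this sketch: each structure is an iterable of hashable
    discrete elements (for example, labels of strand pairs).
    """
    if not structures:
        return set()
    counts = Counter()
    for structure in structures:
        # Count each element at most once per structure.
        counts.update(set(structure))
    cutoff = min_fraction * len(structures)
    return {element for element, n in counts.items() if n > cutoff}
```

With a list of 50 candidate structures, elements that fall below the cutoff would correspond to regions where the candidates disagree, which is one way to read the abstract's remark about regions that compose poorly.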


Subject(s)
Algorithms , Amyloid beta-Peptides/chemistry , Prions/chemistry , Protein Folding , Amyloid/chemistry , Entropy , Fungal Proteins/chemistry , Intermediate Filament Proteins/chemistry , Peptide Termination Factors/chemistry , Protein Structure, Secondary , Saccharomyces cerevisiae Proteins/chemistry
20.
Annu Rev Biomed Data Sci ; 5: 205-231, 2022 Aug 10.
Article in English | MEDLINE | ID: mdl-35537462

ABSTRACT

Coral reefs are home to over two million species and provide habitat for roughly 25% of all marine animals, but they are being severely threatened by pollution and climate change. A large amount of genomic, transcriptomic, and other omics data is becoming increasingly available from different species of reef-building corals, the unicellular dinoflagellates, and the coral microbiome (bacteria, archaea, viruses, fungi, etc.). Such new data present an opportunity for bioinformatics researchers and computational biologists to contribute to a timely, compelling, and urgent investigation of critical factors that influence reef health and resilience.


Subject(s)
Anthozoa , Microbiota , Animals , Anthozoa/genetics , Computational Biology , Coral Reefs , Microbiota/genetics , Symbiosis/genetics