Results 1 - 20 of 46
1.
Proc Natl Acad Sci U S A ; 120(24): e2220778120, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37289807

ABSTRACT

Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor (KD = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu.
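The abstract states that ConPLex predicts binding from the distance between learned representations. The following is only a minimal sketch of that idea, not the ConPLex implementation: it assumes the target and each compound have already been mapped into a shared embedding space, and it assumes cosine similarity as the closeness measure. The vectors, the compound names, and the function names `cosine_similarity` and `rank_compounds` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_compounds(target_vec, compound_vecs):
    """Rank compounds by similarity to one target embedding, most similar first."""
    scored = [
        (name, cosine_similarity(target_vec, vec))
        for name, vec in compound_vecs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up 3-dimensional embeddings, for illustration only.
target = [1.0, 0.0, 0.5]
library = {
    "drug_a": [0.9, 0.1, 0.4],
    "decoy_b": [-1.0, 0.2, 0.0],
    "drug_c": [0.0, 1.0, 0.0],
}
ranking = rank_compounds(target, library)
```

Because each score needs only a dot product of precomputed vectors, scoring one target against a large library costs one pass over the library, which is the kind of property that makes screening at scale plausible.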


Subject(s)
Drug Discovery, Proteins, Humans, Proteins/chemistry, Drug Discovery/methods, Preclinical Drug Evaluation, Language
2.
Bioinformatics ; 39(11), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37897686

ABSTRACT

MOTIVATION: High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di). RESULTS: We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein-protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein-protein interaction predictions across all protein pairs can be made genome-wide. AVAILABILITY AND IMPLEMENTATION: TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.
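Since the abstract notes that the 3Di string has the same length as the protein string, the two inputs can be aligned residue by residue. The sketch below shows only that pairing step, under stated assumptions: it is not the TT3D input pipeline, the function name `pair_residue_tokens` is hypothetical, and the example strings are made up rather than produced by Foldseek.

```python
def pair_residue_tokens(aa_seq, three_di_seq):
    """Pair each amino acid with the structural token at the same position.

    Raises ValueError when the two strings differ in length, because the
    position-by-position pairing would then be meaningless.
    """
    if len(aa_seq) != len(three_di_seq):
        raise ValueError(
            f"length mismatch: {len(aa_seq)} residues vs "
            f"{len(three_di_seq)} structural tokens"
        )
    return list(zip(aa_seq, three_di_seq))


# Made-up example strings of equal length (not real Foldseek output).
pairs = pair_residue_tokens("MKTAY", "DVLVC")
```

A downstream model could then embed each (amino acid, structural token) pair, or embed the two strings separately and combine them per position; which of these TT3D does is not stated in the abstract.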


Subject(s)
Proteins, Software, Amino Acid Sequence, Proteins/chemistry
3.
Bioinformatics ; 38(Suppl 1): i264-i272, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758793

ABSTRACT

SUMMARY: Computational methods to predict protein-protein interaction (PPI) typically segregate into sequence-based 'bottom-up' methods that infer properties from the characteristics of the individual protein sequences, or global 'top-down' methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms. AVAILABILITY AND IMPLEMENTATION: https://topsyturvy.csail.mit.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Interaction Mapping, Proteins, Amino Acid Sequence, Protein Interaction Mapping/methods, Proteins/genetics, Proteins/metabolism
4.
Bioinformatics ; 38(13): 3395-3406, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35575379

ABSTRACT

MOTIVATION: Protein function prediction, based on the patterns of connection in a protein-protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein-protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein-protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. RESULTS: GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein-protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein-protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein-protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson's disease GWAS genes, rediscover many genes with known involvement in Parkinson's disease pathways, and suggest some new genes to study. AVAILABILITY AND IMPLEMENTATION: All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology, Parkinson Disease, Humans, Computational Biology/methods, Algorithms, Proteins/metabolism
5.
Nat Rev Genet ; 18(9): 551-562, 2017 09.
Article in English | MEDLINE | ID: mdl-28607512

ABSTRACT

Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease. Fundamental to network analysis is the concept that genes underlying the same phenotype tend to interact; this principle can be used to combine and to amplify signals from individual genes. Recently, numerous bioinformatic techniques have been proposed for genetic analysis using networks, based on random walks, information diffusion and electrical resistance. These approaches have been applied successfully to identify disease genes, genetic modules and drug targets. In fact, all these approaches are variations of a unifying mathematical machinery - network propagation - suggesting that it is a powerful data transformation method of broad utility in genetic research.
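One common form of network propagation is the random walk with restart, iterating p(t+1) = alpha * W * p(t) + (1 - alpha) * p(0), where W is the degree-normalized adjacency matrix and p(0) holds the prior gene scores. The sketch below illustrates that single variant on a toy graph; the function name `propagate`, the value of `alpha`, and the four-node network are assumptions made for illustration and are not taken from the review.

```python
def propagate(adjacency, seed_scores, alpha=0.5, n_iter=100, tol=1e-9):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    seed_scores: dict of prior scores (nodes not listed start at 0.0).
    alpha: fraction of score passed along edges at each step; the
           remaining (1 - alpha) is reset to the prior.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    prior = {n: seed_scores.get(n, 0.0) for n in nodes}
    current = dict(prior)
    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            # Each neighbour m splits its score evenly among its own neighbours.
            inflow = sum(current[m] / degree[m] for m in adjacency[n])
            updated[n] = alpha * inflow + (1.0 - alpha) * prior[n]
        delta = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if delta < tol:
            break
    return current


# Toy path graph A - B - C - D with all prior signal on A.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = propagate(toy_graph, {"A": 1.0})
```

In this toy run the score decays with distance from the seed, so genes near a known disease gene receive more signal than distant ones, which is the "amplify signals from individual genes" behaviour the abstract describes.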


Subject(s)
Computational Biology, Disease/genetics, Gene Regulatory Networks, Genetic Association Studies, Software, Algorithms, Humans, Protein Interaction Maps, Proteins/metabolism
6.
Cell Mol Life Sci ; 79(2): 78, 2022 Jan 19.
Article in English | MEDLINE | ID: mdl-35044538

ABSTRACT

Three-dimensional (3D) in vitro culture systems using human induced pluripotent stem cells (hiPSCs) are useful tools to model neurodegenerative disease biology in physiologically relevant microenvironments. Though many successful biomaterials-based 3D model systems have been established for other neurodegenerative diseases, such as Alzheimer's disease, relatively few exist for Parkinson's disease (PD) research. We employed tissue engineering approaches to construct a 3D silk scaffold-based platform for the culture of hiPSC-dopaminergic (DA) neurons derived from healthy individuals and PD patients harboring LRRK2 G2019S or GBA N370S mutations. We then compared results from protein, gene expression, and metabolic analyses obtained from two-dimensional (2D) and 3D culture systems. The 3D platform enabled the formation of dense dopamine neuronal network architectures and developed biological profiles both similar to and distinct from 2D culture systems in healthy and PD lines. PD cultures developed in 3D platforms showed elevated levels of α-synuclein and alterations in purine metabolite profiles. Furthermore, computational network analysis of transcriptomic networks nominated several novel molecular interactions occurring in neurons from patients with mutations in LRRK2 and GBA. We conclude that the brain-like 3D system presented here is a realistic platform to interrogate molecular mechanisms underlying PD biology.


Subject(s)
Dopaminergic Neurons/pathology, Parkinson Disease/pathology, Bioengineering, Three-Dimensional Cell Culture Techniques, Cultured Cells, Dopaminergic Neurons/cytology, Humans, Induced Pluripotent Stem Cells/cytology, Induced Pluripotent Stem Cells/pathology, Neurogenesis, Silk/chemistry, Tissue Scaffolds/chemistry
7.
Nat Methods ; 16(9): 843-852, 2019 09.
Article in English | MEDLINE | ID: mdl-31471613

ABSTRACT

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.


Subject(s)
Computational Biology/methods, Disease/genetics, Gene Regulatory Networks, Genome-Wide Association Study, Biological Models, Single Nucleotide Polymorphism, Quantitative Trait Loci, Algorithms, Gene Expression Profiling, Humans, Phenotype, Protein Interaction Maps
8.
Bioinformatics ; 36(Suppl_1): i464-i473, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657369

ABSTRACT

MOTIVATION: One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein-protein interaction (PPI) network data that might allow alternative methods to outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross-validation experiments. RESULTS: We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE's global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn's disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. AVAILABILITY AND IMPLEMENTATION: GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
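To make "simple local network measures" concrete, the sketch below scores every unlinked node pair by its number of common neighbours, one of the simplest local link-prediction measures. It is a generic illustration only: it is not GLIDE, it omits the global embedding component entirely, and the abstract does not say which local measures GLIDE uses. The function name `common_neighbor_scores` and the toy graph are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(adjacency):
    """Score each currently unlinked node pair by its number of shared neighbours.

    adjacency: dict mapping each node to the set of its neighbours
    (undirected graph). Pairs with no shared neighbour are omitted.
    """
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        shared = len(adjacency[u] & adjacency[v])
        if shared:
            scores[(u, v)] = shared
    return scores


# Toy network: A and D are not linked but share neighbours B and C.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
candidate_links = common_neighbor_scores(toy_graph)
```

A measure like this has nothing to say about pairs with no shared neighbours, which is consistent with the abstract's observation that a global embedding measure adds value outside the densely connected core.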


Subject(s)
Algorithms, Saccharomyces cerevisiae, Diffusion, Humans, Protein Interaction Mapping
9.
Bioinformatics ; 30(12): i219-27, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931987

ABSTRACT

MOTIVATION: It has long been hypothesized that incorporating models of network noise as well as edge directions and known pathway information into the representation of protein-protein interaction (PPI) networks might improve their utility for functional inference. However, a simple way to do this has not been obvious. We find that diffusion state distance (DSD), our recent diffusion-based metric for measuring dissimilarity in PPI networks, has natural extensions that incorporate confidence and directions, and can even express coherent pathways by calculating DSD on an augmented graph. RESULTS: We define three incremental versions of DSD which we term cDSD, caDSD and capDSD, where the capDSD matrix incorporates confidence, known directed edges, and pathways into the measure of how similar each pair of nodes is according to the structure of the PPI network. We test four popular function prediction methods (majority vote, weighted majority vote, multi-way cut and functional flow) using these different matrices on the Baker's yeast PPI network in cross-validation. The best performing method is weighted majority vote using capDSD. We then test the performance of our augmented DSD methods on an integrated heterogeneous set of protein association edges from the STRING database. The superior performance of capDSD in this context confirms that treating the pathways as probabilistic units is more powerful than simply incorporating pathway edges independently into the network. AVAILABILITY: All source code for calculating the confidences, for extracting pathway information from KEGG XML files, and for calculating the cDSD, caDSD and capDSD matrices is available from http://dsd.cs.tufts.edu/capdsd
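For orientation, here is a minimal sketch of the basic, unweighted DSD that these extensions build on. It assumes the usual definition: for each start node, count the expected number of visits to every node during a k-step random walk (the start counts as a visit), then take the L1 distance between the two visit vectors. The confidence, direction, and pathway extensions (cDSD, caDSD, capDSD) are not implemented; the function names, the choice k = 5, and the toy graph are assumptions for illustration.

```python
def expected_visits(adjacency, start, k):
    """Expected number of visits to each node in a k-step random walk from start.

    adjacency: dict mapping each node to the set of its neighbours
    (undirected, assumed connected). The starting position counts as a visit.
    """
    nodes = sorted(adjacency)
    dist = {n: 0.0 for n in nodes}
    dist[start] = 1.0
    visits = dict(dist)
    for _ in range(k):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            if dist[n] == 0.0:
                continue
            share = dist[n] / len(adjacency[n])  # uniform step to a neighbour
            for m in adjacency[n]:
                nxt[m] += share
        dist = nxt
        for n in nodes:
            visits[n] += dist[n]
    return visits


def dsd(adjacency, u, v, k=5):
    """L1 distance between the expected-visit vectors of u and v."""
    he_u = expected_visits(adjacency, u, k)
    he_v = expected_visits(adjacency, v, k)
    return sum(abs(he_u[n] - he_v[n]) for n in sorted(adjacency))


# Toy path graph A - B - C - D.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
near = dsd(toy_graph, "A", "B")
far = dsd(toy_graph, "A", "D")
```

On this toy graph the adjacent pair comes out closer than the pair at opposite ends of the path. A weighted variant would replace the uniform step with edge-confidence-proportional steps, which is roughly the direction the cDSD extension is described as taking.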


Subject(s)
Protein Interaction Mapping/methods, Algorithms, Saccharomyces cerevisiae Proteins/metabolism
10.
Bioinformatics ; 29(13): i283-90, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812995

ABSTRACT

MOTIVATION: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will not only speed up homology search directly but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools. RESULTS: We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate with all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP's runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants, which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks, such as protein structure prediction and orthology mapping, which rely heavily on homology search. AVAILABILITY: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/ CONTACT: bab@mit.edu.
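The scaling claim can be illustrated with a deliberately simplified toy: store each distinct sequence once, search only the distinct sequences, then expand each hit to every record that shares it. This is not CaBLASTP. It collapses exact duplicates only, whereas the abstract describes a local similarity-based compression scheme, and a substring test stands in for alignment. All names and sequences below are hypothetical.

```python
def build_compressed_db(records):
    """Map each distinct sequence to the list of record ids that carry it."""
    unique = {}
    for seq_id, seq in records:
        unique.setdefault(seq, []).append(seq_id)
    return unique


def search(unique, query):
    """Scan only the distinct sequences, then expand hits to all their record ids.

    A substring match stands in for a real alignment step.
    """
    hits = []
    for seq, ids in unique.items():
        if query in seq:
            hits.extend(ids)
    return sorted(hits)


# Made-up records: p1 and p2 carry the same sequence.
records = [("p1", "MKTAYIAK"), ("p2", "MKTAYIAK"), ("p3", "GGSLLV")]
db = build_compressed_db(records)
matches = search(db, "TAYI")
```

The search loop runs once per distinct sequence rather than once per record, so in this toy the work grows with the amount of unique data, which mirrors the scaling behaviour described in the abstract.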


Subject(s)
Algorithms, Data Compression/methods, Protein Databases, Sequence Alignment/methods, Amino Acid Sequence Homology, Genomics/methods