1.
Bioinformatics ; 34(13): i295-i303, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949957

ABSTRACT

Motivation: The effective representation of proteins is a crucial task that directly affects performance on many bioinformatics problems. Related proteins usually bind to similar ligands. Chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins, suggesting that a ligand-based approach can be utilized in protein representation. In this study, we propose SMILESVec, a Simplified Molecular Input Line Entry System (SMILES)-based method to represent ligands, and a novel method to compute the similarity of proteins by describing them based on their ligands. The proteins are defined utilizing the word embeddings of the SMILES strings of their ligands. The performance of the proposed protein description method is evaluated in a protein clustering task using the TransClust and MCL algorithms. It is compared with two other protein representation methods that utilize protein sequence, the Basic Local Alignment Search Tool (BLAST) and ProtVec, and with two compound fingerprint-based protein representation methods. Results: We showed that ligand-based protein representation, which uses only the SMILES strings of the ligands that proteins bind to, performs as well as protein sequence-based representation methods in protein clustering. The results suggest that ligand-based protein description can be an alternative to the traditional sequence- or structure-based representation of proteins and that this novel approach can be applied to different bioinformatics problems such as prediction of new protein-ligand interactions and protein function annotation. Availability and implementation: https://github.com/hkmztrk/SMILESVecProteinRepresentation. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology , Ligands , Proteins , Amino Acid Sequence , Cluster Analysis , Computational Biology/methods , Molecular Models , Protein Binding , Proteins/chemistry , Protein Sequence Analysis
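
The ligand-based protein description in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 8-character overlapping "chemical words", the averaging of vectors, and the hash-based `toy_embedding` (a stand-in for learned word embeddings) are all assumptions made here for the example.

```python
import hashlib
from typing import List

DIM = 8  # toy embedding size (assumption, chosen only for the example)


def chemical_words(smiles: str, k: int = 8) -> List[str]:
    """Split a SMILES string into overlapping k-character substrings."""
    if len(smiles) <= k:
        return [smiles]
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def toy_embedding(word: str) -> List[float]:
    """Hypothetical stand-in for a learned embedding: hash bytes scaled to [0, 1]."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]


def ligand_vector(smiles: str) -> List[float]:
    """Represent a ligand as the mean of its chemical-word vectors (assumption)."""
    return mean_vector([toy_embedding(w) for w in chemical_words(smiles)])


def protein_vector(ligand_smiles: List[str]) -> List[float]:
    """Represent a protein as the mean of the vectors of the ligands it binds (assumption)."""
    return mean_vector([ligand_vector(s) for s in ligand_smiles])
```

With real embeddings in place of `toy_embedding`, two proteins could then be compared by, for example, the cosine similarity of their vectors before clustering.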
2.
Bioinformatics ; 34(17): i821-i829, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423097

ABSTRACT

Motivation: The identification of novel drug-target (DT) interactions is a substantial part of the drug discovery process. Most of the computational methods that have been proposed to predict DT interactions have focused on binary classification, where the goal is to determine whether a DT pair interacts or not. However, protein-ligand interactions assume a continuum of binding strength values, also called binding affinity, and predicting this value remains a challenge. The increase in the affinity data available in DT knowledge bases allows the use of advanced learning techniques such as deep learning architectures in the prediction of binding affinities. In this study, we propose a deep learning-based model that uses only sequence information of both targets and drugs to predict DT interaction binding affinities. The few studies that focus on DT binding affinity prediction use either 3D structures of protein-ligand complexes or 2D features of compounds. One novel approach used in this work is the modeling of protein sequences and compound 1D representations with convolutional neural networks (CNNs). Results: The results show that the proposed deep learning-based model that uses the 1D representations of targets and drugs is an effective approach for drug-target binding affinity prediction. The model in which high-level representations of a drug and a target are constructed via CNNs achieved the best Concordance Index (CI) performance in one of our larger benchmark datasets, outperforming the KronRLS algorithm and SimBoost, a state-of-the-art method for DT binding affinity prediction. Availability and implementation: https://github.com/hkmztrk/DeepDTA. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Amino Acid Sequence , Ligands , Machine Learning , Neural Networks (Computer) , Software
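
The abstract above reports results in terms of the Concordance Index (CI) without defining it. The sketch below uses the commonly cited definition: the fraction of pairs with different true affinities whose predictions are ranked in the same order, with tied predictions counted as half. The function name and the toy values are assumptions for illustration and are not taken from the DeepDTA code.

```python
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the true order."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when the true affinities differ.
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable
```

A CI of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is reversed.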
3.
Bioinformatics ; 33(14): i49-i58, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881973

ABSTRACT

MOTIVATION: The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks, including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text. METHODS: We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression-based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature is manually annotated by five human experts and used for evaluating the proposed methods. RESULTS: The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold standard human annotations) and improved over the state-of-the-art domain-independent systems by up to 42.6% in terms of the Pearson correlation metric. AVAILABILITY AND IMPLEMENTATION: A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at http://tabilab.cmpe.boun.edu.tr/BIOSSES/. CONTACT: gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr.


Subject(s)
Natural Language Processing , Supervised Machine Learning , Biological Ontologies , Humans , Semantics
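
To make the evaluation setup in the abstract above concrete, the sketch below pairs one simple string similarity measure with the Pearson correlation used to compare system scores against human annotations. The abstract does not say which string measures were used, so the token-level Jaccard similarity here is an assumption chosen for illustration.

```python
import math
from typing import Sequence


def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    """Token-set overlap between two sentences (illustrative string measure)."""
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between system scores and reference scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

In an evaluation of this kind, `pearson` would be called with the similarity scores produced for each sentence pair and the corresponding averaged human ratings.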
4.
BMC Bioinformatics ; 17: 128, 2016 Mar 18.
Article in English | MEDLINE | ID: mdl-26987649

ABSTRACT

BACKGROUND: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, using SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, has not been investigated for this task before. RESULTS: In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval, we propose cosine similarity-based SMILES kernels that make use of the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting approaches. We also investigate generating composite kernels by combining our best SMILES-based similarity functions with the SIMCOMP kernel. With this study, we provide a comparison of 13 different ligand similarity functions, each of which utilizes the SMILES string representation of molecules. Additionally, TF and TF-IDF based cosine similarity kernels are proposed. CONCLUSION: The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF based cosine similarity and SIMCOMP achieved the best AUC-PR scores for all data sets.


Subject(s)
Enzymes/metabolism , Ion Channels/metabolism , Molecular Models , Pharmaceutical Preparations/metabolism , Cytoplasmic and Nuclear Receptors/metabolism , G-Protein-Coupled Receptors/metabolism , Ligands , Molecular Structure , Protein Binding
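
A TF-IDF weighted cosine similarity over SMILES strings, as described in the abstract above, might look like the following minimal sketch. The abstract does not specify how SMILES strings are split into terms or which IDF variant is used; the overlapping 4-character substrings and the smoothed `log(N / df) + 1` weighting below are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List


def smiles_terms(smiles: str, q: int = 4) -> List[str]:
    """Split a SMILES string into overlapping q-character terms (assumption)."""
    if len(smiles) < q:
        return [smiles]
    return [smiles[i:i + q] for i in range(len(smiles) - q + 1)]


def tfidf_vectors(corpus: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per SMILES string in the corpus."""
    docs = [Counter(smiles_terms(s)) for s in corpus]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Computing `cosine` for every pair of compounds yields a similarity matrix that could serve as a kernel; dropping the IDF factor gives the plain TF variant.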
5.
Mol Inform ; 40(5): e2000212, 2021 05.
Article in English | MEDLINE | ID: mdl-33225594

ABSTRACT

Identification of high-affinity drug-target interactions is a major research question in drug discovery. Proteins are generally represented by their structures or sequences. However, structures are available only for a small subset of biomolecules, and sequence similarity is not always correlated with functional similarity. We propose ChemBoost, a chemical language-based approach for affinity prediction using SMILES syntax. We hypothesize that SMILES is a codified language and ligands are documents composed of chemical words. These documents can be used to learn chemical word vectors that represent words in similar contexts with similar vectors. In ChemBoost, the ligands are represented via chemical word embeddings, while the proteins are represented through sequence-based features and/or chemical words of their ligands. Our aim is to process the patterns in SMILES as a language to predict protein-ligand affinity, even when we cannot infer the function from the sequence. We used eXtreme Gradient Boosting to predict protein-ligand affinities in the KIBA and BindingDB data sets. ChemBoost was able to predict drug-target binding affinity as well as or better than state-of-the-art machine learning systems. When powered with ligand-centric representations, ChemBoost was more robust to changes in protein sequence similarity and successfully captured the interactions between a protein and a ligand, even if the protein had low sequence similarity to the known targets of the ligand.


Subject(s)
Drug Discovery/methods , Machine Learning , Protein Binding , Computational Biology/methods , Computational Chemistry/methods , Ligands
6.
Drug Discov Today ; 25(4): 689-705, 2020 04.
Article in English | MEDLINE | ID: mdl-32027969

ABSTRACT

Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge. Advances in natural language processing (NLP) methodologies in the processing of spoken languages accelerated the application of NLP to elucidate hidden knowledge in textual representations of these biochemical entities and then use it to construct models to predict molecular properties or to design novel molecules. This review outlines the impact made by these advances on drug discovery and aims to further the dialogue between medicinal chemists and computer scientists.


Subject(s)
Drug Design , Drug Discovery/methods , Natural Language Processing , Pharmaceutical Chemistry/methods , Computer Simulation , Humans
7.
PLoS One ; 12(10): e0185558, 2017.
Article in English | MEDLINE | ID: mdl-28981542

ABSTRACT

Trastuzumab is a monoclonal antibody frequently used to prevent the progression of HER2+ breast cancers, which constitute approximately 20% of invasive breast cancers. MicroRNAs (miRNAs) are small, non-coding RNA molecules that are known to be involved in gene regulation. With their emerging roles in cancer, they have recently been promoted as potential candidates to mediate therapeutic actions by targeting genes associated with drug response. In this study, we explored miRNA-mediated regulation of trastuzumab mechanisms by identifying the important miRNAs responsible for the drug response via homogeneous network analysis. Our network model enabled us to simplify the complexity of miRNA interactions by connecting them through their common pathways. We outlined the functionally relevant miRNAs by constructing pathway-based miRNA-miRNA networks in SKBR3 and BT474 cells. Identification of the most targeted genes revealed that trastuzumab-responsive miRNAs favourably regulate the repression of targets with longer 3'UTRs than average, considered to be key elements, while the miRNA-miRNA networks highlighted central miRNAs such as hsa-miR-3976 and hsa-miR-3671 that showed strong interactions with the remaining members of the network. Furthermore, the clusters of the miRNA-miRNA networks showed that trastuzumab response was mostly established through cancer-related and metabolic pathways. hsa-miR-216b was found to be part of the most powerful interactions of metabolic pathways, which were defined in the largest clusters in both cell lines. The network-based representation of miRNA-miRNA interactions through their shared pathways provided a better understanding of miRNA-mediated drug response and could be suggested for further characterization of miRNA functions.


Subject(s)
Antineoplastic Agents/therapeutic use , Breast Neoplasms/genetics , Gene Regulatory Networks , MicroRNAs/genetics , Trastuzumab/therapeutic use , Breast Neoplasms/drug therapy , Tumor Cell Line , Female , Humans
8.
PLoS One ; 10(2): e0117874, 2015.
Article in English | MEDLINE | ID: mdl-25689853

ABSTRACT

β-lactamase-mediated antibiotic resistance is an important health issue, and the discovery of new β-lactam type antibiotics or β-lactamase inhibitors is an area of intense research. Today, there are about a thousand β-lactamases due to the evolutionary pressure exerted by these ligands. While β-lactamases hydrolyse the β-lactam ring of antibiotics, rendering them ineffective, Penicillin-Binding Proteins (PBPs), which share high structural similarity with β-lactamases, also confer antibiotic resistance to their host organism by acquiring mutations that allow them to continue their participation in cell wall biosynthesis. In this paper, we propose a novel approach to include ligand sharing information for classifying and clustering β-lactamases and PBPs in an effort to elucidate the ligand-induced evolution of these β-lactam binding proteins. We first present a detailed summary of the β-lactamase and PBP families in the Protein Data Bank, as well as the compounds they bind to. Then, we build two different types of networks in which the proteins are represented as nodes, and two proteins are connected by an edge with a weight that depends on the number of shared identical or similar ligands. These models are analyzed under three different edge weight settings, namely unweighted, weighted, and normalized weighted. A detailed comparison of these six networks showed that the use of ligand sharing information to cluster proteins resulted in modules comprising proteins with not only sequence similarity but also functional similarity. Consideration of ligand similarity highlighted some interactions that were not detected in the identical ligand network. Analysing the β-lactamases and PBPs using ligand-centric network models enabled the identification of novel relationships, suggesting that these models can be used to examine other protein families to obtain information on their ligand-induced evolutionary paths.


Subject(s)
Computational Biology/methods , Penicillin-Binding Proteins/metabolism , beta-Lactamases/metabolism , Cluster Analysis , Protein Databases , Ligands , Statistical Models
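
The identical-ligand network described in the abstract above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: the abstract names the three edge weight settings but not their formulas, so the normalized weight is taken here to be the number of shared ligands divided by the size of the union of the two ligand sets. The protein and ligand identifiers in the usage example are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def ligand_sharing_edges(
    protein_ligands: Dict[str, Set[str]], mode: str = "weighted"
) -> Dict[Tuple[str, str], float]:
    """Connect every pair of proteins that share at least one identical ligand."""
    edges: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(protein_ligands), 2):
        shared = protein_ligands[a] & protein_ligands[b]
        if not shared:
            continue  # no edge without a shared ligand
        if mode == "unweighted":
            weight = 1.0
        elif mode == "weighted":
            weight = float(len(shared))
        elif mode == "normalized":
            # Assumption: normalize by the union of both ligand sets.
            weight = len(shared) / len(protein_ligands[a] | protein_ligands[b])
        else:
            raise ValueError(f"unknown mode: {mode}")
        edges[(a, b)] = weight
    return edges


# Hypothetical toy data: protein name -> set of ligand identifiers.
example = {
    "protein_A": {"lig1", "lig2", "lig3"},
    "protein_B": {"lig2", "lig3"},
    "protein_C": {"lig4"},
}
print(ligand_sharing_edges(example, mode="normalized"))
```

The similar-ligand network mentioned in the abstract would additionally require a chemical similarity measure and a threshold, which this sketch does not cover.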