Search | BVS Ecuador Search Portal

1.

Transfer learning for drug-target interaction prediction.

Dalkiran, Alperen; Atakan, Ahmet; Rifaioglu, Ahmet S; Martin, Maria J; Atalay, Rengül Çetin; Acar, Aybar C; Dogan, Tunca; Atalay, Volkan.

Bioinformatics ; 39(39 Suppl 1): i103-i110, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387156

ABSTRACT

MOTIVATION: Utilizing AI-driven approaches for drug-target interaction (DTI) prediction requires large volumes of training data, which are not available for the majority of target proteins. In this study, we investigate the use of deep transfer learning for the prediction of interactions between drug candidate compounds and understudied target proteins with scarce training data. The idea here is to first train a deep neural network classifier with a generalized source training dataset of large size and then to reuse this pre-trained neural network as an initial configuration for re-training/fine-tuning purposes with a small-sized specialized target training dataset. To explore this idea, we selected six protein families that have critical importance in biomedicine: kinases, G-protein-coupled receptors (GPCRs), ion channels, nuclear receptors, proteases, and transporters. In two independent experiments, the protein families of transporters and nuclear receptors were individually set as the target datasets, while the remaining five families were used as the source datasets. Several size-based target family training datasets were formed in a controlled manner to assess the benefit provided by the transfer learning approach. RESULTS: Here, we present a systematic evaluation of our approach by pre-training a feed-forward neural network with source training datasets and applying different modes of transfer learning from the pre-trained source network to a target dataset. The performance of deep transfer learning is evaluated and compared with that of training the same deep neural network from scratch. We found that when the training dataset contains fewer than 100 compounds, transfer learning outperforms the conventional strategy of training the system from scratch, suggesting that transfer learning is advantageous for predicting binders to under-studied targets.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets are available at https://github.com/cansyl/TransferLearning4DTI. Our web-based service containing the ready-to-use pre-trained models is accessible at https://tl4dti.kansil.org.

Subject(s)

Neural Networks, Computer , Peptide Hydrolases , Software , Machine Learning
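
Illustrative sketch for entry 1 (not taken from the paper or its repository): the pre-train-then-fine-tune idea described in the abstract can be shown on synthetic data. To stay dependency-free, logistic regression stands in for the paper's feed-forward neural network. The data generator, the function names (`train`, `accuracy`, `make_data`), the sample sizes, and the learning settings are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # The last entry of w is the bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train(X, y, weights=None, epochs=200, lr=0.1):
    """Batch gradient descent; passing `weights` gives a warm start."""
    n_features = len(X[0])
    w = list(weights) if weights is not None else [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    hits = sum((predict(w, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(X)


def make_data(n, rng, shift=0.0):
    # Hypothetical synthetic task: label depends on x0 + x1 + shift.
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append(x)
        y.append(1 if x[0] + x[1] + shift > 0 else 0)
    return X, y


rng = random.Random(0)
X_src, y_src = make_data(500, rng)             # large "source" dataset
X_tgt, y_tgt = make_data(10, rng, shift=0.2)   # small, related "target" dataset
X_test, y_test = make_data(500, rng, shift=0.2)

pretrained = train(X_src, y_src)                                 # pre-training
fine_tuned = train(X_tgt, y_tgt, weights=pretrained, epochs=20)  # fine-tuning
scratch = train(X_tgt, y_tgt, epochs=20)                         # baseline

print("fine-tuned:", accuracy(fine_tuned, X_test, y_test))
print("scratch:   ", accuracy(scratch, X_test, y_test))
```

The only difference between the two target runs is the starting point: `fine_tuned` begins from the source weights, while `scratch` begins from zeros. Whether the warm start helps on real DTI data is the question the paper evaluates; this toy example does not answer it.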

2.

SLPred: a multi-view subcellular localization prediction tool for multi-location human proteins.

Özsari, Gökhan; Rifaioglu, Ahmet Sureyya; Atakan, Ahmet; Dogan, Tunca; Martin, Maria Jesus; Çetin Atalay, Rengül; Atalay, Volkan.

Bioinformatics ; 38(17): 4226-4229, 2022 09 02.

Article in English | MEDLINE | ID: mdl-35801913

ABSTRACT

SUMMARY: Accurate prediction of the subcellular locations (SLs) of proteins is a critical topic in protein science. In this study, we present SLPred, an ensemble-based multi-view and multi-label protein subcellular localization prediction tool. For a query protein sequence, SLPred provides predictions for nine main SLs using independent machine-learning models trained for each location. We used UniProtKB/Swiss-Prot human protein entries and their curated SL annotations as our source data. We connected all disjoint terms in the UniProt SL hierarchy based on the corresponding term relationships in the cellular component category of Gene Ontology and constructed a training dataset that is both reliable and large scale using the re-organized hierarchy. We tested SLPred on multiple benchmarking datasets, including our in-house sets, and compared its performance against six state-of-the-art methods. Results indicated that SLPred outperforms other tools in the majority of cases. AVAILABILITY AND IMPLEMENTATION: SLPred is available both as an open-access and user-friendly web-server (https://slpred.kansil.org) and a stand-alone tool (https://github.com/kansil/SLPred). All datasets used in this study are also available at https://slpred.kansil.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology , Proteins , Humans , Databases, Protein , Gene Ontology , Proteins/genetics , Amino Acid Sequence , Protein Transport , Computational Biology/methods

3.

CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations.

Dogan, Tunca; Atas, Heval; Joshi, Vishal; Atakan, Ahmet; Rifaioglu, Ahmet Sureyya; Nalbat, Esra; Nightingale, Andrew; Saidi, Rabie; Volynkin, Vladimir; Zellner, Hermann; Cetin-Atalay, Rengul; Martin, Maria; Atalay, Volkan.

Nucleic Acids Res ; 49(16): e96, 2021 09 20.

Article in English | MEDLINE | ID: mdl-34181736

ABSTRACT

Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.

Subject(s)

Computational Biology/methods , Software , Databases, Chemical , Databases, Genetic , Deep Learning , Humans

4.

Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases.

Rifaioglu, Ahmet Sureyya; Atas, Heval; Martin, Maria Jesus; Cetin-Atalay, Rengul; Atalay, Volkan; Dogan, Tunca.

Brief Bioinform ; 20(5): 1878-1912, 2019 09 27.

Article in English | MEDLINE | ID: mdl-30084866

ABSTRACT

The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins along with the experimentally verified bio-interaction information to generate predictive models. Lately, sophisticated machine learning techniques have been applied in VS to elevate the predictive performance. The objective of this study is to examine and discuss the recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented amount of research studies considering the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges, and suggest future directions. We believe that this survey will provide insight to the researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.

Subject(s)

Database Management Systems , Deep Learning , Drug Discovery , Computer Simulation

5.

iBioProVis: interactive visualization and analysis of compound bioactivity space.

Donmez, Ataberk; Rifaioglu, Ahmet Sureyya; Acar, Aybar; Dogan, Tunca; Cetin-Atalay, Rengul; Atalay, Volkan.

Bioinformatics ; 36(14): 4227-4230, 2020 08 15.

Article in English | MEDLINE | ID: mdl-32407491

ABSTRACT

SUMMARY: iBioProVis is an interactive tool for visual analysis of the compound bioactivity space in the context of target proteins, drugs and drug candidate compounds. The iBioProVis tool takes target protein identifiers and, optionally, compound SMILES as input, and uses the state-of-the-art non-linear dimensionality reduction method t-Distributed Stochastic Neighbor Embedding (t-SNE) to plot the distribution of compounds embedded in a 2D map, based on the similarity of structural properties of compounds and in the context of compounds' cognate targets. Similar compounds, which are embedded to proximate points on the 2D map, may bind the same or similar target proteins. Thus, iBioProVis can be used to easily observe the structural distribution of one or two target proteins' known ligands on the 2D compound space, and to infer new binders to the same protein, or to infer new potential target(s) for a compound of interest, based on this distribution. Principal component analysis (PCA) projection of the input compounds is also provided. Hence, the user can interactively observe the same compound, or a group of selected compounds, both as projected by PCA and as embedded by t-SNE. iBioProVis also provides detailed information about drugs and drug candidate compounds through cross-references to widely used and well-known databases, in the form of linked table views. Two use-case studies were demonstrated, one being on the angiotensin-converting enzyme 2 (ACE2) protein, which is the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Spike protein receptor. ACE2-binding compounds and seven antiviral drugs were closely embedded, and two of these drugs have been under clinical trial for coronavirus disease 2019 (COVID-19). AVAILABILITY AND IMPLEMENTATION: iBioProVis and its carefully filtered dataset are available at https://ibpv.kansil.org/ for public use. CONTACT: vatalay@metu.edu.tr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Models, Molecular , Peptidyl-Dipeptidase A/chemistry , Software , Spike Glycoprotein, Coronavirus/chemistry , Angiotensin-Converting Enzyme 2 , Angiotensin-Converting Enzyme Inhibitors/chemistry , Antiviral Agents/chemistry , Betacoronavirus , COVID-19 , Coronavirus Infections , Humans , Internet , Pandemics , Pneumonia, Viral , Principal Component Analysis , Receptors, Adrenergic, beta-2/chemistry , Receptors, Adrenergic, beta-3/chemistry , SARS-CoV-2 , User-Computer Interface

6.

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature.

Dalkiran, Alperen; Rifaioglu, Ahmet Sureyya; Martin, Maria Jesus; Cetin-Atalay, Rengul; Atalay, Volkan; Dogan, Tunca.

BMC Bioinformatics ; 19(1): 334, 2018 Sep 21.

Article in English | MEDLINE | ID: mdl-30241466

ABSTRACT

BACKGROUND: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Besides, most of the previous methods incorporated only a single input feature type, which limits the applicability to the wide functional space. Here, we proposed a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers. RESULTS: In ECPred, each EC number constituted an individual class and, therefore, had an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total including 6 main classes, 55 subclass classes, 163 sub-subclass classes and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools by using independent temporal hold-out and no-Pfam datasets constructed during this study. CONCLUSIONS: ECPred is presented both as a stand-alone and a web-based tool to provide probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. Also, the datasets of this study will be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at: https://github.com/cansyl/ECPred . ECPred webserver can be accessed through http://cansyl.metu.edu.tr/ECPred.html .

Subject(s)

Computational Biology/methods , Enzymes/classification , Enzymes/metabolism , Sequence Analysis, Protein/methods , Software , Terminology as Topic , Algorithms , Humans
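
Illustrative sketch for entry 6 (an assumption-laden toy, not ECPred's code): the abstract describes one model per EC number and a hierarchical prediction that exploits the EC tree. The snippet below shows how a four-field EC number unfolds into its ancestor classes, and one possible top-down walk that stops when no child class scores above a threshold. The helper names, the `scores` dictionary, the 0.5 threshold, and the stopping rule are hypothetical. The enzyme vs. non-enzyme step mentioned in the abstract is omitted.

```python
ROOT = "-.-.-.-"


def level(ec):
    # Number of specified fields, e.g. "3.4.-.-" has level 2.
    return sum(field != "-" for field in ec.split("."))


def parent(ec):
    # Replace the last specified field with "-", e.g. "3.4.-.-" -> "3.-.-.-".
    fields = ec.split(".")
    fields[level(ec) - 1] = "-"
    return ".".join(fields)


def ec_ancestors(ec_number):
    """List the EC classes from the main class down to the given number."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    levels = []
    for depth in range(1, 5):
        if parts[depth - 1] == "-":
            break  # partially specified number: stop at the last known level
        levels.append(".".join(parts[:depth] + ["-"] * (4 - depth)))
    return levels


def predict_hierarchically(scores, threshold=0.5):
    """Follow the best-scoring child at each level while it passes the threshold.

    `scores` maps EC class strings to hypothetical per-class model outputs.
    Returns the deepest accepted class, or None if no main class passes.
    """
    current = ROOT
    for depth in range(1, 5):
        children = [ec for ec in scores
                    if level(ec) == depth and parent(ec) == current]
        if not children:
            break
        best = max(children, key=scores.get)
        if scores[best] < threshold:
            break
        current = best
    return None if current == ROOT else current


example_scores = {
    "1.-.-.-": 0.2, "3.-.-.-": 0.9,
    "3.1.-.-": 0.3, "3.4.-.-": 0.8,
    "3.4.21.-": 0.4,
}
print(ec_ancestors("3.4.21.4"))
print(predict_hierarchically(example_scores))
```

With these made-up scores the walk accepts the main class and the subclass but stops before the sub-subclass, so a partial EC number is returned rather than a low-confidence full one.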

7.

iBioProVis: interactive visualization and analysis of compound bioactivity space.

Donmez, Ataberk; Rifaioglu, Ahmet Sureyya; Acar, Aybar; Dogan, Tunca; Cetin-Atalay, Rengul; Atalay, Volkan.

Bioinformatics ; 36(17): 4674, 2020 11 01.

Article in English | MEDLINE | ID: mdl-33094316

8.

DEEPScreen: high performance drug-target interaction prediction with convolutional neural networks using 2-D structural compound representations.

Rifaioglu, Ahmet Sureyya; Nalbat, Esra; Atalay, Volkan; Martin, Maria Jesus; Cetin-Atalay, Rengul; Dogan, Tunca.

Chem Sci ; 11(9): 2531-2557, 2020 Mar 07.

Article in English | MEDLINE | ID: mdl-33209251

ABSTRACT

The identification of physical interactions between drug candidate compounds and target biomolecules is an important process in drug discovery. Since conventional screening procedures are expensive and time consuming, computational approaches are employed to provide aid by automatically predicting novel drug-target interactions (DTIs). In this study, we propose a large-scale DTI prediction system, DEEPScreen, for early stage drug discovery, using deep convolutional neural networks. One of the main advantages of DEEPScreen is employing readily available 2-D structural representations of compounds at the input level instead of conventional descriptors that display limited performance. DEEPScreen learns complex features inherently from the 2-D representations, thus producing highly accurate predictions. The DEEPScreen system was trained for 704 target proteins (using curated bioactivity data) and finalized with rigorous hyper-parameter optimization tests. We compared the performance of DEEPScreen against the state-of-the-art on multiple benchmark datasets to indicate the effectiveness of the proposed approach and verified selected novel predictions through molecular docking analysis and literature-based validation. Finally, JAK proteins that were predicted by DEEPScreen as new targets of a well-known drug cladribine were experimentally demonstrated in vitro on cancer cells through STAT3 phosphorylation, which is the downstream effector protein. The DEEPScreen system can be exploited in the fields of drug discovery and repurposing for in silico screening of the chemogenomic space, to provide novel DTIs which can be experimentally pursued. The source code, trained "ready-to-use" prediction models, all datasets and the results of this study are available at: https://github.com/cansyl/DEEPscreen.

9.

DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks.

Sureyya Rifaioglu, Ahmet; Dogan, Tunca; Jesus Martin, Maria; Cetin-Atalay, Rengul; Atalay, Volkan.

Sci Rep ; 9(1): 7344, 2019 05 14.

Article in English | MEDLINE | ID: mdl-31089211

ABSTRACT

Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning based methods have outperformed conventional algorithms in computer vision and natural language processing due to the prevention of overfitting and efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests, and benchmarked using three types of protein descriptors, training datasets with varying sizes and GO terms from different levels. Furthermore, in order to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using CAFA2 and CAFA3 challenge datasets, in comparison with the state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the 'biofilm formation process' in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of the other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred .

Subject(s)

Neural Networks, Computer , Proteins/metabolism , Bacterial Proteins/metabolism , Biofilms/growth & development , Data Mining , Deep Learning , Gene Ontology , Humans , Models, Biological , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/physiology , Software

10.

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

Zhou, Naihui; Jiang, Yuxiang; Bergquist, Timothy R; Lee, Alexandra J; Kacsoh, Balint Z; Crocker, Alex W; Lewis, Kimberley A; Georghiou, George; Nguyen, Huy N; Hamid, Md Nafiz; Davis, Larry; Dogan, Tunca; Atalay, Volkan; Rifaioglu, Ahmet S; Dalkiran, Alperen; Cetin Atalay, Rengul; Zhang, Chengxin; Hurto, Rebecca L; Freddolino, Peter L; Zhang, Yang; Bhat, Prajwal; Supek, Fran; Fernández, José M; Gemovic, Branislava; Perovic, Vladimir R; Davidovic, Radoslav S; Sumonja, Neven; Veljkovic, Nevena; Asgari, Ehsaneddin; Mofrad, Mohammad R K; Profiti, Giuseppe; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita; Boecker, Florian; Schoof, Heiko; Kahanda, Indika; Thurlby, Natalie; McHardy, Alice C; Renaux, Alexandre; Saidi, Rabie; Gough, Julian; Freitas, Alex A; Antczak, Magdalena; Fabris, Fabio; Wass, Mark N; Hou, Jie; Cheng, Jianlin; Wang, Zheng; Romero, Alfonso E.

Genome Biol ; 20(1): 244, 2019 11 19.

Article in English | MEDLINE | ID: mdl-31744546

ABSTRACT

BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

Subject(s)

Molecular Sequence Annotation/trends , Animals , Biofilms , Candida albicans/genetics , Drosophila melanogaster/genetics , Genome, Bacterial , Genome, Fungal , Humans , Locomotion , Memory, Long-Term , Molecular Sequence Annotation/methods , Pseudomonas aeruginosa/genetics

11.

Subsequence-based feature map for protein function classification.

Sarac, Omer Sinan; Gürsoy-Yüzügüllü, Ozge; Cetin-Atalay, Rengul; Atalay, Volkan.

Comput Biol Chem ; 32(2): 122-30, 2008 Apr.

Article in English | MEDLINE | ID: mdl-18243801

ABSTRACT

Automated classification of proteins is indispensable for further in vivo investigation of the excessive number of unknown sequences generated by large-scale molecular biology techniques. This study describes a discriminative system based on feature space mapping, called subsequence profile map (SPMap), for functional classification of protein sequences. SPMap takes into account the information coming from the subsequences of a protein. A group of protein sequences that belong to the same level of classification is decomposed into fixed-length subsequences, which are clustered to obtain a representative feature space mapping. Mapping is defined as the distribution of the subsequences of a protein sequence over these clusters. The resulting feature space representation is used to train discriminative classifiers for functional families. The aim of this approach is to incorporate information coming from important subregions that are conserved over a family of proteins while avoiding the difficult task of explicit motif identification. The performance of the method was assessed through tests on various protein classification tasks. Our results showed that SPMap is capable of high-accuracy classification in most of these tasks. Furthermore, SPMap is fast and scalable enough to handle large datasets.

Subject(s)

Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/classification , Algorithms , Cluster Analysis , Computer Simulation , Enzymes/chemistry , Enzymes/classification , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/classification , Sensitivity and Specificity
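
Illustrative sketch for entry 11 (a simplified stand-in, not the SPMap implementation): the abstract describes decomposing sequences into fixed-length subsequences, clustering them, and representing a protein by the distribution of its subsequences over the clusters. The snippet below mimics that pipeline with two assumptions that are not from the paper: a greedy "leader" clustering under Hamming distance, and hard assignment of each subsequence to its nearest cluster center. The sequences, the window length `k`, and the distance cutoff are invented for the example.

```python
from collections import Counter


def subsequences(seq, k):
    # All overlapping windows of length k.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    # Number of mismatching positions between two equal-length strings.
    return sum(x != y for x, y in zip(a, b))


def leader_clusters(subs, max_dist):
    """Greedy clustering: a subsequence founds a new cluster only if it is
    farther than max_dist from every existing cluster center."""
    centers = []
    for sub in subs:
        if all(hamming(sub, c) > max_dist for c in centers):
            centers.append(sub)
    return centers


def feature_vector(seq, centers, k):
    """Fraction of the sequence's subsequences that fall in each cluster."""
    counts = Counter()
    for sub in subsequences(seq, k):
        nearest = min(range(len(centers)),
                      key=lambda i: hamming(sub, centers[i]))
        counts[nearest] += 1
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(len(centers))]


k = 4
family = ["MKTAYIAK", "MKTAYLAK"]  # hypothetical training sequences
pool = [sub for seq in family for sub in subsequences(seq, k)]
centers = leader_clusters(pool, max_dist=1)

vector = feature_vector("MKTAYLAK", centers, k)
print(centers)
print(vector)
```

The resulting fixed-length vector is what a discriminative classifier would be trained on. Its length equals the number of clusters, regardless of the length of the input sequence.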

12.

EClerize: A customized force-directed graph drawing algorithm for biological graphs with EC attributes.

Danaci, Hasan Fehmi; Cetin-Atalay, Rengul; Atalay, Volkan.

J Bioinform Comput Biol ; 16(4): 1850007, 2018 08.

Article in English | MEDLINE | ID: mdl-29783871

ABSTRACT

Visualizing large-scale data produced by the high throughput experiments as a biological graph leads to better understanding and analysis. This study describes a customized force-directed layout algorithm, EClerize, for biological graphs that represent pathways in which the nodes are associated with Enzyme Commission (EC) attributes. The nodes with the same EC class numbers are treated as members of the same cluster. Positions of nodes are then determined based on both the biological similarity and the connection structure. EClerize minimizes the intra-cluster distance, that is the distance between the nodes of the same EC cluster and maximizes the inter-cluster distance, that is the distance between two distinct EC clusters. EClerize is tested on a number of biological pathways and the improvement brought in is presented with respect to the original algorithm. EClerize is available as a plug-in to Cytoscape ( http://apps.cytoscape.org/apps/eclerize ).

Subject(s)

Algorithms , Computer Graphics , Enzymes , Data Display , Databases, Protein , Enzymes/classification , Enzymes/metabolism , Signal Transduction , Software

13.

Identification of novel reference genes based on MeSH categories.

Ersahin, Tulin; Carkacioglu, Levent; Can, Tolga; Konu, Ozlen; Atalay, Volkan; Cetin-Atalay, Rengul.

PLoS One ; 9(3): e93341, 2014.

Article in English | MEDLINE | ID: mdl-24682035

ABSTRACT

Transcriptome experiments are performed to assess protein abundance through mRNA expression analysis. Expression levels of genes vary depending on the experimental conditions and the cell response. Transcriptome data must be diverse and yet comparable in reference to stably expressed genes, even if they are generated from different experiments on the same biological context from various laboratories. In this study, expression patterns of 9090 microarray samples grouped into 381 NCBI-GEO datasets were investigated to identify novel candidate reference genes using randomizations and Receiver Operating Characteristic (ROC) curves. The analysis demonstrated that cell type specific reference gene sets display less variability than a united set for all tissues. Therefore, constitutively and stably expressed, origin specific novel reference gene sets were identified based on their coefficient of variation and percentage of occurrence in all GEO datasets, which were classified using Medical Subject Headings (MeSH). A large number of MeSH grouped reference gene lists are presented as novel tissue specific reference gene lists. The most commonly observed 17 genes in these sets were compared for their expression in 8 hepatocellular, 5 breast and 3 colon carcinoma cells by RT-qPCR to verify tissue specificity. Indeed, commonly used housekeeping genes GAPDH, Actin and EEF2 had tissue specific variations, whereas several ribosomal genes were among the most stably expressed genes in vitro. Our results confirm that two or more reference genes should be used in combination for differential expression analysis of large-scale data obtained from microarray or next generation sequencing studies. Therefore context dependent reference gene sets, as presented in this study, are required for normalization of expression data from diverse technological backgrounds.

Subject(s)

Gene Expression/genetics , Cell Line, Tumor , Databases, Genetic , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Medical Subject Headings , Real-Time Polymerase Chain Reaction , Reference Standards
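
Illustrative sketch for entry 13 (a toy example, not the study's pipeline): the abstract selects stably expressed genes by their coefficient of variation (CV), i.e. the standard deviation divided by the mean. The snippet below ranks genes by CV, lowest first. The gene labels and expression values are invented, and the use of the population standard deviation is an assumption. The abstract's second criterion, percentage of occurrence across GEO datasets, is not modeled here.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population standard deviation)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return pstdev(values) / m


# Hypothetical expression levels of three genes across four samples.
expression = {
    "GENE_A": [10.0, 10.5, 9.5, 10.0],
    "GENE_B": [2.0, 8.0, 5.0, 13.0],
    "GENE_C": [7.0, 7.0, 7.0, 7.0],
}

# Lower CV means more stable expression, hence a better reference candidate.
ranked = sorted(expression,
                key=lambda gene: coefficient_of_variation(expression[gene]))
for gene in ranked:
    print(gene, round(coefficient_of_variation(expression[gene]), 3))
```

Because CV is scale-free, it lets genes with very different absolute expression levels be compared on the same footing.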

14.

A signal transduction score flow algorithm for cyclic cellular pathway analysis, which combines transcriptome and ChIP-seq data.

Isik, Zerrin; Ersahin, Tulin; Atalay, Volkan; Aykanat, Cevdet; Cetin-Atalay, Rengul.

Mol Biosyst ; 8(12): 3224-31, 2012 Oct 30.

Article in English | MEDLINE | ID: mdl-23042589

ABSTRACT

Determination of cell signalling behaviour is crucial for understanding the physiological response to a specific stimulus or drug treatment. Current approaches for large-scale data analysis do not effectively incorporate critical topological information provided by the signalling network. We herein describe a novel model- and data-driven hybrid approach, or signal transduction score flow algorithm, which allows quantitative visualization of cyclic cell signalling pathways that lead to ultimate cell responses such as survival, migration or death. This score flow algorithm translates signalling pathways as a directed graph and maps experimental data, including negative and positive feedbacks, onto gene nodes as scores, which then computationally traverse the signalling pathway until a pre-defined biological target response is attained. Initially, experimental data-driven enrichment scores of the genes were computed in a pathway, then a heuristic approach was applied using the gene score partition as a solution for protein node stoichiometry during dynamic scoring of the pathway of interest. Incorporation of a score partition during the signal flow and cyclic feedback loops in the signalling pathway significantly improves the usefulness of this model, as compared to other approaches. Evaluation of the score flow algorithm using both transcriptome and ChIP-seq data-generated signalling pathways showed good correlation with expected cellular behaviour on both KEGG and manually generated pathways. Implementation of the algorithm as a Cytoscape plug-in allows interactive visualization and analysis of KEGG pathways as well as user-generated and curated Cytoscape pathways. Moreover, the algorithm accurately predicts gene-level and global impacts of single or multiple in silico gene knockouts.

Subject(s)

Algorithms , Computational Biology , Gene Expression Profiling , Protein Array Analysis , Signal Transduction , Models, Biological , Transcriptome

15.

GOPred: GO molecular function prediction by combined classifiers.

Saraç, Omer Sinan; Atalay, Volkan; Cetin-Atalay, Rengul.

PLoS One ; 5(8): e12382, 2010 Aug 31.

Article in English | MEDLINE | ID: mdl-20824206

ABSTRACT

Functional protein annotation is an important matter for in vivo and in silico biology. Several computational methods have been proposed that make use of a wide range of features such as motifs, domains, homology, structure and physicochemical properties. There is no single method that performs best in all functional classification problems because information obtained using any of these features depends on the function to be assigned to the protein. In this study, we portray a novel approach that combines different methods to better represent protein function. First, we formulated the function annotation problem as a classification problem defined on 300 different Gene Ontology (GO) terms from molecular function aspect. We presented a method to form positive and negative training examples while taking into account the directed acyclic graph (DAG) structure and evidence codes of GO. We applied three different methods and their combinations. Results show that combining different methods improves prediction accuracy in most cases. The proposed method, GOPred, is available as an online computational annotation tool (http://kinaz.fen.bilkent.edu.tr/gopred).

Subject(s)

Computational Biology/methods , Proteins/classification , Proteins/metabolism , Humans , Internet

16.

Bi-k-bi clustering: mining large scale gene expression data using two-level biclustering.

Carkacioglu, Levent; Atalay, Rengül Cetin; Konu, Ozlen; Atalay, Volkan; Can, Tolga.

Int J Data Min Bioinform ; 4(6): 701-21, 2010.

Article in English | MEDLINE | ID: mdl-21355502

ABSTRACT

Due to the increase in gene expression data sets in recent years, various data mining techniques have been proposed for mining gene expression profiles. However, most of these methods target single gene expression data sets and cannot handle all the available gene expression data in public databases in a reasonable amount of time and space. In this paper, we propose a novel framework, bi-k-bi clustering, for finding association rules of gene pairs that can easily operate on large-scale and multiple heterogeneous data sets. We applied our proposed framework on the available NCBI GEO Homo sapiens data sets. Our results show consistency and relatedness with the available literature and also provide novel associations.

Subject(s)

Data Mining/methods , Databases, Genetic , Gene Expression Profiling , Gene Expression , Cluster Analysis , Humans , Oligonucleotide Array Sequence Analysis

17.

Implicit motif distribution based hybrid computational kernel for sequence classification.

Atalay, Volkan; Cetin-Atalay, Rengul.

Bioinformatics ; 21(8): 1429-36, 2005 Apr 15.

Article in English | MEDLINE | ID: mdl-15598837

ABSTRACT

MOTIVATION: We designed a general computational kernel for classification problems that require specific motif extraction and search from sequences. Instead of searching for explicit motifs, our approach finds the distribution of implicit motifs and uses it as a feature for classification. The implicit motif distribution approach may be used as a modus operandi for bioinformatics problems that require specific motif extraction and search, which is otherwise computationally prohibitive. RESULTS: A system named P2SL that infers protein subcellular targeting was developed through this computational kernel. The targeting signal was modeled by the distribution of subsequence occurrences (implicit motifs) using self-organizing maps. The boundaries among the classes were then determined with a set of support vector machines. The P2SL hybrid computational system achieved a prediction accuracy of approximately 81% over ER targeted, cytosolic, mitochondrial and nuclear protein localization classes. P2SL additionally offers the distribution potential of proteins among localization classes, which is particularly important for proteins that shuttle between the nucleus and cytosol. AVAILABILITY: http://staff.vbi.vt.edu/volkan/p2sl and http://www.i-cancer.fen.bilkent.edu.tr/p2sl CONTACT: rengul@bilkent.edu.tr.

Subject(s)

Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Proteins/chemistry , Proteins/metabolism , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Subcellular Fractions/metabolism , Amino Acid Motifs , Software , Structure-Activity Relationship
