Search | BVS IEC

Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants.

Rifaioglu, Ahmet Sureyya; Dogan, Tunca; Saraç, Ömer Sinan; Ersahin, Tulin; Saidi, Rabie; Atalay, Mehmet Volkan; Martin, Maria Jesus; Cetin-Atalay, Rengul.

Proteins ; 86(2): 135-151, 2018 Feb.

Article in English | MEDLINE | ID: mdl-29098713

ABSTRACT

Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present UniGOPred, an automated prediction system for GO annotations, together with a database of GO term predictions for the proteomes of several organisms in the UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functionality spectrum of the Gene Ontology system and can predict both generic and specific GO terms. UniGOPred was run on the CAFA2 challenge target protein sequences and ranked within the top 10 best-performing methods for the molecular function category. In addition, UniGOPred outperforms the baseline BLAST classifier in all categories of GO. UniGOPred predictions are also compared with UniProtKB/TrEMBL database annotations. Furthermore, the tool's ability to predict negatively associated GO terms, which define the functions that a protein does not possess, is discussed. UniGOPred annotations were also validated by case studies, experimentally on PTEN protein variants and against the literature on CHD8 protein variants. The UniGOPred protein functional annotation system is available as an open access tool at http://cansyl.metu.edu.tr/UniGOPred.html.

Subjects

Machine Learning , PTEN Phosphohydrolase/metabolism , Proteomics/methods , Animals , Databases, Protein , Gene Ontology , Humans , Models, Biological , PTEN Phosphohydrolase/chemistry , PTEN Phosphohydrolase/genetics , Sequence Analysis, Protein , Transcriptome

Proteomic profiling of HBV infected liver biopsies with different fibrotic stages.

Katrinli, Seyma; Ozdil, Kamil; Sahin, Abdurrahman; Ozturk, Oguzhan; Kir, Gozde; Baykal, Ahmet Tarik; Akgun, Emel; Sarac, Omer Sinan; Sokmen, Mehmet; Doganay, H Levent; Dinler Doganay, Gizem.

Proteome Sci ; 15: 7, 2016.

Article in English | MEDLINE | ID: mdl-28439208

ABSTRACT

BACKGROUND: Hepatitis B virus (HBV) is a global health problem, and infected patients, if left untreated, may develop cirrhosis and eventually hepatocellular carcinoma. This study aims to elucidate pathways associated with HBV-related liver fibrosis for the delineation of potential new therapeutic targets and biomarkers. METHODS: Tissue samples from 47 HBV-infected patients with different fibrotic stages (F1 to F6) were enrolled for 2D-DIGE proteomic screening. Differentially expressed proteins were identified by mass spectrometry and verified by western blotting. Functional proteomic associations were analyzed with the EnrichNet application. RESULTS: Fibrotic stage variations were observed for apolipoprotein A1 (APOA1), pyruvate kinase PKM (KPYM), glyceraldehyde 3-phosphate dehydrogenase (GAPDH), glutamate dehydrogenase (DHE3), aldehyde dehydrogenase (ALDH2), alcohol dehydrogenase (ALDH1A1), transferrin (TRFE), peroxiredoxin 3 (PRDX3), phenazine biosynthesis-like domain-containing protein (PBLD), immunoglobulin kappa chain C region (IGKC), annexin A4 (ANXA4), and keratin 5 (KRT5). Enrichment analysis with the Reactome and KEGG databases highlighted the possible involvement of platelet release, glycolysis, and HDL-mediated lipid transport pathways. Moreover, STRING analysis revealed that HIF-1α (hypoxia-inducible factor 1-alpha), one of the interacting partners of HBx (hepatitis B X protein), may play a role in the altered glycolytic response and oxidative stress observed in liver fibrosis. CONCLUSIONS: To our knowledge, this is the first proteomic study of HBV-infected fibrotic human liver tissues to investigate alterations in protein levels and affected pathways among different fibrotic stages. The observed changes in the glycolytic pathway, caused by the presence of HBx and therefore by its interactions with HIF-1α, may represent a target pathway for novel therapeutic purposes.

Extensive mass spectrometry-based analysis of the fission yeast proteome: the Schizosaccharomyces pombe PeptideAtlas.

Gunaratne, Jayantha; Schmidt, Alexander; Quandt, Andreas; Neo, Suat Peng; Saraç, Omer Sinan; Gracia, Tannia; Loguercio, Salvatore; Ahrné, Erik; Xia, Rachel Li Hai; Tan, Keng Hwa; Lössner, Christopher; Bähler, Jürg; Beyer, Andreas; Blackstock, Walter; Aebersold, Ruedi.

Mol Cell Proteomics ; 12(6): 1741-51, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23462206

ABSTRACT

We report a high-quality and system-wide proteome catalogue covering 71% (3,542 proteins) of the predicted genes of fission yeast, Schizosaccharomyces pombe, presenting the largest protein dataset to date for this important model organism. We obtained this high proteome and peptide (11.4 peptides/protein) coverage by a combination of extensive sample fractionation, high-resolution Orbitrap mass spectrometry, and combined database searching using the iProphet software as part of the Trans-Proteomic Pipeline. All raw and processed data are made accessible in the S. pombe PeptideAtlas. The identified proteins showed no biases in functional properties and allowed global estimation of protein abundances. The high coverage of the PeptideAtlas allowed correlation with transcriptomic data in a system-wide manner, indicating that post-transcriptional processes control the levels of at least half of all identified proteins. Interestingly, the correlation was not equally tight for all functional categories, ranging from r(s) >0.80 for proteins involved in translation to r(s) <0.45 for signal transduction proteins. Moreover, many proteins involved in DNA damage repair could not be detected in the PeptideAtlas despite their high mRNA levels, strengthening the translation-on-demand hypothesis for members of this protein class. In summary, the extensive and publicly available S. pombe PeptideAtlas together with the generated proteotypic peptide spectral library will be a useful resource for future targeted, in-depth, and quantitative proteomic studies on this microorganism.

Subjects

Gene Expression Regulation, Fungal , Peptides/isolation & purification , Protein Processing, Post-Translational , Proteome/metabolism , RNA, Messenger/metabolism , Schizosaccharomyces pombe Proteins/metabolism , Schizosaccharomyces/metabolism , Databases, Protein , Mass Spectrometry , Multigene Family , Peptide Mapping , Proteome/chemistry , Proteome/genetics , RNA, Messenger/genetics , Schizosaccharomyces/chemistry , Schizosaccharomyces/genetics , Schizosaccharomyces pombe Proteins/chemistry , Schizosaccharomyces pombe Proteins/genetics , Signal Transduction

Topology of functional networks predicts physical binding of proteins.

Saraç, Omer Sinan; Pancaldi, Vera; Bähler, Jürg; Beyer, Andreas.

Bioinformatics ; 28(16): 2137-45, 2012 Aug 15.

Article in English | MEDLINE | ID: mdl-22718785

ABSTRACT

MOTIVATION: It has been recognized that the topology of molecular networks provides information about the certainty and nature of individual interactions. Thus, network motifs have been used for predicting missing links in biological networks and for removing false positives. However, various measures can be inferred from the structure of a given network, and their predictive power varies depending on the task at hand. RESULTS: Herein, we present a systematic assessment of seven different network features extracted from the topology of functional genetic networks, and we quantify their ability to classify interactions into different types of physical protein associations. Using machine learning, we combine features based on network topology with non-network features and compare their importance for the classification of interactions. We demonstrate the utility of network features based on human and budding yeast networks; we show that network features can distinguish different sub-types of physical protein associations, and we apply the framework to fission yeast, which has a much sparser known physical interactome than the other two species. Our analysis shows that network features are at least as predictive for the tasks we tested as non-network features. However, feature importance varies between species owing to different topological characteristics of the networks. The application to fission yeast shows that small maps of physical interactomes can be extended based on functional networks, which are often more readily available. AVAILABILITY AND IMPLEMENTATION: The R code for computing the network features is available from www.cellularnetworks.org

Subjects

Artificial Intelligence , Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Area Under Curve , Humans , Protein Binding , ROC Curve , Saccharomyces cerevisiae , Schizosaccharomyces , Software

Large-scale de novo prediction of physical protein-protein association.

Elefsinioti, Antigoni; Saraç, Ömer Sinan; Hegele, Anna; Plake, Conrad; Hubner, Nina C; Poser, Ina; Sarov, Mihail; Hyman, Anthony; Mann, Matthias; Schroeder, Michael; Stelzl, Ulrich; Beyer, Andreas.

Mol Cell Proteomics ; 10(11): M111.010629, 2011 Nov.

Article in English | MEDLINE | ID: mdl-21836163

ABSTRACT

Information about the physical association of proteins is extensively used for studying cellular processes and disease mechanisms. However, complete experimental mapping of the human interactome will remain prohibitively difficult in the near future. Here we present a map of predicted human protein interactions that distinguishes functional association from physical binding. Our network classifies more than 5 million protein pairs predicting 94,009 new interactions with high confidence. We experimentally tested a subset of these predictions using yeast two-hybrid analysis and affinity purification followed by quantitative mass spectrometry. Thus we identified 462 new protein-protein interactions and confirmed the predictive power of the network. These independent experiments address potential issues of circular reasoning and are a distinctive feature of this work. Analysis of the physical interactome unravels subnetworks mediating between different functional and physical subunits of the cell. Finally, we demonstrate the utility of the network for the analysis of molecular mechanisms of complex diseases by applying it to genome-wide association studies of neurodegenerative diseases. This analysis provides new evidence implying TOMM40 as a factor involved in Alzheimer's disease. The network provides a high-quality resource for the analysis of genomic data sets and genetic association studies in particular. Our interactome is available via the hPRINT web server at: www.print-db.org.

Subjects

Computer Simulation , Models, Molecular , Protein Interaction Mapping/methods , Algorithms , Animals , Bayes Theorem , HeLa Cells , Humans , Mice , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/metabolism , Protein Interaction Domains and Motifs , Protein Interaction Maps , Proteome/genetics , Proteome/metabolism , ROC Curve , Recombinant Proteins/metabolism , Statistics, Nonparametric

Subsequence-based feature map for protein function classification.

Sarac, Omer Sinan; Gürsoy-Yüzügüllü, Ozge; Cetin-Atalay, Rengul; Atalay, Volkan.

Comput Biol Chem ; 32(2): 122-30, 2008 Apr.

Article in English | MEDLINE | ID: mdl-18243801

ABSTRACT

Automated classification of proteins is indispensable for further in vivo investigation of the excessive number of unknown sequences generated by large-scale molecular biology techniques. This study describes a discriminative system based on feature space mapping, called subsequence profile map (SPMap), for functional classification of protein sequences. SPMap takes into account the information coming from the subsequences of a protein. A group of protein sequences that belong to the same level of classification is decomposed into fixed-length subsequences, which are clustered to obtain a representative feature space mapping. Mapping is defined as the distribution of the subsequences of a protein sequence over these clusters. The resulting feature space representation is used to train discriminative classifiers for functional families. The aim of this approach is to incorporate information coming from important subregions that are conserved over a family of proteins while avoiding the difficult task of explicit motif identification. The performance of the method was assessed through tests on various protein classification tasks. Our results showed that SPMap is capable of high-accuracy classification in most of these tasks. Furthermore, SPMap is fast and scalable enough to handle large datasets.

Subjects

Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/classification , Algorithms , Cluster Analysis , Computer Simulation , Enzymes/chemistry , Enzymes/classification , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/classification , Sensitivity and Specificity
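
The mapping idea described in the SPMap abstract (fixed-length subsequences, clustering, then the distribution of a protein's subsequences over the clusters) can be illustrated with a small sketch. This is not the published implementation: the greedy Hamming-distance clustering, the function names, and the toy sequences are all assumptions chosen to keep the example short and dependency-free.

```python
from collections import Counter


def subsequences(seq, length):
    """All overlapping fixed-length windows of a sequence."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(sequences, length, max_dist):
    """Greedy clustering (an assumed stand-in for the paper's clustering step):
    a window that is farther than max_dist from every centre becomes a new centre."""
    centres = []
    for seq in sequences:
        for window in subsequences(seq, length):
            if not any(hamming(window, c) <= max_dist for c in centres):
                centres.append(window)
    return centres


def feature_vector(seq, centres, length):
    """Fraction of the sequence's windows whose nearest centre is each cluster."""
    windows = subsequences(seq, length)
    if not windows:
        return [0.0] * len(centres)
    counts = Counter(
        min(range(len(centres)), key=lambda k: hamming(w, centres[k]))
        for w in windows
    )
    return [counts[k] / len(windows) for k in range(len(centres))]


# Hypothetical toy "family" of two short sequences.
family = ["MKVLAAGIV", "MKVLSAGIV"]
centres = build_clusters(family, length=3, max_dist=1)
vec = feature_vector("MKVLAAGIV", centres, length=3)
```

The resulting fixed-size vector is what a discriminative classifier would be trained on; in practice the window length and clustering threshold would need tuning per classification task.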

Species translatable blood gene signature as a marker of exposure to smoking: computational approaches of the top ranked teams in the sbv IMPROVER Systems Toxicology challenge.

Saraç, Ömer Sinan; Kumar, Rahul; Dhanda, Sandeep Kumar; Balci, Ali Tugrul; Bilgen, Ismail; Romero, Roberto; Tarca, Adi L.

Comput Toxicol ; 5: 25-30, 2018 Feb.

Article in English | MEDLINE | ID: mdl-29556587

ABSTRACT

Crowdsourcing has been used to address computational challenges in systems biology and assess translation of findings across species. Sub-challenge 2 of the sbv IMPROVER Systems Toxicology Challenge was designed to determine whether a common set of genes can be used to identify exposure to cigarette smoke in both human and mouse. Participating teams used a training set of human and mouse blood gene expression data to derive parsimonious models (up to 40 genes) that classify subjects into exposure groups: smokers, former smokers, and never-smokers. Teams were ranked based on two classification performance metrics evaluated on a blinded test dataset. Prediction of current exposure to cigarette smoke in human and mouse by a common prediction model was achieved by the top ranked team (Team 219) with 89% balanced accuracy (BAC), while past exposure was predicted with only 57% BAC. The prediction model of the top ranked team was a random forest classifier trained on sets of genes that appeared best for each species separately with no overlap between species. By contrast, Team 264, ranked second (tied with Team 250), selected genes that were simultaneously predictive in both species and achieved 80% and 59% BAC when predicting current and past exposure, respectively. These performance values were lower than the 96.5% and 61% BAC estimates for current and past exposure, respectively, obtained by Team 264 (top ranked in sub-challenge 1) when using only human data. Unlike past exposure, current exposure to cigarette smoke can be accurately assessed in both human and mouse with a common prediction model based on blood mRNAs. However, requiring a common gene signature to be predictive in both species resulted in a substantial decrease in balanced accuracy for prediction of current exposure to cigarette smoke (from 96.5% to 80%), suggesting species-specific responses exist.
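
The abstract above reports results as balanced accuracy (BAC). As a reminder of what that metric measures, here is a minimal sketch using the common definition (the mean of per-class recall); the exact scoring procedure used in the challenge is not given in the abstract, and the labels below are invented for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: S = smoker, NS = never-smoker.
y_true = ["S", "S", "S", "S", "NS", "NS"]
y_pred = ["S", "S", "S", "NS", "NS", "NS"]
bac = balanced_accuracy(y_true, y_pred)  # recall(S) = 0.75, recall(NS) = 1.0
```

Here plain accuracy would be 5/6, whereas the balanced accuracy is 0.875, because the smaller class is weighted equally with the larger one.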

The sbv IMPROVER Systems Toxicology Computational Challenge: Identification of Human and Species-Independent Blood Response Markers as Predictors of Smoking Exposure and Cessation Status.

Belcastro, Vincenzo; Poussin, Carine; Xiang, Yang; Giordano, Maurizio; Tripathi, Kumar Parijat; Boda, Akash; Boué, Stéphanie; Guarracino, Mario; Martin, Florian; Peitsch, Manuel C; Hoeng, Julia; Romero, Roberto; Tarca, Adi L; Duan, Zhongqu; Yang, Hao; Gong, Xiaofeng; Wang, Peixuan; Zhang, Chenfang; Yang, Wenxin; Sarac, Omer Sinan; Bilgen, Ismail; Balci, Ali Tugrul; Kumar, Rahul; Dhanda, Sandeep Kumar.

Comput Toxicol ; 5: 38-51, 2018 Feb.

Article in English | MEDLINE | ID: mdl-30221212

ABSTRACT

Cigarette smoking entails chronic exposure to a mixture of harmful chemicals that trigger molecular changes over time, and is known to increase the risk of developing diseases. Risk assessment in the context of 21st century toxicology relies on the elucidation of mechanisms of toxicity and the identification of exposure response markers, usually from high-throughput data, using advanced computational methodologies. The sbv IMPROVER Systems Toxicology computational challenge (Fall 2015-Spring 2016) aimed to evaluate whether robust and sparse (≤40 genes) human (sub-challenge 1, SC1) and species-independent (sub-challenge 2, SC2) exposure response markers (so called gene signatures) could be extracted from human and mouse blood transcriptomics data of current (S), former (FS) and never (NS) smoke-exposed subjects as predictors of smoking and cessation status. Best-performing computational methods were identified by scoring anonymized participants' predictions. Worldwide participation resulted in 12 (SC1) and six (SC2) final submissions qualified for scoring. The results showed that blood gene expression data were informative to predict smoking exposure (i.e. discriminating smoker versus never or former smokers) status in human and across species with a high level of accuracy. By contrast, the prediction of cessation status (i.e. distinguishing FS from NS) remained challenging, as reflected by lower classification performances. Participants successfully developed inductive predictive models and extracted human and species-independent gene signatures, including genes with high consensus across teams. Post-challenge analyses highlighted "feature selection" as a key step in the process of building a classifier and confirmed the importance of testing a gene signature in independent cohorts to ensure the generalized applicability of a predictive model at a population-based level. 
In conclusion, the Systems Toxicology challenge demonstrated the feasibility of extracting a consistent blood-based smoke exposure response gene signature and further stressed the importance of independent and unbiased data and method evaluations to provide confidence in systems toxicology-based scientific conclusions.

GOPred: GO molecular function prediction by combined classifiers.

Saraç, Omer Sinan; Atalay, Volkan; Cetin-Atalay, Rengul.

PLoS One ; 5(8): e12382, 2010 Aug 31.

Article in English | MEDLINE | ID: mdl-20824206

ABSTRACT

Functional protein annotation is an important matter for in vivo and in silico biology. Several computational methods have been proposed that make use of a wide range of features such as motifs, domains, homology, structure, and physicochemical properties. There is no single method that performs best in all functional classification problems because information obtained using any of these features depends on the function to be assigned to the protein. In this study, we present a novel approach that combines different methods to better represent protein function. First, we formulated the function annotation problem as a classification problem defined on 300 different Gene Ontology (GO) terms from the molecular function aspect. We presented a method to form positive and negative training examples while taking into account the directed acyclic graph (DAG) structure and evidence codes of GO. We applied three different methods and their combinations. Results show that combining different methods improves prediction accuracy in most cases. The proposed method, GOPred, is available as an online computational annotation tool (http://kinaz.fen.bilkent.edu.tr/gopred).

Subjects

Computational Biology/methods , Proteins/classification , Proteins/metabolism , Humans , Internet
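
The GOPred abstract mentions forming positive and negative training examples using the DAG structure and evidence codes of GO, but does not spell out the rule. The sketch below shows one plausible rule purely as an illustration, not as the paper's method: proteins annotated (with a trusted evidence code) to the target term or any descendant are positives, proteins annotated only to unrelated terms are negatives, and proteins annotated only to ancestors are left out as ambiguous. The term names, protein IDs, and the choice of trusted codes are hypothetical.

```python
def reachable(term, edges):
    """All terms reachable from `term` by following `edges` (term included)."""
    seen, stack = {term}, [term]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def invert(children):
    """Turn a parent -> children map into a child -> parents map."""
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def training_sets(term, children, annotations, trusted_codes):
    """Assumed rule: positives carry the term or a descendant; negatives carry
    only terms that are neither ancestors nor descendants of the target."""
    desc = reachable(term, children)
    related = desc | reachable(term, invert(children))
    positives, negatives = set(), set()
    for protein, annots in annotations.items():
        terms = {t for t, code in annots if code in trusted_codes}
        if not terms:
            continue  # no trusted annotation: skip this protein
        if terms & desc:
            positives.add(protein)
        elif not terms & related:
            negatives.add(protein)
    return positives, negatives


# Hypothetical toy ontology and annotations (term, evidence code).
children = {"binding": ["protein binding", "DNA binding"], "catalytic": ["kinase"]}
annotations = {
    "P1": [("kinase", "EXP")],
    "P2": [("DNA binding", "IDA")],
    "P3": [("binding", "IDA")],          # ancestor only: ambiguous, excluded
    "P4": [("protein binding", "IEA")],  # electronic annotation: not trusted here
}
positives, negatives = training_sets("DNA binding", children, annotations, {"EXP", "IDA"})
```

Excluding ancestor-only proteins avoids labelling as negative a protein that may simply lack a more specific annotation.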
