Search | Nursing VHL Search Portal

1.

APPRIS: selecting functionally important isoforms.

Rodriguez, Jose Manuel; Pozo, Fernando; Cerdán-Vélez, Daniel; Di Domenico, Tomás; Vázquez, Jesús; Tress, Michael L.

Nucleic Acids Res ; 50(D1): D54-D59, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34755885

ABSTRACT

APPRIS (https://appris.bioinfo.cnio.es) is a well-established database housing annotations for protein isoforms for a range of species. APPRIS selects principal isoforms based on protein structure and function features and on cross-species conservation. Most coding genes produce a single main protein isoform and the principal isoforms chosen by the APPRIS database best represent this main cellular isoform. Human genetic data, experimental protein evidence and the distribution of clinical variants all support the relevance of APPRIS principal isoforms. APPRIS annotations and principal isoforms have now been expanded to 10 model organisms. In this paper we highlight the most recent updates to the database. APPRIS annotations have been generated for two new species, cow and chicken, the protein structural information has been augmented with reliable models from the EMBL-EBI AlphaFold database, and we have substantially expanded the confirmatory proteomics evidence available for the human genome. The most significant change in APPRIS has been the implementation of TRIFID functional isoform scores. TRIFID functional scores are assigned to all splice isoforms, and APPRIS uses the TRIFID functional scores and proteomics evidence to determine principal isoforms when core methods cannot.

Subject(s)

Databases, Protein , Protein Isoforms/genetics , Proteins/genetics , Proteomics , Animals , Cattle , Chickens/genetics , Humans , Protein Conformation , Protein Isoforms/classification , Proteins/chemistry , Proteins/classification

2.

METTL1 promotes tumorigenesis through tRNA-derived fragment biogenesis in prostate cancer.

García-Vílchez, Raquel; Añazco-Guenkova, Ana M; Dietmann, Sabine; López, Judith; Morón-Calvente, Virginia; D'Ambrosi, Silvia; Nombela, Paz; Zamacola, Kepa; Mendizabal, Isabel; García-Longarte, Saioa; Zabala-Letona, Amaia; Astobiza, Ianire; Fernández, Sonia; Paniagua, Alejandro; Miguel-López, Borja; Marchand, Virginie; Alonso-López, Diego; Merkel, Angelika; García-Tuñón, Ignacio; Ugalde-Olano, Aitziber; Loizaga-Iriarte, Ana; Lacasa-Viscasillas, Isabel; Unda, Miguel; Azkargorta, Mikel; Elortza, Félix; Bárcena, Laura; Gonzalez-Lopez, Monika; Aransay, Ana M; Di Domenico, Tomás; Sánchez-Martín, Manuel A; De Las Rivas, Javier; Guil, Sònia; Motorin, Yuri; Helm, Mark; Pandolfi, Pier Paolo; Carracedo, Arkaitz; Blanco, Sandra.

Mol Cancer ; 22(1): 119, 2023 07 29.

Article in English | MEDLINE | ID: mdl-37516825

ABSTRACT

Growing evidence highlights the essential role that epitranscriptomic marks play in the development of many cancers; however, little is known about the role and implications of altered epitranscriptome deposition in prostate cancer. Here, we show that the transfer RNA N7-methylguanosine (m7G) transferase METTL1 is highly expressed in primary and advanced prostate tumours. Mechanistically, we find that METTL1 depletion causes the loss of m7G tRNA methylation and promotes the biogenesis of a novel class of small non-coding RNAs derived from 5'tRNA fragments. 5'tRNA-derived small RNAs steer translation control to favour the synthesis of key regulators of tumour growth suppression, interferon pathway, and immune effectors. Knockdown of Mettl1 in prostate cancer preclinical models increases intratumoural infiltration of pro-inflammatory immune cells and enhances responses to immunotherapy. Collectively, our findings reveal a therapeutically actionable role of METTL1-directed m7G tRNA methylation in cancer cell translation control and tumour biology.

Subject(s)

Carcinogenesis , Prostatic Neoplasms , Male , Humans , Carcinogenesis/genetics , Cell Transformation, Neoplastic , Prostatic Neoplasms/genetics , Transcription, Genetic , RNA Processing, Post-Transcriptional , Methyltransferases/genetics

3.

bollito: a flexible pipeline for comprehensive single-cell RNA-seq analyses.

García-Jimeno, Luis; Fustero-Torre, Coral; Jiménez-Santos, María José; Gómez-López, Gonzalo; Di Domenico, Tomás; Al-Shahrour, Fátima.

Bioinformatics ; 38(4): 1155-1156, 2022 01 27.

Article in English | MEDLINE | ID: mdl-34788788

ABSTRACT

SUMMARY: bollito is an automated, flexible and parallelizable computational pipeline for the comprehensive analysis of single-cell RNA-seq data. Starting from FASTQ files or preprocessed expression matrices, bollito performs both basic and advanced tasks in single-cell analysis integrating >30 state-of-the-art tools. This includes quality control, read alignment, dimensionality reduction, clustering, cell-marker detection, differential expression, functional analysis, trajectory inference and RNA velocity. bollito is built using the Snakemake workflow management system, which easily connects each execution step and facilitates the reproducibility of results. bollito's modular design makes it easy to incorporate other packages into the pipeline enabling its expansion with new functionalities. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://gitlab.com/bu_cnio/bollito under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Single-Cell Gene Expression Analysis , Software , Reproducibility of Results , RNA , Workflow

4.

GENCODE 2021.

Frankish, Adam; Diekhans, Mark; Jungreis, Irwin; Lagarde, Julien; Loveland, Jane E; Mudge, Jonathan M; Sisu, Cristina; Wright, James C; Armstrong, Joel; Barnes, If; Berry, Andrew; Bignell, Alexandra; Boix, Carles; Carbonell Sala, Silvia; Cunningham, Fiona; Di Domenico, Tomás; Donaldson, Sarah; Fiddes, Ian T; García Girón, Carlos; Gonzalez, Jose Manuel; Grego, Tiago; Hardy, Matthew; Hourlier, Thibaut; Howe, Kevin L; Hunt, Toby; Izuogu, Osagie G; Johnson, Rory; Martin, Fergal J; Martínez, Laura; Mohanan, Shamika; Muir, Paul; Navarro, Fabio C P; Parker, Anne; Pei, Baikang; Pozo, Fernando; Riera, Ferriol Calvet; Ruffier, Magali; Schmitt, Bianca M; Stapleton, Eloise; Suner, Marie-Marthe; Sycheva, Irina; Uszczynska-Ratajczak, Barbara; Wolf, Maxim Y; Xu, Jinuri; Yang, Yucheng T; Yates, Andrew; Zerbino, Daniel; Zhang, Yan; Choudhary, Jyoti S; Gerstein, Mark.

Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Subject(s)

COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics

5.

An analysis of tissue-specific alternative splicing at the protein level.

Rodriguez, Jose Manuel; Pozo, Fernando; Di Domenico, Tomás; Vázquez, Jesús; Tress, Michael L.

PLoS Comput Biol ; 16(10): e1008287, 2020 10.

Article in English | MEDLINE | ID: mdl-33017396

ABSTRACT

The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than a third of these splice events were tissue-specific and most were ancient: over 95% of splice events that were tissue-specific in both proteomics and RNAseq analyses evolved prior to the ancestors of lobe-finned fish, at least 400 million years ago. By way of contrast, three in four alternative exons in the human gene set arose in the primate lineage, so our results cannot be extrapolated to the whole genome. Tissue-specific alternative protein forms in the proteomics analysis were particularly abundant in nervous and muscle tissues and their genes had roles related to the cytoskeleton and either the structure of muscle fibres or cell-cell connections. Our results suggest that this conserved tissue-specific alternative splicing may have played a role in the development of the vertebrate brain and heart.

Subject(s)

Alternative Splicing/genetics , Organ Specificity/genetics , Protein Isoforms , Animals , Computational Biology , Genome/genetics , Humans , Protein Isoforms/chemistry , Protein Isoforms/classification , Protein Isoforms/genetics , Proteomics

6.

GENCODE reference annotation for the human and mouse genomes.

Frankish, Adam; Diekhans, Mark; Ferreira, Anne-Maud; Johnson, Rory; Jungreis, Irwin; Loveland, Jane; Mudge, Jonathan M; Sisu, Cristina; Wright, James; Armstrong, Joel; Barnes, If; Berry, Andrew; Bignell, Alexandra; Carbonell Sala, Silvia; Chrast, Jacqueline; Cunningham, Fiona; Di Domenico, Tomás; Donaldson, Sarah; Fiddes, Ian T; García Girón, Carlos; Gonzalez, Jose Manuel; Grego, Tiago; Hardy, Matthew; Hourlier, Thibaut; Hunt, Toby; Izuogu, Osagie G; Lagarde, Julien; Martin, Fergal J; Martínez, Laura; Mohanan, Shamika; Muir, Paul; Navarro, Fabio C P; Parker, Anne; Pei, Baikang; Pozo, Fernando; Ruffier, Magali; Schmitt, Bianca M; Stapleton, Eloise; Suner, Marie-Marthe; Sycheva, Irina; Uszczynska-Ratajczak, Barbara; Xu, Jinuri; Yates, Andrew; Zerbino, Daniel; Zhang, Yan; Aken, Bronwen; Choudhary, Jyoti S; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J P.

Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.

Subject(s)

Databases, Genetic , Genome, Human/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software

7.

vulcanSpot: a tool to prioritize therapeutic vulnerabilities in cancer.

Perales-Patón, Javier; Di Domenico, Tomás; Fustero-Torre, Coral; Piñeiro-Yáñez, Elena; Carretero-Puche, Carlos; Tejero, Héctor; Valencia, Alfonso; Gómez-López, Gonzalo; Al-Shahrour, Fátima.

Bioinformatics ; 35(22): 4846-4848, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31173067

ABSTRACT

MOTIVATION: Genetic alterations lead to tumor progression and cell survival but also uncover cancer-specific vulnerabilities in gene dependencies that can be therapeutically exploited. RESULTS: vulcanSpot is a novel computational approach implemented to expand the therapeutic options in cancer beyond known driver genes, unlocking alternative ways to target undruggable genes. The method integrates genome-wide information provided by massive screening experiments to detect genetic vulnerabilities associated with tumors. Then, vulcanSpot prioritizes drugs to target cancer-specific gene dependencies using a weighted scoring system based on well-known drug-gene relationships and drug repositioning strategies. AVAILABILITY AND IMPLEMENTATION: http://www.vulcanspot.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Neoplasms , Computational Biology , Drug Repositioning , Humans , Mutation , Software

8.

APPRIS 2017: principal isoforms for multiple gene sets.

Rodriguez, Jose Manuel; Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso; Tress, Michael L.

Nucleic Acids Res ; 46(D1): D213-D217, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29069475

ABSTRACT

The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the 'principal' isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants.

Subject(s)

Databases, Genetic , Protein Isoforms/genetics , Alternative Splicing , Amino Acid Sequence , Animals , Humans , Models, Molecular , Molecular Sequence Annotation , Protein Conformation , Protein Isoforms/chemistry , Proteome/genetics , Reproducibility of Results , Sequence Alignment

9.

The Profile and Dynamics of RNA Modifications in Animals.

van Delft, Pieter; Akay, Alper; Huber, Sabrina M; Bueschl, Christoph; Rudolph, Konrad L M; Di Domenico, Tomás; Schuhmacher, Rainer; Miska, Eric A; Balasubramanian, Shankar.

Chembiochem ; 18(11): 979-984, 2017 06 01.

Article in English | MEDLINE | ID: mdl-28449301

ABSTRACT

More than a hundred distinct modified nucleosides have been identified in RNA, but little is known about their distribution across different organisms, their dynamic nature and their response to cellular and environmental stress. Mass-spectrometry-based methods have been at the forefront of identifying and quantifying modified nucleosides. However, they often require synthetic reference standards, which do not exist in the case of many modified nucleosides, and this therefore impedes their analysis. Here we use a metabolic labelling approach to achieve rapid generation of bio-isotopologues of the complete Caenorhabditis elegans transcriptome and its modifications and use them as reference standards to characterise the RNA modification profile in this multicellular organism through an untargeted liquid-chromatography tandem high-resolution mass spectrometry (LC-HRMS) approach. We furthermore show that several of these RNA modifications have a dynamic response to environmental stress and that, in particular, changes in the tRNA wobble base modification 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U) lead to codon-biased gene-expression changes in starved animals.

Subject(s)

RNA Processing, Post-Transcriptional , Stress, Physiological/genetics , Transcriptome , Animals , Caenorhabditis elegans , Chromatography, Liquid , Isotope Labeling , Tandem Mass Spectrometry , Thiouridine/analogs & derivatives , Thiouridine/metabolism

10.

MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins.

Potenza, Emilio; Di Domenico, Tomás; Walsh, Ian; Tosatto, Silvio C E.

Nucleic Acids Res ; 43(Database issue): D315-20, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25361972

ABSTRACT

MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture of different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 million). The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in the absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein-protein interactions from STRING are also classified for disorder content.

Subject(s)

Databases, Protein , Intrinsically Disordered Proteins/chemistry , Data Curation , Protein Conformation

11.

Comprehensive large-scale assessment of intrinsic protein disorder.

Walsh, Ian; Giollo, Manuel; Di Domenico, Tomás; Ferrari, Carlo; Zimmermann, Olav; Tosatto, Silvio C E.

Bioinformatics ; 31(2): 201-8, 2015 Jan 15.

Article in English | MEDLINE | ID: mdl-25246432

ABSTRACT

MOTIVATION: Intrinsically disordered regions are key for the function of numerous proteins. Due to the difficulties in experimental disorder characterization, many computational predictors have been developed with various disorder flavors. Their performance is generally measured on small sets mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures. RESULTS: MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following interesting observations. (i) The predictors cluster according to their disorder definition, with a consensus giving more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed. AVAILABILITY: The datasets are available at http://mobidb.bio.unipd.it/lsd. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Proteins/chemistry , Sequence Analysis, Protein/methods , Tumor Suppressor Protein p53/chemistry , Crystallography, X-Ray , Databases, Protein , Humans , Molecular Sequence Annotation , Protein Structure, Tertiary

12.

RepeatsDB: a database of tandem repeat protein structures.

Di Domenico, Tomás; Potenza, Emilio; Walsh, Ian; Parra, R Gonzalo; Giollo, Manuel; Minervini, Giovanni; Piovesan, Damiano; Ihsan, Awais; Ferrari, Carlo; Kajava, Andrey V; Tosatto, Silvio C E.

Nucleic Acids Res ; 42(Database issue): D352-7, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24311564

ABSTRACT

RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10,745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed on a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.

Subject(s)

Databases, Protein , Repetitive Sequences, Amino Acid , Internet , Molecular Sequence Annotation , Protein Conformation

13.

Highlights from the tenth ISCB Student Council Symposium 2014.

Rahman, Farzana; Wilkins, Katie; Jacobsen, Annika; Junge, Alexander; Vicedo, Esmeralda; DeBlasio, Dan; Jigisha, Anupama; Di Domenico, Tomás.

BMC Bioinformatics ; 16 Suppl 2: A1-10, 2015.

Article in English | MEDLINE | ID: mdl-25708534

ABSTRACT

This report summarizes the scientific content and activities of the annual symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Intelligent Systems for Molecular Biology (ISMB) conference in Boston, USA, on July 11th, 2014.

Subject(s)

Computational Biology , Drug Resistance, Multiple , High-Throughput Nucleotide Sequencing , Microsatellite Repeats/genetics , Peer Review, Research , Publishing , RNA, Messenger/metabolism , Sequence Analysis, DNA

14.

RUBI: rapid proteomic-scale prediction of lysine ubiquitination and factors influencing predictor performance.

Walsh, Ian; Di Domenico, Tomás; Tosatto, Silvio C E.

Amino Acids ; 46(4): 853-62, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24363213

ABSTRACT

Post-translational modification of protein lysines was recently shown to be a common feature of eukaryotic organisms. The ubiquitin modification is regarded as a versatile regulatory mechanism with many important cellular roles. Large-scale datasets are becoming available for H. sapiens ubiquitination. However, using current experimental techniques the vast majority of their sites remain unidentified and in silico tools may offer an alternative. Here, we introduce Rapid UBIquitination (RUBI), a sequence-based ubiquitination predictor designed for rapid application on a genome scale. RUBI was constructed using an iterative approach. At each iteration, important factors which influenced performance and its usability were investigated. The final RUBI model has an AUC of 0.868 on a large cross-validation set and is shown to outperform other available methods on independent sets. Predicted intrinsic disorder is shown to be weakly anti-correlated to ubiquitination for the H. sapiens dataset and improves performance slightly. RUBI predicts the number of ubiquitination sites correctly within three sites for ca. 80% of the tested proteins. The average potentially ubiquitinated proteome fraction is predicted to be at least 25% across a variety of model organisms, including several thousand possible H. sapiens proteins awaiting experimental characterization. RUBI can accurately predict ubiquitination on unseen examples and has a signal across different eukaryotic organisms. The factors which influenced the construction of RUBI could also be tested in other post-translational modification predictors. One of the more interesting factors is the influence of intrinsic protein disorder on ubiquitinated lysines, where residues with low disorder probability are preferred.

Subject(s)

Computational Biology/methods , Lysine/metabolism , Proteins/chemistry , Proteins/metabolism , Proteomics/methods , Animals , Artificial Intelligence , Computational Biology/instrumentation , Humans , Internet , Proteomics/instrumentation , Software , Ubiquitination

15.

Analysis and consensus of currently available intrinsic protein disorder annotation sources in the MobiDB database.

Di Domenico, Tomás; Walsh, Ian; Tosatto, Silvio C E.

BMC Bioinformatics ; 14 Suppl 7: S3, 2013.

Article in English | MEDLINE | ID: mdl-23815411

ABSTRACT

BACKGROUND: Intrinsic protein disorder is becoming an increasingly important topic in protein science. During the last few years, intrinsically disordered proteins (IDPs) have been shown to play a role in many important biological processes, e.g. protein signalling and regulation. This has sparked a need to better understand and characterize different types of IDPs, their functions and roles. Our recently published database, MobiDB, provides a centralized resource for accessing and analysing intrinsic protein disorder annotations. RESULTS: Here, we present a thorough description and analysis of the data made available by MobiDB, providing descriptive statistics on the various available annotation sources. Version 1.2.1 of the database contains annotations for ca. 4,500,000 UniProt sequences, covering all eukaryotic proteomes. In addition, we describe a novel consensus annotation calculation and its related weighting scheme. The comparison between disorder information sources highlights how the MobiDB consensus captures the main features of intrinsic disorder and correlates well with manually curated datasets. Finally, we demonstrate the annotation of 13 eukaryotic model organisms through MobiDB's datasets, and of an example protein through the interactive user interface. CONCLUSIONS: MobiDB is a central resource for intrinsic disorder research, containing both experimental data and predictions. In the future it will be expanded to include additional information for all known proteins.

Subject(s)

Databases, Protein , Eukaryota/chemistry , Proteins/chemistry , Proteome , Animals , Humans

16.

ESpritz: accurate and fast prediction of protein disorder.

Walsh, Ian; Martin, Alberto J M; Di Domenico, Tomás; Tosatto, Silvio C E.

Bioinformatics ; 28(4): 503-9, 2012 Feb 15.

Article in English | MEDLINE | ID: mdl-22190692

ABSTRACT

MOTIVATION: Intrinsically disordered regions are key for the function of numerous proteins, and the scant available experimental annotations suggest the existence of different disorder flavors. While efficient predictions are required to annotate entire genomes, most existing methods require sequence profiles for disorder prediction, making them cumbersome for high-throughput applications. RESULTS: In this work, we present an ensemble of protein disorder predictors called ESpritz. These are based on bidirectional recursive neural networks and trained on three different flavors of disorder, including a novel NMR flexibility predictor. ESpritz can produce fast and accurate sequence-only predictions, annotating entire genomes in the order of hours on a single processor core. Alternatively, a slower but slightly more accurate ESpritz variant using sequence profiles can be used for applications requiring maximum performance. Two levels of prediction confidence allow users either to maximize reasonable disorder detection or to limit expected false positives to 5%. ESpritz performs consistently well on the recent CASP9 data, reaching a S(w) measure of 54.82 and area under the receiver operating characteristic curve of 0.856. The fast predictor is four orders of magnitude faster and remains better than most publicly available CASP9 methods, making it ideal for genomic scale predictions. CONCLUSIONS: ESpritz predicts three flavors of disorder at two distinct false positive rates, either with a fast or slower and slightly more accurate approach. Given its state-of-the-art performance, it can be especially useful for high-throughput applications. AVAILABILITY: Both a web server for high-throughput analysis and a Linux executable version of ESpritz are available from: http://protein.bio.unipd.it/espritz/.

Subject(s)

Neural Networks, Computer , Proteins/chemistry , Proteins/metabolism , Algorithms , Animals , Humans , Protein Conformation , Protein Folding

17.

MobiDB: a comprehensive database of intrinsic protein disorder annotations.

Di Domenico, Tomás; Walsh, Ian; Martin, Alberto J M; Tosatto, Silvio C E.

Bioinformatics ; 28(15): 2080-1, 2012 Aug 01.

Article in English | MEDLINE | ID: mdl-22661649

ABSTRACT

MOTIVATION: Disordered protein regions are key to the function of numerous processes within an organism and to the determination of a protein's biological role. The most common source for protein disorder annotations, DisProt, covers only a fraction of the available sequences. Alternatively, the Protein Data Bank (PDB) has been mined for missing residues in X-ray crystallographic structures. Herein, we provide a centralized source for data on different flavours of disorder in protein structures, MobiDB, building on and expanding the content provided by already existing sources. In addition to the DisProt and PDB X-ray structures, we have added experimental information from NMR structures and five different flavours of two disorder predictors (ESpritz and IUpred). These are combined into a weighted consensus disorder used to classify disordered regions into flexible and constrained disorder. Users are encouraged to submit manual annotations through a submission form. MobiDB features experimental annotations for 17 285 proteins, covering the entire PDB and predictions for the SwissProt database, with 565 200 annotated sequences. Depending on the disorder flavour, 6-20% of the residues are predicted as disordered. AVAILABILITY: The database is freely available at http://mobidb.bio.unipd.it/. CONTACT: silvio.tosatto@unipd.it.

Subject(s)

Databases, Protein , Molecular Sequence Annotation , Proteins/chemistry , Crystallography, X-Ray , Magnetic Resonance Spectroscopy , Protein Structure, Tertiary , Sequence Analysis, Protein/methods , User-Computer Interface

18.

RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures.

Walsh, Ian; Sirocco, Francesco G; Minervini, Giovanni; Di Domenico, Tomás; Ferrari, Carlo; Tosatto, Silvio C E.

Bioinformatics ; 28(24): 3257-64, 2012 Dec 15.

Article in English | MEDLINE | ID: mdl-22962341

ABSTRACT

MOTIVATION: Repeat proteins form a distinct class of structures where folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 being the most challenging to detect. Such proteins evolve quickly and their periodicity may be rapidly hidden at sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. RESULTS: Here we introduce RAPHAEL, a novel method for the detection of solenoids in protein structures. It reliably solves three problems of increasing difficulty: (1) recognition of solenoid domains, (2) determination of their periodicity and (3) assignment of insertions. RAPHAEL uses a geometric approach mimicking manual classification, producing several numeric parameters that are optimized for maximum performance. The resulting method is very accurate, with 89.5% of solenoid proteins and 97.2% of non-solenoid proteins correctly classified. RAPHAEL periodicities have a Spearman correlation coefficient of 0.877 against the manually established ones. A baseline algorithm for insertion detection in identified solenoids has a Q₂ value of 79.8%, suggesting room for further improvement. RAPHAEL finds 1931 highly confident repeat structures not previously annotated as solenoids in the Protein Data Bank records.

Subject(s)

Algorithms , Protein Structure, Tertiary , Repetitive Sequences, Amino Acid , Proteins/chemistry , Sequence Analysis, Protein

19.

CSpritz: accurate prediction of protein disorder segments with annotation for homology, secondary structure and linear motifs.

Walsh, Ian; Martin, Alberto J M; Di Domenico, Tomàs; Vullo, Alessandro; Pollastri, Gianluca; Tosatto, Silvio C E.

Nucleic Acids Res ; 39(Web Server issue): W190-6, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21646342

ABSTRACT

CSpritz is a web server for the prediction of intrinsic protein disorder. It combines the previous Spritz method with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single-sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, predictions are combined by averaging their probabilities. The CSpritz website can process single or multiple predictions for either short or long disorder. The server provides a global output page for download, together with simultaneous statistics for all predictions. Links are provided to each individual protein, where the amino acid sequence and disorder prediction are displayed along with statistics for that protein. As a novel feature, CSpritz provides information about structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, where CSpritz would have ranked consistently well with a Sw measure of 49.27 and an AUC of 0.828. The server, together with help and methods pages including examples, is freely available at URL: http://protein.bio.unipd.it/cspritz/.

Subject(s)

Protein Conformation , Software , Amino Acid Motifs , Internet , Molecular Sequence Annotation , Protein Structure, Secondary , Structural Homology, Protein
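The combination step described in the CSpritz abstract (averaging the per-residue probabilities of several predictors) can be sketched in Python. This is a minimal illustration under stated assumptions, not the CSpritz implementation: the function names, the 0.5 decision threshold and the minimum segment length are hypothetical, and the homology filtering and the separate short/long disorder modes are not modelled.

```python
from typing import List, Optional, Sequence, Tuple


def average_predictions(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue disorder probabilities from several predictors.

    Hypothetical helper: each inner sequence holds one predictor's
    probabilities for the same protein, one value per residue.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    n = len(predictions)
    return [sum(p[i] for p in predictions) / n for i in range(length)]


def disordered_segments(
    probs: Sequence[float], threshold: float = 0.5, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return 0-based, inclusive (start, end) runs with probability >= threshold.

    The threshold and min_length defaults are illustrative assumptions,
    not values taken from CSpritz.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i  # a disordered run begins here
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None  # the run ended at the previous residue
    # Close a run that extends to the last residue.
    if start is not None and len(probs) - start >= min_length:
        segments.append((start, len(probs) - 1))
    return segments


if __name__ == "__main__":
    # Made-up scores for a 6-residue toy protein from three predictors.
    scores = [
        [0.9, 0.8, 0.2, 0.1, 0.7, 0.7],
        [0.7, 0.6, 0.4, 0.2, 0.6, 0.9],
        [0.8, 0.7, 0.3, 0.0, 0.8, 0.8],
    ]
    consensus = average_predictions(scores)
    print([round(x, 2) for x in consensus])
    print(disordered_segments(consensus))
```

Run on the toy scores, this prints the consensus [0.8, 0.7, 0.3, 0.1, 0.7, 0.8] and the segments [(0, 1), (4, 5)]. A plain unweighted mean treats every predictor as equally reliable; a weighted average would be a natural variant if one predictor were known to be stronger.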

20.

RING: networking interacting residues, evolutionary information and energetics in protein structures.

Martin, Alberto J M; Vidotto, Michele; Boscariol, Filippo; Di Domenico, Tomàs; Walsh, Ian; Tosatto, Silvio C E.

Bioinformatics ; 27(14): 2003-5, 2011 Jul 15.

Article in English | MEDLINE | ID: mdl-21493660

ABSTRACT

MOTIVATION: Residue interaction networks (RINs) have been used in the literature to describe protein 3D structure as a graph, where nodes represent residues and edges represent physico-chemical interactions, e.g. hydrogen bonds or van der Waals contacts. Topological network parameters can be calculated over RINs and have been correlated with various aspects of protein structure and function. Here we present a novel web server, RING, that interactively constructs physico-chemically valid RINs from PDB files for subsequent visualization in the Cytoscape platform. Additional structure-based parameters (secondary structure, solvent accessibility and experimental uncertainty) can be combined with information on residue conservation, mutual information and residue-based energy scoring functions. Different visualization styles are provided, and standard plugins can be used to calculate topological parameters in Cytoscape. A sample use case analyzing the active site of glutathione peroxidase is presented. AVAILABILITY: The RING server, supplementary methods, examples and tutorials are available for non-commercial use at URL: http://protein.bio.unipd.it/ring/.

Subject(s)

Databases, Protein , Internet , Proteins/chemistry , Software , Glutathione Peroxidase/chemistry , Humans , Imaging, Three-Dimensional , Protein Conformation , Protein Interaction Mapping/methods , Proteins/metabolism
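A residue interaction network of the kind described in the RING abstract can be sketched in Python as an adjacency map. This rests on a simplifying assumption and is not RING's method: RING identifies specific physico-chemical interactions, whereas this sketch links two residues whenever any pair of their atoms lies within a single distance cutoff (4.5 Å here, an illustrative value). The helper names are hypothetical; `to_sif` writes edges in Cytoscape's simple interaction format (SIF), one tab-separated `node interaction node` line per edge.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def build_rin(
    residues: Dict[int, List[Coord]], cutoff: float = 4.5
) -> Dict[int, Set[int]]:
    """Build an undirected residue graph from atom coordinates.

    Simplifying assumption: two residues are linked when any pair of their
    atoms lies within `cutoff` angstroms. RING itself distinguishes specific
    interaction types (e.g. hydrogen bonds, van der Waals contacts).
    """
    graph: Dict[int, Set[int]] = {res: set() for res in residues}
    for a, b in combinations(residues, 2):
        if any(
            math.dist(p, q) <= cutoff for p in residues[a] for q in residues[b]
        ):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees(graph: Dict[int, Set[int]]) -> Dict[int, int]:
    """Node degree, the simplest topological parameter of a RIN."""
    return {res: len(neighbours) for res, neighbours in graph.items()}


def to_sif(graph: Dict[int, Set[int]], interaction: str = "contact") -> str:
    """Serialize edges as SIF lines: node <tab> interaction <tab> node."""
    lines = [
        f"{a}\t{interaction}\t{b}"
        for a in sorted(graph)
        for b in sorted(graph[a])
        if a < b  # write each undirected edge once
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy coordinates (angstroms) keyed by residue number; not real PDB data.
    toy = {
        1: [(0.0, 0.0, 0.0)],
        2: [(3.0, 0.0, 0.0)],
        3: [(10.0, 0.0, 0.0)],
        4: [(13.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
    }
    rin = build_rin(toy)
    print(degrees(rin))
    print(to_sif(rin))
```

In practice the coordinates would come from a PDB parser, and sequence-adjacent residues, which this sketch links like any other pair, are often filtered out. The all-pairs atom comparison is quadratic; a spatial grid or k-d tree would be the usual choice for large structures.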
