ABSTRACT
The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments in DisProt (version 8), including a doubling of the number of protein entries, a new disorder ontology, improvements to the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text-mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows more information to be captured from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be used effectively to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore to illuminating the 'dark' proteome.
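To train or validate a disorder predictor, curated disordered regions are typically converted into per-residue labels. The Python sketch below illustrates that step under stated assumptions: each region is a hypothetical 1-based, inclusive (start, end) pair, and overlapping evidence regions for the same protein are merged. The names `merge_regions`, `residue_labels` and `disorder_content` are illustrative only; this is not the DisProt API or its entry format.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation (not the DisProt format): a curated region is a
# (start, end) pair of residue positions, 1-based and inclusive.
Region = Tuple[int, int]


def merge_regions(regions: Iterable[Region]) -> List[Region]:
    """Merge overlapping or adjacent regions into a sorted, non-overlapping list."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def residue_labels(length: int, regions: Iterable[Region]) -> List[int]:
    """One label per residue: 1 if inside any annotated disordered region, else 0."""
    regions = list(regions)
    for start, end in regions:
        if not 1 <= start <= end <= length:
            raise ValueError(f"invalid region ({start}, {end}) for length {length}")
    labels = [0] * length
    for start, end in merge_regions(regions):
        for pos in range(start - 1, end):  # 1-based inclusive -> 0-based indices
            labels[pos] = 1
    return labels


def disorder_content(labels: List[int]) -> float:
    """Fraction of residues labelled as disordered."""
    return sum(labels) / len(labels) if labels else 0.0


if __name__ == "__main__":
    # Toy 20-residue protein: two overlapping annotations plus one separate region.
    toy_regions = [(1, 5), (4, 8), (15, 20)]
    print(merge_regions(toy_regions))  # [(1, 8), (15, 20)]
    print(disorder_content(residue_labels(20, toy_regions)))  # 0.7
```

Merging is not strictly needed to produce the labels, but it yields the non-overlapping disordered segments one would count or report per protein.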
Subject(s)
Databases, Protein; Intrinsically Disordered Proteins/chemistry; Biological Ontologies; Data Curation; Molecular Sequence Annotation
ABSTRACT
MOTIVATION: Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. Because the boundary between imperfect TR motifs and non-repetitive sequences is blurred, the detected TRs need to be validated. RESULTS: Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded using improved training datasets and additional ML features. Tally-2.0 achieves 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve (AUC) of 0.95. AVAILABILITY AND IMPLEMENTATION: Tally-2.0 is available as a web tool and as a standalone application published under the Apache License 2.0 at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
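Sensitivity, specificity and AUC are standard binary-classification metrics. The minimal Python sketch below shows how they are computed from true labels and classifier scores; the toy labels, scores and the 0.5 decision threshold are hypothetical, are not taken from Tally-2.0 or its datasets, and the code is not part of Tally-2.0.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity); labels are 0/1, score >= threshold means positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = true tandem repeat, 0 = non-repetitive; scores from some classifier.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
    print(sensitivity_specificity(labels, scores, threshold=0.5))  # (0.75, 0.75)
    print(roc_auc(labels, scores))  # 0.875
```

The pairwise AUC is quadratic in the number of examples, which keeps the definition transparent; for large datasets a rank-based computation (or a library routine) is the usual choice. Note that sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes performance over all thresholds.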
Subject(s)
Algorithms; Tandem Repeat Sequences; Amino Acid Sequence; Machine Learning; Proteins/genetics; Software
ABSTRACT
Semi-denaturing detergent agarose gel electrophoresis (SDD-AGE) was proposed by Vitaly V. Kushnirov in Michael D. Ter-Avanesyan's laboratory as a method for comparing the sizes of amyloid aggregates. The method is now widely used in amyloid research, but mostly as a qualitative approach. In this work, we assessed the possibilities and limitations of quantitatively analyzing the size distribution of amyloid aggregates from SDD-AGE results. For this purpose, we used aggregates of two well-characterized yeast amyloid-forming proteins, Sup35 and Rnq1, and developed a protocol that standardizes image analysis and result processing. A detailed investigation of factors that may affect SDD-AGE results revealed that both the cell lysis method and the electrophoresis conditions can substantially affect the estimated aggregate size. Nevertheless, quantitative analysis of SDD-AGE results is possible when one needs to estimate and compare aggregate sizes on the same gel, or even across different experiments, provided that experimental conditions are tightly controlled and additional standards are used.
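The abstract does not detail the image-analysis protocol, so the following is only a generic densitometry sketch, not the authors' method. It assumes a background-subtracted lane intensity profile sampled at each pixel distance from the well, and size standards on the same gel for which log10(size) is approximately linear in migration distance (a common gel-calibration assumption). The function names, toy standards and toy lane are hypothetical, and sizes come out in whatever unit the standards use.

```python
import math
from typing import Sequence, Tuple


def fit_calibration(standards: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(size) = slope * distance + intercept.

    `standards` holds (migration distance, known size) pairs from the same gel.
    """
    if len(standards) < 2:
        raise ValueError("need at least two size standards")
    xs = [d for d, _ in standards]
    ys = [math.log10(size) for _, size in standards]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must be at distinct distances")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def size_at(distance: float, slope: float, intercept: float) -> float:
    """Convert a migration distance to a size using the fitted calibration."""
    return 10 ** (slope * distance + intercept)


def median_aggregate_size(profile: Sequence[float], slope: float, intercept: float) -> float:
    """Size at which half the lane signal has accumulated, scanning away from the well.

    `profile[i]` is the background-subtracted intensity at distance i from the well.
    """
    clipped = [max(0.0, v) for v in profile]  # negative values are treated as noise
    total = sum(clipped)
    if total <= 0:
        raise ValueError("lane profile contains no signal")
    cumulative = 0.0
    for distance, intensity in enumerate(clipped):
        cumulative += intensity
        if cumulative >= total / 2:
            return size_at(distance, slope, intercept)
    return size_at(len(clipped) - 1, slope, intercept)  # fallback; not reached in practice


if __name__ == "__main__":
    # Toy standards: size falls tenfold every 10 pixels.
    slope, intercept = fit_calibration([(0, 1000), (10, 100), (20, 10)])
    lane = [0.0] * 21
    lane[8], lane[10], lane[12] = 1.0, 2.0, 1.0
    print(round(median_aggregate_size(lane, slope, intercept), 6))  # 100.0
```

Because size decreases monotonically with distance under this calibration, the half-signal position corresponds to the signal-weighted median size. Consistent with the abstract's caveat, such estimates are only comparable when lanes share a gel or calibration standards and experimental conditions are tightly controlled.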