ABSTRACT
Amino acid scales are crucial for protein prediction tasks, and many of them are curated in the AAindex database. Although various clustering approaches have attempted to organize these scales and to better understand their relationships, they lack the fine-grained classification necessary for satisfactory interpretability in many protein prediction problems. To address this issue, we developed AAontology, a two-level classification of 586 amino acid scales (mainly from AAindex) accompanied by an in-depth analysis of their relations, using bag-of-words-based classification, clustering, and manual refinement over multiple iterations. AAontology organizes physicochemical scales into 8 categories and 67 subcategories, enhancing the interpretability of scale-based machine learning methods in protein bioinformatics and thereby enabling researchers to gain deeper biological insight. We anticipate that AAontology will be a building block for linking amino acid properties with protein function and dysfunction, and that it will aid informed decision-making in mutation analysis or protein drug design.
ABSTRACT
Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. Although the number of sequenced genomes is growing and the accuracy and contiguity of genome assemblies are improving, structural annotation of plant genomes remains a significant challenge because of their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that affect their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.
Subject(s)
Crops, Agricultural; Genome, Plant; Molecular Sequence Annotation; Proteomics; Crops, Agricultural/genetics; Proteomics/methods; Genomics/methods; Plant Proteins/genetics; Plant Proteins/metabolism
ABSTRACT
The intramembrane protease γ-secretase has broad physiological functions, but also contributes to Notch-dependent tumors and Alzheimer's disease. While γ-secretase cleaves numerous membrane proteins, only a few nonsubstrates are known. Thus, a fundamental open question is how γ-secretase distinguishes substrates from nonsubstrates and whether sequence-based features or post-translational modifications of membrane proteins contribute to substrate recognition. Using mass spectrometry-based proteomics, we identified several type I membrane proteins with short ectodomains that were inefficiently or not cleaved by γ-secretase, including 'pituitary tumor-transforming gene 1-interacting protein' (PTTG1IP). To analyze the mechanism preventing cleavage of these putative nonsubstrates, we used the validated substrate FN14 as a backbone and replaced its transmembrane domain (TMD), where γ-cleavage occurs, with those of the nonsubstrates. Surprisingly, some nonsubstrate TMDs were efficiently cleaved in the FN14 backbone, demonstrating that a cleavable TMD is necessary, but not sufficient, for cleavage by γ-secretase. Cleavage efficiencies varied by up to 200-fold. Other TMDs, including that of PTTG1IP, were still barely cleaved within the FN14 backbone. Pharmacological and mutational experiments revealed that the PTTG1IP TMD is palmitoylated, which prevented cleavage by γ-secretase. We conclude that the TMD sequence of a membrane protein and its palmitoylation can be key factors determining substrate recognition and cleavage efficiency by γ-secretase.