RESUMO
In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes that are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of cross-linked peptides with oligonucleotides of different length leads to a combinatorial increase in search space. We demonstrate that the peptide retention time prediction tasks can be transferred to the task of cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvement in the encoding and fine-tuning of retention time and intensity prediction models might lead to further gains, and merit further research.
Assuntos
Ácidos Nucleicos , RNA , Aminoácidos , Espectrometria de Massas , PeptídeosRESUMO
With the rapidly growing importance of biological research, non-coding RNAs (ncRNA) attract more attention in biology and bioinformatics. They play vital roles in biological processes such as transcription and translation. Classification of ncRNAs is essential to our understanding of disease mechanisms and treatment design. Many approaches to ncRNA classification have been developed, several of which use machine learning and deep learning. In this paper, we construct a novel deep learning-based architecture, ncRDense, to effectively classify and distinguish ncRNA families. In a comparative study, our model produces comparable results with existing state-of-the-art methods. Finally, we built a freely accessible web server for the ncRDense tool, which is available at http://nsclbio.jbnu.ac.kr/tools/ncRDense/.
Assuntos
Aprendizado Profundo , Biologia Computacional/métodos , Humanos , Aprendizado de Máquina , RNA não Traduzido/genéticaRESUMO
UV (ultra-violet) crosslinking with mass spectrometry (XL-MS) has been established for identifying RNA-and DNA-binding proteins along with their domains and amino acids involved. Here, we explore chemical XL-MS for RNA-protein, DNA-protein, and nucleotide-protein complexes in vitro and in vivo . We introduce a specialized nucleotide-protein-crosslink search engine, NuXL, for robust and fast identification of such crosslinks at amino acid resolution. Chemical XL-MS complements UV XL-MS by generating different crosslink species, increasing crosslinked protein yields in vivo almost four-fold and thus it expands the structural information accessible via XL-MS. Our workflow facilitates integrative structural modelling of nucleic acid-protein complexes and adds spatial information to the described RNA-binding properties of enzymes, for which crosslinking sites are often observed close to their cofactor-binding domains. In vivo UV and chemical XL-MS data from E. coli cells analysed by NuXL establish a comprehensive nucleic acid-protein crosslink inventory with crosslink sites at amino acid level for more than 1500 proteins. Our new workflow combined with the dedicated NuXL search engine identified RNA crosslinks that cover most RNA-binding proteins, with DNA and RNA crosslinks detected in transcriptional repressors and activators.
RESUMO
Protein ubiquitylation is an essential post-translational modification process that performs a critical role in a wide range of biological functions, even a degenerative role in certain diseases, and is consequently used as a promising target for the treatment of various diseases. Owing to the significant role of protein ubiquitylation, these sites can be identified by enzymatic approaches, mass spectrometry analysis, and combinations of multidimensional liquid chromatography and tandem mass spectrometry. However, these large-scale experimental screening techniques are time consuming, expensive, and laborious. To overcome the drawbacks of experimental methods, machine learning and deep learning-based predictors were considered for prediction in a timely and cost-effective manner. In the literature, several computational predictors have been published across species; however, predictors are species-specific because of the unclear patterns in different species. In this study, we proposed a novel approach for predicting plant ubiquitylation sites using a hybrid deep learning model by utilizing convolutional neural network and long short-term memory. The proposed method uses the actual protein sequence and physicochemical properties as inputs to the model and provides more robust predictions. The proposed predictor achieved the best result with accuracy values of 80% and 81% and F-scores of 79% and 82% on the 10-fold cross-validation and an independent dataset, respectively. Moreover, we also compared the testing of the independent dataset with popular ubiquitylation predictors; the results demonstrate that our model significantly outperforms the other methods in prediction classification results.