ABSTRACT
Motivation: Automated selection of signals in protein NMR spectra, known as peak picking, has been studied for over 20 years; nevertheless, existing peak picking methods are still largely deficient. Accurate and precise automated peak picking would accelerate structure calculation and the analysis of dynamics and interactions of macromolecules. Recent advances in handling big data, together with a surge of machine learning techniques, offer an opportunity to tackle the peak picking problem substantially faster than manual picking and on par with human accuracy. In particular, deep learning has been shown to systematically achieve human-level performance in various recognition tasks, and thus emerges as an ideal tool for automated identification of NMR signals. Results: We have applied a convolutional neural network to the visual analysis of multidimensional NMR spectra. A comprehensive test on 31 manually annotated spectra demonstrated top-tier average precision (AP) of 0.9596, 0.9058 and 0.8271 for backbone, side-chain and NOESY spectra, respectively. Furthermore, combining the extracted peak lists with the automated assignment routine FLYA outperformed other methods, including the manual one, and led to correct resonance assignment at the levels of 90.40%, 89.90% and 90.20% for three benchmark proteins. Availability and implementation: The proposed model is part of the Dumpling software (a platform for protein NMR data analysis) and is available at https://dumpling.bio/. Supplementary information: Supplementary data are available at Bioinformatics online.
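The average precision (AP) figures quoted above summarize how well scored peak candidates are ranked against manual annotation. The abstract does not state which AP definition was used, so the following is only a minimal sketch of one common form, the step-wise area under the precision-recall curve. The function name `average_precision`, its parameters, and the toy scores and labels are illustrative assumptions, not part of the published method.

```python
def average_precision(scores, labels, n_true=None):
    """Step-wise AP: mean of the precision values measured at each true peak.

    scores -- confidence assigned to each candidate peak (higher = more likely real)
    labels -- 1 if the candidate matches an annotated peak, 0 otherwise
    n_true -- total number of annotated peaks (assumed parameter); defaults to the
              number of matched candidates, i.e. no annotated peak was missed
    """
    if n_true is None:
        n_true = sum(labels)
    if n_true == 0:
        raise ValueError("at least one annotated peak is required")

    # Rank candidates from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            hits += 1
            # Precision at this rank; each true peak adds a recall step of 1/n_true.
            precision_sum += hits / rank
    return precision_sum / n_true


# Toy example with made-up values: four candidates, three of them real peaks.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 1]
toy_ap = average_precision(toy_scores, toy_labels)
```

Passing `n_true` explicitly penalizes annotated peaks that never appear among the candidates, which is why it is exposed as a separate argument in this sketch.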
Subject(s)
Deep Learning , Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Software , Macromolecular Substances/chemistry
ABSTRACT
Analysis of the structure, function and interactions of proteins by NMR spectroscopy usually requires the assignment of resonances to the corresponding nuclei in the protein. This task, although automated by methods such as FLYA or PINE, is still frequently performed manually. To facilitate manual sequence-specific chemical shift assignment of complex proteins, we propose a method based on a Dirichlet process mixture model (DPMM) that automatically matches groups of signals observed in NMR spectra to the corresponding nuclei in the protein sequence. The model has been extensively tested on 80 proteins retrieved from the BMRB database and has shown superior performance to the reference method.