ABSTRACT
In shotgun proteomics, a search engine analyzes experimentally acquired mass spectra and reports a peptide-spectrum match (PSM) for each spectrum. However, most of the identified PSMs are incorrect, so various postprocessing tools have been developed to rerank the peptide identifications. Yet these methods suffer from issues such as dependency on distribution, reliance on shallow models, and limited effectiveness. In this work, we propose AttnPep, a deep learning model for rescoring PSMs that uses a self-attention module. This module helps the neural network focus on features relevant to the classification of PSMs and ignore irrelevant ones, which allows AttnPep to analyze the output of different search engines and improve PSM discrimination accuracy. We considered a PSM to be correct if it achieved a q-value <0.01 and compared AttnPep with the existing mainstream software PeptideProphet, Percolator, and proteoTorch. AttnPep identified on average 9.29% more correct PSMs than the other methods. Additionally, AttnPep better distinguished correct from incorrect PSMs and found more synthetic peptides in the complex SWATH data set.
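The q-value cutoff mentioned above can be illustrated with a minimal sketch. The abstract does not state how q-values were estimated; the code below assumes the common target-decoy approach, and the function names `q_values` and `count_accepted` are hypothetical, not part of AttnPep.

```python
def q_values(scores, is_decoy):
    """Estimate q-values for PSMs (higher score = better match).

    Assumption: FDR at each rank is estimated as decoys / targets,
    the usual target-decoy estimate; the source does not confirm this.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this rank or any lower-scoring rank.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals


def count_accepted(scores, is_decoy, threshold=0.01):
    """Count target PSMs whose q-value is below the threshold."""
    q = q_values(scores, is_decoy)
    return sum(1 for qi, d in zip(q, is_decoy) if not d and qi < threshold)
```

Under this sketch, a rescoring model is judged by how many target PSMs `count_accepted` returns at the same 0.01 threshold: better separation of correct and incorrect PSMs pushes decoys lower in the ranking and raises the count.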
Subjects
Algorithms, Deep Learning, Proteomics/methods, Tandem Mass Spectrometry/methods, Peptides, Software, Protein Databases
ABSTRACT
In quantitative proteomics, data-independent acquisition (DIA) has emerged as a promising approach, offering better reproducibility and quantitative accuracy than traditional data-dependent acquisition (DDA) methods. However, the analysis of DIA data is currently hindered by its reliance on project-specific spectral libraries derived from DDA analyses, which limits proteome coverage and is time-intensive. To overcome these challenges, we propose ProPept-MT, a novel deep learning-based multi-task prediction model designed to accurately predict key features such as retention time (RT), ion intensity, and ion mobility (IM). Using multi-head attention and BiLSTM for feature extraction, coupled with Nash-MTL for gradient coordination, ProPept-MT demonstrates superior prediction performance. Integrating ion mobility alongside RT, mass-to-charge ratio (m/z), and ion intensity forms 4D proteomics. We then outline a comprehensive workflow for 4D DIA proteomics research that uses 4D in silico libraries predicted by ProPept-MT. On a benchmark dataset, ProPept-MT achieves a Pearson correlation coefficient (PCC) of 99.9% for RT prediction, a median dot product (DP) of 96.0% for fragment ion intensity prediction, and a PCC of 99.3% for IM prediction on the test set. Notably, ProPept-MT is effective in predicting both unmodified and phosphorylated peptides, underscoring its potential as a valuable tool for constructing high-quality 4D DIA in silico libraries.
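The two evaluation metrics named above can be sketched as follows. This is an illustration only: the abstract does not define its dot product, so the code assumes a length-normalized (cosine-style) dot product between predicted and observed intensity vectors, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def normalized_dot_product(predicted, observed):
    """Dot product of two intensity vectors after L2 normalization.

    Assumption: this normalization is one common choice for comparing
    fragment ion intensities; the source does not specify its definition.
    """
    dot = sum(a * b for a, b in zip(predicted, observed))
    norm_p = math.sqrt(sum(a * a for a in predicted))
    norm_o = math.sqrt(sum(b * b for b in observed))
    return dot / (norm_p * norm_o)
```

A PCC would typically be computed once over all peptides in a test set (for RT or IM), whereas the dot product is computed per spectrum and then summarized, for example by the median as reported above.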
Subjects
Peptides, Proteomics, Proteomics/methods, Peptides/chemistry, Deep Learning, Humans, Proteome, Reproducibility of Results
ABSTRACT
TLR4 complexes are essential for the initiation of the LPS-induced innate immune response. The Myddosome, which mainly contains TLR4, TIRAP, MyD88, IRAK1/4 and TRAF6 proteins, is regarded as a major complex of TLR4. Although the Myddosome has been well studied, a quantitative description of the Myddosome assembly dynamics is still lacking. Furthermore, whether unknown TLR4 complexes exist remains unclear. In this study, we constructed a SWATH-MS data-based mathematical model that describes the component assembly dynamics of TLR4 complexes. In addition to the Myddosome, we suggest that a TIRAP-independent MyD88 activation complex, which does not include TRAF6, is formed upon LPS stimulation. Furthermore, quantitative analysis reveals that the distribution of components in TIRAP-dependent and -independent MyD88 activation complexes is LPS stimulation-dependent. The two complexes compete for recruiting IRAK1/4 proteins. MyD88 forms a higher-order assembly in the Myddosome, and we show that the strategy for forming this higher-order assembly is also LPS stimulation-dependent. MyD88 forms a long chain upon weak stimulation, but a short chain upon strong stimulation. Higher-order assembly of MyD88 is directly determined by the level of TIRAP in the Myddosome, providing a formation mechanism for efficient signal transduction. Taken together, our study provides an enhanced understanding of component assembly dynamics and strategies in TLR4 complexes.
Subjects
Lipopolysaccharides/pharmacology, Membrane Glycoproteins/metabolism, Myeloid Differentiation Factor 88/metabolism, Interleukin-1 Receptors/metabolism, Toll-Like Receptor 4/metabolism, Algorithms, Animals, Interleukin-1 Receptor-Associated Kinases/metabolism, Mice, Theoretical Models, Multiprotein Complexes/metabolism, RAW 264.7 Cells, Signal Transduction/drug effects
ABSTRACT
Peptide spectrum matching is the process of linking mass spectrometry data with peptide sequences. An experimental spectrum can match thousands of candidate peptides, and variable modifications lead to an exponential increase in candidates. Completing the search within a limited time is a key challenge. Traditional searches expedite the process by restricting peptide mass errors and variable modifications, but this limits interpretive capability. To address this challenge, we propose Dear-PSM, a peptide search engine that supports full database searching. Dear-PSM does not restrict peptide mass errors: it matches each spectrum to all peptides in the database and raises the number of variable modifications per peptide from the conventional 3 to 20. Leveraging inverted index technology, Dear-PSM creates a high-performance index table of experimental spectra and uses deep learning algorithms for peptide validation. Through these techniques, Dear-PSM runs 7 times faster than mainstream search engines on a regular desktop computer, with a 240-fold reduction in memory consumption. Benchmark results demonstrate that Dear-PSM, in full database search mode, can reproduce over 90% of the results obtained by mainstream search engines when handling complex mass spectrometry data collected from different species using various instruments. Furthermore, it uncovers a substantial number of new peptides and proteins. Dear-PSM has been publicly released on GitHub at https://github.com/jianweishuai/Dear-PSM.
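The idea of an inverted index over experimental spectra can be sketched as below. This is not Dear-PSM's implementation, whose details the abstract does not give; the bin width, data layout, and function names (`build_index`, `match_counts`) are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

BIN_WIDTH = 0.02  # hypothetical m/z bin width, not taken from the source


def build_index(spectra, bin_width=BIN_WIDTH):
    """Map each m/z bin to the set of spectrum ids with a peak in that bin.

    `spectra` is a dict: spectrum id -> list of peak m/z values.
    """
    index = defaultdict(set)
    for spectrum_id, peaks in spectra.items():
        for mz in peaks:
            index[int(mz / bin_width)].add(spectrum_id)
    return index


def match_counts(index, fragment_mzs, bin_width=BIN_WIDTH):
    """Count, per spectrum, how many theoretical fragments hit an indexed bin.

    Simplification: only the exact bin is checked, so peaks that fall just
    across a bin edge are missed; a real engine would handle tolerances.
    """
    counts = Counter()
    for mz in fragment_mzs:
        for spectrum_id in index.get(int(mz / bin_width), ()):
            counts[spectrum_id] += 1
    return counts
```

The design point is that one lookup per theoretical fragment reaches every spectrum containing a matching peak, so a peptide can be compared against all spectra without a precursor-mass filter.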
ABSTRACT
Mass spectrometry is crucial in proteomics analysis, particularly with data-independent acquisition (DIA), which provides reliable and reproducible data acquisition, broad mass-to-charge ratio coverage, and high throughput. DIA-NN, a prominent deep learning software tool for DIA proteome analysis, generates peptide results but may include low-confidence peptides. Conventionally, biologists have to manually screen peptide fragment ion chromatogram peaks (XIC) to identify high-confidence peptides, a time-consuming and subjective process prone to variability. In this study, we introduce SeFilter-DIA, a deep learning algorithm that automates the identification of high-confidence peptides. Leveraging compressed excitation neural network and residual network models, SeFilter-DIA extracts XIC features and effectively discriminates between high- and low-confidence peptides. Evaluation on the benchmark datasets shows that SeFilter-DIA achieves 99.6% AUC on the test set and 97% for other performance indicators. Furthermore, SeFilter-DIA is applicable to screening peptides with phosphorylation modifications. These results demonstrate the potential of SeFilter-DIA to replace manual screening, providing an efficient and objective approach for high-confidence peptide identification while mitigating the associated limitations.
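For readers unfamiliar with the AUC figure quoted above, a minimal sketch of how an area under the ROC curve can be computed from classifier scores follows. It uses the pairwise-ranking definition and is not taken from SeFilter-DIA; the function name `roc_auc` and the label encoding (1 = high confidence, 0 = low confidence) are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples, which is fine for illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.996 under this definition would mean that a randomly chosen high-confidence peptide receives a higher score than a randomly chosen low-confidence one in 99.6% of pairs.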
Subjects
Algorithms, Peptides, Proteomics, Proteomics/methods, Peptides/chemistry, Neural Networks (Computer), Software, Mass Spectrometry, Deep Learning
ABSTRACT
Data-independent acquisition (DIA) technology for protein identification by mass spectrometry, together with its related algorithms, is developing rapidly. Spectrum-centric analysis of DIA data without a spectral library built from data-dependent acquisition data represents a promising direction. In this paper, we propose an untargeted analysis method, Dear-DIAXMBD, for direct analysis of DIA data. Dear-DIAXMBD first integrates a deep variational autoencoder and triplet loss to learn representations of the extracted fragment ion chromatograms, then uses the k-means clustering algorithm to aggregate fragments with similar representations into the same classes, and finally builds inverted index tables between precursors and peptides and between fragments and peptides to determine the precursors of the fragment clusters. We show that Dear-DIAXMBD achieves superior performance on highly complex DIA data from different species acquired on different instrument platforms. Dear-DIAXMBD is publicly available at https://github.com/jianweishuai/Dear-DIA-XMBD.
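The triplet loss mentioned above can be written down in a few lines. The abstract does not give the exact formulation used in Dear-DIAXMBD, so the sketch assumes the standard margin-based form with Euclidean distance; the margin value and function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not from the source).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In this setting the anchor and positive would be embeddings of chromatograms from fragments of the same precursor, and the negative an embedding from a different precursor, so that minimizing the loss makes co-eluting fragments cluster together before k-means is applied.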
ABSTRACT
OpenSWATH is an analysis toolkit commonly used for data-independent acquisition (DIA). Although the output of OpenSWATH is controlled at a 1% false discovery rate (FDR), the output report still contains many peptide precursors with low-similarity fragments. At the last step of OpenSWATH for peptide quantification, researchers usually need to manually check the similarity of the extracted ion chromatograms (XICs) of fragments to distinguish high-confidence from low-confidence peptide precursors. Here we developed an algorithm with a graphical user interface, named MSSort-DIAXMBD, which combines a deep convolutional neural network (CNN) with a double-threshold segmentation process to automatically recognize high-confidence and low-confidence precursors. To train the MSSort-DIAXMBD model, we built a database containing about 50,000 manually classified peptide precursors acquired from different instrument platforms and different species. With the double-threshold segmentation strategy, MSSort-DIAXMBD can reduce the number of low-confidence peptides requiring manual inspection to less than 10% and can be used as the last step of OpenSWATH to visualize and classify the MS/MS data of peptide precursors. SIGNIFICANCE: Although the output of OpenSWATH is controlled at a 1% false discovery rate (FDR), the output report still contains many peptide precursors with low-similarity fragments. At the last step of OpenSWATH for peptide quantification, researchers usually need to manually check the similarity of fragment XICs to distinguish high-confidence from low-confidence peptide precursors. However, manual inspection is inefficient. For instance, it takes about 50 h to sort even a small dataset of 1000 MS/MS spectra manually. In this paper we developed a software tool named MSSort-DIAXMBD to automatically recognize the high-confidence precursors. We manually classified 50,000 peptide precursors as a training set to train a convolutional neural network. After training, MSSort-DIAXMBD takes only a few minutes to automatically classify 20,000 peptide precursors, leaving a small portion of fuzzy ones for manual inspection. On the benchmark dataset, MSSort-DIAXMBD can significantly improve the efficiency and accuracy of recognizing high-confidence peptide precursors.
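The double-threshold idea described above can be sketched as a simple three-way triage of classifier outputs. The thresholds 0.2 and 0.8 and the function name `triage` are hypothetical; the abstract does not report the values MSSort-DIAXMBD uses.

```python
def triage(probabilities, low=0.2, high=0.8):
    """Split precursors by predicted probability of being high confidence.

    Scores at or above `high` are accepted, scores at or below `low` are
    rejected, and everything in between is left for manual inspection.
    Returns a dict of lists holding the indices of the input precursors.
    """
    result = {"high_confidence": [], "low_confidence": [], "manual_review": []}
    for index, p in enumerate(probabilities):
        if p >= high:
            result["high_confidence"].append(index)
        elif p <= low:
            result["low_confidence"].append(index)
        else:
            result["manual_review"].append(index)
    return result
```

Moving the two thresholds closer together shrinks the manual-review group at the cost of more automatic misclassifications, which is the trade-off the "fuzzy" portion mentioned above reflects.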
Subjects
Deep Learning, Proteomics, Peptides/analysis, Software, Tandem Mass Spectrometry
ABSTRACT
Background: Radiation proctitis is a common complication after radiotherapy for cervical cancer. Unlike simple radiation damage to other organs, radiation proctitis is a complex disease closely related to the microbiota. However, analysis of the gut microbiota is time-consuming and expensive. This study aims to mine rectal information using radiomics and incorporate it into a nomogram model for cheap and fast prediction of severe radiation proctitis in postoperative cervical cancer patients. Methods: The severity of each patient's radiation proctitis was graded according to the RTOG/EORTC criteria. Radiation proctitis of toxicity grade 2 or higher was set as the model's target. A total of 178 patients with cervical cancer were divided into a training set (n = 124) and a validation set (n = 54). Multivariate logistic regression was used to build the radiomic and non-radiomic models. Results: The radiomic model [AUC=0.6855(0.5174-0.8535)] showed better performance and more net benefit in the validation set than the non-radiomic model [AUC=0.6641(0.4904-0.8378)]. In particular, we applied the SHapley Additive exPlanation (SHAP) method for the first time to a radiomics-based logistic regression model to further interpret the radiomic features from case-based and feature-based perspectives. The integrated radiomic model enables the first accurate quantitative assessment of the probability of radiation proctitis in postoperative cervical cancer patients, addressing the limitations of the current qualitative assessment of the plan through dose-volume parameters only. Conclusion: We successfully developed and validated an integrated radiomic model containing rectal information. SHAP analysis of the model suggests that radiomic features have a supporting role in the quantitative assessment of the probability of radiation proctitis in postoperative cervical cancer patients.
ABSTRACT
Reducing the polarization effect is important in applications of dielectric multilayer reflectors in many optical systems, such as low-loss broadband waveguides, optical fibers, and LEDs. Low-polarizing broadband reflections have been identified from birefringent-guanine-crystal-based multilayer reflectors in the skins of some fish. Previous models for these intriguing natural optical phenomena suggested the combined action of two populations of guanine crystals with an orthogonal low-refractive-index optic axis. Here we report a novel realization of polarization-insensitive broadband reflectivity in a spider, Phoroncidia rubroargentea, based solely on the type of guanine crystals with the low-refractive-index optic axis normal to the crystal plates. We examined the three-dimensional structure of the guanine assembly in the spider and performed finite-difference time-domain (FDTD) optical modeling of the guanine-based multilayer reflector. Comparative modeling studies reveal that the biological selection of the guanine crystal type and the specific spatial arrangement work synergistically to optimize the polarization-insensitive broadband reflection. This study demonstrates the importance of both the crystallographic characteristics and the 3D arrangement of guanine crystals in understanding relevant natural optical effects, and also provides new insights into similar broadband, low-polarizing reflections in biological optical systems. Learning from the relevant biofunctional assembly of guanine crystals could promote the bioinspired design of nonpolarizing dielectric multilayer reflectors.
ABSTRACT
We developed DreamDIAXMBD (denoted as DreamDIA), a software suite based on a deep representation model for data-independent acquisition (DIA) data analysis. DreamDIA adopts a data-driven strategy to capture comprehensive information from the elution patterns of peptides in DIA data and achieves considerable improvements in both identification and quantification performance compared with other state-of-the-art methods such as OpenSWATH, Skyline and DIA-NN. Specifically, in contrast to existing methods, which use only 6 to 10 selected fragment ions from spectral libraries, DreamDIA uses a deep representation network to extract additional features from hundreds of theoretical elution profiles originating from different ions of each precursor. To achieve higher coverage of target peptides without sacrificing specificity, the extracted features are further processed by nonlinear discriminative models under the framework of positive-unlabeled learning, with decoy peptides as affirmative negative controls. DreamDIA is publicly available at https://github.com/xmuyulab/DreamDIA-XMBD for high-coverage, accurate DIA data analysis.
Subjects
Peptides/analysis, Proteomics/methods, Software
ABSTRACT
Multiple types of sleep arousal account for a large proportion of the causes of sleep disorders. The detection of sleep arousals is very important for diagnosing sleep disorders and reducing the risk of further complications, including heart disease and cognitive impairment. Sleep arousal scoring is completed manually by sleep experts, who check polysomnography (PSG) recordings from several periods of sleep, which is a time-consuming and tedious task. Therefore, the development of an efficient, fast, and reliable automatic system for detecting sleep arousals from PSG may provide powerful help for clinicians. This paper reviews the automatic arousal detection methods of recent years, which are based on statistical rules and deep learning. Statistical detection methods typically involve three important processes: preprocessing, feature extraction, and classifier selection. For deep learning methods, different models have been discussed, including the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory network (LSTM), residual neural network (ResNet), and combinations of these networks. The prediction results of these neural network models are close to the judgments of human experts, and these methods have shown robust generalization capabilities on different data sets. Therefore, we conclude that deep neural networks will be the main research method for automatic arousal detection in the future.
ABSTRACT
There remains a significant gap in our quantitative understanding of crosstalk between the apoptosis and necroptosis pathways. By employing the SWATH-MS technique, we quantified the absolute amounts of up to thousands of proteins during the dynamic assembly and disassembly of TNF signaling complexes. Combining SWATH-MS-based network modeling and experimental validation, we found that when the RIP1 level is below ~1000 molecules/cell (mpc), the cell solely undergoes TRADD-dependent apoptosis. When RIP1 is above ~1000 mpc, pro-caspase-8 and RIP3 are recruited to the necrosome with linear and nonlinear dependence on the RIP1 amount, respectively, which explains both the co-occurrence of apoptosis and necroptosis and the paradoxical observation that RIP1 is required for necroptosis but its increase down-regulates necroptosis. A higher amount of RIP1 (>~46,000 mpc) suppresses apoptosis, leading to necroptosis alone. The relation between the RIP1 level and the occurrence of necroptosis or total cell death is biphasic. Our study provides a resource for encoding the complexity of TNF signaling and a quantitative picture of how distinct dynamic interplays among proteins function as basis sets in signaling complexes, enabling RIP1 to play diverse roles in governing cell fate decisions.
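The RIP1 regimes reported above can be summarized as a small lookup, shown here only as a reading aid. The two cutoffs are the approximate values from the abstract; treating them as sharp boundaries is a simplification, and the function and constant names are hypothetical.

```python
# Approximate thresholds from the abstract, in molecules per cell (mpc).
APOPTOSIS_ONLY_BELOW = 1_000
NECROPTOSIS_ONLY_ABOVE = 46_000


def cell_death_mode(rip1_mpc):
    """Return the cell death regime suggested for a given RIP1 level.

    Simplification: the source describes approximate transitions (~1000 and
    ~46,000 mpc), not hard cutoffs, and a biphasic quantitative relation
    that this three-way classification does not capture.
    """
    if rip1_mpc < APOPTOSIS_ONLY_BELOW:
        return "apoptosis"
    if rip1_mpc > NECROPTOSIS_ONLY_ABOVE:
        return "necroptosis"
    return "apoptosis and necroptosis"
```

For example, a level of 20,000 mpc falls in the intermediate regime, where the abstract reports that apoptosis and necroptosis co-occur.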