Results 1 - 12 of 12
1.
J Proteome Res ; 23(6): 1983-1999, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38728051

ABSTRACT

In recent years, several deep learning-based methods have been proposed for predicting peptide fragment intensities. This study aims to provide a comprehensive assessment of six such methods, namely Prosit, DeepMass:Prism, pDeep3, AlphaPeptDeep, Prosit Transformer, and the method proposed by Guan et al. To this end, we evaluated the accuracy of the predicted intensity profiles for close to 1.7 million precursors (including both tryptic and HLA peptides) corresponding to more than 18 million experimental spectra procured from 40 independent submissions to the PRIDE repository that were acquired for different species using a variety of instruments and different dissociation types/energies. Specifically, for each method, distributions of similarity (measured by Pearson's correlation and normalized angle) between the predicted and the corresponding experimental b and y fragment intensities were generated. These distributions were used to ascertain the prediction accuracy and rank the prediction methods for particular types of experimental conditions. The effect of variables like precursor charge, length, and collision energy on the prediction accuracy was also investigated. In addition to prediction accuracy, the methods were evaluated in terms of prediction speed. The systematic assessment of these six methods may help in choosing the right method for MS/MS spectra prediction for particular needs.
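The two similarity measures named in this abstract can be sketched in a few lines. This is a minimal illustration, not the study's evaluation code: the abstract does not define "normalized angle", so the form used here (one minus the angle between L2-normalized vectors, scaled by 2/π) is an assumption based on a commonly used definition, and the function names are hypothetical.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length intensity vectors."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def normalized_spectral_angle(predicted, observed):
    """Assumed definition: 1 - 2*arccos(cosine similarity)/pi.

    Returns 1.0 for identical directions and 0.0 for orthogonal vectors.
    """
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    cosine = sum(p * o for p, o in zip(predicted, observed)) / (norm_p * norm_o)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding drift
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

Applied to every predicted/experimental pair of b and y intensity vectors, either function yields the kind of per-spectrum score whose distribution the abstract describes.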


Subjects
Deep Learning , Humans , Peptide Fragments/chemistry , Peptide Fragments/analysis , Tandem Mass Spectrometry/methods , Tandem Mass Spectrometry/statistics & numerical data , Proteomics/methods , Proteomics/statistics & numerical data
2.
J Proteome Res ; 23(2): 550-559, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38153036

ABSTRACT

In bottom-up proteomics, peptide-spectrum matching is critical for peptide and protein identification. Recently, deep learning models have been used to predict tandem mass spectra of peptides, enabling the calculation of similarity scores between the predicted and experimental spectra for peptide-spectrum matching. These models follow the supervised learning paradigm, which trains a general model using paired peptides and spectra from standard data sets and directly employs the model on experimental data. However, this approach can lead to inaccurate predictions due to differences between the training data and the experimental data, such as sample types, enzyme specificity, and instrument calibration. To tackle this problem, we developed a test-time training paradigm that adapts the pretrained model to generate experimental data-specific models, namely, PepT3. PepT3 yields a 10-40% increase in peptide identification depending on the variability in training and experimental data. Intriguingly, when applied to a patient-derived immunopeptidomic sample, PepT3 increases the identification of tumor-specific immunopeptide candidates by 60%. Two-thirds of the newly identified candidates are predicted to bind to the patient's human leukocyte antigen isoforms. To facilitate access of the model and all the results, we have archived all the intermediate files in Zenodo.org with identifier 8231084.


Subjects
Peptides , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Proteins , Theoretical Models , Proteomics/methods , Algorithms
3.
Sensors (Basel) ; 24(5), 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38475035

ABSTRACT

Spectrum prediction is a promising technique to release spectrum resources and plays an essential role in cognitive radio networks and spectrum situation generation. Traditional algorithms normally focus on a single dimension or predict spectrum values slot by slot, and thus cannot fully perceive the spectrum states in complex environments and lack timeliness. In this paper, a deep learning-based prediction method with a simple structure is developed for simultaneous temporal-spectral and multi-slot spectrum prediction. Specifically, we first analyze and construct spectrum data suitable for the model to simultaneously achieve long-term and multi-dimensional spectrum prediction. Then, a hierarchical spectrum prediction system is developed that takes advantage of the advanced Bi-ConvLSTM and the seq2seq framework. The Bi-ConvLSTM captures time-frequency characteristics of spectrum data, and the seq2seq framework is used for long-term spectrum prediction. Furthermore, an attention mechanism is used to address a limitation of the seq2seq framework, which compresses all inputs into fixed-length vectors and thereby loses information. Finally, the experimental results show that the proposed model has a significant advantage over the benchmark schemes. In particular, the proposed spectrum prediction model achieves 6.15%, 0.7749, 1.0978, and 0.9628 in MAPE, MAE, RMSE, and R², respectively, which is better than all the baseline deep learning models.
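For readers unfamiliar with the four metrics reported in this abstract, the sketch below computes them from their textbook definitions. It is illustrative only: the function name is hypothetical, and the paper may apply normalization or averaging choices that the abstract does not state.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAPE (in percent), MAE, RMSE and R^2 for two equal-length lists.

    MAPE divides by the true values, so y_true must not contain zeros.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "R2": r2}
```

Lower is better for MAPE, MAE and RMSE, while R² approaches 1 as the predictions explain more of the variance in the measured values.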

4.
Sensors (Basel) ; 23(21), 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37960582

ABSTRACT

In general, judging the use/idle state of the wireless spectrum is the foundation for cognitive radio users (secondary users, SUs) to access limited spectrum resources efficiently. Rich information can be mined from the inherent correlation of electromagnetic spectrum data from SUs in time, frequency, space, and other dimensions. Therefore, how to efficiently use the spectrum status received by each SU for multidimensional combined forecasting is the core of this paper. We propose a deep-learning hybrid model called TensorGCN-LSTM based on the tensor data structure. The model treats SUs deployed at different spatial locations under the same frequency, and the spectrum status of SUs themselves under different frequencies in the task area, as nodes and constructs two types of graph structures. Graph convolutional operations are used to sequentially extract corresponding spatial-domain and frequency-domain features from the two types of graph structures. Then, the long short-term memory (LSTM) model is used to fuse the spatial, frequency, and temporal features of the cognitive radio environment data. Finally, the prediction task of the spectrum distribution situation is accomplished through fully connected layers. Specifically, the model constructs a tensor graph based on the spatial similarity of SUs' locations and the frequency correlation between different frequency signals received by SUs, which describes the electromagnetic wave's dependency relationship in spatial and frequency domains. LSTM is used to capture the electromagnetic wave's dependency relationship in the temporal domain. To evaluate the effectiveness of the model, we conducted ablation experiments on LSTM, GCN, GC-LSTM, and TensorGCN-LSTM models using simulated data. The experimental results showed that our model achieves better prediction performance in RMSE, and the correlation coefficient R² of 0.8753 also confirms the feasibility of the model.

5.
J Proteome Res ; 20(1): 634-644, 2021 Jan 01.
Article in English | MEDLINE | ID: mdl-32985198

ABSTRACT

Liquid chromatography tandem mass spectrometry (LC-MS/MS) has been the most widely used technology for phosphoproteomics studies. As an alternative to database searching and probability-based phosphorylation site localization approaches, spectral library searching has proved to be effective in the identification of phosphopeptides. However, the incompleteness of experimental spectral libraries limits the identification capability. Herein, we utilize MS/MS spectrum prediction coupled with spectral matching for site localization of phosphopeptides. In silico MS/MS spectra are generated from peptide sequences by deep learning/machine learning models trained with nonphosphopeptides. Then, mass shift according to phosphorylation sites, phosphoric acid neutral loss, and a "budding" strategy are adopted to adjust the in silico mass spectra. In silico MS/MS spectra can also be generated in one step for phosphopeptides using models trained with phosphopeptides. The method is benchmarked on data sets of synthetic phosphopeptides and is used to process real biological samples. It is demonstrated to be a method requiring only computational resources that supplements the probability-based approaches for phosphorylation site localization of singly and multiply phosphorylated peptides.
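The mass-shift and neutral-loss adjustments mentioned in this abstract can be illustrated on fragment m/z values alone. The sketch below is an assumption-laden illustration, not the authors' method: it covers only singly charged b ions, ignores intensities, does not attempt the "budding" strategy (which the abstract does not describe), and uses hypothetical function names. The residue masses are standard monoisotopic values for a small subset of amino acids.

```python
# Standard monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111, "Y": 163.06333,
}
PROTON = 1.007276
PHOSPHO = 79.96633   # HPO3 added to a phosphorylated S/T/Y residue
H3PO4 = 97.97690     # phosphoric acid neutral loss

def b_ions(sequence, phospho_site=None):
    """Singly charged b-ion m/z values; phospho_site is a 0-based residue index."""
    ions, running = [], 0.0
    for i, aa in enumerate(sequence[:-1]):  # b ions stop before the last residue
        running += RESIDUE_MASS[aa]
        if i == phospho_site:
            running += PHOSPHO              # shift every fragment containing the site
        ions.append(running + PROTON)
    return ions

def neutral_loss_peaks(ions, phospho_site):
    """H3PO4-loss m/z for each b ion that contains the phosphorylated residue."""
    return [mz - H3PO4 for i, mz in enumerate(ions) if i >= phospho_site]
```

Moving `phospho_site` along the sequence changes which fragments carry the +79.96633 Da shift, which is what makes candidate site assignments distinguishable when compared with an experimental spectrum.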


Subjects
Phosphopeptides , Tandem Mass Spectrometry , Liquid Chromatography , Computer Simulation , Phosphopeptides/metabolism , Phosphorylation
6.
J Proteome Res ; 20(5): 2570-2582, 2021 May 07.
Article in English | MEDLINE | ID: mdl-33821641

ABSTRACT

In cross-linking mass spectrometry, the identification of cross-linked peptide pairs heavily relies on the ability of a database search engine to measure the similarities between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, in particular, on proteome scales. Here we introduce pDeepXL, a deep neural network to predict MS/MS spectra of cross-linked peptide pairs. To train pDeepXL, we used the transfer-learning technique because it facilitated the training with limited benchmark data of cross-linked peptide pairs. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson's r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson's r values higher than 0.9). pDeepXL also achieved accurate prediction on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.


Subjects
Deep Learning , Tandem Mass Spectrometry , Algorithms , Neural Networks (Computer) , Peptides , Proteome
7.
Nanotechnology ; 32(33), 2021 May 24.
Article in English | MEDLINE | ID: mdl-33971632

ABSTRACT

The development of nanophotonic devices has presented a revolutionary means to manipulate light at the nanoscale. How to efficiently design these devices is an active area of research. Recently, artificial neural networks (ANNs) have shown strong capability in the inverse design of nanophotonic devices. However, there is limited research on the inverse design for modeling and learning the sequence characteristics of a spectrum. In this work, we propose a deep learning method based on an improved recurrent neural network to extract the sequence characteristics of a spectrum and achieve inverse design and spectrum prediction. A key feature of the network is that the memory or feedback loops it comprises allow it to effectively recognize time series data. In the context of nanorod hyperbolic metamaterials, we demonstrated the high consistency between the target spectrum and the predicted spectrum, and the network learned the deep physical relationship concerning the structural parameter changes reflected on the spectrum. The effectiveness of our approach is also tested by user-drawn spectra. Moreover, the proposed model is capable of predicting an unknown spectrum based on a known spectrum with only 0.32% mean relative error. The prediction model may be helpful to predict data beyond the detection limit. We propose this versatile method as an effective and accurate alternative to the application of ANNs in nanophotonics, paving the way for fast and accurate design of desired devices.

8.
Molecules ; 26(11), 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-34200052

ABSTRACT

A systematic investigation of the experimental 13C-NMR spectra published in Molecules during the period of 1996 to 2015 with respect to their quality using CSEARCH-technology is described. It is shown that the systematic application of the CSEARCH-Robot-Referee during the peer-reviewing process prohibits at least the most trivial assignment errors and wrong structure proposals. In many cases, the correction of the assignments/chemical shift values is possible by manual inspection of the published tables; in certain cases, reprocessing of the original experimental data might help to clarify the situation, showing the urgent need for a public domain repository. A comparison of the significant key numbers derived for Molecules against those of other important journals in the field of natural product chemistry shows a quite similar level of quality for all publishers responsible for the six journals under investigation. From the results of this study, general rules for data handling, data storage, and manuscript preparation can be derived, helping to increase the quality of published NMR-data and making these data available as validated reference material.

9.
J Proteome Res ; 19(8): 3230-3237, 2020 Aug 07.
Article in English | MEDLINE | ID: mdl-32539411

ABSTRACT

Data dependent acquisition (DDA) and data independent acquisition (DIA) are traditionally separate experimental paradigms in bottom-up proteomics. In this work, we developed a strategy combining the two experimental methods into a single LC-MS/MS run. We call the novel strategy data dependent-independent acquisition proteomics, or DDIA for short. Peptides identified from DDA scans by a conventional and robust DDA identification workflow provide useful information for interrogation of DIA scans. Deep learning based LC-MS/MS property prediction tools, developed previously, can be used repeatedly to produce spectral libraries facilitating DIA scan extraction. A complete DDIA data processing pipeline, including the modules for iRT vs RT calibration curve generation, DIA extraction classifier training, and false discovery rate control, has been developed. Compared to another spectral library-free method, DIA-Umpire, the DDIA method produced a similar number of peptide identifications, but nearly twice as many protein group identifications. The primary advantage of the DDIA method is that it requires minimal information for processing its data.
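The iRT vs RT calibration step named in this abstract maps library retention-time indices onto the retention times observed in a given run. The abstract does not say which curve model the DDIA pipeline uses, so the sketch below assumes the simplest possible case, an ordinary least-squares straight line, with hypothetical function names.

```python
def fit_linear_calibration(irt_values, rt_values):
    """Least-squares line rt = slope * irt + intercept (assumed model)."""
    n = len(irt_values)
    mean_x = sum(irt_values) / n
    mean_y = sum(rt_values) / n
    sxx = sum((x - mean_x) ** 2 for x in irt_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(irt_values, rt_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_rt(irt_value, slope, intercept):
    """Convert a predicted iRT into an expected retention time for this run."""
    return slope * irt_value + intercept
```

In such a scheme, peptides confidently identified from the DDA scans would supply the (iRT, RT) anchor pairs, and the fitted curve would then narrow the retention-time window searched in the DIA scans.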


Subjects
Proteomics , Tandem Mass Spectrometry , Liquid Chromatography , Peptides , Proteins
10.
ACS Appl Mater Interfaces ; 16(37): 49673-49686, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39231373

ABSTRACT

In this paper, a multineural network fusion freestyle metasurface on-demand design method is proposed. The on-demand design method involves rapidly generating corresponding metasurface patterns based on the user-defined spectrum. The generated patterns are then input into a simulator to predict their corresponding S-parameter spectrogram, which is subsequently analyzed against the real S-parameter spectrogram to verify whether the generated metasurface patterns meet the desired requirements. The methodology is based on three neural network models: a Wasserstein Generative Adversarial Network model with a U-net architecture (U-WGAN) for inverse structural design, a Variational Autoencoder (VAE) model for compression, and an LSTM + Attention model for forward S-parameter spectrum prediction validation. The U-WGAN is utilized for on-demand reverse structural design, aiming to rapidly discover high-fidelity metasurface patterns that meet specific electromagnetic spectrum responses. The VAE, as a probabilistic generation model, serves as a bridge, mapping input data to latent space and transforming it into latent variable data, providing crucial input for a forward S-parameter spectrum prediction model. The LSTM + Attention network, acting as a forward S-parameter spectrum prediction model, can accurately and efficiently predict the S-parameter spectrum corresponding to the latent variable data and compare it with the real spectrum. In addition, the digits "0" and "1" are used in the design to represent vacuum and metallic materials, respectively, and a 10 × 10 cell array of freestyle metasurface patterns is constructed. The significance of the research method proposed in this paper lies in the following: (1) The freestyle metasurface design significantly expands the possibility of metamaterial design, enabling the creation of diverse metasurface structures that are difficult to achieve with traditional methods. (2) The on-demand design approach can generate high-fidelity metasurface patterns that meet the expected electromagnetic characteristics and responses. (3) The fusion of multiple neural networks demonstrates high flexibility, allowing for the adjustment of network structures and training methods based on specific design requirements and data characteristics, thus better accommodating different design problems and optimization objectives.
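The binary pattern encoding described in this abstract (0 for vacuum, 1 for metal, on a 10 × 10 cell array) is easy to picture with a small data-structure sketch. Everything here is illustrative: the random generator merely stands in for the U-WGAN, and the function names and the `metal_fraction` parameter are hypothetical.

```python
import random

VACUUM, METAL = 0, 1  # encoding stated in the abstract

def random_pattern(size=10, metal_fraction=0.5, seed=None):
    """Build a size x size grid of 0/1 cells; a stand-in for a generated pattern."""
    rng = random.Random(seed)
    return [
        [METAL if rng.random() < metal_fraction else VACUUM for _ in range(size)]
        for _ in range(size)
    ]

def metal_fill(pattern):
    """Fraction of cells that are metallic."""
    cells = [cell for row in pattern for cell in row]
    return sum(cells) / len(cells)

def render(pattern):
    """Text view of the pattern: '#' for metal, '.' for vacuum."""
    return "\n".join(
        "".join("#" if cell == METAL else "." for cell in row) for row in pattern
    )
```

A grid like this has 2^100 possible configurations, which gives a sense of why the abstract describes the freestyle design space as much larger than what parameterized shapes allow.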

11.
Mass Spectrom (Tokyo) ; 12(1): A0120, 2023.
Article in English | MEDLINE | ID: mdl-37250593

ABSTRACT

Electron ionization (EI) mass spectrum library searching is usually performed to identify a compound in gas chromatography/mass spectrometry. However, compounds whose EI mass spectra are registered in the library are still limited compared to the popular compound databases. This means that some compounds cannot be identified by conventional library searching, and searches may also return false positives. Here, we report the development of a machine learning model, trained using chemical formulae and EI mass spectra, that can predict the EI mass spectrum from the chemical structure. It allowed us to create a predicted EI mass spectrum database with predicted EI mass spectra for 100 million compounds in PubChem. We also propose a method for improving library searching time and accuracy that includes an extensive mass spectrum library.

12.
Prog Chem Org Nat Prod ; 105: 137-215, 2017.
Article in English | MEDLINE | ID: mdl-28194563

ABSTRACT

Nuclear Magnetic Resonance spectroscopy contributes very efficiently to the structure elucidation process in organic chemistry. Carbon-13 NMR spectroscopy allows direct insight into the skeleton of organic compounds and therefore plays a central role in the structural assignment of natural products. Despite this important contribution, there is no established and well-accepted workflow protocol utilized during the first steps of interpreting spectroscopic data and converting them into structural fragments and then combining them, by considering the given spectroscopic constraints, into a final proposal of structure. The so-called "combinatorial explosion" in the process of structure generation allows in many cases the generation of reasonable alternatives, which are usually ignored during manual interpretation of the measured data leading ultimately to a large number of structural revisions. Furthermore, even when the determined structure is correct, problems may exist such as assignment errors, ignoring chemical shift values, or assigning lines of impurities to the compound under consideration. An extremely large heterogeneity in the presentation of carbon NMR data can be observed, but, as a result of the efficiency and precision of spectrum prediction, the published data can be analyzed in substantial detail. This contribution presents a comprehensive analysis of frequently occurring errors with respect to 13C NMR spectroscopic data and proposes a straightforward protocol to eliminate a high percentage of the most obvious errors. The procedure discussed can be integrated readily into the processes of submission and peer-reviewing of manuscripts.


Subjects
Biological Products/chemistry , Carbon-13 Magnetic Resonance Spectroscopy