Machine learning to predict retention time of small molecules in nano-HPLC.
Anal Bioanal Chem; 412(28): 7767-7776, 2020 Nov.
Article
MEDLINE | ID: mdl-32860519
Retention time is an important parameter for compound identification in untargeted LC-MS screening. Precise retention time prediction facilitates annotation and is well established in proteomics. For small molecules, however, prediction accuracy was long limited by the lack of available experimental data. Recently introduced large databases of small-molecule retention times make reliable machine learning-based predictions possible across the whole diversity of compounds, and simple projections can extend these predictions to various LC systems and conditions. In this work, we describe a combined approach to predicting retention times for nano-HPLC: binary and regression gradient boosting models trained on the METLIN small-molecule dataset are applied in sequence, and the results are then projected onto nano-HPLC separations using a small number of easily available compounds. The proposed model outperforms previous machine learning attempts, with a mean absolute error of 46 s. After transfer to nano-LC conditions, the overall error is less than 155 s (10.8%) in terms of median absolute (relative) error. To illustrate the applicability of the approach, we eliminated on average 25 to 42% of false positives using a filter threshold derived from ROC curves. The proposed approach should therefore be used alongside other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
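The projection and filtering steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the functional form of the "simple projection", so a least-squares straight line fitted on a few calibration compounds is assumed here. All function names, the calibration values, and the 120 s threshold are hypothetical; in the article the threshold is derived from ROC curves, which this sketch does not reproduce.

```python
from statistics import mean, median


def fit_linear_projection(predicted, observed):
    """Least-squares fit of observed = slope * predicted + intercept.

    Assumption: the mapping between model-predicted retention times and
    retention times on the target LC system is approximately linear.
    """
    mx, my = mean(predicted), mean(observed)
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def project(predicted, slope, intercept):
    """Map predicted retention times onto the target LC system."""
    return [slope * x + intercept for x in predicted]


def median_errors(projected, observed):
    """Return (median absolute error, median relative error)."""
    abs_err = [abs(p - o) for p, o in zip(projected, observed)]
    rel_err = [a / o for a, o in zip(abs_err, observed)]
    return median(abs_err), median(rel_err)


def filter_candidates(candidates, observed_rt, threshold):
    """Keep candidate names whose projected RT is within `threshold` seconds."""
    return [name for name, rt in candidates if abs(rt - observed_rt) <= threshold]


# Hypothetical calibration compounds: model predictions vs. target-system RTs (s).
calib_predicted = [100.0, 200.0, 300.0, 400.0]
calib_observed = [250.0, 450.0, 650.0, 850.0]
slope, intercept = fit_linear_projection(calib_predicted, calib_observed)

# Hypothetical candidate annotations for one feature observed at 360 s.
candidates = [("candidate_A", 150.0), ("candidate_B", 425.0)]
projected = project([rt for _, rt in candidates], slope, intercept)
kept = filter_candidates(
    list(zip([name for name, _ in candidates], projected)),
    observed_rt=360.0,
    threshold=120.0,  # hypothetical value, not taken from the article
)
print(slope, intercept, kept)
```

Candidates whose projected retention time falls outside the window are discarded as likely false positives; the width of that window controls the trade-off between rejected false positives and lost true annotations.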
Full text:
1
Collection:
01-internacional
Database:
MEDLINE
Study type:
Prognostic_studies
/
Risk_factors_studies
Language:
En
Journal:
Anal Bioanal Chem
Year:
2020
Document type:
Article
Country of affiliation:
Russia
Country of publication:
Germany