Machine learning to predict retention time of small molecules in nano-HPLC.
Osipenko, Sergey; Bashkirova, Inga; Sosnin, Sergey; Kovaleva, Oxana; Fedorov, Maxim; Nikolaev, Eugene; Kostyukevich, Yury.
Affiliation
  • Osipenko S; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205, Moscow, Russia.
  • Bashkirova I; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205, Moscow, Russia.
  • Sosnin S; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205, Moscow, Russia.
  • Kovaleva O; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205, Moscow, Russia.
  • Fedorov M; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205, Moscow, Russia.
  • Nikolaev E; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205, Moscow, Russia. e.nikolaev@skoltech.ru.
  • Kostyukevich Y; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205, Moscow, Russia. y.kostyukevich@skoltech.ru.
Anal Bioanal Chem; 412(28): 7767-7776, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32860519
Retention time is an important parameter for compound identification in untargeted LC-MS screening. Precise retention time prediction facilitates the annotation process and is well established in proteomics. For small molecules, however, a long-standing lack of experimental data has limited prediction accuracy. Recently introduced large databases of small-molecule retention times make reliable machine learning-based predictions possible across a wide diversity of compounds, and simple projections may extend these predictions to various LC systems and conditions. In our work, we describe a multi-step approach to predicting retention times for nano-HPLC: binary and regression gradient boosting models trained on the METLIN small-molecule dataset are applied sequentially, and the results are then projected onto nano-HPLC separations using a small number of easily available compounds. The proposed model outperforms previous attempts to use machine learning for such predictions, with a mean absolute error of 46 s. After transfer to nano-LC conditions, the median absolute error is below 155 s (median relative error 10.8%). To illustrate the applicability of the approach, we eliminated on average 25 to 42% of false positives using a filter threshold derived from ROC curves. The proposed approach should therefore be used in addition to other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
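The projection and filtering steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the form of the projection or the threshold value, so the code assumes an ordinary least-squares line fitted on calibration compounds; all function names, retention times, and the 120 s threshold are hypothetical and are not taken from the paper. The gradient boosting models themselves are not reproduced here; their output is represented by plain lists of predicted retention times in seconds.

```python
from statistics import median


def fit_linear_projection(predicted, measured):
    """Least-squares fit: measured ~ slope * predicted + intercept.

    A linear form is an assumption; the abstract only says "simple projection".
    """
    n = len(predicted)
    if n < 2 or n != len(measured):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(predicted) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("calibration predictions must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, measured))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(predicted, slope, intercept):
    """Map model predictions onto the target LC system's time scale."""
    return [slope * x + intercept for x in predicted]


def median_abs_and_rel_error(projected, measured):
    """Return (median absolute error, median relative error)."""
    abs_err = [abs(p - m) for p, m in zip(projected, measured)]
    rel_err = [e / m for e, m in zip(abs_err, measured)]
    return median(abs_err), median(rel_err)


def filter_candidates(candidates, observed_rt, threshold):
    """Keep candidate names whose projected RT lies within the threshold."""
    return [name for name, rt in candidates if abs(rt - observed_rt) <= threshold]


# Hypothetical calibration compounds: model-predicted vs. measured nano-LC RT (s).
calib_predicted = [100.0, 200.0, 300.0, 400.0]
calib_measured = [250.0, 450.0, 650.0, 850.0]
slope, intercept = fit_linear_projection(calib_predicted, calib_measured)

# Hypothetical annotation candidates for one feature observed at 700 s.
names = ["candidate_A", "candidate_B"]
projected = project([300.0, 520.0], slope, intercept)
kept = filter_candidates(list(zip(names, projected)), observed_rt=700.0, threshold=120.0)
print(slope, intercept, kept)
```

In practice the threshold would be chosen from a ROC curve on compounds with known identity, as the abstract describes, rather than fixed by hand as it is here.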

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Journal: Anal Bioanal Chem Year: 2020 Document type: Article Country of affiliation: Russia Country of publication: Germany
