ABSTRACT
Raman spectroscopy has been widely used for label-free biomolecular analysis of cells and tissues for pathological diagnosis in vitro and in vivo. AI techniques, including machine learning (PCA and SVM), manifold learning (UMAP), and deep learning (ResNet and AlexNet), facilitate disease diagnosis based on Raman spectroscopy. However, it is not clear how to select and optimize an appropriate AI classification model for different types of Raman spectral data. Here, we selected five representative Raman spectral data sets (endometrial carcinoma, hepatoma extracellular vesicles, bacteria, melanoma cells, and diabetic skin) that differ in sample size, spectral data size, Raman shift range, tissue site, Kullback-Leibler (KL) divergence, and significant Raman shifts (i.e., wavenumbers with significant differences between groups), to explore the performance of different AI models (e.g., PCA-SVM, SVM, UMAP-SVM, ResNet, or AlexNet). For data sets with a large spectral data size, ResNet performed better than PCA-SVM and UMAP. By building a data-characteristic-assisted AI classification model, we optimized the parameters of the AI model (e.g., principal components, activation function, and loss function) based on data characteristics such as data size and KL divergence. The accuracy improved from 85.1% to 94.6% for endometrial carcinoma grading, from 77.1% to 90.7% for hepatoma extracellular vesicle detection, from 89.3% to 99.7% for melanoma cell detection, from 88.1% to 97.9% for bacterial identification, and from 53.7% to 85.5% for diabetic skin screening, with a mean time expense of 5 s.
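As an illustration of one of the data characteristics named above, the following is a minimal sketch of how a KL divergence between two groups of spectra could be computed. The abstract does not state how the KL divergence was obtained, so everything here is an assumption: each group is summarized by its mean spectrum, intensities are clipped to a small positive value and normalized to unit area so they can be treated as discrete distributions, and the helper names `kl_divergence` and `mean_spectrum` as well as the toy intensities are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two non-negative intensity vectors of equal length.

    Assumption: spectra are treated as discrete distributions over
    wavenumber bins after clipping and unit-area normalization.
    """
    if len(p) != len(q):
        raise ValueError("spectra must have the same number of points")
    # Clip at eps so zero intensities cannot cause log(0) or division by zero.
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    # Normalize each spectrum so its intensities sum to 1.
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mean_spectrum(spectra):
    """Point-wise mean of a list of equally sampled spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


# Toy intensities (hypothetical, not taken from the study's data sets).
group_a = [[1.0, 2.0, 4.0, 2.0], [1.2, 2.1, 3.9, 1.8]]
group_b = [[2.0, 2.0, 2.0, 2.0], [2.1, 1.9, 2.0, 2.0]]

kl_ab = kl_divergence(mean_spectrum(group_a), mean_spectrum(group_b))
print(f"KL(A || B) = {kl_ab:.4f}")
```

KL divergence is asymmetric, so `kl_divergence(a, b)` and `kl_divergence(b, a)` generally differ. Under these assumptions, a larger value indicates that the mean spectra of the two groups are more dissimilar.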