RESUMO
Blood oxygen saturation (SpO2) is an essential physiological parameter for evaluating a person's health. While conventional SpO2 measurement devices like pulse oximeters require skin contact, advanced computer vision technology can enable remote SpO2 monitoring through a regular camera without skin contact. In this paper, we propose novel deep learning models to measure SpO2 remotely from facial videos and evaluate them using a public benchmark database, VIPL-HR. We utilize a spatial-temporal representation to encode SpO2 information recorded by conventional RGB cameras and directly pass it into selected convolutional neural networks to predict SpO2. The best deep learning model achieves 1.274% in mean absolute error and 1.71% in root mean squared error, which exceed the international standard of 4% for an approved pulse oximeter. Our results significantly outperform the conventional analytical Ratio of Ratios model for contactless SpO2 measurement. Results of sensitivity analyses of the influence of spatial-temporal representation color spaces, subject scenarios, acquisition devices, and SpO2 ranges on the model performance are reported with explainability analyses to provide more insights for this emerging research field.