ABSTRACT
Single-photon-sensitive depth sensors are increasingly used in next-generation electronics for human pose and gesture recognition. However, cost-effective sensors typically have low spatial resolution, which restricts them to basic motion identification and simple object detection. Here, we perform a temporal-to-spatial mapping that drastically increases the resolution of a simple time-of-flight sensor, from an initial 4 × 4 pixels to depth images of 32 × 32 pixels. The output depth maps can then be used for accurate three-dimensional pose estimation of multiple people. We also develop a new explainable framework that provides intuition into how our network uses its input data and gives key information about the relevant parameters. Our work greatly expands the use cases of simple single-photon avalanche detector time-of-flight sensors. It also opens up promising possibilities for applying future super-resolution techniques to other types of sensors with similar data, e.g., radar and sonar.