ABSTRACT
The computer vision, graphics, and machine learning research communities have devoted significant attention to 3D object recognition (segmentation, detection, and classification). Deep learning approaches have recently become the preferred method for 3D segmentation problems, owing to their outstanding performance in 2D computer vision. As a result, many innovative approaches have been proposed and validated on multiple benchmark datasets. This study offers an in-depth assessment of the latest developments in deep-learning-based 3D object recognition. We discuss the most prominent 3D object recognition models, along with evaluations of their distinctive qualities.
ABSTRACT
Processing of digital holographically sensed 3D data, which is useful for AI-based vision, is demonstrated. Three prominent data representations were used for supervised learning: sensed holograms; computationally retrieved intensity and phase concatenated into intensity-phase (whole-information) images; and phase-only images (depth information). These were applied to the proposed multi-class classification and multi-output regression tasks on eighteen chosen 3D objects. Each dataset comprised 2268 images obtained from the eighteen objects. The efficacy of our approaches was validated on experimentally generated digital holographic data and then further quantified and compared using specific evaluation metrics. The machine learning classifiers achieved better AUC values across classes on the hologram and whole-information datasets than the CNN, whereas the CNN outperformed these classifiers on the phase-only image dataset. The MLP regressor produced stable predictions on the test and validation sets, with a fixed EV regression score of 0.00, compared to the CNN and the other regressors on the hologram and phase-only image datasets, whereas the RF regressor performed better on the validation set for the whole-information dataset, with a fixed EV regression score of 0.01, compared to the CNN and the other regressors.
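The evaluation metrics named above (per-class AUC for the classifiers, explained-variance score for the regressors) are standard scikit-learn measures. A minimal sketch on synthetic stand-in data (the real datasets contain 2268 hologram-derived images each; the arrays below are illustrative only):

```python
import numpy as np
from sklearn.metrics import explained_variance_score, roc_auc_score

# Synthetic stand-ins for a continuous regression target (e.g. a
# depth-related quantity) and a regressor's predictions.
rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.1, size=100)

# Explained-variance (EV) regression score: 1.0 is perfect;
# 0.0 means the model explains none of the target's variance.
ev = explained_variance_score(y_true, y_pred)

# AUC for a one-vs-rest binary view of a classification task:
# labels from the sign of the target, scores from the predictions.
labels = (y_true > 0).astype(int)
auc = roc_auc_score(labels, y_pred)
print(round(ev, 2), round(auc, 2))
```

A "fixed" EV score across test and validation sets, as reported for the MLP regressor, indicates prediction stability rather than accuracy: the score itself is near zero, but it does not fluctuate between splits.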
ABSTRACT
Although a large number of 3D datasets are publicly available, they generally suffer from drawbacks such as a small number of data samples and class imbalance. Data augmentation is a set of techniques that aim to increase the size of datasets and remedy such defects, thereby mitigating overfitting when training a classifier. In this paper, we propose a method to create new synthesized data by converting complete meshes into occluded 3D point clouds similar to those in real-world datasets. The proposed method involves two main steps. The first is hidden surface removal (HSR), in which the parts of object surfaces occluded from the viewpoint of a camera are deleted; a low-complexity implementation of HSR based on occupancy grids is proposed. The second is random sampling of the detected visible surfaces. The proposed two-step method is applied to a subset of the ModelNet40 dataset to create a new dataset, which is then used to train and test three different deep-learning classifiers (VoxNet, PointNet, and 3DmFV). We studied classifier performance as a function of the camera elevation angle. We also conducted another experiment showing how the newly generated data samples can improve classification performance when combined with the original data during training. Simulation results show that the proposed method enables the creation of a large number of new data samples with a small storage footprint. Results also show that classifier performance depends strongly on the camera elevation angle, and that some angles degrade performance significantly. Furthermore, data augmentation using our created data improves classifier performance not only when tested on the original data, but also on real data.
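The two-step pipeline (occupancy-grid HSR followed by random sampling) can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: it assumes an orthographic camera looking along the +x axis, whereas the paper varies the camera elevation angle; the function name and parameters are our own.

```python
import numpy as np

def occluded_view(points, grid_res=32, n_samples=256, seed=0):
    """Convert a densely sampled mesh surface into an occluded point cloud.
    Step 1 (HSR): voxelize the points into an occupancy grid and, for each
    (y, z) column, keep only the occupied voxel nearest the camera.
    Step 2: randomly sample the surviving (visible) points."""
    # Normalize points into the unit cube, then compute voxel indices.
    p = points - points.min(axis=0)
    p = p / p.max()
    idx = np.minimum((p * grid_res).astype(int), grid_res - 1)

    # For each (y, z) column, find the smallest x index (closest to camera).
    nearest = np.full((grid_res, grid_res), grid_res, dtype=int)
    for x, y, z in idx:
        nearest[y, z] = min(nearest[y, z], x)

    # Keep points whose voxel is the front-most occupied one in its column.
    visible = np.array([pt for pt, (x, y, z) in zip(points, idx)
                        if x == nearest[y, z]])

    # Step 2: random sampling of the visible surface.
    rng = np.random.default_rng(seed)
    choice = rng.choice(len(visible), size=min(n_samples, len(visible)),
                        replace=False)
    return visible[choice]
```

Because only voxel indices and a per-column minimum are needed, the occupancy-grid formulation avoids explicit ray-mesh intersection tests, which is the source of the method's low complexity.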
ABSTRACT
A vital and challenging task in computer vision is 3D object classification and retrieval, with many practical applications such as intelligent robots, autonomous driving, multimedia content processing and retrieval, and augmented/mixed reality. Various deep learning methods have been introduced to solve classification and retrieval problems for 3D objects. Although view-based methods perform best among current techniques (view-based, voxelization, and point cloud methods), almost all of them use many views to compensate for spatial information loss, and the many views make the network structure more complicated due to parallel Convolutional Neural Networks (CNNs). In this paper, we propose a novel method that combines a Global Point Signature Plus with a Deep Wide Residual Network, namely GPSP-DWRN. Global Point Signature Plus (GPSPlus) is a novel descriptor because it can capture more shape information of the 3D object from a single view. First, an original 3D model is converted into a colored one by applying GPSPlus. Then, the 2D projection of this colored 3D model is stored in a 32 × 32 × 3 matrix, which serves as the input to a Deep Wide Residual Network using a single CNN structure. We evaluated GPSP-DWRN on a retrieval task using the ShapeNetCore55 dataset, and on a classification task using two well-known datasets, ModelNet10 and ModelNet40. Based on our experimental results, our framework performed better than the state-of-the-art methods.
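The single-view input described above (a colored 3D model flattened into a 32 × 32 × 3 matrix) can be illustrated with a simple depth-buffered orthographic projection. This is a hedged sketch under our own assumptions, not the GPSPlus pipeline itself: the function name is hypothetical, and the abstract does not specify the projection used; only the input shape (32 × 32 × 3) comes from the source.

```python
import numpy as np

def project_to_image(points, colors, size=32):
    """Orthographically project a colored point set onto the x-y plane,
    producing a size x size x 3 matrix suitable as single-CNN input.
    A depth buffer keeps, per pixel, the color of the nearest point."""
    img = np.zeros((size, size, 3), dtype=np.float32)
    # Normalize x, y into [0, 1] and map to pixel indices.
    xy = points[:, :2] - points[:, :2].min(axis=0)
    denom = xy.max(axis=0)
    denom[denom == 0] = 1.0
    xy = xy / denom
    px = np.minimum((xy * size).astype(int), size - 1)
    # Larger z is closer to the camera in this convention.
    depth = np.full((size, size), -np.inf)
    for (u, v), z, c in zip(px, points[:, 2], colors):
        if z > depth[v, u]:
            depth[v, u] = z
            img[v, u] = c
    return img
```

The appeal of such a fixed-size single-view input is that one ordinary 2D CNN suffices, avoiding the parallel multi-view CNN branches that the abstract identifies as a source of structural complexity.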