ABSTRACT
The centrifugal pump is the workhorse of many industrial and domestic applications, such as water supply, wastewater treatment, and heating. While modern pumps are reliable, their unexpected failures may jeopardise safety or lead to significant financial losses. Consequently, there is a strong demand for early fault detection, diagnosis, and predictive monitoring systems. Most prior work on machine-learning-based centrifugal pump fault detection relies on synthetic data, simulations, or data from test rigs under controlled laboratory conditions. In this research, we detected centrifugal pump faults using data collected from real operational pumps deployed at various sites, in collaboration with a specialist pump engineering company. Detection was performed by binary classification of the visual features of DQ/Concordia patterns with residual networks. Besides using a real dataset, this study employed transfer learning from the image recognition domain to systematically solve a real-life engineering problem. By feeding DQ image data into a popular, high-performance residual network (ResNet-34), the proposed approach achieved up to 85.51% classification accuracy.
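For illustration, a minimal sketch of such a pipeline in PyTorch follows: three-phase current samples are mapped to a Concordia (alpha-beta) trajectory image, and a pretrained ResNet-34 is fine-tuned for binary healthy/faulty classification. The transform variant, image rasterisation, and hyperparameters here are assumptions for the sketch, not the authors' exact implementation.

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

def concordia_pattern(ia, ib, ic, size=224):
    """Map three-phase currents to a 2-D Concordia (alpha-beta) image."""
    # Power-invariant Clarke/Concordia transform (assumed variant)
    i_alpha = np.sqrt(2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / np.sqrt(2.0)
    # Rasterise the alpha-beta trajectory into a size x size grid
    img, _, _ = np.histogram2d(i_alpha, i_beta, bins=size)
    img = img / (img.max() + 1e-9)
    return np.repeat(img[None, ...], 3, axis=0).astype(np.float32)

# Pretrained ResNet-34 with a new binary classification head
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

# Fine-tune only the last residual block and the head
# (a typical transfer-learning choice, not necessarily the paper's)
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("layer4", "fc"))

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```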
ABSTRACT
In dynamic indoor environments, a Visual Simultaneous Localization and Mapping (vSLAM) system must account for moving objects, because they can degrade the stability of its visual odometry and the accuracy of its position estimation. vSLAM can use feature points or a sequence of images as its only source of input, performing localization while simultaneously creating a map of the environment. This paper proposes a vSLAM system based on ORB-SLAM3 and YOLOR. Combined with an object detection model (YOLOX) applied to the extracted feature points, the proposed system achieves 2-4% better accuracy than VPS-SLAM and DS-SLAM. Static feature points, such as signs and benches, were used to calculate the camera position, while dynamic moving objects were eliminated in the tracking thread. A custom dataset of indoor and outdoor RGB-D images of train stations, including dynamic objects and high densities of people, together with ground-truth data, sequence data, X, Y, Z coordinates, and video recordings of the stations, was used to validate and evaluate the proposed method. The results show that ORB-SLAM3 with YOLOR for object detection achieves 89.54% accuracy in dynamic indoor environments, compared to previous systems such as VPS-SLAM.
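A common way to realise this idea, sketched below, is to discard extracted feature points that fall inside the bounding boxes of detected dynamic classes before they reach the tracking thread. The detector interface and the dynamic class set are assumptions for illustration; they stand in for a YOLO-family model and are not the paper's exact code.

```python
import cv2

DYNAMIC_CLASSES = {"person", "bicycle", "car", "dog"}  # assumed dynamic set

def filter_dynamic_keypoints(gray, detections):
    """Keep only ORB keypoints outside dynamic-object bounding boxes.

    detections: list of (class_name, x1, y1, x2, y2) from any detector
    (e.g., a YOLO-family model); this format is assumed for the sketch.
    """
    orb = cv2.ORB_create(nfeatures=2000)
    keypoints = orb.detect(gray, None)

    boxes = [(x1, y1, x2, y2) for c, x1, y1, x2, y2 in detections
             if c in DYNAMIC_CLASSES]

    def is_static(kp):
        x, y = kp.pt
        return not any(x1 <= x <= x2 and y1 <= y <= y2
                       for x1, y1, x2, y2 in boxes)

    static_kps = [kp for kp in keypoints if is_static(kp)]
    # Descriptors of the surviving (static) points feed the tracking thread
    static_kps, descriptors = orb.compute(gray, static_kps)
    return static_kps, descriptors
```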
Subjects
Algorithms; Optical Flow; Humans; Video Recording
ABSTRACT
Recent developments in computational photography have enabled variation of the optical focus of a plenoptic camera after image exposure, also known as refocusing. Existing ray models in the field simplify the camera's complexity for the purpose of image and depth map enhancement, but fail to satisfactorily predict the distance to which a photograph is refocused. This paper shows that, by treating a pair of light rays as a system of linear functions, the solution of that system yields an intersection indicating the distance to the refocused object plane. Experimental work is conducted with different lenses and focus settings, comparing distance estimates with a stack of refocused photographs for which a blur metric has been devised. Quantitative assessments over a 24 m distance range suggest that predictions deviate by less than 0.35% from an optical design software. The proposed refocusing estimator assists in predicting object distances, for instance in the prototyping stage of plenoptic cameras, and will be an essential feature in applications demanding high precision in synthetic focus or where depth map recovery is done by analyzing a stack of refocused photographs.
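The core geometric step reduces to elementary linear algebra: each ray is a line y = m*z + c in a meridional plane, and the refocusing distance is the z-coordinate at which two such lines meet. A minimal sketch, with slopes and intercepts invented purely for illustration:

```python
import numpy as np

def ray_intersection(m1, c1, m2, c2):
    """Intersect two paraxial rays y = m*z + c; returns (z, y).

    The z-coordinate of the intersection indicates the distance to the
    refocused object plane.
    """
    # Rewrite m*z - y = -c for both rays as a 2x2 system A @ [z, y] = b
    A = np.array([[m1, -1.0],
                  [m2, -1.0]])
    b = np.array([-c1, -c2])
    z, y = np.linalg.solve(A, b)
    return z, y

# Example with made-up ray parameters (units follow the intercepts)
z_refocus, height = ray_intersection(m1=-0.02, c1=0.05, m2=-0.01, c2=0.02)
print(f"refocusing distance: {z_refocus:.2f}")
```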
ABSTRACT
The Standard Plenoptic Camera (SPC) is an innovation in photography that allows two-dimensional images focused at different depths to be acquired from a single exposure. Unlike conventional cameras, the SPC consists of a micro lens array and a main lens, which together project virtual lenses into object space. For the first time, the present research provides an approach to estimate the distance and depth of refocused images extracted from captures obtained by an SPC. Furthermore, estimates for the position and baseline of the virtual lenses, which correspond to an equivalent camera array, are derived. On the basis of the paraxial approximation, a ray tracing model employing linear equations has been developed and implemented in Matlab. The optics simulation tool Zemax is used for validation. By designing a realistic SPC, experiments demonstrate that a predicted image refocusing distance at 3.5 m deviates by less than 11% from the Zemax simulation, whereas baseline estimations show no significant difference. The proposed methodology offers an alternative to traditional depth map acquisition by disparity analysis.
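In the paraxial regime, such a linear ray model can be assembled from standard ray-transfer (ABCD) matrices, as in the illustrative sketch below. The focal lengths and spacings are made-up placeholders, not the parameters of the simulated SPC.

```python
import numpy as np

def free_space(d):
    """Ray-transfer matrix for propagation over distance d."""
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_lens(f):
    """Ray-transfer matrix for a thin lens of focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# Illustrative SPC-like chain: micro lens, gap, then main lens
f_micro, f_main, gap = 1.25e-3, 0.05, 0.048   # placeholder values [m]
system = thin_lens(f_main) @ free_space(gap) @ thin_lens(f_micro)

# Trace a ray (height y, angle u) leaving a sensor-side point; tracing a
# second ray from the same point and intersecting the two in object space
# (as in the linear-system sketch above) locates the virtual-lens geometry.
ray_in = np.array([0.5e-3, 0.1])   # y = 0.5 mm, u = 0.1 rad
y_out, u_out = system @ ray_in
print(y_out, u_out)
```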
ABSTRACT
Malaria is a life-threatening disease caused by Plasmodium parasites, notably Plasmodium falciparum, which infect human red blood cells. It is therefore important to have an effective computer-aided system in place for early detection and treatment. The visual heterogeneity of malaria datasets is highly complex and dynamic, so a large number of images is needed to train machine learning (ML) models effectively. However, hospitals and medical institutions do not share medical image data for collaboration, owing to the General Data Protection Regulation (GDPR) and the Data Protection Act (DPA). To overcome this collaborative challenge, our research utilised real-time medical image data within a federated learning (FL) framework. We used state-of-the-art ML models, namely ResNet-50 and DenseNet, in a federated learning framework. We experimented with both models in different settings on a malaria dataset of 27,560 publicly available images. Our preliminary results showed that with eight clients, DenseNet achieved higher accuracy (75%) than ResNet-50 (72%); with four clients, both models reached a similar accuracy of 94%; and with six clients, DenseNet performed well at 92%, while ResNet-50 achieved only 72%. The federated learning framework enhances accuracy through its decentralised nature, continuous learning, effective communication among clients, and efficient local adaptation. The contribution of this research is the use of a federated learning architecture across distinct clients to ensure data privacy and compliance with the GDPR.
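The sketch below illustrates the federated averaging (FedAvg) step at the heart of such a framework: each client trains locally on its private images, and only model weights are aggregated centrally, so raw data never leaves the client. The DenseNet-121 variant, equal client weighting, and hyperparameters are assumptions for this sketch, not the authors' training code.

```python
import copy
import torch
from torchvision import models

def local_update(model, loader, epochs=1, lr=1e-4):
    """One round of local training on a client's private data."""
    model = copy.deepcopy(model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def fed_avg(states):
    """Average client weights parameter-by-parameter (equal-size clients)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(0)
    return avg

# Global model: DenseNet with a 2-class head (parasitised vs. uninfected)
global_model = models.densenet121(weights=None)
global_model.classifier = torch.nn.Linear(
    global_model.classifier.in_features, 2)

# One communication round over, e.g., eight client data loaders:
# client_states = [local_update(global_model, ld) for ld in client_loaders]
# global_model.load_state_dict(fed_avg(client_states))
```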
ABSTRACT
The standard separable 2-D wavelet transform (WT) has achieved great success in image processing because it provides a sparse representation of smooth images. However, it fails to efficiently capture 1-D discontinuities, such as edges or contours. These features, being elongated and characterized by geometrical regularity along different directions, intersect and generate many large-magnitude wavelet coefficients. Since contours are very important elements in the visual perception of images, preserving a good reconstruction of these directional features is fundamental to the visual quality of compressed images. In our previous work, we proposed a construction of critically sampled perfect-reconstruction transforms with directional vanishing moments imposed on the corresponding basis functions along different directions, called directionlets. In this paper, we show how to design and implement a novel, efficient space-frequency quantization (SFQ) compression algorithm using directionlets. Our new compression method outperforms the standard SFQ in a rate-distortion sense, both in terms of mean-square error and visual quality, especially in the low-rate compression regime. We also show that our compression method does not increase the order of computational complexity compared to the standard SFQ algorithm.
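To make the baseline concrete, the sketch below applies a separable 2-D DWT and a uniform dead-zone quantizer to the coefficients. It illustrates only the generic wavelet-quantization idea that SFQ refines with zerotree-style spatial quantization, not the directionlet-based SFQ algorithm itself; the wavelet, level count, and step size are placeholders.

```python
import numpy as np
import pywt

def dwt_quantize(image, wavelet="db4", levels=3, step=8.0):
    """Separable 2-D DWT followed by uniform dead-zone quantization."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    arr, slices = pywt.coeffs_to_array(coeffs)
    q = np.sign(arr) * np.floor(np.abs(arr) / step)    # quantize
    rec = q * step + np.sign(q) * step / 2.0           # midpoint dequantize
    rec_coeffs = pywt.array_to_coeffs(rec, slices, output_format="wavedec2")
    return pywt.waverec2(rec_coeffs, wavelet)

# Example: round-trip MSE on a synthetic smooth image
img = np.cumsum(np.cumsum(np.random.randn(256, 256), 0), 1)
mse = np.mean((img - dwt_quantize(img)) ** 2)
print(f"round-trip MSE: {mse:.3f}")
```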
Subjects
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Computer Graphics/standards; Numerical Analysis, Computer-Assisted
ABSTRACT
In spite of the success of the standard wavelet transform (WT) in image processing in recent years, the efficiency of its representation is limited by the spatial isotropy of its basis functions built in the horizontal and vertical directions. One-dimensional (1-D) discontinuities in images (edges and contours), which are very important elements in visual perception, intersect too many wavelet basis functions and lead to a nonsparse representation. To efficiently capture these anisotropic geometrical structures, characterized by many more directions than just the horizontal and vertical, a more complex multidirectional (M-DIR) and anisotropic transform is required. We present a new lattice-based perfect-reconstruction and critically sampled anisotropic M-DIR WT. The transform retains the separable filtering and subsampling and the simplicity of computation and filter design from the standard two-dimensional WT, unlike some other directional transform constructions (e.g., curvelets, contourlets, or edgelets). The corresponding anisotropic basis functions (directionlets) have directional vanishing moments along any two directions with rational slopes. Furthermore, we show that this novel transform provides an efficient tool for nonlinear approximation of images, achieving the approximation power $O(N^{-1.55})$, which, while slower than the optimal rate $O(N^{-2})$, is much better than the $O(N^{-1})$ achieved with wavelets, at similar complexity.
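Nonlinear approximation, the benchmark used here, keeps only the N largest-magnitude transform coefficients and measures the reconstruction error as N grows; the decay exponent of that error is what the $O(N^{-1.55})$ rate refers to. A minimal sketch with a standard 2-D wavelet (not directionlets), on a synthetic image:

```python
import numpy as np
import pywt

def nla_error(image, n_keep, wavelet="db4", levels=4):
    """MSE after keeping the n_keep largest-magnitude DWT coefficients."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    arr, slices = pywt.coeffs_to_array(coeffs)
    flat = np.abs(arr).ravel()
    thresh = np.partition(flat, flat.size - n_keep)[flat.size - n_keep]
    kept = np.where(np.abs(arr) >= thresh, arr, 0.0)
    rec = pywt.waverec2(
        pywt.array_to_coeffs(kept, slices, output_format="wavedec2"), wavelet)
    return np.mean((image - rec) ** 2)

# The slope of log(MSE) vs. log(N) estimates the approximation exponent
img = np.cumsum(np.cumsum(np.random.randn(256, 256), 0), 1)
for n in (256, 1024, 4096):
    print(n, nla_error(img, n))
```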
Subjects
Algorithms; Artifacts; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Models, Statistical; Signal Processing, Computer-Assisted; Anisotropy; Computer Graphics; Computer Simulation; Filtration/methods; Numerical Analysis, Computer-Assisted; Stochastic Processes
ABSTRACT
We derive an optimization framework for joint view and rate scalable coding of multi-view video content represented in the texture-plus-depth format. The optimization enables the sender to select the subset of coded views and their encoding rates such that the aggregate distortion over a continuum of synthesized views is minimized. We construct the view- and rate-embedded bitstream such that it delivers optimal performance simultaneously over a discrete set of transmission rates. In addition, we develop a user interaction model that characterizes the view selection actions of the client as a Markov chain over a discrete state space. We exploit this model within our optimization to compute user-action-driven coding strategies that aim at enhancing the client's performance in terms of latency and video quality. Our optimization outperforms the state-of-the-art H.264 SVC codec as well as a multi-view wavelet-based coder equipped with a uniform rate allocation strategy, across all scenarios studied in our experiments. Equally important, we can achieve an arbitrarily fine granularity of encoding bit rates while providing a novel functionality of view-embedded encoding, unlike the other encoding methods we examined. Finally, we observe that interactivity-aware coding delivers superior performance over conventional allocation techniques that do not anticipate the client's view selection actions.
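The user-interaction component can be made concrete with a small example: view selection is modelled as a Markov chain over a discrete set of views, and the transition matrix lets the sender anticipate which views a client is likely to request a few interactions ahead. The transition probabilities below are invented for illustration, not taken from the paper.

```python
import numpy as np

# States = coded views; P[i, j] = probability of switching from view i to j
P = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])

def view_distribution(start, steps):
    """Distribution over views after a number of interaction steps."""
    dist = np.zeros(P.shape[0])
    dist[start] = 1.0
    for _ in range(steps):
        dist = dist @ P
    return dist

# Expected view popularity a few steps ahead can guide rate allocation
print(view_distribution(start=0, steps=5))
```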
ABSTRACT
In this paper, we present a novel wavelet-based compression algorithm for multiview images. The method uses a layer-based representation, in which the 3-D scene is approximated by a set of depth planes with their associated constant disparities. The layers are extracted from a collection of images captured at multiple viewpoints and transformed using the 3-D discrete wavelet transform (DWT). The DWT consists of a 1-D disparity-compensated DWT across the viewpoints and a 2-D shape-adaptive DWT across the spatial dimensions. Finally, the wavelet coefficients are quantized and entropy coded along with the layer contours. To improve the rate-distortion performance of the entire coding method, we develop a bit allocation strategy that distributes the available bit budget between encoding the layer contours and the wavelet coefficients. The proposed scheme outperforms state-of-the-art codecs on several data sets of varying complexity.
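The bit-allocation step can be illustrated with a simple exhaustive search over operational rate-distortion points: given candidate (rate, distortion) operating points for the contour coder and the coefficient coder, pick the pair that minimises total distortion within the bit budget. The R-D points below are synthetic placeholders, not measured values.

```python
import itertools

# Operational R-D points (rate in bits, distortion in MSE) -- illustrative
contour_rd = [(500, 40.0), (1000, 22.0), (2000, 12.0), (4000, 7.0)]
coeff_rd   = [(2000, 90.0), (4000, 45.0), (8000, 20.0), (16000, 9.0)]

def allocate(budget):
    """Exhaustive search over the small product space of operating points."""
    best = None
    for (rc, dc), (rw, dw) in itertools.product(contour_rd, coeff_rd):
        if rc + rw <= budget:
            cand = (dc + dw, rc, rw)
            best = min(best, cand) if best else cand
    return best  # (total distortion, contour bits, coefficient bits)

print(allocate(budget=10000))
```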
ABSTRACT
The encoding of both texture and depth maps of multiview images, captured by a set of spatially correlated cameras, is important for any 3-D visual communication system based on depth-image-based rendering (DIBR). In this paper, we address the problem of efficient bit allocation among the texture and depth maps of multiview images. More specifically, suppose we are given a coding tool to encode texture and depth maps at the encoder and a view-synthesis tool to construct intermediate views at the decoder using neighboring encoded texture and depth maps. Our goal is to determine how best to select captured views for encoding and distribute the available bits among the texture and depth maps of the selected coded views, such that the visual distortion of the desired constructed views is minimized. First, to obtain at the encoder a low-complexity estimate of the visual quality of a large number of desired synthesized views, we derive a cubic distortion model based on basic DIBR properties, whose parameters are obtained using only a small number of viewpoint samples. Then, we demonstrate that the optimal selection of coded views and quantization levels for the corresponding texture and depth maps is equivalent to finding the shortest path in a specially constructed 3-D trellis. Finally, we show that, under assumptions of monotonicity in the predictor's quantization level and distance, suboptimal solutions can be efficiently pruned from the feasible space during the solution search. Experiments show that our proposed selection of coded views and quantization levels outperforms an alternative scheme that uses constant quantization levels for all maps (as commonly done in video standard implementations) by up to 1.5 dB. Moreover, the complexity of our scheme can be reduced by at least 80% relative to the full solution search.
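The optimisation can be pictured as a standard dynamic program: each trellis stage is a candidate coded view, each node a (texture, depth) quantization pair, and edge costs combine rate penalties with model-based distortion estimates. The sketch below runs a generic Viterbi-style shortest path over a small synthetic trellis; the costs are invented and the structure is a simplification of the paper's 3-D trellis.

```python
import numpy as np

def trellis_shortest_path(node_cost, edge_cost):
    """Viterbi-style DP over a trellis.

    node_cost[t][i]: cost of node i at stage t.
    edge_cost[t][i][j]: cost of the edge from node i (stage t) to j (t+1).
    Returns (best total cost, best node sequence).
    """
    T = len(node_cost)
    dist = [dict() for _ in range(T)]
    back = [dict() for _ in range(T)]
    for i, c in enumerate(node_cost[0]):
        dist[0][i] = c
    for t in range(T - 1):
        for j, cj in enumerate(node_cost[t + 1]):
            best_i = min(dist[t],
                         key=lambda i: dist[t][i] + edge_cost[t][i][j])
            dist[t + 1][j] = dist[t][best_i] + edge_cost[t][best_i][j] + cj
            back[t + 1][j] = best_i
    end = min(dist[-1], key=dist[-1].get)
    path = [end]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return dist[-1][end], path[::-1]

# Tiny synthetic trellis: 3 stages (views), 2 quantization pairs per stage
rng = np.random.default_rng(0)
nodes = rng.uniform(1, 5, size=(3, 2)).tolist()
edges = rng.uniform(0, 2, size=(2, 2, 2)).tolist()
print(trellis_shortest_path(nodes, edges))
```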