Results 1 - 20 of 48
1.
Sensors (Basel); 24(11), 2024 May 27.
Article in English | MEDLINE | ID: mdl-38894233

ABSTRACT

This paper proposes a multimodal Transformer model that uses time-series data to detect and predict winter road surface conditions. For detecting or predicting road surface conditions, the previous approach focuses on the cooperative use of multiple modalities as inputs, e.g., images captured by fixed-point cameras (road surface images) and auxiliary data related to road surface conditions under simple modality integration. Although such an approach achieves performance improvement compared to the method using only images or auxiliary data, there is a demand for further consideration of the way to integrate heterogeneous modalities. The proposed method realizes a more effective modality integration using a cross-attention mechanism and time-series processing. Concretely, when integrating multiple modalities, feature compensation through mutual complementation between modalities is realized through a feature integration technique based on a cross-attention mechanism, and the representational ability of the integrated features is enhanced. In addition, by introducing time-series processing for the input data across several timesteps, it is possible to consider the temporal changes in the road surface conditions. Experiments are conducted for both detection and prediction tasks using data corresponding to the current winter condition and data corresponding to a few hours after the current winter condition, respectively. The experimental results verify the effectiveness of the proposed method for both tasks. In addition to the construction of the classification model for winter road surface conditions, we first attempt to visualize the classification results, especially the prediction results, through the image style transfer model as supplemental extended experiments on image generation at the end of the paper.
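
The cross-attention integration described in this abstract can be illustrated with a minimal sketch. It is plain scaled dot-product attention without learned projections; the toy features, their dimensions, and the residual fusion step are assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy features: 2 image tokens and 3 auxiliary-data tokens, dim 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
aux_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Image tokens query the auxiliary modality; adding the result back to the
# image tokens is one assumed way to fuse the attended features.
attended = cross_attention(image_tokens, aux_tokens, aux_tokens)
fused = [[a + b for a, b in zip(img, att)] for img, att in zip(image_tokens, attended)]
```

In a trained model the queries, keys, and values would pass through learned linear projections first; they are omitted here to keep the mechanism visible.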

2.
Sensors (Basel); 24(10), 2024 May 10.
Article in English | MEDLINE | ID: mdl-38793890

ABSTRACT

In our digitally driven society, advances in software and hardware to capture video data allow extensive gathering and analysis of large datasets. This has stimulated interest in extracting information from video data, such as buildings and urban streets, to enhance understanding of the environment. Urban buildings and streets, as essential parts of cities, carry valuable information relevant to daily life. Extracting features from these elements and integrating them with technologies such as VR and AR can contribute to more intelligent and personalized urban public services. Despite its potential benefits, collecting videos of urban environments introduces challenges because of the presence of dynamic objects. The varying shape of the target building in each frame necessitates careful frame selection to ensure the extraction of quality features. To address this problem, we propose a novel evaluation metric that considers the video-inpainting-restoration quality and the relevance of the target object by minimizing areas with cars, maximizing areas with the target building, and minimizing overlapping areas. This metric extends existing video-inpainting-evaluation metrics by considering the relevance of the target object and interconnectivity between objects. We conducted experiments to validate the proposed metric using real-world datasets from the Japanese cities of Sapporo and Yokohama. The experimental results demonstrate the feasibility of selecting video frames conducive to building feature extraction.

3.
Sensors (Basel); 24(10), 2024 May 10.
Article in English | MEDLINE | ID: mdl-38793888

ABSTRACT

In this study, we propose a classification method of expert-novice levels using a graph convolutional network (GCN) with a confidence-aware node-level attention mechanism. In classification using an attention mechanism, highlighted features may not be significant for accurate classification, thereby degrading classification performance. To address this issue, the proposed method introduces a confidence-aware node-level attention mechanism into a spatiotemporal attention GCN (STA-GCN) for the classification of expert-novice levels. Consequently, our method can contrast the attention value of each node on the basis of the confidence measure of the classification, which solves the problem of classification approaches using attention mechanisms and realizes accurate classification. Furthermore, because the expert-novice levels have ordinalities, using a classification model that considers ordinalities improves the classification performance. The proposed method involves a model that minimizes a loss function that considers the ordinalities of classes to be classified. By implementing the above approaches, the expert-novice level classification performance is improved.

4.
Sensors (Basel); 24(10), 2024 May 13.
Article in English | MEDLINE | ID: mdl-38793943

ABSTRACT

The advancements in deep learning have significantly enhanced the capability of image generation models to produce images aligned with human intentions. However, training and adapting these models to new data and tasks remain challenging because of their complexity and the risk of catastrophic forgetting. This study proposes a method for addressing these challenges involving the application of class-replacement techniques within a continual learning framework. This method utilizes selective amnesia (SA) to efficiently replace existing classes with new ones while retaining crucial information. This approach improves the model's adaptability to evolving data environments while preventing the loss of past information. We conducted a detailed evaluation of class-replacement techniques, examining their impact on the "class incremental learning" performance of models and exploring their applicability in various scenarios. The experimental results demonstrated that our proposed method could enhance the learning efficiency and long-term performance of image generation models. This study broadens the application scope of image generation technology and supports the continual improvement and adaptability of corresponding models.

5.
Sensors (Basel); 24(3), 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38339636

ABSTRACT

Text-guided image editing has been highlighted in the fields of computer vision and natural language processing in recent years. The approach takes an image and text prompt as input and aims to edit the image in accordance with the text prompt while preserving text-unrelated regions. The results of text-guided image editing differ depending on the way the text prompt is represented, even if it has the same meaning. It is up to the user to decide which result best matches the intended use of the edited image. This paper assumes a situation in which edited images are posted to social media and proposes a novel text-guided image editing method to help the edited images gain attention from a greater audience. In the proposed method, we apply the pre-trained text-guided image editing method and obtain multiple edited images from the multiple text prompts generated from a large language model. The proposed method leverages the novel model that predicts post scores representing engagement rates and selects one image that will gain the most attention from the audience on social media among these edited images. Subject experiments on a dataset of real Instagram posts demonstrate that the edited images of the proposed method accurately reflect the content of the text prompts and provide a positive impression to the audience on social media compared to those of previous text-guided image editing methods.


Subject(s)
Social Media; Humans; Language; Natural Language Processing
6.
Sensors (Basel); 23(3), 2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36772095

ABSTRACT

Auxiliary clinical diagnosis has been researched to solve unevenly and insufficiently distributed clinical resources. However, auxiliary diagnosis is still dominated by human physicians, and how to make intelligent systems more involved in the diagnosis process is gradually becoming a concern. An interactive automated clinical diagnosis with a question-answering system and a question generation system can capture a patient's conditions from multiple perspectives with less physician involvement by asking different questions to drive and guide the diagnosis. This clinical diagnosis process requires diverse information to evaluate a patient from different perspectives to obtain an accurate diagnosis. Recently proposed medical question generation systems have not considered diversity. Thus, we propose a diversity learning-based visual question generation model using a multi-latent space to generate informative question sets from medical images. The proposed method generates various questions by embedding visual and language information in different latent spaces, whose diversity is trained by our newly proposed loss. We have also added control over the categories of generated questions, making the generated questions directional. Furthermore, we use a new metric named similarity to accurately evaluate the proposed model's performance. The experimental results on the Slake and VQA-RAD datasets demonstrate that the proposed method can generate questions with diverse information. Our model works with an answering model for interactive automated clinical diagnosis and generates datasets to replace the process of annotation that incurs huge labor costs.


Subject(s)
Natural Language Processing; Semantics; Humans; Language
7.
Sensors (Basel); 23(10), 2023 May 16.
Article in English | MEDLINE | ID: mdl-37430712

ABSTRACT

In this paper, we propose a hierarchical multi-modal multi-label attribute classification model for anime illustrations using a graph convolutional network (GCN). Our focus is on the challenging task of multi-label attribute classification, which requires capturing subtle features intentionally highlighted by creators of anime illustrations. To address the hierarchical nature of these attributes, we leverage hierarchical clustering and hierarchical label assignments to organize the attribute information into a hierarchical feature. The proposed GCN-based model effectively utilizes this hierarchical feature to achieve high accuracy in multi-label attribute classification. The contributions of the proposed method are as follows. Firstly, we introduce GCN to the multi-label attribute classification task of anime illustrations, enabling the capturing of more comprehensive relationships between attributes from their co-occurrence. Secondly, we capture subordinate relationships among the attributes by adopting hierarchical clustering and hierarchical label assignment. Lastly, we construct a hierarchical structure of attributes that appear more frequently in anime illustrations based on certain rules derived from previous studies, which helps to reflect the relationships between different attributes. The experimental results on multiple datasets show that the proposed method is effective and extensible by comparing it with some existing methods, including the state-of-the-art method.

8.
Sensors (Basel); 23(15), 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37571685

ABSTRACT

Zero-shot neural decoding aims to decode image categories, which were not previously trained, from functional magnetic resonance imaging (fMRI) activity evoked when a person views images. However, having insufficient training data due to the difficulty in collecting fMRI data causes poor generalization capability. Thus, models suffer from the projection domain shift problem when novel target categories are decoded. In this paper, we propose a zero-shot neural decoding approach with semi-supervised multi-view embedding. We introduce the semi-supervised approach that utilizes additional images related to the target categories without fMRI activity patterns. Furthermore, we project fMRI activity patterns into a multi-view embedding space, i.e., visual and semantic feature spaces of viewed images to effectively exploit the complementary information. We define several source and target groups whose image categories are very different and verify the zero-shot neural decoding performance. The experimental results demonstrate that the proposed approach rectifies the projection domain shift problem and outperforms existing methods.

9.
Sensors (Basel); 23(9), 2023 May 07.
Article in English | MEDLINE | ID: mdl-37177744

ABSTRACT

This study proposes a novel off-screen sound separation method based on audio-visual pre-training. In the field of audio-visual analysis, researchers have leveraged visual information for audio manipulation tasks, such as sound source separation. Although such audio manipulation tasks are based on correspondences between audio and video, these correspondences are not always established. Specifically, sounds coming from outside a screen have no audio-visual correspondences and thus interfere with conventional audio-visual learning. The proposed method separates such off-screen sounds based on their arrival directions using binaural audio, which provides us with three-dimensional sensation. Furthermore, we propose a new pre-training method that can consider the off-screen space and use the obtained representation to improve off-screen sound separation. Consequently, the proposed method can separate off-screen sounds irrespective of the direction from which they arrive. We conducted our evaluation using generated video data to circumvent the problem of difficulty in collecting ground truth for off-screen sounds. We confirmed the effectiveness of our methods through off-screen sound detection and separation tasks.

10.
Sensors (Basel); 23(9), 2023 May 05.
Article in English | MEDLINE | ID: mdl-37177712

ABSTRACT

In soccer, quantitatively evaluating the performance of players and teams is essential to improve tactical coaching and players' decision-making abilities. To achieve this, some methods use predicted probabilities of shoot event occurrences to quantify player performances, but conventional shoot prediction models have not performed well and have failed to consider the reliability of the event probability. This paper proposes a novel method that effectively utilizes players' spatio-temporal relations and prediction uncertainty to predict shoot event occurrences with greater accuracy and robustness. Specifically, we represent players' relations as a complete bipartite graph, which effectively incorporates soccer domain knowledge, and capture latent features by applying a graph convolutional recurrent neural network (GCRNN) to the constructed graph. Our model utilizes a Bayesian neural network to predict the probability of shoot event occurrence, considering spatio-temporal relations between players and prediction uncertainty. In our experiments, we confirmed that the proposed method outperformed several other methods in terms of prediction performance, and we found that considering players' distances significantly affects the prediction accuracy.
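
To make the complete bipartite graph representation concrete, here is a small sketch that builds a weighted adjacency matrix. The abstract does not say which two node sets are used, so splitting players by team is an assumption, as are the coordinates and the distance-based edge weight `1 / (1 + distance)`.

```python
import math

def complete_bipartite_adjacency(team_a, team_b):
    """Weighted adjacency matrix of the complete bipartite graph K_{|A|,|B|}.

    Nodes 0..len(team_a)-1 are team A, the remaining nodes are team B.
    Every A-B pair is connected; no edge joins two players of the same set.
    The weight 1 / (1 + distance) is an illustrative assumption.
    """
    n_a, n_b = len(team_a), len(team_b)
    n = n_a + n_b
    adj = [[0.0] * n for _ in range(n)]
    for i, (xa, ya) in enumerate(team_a):
        for j, (xb, yb) in enumerate(team_b):
            w = 1.0 / (1.0 + math.hypot(xa - xb, ya - yb))
            adj[i][n_a + j] = w
            adj[n_a + j][i] = w  # undirected graph: keep the matrix symmetric
    return adj

# Hypothetical pitch coordinates in metres.
attackers = [(10.0, 5.0), (12.0, 9.0)]
defenders = [(10.0, 5.0), (13.0, 9.0), (20.0, 5.0)]
adj = complete_bipartite_adjacency(attackers, defenders)
```

A matrix like this could serve as the graph input to a graph convolutional layer, with one such matrix per time step feeding the recurrent part of the model.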

11.
Sensors (Basel); 23(23), 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38067982

ABSTRACT

Traffic sign recognition is a complex and challenging yet popular problem that can assist drivers on the road and reduce traffic accidents. Most existing methods for traffic sign recognition use convolutional neural networks (CNNs) and can achieve high recognition accuracy. However, these methods first require a large number of carefully crafted traffic sign datasets for the training process. Moreover, since traffic signs differ in each country and there is a variety of traffic signs, these methods need to be fine-tuned when recognizing new traffic sign categories. To address these issues, we propose a traffic sign matching method for zero-shot recognition. Our proposed method can perform traffic sign recognition without training data by directly matching the similarity of target and template traffic sign images. Our method uses the midlevel features of CNNs to obtain robust feature representations of traffic signs without additional training or fine-tuning. We discovered that midlevel features improve the accuracy of zero-shot traffic sign recognition. The proposed method achieves promising recognition results on the German Traffic Sign Recognition Benchmark open dataset and a real-world dataset taken from Sapporo City, Japan.
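
The matching step described above, recognition by comparing a target image against template images, can be sketched as nearest-template search. Cosine similarity is an assumed choice of similarity measure, and the short vectors below are hypothetical stand-ins for mid-level CNN features.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_sign(target_feature, template_features):
    """Return the label of the template most similar to the target."""
    return max(
        template_features,
        key=lambda label: cosine_similarity(target_feature, template_features[label]),
    )

# Hypothetical feature vectors, one template per traffic sign category.
templates = {
    "stop": [0.9, 0.1, 0.0],
    "yield": [0.1, 0.8, 0.2],
    "speed_limit": [0.0, 0.2, 0.9],
}
prediction = match_sign([0.2, 0.7, 0.1], templates)
```

Because no classifier is trained, supporting a new category under this scheme only requires adding a template entry.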

12.
Sensors (Basel); 23(22), 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38005673

ABSTRACT

At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics.
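
One way to read "consistency of changes between images and texts" is as the agreement between two difference vectors in a shared embedding space. The sketch below computes that agreement as a cosine similarity. This is an assumed formulation for illustration, not the paper's definition of MD, and the embeddings are hypothetical placeholders for the outputs of a joint image-text encoder.

```python
import math

def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def direction_consistency(img_before, img_after, txt_before, txt_after):
    """Cosine similarity between the image change and the text change."""
    return cosine_similarity(
        subtract(img_after, img_before),
        subtract(txt_after, txt_before),
    )

# Hypothetical embeddings of the source/edited image and source/target text.
img_src, img_edit = [0.2, 0.5, 0.1], [0.6, 0.5, 0.3]
txt_src, txt_tgt = [0.1, 0.4, 0.0], [0.5, 0.4, 0.2]
score = direction_consistency(img_src, img_edit, txt_src, txt_tgt)
```

A score near 1 means the image moved in the same direction as the text; a score near -1 means it moved the opposite way.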

13.
Sensors (Basel); 23(3), 2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36772694

ABSTRACT

This study presents a method for distress image classification in road infrastructures introducing self-supervised learning. Self-supervised learning is an unsupervised learning method that does not require class labels. This learning method can reduce annotation efforts and allow the application of machine learning to a large number of unlabeled images. We propose a novel distress image classification method using contrastive learning, which is a type of self-supervised learning. Contrastive learning provides image domain-specific representation, constraining such that similar images are embedded nearby in the latent space. We augment the single input distress image into multiple images by image transformations and construct the latent space, in which the augmented images are embedded close to each other. This provides a domain-specific representation of the damage in road infrastructure using a large number of unlabeled distress images. Finally, the representation obtained by contrastive learning is used to improve the distress image classification performance. The obtained contrastive learning model parameters are used for the distress image classification model. We realize the successful distress image representation by utilizing unlabeled distress images, which have been difficult to use in the past. In the experiments, we use the distress images obtained from the real world to verify the effectiveness of the proposed method for various distress types and confirm the performance improvement.
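
The idea that augmented views of the same image should be embedded close together can be written as a loss. The sketch below uses a common InfoNCE-style formulation for a single anchor; the abstract does not state which contrastive loss the authors use, so this form, the temperature value, and the toy embeddings are all assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor: small when the positive is closest."""
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical embeddings: two augmented views of one distress image and
# the embedding of a different image.
view_a = [1.0, 0.1]
view_b = [0.9, 0.2]
other = [-0.2, 1.0]

good = contrastive_loss(view_a, view_b, [other])  # views paired correctly
bad = contrastive_loss(view_a, other, [view_b])   # unrelated image as positive
```

Minimizing this loss over many unlabeled images pulls views of the same image together and pushes different images apart, which is the property the abstract relies on for the later classification step.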

14.
Sensors (Basel); 22(10), 2022 May 13.
Article in English | MEDLINE | ID: mdl-35632130

ABSTRACT

In this study, we propose a novel music playlist generation method based on a knowledge graph and reinforcement learning. The development of music streaming platforms has transformed the social dynamics of music consumption and paved a new way of accessing and listening to music. Playlist generation is one of the most important multimedia techniques; it aims to recommend music tracks by sensing the vast amount of musical data and the users' listening histories from music streaming services. Conventional playlist generation methods have difficulty capturing the target users' long-term preferences. To overcome this difficulty, we use a reinforcement learning algorithm that can consider the target users' long-term preferences. Furthermore, we introduce the following two new ideas: using informative knowledge graph data to promote efficient optimization through reinforcement learning, and setting a flexible reward function, whose parameters target users can design themselves, to guide them to new types of music tracks. We confirm the effectiveness of the proposed method by verifying the prediction performance based on listening history and the guidance performance to music tracks that can satisfy the target user's unique preference.


Subject(s)
Music; Auditory Perception; Knowledge; Pattern Recognition, Automated; Reward
15.
Sensors (Basel); 22(7), 2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35408079

ABSTRACT

In this study, a novel prediction method for predicting important scenes in baseball videos using a time-lag aware latent variable model (Tl-LVM) is proposed. Tl-LVM adopts a multimodal variational autoencoder using tweets and videos as the latent variable model. It calculates the latent features from these tweets and videos and predicts important scenes using these latent features. Since time lags exist between posted tweets and events, Tl-LVM introduces the loss considering time lags by correlating the feature into the loss function of the multimodal variational autoencoder. Furthermore, Tl-LVM can train the encoder, decoder, and important scene predictor, simultaneously, using this loss function. This is the novelty of Tl-LVM, and this work is the first end-to-end prediction model of important scenes that considers time lags to the best of our knowledge. It is the contribution of Tl-LVM to realize high-quality prediction using latent features that consider time lags between tweets and multiple corresponding previous events. Experimental results using actual tweets and baseball videos show the effectiveness of Tl-LVM.


Subject(s)
Baseball; Social Media
16.
Sensors (Basel); 22(14), 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35891112

ABSTRACT

Regularization has become an important method in adversarial defense. However, the existing regularization-based defense methods do not discuss which features in convolutional neural networks (CNN) are more suitable for regularization. Thus, in this paper, we propose a multi-stage feature fusion network with a feature regularization operation, which is called Enhanced Multi-Stage Feature Fusion Network (EMSF2Net). EMSF2Net mainly combines three parts: multi-stage feature enhancement (MSFE), multi-stage feature fusion (MSF2), and regularization. Specifically, MSFE aims to obtain enhanced and expressive features in each stage by multiplying the features of each channel; MSF2 aims to fuse the enhanced features of different stages to further enrich the information of the feature, and the regularization part can regularize the fused and original features during the training process. EMSF2Net has proved that if the regularization term of the enhanced multi-stage feature is added, the adversarial robustness of CNN will be significantly improved. The experimental results on extensive white-box attacks on the CIFAR-10 dataset illustrate the robustness and effectiveness of the proposed method.


Subject(s)
Neural Networks, Computer
17.
Sensors (Basel); 22(1), 2022 Jan 05.
Article in English | MEDLINE | ID: mdl-35009924

ABSTRACT

This paper presents deterioration level estimation based on convolutional neural networks using a confidence-aware attention mechanism for infrastructure inspection. Spatial attention mechanisms try to highlight the important regions in feature maps for estimation by using an attention map. The attention mechanism using an effective attention map can improve feature maps. However, the conventional attention mechanisms have a problem as they fail to highlight important regions for estimation when an ineffective attention map is mistakenly used. To solve the above problem, this paper introduces the confidence-aware attention mechanism that reduces the effect of ineffective attention maps by considering the confidence corresponding to the attention map. The confidence is calculated from the entropy of the estimated class probabilities when generating the attention map. Because the proposed method can effectively utilize the attention map by considering the confidence, it can focus more on the important regions in the final estimation. This is the most significant contribution of this paper. The experimental results using images from actual infrastructure inspections confirm the performance improvement of the proposed method in estimating the deterioration level.
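
The abstract states that the confidence is calculated from the entropy of the estimated class probabilities. A minimal sketch of that idea follows. Normalizing the entropy by log K and blending a low-confidence attention map toward a neutral all-ones map are both assumptions made for illustration; the paper may combine the quantities differently.

```python
import math

def entropy_confidence(probs):
    """1 minus normalized Shannon entropy: 1.0 for one-hot, 0.0 for uniform."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - h / math.log(len(probs))

def weight_attention(attention_map, confidence):
    """Blend the attention map toward all-ones when confidence is low.

    With confidence 1.0 the map is used as-is; with confidence 0.0 every
    position gets weight 1.0, so the attention has no effect.
    """
    return [[confidence * a + (1.0 - confidence) for a in row]
            for row in attention_map]

# Hypothetical class probabilities from the branch that generates attention.
confident = entropy_confidence([0.97, 0.01, 0.01, 0.01])
uncertain = entropy_confidence([0.25, 0.25, 0.25, 0.25])

attention = [[0.2, 0.8], [0.5, 1.0]]
damped = weight_attention(attention, uncertain)
```

Under this scheme an attention map produced from an uncertain prediction is effectively switched off rather than being allowed to suppress regions of the feature map.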


Subject(s)
Neural Networks, Computer; Research Design
18.
Sensors (Basel); 22(6), 2022 Mar 17.
Article in English | MEDLINE | ID: mdl-35336501

ABSTRACT

In this paper, we present a novel defect detection model based on an improved U-Net architecture. As a semantic segmentation task, the defect detection task has the problems of background-foreground imbalance, multi-scale targets, and feature similarity between the background and defects in the real-world data. Conventionally, general convolutional neural network (CNN)-based networks mainly focus on natural image tasks, which are insensitive to the problems in our task. The proposed method has a network design for multi-scale segmentation based on the U-Net architecture including an atrous spatial pyramid pooling (ASPP) module and an inception module, and can detect various types of defects compared to conventional simple CNN-based methods. Through the experiments using a real-world subway tunnel image dataset, the proposed method showed a better performance than that of general semantic segmentation including state-of-the-art methods. Additionally, we showed that our method can achieve excellent detection balance among multi-scale defects.


Subject(s)
Image Processing, Computer-Assisted; Railroads; Algorithms; Image Processing, Computer-Assisted/methods; Neural Networks, Computer
19.
Sensors (Basel); 22(16), 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-36015909

ABSTRACT

Brain decoding is a process of decoding human cognitive contents from brain activities. However, improving the accuracy of brain decoding remains difficult due to the unique characteristics of the brain, such as the small sample size and high dimensionality of brain activities. Therefore, this paper proposes a method that effectively uses multi-subject brain activities to improve brain decoding accuracy. Specifically, we distinguish between the shared information common to multi-subject brain activities and the individual information based on each subject's brain activities, and both types of information are used to decode human visual cognition. Both types of information are extracted as features belonging to a latent space using a probabilistic generative model. In the experiment, a publicly available dataset and five subjects were used, and the estimation accuracy was validated on the basis of a confidence score ranging from 0 to 1, where a larger value is better. The proposed method achieved a confidence score of 0.867 for the best subject and an average of 0.813 for the five subjects, which was the best compared to other methods. The experimental results show that the proposed method can decode visual cognition more accurately than existing methods in which the shared information is not distinguished from the individual information.


Subject(s)
Brain Mapping; Brain; Cognition; Humans; Magnetic Resonance Imaging/methods; Models, Statistical
20.
Sensors (Basel); 22(22), 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36433529

ABSTRACT

Distresses, such as cracks, directly reflect the structural integrity of subway tunnels. Therefore, the detection of subway tunnel distress is an essential task in tunnel structure maintenance. This paper presents the performance improvement of deep learning-based distress detection to support the maintenance of subway tunnels through a new data augmentation method, selective image cropping and patching (SICAP). Specifically, we generate effective data for training the distress detection model by focusing on the distressed regions via SICAP. After the data augmentation, we train a distress detection model using the expanded training data. The new image generated based on SICAP does not change the pixel values of the original image. Thus, there is little loss of information, and the generated images are effective in constructing a robust model for various subway tunnel lines. We conducted experiments with some comparative methods. The experimental results show that the detection performance can be improved by our data augmentation.


Subject(s)
Railroads