Results 1 - 20 of 30
1.
Sensors (Basel) ; 24(5)2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38475109

ABSTRACT

Micro-expressions, which are spontaneous and difficult to suppress, reveal a person's true emotions. They are characterized by short duration and low intensity, making micro-expression recognition a challenging task in the field of emotion computing. In recent years, deep learning-based feature extraction and fusion techniques have been widely used for micro-expression recognition, and methods based on the Vision Transformer in particular have gained popularity. However, the Vision Transformer-based architectures used in micro-expression recognition involve a significant amount of invalid computation. Additionally, in the traditional two-stream architecture, although separate streams are combined through late fusion, only the output features from the deepest level of the network are utilized for classification, limiting the network's ability to capture subtle details due to the lack of fine-grained information. To address these issues, we propose a new two-stream architecture with two-level spatio-temporal feature fusion. This architecture includes a spatial encoder (a modified ResNet) for learning texture features of the face, a temporal encoder (Swin Transformer) for learning facial muscle motion features, a feature fusion algorithm for integrating multi-level spatio-temporal features, a classification head, and a weighted average operator for temporal aggregation. The two-stream architecture has the advantage of extracting richer features than a single-stream architecture, leading to improved performance. The shifted window scheme of the Swin Transformer restricts self-attention computation to non-overlapping local windows while allowing cross-window connections, significantly improving performance and reducing computation compared to the Vision Transformer. Moreover, the modified ResNet is computationally less intensive. Our proposed feature fusion algorithm leverages the similarity in output feature shapes at each stage of the two streams, enabling the effective fusion of multi-level spatio-temporal features, and improves both the F1 score and the UAR by approximately 4%. Comprehensive evaluations conducted on three widely used spontaneous micro-expression datasets (SMIC-HS, CASME II, and SAMM) consistently demonstrate the superiority of our approach over comparative methods. Notably, our approach achieves a UAR exceeding 0.905 on CASME II, making it one of the few frameworks in the published micro-expression recognition literature to achieve such high performance.
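
The stage-wise fusion idea can be summarized in a few lines of PyTorch. This is a minimal sketch under assumed shapes, not the authors' code: it merges same-shaped stage outputs of the two streams with 1x1 convolutions, in the spirit of the abstract's point about exploiting similar output feature shapes at each stage.

```python
import torch
import torch.nn as nn

class TwoLevelFusion(nn.Module):
    """Fuse same-shaped stage outputs from a spatial and a temporal stream."""
    def __init__(self, channels):
        super().__init__()
        # one 1x1 conv per stage to merge the concatenated stream features
        self.mergers = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in channels
        )

    def forward(self, spatial_feats, temporal_feats):
        # each argument: list of (B, C_i, H_i, W_i) tensors, one per stage
        return [
            m(torch.cat([s, t], dim=1))
            for m, s, t in zip(self.mergers, spatial_feats, temporal_feats)
        ]

# toy check with two stages of assumed widths
f = TwoLevelFusion([64, 128])
s = [torch.randn(2, 64, 28, 28), torch.randn(2, 128, 14, 14)]
t = [torch.randn(2, 64, 28, 28), torch.randn(2, 128, 14, 14)]
print([x.shape for x in f(s, t)])
```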


Subject(s)
Algorithms; Electric Power Supplies; Humans; Emotions; Light; Muscles
2.
Sensors (Basel) ; 24(9)2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38732808

ABSTRACT

Currently, surface EMG signals have a wide range of applications in human-computer interaction systems. However, selecting features for gesture recognition models based on traditional machine learning can be challenging and may not yield satisfactory results. Considering the strong nonlinear generalization ability of neural networks, this paper proposes a two-stream residual network model with an attention mechanism for gesture recognition. One branch processes surface EMG signals, while the other processes hand acceleration signals. Segmented networks are utilized to fully extract the physiological and kinematic features of the hand. To enhance the model's capacity to learn crucial information, we introduce an attention mechanism after global average pooling. This mechanism strengthens relevant features and weakens irrelevant ones. Finally, the deep features obtained from the two branches of learning are fused to further improve the accuracy of multi-gesture recognition. The experiments conducted on the NinaPro DB2 public dataset resulted in a recognition accuracy of 88.25% for 49 gestures. This demonstrates that our network model can effectively capture gesture features, enhancing accuracy and robustness across various gestures. This approach to multi-source information fusion is expected to provide more accurate and real-time commands for exoskeleton robots and myoelectric prosthetic control systems, thereby enhancing the user experience and the naturalness of robot operation.
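
The attention step after global average pooling is in the spirit of a squeeze-and-excitation gate. Below is a minimal sketch with assumed channel counts and window lengths, not the paper's exact design (in practice each branch would likely get its own attention module).

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style gate: global average pool -> small MLP -> sigmoid reweighting."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (B, C, T) feature sequence
        w = self.fc(x.mean(dim=-1))      # squeeze over time -> (B, C)
        return x * w.unsqueeze(-1)       # strengthen/weaken channels

# two branches: sEMG window and hand-acceleration window (shapes assumed)
emg, acc = torch.randn(8, 64, 200), torch.randn(8, 64, 200)
att = ChannelAttention(64)
fused = torch.cat([att(emg).mean(-1), att(acc).mean(-1)], dim=1)  # (8, 128)
logits = nn.Linear(128, 49)(fused)       # 49 NinaPro DB2 gesture classes
```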


Subject(s)
Electromyography; Gestures; Neural Networks, Computer; Humans; Electromyography/methods; Signal Processing, Computer-Assisted; Pattern Recognition, Automated/methods; Acceleration; Algorithms; Hand/physiology; Machine Learning; Biomechanical Phenomena/physiology
3.
Sensors (Basel) ; 24(14)2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39066043

ABSTRACT

Human activity recognition (HAR) is pivotal in advancing applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, primarily relying on single data sources, face limitations in capturing the full spectrum of human activities. This study introduces a comprehensive approach to HAR by integrating two critical modalities: RGB imaging and advanced pose estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose estimation techniques for refined feature extraction. The integration of these modalities is facilitated through advanced fusion algorithms, significantly improving recognition accuracy. Extensive experiments conducted on the UTD multimodal human action dataset (UTD-MHAD) demonstrate that the proposed approach outperforms existing state-of-the-art algorithms. This study not only sets a new benchmark for HAR systems but also highlights the importance of feature engineering in capturing the complexity of human movements and the integration of optimal features. Our findings pave the way for more sophisticated, reliable, and applicable HAR systems in real-world scenarios.
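
The integration of the two modalities can be approximated by simple score-level late fusion. The paper's fusion algorithms are more elaborate, so treat this as an illustrative baseline; the weight parameter is a hypothetical knob.

```python
import torch
import torch.nn.functional as F

def late_fuse(rgb_logits, pose_logits, w_rgb=0.5):
    """Weighted average of class probabilities from the RGB and pose streams."""
    p_rgb = F.softmax(rgb_logits, dim=1)
    p_pose = F.softmax(pose_logits, dim=1)
    return w_rgb * p_rgb + (1.0 - w_rgb) * p_pose

probs = late_fuse(torch.randn(4, 27), torch.randn(4, 27))  # UTD-MHAD has 27 actions
pred = probs.argmax(dim=1)
```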


Subject(s)
Algorithms; Human Activities; Humans; Image Processing, Computer-Assisted/methods; Movement/physiology; Posture/physiology; Pattern Recognition, Automated/methods
4.
Sensors (Basel) ; 23(12)2023 Jun 20.
Article in English | MEDLINE | ID: mdl-37420932

ABSTRACT

Defect inspection is important to ensure consistent quality and efficiency in industrial manufacturing. Recently, machine vision systems integrating artificial intelligence (AI)-based inspection algorithms have exhibited promising performance in various applications, but practically, they often suffer from data imbalance. This paper proposes a defect inspection method using a one-class classification (OCC) model to deal with imbalanced datasets. A two-stream network architecture consisting of global and local feature extractor networks is presented, which can alleviate the representation collapse problem of OCC. By combining an object-oriented invariant feature vector with a training-data-oriented local feature vector, the proposed two-stream network model prevents the decision boundary from collapsing to the training dataset and obtains an appropriate decision boundary. The performance of the proposed model is demonstrated in the practical application of automotive-airbag bracket-welding defect inspection. The effects of the classification layer and two-stream network architecture on the overall inspection accuracy were clarified by using image samples collected in a controlled laboratory environment and from a production site. The results are compared with those of a previous classification model, demonstrating that the proposed model can improve the accuracy, precision, and F1 score by up to 8.19%, 10.74%, and 4.02%, respectively.
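
The idea of combining an object-oriented global feature with a training-data-oriented local feature before a one-class decision can be sketched with a Deep SVDD-style distance score. The names, the hypersphere center, and the threshold rule here are illustrative assumptions, not the paper's classification layer.

```python
import torch

def occ_score(global_feat, local_feat, center):
    """Anomaly score: squared distance of the combined embedding to a
    hypersphere center fitted on normal (defect-free) training samples."""
    z = torch.cat([global_feat, local_feat], dim=1)
    return ((z - center) ** 2).sum(dim=1)

g, l = torch.randn(16, 128), torch.randn(16, 128)
center = torch.zeros(256)            # in practice: mean embedding of the normal set
scores = occ_score(g, l, center)
is_defect = scores > scores.mean()   # threshold would be chosen on validation data
```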


Subject(s)
Artificial Intelligence; Rivers; Algorithms
5.
Sensors (Basel) ; 23(11)2023 May 26.
Article in English | MEDLINE | ID: mdl-37299818

ABSTRACT

Changes in pig behavior are crucial information in the livestock breeding process, and automatic pig behavior recognition is a vital method for improving pig welfare. However, most methods for pig behavior recognition rely on human observation or deep learning. Human observation is often time-consuming and labor-intensive, while deep learning models with a large number of parameters can result in slow training times and low efficiency. To address these issues, this paper proposes a novel deep mutual learning enhanced two-stream pig behavior recognition approach. The proposed model consists of two mutual learning networks: an RGB stream and an optical-flow stream. Additionally, each branch contains two student networks that learn collaboratively to extract robust and rich appearance or motion features, ultimately improving the recognition performance of pig behaviors. Finally, the results of the RGB and flow branches are weighted and fused to further improve the performance of pig behavior recognition. Experimental results demonstrate the effectiveness of the proposed model, which achieves state-of-the-art recognition performance with an accuracy of 96.52%, surpassing other models by 2.71%.
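
Deep mutual learning trains the two student networks in a branch to fit the labels while mimicking each other's predictions. A minimal sketch of the paired losses follows; this is generic DML, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dml_losses(logits_a, logits_b, labels):
    """Each student minimizes cross-entropy plus a KL term pulling its
    distribution toward its peer's (peer treated as a fixed target)."""
    mimic_a = F.kl_div(F.log_softmax(logits_a, dim=1),
                       F.softmax(logits_b, dim=1).detach(), reduction="batchmean")
    mimic_b = F.kl_div(F.log_softmax(logits_b, dim=1),
                       F.softmax(logits_a, dim=1).detach(), reduction="batchmean")
    loss_a = F.cross_entropy(logits_a, labels) + mimic_a
    loss_b = F.cross_entropy(logits_b, labels) + mimic_b
    return loss_a, loss_b

la, lb = dml_losses(torch.randn(4, 5), torch.randn(4, 5), torch.tensor([0, 1, 2, 3]))
```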


Subject(s)
Breeding; Neural Networks, Computer; Humans; Swine; Animals; Motion (Physics)
6.
Sensors (Basel) ; 23(3)2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36772327

ABSTRACT

Generalization has always been a keyword in deep learning. Pretrained models and domain adaptation techniques have received widespread attention as ways to improve generalization; both focus on finding features in data that improve generalization ability and prevent overfitting. Although they have achieved good results on various tasks, such models are unstable when classifying a sentence whose label is positive but which still contains negative phrases. In this article, we analyzed the attention heat maps of the benchmarks and found that previous models pay more attention to individual phrases than to the semantic information of the whole sentence. We therefore propose a method that scatters the attention away from opposite-sentiment words to avoid one-sided judgments. We designed a two-stream network and stacked a gradient reversal layer and a feature projection layer within the auxiliary network. The gradient reversal layer reverses the gradient of features in the training stage, so that the parameters are optimized following the reversed gradient during backpropagation. We utilized the auxiliary network to extract backward features and then fed them into the main network, merging them with the normal features extracted by the main network. We applied this method to three baselines, TextCNN, BERT, and RoBERTa, on sentiment analysis and sarcasm detection datasets. The results show that our method improves performance by 0.5% on the sentiment analysis datasets and by 2.1% on the sarcasm detection datasets.
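
A gradient reversal layer is the identity in the forward pass and negates (and scales) the gradient in the backward pass. A standard PyTorch implementation looks like this; the scaling factor lambda is a hyperparameter assumed here.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; multiplies the incoming gradient by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

x = torch.randn(3, 8, requires_grad=True)
grad_reverse(x, 1.0).sum().backward()
print(x.grad[0, :3])  # all -1: the gradient of sum() has been reversed
```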

7.
Sensors (Basel) ; 23(5)2023 Mar 03.
Article in English | MEDLINE | ID: mdl-36904990

ABSTRACT

Because of societal changes, human activity recognition, as part of home care systems, has become increasingly important. Camera-based recognition is mainstream but raises privacy concerns and is less accurate under dim lighting. In contrast, radar sensors do not record sensitive information, avoid invasions of privacy, and work in poor lighting, but the collected data are often sparse. To address this issue, we propose MTGEA, a novel multimodal two-stream GNN framework for efficient point cloud and skeleton data alignment, which improves recognition accuracy through accurate skeletal features from Kinect models. We first collected two datasets using the mmWave radar and Kinect v4 sensors. Second, we used zero-padding, Gaussian noise (GN), and agglomerative hierarchical clustering (AHC) to increase the number of collected point clouds to 25 per frame to match the skeleton data. Third, we used the Spatial Temporal Graph Convolutional Network (ST-GCN) architecture to acquire multimodal representations in the spatio-temporal domain, focusing on skeletal features. Finally, we implemented an attention mechanism that aligns the two multimodal features to capture the correlation between point clouds and skeleton data. The resulting model was evaluated empirically on human activity data and shown to improve human activity recognition with radar data only. All datasets and code are available in our GitHub repository.
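
Of the three point-count normalization tools mentioned, zero-padding is the simplest to illustrate. This sketch pads (or subsamples) each radar frame to exactly 25 points, with the GN and AHC variants omitted.

```python
import numpy as np

def fix_point_count(points, target=25):
    """Pad a sparse radar frame with zero points (or subsample a dense one)
    so every frame has exactly `target` points to match the skeleton joints."""
    n = points.shape[0]
    if n >= target:
        idx = np.random.choice(n, target, replace=False)
        return points[idx]
    pad = np.zeros((target - n, points.shape[1]), dtype=points.dtype)
    return np.vstack([points, pad])

frame = np.random.randn(9, 3)        # 9 detected (x, y, z) points in one frame
print(fix_point_count(frame).shape)  # (25, 3)
```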

8.
Sensors (Basel) ; 22(2)2022 Jan 13.
Article in English | MEDLINE | ID: mdl-35062558

ABSTRACT

In the field of video action classification, existing network frameworks often use only video frames as input. When the object involved in the action does not appear in a prominent position in the video frame, the network cannot classify it accurately. We introduce a new neural network structure that uses sound to assist in processing such tasks. The original sound wave is converted into a sound texture as the input of the network. Furthermore, in order to use the rich modal information (images and sound) in the video, we designed and used a two-stream framework. In this work, we assume that sound data can be used to help solve action recognition tasks. To demonstrate this, we designed a neural network based on sound texture to perform video action classification. We then fuse this network with a deep neural network that uses continuous video frames, constructing a two-stream network called A-IN. Finally, on the Kinetics dataset, we compare our proposed A-IN with the image-only network. The experimental results show that the recognition accuracy of the two-stream neural network model using sound data features is 7.6% higher than that of the network using video frames alone. This proves that rational use of the rich information in the video can improve the classification effect.
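
One common way to turn a raw waveform into a texture-like 2D input is a log-mel spectrogram. Whether this matches the paper's exact sound-texture representation is an assumption; the sketch only shows the general waveform-to-image step.

```python
import torch
import torchaudio

# a stand-in for the sound-texture input: a log-mel spectrogram
wave = torch.randn(1, 16000)  # 1 s of 16 kHz audio (random placeholder)
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)(wave)
logmel = torchaudio.transforms.AmplitudeToDB()(mel)  # (1, 64, frames)
# logmel can be fed to the audio stream as a 2D "image" of the sound
print(logmel.shape)
```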


Subject(s)
Neural Networks, Computer; Pattern Recognition, Automated; Sound
9.
Sensors (Basel) ; 22(16)2022 Aug 09.
Article in English | MEDLINE | ID: mdl-36015719

ABSTRACT

The Convolutional Neural Network (CNN) has demonstrated excellent performance in image recognition and has brought new opportunities for sign language recognition. However, the features undergo many nonlinear transformations during the convolutional operation, and traditional CNN models are insufficient in dealing with the correlation between images. In American Sign Language (ASL) recognition, the letters J and Z, which involve moving gestures, pose recognition challenges. This paper proposes a novel Two-Stream Mixed (TSM) method with feature extraction and fusion operations to improve the correlation of feature expression between two time-consecutive images for dynamic gestures. The proposed TSM-CNN system is composed of preprocessing, the TSM block, and CNN classifiers. Two consecutive images of a dynamic gesture are used as the inputs of the two streams, and resizing, transformation, and augmentation are carried out in the preprocessing stage. The fusion feature map obtained by addition and concatenation in the TSM block is used as the input of the classifier, which then produces the prediction. The TSM-CNN model with the highest performance among three concatenation methods is selected as the definitive recognition model for ASL recognition. We design four CNN models with TSM: TSM-LeNet, TSM-AlexNet, TSM-ResNet18, and TSM-ResNet50. The experimental results show that CNN models with the TSM outperform models without it. TSM-ResNet50 achieves the best accuracy of 97.57% on the MNIST and ASL datasets and can be applied to an RGB image sensing system for hearing-impaired people.
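
One plausible reading of "addition and concatenation" fusion over two consecutive frames is sketched below; the channel counts and the 1x1 merge are illustrative assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class TSMBlock(nn.Module):
    """Mix the features of two consecutive frames by element-wise addition
    and channel concatenation, then merge back with a 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, f_t, f_t1):  # features of frame t and frame t+1
        mixed = torch.cat([f_t, f_t1, f_t + f_t1], dim=1)
        return self.merge(mixed)

block = TSMBlock(32)
out = block(torch.randn(1, 32, 56, 56), torch.randn(1, 32, 56, 56))
print(out.shape)  # (1, 32, 56, 56)
```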


Subject(s)
Neural Networks, Computer; Sign Language; Gestures; Humans
10.
Sensors (Basel) ; 21(3)2021 Jan 29.
Article in English | MEDLINE | ID: mdl-33572928

ABSTRACT

In recent years, human detection in indoor scenes has been widely applied in smart buildings and smart security, but many related challenges can still be difficult to address, such as frequent occlusion, low illumination and multiple poses. This paper proposes an asymmetric adaptive fusion two-stream network (AAFTS-net) for RGB-D human detection. This network can fully extract person-specific depth features and RGB features while reducing the typical complexity of a two-stream network. A depth feature pyramid is constructed by combining contextual information, with the motivation of combining multiscale depth features to improve the adaptability for targets of different sizes. An adaptive channel weighting (ACW) module weights the RGB-D feature channels to achieve efficient feature selection and information complementation. This paper also introduces a novel RGB-D dataset for human detection called RGBD-human, on which we verify the performance of the proposed algorithm. The experimental results show that AAFTS-net outperforms existing state-of-the-art methods and can maintain stable performance under conditions of frequent occlusion, low illumination and multiple poses.


Subject(s)
Algorithms; Human Activities; Humans; Neural Networks, Computer
11.
Sensors (Basel) ; 20(23)2020 Dec 04.
Article in English | MEDLINE | ID: mdl-33291759

ABSTRACT

Detecting key frames in videos is a common problem in many applications such as video classification, action recognition, and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time consuming but also prone to subjective errors and inconsistencies between labelers. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action and VSUMM video summarization datasets demonstrate the effectiveness of our proposed method.
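
As a crude stand-in for the learned appearance-and-motion uniqueness score, frames can be ranked by simple inter-frame motion energy. This heuristic only illustrates the scoring idea, not the proposed ConvNet or its automatic annotator.

```python
import cv2
import numpy as np

def motion_energy_keyframes(video_path, top_k=5):
    """Score each frame by its mean absolute difference to the previous frame
    and return the indices of the top-k most 'unique' frames."""
    cap = cv2.VideoCapture(video_path)
    prev, scores = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append(0.0 if prev is None else float(np.mean(cv2.absdiff(gray, prev))))
        prev = gray
    cap.release()
    return np.argsort(scores)[-top_k:][::-1]
```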

12.
Sensors (Basel) ; 20(4)2020 Feb 17.
Article in English | MEDLINE | ID: mdl-32079299

ABSTRACT

The detection of pig behavior helps identify abnormal conditions such as diseases and dangerous movements in a timely and effective manner, which plays an important role in ensuring the health and well-being of pigs. Monitoring pig behavior by staff is time consuming, subjective, and impractical, so there is an urgent need for methods that identify pig behavior automatically. In recent years, deep learning has been gradually applied to the study of pig behavior recognition. Existing studies judge pig behavior based only on the posture of the pig in a still image frame, without considering the motion information of the behavior, even though optical flow reflects this motion information well. This study therefore took image frames and optical flow from videos as the two-stream inputs to fully extract the temporal and spatial behavioral characteristics. Two-stream convolutional network models based on deep learning were proposed, including the inflated 3D ConvNet (I3D) and temporal segment networks (TSN) whose feature extraction network is a Residual Network (ResNet) or an Inception architecture (e.g., Inception with Batch Normalization (BN-Inception), InceptionV3, InceptionV4, or InceptionResNetV2), to achieve pig behavior recognition. A standard pig behavior video dataset was created, containing 1000 videos of five different behavioral actions of pigs (feeding, lying, walking, scratching, and mounting) under natural conditions. The dataset was used to train and test the proposed models, and a series of comparative experiments were conducted. The experimental results showed that the TSN model with ResNet101 as its feature extraction network recognized pig feeding, lying, walking, scratching, and mounting behaviors with a high average accuracy of 98.99%, and the average recognition time per video was 0.3163 s. The TSN model (ResNet101) is thus superior to the other models for the task of pig behavior recognition.
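
The optical-flow input of the temporal stream is typically computed with a dense method such as Farneback's. A minimal OpenCV sketch follows; the parameter values are common defaults, not necessarily the study's settings.

```python
import cv2
import numpy as np

def flow_stack(frames):
    """Dense optical flow between consecutive grayscale frames, stacked as
    a temporal-stream input (2 channels per frame pair: dx, dy)."""
    flows = []
    for a, b in zip(frames[:-1], frames[1:]):
        flows.append(cv2.calcOpticalFlowFarneback(
            a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0))
    return np.stack(flows)                      # (T-1, H, W, 2)

frames = [np.random.randint(0, 255, (120, 160), np.uint8) for _ in range(5)]
print(flow_stack(frames).shape)                 # (4, 120, 160, 2)
```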


Subject(s)
Behavior, Animal/physiology; Deep Learning; Neural Networks, Computer; Video Recording; Animals; Humans; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Swine/physiology
13.
Entropy (Basel) ; 22(6)2020 Jun 21.
Article in English | MEDLINE | ID: mdl-33286467

ABSTRACT

Biometric recognition methods often use characteristics such as the human face, iris, fingerprint, and palm print; however, such images often become blurred in the complex underground environment, which leads to low identification rates for underground coal mine personnel. A gait recognition method via similarity learning, named the Two-Stream neural network (TS-Net), is proposed based on a densely connected convolutional network (DenseNet) and a stacked convolutional autoencoder (SCAE). The mainstream network, based on DenseNet, is mainly used to learn the similarity of dynamic deep features containing spatiotemporal information in the gait pattern. The auxiliary stream network, based on SCAE, is used to learn the similarity of static invariant features containing physiological information. Moreover, a novel feature fusion method is adopted to achieve the fusion and representation of dynamic and static features. The extracted features are robust to angle, clothing, miner hats, waterproof shoes, and carrying conditions. The method was evaluated on the challenging CASIA-B gait dataset and a collected gait dataset of underground coal mine personnel (UCMP-GAIT). Experimental results show that the method is effective and feasible for gait recognition of underground coal mine personnel. Moreover, compared with other gait recognition methods, recognition accuracy is significantly improved.
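
The similarity-learning step can be sketched as cosine similarity between fused dynamic and static embeddings. The plain concatenation here stands in for the paper's novel fusion method, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def gait_similarity(dyn_a, sta_a, dyn_b, sta_b):
    """Cosine similarity between two fused (dynamic + static) gait embeddings."""
    emb_a = F.normalize(torch.cat([dyn_a, sta_a], dim=1), dim=1)
    emb_b = F.normalize(torch.cat([dyn_b, sta_b], dim=1), dim=1)
    return (emb_a * emb_b).sum(dim=1)  # in [-1, 1]; high = likely same person

s = gait_similarity(torch.randn(2, 128), torch.randn(2, 64),
                    torch.randn(2, 128), torch.randn(2, 64))
print(s)
```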

14.
Entropy (Basel) ; 22(10)2020 Oct 14.
Article in English | MEDLINE | ID: mdl-33286922

ABSTRACT

Dissimilar flows can be compared by exploiting the fact that all flux densities divided by their conjugate volume densities form velocity fields, which have been described as generalized winds. These winds are an extension of the classical notion of wind in fluids which puts these distinct processes on a common footing, leading to thermodynamical implications. This paper extends this notion from fluids to radiative transfer in the context of a classical two-stream atmosphere, leading to such velocities for radiative energy and entropy. These are shown in this paper to exhibit properties for radiation previously only thought of in terms of fluids, such as the matching of velocity fields where entropy production stops.
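
In symbols, a generalized wind is a flux density divided by its conjugate volume density; applied to a two-stream (upward/downward) radiation field, this gives velocities for radiant energy and entropy. The notation below is assumed here, not taken from the paper.

```latex
% generalized wind: flux density divided by its conjugate volume density
v_X = \frac{j_X}{\rho_X}
% radiant energy (net flux from up/down streams, energy density u)
% and radiant entropy (entropy fluxes, entropy density s):
v_E = \frac{F^{\uparrow} - F^{\downarrow}}{u}, \qquad
v_S = \frac{L^{\uparrow} - L^{\downarrow}}{s}
```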

15.
Sensors (Basel) ; 18(2)2018 Feb 20.
Article in English | MEDLINE | ID: mdl-29461473

ABSTRACT

The paper presents the emerging issue of fine-grained pedestrian action recognition, which enables advanced pre-crash safety systems to estimate a pedestrian's intention in advance. Fine-grained pedestrian actions include visually slight differences (e.g., walking straight and crossing), which are difficult to distinguish from each other. It is believed that fine-grained action recognition enables pedestrian intention estimation for helpful advanced driver-assistance systems (ADAS). The following difficulties have been studied to achieve fine-grained and accurate pedestrian action recognition: (i) in order to analyze the fine-grained motion of a pedestrian appearing in a vehicle-mounted drive recorder, a method to describe the subtle changes in motion characteristics occurring in a short time is necessary; (ii) even when the background moves greatly due to the driving of the vehicle, it is necessary to detect changes in the subtle motion of the pedestrian; and (iii) the collection of large-scale fine-grained actions is very difficult, so a relatively small database must suffice. We investigate how to learn an effective recognition model with only a small-scale database, and we have thoroughly evaluated several types of configurations to explore an effective approach to fine-grained pedestrian action recognition without a large-scale database. Moreover, two different datasets were collected in order to raise the issue. Finally, our proposal attained 91.01% on the National Traffic Science and Environment Laboratory database (NTSEL) and 53.23% on the near-miss driving recorder database (NDRDB), improving on the baseline two-stream fusion ConvNets by +8.28% and +6.53%, respectively.


Subject(s)
Automobile Driving; Databases, Factual; Pedestrians; Safety; Accidents, Traffic; Humans; Time Factors; Video Recording; Walking
16.
J Environ Manage ; 127: 300-7, 2013 Sep 30.
Article in English | MEDLINE | ID: mdl-23792881

ABSTRACT

Several studies have tried to understand the mechanisms and effects of radiative transfer under different night-sky conditions. However, most of these studies are limited to the various effects of visible spectra, even though the invisible parts of the electromagnetic spectrum can pose a more profound threat to nature. One visible threat is what is popularly termed skyglow, caused by injudiciously situated or designed artificial night lighting systems that degrade desired sky viewing. Since lamp emissions are not limited to visible electromagnetic spectra, it is necessary to consider the complete spectrum of such lamps in order to understand the physical behaviour of diffuse radiation at terrain level. In this paper, the downward diffuse radiative flux is computed in a two-stream approximation, and the obtained ultraviolet spectral radiative fluxes are inter-related with luminous fluxes. Such a method then permits an estimate of ultraviolet radiation if the traditionally measured illuminance on a horizontal plane is available. The utility of such a comparison of two spectral bands is shown using the different lamp types employed in street lighting. The data demonstrate that it is insufficient to specify lamp type and its visible flux production independently of each other. UV emissions also have to be considered by modellers and environmental scientists, because some light sources can be fairly important pollutants in the near ultraviolet and can affect both living organisms and the ambient environment.
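
Restated schematically: if the installed lamp type fixes the ratio of ultraviolet to luminous output, then routinely measured horizontal illuminance scales directly to a UV estimate. The symbols below are assumptions, not the paper's notation.

```latex
% r: spectrum-dependent ratio of a lamp's UV flux to its luminous flux
F_{\mathrm{UV}} \approx r \, E_v, \qquad
r = \frac{\Phi_{\mathrm{UV}}}{\Phi_{v}}
```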


Subject(s)
Environmental Exposure/analysis; Light; Ultraviolet Rays; Animals; Darkness; Models, Theoretical
17.
Biomed Phys Eng Express ; 9(4)2023 06 13.
Article in English | MEDLINE | ID: mdl-37276847

ABSTRACT

The blood flow velocity in the nailfold capillaries is an important indicator of the status of the microcirculation. The conventional manual processing method is both laborious and prone to human artifacts. A feasible way to solve this problem is to use machine learning to assist in image processing and diagnosis. Inspired by Two-Stream Convolutional Networks, this study proposes an optical-flow-assisted two-stream network to segment nailfold blood vessels. First, we use U-Net as the spatial stream and dense optical flow as the temporal stream. The results show that the optical flow information can effectively improve the completeness of blood vessel segmentation: the overall accuracy is 94.01%, the Dice score is 0.8099, the IoU score is 0.6806, and the VOE score is 0.3194. Second, the flow velocity in the segmented blood vessels is determined by constructing a spatial-temporal (ST) image. The evaluated blood flow velocity is consistent with typical reported blood flow speeds. This study proposes a novel two-stream network for blood vessel segmentation in nailfold capillary images; combined with the ST image and a line detection method, it provides an effective workflow for measuring the blood flow velocity of nailfold capillaries.
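
Once the ST image is built, the slope of its streaks gives the velocity after unit conversion. This sketch replaces the paper's line detection with a simple brightest-pixel fit; function names and units are illustrative assumptions.

```python
import numpy as np

def st_image_velocity(st, fps, um_per_px):
    """Estimate flow speed from a space-time (ST) image: fit the dominant
    streak slope via the brightest pixel per time row, then convert units."""
    pos = st.argmax(axis=1).astype(float)        # brightest position per frame
    t = np.arange(len(pos))
    slope_px_per_frame = np.polyfit(t, pos, 1)[0]
    return slope_px_per_frame * fps * um_per_px  # micrometres per second

# synthetic ST image whose streak moves 2 px per frame -> 90 um/s at 30 fps
st = np.zeros((50, 100))
st[np.arange(50), 2 * np.arange(50)] = 1.0
print(st_image_velocity(st, fps=30, um_per_px=1.5))
```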


Subject(s)
Capillaries; Optic Flow; Humans; Capillaries/physiology; Rivers; Microcirculation; Image Processing, Computer-Assisted/methods
18.
J Ambient Intell Humaniz Comput ; 14(6): 7733-7745, 2023.
Article in English | MEDLINE | ID: mdl-37228698

ABSTRACT

The outbreak of COVID-19 (also known as coronavirus) has put the entire world at risk. The disease first appeared in Wuhan, China, and later spread to other countries, taking the form of a pandemic. In this paper, we try to build an artificial intelligence (AI) powered framework, called Flu-Net, to identify flu-like symptoms (an important symptom of COVID-19) in people and limit the spread of infection. Our approach is based on the application of human action recognition in surveillance systems, where videos captured by closed-circuit television (CCTV) cameras are processed through state-of-the-art deep learning techniques to recognize activities like coughing and sneezing. The proposed framework has three major steps. First, to suppress irrelevant background details in an input video, a frame difference operation is performed to extract foreground motion information. Second, a two-stream heterogeneous network based on 2D and 3D Convolutional Neural Networks (ConvNets) is trained using the RGB frame differences. Third, the features extracted from both streams are combined using a Grey Wolf Optimization (GWO) based feature selection technique. The experiments conducted on the BII Sneeze-Cough (BIISC) video dataset show that our framework can achieve 70% accuracy, outperforming the baseline results by more than 8%.
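
GWO itself is a population-based optimizer in which candidate solutions ("wolves") move toward the three best solutions found so far. Below is a minimal continuous version on a toy objective; a feature-selection use would instead score a thresholded feature mask. This is the generic algorithm, not the paper's exact selection scheme.

```python
import numpy as np

def gwo(fitness, dim, n_wolves=10, iters=50, lb=-1.0, ub=1.0, seed=0):
    """Minimal Grey Wolf Optimizer: wolves move toward the three best
    solutions (alpha, beta, delta) with a shrinking exploration factor a."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))
    for it in range(iters):
        order = np.argsort([fitness(x) for x in X])
        alpha, beta, delta = X[order[:3]]     # three current leaders (copies)
        a = 2.0 * (1 - it / iters)            # decreases linearly from 2 to 0
        for i in range(n_wolves):
            new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - X[i])
            X[i] = np.clip(new / 3.0, lb, ub)
    return X[np.argsort([fitness(x) for x in X])[0]]

# toy objective; feature selection would score a thresholded mask instead
best = gwo(lambda x: float(np.sum(x ** 2)), dim=5)
print(best)
```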

19.
Front Neurosci ; 17: 1212049, 2023.
Article in English | MEDLINE | ID: mdl-37397450

ABSTRACT

Introduction: The human brain processes shape and texture information separately, through different neurons in the visual system. In intelligent computer-aided imaging diagnosis, pre-trained feature extractors are commonly used in various medical image recognition methods. However, common pre-training datasets such as ImageNet tend to improve the texture representation of a model while leading it to ignore many shape features. Weak shape feature representation is disadvantageous for tasks that focus on shape features in medical image analysis. Methods: Inspired by the function of neurons in the human brain, we propose a shape-and-texture-biased two-stream network to enhance shape feature representation in knowledge-guided medical image analysis. First, the shape-biased stream and the texture-biased stream of the two-stream network are constructed through classification and segmentation multi-task joint learning. Second, we propose pyramid-grouped convolution to enhance texture feature representation and introduce deformable convolution to enhance shape feature extraction. Third, we use a channel-attention-based feature selection module during shape and texture feature fusion to focus on the key features and eliminate the information redundancy caused by fusion. Finally, to address the difficulty of model optimization caused by the imbalance between benign and malignant samples in medical images, an asymmetric loss function is introduced to improve the robustness of the model. Results and conclusion: We applied our method to the melanoma recognition task on the ISIC-2019 and XJTU-MM datasets, which focus on both the texture and shape of lesions. The experimental results on dermoscopic and pathological image recognition datasets show that the proposed method outperforms the compared algorithms, proving the effectiveness of our method.
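
Deformable convolution lets a small side network predict per-location sampling offsets, so the kernel can follow lesion shape. torchvision ships an implementation; the minimal usage below is a generic sketch, not the paper's exact configuration.

```python
import torch
from torchvision.ops import DeformConv2d

# a plain conv predicts (dx, dy) offsets for each of the 3x3 kernel taps
x = torch.randn(1, 16, 32, 32)
offset_net = torch.nn.Conv2d(16, 2 * 3 * 3, kernel_size=3, padding=1)
deform = DeformConv2d(16, 32, kernel_size=3, padding=1)
out = deform(x, offset_net(x))  # (1, 32, 32, 32); sampling adapts to content
print(out.shape)
```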

20.
Zool Res ; 44(5): 967-980, 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37721106

ABSTRACT

Video-based action recognition is becoming a vital tool in clinical research and neuroscientific study for disorder detection and prediction. However, action recognition currently used in non-human primate (NHP) research relies heavily on intense manual labor and lacks standardized assessment. In this work, we established two standard benchmark datasets of NHPs in the laboratory: MonkeyinLab (MiL), which includes 13 categories of actions and postures, and MiL2D, which includes sequences of two-dimensional (2D) skeleton features. Furthermore, based on recent methodological advances in deep learning and skeleton visualization, we introduced the MonkeyMonitorKit (MonKit) toolbox for automatic action recognition, posture estimation, and identification of fine motor activity in monkeys. Using the datasets and MonKit, we evaluated the daily behaviors of wild-type cynomolgus monkeys within their home cages and experimental environments and compared these observations with the behaviors exhibited by cynomolgus monkeys possessing mutations in the MECP2 gene as a disease model of Rett syndrome (RTT). MonKit was used to assess motor function, stereotyped behaviors, and depressive phenotypes, with the outcomes compared with human manual detection. MonKit established consistent criteria for identifying behavior in NHPs with high accuracy and efficiency, thus providing a novel and comprehensive tool for assessing phenotypic behavior in monkeys.


Subject(s)
Deep Learning; Animals; Macaca fascicularis; Skeleton; Mutation; Phenotype