Pesquisa | Portal de Pesquisa da BVS

1.

An Interactive Image Segmentation Method Based on Multi-Level Semantic Fusion.

Zou, Ruirui; Wang, Qinghui; Wen, Falin; Chen, Yang; Liu, Jiale; Du, Shaoyi; Yuan, Chengzhi.

Sensors (Basel) ; 23(14)2023 Jul 14.

Artigo em Inglês | MEDLINE | ID: mdl-37514688

RESUMO

Understanding and analyzing 2D/3D sensor data is crucial for a wide range of machine learning-based applications, including object detection, scene segmentation, and salient object detection. In this context, interactive object segmentation is a vital task in image editing and medical diagnosis, involving the accurate separation of the target object from its background based on user annotation information. However, existing interactive object segmentation methods struggle to effectively leverage such information to guide object-segmentation models. To address these challenges, this paper proposes an interactive image-segmentation technique for static images based on multi-level semantic fusion. Our method utilizes user-guidance information both inside and outside the target object to segment it from the static image, making it applicable to both 2D and 3D sensor data. The proposed method introduces a cross-stage feature aggregation module, enabling the effective propagation of multi-scale features from previous stages to the current stage. This mechanism prevents the loss of semantic information caused by multiple upsampling and downsampling of the network, allowing the current stage to make better use of semantic information from the previous stage. Additionally, we incorporate a feature channel attention mechanism to address the issue of rough network segmentation edges. This mechanism captures richer feature details from the feature channel level, leading to finer segmentation edges. In the experimental evaluation conducted on the PASCAL Visual Object Classes (VOC) 2012 dataset, our proposed interactive image segmentation method based on multi-level semantic fusion demonstrates an intersection over union (IOU) accuracy approximately 2.1% higher than the currently popular interactive image segmentation method in static images. The comparative analysis highlights the improved performance and effectiveness of our method. Furthermore, our method exhibits potential applications in various fields, including medical imaging and robotics. Its compatibility with other machine learning methods for visual semantic analysis allows for integration into existing workflows. These aspects emphasize the significance of our contributions in advancing interactive image-segmentation techniques and their practical utility in real-world applications.

2.

A Heart Image Segmentation Method Based on Position Attention Mechanism and Inverted Pyramid.

Luo, Jinbin; Wang, Qinghui; Zou, Ruirui; Wang, Ying; Liu, Fenglin; Zheng, Haojie; Du, Shaoyi; Yuan, Chengzhi.

Sensors (Basel) ; 23(23)2023 Nov 23.

Artigo em Inglês | MEDLINE | ID: mdl-38067739

RESUMO

In the realm of modern medicine, medical imaging stands as an irreplaceable pillar for accurate diagnostics. The significance of precise segmentation in medical images cannot be overstated, especially considering the variability introduced by different practitioners. With the escalating volume of medical imaging data, the demand for automated and efficient segmentation methods has become imperative. This study introduces an innovative approach to heart image segmentation, embedding a multi-scale feature and attention mechanism within an inverted pyramid framework. Recognizing the intricacies of extracting contextual information from low-resolution medical images, our method adopts an inverted pyramid architecture. Through training with multi-scale images and integrating prediction outcomes, we enhance the network's contextual understanding. Acknowledging the consistent patterns in the relative positions of organs, we introduce an attention module enriched with positional encoding information. This module empowers the network to capture essential positional cues, thereby elevating segmentation accuracy. Our research resides at the intersection of medical imaging and sensor technology, emphasizing the foundational role of sensors in medical image analysis. The integration of sensor-generated data showcases the symbiotic relationship between sensor technology and advanced machine learning techniques. Evaluation on two heart datasets substantiates the superior performance of our approach. Metrics such as the Dice coefficient, Jaccard coefficient, recall, and F-measure demonstrate the method's efficacy compared to state-of-the-art techniques. In conclusion, our proposed heart image segmentation method addresses the challenges posed by diverse medical images, offering a promising solution for efficiently processing 2D/3D sensor data in contemporary medical imaging.

Assuntos

Benchmarking , Sinais (Psicologia) , Coração/diagnóstico por imagem , Aprendizado de Máquina , Tecnologia , Processamento de Imagem Assistida por Computador

3.

A Salient Object Detection Method Based on Boundary Enhancement.

Wen, Falin; Wang, Qinghui; Zou, Ruirui; Wang, Ying; Liu, Fenglin; Chen, Yang; Yu, Linghao; Du, Shaoyi; Yuan, Chengzhi.

Sensors (Basel) ; 23(16)2023 Aug 10.

Artigo em Inglês | MEDLINE | ID: mdl-37631615

RESUMO

Visual saliency refers to the human's ability to quickly focus on important parts of their visual field, which is a crucial aspect of image processing, particularly in fields like medical imaging and robotics. Understanding and simulating this mechanism is crucial for solving complex visual problems. In this paper, we propose a salient object detection method based on boundary enhancement, which is applicable to both 2D and 3D sensors data. To address the problem of large-scale variation of salient objects, our method introduces a multi-level feature aggregation module that enhances the expressive ability of fixed-resolution features by utilizing adjacent features to complement each other. Additionally, we propose a multi-scale information extraction module to capture local contextual information at different scales for back-propagated level-by-level features, which allows for better measurement of the composition of the feature map after back-fusion. To tackle the low confidence issue of boundary pixels, we also introduce a boundary extraction module to extract the boundary information of salient regions. This information is then fused with salient target information to further refine the saliency prediction results. During the training process, our method uses a mixed loss function to constrain the model training from two levels: pixels and images. The experimental results demonstrate that our salient target detection method based on boundary enhancement shows good detection effects on targets of different scales, multi-targets, linear targets, and targets in complex scenes. We compare our method with the best method in four conventional datasets and achieve an average improvement of 6.2% on the mean absolute error (MAE) indicators. Overall, our approach shows promise for improving the accuracy and efficiency of salient object detection in a variety of settings, including those involving 2D/3D semantic analysis and reconstruction/inpainting of image/video/point cloud data.

4.

An integrated AI model to improve diagnostic accuracy of ultrasound and output known risk features in suspicious thyroid nodules.

Wang, Juan; Jiang, Jue; Zhang, Dong; Zhang, Yao-Zhong; Guo, Long; Jiang, Yusheng; Du, Shaoyi; Zhou, Qi.

Eur Radiol ; 32(3): 2120-2129, 2022 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-34657970

RESUMO

OBJECTIVES: From the viewpoint of ultrasound (US) physicians, an ideal thyroid US computer-assisted diagnostic (CAD) system for thyroid cancer should perform well in suspicious thyroid nodules with atypical risk features and be able to output explainable results. This study aims to develop an explainable US CAD model for suspicious thyroid nodules. METHODS: A total of 2992 solid or almost-solid thyroid nodules were analyzed retrospectively. All nodules had pathological results (1070 malignancies and 1992 benignities) confirmed by ultrasound-guided fine-needle aspiration cytology and histopathology after thyroidectomy. A deep learning model (ResNet50) and a multiple risk features learning ensemble model (XGBoost) were used to train the US images of 2794 thyroid nodules. Then, an integrated AI model was generated by combining both models. The diagnostic accuracies of the three AI models (ResNet50, XGBoost, and the integrated model) were predicted in a testing set including 198 thyroid nodules and compared to the diagnostic efficacy of five ultrasonographers. RESULTS: The accuracy of the integrated model was 76.77%, while the mean accuracy of the ultrasonographers was 68.38%. Of the risk features, microcalcifications showed the highest contribution to the diagnosis of malignant nodules. CONCLUSIONS: The integrated AI model in our study can improve the diagnostic accuracy of suspicious thyroid nodules and output the known risk features simultaneously, thus aiding in training young ultrasonographers by linking the explainable results to their clinical experience and advancing the acceptance of AI diagnosis for thyroid cancer in clinical practice. KEY POINTS: â¢ We developed an artificial intelligence (AI) diagnosis model based on both deep learning and multiple risk feature ensemble learning methods. â¢ The AI diagnosis model showed higher diagnostic accuracy for suspicious thyroid nodules than ultrasonographers. â¢ The AI diagnosis model showed partial explainability by outputting the known risk features, thus aiding young ultrasonic doctors in increasing the diagnostic level for thyroid cancer.

Assuntos

Neoplasias da Glândula Tireoide , Nódulo da Glândula Tireoide , Inteligência Artificial , Humanos , Estudos Retrospectivos , Sensibilidade e Especificidade , Neoplasias da Glândula Tireoide/diagnóstico por imagem , Nódulo da Glândula Tireoide/diagnóstico por imagem , Ultrassonografia

5.

With or without human interference for precise age estimation based on machine learning?

Han, Mengqi; Du, Shaoyi; Ge, Yuyan; Zhang, Dong; Chi, Yuting; Long, Hong; Yang, Jing; Yang, Yang; Xin, Jingmin; Chen, Teng; Zheng, Nanning; Guo, Yu-Cheng.

Int J Legal Med ; 136(3): 821-831, 2022 May.

Artigo em Inglês | MEDLINE | ID: mdl-35157129

RESUMO

Age estimation can aid in forensic medicine applications, diagnosis, and treatment planning for orthodontics and pediatrics. Existing dental age estimation methods rely heavily on specialized knowledge and are highly subjective, wasting time, and energy, which can be perfectly solved by machine learning techniques. As the key factor affecting the performance of machine learning models, there are usually two methods for feature extraction: human interference and autonomous extraction without human interference. However, previous studies have rarely applied these two methods for feature extraction in the same image analysis task. Herein, we present two types of convolutional neural networks (CNNs) for dental age estimation. One is an automated dental stage evaluation model (ADSE model) based on specified manually defined features, and the other is an automated end-to-end dental age estimation model (ADAE model), which autonomously extracts potential features for dental age estimation. Although the mean absolute error (MAE) of the ADSE model for stage classification is 0.17 stages, its accuracy in dental age estimation is unsatisfactory, with the MAE (1.63 years) being only 0.04 years lower than the manual dental age estimation method (MDAE model). However, the MAE of the ADAE model is 0.83 years, being reduced by half that of the MDAE model. The results show that fully automated feature extraction in a deep learning model without human interference performs better in dental age estimation, prominently increasing the accuracy and objectivity. This indicates that without human interference, machine learning may perform better in the application of medical imaging.

Assuntos

Aprendizado de Máquina , Redes Neurais de Computação , Criança , Humanos , Processamento de Imagem Assistida por Computador , Lactente , Radiografia

6.

Accurate age classification using manual method and deep convolutional neural network based on orthopantomogram images.

Guo, Yu-Cheng; Han, Mengqi; Chi, Yuting; Long, Hong; Zhang, Dong; Yang, Jing; Yang, Yang; Chen, Teng; Du, Shaoyi.

Int J Legal Med ; 135(4): 1589-1597, 2021 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-33661340

RESUMO

Age estimation is an important challenge in many fields, including immigrant identification, legal requirements, and clinical treatments. Deep learning techniques have been applied for age estimation recently but lacking performance comparison between manual and machine learning methods based on a large sample of dental orthopantomograms (OPGs). In total, we collected 10,257 orthopantomograms for the study. We derived logistic regression linear models for each legal age threshold (14, 16, and 18 years old) for manual method and developed the end-to-end convolutional neural network (CNN) which classified the dental age directly to compare with the manual method. Both methods are based on left mandibular eight permanent teeth or the third molar separately. Our results show that compared with the manual methods (92.5%, 91.3%, and 91.8% for age thresholds of 14, 16, and 18, respectively), the end-to-end CNN models perform better (95.9%, 95.4%, and 92.3% for age thresholds of 14, 16, and 18, respectively). This work proves that CNN models can surpass humans in age classification, and the features extracted by machines may be different from that defined by human.

Assuntos

Determinação da Idade pelos Dentes/métodos , Aprendizado de Máquina , Redes Neurais de Computação , Adolescente , Criança , Pré-Escolar , Feminino , Humanos , Masculino , Radiografia Panorâmica , Adulto Jovem

7.

Lip morphology estimation models based on three-dimensional images in a modern adult population from China.

Zhao, Jia-Min; Ji, Ling-Ling; Han, Meng-Qi; Mou, Qing-Nan; Chu, Guang; Chen, Teng; Du, Shao-Yi; Hou, Yu-Xia; Guo, Yu-Cheng.

Int J Legal Med ; 135(5): 1887-1901, 2021 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-33760976

RESUMO

Lips are the main part of the lower facial soft tissue and are vital to forensic facial approximation (FFA). Facial soft tissue thickness (FSTT) and linear measurements in three dimensions are used in the quantitative analysis of lip morphology. With most FSTT analysis methods, the surface of soft tissue is unexplicit. Our study aimed to determine FSTT and explore the relationship between the hard and soft tissues of lips in different skeletal occlusions based on cone-beam CT (CBCT) and 3dMD images in a Chinese population. The FSTT of 11 landmarks in CBCT and 29 lip measurements in CBCT and 3dMD of 180 healthy Chinese individuals (90 males, 90 females) between 18 and 30 years were analyzed. The subjects were randomly divided into two groups with different skeletal occlusions distributed equally: 156 subjects in the experimental group to establish the prediction regression formulae of lip morphology and 24 subjects in the test group to assess the accuracy of the formulae. The results indicated that FSTT in the lower lip region varied among different skeletal occlusions. Furthermore, sex discrepancy was noted in the FSTT in midline landmarks and linear measurements. Measurements showing the highest correlation between soft and hard tissues were between total upper lip height and Ns-Pr (0.563 in males, 0.651 in females). The stepwise multiple regression equations were verified to be reliable with an average error of 1.246 mm. The method of combining CBCT with 3dMD provides a new perspective in predicting lip morphology and expands the database for FFA.

Assuntos

Tomografia Computadorizada de Feixe Cônico , Imageamento Tridimensional , Lábio/anatomia & histologia , Lábio/diagnóstico por imagem , Adulto , Pontos de Referência Anatômicos , Povo Asiático/etnologia , Pesos e Medidas Corporais , Face/anatomia & histologia , Face/diagnóstico por imagem , Feminino , Humanos , Masculino , Análise de Regressão , Reprodutibilidade dos Testes , Adulto Jovem

8.

Three-dimensional prediction of nose morphology in Chinese young adults: a pilot study combining cone-beam computed tomography and 3dMD photogrammetry system.

Chu, Guang; Zhao, Jia-Min; Han, Meng-Qi; Mou, Qing-Nan; Ji, Ling-Ling; Zhou, Hong; Chen, Teng; Du, Shao-Yi; Guo, Yu-Cheng.

Int J Legal Med ; 134(5): 1803-1816, 2020 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-32647961

RESUMO

The nose is the most prominent part of the face and is a crucial factor for facial esthetics as well as facial reconstruction. Although some studies have explored the features of external nose and predicted the relationships between skeletal structures and soft tissues in the nasal region, the reliability and applicability of methods used in previous studies have not been reproduced. In addition, the majority of previous studies have focused on the sagittal direction, whereas the thickness of the soft tissues was rarely analyzed in three dimensions. A few studies have explained the specific characteristics of the nose of Chinese individuals. The aim of this study was to investigate the relationship between the hard nasal structures and soft external nose in three dimensions and to predict the morphology of the nose based on hard-tissue measurements. To eliminate the influence of low resolution of CBCT and increase the accuracy of measurement, three-dimensional (3D) images captured by cone-beam computed tomography (CBCT) and 3dMD photogrammetry system were used in this study. Twenty-six measurements (15 measurements for hard tissue and 11 measurements for soft tissue) based on 5 craniometric and 5 capulometric landmarks of the nose of 120 males and 120 females were obtained. All of the subjects were randomly divided into an experimental group (180 subjects consisting of 90 males and 90 females) and a test group (60 subjects consisting of 30 males and 30 females). Correlation coefficients between hard- and soft-tissue measurements were analyzed, and regression equations were obtained based on the experimental group and served as predictors to estimate nasal morphology in the test group. Most hard- and soft-tissue measurements appeared significantly different between genders. The strongest correlation was found between basis nasi protrusion and nasospinale protrusion (0.499) in males, and nasal height and nTr-nsTr (0.593) in females. For the regression equations, the highest value of R2 was observed in the nasal bridge length in males (0.257) and nasal tip protrusion in females (0.389). The proportion of subjects with predicted errors < 10% was over 86.7% in males and 70.0% in females. Our study proved that a combined CBCT and 3dMD photogrammetry system is a reliable method for nasal morphology estimation. Further research should investigate other influencing factors such as age, skeletal types, facial proportions, or population variance in nasal morphology estimation.

Assuntos

Tomografia Computadorizada de Feixe Cônico/métodos , Face/anatomia & histologia , Face/diagnóstico por imagem , Imageamento Tridimensional , Nariz/anatomia & histologia , Nariz/diagnóstico por imagem , Fotogrametria/métodos , Adulto , Pontos de Referência Anatômicos , Povo Asiático/etnologia , Cefalometria , Feminino , Ciências Forenses , Humanos , Masculino , Projetos Piloto , Reprodutibilidade dos Testes , Adulto Jovem

9.

Correntropy Based Divided Difference Filtering for the Positioning of Ships.

Liu, Xi; Chen, Badong; Wang, Shiyuan; Du, Shaoyi.

Sensors (Basel) ; 18(11)2018 Nov 21.

Artigo em Inglês | MEDLINE | ID: mdl-30469453

RESUMO

In this paper, robust first and second-order divided difference filtering algorithms based on correntropy are proposed, which not only retain the advantages of divided difference filters, but also exhibit robustness in the presence of non-Gaussian noises, especially when the measurements are contaminated by heavy-tailed noises. The proposed filters are then applied to the problem of ship positioning. In order to improve the accuracy and reliability of ship positioning, the positioning method combines the Dead Reckoning (DR) algorithm and the Global Positioning System (GPS). Experimental results of an illustrative example show the superior performance of the new algorithms when applied to ship positioning.

10.

Hypergraph-Based Multi-View Action Recognition Using Event Cameras.

Gao, Yue; Lu, Jiaxuan; Li, Siqi; Li, Yipeng; Du, Shaoyi.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38536691

RESUMO

Action recognition from video data forms a cornerstone with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in multi-view event data exploitation, particularly in challenges like information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features is established. The vertex attention hypergraph propagation is also introduced for enhanced feature fusion. To prompt research in this area, we present the largest multi-view event-based action dataset THUMV-EACT-50, comprising 50 actions from 6 viewpoints, which surpasses existing datasets by over tenfold. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state-of-the-arts in frame-based multi-view action recognition.

11.

GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation.

Gao, Xuehao; Yang, Yang; Xie, Zhenyu; Du, Shaoyi; Sun, Zhongqian; Wu, Yang.

IEEE Trans Vis Comput Graph ; PP2024 Jan 15.

Artigo em Inglês | MEDLINE | ID: mdl-38224502

RESUMO

In this paper, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (GUESS as its abbreviation). The strategy sets up generation objectives by grouping body joints of detailed skeletons in close semantic proximity together and then replacing each of such joint group with a single body-part node. Such an operation recursively abstracts a human pose to coarser and coarser skeletons at multiple granularity levels. Notably, we further integrate GUESS with the proposed dynamic multi-condition fusion mechanism to dynamically balance the cooperative effects of the given textual condition and synthesized coarse motion prompt in different generation stages. Extensive experiments on large-scale datasets verify that GUESS outperforms existing state-of-the-art methods by large margins in terms of accuracy, realisticness, and diversity. Please refer to the supplemental demo video for more visualizations.

12.

Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction.

Gao, Xuehao; Yang, Yang; Wu, Yang; Du, Shaoyi; Qi, Guo-Jun.

IEEE Trans Image Process ; 33: 3907-3920, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38900622

RESUMO

Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing one's intention. While many fruitful efforts have been made to human motion prediction, most approaches focus on pose-driven prediction and inferring human motion in isolation from the contextual environment, thus leaving the body location movement in the scene behind. However, real-world human movements are goal-directed and highly influenced by the spatial layout of their surrounding scenes. In this paper, instead of planning future human motion in a "dark" room, we propose a Multi-Condition Latent Diffusion network (MCLD) that reformulates the human motion prediction task as a multi-condition joint inference problem based on the given historical 3D body motion and the current 3D scene contexts. Specifically, instead of directly modeling joint distribution over the raw motion sequences, MCLD performs a conditional diffusion process within the latent embedding space, characterizing the cross-modal mapping from the past body movement and current scene context condition embeddings to the future human motion embedding. Extensive experiments on large-scale human motion prediction datasets demonstrate that our MCLD achieves significant improvements over the state-of-the-art methods on both realistic and diverse predictions.

Assuntos

Movimento , Humanos , Movimento/fisiologia , Algoritmos , Redes Neurais de Computação , Gravação em Vídeo/métodos , Imageamento Tridimensional/métodos , Processamento de Imagem Assistida por Computador/métodos

13.

IA-LSTM: Interaction-Aware LSTM for Pedestrian Trajectory Prediction.

Yang, Jing; Chen, Yuehai; Du, Shaoyi; Chen, Badong; Principe, Jose C.

IEEE Trans Cybern ; 54(7): 3904-3917, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-38381633

RESUMO

Predicting the trajectory of pedestrians in crowd scenarios is indispensable in self-driving or autonomous mobile robot field because estimating the future locations of pedestrians around is beneficial for policy decision to avoid collision. It is a challenging issue because humans have different walking motions, and the interactions between humans and objects in the current environment, especially between humans themselves, are complex. Previous researchers focused on how to model human-human interactions but neglected the relative importance of interactions. To address this issue, a novel mechanism based on correntropy is introduced. The proposed mechanism not only can measure the relative importance of human-human interactions but also can build personal space for each pedestrian. An interaction module, including this data-driven mechanism, is further proposed. In the proposed module, the data-driven mechanism can effectively extract the feature representations of dynamic human-human interactions in the scene and calculate the corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, an interaction-aware architecture based on long short-term memory network for trajectory prediction is designed. Experiments are conducted on two public datasets. Experimental results demonstrate that our model can achieve better performance than several latest methods with good performance.

14.

Learning Discriminative Features for Crowd Counting.

Chen, Yuehai; Wang, Qingzhong; Yang, Jing; Chen, Badong; Xiong, Haoyi; Du, Shaoyi.

IEEE Trans Image Process ; 33: 3749-3764, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38848225

RESUMO

Crowd counting models in highly congested areas confront two main challenges: weak localization ability and difficulty in differentiating between foreground and background, leading to inaccurate estimations. The reason is that objects in highly congested areas are normally small and high-level features extracted by convolutional neural networks are less discriminative to represent small objects. To address these problems, we propose a learning discriminative features framework for crowd counting, which is composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM). The MPM randomly masks feature vectors in the feature map and then reconstructs them, allowing the model to learn about what is present in the masked regions and improving the model's ability to localize objects in high-density regions. The CLM pulls targets close to each other and pushes them far away from background in the feature space, enabling the model to discriminate foreground objects from background. Additionally, the proposed modules can be beneficial in various computer vision tasks, such as crowd counting and object detection, where dense scenes or cluttered environments pose challenges to accurate localization. The proposed two modules are plug-and-play, incorporating the proposed modules into existing models can potentially boost their performance in these scenarios.

15.

Generative Variational-Contrastive Learning for Self-Supervised Point Cloud Representation.

Wang, Bohua; Tian, Zhiqiang; Ye, Aixue; Wen, Feng; Du, Shaoyi; Gao, Yue.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Mar 19.

Artigo em Inglês | MEDLINE | ID: mdl-38502626

RESUMO

Self-supervised representation learning for 3D point clouds has attracted increasing attention. However, existing methods in the field of 3D computer vision generally use fixed embeddings to represent the latent features, and impose hard constraints on the embeddings to make the latent feature values of the positive samples converge to consistency, which limits the ability of feature extractors to generalize over different data domains. To address this issue, we propose a Generative Variational-Contrastive Learning (GVC) model, where Gaussian distribution is used to construct a continuous, smoothed representation of the latent features. A distribution constraint and cross-supervision are constructed to improve the transfer ability of the feature extractor over synthetic and real-world data. Specifically, we design a variational contrastive module to constrain the feature distribution instead of feature values corresponding to each sample in the latent space. Moreover, a generative cross-supervision module is introduced to preserve the invariance features and promote the consistency of feature distribution among positive samples. Experimental results demonstrate that GVC achieves SOTA on different downstream tasks. In particular, with only pre-training on the synthetic dataset, GVC achieves a lead of 8.4% and 14.2% when transferring to the real-world dataset in the linear classification and few-shot classification.

16.

Hypergraph-Based Multi-Modal Representation for Open-Set 3D Object Retrieval.

Feng, Yifan; Ji, Shuyi; Liu, Yu-Shen; Du, Shaoyi; Dai, Qionghai; Gao, Yue.

IEEE Trans Pattern Anal Mach Intell ; 46(4): 2206-2223, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-37966934

RESUMO

The traditional 3D object retrieval (3DOR) task is under the close-set setting, which assumes the categories of objects in the retrieval stage are all seen in the training stage. Existing methods under this setting may tend to only lazily discriminate their categories, while not learning a generalized 3D object embedding. Under such circumstances, it is still a challenging and open problem in real-world applications due to the existence of various unseen categories. In this paper, we first introduce the open-set 3DOR task to expand the applications of the traditional 3DOR task. Then, we propose the Hypergraph-Based Multi-Modal Representation (HGM 2 R) framework to learn 3D object embeddings from multi-modal representations under the open-set setting. The proposed framework is composed of two modules, i.e., the Multi-Modal 3D Object Embedding (MM3DOE) module and the Structure-Aware and Invariant Knowledge Learning (SAIKL) module. By utilizing the collaborative information of modalities derived from the same 3D object, the MM3DOE module is able to overcome the distinction across different modality representations and generate unified 3D object embeddings. Then, the SAIKL module utilizes the constructed hypergraph structure to model the high-order correlation among 3D objects from both seen and unseen categories. The SAIKL module also includes a memory bank that stores typical representations of 3D objects. By aligning with those memory anchors in the memory bank, the aligned embeddings can integrate the invariant knowledge to exhibit a powerful generalized capacity toward unseen categories. We formally prove that hypergraph modeling has better representative capability on data correlation than graph modeling. We generate four multi-modal datasets for the open-set 3DOR task, i.e., OS-ESB-core, OS-NTU-core, OS-MN40-core, and OS-ABO-core, in which each 3D object contains three modality representations: multi-view, point clouds, and voxel. Experiments on these four datasets show that the proposed method can significantly outperform existing methods. In particular, the proposed method outperforms the state-of-the-art by 12.12%/12.88% in terms of mAP on the OS-MN40-core/OS-ABO-core dataset, respectively. Results and visualizations demonstrate that the proposed method can effectively extract the generalized 3D object embeddings on the open-set 3DOR task and achieve satisfactory performance.

17.

Omni-dimensional dynamic convolution feature coordinate attention network for pneumonia classification.

Li, Yufei; Xin, Yufei; Li, Xinni; Zhang, Yinrui; Liu, Cheng; Cao, Zhengwen; Du, Shaoyi; Wang, Lin.

Vis Comput Ind Biomed Art ; 7(1): 17, 2024 Jul 08.

Artigo em Inglês | MEDLINE | ID: mdl-38976189

RESUMO

Pneumonia is a serious disease that can be fatal, particularly among children and the elderly. The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging. This study proposes X-ODFCANet, which addresses the issues of low accuracy and excessive parameters in existing deep-learning-based pneumonia-classification methods. This network incorporates a feature coordination attention module and an omni-dimensional dynamic convolution (ODConv) module, leveraging the residual module for feature extraction from X-ray images. The feature coordination attention module utilizes two one-dimensional feature encoding processes to aggregate feature information from different spatial directions. Additionally, the ODConv module extracts and fuses feature information in four dimensions: the spatial dimension of the convolution kernel, input and output channel quantities, and convolution kernel quantity. The experimental results demonstrate that the proposed method can effectively improve the accuracy of pneumonia classification, which is 3.77% higher than that of ResNet18. The model parameters are 4.45M, which was reduced by approximately 2.5 times. The code is available at https://github.com/limuni/X-ODFCANET .

18.

Glimpse and focus: Global and local-scale graph convolution network for skeleton-based action recognition.

Gao, Xuehao; Du, Shaoyi; Yang, Yang.

Neural Netw ; 167: 551-558, 2023 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-37696072

RESUMO

In the 3D skeleton-based action recognition task, learning rich spatial and temporal motion patterns from body joints are two foundational yet under-explored problems. In this paper, we propose two methods for improving these problems: (I) a novel glimpse-focus action recognition strategy that captures multi-range pose features from the whole body and key body parts jointly; (II) a powerful temporal feature extractor JD-TC that enriches trajectory features by inferring different inter-frame correlations for different joints. By coupling these two proposals, we develop a powerful skeleton-based action recognition system that extracts rich pose and trajectory features from a skeleton sequence and outperforms previous state-of-the-art methods on three large-scale datasets.

Assuntos

Aprendizagem , Esqueleto , Movimento (Física) , Reconhecimento Psicológico

19.

Epileptic seizure detection with deep EEG features by convolutional neural network and shallow classifiers.

Zeng, Wei; Shan, Liangmin; Su, Bo; Du, Shaoyi.

Front Neurosci ; 17: 1145526, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37284662

RESUMO

Introduction: In the clinical setting, it becomes increasingly important to detect epileptic seizures automatically since it could significantly reduce the burden for the care of patients suffering from intractable epilepsy. Electroencephalography (EEG) signals record the brain's electrical activity and contain rich information about brain dysfunction. As a non-invasive and inexpensive tool for detecting epileptic seizures, visual evaluation of EEG recordings is labor-intensive and subjective and requires significant improvement. Methods: This study aims to develop a new approach to recognize seizures automatically using EEG recordings. During feature extraction of EEG input from raw data, we construct a new deep neural network (DNN) model. Deep feature maps derived from layers placed hierarchically in a convolution neural network are put into different kinds of shallow classifiers to detect the anomaly. Feature maps are reduced in dimensionality using Principal Component Analysis (PCA). Results: By analyzing the EEG Epilepsy dataset and the Bonn dataset for epilepsy, we conclude that our proposed method is both effective and robust. These datasets vary significantly in the acquisition of data, the formulation of clinical protocols, and the storage of digital information, making processing and analysis challenging. On both datasets, extensive experiments are performed using a cross-validation by 10 folds strategy to demonstrate approximately 100% accuracy for binary and multi-category classification. Discussion: In addition to demonstrating that our methodology outperforms other up-to-date approaches, the results of this study also suggest that it can be applied in clinical practice as well.

20.

Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition.

Gao, Xuehao; Yang, Yang; Wu, Yang; Du, Shaoyi.

IEEE Trans Neural Netw Learn Syst ; PP2023 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-37030786

RESUMO

Graph convolution networks (GCNs) have been widely used and achieved fruitful progress in the skeleton-based action recognition task. In GCNs, node interaction modeling dominates the context aggregation and, therefore, is crucial for a graph-based convolution kernel to extract representative features. In this article, we introduce a closer look at a powerful graph convolution formulation to capture rich movement patterns from these skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered as the middle ground between the extremes of (2 + 1)-D and 3-D graph convolution. The core observation of HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Considering the HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (i.e., intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we propose a strong human action recognition system that outperforms state-of-the-art methods with the accuracy of 93.1% on NTU-60 cross-subject (X-Sub) benchmark, 88.9% on NTU-120 X-Sub benchmark, and 38.4% on kinetics skeleton.

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA