Results 1 - 20 of 57
1.
Vis Comput Ind Biomed Art ; 7(1): 17, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976189

ABSTRACT

Pneumonia is a serious disease that can be fatal, particularly among children and the elderly. The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging. This study proposes X-ODFCANet, which addresses the issues of low accuracy and excessive parameter counts in existing deep-learning-based pneumonia-classification methods. The network incorporates a feature coordination attention module and an omni-dimensional dynamic convolution (ODConv) module, leveraging the residual module for feature extraction from X-ray images. The feature coordination attention module uses two one-dimensional feature-encoding processes to aggregate feature information from different spatial directions. Additionally, the ODConv module extracts and fuses feature information in four dimensions: the spatial dimension of the convolution kernel, the numbers of input and output channels, and the number of convolution kernels. Experimental results demonstrate that the proposed method effectively improves pneumonia-classification accuracy, which is 3.77% higher than that of ResNet18, while the model has only 4.45M parameters, a reduction of approximately 2.5 times. The code is available at https://github.com/limuni/X-ODFCANET.
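The feature coordination attention described here resembles coordinate attention: two one-dimensional poolings encode each spatial direction, and the resulting maps reweight the input. Below is a minimal PyTorch sketch of that idea; the channel sizes, reduction ratio, and mean-pooling choice are illustrative assumptions, not the authors' exact design.

```python
# Minimal coordinate-attention-style block (illustrative sketch).
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Two one-dimensional encodings: pool along width and along height.
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)        # joint encoding of directions
        y = self.act(self.bn(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # (n, c, 1, w)
        return x * a_h * a_w    # reweight features by both directions

x = torch.randn(2, 64, 56, 56)
print(CoordAttention(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```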

2.
IEEE Trans Image Process ; 33: 3907-3920, 2024.
Article in English | MEDLINE | ID: mdl-38900622

ABSTRACT

Inferring 3D human motion is fundamental to many applications, including understanding human activity and analyzing intention. While many fruitful efforts have been made in human motion prediction, most approaches focus on pose-driven prediction and infer human motion in isolation from the contextual environment, thus ignoring how the body moves through the scene. However, real-world human movements are goal-directed and highly influenced by the spatial layout of the surrounding scene. In this paper, instead of planning future human motion in a "dark" room, we propose a Multi-Condition Latent Diffusion network (MCLD) that reformulates human motion prediction as a multi-condition joint inference problem based on the given historical 3D body motion and the current 3D scene context. Specifically, instead of directly modeling the joint distribution over raw motion sequences, MCLD performs a conditional diffusion process within a latent embedding space, characterizing the cross-modal mapping from the past body movement and current scene context condition embeddings to the future human motion embedding. Extensive experiments on large-scale human motion prediction datasets demonstrate that MCLD achieves significant improvements over state-of-the-art methods on both realistic and diverse predictions.
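To make the multi-condition latent diffusion idea concrete, here is a minimal PyTorch sketch of one training step: a denoiser conditioned on past-motion and scene embeddings predicts the noise added to a future-motion latent. The MLP denoiser, cosine noise schedule, and all shapes are illustrative assumptions, not the paper's architecture.

```python
# One training step of a multi-condition latent diffusion objective (sketch).
import torch
import torch.nn as nn

class MultiConditionDenoiser(nn.Module):
    def __init__(self, latent_dim=128, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2 * cond_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t, t, motion_cond, scene_cond):
        # Concatenate noisy latent, timestep, and the two condition embeddings.
        t_emb = t.float().unsqueeze(-1) / 1000.0
        h = torch.cat([z_t, t_emb, motion_cond, scene_cond], dim=-1)
        return self.net(h)  # predicted noise epsilon

denoiser = MultiConditionDenoiser()
z0 = torch.randn(4, 128)            # clean future-motion latent
motion_cond = torch.randn(4, 128)   # past 3D body motion embedding
scene_cond = torch.randn(4, 128)    # current 3D scene embedding
t = torch.randint(0, 1000, (4,))
alpha_bar = torch.cos(t.float() / 1000 * torch.pi / 2) ** 2  # cosine schedule
eps = torch.randn_like(z0)
z_t = alpha_bar.sqrt().unsqueeze(-1) * z0 + (1 - alpha_bar).sqrt().unsqueeze(-1) * eps
loss = ((denoiser(z_t, t, motion_cond, scene_cond) - eps) ** 2).mean()
print(loss.item())
```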


Subjects
Movement; Humans; Movement/physiology; Algorithms; Neural Networks, Computer; Video Recording/methods; Imaging, Three-Dimensional/methods; Image Processing, Computer-Assisted/methods
3.
IEEE Trans Image Process ; 33: 3749-3764, 2024.
Article in English | MEDLINE | ID: mdl-38848225

ABSTRACT

Crowd counting models in highly congested areas confront two main challenges, weak localization ability and difficulty differentiating foreground from background, which lead to inaccurate estimations. The reason is that objects in highly congested areas are normally small, and the high-level features extracted by convolutional neural networks are less discriminative for representing small objects. To address these problems, we propose a discriminative-feature learning framework for crowd counting, composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM). The MPM randomly masks feature vectors in the feature map and then reconstructs them, allowing the model to learn what is present in the masked regions and improving its ability to localize objects in high-density regions. The CLM pulls targets close to each other and pushes them far away from the background in the feature space, enabling the model to discriminate foreground objects from background. The proposed modules can benefit various computer vision tasks, such as crowd counting and object detection, where dense scenes or cluttered environments pose challenges to accurate localization. Both modules are plug-and-play, and incorporating them into existing models can potentially boost performance in these scenarios.
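A minimal PyTorch sketch of the masked-feature-prediction idea behind the MPM: feature vectors are randomly masked and a small reconstructor is supervised only at the masked locations. The convolutional reconstructor and mask ratio are illustrative assumptions.

```python
# Masked feature prediction loss (illustrative sketch).
import torch
import torch.nn as nn

def masked_feature_prediction_loss(feat: torch.Tensor,
                                   reconstructor: nn.Module,
                                   mask_ratio: float = 0.3) -> torch.Tensor:
    n, c, h, w = feat.shape
    mask = (torch.rand(n, 1, h, w, device=feat.device) < mask_ratio).float()
    masked = feat * (1.0 - mask)     # zero out masked feature vectors
    recon = reconstructor(masked)    # predict the missing vectors
    # Supervise only at masked locations.
    return ((recon - feat) ** 2 * mask).sum() / (mask.sum() * c + 1e-6)

reconstructor = nn.Conv2d(256, 256, kernel_size=3, padding=1)
feat = torch.randn(2, 256, 32, 32)
print(masked_feature_prediction_loss(feat, reconstructor).item())
```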

4.
Article in English | MEDLINE | ID: mdl-38536691

ABSTRACT

Action recognition from video data forms a cornerstone of computer vision with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint, whereas multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advances in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in multi-view event data exploitation, particularly regarding challenges like information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, it establishes a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features. A vertex-attention hypergraph propagation scheme is also introduced for enhanced feature fusion. To prompt research in this area, we present the largest multi-view event-based action dataset, THUMV-EACT-50, comprising 50 actions from 6 viewpoints, which surpasses existing datasets by over tenfold. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.
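A minimal sketch of the KNN-based hyperedge construction step, assuming each vertex (a view/time segment embedding) forms one hyperedge with its k nearest neighbors; the incidence-matrix layout is a common hypergraph convention rather than the authors' exact implementation.

```python
# KNN-based hyperedge construction over segment features (sketch).
import torch

def knn_hyperedges(features: torch.Tensor, k: int = 4) -> torch.Tensor:
    """features: (n, d) vertex embeddings -> (n, n) incidence matrix H,
    where H[v, e] = 1 if vertex v belongs to hyperedge e."""
    dist = torch.cdist(features, features)          # pairwise distances
    knn = dist.topk(k + 1, largest=False).indices   # self + k neighbors
    n = features.shape[0]
    H = torch.zeros(n, n)
    for e in range(n):                              # one hyperedge per vertex
        H[knn[e], e] = 1.0
    return H

feats = torch.randn(12, 64)   # e.g., 6 viewpoints x 2 temporal segments
H = knn_hyperedges(feats, k=3)
print(H.shape, H.sum(dim=0))  # each hyperedge contains k+1 vertices
```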

5.
Article in English | MEDLINE | ID: mdl-38502626

ABSTRACT

Self-supervised representation learning for 3D point clouds has attracted increasing attention. However, existing methods in 3D computer vision generally use fixed embeddings to represent latent features and impose hard constraints on the embeddings to force the latent feature values of positive samples to converge, which limits the ability of feature extractors to generalize across data domains. To address this issue, we propose a Generative Variational-Contrastive learning (GVC) model, in which a Gaussian distribution is used to construct a continuous, smoothed representation of the latent features. A distribution constraint and cross-supervision are constructed to improve the transferability of the feature extractor between synthetic and real-world data. Specifically, we design a variational contrastive module that constrains the feature distribution, rather than the feature values, corresponding to each sample in the latent space. Moreover, a generative cross-supervision module is introduced to preserve invariant features and promote consistency of the feature distributions among positive samples. Experimental results demonstrate that GVC achieves state-of-the-art performance on different downstream tasks. In particular, with pre-training on a synthetic dataset only, GVC leads by 8.4% and 14.2% when transferring to a real-world dataset for linear classification and few-shot classification, respectively.
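One way to read the distribution constraint is as a KL-based alignment between the Gaussian latents of positive pairs, plus a prior term that keeps the representation smooth. The sketch below follows that reading; the KL formulation, weighting, and shapes are illustrative assumptions, not the paper's exact objective.

```python
# Variational-contrastive objective over Gaussian latents (sketch).
import torch

def kl_gaussians(mu1, logvar1, mu2, logvar2):
    # KL( N(mu1, var1) || N(mu2, var2) ), summed over feature dims.
    return 0.5 * (logvar2 - logvar1
                  + (logvar1.exp() + (mu1 - mu2) ** 2) / logvar2.exp()
                  - 1.0).sum(dim=-1)

def variational_contrastive_loss(mu_a, logvar_a, mu_b, logvar_b, beta=0.01):
    # Pull the two views' latent distributions toward each other...
    align = kl_gaussians(mu_a, logvar_a, mu_b, logvar_b).mean()
    # ...and keep both close to the N(0, I) prior for smoothness.
    zeros = torch.zeros_like(mu_a)
    prior = (kl_gaussians(mu_a, logvar_a, zeros, zeros).mean()
             + kl_gaussians(mu_b, logvar_b, zeros, zeros).mean())
    return align + beta * prior

mu_a, lv_a = torch.randn(8, 128), torch.randn(8, 128)
mu_b, lv_b = torch.randn(8, 128), torch.randn(8, 128)
print(variational_contrastive_loss(mu_a, lv_a, mu_b, lv_b).item())
```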

6.
IEEE Trans Cybern ; 54(7): 3904-3917, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38381633

ABSTRACT

Predicting pedestrian trajectories in crowded scenarios is indispensable in self-driving and autonomous mobile robotics, because estimating the future locations of nearby pedestrians informs policy decisions that avoid collisions. The problem is challenging because humans have different walking motions, and the interactions between humans and objects in the current environment, especially between humans themselves, are complex. Previous researchers focused on how to model human-human interactions but neglected their relative importance. To address this issue, a novel mechanism based on correntropy is introduced. The proposed mechanism can both measure the relative importance of human-human interactions and build a personal space for each pedestrian. An interaction module including this data-driven mechanism is further proposed. In the proposed module, the data-driven mechanism effectively extracts feature representations of dynamic human-human interactions in the scene and calculates the corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, an interaction-aware architecture based on a long short-term memory network is designed for trajectory prediction. Experiments conducted on two public datasets demonstrate that our model outperforms several recent strong methods.
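Correntropy with a Gaussian kernel gives exactly this kind of importance weighting: the kernel value is near 1 for close agents and decays smoothly with distance, implicitly defining a personal space. A minimal sketch, assuming relative positions as the input and a hand-picked bandwidth:

```python
# Correntropy-style (Gaussian-kernel) interaction weights (sketch).
import torch

def correntropy_weights(positions: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """positions: (n, 2) pedestrian coordinates -> (n, n) interaction weights.
    Close pedestrians get weights near 1; distant ones decay toward 0,
    implicitly carving out a personal space around each person."""
    diff = positions.unsqueeze(0) - positions.unsqueeze(1)   # (n, n, 2)
    sq_dist = (diff ** 2).sum(dim=-1)
    return torch.exp(-sq_dist / (2.0 * sigma ** 2))          # Gaussian kernel

pos = torch.tensor([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
print(correntropy_weights(pos))  # near 1 for the close pair, ~0 for the far one
```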

7.
Article in English | MEDLINE | ID: mdl-38224502

ABSTRACT

In this paper, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis that exploits a strategy named GradUally Enriching SyntheSis (GUESS). The strategy sets up generation objectives by grouping together body joints of detailed skeletons that are in close semantic proximity and then replacing each such joint group with a single body-part node. This operation recursively abstracts a human pose into coarser and coarser skeletons at multiple granularity levels. Notably, we further integrate GUESS with a proposed dynamic multi-condition fusion mechanism to dynamically balance the cooperative effects of the given textual condition and the synthesized coarse motion prompt at different generation stages. Extensive experiments on large-scale datasets verify that GUESS outperforms existing state-of-the-art methods by large margins in terms of accuracy, realism, and diversity. Please refer to the supplemental demo video for more visualizations.
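A minimal sketch of the coarse-to-fine pose abstraction: semantically close joints are grouped and each group is replaced by a single body-part node (here, the group mean). The 22-joint layout and grouping below are illustrative assumptions, not the paper's exact mapping.

```python
# Recursive pose abstraction into coarser skeletons (sketch).
import torch

GROUPS = [
    [0, 1, 2, 3],        # torso + head
    [4, 5, 6],           # left arm
    [7, 8, 9],           # right arm
    [10, 11, 12, 13],    # left leg
    [14, 15, 16, 17],    # right leg
    [18, 19, 20, 21],    # end effectors
]

def abstract_pose(joints: torch.Tensor) -> torch.Tensor:
    """joints: (..., 22, 3) -> (..., 6, 3) coarser skeleton of body-part nodes."""
    parts = [joints[..., idx, :].mean(dim=-2) for idx in GROUPS]
    return torch.stack(parts, dim=-2)

pose = torch.randn(4, 22, 3)                  # a batch of detailed skeletons
coarse = abstract_pose(pose)                  # one granularity level coarser
coarser = coarse.mean(dim=-2, keepdim=True)   # coarsest: whole-body node
print(coarse.shape, coarser.shape)            # (4, 6, 3) (4, 1, 3)
```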

8.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 2206-2223, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37966934

ABSTRACT

The traditional 3D object retrieval (3DOR) task operates under a close-set setting, which assumes that all object categories encountered in the retrieval stage were seen during training. Existing methods under this setting tend to lazily discriminate categories rather than learn a generalized 3D object embedding, and real-world applications remain challenging and open because of the existence of various unseen categories. In this paper, we first introduce the open-set 3DOR task to expand the applications of traditional 3DOR. We then propose the Hypergraph-Based Multi-Modal Representation (HGM²R) framework to learn 3D object embeddings from multi-modal representations under the open-set setting. The framework is composed of two modules, i.e., the Multi-Modal 3D Object Embedding (MM3DOE) module and the Structure-Aware and Invariant Knowledge Learning (SAIKL) module. By utilizing the collaborative information of modalities derived from the same 3D object, the MM3DOE module overcomes the distinctions across different modality representations and generates unified 3D object embeddings. The SAIKL module then uses a constructed hypergraph structure to model the high-order correlation among 3D objects from both seen and unseen categories, and includes a memory bank that stores typical representations of 3D objects. By aligning with the memory anchors in this bank, the aligned embeddings integrate invariant knowledge and exhibit a powerful generalization capacity toward unseen categories. We formally prove that hypergraph modeling has better representative capability for data correlation than graph modeling. We generate four multi-modal datasets for the open-set 3DOR task, i.e., OS-ESB-core, OS-NTU-core, OS-MN40-core, and OS-ABO-core, in which each 3D object has three modality representations: multi-view images, point clouds, and voxels. Experiments on these four datasets show that the proposed method significantly outperforms existing methods; in particular, it surpasses the state of the art by 12.12% and 12.88% in mAP on OS-MN40-core and OS-ABO-core, respectively. Results and visualizations demonstrate that the proposed method effectively extracts generalized 3D object embeddings for the open-set 3DOR task and achieves satisfactory performance.
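For reference, hypergraph modeling of high-order correlation typically uses the normalized incidence formulation X' = Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta. A minimal PyTorch sketch of one such convolution step (with W = I); the incidence matrix and feature sizes are illustrative assumptions.

```python
# One hypergraph convolution step over object embeddings (sketch).
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        # x: (n, d) vertex features; H: (n, m) vertex-hyperedge incidence.
        dv = H.sum(dim=1).clamp(min=1)    # vertex degrees
        de = H.sum(dim=0).clamp(min=1)    # hyperedge degrees
        dv_inv_sqrt = dv.pow(-0.5).diag()
        de_inv = de.pow(-1.0).diag()
        agg = dv_inv_sqrt @ H @ de_inv @ H.t() @ dv_inv_sqrt @ x
        return self.theta(agg)

x = torch.randn(10, 64)                    # 10 3D-object embeddings
H = (torch.rand(10, 5) < 0.4).float()      # 5 hyperedges over the objects
print(HypergraphConv(64, 32)(x, H).shape)  # torch.Size([10, 32])
```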

9.
Sensors (Basel) ; 23(23)2023 Nov 23.
Article in English | MEDLINE | ID: mdl-38067739

ABSTRACT

In the realm of modern medicine, medical imaging stands as an irreplaceable pillar of accurate diagnostics. The significance of precise segmentation in medical images cannot be overstated, especially considering the variability introduced by different practitioners. With the escalating volume of medical imaging data, automated and efficient segmentation methods have become imperative. This study introduces an innovative approach to heart image segmentation, embedding a multi-scale feature and attention mechanism within an inverted pyramid framework. Recognizing the difficulty of extracting contextual information from low-resolution medical images, our method adopts an inverted pyramid architecture; through training with multi-scale images and integrating prediction outcomes, we enhance the network's contextual understanding. Acknowledging the consistent patterns in the relative positions of organs, we introduce an attention module enriched with positional encoding information. This module empowers the network to capture essential positional cues, thereby elevating segmentation accuracy. Our research resides at the intersection of medical imaging and sensor technology, emphasizing the foundational role of sensors in medical image analysis; the integration of sensor-generated data showcases the symbiotic relationship between sensor technology and advanced machine learning techniques. Evaluation on two heart datasets substantiates the superior performance of our approach: metrics such as the Dice coefficient, Jaccard coefficient, recall, and F-measure demonstrate its efficacy compared to state-of-the-art techniques. In conclusion, our proposed heart image segmentation method addresses the challenges posed by diverse medical images, offering a promising solution for efficiently processing 2D/3D sensor data in contemporary medical imaging.
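A minimal sketch of attention enriched with positional encoding, assuming sinusoidal encodings are added to flattened feature tokens before self-attention so the network can exploit the stable relative positions of organs. The single-head design and shapes are illustrative assumptions.

```python
# Self-attention with sinusoidal positional encoding (sketch).
import math
import torch
import torch.nn as nn

def sinusoidal_encoding(length: int, dim: int) -> torch.Tensor:
    pos = torch.arange(length).float().unsqueeze(1)
    freq = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe

class PositionalAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_tokens, dim), e.g., flattened feature-map pixels.
        pe = sinusoidal_encoding(tokens.shape[1], tokens.shape[2]).to(tokens)
        q = k = tokens + pe          # positions inform the attention map
        out, _ = self.attn(q, k, tokens)
        return out + tokens          # residual connection

tokens = torch.randn(2, 16 * 16, 64)
print(PositionalAttention(64)(tokens).shape)  # torch.Size([2, 256, 64])
```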


Subjects
Benchmarking; Cues; Heart/diagnostic imaging; Machine Learning; Technology; Image Processing, Computer-Assisted
10.
IEEE Trans Image Process ; 32: 6359-6372, 2023.
Article in English | MEDLINE | ID: mdl-37971907

ABSTRACT

Counting objects in crowded scenes remains a challenge for computer vision. Current deep learning approaches often formulate it as a Gaussian density regression problem. Such brute-force regression, though effective, may not properly account for annotation displacement, which arises from the human annotation process and can lead to different distributions. We conjecture that it is beneficial to consider annotation displacement in the dense object counting task. To obtain strong robustness against annotation displacement, a generalized Gaussian distribution (GGD) function with tunable bandwidth and shape parameters is exploited to form the learning target, a point-annotation probability map (PAPM). Specifically, we first present a hand-designed PAPM method (HD-PAPM), in which we design a GGD-based function to tolerate annotation displacement. Because a hand-designed PAPM may not be optimal for a particular network and dataset, we further propose an adaptively learned PAPM method (AL-PAPM) for end-to-end training, with an effective transport cost function based on GGD to improve robustness to annotation displacement. The proposed PAPM can also be integrated with other methods: combining it with P2PNet by modifying the matching cost matrix yields P2P-PAPM, which improves P2PNet's robustness to annotation displacement as well. Extensive experiments show the superiority of our proposed methods.
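A GGD kernel exp(-(d/alpha)^beta) recovers a Gaussian at beta = 2, while smaller beta gives heavier tails that tolerate annotation displacement. Below is a minimal sketch of building a PAPM-style target map from point annotations; grid size and parameter values are illustrative assumptions.

```python
# Point-annotation probability map from a GGD kernel (sketch).
import torch

def ggd_papm(points: torch.Tensor, h: int, w: int,
             alpha: float = 4.0, beta: float = 1.5) -> torch.Tensor:
    """points: (k, 2) head annotations in (row, col) -> (h, w) probability map."""
    ys = torch.arange(h).float()
    xs = torch.arange(w).float()
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)   # (h, w, 2)
    dist = (grid.unsqueeze(2) - points.view(1, 1, -1, 2)).norm(dim=-1)  # (h, w, k)
    kernel = torch.exp(-(dist / alpha) ** beta)   # heavy tails for beta < 2
    papm = kernel.sum(dim=-1)
    return papm / papm.max()

pts = torch.tensor([[20.0, 30.0], [25.0, 35.0]])
print(ggd_papm(pts, 64, 64).shape)  # torch.Size([64, 64])
```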

11.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14760-14776, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37695971

ABSTRACT

After decades of investigation, point cloud registration remains challenging in practice, especially when the correspondences are contaminated by a large number of outliers. Severe outliers rapidly decrease the probability of generating a hypothesis close to the true transformation, causing registration to fail. To tackle this problem, we propose a transformation estimation method, named Hunter, for robust point cloud registration with severe outliers. The core of Hunter is a global-to-local exploration scheme for robustly finding correct correspondences. The global exploration exploits guided sampling to generate promising initial alignments; to this end, a hypergraph-based consistency reasoning module is introduced to learn the high-order consistency among correct correspondences, yielding a more distinct inlier cluster that facilitates the generation of all-inlier hypotheses. Moreover, we propose a preference-based local exploration module that exploits the preference information of the top-k promising hypotheses to find a better transformation; it efficiently obtains multiple reliable transformation hypotheses through a multi-initialization searching strategy. Finally, we present a distance-angle-based hypothesis selection criterion to choose the most reliable transformation, which avoids selecting symmetrically aligned false transformations. Experimental results on simulated, indoor, and outdoor datasets demonstrate that Hunter achieves significant superiority over state-of-the-art methods, both learning-based and traditional (as shown in Fig. 1), and also achieves more stable performance than all other methods under severe outliers.
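As background for hypothesis generation and selection, the basic tool is scoring each candidate rigid transformation by how many correspondences it explains. The sketch below shows that generic inlier-counting step, not Hunter's actual distance-angle criterion; the correspondences, threshold, and scoring rule are illustrative assumptions.

```python
# Scoring and selecting rigid-transformation hypotheses by inlier count (sketch).
import torch

def score_hypothesis(R: torch.Tensor, t: torch.Tensor,
                     src: torch.Tensor, dst: torch.Tensor,
                     inlier_thresh: float = 0.1) -> int:
    """R: (3, 3), t: (3,), src/dst: (n, 3) putative correspondences."""
    residual = (src @ R.t() + t - dst).norm(dim=-1)
    return int((residual < inlier_thresh).sum())

def select_best(hypotheses, src, dst):
    # Pick the hypothesis that explains the most correspondences.
    return max(hypotheses, key=lambda h: score_hypothesis(h[0], h[1], src, dst))

# Toy example: the identity transform against a noisy copy plus outliers.
src = torch.randn(100, 3)
dst = src + 0.02 * torch.randn(100, 3)
dst[:30] = torch.randn(30, 3)            # 30% outlier correspondences
hyps = [(torch.eye(3), torch.zeros(3)), (torch.eye(3), torch.ones(3))]
R, t = select_best(hyps, src, dst)
print(score_hypothesis(R, t, src, dst))  # ~70 inliers for the true transform
```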

12.
Neural Netw ; 167: 551-558, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37696072

ABSTRACT

In 3D skeleton-based action recognition, learning rich spatial motion patterns and rich temporal motion patterns from body joints are two foundational yet under-explored problems. In this paper, we propose two methods to address them: (I) a novel glimpse-focus action recognition strategy that captures multi-range pose features from the whole body and key body parts jointly; (II) a powerful temporal feature extractor, JD-TC, that enriches trajectory features by inferring different inter-frame correlations for different joints. By coupling these two proposals, we develop a powerful skeleton-based action recognition system that extracts rich pose and trajectory features from a skeleton sequence and outperforms previous state-of-the-art methods on three large-scale datasets.
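A minimal sketch of joint-decoupled temporal correlation in the spirit of JD-TC: each joint computes its own attention map over frames, so different joints can attend to different temporal ranges. The single-matrix attention and shapes are illustrative assumptions.

```python
# Per-joint temporal attention over a skeleton sequence (sketch).
import torch
import torch.nn as nn

class JointTemporalAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, joints, frames, dim) per-joint trajectory features.
        q, k = self.q(x), self.k(x)
        # Attention over frames, computed independently for every joint:
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ x  # (batch, joints, frames, dim)

x = torch.randn(2, 25, 64, 32)   # 25 joints, 64 frames, 32-dim features
print(JointTemporalAttention(32)(x).shape)
```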


Subjects
Learning; Skeleton; Motion; Recognition, Psychology
13.
Sensors (Basel) ; 23(16)2023 Aug 10.
Article in English | MEDLINE | ID: mdl-37631615

ABSTRACT

Visual saliency refers to the human ability to quickly focus on important parts of the visual field, a crucial aspect of image processing, particularly in fields like medical imaging and robotics. Understanding and simulating this mechanism is crucial for solving complex visual problems. In this paper, we propose a salient object detection method based on boundary enhancement that is applicable to both 2D and 3D sensor data. To address the large scale variation of salient objects, our method introduces a multi-level feature aggregation module that enhances the expressive ability of fixed-resolution features by letting adjacent features complement each other. Additionally, we propose a multi-scale information extraction module to capture local contextual information at different scales for the level-by-level back-propagated features, allowing better measurement of the composition of the feature map after back-fusion. To tackle the low confidence of boundary pixels, we also introduce a boundary extraction module that extracts the boundary information of salient regions; this information is then fused with salient object information to further refine the saliency prediction results. During training, our method uses a mixed loss function to constrain model training at two levels: pixels and images. The experimental results demonstrate that our boundary-enhanced salient object detection method performs well on objects of different scales, multiple objects, linear objects, and objects in complex scenes. Compared with the best existing method on four conventional datasets, we achieve an average improvement of 6.2% in the mean absolute error (MAE) metric. Overall, our approach shows promise for improving the accuracy and efficiency of salient object detection in a variety of settings, including 2D/3D semantic analysis and reconstruction/inpainting of image, video, and point cloud data.
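A common way to obtain the boundary supervision such a module is trained against is a morphological gradient of the saliency mask (dilation minus erosion, both via max-pooling). A minimal sketch, with the kernel size as an illustrative assumption:

```python
# Boundary map extraction via morphological gradient (sketch).
import torch
import torch.nn.functional as F

def boundary_map(mask: torch.Tensor, k: int = 3) -> torch.Tensor:
    """mask: (n, 1, h, w) binary saliency mask -> soft boundary map."""
    pad = k // 2
    dilated = F.max_pool2d(mask, k, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=pad)
    return dilated - eroded   # 1 on the boundary band, 0 elsewhere

mask = torch.zeros(1, 1, 32, 32)
mask[..., 8:24, 8:24] = 1.0
print(boundary_map(mask).sum().item())  # nonzero only around the square's edge
```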

14.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14081-14097, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37527291

ABSTRACT

Recent years have witnessed remarkable achievements in video-based action recognition. Apart from traditional frame-based cameras, event cameras are bio-inspired vision sensors that record only pixel-wise brightness changes rather than absolute brightness values. However, little effort has been made in event-based action recognition, and large-scale public datasets are nearly unavailable. In this paper, we propose an event-based action recognition framework called EV-ACT. We first propose the Learnable Multi-Fused Representation (LMFR) to integrate multiple types of event information in a learnable manner. The LMFR, with dual temporal granularity, is fed into an event-based slow-fast network for the fusion of appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability for action recognition. To prompt research in this direction, we have collected the largest event-based action recognition benchmark, THUE-ACT-50, and the accompanying THUE-ACT-50-CHL dataset recorded under challenging environments, together totaling over 12,830 recordings from 50 action categories, over 4 times the size of the previous largest dataset. Experimental results show that the proposed framework achieves improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed the EV-ACT framework on a mobile platform to validate its practicality and efficiency.
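Before any learnable fusion like the LMFR, event streams are typically binned into frame-like grids. A minimal sketch that accumulates polarity into a two-channel grid per time bin; the bin count, resolution, and accumulation scheme are illustrative assumptions.

```python
# Converting an event stream into stacked event frames (sketch).
import torch

def events_to_frames(events: torch.Tensor, h: int, w: int,
                     num_bins: int = 8) -> torch.Tensor:
    """events: (n, 4) rows of (t, x, y, polarity in {0, 1})
    -> (num_bins, 2, h, w) stacked event frames."""
    t = events[:, 0]
    x, y, p = events[:, 1].long(), events[:, 2].long(), events[:, 3].long()
    t_norm = (t - t.min()) / (t.max() - t.min() + 1e-9)
    b = (t_norm * num_bins).clamp(max=num_bins - 1).long()   # time bin index
    frames = torch.zeros(num_bins, 2, h, w)
    idx = ((b * 2 + p) * h + y) * w + x          # flatten (bin, polarity, y, x)
    frames.view(-1).index_add_(0, idx, torch.ones_like(t))
    return frames

ev = torch.rand(10000, 4) * torch.tensor([1.0, 128, 128, 2])
ev[:, 3] = ev[:, 3].floor()                  # polarities in {0, 1}
print(events_to_frames(ev, 128, 128).shape)  # torch.Size([8, 2, 128, 128])
```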

15.
Front Neurosci ; 17: 1212049, 2023.
Article in English | MEDLINE | ID: mdl-37397450

ABSTRACT

Introduction: The human brain processes shape and texture information separately through different neurons in the visual system. In intelligent computer-aided imaging diagnosis, pre-trained feature extractors are commonly used in medical image recognition methods; however, common pre-training datasets such as ImageNet tend to improve a model's texture representation while neglecting many shape features. Weak shape feature representation is disadvantageous for medical image analysis tasks that focus on shape. Methods: Inspired by the function of neurons in the human brain, we propose a shape-and-texture-biased two-stream network to enhance shape feature representation in knowledge-guided medical image analysis. First, the network's shape-biased stream and texture-biased stream are constructed through classification and segmentation multi-task joint learning. Second, we propose pyramid-grouped convolution to enhance the texture feature representation and introduce deformable convolution to enhance shape feature extraction. Third, we use a channel-attention-based feature selection module when fusing shape and texture features, to focus on key features and eliminate the information redundancy caused by fusion. Finally, to counter the difficulty of model optimization caused by the imbalance between benign and malignant samples in medical images, an asymmetric loss function is introduced to improve model robustness. Results and conclusion: We applied our method to the melanoma recognition task on the ISIC-2019 and XJTU-MM datasets, which concern both the texture and the shape of lesions. The experimental results on dermoscopic and pathological image recognition datasets show that the proposed method outperforms the compared algorithms, demonstrating its effectiveness.
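The channel-attention-based feature selection can be sketched in the squeeze-and-excitation style: global pooling produces per-channel weights that emphasize informative channels in the fused shape-texture features. Channel sizes and the reduction ratio below are illustrative assumptions.

```python
# Channel-attention feature selection over fused shape/texture features (sketch).
import torch
import torch.nn as nn

class ChannelSelect(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(x.mean(dim=(2, 3)))           # squeeze: global average pool
        return x * w.unsqueeze(-1).unsqueeze(-1)  # excite: reweight channels

shape_feat = torch.randn(2, 128, 14, 14)
texture_feat = torch.randn(2, 128, 14, 14)
fused = ChannelSelect(256)(torch.cat([shape_feat, texture_feat], dim=1))
print(fused.shape)  # torch.Size([2, 256, 14, 14])
```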

16.
IEEE J Biomed Health Inform ; 27(10): 4926-4937, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37478028

ABSTRACT

Bone age, as a measure of biological age (BA), plays an important role in a variety of fields, including forensics, orthodontics, sports, and immigration. Despite its significance, accurately estimating BA remains a challenge because of the uncertainty error between BA and chronological age (CA) caused by individual diversity, and because multiple factors, such as sex and identified or measured anatomical structures, are difficult to integrate into the estimation process. To address these problems, we propose an uncertainty-aware and sex-prior guided biological age estimation method for orthopantomogram images (OPGs), named UASP-BAE, which models uncertainty errors while using sexual dimorphism as a tractive feature to enhance age-related specific features, aiming to improve the accuracy of BA estimation. Furthermore, considering the global relevance of anatomical structures such as the mandible, teeth, and maxillary sinus, a cross-attention module based on CNNs and self-attention is proposed to mine the local texture and global semantic features of OPGs. Moreover, we design a novel age composition loss from cross-entropy, probability-bias, and regression terms, aimed at evaluating BA's uncertainty errors and producing an accurate and robust model. On 10,703 OPGs spanning 5.00 to 25.00 years of age, our model achieved a best MAE of 0.8005 years, outperforming popular comparison algorithms, which demonstrates the method's potential for improved accuracy in BA estimation.
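A minimal sketch of an age composition loss in the spirit described above, combining cross-entropy over discretized age bins with an expectation-based regression term; the bin layout, L1 regression, and weighting are illustrative assumptions rather than the paper's exact formulation.

```python
# Composite age loss: classification over bins + soft-expectation regression (sketch).
import torch
import torch.nn.functional as F

def age_composition_loss(logits: torch.Tensor, age: torch.Tensor,
                         bins: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """logits: (n, k) over k age bins; age: (n,) in years; bins: (k,) bin centers."""
    target_bin = (age.unsqueeze(1) - bins.unsqueeze(0)).abs().argmin(dim=1)
    ce = F.cross_entropy(logits, target_bin)     # classification term
    probs = logits.softmax(dim=-1)
    expected_age = (probs * bins).sum(dim=-1)    # soft expectation over bins
    reg = F.l1_loss(expected_age, age)           # regression term
    return ce + lam * reg

bins = torch.arange(5.0, 25.5, 0.5)   # 5.00-25.00 years, half-year bins
logits = torch.randn(4, bins.numel())
age = torch.tensor([8.3, 14.1, 19.7, 23.2])
print(age_composition_loss(logits, age, bins).item())
```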

17.
Sensors (Basel) ; 23(14)2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37514688

ABSTRACT

Understanding and analyzing 2D/3D sensor data is crucial for a wide range of machine learning-based applications, including object detection, scene segmentation, and salient object detection. In this context, interactive object segmentation is a vital task in image editing and medical diagnosis, involving the accurate separation of the target object from its background based on user annotation information. However, existing interactive object segmentation methods struggle to leverage such information effectively to guide segmentation models. To address these challenges, this paper proposes an interactive image segmentation technique for static images based on multi-level semantic fusion. Our method utilizes user-guidance information both inside and outside the target object to segment it from the static image, making it applicable to both 2D and 3D sensor data. The proposed method introduces a cross-stage feature aggregation module, enabling the effective propagation of multi-scale features from previous stages to the current stage. This mechanism prevents the loss of semantic information caused by repeated upsampling and downsampling in the network, allowing the current stage to make better use of semantic information from the previous stage. Additionally, we incorporate a feature channel attention mechanism to address the issue of rough segmentation edges. This mechanism captures richer feature details at the feature-channel level, leading to finer segmentation edges. In experiments on the PASCAL Visual Object Classes (VOC) 2012 dataset, our method achieves an intersection over union (IoU) accuracy approximately 2.1% higher than currently popular interactive image segmentation methods for static images. The comparative analysis highlights the improved performance and effectiveness of our method, which also has potential applications in fields such as medical imaging and robotics, and is compatible with other machine learning methods for visual semantic analysis, allowing integration into existing workflows.
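User guidance inside and outside the target object is commonly encoded as extra input channels. A minimal sketch that turns foreground/background clicks into soft distance maps concatenated with the image; the exponential encoding and bandwidth are illustrative assumptions.

```python
# Encoding user clicks as guidance maps for interactive segmentation (sketch).
import torch

def click_maps(clicks, h: int, w: int, tau: float = 40.0) -> torch.Tensor:
    """clicks: list of (row, col, is_foreground) -> (2, h, w) guidance maps."""
    ys, xs = torch.meshgrid(torch.arange(h).float(),
                            torch.arange(w).float(), indexing="ij")
    maps = torch.zeros(2, h, w)
    for r, c, fg in clicks:
        dist = ((ys - r) ** 2 + (xs - c) ** 2).sqrt()
        ch = 0 if fg else 1   # channel 0: inside clicks, channel 1: outside
        maps[ch] = torch.maximum(maps[ch], torch.exp(-dist / tau))
    return maps

guidance = click_maps([(50, 60, True), (10, 10, False)], 128, 128)
image = torch.randn(3, 128, 128)
net_input = torch.cat([image, guidance], dim=0)  # 5-channel network input
print(net_input.shape)  # torch.Size([5, 128, 128])
```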

18.
Front Neurosci ; 17: 1145526, 2023.
Article in English | MEDLINE | ID: mdl-37284662

ABSTRACT

Introduction: In the clinical setting, it is increasingly important to detect epileptic seizures automatically, since doing so could significantly reduce the care burden for patients suffering from intractable epilepsy. Electroencephalography (EEG) signals record the brain's electrical activity and contain rich information about brain dysfunction. While EEG is a non-invasive and inexpensive tool for detecting epileptic seizures, visual evaluation of EEG recordings is labor-intensive and subjective and requires significant improvement. Methods: This study develops a new approach to recognizing seizures automatically from EEG recordings. We construct a new deep neural network (DNN) model to extract features from raw EEG input. Deep feature maps derived from hierarchically placed convolutional layers are reduced in dimensionality using Principal Component Analysis (PCA) and fed into different kinds of shallow classifiers to detect anomalies. Results: By analyzing the EEG Epilepsy dataset and the Bonn epilepsy dataset, we conclude that our proposed method is both effective and robust. These datasets vary significantly in data acquisition, clinical protocols, and digital information storage, making processing and analysis challenging. On both datasets, extensive experiments using a 10-fold cross-validation strategy demonstrate approximately 100% accuracy for binary and multi-category classification. Discussion: In addition to outperforming other up-to-date approaches, the results of this study suggest that the methodology can be applied in clinical practice as well.
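The pipeline lends itself to a compact sketch: deep feature vectors are reduced with PCA and fed to a shallow classifier under 10-fold cross-validation. The synthetic features below stand in for real CNN activations from EEG segments; the component count and SVM choice are illustrative assumptions.

```python
# Deep features -> PCA -> shallow classifier, 10-fold CV (sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 1024))   # CNN feature vectors per EEG segment
labels = rng.integers(0, 2, size=500)     # seizure / non-seizure

clf = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
scores = cross_val_score(clf, features, labels, cv=10)
print(scores.mean())
```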

19.
J Biophotonics ; 16(9): e202300059, 2023 09.
Article in English | MEDLINE | ID: mdl-37289201

ABSTRACT

Automated analysis of vessel structure in intravascular optical coherence tomography (IVOCT) images is critical for assessing the health status of vessels and monitoring the progression of coronary artery disease. However, deep learning-based methods usually require well-annotated large datasets, which are difficult to obtain in medical image analysis. Hence, we propose an automatic layer segmentation method based on meta-learning, which can simultaneously extract the surfaces of the lumen, intima, media, and adventitia using only a handful of annotated samples. Specifically, we leverage a bi-level gradient strategy to train a meta-learner that captures the meta-knowledge shared among different anatomical layers and quickly adapts to unknown ones. A Claw-type network and a contrast consistency loss are then designed to better learn this meta-knowledge according to the annotation characteristics of the lumen and anatomical layers. Experimental results on two cardiovascular IVOCT datasets show that the proposed method achieves state-of-the-art performance.
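A bi-level gradient strategy of this kind is typically MAML-style: an inner step adapts to one layer's task, and the outer step updates the meta-learner from the post-adaptation loss. A minimal sketch on a toy regression model standing in for the segmentation network; all data and hyperparameters are illustrative assumptions.

```python
# MAML-style bi-level gradient training loop (sketch).
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.1

def task_loss(params, x, y):
    return ((x @ params["weight"].t() + params["bias"] - y) ** 2).mean()

for step in range(100):
    x_s, y_s = torch.randn(8, 16), torch.randn(8, 1)   # support set (one layer)
    x_q, y_q = torch.randn(8, 16), torch.randn(8, 1)   # query set (same layer)
    params = {k: v for k, v in model.named_parameters()}
    # Inner loop: one gradient step on the support set, kept differentiable.
    grads = torch.autograd.grad(task_loss(params, x_s, y_s),
                                params.values(), create_graph=True)
    adapted = {k: v - inner_lr * g for (k, v), g in zip(params.items(), grads)}
    # Outer loop: meta-update from the adapted parameters' query loss.
    meta_opt.zero_grad()
    task_loss(adapted, x_q, y_q).backward()
    meta_opt.step()
```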


Subjects
Coronary Artery Disease; Tomography, Optical Coherence; Humans; Tomography, Optical Coherence/methods; Lung
20.
Article in English | MEDLINE | ID: mdl-37030786

ABSTRACT

Graph convolutional networks (GCNs) have been widely used and have achieved fruitful progress in the skeleton-based action recognition task. In GCNs, node interaction modeling dominates context aggregation and is therefore crucial for a graph-based convolution kernel to extract representative features. In this article, we take a closer look at a powerful graph convolution formulation for capturing rich movement patterns from skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered a middle ground between the extremes of (2+1)-D and 3-D graph convolution. The core observation of HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Treating HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (i.e., intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we propose a strong human action recognition system that outperforms state-of-the-art methods, with accuracies of 93.1% on the NTU-60 cross-subject (X-Sub) benchmark, 88.9% on the NTU-120 X-Sub benchmark, and 38.4% on Kinetics-Skeleton.
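As background, the (2+1)-D building block that HetGCN-style kernels extend is a spatial graph convolution over the skeleton's joints at each frame. A minimal sketch with normalized-adjacency aggregation; the adjacency, sizes, and activation are illustrative assumptions.

```python
# Spatial graph convolution over a skeleton sequence (sketch).
import torch
import torch.nn as nn

class SkeletonGCN(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor):
        super().__init__()
        deg = adj.sum(dim=1).clamp(min=1).pow(-0.5).diag()
        self.register_buffer("A_hat", deg @ adj @ deg)   # normalized adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, dim); aggregate each joint's neighbors.
        return torch.relu(self.proj(torch.einsum("ij,btjd->btid", self.A_hat, x)))

num_joints = 25
adj = torch.eye(num_joints)              # self-loops...
adj[0, 1] = adj[1, 0] = 1.0              # ...plus example bone edges
x = torch.randn(2, 64, num_joints, 3)    # joint coordinates over time
print(SkeletonGCN(3, 64, adj)(x).shape)  # torch.Size([2, 64, 25, 64])
```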
