1.
Cogn Res Princ Implic ; 9(1): 63, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39289316

ABSTRACT

People perform poorly at sighting missing and wanted persons in simulated searches due to attention and face recognition failures. We manipulated participants' expectations of encountering a target person and the within-person variability of the targets' photographs studied in a laboratory-based and a field-based prospective person memory task. We hypothesized that within-person variability and expectations of encounter would impact prospective person memory performance, and that expectations would interact with within-person variability to mitigate the effect of variability. Surprisingly, low within-person variability resulted in better performance on the search task than high within-person variability in Experiment one, possibly because the study-test images were rated as more similar in the low variability condition. We found the expected effect of high variability producing more hits for the target whose study-test images were equally similar across variability conditions. There was no effect of variability in Experiment two. Expectations affected performance only in the field-based study (Experiment two), possibly because performance is typically poor in field-based studies. Our research demonstrates some nuance to the effect of within-person variability on search performance and extends existing research demonstrating that expectations affect search performance.


Subject(s)
Facial Recognition , Memory, Episodic , Humans , Facial Recognition/physiology , Male , Female , Adult , Young Adult , Attention/physiology , Adolescent
2.
Heliyon ; 10(17): e37229, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39295989

ABSTRACT

Customer Relationship Management (CRM) is vital in modern business, aiding in the management and analysis of customer interactions. However, existing methods struggle to capture the dynamic and complex nature of customer relationships, as traditional approaches fail to leverage time series data effectively. To address this, we propose a novel GWO-attention-ConvLSTM model, which offers more effective prediction of customer churn and analysis of customer satisfaction. This model utilizes an attention mechanism to focus on key information and integrates a ConvLSTM layer to capture spatiotemporal features, effectively modeling complex temporal patterns in customer data. We validate our proposed model on multiple real-world datasets, including the BigML Telco Churn dataset, IBM Telco dataset, Cell2Cell dataset, and Orange Telecom dataset. Experimental results demonstrate significant performance improvements of our model compared to existing baseline models across these datasets. For instance, on the BigML Telco Churn dataset, our model achieves an accuracy of 95.17%, a recall of 93.66%, an F1 score of 92.89%, and an AUC of 95.00%. Similar results are validated on other datasets. In conclusion, our proposed GWO-attention-ConvLSTM model makes significant advancements in the CRM domain, providing powerful tools for predicting customer churn and analyzing customer satisfaction. By addressing the limitations of existing methods and leveraging the capabilities of deep learning, attention mechanisms, and optimization algorithms, our model paves the way for improving customer relationship management practices and driving business success.
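
The attention step described above can be illustrated with a minimal sketch. It is not the authors' GWO-attention-ConvLSTM model: it only shows softmax-weighted pooling over time steps, and the feature vectors, the `query` vector, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Score each time step by its dot product with `query`, then return
    the softmax-weighted sum of the time steps and the weights themselves."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights

# Hypothetical monthly feature vectors for one customer (assumed, not from the paper).
months = [[0.2, 0.0], [0.3, 0.1], [0.1, 0.9]]
query = [0.0, 5.0]  # hypothetical learned query that emphasises the second feature
context, weights = attention_pool(months, query)
```

In this toy example the last month receives the largest weight because its second feature is the largest, which is the sense in which attention "focuses on key information".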

3.
Heliyon ; 10(17): e36861, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39296200

ABSTRACT

Text classification involves annotating text data with specific labels and is a crucial research task in the field of natural language processing. Chinese text classification presents significant challenges due to the complex semantics of the language, difficulties in semantic feature extraction, and the interleaving and irregularity of lexical features. Traditional methods often struggle to manage the relationships between words and sentences in Chinese, hindering the model's ability to capture deep semantic information and resulting in poor classification performance. To address these issues, a Chinese text classification method based on utterance information enhancement and feature fusion is proposed. This method first embeds the text into a unified space and obtains feature representations of word vectors and sentence vectors using the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model. Subsequently, an utterance information enhancement module is constructed to perform syntactic enhancement and feature extraction on the sentence information within the text. Additionally, a feature fusion strategy is introduced to combine the enhanced sentence-level information features with the word-level features extracted by the Bi-GRU (Bidirectional Gated Recurrent Unit network), culminating in the classification output. This approach effectively enhances the feature representation of Chinese text and significantly filters out irrelevant and noisy information. Evaluations on several Chinese datasets demonstrate that the proposed method surpasses existing mainstream classification models in terms of classification accuracy and F1 value, validating its effectiveness and feasibility.

4.
Neurobiol Aging ; 144: 93-103, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39298870

ABSTRACT

Sustained attention is important for maintaining cognitive function and autonomy during ageing, yet older people often show reductions in this domain. The role of the underlying neurobiology is not yet well understood, with most neuroimaging studies primarily focused on fMRI. Here, we utilise sMRI to investigate the relationships between age, structural brain volumes and sustained attention performance. Eighty-nine healthy older adults (50-84 years; mean age 65.5 years, SD = 8.4; 74 female) underwent MRI brain scanning and completed two sustained attention tasks: a rapid visual information processing (RVP) task and sustained attention to response task (SART). Independent hierarchical linear regressions demonstrated that greater volumes of white matter hyperintensities (WMH) were associated with worse RVP_A' performance, whereas greater grey matter volumes were associated with better RVP_A' performance. Further, greater cerebral white matter volumes were associated with better SART_d' performance. Importantly, mediation analyses revealed that both grey and white matter volumes completely mediated the relationship between ageing and sustained attention. These results explain disparate attentional findings in older adults, highlighting the intervening role of brain structure.

5.
Public Health ; 236: 184-192, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39299085

ABSTRACT

OBJECTIVES: To synthesize eye-tracking-based evidence on consumers' visual attention devoted to alcohol warning labels (AWLs) on alcohol packaging. STUDY DESIGN: A systematic review was conducted and reported in accordance with the PRISMA guidelines. METHODS: Two rounds of a literature search were conducted to identify relevant peer-reviewed articles and unpublished grey literature. While the first round (July 3 to August 21, 2023) was based on three electronic databases (PubMed, Web of Science, and PsycINFO), the second round (May 20 to 28, 2024) followed a multiple-step protocol that systematically searched the grey literature. Five criteria were applied to screen eligible articles. Using established quality control tools, the identified articles were assessed for overall quality and then for quality specific to the eye-tracking method. RESULTS: Six published peer-reviewed articles were thus included in the current review along with one unpublished research paper from a doctoral thesis. This review paper summarizes earlier findings in terms of bottom-up (i.e., AWL design-related) factors such as size, color, surrounding border, and pictorial elements, and top-down (i.e., goal-driven) factors such as motivation to change drinking behavior and self-affirmation. The review found that people tend to pay very little attention to AWLs displayed on alcohol packaging, although there is mixed evidence as to the effectiveness of specific factors. CONCLUSIONS: Further investigations using eye-tracking are needed to collect additional evidence on attention devoted to AWLs. Meanwhile, we put forward implications for policymakers and future avenues for research based on our review of the existing literature.

6.
J Behav Ther Exp Psychiatry ; 86: 101997, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39299175

ABSTRACT

BACKGROUND AND OBJECTIVES: This study was conducted to identify the characteristics of attentional bias in individuals with Sluggish Cognitive Tempo (SCT) and how Attention Bias to Threat (ABT) changes when feedback is provided during attention training. METHODS: First, a dot probe task was conducted to confirm the ABT of the SCT feedback group (N = 27), the SCT no feedback group (N = 25), and the healthy control group (N = 30) before intervention. Thereafter, VR-based attention training was conducted three times with feedback or no feedback. Finally, the dot probe task was administered again. RESULTS: The SCT groups showed a higher ABT than the healthy control group. As a result of the attention training, the reaction time for disengagement was significantly reduced when feedback was provided. In addition, the ABT of the SCT group that received feedback was significantly reduced. LIMITATIONS: First, the only stimulus used to examine the ABT was the angry face, and the reaction time to other threatening facial expressions was not confirmed. Second, attention training was conducted three times, but further studies are needed on how the duration of training affects the magnitude of the effect. CONCLUSIONS: This study identified ABT associated with internalizing symptoms of SCT and suggests that attention training with immediate and continuous feedback is needed to reduce ABT.
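
For readers unfamiliar with the dot probe task, the sketch below shows one common way an attention bias score is computed: mean reaction time on incongruent trials minus mean reaction time on congruent trials. The abstract does not state the scoring formula used in this study, so the formula, the function name, and the reaction times are assumptions for illustration only.

```python
from statistics import mean

def attention_bias_ms(congruent_rts, incongruent_rts):
    """Assumed scoring rule: incongruent minus congruent mean reaction time.

    Congruent trials: the probe appears where the threatening face was.
    Incongruent trials: the probe appears where the neutral face was.
    A positive score suggests attention was drawn toward the threat.
    """
    return mean(incongruent_rts) - mean(congruent_rts)

# Hypothetical reaction times in milliseconds.
bias = attention_bias_ms(congruent_rts=[500, 520], incongruent_rts=[540, 560])
```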

7.
Article in English | MEDLINE | ID: mdl-39300051

ABSTRACT

Numerous studies have indicated that both the broaden-and-build model and the motivational dimensional model emphasize the impact of emotion on spatial attention by altering the attentional scope. However, no prior research has investigated the impact of emotional valence and motivational intensity on spatial attention within the same paradigm. Furthermore, object-based attention, characterized by distinct neural mechanisms from space-based attention and also susceptible to attentional scope, represents a major pattern of selective attention. Nevertheless, it is still unclear whether and how emotional valence and motivation play a role in object-based attentional selection. Therefore, the present study aimed to explore these areas. Using a two-rectangle paradigm, Experiment 1 found that motivational intensity modulated space-based effects, whereas emotional valence modulated object-based effects. Experiment 2 used a traditional spatial cueing paradigm to further study the stability of the modulating effect of motivational intensity on space-based attention, yielding results consistent with those of Experiment 1. The present study indicated that the broaden-and-build model and the motivational dimensional model are not mutually exclusive; both play a role in object- and space-based attention. This study provides crucial empirical evidence for theoretical complementation and integration of emotional attention.

8.
Network ; : 1-27, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39302211

ABSTRACT

Monitoring surveillance video is time-consuming, and the complexity of crowd behaviour in crowded scenes makes it even more challenging. This has sparked interest in computer vision-based anomaly detection. This study introduces a new crowd anomaly detection method with two main steps: Visual Attention Detection and Anomaly Detection. The Visual Attention Detection phase uses an Enhanced Bilateral Texture-Based Methodology to pinpoint crucial areas in crowded scenes, improving anomaly detection precision. Next, the Anomaly Detection phase employs an Optimized Deep Maxout Network to robustly identify unusual behaviours. This network's deep learning capabilities are essential for detecting complex patterns in diverse crowd scenarios. To enhance accuracy, the model is trained using the innovative Battle Royale Coalesced Atom Search Optimization (BRCASO) algorithm, which fine-tunes optimal weights for superior performance, ensuring heightened detection accuracy and reliability. Lastly, the effectiveness of the proposed work is compared with that of traditional approaches using various performance metrics. The proposed crowd anomaly detection method is implemented in Python. The results show that the proposed model attains a detection accuracy of 97.28% at a learning rate of 90%, well above the detection accuracy of other models, including ASO = 90.56%, BMO = 91.39%, BES = 88.63%, BRO = 86.98%, and FFLY = 89.59%.

9.
PeerJ Comput Sci ; 10: e2224, 2024.
Article in English | MEDLINE | ID: mdl-39314678

ABSTRACT

Surface defect inspection methods have proven effective in addressing casting quality control tasks. However, traditional inspection methods often struggle to achieve high-precision detection of surface defects in castings with similar characteristics and minor scales. The study introduces DES-YOLO, a novel real-time method for detecting castings' surface defects. In the DES-YOLO model, we incorporate the DSC-Darknet backbone network and global attention mechanism (GAM) module to enhance the identification of defect target features. These additions are essential for overcoming the challenge posed by the high similarity among defect characteristics, such as shrinkage holes and slag holes, which can result in decreased detection accuracy. An enhanced pyramid pooling module is also introduced to improve feature representation for small defective parts through multi-layer pooling. We integrate Slim-Neck and SIoU bounding box regression loss functions for real-time detection in actual production scenarios. These functions reduce memory overhead and enable real-time detection of surface defects in castings. Experimental findings demonstrate that the DES-YOLO model achieves a mean average precision (mAP) of 92.6% on the CSD-DET dataset and a single-image inference speed of 3.9 milliseconds. The proposed method proves capable of swiftly and accurately accomplishing real-time detection of surface defects in castings.
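
Bounding box regression losses such as SIoU build on the plain intersection-over-union (IoU) overlap between a predicted and a ground-truth box. The sketch below computes only that basic IoU; the additional angle, distance, and shape terms of SIoU are not reproduced, and the box format `(x1, y1, x2, y2)` is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted and ground-truth defect boxes.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

An IoU-based loss is typically written as one minus this overlap, so a perfect prediction yields zero loss.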

10.
PeerJ Comput Sci ; 10: e2283, 2024.
Article in English | MEDLINE | ID: mdl-39314683

ABSTRACT

Automatic polarity prediction is a challenging assessment problem. Although polarity assessment is a critical topic with many existing applications, it remains difficult and faces several obstacles in natural language processing (NLP). Public polling data can give useful information, and polarity assessment or classification of comments on Twitter and Facebook may be an effective approach for gaining a better understanding of user sentiments. Text embedding techniques and models from artificial intelligence and its sub-fields, with differing and almost accurate parameters, are among the approaches available for assessing student comments. This study discusses existing state-of-the-art sentiment analysis methodologies for analyzing student responses. An innovative hybrid model is proposed that uses ensemble learning-based text embedding, a multi-head attention mechanism, and a combination of deep learning classifiers. The proposed model outperforms the existing state-of-the-art deep learning-based techniques. The proposed model achieves 95% accuracy, 97% recall, 95% precision, and an F1-score of 96%, demonstrating its effectiveness in sentiment analysis of student feedback.
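
As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision (95%) and recall (97%) stated in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f1_score(precision=0.95, recall=0.97)
```

The result rounds to 0.96, which agrees with the F1-score of 96% given above.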

11.
PeerJ Comput Sci ; 10: e2266, 2024.
Article in English | MEDLINE | ID: mdl-39314684

ABSTRACT

The facial expression reflects a person's emotion, cognition, and even physiological or mental state to a large extent. It has important application value in medical treatment, business, criminal investigation, education, and human-computer interaction. Automatic facial expression recognition technology has become an important research topic in computer vision. To solve the problems of insufficient feature extraction, loss of local key information, and low accuracy in facial expression recognition, this article proposes a facial expression recognition network based on attention double branch enhanced fusion. Two parallel branches are used to capture global enhancement features and local attention semantics respectively, and the fusion and complementarity of global and local information is realized through decision-level fusion. The experimental results show that the features extracted by the network are made more complete by fusing and enhancing the global and local features. The proposed method achieves 89.41% and 88.84% expression recognition accuracy on the natural scene face expression datasets RAF-DB and FERPlus, respectively, which is an excellent performance compared with many current methods and demonstrates the effectiveness and superiority of the proposed network model.
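
Decision-level fusion, as mentioned above, combines the class probabilities produced by each branch rather than their intermediate features. The abstract does not give the fusion rule, so the sketch below assumes a simple weighted average; the weights, labels, and probabilities are hypothetical.

```python
def fuse_decisions(global_probs, local_probs, global_weight=0.5):
    """Weighted average of two branches' class probabilities, renormalised.

    The weighted-average rule is an assumption, not the paper's stated method.
    """
    fused = [global_weight * g + (1.0 - global_weight) * l
             for g, l in zip(global_probs, local_probs)]
    total = sum(fused)
    return [p / total for p in fused]

def predict(probs, labels):
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]

labels = ["happy", "sad", "neutral"]          # hypothetical expression classes
fused = fuse_decisions([0.5, 0.3, 0.2], [0.2, 0.6, 0.2])
label = predict(fused, labels)
```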

12.
PeerJ Comput Sci ; 10: e2311, 2024.
Article in English | MEDLINE | ID: mdl-39314697

ABSTRACT

The syntactic information of a dependency tree is an essential feature in relation extraction studies. Traditional dependency-based relation extraction methods can be categorized into hard pruning methods, which aim to remove unnecessary information, and soft pruning methods, which aim to utilize all lexical information. However, hard pruning has the potential to overlook important lexical information, while soft pruning can weaken the syntactic information between entities. As a result, recent studies in relation extraction have been shifting from dependency-based methods to pre-trained language model (LM) based methods. Nonetheless, LM-based methods increasingly demand larger language models and additional data. This trend leads to higher resource consumption, longer training times, and increased computational costs, yet often results in only marginal performance improvements. To address this problem, we propose a relation extraction model based on an entity-centric dependency tree: a dependency tree that is reconstructed by considering entities as root nodes. Using the entity-centric dependency tree, the proposed method can capture the syntactic information of an input sentence without losing lexical information. Additionally, we propose a novel model that utilizes entity-centric dependency trees in conjunction with language models, enabling efficient relation extraction without the need for additional data or larger models. In experiments with representative sentence-level relation extraction datasets such as TACRED, Re-TACRED, and SemEval 2010 Task 8, the proposed method achieves F1-scores of 74.9%, 91.2%, and 90.5%, respectively, which are state-of-the-art performances.
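
One plausible reading of "a dependency tree that is reconstructed by considering entities as root nodes" is re-rooting: keep the same edges but reorient them so that the entity token becomes the root. The sketch below does this with a breadth-first traversal. The abstract does not specify the authors' exact procedure, so this is an assumption, and the example sentence and head indices are hypothetical.

```python
from collections import deque

def reroot(heads, new_root):
    """Re-root a dependency tree at `new_root`.

    heads[i] is the index of token i's head, or -1 for the original root.
    Returns a new head list over the same edges with `new_root` as root.
    """
    n = len(heads)
    neighbours = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:  # treat every dependency edge as undirected
            neighbours[child].append(head)
            neighbours[head].append(child)
    new_heads = [None] * n
    new_heads[new_root] = -1
    queue = deque([new_root])
    while queue:
        node = queue.popleft()
        for nb in neighbours[node]:
            if new_heads[nb] is None:  # not visited yet
                new_heads[nb] = node
                queue.append(nb)
    return new_heads

# Hypothetical parse of "Marie founded Acme": "founded" is the original root.
tokens = ["Marie", "founded", "Acme"]
heads = [1, -1, 1]
entity_centric = reroot(heads, new_root=0)  # make the entity "Marie" the root
```

Because only edge directions change, no token is pruned, which matches the stated aim of keeping lexical information while exposing the syntactic path from the entity.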

13.
PeerJ Comput Sci ; 10: e2313, 2024.
Article in English | MEDLINE | ID: mdl-39314705

ABSTRACT

To address issues such as misdetection and omission due to low light, image defocus, and worker occlusion in coal-rock image recognition, a new method called YOLOv8-Coal, based on YOLOv8, is introduced to enhance recognition accuracy and processing speed. The Deformable Convolution Network version 3 enhances object feature extraction by adjusting sampling positions with offsets and aligning them closely with the object's shape. The Polarized Self-Attention module in the feature fusion network emphasizes crucial features and suppresses unnecessary information to minimize irrelevant factors. Additionally, the lightweight C2fGhost module combines the strengths of GhostNet and the C2f module, further decreasing model parameters and computational load. The empirical findings indicate that YOLOv8-Coal achieves substantial improvements in all metrics on the coal-rock image dataset. More precisely, the values for AP50, AP50:95, and AR50:95 were improved to 77.7%, 62.8%, and 75.0%, respectively. The optimal localization recall precision (oLRP) was decreased to 45.6%, the model parameters were decreased to 2.59M, and the FLOPs were reduced to 6.9G. Finally, the size of the model weight file is a mere 5.2 MB. The enhanced algorithm's advantage is further demonstrated when compared to other commonly used algorithms.

14.
PeerJ Comput Sci ; 10: e2332, 2024.
Article in English | MEDLINE | ID: mdl-39314702

ABSTRACT

Image style transfer is an important way to combine different styles and contents to generate new images, which plays an important role in computer vision tasks such as image reconstruction and image texture synthesis. In style transfer tasks, there are often long-distance dependencies between pixels of different styles and contents, and existing neural network-based work cannot handle this problem well. This paper constructs a generation model for style transfer based on the cycle-consistent network and the attention mechanism. The forward and backward learning process of the cycle-consistent mechanism enables the network to complete the mismatch conversion between the input and output of the image. The attention mechanism enhances the model's ability to perceive the long-distance dependencies between pixels in the process of learning feature representations from the target content and the target styles, and at the same time suppresses the style feature information of the non-target area. Finally, a large number of experiments were carried out on the monet2photo dataset, and the results show that the misjudgment rate in Amazon Mechanical Turk (AMT) perceptual studies reaches 45%, which verifies that the cycle-consistent network model with attention mechanism has certain advantages in image style transfer.
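
The forward and backward learning process can be sketched as a cycle-consistency loss: translating an image to the other domain and back should recover the original. The toy mappings `g` and `f` below are hypothetical stand-ins for the two generators, and the mean absolute (L1) distance is the usual choice in cycle-consistent networks rather than something stated in the abstract.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, g, f):
    """x -> g(x) -> f(g(x)) should return to x, and y -> f(y) -> g(f(y)) to y."""
    return l1_distance(f(g(x)), x) + l1_distance(g(f(y)), y)

# Hypothetical generators: g brightens an image, f darkens it by the same amount.
def g(image):
    return [p + 0.1 for p in image]

def f(image):
    return [p - 0.1 for p in image]

def identity(image):
    return list(image)

photo = [0.2, 0.5, 0.7]      # hypothetical flattened photo pixels
painting = [0.3, 0.6, 0.8]   # hypothetical flattened painting pixels
good_loss = cycle_consistency_loss(photo, painting, g, f)
bad_loss = cycle_consistency_loss(photo, painting, g, identity)
```

When `f` inverts `g` the loss is essentially zero; replacing `f` with a mapping that does not undo `g` raises the loss, which is the signal the training process minimises.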

15.
PeerJ Comput Sci ; 10: e2201, 2024.
Article in English | MEDLINE | ID: mdl-39314710

ABSTRACT

Multivariate time series anomaly detection has garnered significant attention in fields such as IT operations, finance, medicine, and industry. However, a key challenge lies in the fact that anomaly patterns often exhibit multi-scale temporal variations, which existing detection models often fail to capture effectively. This limitation significantly impacts detection accuracy. To address this issue, we propose the MFAM-AD model, which combines the strengths of convolutional neural networks (CNNs) and bi-directional long short-term memory (Bi-LSTM). The MFAM-AD model is designed to enhance anomaly detection accuracy by seamlessly integrating temporal dependencies and multi-scale spatial features. Specifically, it utilizes parallel convolutional layers to extract features across different scales, employing an attention mechanism for optimal feature fusion. Additionally, Bi-LSTM is leveraged to capture time-dependent information, reconstruct the time series and enable accurate anomaly detection based on reconstruction errors. In contrast to existing algorithms that struggle with inadequate feature fusion or are confined to single-scale feature analysis, MFAM-AD effectively addresses the unique challenges of multivariate time series anomaly detection. Experimental results on five publicly available datasets demonstrate the superiority of the proposed model. Specifically, on the datasets SMAP, MSL, and SMD1-1, our MFAM-AD model has the second-highest F1 score after the current state-of-the-art DCdetector model. On the datasets NIPS-TS-SWAN and NIPS-TS-GECCO, the F1 scores of MFAM-AD are 0.046 (6.2%) and 0.09 (21.3%) higher than those of DCdetector, respectively (F1 values range from 0 to 1). These findings validate the MFAM-AD model's efficacy in multivariate time series anomaly detection, highlighting its potential in various real-world applications.
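
Detection based on reconstruction errors can be sketched as follows: a model reconstructs each time step, and steps whose error exceeds a threshold are flagged. This is not the MFAM-AD implementation; the per-step mean squared error, the fixed threshold, and the data are assumptions for illustration.

```python
def reconstruction_errors(observed, reconstructed):
    """Mean squared error per time step, averaged over the variables."""
    return [
        sum((o - r) ** 2 for o, r in zip(obs, rec)) / len(obs)
        for obs, rec in zip(observed, reconstructed)
    ]

def flag_anomalies(errors, threshold):
    """Indices of time steps whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]

# Hypothetical two-variable series and a model's reconstruction of it.
observed = [[1.0, 2.0], [1.1, 2.1], [5.0, 9.0], [1.0, 2.0]]
reconstructed = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
errors = reconstruction_errors(observed, reconstructed)
anomalies = flag_anomalies(errors, threshold=1.0)  # threshold is an assumed value
```

In practice the threshold is usually derived from the error distribution on normal training data rather than fixed by hand.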

16.
PeerJ Comput Sci ; 10: e2273, 2024.
Article in English | MEDLINE | ID: mdl-39314741

ABSTRACT

Crowd counting aims to estimate the number and distribution of the population in crowded places, which is an important research direction in object counting. It is widely used in public place management, crowd behavior analysis, and other scenarios, showing its robust practicality. In recent years, crowd-counting technology has been developing rapidly. However, in highly crowded and noisy scenes, the counting effect of most models is still seriously affected by the distortion of view angle, dense occlusion, and inconsistent crowd distribution. Perspective distortion causes crowds to appear in different sizes and shapes in the image, and dense occlusion and inconsistent crowd distributions result in parts of the crowd not being captured completely. This ultimately results in the imperfect capture of spatial information in the model. To solve such problems, we propose a strip pooling combined attention (SPCANet) network model based on normed-deformable convolution (NDConv). We model long-distance dependencies more efficiently by introducing strip pooling. In contrast to traditional square kernel pooling, strip pooling uses long and narrow kernels (1×N or N×1) to deal with dense crowds, mutual occlusion, and overlap. Efficient channel attention (ECA), a mechanism for learning channel attention using a local cross-channel interaction strategy, is also introduced in SPCANet. This module generates channel attention through a fast 1D convolution to reduce model complexity while improving performance as much as possible. Four mainstream datasets, Shanghai Tech Part A, Shanghai Tech Part B, UCF-QNRF, and UCF CC 50, were utilized in extensive experiments, and mean absolute error (MAE) exceeds the baseline, which is 60.9, 7.3, 90.8, and 161.1, validating the effectiveness of SPCANet. Meanwhile, mean squared error (MSE) decreases by 5.7% on average over the four datasets, and the robustness is greatly improved.
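
The contrast between square and strip kernels can be made concrete with a simplified sketch: each row is averaged with a 1×N kernel, each column with an N×1 kernel, and the two strips are broadcast back and summed. The 1D convolutions and gating used in full strip pooling modules are omitted, and the feature map is hypothetical.

```python
def strip_pool(feature_map):
    """Simplified strip pooling on a 2D list (H rows by W columns).

    Row strips (1 x N kernels) capture horizontal context, column strips
    (N x 1 kernels) capture vertical context; both are broadcast and added.
    """
    h = len(feature_map)
    w = len(feature_map[0])
    row_means = [sum(row) / w for row in feature_map]
    col_means = [sum(feature_map[i][j] for i in range(h)) / h for j in range(w)]
    return [[row_means[i] + col_means[j] for j in range(w)] for i in range(h)]

pooled = strip_pool([[1.0, 2.0], [3.0, 4.0]])  # hypothetical 2 x 2 feature map
```

Each output position thus depends on every value in its row and column, which is how long and narrow kernels reach distant positions that a small square kernel would miss.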

17.
PeerJ Comput Sci ; 10: e2240, 2024.
Article in English | MEDLINE | ID: mdl-39314739

ABSTRACT

Background: The majority of extant methodologies for text classification prioritize the extraction of feature representations from texts with high degrees of distinction, a process that may result in computational inefficiencies. To address this limitation, the current study proposes a novel approach by directly leveraging label information to construct text representations. This integration aims to optimize the use of label data alongside textual content. Methods: The method began with separate pre-processing of texts and labels, followed by encoding through a projection layer. This research then utilized a conventional self-attention model enhanced by instance normalization (IN) and Gaussian Error Linear Unit (GELU) functions to assess emotional valences in review texts. An advanced self-attention mechanism was further developed to enable the efficient integration of text and label information. In the final stage, an adaptive label encoder was employed to extract relevant label information from the combined text-label data efficiently. Results: Empirical evaluations demonstrate that the proposed model achieves a significant improvement in classification performance, outperforming existing methodologies. This enhancement is quantitatively evidenced by its superior micro-F1 score, indicating the efficacy of integrating label information into text classification processes. This suggests that the model not only addresses computational inefficiencies but also enhances the accuracy of text classification.
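
The two functions named in the Methods can be written down directly. The sketch below gives the exact GELU, x·Φ(x) with Φ the standard normal CDF, and a one-dimensional instance normalisation over a single feature sequence. How the authors wire these into their self-attention model is not described in the abstract, so the example values and the `eps` constant are assumptions.

```python
import math

def gelu(x):
    """Gaussian Error Linear Unit: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def instance_norm(values, eps=1e-5):
    """Normalise one feature sequence of one sample to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

features = [0.5, 1.5, 2.5, 3.5]  # hypothetical activations along one sequence
activated = [gelu(v) for v in instance_norm(features)]
```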

18.
Psychophysiology ; : e14687, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39315537

ABSTRACT

Prepulse inhibition of perceived stimulus intensity (PPIPSI) is a phenomenon where a weak stimulus preceding a stronger one reduces the perceived intensity of the latter. Previous studies have shown that PPIPSI relies on attention and is sensitive to stimulus onset asynchrony (SOA). Longer SOAs may increase conscious awareness of the impact of gating mechanisms on perception by allowing more time for attention to be directed toward relevant processing channels. In other psychophysiological paradigms, temporal predictability improves attention to task relevant stimuli and processes. We hypothesized that temporal predictability may similarly facilitate attention being directed toward the pulse and its processing in PPIPSI. To examine this, we conducted a 2 (SOA: 90 ms, 150 ms) × 2 (predictability: low, high) experiment, where participants were tasked with comparing the perceived intensity of an acoustic pulse-alone against one preceded by a prepulse. The relationship between PPIPSI and cortical PPI (N1-P2 inhibition) was also investigated. Significant main effects of temporal predictability, SOA, and cortical PPI were revealed. Under high temporal predictability, both SOAs (90 and 150 ms) elicited greater PPIPSI. The findings indicate that temporal predictability enhances the timely allocation of finite attentional resources, increasing PPIPSI observations by facilitating perceptual access to the gated pulse signal. Moreover, the finding that reductions in N1-P2 magnitude by a prepulse are associated with increased probability of the participants perceiving the pulse "with prepulse" as less intense, suggests that under various experimental conditions, the link between these cortical processes and perception is similarly engaged.

19.
Ann N Y Acad Sci ; 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316839

ABSTRACT

Recent research on healthy individuals suggests that the valence of emotional stimuli influences behavioral reactions only when relevant to ongoing tasks, as they impact reaching arm movements and gait only when the emotional content cues the responses. However, it has been suggested that emotional expressions elicit automatic gaze shifting, indicating that oculomotor behavior might differ from that of the upper and lower limbs. To investigate, 40 participants underwent two Go/No-go tasks, an emotion discrimination task (EDT) and a gender discrimination task (GDT). In the EDT, participants had to perform a saccade to a peripheral target upon the presentation of angry or happy faces and refrain from moving with neutral ones. In the GDT, the same images were shown, but participants responded based on the posers' gender. Participants displayed two behavioral strategies: a single saccade to the target (92.7%) or two saccades (7.3%), with the first directed at a task-salient feature, that is, the mouth in the EDT and the nose-eyes regions in the GDT. In both cases, the valence of facial expression impacted the saccades only when relevant to the response. Such evidence indicates that the same principles govern the interplay between emotional stimuli and motor reactions regardless of the effectors employed.

20.
Comput Biol Med ; 182: 109173, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39317055

ABSTRACT

Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, characterized by high in-plane resolution but lower through-plane resolution, presents significant challenges. While applying 2D models to individual slices of a 2.5D image is feasible, it fails to capture the spatial relationships between slices. On the other hand, 3D models face challenges such as resolution inconsistencies in 2.5D images, along with computational complexity and susceptibility to overfitting when trained with limited data. In this context, 2.5D models, which capture inter-slice correlations using only 2D neural networks, emerge as a promising solution due to their reduced computational demand and simplicity in implementation. In this paper, we introduce CSA-Net, a flexible 2.5D segmentation model capable of processing 2.5D images with an arbitrary number of slices. CSA-Net features an innovative Cross-Slice Attention (CSA) module that effectively captures 3D spatial information by learning long-range dependencies between the center slice (for segmentation) and its neighboring slices. Moreover, CSA-Net utilizes the self-attention mechanism to learn correlations among pixels within the center slice. We evaluated CSA-Net on three 2.5D segmentation tasks: (1) multi-class brain MR image segmentation, (2) binary prostate MR image segmentation, and (3) multi-class prostate MR image segmentation. CSA-Net outperformed leading 2D, 2.5D, and 3D segmentation methods across all three tasks, achieving average Dice coefficients and HD95 values of 0.897 and 1.40 mm for the brain dataset, 0.921 and 1.06 mm for the prostate dataset, and 0.659 and 2.70 mm for the ProstateX dataset, demonstrating its efficacy and superiority. Our code is publicly available at: https://github.com/mirthAI/CSA-Net.
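
The Dice coefficient reported above measures overlap between a predicted and a reference segmentation mask. The sketch below computes it for flattened binary masks; the masks are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here. The HD95 boundary metric is not covered.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for 0/1 masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0

pred_mask = [1, 1, 0, 0]    # hypothetical flattened prediction
target_mask = [1, 0, 1, 0]  # hypothetical flattened ground truth
score = dice_coefficient(pred_mask, target_mask)
```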
