Results 1 - 20 of 72
1.
Comput Biol Med ; 182: 109179, 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39326263

ABSTRACT

Sesamoiditis is a common equine disease with varying severity, leading to increased injury risks and performance degradation in horses. Accurate grading of sesamoiditis is crucial for effective treatment. Although deep learning-based approaches for grading sesamoiditis show promise, they remain underexplored and often lack clinical interpretability. To address this issue, we propose a novel, clinically interpretable multi-task learning model that integrates clinical knowledge with machine learning. The proposed model employs a dual-branch decoder to simultaneously perform sesamoiditis grading and vascular channel segmentation. Feature fusion is utilized to transfer knowledge between these tasks, enabling the identification of subtle radiographic variations. Additionally, our model generates a diagnostic report that, along with the vascular channel mask, serves as an explanation of the model's grading decisions, thereby increasing the transparency of the decision-making process. We validate our model on two datasets, demonstrating its superior performance compared to state-of-the-art models in terms of accuracy and generalization. This study provides a foundational framework for the interpretable grading of similar diseases.
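
As an illustration of the dual-branch idea, a minimal PyTorch sketch of a shared encoder feeding a segmentation branch whose output is fused back into a grading branch might look as follows. All module names, layer sizes, and the concatenation-based fusion are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a dual-branch multi-task model: a shared encoder feeds
# a segmentation head and a grading head, and the mask evidence is fused into
# the grading path. Sizes and fusion rule are illustrative.
import torch
import torch.nn as nn

class DualBranchModel(nn.Module):
    def __init__(self, num_grades=4):
        super().__init__()
        self.encoder = nn.Sequential(          # shared feature extractor
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, 1, 1)    # vascular-channel mask logits
        self.fuse = nn.Conv2d(64 + 1, 64, 1)   # fuse mask evidence into grading path
        self.grade_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_grades)
        )

    def forward(self, x):
        feats = self.encoder(x)
        mask_logits = self.seg_head(feats)
        fused = self.fuse(torch.cat([feats, mask_logits], dim=1))
        return self.grade_head(fused), mask_logits

model = DualBranchModel()
grades, mask = model(torch.randn(2, 1, 128, 128))
print(grades.shape, mask.shape)  # torch.Size([2, 4]) torch.Size([2, 1, 128, 128])
```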

2.
Sci Rep ; 14(1): 21760, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39294345

ABSTRACT

Transformer-based methods effectively capture global dependencies in images, demonstrating outstanding performance in multiple visual tasks. However, existing Transformers cannot effectively denoise large noisy images captured under low-light conditions because (1) the global self-attention mechanism incurs high computational complexity in the spatial dimension, as computation grows quadratically with the number of tokens, and (2) channel-wise self-attention cannot model the spatial correlations in images. We propose a local-global interaction Transformer (LGIT) that employs an adaptive strategy to select relevant patches for global interaction, achieving low computational complexity in global self-attention computation. A top-N patch cross-attention model (TPCA) is designed based on superpixel segmentation guidance. TPCA selects the top-N patches most similar to the target image patch and applies cross-attention to aggregate information from them into the target patch, effectively exploiting the image's nonlocal self-similarity. A mixed-scale dual-gated feedforward network (MDGFF) is introduced for the effective extraction of multiscale local correlations. TPCA and MDGFF are combined to construct a hierarchical encoder-decoder network, LGIT, which computes self-attention within and across patches at different scales. Extensive experiments on real-world image-denoising datasets demonstrate that LGIT outperforms state-of-the-art (SOTA) convolutional neural network (CNN)- and Transformer-based methods in both qualitative and quantitative results.
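
The top-N selection-plus-aggregation step can be sketched as follows; plain cosine similarity stands in for the paper's superpixel-guided patch matching, and all shapes and names are illustrative.

```python
# Hedged sketch of top-N patch cross-attention: pick the N candidate patches
# most similar to the target, then aggregate them into the target patch via
# scaled-dot-product cross-attention.
import torch
import torch.nn.functional as F

def topn_patch_cross_attention(target, candidates, n=4):
    """target: (B, D); candidates: (B, M, D). Returns (B, D)."""
    sim = F.cosine_similarity(target.unsqueeze(1), candidates, dim=-1)  # (B, M)
    idx = sim.topk(n, dim=1).indices                                    # (B, n)
    picked = torch.gather(candidates, 1,
                          idx.unsqueeze(-1).expand(-1, -1, candidates.size(-1)))
    attn = torch.softmax(target.unsqueeze(1) @ picked.transpose(1, 2)
                         / candidates.size(-1) ** 0.5, dim=-1)          # (B, 1, n)
    return (attn @ picked).squeeze(1)                                   # (B, D)

out = topn_patch_cross_attention(torch.randn(2, 64), torch.randn(2, 100, 64))
print(out.shape)  # torch.Size([2, 64])
```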

3.
Sci Rep ; 14(1): 22518, 2024 09 28.
Article in English | MEDLINE | ID: mdl-39342017

ABSTRACT

Hemolytic peptides are therapeutic peptides that damage red blood cells. However, therapeutic peptides used in medical treatment must exhibit low toxicity to red blood cells to achieve the desired therapeutic effect. Therefore, accurate prediction of the hemolytic activity of therapeutic peptides is essential for the development of peptide therapies. In this study, we propose HemoFuse, a multi-feature cross-fusion model for hemolytic peptide identification. Feature vectors of peptide sequences are obtained via a word-embedding technique and four hand-crafted feature extraction methods. We apply a multi-head cross-attention mechanism to hemolytic peptide identification for the first time: it captures the interaction between word-embedding features and hand-crafted features by computing attention over all positions in both, so that the multiple features can be deeply fused. Moreover, we visualize the features obtained by this module to enhance its interpretability. On a comprehensive integrated dataset, HemoFuse achieves strong results, with ACC, SP, SN, MCC, F1, AUC, and AP of 0.7575, 0.8814, 0.5793, 0.4909, 0.6620, 0.8387, and 0.7118, respectively. Compared with HemoDL proposed by Yang et al., these metrics are 3.32%, 3.89%, 5.93%, 10.6%, 8.17%, 5.88%, and 2.72% higher, respectively. Ablation experiments further show that our model design is reasonable and efficient. The code and datasets are available at https://github.com/z11code/Hemo.


Subjects
Hemolysis; Peptides; Peptides/chemistry; Humans; Erythrocytes/metabolism; Algorithms; Computational Biology/methods
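
The core fusion step, queries from the word-embedding stream attending over the hand-crafted descriptors, can be sketched with PyTorch's built-in multi-head attention; the dimensions below are invented for illustration.

```python
# Sketch of fusing learned sequence embeddings with hand-crafted descriptors
# via multi-head cross-attention, in the spirit of HemoFuse. The widths and
# sequence length are assumptions, not the paper's settings.
import torch
import torch.nn as nn

embed_dim, L = 128, 50                       # feature width, peptide length
mha = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

word_feats = torch.randn(8, L, embed_dim)    # word-embedding branch (B, L, D)
hand_feats = torch.randn(8, L, embed_dim)    # hand-crafted branch, projected to D

# Queries come from one branch, keys/values from the other, so each position
# in the embedding stream attends over the hand-crafted descriptors.
fused, attn_weights = mha(query=word_feats, key=hand_feats, value=hand_feats)
print(fused.shape, attn_weights.shape)       # (8, 50, 128) (8, 50, 50)
```
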
4.
Neural Netw ; 180: 106718, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39293179

ABSTRACT

With the rapid advent and abundance of remote sensing data in different modalities, cross-modal retrieval tasks have gained importance in the research community. In cross-modal retrieval, the query is of one modality and the retrieved output is of another. In this paper, the remote sensing (RS) data modalities considered are earth-observation optical data (aerial photos) and the corresponding hand-drawn sketches. The main challenge for cross-modal retrieval between optical remote sensing images and corresponding sketches is the distribution gap in the shared embedding space of the two modalities. Prior attempts to resolve this issue have not yielded satisfactory outcomes in accurately retrieving cross-modal sketch-image RS data. Previous state-of-the-art approaches used conventional convolutional architectures, which focus on local pixel-wise information about the modalities to be retrieved. This limits the interaction between the sketch texture and the corresponding image, making these models susceptible to overfitting datasets with particular scenarios. To circumvent this limitation, we propose establishing multi-modal correspondence with a novel architecture combining self- and cross-attention, SPCA-Net, which minimizes the modality gap by employing attention mechanisms for the query and other modalities. Efficient cross-modal retrieval is achieved through the suggested attention architecture, which empirically emphasizes the global information of the relevant query modality and bridges the domain gap through a unique pairwise cross-attention network. In addition to the novel architecture, this paper introduces a unique loss function, a label-specific supervised contrastive loss, tailored to the intricacies of the task and designed to enhance the discriminative power of the learned embeddings. Extensive evaluations are conducted on two sketch-image remote sensing datasets, Earth-on-Canvas and RSketch. Under the same experimental conditions, our proposed model beats state-of-the-art architectures by significant margins of 16.7%, 18.9%, 33.7%, and 40.9%, respectively.
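
The label-specific supervised contrastive loss is not spelled out in the abstract; as a reference point, a standard supervised contrastive loss over labeled embeddings (which the label-specific variant presumably refines) can be written as follows.

```python
# Standard supervised contrastive loss (SupCon-style), as a hedged stand-in
# for the paper's label-specific variant: same-class pairs are pulled together
# in a normalized embedding space, all others pushed apart.
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    """z: (N, D) embeddings; labels: (N,) integer class labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                # pairwise similarities
    n = z.size(0)
    not_self = (~torch.eye(n, dtype=torch.bool)).float()  # exclude i == j
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float() * not_self
    exp_sim = sim.exp() * not_self                       # denominator over non-self
    log_prob = sim - exp_sim.sum(dim=1, keepdim=True).log()
    mean_log_prob_pos = (pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()

loss = supcon_loss(torch.randn(16, 64), torch.randint(0, 4, (16,)))
print(loss.item())
```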

5.
Neural Netw ; 180: 106733, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39293177

ABSTRACT

Improperly exposed images often have unsatisfactory visual characteristics such as inadequate illumination, low contrast, and loss of small structures and details. The mapping from an improperly exposed condition to a well-exposed one may vary significantly due to the presence of multiple exposure conditions. Consequently, enhancement methods that do not pay specific attention to this issue tend to yield inconsistent results when applied to the same scene under different exposure conditions. To obtain consistent enhancement results across various exposures while restoring rich details, we propose an illumination-aware divide-and-conquer network (IDNet). Specifically, to address the challenge of directly learning a sophisticated nonlinear mapping from an improperly exposed condition to a well-exposed one, we utilize the discrete wavelet transform (DWT) to decompose the image into a low-frequency (LF) component, which primarily captures brightness and contrast, and high-frequency (HF) components that depict fine-scale structures. To mitigate inconsistency in correction across exposures, we extract a conditional feature from the input that represents illumination-related global information. This feature is then used to modulate dynamic convolution weights, enabling precise correction of the LF component. Furthermore, as the co-located positions of LF and HF components are highly correlated, we create a mask to distill useful knowledge from the corrected LF component and integrate it into the HF components to support the restoration of fine-scale details. Extensive experimental results demonstrate that the proposed IDNet is superior to several state-of-the-art enhancement methods on two datasets with multiple exposures.
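
The LF/HF split that IDNet operates on comes from a single-level 2D DWT; a minimal NumPy sketch with the Haar wavelet shows the decomposition (the paper's wavelet choice and implementation may differ).

```python
# Single-level 2D Haar DWT: the LL band carries brightness/contrast, the three
# detail subbands carry fine structure. Illustrative sketch, not IDNet's code.
import numpy as np

def haar_dwt2(img):
    """img: (H, W) with even H, W. Returns LL and the three detail subbands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0          # low-frequency approximation
    lh = (a - b + c - d) / 2.0          # detail subband
    hl = (a + b - c - d) / 2.0          # detail subband
    hh = (a - b - c + d) / 2.0          # diagonal detail subband
    return ll, (lh, hl, hh)

img = np.random.rand(128, 128)
ll, (lh, hl, hh) = haar_dwt2(img)
print(ll.shape)  # (64, 64)
```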

6.
Sensors (Basel) ; 24(16)2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39205085

ABSTRACT

In recent years, significant progress has been made in facial expression recognition methods. However, facial expression recognition in real-world environments still requires further research. This paper proposes a tri-cross-attention transformer with a multi-feature fusion network (TriCAFFNet) to improve facial expression recognition performance under challenging conditions. By combining LBP (Local Binary Pattern) features, HOG (Histogram of Oriented Gradients) features, landmark features, and CNN (convolutional neural network) features from facial images, the model is provided with a rich input that improves its ability to discern subtle differences between images. Additionally, tri-cross-attention blocks are designed to facilitate information exchange between the different features, enabling them to guide one another in capturing salient attention. Extensive experiments on several widely used datasets show that TriCAFFNet achieves SOTA performance, with accuracies of 92.17% on RAF-DB, 67.40% on AffectNet (7 cls), and 63.49% on AffectNet (8 cls).


Subjects
Facial Expression; Neural Networks, Computer; Humans; Algorithms; Image Processing, Computer-Assisted/methods; Face/anatomy & histology; Automated Facial Recognition/methods; Pattern Recognition, Automated/methods
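
Two of the hand-crafted inputs, LBP and HOG, are readily computed with scikit-image; the parameter choices below are illustrative, not the paper's settings.

```python
# Sketch of the hand-crafted descriptors TriCAFFNet combines with CNN features:
# LBP codes and a HOG vector via scikit-image. Parameters are illustrative.
import numpy as np
from skimage.feature import local_binary_pattern, hog

face = (np.random.rand(96, 96) * 255).astype(np.uint8)  # stand-in grayscale crop
lbp = local_binary_pattern(face, P=8, R=1, method='uniform')   # (96, 96) codes
hog_vec = hog(face, orientations=9, pixels_per_cell=(8, 8),
              cells_per_block=(2, 2))                    # flattened gradient histogram
print(lbp.shape, hog_vec.shape)
```
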
7.
PeerJ Comput Sci ; 10: e2169, 2024.
Article in English | MEDLINE | ID: mdl-39145235

ABSTRACT

The Boolean satisfiability (SAT) problem exhibits different structural features in various domains. Unlike traditional rule-based approaches, neural network models can act as more general algorithms that learn to solve specific problems from domain data. Accurately identifying these structural features is crucial for neural networks to solve the SAT problem. Learning-based SAT solvers, whether end-to-end models or enhancements to traditional heuristic algorithms, have achieved significant progress. In this article, we propose TG-SAT, an end-to-end framework based on the Transformer and a gated recurrent neural network (GRU) for predicting the satisfiability of SAT problems. TG-SAT can learn the structural features of SAT problems in a weakly supervised environment. To capture the structural information of a SAT problem, we encode it as an undirected graph and integrate the GRU into the Transformer structure to update node embeddings. By computing cross-attention scores between literals and clauses, a weighted representation of nodes is obtained. The model is ultimately trained as a classifier to predict the satisfiability of the SAT problem. Experimental results demonstrate that TG-SAT achieves a 2%-5% improvement in accuracy on random 3-SAT problems compared to NeuroSAT. It also performs better on SR(N), especially on more complex SAT problems, where our model achieves higher prediction accuracy.
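
The undirected literal-clause graph that such message passing operates on can be built directly from a CNF formula; a minimal sketch using DIMACS-style signed-integer literals is shown below (illustrative, not the paper's code).

```python
# Encode a CNF formula as a literal-clause incidence matrix: rows are the
# 2*num_vars literals (x1..xn then the negations), columns are clauses.
import numpy as np

def cnf_to_incidence(clauses, num_vars):
    A = np.zeros((2 * num_vars, len(clauses)), dtype=np.float32)
    for j, clause in enumerate(clauses):
        for lit in clause:
            row = lit - 1 if lit > 0 else num_vars + (-lit - 1)
            A[row, j] = 1.0
    return A

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
A = cnf_to_incidence([[1, -2], [2, 3], [-1, -3]], num_vars=3)
print(A.shape)  # (6, 3)
```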

8.
Anal Biochem ; 694: 115637, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39121938

ABSTRACT

Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, resulting in low computational efficiency and limited predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequences only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models that extract two different high-latent feature representations from protein sequences, relevant to protein structure and function. A novel feature fusion module is then designed in E2EPep to fuse and optimize these two feature representations of binding residues. In addition, we design E2EPep+, which integrates the E2EPep and PepBCL models to further improve prediction performance. Experimental results on two independent test sets demonstrate that E2EPep and E2EPep+ achieve average AUC values of 0.846 and 0.842, with an average Matthews correlation coefficient significantly higher than that of most existing sequence-based methods and comparable to that of state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in its feature representation, which uses a cross-attention mechanism to fuse the embeddings generated by the two fine-tuned protein language models. The standalone packages of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.


Subjects
Peptides; Protein Binding; Proteins; Proteins/chemistry; Proteins/metabolism; Peptides/chemistry; Peptides/metabolism; Deep Learning; Binding Sites; Databases, Protein; Computational Biology/methods
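
A hedged sketch of the fusion idea, residue embeddings from one protein language model attending over those of another before a per-residue classifier, is shown below; the embedding widths, the one-directional attention, and the classifier head are all assumptions.

```python
# Illustrative E2EPep-style fusion: align two language-model embedding widths,
# let one stream attend over the other, then classify each residue.
import torch
import torch.nn as nn

L, d1, d2, d = 200, 1024, 768, 256
proj1, proj2 = nn.Linear(d1, d), nn.Linear(d2, d)        # align widths
xattn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
classifier = nn.Linear(2 * d, 2)                         # binding / non-binding

e1 = torch.randn(1, L, d1)   # e.g. residue embeddings from language model A
e2 = torch.randn(1, L, d2)   # e.g. residue embeddings from language model B
h1, h2 = proj1(e1), proj2(e2)
f12, _ = xattn(h1, h2, h2)   # model-A residues attend over model-B features
logits = classifier(torch.cat([h1, f12], dim=-1))        # (1, L, 2)
print(logits.shape)
```
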
9.
Neural Netw ; 180: 106663, 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39208459

ABSTRACT

Utilizing large-scale pretrained models is a well-known strategy for enhancing performance on various target tasks, typically achieved by fine-tuning pretrained models on those tasks. However, naïve fine-tuning may not fully leverage the knowledge embedded in pretrained models. In this study, we introduce a novel fine-tuning method, called stochastic cross-attention (StochCA), specific to Transformer architectures. This method modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning. Specifically, in each block, cross-attention is performed instead of self-attention with a predefined probability, with keys and values extracted from the corresponding block of a pretrained model. In this way, the queries and channel-mixing multi-layer perceptron layers of the target model are fine-tuned to learn how to effectively exploit the rich representations of pretrained models. To verify the effectiveness of StochCA, extensive experiments are conducted on benchmarks in transfer learning and domain generalization, where the exploitation of pretrained models is critical. Our experimental results show the superiority of StochCA over state-of-the-art approaches in both areas. Furthermore, we demonstrate that StochCA is complementary to existing approaches and can be combined with them to further improve performance. We release the code at https://github.com/daintlab/stochastic_cross_attention.
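
A simplified single-block sketch of the mechanism follows: with a predefined probability, the block attends to the frozen pretrained model's features as keys and values instead of its own. This is a condensation of the idea, not the released implementation.

```python
# StochCA-style block: a coin flip decides between self-attention and
# cross-attention to the pretrained model's corresponding-block tokens.
import torch
import torch.nn as nn

class StochCABlock(nn.Module):
    def __init__(self, dim, num_heads, p_cross=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.p_cross = p_cross

    def forward(self, x, x_pretrained):
        # x: target-model tokens; x_pretrained: same block's tokens from the
        # frozen pretrained model (keys/values when the coin flip hits).
        if self.training and torch.rand(()) < self.p_cross:
            out, _ = self.attn(query=x, key=x_pretrained, value=x_pretrained)
        else:
            out, _ = self.attn(query=x, key=x, value=x)
        return x + out

block = StochCABlock(dim=256, num_heads=8)
tokens = torch.randn(4, 197, 256)
print(block(tokens, tokens.detach()).shape)  # torch.Size([4, 197, 256])
```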

10.
Neural Netw ; 179: 106553, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39053303

ABSTRACT

Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning the cross-modal interactions is essential to combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values in the absence of useful cross-modal interactions among input features, thereby introducing uncertainty into the feature representation. These factors have the potential to degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.


Subjects
Attention; Deep Learning; Attention/physiology; Humans; Neural Networks, Computer; Algorithms
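
The paper's two-level gating is not detailed in the abstract; a generic single-gate stand-in, where a learned sigmoid gate decides how much of the cross-attention output to admit, conveys the flavor.

```python
# Simplified gated cross-modal fusion: the gate is conditioned on both the
# unimodal context and the cross-attention output, and suppresses the latter
# per feature when the interaction looks uninformative. Illustrative only.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.xattn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, a, b):
        cross, _ = self.xattn(a, b, b)                # a attends over modality b
        g = self.gate(torch.cat([a, cross], dim=-1))  # per-feature admit/suppress
        return a + g * cross                          # gated residual fusion

fuse = GatedCrossAttention(dim=64)
print(fuse(torch.randn(2, 10, 64), torch.randn(2, 20, 64)).shape)
```
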
11.
Biomed Phys Eng Express ; 10(5)2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39084234

ABSTRACT

Objective. Existing registration networks based on cross-attention usually divide the image pairs to be registered into patches for input. The division and merging of a series of patches make it difficult to maintain the topology of the deformation field and reduce the interpretability of the network. Our goal is therefore to develop a new network architecture based on a cross-attention mechanism combined with a multi-resolution strategy, to improve the accuracy and interpretability of medical image registration. Approach. We propose a new deformable image registration network, NCNet, based on neighborhood cross-attention combined with a multi-resolution strategy. The network structure mainly consists of a multi-resolution feature encoder, a multi-head neighborhood cross-attention module, and a registration decoder. The hierarchical feature extraction capability of our encoder is improved by introducing large-kernel parallel convolution blocks; the cross-attention module based on neighborhood calculation is used to reduce the impact on the topology of the deformation field, and double normalization is used to reduce its computational complexity. Main results. We performed atlas-based registration and inter-subject registration tasks on the public 3D brain magnetic resonance imaging datasets LPBA40 and IXI, respectively. Compared with the popular VoxelMorph method, our method improves the average DSC value by 7.9% and 3.6% on LPBA40 and IXI; compared with the popular TransMorph method, it improves the average DSC value by 4.9% and 1.3%. Significance. We demonstrated the advantages of neighborhood attention over window attention based on partitioned patches, and analyzed the impact of the pyramid feature encoder and double normalization on network performance. This makes a valuable contribution to the further development of medical image registration methods.


Subjects
Algorithms; Brain; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Humans; Magnetic Resonance Imaging/methods; Brain/diagnostic imaging; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Neural Networks, Computer; Databases, Factual
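
Neighborhood cross-attention restricts each query to a local window of the other image's features rather than all positions; a 1D toy version (the registration setting is 3D) illustrates the mechanics under those simplifying assumptions.

```python
# 1D neighborhood cross-attention sketch: each q[i] attends only to a k-sized
# window of the other feature sequence centered at i. Illustrative only.
import torch
import torch.nn.functional as F

def neighborhood_cross_attention_1d(q, kv, k=5):
    """q, kv: (B, L, D); each q[i] attends to kv[i-k//2 .. i+k//2]."""
    B, L, D = q.shape
    pad = k // 2
    kv_pad = F.pad(kv.transpose(1, 2), (pad, pad)).transpose(1, 2)  # (B, L+2p, D)
    windows = kv_pad.unfold(1, k, 1)                                # (B, L, D, k)
    windows = windows.permute(0, 1, 3, 2)                           # (B, L, k, D)
    attn = torch.softmax(
        (q.unsqueeze(2) * windows).sum(-1) / D ** 0.5, dim=-1)      # (B, L, k)
    return (attn.unsqueeze(-1) * windows).sum(2)                    # (B, L, D)

out = neighborhood_cross_attention_1d(torch.randn(2, 32, 16),
                                      torch.randn(2, 32, 16))
print(out.shape)  # torch.Size([2, 32, 16])
```
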
12.
Med Image Anal ; 97: 103265, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39029158

ABSTRACT

Acute coronary syndromes (ACS) are one of the leading causes of mortality worldwide, with atherosclerotic plaque rupture and subsequent thrombus formation as the main underlying substrate. Thrombus burden evaluation is important for tailoring treatment therapy and predicting prognosis. Coronary optical coherence tomography (OCT) enables in-vivo visualization of thrombus that cannot otherwise be achieved by other imaging modalities. However, automatic quantification of thrombus on OCT has not been implemented. The main challenges stem from variation in the location, size, and irregular shape of thrombus, in addition to the small data set. In this paper, we propose a novel dual-coordinate cross-attention transformer network, termed DCCAT, to overcome these challenges and achieve the first automatic segmentation of thrombus on OCT. Imaging features from both Cartesian and polar coordinates are encoded and fused based on long-range correspondence via a multi-head cross-attention mechanism. The dual-coordinate cross-attention block is hierarchically stacked amid convolutional layers at multiple levels, allowing comprehensive feature enhancement. The model was developed on 5,649 OCT frames from 339 patients and tested on independent external OCT data comprising 548 frames from 52 patients. DCCAT achieved a Dice similarity coefficient (DSC) of 0.706 in segmenting thrombus, significantly higher than CNN-based (0.656) and Transformer-based (0.584) models. We show that the additional polar-image input not only leverages discriminative features from another coordinate system but also improves model robustness to geometric transformation. Experimental results show that DCCAT achieves competitive performance with only 10% of the total data, highlighting its data efficiency. The proposed dual-coordinate cross-attention design can be easily integrated into other Transformer models to boost performance.


Subjects
Tomography, Optical Coherence; Tomography, Optical Coherence/methods; Humans; Algorithms; Coronary Thrombosis/diagnostic imaging; Acute Coronary Syndrome/diagnostic imaging; Image Interpretation, Computer-Assisted/methods
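
The polar-coordinate view consumed alongside the Cartesian frame can be produced by resampling the image along rays from the catheter center, for example with OpenCV; this is a minimal illustration under assumed imaging geometry, not the paper's preprocessing.

```python
# Produce a polar view of a (stand-in) OCT frame: rows correspond to angle,
# columns to radius from the chosen center. Center/radius are assumptions.
import cv2
import numpy as np

img = (np.random.rand(512, 512) * 255).astype(np.uint8)   # stand-in OCT frame
center = (img.shape[1] / 2, img.shape[0] / 2)
max_radius = img.shape[0] / 2
polar = cv2.linearPolar(img, center, max_radius, cv2.WARP_FILL_OUTLIERS)
print(polar.shape)  # (512, 512)
```
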
13.
Comput Biol Med ; 180: 108931, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39079414

ABSTRACT

Skin cancer images suffer from hair occlusion, which greatly affects the accuracy of diagnosis and classification. Current dermoscopic hair removal methods use segmentation networks to locate hairs and then use repair networks to restore the image. However, it is difficult to segment hair and capture the overall structure between hairs because hair is thin, faint, and similar in color to the rest of the image. Moreover, when conducting image restoration, the only available images are those obstructed by hair; there is no corresponding ground truth (supervised data) of the same scene without hair. In addition, the texture and structure information used in existing repair methods is often insufficient, which leads to poor results in skin cancer image repair. To address these challenges, we propose the intersection-union dual-stream cross-attention Lova-SwinUnet (IUDC-LS). First, we propose the Lova-SwinUnet module, which embeds the Lovasz loss function into Swin-Unet, enabling the network to better capture features at various scales and thus obtain better hair mask segmentation results. Second, we design the intersection-union (IU) module, which takes pairwise intersections or unions of the masks obtained in the previous step and overlays the results on hair-free skin cancer images to generate labeled training data, turning the unsupervised image repair task into a supervised one. Finally, we propose the dual-stream cross-attention (DC) module, which lets texture and structure information interact and uses cross-attention to focus the network on whichever of the two is more important during fusion, thereby improving repair quality. The experimental results show that the PSNR and SSIM of the proposed method increase by 5.4875 and 0.0401, respectively, compared with other common methods, demonstrating the effectiveness of our approach as a potent tool for skin cancer detection.


Subjects
Hair; Skin Neoplasms; Humans; Skin Neoplasms/diagnostic imaging; Hair/diagnostic imaging; Algorithms; Image Interpretation, Computer-Assisted/methods; Dermoscopy/methods; Image Processing, Computer-Assisted/methods
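
The IU module's data-synthesis trick, overlaying pairwise intersections or unions of predicted hair masks on hair-free images to manufacture supervised (occluded, clean) pairs, can be sketched as follows; the overlay rule that paints masked pixels to zero is an assumption for illustration.

```python
# Synthesize a supervised repair pair from a clean image and two hair masks.
import numpy as np

def make_training_pair(clean_img, mask_a, mask_b, use_union=True):
    """clean_img: (H, W) float; masks: (H, W) bool. Returns (occluded, target)."""
    mask = mask_a | mask_b if use_union else mask_a & mask_b
    occluded = clean_img.copy()
    occluded[mask] = 0.0          # paint synthetic "hair" where the mask fires
    return occluded, clean_img    # supervised pair: network input, ground truth

clean = np.random.rand(64, 64)
m1 = np.random.rand(64, 64) > 0.9
m2 = np.random.rand(64, 64) > 0.9
x, y = make_training_pair(clean, m1, m2)
print(x.shape, y.shape)
```
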
14.
Sensors (Basel) ; 24(14)2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39066029

ABSTRACT

Gearbox fault diagnosis is essential to the maintenance and preventive repair of industrial systems. However, in actual working environments, noise frequently interferes with fault signals, reducing the accuracy of fault diagnosis. To address this issue, this paper incorporates the noise attenuation of the DRSN-CW model and proposes a compound fault detection method for gearboxes, integrated with a cross-attention module, to enhance fault diagnosis performance in noisy environments. First, frequency-domain features are extracted from the public dataset using the fast Fourier transform (FFT). Furthermore, the cross-attention module is inserted at the optimal position to improve the extraction and recognition of global and local fault features. Finally, noise-related features are filtered through soft thresholds within the network structure to efficiently mitigate noise interference. The experimental results show that, compared to existing network models, the proposed model exhibits superior noise immunity and high-precision fault diagnosis performance.
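
The two signal-processing ingredients the pipeline relies on, FFT magnitude features and soft-threshold shrinkage (the denoising operation at the heart of DRSN-style networks), are easy to sketch; the fixed threshold below is illustrative, whereas DRSN-CW learns a threshold per channel.

```python
# FFT magnitude features from a noisy vibration signal, plus the soft-threshold
# shrinkage used to suppress noise-related components. Values are illustrative.
import numpy as np

def fft_magnitude(signal):
    return np.abs(np.fft.rfft(signal)) / len(signal)   # one-sided spectrum

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

t = np.linspace(0, 1, 2048, endpoint=False)
vib = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(t.size)  # noisy tone
spec = fft_magnitude(vib)
denoised = soft_threshold(spec, tau=0.01)
print(spec.shape, denoised.shape)  # (1025,) (1025,)
```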

15.
Sensors (Basel) ; 24(14)2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39066115

ABSTRACT

3D object detection is a challenging and promising task for autonomous driving and robotics, benefiting significantly from multi-sensor fusion, such as LiDAR and cameras. Conventional methods for sensor fusion rely on a projection matrix to align the features from LiDAR and cameras. However, these methods often suffer from inadequate flexibility and robustness, leading to lower alignment accuracy under complex environmental conditions. Addressing these challenges, in this paper, we propose a novel Bidirectional Attention Fusion module, named BAFusion, which effectively fuses the information from LiDAR and cameras using cross-attention. Unlike the conventional methods, our BAFusion module can adaptively learn the cross-modal attention weights, making the approach more flexible and robust. Moreover, drawing inspiration from advanced attention optimization techniques in 2D vision, we developed the Cross Focused Linear Attention Fusion Layer (CFLAF Layer) and integrated it into our BAFusion pipeline. This layer optimizes the computational complexity of attention mechanisms and facilitates advanced interactions between image and point cloud data, showcasing a novel approach to addressing the challenges of cross-modal attention calculations. We evaluated our method on the KITTI dataset using various baseline networks, such as PointPillars, SECOND, and Part-A2, and demonstrated consistent improvements in 3D object detection performance over these baselines, especially for smaller objects like cyclists and pedestrians. Our approach achieves competitive results on the KITTI benchmark.
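
Linear attention, the ingredient the CFLAF layer builds on, replaces the softmax with a positive feature map so that phi(K)^T V can be computed once, dropping the cost from quadratic to linear in sequence length. The generic formulation below is a sketch, not the paper's exact layer.

```python
# Generic linear cross-attention with an elu+1 feature map: cost is O(N*D^2)
# instead of O(N^2*D), useful when point-cloud/image token counts are large.
import torch

def linear_cross_attention(q, k, v, eps=1e-6):
    """q: (B, Nq, D); k, v: (B, Nk, D)."""
    phi = lambda x: torch.nn.functional.elu(x) + 1.0      # positive feature map
    q, k = phi(q), phi(k)
    kv = k.transpose(1, 2) @ v                            # (B, D, D), computed once
    z = q @ k.sum(dim=1, keepdim=True).transpose(1, 2)    # (B, Nq, 1) normalizer
    return (q @ kv) / (z + eps)

out = linear_cross_attention(torch.randn(2, 1000, 64),
                             torch.randn(2, 5000, 64), torch.randn(2, 5000, 64))
print(out.shape)  # torch.Size([2, 1000, 64])
```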

16.
Front Med (Lausanne) ; 11: 1377479, 2024.
Article in English | MEDLINE | ID: mdl-38841586

ABSTRACT

Retinal vessels play a pivotal role as biomarkers in the detection of retinal diseases, including hypertensive retinopathy. Manual identification of these vessels is both resource-intensive and time-consuming. The fidelity of vessel segmentation in automated methods depends directly on the quality of the fundus images. In instances of sub-optimal image quality, deep learning-based methodologies emerge as a more effective approach for precise segmentation. We propose a heterogeneous neural network that combines the local semantic information extraction of a convolutional neural network with the long-range spatial feature mining of a Transformer. This cross-attention network structure boosts the model's ability to handle vessel structures in retinal images. Experiments on four publicly available datasets demonstrate our model's superior vessel segmentation performance and its strong potential for hypertensive retinopathy quantification.

17.
Bioengineering (Basel) ; 11(6)2024 May 29.
Article in English | MEDLINE | ID: mdl-38927785

ABSTRACT

Cardiovascular disease (CVD) is one of the leading causes of death globally. Clinical diagnosis of CVD currently relies primarily on electrocardiograms (ECGs), from which CVD is relatively easier to identify than with other diagnostic methods. However, ensuring the accuracy of ECG readings requires specialized training for healthcare professionals. Therefore, developing a CVD diagnostic system based on ECGs can provide preliminary diagnostic results, effectively reducing the workload of healthcare staff and enhancing the accuracy of CVD diagnosis. In this study, a deep neural network with a cross-stage partial network and a cross-attention-based transformer is used to develop an ECG-based CVD decision system. To accurately represent the characteristics of ECGs, the cross-stage partial network is employed to extract embedding features; this network effectively captures and leverages partial information from different stages, enhancing the feature extraction process. To effectively distill these embedding features, a cross-attention-based transformer model, whose robust scalability enables it to process data sequences of different lengths and complexities, is employed to extract meaningful embedding features, yielding more accurate outcomes. The experimental results showed that the challenge scoring metric of the proposed approach is 0.6112, which outperforms other methods. The proposed ECG-based CVD decision system is therefore useful for clinical diagnosis.

18.
Comput Biol Med ; 178: 108671, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38870721

ABSTRACT

Medical image segmentation is a compelling fundamental problem and an important auxiliary tool for clinical applications. Recently, the Transformer model has emerged as a valuable tool for addressing the limitations of convolutional neural networks (CNNs) by effectively capturing global relationships, and numerous hybrid architectures combining CNNs and Transformers have been devised to enhance segmentation performance. However, they suffer from multilevel semantic feature gaps and fail to account for multilevel dependencies between space and channel. In this paper, we propose a hierarchical dependency Transformer for medical image segmentation, named HD-Former. First, we utilize a Compressed Bottleneck (CB) module to enrich shallow features and localize the target region. We then introduce the Dual Cross Attention Transformer (DCAT) module to fuse multilevel features and bridge the feature gap. In addition, we design a broad exploration network (BEN) that cascades convolution and self-attention across different percepts to capture hierarchical, dense contextual semantic features both locally and globally. Finally, we exploit an uncertain multitask edge loss to adaptively map predictions to a consistent feature space, which optimizes segmentation edges. Extensive experiments on medical image segmentation using the ISIC, LiTS, Kvasir-SEG, and CVC-ClinicDB datasets demonstrate that HD-Former surpasses state-of-the-art methods in both subjective visual performance and objective evaluation. Code: https://github.com/barcelonacontrol/HD-Former.


Subjects
Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Algorithms
19.
Article in English | MEDLINE | ID: mdl-38765185

ABSTRACT

Colorectal cancer (CRC) is the third most common cancer in the United States. Tumor budding (TB) detection and quantification are crucial yet labor-intensive steps in determining the CRC stage through the analysis of histopathology images. To help with this process, we adapt the Segment Anything Model (SAM) to CRC histopathology images to segment TBs using SAM-Adapter. In this approach, we automatically derive task-specific prompts from CRC images and train the SAM model in a parameter-efficient way. We compare the predictions of our model with those of a trained-from-scratch model, using the annotations from a pathologist. As a result, our model achieves an intersection over union (IoU) of 0.65 and an instance-level Dice score of 0.75, which are promising in matching the pathologist's TB annotations. We believe our study offers a novel solution for identifying TBs on H&E-stained histopathology images and demonstrates the value of adapting foundation models for pathology image segmentation tasks.

20.
Technol Health Care ; 32(S1): 299-312, 2024.
Article in English | MEDLINE | ID: mdl-38759058

ABSTRACT

BACKGROUND: Plane-wave imaging is widely employed in medical imaging due to its ultra-fast imaging speed. However, the image quality is compromised. Existing techniques to enhance image quality tend to sacrifice the imaging frame rate. OBJECTIVE: The study aims to reconstruct high-quality plane-wave images while maintaining the imaging frame rate. METHODS: The proposed method utilizes a U-Net-based generator incorporating a multi-scale convolution module in the encoder to extract information at different levels. Additionally, a Dynamic Criss-Cross Attention (DCCA) mechanism is proposed in the decoder of the U-Net-based generator to extract both local and global features of plane-wave images while avoiding interference caused by irrelevant regions. RESULTS: In the reconstruction of point targets, the experimental images achieved a reduction in Full Width at Half Maximum (FWHM) of 0.0499 mm, compared to the Coherent Plane-Wave Compounding (CPWC) method using 75-beam plane waves. For the reconstruction of cyst targets, the simulated image achieved a 3.78% improvement in Contrast Ratio (CR) compared to CPWC. CONCLUSIONS: The proposed model effectively addresses the issue of unclear lesion sites in plane-wave images.


Subjects
Image Processing, Computer-Assisted; Humans; Image Processing, Computer-Assisted/methods; Algorithms