MSCT-UNET: multi-scale contrastive transformer within U-shaped network for medical image segmentation.
Xi, Heran; Dong, Haoji; Sheng, Yue; Cui, Hui; Huang, Chengying; Li, Jinbao; Zhu, Jinghua.
Affiliations
  • Xi H; School of Electronic Engineering, Heilongjiang University, Harbin, 150001, People's Republic of China.
  • Dong H; School of Computer Science and Technology, Heilongjiang University, Harbin, 150000, People's Republic of China.
  • Sheng Y; School of Computer Science and Technology, Heilongjiang University, Harbin, 150000, People's Republic of China.
  • Cui H; Department of Computer Science and Information Technology, La Trobe University, Melbourne, 3000, Australia.
  • Huang C; School of Computer Science and Technology, Heilongjiang University, Harbin, 150000, People's Republic of China.
  • Li J; Qilu University of Technology (Shandong Academy of Sciences), Shandong Artificial Intelligence Institute, Jinan, 250014, People's Republic of China.
  • Zhu J; School of Computer Science and Technology, Heilongjiang University, Harbin, 150000, People's Republic of China.
Phys Med Biol; 69(1), 2023 Dec 28.
Article in En | MEDLINE | ID: mdl-38061069
ABSTRACT
Objective. Automatic multi-organ segmentation from anatomical images is essential in disease diagnosis and treatment planning. The U-shaped neural network with encoder-decoder has achieved great success in various segmentation tasks. However, a pure convolutional neural network (CNN) is not suitable for modeling long-range relations due to limited receptive fields, and a pure transformer is not good at capturing pixel-level features. Approach. We propose a new hybrid network named MSCT-UNET, which fuses CNN features with transformer features at multiple scales and introduces multi-task contrastive learning to improve segmentation performance. Specifically, the multi-scale low-level features extracted from the CNN are further encoded through several transformers to build hierarchical global contexts. Then the cross fusion block fuses the low-level and high-level features in different directions. The deep-fused features are fed back to the CNN and transformer branches for fusion at the next scale. We introduce multi-task contrastive learning, comprising self-supervised global contrastive learning and supervised local contrastive learning, into MSCT-UNET. We also strengthen the decoder by using a transformer to better restore the segmentation map. Results. Evaluation results on the ACDC, Synapse and BraTS datasets demonstrate improved performance over the compared methods. Ablation study results confirm the effectiveness of our major innovations. Significance. The hybrid encoder of MSCT-UNET can capture multi-scale long-range dependencies and fine-grained detail features at the same time. The cross fusion block can fuse these features deeply. The multi-task contrastive learning of MSCT-UNET can strengthen the representation ability of the encoder and jointly optimize the networks. The source code is publicly available at https://github.com/msctunet/MSCT_UNET.git.
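The cross fusion block described in the abstract exchanges information between the CNN branch (local, pixel-level features) and the transformer branch (global context) in both directions. A minimal sketch of this bidirectional cross-attention idea is shown below; the function names, token shapes, and single-head formulation are illustrative assumptions, not the authors' implementation, which should be consulted at the linked repository.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_fusion(cnn_feats, trans_feats):
    """Illustrative bidirectional cross-attention fusion.

    cnn_feats, trans_feats: arrays of shape (num_tokens, dim),
    e.g. flattened spatial feature maps from the two branches.
    Each branch attends to the other's tokens; the fused outputs
    would be fed back to their respective branches at the next scale.
    """
    d = cnn_feats.shape[-1]
    # CNN tokens query the transformer tokens (local -> global direction).
    attn_c2t = softmax(cnn_feats @ trans_feats.T / np.sqrt(d))
    fused_cnn = attn_c2t @ trans_feats
    # Transformer tokens query the CNN tokens (global -> local direction).
    attn_t2c = softmax(trans_feats @ cnn_feats.T / np.sqrt(d))
    fused_trans = attn_t2c @ cnn_feats
    return fused_cnn, fused_trans

rng = np.random.default_rng(0)
c = rng.normal(size=(16, 32))   # hypothetical CNN tokens
t = rng.normal(size=(16, 32))   # hypothetical transformer tokens
fused_c, fused_t = cross_fusion(c, t)
print(fused_c.shape, fused_t.shape)  # (16, 32) (16, 32)
```

In the actual network the fusion would additionally involve learned query/key/value projections and operate at several scales, but the core mechanism (each branch re-weighting the other's features by attention) is captured above.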
Subjects
Keywords

Full text: 1 Database: MEDLINE Main subject: Software / Neural Networks, Computer Language: En Publication year: 2023 Document type: Article