Co-ordinate-based positional embedding that captures resolution to enhance transformer's performance in medical image analysis.

Das, Badhan Kumar; Zhao, Gengyan; Islam, Saahil; Re, Thomas J; Comaniciu, Dorin; Gibson, Eli; Maier, Andreas

Affiliation

Das BK; Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany. badhankumar.das@siemens-healthineers.com.
Zhao G; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany. badhankumar.das@siemens-healthineers.com.
Islam S; Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA.
Re TJ; Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany.
Comaniciu D; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Gibson E; Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA.
Maier A; Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA.

Sci Rep; 14(1): 9380, 2024 Apr 23.

Article in En | MEDLINE | ID: mdl-38654066

ABSTRACT

Vision transformers (ViTs) have revolutionized computer vision by employing self-attention instead of convolutional neural networks, and have demonstrated success due to their ability to capture global dependencies and remove the spatial bias of locality. In medical imaging, where input data may differ in size and resolution, existing architectures require resampling or resizing during pre-processing, which can lead to loss of spatial resolution and degradation of information. This study proposes a co-ordinate-based embedding that encodes the geometry of medical images, capturing physical co-ordinate and resolution information without the need for resampling or resizing. The effectiveness of the proposed embedding is demonstrated through experiments with UNETR and SwinUNETR models for infarct segmentation on an MRI dataset with AxTrace and AxADC contrasts. The dataset consists of 1142 training, 133 validation and 143 test subjects. With the addition of the co-ordinate-based positional embedding, both models achieved substantial improvements in mean Dice score, of 6.5% and 7.6%. The proposed embedding showed a statistically significant advantage (p-value < 0.0001) over alternative approaches. In conclusion, the proposed co-ordinate-based pixel-wise positional embedding method offers a promising solution for Transformer-based models in medical image analysis. It effectively leverages physical co-ordinate information to enhance performance without compromising spatial resolution, and provides a foundation for future advancements in positional embedding techniques for medical applications.
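
The abstract does not give the exact form of the embedding, so the following is only a minimal sketch of the general idea, under stated assumptions: each voxel index is mapped to a physical position in millimetres using the image spacing and origin (an axis-aligned grid is assumed, with no rotation), and that position is then passed through sinusoidal features. The function names `voxel_to_physical` and `coordinate_embedding`, the parameters `dim_per_axis` and `max_wavelength_mm`, and the choice of a sinusoidal encoding are all hypothetical and may differ from the authors' method.

```python
import math


def voxel_to_physical(index, spacing, origin=(0.0, 0.0, 0.0)):
    """Map a voxel index to a physical position in mm.

    Assumption: axis-aligned grid, so position = origin + index * spacing.
    """
    return tuple(o + i * s for i, s, o in zip(index, spacing, origin))


def coordinate_embedding(position_mm, dim_per_axis=4, max_wavelength_mm=512.0):
    """Sinusoidal features of a physical position (hypothetical encoding).

    Each axis contributes dim_per_axis values: a sin/cos pair for each of
    dim_per_axis // 2 wavelengths, spaced geometrically up to max_wavelength_mm.
    """
    if dim_per_axis % 2 != 0:
        raise ValueError("dim_per_axis must be even")
    n_freq = dim_per_axis // 2
    features = []
    for p in position_mm:
        for k in range(n_freq):
            wavelength = max_wavelength_mm ** ((k + 1) / n_freq)
            angle = 2.0 * math.pi * p / wavelength
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# Two scans with different voxel spacing: no resampling is needed, because
# voxels at the same physical location receive the same embedding.
fine = voxel_to_physical((4, 4, 2), spacing=(1.0, 1.0, 2.0))
coarse = voxel_to_physical((2, 2, 1), spacing=(2.0, 2.0, 4.0))
same_embedding = coordinate_embedding(fine) == coordinate_embedding(coarse)
```

In this sketch, resolution is reflected only implicitly, through the physical step between neighbouring voxels; an explicit variant could append the spacing values themselves as extra features.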

Subjects

Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Humans; Magnetic Resonance Imaging/methods; Image Processing, Computer-Assisted/methods; Algorithms; Neural Networks, Computer

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Image Processing, Computer-Assisted / Magnetic Resonance Imaging Limits: Humans Language: En Year of publication: 2024 Document type: Article
