Co-ordinate-based positional embedding that captures resolution to enhance transformer's performance in medical image analysis.
Das, Badhan Kumar; Zhao, Gengyan; Islam, Saahil; Re, Thomas J; Comaniciu, Dorin; Gibson, Eli; Maier, Andreas.
Affiliation
  • Das BK; Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany. badhankumar.das@siemens-healthineers.com.
  • Zhao G; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany. badhankumar.das@siemens-healthineers.com.
  • Islam S; Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA.
  • Re TJ; Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany.
  • Comaniciu D; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
  • Gibson E; Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA.
  • Maier A; Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA.
Sci Rep ; 14(1): 9380, 2024 04 23.
Article in En | MEDLINE | ID: mdl-38654066
ABSTRACT
Vision transformers (ViTs) have revolutionized computer vision by employing self-attention instead of convolutional neural networks, and have demonstrated success due to their ability to capture global dependencies and remove spatial biases of locality. In medical imaging, where input data may differ in size and resolution, existing architectures require resampling or resizing during pre-processing, leading to potential spatial resolution loss and information degradation. This study proposes a co-ordinate-based embedding that encodes the geometry of medical images, capturing physical co-ordinate and resolution information without the need for resampling or resizing. The effectiveness of the proposed embedding is demonstrated through experiments with UNETR and SwinUNETR models for infarct segmentation on an MRI dataset with AxTrace and AxADC contrasts. The dataset consists of 1142 training, 133 validation and 143 test subjects. With the addition of the co-ordinate-based positional embedding, both models achieved substantial improvements in mean Dice score, of 6.5% and 7.6%. The proposed embedding showed a statistically significant advantage (p-value < 0.0001) over alternative approaches. In conclusion, the proposed co-ordinate-based pixel-wise positional embedding method offers a promising solution for Transformer-based models in medical image analysis. It effectively leverages physical co-ordinate information to enhance performance without compromising spatial resolution and provides a foundation for future advancements in positional embedding techniques for medical applications.
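The abstract does not give the formulation of the embedding, so the following is only an illustrative sketch of the general idea: derive each voxel's physical position from the image spacing and origin, then encode that position rather than the voxel index. The function names, the axis-aligned grid, and the sinusoidal encoding (borrowed from the original Transformer) are all assumptions made for this example, not the authors' method.

```python
import math
from itertools import product


def physical_coordinates(shape, spacing_mm, origin_mm=(0.0, 0.0, 0.0)):
    """Map every voxel index to a physical position in millimetres.

    Assumption: the grid is axis-aligned (no rotation), so
    position = origin + index * spacing along each axis.
    """
    return [
        tuple(o + i * s for o, i, s in zip(origin_mm, index, spacing_mm))
        for index in product(*(range(n) for n in shape))
    ]


def sinusoidal_embedding(position_mm, dim_per_axis=4, base=1000.0):
    """Encode one physical position with sin/cos pairs per axis.

    Assumption: frequencies follow the geometric schedule of the original
    Transformer encoding; `dim_per_axis` and `base` are illustrative values.
    """
    if dim_per_axis % 2 != 0:
        raise ValueError("dim_per_axis must be even")
    features = []
    for value in position_mm:
        for k in range(dim_per_axis // 2):
            angle = value / (base ** (2 * k / dim_per_axis))
            features.extend((math.sin(angle), math.cos(angle)))
    return features


# Two hypothetical scans covering the same region at different resolutions.
fine = physical_coordinates(shape=(3, 1, 1), spacing_mm=(1.0, 1.0, 1.0))
coarse = physical_coordinates(shape=(2, 1, 1), spacing_mm=(2.0, 1.0, 1.0))

# Index 2 on the fine grid and index 1 on the coarse grid are the same
# physical point, so they receive the same embedding without resampling.
same_point = sinusoidal_embedding(fine[2]) == sinusoidal_embedding(coarse[1])
print(same_point)
```

Because the encoding depends on millimetre positions rather than array indices, voxels at the same index in scans of different resolution get different embeddings, which is one plausible way for resolution information to reach the model.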

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Image Processing, Computer-Assisted / Magnetic Resonance Imaging Limits: Humans Language: En Publication year: 2024 Document type: Article
