GlottisNetV2: Temporal Glottal Midline Detection Using Deep Convolutional Neural Networks.

Kruse, Elina; Dollinger, Michael; Schutzenberger, Anne; Kist, Andreas M

Affiliation

Kruse E; Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany.
Dollinger M; Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany.
Schutzenberger A; Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany.
Kist AM; Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany.

IEEE J Transl Eng Health Med; 11: 137-144, 2023.

Article in English | MEDLINE | ID: mdl-36816097

ABSTRACT

High-speed videoendoscopy is a major tool for quantitative laryngology. Glottis segmentation and glottal midline detection are crucial for computing vocal fold-specific, quantitative parameters. However, fully automated solutions show limited clinical applicability. In particular, unbiased glottal midline detection remains a challenging problem. We developed a multitask deep neural network for glottis segmentation and glottal midline detection. We used techniques from pose estimation to estimate the anterior and posterior points in endoscopy images. Neural networks were set up in TensorFlow/Keras and trained and evaluated with the BAGLS dataset. We found that a dual-decoder deep neural network, termed GlottisNetV2, outperforms the previously proposed GlottisNet in terms of mean absolute percentage error (MAPE) on the test dataset (1.85% versus 6.3%) while converging faster. Hyperparameter tuning allows fast and directed training. Using temporally variant data from an additional dataset designed for this task, we reduce the median prediction error from 2.1% to 1.76% when using 12 consecutive frames and additional temporal filtering. We found that temporal glottal midline detection using a dual-decoder architecture together with keypoint estimation allows accurate midline prediction. We show that our proposed architecture allows stable and reliable glottal midline predictions ready for clinical use and analysis of symmetry measures.
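The evaluation idea in the abstract (a percentage error on the anterior and posterior points, followed by temporal filtering over consecutive frames) can be illustrated with a minimal sketch. The abstract does not say how the error is normalized or which temporal filter is used, so everything here is an assumption: the error is normalized by image size, the filter is a sliding median, and the function names `keypoint_error_percent` and `temporal_median_filter` are hypothetical and not taken from the paper.

```python
import numpy as np


def keypoint_error_percent(pred, ref, image_size):
    """Mean absolute keypoint error as a percentage of the image size.

    pred, ref: arrays of shape (frames, 2 points, 2 coords) holding the
    anterior and posterior points as (x, y) pixel coordinates.
    image_size: (width, height) in pixels; normalizing by it is an assumption.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    scale = np.asarray(image_size, dtype=float)  # broadcasts over (x, y)
    return float(np.mean(np.abs(pred - ref) / scale) * 100.0)


def temporal_median_filter(points, window=3):
    """Smooth each coordinate over time with a sliding median (odd window)."""
    points = np.asarray(points, dtype=float)
    half = window // 2
    # Repeat the first and last frames so the output keeps the input length.
    padded = np.pad(points, ((half, half), (0, 0), (0, 0)), mode="edge")
    windows = np.stack([padded[i:i + len(points)] for i in range(window)])
    return np.median(windows, axis=0)


# Synthetic example: 12 frames with a static midline and one outlier frame.
ref = np.tile(np.array([[128.0, 40.0], [128.0, 200.0]]), (12, 1, 1))
pred = ref.copy()
pred[5, 0, 0] += 25.6  # anterior point jumps sideways in frame 5

raw_error = keypoint_error_percent(pred, ref, (256, 256))
smoothed = temporal_median_filter(pred, window=3)
smoothed_error = keypoint_error_percent(smoothed, ref, (256, 256))
```

In this synthetic case the median filter removes the single-frame outlier entirely. On real predictions the benefit would depend on how the midline actually moves between frames.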

Subjects

Glottis; Vocal Cords; Neural Networks, Computer; Endoscopy

Keywords

Laryngeal endoscopy; biomedical imaging; deep learning; deep neural networks; glottis; midline

Full text: 1 | Database: MEDLINE | Main subject: Vocal Cords / Glottis | Type of study: Diagnostic_studies / Prognostic_studies | Language: En | Publication year: 2023 | Document type: Article