Video Scene Detection Using Transformer Encoding Linker Network (TELNet).

Tseng, Shu-Ming; Yeh, Zhi-Ting; Wu, Chia-Yang; Chang, Jia-Bin; Norouzi, Mehdi

Affiliation

Tseng SM; Department of Electronic Engineering, National Taipei University of Technology, Taipei 106335, Taiwan.
Yeh ZT; College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH 45219, USA.
Wu CY; Department of Electronic Engineering, National Taipei University of Technology, Taipei 106335, Taiwan.
Chang JB; College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH 45219, USA.
Norouzi M; College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH 45219, USA.

Sensors (Basel); 23(16), 2023 Aug 09.

Article in English | MEDLINE | ID: mdl-37631590

ABSTRACT

This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various video processing tasks, including video summarization. TELNet scans the video shots with a rolling window. A transformer encoder encodes the shot features, which are extracted by a fine-tuned 3D CNN model, and a linker establishes links between shots based on the encoded features. TELNet then efficiently identifies scene boundaries as the points where consecutive shots lack links. TELNet was trained on multiple video scene detection datasets and demonstrated results comparable to other state-of-the-art models in standard settings. Notably, in cross-dataset evaluations, TELNet demonstrated significantly improved F-scores. Furthermore, TELNet's computational complexity grows linearly with the number of shots, making it highly efficient in processing long videos.
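The link-then-cut idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: cosine similarity stands in for the learned linker, the input vectors stand in for transformer-encoded shot features, and the names `link_shots`, `scene_boundaries`, and the `window` parameter are hypothetical. Each shot links to its most similar neighbor inside a rolling window, and a boundary is reported wherever no link spans the gap between two consecutive shots.

```python
import numpy as np


def link_shots(features, window=5):
    """Link each shot to its most similar other shot inside a rolling window.

    Assumption: cosine similarity replaces the learned linker of the paper.
    """
    feats = np.asarray(features, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)  # avoid division by zero
    n = len(unit)
    half = window // 2
    links = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        sims = unit[lo:hi] @ unit[i]   # similarities within the window
        sims[i - lo] = -np.inf         # a shot should not link to itself
        links.append(lo + int(np.argmax(sims)))
    return links


def scene_boundaries(links):
    """Return every index g such that no link spans the gap between shots g and g+1."""
    n = len(links)
    covered = np.zeros(max(n - 1, 0), dtype=bool)
    for i, j in enumerate(links):
        a, b = min(i, j), max(i, j)
        covered[a:b] = True            # the link i-j spans gaps a .. b-1
    return [g for g in range(n - 1) if not covered[g]]


# Toy features: shots 0-2 resemble each other, as do shots 3-5.
toy = [[1, 0], [0.9, 0.1], [1, 0.05], [0, 1], [0.1, 0.9], [0.05, 1]]
print(scene_boundaries(link_shots(toy, window=5)))  # -> [2]
```

With a fixed window size, each shot is compared against a bounded number of neighbors, so the cost of this sketch grows linearly with the number of shots, which matches the scaling behavior the abstract reports.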

Keywords

video chaptering; video scene detection; video structure analysis; video summarization; video temporal segmentation

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Study type: Diagnostic_studies / Prognostic_studies | Language: En | Journal: Sensors (Basel) | Year: 2023 | Document type: Article | Affiliation country: Taiwan
