Search | BVS Colombia Search Portal

Improving Text-Independent Forced Alignment to Support Speech-Language Pathologists with Phonetic Transcription.

Li, Ying; Wohlan, Bryce Johannas; Pham, Duc-Son; Chan, Kit Yan; Ward, Roslyn; Hennessey, Neville; Tan, Tele.

Sensors (Basel) ; 23(24), 2023 Dec 06.

Article in English | MEDLINE | ID: mdl-38139496

ABSTRACT

Problem: Phonetic transcription is crucial for diagnosing speech sound disorders (SSDs), but it is susceptible to transcriber experience and perceptual bias. Current forced alignment (FA) tools annotate audio files to determine the spoken content and its position in time. However, they often require a manual transcription as input, which limits their effectiveness. Method: We introduce a novel, text-independent forced alignment model that autonomously recognises individual phonemes and their boundaries, addressing these limitations. Our approach uses an advanced, pre-trained wav2vec 2.0 model to segment speech into tokens and recognise them automatically. To identify phoneme boundaries accurately, we use an unsupervised segmentation tool, UnsupSeg. Segments are labelled by nearest-neighbour classification against the wav2vec 2.0 labels taken before connectionist temporal classification (CTC) collapse, and each segment's class label is determined by maximum overlap. Additional post-processing, including overfitting cleaning and voice activity detection, is applied to improve segmentation. Results: We benchmarked our model against existing methods on the TIMIT dataset of typical speakers. For the first time, we also evaluated performance on the TORGO dataset, which contains speakers with SSDs. Our model demonstrated competitive performance, achieving a harmonic mean score of 76.88% on TIMIT and 70.31% on TORGO. Implications: This research is a significant advance in the assessment and diagnosis of SSDs, offering a more objective and less biased approach than traditional methods. The model's effectiveness, particularly with SSD speakers, opens new avenues for research and clinical application in speech pathology.
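The segment-labelling step in this abstract can be sketched as follows. It assigns each UnsupSeg segment the wav2vec 2.0 frame label it overlaps most, using frame labels taken before CTC collapse. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions:

- the 20 ms frame stride;
- the blank token name `<pad>`;
- the hypothetical helper name `label_segments`;
- the nearest-frame fallback for segments that contain no labelled frame.

```python
from collections import Counter

# Assumptions (hypothetical, for illustration only):
FRAME_SEC = 0.02   # wav2vec 2.0 frame stride of 20 ms at 16 kHz
BLANK = "<pad>"    # name of the CTC blank token in the frame-label sequence


def label_segments(frame_labels, boundaries, frame_sec=FRAME_SEC, blank=BLANK):
    """Give each segment [boundaries[k], boundaries[k+1]) one phoneme label.

    frame_labels: per-frame labels taken *before* CTC collapse.
    boundaries: sorted boundary times in seconds, e.g. from an
        unsupervised segmenter such as UnsupSeg.
    """
    # Pair each non-blank frame label with the time of its frame centre.
    timed = [((i + 0.5) * frame_sec, lab)
             for i, lab in enumerate(frame_labels) if lab != blank]
    labels = []
    for start, end in zip(boundaries, boundaries[1:]):
        counts = Counter(lab for t, lab in timed if start <= t < end)
        if counts:
            # Maximum overlap: the label covering the most frames wins.
            labels.append(counts.most_common(1)[0][0])
        elif timed:
            # No labelled frame inside: fall back to the nearest labelled frame.
            mid = 0.5 * (start + end)
            labels.append(min(timed, key=lambda p: abs(p[0] - mid))[1])
        else:
            labels.append(None)
    return labels


frames = ["<pad>", "hh", "hh", "<pad>", "ax", "ax", "ax", "<pad>", "l", "l"]
print(label_segments(frames, [0.0, 0.08, 0.16, 0.20]))  # prints ['hh', 'ax', 'l']
```

Counting whole frames by their centre times keeps the overlap measure simple. A finer version could weight partial frame overlap at segment edges.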

Subject(s)

Speech Perception, Voice, Humans, Phonetics, Speech, Pathologists

Detection of dynamic background due to swaying movements from motion features.

Pham, Duc-Son; Arandjelovic, Ognjen; Venkatesh, Svetha.

IEEE Trans Image Process ; 24(1): 332-44, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25494505

ABSTRACT

Dynamically changing background (dynamic background) still presents a great challenge to many motion-based video surveillance systems. In the context of event detection, it is a major source of false alarms. The security industry has a strong need to detect and suppress these false alarms, or to dampen the effects of background changes, so as to increase sensitivity to meaningful events of interest. In this paper, we restrict our focus to one of the most common causes of dynamic background changes: swaying tree branches and their shadows under windy conditions. Considering the ultimate goal of a video analytics pipeline, we formulate a new dynamic background detection problem. It is a signal processing alternative to previously described but unreliable computer vision-based approaches. Within this new framework, we directly reduce the number of false alarms by testing whether the detected events are due to characteristic background motions. In addition, we introduce a new data set suitable for evaluating dynamic background detection. It consists of real-world events detected by a commercial surveillance system from two static surveillance cameras. The research question we address is whether dynamic background can be detected reliably and efficiently using simple motion features, in the presence of similar but meaningful events such as loitering. Inspired by tree aerodynamics theory, we propose a novel method named local variation persistence (LVP) that captures the key characteristics of swaying motions. The method is posed as a convex optimization problem whose variable is the local variation. We derive a computationally efficient algorithm for solving the optimization problem, and its solution is then used to form a powerful detection statistic. On our newly collected data set, we demonstrate that the proposed LVP achieves excellent detection results and outperforms the best alternative adapted from existing art in the dynamic background literature.
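The abstract does not give the LVP formulation, so the sketch below does not implement it. As a hypothetical stand-in, it illustrates the general idea of testing a detected event for a back-and-forth motion signature. It scores a 1-D motion track by the fraction of consecutive velocity steps that reverse direction. The input could be, for example, the horizontal centroid of a detected blob over time. The function names `reversal_rate` and `looks_like_sway` and the 0.5 threshold are assumptions chosen for illustration.

```python
def reversal_rate(track):
    """Fraction of consecutive non-zero velocity steps that reverse direction.

    track: 1-D positions over time (hypothetical input, e.g. the horizontal
    centroid of a detected foreground blob in successive frames).
    """
    velocity = [b - a for a, b in zip(track, track[1:])]
    moving = [v for v in velocity if v != 0]   # ignore frames with no motion
    if len(moving) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(moving, moving[1:]) if a * b < 0)
    return flips / (len(moving) - 1)


def looks_like_sway(track, threshold=0.5):
    """Hypothetical rule: frequent direction reversals suggest swaying."""
    return reversal_rate(track) >= threshold


print(looks_like_sway([0, 2, 1, 3, 1, 2, 0]))   # True: moves back and forth
print(looks_like_sway([0, 1, 2, 4, 5, 7, 8]))   # False: crosses the scene
```

A heuristic this simple would also fire on a person pacing back and forth. Pacing is exactly the kind of similar but meaningful event, such as loitering, that the paper's research question is concerned with.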

Efficient algorithms for robust recovery of images from compressed data.

Pham, Duc-Son; Venkatesh, Svetha.

IEEE Trans Image Process ; 22(12): 4724-37, 2013 Dec.

Article in English | MEDLINE | ID: mdl-23955755

ABSTRACT

Compressed sensing (CS) is an important theory for sub-Nyquist sampling and recovery of compressible data. Recently, it has been extended to cope with the case where corruption of the CS data is modeled as impulsive noise. The new formulation, termed robust CS, combines robust statistics and CS in a single framework to suppress outliers in CS recovery. To solve the robust CS problem, a scheme was suggested that iteratively solves a number of CS problems, whose solutions provably converge to the true robust CS solution. This scheme is, however, rather inefficient, as it has to use existing CS solvers as a proxy. To overcome the limitations of the original robust CS algorithm, we propose in this paper more computationally efficient algorithms. They follow the latest advances in large-scale convex optimization for nonsmooth regularization. We also extend the robust CS formulation to various settings, to further improve robust CS, and derive simple but effective algorithms to solve these extensions. The settings include additional affine constraints, an l1-norm loss function, mixed-norm regularization, and multitasking. On the original robust CS formulation, the new algorithms offer a substantial computational advantage over the original robust CS method. They also effectively solve more sophisticated extensions that the original methods cannot. We demonstrate the usefulness of the extensions on several imaging tasks.
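The kind of large-scale convex optimization the abstract refers to can be illustrated roughly as follows. The sketch applies proximal gradient descent (ISTA) to a robust CS objective: a Huber loss on the residuals plus an l1 penalty on the signal. This is not the algorithm from the paper. The following are all assumptions:

- the Huber loss;
- the parameters `lam` and `delta`;
- the fixed step size 1/||Phi||_F^2;
- the function names.

Plain Python lists keep the sketch dependency-free.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1 (element-wise shrinkage)."""
    return [max(abs(a) - tau, 0.0) * (1.0 if a > 0 else -1.0) for a in v]


def huber_psi(r, delta):
    """Derivative of the Huber loss: identity near zero, clipped for outliers."""
    return max(-delta, min(delta, r))


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a list-of-rows matrix A."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def robust_cs_ista(Phi, y, lam=0.01, delta=0.1, iters=500):
    """Minimise sum_i huber_delta(y_i - (Phi x)_i) + lam * ||x||_1 by ISTA."""
    # The Huber gradient is Lipschitz with constant ||Phi||_2^2 <= ||Phi||_F^2,
    # so 1 / ||Phi||_F^2 is a safe (if conservative) fixed step size.
    step = 1.0 / sum(a * a for row in Phi for a in row)
    x = [0.0] * len(Phi[0])
    for _ in range(iters):
        resid = [yi - pi for yi, pi in zip(y, matvec(Phi, x))]
        grad = rmatvec(Phi, [-huber_psi(r, delta) for r in resid])
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad)], step * lam)
    return x


# Six measurements of x_true = [1, 0]; the first is hit by an impulse (1 -> 50).
Phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [50.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(robust_cs_ista(Phi, y, delta=0.1))    # robust: first entry stays near 1
print(robust_cs_ista(Phi, y, delta=100.0))  # ~quadratic loss: dragged toward 17
```

In this toy example the two fits behave very differently:

- With `delta = 0.1`, the outlier's influence is clipped, and the first coefficient settles at about 1.045.
- With a very large `delta`, the loss is effectively quadratic, and the estimate is pulled to about 17.33, the least-squares compromise between 50, 1, and 1.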

Improved image recovery from compressed data contaminated with impulsive noise.

Pham, Duc-Son; Venkatesh, Svetha.

IEEE Trans Image Process ; 21(1): 397-405, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21914571

ABSTRACT

Compressed sensing (CS) is a new information sampling theory for acquiring sparse or compressible data with far fewer measurements than Nyquist/Shannon sampling would otherwise require. This is particularly important for imaging applications such as magnetic resonance imaging and astronomy. However, in the existing CS formulation, applying the l2-norm to the residuals is not particularly efficient when the noise is impulsive, and this can increase the upper bound on the recovery error. To address this problem, we consider a robust formulation of CS that suppresses outliers in the residuals. We propose an iterative algorithm for solving the robust CS problem that exploits the power of existing CS solvers. We also show that the upper bound on the recovery error is reduced for non-Gaussian noise, and we demonstrate the efficacy of the method through numerical studies.
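The contrast between an l2 residual constraint and a robust one can be written out as below. This is a generic textbook form assumed for illustration. The abstract does not state which robust loss the paper uses, so the Huber function $\rho_\delta$ with threshold $\delta$ is an assumption.

```latex
\text{Standard CS:}\quad \min_{x}\ \|x\|_1 \quad \text{s.t.}\quad \|y - \Phi x\|_2 \le \epsilon

\text{Robust CS (assumed Huber form):}\quad
\min_{x}\ \|x\|_1 \quad \text{s.t.}\quad \sum_{i=1}^{m} \rho_\delta\!\left(y_i - \phi_i^{\top} x\right) \le \epsilon,
\qquad
\rho_\delta(r) =
\begin{cases}
\tfrac{1}{2} r^2, & |r| \le \delta,\\[2pt]
\delta |r| - \tfrac{1}{2}\delta^2, & |r| > \delta.
\end{cases}
```

For a worked number, take a single impulsive residual $r = 50$:

- In a squared-error criterion $\tfrac{1}{2}\|y - \Phi x\|_2^2$, it contributes $r^2/2 = 1250$.
- With $\delta = 0.1$, it contributes only $\delta|r| - \delta^2/2 = 4.995$ to the Huber sum.

Because the Huber contribution grows only linearly in $|r|$, one outlier cannot dominate the fit.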

Subject(s)

Algorithms, Artifacts, Data Compression/methods, Statistical Data Interpretation, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Automated Pattern Recognition/methods, Reproducibility of Results, Sensitivity and Specificity
