Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores.
Espejo, Jose Manuel Rivera; De Maeyer, Sven; Gillis, Steven.
Affiliation
  • Espejo JMR; Faculty of Social Sciences, Department of Training and Education Sciences, Antwerp University, Antwerp, Belgium. josemanuel.riveraespejo@uantwerpen.be.
  • De Maeyer S; Faculty of Social Sciences, Department of Training and Education Sciences, Antwerp University, Antwerp, Belgium.
  • Gillis S; Computational Linguistics and Psycholinguistics Research Centre CLIPS, University of Antwerp, Antwerp, Belgium.
Behav Res Methods; 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39048860
ABSTRACT
When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features into the data, such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to address these features collectively can create statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a, Journal of Econometrics, 128(2), 301-23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling such data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with that of the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility construct from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, the results indicated the model's superiority over the normal LMM in predicting empirical phenomena, as well as its ability to quantify a latent, potential intelligibility.
Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting empirical phenomena.
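The abstract does not spell out how transcriptions are aggregated into entropy scores. A common approach in transcription-based intelligibility research is to compute the normalized Shannon entropy over the distinct transcriptions that listeners produce for the same utterance: full agreement yields 0 (maximally intelligible), complete disagreement yields 1. The sketch below illustrates this idea; the function name and the normalization by log(n) are illustrative assumptions, not necessarily the exact formula used in the study.

```python
import math
from collections import Counter

def entropy_score(transcriptions):
    """Normalized Shannon entropy of listener transcriptions for one utterance.

    Illustrative assumption: entropy is normalized by log(n), its maximum
    for n listeners, so the score lies in [0, 1].
    0.0 -> all listeners wrote the same word (fully intelligible);
    1.0 -> every listener wrote something different (unintelligible).
    """
    n = len(transcriptions)
    if n <= 1:
        return 0.0
    counts = Counter(transcriptions)
    # Shannon entropy over the empirical distribution of transcriptions
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)

# Hypothetical usage: five listeners transcribe the same child utterance.
print(entropy_score(["ball", "ball", "ball", "ball", "ball"]))  # 0.0
print(entropy_score(["ball", "bell", "ball", "doll", "ball"]))  # between 0 and 1
```

Because such scores are bounded on [0, 1] and often heteroscedastic, they motivate the beta-proportion likelihood discussed in the abstract rather than a normal one.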
Full text: 1 | Database: MEDLINE | Language: English | Publication year: 2024 | Document type: Article