ABSTRACT
Background: Systematic measurement of conversational features in natural clinical settings is essential to better understand, disseminate, and incentivize high-quality serious illness communication. Advances in machine-learning (ML) classification of human speech offer an exceptional opportunity to complement human coding (HC) methods for measurement in large-scale studies.

Objectives: To test the reliability, efficiency, and sensitivity of a tandem ML-HC method for identifying one feature of clinical importance in serious illness conversations: Connectional Silence.

Design: Cross-sectional analysis of 354 audio-recorded inpatient palliative care consultations from the Palliative Care Communication Research Initiative multisite cohort study.

Setting/Subjects: Hospitalized people with advanced cancer.

Measurements: We created 1000 brief audio "clips" of randomly selected moments that a screening ML algorithm predicted to be conversational pauses of two seconds or longer. Each clip included 10 seconds of speech before the pause and 5 seconds after it. Two HCs independently evaluated each clip for Connectional Silence, as operationalized from conceptual taxonomies of silence in serious illness conversations. HCs also evaluated 100 minutes from 10 additional conversations with unique speakers to determine how frequently the ML screening algorithm missed episodes of Connectional Silence.

Results: Connectional Silences were rare (5.5%) among all pauses of two seconds or longer in palliative care conversations. The tandem ML-HC method demonstrated strong reliability (kappa 0.62; 95% confidence interval: 0.47-0.76). HC alone required 61% more time than the tandem ML-HC method. The ML screening algorithm missed no Connectional Silences.

Conclusions: Tandem ML-HC methods are reliable, efficient, and sensitive for identifying Connectional Silence in serious illness conversations.
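The clip construction and the two-coder reliability statistic described above can be illustrated with a minimal Python sketch. It rests on several assumptions that go beyond the abstract. Pauses are taken to arrive as (start, end) times in seconds from some upstream detector; the screening ML algorithm itself, the random sampling of the 1000 clips, and the confidence-interval calculation are not shown. "Kappa" is interpreted as unweighted Cohen's kappa on binary labels, although the abstract does not name the variant. The names `clip_windows` and `cohens_kappa` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Parameters taken from the abstract; everything else here is illustrative.
MIN_PAUSE_S = 2.0   # keep pauses of two seconds or longer
PRE_S = 10.0        # seconds of speech kept before each pause
POST_S = 5.0        # seconds kept after each pause


def clip_windows(pauses: Sequence[Tuple[float, float]],
                 duration: float) -> List[Tuple[float, float]]:
    """Hypothetical helper: map each qualifying pause to a (clip_start, clip_end) window.

    Assumes pauses come from an upstream detector as (start, end) seconds.
    Windows are clamped to [0, duration], so clips near the edges of a
    recording are shortened rather than padded.
    """
    windows = []
    for start, end in pauses:
        if end - start >= MIN_PAUSE_S:
            windows.append((max(0.0, start - PRE_S), min(duration, end + POST_S)))
    return windows


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two coders' binary labels (1 = Connectional Silence)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n  # raw proportion of agreement
    p_a = sum(a) / n                                  # coder A's rate of positive labels
    p_b = sum(b) / n                                  # coder B's rate of positive labels
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)      # agreement expected by chance
    if expected == 1.0:
        return float("nan")  # both coders used a single category: kappa is undefined
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented pauses and labels for illustration only; these are not study data.
    print(clip_windows([(12.0, 14.5), (30.0, 31.0), (3.0, 6.0)], duration=60.0))
    # -> [(2.0, 19.5), (0.0, 11.0)]  (the 1.0 s pause is dropped)
    coder_a = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    coder_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(round(cohens_kappa(coder_a, coder_b), 3))  # -> 0.615
```

In the invented example the coders agree on 90% of clips, yet kappa is only about 0.62. When the target category is rare, as Connectional Silence is, much of the raw agreement comes from both coders labeling most clips negative, and kappa corrects for that chance agreement.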