Results 1 - 10 of 10
1.
J Acoust Soc Am ; 155(4): 2836-2848, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38682915

ABSTRACT

This paper evaluates an innovative framework for spoken dialect density prediction on children's and adults' African American English. A speaker's dialect density is defined as the frequency with which dialect-specific language characteristics occur in their speech. Rather than treating the presence or absence of a target dialect in a speaker's speech as a binary decision, a classifier is trained to predict the level of dialect density, providing a higher degree of specificity for downstream tasks. Self-supervised learning representations from HuBERT, handcrafted grammar-based features extracted from ASR transcripts, prosodic features, and other feature sets are evaluated as inputs to an XGBoost classifier, which is trained to assign dialect density labels to short recorded utterances. The classifier achieves high dialect density classification accuracy for both child and adult speech and demonstrates robust performance across ages and regional dialect varieties. Additionally, this work serves as a basis for analyzing which acoustic and grammatical cues affect machine perception of dialect.
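The feature-to-density-level pipeline described above can be sketched as follows. This is a hypothetical illustration: random vectors stand in for the pooled HuBERT and grammar-based features, the three density levels are assumed, and scikit-learn's GradientBoostingClassifier is used as a stand-in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in for per-utterance feature vectors (e.g., a mean-pooled
# self-supervised embedding concatenated with grammar/prosody features).
X = rng.normal(size=(120, 32))       # 120 utterances, 32-dim toy features
y = rng.integers(0, 3, size=120)     # density levels: 0=low, 1=medium, 2=high

clf = GradientBoostingClassifier(n_estimators=30, max_depth=3)
clf.fit(X, y)

levels = clf.predict(X[:5])          # predicted density level per utterance
```

With real features, each utterance would first be passed through the pretrained encoder and pooled over time before classification.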


Subject(s)
Black or African American; Speech Acoustics; Humans; Adult; Child; Male; Female; Speech Production Measurement/methods; Language; Child, Preschool; Young Adult; Speech Perception; Adolescent; Phonetics; Child Language
2.
AMIA Jt Summits Transl Sci Proc ; 2023: 622-631, 2023.
Article in English | MEDLINE | ID: mdl-37350923

ABSTRACT

Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across different institutions and specialties are needed. In this paper, we present domain generalization for symptom extraction using pretraining and fine-tuning data that differ from the target domain in terms of institution and/or specialty and patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptom words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled texts for better representation. Our experiments indicate that masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.
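A minimal sketch of the masking idea, assuming tokenized input and a corpus-level count of symptom words; the function name, the top-k cutoff, and the masking probability are illustrative assumptions, and the paper applies masking dynamically during training rather than as a one-off preprocessing step.

```python
import random
from collections import Counter

def mask_frequent_symptoms(tokens, symptom_counts, top_k=2, mask_prob=1.0,
                           mask_token="[MASK]", seed=0):
    """Replace the most frequent source-domain symptom words with a mask
    token, pushing the model to rely on context instead of lexical cues."""
    random.seed(seed)
    frequent = {w for w, _ in symptom_counts.most_common(top_k)}
    return [mask_token if t.lower() in frequent and random.random() < mask_prob
            else t for t in tokens]

counts = Counter({"cough": 12, "fever": 9, "rash": 1})
masked = mask_frequent_symptoms("patient reports cough and fever".split(), counts)
# With mask_prob=1.0: ['patient', 'reports', '[MASK]', 'and', '[MASK]']
```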

3.
J Biomed Inform ; 117: 103761, 2021 05.
Article in English | MEDLINE | ID: mdl-33781918

ABSTRACT

Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). Our span-based event extraction model outperforms an extractor built on MetaMapLite for the identification of symptoms with assertion values. In a secondary use application, we predicted COVID-19 test results using structured patient data (e.g., vital signs and laboratory results) and automatically extracted symptom information to explore the clinical presentation of COVID-19. Automatically extracted symptoms improve COVID-19 prediction performance beyond structured data alone.
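Span-based extraction begins by enumerating candidate token spans, each of which is then scored by a classifier as a possible trigger or argument. A minimal sketch of the enumeration step (the span length limit is an assumption, not the paper's setting):

```python
def enumerate_spans(tokens, max_len=4):
    """Enumerate all candidate (start, end) token spans, end exclusive,
    up to max_len tokens long; a downstream scorer labels each span."""
    return [(i, j)
            for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]

spans = enumerate_spans("fever and dry cough".split(), max_len=2)
# 7 candidate spans; (2, 4) covers "dry cough"
```

In a full model, each span would be embedded (e.g., from contextual encoder states) and jointly classified for event type and assertion value.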


Subject(s)
COVID-19/diagnosis; Electronic Health Records; Symptom Assessment; Humans; Information Storage and Retrieval; Natural Language Processing
4.
ArXiv ; 2021 Mar 10.
Article in English | MEDLINE | ID: mdl-33299904

ABSTRACT

Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). In a secondary use application, we explored the prediction of COVID-19 test results using structured patient data (e.g., vital signs and laboratory results) and automatically extracted symptom information. The automatically extracted symptoms improve prediction performance beyond structured data alone.

5.
J Biomed Inform ; 113: 103631, 2021 01.
Article in English | MEDLINE | ID: mdl-33290878

ABSTRACT

Social determinants of health (SDOH) affect health outcomes, and knowledge of SDOH can inform clinical decision-making. Automatically extracting SDOH information from clinical text requires data-driven information extraction models trained on annotated corpora that are heterogeneous and frequently include critical SDOH. This work presents a new corpus with SDOH annotations, a novel active learning framework, and the first extraction results on the new corpus. The Social History Annotation Corpus (SHAC) includes 4480 social history sections with detailed annotation for 12 SDOH characterizing the status, extent, and temporal information of 18K distinct events. We introduce a novel active learning framework that selects samples for annotation using a surrogate text classification task as a proxy for a more complex event extraction task. The active learning framework successfully increases the frequency of health risk factors and improves automatic extraction of these events over undirected annotation. An event extraction model trained on SHAC achieves high extraction performance for substance use status (0.82-0.93 F1), employment status (0.81-0.86 F1), and living status type (0.81-0.93 F1) on data from three institutions.
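The surrogate-task selection idea can be sketched as follows: a cheap text classifier estimates the probability that an unlabeled note mentions the targeted risk factors, and the highest-scoring notes are routed to annotators, raising the frequency of rare events in the annotated pool. The function and scoring rule here are illustrative assumptions, not the paper's exact framework.

```python
import numpy as np

def select_for_annotation(positive_probs, n):
    """Return indices of the n unlabeled notes that the surrogate
    classifier scores as most likely to contain a targeted risk factor."""
    order = np.argsort(-np.asarray(positive_probs))
    return order[:n].tolist()

probs = [0.10, 0.80, 0.55, 0.95, 0.20]   # surrogate P(note mentions risk factor)
picked = select_for_annotation(probs, 2)  # -> [3, 1]
```

The selected notes then receive the expensive event-level annotation, while the surrogate classifier is periodically retrained on the growing labeled set.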


Asunto(s)
Determinantes Sociales de la Salud , Almacenamiento y Recuperación de la Información , Procesamiento de Lenguaje Natural , Factores de Riesgo
6.
Appl Clin Inform ; 9(4): 782-790, 2018 10.
Article in English | MEDLINE | ID: mdl-30332689

ABSTRACT

OBJECTIVE: Clinician progress notes are an important record for care and communication, but there is a perception that electronic notes take too long to write and may not accurately reflect the patient encounter, threatening quality of care. Automatic speech recognition (ASR) has the potential to improve the clinical documentation process; however, ASR inaccuracy and editing time are barriers to wider use. We hypothesized that automatic text processing technologies could decrease editing time and improve note quality. To inform the development of these technologies, we studied how physicians create clinical notes using ASR and analyzed note content that is revised or added during asynchronous editing. MATERIALS AND METHODS: We analyzed a corpus of 649 dictated clinical notes from 9 physicians. Notes were dictated during rounds to portable devices, automatically transcribed, and edited later at the physician's convenience. Comparing ASR transcripts and the final edited notes, we identified the word sequences edited by physicians and categorized the edits by length and content. RESULTS: We found that 40% of the words in the final notes were added by physicians while editing: 6% corresponded to short edits associated with error correction and format changes, and 34% were associated with longer edits. Short error correction edits that affect note accuracy are estimated to be less than 3% of the words in the dictated notes. Longer edits primarily involved insertion of material associated with clinical data or assessment and plans. The longer edits improve note completeness; some could be handled with verbalized commands in dictation. CONCLUSION: Process interventions to reduce ASR documentation burden, whether related to technology or the dictation/editing workflow, should apply a portfolio of solutions to address all categories of required edits. Improved processes could reduce an important barrier to broader use of ASR by clinicians and improve note quality.
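The edit analysis above — aligning the raw ASR transcript against the final signed note and bucketing added words by edit length — can be sketched with a standard sequence alignment. The three-word short/long threshold is an assumption for illustration, not the study's definition.

```python
import difflib

def categorize_edits(asr_words, final_words, short_max=3):
    """Align the ASR transcript with the edited note and count words
    added in the final note, split into short vs. long edits."""
    sm = difflib.SequenceMatcher(a=asr_words, b=final_words)
    short, long_ = 0, 0
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op in ("insert", "replace"):
            n = j2 - j1                   # words added in the final note
            if n <= short_max:
                short += n
            else:
                long_ += n
    return short, long_

asr = "patient stable overnight".split()
final = ("patient remained stable overnight will continue current plan "
         "and monitor labs daily").split()
counts = categorize_edits(asr, final)     # -> (1, 8)
```

Here the one-word insertion ("remained") counts as a short correction-style edit, while the appended plan counts as a long content edit.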


Subject(s)
Electronic Health Records; Physicians; Speech Recognition Software; Humans
7.
AMIA Annu Symp Proc ; 2018: 1395-1404, 2018.
Article in English | MEDLINE | ID: mdl-30815184

ABSTRACT

Substance abuse carries many negative health consequences. Detailed information about patients' substance abuse history is usually captured in free-text clinical notes. Automatic extraction of substance abuse information is vital to assess patients' risk for developing certain diseases and adverse outcomes. We introduce a novel neural architecture to automatically extract substance abuse information. The model, which uses multi-task learning, outperformed previous work and several baselines created using discrete models. The classifier obtained 0.88-0.95 F1 for detecting substance abuse status (current, none, past, unknown) on a withheld test set. Other substance abuse entities (amount, frequency, exposure history, quit history, and type) were also extracted with high performance. Our results demonstrate the feasibility of extracting substance abuse information with little annotated data. Additionally, we used the neural multi-task model to automatically annotate 59.7K notes from a different source. Manual review of a subset of these notes resulted in 0.84-0.89 precision for substance abuse status.
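The multi-task structure — one shared encoder representation feeding separate classification heads for status, type, amount, and so on — can be illustrated schematically. This sketch uses plain numpy in place of the paper's neural architecture; all shapes, task names, and label sets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_task_forward(x, shared_w, heads):
    """One shared projection feeds several task-specific softmax heads,
    so the tasks share most parameters and regularize each other."""
    h = np.tanh(x @ shared_w)             # shared sentence representation
    return {task: softmax(h @ w) for task, w in heads.items()}

x = rng.normal(size=16)                   # toy sentence encoding
shared_w = rng.normal(size=(16, 8))
heads = {
    "status": rng.normal(size=(8, 4)),    # current / none / past / unknown
    "type": rng.normal(size=(8, 3)),      # e.g., alcohol / tobacco / drug
}
out = multi_task_forward(x, shared_w, heads)
```

Training would backpropagate the summed per-task losses through the shared encoder, which is what lets a small annotated set support all of the attribute classifiers at once.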


Asunto(s)
Registros Electrónicos de Salud , Almacenamiento y Recuperación de la Información/métodos , Aprendizaje Automático , Redes Neurales de la Computación , Detección de Abuso de Sustancias/métodos , Trastornos Relacionados con Sustancias/diagnóstico , Algoritmos , Femenino , Humanos , Masculino
8.
AMIA Annu Symp Proc ; 2017: 1186-1195, 2017.
Article in English | MEDLINE | ID: mdl-29854187

ABSTRACT

The use of automatic speech recognition (ASR) to create clinical notes has the potential to reduce costs associated with note creation for electronic medical records, but at current system accuracy levels, post-editing by practitioners is needed to ensure note quality. Aiming to reduce the time required to edit ASR transcripts, this paper investigates novel methods for automatic detection of edit regions within the transcripts, including both putative ASR errors and regions that are targets for cleanup or rephrasing. We create detection models using logistic regression and conditional random field models, exploring a variety of text-based features that consider the structure of clinical notes and exploit the medical context. Different medical text resources are used to improve feature extraction. Experimental results on a large corpus of practitioner-edited clinical notes show that 67% of sentence-level edits and 45% of word-level edits can be detected with a false detection rate of 15%.
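A toy version of the logistic-regression edit detector: n-gram features over transcript windows, with a binary label marking whether the window fell in a practitioner-edited region. The example windows and labels below are invented for illustration; the paper's feature set is much richer (note structure, medical context resources) and also includes CRF models.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical transcript windows, labeled 1 if the practitioner edited
# that region in the final note and 0 otherwise.
windows = [
    "patient seen on rounds", "uh the the patient", "exam was unremarkable",
    "um start um antibiotics", "will follow up tomorrow", "the the plan is",
]
labels = [0, 1, 0, 1, 0, 1]

vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(windows)
clf = LogisticRegression().fit(X, labels)

flags = clf.predict(X)   # 1 = likely edit region
```

Flagged regions could then be highlighted for the practitioner, focusing post-editing effort where it is most likely needed.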


Subject(s)
Electronic Health Records; Speech Recognition Software; Data Accuracy; Logistic Models; Medical Records
10.
J Acoust Soc Am ; 126(3): 1500-10, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19739763

ABSTRACT

Differences in speaking style are associated with more or less spectral variability, as well as different modulation characteristics. The greater variation in some styles (e.g., spontaneous speech and infant-directed speech) poses challenges for recognition but possibly also opportunities for learning more robust models, as evidenced by prior work and motivated by child language acquisition studies. In order to investigate this possibility, this work proposes a new method for characterizing speaking style (the modulation spectrum), examines spontaneous, read, adult-directed, and infant-directed styles in this space, and conducts pilot experiments in style detection and sampling for improved speech recognizer training. Speaking style classification is improved by using the modulation spectrum in combination with standard pitch and energy variation. Speech recognition experiments on a small vocabulary conversational speech recognition task show that sampling methods for training with a small amount of data benefit from the new features.
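The modulation spectrum — the spectrum of the slowly varying frame-level energy envelope — can be computed with a short sketch. Frame and hop sizes here are typical values, not necessarily the paper's configuration, and the synthetic test signal is purely illustrative.

```python
import numpy as np

def modulation_spectrum(signal, sr, frame_len=0.025, hop=0.010):
    """Compute a frame-level RMS energy envelope, then take its
    magnitude spectrum: modulation frequency content in Hz."""
    flen, fhop = int(frame_len * sr), int(hop * sr)
    frames = [signal[i:i + flen] for i in range(0, len(signal) - flen, fhop)]
    envelope = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    envelope -= envelope.mean()                     # drop the DC component
    spec = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(envelope), d=hop)   # modulation freq (Hz)
    return freqs, spec

# A 440 Hz tone amplitude-modulated at 4 Hz should peak near 4 Hz,
# roughly the syllable-rate modulation range of speech.
sr = 8000
t = np.arange(0, 2.0, 1 / sr)
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 440 * t)
freqs, spec = modulation_spectrum(x, sr)
```

Style classification would summarize this spectrum (e.g., band energies) per utterance and combine it with pitch and energy variation features.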


Subject(s)
Pattern Recognition, Physiological; Speech Recognition Software; Speech; Artificial Intelligence; Automation; Databases as Topic; Female; Fourier Analysis; Humans; Interpersonal Relations; Male; Pattern Recognition, Automated; Pilot Projects; Reading; Sound Spectrography; Speech Acoustics; Vocabulary