RESUMO
BACKGROUND: There are limited data on the accuracy of cloud-based speech recognition (SR) open application programming interfaces (APIs) for medical terminology. This study aimed to evaluate the medical term recognition accuracy of current available cloud-based SR open APIs in Korean. METHODS: We analyzed the SR accuracy of currently available cloud-based SR open APIs using real doctor-patient conversation recordings collected from an outpatient clinic at a large tertiary medical center in Korea. For each original and SR transcription, we analyzed the accuracy rate of each cloud-based SR open API (i.e., the number of medical terms in the SR transcription per number of medical terms in the original transcription). RESULTS: A total of 112 doctor-patient conversation recordings were converted with three cloud-based SR open APIs (Naver Clova SR from Naver Corporation; Google Speech-to-Text from Alphabet Inc.; and Amazon Transcribe from Amazon), and each transcription was compared. Naver Clova SR (75.1%) showed the highest accuracy with the recognition of medical terms compared to the other open APIs (Google Speech-to-Text, 50.9%, P < 0.001; Amazon Transcribe, 57.9%, P < 0.001), and Amazon Transcribe demonstrated higher recognition accuracy compared to Google Speech-to-Text (P < 0.001). In the sub-analysis, Naver Clova SR showed the highest accuracy in all areas according to word classes, but the accuracy of words longer than five characters showed no statistical differences (Naver Clova SR, 52.6%; Google Speech-to-Text, 56.3%; Amazon Transcribe, 36.6%). CONCLUSION: Among three current cloud-based SR open APIs, Naver Clova SR which manufactured by Korean company showed highest accuracy of medical terms in Korean, compared to Google Speech-to-Text and Amazon Transcribe. Although limitations are existing in the recognition of medical terminology, there is a lot of rooms for improvement of this promising technology by combining strengths of each SR engines.
Assuntos
Percepção da Fala , Fala , Computação em Nuvem , Comunicação , Humanos , SoftwareRESUMO
Background: The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model's comprehension of clinical context. Furthermore, limited comparative studies have been conducted on BERT models using medical documents from non-English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in literature, this study focused on identifying the most effective BERT model for non-English clinical notes. Objective: In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes. The objective of this study was to identify the BERT model that excels in understanding the context of such documents. Methods: Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performances in 7 fine-tuning tasks. Results: The model performance varied based on the task and token usage. First, BERT-base and BioBERT excelled in tasks using classification ([CLS]) token embeddings, such as document classification. BioBERT achieved the highest F1-score of 89.32. Both BERT-base and BioBERT demonstrated their effectiveness in document pattern recognition, even with limited Korean tokens in the dictionary. Second, M-BERT exhibited a superior performance in reading comprehension tasks, achieving an F1-score of 93.77. Better results were obtained when fewer words were replaced with unknown ([UNK]) tokens. Third, M-BERT excelled in the knowledge inference task in which correct disease names were inferred from 63 candidate disease names in a document with disease names replaced with [MASK] tokens. M-BERT achieved the highest hit@10 score of 95.41. Conclusions: This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. The findings can be used as a reference in clinical and language-based applications.
Assuntos
Algoritmos , Multilinguismo , Processamento de Linguagem Natural , Humanos , República da Coreia , Registros Eletrônicos de Saúde , Idioma , Mineração de Dados/métodosRESUMO
One of the artificial intelligence applications in the biomedical field is knowledge-intensive question-answering. As domain expertise is particularly crucial in this field, we propose a method for efficiently infusing biomedical knowledge into pretrained language models, ultimately targeting biomedical question-answering. Transferring all semantics of a large knowledge graph into the entire model requires too many parameters, increasing computational cost and time. We investigate an efficient approach that leverages adapters to inject Unified Medical Language System knowledge into pretrained language models, and we question the need to use all semantics in the knowledge graph. This study focuses on strategies of partitioning knowledge graph and either discarding or merging some for more efficient pretraining. According to the results of three biomedical question answering finetuning datasets, the adapters pretrained on semantically partitioned group showed more efficient performance in terms of evaluation metrics, required parameters, and time. The results also show that discarding groups with fewer concepts is a better direction for small datasets, and merging these groups is better for large dataset. Furthermore, the metric results show a slight improvement, demonstrating that the adapter methodology is rather insensitive to the group formulation.