Results 1 - 20 of 55
1.
Neural Netw ; 139: 105-117, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33684609

ABSTRACT

Recently, we have witnessed Deep Learning methodologies gaining significant attention for severity-based classification of dysarthric speech. Detecting dysarthria and quantifying its severity are of paramount importance in various real-life applications, such as assessing patients' progression during treatment, which includes adequate planning of their therapy, and improving speech-based interactive systems so that they handle pathologically affected voices automatically. Notably, current speech-powered tools often deal with short-duration speech segments and, consequently, are less efficient in dealing with impaired speech, even when using Convolutional Neural Networks (CNNs). Thus, detecting dysarthria severity level from short speech segments might help improve the performance and applicability of those systems. To achieve this goal, we propose a novel Residual Network (ResNet)-based technique that receives short-duration speech segments as input. Statistically meaningful objective analysis of our experiments, reported on the standard Universal Access corpus, exhibits average improvements of 21.35% in classification accuracy and 22.48% in F1-score over the baseline CNN. For additional comparison, tests with Gaussian Mixture Models and Light CNNs were also performed. Overall, the proposed ResNet approach obtained a classification accuracy of 98.90% and an F1-score of 98.00%, confirming its efficacy and supporting its practical applicability.
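
A minimal PyTorch sketch of the general approach described above: a small residual CNN that maps short, fixed-length mel-spectrogram segments to one of several severity classes. The segment length, feature dimensions, and number of classes are illustrative assumptions, not the authors' exact configuration.

```python
# A small residual CNN for severity classification of short speech segments.
# Shapes and hyperparameters are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)                 # identity shortcut

class SeverityResNet(nn.Module):
    def __init__(self, n_classes=4):              # hypothetical number of severity levels
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1),
                                  nn.BatchNorm2d(16), nn.ReLU())
        self.blocks = nn.Sequential(ResidualBlock(16), ResidualBlock(16))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, n_classes)

    def forward(self, x):                          # x: (batch, 1, n_mels, n_frames)
        h = self.pool(self.blocks(self.stem(x))).flatten(1)
        return self.fc(h)

# Example: a batch of short segments represented as 40-band mel spectrograms.
model = SeverityResNet()
segments = torch.randn(8, 1, 40, 50)               # dummy input standing in for real features
print(model(segments).shape)                       # torch.Size([8, 4])
```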


Subject(s)
Dysarthria/classification , Dysarthria/diagnosis , Neural Networks, Computer , Severity of Illness Index , Speech Recognition Software , Humans , Normal Distribution , Speech/physiology , Speech Recognition Software/standards , Time Factors
3.
Rev. Investig. Innov. Cienc. Salud ; 3(2): 98-118, 2021. ilus
Article in Spanish | LILACS, COLNAL | ID: biblio-1392911

ABSTRACT

Forensic acoustics is a criminalistics discipline that has reached an analytical maturity requiring the voice-analysis expert to specialize in phonetics, sound technologies, speech, voice, language, speech and voice pathologies, and sound-signal processing. When an expert opinion must be produced by a health professional unfamiliar with forensic practice, that professional encounters a lack of protocols, methods, and working procedures that would allow delivery of a technical, valid, and validated report for conducting an interview and the subsequent comparative voice analysis. This highlights the need to develop a methodological route or guide, through physical or electronic academic media, for building this knowledge and disseminating it professionally and scientifically.


Subject(s)
Speech Recognition Software , Voice Recognition , Voice , Voice Quality/physiology , Speech Recognition Software/standards , Dysarthria , Voice Recognition/physiology
4.
Curr Alzheimer Res ; 17(7): 658-666, 2020.
Article in English | MEDLINE | ID: mdl-33032509

ABSTRACT

BACKGROUND: Current conventional cognitive assessments are limited in their efficiency and sensitivity, often relying on a single score such as the total number of correct items. Typically, multiple features of the response go uncaptured. OBJECTIVES: We aim to explore a new set of automatically derived features from the Digit Span (DS) task that address some of the drawbacks in the conventional scoring and are also useful for distinguishing subjects with Mild Cognitive Impairment (MCI) from those with intact cognition. METHODS: Audio-recordings of the DS tests administered to 85 subjects (22 MCI and 63 healthy controls, mean age 90.2 years) were transcribed using an Automatic Speech Recognition (ASR) system. Next, five correctness measures were generated from Levenshtein distance analysis of responses: number correct, incorrect, deleted, inserted, and substituted words compared to the test item. These per-item features were aggregated across all test items for both Forward Digit Span (FDS) and Backward Digit Span (BDS) tasks using summary statistical functions, constructing a global feature vector representing the detailed assessment of each subject's response. A support vector machine classifier distinguished MCI from cognitively intact participants. RESULTS: Conventional DS scores did not differentiate MCI participants from controls. The automated multi-feature DS-derived metric achieved an AUC-ROC of 73% with the SVM classifier, independent of additional clinical features (77% when combined with subjects' demographic features), well above the 50% chance level. CONCLUSION: Our analysis verifies the effectiveness of the introduced measures, derived solely from the DS task, in differentiating subjects with MCI from those with intact cognition.
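
A hedged Python sketch of the scoring idea described above: align each transcribed response to the target digit sequence with word-level Levenshtein alignment, count correct, substituted, deleted, and inserted items, aggregate those counts across items, and classify the aggregated vector with an SVM. The aggregation scheme, dummy data, and classifier settings are placeholders, not the study's exact pipeline.

```python
# Word-level Levenshtein alignment between a target digit sequence and an
# ASR-transcribed response, yielding per-item correctness counts.
# Aggregation and classifier settings below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def alignment_counts(target, response):
    """Return (correct, substituted, deleted, inserted) token counts."""
    n, m = len(target), len(response)
    d = np.zeros((n + 1, m + 1), dtype=int)
    d[:, 0] = np.arange(n + 1)
    d[0, :] = np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == response[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # match / substitution
    correct = sub = dele = ins = 0
    i, j = n, m
    while i > 0 or j > 0:                          # backtrace the optimal alignment
        if i > 0 and j > 0 and d[i, j] == d[i - 1, j - 1] + (target[i - 1] != response[j - 1]):
            correct += target[i - 1] == response[j - 1]
            sub += target[i - 1] != response[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i, j] == d[i - 1, j] + 1:
            dele, i = dele + 1, i - 1
        else:
            ins, j = ins + 1, j - 1
    return correct, sub, dele, ins

target = "3 8 5 2".split()
response = "3 5 2 9".split()                       # hypothetical transcript of the response
print(alignment_counts(target, response))         # (3, 0, 1, 1): one digit deleted, one inserted

# Aggregated per-subject vectors could then be classified, for example:
X = np.random.rand(20, 8)                          # dummy aggregated features (placeholder data)
y = np.random.randint(0, 2, 20)                    # 0 = control, 1 = MCI (placeholder labels)
clf = SVC(kernel="rbf").fit(X, y)
```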


Subject(s)
Cognitive Dysfunction/diagnosis , Cognitive Dysfunction/psychology , Diagnosis, Computer-Assisted/methods , Neuropsychological Tests , Proof of Concept Study , Speech Recognition Software , Aged , Aged, 80 and over , Cognitive Dysfunction/physiopathology , Diagnosis, Computer-Assisted/standards , Diagnosis, Differential , Female , Humans , Male , Neuropsychological Tests/standards , Speech Recognition Software/standards , Tape Recording/methods , Tape Recording/standards
5.
J Med Internet Res ; 22(6): e14827, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32442129

ABSTRACT

BACKGROUND: Recent advances in natural language processing and artificial intelligence have led to widespread adoption of speech recognition technologies. In consumer health applications, speech recognition is usually applied to support interactions with conversational agents for data collection, decision support, and patient monitoring. However, little is known about the use of speech recognition in consumer health applications, and few studies have evaluated the efficacy of conversational agents in the hands of consumers. In other consumer-facing tools, cognitive load has been observed to be an important factor affecting the use of speech recognition technologies in tasks involving problem solving and recall. Users find it more difficult to think and speak at the same time than to type, point, and click. However, the effects of speech recognition on cognitive load when performing health tasks have not yet been explored. OBJECTIVE: The aim of this study was to evaluate the use of speech recognition for documentation in consumer digital health tasks involving problem solving and recall. METHODS: Fifty university staff and students were recruited to undertake four documentation tasks with a simulated conversational agent in a computer laboratory. The tasks varied in complexity (simple vs complex), determined by the amount of problem solving and recall required, and in input modality (speech recognition vs keyboard and mouse). Cognitive load, task completion time, error rate, and usability were measured. RESULTS: Compared with using a keyboard and mouse, speech recognition significantly increased the cognitive load for complex tasks (Z=-4.08, P<.001) and simple tasks (Z=-2.24, P=.03). Complex tasks took significantly longer to complete (Z=-2.52, P=.01), and speech recognition was found to be less usable overall than a keyboard and mouse (Z=-3.30, P=.001). However, there was no effect on errors. CONCLUSIONS: Use of a keyboard and mouse was preferable to speech recognition for complex tasks involving problem solving and recall. Further studies using a broader variety of consumer digital health tasks of varying complexity are needed to investigate the contexts in which use of speech recognition is most appropriate. The effects of cognitive load on task performance and its significance also need to be investigated.
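
The within-subject contrasts above are reported as Z statistics, consistent with paired nonparametric tests such as the Wilcoxon signed-rank test. A minimal sketch of that kind of analysis, using made-up cognitive-load ratings in place of the study data:

```python
# Paired comparison of cognitive-load ratings under two input modalities,
# analogous to the Z statistics reported above. The data below are placeholders.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
load_speech = rng.normal(6.0, 1.0, 50)                     # hypothetical ratings, arbitrary scale
load_keyboard = load_speech - rng.normal(0.8, 0.5, 50)     # keyboard assumed to impose lower load

res = wilcoxon(load_speech, load_keyboard)                 # paired, nonparametric
print(f"Wilcoxon statistic = {res.statistic:.1f}, p = {res.pvalue:.4f}")
```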


Subject(s)
Consumer Health Informatics/methods , Laboratories/standards , Problem Solving/physiology , Speech Recognition Software/standards , Adolescent , Adult , Female , Humans , Male , Middle Aged , Young Adult
6.
J Acoust Soc Am ; 146(3): 1615, 2019 09.
Article in English | MEDLINE | ID: mdl-31590492

ABSTRACT

Speech (syllable) rate estimation typically involves computing a feature contour based on sub-band energies having strong local maxima/peaks at syllable nuclei, which are detected with the help of voicing decisions (VDs). While such a two-stage scheme works well in clean conditions, the estimated speech rate becomes less accurate in noisy conditions, particularly due to erroneous VDs and non-informative sub-bands, mainly at low signal-to-noise ratios (SNRs). This work proposes a technique to use VDs in the peak detection strategy in an SNR-dependent manner. It also proposes a data-driven sub-band pruning technique to improve syllabic peaks of the feature contour in the presence of noise. Further, this paper generalizes both the peak detection and the sub-band pruning technique for unknown noise and/or unknown SNR conditions. Experiments are performed in clean and 20, 10, and 0 dB SNR conditions separately using the Switchboard, TIMIT, and CTIMIT corpora under five additive noises: white, car, high-frequency-channel, cockpit, and babble. Experiments are also carried out in test conditions at unseen SNRs of -5 and 5 dB with four unseen additive noises: factory, subway, street, and exhibition. The proposed method outperforms the best of the existing techniques in clean and noisy conditions for all three corpora.
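
A deliberately simplified, single-band sketch of the two-stage scheme summarized above: compute a smoothed energy contour, pick its peaks as syllable nuclei, and divide the peak count by the signal duration. The actual method additionally uses multiple sub-bands, voicing decisions, SNR-dependent peak detection, and data-driven sub-band pruning, none of which is reproduced here; all thresholds below are illustrative.

```python
# Crude syllable-rate estimate from a smoothed frame-energy envelope.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def estimate_speech_rate(signal, fs):
    frame = int(0.01 * fs)                                # 10 ms frames -> 100 frames/s
    n_frames = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    b, a = butter(2, 4 / (0.5 * 100))                     # ~4 Hz low-pass: syllables are slow events
    contour = filtfilt(b, a, energy)
    peaks, _ = find_peaks(contour, height=0.1 * contour.max(),
                          distance=10)                    # >= 100 ms between nuclei
    return len(peaks) / (len(signal) / fs)                # syllables per second

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
toy = np.sin(2 * np.pi * 200 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))  # ~4 "syllables"/s
print(f"estimated rate: {estimate_speech_rate(toy, fs):.1f} syllables/s")
```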


Subject(s)
Speech Recognition Software/standards , Signal-To-Noise Ratio , Speech Acoustics , Voice
7.
J Speech Lang Hear Res ; 62(7): 2203-2212, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31200617

ABSTRACT

Purpose The application of Chinese Mandarin electrolaryngeal (EL) speech for laryngectomees has been limited by drawbacks such as its single fundamental frequency, mechanical sound, and large radiation noise. To improve the intelligibility of Chinese Mandarin EL speech, a new perspective using an automatic speech recognition (ASR) system was proposed, which can convert EL speech into healthy speech when combined with text-to-speech. Method An ASR system was designed to recognize EL speech based on a deep learning model, WaveNet, and the connectionist temporal classification (WaveNet-CTC). This system mainly consists of 3 parts: the acoustic model, the language model, and the decoding model. The acoustic features are extracted during speech preprocessing, and 3,230 utterances of EL speech mixed with 10,000 utterances of healthy speech are used to train the ASR system. A comparative experiment was designed to evaluate the performance of the proposed method. Results The results show that the proposed ASR system has higher stability and generalizability compared with the traditional methods, manifesting superiority in terms of Chinese characters, Chinese words, short sentences, and long sentences. Phoneme confusion occurs more easily for stops and affricates in EL speech than in healthy speech. However, the highest accuracy of the ASR reached 83.24% when 3,230 utterances of EL speech were used to train the system. Conclusions This study indicates that EL speech can be recognized effectively by an ASR based on WaveNet-CTC. The proposed method has higher generalization performance and better stability than the traditional methods, and the higher accuracy of the WaveNet-CTC-based ASR system means that EL speech can be converted into healthy speech. Supplemental Material https://doi.org/10.23641/asha.8250830.
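
A minimal PyTorch sketch of the CTC component of such a recognizer: an acoustic model emits per-frame label posteriors that are trained against unaligned label sequences with the CTC loss. A toy recurrent network stands in for WaveNet, and the vocabulary size, feature dimensions, and sequence lengths are arbitrary placeholders.

```python
# Connectionist temporal classification (CTC) training step in miniature.
# A toy GRU stands in for the WaveNet acoustic model used in the study.
import torch
import torch.nn as nn

vocab_size = 30                                     # placeholder label inventory + blank (index 0)
model = nn.GRU(input_size=40, hidden_size=64, batch_first=True)
proj = nn.Linear(64, vocab_size)
ctc = nn.CTCLoss(blank=0)

features = torch.randn(4, 120, 40)                  # (batch, frames, mel bins), dummy data
targets = torch.randint(1, vocab_size, (4, 12))     # unaligned label sequences
input_lengths = torch.full((4,), 120, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)

hidden, _ = model(features)
log_probs = proj(hidden).log_softmax(dim=-1)        # (batch, frames, vocab)
loss = ctc(log_probs.transpose(0, 1),               # CTCLoss expects (frames, batch, vocab)
           targets, input_lengths, target_lengths)
loss.backward()
print(float(loss))
```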


Subject(s)
Speech Intelligibility/physiology , Speech Recognition Software/standards , Speech, Alaryngeal , China , Deep Learning , Humans , Larynx, Artificial , Models, Theoretical , Phonetics , Speech Acoustics
8.
J Acoust Soc Am ; 145(3): 1493, 2019 03.
Article in English | MEDLINE | ID: mdl-31067946

ABSTRACT

The effects on speech intelligibility and sound quality of two noise-reduction algorithms were compared: a deep recurrent neural network (RNN) and spectral subtraction (SS). The RNN was trained using sentences spoken by a large number of talkers with a variety of accents, presented in babble. Different talkers were used for testing. Participants with mild-to-moderate hearing loss were tested. Stimuli were given frequency-dependent linear amplification to compensate for the individual hearing losses. A paired-comparison procedure was used to compare all possible combinations of three conditions. The conditions were: speech in babble with no processing (NP) or processed using the RNN or SS. In each trial, the same sentence was played twice using two different conditions. The participants indicated which one was better and by how much in terms of speech intelligibility and (in separate blocks) sound quality. Processing using the RNN was significantly preferred over NP and over SS processing for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. SS processing was not significantly preferred over NP for either subjective intelligibility or sound quality. Objective computational measures of speech intelligibility predicted better intelligibility for RNN than for SS or NP.
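
A rough sketch of the spectral-subtraction (SS) baseline referred to above: estimate the noise magnitude spectrum from a speech-free lead-in, subtract a scaled version of it from each frame, and resynthesize with the noisy phase. The over-subtraction factor, spectral floor, and frame settings are illustrative choices, not those of the study.

```python
# Basic magnitude spectral subtraction, the classical baseline compared above.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_seconds=0.3, alpha=2.0, floor=0.05):
    f, t, X = stft(noisy, fs, nperseg=512)                 # default hop = 256 samples
    mag, phase = np.abs(X), np.angle(X)
    noise_frames = int(noise_seconds * fs / 256)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)   # noise estimate from lead-in
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)    # over-subtract, keep a floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs, nperseg=512)
    return enhanced

fs = 16000
rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)       # toy "speech" tone
noisy = np.concatenate([np.zeros(int(0.3 * fs)), speech]) + 0.1 * rng.standard_normal(int(1.3 * fs))
enhanced = spectral_subtraction(noisy, fs)
```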


Subject(s)
Speech Intelligibility , Speech Recognition Software/standards , Aged , Female , Hearing Aids/standards , Humans , Male , Middle Aged , Neural Networks, Computer , Speech Perception
9.
J Acoust Soc Am ; 145(3): 1640, 2019 03.
Article in English | MEDLINE | ID: mdl-31067961

ABSTRACT

Hearing impaired persons, and particularly hearing-aid and cochlear implant (CI) users, often have difficulties communicating over the telephone. The intelligibility of classical so-called narrowband telephone speech is considerably lower than the intelligibility of face-to-face speech. This is partly because of the lack of visual cues, limited telephone bandwidth, and background noise. This work proposes to artificially extend the standard bandwidth of telephone speech to improve its intelligibility for CI users. Artificial speech bandwidth extension (ABE) is obtained through a front-end signal processing algorithm that estimates missing speech components in the high-frequency spectrum from learned data. A state-of-the-art ABE approach, which already led to superior speech quality for people with normal hearing, is used for processing telephone speech for CI users. Two different parameterizations are evaluated, one being more aggressive than the other. Nine CI users were tested with and without the proposed ABE algorithm. The experimental evaluation shows a significant improvement in speech intelligibility and speech quality over the phone for both versions of the ABE algorithm. These promising results support the potential of ABE, which could be incorporated into a commercial speech processor or a smartphone-based pre-processor that streams the telephone speech to the CI.


Subject(s)
Cochlear Implants/standards , Speech Acoustics , Speech Intelligibility , Speech Recognition Software/standards , Telephone , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged
10.
J Acoust Soc Am ; 145(1): 338, 2019 01.
Article in English | MEDLINE | ID: mdl-30710939

ABSTRACT

This paper describes vision-referential speech enhancement of an audio signal using mask information captured as visual data. Smartphones and tablet devices have become popular in recent years, and most of them have not only a microphone but also a camera. Although the frame rate of the camera in such devices is very low compared with the sampling rate of the microphone's audio signal, using both signals adequately can help enhance the speech signal. In the proposed method, the speaker broadcasts not only his/her speech signal through a loudspeaker but also its mask information through a display. The receiver can enhance the speech by combining the speech signal captured by the microphone with the reference signal captured by the camera. Experiments were conducted to evaluate the effectiveness of the proposed method compared with a typical sparse approach. It was confirmed that speech could be enhanced even in environments with different kinds of noise and high levels of real noise. Experiments were also conducted to check the sound quality of the proposed method, comparing it with clean audio compressed in MP3 format at various bit rates. The sound quality was sufficient for practical application.


Subject(s)
Image Processing, Computer-Assisted/methods , Natural Language Processing , Speech Recognition Software/standards , Adult , Female , Humans , Image Processing, Computer-Assisted/standards , Male , Signal-To-Noise Ratio , Speech Perception
11.
J Acoust Soc Am ; 145(1): 131, 2019 01.
Article in English | MEDLINE | ID: mdl-30710945

ABSTRACT

The automatic analysis of conversational audio remains difficult, in part, due to the presence of multiple talkers speaking in turns, often with significant intonation variations and overlapping speech. The majority of prior work on psychoacoustic speech analysis and system design has focused on single-talker speech or multi-talker speech with overlapping talkers (for example, the cocktail party effect). There has been much less focus on how listeners detect a change in talker or in probing the acoustic features significant in characterizing a talker's voice in conversational speech. This study examines human talker change detection (TCD) in multi-party speech utterances using a behavioral paradigm in which listeners indicate the moment of perceived talker change. Human reaction times in this task can be well-estimated by a model of the acoustic feature distance among speech segments before and after a change in talker, with estimation improving for models incorporating longer durations of speech prior to a talker change. Further, human performance is superior to several online and offline state-of-the-art machine TCD systems.
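
A hedged sketch of the acoustic-distance idea described above: summarize MFCCs in a window before and after each frame and score the frame by the distance between the two summaries, so that peaks suggest a talker change. This is a generic baseline, not the authors' model of listener reaction times; the window length and toy signals are arbitrary assumptions.

```python
# Sliding-window talker-change score: distance between the mean MFCC vectors
# of the audio just before and just after each frame.
import numpy as np
import librosa

def change_scores(y, sr, context_s=1.0):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # (frames, 13)
    hop_s = 512 / sr                                         # librosa's default hop length
    w = max(1, int(context_s / hop_s))                       # frames per context window
    scores = np.zeros(len(mfcc))
    for i in range(w, len(mfcc) - w):
        before = mfcc[i - w:i].mean(axis=0)
        after = mfcc[i:i + w].mean(axis=0)
        scores[i] = np.linalg.norm(before - after)           # peaks suggest a talker change
    return scores

sr = 16000
rng = np.random.default_rng(0)
talker_a = 0.1 * rng.standard_normal(2 * sr)                                   # toy "talker" 1
talker_b = np.convolve(rng.standard_normal(2 * sr), np.ones(8) / 8, mode="same")  # duller spectrum
scores = change_scores(np.concatenate([talker_a, talker_b]), sr)
print(int(np.argmax(scores)))    # frame index of the strongest candidate, expected near the splice
```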


Subject(s)
Natural Language Processing , Speech Perception , Adult , Female , Humans , Male , Psychoacoustics , Speech Intelligibility , Speech Recognition Software/standards
12.
IEEE Trans Neural Netw Learn Syst ; 30(1): 138-150, 2019 01.
Article in English | MEDLINE | ID: mdl-29993561

ABSTRACT

Inspired by the behavior of humans talking in noisy environments, we propose an embodied embedded cognition approach to improve automatic speech recognition (ASR) systems for robots in challenging environments, such as with ego noise, using binaural sound source localization (SSL). The approach is verified by measuring the impact of SSL with a humanoid robot head on the performance of an ASR system. More specifically, a robot orients itself toward the angle where the signal-to-noise ratio (SNR) of speech is maximized for one microphone before doing an ASR task. First, a spiking neural network inspired by the midbrain auditory system based on our previous work is applied to calculate the sound signal angle. Then, a feedforward neural network is used to handle high levels of ego noise and reverberation in the signal. Finally, the sound signal is fed into an ASR system. For ASR, we use a system developed by our group and compare its performance with and without the support from SSL. We test our SSL and ASR systems on two humanoid platforms with different structural and material properties. With our approach we halve the sentence error rate with respect to the common downmixing of both channels. Surprisingly, the ASR performance is more than two times better when the angle between the humanoid head and the sound source allows sound waves to be reflected most intensely from the pinna to the ear microphone, rather than when sound waves arrive perpendicularly to the membrane.
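
For context, a classical signal-processing baseline for the localization step: GCC-PHAT estimates the inter-microphone time delay, from which an azimuth follows from the microphone spacing and the speed of sound. The study itself uses a spiking neural network inspired by the midbrain auditory system, which this sketch does not attempt to reproduce; the microphone spacing and toy signals are arbitrary assumptions.

```python
# GCC-PHAT time-delay estimation between two microphones, converted to an azimuth.
# A classical binaural SSL baseline, not the study's spiking-network approach.
import numpy as np

def gcc_phat(sig_l, sig_r, fs, mic_distance=0.15, c=343.0):
    n = 2 * max(len(sig_l), len(sig_r))
    L, R = np.fft.rfft(sig_l, n), np.fft.rfft(sig_r, n)
    cross = L * np.conj(R)
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)    # PHAT weighting
    max_shift = int(fs * mic_distance / c)                   # physically possible delays only
    cc = np.concatenate([cc[-max_shift:], cc[:max_shift + 1]])
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs           # delay in seconds
    return np.degrees(np.arcsin(np.clip(tau * c / mic_distance, -1, 1)))

fs = 16000
rng = np.random.default_rng(2)
src = rng.standard_normal(fs)
delay = 3                                                    # samples: toy off-axis source
left, right = src, np.concatenate([np.zeros(delay), src[:-delay]])
print(f"estimated azimuth: {gcc_phat(left, right, fs):.1f} degrees")  # off-axis angle for the 3-sample delay
```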


Subject(s)
Biomimetics/methods , Robotics/methods , Sound Localization , Speech Perception , Speech Recognition Software , Biomimetics/standards , Humans , Robotics/standards , Sound Localization/physiology , Speech Perception/physiology , Speech Recognition Software/standards , Virtual Reality
13.
Int J Med Inform ; 121: 39-52, 2019 01.
Article in English | MEDLINE | ID: mdl-30545488

ABSTRACT

The overall purpose of automatic speech recognition systems is to make possible the interaction between humans and electronic devices through speech. For example, content captured from a user's speech using a microphone can be transcribed into text. In general, such systems should be able to overcome adversities such as noise, communication-channel variability, speaker age and accent, speech speed, concurrent speech from other speakers, and spontaneous speech. Against this challenging background, this study aims to develop a Web System Prototype to generate medical reports through automatic speech recognition in Brazilian Portuguese. The prototype was developed by applying a Software Engineering technique named Delivery in Stage. During the conduction of this technique, we integrated the Google Web Speech API and the Microsoft Bing Speech API into the prototype to increase the number of compatible platforms. These automatic speech recognition systems were individually evaluated in the task of transcribing a medical-domain text dictated by 30 volunteers. Recognition performance was evaluated according to the Word Error Rate measure. The Google system achieved an error rate of 12.30%, which was statistically significantly better (p-value <0.0001) than the Microsoft system's 17.68%. This work allowed us to conclude that these automatic speech recognition systems are compatible with the prototype and can be used in the medical field. The findings also suggest that, besides supporting medical report construction, the Web System Prototype can be useful for purposes such as recording physicians' notes during a clinical procedure.
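
Word Error Rate, the measure used above, is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. A small sketch using the jiwer package (an assumption; any edit-distance implementation works), with hypothetical Portuguese sentences rather than the study's data:

```python
# Word Error Rate comparison of two recognizer outputs against a reference transcript.
# jiwer is assumed available; the sentences are hypothetical, not the study's data.
import jiwer

reference = "paciente apresenta dor abdominal aguda ha dois dias"
hyp_system_a = "paciente apresenta dor abdominal aguda a dois dias"      # one substitution
hyp_system_b = "paciente apresenta a dor abominavel aguda ha dois dia"   # insertion + substitutions

for name, hyp in [("system A", hyp_system_a), ("system B", hyp_system_b)]:
    print(name, f"WER = {jiwer.wer(reference, hyp):.1%}")
```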


Subject(s)
Documentation/methods , Internet/statistics & numerical data , Medical Errors/prevention & control , Medical Records Systems, Computerized/standards , Software , Speech Recognition Software/standards , Speech/physiology , Adult , Brazil , Female , Humans , Male , Middle Aged , Young Adult
14.
JAMA Netw Open ; 1(3): e180530, 2018 07.
Article in English | MEDLINE | ID: mdl-30370424

ABSTRACT

IMPORTANCE: Accurate clinical documentation is critical to health care quality and safety. Dictation services supported by speech recognition (SR) technology and professional medical transcriptionists are widely used by US clinicians. However, the quality of SR-assisted documentation has not been thoroughly studied. OBJECTIVE: To identify and analyze errors at each stage of the SR-assisted dictation process. DESIGN SETTING AND PARTICIPANTS: This cross-sectional study collected a stratified random sample of 217 notes (83 office notes, 75 discharge summaries, and 59 operative notes) dictated by 144 physicians between January 1 and December 31, 2016, at 2 health care organizations using Dragon Medical 360 | eScription (Nuance). Errors were annotated in the SR engine-generated document (SR), the medical transcriptionist-edited document (MT), and the physician's signed note (SN). Each document was compared with a criterion standard created from the original audio recordings and medical record review. MAIN OUTCOMES AND MEASURES: Error rate; mean errors per document; error frequency by general type (eg, deletion), semantic type (eg, medication), and clinical significance; and variations by physician characteristics, note type, and institution. RESULTS: Among the 217 notes, there were 144 unique dictating physicians: 44 female (30.6%) and 10 unknown sex (6.9%). Mean (SD) physician age was 52 (12.5) years (median [range] age, 54 [28-80] years). Among 121 physicians for whom specialty information was available (84.0%), 35 specialties were represented, including 45 surgeons (37.2%), 30 internists (24.8%), and 46 others (38.0%). The error rate in SR notes was 7.4% (ie, 7.4 errors per 100 words). It decreased to 0.4% after transcriptionist review and 0.3% in SNs. Overall, 96.3% of SR notes, 58.1% of MT notes, and 42.4% of SNs contained errors. Deletions were most common (34.7%), then insertions (27.0%). Among errors at the SR, MT, and SN stages, 15.8%, 26.9%, and 25.9%, respectively, involved clinical information, and 5.7%, 8.9%, and 6.4%, respectively, were clinically significant. Discharge summaries had higher mean SR error rates than other types (8.9% vs 6.6%; difference, 2.3%; 95% CI, 1.0%-3.6%; P < .001). Surgeons' SR notes had lower mean error rates than other physicians' (6.0% vs 8.1%; difference, 2.2%; 95% CI, 0.8%-3.5%; P = .002). One institution had a higher mean SR error rate (7.6% vs 6.6%; difference, 1.0%; 95% CI, -0.2% to 2.8%; P = .10) but lower mean MT and SN error rates (0.3% vs 0.7%; difference, -0.3%; 95% CI, -0.63% to -0.04%; P = .03 and 0.2% vs 0.6%; difference, -0.4%; 95% CI, -0.7% to -0.2%; P = .003). CONCLUSIONS AND RELEVANCE: Seven in 100 words in SR-generated documents contain errors; many errors involve clinical information. That most errors are corrected before notes are signed demonstrates the importance of manual review, quality assurance, and auditing.


Subject(s)
Medical Errors/statistics & numerical data , Medical Records/statistics & numerical data , Medical Records/standards , Speech Recognition Software/statistics & numerical data , Speech Recognition Software/standards , Adult , Aged , Aged, 80 and over , Boston , Clinical Audit , Colorado , Cross-Sectional Studies , Female , Humans , Male , Medical Records Systems, Computerized , Middle Aged , Physicians
15.
J Digit Imaging ; 31(5): 615-621, 2018 10.
Article in English | MEDLINE | ID: mdl-29713836

ABSTRACT

The aim of this study was to retrospectively analyze the influence of different acoustic and language models in order to determine the most important effects on the clinical performance of an Estonian-language, non-commercial, radiology-oriented automatic speech recognition (ASR) system. An ASR system was developed for the Estonian language in the radiology domain by utilizing open-source software components (Kaldi toolkit, Thrax). The ASR system was trained with real radiology text reports and dictations collected during the development phases. The final version of the ASR system was tested by 11 radiologists who dictated 219 reports in total, in a spontaneous manner in a real clinical environment. The audio files collected in the final phase were used to measure the performance of different versions of the ASR system retrospectively. ASR system versions were evaluated by word error rate (WER) for each speaker and modality, and by the WER difference between the first and the last version of the ASR system. The total average WER across all material improved from 18.4% for the first version (v1) to 5.8% for the last version (v8), which corresponds to a relative improvement of 68.5%. WER improvement was strongly related to modality and radiologist. In summary, the performance of the final ASR system version was close to optimal, delivering similar results across all modalities and being independent of the user, the complexity of the radiology reports, user experience, and speech characteristics.


Subject(s)
Language , Radiology , Speech Recognition Software/standards , Estonia , Humans , Reproducibility of Results , Retrospective Studies
16.
Behav Res Methods ; 50(6): 2597-2605, 2018 12.
Article in English | MEDLINE | ID: mdl-29687235

ABSTRACT

Verbal responses are a convenient and naturalistic way for participants to provide data in psychological experiments (Salzinger, The Journal of General Psychology, 61(1), 65-94, 1959). However, audio recordings of verbal responses typically require additional processing, such as transcribing the recordings into text, compared with other behavioral response modalities (e.g., typed responses, button presses, etc.). Further, the transcription process is often tedious and time-intensive, requiring human listeners to manually examine each moment of recorded speech. Here we evaluate the performance of a state-of-the-art speech recognition algorithm (Halpern et al., 2016) in transcribing audio data into text during a list-learning experiment. We compare transcripts made by human annotators to the computer-generated transcripts. Both sets of transcripts matched to a high degree and exhibited similar statistical properties in terms of the participants' recall performance and the recall dynamics that the transcripts captured. This proof-of-concept study suggests that speech-to-text engines could provide a cheap, reliable, and rapid means of automatically transcribing speech data in psychological experiments. Further, our findings open the door for verbal-response experiments that scale to thousands of participants (e.g., administered online), as well as a new generation of experiments that decode speech on the fly and adapt experimental parameters based on participants' prior responses.


Subject(s)
Behavioral Research/methods , Behavioral Research/standards , Mental Recall , Speech Recognition Software/standards , Speech , Adolescent , Female , Humans , Male , Young Adult
17.
J Med Syst ; 42(5): 89, 2018 Apr 03.
Article in English | MEDLINE | ID: mdl-29610981

ABSTRACT

Speech recognition is increasingly used in medical reporting. The aim of this article is to identify in the literature the strengths and weaknesses of this technology, as well as barriers to and facilitators of its implementation. A systematic review of systematic reviews was performed using PubMed, Scopus, the Cochrane Library and the Center for Reviews and Dissemination through August 2017. The gray literature has also been consulted. The quality of systematic reviews has been assessed with the AMSTAR checklist. The main inclusion criterion was use of speech recognition for medical reporting (front-end or back-end). A survey has also been conducted in Quebec, Canada, to identify the dissemination of this technology in this province, as well as the factors leading to the success or failure of its implementation. Five systematic reviews were identified. These reviews indicated a high level of heterogeneity across studies. The quality of the studies reported was generally poor. Speech recognition is not as accurate as human transcription, but it can dramatically reduce turnaround times for reporting. In front-end use, medical doctors need to spend more time on dictation and correction than required with human transcription. With speech recognition, major errors occur up to three times more frequently. In back-end use, a potential increase in productivity of transcriptionists was noted. In conclusion, speech recognition offers several advantages for medical reporting. However, these advantages are countered by an increased burden on medical doctors and by risks of additional errors in medical reports. It is also hard to identify for which medical specialties and which clinical activities the use of speech recognition will be the most beneficial.


Subject(s)
Medical Records/standards , Speech Recognition Software/standards , Humans , Quebec
19.
Health Informatics J ; 23(1): 3-13, 2017 03.
Article in English | MEDLINE | ID: mdl-26635322

ABSTRACT

Speech recognition software can increase the frequency of errors in radiology reports, which may affect patient care. We retrieved 213,977 speech recognition software-generated reports from 147 different radiologists and proofread them for errors. Errors were classified as "material" if they were believed to alter interpretation of the report. "Immaterial" errors were subclassified as intrusion/omission or spelling errors. The proportion of errors and error type were compared among individual radiologists, imaging subspecialty, and time periods. In all, 20,759 reports (9.7%) contained errors, of which 3992 (1.9%) were material errors. Among immaterial errors, spelling errors were more common than intrusion/omission errors (p < .001). Proportion of errors and fraction of material errors varied significantly among radiologists and between imaging subspecialties (p < .001). Errors were more common in cross-sectional reports, reports reinterpreting results of outside examinations, and procedural studies (all p < .001). Error rate decreased over time (p < .001), which suggests that a quality control program with regular feedback may reduce errors.


Subject(s)
Radiology Information Systems/standards , Research Design/statistics & numerical data , Research Report/standards , Semantics , Speech Recognition Software/standards , Cross-Sectional Studies , Documentation/methods , Documentation/standards , Documentation/statistics & numerical data , Humans , Radiologists/standards , Radiologists/statistics & numerical data , Radiology Information Systems/statistics & numerical data , Retrospective Studies , Speech Recognition Software/statistics & numerical data
20.
BMC Med Inform Decis Mak ; 16(1): 132, 2016 10 18.
Article in English | MEDLINE | ID: mdl-27756284

ABSTRACT

BACKGROUND: Speech recognition software might increase productivity in clinical documentation. However, low user satisfaction with speech recognition software has been observed. In this case study, an approach for implementing a speech recognition software package at a university-based outpatient department is presented. METHODS: Methods to create a specific dictionary for the context "sports medicine" and a shared vocabulary learning function are demonstrated. The approach is evaluated for user satisfaction (using a questionnaire before and 10 weeks after software implementation) and its impact on the time until the final medical document was saved into the system. RESULTS: As a result of implementing speech recognition software, the user satisfaction was not remarkably impaired. The median time until the final medical document was saved was reduced from 8 to 4 days. CONCLUSION: In summary, this case study illustrates how speech recognition can be implemented successfully when the user experience is emphasised.


Subject(s)
Hospital Departments/methods , Medical Informatics Applications , Outpatients , Speech Recognition Software/standards , Humans