Pesquisa | BVS Integralidade em Saúde

1.

Cognitive Determinants of Dysarthria in Parkinson's Disease: An Automated Machine Learning Approach.

García, Adolfo M; Arias-Vergara, Tomás; C Vasquez-Correa, Juan; Nöth, Elmar; Schuster, Maria; Welch, Ariane E; Bocanegra, Yamile; Baena, Ana; Orozco-Arroyave, Juan R.

Mov Disord ; 36(12): 2862-2873, 2021 12.

Artigo em Inglês | MEDLINE | ID: mdl-34390508

RESUMO

BACKGROUND: Dysarthric symptoms in Parkinson's disease (PD) vary greatly across cohorts. Abundant research suggests that such heterogeneity could reflect subject-level and task-related cognitive factors. However, the interplay of these variables during motor speech remains underexplored, let alone by administering validated materials to carefully matched samples with varying cognitive profiles and combining automated tools with machine learning methods. OBJECTIVE: We aimed to identify which speech dimensions best identify patients with PD in cognitively heterogeneous, cognitively preserved, and cognitively impaired groups through tasks with low (reading) and high (retelling) processing demands. METHODS: We used support vector machines to analyze prosodic, articulatory, and phonemic identifiability features. Patient groups were compared with healthy control subjects and against each other in both tasks, using each measure separately and in combination. RESULTS: Relative to control subjects, patients in cognitively heterogeneous and cognitively preserved groups were best discriminated by combined dysarthric signs during reading (accuracy = 84% and 80.2%). Conversely, patients with cognitive impairment were maximally discriminated from control subjects when considering phonemic identifiability during retelling (accuracy = 86.9%). This same pattern maximally distinguished between cognitively spared and impaired patients (accuracy = 72.1%). Also, cognitive (executive) symptom severity was predicted by prosody in cognitively preserved patients and by phonemic identifiability in cognitively heterogeneous and impaired groups. No measure predicted overall motor dysfunction in any group. CONCLUSIONS: Predominant dysarthric symptoms appear to be best captured through undemanding tasks in cognitively heterogeneous and preserved cohorts and through cognitively loaded tasks in patients with cognitive impairment. Further applications of this framework could enhance dysarthria assessments in PD. © 2021 International Parkinson and Movement Disorder Society.

Assuntos

Disfunção Cognitiva , Doença de Parkinson , Cognição , Disartria/diagnóstico , Disartria/etiologia , Humanos , Aprendizado de Máquina , Fala

2.

Automatic boost articulation therapy in adults with dysarthria: Acceptability, usability and user interaction.

Mendoza Ramos, Viviana; Vasquez-Correa, Juan C; Cremers, Rani; Van Den Steen, Leen; Nöth, Elmar; De Bodt, Marc; Van Nuffelen, Gwen.

Int J Lang Commun Disord ; 56(5): 892-906, 2021 09.

Artigo em Inglês | MEDLINE | ID: mdl-34227721

RESUMO

BACKGROUND: Imprecise articulation has a negative impact on speech intelligibility. Therefore, treatment of articulation is clinically relevant in patients with dysarthria. In order to be effective and according to the principles of motor learning, articulation therapy needs to be intensive, well organized, with adequate feedback and requires frequent practice. AIMS: The aims of this pilot study are (1) to evaluate the feasibility of a virtual articulation therapy (VAT) to guide patients with dysarthria through a boost articulation therapy (BArT) program; (2) to evaluate the acoustic models' performance used for automatic phonological error detection; and (3) to validate the system by end-users from their perspective. METHODS & PROCEDURES: The VAT provides an extensive and well-structured package of exercises with visual and auditory modelling and adequate feedback on the utterances. The tool incorporates automated methods to detect phonological errors, which are specifically designed to analyse Dutch speech production. A total of 14 subjects with dysarthria evaluated the acceptability, usability and user interaction with the VAT based on two completed therapy sessions using a self-designed questionnaire. OUTCOMES & RESULTS: In general, participants were positive about the new computer-based therapy approach. The algorithm performance for phonological error detection shows it to be accurate, which contributes to adequate feedback of utterance production. The results of the study indicate that the VAT has a user-friendly interface that can be used independently by patients with dysarthria who have sufficient cognitive, linguistic, motoric and sensory skills to benefit from speech therapy. Recommendations were given by the end-users to further optimize the program and to ensure user engagement. CONCLUSIONS & IMPLICATIONS: The initial implementation of an automatic BArT shows it to be feasible and well accepted by end-users. The tool is an appropriate solution to increase the frequency and intensity of articulation training that supports traditional methods. WHAT THIS PAPER ADDS: What is already known on the subject Behavioural interventions to improve articulation in patients with dysarthria demand intensive treatments, repetitive practice and feedback. However, the current treatments are mainly limited in time to the interactive sessions in the presence of speech-language pathology. Automatic systems addressing the needs of individuals with dysarthria are scarce. This study evaluates the feasibility of a VAT program and investigates its acceptability, usability and user interaction. What this paper adds to existing knowledge The computer-based speech therapy approach developed and applied in this study intends to support intensive articulation training of patients with dysarthria. The virtual speech therapy offers the possibility of an individualized and customized therapy programme, with an extensive database of exercises, visual and auditory models of the target utterances, and providing adequate feedback based on automatic acoustic analysis of speech. What are the potential or actual clinical implications of this work? The automatic BArT overcomes the limitation in time of face-to-face traditional speech therapy. It offers patients the opportunity to have access to speech therapy more intensively and frequently in their home environment.

Assuntos

Disartria , Inteligibilidade da Fala , Adulto , Disartria/psicologia , Humanos , Projetos Piloto , Medida da Produção da Fala/métodos , Fonoterapia/métodos

3.

Speech Production Quality of Cochlear Implant Users with Respect to Duration and Onset of Hearing Loss.

Ruff, Suzan; Bocklet, Tobias; Nöth, Elmar; Müller, Joachim; Hoster, Eva; Schuster, Maria.

ORL J Otorhinolaryngol Relat Spec ; 79(5): 282-294, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-29131113

RESUMO

PURPOSE: To assess whether postlingual onset and shorter duration of deafness before cochlear implant (CI) provision predict higher speech intelligibility results of CI users. METHODS: For an objective judgement of speech intelligibility, we used an automatic speech recognition system computing the word recognition rate (WR) of 50 adult CI users and 50 age-matched control individuals. All subjects were recorded reading a standardized text. Subjects were divided into three groups: pre- or perilingual deafness (A), both >2 years before implantation, postlingual deafness <2 years before implantation (B), or postlingual deafness >2 years before implantation (C). RESULTS: CI users with short duration of postlingual deafness (B) had a significantly higher WR (median 74%) than CI users with long duration of postlingual deafness (C; 68%, p < 0.001) or pre-/perilingual onset (A; 56%, p < 0.001). Compared to their control groups only CI users with short duration of postlingual deafness reached similar WR, others showed significantly lower WR. Other factors such as hearing loss onset, duration of CI use, or duration of amplified hearing showed no consistent influence on speech quality. CONCLUSIONS: The speech production quality of adult CI users shows dependencies on the onset and duration of deafness. These features need to be considered while planning rehabilitation.

Assuntos

Implante Coclear/métodos , Perda Auditiva/terapia , Inteligibilidade da Fala/fisiologia , Percepção da Fala/fisiologia , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Implantes Cocleares , Feminino , Perda Auditiva/fisiopatologia , Humanos , Masculino , Pessoa de Meia-Idade , Fala , Medida da Produção da Fala/métodos , Fatores de Tempo , Adulto Jovem

4.

Language-independent automatic evaluation of intelligibility of chronically hoarse persons.

Haderlein, Tino; Middag, Catherine; Martens, Jean-Pierre; Döllinger, Michael; Nöth, Elmar.

Folia Phoniatr Logop ; 66(6): 219-26, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25659422

RESUMO

OBJECTIVE: Automatic intelligibility assessment using automatic speech recognition is usually language specific. In this study, a language-independent approach is proposed. It uses models that are trained with Flemish speech, and it is applied to assess chronically hoarse German speakers. The research questions are here: is it possible to construct suitable acoustic features that generalize to other languages and a speech disorder, and is the generated model for intelligibility also suitable for specific subtypes of that disorder, i.e. functional and organic dysphonia? PATIENTS AND METHODS: 73 German-speaking persons with chronic hoarseness read the text 'Der Nordwind und die Sonne'. Perceptual intelligibility scores were used as ground truth during the training of an automatic model that converts speaker level acoustic measurements into intelligibility scores. Cross-validation is used to assess model performance. RESULTS: The interrater agreement for all patients (n = 73) and for the functional and organic dysphonia subgroups (n = 45 and n = 24) are r = 0.82, r = 0.83 and r = 0.75, respectively. The automatic assessment based on phonologically based acoustic models revealed correlations between perceptual and automatic intelligibility ratings of r = 0.79 (all patients), r = 0.78 (functional dysphonia) and r = 0.80 (organic dysphonia). CONCLUSION: The automatic, objective measurement of intelligibility is a valuable instrument in an evidence-based clinical practice.

Assuntos

Rouquidão/diagnóstico , Rouquidão/psicologia , Idioma , Inteligibilidade da Fala , Interface para o Reconhecimento da Fala , Adulto , Idoso , Idoso de 80 Anos ou mais , Doença Crônica , Disfonia/diagnóstico , Feminino , Rouquidão/etiologia , Humanos , Masculino , Pessoa de Meia-Idade , Fonética , Acústica da Fala , Adulto Jovem

5.

ORCA-SPY enables killer whale sound source simulation, detection, classification and localization using an integrated deep learning-based segmentation.

Hauer, Christopher; Nöth, Elmar; Barnhill, Alexander; Maier, Andreas; Guthunz, Julius; Hofer, Heribert; Cheng, Rachael Xi; Barth, Volker; Bergler, Christian.

Sci Rep ; 13(1): 11106, 2023 07 10.

Artigo em Inglês | MEDLINE | ID: mdl-37429871

RESUMO

Acoustic identification of vocalizing individuals opens up new and deeper insights into animal communications, such as individual-/group-specific dialects, turn-taking events, and dialogs. However, establishing an association between an individual animal and its emitted signal is usually non-trivial, especially for animals underwater. Consequently, a collection of marine species-, array-, and position-specific ground truth localization data is extremely challenging, which strongly limits possibilities to evaluate localization methods beforehand or at all. This study presents ORCA-SPY, a fully-automated sound source simulation, classification and localization framework for passive killer whale (Orcinus orca) acoustic monitoring that is embedded into PAMGuard, a widely used bioacoustic software toolkit. ORCA-SPY enables array- and position-specific multichannel audio stream generation to simulate real-world ground truth killer whale localization data and provides a hybrid sound source identification approach integrating ANIMAL-SPOT, a state-of-the-art deep learning-based orca detection network, followed by downstream Time-Difference-Of-Arrival localization. ORCA-SPY was evaluated on simulated multichannel underwater audio streams including various killer whale vocalization events within a large-scale experimental setup benefiting from previous real-world fieldwork experience. Across all 58,320 embedded vocalizing killer whale events, subject to various hydrophone array geometries, call types, distances, and noise conditions responsible for a signal-to-noise ratio varying from [Formula: see text] dB to 3 dB, a detection rate of 94.0 % was achieved with an average localization error of 7.01[Formula: see text]. ORCA-SPY was field-tested on Lake Stechlin in Brandenburg Germany under laboratory conditions with a focus on localization. During the field test, 3889 localization events were observed with an average error of 29.19[Formula: see text] and a median error of 17.54[Formula: see text]. ORCA-SPY was deployed successfully during the DeepAL fieldwork 2022 expedition (DLFW22) in Northern British Columbia, with a mean average error of 20.01[Formula: see text] and a median error of 11.01[Formula: see text] across 503 localization events. ORCA-SPY is an open-source and publicly available software framework, which can be adapted to various recording conditions as well as animal species.

Assuntos

Aprendizado Profundo , Orca , Animais , Som , Simulação por Computador , Software

6.

Adult Cochlear Implant Users Versus Typical Hearing Persons: An Automatic Analysis of Acoustic-Prosodic Parameters.

Arias-Vergara, Tomás; Batliner, Anton; Rader, Tobias; Polterauer, Daniel; Högerle, Catalina; Müller, Joachim; Orozco-Arroyave, Juan-Rafael; Nöth, Elmar; Schuster, Maria.

J Speech Lang Hear Res ; 65(12): 4623-4636, 2022 12 12.

Artigo em Inglês | MEDLINE | ID: mdl-36417788

RESUMO

PURPOSE: The aim of this study was to investigate the speech prosody of postlingually deaf cochlear implant (CI) users compared with control speakers without hearing or speech impairment. METHOD: Speech recordings of 74 CI users (37 males and 37 females) and 72 age-balanced control speakers (36 males and 36 females) are considered. All participants are German native speakers and read Der Nordwind und die Sonne (The North Wind and the Sun), a standard text in pathological speech analysis and phonetic transcriptions. Automatic acoustic analysis is performed considering pitch, loudness, and duration features, including speech rate and rhythm. RESULTS: In general, duration and rhythm features differ between CI users and control speakers. CI users read slower and have a lower voiced segment ratio compared with control speakers. A lower voiced ratio goes along with a prolongation of the voiced segments' duration in male and with a prolongation of pauses in female CI users. Rhythm features in CI users have higher variability in the duration of vowels and consonants than in control speakers. The use of bilateral CIs showed no advantages concerning speech prosody features in comparison to unilateral use of CI. CONCLUSIONS: Even after cochlear implantation and rehabilitation, the speech of postlingually deaf adults deviates from the speech of control speakers, which might be due to changed auditory feedback. We suggest considering changes in temporal aspects of speech in future rehabilitation strategies. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21579171.

Assuntos

Implante Coclear , Implantes Cocleares , Surdez , Percepção da Fala , Adulto , Masculino , Feminino , Humanos , Surdez/reabilitação , Audição , Acústica

7.

ANIMAL-SPOT enables animal-independent signal detection and classification using deep learning.

Bergler, Christian; Smeele, Simeon Q; Tyndel, Stephen A; Barnhill, Alexander; Ortiz, Sara T; Kalan, Ammie K; Cheng, Rachael Xi; Brinkløv, Signe; Osiecka, Anna N; Tougaard, Jakob; Jakobsen, Freja; Wahlberg, Magnus; Nöth, Elmar; Maier, Andreas; Klump, Barbara C.

Sci Rep ; 12(1): 21966, 2022 12 19.

Artigo em Inglês | MEDLINE | ID: mdl-36535999

RESUMO

Bioacoustic research spans a wide range of biological questions and applications, relying on identification of target species or smaller acoustic units, such as distinct call types. However, manually identifying the signal of interest is time-intensive, error-prone, and becomes unfeasible with large data volumes. Therefore, machine-driven algorithms are increasingly applied to various bioacoustic signal identification challenges. Nevertheless, biologists still have major difficulties trying to transfer existing animal- and/or scenario-related machine learning approaches to their specific animal datasets and scientific questions. This study presents an animal-independent, open-source deep learning framework, along with a detailed user guide. Three signal identification tasks, commonly encountered in bioacoustics research, were investigated: (1) target signal vs. background noise detection, (2) species classification, and (3) call type categorization. ANIMAL-SPOT successfully segmented human-annotated target signals in data volumes representing 10 distinct animal species and 1 additional genus, resulting in a mean test accuracy of 97.9%, together with an average area under the ROC curve (AUC) of 95.9%, when predicting on unseen recordings. Moreover, an average segmentation accuracy and F1-score of 95.4% was achieved on the publicly available BirdVox-Full-Night data corpus. In addition, multi-class species and call type classification resulted in 96.6% and 92.7% accuracy on unseen test data, as well as 95.2% and 88.4% regarding previous animal-specific machine-based detection excerpts. Furthermore, an Unweighted Average Recall (UAR) of 89.3% outperformed the multi-species classification baseline system of the ComParE 2021 Primate Sub-Challenge. Besides animal independence, ANIMAL-SPOT does not rely on expert knowledge or special computing resources, thereby making deep-learning-based bioacoustic signal identification accessible to a broad audience.

Assuntos

Aprendizado Profundo , Animais , Humanos , Aprendizado de Máquina , Algoritmos , Acústica , Área Sob a Curva

8.

Automatic quantification of speech intelligibility in patients after treatment for oral squamous cell carcinoma.

Stelzle, Florian; Maier, Andreas; Nöth, Elmar; Bocklet, Tobias; Knipfer, Christian; Schuster, Maria; Neukam, Friedrich Wilhelm; Nkenke, Emeka.

J Oral Maxillofac Surg ; 69(5): 1493-500, 2011 May.

Artigo em Inglês | MEDLINE | ID: mdl-21216061

RESUMO

PURPOSE: Treatment of oral carcinomas often causes reduced speech intelligibility. It was the aim of this study to objectively evaluate the speech intelligibility of patients after multimodal therapy for oral squamous cell carcinoma (OSCC) with a computer-based, automatic speech recognition system. MATERIALS AND METHODS: The speech intelligibility of 59 patients after multimodal tumor treatment for OSCC, located at the lateral tongue, floor of the mouth, or the alveolar crest of the lower jaw, was objectively analyzed by a computer-based speech recognition system that calculates the percentage of correct word recognition (WR). RESULTS: The patients' WR was significantly reduced compared with a healthy control group without speech impairment (P ≤ .001). Higher T-classification was associated with a reduced WR (P < .01). Tumors located at the tongue showed a significantly higher WR than tumors at the floor of the mouth or the alveolar crest (P ≤ .001). Surgical resection and reconstruction of the lower jaw bone significantly reduced the WR (P ≤ .001) compared with cases without osseous tumor infiltration. CONCLUSIONS: Speech intelligibility after treatment for OSCC, objectively quantified by a standardized automatic speech recognition system, is reduced for increasing tumor size, increasing resection volume, and tumor localization near the lower jaw. Surgical reconstruction techniques seem to have an impact on speech intelligibility.

Assuntos

Carcinoma de Células Escamosas/cirurgia , Neoplasias Bucais/cirurgia , Inteligibilidade da Fala/fisiologia , Interface para o Reconhecimento da Fala , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Alveolectomia/métodos , Estudos de Coortes , Estudos Transversais , Feminino , Humanos , Masculino , Mandíbula/cirurgia , Neoplasias Mandibulares/cirurgia , Pessoa de Meia-Idade , Soalho Bucal/cirurgia , Esvaziamento Cervical , Terapia Neoadjuvante , Estadiamento de Neoplasias , Radioterapia Adjuvante , Procedimentos de Cirurgia Plástica , Fonoterapia , Retalhos Cirúrgicos , Neoplasias da Língua/cirurgia , Adulto Jovem

9.

FIN-PRINT a fully-automated multi-stage deep-learning-based framework for the individual recognition of killer whales.

Bergler, Christian; Gebhard, Alexander; Towers, Jared R; Butyrev, Leonid; Sutton, Gary J; Shaw, Tasli J H; Maier, Andreas; Nöth, Elmar.

Sci Rep ; 11(1): 23480, 2021 12 06.

Artigo em Inglês | MEDLINE | ID: mdl-34873193

RESUMO

Biometric identification techniques such as photo-identification require an array of unique natural markings to identify individuals. From 1975 to present, Bigg's killer whales have been photo-identified along the west coast of North America, resulting in one of the largest and longest-running cetacean photo-identification datasets. However, data maintenance and analysis are extremely time and resource consuming. This study transfers the procedure of killer whale image identification into a fully automated, multi-stage, deep learning framework, entitled FIN-PRINT. It is composed of multiple sequentially ordered sub-components. FIN-PRINT is trained and evaluated on a dataset collected over an 8-year period (2011-2018) in the coastal waters off western North America, including 121,000 human-annotated identification images of Bigg's killer whales. At first, object detection is performed to identify unique killer whale markings, resulting in 94.4% recall, 94.1% precision, and 93.4% mean-average-precision (mAP). Second, all previously identified natural killer whale markings are extracted. The third step introduces a data enhancement mechanism by filtering between valid and invalid markings from previous processing levels, achieving 92.8% recall, 97.5%, precision, and 95.2% accuracy. The fourth and final step involves multi-class individual recognition. When evaluated on the network test set, it achieved an accuracy of 92.5% with 97.2% top-3 unweighted accuracy (TUA) for the 100 most commonly photo-identified killer whales. Additionally, the method achieved an accuracy of 84.5% and a TUA of 92.9% when applied to the entire 2018 image collection of the 100 most common killer whales. The source code of FIN-PRINT can be adapted to other species and will be publicly available.

10.

From discourse to pathology: Automatic identification of Parkinson's disease patients via morphological measures across three languages.

Eyigoz, Elif; Courson, Melody; Sedeño, Lucas; Rogg, Katharina; Orozco-Arroyave, Juan Rafael; Nöth, Elmar; Skodda, Sabine; Trujillo, Natalia; Rodríguez, Mabel; Rusz, Jan; Muñoz, Edinson; Cardona, Juan F; Herrera, Eduar; Hesse, Eugenia; Ibáñez, Agustín; Cecchi, Guillermo; García, Adolfo M.

Cortex ; 132: 191-205, 2020 11.

Artigo em Inglês | MEDLINE | ID: mdl-32992069

RESUMO

Embodied cognition research on Parkinson's disease (PD) points to disruptions of frontostriatal language functions as sensitive targets for clinical assessment. However, no existing approach has been tested for crosslinguistic validity, let alone by combining naturalistic tasks with machine-learning tools. To address these issues, we conducted the first classifier-based examination of morphological processing (a core frontostriatal function) in spontaneous monologues from PD patients across three typologically different languages. The study comprised 330 participants, encompassing speakers of Spanish (61 patients, 57 matched controls), German (88 patients, 88 matched controls), and Czech (20 patients, 16 matched controls). All subjects described the activities they perform during a regular day, and their monologues were automatically coded via morphological tagging, a computerized method that labels each word with a part-of-speech tag (e.g., noun, verb) and specific morphological tags (e.g., person, gender, number, tense). The ensuing data were subjected to machine-learning analyses to assess whether differential morphological patterns could classify between patients and controls and reflect the former's degree of motor impairment. Results showed robust classification rates, with over 80% of patients being discriminated from controls in each language separately. Moreover, the most discriminative morphological features were associated with the patients' motor compromise (as indicated by Pearson r correlations between predicted and collected motor impairment scores that ranged from moderate to moderate-to-strong across languages). Taken together, our results suggest that morphological patterning, an embodied frontostriatal domain, may be distinctively affected in PD across languages and even under ecological testing conditions.

Assuntos

Idioma , Doença de Parkinson , Cognição , Humanos , Aprendizado de Máquina , Fala

11.

Apkinson: the smartphone application for telemonitoring Parkinson's patients through speech, gait and hands movement.

Orozco-Arroyave, Juan Rafael; Vásquez-Correa, Juan Camilo; Klumpp, Philipp; Pérez-Toro, Paula Andrea; Escobar-Grisales, Daniel; Roth, Nils; Ríos-Urrego, Cristian David; Strauss, Martin; Carvajal-Castaño, Helber Andrés; Bayerl, Sebastian; Castrillón-Osorio, Luis Reinel; Arias-Vergara, Tomas; Künderle, Arne; López-Pabón, Felipe Orlando; Parra-Gallego, Luis Felipe; Eskofier, Björn; Gómez-Gómez, Luis Felipe; Schuster, Maria; Nöth, Elmar.

Neurodegener Dis Manag ; 10(3): 137-157, 2020 06.

Artigo em Inglês | MEDLINE | ID: mdl-32571150

RESUMO

Aim: This paper introduces Apkinson, a mobile application for motor evaluation and monitoring of Parkinson's disease patients. Materials & methods: The App is based on previously reported methods, for instance, the evaluation of articulation and pronunciation in speech, regularity and freezing of gait in walking, and tapping accuracy in hand movement. Results: Preliminary experiments indicate that most of the measurements are suitable to discriminate patients and controls. Significance is evaluated through statistical tests. Conclusion: Although the reported results correspond to preliminary experiments, we think that Apkinson is a very useful App that can help patients, caregivers and clinicians, in performing a more accurate monitoring of the disease progression. Additionally, the mobile App can be a personal health assistant.

Assuntos

Aplicativos Móveis , Doença de Parkinson/fisiopatologia , Smartphone , Idoso , Idoso de 80 Anos ou mais , Feminino , Marcha , Humanos , Masculino , Pessoa de Meia-Idade , Movimento , Índice de Gravidade de Doença , Fala

12.

Automatic detection of articulation disorders in children with cleft lip and palate.

Maier, Andreas; Hönig, Florian; Bocklet, Tobias; Nöth, Elmar; Stelzle, Florian; Nkenke, Emeka; Schuster, Maria.

J Acoust Soc Am ; 126(5): 2589-602, 2009 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-19894838

RESUMO

Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy to apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such characteristics by a semiautomatic procedure and the second to evaluate a fully automatic, thus simple, procedure. The methods are based on a combination of speech processing algorithms. The semiautomatic method achieves moderate to good agreement (kappa approximately 0.6) for the detection of all phonetic disorders. On a speaker level, significant correlations between the perceptual evaluation and the automatic system of 0.89 are obtained. The fully automatic system yields a correlation on the speaker level of 0.81 to the perceptual evaluation. This correlation is in the range of the inter-rater correlation of the listeners. The automatic speech evaluation is able to detect phonetic disorders at an experts'level without any additional human postprocessing.

Assuntos

Transtornos da Articulação/diagnóstico , Transtornos da Articulação/etiologia , Fenda Labial/complicações , Fissura Palatina/complicações , Modelos Biológicos , Algoritmos , Criança , Humanos , Fonação , Fonética , Psicolinguística , Fonoterapia

13.

Automatic evaluation of tracheoesophageal substitute voice: sustained vowel versus standard text.

Bocklet, Tobias; Toy, Hikmet; Nöth, Elmar; Schuster, Maria; Eysholdt, Ulrich; Rosanowski, Frank; Gottwald, Frank; Haderlein, Tino.

Folia Phoniatr Logop ; 61(2): 112-6, 2009.

Artigo em Inglês | MEDLINE | ID: mdl-19321983

RESUMO

OBJECTIVE: The Hoarseness Diagram, a program for voice quality analysis used in German-speaking countries, was compared with an automatic speech recognition system with a module for prosodic analysis. The latter computed prosodic features on the basis of a text recording. We examined whether voice analysis of sustained vowels and text analysis correlate in tracheoesophageal speakers. PATIENTS AND METHODS: Test speakers were 24 male laryngectomees with tracheoesophageal substitute speech, age 60.6 +/- 8.9 years. Each person read the German version of the text 'The North Wind and the Sun'. Additionally, five sustained vowels were recorded from each patient. The fundamental frequency (F(0)) detected by both programs was compared for all vowels. The correlation between the measures obtained by the Hoarseness Diagram and the features from the prosody module was computed. RESULTS: Both programs have problems in determining the F(0) of highly pathologic voices. Parameters like jitter, shimmer, F(0), and irregularity as computed by the Hoarseness Diagram from vowels show correlations of about -0.8 with prosodic features obtained from the text recordings. CONCLUSION: Voice properties can reliably be evaluated both on the basis of vowel and text recordings. Text analysis, however, also offers possibilities for the automatic evaluation of running speech since it realistically represents everyday speech.

Assuntos

Fonética , Interface para o Reconhecimento da Fala , Voz Alaríngea/psicologia , Rouquidão/diagnóstico , Humanos , Masculino , Pessoa de Meia-Idade , Leitura , Acústica da Fala , Distúrbios da Voz/diagnóstico , Qualidade da Voz

14.

Application of automatic speech recognition to quantitative assessment of tracheoesophageal speech with different signal quality.

Haderlein, Tino; Riedhammer, Korbinian; Nöth, Elmar; Toy, Hikmet; Schuster, Maria; Eysholdt, Ulrich; Hornegger, Joachim; Rosanowski, Frank.

Folia Phoniatr Logop ; 61(1): 12-7, 2009.

Artigo em Inglês | MEDLINE | ID: mdl-19122460

RESUMO

OBJECTIVE: Tracheoesophageal voice is state-of-the-art in voice rehabilitation after laryngectomy. Intelligibility on a telephone is an important evaluation criterion as it is a crucial part of social life. An objective measure of intelligibility when talking on a telephone is desirable in the field of postlaryngectomy speech therapy and its evaluation. PATIENTS AND METHODS: Based upon successful earlier studies with broadband speech, an automatic speech recognition (ASR) system was applied to 41 recordings of postlaryngectomy patients. Recordings were available in different signal qualities; quality was the crucial criterion for this study. RESULTS: Compared to the intelligibility rating of 5 human experts, the ASR system had a correlation coefficient of r = -0.87 and Krippendorff's alpha of 0.65 when broadband speech was processed. The rater group alone achieved alpha = 0.66. With the test recordings in telephone quality, the system reached r = -0.79 and alpha = 0.67. CONCLUSION: For medical purposes, a comprehensive diagnostic approach to (substitute) voice has to cover both subjective and objective tests. An automatic recognition system such as the one proposed in this study can be used for objective intelligibility rating with results comparable to those of human experts. This holds for broadband speech as well as for automatic evaluation via telephone.

Assuntos

Medida da Produção da Fala/métodos , Interface para o Reconhecimento da Fala , Voz Alaríngea , Idoso , Processamento Eletrônico de Dados/métodos , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Reprodutibilidade dos Testes , Inteligibilidade da Fala , Telefone

15.

Deep Learning Approach to Parkinson's Disease Detection Using Voice Recordings and Convolutional Neural Network Dedicated to Image Classification.

Wodzinski, Marek; Skalski, Andrzej; Hemmerling, Daria; Orozco-Arroyave, Juan Rafael; Noth, Elmar.

Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 717-720, 2019 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-31945997

RESUMO

This study presents an approach to Parkinson's disease detection using vowels with sustained phonation and a ResNet architecture dedicated originally to image classification. We calculated spectrum of the audio recordings and used them as an image input to the ResNet architecture pre-trained using the ImageNet and SVD databases. To prevent overfitting the dataset was strongly augmented in the time domain. The Parkinson's dataset (from PC-GITA database) consists of 100 patients (50 were healthy / 50 were diagnosed with Parkinson's disease). Each patient was recorded 3 times. The obtained accuracy on the validation set is above 90% which is comparable to the current state-of-the-art methods. The results are promising because it turned out that features learned on natural images are able to transfer the knowledge to artificial images representing the spectrogram of the voice signal. What is more, we showed that it is possible to perform a successful detection of Parkinson's disease using only frequency-based features. A spectrogram enables visual representation of frequencies spectrum of a signal. It allows to follow the frequencies changes of a signal in time.

Assuntos

Doença de Parkinson , Voz , Aprendizado Profundo , Humanos , Redes Neurais de Computação

16.

ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning.

Bergler, Christian; Schröter, Hendrik; Cheng, Rachael Xi; Barth, Volker; Weber, Michael; Nöth, Elmar; Hofer, Heribert; Maier, Andreas.

Sci Rep ; 9(1): 10997, 2019 07 29.

Artigo em Inglês | MEDLINE | ID: mdl-31358873

RESUMO

Large bioacoustic archives of wild animals are an important source to identify reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication of non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis - particularly important for species with advanced social systems and complex vocalizations. In this study deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit ORCA-SPOT was tested on a large-scale bioacoustic repository - the Orchive - comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years) took approximately 8 days. It achieved a time-based precision or positive-predictive-value (PPV) of 93.2% and an area-under-the-curve (AUC) of 0.9523. This approach enables an automated annotation procedure of large bioacoustics databases to extract killer whale sounds, which are essential for subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioaoucstic research. ORCA-SPOT can be adapted to other animal species.

Assuntos

Vocalização Animal , Orca/fisiologia , Acústica , Animais , Aprendizado Profundo , Feminino , Masculino , Redes Neurais de Computação , Som , Espectrografia do Som/métodos

17.

Multimodal Assessment of Parkinson's Disease: A Deep Learning Approach.

Vasquez-Correa, Juan Camilo; Arias-Vergara, Tomas; Orozco-Arroyave, J R; Eskofier, Bjorn; Klucken, Jochen; Noth, Elmar.

IEEE J Biomed Health Inform ; 23(4): 1618-1630, 2019 07.

Artigo em Inglês | MEDLINE | ID: mdl-30137018

RESUMO

Parkinson's disease is a neurodegenerative disorder characterized by a variety of motor symptoms. Particularly, difficulties to start/stop movements have been observed in patients. From a technical/diagnostic point of view, these movement changes can be assessed by modeling the transitions between voiced and unvoiced segments in speech, the movement when the patient starts or stops a new stroke in handwriting, or the movement when the patient starts or stops the walking process. This study proposes a methodology to model such difficulties to start or to stop movements considering information from speech, handwriting, and gait. We used those transitions to train convolutional neural networks to classify patients and healthy subjects. The neurological state of the patients was also evaluated according to different stages of the disease (initial, intermediate, and advanced). In addition, we evaluated the robustness of the proposed approach when considering speech signals in three different languages: Spanish, German, and Czech. According to the results, the fusion of information from the three modalities is highly accurate to classify patients and healthy subjects, and it shows to be suitable to assess the neurological state of the patients in several stages of the disease. We also aimed to interpret the feature maps obtained from the deep learning architectures with respect to the presence or absence of the disease and the neurological state of the patients. As far as we know, this is one of the first works that considers multimodal information to assess Parkinson's disease following a deep learning approach.

Assuntos

Aprendizado Profundo , Doença de Parkinson/classificação , Processamento de Sinais Assistido por Computador , Idoso , Idoso de 80 Anos ou mais , Bases de Dados Factuais , Feminino , Marcha/fisiologia , Análise da Marcha , Escrita Manual , Humanos , Processamento de Imagem Assistida por Computador , Masculino , Pessoa de Meia-Idade , Doença de Parkinson/diagnóstico , Doença de Parkinson/fisiopatologia , Curva ROC , Fala/classificação

18.

Collinearity and Sample Coverage Issues in the Objective Measurement of Vocal Quality: The Case of Roughness and Breathiness.

Ferrer, Carlos A; Haderlein, Tino; Maryn, Youri; de Bodt, Marc S; Nöth, Elmar.

J Speech Lang Hear Res ; 61(1): 1-24, 2018 01 22.

Artigo em Inglês | MEDLINE | ID: mdl-29222538

RESUMO

Purpose: The aim of the study was to address the reported inconsistencies in the relationship between objective acoustic measures and perceptual ratings of vocal quality. Method: This tutorial moves away from the more widely examined problems related to obtaining the perceptual ratings and the acoustic measures and centers in less scrutinized issues regarding the procedure to establish the correspondence. Expressions for the most common measure of association between perceptual and acoustic measures (Pearson's r) are derived using a multiple linear regression model. The particular case where the multiple linear regression involves only roughness and breathiness is discussed to illustrate the issues. Results: Most problems reported regarding inconsistent findings in the relationship between given acoustic measures and particular perceptual ratings could be linked to sample properties not directly related to the actual relationship. The influential sample properties are the collinearity between the regressors in the multiple linear regression and their relative variances. Recommendations on how to rule out this possible cause of inconsistency are given, varying in scope from data collection, reporting, manipulation, and results interpretation. Conclusions: The problems described can be extended to more general cases than the exemplified roughness and breathiness sample's coverage. Ruling out this possible cause of inconsistency would increase the validity of the results reported.

Assuntos

Medida da Produção da Fala/métodos , Qualidade da Voz , Percepção Auditiva , Humanos , Modelos Lineares , Acústica da Fala , Distúrbios da Voz/diagnóstico

19.

Automatic evaluation of prosodic features of tracheoesophageal substitute voice.

Haderlein, Tino; Nöth, Elmar; Toy, Hikmet; Batliner, Anton; Schuster, Maria; Eysholdt, Ulrich; Hornegger, Joachim; Rosanowski, Frank.

Eur Arch Otorhinolaryngol ; 264(11): 1315-21, 2007 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-17571273

RESUMO

In comparison with laryngeal voice, substitute voice after laryngectomy is characterized by restricted aero-acoustic properties. Until now, an objective means of prosodic differences between substitute and normal voices does not exist. In a pilot study, we applied an automatic prosody analysis module to 18 speech samples of laryngectomees (age: 64.2 +/- 8.3 years) and 18 recordings of normal speakers of the same age (65.4 +/- 7.6 years). Ninety-five different features per word based upon the speech energy, fundamental frequency F(0) and duration measures on words, pauses and voiced/voiceless sections were measured. These reflect aspects of loudness, pitch and articulation rate. Subjective evaluation of the 18 patients' voices was performed by a panel of five experts on the criteria "noise", "speech effort", "roughness", "intelligibility", "match of breath and sense units" and "overall quality". These ratings were compared to the automatically computed features. Several of them could be identified being twice as high for the laryngectomees compared to the normal speakers, and vice versa. Comparing the evaluation data of the human experts and the automatic rating, correlation coefficients of up to 0.84 were measured. The automatic analysis serves as a good means to objectify and quantify the global speech outcome of laryngectomees. Even better results are expected when both the computation of the features and the comparison method to the human ratings will have been revised and adapted to the special properties of the substitute voices.

Assuntos

Processamento Eletrônico de Dados , Voz Alaríngea , Voz Esofágica , Fístula Traqueoesofágica , Qualidade da Voz , Idoso , Humanos , Laringectomia , Masculino , Pessoa de Meia-Idade

20.

Evaluation of speech intelligibility for children with cleft lip and palate by means of automatic speech recognition.

Schuster, Maria; Maier, Andreas; Haderlein, Tino; Nkenke, Emeka; Wohlleben, Ulrike; Rosanowski, Frank; Eysholdt, Ulrich; Nöth, Elmar.

Int J Pediatr Otorhinolaryngol ; 70(10): 1741-7, 2006 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-16814875

RESUMO

OBJECTIVE: Cleft lip and palate (CLP) may cause functional limitations even after adequate surgical and non-surgical treatment, speech disorders being one of them. Interindividually, they vary a lot, showing typical articulation specifics such as nasal emission and shift of articulation and therefore a diminished intelligibility. Until now, an objective means to determine and quantify the intelligibility does not exist. METHOD: An automatic speech recognition system, a new method, was applied on recordings of a standard test to evaluate articulation disorders (psycholinguistic analysis of speech disorders of children PLAKSS) of 31 children at the age of 10.1+/-3.8 years. Two had an isolated cleft lip, 20 a unilateral cleft lip and palate, 4 a bilateral cleft lip and palate, and 5 an isolated cleft palate. The speech recognition system was trained with adults and children without speech disorders and adapted to the speech of children with CLP. In this study, the automatic speech evaluation focussed on the word accuracy which represents the percentage of correctly recognized words. Results were confronted to a perceptive evaluation of intelligibility that was performed by a panel of three experts. RESULTS: The automatic speech recognition yielded word accuracies between 1.2 and 75.8% (mean 48.0+/-19.6%). The word accuracy was lowest for children with isolated cleft palate (36.9+/-23.3) and highest for children with isolated cleft lip (72.8+/-2.9). For children with unilateral cleft lip and palate it was 48.0+/-18.6 and for children with bilateral cleft lip and palate 49.3+/-9.4. The automatic evaluation complied with the experts' subjective evaluation of intelligibility (p<0.01). The multi-rater kappa of the experts alone differed only slightly from the multi-rater kappa of experts and recognizer. CONCLUSION: Automatic speech recognition may serve as a good means to objectify and quantify global speech outcome of children with cleft lip and palate.

Assuntos

Fenda Labial/fisiopatologia , Fissura Palatina/fisiopatologia , Distúrbios da Fala/diagnóstico , Inteligibilidade da Fala , Adolescente , Audiometria da Fala , Criança , Pré-Escolar , Fenda Labial/complicações , Fissura Palatina/complicações , Feminino , Humanos , Masculino , Projetos Piloto , Análise de Regressão , Distúrbios da Fala/etiologia

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa