Results 1 - 16 of 16
1.
Laryngoscope; 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38864282

ABSTRACT

OBJECTIVE: This study investigated whether artificial intelligence (AI) models combining voice signals, demographics, and structured medical records can detect glottic neoplasm from benign voice disorders. METHODS: We used a primary dataset containing 2-3 s of the vowel "ah", demographics, and 26 items of structured medical records (e.g., symptoms, comorbidity, smoking and alcohol consumption, vocal demand) from 60 patients with pathology-proven glottic neoplasm (i.e., squamous cell carcinoma, carcinoma in situ, and dysplasia) and 1940 patients with benign voice disorders. The validation dataset comprised data from 23 patients with glottic neoplasm and 1331 patients with benign disorders. The AI model combined convolutional neural networks, gated recurrent units, and attention layers. We used 10-fold cross-validation (training-validation-testing: 8-1-1) and preserved the ratio between neoplasm and benign disorders in each fold. RESULTS: The AI model using voice signals alone reached an area under the ROC curve (AUC) of 0.631; adding demographics increased this to 0.807. The highest AUC of 0.878 was achieved when combining voice, demographics, and medical records (sensitivity: 0.783, specificity: 0.816, accuracy: 0.815). External validation yielded an AUC of 0.785 (voice plus demographics; sensitivity: 0.739, specificity: 0.745, accuracy: 0.745). Subanalysis showed that AI had higher sensitivity but lower specificity than human assessment (p < 0.01). The accuracy of AI detection with additional medical records was comparable with human assessment (82% vs. 83%, p = 0.78). CONCLUSIONS: Voice signals alone were insufficient for AI differentiation between glottic neoplasm and benign voice disorders, but additional demographics and medical records notably improved AI performance and approximated the prediction accuracy of humans. LEVEL OF EVIDENCE: NA. Laryngoscope, 2024.
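
The abstract does not include an implementation, but a minimal PyTorch sketch of the described fusion architecture (a CNN front end on the voice signal, a GRU with an attention layer, and late fusion with demographic and structured-record vectors) might look as follows. The layer sizes, the mel-spectrogram input, and the two-element demographic vector are assumptions; only the 26-item record vector comes from the abstract.

```python
import torch
import torch.nn as nn

class VoiceFusionNet(nn.Module):
    """CNN + GRU + attention on voice, fused with demographic/record vectors."""
    def __init__(self, n_mels=80, n_demo=2, n_records=26):
        super().__init__()
        self.cnn = nn.Sequential(                      # local spectral patterns
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.gru = nn.GRU(64, 64, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(128, 1)                  # additive attention scores
        self.head = nn.Sequential(                     # late-fusion classifier
            nn.Linear(128 + n_demo + n_records, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, mel, demo, records):
        # mel: (batch, n_mels, frames); demo/records: (batch, n_demo/n_records)
        h = self.cnn(mel).transpose(1, 2)              # (batch, frames, 64)
        h, _ = self.gru(h)                             # (batch, frames, 128)
        w = torch.softmax(self.attn(h), dim=1)         # attention over time
        voice = (w * h).sum(dim=1)                     # weighted temporal pooling
        x = torch.cat([voice, demo, records], dim=1)
        return torch.sigmoid(self.head(x)).squeeze(1)  # P(glottic neoplasm)

model = VoiceFusionNet()
p = model(torch.randn(4, 80, 300), torch.randn(4, 2), torch.randn(4, 26))
```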

2.
Sensors (Basel); 23(16), 2023 Aug 08.
Article in English | MEDLINE | ID: mdl-37631568

ABSTRACT

The detection of audio tampering plays a crucial role in ensuring the authenticity and integrity of multimedia files. This paper presents a novel approach to identifying tampered audio files by leveraging the unique Electric Network Frequency (ENF) signal, which is inherent to the power grid and serves as a reliable indicator of authenticity. The study begins by establishing a comprehensive Chinese ENF database containing diverse ENF signals extracted from audio files. The proposed methodology involves extracting the ENF signal, applying wavelet decomposition, and utilizing the autoregressive model to train effective classification models. Subsequently, the framework is employed to detect audio tampering and assess the influence of various environmental conditions and recording devices on the ENF signal. Experimental evaluations conducted on our Chinese ENF database demonstrate the efficacy of the proposed method, achieving impressive accuracy rates ranging from 91% to 93%. The results emphasize the significance of ENF-based approaches in enhancing audio file forensics and reaffirm the necessity of adopting reliable tamper detection techniques in multimedia authentication.
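
As a rough illustration of the pipeline described above (ENF extraction, wavelet decomposition, autoregressive modelling), the following sketch uses SciPy and PyWavelets. The 50 Hz mains frequency, the downsampled input rate, and the filter and AR orders are assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, stft
import pywt

def enf_track(audio, fs, mains=50.0, band=1.0):
    """ENF trajectory: bandpass around the mains frequency, then per-frame peak.
    `audio` is assumed already downsampled (e.g., fs = 1000 Hz) for stability."""
    b, a = butter(4, [(mains - band) / (fs / 2), (mains + band) / (fs / 2)],
                  "bandpass")
    x = filtfilt(b, a, audio)
    f, _, Z = stft(x, fs=fs, nperseg=int(2 * fs))      # 2 s frames, 0.5 Hz bins
    sel = (f >= mains - band) & (f <= mains + band)
    return f[sel][np.argmax(np.abs(Z[sel]), axis=0)]   # peak frequency per frame

def ar_coeffs(sig, order=4):
    """Least-squares AR(order) fit used as a feature vector."""
    X = np.column_stack([sig[i:len(sig) - order + i] for i in range(order)])
    return np.linalg.lstsq(X, sig[order:], rcond=None)[0]

def enf_features(audio, fs):
    enf = enf_track(audio, fs) - 50.0                  # fluctuation around nominal
    cA, *cDs = pywt.wavedec(enf, "db4", level=3)       # wavelet decomposition
    return np.concatenate([ar_coeffs(c) for c in [cA, *cDs]])
```

These feature vectors would then be fed to an ordinary classifier to decide tampered vs. authentic.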

3.
IEEE Trans Biomed Eng; 70(10): 2922-2932, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37099463

ABSTRACT

OBJECTIVE: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Automatic classification systems for home use are therefore desirable for people without access to clinical assessments. However, the performance of such systems may be weakened by constrained resources and by the domain mismatch between clinical data and noisy real-world data. METHODS: This study develops a compact and domain-robust voice disorder classification system that labels utterances as healthy, neoplasm, or benign structural disease. The proposed system uses a feature extractor composed of factorized convolutional neural networks and deploys domain adversarial training to reconcile the domain mismatch by extracting domain-invariant features. RESULTS: The results show that the unweighted average recall in the noisy real-world domain improved by 13% and remained at 80% in the clinic domain with only slight degradation. The domain mismatch was effectively eliminated. Moreover, the proposed system reduced both memory usage and computation by over 73.9%. CONCLUSION: With factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification under limited resources. The promising results confirm that the proposed system can significantly reduce resource consumption and improve classification accuracy by accounting for the domain mismatch. SIGNIFICANCE: To the best of our knowledge, this is the first study to jointly consider real-world model compression and noise-robustness issues in voice disorder classification. The proposed system is intended for embedded systems with limited resources.


Subjects
Data Compression; Voice Disorders; Humans; Voice Disorders/diagnosis; Neural Networks, Computer
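
A compact sketch of the two ingredients named in the abstract above, assuming "factorized" means depthwise-separable convolutions and using a gradient-reversal layer for the domain-adversarial branch; the paper's actual layer configuration may differ.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # reversed gradients for the domain branch

class FactorizedBlock(nn.Module):
    """Depthwise + pointwise convolution: far fewer parameters than a full conv."""
    def __init__(self, cin, cout):
        super().__init__()
        self.depth = nn.Conv1d(cin, cin, 3, padding=1, groups=cin)
        self.point = nn.Conv1d(cin, cout, 1)
    def forward(self, x):
        return torch.relu(self.point(self.depth(x)))

class DATClassifier(nn.Module):
    def __init__(self, n_feat=40, n_classes=3, n_domains=2):
        super().__init__()
        self.extractor = nn.Sequential(FactorizedBlock(n_feat, 64),
                                       FactorizedBlock(64, 64),
                                       nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.cls = nn.Linear(64, n_classes)      # health / neoplasm / structural
        self.dom = nn.Linear(64, n_domains)      # clinic vs. real-world domain
    def forward(self, x, lam=1.0):
        z = self.extractor(x)                    # domain-invariant features
        return self.cls(z), self.dom(GradReverse.apply(z, lam))
```

Training minimizes the class loss while the reversed gradient pushes the extractor to confuse the domain head, which is what yields domain-invariant features.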
4.
J Voice; 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36732109

ABSTRACT

OBJECTIVE: Doctors nowadays primarily use auditory-perceptual evaluation, such as the grade, roughness, breathiness, asthenia, and strain (GRBAS) scale, to assess voice quality and determine treatment. However, ratings often differ between individual physicians because of subjective perception and the time interval between diagnoses, especially when a patient's symptoms are hard to judge. An accurate computerized pathological voice quality assessment system would therefore improve the quality of assessment. METHOD: This study proposes a self-attention-based deep learning system, named self-attention-based bidirectional long short-term memory (SA-BiLSTM). Recordings at different pitches (low, normal, high) and vowels (/a/, /i/, /u/) were fed into the proposed model so that it could learn, from a high-dimensional view, how professional doctors rate the GRBAS scale. RESULTS: The experimental results showed that the proposed system outperformed the baseline systems. Specifically, the macro-average F1 score, presented as a decimal, was used to compare classification accuracy. The (G, R, B) scores of the proposed system were (0.768±0.011, 0.820±0.009, 0.815±0.009), higher than those of the baseline systems: a deep neural network (0.395±0.010, 0.312±0.019, 0.321±0.014) and a convolutional neural network (0.421±0.052, 0.306±0.043, 0.325±0.032), respectively. CONCLUSIONS: The proposed system, with SA-BiLSTM, pitches, and vowels, provides a more accurate way to evaluate the voice. This will be helpful for clinical voice evaluation and will improve the benefit patients receive from voice therapy.
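
A minimal PyTorch sketch of an SA-BiLSTM classifier of the kind the abstract describes; the feature dimension, attention configuration, and four-level grade output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SABiLSTM(nn.Module):
    """BiLSTM encoder with self-attention pooling for GRBAS-style grading."""
    def __init__(self, n_feat=39, hidden=128, n_grades=4):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                          batch_first=True)
        self.out = nn.Linear(2 * hidden, n_grades)   # e.g., grade 0-3

    def forward(self, x):
        # x: (batch, frames, n_feat) acoustic features from one pitch/vowel take
        h, _ = self.lstm(x)
        a, _ = self.attn(h, h, h)                    # self-attention over frames
        return self.out(a.mean(dim=1))               # pooled -> grade logits

model = SABiLSTM()
logits = model(torch.randn(8, 200, 39))              # 8 utterances, 200 frames
```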

5.
Article in English | MEDLINE | ID: mdl-36085875

ABSTRACT

Patients with dysarthria generally produce distorted speech with reduced intelligibility for both humans and machines. To enhance the intelligibility of dysarthric speech, we applied a deep learning-based speech enhancement (SE) system to this task. Conventional SE approaches are used to suppress noise components in noise-corrupted input, thereby improving sound quality and intelligibility simultaneously. In this study, we focus on reconstructing the severely distorted signal of dysarthric speech to improve intelligibility. The proposed SE system trains a convolutional neural network (CNN) model in the training phase, which is then used to process dysarthric speech in the testing phase. Training requires paired dysarthric-normal speech utterances; we adopt a dynamic time warping technique to align the dysarthric-normal utterances. The aligned training data are used to train a CNN-based SE model. The proposed SE system was evaluated with the Google automatic speech recognition (ASR) system and a subjective listening test. The results showed that the proposed method notably enhanced recognition performance, by more than 10% for both ASR and human listeners, relative to unprocessed dysarthric speech. Clinical Relevance: This study enhances the intelligibility and ASR accuracy of dysarthric speech by more than 10%.


Subjects
Dysarthria; Speech; Auditory Perception; Dysarthria/diagnosis; Humans; Neural Networks, Computer; Sound
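
The alignment step is the distinctive part of this method; a sketch using librosa's DTW to pair dysarthric frames with normal-speech frames before SE training might look like this (the MFCC features and parameters are assumptions):

```python
import numpy as np
import librosa

def aligned_pairs(dys_wav, norm_wav, sr=16000, n_mfcc=13):
    """Return frame-aligned (dysarthric, normal) MFCC pairs via DTW."""
    d = librosa.feature.mfcc(y=dys_wav, sr=sr, n_mfcc=n_mfcc)
    n = librosa.feature.mfcc(y=norm_wav, sr=sr, n_mfcc=n_mfcc)
    _, wp = librosa.sequence.dtw(X=d, Y=n, metric="euclidean")
    wp = wp[::-1]                        # warping path in start-to-end order
    src = d[:, wp[:, 0]].T               # dysarthric frames (model input)
    tgt = n[:, wp[:, 1]].T               # normal frames (training target)
    return src, tgt                      # feed these pairs to the CNN SE model
```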
6.
Sensors (Basel); 22(17), 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36081092

ABSTRACT

Deep learning techniques such as convolutional neural networks (CNNs) have been successfully applied to identify pathological voices. However, a major disadvantage of these advanced models is the lack of interpretability in explaining the predicted outcomes. This drawback is a bottleneck for the wider adoption of voice-disorder classification and detection systems, especially during this pandemic period. In this paper, we propose using a series of learnable sinc functions to replace the first layer of a commonly used CNN, yielding an explainable SincNet system for classifying and detecting pathological voices. The applied sinc filters, a front-end signal processor in SincNet, are critical for constructing a meaningful first layer and directly extract the acoustic features that the subsequent networks use to generate high-level voice information. We conducted our tests on three different Far Eastern Memorial Hospital voice datasets. In our evaluations, the proposed approach achieves improvements of up to 7% in accuracy and 9% in sensitivity over conventional methods, demonstrating the superior performance of the SincNet system in classifying pathological waveforms. More importantly, we offer possible explanations relating the system output to the speech features extracted by the first layer, based on our evaluation results.


Subjects
Voice Disorders; Voice; Acoustics; Humans; Neural Networks, Computer; Voice Disorders/diagnosis
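
The core idea of SincNet's first layer is that each filter is a band-pass sinc parameterized only by two learnable cutoff frequencies, which is what makes the layer interpretable. Below is a simplified sketch of such a layer; unlike the published SincNet it omits windowing and minimum-bandwidth constraints.

```python
import math
import torch
import torch.nn as nn

class SincConv(nn.Module):
    """Band-pass sinc filters with learnable cutoffs (simplified SincNet layer)."""
    def __init__(self, n_filters=40, kernel=251, fs=16000):
        super().__init__()
        self.kernel = kernel
        lo = torch.linspace(30, fs / 2 - 200, n_filters)
        self.f_lo = nn.Parameter(lo / fs)                 # normalized cutoffs
        self.f_hi = nn.Parameter((lo + 100) / fs)
        self.register_buffer("t", torch.arange(kernel) - kernel // 2)

    def forward(self, x):                                 # x: (batch, 1, samples)
        def sinc_lp(fc):                                  # low-pass sinc kernel
            arg = 2 * math.pi * fc.unsqueeze(1) * self.t
            safe = torch.where(self.t == 0, torch.ones_like(arg), arg)
            s = torch.sin(safe) / safe
            s = torch.where(self.t == 0, torch.ones_like(s), s)  # sinc(0) = 1
            return 2 * fc.unsqueeze(1) * s
        band = sinc_lp(self.f_hi.abs()) - sinc_lp(self.f_lo.abs())
        return nn.functional.conv1d(x, band.unsqueeze(1),
                                    padding=self.kernel // 2)

layer = SincConv()
out = layer(torch.randn(2, 1, 16000))   # interpretable band-pass responses
```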
7.
IEEE Open J Eng Med Biol; 3: 25-33, 2022.
Article in English | MEDLINE | ID: mdl-35399790

ABSTRACT

Goal: Numerous studies have successfully differentiated normal from abnormal voice samples. Nevertheless, further classification has rarely been attempted. This study proposes a novel approach that uses continuous Mandarin speech instead of a single vowel to classify four common voice disorders (i.e., functional dysphonia, neoplasm, phonotrauma, and vocal palsy). Methods: In the proposed framework, acoustic signals are transformed into mel-frequency cepstral coefficients, and a bi-directional long short-term memory network (BiLSTM) is adopted to model the sequential features. The experiments were conducted on a large-scale database of 1,045 continuous speech recordings collected by the speech clinic of a hospital from 2012 to 2019. Results: Experimental results demonstrated that the proposed framework improves accuracy from 78.12% to 89.27% and unweighted average recall from 50.92% to 80.68%, compared with systems that use a single vowel. Conclusions: The results are consistent across other machine learning algorithms, including gated recurrent units, random forest, deep neural networks, and LSTM. The sensitivities for each disorder were also analyzed, and the model capabilities were visualized via principal component analysis. An alternative experiment based on a balanced dataset again confirms the advantages of using continuous speech for learning voice disorders.
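
A bare-bones sketch of the described pipeline, mapping a continuous speech recording to MFCCs and classifying it with a BiLSTM; the dimensions and pooling choice are assumptions.

```python
import torch
import torch.nn as nn
import librosa

DISORDERS = ["functional dysphonia", "neoplasm", "phonotrauma", "vocal palsy"]

def mfcc_sequence(wav_path):
    """Continuous Mandarin passage -> (1, frames, 13) MFCC tensor."""
    y, sr = librosa.load(wav_path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return torch.tensor(m.T, dtype=torch.float32).unsqueeze(0)

class SpeechBiLSTM(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, len(DISORDERS))
    def forward(self, x):                  # x: (batch, frames, n_mfcc)
        h, _ = self.lstm(x)
        return self.fc(h.mean(dim=1))      # mean-pooled sequence -> 4-way logits

model = SpeechBiLSTM()
logits = model(torch.randn(1, 400, 13))    # stands in for mfcc_sequence(path)
```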

8.
JMIR Mhealth Uhealth; 8(12): e16746, 2020 Dec 03.
Article in English | MEDLINE | ID: mdl-33270033

ABSTRACT

BACKGROUND: Voice disorders mainly result from chronic overuse or abuse, particularly in occupational voice users such as teachers. Previous studies proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, along with the lack of real-time processing, has limited its clinical application. OBJECTIVE: This study aims to (1) propose an automatic speech detection system using wireless microphones for real-time ambulatory voice monitoring, (2) examine the detection accuracy in controlled environments and under noisy conditions, and (3) report the phonation ratio in practical scenarios. METHODS: We designed an adaptive threshold function to detect the presence of speech based on the energy envelope. We invited 10 teachers to participate in this study and tested the proposed automatic speech detection system in terms of detection accuracy and phonation ratio. Moreover, we investigated whether an unsupervised noise reduction algorithm (i.e., log minimum mean square error) can overcome the influence of environmental noise in the proposed system. RESULTS: The proposed system exhibited an average speech detection accuracy of 89.9%, ranging from 81.0% (67,357/83,157 frames) to 95.0% (199,201/209,685 frames). Subsequent analyses revealed a phonation ratio between 44.0% (33,019/75,044 frames) and 78.0% (68,785/88,186 frames) during teaching sessions of 40-60 minutes; most phonation segments lasted less than 10 seconds. The presence of background noise reduced the accuracy of the automatic speech detection system, and an adjuvant noise reduction function effectively improved the accuracy, especially under stable noise conditions. CONCLUSIONS: This study demonstrated an average detection accuracy of 89.9% for the proposed automatic speech detection system with wireless microphones. The preliminary phonation ratio results were comparable to those of previous studies. Although wireless microphones are susceptible to background noise, an additional noise reduction function can alleviate this limitation. These results indicate that the proposed system can be applied to ambulatory voice monitoring in occupational voice users.


Subjects
Speech Acoustics; Voice Disorders; Algorithms; Humans; Phonation; Speech
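
A toy version of energy-envelope speech detection with an adaptive threshold, in the spirit of the system described above; the frame length, margin, and floor-update rule are illustrative assumptions, not the paper's exact function.

```python
import numpy as np

def detect_speech(x, fs, frame_ms=32, margin_db=6.0, alpha=0.05):
    """Mark frames as speech when energy exceeds an adaptive noise floor."""
    n = int(fs * frame_ms / 1000)
    frames = x[: len(x) // n * n].reshape(-1, n)
    e_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)  # energy envelope
    floor, out = e_db[0], np.zeros(len(e_db), dtype=bool)
    for i, e in enumerate(e_db):
        out[i] = e > floor + margin_db          # adaptive threshold test
        if not out[i]:                          # update floor on non-speech only
            floor = (1 - alpha) * floor + alpha * e
    return out                                  # phonation ratio = out.mean()
```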
9.
IEEE J Biomed Health Inform; 24(11): 3203-3214, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32795973

ABSTRACT

Auscultation is the most efficient way to diagnose cardiovascular and respiratory diseases. To reach accurate diagnoses, a device must be able to recognize heart and lung sounds in various clinical situations. However, recorded chest sounds are a mixture of heart and lung sounds, so effectively separating the two is critical in the pre-processing stage. Recent advances in machine learning have progressed on monaural source separation, but most well-known techniques require paired mixed sounds and individual pure sounds for model training. As preparing pure heart and lung sounds is difficult, special designs must be considered to derive effective heart and lung sound separation techniques. In this study, we proposed a novel periodicity-coded deep auto-encoder (PC-DAE) approach to separate mixed heart-lung sounds in an unsupervised manner, based on the assumption that heart rate and respiration rate have different periodicities. PC-DAE benefits from deep-learning-based models by extracting representative features and exploits the periodicity of heart and lung sounds to carry out the separation. We evaluated PC-DAE on two datasets: the first includes sounds from the Student Auscultation Manikin (SAM), and the second was prepared by recording chest sounds in real-world conditions. Experimental results indicate that PC-DAE outperforms several well-known separation methods in terms of standardized evaluation metrics. Moreover, waveforms and spectrograms demonstrate the effectiveness of PC-DAE compared with existing approaches. It is also confirmed that using PC-DAE as a pre-processing stage notably boosts heart sound recognition accuracy. These results confirm the effectiveness of PC-DAE and its potential for clinical applications.


Subjects
Heart Sounds; Respiratory Sounds; Auscultation; Heart; Humans; Lung
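
A heavily simplified sketch of the periodicity idea: train an autoencoder on mixture spectrograms, then assign each latent unit to heart or lung by the dominant modulation frequency of its activations and decode each group separately. The real PC-DAE is more elaborate; the band edges and layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class DAE(nn.Module):
    """Plain autoencoder trained to reconstruct mixture log-spectra."""
    def __init__(self, n_freq=257, n_hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_freq, n_hidden), nn.ReLU())
        self.dec = nn.Linear(n_hidden, n_freq)
    def forward(self, x):
        return self.dec(self.enc(x))

def separate(model, mix, frame_rate, heart_band=(0.8, 3.0)):
    """mix: (frames, n_freq) log-magnitude spectrogram of a chest recording.
    Heart beats ~1-2 Hz, breathing ~0.2-0.5 Hz, so unit activations differ."""
    with torch.no_grad():
        z = model.enc(mix)                                 # (frames, n_hidden)
        spec = torch.fft.rfft(z - z.mean(0), dim=0).abs()  # modulation spectrum
        f = torch.fft.rfftfreq(z.shape[0], d=1.0 / frame_rate)
        dom = f[spec.argmax(dim=0)]                        # dominant rate per unit
        heart = (dom >= heart_band[0]) & (dom <= heart_band[1])
        return model.dec(z * heart), model.dec(z * ~heart)  # two spectrograms
```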
10.
Sci Rep; 10(1): 4153, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32139787

ABSTRACT

This study proposes a gradient-boosting-based machine learning approach for predicting the PM2.5 concentration in Taiwan. The proposed mechanism is evaluated on a large-scale database built by the Environmental Protection Administration and the Central Weather Bureau, Taiwan, which includes data from 77 air monitoring stations and 580 weather stations performing hourly measurements over one year. By learning from past records of PM2.5 and the climatic information of neighboring weather stations, the forecasting model works well for 24-h prediction at most air stations. This study also investigates the geographical and meteorological divergence in the forecasting results across seven regional monitoring areas. We also compare the prediction performance between Taiwan, Taipei, and London; analyze the impact of industrial pollution; and propose an enhanced version of the prediction model to improve the prediction accuracy. The results indicate that Taipei and London have similar prediction results because the two cities have similar topography (a basin) and are financial centers without domestic pollution sources. The results also suggest that after accounting for industrial impacts by incorporating additional features from the Taichung and Thong-Siau power plants, the proposed method significantly improves the coefficient of determination (R2) from 0.58 to 0.71. Moreover, for Taichung City the root-mean-square error decreases from 8.56 with the conventional approach to 7.06 with the proposed method.
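
A small sketch of the modelling setup with scikit-learn's gradient boosting, using lagged PM2.5 values plus a weather variable as features. The synthetic data, single-step horizon, and hyperparameters are placeholders; the real system predicts 24 h ahead from far richer station inputs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def make_features(pm25, weather, lags=24):
    """Past PM2.5 lags plus a climatic variable as predictors."""
    X = np.column_stack(
        [pm25[i:len(pm25) - lags + i] for i in range(lags)] + [weather[lags:]])
    y = pm25[lags:]                        # next-hour target in this toy setup
    return X, y

rng = np.random.default_rng(0)             # synthetic stand-in for hourly records
pm25 = rng.gamma(2.0, 10.0, 2000)
weather = rng.normal(size=2000)            # e.g., one neighboring-station series
X, y = make_features(pm25, weather)
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X[:-200], y[:-200])
print("R2:", model.score(X[-200:], y[-200:]))
```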

11.
J Voice; 33(5): 634-641, 2019 Sep.
Article in English | MEDLINE | ID: mdl-29567049

ABSTRACT

OBJECTIVES: Computerized detection of voice disorders has attracted considerable academic and clinical interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation. This study proposes a deep-learning-based approach to detect pathological voice and examines its performance and utility compared with other automatic classification algorithms. METHODS: This study retrospectively collected 60 normal voice samples and 402 pathological voice samples covering 8 common clinical voice disorders in a voice clinic of a tertiary teaching hospital. We extracted Mel-frequency cepstral coefficients from 3-second samples of a sustained vowel. The performance of three machine learning algorithms, namely deep neural network (DNN), support vector machine, and Gaussian mixture model, was evaluated based on fivefold cross-validation. Cases from the MEEI (Massachusetts Eye and Ear Infirmary) voice disorder database were used to verify the classification mechanisms. RESULTS: The experimental results demonstrated that the DNN outperforms the Gaussian mixture model and the support vector machine. Its accuracy in detecting voice pathologies reached 94.26% and 90.52% in male and female subjects, based on three representative Mel-frequency cepstral coefficient features. When applied to the MEEI database for validation, the DNN also achieved higher accuracy (99.32%) than the other two classification algorithms. CONCLUSIONS: By stacking several layers of neurons with optimized weights, the proposed DNN algorithm can fully utilize the acoustic features and efficiently differentiate between normal and pathological voice samples. Building on this pilot study, future research may explore further applications of DNNs from laboratory and clinical perspectives.


Subjects
Acoustics; Deep Learning; Dysphonia/diagnosis; Signal Processing, Computer-Assisted; Speech Acoustics; Speech Production Measurement; Support Vector Machine; Vocal Cords/physiopathology; Voice Quality; Adult; Aged; Aged, 80 and over; Diagnosis, Computer-Assisted; Dysphonia/physiopathology; Female; Humans; Male; Middle Aged; Pilot Projects; Predictive Value of Tests; Reproducibility of Results; Retrospective Studies; Sound Spectrography; Vocal Cords/pathology; Young Adult
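
A compact sketch of the detection pipeline described above: mean MFCCs from a 3-second sustained vowel, a small DNN, and fivefold cross-validation. The random feature matrix stands in for real extracted features, and the network size is an assumption.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def mfcc_features(wav_path, n_mfcc=13):
    """Mean MFCCs over a 3-second sustained vowel sample."""
    y, sr = librosa.load(wav_path, duration=3.0)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# X: one feature row per voice sample; y: 0 = normal, 1 = pathological
X = np.random.randn(462, 13)              # placeholder for extracted features
y = np.r_[np.zeros(60), np.ones(402)]     # 60 normal / 402 pathological samples
dnn = MLPClassifier(hidden_layer_sizes=(64, 64, 32), max_iter=1000)
print(cross_val_score(dnn, X, y, cv=5).mean())   # fivefold cross-validation
```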
12.
Annu Int Conf IEEE Eng Med Biol Soc; 2018: 404-408, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440419

ABSTRACT

The performance of a deep-learning-based speech enhancement (SE) technology for hearing aid users, called a deep denoising autoencoder (DDAE), was investigated. The hearing-aid speech perception index (HASPI) and the hearing-aid sound quality index (HASQI), two well-known evaluation metrics for speech intelligibility and quality, were used to evaluate the DDAE SE approach on two typical high-frequency hearing loss (HFHL) audiograms. Our experimental results show that the DDAE SE approach yields higher intelligibility and quality scores than two classical SE approaches. These results suggest that a deep-learning-based SE method could improve speech intelligibility and quality for hearing aid users in noisy environments.


Subjects
Deep Learning; Hearing Aids; Auditory Perception; Hearing Loss, Sensorineural/rehabilitation; Hearing Tests; Humans; Sound; Speech Intelligibility; Speech Perception
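
A minimal sketch of a DDAE: a fully connected network trained to map noisy log-magnitude spectra to clean ones. The layer sizes and one-step training loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DDAE(nn.Module):
    """Deep denoising autoencoder: noisy log-magnitude -> clean log-magnitude."""
    def __init__(self, n_freq=257, hidden=512, n_layers=3):
        super().__init__()
        dims = [n_freq] + [hidden] * n_layers
        layers = []
        for a, b in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(a, b), nn.ReLU()]
        self.net = nn.Sequential(*layers, nn.Linear(hidden, n_freq))
    def forward(self, noisy):               # (frames, n_freq) per utterance
        return self.net(noisy)

model = DDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
noisy, clean = torch.randn(100, 257), torch.randn(100, 257)  # placeholder pair
loss = nn.functional.mse_loss(model(noisy), clean)           # spectral MSE
loss.backward()
opt.step()
```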
13.
Folia Phoniatr Logop; 70(3-4): 174-182, 2018.
Article in English | MEDLINE | ID: mdl-30184538

ABSTRACT

BACKGROUND: Studies have used questionnaires of dysphonic symptoms to screen for voice disorders. This study investigated whether the differential presentation of demographic and symptomatic features can be applied to computerized classification. METHODS: We recruited 100 patients with glottic neoplasm, 508 with phonotraumatic lesions, and 153 with unilateral vocal palsy. Statistical analyses revealed significantly different distributions of demographic and symptomatic variables. Machine learning algorithms, including decision tree, linear discriminant analysis, K-nearest neighbors, support vector machine, and artificial neural network, were applied to classify the voice disorders. RESULTS: The results showed that demographic features were more effective for detecting neoplastic and phonotraumatic lesions, whereas symptoms were useful for detecting vocal palsy. When demographic and symptomatic variables were combined, the artificial neural network achieved the highest accuracy of 83 ± 1.58%, whereas the accuracy of the other algorithms ranged from 74% to 82.6%. Decision tree analyses revealed that sex, age, smoking status, sudden onset of dysphonia, and 10-item voice handicap index scores were significant characteristics for classification. CONCLUSION: This study demonstrated a significant difference in demographic and symptomatic features between glottic neoplasm, phonotraumatic lesions, and vocal palsy. These features may facilitate automatic classification of voice disorders through machine learning algorithms.


Subjects
Neural Networks, Computer; Supervised Machine Learning; Voice Disorders/classification; Adult; Age Factors; Aged; Alcohol Drinking/epidemiology; Algorithms; Demography; Female; Glottis/injuries; Glottis/physiopathology; Humans; Laryngeal Neoplasms/complications; Laryngeal Neoplasms/diagnosis; Laryngeal Neoplasms/physiopathology; Male; Middle Aged; Retrospective Studies; Severity of Illness Index; Sex Factors; Smoking/epidemiology; Symptom Assessment; Vocal Cord Paralysis/complications; Vocal Cord Paralysis/diagnosis; Vocal Cord Paralysis/physiopathology; Voice Disorders/epidemiology; Voice Quality; Wounds and Injuries/diagnosis
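
A sketch of the classification setup using the characteristics the decision-tree analysis identified (sex, age, smoking, sudden onset, VHI-10). The random data are placeholders; only the class counts mirror the recruited cohort.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Columns mirror the significant characteristics named in the abstract.
n = 100 + 508 + 153                    # neoplasm / phonotrauma / vocal palsy
X = np.column_stack([
    np.random.randint(0, 2, n),        # sex
    np.random.randint(20, 80, n),      # age
    np.random.randint(0, 2, n),        # smoking status
    np.random.randint(0, 2, n),        # sudden onset of dysphonia
    np.random.randint(0, 41, n),       # VHI-10 score
])
y = np.repeat([0, 1, 2], [100, 508, 153])   # diagnosis labels

for clf in (DecisionTreeClassifier(max_depth=5),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```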
14.
Sensors (Basel); 18(2), 2018 Feb 06.
Article in English | MEDLINE | ID: mdl-29415508

ABSTRACT

The development of indoor positioning solutions using smartphones is a growing activity with enormous potential for everyday life and professional applications. Research on this topic concentrates on developing new positioning solutions that are tested in specific environments under their own evaluation metrics. To explore the real positioning quality of smartphone-based solutions and their ability to adapt seamlessly to different scenarios, fair evaluation frameworks are needed. Designing competitions around extensive pre-recorded datasets is a valid way to generate open data for comparing the solutions created by research teams. In this paper, we discuss the details of the 2017 IPIN indoor localization competition, the datasets created, the teams participating in the event, and the results they obtained. We compare these results with other competition-based approaches (Microsoft and Perf-loc) and with on-line evaluation web sites. The lessons learned from organising these competitions and the benefits for the community are discussed throughout the paper. Our analysis paves the way for future work on standardizing evaluations and for creating a widely adopted benchmark strategy for researchers and companies in the field.

15.
Sensors (Basel); 16(8), 2016 Aug 19.
Article in English | MEDLINE | ID: mdl-27548182

ABSTRACT

This paper investigates transportation and vehicular mode classification using big data from smartphone sensors. The three sensors used in this paper are the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms, namely decision trees, K-nearest neighbor, and support vector machine, to classify the user's transportation and vehicular modes. In the experiments, we discuss and compare performance from different perspectives, including the accuracy for both modes, the execution time, and the model size. Results show that the proposed features enhance accuracy; the support vector machine provides the best classification accuracy but consumes the most prediction time. This paper also investigates the vehicle classification mode and compares the results with those of the transportation modes.
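
A sketch of the overall recipe: orientation-free magnitude statistics per sensor window, then the three classifiers compared in the paper. The specific features and window size here are simplified assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def window_features(acc, gyro, mag):
    """Simple statistics per sensor window (magnitude mean/std/energy)."""
    feats = []
    for s in (acc, gyro, mag):                  # (samples, 3) each
        m = np.linalg.norm(s, axis=1)           # orientation-free magnitude
        feats += [m.mean(), m.std(), (m ** 2).mean()]
    return np.array(feats)

rng = np.random.default_rng(1)                  # placeholder sensor windows
X = np.stack([window_features(*rng.normal(size=(3, 128, 3)))
              for _ in range(600)])
y = rng.choice(["walk", "bus", "car", "metro"], 600)

for clf in (DecisionTreeClassifier(), KNeighborsClassifier(), SVC()):
    clf.fit(X[:500], y[:500])
    print(type(clf).__name__, clf.score(X[500:], y[500:]))
```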

16.
IEEE Trans Neural Netw; 19(11): 1973-8, 2008 Nov.
Article in English | MEDLINE | ID: mdl-19000967

ABSTRACT

This brief presents a novel localization algorithm, named discriminant-adaptive neural network (DANN), which takes the received signal strength (RSS) from access points (APs) as inputs to infer the client position in a wireless local area network (WLAN) environment. We extract the useful information into discriminative components (DCs) for network learning. The nonlinear relationship between RSS and position is then accurately constructed by incrementally inserting the DCs and recursively updating the weightings in the network until no further improvement is required. Our localization system was developed in a real-world WLAN environment, where realistic RSS measurements were collected. We implemented traditional approaches on the same test bed, including weighted k-nearest neighbor (WKNN), maximum likelihood (ML), and multilayer perceptron (MLP), and compared the results. The experimental results indicate that the proposed algorithm achieves much higher accuracy than the other examined techniques. The improvement can be attributed to the fact that only the useful information is extracted for positioning, while redundant information is regarded as noise and discarded. Finally, the analysis shows that the network accomplishes learning intelligently, with the inserted DCs providing sufficient information.


Subjects
Algorithms; Computer Communication Networks; Environment; Models, Theoretical; Neural Networks, Computer; Orientation; Pattern Recognition, Automated/methods; Computer Simulation; Discriminant Analysis
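
A hedged sketch of the two-stage idea: extract discriminative components from RSS fingerprints, then regress position on them with a neural network. LDA stands in for the paper's incremental DC-insertion procedure, and all data here are synthetic placeholders.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPRegressor

# RSS fingerprints: one row per scan, one column per access point.
rng = np.random.default_rng(2)
rss = rng.normal(-70, 10, size=(1000, 12))            # 12 APs (assumed)
cell = rng.choice(25, 1000)                           # reference-point label
pos = np.column_stack([cell % 5, cell // 5]).astype(float)  # (x, y) on a grid

# Extract discriminative components from RSS, then regress position on them.
dcs = LinearDiscriminantAnalysis(n_components=8).fit_transform(rss, cell)
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
net.fit(dcs[:800], pos[:800])
err = np.linalg.norm(net.predict(dcs[800:]) - pos[800:], axis=1)
print("mean positioning error:", err.mean())
```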