1.
Sci Data ; 11(1): 746, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982093

ABSTRACT

Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Castilian Spanish speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as Tonsillectomy, Functional Endoscopic Sinus Surgery, and Septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per patient. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition and identification methods, and the testing of anti-spoofing methodologies for improved robustness.


Subjects
Speech; Voice; Humans; Postoperative Period; Tonsillectomy; Male; Female; Preoperative Period; Adult
2.
Bioengineering (Basel) ; 10(11)2023 Nov 15.
Article in English | MEDLINE | ID: mdl-38002440

ABSTRACT

End-to-end deep learning models have shown promising results for the automatic screening of Parkinson's disease by voice and speech. However, the performance of these models often degrades when they are applied to scenarios involving multiple corpora, and their embeddings tend to cluster by corpus. These facts indicate a lack of generalisation or the presence of certain shortcuts in the decision, and suggest the need for new corpus-independent models. In this respect, this work explores the use of domain adversarial training as a viable strategy to develop models that retain their discriminative capacity to detect Parkinson's disease across diverse datasets. The paper presents three deep learning architectures and their domain adversarial counterparts. The models were evaluated with sustained vowels and diadochokinetic recordings extracted from four corpora with different demographics, dialects or languages, and recording conditions. The results showed that the spatial distribution of the embedding features extracted by the domain adversarial networks exhibits a higher intra-class cohesion. This behaviour is supported by a decrease in the variability and inter-domain divergence computed within each class. The findings suggest that domain adversarial networks are able to learn the common characteristics present in Parkinsonian voice and speech, which are presumed to be corpus-independent and, consequently, language-independent. Overall, this effort provides evidence that domain adaptation techniques refine the existing end-to-end deep learning approaches for Parkinson's disease detection from voice and speech, achieving more generalizable models.

3.
Diagnostics (Basel) ; 13(8)2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37189482

ABSTRACT

Because COVID-19 primarily affects the respiratory system, it leaves traces that are visible in plain chest X-ray images. For this reason, this imaging technique is typically used in the clinic for an initial evaluation of the degree to which the patient is affected. However, individually studying every patient's radiograph is time-consuming and requires highly skilled personnel. Automatic decision support systems capable of identifying the lesions due to COVID-19 are therefore of practical interest, not only for alleviating the workload in the clinical environment but also for potentially detecting non-evident lung lesions. This article proposes an alternative approach to identify lung lesions associated with COVID-19 from plain chest X-ray images using deep learning techniques. The novelty of the method lies in an alternative pre-processing of the images that focuses attention on a certain region of interest by cropping the original image to the area of the lungs. The process simplifies training by removing irrelevant information, improving model precision, and making the decision more understandable. Using the FISABIO-RSNA COVID-19 Detection open data set, results report that the opacities due to COVID-19 can be detected with a Mean Average Precision with an IoU > 0.5 (mAP@50) of 0.59 following a semi-supervised training procedure and an ensemble of two architectures: RetinaNet and Cascade R-CNN. The results also suggest that cropping to the rectangular area occupied by the lungs improves the detection of existing lesions. A main methodological conclusion is also presented, suggesting the need to resize the available bounding boxes used to delineate the opacities. This process removes inaccuracies introduced during the labelling procedure, leading to more accurate results. This procedure can be easily performed automatically after the cropping stage.
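As a rough illustration of the two geometric operations this abstract relies on, the Python sketch below computes the intersection over union (IoU) of two axis-aligned boxes and re-expresses a bounding box in the coordinate frame of a lung crop. The function names and the (x_min, y_min, x_max, y_max) box convention are assumptions made for this example and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlap rectangle (it may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_in_crop(box, crop):
    """Clip a box to the crop rectangle and express it in crop coordinates."""
    cx_min, cy_min, cx_max, cy_max = crop
    x_min = min(max(box[0], cx_min), cx_max) - cx_min
    y_min = min(max(box[1], cy_min), cy_max) - cy_min
    x_max = min(max(box[2], cx_min), cx_max) - cx_min
    y_max = min(max(box[3], cy_min), cy_max) - cy_min
    return (x_min, y_min, x_max, y_max)


def is_hit(predicted, truth, threshold=0.5):
    """Under the IoU > 0.5 criterion, a detection matches when IoU exceeds the threshold."""
    return iou(predicted, truth) > threshold
```

For instance, `box_in_crop((10, 20, 30, 40), (5, 10, 100, 100))` shifts the box by the crop origin, which is the kind of automatic adjustment that can follow a cropping stage.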

4.
IEEE Access ; 8: 226811-226827, 2020.
Article in English | MEDLINE | ID: mdl-34786299

ABSTRACT

Current standard protocols used in the clinic for diagnosing COVID-19 include molecular or antigen tests, generally complemented by a plain chest X-ray. The combined analysis aims to reduce the significant number of false negatives of these tests and provide complementary evidence about the presence and severity of the disease. However, the procedure is not free of errors, and the interpretation of the chest X-ray is restricted to radiologists due to its complexity. With the long-term goal of providing new evidence for the diagnosis, this paper presents an evaluation of different methods based on a deep neural network. These are the first steps towards an automatic COVID-19 diagnosis tool that uses chest X-ray images to differentiate between control, pneumonia, and COVID-19 groups. The paper describes the process followed to train a Convolutional Neural Network with a dataset of more than 79,500 X-ray images compiled from different sources, including more than 8,500 COVID-19 examples. Three different experiments following three preprocessing schemes are carried out to evaluate and compare the developed models. The aim is to evaluate how preprocessing the data affects the results and improves their explainability. Likewise, a critical analysis of different variability issues that might compromise the system, and of their effects, is performed. With the employed methodology, a 91.5% classification accuracy is obtained, with an 87.4% average recall for the worst but most explainable experiment, which requires a previous automatic segmentation of the lung region.
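The two figures of merit quoted above, classification accuracy and average recall, can both be read off a confusion matrix. The Python sketch below shows one common way to compute them for a three-class problem; the matrix values are invented for illustration, and the assumption that "average recall" means the unweighted mean of per-class recalls is ours, not the paper's.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal (rows = true class, columns = predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def average_recall(confusion):
    """Unweighted mean of per-class recalls (diagonal entry over row sum)."""
    recalls = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return sum(recalls) / len(recalls)


# Hypothetical counts for the classes (control, pneumonia, COVID-19).
example = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
]
print(accuracy(example), average_recall(example))
```

Reporting both is useful when classes are imbalanced, since accuracy alone can hide poor recall on a small class.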

5.
Sci Rep ; 9(1): 19066, 2019 Dec 13.
Article in English | MEDLINE | ID: mdl-31836744

ABSTRACT

Literature documents the impact of Parkinson's Disease (PD) on speech but no study has analyzed in detail the importance of the distinct phonemic groups for the automatic identification of the disease. This study presents new approaches that are evaluated in three different corpora containing speakers suffering from PD with two main objectives: to investigate the influence of the different phonemic groups in the detection of PD and to propose more accurate detection schemes employing speech. The proposed methodology uses GMM-UBM classifiers combined with a technique introduced in this paper called phonemic grouping, that permits observation of the differences in accuracy depending on the manner of articulation. Cross-validation results reach accuracies between 85% and 94% with AUC ranging from 0.91 to 0.98, while cross-corpora trials yield accuracies between 75% and 82% with AUC between 0.84 and 0.95, depending on the corpus. This is the first work analyzing the generalization properties of the proposed approaches employing cross-corpora trials and reaching high accuracies. Among the different phonemic groups, results suggest that plosives, vowels and fricatives are the most relevant acoustic segments for the detection of PD with the proposed schemes. In addition, the use of text-dependent utterances leads to more consistent and accurate models.
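Accuracy and AUC, the two metrics reported above, can be computed from detector scores as in the following Python sketch. It uses the pairwise (Mann-Whitney) formulation of the AUC; the scores and the 0.5 threshold are invented for illustration and do not come from the study.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy_at(pos_scores, neg_scores, threshold):
    """Accuracy when scores at or above the threshold are labelled positive."""
    correct = sum(s >= threshold for s in pos_scores)
    correct += sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


# Hypothetical detector scores for speakers with PD and for controls.
pd_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
print(auc(pd_scores, control_scores), accuracy_at(pd_scores, control_scores, 0.5))
```

Unlike accuracy, the AUC does not depend on a particular decision threshold, which is one reason the two are often reported together.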


Subjects
Parkinson Disease/physiopathology; Phonetics; Speech/physiology; Adult; Aged; Aged, 80 and over; Area Under Curve; Female; Humans; Male; Middle Aged; Sound Spectrography
6.
J Voice ; 30(5): 518-28, 2016 Sep.
Article in English | MEDLINE | ID: mdl-26377510

ABSTRACT

To date, although much attention has been paid to the estimation and modeling of the voice source (i.e., the glottal airflow volume velocity), the measurement and characterization of the supraglottal pressure wave have been much less studied. Previous results have shown that the supraglottal pressure wave has some spectral resonances similar to those of the voice pressure wave, which makes the supraglottal wave partially intelligible. Although the explanation for this effect seems to be clearly related to the reflected pressure wave traveling upstream along the vocal tract, the influence that nonlinear source-filter interaction has on it is not as clear. This article provides an insight into this issue by comparing the acoustic analyses of measured and simulated supraglottal and voice waves. Simulations have been performed using a high-dimensional discrete vocal fold model. Results of this comparative analysis indicate that spectral resonances in the supraglottal wave are mainly caused by the regressive pressure wave that travels upstream along the vocal tract and not by source-tract interaction. Conversely, according to simulation results, source-tract interaction plays a role in the loss of intelligibility of the supraglottal wave with respect to the voice wave. This loss of intelligibility mainly corresponds to spectral differences at frequencies above 1500 Hz.


Subjects
Computer Simulation; Larynx/physiology; Models, Biological; Phonation; Speech Acoustics; Speech Intelligibility; Voice Quality; Acoustic Stimulation; Acoustics; Biomechanical Phenomena; Female; Glottis/physiology; Humans; Judgment; Larynx/anatomy & histology; Male; Pressure; Recognition, Psychology; Sound Spectrography; Speech Perception; Speech Production Measurement; Vibration
7.
Biomed Eng Online ; 14: 100, 2015 Oct 29.
Article in English | MEDLINE | ID: mdl-26510707

ABSTRACT

BACKGROUND: Image-based analysis of vocal fold vibration plays an important role in the diagnosis of voice disorders. The analysis is based not only on direct observation of the video sequences, but also on an objective characterization of the phonation process by means of features extracted from the recorded images. However, such analysis depends on a prior accurate identification of the glottal gap, which is the most challenging step towards an automatic assessment of vocal fold vibration. METHODS: In this work, a complete framework to automatically segment and track the glottal area (or glottal gap) is proposed. The algorithm identifies a region of interest (ROI) that is adapted over time, and combines active contours and the watershed transform for the final delineation of the glottis. An automatic procedure for synthesizing different videokymograms (VKG) is also proposed. RESULTS: Thanks to the ROI implementation, the technique is robust to camera shifting. The objective test also demonstrated the effectiveness and performance of the approach in the most challenging scenarios, namely those in which the vocal folds close inadequately. CONCLUSIONS: The novelties of the proposed algorithm lie in the use of temporal information to identify an adaptive ROI and in the use of watershed merging combined with active contours for the glottis delimitation. Additionally, an automatic procedure for synthesizing multiline VKG through the identification of the glottal main axis is developed.
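To make the idea of a videokymogram concrete, the Python sketch below stacks the same scan line from every frame of a video into a time-by-position image. It is a simplified illustration only: it assumes grayscale frames stored as nested lists and a glottal main axis that is already vertical, so that scan lines are plain image rows. The paper's own procedure, which first identifies the glottal main axis, is not reproduced here, and the function names are hypothetical.

```python
def videokymogram(frames, row):
    """Stack one image row per frame: result[t] is that row at time step t."""
    return [list(frame[row]) for frame in frames]


def multiline_videokymograms(frames, rows):
    """Build one kymogram per requested scan line, keyed by row index."""
    return {row: videokymogram(frames, row) for row in rows}


# Two tiny 2x2 grayscale frames (frame -> rows -> pixel values).
frames = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
]
print(videokymogram(frames, 1))  # one line of pixels per frame, in time order
```

With real data, each scan line would typically be placed at a different height along the glottis so that anterior, middle and posterior vibration can be compared.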


Subjects
Endoscopy; Image Processing, Computer-Assisted/methods; Vocal Cords; Automation; Humans; Phonation; Time Factors; Vocal Cords/physiology
8.
IEEE Trans Biomed Eng ; 58(2): 370-9, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21257362

ABSTRACT

This paper proposes a new approach to increase the amount of information extracted from speech, with the aim of improving the accuracy of a system developed for the automatic detection of pathological voices. The paper addresses the discrimination capabilities of 11 features extracted using nonlinear analysis of time series. Two of these features are based on conventional nonlinear statistics (largest Lyapunov exponent and correlation dimension), two are based on recurrence and fractal-scaling analysis, and the remaining seven are based on different estimations of the entropy. Moreover, this paper uses a strategy based on combining classifiers to fuse the nonlinear analysis with the information provided by classic parameterization approaches found in the literature (noise parameters and mel-frequency cepstral coefficients). The classification was carried out in two steps using, first, a generative and, later, a discriminative approach. Combining both classifiers, the best accuracy obtained is 98.23% ± 0.001.


Subjects
Artificial Intelligence; Signal Processing, Computer-Assisted; Voice Disorders/diagnosis; Algorithms; Humans; Markov Chains; Nonlinear Dynamics; Normal Distribution; Sound Spectrography/methods; Voice Disorders/physiopathology
9.
Logoped Phoniatr Vocol ; 36(2): 52-9, 2011 Jul.
Article in English | MEDLINE | ID: mdl-20849245

ABSTRACT

Within this paper, the authors report on an experiment on automatic labelling of perceived voice roughness (R) and breathiness (B), according to the GRBAS scale. The main objective of the experiment has not been to correlate objective measures to perceived R and B, but to automatically evaluate R and B. For this purpose, a system has been trained that extracts the first mel-frequency cepstral coefficients (MFCC) of available sustained vowel phonations. Afterwards, a classifier has been trained to estimate the corresponding degrees of roughness and breathiness. The obtained results reveal a significant correlation between subjective and automatic labelling, hence indicating the feasibility of objective evaluation of voice quality by means of perceptually meaningful measures.
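Mel-frequency cepstral coefficients are built on a perceptual frequency warping, the mel scale. As a small illustration of that first step only, the Python sketch below converts between hertz and mels using one widely used formula, m = 2595 log10(1 + f/700). The abstract does not say which mel variant or MFCC implementation the authors used, so this choice is an assumption.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to mels with the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse mapping from mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz, high_hz, count):
    """Frequencies equally spaced on the mel scale, e.g. for filter-bank centres."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]
```

Frequencies returned by `mel_spaced_frequencies` crowd together at low frequencies and spread out at high ones, which is the property the MFCC filter bank exploits.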


Subjects
Signal Processing, Computer-Assisted; Speech Production Measurement; Voice Disorders/diagnosis; Voice Quality; Adult; Algorithms; Automation; Databases as Topic; Feasibility Studies; Female; Fourier Analysis; Humans; Male; Middle Aged; Pattern Recognition, Automated; Phonation; Predictive Value of Tests; Sound Spectrography; Speech Acoustics; Voice Disorders/physiopathology
10.
Logoped Phoniatr Vocol ; 36(2): 60-9, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21073260

ABSTRACT

This work presents a novel approach for the automatic detection of pathological voices based on fusing the information extracted by means of mel-frequency cepstral coefficients (MFCC) and features derived from the modulation spectra (MS). The proposed system uses a two-step classification scheme. First, the MFCC and MS features were used to feed two different and independent classifiers; then, the outputs of each classifier were used in a second classification stage. In order to establish the configuration that provides the highest detection accuracy, the fusion of information was carried out employing different classifier combination strategies. The experiments were carried out using two different databases: the one developed by The Massachusetts Eye and Ear Infirmary Voice Laboratory, and a database recorded by the Universidad Politécnica de Madrid. The results show that the combination of MFCC and MS features employing the proposed approach yields an improvement in the detection accuracy, demonstrating that both methods of parameterization are complementary.


Subjects
Signal Processing, Computer-Assisted; Speech Production Measurement; Voice Disorders/diagnosis; Voice Quality; Adolescent; Adult; Aged; Algorithms; Automation; Child; Databases as Topic; Female; Fourier Analysis; Humans; Male; Middle Aged; Pattern Recognition, Automated; Phonation; Predictive Value of Tests; Sound Spectrography; Speech Acoustics; Voice Disorders/physiopathology; Young Adult
11.
Article in English | MEDLINE | ID: mdl-19965158

ABSTRACT

In this work, an entropy-based nonlinear analysis of pathological voices is presented. The complexity analysis is carried out by means of six different entropies, including three measures derived from the entropy rate of Markov chains. The aim is to characterize the divergence of the trajectories and their directions in the state space of Markov chains. By employing these measures in conjunction with conventional entropy features, it is possible to improve the discrimination capabilities of the nonlinear analysis in the automatic detection of pathological voices.
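For readers unfamiliar with the quantity these measures are derived from, the Python sketch below computes the textbook entropy rate of a finite Markov chain, H = -sum_i pi_i sum_j P_ij log2 P_ij, where pi is the stationary distribution. This is the standard definition only; the three derived measures proposed in the paper are not reproduced, and the power-iteration helper is an assumption that suits ergodic chains.

```python
import math


def stationary_distribution(transition, iterations=1000):
    """Approximate the stationary distribution by power iteration from a uniform start.

    Assumes the chain is ergodic; periodic chains may not converge.
    """
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return pi


def entropy_rate(transition):
    """Entropy rate in bits per step of a Markov chain with row-stochastic matrix."""
    pi = stationary_distribution(transition)
    rate = 0.0
    for i, row in enumerate(transition):
        for p in row:
            if p > 0.0:  # the term 0 * log(0) is taken as 0
                rate -= pi[i] * p * math.log2(p)
    return rate
```

A chain whose next state is a fair coin flip has an entropy rate of 1 bit per step, whereas a chain that never leaves its current state has an entropy rate of 0.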


Subjects
Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Voice Disorders/physiopathology; Voice; Acoustics; Algorithms; Automation; Biomedical Engineering/methods; Entropy; Humans; Markov Chains; Models, Statistical; ROC Curve; Time Factors; Voice Disorders/diagnosis
12.
Comput Med Imaging Graph ; 32(3): 193-201, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18243657

ABSTRACT

The present work describes a new method for the automatic detection of the glottal space from laryngeal images obtained either with high-speed or with conventional video cameras attached to a laryngoscope. The detection is based on the combination of several relevant techniques in the field of digital image processing. The image is segmented with a watershed transform followed by region merging, while the final decision is taken using a simple linear predictor. This scheme has successfully segmented the glottal space in all the test images used. The method presented can be considered a generalist approach for the segmentation of the glottal space because, in contrast with other methods found in the literature, it requires neither initialization nor strict environmental conditions on the images to be processed. Therefore, the main advantage is that the user does not have to outline the region of interest with a mouse click. Some a priori knowledge about the glottal space is still needed, but this knowledge can be considered weak compared to the environmental conditions fixed in former works.


Subjects
Glottis/pathology; Image Processing, Computer-Assisted/methods; Laryngoscopes; Larynx/pathology; Video Recording; Algorithms; Automation; Humans
13.
IEEE Trans Biomed Eng ; 55(12): 2831-5, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19126465

ABSTRACT

This paper investigates the performance of an automatic system for voice pathology detection when the voice samples have been compressed in MP3 format at different bit rates (160, 96, 64, 48, 24, and 8 kb/s). The detectors employ cepstral and noise measurements, along with their derivatives, to characterize the voice signals. The classification is performed using Gaussian mixture models and support vector machines. The results of the different proposed detectors are compared by means of detection error tradeoff (DET) and receiver operating characteristic (ROC) curves, concluding that there are no significant differences in the performance of the detector when the bit rates of the compressed data are above 64 kb/s. This has useful applications in telemedicine, reducing the storage space of voice recordings or allowing them to be transmitted over narrow-band communication channels.
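The storage argument can be made concrete with simple arithmetic: at a constant bit rate, the size of a recording is the bit rate multiplied by its duration. The Python sketch below tabulates this for the bit rates listed in the abstract. It assumes constant-bit-rate encoding, 1 kb = 1000 bits, and a 10-second recording chosen arbitrarily for illustration; container overhead is ignored.

```python
def compressed_size_bytes(bit_rate_kbps, duration_s):
    """Approximate size of a constant-bit-rate stream, ignoring container overhead."""
    return bit_rate_kbps * 1000 * duration_s / 8


# Bit rates evaluated in the abstract, in kb/s.
BIT_RATES_KBPS = [160, 96, 64, 48, 24, 8]

for rate in BIT_RATES_KBPS:
    size_kb = compressed_size_bytes(rate, 10) / 1000
    print(f"{rate:>3} kb/s -> {size_kb:.0f} kB for a 10 s recording")
```

Under these assumptions, moving from 160 kb/s to 64 kb/s cuts the size of the same recording by 60%.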


Subjects
Artifacts; Data Compression/methods; Sound Spectrography/methods; Speech Acoustics; Voice Disorders/diagnosis; Artificial Intelligence; Fourier Analysis; Humans; Multimedia; Normal Distribution; Pattern Recognition, Automated/methods; ROC Curve; Voice; Voice Disorders/physiopathology; Voice Quality
14.
Med Eng Phys ; 28(3): 276-89, 2006 Apr.
Article in English | MEDLINE | ID: mdl-15950513

ABSTRACT

A PC-based integrated aid tool has been developed for the analysis and screening of pathological voices. With it, the user can simultaneously record speech, electroglottographic (EGG), and videoendoscopic signals, and synchronously edit them to select the most significant segments. These multimedia data are stored in a relational database, together with the patient's personal information, anamnesis, diagnosis, visits, explorations and any other comment the specialist may wish to include. The speech and EGG waveforms are analysed by means of temporal representations, spectrograms, and quantitative parameters such as frequency and amplitude perturbation measurements, harmonic energy, and noise, which are calculated using digital signal processing techniques and give an idea of the degree of hoarseness and the quality of the voice register. Within this framework, the system uses a standard protocol to evaluate and build complete databases of voice disorders. The target users of this system are speech and language therapists and ear, nose and throat (ENT) clinicians. The application can be easily configured to cover the needs of both groups of professionals. The software has a user-friendly Windows-style interface. The PC should be equipped with standard sound and video capture cards. Signals are captured using common transducers: a microphone, an electroglottograph and a fiberscope or telelaryngoscope. The clinical usefulness of the system is addressed in a comprehensive evaluation section.
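Frequency and amplitude perturbation measurements are commonly known as jitter and shimmer. The Python sketch below shows one widely used "local" definition, the mean absolute difference between consecutive cycles divided by the mean value, applied to pitch periods (jitter) or peak amplitudes (shimmer). The abstract does not specify which definitions the tool implements, so this formula and the sample values are assumptions for illustration.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("at least two cycles are needed")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical pitch periods in seconds and peak amplitudes in arbitrary units.
periods = [0.010, 0.011, 0.010]
amplitudes = [1.0, 1.0, 1.0]
print(relative_perturbation(periods))     # local jitter in percent
print(relative_perturbation(amplitudes))  # local shimmer in percent
```

Higher values indicate less regular vocal fold vibration, which is why such measures are used as indicators of hoarseness.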


Subjects
Diagnosis, Computer-Assisted/methods; Laryngoscopy/methods; Medical Records Systems, Computerized; Software; Sound Spectrography/methods; User-Computer Interface; Voice Disorders/diagnosis; Computer Graphics; Database Management Systems; Electroencephalography/methods; Information Storage and Retrieval/methods; Software Design; Systems Integration; Telemedicine/methods
15.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 2478-81, 2006.
Article in English | MEDLINE | ID: mdl-17946516

ABSTRACT

Nowadays, the most widespread techniques for measuring voice quality are based on perceptual evaluation by well-trained professionals. The GRBAS scale is a method for the perceptual evaluation of voice quality that is widely used in Japan and is attracting increasing interest in both Europe and the United States. However, this technique requires well-trained experts and relies on the evaluator's expertise, depending heavily on the evaluator's own psychophysical state. Furthermore, great variability is observed between the assessments of different evaluators. Therefore, an objective method to provide such a measurement of voice quality would be very valuable. In this paper, the automatic assessment of voice quality is addressed by means of short-term mel-frequency cepstral coefficients (MFCC) and learning vector quantization (LVQ) in a pattern recognition stage. Results show that this approach provides acceptable results for this purpose, with an accuracy of around 65% at best.
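As an illustration of the pattern recognition stage, the Python sketch below implements the basic LVQ1 update rule: the prototype nearest to a training sample is pulled towards the sample when their labels agree and pushed away when they differ. The two-dimensional toy data, the learning rate and the function names are assumptions made for this example; the paper's feature vectors, LVQ variant and training schedule are not described in the abstract.

```python
def nearest_prototype(prototypes, x):
    """Index of the prototype with the smallest squared Euclidean distance to x."""
    distances = [sum((w - xi) ** 2 for w, xi in zip(proto, x)) for proto in prototypes]
    return distances.index(min(distances))


def lvq1_step(prototypes, labels, x, y, learning_rate=0.1):
    """Apply one LVQ1 update in place and return the index of the updated prototype."""
    k = nearest_prototype(prototypes, x)
    # Attract the winner if its label matches the sample, repel it otherwise.
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] = [w + sign * learning_rate * (xi - w) for w, xi in zip(prototypes[k], x)]
    return k


def classify(prototypes, labels, x):
    """Assign the label of the nearest prototype."""
    return labels[nearest_prototype(prototypes, x)]


# Toy example: one prototype per class in a 2-D feature space.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
labels = [0, 1]
lvq1_step(prototypes, labels, [0.2, 0.0], 0, learning_rate=0.5)
print(prototypes[0], classify(prototypes, labels, [0.9, 0.8]))
```

In practice the update is repeated over many labelled feature vectors with a decaying learning rate, and several prototypes per class are typically used.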


Subjects
Diagnosis, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Severity of Illness Index; Sound Spectrography/methods; Speech Production Measurement/methods; Voice Disorders/diagnosis; Voice Quality; Algorithms; Artificial Intelligence; Humans; Reproducibility of Results; Sensitivity and Specificity; Voice Disorders/classification