Results 1 - 19 of 19
1.
J Acoust Soc Am ; 151(3): 2181, 2022 03.
Article in English | MEDLINE | ID: mdl-35364933

ABSTRACT

This paper proposes a method for displaying the phase information in speech signals through a group delay spectrogram, without the need for phase unwrapping. The method involves scaling down the phase values without affecting the shape of the phase or group delay function, thus preserving the information in the phase spectrum. This is accomplished using single-frequency filtering (SFF) of speech signals to obtain the instantaneous complex SFF spectrum. The SFF involves filtering a frequency-shifted signal using a resonator at half the sampling frequency. The SFF spectrum displays characteristics similar to those of the standard short-time Fourier transform (STFT) spectrum, but without the truncation effects caused by the windowing operation. The objective of the present study is to show that features of speech production can also be observed in the phase information, displayed through the group delay spectrogram. The time-frequency resolution in the group delay spectrogram depends on the choice of the bandwidth of the resonator used in the SFF analysis. The speech production features displayed in the group delay spectrogram are examined for different types of speech signals at different time-frequency resolutions.


Subject(s)
Speech , Fourier Analysis
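The SFF analysis described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: each analysis frequency f_k is moved to half the sampling frequency by multiplying the signal with exp(j(π - 2πf_k/fs)n), and the result is passed through the single-pole filter 1/(1 + r z^-1), whose pole at z = -r lies at fs/2. The pole radius r = 0.995 is an assumed value; r closer to 1 narrows the resonator bandwidth (roughly (1 - r)fs/π Hz), trading time resolution for frequency resolution. For the group delay, the sketch substitutes a simple technique for the paper's phase-scaling method: after undoing the frequency shift, the phase step between adjacent frequency channels is read off as the angle of a conjugate product, which stays in (-π, π] and needs no unwrapping as long as the channels are closely spaced. The function names are hypothetical.

```python
import numpy as np


def sff(x, fs, freqs, r=0.995):
    """Complex SFF outputs y_k[n], one row per analysis frequency (Hz)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    # Shift that moves each f_k to fs/2 (pi rad/sample).
    w_shift = np.pi - 2 * np.pi * np.asarray(freqs, dtype=float) / fs
    xk = x[None, :] * np.exp(1j * np.outer(w_shift, n))
    # Single-pole resonator at fs/2: y[n] = x_k[n] - r * y[n-1].
    y = np.empty_like(xk)
    acc = np.zeros(len(w_shift), dtype=complex)
    for i in range(len(x)):
        acc = xk[:, i] - r * acc
        y[:, i] = acc
    return y


def sff_group_delay(y, fs, freqs):
    """Group delay (in samples) between adjacent SFF channels at every instant.

    Undoing the shift gives Z_k[n] = sum_m r**m * x[n-m] * exp(1j*w_k*m), so the
    result is the group delay of the lag-indexed segment r**m * x[n-m]. The phase
    step between neighboring channels comes from a conjugate product, so no
    unwrapping is needed. An impulse m0 samples before n yields m0 in every
    channel, provided (channel spacing in rad/sample) * m0 stays below pi.
    """
    freqs = np.asarray(freqs, dtype=float)
    n = np.arange(y.shape[1])
    w_shift = np.pi - 2 * np.pi * freqs / fs
    z = y * np.exp(-1j * np.outer(w_shift, n))
    dphi = np.angle(z[1:] * np.conj(z[:-1]))      # wrapped to (-pi, pi]
    dw = 2 * np.pi * np.diff(freqs) / fs           # channel spacing, rad/sample
    return dphi / dw[:, None]


fs = 8000
freqs = np.arange(100, 3900, 20)                   # 20 Hz channel spacing (assumed)
x = np.zeros(400)
x[100] = 1.0                                       # single impulse at n = 100
tau = sff_group_delay(sff(x, fs, freqs), fs, freqs)
print(tau[:5, 150])                                # 50 samples in each channel
```

Reading the group delay as a lag back from the current instant keeps the values bounded by the effective length of the exponential window, which is what keeps the channel-to-channel phase steps small enough to avoid wrapping.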
2.
J Acoust Soc Am ; 152(3): 1721, 2022 09.
Article in English | MEDLINE | ID: mdl-36182326

ABSTRACT

This paper examines the phase derivatives of speech signals. The instantaneous complex spectra obtained in the single frequency filtering (SFF) analysis of signals are used to derive the phase function. The problem of phase wrapping is avoided by a proposed modification to SFF analysis that yields a scaled-down version of the phase function. We consider the derivatives of the exponent (i.e., logarithm) of the complex SFF spectra with respect to frequency, time, and both frequency and time. The imaginary part of the exponent is the phase function, and the real part is the log magnitude function. The negative derivative of the phase with respect to frequency is the group delay (GD) function, and the derivative of the phase with respect to time is the instantaneous frequency (IF) function. The features of speech production displayed through the GD function are compared with those displayed through the derivative of the corresponding log magnitude function with respect to frequency. Likewise, the features of speech production displayed through the IF function are compared with those displayed through the derivative of the corresponding log magnitude function with respect to time. The speech production characteristics reflected in these representations of phase derivatives are examined for different types of utterances.


Subject(s)
Speech Acoustics , Speech , Speech Production Measurement
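The time derivative of the logarithm of the complex SFF spectrum can be sketched as follows, assuming the basic SFF construction (shift each frequency f_k to fs/2, then filter with a single-pole resonator whose pole is at z = -r, with r = 0.995 an assumed value). This is an illustration, not the authors' implementation, and it does not reproduce the paper's phase-scaling modification. Instead, it removes the frequency shift again and takes first differences of log Z = log|Z| + jφ through the principal value of log(Z[n]/Z[n-1]): the real part approximates the time derivative of the log magnitude, and the imaginary part is the per-sample phase increment, which, scaled by fs/2π, gives the instantaneous frequency in Hz. Because each increment is small, no unwrapping is needed. The function names and the 1 kHz test tone are hypothetical, and the ratio assumes the channel outputs are nonzero.

```python
import numpy as np


def sff_baseband(x, fs, freqs, r=0.995):
    """SFF outputs with the shift to fs/2 removed again.

    Row k holds Z_k[n] = sum_m r**m * x[n-m] * exp(1j*w_k*m), w_k = 2*pi*f_k/fs.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    w_shift = np.pi - 2 * np.pi * np.asarray(freqs, dtype=float) / fs
    xk = x[None, :] * np.exp(1j * np.outer(w_shift, n))
    y = np.empty_like(xk)
    acc = np.zeros(len(w_shift), dtype=complex)
    for i in range(len(x)):            # resonator with its pole at z = -r
        acc = xk[:, i] - r * acc
        y[:, i] = acc
    return y * np.exp(-1j * np.outer(w_shift, n))


def log_spectrum_time_derivative(z, fs):
    """First difference in time of log Z = log|Z| + 1j*phase, per channel.

    Returns (change of log|Z| per sample, instantaneous frequency in Hz).
    """
    d = np.log(z[:, 1:] / z[:, :-1])   # principal value: imaginary part in (-pi, pi]
    return d.real, d.imag * fs / (2 * np.pi)


fs = 8000
t = np.arange(2000)
tone = np.cos(2 * np.pi * 1000 * t / fs)
dlogmag, inst_freq = log_spectrum_time_derivative(sff_baseband(tone, fs, [1000.0]), fs)
print(inst_freq[0, 1500:].mean())      # close to 1000 Hz
```

Taking the logarithm of the ratio of consecutive values delivers both derivatives from one complex operation, which mirrors the abstract's framing of the log magnitude and the phase as the real and imaginary parts of a single exponent.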
3.
J Acoust Soc Am ; 146(6): 4446, 2019 12.
Article in English | MEDLINE | ID: mdl-31893761

ABSTRACT

Aperiodicity in the voice source is caused by changes in the vocal fold vibrations, other than the normal quasi-periodicity and the turbulence at the glottis. Aperiodicity appears to be one of the main properties responsible for conveying emotion in artistic voices. In this paper, the feasibility of representing the excitation source characteristics of the artistic (Noh) singing voice by an impulse-like sequence in the time domain is examined. The impulses at the glottal closure instants contribute to the major excitation of the vocal tract system. The sequence of such impulses produces harmonics of the fundamental frequency in the spectrum. The amplitude variation, or amplitude modulation (AM), of these impulses in the sequence contributes to the aperiodicity in the excitation and can result in the appearance of subharmonics in the spectrum. The variation in the impulse intervals, or frequency modulation (FM), can also contribute to the aperiodicity in the excitation. The aperiodic component of the excitation in the Noh voice is examined in the impulse-like sequence derived from the signal using single frequency filtering analysis. The effects of aperiodicity are explained for synthetic AM and FM sequences of impulses using spectrograms and saliency plots.
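The statement that amplitude or interval modulation of an impulse sequence produces subharmonics can be checked numerically. The sketch below (NumPy; the helper names, the 8 kHz sampling rate, and the 80-sample period, i.e. F0 = 100 Hz, are assumptions for illustration) builds a strictly periodic sequence, an AM sequence with alternating amplitudes, and an FM sequence with alternating intervals, then reads the DFT magnitude at F0/2 = 50 Hz. It illustrates only the spectral effect; it is not the single frequency filtering analysis used in the paper.

```python
import numpy as np


def impulse_sequence(n_samples, intervals, amplitudes):
    """Impulses whose spacing cycles through `intervals` (in samples) and whose
    heights cycle through `amplitudes`."""
    x = np.zeros(n_samples)
    pos, i = 0, 0
    while pos < n_samples:
        x[pos] = amplitudes[i % len(amplitudes)]
        pos += intervals[i % len(intervals)]
        i += 1
    return x


def magnitude_at(x, fs, f_hz):
    """DFT magnitude at the bin nearest to f_hz."""
    spectrum = np.abs(np.fft.rfft(x))
    bins = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(bins - f_hz))]


fs, n = 8000, 8000                              # 1 s, so bins fall on whole Hz
periodic = impulse_sequence(n, [80], [1.0])     # F0 = 100 Hz
am = impulse_sequence(n, [80], [1.0, 0.5])      # alternating amplitudes (AM)
fm = impulse_sequence(n, [70, 90], [1.0])       # alternating intervals (FM)
for name, x in [("periodic", periodic), ("AM", am), ("FM", fm)]:
    print(name, round(magnitude_at(x, fs, 50.0), 3), round(magnitude_at(x, fs, 100.0), 3))
```

With these settings the periodic sequence has (numerically) zero magnitude at 50 Hz, while the AM sequence shows a subharmonic of magnitude 25 (against 75 at 100 Hz) and the FM sequence one of about 19.5.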

4.
J Acoust Soc Am ; 145(1): 551, 2019 01.
Article in English | MEDLINE | ID: mdl-30710923

ABSTRACT

Speech produced by a speaker in emotionally charged situations, such as anger, happiness, and shouting, corresponds to high-arousal speech. Changes in production characteristics, such as an increase in the subglottal air pressure, an increase in the glottal closed phase in each cycle, and an increase in the rate of glottal vibration, are observed in high-arousal speech. Acoustic parameters such as the glottal closed quotient and the fundamental frequency (F0) are used to characterize high-arousal speech. In this paper, high arousal is characterized by features extracted using the zero-time windowing (ZTW) method. The spectrum derived from the ZTW method emphasizes the instantaneous spectral characteristics of the speech signal. In the glottal open region, changes are clearly observed in the lower frequency range of the spectrum. Distinctive spectral features are observed during the glottal open region in high-arousal speech when compared to neutral speech. These features are used to develop a method for identifying high-arousal speech. Simple, perhaps somewhat ad hoc, rules based on these features appear to perform well in identifying high-arousal speech, even without using neutral speech as a reference.


Subject(s)
Arousal , Emotions , Speech Acoustics , Voice/physiology , Adult , Female , Glottis/physiology , Humans , Male
5.
J Acoust Soc Am ; 140(1): 666, 2016 07.
Article in English | MEDLINE | ID: mdl-27475188

ABSTRACT

This paper presents an approach to determine the open phase region of a glottal cycle based on changes in the characteristics of the vocal tract system. The glottal closing phase contributes to the major excitation of the vocal tract system. The opening phase affects the vocal tract system characteristics by effectively increasing the length of the tract, due to coupling with the subglottal region. To determine the glottal open region, it is necessary to estimate the vocal tract characteristics from the segment with subglottal coupling. The proposed method derives the dominant resonance frequency (DRF) of the vocal tract system at every sampling instant, using a heavily decaying window (HDW) for analysis. The DRF contour shifts to lower frequencies during the glottal open region, compared to the glottal closed region. The open region within the glottal cycles of voiced speech segments is extracted using the HDW method. The results are compared with the open regions derived from electroglottograph (EGG) signals and speech signals. The results show that the proposed method based on the DRF contour, derived from the speech signals, seems to perform better than the methods based on EGG signals.


Subject(s)
Phonation/physiology , Vocal Cords/physiology , Glottis/physiology , Humans , Speech , Vibration , Voice
6.
J Acoust Soc Am ; 137(6): 3411-21, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26093430

ABSTRACT

The feasibility of representing the excitation source characteristics in expressive voice signals by an aperiodic sequence of impulses in the time domain is examined in this paper. In particular, the aperiodic components of excitation of expressive voices, like the Noh voice, are examined in some detail. The aperiodic component is extracted from the speech signal using a modified zero-frequency filtering method, and it is represented using a sequence of impulses with amplitudes corresponding to the relative strength of excitation around each impulse. The spectral characteristics of the aperiodic sequence show subharmonics and harmonics of the fundamental frequency corresponding to pitch. The effects of aperiodicity are examined using spectrograms and saliency plots of synthetic amplitude and duration (i.e., frequency) modulation of sequences of impulses.
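The zero-frequency filtering (ZFF) on which this work builds can be sketched in its commonly described standard form; the paper's modification is not reproduced. The steps are: difference the signal, pass it through two cascaded zero-frequency resonators (four cumulative sums in total, i.e. 1/(1 - z^-1)^4), remove the resulting polynomial trend by repeatedly subtracting a local mean over a window comparable to the average pitch period, and take negative-to-positive zero crossings as epochs, with the slope at each crossing as a relative strength of excitation. The window half-length `half_win`, the three trend-removal passes, and the function names are assumptions. The synthetic test signal (one 500 Hz resonance, 100 Hz pitch) uses negative impulses, mimicking the negative peak of the glottal flow derivative at glottal closure, so that epochs appear as negative-to-positive crossings.

```python
import numpy as np


def zero_frequency_filter(s, half_win, passes=3):
    """Trend-removed zero-frequency filtered signal."""
    x = np.diff(np.asarray(s, dtype=float), prepend=0.0)  # remove any DC offset
    y = x
    for _ in range(4):          # two cascaded zero-frequency resonators = (1 - z^-1)^-4
        y = np.cumsum(y)
    win = np.ones(2 * half_win + 1) / (2 * half_win + 1)
    for _ in range(passes):     # subtract the local mean to remove the growing trend
        y = y - np.convolve(y, win, mode="same")
    return y


def zff_epochs(s, half_win):
    """Epoch indices (negative-to-positive zero crossings) and their slopes.

    Samples within a few windows of either end are unreliable because the
    local-mean subtraction sees zero padding there.
    """
    y = zero_frequency_filter(s, half_win)
    idx = np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
    return idx, y[idx] - y[idx - 1]


fs, period = 8000, 80                       # 100 Hz pitch (assumed)
exc = np.zeros(4000)
exc[40::period] = -1.0                      # negative impulses at glottal closures
a1, a2 = 2 * 0.95 * np.cos(2 * np.pi * 500 / fs), -0.95 ** 2   # one 500 Hz resonance
s = np.zeros_like(exc)
for n in range(len(exc)):
    s[n] = exc[n] + a1 * (s[n - 1] if n >= 1 else 0.0) + a2 * (s[n - 2] if n >= 2 else 0.0)
epochs, strength = zff_epochs(s, half_win=60)
print(epochs[(epochs > 800) & (epochs < 1200)])   # close to 840, 920, 1000, 1080, 1160
```

An impulse sequence of the kind the abstract describes could then be formed by placing impulses at `epochs` with heights proportional to `strength`.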

7.
J Acoust Soc Am ; 136(4): 1932-41, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25324092

ABSTRACT

Characteristics of glottal vibration are affected by the obstruction to the flow of air through the vocal tract system. The obstruction to the airflow is determined by the nature, location, and extent of constriction in the vocal tract during the production of voiced sounds. The effects of constriction on glottal vibration are examined for six different categories of speech sounds with varying degrees of constriction. The effects are examined in terms of source and system features derived from the speech and electroglottograph signals. It is observed that a high degree of constriction obstructing the flow of air results in large changes in these features relative to the adjacent steady vowel regions, as in the case of apical trill and alveolar fricative sounds. These changes are insignificant when the obstruction to the airflow is smaller, as in the case of velar fricative and lateral approximant sounds. There are no changes in the excitation features when air flows freely along the auxiliary tract despite constriction in the vocal tract, as in the case of nasals. These studies show that the effects of constriction can indeed be observed in the features of glottal vibration as well as in vocal tract resonances.

8.
J Acoust Soc Am ; 133(5): 3050-61, 2013 May.
Article in English | MEDLINE | ID: mdl-23654408

ABSTRACT

In this paper, the characteristics of speech produced at different loudness levels are analyzed in terms of changes in the glottal excitation. Four loudness levels are considered in this study, namely soft, normal, loud, and shout. The distinct changes in the excitation of the shout signal are analyzed using electroglottograph signals. The open and closed phases of the glottal vibration are distinctly different for shout signals in comparison with those for normal speech. It is generally difficult to derive the glottal pulse information from the speech signal due to limitations in inverse filtering. Hence, the effects of changes in the excitation are examined by analyzing the speech signal using methods that can capture the temporal variations of the spectral features. In particular, the recently proposed methods of zero-frequency filtering and zero-time liftering are used in this analysis. It is shown that the closed phase behavior of the excitation at different loudness levels can be seen in the temporal variation of the spectral energy in the low frequency (LF) (<400 Hz) region. The ratio of LF to high frequency (HF) energy clearly discriminates the speech produced at different loudness levels. These distinctions in the excitation features are also observed in different vowel contexts and across several speakers.


Subject(s)
Glottis/physiology , Phonation , Speech Acoustics , Voice Quality , Acoustics , Biomechanical Phenomena , Electrodiagnosis , Female , Humans , Male , Signal Processing, Computer-Assisted , Sound Spectrography , Speech Production Measurement , Time Factors , Vibration
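The arithmetic of the LF-to-HF energy ratio can be sketched frame by frame with NumPy. This is a simplification under stated assumptions: the paper derives its spectral features with zero-frequency filtering and zero-time liftering, whereas this sketch uses plain Hann-windowed FFT frames. The 400 Hz boundary follows the abstract; the frame length, hop size, function name, and test signals are assumptions. The sketch shows how the ratio is computed, not the discrimination results reported in the paper.

```python
import numpy as np


def lf_hf_energy_ratio(x, fs, cutoff_hz=400.0, frame_len=512, hop=128):
    """Per-frame ratio of spectral energy below cutoff_hz to the energy at or above it."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    low = np.fft.rfftfreq(frame_len, d=1.0 / fs) < cutoff_hz
    ratios = []
    for start in range(0, len(x) - frame_len + 1, hop):
        power = np.abs(np.fft.rfft(x[start:start + frame_len] * window)) ** 2
        ratios.append(power[low].sum() / max(power[~low].sum(), 1e-12))
    return np.array(ratios)


fs = 8000
t = np.arange(4000) / fs
low_heavy = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
high_heavy = 0.1 * np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 2000 * t)
print(lf_hf_energy_ratio(low_heavy, fs).mean())    # roughly 100
print(lf_hf_energy_ratio(high_heavy, fs).mean())   # roughly 0.01
```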
9.
J Acoust Soc Am ; 133(5): 3072-82, 2013 May.
Article in English | MEDLINE | ID: mdl-23654410

ABSTRACT

In this paper, a method to synthesize laughter by modifying the excitation source information is presented. The excitation source information is derived by extracting epoch locations and the instantaneous fundamental frequency using the zero-frequency filtering approach. The zero-frequency filtering approach is modified to capture the rapidly varying instantaneous fundamental frequency in natural laugh signals. The nature of the variation of excitation features in natural laughter is examined to determine the features to be incorporated in the synthesis of a laugh signal. Features such as the pitch period and the strength of excitation are modified in an utterance of the vowel /a/ or /i/ to generate the laughter signal. Frication is also incorporated wherever appropriate. The laugh signal is generated by varying parameters at both the call level and the bout level. Experiments are conducted to determine the significance of different features in the perception of laughter. Subjective evaluation is performed to determine the acceptability and quality of the synthesized laughter signal for different choices of parameter values and for different input types.


Subject(s)
Acoustics , Auditory Perception , Laughter , Female , Humans , Male , Phonation , Signal Processing, Computer-Assisted , Sound Spectrography , Speech Acoustics , Speech Perception , Time Factors , Voice Quality
10.
J Acoust Soc Am ; 131(4): 3141-52, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22501086

ABSTRACT

In this paper, the acoustic-phonetic characteristics of steady apical trills (trill sounds produced by the periodic vibration of the apex of the tongue) are studied. Signal processing methods, namely zero-frequency filtering and zero-time liftering of speech signals, are used to analyze the excitation source and the resonance characteristics of the vocal tract system, respectively. Although trilling would naturally be expected to affect the resonances of the vocal tract system, it is interesting to note that trilling influences the glottal source of excitation as well. The excitation characteristics derived using zero-frequency filtering of speech signals are the glottal epochs, the strength of impulses at the glottal epochs, and the instantaneous fundamental frequency of the glottal vibration. Analysis based on zero-time liftering of speech signals is used to study the dynamic resonance characteristics of the vocal tract system during the production of trill sounds. A qualitative analysis of trill sounds in different vowel contexts, and the acoustic cues that may help in spotting trills in continuous speech, are discussed.


Subject(s)
Phonation/physiology , Phonetics , Speech Acoustics , Tongue/physiology , Glottis/physiology , Humans , Male , Sound Spectrography , Vibration
11.
J Acoust Soc Am ; 126(4): 2061-71, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19813815

ABSTRACT

The impulse-like characteristic of glottal excitation in speech production is an important factor in the perception of loudness of speech signals. This characteristic is attributed to the abruptness of the closing phase in the glottal cycle. In this paper, an acoustic feature, called strength of excitation, is proposed to represent the impulse-like nature of excitation. The strength of excitation is derived from the linear prediction residual of speech signals, where the residual can be considered as an estimate of the source of excitation. Since the loudness of speech is perceived over one or more utterances of speech, it is hypothesized that the distribution of strength of excitation is indicative of the perceived loudness of speech. The distribution of strength of excitation is shown to distinguish between soft and loud utterances of speakers. The distribution can also help in discriminating between the loudness of two speakers. The loudness measure obtained using the distribution of the strength of excitation is in agreement with the subjective judgment of loudness of speech.


Subject(s)
Glottis/physiology , Loudness Perception , Speech Acoustics , Speech Perception , Speech/physiology , Adolescent , Adult , Algorithms , Electrodiagnosis , Female , Humans , Male , Psychoacoustics , Time Factors , Young Adult
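The linear prediction (LP) residual from which the strength of excitation is derived can be computed with the autocorrelation method and the Levinson-Durbin recursion, sketched below in NumPy. The model order, the use of a single analysis frame (in practice LP analysis is applied frame by frame), and the function names are assumptions; the step that maps the residual to a strength-of-excitation value, and from its distribution to a loudness measure, is not reproduced here. As a check, the sketch recovers the coefficients of a known second-order all-pole process driven by white noise, whose residual should closely match the driving noise.

```python
import numpy as np


def lpc(x, order):
    """Prediction-error filter a[0..order] (a[0] = 1) by the autocorrelation
    method and Levinson-Durbin recursion; e[n] = sum_k a[k] * x[n-k]."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                                   # updated prediction error
    return a


def lp_residual(x, a):
    """Inverse-filter x with A(z) to obtain the LP residual."""
    return np.convolve(np.asarray(x, dtype=float), a)[:len(x)]


rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(len(e)):                  # x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n]
    x[n] = e[n] + (1.3 * x[n - 1] if n >= 1 else 0.0) - (0.6 * x[n - 2] if n >= 2 else 0.0)
a = lpc(x, order=2)
res = lp_residual(x, a)
print(np.round(a, 2))                    # close to [1, -1.3, 0.6]
```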
12.
IEEE Trans Image Process ; 17(4): 594-607, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18390367

ABSTRACT

Changes in motion properties of trajectories provide useful cues for modeling and recognizing human activities. We associate an event with significant changes that are localized in time and space, and represent activities as a sequence of such events. The localized nature of events allows for detection of subtle changes or anomalies in activities. In this paper, we present a probabilistic approach for representing events using the hidden Markov model (HMM) framework. Using trained HMMs for activities, an event probability sequence is computed for every motion trajectory in the training set. It reflects the probability of an event occurring at every time instant. Though the parameters of the trained HMMs depend on viewing direction, the event probability sequences are robust to changes in viewing direction. We describe sufficient conditions for the existence of view invariance. The usefulness of the proposed event representation is illustrated using activity recognition and anomaly detection. Experiments using the indoor University of Central Florida human action dataset, the Carnegie Mellon University Credo Intelligence, Inc., Motion Capture dataset, and the outdoor Transportation Security Administration airport tarmac surveillance dataset show encouraging results.


Subject(s)
Algorithms , Image Interpretation, Computer-Assisted/methods , Models, Biological , Motor Activity/physiology , Movement/physiology , Pattern Recognition, Automated/methods , Video Recording/methods , Computer Simulation , Humans , Image Enhancement/methods , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity
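Computing a per-instant probability sequence from trained HMMs relies on forward recursions of the kind sketched below for a discrete-observation HMM (NumPy; the function name and toy parameters are assumptions). It returns the filtered state posterior P(q_t | o_1..o_t) at every time step together with the sequence log-likelihood, normalizing at each step to avoid numerical underflow. How the paper turns such quantities into its event probability sequence, and how it models continuous motion features, are not reproduced here.

```python
import numpy as np


def forward_filter(init, trans, emit, obs):
    """Scaled forward algorithm for a discrete-observation HMM.

    init[i] = P(q_1 = i), trans[i, j] = P(q_t = j | q_{t-1} = i),
    emit[i, o] = P(o | q = i). Returns (post, loglik), where
    post[t, i] = P(q_t = i | o_1..o_t) and loglik = log P(o_1..o_T).
    """
    init, trans, emit = (np.asarray(m, dtype=float) for m in (init, trans, emit))
    post = np.zeros((len(obs), len(init)))
    loglik = 0.0
    for t, o in enumerate(obs):
        if t == 0:
            alpha = init * emit[:, o]
        else:
            alpha = (post[t - 1] @ trans) * emit[:, o]
        scale = alpha.sum()            # P(o_t | o_1..o_{t-1})
        post[t] = alpha / scale
        loglik += np.log(scale)
    return post, loglik


init = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.2, 0.8]])
emit = np.array([[0.9, 0.1], [0.3, 0.7]])
post, loglik = forward_filter(init, trans, emit, [0, 1, 1, 0])
print(np.round(post, 3), round(loglik, 4))
```

Because each step divides by P(o_t | o_1..o_{t-1}), the rows of `post` stay normalized and the log-likelihood is accumulated as a sum of logs rather than a product of small numbers.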
13.
Neural Netw ; 15(3): 459-69, 2002 Apr.
Article in English | MEDLINE | ID: mdl-12125897

ABSTRACT

The objective in any pattern recognition problem is to capture the characteristics common to each class from feature vectors of the training data. While Gaussian mixture models appear to be general enough to characterize the distribution of the given data, the model is constrained by the fact that the shape of the components of the distribution is assumed to be Gaussian, and the number of mixtures is fixed a priori. In this context, we investigate the potential of nonlinear models such as autoassociative neural network (AANN) models, which perform identity mapping of the input space. We show that the training error surface realized by the neural network model in the feature space is useful for studying the characteristics of the distribution of the input data. We also propose a method of obtaining an error surface that matches the distribution of the given data. The distribution capturing ability of AANN models is illustrated in the context of speaker verification.


Subject(s)
Neural Networks, Computer , Normal Distribution , Pattern Recognition, Automated
14.
IEEE Trans Image Process ; 13(12): 1559-66, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15575151

ABSTRACT

This paper addresses the problem of detecting axes of bilateral symmetry in images. In order to achieve robustness to variation in illumination, only edge-gradient information is used. To overcome the problem of edge breaks, a potential field is developed from the edge map which spreads the information in the image plane. Pairs of points in the image plane are made to vote for their axes of symmetry with some confidence values. To make the method robust to overlapping objects, only local features in the form of Taylor coefficients are used for quantifying symmetry. We define an axis of symmetry histogram, which is used to accumulate the weighted votes for all possible axes of symmetry. To reduce the computational complexity of voting, a hashing scheme is proposed, wherein pairs of points, whose potential fields are too asymmetric, are pruned by not being counted for the vote. Experimental results indicate that the proposed method is fairly robust to edge breaks and is able to detect symmetries even when only 0.05% of the possible pairs are used for voting.


Subject(s)
Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Cluster Analysis , Computer Graphics , Computer Simulation , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted
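The voting step can be sketched as follows (NumPy; the function name, bin counts, and uniform default weights are assumptions). Each pair of points votes for its perpendicular bisector, parameterized as x cos θ + y sin θ = ρ with θ in [0, π), and the votes are accumulated in an axis-of-symmetry histogram. The edge-gradient potential field, the Taylor-coefficient confidence values, and the hashing-based pruning described in the abstract are left out; an optional per-pair `weights` array stands in for the confidence values.

```python
import numpy as np


def symmetry_axis_histogram(points, weights=None, n_theta=180, n_rho=100, rho_max=None):
    """Weighted votes of all point pairs for their perpendicular bisectors.

    Returns (hist, theta_edges, rho_edges); `weights`, if given, has one entry
    per pair in np.triu_indices order.
    """
    pts = np.asarray(points, dtype=float)
    i, j = np.triu_indices(len(pts), k=1)
    d = pts[j] - pts[i]
    theta = np.arctan2(d[:, 1], d[:, 0])        # direction of the bisector's normal
    mid = 0.5 * (pts[i] + pts[j])
    rho = mid[:, 0] * np.cos(theta) + mid[:, 1] * np.sin(theta)
    # Map theta into [0, pi); turning the normal by pi flips the sign of rho.
    k = np.floor(theta / np.pi)
    theta = theta - k * np.pi
    rho = np.where(k % 2 == 0, rho, -rho)
    if rho_max is None:
        rho_max = np.abs(rho).max() + 1e-9
    return np.histogram2d(theta, rho, bins=[n_theta, n_rho],
                          range=[[0.0, np.pi], [-rho_max, rho_max]], weights=weights)


left = np.array([[1, 0], [2, 3], [3, 1], [4, 6], [2, 8], [4, 2]], dtype=float)
right = left * [-1, 1] + [10, 0]                # mirror image about the line x = 5
hist, t_edges, r_edges = symmetry_axis_histogram(np.vstack([left, right]), rho_max=10.05)
ti, ri = np.unravel_index(np.argmax(hist), hist.shape)
print(t_edges[ti], r_edges[ri], r_edges[ri + 1])   # theta bin at 0, rho bin containing 5
```

In this toy example the six mirrored pairs all vote for the same bin (θ = 0, ρ = 5), so the histogram peak identifies the axis x = 5; every other bin receives at most two votes.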
15.
IEEE Trans Image Process ; 5(12): 1625-36, 1996.
Article in English | MEDLINE | ID: mdl-18290080

ABSTRACT

A supervised texture segmentation scheme is proposed in this article. The texture features are extracted by filtering the given image using a filter bank consisting of a number of Gabor filters with different frequencies, resolutions, and orientations. The segmentation model consists of feature formation, partition, and competition processes. In the feature formation process, the texture features from the Gabor filter bank are modeled as a Gaussian distribution. The image partition is represented as a noncausal Markov random field (MRF) by means of the partition process. The competition process constrains the overall system to have a single label for each pixel. Using these three random processes, the a posteriori probability of each pixel label is expressed as a Gibbs distribution. The corresponding Gibbs energy function is implemented as a set of constraints on each pixel by using a neural network model based on the Hopfield network. A deterministic relaxation strategy is used to evolve the network toward its minimum energy state, corresponding to a maximum a posteriori (MAP) probability. This results in an optimal segmentation of the textured image. The performance of the scheme is demonstrated on a variety of images, including images from remote sensing.
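The feature-extraction stage, filtering the image with a bank of Gabor filters at several frequencies and orientations, can be sketched with NumPy. The kernel form (an isotropic Gaussian envelope times a complex exponential along the chosen orientation), the parameter values, the striped test image, and the use of response magnitude as the feature are assumptions for illustration; the Gaussian feature model, the MRF partition and competition processes, and the Hopfield-network relaxation are not shown.

```python
import numpy as np


def gabor_kernel(freq, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times exp(j*2*pi*freq*x'),
    where x' is the coordinate along orientation theta (freq in cycles/pixel)."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * xr)


def convolve_same(image, kernel):
    """Linear 2-D convolution via zero-padded FFTs, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + h, left:left + w]


def gabor_features(image, freqs, thetas, sigma=4.0):
    """Stack of response magnitudes, one plane per (frequency, orientation)."""
    image = np.asarray(image, dtype=float)
    return np.stack([np.abs(convolve_same(image, gabor_kernel(f, t, sigma)))
                     for f in freqs for t in thetas])


_, xx = np.mgrid[0:64, 0:64]
stripes = np.cos(2 * np.pi * 0.125 * xx)            # intensity varies along x
feats = gabor_features(stripes, freqs=[0.125], thetas=[0.0, np.pi / 2])
print(feats[:, 16:48, 16:48].mean(axis=(1, 2)))     # theta = 0 responds far more strongly
```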

16.
IEEE Trans Image Process ; 6(10): 1376-87, 1997.
Article in English | MEDLINE | ID: mdl-18282893

ABSTRACT

This paper describes the use of a neural network architecture for classifying textured images in an unsupervised manner using image-specific constraints. The texture features are extracted by using two-dimensional (2-D) Gabor filters arranged as a set of wavelet bases. The classification model comprises feature quantization, partition, and competition processes. The feature quantization process uses a vector quantizer to quantize the features into codevectors, where the probability of grouping the vectors is modeled as a Gibbs distribution. A set of label constraints for each pixel in the image is provided by the partition and competition processes. An energy function corresponding to the a posteriori probability is derived from these processes, and a neural network is used to represent this energy function. The state of the network and the codevectors of the vector quantizer are iteratively adjusted using a deterministic relaxation procedure until a stable state is reached. The final equilibrium state of the vector quantizer gives a classification of the textured image. A cluster validity measure based on a modified Hubert index is used to determine the optimal number of texture classes in the image.

17.
IEEE Trans Neural Netw ; 9(3): 516-22, 1998.
Article in English | MEDLINE | ID: mdl-18252475

ABSTRACT

In this paper, the texture classification problem is posed as a constraint satisfaction problem. The focus is on the use of a probabilistic neural network (PNN) for representing the distribution of feature vectors of each texture class in order to generate a feature-label interaction constraint. The distribution of features for each class is assumed to be a Gaussian mixture model. The feature-label interactions and a set of label-label interactions are represented on a constraint satisfaction neural network. A stochastic relaxation strategy is used to obtain an optimal classification of textures in an image. The advantage of this approach is that all classes in an image are determined simultaneously, similar to human perception of textures in an image.
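A probabilistic neural network of the kind mentioned here estimates each class's feature density with a Parzen window: an equally weighted sum of Gaussian kernels centered on that class's training vectors, which is a Gaussian mixture with one component per training sample. A minimal NumPy sketch follows, with the kernel width `sigma`, the toy two-class data, and the function name as assumptions. The per-class densities are the kind of quantity a feature-label constraint could be built from, but the constraint satisfaction network and the stochastic relaxation are not shown.

```python
import numpy as np


def pnn_class_densities(train_x, train_y, query_x, sigma=1.0):
    """Parzen-window (PNN) density of each class at each query vector.

    Returns (classes, densities) with densities[q, c] = p(query_x[q] | classes[c]).
    """
    train_x = np.atleast_2d(np.asarray(train_x, dtype=float))
    train_y = np.asarray(train_y)
    query_x = np.atleast_2d(np.asarray(query_x, dtype=float))
    dim = train_x.shape[1]
    sq_dist = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    kernels = np.exp(-sq_dist / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2) ** (dim / 2)
    classes = np.unique(train_y)
    densities = np.stack([kernels[:, train_y == c].mean(axis=1) for c in classes], axis=1)
    return classes, densities


rng = np.random.default_rng(1)
train_x = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
train_y = np.array([0] * 50 + [1] * 50)
classes, dens = pnn_class_densities(train_x, train_y, [[0.2, -0.1], [5.1, 4.8]])
print(classes[np.argmax(dens, axis=1)])             # [0 1]
```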

18.
J Acoust Soc Am ; 118(1): 364-74, 2005 Jul.
Article in English | MEDLINE | ID: mdl-16119357

ABSTRACT

The head related impulse response (HRIR) characterizes the auditory cues created by scattering of sound off a person's anatomy. The experimentally measured HRIR depends on several factors such as reflections from body parts (torso, shoulder, and knees), head diffraction, and reflection/diffraction effects due to the pinna. Structural models (Algazi et al., 2002; Brown and Duda, 1998) seek to establish direct relationships between the features in the HRIR and the anatomy. While there is evidence that particular features in the HRIR can be explained by anthropometry, the creation of such models from experimental data is hampered by the fact that the extraction of the features in the HRIR is not automatic. One of the prominent features observed in the HRIR, and one that has been shown to be important for elevation perception, is the set of deep spectral notches attributed to the pinna. In this paper, we propose a method to robustly extract the frequencies of the pinna spectral notches from the measured HRIR, distinguishing them from other confounding features. The method also extracts the resonances described by Shaw (1997). The techniques are applied to the publicly available CIPIC HRIR database (Algazi et al., 2001c). The extracted notch frequencies are related to the physical dimensions and shape of the pinna.


Subject(s)
Acoustics , Cues , Ear, External/anatomy & histology , Ear, External/physiology , Head , Hearing/physiology , Sound , Algorithms , Humans , Models, Theoretical
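For orientation only, a naive notch picker is sketched below: it returns local minima of the HRIR magnitude spectrum that lie at least a set depth below the median spectral level. This is a deliberately simple baseline, not the paper's method, which is designed to separate pinna notches robustly from other confounding features. The FFT length, the 10 dB depth threshold, the 44.1 kHz sampling rate, and the function name are assumptions. The test filter has a pair of zeros on the unit circle at 8 kHz, so its magnitude response has exactly one notch.

```python
import numpy as np


def spectral_notches(h, fs, n_fft=512, min_depth_db=10.0):
    """Frequencies (Hz) of local minima of |H| at least min_depth_db below the median level."""
    mag_db = 20 * np.log10(np.abs(np.fft.rfft(h, n_fft)) + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    inner = mag_db[1:-1]
    is_min = (inner < mag_db[:-2]) & (inner < mag_db[2:])
    deep = inner <= np.median(mag_db) - min_depth_db
    return freqs[1:-1][is_min & deep]


fs = 44100
w0 = 2 * np.pi * 8000 / fs
h = np.array([1.0, -2 * np.cos(w0), 1.0])          # zeros at +/- 8 kHz on the unit circle
print(spectral_notches(h, fs))                      # one notch, within one FFT bin of 8 kHz
```

Measured HRIRs contain overlapping notches, resonances, and reflections, which is exactly where a local-minimum rule like this breaks down and a more careful method is needed.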