Results 1 - 4 of 4
1.
J Acoust Soc Am ; 153(4): 2406, 2023 04 01.
Article in English | MEDLINE | ID: mdl-37092944

ABSTRACT

A data-driven approach to constructing a prosodic grammar of Mandarin read speech is proposed. Prosodic labeling is performed, first, on a large speech corpus with syntactic-tree parsing to add four-level break indices. Two types of prosodic grammatical rules are explored. One type is composed of simplified rules to compute break-type distributions at critical junctures for 5 phrase-level and 11 basic syntactic patterns. The other type entails detailed rules to compute break-type distributions conditioned on syntactic function for four determinative-measure (DM)-related syntactic patterns. Effectiveness of the approach was confirmed by meaningful interpretations of the resulting main prosodic patterns and outliers of targeted syntactic patterns by inferred rules. The main findings are given below. Strong paused breaks are found at VE-clause object (VE, active verb with a sentential object) junctures and junctures after idioms. For DM-related patterns, the entropies of break-type distributions decrease significantly as syntactic functions are involved; break-type distributions on both edges are seriously affected by their syntactic functions; when acting as subject (S) and object (O), their prosodic phenomena support the tendency of Mandarin to be S(VO) (V, verb); strong paused breaks at postboundaries of DM-2 to DM-4 are caused by their more complex syntactic structures and greater lengths; and the insertions of modifier + DE (special tag for the word DE) into DM-N (N, noun) junctures cause more paused-break insertions at junctures after DMs.
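
The entropy finding in this abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the break labels (`B0` to `B3`), the function tags (`S`, `O`), and the toy observations are all hypothetical, chosen only to show how conditioning a break-type distribution on syntactic function can lower its Shannon entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(pairs):
    """H(break type | syntactic function) for (function, break_type) pairs."""
    by_function = {}
    for function, break_type in pairs:
        by_function.setdefault(function, []).append(break_type)
    total = len(pairs)
    # Weight each function's entropy by how often that function occurs.
    return sum(len(b) / total * entropy(b) for b in by_function.values())


# Hypothetical toy data: (syntactic function, break type) at one juncture.
observations = [
    ("S", "B2"), ("S", "B2"), ("S", "B3"), ("S", "B2"),
    ("O", "B0"), ("O", "B0"), ("O", "B1"), ("O", "B0"),
]

h_all = entropy([b for _, b in observations])
h_cond = conditional_entropy(observations)
print(f"unconditioned: {h_all:.3f} bits, conditioned: {h_cond:.3f} bits")
```

In this toy data each syntactic function favors a different break type, so the conditioned entropy is lower than the unconditioned one.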


Subjects
Speech Perception , Speech , Linguistics , Language , Acoustic Stimulation
2.
J Acoust Soc Am ; 145(4): 2576, 2019 04.
Article in English | MEDLINE | ID: mdl-31046330

ABSTRACT

In this paper, a hierarchical prosody model (HPM)-based method for Mandarin spontaneous speech is proposed. First, an HPM is designed for describing relations among acoustic features of utterances, linguistic features of texts, and prosodic tags representing the underlying hierarchical prosodic structures of utterances. Subsequently, a sequential optimization algorithm is employed to train the HPM based on a large conversational speech corpus, the Mandarin Conversational Dialogue Corpus (MCDC), which features orthographic transcriptions and prosodic event annotations. In this unsupervised training method, all utterances of the MCDC are labeled with two types of prosodic tags, namely, break and prosodic states, automatically and simultaneously. After training, the HPM parameters are examined to identify critical prosodic properties of Mandarin spontaneous speech, which are then compared with their counterparts in the read-speech HPM. The prosodic tags on the studied utterances enable mapping of various prosodic events onto the hierarchical prosodic structures of the utterances. Prosodic analyses of some disfluent events are conducted using the prosodic tags affixed to the MCDC. Finally, an application of the HPM to assist in Mandarin spontaneous-speech recognition is discussed. Significant relative error rate reductions of 9.0%, 9.2%, 15.6%, and 7.3% are obtained for base-syllable, character, tone, and word recognition, respectively.
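
For readers unfamiliar with the metric, a relative error rate reduction compares the drop in error with the baseline error. The sketch below assumes the usual definition; the function name and the baseline and improved error rates are hypothetical, since the abstract reports only the relative reductions.

```python
def relative_error_reduction(baseline_error, new_error):
    """Relative error rate reduction in percent, under the usual definition:
    (baseline - new) / baseline * 100."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical error rates (percent); these are not figures from the paper.
baseline = 20.0
with_prosody_model = 18.2
reduction = relative_error_reduction(baseline, with_prosody_model)
print(f"relative reduction: {reduction:.1f}%")
```

With these made-up numbers, an absolute drop of 1.8 points from a 20% baseline is a relative reduction of about 9%.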

3.
J Acoust Soc Am ; 125(2): 1164-83, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19206890

ABSTRACT

An unsupervised joint prosody labeling and modeling method for Mandarin speech is proposed, a new scheme intended to construct statistical prosodic models and to label prosodic tags consistently for Mandarin speech. Two types of prosodic tags are determined by four prosodic models designed to illustrate the hierarchy of Mandarin prosody: the break of a syllable juncture to demarcate prosodic constituents and the prosodic state to represent any prosodic domain's pitch-level variation resulting from its upper-layered prosodic constituents' influences. The performance of the proposed method was evaluated using an unlabeled read-speech corpus articulated by an experienced female announcer. Experimental results showed that the estimated parameters of the four prosodic models were able to explore and describe the structures and patterns of Mandarin prosody. Besides, certain corresponding relationships between the break indices labeled and the associated words were found, and manifested the connections between prosodic and linguistic parameters, a finding further verifying the capability of the method presented. Finally, a quantitative comparison in labeling results between the proposed method and human labelers indicated that the former was more consistent and discriminative than the latter in prosodic feature distributions, a merit of the method developed here on the applications of prosody modeling.


Subjects
Cues (Psychology) , Language , Statistical Models , Speech Acoustics , Speech Perception , Algorithms , Female , Humans , Physiological Pattern Recognition , Pitch Perception
4.
J Acoust Soc Am ; 117(2): 908-25, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15759710

ABSTRACT

A statistics-based syllable pitch contour model for Mandarin speech is proposed. This approach takes the mean and the shape of a syllable log-pitch contour as two basic modeling units and considers several affecting factors that contribute to their variations. The affecting factors include the speaker, prosodic state (which essentially represents the high-level linguistic components of F0 and will be explained more clearly in Sec. I), tone, and initial and final syllable classes. The parameters of the two modeling units were automatically estimated using the expectation-maximization (EM) algorithm. Experimental results showed that the root mean squared errors (RMSEs) obtained in the closed and open tests in the reconstructed pitch period were 0.362 and 0.373 ms, respectively. This model provides a way to separate the effects of several major factors. All of the inferred values of the affecting factors were in close agreement with our prior linguistic knowledge. It also gives a quantitative and more complete description of the coarticulation effect of neighboring tones rather than conventional qualitative descriptions of the tone sandhi rules. In addition, the model can provide useful cues to determine the prosodic phrase boundaries, including those occurring at intersyllable locations, with or without punctuation marks.


Subjects
Language , Phonetics , Pitch Discrimination , Sound Spectrography/statistics & numerical data , Speech Acoustics , Adult , Female , Humans , Male , Statistical Models , Psycholinguistics