ABSTRACT
Emotions in speech are expressed in various ways, and the speech emotion recognition (SER) model may perform poorly on unseen corpora that contain different emotional factors from those expressed in training databases. To construct an SER model robust to unseen corpora, regularization approaches or metric losses have been studied. In this paper, we propose an SER method that incorporates relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function which gives higher gradients to the samples for which the emotion labels are more difficult to estimate among those in the given minibatch. Since the annotators may label the emotion based on the emotional expression which resides in the conversational context or other modality but is not apparent in the given speech utterance, some of the emotional labels may not be reliable and these unreliable labels may affect the proposed loss function more severely. In this regard, we propose to apply label smoothing for the samples misclassified by a pre-trained SER model. Experimental results showed that the performance of the SER on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
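The label-smoothing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `smooth_misclassified`, the smoothing factor `eps`, and the uniform spreading of `eps` over the non-target classes are assumptions made for the example. The abstract only states that smoothing is applied to samples misclassified by a pre-trained SER model.

```python
def smooth_misclassified(labels, pretrained_preds, num_classes, eps=0.1):
    """Build soft training targets.

    Samples that the pre-trained model classified correctly keep a one-hot
    target. Samples it misclassified are treated as less reliable and get a
    label-smoothed target (eps is an assumed hyperparameter).
    """
    targets = []
    for y, y_hat in zip(labels, pretrained_preds):
        if y == y_hat:
            # Reliable label: plain one-hot target.
            row = [1.0 if c == y else 0.0 for c in range(num_classes)]
        else:
            # Possibly unreliable label: move eps of the mass to other classes.
            off = eps / (num_classes - 1)
            row = [1.0 - eps if c == y else off for c in range(num_classes)]
        targets.append(row)
    return targets


# Hypothetical usage: 4 emotion classes, second sample was misclassified.
soft_targets = smooth_misclassified([0, 2], [0, 1], num_classes=4, eps=0.3)
```

The soft targets would then be fed to whatever classification loss is used in training; the proposed Proxy-Anchor-inspired loss itself is not reproduced here.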
Subjects
Emotions, Speech, Humans, Emotions/physiology, Speech/physiology, Algorithms, Reproducibility of Results, Automated Pattern Recognition/methods, Factual Databases
ABSTRACT
A multichannel speech enhancement system usually consists of spatial filters such as adaptive beamformers followed by postfilters, which suppress remaining noise. Accurate estimation of the power spectral density (PSD) of the residual noise is crucial for successful noise reduction in the postfilters. In this paper, we propose a postfilter that uses newly proposed a posteriori speech presence probability (SPP) and noise PSD estimators, both of which are based on coherence and statistical models. We model the coherence-based a posteriori SPP as a simple function of the magnitude of coherence between two microphone signals and combine it with a single-channel SPP based on statistical models. The coherence-based estimator for the PSD of the noise remaining in the beamformer output in the presence of speech is derived using the pseudo-coherence considering the effect of the beamformers, which is used to construct the coherence-based noise PSD estimator. Then, the final noise PSD estimator is obtained by combining the coherence-based and statistical model-based noise PSD estimators with the proposed SPP. The spectral gain function is also modified to incorporate the proposed SPP. Experimental results demonstrate that the proposed method led to more accurate noise PSD estimation and improved perceptual evaluation of speech quality scores in various diffuse noise environments, and did not degrade the speech quality in the presence of directional interference, even though it relies on coherence information.
ABSTRACT
Online multi-microphone speech enhancement aims to extract target speech from multiple noisy inputs by exploiting the spatial information as well as the spectro-temporal characteristics with low latency. Acoustic parameters such as the acoustic transfer function and speech and noise spatial covariance matrices (SCMs) should be estimated in a causal manner to enable the online estimation of the clean speech spectra. In this paper, we propose an improved estimator for the speech SCM, which can be parameterized with the speech power spectral density (PSD) and relative transfer function (RTF). Specifically, we adopt the temporal cepstrum smoothing (TCS) scheme to estimate the speech PSD, which is conventionally estimated with temporal smoothing. Furthermore, we propose a novel RTF estimator based on a time difference of arrival (TDoA) estimate obtained by the cross-correlation method. In addition, we propose refining the initial estimate of the speech SCM by utilizing the estimates for the clean speech spectrum and clean speech power spectrum. The proposed approach showed superior performance in terms of the perceptual evaluation of speech quality (PESQ) scores, extended short-time objective intelligibility (eSTOI), and scale-invariant signal-to-distortion ratio (SI-SDR) in our experiments on the CHiME-4 database.
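The idea of deriving an RTF from a cross-correlation TDoA estimate can be illustrated with a small sketch. It is not the estimator proposed in the paper: the function names `estimate_tdoa` and `delay_rtf` are hypothetical, the lag search is a brute-force time-domain cross-correlation, and the RTF is modelled as a pure delay (a free-field assumption that ignores reverberation and level differences).

```python
import cmath
import math


def estimate_tdoa(x_ref, x_other, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means x_other is a delayed copy of x_ref.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for t in range(len(x_ref)):
            u = t + lag
            if 0 <= u < len(x_other):
                val += x_ref[t] * x_other[u]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def delay_rtf(lag, n_fft):
    """RTF of a pure delay for bins 0..n_fft/2: exp(-j*2*pi*k*lag/n_fft)."""
    return [cmath.exp(-2j * math.pi * k * lag / n_fft)
            for k in range(n_fft // 2 + 1)]


# Hypothetical usage: the second channel lags the reference by 2 samples.
ref = [0, 0, 1.0, 0.5, -0.3, 0, 0, 0, 0, 0]
other = [0, 0, 0, 0, 1.0, 0.5, -0.3, 0, 0, 0]
lag = estimate_tdoa(ref, other, max_lag=4)
rtf = delay_rtf(lag, n_fft=8)
```

In an online system the correlation would normally be computed recursively per frame, and often in the frequency domain; the brute-force loop is used here only to keep the example dependency-free.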
Subjects
Speech Perception, Speech, Noise, Acoustics
ABSTRACT
The three-dimensional (3D) coordination polymers [Cd(tpmd)(NCX)2]n [X = O (1), S (2), and BH3 (3); tpmd = N,N,N',N'-tetrakis(pyridin-4-yl)methanediamine] form network structures that are directed by the coordinated anionic ligands. Polymers 1 and 2 show nonporous structures, whereas polymer 3 shows a porous coordination framework. On the basis of the Cd(II) network structures, the 3D coordination polymer [Zn(tpmd)(NCBH3)2]n·nMeOH (4) was self-assembled. In the cases of polymers 1 and 2, pseudohalide ions acted to form nonporous network structures; however, in polymers 3 and 4, NCBH3- helps to construct porous network structures. Polymers 1-4 show strong ultraviolet luminescence emissions, depending on the pseudohalide ions present, compared to the tpmd ligands. Interestingly, coordination polymers 3 and 4, which contain NCBH3- ions, exhibit high porosities and gas sorption properties. The polymers appeared to absorb N2, H2, CO2, and CH4. The structure of polymer 4 is almost identical to that of polymer 3, except that Zn(II) takes the place of the Cd(II) ion. However, polymer 4 has a larger void volume and higher gas absorption ability for N2 gas than polymer 3. For the sorption of the other gases, polymers 3 and 4 showed similar behaviors.
ABSTRACT
In this paper, we propose a novel emotion recognition method based on the underlying emotional characteristics extracted from a conditional adversarial auto-encoder (CAAE), in which both acoustic and lexical features are used as inputs. The acoustic features are generated by calculating statistical functionals of low-level descriptors and by a deep neural network (DNN). These acoustic features are concatenated with three types of lexical features extracted from the text: a sparse representation, a distributed representation, and affective lexicon-based dimensions. Two-dimensional latent representations similar to vectors in the valence-arousal space are obtained by a CAAE, which can be directly mapped into the emotional classes without the need for a sophisticated classifier. In contrast to the previous attempt at a CAAE using only acoustic features, the proposed approach could enhance the performance of emotion recognition because the combined acoustic and lexical features provide enough discriminant power. Experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus showed that our method outperformed the previously reported best results on the same corpus, achieving 76.72% in the unweighted average recall.
Subjects
Acoustics, Arousal, Emotions, Machine Learning, Computer Neural Networks
ABSTRACT
Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). While both ITD and ILD provide information on the location of audio sources, they may be impaired in different manners by background noise and reverberation and can therefore carry complementary information. Conventional approaches utilize the statistics from all frequencies with fixed weights, although the information from some time-frequency bins may degrade the performance of VAD. In this letter, we propose a dual microphone VAD scheme based on the spatial cues in reliable frequency bins only, considering the sparsity of the speech signal in the time-frequency domain. The reliability of each time-frequency bin is determined by three conditions on signal energy, ILD, and ITD. ITD-based and ILD-based VADs and statistics are evaluated using the information from selected frequency bins and then combined to produce the final VAD results. Experimental results show that the proposed frequency selective approach improves VAD performance in realistic environments.
ABSTRACT
Dinuclear FeIII and NiII complexes, [(phenO)Fe(N3)]2(NO3)2 (1) and [(phenOH)Ni(N3)2]2 (2), were prepared by treating Fe(NO3)3·9H2O and Ni(NO3)2·6H2O in methanol, respectively, with phenOH (= N-(2-pyridylmethyl)-N'-(2-hydroxyethyl)ethylenediamine) and NaN3; both 1 and 2 were characterized by elemental analysis, IR spectroscopy, X-ray diffraction, and magnetic susceptibility measurements. Two ethoxo-bridged FeIII ions and two azido-bridged NiII ions were observed in 1 and 2, respectively; correspondingly, an antiferromagnetic interaction via the bridged ethoxo groups and strong ferromagnetic coupling via the bridged end-on azido ligands within the dimeric unit were observed. Complex 1 did not exhibit any catalytic activity, while 2 exhibited excellent catalytic activities for the epoxidation of aliphatic, aromatic, and terminal olefins.
ABSTRACT
In this letter, a multichannel decision-directed approach to estimate the speech power spectral density (PSD) matrix for multichannel speech enhancement is proposed. There have been attempts to build multichannel speech enhancement filters which depend only on the speech and noise PSD matrices, for which an accurate estimate of the clean speech PSD matrix is crucial for successful noise reduction. In contrast to the maximum likelihood estimator which has been applied conventionally, the proposed decision-directed method is capable of tracking the time-varying speech characteristics more robustly and improves the noise reduction performance under various noise environments.
Subjects
Acoustics, Theoretical Models, Noise/adverse effects, Computer-Assisted Signal Processing, Speech Production Measurement/methods, Speech, Fourier Analysis, Humans, Motion, Sound Spectrography, Vibration
ABSTRACT
BL2D-SMC at the Pohang Light Source II is a supramolecular crystallography beamline based on a bending magnet. The beamline delivers high-flux tunable X-rays with energies from 8.3 to 20.7 keV and a 100 µm (horizontal) × 85 µm (vertical) full width at half-maximum focal spot. Experiments involving variable temperature, photo-excitation and gas sorption are supported by ancillary equipment and software in the beamline. The design of the beamline, its role and the main components are described.
ABSTRACT
Bioactivity-guided isolation of a methanolic extract of Euphorbia fischeriana led to the isolation of four new abietane-type diterpenoids, fischeriolides A-D (1-4), together with 11 known diterpenoids. Their structures were elucidated based on the interpretation of 1D and 2D NMR spectroscopic and HRESIMS data. The absolute configuration of compound 3 was determined by single-crystal X-ray diffraction analysis and electronic circular dichroism methods. Compounds 5-9 exhibited inhibitory effects on LPS-induced nitric oxide production in RAW 264.7 macrophages with IC50 values in the range 4.9-12.6 µM.
Subjects
Diterpenes/isolation & purification, Diterpenes/pharmacology, Chinese Herbal Drugs/isolation & purification, Chinese Herbal Drugs/pharmacology, Euphorbia/chemistry, Nitric Oxide/biosynthesis, Animals, Phytogenic Antineoplastic Agents/chemistry, X-Ray Crystallography, Diterpenes/chemistry, Chinese Herbal Drugs/chemistry, Lipopolysaccharides/pharmacology, Macrophages/drug effects, Mice, Molecular Conformation, Molecular Structure, Plant Roots/chemistry, Republic of Korea
ABSTRACT
The asymmetric unit of the title compound, [Mn(C13H15N4O)2]NO3·CH3OH, contains two independent complex cations, in each of which the Mn(III) ion is located on an inversion centre. The Mn(III) ion is coordinated by four N and two O atoms from two 1,3-bis{[(1H-pyrrol-2-yl)methylidene]amino}propan-2-olate ligands, resulting in a distorted octahedral geometry. The average Mn-ligand bond lengths in the two complex molecules are 2.074 and 2.079 Å. In the crystal, intermolecular N-H···O hydrogen bonds between the pyrrole group of the ligand and the non-coordinating nitrate ion give rise to a chain structure along [10-1]. The methanol solvent molecule and the nitrate ion are connected by an O-H···O hydrogen bond.
ABSTRACT
In the title compound, [Fe(C10H15N2O2)Cl2]·2H2O, the Fe(III) ion is coordinated by two N and two O atoms of the tetradentate 2-{(2-hydroxyethyl)(pyridin-2-ylmethyl)amino}ethanolate ligand and by two chloride anions, resulting in a distorted octahedral coordination sphere. The average Fe-X (X = ligand N and O atoms) and Fe-Cl bond lengths are 2.10 and 2.32 Å, respectively. In the crystal, duplex O-H···O hydrogen bonds between the hydroxyl and ethoxy groups of two neighbouring complexes give rise to a dimeric unit. The dimers are connected to the lattice water molecules (one of which is equally disordered over two sets of sites) through O-H···Cl hydrogen bonds, forming undulating sheets parallel to (010). Weak C-H···Cl hydrogen bonds are also observed.
ABSTRACT
Objectives: Previous research has predominantly focused on total bilirubin levels without clearly distinguishing between direct and indirect bilirubin. In this study, the differences between these forms were examined, and their potential causal relationships with ischemic stroke were investigated. Methods: Two-sample multivariable Mendelian randomization (MVMR) analysis was employed, extracting summary data on bilirubin from the Korean Cancer Prevention Study-II (KCPS-II; n=159,844) and the Korean Genome and Epidemiology Study (KoGES; n=72,299). Data on ischemic stroke were obtained from BioBank Japan (BBJ; n=201,800). Colocalization analysis was performed, focusing on the UGT1A1, SLCO1B1, and SLCO1B3 genes, which are the primary loci associated with serum bilirubin levels. Results: Crude 2-sample Mendelian randomization analysis revealed a significant negative association between total bilirubin levels and ischemic stroke. However, in MVMR analyses, only indirect bilirubin demonstrated a significant negative association with ischemic stroke (odds ratio, 0.76; 95% confidence interval, 0.59 to 0.98). Colocalization analysis did not identify a shared causal variant between the 3 genetic loci related to indirect bilirubin and the risk of ischemic stroke. Conclusion: Our study establishes a causal association between higher genetically determined levels of serum indirect bilirubin and reduced risk of ischemic stroke in an Asian population. Future research should include more in-depth analysis of shared genetic variants between indirect bilirubin and ischemic stroke.
ABSTRACT
Three Fe(III)-based coordination complexes [Fe(dqmp)2](NO3)·H2O (1), [Fe(dqmp)2](BF4)·2CH3COCH3 (2), and [Fe(dqmp)2](ClO4) (3) were synthesized from Fe(NO3)3·9H2O/Fe(ClO4)3·xH2O, NaBF4, and 2,4-dichloro-6-((quinoline-8-ylimino)methyl)phenol (Hdqmp) in methanol/acetone and characterized. The structures of complexes 1-3 were determined via single-crystal X-ray crystallography at 100 K and room temperature, and their magnetic properties in the solid and solution forms were investigated. All complexes showed meridional structures with two tridentate dqmp- ligands coordinated with Fe(III) cations. In the solid state, complex 1 showed an abrupt and complete spin crossover at 225 K, whereas complexes 2 and 3 exhibited an incomplete spin crossover at 135 and 150 K, respectively. In a dimethylformamide solution, the complexes showed counterion-dependent spin transitions. In contrast to the solid state, in solution, complex 1 did not exhibit complete spin crossover. However, complexes 2 and 3 showed more complete spin transitions in solutions than in the solid state. The relaxation times, T1 and T2, for 1 and 2 were determined and both increased with temperature from 220 to 380 K. The T1 of 1 was larger than that of 2 at 380 K, and the T1 values were larger than the T2 values.
ABSTRACT
Coordination polymer networks, i.e., [Zn(tpmd)(H2O)](NO3)2·7H2O (1) and [Cd(tpmd)(H2O)2](NO3)2·4H2O·4CH3OH (2), were assembled from M(II)(NO3)2 hydrates (M = Zn, Cd) and N,N,N',N'-tetrakis(pyridin-4-yl)methanediamine (tpmd) in CH3OH and characterized. 1 and 2 feature three-dimensional networks formed by coordination of the metal ions to the tpmd ligands. 1 exhibits a strong blue emission at ~397 nm, while 2 shows strong emission at ~361 nm. 1 is a more efficient catalyst for the transesterification of various esters than 2.
Subjects
Cadmium/chemistry, Diamines/chemistry, Luminescence, Organometallic Compounds/chemistry, Pyridines/chemistry, Zinc/chemistry, Alcohols/chemical synthesis, Alcohols/chemistry, Catalysis, Esters/chemistry, Molecular Models, Molecular Structure, Organometallic Compounds/chemical synthesis
ABSTRACT
This paper proposes a voice activity detection (VAD) method in a kernel subspace domain to improve the performance of the kernel-based VAD. A linear transform matrix that can simultaneously diagonalize the two covariance matrices using kernel principal component analysis is presented to generate the kernel subspace. The likelihood ratio test based on Gaussian distributions is applied for the VAD in the kernel subspace. Experimental results show that the proposed VAD algorithm outperforms the conventional approaches under various noise conditions.
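A Gaussian likelihood ratio test of the kind mentioned above can be sketched as follows. The sketch assumes the feature vector `z` has already been projected into a subspace in which the noise and speech-plus-noise covariances are both diagonal, with known per-dimension variances; the kernel principal component analysis transform itself is not shown. The names `gaussian_log_lr` and `is_speech` and the zero-mean assumption are illustrative choices, not details from the paper.

```python
import math


def gaussian_log_lr(z, var_noise, var_speech):
    """Log-likelihood ratio log p(z | speech) - log p(z | noise).

    Both hypotheses are modelled as zero-mean Gaussians with diagonal
    covariance; var_noise and var_speech hold the per-dimension variances.
    """
    total = 0.0
    for zk, v0, v1 in zip(z, var_noise, var_speech):
        total += 0.5 * (math.log(v0 / v1) + zk * zk * (1.0 / v0 - 1.0 / v1))
    return total


def is_speech(z, var_noise, var_speech, threshold=0.0):
    """Declare speech when the log-likelihood ratio exceeds the threshold."""
    return gaussian_log_lr(z, var_noise, var_speech) > threshold


# Hypothetical usage with a one-dimensional feature.
quiet_frame = is_speech([0.0], var_noise=[1.0], var_speech=[4.0])
loud_frame = is_speech([3.0], var_noise=[1.0], var_speech=[4.0])
```

The threshold trades off missed speech against false alarms and would normally be tuned on held-out data.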
Subjects
Algorithms, Computer-Assisted Signal Processing, Sound Spectrography, Speech Acoustics, Speech Perception, Voice, Fourier Analysis, Humans, Linear Models, Normal Distribution
ABSTRACT
Use of Hirshfeld surfaces calculated from crystal structure determinations on various transition metal ion complexes of three terpyridine ligands carrying trimethoxyphenyl substituents has enabled an assessment of the contribution made by the ligand components to the interactions determining the lattice structures, interactions expected also to be present in metallomesogens derived from similar ligands. The form of the link joining the trimethoxyphenyl substituent to the 4' position of 2,2';6',2''-terpyridine is of some importance. In the case of the Co(II) complexes of two of the ligands, their spin-crossover characteristics can be rationalised in terms of the different interactions seen in their lattices.
Subjects
Ions/chemistry, Ligands, Metals/chemistry
ABSTRACT
In the title compound, C21H18N6·H2O, two 4,4'-dipyridylamine groups are linked by a methylene C atom, which sits on a twofold axis. The lattice water molecule is located slightly off a twofold axis, and is therefore disordered over two positions. In the crystal, the organic molecules and the water molecule are linked by O-H···N hydrogen bonds. The organic molecules exhibit extensive offset face-to-face π-π interactions to symmetry equivalents [centroid-centroid distances = 3.725 (3) and 4.059 (3) Å].
ABSTRACT
A mixed valence tetranuclear iron complex [(Hpmide)FeII(NCSe)2FeIII(pmide)]2·5CH3OH (1) underwent oxidation and ligand exchange in the solid state (H2pmide = N-(2-pyridylmethyl)iminodiethanol). Upon air oxidation, 1 was converted into [(pmide)FeIII(NCSe)FeIII(pmide)]2(NCSe)2·2H2O (2), which was accompanied by deprotonation and ligand exchange through a single crystal-to-single-crystal transformation.
ABSTRACT
In the title compound, [Cu(N3)2(C20H21N3)], the Cu(II) ion is coordinated by the three N atoms of the (S)-1-phenyl-N,N-bis[(2-pyridyl)methyl]ethanamine ligand and two N atoms from two azide anions, resulting in a distorted square-pyramidal environment. A weak intermolecular C-H···N hydrogen-bonding interaction between one pyridine group of the ligand and an azide N atom of an adjacent complex unit gives a one-dimensional chain structure parallel to the c axis.