Results 1 - 20 of 92
1.
J Neural Eng ; 21(3), 2024 May 17.
Article in English | MEDLINE | ID: mdl-38688262

ABSTRACT

Objective. The rapid serial visual presentation (RSVP) paradigm, which is based on electroencephalogram (EEG) technology, is an effective approach for object detection. It aims to detect the event-related potential (ERP) components evoked by target images for rapid identification. However, the object detection performance within this paradigm is affected by the visual disparity between adjacent images in a sequence. Currently, there is no objective metric to quantify this visual difference. Consequently, a reliable image sorting method is required to ensure the generation of a smooth sequence for effective presentation. Approach. In this paper, we propose a novel semantic image sorting method for sorting RSVP sequences, which aims at generating sequences that are perceptually smoother in terms of the human visual experience. Main results. We conducted a comparative analysis between our method and two existing methods for generating RSVP sequences using both qualitative and quantitative assessments. A qualitative evaluation revealed that the sequences generated by our method were smoother in subjective vision and were more effective in evoking stronger ERP components than those generated by the other two methods. Quantitatively, our method generated semantically smoother sequences than the other two methods. Furthermore, we employed four advanced approaches to classify single-trial EEG signals evoked by each of the three methods. The classification results of the EEG signals evoked by our method were superior to those of the other two methods. Significance. In summary, the results indicate that the proposed method can significantly enhance the object detection performance in RSVP-based sequences.


Subjects
Electroencephalography; Evoked Potentials, Visual; Photic Stimulation; Semantics; Humans; Electroencephalography/methods; Male; Female; Photic Stimulation/methods; Young Adult; Adult; Evoked Potentials, Visual/physiology; Pattern Recognition, Visual/physiology; Algorithms
2.
Small ; : e2311249, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38482932

ABSTRACT

Host-guest catalysts provide new opportunities for targeted applications, and the development of new strategies for preparing them is highly desired. Herein, an in situ solvent-free approach is developed for implanting ZrW2O7(OH)2(H2O)2 nanorods (ZrW-NR) in nitro-functionalized UiO-66(Zr) (UiO-66(Zr)-NO2) with hierarchical porosity, and the encapsulation of ZrW-NR gives the as-prepared host-guest catalyst remarkably enhanced catalytic performance for both oxidative desulfurization (ODS) and acetalization reactions. ZrW-NR@UiO-66(Zr)-NO2 can eliminate 500 ppm sulfur within 9 min at 40 °C in ODS, and can transform 5.6 mmol benzaldehyde after 3 min at room temperature in the acetalization reaction. Its turnover frequencies reach 72.3 h⁻¹ at 40 °C for ODS, which is 33.4 times higher than that of UiO-66(Zr)-NO2, and 28140 h⁻¹ for acetalization, which is the highest among previous reports. Density functional theory calculations indicate that the W sites in ZrW-NR can decompose H2O2 to W(VI)-peroxo intermediates that contribute to catalytic activity for the ODS reaction. This work opens a new solvent-free approach for preparing MOF-based host-guest catalysts to upgrade their redox and acid performance.

3.
Article in English | MEDLINE | ID: mdl-38206780

ABSTRACT

Real-time electrocardiogram (ECG) monitoring and diagnosis through Internet of Things (IoT) are crucial for addressing the severity and timely treatment of cardiovascular diseases, enabling timely intervention and preventing life-threatening complications. However, current ECG monitoring research predominantly focuses on individual aspects such as signal compression, diagnostic analysis, or secure transmission, lacking joint optimization of various modules in IoT scenarios. To address this gap, this work proposes a novel framework based on superimposed semantic communication for real-time ECG monitoring in IoT. The framework comprises three hierarchical levels: the edge level for data collection and processing, the relay level for signal compression and coding, and the cloud level for data analysis and reconstruction. The proposed framework offers several unique advantages. By employing semantic encoding guided by ECG classification tasks, it selectively extracts crucial features within and between signals, improving compression ratio and adaptability to channel noise. The superimposed semantic encoding achieves content encryption without requiring any additional operations. Moreover, the framework utilizes lightweight anomaly detection neural networks, reducing edge device power consumption and conserving communication resources. Simulation and real experimental results demonstrate that the proposed method achieves real-time encoding and transmission of ECG signals with a compression ratio of 0.019 on the MIT-BIH dataset. Furthermore, it attains a heartbeat classification accuracy of 0.988 and a reconstruction error of 0.061.

4.
Neurophotonics ; 11(1): 015002, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38192584

ABSTRACT

Significance: fNIRS-based neuroenhancement depends on the feasible detection of hemodynamic responses in target brain regions. Using the lateral occipital complex (LOC) and the fusiform face area (FFA) in the ventral visual pathway as neurofeedback targets boosts performance in visual recognition. However, the feasibility of utilizing fNIRS to detect LOC and FFA activity in adults remains to be validated as the depth of these regions may exceed the detection limit of fNIRS. Aim: This study aims to investigate the feasibility of using fNIRS to measure hemodynamic responses in the ventral visual pathway, specifically in the LOC and FFA, in adults. Approach: We recorded the hemodynamic activities of the LOC and FFA regions in 35 subjects using a portable eight-channel fNIRS instrument. A standard one-back object and face recognition task was employed to elicit selective brain responses in the LOC and FFA regions. The placement of fNIRS optodes for LOC and FFA detection was guided by our group's transcranial brain atlas (TBA). Results: Our findings revealed selective activation of the LOC target channel (CH2) in response to objects, whereas the FFA target channel (CH7) did not exhibit selective activation in response to faces. Conclusions: Our findings indicate that, although fNIRS detection has limitations in capturing FFA activity, the LOC region emerges as a viable target for fNIRS-based detection. Furthermore, our results advocate for the adoption of the TBA-based method for setting the LOC target channel, offering a promising solution for optode placement. This feasibility study stands as the inaugural validation of fNIRS for detecting cortical activity in the ventral visual pathway, underscoring its ecological validity. We suggest that our findings establish a pivotal technical groundwork for prospective real-life applications of fNIRS-based research.

5.
IEEE Trans Image Process ; 33: 856-866, 2024.
Article in English | MEDLINE | ID: mdl-38231815

ABSTRACT

Unsupervised domain adaptation (UDA) aims to transfer knowledge from the labeled source domain to the unlabeled target domain. Most existing domain adaptation methods rely on convolutional neural networks (CNNs) to learn cross-domain invariant features. Inspired by the success of transformer architectures and their superiority to CNNs, we propose to combine the transformer with UDA to improve their generalization properties. In this paper, we present a novel model named Transferable Vector Quantization Alignment for Unsupervised Domain Adaptation (TransVQA), which integrates the Transferable transformer-based feature extractor (Trans), vector quantization domain alignment (VQA), and mutual information weighted maximization confusion matrix (MIMC) of intra-class discrimination into a unified domain adaptation framework. First, TransVQA uses the transformer to extract more accurate features in different domains for classification. Second, TransVQA, based on the vector quantization alignment module, uses a two-step alignment method to align the extracted cross-domain features and solve the domain shift problem. The two-step alignment includes global alignment via vector quantization and intra-class local alignment via pseudo-labels. Third, for the intra-class feature discrimination problem caused by the fuzzy alignment of different domains, we use the MIMC module to constrain the target domain output and increase the accuracy of pseudo-labels. The experiments on several domain adaptation datasets show that TransVQA can achieve excellent performance and outperform existing state-of-the-art methods.
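
As a rough illustration of the vector quantization step named in this abstract, the sketch below maps each feature vector to its nearest entry in a shared codebook, so that source and target features are expressed with the same set of code vectors. It is a minimal sketch under stated assumptions and not the TransVQA implementation: the function names `squared_distance` and `quantize`, the toy codebook and features, and the choice of squared Euclidean distance are all illustrative and do not come from the abstract.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[Tuple[int, Vector]]:
    """Map each feature to (index, code vector) of its nearest codebook entry."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    result = []
    for f in features:
        idx = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        result.append((idx, codebook[idx]))
    return result


# Hypothetical 2-D codebook shared by both domains (all values are made up).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
source_features = [(0.1, -0.2), (0.9, 1.2)]
target_features = [(3.5, 0.3), (0.2, 0.1)]

source_codes = [i for i, _ in quantize(source_features, codebook)]
target_codes = [i for i, _ in quantize(target_features, codebook)]
```

Because both domains are snapped onto the same code vectors, features that land on the same index become directly comparable, which is the intuition behind using quantization as a global alignment step.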

6.
J Environ Sci (China) ; 138: 10-18, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38135378

ABSTRACT

Ozone (O3) pollution in China has drawn considerable attention in recent years, and the Sichuan Basin (SCB) is one of the regions confronting a worsening O3 pollution problem. Many previous studies have shown that regional transport is an important contributor to O3 pollution. However, very few features of the O3 profile during transport have been reported, especially in the border regions between different administrative divisions. In this study, we conducted tethered balloon soundings in the SCB during the summer of 2020 and captured a nocturnal O3 transport event during the campaign. Vertically, the O3 transport occurred in the bottom of the residual layer, between 200 and 500 m above ground level. Horizontally, the transport pathway was directed from southeast to northwest based on the analysis of the wind field and air mass trajectories. The effect of transport in the residual layer on the surface O3 concentration was related to the spatial distribution of O3. For cities with high O3 concentrations in the upwind region, the transport process would bring clean air masses and abate pollution. For downwind lightly polluted cities, the transport process would slow the decrease in, or even increase, the surface O3 concentration during the night. We provide observational facts on the profile features of a transboundary O3 transport event between two provincial administrative divisions, which imply the importance of joint prevention and control measures. However, the sounding parameters were limited and the quantitative analysis was preliminary; more integrated and thorough studies of this topic are called for in the future.


Subjects
Air Pollutants; Air Pollution; Ozone; Ozone/analysis; Air Pollutants/analysis; Environmental Monitoring; Air Pollution/analysis; Seasons; China
7.
Sci Rep ; 13(1): 17751, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37853050

ABSTRACT

Underground mining activities can easily trigger surface subsidence and cause damage to surface soil. However, there is still a lack of studies on damaged soil, restricting ecological remediation in mining-induced subsidence regions to a certain degree. Focusing on the particular example of No. 4 Mine in Yili, Xinjiang, China, this study comprehensively combined field sampling, laboratory experiments, and data analysis to investigate the variation rules of basic physical properties and shear characteristics of soil samples. The samples covered different subsidence degrees (0, 0-20, 20-40, and above 40 cm) and various depths (0-10, 10-20, 20-40, 40-60, and 60-80 cm). The experimental results show that: First, the natural density and dry unit weight of shallow soil in the serious-subsidence region were more significantly affected by mining-induced subsidence than the conditions in the deep layer, which also dropped with the increase in subsidence degree (with a mean drop rate of 7%). Second, serious subsidence could greatly counteract the positive effect of slight and moderate subsidence on the soil shear strength, with a drop rate of up to 30.7%. Third, compared with soil physical indices, mining-induced subsidence more easily affected shear strength indices. In particular, the soil samples taken from 0-10 cm depth in the slight subsidence area and 60-80 cm depth in the moderate subsidence area were most significantly affected by mining-induced subsidence, with PCA comprehensive scores of over 1.5. The present study can contribute to gaining in-depth knowledge of the damage characteristics of surface soil under mining-induced subsidence and provide a theoretical foundation for formulating reasonable coal mining strategies and ecological protection measures.

8.
Dalton Trans ; 52(43): 15968-15973, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37846746

ABSTRACT

Insights into the relationship between the crystal structure and activity of metal-organic frameworks (MOFs) are meaningful to investigate the potential properties of pristine MOFs for targeted catalytic reactions. Herein, we develop a high-efficiency method for boosting the oxidative desulfurization (ODS) activity of Ti-MOF in the presence of H+. The ODS activity of pristine Ti-MOF prepared via a solvothermal approach is very poor at a low reaction temperature but can be enhanced in the presence of H+. Ti-MOF in the presence of H+ shows ultrahigh ODS activity that can eliminate 1000 ppm sulfur after 7 min at 30 °C with no catalytic activity loss after recycling 11 times. The turnover frequency value reaches 12.4 h⁻¹ at 30 °C, surpassing all the previously reported Ti-MOFs as ODS catalysts even at high temperatures. Characterization and quenching experimental results indicate that more uncoordinated Ti sites can be formed from slight damage to the structure of Ti-MOF during the catalytic reaction, and such exposed Ti sites can easily react with H+ and H2O2 to form Ti-hydroperoxo active species that drive the enhancement of ODS activity. This work provides a significant way to upgrade the catalytic activity of pristine Ti-MOFs for future application.

9.
IEEE Trans Image Process ; 32: 4416-4431, 2023.
Article in English | MEDLINE | ID: mdl-37527319

ABSTRACT

In recent years, researchers have become more interested in hyperspectral image fusion (HIF), which aims to recover a high-resolution hyperspectral image (HR-HSI) from a low-resolution hyperspectral image (LR-HSI) and a high-spatial-resolution multispectral image (HR-MSI), as a potential alternative to expensive high-resolution hyperspectral imaging systems. It is generally assumed that degeneration in both the spatial and spectral domains is known in traditional model-based methods or that there exist paired HR-LR training data in deep learning-based methods. However, such an assumption is often invalid in practice. Furthermore, most existing works, either introducing hand-crafted priors or treating HIF as a black-box problem, cannot take full advantage of the physical model. To address those issues, we propose a deep blind HIF method by unfolding model-based maximum a posteriori (MAP) estimation into a network implementation in this paper. Our method works with a Laplace distribution (LD) prior that does not need paired training data. Moreover, we have developed an observation module to directly learn degeneration in the spatial domain from LR-HSI data, addressing the challenge of spatially-varying degradation. We also propose to learn the uncertainty (mean and variance) of LD models using a novel Swin-Transformer-based denoiser and to estimate the variance of degraded images from residual errors (rather than treating them as global scalars). All parameters of the MAP estimation algorithm and the observation module can be jointly optimized through end-to-end training. Extensive experiments on both synthetic and real datasets show that the proposed method outperforms existing competing methods in terms of both objective evaluation indexes and visual qualities.
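
To give a feel for what MAP estimation with a Laplace prior does in the simplest possible setting, the sketch below solves the scalar denoising case in closed form. For an observation y = x + n with Gaussian noise of variance `sigma2` and a Laplace prior on x with mean `mu` and scale `b`, minimizing (x - y)^2 / (2 * sigma2) + |x - mu| / b gives a soft-thresholding of y toward `mu` with threshold sigma2 / b. This is a textbook result offered as an assumption-laden illustration: it is not the unfolded network described in the abstract, and the function name `map_laplace_denoise` and its parameters are hypothetical.

```python
def map_laplace_denoise(y: float, mu: float, b: float, sigma2: float) -> float:
    """Scalar MAP estimate under Gaussian noise and a Laplace(mu, b) prior.

    Minimizes (x - y)**2 / (2 * sigma2) + abs(x - mu) / b over x.
    The minimizer is y soft-thresholded toward mu by sigma2 / b.
    """
    if b <= 0:
        raise ValueError("Laplace scale b must be positive")
    if sigma2 < 0:
        raise ValueError("noise variance sigma2 must be non-negative")
    threshold = sigma2 / b
    residual = y - mu
    shrunk = max(abs(residual) - threshold, 0.0)
    return mu + (shrunk if residual >= 0 else -shrunk)


# Observation far from the prior mean: pulled toward mu by the threshold.
far = map_laplace_denoise(y=3.0, mu=1.0, b=2.0, sigma2=1.0)
# Observation close to the prior mean: collapses onto mu exactly.
near = map_laplace_denoise(y=1.2, mu=1.0, b=2.0, sigma2=1.0)
```

In this toy form, a larger noise variance or a tighter prior (smaller `b`) raises the threshold, which is one way to see why estimating means and variances per pixel, rather than using global scalars, can matter.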

10.
Article in English | MEDLINE | ID: mdl-37163401

ABSTRACT

Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years. Generally, training a deep CNN model requires numerous labeled training samples, and the number and quality of these samples directly affect the representational capability of the trained model. However, this approach is restrictive in practice, because manually labeling such a large number of training samples is time-consuming and prohibitively expensive. In this article, we propose an active learning method for deep visual tracking, which selects and annotates the unlabeled samples to train the deep CNN model. Under the guidance of active learning, the tracker based on the trained deep CNN model can achieve competitive tracking performance while reducing the labeling cost. More specifically, to ensure the diversity of selected samples, we propose an active learning method based on multiframe collaboration to select those training samples that should be and need to be annotated. Meanwhile, considering the representativeness of these selected samples, we adopt a nearest-neighbor discrimination method based on the average nearest-neighbor distance to screen isolated samples and low-quality samples. Therefore, the training samples' subset selected based on our method requires only a given budget to maintain the diversity and representativeness of the entire sample set. Furthermore, we adopt a Tversky loss to improve the bounding box estimation of our tracker, which can ensure that the tracker achieves more accurate target states. Extensive experimental results confirm that our active-learning-based tracker (ALT) achieves competitive tracking accuracy and speed compared with state-of-the-art trackers on the seven most challenging evaluation benchmarks. Project website: https://sites.google.com/view/altrack/.
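
The abstract mentions a Tversky loss for bounding box estimation without giving its form. The sketch below shows one plausible reading, assuming axis-aligned boxes and treating the overlap area as true positives, the unmatched predicted area as false positives, and the unmatched target area as false negatives. The function names, the box layout `(x_min, y_min, x_max, y_max)`, and the default weights `alpha` and `beta` are assumptions for illustration and may differ from the tracker's actual loss.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have zero area."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def tversky_loss(pred: Box, target: Box, alpha: float = 0.5, beta: float = 0.5) -> float:
    """1 - Tversky index, with areas standing in for TP, FP and FN counts."""
    inter_w = max(min(pred[2], target[2]) - max(pred[0], target[0]), 0.0)
    inter_h = max(min(pred[3], target[3]) - max(pred[1], target[1]), 0.0)
    tp = inter_w * inter_h            # overlap area
    fp = box_area(pred) - tp          # predicted area outside the target
    fn = box_area(target) - tp        # target area missed by the prediction
    denom = tp + alpha * fp + beta * fn
    if denom == 0.0:
        return 1.0                    # both boxes are empty: treat as no match
    return 1.0 - tp / denom


pred_box = (0.0, 0.0, 2.0, 2.0)
true_box = (1.0, 0.0, 3.0, 2.0)
dice_like = tversky_loss(pred_box, true_box, alpha=0.5, beta=0.5)
iou_like = tversky_loss(pred_box, true_box, alpha=1.0, beta=1.0)
```

With `alpha = beta = 0.5` the index reduces to the Dice coefficient and with `alpha = beta = 1` to the Jaccard index (IoU); unequal weights let a loss penalize over-sized and under-sized boxes differently.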

11.
Article in English | MEDLINE | ID: mdl-37048010

ABSTRACT

Air pollutants suspended in the atmosphere have a large impact on air quality, climate, and human health. As one of the important populated and industrialized regions in China, the Sichuan Basin (SCB) has confronted severe air pollution in recent years. Previous studies have shown that regional transport played a significant role in the formation of regional pollution in the SCB, particularly in the southern basin. Using Yibin and Zigong as representative receptor cities, we further identified the transport channels affecting the southern basin by conducting gridded dispersion simulations. A total of seven channels were identified, including three for cyclonic transport, three through the mountainous areas between the Longquan Mountain and the Huaying Mountain, and one along the Yangtze River. Varying seasonal distributions of their occurrence frequencies were observed. Furthermore, observational evidence for several universal channels was presented during a typical transport case. The transport pathways identified in this study can guide the planning of regional distribution of emission sources and the measures for regional joint prevention and control of air pollution.


Subjects
Air Pollutants; Air Pollution; Humans; Air Pollutants/analysis; Particulate Matter/analysis; Environmental Monitoring; Air Pollution/analysis; China; Cities; Seasons
12.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9325-9338, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37027639

ABSTRACT

Both network pruning and neural architecture search (NAS) can be interpreted as techniques to automate the design and optimization of artificial neural networks. In this paper, we challenge the conventional wisdom of training before pruning by proposing a joint search-and-training approach to learn a compact network directly from scratch. Using pruning as a search strategy, we advocate three new insights for network engineering: 1) to formulate adaptive search as a cold start strategy to find a compact subnetwork on the coarse scale; 2) to automatically learn the threshold for network pruning; and 3) to offer flexibility to choose between efficiency and robustness. More specifically, we propose an adaptive search algorithm in the cold start by exploiting the randomness and flexibility of filter pruning. The weights associated with the network filters will be updated by ThreshNet, a flexible coarse-to-fine pruning method inspired by reinforcement learning. In addition, we introduce a robust pruning strategy leveraging the technique of knowledge distillation through a teacher-student network. Extensive experiments on ResNet and VGGNet have shown that our proposed method can achieve a better balance in terms of efficiency and accuracy and notable advantages over current state-of-the-art pruning methods on several popular datasets, including CIFAR10, CIFAR100, and ImageNet. The code associated with this paper is available at: https://see.xidian.edu.cn/faculty/wsdong/Projects/AST-NP.htm.
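
For readers unfamiliar with the teacher-student knowledge distillation mentioned in this abstract, the sketch below computes the commonly used soft-target loss: the KL divergence between temperature-softened teacher and student output distributions, scaled by the squared temperature. This is the generic formulation, shown under the assumption that a standard distillation term is meant; the abstract does not specify the exact loss, and the function names and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Made-up logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
matched = distillation_loss(teacher, teacher)
mismatched = distillation_loss([0.5, 1.0, 4.0], teacher)
```

A higher temperature spreads probability mass over the non-target classes, so the student also learns from how the teacher ranks the wrong answers. In practice this term is usually combined with an ordinary cross-entropy loss on the true labels.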


Subjects
Algorithms; Learning; Humans; Neural Networks, Computer
13.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 10778-10794, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37023148

ABSTRACT

Image reconstruction from partial observations has attracted increasing attention. Conventional image reconstruction methods with hand-crafted priors often fail to recover fine image details due to the poor representation capability of the hand-crafted priors. Deep learning methods, which attack this problem by directly learning mapping functions between the observations and the targeted images, can achieve much better results. However, most powerful deep networks lack transparency and are nontrivial to design heuristically. This paper proposes a novel image reconstruction method based on the Maximum a Posteriori (MAP) estimation framework using a learned Gaussian Scale Mixture (GSM) prior. Unlike existing unfolding methods that only estimate the image means (i.e., the denoising prior) but neglect the variances, we propose characterizing images by the GSM models with learned means and variances through a deep network. Furthermore, to learn the long-range dependencies of images, we develop an enhanced variant based on the Swin Transformer for learning GSM models. All parameters of the MAP estimator and the deep network are jointly optimized through end-to-end training. Extensive simulation and real data experimental results on spectral compressive imaging and image super-resolution demonstrate that the proposed method outperforms existing state-of-the-art methods.

14.
Neural Netw ; 163: 1-9, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37003110

ABSTRACT

We propose a novel few-shot learning framework that can recognize human-object interaction (HOI) classes with a few labeled samples. We achieve this by leveraging a meta-learning paradigm where human-object interactions are embedded into compact features for similarity calculation. More specifically, spatial and temporal relationships of HOI in videos are constructed with transformers, which boost the performance over the baseline significantly. First, we present a spatial encoder that extracts the spatial context and infers frame-level features of a human and objects in each frame. Then, the video-level feature is obtained by encoding a series of frame-level feature vectors with a temporal encoder. Experiments on two datasets, CAD-120 and Something-Else, validate that our approach achieves 7.8% and 15.2% accuracy improvement on the 1-shot task and 4.7% and 15.7% on the 5-shot task, outperforming the state-of-the-art methods.
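
The "similarity calculation" on compact features can be pictured with a small few-shot classifier: average the few labeled embeddings of each class into a prototype, then assign a query to the class whose prototype is most similar. The sketch below assumes cosine similarity and mean prototypes, which are common choices in metric-based few-shot learning but are not confirmed by the abstract; the class labels and embedding values are made up, and the encoders that would produce real embeddings are omitted.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query: Sequence[float], support: Dict[str, List[Sequence[float]]]) -> str:
    """Return the label whose mean support embedding is most similar to query."""
    if not support:
        raise ValueError("support set must not be empty")
    best_label, best_score = "", -math.inf
    for label, examples in support.items():
        count = len(examples)
        prototype = [sum(column) / count for column in zip(*examples)]
        score = cosine_similarity(query, prototype)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical 2-D video-level embeddings for a 2-way, 2-shot episode.
support_set = {
    "open": [[1.0, 0.1], [0.8, 0.0]],
    "close": [[0.0, 1.0], [0.2, 0.9]],
}
prediction = classify([0.9, 0.2], support_set)
```

Only the support set changes between episodes, so new interaction classes can be recognized without retraining the embedding model, which is the appeal of this family of methods when labels are scarce.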


Subjects
Learning; Visual Perception; Humans
15.
IEEE Trans Image Process ; 31: 4937-4951, 2022.
Article in English | MEDLINE | ID: mdl-35853054

ABSTRACT

Due to the rapid increase in video traffic and relatively limited delivery infrastructure, end users often experience dynamically varying quality over time when viewing streaming videos. The user quality-of-experience (QoE) must be continuously monitored to deliver an optimized service. However, modern approaches for continuous-time video QoE estimation require densely annotating the continuous-time QoE labels, which is labor-intensive and time-consuming. To cope with such limitations, we propose a novel weakly-supervised domain adaptation approach for continuous-time QoE evaluation, by making use of a small amount of continuously labeled data in the source domain and abundant weakly-labeled data (only containing the retrospective QoE labels) in the target domain. Specifically, given a pair of videos from source and target domains, effective spatiotemporal segment-level feature representation is first learned by a combination of 2D and 3D convolutional networks. Then, a multi-task prediction framework is developed to simultaneously achieve continuous-time and retrospective QoE predictions, where a quality attentive adaptation approach is investigated to effectively alleviate the domain discrepancy without hampering the prediction performance. This approach is enabled by explicitly attending to the video-level discrimination and segment-level transferability in terms of the domain discrepancy. Experiments on benchmark databases demonstrate that the proposed method significantly improves the prediction performance under the cross-domain setting.

16.
BMC Bioinformatics ; 23(1): 219, 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35672665

ABSTRACT

BACKGROUND: With the rapid development of high-throughput sequencing technology, the cost of whole genome sequencing has dropped rapidly, which has led to an exponential growth of genome data. How to efficiently compress the DNA data generated by large-scale genome projects has become an important factor restricting the further development of the DNA sequencing industry. Although the compression of DNA bases has achieved significant improvement in recent years, the compression of quality scores is still challenging. RESULTS: In this paper, by reinvestigating the inherent correlations between the quality score and the sequencing process, we propose a novel lossless quality score compressor based on adaptive coding order (ACO). The main objective of ACO is to traverse the quality score adaptively in the most correlative trajectory according to the sequencing process. By cooperating with adaptive arithmetic coding and an improved in-context strategy, ACO achieves state-of-the-art quality score compression performance with moderate complexity for next-generation sequencing (NGS) data. CONCLUSIONS: This competence enables ACO to serve as a candidate tool for quality score compression. ACO has been employed by AVS (Audio Video coding Standard Workgroup of China) and is freely available at https://github.com/Yoniming/ACO.


Subjects
Data Compression; Software; Algorithms; DNA; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
17.
Front Neurosci ; 16: 904623, 2022.
Article in English | MEDLINE | ID: mdl-35712457

ABSTRACT

Visual experience modulates the intensity of evoked brain activity in response to training-related stimuli. Spontaneous fluctuations in the restful brain actively encode previous learning experience. However, few studies have considered how real-world visual experience alters the level of baseline brain activity in the resting state. This study aimed to investigate how short-term real-world visual experience modulates baseline neuronal activity in the resting state using the amplitude of low-frequency (<0.08 Hz) fluctuation (ALFF) and a visual expertise model of radiologists, who possess fine-level visual discrimination skill of homogeneous stimuli. In detail, a group of intern radiologists (n = 32) were recruited. The resting-state fMRI data and the behavioral data regarding their level of visual expertise in radiology and face recognition were collected before and after 1 month of training in the X-ray department in a local hospital. A machine learning analytical method, i.e., support vector machine, was used to identify subtle changes in the level of baseline brain activity. Our method led to a superb classification accuracy of 86.7% between conditions. The brain regions with highest discriminative power were the bilateral cingulate gyrus, the left superior frontal gyrus, the bilateral precentral gyrus, the bilateral superior parietal lobule, and the bilateral precuneus. To the best of our knowledge, this study is the first to investigate baseline neurodynamic alterations in response to real-world visual experience using longitudinal experimental design. These results suggest that real-world visual experience alters the resting-state brain representation in multidimensional neurobehavioral components, which are closely interrelated with high-order cognitive and low-order visual factors, i.e., attention control, working memory, memory, and visual processing. 
We propose that our findings are likely to help foster new insights into the neural mechanisms of visual expertise.

18.
IEEE Trans Image Process ; 31: 3920-3934, 2022.
Article in English | MEDLINE | ID: mdl-35635813

ABSTRACT

Attention mechanisms have been extensively adopted in vision and language tasks such as image captioning. They encourage a captioning model to dynamically ground appropriate image regions when generating words or phrases, which is critical to alleviating the problems of object hallucinations and language bias. However, current studies show that the grounding accuracy of existing captioners is still far from satisfactory. Recently, much effort has been devoted to improving the grounding accuracy by linking the words to the full content of objects in images. However, due to the noisy grounding annotations and large variations of object appearance, such strict word-object alignment regularization may not be optimal for improving captioning performance. In this paper, to improve the performance of both grounding and captioning, we propose a novel grounding model which implicitly links the words to the evidence in the image. The proposed model encourages the captioner to dynamically focus on informative regions of the objects, which could be either discriminative parts or full object content. With slacked constraints, the proposed captioning model can capture correct linguistic characteristics and visual relevance, and then generate more grounded image captions. In addition, we propose a novel quantitative metric for evaluating the correctness of the soft attention mechanism by considering the overall contribution of all object proposals when generating certain words. The proposed grounding model can be seamlessly plugged into most attention-based architectures without introducing inference complexity. We conduct extensive experiments on the Flickr30k (Young et al., 2014) and MS COCO (Lin et al., 2014) datasets, demonstrating that the proposed method consistently improves image captioning in both grounding and captioning. In addition, the proposed attention evaluation metric shows better consistency with the captioning performance.


Subjects
Language; Data Collection
19.
J Neural Eng ; 19(3), 2022 May 27.
Article in English | MEDLINE | ID: mdl-35523129

ABSTRACT

Electroencephalogram (EEG)-based affective computing brain-computer interfaces provide the capability for machines to understand human intentions. In practice, people are more concerned with the strength of a certain emotional state over a short period of time, which is called fine-grained-level emotion in this paper. In this study, we built a fine-grained-level emotion EEG dataset that contains two coarse-grained emotions and four corresponding fine-grained-level emotions. To fully extract the features of the EEG signals, we proposed a corresponding fine-grained emotion EEG network (FG-emotionNet) for spatial-temporal feature extraction. Each feature extraction layer is linked to raw EEG signals to alleviate overfitting and ensure that the spatial features of each scale can be extracted from the raw signals. Moreover, all previous scale features are fused before the current spatial-feature layer to enhance the scale features in the spatial block. Additionally, long short-term memory is adopted as the temporal block to extract the temporal features based on spatial features and classify the category of fine-grained emotions. Subject-dependent and cross-session experiments demonstrated that the performance of the proposed method is superior to that of representative emotion recognition methods and of methods with a structure similar to the proposed one.


Subjects
Brain-Computer Interfaces; Electroencephalography; Electroencephalography/methods; Emotions; Humans; Intention
20.
IEEE Trans Image Process ; 31: 3578-3590, 2022.
Article in English | MEDLINE | ID: mdl-35511851

ABSTRACT

Blind image quality assessment (BIQA), which is capable of precisely and automatically estimating human perceived image quality with no pristine image for comparison, attracts extensive attention and has wide applications. Most existing BIQA methods represent image quality with a quantitative value, which is inconsistent with human cognition. Generally, human beings are good at perceiving image quality in terms of semantic description rather than quantitative value. Moreover, cognition is a needs-oriented task where humans are able to extract image contents with local to global semantics as they need. A single quality value represents only coarse or holistic image quality and fails to reflect degradation on hierarchical semantics. In this paper, to comply with human cognition, a novel quality caption model is proposed to measure fine-grained image quality with hierarchical semantics degradation. Research on the human visual system indicates there are hierarchy and reverse hierarchy correlations between hierarchical semantics. Meanwhile, empirical evidence shows that there are also bi-directional degradation dependencies between them. Thus, a novel bi-directional relationship-based network (BDRNet) is proposed for semantics degradation description, through adaptively exploring those correlations and degradation dependencies in a bi-directional manner. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of both evaluation performance and generalization ability.


Subjects
Cognition; Semantics; Humans