Results 1 - 20 of 51
1.
Opt Express ; 32(11): 18527-18538, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38859006

ABSTRACT

Dynamic range (DR) is a pivotal characteristic of imaging systems. Current frame-based cameras struggle to achieve high dynamic range imaging due to the conflict between globally uniform exposure and spatially variant scene illumination. In this paper, we propose AsynHDR, a pixel-asynchronous HDR imaging system, based on key insights into the challenges in HDR imaging and the unique event-generating mechanism of dynamic vision sensors (DVS). Our proposed AsynHDR system integrates the DVS with a set of LCD panels. The LCD panels modulate the irradiance incident upon the DVS by altering their transparency, thereby triggering the pixel-independent event streams. The HDR image is subsequently decoded from the event streams through our temporal-weighted algorithm. Experiments under the standard test platform and several challenging scenes have verified the feasibility of the system in HDR imaging tasks.

2.
Acta Pharmacol Sin ; 44(1): 105-119, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35732707

ABSTRACT

Hederacoside C (HSC) has attracted much attention as a novel modulator of inflammation, but its anti-inflammatory mechanism remains elusive. In the present study, we investigated how HSC attenuated intestinal inflammation in vivo and in vitro. HSC injection significantly alleviated TNBS-induced colitis by inhibiting pro-inflammatory cytokine production and colonic epithelial cell apoptosis, and partially restored colonic epithelial cell proliferation. The therapeutic effect of HSC injection was comparable to that of oral administration of mesalazine (200 mg·kg⁻¹·d⁻¹, i.g.). In LPS-stimulated human intestinal epithelial Caco-2 cells, pretreatment with HSC (0.1, 1, 10 µM) significantly inhibited activation of MAPK/NF-κB and its downstream signaling pathways. Pretreatment with HSC prevented LPS-induced TLR4 dimerization and MyD88 recruitment in vitro. Quantitative proteomic analysis revealed that HSC injection regulated 18 proteins in the colon samples, mainly clustered in neutrophil degranulation. Among them, S100A9, which is involved in the degranulation of neutrophils, was one of the most significantly down-regulated proteins. HSC suppressed the expression of S100A9 and its downstream genes, including those of the TLR4, MAPK, and NF-κB axes, in the colon. In Caco-2 cells, recombinant S100A9 protein activated the MAPK/NF-κB signaling pathway and induced inflammation, both of which were ameliorated by pretreatment with HSC. Notably, HSC attenuated neutrophil recruitment and degranulation as well as S100A9 release in vitro and in vivo. In addition, HSC promoted the expression of tight junction proteins and repaired the epithelial barrier by inhibiting S100A9. Our results verify that HSC ameliorates colitis by restoring the impaired intestinal barrier through moderating S100A9/MAPK signaling and inactivating neutrophil recruitment, suggesting that HSC is a promising therapeutic candidate for colitis.


Subjects
Colitis , NF-kappa B , Humans , NF-kappa B/metabolism , Caco-2 Cells , Calgranulin B/adverse effects , Neutrophil Infiltration , Toll-Like Receptor 4/metabolism , Lipopolysaccharides/pharmacology , Proteomics , Cytokines/metabolism , Colitis/chemically induced , Colitis/drug therapy , Colitis/metabolism , Inflammation
3.
Mediators Inflamm ; 2022: 9241261, 2022.
Article in English | MEDLINE | ID: mdl-35865997

ABSTRACT

Methods: The potential active ingredients and corresponding potential targets of BSYS Capsule were obtained from the TCMSP, BATMAN-TCM, Swiss Target Prediction platform, and literature research. Disease targets of CNSD were explored through the GeneCards and the DisGeNET databases. The matching targets of BSYS in CNSD were identified from a Venn diagram. The protein-protein interaction (PPI) network was constructed using bioinformatics methods. Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to predict the mechanisms of BSYS. Furthermore, the neuroprotective effects of BSYS were evaluated using a cell model of hydrogen peroxide- (H2O2-) induced cell death in OLN-93 cells. Results: A total of 59 potential bioactive components of BSYS Capsule and 227 intersection targets were obtained. Topological analysis showed that AKT had the highest connectivity degrees in the PPI network. Enrichment analysis revealed that the targets of BSYS in the treatment of CNSD were the PI3K-Akt and MAPK signaling pathway, among other pathways. GO analysis results showed that the targets were associated with various biological processes, including apoptosis, reactive oxygen species metabolic process, and response to oxidative stress, among others. The experimental results demonstrated that BSYS drug-containing serum alleviated the H2O2-induced increase in LDH, MDA, and ROS levels and reversed the decrease in SOD and mitochondrial membrane potential induced by H2O2. BSYS treatment also decreased the number of TUNEL (+) cells, downregulated Bcl-2 expression, and upregulated Bax and c-caspase-3 expression by promoting Akt phosphorylation. Conclusion: BSYS Capsule alleviated H2O2-induced OLN-93 cell injury by increasing Akt phosphorylation to suppress oxidative stress and cell apoptosis. Therefore, BSYS can be potentially used for CNSD treatment. 
However, the results of this study are only derived from in vitro experiments, lacking the validation of in vivo animal models, which is a limitation of our study. We will further verify the underlying mechanisms of BSYS in animal experiments in the future.


Subjects
Chinese Herbal Drugs , Traditional Chinese Medicine , Animals , Central Nervous System , Chinese Herbal Drugs/therapeutic use , Hydrogen Peroxide/pharmacology , Traditional Chinese Medicine/methods , Network Pharmacology , Phosphatidylinositol 3-Kinases , Proto-Oncogene Proteins c-akt
4.
Nanotechnology ; 31(46): 465102, 2020 Nov 13.
Article in English | MEDLINE | ID: mdl-32857735

ABSTRACT

The biological effects of nanoparticles are of great importance for the in-depth understanding of safety issues in biomedical applications. Induction of autophagy is a cellular response after nanoparticle exposure. Bismuth sulfide nanoparticles (Bi2S3 NPs) are often used as a CT contrast agent because of their excellent photoelectric conversion ability. Yet there has been no previous detailed study other than a cell toxicity assessment. In this study, three types of Bi2S3 NPs with different shapes (Bi2S3 nano rods (BSNR), hollow microsphere Bi2S3 NPs (BSHS) and urchin-like hollow microsphere Bi2S3 NPs (ULBSHS)) were used to evaluate cytotoxicity, autophagy induction, cell migration and invasion in human hepatocellular carcinoma cells (HepG2). Results showed that all three Bi2S3 NPs lead to blockage in autophagic flux, causing p62 protein accumulation. The cell death caused by these Bi2S3 NPs proved to be autophagy related, rather than related to apoptosis. Moreover, Bi2S3 NPs can reduce migration and invasion in HepG2 cells in an autophagy-dependent manner. ULBSHS is the most cytotoxic of the three Bi2S3 NPs and has the best tumor metastasis suppression. These results demonstrated that, even with the relatively low toxicity of Bi2S3 NPs, autophagy blockage may still substantially influence cell fate and thus significantly impact their biomedical applications, and that surface topography is a key factor regulating their biological response.


Subjects
Autophagy/drug effects , Bismuth/adverse effects , Cell Movement/drug effects , Cytotoxins/adverse effects , Nanoparticles/adverse effects , Sulfides/adverse effects , Bismuth/chemistry , Bismuth/toxicity , Cytotoxins/chemistry , Cytotoxins/toxicity , Hep G2 Cells , Humans , Nanoparticles/chemistry , Nanoparticles/toxicity , Sulfides/chemistry , Sulfides/toxicity
6.
Nucleic Acids Res ; 40(Database issue): D1082-8, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22080565

ABSTRACT

In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.


Subjects
Caenorhabditis elegans/genetics , Genetic Databases , Drosophila melanogaster/genetics , Animals , Gene Expression , Helminth Genome , Insect Genome , Genomics , Internet , User-Computer Interface
7.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4926-4943, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38349824

ABSTRACT

Change captioning aims to describe the semantic change between two similar images. In this process, viewpoint change, the most typical distractor, leads to pseudo changes in the appearance and position of objects, thereby overwhelming the real change. Besides, since the visual signal of change appears in a local region with weak features, it is difficult for the model to directly translate the learned change features into a sentence. In this paper, we propose a syntax-calibrated multi-aspect relation transformer to learn effective change features under different scenes, and build reliable cross-modal alignment between the change features and linguistic words during caption generation. Specifically, a multi-aspect relation learning network is designed to 1) explore the fine-grained changes under irrelevant distractors (e.g., viewpoint change) by embedding the relations of semantics and relative position into the features of each image; 2) learn two view-invariant image representations by strengthening their global contrastive alignment relation, so as to help capture a stable difference representation; 3) provide the model with prior knowledge about whether and where the semantic change happened by measuring the relation between the representations of the captured difference and the image pair. In this manner, the model can learn effective change features for caption generation. Further, we introduce the syntax knowledge of Part-of-Speech (POS) and devise a POS-based visual switch to calibrate the transformer decoder. The POS-based visual switch dynamically utilizes visual information during the generation of different words based on their POS. This enables the decoder to build reliable cross-modal alignment, so as to generate a high-level linguistic sentence about the change. Extensive experiments show that the proposed method achieves state-of-the-art performance on three public datasets.

8.
IEEE Trans Image Process ; 33: 625-638, 2024.
Article in English | MEDLINE | ID: mdl-38198242

ABSTRACT

Modeling the effect of reflection is crucial for the single image reflection removal (SIRR) task. Modern SIRR methods usually simplify the reflection formulation by assuming a linear combination of a transmission layer and a reflection layer. However, the large variations in image content and real-world picture-taking conditions often result in far more complex reflection. In this paper, we introduce a new screen-blur combination based on two important factors, namely the intensity and the blurriness of reflection, to better characterize the reflection formulation in SIRR. Specifically, we present Screen-blur Reflection Networks (SRNet), which executes the screen-blur formulation in its network design and adapts to the complex reflection in real scenes. Technically, SRNet consists of three components: a blended image generator, a reflection estimator and a reflection removal module. The image generator exploits the screen-blur combination to synthesize the training blended images. The reflection estimator learns the reflection layer and a blur degree that measures the level of blurriness of the reflection. The reflection removal module further uses the blended image, blur degree and reflection layer to filter out the transmission layer in a cascaded manner. Superior results are reported for three different SIRR methods when their training data are generated on the principle of the screen-blur combination. Moreover, extensive experiments on six datasets quantitatively and qualitatively demonstrate the efficacy of SRNet over state-of-the-art methods.

9.
IEEE Trans Image Process ; 33: 1938-1951, 2024.
Article in English | MEDLINE | ID: mdl-38224517

ABSTRACT

Generalized Zero-Shot Learning (GZSL) aims at recognizing images from both seen and unseen classes by constructing correspondences between visual images and semantic embeddings. However, existing methods suffer from a strong bias problem, where unseen images in the target domain tend to be recognized as seen classes in the source domain. To address this issue, we propose a Prototype-augmented Self-supervised Generative Network by integrating self-supervised learning and prototype learning into a feature generating model for GZSL. The proposed model enjoys several advantages. First, we propose a Self-supervised Learning Module to exploit inter-domain relationships, where we introduce anchors as a bridge between seen and unseen categories. In the shared space, we pull the distribution of the target domain away from the source domain and obtain domain-aware features. To the best of our knowledge, this is the first work to introduce self-supervised learning into GZSL as learning guidance. Second, a Prototype Enhancing Module is proposed to utilize class prototypes to model a reliable target domain distribution at finer granularity. In this module, a Prototype Alignment mechanism and a Prototype Dispersion mechanism are combined to guide the generation of better target class features with intra-class compactness and inter-class separability. Extensive experimental results on five standard benchmarks demonstrate that our model performs favorably against state-of-the-art GZSL methods.

10.
Mol Cancer Res ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38747975

ABSTRACT

Small-cell lung cancer (SCLC) accounts for nearly 15% of all lung cancers. Although patients respond to first-line therapy readily, rapid relapse is inevitable, with few treatment options in the second-line setting. Here, we describe that SCLC cell lines harboring amplification of MYC and MYCN, but not MYCL1-amplified or MYC non-amplified cell lines, exhibit superior sensitivity to treatment with the pan-BET bromodomain protein inhibitor mivebresib (ABBV-075). Silencing MYC and MYCN partially rescued SCLC cell lines harboring these respective amplifications from the anti-proliferative effects of mivebresib. Further characterization of genome-wide binding of MYC, MYCN, and MYCL1 uncovered unique enhancer and epigenetic preferences. Implications: Our study suggests that chromatin landscapes could establish cell states with unique gene expression programs, conveying sensitivity to epigenetic inhibitors such as mivebresib.

11.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 12978-12995, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35709118

ABSTRACT

Existing deep learning-based de-raining approaches have resorted to convolutional architectures. However, the intrinsic limitations of convolution, including local receptive fields and independence of input content, hinder the model's ability to capture long-range and complicated rainy artifacts. To overcome these limitations, we propose an effective and efficient transformer-based architecture for image de-raining. First, we introduce general priors of vision tasks, i.e., locality and hierarchy, into the network architecture so that our model can achieve excellent de-raining performance without costly pre-training. Second, since the geometric appearance of rainy artifacts is complicated and varies significantly in space, it is essential for de-raining models to extract both local and non-local features. Therefore, we design the complementary window-based transformer and spatial transformer to enhance locality while capturing long-range dependencies. Besides, to compensate for the positional blindness of self-attention, we establish a separate representative space for modeling positional relationships, and design a new relative position enhanced multi-head self-attention. In this way, our model enjoys powerful abilities to capture dependencies from both content and position, so as to achieve better image content recovery while removing rainy artifacts. Experiments substantiate that our approach attains more appealing results than state-of-the-art methods quantitatively and qualitatively.

12.
Article in English | MEDLINE | ID: mdl-37467094

ABSTRACT

Audiovisual event localization aims to localize the event that is both visible and audible in a video. Previous works focus on segment-level audio and visual feature sequence encoding and neglect the event proposals and boundaries, which are crucial for this task. The event proposal features provide event internal consistency between several consecutive segments constructing one proposal, while the event boundary features offer event boundary consistency to make segments located at boundaries be aware of the event occurrence. In this article, we explore the proposal-level feature encoding and propose a novel context-aware proposal-boundary (CAPB) network to address audiovisual event localization. In particular, we design a local-global context encoder (LGCE) to aggregate local-global temporal context information for visual sequence, audio sequence, event proposals, and event boundaries, respectively. The local context from temporally adjacent segments or proposals contributes to event discrimination, while the global context from the entire video provides semantic guidance of temporal relationship. Furthermore, we enhance the structural consistency between segments by exploiting the above-encoded proposal and boundary representations. CAPB leverages the context information and structural consistency to obtain context-aware event-consistent cross-modal representation for accurate event localization. Extensive experiments conducted on the audiovisual event (AVE) dataset show that our approach outperforms the state-of-the-art methods by clear margins in both supervised event localization and cross-modality localization.

13.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7711-7725, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37015417

ABSTRACT

We study the problem of localizing audio-visual events that are both audible and visible in a video. Existing works focus on encoding and aligning audio and visual features at the segment level while neglecting informative correlation between segments of the two modalities and between multi-scale event proposals. We propose a novel Semantic and Relation Modulation Network (SRMN) to learn the above correlation and leverage it to modulate the related auditory, visual, and fused features. In particular, for semantic modulation, we propose intra-modal normalization and cross-modal normalization. The former modulates features of a single modality with the event-relevant semantic guidance of the same modality. The latter modulates features of two modalities by establishing and exploiting the cross-modal relationship. For relation modulation, we propose a multi-scale proposal modulating module and a multi-alignment segment modulating module to introduce multi-scale event proposals and enable dense matching between cross-modal segments, which strengthen correlations between successive segments within one proposal and between all segments. With the features modulated by the correlation information regarding audio-visual events, SRMN performs accurate event localization. Extensive experiments conducted on the public AVE dataset demonstrate that our method outperforms the state-of-the-art methods in both supervised event localization and cross-modality localization tasks.

14.
Article in English | MEDLINE | ID: mdl-37943649

ABSTRACT

With high temporal resolution, high dynamic range, and low latency, event cameras have made great progress in numerous low-level vision tasks. To help restore low-quality (LQ) video sequences, most existing event-based methods employ convolutional neural networks (CNNs) to extract sparse event features without considering the spatially sparse distribution or the temporal relation in neighboring events. This results in insufficient use of the spatial and temporal information from events. To address this problem, we propose a new spiking-convolutional network (SC-Net) architecture to facilitate event-driven video restoration. Specifically, to properly extract the rich temporal information contained in the event data, we utilize a spiking neural network (SNN) to suit the sparse characteristics of events and capture temporal correlation in neighboring regions; to make full use of spatial consistency between events and frames, we adopt CNNs to transform sparse events into an extra brightness prior that is aware of detailed textures in video sequences. In this way, both the temporal correlation in neighboring events and the mutual spatial information between the two types of features are fully explored and exploited to accurately restore detailed textures and sharp edges. The effectiveness of the proposed network is validated in three representative video restoration tasks: deblurring, super-resolution, and deraining. Extensive experiments on synthetic and real-world benchmarks demonstrate that our method performs better than existing competing methods.

15.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3003-3018, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35759595

ABSTRACT

Weakly supervised Referring Expression Grounding (REG) aims to ground a particular target in an image described by a language expression while lacking the correspondence between target and expression. Two main problems exist in weakly supervised REG. First, the lack of region-level annotations introduces ambiguities between proposals and queries. Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects. To address the above challenges, we design an entity-enhanced adaptive reconstruction network (EARN). Specifically, EARN includes three modules: entity enhancement, adaptive grounding, and collaborative reconstruction. In entity enhancement, we calculate semantic similarity as supervision to select the candidate proposals. Adaptive grounding calculates the ranking score of candidate proposals upon subject, location and context with hierarchical attention. Collaborative reconstruction measures the ranking result from three perspectives: adaptive reconstruction, language reconstruction and attribute classification. The adaptive mechanism helps to alleviate the variance of different referring expressions. Experiments on five datasets show EARN outperforms existing state-of-the-art methods. Qualitative results demonstrate that the proposed EARN can better handle the situation where multiple objects of a particular category are situated together.

16.
Article in English | MEDLINE | ID: mdl-37220051

ABSTRACT

Reflection from glass is ubiquitous in daily life, but it is usually undesirable in photographs. To remove this unwanted noise, existing methods utilize either correlative auxiliary information or handcrafted priors to constrain this ill-posed problem. However, due to their limited capability to describe the properties of reflections, these methods are unable to handle strong and complex reflection scenes. In this article, we propose a hue guidance network (HGNet) with two branches for single image reflection removal (SIRR) by integrating image information and corresponding hue information. The complementarity between image information and hue information has not previously been noticed. The key to this idea is our finding that hue information can describe reflections well and thus can be used as a superior constraint for the specific SIRR task. Accordingly, the first branch extracts the salient reflection features by directly estimating the hue map. The second branch leverages these effective features, which can help locate salient reflection regions, to obtain a high-quality restored image. Furthermore, we design a new cyclic hue loss to provide a more accurate optimization direction for the network training. Experiments substantiate the superiority of our network, especially its excellent generalization ability to various reflection scenes, as compared with state-of-the-art methods both qualitatively and quantitatively. Source code is available at https://github.com/zhuyr97/HGRR.

17.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9534-9551, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022385

ABSTRACT

Image deraining is a challenging task since rain streaks have a spatially long structure and a complex diversity. Existing deep learning-based methods mainly construct the deraining networks by stacking vanilla convolutional layers with local relations, and can only handle a single dataset due to catastrophic forgetting, resulting in limited performance and insufficient adaptability. To address these issues, we propose a new image deraining framework to effectively explore nonlocal similarity, and to continuously learn on multiple datasets. Specifically, we first design a patchwise hypergraph convolutional module, which aims to better extract the nonlocal properties with higher-order constraints on the data, to construct a new backbone and to improve the deraining performance. Then, to achieve better generalizability and adaptability in real-world scenarios, we propose a biological brain-inspired continual learning algorithm. By imitating the plasticity mechanism of brain synapses during the learning and memory process, our continual learning process allows the network to achieve a subtle stability-plasticity tradeoff. This effectively alleviates catastrophic forgetting and enables a single network to handle multiple datasets. Compared with the competitors, our new deraining network with unified parameters attains state-of-the-art performance on seen synthetic datasets and has significantly improved generalizability on unseen real rainy images.


Subjects
Algorithms , Brain , Memory
18.
Food Sci Nutr ; 10(4): 1058-1069, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35432973

ABSTRACT

Diabetes mellitus (DM) is a chronic disorder associated with severe metabolic derangement and comorbidities. The constant increase in the global population of diabetic patients, coupled with the side effects associated with synthetic antidiabetic drugs, has made the search for alternative antidiabetic regimens urgent. This study investigated the antidiabetic, antioxidant, and pancreatic protective effects of Acacia pennata extract (APE) against nicotinamide/streptozotocin-induced DM in rats. The antidiabetic activity of APE was evaluated at doses of 100 and 400 mg/kg body weight, while metformin (150 mg/kg bw) was used as a standard drug. APE markedly decreased blood glucose level, homeostatic model assessment for insulin resistance, serum total cholesterol, triglycerides, low-density lipoprotein, blood urea nitrogen, creatinine, alanine transaminase, aspartate transaminase, and alkaline phosphatase levels. Additionally, treatment with APE increased body weight, serum insulin concentration, and high-density lipoprotein. Moreover, activities of pancreatic superoxide dismutase, catalase, and glutathione peroxidase were increased, while the altered pancreatic architecture seen in the histopathological examination was notably restored in the treated rats. Ultra-high performance liquid chromatography combined with electrospray ionization quadrupole time-of-flight mass spectrometry (UHPLC-ESI-QTOF-MS) analysis of APE showed a predominance of polyphenolic compounds. In conclusion, this study showed the beneficial effects of Acacia pennata in controlling metabolic derangement and pancreatic and hepatorenal dysfunction in diabetic rats.

19.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 710-722, 2022 Feb.
Article in English | MEDLINE | ID: mdl-30969916

ABSTRACT

With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning. In particular, we are interested in generating longer, richer and more fine-grained sentences and paragraphs as image descriptions. Image captioning can be translated to the task of sequential language prediction given visual content, where the output sequence forms a natural language description with plausible grammar. However, existing image captioning methods focus only on language policy and not on visual policy, and thus fail to capture visual contexts that are crucial for compositional reasoning such as object relationships (e.g., "man riding horse") and visual comparisons (e.g., "small(er) cat"). This issue is especially severe when generating longer sequences such as a paragraph. To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for fine-grained image-to-language generation: image sentence captioning and image paragraph captioning. During captioning, CAVP explicitly considers the previous visual attentions as context, and decides whether the context is used for the current word/sentence generation given the current visual attention. Compared with the traditional visual attention mechanism, which only fixes a single visual region at each step, CAVP can attend to complex visual compositions over time. The whole image captioning model (CAVP and its subsequent language policy network) can be efficiently optimized end-to-end by using an actor-critic policy gradient method. We have demonstrated the effectiveness of CAVP by state-of-the-art performance on the MS-COCO and Stanford captioning datasets, using various metrics and sensible visualizations of qualitative visual context.


Subjects
Algorithms , Policy , Animals , Horses , Humans
20.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6802-6816, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34081590

ABSTRACT

Deep learning-based methods have achieved notable progress in removing blocking artifacts caused by lossy JPEG compression on images. However, most deep learning-based methods handle this task by designing black-box network architectures to directly learn the relationships between the compressed images and their clean versions. These network architectures lack sufficient interpretability, which limits further improvements in deblocking performance. To address this issue, in this article, we propose a model-driven deep unfolding method for JPEG artifact removal, with interpretable network structures. First, we build a maximum a posteriori (MAP) model for deblocking using convolutional dictionary learning and design an iterative optimization algorithm using proximal operators. Second, we unfold this iterative algorithm into a learnable deep network structure, where each module corresponds to a specific operation of the iterative algorithm. In this way, our network inherits the benefits of both the powerful modeling ability of data-driven deep learning methods and the interpretability of traditional model-driven methods. By training the proposed network in an end-to-end manner, all learnable modules can be automatically explored to characterize the representations of both JPEG artifacts and image content well. Experiments on synthetic and real-world datasets show that our method is able to generate competitive or even better deblocking results, compared with state-of-the-art methods both quantitatively and qualitatively.
