1.
Article in English | MEDLINE | ID: mdl-38593016

ABSTRACT

Few-shot single-view 3D reconstruction learns to reconstruct objects of novel categories from a query image and a few support shapes. However, since the query image and the support shapes are of different modalities, there is an inherent feature misalignment problem that damages the reconstruction. Previous works in the literature do not consider this problem. To this end, we propose the cross-modal feature alignment network (CMFAN) with two novel techniques. One is a strategy for model pretraining, namely, cross-modal contrastive learning (CMCL), where the 2D images and 3D shapes of the same objects compose the positives, and those from different objects form the negatives. With CMCL, the model learns to embed the 2D and 3D modalities of the same object into a tight area in the feature space and push away those from different objects, thus effectively aligning the global cross-modal features. The other is cross-modal feature fusion (CMFF), which further aligns and fuses the local features. Specifically, it first re-represents the local features with the cross-attention operation, making the local features share more information. Then, CMFF generates a descriptor for the support features and attaches it to each local feature vector of the query image with dense concatenation. Moreover, CMFF can be applied to multilevel local features and brings further advantages. We conduct extensive experiments to evaluate the effectiveness of our designs, and CMFAN sets new state-of-the-art performance in all of the 1-/10-/25-shot tasks of the ShapeNet and ModelNet datasets.
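
The contrastive pretraining idea in this abstract (matching image/shape pairs as positives, all other pairings as negatives) can be illustrated with a small sketch. The abstract does not give the loss, so the InfoNCE-style form, the `temperature` value, and all function names below are assumptions for illustration, not the authors' implementation. Only the image-to-shape direction is shown.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_contrastive_loss(image_feats, shape_feats, temperature=0.1):
    """InfoNCE-style loss (assumed form): image i and shape i are a positive pair,
    every other shape in the batch acts as a negative for image i."""
    images = [l2_normalize(v) for v in image_feats]
    shapes = [l2_normalize(v) for v in shape_feats]
    total = 0.0
    for i, img in enumerate(images):
        logits = [dot(img, shp) / temperature for shp in shapes]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]  # negative log-softmax of the positive
    return total / len(images)
```

The loss is small when each image embedding lies closest to the embedding of its own shape and grows when the pairs are mismatched, which is the "pull together, push apart" behaviour the abstract describes.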

2.
Ecotoxicol Environ Saf ; 275: 116206, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38518608

ABSTRACT

Although the association between changes in human telomere length (TL) and ambient fine particulate matter (PM2.5) has been documented, there remains disagreement among the related literature. Our study conducted a systematic review and meta-analysis of epidemiological studies to investigate the health effects of outdoor PM2.5 exposure on human TL after a thorough database search. To quantify the overall effect estimates of TL changes associated with every 10 µg/m³ increase in PM2.5 exposure, we focused on two main topics: outdoor long-term exposure and prenatal exposure to PM2.5. Because data on short-term PM2.5 exposure were limited, we included only a summary of its impact on TL. Our qualitative analysis included 20 studies with 483,600 participants. The meta-analysis showed a statistically significant association between outdoor PM2.5 exposure and shorter human TL, with pooled impact estimates (β) of -0.12 (95% CI: -0.20, -0.03; I² = 95.4%) for general long-term exposure and -0.07 (95% CI: -0.15, 0.00; I² = 74.3%) for prenatal exposure. In conclusion, our findings suggest that outdoor PM2.5 exposure may contribute to TL shortening, and noteworthy associations were observed in specific subgroups, suggesting the impact of various research variables. Larger, high-quality studies using standardized methodologies are necessary to strengthen these conclusions further.
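
For readers unfamiliar with how a pooled estimate, its confidence interval, and I² are obtained, the following is a minimal sketch. The abstract does not state the pooling model, so the DerSimonian-Laird random-effects estimator and the function name are assumptions; the inputs are per-study estimates and standard errors supplied by the caller, not data from this study.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-study effect estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, (ci_low, ci_high), I^2 in percent).
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q measures how far studies scatter around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # share of variation due to heterogeneity
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, i2
```

A high I², such as the values reported above, means most of the observed variation between studies reflects real heterogeneity rather than sampling error, which is why a random-effects model is the usual choice in that situation.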


Subjects
Air Pollutants , Air Pollution , Prenatal Exposure Delayed Effects , Female , Pregnancy , Humans , Particulate Matter/toxicity , Particulate Matter/analysis , Air Pollution/analysis , Telomere Shortening , Telomere , Air Pollutants/toxicity , Air Pollutants/analysis , Environmental Exposure/adverse effects , Environmental Exposure/analysis
3.
J Colloid Interface Sci ; 659: 621-628, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38198939

ABSTRACT

The electrocatalytic 5-hydroxymethylfurfural (HMF) oxidation reaction coupled with the hydrogen evolution reaction (HER) is a promising strategy to generate both high-value-added products and clean energy, but it is limited by the poor catalytic efficiency of bifunctional electrocatalysts and the unclear electrocatalytic mechanism of the HMF oxidation reaction. Herein, we fabricate a bifunctional NiSe2-NiMoO4 heterostructure nanowire electrocatalyst for the conversion of HMF to 2,5-furandicarboxylic acid (FDCA) and simultaneous H2 production. As expected, the NiSe2-NiMoO4 exhibits outstanding activity and selectivity toward the HMF oxidation reaction. In particular, at a potential of 1.50 V, the yield of FDCA could reach 98% with a faradaic efficiency of 96.5%, as well as excellent stability. Density functional theory calculation results demonstrate that the NiSe2-NiMoO4 heterostructure could tune the adsorption energy of HMF, facilitate high-valence active species formation, and enhance electronic conductivity. Furthermore, a two-electrode electrolyzer assembled using NiSe2-NiMoO4 as a bifunctional catalyst requires 1.53 V to reach a current density of 50 mA cm⁻², which is 201 mV lower than that of water electrolysis. This work provides new insights for designing multifunctional catalysts for biomass upgrading coupled with hydrogen evolution.
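
The faradaic efficiency quoted above is the fraction of the charge passed that ends up in the desired product. A minimal sketch of that calculation follows; it relies on the standard stoichiometry that oxidizing HMF to FDCA transfers six electrons per molecule. The function and parameter names are illustrative assumptions, and no measured values from the paper are used.

```python
FARADAY = 96485.33212  # Faraday constant in C per mole of electrons
ELECTRONS_HMF_TO_FDCA = 6  # electrons transferred per HMF molecule oxidized to FDCA


def faradaic_efficiency(moles_product, charge_coulombs, electrons=ELECTRONS_HMF_TO_FDCA):
    """Percentage of the passed charge that was used to form the product."""
    charge_needed = moles_product * electrons * FARADAY
    return charge_needed / charge_coulombs * 100.0
```

A value below 100% indicates that part of the charge went into side reactions, for example competing oxygen evolution at the anode.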

4.
Front Public Health ; 11: 1192517, 2023.
Article in English | MEDLINE | ID: mdl-37693713

ABSTRACT

Introduction: Shift work has become an increasingly common work mode globally. This study aimed to investigate the association between shift work and the risk of incident gastroesophageal reflux disease (GORD), a gastrointestinal disorder common worldwide, and to explore the mediating factors. Method: A total of 262,722 participants from the UK Biobank free of GORD and related gastrointestinal diseases were included to investigate the association and potential mediators between shift work and incident GORD. Multivariate-adjusted Cox models were used to evaluate the association between shift work status and GORD incidence. Results: Compared to non-shift workers, shift workers had a 1.10-fold greater risk of incident GORD [95% confidence interval (CI): 1.03, 1.18], after adjusting for a range of potential confounders. However, the excess risk of GORD attenuated to the null after further adjusting for selected mediators. Specifically, the association was mediated by sleep patterns (25.7%), healthy behaviors (16.8%), depressive symptoms (20.2%), chronic conditions (13.3%), and biological factors (17.6%). After adjustment for all the mediators together, the association was attenuated by 71.5%. Discussion: Our findings indicated that long-term shift workers may have a higher risk of incident GORD, yet the excess risk may be explained by poor sleep quality, unhealthy behaviors, depressive symptoms, etc. This has positive implications for protecting the health of shift workers.


Subjects
Gastroesophageal Reflux , Shift Work Schedule , Humans , Shift Work Schedule/adverse effects , Gastroesophageal Reflux/epidemiology , Health Behavior , Sleep Quality
5.
BMC Genomics ; 24(1): 393, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37442977

ABSTRACT

BACKGROUND: Due to the dynamic nature of enhancers, identifying enhancers and their strength is a major bioinformatics challenge. With the development of deep learning, several models have facilitated enhancer detection in recent years. However, existing studies either neglect motif information of different lengths or treat the features at all spatial locations equally. How to effectively use multi-scale motif information while ignoring irrelevant information is a question worthy of serious consideration. In this paper, we propose an accurate and stable predictor, iEnhancer-DCSA, mainly composed of dual-scale fusion and spatial attention, which automatically extracts features of motifs of different lengths and selectively focuses on the important features. RESULTS: Our experimental results demonstrate that iEnhancer-DCSA is remarkably superior to existing state-of-the-art methods on the test dataset. In particular, the accuracy and MCC of enhancer identification are improved by 3.45% and 9.41%, respectively. Meanwhile, the accuracy and MCC of enhancer classification are improved by 7.65% and 18.1%, respectively. Furthermore, we conduct ablation studies to demonstrate the effectiveness of dual-scale fusion and spatial attention. CONCLUSIONS: iEnhancer-DCSA will be a valuable computational tool in identifying and classifying enhancers, especially for those not included in the training dataset.
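
The two metrics reported in the results, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below shows the standard definitions; the function names are illustrative and the counts are supplied by the caller, not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).

    Returns 0.0 when a row or column of the confusion matrix is empty,
    a common convention for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

MCC is often reported alongside accuracy because it stays informative when the two classes are imbalanced, whereas accuracy can look high for a classifier that mostly predicts the majority class.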


Subjects
Computational Biology , Enhancer Elements, Genetic , Computational Biology/methods
6.
PLoS One ; 18(1): e0270945, 2023.
Article in English | MEDLINE | ID: mdl-36662697

ABSTRACT

This study aimed to investigate the presence and persistence of antibiotics in the wastewater of four typical pharmaceutical manufactories (PMFs) in China and in their receiving water bodies, and to assess the removal of antibiotics by the wastewater treatment process. It also evaluated the environmental impact of antibiotic residues discharged with wastewater into receiving water bodies. The results indicated that thirteen antibiotics were detected in wastewater samples with concentrations ranging from 57.03 to 726.79 ng/L. Fluoroquinolones and macrolides were the most abundant antibiotic classes found in wastewater samples, accounting for 42.5% and 38.7% of total antibiotic concentrations, respectively, followed by sulfonamides (16.4%) and tetracyclines (2.4%). Erythromycin-H2O, lincomycin, ofloxacin, and trimethoprim were the most frequently detected antibiotics; among these antibiotics, the concentration of ofloxacin was the highest in most wastewater samples. No significant difference in antibiotic removal was found among the different treatment processes. More than 50% of the antibiotics were incompletely removed, with removal efficiencies below 70%. The concentration of detected antibiotics in the receiving water bodies was an order of magnitude lower than that in the wastewater samples due to dilution. An environmental risk assessment showed that lincomycin and ofloxacin could pose a high risk at the concentrations detected in effluents and a medium risk in their receiving water bodies, highlighting a potential hazard to the health of the aquatic ecosystem. Overall, the investigation aimed to determine and monitor the concentration of selected antibiotics in four typical PMFs and their receiving water bodies, and to study the removal of these substances in PMFs. This study will provide significant data and findings for future studies on antibiotics-related pollution control and management in water bodies.
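
Two quantities in this abstract lend themselves to a short worked sketch: removal efficiency and a risk classification. The abstract does not specify its risk method, so the risk-quotient approach (measured concentration divided by a predicted no-effect concentration) and the 0.1 and 1 thresholds below are a commonly used convention assumed here for illustration. Function names and inputs are hypothetical.

```python
def removal_efficiency(influent_ng_per_l, effluent_ng_per_l):
    """Percentage of a compound removed between influent and effluent."""
    return (influent_ng_per_l - effluent_ng_per_l) / influent_ng_per_l * 100.0


def risk_level(measured_conc, predicted_no_effect_conc):
    """Classify a risk quotient RQ = MEC / PNEC (assumed thresholds: 0.1 and 1)."""
    rq = measured_conc / predicted_no_effect_conc
    if rq >= 1.0:
        return "high"
    if rq >= 0.1:
        return "medium"
    return "low"
```

Under this convention, dilution in the receiving water lowers the measured concentration and can move a compound from the high to the medium band, consistent with the pattern the abstract reports for lincomycin and ofloxacin.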


Subjects
Anti-Bacterial Agents , Water Pollutants, Chemical , Anti-Bacterial Agents/analysis , Wastewater , Ecosystem , Water Pollutants, Chemical/analysis , Environmental Monitoring , Ofloxacin , Lincomycin , China , Risk Assessment , Water , Pharmaceutical Preparations , Waste Disposal, Fluid
7.
Environ Pollut ; 313: 120139, 2022 Nov 15.
Article in English | MEDLINE | ID: mdl-36087892

ABSTRACT

Neonicotinoid insecticides (NNIs) have been detected frequently in soil and surface water because of their extensive use worldwide; however, data on regional characteristics and potential influencing factors in sediment are scarce. In the present study, eight NNIs were analyzed in 86 surface sediment samples from different regions (central cities, rural areas and suburbs) and land use types (construction land and crop land) in Jiangsu Province. NNIs were widespread in the sediments, with a mean value of 1.73 ± 0.89 ng g⁻¹ dry weight (dw) (range: 0.41 to 3.87 ng g⁻¹ dw). Imidaclothiz (IMIZ), dinotefuran (DIN) and nitenpyram (NIT) were the dominant compounds in the surface sediment, accounting for half of the combined total. The regional distribution analysis shows that NNIs were at higher concentrations in rural areas and crop land, while NNI residues in lakes were more severe than those in rivers in Jiangsu Province. Regional characteristics and land use types influence NNI residues in surface sediment. Principal component analysis showed that NNI residues in surface sediment in Jiangsu Province mainly originated from the protection of grain crops (maize), fruit (apples, pears) and vegetables in agricultural systems. The residues were mostly concentrated in the northwest and northeast of Jiangsu Province, which are areas of intensive agriculture. Investigating NNI residues and identifying the contributing factors could provide a scientific basis for regional environmental management and pollution control.


Subjects
Insecticides , Water Pollutants, Chemical , Insecticides/analysis , Lakes/analysis , Neonicotinoids/analysis , Rivers/chemistry , Soil , Water/analysis , Water Pollutants, Chemical/analysis
8.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1670-1684, 2022 03.
Article in English | MEDLINE | ID: mdl-32956036

ABSTRACT

Visual grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. Generally, it requires the machine to first understand the query, identify the key concepts in the image, and then locate the target object by specifying its bounding box. However, in many real-world visual grounding applications, we have to face ambiguous queries and images with complicated scene structures. Identifying the target based on highly redundant and correlated information can be very challenging and often leads to unsatisfactory performance. To tackle this, in this paper, we exploit an attention module for each kind of information to reduce internal redundancies. We then propose an accumulated attention (A-ATT) mechanism to reason among all the attention modules jointly. In this way, the relation among different kinds of information can be explicitly captured. Moreover, to improve the performance and robustness of our VG models, we additionally introduce some noise into the training procedure to bridge the distribution gap between the human-labeled training data and the real-world poor-quality data. With this "noised" training strategy, we can further learn a bounding box regressor, which can be used to refine the bounding box of the target object. We evaluate the proposed methods on four popular datasets (namely ReferCOCO, ReferCOCO+, ReferCOCOg, and GuessWhat?!). The experimental results show that our methods significantly outperform all previous works on every dataset in terms of accuracy.


Subjects
Algorithms , Attention , Humans
9.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 211-227, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750833

ABSTRACT

Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution (e.g., Gaussian noises). However, such prior distribution is often independent of real data and thus may lose semantic information (e.g., geometric structure or content in images) of data. In practice, the semantic information might be represented by some latent distribution learned from data. However, such latent distribution may incur difficulties in data sampling for GAN methods. In this paper, rather than sampling from the predefined prior distribution, we propose a GAN model with local coordinate coding (LCC), termed LCCGAN, to improve the performance of the image generation. First, we propose an LCC sampling method in LCCGAN to sample meaningful points from the latent manifold. With the LCC sampling method, we can explicitly exploit the local information on the latent manifold and thus produce new data with promising quality. Second, we propose an improved version, namely LCCGAN++, by introducing a higher-order term in the generator approximation. This term is able to achieve better approximation and thus further improve the performance. More critically, we derive the generalization bound for both LCCGAN and LCCGAN++ and prove that a low-dimensional input is sufficient to achieve good generalization performance. Extensive experiments on several benchmark datasets demonstrate the superiority of the proposed method over existing GAN methods.

10.
IEEE Trans Cybern ; 52(7): 5756-5766, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33635817

ABSTRACT

Recently, semisupervised feature selection has gained more attention in many real applications due to the high cost of obtaining labeled data. However, existing methods cannot solve the "multimodality" problem, in which samples in some classes lie in several separate clusters. To solve the multimodality problem, this article proposes a new feature selection method for semisupervised tasks, namely, semisupervised structured manifold learning (SSML). The new method learns a new structured graph which consists of more clusters than the known classes. Meanwhile, we propose to exploit the submanifold in both labeled data and unlabeled data by consuming the nearest neighbors of each object in both labeled and unlabeled objects. An iterative optimization algorithm is proposed to solve the new model. A series of experiments was conducted on both synthetic and real-world datasets, and the experimental results verify the ability of the new method to solve the multimodality problem and its superior performance compared with the state-of-the-art methods.


Subjects
Algorithms , Supervised Machine Learning , Cluster Analysis , Learning
11.
iScience ; 24(7): 102718, 2021 Jul 23.
Article in English | MEDLINE | ID: mdl-34258553

ABSTRACT

Tumor multiregion sequencing reveals intratumor heterogeneity (ITH) and clonal evolution, which play a key role in tumor progression and metastases. Large-scale high-depth multiregional sequencing of colorectal cancer, comparative analysis among patients with right-sided colon cancer (RCC), left-sided colon cancer (LCC), and rectal cancer (RC), as well as the study of lymph node metastasis (LN) with extranodal tumor deposits (ENTDs) from an evolutionary perspective remain underexplored. Here, we recruited 68 patients with RCC (18), LCC (20), and RC (30). We performed high-depth whole-exome sequencing of 206 tumor regions including 176 primary tumors, 19 LN, and 11 ENTD samples. Our results showed ITH with a Darwinian pattern of evolution, and the evolution pattern of LCC and RC was more complex and divergent than that of RCC. Genetic and evolutionary evidence showed that both LN and ENTD originated from different clones. Moreover, ENTD was a distinct entity from LN and evolved later.

12.
IEEE Trans Image Process ; 30: 6744-6756, 2021.
Article in English | MEDLINE | ID: mdl-34264827

ABSTRACT

During the past decades, manifold ranking has been widely applied to content-based image retrieval and has shown excellent performance. However, manifold ranking is computationally expensive in both graph construction and ranking learning. Much effort has been devoted to improving its performance by introducing approximation techniques. In this paper, we propose a fast manifold ranking method, namely Local Bipartite Manifold Ranking (LBMR). Given a set of images, we first extract multiple regions from each image to form a large image descriptor matrix, and then use the anchor-based strategy to construct a local bipartite graph in which a regional k-means (RKM) is proposed to obtain high-quality anchors. We propose an iterative method to directly solve the manifold ranking problem from the local bipartite graph, which monotonically decreases the objective function value in each iteration until the algorithm converges. Experimental results on several real-world image datasets demonstrate the effectiveness and efficiency of our proposed method.

13.
IEEE Trans Image Process ; 30: 5652-5664, 2021.
Article in English | MEDLINE | ID: mdl-34125678

ABSTRACT

Image co-segmentation is an active computer vision task that aims to segment the common objects from a set of images. Recently, researchers have designed various learning-based algorithms to undertake the co-segmentation task. The main difficulty in this task is how to effectively transfer information between images to make conditional predictions. In this paper, we present CycleSegNet, a novel framework for the co-segmentation task. Our network design has two key components: a region correspondence module, which is the basic operation for exchanging information between local image regions, and a cycle refinement module, which utilizes ConvLSTMs to progressively update image representations and exchange information in a cyclic and iterative manner. Extensive experiments demonstrate that our proposed method significantly outperforms the state-of-the-art methods on four popular benchmark datasets (PASCAL VOC, MSRC, Internet, and iCoseg) by 2.6%, 7.7%, 2.2%, and 2.9%, respectively.

14.
ACS Appl Mater Interfaces ; 13(21): 24702-24709, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34027657

ABSTRACT

The methanol aqueous phase reforming (MAPR) reaction under mild conditions is one of the most practical ways to generate hydrogen (H2), in which the liquid vaporization unit could be removed by the water phase reforming, making the structure of an in situ H2 production reactor more compact. In this work, the H2 production performance of a metal-free catalyst, N-doped carbon dots/g-C3N4 (NCDs/g-C3N4; CN-x) composites, was investigated for the MAPR reaction under low temperature and normal pressure. The optimized metal-free catalyst (NCDs/g-C3N4; CN-0.7) displays an H2 yield of 19.5 µmol g⁻¹ h⁻¹ at 80 °C. More importantly, a clear understanding of the effective MAPR reaction at low temperature and normal pressure was acquired from in situ diffuse reflectance FTIR spectroscopy and the transient photovoltage test. The introduction of NCDs leads to the localization of surface charge, which is beneficial to the selective adsorption and polarization activation of polar molecules on the catalyst surface. This work provides a new strategy for the carbon-based catalyst design of the MAPR reaction at low temperatures.

15.
Chin Med J (Engl) ; 134(7): 821-828, 2021 Feb 25.
Article in English | MEDLINE | ID: mdl-33797468

ABSTRACT

BACKGROUND: Colorectal cancer is harmful to the patient's life. The treatment of patients is determined by accurate preoperative staging. Magnetic resonance imaging (MRI) plays an important role in the preoperative examination of patients with rectal cancer, and artificial intelligence (AI) in the learning of images has made significant achievements in recent years. By introducing AI into MRI recognition, a stable platform for image recognition and judgment can be established in a short period. This study aimed to establish an automatic diagnostic platform for predicting preoperative T staging of rectal cancer through a deep neural network. METHODS: Data from a total of 183 rectal cancer patients were collected retrospectively as study subjects. Faster region-based convolutional neural networks (Faster R-CNN) were used to build the platform. The platform was evaluated according to the receiver operating characteristic (ROC) curve. RESULTS: An automatic diagnosis platform for T staging of rectal cancer was established through the study of MRI. The areas under the ROC curve (AUC) were 0.99 in the horizontal plane, 0.97 in the sagittal plane, and 0.98 in the coronal plane. In the horizontal plane, the AUC of T1 stage was 1, AUC of T2 stage was 1, AUC of T3 stage was 1, AUC of T4 stage was 1. In the coronal plane, AUC of T1 stage was 0.96, AUC of T2 stage was 0.97, AUC of T3 stage was 0.97, AUC of T4 stage was 0.97. In the sagittal plane, AUC of T1 stage was 0.95, AUC of T2 stage was 0.99, AUC of T3 stage was 0.96, and AUC of T4 stage was 1.00. CONCLUSION: Faster R-CNN AI might be an effective and objective method to build the platform for predicting rectal cancer T-staging. TRIAL REGISTRATION: chictr.org.cn: ChiCTR1900023575; http://www.chictr.org.cn/showproj.aspx?proj=39665.
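
The per-stage AUC values above summarize how well the model's scores separate one stage from the others. A minimal sketch of the underlying computation follows, using the rank-based (Mann-Whitney) definition of AUC. Treating each T stage as a one-vs-rest binary problem is an assumption here, since the abstract does not state how the per-stage curves were built; the function name is illustrative.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, with ties counted as one half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For a one-vs-rest evaluation of, say, stage T3, `labels` would mark T3 cases as 1 and all other stages as 0, and `scores` would hold the model's confidence for T3 on each case.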


Subjects
Artificial Intelligence , Rectal Neoplasms , Humans , Magnetic Resonance Imaging , Neoplasm Staging , Neural Networks, Computer , Rectal Neoplasms/diagnostic imaging , Rectal Neoplasms/pathology , Retrospective Studies
16.
Neural Netw ; 138: 98-109, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33636485

ABSTRACT

Training a deep convolutional network from scratch requires a large amount of labeled data, which however may not be available for many practical tasks. To alleviate the data burden, a practical approach is to adapt a pre-trained model learned on the large source domain to the target domain, but the performance can be limited when the source and target domain data distributions have large differences. Some recent works attempt to alleviate this issue by imposing feature alignment over the intermediate feature maps between the source and target networks. However, for a source model, many of the channels/spatial-features for each layer can be irrelevant to the target task. Thus, directly applying feature alignment may not achieve promising performance. In this paper, we propose an Attentive Feature Alignment (AFA) method for effective domain knowledge transfer by identifying and attending to the relevant channels and spatial features between two domains. To this end, we devise two learnable attentive modules at both the channel and spatial levels. We then sequentially perform attentive spatial- and channel-level feature alignments between the source and target networks, in which the target model and attentive module are learned simultaneously. Moreover, we theoretically analyze the generalization performance of our method, which confirms its superiority to existing methods. Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our method. The source code and the pre-trained models are available at https://github.com/xiezheng-cs/AFA.


Subjects
Machine Learning , Software/standards
18.
Nat Commun ; 12(1): 483, 2021 Jan 20.
Article in English | MEDLINE | ID: mdl-33473132

ABSTRACT

Artificial photosynthesis of H2O2 from H2O and O2, as a clean method, has aroused widespread interest. To date, most photocatalysts still suffer from serious salt-deactivation effects with huge consumption of photogenerated charges, which severely limit their wide application. Herein, by using a phenolic condensation approach, carbon dots, the organic dye molecule procyanidins, and 4-methoxybenzaldehyde are combined into a metal-free photocatalyst for the photosynthetic production of H2O2 in seawater. This catalyst exhibits high photocatalytic ability to produce H2O2 with a yield of 1776 µmol g⁻¹ h⁻¹ (λ ≥ 420 nm; 34.8 mW cm⁻²) in real seawater, about 4.8 times higher than the pure polymer. By combining in-situ photoelectrochemical and transient photovoltage analysis, the active site and the catalytic mechanism of this composite catalyst in seawater are also clarified. This work opens up an avenue toward a highly efficient, practical catalyst for H2O2 photoproduction in real seawater.

19.
IEEE Trans Neural Netw Learn Syst ; 32(6): 2744-2757, 2021 06.
Article in English | MEDLINE | ID: mdl-32701451

ABSTRACT

Developing an abstractive text summarization (ATS) system that is capable of generating concise, appropriate, and plausible summaries for the source documents is a long-term goal of artificial intelligence (AI). Recent advances in ATS are overwhelmingly contributed by deep learning techniques, which have taken the state-of-the-art of ATS to a new level. Despite the significant success of previous methods, generating high-quality and human-like abstractive summaries remains a challenge in practice. The human reading cognition, which is essential for reading comprehension and logical thinking, is still relatively new territory and underexplored in deep neural networks. In this article, we propose a novel Hierarchical Human-like deep neural network for ATS (HH-ATS), inspired by the process of how humans comprehend an article and write the corresponding summary. Specifically, HH-ATS is composed of three primary components (i.e., a knowledge-aware hierarchical attention module, a multitask learning module, and a dual discriminator generative adversarial network), which mimic the three stages of human reading cognition (i.e., rough reading, active reading, and postediting). Experimental results on two benchmark data sets (CNN/Daily Mail and Gigaword) demonstrate that HH-ATS consistently and substantially outperforms the compared methods.

20.
Article in English | MEDLINE | ID: mdl-33055029

ABSTRACT

Image captioning, which aims to generate a sentence to describe the key content of a query image, is an important but challenging task. Existing image captioning approaches can be categorised into two types: generation-based methods and retrieval-based methods. Retrieval-based methods describe images by retrieving pre-existing captions from a repository. Generation-based methods synthesize a new sentence that verbalizes the query image. Both approaches have certain advantages but suffer from their own disadvantages. In this paper, we propose a novel EnsCaption model, which aims at enhancing an ensemble of retrieval-based and generation-based image captioning methods through a novel dual generator generative adversarial network. Specifically, EnsCaption is composed of a caption generation model that synthesizes tailored captions for the query image, a caption re-ranking model that retrieves the best-matching caption from a candidate caption pool consisting of generated captions and pre-retrieved captions, and a discriminator that learns the multi-level difference between the generated/retrieved captions and the ground-truth captions. During the adversarial training process, the caption generation model and the caption re-ranking model provide improved synthetic and retrieved candidate captions with high ranking scores from the discriminator, while the discriminator based on multi-level ranking is trained to assign low ranking scores to the generated and retrieved image captions. Our model absorbs the merits of both generation-based and retrieval-based approaches. We conduct comprehensive experiments to evaluate the performance of EnsCaption on two benchmark datasets: MSCOCO and Flickr-30K. Experimental results show that EnsCaption achieves impressive performance compared to the strong baseline methods.
