Results 1 - 20 of 26
1.
iScience ; 27(6): 110088, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38947498

ABSTRACT

While photocatalytic technology has opened additional opportunities for the green conversion and sustainable development of ammonium-based nitrogen fertilizers, the low activation efficiency of molecular N₂ has impeded its further application. To address this concern, we designed amorphous molybdenum hydroxide anchored on ultrathin magnesium-aluminum layered double hydroxide (Mo@MgAl-LDH) nanosheets to promote N₂ photofixation to NH₃. With the aid of the designed amorphous Mo(V) species, the pristine MgAl-LDH exhibited considerable nitrogen photofixation performance under visible light irradiation (NH₃ production rate of 114.4 µmol g⁻¹ h⁻¹) due to the improved N₂ activation efficiency. The work demonstrates a feasible strategy for nitrogen photofixation using amorphous Mo(V) species, which may also inspire the development of amorphous photocatalysts for the photoactivation of molecular N₂.

2.
IEEE Trans Image Process ; 33: 3893-3906, 2024.
Article in English | MEDLINE | ID: mdl-38896516

ABSTRACT

With the increasing availability of cameras in vehicles, obtaining license plate (LP) information via on-board cameras has become feasible in traffic scenarios. LPs play a pivotal role in vehicle identification, making automatic LP detection (ALPD) a crucial area within traffic analysis. Recent advancements in deep learning have spurred a surge of studies in ALPD. However, the computational limitations of on-board devices hinder the performance of real-time ALPD systems for moving vehicles. Therefore, we propose a frame-by-frame LP detector that targets accurate detection in real time. Specifically, video frames are categorized into keyframes and non-keyframes. Keyframes are processed by a deeper network (high-level stream), while non-keyframes are handled by a lightweight network (low-level stream), significantly enhancing efficiency. To achieve accurate detection, we design a knowledge distillation strategy to boost the performance of the low-level stream and a feature propagation method to introduce temporal cues into video LP detection. Our contributions are: (1) A real-time frame-by-frame LP detector for video LP detection is proposed, achieving performance competitive with popular one-stage LP detectors. (2) A simple feature-based knowledge distillation strategy is introduced to improve the low-level stream performance. (3) A spatial-temporal attention feature propagation method is designed to refine the features from non-keyframes guided by the memory features from keyframes, leveraging the inherent temporal correlation in videos. The ablation studies show the effectiveness of the knowledge distillation strategy and the feature propagation method.
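The two-stream idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how keyframes are chosen or which distance the feature-based distillation uses, so the fixed `keyframe_interval` and the mean-squared-error loss below are assumptions made for illustration, and both function names are hypothetical.

```python
def route_frames(num_frames, keyframe_interval=5):
    """Label each frame index as 'key' (deep stream) or 'non-key' (light stream).

    Assumption: keyframes are picked at a fixed interval; the abstract does
    not state the actual selection rule.
    """
    return ["key" if i % keyframe_interval == 0 else "non-key"
            for i in range(num_frames)]


def feature_distillation_loss(student_feat, teacher_feat):
    """Mean squared error between flattened student and teacher features.

    Assumption: MSE stands in for the unspecified feature-based distillation
    objective that pulls the lightweight stream toward the deep stream.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)
```

With an interval of 3, frames 0 and 3 of a six-frame clip would go to the deep stream and the rest to the lightweight one; the loss is zero only when the two feature vectors coincide.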

3.
J Am Chem Soc ; 146(7): 4842-4850, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38295276

ABSTRACT

Although polylactic acid (PLA) represents a pivotal biodegradable polymer, its biodegradability has inadvertently overshadowed the development of effective recycling techniques, leading to the potential wastage of carbon resources. The photoreforming-recycling approach for PLA exhibits significant potential in terms of concepts and methods. However, the reaction faces enormous challenges due to the limited selectivity of organic oxidation products as well as the increased costs and challenging separation of organic products associated with alkali-solution-assisted prehydrolysis. Herein, we report an alkali-free direct-photoreforming pathway for real-world PLA plastics utilizing the Pd-CdS photocatalyst under visible-light illumination, obviating the need for chemical pretreatment of PLA. The devised pathway successfully produces H₂ at a rate of 49.8 µmol gcat⁻¹ h⁻¹, sustained over 100 h, and exhibits remarkable selectivity toward pyruvic acid (95.9% in liquid products). Additionally, experimental findings elucidate that Pd sites not only function as a typical cocatalyst for enhancing the photocatalytic evolution of H₂ but also suppress competitive side reactions (e.g., lactic acid coupling or decarboxylation), consequently augmenting the yield and selectivity of pyruvic acid and H₂. This investigation provides a straightforward and sustainable direct-photoreforming route capable of simultaneously mitigating and repurposing plastic waste into valuable chemicals, thus offering a promising solution to the current environmental challenges.

4.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15949-15963, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37665706

ABSTRACT

With the explosive growth of videos, the weakly-supervised temporal action localization (WS-TAL) task has become a promising research direction in pattern analysis and machine learning. WS-TAL aims to detect and localize action instances with only video-level labels during training. Modern approaches have achieved impressive progress via powerful deep neural networks. However, robust and reliable WS-TAL remains challenging and underexplored due to considerable uncertainty caused by weak supervision, noisy evaluation environments, and unknown categories in the open world. To this end, we propose a new paradigm, named vectorized evidential learning (VEL), to explore local-to-global evidence collection for facilitating model performance. Specifically, a series of learnable meta-action units (MAUs) are automatically constructed, which serve as fundamental elements constituting diverse action categories. Since the same meta-action unit can manifest as distinct action components within different action categories, we leverage MAUs and category representations to dynamically and adaptively learn action components and action-component relations. After performing uncertainty estimation at both the category level and the unit level, the local evidence from action components is accumulated and optimized under Subjective Logic theory. Extensive experiments on the regular, noisy, and open-set settings of three popular benchmarks show that VEL consistently obtains more robust and reliable action localization performance than state-of-the-art methods.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15896-15911, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37624714

ABSTRACT

Weakly-supervised temporal action localization (WTAL) aims to localize action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly arises from background noise and the neglect of non-salient action snippets. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WTAL, called Uncertainty-aware Dual-Evidential Learning (UDEL), which extends the traditional paradigm of EDL to adapt to the weakly-supervised multi-label classification goal under the guidance of epistemic and aleatoric uncertainties. The former comes from the model's lack of knowledge, while the latter comes from the inherent properties of the samples themselves. Specifically, to exclude undesirable background snippets, we fuse the video-level epistemic and aleatoric uncertainties to measure the interference of background noise with the video-level prediction. Then, the snippet-level aleatoric uncertainty is further deduced for progressive mutual learning, which gradually focuses on the entire action instances in an "easy-to-hard" manner and encourages the snippet-level epistemic uncertainty to be complementary to the foreground attention scores. Extensive experiments show that UDEL achieves state-of-the-art performance on four public benchmarks. Our code is available in github/mengyuanchen2021/UDEL.
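For readers unfamiliar with evidential deep learning, the sketch below shows the standard mapping from per-class evidence to a Dirichlet-based opinion: with K classes and evidence e_k, alpha_k = e_k + 1, S is the sum of all alpha_k, belief b_k = e_k / S, and the uncertainty mass is u = K / S. This is the generic EDL formulation, not UDEL's extended dual-evidential variant, whose details the abstract does not give; the function name is hypothetical.

```python
def dirichlet_opinion(evidence):
    """Convert non-negative per-class evidence into belief masses and uncertainty.

    Generic evidential-deep-learning mapping (an illustration, not the
    UDEL-specific formulation):
        alpha_k = e_k + 1,  S = sum(alpha),  b_k = e_k / S,  u = K / S
    The belief masses and the uncertainty mass always sum to 1.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty
```

A snippet with no evidence for any class yields an uncertainty of 1, while strong evidence for one class drives the uncertainty toward 0.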

6.
ChemSusChem ; 16(22): e202300944, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-37528771

ABSTRACT

Drawing inspiration from the enzyme nitrogenase in nature, researchers are increasingly delving into semiconductor photocatalytic nitrogen fixation due to its similar surface catalytic processes. Herein, we report a facile and efficient approach to regulating ZnO/ZnCr₂O₄ photocatalysts with ZnCr-layered double hydroxide (ZnCr-LDH) as the precursor. By optimizing the Zn/Cr composition ratio in ZnCr-LDH to tune interfaces, we achieve an enhanced nitrogen photofixation performance (an ammonia evolution rate of 31.7 µmol g⁻¹ h⁻¹ using pure water as a proton source) under ambient conditions. Further, photo-electrochemical measurements and transient surface photovoltage spectroscopy revealed that the enhanced photocatalytic activity can be ascribed to effective carrier separation, originating from the abundant composite interfaces. This work further demonstrates a promising and viable strategy for the synthesis of nanocomposite photocatalysts for nitrogen photofixation and other challenging photocatalytic reactions.

7.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12427-12443, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37335790

ABSTRACT

Weakly-supervised temporal action localization (WSTAL) aims to automatically identify and localize action instances in untrimmed videos with only video-level labels as supervision. In this task, there exist two challenges: (1) how to accurately discover the action categories in an untrimmed video (what to discover); (2) how to focus on the complete temporal interval of each action instance (where to focus). Empirically, to discover the action categories, discriminative semantic information should be extracted, while robust temporal contextual information is beneficial for complete action localization. However, most existing WSTAL methods neglect to explicitly and jointly model the semantic and temporal contextual correlation information for the above two challenges. In this article, a Semantic and Temporal Contextual Correlation Learning Network (STCL-Net) with semantic (SCL) and temporal (TCL) contextual correlation learning modules is proposed, which achieves both accurate action discovery and complete action localization by modeling the semantic and temporal contextual correlation information for each snippet in the inter- and intra-video manners, respectively. It is noteworthy that the two proposed modules are both designed in a unified dynamic correlation-embedding paradigm. Extensive experiments are performed on different benchmarks. On all the benchmarks, our proposed method exhibits superior or comparable performance in comparison to the existing state-of-the-art models, with gains as high as 7.2% in terms of the average mAP on THUMOS-14. In addition, comprehensive ablation studies verify the effectiveness and robustness of each component in our model.

8.
Article in English | MEDLINE | ID: mdl-37028355

ABSTRACT

Crowd localization aims to predict the head position of each instance in crowd scenarios. Since pedestrians' distances to the camera vary, there are tremendous gaps among the scales of instances within an image, which is called the intrinsic scale shift. Intrinsic scale shift is one of the most essential issues in crowd localization because it is ubiquitous in crowd scenes and makes the scale distribution chaotic. To this end, the paper concentrates on tackling the chaos of the scale distribution incurred by intrinsic scale shift. We propose Gaussian Mixture Scope (GMS) to regularize the chaotic scale distribution. Concretely, the GMS utilizes a Gaussian mixture distribution to adapt to the scale distribution and decouples the mixture model into sub-normal distributions to regularize the chaos within the sub-distributions. Then, an alignment is introduced to regularize the chaos among sub-distributions. However, although GMS is effective in regularizing the data distribution, it amounts to dislodging the hard samples in the training set, which incurs overfitting. We attribute this to a blockage in transferring the latent knowledge exploited by GMS from data to model. Therefore, a Scoped Teacher that acts as a bridge in knowledge transfer is proposed. What's more, consistency regularization is also introduced to implement the knowledge transfer. To that effect, further constraints are deployed on the Scoped Teacher to derive feature consistency between the teacher and student ends. With the proposed GMS and Scoped Teacher implemented on four mainstream crowd localization datasets, extensive experiments demonstrate the superiority of our work. Moreover, compared with existing crowd locators, our work achieves state-of-the-art F1-measure results on all four datasets.

9.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4803-4815, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34767512

ABSTRACT

Recently, crowd counting using supervised learning has achieved remarkable improvement. Nevertheless, most counters rely on a large amount of manually labeled data. With the release of synthetic crowd data, a potential alternative is transferring knowledge from them to real data without any manual labels. However, there is no method to effectively suppress domain gaps and output elaborate density maps during the transfer. To remedy the above problems, this article proposes a domain-adaptive crowd counting (DACC) framework, which consists of high-quality image translation and density map reconstruction. To be specific, the former focuses on translating synthetic data to realistic images, which promotes the translation quality by segregating domain-shared/independent features and designing a content-aware consistency loss. The latter aims at generating pseudo labels on real scenes to improve the prediction quality. Next, we retrain a final counter using these pseudo labels. Adaptation experiments on six real-world datasets demonstrate that the proposed method outperforms the state-of-the-art methods.

10.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3933-3948, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35657841

ABSTRACT

We target the task of weakly-supervised video object grounding (WSVOG), where only video-sentence annotations are available during model learning. It aims to localize objects described in the sentence to visual regions in the video, which is a fundamental capability needed in pattern analysis and machine learning. Despite recent progress, existing methods all suffer from the severe problem of spurious association, which harms the grounding performance. In this paper, we start from the definition of WSVOG and pinpoint the spurious association from two aspects: (1) the association itself is not object-relevant but extremely ambiguous due to weak supervision; and (2) the association is unavoidably confounded by observational bias when the statistics-based matching strategy of existing methods is adopted. With this in mind, we design a unified causal framework to learn the deconfounded object-relevant association for more accurate and robust video object grounding. Specifically, we learn the object-relevant association by causal intervention from the perspective of the video data generation process. To overcome the lack of fine-grained supervision for intervention, we propose a novel spatial-temporal adversarial contrastive learning paradigm. To further remove the accompanying confounding effect within the object-relevant association, we pursue the true causality by conducting causal intervention via backdoor adjustment. Finally, the deconfounded object-relevant association is learned and optimized under a unified causal framework in an end-to-end manner. Extensive experiments on both IID and OOD testing sets of three benchmarks demonstrate its accurate and robust grounding performance against state-of-the-art methods.
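As background for the backdoor adjustment mentioned in this abstract, the general textbook identity is given below. Here X is the cause, Y the outcome, and Z a set of confounders assumed to satisfy the backdoor criterion; how the paper instantiates X, Y, and Z for video object grounding is not stated in the abstract, so the symbols are generic placeholders.

```latex
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{z} P\bigl(Y \mid X = x,\, Z = z\bigr)\, P(Z = z)
```

The intervention replaces the observational weighting P(Z = z | X = x) with the marginal P(Z = z), which is what removes the confounder's influence on the choice of X.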

11.
IEEE Trans Image Process ; 31: 7363-7377, 2022.
Article in English | MEDLINE | ID: mdl-36409819

ABSTRACT

Point-level weakly-supervised temporal action localization (P-WSTAL) aims to localize the temporal extents of action instances and identify the corresponding categories with only a single point label for each action instance for training. Due to the sparse frame-level annotations, most existing models follow the localization-by-classification pipeline. However, there exist two major issues in this pipeline: large intra-action variation due to the task gap between classification and localization, and noisy classification learning caused by unreliable pseudo training samples. In this paper, we propose a novel framework, CRRC-Net, which introduces a co-supervised feature learning module and a probabilistic pseudo label mining module to simultaneously address the above two issues. Specifically, the co-supervised feature learning module is applied to exploit the complementary information in different modalities for learning more compact feature representations. Furthermore, the probabilistic pseudo label mining module utilizes the feature distances from action prototypes to estimate the likelihood of pseudo samples and rectify their corresponding labels for more reliable classification learning. Comprehensive experiments are conducted on different benchmarks, and the experimental results show that our method achieves favorable performance compared with the state of the art.

12.
IEEE Trans Image Process ; 31: 6032-6047, 2022.
Article in English | MEDLINE | ID: mdl-36103439

ABSTRACT

Video crowd localization is a crucial yet challenging task, which aims to estimate the exact locations of human heads in given crowded videos. To model the spatial-temporal dependencies of human mobility, we propose a multi-focus Gaussian neighborhood attention (GNA), which can effectively exploit long-range correspondences while maintaining the spatial topological structure of the input videos. In particular, our GNA can also capture the scale variation of human heads well using the equipped multi-focus mechanism. Based on the multi-focus GNA, we develop a unified neural network called GNANet to accurately locate head centers in video clips by fully aggregating spatial-temporal information via a scene modeling module and a context cross-attention module. Moreover, to facilitate future research in this field, we introduce a large-scale crowd video benchmark named VSCrowd (https://github.com/HopLee6/VSCrowd), which consists of 60K+ frames captured in various surveillance scenes and 2M+ head annotations. Finally, we conduct extensive experiments on three datasets including our VSCrowd, and the experimental results show that the proposed method is capable of achieving state-of-the-art performance for both video crowd localization and counting.

13.
IEEE Trans Cybern ; 52(6): 4675-4687, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33259316

ABSTRACT

Recently, crowd counting has drawn much attention on account of its significance in congestion control, public safety, and ecological surveys. Although performance has improved dramatically due to the development of deep learning, these networks have also become larger and more complex. Moreover, a large model also takes more time to train for better performance. To tackle these problems, this article first constructs a lightweight model, which is composed of an image feature encoder and a simple but effective decoder, called the pixel shuffle decoder (PSD). PSD ends with a pixel shuffle operator, which can display more density information without increasing the number of convolutional layers. Second, a density-aware curriculum learning (DCL) training strategy is designed to fully tap the potential of crowd counting models. DCL gives each predicted pixel a weight to determine its prediction difficulty and provides guidance on obtaining better generalization. Experimental results show that PSD can achieve outstanding performance on most mainstream datasets when trained under the DCL training framework. Besides, we also conduct experiments adopting DCL on existing typical crowd counters, and the results show that they all obtain better performance than before, which further validates the effectiveness of our method.
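The pixel shuffle operator named in this abstract trades channels for spatial resolution: a tensor of shape (C·r², H, W) becomes (C, H·r, W·r) without any extra convolution. The sketch below implements that rearrangement on nested lists, following the common sub-pixel layout convention; whether the PSD decoder uses exactly this channel ordering is an assumption, and the function name is hypothetical.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed layout: input channel c*r*r + i*r + j supplies the output pixel
    at row offset i and column offset j inside each r-by-r block of output
    channel c.
    """
    channels_in = len(x)
    if r < 1 or channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out
```

Four 1x1 input channels with r = 2, for instance, become a single 2x2 output channel, which is how the decoder can emit a denser map from a low-resolution feature tensor.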


Subjects
Curriculum
14.
IEEE Trans Neural Netw Learn Syst ; 33(3): 1066-1078, 2022 Mar.
Article in English | MEDLINE | ID: mdl-33290231

ABSTRACT

Many CNN-based segmentation methods have recently been applied to lane marking detection and have achieved excellent results thanks to their strong ability to model semantic information. Although the accuracy of lane line prediction keeps improving, the ability to localize lane markings is relatively weak, especially when the lane marking point is remote. Traditional lane detection methods usually utilize highly specialized handcrafted features and carefully designed postprocessing to detect the lanes. However, these methods are based on strong assumptions and are thus prone to scalability issues. In this work, we propose a novel multitask method that: 1) integrates the ability of CNNs to model semantic information with the strong localization ability provided by handcrafted features and 2) predicts the position of the vanishing line. A novel lane fitting method based on vanishing line prediction is also proposed for sharp curves and nonflat roads in this article. By integrating segmentation, specialized handcrafted features, and fitting, the accuracy of localization and the convergence speed of the networks are improved. Extensive experimental results on four lane marking detection data sets show that our method achieves state-of-the-art performance.

15.
IEEE Trans Neural Netw Learn Syst ; 33(8): 3238-3250, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33502985

ABSTRACT

Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety. The purpose of CDCC is to alleviate the domain shift between the source and target domains. Recent typical methods attempt to extract domain-invariant features via image translation and adversarial learning. When it comes to specific tasks, we find that the domain shifts are reflected in differences between model parameters. To describe the domain gap directly at the parameter level, we propose a neuron linear transformation (NLT) method, exploiting domain factor and bias weights to learn the domain shift. Specifically, for a specific neuron of a source model, NLT exploits a few labeled target data to learn domain shift parameters. Finally, the target neuron is generated via a linear transformation. Extensive experiments and analysis on six real-world data sets validate that NLT achieves top performance compared with other domain adaptation methods. An ablation study also shows that NLT is robust and more effective than supervised training and fine-tuning. Code is available at https://github.com/taohan10200/NLT.
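The parameter-level idea in this abstract can be sketched as follows: the source neuron's weights stay frozen, and only a domain factor and a domain bias are learned from the few labeled target samples. The abstract does not say whether the factor and bias are scalars per neuron or per weight, so the per-neuron scalars below are an assumption, and the function name is hypothetical rather than taken from the linked repository.

```python
def transform_neuron(source_weights, factor, bias):
    """Generate target-domain weights as factor * w_source + bias.

    Assumption: one scalar factor and one scalar bias per neuron; only these
    two values would be trained on target data while source_weights is frozen.
    """
    return [factor * w + bias for w in source_weights]
```

Initializing with factor = 1.0 and bias = 0.0 reproduces the source neuron exactly, so adaptation would start from the source model's behavior.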


Subjects
Computer Simulation , Neurons , Neural Networks, Computer , Neurons/metabolism
16.
Int J Ment Health Nurs ; 30(4): 975-987, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33811426

ABSTRACT

This study aimed to investigate the mental health status of nurses in areas at low risk during the novel coronavirus (COVID-19) pandemic, its potential influencing factors, and the main stressors under normalized prevention and control in China. A mobile phone app-based survey was conducted among registered nurses in Jiangsu province via a region-stratified sampling method. The questionnaire consisted of items on the demographic characteristics of the nursing staff and the Depression, Anxiety, Stress Scale-21 (DASS-21), along with questions for self-assessment of stressors associated with COVID-19. The STROBE guideline was followed. Among 1803 nurses who were working in the low-risk areas in Jiangsu, 22.0%, 29.8%, and 16.1% reported moderate to extreme levels of depression, anxiety, and stress, respectively. Having 11-15 years of working experience and being a fixed-term contract nurse were associated with worse mental health outcomes, while experience of working in support of Wuhan and training in a mental health preparation course were independent factors that had a beneficial impact on psychological well-being afterward. In terms of sources of pressure, a key finding of this study is that the main stressor among these nurses was the lack of patients' understanding and cooperation (71.2%), which calls for better psychosocial communication between nurses and patients. The present findings provide information for other regions at low risk of COVID-19 and may aid the provision of support and interventions for the psychological well-being of nurses, who are exposed to life-threatening occupational risks and are more vulnerable to the pandemic than others.


Subjects
COVID-19 , Nurses , Anxiety , China/epidemiology , Cross-Sectional Studies , Depression , Health Status , Humans , Pandemics , SARS-CoV-2 , Surveys and Questionnaires
17.
IEEE Trans Pattern Anal Mach Intell ; 43(10): 3476-3491, 2021 Oct.
Article in English | MEDLINE | ID: mdl-32305892

ABSTRACT

With the explosive growth of video categories, zero-shot learning (ZSL) in video classification has become a promising research direction in pattern analysis and machine learning. Based on auxiliary information such as word embeddings and attributes, the key to a robust ZSL method is to transfer the learned knowledge from seen classes to unseen classes, which requires relationship modeling between these concepts (e.g., categories and attributes). However, most existing approaches neglect to model the explicit relationships in an end-to-end manner, resulting in ineffective knowledge transfer. To tackle this problem, we reconsider the video ZSL task as a task-driven message passing process that jointly offers several merits, including an alleviated heterogeneity gap, low domain shift, and robust temporal modeling. Specifically, we propose a prototype-sample GNN (PS-GNN) consisting of a prototype branch and a sample branch to directly and adaptively model all the category-attribute, category-category, and attribute-attribute relationships. The prototype branch aims to learn robust representations of video categories and takes as input a set of word-embedding vectors corresponding to the concepts. The sample branch is designed to generate features of a video sample by leveraging its object semantics. With the co-adaptation and cooperation between both branches, a unified and robust ZSL framework is achieved. Extensive experiments provide strong evidence that PS-GNN consistently obtains favorable performance on five popular video benchmarks.

18.
IEEE Trans Cybern ; 51(10): 4822-4833, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33259318

ABSTRACT

With the development of deep neural networks, the performance of crowd counting and pixel-wise density estimation is continually being improved. Despite this, there are still two challenging problems in this field: 1) current supervised learning needs a large amount of training data, but collecting and annotating them is difficult and 2) existing methods cannot generalize well to unseen domains. A recently released synthetic crowd dataset alleviates these two problems. However, the domain gap between real-world data and synthetic images decreases the models' performance. To reduce the gap, in this article, we propose a domain-adaptation-style crowd counting method, which can effectively adapt the model from synthetic data to specific real-world scenes. It consists of multilevel feature-aware adaptation (MFA) and structured density map alignment (SDA). To be specific, MFA pushes the model to extract domain-invariant features from multiple layers. SDA ensures that the network outputs fine density maps with a reasonable distribution on the real domain. Finally, we evaluate the proposed method on four mainstream surveillance crowd datasets: Shanghai Tech Part B, WorldExpo'10, Mall, and UCSD. Extensive experiments show that our approach outperforms the state-of-the-art methods for the same cross-domain counting problem.

19.
IEEE Trans Pattern Anal Mach Intell ; 43(6): 2141-2149, 2021 Jun.
Article in English | MEDLINE | ID: mdl-32750840

ABSTRACT

In the last decade, crowd counting and localization have attracted much attention from researchers due to their widespread applications, including crowd monitoring, public safety, space design, etc. Many convolutional neural networks (CNNs) have been designed for tackling this task. However, currently released datasets are so small-scale that they cannot meet the needs of supervised CNN-based algorithms. To remedy this problem, we construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images with a total of 2,133,375 heads annotated with points and boxes. Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0 ∼ 20,033). Besides, a benchmark website is developed for impartially evaluating different methods, which allows researchers to submit their results on the test set. Based on the proposed dataset, we further describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data. What's more, the benchmark is deployed at https://www.crowdbenchmark.com/, and the dataset/code/models/results are available at https://gjy3035.github.io/NWPU-Crowd-Sample-Code/.

20.
IEEE Trans Image Process ; 28(9): 4376-4386, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30998470

ABSTRACT

Semantic segmentation, a pixel-level vision task, has developed rapidly with the use of convolutional neural networks (CNNs). Training CNNs requires a large amount of labeled data, but manually annotating data is difficult. To reduce manual labor, some synthetic datasets have been released in recent years. However, they still differ from real scenes, so a model trained on the synthetic data (source domain) cannot achieve good performance on real urban scenes (target domain). In this paper, we propose a weakly supervised adversarial domain adaptation method to improve the segmentation performance from synthetic data to real scenes, which consists of three deep neural networks. A detection and segmentation (DS) model focuses on detecting objects and predicting the segmentation map; a pixel-level domain classifier (PDC) tries to distinguish which domain the image features come from; and an object-level domain classifier (ODC) discriminates which domain the objects come from and predicts object classes. PDC and ODC are treated as the discriminators, and DS is considered the generator. Through adversarial learning, DS is supposed to learn domain-invariant features. In experiments, our proposed method sets a new record on the mIoU metric for the same problem.
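The mIoU metric cited in this abstract averages the per-class intersection-over-union between predicted and ground-truth label maps. The sketch below computes it on flattened label lists; skipping classes that appear in neither map is one common convention and an assumption here, since benchmark protocols differ on that point, and the function name is hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for flattened label lists.

    Assumption: a class absent from both prediction and ground truth is
    skipped rather than counted as IoU 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

A perfect prediction scores 1.0; each mislabeled pixel lowers the IoU of both the class it was assigned to and the class it belongs to.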
