Results 1 - 20 of 92
1.
J Environ Sci (China) ; 150: 594-603, 2025 Apr.
Article in English | MEDLINE | ID: mdl-39306432

ABSTRACT

Eutrophication is a significant challenge for surface water, with sediment phosphorus (P) release being a key contributor. Although capping with a biological aluminum-based P-inactivation agent (BA-PIA) has proven effective in controlling P release from sediment, the efficiency and mechanism of BA-PIA capping are still not fully understood. This study explored the efficiency and mechanism of BA-PIA capping in controlling P release from sediment. The main mechanisms involved transforming mobile and less stable P fractions into stable ones, passivating labile P measured by diffusive gradients in thin films (DGT-labile P), and establishing a 13 mm 'P static layer' within the sediment. Additionally, BA-PIA's impact on Fe redox processes significantly influenced P release from the sediment. After BA-PIA capping, notable reductions were observed in total P, soluble reactive P (SRP), and DGT-labile P concentrations in the overlying water, with reduction rates of 95.6%, 92.7%, and 96.5%, respectively. The diffusion flux of SRP across the sediment-water interface and the apparent P diffusion flux decreased by 91.3% and 97.8%, respectively. BA-PIA capping also reduced the concentrations of SRP, DGT-labile P, and DGT-measured labile Fe(II) in the sediment interstitial water. Notably, for NaOH-extractable inorganic P and HCl-extractable P, BA-PIA capping significantly reduced P content and facilitated P transformation in the 0∼30 mm sediment layers but not in the 30∼45 mm and 45∼60 mm layers. These findings offer a theoretical basis and technical support for the practical application of BA-PIA capping to control P release from sediment.
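The reported reduction rates are simple relative decreases; a minimal sketch (the concentrations below are made up, since the abstract reports only the percentages):

```python
def reduction_rate(before, after):
    """Percent reduction of a concentration after capping."""
    return 100.0 * (before - after) / before

# Illustrative only: hypothetical concentrations (mg/L); the abstract reports
# reduction rates of 95.6% (TP), 92.7% (SRP) and 96.5% (DGT-labile P).
print(round(reduction_rate(2.0, 0.088), 1))  # 95.6
```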


Subjects
Aluminum , Geologic Sediments , Phosphorus , Water Pollutants, Chemical , Phosphorus/analysis , Phosphorus/chemistry , Geologic Sediments/chemistry , Aluminum/chemistry , Water Pollutants, Chemical/analysis , Water Pollutants, Chemical/chemistry , Eutrophication
2.
Article in English | MEDLINE | ID: mdl-39259636

ABSTRACT

In this work, by re-examining the "matching" nature of Anomaly Detection (AD), we propose a novel AD framework that simultaneously sets new records for AD accuracy and runs at dramatically higher speed. In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion. Given a test sample, the top-K most similar training images are first selected based on a robust histogram matching process. Second, the nearest neighbor of each test patch is retrieved over similar geometrical locations on those "most similar images", using a carefully trained local metric. Finally, the anomaly score of each test image patch is calculated based on the distance to its "nearest neighbor" and the "non-background" probability. The proposed method is termed "Cascade Patch Retrieval" (CPR). Different from previous patch-matching-based AD algorithms, CPR selects proper "targets" (reference images and patches) before "shooting" (patch matching). On the well-acknowledged MVTec AD, BTAD, and MVTec-3D AD datasets, the proposed algorithm consistently outperforms all competing SOTA methods by remarkable margins, measured by various AD metrics. Furthermore, CPR is extremely efficient: it runs at 113 FPS in the standard setting, while its simplified version requires less than 1 ms to process an image at the cost of a trivial accuracy drop. The code of CPR is available at https://github.com/flyinghu123/CPR.
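The cascade retrieval idea can be sketched in miniature. The helper names, the histogram-intersection similarity, and the toy data structure below are illustrative assumptions, not the paper's implementation:

```python
def hist_similarity(h1, h2):
    # Histogram intersection: a robust similarity between global histograms.
    return sum(min(a, b) for a, b in zip(h1, h2))

def nearest_patch_distance(test_patch, patches):
    # Euclidean distance from the test patch to its nearest retrieved neighbour.
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return min(dist(test_patch, p) for p in patches)

def cascade_retrieve(test_hist, test_patch, train_images, k=2):
    # Stage 1 (coarse): keep the top-k training images by histogram matching.
    ranked = sorted(train_images,
                    key=lambda im: -hist_similarity(test_hist, im["hist"]))
    candidates = [p for im in ranked[:k] for p in im["patches"]]
    # Stage 2 (fine): nearest-neighbour patch distance as the anomaly score.
    return nearest_patch_distance(test_patch, candidates)
```

A patch identical to a retained training patch scores 0 (normal); larger distances suggest anomalies.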

3.
Article in English | MEDLINE | ID: mdl-39150798

ABSTRACT

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. State-of-the-art (SoTA) monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity arising from various camera models, together with large-scale data training. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. For surface normal estimation, we propose a joint depth-normal optimization module to distill diverse data knowledge from metric depth, enabling normal estimators to learn beyond normal labels. Equipped with these modules, our depth-normal models can be stably trained with over 16 million images from thousands of camera models with different types of annotations, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Our method currently ranks 1st on various zero-shot and non-zero-shot benchmarks for metric depth, affine-invariant depth, and surface normal prediction, as shown in Fig. 1. Notably, we surpass the recent MarigoldDepth and DepthAnything on various depth benchmarks, including NYUv2 and KITTI. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular SLAM (Fig. 3), leading to high-quality metric-scale dense mapping. These applications highlight the versatility of Metric3D v2 models as geometric foundation models. Our project page is at https://JUGGHM.github.io/Metric3Dv2.
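The canonical-camera idea can be illustrated with the usual focal-length rescaling of depth labels; the function names and the canonical focal value below are assumptions for illustration, not the paper's exact module:

```python
CANONICAL_FOCAL = 1000.0  # assumed canonical focal length, in pixels

def to_canonical_depth(depth, focal):
    # Map a metric depth into the canonical camera space: images taken with
    # different focal lengths become comparable during training.
    return depth * CANONICAL_FOCAL / focal

def from_canonical_depth(canonical_depth, focal):
    # Invert the transform at test time, using the real camera's focal
    # length, to recover metric depth.
    return canonical_depth * focal / CANONICAL_FOCAL
```

The round trip is exact, which is what makes the transform safe to plug into an existing monocular pipeline.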

4.
Article in English | MEDLINE | ID: mdl-38743546

ABSTRACT

In this article, we investigate self-supervised 3D scene flow estimation and class-agnostic motion prediction on point clouds. A realistic scene can be well modeled as a collection of rigidly moving parts, so its scene flow can be represented as a combination of the rigid motions of these individual parts. Building upon this observation, we propose to generate pseudo scene flow labels for self-supervised learning through piecewise rigid motion estimation, in which the source point cloud is decomposed into local regions and each region is treated as rigid. By rigidly aligning each region with its potential counterpart in the target point cloud, we obtain a region-specific rigid transformation to generate its pseudo flow labels. To mitigate the impact of potential outliers on label generation, when solving the rigid registration for each region, we alternately perform three steps: establishing point correspondences, measuring the confidence of the correspondences, and updating the rigid transformation based on the correspondences and their confidence. As a result, confident correspondences dominate label generation, and a validity mask is derived for the generated pseudo labels. By using the pseudo labels together with their validity mask for supervision, models can be trained in a self-supervised manner. Extensive experiments on the FlyingThings3D and KITTI datasets demonstrate that our method achieves new state-of-the-art performance in self-supervised scene flow learning, without any ground-truth scene flow for supervision, even performing better than some supervised counterparts. Additionally, our method extends to class-agnostic motion prediction and significantly outperforms previous state-of-the-art self-supervised methods on the nuScenes dataset.
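The per-region rigid fit at the core of the pseudo-label generation can be sketched in closed form in 2D (the paper works in 3D with confidence-weighted iterations; the 2D reduction and the function name are illustrative):

```python
import math

def weighted_rigid_fit_2d(src, dst, w):
    """Find rotation theta and translation t minimising the weighted
    alignment error between 2D point sets src and dst (a closed-form
    2D analogue of the per-region Kabsch step)."""
    sw = sum(w)
    csx = sum(wi * p[0] for wi, p in zip(w, src)) / sw
    csy = sum(wi * p[1] for wi, p in zip(w, src)) / sw
    cdx = sum(wi * q[0] for wi, q in zip(w, dst)) / sw
    cdy = sum(wi * q[1] for wi, q in zip(w, dst)) / sw
    # Accumulate the "dot" and "cross" parts of the weighted cross-covariance
    # of the centred point sets; their ratio gives the optimal angle.
    a = b = 0.0
    for wi, p, q in zip(w, src, dst):
        px, py = p[0] - csx, p[1] - csy
        qx, qy = q[0] - cdx, q[1] - cdy
        a += wi * (px * qx + py * qy)
        b += wi * (px * qy - py * qx)
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)
```

The pseudo flow for each point is then `R p + t - p`; down-weighting outliers in `w` is exactly how the confidence step suppresses bad correspondences.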

5.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15665-15679, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37669204

ABSTRACT

End-to-end scene text spotting has made significant progress due to the intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than using a single point. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using single-point annotations. SPTS v2 retains the advantages of the auto-regressive Transformer through an Instance Assignment Decoder (IAD) that sequentially predicts the center points of all text instances within a single sequence, while a Parallel Recognition Decoder (PRD) recognizes the text in parallel, significantly reducing the required sequence length. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass gradients and information. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 outperforms previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting compared to other representations. Such an attempt opens significant opportunities for scene text spotting applications beyond existing paradigms.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14905-14919, 2023 12.
Article in English | MEDLINE | ID: mdl-37672381

ABSTRACT

Medical image benchmarks for the segmentation of organs and tumors suffer from the partial-labeling issue due to the intensive cost of labor and expertise. Current mainstream approaches follow the practice of one network solving one task. With this pipeline, not only is the performance limited by the typically small dataset of a single task, but the computation cost also increases linearly with the number of tasks. To address this, we propose a Transformer-based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets. Specifically, TransDoDNet has a hybrid backbone composed of a convolutional neural network and a Transformer. A dynamic head enables the network to accomplish multiple segmentation tasks flexibly. Unlike existing approaches that fix kernels after training, the kernels in the dynamic head are generated adaptively by the Transformer, which employs the self-attention mechanism to model long-range organ-wise dependencies and decodes the organ embedding that represents each organ. We create a large-scale partially labeled Multi-Organ and Tumor Segmentation benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors on seven organ and tumor segmentation tasks. This study also provides a general 3D medical image segmentation model, which has been pre-trained on the large-scale MOTS benchmark and demonstrates advanced performance over current predominant self-supervised learning methods.
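The dynamic-head idea — kernels produced by a controller rather than fixed after training — can be sketched per pixel; the tiny linear head and the names below are illustrative assumptions, not the network's actual layers:

```python
import math

def dynamic_head(features, controller):
    """Sketch of a dynamic segmentation head: 'controller' is a flat
    parameter vector predicted from an organ embedding; it is split into
    the weights and bias of a tiny per-pixel linear layer, so each task
    gets its own kernel from the same shared backbone features."""
    n_ch = len(features[0])
    w, b = controller[:n_ch], controller[n_ch]
    logits = [sum(wi * f for wi, f in zip(w, feat)) + b for feat in features]
    # Sigmoid -> per-pixel foreground probability for this organ/tumor task.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

Swapping the controller vector switches the head to a different organ without retraining the backbone, which is what makes one network serve many partially labeled tasks.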


Subjects
Algorithms , Neoplasms , Humans , Neoplasms/diagnostic imaging , Benchmarking , Neural Networks, Computer , Image Processing, Computer-Assisted
7.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12459-12473, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37167046

ABSTRACT

Network pruning and quantization are proven to be effective ways for deep model compression. To obtain a highly compact model, most methods first perform network pruning and then conduct quantization based on the pruned model. However, this strategy may ignore the fact that pruning and quantization affect each other, so performing them separately may lead to sub-optimal performance. To address this, performing pruning and quantization jointly is essential. Nevertheless, how to make a trade-off between pruning and quantization is non-trivial. Moreover, existing compression methods often rely on pre-defined compression configurations (i.e., pruning rates or bitwidths). Some attempts have been made to search for optimal configurations, which, however, may incur prohibitive optimization cost. To address these issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS) for automatic loss-aware model compression. To this end, we consider network pruning as a special case of quantization and provide a unified view of model pruning and quantization. We then introduce a single-path model to encode all candidate compression configurations, where a high bitwidth value is decomposed into the sum of the lowest bitwidth value and a series of re-assignment offsets. Relying on the single-path model, we introduce learnable binary gates to encode the choice of configurations and learn the binary gates and model parameters jointly. More importantly, the configuration search problem can be transformed into a subset selection problem, which helps to significantly reduce the optimization difficulty and computation cost. In this way, the compression configuration of each layer and the trade-off between pruning and quantization can be automatically determined. Extensive experiments on CIFAR-100 and ImageNet show that SBS significantly reduces computation cost while achieving promising performance. For example, our SBS-compressed MobileNetV2 achieves a 22.6× Bit-Operation (BOP) reduction with only a 0.1% drop in Top-1 accuracy.
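The single-path decomposition can be sketched directly; the concrete bitwidths and the gate ordering below are illustrative assumptions:

```python
def effective_bitwidth(lowest, offsets, gates):
    """Single-path bit sharing (sketch): a high bitwidth is the lowest
    bitwidth plus a series of re-assignment offsets, each enabled by a
    binary gate. Gates are ordered along the single path, so the first
    closed gate terminates the accumulation."""
    bits = lowest
    for off, g in zip(offsets, gates):
        if not g:
            break
        bits += off
    return bits
```

With `lowest=2` and `offsets=[2, 4]`, the candidate configurations 2, 4, and 8 bits are all encoded by one shared path, and learning the gates amounts to selecting a subset of offsets.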

8.
Food Chem ; 412: 135543, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-36724717

ABSTRACT

Furan compounds actively contribute to the characteristics of brandy. Herein, we attempted to identify and quantify the furan compounds present in brandy using three different extraction methods combined with comprehensive two-dimensional gas chromatography and time-of-flight mass spectrometry. Threshold determination and omission experiments were carried out to verify their organoleptic contribution. Liquid-liquid extraction using dichloromethane was found to be the optimal extraction method. A total of 21 furan compounds were identified, of which 5 were detected in brandy for the first time. Our quantitative results showed a positive correlation between furan compound content and aging time. Among them, ethyl 5-oxotetrahydro-2-furancarboxylate exhibited a very high odor activity value (1.64 < OAV < 179.53) and a smoky aroma. Omission tests showed that the three furan compounds with an OAV > 1 made a significant difference to brandy. These findings bring a new perspective to the sensory and chemical characteristics of brandy.
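The odor activity value used above is the ratio of a compound's concentration to its odor threshold; a minimal sketch (the numeric values are illustrative, not the paper's data):

```python
def odor_activity_value(concentration, threshold):
    # OAV = concentration / odor threshold, in matching units.
    # OAV > 1 suggests a perceptible contribution to the overall aroma.
    return concentration / threshold
```

A compound at 50 µg/L with a 25 µg/L threshold has OAV 2 and is likely perceptible; at 1 µg/L with a 10 µg/L threshold (OAV 0.1) it is likely below perception.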


Subjects
Alcoholic Beverages , Volatile Organic Compounds , Gas Chromatography-Mass Spectrometry/methods , Alcoholic Beverages/analysis , Odorants/analysis , Sensation , Furans/analysis , Volatile Organic Compounds/analysis , Olfactometry/methods
9.
New Phytol ; 238(2): 904-915, 2023 04.
Article in English | MEDLINE | ID: mdl-36683442

ABSTRACT

Using microscopy to investigate stomatal behaviour is common in plant physiology research. Manual inspection and measurement of stomatal pore features is low-throughput, relies upon expert knowledge to record stomatal features accurately, requires significant researcher time and investment, and can represent a significant bottleneck in research pipelines. To alleviate this, we introduce StomaAI (SAI): a reliable, user-friendly and adaptable tool for stomatal pore and density measurements via the application of deep computer vision, initially calibrated and deployed for the model plant Arabidopsis (dicot) and the crop plant barley (monocot grass). SAI is capable of producing measurements consistent with human experts and successfully reproduced the conclusions of published datasets. SAI boosts the number of images that can be evaluated in a fraction of the time, and so can obtain a more accurate representation of stomatal traits than is routine through manual measurement. An online demonstration of SAI is hosted at https://sai.aiml.team, and the full local application is publicly available for free on GitHub at https://github.com/xdynames/sai-app.


Subjects
Arabidopsis , Humans , Phenotype , Computers , Plant Stomata/physiology
10.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7035-7049, 2023 Jun.
Article in English | MEDLINE | ID: mdl-32750784

ABSTRACT

In this work, we consider transferring the structure information from large networks to compact ones for dense prediction tasks in computer vision. Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance. Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem. Specifically, we study two structured distillation schemes: i) pair-wise distillation that distills the pair-wise similarities by building a static graph; and ii) holistic distillation that uses adversarial training to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection. Code is available at https://git.io/StructKD.
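The pair-wise scheme can be sketched as matching the cosine-similarity graphs of the student and teacher features; the loss form and the names below are an illustrative assumption, not the paper's exact formulation:

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def pairwise_distillation_loss(student_feats, teacher_feats):
    """Pair-wise distillation (sketch): build the static similarity graph
    over feature nodes for student and teacher, then penalise the mean
    squared difference between the two graphs."""
    n = len(student_feats)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            loss += (cosine(student_feats[i], student_feats[j])
                     - cosine(teacher_feats[i], teacher_feats[j])) ** 2
    return loss / (n * n)
```

Matching similarities rather than raw per-pixel outputs is what transfers the *structure* of the teacher's predictions to the compact student.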

11.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5632-5648, 2023 May.
Article in English | MEDLINE | ID: mdl-36288227

ABSTRACT

In this work, we develop methods for few-shot image classification from a new perspective of optimal matching between image regions. We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance. The EMD generates the optimal matching flows between structural elements that have the minimum matching cost, which is used to calculate the image distance for classification. To generate the important weights of elements in the EMD formulation, we design a cross-reference mechanism, which can effectively alleviate the adverse impact caused by the cluttered background and large intra-class appearance variations. To implement k-shot classification, we propose to learn a structured fully connected layer that can directly classify dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. Our extensive experiments validate the effectiveness of our algorithm which outperforms state-of-the-art methods by a significant margin on five widely used few-shot classification benchmarks, namely, miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB), and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our method on the image retrieval task in our experiments.
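For a handful of equally weighted elements, the optimal matching behind the EMD reduces to a minimum-cost assignment, which a brute-force sketch makes concrete (the paper solves the general weighted transportation problem with learned weights; this toy version is an assumption for illustration):

```python
import itertools

def emd_equal_weights(set_a, set_b):
    """Tiny brute-force Earth Mover's Distance between two equal-size sets
    of equally weighted vectors: the optimal flow is a minimum-cost
    one-to-one assignment (practical only for a handful of elements)."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    n = len(set_a)
    best = min(
        sum(dist(set_a[i], set_b[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n  # each element carries weight 1/n
```

Identical sets have distance 0; in the paper the resulting image-to-image distance is what drives few-shot classification.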

12.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 669-680, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35077358

ABSTRACT

We propose a simple yet effective framework for instance and panoptic segmentation, termed CondInst (conditional convolutions for instance and panoptic segmentation). In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance. In contrast, we propose to attend to the instances with dynamic conditional convolutions. Instead of using instance-wise ROIs as inputs to an instance mask head of fixed weights, we design dynamic instance-aware mask heads, conditioned on the instances to be predicted. CondInst enjoys three advantages: 1) Instance and panoptic segmentation are unified into a fully convolutional network, eliminating the need for ROI cropping and feature alignment. 2) The elimination of ROI cropping also significantly improves the output instance mask resolution. 3) Due to the much improved capacity of dynamically generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each with only 8 channels), leading to significantly faster inference per instance and making the overall inference time less dependent on the number of instances. We demonstrate a simpler method that achieves improved accuracy and inference speed on both instance and panoptic segmentation tasks. On the COCO dataset, we outperform a few state-of-the-art methods. We hope that CondInst can be a strong baseline for instance and panoptic segmentation. Code is available at: https://git.io/AdelaiDet.

13.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6480-6494, 2023 May.
Article in English | MEDLINE | ID: mdl-36197868

ABSTRACT

Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing relative depth estimation models often fail to recover accurate 3D scene shapes due to the unknown depth shift caused by training with the relative depth data. We tackle this problem here and attempt to estimate accurate scene shapes by training on large-scale relative depth data, and estimating the depth shift. To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. As the two modules are trained separately, we do not need strictly paired training data. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to improve training with relative depth annotation. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation. Code is available at: https://github.com/aim-uofa/depth/.
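Given sparse metric anchors, the unknown scale and shift of a relative-depth prediction can be recovered by least squares; this closed-form sketch is illustrative and is not the paper's point-cloud module:

```python
def recover_scale_shift(pred, metric):
    """Least-squares recovery of the unknown scale s and shift t mapping
    an affine-invariant depth prediction d to metric depth m: m ~ s*d + t.
    Solves the 2x2 normal equations in closed form."""
    n = len(pred)
    sd = sum(pred)
    sm = sum(metric)
    sdd = sum(d * d for d in pred)
    sdm = sum(d * m for d, m in zip(pred, metric))
    denom = n * sdd - sd * sd
    s = (n * sdm - sd * sm) / denom
    t = (sm - s * sd) / n
    return s, t
```

Once `s` and `t` are known, the whole depth map can be lifted to a plausible metric 3D shape, which is exactly the degree of freedom the paper's second stage estimates.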

14.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5697-5711, 2023 May.
Article in English | MEDLINE | ID: mdl-36279351

ABSTRACT

In this paper, we present a simple yet effective approach for instance segmentation on 3D point clouds with strong robustness. Previous top-performing methods for this task adopt a bottom-up strategy, which often involves various inefficient operations or complex pipelines, such as grouping over-segmented components, introducing heuristic post-processing steps, and designing complex loss functions. As a result, the inevitable variations in instance sizes make these methods vulnerable and sensitive to the values of pre-defined hyper-parameters. To this end, we instead propose a novel pipeline that applies dynamic convolution to generate instance-aware parameters in response to the characteristics of the instances. The representation capability of the parameters is greatly improved by gathering homogeneous points that have identical semantic categories and close votes for the geometric centroids. Instances are then decoded via several simple convolution layers, where the parameters are generated depending on the input. In addition, to introduce a large context while maintaining limited computational overhead, a lightweight transformer is built upon the bottleneck layer to capture long-range dependencies. With non-maximum suppression (NMS) as the only post-processing step, we demonstrate a simpler and more robust approach that achieves promising performance on various datasets: ScanNetV2, S3DIS, and PartNet. The consistent improvements on both voxel- and point-based architectures imply the effectiveness of the proposed method. Code is available at: https://git.io/DyCo3D.

15.
Foods ; 13(1)2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38201053

ABSTRACT

This work aimed to compare the aroma characteristics of representative brandies of different grades from Yantai (one of the Chinese core production areas) and Cognac, and to establish relationships between sensory descriptors and chemical composition. Descriptive analysis was performed with a trained panel to obtain the sensory profiles. Forty-three aroma-active compounds were quantified by four different methodologies. A prediction model based on partial least squares analysis was built to identify candidate compounds unique to a certain group of brandies. The results showed that brandies from Yantai could be distinguished from Cognac brandies on the basis of spicy, dried fruit, floral, and fruity-like aromas, which were associated with an aromatic balance between the concentrations of a set of compounds such as 5-methylfurfural, γ-nonalactone, and γ-dodecalactone. Meanwhile, brandies of different grades could be distinguished on the basis of compounds derived mostly during the aging process.

16.
Environ Sci Technol ; 56(22): 16428-16440, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36301735

ABSTRACT

Increasing CO2 emissions have resulted in pressing climate and environmental issues. While the abiotic and biotic processes mediating the fate of CO2 have been studied separately, their interactions and combined effects remain poorly understood. To explore this knowledge gap, an iron-reducing organism, Orenia metallireducens, was cultured under 18 conditions that systematically varied in headspace CO2 concentration, ferric oxide loading, and dolomite (CaMg(CO3)2) availability. The results showed that abiotic and biotic processes interactively mediate CO2 acidification and sequestration through "chain reactions", with pH being the dominant variable. Specifically, dolomite alleviated CO2 stress on microbial activity, possibly via pH control that transforms the inhibitory CO2 into the more benign bicarbonate species. Microbial iron reduction further impacted pH via the competition between proton (H+) consumption during iron reduction and H+ generation from oxidation of the organic substrate. Under Fe(III)-rich conditions, microbial iron reduction increased pH, driving dissolved CO2 to form bicarbonate. Spectroscopic and microscopic analyses showed enhanced formation of siderite (FeCO3) under elevated CO2, supporting its incorporation into solids. The results of these CO2-microbe-mineral experiments provide insights into the synergistic abiotic and biotic processes that alleviate CO2 acidification and favor its sequestration, which can be instructive for practical applications (e.g., acidification remediation, CO2 sequestration, and modeling of carbon flux).
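The CO2/bicarbonate speciation that underlies the pH control can be sketched with the Henderson–Hasselbalch relation (pKa1 ≈ 6.35 at 25 °C is a standard textbook value, assumed here; this is an illustration, not the paper's geochemical model):

```python
import math

def carbonate_ph(pka1, hco3, co2_aq):
    # Henderson-Hasselbalch for the CO2(aq)/HCO3- couple:
    #   pH = pKa1 + log10([HCO3-] / [CO2(aq)])
    # Higher bicarbonate relative to dissolved CO2 means higher pH,
    # i.e. the "more benign" speciation described in the abstract.
    return pka1 + math.log10(hco3 / co2_aq)
```

When the two species are equimolar the pH sits at pKa1; a ten-fold excess of bicarbonate raises it by one unit.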


Subjects
Ferric Compounds , Iron , Ferric Compounds/chemistry , Iron/chemistry , Carbon Dioxide , Bicarbonates , Carbonates/chemistry , Minerals , Oxidation-Reduction
17.
Environ Sci Pollut Res Int ; 29(56): 85547-85558, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35794332

ABSTRACT

Phthalates have been shown to have adverse effects on neurodevelopment, which may be gender-specific. However, the association between prenatal exposure to phthalate mixtures and children's neurodevelopment remains inconsistent. We measured 15 phthalates in prenatal serum and evaluated children's neurodevelopmental indicators using the Gesell Developmental Schedule (GDS) (n = 750). Generalized linear regression was fitted to examine the associations. Among boys, mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) had adverse effects on gross motor development [odds ratio (OR): 7.38, 95% confidence interval (CI): 1.42, 38.46]. For gross motor development in boys, a joint effect was discovered between mono-2-ethylhexyl phthalate (MEHP) and MEHHP. Moreover, synergistic effects were found for MEHP with vanadium and cadmium, and antagonistic effects for MEHP with magnesium, calcium, titanium, iron, copper, selenium, rubidium, and strontium. We did not find statistically significant relationships in girls. In the 1st trimester, adverse effects were identified for mono-2-ethyl-5-oxohexyl phthalate (MEOHP) on adaptation (P = 0.024) and for monomethyl phthalate (MMP) on the social area (P = 0.017). In the 2nd trimester, MEHHP had adverse effects on the social area (P = 0.035). In summary, boys may be more vulnerable than girls to neurotoxic effects on gross motor development, and phthalates had detrimental effects on children's neurodevelopment in the 1st and 2nd trimesters. Therefore, supplementation with appropriate elements in the 1st and 2nd trimesters may help reduce the adverse effects of phthalates on children's neurodevelopment, especially among boys.


Subjects
Environmental Pollutants , Phthalic Acids , Pregnancy , Male , Child , Female , Humans , Cohort Studies , Birth Cohort , China , Phthalic Acids/toxicity , Environmental Exposure/analysis
18.
IEEE Trans Image Process ; 31: 2529-2540, 2022.
Article in English | MEDLINE | ID: mdl-35275820

ABSTRACT

The explanation of deep neural networks has drawn extensive attention in the deep learning community over the past few years. In this work, we study visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks. Compared to iteration-based saliency methods, methods based on a single backward pass are faster and are widely used in downstream visual tasks; thus, we focus on single-backward-pass methods. However, existing methods in this category struggle to produce fine-grained saliency maps concentrating on specific target classes. Producing faithful saliency maps satisfying both target-selectiveness and fine-grainedness with a single backward pass remains a challenging problem in the field. To mitigate this problem, we revisit the gradient flow inside the network and find that entangled semantics and the original weights may disturb the propagation of target-relevant saliency. Inspired by these observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps. The proposed TSGB consists of two components, TSGB-Conv and TSGB-FC, which rectify the gradients for convolutional and fully-connected layers, respectively. Extensive qualitative and quantitative experiments on the ImageNet and Pascal VOC datasets show that the proposed method achieves more accurate and reliable results than other competitive methods. Code is available at https://github.com/123fxdx/CNNvisualizationTSGB.


Subjects
Attention , Neural Networks, Computer , Semantics
19.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 697-709, 2022 02.
Article in English | MEDLINE | ID: mdl-31796387

ABSTRACT

Visual Question Answering (VQA) has attracted extensive research focus recently. Along with the ever-increasing data scale and model complexity, the enormous training cost has become an emerging challenge for VQA. In this article, we show that such a massive training cost is indeed a plague. In contrast, a fine-grained design of the learning paradigm can be extremely beneficial in terms of both training efficiency and model accuracy. In particular, we argue that there exist two essential and unexplored issues in the existing VQA training paradigm, which randomly samples data in each epoch: "difficulty diversity" and "label redundancy". Concretely, "difficulty diversity" refers to the varying difficulty levels of different question types, while "label redundancy" refers to the redundant and noisy labels contained in individual question types. To tackle these two issues, we propose a fine-grained VQA learning paradigm with an actor-critic based learning agent, termed FG-A1C. Instead of using all training data from scratch, FG-A1C includes a learning agent that adaptively and intelligently schedules the most difficult question types in each training epoch. Subsequently, two curriculum-learning-based schemes are further designed to identify the most useful data to be learned within each individual question type. We conduct extensive experiments on the VQA2.0 and VQA-CP v2 datasets, which demonstrate the significant benefits of our approach. For instance, on VQA-CP v2, with less than 75 percent of the training data, our learning paradigm can help the model achieve better performance than using the whole dataset. Meanwhile, we also show the effectiveness of our method in guiding data labeling. Finally, the proposed paradigm can be seamlessly integrated with any cutting-edge VQA model without modifying its structure.


Subjects
Algorithms , Humans , Learning
20.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8048-8064, 2022 11.
Article in English | MEDLINE | ID: mdl-34460364

ABSTRACT

End-to-end text spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention due to the simplicity of unifying the two complementary tasks. It remains an open problem, especially when processing arbitrarily-shaped text instances. Previous methods can be roughly categorized into two groups, character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation. 2) We design a novel BezierAlign layer for extracting accurate convolutional features of a text instance of arbitrary shape, significantly improving recognition precision over previous methods. 3) Different from previous methods, which often suffer from complex post-processing and sensitive hyper-parameters, our ABCNet v2 maintains a simple pipeline with non-maximum suppression (NMS) as the only post-processing step. 4) As the performance of text recognition closely depends on feature alignment, ABCNet v2 further adopts a simple yet effective coordinate convolution to encode the position of the convolutional filters, which leads to a considerable improvement with negligible computational overhead. Comprehensive experiments on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 achieves state-of-the-art performance while maintaining very high efficiency. Moreover, as there is little work on the quantization of text-spotting models, we quantize our models to improve the inference time of the proposed ABCNet v2, which can be valuable for real-time applications. Code and model are available at: https://git.io/AdelaiDet.
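The Bezier parameterization at the heart of ABCNet can be made concrete by evaluating a cubic curve via the Bernstein basis; the function and control points below are illustrative, not the model's predicted values:

```python
def cubic_bezier(ctrl, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] using the
    Bernstein basis. A curved text boundary is described by a handful of
    control points instead of a dense polygon or a segmentation mask."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    u = 1.0 - t
    b = (u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3)
    x = b[0] * x0 + b[1] * x1 + b[2] * x2 + b[3] * x3
    y = b[0] * y0 + b[1] * y1 + b[2] * y2 + b[3] * y3
    return x, y
```

The curve interpolates its first and last control points, so the predicted endpoints pin down where the text boundary starts and ends; sampling t densely recovers the full boundary for alignment.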


Subjects
Algorithms , Benchmarking