Search | BVS Bolivia

1.

Self-Supervised 3D Scene Flow Estimation and Motion Prediction using Local Rigidity Prior.

Li, Ruibo; Zhang, Chi; Wang, Zhe; Shen, Chunhua; Lin, Guosheng.

IEEE Trans Pattern Anal Mach Intell ; PP, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38743546

ABSTRACT

In this article, we investigate self-supervised 3D scene flow estimation and class-agnostic motion prediction on point clouds. A realistic scene can be well modeled as a collection of rigidly moving parts; therefore, its scene flow can be represented as a combination of the rigid motions of these individual parts. Building upon this observation, we propose to generate pseudo scene flow labels for self-supervised learning through piecewise rigid motion estimation, in which the source point cloud is decomposed into local regions and each region is treated as rigid. By rigidly aligning each region with its potential counterpart in the target point cloud, we obtain a region-specific rigid transformation to generate its pseudo flow labels. To mitigate the impact of potential outliers on label generation, when solving the rigid registration for each region, we alternately perform three steps: establishing point correspondences, measuring the confidence for the correspondences, and updating the rigid transformation based on the correspondences and their confidence. As a result, confident correspondences will dominate label generation, and a validity mask will be derived for the generated pseudo labels. By using the pseudo labels together with their validity mask for supervision, models can be trained in a self-supervised manner. Extensive experiments on the FlyingThings3D and KITTI datasets demonstrate that our method achieves new state-of-the-art performance in self-supervised scene flow learning, without any ground truth scene flow for supervision, even performing better than some supervised counterparts. Additionally, our method is further extended to class-agnostic motion prediction and significantly outperforms previous state-of-the-art self-supervised methods on the nuScenes dataset.
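The label-generation idea in this abstract (rigidly align a region, then read off per-point flow) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: it works in 2D rather than 3D so that the closed-form solution stays short, it assumes correspondences and confidence weights are already given, and the function names `weighted_rigid_fit_2d` and `pseudo_flow_2d` are hypothetical.

```python
import math

def weighted_rigid_fit_2d(src, dst, weights):
    """Confidence-weighted least-squares rotation angle and translation (2D)."""
    total = sum(weights)
    # Confidence-weighted centroids of the source and target points.
    cx_s = sum(w * p[0] for w, p in zip(weights, src)) / total
    cy_s = sum(w * p[1] for w, p in zip(weights, src)) / total
    cx_d = sum(w * q[0] for w, q in zip(weights, dst)) / total
    cy_d = sum(w * q[1] for w, q in zip(weights, dst)) / total
    # Weighted dot and cross products of the centered correspondences.
    dot = 0.0
    cross = 0.0
    for w, p, q in zip(weights, src, dst):
        ax, ay = p[0] - cx_s, p[1] - cy_s
        bx, by = q[0] - cx_d, q[1] - cy_d
        dot += w * (ax * bx + ay * by)
        cross += w * (ax * by - ay * bx)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)

def pseudo_flow_2d(points, theta, translation):
    """Pseudo flow label of each point: its rigidly moved position minus itself."""
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = translation
    return [(c * x - s * y + tx - x, s * x + c * y + ty - y) for x, y in points]
```

A correspondence with weight 0 has no influence on the fit, which mirrors the statement that confident correspondences dominate label generation. The alternating update of correspondences and confidences described in the abstract is not reproduced here.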

2.

SPTS v2: Single-Point Scene Text Spotting.

Liu, Yuliang; Zhang, Jiaxin; Peng, Dezhi; Huang, Mingxin; Wang, Xinyu; Tang, Jingqun; Huang, Can; Lin, Dahua; Shen, Chunhua; Bai, Xiang; Jin, Lianwen.

IEEE Trans Pattern Anal Mach Intell ; 45(12): 15665-15679, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37669204

ABSTRACT

End-to-end scene text spotting has made significant progress owing to the intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive to produce than single-point annotations. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 retains the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) by sequentially predicting the center points of all text instances inside the same predicting sequence, while a Parallel Recognition Decoder (PRD) performs text recognition in parallel, which significantly reduces the required sequence length. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms.

3.

Learning From Partially Labeled Data for Multi-Organ and Tumor Segmentation.

Xie, Yutong; Zhang, Jianpeng; Xia, Yong; Shen, Chunhua.

IEEE Trans Pattern Anal Mach Intell ; 45(12): 14905-14919, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37672381

ABSTRACT

Medical image benchmarks for the segmentation of organs and tumors suffer from the partial labeling issue owing to the intensive cost of labor and expertise. Current mainstream approaches follow the practice of one network solving one task. With this pipeline, not only is performance limited by the typically small dataset of a single task, but the computation cost also increases linearly with the number of tasks. To address this, we propose a Transformer-based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets. Specifically, TransDoDNet has a hybrid backbone that is composed of a convolutional neural network and a Transformer. A dynamic head enables the network to accomplish multiple segmentation tasks flexibly. Unlike existing approaches that fix kernels after training, the kernels in the dynamic head are generated adaptively by the Transformer, which employs the self-attention mechanism to model long-range organ-wise dependencies and decodes the organ embedding that can represent each organ. We create a large-scale partially labeled Multi-Organ and Tumor Segmentation benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors on seven organ and tumor segmentation tasks. This study also provides a general 3D medical image segmentation model, which has been pre-trained on the large-scale MOTS benchmark and has demonstrated advanced performance over current predominant self-supervised learning methods.

Subjects

Algorithms; Neoplasms; Humans; Neoplasms/diagnostic imaging; Benchmarking; Neural Networks, Computer; Image Processing, Computer-Assisted

4.

Single-Path Bit Sharing for Automatic Loss-Aware Model Compression.

Liu, Jing; Zhuang, Bohan; Chen, Peng; Shen, Chunhua; Cai, Jianfei; Tan, Mingkui.

IEEE Trans Pattern Anal Mach Intell ; 45(10): 12459-12473, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37167046

ABSTRACT

Network pruning and quantization are proven to be effective ways for deep model compression. To obtain a highly compact model, most methods first perform network pruning and then conduct quantization based on the pruned model. However, this strategy may ignore that pruning and quantization affect each other, so performing them separately may lead to sub-optimal performance. To address this, performing pruning and quantization jointly is essential. Nevertheless, how to make a trade-off between pruning and quantization is non-trivial. Moreover, existing compression methods often rely on some pre-defined compression configurations (i.e., pruning rates or bitwidths). Some attempts have been made to search for optimal configurations, which, however, may incur prohibitive optimization cost. To address these issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS) for automatic loss-aware model compression. To this end, we consider network pruning as a special case of quantization and provide a unified view for model pruning and quantization. We then introduce a single-path model to encode all candidate compression configurations, where a high bitwidth value will be decomposed into the sum of a lowest bitwidth value and a series of re-assignment offsets. Relying on the single-path model, we introduce learnable binary gates to encode the choice of configurations and learn the binary gates and model parameters jointly. More importantly, the configuration search problem can be transformed into a subset selection problem, which helps to significantly reduce the optimization difficulty and computation cost. In this way, the compression configurations of each layer and the trade-off between pruning and quantization can be automatically determined. Extensive experiments on CIFAR-100 and ImageNet show that SBS significantly reduces computation cost while achieving promising performance. For example, our SBS compressed MobileNetV2 achieves 22.6× Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1 accuracy.
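The decomposition mentioned in this abstract (a high-bitwidth value written as a lowest-bitwidth value plus re-assignment offsets, selected by binary gates) can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the SBS method itself: the uniform quantizer on [0, 1], the candidate bitwidths (2, 4, 8), the nested gating form, and the names `quantize` and `gated_value` are all hypothetical.

```python
def quantize(x, bits):
    """Uniformly quantize x (clipped to [0, 1]) onto 2**bits - 1 steps."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels

def gated_value(x, gates, bitwidths=(2, 4, 8)):
    """Lowest-bitwidth value plus gated offsets towards higher bitwidths.

    gates[0] == 0 zeroes the value (pruning as a special case);
    each later gate adds the offset to the next bitwidth, but only
    if all earlier gates are open (nested form, an assumption here).
    """
    base = quantize(x, bitwidths[0])
    # Offset k is the difference between consecutive candidate bitwidths.
    offsets = [quantize(x, hi) - quantize(x, lo)
               for lo, hi in zip(bitwidths, bitwidths[1:])]
    inner = 0.0
    # Build g1 * (r1 + g2 * (r2 + ...)) from the innermost term outwards.
    for gate, offset in zip(reversed(gates[1:]), reversed(offsets)):
        inner = gate * (offset + inner)
    return gates[0] * (base + inner)
```

With all gates open the sum telescopes to the 8-bit value; closing the last gate gives the 4-bit value; closing the first gate gives 0. Choosing gates therefore selects one configuration out of a single shared path, which is the sense in which the search becomes a subset selection problem.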

5.

Identification, quantitation and organoleptic contributions of furan compounds in brandy.

Yuan, Xiaomeng; Zhou, Junmeng; Zhang, Baochun; Shen, Chunhua; Yu, Lina; Gong, Chuanbin; Xu, Yan; Tang, Ke.

Food Chem ; 412: 135543, 2023 Jun 30.

Article in English | MEDLINE | ID: mdl-36724717

ABSTRACT

Furan compounds actively contribute to the characteristics of brandy. Herein, we have attempted to identify and quantify the furan compounds present in brandy using three different extraction methods combined with comprehensive two-dimensional gas chromatography and time-of-flight mass spectrometry. Threshold determination and omission experiments were carried out to verify their organoleptic contribution. Liquid-liquid extraction using dichloromethane was found to be the optimal extraction method. A total of 21 furan compounds were identified, of which 5 were detected in brandy for the first time. Our quantitative results showed a positive correlation between the furan compound content and the aging time. Among them, ethyl 5-oxotetrahydro-2-furancarboxylate exhibited a very high odor activity value (1.64 < OAV < 179.53) and smoky aroma. Omission tests showed that the three furan compounds with an OAV > 1 made a significant difference to brandy. These findings bring a new perspective to the sensory and chemical characteristics of brandy.

Subjects

Alcoholic Beverages; Volatile Organic Compounds; Gas Chromatography-Mass Spectrometry/methods; Alcoholic Beverages/analysis; Odorants/analysis; Sensation; Furans/analysis; Volatile Organic Compounds/analysis; Olfactometry/methods
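For readers unfamiliar with the odor activity value (OAV) used in this entry: it is conventionally the ratio of a compound's concentration to its odor threshold, so an OAV above 1 means the compound is present above the level at which it can be smelled. A minimal sketch follows; the compound names and numbers in the usage note are invented placeholders, not data from the study.

```python
def odor_activity_value(concentration, threshold):
    """OAV = concentration / odor threshold (both in the same unit)."""
    if threshold <= 0:
        raise ValueError("odor threshold must be positive")
    return concentration / threshold

def odor_active(compounds):
    """Names of compounds whose OAV exceeds 1.

    `compounds` maps a name to a (concentration, threshold) pair.
    """
    return [name for name, (conc, thr) in compounds.items()
            if odor_activity_value(conc, thr) > 1]
```

For example, `odor_active({"compound_a": (50.0, 10.0), "compound_b": (2.0, 8.0)})` keeps only `compound_a`, whose hypothetical concentration is five times its hypothetical threshold.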

6.

StomaAI: an efficient and user-friendly tool for measurement of stomatal pores and density using deep computer vision.

Sai, Na; Bockman, James Paul; Chen, Hao; Watson-Haigh, Nathan; Xu, Bo; Feng, Xueying; Piechatzek, Adriane; Shen, Chunhua; Gilliham, Matthew.

New Phytol ; 238(2): 904-915, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36683442

ABSTRACT

Using microscopy to investigate stomatal behaviour is common in plant physiology research. Manual inspection and measurement of stomatal pore features is low throughput, relies upon expert knowledge to record stomatal features accurately, requires significant researcher time and investment, and can represent a significant bottleneck to research pipelines. To alleviate this, we introduce StomaAI (SAI): a reliable, user-friendly and adaptable tool for stomatal pore and density measurements via the application of deep computer vision, which has been initially calibrated and deployed for the model plant Arabidopsis (dicot) and the crop plant barley (monocot grass). SAI is capable of producing measurements consistent with human experts and successfully reproduced conclusions of published datasets. SAI boosts the number of images that can be evaluated in a fraction of the time, so can obtain a more accurate representation of stomatal traits than is routine through manual measurement. An online demonstration of SAI is hosted at https://sai.aiml.team, and the full local application is publicly available for free on GitHub through https://github.com/xdynames/sai-app.

Subjects

Arabidopsis; Humans; Phenotype; Computers; Plant Stomata/physiology

7.

Structured Knowledge Distillation for Dense Prediction.

Liu, Yifan; Shu, Changyong; Wang, Jingdong; Shen, Chunhua.

IEEE Trans Pattern Anal Mach Intell ; 45(6): 7035-7049, 2023 Jun.

Article in English | MEDLINE | ID: mdl-32750784

ABSTRACT

In this work, we consider transferring the structure information from large networks to compact ones for dense prediction tasks in computer vision. Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance. Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem. Specifically, we study two structured distillation schemes: i) pair-wise distillation that distills the pair-wise similarities by building a static graph; and ii) holistic distillation that uses adversarial training to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection. Code is available at https://git.io/StructKD.
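The pair-wise distillation scheme in this abstract compares similarities between spatial locations rather than individual pixel outputs. The sketch below illustrates that idea under explicit assumptions: cosine similarity as the similarity measure, a mean squared difference as the loss, and the function names are all choices made here for illustration; the released code should be consulted for the actual formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-norm feature vector")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def pairwise_similarity(features):
    """Similarity between every pair of nodes (spatial locations)."""
    return [[cosine(fi, fj) for fj in features] for fi in features]

def pairwise_distillation_loss(student, teacher):
    """Mean squared difference between student and teacher similarity maps."""
    sim_s = pairwise_similarity(student)
    sim_t = pairwise_similarity(teacher)
    n = len(sim_s)
    total = sum((sim_s[i][j] - sim_t[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)
```

Because only the node-by-node similarity maps are compared, the student and teacher may have different channel widths, as long as they share the same number of spatial nodes.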

8.

Instance and Panoptic Segmentation Using Conditional Convolutions.

Tian, Zhi; Zhang, Bowen; Chen, Hao; Shen, Chunhua.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 669-680, 2023 Jan.

Article in English | MEDLINE | ID: mdl-35077358

ABSTRACT

We propose a simple yet effective framework for instance and panoptic segmentation, termed CondInst (conditional convolutions for instance and panoptic segmentation). In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance. In contrast, we propose to attend to the instances with dynamic conditional convolutions. Instead of using instance-wise ROIs as inputs to the instance mask head of fixed weights, we design dynamic instance-aware mask heads, conditioned on the instances to be predicted. CondInst enjoys three advantages: 1) Instance and panoptic segmentation are unified into a fully convolutional network, eliminating the need for ROI cropping and feature alignment. 2) The elimination of the ROI cropping also significantly improves the output instance mask resolution. 3) Due to the much improved capacity of dynamically-generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each having only 8 channels), leading to significantly faster inference time per instance and making the overall inference time less relevant to the number of instances. We demonstrate a simpler method that can achieve improved accuracy and inference speed on both instance and panoptic segmentation tasks. On the COCO dataset, we outperform a few state-of-the-art methods. We hope that CondInst can be a strong baseline for instance and panoptic segmentation. Code is available at: https://git.io/AdelaiDet.
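To give a sense of how compact a dynamically generated mask head of "3 conv. layers, each having only 8 channels" can be, the sketch below counts its parameters and splits a flat parameter vector into per-layer weights and biases. Several details are assumptions made for illustration: 1×1 convolutions, a single-channel output in the last layer, and the number of input channels (left as an argument); the helper names are hypothetical.

```python
def layer_shapes(in_channels, hidden=8, num_layers=3):
    """(in, out) channel pairs; the last layer is assumed to emit one mask channel."""
    dims = [in_channels] + [hidden] * (num_layers - 1) + [1]
    return list(zip(dims[:-1], dims[1:]))

def param_count(in_channels, hidden=8, num_layers=3):
    """Weights plus biases of a stack of 1x1 convolutions."""
    return sum(cin * cout + cout
               for cin, cout in layer_shapes(in_channels, hidden, num_layers))

def split_params(flat, in_channels, hidden=8, num_layers=3):
    """Slice one flat, per-instance parameter vector into per-layer (weights, biases)."""
    layers = []
    pos = 0
    for cin, cout in layer_shapes(in_channels, hidden, num_layers):
        weights = flat[pos:pos + cin * cout]
        pos += cin * cout
        biases = flat[pos:pos + cout]
        pos += cout
        layers.append((weights, biases))
    if pos != len(flat):
        raise ValueError("parameter vector has the wrong length")
    return layers
```

Under these assumptions a head with 10 input channels needs only 169 numbers per instance, which is why generating them per instance is affordable.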

9.

DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning.

Zhang, Chi; Cai, Yujun; Lin, Guosheng; Shen, Chunhua.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 5632-5648, 2023 May.

Article in English | MEDLINE | ID: mdl-36288227

ABSTRACT

In this work, we develop methods for few-shot image classification from a new perspective of optimal matching between image regions. We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance. The EMD generates the optimal matching flows between structural elements that have the minimum matching cost, which is used to calculate the image distance for classification. To generate the important weights of elements in the EMD formulation, we design a cross-reference mechanism, which can effectively alleviate the adverse impact caused by the cluttered background and large intra-class appearance variations. To implement k-shot classification, we propose to learn a structured fully connected layer that can directly classify dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. Our extensive experiments validate the effectiveness of our algorithm which outperforms state-of-the-art methods by a significant margin on five widely used few-shot classification benchmarks, namely, miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB), and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our method on the image retrieval task in our experiments.

10.

Dynamic Convolution for 3D Point Cloud Instance Segmentation.

He, Tong; Shen, Chunhua; van den Hengel, Anton.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 5697-5711, 2023 May.

Article in English | MEDLINE | ID: mdl-36279351

ABSTRACT

In this paper, we propose a simple yet effective approach for instance segmentation on 3D point clouds with strong robustness. Previous top-performing methods for this task adopt a bottom-up strategy, which often involves various inefficient operations or complex pipelines, such as grouping over-segmented components, introducing heuristic post-processing steps, and designing complex loss functions. As a result, the inevitable variations in instance sizes make these methods vulnerable and sensitive to the values of pre-defined hyper-parameters. To this end, we instead propose a novel pipeline that applies dynamic convolution to generate instance-aware parameters in response to the characteristics of the instances. The representation capability of the parameters is greatly improved by gathering homogeneous points that have identical semantic categories and close votes for the geometric centroids. Instances are then decoded via several simple convolution layers, where the parameters are generated depending on the input. In addition, to introduce a large context and maintain limited computational overheads, a light-weight transformer is built upon the bottleneck layer to capture the long-range dependencies. With the only post-processing step, non-maximum suppression (NMS), we demonstrate a simpler and more robust approach that achieves promising performance on various datasets: ScanNetV2, S3DIS, and PartNet. The consistent improvements on both voxel- and point-based architectures imply the effectiveness of the proposed method. Code is available at: https://git.io/DyCo3D.

11.

Towards Accurate Reconstruction of 3D Scene Shape From A Single Monocular Image.

Yin, Wei; Zhang, Jianming; Wang, Oliver; Niklaus, Simon; Chen, Simon; Liu, Yifan; Shen, Chunhua.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 6480-6494, 2023 May.

Article in English | MEDLINE | ID: mdl-36197868

ABSTRACT

Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing relative depth estimation models often fail to recover accurate 3D scene shapes due to the unknown depth shift caused by training with the relative depth data. We tackle this problem here and attempt to estimate accurate scene shapes by training on large-scale relative depth data, and estimating the depth shift. To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. As the two modules are trained separately, we do not need strictly paired training data. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to improve training with relative depth annotation. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation. Code is available at: https://github.com/aim-uofa/depth/.
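To make the phrase "depth up to an unknown scale and shift" concrete: such a prediction d relates to metric depth g roughly as g ≈ s·d + t for some unknown s and t. The sketch below recovers s and t by ordinary least squares when reference depths are available. This is only an illustration of the ambiguity; the paper instead predicts the shift (and focal length) from point cloud data with a second module, which is not reproduced here. The function name is hypothetical.

```python
def fit_scale_shift(pred, target):
    """Least-squares scale s and shift t such that s * pred + t ≈ target."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("prediction is constant; scale is undetermined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift
```

An error in the shift distorts the reconstructed 3D shape, whereas an error in the scale only resizes it, which is why the abstract singles out the depth shift.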

12.

Chemosensory Characteristics of Brandies from Chinese Core Production Area and First Insights into Their Differences from Cognac.

Ma, Yue; Li, Yuanyi; Zhang, Baochun; Shen, Chunhua; Yu, Lina; Xu, Yan; Tang, Ke.

Foods ; 13(1), 2023 Dec 20.

Article in English | MEDLINE | ID: mdl-38201053

ABSTRACT

This work aimed to compare the aroma characteristics of representative brandies with different grades from Yantai (one of the Chinese core production areas) and Cognac and to establish relationships between sensory descriptors and chemical composition. Descriptive analysis was performed with a trained panel to obtain the sensory profiles. Forty-three aroma-active compounds were quantified by four different methodologies. A prediction model on the basis of partial least squares analysis was performed to identify candidate compounds that were unique to a certain group of brandies. The result showed that brandies from Yantai could be distinguished from Cognac brandies on the basis of spicy, dried fruit, floral, and fruity-like aromas, which were associated with an aromatic balance between concentrations of a set of compounds such as 5-methylfurfural, γ-nonalactone, and γ-dodecalactone. Meanwhile, brandy with different grades could be distinguished on the basis of compounds derived mostly during the aging process.

13.

Carbonate Minerals and Dissimilatory Iron-Reducing Organisms Trigger Synergistic Abiotic and Biotic Chain Reactions under Elevated CO₂ Concentration.

Li, Shuyi; Feng, Qi; Liu, Juan; He, Yu; Shi, Liang; Boyanov, Maxim I; O'Loughlin, Edward J; Kemner, Kenneth M; Sanford, Robert A; Shao, Hongbo; He, Xiao; Sheng, Anxu; Cheng, Hang; Shen, Chunhua; Tu, Wenmao; Dong, Yiran.

Environ Sci Technol ; 56(22): 16428-16440, 2022 Nov 15.

Article in English | MEDLINE | ID: mdl-36301735

ABSTRACT

Increasing CO2 emission has resulted in pressing climate and environmental issues. While abiotic and biotic processes mediating the fate of CO2 have been studied separately, their interactions and combined effects have been poorly understood. To explore this knowledge gap, an iron-reducing organism, Orenia metallireducens, was cultured under 18 conditions that systematically varied in headspace CO2 concentrations, ferric oxide loading, and dolomite (CaMg(CO3)2) availability. The results showed that abiotic and biotic processes interactively mediate CO2 acidification and sequestration through "chain reactions", with pH being the dominant variable. Specifically, dolomite alleviated CO2 stress on microbial activity, possibly via pH control that transforms the inhibitory CO2 to the more benign bicarbonate species. The microbial iron reduction further impacted pH via the competition between proton (H+) consumption during iron reduction and H+ generation from oxidization of the organic substrate. Under Fe(III)-rich conditions, microbial iron reduction increased pH, driving dissolved CO2 to form bicarbonate. Spectroscopic and microscopic analyses showed enhanced formation of siderite (FeCO3) under elevated CO2, supporting its incorporation into solids. The results of these CO2-microbe-mineral experiments provide insights into the synergistic abiotic and biotic processes that alleviate CO2 acidification and favor its sequestration, which can be instructive for practical applications (e.g., acidification remediation, CO2 sequestration, and modeling of carbon flux).

Subjects

Ferric Compounds; Iron; Ferric Compounds/chemistry; Iron/chemistry; Carbon Dioxide; Bicarbonates; Carbonates/chemistry; Minerals; Oxidation-Reduction

14.

Gender-specific effects of prenatal mixed exposure to serum phthalates on neurodevelopment of children aged 2-3 years: the Guangxi Birth Cohort Study.

Zheng, Yuan; Li, Longman; Cheng, Hong; Huang, Shengzhu; Feng, Xiuming; Huang, Lulu; Wei, Luyun; Cao, Dehao; Wang, Sida; Tian, Long; Tang, Weijun; He, Caitong; Shen, Chunhua; Luo, Bangzhu; Zhu, Maoling; Liang, Tao; Pang, Baohong; Li, Mujun; Liu, Chaoqun; Chen, Xing; Wang, Fei; Mo, Zengnan; Yang, Xiaobo.

Environ Sci Pollut Res Int ; 29(56): 85547-85558, 2022 Dec.

Article in English | MEDLINE | ID: mdl-35794332

ABSTRACT

Phthalates have been shown to have adverse effects on neurodevelopment, which may be gender-specific. However, the association between prenatal mixed exposure to phthalates and children's neurodevelopment remains inconsistent. We measured 15 prenatal serum phthalate levels and evaluated children's neurodevelopmental indicators using the Gesell Developmental Schedule (GDS) (n = 750). Generalized linear regression was fitted to examine the association. Among boys, mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) had adverse effects on gross motor [odds ratio (OR): 7.38, 95% confidence interval (CI): 1.42, 38.46]. For gross motor in boys, a joint effect was discovered between mono-2-ethylhexyl phthalate (MEHP) and MEHHP. Moreover, synergistic effects were found for MEHP with vanadium and cadmium, and antagonistic effects for MEHP with magnesium, calcium, titanium, iron, copper, selenium, rubidium, and strontium. We did not find statistically significant relationships in girls. In the 1st trimester, adverse effects were identified between mono-2-ethyl-5-oxohexyl phthalate (MEOHP) and adaptation (P = 0.024), and monomethyl phthalate (MMP) with social area (P = 0.017). In the 2nd trimester, MEHHP had adverse effects on social area (P = 0.035). In summary, we found boys may be more vulnerable to the neurotoxicity than girls in gross motor, and we also discovered the detrimental effects of phthalates on children's neurodevelopment in the 1st and 2nd trimesters. Therefore, the supplementation of appropriate elements in the 1st and 2nd trimesters may help reduce the adverse effects of phthalates on children's neurodevelopment, especially among boys.

Subjects

Environmental Pollutants; Phthalic Acids; Pregnancy; Male; Child; Female; Humans; Cohort Studies; Birth Cohort; China; Phthalic Acids/toxicity; Environmental Exposure/analysis

15.

TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency.

Cheng, Lin; Fang, Pengfei; Liang, Yanjie; Zhang, Liao; Shen, Chunhua; Wang, Hanzi.

IEEE Trans Image Process ; 31: 2529-2540, 2022.

Article in English | MEDLINE | ID: mdl-35275820

ABSTRACT

The explanation for deep neural networks has drawn extensive attention in the deep learning community over the past few years. In this work, we study the visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks. Compared to iteration based saliency methods, single backward pass based saliency methods benefit from faster speed, and they are widely used in downstream visual tasks. Thus, we focus on single backward pass based methods. However, existing methods in this category struggle to successfully produce fine-grained saliency maps concentrating on specific target classes. That said, producing faithful saliency maps satisfying both target-selectiveness and fine-grainedness using a single backward pass is a challenging problem in the field. To mitigate this problem, we revisit the gradient flow inside the network, and find that the entangled semantics and original weights may disturb the propagation of target-relevant saliency. Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps. The proposed TSGB consists of two components, namely, TSGB-Conv and TSGB-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively. Extensive qualitative and quantitative experiments on the ImageNet and Pascal VOC datasets show that the proposed method achieves more accurate and reliable results than the other competitive methods. Code is available at https://github.com/123fxdx/CNNvisualizationTSGB.

Subjects

Attention; Neural Networks, Computer; Semantics

16.

Towards End-to-End Text Spotting in Natural Scenes.

Wang, Peng; Li, Hui; Shen, Chunhua.

IEEE Trans Pattern Anal Mach Intell ; 44(10): 7266-7281, 2022 Oct.

Article in English | MEDLINE | ID: mdl-34242162

ABSTRACT

Text spotting in natural scene images is of great importance for many image understanding tasks. It includes two sub-tasks: text detection and recognition. In this work, we propose a unified network that simultaneously localizes and recognizes text with a single forward pass, avoiding intermediate processes such as image cropping and feature re-calculation, word separation, and character grouping. The overall framework is trained end-to-end and is able to spot text of arbitrary shapes. The convolutional features are calculated only once and shared by both the detection and recognition modules. Through multi-task training, the learned features become more discriminative and improve the overall performance. By employing a 2D attention model in word recognition, the issue of text irregularity is robustly addressed. The attention model provides the spatial location for each character, which not only helps local feature extraction in word recognition, but also indicates an orientation angle to refine text localization. Experiments demonstrate that our proposed method can achieve state-of-the-art performance on several commonly used text spotting benchmarks, including both regular and irregular datasets. Extensive ablation experiments are performed to verify the effectiveness of each module design.

Subjects

Neural Networks, Computer; Algorithms

17.

FCOS: A Simple and Strong Anchor-Free Object Detector.

Tian, Zhi; Shen, Chunhua; Chen, Hao; He, Tong.

IEEE Trans Pattern Anal Mach Intell ; 44(4): 1922-1933, 2022 Apr.

Article in English | MEDLINE | ID: mdl-33074804

ABSTRACT

In computer vision, object detection is one of the most important tasks, which underpins a few instance-level recognition tasks and many downstream applications. Recently one-stage methods have gained much attention over two-stage approaches due to their simpler design and competitive performance. Here we propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to other dense prediction problems such as semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating the intersection over union (IoU) scores during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), we demonstrate a much simpler and more flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: git.io/AdelaiDet.
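Per-pixel, anchor-free detection can be illustrated by the regression target assigned to a single location: instead of offsets relative to an anchor box, the location regresses its distances to the four sides of the box that contains it. The sketch below is a simplified illustration with hypothetical function names; it omits everything else a full detector needs (classification, multi-level assignment, ambiguity handling, NMS).

```python
def ltrb_targets(x, y, box):
    """Distances from location (x, y) to the left, top, right, bottom box sides.

    `box` is (x0, y0, x1, y1). Returns None when the location lies outside
    the box, i.e. it would not be a positive sample for this box.
    """
    x0, y0, x1, y1 = box
    left, top, right, bottom = x - x0, y - y0, x1 - x, y1 - y
    if min(left, top, right, bottom) < 0:
        return None
    return (left, top, right, bottom)

def decode_box(x, y, ltrb):
    """Invert the encoding: recover the box from a location and its distances."""
    left, top, right, bottom = ltrb
    return (x - left, y - top, x + right, y + bottom)
```

For instance, location (5, 4) inside the box (2, 1, 10, 8) gets the target (3, 3, 5, 4), and decoding that target at the same location returns the original box. No anchor shapes or IoU matching are involved in this assignment.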

18.

Association of both prenatal and early childhood multiple metals exposure with neurodevelopment in infant: A prospective cohort study.

Liu, Chaoqun; Huang, Lulu; Huang, Shengzhu; Wei, Luyun; Cao, Dehao; Zan, Gaohui; Tan, Yanli; Wang, Sida; Yang, Minjing; Tian, Long; Tang, Weijun; He, Caitong; Shen, Chunhua; Luo, Bangzhu; Zhu, Maoling; Liang, Tao; Pang, Baohong; Li, Mujun; Mo, Zengnan; Yang, Xiaobo.

Environ Res ; 205: 112450, 2022 Apr 1.

Article in English | MEDLINE | ID: mdl-34861232

ABSTRACT

BACKGROUND: Impaired neurodevelopment of children has become a growing public concern; however, the associations between metals exposure and neurocognitive function have remained largely unknown. OBJECTIVES: We systematically evaluated the associations of multiple metals exposure during pregnancy and childhood on the neurodevelopment of children aged 2-3 years. METHODS: We measured 22 metals in the serum and urine among 703 mother-child pairs from the Guangxi Birth Cohort Study. The neurocognitive development of children was assessed by the Gesell Development Diagnosis Scale (GDDS; Chinese version). Multiple linear regression models were used to evaluate the relationship between the metals (selected by elastic net regression) and the outcomes. The Bayesian kernel machine regression (BKMR) was used to evaluate the possible joint effect between the multiple metal mixture and the outcomes. RESULTS: Prenatal aluminum (Al) exposure was negatively associated with the fine motor developmental quotient (DQ) (β = -1.545, 95% CI: -2.231, -0.859), adaptation DQ (β = -1.182, 95% CI: -1.632, -0.732), language DQ (β = -1.284, 95% CI: -1.758, -0.809), and social DQ (β = -1.729, 95% CI: -2.406, -1.052) in the multi-metal model. Prenatal cadmium (Cd) exposure was negatively associated with gross motor DQ (β = -2.524, 95% CI: -4.060, -0.988), while postpartum Cd exposure was negatively associated with language DQ (β = -1.678, 95% CI: -3.227, -0.129). In stratified analyses, infants of different sexes had different sensitivities to metal exposure, and neurobehavioral development was more significantly affected by metal exposure in the first and second trimester. BKMR analysis revealed a negative joint effect of the Al, Cd, and selenium (Se) on the language DQ score; postpartum Cd exposure played a major role in this relationship.
CONCLUSION: Prenatal exposure to Al, Ba, Cd, molybdenum (Mo), lead (Pb), antimony (Sb), and strontium (Sr), and postpartum exposure to cobalt (Co), Cd, stannum (Sn), iron (Fe), nickel (Ni), and Se are associated with neurological development of infants. The first and second trimester might be the most sensitive period when metal exposure affects neurodevelopment.

Subjects

Metals; Bayes Theorem; Child, Preschool; China; Cohort Studies; Female; Humans; Infant; Metals/toxicity; Pregnancy; Prospective Studies
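A reader can sanity-check the regression estimates quoted in the abstract above with a few lines of arithmetic. The sketch below assumes the reported 95% intervals are symmetric (Wald-type) intervals, which the abstract does not state, and it takes every lower limit as negative, the only reading under which each interval contains its own estimate. The function names `midpoint`, `implied_se`, and `check` are illustrative and not from the paper.

```python
# Reported estimates from the abstract: (label, beta, lower limit, upper limit).
# Assumption: lower limits are negative and the intervals are symmetric.
REPORTED = [
    ("prenatal Al, fine motor DQ", -1.545, -2.231, -0.859),
    ("prenatal Al, adaption DQ", -1.182, -1.632, -0.732),
    ("prenatal Al, language DQ", -1.284, -1.758, -0.809),
    ("prenatal Al, social DQ", -1.729, -2.406, -1.052),
    ("prenatal Cd, gross motor DQ", -2.524, -4.060, -0.988),
    ("postpartum Cd, language DQ", -1.678, -3.227, -0.129),
]

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def midpoint(lower, upper):
    """Centre of an interval; equals the estimate if the interval is symmetric."""
    return (lower + upper) / 2.0


def implied_se(lower, upper, z=Z_95):
    """Standard error implied by a symmetric interval beta +/- z * SE."""
    return (upper - lower) / (2.0 * z)


def check(rows, tol=1e-3):
    """Return the labels whose estimate is NOT the interval midpoint within tol."""
    return [label for label, beta, lo, hi in rows
            if abs(midpoint(lo, hi) - beta) > tol]


if __name__ == "__main__":
    for label, beta, lo, hi in REPORTED:
        print(f"{label}: beta={beta}, midpoint={midpoint(lo, hi):.4f}, "
              f"implied SE={implied_se(lo, hi):.3f}")
    print("inconsistent rows:", check(REPORTED))
```

Every estimate matches its interval midpoint to within rounding, which is consistent with symmetric intervals whose limits are all negative.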

19.

Index Networks.

Lu, Hao; Dai, Yutong; Shen, Chunhua; Xu, Songcen.

IEEE Trans Pattern Anal Mach Intell ; 44(1): 242-255, 2022 Jan.

Article in English | MEDLINE | ID: mdl-32750793

ABSTRACT

We show that existing upsampling operators in convolutional networks can be unified using the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can often recover boundary details considerably better than other upsampling operators such as bilinear interpolation. By viewing the indices as a function of the feature map, we introduce the concept of 'learning to index', and present a novel index-guided encoder-decoder framework where indices are learned adaptively from data and are used to guide downsampling and upsampling stages, without extra training supervision. At the core of this framework is a new learnable module, termed Index Network (IndexNet), which dynamically generates indices conditioned on the feature map. IndexNet can be used as a plug-in, applicable to almost all convolutional networks that have coupled downsampling and upsampling stages, enabling the networks to dynamically capture variations of local patterns. In particular, we instantiate and investigate five families of IndexNet. We highlight their superiority in delivering spatial information over other upsampling operators with experiments on synthetic data, and demonstrate their effectiveness on four dense prediction tasks, including image matting, image denoising, semantic segmentation, and monocular depth estimation. Code and models are available at https://git.io/IndexNet.
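The indices-guided unpooling that the abstract above cites as its starting point can be illustrated in plain Python. This is a minimal sketch of classic max pooling with stored indices and the matching unpooling step, not the authors' IndexNet, which learns its indices from the feature map; the function names and the 2x2 window are assumptions made for illustration.

```python
def max_pool_with_indices(x, k=2):
    """k x k max pooling over a 2D list.

    Returns the pooled map and, for each window, the row-major offset
    (0 .. k*k-1) of the maximum inside that window.
    """
    h, w = len(x), len(x[0])
    if h % k or w % k:
        raise ValueError("height and width must be multiples of k")
    pooled, indices = [], []
    for i in range(0, h, k):
        pooled_row, index_row = [], []
        for j in range(0, w, k):
            # (value, offset) pairs for every cell in the current window
            cells = [(x[i + a][j + b], a * k + b)
                     for a in range(k) for b in range(k)]
            value, offset = max(cells, key=lambda c: c[0])
            pooled_row.append(value)
            index_row.append(offset)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_with_indices(y, indices, k=2):
    """Upsample by k, writing each value back to its recorded position.

    All other positions are filled with zeros.
    """
    ph, pw = len(y), len(y[0])
    out = [[0.0] * (pw * k) for _ in range(ph * k)]
    for i in range(ph):
        for j in range(pw):
            a, b = divmod(indices[i][j], k)
            out[i * k + a][j * k + b] = y[i][j]
    return out
```

Because the indices record where each maximum came from, the upsampled map places values at their original locations, which is why this operator tends to preserve boundary detail better than bilinear interpolation.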

20.

Improving Generative Adversarial Networks With Local Coordinate Coding.

Cao, Jiezhang; Guo, Yong; Wu, Qingyao; Shen, Chunhua; Huang, Junzhou; Tan, Mingkui.

IEEE Trans Pattern Anal Mach Intell ; 44(1): 211-227, 2022 Jan.

Article in English | MEDLINE | ID: mdl-32750833

ABSTRACT

Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution (e.g., Gaussian noise). However, such a prior distribution is often independent of real data and thus may lose semantic information (e.g., geometric structure or content in images) of the data. In practice, the semantic information might be represented by some latent distribution learned from data. However, such a latent distribution may incur difficulties in data sampling for GAN methods. In this paper, rather than sampling from the predefined prior distribution, we propose a GAN model with local coordinate coding (LCC), termed LCCGAN, to improve the performance of image generation. First, we propose an LCC sampling method in LCCGAN to sample meaningful points from the latent manifold. With the LCC sampling method, we can explicitly exploit the local information on the latent manifold and thus produce new data with promising quality. Second, we propose an improved version, namely LCCGAN++, by introducing a higher-order term in the generator approximation. This term is able to achieve better approximation and thus further improve the performance. More critically, we derive the generalization bound for both LCCGAN and LCCGAN++ and prove that a low-dimensional input is sufficient to achieve good generalization performance. Extensive experiments on several benchmark datasets demonstrate the superiority of the proposed method over existing GAN methods.
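The general idea of sampling from local coordinates, as opposed to a fixed global prior, can be sketched as follows. This is not the authors' LCC sampling procedure, whose details the abstract does not give; it is a generic illustration that assumes a set of anchor points on the latent manifold is already available and draws a new point as a random convex combination of one anchor's nearest neighbours. The names `nearest_anchors` and `sample_local_point`, and the choice of uniform random weights, are hypothetical.

```python
import math
import random


def nearest_anchors(anchors, center, k):
    """Indices of the k anchors closest to anchors[center] (center included)."""
    c = anchors[center]
    order = sorted(range(len(anchors)), key=lambda i: math.dist(anchors[i], c))
    return order[:k]


def sample_local_point(anchors, k=3, rng=random):
    """Draw one point as a convex combination of k neighbouring anchors.

    Assumption: anchors is a non-empty list of equal-length coordinate tuples.
    """
    center = rng.randrange(len(anchors))
    neighbors = nearest_anchors(anchors, center, k)
    # Positive random weights, normalised to sum to 1 (a convex combination).
    raw = [rng.random() + 1e-12 for _ in neighbors]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(anchors[0])
    point = [sum(w * anchors[i][d] for w, i in zip(weights, neighbors))
             for d in range(dim)]
    return point, neighbors, weights
```

Because each sample mixes only nearby anchors, it stays inside the local patch those anchors span, whereas a draw from a global Gaussian carries no information about where the data actually lie.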
