Results 1 - 20 of 90
1.
New Phytol ; 238(2): 904-915, 2023 04.
Article in English | MEDLINE | ID: mdl-36683442

ABSTRACT

Using microscopy to investigate stomatal behaviour is common in plant physiology research. Manual inspection and measurement of stomatal pore features is low throughput, relies upon expert knowledge to record stomatal features accurately, requires significant researcher time and investment, and can represent a significant bottleneck in research pipelines. To alleviate this, we introduce StomaAI (SAI): a reliable, user-friendly and adaptable tool for stomatal pore and density measurements via the application of deep computer vision, which has been initially calibrated and deployed for the model plant Arabidopsis (dicot) and the crop plant barley (monocot grass). SAI is capable of producing measurements consistent with human experts and successfully reproduced the conclusions of published datasets. SAI boosts the number of images that can be evaluated in a fraction of the time, and so can obtain a more accurate representation of stomatal traits than is routine through manual measurement. An online demonstration of SAI is hosted at https://sai.aiml.team, and the full local application is publicly available for free on GitHub at https://github.com/xdynames/sai-app.


Subjects
Arabidopsis, Humans, Phenotype, Computers, Plant Stomata/physiology
2.
Environ Sci Technol ; 56(22): 16428-16440, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36301735

ABSTRACT

Increasing CO2 emissions have resulted in pressing climate and environmental issues. While the abiotic and biotic processes mediating the fate of CO2 have been studied separately, their interactions and combined effects remain poorly understood. To explore this knowledge gap, an iron-reducing organism, Orenia metallireducens, was cultured under 18 conditions that systematically varied in headspace CO2 concentration, ferric oxide loading, and dolomite (CaMg(CO3)2) availability. The results showed that abiotic and biotic processes interactively mediate CO2 acidification and sequestration through "chain reactions", with pH being the dominant variable. Specifically, dolomite alleviated CO2 stress on microbial activity, possibly via pH control that transforms the inhibitory CO2 into the more benign bicarbonate species. Microbial iron reduction further impacted pH via the competition between proton (H+) consumption during iron reduction and H+ generation from oxidation of the organic substrate. Under Fe(III)-rich conditions, microbial iron reduction increased pH, driving dissolved CO2 to form bicarbonate. Spectroscopic and microscopic analyses showed enhanced formation of siderite (FeCO3) under elevated CO2, supporting its incorporation into solids. The results of these CO2-microbe-mineral experiments provide insights into the synergistic abiotic and biotic processes that alleviate CO2 acidification and favor its sequestration, which can be instructive for practical applications (e.g., acidification remediation, CO2 sequestration, and modeling of carbon flux).
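The pH-mediated shift described above (inhibitory dissolved CO2 converting into the more benign bicarbonate) follows from the first carbonic acid dissociation equilibrium. As a minimal illustrative sketch (not part of the study's methods; pKa1 ≈ 6.35 at 25 °C is a textbook value, and carbonate is ignored, which is reasonable below pH ~9), the bicarbonate fraction can be estimated with the Henderson-Hasselbalch relation:

```python
def bicarbonate_fraction(pH, pKa1=6.35):
    """Fraction of dissolved inorganic carbon present as HCO3-
    (vs. dissolved CO2/H2CO3*), from the first carbonic acid
    equilibrium via Henderson-Hasselbalch."""
    ratio = 10 ** (pH - pKa1)          # [HCO3-]/[CO2(aq)]
    return ratio / (1 + ratio)

# Raising pH by ~1 unit moves most dissolved CO2 into bicarbonate.
for pH in (5.5, 6.35, 7.5):
    print(f"pH {pH}: {bicarbonate_fraction(pH):.0%} as bicarbonate")
```

At the pKa the two species are equal; about one pH unit higher, bicarbonate dominates, consistent with the buffering role attributed to dolomite above.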


Subjects
Ferric Compounds, Iron, Ferric Compounds/chemistry, Iron/chemistry, Carbon Dioxide, Bicarbonates, Carbonates/chemistry, Minerals, Oxidation-Reduction
3.
Environ Res ; 205: 112450, 2022 04 01.
Article in English | MEDLINE | ID: mdl-34861232

ABSTRACT

BACKGROUND: Impaired neurodevelopment of children has become a growing public concern; however, the associations between metal exposure and neurocognitive function remain largely unknown. OBJECTIVES: We systematically evaluated the associations of exposure to multiple metals during pregnancy and childhood with the neurodevelopment of children aged 2-3 years. METHODS: We measured 22 metals in the serum and urine of 703 mother-child pairs from the Guangxi Birth Cohort Study. The neurocognitive development of children was assessed by the Gesell Development Diagnosis Scale (GDDS; Chinese version). Multiple linear regression models were used to evaluate the relationship between the metals (selected by elastic net regression) and the outcomes. Bayesian kernel machine regression (BKMR) was used to evaluate the possible joint effect of the multiple-metal mixture on the outcomes. RESULTS: Prenatal aluminum (Al) exposure was negatively associated with the fine motor developmental quotient (DQ) (β = -1.545, 95% CI: -2.231, -0.859), adaptive DQ (β = -1.182, 95% CI: -1.632, -0.732), language DQ (β = -1.284, 95% CI: -1.758, -0.809), and social DQ (β = -1.729, 95% CI: -2.406, -1.052) in the multi-metal model. Prenatal cadmium (Cd) exposure was negatively associated with gross motor DQ (β = -2.524, 95% CI: -4.060, -0.988), while postpartum Cd exposure was negatively associated with language DQ (β = -1.678, 95% CI: -3.227, -0.129). In stratified analyses, infants of different sexes had different sensitivities to metal exposure, and neurobehavioral development was more significantly affected by metal exposure in the first and second trimesters. BKMR analysis revealed a negative joint effect of Al, Cd, and selenium (Se) on the language DQ score; postpartum Cd exposure played a major role in this relationship.
CONCLUSION: Prenatal exposure to Al, barium (Ba), Cd, molybdenum (Mo), lead (Pb), antimony (Sb), and strontium (Sr), and postpartum exposure to cobalt (Co), Cd, tin (Sn), iron (Fe), nickel (Ni), and Se are associated with the neurological development of infants. The first and second trimesters might be the most sensitive period in which metal exposure affects neurodevelopment.
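The elastic net step used above to pre-select metals before the regression models can be sketched generically. This is an illustration with simulated exposures, not the study's data or code; the penalty weights (`alpha`, `l1_ratio`) and the proximal-gradient solver are assumptions:

```python
import numpy as np

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, lr=0.01, n_iter=5000):
    """Minimal elastic net via proximal gradient descent:
    minimize 0.5/n * ||y - Xb||^2 + alpha*(l1_ratio*||b||_1
    + 0.5*(1 - l1_ratio)*||b||_2^2). Coefficients shrunk exactly
    to zero drop the corresponding predictor from later models."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ b) / n + alpha * (1 - l1_ratio) * b
        b = b - lr * grad
        thr = lr * alpha * l1_ratio        # soft-threshold (L1 proximal step)
        b = np.sign(b) * np.maximum(np.abs(b) - thr, 0.0)
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))              # 8 hypothetical metal exposures
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=300)
coef = elastic_net(X, y)
selected = np.flatnonzero(np.abs(coef) > 1e-3)
print("selected predictors:", selected)    # indices with non-negligible coefficients
```

The L1 part of the penalty performs the variable selection; the L2 part stabilizes estimates when exposures are correlated, which is the usual rationale for elastic net over plain lasso in exposure mixtures.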


Subjects
Metals, Bayes Theorem, Child Preschool, China, Cohort Studies, Female, Humans, Infant, Metals/toxicity, Pregnancy, Prospective Studies
4.
Article in English | MEDLINE | ID: mdl-38743546

ABSTRACT

In this article, we investigate self-supervised 3D scene flow estimation and class-agnostic motion prediction on point clouds. A realistic scene can be well modeled as a collection of rigidly moving parts, therefore its scene flow can be represented as a combination of rigid motion of these individual parts. Building upon this observation, we propose to generate pseudo scene flow labels for self-supervised learning through piecewise rigid motion estimation, in which the source point cloud is decomposed into local regions and each region is treated as rigid. By rigidly aligning each region with its potential counterpart in the target point cloud, we obtain a region-specific rigid transformation to generate its pseudo flow labels. To mitigate the impact of potential outliers on label generation, when solving the rigid registration for each region, we alternately perform three steps: establishing point correspondences, measuring the confidence for the correspondences, and updating the rigid transformation based on the correspondences and their confidence. As a result, confident correspondences will dominate label generation, and a validity mask will be derived for the generated pseudo labels. By using the pseudo labels together with their validity mask for supervision, models can be trained in a self-supervised manner. Extensive experiments on FlyingThings3D and KITTI datasets demonstrate that our method achieves new state-of-the-art performance in self-supervised scene flow learning, without any ground truth scene flow for supervision, even performing better than some supervised counterparts. Additionally, our method is further extended to class-agnostic motion prediction and significantly outperforms previous state-of-the-art self-supervised methods on nuScenes dataset.
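For one region with given correspondences, the alternating correspondence-confidence-transformation scheme can be sketched as a confidence-weighted Kabsch solve. The Cauchy-style confidence function and all constants below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def weighted_rigid_align(src, tgt, n_iters=5, sigma=0.5):
    """Alternate (1) residual-based confidence estimation and
    (2) confidence-weighted Kabsch solve for the rigid (R, t)
    aligning matched source/target points of one region."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(n_iters):
        resid = np.linalg.norm(src @ R.T + t - tgt, axis=1)
        w = 1.0 / (1.0 + (resid / sigma) ** 2)   # low confidence for outliers
        wn = w / w.sum()
        mu_s, mu_t = wn @ src, wn @ tgt
        H = (src - mu_s).T @ ((tgt - mu_t) * wn[:, None])
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_t - R @ mu_s
    return R, t, w

# pseudo flow labels for the region: flow = src @ R.T + t - src
```

Outlier correspondences receive near-zero confidence, so they barely affect the transformation, and the final confidences can be thresholded into the validity mask mentioned above.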

5.
Article in English | MEDLINE | ID: mdl-39150798

ABSTRACT

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. State-of-the-art (SoTA) monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity across various camera models and in large-scale data training. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. For surface normal estimation, we propose a joint depth-normal optimization module to distill diverse data knowledge from metric depth, enabling normal estimators to learn beyond normal labels. Equipped with these modules, our depth-normal models can be stably trained with over 16 million images from thousands of camera models with different types of annotations, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Our method currently ranks 1st on various zero-shot and non-zero-shot benchmarks for metric depth, affine-invariant depth, and surface normal prediction, as shown in Fig. 1. Notably, we surpassed the recent MarigoldDepth and DepthAnything on various depth benchmarks, including NYUv2 and KITTI. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology.
The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular SLAM (Fig. 3), leading to high-quality metric-scale dense mapping. These applications highlight the versatility of Metric3D v2 models as geometric foundation models. Our project page is at https://JUGGHM.github.io/Metric3Dv2.
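The canonical camera space idea can be illustrated with the following sketch. The module in the paper operates on network inputs and labels; here we only show the assumed core rescaling, where depth is made comparable across cameras by the ratio of a canonical focal length to the actual one (the function names and the 1000 px canonical focal length are hypothetical):

```python
def to_canonical_depth(depth_m, focal_px, canonical_focal_px=1000.0):
    """Rescale metric depth into a canonical camera space: a scene
    imaged with focal length f is mapped by f_c / f, so the model
    sees a consistent depth-to-pixel-size relationship regardless
    of the capturing camera."""
    return depth_m * canonical_focal_px / focal_px

def from_canonical_depth(depth_c, focal_px, canonical_focal_px=1000.0):
    """Invert the transform at inference time to recover metric depth
    for the actual camera."""
    return depth_c * focal_px / canonical_focal_px
```

The round trip is exact, which is what allows a single model trained in canonical space to emit metric depth for any camera whose focal length is known.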

6.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14905-14919, 2023 12.
Article in English | MEDLINE | ID: mdl-37672381

ABSTRACT

Medical image benchmarks for the segmentation of organs and tumors suffer from a partial-labeling issue due to the intensive cost of labor and expertise. Current mainstream approaches follow the practice of one network solving one task. With this pipeline, not only is the performance limited by the typically small dataset of a single task, but the computation cost also increases linearly with the number of tasks. To address this, we propose a Transformer-based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets. Specifically, TransDoDNet has a hybrid backbone that is composed of a convolutional neural network and a Transformer. A dynamic head enables the network to accomplish multiple segmentation tasks flexibly. Unlike existing approaches that fix kernels after training, the kernels in the dynamic head are generated adaptively by the Transformer, which employs the self-attention mechanism to model long-range organ-wise dependencies and decodes the organ embedding that can represent each organ. We create a large-scale partially labeled Multi-Organ and Tumor Segmentation benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors on seven organ and tumor segmentation tasks. This study also provides a general 3D medical image segmentation model, which has been pre-trained on the large-scale MOTS benchmark and has demonstrated advanced performance over current predominant self-supervised learning methods.


Subjects
Algorithms, Neoplasms, Humans, Neoplasms/diagnostic imaging, Benchmarking, Neural Networks (Computer), Image Processing (Computer-Assisted)
7.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7035-7049, 2023 Jun.
Article in English | MEDLINE | ID: mdl-32750784

ABSTRACT

In this work, we consider transferring the structure information from large networks to compact ones for dense prediction tasks in computer vision. Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance. Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem. Specifically, we study two structured distillation schemes: i) pair-wise distillation that distills the pair-wise similarities by building a static graph; and ii) holistic distillation that uses adversarial training to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection. Code is available at https://git.io/StructKD.
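A pair-wise distillation term of the kind described (a static similarity graph over spatial positions, matched between student and teacher) might be sketched as follows; the cosine-similarity graph and the mean-squared penalty are assumed details:

```python
import numpy as np

def pairwise_distillation_loss(feat_s, feat_t):
    """Build cosine-similarity graphs over spatial positions for
    student and teacher feature maps (each C x N, N spatial nodes)
    and penalize their squared difference, so the student mimics
    the teacher's relational structure rather than per-pixel values."""
    def sim_graph(f):
        f = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
        return f.T @ f                     # (N, N) cosine similarities
    a_s, a_t = sim_graph(feat_s), sim_graph(feat_t)
    return np.mean((a_s - a_t) ** 2)
```

Because the graph is built from normalized features, the loss is invariant to uniform rescaling of the student's activations; it transfers structure, which is the stated motivation for moving beyond per-pixel distillation.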

8.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 669-680, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35077358

ABSTRACT

We propose a simple yet effective framework for instance and panoptic segmentation, termed CondInst (conditional convolutions for instance and panoptic segmentation). In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance. In contrast, we propose to attend to the instances with dynamic conditional convolutions. Instead of using instance-wise ROIs as inputs to an instance mask head with fixed weights, we design dynamic instance-aware mask heads, conditioned on the instances to be predicted. CondInst enjoys three advantages: 1) Instance and panoptic segmentation are unified into a fully convolutional network, eliminating the need for ROI cropping and feature alignment. 2) The elimination of ROI cropping also significantly improves the output instance mask resolution. 3) Due to the much improved capacity of dynamically generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each having only 8 channels), leading to significantly faster inference time per instance and making the overall inference time less dependent on the number of instances. We demonstrate a simpler method that can achieve improved accuracy and inference speed on both instance and panoptic segmentation tasks. On the COCO dataset, we outperform a few state-of-the-art methods. We hope that CondInst can be a strong baseline for instance and panoptic segmentation. Code is available at: https://git.io/AdelaiDet.
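The dynamic mask head can be sketched as three 1×1 convolutions whose weights are sliced from a per-instance parameter vector instead of being fixed at training time. The shapes and parameter layout below are assumptions for illustration (with 8 input channels this layout gives 153 parameters per instance):

```python
import numpy as np

def dynamic_mask_head(features, params, channels=8):
    """Apply a compact instance-conditioned mask head: three 1x1
    conv layers (ReLU between them, sigmoid at the end) whose
    weights and biases come from one instance's generated `params`.
    `features` is (C_in, H, W); returns an (H, W) mask probability."""
    c_in = features.shape[0]
    sizes = [(channels, c_in), (channels, channels), (1, channels)]
    x = features.reshape(c_in, -1)
    offset = 0
    for i, (co, ci) in enumerate(sizes):
        w = params[offset:offset + co * ci].reshape(co, ci)
        offset += co * ci
        b = params[offset:offset + co]
        offset += co
        x = w @ x + b[:, None]
        if i < len(sizes) - 1:
            x = np.maximum(x, 0.0)         # ReLU on hidden layers
    x = 1.0 / (1.0 + np.exp(-x))           # sigmoid -> mask probability
    return x.reshape(features.shape[1:])

# per-instance parameter count for C_in=8: (8*8+8) + (8*8+8) + (1*8+1) = 153
```

The head is tiny because its capacity comes from being instance-specific; a different `params` vector is generated for every detected instance.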

9.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5697-5711, 2023 May.
Article in English | MEDLINE | ID: mdl-36279351

ABSTRACT

In this paper, we come up with a simple yet effective approach for instance segmentation on 3D point clouds with strong robustness. Previous top-performing methods for this task adopt a bottom-up strategy, which often involves various inefficient operations or complex pipelines, such as grouping over-segmented components, introducing heuristic post-processing steps, and designing complex loss functions. As a result, the inevitable variations in instance sizes make these methods vulnerable and sensitive to the values of pre-defined hyper-parameters. To this end, we instead propose a novel pipeline that applies dynamic convolution to generate instance-aware parameters in response to the characteristics of the instances. The representation capability of the parameters is greatly improved by gathering homogeneous points that have identical semantic categories and close votes for the geometric centroids. Instances are then decoded via several simple convolution layers, where the parameters are generated depending on the input. In addition, to introduce a large context while maintaining limited computational overhead, a light-weight transformer is built upon the bottleneck layer to capture long-range dependencies. With non-maximum suppression (NMS) as the only post-processing step, we demonstrate a simpler and more robust approach that achieves promising performance on various datasets: ScanNetV2, S3DIS, and PartNet. The consistent improvements on both voxel- and point-based architectures imply the effectiveness of the proposed method. Code is available at: https://git.io/DyCo3D.

10.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5632-5648, 2023 May.
Article in English | MEDLINE | ID: mdl-36288227

ABSTRACT

In this work, we develop methods for few-shot image classification from a new perspective of optimal matching between image regions. We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance. The EMD generates the optimal matching flows between structural elements that have the minimum matching cost, which is used to calculate the image distance for classification. To generate the important weights of elements in the EMD formulation, we design a cross-reference mechanism, which can effectively alleviate the adverse impact caused by the cluttered background and large intra-class appearance variations. To implement k-shot classification, we propose to learn a structured fully connected layer that can directly classify dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. Our extensive experiments validate the effectiveness of our algorithm which outperforms state-of-the-art methods by a significant margin on five widely used few-shot classification benchmarks, namely, miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB), and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our method on the image retrieval task in our experiments.
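In the special case of equal element counts and uniform weights, the optimal-matching idea behind the EMD reduces to an assignment problem. The toy sketch below brute-forces that case for small N; the actual method uses learned cross-reference weights and an efficient solver, neither of which is reproduced here:

```python
import numpy as np
from itertools import permutations

def emd_uniform(regions_a, regions_b):
    """Earth Mover's Distance between two equal-size sets of region
    embeddings under uniform weights: the minimum-cost one-to-one
    matching (found by brute force over permutations), normalized
    by the number of regions."""
    n = len(regions_a)
    cost = np.linalg.norm(regions_a[:, None, :] - regions_b[None, :, :], axis=2)
    best = min(sum(cost[i, p[i]] for i in range(n))
               for p in permutations(range(n)))
    return best / n

# image distance = EMD between dense region embeddings; smaller = more similar
```

The minimum-cost flow makes the distance robust to spatial rearrangement of matching parts, which is the property exploited for comparing support and query images region by region.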

11.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12459-12473, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37167046

ABSTRACT

Network pruning and quantization have proven to be effective ways to compress deep models. To obtain a highly compact model, most methods first perform network pruning and then conduct quantization based on the pruned model. However, this strategy ignores the fact that pruning and quantization affect each other, so performing them separately may lead to sub-optimal performance. To address this, performing pruning and quantization jointly is essential. Nevertheless, how to make a trade-off between pruning and quantization is non-trivial. Moreover, existing compression methods often rely on pre-defined compression configurations (i.e., pruning rates or bitwidths). Some attempts have been made to search for optimal configurations, which, however, may incur an unbearable optimization cost. To address these issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS) for automatic loss-aware model compression. To this end, we consider network pruning as a special case of quantization and provide a unified view of model pruning and quantization. We then introduce a single-path model to encode all candidate compression configurations, where a high bitwidth value is decomposed into the sum of a lowest bitwidth value and a series of re-assignment offsets. Relying on the single-path model, we introduce learnable binary gates to encode the choice of configurations and learn the binary gates and model parameters jointly. More importantly, the configuration search problem can be transformed into a subset selection problem, which helps to significantly reduce the optimization difficulty and computation cost. In this way, the compression configurations of each layer and the trade-off between pruning and quantization can be automatically determined. Extensive experiments on CIFAR-100 and ImageNet show that SBS significantly reduces computation cost while achieving promising performance.
For example, our SBS compressed MobileNetV2 achieves 22.6× Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1 accuracy.
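The bitwidth decomposition can be sketched as a lowest base bitwidth plus gated offsets, where a later gate is reachable only if the previous one is open (the "single path"). The gate and offset values here are illustrative assumptions, as is the toy uniform quantizer used to show the effect:

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x in [-1, 1] at `bits` bits."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x, -1, 1) * levels) / levels

def single_path_bitwidth(base_bits, offset_gates, offsets=(2, 2)):
    """Effective bitwidth = lowest base + gated re-assignment offsets.
    A gate contributes only if every earlier gate is open, so the
    candidate configurations form a single path, e.g. 4 -> 6 -> 8."""
    bits = base_bits
    for g, off in zip(offset_gates, offsets):
        if not g:
            break                  # single path: later offsets unreachable
        bits += off
    return bits

w = np.linspace(-1, 1, 9)
for gates in [(0, 0), (1, 0), (1, 1)]:
    b = single_path_bitwidth(4, gates)
    err = np.max(np.abs(quantize(w, b) - w))
    print(f"gates={gates} -> {b} bits, max error {err:.4f}")
```

With binary gates relaxed to learnable variables, choosing a configuration becomes choosing which prefix of offsets to keep, which is the subset-selection view mentioned in the abstract (pruning corresponds to the 0-bit end of the same path).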

12.
Food Chem ; 412: 135543, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-36724717

ABSTRACT

Furan compounds actively contribute to the characteristics of brandy. Herein, we have attempted to identify and quantify the furan compounds present in brandy using three different extraction methods combined with comprehensive two-dimensional gas chromatography and time-of-flight mass spectrometry. Threshold determination and omission experiments were carried out to verify their organoleptic contribution. Liquid-liquid extraction using dichloromethane was found to be the optimal extraction method. A total of 21 furan compounds were identified, of which 5 were detected in brandy for the first time. Our quantitative results showed a positive correlation between furan compound content and aging time. Among them, ethyl 5-oxotetrahydro-2-furancarboxylate exhibited a very high odor activity value (1.64 < OAV < 179.53) and a smoky aroma. Omission tests showed that the three furan compounds with an OAV > 1 made a significant difference to brandy. These findings bring a new perspective to the sensory and chemical characteristics of brandy.


Subjects
Alcoholic Beverages, Volatile Organic Compounds, Gas Chromatography-Mass Spectrometry/methods, Alcoholic Beverages/analysis, Odorants/analysis, Sensation, Furans/analysis, Volatile Organic Compounds/analysis, Olfactometry/methods
13.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6480-6494, 2023 May.
Article in English | MEDLINE | ID: mdl-36197868

ABSTRACT

Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing relative depth estimation models often fail to recover accurate 3D scene shapes due to the unknown depth shift caused by training with the relative depth data. We tackle this problem here and attempt to estimate accurate scene shapes by training on large-scale relative depth data, and estimating the depth shift. To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. As the two modules are trained separately, we do not need strictly paired training data. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to improve training with relative depth annotation. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation. Code is available at: https://github.com/aim-uofa/depth/.
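The residual ambiguity after relative-depth training is exactly a scale and a shift. As an illustrative sketch (the paper's point-cloud module predicts the shift and focal length from 3D priors rather than fitting to reference depth), the two unknowns can be recovered in closed form by least squares whenever any metric reference is available:

```python
import numpy as np

def align_scale_shift(d_pred, d_ref):
    """Least-squares recovery of the scale s and shift t mapping an
    affine-invariant depth prediction onto metric depth:
    minimize ||s * d_pred + t - d_ref||^2 over (s, t)."""
    A = np.stack([d_pred.ravel(), np.ones(d_pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_ref.ravel(), rcond=None)
    return s, t

rng = np.random.default_rng(0)
d_true = rng.uniform(1.0, 10.0, size=(64, 64))
d_pred = (d_true - 2.0) / 3.0            # scale/shift-ambiguous prediction
s, t = align_scale_shift(d_pred, d_true)
print(round(s, 3), round(t, 3))          # recovers s=3, t=2
```

That these two parameters suffice motivates the two-stage design: the depth network only needs to be correct up to this affine transform, and a separate module resolves the remaining degrees of freedom.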

14.
Foods ; 13(1)2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38201053

ABSTRACT

This work aimed to compare the aroma characteristics of representative brandies of different grades from Yantai (one of the Chinese core production areas) and Cognac, and to establish relationships between sensory descriptors and chemical composition. Descriptive analysis was performed with a trained panel to obtain the sensory profiles. Forty-three aroma-active compounds were quantified by four different methodologies. A prediction model based on partial least squares analysis was built to identify candidate compounds unique to a certain group of brandies. The results showed that brandies from Yantai could be distinguished from Cognac brandies on the basis of spicy, dried fruit, floral, and fruity aromas, which were associated with an aromatic balance between the concentrations of a set of compounds such as 5-methylfurfural, γ-nonalactone, and γ-dodecalactone. Meanwhile, brandies of different grades could be distinguished on the basis of compounds derived mostly during the aging process.

15.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15665-15679, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37669204

ABSTRACT

End-to-end scene text spotting has made significant progress due to the intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than single-point annotations. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using single-point annotations. SPTS v2 retains the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) that sequentially predicts the center points of all text instances within the same sequence, while a Parallel Recognition Decoder (PRD) performs text recognition in parallel, significantly reducing the required sequence length. The two decoders share the same parameters and are interactively connected with a simple but effective information-transmission process to pass gradients and information. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving a 19× faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms.

16.
IEEE Trans Pattern Anal Mach Intell ; 44(4): 1922-1933, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33074804

ABSTRACT

In computer vision, object detection is one of the most important tasks, underpinning a few instance-level recognition tasks and many downstream applications. Recently, one-stage methods have gained much attention over two-stage approaches due to their simpler design and competitive performance. Here we propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to other dense prediction problems such as semantic segmentation. Almost all state-of-the-art object detectors, such as RetinaNet, SSD, YOLOv3, and Faster R-CNN, rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor-box free, as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes, such as calculating the intersection-over-union (IoU) scores during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, to which the final detection performance is often sensitive. With non-maximum suppression (NMS) as the only post-processing step, we demonstrate a much simpler and more flexible detection framework that achieves improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: git.io/AdelaiDet.
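The per-pixel prediction targets of FCOS can be sketched as follows: each location inside a ground-truth box regresses its four distances to the box sides, and a centerness score downweights off-center locations. The centerness formula follows the FCOS definition; the helper itself is an illustrative reconstruction, and its centerness values are only meaningful for inside locations:

```python
import numpy as np

def fcos_targets(locations, box):
    """For each (x, y) location and a ground-truth box (x1, y1, x2, y2),
    compute the regression targets (l, t, r, b), an inside-box mask,
    and centerness = sqrt(min(l,r)/max(l,r) * min(t,b)/max(t,b))."""
    x, y = locations[:, 0], locations[:, 1]
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    reg = np.stack([l, t, r, b], axis=1)
    inside = reg.min(axis=1) > 0           # only inside locations are positives
    lr = np.sort(np.stack([l, r]), axis=0)
    tb = np.sort(np.stack([t, b]), axis=0)
    centerness = np.sqrt((lr[0] / lr[1]) * (tb[0] / tb[1]))
    return reg, inside, centerness
```

Centerness is 1 at the box center and decays toward the borders, which suppresses the low-quality boxes that far-from-center locations would otherwise predict.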

17.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 242-255, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750793

ABSTRACT

We show that existing upsampling operators in convolutional networks can be unified using the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can often recover boundary details considerably better than other upsampling operators such as bilinear interpolation. By viewing the indices as a function of the feature map, we introduce the concept of 'learning to index', and present a novel index-guided encoder-decoder framework where indices are learned adaptively from data and are used to guide downsampling and upsampling stages, without extra training supervision. At the core of this framework is a new learnable module, termed Index Network (IndexNet), which dynamically generates indices conditioned on the feature map. IndexNet can be used as a plug-in, applicable to almost all convolutional networks that have coupled downsampling and upsampling stages, enabling the networks to dynamically capture variations of local patterns. In particular, we instantiate and investigate five families of IndexNet. We highlight their superiority in delivering spatial information over other upsampling operators with experiments on synthetic data, and demonstrate their effectiveness on four dense prediction tasks, including image matting, image denoising, semantic segmentation, and monocular depth estimation. Code and models are available at https://git.io/IndexNet.
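The indices-guided unpooling that motivates IndexNet can be sketched with plain max pooling, where the argmax indices recorded during downsampling place values back at their exact original positions during upsampling (here the indices are computed, not learned as in IndexNet):

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling over an (H, W) map that also records the
    argmax index within each 2x2 block."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    idx = blocks.argmax(axis=1)
    pooled = blocks.max(axis=1).reshape(h // 2, w // 2)
    return pooled, idx

def unpool(pooled, idx):
    """Indices-guided unpooling: each value returns to its argmax
    position, zeros elsewhere, preserving boundary detail that
    bilinear upsampling would smear."""
    h, w = pooled.shape
    blocks = np.zeros((h * w, 4))
    blocks[np.arange(h * w), idx] = pooled.ravel()
    return blocks.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(h * 2, w * 2)
```

Viewing `idx` as a function of the feature map, as the abstract does, is what turns this fixed recipe into a learnable module: IndexNet predicts the indices instead of taking the argmax.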

18.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 7266-7281, 2022 10.
Article in English | MEDLINE | ID: mdl-34242162

ABSTRACT

Text spotting in natural scene images is of great importance for many image understanding tasks. It includes two sub-tasks: text detection and recognition. In this work, we propose a unified network that simultaneously localizes and recognizes text with a single forward pass, avoiding intermediate processes such as image cropping and feature re-calculation, word separation, and character grouping. The overall framework is trained end-to-end and is able to spot text of arbitrary shapes. The convolutional features are calculated only once and shared by both the detection and recognition modules. Through multi-task training, the learned features become more discriminative and improve the overall performance. By employing a 2D attention model in word recognition, the issue of text irregularity is robustly addressed. The attention model provides the spatial location for each character, which not only helps local feature extraction in word recognition, but also indicates an orientation angle to refine text localization. Experiments demonstrate that our proposed method can achieve state-of-the-art performance on several commonly used text spotting benchmarks, including both regular and irregular datasets. Extensive ablation experiments are performed to verify the effectiveness of each module design.


Subjects
Neural Networks (Computer), Algorithms
19.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 7282-7295, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34270413

ABSTRACT

Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress on evaluation metrics such as the pixel-wise relative error, most methods neglect the geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces a simple geometric constraint, namely, virtual normal directions determined by three randomly sampled points in the reconstructed 3D space, we significantly improve the accuracy and robustness of monocular depth estimation. Importantly, the virtual normal loss can not only improve the performance of learning metric depth, but also disentangle the scale information and enrich the model with better shape information. Therefore, when not having access to absolute metric depth training data, we can use virtual normals to learn a robust affine-invariant depth model on diverse scenes. Our experiments demonstrate state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI. From the high-quality predicted depth, we are now able to recover good 3D structures of the scene, such as the point cloud and surface normals, directly, eliminating the necessity of relying on additional models as was previously done. To demonstrate the excellent generalization capability of learning affine-invariant depth on diverse data with the virtual normal loss, we construct a large-scale and diverse dataset for training affine-invariant depth, termed the Diverse Scene Depth dataset (DiverseDepth), and test on five datasets under the zero-shot test setting. Code is available at: https://git.io/Depth.
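A sketch of the virtual normal constraint: sample point triplets, form the unit normal of each virtual plane by a cross product, and compare the normals from the predicted and ground-truth reconstructions. The sampling details and the L1 reduction below are assumptions:

```python
import numpy as np

def virtual_normal_loss(pts_pred, pts_gt, n_samples=1000, seed=0):
    """Draw random triplets of 3D points, compute the unit normal of
    each virtual plane via a cross product, and take the mean L1
    difference between normals from the predicted and ground-truth
    point clouds. Penalizes high-order geometric errors that
    per-pixel depth losses miss."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pts_pred), size=(n_samples, 3))
    def normals(p):
        a, b, c = p[idx[:, 0]], p[idx[:, 1]], p[idx[:, 2]]
        n = np.cross(b - a, c - a)
        return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-8)
    return np.abs(normals(pts_pred) - normals(pts_gt)).mean()
```

Because unit normals are invariant to translation and uniform scaling of the point cloud, the loss constrains shape even when absolute depth is ambiguous, which is exactly the disentangling property claimed above.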

20.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8587-8601, 2022 11.
Article in English | MEDLINE | ID: mdl-34516372

ABSTRACT

Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the "detect-then-segment" strategy (e.g., Mask R-CNN), or predict embedding vectors first then cluster pixels into individual instances. In this paper, we view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location. With this notion, we propose segmenting objects by locations (SOLO), a simple, direct, and fast framework for instance segmentation with strong performance. We derive a few SOLO variants (e.g., Vanilla SOLO, Decoupled SOLO, Dynamic SOLO) following the basic principle. Our method directly maps a raw input image to the desired object categories and instance masks, eliminating the need for the grouping post-processing or the bounding box detection. Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy, while being considerably simpler than the existing methods. Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation. We further demonstrate the flexibility and high-quality segmentation of SOLO by extending it to perform one-stage instance-level image matting. Code is available at: https://git.io/AdelaiDet.


Subjects
Algorithms, Image Processing (Computer-Assisted), Image Processing (Computer-Assisted)/methods