Pesquisa | Portal Regional da BVS

1.

Ualign: pushing the limit of template-free retrosynthesis prediction with unsupervised SMILES alignment.

Zeng, Kaipeng; Yang, Bo; Zhao, Xin; Zhang, Yu; Nie, Fan; Yang, Xiaokang; Jin, Yaohui; Xu, Yanyan.

J Cheminform ; 16(1): 80, 2024 Jul 15.

Artigo em Inglês | MEDLINE | ID: mdl-39010144

RESUMO

MOTIVATION: Retrosynthesis planning poses a formidable challenge in the organic chemical industry, particularly in pharmaceuticals. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task in recent years, incorporating diverse levels of additional chemical knowledge dependency. RESULTS: This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules. Based on the fact that the majority of molecule structures remain unchanged during a chemical reaction, we propose a simple yet effective SMILES alignment technique to facilitate the reuse of unchanged structures for reactant generation. Extensive experiments show that our method substantially outperforms state-of-the-art template-free and semi-template-based approaches. Importantly, our template-free method achieves effectiveness comparable to, or even surpasses, established powerful template-based methods. SCIENTIFIC CONTRIBUTION: We present a novel graph-to-sequence template-free retrosynthesis prediction pipeline that overcomes the limitations of Transformer-based methods in molecular representation learning and insufficient utilization of chemical information. We propose an unsupervised learning mechanism for establishing product-atom correspondence with reactant SMILES tokens, achieving even better results than supervised SMILES alignment methods. Extensive experiments demonstrate that UAlign significantly outperforms state-of-the-art template-free methods and rivals or surpasses template-based approaches, with up to 5% (top-5) and 5.4% (top-10) increased accuracy over the strongest baseline.

2.

Machine Learning-Based Prediction of Large-for-Gestational Age Infants in Mothers with Gestational Diabetes Mellitus.

Kang, Mei; Zhu, Chengguang; Lai, Mengyu; Weng, Jianrong; Zhuang, Yan; He, Huichen; Qiu, Yan; Wu, Yixia; Qi, Zhangxuan; Zhang, Weixia; Xu, Xianming; Zhu, Yanhong; Wang, Yufan; Yang, Xiaokang.

J Clin Endocrinol Metab ; 2024 Jul 16.

Artigo em Inglês | MEDLINE | ID: mdl-39011974

RESUMO

CONTEXT: Large-for-gestational-age (LGA), one of the most common complications of gestational diabetes mellitus (GDM), has become a global concern. The predictive performance of common continuous glucose monitoring (CGM) metrics for LGA is limited. OBJECTIVE: We aimed to develop and validate an artificial intelligence (AI) based model to determine the probability of women with GDM giving birth to LGA infants during pregnancy using CGM measurements together with demographic data and metabolic indicators. METHODS: A total of 371 women with GDM from a prospective cohort at a university hospital were included. CGM was performed during 20-34 gestational weeks, and glycemic fluctuations were evaluated and visualized in women with GDM who gave birth to LGA and non-LGA infants. A convolutional neural network (CNN)-based fusion model was developed to predict LGA. Comparisons among the novel fusion model and three conventional models were made using the area under the receiver-operating characteristic curve (AUCROC) and accuracy. RESULTS: Overall, 76 (20.5%) out of 371 GDM women developed LGA neonates. The visualized 24-h glucose profiles differed at midmorning. This difference was consistent among subgroups categorized by pregestational BMI, therapeutic protocol and CGM administration period. The AI based fusion prediction model using 24-h CGM data and 15 clinical variables for LGA prediction (AUCROC 0.852, 95% CI 0.680-0.966, accuracy 84.4%) showed superior discriminative power compared with the three classic models. CONCLUSIONS: We demonstrated better performance in predicting LGA infants among women with GDM using the AI based fusion model. The characteristics of the CGM profiles allowed us to determine the appropriate window for intervention.

3.

Sparsely-Supervised Object Tracking.

Zheng, Jilai; Li, Wenxi; Ma, Chao; Yang, Xiaokang.

IEEE Trans Image Process ; 33: 3470-3485, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38809731

RESUMO

Recent years have witnessed the incredible performance boost of data-driven deep visual object trackers. Despite the success, these trackers require millions of sequential manual labels on videos for supervised training, implying the heavy burden of human annotating. This raises a crucial question: how to train a powerful tracker from abundant videos using limited manual annotations? In this paper, we challenge the conventional belief that frame-by-frame labeling is indispensable, and show that providing a small number of annotated bounding boxes in each video is sufficient for training a strong tracker. To facilitate that, we design a novel SParsely-supervised Object Tracking (SPOT) framework. It regards the sparsely annotated boxes as anchors and progressively explores in the temporal span to discover unlabeled target snapshots. Under the teacher-student paradigm, SPOT leverages the unique transitive consistency inherent in the tracking task as supervision, extracting knowledge from both anchor snapshots and unlabeled target snapshots. We also utilize several effective training strategies, i.e., IoU filtering, asymmetric augmentation, and temporal calibration to further improve the training robustness of SPOT. The experimental results demonstrate that, given less than 5 labels for each video, trackers trained via SPOT perform on par with their fully-supervised counterparts. Moreover, our SPOT exhibits two desirable properties: 1) SPOT enables us to fully exploit large-scale video datasets by efficiently allocating sparse labels to more videos even under a limited labeling budget; 2) when equipped with a target discovery module, SPOT can even learn from purely unlabeled videos for performance gain. We hope this work could inspire the community to rethink the current annotation principles and make a step towards practical label-efficient deep tracking.

4.

Efficient Deformable Tissue Reconstruction via Orthogonal Neural Plane.

Yang, Chen; Wang, Kailing; Wang, Yuehao; Dou, Qi; Yang, Xiaokang; Shen, Wei.

IEEE Trans Med Imaging ; PP2024 Apr 16.

Artigo em Inglês | MEDLINE | ID: mdl-38625765

RESUMO

Intraoperative imaging techniques for reconstructing deformable tissues in vivo are pivotal for advanced surgical systems. Existing methods either compromise on rendering quality or are excessively computationally intensive, often demanding dozens of hours to perform, which significantly hinders their practical application. In this paper, we introduce Fast Orthogonal Plane (Forplane), a novel, efficient framework based on neural radiance fields (NeRF) for the reconstruction of deformable tissues. We conceptualize surgical procedures as 4D volumes, and break them down into static and dynamic fields comprised of orthogonal neural planes. This factorization discretizes the four-dimensional space, leading to a decreased memory usage and faster optimization. A spatiotemporal importance sampling scheme is introduced to improve performance in regions with tool occlusion as well as large motions and accelerate training. An efficient ray marching method is applied to skip sampling among empty regions, significantly improving inference speed. Forplane accommodates both binocular and monocular endoscopy videos, demonstrating its extensive applicability and flexibility. Our experiments, carried out on two in vivo datasets, the EndoNeRF and Hamlyn datasets, demonstrate the effectiveness of our framework. In all cases, Forplane substantially accelerates both the optimization process (by over 100 times) and the inference process (by over 15 times) while maintaining or even improving the quality across a variety of non-rigid deformations. This significant performance improvement promises to be a valuable asset for future intraoperative surgical applications. The code of our project is now available at https://github.com/Loping151/ForPlane.

5.

Task-Specific Normalization for Continual Learning of Blind Image Quality Models.

Zhang, Weixia; Ma, Kede; Zhai, Guangtao; Yang, Xiaokang.

IEEE Trans Image Process ; 33: 1898-1910, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38451761

RESUMO

In this paper, we present a simple yet effective continual learning method for blind image quality assessment (BIQA) with improved quality prediction accuracy, plasticity-stability trade-off, and task-order/-length robustness. The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability, and learn task-specific normalization parameters for plasticity. We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score. The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight K -means gating mechanism. Extensive experiments on six IQA datasets demonstrate the advantages of the proposed method in comparison to previous training techniques for BIQA.

6.

DRAC 2022: A public benchmark for diabetic retinopathy analysis on ultra-wide optical coherence tomography angiography images.

Qian, Bo; Chen, Hao; Wang, Xiangning; Guan, Zhouyu; Li, Tingyao; Jin, Yixiao; Wu, Yilan; Wen, Yang; Che, Haoxuan; Kwon, Gitaek; Kim, Jaeyoung; Choi, Sungjin; Shin, Seoyoung; Krause, Felix; Unterdechler, Markus; Hou, Junlin; Feng, Rui; Li, Yihao; El Habib Daho, Mostafa; Yang, Dawei; Wu, Qiang; Zhang, Ping; Yang, Xiaokang; Cai, Yiyu; Tan, Gavin Siew Wei; Cheung, Carol Y; Jia, Weiping; Li, Huating; Tham, Yih Chung; Wong, Tien Yin; Sheng, Bin.

Patterns (N Y) ; 5(3): 100929, 2024 Mar 08.

Artigo em Inglês | MEDLINE | ID: mdl-38487802

RESUMO

We described a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Within this challenge, we provided the DRAC datset, an ultra-wide optical coherence tomography angiography (UW-OCTA) dataset (1,103 images), addressing three primary clinical tasks: diabetic retinopathy (DR) lesion segmentation, image quality assessment, and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams submitting different solutions for these three tasks, respectively. This paper presents a concise summary and analysis of the top-performing solutions and results across all challenge tasks. These solutions could provide practical guidance for developing accurate classification and segmentation models for image quality assessment and DR diagnosis using UW-OCTA images, potentially improving the diagnostic capabilities of healthcare professionals. The dataset has been released to support the development of computer-aided diagnostic systems for DR evaluation.

7.

A deep learning system for predicting time to progression of diabetic retinopathy.

Dai, Ling; Sheng, Bin; Chen, Tingli; Wu, Qiang; Liu, Ruhan; Cai, Chun; Wu, Liang; Yang, Dawei; Hamzah, Haslina; Liu, Yuexing; Wang, Xiangning; Guan, Zhouyu; Yu, Shujie; Li, Tingyao; Tang, Ziqi; Ran, Anran; Che, Haoxuan; Chen, Hao; Zheng, Yingfeng; Shu, Jia; Huang, Shan; Wu, Chan; Lin, Shiqun; Liu, Dan; Li, Jiajia; Wang, Zheyuan; Meng, Ziyao; Shen, Jie; Hou, Xuhong; Deng, Chenxin; Ruan, Lei; Lu, Feng; Chee, Miaoli; Quek, Ten Cheer; Srinivasan, Ramyaa; Raman, Rajiv; Sun, Xiaodong; Wang, Ya Xing; Wu, Jiarui; Jin, Hai; Dai, Rongping; Shen, Dinggang; Yang, Xiaokang; Guo, Minyi; Zhang, Cuntai; Cheung, Carol Y; Tan, Gavin Siew Wei; Tham, Yih-Chung; Cheng, Ching-Yu; Li, Huating.

Nat Med ; 30(2): 584-594, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38177850

RESUMO

Diabetic retinopathy (DR) is the leading cause of preventable blindness worldwide. The risk of DR progression is highly variable among different individuals, making it difficult to predict risk and personalize screening intervals. We developed and validated a deep learning system (DeepDR Plus) to predict time to DR progression within 5 years solely from fundus images. First, we used 717,308 fundus images from 179,327 participants with diabetes to pretrain the system. Subsequently, we trained and validated the system with a multiethnic dataset comprising 118,868 images from 29,868 participants with diabetes. For predicting time to DR progression, the system achieved concordance indexes of 0.754-0.846 and integrated Brier scores of 0.153-0.241 for all times up to 5 years. Furthermore, we validated the system in real-world cohorts of participants with diabetes. The integration with clinical workflow could potentially extend the mean screening interval from 12 months to 31.97 months, and the percentage of participants recommended to be screened at 1-5 years was 30.62%, 20.00%, 19.63%, 11.85% and 17.89%, respectively, while delayed detection of progression to vision-threatening DR was 0.18%. Altogether, the DeepDR Plus system could predict individualized risk and time to DR progression over 5 years, potentially allowing personalized screening intervals.

Assuntos

Aprendizado Profundo , Diabetes Mellitus , Retinopatia Diabética , Humanos , Retinopatia Diabética/diagnóstico , Cegueira

8.

Model-Based Reinforcement Learning With Isolated Imaginations.

Pan, Minting; Zhu, Xiangming; Zheng, Yitao; Wang, Yunbo; Yang, Xiaokang.

IEEE Trans Pattern Anal Mach Intell ; 46(5): 2788-2803, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-37999968

RESUMO

World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios like autonomous driving, noncontrollable dynamics that are independent or sparsely dependent on action signals often exist, making it challenging to learn effective world models. To address this issue, we propose Iso-Dream++, a model-based reinforcement learning approach that has two main contributions. First, we optimize the inverse dynamics to encourage the world model to isolate controllable state transitions from the mixed spatiotemporal variations of the environment. Second, we perform policy optimization based on the decoupled latent imaginations, where we roll out noncontrollable states into the future and adaptively associate them with the current controllable state. This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild, such as self-driving cars that can anticipate the movement of other vehicles, thereby avoiding potential risks. On top of our previous work (Pan et al. 2022), we further consider the sparse dependencies between controllable and noncontrollable states, address the training collapse problem of state decoupling, and validate our approach in transfer learning setups. Our empirical study demonstrates that Iso-Dream++ outperforms existing reinforcement learning models significantly on CARLA and DeepMind Control.

9.

Artificial intelligence in diabetes management: Advancements, opportunities, and challenges.

Guan, Zhouyu; Li, Huating; Liu, Ruhan; Cai, Chun; Liu, Yuexing; Li, Jiajia; Wang, Xiangning; Huang, Shan; Wu, Liang; Liu, Dan; Yu, Shujie; Wang, Zheyuan; Shu, Jia; Hou, Xuhong; Yang, Xiaokang; Jia, Weiping; Sheng, Bin.

Cell Rep Med ; 4(10): 101213, 2023 10 17.

Artigo em Inglês | MEDLINE | ID: mdl-37788667

RESUMO

The increasing prevalence of diabetes, high avoidable morbidity and mortality due to diabetes and diabetic complications, and related substantial economic burden make diabetes a significant health challenge worldwide. A shortage of diabetes specialists, uneven distribution of medical resources, low adherence to medications, and improper self-management contribute to poor glycemic control in patients with diabetes. Recent advancements in digital health technologies, especially artificial intelligence (AI), provide a significant opportunity to achieve better efficiency in diabetes care, which may diminish the increase in diabetes-related health-care expenditures. Here, we review the recent progress in the application of AI in the management of diabetes and then discuss the opportunities and challenges of AI application in clinical practice. Furthermore, we explore the possibility of combining and expanding upon existing digital health technologies to develop an AI-assisted digital health-care ecosystem that includes the prevention and management of diabetes.

Assuntos

Inteligência Artificial , Diabetes Mellitus , Humanos , Diabetes Mellitus/terapia

10.

MNGNAS: Distilling Adaptive Combination of Multiple Searched Networks for One-Shot Neural Architecture Search.

Chen, Zhihua; Qiu, Guhao; Li, Ping; Zhu, Lei; Yang, Xiaokang; Sheng, Bin.

IEEE Trans Pattern Anal Mach Intell ; 45(11): 13489-13508, 2023 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-37432801

RESUMO

Recently neural architecture (NAS) search has attracted great interest in academia and industry. It remains a challenging problem due to the huge search space and computational costs. Recent studies in NAS mainly focused on the usage of weight sharing to train a SuperNet once. However, the corresponding branch of each subnetwork is not guaranteed to be fully trained. It may not only incur huge computation costs but also affect the architecture ranking in the retraining procedure. We propose a multi-teacher-guided NAS, which proposes to use the adaptive ensemble and perturbation-aware knowledge distillation algorithm in the one-shot-based NAS algorithm. The optimization method aiming to find the optimal descent directions is used to obtain adaptive coefficients for the feature maps of the combined teacher model. Besides, we propose a specific knowledge distillation process for optimal architectures and perturbed ones in each searching process to learn better feature maps for later distillation procedures. Comprehensive experiments verify our approach is flexible and effective. We show improvement in precision and search efficiency in the standard recognition dataset. We also show improvement in correlation between the accuracy of the search algorithm and true accuracy by NAS benchmark datasets.

11.

Neural Architecture Selection as a Nash Equilibrium With Batch Entanglement.

Li, Qian; Xue, Chao; Li, Mingming; Li, Chun-Guang; Ma, Chao; Yang, Xiaokang.

IEEE Trans Neural Netw Learn Syst ; PP2023 Jul 11.

Artigo em Inglês | MEDLINE | ID: mdl-37432819

RESUMO

Modeling the architecture search process on a supernet and applying a differentiable method to find the importance of architecture are among the leading tools for differentiable neural architectures search (DARTS). One fundamental problem in DARTS is how to discretize or select a single-path architecture from the pretrained one-shot architecture. Previous approaches mainly exploit heuristic or progressive search methods for discretization and selection, which are not efficient and easily trapped by local optimizations. To address these issues, we formulate the task of finding a proper single-path architecture as an architecture game among the edges and operations with the strategies "keep" and "drop" and show that the optimal one-shot architecture is a Nash equilibrium of the architecture game. Then, we propose a novel and effective approach for discretizing and selecting a proper single-path architecture, which is based on extracting the single-path architecture that associates the maximal coefficient of the Nash equilibrium with the strategy "keep" in the architecture game. To further improve the efficiency, we employ a mechanism of entangled Gaussian representation of mini-batches, inspired by the classic Parrondo's paradox. If some mini-batch formed uncompetitive strategies, the entanglement of mini-batches would ensure the games be combined and, thus, turn into strong ones. We conduct extensive experiments on benchmark datasets and demonstrate that our approach is significantly faster than the state-of-the-art progressive discretizing methods while maintaining competitive performance with higher maximum accuracy.

12.

Blind Image Quality Assessment for Pathological Microscopic Image Under Screen and Immersion Scenarios.

Guo, Yifei; Hu, Menghan; Min, Xiongkuo; Wang, Yan; Dai, Min; Zhai, Guangtao; Zhang, Xiao-Ping; Yang, Xiaokang.

IEEE Trans Med Imaging ; 42(11): 3295-3306, 2023 11.

Artigo em Inglês | MEDLINE | ID: mdl-37267133

RESUMO

The high-quality pathological microscopic images are essential for physicians or pathologists to make a correct diagnosis. Image quality assessment (IQA) can quantify the visual distortion degree of images and guide the imaging system to improve image quality, thus raising the quality of pathological microscopic images. Current IQA methods are not ideal for pathological microscopy images due to their specificity. In this paper, we present deep learning-based blind image quality assessment model with saliency block and patch block for pathological microscopic images. The saliency block and patch block can handle the local and global distortions, respectively. To better capture the area of interest of pathologists when viewing pathological images, the saliency block is fine-tuned by eye movement data of pathologists. The patch block can capture lots of global information strongly related to image quality via the interaction between different image patches from different positions. The performance of the developed model is validated by the home-made Pathological Microscopic Image Quality Database under Screen and Immersion Scenarios (PMIQD-SIS) and cross-validated by the five public datasets. The results of ablation experiments demonstrate the contribution of the added blocks. The dataset and the corresponding code are publicly available at: https://github.com/mikugyf/PMIQD-SIS.

Assuntos

Imersão , Microscopia , Bases de Dados Factuais

13.

Unsupervised Learning of Graph Matching With Mixture of Modes via Discrepancy Minimization.

Wang, Runzhong; Yan, Junchi; Yang, Xiaokang.

IEEE Trans Pattern Anal Mach Intell ; 45(8): 10500-10518, 2023 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-37030721

RESUMO

Graph matching (GM) has been a long-standing combinatorial problem due to its NP-hard nature. Recently (deep) learning-based approaches have shown their superiority over the traditional solvers while the methods are almost based on supervised learning which can be expensive or even impractical. We develop a unified unsupervised framework from matching two graphs to multiple graphs, without correspondence ground truth for training. Specifically, a Siamese-style unsupervised learning framework is devised and trained by minimizing the discrepancy of a second-order classic solver and a first-order (differentiable) Sinkhorn net as two branches for matching prediction. The two branches share the same CNN backbone for visual graph matching. Our framework further allows unsupervised learning with graphs from a mixture of modes which is ubiquitous in reality. Specifically, we develop and unify the graduated assignment (GA) strategy for matching two-graph, multi-graph, and graphs from a mixture of modes, whereby two-way constraint and clustering confidence (for mixture case) are modulated by two separate annealing parameters, respectively. Moreover, for partial and outlier matching, an adaptive reweighting technique is developed to suppress the overmatching issue. Experimental results on real-world benchmarks including natural image matching show our unsupervised method performs comparatively and even better against two-graph based supervised approaches.

Assuntos

Algoritmos , Aprendizado de Máquina não Supervisionado , Análise por Conglomerados

14.

StyleVR: Stylizing Character Animations with Normalizing Flows.

Ji, Bin; Pan, Ye; Yan, Yichao; Chen, Ruizhao; Yang, Xiaokang.

IEEE Trans Vis Comput Graph ; PP2023 Mar 20.

Artigo em Inglês | MEDLINE | ID: mdl-37030765

RESUMO

The significance of artistry in creating animated virtual characters is widely acknowledged, and motion style is a crucial element in this process. There has been a long-standing interest in stylizing character animations with style transfer methods. However, this kind of models can only deal with short-term motions and yield deterministic outputs. To address this issue, we propose a generative model based on normalizing flows for stylizing long and aperiodic animations in the VR scene. Our approach breaks down this task into two sub-problems: motion style transfer and stylized motion generation, both formulated as the instances of conditional normalizing flows with multi-class latent space. Specifically, we encode high-frequency style features into the latent space for varied results and control the generation process with style-content labels for disentangled edits of style and content. We have developed a prototype, StyleVR, in Unity, which allows casual users to apply our method in VR. Through qualitative and quantitative comparisons, we demonstrate that our system outperforms other methods in terms of style transfer as well as stochastic stylized motion generation.

15.

A Survey on Label-Efficient Deep Image Segmentation: Bridging the Gap Between Weak Supervision and Dense Prediction.

Shen, Wei; Peng, Zelin; Wang, Xuehui; Wang, Huayu; Cen, Jiazhong; Jiang, Dongsheng; Xie, Lingxi; Yang, Xiaokang; Tian, Qi.

IEEE Trans Pattern Anal Mach Intell ; 45(8): 9284-9305, 2023 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-37027561

RESUMO

The rapid development of deep learning has made a great progress in image segmentation, one of the fundamental tasks of computer vision. However, the current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious. To alleviate this burden, the past years have witnessed an increasing attention in building label-efficient, deep-learning-based image segmentation algorithms. This paper offers a comprehensive review on label-efficient image segmentation methods. To this end, we first develop a taxonomy to organize these methods according to the supervision provided by different types of weak labels (including no supervision, inexact supervision, incomplete supervision and inaccurate supervision) and supplemented by the types of segmentation problems (including semantic segmentation, instance segmentation and panoptic segmentation). Next, we summarize the existing label-efficient image segmentation methods from a unified perspective that discusses an important question: how to bridge the gap between weak supervision and dense prediction - the current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraint, cross-view consistency, and cross-image relation. Finally, we share our opinions about the future research directions for label-efficient deep image segmentation.

Assuntos

Algoritmos , Semântica , Processamento de Imagem Assistida por Computador

16.

Sequence as A Whole: A Unified Framework for Video Action Localization with Long-range Text Query.

Su, Yuting; Wang, Weikang; Liu, Jing; Ma, Shuang; Yang, Xiaokang.

IEEE Trans Image Process ; PP2023 Feb 13.

Artigo em Inglês | MEDLINE | ID: mdl-37022864

RESUMO

Comprehensive understanding of video content requires both spatial and temporal localization. However, there lacks a unified video action localization framework, which hinders the coordinated development of this field. Existing 3D CNN methods take fixed and limited input length at the cost of ignoring temporally long-range cross-modal interaction. On the other hand, despite having large temporal context, existing sequential methods often avoid dense cross-modal interactions for complexity reasons. To address this issue, in this paper, we propose a unified framework which handles the whole video in sequential manner with long-range and dense visual-linguistic interaction in an end-to-end manner. Specifically, a lightweight relevance filtering based transformer (Ref-Transformer) is designed, which is composed of relevance filtering based attention and temporally expanded MLP. The text-relevant spatial regions and temporal clips in video can be efficiently highlighted through the relevance filtering and then propagated among the whole video sequence with the temporally expanded MLP. Extensive experiments on three sub-tasks of referring video action localization, i.e., referring video segmentation, temporal sentence grounding, and spatiotemporal video grounding, show that the proposed framework achieves the state-of-the-art performance in all referring video action localization tasks.

17.

Combinatorial Learning of Robust Deep Graph Matching: An Embedding Based Approach.

Wang, Runzhong; Yan, Junchi; Yang, Xiaokang.

IEEE Trans Pattern Anal Mach Intell ; 45(6): 6984-7000, 2023 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-32750800

RESUMO

Graph matching aims to establish node correspondence between two graphs, which has been a fundamental problem for its NP-hard nature. One practical consideration is the effective modeling of the affinity function in the presence of noise, such that the mathematically optimal matching result is also physically meaningful. This paper resorts to deep neural networks to learn the node and edge feature, as well as the affinity model for graph matching in an end-to-end fashion. The learning is supervised by combinatorial permutation loss over nodes. Specifically, the parameters belong to convolutional neural networks for image feature extraction, graph neural networks for node embedding that convert the structural (beyond second-order) information into node-wise features that leads to a linear assignment problem, as well as the affinity kernel between two graphs. Our approach enjoys flexibility in that the permutation loss is agnostic to the number of nodes, and the embedding model is shared among nodes such that the network can deal with varying numbers of nodes for both training and inference. Moreover, our network is class-agnostic. Experimental results on extensive benchmarks show its state-of-the-art performance. It bears some generalization capability across categories and datasets, and is capable for robust matching against outliers.

18.

Robust Mesh Representation Learning via Efficient Local Structure-Aware Anisotropic Convolution.

Gao, Zhongpai; Yan, Junchi; Zhai, Guangtao; Zhang, Juyong; Yang, Xiaokang.

IEEE Trans Neural Netw Learn Syst ; 34(11): 8566-8578, 2023 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-35226610

RESUMO

Mesh is a type of data structure commonly used for 3-D shapes. Representation learning for 3-D meshes is essential in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) for structured data (e.g., images) suggests the value of adapting insights from CNN for 3-D shapes. However, 3-D shape data are irregular since each node's neighbors are unordered. Various graph neural networks for 3-D shapes have been developed with isotropic filters or predefined local coordinate systems to overcome the node inconsistency on graphs. However, isotropic filters or predefined local coordinate systems limit the representation power. In this article, we propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns adaptive weighting matrices for each template's node according to its neighboring structure and performs shared anisotropic filters. In fact, the learnable weighting matrix is similar to the attention matrix in the random synthesizer-a new Transformer model for natural language processing (NLP). Since the learnable weighting matrices require large amounts of parameters for high-resolution 3-D shapes, we introduce a matrix factorization technique to notably reduce the parameter size, denoted as LSA-small. Furthermore, a residual connection with a linear transformation is introduced to improve the performance of our LSA-Conv. Comprehensive experiments demonstrate that our model produces significant improvement in 3-D shape reconstruction compared to state-of-the-art methods.

19.

SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing.

Yang, Xue; Yan, Junchi; Liao, Wenlong; Yang, Xiaokang; Tang, Jin; He, Tao.

IEEE Trans Pattern Anal Mach Intell ; 45(2): 2384-2399, 2023 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-35412976

RESUMO

Small and cluttered objects are common in real-world which are challenging for detection. The difficulty is further pronounced when the objects are rotated, as traditional detectors often routinely locate the objects in horizontal bounding box such that the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first innovatively introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection to small and cluttered objects. To handle the rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long standing boundary problem, which to our analysis, is mainly caused by the periodicity of angular (PoA) and exchangeability of edges (EoE). By combing these two features, our proposed detector is termed as SCRDet++. Extensive experiments are performed on large aerial images public datasets DOTA, DIOR, UCAS-AOD as well as natural image dataset COCO, scene text dataset ICDAR2015, small traffic light dataset BSTLD and our released S 2 TLD by this paper. The results show the effectiveness of our approach. The released dataset S 2 TLD is made public available, which contains 5,786 images with 14,130 traffic light instances across five categories.

20.

Continual Learning for Blind Image Quality Assessment.

Zhang, Weixia; Li, Dingquan; Ma, Chao; Zhai, Guangtao; Yang, Xiaokang; Ma, Kede.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 2864-2878, 2023 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-35635807

RESUMO

The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, meanwhile introducing novel distortions to processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, which are weak at adapting to subpopulation shift. Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets. However, this type of approach is not scalable to a large number of datasets and is cumbersome to incorporate a newly created dataset as well. In this paper, we formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets, building on what was learned from previously seen data. We first identify five desiderata in the continual setting with three criteria to quantify the prediction accuracy, plasticity, and stability, respectively. We then propose a simple yet effective continual learning method for BIQA. Specifically, based on a shared backbone network, we add a prediction head for a new dataset and enforce a regularizer to allow all prediction heads to evolve with new data while being resistant to catastrophic forgetting of old data. We compute the overall quality score by a weighted summation of predictions from all heads. Extensive experiments demonstrate the promise of the proposed continual learning method in comparison to standard training techniques for BIQA, with and without experience replay. We made the code publicly available at https://github.com/zwx8981/BIQA_CL.

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA