Results 1-20 of 95
1.
Med Phys; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38648676

ABSTRACT

BACKGROUND: Liver lesions mainly occur inside the liver parenchyma, where they are difficult to locate and have complicated relationships with essential vessels. Preoperative planning is therefore crucial for the resection of liver lesions, and accurate segmentation of the hepatic and portal veins (PVs) on computed tomography (CT) images is of great importance for it. However, manually labeling vessel masks is laborious and time-consuming, and the labeling results of different clinicians are prone to inconsistencies. Hence, developing an automatic segmentation algorithm for hepatic and PVs on CT images has attracted the attention of researchers. Unfortunately, existing deep learning based automatic segmentation methods are prone to misclassifying peripheral vessels into wrong categories. PURPOSE: This study aims to provide a fully automatic and robust semantic segmentation algorithm for hepatic and PVs to guide subsequent preoperative planning. In addition, to address the deficiency of public datasets for hepatic and PV segmentation, we revise the annotations of the Medical Segmentation Decathlon (MSD) hepatic vessel segmentation dataset and add masks for the hepatic veins (HVs) and PVs. METHODS: We propose a dual-stream encoder combining convolution and Transformer blocks, named the Dual-stream Hepatic Portal Vein segmentation Network, to extract local features and long-distance spatial information, thereby capturing the anatomy of the hepatic and portal veins and avoiding misclassification of adjacent peripheral vessels. In addition, a multi-scale feature fusion block based on dilated convolution is proposed to extract multi-scale local features over expanded receptive fields, and a multi-level fusing attention module is introduced for efficient context information extraction. A paired t-test is conducted to evaluate the significance of the difference in Dice between the proposed method and the competing methods. RESULTS: Two datasets are constructed from the original MSD dataset. For each dataset, 50 cases are randomly selected for model evaluation under a 5-fold cross-validation scheme. The results show that our method outperforms state-of-the-art convolutional neural network based and Transformer-based methods. Specifically, on the first dataset, our model reaches an overall Dice of 0.815, precision of 0.830, and sensitivity of 0.807. The Dice scores of the hepatic and PVs are 0.835 and 0.796, which also exceed those of the competing methods. Almost all p-values of the paired t-tests between the proposed approach and the competing approaches are smaller than 0.05. On the second dataset, the proposed algorithm achieves 0.749, 0.762, 0.726, 0.835, and 0.796 for overall Dice, precision, sensitivity, HV Dice, and PV Dice, of which the first four exceed the competing methods. CONCLUSIONS: The proposed method effectively addresses the misclassification of interlaced peripheral veins in the HV and PV segmentation task and outperforms the competing methods on the relabeled dataset.
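
As an illustration of the dual-stream idea, a minimal PyTorch sketch of one encoder block follows, combining a convolutional branch for local detail with a Transformer branch for long-range context; all layer sizes, the fusion scheme, and module names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a dual-stream (convolution + Transformer) encoder block.
# Shapes, depths, and the fusion scheme are assumptions, not the paper's code.
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local stream: plain 3-D convolutions capture fine vessel detail.
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.ReLU(inplace=True),
        )
        # Global stream: self-attention over flattened voxels captures
        # long-distance anatomical context.
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.fuse = nn.Conv3d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.conv(x)
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, D*H*W, C)
        globl = self.attn(tokens).transpose(1, 2).reshape(b, c, d, h, w)
        return self.fuse(torch.cat([local, globl], dim=1))

feat = DualStreamBlock(32)(torch.randn(1, 32, 8, 16, 16))
print(feat.shape)  # torch.Size([1, 32, 8, 16, 16])
```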

2.
Nat Comput Sci; 4(3): 210-223, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38467870

ABSTRACT

Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep learning functional model. We build the essential non-locality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy to Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those seen in training, which unleashes the appealing scaling of OFDFT for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
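
To make the orbital-free formulation concrete, here is a toy sketch of the core loop, assuming the density is a coefficient vector under an atomic basis and a neural functional stands in for the kinetic term; the functional here is untrained and the external-potential term is a random stand-in, so this is not M-OFDFT itself.

```python
# Toy orbital-free DFT loop: the density is a coefficient vector p under an
# atomic basis, and a neural functional T_theta(p) replaces the kinetic term.
# The functional is untrained and the "external" term is a random stand-in.
import torch

n_basis = 16
T_theta = torch.nn.Sequential(             # learned kinetic-energy surrogate
    torch.nn.Linear(n_basis, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1))
v_ext = torch.randn(n_basis)               # fixed external-potential term

p = torch.zeros(n_basis, requires_grad=True)   # density coefficients
opt = torch.optim.Adam([p], lr=1e-2)

for step in range(200):
    energy = T_theta(p).squeeze() + v_ext @ p + 0.5 * p @ p  # E[p]
    opt.zero_grad()
    energy.backward()       # dE/dp drives the density toward the minimum
    opt.step()

print(float(energy))
```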

3.
Article in English | MEDLINE | ID: mdl-38349823

ABSTRACT

Deep neural network (DNN) classifiers are vulnerable to adversarial attacks, where an imperceptible perturbation can result in misclassification. However, the vulnerability of DNN-based image ranking systems remains under-explored. In this paper, we propose two attacks against deep ranking systems, the Candidate Attack and the Query Attack, that can raise or lower the rank of chosen candidates via adversarial perturbations. Specifically, the expected ranking order is first represented as a set of inequalities; then a triplet-like objective function is designed to obtain the optimal perturbation. Conversely, an anti-collapse triplet defense is proposed to improve the robustness of ranking models against all the proposed attacks, where the model learns to prevent the adversarial attack from pulling positive and negative samples close to each other. To comprehensively measure the empirical adversarial robustness of a ranking model with our defense, we propose an empirical robustness score, which involves a set of representative attacks against ranking models. Our adversarial ranking attacks and defenses are evaluated on the MNIST, Fashion-MNIST, CUB200-2011, CARS196, and Stanford Online Products datasets. Experimental results demonstrate that our attacks can effectively compromise a typical deep ranking system, while our defense significantly improves the ranking system's robustness and simultaneously mitigates a wide range of attacks.
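
A minimal sketch of a Candidate-Attack-style perturbation follows, assuming a toy embedding encoder and a hinge-based triplet objective with a PGD-style signed update; the encoder, budget, and margin are illustrative, not the paper's setup.

```python
# Sketch of a Candidate Attack: perturb a candidate so its embedding moves
# closer to the query than competing candidates (a triplet-style objective).
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 32))
query, cand, others = torch.randn(1, 784), torch.randn(1, 784), torch.randn(8, 784)

delta = torch.zeros_like(cand, requires_grad=True)
eps, alpha = 0.03, 0.005
for _ in range(40):
    q, c, o = encoder(query), encoder(cand + delta), encoder(others)
    # Raise the candidate's rank: its distance to the query should become
    # smaller than every competitor's distance (hinge over the gap).
    loss = F.relu(torch.cdist(q, c) - torch.cdist(q, o) + 0.1).mean()
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()      # PGD-style signed step
        delta.clamp_(-eps, eps)                 # keep perturbation imperceptible
    delta.grad.zero_()
print(float(loss))
```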

4.
Nat Commun; 15(1): 313, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38182565

ABSTRACT

Geometric deep learning has been revolutionizing the molecular modeling field. Although state-of-the-art neural network models are approaching ab initio accuracy for molecular property prediction, their application to tasks such as drug discovery and molecular dynamics (MD) simulation has been hindered by insufficient utilization of geometric information and high computational costs. Here we propose an equivariant geometry-enhanced graph neural network called ViSNet, which elegantly extracts geometric features and efficiently models molecular structures with low computational cost. ViSNet outperforms state-of-the-art approaches on multiple MD benchmarks, including MD17, revised MD17, and MD22, and achieves excellent chemical property prediction on the QM9 and Molecule3D datasets. Furthermore, through a series of simulations and case studies, we show that ViSNet can efficiently explore conformational space and provide reasonable interpretability for mapping geometric representations to molecular structures.

5.
Med Phys; 51(3): 1775-1797, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37681965

ABSTRACT

BACKGROUND: Atherosclerotic cardiovascular disease is the leading cause of death worldwide. Early detection of carotid atherosclerosis can prevent the progression of cardiovascular disease. Many (semi-)automatic methods have been designed for segmentation of the carotid vessel wall and diagnosis of carotid atherosclerosis (i.e., lumen segmentation, outer wall segmentation, and carotid atherosclerosis diagnosis) on black-blood magnetic resonance imaging (BB-MRI). However, most of these methods ignore the intrinsic correlation among the different tasks on BB-MRI, leading to limited performance. PURPOSE: We therefore model the intrinsic correlation among the lumen segmentation, outer wall segmentation, and carotid atherosclerosis diagnosis tasks on BB-MRI using multi-task learning and propose a gated multi-task network (GMT-Net) that performs the three related tasks in a single neural network. METHODS: In the proposed method, GMT-Net is composed of three modules (a sharing module, a segmentation module, and a diagnosis module) that interact with each other to achieve better learning performance. At the same time, two new adaptive layers, the gated exchange layer and the gated fusion layer, are presented to exchange and merge branch features. RESULTS: The proposed method is applied to the CAREII dataset (1057 scans) for lumen segmentation, outer wall segmentation, and carotid atherosclerosis diagnosis. It achieves promising segmentation performance (0.9677 Dice for the lumen and 0.9669 Dice for the outer wall) and strong diagnostic accuracy for carotid atherosclerosis (0.9516 AUC and 0.9024 accuracy) on the "CAREII test" dataset (106 scans), with statistically significant improvements in accuracy and efficiency. CONCLUSIONS: Even without the reviewer intervention required by previous works, the proposed method automatically segments the lumen and outer wall together and diagnoses carotid atherosclerosis with high performance. The proposed method can be used in clinical trials to relieve radiologists of tedious reading tasks, such as screening reviews to separate normal carotid arteries from atherosclerotic ones and outlining vessel wall contours.
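
A minimal sketch of a gated exchange between two task branches might look as follows, assuming 1x1 convolutions produce sigmoid gates; the sizes and the residual mixing rule are illustrative assumptions rather than the GMT-Net definition.

```python
# Sketch of a gated exchange between two task branches (e.g., lumen and
# outer-wall segmentation): each branch admits the other's features through
# a learned, content-dependent sigmoid gate.
import torch
import torch.nn as nn

class GatedExchange(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate_ab = nn.Conv2d(channels, channels, 1)
        self.gate_ba = nn.Conv2d(channels, channels, 1)

    def forward(self, feat_a, feat_b):
        # Each branch keeps its own features and mixes in the other
        # branch's features weighted by the gate.
        a_new = feat_a + torch.sigmoid(self.gate_ab(feat_b)) * feat_b
        b_new = feat_b + torch.sigmoid(self.gate_ba(feat_a)) * feat_a
        return a_new, b_new

a, b = torch.randn(2, 16, 32, 32), torch.randn(2, 16, 32, 32)
a2, b2 = GatedExchange(16)(a, b)
```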


Subjects
Cardiovascular Diseases; Carotid Artery Diseases; Humans; Cardiovascular Diseases/pathology; Carotid Arteries/diagnostic imaging; Carotid Arteries/pathology; Carotid Artery Diseases/diagnostic imaging; Carotid Artery Diseases/pathology; Magnetic Resonance Angiography/methods; Magnetic Resonance Imaging/methods
6.
Int J Neural Syst; 34(1): 2450002, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38084473

ABSTRACT

Functional MRI (fMRI) is a brain signal with high spatial resolution, and visual cognitive processes and semantic information in the brain can be represented and retrieved through fMRI. In this paper, we design single-graphic and matched/unmatched double-graphic visual stimulus experiments and collect fMRI data from 12 subjects to explore the brain's visual perception processes. In the double-graphic stimulus experiment, we focus on "matching" as the high-level semantic information, and remove tail-to-tail conjunctions by designing a model that screens matching-related voxels. We then perform Bayesian causal learning between fMRI voxels based on transfer entropy, establish a hierarchical Bayesian causal network (HBcausalNet) of the visual cortex, and use the model for visual stimulus image reconstruction. HBcausalNet achieves average accuracies of 70.57% and 53.70% on single- and double-graphic stimulus image reconstruction tasks, respectively, higher than HcorrNet and HcausalNet. The results show that the matching-related voxel screening and causality analysis method presented here can extract the "matching" information in fMRI, obtain a direct causal relationship between matching information and fMRI, and explore the causal inference process in the brain. This suggests that our model can effectively extract high-level semantic information from brain signals and model effective connectivity and visual perception processes in the visual cortex.
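
For intuition, a naive plug-in estimate of transfer entropy TE(X -> Y) for binarized voxel time series is sketched below; the one-step history and the binarization are simplifying assumptions, not the paper's estimator.

```python
# Naive plug-in estimate of transfer entropy TE(X -> Y), the quantity used
# here to orient causal edges between voxels. One-step history is assumed.
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_next, y_past, x_past)
    pairs_yy = Counter(zip(y[1:], y[:-1]))
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    past_y = Counter(y[:-1])
    n = len(x) - 1
    te = 0.0
    for (yn, yp, xp), c in triples.items():
        p_joint = c / n                               # p(yn, yp, xp)
        p_cond_full = c / pairs_yx[(yp, xp)]          # p(yn | yp, xp)
        p_cond_y = pairs_yy[(yn, yp)] / past_y[yp]    # p(yn | yp)
        te += p_joint * np.log2(p_cond_full / p_cond_y)
    return te

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 500)
y = np.roll(x, 1)                    # y copies x with a one-step lag
print(transfer_entropy(x.tolist(), y.tolist()))   # high TE(X -> Y)
```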


Subjects
Brain Mapping; Visual Cortex; Humans; Brain Mapping/methods; Bayes Theorem; Semantics; Brain; Magnetic Resonance Imaging/methods; Visual Cortex/diagnostic imaging
7.
IEEE Trans Pattern Anal Mach Intell; 46(5): 3753-3771, 2024 May.
Article in English | MEDLINE | ID: mdl-38145531

ABSTRACT

Monocular depth inference is a fundamental problem for scene perception in robots. A specific robot may be equipped with a camera plus an optional depth sensor of any type and located in scenes of various scales, whereas recent advances have addressed this diversity through multiple individual sub-tasks. This imposes the burden of fine-tuning models for specific robots and thereby high-cost customization in large-scale industrialization. This article investigates a unified task of monocular depth inference, which infers high-quality depth maps from all kinds of raw input data from various robots in unseen scenes. A basic benchmark, G2-MonoDepth, is developed for this task, comprising four components: (a) a unified data representation RGB+X to accommodate RGB plus raw depth with diverse scene scales/semantics, depth sparsity ([0%, 100%]), and errors (holes/noises/blurs); (b) a novel unified loss to adapt to the diverse depth sparsity/errors of input raw data and the diverse scales of output scenes; (c) an improved network to propagate diverse scene scales well from input to output; and (d) a data augmentation pipeline to simulate all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied to three sub-tasks, including depth estimation, depth completion at different sparsities, and depth enhancement in unseen scenes, and it consistently outperforms SOTA baselines on both real-world and synthetic data.
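
A minimal sketch of the RGB+X input construction follows, assuming depth is concatenated with a validity mask so that dense, sparse, or absent depth share one representation; the exact channel layout is an assumption for illustration.

```python
# Sketch of a unified RGB+X input: RGB is concatenated with a raw depth
# channel and a validity mask, so one network accepts dense, sparse, or
# absent depth without changing its input format.
import torch

def make_rgbx(rgb: torch.Tensor, depth: torch.Tensor = None) -> torch.Tensor:
    # rgb: (3, H, W); depth: (1, H, W) with zeros at missing pixels, or None
    if depth is None:
        depth = torch.zeros(1, *rgb.shape[1:])
    valid = (depth > 0).float()     # mask tells the net where X is real
    return torch.cat([rgb, depth, valid], dim=0)    # (5, H, W)

x_dense = make_rgbx(torch.rand(3, 240, 320), torch.rand(1, 240, 320))
x_none = make_rgbx(torch.rand(3, 240, 320))         # pure RGB input
```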

8.
Front Plant Sci; 14: 1282212, 2023.
Article in English | MEDLINE | ID: mdl-38046604

ABSTRACT

Accurate recognition of pest categories is crucial for effective pest control. Due to the large variation in pest appearance, low data quality, and complex real-world environments, pest recognition poses challenges in practical applications. Many models have been evaluated on the real-scene dataset IP102, but the highest recognition accuracy is only 75%. To improve pest recognition in practice, this paper proposes a multi-image fusion recognition method. Considering that farmers have easy access to data, the method performs fusion recognition on multiple images of the same pest instead of the conventional single image. Specifically, the method first uses a convolutional neural network (CNN) to extract feature maps from these images. Then, an effective feature localization module (EFLM) captures the feature maps output by all blocks of the last convolutional stage of the CNN, marks regions with large activation values as pest locations, and integrates and crops them to obtain localized features. Next, the adaptive filtering fusion module (AFFM) learns gate masks and selection masks for these features to eliminate interference from useless information, and uses an attention mechanism to select beneficial features for fusion. Finally, the classifier categorizes the fused features and the soft voting (SV) module integrates the results to obtain the final pest category. The principle of the model is activation-value localization, feature filtering and fusion, and voting integration. The experimental results indicate that the proposed method can train high-performance feature extractors and classifiers, achieving recognition accuracies of 73.9%, 99.8%, and 99.7% on IP102, D0, and ETP, respectively, surpassing most single models. The results also show that, thanks to the contribution of each module, multi-image fusion recognition reaches state-of-the-art accuracies of 96.1%, 100%, and 100% on IP102, D0, and ETP using 5, 2, and 2 images, respectively, which meets the requirements of practical applications. Additionally, we have developed a web application that puts our findings into practice to assist farmers in reliable pest identification and to drive the advancement of smart agriculture.
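
The soft-voting step is straightforward; a minimal sketch, assuming per-image logits from the classifier, follows.

```python
# Sketch of the soft-voting step: average the per-image class probabilities
# for several photos of the same pest and take the argmax. The logits here
# stand in for the output of the CNN + fusion pipeline.
import torch

def soft_vote(logits: torch.Tensor) -> int:
    # logits: (n_images, n_classes) for one pest photographed n_images times
    probs = logits.softmax(dim=1)
    return int(probs.mean(dim=0).argmax())

logits = torch.randn(5, 102)          # e.g., 5 images, 102 IP102 classes
print(soft_vote(logits))
```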

9.
Article in English | MEDLINE | ID: mdl-38083165

ABSTRACT

Vessel centerline extraction is essential for carotid stenosis assessment and atherosclerotic plaque identification in clinical diagnosis. It also provides region-of-interest identification and boundary initialization for computer-assisted diagnosis tools. In magnetic resonance imaging (MRI) cross-sectional images, the lumen shape and vascular topology make accurate centerline extraction challenging. To this end, we propose a space-refine framework that exploits the positional continuity of the carotid artery from frame to frame to extract the carotid artery centerline. The proposed framework consists of a detector and a refinement module. Specifically, the detector roughly extracts the carotid lumen region from the original image. We then introduce a refinement module that uses the detector's cascade of regressors to realign the sequence of lumen bounding boxes for each subject, improving lumen localization and further enhancing centerline extraction accuracy. Validated on a large carotid artery dataset, the proposed framework achieves state-of-the-art performance compared with conventional vessel centerline extraction methods and standard convolutional neural network approaches. Clinical relevance: The proposed framework can serve as an important aid for physicians in quantitative analysis of the carotid artery in clinical practice, and as a new paradigm for extracting carotid vessel centerlines in computer-assisted tools.


Subjects
Carotid Arteries; Atherosclerotic Plaque; Humans; Carotid Arteries/diagnostic imaging; Neural Networks, Computer; Carotid Artery, Common; Atherosclerotic Plaque/diagnostic imaging; Magnetic Resonance Imaging/methods
10.
Article in English | MEDLINE | ID: mdl-38083432

ABSTRACT

Lymphomas are a group of malignant tumors that develop from lymphocytes and may occur in many organs. Accurately distinguishing lymphoma from solid tumors is therefore of great clinical significance. Due to the strong ability of graph structures to capture the topology of the cellular micro-environment, graph convolutional networks (GCNs) have been widely used in pathological image processing. Nevertheless, the softmax classification layer of graph convolutional models cannot drive the learned representations to be compact enough to distinguish certain types of lymphomas from solid tumors with strong morphological analogies on H&E-stained images. To alleviate this problem, a prototype learning based model is proposed, namely the graph convolutional prototype network (GCPNet). The method follows a patch-to-slide architecture: it performs patch-level classification and obtains image-level results by fusing the patch-level predictions. The classification model combines a graph convolutional feature extractor with a prototype-based classification layer to build more robust feature representations. For model training, a dynamic prototype loss is proposed to give the model different optimization priorities at different stages of training, and a prototype reassignment operation is designed to prevent the model from getting stuck in local minima during optimization. Experiments are conducted on a dataset of 183 whole-slide images (WSIs) of gastric mucosa biopsies, on which the proposed method achieves superior performance to existing methods. Clinical relevance: The work proposes a new deep learning framework tailored to lymphoma recognition in pathological images of gastric mucosal biopsies, differentiating lymphoma, adenocarcinoma, and inflammation.
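
A minimal sketch of a prototype-based classification layer follows, scoring features by negative squared distance to learned class prototypes; the dimensions and the squared-Euclidean choice are illustrative assumptions.

```python
# Sketch of a prototype-based classification layer: class scores are
# negative squared distances to learned per-class prototypes, which pushes
# representations to cluster around their class centers.
import torch
import torch.nn as nn

class PrototypeLayer(nn.Module):
    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, feat_dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # (B, D) vs (C, D) -> (B, C) negative squared Euclidean distances
        return -torch.cdist(feats, self.prototypes) ** 2

layer = PrototypeLayer(feat_dim=64, n_classes=3)  # lymphoma/adenoca./inflammation
scores = layer(torch.randn(4, 64))
pred = scores.argmax(dim=1)
```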


Subjects
Lymphoma; Stomach; Humans; Biopsy; Gastric Mucosa; Gastroscopy; Lymphoma/diagnosis; Tumor Microenvironment
11.
Article in English | MEDLINE | ID: mdl-38113156

ABSTRACT

Point cloud based 3-D object detection is a significant and critical issue in numerous applications. While most existing methods attempt to capitalize on the geometric characteristics of point clouds, they neglect the internal semantic properties of points and the consistency between semantic and geometric clues. We introduce a semantic consistency (SC) mechanism for 3-D object detection in this article, reasoning about the semantic relations between 3-D object boxes and their internal points. The mechanism is based on a natural principle: the semantic category of a 3-D bounding box should be consistent with the categories of all points within the box. Driven by the SC mechanism, we propose a novel SC network (SCNet) to detect 3-D objects from point clouds. Specifically, SCNet is composed of a feature extraction module, a detection decision module, and a semantic segmentation module. At inference time, the feature extraction and detection decision modules are used to detect 3-D objects. During training, the semantic segmentation module is jointly trained with the other two modules to produce more robust and applicable model parameters. Performance is greatly boosted by reasoning about the relations between the output 3-D object boxes and the segmented points. The proposed SC mechanism is model-agnostic and can be integrated into other base 3-D object detection models. We test the proposed model on three challenging indoor and outdoor benchmark datasets: ScanNetV2, SUN RGB-D, and KITTI. Furthermore, to validate the universality of the SC mechanism, we implement it in three different 3-D object detectors. The experiments show impressive performance improvements, and extensive ablation studies further demonstrate the effectiveness of the proposed model.
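
A minimal sketch of the consistency principle as a loss follows, assuming box and point class logits and a boolean membership mask; the KL-based formulation is an illustrative stand-in for the paper's training objective.

```python
# Sketch of a semantic-consistency loss: penalize disagreement between a
# predicted box's class distribution and the classes of points inside it.
import torch
import torch.nn.functional as F

def consistency_loss(box_logits, point_logits, inside_mask):
    # box_logits: (n_boxes, C); point_logits: (n_points, C)
    # inside_mask: (n_boxes, n_points) boolean membership
    loss = 0.0
    for b in range(box_logits.shape[0]):
        pts = point_logits[inside_mask[b]]
        if len(pts) == 0:
            continue
        # Points inside a box should match the box's semantic category.
        target = box_logits[b].softmax(-1).expand(len(pts), -1)
        loss = loss + F.kl_div(pts.log_softmax(-1), target, reduction="batchmean")
    return loss / box_logits.shape[0]

box_l, pt_l = torch.randn(2, 4), torch.randn(100, 4)
mask = torch.rand(2, 100) > 0.5
print(float(consistency_loss(box_l, pt_l, mask)))
```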

12.
J Neurosci Methods; 399: 109980, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37783351

ABSTRACT

BACKGROUND: The brain aggregates meaningless local sensory elements to form meaningful global patterns in a process called perceptual grouping. Brain imaging studies have found that neural activity in V1 is modulated during visual grouping. However, how grouping is represented in each of the early visual areas, and how attention alters these representations, remains unknown. NEW METHOD: We adopted MVPA to decode the specific content of perceptual grouping by comparing neural activity patterns between gratings and dot-lattice stimuli that can be grouped by the law of proximity. Furthermore, we quantified the grouping effect by defining a strength of grouping, and assessed the effect of attention on grouping. RESULTS: We found that activity patterns evoked by proximity-grouped stimuli in early visual areas resemble those evoked by gratings of the same orientation. This similarity exists even when attention is not focused on the stimuli. The results also showed a progressive increase in the representational strength of grouping from V1 to V3, and attentional modulation of grouping was significant only in V3 among all visual areas. COMPARISON WITH EXISTING METHODS: Most previous work on perceptual grouping has focused on how activity amplitudes are modulated by grouping. Using MVPA, the present work successfully decoded the contents of neural activity patterns corresponding to proximity-grouped stimuli, thus demonstrating the viability of the content-decoding approach for research on perceptual grouping. CONCLUSIONS: The content of neural activity patterns during perceptual grouping can be decoded in the early visual areas under both attended and unattended conditions, providing novel evidence for cascaded processing of proximity grouping from V1 to V3. The strength of grouping was larger in V3 than in any other visual area, and attentional modulation of grouping strength was significant only in V3, implying that V3 plays an important role in proximity grouping.
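
A minimal sketch of the MVPA decoding step, assuming synthetic voxel patterns and a linear SVM with cross-validation, follows; it illustrates the general analysis, not the study's exact pipeline.

```python
# Sketch of an MVPA decoding step: cross-validated classification of
# stimulus orientation from voxel patterns, the kind of analysis used to
# compare grouped-lattice and grating responses. Data here are synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 120, 300
labels = rng.integers(0, 2, n_trials)            # two grating orientations
patterns = rng.normal(size=(n_trials, n_voxels))
patterns[labels == 1, :20] += 0.5                # weak orientation signal

acc = cross_val_score(LinearSVC(dual=False), patterns, labels, cv=5)
print(acc.mean())   # above chance if the pattern carries orientation info
```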


Subjects
Attention; Visual Cortex; Humans; Brain; Brain Mapping; Photic Stimulation; Visual Perception
13.
IEEE Trans Pattern Anal Mach Intell; 45(10): 12022-12037, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37819807

ABSTRACT

Inferring unseen attribute-object compositions is critical for machines to learn to decompose and compose complex concepts as people do. Most existing methods are limited to the recognition of single-attribute-object compositions and can hardly learn relations between attributes and objects. In this paper, we propose an attribute-object semantic association graph model to learn these complex relations and enable knowledge transfer between primitives. With nodes representing attributes and objects, the graph can be constructed flexibly, realizing both single- and multi-attribute-object composition recognition. To reduce misclassification of similar compositions (e.g., scratched screen and broken screen), a contrastive loss pulls the anchor image feature closer to the corresponding label feature and pushes it away from other, negative label features. In addition, a novel balance loss is proposed to alleviate the domain bias whereby a model prefers to predict seen compositions. Furthermore, we build a large-scale Multi-Attribute Dataset (MAD) with 116,099 images and 8,030 label categories for inferring unseen multi-attribute-object compositions. Along with MAD, we propose two novel metrics, Hard and Soft, to give a comprehensive evaluation in the multi-attribute setting. Experiments on MAD and two other single-attribute-object benchmarks (MIT-States and UT-Zappos50K) demonstrate the effectiveness of our approach.

14.
Neural Netw; 168: 171-179, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37757725

ABSTRACT

Contrastive learning methods aim to learn shared representations by minimizing distances between positive pairs and maximizing distances between negative pairs in the embedding space. A key problem in achieving good contrastive learning performance is designing appropriate sample pairs. In most previous works, random crops of the input image are used as two views forming a positive pair. However, such strategies lead to suboptimal performance since the sampled crops may carry inconsistent semantic information, which degrades the quality of the contrastive views. To address this limitation, we explore replenishing sample views with better within-image consistency and propose a novel self-supervised learning (SSL) framework, RepCo. Instead of searching for semantically consistent patches between two different views, we select patches on the same image as the replenishment of positive/negative pairs: patches that are similar but come from different positions are encouraged to form positive pairs, while patches that are dissimilar but come from adjacent positions are forced to have different representations, i.e., they constitute negative pairs that enrich the learned representations. Our method effectively generates high-quality contrastive views, exploits the untapped semantic consistency within images, and provides more informative representations for downstream tasks. Experiments on downstream tasks show that our approach achieves gains of +2.1 AP50 (COCO pre-trained) and +1.6 AP50 (ImageNet pre-trained) on Pascal VOC object detection, and +2.3 mIoU on Cityscapes semantic segmentation.
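
A minimal sketch of the patch-level contrastive objective follows, using a standard InfoNCE form over one anchor, one positive, and several negatives; patch selection and the encoder are outside the sketch, and the temperature is an assumed value.

```python
# Sketch of an InfoNCE-style objective over patch embeddings from one image:
# a similar patch at a different position acts as the positive, dissimilar
# neighboring patches act as negatives.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.2):
    # anchor, positive: (D,); negatives: (K, D) patch embeddings
    a = F.normalize(anchor, dim=0)
    p = F.normalize(positive, dim=0)
    n = F.normalize(negatives, dim=1)
    logits = torch.cat([(a @ p).view(1), n @ a]) / tau
    # The positive sits at index 0, so the target class is 0.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

loss = info_nce(torch.randn(128), torch.randn(128), torch.randn(16, 128))
```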


Subjects
Learning; Neural Networks, Computer; Semantics
15.
Front Neurosci; 17: 1247315, 2023.
Article in English | MEDLINE | ID: mdl-37746136

ABSTRACT

This paper investigates the selection of voxels in functional magnetic resonance imaging (fMRI) brain data. We aim to identify a comprehensive set of discriminative voxels associated with human learning when exposed to a neutral visual stimulus that predicts an aversive outcome. However, due to the nature of the unconditioned stimuli (typically a noxious stimulus), it is challenging to obtain sufficient sample sizes in psychological experiments, given subject tolerability and ethical considerations. We propose a stable hierarchical voting (SHV) mechanism based on stability selection to address this challenge. The mechanism enables us to evaluate the quality of spatial random sampling and minimizes the risk of false and missed detections. We assess the performance of the proposed algorithm using simulated and publicly available datasets. The experiments demonstrate that the choice of regularization strategy significantly affects the interpretability of the results. When applied to our collected fMRI dataset, the algorithm successfully identifies sparse and closely related patterns across subjects and displays stable weight maps for three experimental phases under the fear conditioning paradigm. These findings strongly support the causal role of aversive conditioning in altering visual-cortical activity.
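
A minimal sketch of plain stability selection, on which SHV builds, follows, assuming a lasso fitted on random half-samples and an assumed selection-frequency threshold of 0.8; it is not the hierarchical voting scheme itself.

```python
# Sketch of stability selection for voxel screening: fit a sparse model on
# many random subsamples and keep voxels selected in a large fraction of fits.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X, w = rng.normal(size=(80, 500)), np.zeros(500)
w[:5] = 2.0                                  # 5 truly informative "voxels"
y = X @ w + rng.normal(size=80)

counts = np.zeros(500)
n_rounds = 100
for _ in range(n_rounds):
    idx = rng.choice(80, size=40, replace=False)   # random half-sample
    coef = Lasso(alpha=0.3).fit(X[idx], y[idx]).coef_
    counts += coef != 0

stable = np.where(counts / n_rounds >= 0.8)[0]     # selection frequency >= 0.8
print(stable)                                      # should recover ~[0..4]
```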

16.
Int J Neural Syst; 33(7): 2350035, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37314000

ABSTRACT

Zero-shot detection (ZSD) aims to locate and classify unseen objects in pictures or videos by semantic auxiliary information without additional training examples. Most of the existing ZSD methods are based on two-stage models, which achieve the detection of unseen classes by aligning object region proposals with semantic embeddings. However, these methods have several limitations, including poor region proposals for unseen classes, lack of consideration of semantic representations of unseen classes or their inter-class correlations, and domain bias towards seen classes, which can degrade overall performance. To address these issues, the Trans-ZSD framework is proposed, which is a transformer-based multi-scale contextual detection framework that explicitly exploits inter-class correlations between seen and unseen classes and optimizes feature distribution to learn discriminative features. Trans-ZSD is a single-stage approach that skips proposal generation and performs detection directly, allowing the encoding of long-term dependencies at multiple scales to learn contextual features while requiring fewer inductive biases. Trans-ZSD also introduces a foreground-background separation branch to alleviate the confusion of unseen classes and backgrounds, contrastive learning to learn inter-class uniqueness and reduce misclassification between similar classes, and explicit inter-class commonality learning to facilitate generalization between related classes. Trans-ZSD addresses the domain bias problem in end-to-end generalized zero-shot detection (GZSD) models by using balance loss to maximize response consistency between seen and unseen predictions, ensuring that the model does not bias towards seen classes. The Trans-ZSD framework is evaluated on the PASCAL VOC and MS COCO datasets, demonstrating significant improvements over existing ZSD models.


Subjects
Learning; Neural Networks, Computer
17.
J Biophotonics; 16(9): e202300059, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37289201

ABSTRACT

Automated analysis of vessel structure in intravascular optical coherence tomography (IVOCT) images is critical for assessing vessel health and monitoring coronary artery disease progression. However, deep learning based methods usually require well-annotated large datasets, which are difficult to obtain in medical image analysis. Hence, an automatic layer segmentation method based on meta-learning is proposed, which can simultaneously extract the surfaces of the lumen, intima, media, and adventitia from a handful of annotated samples. Specifically, we leverage a bi-level gradient strategy to train a meta-learner that captures the meta-knowledge shared among different anatomical layers and quickly adapts to unknown ones. A Claw-type network and a contrast consistency loss are designed to better learn this meta-knowledge according to the annotation characteristics of the lumen and anatomical layers. Experimental results on two cardiovascular IVOCT datasets show that the proposed method achieves state-of-the-art performance.
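
A minimal sketch of the bi-level gradient strategy follows, in the MAML style, on a toy regression task; the inner learning rate, task sampling, and single inner step are illustrative assumptions.

```python
# Sketch of a bi-level (MAML-style) gradient strategy: adapt to one task
# (here, one anatomical layer type) with an inner step, then update the
# meta-learner from the post-adaptation loss. The regression task is a toy.
import torch

model = torch.nn.Linear(8, 1)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.1

for task in range(100):                      # each "task" = one layer type
    xs, ys = torch.randn(16, 8), torch.randn(16, 1)   # support set
    xq, yq = torch.randn(16, 8), torch.randn(16, 1)   # query set

    # Inner step: adapt on the support set, keeping the graph for meta-grads.
    loss_s = ((model(xs) - ys) ** 2).mean()
    grads = torch.autograd.grad(loss_s, list(model.parameters()), create_graph=True)
    fast = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

    # Outer step: evaluate the adapted weights on the query set.
    pred_q = xq @ fast[0].t() + fast[1]
    loss_q = ((pred_q - yq) ** 2).mean()
    meta_opt.zero_grad()
    loss_q.backward()
    meta_opt.step()
```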


Subjects
Coronary Artery Disease; Optical Coherence Tomography; Humans; Optical Coherence Tomography/methods; Lung
18.
IEEE Trans Pattern Anal Mach Intell; 45(10): 12550-12561, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37159310

ABSTRACT

Trajectory forecasting for traffic participants (e.g., vehicles) is critical for autonomous platforms to make safe plans. Currently, most trajectory forecasting methods assume that object trajectories have been extracted and directly develop trajectory predictors based on ground truth trajectories. However, this assumption does not hold in practical situations: trajectories obtained from object detection and tracking are inevitably noisy, which can cause serious forecasting errors in predictors built on ground truth trajectories. In this paper, we propose to predict trajectories directly from detection results, without relying on explicitly formed trajectories. Different from traditional methods, which encode an agent's motion cues based on its clearly defined trajectory, we extract motion information only from the affinity cues among detection results, designing an affinity-aware state update mechanism to manage the state information. In addition, since there may be multiple plausible matching candidates, we aggregate their states. These designs take the uncertainty of association into account, which mitigates the undesirable effect of noisy trajectories obtained from data association and improves the robustness of the predictor. Extensive experiments validate the effectiveness of our method and its ability to generalize to different detectors and forecasting schemes.

19.
IEEE Trans Pattern Anal Mach Intell; 45(8): 9504-9519, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37021919

ABSTRACT

Effectively tackling the problem of temporal action localization (TAL) requires a visual representation that jointly pursues two confounding goals: fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching the local, global, and multi-scale contexts in the popular two-stage temporal localization framework. Our proposed model, dubbed ContextLoc++, can be divided into three sub-networks: L-Net, G-Net, and M-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, formulated as a query-and-retrieval process in which the spatial and temporal snippet-level features, functioning as keys and values, are fused by temporal gating. G-Net enriches the global context via higher-level modeling of the video-level representation, and a novel context adaptation module adapts the global context to different proposals. M-Net further fuses the local and global contexts with multi-scale proposal features. Notably, proposal-level features from multi-scale video snippets can focus on different action characteristics: short-term snippets with fewer frames attend to action details, while long-term snippets with more frames capture action variations. Experiments on the THUMOS14 and ActivityNet v1.3 datasets validate the efficacy of our method against existing state-of-the-art TAL algorithms.

20.
Article in English | MEDLINE | ID: mdl-37022905

ABSTRACT

Depth maps generally suffer from large erroneous areas, even in public RGB-Depth datasets. Existing learning-based depth recovery methods are limited by the scarcity of high-quality datasets, while optimization-based methods, which generally depend on local contexts, cannot effectively correct large erroneous areas. This paper develops an RGB-guided depth map recovery method based on the fully connected conditional random field (dense CRF) model to jointly exploit the local and global contexts of depth maps and RGB images. A high-quality depth map is inferred by maximizing its probability conditioned on a low-quality depth map and a reference RGB image under the dense CRF model. The optimization function is composed of redesigned unary and pairwise components, which constrain the local and global structure of the depth map, respectively, under the guidance of the RGB image. In addition, the texture-copy artifact problem is handled by two-stage dense CRF models in a coarse-to-fine manner. A coarse depth map is first recovered by embedding the RGB image in a dense CRF model in units of 3×3 blocks; it is then refined by embedding the RGB image in another model at the level of individual pixels, restricting the model to work mainly in discontinuous regions. Extensive experiments on six datasets verify that the proposed method considerably outperforms a dozen baseline methods in correcting erroneous areas and diminishing texture-copy artifacts in depth maps.
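
A minimal sketch of an RGB-guided pairwise weight in the dense-CRF spirit follows; the Gaussian kernel widths are assumed values, and the model's learned unary and pairwise terms are not reproduced here.

```python
# Sketch of an RGB-guided pairwise term in the dense-CRF spirit: the penalty
# for two pixels taking different depths decays with spatial and color
# distance, so depth edges are encouraged to follow RGB edges.
import numpy as np

def bilateral_weight(p, q, rgb, sigma_xy=8.0, sigma_rgb=13.0):
    # p, q: (row, col) pixel coordinates; rgb: (H, W, 3) guidance image
    d_xy = np.sum((np.array(p) - np.array(q)) ** 2)
    d_rgb = np.sum((rgb[p].astype(float) - rgb[q].astype(float)) ** 2)
    return np.exp(-d_xy / (2 * sigma_xy**2) - d_rgb / (2 * sigma_rgb**2))

rgb = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
print(bilateral_weight((4, 5), (4, 6), rgb))   # near-identical pixels -> ~1
```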
