Results 1 - 20 of 96
1.
Article in English | MEDLINE | ID: mdl-39283785

ABSTRACT

Unsupervised person re-identification (Re-ID) is challenging due to the lack of ground-truth labels. Most existing methods rely on pseudo labels estimated via iterative clustering and thus are highly susceptible to performance penalties incurred by an inaccurately estimated number of clusters. Alternatively, we utilize sample pairs with pairwise pseudo labels to guide feature learning, avoiding the dilemma of determining cluster numbers. In this article, we propose a meta pairwise relationship distillation (MPRD) method that incorporates a graph convolutional network (GCN) to provide high-fidelity pairwise relationships to supervise model training. A small amount of metadata with high-confidence pairwise relationships and the unlabeled pairs with the provided pseudo pairwise relationships participate in the GCN training. In addition, we introduce a hard sample deduction (HSD) module to promptly mine sample pairs with error-prone pairwise pseudo labels, mitigating optimization misled by noisy labels. Furthermore, since the features of each positive pair represent the same person, we design a positive pair alignment (PPA) module to reduce redundant information in each feature, achieved by minimizing the difference between each positive pair's feature distributions. Extensive experiments on the Market-1501, DukeMTMC-reID, and MSMT17 datasets show that our method outperforms state-of-the-art unsupervised methods.
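The pairwise-supervision idea can be illustrated with a toy loss: each sample pair's cosine similarity is mapped to a "same person" score and penalized against its pairwise pseudo label. This is a hedged sketch, not MPRD's actual formulation; the function name, the similarity-to-probability mapping, and the random data are all illustrative assumptions.

```python
import numpy as np

def pairwise_pseudo_loss(feats, pair_idx, pair_labels):
    """feats: (N, D) raw features; pair_idx: (P, 2) index pairs;
    pair_labels: (P,) 1.0 for pseudo-positive (same identity), else 0.0."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    a, b = feats[pair_idx[:, 0]], feats[pair_idx[:, 1]]
    sim = (a * b).sum(axis=1)                      # cosine similarity in [-1, 1]
    prob = np.clip((sim + 1.0) / 2.0, 1e-6, 1 - 1e-6)  # map to a (0, 1) score
    # binary cross-entropy against the pairwise pseudo labels
    return -np.mean(pair_labels * np.log(prob) + (1 - pair_labels) * np.log(1 - prob))

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
pair_idx = np.array([[0, 1], [2, 3], [4, 5]])
pair_labels = np.array([1.0, 0.0, 1.0])
loss = pairwise_pseudo_loss(feats, pair_idx, pair_labels)
```

A self-pair with a positive label yields a near-zero loss, which is the sanity check one would expect from such an objective.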

2.
Med Phys ; 51(8): 5441-5456, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38648676

ABSTRACT

BACKGROUND: Liver lesions mainly occur inside the liver parenchyma, where they are difficult to locate and have complicated relationships with essential vessels. Thus, preoperative planning is crucial for the resection of liver lesions. Accurate segmentation of the hepatic and portal veins (PVs) on computed tomography (CT) images is of great importance for preoperative planning. However, manually labeling vessel masks is laborious and time-consuming, and the labeling results of different clinicians are prone to inconsistencies. Hence, developing an automatic segmentation algorithm for hepatic and PVs on CT images has attracted the attention of researchers. Unfortunately, existing deep learning-based automatic segmentation methods are prone to misclassifying peripheral vessels into the wrong categories. PURPOSE: This study aims to provide a fully automatic and robust semantic segmentation algorithm for hepatic and PVs, guiding subsequent preoperative planning. In addition, to address the deficiency of public datasets for hepatic and PV segmentation, we revise the annotations of the Medical Segmentation Decathlon (MSD) hepatic vessel segmentation dataset and add masks for the hepatic veins (HVs) and PVs. METHODS: We propose a structure with a dual-stream encoder combining convolution and Transformer blocks, named the Dual-stream Hepatic Portal Vein segmentation Network, to extract local features and long-distance spatial information, thereby capturing the anatomical information of the hepatic and portal veins and avoiding misclassification of adjacent peripheral vessels. In addition, a multi-scale feature fusion block based on dilated convolution is proposed to extract multi-scale local features over expanded receptive fields, and a multi-level fusing attention module is introduced for efficient context information extraction. A paired t-test is conducted to evaluate the significance of the Dice differences between the proposed method and the compared methods.
RESULTS: Two datasets are constructed from the original MSD dataset. For each dataset, 50 cases are randomly selected for model evaluation in a 5-fold cross-validation scheme. The results show that our method outperforms state-of-the-art convolutional neural network-based and Transformer-based methods. Specifically, on the first dataset, our model reaches 0.815, 0.830, and 0.807 in overall Dice, precision, and sensitivity. The Dice scores of the HVs and PVs are 0.835 and 0.796, which also exceed those of the compared methods. Almost all p-values of the paired t-tests between the proposed approach and the compared approaches are smaller than 0.05. On the second dataset, the proposed algorithm achieves 0.749, 0.762, 0.726, 0.835, and 0.796 for overall Dice, precision, sensitivity, HV Dice, and PV Dice, of which the first four exceed the compared methods. CONCLUSIONS: The proposed method is effective in solving the problem of misclassifying interlaced peripheral veins in the HV and PV segmentation task and outperforms the compared methods on the relabeled dataset.
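The evaluation protocol above rests on per-case Dice scores compared with a paired t-test. A minimal sketch of both computations on toy data (the score arrays stand in for per-case results and are not the paper's numbers):

```python
import numpy as np
from scipy import stats

def dice(pred, gt):
    """Dice coefficient between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

# Toy 8x8 masks: prediction overshoots the ground truth by one column.
gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:7] = True
d = dice(pred, gt)  # 2*16 / (20 + 16)

# Paired t-test over per-case Dice scores of two hypothetical methods.
ours = np.array([0.83, 0.81, 0.84, 0.80, 0.82])
baseline = np.array([0.78, 0.77, 0.80, 0.76, 0.79])
t_stat, p_value = stats.ttest_rel(ours, baseline)
```

The test is paired because both methods are scored on the same cases, which is exactly the setting described in METHODS.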


Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer; Portal Vein; Tomography, X-Ray Computed; Portal Vein/diagnostic imaging; Image Processing, Computer-Assisted/methods; Humans; Hepatic Veins/diagnostic imaging; Deep Learning; Liver/diagnostic imaging; Liver/blood supply
3.
Nat Comput Sci ; 4(3): 210-223, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38467870

ABSTRACT

Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep learning functional model. We build the essential non-locality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy to Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those seen in training, which unleashes the appealing scaling of OFDFT for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.

4.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5306-5324, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38349823

ABSTRACT

Deep neural network (DNN) classifiers are vulnerable to adversarial attacks, where an imperceptible perturbation can result in misclassification. However, the vulnerability of DNN-based image ranking systems remains under-explored. In this paper, we propose two attacks against deep ranking systems, the Candidate Attack and the Query Attack, which can raise or lower the rank of chosen candidates via adversarial perturbations. Specifically, the expected ranking order is first represented as a set of inequalities. Then a triplet-like objective function is designed to obtain the optimal perturbation. Conversely, an anti-collapse triplet defense is proposed to improve the ranking model's robustness against all proposed attacks, where the model learns to prevent the adversarial attack from pulling the positive and negative samples close to each other. To comprehensively measure the empirical adversarial robustness of a ranking model with our defense, we propose an empirical robustness score, which involves a set of representative attacks against ranking models. Our adversarial ranking attacks and defenses are evaluated on the MNIST, Fashion-MNIST, CUB200-2011, CARS196, and Stanford Online Products datasets. Experimental results demonstrate that our attacks can effectively compromise a typical deep ranking system, while our defense significantly improves the ranking system's robustness and simultaneously mitigates a wide range of attacks.
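A triplet-style ranking objective of the kind described, turning the desired inequalities d(q, c) < d(q, x_j) into hinge terms, can be sketched as follows. All names and data are illustrative; the paper's attack optimizes an image perturbation through a learned embedding, which is omitted here:

```python
import numpy as np

def ranking_attack_loss(query, candidate, competitors, margin=0.1):
    """Lower loss means the candidate ranks above every competitor for the query.
    query, candidate: (D,) embeddings; competitors: (M, D)."""
    d_pos = np.linalg.norm(query - candidate)
    d_neg = np.linalg.norm(query - competitors, axis=1)
    # one hinge per inequality: d(q, c) + margin < d(q, x_j)
    return np.maximum(0.0, d_pos + margin - d_neg).sum()

query = np.array([0.0, 0.0])
near = np.array([0.1, 0.0])                    # candidate already ranked first
far = np.array([5.0, 0.0])                     # candidate ranked last
competitors = np.array([[1.0, 0.0], [0.0, 2.0]])
loss_near = ranking_attack_loss(query, near, competitors)
loss_far = ranking_attack_loss(query, far, competitors)
```

Minimizing this loss over a perturbation of the candidate's input would drag its embedding toward the query, raising its rank, which is the mechanism the abstract describes.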

5.
Nat Commun ; 15(1): 313, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38182565

ABSTRACT

Geometric deep learning has been revolutionizing the molecular modeling field. Although state-of-the-art neural network models are approaching ab initio accuracy for molecular property prediction, their applications, such as drug discovery and molecular dynamics (MD) simulation, have been hindered by insufficient utilization of geometric information and high computational costs. Here we propose an equivariant geometry-enhanced graph neural network called ViSNet, which elegantly extracts geometric features and efficiently models molecular structures with low computational cost. Our proposed ViSNet outperforms state-of-the-art approaches on multiple MD benchmarks, including MD17, revised MD17, and MD22, and achieves excellent chemical property prediction on the QM9 and Molecule3D datasets. Furthermore, through a series of simulations and case studies, we show that ViSNet can efficiently explore the conformational space and provide reasonable interpretability by mapping geometric representations to molecular structures.

6.
Med Phys ; 51(3): 1775-1797, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37681965

ABSTRACT

BACKGROUND: Atherosclerotic cardiovascular disease is the leading cause of death worldwide. Early detection of carotid atherosclerosis can prevent the progression of cardiovascular disease. Many (semi-)automatic methods have been designed for the segmentation of the carotid vessel wall and the diagnosis of carotid atherosclerosis (i.e., lumen segmentation, outer wall segmentation, and carotid atherosclerosis diagnosis) on black-blood magnetic resonance imaging (BB-MRI). However, most of these methods ignore the intrinsic correlation among the different tasks on BB-MRI, leading to limited performance. PURPOSE: We therefore model the intrinsic correlation among the lumen segmentation, outer wall segmentation, and carotid atherosclerosis diagnosis tasks on BB-MRI using multi-task learning and propose a gated multi-task network (GMT-Net) to perform the three related tasks in a single neural network. METHODS: In the proposed method, GMT-Net is composed of three modules, the sharing module, the segmentation module, and the diagnosis module, which interact with each other to achieve better learning performance. At the same time, two new adaptive layers, the gated exchange layer and the gated fusion layer, are presented to exchange and merge branch features. RESULTS: The proposed method is applied to the CAREII dataset (1057 scans) for lumen segmentation, outer wall segmentation, and carotid atherosclerosis diagnosis. The proposed method achieves promising segmentation performance (0.9677 Dice for the lumen and 0.9669 Dice for the outer wall) and strong diagnosis accuracy for carotid atherosclerosis (0.9516 AUC and 0.9024 accuracy) on the "CAREII test" set (106 scans). The results show that the proposed method achieves statistically significant improvements in accuracy while remaining efficient.
CONCLUSIONS: Unlike previous works, which require the intervention of reviewers, the proposed method automatically segments the lumen and outer wall together and diagnoses carotid atherosclerosis with high performance. The proposed method can be used in clinical trials to relieve radiologists of tedious reading tasks, such as screening reviews to separate normal carotid arteries from atherosclerotic ones and outlining vessel wall contours.
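As a rough illustration of how a gated adaptive layer can merge two branch features, the sketch below blends feature maps with a per-channel sigmoid gate. This is a generic gating pattern assumed for illustration only; the exact structure of GMT-Net's gated exchange and fusion layers is not specified in the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_a, feat_b, gate_logits):
    """feat_a, feat_b: (C, H, W) branch features; gate_logits: (C,) learned
    per-channel gate. The gate decides how much each branch contributes."""
    g = sigmoid(gate_logits)[:, None, None]
    return g * feat_a + (1.0 - g) * feat_b

feat_a = np.ones((3, 4, 4))      # e.g., segmentation-branch features
feat_b = np.zeros((3, 4, 4))     # e.g., diagnosis-branch features
# Channel 0 takes branch A, channel 1 mixes evenly, channel 2 takes branch B.
fused = gated_fusion(feat_a, feat_b, gate_logits=np.array([10.0, 0.0, -10.0]))
```

In a multi-task network such a gate lets each channel learn how much cross-task information to absorb, which is the role the abstract attributes to the gated layers.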


Subject(s)
Cardiovascular Diseases; Carotid Artery Diseases; Humans; Cardiovascular Diseases/pathology; Carotid Arteries/diagnostic imaging; Carotid Arteries/pathology; Carotid Artery Diseases/diagnostic imaging; Carotid Artery Diseases/pathology; Magnetic Resonance Angiography/methods; Magnetic Resonance Imaging/methods
7.
Int J Neural Syst ; 34(1): 2450002, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38084473

ABSTRACT

Functional MRI (fMRI) is a brain signal with high spatial resolution, and visual cognitive processes and semantic information in the brain can be represented and retrieved through fMRI. In this paper, we design single-graphic and matched/unmatched double-graphic visual stimulus experiments and collect 12 subjects' fMRI data to explore the brain's visual perception processes. In the double-graphic stimulus experiment, we focus on "matching" as the high-level semantic information and remove tail-to-tail conjunctions by designing a model to screen the matching-related voxels. Then, we perform Bayesian causal learning between fMRI voxels based on transfer entropy, establish a hierarchical Bayesian causal network (HBcausalNet) of the visual cortex, and use the model for visual stimulus image reconstruction. HBcausalNet achieves average accuracies of 70.57% and 53.70% in single- and double-graphic stimulus image reconstruction tasks, respectively, higher than HcorrNet and HcausalNet. The results show that the matching-related voxel screening and causality analysis method in this paper can extract the "matching" information in fMRI, obtain a direct causal relationship between matching information and fMRI, and explore the causal inference process in the brain. This suggests that our model can effectively extract high-level semantic information from brain signals and model effective connections and visual perception processes in the visual cortex.
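Transfer entropy, the quantity used above to learn causal edges between voxels, can be estimated for binarized signals with a simple histogram estimator. This discrete estimator is a common simplification, not the authors' implementation, and the data are synthetic:

```python
import numpy as np

def transfer_entropy(x, y, eps=1e-12):
    """TE(X -> Y) for binary sequences: how much x[t-1] helps predict y[t]
    beyond y[t-1], from joint histogram probabilities (in bits)."""
    yt, yp, xp = y[1:], y[:-1], x[:-1]
    te = 0.0
    for a in (0, 1):            # y[t]
        for b in (0, 1):        # y[t-1]
            for c in (0, 1):    # x[t-1]
                p_abc = np.mean((yt == a) & (yp == b) & (xp == c))
                if p_abc == 0:
                    continue
                p_bc = np.mean((yp == b) & (xp == c))
                p_ab = np.mean((yt == a) & (yp == b))
                p_b = np.mean(yp == b)
                te += p_abc * np.log2((p_abc / p_bc) / (p_ab / p_b + eps) + eps)
    return te

rng = np.random.default_rng(1)
x = rng.integers(0, 2, 2000)
y = np.roll(x, 1)              # y copies x with a one-step delay: strong X -> Y flow
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
```

Because y is a delayed copy of x, information flows almost entirely in the X -> Y direction, and the asymmetric estimate reflects that, which is what makes transfer entropy usable as a directed (causal) edge weight.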


Subject(s)
Brain Mapping; Visual Cortex; Humans; Brain Mapping/methods; Bayes Theorem; Semantics; Brain; Magnetic Resonance Imaging/methods; Visual Cortex/diagnostic imaging
8.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3753-3771, 2024 May.
Article in English | MEDLINE | ID: mdl-38145531

ABSTRACT

Monocular depth inference is a fundamental problem in scene perception for robots. Specific robots may be equipped with a camera plus an optional depth sensor of any type and located in various scenes of different scales, whereas recent advances have produced multiple individual sub-tasks. This imposes the additional burden of fine-tuning models for specific robots, and thereby high-cost customization in large-scale industrialization. This article investigates a unified task of monocular depth inference, which infers high-quality depth maps from all kinds of raw input data from various robots in unseen scenes. A basic benchmark, G2-MonoDepth, is developed for this task, comprising four components: (a) a unified data representation, RGB+X, to accommodate RGB plus raw depth with diverse scene scales/semantics, depth sparsity ([0%, 100%]), and errors (holes/noise/blur); (b) a novel unified loss to adapt to the diverse depth sparsity/errors of raw input data and the diverse scales of output scenes; (c) an improved network to propagate diverse scene scales well from input to output; and (d) a data augmentation pipeline to simulate all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied to three sub-tasks, including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and it consistently outperforms SOTA baselines on both real-world and synthetic data.

9.
Article in English | MEDLINE | ID: mdl-38083165

ABSTRACT

Vessel centerline extraction is essential for carotid stenosis assessment and atherosclerotic plaque identification in clinical diagnosis. It also provides region-of-interest identification and boundary initialization for computer-assisted diagnosis tools. In magnetic resonance imaging (MRI) cross-sectional images, the lumen shape and vascular topology make accurate centerline extraction a challenging task. To this end, we propose a space-refine framework, which exploits the positional continuity of the carotid artery from frame to frame to extract the carotid artery centerline. The proposed framework consists of a detector and a refinement module. Specifically, the detector roughly extracts the carotid lumen region from the original image. Then, the refinement module uses the cascade of regressors from the detector to perform sequence realignment of lumen bounding boxes for each subject. This improves the lumen localization results and further enhances centerline extraction accuracy. Validated on a large carotid artery dataset, the proposed framework achieves state-of-the-art performance compared to conventional vessel centerline extraction methods and standard convolutional neural network approaches. Clinical relevance: The proposed framework can serve as an important aid for physicians in quantitatively analyzing the carotid artery in clinical practice, and as a new paradigm for extracting the centerline of carotid vessels in computer-assisted tools.


Subject(s)
Carotid Arteries; Plaque, Atherosclerotic; Humans; Carotid Arteries/diagnostic imaging; Neural Networks, Computer; Carotid Artery, Common; Plaque, Atherosclerotic/diagnostic imaging; Magnetic Resonance Imaging/methods
10.
Article in English | MEDLINE | ID: mdl-38083432

ABSTRACT

Lymphomas are a group of malignant tumors that develop from lymphocytes and may occur in many organs. Accurately distinguishing lymphoma from solid tumors is therefore of great clinical significance. Owing to the strong ability of graph structures to capture the topology of the cellular micro-environment, graph convolutional networks (GCNs) have been widely used in pathological image processing. Nevertheless, the softmax classification layer of graph convolutional models cannot drive the learned representations to be compact enough to distinguish some types of lymphomas from solid tumors with strong morphological analogies on H&E-stained images. To alleviate this problem, a prototype-learning-based model is proposed, namely the graph convolutional prototype network (GCPNet). Specifically, the method follows a patch-to-slide architecture, first performing patch-level classification and then obtaining image-level results by fusing the patch-level predictions. The classification model combines a graph convolutional feature extractor with a prototype-based classification layer to build more robust feature representations. For model training, a dynamic prototype loss is proposed to give the model different optimization priorities at different stages of training, and a prototype reassignment operation is designed to prevent the model from getting stuck in local minima during optimization. Experiments are conducted on a dataset of 183 whole slide images (WSIs) of gastric mucosa biopsies, on which the proposed method achieves superior performance over existing methods. Clinical relevance: This work proposes a new deep learning framework tailored to lymphoma recognition in pathological images of gastric mucosal biopsies, differentiating lymphoma, adenocarcinoma, and inflammation.
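A prototype-based classification layer of the kind that replaces softmax here can be sketched as nearest-prototype classification with distance-based logits. This is an illustrative stand-in; GCPNet's dynamic prototype loss and reassignment operation are not reproduced:

```python
import numpy as np

def prototype_predict(feats, prototypes):
    """feats: (N, D) sample features; prototypes: (K, D) one learned vector per
    class. Returns (N, K) class probabilities from negative squared distances."""
    d2 = ((feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    logits = -d2                                   # closer prototype -> larger logit
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

prototypes = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])  # 3 classes in 2-D
feats = np.array([[0.2, 0.1], [3.9, 4.2]])
probs = prototype_predict(feats, prototypes)
pred = probs.argmax(axis=1)
```

Because the decision is tied to explicit class centers rather than an unconstrained linear layer, training can pull same-class features toward their prototype, which is the compactness argument the abstract makes.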


Subject(s)
Lymphoma; Stomach; Humans; Biopsy; Gastric Mucosa; Gastroscopy; Lymphoma/diagnosis; Tumor Microenvironment
11.
Front Plant Sci ; 14: 1282212, 2023.
Article in English | MEDLINE | ID: mdl-38046604

ABSTRACT

Accurate recognition of pest categories is crucial for effective pest control. Owing to issues such as large variations in pest appearance, low data quality, and complex real-world environments, pest recognition is challenging in practical applications. At present, many models have been evaluated on the real-scene dataset IP102, but the highest recognition accuracy is only 75%. To improve pest recognition in practice, this paper proposes a multi-image fusion recognition method. Considering that farmers have easy access to data, the method performs fusion recognition on multiple images of the same pest instead of the conventional single image. Specifically, the method first uses a convolutional neural network (CNN) to extract feature maps from these images. Then, an effective feature localization module (EFLM) captures the feature maps output by all blocks of the last convolutional stage of the CNN, marks regions with large activation values as pest locations, and integrates and crops them to obtain localized features. Next, the adaptive filtering fusion module (AFFM) learns gate masks and selection masks for these features to eliminate interference from useless information, and uses an attention mechanism to select beneficial features for fusion. Finally, the classifier categorizes the fused features, and the soft voting (SV) module integrates these results to obtain the final pest category. The principles of the model are activation-value localization, feature filtering and fusion, and voting integration. The experimental results indicate that the proposed method trains high-performance feature extractors and classifiers, achieving recognition accuracies of 73.9%, 99.8%, and 99.7% on IP102, D0, and ETP, respectively, surpassing most single models. The results also show that, thanks to the positive role of each module, the accuracy of multi-image fusion recognition reaches the state-of-the-art levels of 96.1%, 100%, and 100% on IP102, D0, and ETP using 5, 2, and 2 images, respectively, which meets the requirements of practical applications. Additionally, we have developed a web application that applies our research findings in practice to assist farmers in reliable pest identification and drive the advancement of smart agriculture.
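The final soft-voting step over several images of one pest reduces to averaging per-image class probabilities and taking the argmax; a minimal sketch with toy probabilities:

```python
import numpy as np

def soft_vote(prob_list):
    """prob_list: list of (K,) class-probability vectors, one per image of the
    same pest. Averages the distributions and returns the winning class index."""
    return int(np.mean(prob_list, axis=0).argmax())

# Three images of one pest: two weakly favor class 2, one strongly favors class 0.
probs = [np.array([0.2, 0.3, 0.5]),
         np.array([0.1, 0.3, 0.6]),
         np.array([0.7, 0.2, 0.1])]
final_class = soft_vote(probs)
```

Averaging probabilities (rather than hard votes) lets confident predictions weigh more, which is why fusing a handful of images can lift accuracy well above the single-image result, as reported above.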

12.
Article in English | MEDLINE | ID: mdl-38113156

ABSTRACT

Point cloud-based 3-D object detection is a significant and critical issue in numerous applications. While most existing methods attempt to capitalize on the geometric characteristics of point clouds, they neglect the internal semantic properties of points and the consistency between semantic and geometric clues. In this article, we introduce a semantic consistency (SC) mechanism for 3-D object detection, reasoning about the semantic relations between 3-D object boxes and their internal points. The mechanism is based on a natural principle: the semantic category of a 3-D bounding box should be consistent with the categories of all points within the box. Driven by the SC mechanism, we propose a novel SC network (SCNet) to detect 3-D objects from point clouds. Specifically, SCNet is composed of a feature extraction module, a detection decision module, and a semantic segmentation module. At inference, the feature extraction and detection decision modules are used to detect 3-D objects. During training, the semantic segmentation module is jointly trained with the other two modules to produce more robust and applicable model parameters. Performance is greatly boosted by reasoning about the relations between the output 3-D object boxes and the segmented points. The proposed SC mechanism is model-agnostic and can be integrated into other base 3-D object detection models. We test the proposed model on three challenging indoor and outdoor benchmark datasets: ScanNetV2, SUN RGB-D, and KITTI. Furthermore, to validate the universality of the SC mechanism, we implement it in three different 3-D object detectors. The experiments show impressive performance improvements, and extensive ablation studies also demonstrate the effectiveness of the proposed model.
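The SC principle can be made concrete with a toy check of how well per-point labels inside a box agree with the box's label. Axis-aligned boxes and hard labels are simplifying assumptions here; the paper operates on network predictions rather than given labels:

```python
import numpy as np

def sc_agreement(points, point_labels, box, box_label):
    """points: (N, 3); box: (xmin, ymin, zmin, xmax, ymax, zmax).
    Returns the fraction of in-box points whose label matches the box label."""
    lo, hi = np.array(box[:3]), np.array(box[3:])
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    if not inside.any():
        return 0.0
    return float(np.mean(point_labels[inside] == box_label))

points = np.array([[0.5, 0.5, 0.5], [0.6, 0.4, 0.5], [5.0, 5.0, 5.0]])
point_labels = np.array([1, 1, 2])
# Unit box around the first two points, predicted as class 1.
score = sc_agreement(points, point_labels, box=(0, 0, 0, 1, 1, 1), box_label=1)
```

A detection whose box label disagrees with its interior points would score low, and penalizing such disagreement during joint training is the intuition behind the SC mechanism.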

13.
J Neurosci Methods ; 399: 109980, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37783351

ABSTRACT

BACKGROUND: The brain aggregates meaningless local sensory elements to form meaningful global patterns in a process called perceptual grouping. Brain imaging studies have found that neural activity in V1 is modulated during visual grouping. However, how grouping is represented in each of the early visual areas, and how attention alters these representations, remains unknown. NEW METHOD: We adopted MVPA to decode the specific content of perceptual grouping by comparing neural activity patterns between gratings and dot-lattice stimuli, which can be grouped by the proximity law. Furthermore, we quantified the grouping effect by defining the strength of grouping and assessed the effect of attention on grouping. RESULTS: We found that activity patterns evoked by proximity-grouped stimuli in early visual areas resemble those evoked by grating stimuli with the same orientations. This similarity exists even when no attention is focused on the stimuli. The results also showed a progressive increase in the representational strength of grouping from V1 to V3, with attentional modulation of grouping significant only in V3 among all the visual areas. COMPARISON WITH EXISTING METHODS: Most previous work on perceptual grouping has focused on how activity amplitudes are modulated by grouping. Using MVPA, the present work successfully decoded the contents of neural activity patterns corresponding to proximity-grouping stimuli, thus shedding light on the applicability of content-decoding approaches to research on perceptual grouping. CONCLUSIONS: Our work found that the content of neural activity patterns during perceptual grouping can be decoded in the early visual areas under both attended and unattended tasks, and provides novel evidence for cascaded processing of proximity grouping from V1 to V3. The strength of grouping was larger in V3 than in any other visual area, and attentional modulation of the strength of grouping was significant only in V3, implying that V3 plays an important role in proximity grouping.


Subject(s)
Attention; Visual Cortex; Humans; Brain; Brain Mapping; Photic Stimulation; Visual Perception
14.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12022-12037, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37819807

ABSTRACT

Inferring unseen attribute-object compositions is critical for machines to learn to decompose and compose complex concepts as people do. Most existing methods are limited to the composition recognition of single-attribute-objects and can hardly learn relations between attributes and objects. In this paper, we propose an attribute-object semantic association graph model to learn these complex relations and enable knowledge transfer between primitives. With nodes representing attributes and objects, the graph can be constructed flexibly, realizing both single- and multi-attribute-object composition recognition. To reduce misclassifications of similar compositions (e.g., scratched screen and broken screen), a contrastive loss pulls the anchor image feature closer to the corresponding label feature and pushes it away from other, negative label features. In addition, a novel balance loss is proposed to alleviate the domain bias whereby a model prefers to predict seen compositions. We also build a large-scale Multi-Attribute Dataset (MAD) with 116,099 images and 8,030 label categories for inferring unseen multi-attribute-object compositions. Along with MAD, we propose two novel metrics, Hard and Soft, to give a comprehensive evaluation in the multi-attribute setting. Experiments on MAD and two other single-attribute-object benchmarks (MIT-States and UT-Zappos50K) demonstrate the effectiveness of our approach.

15.
Front Neurosci ; 17: 1247315, 2023.
Article in English | MEDLINE | ID: mdl-37746136

ABSTRACT

This paper investigates the selection of voxels for functional Magnetic Resonance Imaging (fMRI) brain data. We aim to identify a comprehensive set of discriminative voxels associated with human learning when exposed to a neutral visual stimulus that predicts an aversive outcome. However, due to the nature of the unconditioned stimuli (typically a noxious stimulus), it is challenging to obtain sufficient sample sizes for psychological experiments, given the tolerability of the subjects and ethical considerations. We propose a stable hierarchical voting (SHV) mechanism based on stability selection to address this challenge. This mechanism enables us to evaluate the quality of spatial random sampling and minimizes the risk of false and missed detections. We assess the performance of the proposed algorithm using simulated and publicly available datasets. The experiments demonstrate that the regularization strategy choice significantly affects the results' interpretability. When applying our algorithm to our collected fMRI dataset, it successfully identifies sparse and closely related patterns across subjects and displays stable weight maps for three experimental phases under the fear conditioning paradigm. These findings strongly support the causal role of aversive conditioning in altering visual-cortical activity.
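Stability selection of the kind underlying SHV can be sketched as repeated feature selection on random subsamples followed by a vote threshold. In this hedged toy version a plain correlation ranking stands in for the sparse model the paper actually uses, and all names and data are illustrative:

```python
import numpy as np

def stable_selection(X, y, n_rounds=50, frac=0.5, top_k=3, threshold=0.6, seed=0):
    """Select features (voxels) chosen in >= threshold fraction of subsample runs.
    X: (n, d) data; y: (n,) target. Returns indices of stably selected features."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    votes = np.zeros(d)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        # stand-in selector: rank features by |correlation| on the subsample
        corr = np.abs([np.corrcoef(X[idx, j], y[idx])[0, 1] for j in range(d)])
        votes[np.argsort(corr)[-top_k:]] += 1
    return np.where(votes / n_rounds >= threshold)[0]

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(100)  # features 0, 3 matter
selected = stable_selection(X, y)
```

The vote threshold is what suppresses false detections: features that win only on a few lucky subsamples never accumulate enough votes, which mirrors the stability argument made above for small-sample fMRI data.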

16.
Neural Netw ; 168: 171-179, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37757725

ABSTRACT

Contrastive learning methods aim to learn shared representations by minimizing distances between positive pairs and maximizing distances between negative pairs in the embedding space. To achieve better contrastive learning performance, one key problem is designing appropriate sample pairs. In most previous works, random cropping of the input image is used to obtain two views as positive pairs. However, such strategies lead to suboptimal performance, since the sampled crops may have inconsistent semantic information, which degrades the quality of the contrastive views. To address this limitation, we explore replenishing sample views with better consistency within the image and propose a novel self-supervised learning (SSL) framework, RepCo. Instead of searching for semantically consistent patches between two different views, we select patches on the same image to replenish the positive/negative pairs: patches that are similar but come from different positions are encouraged to form positive pairs, while patches that are dissimilar but come from adjacent positions are forced to have different representations, i.e., they construct negative pairs that enrich the learned representations. Our method effectively generates high-quality contrastive views, exploits the untapped semantic consistency within images, and provides more informative representations for downstream tasks. Experiments on a range of downstream tasks show that our approach achieves gains of +2.1 AP50 (COCO pre-trained) and +1.6 AP50 (ImageNet pre-trained) on Pascal VOC object detection, and +2.3 mIoU on Cityscapes semantic segmentation.
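The contrastive objective such frameworks build on is typically an InfoNCE-style loss over embedding pairs; a minimal sketch on toy vectors (the patch-selection strategy that distinguishes RepCo is not reproduced here):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive similarity.
    anchor, positive: (D,) L2-normalized; negatives: (M, D) L2-normalized."""
    pos = np.dot(anchor, positive) / temperature
    neg = negatives @ anchor / temperature
    logits = np.concatenate([[pos], neg])
    m = logits.max()
    return -pos + m + np.log(np.exp(logits - m).sum())  # stable -log softmax

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

anchor = normalize(np.array([1.0, 0.0]))
positive = normalize(np.array([0.9, 0.1]))          # semantically consistent patch
negatives = normalize(np.array([[0.0, 1.0], [-1.0, 0.2]]))
loss_good = info_nce(anchor, positive, negatives)   # positive close to anchor
loss_bad = info_nce(anchor, negatives[0], np.array([positive]))  # mismatched pair
```

The loss is small when the designated positive is the anchor's nearest neighbor and large otherwise, which is why the quality of the positive/negative pair construction directly bounds what the encoder can learn.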


Subject(s)
Learning; Neural Networks, Computer; Semantics
17.
Int J Neural Syst ; 33(7): 2350035, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37314000

ABSTRACT

Zero-shot detection (ZSD) aims to locate and classify unseen objects in pictures or videos using semantic auxiliary information, without additional training examples. Most existing ZSD methods are based on two-stage models, which detect unseen classes by aligning object region proposals with semantic embeddings. However, these methods have several limitations: poor region proposals for unseen classes, failure to consider the semantic representations of unseen classes or their inter-class correlations, and domain bias towards seen classes, all of which can degrade overall performance. To address these issues, the Trans-ZSD framework is proposed: a transformer-based multi-scale contextual detection framework that explicitly exploits inter-class correlations between seen and unseen classes and optimizes the feature distribution to learn discriminative features. Trans-ZSD is a single-stage approach that skips proposal generation and performs detection directly, allowing the encoding of long-term dependencies at multiple scales to learn contextual features while requiring fewer inductive biases. Trans-ZSD also introduces a foreground-background separation branch to alleviate confusion between unseen classes and background, contrastive learning to learn inter-class uniqueness and reduce misclassification between similar classes, and explicit inter-class commonality learning to facilitate generalization between related classes. Trans-ZSD addresses the domain bias problem in end-to-end generalized zero-shot detection (GZSD) models by using a balance loss to maximize response consistency between seen and unseen predictions, ensuring that the model is not biased towards seen classes. The Trans-ZSD framework is evaluated on the PASCAL VOC and MS COCO datasets, demonstrating significant improvements over existing ZSD models.


Subject(s)
Learning; Neural Networks, Computer
18.
J Biophotonics ; 16(9): e202300059, 2023 09.
Article in English | MEDLINE | ID: mdl-37289201

ABSTRACT

Automated analysis of the vessel structure in intravascular optical coherence tomography (IVOCT) images is critical to assess the health status of vessels and monitor coronary artery disease progression. However, deep learning-based methods usually require large, well-annotated datasets, which are difficult to obtain in the field of medical image analysis. Hence, an automatic layer segmentation method based on meta-learning is proposed, which can simultaneously extract the surfaces of the lumen, intima, media, and adventitia using only a handful of annotated samples. Specifically, we leverage a bi-level gradient strategy to train a meta-learner that captures the meta-knowledge shared among different anatomical layers and quickly adapts to unknown anatomical layers. Then, a Claw-type network and a contrast consistency loss are designed to better learn the meta-knowledge according to the annotation characteristics of the lumen and anatomical layers. Experimental results on two cardiovascular IVOCT datasets show that the proposed method achieves state-of-the-art performance.
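The bi-level gradient strategy mentioned above can be sketched as a first-order, MAML-style update on toy scalar tasks. This is an illustrative assumption (the task setup, learning rates, and first-order approximation are mine), not the paper's training code:

```python
import numpy as np

def inner_update(w, grad_fn, lr=0.1):
    # Inner loop: one gradient step adapting to a single task.
    return w - lr * grad_fn(w)

def meta_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """Outer loop: adapt to each task, then move the shared initialization
    toward parameters that adapt well (first-order MAML-style sketch)."""
    meta_grad = np.zeros_like(w)
    for grad_fn in tasks:
        w_adapted = inner_update(w, grad_fn, inner_lr)
        meta_grad += grad_fn(w_adapted)   # first-order approximation
    return w - outer_lr * meta_grad / len(tasks)
```

On two quadratic tasks with minima at 1 and 3, repeated meta-steps drive the shared initialization toward 2, the point from which both tasks are reached fastest — the same intuition as sharing meta-knowledge across anatomical layers.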


Subject(s)
Coronary Artery Disease; Tomography, Optical Coherence; Humans; Tomography, Optical Coherence/methods; Lung
19.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12550-12561, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37159310

ABSTRACT

Trajectory forecasting for traffic participants (e.g., vehicles) is critical for autonomous platforms to make safe plans. Currently, most trajectory forecasting methods assume that object trajectories have been extracted and directly develop trajectory predictors based on ground truth trajectories. However, this assumption does not hold in practical situations. Trajectories obtained from object detection and tracking are inevitably noisy, which can cause serious forecasting errors for predictors built on ground truth trajectories. In this paper, we propose to predict trajectories directly from detection results, without relying on explicitly formed trajectories. Unlike traditional methods, which encode the motion cues of an agent based on its clearly defined trajectory, we extract motion information solely from the affinity cues among detection results, with an affinity-aware state update mechanism designed to manage the state information. In addition, considering that there can be multiple plausible matching candidates, we aggregate their states. These designs take association uncertainty into account, which mitigates the undesirable effect of the noisy trajectories obtained from data association and improves the robustness of the predictor. Extensive experiments validate the effectiveness of our method and its generalization ability to different detectors and forecasting schemes.
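A minimal sketch of the affinity-weighted state aggregation idea above — the softmax weighting and all names here are assumptions for illustration, not the paper's actual mechanism:

```python
import numpy as np

def aggregate_states(candidate_states, affinities):
    """Soft-aggregate the states of multiple plausible matching candidates,
    weighting each by its softmaxed affinity score, so the predictor keeps
    association uncertainty instead of committing to one hard match."""
    a = np.asarray(affinities, dtype=float)
    w = np.exp(a - a.max())          # stable softmax weights
    w /= w.sum()
    states = np.asarray(candidate_states, dtype=float)
    return (w[:, None] * states).sum(axis=0)
```

With equal affinities the result is the plain mean of the candidate states; a dominant affinity pulls the aggregate toward that candidate, approximating a hard association.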

20.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 11184-11202, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37074900

ABSTRACT

Representing multimodal behaviors is a critical challenge for pedestrian trajectory prediction. Previous methods commonly represent this multimodality with multiple latent variables repeatedly sampled from a latent space, which makes interpretable trajectory prediction difficult. Moreover, the latent space is usually built by encoding global interaction into the future trajectory, which inevitably introduces superfluous interactions and thus degrades performance. To tackle these issues, we propose a novel Interpretable Multimodality Predictor (IMP) for pedestrian trajectory prediction, whose core idea is to represent a specific mode by its mean location. We model the distribution of mean locations as a Gaussian Mixture Model (GMM) conditioned on sparse spatio-temporal features, and sample multiple mean locations from the decoupled components of the GMM to encourage multimodality. Our IMP brings four-fold benefits: 1) interpretable prediction that provides semantics about the motion behavior of a specific mode; 2) friendly visualization for presenting multimodal behaviors; 3) sound theoretical feasibility for estimating the distribution of mean locations, supported by the central limit theorem; and 4) effective sparse spatio-temporal features that reduce superfluous interactions and model the temporal continuity of interaction. Extensive experiments validate that our IMP not only outperforms state-of-the-art methods but can also achieve controllable prediction by customizing the corresponding mean location.
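Sampling mean locations from the decoupled components of a GMM can be sketched as follows; the function and its parameters are illustrative assumptions, not IMP's actual interface:

```python
import numpy as np

def sample_mean_locations(weights, means, covs, n_samples, rng=None):
    """Draw candidate 2D mean locations from a Gaussian mixture: pick a
    component by its mixing weight, then sample from that component's
    Gaussian. Each component corresponds to one behavior mode."""
    rng = np.random.default_rng(rng)
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in comps])
```

Because each sample is tied to an identifiable mixture component, a drawn mean location carries mode semantics directly, which is what makes the prediction interpretable and controllable.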
