Results 1 - 20 of 80
1.
Sensors (Basel) ; 17(2)2017 Feb 12.
Article in English | MEDLINE | ID: mdl-28208684

ABSTRACT

This paper presents the first attempt at combining Cloud computing with Graphics Processing Units (GPUs) in a complementary manner, within the framework of a real-time high-performance computation architecture, for detecting and tracking multiple moving targets in Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system uses a front-end web-based server to interact with Hadoop and with highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud- and GPU-based computing provides efficient real-time target recognition and tracking compared to workflows that use only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that the GC-MTT approach provides drastically improved tracking even at low frame rates under realistic conditions.
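The detection core of such a pipeline can be reduced to a very small idea: compare consecutive frames and flag pixels that changed. A minimal CPU-only sketch follows; the CUDA kernels and Hadoop orchestration described in the abstract are not shown, and the function and parameter names here are illustrative, not from the paper.

```python
# Frame differencing: the simplest moving-target detector.
# Works on frames given as 2-D lists of grayscale intensities.

def detect_moving_pixels(prev_frame, curr_frame, threshold=20):
    """Return (row, col) pixels whose intensity changed by more than threshold."""
    detections = []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if abs(q - p) > threshold:
                detections.append((r, c))
    return detections

prev = [[10, 10, 10],
        [10, 10, 10],
        [10, 10, 10]]
curr = [[10, 10, 10],
        [10, 90, 10],   # a target moved into the center cell
        [10, 10, 10]]
print(detect_moving_pixels(prev, curr))  # [(1, 1)]
```

In a real WAMI system this per-pixel loop is exactly the part that is parallelized on the GPU, since every pixel comparison is independent.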

2.
IEEE Trans Vis Comput Graph ; 30(5): 2347-2356, 2024 May.
Article in English | MEDLINE | ID: mdl-38437096

ABSTRACT

Projector video compensation aims to cancel the geometric and photometric distortions caused by non-ideal projection surfaces and environments when projecting videos. Most existing projector compensation methods start by projecting and capturing a set of sampling images, followed by an offline compensation model training step. Thus, considerable user effort is required before users can watch the video. Moreover, the sampling images carry little prior knowledge of the video content and may lead to suboptimal results. To address these issues, this paper builds a video compensation system that can adapt the compensation parameters online. Our approach consists of five threads and performs compensation, projection, capturing, and short-term and long-term model updates in parallel. Due to this parallel mechanism, rather than projecting and capturing hundreds of sampling images and training the model offline, we can directly use the projected and captured video frames for model updates on the fly. To adapt quickly to a new environment, we introduce a deep learning-based compensation model that integrates a fixed transformer-based method and a novel CNN-based network. Moreover, for fast convergence and to reduce error accumulation during fine-tuning, we present a strategy that combines short-term and long-term memory model updates. Experiments show that our system significantly outperforms state-of-the-art baselines.
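The parallel capture-and-update idea can be sketched with two threads and a queue: one thread "captures" frames while the other compensates them and nudges a running model parameter online, so no offline sampling pass is needed. This is a toy rendering only; the thread roles, the single `model_gain` parameter, and the EMA-style update are illustrative stand-ins for the paper's five-thread system.

```python
import queue
import threading

frames = queue.Queue()
compensated = []
model_gain = 0.5  # stand-in for compensation parameters adapted on the fly

def capture(n):
    # Producer thread: emits frames, then a sentinel to signal completion.
    for i in range(n):
        frames.put(float(i))
    frames.put(None)

def compensate_and_update():
    # Consumer thread: compensates each frame and updates the model online.
    global model_gain
    while True:
        frame = frames.get()
        if frame is None:
            break
        compensated.append(frame * model_gain)
        model_gain = 0.9 * model_gain + 0.1 * 1.0  # EMA drift toward a target

t1 = threading.Thread(target=capture, args=(5,))
t2 = threading.Thread(target=compensate_and_update)
t1.start(); t2.start(); t1.join(); t2.join()
print(len(compensated))  # 5
```

Because producer and consumer run concurrently, compensation parameters improve while frames keep flowing — the essence of replacing an offline training phase with on-the-fly updates.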

3.
IEEE Trans Image Process ; 33: 3369-3384, 2024.
Article in English | MEDLINE | ID: mdl-38801686

ABSTRACT

Recent studies have shown the potential of weakly supervised multi-object tracking and segmentation, but the drawbacks of coarse pseudo mask labels and limited use of temporal information remain unresolved. To address these issues, we present a framework that directly uses box labels to supervise the segmentation network, without resorting to pseudo mask labels. In addition, we propose to fully exploit temporal information from two perspectives. First, we integrate optical flow-based pairwise consistency to ensure mask consistency across frames, thereby improving mask quality for segmentation. Second, we propose a temporally adjacent pair-based sampling strategy to adapt instance embedding learning for data association in tracking. We combine these techniques into an end-to-end deep model, named BoxMOTS, which requires only box annotations and no mask supervision. Extensive experiments demonstrate that our model surpasses the current state-of-the-art by a large margin, and produces promising results on KITTI MOTS and BDD100K MOTS. The source code is available at https://github.com/Spritea/BoxMOTS.

4.
IEEE Trans Image Process ; 33: 3508-3519, 2024.
Article in English | MEDLINE | ID: mdl-38809733

ABSTRACT

Domain Generalization (DG) aims to learn a generalizable model on the unseen target domain by only training on the multiple observed source domains. Although a variety of DG methods have focused on extracting domain-invariant features, the domain-specific class-relevant features have attracted attention and been argued to benefit generalization to the unseen target domain. To take into account the class-relevant domain-specific information, in this paper we propose an Information theory iNspired diSentanglement and pURification modEl (INSURE) to explicitly disentangle the latent features to obtain sufficient and compact (necessary) class-relevant feature for generalization to the unseen domain. Specifically, we first propose an information theory inspired loss function to ensure the disentangled class-relevant features contain sufficient class label information and the other disentangled auxiliary feature has sufficient domain information. We further propose a paired purification loss function to let the auxiliary feature discard all the class-relevant information and thus the class-relevant feature will contain sufficient and compact (necessary) class-relevant information. Moreover, instead of using multiple encoders, we propose to use a learnable binary mask as our disentangler to make the disentanglement more efficient and make the disentangled features complementary to each other. We conduct extensive experiments on five widely used DG benchmark datasets including PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet. The proposed INSURE achieves state-of-the-art performance. We also empirically show that domain-specific class-relevant features are beneficial for domain generalization. The code is available at https://github.com/yuxi120407/INSURE.

5.
Article in English | MEDLINE | ID: mdl-38833398

ABSTRACT

Multimodal vision-language (VL) learning has noticeably pushed the tendency toward generic intelligence owing to emerging large foundation models. However, tracking, as a fundamental vision problem, has surprisingly benefited little from the recent flourishing of VL learning. We argue that the reasons are two-fold: the lack of large-scale vision-language annotated videos, and the ineffective vision-language interaction learning of current works. These nuisances motivate us to design a more effective vision-language representation for tracking, and meanwhile to construct a large database with language annotations for model learning. In particular, in this paper we first propose a general attribute annotation strategy to decorate videos in six popular tracking benchmarks, which contributes a large-scale vision-language tracking database with more than 23,000 videos. We then introduce a novel framework to improve tracking by learning a unified-adaptive VL representation, where the cores are the proposed asymmetric architecture search and modality mixer (ModaMixer). To further improve the VL representation, we introduce a contrastive loss to align the different modalities. To thoroughly evidence the effectiveness of our method, we integrate the proposed framework into three tracking methods with different designs, i.e., the CNN-based SiamCAR [1], the Transformer-based OSTrack [2], and the hybrid-structure TransT [3]. The experiments demonstrate that our framework can significantly improve all baselines on six benchmarks. Besides empirical results, we theoretically analyze our approach to show its rationality. By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking with diversified multimodal messages.
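The contrastive alignment term mentioned above can be illustrated with a tiny InfoNCE-style loss: paired vision/language embeddings should score higher with each other than with other pairs. The embeddings and the plain dot-product similarity below are toy stand-ins, not the paper's actual ModaMixer features or loss.

```python
import math

def contrastive_loss(vision, language):
    """Mean -log softmax over dot-product similarities; positives on the diagonal."""
    n = len(vision)
    total = 0.0
    for i in range(n):
        sims = [sum(a * b for a, b in zip(vision[i], language[j])) for j in range(n)]
        denom = sum(math.exp(s) for s in sims)
        total += -math.log(math.exp(sims[i]) / denom)
    return total / n

v = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(v, v)            # positives match their pairs
misaligned = contrastive_loss(v, v[::-1])   # positives swapped
print(aligned < misaligned)  # True
```

Minimizing such a loss pulls each video's language embedding toward its own visual embedding and away from the others, which is what "aligning different modalities" amounts to.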

6.
Article in English | MEDLINE | ID: mdl-38885110

ABSTRACT

Deep learning-based solutions have achieved impressive performance in semantic segmentation but often require large amounts of training data with fine-grained annotations. To alleviate this requirement, a variety of weakly supervised annotation strategies have been proposed, among which scribble supervision is emerging as a popular one due to its user-friendly annotation style. However, the sparsity and diversity of scribble annotations make it nontrivial to train a network to produce deterministic and consistent predictions directly. To address these issues, in this paper we propose a holistic solution involving the design of network structure, loss, and training procedure, named CC4S, to improve Certainty and Consistency for Scribble-Supervised Semantic Segmentation. Specifically, to reduce uncertainty, CC4S embeds a random walk module into the network structure to make neural representations uniformly distributed within similar semantic regions, which works together with a soft entropy loss function to force the network to produce deterministic predictions. To encourage consistency, CC4S adopts self-supervised training and imposes the consistency loss on the eigenspace of the probability transition matrix in the random walk module (which we name the neural eigenspace). Such self-supervision inherits the category-level discriminability of the neural eigenspace and meanwhile helps the network focus on producing consistent predictions for the salient parts while neglecting semantically heterogeneous backgrounds. Finally, to further improve performance, CC4S uses the network predictions as pseudo-labels and retrains the network with an extra color-constraint regularizer on the pseudo-labels to boost semantic consistency in color space. Rich experiments demonstrate the excellent performance of CC4S. In particular, under scribble supervision, CC4S achieves performance comparable to that of fully supervised methods. Comprehensive ablation experiments verify the effectiveness of the design choices in CC4S and its robustness under extreme supervision cases, i.e., when scribbles are shrunk proportionally or dropped randomly. The code for this work has been open-sourced at https://github.com/panzhiyi/CC4S.
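The entropy-minimization intuition behind forcing "deterministic predictions" is easy to state concretely: a confident (low-entropy) per-pixel class distribution incurs a lower loss than an uncertain one. This is a generic entropy term for illustration, not CC4S's exact soft entropy loss.

```python
import math

def entropy(probs, eps=1e-12):
    """Shannon entropy of a discrete distribution (eps guards log(0))."""
    return -sum(p * math.log(p + eps) for p in probs)

confident = [0.98, 0.01, 0.01]   # near-deterministic prediction
uncertain = [0.34, 0.33, 0.33]   # close to uniform
print(entropy(confident) < entropy(uncertain))  # True
```

Adding such a term to the training objective penalizes fence-sitting predictions, which is why it pushes the network toward deterministic outputs under sparse scribble supervision.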

7.
Article in English | MEDLINE | ID: mdl-38889031

ABSTRACT

Domain adaptive detection aims to improve the generalization of detectors on the target domain. To reduce the discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, degrading detection. Addressing this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities, including pixel-, instance-, and category-levels, simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. Besides, we introduce multi-granularity discriminators to identify which domain, source or target, different granularities of samples come from. Note that MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate the local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaptation scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
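The AEMA idea — an exponential moving average of model weights whose momentum adapts to a model-assessment score — can be sketched in a few lines. Note the adaptation rule below (`m = base_momentum * (1 - 0.5 * assessment)`) is an illustrative guess at the mechanism, not the paper's actual formula.

```python
def aema_update(ema_w, student_w, base_momentum=0.99, assessment=1.0):
    """Blend student weights into EMA weights; a higher assessment
    score trusts the current student more (smaller momentum)."""
    m = base_momentum * (1.0 - 0.5 * assessment)  # hypothetical adaptation rule
    return [m * e + (1.0 - m) * s for e, s in zip(ema_w, student_w)]

ema = [0.0, 0.0]
student = [1.0, 1.0]
print(aema_update(ema, student, assessment=1.0))  # leans toward the student
print(aema_update(ema, student, assessment=0.0))  # mostly keeps the EMA
```

A plain EMA uses fixed momentum; making it assessment-dependent lets the teacher absorb a well-performing student faster, which is what improves the pseudo labels it produces.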

8.
Article in English | MEDLINE | ID: mdl-38502624

ABSTRACT

Many complex social, biological, or physical systems are characterized as networks, and recovering the missing links of a network can shed important light on its structure and dynamics. A good topological representation is crucial to accurate link modeling and prediction, yet how to account for the kaleidoscopic changes in link formation patterns remains a challenge, especially for analysis in cross-domain studies. We propose a new link representation scheme that projects the local environment of a link onto a "dipole plane", where neighboring nodes of the link are positioned via their relative proximity to the two anchors of the link, like a dipole. By doing this, the complex and discrete topology arising from link formation is turned into a differentiable point-cloud distribution, opening up new possibilities for topological feature engineering with the desired expressiveness, interpretability, and generalization. Our approach achieves comparable or even superior results against state-of-the-art GNNs, with a model up to hundreds of times smaller that runs much faster. Furthermore, it provides a universal platform to systematically profile, study, and compare link patterns from miscellaneous real-world networks. This allows building a global link-pattern atlas, based on which we have uncovered interesting common patterns of link formation, i.e., the bridge style, the radiation style, and the community style, across a wide collection of networks of highly different nature.
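The "dipole plane" projection can be sketched directly: position every neighboring node by its graph distance to each of the link's two anchor endpoints. The coordinate convention here (raw BFS hop counts) is a simplification of the paper's "relative proximity", for illustration only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to all reachable nodes (unweighted BFS)."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def dipole_coordinates(adj, anchor_a, anchor_b):
    """Map every other node to (distance-to-a, distance-to-b)."""
    da, db = bfs_distances(adj, anchor_a), bfs_distances(adj, anchor_b)
    return {v: (da[v], db[v]) for v in adj if v not in (anchor_a, anchor_b)}

# Tiny graph: a-b is the link under study; c neighbors both, d hangs off b.
adj = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
print(dipole_coordinates(adj, "a", "b"))  # {'c': (1, 1), 'd': (2, 1)}
```

Once each neighborhood is a set of 2-D points like this, the discrete topology has become a point cloud on which differentiable features can be engineered.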

9.
Article in English | MEDLINE | ID: mdl-37141078

ABSTRACT

Health care is entering a new era in which data mining is coupled with artificial intelligence. The number of dental implant systems in use has been increasing worldwide. Patient mobility between dental offices can make implant identification extremely challenging for clinicians when no past records are available, so a reliable tool to identify the various implant system designs would be advantageous; such identification is in great demand in periodontology and restorative dentistry. However, no studies have been devoted to using artificial intelligence/convolutional neural networks to classify implant attributes. Thus, the present study used artificial intelligence to identify the attributes of radiographic images of implants. An average accuracy rate of over 95% was achieved with various machine learning networks in identifying three implant manufacturers and their subtypes placed during the past 9 years.


Subjects
Dental Implants, Humans, Artificial Intelligence, Radiography
10.
IEEE Trans Vis Comput Graph ; 29(4): 2102-2116, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34990364

ABSTRACT

In this paper, we present ARCHIE++, a testing framework for conducting AR system testing and collecting user feedback in the wild. Our system addresses challenges in AR testing practices by aggregating usability feedback data (collected in situ) with system performance data from that same time period. These data packets can then be leveraged to identify edge cases encountered by testers during unconstrained usage scenarios. We begin by presenting a set of current trends in performing human testing of AR systems, identified by reviewing a selection of recent work from leading conferences in mixed reality, human factors, and mobile and pervasive systems. From the trends, we identify a set of challenges to be faced when attempting to adopt these practices to testing in the wild. These challenges are used to inform the design of our framework, which provides a cloud-enabled and device-agnostic way for AR systems developers to improve their knowledge of environmental conditions and to support scalability and reproducibility when testing in the wild. We then present a series of case studies demonstrating how ARCHIE++ can be used to support a range of AR testing scenarios, and demonstrate the limited overhead of the framework through a series of evaluations. We close with additional discussion on the design and utility of ARCHIE++ under various edge conditions.
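The central aggregation step — joining each in-situ usability report with the system-performance sample nearest in time to form a "data packet" — can be sketched in a few lines. All field names below are invented for illustration; ARCHIE++'s actual packet schema is not described at this level in the abstract.

```python
def build_packets(feedback, perf):
    """feedback/perf: lists of (timestamp, payload) tuples.
    Pair each feedback item with the performance sample closest in time."""
    packets = []
    for t_fb, note in feedback:
        nearest = min(perf, key=lambda sample: abs(sample[0] - t_fb))
        packets.append({"time": t_fb, "feedback": note, "perf": nearest[1]})
    return packets

feedback = [(2.1, "tracking drifted"), (5.0, "UI froze")]
perf = [(2.0, {"fps": 12}), (5.1, {"fps": 3})]
print(build_packets(feedback, perf))
```

Packets like these let a developer ask, for any complaint filed in the wild, what the system was actually doing at that moment — which is how edge cases get surfaced.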

11.
IEEE Trans Cybern ; 53(1): 526-538, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35417367

ABSTRACT

Salient object detection (SOD) in optical remote sensing images (RSIs), or RSI-SOD, is an emerging topic in understanding optical RSIs. However, due to the difference between optical RSIs and natural scene images (NSIs), directly applying NSI-SOD methods to optical RSIs fails to achieve satisfactory results. In this article, we propose a novel adjacent context coordination network (ACCoNet) to explore the coordination of adjacent features in an encoder-decoder architecture for RSI-SOD. Specifically, ACCoNet consists of three parts: 1) an encoder; 2) adjacent context coordination modules (ACCoMs); and 3) a decoder. As the key component of ACCoNet, ACCoM activates the salient regions of output features of the encoder and transmits them to the decoder. ACCoM contains a local branch and two adjacent branches to coordinate the multilevel features simultaneously. The local branch highlights the salient regions in an adaptive way, while the adjacent branches introduce global information of adjacent levels to enhance salient regions. In addition, to extend the capabilities of the classic decoder block (i.e., several cascaded convolutional layers), we extend it with two bifurcations and propose a bifurcation-aggregation block (BAB) to capture the contextual information in the decoder. Extensive experiments on two benchmark datasets demonstrate that the proposed ACCoNet outperforms 22 state-of-the-art methods under nine evaluation metrics, and runs up to 81 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/ACCoNet.

12.
IEEE Trans Image Process ; 32: 5257-5269, 2023.
Article in English | MEDLINE | ID: mdl-37721873

ABSTRACT

Existing methods for Salient Object Detection in Optical Remote Sensing Images (ORSI-SOD) mainly adopt Convolutional Neural Networks (CNNs) as the backbone, such as VGG and ResNet. Since CNNs can only extract features within certain receptive fields, most ORSI-SOD methods generally follow the local-to-contextual paradigm. In this paper, we propose a novel Global Extraction Local Exploration Network (GeleNet) for ORSI-SOD following the global-to-local paradigm. Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies. Then, GeleNet employs a Direction-aware Shuffle Weighted Spatial Attention Module (D-SWSAM) and its simplified version (SWSAM) to enhance local interactions, and a Knowledge Transfer Module (KTM) to further enhance cross-level contextual interactions. D-SWSAM comprehensively perceives the orientation information in the lowest-level features through directional convolutions to adapt to various orientations of salient objects in ORSIs, and effectively enhances the details of salient objects with an improved attention mechanism. SWSAM discards the direction-aware part of D-SWSAM to focus on localizing salient objects in the highest-level features. KTM models the contextual correlation knowledge of two middle-level features of different scales based on the self-attention mechanism, and transfers the knowledge to the raw features to generate more discriminative features. Finally, a saliency predictor is used to generate the saliency map based on the outputs of the above three modules. Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods. The code and results of our method are available at https://github.com/MathLee/GeleNet.

13.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 197-210, 2023 01.
Article in English | MEDLINE | ID: mdl-35104213

ABSTRACT

Subspace clustering is a classical technique that has been widely used for human motion segmentation and other related tasks. However, existing segmentation methods often cluster data without guidance from prior knowledge, resulting in unsatisfactory segmentation results. To this end, we propose a novel Consistency and Diversity induced human Motion Segmentation (CDMS) algorithm. Specifically, our model factorizes the source and target data into distinct multi-layer feature spaces, in which transfer subspace learning is conducted on different layers to capture multi-level information. A multi-mutual consistency learning strategy is carried out to reduce the domain gap between the source and target data. In this way, the domain-specific knowledge and domain-invariant properties can be explored simultaneously. Besides, a novel constraint based on the Hilbert Schmidt Independence Criterion (HSIC) is introduced to ensure the diversity of multi-level subspace representations, which enables the complementarity of multi-level representations to be explored to boost the transfer learning performance. Moreover, to preserve the temporal correlations, an enhanced graph regularizer is imposed on the learned representation coefficients and the multi-level representations of the source data. The proposed model can be efficiently solved using the Alternating Direction Method of Multipliers (ADMM) algorithm. Extensive experimental results on public human motion datasets demonstrate the effectiveness of our method against several state-of-the-art approaches.
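The HSIC-based diversity constraint has a compact closed form: the (biased) HSIC statistic is trace(KHLH)/(n-1)², where K and L are kernel Gram matrices of the two representations and H centers them; values near zero suggest independence. The pure-Python sketch below uses a linear kernel on tiny data for illustration; the paper applies the criterion to multi-level subspace representations.

```python
def gram(X):
    """Linear-kernel Gram matrix of a list of vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def center(K):
    """Double-center a Gram matrix: K_ij - rowmean_i - colmean_j + totalmean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic(X, Y):
    """Biased HSIC estimate: trace(Kc @ Lc) / (n-1)^2."""
    n = len(X)
    Kc, Lc = center(gram(X)), center(gram(Y))
    return sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2

X = [[1.0], [2.0], [3.0], [4.0]]
dependent = hsic(X, X)           # a variable fully determines itself
independent = hsic(X, [[0.0]] * 4)  # a constant carries no information
print(dependent > independent)  # True
```

Penalizing HSIC between pairs of multi-level representations therefore pushes them toward statistical independence, i.e., diversity.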


Subjects
Algorithms, Humans, Cluster Analysis
14.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 211-228, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35196225

ABSTRACT

Differentiable ARchiTecture Search, i.e., DARTS, has drawn great attention in neural architecture search. It tries to find the optimal architecture in a shallow search network and then measures its performance in a deep evaluation network. The independent optimization of the search and evaluation networks, however, leaves room for potential improvement by allowing interaction between the two networks. To address this optimization issue, we propose new joint optimization objectives and a novel Cyclic Differentiable ARchiTecture Search framework, dubbed CDARTS. Considering the structure difference, CDARTS builds a cyclic feedback mechanism between the search and evaluation networks with introspective distillation. First, the search network generates an initial architecture for evaluation, and the weights of the evaluation network are optimized. Second, the architecture weights in the search network are further optimized by the label supervision in classification, as well as by regularization from the evaluation network through feature distillation. Repeating the above cycle results in joint optimization of the search and evaluation networks and thus enables the evolution of the architecture to fit the final evaluation network. The experiments and analysis on CIFAR, ImageNet and NATS-Bench [95] demonstrate the effectiveness of the proposed approach over the state-of-the-art ones. Specifically, in the DARTS search space, we achieve 97.52% top-1 accuracy on CIFAR10 and 76.3% top-1 accuracy on ImageNet. In the chain-structured search space, we achieve 78.2% top-1 accuracy on ImageNet, which is 1.1% higher than EfficientNet-B0. Our code and models are publicly available at https://github.com/microsoft/Cream.

15.
Commun Biol ; 6(1): 298, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36944712

ABSTRACT

Cerebral blood flow (CBF) is widely used to assess brain function. However, most preclinical CBF studies have been performed under anesthesia, which confounds findings. High spatiotemporal-resolution CBF imaging of awake animals is challenging due to motion artifacts and background noise, particularly for Doppler-based flow imaging. Here, we report ultrahigh-resolution optical coherence Doppler tomography (µODT) for 3D imaging of CBF velocity (CBFv) dynamics in awake mice, made possible by developing self-supervised deep learning for effective image denoising and motion-artifact removal. We compare cortical CBFv in awake vs. anesthetized mice and their dynamic responses in arteriolar, venular and capillary networks to acute cocaine (1 mg/kg, i.v.), a highly addictive drug associated with neurovascular toxicity. Compared with the awake state, isoflurane (2-2.5%) induces vasodilation and increases CBFv within 2-4 min, whereas dexmedetomidine (0.025 mg/kg, i.p.) changes neither vessel diameters nor flow. Acute cocaine decreases CBFv to the same extent in dexmedetomidine and awake states, whereas the decreases are larger under isoflurane, suggesting that isoflurane-induced vasodilation might have facilitated detection of cocaine-induced vasoconstriction. Awake mice after chronic cocaine show severe vasoconstriction, CBFv decreases, and vascular adaptations with extended diving arteriolar/venular vessels that prioritize blood supply to deeper cortical capillaries. The 3D imaging platform we present provides a powerful tool to study dynamic changes in vessel diameters and morphology alongside CBFv networks in the brain of awake animals, which can advance our understanding of the effects of drugs and disease conditions (ischemia, tumors, wound healing).


Subjects
Cocaine, Dexmedetomidine, Isoflurane, Mice, Animals, Isoflurane/pharmacology, Three-Dimensional Imaging/methods, Wakefulness, Dexmedetomidine/pharmacology, Cerebrovascular Circulation/physiology, Optical Coherence Tomography/methods, Cocaine/pharmacology
16.
BMC Bioinformatics ; 13 Suppl 3: S1, 2012 Mar 21.
Article in English | MEDLINE | ID: mdl-22536893

ABSTRACT

BACKGROUND: The relationships between the gene functional similarity and gene expression profile, and between gene function annotation and gene sequence have been studied extensively. However, not much work has considered the connection between gene functions and location of a gene's expression in the mammalian tissues. On the other hand, although unsupervised learning methods have been commonly used in functional genomics, supervised learning cannot be directly applied to a set of normal genes without having a target (class) attribute. RESULTS: Here, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps that provide information about the location of gene expression. The features are extracted from expression maps and the labels denote the functional similarities of pairs of genes. We make use of wavelet features, original expression values, difference and average values of neighboring voxels and other features to perform boosting analysis. The experimental results show that with increasing similarities of gene expression maps, the functional similarities are increased too. The model predicts the functional similarities between genes to a certain degree. The weights of the features in the model indicate the features that are more significant for this prediction. CONCLUSIONS: By considering pairs of genes, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps. We also explore the relationship between similarities of gene maps and gene functions. By using AdaBoost coupled with our proposed weak classifier we analyze a large-scale gene expression dataset and predict gene functional similarities. We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work is very important for predicting functions of unknown genes. 
It also has broader applicability since the methodology can be applied to analyze any large-scale dataset without a target attribute and is not restricted to gene expressions.
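The AdaBoost-with-weak-classifier recipe the study follows can be shown end to end on toy data: decision stumps on a 1-D feature, with misclassified examples reweighted each round. The real work uses wavelet and voxel features from expression maps; the data and feature below are invented for illustration.

```python
import math

def stump_predict(x, thresh, sign):
    """A decision stump: sign if x > thresh, else -sign."""
    return sign if x > thresh else -sign

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, thresh, sign)
    for _ in range(rounds):
        # Pick the stump with lowest weighted error.
        best = None
        for thresh in sorted(set(xs)):
            for sign in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, thresh, sign) != y)
                if best is None or err < best[0]:
                    best = (err, thresh, sign)
        err, thresh, sign = best
        err = max(err, 1e-10)  # avoid division by zero for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thresh, sign))
        # Upweight mistakes, downweight correct examples, renormalize.
        w = [wi * math.exp(-alpha * ys[i] * stump_predict(xs[i], thresh, sign))
             for i, wi in enumerate(w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1, -1, -1, 1, 1, 1]          # separable at x > 2
model = adaboost(xs, ys)
print([predict(model, x) for x in xs])  # matches ys
```

The per-round alpha weights play the same role as the feature weights the authors inspect to find the most significant voxels.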


Subjects
Brain/metabolism, Transcriptome/methods, Algorithms, Animals, Artificial Intelligence, Mice, Regression Analysis
17.
IEEE Trans Pattern Anal Mach Intell ; 44(6): 2953-2967, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33417538

ABSTRACT

Full projector compensation aims to modify a projector input image to compensate for both geometric and photometric disturbance of the projection surface. Traditional methods usually solve the two parts separately and may suffer from suboptimal solutions. In this paper, we propose the first end-to-end differentiable solution, named CompenNeSt++, to solve the two problems jointly. First, we propose a novel geometric correction subnet, named WarpingNet, which is designed with a cascaded coarse-to-fine structure to learn the sampling grid directly from sampling images. Second, we propose a novel photometric compensation subnet, named CompenNeSt, which is designed with a siamese architecture to capture the photometric interactions between the projection surface and the projected images, and to use such information to compensate the geometrically corrected images. By concatenating WarpingNet with CompenNeSt, CompenNeSt++ accomplishes full projector compensation and is end-to-end trainable. Third, to improve practicability, we propose a novel synthetic data-based pre-training strategy to significantly reduce the number of training images and the training time. Moreover, we construct the first setup-independent full compensation benchmark to facilitate future studies. In thorough experiments, our method shows clear advantages over prior art, with promising compensation quality, while being practically convenient.

18.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8212-8229, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34473624

ABSTRACT

Geometric model fitting is a fundamental task in computer vision, which serves as the pre-requisite of many downstream applications. While the problem has a simple intrinsic structure where the solution can be parameterized within a few degrees of freedom, the ubiquitously existing outliers are the main challenge. In previous studies, random sampling techniques have been established as the practical choice, since optimization-based methods are usually too time-demanding. This prospective study is intended to design efficient algorithms that benefit from a general optimization-based view. In particular, two important types of loss functions are discussed, i.e., truncated and l1 losses, and efficient solvers have been derived for both upon specific approximations. Based on this philosophy, a class of algorithms are introduced to perform deterministic search for the inliers or geometric model. Recommendations are made based on theoretical and experimental analyses. Compared with the existing solutions, the proposed methods are both simple in computation and robust to outliers. Extensive experiments are conducted on publicly available datasets for geometric estimation, which demonstrate the superiority of our methods compared with the state-of-the-art ones. Additionally, we apply our method to the recent benchmark for wide-baseline stereo evaluation, leading to a significant improvement of performance. Our code is publicly available at https://github.com/AoxiangFan/EifficientDeterministicSearch.
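The truncated-loss idea can be made concrete with iteratively re-weighted least squares on a line fit: residuals beyond a threshold get zero weight, so gross outliers are discarded deterministically rather than by random sampling. This illustrates only the truncated-loss principle; the paper's actual solvers and approximations are more sophisticated.

```python
def weighted_line_fit(pts, w):
    """Closed-form weighted least-squares fit of y = slope*x + intercept."""
    sw = sum(w)
    sx = sum(wi * x for wi, (x, _) in zip(w, pts))
    sy = sum(wi * y for wi, (_, y) in zip(w, pts))
    sxx = sum(wi * x * x for wi, (x, _) in zip(w, pts))
    sxy = sum(wi * x * y for wi, (x, y) in zip(w, pts))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    return slope, (sy - slope * sx) / sw

def truncated_fit(pts, tau=5.0, iters=5):
    """IRLS for a truncated loss: weight 1 inside tau, 0 outside."""
    w = [1.0] * len(pts)
    for _ in range(iters):
        slope, intercept = weighted_line_fit(pts, w)
        w = [1.0 if abs(y - (slope * x + intercept)) <= tau else 0.0
             for x, y in pts]
    return weighted_line_fit(pts, w)

inliers = [(x, 2 * x + 1) for x in range(10)]   # points on y = 2x + 1
outliers = [(0, 50), (1, 60), (2, 70)]          # gross outliers
slope, intercept = truncated_fit(inliers + outliers)
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```

Although the first unweighted fit is badly pulled by the outliers, the truncation step zeroes their weights within a couple of iterations and the fit snaps onto the inlier line, mirroring the deterministic inlier search the paper advocates over random sampling.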

19.
Diagnostics (Basel) ; 12(10)2022 Oct 19.
Article in English | MEDLINE | ID: mdl-36292226

ABSTRACT

The aim of this study was to determine whether a convolutional neural network (CNN) can be trained to automatically detect and localize cervical carotid artery calcifications (CACs) in CBCT. A total of 56 CBCT studies (15,257 axial slices) were utilized to train, validate, and test the deep learning model. The study comprised two steps. Step 1: localizing axial slices that are below the C2-C3 disc space. For this step, the openly available Inception V3 architecture, trained on the ImageNet dataset of real-world images, was retrained on 40 CBCT studies. Step 2: detecting CACs in the slices from step 1. For this step, two methods were implemented. Method A: a segmentation neural network trained using small patches at random coordinates of the original axial slices. Method B: a segmentation neural network trained using two larger patches at fixed coordinates of the original axial slices, with an improved loss function to account for class imbalance. Our approach resulted in 94.2% sensitivity and 96.5% specificity. The mean intersection-over-union metric for Method A was 76.26%, and Method B improved this metric to 82.51%. The proposed CNN model shows the feasibility of deep learning for the detection and localization of CACs in CBCT images.
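The intersection-over-union metric reported for the two segmentation methods has a simple definition worth stating: overlapping positive pixels divided by the union of positive pixels. A toy version on flat binary masks:

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    return inter / union if union else 1.0  # both empty: perfect agreement

pred  = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
print(iou(pred, truth))  # 1 pixel overlaps out of 3 in the union
```

On this example the score is 1/3; the study's 76.26% and 82.51% figures are means of this quantity over test slices.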

20.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 502-518, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750838

ABSTRACT

This study proposes a novel unified and unsupervised end-to-end image fusion network, termed U2Fusion, which is capable of solving different fusion problems, including multi-modal, multi-exposure, and multi-focus cases. Using feature extraction and information measurement, U2Fusion automatically estimates the importance of the corresponding source images and derives adaptive information preservation degrees. Hence, different fusion tasks are unified in the same framework. Based on the adaptive degrees, a network is trained to preserve the adaptive similarity between the fusion result and the source images. Therefore, the stumbling blocks in applying deep learning to image fusion, e.g., the requirement for ground truth and specifically designed metrics, are greatly mitigated. By avoiding the loss of previous fusion capabilities when training a single model for different tasks sequentially, we obtain a unified model that is applicable to multiple fusion tasks. Moreover, a new aligned infrared and visible image dataset, RoadScene (available at https://github.com/hanna-xu/RoadScene), is released to provide a new option for benchmark evaluation. Qualitative and quantitative experimental results on three typical image fusion tasks validate the effectiveness and universality of U2Fusion. Our code is publicly available at https://github.com/hanna-xu/U2Fusion.
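The adaptive information preservation idea can be shown in miniature: weight each source image by an information measure and preserve the weighted blend. The local-contrast measure and the normalized weights below are illustrative stand-ins for the learned, feature-based importance estimates U2Fusion actually uses.

```python
def contrast(img):
    """Crude information measure: mean absolute difference between
    horizontal neighbors (rows of a 2-D intensity list)."""
    diffs = [abs(a - b) for row in img for a, b in zip(row, row[1:])]
    return sum(diffs) / len(diffs)

def fuse(img1, img2):
    """Blend two images pixel-wise with contrast-derived preservation degrees.
    Assumes at least one source has nonzero contrast."""
    c1, c2 = contrast(img1), contrast(img2)
    w1 = c1 / (c1 + c2)  # adaptive preservation degree for source 1
    return [[w1 * a + (1 - w1) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(img1, img2)]

detailed = [[0, 10, 0], [10, 0, 10]]   # high-contrast source
flat     = [[5, 5, 5], [5, 5, 5]]      # low-contrast source
fused = fuse(detailed, flat)
print(fused[0][1] > fused[0][0])  # the detailed source dominates the blend
```

Because the weights come from the sources themselves rather than from ground-truth fused images, the same mechanism applies unchanged to multi-modal, multi-exposure, and multi-focus inputs.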
