Results 1 - 12 of 12
1.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3679-3691, 2024 May.
Article in English | MEDLINE | ID: mdl-38145534

ABSTRACT

The new generation of organic light-emitting diode displays is designed to enable high dynamic range (HDR), going beyond the standard dynamic range (SDR) supported by traditional display devices. However, a large quantity of videos are still in SDR format. Further, most pre-existing videos are compressed to varying degrees to minimize storage and traffic demands. To enable a movie-going experience on new-generation devices, converting compressed SDR videos to the HDR format (i.e., compressed-SDR to HDR conversion) is in great demand. The key challenge with this new problem is how to solve the intrinsic many-to-many mapping issue. However, without constraining the solution space or simply imitating the inverse camera imaging pipeline in stages, existing SDR-to-HDR methods cannot formulate the HDR video generation process explicitly. Besides, they ignore the fact that videos are often compressed. To address these challenges, in this work we propose novel imaging knowledge-inspired parallel networks (termed KPNet) for compressed-SDR to HDR (CSDR-to-HDR) video reconstruction. KPNet has two key designs: the Knowledge-Inspired Block (KIB) and the Information Fusion Module (IFM). Concretely, mathematically formulated using some priors on compressed videos, our CSDR-to-HDR video reconstruction is conceptually divided into four synergistic parts: reducing compression artifacts, recovering missing details, adjusting imaging parameters, and reducing image noise. We approximate this process by a compact KIB. To capture richer details, we learn HDR representations with a set of KIBs connected in parallel and fused with the IFM. Extensive evaluations show that our KPNet achieves superior performance over the state-of-the-art methods.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12113-12132, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37167049

ABSTRACT

Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and Big Data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal Big Data era, (2) a systematic review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.

3.
Quant Imaging Med Surg ; 13(1): 384-393, 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36620160

ABSTRACT

Background: To quantify the association between the free distal segment length of the internal carotid artery (FDS-ICA) and permanent cranial nerve injury (p-CNI) following carotid body tumor (CBT) resection. Methods: This study is a case-control study. We surveyed 109 consecutive patients who underwent CBT resection between June 2015 and June 2020 at our single center. A total of 89 patients met the inclusion criteria and were selected for analysis. The FDS-ICA was measured by image post-processing software for computed tomography angiography (CTA). Postoperative p-CNI complications were evaluated using comprehensive statistical approaches. Results: The cohort was divided into 2 groups depending on the presence of p-CNI, namely the p-CNI group (n=17) and the non-CNI group (n=79). The average FDS-ICA of patients with p-CNI complications was shorter than that of those without p-CNI complications (P<0.001). For every 1 mm increase in FDS-ICA, there was an associated decrease of 8% in the risk of p-CNI (0.92, 95% CI: 0.85 to 0.98, P<0.05). Threshold effect analysis of the FDS-ICA on p-CNI identified that the FDS-ICA was 28.7 (95% CI: 23.8 to 30.9) mm. Conclusions: The results of this study revealed a significant independent association between FDS-ICA and permanent postoperative cranial nerve injury complications of CBTs. Further study is warranted to confirm these results in a larger patient cohort.
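
The per-millimetre estimate in the abstract above can be compounded over larger differences in FDS-ICA. The following is a minimal sketch, assuming the reported 0.92 is a multiplicative ratio (for example an odds ratio from a logistic model, which the abstract does not state explicitly) and that the effect is log-linear in length. The helper name `compounded_ratio` is hypothetical and not taken from the study.

```python
def compounded_ratio(per_mm_ratio: float, delta_mm: float) -> float:
    """Ratio implied by a length difference of delta_mm.

    Assumes a multiplicative (log-linear) effect per millimetre,
    which is an assumption of this sketch, not a stated study result.
    """
    return per_mm_ratio ** delta_mm


# 0.92 per 1 mm is the point estimate quoted in the abstract;
# the 10 mm difference is an arbitrary illustrative value.
ratio_10mm = compounded_ratio(0.92, 10)
print(f"{ratio_10mm:.3f}")  # prints 0.434
```

Under these assumptions, a 10 mm longer FDS-ICA would correspond to a ratio of roughly 0.43 relative to the baseline. The confidence limits (0.85 and 0.98) can be passed to the same function to get a rough range.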

4.
Article in English | MEDLINE | ID: mdl-36006881

ABSTRACT

State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has indicated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review recent advanced deep learning algorithms for semi-supervised learning (SSL) and unsupervised learning (UL) in visual recognition from a unified perspective. To offer a holistic understanding of the state of the art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL methods with comprehensive and insightful analysis to highlight their design rationales in different learning scenarios and applications in different computer vision tasks. Lastly, we discuss the emerging trends and open challenges in SSL and UL to shed light on future critical research directions.

5.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6517-6533, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34106846

ABSTRACT

Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution diverges from the target one. Mainstream UDA methods strive to learn domain-aligned features such that classifiers trained on the source features can be readily applied to the target ones. Although impressive results have been achieved, these methods have a potential risk of damaging the intrinsic data structures of target discrimination, raising an issue of generalization particularly for UDA tasks in an inductive setting. To address this issue, we are motivated by a UDA assumption of structural similarity across domains, and propose to directly uncover the intrinsic target discrimination via constrained clustering, where we constrain the clustering solutions using structural source regularization that hinges on the very same assumption. Technically, we propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one, and we thus term our method H-SRDC. Our hybrid model is based on a deep clustering framework that minimizes the Kullback-Leibler divergence between the distribution of network prediction and an auxiliary one, where we impose structural regularization by learning a domain-shared classifier and cluster centroids. By enriching the structural similarity assumption, we are able to extend H-SRDC to a pixel-level UDA task of semantic segmentation. We conduct extensive experiments on seven UDA benchmarks of image classification and semantic segmentation. With no explicit feature alignment, our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings. We make our implementation code publicly available at https://github.com/huitangtang/H-SRDC.


Subjects
Algorithms; Machine Learning; Benchmarking; Cluster Analysis; Semantics
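
The deep clustering objective mentioned in the H-SRDC abstract above (a Kullback-Leibler divergence between network predictions and an auxiliary distribution) can be illustrated with a small sketch. The abstract does not specify the auxiliary distribution or the direction of the divergence, so this example assumes a sharpened target of the kind used in earlier deep embedded clustering work and computes KL(auxiliary || prediction). The function names `sharpen` and `kl_divergence` are hypothetical; this is not the authors' implementation, which is available at the URL in the abstract.

```python
import math


def sharpen(pred_rows):
    """Build auxiliary targets from soft predictions (assumed form).

    Each probability is squared and divided by the soft cluster frequency,
    then every row is renormalised to sum to one.
    """
    n_clusters = len(pred_rows[0])
    freq = [sum(row[k] for row in pred_rows) for k in range(n_clusters)]
    targets = []
    for row in pred_rows:
        weights = [row[k] ** 2 / freq[k] for k in range(n_clusters)]
        total = sum(weights)
        targets.append([w / total for w in weights])
    return targets


def kl_divergence(aux_rows, pred_rows, eps=1e-12):
    """Mean KL(aux || pred) over samples; eps guards against log(0)."""
    total = 0.0
    for aux, pred in zip(aux_rows, pred_rows):
        total += sum(
            a * math.log((a + eps) / (p + eps))
            for a, p in zip(aux, pred)
            if a > 0
        )
    return total / len(pred_rows)


# Toy soft cluster assignments for three samples and two clusters.
preds = [[0.6, 0.4], [0.3, 0.7], [0.9, 0.1]]
aux = sharpen(preds)
loss = kl_divergence(aux, preds)
```

Minimising such a loss pushes the predictions toward the more confident auxiliary targets. The structural source regularization described in the abstract would be an additional term and is not shown here.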
6.
IEEE Trans Image Process ; 30: 6829-6842, 2021.
Article in English | MEDLINE | ID: mdl-34343090

ABSTRACT

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation. However, convolutional neural networks (CNNs) are inherently limited in modelling such dependencies due to the naive structure of their building modules (e.g., the local convolution kernel). While recent global aggregation methods are beneficial for long-range structure information modelling, they tend to oversmooth and bring noise to regions that contain fine details (e.g., boundaries and small objects), which matter greatly in the semantic segmentation task. To alleviate this problem, we propose to explore the local context so that the aggregated long-range relationships are distributed more accurately in local regions. In particular, we design a novel local distribution module which adaptively models the affinity map between global and local relationships for each pixel. Integrating existing global aggregation modules, we show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks, giving rise to the GALD networks. Despite its simplicity and versatility, our approach allows us to set a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, Camvid and COCO-stuff. Code and trained models are released at https://github.com/lxtGH/GALD-DGCNet to foster further research.

7.
IEEE Trans Pattern Anal Mach Intell ; 42(7): 1770-1782, 2020 Jul.
Article in English | MEDLINE | ID: mdl-30843803

ABSTRACT

Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data. This leads to poor scalability in a practical re-id deployment, due to the lack of exhaustive identity labelling of positive and negative image pairs for every camera pair. In this work, we present an unsupervised re-id deep learning approach. It is capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data end-to-end. We formulate an Unsupervised Tracklet Association Learning (UTAL) framework that jointly learns within-camera tracklet discrimination and cross-camera tracklet association in order to maximise the discovery of tracklet identity matching both within and across camera views. Extensive experiments demonstrate the superiority of the proposed model over the state-of-the-art unsupervised learning and domain adaptation person re-id methods on eight benchmarking datasets.

8.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 392-408, 2018 Feb.
Article in English | MEDLINE | ID: mdl-28207383

ABSTRACT

The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views against significant and unknown cross-view feature distortion. While a large number of distance metric/subspace learning models have been developed for re-id, the cross-view transformations they learn are view-generic and thus potentially less effective in quantifying the feature distortion inherent to each camera view. Learning view-specific feature transformations for re-id (i.e., view-specific re-id), an under-studied approach, offers an alternative for this problem. In this work, we formulate a novel view-specific person re-identification framework from the feature augmentation point of view, called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically, CRAFT performs cross-view adaptation by automatically measuring camera correlation from cross-view visual data distribution and adaptively conducting feature augmentation to transform the original features into a new adaptive space. Through our augmentation framework, view-generic learning algorithms can be readily generalized to learn and optimize view-specific sub-models whilst simultaneously modelling view-generic discrimination information. Therefore, our framework not only inherits the strength of view-generic model learning but also provides an effective way to take into account view-specific characteristics. Our CRAFT framework can be extended to jointly learn view-specific feature transformations for person re-id across a large network with more than two cameras, a largely under-investigated but realistic re-id setting. Additionally, we present a domain-generic deep person appearance representation which is designed specifically to be view-invariant, facilitating cross-view adaptation by CRAFT. We conducted extensive comparative experiments to validate the superiority and advantages of our proposed framework over state-of-the-art competitors on contemporary challenging person re-id datasets.

9.
IEEE Trans Image Process ; 27(5): 2286-2300, 2018 May.
Article in English | MEDLINE | ID: mdl-28816668

ABSTRACT

Existing person re-identification (re-id) methods typically assume that: 1) any probe person is guaranteed to appear in the gallery target population during deployment (i.e., closed-world) and 2) the probe set contains only a limited number of people (i.e., small search scale). Both assumptions are artificial and breached in real-world applications, since the probe population in target people search can be extremely vast in practice due to the ambiguity of the probe search space boundary. Therefore, it is unrealistic to assume that every probe person is a target person, and a large-scale search in person images is inherently demanded. In this paper, we introduce a new person re-id search setting, called large scale open-world (LSOW) re-id, characterized by a very large set of probe images and an open person population in search, and thus closer to practical deployments. Under LSOW, the under-studied problem of person re-id efficiency is essential in addition to the commonly studied re-id accuracy. We, therefore, develop a novel fast person re-id method, called Cross-view Identity Correlation and vErification (X-ICE) hashing, for joint learning of cross-view identity representation binarisation and discrimination in a unified manner. Extensive comparative experiments on three large-scale benchmarks have been conducted to validate the superiority and advantages of the proposed X-ICE method over a wide range of the state-of-the-art hashing models, person re-id methods, and their combinations.
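
The efficiency argument for hashing in the abstract above rests on comparing short binary codes instead of real-valued features. The following is a minimal sketch of open-world search over binary codes, assuming codes are stored as Python integers and that a probe is rejected when no gallery code lies within a Hamming-distance threshold. The names `hamming` and `search`, the 8-bit codes, and the threshold are hypothetical; how X-ICE learns its codes is not shown.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def search(probe: int, gallery: dict, max_dist: int) -> list:
    """Return gallery IDs within max_dist bits of the probe, closest first.

    An empty list means the probe is treated as a non-target person,
    which is the open-world case.
    """
    scored = sorted((hamming(probe, code), pid) for pid, code in gallery.items())
    return [pid for dist, pid in scored if dist <= max_dist]


# Toy gallery of target people with made-up 8-bit codes.
gallery = {"id_a": 0b10110010, "id_b": 0b10110000, "id_c": 0b01001101}

matches = search(0b10110011, gallery, max_dist=2)   # ['id_a', 'id_b']
rejected = search(0b00000000, gallery, max_dist=2)  # []
```

Because each comparison is an XOR followed by a bit count, the cost per gallery entry is small, which is the property that makes hashing attractive when the probe population is large.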

10.
Int J Comput Vis ; 126(12): 1288-1310, 2018.
Article in English | MEDLINE | ID: mdl-30930537

ABSTRACT

Most existing person re-identification (re-id) methods are unsuitable for real-world deployment for two reasons: unscalability to large population sizes and inadaptability over time. In this work, we present a unified solution to address both problems. Specifically, we propose to construct an identity regression space (IRS) based on embedding different training person identities (classes) and formulate re-id as a regression problem solved by identity regression in the IRS. The IRS approach is characterised by a closed-form solution with high learning efficiency and an inherent incremental learning capability with human-in-the-loop. Extensive experiments on four benchmarking datasets (VIPeR, CUHK01, CUHK03 and Market-1501) show that the IRS model not only outperforms state-of-the-art re-id methods, but is also more scalable to large re-id population sizes by rapidly updating the model and actively selecting informative samples with reduced human labelling effort.
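
The abstract above does not give the formula behind its closed-form solution. As a generic illustration of what a closed-form regression solution looks like, and not necessarily the exact IRS objective, regularised least squares (ridge regression) has the following form. The symbols are assumptions of this sketch: $X$ is the matrix of training features, $Y$ the matrix of target identity embeddings, $W$ the learned projection, and $\lambda > 0$ a regularisation weight.

```latex
W^{*}
  = \arg\min_{W} \; \lVert XW - Y \rVert_F^{2} + \lambda \lVert W \rVert_F^{2}
  = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} Y
```

A solution of this kind needs no iterative optimisation, which is consistent with the high learning efficiency the abstract reports.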

11.
IEEE Trans Pattern Anal Mach Intell ; 38(12): 2501-2514, 2016 Dec.
Article in English | MEDLINE | ID: mdl-26829777

ABSTRACT

Current person re-identification (ReID) methods typically rely on single-frame imagery features, whilst ignoring space-time information from image sequences often available in the practical surveillance scenarios. Single-frame (single-shot) based visual appearance matching is inherently limited for person ReID in public spaces due to the challenging visual ambiguity and uncertainty arising from non-overlapping camera views where viewing condition changes can cause significant people appearance variations. In this work, we present a novel model to automatically select the most discriminative video fragments from noisy/incomplete image sequences of people from which reliable space-time and appearance features can be computed, whilst simultaneously learning a video ranking function for person ReID. Using the PRID 2011, iLIDS-VID, and HDA+ image sequence datasets, we extensively conducted comparative evaluations to demonstrate the advantages of the proposed model over contemporary gait recognition, holistic image sequence matching and state-of-the-art single-/multi-shot ReID methods.


Subjects
Algorithms; Biometric Identification/methods; Discriminant Analysis; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Photography/methods; Video Recording/methods; Humans
12.
IEEE Trans Neural Netw Learn Syst ; 27(6): 1345-1357, 2016 Jun.
Article in English | MEDLINE | ID: mdl-25622327

ABSTRACT

While clustering is usually an unsupervised operation, there are circumstances where we have access to a prior belief that pairs of samples should (or should not) be assigned to the same cluster. Constrained clustering aims to exploit this prior belief as a constraint (or weak supervision) to influence the cluster formation so as to obtain a data structure more closely resembling human perception. Two important issues remain open: 1) how to exploit sparse constraints effectively and 2) how to handle ill-conditioned/noisy constraints generated by imperfect oracles. In this paper, we present a novel pairwise similarity measure framework to address the above issues. Specifically, in contrast to existing constrained clustering approaches that blindly rely on all features for constraint propagation, our approach searches for neighborhoods driven by discriminative feature selection for more effective constraint diffusion. Crucially, we formulate a novel approach to handling the noisy constraint problem, which has been unrealistically ignored in the constrained clustering literature. Extensive comparative results show that our method is superior to the state-of-the-art constrained clustering approaches and can generally benefit existing pairwise similarity-based data clustering algorithms, such as spectral clustering and affinity propagation.
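
The pairwise constraints described in the abstract above (pairs that should or should not share a cluster) are commonly called must-link and cannot-link constraints. The following is a minimal sketch of the simplest way such constraints can influence a similarity matrix before it is passed to a similarity-based clustering algorithm. It is a naive baseline written for illustration, not the paper's method: it neither propagates constraints to neighbouring samples nor handles noisy constraints. The function name `apply_constraints` and the toy values are hypothetical.

```python
def apply_constraints(similarity, must_link, cannot_link):
    """Return a copy of a symmetric similarity matrix with constraints hard-coded.

    Must-link pairs are set to the maximum similarity (1.0) and
    cannot-link pairs to the minimum (0.0); all other entries are unchanged.
    """
    adjusted = [row[:] for row in similarity]
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 1.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    return adjusted


# Toy similarity matrix for three samples.
similarity = [
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.4],
    [0.7, 0.4, 1.0],
]

# Prior belief: samples 0 and 1 belong together, samples 0 and 2 do not.
adjusted = apply_constraints(similarity, must_link=[(0, 1)], cannot_link=[(0, 2)])
```

With sparse constraints, only a few entries change under this baseline, which is why approaches such as the one in the abstract diffuse constraint information to nearby samples.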
