Results 1 - 20 of 43
1.
Nanotechnology ; 35(22)2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38387099

ABSTRACT

Two-dimensional (2D) materials are increasingly used in biomedical and cosmetic products, yet their safe use in the human body and the environment requires a comprehensive understanding of their nanotoxicity. In this work, the effect of pristine graphene and graphene oxide (GO) on the adsorption and conformational changes of skin keratin was investigated using molecular dynamics simulations. It is found that skin keratin can be adsorbed through various noncovalent driving forces, such as van der Waals (vdW) interactions and electrostatics. In the case of GO, the oxygen-containing groups prevent tighter contact between skin keratin and the graphene basal plane through steric effects and electrostatic repulsion. On the other hand, electrostatic attraction and hydrogen bonding enhance the binding affinity to positively charged residues such as lysine and arginine. The secondary structure of skin keratin is better preserved in the GO system, suggesting that GO has good biocompatibility. The charged groups on the GO surface act as hydrogen bond acceptors, similar to the natural receptors of keratin in the physiological environment. This work contributes to a better understanding of the nanotoxicity of cutting-edge 2D materials on human health, thereby advancing their potential biological applications.
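As a hedged illustration of the kind of post-processing such simulations involve (not the authors' actual analysis), the sketch below counts keratin-GO heavy-atom contacts along a trajectory with MDAnalysis; the file names, selection strings, and the 4.5 Å cutoff are placeholders.

```python
# Hedged sketch: counting keratin-GO heavy-atom contacts along an MD trajectory.
# File names, atom selections, and the cutoff are illustrative assumptions,
# not the authors' actual setup.
import MDAnalysis as mda
from MDAnalysis.analysis.distances import distance_array

u = mda.Universe("keratin_go.tpr", "keratin_go.xtc")      # hypothetical topology/trajectory
keratin = u.select_atoms("protein and not name H*")       # keratin heavy atoms
go_sheet = u.select_atoms("resname GO and not name H*")   # GO heavy atoms (assumed residue name)

contacts_per_frame = []
for ts in u.trajectory:
    d = distance_array(keratin.positions, go_sheet.positions, box=ts.dimensions)
    contacts_per_frame.append((d < 4.5).sum())             # heavy-atom pairs within 4.5 Angstrom

print("mean contacts:", sum(contacts_per_frame) / len(contacts_per_frame))
```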


Subject(s)
Graphite, Nanostructures, Humans, Graphite/chemistry, Keratins, Molecular Dynamics Simulation, Nanostructures/toxicity, Nanostructures/chemistry
2.
Article in English | MEDLINE | ID: mdl-38865228

ABSTRACT

This work makes the first research effort to leverage point cloud sequence-based Self-supervised 3-D Action Feature Learning (S3AFL) under cross-modality weak supervision from text. We intend to fill the large performance gap between point cloud sequence-based and 3-D skeleton-based manners. The key intuition derives from the observation that skeleton-based manners actually hold high-level knowledge of the human pose, which leads to attention on the body's joint-aware local parts. Inspired by this, we propose to introduce the text's weak supervision of high-level semantics into the point cloud sequence-based paradigm. With RGB-point cloud pair sequences acquired via an RGB-D camera, a text sequence is first generated from the RGB component using a pretrained image captioning model, as auxiliary weak supervision. Then, S3AFL runs in a cross- and intra-modality contrastive learning (CL) way. To resist missing and redundant semantics in the text, feature learning is conducted in a multistage way with semantic refinement. Essentially, text is required only for training. To strengthen the feature's representation power on fine-grained actions, a multirank max-pooling (MR-MP) method is also proposed for the point set network to better maintain discriminative clues. Experiments verify that the text's weak supervision can improve performance by up to 10.8%, 10.4%, and 8.0% on NTU RGB+D 60, NTU RGB+D 120, and N-UCLA, respectively. The performance gap between point cloud sequence-based and skeleton-based manners has been remarkably narrowed. The idea of transferring the text's weak supervision to S3AFL can also be applied to skeleton-based manners, showing strong generality. The source code is available at https://github.com/tangent-T/W3AMT.
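The abstract does not give the exact form of MR-MP; a minimal sketch of one plausible reading, pooling the top-k responses per channel over the point set instead of only the single maximum, is shown below. The ranks and tensor shapes are assumptions.

```python
# Hedged sketch of multirank max-pooling over a point set feature map.
# The ranks (1, 2, 4) and tensor layout are illustrative assumptions.
import torch

def multirank_max_pool(feats: torch.Tensor, ranks=(1, 2, 4)) -> torch.Tensor:
    """feats: (B, C, N) per-point features; returns (B, C * len(ranks))."""
    pooled = []
    for k in ranks:
        topk = feats.topk(k, dim=-1).values   # (B, C, k) strongest responses per channel
        pooled.append(topk.mean(dim=-1))      # average the top-k to keep more clues than plain max
    return torch.cat(pooled, dim=-1)

x = torch.randn(2, 256, 1024)                 # batch of 2, 256-dim features for 1024 points
print(multirank_max_pool(x).shape)            # torch.Size([2, 768])
```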

3.
IEEE J Biomed Health Inform ; 28(3): 1516-1527, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38206781

ABSTRACT

Breast lesion segmentation in ultrasound images is essential for computer-aided breast-cancer diagnosis. To improve segmentation performance, most approaches design sophisticated deep-learning models by mining the patterns of foreground lesions and normal backgrounds simultaneously or by unilaterally enhancing foreground lesions via various focal losses. However, the potential of normal backgrounds is underutilized; compacting the feature representation of all normal backgrounds could reduce false positives. From a novel viewpoint of bilateral enhancement, we propose a negative-positive cross-attention network to concentrate on normal backgrounds and foreground lesions, respectively. Derived from the complementing opposites of bipolarity in TaiChi, the network is denoted TaiChiNet, and it consists of a negative normal-background path and a positive foreground-lesion path. To transmit information across the two paths, a cross-attention module, a complementary MLP-head, and a complementary loss are built for deep-layer features, shallow-layer features, and mutual-learning supervision, respectively. To the best of our knowledge, this is the first work to formulate breast lesion segmentation as a mutual supervision task from the foreground-lesion and normal-background views. Experimental results demonstrate the effectiveness of TaiChiNet on two breast lesion segmentation datasets with a lightweight architecture. Furthermore, extensive experiments on thyroid nodule segmentation and retinal optic cup/disc segmentation datasets indicate the application potential of TaiChiNet.
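The exact complementary loss is not specified in the abstract; the sketch below shows one natural formulation under the assumption that the positive path predicts lesion probability, the negative path predicts background probability, and the two maps are pushed to sum to one at every pixel.

```python
# Hedged sketch of a complementary (mutual-supervision) loss between a
# foreground-lesion path and a normal-background path. The exact form used
# in TaiChiNet is not given in the abstract; this is one plausible reading.
import torch
import torch.nn.functional as F

def complementary_loss(lesion_logits, background_logits, lesion_mask):
    """Each path is supervised by its own view of the ground truth, and the two
    probability maps are additionally pushed to be complementary (sum to 1)."""
    p_lesion = torch.sigmoid(lesion_logits)
    p_background = torch.sigmoid(background_logits)
    loss_pos = F.binary_cross_entropy(p_lesion, lesion_mask)
    loss_neg = F.binary_cross_entropy(p_background, 1.0 - lesion_mask)
    loss_comp = F.mse_loss(p_lesion + p_background, torch.ones_like(p_lesion))
    return loss_pos + loss_neg + loss_comp
```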


Subject(s)
Breast Neoplasms, Optic Disk, Humans, Female, Ultrasonography, Breast Neoplasms/diagnostic imaging, Diagnosis, Computer-Assisted, Knowledge, Image Processing, Computer-Assisted
4.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2551-2566, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35503823

ABSTRACT

Existing multi-view classification algorithms focus on promoting accuracy by exploiting different views, typically integrating them into common representations for follow-up tasks. Although effective, it is also crucial to ensure the reliability of both the multi-view integration and the final decision, especially for noisy, corrupted, and out-of-distribution data. Dynamically assessing the trustworthiness of each view for different samples could provide reliable integration, and this can be achieved through uncertainty estimation. With this in mind, we propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC), which provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level. The proposed TMC can promote classification reliability by considering the evidence from each view. Specifically, we introduce a variational Dirichlet distribution to characterize the distribution of class probabilities, parameterized with evidence from different views and integrated with the Dempster-Shafer theory. The unified learning framework induces accurate uncertainty and accordingly endows the model with both reliability and robustness against possible noise or corruption. Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness, and trustworthiness.
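A minimal sketch of the evidence-level fusion described here, following a reduced Dempster-Shafer combination over Dirichlet evidence from two views, is given below; the per-view evidence values are illustrative.

```python
# Hedged sketch of combining per-view Dirichlet evidence with a reduced
# Dempster-Shafer rule, in the spirit of TMC. Evidence vectors are illustrative.
import torch

def combine_two_views(e1: torch.Tensor, e2: torch.Tensor) -> torch.Tensor:
    """e1, e2: non-negative evidence vectors of length K for one sample.
    Returns the fused evidence vector."""
    K = e1.numel()
    alpha1, alpha2 = e1 + 1, e2 + 1
    S1, S2 = alpha1.sum(), alpha2.sum()
    b1, b2 = e1 / S1, e2 / S2          # belief masses
    u1, u2 = K / S1, K / S2            # uncertainty masses
    conflict = (b1.unsqueeze(1) * b2.unsqueeze(0)).sum() - (b1 * b2).sum()
    scale = 1.0 / (1.0 - conflict)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    S = K / u                          # recover Dirichlet strength from uncertainty
    return b * S                       # fused evidence e_k = b_k * S

e_view1 = torch.tensor([4.0, 1.0, 0.5])   # illustrative evidence from view 1
e_view2 = torch.tensor([3.0, 0.2, 0.1])   # illustrative evidence from view 2
print(combine_two_views(e_view1, e_view2))
```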

5.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10443-10465, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37030852

ABSTRACT

Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video. Connecting computer vision and natural language, TSGV has drawn significant attention from researchers in both communities. This survey attempts to provide a summary of the fundamental concepts in TSGV, the current research status, and future research directions. As background, we present a common structure of functional components in TSGV in a tutorial style: from feature extraction of the raw video and the language query to answer prediction of the target moment. We then review techniques for multimodal understanding and interaction, which are the key focus of TSGV for effective alignment between the two modalities. We construct a taxonomy of TSGV techniques and elaborate the methods in different categories along with their strengths and weaknesses. Lastly, we discuss issues with current TSGV research and share our insights about promising research directions.


Subject(s)
Algorithms, Language
6.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13586-13598, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37428671

ABSTRACT

Time series analysis is essential to many far-reaching applications of data science and statistics, including economic and financial forecasting, surveillance, and automated business processing. Although the Transformer has been greatly successful in computer vision and natural language processing, its potential as a general backbone for analyzing ubiquitous time series data has not yet been fully realized. Prior Transformer variants for time series rely heavily on task-dependent designs and pre-assumed "pattern biases", revealing their insufficiency in representing the nuanced seasonal, cyclic, and outlier patterns that are highly prevalent in time series. As a consequence, they cannot generalize well to different time series analysis tasks. To tackle these challenges, we propose DifFormer, an effective and efficient Transformer architecture that can serve as a workhorse for a variety of time-series analysis tasks. DifFormer incorporates a novel multi-resolutional differencing mechanism, which progressively and adaptively makes nuanced yet meaningful changes prominent, while periodic or cyclic patterns can be dynamically captured with flexible lagging and dynamic ranging operations. Extensive experiments demonstrate that DifFormer significantly outperforms state-of-the-art models on three essential time-series analysis tasks: classification, regression, and forecasting. In addition to its superior performance, DifFormer also excels in efficiency, with linear time/memory complexity and empirically lower time consumption.
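The abstract outlines a multi-resolutional differencing mechanism with flexible lagging; a minimal sketch of differencing a series at several lags is shown below. The lags and how DifFormer mixes the resulting streams are assumptions.

```python
# Hedged sketch of multi-resolution differencing over a batch of time series.
# The specific lags and the way DifFormer combines the streams are not given
# in the abstract; the lags here are illustrative.
import torch

def multi_resolution_differencing(x: torch.Tensor, lags=(1, 4, 16)):
    """x: (B, T, C) series; returns one differenced stream per lag, zero-padded
    at the front so each stream keeps the original length T."""
    streams = []
    for lag in lags:
        diff = x[:, lag:, :] - x[:, :-lag, :]
        pad = torch.zeros_like(x[:, :lag, :])
        streams.append(torch.cat([pad, diff], dim=1))
    return streams

series = torch.randn(8, 96, 7)                     # e.g., 96 steps of 7 variables
for lag, s in zip((1, 4, 16), multi_resolution_differencing(series)):
    print(lag, s.shape)
```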

7.
Article in English | MEDLINE | ID: mdl-37022080

ABSTRACT

Medical image segmentation is a vital stage in medical image analysis. Numerous deep-learning methods have emerged to improve the performance of 2-D medical image segmentation, owing to the fast growth of convolutional neural networks. Generally, the manually defined ground truth is used directly to supervise models in the training phase. However, direct supervision of the ground truth often results in ambiguity and distractors, as complex challenges appear simultaneously. To alleviate this issue, we propose a gradually recurrent network with curriculum learning, which is supervised by gradual information of the ground truth. The whole model is composed of two independent networks. One is the segmentation network, denoted GREnet, which formulates 2-D medical image segmentation as a temporal task supervised by pixel-level gradual curricula in the training phase. The other is a curriculum-mining network. To a certain degree, the curriculum-mining network provides curricula of increasing difficulty in the ground truth of the training set by progressively uncovering hard-to-segment pixels in a data-driven manner. Given that segmentation is a pixel-level dense-prediction challenge, to the best of our knowledge, this is the first work to formulate 2-D medical image segmentation as a temporal task with pixel-level curriculum learning. In GREnet, the naive UNet is adopted as the backbone, while ConvLSTM is used to establish the temporal link between gradual curricula. In the curriculum-mining network, UNet++ supplemented by a transformer is designed to deliver curricula through the outputs of the modified UNet++ at different layers. Experimental results demonstrate the effectiveness of GREnet on seven datasets, i.e., three lesion segmentation datasets in dermoscopic images, an optic disc and cup segmentation dataset and a blood vessel segmentation dataset in retinal images, a breast lesion segmentation dataset in ultrasound images, and a lung segmentation dataset in computed tomography (CT).
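How the curricula are scheduled is not fully specified in the abstract; the sketch below illustrates one plausible reading, revealing harder pixels step by step according to a per-pixel difficulty map. The difficulty source and the number of steps are assumptions.

```python
# Hedged sketch of building pixel-level gradual curricula: easy pixels are
# supervised first, harder pixels are uncovered in later steps. The difficulty
# map here is synthetic; in GREnet it would come from the curriculum-mining network.
import torch

def gradual_curricula(gt_mask: torch.Tensor, difficulty: torch.Tensor, steps: int = 4):
    """gt_mask: (H, W) binary ground truth; difficulty: (H, W) in [0, 1].
    Returns `steps` targets; pixels above the step's difficulty threshold are
    marked -1 (ignored by the loss) until a later step reveals them."""
    curricula = []
    for s in range(1, steps + 1):
        threshold = s / steps
        target = gt_mask.clone().float()
        target[difficulty > threshold] = -1.0
        curricula.append(target)
    return curricula

gt = (torch.rand(64, 64) > 0.7).float()
diff = torch.rand(64, 64)
print([float((c >= 0).float().mean()) for c in gradual_curricula(gt, diff)])  # supervised fraction grows
```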

8.
Article in English | MEDLINE | ID: mdl-37729565

ABSTRACT

This work makes the first research effort to address unsupervised 3-D action representation learning with point cloud sequences, which differs from existing unsupervised methods that rely on 3-D skeleton information. Our proposition is built on the state-of-the-art 3-D action descriptor, the 3-D dynamic voxel (3DV), with contrastive learning (CL). 3DV can compress a point cloud sequence into a compact point cloud of 3-D motion information. Spatiotemporal data augmentations are conducted on it to drive CL. However, we find that existing CL methods (e.g., SimCLR or MoCo v2) often suffer from high pattern variance toward the augmented 3DV samples from the same action instance; that is, the augmented 3DV samples are still of high feature complementarity after CL, while the complementary discriminative clues within them have not been well exploited. To address this, a feature augmentation adapted CL (FACL) approach is proposed, which facilitates 3-D action representation by considering the features from all augmented 3DV samples jointly, in the spirit of feature augmentation. FACL runs in a global-local way: one branch learns a global feature that involves the discriminative clues from the raw and augmented 3DV samples, and the other focuses on enhancing the discriminative power of the local feature learned from each augmented 3DV sample. The global and local features are fused to characterize a 3-D action jointly via concatenation. To fit FACL, a series of spatiotemporal data augmentation approaches is also studied on 3DV. Wide-ranging experiments verify the superiority of our unsupervised learning method for 3-D action feature learning. It outperforms the state-of-the-art skeleton-based counterparts by 6.4% and 3.6% under the cross-setup and cross-subject test settings on NTU RGB+D 120, respectively. The source code is available at https://github.com/tangent-T/FACL.

9.
Med Image Anal ; 83: 102664, 2023 01.
Article in English | MEDLINE | ID: mdl-36332357

ABSTRACT

Pneumonia can be difficult to diagnose since its symptoms are highly variable, and the radiographic signs are often very similar to those seen in other illnesses such as a cold or influenza. Deep neural networks have shown promising performance in automated pneumonia diagnosis using chest X-ray radiography, allowing mass screening and early intervention to reduce severe cases and the death toll. However, they usually require many well-labelled chest X-ray images for training to achieve high diagnostic accuracy. To reduce the need for training data and annotation resources, we propose a novel method called Contrastive Domain Adaptation with Consistency Match (CDACM). It transfers the knowledge from different but relevant datasets to the unlabelled small-size target dataset and improves the semantic quality of the learnt representations. Specifically, we design a conditional domain adversarial network to exploit discriminative information conveyed in the predictions to mitigate the domain gap between the source and target datasets. Furthermore, due to the small scale of the target dataset, we construct a feature cloud for each target sample and leverage contrastive learning to extract more discriminative features. Lastly, we propose adaptive feature cloud expansion to push the decision boundary to a low-density area. Unlike most existing transfer learning methods that aim only to mitigate the domain gap, our method simultaneously considers the domain gap and the data deficiency problem of the target dataset. The conditional domain adaptation and the feature cloud generation of our method are learned jointly to extract discriminative features in an end-to-end manner. Besides, the adaptive feature cloud expansion improves the model's generalisation ability in the target domain. Extensive experiments on pneumonia and COVID-19 diagnosis tasks demonstrate that our method outperforms several state-of-the-art unsupervised domain adaptation approaches, which verifies the effectiveness of CDACM for automated pneumonia diagnosis using chest X-ray imaging.
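The construction of the feature cloud is not detailed in the abstract; one plausible minimal sketch, expanding each target feature into a cloud of Gaussian-perturbed copies used as extra positives in a contrastive loss, is given below. The noise scale, cloud size, and loss form are assumptions.

```python
# Hedged sketch of a feature cloud: each target feature is expanded into several
# Gaussian-perturbed copies that act as extra positives in an InfoNCE-style loss.
# Noise scale, cloud size, and the loss form are illustrative assumptions.
import torch
import torch.nn.functional as F

def feature_cloud_contrastive(feats: torch.Tensor, cloud_size: int = 8,
                              sigma: float = 0.1, temperature: float = 0.1):
    """feats: (B, D) target-sample features."""
    B, D = feats.shape
    cloud = feats.unsqueeze(1) + sigma * torch.randn(B, cloud_size, D)          # perturbed copies
    anchors = F.normalize(feats, dim=-1)
    cloud = F.normalize(cloud, dim=-1)
    pos = torch.einsum("bd,bkd->bk", anchors, cloud).mean(dim=1) / temperature  # own cloud = positive
    neg = anchors @ anchors.t() / temperature                                   # other samples = negatives
    neg = neg.masked_fill(torch.eye(B, dtype=torch.bool), float("-inf"))
    denom = torch.logsumexp(torch.cat([pos.unsqueeze(1), neg], dim=1), dim=1)
    return (denom - pos).mean()

print(feature_cloud_contrastive(torch.randn(32, 128)))
```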


Subject(s)
COVID-19 Testing, COVID-19, Humans
10.
Article in English | MEDLINE | ID: mdl-35749327

ABSTRACT

Current one-stage methods for visual grounding encode the language query as one holistic sentence embedding before fusing it with visual features for target localization. Such a formulation provides insufficient ability to model the query at the word level, and therefore is prone to neglecting words that may not be the most important ones for the sentence but are critical for the referred object. In this article, we propose Word2Pix: a one-stage visual grounding network based on the encoder-decoder transformer architecture that enables learning of textual-to-visual feature correspondence via word-to-pixel attention. Each word in the query sentence is given an equal opportunity when attending to visual pixels through multiple stacks of transformer decoder layers. In this way, the decoder can learn to model the language query and fuse language with the visual features for target prediction simultaneously. We conduct experiments on the RefCOCO, RefCOCO+, and RefCOCOg datasets, and the proposed Word2Pix outperforms existing one-stage methods by a notable margin. The results also show that Word2Pix surpasses two-stage visual grounding models while keeping the merits of the one-stage paradigm, namely end-to-end training and fast inference speed. Code is available at https://github.com/azurerain7/Word2Pix.
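A minimal sketch of the word-to-pixel attention idea, with word embeddings as decoder queries attending to flattened visual features, is shown below; the dimensions and layer counts are assumptions, not the paper's configuration.

```python
# Hedged sketch of word-to-pixel attention: word embeddings act as queries of a
# transformer decoder over flattened visual features. Dimensions and depth are
# illustrative, not the Word2Pix configuration.
import torch
import torch.nn as nn

d_model = 256
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
box_head = nn.Linear(d_model, 4)                    # regress a box from the pooled word features

words = torch.randn(2, 12, d_model)                 # 12 word embeddings per query sentence
pixels = torch.randn(2, 20 * 20, d_model)           # flattened 20x20 visual feature map

word_feats = decoder(tgt=words, memory=pixels)      # every word attends to every pixel
box = box_head(word_feats.mean(dim=1))              # fuse words, then predict the target box
print(box.shape)                                    # torch.Size([2, 4])
```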

11.
Article in English | MEDLINE | ID: mdl-35560072

ABSTRACT

Edge devices demand low energy consumption, low cost, and a small form factor. To efficiently deploy convolutional neural network (CNN) models on edge devices, energy-aware model compression becomes extremely important. However, existing work has not studied this problem well because it does not consider the diversity of dataflow types in hardware architectures. In this article, we propose EDCompress (EDC), an energy-aware model compression method for various dataflows. It can effectively reduce the energy consumption of various edge devices with different dataflow types. Considering the very nature of model compression procedures, we recast the optimization process as a multistep problem and solve it with reinforcement learning algorithms. We also propose a multidimensional multistep (MDMS) optimization method, which shows higher compressing capability than the traditional multistep method. Experiments show that EDC improves energy efficiency by 20x, 17x, and 26x in the VGG-16, MobileNet, and LeNet-5 networks, respectively, with negligible loss of accuracy. EDC can also indicate the optimal dataflow type for a specific neural network in terms of energy consumption, which can guide the deployment of CNNs on hardware.

12.
IEEE Trans Neural Netw Learn Syst ; 33(3): 1079-1092, 2022 Mar.
Article in English | MEDLINE | ID: mdl-33296312

ABSTRACT

The core component of most anomaly detectors is a self-supervised model, tasked with modeling patterns included in training samples and detecting unexpected patterns as anomalies in testing samples. To cope with normal patterns, this model is typically trained with reconstruction constraints. However, the model risks overfitting to the training samples and being sensitive to hard normal patterns in the inference phase, which results in irregular responses at normal frames. To address this problem, we formulate anomaly detection as a mutual supervision problem. Owing to collaborative training, the complementary information of mutual learning can alleviate the aforementioned problem. Based on this motivation, a SIamese generative network (SIGnet), including two subnetworks with the same architecture, is proposed to simultaneously model the patterns of forward and backward frames. During training, in addition to traditional constraints on improving reconstruction performance, a bidirectional consistency loss based on the forward and backward views is designed as a regularization term to improve the generalization ability of the model. Moreover, we introduce a consistency-based evaluation criterion to achieve stable scores at normal frames, which benefits detecting anomalies with fluctuating scores in the inference phase. Results on several challenging benchmark data sets demonstrate the effectiveness of our proposed method.
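The precise form of the bidirectional consistency loss is not given in the abstract; below is a minimal sketch assuming each subnetwork generates the same target frame from the forward and backward views, with their outputs pulled together as a regularizer.

```python
# Hedged sketch of a bidirectional consistency regularizer between two Siamese
# subnetworks that model forward and backward frame orders. The exact loss in
# SIGnet is not specified in the abstract; this is one plausible reading.
import torch
import torch.nn.functional as F

def sig_losses(pred_fwd, pred_bwd, target_frame, lam: float = 0.1):
    """pred_fwd / pred_bwd: frames generated from the forward / backward views;
    target_frame: the ground-truth frame both should reproduce."""
    recon = F.mse_loss(pred_fwd, target_frame) + F.mse_loss(pred_bwd, target_frame)
    consistency = F.mse_loss(pred_fwd, pred_bwd)   # the two views should agree
    return recon + lam * consistency
```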

13.
Article in English | MEDLINE | ID: mdl-35998171

ABSTRACT

Efficient neural network training is essential for in situ training of edge artificial intelligence (AI) and for carbon footprint reduction in general. Training neural networks on the edge is challenging because there is a large gap between the limited resources of edge devices and the resource requirements of current training methods. Existing training methods are based on the assumption that the underlying computing infrastructure has sufficient memory and energy supplies. These methods keep two copies of the model parameters, which usually exceeds the capacity of on-chip memory in processors. The data movement between off-chip and on-chip memory consumes large amounts of energy. We propose resource constrained training (RCT) to realize resource-efficient training for edge devices and servers. RCT keeps only a quantized model throughout training, so that the memory requirement for model parameters during training is reduced. It adjusts the per-layer bitwidth dynamically to save energy when a model can learn effectively with lower precision. We carry out experiments with representative models and tasks in image classification, natural language processing, and crowd counting applications. Experiments show that, on average, an 8-15-bit weight update is sufficient for achieving SOTA performance in these applications. RCT saves 63.5%-80% of the memory for model parameters and saves more energy for communication. Through experiments, we observe that the common practice on the first/last layer in model compression does not apply to efficient training. Also, interestingly, the more challenging a dataset is, the lower the bitwidth required for efficient training.
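As a hedged illustration of keeping only a quantized model, the sketch below applies simple uniform symmetric quantization of a weight tensor to a chosen bitwidth; the rounding scheme and the way RCT picks per-layer bitwidths are not described here and are assumptions.

```python
# Hedged sketch: uniform symmetric quantization of a weight tensor to k bits.
# RCT's actual quantizer and its dynamic per-layer bitwidth policy are not
# detailed in the abstract; this only illustrates the idea.
import torch

def quantize_weights(w: torch.Tensor, bits: int) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                    # e.g., 127 for 8 bits
    scale = w.abs().max() / qmax
    if scale == 0:
        return w
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

w = torch.randn(256, 256)
for bits in (4, 8, 12):
    err = (quantize_weights(w, bits) - w).abs().mean()
    print(f"{bits}-bit mean abs error: {err.item():.5f}")   # error shrinks as bitwidth grows
```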

14.
Nanomaterials (Basel) ; 12(7)2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35407299

ABSTRACT

Graphene-based nanocomposite films (NCFs) are in high demand due to their superior photoelectric and thermal properties, but their stability and mechanical properties remain a bottleneck. Herein, a facile approach was used to prepare nacre-mimetic NCFs through the non-covalent self-assembly of graphene oxide (GO) and biocompatible proteins. Various characterization techniques were employed to characterize the as-prepared NCFs and to track the interactions between GO and the proteins. The conformational changes of the various proteins induced by GO determined the film-forming ability of the NCFs, and the binding of bovine serum albumin (BSA)/hemoglobin (HB) on the GO surface was beneficial for improving the stability of the as-prepared NCFs. Compared with the GO film without any additive, the indentation hardness and equivalent elastic modulus were improved by 50.0% and 68.6% for the GO-BSA NCF, and by 100% and 87.5% for the GO-HB NCF. Our strategy should be facile and effective for fabricating well-designed bio-nanocomposites for universal functional applications.

15.
Curr Med Chem ; 29(4): 700-718, 2022.
Article in English | MEDLINE | ID: mdl-33992055

ABSTRACT

Type I enveloped viruses bind to cell receptors through surface glycoproteins to initiate infection, or undergo receptor-mediated endocytosis and initiate membrane fusion in the acidic environment of endocytic compartments, releasing genetic material into the cell. In the process of membrane fusion, the envelope protein exposes the fusion peptide, which is then inserted into the cell membrane or endosomal membrane. Further conformational changes ensue in which the type I envelope protein forms a typical six-helix bundle structure, shortening the distance between the viral and cell membranes so that fusion can occur. Entry inhibitors targeting viral envelope proteins, or host factors, are effective antiviral agents and have been widely studied. Some have been used clinically, such as T20 and Maraviroc for human immunodeficiency virus 1 (HIV-1) or Myrcludex B for hepatitis D virus (HDV). This review focuses on entry inhibitors that target the six-helix bundle core of highly pathogenic enveloped viruses with class I fusion proteins, including retroviruses, coronaviruses, influenza A viruses, paramyxoviruses, and filoviruses.


Subject(s)
HIV-1, Virus Internalization, Endocytosis, HIV-1/metabolism, Humans, Membrane Fusion, Viral Envelope Proteins/metabolism, Viral Envelope Proteins/pharmacology
16.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2402-2415, 2022 May.
Article in English | MEDLINE | ID: mdl-33180720

ABSTRACT

Although multi-view learning has made significant progress over the past few decades, it is still challenging due to the difficulty in modeling complex correlations among different views, especially under the context of view missing. To address the challenge, we propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets), which aims to fully and flexibly take advantage of multiple partial views. We first provide a formal definition of completeness and versatility for multi-view representation and then theoretically prove the versatility of the learned latent representations. For completeness, the task of learning the latent multi-view representation is specifically translated to a degradation process that mimics data transmission, such that the optimal tradeoff between consistency and complementarity across different views can be implicitly achieved. Equipped with an adversarial strategy, our model stably imputes missing views, encoding information from all views of each sample into the latent representation to further enhance completeness. Furthermore, a nonparametric classification loss is introduced to produce structured representations and prevent overfitting, which endows the algorithm with promising generalization under view-missing cases. Extensive experimental results validate the effectiveness of our algorithm over existing state-of-the-art methods for classification, representation learning, and data imputation.
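A minimal sketch of the completeness idea, where a per-sample latent code is optimized so that view-specific decoders reproduce only the observed views, is given below; the network sizes, optimization loop, and missing-view masks are illustrative assumptions, and the adversarial and classification terms are omitted.

```python
# Hedged sketch of the degradation/completeness idea in CPM-Nets: a per-sample
# latent code is learned so that view-specific decoders reproduce only the
# observed views. Sizes and the training loop are illustrative assumptions.
import torch
import torch.nn as nn

n_samples, latent_dim = 100, 32
view_dims = [64, 48]                                    # two views with different dimensions
decoders = nn.ModuleList(nn.Linear(latent_dim, d) for d in view_dims)
h = nn.Parameter(torch.randn(n_samples, latent_dim))   # learnable latent representation

views = [torch.randn(n_samples, d) for d in view_dims]
observed = [torch.rand(n_samples) > 0.3 for _ in view_dims]  # missing-view masks

opt = torch.optim.Adam([h, *decoders.parameters()], lr=1e-2)
for _ in range(200):
    loss = 0.0
    for x, mask, dec in zip(views, observed, decoders):
        recon = dec(h[mask])                            # degrade the latent code into this view
        loss = loss + ((recon - x[mask]) ** 2).mean()   # only observed views are supervised
    opt.zero_grad(); loss.backward(); opt.step()
```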

17.
IEEE Trans Cybern ; 52(3): 1736-1749, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32520713

ABSTRACT

Face verification can be regarded as a two-class fine-grained visual-recognition problem. Enhancing the feature's discriminative power is one of the key problems in improving its performance. Metric-learning technology is often applied to address this need, and achieving a good tradeoff between underfitting and overfitting plays a vital role in metric learning. Hence, we propose a novel ensemble cascade metric-learning (ECML) mechanism. In particular, hierarchical metric learning is executed in a cascade way to alleviate underfitting. Meanwhile, at each learning level, the features are split into nonoverlapping groups. Then, metric learning is executed among the feature groups in an ensemble manner to resist overfitting. Considering the feature distribution characteristics of faces, a robust Mahalanobis metric-learning method (RMML) with a closed-form solution is additionally proposed. It avoids the computation failure issue on an inverse matrix faced by some well-known metric-learning approaches (e.g., KISSME). Embedding RMML into the proposed ECML mechanism, our metric-learning paradigm (EC-RMML) can run in a one-pass learning manner. The experimental results demonstrate that EC-RMML is superior to state-of-the-art metric-learning methods for face verification. The proposed ECML mechanism is also applicable to other metric-learning approaches.
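RMML's closed form is not reproduced in the abstract; the sketch below shows the related KISSME-style closed-form Mahalanobis metric, with a small ridge term standing in (as an assumption) for robustness against ill-conditioned inverses, which is the failure mode RMML is said to avoid.

```python
# Hedged sketch of a KISSME-style closed-form Mahalanobis metric learned from
# pairwise difference vectors. The ridge term is an illustrative stand-in for
# RMML's robustness to inverse-computation failure, not the method itself.
import numpy as np

def closed_form_metric(diff_similar, diff_dissimilar, ridge: float = 1e-3):
    """diff_*: (n_pairs, d) arrays of feature differences for similar / dissimilar pairs."""
    d = diff_similar.shape[1]
    cov_s = diff_similar.T @ diff_similar / len(diff_similar) + ridge * np.eye(d)
    cov_d = diff_dissimilar.T @ diff_dissimilar / len(diff_dissimilar) + ridge * np.eye(d)
    return np.linalg.inv(cov_s) - np.linalg.inv(cov_d)   # Mahalanobis matrix M

def distance(M, x, y):
    diff = x - y
    return float(diff @ M @ diff)

rng = np.random.default_rng(0)
M = closed_form_metric(rng.normal(0, 0.5, (500, 16)), rng.normal(0, 2.0, (500, 16)))
print(distance(M, rng.normal(size=16), rng.normal(size=16)))
```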


Subject(s)
Algorithms, Pattern Recognition, Automated, Face, Learning, Machine Learning, Pattern Recognition, Automated/methods
18.
IEEE Trans Neural Netw Learn Syst ; 33(2): 798-810, 2022 02.
Article in English | MEDLINE | ID: mdl-33090960

ABSTRACT

Cross-modal retrieval (CMR) enables a flexible retrieval experience across different modalities (e.g., texts versus images), which maximally benefits us from the abundance of multimedia data. Existing deep CMR approaches commonly require a large amount of labeled data for training to achieve high performance. However, it is time-consuming and expensive to annotate multimedia data manually. Thus, how to transfer valuable knowledge from existing annotated data to new data, especially from known categories to new categories, becomes attractive for real-world applications. To this end, we propose a deep multimodal transfer learning (DMTL) approach that transfers knowledge from previously labeled categories (source domain) to improve retrieval performance on unlabeled new categories (target domain). Specifically, we employ a joint learning paradigm to transfer knowledge by assigning a pseudolabel to each target sample. During training, the pseudolabel is iteratively updated and passed through our model in a self-supervised manner. At the same time, to reduce the domain discrepancy of different modalities, we construct multiple modality-specific neural networks to learn a shared semantic space for different modalities by enforcing the compactness of homoinstance samples and the scatter of heteroinstance samples. Our method is remarkably different from most existing transfer learning approaches. To be specific, previous works usually assume that the source domain and the target domain have the same label set. In contrast, our method considers a more challenging multimodal learning situation where the label sets of the two domains are different or even disjoint. Experimental studies on four widely used benchmarks validate the effectiveness of the proposed method in multimodal transfer learning and demonstrate its superior performance in CMR compared with 11 state-of-the-art methods.
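The pseudolabel assignment rule is not spelled out in the abstract; a minimal sketch of one common choice, assigning each target sample to its nearest class prototype in a shared semantic space and refreshing the prototypes each round, is shown below. This is a generic illustration, not DMTL's rule.

```python
# Hedged sketch of iterative pseudolabeling in a shared semantic space: target
# samples are labeled by their nearest class prototype, and prototypes are then
# refreshed from those assignments. Generic illustration, not DMTL's exact rule.
import torch
import torch.nn.functional as F

def assign_pseudolabels(target_feats, prototypes):
    """target_feats: (N, D), prototypes: (C, D); returns (N,) pseudolabels."""
    sims = F.normalize(target_feats, dim=-1) @ F.normalize(prototypes, dim=-1).t()
    return sims.argmax(dim=1)

def refresh_prototypes(target_feats, pseudolabels, num_classes):
    return torch.stack([
        target_feats[pseudolabels == c].mean(dim=0) if (pseudolabels == c).any()
        else torch.zeros(target_feats.size(1))
        for c in range(num_classes)
    ])

feats = torch.randn(500, 128)       # features of unlabeled target samples
protos = torch.randn(10, 128)       # initial prototypes (assumed to come from the source model)
for _ in range(5):                  # iterative self-supervised refinement
    labels = assign_pseudolabels(feats, protos)
    protos = refresh_prototypes(feats, labels, num_classes=10)
```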

19.
IEEE Trans Cybern ; 52(8): 7732-7741, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33566780

ABSTRACT

Image annotation aims to jointly predict multiple tags for an image. Although significant progress has been achieved, existing approaches usually overlook aligning specific labels with their corresponding regions due to weak supervised information (i.e., a "bag of labels" for regions), thus failing to explicitly exploit the discrimination between different classes. In this article, we propose the deep label-specific feature (Deep-LIFT) learning model to build an explicit and exact correspondence between a label and its local visual region, which improves the effectiveness of feature learning and enhances the interpretability of the model itself. Deep-LIFT extracts features for each label by aligning each label with its region. Specifically, Deep-LIFTs are obtained by learning multiple correlation maps between image convolutional features and label embeddings. Moreover, we construct two variant graph convolutional networks (GCNs) to further capture the interdependency among labels. Empirical studies on benchmark datasets validate that the proposed model achieves superior performance on multilabel classification over existing state-of-the-art methods.
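A minimal sketch of correlation maps between convolutional features and label embeddings, the mechanism by which label-specific features are extracted here, is shown below; the shapes, softmax normalization, and pooling step are assumptions.

```python
# Hedged sketch of label-specific correlation maps: dot products between label
# embeddings and convolutional features give one spatial map per label, which is
# then used to pool a label-specific feature. Shapes and pooling are illustrative.
import torch

B, C, H, W, L = 2, 512, 14, 14, 20
conv_feats = torch.randn(B, C, H, W)                # backbone feature map
label_embed = torch.randn(L, C)                     # one embedding per label

flat = conv_feats.flatten(2)                                   # (B, C, H*W)
corr = torch.einsum("lc,bcn->bln", label_embed, flat)          # (B, L, H*W) correlation maps
attn = corr.softmax(dim=-1)                                    # normalize over spatial positions
label_specific = torch.einsum("bln,bcn->blc", attn, flat)      # (B, L, C) one feature per label
logits = (label_specific * label_embed.unsqueeze(0)).sum(-1)   # simple per-label score
print(corr.view(B, L, H, W).shape, label_specific.shape, logits.shape)
```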


Subject(s)
Algorithms, Data Curation
20.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3602-3613, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33534703

ABSTRACT

Imbalanced data distribution in crowd counting datasets leads to severe under-estimation and over-estimation problems, which have been less investigated in existing works. In this paper, we tackle this challenging problem by proposing a simple but effective locality-based learning paradigm that produces generalizable features by alleviating sample bias. Our proposed method is locality-aware in two aspects. First, we introduce a locality-aware data partition (LADP) approach to group the training data into different bins via locality-sensitive hashing. A more balanced data batch is then constructed by LADP. To further reduce the training bias and enhance the collaboration with LADP, a new data augmentation method called locality-aware data augmentation (LADA) is proposed, in which image patches are adaptively augmented based on the loss. The proposed method is independent of the backbone network architecture and thus can be smoothly integrated with most existing deep crowd counting approaches in an end-to-end paradigm to boost their performance. We also demonstrate the versatility of the proposed method by applying it to adversarial defense. Extensive experiments verify the superiority of the proposed method over state-of-the-art methods.
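The abstract does not detail LADP's hashing; the sketch below groups training samples into bins with a simple random-projection locality-sensitive hash over a per-sample statistic (here the crowd count, as an assumption) and then draws balanced batches across bins.

```python
# Hedged sketch of a locality-aware data partition: samples are hashed into bins
# with random projections of a per-sample descriptor (here, simply the crowd
# count), and batches draw evenly from every bin. Details are assumptions.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.lognormal(mean=3.0, sigma=1.0, size=1000)    # long-tailed synthetic crowd counts
descriptors = counts.reshape(-1, 1)

projections = rng.normal(size=(1, 3))                     # 3 random hash hyperplanes
proj = descriptors @ projections
bits = (proj > np.median(proj, axis=0)).astype(int)       # one bit per hyperplane
bin_ids = bits @ (2 ** np.arange(3))                      # integer bin id per sample

def balanced_batch(bin_ids, batch_size=16):
    bins = np.unique(bin_ids)
    per_bin = max(1, batch_size // len(bins))
    idx = [rng.choice(np.where(bin_ids == b)[0], per_bin, replace=True) for b in bins]
    return np.concatenate(idx)

print(np.bincount(bin_ids[balanced_batch(bin_ids)]))      # roughly even samples per bin
```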
