Results 1 - 20 of 58
1.
PLoS One ; 19(5): e0302069, 2024.
Article in English | MEDLINE | ID: mdl-38701098

ABSTRACT

The U.S. Transuranium and Uranium Registries performs autopsies on each of its deceased Registrants as a part of its mission to follow up occupationally-exposed individuals. This provides a unique opportunity to explore death certificate misclassification errors, and the factors that influence them, among this small population of former nuclear workers. Underlying causes of death from death certificates and autopsy reports were coded using the 10th revision of the International Classification of Diseases (ICD-10). These codes were then used to quantify misclassification rates among 268 individuals for whom both full autopsy reports and death certificates with legible underlying causes of death were available. When underlying causes of death were compared between death certificates and autopsy reports, death certificates correctly identified the underlying cause of death's ICD-10 disease chapter in 74.6% of cases. The remaining 25.4% of misclassified cases resulted in over-classification rates that ranged from 1.2% for external causes of mortality to 12.2% for circulatory disease, and under-classification rates that ranged from 7.7% for external causes of mortality to 47.4% for respiratory disease. Neoplasms had generally lower misclassification rates with 4.3% over-classification and 13.3% under-classification. A logistic regression revealed that the odds of a match were 2.8 times higher when clinical history was mentioned on the autopsy report than when it was not. Similarly, the odds of a match were 3.4 times higher when death certificates were completed using autopsy findings than when autopsy findings were not used. This analysis excluded cases where it could not be determined if autopsy findings were used to complete death certificates. The findings of this study are useful to investigate the impact of death certificate misclassification errors on radiation risk estimates and, therefore, improve the reliability of epidemiological studies.
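The reported odds ratios can be illustrated with a simple 2x2 calculation. The counts below are hypothetical stand-ins, not the study's data; they merely show how an odds ratio of 2.8 arises.

```python
# Hypothetical 2x2 table (illustrative counts, not the study's data):
# rows: clinical history mentioned on the autopsy report (yes / no)
# columns: death certificate matched the autopsy's ICD-10 chapter (yes / no)
matched_hist, unmatched_hist = 140, 40
matched_nohist, unmatched_nohist = 50, 40

odds_hist = matched_hist / unmatched_hist        # odds of a match with history
odds_nohist = matched_nohist / unmatched_nohist  # odds of a match without it
odds_ratio = odds_hist / odds_nohist             # = 2.8
```

A logistic regression estimates this same quantity as the exponentiated coefficient of the "clinical history" indicator, while adjusting for the other covariates.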


Subject(s)
Autopsy, Cause of Death, Death Certificates, Humans, Male, Middle Aged, Female, International Classification of Diseases, Adult, Occupational Exposure/adverse effects, Aged, Registries
2.
Nanotechnology ; 35(22)2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38387099

ABSTRACT

Two-dimensional (2D) materials are now widely used in biomedical and cosmetic products, yet their safe use in the human body and the environment requires a comprehensive understanding of their nanotoxicity. In this work, we investigated the effect of pristine graphene and graphene oxide (GO) on the adsorption and conformational changes of skin keratin using molecular dynamics simulations. We found that skin keratin can be adsorbed through various noncovalent driving forces, such as van der Waals (vdW) interactions and electrostatics. In the case of GO, the oxygen-containing groups prevent tighter contact between skin keratin and the graphene basal plane through steric effects and electrostatic repulsion. On the other hand, electrostatic attraction and hydrogen bonding enhance the binding affinity of positively charged residues such as lysine and arginine. The secondary structure of skin keratin is better preserved in the GO system, suggesting that GO has good biocompatibility. The charged groups on the GO surface act as hydrogen bond acceptors, similar to keratin's natural receptors in the physiological environment. This work contributes to a better understanding of the nanotoxicity of cutting-edge 2D materials on human health, thereby advancing their potential biological applications.


Subject(s)
Graphite, Nanostructures, Humans, Graphite/chemistry, Keratins, Molecular Dynamics Simulation, Nanostructures/toxicity, Nanostructures/chemistry
3.
IEEE J Biomed Health Inform ; 28(3): 1516-1527, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38206781

ABSTRACT

Breast lesion segmentation in ultrasound images is essential for computer-aided breast-cancer diagnosis. To improve the segmentation performance, most approaches design sophisticated deep-learning models by mining the patterns of foreground lesions and normal backgrounds simultaneously or by unilaterally enhancing foreground lesions via various focal losses. However, the potential of normal backgrounds is underutilized: compacting the feature representation of all normal backgrounds could reduce false positives. From a novel viewpoint of bilateral enhancement, we propose a negative-positive cross-attention network that concentrates on normal backgrounds and foreground lesions, respectively. Derived from the complementing opposites of bipolarity in TaiChi, the network is denoted TaiChiNet and consists of a negative normal-background path and a positive foreground-lesion path. To transmit information across the two paths, a cross-attention module, a complementary MLP-head, and a complementary loss are built for deep-layer features, shallow-layer features, and mutual-learning supervision, respectively. To the best of our knowledge, this is the first work to formulate breast lesion segmentation as a mutual supervision task from the foreground-lesion and normal-background views. Experimental results demonstrate the effectiveness of TaiChiNet on two breast lesion segmentation datasets with a lightweight architecture. Furthermore, extensive experiments on thyroid nodule segmentation and retinal optic cup/disc segmentation datasets indicate the application potential of TaiChiNet.
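The cross-attention module linking two feature paths can be sketched as standard scaled dot-product attention between two feature sets. The random projection matrices below are stand-ins for learned parameters, so this is a shape-level illustration of the mechanism, not TaiChiNet's actual implementation.

```python
import numpy as np

def cross_attention(q_feats, kv_feats, d=8):
    """Queries from one path attend to keys/values from the other path
    (e.g., lesion-path features attending to background-path features)."""
    rng = np.random.default_rng(0)
    # Random stand-ins for learned projection weights:
    Wq, Wk, Wv = (rng.standard_normal((q_feats.shape[1], d)) for _ in range(3))
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    scores = Q @ K.T / np.sqrt(d)                      # similarity logits
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)            # row-wise softmax
    return attn @ V                                    # attended features

out = cross_attention(np.ones((4, 16)), np.ones((6, 16)))  # shape (4, 8)
```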


Subject(s)
Breast Neoplasms, Optic Disk, Humans, Female, Ultrasonography, Breast Neoplasms/diagnostic imaging, Diagnosis, Computer-Assisted, Knowledge, Image Processing, Computer-Assisted
4.
Article in English | MEDLINE | ID: mdl-37729565

ABSTRACT

This work makes the first research effort to address unsupervised 3-D action representation learning with point cloud sequences, which differs from existing unsupervised methods that rely on 3-D skeleton information. Our proposition is built on the state-of-the-art 3-D action descriptor, the 3-D dynamic voxel (3DV), with contrastive learning (CL). The 3DV compresses a point cloud sequence into a compact point cloud of 3-D motion information, and spatiotemporal data augmentations are conducted on it to drive CL. However, we find that existing CL methods (e.g., SimCLR or MoCo v2) often suffer from high pattern variance among the augmented 3DV samples from the same action instance: the augmented samples remain highly complementary in feature space after CL, yet the complementary discriminative clues within them are not well exploited. To address this, a feature augmentation adapted CL (FACL) approach is proposed, which facilitates 3-D action representation by jointly considering the features from all augmented 3DV samples, in the spirit of feature augmentation. FACL runs in a global-local way: one branch learns a global feature that incorporates the discriminative clues from the raw and augmented 3DV samples, and the other focuses on enhancing the discriminative power of the local features learned from each augmented 3DV sample. The global and local features are fused to characterize 3-D actions jointly via concatenation. To fit FACL, a series of spatiotemporal data augmentation approaches is also studied on 3DV. Extensive experiments verify the superiority of our unsupervised learning method for 3-D action feature learning. It outperforms the state-of-the-art skeleton-based counterparts by 6.4% and 3.6% under the cross-setup and cross-subject test settings on NTU RGB+D 120, respectively. The source code is available at https://github.com/tangent-T/FACL.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13586-13598, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37428671

ABSTRACT

Time series analysis is essential to many far-reaching applications of data science and statistics, including economic and financial forecasting, surveillance, and automated business processing. Though the Transformer has been greatly successful in computer vision and natural language processing, its potential as a general backbone for analyzing ubiquitous time series data has not been fully realized. Prior Transformer variants on time series rely heavily on task-dependent designs and pre-assumed "pattern biases", revealing their insufficiency in representing the nuanced seasonal, cyclic, and outlier patterns that are highly prevalent in time series. As a consequence, they cannot generalize well to different time series analysis tasks. To tackle these challenges, we propose DifFormer, an effective and efficient Transformer architecture that can serve as a workhorse for a variety of time-series analysis tasks. DifFormer incorporates a novel multi-resolutional differencing mechanism, which progressively and adaptively makes nuanced yet meaningful changes prominent, while periodic or cyclic patterns can be dynamically captured with flexible lagging and dynamic ranging operations. Extensive experiments demonstrate that DifFormer significantly outperforms state-of-the-art models on three essential time-series analysis tasks: classification, regression, and forecasting. In addition to its superior performance, DifFormer also excels in efficiency, with linear time/memory complexity and empirically lower time consumption.
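The differencing idea can be illustrated with plain lagged differences taken at several resolutions. This is a conceptual sketch of multi-lag differencing only, not DifFormer's learned mechanism; the `lags` values are arbitrary.

```python
import numpy as np

def multi_lag_diff(x, lags=(1, 2, 4)):
    """Lagged differences at several resolutions: short lags make nuanced
    local changes prominent, longer lags expose slower cyclic structure."""
    return {k: x[k:] - x[:-k] for k in lags}

series = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 7.0])
diffs = multi_lag_diff(series)
# diffs[1] is the step-to-step change; diffs[4] compares points 4 steps apart.
```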

6.
Radiat Prot Dosimetry ; 199(8-9): 681-688, 2023 May 24.
Article in English | MEDLINE | ID: mdl-37225186

ABSTRACT

The skeleton is a major plutonium retention site in the human body. Estimation of the total plutonium activity in the skeleton is a challenging problem. For most tissue donors at the United States Transuranium and Uranium Registries, only a limited number of bone samples is available. The skeleton activity is calculated using the plutonium activity concentration (Cskel) and the skeleton weight. In this study, latent bone modelling was used to estimate Cskel from the limited number of analysed bone samples. Data from 13 non-osteoporotic whole-body donors were used to develop a latent bone model (LBM) to estimate Cskel for seven cases with four to eight analysed bone samples. LBM predictions were compared to Cskel estimated using an arithmetic mean in terms of accuracy and precision. For the studied cases, the LBM offered a significant reduction in the uncertainty of the Cskel estimate.
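The skeleton-activity calculation described above is a simple product of concentration and weight. The numbers below are hypothetical, chosen only to show the arithmetic, and are not USTUR data.

```python
# Hypothetical values, for illustration only (not USTUR data):
c_skel = 0.4            # plutonium activity concentration Cskel, Bq/kg
skeleton_weight = 10.5  # skeleton weight, kg

skeleton_activity = c_skel * skeleton_weight  # total activity, ~4.2 Bq
```

The modelling effort in the study goes into estimating Cskel itself from a handful of bone samples; once Cskel is in hand, the total activity follows from this product.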


Subject(s)
Plutonium, Humans, Skeleton, Radiopharmaceuticals, Registries, Uncertainty
7.
Article in English | MEDLINE | ID: mdl-37022080

ABSTRACT

Medical image segmentation is a vital stage in medical image analysis. Numerous deep-learning methods have been proposed to improve the performance of 2-D medical image segmentation, owing to the fast growth of convolutional neural networks. Generally, the manually defined ground truth is used directly to supervise models in the training phase. However, direct supervision of the ground truth often results in ambiguity and distractors as complex challenges appear simultaneously. To alleviate this issue, we propose a gradually recurrent network with curriculum learning, supervised by gradual information of the ground truth. The whole model is composed of two independent networks. One is the segmentation network, denoted GREnet, which formulates 2-D medical image segmentation as a temporal task supervised by pixel-level gradual curricula in the training phase. The other is a curriculum-mining network, which provides curricula of increasing difficulty in the ground truth of the training set by progressively uncovering hard-to-segment pixels in a data-driven manner. Given that segmentation is a pixel-level dense-prediction challenge, to the best of our knowledge, this is the first work to formulate 2-D medical image segmentation as a temporal task with pixel-level curriculum learning. In GREnet, the naive UNet is adopted as the backbone, while ConvLSTM is used to establish the temporal link between gradual curricula. In the curriculum-mining network, UNet++ supplemented by a transformer is designed to deliver curricula through the outputs of the modified UNet++ at different layers.
Experimental results have demonstrated the effectiveness of GREnet on seven datasets, i.e., three lesion segmentation datasets in dermoscopic images, an optic disc and cup segmentation dataset and a blood vessel segmentation dataset in retinal images, a breast lesion segmentation dataset in ultrasound images, and a lung segmentation dataset in computed tomography (CT).

8.
Sci Data ; 10(1): 189, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37024500

ABSTRACT

We present the Canadian Open Neuroscience Platform (CONP) portal to answer the research community's need for flexible data-sharing resources and to provide advanced search tools and processing infrastructure. This portal differs from previous data-sharing projects in that it integrates datasets originating from a number of already existing platforms or databases through DataLad, a file-level data integrity and access layer. The portal is also an entry point for searching and accessing a large number of standardized and containerized software tools, and it links to a computing infrastructure. It leverages community standards to help document and facilitate reuse of both datasets and tools, and it already shows growing community adoption, giving access to more than 60 neuroscience datasets and over 70 tools. The CONP portal demonstrates the feasibility and offers a model of a distributed data and tool management system across 17 institutions throughout Canada.


Subject(s)
Databases, Factual, Software, Canada, Information Dissemination
9.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10443-10465, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37030852

ABSTRACT

Temporal sentence grounding in videos (TSGV), a.k.a., natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video. Connecting computer vision and natural language, TSGV has drawn significant attention from researchers in both communities. This survey attempts to provide a summary of fundamental concepts in TSGV and current research status, as well as future research directions. As the background, we present a common structure of functional components in TSGV, in a tutorial style: from feature extraction from raw video and language query, to answer prediction of the target moment. Then we review the techniques for multimodal understanding and interaction, which is the key focus of TSGV for effective alignment between the two modalities. We construct a taxonomy of TSGV techniques and elaborate the methods in different categories with their strengths and weaknesses. Lastly, we discuss issues with the current TSGV research and share our insights about promising research directions.


Subject(s)
Algorithms, Language
10.
Med Image Anal ; 83: 102664, 2023 01.
Article in English | MEDLINE | ID: mdl-36332357

ABSTRACT

Pneumonia can be difficult to diagnose since its symptoms are highly variable, and its radiographic signs are often very similar to those seen in other illnesses such as a cold or influenza. Deep neural networks have shown promising performance in automated pneumonia diagnosis using chest X-ray radiography, allowing mass screening and early intervention to reduce severe cases and the death toll. However, they usually require many well-labelled chest X-ray images for training to achieve high diagnostic accuracy. To reduce the need for training data and annotation resources, we propose a novel method called Contrastive Domain Adaptation with Consistency Match (CDACM). It transfers knowledge from different but relevant datasets to the unlabelled small-size target dataset and improves the semantic quality of the learnt representations. Specifically, we design a conditional domain adversarial network to exploit the discriminative information conveyed in the predictions to mitigate the domain gap between the source and target datasets. Furthermore, due to the small scale of the target dataset, we construct a feature cloud for each target sample and leverage contrastive learning to extract more discriminative features. Lastly, we propose adaptive feature cloud expansion to push the decision boundary to a low-density area. Unlike most existing transfer learning methods, which aim only to mitigate the domain gap, our method simultaneously considers the domain gap and the data deficiency problem of the target dataset. The conditional domain adaptation and the feature cloud generation of our method are learned jointly to extract discriminative features in an end-to-end manner. Besides, the adaptive feature cloud expansion improves the model's generalisation ability in the target domain.
Extensive experiments on pneumonia and COVID-19 diagnosis tasks demonstrate that our method outperforms several state-of-the-art unsupervised domain adaptation approaches, which verifies the effectiveness of CDACM for automated pneumonia diagnosis using chest X-ray imaging.


Subject(s)
COVID-19 Testing, COVID-19, Humans
11.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2551-2566, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35503823

ABSTRACT

Existing multi-view classification algorithms focus on promoting accuracy by exploiting different views, typically integrating them into common representations for follow-up tasks. Although effective, it is also crucial to ensure the reliability of both the multi-view integration and the final decision, especially for noisy, corrupted and out-of-distribution data. Dynamically assessing the trustworthiness of each view for different samples could provide reliable integration. This can be achieved through uncertainty estimation. With this in mind, we propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC), providing a new paradigm for multi-view learning by dynamically integrating different views at an evidence level. The proposed TMC can promote classification reliability by considering evidence from each view. Specifically, we introduce the variational Dirichlet to characterize the distribution of the class probabilities, parameterized with evidence from different views and integrated with the Dempster-Shafer theory. The unified learning framework induces accurate uncertainty and accordingly endows the model with both reliability and robustness against possible noise or corruption. Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
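The evidence-level fusion can be sketched with a simplified Dempster-Shafer combination of two per-view "opinions", each consisting of per-class belief masses plus an uncertainty mass. The combination rule below follows the reduced form commonly used in evidential deep learning, and the input masses are made-up numbers; it is a sketch of the fusion step, not the full TMC model.

```python
def combine_views(b1, u1, b2, u2):
    """Combine two views' belief masses b and uncertainty u (each view's
    beliefs plus its u sum to 1) via a reduced Dempster-Shafer rule."""
    K = len(b1)
    # Mass assigned to conflicting class pairs across the two views:
    conflict = sum(b1[i] * b2[j] for i in range(K) for j in range(K) if i != j)
    scale = 1.0 - conflict
    b = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale for k in range(K)]
    u = u1 * u2 / scale
    return b, u

b, u = combine_views([0.6, 0.2], 0.2, [0.5, 0.3], 0.2)
# Fused beliefs and u still sum to 1, and u shrinks when the views agree.
```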

12.
Article in English | MEDLINE | ID: mdl-35969543

ABSTRACT

Spiking neural networks (SNNs) have advantages in latency and energy efficiency over traditional artificial neural networks (ANNs) due to their event-driven computation mechanism and the replacement of energy-consuming weight multiplication with addition. However, achieving high accuracy usually requires long spike trains, often more than 1000 time steps. This offsets the computation efficiency brought by SNNs because a longer spike train means a larger number of operations and greater latency. In this article, we propose a radix-encoded SNN with ultrashort spike trains. Specifically, it is able to use fewer than six time steps to achieve even higher accuracy than its traditional counterpart. We also develop a method to fit our radix encoding technique into the ANN-to-SNN conversion approach so that we can train radix-encoded SNNs more efficiently on mature platforms and hardware. Experiments show that our radix encoding can achieve a 25× improvement in latency and a 1.7% improvement in accuracy compared to the state-of-the-art method using the VGG-16 network on the CIFAR-10 dataset.
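Radix encoding can be illustrated by writing a quantized activation as base-r digits spread over time steps, so six steps at radix 2 cover 64 distinct values instead of 64 rate-coded spikes. This toy encoder/decoder is a sketch of the positional-encoding idea only, not the paper's conversion or training scheme.

```python
def radix_encode(value, time_steps=6, radix=2):
    """Write a non-negative integer activation as `time_steps` base-`radix`
    digits (least significant first) -- one digit emitted per time step."""
    digits = []
    for _ in range(time_steps):
        digits.append(value % radix)
        value //= radix
    return digits

def radix_decode(digits, radix=2):
    """Recover the integer by weighting each digit with its radix power."""
    return sum(d * radix ** i for i, d in enumerate(digits))

spikes = radix_encode(37)   # 37 = 0b100101 -> [1, 0, 1, 0, 0, 1]
```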

13.
Article in English | MEDLINE | ID: mdl-35998171

ABSTRACT

Efficient neural network training is essential for in situ training of edge artificial intelligence (AI) and for carbon footprint reduction in general. Training neural networks on the edge is challenging because there is a large gap between the limited resources of edge devices and the resource requirements of current training methods. Existing training methods are based on the assumption that the underlying computing infrastructure has sufficient memory and energy supplies. These methods keep two copies of the model parameters, which is usually beyond the capacity of on-chip memory in processors, and the data movement between off-chip and on-chip memory consumes large amounts of energy. We propose resource-constrained training (RCT) to realize resource-efficient training for edge devices and servers. RCT keeps only a quantized model throughout training, so that the memory requirement for model parameters is reduced, and it adjusts per-layer bitwidth dynamically to save energy when a model can learn effectively with lower precision. We carry out experiments with representative models and tasks in image classification, natural language processing, and crowd counting applications. Experiments show that, on average, an 8-15-bit weight update is sufficient for achieving SOTA performance in these applications. RCT saves 63.5%-80% of the memory for model parameters and saves more energy for communications. Through the experiments, we observe that the common practice on the first/last layer in model compression does not apply to efficient training. Also, interestingly, the more challenging a dataset is, the lower the bitwidth required for efficient training.
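The effect of a per-layer bitwidth can be sketched with plain uniform symmetric quantization: fewer bits mean coarser weights and less storage. The function below is a generic illustration of bitwidth-dependent quantization, not RCT's actual update rule.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization of a list of weights to `bits` bits.
    Lowering `bits` shrinks the memory needed to store the layer."""
    levels = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / levels     # step size
    return [round(w / scale) * scale for w in weights]

w = [0.31, -0.74, 0.05, 0.52]
w8 = quantize(w, 8)   # close to the originals
w2 = quantize(w, 2)   # very coarse: only -0.74, 0.0, or 0.74 survive
```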

14.
Med Image Anal ; 81: 102535, 2022 10.
Article in English | MEDLINE | ID: mdl-35872361

ABSTRACT

Accurate skin lesion diagnosis requires a great effort from experts to identify the characteristics from clinical and dermoscopic images. Deep multimodal learning-based methods can reduce intra- and inter-reader variability and improve diagnostic accuracy compared to single-modality-based methods. This study develops a novel method, named adversarial multimodal fusion with attention mechanism (AMFAM), to perform multimodal skin lesion classification. Specifically, we adopt a discriminator that uses adversarial learning to enforce the feature extractor to learn the correlated information explicitly. Moreover, we design an attention-based reconstruction strategy to encourage the feature extractor to concentrate on learning the features of the lesion area, thus enhancing the feature vector from each modality with more discriminative information. Unlike existing multimodal-based approaches, which only focus on learning complementary features from dermoscopic and clinical images, our method considers both the correlated and the complementary information of the two modalities for multimodal fusion. To verify the effectiveness of our method, we conduct comprehensive experiments on a publicly available multimodal and multi-task skin lesion classification dataset: the 7-point criteria evaluation database. The experimental results demonstrate that our proposed method outperforms the current state-of-the-art methods and improves the average AUC score by more than 2% on the test set.


Subject(s)
Diagnostic Imaging, Skin Diseases, Skin, Databases, Factual, Humans, Machine Learning, Skin/pathology, Skin Diseases/classification, Skin Diseases/diagnosis
15.
Article in English | MEDLINE | ID: mdl-35749327

ABSTRACT

Current one-stage methods for visual grounding encode the language query as one holistic sentence embedding before fusion with visual features for target localization. Such a formulation provides insufficient ability to model the query at the word level, and therefore is prone to neglect words that may not be the most important ones for a sentence but are critical for the referred object. In this article, we propose Word2Pix: a one-stage visual grounding network based on the encoder-decoder transformer architecture that enables learning of textual-to-visual feature correspondence via word-to-pixel attention. Each word from the query sentence is given an equal opportunity when attending to visual pixels through multiple stacks of transformer decoder layers. In this way, the decoder can learn to model the language query and fuse language with the visual features for target prediction simultaneously. We conduct experiments on the RefCOCO, RefCOCO+, and RefCOCOg datasets, and the proposed Word2Pix outperforms the existing one-stage methods by a notable margin. The results obtained also show that Word2Pix surpasses the two-stage visual grounding models, while at the same time keeping the merits of the one-stage paradigm, namely, end-to-end training and fast inference speed. Code is available at https://github.com/azurerain7/Word2Pix.

16.
PLoS One ; 17(6): e0265712, 2022.
Article in English | MEDLINE | ID: mdl-35749431

ABSTRACT

The FDA's Accelerated Approval program (AA) is a regulatory program to expedite availability of products to treat serious or life-threatening illnesses that lack effective treatment alternatives. Ideally, all of the many stakeholders such as patients, physicians, regulators, and health technology assessment (HTA) agencies that are affected by AA should benefit from it. In practice, however, there is intense debate over whether evidence supporting AA is sufficient to meet the needs of the stakeholders who collectively bring an approved product into routine clinical care. As AAs have become more common, it becomes essential to be able to determine their impact objectively and reproducibly in a way that provides for consistent evaluation of therapeutic decision alternatives. We describe the basic features of an approach for evaluating AA impact that accommodates stakeholder-specific views about potential benefits, risks, and costs. The approach is based on a formal decision-analytic framework combining predictive distributions for therapeutic outcomes (efficacy and safety) based on statistical models that incorporate findings from AA trials with stakeholder assessments of various actions that might be taken. The framework described here provides a starting point for communicating the value of a treatment granted AA in the context of what is important to various stakeholders.
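The decision-analytic combination of predictive outcome distributions with stakeholder assessments can be sketched as an expected-utility calculation. The outcome probabilities and utilities below are invented for illustration; the framework itself derives the probabilities from statistical models fit to AA trial findings.

```python
# Hypothetical predictive distribution over therapeutic outcomes and one
# stakeholder's utilities for them (both invented for illustration):
outcomes = {
    "response, no toxicity": {"prob": 0.35, "utility": 1.0},
    "response, toxicity":    {"prob": 0.15, "utility": 0.4},
    "no response":           {"prob": 0.50, "utility": 0.0},
}

expected_utility = sum(o["prob"] * o["utility"] for o in outcomes.values())
# 0.35*1.0 + 0.15*0.4 + 0.50*0.0 = 0.41; different stakeholders plug in
# their own utilities to value the same predictive distribution.
```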


Subject(s)
Drug Approval, Technology Assessment, Biomedical, Humans, Treatment Outcome, United States, United States Food and Drug Administration
17.
Article in English | MEDLINE | ID: mdl-35560072

ABSTRACT

Edge devices demand low energy consumption, low cost, and a small form factor. To efficiently deploy convolutional neural network (CNN) models on edge devices, energy-aware model compression becomes extremely important. However, existing work has not studied this problem well because it fails to consider the diversity of dataflow types in hardware architectures. In this article, we propose EDCompress (EDC), an energy-aware model compression method for various dataflows. It can effectively reduce the energy consumption of various edge devices with different dataflow types. Considering the very nature of model compression procedures, we recast the optimization process as a multistep problem and solve it with reinforcement learning algorithms. We also propose a multidimensional multistep (MDMS) optimization method, which shows higher compression capability than the traditional multistep method. Experiments show that EDC improves energy efficiency by 20×, 17×, and 26× in the VGG-16, MobileNet, and LeNet-5 networks, respectively, with negligible loss of accuracy. EDC can also indicate the optimal dataflow type for a specific neural network in terms of energy consumption, which can guide the deployment of CNNs on hardware.

18.
Nanomaterials (Basel) ; 12(7)2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35407299

ABSTRACT

Graphene-based nanocomposite films (NCFs) are in high demand due to their superior photoelectric and thermal properties, but their stability and mechanical properties form a bottleneck. Herein, a facile approach was used to prepare nacre-mimetic NCFs through the non-covalent self-assembly of graphene oxide (GO) and biocompatible proteins. Various characterization techniques were employed to characterize the as-prepared NCFs and to track the interactions between GO and the proteins. The conformational changes of the various proteins induced by GO determined the film-forming ability of the NCFs, and the binding of bovine serum albumin (BSA)/hemoglobin (HB) on GO's surface was beneficial for improving the stability of the as-prepared NCFs. Compared with a GO film without any additive, the indentation hardness and equivalent elastic modulus were improved by 50.0% and 68.6% for the GO-BSA NCF, and by 100% and 87.5% for the GO-HB NCF. Our strategy should be facile and effective for fabricating well-designed bio-nanocomposites for universal functional applications.

19.
IEEE Trans Neural Netw Learn Syst ; 33(2): 798-810, 2022 02.
Article in English | MEDLINE | ID: mdl-33090960

ABSTRACT

Cross-modal retrieval (CMR) enables a flexible retrieval experience across different modalities (e.g., texts versus images), allowing us to benefit maximally from the abundance of multimedia data. Existing deep CMR approaches commonly require a large amount of labeled data for training to achieve high performance. However, it is time-consuming and expensive to annotate multimedia data manually. Thus, how to transfer valuable knowledge from existing annotated data to new data, especially from known categories to new categories, becomes attractive for real-world applications. To this end, we propose a deep multimodal transfer learning (DMTL) approach to transfer the knowledge from previously labeled categories (the source domain) to improve retrieval performance on unlabeled new categories (the target domain). Specifically, we employ a joint learning paradigm to transfer knowledge by assigning a pseudolabel to each target sample. During training, the pseudolabel is iteratively updated and passed through our model in a self-supervised manner. At the same time, to reduce the domain discrepancy of different modalities, we construct multiple modality-specific neural networks to learn a shared semantic space for different modalities by enforcing the compactness of homoinstance samples and the scatter of heteroinstance samples. Our method is remarkably different from most existing transfer learning approaches. To be specific, previous works usually assume that the source domain and the target domain have the same label set. In contrast, our method considers a more challenging multimodal learning situation where the label sets of the two domains are different or even disjoint. Experimental studies on four widely used benchmarks validate the effectiveness of the proposed method in multimodal transfer learning and demonstrate its superior performance in CMR compared with 11 state-of-the-art methods.

20.
IEEE Trans Cybern ; 52(3): 1736-1749, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32520713

ABSTRACT

Face verification can be regarded as a two-class fine-grained visual-recognition problem, and enhancing the features' discriminative power is one of the key problems in improving its performance. Metric-learning technology is often applied to address this need, and achieving a good tradeoff between underfitting and overfitting plays a vital role in metric learning. Hence, we propose a novel ensemble cascade metric-learning (ECML) mechanism. In particular, hierarchical metric learning is executed in a cascade way to alleviate underfitting. Meanwhile, at each learning level, the features are split into nonoverlapping groups, and metric learning is executed among the feature groups in an ensemble manner to resist overfitting. Considering the feature distribution characteristics of faces, a robust Mahalanobis metric-learning method (RMML) with a closed-form solution is additionally proposed. It avoids the matrix-inversion failure issue faced by some well-known metric-learning approaches (e.g., KISSME). Embedding RMML into the proposed ECML mechanism, our metric-learning paradigm (EC-RMML) can run in a one-pass learning manner. The experimental results demonstrate that EC-RMML is superior to state-of-the-art metric-learning methods for face verification, and the proposed ECML mechanism is also applicable to other metric-learning approaches.
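A Mahalanobis distance with a regularized, hence always invertible, covariance can be sketched as follows. This is a generic illustration of sidestepping the matrix-inversion failure, not the closed-form RMML solution; the `eps` ridge term is an assumption of the sketch.

```python
import numpy as np

def mahalanobis(x, y, cov, eps=1e-3):
    """Mahalanobis distance between x and y under covariance `cov`.
    Adding eps*I keeps the matrix invertible even when `cov` is singular,
    avoiding the failure mode noted for KISSME-style learners."""
    M = np.linalg.inv(cov + eps * np.eye(cov.shape[0]))
    d = np.asarray(x) - np.asarray(y)
    return float(np.sqrt(d @ M @ d))

# With an identity covariance this reduces (up to eps) to Euclidean distance:
dist = mahalanobis([1.0, 2.0], [4.0, 6.0], np.eye(2))  # ~5.0
```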


Subject(s)
Algorithms, Pattern Recognition, Automated, Face, Learning, Machine Learning, Pattern Recognition, Automated/methods