ABSTRACT
In this article, we propose generative image reconstruction from gradients (GIRG), a method for recovering training images from gradients in a federated learning (FL) setting, where privacy is preserved by sharing model weights and gradients rather than raw training data. Previous studies have shown that shared gradients can reveal clients' private information, even enabling pixel-level recovery of training images. However, existing methods are limited to low-resolution images and small batch sizes (BSs) or require prior knowledge about the client data. GIRG uses a conditional generative model to reconstruct training images and their corresponding labels from the shared gradients. Unlike previous generative model-based methods, GIRG does not require prior knowledge of the training data. Furthermore, GIRG optimizes the weights of the conditional generative model to generate highly accurate "dummy" images, rather than optimizing the generative model's input vectors. Comprehensive empirical results show that GIRG recovers high-resolution images with large BSs and can even recover images from gradients aggregated across multiple participants. These results reveal the vulnerability of current FL practices and call for immediate efforts to prevent inversion attacks in gradient-sharing-based collaborative training.
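As a hedged illustration of the core mechanism, the sketch below shows gradient matching with the optimization target moved from the generator's input vectors to its weights, which is the distinguishing choice the abstract describes. All names (`recover`, `latent_dim`, the conditional `generator(z, labels)` signature) are hypothetical, and the labels are assumed to be already recovered (the method also recovers them); this is a sketch of the idea, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, shared_grads, dummy_images, dummy_labels):
    """Distance between gradients induced by dummy data and the shared gradients.

    `shared_grads` must be ordered like model.parameters().
    """
    loss = F.cross_entropy(model(dummy_images), dummy_labels)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    return sum(((g - s) ** 2).sum() for g, s in zip(grads, shared_grads))

def recover(model, shared_grads, generator, labels, steps=1000, lr=1e-3):
    """Hypothetical recovery loop: optimize the generator's *weights* (the GIRG
    idea), not the latent input, so the generator adapts to reproduce the batch."""
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    z = torch.randn(labels.size(0), generator.latent_dim)  # fixed latent codes
    for _ in range(steps):
        opt.zero_grad()
        dummy = generator(z, labels)  # conditional generation of "dummy" images
        gradient_matching_loss(model, shared_grads, dummy, labels).backward()
        opt.step()
    return generator(z, labels).detach()
```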
ABSTRACT
BACKGROUND: Medical images exhibit complicated pathological features, such as blurred boundaries, severe scale differences between symptoms, and background noise interference; we aim to enhance the reliability of joint segmentation of multiple lesions from such images. PURPOSE: To propose a novel reliable multi-scale wavelet-enhanced transformer network that provides accurate segmentation results together with a reliability assessment. METHODS: To strengthen the model's capability to capture intricate pathological features in medical images, this work introduces a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network with a multi-scale transformer module developed within the scope of this work. To enhance the reliability of the segmentation outcomes, a novel uncertainty segmentation head rooted in subjective logic (SL) is proposed, which generates the final segmentation results along with an overall uncertainty evaluation score map. RESULTS: Comprehensive experiments are conducted on the public AI-Challenge 2018 database for retinal edema lesion segmentation and on the segmentation of Thoracic Organs at Risk in CT images. The experimental results highlight the superior segmentation accuracy and heightened reliability achieved by the proposed method in comparison with other state-of-the-art segmentation approaches. CONCLUSIONS: Unlike previous segmentation methods, the proposed approach produces reliable segmentation results with an estimated uncertainty and higher accuracy, enhancing the overall reliability of the model. The code will be released at https://github.com/LooKing9218/ReMultiSeg.
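The uncertainty head is described as rooted in subjective logic (SL); below is a minimal sketch of the standard SL/evidential formulation for segmentation, in which logits become non-negative Dirichlet evidence and vacuity serves as the uncertainty map. It illustrates the general recipe only, assuming the paper follows the usual evidential construction rather than reproducing its exact head.

```python
import torch.nn.functional as F

def evidential_seg_head(logits):
    """Per-pixel subjective-logic outputs from raw segmentation logits.

    logits: (B, K, H, W) for K lesion classes. Returns expected class
    probabilities and an overall uncertainty score map in (0, 1].
    """
    evidence = F.softplus(logits)                 # non-negative evidence per class
    alpha = evidence + 1.0                        # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True)     # total Dirichlet strength
    prob = alpha / strength                       # expected class probabilities
    uncertainty = logits.shape[1] / strength      # vacuity: high when evidence is low
    return prob, uncertainty
```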
Subject(s)
Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods; Humans; Tomography, X-Ray Computed; Reproducibility of Results; Organs at Risk/diagnostic imaging; Organs at Risk/radiation effects; Wavelet Analysis
ABSTRACT
Domain generalization (DG) aims to learn a model on one or multiple observed source domains that can generalize to unseen target test domains. Previous approaches have focused on extracting domain-invariant information from multiple source domains, discarding domain-specific information that is closely tied to semantics in individual domains on the grounds that it does not transfer to the target domain. In this article, we propose a novel DG method called continuous disentangled joint space learning (CJSL), which leverages both domain-invariant and domain-specific information for more effective DG. The key idea behind CJSL is to formulate and learn a continuous joint space (CJS) for domain-specific representations from source domains through iterative feature disentanglement. The learned CJS can then be used to simulate domain-specific representations for test samples from a mixture of multiple domains via Monte Carlo sampling during inference. Unlike existing approaches, which exploit only domain-invariant feature vectors or aim to learn a universal domain-specific feature extractor, we simulate domain-specific representations by sampling latent vectors in the learned CJS for each test sample, fully exploiting the power of multiple domain-specific classifiers for robust prediction. Empirical results demonstrate that CJSL outperforms 19 state-of-the-art (SOTA) methods on seven benchmarks, indicating the effectiveness of the proposed method.
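A minimal sketch of the Monte Carlo inference stage, assuming a `latent_sampler` over the learned CJS, a `decoder` producing domain-specific representations, and a pool of per-domain classifiers; all of these names are illustrative stand-ins for the paper's components.

```python
import torch

def mc_predict(invariant_feat, latent_sampler, decoder, classifiers, n_samples=16):
    """Monte Carlo inference over a learned continuous joint space (CJS).

    invariant_feat: 1-D domain-invariant feature vector of one test sample.
    latent_sampler() draws a latent code from the learned CJS; decoder maps it
    to a simulated domain-specific representation; every domain-specific
    classifier votes, and the votes are averaged for a robust prediction.
    """
    votes = []
    for _ in range(n_samples):
        z = latent_sampler()                              # sample the CJS
        specific = decoder(z)                             # simulated domain-specific feature
        feat = torch.cat([invariant_feat, specific], dim=-1)
        votes += [torch.softmax(clf(feat), dim=-1) for clf in classifiers]
    return torch.stack(votes).mean(dim=0)                 # averaged class probabilities
```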
ABSTRACT
Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs brings a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors possess higher density than conventional memory technologies, making them highly suitable for managing the extreme model sizes of LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs is growing rapidly and already exceeds the capabilities of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies of off-chip communication. Our testing on BERT Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture achieves reductions of up to 39× in area overhead and 18× in energy consumption. Compared to modern TPU/GPU systems, our architecture demonstrates at least a 68× reduction in the area-delay product and a significant 69% reduction in energy consumption.
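To make the weight-stationary constraint concrete, here is a toy tiled crossbar matrix-vector product; it is a plain NumPy illustration of the mapping, not the proposed architecture, and the tile size is an arbitrary assumption.

```python
import numpy as np

def crossbar_matvec(weight, x, tile=256):
    """Weight-stationary matrix-vector product tiled onto fixed-size crossbars.

    Each (tile x tile) block of `weight` is programmed once into one crossbar
    as conductances; partial sums from the tiles are accumulated digitally.
    Attention scores (Q @ K^T) change with every input, so they cannot be
    mapped this way -- the limitation the abstract's architecture addresses.
    """
    rows, cols = weight.shape
    y = np.zeros(rows)
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            block = weight[r:r + tile, c:c + tile]   # stationary conductances
            y[r:r + tile] += block @ x[c:c + tile]   # one analog MAC per crossbar
    return y
```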
ABSTRACT
Color fundus photography (CFP) and optical coherence tomography (OCT) images are two of the most widely used modalities in the clinical diagnosis and management of retinal diseases. Despite the widespread use of multimodal imaging in clinical practice, few methods for automated diagnosis of eye diseases effectively utilize the correlated and complementary information from multiple modalities. This paper explores how to leverage information from CFP and OCT images to improve the automated diagnosis of retinal diseases. We propose a novel multimodal learning method, named geometric correspondence-based multimodal learning network (GeCoM-Net), to fuse CFP and OCT images. Specifically, inspired by clinical observations, we exploit the geometric correspondence between each OCT slice and its CFP region to learn correlated features of the two modalities for robust fusion. Furthermore, we design a new feature selection strategy that extracts discriminative OCT representations by automatically selecting the important feature maps from OCT slices. Unlike existing multimodal learning methods, GeCoM-Net is the first to explicitly formulate the geometric relationship between an OCT slice and the corresponding region of the CFP image for CFP-OCT fusion. Experiments on a large-scale private dataset and a publicly available dataset evaluate the effectiveness of GeCoM-Net for diagnosing diabetic macular edema (DME), impaired visual acuity (VA), and glaucoma. The empirical results show that our method outperforms the current state-of-the-art multimodal learning methods, improving the AUROC score by 0.4%, 1.9%, and 2.9% for DME, VA, and glaucoma detection, respectively.
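One plausible way to express the geometric correspondence, assuming OCT B-scans are acquired as parallel horizontal cuts across the CFP field and the CFP feature-map height divides evenly into slices; the pairing rule and the pooling are illustrative guesses at the formulation, not the authors' exact design.

```python
def cfp_strips_for_oct(cfp_feat, n_slices):
    """Pair each OCT B-scan with the CFP region it geometrically crosses.

    cfp_feat: (B, C, H, W) CFP feature map, with H divisible by n_slices.
    OCT volumes are acquired as parallel B-scans, so slice i is matched to
    horizontal strip i of the CFP map; each strip is average-pooled into a
    single region descriptor for fusion with that slice's OCT features.
    """
    b, c, h, w = cfp_feat.shape
    strips = cfp_feat.reshape(b, c, n_slices, h // n_slices, w)
    return strips.mean(dim=(-1, -2)).transpose(1, 2)  # (B, n_slices, C)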
Subject(s)
Image Interpretation, Computer-Assisted; Multimodal Imaging; Tomography, Optical Coherence; Humans; Tomography, Optical Coherence/methods; Multimodal Imaging/methods; Image Interpretation, Computer-Assisted/methods; Algorithms; Retinal Diseases/diagnostic imaging; Retina/diagnostic imaging; Machine Learning; Photography/methods; Diagnostic Techniques, Ophthalmological; Databases, Factual
ABSTRACT
Federated learning (FL) is a distributed machine learning framework that is gaining traction in view of increasing health data privacy protection needs. By conducting a systematic review of FL applications in healthcare, we identify relevant articles in scientific, engineering, and medical journals in English up to August 31st, 2023. Out of a total of 22,693 articles under review, 612 articles are included in the final analysis. The majority of articles are proof-of-concept studies, and only 5.2% are studies with real-life application of FL. Radiology and internal medicine are the most common specialties involved in FL. FL is robust to a variety of machine learning models and data types, with neural networks and medical imaging being the most common, respectively. We highlight the need to address the barriers to clinical translation and to assess FL's real-world impact in this new digital data-driven healthcare landscape.
Subject(s)
Delivery of Health Care; Machine Learning; Humans; Neural Networks, Computer
ABSTRACT
In-memory deep learning executes neural network models where they are stored, avoiding long-distance communication between memory and computation units and thereby saving considerable energy and time. In-memory deep learning has already demonstrated orders of magnitude higher performance density and energy efficiency. The use of emerging memory technology (EMT) promises to increase density, energy efficiency, and performance even further. However, EMT is intrinsically unstable, resulting in random fluctuations of data reads. This can translate into nonnegligible accuracy loss, potentially nullifying the gains. In this article, we propose three optimization techniques that can mathematically overcome the instability problem of EMT. They improve the accuracy of in-memory deep learning models while maximizing energy efficiency. Experiments show that our solution can fully recover most models' state-of-the-art (SOTA) accuracy and achieves at least an order of magnitude higher energy efficiency than the SOTA.
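The article's three techniques are not spelled out in the abstract, so the sketch below shows a generic noise-injection layer, one common way to make learned weights mathematically robust to random read fluctuations; the multiplicative Gaussian noise model and its magnitude are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that simulates random EMT read fluctuations during training.

    Injecting the expected device noise into every forward pass pushes the
    optimizer toward weights whose outputs are stable under read instability.
    """
    def __init__(self, in_features, out_features, read_noise_std=0.05):
        super().__init__(in_features, out_features)
        self.read_noise_std = read_noise_std  # assumed relative noise level

    def forward(self, x):
        if self.training:
            noise = 1.0 + self.read_noise_std * torch.randn_like(self.weight)
            return nn.functional.linear(x, self.weight * noise, self.bias)
        return super().forward(x)  # clean weights at (idealized) inference
```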
ABSTRACT
BACKGROUND: The novel coronavirus disease (COVID-19) has severely burdened the health care system through its rapid transmission. Mobile health (mHealth) is a viable solution to facilitate remote monitoring and continuity of care for patients with COVID-19 in a home environment. However, the conceptualization and development of mHealth apps are often time- and labor-intensive and are laden with concerns relating to data security and privacy. Implementing mHealth apps is also a challenging feat, as language-related barriers limit adoption, whereas a perceived lack of benefits affects sustained use. The rapid development of an mHealth app that is cost-effective, secure, and user-friendly will be a timely enabler. OBJECTIVE: This project aimed to develop an mHealth app, DrCovid+, to facilitate remote monitoring and continuity of care for patients with COVID-19 by using the rapid development approach. It also aimed to address the challenges of mHealth app adoption and sustained use. METHODS: The Rapid Application Development approach was adopted. Stakeholders including decision makers, physicians, nurses, health care administrators, and research engineers were engaged. The process began with requirements gathering to define and finalize the project scope, followed by an iterative process of developing a working prototype, conducting User Acceptance Tests, and improving the prototype before implementation. Co-designing principles were applied to ensure equal collaborative efforts and collective agreement among stakeholders. RESULTS: DrCovid+ was developed on Telegram Messenger and hosted on a cloud server. It features a secure patient enrollment and data interface, a multilingual communication channel, and both automatic and personalized push messaging. A back-end dashboard was also developed to collect patients' vital signs for remote monitoring and continuity of care. To date, 400 patients have been enrolled into the system, amounting to 2822 hospital bed-days saved. CONCLUSIONS: The rapid development and implementation of DrCovid+ allowed for timely clinical care management of patients with COVID-19. It facilitated early hospital discharge and continuity of care while addressing issues relating to data security and labor, time, and cost effectiveness. The use case for DrCovid+ may be extended to other medical conditions to advance patient care and empowerment within the community, thereby meeting existing and rising population health challenges.
ABSTRACT
Purpose: This study explores the association between the duration and variation of infant sleep trajectories and subsequent cognitive school readiness at 48-50 months. Methods: Participants were 288 multi-ethnic children within the Growing Up in Singapore Towards healthy Outcomes (GUSTO) cohort. Caregiver-reported total, night, and day sleep durations were obtained at 3, 6, 9, 12, 18, and 24 months using the Brief Infant Sleep Questionnaire and at 54 months using the Children's Sleep Habits Questionnaire. Total, night, and day sleep trajectories with varying durations (short, moderate, or long) and variability (consistent or variable; defined by standard errors) were identified. The cognitive school readiness test battery was administered when the children were between 48 and 50 months old. Both unadjusted analysis of variance models and analysis of covariance models adjusted for confounders were performed to assess associations between sleep trajectories and individual school readiness tests in the domains of language, numeracy, general cognition, and memory. Results: In the unadjusted models, children with short variable total sleep trajectories had poorer performance on language tests compared with those with longer and more consistent trajectories. In both unadjusted and adjusted models, children with short variable night sleep trajectories had poorer numeracy knowledge compared with their counterparts with long consistent night sleep trajectories. There were no equivalent associations between sleep trajectories and school readiness performance for tests in the general cognition or memory domains, and no significant findings for day sleep trajectories. Conclusion: Findings suggest that individual differences in longitudinal sleep duration patterns from as early as 3 months of age may be associated with the language and numeracy aspects of school readiness at 48-50 months of age. This is important, as early school readiness, particularly in the domains of language and mathematics, is a key predictor of subsequent academic achievement.
ABSTRACT
STUDY OBJECTIVES: Examine how different trajectories of reported sleep duration associate with early childhood cognition. METHODS: Caregiver-reported sleep duration data (n = 330) were collected using the Brief Infant Sleep Questionnaire at 3, 6, 9, 12, 18, and 24 months and the Children's Sleep Habits Questionnaire at 54 months. Multiple group-based day-, night-, and/or total sleep trajectories were derived, each differing in duration and variability. The Bayley Scales of Infant and Toddler Development-III (Bayley-III) and the Kaufman Brief Intelligence Test-2 (KBIT-2) were used to assess cognition at 24 and 54 months, respectively. RESULTS: Compared to a short variable night sleep trajectory, a long consistent night sleep trajectory was associated with higher scores on the Bayley-III (cognition and language), while moderate/long consistent night sleep trajectories were associated with higher KBIT-2 (verbal and composite) scores. Children with a long consistent total sleep trajectory had higher Bayley-III (cognition and expressive language) and KBIT-2 (verbal and composite) scores compared to children with a short variable total sleep trajectory. A moderate consistent total sleep trajectory was associated with higher Bayley-III language and KBIT-2 verbal scores relative to the short variable total trajectory. Children with a long variable day sleep trajectory had lower Bayley-III (cognition and fine motor) and KBIT-2 (verbal and composite) scores compared to children with a short consistent day sleep trajectory. CONCLUSIONS: Longer and more consistent night and total sleep trajectories, and a short day sleep trajectory, in early childhood were associated with better cognition at 2 and 4.5 years.
Subject(s)
Child Development; Sleep Duration; Infant; Humans; Child, Preschool; Cognition
ABSTRACT
Failure to recognize samples from classes unseen during training is a major limitation of artificial intelligence in real-world recognition and classification of retinal anomalies. We establish an uncertainty-inspired open set (UIOS) model, which is trained with fundus images of 9 retinal conditions. Besides assessing the probability of each category, UIOS also calculates an uncertainty score to express its confidence. Our UIOS model with a thresholding strategy achieves F1 scores of 99.55%, 97.01%, and 91.91% on the internal testing set, the external target categories (TC)-JSIEC dataset, and the TC-unseen testing set, respectively, compared with F1 scores of 92.20%, 80.69%, and 64.74% for the standard AI model. Furthermore, UIOS correctly assigns high uncertainty scores, prompting a manual check, on datasets of non-target-category retinal diseases, low-quality fundus images, and non-fundus images. UIOS provides a robust method for real-world screening of retinal anomalies.
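A minimal sketch of the thresholding strategy: classify when uncertainty is low, defer to a human otherwise. The threshold value is illustrative, not the paper's operating point.

```python
import numpy as np

def screen(probs, uncertainty, threshold=0.1):
    """Thresholding strategy: trust the model only when uncertainty is low.

    probs: (N, 9) class probabilities over the 9 trained retinal conditions.
    uncertainty: (N,) per-image uncertainty scores. Images above the threshold
    (unseen classes, low-quality or non-fundus images) are flagged for a
    manual check instead of being force-classified into a known category.
    """
    return ["manual check" if u > threshold else int(np.argmax(p))
            for p, u in zip(probs, uncertainty)]
```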
Subject(s)
Eye Abnormalities; Retinal Diseases; Humans; Artificial Intelligence; Algorithms; Uncertainty; Retina/diagnostic imaging; Fundus Oculi; Retinal Diseases/diagnostic imaging
ABSTRACT
Purpose: The COVID-19 pandemic has drastically disrupted global healthcare systems. With the higher demand for healthcare and misinformation related to COVID-19, there is a need to explore alternative models to improve communication. Artificial intelligence (AI) and natural language processing (NLP) have emerged as promising solutions to improve healthcare delivery. Chatbots could fill a pivotal role in the dissemination and easy accessibility of accurate information in a pandemic. In this study, we developed a multilingual NLP-based AI chatbot, DR-COVID, which responds accurately to open-ended, COVID-19-related questions. This was used to facilitate pandemic education and healthcare delivery. Methods: First, we developed DR-COVID with an ensemble NLP model on the Telegram platform (https://t.me/drcovid_nlp_chatbot). Second, we evaluated various performance metrics. Third, we evaluated multilingual text-to-text translation to Chinese, Malay, Tamil, Filipino, Thai, Japanese, French, Spanish, and Portuguese. We utilized 2,728 training questions and 821 test questions in English. Primary outcome measurements were (A) overall and top 3 accuracies; (B) Area Under the Curve (AUC), precision, recall, and F1 score. Overall accuracy referred to a correct response for the top answer, whereas top 3 accuracy referred to an appropriate response for any one answer amongst the top 3 answers. AUC and its related metrics were obtained from the Receiver Operating Characteristic (ROC) curve. Secondary outcomes were (A) multilingual accuracy and (B) comparison to enterprise-grade chatbot systems. The sharing of training and testing datasets on an open-source platform will also contribute to existing data. Results: Our NLP model, utilizing the ensemble architecture, achieved overall and top 3 accuracies of 0.838 [95% confidence interval (CI): 0.826-0.851] and 0.922 [95% CI: 0.913-0.932], respectively. For overall and top 3 results, AUC scores of 0.917 [95% CI: 0.911-0.925] and 0.960 [95% CI: 0.955-0.964] were achieved, respectively. We achieved multilingual support for nine non-English languages, with Portuguese performing the best overall at 0.900. Lastly, DR-COVID generated answers more accurately and quickly than other chatbots, within 1.12-2.15 s across the three devices tested. Conclusion: DR-COVID is a clinically effective NLP-based conversational AI chatbot, and a promising solution for healthcare delivery in the pandemic era.
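For concreteness, a small helper that computes the overall (top-1) and top-3 accuracies exactly as the abstract defines them; the scoring interface is an assumption.

```python
import numpy as np

def top_k_accuracies(scores, labels, ks=(1, 3)):
    """Overall (top-1) and top-3 accuracy as defined in the abstract.

    scores: (N, C) answer scores per question; labels: (N,) indices of the
    correct answers. Top-3 counts a hit if the right answer appears anywhere
    among the three highest-scoring candidates.
    """
    order = np.argsort(-scores, axis=1)   # best-scoring answer first
    return {k: float(np.mean([labels[i] in order[i, :k]
                              for i in range(len(labels))])) for k in ks}
```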
Subject(s)
COVID-19; Deep Learning; Humans; Natural Language Processing; Artificial Intelligence; Pandemics; India
ABSTRACT
Precision medicine promises to transform healthcare for groups and individuals through early disease detection, refined diagnoses, and tailored treatments. Analysis of large-scale genomic-phenotypic databases is a critical enabler of precision medicine. Although Asia is home to 60% of the world's population, many Asian ancestries are under-represented in existing databases, leading to missed opportunities for new discoveries, particularly for diseases most relevant to these populations. The Singapore National Precision Medicine initiative is a whole-of-government 10-year initiative aiming to generate precision medicine data from up to one million individuals, integrating genomic, lifestyle, health, social, and environmental data. Beyond technologies, routine adoption of precision medicine in clinical practice requires social, ethical, legal, and regulatory barriers to be addressed. Identifying driver use cases in which precision medicine results in standardized changes to clinical workflows or improvements in population health, coupled with health economic analysis to demonstrate value-based healthcare, is a vital prerequisite for responsible health system adoption.
Subject(s)
Delivery of Health Care; Precision Medicine; Humans; Singapore; Precision Medicine/methods; Asia
ABSTRACT
Genomic researchers increasingly utilize commercial cloud service providers (CSPs) to manage their data and analytics needs. CSPs allow researchers to grow Information Technology (IT) infrastructure on demand to overcome bottlenecks when combining large datasets. However, without adequate security controls, the risk of unauthorized access may be higher for data stored in the cloud. Additionally, regulators are mandating data access patterns and specific security protocols for the storage and use of genomic data. While CSPs provide tools for security and regulatory compliance, building the necessary controls for cloud solutions is not trivial. Research Assets Provisioning and Tracking Online Repository (RAPTOR), by the Genome Institute of Singapore, is a cloud-native genomics data repository and analytics platform that implements a "five-safes" framework to provide security and governance controls to data contributors and users, leveraging CSPs for sharing and analysis of genomic datasets without the risk of security breaches or of running afoul of regulations.
ABSTRACT
Medical artificial intelligence (AI) has tremendous potential to advance healthcare by supporting and contributing to the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving both healthcare provider and patient experience. Unlocking this potential requires systematic, quantitative evaluation of the performance of medical AI models on large-scale, heterogeneous data capturing diverse patient populations. Here, to meet this need, we introduce MedPerf, an open platform for benchmarking AI models in the medical domain. MedPerf focuses on enabling federated evaluation of AI models by securely distributing them to different facilities, such as healthcare organizations. This process of bringing the model to the data empowers each facility to assess and verify the performance of AI models in an efficient, human-supervised process while prioritizing privacy. We describe the current challenges the healthcare and AI communities face, the need for an open platform, the design philosophy of MedPerf, its current implementation status and real-world deployment, our roadmap, and, importantly, the use of MedPerf with multiple international institutions in both cloud-based and on-premises scenarios. Finally, we welcome new contributions by researchers and organizations to further strengthen MedPerf as an open benchmarking platform.
ABSTRACT
Edge devices demand low energy consumption, low cost, and a small form factor. To efficiently deploy convolutional neural network (CNN) models on edge devices, energy-aware model compression becomes extremely important. However, existing work has not studied this problem well because it overlooks the diversity of dataflow types in hardware architectures. In this article, we propose EDCompress (EDC), an energy-aware model compression method for various dataflows. It can effectively reduce the energy consumption of edge devices with different dataflow types. Considering the very nature of model compression procedures, we recast the optimization process as a multistep problem and solve it with reinforcement learning algorithms. We also propose a multidimensional multistep (MDMS) optimization method, which shows higher compression capability than the traditional multistep method. Experiments show that EDC improves energy efficiency by 20×, 17×, and 26× on the VGG-16, MobileNet, and LeNet-5 networks, respectively, with negligible loss of accuracy. EDC can also indicate the optimal dataflow type for a specific neural network in terms of energy consumption, which can guide the deployment of CNNs on hardware.
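As a rough illustration of the multistep formulation, the toy search below assigns one compression action per layer and keeps the lowest-energy plan that respects an accuracy floor; a random policy stands in for the paper's reinforcement learning agent, so treat this purely as a sketch of the problem shape.

```python
import random

def multistep_compress(layers, actions, energy_fn, accuracy_fn,
                       episodes=100, acc_floor=0.99):
    """Toy multistep search in the spirit of energy-aware compression.

    One episode assigns a compression action (e.g., a bit-width or pruning
    ratio) to each layer in sequence; a plan is kept only if accuracy stays
    above `acc_floor` of the baseline and its energy beats the best so far.
    A real agent (e.g., a policy-gradient learner) would replace the random
    policy used here.
    """
    best, best_energy = None, float("inf")
    for _ in range(episodes):
        plan = [random.choice(actions) for _ in layers]  # one action per step
        if accuracy_fn(plan) >= acc_floor and energy_fn(plan) < best_energy:
            best, best_energy = plan, energy_fn(plan)
    return best, best_energy
```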
ABSTRACT
Spiking neural networks (SNNs) have advantages in latency and energy efficiency over traditional artificial neural networks (ANNs) due to their event-driven computation mechanism and their replacement of energy-consuming weight multiplication with addition. However, achieving high accuracy usually requires long spike trains, often more than 1000 time steps. This offsets the computational efficiency brought by SNNs because a longer spike train means more operations and higher latency. In this article, we propose a radix-encoded SNN with ultrashort spike trains. Specifically, it is able to use fewer than six time steps to achieve even higher accuracy than its traditional counterpart. We also develop a method to fit our radix encoding technique into the ANN-to-SNN conversion approach so that we can train radix-encoded SNNs more efficiently on mature platforms and hardware. Experiments show that our radix encoding achieves a 25× improvement in latency and a 1.7% improvement in accuracy compared to the state-of-the-art method using the VGG-16 network on the CIFAR-10 dataset.
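A minimal sketch of the radix-encoding idea: a quantized activation travels as a handful of base-r digits, one per time step, instead of a long rate-coded train. The exact digit-to-spike mapping in the paper may differ.

```python
import numpy as np

def radix_encode(x, radix=4, steps=4):
    """Encode quantized activations as `steps` base-`radix` digits.

    A value in [0, radix**steps) is carried by very few time steps: step t
    contributes digit_t * radix**t, so 6 steps at radix 4 already cover 4096
    levels -- versus thousands of steps for rate coding.
    """
    x = np.asarray(x, dtype=np.int64)
    return np.stack([(x // radix**t) % radix for t in range(steps)])

def radix_decode(digits, radix=4):
    """Reconstruct the original values from the per-step digits."""
    return sum(d * radix**t for t, d in enumerate(digits))
```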
ABSTRACT
Natural Language Video Localization (NLVL) aims to locate, in an untrimmed video, a target moment that semantically corresponds to a text query. Existing approaches mainly solve the NLVL problem from a computer vision perspective by formulating it as a ranking, anchor, or regression task. These methods suffer from large performance degradation when localizing in long videos. In this work, we address NLVL from a new perspective, i.e., span-based question answering (QA), by treating the input video as a text passage. We propose a video span localizing network (VSLNet), built on top of a standard span-based QA framework (named VSLBase), to address NLVL. VSLNet tackles the differences between NLVL and span-based QA through a simple yet effective query-guided highlighting (QGH) strategy: QGH guides VSLNet to search for the matching video span within a highlighted region. To address the performance degradation on long videos, we further extend VSLNet to VSLNet-L with a multi-scale split-and-concatenation strategy. VSLNet-L first splits the untrimmed video into short clip segments; it then predicts which clip segment contains the target moment and suppresses the importance of the other segments. Finally, the clip segments are concatenated, with different confidences, to locate the target moment accurately. Extensive experiments on three benchmark datasets show that VSLNet and VSLNet-L outperform the state-of-the-art methods, and that VSLNet-L addresses the issue of performance degradation on long videos. Our study suggests that the span-based QA framework is an effective strategy for the NLVL problem.
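A hedged sketch of the QGH idea: a learned scorer marks which video positions fall in the highlighted foreground region, and the video features are reweighted accordingly; the layer shapes and the pooled-query interface are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class QueryGuidedHighlight(nn.Module):
    """Sketch of query-guided highlighting (QGH).

    A sigmoid scorer predicts, per video position, whether it belongs to the
    highlighted (foreground) region around the target span; video features
    are reweighted so that span prediction searches inside that region.
    """
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(2 * dim, 1)

    def forward(self, video_feat, query_feat):
        # video_feat: (B, T, D); query_feat: (B, D) pooled query vector
        q = query_feat.unsqueeze(1).expand_as(video_feat)
        h = torch.sigmoid(self.scorer(torch.cat([video_feat, q], dim=-1)))
        return h * video_feat, h.squeeze(-1)  # highlighted features, scores
```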
ABSTRACT
Cross-modal retrieval (CMR) enables flexible retrieval across different modalities (e.g., texts versus images), allowing us to benefit fully from the abundance of multimedia data. Existing deep CMR approaches commonly require a large amount of labeled training data to achieve high performance. However, annotating multimedia data manually is time-consuming and expensive. Thus, how to transfer valuable knowledge from existing annotated data to new data, especially from known categories to new categories, becomes attractive for real-world applications. To this end, we propose a deep multimodal transfer learning (DMTL) approach that transfers knowledge from previously labeled categories (the source domain) to improve retrieval performance on unlabeled new categories (the target domain). Specifically, we employ a joint learning paradigm to transfer knowledge by assigning a pseudolabel to each target sample. During training, the pseudolabels are iteratively updated and passed through our model in a self-supervised manner. At the same time, to reduce the domain discrepancy of different modalities, we construct multiple modality-specific neural networks that learn a shared semantic space for the modalities by enforcing the compactness of homoinstance samples and the scatter of heteroinstance samples. Our method differs markedly from most existing transfer learning approaches: previous works usually assume that the source domain and the target domain share the same label set, whereas our method considers a more challenging multimodal learning situation in which the label sets of the two domains are different or even disjoint. Experimental studies on four widely used benchmarks validate the effectiveness of the proposed method in multimodal transfer learning and demonstrate its superior performance in CMR compared with 11 state-of-the-art methods.
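A minimal sketch of the pseudolabel refresh round described above, assuming the loader yields raw tensors; in the paper the labels are updated iteratively during joint training rather than in an isolated pass.

```python
import torch

def refresh_pseudolabels(model, target_loader, device="cpu"):
    """One round of the self-supervised pseudolabel update.

    Each unlabeled target sample is assigned the class its current prediction
    favors; the refreshed labels supervise the next training epoch, so the
    labels and the model improve together, iteration by iteration.
    """
    model.eval()
    pseudo = []
    with torch.no_grad():
        for batch in target_loader:          # assumed to yield raw tensors
            logits = model(batch.to(device))
            pseudo.append(logits.argmax(dim=-1).cpu())
    return torch.cat(pseudo)
```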
ABSTRACT
Accurate skin lesion diagnosis requires great effort from experts to identify characteristic features in clinical and dermoscopic images. Deep multimodal learning-based methods can reduce intra- and inter-reader variability and improve diagnostic accuracy compared with single-modality methods. This study develops a novel method, named adversarial multimodal fusion with attention mechanism (AMFAM), for multimodal skin lesion classification. Specifically, we adopt a discriminator that uses adversarial learning to force the feature extractor to learn correlated information explicitly. Moreover, we design an attention-based reconstruction strategy that encourages the feature extractor to concentrate on the lesion area, enhancing the feature vector of each modality with more discriminative information. Unlike existing multimodal approaches, which focus only on learning complementary features from dermoscopic and clinical images, our method considers both the correlated and the complementary information of the two modalities for multimodal fusion. To verify the effectiveness of our method, we conduct comprehensive experiments on a publicly available multimodal, multi-task skin lesion classification dataset: the 7-point criteria evaluation database. The experimental results demonstrate that our proposed method outperforms the current state-of-the-art methods, improving the average AUC score by over 2% on the test set.
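To make the adversarial correlation learning concrete, here is one conventional two-step GAN-style update, with the discriminator distinguishing clinical from dermoscopic features and the extractors trained to fool it; the optimizer wiring and exact losses are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

def adversarial_alignment_step(f_clin, f_derm, discriminator, disc_opt, feat_opt):
    """One adversarial step pushing the two modalities' features to correlate.

    f_clin / f_derm: feature batches from the clinical and dermoscopic
    extractors (still attached to their computation graphs, so feat_opt can
    update the extractors). The discriminator learns to tell the modalities
    apart; the extractors are then updated to fool it, so shared (correlated)
    structure survives in both feature vectors.
    """
    bce = nn.BCEWithLogitsLoss()
    # 1) train the discriminator to separate the modalities
    disc_opt.zero_grad()
    d_loss = bce(discriminator(f_clin.detach()), torch.ones(f_clin.size(0), 1)) + \
             bce(discriminator(f_derm.detach()), torch.zeros(f_derm.size(0), 1))
    d_loss.backward()
    disc_opt.step()
    # 2) train the extractors to make the modalities indistinguishable
    feat_opt.zero_grad()
    g_loss = bce(discriminator(f_derm), torch.ones(f_derm.size(0), 1))
    g_loss.backward()
    feat_opt.step()
    return d_loss.item(), g_loss.item()
```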