Pesquisa | Biblioteca Virtual em Saúde Fiocruz

1.

A dual-modal graph learning framework for identifying interaction events among chemical and biotech drugs.

Ru, Zhongying; Wu, Yangyang; Shao, Jinning; Yin, Jianwei; Qian, Linghui; Miao, Xiaoye.

Brief Bioinform ; 24(5)2023 09 20.

Artigo em Inglês | MEDLINE | ID: mdl-37507113

RESUMO

Drug-drug interaction (DDI) identification is essential to clinical medicine and drug discovery. The two categories of drugs (i.e. chemical drugs and biotech drugs) differ remarkably in molecular properties, action mechanisms, etc. Biotech drugs are up-to-comers but highly promising in modern medicine due to higher specificity and fewer side effects. However, existing DDI prediction methods only consider chemical drugs of small molecules, not biotech drugs of large molecules. Here, we build a large-scale dual-modal graph database named CB-DB and customize a graph-based framework named CB-TIP to reason event-aware DDIs for both chemical and biotech drugs. CB-DB comprehensively integrates various interaction events and two heterogeneous kinds of molecular structures. It imports endogenous proteins founded on the fact that most drugs take effects by interacting with endogenous proteins. In the modality of molecular structure, drugs and endogenous proteins are two heterogeneous kinds of graphs, while in the modality of interaction, they are nodes connected by events (i.e. edges of different relationships). CB-TIP employs graph representation learning methods to generate drug representations from either modality and then contrastively mixes them to predict how likely an event occurs when a drug meets another in an end-to-end manner. Experiments demonstrate CB-TIP's great superiority in DDI prediction and the promising potential of uncovering novel DDIs.

Assuntos

Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos , Humanos , Interações Medicamentosas , Descoberta de Drogas , Estrutura Molecular , Proteínas

2.

Hybrid multimodal fusion for graph learning in disease prediction.

Wang, Ruomei; Guo, Wei; Wang, Yongjie; Zhou, Xin; Leung, Jonathan Cyril; Yan, Shuo; Cui, Lizhen.

Methods ; 229: 41-48, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-38880433

RESUMO

Graph neural networks (GNNs) have gained significant attention in disease prediction where the latent embeddings of patients are modeled as nodes and the similarities among patients are represented through edges. The graph structure, which determines how information is aggregated and propagated, plays a crucial role in graph learning. Recent approaches typically create graphs based on patients' latent embeddings, which may not accurately reflect their real-world closeness. Our analysis reveals that raw data, such as demographic attributes and laboratory results, offers a wealth of information for assessing patient similarities and can serve as a compensatory measure for graphs constructed exclusively from latent embeddings. In this study, we first construct adaptive graphs from both latent representations and raw data respectively, and then merge these graphs via weighted summation. Given that the graphs may contain extraneous and noisy connections, we apply degree-sensitive edge pruning and kNN sparsification techniques to selectively sparsify and prune these edges. We conducted intensive experiments on two diagnostic prediction datasets, and the results demonstrate that our proposed method surpasses current state-of-the-art techniques.

Assuntos

Redes Neurais de Computação , Humanos , Aprendizado de Máquina , Algoritmos

3.

TransCDR: a deep learning model for enhancing the generalizability of drug activity prediction through transfer learning and multimodal data fusion.

Xia, Xiaoqiong; Zhu, Chaoyu; Zhong, Fan; Liu, Lei.

BMC Biol ; 22(1): 227, 2024 Oct 09.

Artigo em Inglês | MEDLINE | ID: mdl-39385185

RESUMO

BACKGROUND: Accurate and robust drug response prediction is of utmost importance in precision medicine. Although many models have been developed to utilize the representations of drugs and cancer cell lines for predicting cancer drug responses (CDR), their performances can be improved by addressing issues such as insufficient data modality, suboptimal fusion algorithms, and poor generalizability for novel drugs or cell lines. RESULTS: We introduce TransCDR, which uses transfer learning to learn drug representations and fuses multi-modality features of drugs and cell lines by a self-attention mechanism, to predict the IC50 values or sensitive states of drugs on cell lines. We are the first to systematically evaluate the generalization of the CDR prediction model to novel (i.e., never-before-seen) compound scaffolds and cell line clusters. TransCDR shows better generalizability than 8 state-of-the-art models. TransCDR outperforms its 5 variants that train drug encoders (i.e., RNN and AttentiveFP) from scratch under various scenarios. The most critical contributors among multiple drug notations and omics profiles are Extended Connectivity Fingerprint and genetic mutation. Additionally, the attention-based fusion module further enhances the predictive performance of TransCDR. TransCDR, trained on the GDSC dataset, demonstrates strong predictive performance on the external testing set CCLE. It is also utilized to predict missing CDRs on GDSC. Moreover, we investigate the biological mechanisms underlying drug response by classifying 7675 patients from TCGA into drug-sensitive or drug-resistant groups, followed by a Gene Set Enrichment Analysis. CONCLUSIONS: TransCDR emerges as a potent tool with significant potential in drug response prediction.

Assuntos

Antineoplásicos , Aprendizado Profundo , Humanos , Antineoplásicos/farmacologia , Linhagem Celular Tumoral , Medicina de Precisão/métodos

4.

MMDB: Multimodal dual-branch model for multi-functional bioactive peptide prediction.

Kang, Yan; Zhang, Huadong; Wang, Xinchao; Yang, Yun; Jia, Qi.

Anal Biochem ; 690: 115491, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-38460901

RESUMO

Bioactive peptides can hinder oxidative processes and microbial spoilage in foodstuffs and play important roles in treating diverse diseases and disorders. While most of the methods focus on single-functional bioactive peptides and have obtained promising prediction performance, it is still a significant challenge to accurately detect complex and diverse functions simultaneously with the quick increase of multi-functional bioactive peptides. In contrast to previous research on multi-functional bioactive peptide prediction based solely on sequence, we propose a novel multimodal dual-branch (MMDB) lightweight deep learning model that designs two different branches to effectively capture the complementary information of peptide sequence and structural properties. Specifically, a multi-scale dilated convolution with Bi-LSTM branch is presented to effectively model the different scales sequence properties of peptides while a multi-layer convolution branch is proposed to capture structural information. To the best of our knowledge, this is the first effective extraction of peptide sequence features using multi-scale dilated convolution without parameter increase. Multimodal features from both branches are integrated via a fully connected layer for multi-label classification. Compared to state-of-the-art methods, our MMDB model exhibits competitive results across metrics, with a 9.1% Coverage increase and 5.3% and 3.5% improvements in Precision and Accuracy, respectively.

5.

Tracing Microplastic Aging Processes Using Multimodal Deep Learning: A Predictive Model for Enhanced Traceability.

Li, Yunlong; Wang, Xue; Zhang, Han; Wang, Qing; Cao, Xun; Gong, Rongyi; Guo, Jianli; Shan, Jiajia.

Environ Sci Technol ; 2024 Sep 09.

Artigo em Inglês | MEDLINE | ID: mdl-39251361

RESUMO

The aging process of microplastics (MPs) affects their surface physicochemical properties, thereby influencing their behaviors in releasing harmful chemicals, adsorption of organic contaminants, sinking, and more. Understanding the aging process is crucial for evaluating MPs' environmental behaviors and risks, but tracing the aging process remains challenging. Here, we propose a multimodal deep learning model to trace typical aging factors of aged MPs based on MPs' physicochemical characteristics. A total of 1353 surface morphology images and 1353 Fourier transform infrared spectroscopy spectra were achieved from 130 aged MPs undergoing different aging processes, demonstrating that physicochemical properties of aged MPs vary from aging processes. The multimodal deep learning model achieved an accuracy of 93% in predicting the major aging factors of aged MPs. The multimodal deep learning model improves the model's accuracy by approximately 5-20% and reduces prediction bias compared to the single-modal model. In practice, the established model was performed to predict the major aging factors of naturally aged MPs collected from typical environment matrices. The prediction results aligned with the aging conditions of specific environments, as reported in previous studies. Our findings provide new insights into tracing and understanding the plastic aging process, contributing more accurately to the environmental risk assessment of aged MPs.

6.

Perspectives on Advancing Multimodal Learning in Environmental Science and Engineering Studies.

Liu, Wenjia; Chen, Jingwen; Wang, Haobo; Fu, Zhiqiang; Peijnenburg, Willie J G M; Hong, Huixiao.

Environ Sci Technol ; 2024 Sep 03.

Artigo em Inglês | MEDLINE | ID: mdl-39226136

RESUMO

The environment faces increasing anthropogenic impacts, resulting in a rapid increase in environmental issues that undermine the natural capital essential for human wellbeing. These issues are complex and often influenced by various factors represented by data with different modalities. While machine learning (ML) provides data-driven tools for addressing the environmental issues, the current ML models in environmental science and engineering (ES&E) often neglect the utilization of multimodal data. With the advancement in deep learning, multimodal learning (MML) holds promise for comprehensive descriptions of the environmental issues by harnessing data from diverse modalities. This advancement has the potential to significantly elevate the accuracy and robustness of prediction models in ES&E studies, providing enhanced solutions for various environmental modeling tasks. This perspective summarizes MML methodologies and proposes potential applications of MML models in ES&E studies, including environmental quality assessment, prediction of chemical hazards, and optimization of pollution control techniques. Additionally, we discuss the challenges associated with implementing MML in ES&E and propose future research directions in this domain.

7.

A novel deep learning model based on transformer and cross modality attention for classification of sleep stages.

Mostafaei, Sahar Hassanzadeh; Tanha, Jafar; Sharafkhaneh, Amir.

J Biomed Inform ; 157: 104689, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-39029770

RESUMO

The classification of sleep stages is crucial for gaining insights into an individual's sleep patterns and identifying potential health issues. Employing several important physiological channels in different views, each providing a distinct perspective on sleep patterns, can have a great impact on the efficiency of the classification models. In the context of neural networks and deep learning models, transformers are very effective, especially when dealing with time series data, and have shown remarkable compatibility with sequential data analysis as physiological channels. On the other hand, cross-modality attention by integrating information from multiple views of the data enables to capture relationships among different modalities, allowing models to selectively focus on relevant information from each modality. In this paper, we introduce a novel deep-learning model based on transformer encoder-decoder and cross-modal attention for sleep stage classification. The proposed model processes information from various physiological channels with different modalities using the Sleep Heart Health Study Dataset (SHHS) data and leverages transformer encoders for feature extraction and cross-modal attention for effective integration to feed into the transformer decoder. The combination of these elements increased the accuracy of the model up to 91.33% in classifying five classes of sleep stages. Empirical evaluations demonstrated the model's superior performance compared to standalone approaches and other state-of-the-art techniques, showcasing the potential of combining transformer and cross-modal attention for improved sleep stage classification.

Assuntos

Aprendizado Profundo , Redes Neurais de Computação , Fases do Sono , Humanos , Fases do Sono/fisiologia , Polissonografia/métodos , Eletroencefalografia/métodos , Algoritmos , Processamento de Sinais Assistido por Computador , Masculino

8.

Integrating Biomarkers From Virtual Reality and Magnetic Resonance Imaging for the Early Detection of Mild Cognitive Impairment Using a Multimodal Learning Approach: Validation Study.

Park, Bogyeom; Kim, Yuwon; Park, Jinseok; Choi, Hojin; Kim, Seong-Eun; Ryu, Hokyoung; Seo, Kyoungwon.

J Med Internet Res ; 26: e54538, 2024 Apr 17.

Artigo em Inglês | MEDLINE | ID: mdl-38631021

RESUMO

BACKGROUND: Early detection of mild cognitive impairment (MCI), a transitional stage between normal aging and Alzheimer disease, is crucial for preventing the progression of dementia. Virtual reality (VR) biomarkers have proven to be effective in capturing behaviors associated with subtle deficits in instrumental activities of daily living, such as challenges in using a food-ordering kiosk, for early detection of MCI. On the other hand, magnetic resonance imaging (MRI) biomarkers have demonstrated their efficacy in quantifying observable structural brain changes that can aid in early MCI detection. Nevertheless, the relationship between VR-derived and MRI biomarkers remains an open question. In this context, we explored the integration of VR-derived and MRI biomarkers to enhance early MCI detection through a multimodal learning approach. OBJECTIVE: We aimed to evaluate and compare the efficacy of VR-derived and MRI biomarkers in the classification of MCI while also examining the strengths and weaknesses of each approach. Furthermore, we focused on improving early MCI detection by leveraging multimodal learning to integrate VR-derived and MRI biomarkers. METHODS: The study encompassed a total of 54 participants, comprising 22 (41%) healthy controls and 32 (59%) patients with MCI. Participants completed a virtual kiosk test to collect 4 VR-derived biomarkers (hand movement speed, scanpath length, time to completion, and the number of errors), and T1-weighted MRI scans were performed to collect 22 MRI biomarkers from both hemispheres. Analyses of covariance were used to compare these biomarkers between healthy controls and patients with MCI, with age considered as a covariate. Subsequently, the biomarkers that exhibited significant differences between the 2 groups were used to train and validate a multimodal learning model aimed at early screening for patients with MCI among healthy controls. RESULTS: The support vector machine (SVM) using only VR-derived biomarkers achieved a sensitivity of 87.5% and specificity of 90%, whereas the MRI biomarkers showed a sensitivity of 90.9% and specificity of 71.4%. Moreover, a correlation analysis revealed a significant association between MRI-observed brain atrophy and impaired performance in instrumental activities of daily living in the VR environment. Notably, the integration of both VR-derived and MRI biomarkers into a multimodal SVM model yielded superior results compared to unimodal SVM models, achieving higher accuracy (94.4%), sensitivity (100%), specificity (90.9%), precision (87.5%), and F1-score (93.3%). CONCLUSIONS: The results indicate that VR-derived biomarkers, characterized by their high specificity, can be valuable as a robust, early screening tool for MCI in a broader older adult population. On the other hand, MRI biomarkers, known for their high sensitivity, excel at confirming the presence of MCI. Moreover, the multimodal learning approach introduced in our study provides valuable insights into the improvement of early MCI detection by integrating a diverse set of biomarkers.

Assuntos

Doença de Alzheimer , Disfunção Cognitiva , Realidade Virtual , Humanos , Idoso , Atividades Cotidianas , Disfunção Cognitiva/patologia , Imageamento por Ressonância Magnética/métodos , Doença de Alzheimer/diagnóstico , Biomarcadores

9.

Deep learning model integrating positron emission tomography and clinical data for prognosis prediction in non-small cell lung cancer patients.

Oh, Seungwon; Kang, Sae-Ryung; Oh, In-Jae; Kim, Min-Soo.

BMC Bioinformatics ; 24(1): 39, 2023 Feb 06.

Artigo em Inglês | MEDLINE | ID: mdl-36747153

RESUMO

BACKGROUND: Lung cancer is the leading cause of cancer-related deaths worldwide. The majority of lung cancers are non-small cell lung cancer (NSCLC), accounting for approximately 85% of all lung cancer types. The Cox proportional hazards model (CPH), which is the standard method for survival analysis, has several limitations. The purpose of our study was to improve survival prediction in patients with NSCLC by incorporating prognostic information from F-18 fluorodeoxyglucose positron emission tomography (FDG PET) images into a traditional survival prediction model using clinical data. RESULTS: The multimodal deep learning model showed the best performance, with a C-index and mean absolute error of 0.756 and 399 days under a five-fold cross-validation, respectively, followed by ResNet3D for PET (0.749 and 405 days) and CPH for clinical data (0.747 and 583 days). CONCLUSION: The proposed deep learning-based integrative model combining the two modalities improved the survival prediction in patients with NSCLC.

Assuntos

Carcinoma Pulmonar de Células não Pequenas , Aprendizado Profundo , Neoplasias Pulmonares , Humanos , Carcinoma Pulmonar de Células não Pequenas/diagnóstico por imagem , Neoplasias Pulmonares/diagnóstico por imagem , Fluordesoxiglucose F18 , Compostos Radiofarmacêuticos , Tomografia por Emissão de Pósitrons , Estudos Retrospectivos

10.

Temporal configuration and modality of components determine the performance of bumble bees during the learning of a multimodal signal.

Riveros, Andre J.

J Exp Biol ; 226(1)2023 01 01.

Artigo em Inglês | MEDLINE | ID: mdl-36601985

RESUMO

Across communicative systems, the ability of compound signals to enhance receiver's perception and decoding is a potent explanation for the evolution of complexity. In nature, complex signaling involves spatiotemporal variation in perception of signal components; yet, how the synchrony between components affects performance of the receiver is much less understood. In the coevolution of plants and pollinators, bees are a model for understanding how visual and chemical components of floral displays may interact to influence performance. Understanding whether the temporal dimension of signal components impacts performance is central for evaluating hypotheses about the facilitation of information processing and for predicting how particular trait combinations function in nature. Here, I evaluated the role of the temporal dimension by testing the performance of bumble bees under restrained conditions while learning a bimodal (olfactory and visual) stimulus. I trained bumble bees under six different stimuli varying in their internal synchrony and structure. I also evaluated the acquisition of the individual components. I show that the temporal configuration and the identity of the components impact their combined and separate acquisition. Performance was favored by partial asynchrony and the initial presentation of the visual component, leading to higher acquisition of the olfactory component. This indicates that compound stimuli resembling the partially synchronous presentation of a floral display favor performance in a pollinator, thus highlighting the time dimension as crucial for the enhancement. Moreover, this supports the hypothesis that the evolution of multimodal floral signals may have been favored by the asynchrony perceived by the receiver during free flight.

Assuntos

Flores , Aprendizagem , Abelhas , Animais , Plantas , Olfato , Cognição , Polinização

11.

A scoping review on multimodal deep learning in biomedical images and texts.

Sun, Zhaoyi; Lin, Mingquan; Zhu, Qingqing; Xie, Qianqian; Wang, Fei; Lu, Zhiyong; Peng, Yifan.

J Biomed Inform ; 146: 104482, 2023 10.

Artigo em Inglês | MEDLINE | ID: mdl-37652343

RESUMO

OBJECTIVE: Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it only caught researchers' attention recently. To this end, there is a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. METHODS: In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps with a focus on biomedical images and texts joint learning, mainly because these two were the most commonly available data types in MDL research. RESULT: This study reviewed the current uses of multimodal deep learning on five tasks: (1) Report generation, (2) Visual question answering, (3) Cross-modal retrieval, (4) Computer-aided diagnosis, and (5) Semantic segmentation. CONCLUSION: Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.

Assuntos

Aprendizado Profundo , Diagnóstico por Imagem , Semântica , Processamento de Linguagem Natural , Diagnóstico por Computador

12.

Multimodal learning on graphs for disease relation extraction.

Lin, Yucong; Lu, Keming; Yu, Sheng; Cai, Tianxi; Zitnik, Marinka.

J Biomed Inform ; 143: 104415, 2023 07.

Artigo em Inglês | MEDLINE | ID: mdl-37276949

RESUMO

Disease knowledge graphs have emerged as a powerful tool for artificial intelligence to connect, organize, and access diverse information about diseases. Relations between disease concepts are often distributed across multiple datasets, including unstructured plain text datasets and incomplete disease knowledge graphs. Extracting disease relations from multimodal data sources is thus crucial for constructing accurate and comprehensive disease knowledge graphs. We introduce REMAP, a multimodal approach for disease relation extraction. The REMAP machine learning approach jointly embeds a partial, incomplete knowledge graph and a medical language dataset into a compact latent vector space, aligning the multimodal embeddings for optimal disease relation extraction. Additionally, REMAP utilizes a decoupled model structure to enable inference in single-modal data, which can be applied under missing modality scenarios. We apply the REMAP approach to a disease knowledge graph with 96,913 relations and a text dataset of 1.24 million sentences. On a dataset annotated by human experts, REMAP improves language-based disease relation extraction by 10.0% (accuracy) and 17.2% (F1-score) by fusing disease knowledge graphs with language information. Furthermore, REMAP leverages text information to recommend new relationships in the knowledge graph, outperforming graph-based methods by 8.4% (accuracy) and 10.4% (F1-score). REMAP is a flexible multimodal approach for extracting disease relations by fusing structured knowledge and language information. This approach provides a powerful model to easily find, access, and evaluate relations between disease concepts.

Assuntos

Inteligência Artificial , Aprendizado de Máquina , Humanos , Unified Medical Language System , Idioma , Processamento de Linguagem Natural

13.

A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth.

Culley, Christopher; Vijayakumar, Supreeta; Zampieri, Guido; Angione, Claudio.

Proc Natl Acad Sci U S A ; 117(31): 18869-18879, 2020 08 04.

Artigo em Inglês | MEDLINE | ID: mdl-32675233

RESUMO

Metabolic modeling and machine learning are key components in the emerging next generation of systems and synthetic biology tools, targeting the genotype-phenotype-environment relationship. Rather than being used in isolation, it is becoming clear that their value is maximized when they are combined. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning-based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and we test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, therefore verifying its robustness to an additional independent dataset. Finally, we show that introducing mechanistic flux features improves the predictions also for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models, based on known biochemistry, can contribute with disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.

Assuntos

Aprendizado de Máquina , Análise do Fluxo Metabólico/métodos , Saccharomyces cerevisiae , Biologia de Sistemas/métodos , Modelos Biológicos , Saccharomyces cerevisiae/citologia , Saccharomyces cerevisiae/genética , Saccharomyces cerevisiae/metabolismo , Transcriptoma

14.

Multimodal Federated Learning: A Survey.

Che, Liwei; Wang, Jiaqi; Zhou, Yao; Ma, Fenglong.

Sensors (Basel) ; 23(15)2023 Aug 06.

Artigo em Inglês | MEDLINE | ID: mdl-37571768

RESUMO

Federated learning (FL), which provides a collaborative training scheme for distributed data sources with privacy concerns, has become a burgeoning and attractive research area. Most existing FL studies focus on taking unimodal data, such as image and text, as the model input and resolving the heterogeneity challenge, i.e., the challenge of non-identical distribution (non-IID) caused by a data distribution imbalance related to data labels and data amount. In real-world applications, data are usually described by multiple modalities. However, to the best of our knowledge, only a handful of studies have been conducted to improve system performance utilizing multimodal data. In this survey paper, we identify the significance of this emerging research topic of multimodal federated learning (MFL) and present a literature review on the state-of-art MFL methods. Furthermore, we categorize multimodal federated learning into congruent and incongruent multimodal federated learning based on whether all clients possess the same modal combinations. We investigate the feasible application tasks and related benchmarks for MFL. Lastly, we summarize the promising directions and fundamental challenges in this field for future research.

15.

IDAF: Iterative Dual-Scale Attentional Fusion Network for Automatic Modulation Recognition.

Liu, Bohan; Ge, Ruixing; Zhu, Yuxuan; Zhang, Bolin; Zhang, Xiaokai; Bao, Yanfei.

Sensors (Basel) ; 23(19)2023 Sep 28.

Artigo em Inglês | MEDLINE | ID: mdl-37836964

RESUMO

Recently, deep learning models have been widely applied to modulation recognition, and they have become a hot topic due to their excellent end-to-end learning capabilities. However, current methods are mostly based on uni-modal inputs, which suffer from incomplete information and local optimization. To complement the advantages of different modalities, we focus on the multi-modal fusion method. Therefore, we introduce an iterative dual-scale attentional fusion (iDAF) method to integrate multimodal data. Firstly, two feature maps with different receptive field sizes are constructed using local and global embedding layers. Secondly, the feature inputs are iterated into the iterative dual-channel attention module (iDCAM), where the two branches capture the details of high-level features and the global weights of each modal channel, respectively. The iDAF not only extracts the recognition characteristics of each of the specific domains, but also complements the strengths of different modalities to obtain a fruitful view. Our iDAF achieves a recognition accuracy of 93.5% at 10 dB and 0.6232 at full signal-to-noise ratio (SNR). The comparative experiments and ablation studies effectively demonstrate the effectiveness and superiority of the iDAF.

16.

Progressive Learning of a Multimodal Classifier Accounting for Different Modality Combinations.

John, Vijay; Kawanishi, Yasutomo.

Sensors (Basel) ; 23(10)2023 May 11.

Artigo em Inglês | MEDLINE | ID: mdl-37430579

RESUMO

In classification tasks, such as face recognition and emotion recognition, multimodal information is used for accurate classification. Once a multimodal classification model is trained with a set of modalities, it estimates the class label by using the entire modality set. A trained classifier is typically not formulated to perform classification for various subsets of modalities. Thus, the model would be useful and portable if it could be used for any subset of modalities. We refer to this problem as the multimodal portability problem. Moreover, in the multimodal model, classification accuracy is reduced when one or more modalities are missing. We term this problem the missing modality problem. This article proposes a novel deep learning model, termed KModNet, and a novel learning strategy, termed progressive learning, to simultaneously address missing modality and multimodal portability problems. KModNet, formulated with the transformer, contains multiple branches corresponding to different k-combinations of the modality set S. KModNet is trained using a multi-step progressive learning framework, where the k-th step uses a k-modal model to train different branches up to the k-th combination branch. To address the missing modality problem, the training multimodal data is randomly ablated. The proposed learning framework is formulated and validated using two multimodal classification problems: audio-video-thermal person classification and audio-video emotion classification. The two classification problems are validated using the Speaking Faces, RAVDESS, and SAVEE datasets. The results demonstrate that the progressive learning framework enhances the robustness of multimodal classification, even under the conditions of missing modalities, while being portable to different modality subsets.

Assuntos

Fontes de Energia Elétrica , Reconhecimento Facial , Humanos , Emoções , Reconhecimento Psicológico

17.

Multimodal Early Birth Weight Prediction Using Multiple Kernel Learning.

Camargo-Marín, Lisbeth; Guzmán-Huerta, Mario; Piña-Ramirez, Omar; Perez-Gonzalez, Jorge.

Sensors (Basel) ; 24(1)2023 Dec 19.

Artigo em Inglês | MEDLINE | ID: mdl-38202864

RESUMO

In this work, a novel multimodal learning approach for early prediction of birth weight is presented. Fetal weight is one of the most relevant indicators in the assessment of fetal health status. The aim is to predict early birth weight using multimodal maternal-fetal variables from the first trimester of gestation (Anthropometric data, as well as metrics obtained from Fetal Biometry, Doppler and Maternal Ultrasound). The proposed methodology starts with the optimal selection of a subset of multimodal features using an ensemble-based approach of feature selectors. Subsequently, the selected variables feed the nonparametric Multiple Kernel Learning regression algorithm. At this stage, a set of kernels is selected and weighted to maximize performance in birth weight prediction. The proposed methodology is validated and compared with other computational learning algorithms reported in the state of the art. The obtained results (absolute error of 234 g) suggest that the proposed methodology can be useful as a tool for the early evaluation and monitoring of fetal health status through indicators such as birth weight.

Assuntos

Feto , Cuidado Pré-Natal , Humanos , Feminino , Gravidez , Peso ao Nascer , Algoritmos , Antropometria

18.

Effective Techniques for Multimodal Data Fusion: A Comparative Analysis.

Pawlowski, Maciej; Wróblewska, Anna; Sysko-Romanczuk, Sylwia.

Sensors (Basel) ; 23(5)2023 Feb 21.

Artigo em Inglês | MEDLINE | ID: mdl-36904585

RESUMO

Data processing in robotics is currently challenged by the effective building of multimodal and common representations. Tremendous volumes of raw data are available and their smart management is the core concept of multimodal learning in a new paradigm for data fusion. Although several techniques for building multimodal representations have been proven successful, they have not yet been analyzed and compared in a given production setting. This paper explored three of the most common techniques, (1) the late fusion, (2) the early fusion, and (3) the sketch, and compared them in classification tasks. Our paper explored different types of data (modalities) that could be gathered by sensors serving a wide range of sensor applications. Our experiments were conducted on Amazon Reviews, MovieLens25M, and Movie-Lens1M datasets. Their outcomes allowed us to confirm that the choice of fusion technique for building multimodal representation is crucial to obtain the highest possible model performance resulting from the proper modality combination. Consequently, we designed criteria for choosing this optimal data fusion technique.

19.

Use of multimodal dataset in AI for detecting glaucoma based on fundus photographs assessed with OCT: focus group study on high prevalence of myopia.

Lim, Wee Shin; Ho, Heng-Yen; Ho, Heng-Chen; Chen, Yan-Wu; Lee, Chih-Kuo; Chen, Pao-Ju; Lai, Feipei; Jang, Jyh-Shing Roger; Ko, Mei-Lan.

BMC Med Imaging ; 22(1): 206, 2022 11 24.

Artigo em Inglês | MEDLINE | ID: mdl-36434508

RESUMO

BACKGROUND: Glaucoma is one of the major causes of blindness; it is estimated that over 110 million people will be affected by glaucoma worldwide by 2040. Research on glaucoma detection using deep learning technology has been increasing, but the diagnosis of glaucoma in a large population with high incidence of myopia remains a challenge. This study aimed to provide a decision support system for the automatic detection of glaucoma using fundus images, which can be applied for general screening, especially in areas of high incidence of myopia. METHODS: A total of 1,155 fundus images were acquired from 667 individuals with a mean axial length of 25.60 ± 2.0 mm at the National Taiwan University Hospital, Hsinchu Br. These images were graded based on the findings of complete ophthalmology examinations, visual field test, and optical coherence tomography into three groups: normal (N, n = 596), pre-perimetric glaucoma (PPG, n = 66), and glaucoma (G, n = 493), and divided into a training-validation (N: 476, PPG: 55, G: 373) and test (N: 120, PPG: 11, G: 120) sets. A multimodal model with the Xception model as image feature extraction and machine learning algorithms [random forest (RF), support vector machine (SVM), dense neural network (DNN), and others] was applied. RESULTS: The Xception model classified the N, PPG, and G groups with 93.9% of the micro-average area under the receiver operating characteristic curve (AUROC) with tenfold cross-validation. Although normal and glaucoma sensitivity can reach 93.51% and 86.13% respectively, the PPG sensitivity was only 30.27%. The AUROC increased to 96.4% in the N + PPG and G groups. The multimodal model with the N + PPG and G groups showed that the AUROCs of RF, SVM, and DNN were 99.56%, 99.59%, and 99.10%, respectively; The N and PPG + G groups had less than 1% difference. The test set showed an overall 3%-5% less AUROC than the validation results. CONCLUSION: The multimodal model had good AUROC while detecting glaucoma in a population with high incidence of myopia. The model shows the potential for general automatic screening and telemedicine, especially in Asia. TRIAL REGISTRATION: The study was approved by the Institutional Review Board of the National Taiwan University Hospital, Hsinchu Branch (no. NTUHHCB 108-025-E).

Assuntos

Glaucoma , Miopia , Humanos , Prevalência , Grupos Focais , Glaucoma/diagnóstico por imagem , Glaucoma/epidemiologia , Miopia/diagnóstico por imagem , Miopia/epidemiologia , Inteligência Artificial

20.

Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization.

Tominaga, Rihito; Seo, Masataka.

Sensors (Basel) ; 23(1)2022 Dec 26.

Artigo em Inglês | MEDLINE | ID: mdl-36616847

RESUMO

Image generation from natural language has become a very promising area of research on multimodal learning in recent years. In recent years, the performance of this theme has improved rapidly, and the release of powerful tools has caused a great response in various places. The Stacked Generative Adversarial Networks (StackGAN) model is a representative method to generate images from text descriptions. Although it can generate high-resolution images, it involves several limitations; some of the images generated are typically unintelligible, and mode collapse may occur. Therefore, in this study, we aim to solve these two problems to generate images that follow a given text description more closely. First, we incorporate a new consistency regularization technique for conditional generation tasks into StackGAN, called Improved Consistency Regularization or ICR. The ICR technique learns the meaning of data by matching the semantic information of input data before and after data augmentation, and can also stabilize learning in adversarial networks. In this research, this method mainly suppresses mode collapse by expanding the variation of generated images. However, this method may lead to excessive variations in the generated images, which may result in images that do not match the meaning of the input text or that are ambiguous. Therefore, we further propose a new regularization method called ICCR as a modification of ICR, which is designed to perform conditional generation tasks and eliminate the negative impacts of the generator. This method realized the generation of various images along the input text. The proposed StackGAN with ICCR performed 16% better than StackGAN and 4% better than StackGAN with ICR and AttnGAN on the Inception Score using the CUB dataset. AttnGAN, similar to StackGAN, is a GAN-based text-to-image model that incorporates the attention mechanism, which has achieved great results in recent years. It is very important that our proposed model, which incorporates ICCR into a simple model, obtained better results than AttnGAN. In addition, StackGAN with ICCR was effective in eliminating mode collapse. The probability of mode collapse in the original StackGAN was 20%, while in StackGAN with ICCR the probability was 0%. In the questionnaire survey, our proposed method was rated 18% higher than StackGAN with ICR. This indicates that ICCR is more effective for conditional tasks than ICR.

Assuntos

Processamento de Imagem Assistida por Computador , Envio de Mensagens de Texto , Processamento de Imagem Assistida por Computador/métodos , Semântica , Probabilidade , Inquéritos e Questionários

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA