Results 1 - 20 of 124
1.
Environ Sci Technol ; 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39226136

ABSTRACT

The environment faces increasing anthropogenic impacts, resulting in a rapid rise in environmental issues that undermine the natural capital essential for human well-being. These issues are complex and often influenced by various factors represented by data with different modalities. While machine learning (ML) provides data-driven tools for addressing these issues, current ML models in environmental science and engineering (ES&E) often neglect multimodal data. With the advancement of deep learning, multimodal learning (MML) holds promise for comprehensive descriptions of environmental issues by harnessing data from diverse modalities. This advancement has the potential to significantly elevate the accuracy and robustness of prediction models in ES&E studies, providing enhanced solutions for various environmental modeling tasks. This perspective summarizes MML methodologies and proposes potential applications of MML models in ES&E studies, including environmental quality assessment, prediction of chemical hazards, and optimization of pollution control techniques. Additionally, we discuss the challenges associated with implementing MML in ES&E and propose future research directions in this domain.

2.
Heliyon ; 10(16): e36236, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39262949

ABSTRACT

Lithium-ion batteries are widely used in various applications, including electric vehicles and renewable energy storage. Predicting the remaining useful life (RUL) of batteries is crucial for ensuring reliable and efficient operation, as well as reducing maintenance costs. However, determining the life cycle of batteries in real-world scenarios is challenging, and existing methods have limitations in predicting the number of cycles iteratively. In addition, existing works often oversimplify the datasets, neglecting important battery features such as temperature, internal resistance, and material type. To address these limitations, this paper proposes a two-stage RUL prediction scheme for lithium-ion batteries using a spatio-temporal multimodal attention network (ST-MAN). The proposed ST-MAN is designed to capture the complex spatio-temporal dependencies in the battery data, including the features that are often neglected in existing works. Despite operating without prior knowledge of end-of-life (EOL) events, our method consistently achieves lower error rates, with a mean absolute error (MAE) of 0.0275 and a mean square error (MSE) of 0.0014, compared to existing convolutional neural network (CNN)- and long short-term memory (LSTM)-based methods. The proposed method has the potential to improve the reliability and efficiency of battery operations and is applicable in various industries.
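As a rough illustration of the general idea (not the authors' ST-MAN code), the following minimal PyTorch sketch mixes sensor channels at each cycle and applies temporal self-attention before regressing an RUL value; the layer sizes and feature set are assumptions.

```python
# Minimal sketch of a spatio-temporal attention regressor for battery RUL,
# assuming multivariate cycle data (e.g., voltage, current, temperature,
# internal resistance); illustrative only, not the authors' ST-MAN.
import torch
import torch.nn as nn

class SpatioTemporalAttentionRUL(nn.Module):
    def __init__(self, n_features=4, d_model=64, n_heads=4):
        super().__init__()
        # "spatial" mixing across sensor channels at each time step
        self.channel_proj = nn.Linear(n_features, d_model)
        # temporal self-attention over the cycle sequence
        self.temporal_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(d_model, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):                      # x: (batch, time, n_features)
        h = self.channel_proj(x)               # mix channels -> (batch, time, d_model)
        h, _ = self.temporal_attn(h, h, h)     # attend over time
        return self.head(h.mean(dim=1))        # pooled sequence -> RUL estimate

rul_model = SpatioTemporalAttentionRUL()
pred = rul_model(torch.randn(8, 100, 4))       # 8 cells, 100 cycles, 4 sensors
```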

3.
Adv Sci (Weinh) ; : e2406242, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39258724

ABSTRACT

Multimodal machine learning, as a prospective advancement in artificial intelligence, endeavors to emulate the brain's multimodal learning abilities with the objective of enhancing interactions with humans. However, this approach requires simultaneous processing of diverse types of data, leading to increased model complexity, longer training times, and higher energy consumption. Multimodal neuromorphic devices have the capability to preprocess spatio-temporal information from various physical signals into unified electrical signals with high information density, thereby enabling more biologically plausible multimodal learning with low complexity and high energy efficiency. Here, this work compares multimodal machine learning with multimodal neuromorphic computing, followed by an overview of the key characteristics of multimodal neuromorphic devices. The bio-plausible operational principles and the multimodal learning abilities of emerging devices are examined, and the devices are classified into heterogeneous and homogeneous multimodal neuromorphic devices. Subsequently, this work provides a detailed description of the multimodal learning capabilities demonstrated by neuromorphic circuits and their respective applications. Finally, this work highlights the limitations and challenges of multimodal neuromorphic computing to provide insight into potential future research directions.

4.
Artif Intell Med ; 157: 102972, 2024 Aug 31.
Article in English | MEDLINE | ID: mdl-39232270

ABSTRACT

The integration of morphological attributes extracted from histopathological images and genomic data holds significant importance in advancing tumor diagnosis, prognosis, and grading. Histopathological images are acquired through microscopic examination of tissue slices, providing valuable insights into cellular structures and pathological features. Genomic data, on the other hand, provide information about tumor gene expression and functionality. The fusion of these two distinct data types is crucial for gaining a more comprehensive understanding of tumor characteristics and progression. In the past, many studies relied on single-modal approaches for tumor diagnosis. However, these approaches had limitations, as they were unable to fully harness the information from multiple data sources. To address these limitations, researchers have turned to multi-modal methods that concurrently leverage both histopathological images and genomic data. These methods better capture the multifaceted nature of tumors and enhance diagnostic accuracy. Nonetheless, existing multi-modal methods have, to some extent, oversimplified the extraction processes for both modalities as well as the fusion process. In this study, we present a dual-branch neural network, SG-Fusion. Specifically, for the histopathological modality, we utilize the Swin-Transformer structure to capture both local and global features and incorporate contrastive learning to encourage the model to discern commonalities and differences in the representation space. For the genomic modality, we develop a graph convolutional network based on gene functional and expression-level similarities. Additionally, our model integrates a cross-attention module to enhance information interaction and employs divergence-based regularization to improve generalization. Validation on glioma datasets from The Cancer Genome Atlas demonstrates that our SG-Fusion model outperforms both single-modal methods and existing multi-modal approaches in both survival analysis and tumor grading.
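The cross-attention fusion step described above can be pictured with a small PyTorch sketch in which each modality's token sequence queries the other; the dimensions, pooling, and module names are illustrative assumptions rather than the SG-Fusion implementation.

```python
# Hedged sketch of cross-attention fusion between image-branch and gene-branch
# token embeddings; sizes and pooling are assumptions for illustration.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.img_to_gene = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gene_to_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, img_tokens, gene_tokens):
        # each modality queries the other, then the attended features are pooled
        img_ctx, _ = self.img_to_gene(img_tokens, gene_tokens, gene_tokens)
        gene_ctx, _ = self.gene_to_img(gene_tokens, img_tokens, img_tokens)
        return torch.cat([img_ctx.mean(1), gene_ctx.mean(1)], dim=-1)

fusion = CrossAttentionFusion()
fused = fusion(torch.randn(2, 196, 256), torch.randn(2, 50, 256))  # -> (2, 512)
```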

5.
Neural Netw ; 180: 106670, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39299035

ABSTRACT

Radiologists must utilize medical images of multiple modalities for tumor segmentation and diagnosis due to the limitations of medical imaging technology and the diversity of tumor signals. This has led to the development of multimodal learning in medical image segmentation. However, the redundancy among modalities creates challenges for existing subtraction-based joint learning methods, such as misjudging the importance of modalities, ignoring modality-specific information, and increasing cognitive load. These issues ultimately decrease segmentation accuracy and increase the risk of overfitting. This paper presents the complementary information mutual learning (CIML) framework, which can mathematically model and address the negative impact of inter-modal redundant information. CIML adopts the idea of addition and removes inter-modal redundant information through inductive bias-driven task decomposition and message passing-based redundancy filtering. CIML first decomposes the multimodal segmentation task into multiple subtasks based on expert prior knowledge, minimizing the information dependence between modalities. Furthermore, CIML introduces a scheme in which each modality can extract information from other modalities additively through message passing. To achieve non-redundancy of the extracted information, redundancy filtering is transformed into complementary information learning inspired by the variational information bottleneck. The complementary information learning procedure can be efficiently solved by variational inference and cross-modal spatial attention. Numerical results from the verification task and standard benchmarks indicate that CIML efficiently removes redundant information between modalities, outperforming state-of-the-art (SOTA) methods in validation accuracy and segmentation quality. Notably, message passing-based redundancy filtering allows neural network visualization techniques to visualize the knowledge relationships among different modalities, which reflects interpretability.

6.
Environ Sci Technol ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39251361

ABSTRACT

The aging process of microplastics (MPs) affects their surface physicochemical properties, thereby influencing their behaviors in releasing harmful chemicals, adsorbing organic contaminants, sinking, and more. Understanding the aging process is crucial for evaluating MPs' environmental behaviors and risks, but tracing the aging process remains challenging. Here, we propose a multimodal deep learning model to trace typical aging factors of aged MPs based on MPs' physicochemical characteristics. A total of 1353 surface morphology images and 1353 Fourier transform infrared spectroscopy spectra were obtained from 130 aged MPs that underwent different aging processes, demonstrating that the physicochemical properties of aged MPs vary with the aging process. The multimodal deep learning model achieved an accuracy of 93% in predicting the major aging factors of aged MPs. The multimodal model improves accuracy by approximately 5-20% and reduces prediction bias compared to the single-modal model. In practice, the established model was applied to predict the major aging factors of naturally aged MPs collected from typical environmental matrices. The prediction results aligned with the aging conditions of the specific environments, as reported in previous studies. Our findings provide new insights into tracing and understanding the plastic aging process, contributing to more accurate environmental risk assessment of aged MPs.
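A minimal two-branch late-fusion classifier of the kind suggested here, combining a surface-morphology image with an FTIR spectrum, might look as follows; the architecture, input sizes, and number of aging-factor classes are assumptions, not the paper's model.

```python
# Illustrative two-branch late-fusion classifier for an image plus an FTIR
# spectrum; all sizes are placeholder assumptions.
import torch
import torch.nn as nn

class ImageSpectrumFusion(nn.Module):
    def __init__(self, n_classes=4, spectrum_len=1800):
        super().__init__()
        self.img_branch = nn.Sequential(                 # surface-morphology image
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.spec_branch = nn.Sequential(                # FTIR spectrum
            nn.Linear(spectrum_len, 128), nn.ReLU(), nn.Linear(128, 32))
        self.classifier = nn.Linear(32 + 32, n_classes)  # predicted aging factor

    def forward(self, image, spectrum):
        fused = torch.cat([self.img_branch(image), self.spec_branch(spectrum)], dim=1)
        return self.classifier(fused)

model = ImageSpectrumFusion()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 1800))
```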

7.
Med Image Anal ; 97: 103303, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39154617

ABSTRACT

The increasing availability of biomedical data creates valuable resources for developing new deep learning algorithms to support experts, especially in domains where collecting large volumes of annotated data is not trivial. Biomedical data include several modalities containing complementary information, such as medical images and reports: images are often large and encode low-level information, while reports include a summarized high-level description of the findings identified within the data, often concerning only a small part of the image. However, only a few methods allow the visual content of images to be effectively linked with the textual content of reports, preventing medical specialists from properly benefitting from the recent opportunities offered by deep learning models. This paper introduces a multimodal architecture creating a robust biomedical data representation that encodes fine-grained text representations within image embeddings. The architecture aims to tackle data scarcity (combining supervised and self-supervised learning) and to create multimodal biomedical ontologies. The architecture is trained on over 6,000 colon whole-slide images (WSIs), paired with the corresponding reports, collected from two digital pathology workflows. The evaluation of the multimodal architecture involves three tasks: WSI classification (on data from the pathology workflows and from public repositories), multimodal data retrieval, and linking between textual and visual concepts. Noticeably, the latter two tasks are available by architectural design without further training, showing that the multimodal architecture can be adopted as a backbone to solve peculiar tasks. The multimodal data representation outperforms the unimodal one on the classification of colon WSIs and halves the data needed to reach accurate performance, reducing the computational power required and thus the carbon footprint. The combination of images and reports exploiting self-supervised algorithms makes it possible to mine databases without needing new annotations provided by experts, extracting new information. In particular, the multimodal visual ontology, linking semantic concepts to images, may pave the way to advancements in medicine and biomedical analysis domains, not limited to histopathology.
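One common way to link image and report embeddings in a self-supervised fashion is a symmetric contrastive alignment objective; the sketch below illustrates that generic idea under assumed encoder outputs and temperature, and is not the paper's architecture.

```python
# Illustrative contrastive image-report alignment loss; the encoders, projection
# size, and temperature are assumptions.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim) projections of paired WSIs and reports."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature    # pairwise similarities
    targets = torch.arange(image_emb.size(0))          # matching pairs on the diagonal
    # symmetric cross-entropy: images retrieve their report and vice versa
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = contrastive_alignment_loss(torch.randn(16, 512), torch.randn(16, 512))
```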


Subjects
Deep Learning , Humans , Algorithms , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Image Processing, Computer-Assisted/methods
8.
Sci Rep ; 14(1): 20120, 2024 08 29.
Article in English | MEDLINE | ID: mdl-39209988

ABSTRACT

Autism spectrum disorder (ASD) is diagnosed using comprehensive behavioral information. Neuroimaging offers additional information but lacks clinical utility for diagnosis. This study investigates whether multiple forms of magnetic resonance imaging (MRI) contrast can be used individually and in combination to produce a categorical classification of young individuals with ASD. MRI data were accessed from the Autism Brain Imaging Data Exchange (ABIDE). Young participants (ages 2-30) were selected, forming two cohorts of 702 participants in total: 351 with ASD and 351 controls. Image-based classification was performed using one-channel and two-channel inputs to 3D-DenseNet deep learning networks. The models were trained and tested using tenfold cross-validation. Two-channel models combined structural MRI (sMRI) maps with either amplitude of low-frequency fluctuations (ALFF) or fractional ALFF (fALFF) maps from resting-state functional MRI (rs-fMRI). All models produced classification accuracy exceeding 65.1%. The two-channel ALFF-sMRI model achieved the highest mean accuracy of 76.9% ± 2.34. The one-channel ALFF-based model alone had a mean accuracy of 72% ± 3.1. This study leveraged the ABIDE dataset to produce ASD classification results comparable to or exceeding literature values. The deep learning approach was conducive to diverse neuroimaging inputs. Findings reveal that the ALFF-sMRI two-channel model outperformed all others.
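The two-channel input scheme can be illustrated by stacking an sMRI map and an ALFF map along the channel dimension before a 3D convolutional backbone; the shapes below are illustrative, and the 3D-DenseNet itself is omitted.

```python
# Sketch of forming a two-channel 3D input from an sMRI map and an ALFF map;
# the voxel grid size and the stem layer are placeholder assumptions.
import torch
import torch.nn as nn

smri = torch.randn(8, 1, 96, 96, 96)    # (batch, channel, D, H, W) structural map
alff = torch.randn(8, 1, 96, 96, 96)    # ALFF map resampled to the same grid
two_channel = torch.cat([smri, alff], dim=1)   # -> (8, 2, 96, 96, 96)

stem = nn.Conv3d(in_channels=2, out_channels=32, kernel_size=3, padding=1)
features = stem(two_channel)             # fed onward to a 3D-DenseNet-style network
```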


Subjects
Autism Spectrum Disorder , Brain , Magnetic Resonance Imaging , Neuroimaging , Humans , Autism Spectrum Disorder/diagnostic imaging , Autism Spectrum Disorder/classification , Male , Magnetic Resonance Imaging/methods , Adolescent , Female , Child , Young Adult , Adult , Neuroimaging/methods , Preschool Child , Brain/diagnostic imaging , Deep Learning , Brain Mapping/methods
9.
Int J Biol Macromol ; 276(Pt 2): 133825, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39002900

ABSTRACT

Predicting compound-induced inhibition of cardiac ion channels is crucial and challenging, significantly impacting cardiac drug efficacy and safety assessments. Despite the development of various computational methods for compound-induced inhibition prediction in cardiac ion channels, their performance remains limited. Most methods struggle to fuse multi-source data, relying solely on specific dataset training, leading to poor accuracy and generalization. We introduce MultiCBlo, a model that fuses multimodal information through a progressive learning approach, designed to predict compound-induced inhibition of cardiac ion channels with high accuracy. MultiCBlo employs progressive multimodal information fusion technology to integrate the compound's SMILES sequence, graph structure, and fingerprint, enhancing its representation. To our knowledge, this is the first application of progressive multimodal learning for predicting compound-induced inhibition of cardiac ion channels. The objective of this study was to predict the compound-induced inhibition of three major cardiac ion channels: hERG, Cav1.2, and Nav1.5. The results indicate that MultiCBlo significantly outperforms current models in predicting compound-induced inhibition of cardiac ion channels. We hope that MultiCBlo will facilitate cardiac drug development and reduce compound toxicity risks. Code and data are accessible at: https://github.com/taowang11/MultiCBlo. The online prediction platform is freely accessible at: https://huggingface.co/spaces/wtttt/PCICB.
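For illustration, two of the three modalities mentioned above (an ECFP-style fingerprint and a molecular graph) can be derived from a SMILES string with RDKit as sketched below; this is a generic preprocessing example, not MultiCBlo's code, and the downstream fusion network is not shown.

```python
# Deriving fingerprint and graph modalities from a SMILES string with RDKit;
# the molecule and feature choices are illustrative assumptions.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
import numpy as np

smiles = "CCOC(=O)c1ccccc1"                    # illustrative molecule
mol = Chem.MolFromSmiles(smiles)

fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)   # ECFP-style bits
fingerprint = np.zeros((2048,))
DataStructs.ConvertToNumpyArray(fp, fingerprint)

adjacency = Chem.GetAdjacencyMatrix(mol)        # graph structure (atoms as nodes)
atom_numbers = [a.GetAtomicNum() for a in mol.GetAtoms()]  # simple node features

# fingerprint, adjacency/atom features, and the raw SMILES sequence would then
# feed three branches that are fused progressively in the downstream model.
```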


Subjects
Ion Channels , Humans , Ion Channels/metabolism , Ion Channels/antagonists & inhibitors , NAV1.5 Voltage-Gated Sodium Channel/metabolism , L-Type Calcium Channels/metabolism , L-Type Calcium Channels/chemistry , Machine Learning , ERG1 Potassium Channel/metabolism , ERG1 Potassium Channel/antagonists & inhibitors
10.
J Biomed Inform ; 157: 104689, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39029770

ABSTRACT

The classification of sleep stages is crucial for gaining insights into an individual's sleep patterns and identifying potential health issues. Employing several important physiological channels in different views, each providing a distinct perspective on sleep patterns, can have a great impact on the efficiency of classification models. In the context of neural networks and deep learning models, transformers are very effective, especially when dealing with time series data, and have shown remarkable compatibility with sequential data analysis such as physiological channels. Cross-modality attention, in turn, integrates information from multiple views of the data, enabling models to capture relationships among different modalities and to selectively focus on relevant information from each modality. In this paper, we introduce a novel deep-learning model based on a transformer encoder-decoder and cross-modal attention for sleep stage classification. The proposed model processes information from various physiological channels with different modalities using the Sleep Heart Health Study (SHHS) dataset and leverages transformer encoders for feature extraction and cross-modal attention for effective integration before feeding into the transformer decoder. The combination of these elements increased the accuracy of the model to 91.33% in classifying five sleep stages. Empirical evaluations demonstrated the model's superior performance compared to standalone approaches and other state-of-the-art techniques, showcasing the potential of combining transformers and cross-modal attention for improved sleep stage classification.


Subjects
Deep Learning , Neural Networks, Computer , Sleep Stages , Humans , Sleep Stages/physiology , Polysomnography/methods , Electroencephalography/methods , Algorithms , Signal Processing, Computer-Assisted , Male
11.
PeerJ Comput Sci ; 10: e2097, 2024.
Article in English | MEDLINE | ID: mdl-38983207

ABSTRACT

With the rapid advancement of robotics technology, an increasing number of researchers are exploring the use of natural language as a communication channel between humans and robots. In language-conditioned manipulation grounding, prevailing methods rely heavily on supervised multimodal deep learning. In this paradigm, robots assimilate knowledge from both language instructions and visual input. However, these approaches lack external knowledge for comprehending natural language instructions and are hindered by the substantial demand for paired data, where vision and language are usually linked through manual annotation to create realistic datasets. To address these problems, we propose the knowledge-enhanced bottom-up affordance grounding network (KBAG-Net), which enhances natural language understanding through external knowledge, improving accuracy in object grasping affordance segmentation. In addition, we introduce a semi-automatic data generation method aimed at facilitating the quick establishment of language-following manipulation grounding datasets. Experimental results on two standard datasets demonstrate that our method, using the external knowledge, outperforms existing methods. Specifically, our method outperforms the two-stage method by 12.98% and 1.22% mIoU on the two datasets, respectively. For broader community engagement, we will make the semi-automatic data construction method publicly available at https://github.com/wmqu/Automated-Dataset-Construction4LGM.

12.
Comput Struct Biotechnol J ; 23: 2708-2716, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39035833

ABSTRACT

In the field of computational oncology, patient status is often assessed using radiology-genomics, which combines two key technologies and data types: radiology and genomics. Recent advances in deep learning have facilitated the integration of radiology-genomics data, and even new omics data, significantly improving the robustness and accuracy of clinical predictions. These factors are driving artificial intelligence (AI) closer to practical clinical applications. In particular, deep learning models are crucial in identifying new radiology-genomics biomarkers and therapeutic targets, supported by explainable AI (xAI) methods. This review focuses on recent developments in deep learning for radiology-genomics integration, highlights current challenges, and outlines research directions for multimodal integration and biomarker discovery of radiology-genomics or radiology-omics that are urgently needed in computational oncology.

13.
Cogn Neurodyn ; 18(3): 863-875, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38826642

ABSTRACT

The human brain can effectively perform Facial Expression Recognition (FER) with a few samples by utilizing its cognitive ability. However, unlike the human brain, even a well-trained deep neural network is data-dependent and lacks cognitive ability. To tackle this challenge, this paper proposes a novel framework, Brain Machine Generative Adversarial Networks (BM-GAN), which utilizes the brain's cognitive ability to guide a convolutional neural network to generate LIKE-electroencephalograph (EEG) features. More specifically, we first obtain EEG signals triggered by facial emotion images, then adopt BM-GAN to carry out the mutual generation of image visual features and EEG cognitive features. BM-GAN uses the cognitive knowledge learned from EEG signals to instruct the model to perceive LIKE-EEG features, thereby achieving superior, brain-like performance for FER. The proposed model consists of VisualNet, EEGNet, and BM-GAN: VisualNet obtains image visual features from facial emotion images, and EEGNet obtains EEG cognitive features from EEG signals. Subsequently, BM-GAN completes the mutual generation of image visual features and EEG cognitive features. Finally, the predicted LIKE-EEG features of test images are used for FER. After learning, without the participation of the EEG signals, an average classification accuracy of 96.6% is obtained on the Chinese Facial Affective Picture System dataset using LIKE-EEG features for FER. Experiments demonstrate that the proposed method produces excellent performance for FER.

14.
Methods ; 229: 41-48, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38880433

ABSTRACT

Graph neural networks (GNNs) have gained significant attention in disease prediction where the latent embeddings of patients are modeled as nodes and the similarities among patients are represented through edges. The graph structure, which determines how information is aggregated and propagated, plays a crucial role in graph learning. Recent approaches typically create graphs based on patients' latent embeddings, which may not accurately reflect their real-world closeness. Our analysis reveals that raw data, such as demographic attributes and laboratory results, offers a wealth of information for assessing patient similarities and can serve as a compensatory measure for graphs constructed exclusively from latent embeddings. In this study, we first construct adaptive graphs from latent representations and from raw data, and then merge these graphs via weighted summation. Given that the graphs may contain extraneous and noisy connections, we apply degree-sensitive edge pruning and kNN sparsification techniques to selectively sparsify and prune these edges. We conducted intensive experiments on two diagnostic prediction datasets, and the results demonstrate that our proposed method surpasses current state-of-the-art techniques.
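A rough sketch of the graph-merging idea, building one similarity graph from latent embeddings and one from raw data, fusing them by weighted summation, and kNN-sparsifying the result, is given below; the weights, k, and similarity measure are illustrative assumptions, and degree-sensitive pruning is omitted.

```python
# Fusing a latent-embedding graph with a raw-data graph and kNN-sparsifying;
# all parameters are placeholder assumptions.
import numpy as np

def cosine_similarity_graph(features):
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T

def knn_sparsify(adj, k=10):
    sparsified = np.zeros_like(adj)
    for i, row in enumerate(adj):
        keep = np.argsort(row)[-k:]              # keep the k strongest neighbours
        sparsified[i, keep] = row[keep]
    return np.maximum(sparsified, sparsified.T)  # symmetrize

latent = np.random.rand(200, 64)                 # patient latent embeddings
raw = np.random.rand(200, 30)                    # demographics + laboratory results
fused_adj = 0.6 * cosine_similarity_graph(latent) + 0.4 * cosine_similarity_graph(raw)
patient_graph = knn_sparsify(fused_adj, k=10)
```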


Subjects
Neural Networks, Computer , Humans , Machine Learning , Algorithms
15.
Annu Rev Biomed Data Sci ; 7(1): 345-368, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38749465

ABSTRACT

In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks and graph transformer architectures, stands out for its capability to capture intricate relationships and structures within clinical datasets. With diverse data, from patient records to imaging, graph AI models process data holistically by viewing modalities and entities within them as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters and with minimal to no retraining. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on relational datasets, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph AI models integrate diverse data modalities through pretraining, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way toward clinically meaningful predictions.


Subjects
Artificial Intelligence , Neural Networks, Computer , Humans , Computer Graphics
16.
Comput Struct Biotechnol J ; 23: 1666-1679, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38680871

ABSTRACT

Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, mono-modal learning is inherently limited as it relies solely on a single modality of molecular representation, which restricts a comprehensive understanding of drug molecules. To overcome these limitations, we propose a multimodal fused deep learning (MMFDL) model to leverage information from different molecular representations. Specifically, we construct a triple-modal learning model by employing a Transformer encoder, a bidirectional gated recurrent unit (BiGRU), and a graph convolutional network (GCN) to process three modalities of information from chemical language and molecular graphs: SMILES-encoded vectors, ECFP fingerprints, and molecular graphs, respectively. We evaluate the proposed triple-modal model using five fusion approaches on six molecule datasets: Delaney, Llinas2020, Lipophilicity, SAMPL, BACE, and pKa from DataWarrior. The results show that the MMFDL model achieves the highest Pearson coefficients and a stable distribution of Pearson coefficients in the random splitting test, outperforming mono-modal models in accuracy and reliability. Furthermore, we validate the generalization ability of our model in predicting binding constants for protein-ligand complexes and assess its resilience to noise. Through analysis of feature distributions in chemical space and the contribution assigned to each modal model, we demonstrate that the MMFDL model can acquire complementary information by using proper models and suitable fusion approaches. By leveraging diverse sources of bioinformatics information, multimodal deep learning models hold the potential for successful drug discovery.
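One simple fusion approach of the kind evaluated here is a learned weighted sum over the three modality embeddings; the sketch below assumes precomputed embeddings of equal size and is one plausible variant, not the MMFDL code.

```python
# Learned weighted-sum fusion of three modality embeddings (SMILES, fingerprint,
# graph); embedding size and head are placeholder assumptions.
import torch
import torch.nn as nn

class WeightedSumFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(3))   # one learnable weight per modality
        self.head = nn.Linear(dim, 1)                # regression head for the property

    def forward(self, smiles_emb, fp_emb, graph_emb):
        w = torch.softmax(self.weights, dim=0)
        fused = w[0] * smiles_emb + w[1] * fp_emb + w[2] * graph_emb
        return self.head(fused)

fusion = WeightedSumFusion()
pred = fusion(torch.randn(32, 128), torch.randn(32, 128), torch.randn(32, 128))
```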

17.
Phys Med ; 121: 103359, 2024 May.
Article in English | MEDLINE | ID: mdl-38688073

ABSTRACT

PURPOSE: Strokes are severe cardiovascular and circulatory diseases with two main types: ischemic and hemorrhagic. Clinically, brain images such as computed tomography (CT) and computed tomography angiography (CTA) are widely used to recognize stroke types. However, few studies have combined imaging and clinical data to classify stroke or considered a factor as an independent etiology. METHODS: In this work, we propose a classification model that automatically distinguishes stroke types, with hypertension as an independent etiology, based on brain imaging and clinical data. We first present a preprocessing workflow for head axial CT angiograms, including noise reduction and feature enhancement of the images, followed by extraction of regions of interest. Next, we develop a multi-scale feature fusion model that combines the location information of position features with the semantic information of deep features. Furthermore, we integrate brain imaging with clinical information through a multimodal learning model to achieve more reliable results. RESULTS: Experimental results show our proposed models outperform state-of-the-art models on real imaging and clinical data, which reveals the potential of multimodal learning in brain disease diagnosis. CONCLUSION: The proposed methodologies can be extended to create AI-driven diagnostic assistance technology for categorizing strokes.


Subjects
Computed Tomography Angiography , Head , Hypertension , Image Processing, Computer-Assisted , Machine Learning , Stroke , Humans , Stroke/diagnostic imaging , Head/diagnostic imaging , Image Processing, Computer-Assisted/methods , Hypertension/diagnostic imaging , Hypertension/complications , Brain/diagnostic imaging
18.
Artif Intell Med ; 152: 102871, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38685169

ABSTRACT

For the diagnosis and outcome prediction of gastric cancer (GC), machine learning methods based on whole-slide pathological images (WSIs) have shown promising performance and reduced the cost of manual analysis. Nevertheless, accurate prediction of GC outcome may rely on multiple modalities with complementary information, particularly gene expression data. Thus, there is a need to develop multimodal learning methods to enhance prediction performance. In this paper, we collect a dataset from Ruijin Hospital and propose a multimodal learning method for GC diagnosis and outcome prediction, called GaCaMML, which features a cross-modal attention mechanism and a Per-Slide training scheme. Additionally, we perform feature attribution analysis via integrated gradients (IG) to identify important input features. The proposed method improves prediction accuracy over the single-modal learning method on three tasks, i.e., survival prediction (by 4.9% on C-index), pathological stage classification (by 11.6% on accuracy), and lymph node classification (by 12.0% on accuracy). In particular, the Per-Slide strategy addresses the issue of a high WSI-to-patient ratio and leads to much better results compared with the Per-Person training scheme. For the interpretable analysis, we find that although WSIs dominate the prediction for most samples, there is still a substantial portion of samples whose prediction relies highly on gene expression information. This study demonstrates the great potential of multimodal learning in GC-related prediction tasks and investigates the respective contributions of WSIs and gene expression, which not only shows how the model makes a decision but also provides insights into the association between macroscopic pathological phenotypes and microscopic molecular features.
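Feature attribution with integrated gradients can be sketched with the Captum library (assuming it is installed) on a generic two-input model whose features concatenate WSI and gene-expression representations; the model, dimensions, and modality split below are placeholders, not GaCaMML.

```python
# Integrated-gradients attribution over a placeholder multimodal model; feature
# dimensions (512 WSI + 1000 gene) are illustrative assumptions.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(nn.Linear(512 + 1000, 64), nn.ReLU(), nn.Linear(64, 1))
inputs = torch.randn(4, 512 + 1000, requires_grad=True)   # concatenated WSI + gene features
baseline = torch.zeros_like(inputs)

ig = IntegratedGradients(model)
attributions = ig.attribute(inputs, baselines=baseline)    # per-feature contributions
wsi_share = attributions[:, :512].abs().sum(dim=1)         # rough per-modality contribution
gene_share = attributions[:, 512:].abs().sum(dim=1)
```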


Subjects
Machine Learning , Stomach Neoplasms , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Humans , Image Interpretation, Computer-Assisted/methods , Prognosis , Gene Expression Profiling/methods
19.
J Med Internet Res ; 26: e54538, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38631021

ABSTRACT

BACKGROUND: Early detection of mild cognitive impairment (MCI), a transitional stage between normal aging and Alzheimer disease, is crucial for preventing the progression of dementia. Virtual reality (VR) biomarkers have proven to be effective in capturing behaviors associated with subtle deficits in instrumental activities of daily living, such as challenges in using a food-ordering kiosk, for early detection of MCI. On the other hand, magnetic resonance imaging (MRI) biomarkers have demonstrated their efficacy in quantifying observable structural brain changes that can aid in early MCI detection. Nevertheless, the relationship between VR-derived and MRI biomarkers remains an open question. In this context, we explored the integration of VR-derived and MRI biomarkers to enhance early MCI detection through a multimodal learning approach. OBJECTIVE: We aimed to evaluate and compare the efficacy of VR-derived and MRI biomarkers in the classification of MCI while also examining the strengths and weaknesses of each approach. Furthermore, we focused on improving early MCI detection by leveraging multimodal learning to integrate VR-derived and MRI biomarkers. METHODS: The study encompassed a total of 54 participants, comprising 22 (41%) healthy controls and 32 (59%) patients with MCI. Participants completed a virtual kiosk test to collect 4 VR-derived biomarkers (hand movement speed, scanpath length, time to completion, and the number of errors), and T1-weighted MRI scans were performed to collect 22 MRI biomarkers from both hemispheres. Analyses of covariance were used to compare these biomarkers between healthy controls and patients with MCI, with age considered as a covariate. Subsequently, the biomarkers that exhibited significant differences between the 2 groups were used to train and validate a multimodal learning model aimed at early screening for patients with MCI among healthy controls. RESULTS: The support vector machine (SVM) using only VR-derived biomarkers achieved a sensitivity of 87.5% and specificity of 90%, whereas the MRI biomarkers showed a sensitivity of 90.9% and specificity of 71.4%. Moreover, a correlation analysis revealed a significant association between MRI-observed brain atrophy and impaired performance in instrumental activities of daily living in the VR environment. Notably, the integration of both VR-derived and MRI biomarkers into a multimodal SVM model yielded superior results compared to unimodal SVM models, achieving higher accuracy (94.4%), sensitivity (100%), specificity (90.9%), precision (87.5%), and F1-score (93.3%). CONCLUSIONS: The results indicate that VR-derived biomarkers, characterized by their high specificity, can be valuable as a robust, early screening tool for MCI in a broader older adult population. On the other hand, MRI biomarkers, known for their high sensitivity, excel at confirming the presence of MCI. Moreover, the multimodal learning approach introduced in our study provides valuable insights into the improvement of early MCI detection by integrating a diverse set of biomarkers.
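A minimal sketch of a multimodal SVM in this spirit, concatenating VR-derived and MRI biomarkers per participant before a single classifier, is shown below with placeholder data; the feature counts mirror those reported above, but the values are synthetic.

```python
# Multimodal SVM sketch with synthetic placeholder data; not the study's dataset.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
vr_features = rng.normal(size=(54, 4))      # hand speed, scanpath length, time, errors
mri_features = rng.normal(size=(54, 22))    # structural measures from both hemispheres
labels = rng.integers(0, 2, size=54)        # 0 = healthy control, 1 = MCI

X = np.hstack([vr_features, mri_features])  # multimodal feature vector per participant
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, labels, cv=5, scoring="accuracy")
print(scores.mean())
```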


Subjects
Alzheimer Disease , Cognitive Dysfunction , Virtual Reality , Humans , Aged , Activities of Daily Living , Cognitive Dysfunction/pathology , Magnetic Resonance Imaging/methods , Alzheimer Disease/diagnosis , Biomarkers
20.
Int J Comput Assist Radiol Surg ; 19(6): 1075-1083, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38558289

ABSTRACT

PURPOSE: Surgical workflow recognition is a challenging task that requires understanding multiple aspects of surgery, such as gestures, phases, and steps. However, most existing methods focus on single-task or single-modal models and rely on costly annotations for training. To address these limitations, we propose a novel semi-supervised learning approach that leverages multimodal data and self-supervision to create meaningful representations for various surgical tasks. METHODS: Our representation learning approach proceeds in two stages. In the first stage, time contrastive learning is used to learn spatiotemporal visual features from video data, without any labels. In the second stage, a multimodal VAE fuses the visual features with kinematic data to obtain a shared representation, which is fed into recurrent neural networks for online recognition. RESULTS: Our method is evaluated on two datasets: JIGSAWS and MISAW. We confirmed that it achieved comparable or better performance in multi-granularity workflow recognition compared to fully supervised models specialized for each task. On the JIGSAWS Suturing dataset, we achieve a gesture recognition accuracy of 83.3%. In addition, our model is more efficient in annotation usage, as it can maintain high performance with only half of the labels. On the MISAW dataset, we achieve 84.0% AD-Accuracy in phase recognition and 56.8% AD-Accuracy in step recognition. CONCLUSION: Our multimodal representation exhibits versatility across various surgical tasks and enhances annotation efficiency. This work has significant implications for real-time decision-making systems within the operating room.
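The multimodal-VAE fusion step can be pictured with a very small encoder that maps concatenated visual and kinematic features to a shared latent code via the reparameterization trick; sizes and structure below are assumptions for illustration, not the paper's model.

```python
# Tiny multimodal VAE encoder fusing visual and kinematic features into a shared
# latent representation; dimensions are placeholder assumptions.
import torch
import torch.nn as nn

class MultimodalVAEEncoder(nn.Module):
    def __init__(self, visual_dim=256, kin_dim=38, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(visual_dim + kin_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, visual_feat, kinematics):
        h = self.encoder(torch.cat([visual_feat, kinematics], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar   # z would feed a recurrent network for online recognition

enc = MultimodalVAEEncoder()
z, mu, logvar = enc(torch.randn(4, 256), torch.randn(4, 38))
```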


Subjects
Supervised Machine Learning , Workflow , Humans , Video Recording , Neural Networks, Computer , Gestures