Results 1 - 20 of 64
1.
Skin Res Technol ; 30(6): e13770, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38881051

ABSTRACT

BACKGROUND: Melanoma is one of the most malignant forms of skin cancer, with a high mortality rate in the advanced stages. Early and accurate detection of melanoma therefore plays an important role in improving patients' prognosis. Biopsy is the traditional method for melanoma diagnosis, but it lacks reliability, so it is important to apply new methods to diagnose melanoma effectively. AIM: This study presents a new approach to classifying melanoma using deep neural networks (DNNs) that combine multi-modal imaging and genomic data, which could potentially provide a more reliable diagnosis than current medical methods for melanoma. METHOD: We built a dataset of dermoscopic images, histopathological slides and genomic profiles. We developed a custom framework composed of two widely established types of neural networks: Convolutional Neural Networks (CNNs) for analysing image data, and Graph Neural Networks (GNNs), which can learn graph structure, for analysing genomic data. We trained and evaluated the proposed framework on this dataset. RESULTS: The developed multi-modal DNN achieved higher accuracy than traditional medical approaches. The mean accuracy of the proposed model was 92.5%, with an area under the receiver operating characteristic curve of 0.96, suggesting that the multi-modal DNN approach can detect critical morphologic and molecular features of melanoma beyond the limitations of traditional AI and machine learning approaches. Combining cutting-edge AI methods may give access to a broader range of diagnostic data, allowing dermatologists to make more accurate decisions and refine treatment strategies. However, the framework will have to be validated at a larger scale, and more clinical trials need to be conducted to establish whether this novel diagnostic approach is more effective and feasible.
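The abstract gives no implementation details, but the late fusion it describes — a CNN branch for image data and a graph neural network branch for genomic profiles, concatenated before a joint classifier — might be sketched roughly as follows. All module names, layer sizes, and the use of PyTorch / PyTorch Geometric are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only: a CNN image branch fused with a GNN genomic branch,
# in the spirit of the abstract. All layer sizes and names are assumptions.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class MultiModalMelanomaNet(nn.Module):
    def __init__(self, num_gene_features=128, num_classes=2):
        super().__init__()
        # CNN branch for dermoscopic / histopathology images
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # GNN branch for a genomic interaction graph
        self.gcn1 = GCNConv(num_gene_features, 64)
        self.gcn2 = GCNConv(64, 64)
        # Joint classifier on the concatenated embeddings
        self.classifier = nn.Sequential(nn.Linear(64 + 64, 64), nn.ReLU(),
                                        nn.Linear(64, num_classes))

    def forward(self, image, gene_x, edge_index, batch):
        img_feat = self.cnn(image)                      # (B, 64)
        g = self.gcn1(gene_x, edge_index).relu()
        g = self.gcn2(g, edge_index).relu()
        gene_feat = global_mean_pool(g, batch)          # (B, 64)
        return self.classifier(torch.cat([img_feat, gene_feat], dim=1))
```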


Subject(s)
Deep Learning, Dermoscopy, Melanoma, Skin Neoplasms, Humans, Melanoma/genetics, Melanoma/diagnostic imaging, Melanoma/diagnosis, Melanoma/pathology, Skin Neoplasms/genetics, Skin Neoplasms/diagnostic imaging, Skin Neoplasms/pathology, Dermoscopy/methods, Neural Networks, Computer, Reproducibility of Results, Genomics/methods, Female, Male, Middle Aged, Adult, Aged
2.
Sensors (Basel) ; 24(18)2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39338806

ABSTRACT

The proliferation of fake news across multiple modalities has emerged as a critical challenge in the modern information landscape, necessitating advanced detection methods. This study proposes a comprehensive framework for fake news detection integrating text, images, and videos using machine learning and deep learning techniques. The research employs a dual-phased methodology, first analyzing textual data using various classifiers, then developing a multimodal approach combining BERT for text analysis and a modified CNN for visual data. Experiments on the ISOT fake news dataset and MediaEval 2016 image verification corpus demonstrate the effectiveness of the proposed models. For textual data, the Random Forest classifier achieved 99% accuracy, outperforming other algorithms. The multimodal approach showed superior performance compared to baseline models, with a 3.1% accuracy improvement over existing multimodal techniques. This research contributes to the ongoing efforts to combat misinformation by providing a robust, adaptable framework for detecting fake news across different media formats, addressing the complexities of modern information dissemination and manipulation.

3.
Entropy (Basel) ; 26(1)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38275499

ABSTRACT

The profound impacts of severe air pollution on human health, ecological balance, and economic stability are undeniable. Precise air quality forecasting stands as a crucial necessity, enabling governmental bodies and vulnerable communities to proactively take essential measures to reduce exposure to detrimental pollutants. Previous research has primarily focused on predicting air quality using only time-series data, while the importance of remote-sensing image data has received limited attention. This paper proposes a new multi-modal deep-learning model, Res-GCN, which integrates high-spatial-resolution remote-sensing images and time-series air quality data from multiple stations to forecast future air quality. Res-GCN employs two deep-learning networks: one uses a residual network to extract hidden visual information from remote-sensing images, and the other uses a dynamic spatio-temporal graph convolution network to capture spatio-temporal information from time-series data. By extracting features from two different modalities, improved predictive performance can be achieved. To demonstrate the effectiveness of the proposed model, experiments were conducted on two real-world datasets. The results show that the Res-GCN model effectively extracts multi-modal features, significantly enhancing the accuracy of multi-step predictions. Compared to the best-performing baseline model, the multi-step prediction's mean absolute error, root mean square error, and mean absolute percentage error improved by approximately 6%, 7%, and 7%, respectively.

4.
Methods ; 204: 340-347, 2022 08.
Article in English | MEDLINE | ID: mdl-35314343

ABSTRACT

Emotional and physical health are strongly connected and should be taken care of simultaneously to keep a person fully healthy. A person's emotional health can be determined by detecting emotional states from various physiological measurements (EDA, RB, EEG, etc.). Affective computing, which uses software and hardware to detect emotional states, has become a field of growing interest. In the IoT era, wearable sensor-based real-time multi-modal emotion state classification has become one of the most active topics. In such a setting, a data stream is generated by wearable-sensor devices, data accessibility is restricted to those devices only, and a high data generation rate usually has to be processed to achieve real-time emotion state responses. Additionally, protecting the users' data privacy makes processing such data even more challenging. Traditional classifiers are limited in achieving high emotional state detection accuracy under the demanding requirements of decentralized data and protection of users' sensitive information, because such classifiers need to see all the data. Federated learning addresses this: its main idea is to create a global classifier without accessing the users' local data. We have therefore developed a federated learning framework for real-time emotion state classification using multi-modal physiological data streams from wearable sensors, called Fed-ReMECS. The main contribution of our Fed-ReMECS framework is an efficient and scalable real-time emotion classification system built from distributed multi-modal physiological data streams, where the global classifier is constructed without accessing the users' data (privacy protection) in an IoT environment. The experimental study is conducted on the widely used multi-modal benchmark DEAP dataset for emotion classification. The results show the effectiveness of the developed approach in terms of accuracy, efficiency, scalability and users' data privacy protection.
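Fed-ReMECS itself is not specified in the abstract; the core federated idea it invokes — each wearable client trains locally and only model weights are averaged on a server, so raw physiological streams never leave the device — can be sketched as a plain FedAvg round. The function names and training setup below are illustrative assumptions.

```python
# Rough FedAvg sketch of the federated idea described in the abstract:
# clients train on local physiological streams, only weights are shared.
import copy
import torch
import torch.nn as nn

def local_update(global_model, loader, epochs=1, lr=1e-3):
    """Train a copy of the global model on one client's local data stream."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:            # x: multi-modal features, y: emotion label
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(client_states):
    """Average client weights into a new global model without touching raw data."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        for state in client_states[1:]:
            avg[key] += state[key]
        avg[key] = avg[key] / len(client_states)
    return avg
```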


Subject(s)
Electroencephalography, Emotions, Electroencephalography/methods, Emotions/physiology, Humans
5.
J Biomed Inform ; 147: 104512, 2023 11.
Article in English | MEDLINE | ID: mdl-37813325

ABSTRACT

OBJECTIVE: The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS: Conventional approaches, however, face challenges due to the curse of dimensionality. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS: Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. CONCLUSION: The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
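The abstract does not detail how the knowledge-distillation component of KD-SVAE-VCDN is implemented; a minimal generic distillation loss of the kind the name implies — a student matching the teacher's softened class probabilities alongside the hard labels — is sketched below. Temperature and weighting values are arbitrary assumptions.

```python
# Minimal knowledge-distillation loss sketch (the "KD" ingredient named in the
# abstract): blend hard-label cross-entropy with soft-label KL to the teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KL (scaled by T^2) combined with standard cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```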


Subject(s)
Carcinoma, Multiomics, Humans, DNA Methylation, Disease Progression
6.
Sensors (Basel) ; 23(19)2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37837167

ABSTRACT

Semantic segmentation is crucial for interpreting a scene in numerous applications, including autonomous driving and robotic navigation. Compared to single-modal data, multi-modal data allow a richer set of features to be extracted, which benefits segmentation accuracy. We propose a point cloud semantic segmentation method based on a fusion graph convolutional network (FGCN), which extracts the semantic information of each point from the two modalities of images and point clouds. The two-channel k-nearest neighbors (KNN) module of the FGCN was created to address the poor efficiency of feature extraction from image data. Notably, the FGCN utilizes a spatial attention mechanism to better distinguish the more important features and fuses multi-scale features to enhance the generalization capability of the network and increase the accuracy of the semantic segmentation. In the experiments, a self-made semantic segmentation KITTI (SSKIT) dataset was created to evaluate the fusion effect, on which the mean intersection over union (mIoU) reaches 88.06%. On the public S3DIS dataset, our method also enhances data features and outperforms other methods, reaching an mIoU of up to 78.55%. The segmentation accuracy is significantly improved compared with existing methods, which verifies the effectiveness of the improved algorithms.

7.
Sensors (Basel) ; 23(18)2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37766005

ABSTRACT

With the increasing demand for person re-identification (Re-ID) tasks, the need for all-day retrieval has become an inevitable trend. Single-modal Re-ID is no longer sufficient to meet this requirement, making multi-modal data crucial in Re-ID. Consequently, the Visible-Infrared Person Re-Identification (VI Re-ID) task has been proposed, which aims to match pairs of person images from the visible and infrared modalities. The significant discrepancy between the two modalities poses a major challenge. Existing VI Re-ID methods focus on cross-modal feature learning and modal transformation to alleviate the discrepancy but overlook the impact of person contour information. Contours exhibit modality invariance, which is vital for learning effective identity representations and cross-modal matching. In addition, due to the low intra-modal diversity in the visible modality, it is difficult to distinguish the boundaries between some hard samples. To address these issues, we propose the Graph Sampling-based Multi-stream Enhancement Network (GSMEN). First, the Contour Expansion Module (CEM) incorporates a person's contour information into the original samples, further reducing the modality discrepancy and leading to improved matching stability between image pairs of different modalities. Additionally, to better distinguish cross-modal hard sample pairs during training, an innovative Cross-modality Graph Sampler (CGS) is designed for sample selection before training. The CGS calculates the feature distance between samples from different modalities and groups similar samples into the same batch during training, effectively exploring the boundary relationships between hard classes in the cross-modal setting. Experiments conducted on the SYSU-MM01 and RegDB datasets demonstrate the superiority of the proposed method. Specifically, in the VIS→IR task, the experimental results on the RegDB dataset achieve 93.69% for Rank-1 and 92.56% for mAP.

8.
Sensors (Basel) ; 23(8)2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37112337

ABSTRACT

Multi-human detection and tracking in indoor surveillance is a challenging task due to various factors such as occlusions, illumination changes, and complex human-human and human-object interactions. In this study, we address these challenges by exploring the benefits of a low-level sensor fusion approach that combines grayscale and neuromorphic vision sensor (NVS) data. We first generate a custom dataset using an NVS camera in an indoor environment. We then conduct a comprehensive study by experimenting with different image features and deep learning networks, followed by a multi-input fusion strategy to optimize our experiments with respect to overfitting. Our primary goal is to determine the best input feature types for multi-human motion detection using statistical analysis. We find that there is a significant difference between the input features of optimized backbones, with the best strategy depending on the amount of available data. Specifically, under a low-data regime, event-based frames seem to be the preferred input feature type, while higher data availability benefits the combined use of grayscale and optical flow features. Our results demonstrate the potential of sensor fusion and deep learning techniques for multi-human tracking in indoor surveillance, although it is acknowledged that further studies are needed to confirm our findings.


Subject(s)
Culture, Optic Flow, Humans, Lighting, Motion (Physics), Research Design
9.
Adv Exp Med Biol ; 1359: 237-259, 2022.
Article in English | MEDLINE | ID: mdl-35471542

ABSTRACT

It has previously been shown that it is possible to derive a new class of biophysically detailed brain tissue models when one computationally analyzes and exploits the interdependencies of the multi-modal and multi-scale organization of the brain. These reconstructions, sometimes referred to as digital twins, enable a spectrum of scientific investigations. Building such models has become possible because of increases in quantitative data as well as advances in computational capabilities and algorithmic and methodological innovations. This chapter presents the computational science concepts that provide the foundation for the data-driven approach to reconstructing and simulating brain tissue developed by the EPFL Blue Brain Project, which was originally applied to neocortical microcircuitry and extended to other brain regions. Accordingly, the chapter covers aspects such as knowledge-graph-based data organization and the importance of the concept of a dataset release. We illustrate algorithmic advances in finding suitable parameters for electrical models of neurons and show how spatial constraints can be exploited for predicting synaptic connections. Furthermore, we explain how in silico experimentation with such models necessitates specific addressing schemes and strategies for efficient simulation. The entire data-driven approach relies on the systematic validation of the model. We conclude by discussing complementary strategies that not only enable judging the fidelity of the model but also form the basis for its systematic refinements.


Subject(s)
Brain, Neurons, Computer Simulation
10.
Neuroimage ; 229: 117695, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33422711

ABSTRACT

Connectomes are typically mapped at low resolution based on a specific brain parcellation atlas. Here, we investigate high-resolution connectomes independent of any atlas, propose new methodologies to facilitate their mapping and demonstrate their utility in predicting behavior and identifying individuals. Using structural, functional and diffusion-weighted MRI acquired in 1000 healthy adults, we aimed to map the cortical correlates of identity and behavior at ultra-high spatial resolution. Using methods based on sparse matrix representations, we propose a computationally feasible high-resolution connectomic approach that improves neural fingerprinting and behavior prediction. Using this high-resolution approach, we find that the multimodal cortical gradients of individual uniqueness reside in the association cortices. Furthermore, our analyses identified a striking dichotomy between the facets of a person's neural identity that best predict their behavior and cognition, compared to those that best differentiate them from other individuals. Functional connectivity was one of the most accurate predictors of behavior, yet resided among the weakest differentiators of identity; whereas the converse was found for morphological properties, such as cortical curvature. This study provides new insights into the neural basis of personal identity and new tools to facilitate ultra-high-resolution connectomics.


Subject(s)
Brain Mapping/methods, Brain/diagnostic imaging, Connectome/methods, Diffusion Tensor Imaging/methods, Nerve Net/diagnostic imaging, Brain/physiology, Female, Humans, Magnetic Resonance Imaging/methods, Male, Nerve Net/physiology, Young Adult
11.
Neuroimage ; 224: 117002, 2021 01 01.
Article in English | MEDLINE | ID: mdl-32502668

ABSTRACT

Dealing with confounds is an essential step in large cohort studies to address problems such as unexplained variance and spurious correlations. UK Biobank is a powerful resource for studying associations between imaging and non-imaging measures such as lifestyle factors and health outcomes, in part because of its large subject numbers. However, the resulting high statistical power also raises the sensitivity to confound effects, which therefore have to be carefully considered. In this work we describe a set of possible confounds (including non-linear effects and interactions) that researchers may wish to consider for their studies using such data. We describe how these confounds can be estimated, study the extent to which each of them affects the data, and examine the spurious correlations that may arise if they are not controlled for. Finally, we discuss several issues that future studies should consider when dealing with confounds.
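The core recommendation — estimate confound variables, including non-linear terms, and remove their contribution before testing imaging/non-imaging associations — can be illustrated with a simple regression-based deconfounding step. This is a generic toy sketch, not the UK Biobank pipeline's code; the variable names and simulated data are invented.

```python
# Generic deconfounding sketch: regress confounds (with a quadratic term as an
# example of a non-linear effect) out of both variables before correlating them.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(45, 80, n)
head_motion = rng.exponential(0.2, n)
# Toy imaging and lifestyle measures that both depend on the confounds
imaging = 0.03 * age + 1.5 * head_motion + rng.normal(0, 1, n)
lifestyle = 0.02 * age + rng.normal(0, 1, n)

confounds = np.column_stack([age, age ** 2, head_motion])  # includes non-linear term

def residualize(y, X):
    """Return y with the linear contribution of the confounds X removed."""
    return y - LinearRegression().fit(X, y).predict(X)

r_raw, _ = pearsonr(imaging, lifestyle)                    # inflated by confounds
r_adj, _ = pearsonr(residualize(imaging, confounds),
                    residualize(lifestyle, confounds))     # much closer to zero
print(f"raw r = {r_raw:.3f}, deconfounded r = {r_adj:.3f}")
```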


Subject(s)
Biological Specimen Banks, Brain, Neuroimaging, Automatic Data Processing, Head, Humans, Neuroimaging/methods, Time Factors, United Kingdom
12.
Curr Genomics ; 22(8): 564-582, 2021 Dec 31.
Article in English | MEDLINE | ID: mdl-35386189

ABSTRACT

Background: Recent developments in neuroimaging and genetic testing technologies have made it possible to measure pathological features associated with Alzheimer's disease (AD) in vivo. Mining potential molecular markers of AD from high-dimensional, multi-modal neuroimaging and omics data will provide a new basis for early diagnosis and intervention in AD. In order to discover real pathogenic mutations and ultimately understand the pathogenic mechanism of AD, many machine learning methods have been designed and successfully applied to the analysis and processing of large-scale AD biomedical data. Objective: To introduce and summarize the applications and challenges of machine learning methods in Alzheimer's disease multi-source data analysis. Methods: The literature selected for the review was obtained from Google Scholar, PubMed, and Web of Science. The keywords for literature retrieval include Alzheimer's disease, bioinformatics, imaging genetics, genome-wide association studies, molecular interaction networks, multi-omics data integration, and so on. Conclusion: This study comprehensively introduces machine learning-based processing techniques for AD neuroimaging data and then reviews the progress of computational analysis methods for omics data, such as the genome and proteome. Machine learning methods for AD imaging analysis are also summarized. Finally, we elaborate on the emerging technology of joint multi-modal neuroimaging and multi-omics data analysis, and present some outstanding issues and future research directions.

13.
Sensors (Basel) ; 20(17)2020 Aug 28.
Article in English | MEDLINE | ID: mdl-32872218

ABSTRACT

Because a single image contains limited information, it is very difficult to generate a high-precision 3D model from it alone. There are also problems in the generation of 3D voxel models, e.g., information loss at the upper layers of a network. To solve these problems, we design a 3D model generation network based on multi-modal data constraints and multi-level feature fusion, named 3DMGNet. 3DMGNet is trained in a self-supervised manner to generate a 3D voxel model from an image. An image feature extraction network (2DNet) and a 3D feature extraction network (3D auxiliary network) are used to extract the features of the image and the 3D voxel model. Feature fusion then integrates the low-level features into the high-level features in the 3D auxiliary network. To extract more effective features, each layer of the feature map in the feature extraction network is processed by an attention network. Finally, a 3D deconvolution network generates the 3D model from the extracted features. Feature extraction from the 3D model and the voxel generation play an auxiliary role in training the whole network for image-based 3D model generation. Additionally, a multi-view contour constraint method is proposed to enhance the quality of the generated 3D models. In the experiments, the ShapeNet dataset is adopted to evaluate 3DMGNet, which verifies the robust performance of the proposed method.

14.
Sensors (Basel) ; 19(3)2019 Jan 22.
Article in English | MEDLINE | ID: mdl-30678188

ABSTRACT

Wearable health monitoring has emerged as a promising solution to the growing need for remote health assessment and the growing demand for personalized preventative care and wellness management. Vital signs can be monitored and alerts can be issued when anomalies are detected, potentially improving patient outcomes. One major challenge for the use of wearable health devices is their energy efficiency and battery lifetime, which motivates recent efforts towards the development of self-powered wearable devices. This article proposes a method for context-aware dynamic sensor selection for power-optimized physiological prediction using multi-modal wearable data streams. We first cluster the data by physical activity using the accelerometer data, and then fit a group lasso model to each activity cluster. We find the optimal reduced set of sensor feature groups, in turn reducing power usage by duty cycling the corresponding sensors while optimizing prediction accuracy. We show that using activity-state-based contextual information increases accuracy while decreasing power usage. We also show that the reduced feature set can be used in other regression models, increasing accuracy and decreasing the energy burden. We demonstrate the potential reduction in power usage using a custom-designed multi-modal wearable system prototype.
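A rough sketch of the pipeline described above — cluster samples by activity from accelerometer data, fit a sparse model per cluster, and keep only the sensor groups that receive non-zero weight — is given below. The paper uses a group lasso; this simplification substitutes an ordinary lasso and aggregates coefficients per sensor group afterwards, and all feature names and data are invented.

```python
# Simplified sketch of per-activity sparse sensor selection (a stand-in for the
# group lasso used in the paper). Feature/sensor names are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 600
# Columns 0-2: accelerometer stats; 3-5: EDA features; 6-8: respiration features
X = rng.normal(size=(n, 9))
y = 0.8 * X[:, 3] + 0.5 * X[:, 6] + rng.normal(0, 0.1, n)   # toy prediction target

sensor_groups = {"accel": [0, 1, 2], "eda": [3, 4, 5], "resp": [6, 7, 8]}

# 1) Cluster by physical activity using accelerometer features only
activity = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, :3])

# 2) Fit a sparse model per activity cluster; sensors with zero weight can be
#    duty-cycled off while that activity is detected
for a in np.unique(activity):
    idx = activity == a
    coef = Lasso(alpha=0.05).fit(X[idx], y[idx]).coef_
    keep = [s for s, cols in sensor_groups.items() if np.abs(coef[cols]).sum() > 1e-6]
    print(f"activity {a}: keep sensors {keep}")
```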


Subject(s)
Actigraphy/instrumentation, Electric Power Supplies/economics, Telemedicine/economics, Wearable Electronic Devices/economics, Accelerometry/statistics & numerical data, Cluster Analysis, Humans, Wearable Electronic Devices/standards
15.
Neuroimage ; 166: 400-424, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29079522

ABSTRACT

UK Biobank is a large-scale prospective epidemiological study with all data accessible to researchers worldwide. It is currently in the process of bringing back 100,000 of the original participants for brain, heart and body MRI, carotid ultrasound and low-dose bone/fat x-ray. The brain imaging component covers six modalities (T1, T2 FLAIR, susceptibility-weighted MRI, resting-state fMRI, task fMRI and diffusion MRI). Raw and processed data from the first 10,000 imaged subjects have recently been released for general research access. To help convert these data into useful summary information we have developed an automated processing and QC (Quality Control) pipeline that is available for use by other researchers. In this paper we describe the pipeline in detail, following a brief overview of UK Biobank brain imaging and the acquisition protocol. We also describe several quantitative investigations carried out as part of the development of both the imaging protocol and the processing pipeline.


Subject(s)
Brain/diagnostic imaging, Databases, Factual, Datasets as Topic, Image Processing, Computer-Assisted/methods, Machine Learning, Magnetic Resonance Imaging/methods, Neuroimaging/methods, Quality Control, Databases, Factual/standards, Datasets as Topic/standards, Humans, Image Processing, Computer-Assisted/standards, Machine Learning/standards, Magnetic Resonance Imaging/standards, Neuroimaging/standards, United Kingdom
16.
Neuroimage ; 183: 212-226, 2018 12.
Article in English | MEDLINE | ID: mdl-30099077

ABSTRACT

This work presents an efficient framework, based on manifold approximation, for generating brain fingerprints from multi-modal data. The proposed framework represents images as bags of local features which are used to build a subject proximity graph. Compact fingerprints are obtained by projecting this graph in a low-dimensional manifold using spectral embedding. Experiments using the T1/T2-weighted MRI, diffusion MRI, and resting-state fMRI data of 945 Human Connectome Project subjects demonstrate the benefit of combining multiple modalities, with multi-modal fingerprints more discriminative than those generated from individual modalities. Results also highlight the link between fingerprint similarity and genetic proximity, monozygotic twins having more similar fingerprints than dizygotic or non-twin siblings. This link is also reflected in the differences of feature correspondences between twin/sibling pairs, occurring in major brain structures and across hemispheres. The robustness of the proposed framework to factors like image alignment and scan resolution, as well as the reproducibility of results on retest scans, suggest the potential of multi-modal brain fingerprinting for characterizing individuals in a large cohort analysis.
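The framework itself is not reproduced in the abstract, but the final step it names — projecting a subject-proximity graph into a low-dimensional manifold with spectral embedding to obtain compact fingerprints — can be sketched with standard tooling. The toy affinity matrix, dimensions, and similarity function below are assumptions for illustration.

```python
# Toy sketch of the last stage described in the abstract: spectral embedding of a
# subject proximity (affinity) graph into compact, low-dimensional fingerprints.
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(2)
n_subjects = 50
# Stand-in for bag-of-local-features descriptors per subject
features = rng.normal(size=(n_subjects, 200))
# Gaussian affinity between every pair of subjects (the proximity graph)
dists = np.square(features[:, None] - features[None, :]).sum(-1)
affinity = np.exp(-dists / dists.mean())

embed = SpectralEmbedding(n_components=8, affinity="precomputed")
fingerprints = embed.fit_transform(affinity)        # (n_subjects, 8) compact codes

def similarity(i, j):
    """Cosine similarity between two fingerprints (e.g., twin vs. unrelated pairs)."""
    a, b = fingerprints[i], fingerprints[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```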


Subject(s)
Brain, Functional Neuroimaging/methods, Individuality, Magnetic Resonance Imaging/methods, Siblings, Twins, Adult, Brain/anatomy & histology, Brain/diagnostic imaging, Brain/physiology, Cohort Studies, Connectome/methods, Diffusion Magnetic Resonance Imaging/methods, Female, Humans, Male, Young Adult
17.
Neural Netw ; 179: 106553, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39053303

ABSTRACT

Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning the cross-modal interactions is essential to combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values in the absence of useful cross-modal interactions among input features, thereby introducing uncertainty into the feature representation. These factors have the potential to degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.
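The exact form of the PCAG module is not given in the abstract; a rough sketch of the general idea — gate the inputs and outputs of cross-attention so uninformative or uncertain cross-modal interactions are suppressed before fusion — might look like the following in PyTorch. All names, gating choices, and dimensions are assumptions, not the authors' implementation.

```python
# Rough sketch of gated cross-attention in the spirit of the abstract:
# learned gates down-weight cross-modal interactions judged uninformative.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.pre_gate = nn.Sequential(nn.Linear(dim * 2, dim), nn.Sigmoid())
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.context_gate = nn.Sequential(nn.Linear(dim * 2, dim), nn.Sigmoid())

    def forward(self, query_mod, context_mod):
        # Pre-gate: suppress context positions unrelated to the query summary
        q_summary = query_mod.mean(dim=1, keepdim=True).expand_as(context_mod)
        g_pre = self.pre_gate(torch.cat([context_mod, q_summary], dim=-1))
        context = context_mod * g_pre

        # Cross-attention from one modality onto the (gated) other modality
        attended, _ = self.attn(query_mod, context, context)

        # Contextual gate: limit how much uncertain attention output enters fusion
        g_ctx = self.context_gate(torch.cat([query_mod, attended], dim=-1))
        return query_mod + g_ctx * attended

# fused = GatedCrossAttention()(text_feats, image_feats)
# text_feats: (B, T, 256), image_feats: (B, S, 256)
```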


Subject(s)
Attention, Deep Learning, Attention/physiology, Humans, Neural Networks, Computer, Algorithms
18.
Spectrochim Acta A Mol Biomol Spectrosc ; 318: 124454, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-38788500

ABSTRACT

For species identification analysis, methods based on deep learning are becoming prevalent due to their data-driven and task-oriented nature. The most commonly used convolutional neural network (CNN) model has been applied successfully to Raman spectra recognition. However, when faced with similar molecules or functional groups, the features of overlapping peaks and weak peaks may not be fully extracted by a CNN model, which can hinder accurate species identification. Given these practical challenges, fusing multi-modal data can support more comprehensive and accurate analysis of actual samples than single-modal data. In this study, we propose a double-branch CNN model, named SI-DBNet, that integrates Raman spectra and image data. In addition, we have developed a one-dimensional convolutional neural network combining dilated convolutions and an efficient channel attention mechanism for the spectral branch. The effectiveness of the model has been demonstrated using the Grad-CAM method to visualize the key regions on which the model focuses. Compared to single-modal and multi-modal classification methods, our SI-DBNet model achieved superior performance with a classification accuracy of 98.8%. The proposed method provides a new reference for species identification based on multi-modal data fusion.
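The spectral branch described above — 1D convolutions with dilation plus an efficient-channel-attention-style gate over channels — could be sketched roughly as below. This is a generic interpretation, not the authors' SI-DBNet code; kernel sizes, channel counts, and dilation values are assumptions.

```python
# Generic sketch of a 1D dilated-convolution block with a simple ECA-style
# channel attention gate, in the spirit of the spectral branch described above.
import torch
import torch.nn as nn

class ECA1d(nn.Module):
    """Efficient-channel-attention-style gate: 1D conv over pooled channel stats."""
    def __init__(self, k=5):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                       # x: (B, C, L)
        w = x.mean(dim=-1, keepdim=True)        # (B, C, 1) global average pooling
        w = self.conv(w.transpose(1, 2)).transpose(1, 2)  # mix neighbouring channels
        return x * torch.sigmoid(w)

class DilatedSpectralBlock(nn.Module):
    def __init__(self, in_ch=1, out_ch=32, dilation=2):
        super().__init__()
        # Dilation widens the receptive field over the spectrum (helps weak/broad peaks)
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size=7,
                              dilation=dilation, padding=3 * dilation)
        self.bn = nn.BatchNorm1d(out_ch)
        self.eca = ECA1d()

    def forward(self, spectrum):                # spectrum: (B, 1, n_wavenumbers)
        return self.eca(torch.relu(self.bn(self.conv(spectrum))))
```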

19.
Front Radiol ; 4: 1339612, 2024.
Article in English | MEDLINE | ID: mdl-38426080

ABSTRACT

Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images. Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists. In this paper, we present a novel multi-modal deep neural network framework for generating chest x-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes. We introduce a conditioned cross-multi-head attention module to fuse these heterogeneous data modalities, bridging the semantic gap between visual and textual data. Experiments demonstrate substantial improvements from using additional modalities compared to relying on images alone. Notably, our model achieves the highest reported performance on the ROUGE-L metric compared to relevant state-of-the-art models in the literature. Furthermore, we employed both human evaluation and clinical semantic similarity measurement alongside word-overlap metrics to improve the depth of the quantitative analysis. A human evaluation, conducted by a board-certified radiologist, confirms the model's accuracy in identifying high-level findings; however, it also highlights that more improvement is needed to capture nuanced details and clinical context.

20.
Front Oncol ; 14: 1353446, 2024.
Article in English | MEDLINE | ID: mdl-38690169

ABSTRACT

Objective: The objective of this study was to provide a multi-modal deep learning framework for forecasting the survival of rectal cancer patients by utilizing both digital pathological image data and non-imaging clinical data. Materials and methods: The study included patients diagnosed with rectal cancer by pathological confirmation from January 2015 to December 2016. Patients were randomly allocated to training and testing sets at a ratio of 4:1. Tissue microarrays (TMAs) and clinical indicators were obtained. The TMAs were scanned to convert them into digital pathology images, and the patients' clinical data were pre-processed. We then selected distinct deep learning algorithms to conduct survival prediction analysis using the patients' pathological images and clinical data, respectively. Results: A total of 292 patients with rectal cancer were randomly allocated into two groups: a training set of 234 cases and a testing set of 58 cases. We first predicted survival status directly from the pre-processed Hematoxylin and Eosin (H&E) pathological images of rectal cancer, using the ResNest model to extract features from the patients' histopathological images; this yielded a survival status prediction with an AUC (Area Under the Curve) of 0.797. We then employed a multi-head attention fusion (MHAF) model to combine image features and clinical features in order to forecast the survival of rectal cancer patients. Our experiments show that the multi-modal structure works better than predicting directly from histopathological images alone, achieving an AUC of 0.837 for predicting overall survival (OS). Conclusions: Our study highlights the potential of multi-modal deep learning models to predict survival status from histopathological images and clinical information, offering valuable insights for clinical applications.
