ABSTRACT
Drug combination therapy has gradually become a promising treatment strategy for complex or co-existing diseases. As drug-drug interactions (DDIs) may cause unexpected adverse drug reactions, DDI prediction is an important task in pharmacology and clinical applications. Recently, researchers have proposed several deep learning methods to predict DDIs. However, these methods mainly exploit the chemical or biological features of drugs, which is insufficient and limits the performance of DDI prediction. Here, we propose a new deep multimodal feature fusion framework for DDI prediction, DMFDDI, which fuses the drug molecular graph, the DDI network and the biochemical similarity features of drugs to predict DDIs. To fully exploit the drug molecular structure, we introduce an attention-gated graph neural network that captures both the global features of the molecular graph and the local features of each atom. A sparse graph convolution network is introduced to learn the topological structure of the DDI network. In the multimodal feature fusion module, an attention mechanism is used to efficiently fuse the different features. To validate the performance of DMFDDI, we compare it with 10 state-of-the-art methods. The comparison results demonstrate that DMFDDI achieves better performance in DDI prediction. DMFDDI is implemented in Python using the PyTorch machine-learning library and is freely available at https://github.com/DHUDEBLab/DMFDDI.git.
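The abstract does not give implementation details, so the following is only a rough sketch of how an attention mechanism can fuse per-drug embeddings from several modalities (molecular graph, DDI network topology, biochemical similarity) before a pairwise DDI decoder; all layer sizes, names and the bilinear decoder are our own illustrative assumptions, written in PyTorch.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy attention-based fusion of per-drug embeddings from several modalities.

    Each modality embedding is scored by a small MLP; a softmax over modalities
    yields weights used to form a weighted sum. Dimensions are illustrative only.
    """
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, feats):                 # feats: (batch, n_modalities, dim)
        weights = torch.softmax(self.score(feats), dim=1)   # (batch, n_modalities, 1)
        return (weights * feats).sum(dim=1)   # fused drug embedding: (batch, dim)

# Hypothetical usage: fuse graph-, network- and similarity-based drug embeddings,
# then score a drug pair for interaction with a bilinear decoder.
fusion = AttentionFusion(dim=128)
decoder = nn.Bilinear(128, 128, 1)
drug_a = torch.randn(32, 3, 128)              # three modality embeddings per drug
drug_b = torch.randn(32, 3, 128)
ddi_logit = decoder(fusion(drug_a), fusion(drug_b))
print(ddi_logit.shape)                        # torch.Size([32, 1])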
Subjects
Machine Learning, Neural Networks (Computer), Drug Interactions, Molecular Structure, Gene Library
ABSTRACT
Unsafe actions by miners are one of the main causes of mine accidents. Computer-vision-based recognition of unsafe actions enables relatively accurate real-time identification of such actions among underground miners. A dataset called Unsafe Actions of Underground Miners (UAUM) was constructed, covering ten categories of such actions. Underground images were enhanced using spatial- and frequency-domain enhancement algorithms. A combination of the YOLOX object detection algorithm and the Lite-HRNet human key-point detection algorithm was used to obtain skeleton modal data. The CBAM-PoseC3D model, a skeleton-modal action-recognition model incorporating the CBAM attention module, was proposed and combined with the RGB-modal feature-extraction model CBAM-SlowOnly. Together these form the Convolutional Block Attention Module-Multimodal Feature-Fusion Action Recognition (CBAM-MFFAR) model for recognizing unsafe actions of underground miners. The improved CBAM-MFFAR model achieved a recognition accuracy of 95.8% on the NTU60 RGB+D public dataset under the X-Sub benchmark, improving on the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models by 2%, 2.7%, 7.3%, and 14.3%, respectively. On the UAUM dataset, the CBAM-MFFAR model achieved a recognition accuracy of 94.6%, with improvements of 2.6%, 4%, 12%, and 17.3% over the same four models. In field validation at mining sites, the CBAM-MFFAR model accurately recognized similar and multiple unsafe actions among underground miners.
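For reference, the CBAM attention module named above is well documented in the literature; below is a generic 2D PyTorch version with channel and spatial attention. The paper inserts CBAM into PoseC3D and SlowOnly backbones, which operate on 3D (spatio-temporal) feature maps, so the actual placement and tensor shapes would differ; the reduction ratio and kernel size here are common defaults, not the paper's settings.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Squeeze spatial dimensions with average- and max-pooling, then a shared MLP.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                          # x: (N, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1) * x

class SpatialAttention(nn.Module):
    # Concatenate channel-wise average and max maps, convolve to a 1-channel mask.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1))) * x

class CBAM(nn.Module):
    # Channel attention followed by spatial attention, as in the original CBAM design.
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))

print(CBAM(64)(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])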
ABSTRACT
A vibration signal from a single sensor cannot fully characterize the running state of a bearing, and under a high-noise background, traditional algorithms are insufficient for fault feature extraction. This paper therefore proposes a fault diagnosis algorithm based on multi-sensor, hybrid multimodal feature fusion that exploits the bearing's operating-state information as fully as possible in a high-noise environment to achieve high-precision fault diagnosis. First, the horizontal and vertical vibration signals from two sensors are fused using principal component analysis to provide a more comprehensive description of the bearing's operating condition, followed by dataset segmentation. After fusion, time-frequency feature maps are generated using a continuous wavelet transform for global time-frequency feature extraction, and a first diagnostic model is developed using a residual neural network. Meanwhile, the feature data are normalized and 28 time-frequency feature indexes are extracted, from which a second diagnostic model is constructed using a support vector machine. Lastly, the two diagnostic models are integrated into a final model through an ensemble learning algorithm fused at the decision level and complemented by a genetic algorithm to improve diagnostic accuracy. Experimental results demonstrate the effectiveness of the proposed algorithm, which achieves a diagnostic accuracy of 97.54%.
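As a non-authoritative sketch of the first two steps described above (PCA fusion of the two sensor channels, then a continuous wavelet transform to build a time-frequency map for the image-based branch), the snippet below uses NumPy, scikit-learn and PyWavelets; the sampling rate, fault frequency and wavelet choice are all made-up illustrative values.

import numpy as np
import pywt
from sklearn.decomposition import PCA

# Hypothetical two-channel vibration snippet (horizontal and vertical sensors).
fs = 12_000                                     # assumed sampling rate in Hz
t = np.arange(0, 0.2, 1 / fs)
horizontal = np.sin(2 * np.pi * 157 * t) + 0.3 * np.random.randn(t.size)
vertical = np.sin(2 * np.pi * 157 * t + 0.5) + 0.3 * np.random.randn(t.size)

# Step 1: fuse the two sensor channels with PCA, keeping the first component.
fused = PCA(n_components=1).fit_transform(
    np.column_stack([horizontal, vertical])).ravel()

# Step 2: continuous wavelet transform of the fused signal to obtain a
# time-frequency map that a ResNet-style image classifier could consume.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(fused, scales, wavelet="morl", sampling_period=1 / fs)
tf_map = np.abs(coeffs)                         # shape: (64 scales, len(fused))
print(tf_map.shape)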
ABSTRACT
The identification of multi-source signals with time-frequency aliasing is a complex problem in wideband signal reception. The traditional method of first separation and identification especially fails due to the significant separation error under underdetermined conditions when the degree of time-frequency aliasing is high. The single-mode recognition method does not need to be separated first. However, the single-mode features contain less signal information, making it challenging to identify time-frequency aliasing signals accurately. To solve the above problems, this article proposes a time-frequency aliasing signal recognition method based on multi-mode fusion (TRMM). This method uses the U-Net network to extract pixel-by-pixel features of the time-frequency and wave-frequency images and then performs weighted fusion. The multimodal feature scores are used as the classification basis to realize the recognition of the time-frequency aliasing signals. When the SNR is 0 dB, the recognition rate of the four-signal aliasing model can reach more than 97.3%.
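The weighted-fusion step can be illustrated very simply: per-signal-type scores from the two image branches are combined with a fixed weight and thresholded to decide which signals are present in the aliased mixture. The weight, threshold and scores below are hypothetical and not taken from the paper.

import numpy as np

def fuse_scores(tf_scores, wf_scores, w_tf=0.6, threshold=0.5):
    """Weighted fusion of per-signal-type scores from two modalities.

    tf_scores / wf_scores: scores from the time-frequency and wave-frequency
    branches. The weight and threshold are placeholder hyperparameters.
    """
    fused = w_tf * np.asarray(tf_scores) + (1.0 - w_tf) * np.asarray(wf_scores)
    present = np.flatnonzero(fused >= threshold)   # signal types judged present
    return present, fused

# Hypothetical scores for four candidate signal types in an aliased mixture.
tf = np.array([0.10, 0.85, 0.72, 0.15])
wf = np.array([0.05, 0.90, 0.55, 0.10])
present, fused = fuse_scores(tf, wf)
print(present, fused.round(2))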
ABSTRACT
BACKGROUND: Dental health issues are on the rise, necessitating prompt and precise diagnosis. Automated dental condition classification can support this need. OBJECTIVE: The study aims to evaluate the effectiveness of deep learning methods and multimodal feature fusion techniques in advancing automated dental condition classification. METHODS AND MATERIALS: A dataset of 11,653 clinically sourced images representing six prevalent dental conditions (caries, calculus, gingivitis, tooth discoloration, ulcers, and hypodontia) was utilized. Features were extracted using five Convolutional Neural Network (CNN) models and then fused into a matrix. Classification models were constructed using Support Vector Machine (SVM) and Naive Bayes classifiers. Evaluation metrics included accuracy, recall, precision, and the Kappa index. RESULTS: The SVM classifier integrated with feature fusion demonstrated superior performance, with a Kappa index of 0.909 and an accuracy of 0.925, significantly surpassing individual CNN models such as EfficientNetB0, which achieved a Kappa of 0.814 and an accuracy of 0.847. CONCLUSIONS: Combining feature fusion with advanced machine learning algorithms can significantly bolster the precision and robustness of dental condition classification systems. Such a method presents a valuable tool for dental professionals, facilitating enhanced diagnostic accuracy and improved patient outcomes.
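A minimal sketch of the described pipeline (CNN feature extraction, concatenation into a fused feature matrix, SVM classification) is given below using torchvision and scikit-learn. Only two of the five backbones are shown, pretrained weights are omitted to keep the sketch self-contained, and the batch and labels are random placeholders rather than the study's data.

import numpy as np
import torch
from torch import nn
from torchvision import models
from sklearn.svm import SVC

# Two of the five backbones as an example; ImageNet weights (weights="DEFAULT")
# would normally be loaded for meaningful features.
backbones = [models.efficientnet_b0(weights=None), models.resnet50(weights=None)]
for b in backbones:
    b.eval()

def extract_fused_features(img_batch):
    """Concatenate pooled penultimate-layer features from each backbone."""
    feats = []
    with torch.no_grad():
        for b in backbones:
            trunk = nn.Sequential(*list(b.children())[:-1])  # drop the classifier head
            feats.append(trunk(img_batch).flatten(1))
    return torch.cat(feats, dim=1).numpy()

# Hypothetical batch of preprocessed dental photographs and labels (6 conditions).
x = torch.randn(8, 3, 224, 224)
y = np.random.randint(0, 6, size=8)
clf = SVC(kernel="rbf").fit(extract_fused_features(x), y)
print(clf.predict(extract_fused_features(x[:2])))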
Subjects
Deep Learning, Humans, Bayes Theorem, Neural Networks (Computer), Algorithms, Machine Learning, Support Vector Machine
ABSTRACT
The task of automatically generating medical image reports faces various challenges, such as the diversity of disease types and a lack of professionalism and fluency in the report descriptions. To address these issues, this paper proposes a memory-driven multimodal medical imaging report generation method (mMIRmd). First, a hierarchical vision transformer using shifted windows (Swin-Transformer) extracts multi-perspective visual features of the patient's medical images, and bidirectional encoder representations from transformers (BERT) extracts semantic features of the textual medical history. The visual and semantic features are then integrated to enhance the model's ability to recognize different disease types. Furthermore, a word vector dictionary pre-trained on medical text is employed to encode the labels of the visual features, thereby enhancing the professionalism of the generated reports. Finally, a memory-driven module is introduced in the decoder to address long-distance dependencies in medical image data. The method is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and the Medical Information Mart for Intensive Care Chest X-ray (MIMIC-CXR) released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results indicate that the proposed method focuses better on the affected areas, improves the accuracy and fluency of report generation, and can assist radiologists in quickly completing medical image reports.
Subjects
Critical Care, Electric Power Supplies, Humans, Semantics, Technology
ABSTRACT
In low-voltage distribution systems, the load types are complex, so traditional detection methods cannot effectively identify series arc faults. To address this problem, this paper proposes an arc fault detection method based on multimodal feature fusion. Firstly, the different mode features of the current signal are extracted by mathematical statistics, Fourier transform, wavelet packet transform, and continuous wavelet transform. The different modal features include one-dimensional features, such as time-domain features, frequency-domain features, and wavelet packet energy features, and two-dimensional features of time-spectrum images. Secondly, the extracted features are preprocessed and prioritized for importance based on different machine learning algorithms to improve the feature data quality. The features of higher importance are input into an arc fault detection model. Finally, an arc fault detection model is constructed based on a one-dimensional convolutional network and a deep residual shrinkage network to achieve high accuracy. The proposed detection method has higher detection accuracy and better performance compared with the arc fault detection method based on single-mode features.
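As an illustration of the one-dimensional feature modalities named above (time-domain statistics, frequency-domain features, and wavelet packet energies of the current signal), the snippet below computes a few representative features with NumPy and PyWavelets. The specific statistics, wavelet, decomposition level and sampling rate are our assumptions, not the paper's exact feature set.

import numpy as np
import pywt

def multimode_features(current, fs=10_000, wp_level=3):
    """Extract a few one-dimensional features of each kind from a current waveform."""
    # Time-domain statistics.
    rms = np.sqrt(np.mean(current ** 2))
    kurtosis = np.mean((current - current.mean()) ** 4) / (current.std() ** 4)
    crest = np.max(np.abs(current)) / rms

    # Frequency-domain feature: spectral centroid of the FFT magnitude spectrum.
    spectrum = np.abs(np.fft.rfft(current))
    freqs = np.fft.rfftfreq(current.size, d=1 / fs)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)

    # Normalized wavelet packet energy per terminal node.
    wp = pywt.WaveletPacket(current, wavelet="db4", maxlevel=wp_level)
    energies = np.array([np.sum(node.data ** 2)
                         for node in wp.get_level(wp_level, order="freq")])
    energies = energies / energies.sum()

    return np.concatenate([[rms, kurtosis, crest, centroid], energies])

print(multimode_features(np.random.randn(4096)).shape)   # (4 + 2**3,) = (12,)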
ABSTRACT
BACKGROUND AND OBJECTIVE: Sudden cardiac death (SCD) is a critical health issue characterized by the sudden failure of heart function, often caused by ventricular fibrillation (VF). Early prediction of SCD is crucial to enable timely interventions; however, current methods predict SCD only a few minutes before its onset, limiting intervention time. This study aims to develop a deep learning-based model for the early prediction of SCD using electrocardiography (ECG) signals. METHODS: A multimodal, explainable deep learning-based model is developed to analyze ECG signals at discrete intervals ranging from 5 to 30 min before SCD onset. The raw ECG signals, 2D scalograms generated through the wavelet transform, and 2D Hilbert spectra generated through the Hilbert-Huang transform (HHT) of the ECG signals were applied to multiple deep learning algorithms. For the raw ECG, a combination of a 1D convolutional neural network (1D-CNN) and long short-term memory networks was employed for feature extraction and temporal pattern recognition. In addition, a Vision Transformer (ViT) and a 2D-CNN were used to extract and analyze features from the scalograms and Hilbert spectra. RESULTS: The developed model achieved high performance, with accuracy, precision, recall, and F1-score of 98.81%, 98.83%, 98.81%, and 98.81%, respectively, in predicting SCD onset 30 min in advance. Further, the proposed model can classify SCD patients and normal controls with 100% accuracy, outperforming existing state-of-the-art methods. CONCLUSIONS: The developed model captures diverse patterns in ECG signals recorded at multiple discrete time intervals (at 5-minute increments from 5 to 30 min) prior to SCD onset that discriminate for SCD. The proposed model significantly improves early SCD prediction, providing a valuable tool for continuous ECG monitoring in high-risk patients.
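A minimal PyTorch sketch of the raw-ECG branch described in METHODS (1D convolutions followed by an LSTM) is shown below; all layer sizes, the sampling rate and the window length are arbitrary placeholders rather than the study's configuration.

import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Toy 1D-CNN + LSTM branch for raw single-lead ECG windows.

    Convolutions extract local morphological features; the LSTM models their
    temporal order; a linear head produces class logits.
    """
    def __init__(self, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                        # x: (batch, 1, samples)
        feats = self.cnn(x).transpose(1, 2)      # (batch, time, channels)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])                  # logits from the last hidden state

# Hypothetical one-minute ECG window at 128 Hz.
print(CNNLSTM()(torch.randn(4, 1, 128 * 60)).shape)   # torch.Size([4, 2])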
ABSTRACT
Cardiovascular disease (CVD) remains a leading cause of death globally, presenting significant challenges in early detection and treatment. The complexity of CVD arises from its multifaceted nature, influenced by a combination of genetic, environmental, and lifestyle factors. Traditional diagnostic approaches often struggle to effectively integrate and interpret the heterogeneous data associated with CVD. Addressing this challenge, we introduce a novel Attention-Based Cross-Modal (ABCM) transfer learning framework. This framework innovatively merges diverse data types, including clinical records, medical imagery, and genetic information, through an attention-driven mechanism. This mechanism adeptly identifies and focuses on the most pertinent attributes from each data source, thereby enhancing the model's ability to discern intricate interrelationships among various data types. Our extensive testing and validation demonstrate that the ABCM framework significantly surpasses traditional single-source models and other advanced multi-source methods in predicting CVD. Specifically, our approach achieves an accuracy of 93.5%, precision of 92.0%, recall of 94.5%, and an impressive area under the curve (AUC) of 97.2%. These results not only underscore the superior predictive capability of our model but also highlight its potential in offering more accurate and early detection of CVD. The integration of cross-modal data through attention-based mechanisms provides a deeper understanding of the disease, paving the way for more informed clinical decision-making and personalized patient care.
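The abstract describes the attention-driven cross-modal mechanism only at a high level; one plausible reading is a cross-attention layer in which one modality queries the others so the model focuses on their most pertinent attributes. The PyTorch sketch below follows that reading, with all dimensions, the choice of clinical features as the query, and the two-class head being our own assumptions.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-modal attention: clinical features act as the query,
    imaging and genetic features as keys/values, and the attended context is
    concatenated with the clinical embedding before classification."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 2)

    def forward(self, clinical, imaging, genetic):
        # Stack the auxiliary modalities as a length-2 "sequence" of tokens.
        context = torch.stack([imaging, genetic], dim=1)        # (B, 2, dim)
        attended, _ = self.attn(clinical.unsqueeze(1), context, context)
        fused = torch.cat([clinical, attended.squeeze(1)], dim=-1)
        return self.classifier(fused)

model = CrossModalAttention()
out = model(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64))
print(out.shape)   # torch.Size([8, 2])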
Subjects
Cardiovascular Diseases, Humans, Cardiovascular Diseases/diagnosis, Learning, Area Under Curve, Clinical Decision-Making, Machine Learning
ABSTRACT
The issue of air pollution from transportation sources remains a major concern, particularly emissions from heavy-duty diesel vehicles, which pose serious threats to ecosystems and human health. China VI emission standards mandate On-Board Diagnostics (OBD) systems in heavy-duty diesel vehicles for real-time data transmission, yet the current data quality, especially for crucial parameters such as NOx output, remains inadequate for effective regulation. To address this, a novel approach integrating Multimodal Feature Fusion with Particle Swarm Optimization (OBD-PSOMFF) is proposed. The network employs Long Short-Term Memory (LSTM) networks to extract features from OBD indicators, capturing temporal dependencies, while PSO optimizes the feature weights to enhance prediction accuracy. Testing on 23 heavy-duty vehicles demonstrates significant improvements in predicting NOx and CO2 mass emission rates, with mean squared errors reduced by 65.205% and 70.936%, respectively, compared with basic LSTM models. This multimodal fusion method offers a robust framework for emission prediction, which is crucial for effective vehicle emission regulation and environmental preservation.
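To make the PSO step concrete, the sketch below optimizes a vector of per-feature weights with a small hand-written particle swarm, using the mean squared error of a weighted-feature-sum predictor as the fitness. The feature matrix, target, fitness definition and all PSO hyperparameters are simplified placeholders and are not taken from the paper.

import numpy as np

def pso_feature_weights(feats, target, n_particles=20, iters=50, seed=0):
    """Toy particle swarm optimization of per-feature weights.

    feats: (n_samples, n_features) extracted OBD features (random here);
    target: (n_samples,) emission rate. Fitness is the MSE of feats @ weights.
    """
    rng = np.random.default_rng(seed)
    dim = feats.shape[1]
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)

    def fitness(w):
        return np.mean((feats @ w - target) ** 2)

    pbest = pos.copy()
    pbest_fit = np.array([fitness(w) for w in pos])
    gbest = pbest[pbest_fit.argmin()].copy()

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(w) for w in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmin()].copy()
    return gbest

feats = np.random.rand(200, 8)
target = feats @ np.array([3, 0, 1, 0, 2, 0, 0, 1]) + 0.1 * np.random.randn(200)
print(pso_feature_weights(feats, target).round(2))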
ABSTRACT
Purpose: Cardiotocography (CTG), which measures uterine contraction (UC) and fetal heart rate (FHR), is a crucial tool for assessing fetal health during pregnancy. However, traditional computerized cardiotocography (cCTG) approaches have non-negligible calibration errors in feature extraction and rely heavily on expertise and prior experience to define diagnostic features from CTG or FHR signals. Although previous works have studied deep learning methods for extracting CTG or FHR features, these methods still neglect the clinical information of pregnant women. Methods: In this paper, we propose a multimodal deep learning architecture (MMDLA) for intelligent antepartum fetal monitoring that performs automatic CTG feature extraction, fusion with clinical data, and classification. Multimodal feature fusion is achieved by concatenating high-level CTG features, extracted from the preprocessed CTG signals by a convolutional neural network (CNN) with six convolution layers and five fully connected layers, with the clinical data of the pregnant women. A light gradient boosting machine (LGBM) is then implemented as the fetal status assessment classifier. The effectiveness of MMDLA was evaluated on a dataset of 16,355 cases, each of which includes an FHR signal, a UC signal, and pertinent clinical data such as maternal age and gestational age. Results: With an accuracy of 90.77% and an area under the curve (AUC) of 0.9201, the multimodal features performed admirably. The data imbalance issue was also effectively handled by the LGBM classifier, with a normal-F1 of 0.9376 and an abnormal-F1 of 0.8223. Conclusion: In summary, the proposed MMDLA is conducive to the realization of intelligent antepartum fetal monitoring.
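A minimal sketch of this fusion-by-concatenation pipeline (CNN features from the two-channel CTG signal, concatenated with clinical variables, classified by LightGBM) is given below. The toy encoder, signal length, sampling rate and random data stand in for the paper's six-convolution, five-fully-connected network and real cases.

import numpy as np
import torch
import torch.nn as nn
from lightgbm import LGBMClassifier

# Toy CNN encoder for a preprocessed (FHR, UC) two-channel CTG window.
encoder = nn.Sequential(
    nn.Conv1d(2, 16, kernel_size=9, stride=4, padding=4), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten())            # -> 32-dim CTG feature

with torch.no_grad():
    ctg = torch.randn(100, 2, 4800)                   # e.g. 20 min at 4 Hz (assumed)
    ctg_feats = encoder(ctg).numpy()

clinical = np.random.rand(100, 2)                     # e.g. maternal age, gestational age
X = np.hstack([ctg_feats, clinical])                  # multimodal fusion by concatenation
y = np.random.randint(0, 2, size=100)                 # normal / abnormal fetal status

clf = LGBMClassifier(n_estimators=50).fit(X, y)
print(clf.predict_proba(X[:3]))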
ABSTRACT
Computer-aided diagnosis (CAD) based on X-rays is one of the cheapest and safest options for diagnosing disease compared with alternatives such as Computed Tomography (CT). However, our experiments on public X-ray datasets and real clinical datasets revealed two challenges in current pneumonia classification: existing public datasets have been preprocessed so thoroughly that reported accuracies are relatively high, and existing models extract features poorly from clinical pneumonia X-ray datasets. To address the dataset problem, we collected a new pediatric pneumonia dataset whose labels were obtained through comprehensive pathogen-radiology-clinical diagnostic screening. Then, to accurately capture the important features in imbalanced data, we propose, for the first time, a two-stage training multimodal pneumonia classification method combining X-ray images and blood test data based on the new dataset; it improves image feature extraction through a global-local attention module and mitigates the influence of class-imbalanced data on the results through the two-stage training strategy. In experiments, the proposed model performs best on the new clinical data and outperforms the diagnostic accuracy of four experienced radiologists. Through further study of how the various blood test indicators behave in the model, we drew conclusions that can help radiologists in diagnosis.
ABSTRACT
The outbreak of coronavirus disease 2019 (COVID-19) has caused massive infections and large death tolls worldwide. Although many studies have examined the clinical characteristics and treatment plans of COVID-19, few conduct in-depth prognostic research on leveraging consecutive rounds of multimodal clinical examination and laboratory test data to facilitate clinical decision-making for the treatment of COVID-19. To address this issue, we propose a multistage multimodal deep learning (MMDL) model that (1) first assesses the patient's current condition (i.e., mild or severe symptoms) and then (2) gives early warnings to patients with mild symptoms who are at high risk of developing severe illness. In MMDL, we build a sequential, stage-wise learning architecture whose design philosophy is that the predicted outcome depends not only on the current situation but also on the history. Concretely, we combine the latest round of multimodal clinical data with decayed past information to make assessments and predictions. In each round (stage), we design a two-layer multimodal feature extractor to extract the latent feature representation across different modalities of clinical data, including patient demographics, clinical manifestations, and 11 modalities of laboratory test results. We conduct experiments on a clinical dataset of 216 COVID-19 patients that has passed the review of the medical ethics committee. Experimental results validate our assumption that sequential stage-wise learning outperforms single-stage learning, but that history from long ago has little influence on the learning outcome. Comparison tests also show the advantage of multimodal learning: MMDL with multimodal inputs beats any reduced model with single-modal inputs only. In addition, we have deployed a prototype of MMDL in a hospital for clinical comparison tests and to assist doctors in clinical diagnosis.
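The abstract does not specify how past information is decayed; one simple reading is an exponential down-weighting of earlier rounds before they are combined with the latest round's features, which the short sketch below illustrates. The decay factor, feature dimension and random data are our own placeholders.

import numpy as np

def stagewise_representation(stage_feats, decay=0.5):
    """Combine the latest round of multimodal features with decayed past rounds.

    stage_feats: list of per-round feature vectors, oldest first. Exponential
    decay is our simplification of the abstract's "decayed past information".
    """
    rep = np.zeros_like(stage_feats[0], dtype=float)
    for i, f in enumerate(stage_feats):
        age = len(stage_feats) - 1 - i            # 0 for the latest round
        rep += (decay ** age) * np.asarray(f, dtype=float)
    return rep

# Three rounds of (already extracted) multimodal features for one patient.
rounds = [np.random.rand(32) for _ in range(3)]
print(stagewise_representation(rounds)[:5])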