Results 1 - 8 of 8
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38647155

ABSTRACT

Accurately delineating the connection between small nucleolar RNAs (snoRNAs) and disease is crucial for advancing disease detection and treatment. While traditional biological experimental methods are effective, they are labor-intensive, costly and lack scalability. With ongoing progress in computer technology, an increasing number of deep learning techniques are being employed to predict snoRNA-disease associations. Nevertheless, most of these methods are black-box models that lack interpretability and cannot elucidate the mechanism of snoRNA-disease association. In this study, we introduce IGCNSDA, an interpretable graph convolutional network (GCN) approach tailored for the efficient inference of snoRNA-disease associations. IGCNSDA leverages the GCN framework to extract node feature representations of snoRNAs and diseases from the bipartite snoRNA-disease graph. It rests on the premise that snoRNAs with high similarity are more likely to be linked to analogous diseases, and vice versa. To exploit this, we introduce a subgraph generation algorithm that groups similar snoRNAs and their associated diseases into cohesive subgraphs. We then aggregate information from neighboring nodes within these subgraphs, iteratively updating the embeddings of snoRNAs and diseases. The experimental results demonstrate that IGCNSDA outperforms the most recent, highly relevant methods. Additionally, our interpretability analysis provides compelling evidence that IGCNSDA captures the underlying similarity between snoRNAs and diseases, giving researchers deeper insight into the snoRNA-disease association mechanism. Furthermore, we present illustrative case studies that demonstrate the utility of IGCNSDA as a tool for efficiently predicting potential snoRNA-disease associations. The dataset and source code for IGCNSDA are openly accessible at: https://github.com/altriavin/IGCNSDA.


Subject(s)
RNA, Small Nucleolar , RNA, Small Nucleolar/genetics , Humans , Algorithms , Computational Biology/methods , Neural Networks, Computer , Software , Deep Learning
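
The abstract describes aggregating neighbor information on the bipartite snoRNA-disease graph to update embeddings. The sketch below illustrates that general idea using a LightGCN-style rule with symmetric degree normalization, layer averaging, and inner-product scoring. All of these are assumptions chosen for illustration. It is not IGCNSDA's published update rule, and it omits the subgraph generation step. The node names and embeddings are hypothetical.

```python
from math import sqrt


def propagate(edges, emb, num_layers=2):
    """Neighbour aggregation on an undirected bipartite graph.

    Assumed LightGCN-style rule (illustrative, not IGCNSDA's exact update):
    h_u^(l+1) = sum over v in N(u) of h_v^(l) / sqrt(|N(u)| * |N(v)|).
    """
    neigh = {node: [] for node in emb}
    for snorna, disease in edges:
        neigh[snorna].append(disease)
        neigh[disease].append(snorna)

    layers = [emb]
    for _ in range(num_layers):
        prev = layers[-1]
        nxt = {}
        for node in prev:
            acc = [0.0] * len(prev[node])
            for other in neigh[node]:
                w = 1.0 / sqrt(len(neigh[node]) * len(neigh[other]))
                for i, x in enumerate(prev[other]):
                    acc[i] += w * x
            nxt[node] = acc
        layers.append(nxt)

    # Final embedding: average of the input layer and every propagated layer.
    return {
        node: [sum(layer[node][i] for layer in layers) / len(layers)
               for i in range(len(emb[node]))]
        for node in emb
    }


def score(final, snorna, disease):
    # Predicted association strength: inner product of the two final embeddings.
    return sum(a * b for a, b in zip(final[snorna], final[disease]))


# Hypothetical toy graph: two snoRNAs, two diseases, 2-d initial embeddings.
emb = {"snoA": [1.0, 0.0], "snoB": [0.5, 0.5], "d1": [0.0, 1.0], "d2": [1.0, 1.0]}
edges = [("snoA", "d1"), ("snoB", "d1"), ("snoB", "d2")]
final = propagate(edges, emb, num_layers=2)
print(round(score(final, "snoA", "d2"), 3))
```

Because snoA and snoB share disease d1, propagation pulls their embeddings together. This is how an unobserved pair such as (snoA, d2) can receive a nonzero score.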
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38942594

ABSTRACT

Accurate understanding of the biological functions of enzymes is vital for various tasks in both pathology and industrial biotechnology. However, existing methods are usually not fast enough and lack explanations for their prediction results, which severely limits their real-world applications. Following our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) that uses novel self-guided attention and incorporates biological knowledge learned via large protein language models. This allows it to accurately predict the Enzyme Commission (EC) numbers of enzymes and confirm their functions. The self-guided attention is designed to optimize the unique contributions of representations, automatically detecting key protein motifs to provide meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, which runs 50 times faster than DEEPre while requiring 12.89 times less storage space. Large language models are incorporated to learn physical properties from hundreds of millions of proteins, extending the biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all current methods, achieving an F1-score more than 14.22% higher on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes from raw sequences alone, without label information. Meanwhile, ifDEEPre predicts the evolutionary relationships between different yeast sub-species, and these predictions are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which have important implications for designing novel enzymes. A web server running ifDEEPre is available at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/ to provide convenient services to the public, and ifDEEPre is freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.


Subject(s)
Deep Learning , Enzymes , Enzymes/chemistry , Enzymes/metabolism , Computational Biology/methods , Software , Proteins/chemistry , Proteins/metabolism , Databases, Protein , Algorithms
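
The abstract says attention is used to detect key protein motifs. The abstract does not specify ifDEEPre's self-guided attention, so the sketch below substitutes generic softmax attention pooling. It shows how per-residue attention weights can point to candidate motif positions. The query vector and the residue embeddings are hypothetical stand-ins for learned parameters and language-model representations.

```python
from math import exp


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(residue_vectors, query):
    """Pool per-residue vectors into one protein vector with softmax attention.

    `query` stands in for a learned parameter vector (hypothetical here).
    Returns the pooled vector and the per-residue attention weights.
    """
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in residue_vectors]
    weights = softmax(scores)
    dim = len(residue_vectors[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, residue_vectors))
              for i in range(dim)]
    return pooled, weights


def top_positions(weights, k):
    # Indices of the k most-attended residues: candidate motif positions.
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]


# Hypothetical 4-residue protein with 2-d residue embeddings.
residues = [[0.1, 0.0], [2.0, 0.3], [0.0, 1.0], [1.5, 0.2]]
pooled, weights = attention_pool(residues, query=[1.0, 0.0])
print(top_positions(weights, k=2))  # [1, 3]
```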
3.
Sensors (Basel) ; 24(10)2024 May 12.
Article in English | MEDLINE | ID: mdl-38793930

ABSTRACT

The widespread use of encrypted traffic poses challenges to network management and network security. Traditional machine learning-based methods for encrypted traffic classification no longer meet the demands of management and security. Applying deep learning to encrypted traffic classification significantly improves model accuracy. This study focuses primarily on encrypted traffic classification in the fields of network analysis and network security. Existing deep learning-based methods for encrypted traffic classification have shortcomings in computational memory consumption and interpretability. To address them, we introduce a Parameter-Efficient Fine-Tuning method for efficiently tuning the parameters of an encrypted traffic classification model. Experiments are conducted on various classification scenarios, including Tor traffic service classification and malicious traffic classification, using multiple public datasets. Fair comparisons are made with state-of-the-art deep learning model architectures. The results indicate that the proposed method significantly reduces the number of fine-tuned parameters and computational resource usage while achieving performance comparable to that of the best existing models. Furthermore, we interpret the learning mechanism of encrypted traffic representation in the pre-trained model by analyzing the parameters and structure of the model. This analysis validates the hypothesis that the model exhibits a hierarchical structure, clear organization, and distinct features.
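
To show why parameter-efficient fine-tuning shrinks the trainable parameter count, here is a back-of-the-envelope comparison. It uses LoRA-style low-rank adapters as one common example of PEFT. The abstract does not say which PEFT technique the authors use. The layer sizes (12 blocks, 768×768 query and value projections) and the rank are hypothetical, and only the adapted matrices are counted.

```python
def full_finetune_params(shapes):
    # Every adapted weight matrix of shape (d_out, d_in) is fully trainable.
    return sum(d_out * d_in for d_out, d_in in shapes)


def lora_params(shapes, rank):
    # W stays frozen; only B (d_out x r) and A (r x d_in) are trained, so the
    # update W + B @ A has r * (d_out + d_in) trainable values per matrix.
    return sum(rank * (d_out + d_in) for d_out, d_in in shapes)


# Hypothetical: 12 transformer blocks, adapting the 768x768 query and value projections.
shapes = [(768, 768)] * 24
full = full_finetune_params(shapes)
lora = lora_params(shapes, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 14155776 294912 2.08%
```

The ratio depends only on the rank and the matrix shapes (here 8 × 1536 / 768² = 1/48). This is why such methods scale well to large pre-trained traffic models.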

4.
J Environ Manage ; 356: 120510, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38490009

ABSTRACT

Continuous effluent quality prediction in wastewater treatment processes is crucial to proactively reduce risks to the environment and human health. However, wastewater treatment is an extremely complex process controlled by several uncertain, interdependent, and sometimes poorly characterized physico-chemical-biological process parameters. In addition, there are substantial spatiotemporal variations, uncertainties, and highly non-linear interactions among the water quality parameters and process variables involved in the treatment process. Such complexities hinder efficient monitoring, operation, and management of wastewater treatment plants under normal and abnormal conditions. Typical mathematical and statistical tools often fail to capture such complex interrelationships, so data-driven techniques offer an attractive way to quantify the performance of wastewater treatment plants. Several previous studies applied regression-based data-driven models (e.g., artificial neural networks) to predict some wastewater treatment effluent parameters. However, most of them used a limited number of input variables to predict only one or two parameters characterizing the effluent quality (e.g., chemical oxygen demand (COD) and/or suspended solids (SS)). Harnessing the power of Artificial Intelligence (AI), the current study proposes models based on multi-gene genetic programming (MGGP). They are built on a dataset obtained from an operational wastewater treatment plant that deploys a membrane aerated biofilm reactor. The models predict the filtered COD, ammonia (NH4), and SS concentrations, along with the carbon-to-nitrogen ratio (C/N), in the effluent. Input features included a set of process variables characterizing the influent quality (e.g., filtered COD, NH4, and SS concentrations), water physics and chemistry parameters (e.g., temperature and pH), and operating conditions (e.g., applied air pressure). The developed MGGP-based models accurately reproduced the observations of the four output variables. Correlation coefficients ranged from 0.98 to 0.99 during training and from 0.96 to 0.99 during testing, reflecting the models' power in predicting the quality of the effluent from the treatment system. Interpretability analyses were subsequently deployed to confirm the intuitive understanding of input-output interrelations and to identify the governing parameters of the treatment process. The developed MGGP-based models can facilitate AI-driven monitoring and management of wastewater treatment plants in two ways: by supporting the design of optimal, rapid operation and control schemes, and by helping plant operators maintain proper performance under various normal and disruptive operating conditions.


Subject(s)
Artificial Intelligence , Water Purification , Humans , Waste Disposal, Fluid/methods , Water Purification/methods , Neural Networks, Computer , Biological Oxygen Demand Analysis
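
In multi-gene genetic programming, a model's prediction is a bias term plus a weighted sum of several evolved expression trees ("genes"). The gene weights are typically fitted by ordinary least squares. The sketch below shows only that final combination step and the correlation coefficient used to assess fit. The two genes are hypothetical fixed expressions standing in for evolved trees, and the inputs are hypothetical (for example, influent filtered COD and temperature). The evolutionary search itself is not shown.

```python
from math import sqrt


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gene_weights(genes, X, y):
    # Design matrix: a bias column followed by one column per gene output.
    rows = [[1.0] + [g(x) for g in genes] for x in X]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(ata, aty)  # ordinary least squares via the normal equations


def predict(weights, genes, x):
    # y_hat = b0 + b1 * G1(x) + b2 * G2(x) + ...
    return weights[0] + sum(w * g(x) for w, g in zip(weights[1:], genes))


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


# Hypothetical genes over inputs x = (x0, x1), and synthetic noise-free targets.
genes = [lambda x: x[0] * x[1], lambda x: x[0] ** 0.5]
X = [(1.0, 2.0), (4.0, 1.0), (9.0, 3.0), (16.0, 2.0), (25.0, 1.0)]
y = [2.0 + 0.5 * genes[0](x) - 3.0 * genes[1](x) for x in X]
w = fit_gene_weights(genes, X, y)
preds = [predict(w, genes, x) for x in X]
print([round(v, 6) for v in w], round(pearson_r(preds, y), 6))
```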
5.
Artif Intell Med ; 152: 102871, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38685169

ABSTRACT

For the diagnosis and outcome prediction of gastric cancer (GC), machine learning methods based on whole slide pathological images (WSIs) have shown promising performance and reduced the cost of manual analysis. Nevertheless, accurate prediction of GC outcome may rely on multiple modalities with complementary information, particularly gene expression data. Thus, there is a need to develop multimodal learning methods to enhance prediction performance. In this paper, we collect a dataset from Ruijin Hospital and propose a multimodal learning method for GC diagnosis and outcome prediction, called GaCaMML, which features a cross-modal attention mechanism and a Per-Slide training scheme. Additionally, we perform feature attribution analysis via integrated gradients (IG) to identify important input features. The proposed method improves prediction accuracy over the single-modal learning method on three tasks: survival prediction (by 4.9% in C-index), pathological stage classification (by 11.6% in accuracy), and lymph node classification (by 12.0% in accuracy). In particular, the Per-Slide strategy addresses the issue of a high WSI-to-patient ratio and yields much better results than the Per-Person training scheme. In the interpretability analysis, we find that WSIs dominate the prediction for most samples. However, for a substantial portion of samples, the prediction still relies heavily on gene expression information. This study demonstrates the great potential of multimodal learning in GC-related prediction tasks and investigates the respective contributions of WSIs and gene expression. This not only shows how the model makes a decision but also provides insights into the association between macroscopic pathological phenotypes and microscopic molecular features.


Subject(s)
Machine Learning , Stomach Neoplasms , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Humans , Image Interpretation, Computer-Assisted/methods , Prognosis , Gene Expression Profiling/methods
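
Integrated gradients attributes a prediction to input features by accumulating gradients along a straight path from a baseline x' to the input x. For each feature i, IG_i = (x_i - x'_i) multiplied by the average of dF/dx_i along that path. The minimal sketch below approximates this with a midpoint Riemann sum. It uses a hypothetical two-feature toy model with an analytic gradient (real models would use automatic differentiation). It then sums absolute attributions per modality, which is one plausible way to compare WSI and gene-expression contributions. This summary step is an assumption, not necessarily the paper's procedure.

```python
def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a(x - x')) da.

    Uses a midpoint Riemann sum with `steps` points on the straight path from
    `baseline` (x') to `x`. `grad_f` returns the gradient at a given point.
    """
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def modality_contributions(attributions, groups):
    # Sum absolute attributions per modality, e.g. WSI features vs gene features.
    return {name: sum(abs(attributions[i]) for i in idx) for name, idx in groups.items()}


# Hypothetical toy model: feature 0 = a WSI-derived score, feature 1 = a gene-expression value.
def f(z):
    return z[0] ** 2 + 3.0 * z[0] * z[1]


def grad_f(z):
    return [2.0 * z[0] + 3.0 * z[1], 3.0 * z[0]]


x, baseline = [1.0, 2.0], [0.0, 0.0]
ig = integrated_gradients(grad_f, x, baseline)
print(ig, sum(ig), f(x) - f(baseline))
print(modality_contributions(ig, {"wsi": [0], "gene": [1]}))
```

The attributions sum to f(x) - f(baseline) (the "completeness" property of IG). This makes it a convenient check that the path integral was approximated well enough.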
6.
Diagn Interv Imaging ; 105(5): 191-205, 2024 May.
Article in English | MEDLINE | ID: mdl-38272773

ABSTRACT

PURPOSE: The purpose of this study was to assess the predictive performance of multiparametric magnetic resonance imaging (MRI) for breast cancer molecular subtypes and to interpret features using SHapley Additive exPlanations (SHAP) analysis. MATERIAL AND METHODS: Patients with breast cancer who underwent pre-treatment MRI were recruited between February 2019 and January 2022. The MRI protocol included ultrafast dynamic contrast-enhanced MRI, magnetic resonance spectroscopy, diffusion kurtosis imaging and intravoxel incoherent motion. Thirteen semantic and thirteen multiparametric features were collected. Key features were selected by stepwise logistic regression to develop machine-learning models for predicting the molecular subtypes of breast cancer (luminal A, luminal B, triple-negative and HER2-enriched). A semantic model and a multiparametric model were built and compared using five machine-learning classifiers. Model decision-making was interpreted using SHAP analysis. RESULTS: A total of 188 women (mean age, 53 ± 11 [standard deviation] years; age range: 25-75 years) were enrolled and divided into a training cohort (131 women) and a validation cohort (57 women). XGBoost demonstrated good predictive performance among the five machine-learning classifiers. In the validation cohort, the areas under the receiver operating characteristic curves (AUCs) for the semantic models ranged from 0.693 (95% confidence interval [CI]: 0.478-0.839) for the HER2-enriched subtype to 0.764 (95% CI: 0.681-0.908) for the luminal A subtype. These were lower than those of the multiparametric models, whose AUCs ranged from 0.771 (95% CI: 0.630-0.888) for the HER2-enriched subtype to 0.857 (95% CI: 0.717-0.957) for the triple-negative subtype. However, the AUCs of the semantic and multiparametric models did not differ significantly (P range: 0.217-0.640). SHAP analysis revealed that lower iAUC, higher kurtosis, lower D*, and lower kurtosis were distinctive features for the luminal A, luminal B, triple-negative, and HER2-enriched subtypes, respectively. CONCLUSION: Multiparametric MRI is superior to semantic models for predicting the molecular subtypes of breast cancer.


Subject(s)
Breast Neoplasms , Machine Learning , Multiparametric Magnetic Resonance Imaging , Humans , Female , Breast Neoplasms/diagnostic imaging , Middle Aged , Multiparametric Magnetic Resonance Imaging/methods , Adult , Aged , Predictive Value of Tests
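
SHAP explanations are grounded in Shapley values. Each feature's value is its average marginal contribution to the model output, taken over all subsets of the other features. The sketch below computes exact Shapley values by enumerating every subset. "Missing" features are replaced by a single baseline vector, which is an assumption: the shap library offers faster, model-specific algorithms (for example for tree ensembles such as XGBoost) and other ways of handling missing features. The three-feature scoring function over (iAUC, kurtosis, D*) is hypothetical, not a fitted model from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed values; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for combo in combinations(others, size):
                s = set(combo)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy "luminal A" score over (iAUC, kurtosis, D*); not a fitted model.
def model(z):
    return 1.0 - 2.0 * z[0] + 0.5 * z[1] * z[2]


x = [0.2, 1.0, 2.0]
baseline = [0.5, 0.5, 0.5]
phi = shapley_values(model, x, baseline)
print([round(p, 4) for p in phi])  # [0.6, 0.3125, 0.5625]
```

Enumeration costs O(2^n) model calls per feature, which is only practical for a handful of features. The values always sum to f(x) - f(baseline).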
7.
J Imaging ; 10(2)2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38392093

ABSTRACT

The outbreak of COVID-19 shocked the entire world with its rapid spread and has challenged many sectors. One of the most effective ways to limit its spread is early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and computed tomography (CT), combined with the potential of artificial intelligence (AI), plays an essential role in supporting medical personnel in the diagnostic process. Thus, in this article, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their ensemble, using majority voting, were used to classify COVID-19, pneumonia and healthy subjects from chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. First, the interpretability of each network was thoroughly studied using local interpretability methods (occlusion, saliency, input × gradient, guided backpropagation, integrated gradients, and DeepLIFT) and a global technique (neuron activation profiles). The mean micro F1 score of the models for COVID-19 classification ranged from 0.66 to 0.875, and was 0.89 for the ensemble of the network models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before deciding which model performs best.
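
The ensemble step and the reported metric can be made concrete with a short sketch. It shows per-label majority voting for multilabel outputs and micro-averaged F1, which pools true positives, false positives, and false negatives over every (sample, label) pair. The label order and the votes are hypothetical. The sketch also assumes each model already emits binary labels; the paper's thresholding and exact averaging may differ.

```python
def majority_vote(predictions):
    """Per-label strict majority over models for one sample.

    predictions: one list of 0/1 labels per model, all for the same sample.
    """
    n_models = len(predictions)
    return [1 if sum(col) * 2 > n_models else 0 for col in zip(*predictions)]


def micro_f1(y_true, y_pred):
    # Pool TP/FP/FN over every (sample, label) pair before computing F1.
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical label order: (COVID-19, pneumonia, healthy); one row per model.
votes = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
print(majority_vote(votes))  # [1, 1, 0]
print(micro_f1([[1, 0, 0], [0, 1, 0]], [[1, 1, 0], [0, 1, 0]]))  # 0.8
```

With five models, a strict majority needs at least three votes, so ties cannot occur.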

8.
Front Artif Intell ; 6: 1104064, 2023.
Article in English | MEDLINE | ID: mdl-38249791

ABSTRACT

With the rapid development of deep learning techniques, their applications have become increasingly widespread across domains. However, traditional deep learning methods are often referred to as "black box" models whose results have low interpretability, which hinders their application in certain critical domains. In this study, we propose a comprehensive method for the interpretability analysis of sentiment models. The proposed method encompasses two main aspects: attention-based analysis and external knowledge integration. First, we train the model on sentiment classification and generation tasks to capture attention scores from multiple perspectives. This multi-angle approach reduces bias and provides a more comprehensive understanding of the underlying sentiment. Second, we incorporate an external knowledge base to improve evidence extraction. By leveraging character scores, we retrieve complete sentiment evidence phrases, addressing the challenge of incomplete evidence extraction in Chinese texts. Experimental results on a sentiment interpretability evaluation dataset demonstrate the effectiveness of our method: accuracy increases by 1.3%, Macro-F1 by 13%, and MAP by 23%. Overall, our approach offers a robust solution for enhancing the interpretability of sentiment models by combining attention-based analysis with the integration of external knowledge.
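
One of the reported gains is in MAP (mean average precision), an evaluation metric for evidence extraction. The sketch below ranks character positions by attention score and computes average precision against a set of gold evidence positions, then averages over texts. It is a generic definition of MAP. The evaluation dataset's exact protocol (for example, token-level versus phrase-level matching) is not given in the abstract and may differ. The scores and gold sets are hypothetical.

```python
def rank_by_attention(scores):
    # Character positions sorted by attention score, highest first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a gold evidence item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, gold_sets):
    return sum(average_precision(r, g) for r, g in zip(rankings, gold_sets)) / len(rankings)


# Hypothetical attention scores for two short texts and their gold evidence positions.
rankings = [rank_by_attention([0.1, 0.4, 0.7, 0.15]), rank_by_attention([0.6, 0.2, 0.3])]
gold = [{1, 2}, {1}]
print(rankings)  # [[2, 1, 3, 0], [0, 2, 1]]
print(round(mean_average_precision(rankings, gold), 4))  # 0.6667
```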
