Results 1 - 20 of 35
1.
IEEE Comput Graph Appl ; PP2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39250367

ABSTRACT

We present Q-Seg, a novel unsupervised image segmentation method based on quantum annealing, tailored for existing quantum hardware. We formulate the pixel-wise segmentation problem, which assimilates spectral and spatial information of the image, as a graph-cut optimization task. Our method efficiently leverages the interconnected qubit topology of the D-Wave Advantage device, offering superior scalability over existing quantum approaches and outperforming several tested state-of-the-art classical methods. Empirical evaluations on synthetic datasets have shown that Q-Seg has better runtime performance than the state-of-the-art classical optimizer Gurobi. The method has also been tested on earth observation image segmentation, a critical area with noisy and unreliable annotations. In the era of noisy intermediate-scale quantum (NISQ) computing, Q-Seg emerges as a reliable contender for real-world applications in comparison to advanced techniques like Segment Anything. Consequently, Q-Seg offers a promising solution using available quantum hardware, especially in situations constrained by limited labeled data and the need for efficient computational runtime.
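
As a rough illustration of the graph-cut formulation mentioned in the Q-Seg abstract, the sketch below labels a tiny one-dimensional strip of pixels by minimizing a cut energy written in quadratic binary (QUBO-style) form. The edge weighting, the threshold, the toy image and all function names are assumptions made for this example; the abstract does not state the authors' exact formulation, and the exhaustive search here merely stands in for a quantum annealer.

```python
from itertools import product


def similarity_edges(pixels, threshold=0.5):
    """Build edges between neighbouring pixels of a 1-D strip.

    Hypothetical weighting: positive for similar neighbours (cutting them
    is penalised), negative for dissimilar ones (cutting them is rewarded).
    """
    edges = {}
    for i in range(len(pixels) - 1):
        diff = abs(pixels[i] - pixels[i + 1])
        edges[(i, i + 1)] = threshold - diff
    return edges


def cut_energy(labels, edges):
    """Quadratic binary objective: x_i + x_j - 2*x_i*x_j is 1 only when
    the two endpoints carry different labels, so this sums cut weights."""
    return sum(
        w * (labels[i] + labels[j] - 2 * labels[i] * labels[j])
        for (i, j), w in edges.items()
    )


def min_cut_segmentation(pixels):
    """Brute-force search over all binary labelings (feasible only for toys)."""
    edges = similarity_edges(pixels)
    best = min(
        product((0, 1), repeat=len(pixels)),
        key=lambda labels: cut_energy(labels, edges),
    )
    return list(best)


segments = min_cut_segmentation([0.1, 0.2, 0.9, 1.0])
```

For the toy strip above, the two dark pixels end up in one segment and the two bright pixels in the other. The search space doubles with every added pixel, which is the scaling problem that motivates handing the same objective to an annealer.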

2.
Heliyon ; 10(17): e36041, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39281576

ABSTRACT

Protein solubility prediction is useful for the careful selection of highly effective candidate proteins for drug development. In recombinant protein synthesis, solubility prediction is valuable for optimizing key protein characteristics, including stability, functionality, and ease of purification. It contains valuable information about potential biomarkers or therapeutic targets and helps in early forecasting of neurodegenerative diseases, cancer, and cardiovascular disorders. Traditional wet-lab experimental approaches for assessing protein solubility are error-prone, time-consuming, and costly. Researchers have harnessed Artificial Intelligence approaches to replace experimental approaches with computational predictors. These predictors infer the solubility of proteins by analyzing amino acid distributions in raw protein sequences. There is still considerable room for the development of robust computational predictors because existing predictors fail to extract a comprehensive, discriminative distribution of amino acids. To more precisely discriminate soluble proteins from insoluble proteins, this paper presents the ProSol-Multi predictor, which makes use of a novel MLCDE encoder and a Random Forest classifier. The MLCDE encoder transforms protein sequences into informative statistical vectors by capturing the multi-level correlation and discriminative distribution of amino acids within raw protein sequences. The performance of the proposed encoder is evaluated against 56 existing protein sequence encoding methods on a widely used protein solubility prediction benchmark dataset under two different experimental settings, namely intrinsic and extrinsic. Intrinsic evaluation reveals that, among all sequence encoders, the proposed MLCDE encoder manages to generate non-overlapping clusters of soluble and insoluble classes. In extrinsic evaluation, 10 machine learning classifiers achieve better performance with the proposed MLCDE encoder than with the 56 existing protein sequence encoders. Moreover, across 4 public benchmark datasets, the proposed ProSol-Multi predictor outperforms 20 existing predictors by an average of 3% in accuracy and 2% in MCC and AU-ROC. The ProSol-Multi interactive web application is available at https://sds_genetic_analysis.opendfki.de/ProSol-Multi.

3.
Front Artif Intell ; 7: 1236947, 2024.
Article in English | MEDLINE | ID: mdl-39021435

ABSTRACT

Since the advent of deep learning (DL), the field has witnessed a continuous stream of innovations. However, the translation of these advancements into practical applications has not kept pace, particularly in safety-critical domains where artificial intelligence (AI) must meet stringent regulatory and ethical standards. This is underscored by the ongoing research in eXplainable AI (XAI) and privacy-preserving machine learning (PPML), which seek to address some limitations associated with these opaque and data-intensive models. Despite brisk research activity in both fields, little attention has been paid to their interaction. This work is the first to thoroughly investigate the effects of privacy-preserving techniques on explanations generated by common XAI methods for DL models. A detailed experimental analysis is conducted to quantify the impact of private training on the explanations provided by DL models, applied to six image datasets and five time series datasets across various domains. The analysis comprises three privacy techniques, nine XAI methods, and seven model architectures. The findings suggest non-negligible changes in explanations through the implementation of privacy measures. Apart from reporting individual effects of PPML on XAI, the paper gives clear recommendations for the choice of techniques in real applications. By unveiling the interdependencies of these pivotal technologies, this research marks an initial step toward resolving the challenges that hinder the deployment of AI in safety-critical settings.

4.
Front Artif Intell ; 7: 1428501, 2024.
Article in English | MEDLINE | ID: mdl-39021434

ABSTRACT

Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and planning precision medicine interventions. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates continued innovation. To catalyze these advancements, it is crucial to bring the knowledge and insights of existing survival predictors into a centralized platform. This paper thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of the 90 most recent survival predictors across 44 diverse diseases, it examines the diverse types of methods that are used in the development of disease-specific predictors. This exhaustive analysis encompasses the utilized data modalities along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine learning, or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.

5.
Cancer Med ; 13(12): e7398, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38923826

ABSTRACT

Artificial intelligence (AI) promises to be the next revolutionary step in modern society. Yet, its role in all fields of industry and science needs to be determined. One very promising field is represented by AI-based decision-making tools in clinical oncology leading to more comprehensive, personalized therapy approaches. In this review, the authors provide an overview of all relevant technical applications of AI in oncology, which are required to understand the future challenges and realistic perspectives for decision-making tools. In recent years, various applications of AI in medicine have been developed focusing on the analysis of radiological and pathological images. AI applications encompass large amounts of complex data supporting clinical decision-making and reducing errors by objectively quantifying all aspects of the data collected. In clinical oncology, almost all patients receive a treatment recommendation in a multidisciplinary cancer conference at the beginning and during their treatment periods. These highly complex decisions are based on a large amount of information (about the patients and about the various treatment options), which needs to be analyzed and correctly classified in a short time. In this review, the authors describe the technical and medical requirements of AI to address these scientific challenges in a multidisciplinary manner. Major challenges in the use of AI in oncology and decision-making tools are data security, data representation, and explainability of AI-based outcome predictions, in particular for decision-making processes in multidisciplinary cancer conferences. Finally, limitations and potential solutions are described and compared for current and future research attempts.


Subjects
Artificial Intelligence, Clinical Decision-Making, Medical Oncology, Neoplasms, Humans, Medical Oncology/methods, Neoplasms/therapy, Precision Medicine/methods, Clinical Decision Support Systems
6.
Comput Biol Med ; 176: 108538, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38759585

ABSTRACT

The key properties of anticancer peptides (ACPs), including bioactivity, high efficacy, low toxicity, and lack of drug resistance, make them ideal candidates for cancer therapies. To explore the potential of ACPs and accelerate the development of cancer therapies, 53 Artificial Intelligence-supported computational predictors have been developed for ACP versus non-ACP classification, but only one predictor has been developed for annotating ACP functional types. These predictors extract amino acid distribution patterns to transform peptide sequences into statistical vectors, which are then fed to classifiers that discriminate peptide sequences and annotate peptide functional classes. Overall, these predictors fail to extract diverse types of amino acid distribution patterns from peptide sequences. This paper presents a unique CARE encoder that transforms peptide sequences into statistical vectors by extracting 4 different types of distribution patterns: correlation, distribution, composition, and transition. On a public benchmark dataset, the potential of the proposed encoder is explored under two different evaluation settings, namely intrinsic and extrinsic. Extrinsic evaluation indicates that 12 different machine learning classifiers achieve superior performance with the proposed encoder compared to 55 existing encoders. Furthermore, intrinsic evaluation reveals that, unlike existing encoders, the proposed encoder generates more discriminative clusters for the ACP and non-ACP classes. Across 8 public benchmark ACP and non-ACP classification datasets, the CAPTURE predictor, which is based on the proposed encoder and an AdaBoost classifier, outperforms existing predictors by an average of 1%, 4%, and 2% in accuracy, recall, and MCC, respectively. In a generalizability case study across 7 benchmark antimicrobial peptide classification datasets, CAPTURE surpasses existing predictors by an average AU-ROC of 2%. The CAPTURE predictive pipeline combined with the label powerset method outperforms the state-of-the-art ACP functional type predictor by 5%, 5%, 5%, 6%, and 3% in terms of average accuracy, subset accuracy, precision, recall, and F1, respectively. The CAPTURE web application is available at https://sds_genetic_analysis.opendfki.de/CAPTURE.


Subjects
Antineoplastic Agents, Peptides, Humans, Antineoplastic Agents/therapeutic use, Antineoplastic Agents/chemistry, Peptides/chemistry, Machine Learning, Amino Acid Sequence, Computational Biology/methods, Neoplasms/drug therapy, Protein Sequence Analysis/methods, Protein Databases
7.
Sci Rep ; 14(1): 9466, 2024 04 24.
Article in English | MEDLINE | ID: mdl-38658614

ABSTRACT

Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. Identifying leccDNA is important for investigating its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights about disease mechanisms and potential therapeutic approaches. Conventionally, wet lab-based methods are utilized to identify leccDNA; these are hindered by the need for prior knowledge and by resource-intensive processes, potentially limiting their broader applicability. To support leccDNA identification across multiple species, this paper presents the very first computational predictor. The proposed iLEC-DNA predictor makes use of an SVM classifier along with sequence-derived nucleotide distribution patterns and physicochemical properties-based features. In addition, the study introduces a set of 12 benchmark leccDNA datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation across the 12 benchmark datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms baseline predictors and encoder ensembles across diverse leccDNA datasets, producing average performance values of 81.09%, 62.2% and 81.08% in terms of ACC, MCC and AUC-ROC across all the datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To support the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.


Subjects
Circular DNA, Saccharomyces cerevisiae, Circular DNA/genetics, Humans, Saccharomyces cerevisiae/genetics, Arabidopsis/genetics, Computational Biology/methods, Nucleotides/genetics, Support Vector Machine
8.
Brief Funct Genomics ; 23(2): 163-179, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-37248673

ABSTRACT

Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes or degrade its activity, which leads toward failure of intracellular processes. Tyrosine nitration (NT) modification degrades a protein's activity, which initiates and propagates various diseases including neurodegenerative, cardiovascular, and autoimmune diseases and carcinogenesis. Identification of NT modification supports development of novel therapies and drug discoveries for associated diseases. Identification of NT modification in biochemical labs is expensive, time-consuming and error-prone. To supplement this process, several computational approaches have been proposed. However, these approaches fail to precisely identify NT modification due to the extraction of irrelevant, redundant and less discriminative features from protein sequences. This paper presents the NTpred framework, which extracts comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of different encoders, it generates four additional feature spaces by fusing different combinations of individual encodings. Furthermore, it eliminates irrelevant and redundant features from the eight different feature spaces through a Recursive Feature Elimination process. Selected features of the four individual encodings and four feature fusion vectors are used to train eight different Gradient Boosted Tree classifiers. The probability scores from the trained classifiers are utilized to generate a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross validation and independent test evaluation with combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor with combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.


Subjects
Computational Biology, Proteins, Proteins/metabolism, Amino Acid Sequence, Machine Learning, Tyrosine
9.
Front Bioinform ; 3: 1194993, 2023.
Article in English | MEDLINE | ID: mdl-37484865

ABSTRACT

Artificial Intelligence (AI) has achieved remarkable success in image generation, image analysis, and language modeling, making data-driven techniques increasingly relevant in practical real-world applications, promising enhanced creativity and efficiency for human users. However, the deployment of AI in high-stakes domains such as infrastructure and healthcare still raises concerns regarding algorithm accountability and safety. The emerging field of explainable AI (XAI) has made significant strides in developing interfaces that enable humans to comprehend the decisions made by data-driven models. Among these approaches, concept-based explainability stands out due to its ability to align explanations with high-level concepts familiar to users. Nonetheless, early research in adversarial machine learning has unveiled that exposing model explanations can render victim models more susceptible to attacks. This is the first study to investigate and compare the impact of concept-based explanations on the privacy of Deep Learning based AI models in the context of biomedical image analysis. An extensive privacy benchmark is conducted on three different state-of-the-art model architectures (ResNet50, NFNet, ConvNeXt) trained on two biomedical (ISIC and EyePACS) and one synthetic dataset (SCDB). The success of membership inference attacks while exposing varying degrees of attribution-based and concept-based explanations is systematically compared. The findings indicate that, in theory, concept-based explanations can potentially increase the vulnerability of a private AI system by up to 16% compared to attributions in the baseline setting. However, it is demonstrated that, in more realistic attack scenarios, the threat posed by explanations is negligible in practice. Furthermore, actionable recommendations are provided to ensure the safe deployment of concept-based XAI systems. In addition, the impact of differential privacy (DP) on the quality of concept-based explanations is explored, revealing that while negatively influencing the explanation ability, DP can have an adverse effect on the models' privacy.

10.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9862-9882, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022895

ABSTRACT

With the advent of Deep Learning (DL), Super-Resolution (SR) has also become a thriving research area. However, despite promising results, the field still faces challenges that require further research, e.g., allowing flexible upsampling, more effective loss functions, and better evaluation metrics. We review the domain of SR in light of recent advances and examine state-of-the-art models such as diffusion (DDPM) and transformer-based SR models. We critically discuss contemporary strategies used in SR and identify promising yet unexplored research directions. We complement previous surveys by incorporating the latest developments in the field, such as uncertainty-driven losses, wavelet networks, neural architecture search, novel normalization methods, and the latest evaluation techniques. We also include several visualizations for the models and methods throughout each chapter to facilitate a global understanding of the trends in the field. This review ultimately aims at helping researchers to push the boundaries of DL applied to SR.

11.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36528802

ABSTRACT

Accurate prediction of deoxyribonucleic acid (DNA) modifications is essential to explore and discern the process of cell differentiation, gene expression and epigenetic regulation. Several computational approaches have been proposed for particular type-specific DNA modification prediction. Two recent generalized computational predictors are capable of detecting three different types of DNA modifications; however, type-specific and generalized modification predictors produce limited performance across multiple species, mainly due to the use of ineffective sequence encoding methods. This paper presents a generalized computational approach, "DNA-MP", that more precisely predicts three different DNA modifications across multiple species. The proposed DNA-MP approach makes use of a powerful encoding method, "position specific nucleotides occurrence based on modification and non-modification class densities normalized difference" (POCD-ND), to generate the statistical representations of DNA sequences and a deep forest classifier for modification prediction. The POCD-ND encoder generates statistical representations by extracting position-specific distributional information of nucleotides in the DNA sequences. We perform a comprehensive intrinsic and extrinsic evaluation of the proposed encoder and compare its performance with the 32 most widely used encoding methods on 17 benchmark DNA modification prediction datasets of 12 different species using 10 different machine learning classifiers. Overall, with all classifiers, the proposed POCD-ND encoder outperforms the 32 existing encoders. Furthermore, combined over 5-fold cross validation benchmark datasets and independent test sets, the proposed DNA-MP predictor outperforms state-of-the-art type-specific and generalized modification predictors by an average accuracy of 7% across 4mC datasets, 1.35% across 5hmC datasets and 10% across 6mA datasets. To support the scientific community, the DNA-MP web application is available at https://sds_genetic_analysis.opendfki.de/DNA_Modifications/.


Subjects
Genetic Epigenesis, Machine Learning, Software, Nucleotides, DNA/genetics
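
The position-specific encoding idea named in the DNA-MP abstract above can be sketched as follows. The abstract does not give the POCD-ND formula, so this example assumes the simplest reading: for each (position, nucleotide) pair, take the fraction of modified-class sequences showing that nucleotide at that position minus the corresponding fraction in the non-modified class. The function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def position_densities(sequences):
    """Fraction of sequences that carry each nucleotide at each position."""
    counts = defaultdict(float)
    for seq in sequences:
        for pos, nucleotide in enumerate(seq):
            counts[(pos, nucleotide)] += 1.0
    total = len(sequences)
    return {key: count / total for key, count in counts.items()}


def fit_density_difference(positives, negatives):
    """Assumed scoring: positive-class density minus negative-class density."""
    pos_density = position_densities(positives)
    neg_density = position_densities(negatives)
    keys = set(pos_density) | set(neg_density)
    return {
        key: pos_density.get(key, 0.0) - neg_density.get(key, 0.0)
        for key in keys
    }


def encode(seq, table):
    """Replace each nucleotide by its position-specific score (0.0 if unseen)."""
    return [table.get((pos, nucleotide), 0.0) for pos, nucleotide in enumerate(seq)]


table = fit_density_difference(["ACG", "ACT"], ["TCG", "TGG"])
vector = encode("ACG", table)
```

Each sequence becomes a numeric vector of the same length as the sequence, which a downstream classifier can consume directly. In practice the table would be fitted on training folds only, to avoid leaking label information into the test data.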
12.
Front Med (Lausanne) ; 9: 1025887, 2022.
Article in English | MEDLINE | ID: mdl-36465911

ABSTRACT

Viral-host protein-protein interaction (VHPPI) prediction is essential to decoding molecular mechanisms of viral pathogens and host immunity processes that eventually help to control the propagation of viral diseases and to design optimized therapeutics. Multiple AI-based predictors have been developed to predict diverse VHPPIs across a wide range of viruses and hosts; however, these predictors perform well only for specific types of hosts and viruses. The prime objective of this research is to develop a robust meta predictor (MP-VHPPI) capable of more accurately predicting VHPPI across multiple hosts and viruses. The proposed meta predictor makes use of two well-known encoding methods, Amphiphilic Pseudo-Amino Acid Composition (APAAC) and Quasi-sequence (QS) Order, which capture amino acid sequence order and distributional information to generate the numerical representation of complete viral-host raw protein sequences. A feature agglomeration method is utilized to transform the original feature space into a more informative feature space. Random forest (RF) and Extra tree (ET) classifiers are trained on the optimized feature space of the APAAC and QS order encoders separately and on the combination of both encodings. The predictions of both classifiers are then fed to a Support Vector Machine (SVM) classifier that makes the final predictions. The proposed meta predictor is evaluated over 7 different benchmark datasets, where it outperforms existing VHPPI predictors by an average margin of 3.07, 6.07, 2.95, and 2.85% in terms of accuracy, Matthews correlation coefficient, precision, and sensitivity, respectively. To support the scientific community, the MP-VHPPI web server is available at https://sds_genetic_analysis.opendfki.de/MP-VHPPI/.

13.
iScience ; 25(10): 105169, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36267921

ABSTRACT

Protein-protein interaction (PPI) prediction is essential to understand the functions of proteins in various biological processes and their roles in the development, progression, and treatment of different diseases. To perform economical large-scale PPI analysis, several artificial intelligence-based approaches have been proposed. However, these approaches have limited predictive performance due to the use of ineffective statistical representation learning methods and predictors that lack the ability to extract comprehensive discriminative features. This paper generates statistical representations of protein sequences by applying transfer learning in an unsupervised manner using the FastText embedding generation approach. Furthermore, it presents the "ADH-PPI" classifier, which reaps the benefits of three different neural layers: long short-term memory, convolutional, and self-attention layers. Over benchmark datasets of two different species, the proposed ADH-PPI predictor outperforms existing approaches by an overall accuracy of 4% and a Matthews correlation coefficient of 6%. In addition, it achieves an overall accuracy increment of 7% on four independent test sets. Availability: the ADH-PPI web server is publicly available at https://sds_genetic_analysis.opendfki.de/PPI/.

14.
Interdiscip Sci ; 14(4): 841-862, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35947255

ABSTRACT

BACKGROUND AND OBJECTIVE: Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation and in cellular metabolic and pathological processes. Existing purely sequence-based computational approaches lack robustness and efficiency, mainly due to the high length variability of lncRNA sequences. Hence, the prime focus of the current study is to find optimal length trade-offs for lncRNA sequences of highly variable length. METHOD: This paper performs an in-depth exploration of diverse copy padding and sequence truncation approaches, and presents a novel idea of utilizing only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach, "BoT-Net", which leverages a single-layer long short-term memory network regularized through DropConnect to capture higher-order residue dependencies, pooling to retain the most salient features, normalization to prevent exploding and vanishing gradient issues, learning rate decay, and dropout to regularize a precise neural network for lncRNA-miRNA interaction prediction. RESULTS: BoT-Net outperforms the state-of-the-art lncRNA-miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and Matthews correlation coefficient. Furthermore, a case study analysis indicates that BoT-Net also outperforms the state-of-the-art lncRNA-protein interaction predictor on a benchmark dataset by 10% in accuracy, 19% in sensitivity, 6% in specificity, 14% in precision, and 26% in Matthews correlation coefficient. CONCLUSION: In the benchmark lncRNA-miRNA interaction prediction dataset, the length of the lncRNA sequences varies from 213 residues to 22,743 residues, and in the benchmark lncRNA-protein interaction prediction dataset, lncRNA sequences vary from 15 residues to 1504 residues. For sequences of such highly variable length, fixed-length generation using copy padding introduces a significant level of bias, which makes a large number of lncRNA sequences nearly identical to each other and eventually derails classifier generalizability. Empirical evaluation reveals that the first 50 residues of the starting region of long lncRNA sequences alone contain a highly informative distribution for lncRNA-miRNA interaction prediction, a crucial finding exploited by the proposed BoT-Net approach to optimize the lncRNA fixed-length generation process. AVAILABILITY: The BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/.


Subjects
MicroRNAs, Long Noncoding RNA, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism, MicroRNAs/genetics, MicroRNAs/metabolism, Computational Biology, Neural Networks (Computer), Gene Expression Regulation
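
The two fixed-length generation strategies contrasted in the BoT-Net abstract above, copy padding and keeping only a starting subregion, can be sketched in a few lines. The 50-residue window comes from the abstract; the function names and the use of "N" as a filler for sequences shorter than the window are assumptions made for this example, not details taken from the paper.

```python
def copy_pad(seq, length):
    """Repeat the sequence until it reaches `length`, then cut to size."""
    if not seq:
        raise ValueError("cannot pad an empty sequence")
    repeats = -(-length // len(seq))  # ceiling division
    return (seq * repeats)[:length]


def start_region(seq, length=50, pad_char="N"):
    """Keep only the first `length` residues.

    Sequences shorter than `length` are right-padded with `pad_char`
    (a hypothetical filler symbol chosen for this sketch).
    """
    return seq[:length].ljust(length, pad_char)


short = "ACGU"
padded = copy_pad(short, 10)             # repeats the short sequence
window = start_region("ACGU" * 20)       # first 50 of 80 residues
filled = start_region("ACG", length=5)   # shorter than the window
```

Copy padding fills a long target length with repeats of the same short sequence, which illustrates how many padded inputs can come to look alike; the subregion approach avoids this by discarding everything after the window.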
15.
Health Inf Sci Syst ; 10(1): 21, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36039095

ABSTRACT

Purpose: Diabetic foot is a common complication associated with diabetes mellitus (DM) leading to ulcerations in the feet. Due to diabetic neuropathy, most patients have reduced sensitivity to pain. As a result, minor injuries go unnoticed and progress into ulcers. The timely detection of potential ulceration points and intervention is crucial in preventing amputation. Changes in plantar temperature are one of the early signs of ulceration. Previous studies have focused on either binary classification or grading of DM severity, but neglect the holistic consideration of the problem. Moreover, multi-class studies exhibit severe performance variations between different classes. Methods: We propose a new convolutional neural network for discrimination between non-DM and five DM severity grades from plantar thermal images and compare its performance against pre-trained networks such as AlexNet and related works. We address the lack of data and imbalanced class distribution, prevalent in prior work, achieving well-balanced classification performance. Results: Our proposed model achieved the best performance with a mean accuracy of 0.9827, mean sensitivity of 0.9684 and mean specificity of 0.9892 in combined diabetic foot detection and grading. Conclusion: To the best of our knowledge, this study sets a new state-of-the-art in plantar foot thermogram detection and grading, while being the first to implement a holistic multi-class classification and grading solution. Reliable automatic thermogram grading is a first step towards the development of smart health devices for DM patients.

16.
Comput Struct Biotechnol J ; 20: 3986-4002, 2022.
Article in English | MEDLINE | ID: mdl-35983235

ABSTRACT

Subcellular localization of Ribonucleic Acid (RNA) molecules provide significant insights into the functionality of RNAs and helps to explore their association with various diseases. Predominantly developed single-compartment localization predictors (SCLPs) lack to demystify RNA association with diverse biochemical and pathological processes mainly happen through RNA co-localization in multiple compartments. Limited multi-compartment localization predictors (MCLPs) manage to produce decent performance only for target RNA class of particular sub-type. Further, existing computational approaches have limited practical significance and potential to optimize therapeutics due to the poor degree of model explainability. The paper in hand presents an explainable Long Short-Term Memory (LSTM) network "EL-RMLocNet", predictive performance and interpretability of which are optimized using a novel GeneticSeq2Vec statistical representation learning scheme and attention mechanism for accurate multi-compartment localization prediction of different RNAs solely using raw RNA sequences. GeneticSeq2Vec generates optimized statistical vectors of raw RNA sequences by capturing short and long range relations of nucleotide k-mers. Using sequence vectors generated by GeneticSeq2Vec scheme, Long Short Term Memory layers extract most informative features, weighting of which on the basis of discriminative potential for accurate multi-compartment localization prediction is performed using attention layer. Through reverse engineering, weights of statistical feature space are mapped to nucleotide k-mers patterns to make multi-compartment localization prediction decision making transparent and explainable for different RNA classes and species. Empirical evaluation indicates that EL-RMLocNet outperforms state-of-the-art predictor for subcellular localization prediction of 4 different RNA classes by an average accuracy figure of 8% for Homo Sapiens species and 6% for Mus Musculus species. 
EL-RMLocNet is freely available as a web server at (https://sds_genetic_analysis.opendfki.de/subcellular_loc/).
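
To make the notion of nucleotide k-mers concrete, the following minimal Python sketch splits an RNA sequence into overlapping k-mers and counts them. It is only an illustration: the function names `kmers` and `kmer_counts`, the stride of 1, and the default k = 3 are assumptions, and this is not the GeneticSeq2Vec scheme itself, whose details the abstract does not give.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Return the overlapping k-mers of a sequence (stride 1)."""
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k=3):
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmers(sequence, k))


# Hypothetical toy sequence, not taken from any benchmark dataset.
example = "AUGGCAUG"
print(kmers(example))
print(kmer_counts(example))
```

Counts like these are a common starting point for sequence representations; learned schemes typically go further and model how k-mers relate to one another across the sequence.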

17.
PLoS One ; 17(7): e0270275, 2022.
Article in English | MEDLINE | ID: mdl-35789333

ABSTRACT

Viral-host protein-protein interaction (PPI) analysis is essential for decoding the molecular mechanisms of viral pathogens and host immunity processes, which eventually helps to control viral diseases and optimize therapeutics. The state-of-the-art viral-host PPI predictor uses an unsupervised embedding learning technique (doc2vec) to generate statistical representations of viral-host protein sequences and a Random Forest classifier for interaction prediction. However, the doc2vec approach generates these representations by modelling only the local context of residues, which captures residue semantics only partially. This paper proposes a novel technique for generating better statistical representations of viral and host protein sequences by combining comprehensive local and global contextual information of the residues. Local residue context-aware encoding captures semantic relatedness and short-range dependencies of residues, while global residue context-aware encoding captures comprehensive long-range residue dependencies, positional invariance of residues, and the distribution of unique residue combinations important for interaction prediction. Using the concatenated statistical representations of viral and host protein sequences, a robust machine learning framework, "LGCA-VHPPI", is developed, which uses a deep forest model to model the complex non-linearity of viral-host PPI sequences. An in-depth performance comparison of LGCA-VHPPI with existing viral-host PPI predictors based on diverse sequence encoding schemes reveals that LGCA-VHPPI outperforms the state-of-the-art predictor by 6%, 2%, and 2% in terms of Matthews correlation coefficient on 3 different benchmark viral-host PPI prediction datasets.
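
For readers unfamiliar with the Matthews correlation coefficient used as the evaluation measure, here is a minimal Python sketch of its standard definition for binary classification. The function name `mcc_from_counts` and the confusion-matrix counts in the example are hypothetical and are not results from the paper.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for an interaction / non-interaction classifier.
print(mcc_from_counts(tp=40, tn=45, fp=5, fn=10))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it remains informative when the interacting and non-interacting classes are imbalanced.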


Subjects
Awareness , Benchmarking , Amino Acid Sequence , Hand , Machine Learning
18.
Int J Mol Sci ; 23(15)2022 Jul 26.
Article in English | MEDLINE | ID: mdl-35897818

ABSTRACT

Circular ribonucleic acids (circRNAs) are novel non-coding RNAs that emanate from alternative splicing of precursor mRNA in reversed order across exons. Despite the abundant presence of circRNAs in human genes and their involvement in diverse physiological processes, the functionality of most circRNAs remains a mystery. As with other non-coding RNAs, knowledge of the sub-cellular localization of circRNAs can help to demystify their influence on protein synthesis, degradation, and destination, their association with different diseases, and their potential for drug development. To date, wet-lab experimental approaches have been used to detect the sub-cellular locations of circular RNAs. These approaches help to elucidate the role of circRNAs as protein scaffolds, RNA-binding protein (RBP) sponges, micro-RNA (miRNA) sponges, parental gene expression modifiers, alternative splicing regulators, and transcription regulators. To complement wet-lab experiments, and considering the progress made by machine learning approaches in determining the sub-cellular localization of other non-coding RNAs, this paper develops a computational framework, Circ-LocNet, to precisely detect circRNA sub-cellular localization. Circ-LocNet performs a comprehensive extrinsic evaluation of 7 residue frequency-based, residue order and frequency-based, and physico-chemical property-based sequence descriptors using the five most widely used machine learning classifiers. Further, it explores the performance impact of K-order sequence descriptor fusion, in which it ensembles similar as well as dissimilar genres of statistical representation learning approaches to reap their combined benefits. Considering the diversity of statistical representation learning schemes, it assesses the performance of second-order, third-order, and all higher orders up to seventh-order sequence descriptor fusion.
A comprehensive empirical evaluation of Circ-LocNet on a newly developed benchmark dataset under different settings reveals that, among standalone approaches, residue frequency-based sequence descriptors and tree-based classifiers are more suitable for predicting the sub-cellular localization of circular RNAs. Further, K-order heterogeneous sequence descriptor fusion in combination with tree-based classifiers predicts the sub-cellular localization of circular RNAs most accurately. We anticipate that this study will act as a rich baseline and push the development of robust computational methodologies for accurately determining the sub-cellular localization of novel circRNAs.
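
The idea of K-order descriptor fusion can be sketched in Python as concatenating the feature vectors of every combination of K descriptors. This is a minimal sketch under stated assumptions: the three toy descriptors (`composition`, `gc_content`, `length`) and the helper `fuse` are hypothetical stand-ins, not the seven descriptors or the fusion procedure evaluated in the paper.

```python
from itertools import combinations
from math import comb


def composition(seq):
    """Relative frequency of each nucleotide (hypothetical descriptor)."""
    return [seq.count(n) / len(seq) for n in "ACGU"]


def gc_content(seq):
    """Fraction of G and C nucleotides (hypothetical descriptor)."""
    return [(seq.count("G") + seq.count("C")) / len(seq)]


def length(seq):
    """Sequence length as a one-element vector (hypothetical descriptor)."""
    return [float(len(seq))]


def fuse(descriptors, sequence, order):
    """Yield (names, vector) for every combination of `order` descriptors."""
    for combo in combinations(sorted(descriptors), order):
        vector = []
        for name in combo:
            vector.extend(descriptors[name](sequence))
        yield combo, vector


descriptors = {"composition": composition, "gc_content": gc_content, "length": length}
fused = dict(fuse(descriptors, "ACGU", order=2))
print(fused)

# Number of distinct fusions of seven descriptors over orders 2 to 7.
total_fusions = sum(comb(7, k) for k in range(2, 8))
print(total_fusions)
```

Each fused vector would then be passed to a classifier. The last lines show why exhaustive evaluation is costly: seven descriptors give 120 distinct combinations across orders two to seven.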


Subjects
MicroRNAs , RNA, Circular , Alternative Splicing , Humans , MicroRNAs/genetics , RNA/genetics , RNA/metabolism , RNA, Circular/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
19.
Sensors (Basel) ; 22(11)2022 May 27.
Article in English | MEDLINE | ID: mdl-35684703

ABSTRACT

Deep neural networks are among the most successful classifiers across different domains. However, their use in safety-critical areas is limited by their lack of interpretability. The research field of explainable artificial intelligence addresses this problem, but most interpretability methods are designed for the imaging modality. This paper introduces TimeREISE, a model-agnostic attribution method that shows success in the context of time series classification. The method applies perturbations to the input and considers different attribution map characteristics, such as the granularity and density of an attribution map. The approach demonstrates superior performance compared to existing methods on several well-established measurements. TimeREISE shows impressive results in the deletion and insertion tests, Infidelity, and Sensitivity. Concerning the continuity of an explanation, it shows superior performance while preserving the correctness of the attribution map. Additional sanity checks confirm the correctness of the approach and its dependency on the model parameters. TimeREISE scales well with an increasing number of channels and timesteps. It applies to any time series classification network and does not rely on prior knowledge of the data. TimeREISE is suited to any use case, independent of dataset characteristics such as sequence length, number of channels, and number of classes.
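
The general principle of perturbation-based attribution for time series can be illustrated with a simple occlusion scheme in Python. This is a generic sketch, not TimeREISE itself: the function `occlusion_attribution`, the sliding window, the zero baseline, and `toy_model` are all assumptions made for illustration.

```python
def occlusion_attribution(model, series, window=2, baseline=0.0):
    """Score each timestep by the mean drop in model output over all
    sliding windows that cover it when the window is set to a baseline."""
    reference = model(series)
    scores = [0.0] * len(series)
    counts = [0] * len(series)
    for start in range(len(series) - window + 1):
        perturbed = list(series)
        perturbed[start:start + window] = [baseline] * window
        drop = reference - model(perturbed)
        for t in range(start, start + window):
            scores[t] += drop
            counts[t] += 1
    # Timesteps never covered by a window receive a score of 0.0.
    return [s / c if c else 0.0 for s, c in zip(scores, counts)]


def toy_model(series):
    """Hypothetical classifier score that depends only on timesteps 2 and 3."""
    return series[2] + series[3]


attribution = occlusion_attribution(toy_model, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
print(attribution)
```

The highest scores fall on the two timesteps the toy model actually uses. Because the procedure only queries the model's output, it is model-agnostic in the same sense as the perturbation-based methods described above.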


Subjects
Artificial Intelligence , Neural Networks, Computer , Time Factors
20.
Comput Methods Programs Biomed ; 215: 106620, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35033756

ABSTRACT

BACKGROUND AND OBJECTIVES: One principal impediment in the successful deployment of Artificial Intelligence (AI) based Computer-Aided Diagnosis (CAD) systems in everyday clinical workflows is their lack of transparent decision-making. Although commonly used eXplainable AI (XAI) methods provide insights into these largely opaque algorithms, such explanations are usually convoluted and not readily comprehensible. The explanation of decisions regarding the malignancy of skin lesions from dermoscopic images demands particular clarity, as the underlying medical problem definition is ambiguous in itself. This work presents ExAID (Explainable AI for Dermatology), a novel XAI framework for biomedical image analysis that provides multi-modal concept-based explanations, consisting of easy-to-understand textual explanations and visual maps, to justify the predictions. METHODS: Our framework relies on Concept Activation Vectors to map human-understandable concepts to those learned by an arbitrary Deep Learning (DL) based algorithm, and Concept Localisation Maps to highlight those concepts in the input space. This identification of relevant concepts is then used to construct fine-grained textual explanations supplemented by concept-wise location information to provide comprehensive and coherent multi-modal explanations. All decision-related information is presented in a diagnostic interface for use in clinical routines. Moreover, the framework includes an educational mode providing dataset-level explanation statistics as well as tools for data and model exploration to aid medical research and education processes. RESULTS: Through rigorous quantitative and qualitative evaluation of our framework on a range of publicly available dermoscopic image datasets, we show the utility of multi-modal explanations for CAD-assisted scenarios even in case of wrong disease predictions. 
We demonstrate that concept detectors for the explanation of pre-trained networks reach accuracies of up to 81.46%, which is comparable to supervised networks trained end-to-end. CONCLUSIONS: We present a new end-to-end framework for the multi-modal explanation of DL-based biomedical image analysis in Melanoma classification and evaluate its utility on an array of datasets. Since perspicuous explanation is one of the cornerstones of any CAD system, we believe that ExAID will accelerate the transition from AI research to practice by providing dermatologists and researchers with an effective tool that they can both understand and trust. ExAID can also serve as the basis for similar applications in other biomedical fields.


Subjects
Artificial Intelligence , Melanoma , Algorithms , Computers , Diagnosis, Computer-Assisted , Humans