Search | BVS Regional Portal

1.

CAPTURE: Comprehensive anti-cancer peptide predictor with a unique amino acid sequence encoder.

Ghafoor, Hina; Asim, Muhammad Nabeel; Ibrahim, Muhammad Ali; Ahmed, Sheraz; Dengel, Andreas.

Comput Biol Med ; 176: 108538, 2024 May 03.

Article in English | MEDLINE | ID: mdl-38759585

ABSTRACT

Key properties of anticancer peptides (ACPs), including bioactivity, high efficacy, low toxicity, and lack of drug resistance, make them ideal candidates for cancer therapies. To explore the potential of ACPs and accelerate the development of cancer therapies, 53 artificial intelligence-supported computational predictors have been developed for ACP versus non-ACP classification, but only one predictor has been developed for annotating ACP functional types. These predictors extract amino acid distribution patterns to transform peptide sequences into statistical vectors, which are then fed to classifiers that discriminate peptide sequences and annotate their functional classes. Overall, these predictors fail to extract diverse types of amino acid distribution patterns from peptide sequences. This paper presents a unique CARE encoder that transforms peptide sequences into statistical vectors by extracting 4 different types of distribution patterns: correlation, distribution, composition, and transition. On public benchmark data, the potential of the proposed encoder is explored under two evaluation settings, intrinsic and extrinsic. Extrinsic evaluation indicates that 12 different machine learning classifiers achieve superior performance with the proposed encoder compared to 55 existing encoders. Intrinsic evaluation reveals that, unlike existing encoders, the proposed encoder generates more discriminative clusters for the ACP and non-ACP classes. Across 8 public benchmark ACP and non-ACP classification datasets, the CAPTURE predictor, built on the proposed encoder and an AdaBoost classifier, outperforms existing predictors by an average of 1%, 4%, and 2% in accuracy, recall, and MCC, respectively. In a generalizability case study across 7 benchmark antimicrobial peptide classification datasets, CAPTURE surpasses existing predictors by an average AU-ROC of 2%. Combined with the label powerset method, the CAPTURE predictive pipeline outperforms the state-of-the-art ACP functional type predictor by 5%, 5%, 5%, 6%, and 3% in terms of average accuracy, subset accuracy, precision, recall, and F1, respectively. The CAPTURE web application is available at https://sds_genetic_analysis.opendfki.de/CAPTURE.
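To make two of the four pattern types named in the abstract more concrete, the following is a minimal sketch of composition and transition features for a peptide. It is not the CARE encoder: the three-way residue grouping and the function names are assumptions chosen for illustration, and the correlation and distribution patterns are left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-way grouping of residues (illustrative only, not taken from the paper).
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}
GROUP_PAIRS = [("polar", "neutral"), ("polar", "hydrophobic"), ("neutral", "hydrophobic")]


def composition(peptide):
    """Fraction of each of the 20 standard amino acids in the peptide."""
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def group_of(residue):
    """Return the name of the group that contains the residue."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def transition(peptide):
    """Fraction of adjacent residue pairs that switch between each pair of groups."""
    labels = [group_of(residue) for residue in peptide]
    steps = list(zip(labels, labels[1:]))
    return [
        sum(1 for x, y in steps if {x, y} == {a, b}) / len(steps)
        for a, b in GROUP_PAIRS
    ]


def encode(peptide):
    """Concatenate composition (20 values) and transition (3 values) features."""
    return composition(peptide) + transition(peptide)
```

A vector produced by `encode` could then be passed to any classifier; the abstract names AdaBoost for the CAPTURE predictor, but the classifier stage is not sketched here.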

2.

Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns.

Abbasi, Ahtisham Fazeel; Asim, Muhammad Nabeel; Ahmed, Sheraz; Dengel, Andreas.

Sci Rep ; 14(1): 9466, 2024 04 24.

Article in English | MEDLINE | ID: mdl-38658614

ABSTRACT

Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. Identifying leccDNA is important for investigating its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. Understanding these associations can in turn provide valuable insights into disease mechanisms and potential therapeutic approaches. Conventionally, leccDNA is identified with wet lab-based methods, which are hindered by the need for prior knowledge and by resource-intensive processes, potentially limiting their broader applicability. To support leccDNA identification across multiple species, this paper presents the very first computational predictor. The proposed iLEC-DNA predictor uses an SVM classifier along with sequence-derived nucleotide distribution patterns and physicochemical property-based features. In addition, the study introduces a set of 12 benchmark leccDNA datasets for three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). Large-scale experiments are performed across the 12 benchmark datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms the baseline predictors and encoder ensembles across diverse leccDNA datasets, achieving average values of 81.09%, 62.2%, and 81.08% in terms of ACC, MCC, and AUC-ROC across all datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.

Subjects

Circular DNA , Saccharomyces cerevisiae , Circular DNA/genetics , Humans , Saccharomyces cerevisiae/genetics , Arabidopsis/genetics , Computational Biology/methods , Nucleotides/genetics , Support Vector Machine

3.

NTpred: a robust and precise machine learning framework for in silico identification of Tyrosine nitration sites in protein sequences.

Datta, Sourajyoti; Nabeel Asim, Muhammad; Dengel, Andreas; Ahmed, Sheraz.

Brief Funct Genomics ; 23(2): 163-179, 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-37248673

ABSTRACT

Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes or degrade it, which leads to the failure of intracellular processes. Tyrosine nitration (NT) degrades protein activity, which initiates and propagates various diseases, including neurodegenerative, cardiovascular, and autoimmune diseases and carcinogenesis. Identifying NT modifications supports the development of novel therapies and drug discovery for the associated diseases. Identifying NT modifications in biochemical labs is expensive, time-consuming, and error-prone. To supplement this process, several computational approaches have been proposed. However, these approaches fail to precisely identify NT modifications because they extract irrelevant, redundant, and less discriminative features from protein sequences. This paper presents the NTpred framework, which extracts comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of the different encoders, it generates four additional feature spaces by fusing different combinations of the individual encodings. Furthermore, it removes irrelevant and redundant features from the eight feature spaces through a Recursive Feature Elimination process. The selected features of the four individual encodings and the four feature fusion vectors are used to train eight different Gradient Boosted Tree classifiers. The probability scores from the trained classifiers are used to generate a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross validation and independent test evaluation, with a combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor with a combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.

Subjects

Computational Biology , Proteins , Proteins/metabolism , Amino Acid Sequence , Machine Learning , Tyrosine

4.

Translating theory into practice: assessing the privacy implications of concept-based explanations for biomedical AI.

Lucieri, Adriano; Dengel, Andreas; Ahmed, Sheraz.

Front Bioinform ; 3: 1194993, 2023.

Article in English | MEDLINE | ID: mdl-37484865

ABSTRACT

Artificial Intelligence (AI) has achieved remarkable success in image generation, image analysis, and language modeling, making data-driven techniques increasingly relevant in practical real-world applications, promising enhanced creativity and efficiency for human users. However, the deployment of AI in high-stakes domains such as infrastructure and healthcare still raises concerns regarding algorithm accountability and safety. The emerging field of explainable AI (XAI) has made significant strides in developing interfaces that enable humans to comprehend the decisions made by data-driven models. Among these approaches, concept-based explainability stands out due to its ability to align explanations with high-level concepts familiar to users. Nonetheless, early research in adversarial machine learning has unveiled that exposing model explanations can render victim models more susceptible to attacks. This is the first study to investigate and compare the impact of concept-based explanations on the privacy of Deep Learning based AI models in the context of biomedical image analysis. An extensive privacy benchmark is conducted on three different state-of-the-art model architectures (ResNet50, NFNet, ConvNeXt) trained on two biomedical (ISIC and EyePACS) and one synthetic dataset (SCDB). The success of membership inference attacks while exposing varying degrees of attribution-based and concept-based explanations is systematically compared. The findings indicate that, in theory, concept-based explanations can potentially increase the vulnerability of a private AI system by up to 16% compared to attributions in the baseline setting. However, it is demonstrated that, in more realistic attack scenarios, the threat posed by explanations is negligible in practice. Furthermore, actionable recommendations are provided to ensure the safe deployment of concept-based XAI systems. 
In addition, the impact of differential privacy (DP) on the quality of concept-based explanations is explored, revealing that while negatively influencing the explanation ability, DP can have an adverse effect on the models' privacy.

5.

Hitchhiker's Guide to Super-Resolution: Introduction and Recent Advances.

Moser, Brian B; Raue, Federico; Frolov, Stanislav; Palacio, Sebastian; Hees, Jorn; Dengel, Andreas.

IEEE Trans Pattern Anal Mach Intell ; 45(8): 9862-9882, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37022895

ABSTRACT

With the advent of Deep Learning (DL), Super-Resolution (SR) has also become a thriving research area. However, despite promising results, the field still faces challenges that require further research, e.g., allowing flexible upsampling, more effective loss functions, and better evaluation metrics. We review the domain of SR in light of recent advances and examine state-of-the-art models such as diffusion (DDPM) and transformer-based SR models. We critically discuss contemporary strategies used in SR and identify promising yet unexplored research directions. We complement previous surveys by incorporating the latest developments in the field, such as uncertainty-driven losses, wavelet networks, neural architecture search, novel normalization methods, and the latest evaluation techniques. We also include several visualizations for the models and methods throughout each chapter to facilitate a global understanding of the trends in the field. This review ultimately aims at helping researchers to push the boundaries of DL applied to SR.

6.

DNA-MP: a generalized DNA modifications predictor for multiple species based on powerful sequence encoding method.

Nabeel Asim, Muhammad; Ali Ibrahim, Muhammad; Fazeel, Ahtisham; Dengel, Andreas; Ahmed, Sheraz.

Brief Bioinform ; 24(1), 2023 01 19.

Article in English | MEDLINE | ID: mdl-36528802

ABSTRACT

Accurate prediction of deoxyribonucleic acid (DNA) modifications is essential to explore and discern the processes of cell differentiation, gene expression, and epigenetic regulation. Several computational approaches have been proposed for type-specific DNA modification prediction. Two recent generalized computational predictors can detect three different types of DNA modifications; however, both type-specific and generalized modification predictors show limited performance across multiple species, mainly because they use ineffective sequence encoding methods. This paper presents a generalized computational approach, "DNA-MP", that more precisely predicts three different DNA modifications across multiple species. The proposed DNA-MP approach uses a powerful encoding method, "position specific nucleotides occurrence based on modification and non-modification class densities normalized difference" (POCD-ND), to generate statistical representations of DNA sequences, and a deep forest classifier for modification prediction. The POCD-ND encoder generates statistical representations by extracting position-specific distributional information of nucleotides in the DNA sequences. We perform a comprehensive intrinsic and extrinsic evaluation of the proposed encoder and compare its performance with the 32 most widely used encoding methods on 17 benchmark DNA modification prediction datasets of 12 different species using 10 different machine learning classifiers. Overall, with all classifiers, the proposed POCD-ND encoder outperforms the 32 existing encoders. Furthermore, over the combined 5-fold cross validation benchmark datasets and independent test sets, the proposed DNA-MP predictor outperforms state-of-the-art type-specific and generalized modification predictors by an average accuracy of 7% across 4mC datasets, 1.35% across 5hmC datasets, and 10% across 6mA datasets. To facilitate the scientific community, the DNA-MP web application is available at https://sds_genetic_analysis.opendfki.de/DNA_Modifications/.
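The abstract describes POCD-ND only by its name, so the following is a rough sketch of one plausible reading, not the published encoder. It assumes equal-length sequences, takes "density" to mean the per-position relative frequency of a nucleotide within a class, and assumes the "normalized difference" is (positive - negative) / (positive + negative). All function names are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def position_densities(sequences):
    """Per-position relative frequency of each nucleotide in equal-length sequences."""
    length = len(sequences[0])
    densities = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        densities.append({nt: counts[nt] / len(sequences) for nt in NUCLEOTIDES})
    return densities


def fit_scores(positive, negative):
    """Assumed normalized difference of class densities for every position and nucleotide."""
    scores = []
    for pos_d, neg_d in zip(position_densities(positive), position_densities(negative)):
        table = {}
        for nt in NUCLEOTIDES:
            total = pos_d[nt] + neg_d[nt]
            # A nucleotide never seen at this position in either class gets a neutral score.
            table[nt] = (pos_d[nt] - neg_d[nt]) / total if total else 0.0
        scores.append(table)
    return scores


def encode(sequence, scores):
    """Replace each nucleotide with the score of that nucleotide at its position."""
    return [scores[pos][nt] for pos, nt in enumerate(sequence)]
```

Under these assumptions, a score near 1 marks a nucleotide seen at that position mostly in modified sequences, and a score near -1 marks one seen mostly in unmodified sequences. In any real evaluation the scores would have to be fitted on training data only.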

Subjects

Genetic Epigenesis , Machine Learning , Software , Nucleotides , DNA/genetics

7.

MP-VHPPI: Meta predictor for viral host protein-protein interaction prediction in multiple hosts and viruses.

Asim, Muhammad Nabeel; Fazeel, Ahtisham; Ibrahim, Muhammad Ali; Dengel, Andreas; Ahmed, Sheraz.

Front Med (Lausanne) ; 9: 1025887, 2022.

Article in English | MEDLINE | ID: mdl-36465911

ABSTRACT

Viral-host protein-protein interaction (VHPPI) prediction is essential for decoding the molecular mechanisms of viral pathogens and host immunity processes, which eventually helps to control the propagation of viral diseases and to design optimized therapeutics. Multiple AI-based predictors have been developed to predict diverse VHPPIs across a wide range of viruses and hosts; however, these predictors perform well only for specific types of hosts and viruses. The prime objective of this research is to develop a robust meta predictor (MP-VHPPI) capable of more accurately predicting VHPPIs across multiple hosts and viruses. The proposed meta predictor uses two well-known encoding methods, Amphiphilic Pseudo-Amino Acid Composition (APAAC) and Quasi-sequence (QS) Order, which capture amino acid sequence order and distributional information, to generate numerical representations of complete raw viral-host protein sequences. A feature agglomeration method transforms the original feature space into a more informative feature space. Random forest (RF) and Extra tree (ET) classifiers are trained on the optimized feature spaces of the separate APAAC and QS order encodings and of their combination. The predictions of both classifiers are then fed to a Support Vector Machine (SVM) classifier that makes the final predictions. The proposed meta predictor is evaluated on 7 different benchmark datasets, where it outperforms existing VHPPI predictors by an average of 3.07%, 6.07%, 2.95%, and 2.85% in terms of accuracy, Matthews correlation coefficient, precision, and sensitivity, respectively. To facilitate the scientific community, the MP-VHPPI web server is available at https://sds_genetic_analysis.opendfki.de/MP-VHPPI/.

8.

ADH-PPI: An attention-based deep hybrid model for protein-protein interaction prediction.

Asim, Muhammad Nabeel; Ibrahim, Muhammad Ali; Malik, Muhammad Imran; Dengel, Andreas; Ahmed, Sheraz.

iScience ; 25(10): 105169, 2022 Oct 21.

Article in English | MEDLINE | ID: mdl-36267921

ABSTRACT

Protein-protein interaction (PPI) prediction is essential to understand the functions of proteins in various biological processes and their roles in the development, progression, and treatment of different diseases. To perform economical large-scale PPI analysis, several artificial intelligence-based approaches have been proposed. However, these approaches have limited predictive performance because they rely on ineffective statistical representation learning methods and on predictors that cannot extract comprehensive discriminative features. This paper generates statistical representations of protein sequences by applying transfer learning in an unsupervised manner using the FastText embedding generation approach. Furthermore, it presents the "ADH-PPI" classifier, which reaps the benefits of three different neural layers: long short-term memory, convolutional, and self-attention layers. On benchmark datasets of two different species, the proposed ADH-PPI predictor outperforms existing approaches by an overall accuracy of 4% and a Matthews correlation coefficient of 6%. In addition, it achieves an overall accuracy increment of 7% on four independent test sets. Availability: The ADH-PPI web server is publicly available at https://sds_genetic_analysis.opendfki.de/PPI/.

9.

Holistic multi-class classification & grading of diabetic foot ulcerations from plantar thermal images using deep learning.

Muralidhara, Shishir; Lucieri, Adriano; Dengel, Andreas; Ahmed, Sheraz.

Health Inf Sci Syst ; 10(1): 21, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36039095

ABSTRACT

Purpose: Diabetic foot is a common complication associated with diabetes mellitus (DM) leading to ulcerations in the feet. Due to diabetic neuropathy, most patients have reduced sensitivity to pain. As a result, minor injuries go unnoticed and progress into ulcers. The timely detection of potential ulceration points and intervention is crucial in preventing amputation. Changes in plantar temperature are one of the early signs of ulceration. Previous studies have focused on either binary classification or grading of DM severity, but neglect the holistic consideration of the problem. Moreover, multi-class studies exhibit severe performance variations between different classes. Methods: We propose a new convolutional neural network for discrimination between non-DM and five DM severity grades from plantar thermal images and compare its performance against pre-trained networks such as AlexNet and related works. We address the lack of data and imbalanced class distribution, prevalent in prior work, achieving well-balanced classification performance. Results: Our proposed model achieved the best performance with a mean accuracy of 0.9827, mean sensitivity of 0.9684 and mean specificity of 0.9892 in combined diabetic foot detection and grading. Conclusion: To the best of our knowledge, this study sets a new state-of-the-art in plantar foot thermogram detection and grading, while being the first to implement a holistic multi-class classification and grading solution. Reliable automatic thermogram grading is a first step towards the development of smart health devices for DM patients.

10.

BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA-miRNA interaction prediction.

Asim, Muhammad Nabeel; Ibrahim, Muhammad Ali; Zehe, Christoph; Trygg, Johan; Dengel, Andreas; Ahmed, Sheraz.

Interdiscip Sci ; 14(4): 841-862, 2022 Dec.

Article in English | MEDLINE | ID: mdl-35947255

ABSTRACT

BACKGROUND AND OBJECTIVE: Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation and in cellular metabolic and pathological processes. Existing purely sequence-based computational approaches lack robustness and efficiency, mainly due to the high length variability of lncRNA sequences. Hence, the prime focus of the current study is to find optimal length trade-offs for lncRNA sequences of highly variable length. METHOD: The paper performs an in-depth exploration of diverse copy padding and sequence truncation approaches and presents the novel idea of using only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach, "BoT-Net", which leverages a single-layer long short-term memory network regularized through DropConnect to capture higher-order residue dependencies, pooling to retain the most salient features, normalization to prevent exploding and vanishing gradients, learning rate decay, and dropout to regularize a precise neural network for lncRNA-miRNA interaction prediction. RESULTS: BoT-Net outperforms the state-of-the-art lncRNA-miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and Matthews correlation coefficient. Furthermore, a case study analysis indicates that BoT-Net also outperforms the state-of-the-art lncRNA-protein interaction predictor on a benchmark dataset by 10% in accuracy, 19% in sensitivity, 6% in specificity, 14% in precision, and 26% in Matthews correlation coefficient. CONCLUSION: In the benchmark lncRNA-miRNA interaction prediction dataset, the length of lncRNA sequences varies from 213 to 22,743 residues, and in the benchmark lncRNA-protein interaction prediction dataset, it varies from 15 to 1504 residues. For sequences of such highly variable length, fixed-length generation using copy padding introduces a significant level of bias, which makes a large number of lncRNA sequences nearly identical to each other and eventually derails classifier generalizability. Empirical evaluation reveals that the first 50 residues of long lncRNA sequences alone contain a highly informative distribution for lncRNA-miRNA interaction prediction, a crucial finding that the proposed BoT-Net approach exploits to optimize the lncRNA fixed-length generation process. AVAILABILITY: The BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/.
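The two fixed-length strategies contrasted in the abstract can be sketched as follows. This is a minimal illustration with hypothetical function names, not the BoT-Net preprocessing code. It assumes that "copy padding" means repeating a sequence's own residues until the target length is reached; the fallback for sequences shorter than the subregion is also an assumption.

```python
def copy_pad(sequence, target_length):
    """Repeat the sequence's own residues until target_length is reached (assumed meaning of copy padding)."""
    if not sequence:
        raise ValueError("cannot pad an empty sequence")
    repeats = -(-target_length // len(sequence))  # ceiling division
    return (sequence * repeats)[:target_length]


def start_subregion(sequence, length=50):
    """Keep only the first `length` residues; shorter inputs fall back to copy padding."""
    head = sequence[:length]
    if len(head) < length:
        return copy_pad(head, length)
    return head
```

With the 50-residue start region reported in the abstract, a 22,743-residue lncRNA and a 213-residue lncRNA both reduce to 50 residues of original sequence, so no padded copies are introduced for either.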

Subjects

MicroRNAs , Long Noncoding RNA , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , MicroRNAs/genetics , MicroRNAs/metabolism , Computational Biology , Computer Neural Networks , Gene Expression Regulation

11.

EL-RMLocNet: An explainable LSTM network for RNA-associated multi-compartment localization prediction.

Asim, Muhammad Nabeel; Ibrahim, Muhammad Ali; Malik, Muhammad Imran; Zehe, Christoph; Cloarec, Olivier; Trygg, Johan; Dengel, Andreas; Ahmed, Sheraz.

Comput Struct Biotechnol J ; 20: 3986-4002, 2022.

Article in English | MEDLINE | ID: mdl-35983235

ABSTRACT

Subcellular localization of ribonucleic acid (RNA) molecules provides significant insights into the functionality of RNAs and helps to explore their association with various diseases. The predominantly developed single-compartment localization predictors (SCLPs) cannot demystify RNA associations with diverse biochemical and pathological processes, which mainly arise through RNA co-localization in multiple compartments. The few available multi-compartment localization predictors (MCLPs) produce decent performance only for a target RNA class of a particular sub-type. Further, existing computational approaches have limited practical significance and limited potential to optimize therapeutics because of their poor model explainability. This paper presents an explainable Long Short-Term Memory (LSTM) network, "EL-RMLocNet", whose predictive performance and interpretability are optimized using a novel GeneticSeq2Vec statistical representation learning scheme and an attention mechanism for accurate multi-compartment localization prediction of different RNAs solely from raw RNA sequences. GeneticSeq2Vec generates optimized statistical vectors of raw RNA sequences by capturing short- and long-range relations of nucleotide k-mers. From the sequence vectors generated by the GeneticSeq2Vec scheme, LSTM layers extract the most informative features, which an attention layer then weights according to their discriminative potential for accurate multi-compartment localization prediction. Through reverse engineering, the weights of the statistical feature space are mapped to nucleotide k-mer patterns to make multi-compartment localization prediction decisions transparent and explainable for different RNA classes and species. Empirical evaluation indicates that EL-RMLocNet outperforms the state-of-the-art predictor for subcellular localization prediction of 4 different RNA classes by an average accuracy of 8% for Homo sapiens and 6% for Mus musculus. EL-RMLocNet is freely available as a web server at https://sds_genetic_analysis.opendfki.de/subcellular_loc/.
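Since the abstract builds its representation on nucleotide k-mers, a minimal sketch of splitting a raw RNA sequence into overlapping k-mers may help. The function name and the default values of `k` and `stride` are assumptions, and the GeneticSeq2Vec embedding step itself is not reproduced.

```python
def kmers(sequence, k=3, stride=1):
    """Split a sequence into k-mers of length k, moving `stride` residues each step."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def kmer_vocabulary(sequences, k=3):
    """Map every distinct k-mer found in the sequences to an integer index."""
    vocab = {}
    for seq in sequences:
        for token in kmers(seq, k):
            vocab.setdefault(token, len(vocab))
    return vocab
```

The resulting token lists play the role that words play in text embedding methods: each k-mer receives a vector, and a sequence becomes an ordered list of such vectors that a recurrent layer can consume.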

12.

Circ-LocNet: A Computational Framework for Circular RNA Sub-Cellular Localization Prediction.

Asim, Muhammad Nabeel; Ibrahim, Muhammad Ali; Imran Malik, Muhammad; Dengel, Andreas; Ahmed, Sheraz.

Int J Mol Sci ; 23(15), 2022 Jul 26.

Article in English | MEDLINE | ID: mdl-35897818

ABSTRACT

Circular ribonucleic acids (circRNAs) are novel non-coding RNAs that emanate from alternative splicing of precursor mRNA in reversed order across exons. Despite the abundant presence of circRNAs in human genes and their involvement in diverse physiological processes, the functionality of most circRNAs remains a mystery. As with other non-coding RNAs, knowledge of the sub-cellular localization of circRNAs can help demystify their influence on protein synthesis, degradation, and destination, their association with different diseases, and their potential for drug development. To date, wet experimental approaches are used to detect the sub-cellular locations of circular RNAs. These approaches help to elucidate the role of circRNAs as protein scaffolds, RNA-binding protein (RBP) sponges, micro-RNA (miRNA) sponges, parental gene expression modifiers, alternative splicing regulators, and transcription regulators. To complement wet-lab experiments, and considering the progress made by machine learning approaches in determining the sub-cellular localization of other non-coding RNAs, this paper develops a computational framework, Circ-LocNet, to precisely detect circRNA sub-cellular localization. Circ-LocNet performs a comprehensive extrinsic evaluation of 7 residue frequency-based, residue order and frequency-based, and physicochemical property-based sequence descriptors using the five most widely used machine learning classifiers. Further, it explores the performance impact of K-order sequence descriptor fusion, in which it ensembles similar as well as dissimilar genres of statistical representation learning approaches to reap their combined benefits. Considering the diversity of statistical representation learning schemes, it assesses the performance of sequence descriptor fusion from second order all the way up to seventh order. A comprehensive empirical evaluation of Circ-LocNet on a newly developed benchmark dataset under different settings reveals that standalone residue frequency-based sequence descriptors and tree-based classifiers are more suitable for predicting the sub-cellular localization of circular RNAs. Further, K-order heterogeneous sequence descriptor fusion in combination with tree-based classifiers most accurately predicts the sub-cellular localization of circular RNAs. We anticipate that this study will act as a rich baseline and push the development of robust computational methodologies for the accurate sub-cellular localization determination of novel circRNAs.
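To give a sense of the search space that K-order descriptor fusion implies, the sketch below enumerates every fusion set from second to seventh order over 7 descriptors. The descriptor names are placeholders, because the abstract does not list them, and treating fusion as plain vector concatenation is an assumption.

```python
from itertools import combinations

# Placeholder names; the abstract does not name the 7 descriptors.
DESCRIPTORS = [f"descriptor_{i}" for i in range(1, 8)]


def fusion_sets(descriptors, min_order=2, max_order=None):
    """Group all descriptor combinations by fusion order (number of descriptors fused)."""
    if max_order is None:
        max_order = len(descriptors)
    return {
        order: list(combinations(descriptors, order))
        for order in range(min_order, max_order + 1)
    }


def fuse(vectors):
    """Concatenate per-descriptor feature vectors into one fused vector (assumed fusion rule)."""
    fused = []
    for vector in vectors:
        fused.extend(vector)
    return fused
```

For 7 descriptors this gives 21, 35, 35, 21, 7, and 1 combinations for orders two to seven, or 120 fused feature spaces in total, each of which would be evaluated with every classifier under consideration.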

Subjects

MicroRNAs , Circular RNA , Alternative Splicing , Humans , MicroRNAs/genetics , RNA/genetics , RNA/metabolism , Circular RNA/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism

13.

LGCA-VHPPI: A local-global residue context aware viral-host protein-protein interaction predictor.

Asim, Muhammad Nabeel; Ibrahim, Muhammad Ali; Malik, Muhammad Imran; Dengel, Andreas; Ahmed, Sheraz.

PLoS One ; 17(7): e0270275, 2022.

Article in English | MEDLINE | ID: mdl-35789333

ABSTRACT

Viral-host protein-protein interaction (PPI) analysis is essential to decode the molecular mechanisms of viral pathogens and host immunity processes, which eventually helps to control viral diseases and optimize therapeutics. The state-of-the-art viral-host PPI predictor leverages an unsupervised embedding learning technique (doc2vec) to generate statistical representations of viral-host protein sequences and a Random Forest classifier for interaction prediction. However, the doc2vec approach generates these representations by modelling only the local context of residues, which only partially captures residue semantics. This paper proposes a novel technique for generating better statistical representations of viral and host protein sequences based on the infusion of comprehensive local and global contextual information of the residues. Local residue context-aware encoding captures the semantic relatedness and short-range dependencies of residues, while global residue context-aware encoding captures comprehensive long-range residue dependencies, positional invariance of residues, and the unique residue combination distribution important for interaction prediction. Using concatenated rich statistical representations of viral and host protein sequences, a robust machine learning framework, "LGCA-VHPPI", is developed, which uses a deep forest model to effectively model the complex non-linearity of viral-host PPI sequences. An in-depth performance comparison of the proposed LGCA-VHPPI framework with existing viral-host PPI predictors based on diverse sequence encoding schemes reveals that LGCA-VHPPI outperforms the state-of-the-art predictor by 6%, 2%, and 2% in terms of Matthews correlation coefficient on 3 different benchmark viral-host PPI prediction datasets.

Subjects

Awareness , Benchmarking , Amino Acid Sequence , Hand , Machine Learning

14.

TimeREISE: Time Series Randomized Evolving Input Sample Explanation.

Mercier, Dominique; Dengel, Andreas; Ahmed, Sheraz.

Sensors (Basel) ; 22(11), 2022 May 27.

Article in English | MEDLINE | ID: mdl-35684703

ABSTRACT

Deep neural networks are among the most successful classifiers across different domains. However, their use is limited in safety-critical areas because of their limited interpretability. The research field of explainable artificial intelligence addresses this problem. However, most interpretability methods are designed for the imaging modality. The paper introduces TimeREISE, a model-agnostic attribution method that shows success in the context of time series classification. The method applies perturbations to the input and considers different attribution map characteristics, such as the granularity and density of an attribution map. The approach demonstrates superior performance compared to existing methods on different well-established measurements. TimeREISE shows impressive results in the deletion and insertion test, Infidelity, and Sensitivity. Concerning the continuity of an explanation, it shows superior performance while preserving the correctness of the attribution map. Additional sanity checks prove the correctness of the approach and its dependency on the model parameters. TimeREISE scales well with an increasing number of channels and timesteps. TimeREISE applies to any time series classification network and does not rely on prior data knowledge. TimeREISE is suited for any use case, independent of dataset characteristics such as sequence length, channel number, and number of classes.
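The general idea of perturbation-based attribution can be illustrated with a generic random-masking sketch for a single-channel series. This is not the TimeREISE algorithm: the granularity and density handling described in the abstract is omitted, and the function name, parameters, and toy model are all hypothetical.

```python
import random


def mask_attribution(series, model, n_masks=500, keep_prob=0.5, seed=0):
    """Score each timestep by the mean model output over random masks that keep it."""
    rng = random.Random(seed)
    totals = [0.0] * len(series)
    counts = [0] * len(series)
    for _ in range(n_masks):
        # Binary mask: 1 keeps the timestep, 0 replaces it with zero.
        mask = [1 if rng.random() < keep_prob else 0 for _ in series]
        score = model([x * m for x, m in zip(series, mask)])
        for i, kept in enumerate(mask):
            if kept:
                totals[i] += score
                counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def toy_model(series):
    """Toy stand-in for a classifier score: it depends only on timestep 2."""
    return series[2]
```

Calling `mask_attribution([1.0, 1.0, 5.0, 1.0], toy_model)` assigns the highest attribution to timestep 2, the only input the toy model reads. The approach is model-agnostic because it needs only the model's output for perturbed inputs.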

Subjects

Artificial Intelligence , Computer Neural Networks , Time Factors

15.

Bacterial prediction using internet of things (IoT) and machine learning.

Khurshid, Hamza; Mumtaz, Rafia; Alvi, Noor; Haque, Ayesha; Mumtaz, Sadaf; Shafait, Faisal; Ahmed, Sheraz; Malik, Muhammad Imran; Dengel, Andreas.

Environ Monit Assess ; 194(2): 133, 2022 Jan 28.

Article in English | MEDLINE | ID: mdl-35089424

ABSTRACT

Water is a basic and primary resource required for the sustenance of life on Earth. The importance of water quality is increasing with rising water pollution owing to industrialization and the depletion of fresh water sources. Countries with little control over water pollution are likely to have poor public health. Additionally, the methods used in most developing countries are not effective and rely more on human intervention than on technological and automated solutions. Typically, most water samples and related data are monitored and tested in laboratories, which consumes time and effort while producing less reliable results. In view of the above, there is an imperative need to devise a proper and systematic system to regularly monitor and manage the quality of water resources. To this end, the Internet of Things (IoT) is a strong alternative to such traditional approaches, which are complex and ineffective, as it allows remote measurements to be taken in real time with minimal human involvement. The proposed system consists of several water quality measuring nodes with sensors for dissolved oxygen, turbidity, pH level, water temperature, and total dissolved solids. These sensor nodes, deployed at various sites in the study area, transmit data to the server for processing and analysis using GSM modules. The data collected over months is used for water quality classification using water quality indices and for bacterial prediction using machine learning algorithms. For data visualization, a Web portal is developed which consists of a dashboard of Web services to display heat maps and other related infographics. The real-time water quality data is collected using IoT nodes and the historic data is acquired from the Rawal Lake Filtration Plant. Several machine learning algorithms, including neural networks (NN), convolutional neural networks (CNN), ridge regression (RR), support vector machines (SVM), decision tree regression (DTR), Bayesian regression (BR), and an ensemble of all models, are trained for fecal coliform bacterial prediction, where the SVM and Bayesian regression models show the best performance with mean squared error (MSE) of 0.35575 and 0.39566, respectively. The proposed system provides an alternative and more convenient solution for bacterial prediction, which is otherwise done manually in labs and is expensive and time-consuming. In addition, it offers several other advantages, including remote monitoring, ease of scalability, real-time status of water quality, and portable hardware.
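
The mean squared error used to compare the regression models is the average of the squared differences between observed and predicted values. A minimal sketch follows; the function name and the sample values are illustrative assumptions and are unrelated to the Rawal Lake data.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observations and predictions, for illustration only.
print(mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

A lower MSE means predictions sit closer to the measured values; because errors are squared, a few large misses weigh more heavily than many small ones.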

Subjects

Internet of Things , Bayes Theorem , Environmental Monitoring , Humans , Machine Learning , Water Quality

16.

ExAID: A multimodal explanation framework for computer-aided diagnosis of skin lesions.

Lucieri, Adriano; Bajwa, Muhammad Naseer; Braun, Stephan Alexander; Malik, Muhammad Imran; Dengel, Andreas; Ahmed, Sheraz.

Comput Methods Programs Biomed ; 215: 106620, 2022 Mar.

Article in English | MEDLINE | ID: mdl-35033756

ABSTRACT

BACKGROUND AND OBJECTIVES: One principal impediment to the successful deployment of Artificial Intelligence (AI) based Computer-Aided Diagnosis (CAD) systems in everyday clinical workflows is their lack of transparent decision-making. Although commonly used eXplainable AI (XAI) methods provide insights into these largely opaque algorithms, such explanations are usually convoluted and not readily comprehensible. The explanation of decisions regarding the malignancy of skin lesions from dermoscopic images demands particular clarity, as the underlying medical problem definition is ambiguous in itself. This work presents ExAID (Explainable AI for Dermatology), a novel XAI framework for biomedical image analysis that provides multi-modal concept-based explanations, consisting of easy-to-understand textual explanations and visual maps, to justify the predictions. METHODS: Our framework relies on Concept Activation Vectors to map human-understandable concepts to those learned by an arbitrary Deep Learning (DL) based algorithm, and Concept Localisation Maps to highlight those concepts in the input space. This identification of relevant concepts is then used to construct fine-grained textual explanations supplemented by concept-wise location information to provide comprehensive and coherent multi-modal explanations. All decision-related information is presented in a diagnostic interface for use in clinical routines. Moreover, the framework includes an educational mode providing dataset-level explanation statistics as well as tools for data and model exploration to aid medical research and education processes. RESULTS: Through rigorous quantitative and qualitative evaluation of our framework on a range of publicly available dermoscopic image datasets, we show the utility of multi-modal explanations for CAD-assisted scenarios even in cases of wrong disease predictions. We demonstrate that concept detectors for the explanation of pre-trained networks reach accuracies of up to 81.46%, which is comparable to supervised networks trained end-to-end. CONCLUSIONS: We present a new end-to-end framework for the multi-modal explanation of DL-based biomedical image analysis in Melanoma classification and evaluate its utility on an array of datasets. Since perspicuous explanation is one of the cornerstones of any CAD system, we believe that ExAID will accelerate the transition from AI research to practice by providing dermatologists and researchers with an effective tool that they can both understand and trust. ExAID can also serve as the basis for similar applications in other biomedical fields.

Subjects

Artificial Intelligence , Melanoma , Algorithms , Computers , Computer-Assisted Diagnosis , Humans

17.

TSInsight: A Local-Global Attribution Framework for Interpretability in Time Series Data.

Siddiqui, Shoaib Ahmed; Mercier, Dominique; Dengel, Andreas; Ahmed, Sheraz.

Sensors (Basel) ; 21(21), 2021 Nov 05.

Article in English | MEDLINE | ID: mdl-34770678

ABSTRACT

With the rise in the employment of deep learning methods in safety-critical scenarios, interpretability is more essential than ever before. Although many different directions regarding interpretability have been explored for visual modalities, time series data has been neglected, with only a handful of methods tested due to their poor intelligibility. We approach the problem of interpretability in a novel way by proposing TSInsight, where we attach an auto-encoder to the classifier with a sparsity-inducing norm on its output and fine-tune it based on the gradients from the classifier and a reconstruction penalty. TSInsight learns to preserve features that are important for prediction by the classifier and suppresses those that are irrelevant, i.e., serves as a feature attribution method to boost the interpretability. In contrast to most other attribution frameworks, TSInsight is capable of generating both instance-based and model-based explanations. We evaluated TSInsight along with nine other commonly used attribution methods on eight different time series datasets to validate its efficacy. The evaluation results show that TSInsight naturally achieves output space contraction; therefore, it is an effective tool for the interpretability of deep time series models.
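
One way to read the training objective described in this abstract is as a sum of three terms: the classifier's loss on the auto-encoder output, a reconstruction penalty, and a sparsity-inducing norm on the auto-encoder output. The expression below is a sketch under stated assumptions, not a formula quoted from the paper: the symbols $E$ (auto-encoder), $\Phi$ (classifier), and the weights $\gamma$ and $\beta$ are notation introduced here, and the choice of an $\ell_1$ norm and a squared $\ell_2$ reconstruction error is an assumption.

```latex
\mathcal{L}(x, y) \;=\;
  \mathcal{L}_{\mathrm{cls}}\bigl(\Phi(E(x)),\, y\bigr)
  \;+\; \gamma \,\lVert x - E(x) \rVert_2^2
  \;+\; \beta \,\lVert E(x) \rVert_1
```

Under this reading, the first term keeps features the classifier needs, the second keeps the output close to the original signal, and the third pushes everything else toward zero, so the surviving parts of $E(x)$ act as the attribution.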

18.

Adversarial text-to-image synthesis: A review.

Frolov, Stanislav; Hinz, Tobias; Raue, Federico; Hees, Jörn; Dengel, Andreas.

Neural Netw ; 144: 187-209, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34500257

ABSTRACT

With the advent of generative adversarial networks, synthesizing images from text descriptions has recently become an active research area. It is a flexible and intuitive approach to conditional image generation, with significant progress in recent years regarding visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research efforts, such as enabling the generation of high-resolution images with multiple objects and developing suitable and reliable evaluation metrics that correlate with human judgement. In this review, we contextualize the state of the art of adversarial text-to-image synthesis models and their development since their inception five years ago, and propose a taxonomy based on the level of supervision. We critically examine current strategies to evaluate text-to-image synthesis models, highlight shortcomings, and identify new areas of research, ranging from the development of better datasets and evaluation metrics to possible improvements in architectural design and model training. This review complements previous surveys on generative adversarial networks with a focus on text-to-image synthesis, which we believe will help researchers to further advance the field.

Subjects

Computer-Assisted Image Processing , Computer Neural Networks , Humans , Semantics

19.

DisCaaS: Micro Behavior Analysis on Discussion by Camera as a Sensor.

Watanabe, Ko; Soneda, Yusuke; Matsuda, Yuki; Nakamura, Yugo; Arakawa, Yutaka; Dengel, Andreas; Ishimaru, Shoya.

Sensors (Basel) ; 21(17), 2021 Aug 25.

Article in English | MEDLINE | ID: mdl-34502609

ABSTRACT

The emergence of various types of commercial cameras (compact, high resolution, high angle of view, high speed, high dynamic range, etc.) has contributed significantly to the understanding of human activities. Taking advantage of a high angle of view, this paper demonstrates a system that recognizes micro-behaviors in a small group discussion with a single 360-degree camera, towards quantified meeting analysis. We propose a method that recognizes speaking and nodding, which have often been overlooked in existing research, from a video stream of face images using a random forest classifier. The proposed approach was evaluated on our three datasets. To create the first and second datasets, we asked participants to meet physically: 16 sets of 5-minute meeting data from 21 unique participants and seven sets of 10-minute meeting data from 12 unique participants. The experimental results showed that our approach could detect speaking and nodding with a macro average F1-score of 67.9% in a 10-fold random split cross-validation and a macro average F1-score of 62.5% in a leave-one-participant-out cross-validation. Considering the increased demand for online meetings due to the COVID-19 pandemic, we also recorded faces on a screen captured by web cameras as the third dataset and discussed the potential and challenges of applying our ideas to virtual video conferences.
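
The two evaluation terms in this abstract, macro-averaged F1 and leave-one-participant-out cross-validation, can be sketched in a few lines. This is a generic illustration and not the authors' evaluation code; the function names, labels, and participant identifiers are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("inputs must be non-empty")
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


def leave_one_participant_out(
    participants: Sequence[str],
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one participant per fold."""
    for held_out in sorted(set(participants)):
        test_idx = [i for i, p in enumerate(participants) if p == held_out]
        train_idx = [i for i, p in enumerate(participants) if p != held_out]
        yield train_idx, test_idx


# Hypothetical labels and participant IDs, for illustration only.
print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
splits = list(leave_one_participant_out(["a", "b", "a", "c"]))
print(len(splits))
```

Holding out all samples of one participant per fold means the classifier is always tested on a person it has never seen, which typically gives a lower but more realistic score than a random split.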

Subjects

Human Activities , Photography , COVID-19 , Humans , Pandemics

20.

iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing.

Tekleyohannes, Menbere Kina; Rybalkin, Vladimir; Ghaffar, Muhammad Mohsin; Varela, Javier Alejandro; Wehn, Norbert; Dengel, Andreas.

J Imaging ; 7(9), 2021 Sep 03.

Article in English | MEDLINE | ID: mdl-34564101

ABSTRACT

In recent years, there has been an increasing demand to digitize and electronically access historical records. Optical character recognition (OCR) is typically applied to scanned historical archives to transcribe them from document images into machine-readable texts. Many libraries offer special stationary equipment for scanning historical documents. However, to digitize these records without removing them from where they are archived, portable devices that combine scanning and OCR capabilities are required. An existing end-to-end OCR software called anyOCR achieves high recognition accuracy for historical documents. However, it is unsuitable for portable devices, as it exhibits high computational complexity resulting in long runtime and high power consumption. Therefore, we have designed and implemented a configurable hardware-software programmable SoC called iDocChip that makes use of anyOCR techniques to achieve high accuracy. As a low-power and energy-efficient system with real-time capabilities, the iDocChip delivers the required portability. In this paper, we present the hybrid CPU-FPGA architecture of iDocChip along with the optimized software implementations of the anyOCR. We demonstrate our results on multiple platforms with respect to runtime and power consumption. The iDocChip system outperforms the existing anyOCR by 44× while achieving 2201× higher energy efficiency and a 3.8% increase in recognition accuracy.
