Search | BVS Infectious and Parasitic Diseases

1.

MCNN_MC: Computational Prediction of Mitochondrial Carriers and Investigation of Bongkrekic Acid Toxicity Using Protein Language Models and Convolutional Neural Networks.

Malik, Muhammad Shahid; Chang, Yan-Yun; Liu, Yu-Chen; Le, Van The; Ou, Yu-Yen.

J Chem Inf Model ; 2024 Aug 12.

Article in English | MEDLINE | ID: mdl-39133248

ABSTRACT

Mitochondrial carriers (MCs) are essential proteins that transport metabolites across mitochondrial membranes and play a critical role in cellular metabolism. The ADP/ATP (adenosine diphosphate/adenosine triphosphate) carrier is one of the most important, as it contributes to cellular energy production and is susceptible to the powerful toxin bongkrekic acid. This toxin has claimed several lives; for example, a recent foodborne outbreak in Taipei, Taiwan, caused four deaths and sickened 30 people. Bongkrekic acid poisoning has been a long-standing problem in Indonesia, with reports as early as 1895 detailing numerous deaths from contaminated fermented coconut cakes. In bioinformatics, significant advances have been made in understanding biological processes through computational methods; however, no established computational method exists for identifying mitochondrial carriers. We propose a computational bioinformatics approach for predicting MCs from a broader class of secondary active transporters, with a focus on the ADP/ATP carrier and its interaction with bongkrekic acid. The proposed model combines protein language models (PLMs) with multiwindow scanning convolutional neural networks (mCNNs). While PLM embeddings capture contextual information within proteins, the mCNN scans multiple windows to identify potential binding sites and extract local features. Our results show 96.66% sensitivity, 95.76% specificity, 96.12% accuracy, 91.83% Matthews correlation coefficient (MCC), 94.63% F1-score, and 98.55% area under the curve (AUC). The results demonstrate the effectiveness of the proposed approach in predicting MCs and elucidating their functions, particularly in the context of bongkrekic acid toxicity. This study presents a valuable approach for identifying novel mitochondrial complexes, characterizing their functional roles, and understanding mitochondrial toxicology mechanisms. Our findings, which use computational methods to improve our understanding of cellular processes and drug-target interactions, contribute to the development of therapeutic strategies for mitochondrial disorders and to reducing the devastating effects of bongkrekic acid poisoning.
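The evaluation metrics named in this abstract (sensitivity, specificity, accuracy, MCC, F1-score) all derive from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard formulas; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the paper.

```python
import math


def _safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = _safe_div(tp, tp + fn)   # recall on the positive class
    specificity = _safe_div(tn, tn + fp)   # recall on the negative class
    precision = _safe_div(tp, tp + fp)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient: ranges from -1 to 1, 0 means chance level.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = binary_metrics(tp=90, fp=10, tn=190, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that MCC is defined on the interval from -1 to 1; the abstract reports it as a percentage, which corresponds to multiplying the coefficient by 100.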

2.

Categorization of tweets for damages: infrastructure and human damage assessment using fine-tuned BERT model.

Malik, Muhammad Shahid Iqbal; Younas, Muhammad Zeeshan; Jamjoom, Mona Mamdouh; Ignatov, Dmitry I.

PeerJ Comput Sci ; 10: e1859, 2024.

Article in English | MEDLINE | ID: mdl-38435619

ABSTRACT

Identifying tweets that assess infrastructure and human damage benefits disaster management organizations as well as victims during a disaster. Most prior work focused on detecting informative/situational tweets and infrastructure damage; only one study focused on human damage. This study presents a novel approach for detecting damage assessment tweets involving infrastructure and human damage. We investigated the potential of the Bidirectional Encoder Representations from Transformers (BERT) model to learn universal contextualized representations, aiming to demonstrate its effectiveness for binary and multi-class classification of disaster damage assessment tweets. The objective is to exploit a pre-trained BERT as a transfer learning mechanism after fine-tuning important hyper-parameters on the CrisisMMD dataset containing seven disasters. The effectiveness of fine-tuned BERT is compared with five benchmarks and nine comparable models through exhaustive experiments. The findings show that the fine-tuned BERT outperformed all benchmarks and comparable models and achieved state-of-the-art performance, with macro-F1 scores of up to 95.12% and 88% for binary and multi-class classification, respectively. Specifically, the improvement in the classification of human damage is promising.
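The macro-F1 score reported in this abstract is the unweighted mean of the per-class F1 scores, so rare classes (such as human damage) count as much as frequent ones. Below is a minimal sketch of that computation; the function name `macro_f1` and the toy labels are hypothetical and do not come from the CrisisMMD dataset.

```python
def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denominator = 2 * tp + fp + fn
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical three-class example (labels are illustrative only).
y_true = ["infra", "infra", "human", "none", "none", "none"]
y_pred = ["infra", "human", "human", "none", "none", "infra"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1: {score:.4f}")
```

In this toy example the per-class F1 values are 0.5 (infra), 2/3 (human), and 0.8 (none), and the macro average weights them equally regardless of class frequency.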

3.

ProtTrans and multi-window scanning convolutional neural networks for the prediction of protein-peptide interaction sites.

Le, Van-The; Zhan, Zi-Jun; Vu, Thi-Thu-Phuong; Malik, Muhammad-Shahid; Ou, Yu-Yen.

J Mol Graph Model ; 130: 108777, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38642500

ABSTRACT

This study examines the prediction of protein-peptide interactions using advanced machine learning techniques, comparing sequence-based models, standard CNNs, and traditional classifiers. Leveraging pre-trained language models and multi-view window scanning CNNs, our approach yields significant improvements, with ProtTrans, which was pre-trained on 2.1 billion protein sequences and 393 billion amino acids, standing out. The integrated model achieves an AUC of 0.856 and 0.823 on the PepBCL Set_1 and Set_2 datasets, respectively. It also attains a precision of 0.564 on PepBCL Set_1 and 0.527 on PepBCL Set_2, surpassing previous methods. Beyond this, we explore the application of this model in cancer therapy, particularly in identifying peptide interactions for selective targeting of cancer cells, and in other fields. The findings of this study contribute to bioinformatics, providing valuable insights for drug discovery and therapeutic development.
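The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive example (here, an interacting residue) receives a higher score than a randomly chosen negative one. The sketch below computes AUC directly from that pairwise definition; the function name `roc_auc` and the toy scores are hypothetical, and the quadratic loop is meant for illustration rather than for large datasets.

```python
def roc_auc(labels: list, scores: list) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical per-residue labels (1 = binding site) and predicted scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC: {auc:.4f}")
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9.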

Subjects

Computational Biology; Neural Networks, Computer; Peptides; Proteins; Peptides/chemistry; Proteins/chemistry; Computational Biology/methods; Humans; Machine Learning; Protein Binding; Binding Sites; Algorithms; Databases, Protein

4.

VesiMCNN: Using pre-trained protein language models and multiple window scanning convolutional neural networks to identify vesicular transport proteins.

Le, Van The; Tseng, Yi-Hsuan; Liu, Yu-Chen; Malik, Muhammad Shahid; Ou, Yu-Yen.

Int J Biol Macromol ; 280(Pt 3): 136048, 2024 Sep 26.

Article in English | MEDLINE | ID: mdl-39332561

ABSTRACT

Vesicular transport is a critical cellular process responsible for the proper organization and functioning of eukaryotic cells. This mechanism relies on specialized vesicles that shuttle macromolecules, such as proteins, across the cellular landscape, a process pivotal to maintaining cellular homeostasis. Disruptions in vesicular transport have been linked to various disease mechanisms, including cancer and neurodegenerative disorders. In this study, we present vesiMCNN, a novel computational approach that integrates pre-trained protein language models with a multi-window scanning convolutional neural network architecture to accurately identify vesicular transport proteins. To the best of our knowledge, this is the first study to combine pre-trained language models with the multi-window scanning technique for this task. Our method achieved a Matthews Correlation Coefficient (MCC) of 0.558 and an Area Under the Receiver Operating Characteristic curve (AUC-ROC) of 0.933, outperforming existing state-of-the-art approaches. Additionally, we have curated a comprehensive benchmark dataset for the study of vesicular transport proteins, which can facilitate further research in this field. The performance of our model, together with the comprehensive dataset, marks a significant advancement in the field of vesicular transport protein research.
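The multi-window scanning idea mentioned in this abstract can be illustrated as several 1D convolutions with different window sizes sliding over per-residue embeddings, each followed by global max pooling, with the pooled values concatenated into one feature vector. The sketch below is an assumption-laden toy version in plain Python: the window sizes, filter counts, random weights, and the names `conv1d_max` and `multi_window_features` are all hypothetical, and it does not reproduce the authors' architecture or training.

```python
import random


def conv1d_max(embeddings: list, kernel: list, bias: float = 0.0) -> float:
    """Slide one filter (window x dim) along the sequence, apply ReLU, then max-pool."""
    window = len(kernel)
    best = 0.0  # ReLU outputs are never below zero
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            row = embeddings[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        best = max(best, max(total, 0.0))
    return best


def multi_window_features(embeddings, window_sizes=(2, 4, 8),
                          filters_per_window=3, seed=0) -> list:
    """Concatenate max-pooled responses of filters with several window sizes."""
    rng = random.Random(seed)  # untrained random weights, for illustration only
    dim = len(embeddings[0])
    features = []
    for window in window_sizes:
        for _ in range(filters_per_window):
            kernel = [[rng.uniform(-1, 1) for _ in range(dim)]
                      for _ in range(window)]
            features.append(conv1d_max(embeddings, kernel))
    return features


# Toy stand-in for protein language model output: 20 residues, 5 dimensions each.
toy_rng = random.Random(42)
toy_embeddings = [[toy_rng.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
features = multi_window_features(toy_embeddings)
print(len(features), "pooled features")
```

In a real model the filter weights would be learned and the pooled vector would feed a classification layer; small windows respond to short local motifs, while larger windows cover longer stretches of the sequence.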

5.

DeepPLM_mCNN: An approach for enhancing ion channel and ion transporter recognition by multi-window CNN based on features from pre-trained language models.

Le, Van-The; Malik, Muhammad-Shahid; Tseng, Yi-Hsuan; Lee, Yu-Cheng; Huang, Cheng-I; Ou, Yu-Yen.

Comput Biol Chem ; 110: 108055, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38555810

ABSTRACT

Accurate classification of membrane proteins like ion channels and transporters is critical for elucidating cellular processes and drug development. We present DeepPLM_mCNN, a novel framework combining Pretrained Language Models (PLMs) and multi-window convolutional neural networks (mCNNs) for effective classification of membrane proteins into ion channels and ion transporters. Our approach extracts informative features from protein sequences by utilizing various PLMs, including TAPE, ProtT5_XL_U50, ESM-1b, ESM-2_480, and ESM-2_1280. These PLM-derived features are then input into an mCNN architecture to learn conserved motifs important for classification. When evaluated on ion transporters, our best-performing model, which uses ProtT5, achieved 90% sensitivity, 95.8% specificity, and 95.4% overall accuracy. For ion channels, we obtained 88.3% sensitivity, 95.7% specificity, and 95.2% overall accuracy using ESM-1b features. Our proposed DeepPLM_mCNN framework demonstrates significant improvements over previous methods on unseen test data. This study illustrates the potential of combining PLMs and deep learning for accurate computational identification of membrane proteins from sequence data alone. Our findings have important implications for membrane protein research and drug development targeting ion channels and transporters. The data and source code for this study are publicly available at the following link: https://github.com/s1129108/DeepPLM_mCNN.

Subjects

Ion Channels; Neural Networks, Computer; Ion Channels/metabolism; Ion Channels/chemistry; Deep Learning; Ion Transport

6.

Rumour identification on Twitter as a function of novel textual and language-context features.

Ali, Ghulam; Malik, Muhammad Shahid Iqbal.

Multimed Tools Appl ; 82(5): 7017-7038, 2023.

Article in English | MEDLINE | ID: mdl-35974894

ABSTRACT

Social microblogs are among the most popular platforms for spreading information. Despite their advantages, however, these platforms are also used to spread rumours. At present, the majority of existing approaches identify rumours at the topic level instead of at the tweet/post level. Moreover, prior content-based studies used sentiment and linguistic features for rumour identification without considering discrete positive and negative emotions or effective part-of-speech features. Similarly, the majority of prior studies used content-based approaches for feature generation, and recent context-based approaches were not explored. To address these challenges, this paper designs a robust framework for rumour detection at the tweet level. The model uses word2vec embeddings and bidirectional encoder representations from transformers (BERT) as context-based features, and discrete emotions, linguistic, and metadata characteristics as content-based features. To the best of our knowledge, we are the first to use these features for rumour identification at the tweet/post level. The framework is tested on four real-life Twitter microblog datasets. The results show that the detection model detects 97%, 86%, 85%, and 80% of rumours on the four datasets, respectively. In addition, the proposed framework outperformed the three latest state-of-the-art baselines. As stand-alone models, BERT performed best among the context-based approaches, and linguistic features performed best among the content-based approaches. Moreover, two-step feature selection further improves the performance of the detection model.

7.

How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models.

Malik, Muhammad Shahid Iqbal; Imran, Tahir; Jamjoom, Mona Mamdouh.

PeerJ Comput Sci ; 9: e1248, 2023.

Article in English | MEDLINE | ID: mdl-37346552

ABSTRACT

Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. To develop a robust model, several feature models are explored: part-of-speech, LIWC, word uni-gram, Embeddings from Language Models (ELMo), FastText, word2vec, latent semantic analysis (LSA), and char tri-gram. Moreover, BERT is fine-tuned. Three oversampling methods are investigated to handle the class imbalance of the Qprop dataset, of which SMOTE Edited Nearest Neighbors (ENN) gave the best results. Fine-tuning revealed that BERT with a sequence length of 320 is the best model. As a standalone model, the char tri-gram outperformed the other features. Robust performance is observed for the combinations char tri-gram + BERT and char tri-gram + word2vec, both of which outperformed the two state-of-the-art baselines. In contrast to prior approaches, the addition of feature selection further improves performance, achieving more than 97.60% recall, F1-score, and AUC on the dev and test parts of the dataset. The findings of the present study can be used to organize news articles for various public news websites.
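The character tri-gram feature model highlighted in this abstract represents a text by the counts of its overlapping three-character substrings. A minimal sketch follows; the normalization choices (lowercasing and collapsing whitespace) and the function name `char_ngrams` are assumptions for illustration and may differ from the preprocessing used in the study.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


# Hypothetical input; the double space is collapsed before counting.
counts = char_ngrams("The  theme")
print(counts.most_common(3))
```

Because the n-grams span word boundaries (for example `"e t"`), they capture spelling and punctuation patterns that word-level features miss, which is one common reason character n-grams are tried for style-sensitive tasks.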

8.

Identification of offensive language in Urdu using semantic and embedding models.

Hussain, Sajid; Malik, Muhammad Shahid Iqbal; Masood, Nayyer.

PeerJ Comput Sci ; 8: e1169, 2022.

Article in English | MEDLINE | ID: mdl-37346307

ABSTRACT

Automatic identification of offensive/abusive language is necessary to curb unwanted behavior. However, generalizing a solution is challenging because each language has its own grammatical structures and vocabulary. Most prior work targeted Western languages; only one study targeted a low-resource language (Urdu), and it used basic linguistic features and a small dataset. This study designed a new dataset (collected from popular Pakistani Facebook pages) containing 7,500 posts for offensive language detection in Urdu. The proposed methodology used four types of feature engineering models: three are frequency-based and the fourth is an embedding model. The frequency-based features are term frequency-inverse document frequency (TF-IDF), bag-of-words, and word n-gram feature vectors. The fourth is generated by a word2vec model that produces Urdu embeddings trained on a corpus of 196,226 Facebook posts. The experiments demonstrate that the stacking-based ensemble model with word2vec shows the best performance as a standalone model, achieving 88.27% accuracy. In addition, the wrapper-based feature selection method further improves performance. The hybrid combination of TF-IDF, bag-of-words, and word2vec feature models achieved 90% accuracy and 97% AUC. It also outperformed the baseline, with improvements of 3.55% in accuracy, 3.68% in recall, 3.60% in F1-measure, 3.67% in precision, and 2.71% in AUC. The findings of this research have practical implications for commercial applications and future research.
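TF-IDF, one of the frequency-based feature models named in this abstract, weights a term by how often it occurs in a document and how rare it is across the corpus. The sketch below uses the plain textbook form, tf = count / document length and idf = ln(N / document frequency); this variant, the whitespace tokenizer, the function name `tf_idf`, and the toy English documents are all assumptions for illustration (libraries typically apply smoothed variants, and the study's exact settings are not stated in the abstract).

```python
import math
from collections import Counter


def tf_idf(documents: list) -> list:
    """Return one {term: weight} dict per document using unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus standing in for social media posts.
docs = ["good post", "bad post", "bad bad comment"]
vectors = tf_idf(docs)
for doc, vector in zip(docs, vectors):
    print(doc, "->", {term: round(weight, 3) for term, weight in vector.items()})
```

With this formula a term that appears in every document gets weight zero, while a term unique to one document (such as "comment" above) gets the largest idf.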
