Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

Identification of RNA-dependent liquid-liquid phase separation proteins using an artificial intelligence strategy.

Ahmed, Zahoor; Shahzadi, Kiran; Jin, Yanting; Li, Rui; Momanyi, Biffon Manyura; Zulfiqar, Hasan; Ning, Lin; Lin, Hao.

Proteomics ; : e2400044, 2024 Jun 02.

Article in English | MEDLINE | ID: mdl-38824664

ABSTRACT

RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
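
Two of the encodings named in the abstract above, amino acid composition and composition of k-spaced amino acid pairs, can be illustrated with a short sketch. This is a generic illustration, not the authors' code: the function names, the toy sequence, and the normalization by the number of windows are assumptions made here for the example.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def _clean(sequence: str) -> str:
    """Upper-case the sequence and drop non-standard residue symbols."""
    return "".join(ch for ch in sequence.upper() if ch in AMINO_ACIDS)


def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each standard residue, in AMINO_ACIDS order (20 values)."""
    seq = _clean(sequence)
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def k_spaced_pairs(sequence: str, k: int) -> Dict[str, float]:
    """Frequency of residue pairs separated by exactly k other residues.

    Assumption: frequencies are normalized by the number of windows,
    len(seq) - k - 1, which is one common convention.
    """
    seq = _clean(sequence)
    n_windows = len(seq) - k - 1
    if n_windows <= 0:
        raise ValueError("sequence too short for this gap")
    pairs = Counter(seq[i] + seq[i + k + 1] for i in range(n_windows))
    return {a + b: pairs[a + b] / n_windows
            for a in AMINO_ACIDS for b in AMINO_ACIDS}


# Toy sequence for illustration only; it is not taken from the dataset.
aac = amino_acid_composition("MKKLLA")
gap1 = k_spaced_pairs("MKKLLA", k=1)
```

Concatenating such vectors gives a fixed-length numeric representation of a protein, which is the kind of input a feature-selection step and a random forest classifier would consume.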

2.

DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach.

Lv, Hao; Dao, Fu-Ying; Zulfiqar, Hasan; Lin, Hao.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34184738

ABSTRACT

The rapid spread of SARS-CoV-2 infection around the globe has caused a massive health and socioeconomic crisis. Identification of phosphorylation sites is an important step for understanding the molecular mechanisms of SARS-CoV-2 infection and the changes within host cell pathways. In this study, we present DeepIPs, the first deep-learning architecture designed specifically to identify phosphorylation sites in host cells infected with SARS-CoV-2. DeepIPs combines a widely used word embedding method with a convolutional neural network-long short-term memory network architecture to make the final prediction. The independent test demonstrates that DeepIPs improves prediction performance compared with existing tools for general phosphorylation site prediction. Based on the proposed model, a web server called DeepIPs was established and is freely accessible at http://lin-group.cn/server/DeepIPs. The source code of DeepIPs is freely available at the repository https://github.com/linDing-group/DeepIPs.

Subjects

COVID-19 Drug Treatment , Phosphorylation/genetics , SARS-CoV-2/chemistry , Software , COVID-19/genetics , COVID-19/virology , Computational Biology , Deep Learning , Humans , Neural Networks, Computer , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity

3.

A computational platform to identify origins of replication sites in eukaryotes.

Dao, Fu-Ying; Lv, Hao; Zulfiqar, Hasan; Yang, Hui; Su, Wei; Gao, Hui; Ding, Hui; Lin, Hao.

Brief Bioinform ; 22(2): 1940-1950, 2021 03 22.

Article in English | MEDLINE | ID: mdl-32065211

ABSTRACT

The locations at which genomic DNA replication initiates are defined as origins of replication sites (ORIs), which regulate the onset of DNA replication and play significant roles in the DNA replication process. The study of ORIs is essential for understanding the cell-division cycle and gene expression regulation. Accurate identification of ORIs by computational methods will provide important clues for DNA replication research and drug development. In this paper, the first integrated predictor, named iORI-Euk, was built to identify ORIs in multiple eukaryotes and multiple cell types. For the predictor, ORI data from seven eukaryotes (Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Pichia pastoris, Schizosaccharomyces pombe and Kluyveromyces lactis) were collected from public databases to construct benchmark datasets. Subsequently, three feature extraction strategies, namely k-mer, binary encoding, and a combination of k-mer and binary encoding, were used to formulate DNA sequence samples. We also compared the performance of different classification algorithms. The best results were obtained with a support vector machine in both the 5-fold cross-validation test and the independent dataset test. Based on the optimal model, an online web server called iORI-Euk (http://lin-group.cn/server/iORI-Euk/) was established for identifying novel ORIs.
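
The three feature extraction strategies named in the abstract above (k-mer, binary encoding, and their combination) can be sketched as follows. This is a minimal illustration under stated assumptions, not the iORI-Euk implementation: the function names, the handling of ambiguous bases, and the toy sequence are all hypothetical.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGT"


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Frequency of every possible k-mer, in lexicographic order (4**k values)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence shorter than k")
    for i in range(n_windows):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases such as N are skipped
            counts[word] += 1
    return [counts[w] / n_windows for w in kmers]


def binary_encoding(seq: str) -> List[int]:
    """One-hot encode each base as 4 bits; unknown bases become all zeros."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    encoded: List[int] = []
    for base in seq.upper():
        encoded.extend(table.get(base, [0, 0, 0, 0]))
    return encoded


def combined_features(seq: str, k: int) -> List[float]:
    """Concatenate k-mer frequencies with the binary encoding."""
    return kmer_frequencies(seq, k) + [float(b) for b in binary_encoding(seq)]


# Toy sequence for illustration only.
vector = combined_features("ACGTAC", k=2)
```

Note that the binary encoding grows with sequence length, so it only yields fixed-length vectors when all samples share the same length, whereas the k-mer vector always has 4**k entries.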

Subjects

Replication Origin , Algorithms , Animals , Cell Line , Cell Line, Tumor , Datasets as Topic , Eukaryota/genetics , Humans , Support Vector Machine

4.

A sequence-based deep learning approach to predict CTCF-mediated chromatin loop.

Lv, Hao; Dao, Fu-Ying; Zulfiqar, Hasan; Su, Wei; Ding, Hui; Liu, Li; Lin, Hao.

Brief Bioinform ; 22(5)2021 09 02.

Article in English | MEDLINE | ID: mdl-33634313

ABSTRACT

The three-dimensional (3D) architecture of chromosomes is of crucial importance for transcription regulation and DNA replication. Various high-throughput chromosome conformation capture-based methods have revealed that CTCF-mediated chromatin loops are a major component of 3D architecture. However, CTCF-mediated chromatin loops are cell type specific, and most chromatin interaction capture techniques are time-consuming and labor-intensive, which restricts their use across a very large number of cell types. Genomic sequence-based computational models are sophisticated enough to capture important features of chromatin architecture and help to identify chromatin loops. In this work, we develop Deep-loop, a convolutional neural network model, to integrate k-tuple nucleotide frequency component, nucleotide pair spectrum encoding, position conservation, position scoring function and natural vector features for the prediction of chromatin loops. In a series of cross-validation experiments, Deep-loop shows excellent performance in identifying chromatin loops from different cell types. The source code of Deep-loop is freely available at the repository https://github.com/linDing-group/Deep-loop.

Subjects

CCCTC-Binding Factor/genetics , Chromatin/metabolism , Genome, Human , Neural Networks, Computer , CCCTC-Binding Factor/metabolism , Chromatin/ultrastructure , Datasets as Topic , Gene Expression Regulation , Humans , K562 Cells , MCF-7 Cells , Molecular Conformation , Nucleotide Motifs , Software

5.

Application of artificial intelligence and machine learning for COVID-19 drug discovery and vaccine design.

Lv, Hao; Shi, Lei; Berkenpas, Joshua William; Dao, Fu-Ying; Zulfiqar, Hasan; Ding, Hui; Zhang, Yang; Yang, Liming; Cao, Renzhi.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34410360

ABSTRACT

The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has led to a dramatic loss of human life worldwide. Despite many efforts, the development of effective drugs and vaccines for this novel virus will take considerable time. Artificial intelligence (AI) and machine learning (ML) offer promising solutions that could accelerate the discovery and optimization of new antivirals. Motivated by this, in this paper, we present an extensive survey on the application of AI and ML for combating COVID-19 based on the rapidly emerging literature. Particularly, we point out the challenges and future directions associated with state-of-the-art solutions to effectively control the COVID-19 pandemic. We hope that this review provides researchers with new insights into the ways AI and ML fight and have fought the COVID-19 outbreak.

Subjects

COVID-19 Drug Treatment , COVID-19 Vaccines/genetics , Drug Discovery , SARS-CoV-2/genetics , Artificial Intelligence , COVID-19/genetics , COVID-19/virology , COVID-19 Vaccines/chemistry , Drug Design , Humans , Machine Learning , Pandemics , SARS-CoV-2/chemistry , SARS-CoV-2/pathogenicity

6.

Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli.

Zulfiqar, Hasan; Sun, Zi-Jie; Huang, Qin-Lai; Yuan, Shi-Shi; Lv, Hao; Dao, Fu-Ying; Lin, Hao; Li, Yan-Wen.

Methods ; 203: 558-563, 2022 07.

Article in English | MEDLINE | ID: mdl-34352373

ABSTRACT

N4-methylcytosine (4mC) is a type of DNA modification that can regulate several biological processes such as transcription regulation, replication, and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded by the word embedding technique 'word2vec'. The obtained features were fed into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. Evaluation on an independent dataset showed that our model yielded an overall accuracy of 0.861, about 4.3% higher than the existing model. For the convenience of other researchers, we provide the data and source code of the model, which can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.

Subjects

DNA , Escherichia coli , DNA/genetics , Escherichia coli/genetics , Genome , Genomics , Software

7.

Evaluation of radioactivity in soil and rock samples from an undiscovered sea beach in the southeastern coastline of Bangladesh and associated health risk.

Siraz, M M Mahfuz; Kamal, Md Hossain; Khan, Zulfiqar Hasan; Alam, M S; Al Mahmud, Jubair; Rashid, Md Bazlar; Khandaker, Mayeen Uddin; Osman, Hamid; Yeasmin, S.

Environ Monit Assess ; 195(9): 1028, 2023 Aug 10.

Article in English | MEDLINE | ID: mdl-37558890

ABSTRACT

This study marks the first-ever assessment of radiological hazards linked to the sands and rocks of Patuartek Sea Beach, situated along one of the world's longest sea beaches in Cox's Bazar, Bangladesh. Using an HPGe detector, a comprehensive analysis of the activity concentrations of 226Ra, 232Th, and 40K was conducted; the activities ranged from 7 to 23 Bq/kg, 9 to 58 Bq/kg, and 172 to 340 Bq/kg, respectively, in soils, and from 19 to 24 Bq/kg, 27 to 39 Bq/kg, and 340 to 410 Bq/kg, respectively, in rocks. Some sand samples exhibited elevated levels of 232Th, while the rock samples displayed higher levels of 40K compared to the global average. The radiological hazard parameters were assessed, and no values surpassed the recommended limits set by several international organizations. Hence, the sands and rocks of Patuartek Sea Beach pose no significant radiological risk to residents or tourists. The findings of this study provide crucial insights for the development of a radiological baseline map in the country, which is important due to the commissioning of the country's first nuclear power plant, the Rooppur Nuclear Power Plant. The data may also stimulate interest in the rare-earth minerals present in the area, which are important for the electronics industry and thorium-based nuclear fuel cycles.

Subjects

Radiation Monitoring , Radioactivity , Radium , Soil Pollutants, Radioactive , Potassium Radioisotopes/analysis , Silicon Dioxide/analysis , Soil , Sand , Bangladesh , Soil Pollutants, Radioactive/analysis , Bathing Beaches , Thorium/analysis , Radium/analysis

8.

A Statistical Analysis of the Sequence and Structure of Thermophilic and Non-Thermophilic Proteins.

Ahmed, Zahoor; Zulfiqar, Hasan; Tang, Lixia; Lin, Hao.

Int J Mol Sci ; 23(17)2022 Sep 04.

Article in English | MEDLINE | ID: mdl-36077513

ABSTRACT

Thermophilic proteins have various practical applications in theoretical research and in industry. In recent years, the demand for thermophilic proteins on an industrial scale has been increasing; therefore, the engineering of thermophilic proteins has become an active direction in the field of protein engineering. However, the exact mechanism of protein thermostability is not yet known, and engineering thermophilic proteins requires knowing the basis of thermostability. In order to understand this basis, we performed a statistical analysis of the sequences, secondary structures, hydrogen bonds, salt bridges, DHA (Donor-Hydrogen-Acceptor) angles, and bond lengths of ten pairs of thermophilic proteins and their non-thermophilic orthologs. Our findings suggest that polar amino acids contribute to thermostability in proteins by forming hydrogen bonds and salt bridges, which provide resistance against protein denaturation. Shorter bond lengths and wider DHA angles provide greater bond stability in thermophilic proteins. Moreover, the increased frequency of aromatic amino acids in thermophilic proteins contributes to thermal stability by forming more aromatic interactions. Additionally, the coil, helix, and loop in the secondary structure also contribute to thermostability.

Subjects

Amino Acids , Proteins , Amino Acids/chemistry , Hydrogen Bonding , Protein Denaturation , Protein Engineering , Proteins/chemistry , Temperature

9.

Deep-4mCGP: A Deep Learning Approach to Predict 4mC Sites in Geobacter pickeringii by Using Correlation-Based Feature Selection Technique.

Zulfiqar, Hasan; Huang, Qin-Lai; Lv, Hao; Sun, Zi-Jie; Dao, Fu-Ying; Lin, Hao.

Int J Mol Sci ; 23(3)2022 Jan 23.

Article in English | MEDLINE | ID: mdl-35163174

ABSTRACT

4mC is a type of DNA modification that can regulate multiple biological processes, for example, DNA replication, gene expression, and transcriptional regulation. Accurate prediction of 4mC sites can provide precise information about their hereditary functions. The purpose of this study was to establish a robust deep learning model to recognize 4mC sites in Geobacter pickeringii. In the proposed model, two kinds of feature descriptors, namely binary and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii. The features obtained from their fusion were optimized by using a correlation and gradient-boosting decision tree (GBDT)-based algorithm with the incremental feature selection (IFS) method. Then, these optimized features were fed into a 1D convolutional neural network (CNN) to distinguish 4mC sites from non-4mC sites in Geobacter pickeringii. The performance of the proposed model on independent data exhibited an accuracy of 0.868, which was 4.2% higher than the existing model.

Subjects

Computational Biology/methods , Epigenesis, Genetic/genetics , Geobacter/genetics , Algorithms , Cytosine/metabolism , DNA/genetics , DNA Methylation/genetics , Deep Learning , Machine Learning , Mutation/genetics , Neural Networks, Computer , Software

10.

Promoter Prediction in Agrobacterium tumefaciens Strain C58 by Using Artificial Intelligence Strategies.

Zulfiqar, Hasan; Ahmad, Ramala Masood; Raza, Ali; Shahzad, Sana; Lin, Hao.

Methods Mol Biol ; 2844: 33-44, 2024.

Article in English | MEDLINE | ID: mdl-39068330

ABSTRACT

Promoters are the genomic regions upstream of genes that RNA polymerase binds in order to initiate gene transcription. Understanding the regulation of gene expression depends on being able to identify promoters, because they are the most important component of gene expression. Agrobacterium tumefaciens (A. tumefaciens) strain C58 was the subject of this study, with the goal of creating a machine learning-based model to predict promoters. In this study, nucleotide density (ND), k-mer, and one-hot encoding were used to encode the promoter sequences. Support vector machine (SVM) on fivefold cross-validation with incremental feature selection (IFS) was used to optimize the generated features. These optimized features were then fed into a random forest (RF) classifier to distinguish promoter sequences. Tenfold cross-validation (CV) analysis revealed that the proposed model can produce an accuracy of 84.22%.
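
Incremental feature selection (IFS), mentioned in the abstract above, is commonly described as: rank the features, add them one at a time in rank order, evaluate each growing subset, and keep the subset with the best score. The sketch below follows that general description and is not the authors' code; `evaluate` is a hypothetical stand-in for a cross-validated accuracy estimate, and the feature names and scores in the toy example are made up purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Return the best-scoring prefix of a ranked feature list.

    `ranked_features` must already be sorted from most to least relevant.
    `evaluate` scores a candidate subset (higher is better).
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(list(current))
        if score > best_score:  # ties keep the smaller, earlier subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


def toy_evaluate(subset: List[str]) -> float:
    """Hypothetical scores keyed by subset size; not real results."""
    made_up_scores = {1: 0.70, 2: 0.82, 3: 0.80}
    return made_up_scores[len(subset)]


best_subset, best_score = incremental_feature_selection(
    ["f1", "f2", "f3"], toy_evaluate
)
```

In practice `evaluate` would train and cross-validate a classifier on the given columns, which makes IFS cost one model fit per candidate subset.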

Subjects

Agrobacterium tumefaciens , Artificial Intelligence , Promoter Regions, Genetic , Support Vector Machine , Agrobacterium tumefaciens/genetics , Computational Biology/methods , Algorithms

11.

A protein pre-trained model-based approach for the identification of the liquid-liquid phase separation (LLPS) proteins.

Ahmed, Zahoor; Shahzadi, Kiran; Temesgen, Sebu Aboma; Ahmad, Basharat; Chen, Xiang; Ning, Lin; Zulfiqar, Hasan; Lin, Hao; Jin, Yan-Ting.

Int J Biol Macromol ; 277(Pt 4): 134146, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39067723

ABSTRACT

Liquid-liquid phase separation (LLPS) regulates many biological processes including RNA metabolism, chromatin rearrangement, and signal transduction. Aberrant LLPS potentially leads to serious diseases. Therefore, the identification of LLPS proteins is crucial. Traditionally, biochemistry-based methods for identifying LLPS proteins are costly, time-consuming, and laborious. In contrast, artificial intelligence-based approaches are fast and cost-effective and can be a better alternative to biochemistry-based methods. Previous research methods employed word2vec in conjunction with machine learning or deep learning algorithms. Although word2vec captures word semantics and relationships, it might not be effective in capturing features relevant to protein classification, like physicochemical properties, evolutionary relationships, or structural features. Additionally, other studies often focused on a limited set of features for model training, including planar π contact frequency, pi-pi, and β-pairing propensities. To overcome such shortcomings, this study first constructed a reliable dataset containing 1206 protein sequences, including 603 LLPS and 603 non-LLPS protein sequences. Then a computational model was proposed to efficiently identify LLPS proteins by perceiving the semantic information of protein sequences directly, using an ESM2-36 pre-trained model based on the transformer architecture in conjunction with a convolutional neural network. The model achieved accuracies of 85.68% and 89.67% on training data and test data, respectively, surpassing the accuracy of previous studies. The performance demonstrates the potential of our computational methods as efficient alternatives for identifying LLPS proteins.

Subjects

Proteins , Proteins/chemistry , Proteins/isolation & purification , Algorithms , Neural Networks, Computer , Machine Learning , Liquid-Liquid Extraction/methods , Computational Biology/methods , Phase Separation

12.

Computational identification of promoters in Klebsiella aerogenes by using support vector machine.

Lin, Yan; Sun, Meili; Zhang, Junjie; Li, Mingyan; Yang, Keli; Wu, Chengyan; Zulfiqar, Hasan; Lai, Hongyan.

Front Microbiol ; 14: 1200678, 2023.

Article in English | MEDLINE | ID: mdl-37250059

ABSTRACT

Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate the process of gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, the promoter sequences in the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and position-correlation scoring function (PCSF). Numerical features were obtained and then optimized using mRMR combined with support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, these optimized features were fed into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. Results of 10-fold CV showed that the model yielded an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will help the study of promoters and gene regulation in K. aerogenes.

13.

Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique.

Zulfiqar, Hasan; Ahmed, Zahoor; Kissanga Grace-Mercure, Bakanina; Hassan, Farwa; Zhang, Zhao-Yue; Liu, Fen.

Front Microbiol ; 14: 1170785, 2023.

Article in English | MEDLINE | ID: mdl-37125199

ABSTRACT

Promoters are genomic regions upstream of genes that are bound by RNA polymerase to start gene transcription. Because promoters are the most critical element of gene expression, recognizing them is crucial to understanding the regulation of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promoter sequences were encoded by three different kinds of feature descriptors, namely, accumulated nucleotide frequency, k-mer nucleotide composition, and binary encodings. The obtained features were optimized by using correlation and the mRMR-based algorithm. These optimized features were fed into a random forest (RF) classifier to discriminate promoter sequences from non-promoter sequences in A. tumefaciens strain C58. Evaluation by 10-fold cross-validation showed that the proposed model yielded an overall accuracy of 0.837. This model will help the study of promoters in A. tumefaciens strain C58.

14.

CFNCM: Collaborative filtering neighborhood-based model for predicting miRNA-disease associations.

Momanyi, Biffon Manyura; Zulfiqar, Hasan; Grace-Mercure, Bakanina Kissanga; Ahmed, Zahoor; Ding, Hui; Gao, Hui; Liu, Fen.

Comput Biol Med ; 163: 107165, 2023 09.

Article in English | MEDLINE | ID: mdl-37315383

ABSTRACT

MicroRNAs have a significant role in the emergence of various human disorders. Consequently, it is essential to understand the existing interactions between miRNAs and diseases, as this will help scientists better study and comprehend the diseases' biological mechanisms. By predicting possible disease-related miRNAs, findings can be employed as biomarkers or drug targets to advance the detection, diagnosis, and treatment of complex human disorders. In light of the shortcomings of conventional biological experiments, which are expensive and time-consuming, this study proposed a computational model for predicting potential miRNA-disease associations called the Collaborative Filtering Neighborhood-based Classification Model (CFNCM). The model generated integrated miRNA and disease similarity matrices using the validated associations and miRNA and disease similarity information and used them as the input features for CFNCM. To produce class labels, we first determined the association scores for brand-new pairs using user-based collaborative filtering. With zero as the threshold, associations with scores >0 were labelled 1, indicating a potential positive association; all others were labelled 0. Then, we developed classification models using various machine-learning algorithms. By comparison, we discovered that the support vector machine (SVM) produced the best AUC of 0.96 with 10-fold cross-validation, using the GridSearchCV technique to identify optimal parameter values. In addition, the models were evaluated and verified by analyzing the top 50 breast and lung neoplasms-related miRNAs, of which 46 and 47 associations were verified in two authoritative databases, dbDEMC and miR2Disease.
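
The labelling step described in the abstract above (score unobserved pairs with user-based collaborative filtering, then threshold at zero) can be sketched generically. This is an assumption-laden illustration, not the CFNCM implementation: miRNAs are treated as "users" and diseases as "items", the score is a similarity-weighted average over the other miRNAs, and the matrices below are invented toy values.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def cf_scores(assoc: Matrix, sim: Matrix) -> Matrix:
    """Score each (miRNA, disease) pair from the other miRNAs' known associations.

    assoc[m][d] is 1 for a validated association, else 0.
    sim[m][o] is the similarity between miRNA m and miRNA o.
    """
    n_mirna, n_disease = len(assoc), len(assoc[0])
    scores = [[0.0] * n_disease for _ in range(n_mirna)]
    for m in range(n_mirna):
        others = [o for o in range(n_mirna) if o != m]
        norm = sum(abs(sim[m][o]) for o in others)
        if norm == 0:
            continue  # no similar neighbours: leave all scores at 0
        for d in range(n_disease):
            weighted = sum(sim[m][o] * assoc[o][d] for o in others)
            scores[m][d] = weighted / norm
    return scores


def label_new_pairs(assoc: Matrix, scores: Matrix,
                    threshold: float = 0.0) -> Dict[Tuple[int, int], int]:
    """Label pairs without a validated association: 1 if score > threshold, else 0."""
    labels: Dict[Tuple[int, int], int] = {}
    for m, row in enumerate(assoc):
        for d, known in enumerate(row):
            if known == 0:
                labels[(m, d)] = 1 if scores[m][d] > threshold else 0
    return labels


# Invented toy data: 3 miRNAs x 3 diseases.
toy_assoc = [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
toy_sim = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.1], [0.0, 0.1, 1.0]]
toy_labels = label_new_pairs(toy_assoc, cf_scores(toy_assoc, toy_sim))
```

Labels produced this way would then serve as classification targets for downstream models such as an SVM; how the paper combines them with the integrated similarity features is not specified in the abstract.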

Subjects

Disease , MicroRNAs , Support Vector Machine , Neighborhood Characteristics , MicroRNAs/genetics , MicroRNAs/metabolism , Computer Simulation , Humans , Disease/classification , Algorithms

15.

Exploring the Students' Perceived Effectiveness of Online Education during the COVID-19 Pandemic: Empirical Analysis Using Structural Equation Modeling (SEM).

Ali, Qamar; Abbas, Azhar; Raza, Ali; Khan, Muhammad Tariq Iqbal; Zulfiqar, Hasan; Iqbal, Muhammad Amjed; Nayak, Roshan K; Alotaibi, Bader Alhafi.

Behav Sci (Basel) ; 13(7)2023 Jul 12.

Article in English | MEDLINE | ID: mdl-37504025

ABSTRACT

The world faced COVID-19, which was a threat to public health and disturbed the educational system and economic stability. Educational institutes were closed for a long period, and students had difficulty completing their syllabus. The government adopted a policy of "suspending classes without stopping learning" to continue education activities. However, student satisfaction with online education is a growing concern. Satisfaction of students is an important indicator of academic quality. Therefore, this study attempts to investigate the factors influencing learning satisfaction using information from 335 students from various institutes in Pakistan. This research examined the impact of computer and internet knowledge, instructor and course material, and Learning Management Systems (LMS) on learning satisfaction. The path coefficients were obtained via Partial Least Square-Structural Equation Modeling (PLS-SEM). The LMS is a tool that facilitates the learning process by providing all types of educational material. The path coefficient was highest for LMS (0.489), which indicates its positive and significant role in attaining learning satisfaction. The instructor and course material ranked second (0.261), which shows that the quality of the instructor and course material also plays a positive role in attaining learning satisfaction. Computer and internet knowledge, an essential ingredient of online education, showed a significant and positive path coefficient (0.123), implying that it could enhance learning satisfaction. Universities should develop their LMS to implement online education with quality course materials. It is also vital that instructors be up to date with modern learning techniques and that internet connectivity be ensured, especially in rural areas. The government should provide internet connections to students at discounted rates.

16.

Empirical comparison and recent advances of computational prediction of hormone binding proteins using machine learning methods.

Zulfiqar, Hasan; Guo, Zhiling; Grace-Mercure, Bakanina Kissanga; Zhang, Zhao-Yue; Gao, Hui; Lin, Hao; Wu, Yun.

Comput Struct Biotechnol J ; 21: 2253-2261, 2023.

Article in English | MEDLINE | ID: mdl-37035551

ABSTRACT

Hormone binding proteins (HBPs) belong to the group of soluble carrier proteins. These proteins selectively and non-covalently interact with hormones and promote growth hormone signaling in humans and other animals. HBPs are useful in many medical and commercial fields. Thus, the identification of HBPs is very important because it can help to discover more details about these proteins. However, experimental methods for recognizing HBPs are time-consuming and expensive. Computational prediction methods based on sequence information and ML algorithms have played significant roles in the correct recognition of HBPs. In this review, we compared and assessed the implementation of ML-based tools for the recognition of HBPs. We hope that this study will provide useful awareness and knowledge for research on HBPs.

17.

Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings.

Zulfiqar, Hasan; Guo, Zhiling; Ahmad, Ramala Masood; Ahmed, Zahoor; Cai, Peiling; Chen, Xiang; Zhang, Yang; Lin, Hao; Shi, Zheng.

Front Med (Lausanne) ; 10: 1291352, 2023.

Article in English | MEDLINE | ID: mdl-38298505

ABSTRACT

Snake venom contains many toxic proteins that can destroy the circulatory system or nervous system of prey. Studies have found that these snake venom proteins have the potential to treat cardiovascular and nervous system diseases. Therefore, the study of snake venom proteins is conducive to the development of related drugs. Research technologies based on traditional biochemistry can accurately identify these proteins, but the experiments are costly and slow. Artificial intelligence technology provides a new computational means and strategy for large-scale screening of snake venom proteins. In this paper, we developed a sequence-based computational method to recognize snake toxin proteins. Specifically, we utilized three different feature descriptors, namely g-gap, natural vector and word2vec, to encode snake toxin protein sequences. The analysis of variance (ANOVA) and the gradient-boosting decision tree (GBDT) algorithm, combined with incremental feature selection (IFS), were used to optimize the features, and the optimized features were then input into the deep learning model for training. The results show that our model achieves an accuracy of 82.00% in 10-fold cross-validation. The model was further verified on independent data, where the accuracy reached 81.14%, which demonstrates that our model has excellent prediction performance and robustness.

18.

iRNA-ac4C: A novel computational method for effectively detecting N4-acetylcytidine sites in human mRNA.

Su, Wei; Xie, Xue-Qin; Liu, Xiao-Wei; Gao, Dong; Ma, Cai-Yi; Zulfiqar, Hasan; Yang, Hui; Lin, Hao; Yu, Xiao-Long; Li, Yan-Wen.

Int J Biol Macromol ; 227: 1174-1181, 2023 Feb 01.

Article in English | MEDLINE | ID: mdl-36470433

ABSTRACT

RNA N4-acetylcytidine (ac4C) is the acetylation of cytidine at the nitrogen-4 position; it is a highly conserved RNA modification involved in a variety of biological processes. Hence, accurate identification of genome-wide ac4C sites is vital for understanding the regulation mechanism of gene expression. In this work, a novel predictor, named iRNA-ac4C, was established to identify ac4C sites in human mRNA based on three feature extraction methods: nucleotide composition, nucleotide chemical property, and accumulated nucleotide frequency. Subsequently, minimum-Redundancy-Maximum-Relevance combined with incremental feature selection was used to select the optimal feature subset. Based on the optimal feature subset, the best ac4C classification model was trained by gradient boosting decision tree with 10-fold cross-validation. The results on the independent testing set indicated that our proposed method has encouraging generalization capability. For the convenience of other researchers, we established a user-friendly web server which is freely available at http://lin-group.cn/server/iRNA-ac4C/. We hope that the tool can provide guidance for wet-lab researchers.

Subjects

Cytidine , RNA , Humans , RNA, Messenger/metabolism , Cytidine/genetics , Cytidine/metabolism , RNA/chemistry , Nucleotides

19.

iThermo: A Sequence-Based Model for Identifying Thermophilic Proteins Using a Multi-Feature Fusion Strategy.

Ahmed, Zahoor; Zulfiqar, Hasan; Khan, Abdullah Aman; Gul, Ijaz; Dao, Fu-Ying; Zhang, Zhao-Yue; Yu, Xiao-Long; Tang, Lixia.

Front Microbiol ; 13: 790063, 2022.

Article in English | MEDLINE | ID: mdl-35273581

ABSTRACT

Thermophilic proteins have important application value in biotechnology and industrial processes. The correct identification of thermophilic proteins provides important information for the application of these proteins in engineering. Biochemistry-based methods for identifying thermophilic proteins are laborious, time-consuming, and costly. Therefore, there is an urgent need for a fast and accurate method to identify thermophilic proteins. Considering this urgency, we constructed a reliable benchmark dataset containing 1,368 thermophilic and 1,443 non-thermophilic proteins. A multi-layer perceptron (MLP) model based on a multi-feature fusion strategy was proposed to discriminate thermophilic proteins from non-thermophilic proteins. On an independent dataset, the proposed model achieved an accuracy of 96.26%, which demonstrates that the model has good application prospects. To make the model convenient to use, a user-friendly software package called iThermo was established and can be freely accessed at http://lin-group.cn/server/iThermo/index.html. The high accuracy of the model and the practicability of the developed software package indicate that this study can accelerate the discovery and engineering application of thermally stable proteins.

20.

Comprehensive Prediction of Lipocalin Proteins Using Artificial Intelligence Strategy.

Zulfiqar, Hasan; Ahmed, Zahoor; Ma, Cai-Yi; Khan, Rida Sarwar; Grace-Mercure, Bakanina Kissanga; Yu, Xiao-Long; Zhang, Zhao-Yue.

Front Biosci (Landmark Ed) ; 27(3): 84, 2022 03 05.

Article in English | MEDLINE | ID: mdl-35345316

ABSTRACT

BACKGROUND: Lipocalins belong to the calycin family, and their sequence length is generally between 165 and 200 residues. They are mainly stable and multifunctional extracellular proteins. Lipocalins play an important role in several stress responses and allergic inflammations. Because the accurate identification of lipocalins could provide significant evidence for the study of their function, it is necessary to develop a machine learning-based model to recognize lipocalins. METHODS: In this study, we constructed a prediction model to identify lipocalins. Their sequences were encoded by six types of features, namely amino acid composition (AAC), composition of k-spaced amino acid pairs (CKSAAP), pseudo amino acid composition (PseAAC), Geary correlation (GD), normalized Moreau-Broto autocorrelation (NMBroto) and composition/transition/distribution (CTD). Subsequently, these features were optimized by using feature selection techniques. A classifier based on random forest was trained on the optimal features. RESULTS: The results of 10-fold cross-validation showed that our computational model classified lipocalins with an accuracy of 95.03% and an area under the curve of 0.987. On the independent dataset, our computational model produced an accuracy of 89.90%, which was 4.17% higher than the existing model. CONCLUSIONS: In this work, we developed an advanced computational model to discriminate lipocalin proteins from non-lipocalin proteins. In the proposed model, protein sequences were encoded by six descriptors. Then, feature selection was performed to pick out the best features, which produced the maximum accuracy. On the basis of the best feature subset, the RF-based classifier obtained the best prediction results.

Subjects

Artificial Intelligence , Lipocalins , Amino Acids , Computational Biology , Lipocalins/chemistry , Machine Learning , Proteins/chemistry
