1.
Curr Pharm Des ; 30(6): 468-476, 2024.
Article in English | MEDLINE | ID: mdl-38323613

ABSTRACT

INTRODUCTION: Drug development is a challenging and costly process, yet it plays a crucial role in improving healthcare outcomes. It requires extensive research and testing to meet the demands for economic efficiency, cures, and pain relief. METHODS: Drug development is a vital research area that necessitates innovation and collaboration to achieve significant breakthroughs. Computer-aided drug design provides a promising avenue for drug discovery and development by reducing costs and improving the efficiency of drug design and testing. RESULTS: In this study, a novel model, LSTM-SAGDTA, capable of accurately predicting drug-target binding affinity (DTA), was developed. We employed SeqVec to characterize the protein and graph neural networks to capture information on drug molecules. By introducing self-attentive graph pooling, the model achieved greater accuracy and efficiency in predicting drug-target binding affinity. CONCLUSION: LSTM-SAGDTA obtained higher accuracy than current state-of-the-art methods while using less training time. The experimental results suggest that this method is a high-precision solution for DTA prediction.


Subjects
Neural Networks, Computer , Humans , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry , Drug Development , Drug Design , Proteins/metabolism , Proteins/chemistry
2.
Math Biosci Eng ; 21(1): 1590-1609, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38303479

ABSTRACT

Anoikis is a type of programmed cell death, and resistance to it plays an essential role in tumor metastasis, allowing cancer cells to survive in the systemic circulation and acting as a key pathway for regulating critical biological processes. We conducted an exploratory analysis to improve risk stratification and optimize adjuvant treatment choices for patients with breast cancer, and to identify multigene features in mRNA and lncRNA transcriptome profiles associated with anoikis. First, a variance selection method filters out low-information genes in the RNA sequence data, and the mRNA and lncRNA expression data are then extracted based on annotation files. Then, the top ten key mRNAs are screened out through the PPI network. Pearson analysis is employed to identify lncRNAs related to anoikis, and the prognosis-related lncRNAs are selected using univariate Cox regression and machine learning. Finally, we identified a group of RNAs (ten mRNAs and six lncRNAs) and integrated the expression data of these 16 genes to construct a risk-scoring system for BRCA prognosis and drug sensitivity analysis. The risk score's validity was evaluated with the ROC curve, Kaplan-Meier survival curve analysis and decision curve analysis (DCA). For the methylation data, we obtained 169 anoikis-related prognostic methylation sites, integrated these sites with the 16 RNA features and used a deep learning model to evaluate and predict the survival risk of patients. The resulting anoikis signature achieved a concordance index (C-index) of 0.778, indicating its potential to predict the survival probability of breast cancer patients using deep learning methods.


Subjects
Breast Neoplasms , RNA, Long Noncoding , Humans , Female , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Breast Neoplasms/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Gene Expression Profiling , DNA Methylation , Anoikis/genetics , Gene Expression Regulation, Neoplastic
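
The C-index reported in the abstract above measures how well a risk score ranks patients by survival time. The following is a minimal sketch of Harrell's concordance index in plain Python, not the authors' implementation; the toy survival times, event flags, and risk scores are hypothetical, and tied survival times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    A pair of patients is comparable when the one with the shorter
    follow-up time actually had the event (was not censored). The pair is
    concordant when that patient also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: months of follow-up, event flag (1 = death), risk score
times = [5, 10, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported 0.778.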
3.
Anal Biochem ; 687: 115460, 2024 04.
Article in English | MEDLINE | ID: mdl-38191118

ABSTRACT

SUMOylation is a protein post-translational modification that plays an essential role in cellular functions. For predicting SUMO sites, numerous researchers have proposed advanced methods based on ordinary machine learning algorithms. These reported methods have shown excellent predictive performance, but there is room for improvement. In this study, we constructed a novel deep neural network, the Residual Pyramid Network (RsFPN), and developed an ensemble deep learning predictor called iSUMO-RsFPN. Initially, three feature extraction methods were employed to extract features from samples. Following this, weak classifiers were trained based on RsFPN for each feature type. Ultimately, the weak classifiers were integrated to construct the final classifier. Moreover, the predictor underwent systematic testing on an independent test dataset, where the results demonstrated a significant improvement over the existing state-of-the-art predictors. The code of iSUMO-RsFPN is freely available at https://github.com/454170054/iSUMO-RsFPN.


Subjects
Lysine , Sumoylation , Neural Networks, Computer , Machine Learning , Algorithms
4.
Comput Biol Med ; 169: 107812, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38091725

ABSTRACT

Unexpected side effects may arise during both the research stage and the post-marketing stage of drugs. These events lead to drug development failure and can even endanger patients' health. Thus, it is essential to recognize unknown drug-side effects. Most existing in silico methods find the answer from the association network or similarity network of drugs while ignoring the drug-intrinsic attributes. The limitation is that they can only handle drugs in the maturation stage. To support early drug-side effect screening, we conceive a multi-structural deep learning framework, MSDSE, which jointly considers the multi-scale features derived from the drug. MSDSE can jointly learn SMILES sequence-based word embedding, substructure-based molecular fingerprint, and chemical structure-based graph embedding. In the preprocessing stage of MSDSE, we project all features to an abstract space with the same dimension. MSDSE builds a bi-level channel strategy, including a convolutional neural network module with an Inception structure and a multi-head Self-Attention module, to learn and integrate multi-modal features from local to global perspectives. Finally, MSDSE regards the prediction of drug-side effects as pair-wise learning and outputs the pair-wise probability of drug-side effects through the inner product operation. MSDSE is evaluated and analyzed on benchmark datasets and outperforms the baseline models. We also set up an ablation study to justify the feature approach and model structure. Moreover, we select a subset of the model's predictions for a case study to reveal its actual capability. The original data are available at http://github.com/yuliyi/MSDSE.


Subjects
Benchmarking , Drug Development , Humans , Neural Networks, Computer , Probability
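
The abstract above states that MSDSE outputs a pair-wise probability through an inner product. As a rough illustration only, the sketch below scores a drug embedding against a side-effect embedding with a dot product and squashes the score with a sigmoid. The sigmoid, the function name, and the toy vectors are assumptions made for this example; the abstract does not say how the score is turned into a probability.

```python
import math


def pairwise_probability(drug_vec, effect_vec):
    """Inner product of two same-length embeddings mapped to (0, 1).

    The sigmoid is an assumption for illustration; the source only
    mentions the inner product operation.
    """
    if len(drug_vec) != len(effect_vec):
        raise ValueError("embeddings must share the same dimension")
    score = sum(d * e for d, e in zip(drug_vec, effect_vec))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical 4-dimensional embeddings
drug = [0.2, -0.5, 1.0, 0.3]
side_effect = [0.4, 0.1, 0.8, -0.2]
print(pairwise_probability(drug, side_effect))
```

Projecting every feature type to the same dimension, as the abstract describes, is what makes such an inner product well defined.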
5.
Heliyon ; 10(1): e23187, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38148797

ABSTRACT

Protein S-nitrosylation is a reversible redox post-translational modification that is widely present in living organisms. S-nitrosylation can regulate protein function and is closely associated with a variety of diseases; thus, identifying S-nitrosylation sites is crucial for revealing the function of proteins and for related drug discovery. Traditional experimental methods are time-consuming and expensive; therefore, it is necessary to explore more efficient computational methods. Deep learning algorithms perform well in bioinformatics site prediction, and many studies show that they outperform existing machine learning algorithms. In this work, we proposed a deep learning-based predictor, SNO-DCA, for distinguishing between S-nitrosylated and non-S-nitrosylated sequences. First, one-hot encoding of protein sequences was performed. Second, dense convolutional blocks were used to capture feature information, and an attention module was added to weight different features and improve the prediction ability of the model. The 10-fold cross-validation and independent testing results show that our SNO-DCA model outperforms existing S-nitrosylation site prediction models under imbalanced data. A web server, https://sno.cangmang.xyz/SNO-DCA/ , was established to provide an online prediction service for users. SNO-DCA is available at https://github.com/peanono/SNO-DCA.
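
Several predictors in this listing start from one-hot encoding of protein sequences. The sketch below shows the general idea in plain Python; the alphabet order, the handling of unknown residues as all-zero rows, and the example fragment are assumptions for illustration and are not taken from the SNO-DCA code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by letter


def one_hot_encode(sequence, alphabet=AMINO_ACIDS):
    """Return a len(sequence) x len(alphabet) matrix of 0/1 values."""
    index = {residue: i for i, residue in enumerate(alphabet)}
    encoded = []
    for residue in sequence.upper():
        row = [0] * len(alphabet)
        if residue in index:  # unknown or padding symbols stay all-zero
            row[index[residue]] = 1
        encoded.append(row)
    return encoded


# Hypothetical peptide fragment centred on a cysteine; 'X' marks an unknown residue
window = "MKCXA"
matrix = one_hot_encode(window)
print(len(matrix), len(matrix[0]))
```

Each row has at most one non-zero entry, so the matrix can be passed directly to a 1-D convolutional layer with 20 input channels.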

6.
Front Immunol ; 14: 1267755, 2023.
Article in English | MEDLINE | ID: mdl-38094296

ABSTRACT

N4-acetylcytidine (ac4C) is a modification of cytidine at the nitrogen-4 position, playing a significant role in the translation process of mRNA. However, the precise mechanism and details of how ac4C modifies translated mRNA remain unclear. Since identifying ac4C sites using conventional experimental methods is both labor-intensive and time-consuming, there is an urgent need for a method that can promptly recognize ac4C sites. In this paper, we propose a comprehensive ensemble learning model, the Stacking-based heterogeneous integrated ac4C model, engineered explicitly to identify ac4C sites. This innovative model integrates three distinct feature extraction methodologies: Kmer, electron-ion interaction pseudo-potential values (PseEIIP), and pseudo-K-tuple nucleotide composition (PseKNC). The model also incorporates the robust Cluster Centroids algorithm to enhance its performance in dealing with imbalanced data and alleviate underfitting issues. Our independent testing experiments indicate that our proposed model improves the Mcc by 15.61% and the ROC by 5.97% compared to existing models. To test our model's adaptability, we also utilized a balanced dataset assembled by the authors of iRNA-ac4C. Our model showed an increase in Sn of 4.1%, an increase in Acc of nearly 1%, and ROC improvement of 0.35% on this balanced dataset. The code for our model is freely accessible at https://github.com/louliliang/ST-ac4C.git, allowing users to quickly build their model without dealing with complicated mathematical equations.


Subjects
Cytidine , Nucleotides , RNA, Messenger/genetics , Cytidine/genetics , Algorithms
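
Of the three feature extraction methods named in the abstract above, Kmer is the simplest: it represents a sequence by the relative frequency of every possible substring of length k. A minimal sketch follows, assuming an RNA alphabet and k = 2; the function name and example sequence are hypothetical, and PseEIIP and PseKNC are not shown.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGU"):
    """Return the frequency of each possible k-mer, in lexicographic alphabet order."""
    total = len(sequence) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        kmer = sequence[i:i + k]
        if kmer in counts:  # windows containing unexpected symbols are ignored
            counts[kmer] += 1
    return [counts[kmer] / total for kmer in kmers]


vector = kmer_frequencies("ACGUAC", k=2)
print(len(vector))
```

For k = 2 the vector has 4^2 = 16 entries, and the feature length grows as 4^k for larger k.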
7.
Math Biosci Eng ; 20(11): 19133-19151, 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-38052593

ABSTRACT

Malignancies such as bladder urothelial carcinoma, colon adenocarcinoma, liver hepatocellular carcinoma, lung adenocarcinoma and prostate adenocarcinoma significantly impact men's well-being. Accurate cancer classification is vital in determining treatment strategies and improving patient prognosis. This study introduced an innovative method that utilizes gene selection from high-dimensional datasets to enhance the performance of the male tumor classification algorithm. The method assesses the reliability of DNA methylation data to distinguish the five most prevalent types of male cancers from normal tissues by employing DNA methylation 450K data obtained from The Cancer Genome Atlas (TCGA) database. First, the chi-square test is used for dimensionality reduction and second, L1 penalized logistic regression is used for feature selection. Furthermore, the stacking ensemble learning technique was employed to integrate seven common multiclassification models. Experimental results demonstrated that the ensemble learning model utilizing multiple classification models outperformed any base classification model. The proposed ensemble model achieved an astonishing overall accuracy (ACC) of 99.2% in independent testing data. Moreover, it may present novel ideas and pathways for the early detection and treatment of future diseases.


Subjects
Adenocarcinoma , Carcinoma, Hepatocellular , Carcinoma, Transitional Cell , Colonic Neoplasms , Liver Neoplasms , Lung Neoplasms , Urinary Bladder Neoplasms , Humans , Male , DNA Methylation , Adenocarcinoma/genetics , Carcinoma, Transitional Cell/genetics , Reproducibility of Results , Urinary Bladder Neoplasms/genetics , Colonic Neoplasms/genetics , Carcinoma, Hepatocellular/diagnosis , Carcinoma, Hepatocellular/genetics , Lung Neoplasms/genetics , Liver Neoplasms/diagnosis , Liver Neoplasms/genetics
8.
Comput Biol Med ; 166: 107529, 2023 Sep 20.
Article in English | MEDLINE | ID: mdl-37748220

ABSTRACT

Accurate identification of inter-chain contacts in the protein complex is critical to determine the corresponding 3D structures and understand the biological functions. We proposed a new deep learning method, ICCPred, to deduce the inter-chain contacts from the amino acid sequences of the protein complex. This pipeline was built on the designed deep residual network architecture, integrating the pre-trained language model with three multiple sequence alignments (MSAs) from different biological views. Experimental results on 709 non-redundant benchmarking protein complexes showed that the proposed ICCPred significantly increased inter-chain contact prediction accuracy compared to the state-of-the-art approaches. Detailed data analyses showed that the significant advantage of ICCPred lies in the utilization of pre-trained transformer language models which can effectively extract the complementary co-evolution diversity from three MSAs. Meanwhile, the designed deep residual network enhances the correlation between the co-evolution diversity and the patterns of inter-chain contacts. These results demonstrated a new avenue for high-accuracy deep-learning inter-chain contact prediction that is applicable to large-scale protein-protein interaction annotations from sequence alone.

9.
Heliyon ; 9(4): e15096, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37095983

ABSTRACT

The mortality rate from cervical cancer (CESC), a malignant tumor that affects women, has increased significantly worldwide in recent years. With the advancement of bioinformatics technology, the discovery of biomarkers points to a new direction for the diagnosis of cervical cancer. The goal of this study was to look for potential biomarkers for the diagnosis and prognosis of CESC using the GEO and TCGA databases. Because of the high dimension and small sample size of the omic data, or the use of biomarkers generated from a single omic data type, the diagnosis of cervical cancer may be inaccurate and unreliable. We begin by downloading CESC (GSE30760) DNA methylation data from GEO, then perform differential analysis on the downloaded methylation data and screen out the differential genes. Then, using estimation algorithms, we score immune cells and stromal cells in the tumor microenvironment and perform survival analysis on the gene expression profile data and the most recent clinical data of CESC from TCGA. Next, the 'limma' package and a Venn plot in R were used to perform differential analysis of genes and screen out overlapping genes, which were then subjected to GO and KEGG functional enrichment analysis. The differential genes screened from the GEO methylation data and those screened from the TCGA gene expression data were intersected to obtain the common differential genes. A protein-protein interaction (PPI) network of gene expression data was then created in order to discover important genes. The PPI network's key genes were intersected with the previously identified common differential genes to further validate them. The Kaplan-Meier curve was then used to determine the prognostic importance of the key genes. Survival analysis has shown that CD3E and CD80 are important for the identification of cervical cancer and can be considered as potential biomarkers for cervical cancer.

10.
Front Physiol ; 14: 1105891, 2023.
Article in English | MEDLINE | ID: mdl-36998990

ABSTRACT

As one of the most common diseases in pediatric surgery, an inguinal hernia is usually diagnosed by medical experts based on clinical data collected from magnetic resonance imaging (MRI), computed tomography (CT), or B-ultrasound. The parameters of blood routine examination, such as white blood cell count and platelet count, are often used as diagnostic indicators of intestinal necrosis. Based on numerical medical data on blood routine examination parameters and liver and kidney function parameters, this paper used machine learning algorithms to assist the preoperative diagnosis of intestinal necrosis in children with inguinal hernia. In this work, we used clinical data from 3,807 children with inguinal hernia symptoms and 170 children with intestinal necrosis and perforation caused by the disease. Three different models were constructed according to the blood routine examination and liver and kidney function. Missing values were replaced, where necessary, using the RIN-3M (median, mean, or mode region random interpolation) method, and ensemble learning based on the voting principle was used to deal with the imbalanced datasets. The model trained after feature selection yielded satisfactory results with an accuracy of 86.43%, sensitivity of 84.34%, specificity of 96.89%, and AUC value of 0.91. Therefore, the proposed methods may be useful for the auxiliary diagnosis of inguinal hernia in children.

11.
Int J Mol Sci ; 24(5)2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36901929

ABSTRACT

A norm in modern medicine is to prescribe polypharmacy to treat disease. The core concern with the co-administration of drugs is that it may produce adverse drug-drug interaction (DDI), which can cause unexpected bodily injury. Therefore, it is essential to identify potential DDI. Most existing methods in silico only judge whether two drugs interact, ignoring the importance of interaction events to study the mechanism implied in combination drugs. In this work, we propose a deep learning framework named MSEDDI that comprehensively considers multi-scale embedding representations of the drug for predicting drug-drug interaction events. In MSEDDI, we design three-channel networks to process biomedical network-based knowledge graph embedding, SMILES sequence-based notation embedding, and molecular graph-based chemical structure embedding, respectively. Finally, we fuse three heterogeneous features from channel outputs through a self-attention mechanism and feed them to the linear layer predictor. In the experimental section, we evaluate the performance of all methods on two different prediction tasks on two datasets. The results show that MSEDDI outperforms other state-of-the-art baselines. Moreover, we also reveal the stable performance of our model in a broader sample set via case studies.


Subjects
Knowledge Bases , Polypharmacy , Humans , Drug Interactions
12.
Math Biosci Eng ; 20(2): 2815-2830, 2023 01.
Article in English | MEDLINE | ID: mdl-36899559

ABSTRACT

Protein post-translational modification (PTM) plays a key role in orchestrating various biological processes and functions, and occurs widely in the proteins of animals and plants. Glutarylation is a type of post-translational modification that occurs at active ε-amino groups of specific lysine residues in proteins and is associated with various human diseases, including diabetes, cancer, and glutaric aciduria type I. Therefore, the prediction of glutarylation sites is particularly important. This study developed a new deep learning-based prediction model for glutarylation sites, named DeepDN_iGlu, by adopting an attention residual learning method and DenseNet. The focal loss function is used in this study in place of the traditional cross-entropy loss function to address the substantial imbalance between the numbers of positive and negative samples. With the straightforward one-hot encoding method, DeepDN_iGlu shows considerable potential for glutarylation site prediction, with Sensitivity (Sn), Specificity (Sp), Accuracy (ACC), Matthews Correlation Coefficient (MCC), and Area Under Curve (AUC) of 89.29%, 61.97%, 65.15%, 0.33 and 0.80, respectively, on the independent test set. To the best of the authors' knowledge, this is the first time that DenseNet has been used for the prediction of glutarylation sites. DeepDN_iGlu has been deployed as a web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/) to make glutarylation site prediction more accessible.


Subjects
Lysine , Proteins , Animals , Humans , Lysine/chemistry , Lysine/genetics , Lysine/metabolism , Proteins/chemistry , Protein Processing, Post-Translational , Glutaryl-CoA Dehydrogenase/metabolism , Computational Biology/methods
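
The abstract above replaces cross-entropy with focal loss to cope with class imbalance. The sketch below shows the standard binary focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), next to plain cross-entropy. The gamma and alpha defaults and the toy predictions are illustrative assumptions, not the settings used for DeepDN_iGlu.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy over the samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
        total += -math.log(p_t)
    return total / len(y_true)


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; gamma and alpha here are illustrative defaults."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the contribution of well-classified samples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# Hypothetical labels and predicted probabilities
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7]
print(binary_cross_entropy(labels, probs), binary_focal_loss(labels, probs))
```

With gamma = 0 and alpha = 0.5 the focal loss reduces to half the cross-entropy, which is a convenient sanity check when wiring it into a training loop.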
13.
Int J Mol Sci ; 23(24)2022 Dec 07.
Article in English | MEDLINE | ID: mdl-36555143

ABSTRACT

N6-methyladenosine (m6A) is the most abundant modification within eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, the existing computational models usually predict m6A sites in a single species only, without considering the tissue level, and most of them are built on low-confidence data generated by an m6A antibody immunoprecipitation (IP)-based sequencing method, which restricts the reliability and generalizability of the proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred using ensemble deep learning to accurately identify m6A sites based on high-confidence level data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation to achieve an effective stacked model. Our model im6APred achieves an area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that our model has the ability to learn general methylation rules on RNA bases and generalize to m6A transcriptome-wide identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.


Subjects
Deep Learning , Animals , Reproducibility of Results , RNA/metabolism , Adenosine/genetics , Adenosine/metabolism , Mammals/metabolism , Computational Biology/methods
14.
BMC Bioinformatics ; 23(1): 450, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36316638

ABSTRACT

BACKGROUND: Lysine succinylation is a newly discovered protein post-translational modification. Predicting succinylation sites helps in investigating treatments for metabolic diseases. However, because biological experimental approaches are costly and inefficient, it is necessary to develop efficient computational approaches. RESULTS: In this paper, we proposed a novel predictor based on ensemble dense blocks and an attention module, called pSuc-EDBAM, which adopts one-hot encoding to derive the feature maps of protein sequences and generates the low-level feature maps through a 1-D CNN. Afterward, the ensemble dense blocks were used to capture feature information at different levels in the process of feature learning. We also introduced an attention module to evaluate the importance of different features. The experimental results show that Acc reaches 74.25% and MCC reaches 0.2927 on the testing dataset, which suggests that pSuc-EDBAM outperforms the existing predictors. CONCLUSIONS: The experimental results of ten-fold cross-validation on the training dataset and the independent test on the testing dataset showed that pSuc-EDBAM outperforms the existing succinylation site predictors and can predict potential succinylation sites effectively. pSuc-EDBAM is feasible and obtains credible predictive results, which may also provide valuable references for other related research. For the convenience of experimental scientists, a user-friendly web server has been established ( http://bioinfo.wugenqiang.top/pSuc-EDBAM/ ), by which the desired results can be easily obtained.


Subjects
Lysine , Succinic Acid , Lysine/metabolism , Succinic Acid/metabolism , Proteins/metabolism , Amino Acid Sequence , Protein Processing, Post-Translational , Attention , Computational Biology/methods
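
The abstract above reports both Acc and MCC, and the two can diverge sharply on imbalanced data. The sketch below computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient from binary labels using their standard definitions; the example labels are hypothetical and are not drawn from the pSuc-EDBAM test set.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Return Sn, Sp, Acc and MCC for binary labels (1 = positive site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical predictions on eight candidate sites
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

Because MCC uses all four cells of the confusion matrix, it stays low when a model favors the majority class, even if accuracy looks acceptable.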
15.
Int J Mol Sci ; 23(19)2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36232325

ABSTRACT

N6,2'-O-dimethyladenosine (m6Am) is a post-transcriptional modification that may be associated with regulatory roles in the control of cellular functions. Therefore, it is crucial to accurately identify transcriptome-wide m6Am sites to understand underlying m6Am-dependent mRNA regulation mechanisms and biological functions. Here, we used three sequence-based feature-encoding schemes, including one-hot, nucleotide chemical property (NCP), and nucleotide density (ND), to represent RNA sequence samples. Additionally, we proposed an ensemble deep learning framework, named DLm6Am, to identify m6Am sites. DLm6Am consists of three similar base classifiers, each of which contains a multi-head attention module, an embedding module with two parallel deep learning sub-modules, a convolutional neural network (CNN) and a Bi-directional long short-term memory (BiLSTM), and a prediction module. To demonstrate the superior performance of our model's architecture, we compared multiple model frameworks with our method by analyzing the training data and independent testing data. Additionally, we compared our model with the existing state-of-the-art computational methods, m6AmPred and MultiRM. The accuracy (ACC) for the DLm6Am model was improved by 6.45% and 8.42% compared to that of m6AmPred and MultiRM on independent testing data, respectively, while the area under receiver operating characteristic curve (AUROC) for the DLm6Am model was increased by 4.28% and 5.75%, respectively. All the results indicate that DLm6Am achieved the best prediction performance in terms of ACC, Matthews correlation coefficient (MCC), AUROC, and the area under precision and recall curves (AUPR). To further assess the generalization performance of our proposed model, we implemented chromosome-level leave-out cross-validation, and found that the obtained AUROC values were greater than 0.83, indicating that our proposed method is robust and can accurately predict m6Am sites.


Subjects
Algorithms , Deep Learning , Base Sequence , Nucleotides , RNA, Messenger/genetics
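
Two of the encoding schemes named in the abstract above, nucleotide chemical property (NCP) and nucleotide density (ND), are easy to illustrate together. The sketch below uses a commonly cited NCP convention (ring structure, functional group, hydrogen bonding) and defines ND at position i as the fraction of the first i nucleotides that equal the one at position i. The exact bit assignment and the example sequence are assumptions and may differ from the DLm6Am implementation.

```python
# Commonly used NCP triplets: (purine?, amino group?, weak hydrogen bond?)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def encode_ncp_nd(sequence):
    """Return one 4-tuple per position: three NCP bits plus the running density."""
    seen = {}
    features = []
    for position, base in enumerate(sequence.upper(), start=1):
        seen[base] = seen.get(base, 0) + 1
        density = seen[base] / position  # share of this base among positions 1..i
        features.append(NCP.get(base, (0, 0, 0)) + (density,))
    return features


encoded = encode_ncp_nd("AACU")
for row in encoded:
    print(row)
```

Stacking these four values with a one-hot vector gives a per-position feature that a CNN or BiLSTM can consume.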
16.
Genomics ; 114(6): 110486, 2022 11.
Article in English | MEDLINE | ID: mdl-36126833

ABSTRACT

DNA methylation is an important epigenetic modification that occurs in the early stages of tumor formation, and finding the relationship between DNA methylation and cancer is of great significance. This paper proposes a novel model, iCancer-Pred, to identify cancer and further classify its type. Datasets of DNA methylation information for 7 cancer types were collected from The Cancer Genome Atlas (TCGA). The coefficient of variation is first used to reduce the number of features, and then the elastic net is applied to select important features. Finally, a fully connected neural network is constructed with these selected features. In predicting seven types of cancer, iCancer-Pred achieved an overall accuracy of over 97% with 5-fold cross-validation. For convenience, a user-friendly web server is available at http://bioinfo.jcu.edu.cn/cancer or http://121.36.221.79/cancer/. The source code is freely available for download at https://github.com/Huerhu/iCancer-Pred.


Subjects
DNA Methylation , Neoplasms , Humans , Epigenomics , Neoplasms/genetics
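
The first filtering step described in the abstract above uses the coefficient of variation (standard deviation divided by mean) to drop features that barely change across samples. A minimal sketch follows, assuming a samples-by-features matrix of methylation beta values; the probe names, the values, and the threshold are hypothetical, and the subsequent feature selection step is not shown.

```python
from statistics import mean, pstdev


def select_by_cv(matrix, feature_names, threshold):
    """Keep features whose coefficient of variation is at least `threshold`."""
    selected = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in matrix]
        mu = mean(column)
        if mu == 0:
            continue  # CV is undefined for a zero mean, so the feature is dropped
        cv = pstdev(column) / abs(mu)
        if cv >= threshold:
            selected.append(name)
    return selected


# Hypothetical beta values: rows are samples, columns are CpG probes
names = ["cg_a", "cg_b", "cg_c"]
betas = [
    [0.10, 0.50, 0.90],
    [0.12, 0.50, 0.10],
    [0.11, 0.50, 0.50],
]
print(select_by_cv(betas, names, threshold=0.3))
```

Only the third probe varies enough to pass the threshold in this toy example, which mirrors how the filter shrinks a 450K-scale feature set before model fitting.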
17.
Front Genet ; 13: 926927, 2022.
Article in English | MEDLINE | ID: mdl-35846148

ABSTRACT

The early symptoms of lung adenocarcinoma are not apparent, and clinical diagnosis relies primarily on X-ray examination and pathological section examination; with the development of bioinformatics technology, the discovery of biomarkers points to another direction for diagnosis. However, diagnosis is neither accurate nor trustworthy when it relies on omics data with high-dimension and low-sample size (HDLSS) features or on biomarkers produced from only a single type of omics data. To address these problems, feature selection methods of biological analysis are used to reduce the dimension of gene expression data (GSE19188) and DNA methylation data (GSE139032, GSE49996). In addition, the Cartesian product method is used to expand the sample set and integrate gene expression data and DNA methylation data. The classifier is built using a deep neural network and is evaluated with K-fold cross-validation. Moreover, gene ontology analysis and literature retrieval are used to analyze the biological relevance of the selected genes, and the TCGA database is used for survival analysis of these potential genes through Kaplan-Meier estimates to discover the detailed molecular mechanism of lung adenocarcinoma. Survival analysis shows that COL5A2 and SERPINB5 are significant for identifying lung adenocarcinoma and are considered biomarkers of lung adenocarcinoma.
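
The Cartesian product method mentioned in the abstract above can be pictured as pairing every expression sample with every methylation sample and concatenating their feature vectors, which multiplies the number of training samples. The sketch below is one possible reading of that idea. Restricting pairs to samples with the same class label is an assumption made here, and the function name and toy values are hypothetical.

```python
from itertools import product


def cartesian_integrate(expression, methylation):
    """Pair samples across two omics layers within each shared class label.

    Both arguments map a class label to a list of feature vectors. Each
    returned item is (concatenated_vector, label).
    """
    merged = []
    for label in sorted(expression.keys() & methylation.keys()):
        for expr_vec, meth_vec in product(expression[label], methylation[label]):
            merged.append((list(expr_vec) + list(meth_vec), label))
    return merged


# Hypothetical reduced feature vectors per class
expression = {"tumor": [[1.0, 2.0], [1.5, 2.5]], "normal": [[0.1, 0.2]]}
methylation = {"tumor": [[0.8], [0.7], [0.9]], "normal": [[0.2], [0.3]]}
samples = cartesian_integrate(expression, methylation)
print(len(samples))
```

Here 2 x 3 tumor pairs plus 1 x 2 normal pairs yield 8 integrated samples from 8 original ones across two datasets, and the gain grows quickly as the per-class counts increase.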

18.
Front Cell Dev Biol ; 10: 894874, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-35686053

RESUMO

Succinylation, a widespread type of protein post-translational modification discovered in recent years, plays a key role in regulating protein conformation and cellular function. Numerous studies have shown that succinylation is closely associated with the development of many diseases. To gain insight into the mechanism of succinylation, it is vital to identify lysine succinylation sites. However, experimental identification of succinylation sites is time-consuming and laborious, and traditional identification tools cannot keep pace with the rapid growth of datasets. To solve this problem, we developed a new predictor named pSuc-FFSEA, which predicts succinylation sites in protein sequences using feature fusion and a stacking ensemble algorithm. Specifically, sequence information and physicochemical properties were first extracted using EBGW, One-Hot, continuous bag-of-words, chaos game representation, and AAF_DWT. LASSO was then applied to select the optimal feature subset for the classifier. Next, a two-layer stacking ensemble classifier was designed, with three base classifiers in the first layer (SVM, broad learning system, and LightGBM) and a logistic regression classifier as the meta-classifier in the second layer. To further improve prediction accuracy and reduce computational effort, Bayesian optimization and grid search were used to optimize the classifier's hyperparameters. Finally, rigorous 10-fold cross-validation indicated that our predictor is robust and outperforms previous prediction tools, achieving an average prediction accuracy of 0.7773 ± 0.0120. In addition, for the convenience of experimental scientists, a user-friendly and comprehensive web server for pSuc-FFSEA has been established at https://bio.cangmang.xyz/pSuc-FFSEA, through which users can easily obtain the expected data and results without working through the complicated mathematics.
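Of the encodings listed in this abstract, One-Hot is the simplest to illustrate. The sketch below is not taken from pSuc-FFSEA: the flank size, the padding symbol `X`, and the helper names `lysine_windows` and `one_hot` are all assumptions chosen for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends

def lysine_windows(sequence, flank=3):
    """Return (1-based position, window) for every lysine (K) in the sequence."""
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # The slice is centered on the lysine because of the left padding.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows

def one_hot(window):
    """Encode a window as a flat 0/1 vector, 20 entries per residue.

    The padding symbol matches no amino acid, so it becomes 20 zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector

windows = lysine_windows("MKTAYK", flank=2)
encoded = one_hot(windows[0][1])
```

Each candidate lysine site becomes a fixed-length vector (window length × 20), which could then be concatenated with the other feature types before feature selection.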

19.
Front Genet ; 13: 859188, 2022.
Article in English | MEDLINE | ID: mdl-35754843

ABSTRACT

Drug-target interactions (DTIs) are regarded as an essential part of genomic drug discovery, and computational prediction of DTIs can accelerate the search for a lead drug for a given target, compensating for the limitations of time-consuming and expensive wet-lab techniques. Currently, many computational methods predict DTIs based on the sequence composition or physicochemical properties of the drug and target, but further efforts are needed to improve them. In this article, we propose a new sequence-based method for accurately identifying DTIs. For target proteins, we use pre-trained Bidirectional Encoder Representations from Transformers (BERT) to extract sequence features, which can provide unique and valuable pattern information. For drug molecules, the Discrete Wavelet Transform (DWT) is employed to generate information from drug molecular fingerprints. We then concatenate the feature vectors of the DTIs and feed them into a BRL block (a feature extraction module consisting of a batch-norm layer, a rectified linear activation layer, and a linear layer) and a convolutional neural network module to extract DTI features further. Subsequently, a BRL block is used as the prediction engine. After the model was optimized with contrastive loss and cross-entropy loss, it achieved prediction accuracies of up to 90.1%, 94.7%, 94.9%, and 89% for the target families of G protein-coupled receptors, ion channels, enzymes, and nuclear receptors, respectively, indicating that the proposed method can outperform existing predictors. For the convenience of researchers, the web server for the new predictor is freely accessible at https://bioinfo.jcu.edu.cn/dtibert or http://121.36.221.79/dtibert/. The proposed method may also be a potential option for other DTIs.
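The BRL block described in this abstract (batch-norm, rectified linear activation, linear layer, in that order) can be sketched in plain Python. This is an illustrative forward pass only: it normalizes with the statistics of the current batch, omits the learnable scale and shift that batch normalization usually carries, and uses made-up weights. All function names are assumptions.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over the batch."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
            for row in batch]

def relu(batch):
    """Rectified linear activation: negative values become zero."""
    return [[max(0.0, x) for x in row] for row in batch]

def linear(batch, weights, bias):
    """Affine layer; weights holds one row of input weights per output unit."""
    return [[sum(w * x for w, x in zip(w_row, row)) + b
             for w_row, b in zip(weights, bias)]
            for row in batch]

def brl_block(batch, weights, bias):
    """Batch-norm -> ReLU -> Linear, the layer order stated in the abstract."""
    return linear(relu(batch_norm(batch)), weights, bias)

# Two concatenated drug-target feature vectors (toy values) and one output unit.
features = [[1.0, 2.0], [3.0, 6.0]]
output = brl_block(features, weights=[[1.0, 1.0]], bias=[0.5])
```

In a trained model the weights and bias would be learned, and the same block shape could serve both as a feature extractor and, with an output layer of the right size, as the final predictor.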

20.
J Biomed Inform ; 131: 104098, 2022 07.
Article in English | MEDLINE | ID: mdl-35636720

ABSTRACT

In drug development, unexpected side effects are the main reason candidate drug trials fail. Discovering potential side effects of drugs in silico can improve the success rate of drug screening. However, most previous works extracted and utilized a representation of drugs from a single perspective. These methods merely considered the topological information of the drug in the biological entity network, or combined the association information (e.g., a knowledge graph, KG) between the drug and other biomarkers, or used only the chemical structure or sequence information of the drug. Consequently, to jointly learn drug features from both the macroscopic biological network and the microscopic drug molecules, we propose a hybrid embedding graph neural network model named idse-HE, which integrates a graph embedding module and a node embedding module. idse-HE can fuse the drug chemical structure information, the drug substructure sequence information, and the drug network topology information. Our model treats the final representations of drugs and side effects as two implicit factors to reconstruct the original matrix and predict the potential side effects of drugs. In the robustness experiment, idse-HE shows stable performance on all indicators. We reproduce the baselines under the same conditions, and the experimental results indicate that idse-HE is superior to other advanced methods. Finally, we also collect evidence confirming several real drug-side effect pairs among the predicted results that were previously regarded as negative samples. For more detailed information, researchers can access the user-friendly web server of idse-HE at http://bioinfo.jcu.edu.cn/idse-HE. On this server, users can obtain the original data and source code and are guided through reproducing the model results.


Subjects
Drug-Related Side Effects and Adverse Reactions , Neural Networks, Computer , Drug Development , Humans , Knowledge , Software
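The reconstruction step in the idse-HE abstract, where drug and side-effect representations act as two implicit factors, can be sketched as follows. The abstract does not state how the factors are combined, so scoring by inner product is an assumption, as are the function names and the toy embeddings.

```python
def predict_scores(drug_factors, side_effect_factors):
    """Reconstruct a drug x side-effect score matrix from two factor matrices.

    Using the inner product of the two embeddings is an assumption of this sketch.
    """
    return [[sum(d * s for d, s in zip(drug, effect)) for effect in side_effect_factors]
            for drug in drug_factors]

def top_candidates(scores, known, k=2):
    """Rank drug-side effect pairs not marked as known (0 in `known`) by score."""
    candidates = [(i, j, scores[i][j])
                  for i in range(len(scores))
                  for j in range(len(scores[i]))
                  if known[i][j] == 0]
    # sorted() is stable, so ties keep their original (drug, side effect) order.
    return sorted(candidates, key=lambda item: -item[2])[:k]

# Toy embeddings: 2 drugs and 3 side effects in a 2-dimensional latent space.
drug_factors = [[1.0, 0.0], [0.5, 0.5]]
side_effect_factors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
known = [[1, 0, 0], [0, 0, 0]]  # 1 marks an already recorded association

scores = predict_scores(drug_factors, side_effect_factors)
candidates = top_candidates(scores, known, k=2)
```

High-scoring pairs that are currently labeled as negatives are the kind of candidates the abstract describes checking against external evidence.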