ABSTRACT
Understanding the composition of the protein corona is essential for evaluating the potential biomedical applications of nanoparticles. Relative protein abundance (RPA), the share of each protein in the total protein content of the corona, is an important parameter for describing the protein corona. For the first time, we comprehensively predicted the RPA of multiple proteins in the protein corona. First, we used multiple machine learning algorithms to predict whether a protein adsorbs to a nanoparticle, which is a binary (dichotomous) classification task. Then, we selected the three best-performing algorithms from the classification task to predict the specific RPA value, which is a regression task. We also analyzed the advantages and disadvantages of the different algorithms for RPA prediction through interpretability analysis. Finally, we identified the features most important for RPA prediction, which provide practical suggestions for the preliminary design of protein coronas. The RPA prediction service is available at http://www.bioai-lab.com/PC_ML.
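The two-stage setup described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the protein names, abundance values, model names, and validation scores are made-up toy values, the helper functions are hypothetical, and treating "RPA greater than zero" as "adsorbed" is an assumption.

```python
def relative_abundance(raw):
    """Convert raw per-protein abundances into RPA percentages that sum to 100."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {name: 100.0 * value / total for name, value in raw.items()}


def adsorption_labels(rpa, threshold=0.0):
    """Stage 1 target (assumed rule): 1 if the protein's RPA exceeds the threshold."""
    return {name: int(value > threshold) for name, value in rpa.items()}


def top_k_models(scores, k=3):
    """Keep the k classifiers with the best validation score for the regression stage."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy corona measurement (hypothetical values, arbitrary units).
raw = {"albumin": 60.0, "apoA1": 30.0, "fibrinogen": 10.0, "IgG": 0.0}
rpa = relative_abundance(raw)       # stage 2 regression target
labels = adsorption_labels(rpa)     # stage 1 classification target

# Hypothetical validation scores of five classifiers from stage 1.
scores = {"rf": 0.91, "svm": 0.84, "xgb": 0.93, "knn": 0.80, "lr": 0.86}
chosen = top_k_models(scores)       # algorithms carried over to regression
```

Reusing only the best classifiers as regressors keeps the second, more expensive stage limited to algorithms that have already shown they can separate adsorbed from non-adsorbed proteins.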
ABSTRACT
Tuberculosis has plagued mankind since ancient times, and the struggle against it continues. Mycobacterium tuberculosis is the leading cause of tuberculosis, infecting nearly one-third of the world's population. The rise of peptide drugs has opened a new direction in the treatment of tuberculosis, which makes the prediction of anti-tuberculosis peptides crucial. This paper proposes an anti-tuberculosis peptide prediction method based on hybrid features and stacked ensemble learning. First, a random forest (RF) and extremely randomized trees (ERT) are selected as the first-level learners of the stacked ensemble. Then, the five best-performing feature encoding methods are combined into a hybrid feature vector, which is refined by decision-tree-based recursive feature elimination (DT-RFE). The resulting optimal feature subset is used as the input of the stacked ensemble model. Logistic regression (LR) serves as the second-level learner to build the final stacked ensemble model, Hyb_SEnc. Hyb_SEnc achieved prediction accuracies of 94.68% and 95.74% on the independent test sets of AntiTb_MD and AntiTb_RD, respectively. In addition, we provide a user-friendly web server (http://www.bioailab.com/Hyb_SEnc). The source code is freely available at https://github.com/fxh1001/Hyb_SEnc.
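A minimal sketch of this kind of pipeline, written with scikit-learn, might look as follows. It is not the Hyb_SEnc code: the data are synthetic stand-ins for the hybrid feature vectors, and the number of retained features, the tree counts, and the cross-validation setting are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for hybrid peptide feature vectors (NOT real peptide data).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# DT-RFE: repeatedly drop the features a decision tree ranks as least important.
# Keeping 15 features is an arbitrary choice for this sketch.
selector = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=15, step=5)

# First level: RF and ERT. Second level: logistic regression trained on their
# out-of-fold predictions (cv=5).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ert", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

model = make_pipeline(selector, stack)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # held-out accuracy on the synthetic data
```

Placing the selector inside the pipeline means feature elimination is fitted on the training split only, so the held-out score is not inflated by information from the test samples.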
ABSTRACT
DNA N6-methyladenine (6mA) modifications play a pivotal role in the regulation of growth, development, and diseases in organisms. As a significant epigenetic marker, 6mA modifications extensively participate in the intricate regulatory networks of the genome. Hence, gaining a profound understanding of how 6mA is intricately involved in these biological processes is imperative for deciphering the gene regulatory networks within organisms. In this study, we propose PSAC-6mA (Position-self-attention Capsule-6mA), a sequence-location-based self-attention capsule network. The positional layer in the model enables positional relationship extraction and independent parameter setting for each base position, avoiding parameter sharing inherent in convolutional approaches. Simultaneously, the self-attention capsule network enhances dimensionality, capturing correlation information between capsules and achieving exceptional results in feature extraction across multiple spatial dimensions within the model. Experimental results demonstrate the superior performance of PSAC-6mA in recognizing 6mA motifs across various species.
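The difference between shared (convolution-style) parameters and independent parameters per base position can be illustrated with a toy example. This is not the PSAC-6mA architecture: the one-hot encoding, the kernel size of 1, the weight values, and the function names are all assumptions made for illustration.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as one 4-element indicator row per base."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def shared_layer(x, w):
    """Convolution-style layer (kernel size 1): the same weights score every position."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in x]


def positional_layer(x, weights):
    """Position-specific layer: row i of `weights` is applied only at position i."""
    if len(x) != len(weights):
        raise ValueError("need exactly one weight vector per sequence position")
    return [sum(wi * xi for wi, xi in zip(w, row)) for w, row in zip(weights, x)]


x = one_hot("AACA")

# Shared weights: every 'A' receives the same score regardless of where it occurs.
shared = shared_layer(x, [0.5, -1.0, 0.0, 2.0])

# Independent weights: the score of 'A' depends on its position in the sequence.
per_position = positional_layer(
    x,
    [
        [0.5, -1.0, 0.0, 2.0],
        [1.5, 0.0, 0.0, 0.0],
        [0.0, 3.0, 0.0, 0.0],
        [-2.0, 0.0, 0.0, 0.0],
    ],
)
```

With shared weights the three adenines are indistinguishable, whereas the position-specific layer can assign each of them a different contribution, at the cost of one parameter set per position.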
Subjects
Adenine , DNA Methylation , DNA/genetics , Genome , Gene Regulatory Networks
ABSTRACT
Protein-protein interactions play an important role in various biological processes, and their study has a wide range of applications. Correct identification of protein-protein interaction sites is therefore crucial. In this paper, we propose a novel predictor of protein-protein interaction sites, AGF-PPIS, which combines a multi-head self-attention mechanism (incorporating the graph structure), a graph convolutional network, and a feed-forward neural network. We use the Euclidean distances between protein residues to generate the corresponding protein graph as the input of AGF-PPIS. On the independent test dataset Test_60, AGF-PPIS outperforms the compared methods on seven evaluation metrics (ACC, precision, recall, F1-score, MCC, AUROC, AUPRC), which demonstrates the validity of the proposed model. The source code and usage instructions for AGF-PPIS are available at https://github.com/fxh1001/AGF-PPIS.
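One common way to turn residue distances into a graph is to connect residues whose distance falls below a cutoff. The sketch below assumes one 3D coordinate per residue and an 8 Å cutoff. Both are illustrative assumptions, not values taken from AGF-PPIS, and `residue_graph` is a hypothetical helper.

```python
import math


def residue_graph(coords, cutoff=8.0):
    """Build a symmetric adjacency matrix: residues i and j are linked
    if their Euclidean distance is at most `cutoff` (assumed to be in angstroms)."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


# Four toy residues on a line (hypothetical coordinates); the last one is far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
adj = residue_graph(coords)
```

The resulting matrix can serve as the edge structure for a graph convolutional layer, with per-residue features supplied separately as node attributes.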
Subjects
Benchmarking , Proton Pump Inhibitors , Neural Networks, Computer , Software
ABSTRACT
BACKGROUND: Breast cancer is the most prevalent malignancy in women. Advanced breast cancer can develop distant metastases, posing a severe threat to patients' lives. Because the clinical warning signs of distant metastasis appear only in the late stage of the disease, better methods of predicting metastasis are needed. METHODS: First, we screened target genes for breast cancer distant metastasis by performing differential analysis and weighted gene co-expression network analysis (WGCNA) on the selected datasets, and ran further analyses, such as GO enrichment analysis, on these target genes. Second, we screened these target genes by LASSO regression analysis to obtain biomarkers, and performed correlation and other analyses on these biomarkers. Finally, we constructed several prediction models for breast cancer distant metastasis based on Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost), and selected the optimal model among them. RESULTS: Several 21-gene prediction models for breast cancer distant metastasis were constructed, of which the random-forest-based model performed best. This model accurately predicted the emergence of distant metastases from breast cancer, with an accuracy of 93.6%, an F1-score of 88.9%, and an AUC of 91.3% on the validation set. CONCLUSION: Our findings have the potential to be translated into a point-of-care prognostic analysis to reduce breast cancer mortality.
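Accuracy and F1-score, two of the metrics reported above, can diverge when metastatic cases are the minority class. The sketch below computes both from binary labels. The label vectors are made-up toy data, not results from the study, and `accuracy_and_f1` is a hypothetical helper.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy validation set: 3 metastatic (1) and 7 non-metastatic (0) patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

Because true negatives count toward accuracy but not toward F1, a model on imbalanced data typically shows a higher accuracy than F1-score, which is why reporting both is informative.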
Subjects
Breast Neoplasms , Humans , Female , Breast , Gene Expression Profiling , Logistic Models , Machine Learning
ABSTRACT
Culex quinquefasciatus, one of the most significant mosquito vectors in the world, is widespread in most parts of southern China. It can transmit a variety of diseases, including Bancroft's filariasis, West Nile disease, and St. Louis encephalitis. Mosquitoes have been shown to host diverse bacterial communities that vary with environmental factors such as temperature and rainfall. In this work, 16S rDNA sequencing was used to analyze the seasonal variation of midgut bacterial diversity of Cx. quinquefasciatus in Haikou City, Hainan Province, China. Proteobacteria was the dominant phylum, accounting for 79.7% (autumn), 73% (winter), 80.4% (spring), and 84.5% (summer) of bacteria. The abundance of Bacteroidetes was higher in autumn and winter than in spring and summer. Interestingly, Epsilonbacteraeota, which was present only in autumn and winter, was unexpectedly detected in the midgut. We speculate that it might participate in the nutritional supply of adult mosquitoes when temperatures drop. Wolbachia was most abundant in autumn, accounting for 31.6% of bacteria. Pantoea was most abundant in the summer group, which might be related to the enhancement of the ability of mosquitoes as temperatures increased. Pseudomonas reached its highest level in winter, whereas Enterobacter was the most abundant genus in spring and summer. Acinetobacter was enriched in spring, when the weather turns from cold to hot. Studying the diversity of midgut bacteria of Cx. quinquefasciatus furthers our understanding of the co-evolution of mosquitoes and their symbiotic microbes. Characterizing the seasonal variation of these microorganisms may ultimately provide a new perspective for the control of Cx. quinquefasciatus and for reducing the spread of the diseases it transmits, which is of considerable practical significance.