Results 1 - 20 of 530
1.
Article in English | MEDLINE | ID: mdl-38954583

ABSTRACT

Biomedical evidence has demonstrated the relevance of microRNA (miRNA) dysregulation in complex human diseases, and determining the relationship between miRNAs and diseases can aid in the early detection and prevention of diseases. Traditional biological experimental methods have the disadvantages of high cost and low efficiency, which are well compensated by computational methods. However, many computational methods focus excessively on neighbor relationships, ignore the structural information of the graph, and overlook the redundant information in the graph structure. This study proposed a computational model based on a graph-masking autoencoder named MGAEMDA. MGAEMDA is an asymmetric framework in which the encoder maps partially observed graphs into latent representations. The decoder reconstructs the masked structural information at the edge and node levels and combines it with linear matrices to obtain the result. The empirical results on the two datasets reveal that the MGAEMDA model performs better than its counterparts. We also demonstrated the predictive performance of MGAEMDA using a case study of four diseases, and all the top 30 predicted miRNAs were validated in the database, providing further evidence of the excellent performance of the model.
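
The masking idea in this abstract (the encoder sees only part of the graph, and the decoder must recover what was hidden) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `mask_edges`, the mask ratio, the seed, and the toy miRNA-disease pairs are hypothetical and do not come from MGAEMDA, whose actual masking strategy is not specified in the abstract.

```python
import random

def mask_edges(edges, mask_ratio=0.3, seed=0):
    """Split edges into (visible, masked) sets for a masked autoencoder.

    The encoder would see only the visible edges; the decoder would be
    trained to reconstruct the masked ones. Uniform random masking is an
    assumption, not the published method.
    """
    rng = random.Random(seed)           # seeded for reproducibility
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_masked = int(len(shuffled) * mask_ratio)
    return shuffled[n_masked:], shuffled[:n_masked]

# Hypothetical miRNA-disease associations (illustrative labels only).
associations = [("miR-a", "D1"), ("miR-a", "D2"), ("miR-b", "D1"),
                ("miR-c", "D3"), ("miR-d", "D2")]
visible, masked = mask_edges(associations, mask_ratio=0.4)
```

The two returned lists always partition the input, so no association is lost or duplicated by the split.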

2.
BMC Biol ; 22(1): 152, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978014

ABSTRACT

BACKGROUND: Metabolite-associated cell communications play critical roles in maintaining human biological function. However, most existing tools and resources focus only on ligand-receptor interaction pairs where both partners are proteinaceous, neglecting other non-protein molecules. To address this gap, we introduce the MRCLinkdb database and algorithm, which aggregates and organizes data related to non-protein L-R interactions in cell-cell communication, providing a valuable resource for predicting intercellular communication based on metabolite-related ligand-receptor interactions. RESULTS: Here, we manually curated the metabolite-ligand-receptor (ML-R) interactions from the literature and known databases, ultimately collecting over 790 human and 670 mouse ML-R interactions. Additionally, we compiled information on over 1900 enzymes and 260 transporter entries associated with these metabolites. We developed the Metabolite-Receptor based Cell Link Database (MRCLinkdb) to store these ML-R interaction data. The platform also offers extensive information for presenting ML-R interactions, including fundamental metabolite information and the overall expression landscape of metabolite-associated gene sets (such as receptors, enzymes, and transporter proteins) based on single-cell transcriptomics sequencing (covering 35 human and 26 mouse tissues, 52 human and 44 mouse cell types) and bulk RNA-seq/microarray data (encompassing 62 human and 39 mouse tissues). Furthermore, MRCLinkdb introduces a web server dedicated to the analysis of intercellular communication based on ML-R interactions. MRCLinkdb is freely available at https://www.cellknowledge.com.cn/mrclinkdb/ . CONCLUSIONS: In addition to supplementing ligand-receptor databases, MRCLinkdb may provide new perspectives for decoding intercellular communication and advancing related prediction tools based on ML-R interactions.


Subjects
Cell Communication , Humans , Ligands , Animals , Mice , Databases, Factual
3.
Bioinformatics ; 40(7)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38967119

ABSTRACT

MOTIVATION: Accurate prediction of acute dermal toxicity (ADT) is essential for the safe and effective development of contact drugs. Currently, graph neural networks, a form of deep learning technology, accurately model the structure of compound molecules, enhancing predictions of their ADT. However, many existing methods emphasize atom-level information transfer and overlook crucial data conveyed by molecular bonds and their interrelationships. Additionally, these methods often generate "equal" node representations across the entire graph, failing to accentuate "important" substructures like functional groups, pharmacophores, and toxicophores, thereby reducing interpretability. RESULTS: We introduce a novel model, GraphADT, utilizing structure remapping and multi-view graph pooling (MVPool) technologies to accurately predict compound ADT. Initially, our model applies structure remapping to better delineate bonds, transforming "bonds" into new nodes and "bond-atom-bond" interactions into new edges, thereby reconstructing the compound molecular graph. Subsequently, we use MVPool to amalgamate data from various perspectives, minimizing biases inherent to single-view analyses. Following this, the model generates a robust node ranking collaboratively, emphasizing critical nodes or substructures to enhance model interpretability. Lastly, we apply a graph comparison learning strategy to train both the original and structure remapped molecular graphs, deriving the final molecular representation. Experimental results on public datasets indicate that the GraphADT model outperforms existing state-of-the-art models. The GraphADT model has been demonstrated to effectively predict compound ADT, offering potential guidance for the development of contact drugs and related treatments. AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at: https://github.com/mxqmxqmxq/GraphADT.git.


Subjects
Skin , Skin/drug effects , Humans , Deep Learning , Neural Networks, Computer
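
The structure remapping step described in the GraphADT abstract (bonds become new nodes, and "bond-atom-bond" interactions become new edges) resembles a line-graph transform. A minimal Python sketch follows, assuming a molecule is given as a list of atom-index pairs. The function name `remap_bonds_to_nodes` and the toy skeleton are illustrative and are not taken from the GraphADT repository.

```python
from itertools import combinations

def remap_bonds_to_nodes(bonds):
    """Turn each bond into a node; link two bonds that share an atom.

    Returns (new_nodes, new_edges), where each new edge is a tuple
    (bond_index_i, bond_index_j, shared_atom).
    """
    new_nodes = [tuple(sorted(b)) for b in bonds]
    new_edges = []
    for i, j in combinations(range(len(new_nodes)), 2):
        shared = set(new_nodes[i]) & set(new_nodes[j])
        if shared:                      # bond-atom-bond interaction
            new_edges.append((i, j, shared.pop()))
    return new_nodes, new_edges

# Heavy-atom skeleton of ethanol: C0-C1 and C1-O2 (hypothetical indexing).
nodes, edges = remap_bonds_to_nodes([(0, 1), (1, 2)])
```

In this toy case the two bonds share atom 1, so the remapped graph has two nodes joined by a single edge.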
4.
Research (Wash D C) ; 7: 0409, 2024.
Article in English | MEDLINE | ID: mdl-39022746

ABSTRACT

Helicobacter pylori infection is characterized by progressive processes of bacterial persistence and chronic gastritis with features of infiltration of mononuclear cells more than granulocytes in gastric mucosa. Angiopoietin-like 4 (ANGPTL4) is considered a double-edged sword in inflammation-associated diseases, but its function and clinical relevance in H. pylori-associated pathology are unknown. Here, we demonstrate both pro-colonization and pro-inflammation roles of ANGPTL4 in H. pylori infection. Increased ANGPTL4 in the infected gastric mucosa was produced from gastric epithelial cells (GECs) synergistically induced by H. pylori and IL-17A in a cagA-dependent manner. Human gastric ANGPTL4 correlated with H. pylori colonization and the severity of gastritis, and mouse ANGPTL4 from non-bone marrow-derived cells promoted bacterial colonization and inflammation. Importantly, H. pylori colonization and inflammation were attenuated in Il17a -/-, Angptl4 -/-, and Il17a -/- Angptl4 -/- mice. Mechanistically, ANGPTL4 bound to integrin αV (ITGAV) on GECs to suppress CXCL1 production by inhibiting ERK, leading to decreased gastric influx of neutrophils, thereby promoting H. pylori colonization; ANGPTL4 also bound to ITGAV on monocytes to promote CCL5 production by activating PI3K-AKT-NF-κB, resulting in increased gastric influx of regulatory CD4+ T cells (Tregs) via CCL5-CCR4-dependent migration. In turn, ANGPTL4 induced Treg proliferation by binding to ITGAV to activate PI3K-AKT-NF-κB, promoting H. pylori-associated gastritis. Overall, we propose a model in which ANGPTL4 collectively ensures H. pylori persistence and promotes gastritis. Efforts to inhibit the ANGPTL4-associated pathway may prove to be valuable strategies in treating H. pylori infection.

5.
Article in English | MEDLINE | ID: mdl-39008396

ABSTRACT

Protein classification is a crucial field in bioinformatics. The development of a comprehensive tool that can perform feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification. However, there is a significant gap in the literature regarding tools that integrate all these essential functionalities. This paper presents iProps, a novel Python-based software package, meticulously crafted to fulfill these multifaceted requirements. iProps is distinguished by its proficiency in feature extraction, evaluation, automated machine learning, and interpretation of classification models. Firstly, iProps fully leverages evolutionary information and amino acid reduction information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, CKSAAGP-E, and so forth; at the same time, it also implements the calculation of 17 other numerical features within the software. iProps also provides feature combination operations for the aforementioned features to generate more hybrid features, and has added data balancing sampling processing as well as built-in classifier settings, among other functionalities. Thus, it can discern the most effective protein class recognition feature from a multitude of candidates, utilizing three automated machine learning algorithms to identify the optimal classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments were conducted using two well-established datasets. The results demonstrated that our software achieved superior recognition performance in every case.
Beyond its contributions to bioinformatics, iProps broadens its applicability by offering robust data analysis tools that are beneficial across various disciplines, capitalizing on its automated machine learning and model interpretation capabilities. As an open-source platform, iProps is readily accessible and features an intuitive user interface, ensuring ease of use for individuals, even those without a background in programming. The source code of the software is available for download at the following websites: https://github.com/LigosQ/iProps and https://gitee.com/LigosQ/iProps.

6.
Int J Biol Macromol ; 276(Pt 2): 133825, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-39002900

ABSTRACT

Predicting compound-induced inhibition of cardiac ion channels is crucial and challenging, significantly impacting cardiac drug efficacy and safety assessments. Despite the development of various computational methods for compound-induced inhibition prediction in cardiac ion channels, their performance remains limited. Most methods struggle to fuse multi-source data, relying solely on specific dataset training, leading to poor accuracy and generalization. We introduce MultiCBlo, a model that fuses multimodal information through a progressive learning approach, designed to predict compound-induced inhibition of cardiac ion channels with high accuracy. MultiCBlo employs progressive multimodal information fusion technology to integrate the compound's SMILES sequence, graph structure, and fingerprint, enhancing its representation. This is the first application of progressive multimodal learning for predicting compound-induced inhibition of cardiac ion channels, to our knowledge. The objective of this study was to predict the compound-induced inhibition of three major cardiac ion channels: hERG, Cav1.2, and Nav1.5. The results indicate that MultiCBlo significantly outperforms current models in predicting compound-induced inhibition of cardiac ion channels. We hope that MultiCBlo will facilitate cardiac drug development and reduce compound toxicity risks. Code and data are accessible at: https://github.com/taowang11/MultiCBlo. The online prediction platform is freely accessible at: https://huggingface.co/spaces/wtttt/PCICB.

7.
Int J Biol Macromol ; : 133791, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38992553

ABSTRACT

Dengue virus (DENV2) is the cause of dengue disease and a worldwide health problem. DENV2 replicates in the host cell using polyproteins such as NS3 protease in conjunction with the NS2B cofactor, making NS3 protease a promising antiviral drug target. This study investigated the efficacy of 'Niloticin' against NS2B/NS3-protease. In silico and in vitro analyses were performed, which included interaction of niloticin with NS2B/NS3-protease, protein stability and flexibility, mutation effect, betweenness centrality of residues and analysis of cytotoxicity, protein expression and WNV NS3-protease activity. Like acyclovir, niloticin forms strong H-bonds and hydrophobic interactions with residues LEU149, ASN152, LYS74, GLY148 and ALA164. The niloticin-NS2B/NS3-protease complex was found to be stable compared with the apo NS2B/NS3-protease in structural deviation, PCA, compactness and FEL analysis. The IC50 value of niloticin was 0.14 µM in BHK cells based on in vitro cytotoxicity analysis and showed significant activity at 2.5 µM in a concentration-dependent manner. Western blotting and qRT-PCR analyses showed that niloticin reduced DENV2 protein transcription in a dose-dependent manner. In addition, inhibition of NS3-protease by niloticin was confirmed using the SensoLyte 440 WNV protease detection kit. These promising results suggest that niloticin could be an effective antiviral drug against DENV2 and other flaviviruses.

8.
Nat Commun ; 15(1): 5879, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-38997253

ABSTRACT

The development of new antibiotics continues to pose challenges, particularly considering the growing threat of multidrug-resistant Staphylococcus aureus. Structurally diverse natural products provide a promising source of antibiotics. Herein, we outline a concise approach for the collective asymmetric total synthesis of polycyclic xanthene myrtucommulone D and five related congeners. The strategy involves rapid assembly of the challenging benzopyrano[2,3-a]xanthene core, highly diastereoselective establishment of three contiguous stereocenters through a retro-hemiketalization/double Michael cascade reaction, and a Mitsunobu-mediated chiral resolution approach with high optical purity and broad substrate scope. Quantum mechanical calculations provide insight into the mechanism of stereoselective construction of the three contiguous stereocenters. Additionally, this work leads to the discovery of an antibacterial agent against both drug-sensitive and drug-resistant S. aureus. This compound operates through a unique mechanism that promotes bacterial autolysis by activating the two-component sensory histidine kinase WalK. Our research holds potential for future antibacterial drug development.


Subjects
Anti-Bacterial Agents , Methicillin-Resistant Staphylococcus aureus , Xanthenes , Methicillin-Resistant Staphylococcus aureus/drug effects , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemical synthesis , Anti-Bacterial Agents/chemistry , Xanthenes/chemical synthesis , Xanthenes/pharmacology , Xanthenes/chemistry , Microbial Sensitivity Tests , Stereoisomerism , Polycyclic Compounds/chemical synthesis , Polycyclic Compounds/pharmacology , Polycyclic Compounds/chemistry , Drug Discovery , Molecular Structure
9.
Neural Netw ; 178: 106458, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38901093

ABSTRACT

The detection of therapeutic peptides is a topic of immense interest in the biomedical field. Conventional biochemical experiment-based detection techniques are tedious and time-consuming. Computational biology has become a useful tool for improving the detection efficiency of therapeutic peptides. Most computational methods do not consider the deviation caused by noise. To improve the generalization performance of therapeutic peptide prediction methods, this work presents a sequence homology score-based deep fuzzy echo-state network with maximizing mixture correntropy (SHS-DFESN-MMC) model. Our method is compared with the existing methods on eight types of therapeutic peptide datasets. The model parameters are determined by 10-fold cross-validation on the training sets and verified on independent test sets. Across the eight datasets, the average area under the receiver operating characteristic curve (AUC) values of SHS-DFESN-MMC are the highest on both the training (0.926) and independent sets (0.923).
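
The averaged AUC reported above can be made concrete with a small Python sketch. It computes AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count as one half), then averages over datasets. The function names and the toy labels and scores are hypothetical and are unrelated to the SHS-DFESN-MMC experiments.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(datasets):
    """Unweighted mean AUC over a list of (labels, scores) pairs."""
    return sum(auc(y, s) for y, s in datasets) / len(datasets)

# Two toy datasets standing in for the per-dataset evaluations.
toy = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),
    ([1, 1, 0, 0], [0.6, 0.4, 0.5, 0.2]),
]
average = mean_auc(toy)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a rank-based computation would be preferable for large test sets.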

10.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38837345

ABSTRACT

MOTIVATION: Accurately identifying drug-target interactions (DTIs) is one of the crucial steps in the drug discovery and drug repositioning process. Currently, many computational models have been proposed for DTI prediction and have achieved significant improvements. However, these approaches pay little attention to fusing the multi-view similarity networks related to drugs and targets in an appropriate way. Besides, how to fully incorporate the known interaction relationships to accurately represent drugs and targets is not well investigated. Therefore, there is still a need to improve the accuracy of DTI prediction models. RESULTS: In this study, we propose a novel approach that employs Multi-view similarity network fusion strategy and deep Interactive attention mechanism to predict Drug-Target Interactions (MIDTI). First, MIDTI constructs multi-view similarity networks of drugs and targets with their diverse information and integrates these similarity networks effectively in an unsupervised manner. Then, MIDTI obtains the embeddings of drugs and targets from multi-type networks simultaneously. After that, MIDTI adopts the deep interactive attention mechanism to further learn their discriminative embeddings comprehensively with the known DTI relationships. Finally, we feed the learned representations of drugs and targets to the multilayer perceptron model and predict the underlying interactions. Extensive results indicate that MIDTI significantly outperforms other baseline methods on the DTI prediction task. The results of the ablation experiments also confirm the effectiveness of the attention mechanism in the multi-view similarity network fusion strategy and the deep interactive attention mechanism. AVAILABILITY AND IMPLEMENTATION: https://github.com/XuLew/MIDTI.


Subjects
Computational Biology , Computational Biology/methods , Drug Discovery/methods , Algorithms , Drug Repositioning/methods , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry , Humans
11.
Article in English | MEDLINE | ID: mdl-38896510

ABSTRACT

Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black box manner, limiting their capacity for making causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and eases the accurate determination of multiple attributes of gene regulation that are important for determining regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.

12.
Brief Funct Genomics ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38860675

ABSTRACT

In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.

13.
Methods ; 228: 48-54, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38789016

ABSTRACT

With the rapid advancements in molecular biology and genomics, a multitude of connections between RNA and diseases has been unveiled, making the efficient and accurate extraction of RNA-disease (RD) relationships from extensive biomedical literature crucial for advancing research in this field. This study introduces RDscan, a novel text mining method developed based on the pre-training and fine-tuning strategy, aimed at automatically extracting RD-related information from a vast corpus of literature using pre-trained biomedical large language models (LLMs). Initially, we constructed a dedicated RD corpus by manual curation from the literature, comprising 2,082 positive and 2,000 negative sentences, alongside an independent test dataset (comprising 500 positive and 500 negative sentences) for training and evaluating RDscan. Subsequently, by fine-tuning the Bioformer and BioBERT pre-trained models, RDscan demonstrated exceptional performance in text classification and named entity recognition (NER) tasks. In 5-fold cross-validation, RDscan significantly outperformed traditional machine learning methods (Support Vector Machine, Logistic Regression and Random Forest). In addition, we have developed an accessible webserver that assists users in extracting RD relationships from text. In summary, RDscan represents the first text mining tool specifically designed for RD relationship extraction, and is poised to emerge as an invaluable tool for researchers dedicated to exploring the intricate interactions between RNA and diseases. The RDscan webserver is freely available at https://cellknowledge.com.cn/RDscan/.


Subjects
Data Mining , RNA , Data Mining/methods , RNA/genetics , Humans , Machine Learning , Disease/genetics , Support Vector Machine , Software
14.
JCI Insight ; 9(13)2024 May 21.
Article in English | MEDLINE | ID: mdl-38771644

ABSTRACT

Hypotrichosis is a genetic disorder characterized by a diffuse and progressive loss of scalp and/or body hair. Nonetheless, the causative genes for several affected individuals remain elusive, and the underlying mechanisms have yet to be fully elucidated. Here, we discovered that a dominant variant in the a disintegrin and metalloproteinase domain 17 (ADAM17) gene caused hypotrichosis with woolly hair. Adam17 (p.D647N) knockin mice mimicked the hair abnormality in patients. The ADAM17 (p.D647N) mutation led to hair follicle stem cell (HFSC) exhaustion and caused abnormal hair follicles, ultimately resulting in alopecia. Mechanistic studies revealed that ADAM17 binds directly to E3 ubiquitin ligase tripartite motif-containing protein 47 (TRIM47). The ADAM17 variant enhanced the association between ADAM17 and TRIM47, leading to an increase in ubiquitination and subsequent degradation of ADAM17 protein. Furthermore, reduced ADAM17 protein expression affected the Notch signaling pathway, impairing the activation, proliferation, and differentiation of HFSCs during hair follicle regeneration. Overexpression of Notch intracellular domain rescued the reduced proliferation ability caused by the Adam17 variant in primary fibroblast cells.


Subjects
ADAM17 Protein , Alopecia , Hair Follicle , Ubiquitin-Protein Ligases , ADAM17 Protein/metabolism , ADAM17 Protein/genetics , Animals , Alopecia/genetics , Alopecia/metabolism , Alopecia/pathology , Mice , Hair Follicle/metabolism , Hair Follicle/pathology , Humans , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism , Ubiquitination , Male , Signal Transduction/genetics , Tripartite Motif Proteins/metabolism , Tripartite Motif Proteins/genetics , Female , Mutation , Gene Knock-In Techniques , Cell Proliferation/genetics , Cell Differentiation/genetics , Proteolysis , Disease Models, Animal , Fibroblasts/metabolism , Receptors, Notch/metabolism , Receptors, Notch/genetics
15.
BMC Biol ; 22(1): 126, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816885

ABSTRACT

BACKGROUND: A promoter is a specific sequence in DNA that has transcriptional regulatory functions, playing a role in initiating gene expression. Identifying promoters and their strengths can provide valuable information related to human diseases. In recent years, computational methods have gained prominence as an effective means for identifying promoters, offering a more efficient alternative to labor-intensive biological approaches. RESULTS: In this study, a two-stage integrated predictor called "msBERT-Promoter" is proposed for identifying promoters and predicting their strengths. The model incorporates multi-scale sequence information through a tokenization strategy and fine-tunes the DNABERT model. Soft voting is then used to fuse the multi-scale information, effectively addressing the issue of insufficient DNA sequence information extraction in traditional models. To the best of our knowledge, this is the first time an integrated approach has been used in the DNABERT model for promoter identification and strength prediction. Our model achieves accuracy rates of 96.2% for promoter identification and 79.8% for promoter strength prediction, significantly outperforming existing methods. Furthermore, through attention mechanism analysis, we demonstrate that our model can effectively combine local and global sequence information, enhancing its interpretability. CONCLUSIONS: msBERT-Promoter provides an effective tool that successfully captures sequence-related attributes of DNA promoters and can accurately identify promoters and predict their strengths. This work paves a new path for the application of artificial intelligence in traditional biology.


Subjects
Promoter Regions, Genetic , Computational Biology/methods , DNA/genetics , Humans , Models, Genetic , Sequence Analysis, DNA/methods
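
The soft-voting step mentioned in the msBERT-Promoter abstract can be sketched in a few lines of Python: each model contributes a class-probability vector, the vectors are averaged, and the class with the highest mean probability is chosen. The function name `soft_vote`, the equal weighting, and the three probability vectors are assumptions made for illustration; the abstract does not state how many scales are fused or whether weights are used.

```python
def soft_vote(prob_lists):
    """Average per-model class probabilities and pick the most likely class.

    prob_lists: list of probability vectors, one per model, all of equal length.
    Returns (averaged_probabilities, predicted_class_index).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return avg, max(range(n_classes), key=avg.__getitem__)

# Hypothetical outputs [P(non-promoter), P(promoter)] from three tokenization scales.
per_scale = [[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]]
avg, label = soft_vote(per_scale)
```

Here one of the three models favors the negative class, yet the averaged probabilities still select the promoter class, which is the behavior that distinguishes soft voting from relying on any single scale.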
16.
Genome Biol Evol ; 16(5)2024 05 02.
Article in English | MEDLINE | ID: mdl-38748485

ABSTRACT

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.


Subjects
High-Throughput Nucleotide Sequencing , Machine Learning , Databases, Genetic , Computational Biology/methods , Classification/methods
17.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38710482

ABSTRACT

MOTIVATION: Despite the extensive manufacturing of antiviral drugs and vaccination, viral infections continue to be a major human ailment. Antiviral peptides (AVPs) have emerged as potential candidates in the pursuit of novel antiviral drugs. These peptides show vigorous antiviral activity against a diverse range of viruses by targeting different phases of the viral life cycle. Therefore, the accurate prediction of AVPs is an essential yet challenging task. Lately, many machine learning-based approaches have been developed for this purpose; however, their limited capabilities in terms of feature engineering, accuracy, and generalization restrict their usefulness. RESULTS: In the present study, we aim to develop an efficient machine learning-based approach for the identification of AVPs, referred to as DeepAVP-TPPred, to address the aforementioned problems. First, we extract two new transformed feature sets using our designed image-based feature extraction algorithms and integrate them with an evolutionary information-based feature. Next, these feature sets were optimized using a novel feature selection approach called the binary tree growth algorithm. Finally, the optimal feature space from the training dataset was fed to the deep neural network to build the final classification model. The proposed model DeepAVP-TPPred was tested using stringent 5-fold cross-validation and two independent dataset testing methods, which achieved the maximum performance and showed enhanced efficiency over existing predictors in terms of both accuracy and generalization capabilities. AVAILABILITY AND IMPLEMENTATION: https://github.com/MateeullahKhan/DeepAVP-TPPred.


Subjects
Algorithms , Antiviral Agents , Machine Learning , Antiviral Agents/pharmacology , Antiviral Agents/chemistry , Peptides/chemistry , Humans , Computational Biology/methods , Neural Networks, Computer
18.
Mol Ther Nucleic Acids ; 35(2): 102187, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38706631

ABSTRACT

Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA's functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with LPN's topological data. Second, the implemented masking strategy efficiently identifies LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.

19.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38696758

ABSTRACT

MOTIVATION: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS: We present a novel computational approach, CAPTP, which combines convolution and self-attention to improve the prediction of peptide toxicity from amino acid sequences. CAPTP achieves a Matthews correlation coefficient of approximately 0.82 both in cross-validation and on independent test datasets, surpassing existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP remains robust and generalizable even when the data are imbalanced. Further analysis with CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.


Subjects
Computational Biology, Peptides, Peptides/chemistry, Computational Biology/methods, Humans, Amino Acid Sequence, Algorithms, Software
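
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in the abstract above, the sketch below computes it from a binary confusion matrix. The counts are invented for illustration and are not CAPTP results; returning 0.0 when the denominator is zero is a common convention, assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): report 0.0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced test set: 100 toxic peptides, 900 non-toxic peptides.
mcc = matthews_corrcoef(tp=90, tn=890, fp=10, fn=10)
```

Because MCC uses all four cells of the confusion matrix, it stays informative on imbalanced data, where plain accuracy can look high even if the minority class is poorly predicted.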
20.
Microb Ecol ; 87(1): 74, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38771320

ABSTRACT

Rhizosphere microbial communities are considered critical factors for plant growth and vitality, and their adaptive differentiation strategies have received increasing attention but remain poorly understood. In this study, we obtained bacterial and fungal amplicon sequences from the rhizosphere and bulk soils of various ecosystems to investigate the potential mechanisms of microbial adaptation to the rhizosphere environment. Our focus encompasses three aspects: niche preference, functional profiles, and cross-kingdom co-occurrence patterns. Our findings revealed a correlation between niche similarity and nucleotide distance, suggesting that niche adaptation explains nucleotide variation among some closely related amplicon sequence variants (ASVs). Furthermore, biological macromolecule metabolism and communication among abundant bacteria increase under rhizosphere conditions, suggesting that bacterial function is trait-mediated in terms of fitness in new habitats. Additionally, our analysis of cross-kingdom networks revealed that fungi act as intermediaries that facilitate connections between bacteria, indicating that microbes can modify their cooperative relationships to adapt. Overall, the evidence for rhizosphere microbial community adaptation, via differences in genetic, functional, and co-occurrence patterns, elucidates the adaptive benefits of genetic and functional flexibility of the rhizosphere microbiota through niche shifts.


Subjects
Physiological Adaptation, Bacteria, Fungi, Microbiota, Rhizosphere, Soil Microbiology, Fungi/genetics, Fungi/classification, Fungi/physiology, Bacteria/genetics, Bacteria/classification, Bacteria/metabolism, Bacteria/isolation & purification, Ecosystem, Bacterial Physiological Phenomena