Results 1 - 20 of 91
1.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39373051

ABSTRACT

Single-cell ribonucleic acid sequencing (scRNA-seq) technology can be used to perform high-resolution analysis of the transcriptomes of individual cells. Therefore, its application has gained popularity for accurately analyzing the ever-increasing content of heterogeneous single-cell datasets. Central to interpreting scRNA-seq data is the clustering of cells to decipher transcriptomic diversity and infer cell behavior patterns. However, its complexity necessitates the application of advanced methodologies capable of resolving the inherent heterogeneity and limited gene expression characteristics of single-cell data. Herein, we introduce a novel deep learning-based algorithm for single-cell clustering, designated scDFN, which can significantly enhance the clustering of scRNA-seq data through a fusion network strategy. The scDFN algorithm applies a dual mechanism involving an autoencoder to extract attribute information and an improved graph autoencoder to capture topological nuances, integrated via a cross-network information fusion mechanism complemented by a triple self-supervision strategy. This fusion is optimized through a holistic consideration of four distinct loss functions. A comparative analysis with five leading scRNA-seq clustering methodologies across multiple datasets revealed the superiority of scDFN, as determined by higher Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) scores. Additionally, scDFN demonstrated robust multi-cluster dataset performance and exceptional resilience to batch effects. Ablation studies highlighted the key roles of the autoencoder and the improved graph autoencoder components, along with the critical contribution of the four joint loss functions to the overall efficacy of the algorithm. Through these advancements, scDFN sets a new benchmark in single-cell clustering and can be used as an effective tool for the nuanced analysis of single-cell transcriptomics.
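
For readers unfamiliar with the two clustering metrics named in this abstract, the following is a minimal pure-Python sketch of ARI and NMI computed from two label lists. It is not taken from scDFN. The function names are hypothetical, NMI is normalized by the arithmetic mean of the two entropies (one of several common conventions), and at least two cells are assumed.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """ARI from pair counts in the contingency table of two labelings."""
    n = len(truth)
    pairs = Counter(zip(truth, pred))
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def normalized_mutual_information(truth, pred):
    """NMI normalized by the arithmetic mean of the label entropies."""
    n = len(truth)
    pairs = Counter(zip(truth, pred))
    a, b = Counter(truth), Counter(pred)
    mi = sum(c / n * log(c * n / (a[t] * b[p])) for (t, p), c in pairs.items())
    h_a = -sum(c / n * log(c / n) for c in a.values())
    h_b = -sum(c / n * log(c / n) for c in b.values())
    if h_a == 0 and h_b == 0:  # both labelings are a single cluster
        return 1.0
    return mi / ((h_a + h_b) / 2)
```

Both scores ignore the names of the clusters, so a prediction that swaps label 0 and label 1 but groups the same cells together still scores 1.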


Subject(s)
Algorithms , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Cluster Analysis , Humans , Deep Learning , Sequence Analysis, RNA/methods , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Animals , Single-Cell Gene Expression Analysis
2.
Environ Int ; 192: 109046, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39378692

ABSTRACT

Pathogenic and antimicrobial-resistant (AMR) microorganisms are continually transmitted between human, animal, and environmental reservoirs, contributing to the high burden of infectious disease and driving the growing global AMR crisis. The sheer diversity of pathogens, AMR mechanisms, and transmission pathways connecting these reservoirs creates the need for comprehensive cross-sectoral surveillance to effectively monitor risks. Current approaches are often siloed by discipline and sector, focusing independently on parts of the whole. Here we advocate that integrated surveillance approaches, developed through transdisciplinary cross-sector collaboration, are key to addressing the dual crises of infectious diseases and AMR. We first review the areas of need, challenges, and benefits of cross-sectoral surveillance, then summarise and evaluate the major detection methods already available to achieve this (culture, quantitative PCR, and metagenomic sequencing). Finally, we outline how cross-sectoral surveillance initiatives can be fostered at multiple scales of action, and present key considerations for implementation and the development of effective systems to manage and integrate this information for the benefit of multiple sectors. While methods and technologies are increasingly available and affordable for comprehensive pathogen and AMR surveillance across different reservoirs, it is imperative that systems are strengthened to effectively manage and integrate this information.

3.
Article in English | MEDLINE | ID: mdl-39302773

ABSTRACT

Molecular property prediction is a key component of AI-driven drug discovery and molecular representation learning. Despite recent advances, existing methods still face challenges such as limited ability to generalize and inadequate representation learning from unlabeled data, especially for tasks specific to molecular structures. To address these limitations, we introduce DIG-Mol, a novel self-supervised graph neural network framework for molecular property prediction. This architecture leverages the power of contrastive learning with dual interaction mechanisms and unique molecular graph augmentation strategies. DIG-Mol integrates a momentum distillation network with two interconnected networks to efficiently improve molecular representation. The framework's ability to extract key information about molecular structure and higher-order semantics is supported by minimizing a contrastive loss. We have established DIG-Mol's state-of-the-art performance through extensive experimental evaluation in a variety of molecular property prediction tasks. In addition to demonstrating superior transferability in few-shot learning scenarios, our visualizations highlight DIG-Mol's enhanced interpretability and representation capabilities. These findings confirm the effectiveness of our approach in overcoming challenges faced by traditional methods and mark a significant advance in molecular property prediction. The code for this project is now available at https://github.com/ZeXingZ/DIG-Mol.
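
The abstract does not state the exact contrastive objective used by DIG-Mol. As a generic illustration only, the sketch below implements an InfoNCE-style loss over plain Python vectors: the loss is small when an anchor embedding is closer to its positive view than to the negatives. The function names and the temperature value are assumptions, not details from the paper or its repository.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (temperature is an assumed value)."""
    pos = exp(cosine(anchor, positive) / temperature)
    neg = sum(exp(cosine(anchor, n) / temperature) for n in negatives)
    return -log(pos / (pos + neg))


# Two augmented views of the same molecule should give a low loss ...
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# ... while a positive that looks like a negative gives a high loss.
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```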

4.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39258883

ABSTRACT

N6-methyladenosine (m6A) is a widely studied methylation of messenger RNAs, which has been linked to diverse cellular processes and human diseases. Numerous databases that collate m6A profiles of distinct cell types have been created to facilitate quick and easy mining of m6A signatures associated with cell-specific phenotypes. However, these databases contain inherent complexities that have not been explicitly reported, which may lead to inaccurate identification and interpretation of m6A-associated biology by end-users who are unaware of them. Here, we review various m6A-related databases, and highlight several critical matters. In particular, differences in peak-calling pipelines across databases drive substantial variability in both peak number and coordinates with only moderate reproducibility, and the inclusion of peak calls from early m6A sequencing protocols may lead to the reporting of false positives or negatives. The awareness of these matters will help end-users avoid the inclusion of potentially unreliable data in their studies and better utilize m6A databases to derive biologically meaningful results.


Subject(s)
Adenosine , Humans , Adenosine/analogs & derivatives , Adenosine/genetics , Adenosine/metabolism , Databases, Genetic , RNA, Messenger/genetics , RNA, Messenger/metabolism
5.
Article in English | MEDLINE | ID: mdl-39320992

ABSTRACT

Protein-metal ion interactions play a central role in the onset of numerous diseases. When amino acid changes lead to missense mutations in metal-binding sites, the disrupted interaction with metal ions can compromise protein function, potentially causing severe human ailments. Identifying these disease-associated mutation sites within metal-binding regions is paramount for understanding protein function and fostering innovative drug development. While some computational methods aim to tackle this challenge, they often fall short in accuracy, commonly due to manual feature extraction and the absence of structural data. We introduce MetalPrognosis, an innovative, alignment-free solution that predicts disease-associated mutations within metal-binding sites of metalloproteins with heightened precision. Rather than relying on manual feature extraction, MetalPrognosis employs sliding window sequences as input, extracting deep semantic insights from pre-trained protein language models. These insights are then incorporated into a convolutional neural network, facilitating the derivation of intricate features. Comparative evaluations show MetalPrognosis outperforms leading methodologies like MCCNN and M-Ionic across various metalloprotein test sets. Furthermore, an ablation study reiterates the effectiveness of our model architecture. To facilitate public use, we have made the datasets, source code, and trained models for MetalPrognosis available online at http://metalprognosis.unimelb-biotools.cloud.edu.au/.
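
The "sliding window sequences" input mentioned in this abstract can be pictured as a fixed-length stretch of residues centered on the site of interest. The sketch below is an illustration under stated assumptions, not MetalPrognosis code: the function name, the flank size, and the `X` padding character are all hypothetical choices.

```python
def sliding_window(sequence, site, flank=7, pad="X"):
    """Return the residues within `flank` positions of a 0-based `site`.

    Positions that fall beyond either terminus are filled with `pad`,
    so every window has length 2 * flank + 1.
    """
    padded = pad * flank + sequence + pad * flank
    center = site + flank  # index of the site in the padded string
    return padded[center - flank:center + flank + 1]


# A toy sequence; the window around the first residue is padded on the left.
window_start = sliding_window("MKTAYIAK", site=0, flank=2)
window_mid = sliding_window("MKTAYIAK", site=4, flank=2)
```

Padding keeps every window the same length, which is what a fixed-input model such as a convolutional network generally expects.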

6.
Nucleic Acids Res ; 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39271121

ABSTRACT

MicroRNAs (miRNAs) are short non-coding RNAs involved in various cellular processes, playing a crucial role in gene regulation. Identifying miRNA targets remains a central challenge and is pivotal for elucidating the complex gene regulatory networks. Traditional computational approaches have predominantly focused on identifying miRNA targets through perfect Watson-Crick base pairings within the seed region, referred to as canonical sites. However, emerging evidence suggests that perfect seed matches are not a prerequisite for miRNA-mediated regulation, underscoring the importance of also recognizing imperfect, or non-canonical, sites. To address this challenge, we propose Mimosa, a new computational approach that employs the Transformer framework to enhance the prediction of miRNA targets. Mimosa distinguishes itself by integrating contextual, positional and base-pairing information to capture in-depth attributes, thereby improving its predictive capabilities. Its unique ability to identify non-canonical base-pairing patterns makes Mimosa a standout model, reducing the reliance on pre-selecting candidate targets. Mimosa achieves superior performance in gene-level predictions and also shows impressive performance in site-level predictions across various non-human species through extensive benchmarking tests. To facilitate research efforts in miRNA targeting, we have developed an easy-to-use web server for comprehensive end-to-end predictions, which is publicly available at http://monash.bioweb.cloud.edu.au/Mimosa.
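
To make the notion of a canonical site concrete, the sketch below finds perfect Watson-Crick seed matches, the baseline that the abstract says traditional approaches rely on. It is not Mimosa code. The seed is taken as miRNA positions 2 to 8, which is one common convention and an assumption here, and the sequences are illustrative only.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=2, end=8):
    """Target-side sequence that pairs perfectly with the miRNA seed.

    `start` and `end` are 1-based seed positions (assumed to be 2 to 8).
    The result is the reverse complement of the seed, read 5' to 3'.
    """
    seed = mirna[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_canonical_sites(mirna, target):
    """Return 0-based start positions of perfect seed matches in `target`."""
    site = seed_site(mirna)
    width = len(site)
    return [i for i in range(len(target) - width + 1)
            if target[i:i + width] == site]


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative miRNA sequence
perfect = find_canonical_sites(mirna, "AAACUACCUCAAA")
mismatched = find_canonical_sites(mirna, "AAACUACGUCAAA")
```

A single mismatch inside the seed is enough for this scan to miss the site, which is the gap that non-canonical site prediction is meant to close.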

7.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39276327

ABSTRACT

Recent advancements in high-throughput sequencing technologies have significantly enhanced our ability to unravel the intricacies of gene regulatory processes. A critical challenge in this endeavor is the identification of variant effects, a key factor in comprehending the mechanisms underlying gene regulation. Non-coding variants, constituting over 90% of all variants, have garnered increasing attention in recent years. The exploration of gene variant impacts and regulatory mechanisms has spurred the development of various deep learning approaches, providing new insights into the global regulatory landscape through the analysis of extensive genetic data. Here, we provide a comprehensive overview of the development of non-coding variant models based on bulk and single-cell sequencing data and their model-based interpretation and downstream tasks. This review delineates the popular sequencing technologies for epigenetic profiling and deep learning approaches for discerning the effects of non-coding variants. Additionally, we summarize the limitations of current approaches in variant effect prediction research and outline opportunities for improvement. We anticipate that our study will offer a practical and useful guide for the bioinformatic community to further advance the unraveling of genetic variant effects.


Subject(s)
Deep Learning , Genetic Variation , Humans , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Epigenesis, Genetic
8.
Mar Drugs ; 22(8)2024 Jul 28.
Article in English | MEDLINE | ID: mdl-39195462

ABSTRACT

The direct enzymatic conversion of untreated waste shrimp and crab shells has been a key problem hindering the large-scale utilization of chitin biological resources. The microorganisms in soil samples were enriched in two stages with powdered chitin (CP) and shrimp shell powder (SSP) as substrates. The enrichment microbiota XHQ10 with SSP degradation ability was obtained. The activities of chitinase and lytic polysaccharide monooxygenase of XHQ10 were 1.46 and 54.62 U/mL, respectively. Metagenomic analysis showed that Chitinolyticbacter meiyuanensis, Chitiniphilus shinanonensis, and Chitinimonas koreensis, with excellent chitin degradation performance, were highly enriched in XHQ10. Chitin oligosaccharides (CHOSs) were produced by XHQ10 through enzyme induction and two-stage temperature control technology; the product contained CHOSs with a degree of polymerization (DP) greater than ten and had excellent antioxidant activity. This work is the first study on the direct enzymatic preparation of CHOSs from SSP using enrichment microbiota, which provides a new path for the large-scale utilization of chitin bioresources.


Subject(s)
Animal Shells , Chitin , Chitinases , Microbiota , Oligosaccharides , Chitin/chemistry , Animals , Oligosaccharides/chemistry , Chitinases/metabolism , Animal Shells/chemistry , Metagenomics/methods , Temperature , Polymerization , Bacteria
9.
Bioinformatics ; 40(8)2024 08 02.
Article in English | MEDLINE | ID: mdl-39133151

ABSTRACT

MOTIVATION: The asymmetrical distribution of expressed mRNAs tightly controls the precise synthesis of proteins within human cells. This non-uniform distribution, a cornerstone of developmental biology, plays a pivotal role in numerous cellular processes. To advance our comprehension of gene regulatory networks, it is essential to develop computational tools for accurately identifying the subcellular localizations of mRNAs. However, existing approaches give only limited consideration to multi-localization phenomena, and none considers the influence of RNA secondary structure. RESULTS: In this study, we propose Allocator, a multi-view parallel deep learning framework that seamlessly integrates the RNA sequence-level and structure-level information, enhancing the prediction of mRNA multi-localization. The Allocator models are equipped with four efficient feature extractors, each designed to handle different inputs. Two are tailored for sequence-based inputs, incorporating multilayer perceptron and multi-head self-attention mechanisms. The other two are specialized in processing structure-based inputs, employing graph neural networks. Benchmarking results underscore Allocator's superiority over state-of-the-art methods, showcasing its strength in revealing intricate localization associations. AVAILABILITY AND IMPLEMENTATION: The webserver of Allocator is available at http://Allocator.unimelb-biotools.cloud.edu.au; the source code and datasets are available on GitHub (https://github.com/lifuyi774/Allocator) and Zenodo (https://doi.org/10.5281/zenodo.13235798).


Subject(s)
Computational Biology , Neural Networks, Computer , RNA, Messenger , RNA, Messenger/metabolism , RNA, Messenger/genetics , Humans , Computational Biology/methods , Nucleic Acid Conformation , Deep Learning , Software
10.
Theranostics ; 14(10): 3945-3962, 2024.
Article in English | MEDLINE | ID: mdl-38994035

ABSTRACT

Rationale: NLRP3 inflammasome is critical in the development and progression of many metabolic diseases driven by chronic inflammation, but its effect on the pathology of postmenopausal osteoporosis (PMOP) remains poorly understood. Methods: We first examined the levels of NLRP3 inflammasome in PMOP patients by ELISA. Then we investigated the possible mechanisms underlying the effect of NLRP3 inflammasome on PMOP by RNA sequencing of osteoblasts treated with NLRP3 siRNA and qPCR. Lastly, we assessed the effect of decreased NLRP3 levels on ovariectomized (OVX) rats. To specifically deliver NLRP3 siRNA to osteoblasts, we constructed NLRP3 siRNA wrapping osteoblast-specific aptamer (CH6)-functionalized lipid nanoparticles (termed CH6-LNPs-siNLRP3). Results: We found that the levels of NLRP3 inflammasome were significantly increased in patients with PMOP, and were negatively correlated with estradiol levels. NLRP3 knock-down influenced signal pathways including immune system process and the interferon signal pathway. Notably, of the top ten up-regulated genes in NLRP3-reduced osteoblasts, nine genes (except Mx2) were enriched in immune system process, and five genes were related to the interferon signal pathway. The in vitro results showed that CH6-LNPs-siNLRP3 was relatively uniform with a diameter of 96.64 ± 16.83 nm and zeta potential of 38.37 ± 1.86 mV. CH6-LNPs-siNLRP3 did not show obvious cytotoxicity and selectively delivered siRNA to bone tissue. Moreover, CH6-LNPs-siNLRP3 stimulated osteoblast differentiation by activating ALP and enhancing osteoblast matrix mineralization. When administered to OVX rats, CH6-LNPs-siNLRP3 promoted bone formation and bone mass, improved bone microarchitecture and mechanical properties by decreasing the levels of NLRP3, IL-1β and IL-18 and increasing the levels of OCN and Runx2. Conclusion: NLRP3 inflammasome may be a new biomarker for PMOP diagnosis and plays a key role in the pathology of PMOP. CH6-LNPs-siNLRP3 has potential application for the treatment of PMOP.


Subject(s)
Inflammasomes , Liposomes , NLR Family, Pyrin Domain-Containing 3 Protein , Nanoparticles , Osteoblasts , Osteoporosis, Postmenopausal , Animals , NLR Family, Pyrin Domain-Containing 3 Protein/metabolism , Osteoblasts/drug effects , Osteoblasts/metabolism , Female , Humans , Rats , Inflammasomes/metabolism , Nanoparticles/chemistry , Osteoporosis, Postmenopausal/metabolism , Down-Regulation/drug effects , Rats, Sprague-Dawley , RNA, Small Interfering/administration & dosage , Aptamers, Nucleotide/pharmacology , Aptamers, Nucleotide/administration & dosage , Disease Models, Animal , Middle Aged , Ovariectomy
11.
IEEE J Biomed Health Inform ; 28(9): 5649-5657, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38865232

ABSTRACT

The Type III Secretion Systems (T3SSs) play a pivotal role in host-pathogen interactions by mediating the secretion of type III secretion system effectors (T3SEs) into host cells. These T3SEs mimic host cell protein functions, influencing interactions between Gram-negative bacterial pathogens and their hosts. Identifying T3SEs is essential in biomedical research for comprehending bacterial pathogenesis and its implications on human cells. This study presents EDIFIER, a novel multi-channel model designed for accurate T3SE prediction. It incorporates a graph structural channel, utilizing graph convolutional networks (GCN) to capture protein 3D structural features and a sequence channel based on the ProteinBERT pre-trained model to extract the sequence context features of T3SEs. Rigorous benchmarking tests, including ablation studies and comparative analysis, validate that EDIFIER outperforms current state-of-the-art tools in T3SE prediction. To enhance EDIFIER's accessibility to the broader scientific community, we developed a webserver that is publicly accessible at http://edifier.unimelb-biotools.cloud.edu.au/. We anticipate EDIFIER will contribute to the field by providing reliable T3SE predictions, thereby advancing our understanding of host-pathogen dynamics.


Subject(s)
Neural Networks, Computer , Type III Secretion Systems , Type III Secretion Systems/physiology , Computational Biology/methods , Humans , Bacterial Proteins/metabolism , Bacterial Proteins/chemistry
12.
Angew Chem Int Ed Engl ; 63(21): e202401189, 2024 05 21.
Article in English | MEDLINE | ID: mdl-38506220

ABSTRACT

This study introduces a novel approach for synthesizing Benzoxazine-centered Polychiral Polyheterocycles (BPCPHCs) via an innovative asymmetric carbene-alkyne metathesis-triggered cascade. Overcoming challenges associated with intricate stereochemistry and multiple chiral centers, the catalytic asymmetric Carbene Alkyne Metathesis-mediated Cascade (CAMC) is employed using dirhodium catalyst/Brønsted acid co-catalysis, ensuring precise stereocontrol as validated by X-ray crystallography. Systematic substrate scope evaluation establishes exceptional diastereo- and enantioselectivities, creating a unique library of BPCPHCs. Pharmacological exploration identifies twelve BPCPHCs as potent Nav ion channel blockers, notably compound 8g. In vivo studies demonstrate that intrathecal injection of 8g effectively reverses mechanical hyperalgesia associated with chemotherapy-induced peripheral neuropathy (CIPN), suggesting a promising therapeutic avenue. Electrophysiological investigations unveil the inhibitory effects of 8g on Nav1.7 currents. Molecular docking, dynamics simulations and surface plasmon resonance (SPR) assay provide insights into the stable complex formation and favorable binding free energy of 8g with C5aR1. This research represents a significant advancement in asymmetric CAMC for BPCPHCs and unveils BPCPHC 8g as a promising, uniquely acting pain blocker, establishing a C5aR1-Nav1.7 connection in the context of CIPN.


Subject(s)
Alkynes , Benzoxazines , Methane , Methane/analogs & derivatives , Methane/chemistry , Methane/pharmacology , Alkynes/chemistry , Benzoxazines/chemistry , Benzoxazines/pharmacology , Benzoxazines/chemical synthesis , Heterocyclic Compounds/chemistry , Heterocyclic Compounds/pharmacology , Heterocyclic Compounds/chemical synthesis , Humans , Stereoisomerism , Analgesics/chemistry , Analgesics/pharmacology , Analgesics/chemical synthesis , Molecular Structure , Catalysis , Drug Discovery , Animals
13.
Bioinform Adv ; 4(1): vbae035, 2024.
Article in English | MEDLINE | ID: mdl-38549946

ABSTRACT

Motivation: PE/PPE proteins, highly abundant in the Mycobacterium genome, play a vital role in virulence and immune modulation. Understanding their functions is key to comprehending the internal mechanisms of Mycobacterium. However, a lack of dedicated resources has limited research into PE/PPE proteins. Results: Addressing this gap, we introduce MycobactERIal PE/PPE proTeinS (MERITS), a comprehensive 3D structure database specifically designed for PE/PPE proteins. MERITS hosts 22 353 non-redundant PE/PPE proteins, encompassing details like physicochemical properties, subcellular localization, post-translational modification sites, protein functions, and measures of antigenicity, toxicity, and allergenicity. MERITS also includes data on their secondary and tertiary structure, along with other relevant biological information. MERITS is designed to be user-friendly, offering interactive search and data browsing features to aid researchers in exploring the potential functions of PE/PPE proteins. MERITS is expected to become a crucial resource in the field, aiding in developing new diagnostics and vaccines by elucidating the sequence-structure-functional relationships of PE/PPE proteins. Availability and implementation: MERITS is freely accessible at http://merits.unimelb-biotools.cloud.edu.au/.

14.
ACS Chem Neurosci ; 15(6): 1063-1073, 2024 03 20.
Article in English | MEDLINE | ID: mdl-38449097

ABSTRACT

Chronic pain is a growing global health problem affecting at least 10% of the world's population. However, current chronic pain treatments are inadequate. Voltage-gated sodium channels (Navs) play a pivotal role in regulating neuronal excitability and pain signal transmission and thus are main targets for nonopioid painkiller development, especially those preferentially expressed in dorsal root ganglion (DRG) neurons, such as Nav1.6, Nav1.7, and Nav1.8. In this study, we screened virtual hits from dihydrobenzofuran and 3-hydroxyoxindole hybrid molecules against Navs via a veratridine (VTD)-based calcium imaging method. The results showed that one of the molecules, 3g, could inhibit VTD-induced neuronal activity significantly. Voltage clamp recordings demonstrated that 3g inhibited the total Na+ currents of DRG neurons in a concentration-dependent manner. Biophysical analysis revealed that 3g slowed the activation while enhancing the inactivation of the Navs. Additionally, 3g use-dependently blocked Na+ currents. By combining selective Nav inhibitors with a heterologous expression system, we demonstrated that 3g preferentially inhibited the TTX-S Na+ currents, specifically the Nav1.7 current, rather than the TTX-R Na+ currents. Molecular docking experiments indicated that 3g binds to a known allosteric site at the voltage-sensing domain IV (VSDIV) of Nav1.7. Finally, intrathecal injection of 3g significantly relieved mechanical pain behavior in the spared nerve injury (SNI) rat model, suggesting that 3g is a promising candidate for treating chronic pain.


Subject(s)
Chronic Pain , Indoles , Neuralgia , Rats , Animals , Molecular Docking Simulation , NAV1.8 Voltage-Gated Sodium Channel , Neuralgia/drug therapy , Neuralgia/metabolism , Ganglia, Spinal/metabolism
15.
Environ Sci Technol ; 58(10): 4662-4669, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38422482

ABSTRACT

Since the mass production and extensive use of chloroquine (CLQ) would lead to its inevitable discharge, wastewater treatment plants (WWTPs) might play a key role in the management of CLQ. Despite the reported functional versatility of ammonia-oxidizing bacteria (AOB) that mediate the first step for biological nitrogen removal at WWTP (i.e., partial nitrification), their potential capability to degrade CLQ remains to be discovered. Therefore, with the enriched partial nitrification sludge, a series of dedicated batch tests were performed in this study to verify the performance and mechanisms of CLQ biodegradation under the ammonium conditions of mainstream wastewater. The results showed that AOB could degrade CLQ in the presence of ammonium oxidation activity, but the capability was limited by the amount of partial nitrification sludge (∼1.1 mg/L at a mixed liquor volatile suspended solids concentration of 200 mg/L). CLQ and its biodegradation products were found to have no significant effect on the ammonium oxidation activity of AOB while the latter would promote N2O production through the AOB denitrification pathway, especially at relatively low DO levels (≤0.5 mg-O2/L). This study provided valuable insights into a more comprehensive assessment of the fate of CLQ in the context of wastewater treatment.


Subject(s)
Ammonia , Ammonium Compounds , Ammonia/metabolism , Sewage/microbiology , Bacteria/metabolism , Bioreactors/microbiology , Oxidation-Reduction , Nitrous Oxide/analysis , Nitrification , Ammonium Compounds/metabolism
16.
J Chem Inf Model ; 64(4): 1407-1418, 2024 02 26.
Article in English | MEDLINE | ID: mdl-38334115

ABSTRACT

Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage is designed to fuse different feature embeddings through a transformer encoder. In the second stage, a support vector machine model is employed to quantify the pathogenicity of SAVs after dimensionality reduction. The prediction performance of TransEFVP on blind test data achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, higher than the existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can be explored as an accurate and effective SAV pathogenicity prediction method. The data and code for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
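
The Matthews correlation coefficient and F1-score reported in this abstract are both derived from a binary confusion matrix. The sketch below shows the standard formulas. The function names and the counts are made up for illustration and are not TransEFVP results.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC from confusion-matrix counts; 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for a pathogenic-versus-benign classifier.
mcc = matthews_corrcoef(tp=40, fp=10, tn=45, fn=5)
f1 = f1_score(tp=40, fp=10, fn=5)
```

Unlike F1, MCC uses all four cells of the confusion matrix, so it also reflects how well the negative class is predicted.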


Subject(s)
Algorithms , Proteins , Humans , Proteins/chemistry , Amino Acid Sequence , Neural Networks, Computer , Amino Acids
17.
Article in English | MEDLINE | ID: mdl-38190667

ABSTRACT

Origins of replication sites (ORIs) are crucial genomic regions where DNA replication initiation takes place, playing pivotal roles in fundamental biological processes like cell division, gene expression regulation, and DNA integrity. Accurate identification of ORIs is essential for comprehending cell replication, gene expression, and mutation-related diseases. However, experimental approaches for ORI identification are often expensive and time-consuming, leading to the growing popularity of computational methods. In this study, we present PLANNER (DeeP LeArNiNg prEdictor for ORI), a novel approach for species-specific and cell-specific prediction of eukaryotic ORIs. PLANNER uses the multi-scale ktuple sequences as input and employs the DNABERT pre-training model with transfer learning and ensemble learning strategies to train accurate predictive models. Extensive empirical test results demonstrate that PLANNER achieved superior predictive performance compared to state-of-the-art approaches, including iOri-Euk, Stack-ORI, and ORI-Deep, within specific cell types and across different cell types. Furthermore, by incorporating an interpretable analysis mechanism, we provide insights into the learned patterns, facilitating the mapping from discovering important sequential determinants to comprehensively analysing their biological functions. To facilitate the widespread utilisation of PLANNER, we developed an online webserver and local stand-alone software, available at http://planner.unimelb-biotools.cloud.edu.au/ and https://github.com/CongWang3/PLANNER, respectively.
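
The "multi-scale ktuple sequences" input described in this abstract can be read as overlapping k-mers taken at several values of k. The sketch below illustrates that tokenization only. The function names and the particular k values are assumptions and are not taken from the PLANNER repository.

```python
def ktuple_tokens(sequence, k):
    """Split a DNA sequence into overlapping k-tuples with stride 1."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def multi_scale_tokens(sequence, ks=(3, 4, 5, 6)):
    """Tokenize the same sequence at several scales (k values assumed)."""
    return {k: ktuple_tokens(sequence, k) for k in ks}


tokens = multi_scale_tokens("ATGCATGC")
```

Each scale yields `len(sequence) - k + 1` tokens, so longer k-tuples give fewer but more specific tokens.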

18.
Comput Biol Med ; 168: 107681, 2024 01.
Article in English | MEDLINE | ID: mdl-37992470

ABSTRACT

Multidrug-resistant Gram-negative bacteria have evolved into a worldwide threat to human health; over recent decades, polymyxins have re-emerged in clinical practice due to their high activity against multidrug-resistant bacteria. Nevertheless, the nephrotoxicity and neurotoxicity of polymyxins seriously hinder their practical use in the clinic. Based on the quantitative structure-activity relationship (QSAR), analogue design is an efficient strategy for discovering biologically active compounds with fewer adverse effects. To accelerate the polymyxin analogue discovery process and find polymyxin analogues with high antimicrobial activity against Gram-negative bacteria, here we developed PmxPred, a GCN and CatBoost-based machine learning framework. The RDKit descriptors were used for the molecule and residue representation, and the ensemble learning model was utilized for the antimicrobial activity prediction. This framework was trained and evaluated on multiple Gram-negative bacteria datasets, including Acinetobacter baumannii, Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and a general Gram-negative bacteria dataset, achieving an AUROC of 0.857, 0.880, 0.756, 0.895 and 0.865 on the independent test, respectively. PmxPred outperformed the transfer learning method that was trained on 10 million molecules. We interpreted our well-trained model by analysing the importance of global and residue features. Overall, PmxPred provides a powerful additional tool for predicting active polymyxin analogues, and holds the potential to elucidate the mechanisms underlying the antimicrobial activity of polymyxins. The source code is publicly available on GitHub (https://github.com/yanwu20/PmxPred).
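
The AUROC values quoted in this abstract can be interpreted as the probability that a randomly chosen active compound is scored above a randomly chosen inactive one. The sketch below computes AUROC directly from that definition. It is not PmxPred code, the labels and scores are invented, and at least one example of each class is assumed.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes at least one positive (1) and one negative (0) label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical activity labels (1 = active) and model scores.
example = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

This pairwise form is quadratic in the number of samples, which is fine for small test sets; sorting-based implementations are preferable for large ones.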


Subject(s)
Gram-Negative Bacterial Infections , Polymyxins , Humans , Polymyxins/pharmacology , Polymyxins/chemistry , Anti-Bacterial Agents/chemistry , Gram-Negative Bacterial Infections/drug therapy , Gram-Negative Bacterial Infections/microbiology , Gram-Negative Bacteria , Drug Resistance, Multiple, Bacterial , Escherichia coli , Microbial Sensitivity Tests
19.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37874948

ABSTRACT

Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.


Subject(s)
Machine Learning , Peptide Hydrolases , Peptide Hydrolases/metabolism , Substrate Specificity , Algorithms
20.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37291763

ABSTRACT

BACKGROUND: Promoters are DNA regions that initiate the transcription of specific genes near the transcription start sites. In bacteria, promoters are recognized by RNA polymerases and associated sigma factors. Effective promoter recognition is essential for synthesizing the gene-encoded products by bacteria to grow and adapt to different environmental conditions. A variety of machine learning-based predictors for bacterial promoters have been developed; however, most of them were designed specifically for a particular species. To date, only a few predictors are available for identifying general bacterial promoters with limited predictive performance. RESULTS: In this study, we developed TIMER, a Siamese neural network-based approach for identifying both general and species-specific bacterial promoters. Specifically, TIMER uses DNA sequences as the input and employs three Siamese neural networks with the attention layers to train and optimize the models for a total of 13 species-specific and general bacterial promoters. Extensive 10-fold cross-validation and independent tests demonstrated that TIMER achieves a competitive performance and outperforms several existing methods on both general and species-specific promoter prediction. As an implementation of the proposed method, the web server of TIMER is publicly accessible at http://web.unimelb-bioinfortools.cloud.edu.au/TIMER/.
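
The 10-fold cross-validation mentioned in this abstract partitions the data into ten folds and holds each fold out once for testing. The sketch below generates such index splits. It is a generic illustration rather than TIMER's evaluation code, the function name is hypothetical, and in practice the samples would usually be shuffled (and often stratified by class) first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are assigned to folds in an interleaved fashion, so fold
    sizes differ by at most one.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=5))
```

Every sample appears in exactly one test fold, so each prediction used for the reported score comes from a model that never saw that sample during training.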


Subject(s)
Bacteria , Neural Networks, Computer , Bacteria/genetics , Bacteria/metabolism , DNA-Directed RNA Polymerases/genetics , DNA-Directed RNA Polymerases/metabolism , Base Sequence , Promoter Regions, Genetic