Results 1 - 20 of 206
1.
IEEE Trans Cybern ; PP, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38843061

ABSTRACT

Stability maintenance in systems refers to the capacity to preserve inherent stability characteristics. In this article, stability maintenance of large Boolean networks (BNs) subjected to perturbations is investigated using a distributed pinning control (PC) strategy. Edge removal is introduced as a form of perturbation, and several criteria for achieving global stability are established. Two forms of distributed PC are introduced, one implemented before the perturbation occurs and the other after. Notably, the controller designs depend solely on in-neighbor information in the system. The proposed method substantially reduces the computational complexity from $O(2^{2|V|})$ to $O(|V| + |E| + \kappa \cdot 2^{K})$, where $|V|$ and $|E|$ denote the numbers of vertices and arcs of the adjacency graph of the BN, $\kappa$ is the number of pinning nodes, and $K$ is the maximum in-degree of the network. In the worst case, the computational complexity is bounded by $O(|V| + |E| + \kappa \cdot 2^{|V|})$. To validate the effectiveness of the proposed methods, results on multiple gene networks are presented, including a model of the human rheumatoid arthritis synovial fibroblast in which only 12 of the 359 nodes are deemed essential.
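
To make the complexity contrast concrete, the sketch below checks global stability of a tiny synchronous BN the naive way, by iterating every one of the $2^{n}$ states. It is not the distributed pinning-control method of the article; the helper names (`step`, `is_globally_stable`) are hypothetical, and modelling edge removal by fixing the removed input to 1 is an assumption made only for illustration.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]
UpdateFn = Callable[[State], int]


def step(state: State, fns: Sequence[UpdateFn]) -> State:
    """One synchronous update: every node reads the same current state."""
    return tuple(f(state) for f in fns)


def is_globally_stable(fns: Sequence[UpdateFn], target: State) -> bool:
    """Brute-force check that every initial state converges to `target`.

    Enumerates all 2^n initial states and iterates each one 2^n times, so the
    cost grows exponentially in n; in-neighbor-based designs aim to avoid this.
    """
    n = len(fns)
    if step(target, fns) != target:
        return False  # target is not a fixed point, so it cannot attract everything
    for init in product((0, 1), repeat=n):
        s: State = init
        for _ in range(2 ** n):  # long enough to enter whatever attractor s reaches
            s = step(s, fns)
        if s != target:
            return False  # this trajectory ends in a different attractor
    return True


# Toy 3-node BN: x1' = 0, x2' = x1, x3' = x2 AND x3
bn = [lambda s: 0, lambda s: s[0], lambda s: s[1] & s[2]]
print(is_globally_stable(bn, (0, 0, 0)))  # True

# Hypothetical removal of edge x2 -> x3, modelled here by fixing that input to 1
perturbed = [lambda s: 0, lambda s: s[0], lambda s: s[2]]
print(is_globally_stable(perturbed, (0, 0, 0)))  # False: (0, 0, 1) is now a fixed point
```

The toy example shows why perturbations matter: removing one arc creates a second fixed point, so the original stability is lost and some control must restore it.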

2.
Article in English | MEDLINE | ID: mdl-38767997

ABSTRACT

A novel framework for designing the molecular structure of chemical compounds with a desired chemical property has recently been proposed. The framework infers a desired chemical graph by solving a mixed integer linear program (MILP) that simulates the computation process of two functions: a feature function defined by a two-layered model on chemical graphs and a prediction function constructed by a machine learning method. To improve the learning performance of prediction functions in the framework, we design a method that splits a given data set $C$ into two subsets $C^{(i)}, i = 1, 2$, by a hyperplane in a chemical space so that most compounds in the first (resp., second) subset have observed values lower (resp., higher) than a threshold $\theta$. We construct a prediction function $\psi$ for the data set $C$ by combining prediction functions $\psi_i, i = 1, 2$, each of which is constructed independently on $C^{(i)}$. The results of our computational experiments suggest that the proposed method improved the learning performance for several chemical properties for which a good prediction function has been difficult to construct.
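
The split-then-combine idea can be sketched as follows, under loud assumptions: the hyperplane is learned here with a plain perceptron on the labels "above / below $\theta$", each $\psi_i$ is an ordinary least-squares linear model, and $\psi$ routes a new sample to $\psi_1$ or $\psi_2$ by the side of the hyperplane it falls on. None of these choices is taken from the article, and the function name `fit_split_predictor` is hypothetical.

```python
import numpy as np


def fit_split_predictor(X: np.ndarray, y: np.ndarray, theta: float, epochs: int = 100):
    """Sketch: hyperplane split at threshold theta, then one linear model per side."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    labels = np.where(y > theta, 1.0, -1.0)     # +1 means "observed value above theta"
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):                      # perceptron learns the hyperplane w
        for xi, li in zip(Xb, labels):
            if li * (xi @ w) <= 0:
                w += li * xi
    high = Xb @ w > 0                            # True -> C^(2), False -> C^(1)
    # Least-squares linear model psi_i fitted independently on each subset
    coefs = [np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0] for mask in (~high, high)]

    def psi(X_new: np.ndarray) -> np.ndarray:
        Xn = np.hstack([X_new, np.ones((len(X_new), 1))])
        side = Xn @ w > 0                        # route each sample by the same hyperplane
        return np.where(side, Xn @ coefs[1], Xn @ coefs[0])

    return psi


X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([-3.0, -2.0, -1.0, 12.0, 14.0, 16.0])   # two different linear regimes
psi = fit_split_predictor(X, y, theta=5.0)
print(psi(np.array([[-2.5], [2.5]])))  # approximately -2.5 and 15.0
```

A single linear model cannot fit both regimes of this toy data, while the routed pair fits each exactly, which is the intuition behind splitting the data set before learning.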

3.
BMC Bioinformatics ; 25(1): 13, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38195423

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as gene expression regulators. These miRNAs are typically 20 to 25 nucleotides long. The maturation of miRNAs requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Machine learning-based approaches for cleavage site prediction, such as PHDcleav and LBSizeCleav, have recently been reported. ReCGBM, a gradient boosting-based model, outperforms existing methods. Nonetheless, ReCGBM operates solely as a binary classifier, even though a typical pre-miRNA contains two cleavage sites. Previous approaches have used only a fraction of the structural information in pre-miRNAs, often overlooking comprehensive secondary structure information. A new model is therefore needed to address these limitations. RESULTS: In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. The model was enhanced by an autoencoder that learned secondary structure embeddings of pre-miRNAs. Benchmarking experiments showed that our model performed comparably to ReCGBM in binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM. CONCLUSIONS: Our proposed model exhibited superior performance compared with the current state-of-the-art model, underscoring the effectiveness of a deep learning approach for predicting Dicer cleavage sites. Furthermore, our model can be trained using only sequence and secondary structure information. Its capacity to handle multi-class classification tasks enhances its practical utility.


Subject(s)
Deep Learning , MicroRNAs , Humans , Benchmarking , Machine Learning , Nucleotides
4.
NPJ Syst Biol Appl ; 10(1): 9, 2024 Jan 20.
Article in English | MEDLINE | ID: mdl-38245555

ABSTRACT

Recent controllability analyses have demonstrated that driver nodes tend to be associated with genes related to important biological functions as well as human diseases. While researchers have focused on identifying critical nodes, intermittent nodes have received much less attention. Here, we propose a new efficient algorithm based on the Hamming distance for computing the importance of intermittent nodes using a Minimum Dominating Set (MDS)-based control model. We refer to this metric as criticality. Applying the proposed algorithm to compute criticality under the MDS control framework allows us to unveil the biological importance and roles of intermittent nodes in different network systems, from the cellular level (e.g., signaling pathways) and cell-cell interactions (e.g., cytokine networks) to the complete nervous system of the nematode worm C. elegans. Taken together, the developed computational tools may open new avenues for investigating the role of intermittent nodes in many biological systems of interest in the context of network control.
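
For readers new to MDS-based control, the sketch below enumerates every minimum dominating set of a small undirected graph by brute force and labels each node critical (in every MDS), intermittent (in some) or redundant (in none). It only illustrates what "intermittent" means; it does not implement the article's Hamming-distance criticality algorithm, and treating the network as undirected is an assumption. Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Node = Hashable


def minimum_dominating_sets(nodes: Sequence[Node],
                            edges: Sequence[Tuple[Node, Node]]) -> List[Set[Node]]:
    """Enumerate every minimum dominating set by brute force (small graphs only)."""
    closed = {v: {v} for v in nodes}          # closed neighborhood N[v]
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    everything = set(nodes)
    for k in range(1, len(nodes) + 1):        # smallest k first, so the first hit is minimum
        found = [set(c) for c in combinations(nodes, k)
                 if set().union(*(closed[v] for v in c)) == everything]
        if found:
            return found
    return []


def classify_nodes(nodes: Sequence[Node], edges) -> Dict[Node, str]:
    """critical: in every MDS; intermittent: in some; redundant: in none."""
    mds = minimum_dominating_sets(nodes, edges)
    labels = {}
    for v in nodes:
        hits = sum(v in s for s in mds)
        if hits == len(mds):
            labels[v] = "critical"
        elif hits > 0:
            labels[v] = "intermittent"
        else:
            labels[v] = "redundant"
    return labels


path = ["a", "b", "c", "d", "e"]
path_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(classify_nodes(path, path_edges))
# {'a': 'intermittent', 'b': 'intermittent', 'c': 'redundant', 'd': 'intermittent', 'e': 'intermittent'}
```

The path has three minimum dominating sets ({a, d}, {b, d}, {b, e}), so four nodes are intermittent; exhaustive enumeration like this is exponential, which is why efficient algorithms for large networks are needed.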


Subject(s)
Caenorhabditis elegans , Computational Biology , Animals , Humans , Caenorhabditis elegans/genetics , Algorithms , Signal Transduction/genetics
5.
Article in English | MEDLINE | ID: mdl-38145512

ABSTRACT

In this brief paper, we study the size and width of autoencoders consisting of Boolean threshold functions, where an autoencoder is a layered neural network whose structure can be viewed as consisting of an encoder, which compresses an input vector to a lower-dimensional vector, and a decoder, which transforms the low-dimensional vector back to the original input vector exactly (or approximately). We focus on the decoder part and show that [Formula: see text] and $O(\sqrt{Dn})$ nodes are required to transform $n$ vectors in $d$-dimensional binary space to $D$-dimensional binary space. We also show that the width can be reduced if small errors are allowed, where the error is defined as the average Hamming distance between each vector input to the encoder part and the resulting vector output by the decoder.
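
The error measure in the last sentence can be written out directly: sum the Hamming distances between each input vector and its reconstruction, then divide by the number of vectors. The sketch below is a minimal illustration; the decoder outputs are made-up values, not results from the paper.

```python
from typing import Sequence


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def average_reconstruction_error(inputs, outputs) -> float:
    """Mean Hamming distance between each encoder input and its decoder output."""
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("need equally many, and at least one, input/output vectors")
    return sum(hamming(x, y) for x, y in zip(inputs, outputs)) / len(inputs)


xs = [(0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 0)]
ys = [(0, 1, 1, 0), (1, 0, 1, 1), (0, 1, 0, 1)]   # hypothetical decoder outputs
print(average_reconstruction_error(xs, ys))        # (0 + 1 + 2) / 3 = 1.0
```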

6.
Bioinform Adv ; 3(1): vbad155, 2023.
Article in English | MEDLINE | ID: mdl-37928345

ABSTRACT

Motivation: Extended connectivity interaction features (ECIF) is a method developed to predict protein-ligand binding affinity that allows for a detailed atomic representation. It performed very well in terms of scoring power on the Comparative Assessment of Scoring Functions 2016 (CASF-2016) benchmark. However, ECIF cannot adequately account for interatomic distances. Results: To investigate which kind of distance representation is effective for protein-ligand (P-L) binding affinity prediction, we developed two algorithms that improve ECIF's feature extraction method to take distance into account. One is multi-shelled ECIF, which accounts for interatomic distances by dividing the distance range into multiple shells. The other is weighted ECIF, which weights the importance of interactions according to the distance between atoms. A comparison of these two methods shows that multi-shelled ECIF outperforms both weighted ECIF and the original ECIF, achieving a Pearson correlation coefficient of 0.877 for CASF-2016 scoring power. Availability and implementation: All the codes and data are available on GitHub (https://github.com/koji11235/MSECIFv2).
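
The difference between the two feature types can be sketched as pair counting. In this illustration, simple element symbols stand in for ECIF's more detailed atom types, the shell radii are arbitrary, and the $1/d$ weighting in `weighted_counts` is an assumed example weight; the article's actual shells and weighting function may differ. Both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (atom type, xyz in angstroms)


def multi_shell_counts(protein: Sequence[Atom], ligand: Sequence[Atom],
                       shells: Sequence[float]) -> Counter:
    """Count protein-ligand atom-type pairs per distance shell.

    `shells` are increasing outer radii, e.g. [4.0, 6.0, 8.0] gives the shells
    [0, 4), [4, 6) and [6, 8); pairs beyond the last radius are ignored.
    """
    counts: Counter = Counter()
    for p_type, p_xyz in protein:
        for l_type, l_xyz in ligand:
            d = math.dist(p_xyz, l_xyz)
            for k, outer in enumerate(shells):
                if d < outer:                      # first shell whose outer radius exceeds d
                    counts[(p_type, l_type, k)] += 1
                    break
    return counts


def weighted_counts(protein: Sequence[Atom], ligand: Sequence[Atom],
                    cutoff: float = 6.0) -> Counter:
    """Single cutoff, each pair weighted by 1/d (an assumed weighting function)."""
    weights: Counter = Counter()
    for p_type, p_xyz in protein:
        for l_type, l_xyz in ligand:
            d = math.dist(p_xyz, l_xyz)
            if 0.0 < d < cutoff:
                weights[(p_type, l_type)] += 1.0 / d
    return weights


protein = [("C", (0.0, 0.0, 0.0)), ("N", (0.0, 0.0, 5.0))]
ligand = [("O", (0.0, 0.0, 3.0))]
print(multi_shell_counts(protein, ligand, [2.5, 5.0]))  # C-O (3 A) in shell 1, N-O (2 A) in shell 0
print(weighted_counts(protein, ligand))                 # C-O: 1/3, N-O: 1/2
```

The shell version keeps distance as separate count channels per atom-type pair, whereas the weighted version folds distance into a single number per pair; this is one way to see why the two representations can behave differently as model inputs.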

7.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37950905

ABSTRACT

Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recent years, the growing availability of high-throughput molecular data and advances in deep learning technologies have enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because labelled data are limited, pinpointing CGs among a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using a graph neural network (GNN)-based autoencoder, which explores node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, and a task-specific layer makes the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetworks) across eight PPI networks. Benchmarking experiments and performance comparisons with existing state-of-the-art methods demonstrate the superiority of SMG for multi-omic feature engineering.
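
The masking step of the pretext task can be sketched on a plain node-feature table: pick a random subset of nodes, overwrite their feature vectors with a shared mask token, and score reconstruction only on those nodes. This is a hedged illustration; the GNN autoencoder is omitted, a fixed zero vector stands in for what is usually a learned mask token, and the names `mask_nodes` and `masked_mse` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_nodes(features: Sequence[Sequence[float]], mask_rate: float,
               mask_token: Sequence[float],
               seed: Optional[int] = None) -> Tuple[List[List[float]], List[int]]:
    """Replace a random subset of node feature vectors with a shared mask token."""
    rng = random.Random(seed)
    n = len(features)
    k = min(n, max(1, round(mask_rate * n)))     # mask at least one node when possible
    masked = sorted(rng.sample(range(n), k))
    masked_set = set(masked)
    corrupted = [list(mask_token) if i in masked_set else list(f)
                 for i, f in enumerate(features)]
    return corrupted, masked


def masked_mse(original, reconstructed, masked: Sequence[int]) -> float:
    """Reconstruction loss computed only on the masked nodes."""
    errs = [(a - b) ** 2 for i in masked for a, b in zip(original[i], reconstructed[i])]
    return sum(errs) / len(errs)


feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]   # hypothetical multi-omic features
corrupted, idx = mask_nodes(feats, mask_rate=0.5, mask_token=[0.0, 0.0], seed=0)
print(idx, corrupted)
```

Restricting the loss to masked nodes forces the model to predict hidden features from neighboring nodes rather than copying its input.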


Subject(s)
Neoplasms , Oncogenes , Mutation , Benchmarking , Genes, Essential , Genomics , Neoplasms/genetics
8.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37874948

ABSTRACT

Proteases contribute to a broad spectrum of cellular functions. Given the relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites have been developed, these efforts are outpaced by the growth of protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for each new protease type; instead, it may be better to provide a computational platform that helps users quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with little or no programming or bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that, on average, exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.


Subject(s)
Machine Learning , Peptide Hydrolases , Peptide Hydrolases/metabolism , Substrate Specificity , Algorithms
9.
eNeuro ; 10(10)2023 10.
Article in English | MEDLINE | ID: mdl-37903612

ABSTRACT

The brain is an organ that functions as a network of many elements connected in a nonuniform manner. In the brain, the neocortex is the evolutionarily newest structure and is thought to be primarily responsible for the high intelligence of mammals. In the mature mammalian brain, all cortical regions are expected to share some degree of homology but to vary in their local circuits to achieve the specific functions performed by individual regions. However, few cellular-level studies have examined how the networks within different cortical regions differ. This study aimed to find rules for systematic changes in connectivity (microconnectomes) across 16 different cortical region groups. We also observed previously unknown trends in basic in vitro parameters, such as firing rate and layer thickness, across brain regions. Results revealed that the frontal group shows unique characteristics such as dense active neurons, a thick cortex, and strong connections with deeper layers. This suggests that the frontal side of the cortex is inherently capable of driving activity, even in isolation, and that frontal nodes provide the driving force generating a global pattern of spontaneous synchronous activity, such as the default mode network. This finding provides a new hypothesis explaining why disruption in the frontal region has a large impact on mental health.


Subject(s)
Neocortex , Neurons , Animals , Neurons/physiology , Frontal Lobe/physiology , Head , Nerve Net/diagnostic imaging , Nerve Net/physiology , Magnetic Resonance Imaging , Mammals
10.
ACS Omega ; 8(26): 23925-23935, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37426216

ABSTRACT

We have developed an innovative system, AI QM Docking Net (AQDnet), which utilizes the three-dimensional structure of protein-ligand complexes to predict binding affinity. This system is novel in two respects: first, it significantly expands the training dataset by generating thousands of diverse ligand configurations for each protein-ligand complex and subsequently determining the binding energy of each configuration through quantum computation. Second, we have devised a method that incorporates the atom-centered symmetry function (ACSF), highly effective in describing molecular energies, for the prediction of protein-ligand interactions. These advancements have enabled us to effectively train a neural network to learn the protein-ligand quantum energy landscape (P-L QEL). Consequently, we have achieved a 92.6% top 1 success rate in the CASF-2016 docking power, placing first among all models assessed in the CASF-2016, thus demonstrating the exceptional docking performance of our model.
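
As background on the descriptor, the standard Behler-Parrinello radial symmetry function sums Gaussian-weighted neighbor distances under a smooth cosine cutoff. The sketch below computes one such radial term for one atom. Which ACSF terms, parameters ($\eta$, $R_s$, $R_c$) and element channels AQDnet actually uses is not stated in the abstract, so the values here are purely illustrative and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cosine_cutoff(r: float, rc: float) -> float:
    """f_c(r) = 0.5 * (cos(pi * r / rc) + 1) inside the cutoff, 0 outside."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r <= rc else 0.0


def radial_g2(center: Vec3, neighbours: Sequence[Vec3],
              eta: float, rs: float, rc: float) -> float:
    """G2 = sum_j exp(-eta * (r_ij - rs)^2) * f_c(r_ij) over neighbouring atoms j."""
    total = 0.0
    for xyz in neighbours:
        r = math.dist(center, xyz)
        total += math.exp(-eta * (r - rs) ** 2) * cosine_cutoff(r, rc)
    return total


# One neighbour at 1 A and one beyond the 2 A cutoff; eta = 0 switches off the Gaussian.
print(radial_g2((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0)], eta=0.0, rs=0.0, rc=2.0))
# approximately 0.5
```

Because the cutoff function decays smoothly to zero at $R_c$, the descriptor changes continuously as atoms move, which suits learning an energy landscape over many generated ligand poses.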

11.
BMC Bioinformatics ; 24(1): 252, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37322439

ABSTRACT

BACKGROUND: The bioinformatic capability to analyze the spatio-temporal dynamics of gene expression is essential for understanding animal development. Animal cells are spatially organized into functional tissues, where cellular gene expression data contain information that governs morphogenesis during the developmental process. Although several computational tissue reconstruction methods using transcriptomics data have been proposed, those methods have been ineffective at arranging cells in their correct positions in tissues or organs unless spatial information is explicitly provided. RESULTS: This study demonstrates that stochastic self-organizing map clustering, combined with Markov chain Monte Carlo calculations for optimizing informative genes, effectively reconstructs any spatio-temporal topology of cells from their transcriptome profiles with only a coarse topological guideline. The method, eSPRESSO (enhanced SPatial REconstruction by Stochastic Self-Organizing Map), provides a powerful in silico spatio-temporal tissue reconstruction capability, as confirmed using the human embryonic heart and the mouse embryo, brain, embryonic heart, and liver lobule, with generally high reproducibility (average maximum accuracy = 92.0%), while revealing topologically informative genes, or spatial discriminator genes. Furthermore, eSPRESSO was used for temporal analysis of human pancreatic organoids to infer rational developmental trajectories, with several candidate 'temporal' discriminator genes responsible for various cell type differentiations. CONCLUSIONS: eSPRESSO provides a novel strategy for analyzing the mechanisms underlying the spatio-temporal formation of cellular organizations.


Subject(s)
Gene Expression Profiling , Transcriptome , Humans , Animals , Mice , Reproducibility of Results , Brain , Cluster Analysis , Spatio-Temporal Analysis
12.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37369638

ABSTRACT

Antimicrobial peptides (AMPs) are short peptides that play crucial roles in diverse biological processes and have various functional activities against target organisms. Because of the misuse of chemical antibiotics and the increasing resistance of microbial pathogens to antibiotics, AMPs have the potential to serve as alternatives to antibiotics. As such, the identification of AMPs has become a widely discussed topic. A variety of computational approaches based on machine learning algorithms have been developed to identify AMPs. However, most of them cannot predict the functional activities of AMPs, and those predictors that can specify activities focus on only a few of them. In this study, we first surveyed 10 predictors that can identify AMPs and their functional activities in terms of the features they employ and the algorithms they use. Then, we constructed comprehensive AMP datasets and proposed a new deep learning-based framework, iAMPCN (identification of AMPs based on CNNs), to identify AMPs and 22 related functional activities. Our experiments demonstrate that iAMPCN significantly improved the prediction performance for AMPs and their corresponding functional activities based on four types of sequence features. Benchmarking experiments on the independent test datasets showed that iAMPCN outperformed a number of state-of-the-art approaches for predicting AMPs and their functional activities. Furthermore, we analyzed the amino acid preferences of different AMP activities and evaluated the model on datasets with varying sequence redundancy thresholds. To facilitate the community-wide identification of AMPs and their corresponding functional types, we have made the source code of iAMPCN publicly available at https://github.com/joy50706/iAMPCN/tree/master. We anticipate that iAMPCN can be used as a valuable tool for identifying potential AMPs with specific functional activities for further experimental validation.


Subject(s)
Antimicrobial Cationic Peptides , Deep Learning , Antimicrobial Cationic Peptides/pharmacology , Antimicrobial Peptides , Anti-Bacterial Agents , Algorithms
13.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2862-2873, 2023.
Article in English | MEDLINE | ID: mdl-37079419

ABSTRACT

Analyzing multiple networks is important to understand relevant features among different networks. Although many studies have been conducted for that purpose, not much attention has been paid to the analysis of attractors (i.e., steady states) in multiple networks. Therefore, we study common attractors and similar attractors in multiple networks to uncover hidden similarities and differences among networks using Boolean networks (BNs), where BNs have been used as a mathematical model of genetic networks and neural networks. We define three problems on detecting common attractors and similar attractors, and theoretically analyze the expected number of such objects for random BNs, where we assume that given networks have the same set of nodes (i.e., genes). We also present four methods for solving these problems. Computational experiments on randomly generated BNs are performed to demonstrate the efficiency of our proposed methods. In addition, experiments on a practical biological system, a BN model of the TGF-β signaling pathway, are performed. The result suggests that common attractors and similar attractors are useful for exploring tumor heterogeneity and homogeneity in eight cancers.
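
A brute-force version of the idea, restricted to singleton attractors (fixed points), is sketched below: enumerate the fixed points of two BNs over the same node set, intersect them to get common attractors, and pair up distinct fixed points within a small Hamming distance as "similar". The exact problem definitions and the four methods in the article may differ; cyclic attractors are ignored here, and `max_dist` and the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Set, Tuple

State = Tuple[int, ...]
BN = Sequence[Callable[[State], int]]


def singleton_attractors(bn: BN) -> Set[State]:
    """All fixed points x = f(x) of a synchronous BN, found by enumerating 2^n states."""
    return {s for s in product((0, 1), repeat=len(bn))
            if tuple(f(s) for f in bn) == s}


def common_and_similar(bn_a: BN, bn_b: BN, max_dist: int = 1):
    """Fixed points shared by both BNs, and distinct pairs within `max_dist` bits."""
    fa, fb = singleton_attractors(bn_a), singleton_attractors(bn_b)
    common = fa & fb
    similar = {(x, y) for x in fa for y in fb
               if 0 < sum(p != q for p, q in zip(x, y)) <= max_dist}
    return common, similar


bn_a = [lambda s: s[1], lambda s: s[0]]   # fixed points (0, 0) and (1, 1)
bn_b = [lambda s: s[0], lambda s: 0]      # fixed points (0, 0) and (1, 0)
common, similar = common_and_similar(bn_a, bn_b)
print(common)           # {(0, 0)}
print(sorted(similar))  # [((0, 0), (1, 0)), ((1, 1), (1, 0))]
```

The $2^{n}$ enumeration only works for toy networks, which is why dedicated methods and an expected-count analysis for random BNs are needed at realistic sizes.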


Subject(s)
Models, Genetic , Neoplasms , Humans , Algorithms , Gene Regulatory Networks/genetics , Neoplasms/genetics , Neural Networks, Computer
14.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36880172

ABSTRACT

Lysine 2-hydroxyisobutylation (Khib), which was first reported in 2014, has been shown to play vital roles in a myriad of biological processes including gene transcription, regulation of chromatin functions, purine metabolism, pentose phosphate pathway and glycolysis/gluconeogenesis. Identification of Khib sites in protein substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein 2-hydroxyisobutylation. Experimental identification of Khib sites mainly depends on the combination of liquid chromatography and mass spectrometry. However, experimental approaches for identifying Khib sites are often time-consuming and expensive compared with computational approaches. Previous studies have shown that Khib sites may have distinct characteristics for different cell types of the same species. Several tools have been developed to identify Khib sites, which exhibit high diversity in their algorithms, encoding schemes and feature selection techniques. However, to date, there are no tools designed for predicting cell type-specific Khib sites. Therefore, it is highly desirable to develop an effective predictor for cell type-specific Khib site prediction. Inspired by the residual connection of ResNet, we develop a deep learning-based approach, termed ResNetKhib, which leverages both the one-dimensional convolution and transfer learning to enable and improve the prediction of cell type-specific 2-hydroxyisobutylation sites. ResNetKhib is capable of predicting Khib sites for four human cell types, mouse liver cell and three rice cell types. Its performance is benchmarked against the commonly used random forest (RF) predictor on both 10-fold cross-validation and independent tests. 
The results show that ResNetKhib achieves area under the receiver operating characteristic curve (AUC) values ranging from 0.807 to 0.901, depending on the cell type and species, outperforming RF-based predictors and other currently available Khib site prediction tools. We also implement an online web server for the proposed ResNetKhib algorithm, together with all the curated datasets and trained models, for the wider research community to use; it is publicly accessible at https://resnetkhib.erc.monash.edu/.


Subject(s)
Lysine , Protein Processing, Post-Translational , Animals , Mice , Humans , Lysine/metabolism , Proteins/metabolism , Algorithms , Machine Learning
15.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36794913

ABSTRACT

MOTIVATION: The rapid accumulation of high-throughput sequence data demands the development of effective and efficient data-driven computational methods to functionally annotate proteins. However, most current approaches used for functional annotation simply focus on the use of protein-level information but ignore inter-relationships among annotations. RESULTS: Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms, updates its embedding accordingly and uses a cross-attention operation to project protein representations and GO embedding into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with 'state-of-the-art' methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weightings. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and functional domains within proteins. AVAILABILITY AND IMPLEMENTATION: PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
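
The self-attention and cross-attention operations mentioned above are both instances of scaled dot-product attention. The sketch below shows one plausible arrangement under stated assumptions: GO-term embeddings first attend to each other, then act as queries over per-residue protein embeddings, and the resulting weights indicate which residues each term focuses on. Learned projection matrices, multiple heads and PFresGO's actual layer layout are omitted; all embeddings are random placeholders.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Scaled dot-product attention without learned projections.

    queries: (n_q, d); keys and values: (n_k, d).
    Returns the attended output (n_q, d) and the weights (n_q, n_k).
    """
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
go_terms = rng.normal(size=(3, 8))     # hypothetical GO-term embeddings
residues = rng.normal(size=(20, 8))    # hypothetical per-residue protein embeddings

go_updated, _ = attention(go_terms, go_terms, go_terms)   # self-attention among GO terms
fused, w = attention(go_updated, residues, residues)      # cross-attention: GO terms attend to residues
print(fused.shape, w.shape)                                # (3, 8) (3, 20)
print(w[0].argsort()[::-1][:5])   # residues with the largest weights for the first GO term
```

Reading residue importance off the rows of `w` mirrors, in simplified form, the idea of identifying functional residues from the distribution of attention weights.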


Subject(s)
Deep Learning , Molecular Sequence Annotation , Gene Ontology , Computational Biology/methods , Algorithms , Proteins/metabolism
16.
Methods Mol Biol ; 2586: 79-88, 2023.
Article in English | MEDLINE | ID: mdl-36705899

ABSTRACT

RNA secondary structure comparison is an important analysis for elucidating the individual functions of RNAs, since it is widely accepted that their functions and structures are strongly correlated. However, although RNA secondary structures with pseudoknots play important roles in vivo, such structures are difficult to handle in silico because of their structural complexity, which is a major obstacle to the analysis of RNA functions. Here, we introduce an algorithm and a metric for comparing pseudoknotted RNA secondary structures based on topological centroid identification and tree edit distance, and we describe the usage protocol of software that runs the comparison. This software is publicly available and works on both Microsoft Windows and Apple macOS.


Subject(s)
Algorithms , RNA , RNA/genetics , RNA/chemistry , Nucleic Acid Conformation , Software , Sequence Analysis, RNA/methods
17.
IEEE Trans Neural Netw Learn Syst ; 34(2): 921-931, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34428155

ABSTRACT

An autoencoder is a layered neural network whose structure can be viewed as consisting of an encoder, which compresses an input vector to a lower-dimensional vector, and a decoder, which transforms the low-dimensional vector back to the original input vector (or one that is very similar). In this article, we explore the compressive power of autoencoders that are Boolean threshold networks by studying the numbers of nodes and layers required to ensure that each vector in a given set of distinct binary input vectors is transformed back to its original. We show that for any set of $n$ distinct vectors there exists a seven-layer autoencoder with the optimal compression ratio (i.e., the size of the middle layer is logarithmic in $n$), but that there is a set of $n$ vectors for which there is no three-layer autoencoder with a middle layer of logarithmic size. In addition, we present a kind of tradeoff: if the compression ratio is allowed to be considerably larger than the optimal, then there is a five-layer autoencoder. We also study the numbers of nodes and layers required only for encoding, and the results suggest that the decoding part is the bottleneck of autoencoding. For example, there always is a three-layer Boolean threshold encoder that compresses $n$ vectors into a dimension that is twice the logarithm of $n$.
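
To show what a Boolean threshold decoder looks like, the sketch below builds a simple (not size-optimal) decoder from $\lceil \log_2 n \rceil$-bit codes back to $n$ distinct vectors: one hidden threshold gate per vector fires only on that vector's code, and each output bit is an OR gate over the hidden gates whose vector has that bit set. This is a textbook-style construction chosen for illustration, not the seven-layer or five-layer constructions of the article; assigning codes by binary index is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Gate = Tuple[List[int], int]  # (weights, threshold)


def fire(gate: Gate, x: Sequence[int]) -> int:
    """Boolean threshold function: 1 iff sum_j w_j * x_j >= theta."""
    weights, theta = gate
    return int(sum(w * xj for w, xj in zip(weights, x)) >= theta)


def build_decoder(vectors: Sequence[Sequence[int]]):
    """Decoder from ceil(log2 n)-bit codes back to n distinct D-bit vectors."""
    n = len(vectors)
    m = max(1, math.ceil(math.log2(n)))                    # logarithmic middle layer
    codes = [[(i >> b) & 1 for b in range(m)] for i in range(n)]
    # Hidden gate i: weight +1 on the code's 1-bits, -1 on its 0-bits, threshold = number
    # of 1-bits, so the weighted sum reaches the threshold only for an exact code match.
    hidden = [([1 if c else -1 for c in code], sum(code)) for code in codes]
    # Output bit k is an OR (threshold 1) of the hidden gates whose vector has bit k set.
    outputs = [([vec[k] for vec in vectors], 1) for k in range(len(vectors[0]))]

    def decode(z: Sequence[int]) -> List[int]:
        h = [fire(g, z) for g in hidden]
        return [fire(g, h) for g in outputs]

    return codes, decode


vecs = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 0], [1, 1, 0, 0, 1]]
codes, decode = build_decoder(vecs)
print(codes)                                   # [[0, 0], [1, 0], [0, 1]]
print([decode(c) for c in codes] == vecs)      # True
```

This construction needs $n$ hidden gates, far more than the logarithmic middle layer, which gives some intuition for why decoding rather than encoding tends to be the expensive part.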

18.
Article in English | MEDLINE | ID: mdl-35320104

ABSTRACT

Identifying regulatory modules between miRNAs and genes is crucial in cancer research, as it promotes a comprehensive understanding of the molecular mechanisms of cancer. The genomic data collected from subjects usually relate to different cancer statuses, such as different TNM Classifications of Malignant Tumors (TNM) or histological subtypes. Simple integrated analyses generally identify the core of tumorigenesis (common modules) but miss the subtype-specific regulatory mechanisms (specific modules). In contrast, separate analyses can only report the differences and ignore important common modules. Therefore, there is an urgent need for a novel method that jointly analyzes miRNA and gene data of different cancer statuses to identify common and specific modules. To that end, we developed a High-Order Graph Matching model to identify Common and Specific modules (HOGMCS) between miRNA and gene data of different cancer statuses. We first demonstrate the superiority of HOGMCS through a comparison with four state-of-the-art techniques using a set of simulated data. Then, we apply HOGMCS to stomach adenocarcinoma data with four TNM stages and two histological types, and to breast invasive carcinoma data with four PAM50 subtypes. The experimental results demonstrate that HOGMCS can accurately extract common and subtype-specific miRNA-gene regulatory modules, and many of the identified miRNA-gene interactions have been confirmed in several public databases.

19.
BMC Bioinformatics ; 23(1): 451, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36316653

ABSTRACT

BACKGROUND: Hot spots play an important role in protein binding analysis. The residue interaction network is a key point in hot spot prediction, and several graph theory-based methods have been proposed to detect hot spots. Although the existing methods can yield some interesting residues by network analysis, low recall has limited their abilities in finding more potential hot spots. RESULT: In this study, we develop three graph theory-based methods to predict hot spots from only a single residue interaction network. We detect the important residues by finding subgraphs with high densities, i.e., high average degrees. Generally, a high degree implies a high binding possibility between protein chains, and thus a subgraph with high density usually relates to binding sites that have a high rate of hot spots. By evaluating the results on 67 complexes from the SKEMPI database, our methods clearly outperform existing graph theory-based methods on recall and F-score. In particular, our main method, Min-SDS, has an average recall of over 0.665 and an f2-score of over 0.364, while the recall and f2-score of the existing methods are less than 0.400 and 0.224, respectively. CONCLUSION: The Min-SDS method performs best among all tested methods on the hot spot prediction problem, and all three of our methods provide useful approaches for analyzing bionetworks. In addition, the densest subgraph-based methods predict hot spots with only one residue interaction network, which is constructed from spatial atomic coordinate data to mitigate the shortage of data from wet-lab experiments.
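
The idea of finding subgraphs with high average degree can be illustrated with the classic greedy peeling heuristic (Charikar's 2-approximation for the densest subgraph): repeatedly delete a minimum-degree vertex and remember the vertex set with the best density $|E(S)|/|S|$, which equals half the average degree of $S$. This is a swapped-in, well-known technique, not the article's Min-SDS method, and the function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def greedy_densest_subgraph(edges: Iterable[Tuple[Node, Node]]) -> Tuple[Set[Node], float]:
    """Greedy peeling: return the vertex set with the best |E(S)| / |S| seen."""
    adj: Dict[Node, Set[Node]] = {}
    for u, v in edges:
        if u == v:
            continue                                   # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return set(), 0.0
    alive = set(adj)
    m = sum(len(nb) for nb in adj.values()) // 2       # current number of edges
    best_set, best_density = set(alive), m / len(alive)
    while len(alive) > 1:
        v = min(alive, key=lambda x: len(adj[x]))      # peel a minimum-degree vertex
        m -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)                          # neighbours lose one degree each
        alive.remove(v)
        density = m / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density
    return best_set, best_density


# Hypothetical residue interaction network: a 4-clique with a two-residue tail
rin = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
       ("d", "e"), ("e", "f")]
print(greedy_densest_subgraph(rin))  # the clique {a, b, c, d} with density 1.5
```

In a residue interaction network, the dense core returned this way would be the candidate binding region, matching the intuition that high-degree residue clusters are enriched in hot spots.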


Subject(s)
Protein Interaction Mapping , Proteins , Databases, Protein , Proteins/chemistry , Binding Sites , Protein Binding , Protein Interaction Mapping/methods
20.
Bioinformatics ; 38(23): 5160-5167, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36205602

ABSTRACT

MOTIVATION: N4-methylcytosine (4mC) is an essential kind of epigenetic modification that regulates a wide range of biological processes. However, experimental methods for detecting 4mC sites are time-consuming and labor-intensive. As an alternative, computational methods that can automatically identify 4mC sites using data analysis techniques have become a reasonable option. A major challenge is how to develop effective methods that fully exploit the complex interactions within DNA sequences to improve predictive capability. RESULTS: In this work, we propose MSNet-4mC, a lightweight neural network built on convolutional operations with multi-scale receptive fields that perceives cross-element relationships over both short and long ranges of given DNA sequences. Given the strong imbalance in the number of candidates across species, we compute class weights and apply them in the cross-entropy loss to balance the training process. Extensive benchmarking experiments show that our method achieves a significant performance improvement and outperforms other state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: The source code and models are freely available for download at https://github.com/LIU-CT/MSNet-4mC, implemented in Python and supported on Linux and Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
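
Class weighting in the cross-entropy loss can be sketched as follows. The "balanced" rule $w_c = N / (K \cdot N_c)$ used here is one common heuristic and an assumption; the abstract does not say which formula MSNet-4mC uses. The loss normalizes by the summed weights of the samples, which is how weighted mean reduction is usually defined (for example in PyTorch's cross-entropy loss with a `weight` argument). Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """w_c = N / (K * N_c): rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Weighted mean of -log p(true class), normalised by the summed weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


labels = [0, 0, 0, 1]                 # hypothetical: 3 negatives, 1 positive (4mC) site
w = balanced_class_weights(labels)
print(w)                              # {0: 0.666..., 1: 2.0}
print(weighted_cross_entropy([[0.5, 0.5]] * 4, labels, w))  # log(2), about 0.693
```

With these weights the single positive contributes as much to the loss as the three negatives combined, so the model cannot minimize the loss by simply predicting the majority class.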


Subject(s)
DNA , Software , DNA/genetics , Neural Networks, Computer , Machine Learning , Epigenesis, Genetic