Results 1 - 20 of 6,749

1.
Cell ; 175(1): 266-276.e13, 2018 09 20.
Article in English | MEDLINE | ID: mdl-30166209

ABSTRACT

A fundamental challenge of biology is to understand the vast heterogeneity of cells, particularly how cellular composition, structure, and morphology are linked to cellular physiology. Unfortunately, conventional technologies are limited in uncovering these relations. We present a machine-intelligence technology based on a radically different architecture that realizes real-time image-based intelligent cell sorting at an unprecedented rate. This technology, which we refer to as intelligent image-activated cell sorting, integrates high-throughput cell microscopy, focusing, and sorting on a hybrid software-hardware data-management infrastructure, enabling real-time automated operation for data acquisition, data processing, decision-making, and actuation. We use it to demonstrate real-time sorting of microalgal and blood cells based on intracellular protein localization and cell-cell interaction from large heterogeneous populations for studying photosynthesis and atherothrombosis, respectively. The technology is highly versatile and expected to enable machine-based scientific discovery in biological, pharmaceutical, and medical sciences.


Subject(s)
Flow Cytometry/methods , High-Throughput Screening Assays/methods , Image Processing, Computer-Assisted/methods , Animals , Deep Learning , Humans
2.
Proc Natl Acad Sci U S A ; 121(6): e2300838121, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38300863

ABSTRACT

Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.


Subject(s)
Algorithms , Neural Networks, Computer , Proteins/genetics , Machine Learning , Amino Acids
3.
Development ; 150(13)2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37283069

ABSTRACT

Accurately counting and localising cellular events from movies is an important bottleneck of high-content tissue/embryo live imaging. Here, we propose a new methodology based on deep learning that allows automatic detection of cellular events and their precise xyt localisation on live fluorescent imaging movies without segmentation. We focused on the detection of cell extrusion, the expulsion of dying cells from the epithelial layer, and devised DeXtrusion: a pipeline based on recurrent neural networks for automatic detection of cell extrusion/cell death events in large movies of epithelia marked with cell contour. The pipeline, initially trained on movies of the Drosophila pupal notum marked with fluorescent E-cadherin, is easily trainable, provides fast and accurate extrusion predictions in a large range of imaging conditions, and can also detect other cellular events, such as cell division or cell differentiation. It also performs well on other epithelial tissues with reasonable re-training. Our methodology could easily be applied for other cellular events detected by live fluorescent microscopy and could help to democratise the use of deep learning for automatic event detections in developing tissues.


Subject(s)
Machine Learning , Neural Networks, Computer , Epithelial Cells , Cell Death , Microscopy
4.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38343322

ABSTRACT

Vaccination stands as the most effective and economical strategy for prevention and control of influenza. The primary target of neutralizing antibodies is the surface antigen hemagglutinin (HA). However, ongoing mutations in the HA sequence result in antigenic drift. The success of a vaccine is contingent on its antigenic congruence with circulating strains. Thus, predicting antigenic variants and deducing antigenic clusters of influenza viruses are pivotal for recommendation of vaccine strains. The antigenicity of influenza A viruses is determined by the interplay of amino acids in the HA1 sequence. In this study, we exploit the ability of convolutional neural networks (CNNs) to extract spatial feature representations in the convolutional layers, which can discern interactions between amino acid sites. We introduce PREDAC-CNN, a model designed to track antigenic evolution of seasonal influenza A viruses. Accessible at http://predac-cnn.cloudna.cn, PREDAC-CNN formulates a spatially oriented representation of the HA1 sequence, optimized for the convolutional framework. It effectively probes interactions among amino acid sites in the HA1 sequence. Also, PREDAC-CNN focuses exclusively on physicochemical attributes crucial for the antigenicity of influenza viruses, thereby eliminating unnecessary amino acid embeddings. Together, PREDAC-CNN is adept at capturing interactions of amino acid sites within the HA1 sequence and examining the collective impact of point mutations on antigenic variation. Through 5-fold cross-validation and retrospective testing, PREDAC-CNN has shown superior performance in predicting antigenic variants compared to its counterparts. Additionally, PREDAC-CNN has been instrumental in identifying predominant antigenic clusters for A/H3N2 (1968-2023) and A/H1N1 (1977-2023) viruses, significantly aiding in vaccine strain recommendation.


Subject(s)
Influenza A Virus, H1N1 Subtype , Influenza A virus , Vaccines , Influenza A virus/genetics , Influenza A Virus, H3N2 Subtype/genetics , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Seasons , Retrospective Studies , Antigens, Viral/genetics , Neural Networks, Computer , Amino Acids
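The 5-fold cross-validation reported for PREDAC-CNN splits the labeled data into five folds, each serving once as the held-out test set while the other four are used for training. The following is a minimal, generic sketch of that splitting scheme; the function name `k_fold_indices` and the shuffling seed are illustrative assumptions, not part of PREDAC-CNN.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(12, k=5))
print([len(test) for _, test in folds])  # → [3, 3, 2, 2, 2]
```

Every sample lands in exactly one test fold, so averaging a metric over the five folds uses each labeled example once for evaluation.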
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385876

ABSTRACT

Enhancers play an important role in the regulation of gene expression. In a DNA sequence, the abundance or absence of enhancers and irregularities in their strength affect the gene expression process, which can lead to the initiation and propagation of diverse genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Enhancer identification and strength prediction through experimental approaches are expensive, time-consuming and error-prone. To accelerate research on enhancer identification and strength prediction, around 19 computational frameworks have been proposed. These frameworks use machine and deep learning methods that take raw DNA sequences and predict the presence and strength of enhancers. However, these frameworks still fall short in performance and are not useful for real-time analysis. This paper presents a novel deep learning framework that uses language modeling strategies to transform DNA sequences into a statistical feature space. It applies transfer learning by training a language model in an unsupervised fashion to predict groups of nucleotides, known as k-mers, from the context of the surrounding k-mers in a sequence. At the classification stage, it presents a novel classifier that combines the benefits of two different architectures: a convolutional neural network and an attention mechanism. On the enhancer identification benchmark dataset, the proposed framework outperforms the existing best-performing framework by 5% in accuracy and 9% in Matthews correlation coefficient (MCC). Similarly, on the enhancer strength prediction benchmark dataset, it outperforms the existing best-performing framework by 4% in accuracy and 7% in MCC.


Subject(s)
Benchmarking , Medicine , Neural Networks, Computer , Nucleotides , Regulatory Sequences, Nucleic Acid
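The language-modeling step above treats a DNA sequence as a series of overlapping k-mers. As a rough sketch of that tokenization only (the language model itself is not shown), the code below splits a sequence into k-mers with stride 1 and maps each to an integer ID; the choice of k = 3 and the helper names are assumptions for illustration.

```python
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over `alphabet` to an integer ID (4**k entries)."""
    return {"".join(p): idx for idx, p in enumerate(product(alphabet, repeat=k))}


vocab = kmer_vocab(3)
tokens = kmer_tokens("ACGTAC", k=3)
ids = [vocab[t] for t in tokens]
print(tokens)  # → ['ACG', 'CGT', 'GTA', 'TAC']
```

A pretraining objective would then mask or hold out some of these IDs and ask the model to predict them from the neighboring k-mers. Sequences containing ambiguous bases such as N would need an extra vocabulary entry.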
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38856168

ABSTRACT

Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two layers of convolutional neural network and one layer of long short-term memory. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions.


Subject(s)
Computational Biology , DNA-Binding Proteins , Deep Learning , RNA-Binding Proteins , RNA-Binding Proteins/metabolism , DNA-Binding Proteins/metabolism , Computational Biology/methods , Neural Networks, Computer , Humans
7.
Plant J ; 119(2): 735-745, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38741374

ABSTRACT

As a promising approach, genome-based plant breeding has greatly promoted the improvement of agronomic traits. Traditional methods typically adopt linear regression models with clear assumptions; they neither capture the linkage between phenotype and genotype nor offer good ideas for modification. Nonlinear models are well suited to capturing complex nonadditive effects, filling the gap left by traditional methods. Taking Populus as the research object, this paper constructs a deep learning method, DCNGP, which can effectively predict traits, including 65 phenotypes. The method was trained on three datasets and compared with four other classic models: Bayesian ridge regression (BRR), Elastic Net, support vector regression, and dualCNN. The results show that DCNGP has five typical advantages in performance: strong prediction ability on multiple experimental datasets; batch normalization layers and early stopping that enhance generalization and prediction stability on test data; learning potent features directly from the data, thus circumventing tedious manual feature construction; a Gaussian noise layer that enhances predictive capability in the presence of inherent uncertainties or perturbations; and fewer hyperparameters, which reduce tuning time across datasets and improve automatic search efficiency. In this way, DCNGP shows powerful predictive ability from genotype to phenotype, providing an important theoretical reference for building more robust Populus breeding programs.


Subject(s)
Genome, Plant , Neural Networks, Computer , Phenotype , Plant Breeding , Populus , Populus/genetics , Genome, Plant/genetics , Plant Breeding/methods , Deep Learning , Genotype , Bayes Theorem
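Early stopping, one of the components credited for DCNGP's stability, halts training once the validation loss stops improving. The sketch below shows a common patience-based variant; the class name, the `patience` and `min_delta` parameters and the loss values are illustrative assumptions, not DCNGP's actual settings.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs.

    Hypothetical helper for illustration only.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.wait = 0  # epochs since the last improvement

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]  # made-up values
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

With a patience of 2, training stops at epoch 4, and the weights from epoch 2 (the best validation loss) would typically be restored.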
8.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36682013

ABSTRACT

While deep learning (DL)-based models have emerged as powerful approaches to predict protein-protein interactions (PPIs), their reliance on explicit similarity measures (e.g. sequence similarity and network neighborhood) to known interacting proteins makes these methods ineffective in dealing with novel proteins. The advent of AlphaFold2 presents a significant opportunity and also a challenge to predict PPIs in a straightforward way based on monomer structures while controlling bias from protein sequences. In this work, we established Structure and Graph-based Predictions of Protein Interactions (SGPPI), a structure-based DL framework for predicting PPIs using a graph convolutional network. In particular, SGPPI focuses on protein patches on the protein-protein binding interfaces and extracts structural, geometric and evolutionary features from the residue contact map to predict PPIs. We demonstrated that our model outperforms traditional machine learning methods and state-of-the-art DL-based methods on non-representation-bias benchmark datasets. Moreover, our model trained on a human dataset can be reliably transferred to predict yeast PPIs, indicating that SGPPI can capture converging structural features of protein interactions across various species. The implementation of SGPPI is available at https://github.com/emerson106/SGPPI.


Subject(s)
Machine Learning , Proteins , Humans , Proteins/chemistry , Protein Binding , Amino Acid Sequence , Saccharomyces cerevisiae/metabolism
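SGPPI derives its features from a residue contact map. One common convention, assumed here rather than taken from SGPPI, is to mark two residues as in contact when their C-alpha atoms lie closer than 8 Å. The sketch below builds such a binary map from hypothetical coordinates; the resulting matrix can serve as the adjacency matrix of a residue graph.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map: 1 if two residues' C-alpha atoms are within `threshold` angstroms.

    `coords` is a list of (x, y, z) tuples, one per residue. The diagonal is left at 0.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1  # contacts are symmetric
    return cmap


# Hypothetical C-alpha coordinates (angstroms) for four residues.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(ca)
```

`math.dist` requires Python 3.8 or later. Real pipelines would read coordinates from a structure file (for example, an AlphaFold2 model) instead of hard-coding them.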
9.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36516298

ABSTRACT

This paper describes Pprint2, an improved version of Pprint developed for predicting RNA-interacting residues in a protein. The training and independent/validation datasets used in this study comprise 545 and 161 non-redundant RNA-binding proteins, respectively. All models were trained on the training dataset and evaluated on the validation dataset. Preliminary analysis reveals that positively charged amino acids such as H, R and K are more prominent among RNA-interacting residues. Initially, machine learning-based models were developed using a binary profile and obtained a maximum area under the curve (AUC) of 0.68 on the validation dataset. The performance of this model improved significantly, from an AUC of 0.68 to 0.76, when an evolutionary profile was used instead of a binary profile. The performance of the evolutionary profile-based model improved further, from an AUC of 0.76 to 0.82, when a convolutional neural network was used to develop the model. Our final model, based on a convolutional neural network using evolutionary information, achieved an AUC of 0.82 with a Matthews correlation coefficient of 0.49 on the validation dataset. Our best model outperforms existing methods when evaluated on the independent/validation dataset. A user-friendly standalone software package and web server named 'Pprint2' have been developed for predicting RNA-interacting residues (https://webs.iiitd.edu.in/raghava/pprint2 and https://github.com/raghavagps/pprint2).


Subject(s)
Amino Acids , RNA , Binding Sites , RNA/metabolism , Software , RNA-Binding Proteins/metabolism
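A binary profile, the starting representation for Pprint2's first models, is essentially a one-hot encoding of the residues in a window around the residue being classified. The sketch below assumes an odd window length and zero-padding beyond the sequence ends; the window size and helper name are illustrative assumptions, not Pprint2's published settings.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def binary_profile(sequence, center, window=7):
    """One-hot (binary profile) encoding of an odd-length window centred on `center`.

    Positions outside the sequence (or non-standard residues) become all-zero rows.
    Returns `window` rows of 20 binary values each. Hypothetical helper.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the target residue sits in the middle")
    half = window // 2
    rows = []
    for pos in range(center - half, center + half + 1):
        row = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(sequence[pos])] = 1
        rows.append(row)
    return rows


profile = binary_profile("MKRHL", center=1, window=5)
```

An evolutionary profile would replace each binary row with position-specific scores, for example from a position-specific scoring matrix (PSSM), while keeping the same window layout.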
10.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37088981

ABSTRACT

BACKGROUND: The ubiquitous presence of short extrachromosomal circular DNAs (eccDNAs) in eukaryotic cells has perplexed generations of biologists. Their widespread origins in the genome, lacking apparent specificity, led some studies to conclude that their formation is random or near-random. Despite this, the search for specific formation of short eccDNAs continues, with a recent surge of interest in biomarker development. RESULTS: To shed new light on the conflicting views on the randomness of short eccDNAs, here we present DeepCircle, a bioinformatics framework incorporating convolution- and attention-based neural networks to assess their predictability. Short human eccDNAs from different datasets indeed have low similarity in genomic locations, but DeepCircle successfully learned shared DNA sequence features to make accurate cross-dataset predictions (accuracy: convolution-based models: 79.65 ± 4.7%, attention-based models: 83.31 ± 4.18%). CONCLUSIONS: The excellent performance of our models shows that the intrinsic predictability of eccDNAs is encoded in the sequences across tissue origins. Our work demonstrates how the perceived lack of specificity in genomics data can be re-assessed by deep learning models to uncover unexpected similarity.


Subject(s)
DNA, Circular , DNA , Humans , Genome , Eukaryotic Cells , Biomarkers
11.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36917472

ABSTRACT

Identifying the function of DNA sequences accurately is an essential and challenging task in genomics. Deep learning has been widely used in the functional analysis of DNA sequences, for example in DeepSEA, DanQ, DeepATT and TBiNet. However, these methods suffer from high computational complexity and do not fully consider the distant interactions among chromatin features, which affects prediction accuracy. In this work, we propose a hybrid deep neural network model, called DeepFormer, based on a convolutional neural network (CNN) and a flow-attention mechanism for DNA sequence function prediction. In DeepFormer, the CNN is used to capture the local features of DNA sequences as well as important motifs. Based on the conservation law of flow networks, the flow-attention mechanism can capture more distal interactions among sequence features with linear time complexity. We compare DeepFormer with these four classical methods using the commonly used dataset of 919 chromatin features of nearly 4.9 million noncoding DNA sequences. Experimental results show that DeepFormer significantly outperforms all four methods, with an average recall rate at least 7.058% higher than the other methods. Furthermore, we confirmed the effectiveness of DeepFormer in capturing functional variation using Alzheimer's disease, pathogenic mutations in alpha-thalassemia and modification in CCCTC-binding factor (CTCF) activity. We further predicted chromatin accessibility in five maize tissues and validated the generalization of DeepFormer. The average recall rate of DeepFormer exceeds that of the classical methods by at least 1.54%, demonstrating strong robustness.


Subject(s)
Genomics , Neural Networks, Computer , Base Sequence , Genomics/methods , Chromatin/genetics , Genome
12.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36748992

ABSTRACT

Interactions between DNA and transcription factors (TFs) play an essential role in understanding transcriptional regulation mechanisms and gene expression. Owing to the large accumulation of training data and low cost, deep learning methods have shown huge potential in determining the specificity of TF-DNA interactions. Convolutional network-based and self-attention network-based methods have been proposed for transcription factor binding site (TFBS) prediction. Convolutional operations are efficient at extracting local features but tend to ignore global information, while self-attention mechanisms excel at capturing long-distance dependencies but struggle to attend to local feature details. To discover comprehensive features for a given sequence as far as possible, we propose a Dual-branch model combining Self-Attention and Convolution, dubbed DSAC, which fuses local features and global representations in an interactive way. In terms of features, convolution and self-attention contribute to feature extraction collaboratively, enhancing representation learning. In terms of structure, a lightweight but efficient network architecture is designed for the prediction; in particular, the dual-branch structure allows both the convolution and the self-attention mechanism to be fully utilized, improving the predictive ability of our model. Experimental results on 165 ChIP-seq datasets show that DSAC clearly outperforms five other deep learning-based methods and demonstrate that our model can effectively predict TFBSs based on sequence features alone. The source code of DSAC is available at https://github.com/YuBinLab-QUST/DSAC/.


Subject(s)
DNA , Neural Networks, Computer , Protein Binding , Binding Sites , Transcription Factors/genetics
13.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189540

ABSTRACT

Nanopore sequencers can enrich or deplete the targeted DNA molecules in a library by reversing the voltage across individual nanopores. However, performing these rapid operations in parallel during real-time sequencing requires substantial computational resources. We present a deep learning framework, NanoDeep, that overcomes these limitations by incorporating a convolutional neural network and squeeze-and-excitation. We first showed that the raw squiggle derived from native DNA sequences determines the origin of microbial and human genomes. Then, we demonstrated that NanoDeep successfully classified bacterial reads from a library pooled with human sequences and showed enrichment for bacterial sequences compared with the routine nanopore sequencing setting. Further, we showed that NanoDeep improves sequencing efficiency and preserves the fidelity of bacterial genomes in the mock sample. In addition, NanoDeep performs well in the enrichment of metagenome sequences from gut samples, showing its potential for enriching unknown microbiota. Our toolkit is available at https://github.com/lysovosyl/NanoDeep.


Subject(s)
Deep Learning , Nanopore Sequencing , Nanopores , Humans , Gene Library , Genome, Bacterial
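NanoDeep pairs a CNN with squeeze-and-excitation (SE). The sketch below shows the generic SE block in plain Python on a 1D feature map (channels by positions): each channel is averaged ("squeeze"), passed through a small two-layer network ending in a sigmoid ("excitation"), and the resulting gate rescales that channel. The weights are made up, biases are omitted for brevity, and the code is not taken from NanoDeep.

```python
import math


def squeeze_excitation(x, w1, w2):
    """Generic squeeze-and-excitation on a 1D feature map.

    x  : list of C channels, each a list of L values
    w1 : reduction weights, shape (C // r) x C
    w2 : expansion weights, shape C x (C // r)
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling over positions -> one value per channel.
    s = [sum(ch) / len(ch) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every position of each channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)], gates


x = [[1.0, 3.0], [0.0, 0.0]]  # 2 channels x 2 positions (hypothetical)
w1 = [[1.0, 0.0]]             # reduce 2 channels to 1 hidden unit
w2 = [[10.0], [-10.0]]        # expand back to 2 gates
out, gates = squeeze_excitation(x, w1, w2)
```

Here the first channel is kept almost unchanged (gate near 1) and the second is suppressed (gate near 0), which is how SE lets a network emphasize informative channels at little extra cost.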
14.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38180830

ABSTRACT

2'-O-methylation (2OM) is the most common post-transcriptional modification of RNA. It plays a crucial role in RNA splicing, RNA stability and innate immunity. Despite advances in high-throughput detection, the chemical stability of 2OM makes it difficult to detect and map in messenger RNA. Therefore, bioinformatics tools have been developed using machine learning (ML) algorithms to identify 2OM sites. These tools have made significant progress, but their performance remains unsatisfactory and needs further improvement. In this study, we introduced H2Opred, a novel hybrid deep learning (HDL) model for accurately identifying 2OM sites in human RNA. Notably, this is the first application of HDL in developing four nucleotide-specific models [adenine (A2OM), cytosine (C2OM), guanine (G2OM) and uracil (U2OM)] as well as a generic model (N2OM). H2Opred incorporates both stacked 1D convolutional neural network (1D-CNN) blocks and stacked attention-based bidirectional gated recurrent unit (Bi-GRU-Att) blocks. The 1D-CNN blocks learned effective feature representations from 14 conventional descriptors, while the Bi-GRU-Att blocks learned feature representations from five natural language processing-based embeddings extracted from RNA sequences. H2Opred integrated these feature representations to make the final prediction. Rigorous cross-validation analysis demonstrated that H2Opred consistently outperforms conventional ML-based single-feature models on five different datasets. Moreover, the generic model of H2Opred demonstrated remarkable performance on both training and testing datasets, significantly outperforming the existing predictor and the four nucleotide-specific H2Opred models. To enhance accessibility and usability, we have deployed a user-friendly web server for H2Opred, accessible at https://balalab-skku.org/H2Opred/. This platform will serve as an invaluable tool for accurately predicting 2OM sites within human RNA, thereby facilitating broader applications in relevant research endeavors.


Subject(s)
Deep Learning , RNA , Humans , RNA/genetics , Base Sequence , Nucleotides , Methylation
15.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37080761

ABSTRACT

Advances in spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and the tissue microenvironment. Accurate spatial domain identification is the foundation for delineating genome heterogeneity and cellular interaction. Motivated by this perspective, this paper constructs a graph deep learning (GDL)-based spatial clustering approach. First, a deep graph infomax module embedded with a residual gated graph convolutional neural network is leveraged to process the gene expression profiles and spatial positions in ST data. Then, a Bayesian Gaussian mixture model is applied to the latent embeddings to generate spatial domains. Experiments show that the presented method is superior to other state-of-the-art GDL-enabled techniques on multiple ST datasets. The code and datasets used in this manuscript are available at https://github.com/narutoten520/SCGDL.


Subject(s)
Deep Learning , Transcriptome , Bayes Theorem , Gene Expression Profiling , Cell Communication
16.
Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38943434

ABSTRACT

Understanding a protein's function based solely on its amino acid sequence is a crucial but intricate task in bioinformatics. Traditionally, this challenge has proven difficult. However, recent years have witnessed the rise of deep learning as a powerful tool, achieving significant success in protein function prediction. The strength of deep learning models lies in their ability to automatically learn informative features from protein sequences, which can then be used to predict the protein's function. This study builds upon these advancements by proposing a novel model: CNN-CBAM+BiGRU. It incorporates a Convolutional Block Attention Module (CBAM) alongside BiGRUs. CBAM acts as a spotlight, guiding the CNN to focus on the most informative parts of the protein data, leading to more accurate feature extraction. BiGRUs, a type of recurrent neural network (RNN), excel at capturing long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of both CNN-CBAM and BiGRU. This study's findings, validated through experimentation, showcase the effectiveness of this combined approach. For the human dataset, the proposed method outperforms the CNN-BIGRU+ATT model by +1.0% for cellular components, +1.1% for molecular functions, and +0.5% for biological processes. For the yeast dataset, the proposed method outperforms the CNN-BIGRU+ATT model by +2.4% for cellular components, +1.2% for molecular functions, and +0.6% for biological processes.


Subject(s)
Computational Biology , Neural Networks, Computer , Proteins , Computational Biology/methods , Humans , Proteins/genetics , Proteins/metabolism , Deep Learning , Databases, Protein , Algorithms , Amino Acid Sequence
17.
Methods ; 226: 49-53, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38621436

ABSTRACT

Epigenetic proteins (EP) play a role in the progression of a wide range of diseases, including autoimmune disorders, neurological disorders, and cancer. Recognition of their diverse functions has prompted researchers to investigate them as potential therapeutic and pharmacological targets. This study introduces a novel deep learning-based model that accurately predicts EP. Our approach entails generating two distinct datasets for training and evaluating the model. We then use three distinct strategies to transform protein sequences into numerical representations: Dipeptide Deviation from Expected Mean (DDE), Dipeptide Composition (DPC), and Group Amino Acid (GAAC). Following that, we train and compare the performance of four advanced deep learning algorithms: Ensemble Residual Convolutional Neural Network (ERCNN), Generative Adversarial Network (GAN), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU). The DDE encoding combined with the ERCNN model demonstrates the best performance on both datasets. This study demonstrates deep learning's potential for precisely predicting EP, which can considerably accelerate research and streamline drug discovery efforts. This analytical method has the potential to find new therapeutic targets and advance our understanding of EP activities in disease.


Subject(s)
Deep Learning , Drug Discovery , Neural Networks, Computer , Drug Discovery/methods , Humans , Epigenesis, Genetic/drug effects , Algorithms , Proteins/chemistry
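Dipeptide Composition (DPC), one of the three encodings compared above, turns a protein sequence of any length into a fixed 400-dimensional vector: the frequency of each ordered pair of adjacent standard amino acids. A minimal sketch, assuming the usual normalization by the number of adjacent pairs (length minus one); DDE further normalizes these frequencies against expected values and is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide composition of a protein sequence.

    Each entry is the count of one dipeptide divided by len(sequence) - 1.
    Pairs containing non-standard residues are not counted.
    """
    total = len(sequence) - 1
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


dpc = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
```

For "ACAC", the entry for AC is 2/3 and the entry for CA is 1/3; all other 398 entries are zero. The fixed length is what lets sequences of different lengths feed the same CNN or GRU input layer.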
18.
Methods ; 222: 41-50, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38157919

ABSTRACT

Predicting the therapeutic effect of anti-cancer drugs on tumors based on the characteristics of tumors and patients is one of the important goals of precision oncology. Existing computational methods regard drug response prediction as either a classification or a regression task. However, few of them consider leveraging the relationship between the two tasks. In this work, we propose a Multi-task Interaction Graph Convolutional Network (MTIGCN) for anti-cancer drug response prediction. MTIGCN first utilizes a graph convolutional network-based model to produce embeddings for both cell lines and drugs. After that, the model employs multi-task learning to predict anti-cancer drug response, which involves training the model on three different tasks simultaneously: the main task of classifying responses as drug-sensitive or drug-resistant, and the two auxiliary tasks of regression prediction and similarity network reconstruction. By sharing parameters and optimizing the losses of different tasks simultaneously, MTIGCN enhances the feature representation and reduces overfitting. Experiments on two in vitro datasets demonstrated that MTIGCN outperformed seven state-of-the-art baseline methods. Moreover, the model trained on the in vitro dataset GDSC exhibited good performance when applied to predict drug responses in the in vivo datasets PDX and TCGA. The case study confirmed the model's ability to discover unknown drug responses in cell lines.


Subject(s)
Antineoplastic Agents , Neoplasms , Humans , Neoplasms/drug therapy , Precision Medicine , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Medical Oncology , Cell Line
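MTIGCN builds its cell-line and drug embeddings with graph convolution. As a point of reference, the sketch below implements one layer of the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) in plain Python; it is a generic graph convolution, not MTIGCN's exact architecture, and the tiny graph and weights are hypothetical.

```python
import math


def gcn_layer(adj, h, w):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj : n x n adjacency matrix (0/1 entries)
    h   : n x f node feature matrix
    w   : f x f_out weight matrix
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, then apply the linear transform and ReLU.
    agg = [[sum(norm[i][k] * h[k][f] for k in range(n)) for f in range(len(h[0]))]
           for i in range(n)]
    out = [[sum(agg[i][f] * w[f][o] for f in range(len(w))) for o in range(len(w[0]))]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


adj = [[0, 1], [1, 0]]  # two connected nodes, e.g. a cell line and a drug
h = [[1.0], [3.0]]      # one input feature per node
w = [[1.0]]             # identity weight
embeddings = gcn_layer(adj, h, w)
```

Both nodes end up with the averaged value 2.0, showing how a graph convolution mixes information between connected nodes; stacking layers and learning `w` yields the embeddings used by downstream task heads.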
19.
Methods ; 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39097179

ABSTRACT

DNA N6-methyladenine (6mA) plays an important role in many biological processes, and accurately identifying its sites helps us understand its biological effects more comprehensively. Traditional experimental methods are very labor-intensive, and traditional machine learning methods appear insufficient as 6mA methylation data grow progressively larger. We therefore propose a deep learning-based method, a multi-scale convolutional model based on global response normalization (CG6mA), to predict 6mA sites. The method was tested against other methods on three different kinds of benchmark datasets, and the results show that our model achieves better prediction results.
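Global response normalization (GRN) was introduced in ConvNeXt V2: each channel's L2 norm over all positions is divided by the mean norm across channels, and the result rescales the features, with learnable per-channel terms and a residual connection. The sketch below adapts that generic formulation to a 1D (positions by channels) feature map; whether CG6mA uses exactly this variant is an assumption.

```python
import math


def global_response_norm(x, gamma, beta, eps=1e-6):
    """Global Response Normalization on a (length x channels) feature map.

    x     : list of L positions, each a list of C channel values
    gamma : list of C learnable scales
    beta  : list of C learnable shifts
    """
    n_channels = len(x[0])
    # 1) Aggregate: L2 norm of each channel over the whole sequence.
    g = [math.sqrt(sum(row[c] ** 2 for row in x)) for c in range(n_channels)]
    # 2) Normalize: compare each channel's norm with the mean norm across channels.
    mean_g = sum(g) / n_channels
    n = [gc / (mean_g + eps) for gc in g]
    # 3) Calibrate: rescale features, add learnable affine terms and a residual.
    return [[gamma[c] * row[c] * n[c] + beta[c] + row[c] for c in range(n_channels)]
            for row in x]


x = [[3.0, 0.0], [4.0, 1.0]]  # 2 positions x 2 channels (hypothetical)
y = global_response_norm(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` and `beta` at zero, GRN reduces to the identity thanks to the residual term, so it can be inserted into an existing convolutional block without disturbing it at initialization.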

20.
Methods ; 226: 127-132, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38604414

ABSTRACT

Protein lysine methylation is a particular type of post-translational modification that plays an important role in regulating the function of both histone and non-histone proteins. Deregulation caused by lysine methyltransferases has been identified as the cause of several diseases, including cancer as well as mental and developmental disorders. Identifying lysine methylation sites is a critical step in both early diagnosis and drug design. This study proposes a new machine learning method, called CNN-Meth, for predicting lysine methylation sites using a convolutional neural network (CNN). Our model is trained using evolutionary, structural, and physicochemical-based representations along with binary encoding. Unlike previous studies, instead of extracting handcrafted features, we use a CNN to automatically extract features from different representations of amino acids to avoid information loss. Neither automated feature extraction from these representations of amino acids nor a CNN as a classifier has previously been used for this problem. Our results demonstrate that CNN-Meth can significantly outperform previous methods for predicting methylation sites. It achieves 96.0%, 85.1%, 96.4%, and 0.65 in terms of accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC), respectively. CNN-Meth and its source code are publicly available at https://github.com/MLBC-lab/CNN-Meth.


Subject(s)
Lysine , Neural Networks, Computer , Lysine/metabolism , Lysine/chemistry , Methylation , Protein Processing, Post-Translational , Machine Learning , Humans , Histone-Lysine N-Methyltransferase/metabolism , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/chemistry , Computational Biology/methods
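The four figures reported for CNN-Meth (accuracy, sensitivity, specificity and MCC) are all computed from the counts of a binary confusion matrix. The sketch below shows the standard formulas; the counts in the example are hypothetical and do not reproduce the paper's results.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return accuracy, sensitivity, specificity, mcc


# Hypothetical confusion-matrix counts for a methylation-site classifier.
acc, sn, sp, mcc = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

This example yields accuracy 0.85, sensitivity 0.80, specificity 0.90 and an MCC of about 0.70. Because MCC uses all four cells, it stays informative on imbalanced site datasets where accuracy alone can look high, which helps explain why it is reported alongside 96% accuracy.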