Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 502
Filtrar
Más filtros

Tipo del documento
Intervalo de año de publicación
1.
Brief Bioinform ; 25(5)2024 Jul 25.
Artículo en Inglés | MEDLINE | ID: mdl-39110476

RESUMEN

Bacteriophages are the viruses that infect bacterial cells. They are the most diverse biological entities on earth and play important roles in microbiome. According to the phage lifestyle, phages can be divided into the virulent phages and the temperate phages. Classifying virulent and temperate phages is crucial for further understanding of the phage-host interactions. Although there are several methods designed for phage lifestyle classification, they merely either consider sequence features or gene features, leading to low accuracy. A new computational method, DeePhafier, is proposed to improve classification performance on phage lifestyle. Built by several multilayer self-attention neural networks, a global self-attention neural network, and being combined by protein features of the Position Specific Scoring Matrix matrix, DeePhafier improves the classification accuracy and outperforms two benchmark methods. The accuracy of DeePhafier on five-fold cross-validation is as high as 87.54% for sequences with length >2000bp.


Asunto(s)
Bacteriófagos , Redes Neurales de la Computación , Bacteriófagos/genética , Biología Computacional/métodos , Proteínas Virales/genética , Proteínas Virales/metabolismo , Algoritmos
2.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38920341

RESUMEN

Drug-target interactions (DTIs) are a key part of drug development process and their accurate and efficient prediction can significantly boost development efficiency and reduce development time. Recent years have witnessed the rapid advancement of deep learning, resulting in an abundance of deep learning-based models for DTI prediction. However, most of these models used a single representation of drugs and proteins, making it difficult to comprehensively represent their characteristics. Multimodal data fusion can effectively compensate for the limitations of single-modal data. However, existing multimodal models for DTI prediction do not take into account both intra- and inter-modal interactions simultaneously, resulting in limited presentation capabilities of fused features and a reduction in DTI prediction accuracy. A hierarchical multimodal self-attention-based graph neural network for DTI prediction, called HMSA-DTI, is proposed to address multimodal feature fusion. Our proposed HMSA-DTI takes drug SMILES, drug molecular graphs, protein sequences and protein 2-mer sequences as inputs, and utilizes a hierarchical multimodal self-attention mechanism to achieve deep fusion of multimodal features of drugs and proteins, enabling the capture of intra- and inter-modal interactions between drugs and proteins. It is demonstrated that our proposed HMSA-DTI has significant advantages over other baseline methods on multiple evaluation metrics across five benchmark datasets.


Asunto(s)
Aprendizaje Profundo , Redes Neurales de la Computación , Proteínas/química , Proteínas/metabolismo , Humanos , Algoritmos , Biología Computacional/métodos
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38678587

RESUMEN

Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.


Asunto(s)
Biomarcadores de Tumor , Aprendizaje Profundo , Recurrencia Local de Neoplasia , Humanos , Biomarcadores de Tumor/metabolismo , Biomarcadores de Tumor/genética , Recurrencia Local de Neoplasia/metabolismo , Recurrencia Local de Neoplasia/genética , Biología Computacional/métodos , Neoplasias/genética , Neoplasias/metabolismo , Neoplasias/patología , Genómica/métodos , Multiómica
4.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38762789

RESUMEN

Identifying drug-target interactions (DTIs) holds significant importance in drug discovery and development, playing a crucial role in various areas such as virtual screening, drug repurposing and identification of potential drug side effects. However, existing methods commonly exploit only a single type of feature from drugs and targets, suffering from miscellaneous challenges such as high sparsity and cold-start problems. We propose a novel framework called MSI-DTI (Multi-Source Information-based Drug-Target Interaction Prediction) to enhance prediction performance, which obtains feature representations from different views by integrating biometric features and knowledge graph representations from multi-source information. Our approach involves constructing a Drug-Target Knowledge Graph (DTKG), obtaining multiple feature representations from diverse information sources for SMILES sequences and amino acid sequences, incorporating network features from DTKG and performing an effective multi-source information fusion. Subsequently, we employ a multi-head self-attention mechanism coupled with residual connections to capture higher-order interaction information between sparse features while preserving lower-order information. Experimental results on DTKG and two benchmark datasets demonstrate that our MSI-DTI outperforms several state-of-the-art DTIs prediction methods, yielding more accurate and robust predictions. The source codes and datasets are publicly accessible at https://github.com/KEAML-JLU/MSI-DTI.


Asunto(s)
Descubrimiento de Drogas , Biología Computacional/métodos , Algoritmos , Humanos
5.
Brief Bioinform ; 24(5)2023 09 20.
Artículo en Inglés | MEDLINE | ID: mdl-37649370

RESUMEN

Protein function prediction based on amino acid sequence alone is an extremely challenging but important task, especially in metagenomics/metatranscriptomics field, in which novel proteins have been uncovered exponentially from new microorganisms. Many of them are extremely low homology to known proteins and cannot be annotated with homology-based or information integrative methods. To overcome this problem, we proposed a Homology Independent protein Function annotation method (HiFun) based on a unified deep-learning model by reassembling the sequence as protein language. The robustness of HiFun was evaluated using the benchmark datasets and metrics in the CAFA3 challenge. To navigate the utility of HiFun, we annotated 2 212 663 unknown proteins and discovered novel motifs in the UHGP-50 catalog. We proved that HiFun can extract latent function related structure features which empowers it ability to achieve function annotation for non-homology proteins. HiFun can substantially improve newly proteins annotation and expand our understanding of microorganisms' adaptation in various ecological niches. Moreover, we provided a free and accessible webservice at http://www.unimd.org/HiFun, requiring only protein sequences as input, offering researchers an efficient and practical platform for predicting protein functions.


Asunto(s)
Benchmarking , Lenguaje , Secuencia de Aminoácidos , Metagenómica , Anotación de Secuencia Molecular
6.
Brief Bioinform ; 24(2)2023 03 19.
Artículo en Inglés | MEDLINE | ID: mdl-36748992

RESUMEN

Interactions between DNA and transcription factors (TFs) play an essential role in understanding transcriptional regulation mechanisms and gene expression. Due to the large accumulation of training data and low expense, deep learning methods have shown huge potential in determining the specificity of TFs-DNA interactions. Convolutional network-based and self-attention network-based methods have been proposed for transcription factor binding sites (TFBSs) prediction. Convolutional operations are efficient to extract local features but easy to ignore global information, while self-attention mechanisms are expert in capturing long-distance dependencies but difficult to pay attention to local feature details. To discover comprehensive features for a given sequence as far as possible, we propose a Dual-branch model combining Self-Attention and Convolution, dubbed as DSAC, which fuses local features and global representations in an interactive way. In terms of features, convolution and self-attention contribute to feature extraction collaboratively, enhancing the representation learning. In terms of structure, a lightweight but efficient architecture of network is designed for the prediction, in particular, the dual-branch structure makes the convolution and the self-attention mechanism can be fully utilized to improve the predictive ability of our model. The experiment results on 165 ChIP-seq datasets show that DSAC obviously outperforms other five deep learning based methods and demonstrate that our model can effectively predict TFBSs based on sequence feature alone. The source code of DSAC is available at https://github.com/YuBinLab-QUST/DSAC/.


Asunto(s)
ADN , Redes Neurales de la Computación , Unión Proteica , Sitios de Unión , Factores de Transcripción/genética
7.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36445194

RESUMEN

piRNA and PIWI proteins have been confirmed for disease diagnosis and treatment as novel biomarkers due to its abnormal expression in various cancers. However, the current research is not strong enough to further clarify the functions of piRNA in cancer and its underlying mechanism. Therefore, how to provide large-scale and serious piRNA candidates for biological research has grown up to be a pressing issue. In this study, a novel computational model based on the structural perturbation method is proposed to predict potential disease-associated piRNAs, called SPRDA. Notably, SPRDA belongs to positive-unlabeled learning, which is unaffected by negative examples in contrast to previous approaches. In the 5-fold cross-validation, SPRDA shows high performance on the benchmark dataset piRDisease, with an AUC of 0.9529. Furthermore, the predictive performance of SPRDA for 10 diseases shows the robustness of the proposed method. Overall, the proposed approach can provide unique insights into the pathogenesis of the disease and will advance the field of oncology diagnosis and treatment.


Asunto(s)
Neoplasias , ARN de Interacción con Piwi , Humanos , ARN Interferente Pequeño/genética , ARN Interferente Pequeño/metabolismo , Neoplasias/genética , Neoplasias/metabolismo
8.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36567619

RESUMEN

With the development of genome sequencing technology, using computing technology to predict grain protein function has become one of the important tasks of bioinformatics. The protein data of four grains, soybean, maize, indica and japonica are selected in this experimental dataset. In this paper, a novel neural network algorithm Chemical-SA-BiLSTM is proposed for grain protein function prediction. The Chemical-SA-BiLSTM algorithm fuses the chemical properties of proteins on the basis of amino acid sequences, and combines the self-attention mechanism with the bidirectional Long Short-Term Memory network. The experimental results show that the Chemical-SA-BiLSTM algorithm is superior to other classical neural network algorithms, and can more accurately predict the protein function, which proves the effectiveness of the Chemical-SA-BiLSTM algorithm in the prediction of grain protein function. The source code of our method is available at https://github.com/HwaTong/Chemical-SA-BiLSTM.


Asunto(s)
Proteínas de Granos , Redes Neurales de la Computación , Algoritmos , Proteínas/química , Programas Informáticos
9.
Methods ; 227: 17-26, 2024 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-38705502

RESUMEN

Messenger RNA (mRNA) is vital for post-transcriptional gene regulation, acting as the direct template for protein synthesis. However, the methods available for predicting mRNA subcellular localization need to be improved and enhanced. Notably, few existing algorithms can annotate mRNA sequences with multiple localizations. In this work, we propose the mRNA-CLA, an innovative multi-label subcellular localization prediction framework for mRNA, leveraging a deep learning approach with a multi-head self-attention mechanism. The framework employs a multi-scale convolutional layer to extract sequence features across different regions and uses a self-attention mechanism explicitly designed for each sequence. Paired with Position Weight Matrices (PWMs) derived from the convolutional neural network layers, our model offers interpretability in the analysis. In particular, we perform a base-level analysis of mRNA sequences from diverse subcellular localizations to determine the nucleotide specificity corresponding to each site. Our evaluations demonstrate that the mRNA-CLA model substantially outperforms existing methods and tools.


Asunto(s)
Aprendizaje Profundo , ARN Mensajero , ARN Mensajero/genética , ARN Mensajero/metabolismo , Biología Computacional/métodos , Redes Neurales de la Computación , Humanos , Algoritmos
10.
Methods ; 222: 142-151, 2024 Feb.
Artículo en Inglés | MEDLINE | ID: mdl-38242383

RESUMEN

Protein-protein interactions play an important role in various biological processes. Interaction among proteins has a wide range of applications. Therefore, the correct identification of protein-protein interactions sites is crucial. In this paper, we propose a novel predictor for protein-protein interactions sites, AGF-PPIS, where we utilize a multi-head self-attention mechanism (introducing a graph structure), graph convolutional network, and feed-forward neural network. We use the Euclidean distance between each protein residue to generate the corresponding protein graph as the input of AGF-PPIS. On the independent test dataset Test_60, AGF-PPIS achieves superior performance over comparative methods in terms of seven different evaluation metrics (ACC, precision, recall, F1-score, MCC, AUROC, AUPRC), which fully demonstrates the validity and superiority of the proposed AGF-PPIS model. The source codes and the steps for usage of AGF-PPIS are available at https://github.com/fxh1001/AGF-PPIS.


Asunto(s)
Benchmarking , Inhibidores de la Bomba de Protones , Redes Neurales de la Computación , Programas Informáticos
11.
BMC Bioinformatics ; 25(1): 274, 2024 Aug 22.
Artículo en Inglés | MEDLINE | ID: mdl-39174927

RESUMEN

BACKGROUND: Growing evidence suggests that distal regulatory elements are essential for cellular function and states. The sequences within these distal elements, especially motifs for transcription factor binding, provide critical information about the underlying regulatory programs. However, cooperativities between transcription factors that recognize these motifs are nonlinear and multiplexed, rendering traditional modeling methods insufficient to capture the underlying mechanisms. Recent development of attention mechanism, which exhibit superior performance in capturing dependencies across input sequences, makes them well-suited to uncover and decipher intricate dependencies between regulatory elements. RESULT: We present Transcription factors cooperativity Inference Analysis with Neural Attention (TIANA), a deep learning framework that focuses on interpretability. In this study, we demonstrated that TIANA could discover biologically relevant insights into co-occurring pairs of transcription factor motifs. Compared with existing tools, TIANA showed superior interpretability and robust performance in identifying putative transcription factor cooperativities from co-occurring motifs. CONCLUSION: Our results suggest that TIANA can be an effective tool to decipher transcription factor cooperativities from distal sequence data. TIANA can be accessed through: https://github.com/rzzli/TIANA .


Asunto(s)
Factores de Transcripción , Factores de Transcripción/metabolismo , Factores de Transcripción/genética , Aprendizaje Profundo , Biología Computacional/métodos , Humanos , Sitios de Unión
12.
BMC Bioinformatics ; 25(1): 32, 2024 Jan 17.
Artículo en Inglés | MEDLINE | ID: mdl-38233745

RESUMEN

BACKGROUND: Epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all RNA types. Precise recognition of RNA modifications is critical for understanding their functions and regulatory mechanisms. However, wet experimental methods are often costly and time-consuming, limiting their wide range of applications. Therefore, recent research has focused on developing computational methods, particularly deep learning (DL). Bidirectional long short-term memory (BiLSTM), convolutional neural network (CNN), and the transformer have demonstrated achievements in modification site prediction. However, BiLSTM cannot achieve parallel computation, leading to a long training time, CNN cannot learn the dependencies of the long distance of the sequence, and the Transformer lacks information interaction with sequences at different scales. This insight underscores the necessity for continued research and development in natural language processing (NLP) and DL to devise an enhanced prediction framework that can effectively address the challenges presented. RESULTS: This study presents a multi-scale self- and cross-attention network (MSCAN) to identify the RNA methylation site using an NLP and DL way. Experiment results on twelve RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) reveal that the area under the receiver operating characteristic of MSCAN obtains respectively 98.34%, 85.41%, 97.29%, 96.74%, 99.04%, 79.94%, 76.22%, 65.69%, 92.92%, 92.03%, 95.77%, 89.66%, which is better than the state-of-the-art prediction model. This indicates that the model has strong generalization capabilities. Furthermore, MSCAN reveals a strong association among different types of RNA modifications from an experimental perspective. A user-friendly web server for predicting twelve widely occurring human RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) is available at http://47.242.23.141/MSCAN/index.php . CONCLUSIONS: A predictor framework has been developed through binary classification to predict RNA methylation sites.


Asunto(s)
Metilación de ARN , ARN , Humanos , ARN/genética , Redes Neurales de la Computación , Metilación , Procesamiento Postranscripcional del ARN
13.
J Proteome Res ; 23(2): 834-843, 2024 02 02.
Artículo en Inglés | MEDLINE | ID: mdl-38252705

RESUMEN

In shotgun proteomics, the proteome search engine analyzes mass spectra obtained by experiments, and then a peptide-spectra match (PSM) is reported for each spectrum. However, most of the PSMs identified are incorrect, and therefore various postprocessing software have been developed for reranking the peptide identifications. Yet these methods suffer from issues such as dependency on distribution, reliance on shallow models, and limited effectiveness. In this work, we propose AttnPep, a deep learning model for rescoring PSM scores that utilizes the Self-Attention module. This module helps the neural network focus on features relevant to the classification of PSMs and ignore irrelevant features. This allows AttnPep to analyze the output of different search engines and improve PSM discrimination accuracy. We considered a PSM to be correct if it achieves a q-value <0.01 and compared AttnPep with existing mainstream software PeptideProphet, Percolator, and proteoTorch. The results indicated that AttnPep found an average increase in correct PSMs of 9.29% relative to the other methods. Additionally, AttnPep was able to better distinguish between correct and incorrect PSMs and found more synthetic peptides in the complex SWATH data set.


Asunto(s)
Algoritmos , Aprendizaje Profundo , Proteómica/métodos , Espectrometría de Masas en Tándem/métodos , Péptidos , Programas Informáticos , Bases de Datos de Proteínas
14.
Proteins ; 92(3): 395-410, 2024 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-37915276

RESUMEN

Interaction between proteins and nucleic acids is crucial to many cellular activities. Accurately detecting nucleic acid-binding residues (NABRs) in proteins can help researchers better understand the interaction mechanism between proteins and nucleic acids. Structure-based methods can generally make more accurate predictions than sequence-based methods. However, the existing structure-based methods are sensitive to protein conformational changes, causing limited generalizability. More effective and robust approaches should be further explored. In this study, we propose iNucRes-ASSH to identify nucleic acid-binding residues with a self-attention-based structure-sequence hybrid neural network. It improves the generalizability and robustness of NABR prediction from two levels: residue representation and prediction model. Experimental results show that iNucRes-ASSH can predict the nucleic acid-binding residues even when the experimentally validated structures are unavailable and outperforms five competing methods on a recent benchmark dataset and a widely used test dataset.


Asunto(s)
Algoritmos , Ácidos Nucleicos , Proteínas/química , Redes Neurales de la Computación
15.
BMC Genomics ; 25(1): 242, 2024 Mar 05.
Artículo en Inglés | MEDLINE | ID: mdl-38443802

RESUMEN

BACKGROUND: 5-Methylcytosine (5mC) plays a very important role in gene stability, transcription, and development. Therefore, accurate identification of the 5mC site is of key importance in genetic and pathological studies. However, traditional experimental methods for identifying 5mC sites are time-consuming and costly, so there is an urgent need to develop computational methods to automatically detect and identify these 5mC sites. RESULTS: Deep learning methods have shown great potential in the field of 5mC sites, so we developed a deep learning combinatorial model called i5mC-DCGA. The model innovatively uses the Convolutional Block Attention Module (CBAM) to improve the Dense Convolutional Network (DenseNet), which is improved to extract advanced local feature information. Subsequently, we combined a Bidirectional Gated Recurrent Unit (BiGRU) and a Self-Attention mechanism to extract global feature information. Our model can learn feature representations of abstract and complex from simple sequence coding, while having the ability to solve the sample imbalance problem in benchmark datasets. The experimental results show that the i5mC-DCGA model achieves 97.02%, 96.52%, 96.58% and 85.58% in sensitivity (Sn), specificity (Sp), accuracy (Acc) and matthews correlation coefficient (MCC), respectively. CONCLUSIONS: The i5mC-DCGA model outperforms other existing prediction tools in predicting 5mC sites, and it is currently the most representative promoter 5mC site prediction tool. The benchmark dataset and source code for the i5mC-DCGA model can be found in https://github.com/leirufeng/i5mC-DCGA .


Asunto(s)
5-Metilcitosina , Benchmarking , Regiones Promotoras Genéticas , Proyectos de Investigación , Programas Informáticos
16.
BMC Genomics ; 25(1): 86, 2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38254021

RESUMEN

BACKGROUND AND OBJECTIVES: Comprehensive analysis of multi-omics data is crucial for accurately formulating effective treatment plans for complex diseases. Supervised ensemble methods have gained popularity in recent years for multi-omics data analysis. However, existing research based on supervised learning algorithms often fails to fully harness the information from unlabeled nodes and overlooks the latent features within and among different omics, as well as the various associations among features. Here, we present a novel multi-omics integrative method MOSEGCN, based on the Transformer multi-head self-attention mechanism and Graph Convolutional Networks(GCN), with the aim of enhancing the accuracy of complex disease classification. MOSEGCN first employs the Transformer multi-head self-attention mechanism and Similarity Network Fusion (SNF) to separately learn the inherent correlations of latent features within and among different omics, constructing a comprehensive view of diseases. Subsequently, it feeds the learned crucial information into a self-ensembling Graph Convolutional Network (SEGCN) built upon semi-supervised learning methods for training and testing, facilitating a better analysis and utilization of information from multi-omics data to achieve precise classification of disease subtypes. RESULTS: The experimental results show that MOSEGCN outperforms several state-of-the-art multi-omics integrative analysis approaches on three types of omics data: mRNA expression data, microRNA expression data, and DNA methylation data, with accuracy rates of 83.0% for Alzheimer's disease and 86.7% for breast cancer subtyping. Furthermore, MOSEGCN exhibits strong generalizability on the GBM dataset, enabling the identification of important biomarkers for related diseases. CONCLUSION: MOSEGCN explores the significant relationship information among different omics and within each omics' latent features, effectively leveraging labeled and unlabeled information to further enhance the accuracy of complex disease classification. It also provides a promising approach for identifying reliable biomarkers, paving the way for personalized medicine.


Asunto(s)
Enfermedad de Alzheimer , Multiómica , Humanos , Metilación de ADN , Algoritmos , Biomarcadores
17.
BMC Genomics ; 25(1): 423, 2024 Apr 29.
Artículo en Inglés | MEDLINE | ID: mdl-38684946

RESUMEN

BACKGROUND: Single-cell clustering has played an important role in exploring the molecular mechanisms about cell differentiation and human diseases. Due to highly-stochastic transcriptomics data, accurate detection of cell types is still challenged, especially for RNA-sequencing data from human beings. In this case, deep neural networks have been increasingly employed to mine cell type specific patterns and have outperformed statistic approaches in cell clustering. RESULTS: Using cross-correlation to capture gene-gene interactions, this study proposes the scCompressSA method to integrate topological patterns from scRNA-seq data, with support of self-attention (SA) based coefficient compression (CC) block. This SA-based CC block is able to extract and employ static gene-gene interactions from scRNA-seq data. This proposed scCompressSA method has enhanced clustering accuracy in multiple benchmark scRNA-seq datasets by integrating topological and temporal features. CONCLUSION: Static gene-gene interactions have been extracted as temporal features to boost clustering performance in single-cell clustering For the scCompressSA method, dual-channel SA based CC block is able to integrate topological features and has exhibited extraordinary detection accuracy compared with previous clustering approaches that only employ temporal patterns.


Asunto(s)
Análisis de la Célula Individual , Análisis de la Célula Individual/métodos , Análisis por Conglomerados , Humanos , Epistasis Genética , Análisis de Secuencia de ARN/métodos , Redes Reguladoras de Genes , Biología Computacional/métodos , Perfilación de la Expresión Génica/métodos , Algoritmos , Aprendizaje Profundo , Redes Neurales de la Computación
18.
Brief Bioinform ; 23(5)2022 09 20.
Artículo en Inglés | MEDLINE | ID: mdl-35945138

RESUMEN

Protein S-sulfinylation is an important posttranslational modification that regulates a variety of cell and protein functions. This modification has been linked to signal transduction, redox homeostasis and neuronal transmission in studies. Therefore, identification of S-sulfinylation sites is crucial to understanding its structure and function, which is critical in cell biology and human diseases. In this study, we propose a multi-module deep learning framework named DLF-Sul for identification of S-sulfinylation sites in proteins. First, three types of features are extracted including binary encoding, BLOSUM62 and amino acid index. Then, sequential features are further extracted based on these three types of features using bidirectional long short-term memory network. Next, multi-head self-attention mechanism is utilized to filter the effective attribute information, and residual connection helps to reduce information loss. Furthermore, convolutional neural network is employed to extract local deep features information. Finally, fully connected layers acts as classifier that map samples to corresponding label. Performance metrics on independent test set, including sensitivity, specificity, accuracy, Matthews correlation coefficient and area under curve, reach 91.80%, 92.36%, 92.08%, 0.8416 and 96.40%, respectively. The results show that DLF-Sul is an effective tool for predicting S-sulfinylation sites. The source code is available on the website https://github.com/ningq669/DLF-Sul.


Asunto(s)
Aprendizaje Profundo , Aminoácidos , Humanos , Redes Neurales de la Computación , Proteínas/química , Programas Informáticos
19.
Brief Bioinform ; 23(3)2022 05 13.
Artículo en Inglés | MEDLINE | ID: mdl-35368061

RESUMEN

Ribonucleic acid (RNA) is a pivotal nucleic acid that plays a crucial role in regulating many biological activities. Recently, one study utilized a machine learning algorithm to automatically classify RNA structural events generated by a Mycobacterium smegmatis porin A nanopore trap. Although it can achieve desirable classification results, compared with deep learning (DL) methods, this classic machine learning requires domain knowledge to manually extract features, which is sophisticated, labor-intensive and time-consuming. Meanwhile, the generated original RNA structural events are not strictly equal in length, which is incompatible with the input requirements of DL models. To alleviate this issue, we propose a sequence-to-sequence (S2S) module that transforms the unequal length sequence (UELS) to the equal length sequence. Furthermore, to automatically extract features from the RNA structural events, we propose a sequence-to-sequence neural network based on DL. In addition, we add an attention mechanism to capture vital information for classification, such as dwell time and blockage amplitude. Through quantitative and qualitative analysis, the experimental results have achieved about a 2% performance increase (accuracy) compared to the previous method. The proposed method can also be applied to other nanopore platforms, such as the famous Oxford nanopore. It is worth noting that the proposed method is not only aimed at pursuing state-of-the-art performance but also provides an overall idea to process nanopore data with UELS.


Asunto(s)
Aprendizaje Profundo , Nanoporos , Peso Molecular , Extractos Vegetales , ARN/química
20.
Brief Bioinform ; 23(3)2022 05 13.
Artículo en Inglés | MEDLINE | ID: mdl-35380603

RESUMEN

Predicting differentially expressed genes (DEGs) from epigenetics signal data is the key to understand how epigenetics controls cell functional heterogeneity by gene regulation. This knowledge can help developing 'epigenetics drugs' for complex diseases like cancers. Most of existing machine learning-based methods suffer defects in prediction accuracy, interpretability or training speed. To address these problems, in this paper, we propose a Multiple Self-Attention model for predicting DEGs on Epigenetic data (Epi-MSA). Epi-MSA first uses convolutional neural networks for neighborhood bins information embedding, and then employs multiple self-attention encoders on different input epigenetics factors data to learn which locations of genes are important for predicting DEGs. Next it trains a soft attention module to pick out which epigenetics factors are significant. The attention mechanism makes the model interpretable, and the pure matrix operation of self-attention enables the model to be parallel calculated and speeds up the training. Experiments on datasets from the Roadmap Epigenome Project and BluePrint Data Analysis Portal (BDAP) show that the performance of Epi-MSA is better than existing competitive methods, and Epi-MSA also has a smaller standard deviation, which shows that Epi-MSA is effective and stable. In addition, Epi-MSA has a good interpretability, this is confirmed by referring its attention weight matrix with existing biological knowledge.


Asunto(s)
Epigenómica , Neoplasias , Epigénesis Genética , Epigenómica/métodos , Humanos , Aprendizaje Automático , Redes Neurales de la Computación
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA