Results 1 - 20 of 31
1.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34864888

ABSTRACT

Post-translational modification (PTM) refers to the covalent and enzymatic modification of proteins after protein biosynthesis, which orchestrates a variety of biological processes. Detecting PTM sites at the proteome scale is a key step toward an in-depth understanding of their regulatory mechanisms. In this study, we present an integrated method based on eXtreme Gradient Boosting (XGBoost), called iRice-MS, to identify 2-hydroxyisobutyrylation, crotonylation, malonylation, ubiquitination, succinylation and acetylation sites in rice. For each PTM-specific model, we adopted eight feature encoding schemes, including sequence-based features, physicochemical property-based features and spatial mapping information-based features. The optimal feature set was identified for each encoding, and the respective models were established. Extensive experimental results show that iRice-MS consistently displays excellent performance in 5-fold cross-validation and on an independent dataset test. In addition, our approach outperforms other existing tools in terms of AUC value. Based on the proposed model, a web server named iRice-MS was established and is freely accessible at http://lin-group.cn/server/iRice-MS.


Subjects
Oryza , Protein Processing, Post-Translational , Acetylation , Computational Biology , Models, Biological , Oryza/metabolism , Protein Processing, Post-Translational/physiology , Proteome/metabolism , Ubiquitination
2.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34184738

ABSTRACT

The rapid spread of SARS-CoV-2 infection around the globe has caused a massive health and socioeconomic crisis. Identification of phosphorylation sites is an important step for understanding the molecular mechanisms of SARS-CoV-2 infection and the changes within host cell pathways. In this study, we present DeepIPs, the first deep-learning architecture designed specifically to identify phosphorylation sites in host cells infected with SARS-CoV-2. DeepIPs combines a widely used word embedding method with a convolutional neural network-long short-term memory network architecture to make the final prediction. The independent test demonstrates that DeepIPs improves prediction performance compared with existing tools for general phosphorylation site prediction. Based on the proposed model, a web server called DeepIPs was established and is freely accessible at http://lin-group.cn/server/DeepIPs. The source code of DeepIPs is freely available at the repository https://github.com/linDing-group/DeepIPs.


Subjects
COVID-19 Drug Treatment , Phosphorylation/genetics , SARS-CoV-2/chemistry , Software , COVID-19/genetics , COVID-19/virology , Computational Biology , Deep Learning , Humans , Neural Networks, Computer , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity
3.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33751027

ABSTRACT

A DNase I hypersensitive site (DHS) is a region of chromatin that is hypersensitive to cleavage by the DNase I enzyme. DHSs are an important part of the noncoding region and contain a variety of regulatory elements, such as promoters, enhancers, and transcription factor-binding sites. Moreover, disease- (or trait-) related loci are usually enriched in DHS regions. Therefore, the detection of DHS regions is of great significance. In this study, we develop a deep learning-based algorithm to identify whether an unknown sequence region is a potential DHS. The proposed method showed high prediction performance on both training datasets and independent datasets in different cell types and developmental stages, demonstrating that the method performs well in the identification of DHSs. Furthermore, for the convenience of wet-lab researchers, the user-friendly web server iDHS-Deep was established at http://lin-group.cn/server/iDHS-Deep/, with which users can easily distinguish DHSs from non-DHSs and obtain the corresponding developmental stage of the DHS.


Subjects
Arabidopsis/genetics , DNA/genetics , Deep Learning , Deoxyribonuclease I/genetics , Oryza/genetics , Software , Arabidopsis/metabolism , Chromatin/metabolism , Chromatin/ultrastructure , DNA/chemistry , DNA/metabolism , Datasets as Topic , Deoxyribonuclease I/metabolism , Enhancer Elements, Genetic , Genetic Loci , Humans , Internet , Oryza/metabolism , Promoter Regions, Genetic , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic
4.
Brief Bioinform ; 22(2): 1940-1950, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32065211

ABSTRACT

The locations at which genomic DNA replication initiates are defined as origins of replication (ORIs), which regulate the onset of DNA replication and play significant roles in the DNA replication process. The study of ORIs is essential for understanding the cell-division cycle and gene expression regulation. Accurate identification of ORIs by computational methods will provide important clues for DNA replication research and drug development. In this paper, the first integrated predictor, named iORI-Euk, was built to identify ORIs in multiple eukaryotes and multiple cell types. For the predictor, ORI data for seven eukaryotes (Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Pichia pastoris, Schizosaccharomyces pombe and Kluyveromyces lactis) were collected from public databases to construct benchmark datasets. Subsequently, three feature extraction strategies (k-mer, binary encoding, and the combination of k-mer and binary encoding) were used to formulate DNA sequence samples. We also compared the performance of different classification algorithms. The best results were obtained with a support vector machine in the 5-fold cross-validation test and the independent dataset test. Based on the optimal model, an online web server called iORI-Euk (http://lin-group.cn/server/iORI-Euk/) was established for novel ORI identification.


Subjects
Replication Origin , Algorithms , Animals , Cell Line , Cell Line, Tumor , Datasets as Topic , Eukaryota/genetics , Humans , Support Vector Machine
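
The k-mer and binary encodings named in the iORI-Euk abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of k = 2, the frequency normalization, and the handling of ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Assumption: windows containing symbols outside ACGT (e.g. N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def binary_encoding(seq):
    """One-hot encode each nucleotide as four 0/1 values (order A, C, G, T).

    Assumption: an unknown base becomes four zeros.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in ALPHABET)
    return vec


def combined_features(seq, k=2):
    """Concatenate the k-mer and binary vectors (hypothetical combination)."""
    return kmer_frequencies(seq, k) + binary_encoding(seq)
```

For a sequence of length L, the k-mer vector has 4^k entries regardless of L, while the binary vector has 4L entries, so the combined encoding only yields fixed-length inputs when all samples share the same length.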
5.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33634313

ABSTRACT

The three-dimensional (3D) architecture of chromosomes is of crucial importance for transcription regulation and DNA replication. Various high-throughput chromosome conformation capture-based methods have revealed that CTCF-mediated chromatin loops are a major component of 3D architecture. However, CTCF-mediated chromatin loops are cell type specific, and most chromatin interaction capture techniques are time-consuming and labor-intensive, which restricts their use across a large number of cell types. Genomic sequence-based computational models are sophisticated enough to capture important features of chromatin architecture and help to identify chromatin loops. In this work, we develop Deep-loop, a convolutional neural network model, to integrate k-tuple nucleotide frequency component, nucleotide pair spectrum encoding, position conservation, position scoring function and natural vector features for the prediction of chromatin loops. In a series of examinations based on cross-validation, Deep-loop shows excellent performance in the identification of chromatin loops from different cell types. The source code of Deep-loop is freely available at the repository https://github.com/linDing-group/Deep-loop.


Subjects
CCCTC-Binding Factor/genetics , Chromatin/metabolism , Genome, Human , Neural Networks, Computer , CCCTC-Binding Factor/metabolism , Chromatin/ultrastructure , Datasets as Topic , Gene Expression Regulation , Humans , K562 Cells , MCF-7 Cells , Molecular Conformation , Nucleotide Motifs , Software
6.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33279983

ABSTRACT

The protein Yin Yang 1 (YY1) can form dimers that facilitate the interaction between active enhancers and promoter-proximal elements. YY1-mediated enhancer-promoter interaction is a general feature of mammalian gene control. Recently, some computational methods have been developed to characterize the interactions between DNA elements by elucidating important features of chromatin folding; however, no computational methods have been developed for identifying YY1-mediated chromatin loops. In this study, we developed a deep learning algorithm named DeepYY1, based on word2vec, to determine whether a pair of YY1 motifs would form a loop. The proposed models showed high prediction performance (AUCs ≥ 0.93) on both training datasets and testing datasets in different cell types, demonstrating that DeepYY1 performs well in the identification of YY1-mediated chromatin loops. Our study also suggested that sequences play an important role in the formation of YY1-mediated chromatin loops. Furthermore, we briefly discussed the distribution of replication origin sites in the loops. Finally, a user-friendly web server was established, and it can be freely accessed at http://lin-group.cn/server/DeepYY1.


Subjects
Chromatin/metabolism , Databases, Factual , Deep Learning , Models, Biological , YY1 Transcription Factor/metabolism , HCT116 Cells , Humans , K562 Cells
7.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33099604

ABSTRACT

As a newly discovered protein posttranslational modification, histone lysine crotonylation (Kcr) is involved in cellular regulation and human diseases. Various proteomics technologies have been developed to detect Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and labor-intensive, which makes them difficult to apply widely across many species. Computational approaches are cost-effective and can be used in a high-throughput manner to generate relatively precise identifications. In this study, we develop a deep learning-based method, termed Deep-Kcr, for Kcr site prediction by combining sequence-based features, physicochemical property-based features and numerical space-derived information with information gain feature selection. We investigate the performance of a convolutional neural network (CNN) and five commonly used classifiers (long short-term memory network, random forest, LogitBoost, naive Bayes and logistic regression) using 10-fold cross-validation and an independent set test. Results show that the CNN consistently displays the best performance, with high computational efficiency on large datasets. We also compare Deep-Kcr with other existing tools to demonstrate the predictive power and robustness of our method. Based on the proposed model, a web server called Deep-Kcr was established and is freely accessible at http://lin-group.cn/server/Deep-Kcr.


Subjects
Crotonates/metabolism , Databases, Protein , Deep Learning , Protein Processing, Post-Translational , Sequence Analysis, Protein , Acylation , Humans , Lysine/metabolism
8.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34410360

ABSTRACT

The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has led to a dramatic loss of human life worldwide. Despite many efforts, the development of effective drugs and vaccines for this novel virus will take considerable time. Artificial intelligence (AI) and machine learning (ML) offer promising solutions that could accelerate the discovery and optimization of new antivirals. Motivated by this, in this paper, we present an extensive survey on the application of AI and ML for combating COVID-19 based on the rapidly emerging literature. Particularly, we point out the challenges and future directions associated with state-of-the-art solutions to effectively control the COVID-19 pandemic. We hope that this review provides researchers with new insights into the ways AI and ML fight and have fought the COVID-19 outbreak.


Subjects
COVID-19 Drug Treatment , COVID-19 Vaccines/genetics , Drug Discovery , SARS-CoV-2/genetics , Artificial Intelligence , COVID-19/genetics , COVID-19/virology , COVID-19 Vaccines/chemistry , Drug Design , Humans , Machine Learning , Pandemics , SARS-CoV-2/chemistry , SARS-CoV-2/pathogenicity
9.
Methods ; 203: 558-563, 2022 07.
Article in English | MEDLINE | ID: mdl-34352373

ABSTRACT

N4-methylcytosine (4mC) is a type of DNA modification that can regulate several biological processes, such as transcription regulation, replication and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded with the word embedding technique 'word2vec'. The resulting features were fed into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. Evaluation on an independent dataset showed that our model yields an overall accuracy of 0.861, which is about 4.3% higher than that of the existing model. For the convenience of other researchers, we provide the data and source code of the model, which can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.


Subjects
DNA , Escherichia coli , DNA/genetics , Escherichia coli/genetics , Genome , Genomics , Software
10.
Brief Bioinform ; 21(5): 1568-1580, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31633777

ABSTRACT

Meiotic recombination is one of the most important driving forces of biological evolution, which is initiated by double-strand DNA breaks. Recombination has important roles in genome diversity and evolution. This review firstly provides a comprehensive survey of the 15 computational methods developed for identifying recombination hotspots in Saccharomyces cerevisiae. These computational methods were discussed and compared in terms of underlying algorithms, extracted features, predictive capability and practical utility. Subsequently, a more objective benchmark data set was constructed to develop a new predictor iRSpot-Pse6NC2.0 (http://lin-group.cn/server/iRSpot-Pse6NC2.0). To further demonstrate the generalization ability of these methods, we compared iRSpot-Pse6NC2.0 with existing methods on the chromosome XVI of S. cerevisiae. The results of the independent data set test demonstrated that the new predictor is superior to existing tools in the identification of recombination hotspots. The iRSpot-Pse6NC2.0 will become an important tool for identifying recombination hotspot.


Subjects
Computational Biology/methods , Recombination, Genetic , Saccharomyces cerevisiae/genetics , Genes, Fungal
11.
Int J Mol Sci ; 23(3)2022 Jan 23.
Article in English | MEDLINE | ID: mdl-35163174

ABSTRACT

4mC is a type of DNA modification that can regulate multiple biological processes, for example, DNA replication, gene expression, and transcriptional regulation. Accurate prediction of 4mC sites can provide precise information about their genetic functions. The purpose of this study was to establish a robust deep learning model to recognize 4mC sites in Geobacter pickeringii. In the proposed model, two kinds of feature descriptors, namely binary and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii. The features obtained from their fusion were optimized by using a correlation and gradient-boosting decision tree (GBDT)-based algorithm with the incremental feature selection (IFS) method. Then, these optimized features were fed into a 1D convolutional neural network (CNN) to distinguish 4mC sites from non-4mC sites in Geobacter pickeringii. On independent data, the proposed model achieved an accuracy of 0.868, which was 4.2% higher than that of the existing model.


Subjects
Computational Biology/methods , Epigenesis, Genetic/genetics , Geobacter/genetics , Algorithms , Cytosine/metabolism , DNA/genetics , DNA Methylation/genetics , Deep Learning , Machine Learning , Mutation/genetics , Neural Networks, Computer , Software
12.
Biotechnol Bioeng ; 118(11): 4204-4216, 2021 11.
Article in English | MEDLINE | ID: mdl-34370308

ABSTRACT

DNA modification plays a pivotal role in regulating gene expression in cell development. As prevalent markers on DNA, 5-methylcytosine (5mC), N6-methyladenine (6mA), and N4-methylcytosine (4mC) can be recognized by specific methyltransferases, facilitating cellular defense and the versatile regulation of gene expression in eukaryotes and prokaryotes. Recent advances in DNA sequencing technology have permitted the positions of different modifications to be resolved at the genome-wide scale, which has led to the discovery of several novel insights into the complexity and functions of multiple methylations. In this review, we summarize differences in the various mapping approaches and discuss their pros and cons with respect to their relative read depths, speeds, and costs. We also discuss the development of future sequencing technologies and strategies for improving the detection resolution of current sequencing technologies. Lastly, we speculate on the potentially instrumental role that these sequencing technologies might play in future research.


Subjects
5-Methylcytosine/metabolism , Adenosine/analogs & derivatives , DNA Methylation , DNA , Epigenesis, Genetic , Sequence Analysis, DNA , Adenosine/genetics , Adenosine/metabolism , Animals , DNA/genetics , DNA/metabolism , Humans
13.
Bioinformatics ; 35(12): 2075-2083, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30428009

ABSTRACT

MOTIVATION: DNA replication is a key step in maintaining the continuity of genetic information between the parental generation and offspring. The initiation site of DNA replication, also called the origin of replication (ORI), plays an extremely important role in this basic biochemical process. Thus, rapidly and effectively identifying the location of ORIs in a genome will provide key clues for genome analysis. Although biochemical experiments can provide detailed information about ORIs, they are costly and time-consuming. As good complements to experimental techniques, computational methods can overcome these disadvantages. RESULTS: In this study, we developed a predictor called iORI-PseKNC2.0 to identify ORIs in the Saccharomyces cerevisiae genome based on sequence information. PseKNC including 90 physicochemical properties was proposed to formulate ORI and non-ORI samples. To improve the accuracy, a two-step feature selection was proposed to exclude redundant and noisy information. As a result, an overall success rate of 88.53% was achieved in the 5-fold cross-validation test by using a support vector machine. AVAILABILITY AND IMPLEMENTATION: Based on the proposed model, a user-friendly web server was established and can be freely accessed at http://lin-group.cn/server/iORI-PseKNC2.0. The web server should be convenient for most wet-lab researchers.


Subjects
Replication Origin , Saccharomyces cerevisiae , Support Vector Machine , Transcription Initiation Site
14.
Molecules ; 23(8)2018 Aug 10.
Article in English | MEDLINE | ID: mdl-30103458

ABSTRACT

Accurate identification of phage virion proteins is not only a key step for understanding their function but also helpful for further understanding the lysis mechanism of the bacterial cell. Since traditional experimental methods for identifying phage virion proteins are time-consuming and costly, there is an urgent need to apply machine learning methods to identify them accurately and efficiently. In this work, a support vector machine (SVM)-based method was proposed by mixing multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR) with incremental feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for scientists studying phage virion proteins.


Subjects
Bacteriophages , Computational Biology/methods , Support Vector Machine , Viral Proteins/chemistry , Virion , Algorithms , Analysis of Variance , Databases, Protein , ROC Curve , Reproducibility of Results
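
The g-gap dipeptide composition mentioned in the abstract above is commonly defined as the frequency of residue pairs separated by g intervening positions, with g = 0 giving the ordinary adjacent dipeptide composition. The sketch below follows that common definition; it is not taken from the paper, and the function name and the skipping of non-standard residues are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(seq, g=0):
    """Return a 400-dimensional vector of g-gap dipeptide frequencies.

    A g-gap dipeptide pairs residue i with residue i + g + 1.
    Assumption: pairs containing non-standard residues (e.g. X) are skipped.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]
```

Computing this vector for several values of g and concatenating or selecting among the results is one plausible reading of "mixing multiple sets" of compositions; the abstract does not specify the details.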
16.
Molecules ; 22(7)2017 Jun 25.
Article in English | MEDLINE | ID: mdl-28672838

ABSTRACT

Conotoxins are disulfide-rich small peptides, which are invaluable peptides that target ion channel and neuronal receptors. Conotoxins have been demonstrated as potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer's disease, Parkinson's disease, and epilepsy. In addition, conotoxins are also ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research as well. Thus, the accurate identification of conotoxin types will provide key clues for the biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, it is time-consuming and costly to acquire the structure and function information by using biochemical experiments. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we reviewed the current progress in computational identification of conotoxins in the following aspects: (i) construction of benchmark dataset; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides the basis for in-depth study of conotoxins and drug therapy research.


Subjects
Computational Biology/methods , Conotoxins/classification , Benchmarking , Conotoxins/chemistry , Conotoxins/genetics , Machine Learning
17.
Int J Biol Macromol ; 228: 706-714, 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36584777

ABSTRACT

CRISPR-Cas, as a tool for gene editing, has received extensive attention in recent years. Anti-CRISPR (Acr) proteins can inactivate the CRISPR-Cas defense system during interference phase, and can be used as a potential tool for the regulation of gene editing. In-depth study of Anti-CRISPR proteins is of great significance for the implementation of gene editing. In this study, we developed a high-accuracy prediction model based on two-step model fusion strategy, called AcrPred, which could produce an AUC of 0.952 with independent dataset validation. To further validate the proposed model, we compared with published tools and correctly identified 9 of 10 new Acr proteins, indicating the strong generalization ability of our model. Finally, for the convenience of related wet-experimental researchers, a user-friendly web-server AcrPred (Anti-CRISPR proteins Prediction) was established at http://lin-group.cn/server/AcrPred, by which users can easily identify potential Anti-CRISPR proteins.


Subjects
CRISPR-Cas Systems , Gene Editing , CRISPR-Cas Systems/genetics , Algorithms , Machine Learning , Viral Proteins/genetics
18.
Imeta ; 1(1): e11, 2022 Mar.
Article in English | MEDLINE | ID: mdl-38867734

ABSTRACT

As a newly discovered protein posttranslational modification, lysine lactylation (Kla) plays a pivotal role in various cellular processes. High throughput mass spectrometry is the primary approach for the detection of Kla sites. However, experimental approaches for identifying Kla sites are often time-consuming and labor-intensive when compared to computational methods. Therefore, it is desirable to develop a powerful tool for identifying Kla sites. For this purpose, we presented the first computational framework termed as DeepKla for Kla sites prediction in rice by combining supervised embedding layer, convolutional neural network, bidirectional gated recurrent units, and attention mechanism layer. Comprehensive experiment results demonstrated the excellent predictive power and robustness of DeepKla. Based on the proposed model, a web-server called DeepKla was established and is freely accessible at http://lin-group.cn/server/DeepKla. The source code of DeepKla is freely available at the repository https://github.com/linDing-group/DeepKla.

19.
Research (Wash D C) ; 2022: 9780293, 2022.
Article in English | MEDLINE | ID: mdl-36405252

ABSTRACT

DNA replication initiation is a complex process involving various genetic and epigenomic signatures. The correct identification of replication origins (ORIs) could provide important clues for the study of a variety of diseases caused by replication. Here, we design a computational approach named iORI-Epi to recognize ORIs by incorporating epigenome-based features, sequence-based features, and 3D genome-based features. The iORI-Epi displays excellent robustness and generalization ability on both training datasets and independent datasets of K562 cell line. Further experiments confirm that iORI-Epi is highly scalable in other cell lines (MCF7 and HCT116). We also analyze and clarify the regulatory role of epigenomic marks, DNA motifs, and chromatin interaction in DNA replication initiation of eukaryotic genomes. Finally, we discuss gene enrichment pathways from the perspective of ORIs in different replication timing states and heuristically dissect the effect of promoters on replication initiation. Our computational methodology is worth extending to ORI identification in other eukaryotic species.

20.
Front Microbiol ; 13: 790063, 2022.
Article in English | MEDLINE | ID: mdl-35273581

ABSTRACT

Thermophilic proteins have important application value in biotechnology and industrial processes. The correct identification of thermophilic proteins provides important information for the application of these proteins in engineering. Biochemical methods for identifying thermophilic proteins are laborious, time-consuming, and costly. Therefore, there is an urgent need for a fast and accurate method to identify thermophilic proteins. With this in mind, we constructed a reliable benchmark dataset containing 1,368 thermophilic and 1,443 non-thermophilic proteins. A multi-layer perceptron (MLP) model based on a multi-feature fusion strategy was proposed to discriminate thermophilic proteins from non-thermophilic proteins. On an independent dataset, the proposed model achieved an accuracy of 96.26%, which suggests that the model has good application prospects. To make the model convenient to use, a user-friendly software package called iThermo was established and can be freely accessed at http://lin-group.cn/server/iThermo/index.html. The high accuracy of the model and the practicality of the software package indicate that this study can accelerate the discovery and engineering application of thermally stable proteins.
