1.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33169141

ABSTRACT

MOTIVATION: N7-methylguanosine (m7G) is an important epigenetic modification that plays an essential role in gene expression regulation. Accurate identification of m7G modifications will therefore help to reveal and understand their potential functional mechanisms. Although high-throughput experimental methods can precisely locate m7G sites, they remain cost-ineffective, so new methods for identifying m7G sites are needed. RESULTS: In this work, using the iterative feature representation algorithm, we developed a machine-learning-based method, m7G-IFL, to identify m7G sites. To demonstrate its superiority, m7G-IFL was evaluated and compared with existing predictors. The results demonstrate that our predictor outperforms existing predictors in accuracy for identifying m7G sites. By analyzing and comparing the features used in the predictors, we found that the positive and negative samples were more separated in our feature space than in the existing feature spaces. This result indicates that our features captured more discriminative information through the iterative feature learning process, which contributed to the improvement in predictive performance.


Subjects
DNA Methylation , Epigenesis, Genetic , Guanosine/analogs & derivatives , Support Vector Machine , Guanosine/genetics , Guanosine/metabolism , HeLa Cells , Hep G2 Cells , Humans
2.
Methods ; 203: 28-31, 2022 07.
Article in English | MEDLINE | ID: mdl-33882361

ABSTRACT

The 5-methyluridine (m5U) modification plays important roles in a series of biological processes. Accurate identification of m5U sites will help to decode its biological functions. Although experimental techniques have been proposed to detect m5U, they are still expensive and time-consuming. In the present work, a support vector machine-based method, called iRNA-m5U, was developed to identify m5U sites in the Saccharomyces cerevisiae transcriptome. The performance of iRNA-m5U was validated on different datasets. The accuracies obtained by iRNA-m5U are promising, indicating that it has the potential to become a useful tool for the identification of m5U sites.


Subjects
Saccharomyces cerevisiae , Support Vector Machine , Computational Biology/methods , Saccharomyces cerevisiae/genetics , Transcriptome , Uridine/analogs & derivatives
3.
Bioinformatics ; 35(23): 4922-4929, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31077296

ABSTRACT

MOTIVATION: Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases, and its levels are increased in cancerous tissues. Therefore, it is necessary to detect D in RNA to further understand its functional roles. Since wet-lab experimental techniques for this purpose are time-consuming and laborious, computational models are urgently needed to identify D modification sites in RNA. RESULTS: We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequences. In this predictor, the RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. A support vector machine was used to perform the classification. The final model achieved an overall accuracy of 96.18% with an area under the receiver operating characteristic curve of 0.9839 in the jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model. AVAILABILITY AND IMPLEMENTATION: A user-friendly web-server called iRNAD is freely accessible at http://lin-group.cn/server/iRNAD, which will provide convenience and guidance to users for further studying D modification.
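
The abstract names two encodings, nucleotide chemical property and nucleotide density, without defining them. The sketch below shows one convention that appears in the literature and is assumed here rather than taken from this record: each base gets three binary flags (ring structure, functional group, hydrogen-bond strength) plus the running frequency of that base up to its position. The function name `encode_ncp_nd` and the exact flag values are illustrative assumptions.

```python
# Assumed chemical-property flags per base:
# (purine ring?, amino group?, weak hydrogen bonding?).
# This table is a common convention, not something stated in the abstract.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp_nd(seq):
    """Return a flat feature list with 4 values per nucleotide.

    For position i (1-based) the density is the number of occurrences of
    the base at position i within seq[:i], divided by i.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = {}
    features = []
    for i, base in enumerate(seq, start=1):
        if base not in NCP:
            raise ValueError(f"unsupported nucleotide {base!r} at position {i}")
        counts[base] = counts.get(base, 0) + 1
        density = counts[base] / i
        features.extend([*NCP[base], density])
    return features


if __name__ == "__main__":
    print(encode_ncp_nd("AUA"))
```

A sequence of length n yields a vector of length 4n, which could then be passed to a classifier such as a support vector machine.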


Subjects
Support Vector Machine , Base Sequence , Computational Biology , Nucleotides , RNA , Reproducibility of Results
4.
Genomics ; 111(1): 96-102, 2019 01.
Article in English | MEDLINE | ID: mdl-29360500

ABSTRACT

N6-methyladenine (6mA) is one kind of post-replication modification (PTM or PTRM) occurring in a wide range of DNA sequences. Accurate identification of its sites will be very helpful for revealing the biological functions of 6mA, but it is time-consuming and expensive to determine them by experiments alone. Unfortunately, so far, no bioinformatics tool is available to do so. To fill in such an empty area, we have proposed a novel predictor called iDNA6mA-PseKNC that is established by incorporating nucleotide physicochemical properties into Pseudo K-tuple Nucleotide Composition (PseKNC). It has been observed via rigorous cross-validations that the predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 93%, 100%, 96%, and 0.93, respectively. For the convenience of most experimental scientists, a user-friendly web server for iDNA6mA-PseKNC has been established at http://lin-group.cn/server/iDNA6mA-PseKNC, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
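
The four scores quoted above (Sn, Sp, Acc, MCC) are standard functions of a binary confusion matrix. A minimal sketch of how they are usually computed follows; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not the paper's data.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy and MCC from raw counts."""

    def ratio(num, den):
        # Return 0.0 instead of failing when a class is empty.
        return num / den if den else 0.0

    sn = ratio(tp, tp + fn)                   # sensitivity (recall on positives)
    sp = ratio(tn, tn + fp)                   # specificity (recall on negatives)
    acc = ratio(tp + tn, tp + tn + fp + fn)   # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, denom)     # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the function.
    print(binary_metrics(tp=93, tn=100, fp=0, fn=7))
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes differ greatly in size.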


Subjects
Adenosine/analogs & derivatives , Computational Biology , Nucleotides/chemistry , Adenosine/analysis , Adenosine/chemistry , Algorithms , Animals , Base Sequence , DNA/chemistry , Data Accuracy , Databases, Genetic , Genome, Bacterial , Genome, Helminth , Genome, Plant , Sensitivity and Specificity , Software , Software Validation
5.
Molecules ; 24(3)2019 Jan 22.
Article in English | MEDLINE | ID: mdl-30678171

ABSTRACT

As an abundant post-transcriptional modification, dihydrouridine (D) has been found in transfer RNA (tRNA) from bacteria, eukaryotes, and archaea. Nonetheless, knowledge of the exact biochemical roles of dihydrouridine in mediating tRNA function is still limited. Accurate identification of the position of D sites is essential for understanding their functions. Therefore, it is desirable to develop novel methods to identify D sites. In this study, an ensemble classifier was proposed for the detection of D modification sites in the Saccharomyces cerevisiae transcriptome by using heterogeneous features. The jackknife test results demonstrate that the proposed predictor is promising for the identification of D modification sites. It is anticipated that the proposed method can be widely used for identifying D modification sites in tRNA.


Subjects
RNA, Transfer/chemistry , Saccharomyces cerevisiae/chemistry , Support Vector Machine , Uridine/chemistry , Algorithms , Chemical Phenomena , Nucleic Acid Conformation , Reproducibility of Results , Uridine/analogs & derivatives
6.
Bioinformatics ; 33(22): 3518-3523, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-28961687

ABSTRACT

MOTIVATION: DNA N4-methylcytosine (4mC) is an epigenetic modification. The knowledge about the distribution of 4mC is helpful for understanding its biological functions. Although experimental methods have been proposed to detect 4mC sites, they are expensive for performing genome-wide detections. Thus, it is necessary to develop computational methods for predicting 4mC sites. RESULTS: In this work, we developed iDNA4mC, the first webserver to identify 4mC sites, in which DNA sequences are encoded with both nucleotide chemical properties and nucleotide frequency. The predictive results of the rigorous jackknife test and cross species test demonstrated that the performance of iDNA4mC is quite promising and holds high potential to become a useful tool for identifying 4mC sites. AVAILABILITY AND IMPLEMENTATION: The user-friendly web-server, iDNA4mC, is freely accessible at http://lin.uestc.edu.cn/server/iDNA4mC. CONTACT: chenweiimu@gmail.com or hlin@uestc.edu.cn.


Subjects
Cytosine/analogs & derivatives , Cytosine/chemistry , DNA Methylation , DNA/chemistry , Nucleotides/chemistry , Software , Cytosine/analysis , Epigenomics , Genome , Sequence Analysis, DNA
7.
Molecules ; 23(8)2018 Aug 10.
Article in English | MEDLINE | ID: mdl-30103458

ABSTRACT

Accurate identification of phage virion protein is not only a key step for understanding the function of the phage virion protein but also helpful for further understanding the lysis mechanism of the bacterial cell. Since traditional experimental methods are time-consuming and costly for identifying phage virion proteins, it is extremely urgent to apply machine learning methods to accurately and efficiently identify phage virion proteins. In this work, a support vector machine (SVM) based method was proposed by mixing multiple sets of optimal g-gap dipeptide compositions. The analysis of variance (ANOVA) and the minimal-redundancy-maximal-relevance (mRMR) with an increment feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful method for scientists concerning phage virion proteins.
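
A g-gap dipeptide composition counts pairs of residues separated by g intervening positions, so g = 0 gives ordinary adjacent dipeptides. The sketch below illustrates that feature under stated assumptions: the function name is hypothetical, pairs containing non-standard residues are skipped, and frequencies are normalized by the number of pairs actually counted. The abstract does not give these details.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature index.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(seq, g):
    """Return 400 frequencies of residue pairs separated by g positions."""
    if g < 0:
        raise ValueError("g must be non-negative")
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    # A sequence of length L contains L - g - 1 pairs with gap g.
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs with non-standard residues (assumption)
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


if __name__ == "__main__":
    vec = g_gap_dipeptide_composition("ACAC", g=1)
    print(vec[PAIRS.index("AA")], vec[PAIRS.index("CC")])
```

Vectors computed for several values of g can be concatenated, and a feature-selection step (the abstract mentions ANOVA and mRMR with IFS) would then pick the subset passed to the classifier.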


Subjects
Bacteriophages , Computational Biology/methods , Support Vector Machine , Viral Proteins/chemistry , Virion , Algorithms , Analysis of Variance , Databases, Protein , ROC Curve , Reproducibility of Results
8.
Genomics ; 107(6): 255-8, 2016 06.
Article in English | MEDLINE | ID: mdl-27191866

ABSTRACT

2'-O-methylation is an important post-transcriptional modification and plays important roles in many biological processes. Although experimental technologies have been proposed to detect 2'-O-methylation sites, they are cost-ineffective. As complements to experimental techniques, computational methods will facilitate the identification of 2'-O-methylation sites. In the present study, we proposed a support vector machine-based method to identify 2'-O-methylation sites. In this method, RNA sequences were formulated by nucleotide chemical properties and nucleotide compositions. In the jackknife cross-validation test, the proposed method obtained an accuracy of 95.58% for identifying 2'-O-methylation sites in the human genome. Moreover, the model was also validated by identifying 2'-O-methylation sites in the Mus musculus and Saccharomyces cerevisiae genomes, and the obtained accuracies are also satisfactory. These results indicate that the proposed method will become a useful tool for research on 2'-O-methylation.


Subjects
Base Sequence/genetics , Genome, Human , Nucleotides/genetics , Protein Processing, Post-Translational/genetics , Animals , Computational Biology , Cytidine/analogs & derivatives , Cytidine/genetics , Humans , Methylation , Methyltransferases/genetics , Mice , Saccharomyces cerevisiae/genetics , Support Vector Machine
9.
Genomics ; 107(2-3): 69-75, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26724497

ABSTRACT

By modulating the accessibility of genomic regions to regulatory proteins, nucleosome positioning plays important roles in cellular processes. Although intensive efforts have been made, the rules that determine nucleosome positioning are still far from well understood. In this study, we developed a biophysical model to predict nucleosomal sequences based on the deformation energy of DNA sequences, and validated it against the experimentally determined nucleosome positions in the Saccharomyces cerevisiae genome, achieving very high success rates. Furthermore, using the deformation energy model, we analyzed the distribution of nucleosomes around the following three types of DNA functional sites: (1) double strand break (DSB), (2) single nucleotide polymorphism (SNP), and (3) origin of replication (ORI). We have found from the analyzed energy spectra that a remarkable "trough" or "valley" occurs around each of these functional sites, implying a depletion of nucleosome density, fully in accordance with experimental observations. These findings indicate that the deformation energy may play a key role in accurately predicting nucleosome positions, and that it can also provide a quantitative physical approach for understanding the mechanism of nucleosome positioning in depth.


Subjects
Nucleosomes/genetics , Nucleosomes/metabolism , Saccharomyces cerevisiae/genetics , Base Sequence , Chromatin Assembly and Disassembly , DNA Breaks, Double-Stranded , Genome, Fungal , Models, Biological , Polymorphism, Single Nucleotide , Replication Origin , Saccharomyces cerevisiae/metabolism
10.
Mol Genet Genomics ; 291(6): 2225-2229, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27590733

ABSTRACT

N6-Methyladenosine (m6A) plays important roles in many biological processes. Knowledge of the distribution of m6A is helpful for understanding its regulatory roles. Although experimental methods have been proposed to detect m6A, their resolution is still unsatisfactory, especially for Arabidopsis thaliana. Building on the available experimental data, in the current work, a support vector machine-based method was proposed to identify m6A sites in the A. thaliana transcriptome. The proposed method was validated on a benchmark dataset using the jackknife test and was also validated by identifying strain-specific m6A sites in A. thaliana. The obtained predictive results indicate that the proposed method is quite promising. For the convenience of experimental biologists, an online webserver for the proposed method was built, which is freely available at http://lin.uestc.edu.cn/server/M6ATH. These results indicate that the proposed method has the potential to become an elegant tool for identifying m6A sites in A. thaliana.


Subjects
Adenosine/analogs & derivatives , Arabidopsis/genetics , Computational Biology/methods , RNA, Plant/chemistry , Adenosine/metabolism , Arabidopsis/chemistry , Binding Sites , RNA, Plant/genetics , RNA, Plant/metabolism , Sequence Analysis, RNA , Support Vector Machine , Transcriptome
11.
Anal Biochem ; 490: 26-33, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26314792

ABSTRACT

Occurring at adenine (A) with the consensus motif GAC, N(6)-methyladenosine (m(6)A) is one of the most abundant modifications in RNA and plays very important roles in many biological processes. The nonuniform distribution of m(6)A sites across the genome implies that, to better understand the regulatory mechanism of m(6)A, it is indispensable to characterize its sites on a genome-wide scale. Although a series of experimental technologies have been developed in this regard, they are both time-consuming and expensive. With the avalanche of RNA sequences generated in the postgenomic age, it is highly desirable to develop computational methods to identify their m(6)A sites in a timely manner. In view of this, a predictor called "iRNA-Methyl" is proposed by formulating RNA sequences with the "pseudo dinucleotide composition" into which three RNA physicochemical properties were incorporated. Rigorous cross-validation tests have indicated that iRNA-Methyl holds very high potential to become a useful tool for genome analysis. For the convenience of most experimental scientists, a web-server for iRNA-Methyl has been established at http://lin.uestc.edu.cn/server/iRNA-Methyl by which users can easily get their desired results without needing to go through the mathematical details.


Subjects
Adenosine/analogs & derivatives , Methyltransferases/metabolism , Models, Molecular , RNA Processing, Post-Transcriptional , RNA, Bacterial/metabolism , RNA, Messenger/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Adenosine/analysis , Adenosine/metabolism , Algorithms , Binding Sites , Entropy , Epigenomics/methods , Genome, Bacterial , Internet , Machine Learning , Methylation , Nucleotide Motifs , RNA, Bacterial/chemistry , RNA, Messenger/chemistry , Reproducibility of Results , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/metabolism , Substrate Specificity
12.
Nucleic Acids Res ; 41(6): e68, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23303794

ABSTRACT

Meiotic recombination is an important biological process. As a main driving force of evolution, recombination provides natural new combinations of genetic variations. Rather than randomly occurring across a genome, meiotic recombination takes place in some genomic regions (the so-called 'hotspots') with higher frequencies, and in the other regions (the so-called 'coldspots') with lower frequencies. Therefore, the information of the hotspots and coldspots would provide useful insights for in-depth studying of the mechanism of recombination and the genome evolution process as well. So far, the recombination regions have been mainly determined by experiments, which are both expensive and time-consuming. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the recombination regions. In this study, a predictor, called 'iRSpot-PseDNC', was developed for identifying the recombination hotspots and coldspots. In the new predictor, the samples of DNA sequences are formulated by a novel feature vector, the so-called 'pseudo dinucleotide composition' (PseDNC), into which six local DNA structural properties, i.e. three angular parameters (twist, tilt and roll) and three translational parameters (shift, slide and rise), are incorporated. It was observed by the rigorous jackknife test that the overall success rate achieved by iRSpot-PseDNC was >82% in identifying recombination spots in Saccharomyces cerevisiae, indicating the new predictor is promising or at least may become a complementary tool to the existing methods in this area. Although the benchmark data set used to train and test the current method was from S. cerevisiae, the basic approaches can also be extended to deal with all the other genomes. Particularly, it has not escaped our notice that the PseDNC approach can be also used to study many other DNA-related problems. 
As a user-friendly web-server, iRSpot-PseDNC is freely accessible at http://lin.uestc.edu.cn/server/iRSpot-PseDNC.


Subjects
Recombination, Genetic , Software , Algorithms , Base Composition , DNA/chemistry , Internet , Nucleotides/analysis , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA
13.
Genomics ; 104(4): 229-33, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25172426

ABSTRACT

As an inheritable epigenetic modification, DNA methylation plays important roles in many biological processes. The non-uniform distribution of DNA methylation across the genome implies that characterizing genome-wide DNA methylation patterns is necessary to better understand the regulatory mechanisms of DNA methylation. Although a series of experimental technologies have been proposed, they are cost-ineffective for DNA methylation status detection. As complements to experimental techniques, computational methods will facilitate the identification of DNA methylation status. In the present study, we proposed a Naïve Bayes model to predict CpG island methylation status. In this model, DNA sequences are formulated by "pseudo trinucleotide composition" into which three DNA physicochemical properties were incorporated. It was observed by the jack-knife test that the overall success rate achieved by the proposed model in predicting the DNA methylation status was 88.22%. This result indicates that the proposed model is a useful tool for DNA methylation status prediction.
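
The jackknife test cited here, and in several other records on this page, is leave-one-out cross-validation: each sample is held out once, the model is trained on the rest, and the held-out sample is predicted. The sketch below shows the procedure with a toy nearest-class-mean classifier standing in for the paper's Naïve Bayes model; all function names and data are hypothetical.

```python
def jackknife_accuracy(X, y, train, predict):
    """Leave-one-out accuracy: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = train(X_train, y_train)
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def train_nearest_mean(X, y):
    """Toy stand-in classifier: store the mean of each class (1-D features)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


if __name__ == "__main__":
    X = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
    y = [0, 0, 0, 1, 1, 1]
    print(jackknife_accuracy(X, y, train_nearest_mean, predict_nearest_mean))
```

Because every sample is used for testing exactly once and no random split is involved, the jackknife result for a given dataset and model is deterministic, at the cost of training the model once per sample.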


Subjects
CpG Islands , DNA Methylation , DNA/chemistry , Sequence Analysis, DNA/methods , Bayes Theorem , Epigenesis, Genetic , Genome, Human , Humans
14.
Anal Biochem ; 462: 76-83, 2014 Oct 01.
Article in English | MEDLINE | ID: mdl-25016190

ABSTRACT

Translation is a key process for gene expression. Timely identification of the translation initiation site (TIS) is very important for conducting in-depth genome analysis. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying TIS. Although some computational methods were proposed in this regard, none of them considered the global or long-range sequence-order effects of DNA, and hence their prediction quality was limited. To account for such effects, a new predictor, called "iTIS-PseTNC," was developed by incorporating the physicochemical properties into the pseudo trinucleotide composition, quite similar to the PseAAC (pseudo amino acid composition) approach widely used in computational proteomics. It was observed by the rigorous cross-validation test on the benchmark dataset that the overall success rate achieved by the new predictor in identifying TIS locations was over 97%. As a web server, iTIS-PseTNC is freely accessible at http://lin.uestc.edu.cn/server/iTIS-PseTNC. To maximize the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without the need to go through detailed mathematical equations, which are presented in this paper just for the integrity of the new prediction method.


Subjects
Algorithms , Genomics/methods , Oligonucleotides/genetics , Peptide Chain Initiation, Translational , Base Sequence , Genome, Human/genetics , Humans , Internet , Support Vector Machine , User-Computer Interface
15.
ScientificWorldJournal ; 2014: 740506, 2014.
Article in English | MEDLINE | ID: mdl-25215331

ABSTRACT

DNase I hypersensitive sites (DHS) are associated with a wide variety of regulatory DNA elements. Knowledge about the locations of DHS is helpful for deciphering the function of noncoding genomic regions. With the accelerating accumulation of genome sequences in the postgenomic age, it is highly desirable to develop cost-effective computational methods to identify DHS. In the present work, a support vector machine-based model was proposed to identify DHS by using the pseudo dinucleotide composition. In the jackknife test, the proposed model obtained an accuracy of 83%, which is competitive with that of the existing method. This result suggests that the proposed model may become a useful tool for DHS identification.


Subjects
Binding Sites , Computational Biology/methods , DNA/chemistry , DNA/metabolism , Deoxyribonuclease I/metabolism , Algorithms , Cell Line , Chromatin/genetics , Chromatin/metabolism , DNA/genetics , Datasets as Topic , Humans
16.
Int J Biol Macromol ; 257(Pt 2): 128802, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38101670

ABSTRACT

Heat shock proteins (HSPs) are crucial cellular stress proteins that react to environmental cues, ensuring the preservation of cellular functions. They also play pivotal roles in orchestrating the immune response and participating in processes associated with cancer. Consequently, the classification of HSPs is of great significance for understanding their biological functions and their roles in various diseases. However, the use of computational methods for identifying and classifying HSPs still faces challenges related to accuracy and interpretability. In this study, we introduced MulCNN-HSP, a novel deep learning model based on multi-scale convolutional neural networks, for identifying and classifying HSPs. Comparative results showed that MulCNN-HSP outperforms or matches existing models in the identification and classification of HSPs. Furthermore, MulCNN-HSP can extract and analyze essential features for the prediction task, enhancing its interpretability. To facilitate its accessibility, we have made MulCNN-HSP available at http://cbcb.cdutcm.edu.cn/HSP/. We hope that MulCNN-HSP will contribute to advancing the study of HSPs and their roles in various biological processes and diseases.


Subjects
Deep Learning , Neoplasms , Humans , Heat-Shock Proteins/metabolism , HSP70 Heat-Shock Proteins/metabolism
17.
Anal Biochem ; 442(1): 118-25, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-23756733

ABSTRACT

Heat shock proteins (HSPs) are a type of functionally related proteins present in all living organisms, both prokaryotes and eukaryotes. They play essential roles in protein-protein interactions such as folding and assisting in the establishment of proper protein conformation and prevention of unwanted protein aggregation. Their dysfunction may cause various life-threatening disorders, such as Parkinson's, Alzheimer's, and cardiovascular diseases. Based on their functions, HSPs are usually classified into six families: (i) HSP20 or sHSP, (ii) HSP40 or J-class proteins, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90, and (vi) HSP100. Although considerable progress has been achieved in discriminating HSPs from other proteins, it is still a big challenge to identify HSPs among their six different functional types according to their sequence information alone. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop a high-throughput computational tool in this regard. To take up such a challenge, a predictor called iHSP-PseRAAAC has been developed by incorporating the reduced amino acid alphabet information into the general form of pseudo amino acid composition. One of the remarkable advantages of introducing the reduced amino acid alphabet is being able to avoid the notorious dimension disaster or overfitting problem in statistical prediction. It was observed that the overall success rate achieved by iHSP-PseRAAAC in identifying the functional types of HSPs among the aforementioned six types was more than 87%, which was derived by the jackknife test on a stringent benchmark dataset in which none of HSPs included has ≥40% pairwise sequence identity to any other in the same subset. It has not escaped our notice that the reduced amino acid alphabet approach can also be used to investigate other protein classification problems. 
As a user-friendly web server, iHSP-PseRAAAC is accessible to the public at http://lin.uestc.edu.cn/server/iHSP-PseRAAAC.
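
A reduced amino acid alphabet merges the 20 residues into a smaller number of clusters, which shrinks the feature space (for example, dipeptide features drop from 400 to the square of the cluster count). The sketch below illustrates the idea only. The six clusters are a hypothetical grouping chosen for this example, not the alphabet used by iHSP-PseRAAAC, and the function names are likewise assumptions.

```python
# Hypothetical clustering for illustration; each residue appears exactly once.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every residue to its cluster index.
RESIDUE_TO_CLUSTER = {
    aa: idx for idx, group in enumerate(CLUSTERS) for aa in group
}


def reduce_sequence(seq):
    """Rewrite a sequence using the first letter of each cluster as its symbol.

    Residues outside the 20 standard amino acids are written as 'X'.
    """
    out = []
    for aa in seq.upper():
        idx = RESIDUE_TO_CLUSTER.get(aa)
        out.append(CLUSTERS[idx][0] if idx is not None else "X")
    return "".join(out)


def reduced_composition(seq):
    """Return the frequency of each cluster, ignoring unknown residues."""
    counts = [0] * len(CLUSTERS)
    total = 0
    for aa in seq.upper():
        idx = RESIDUE_TO_CLUSTER.get(aa)
        if idx is not None:
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    print(reduce_sequence("AVKD"))
    print(reduced_composition("AVKD"))
```

With fewer symbols, each feature is estimated from more observations, which is the usual argument for why a reduced alphabet can limit overfitting on small benchmark datasets.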


Subjects
Amino Acids/analysis , Heat-Shock Proteins/chemistry , Internet , Sequence Analysis, Protein , Software , User-Computer Interface , Amino Acid Sequence , Databases, Protein , Heat-Shock Proteins/genetics , High-Throughput Screening Assays , Oxidation-Reduction
18.
Mol Ther Nucleic Acids ; 32: 28-35, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-36908648

ABSTRACT

The global pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has generated tremendous concern and poses a serious threat to international public health. Phosphorylation is a common post-translational modification affecting many essential cellular processes and is inextricably linked to SARS-CoV-2 infection. Hence, accurate identification of phosphorylation sites will be helpful to understand the mechanisms of SARS-CoV-2 infection and mitigate the ongoing COVID-19 pandemic. In the present study, an attention-based bidirectional gated recurrent unit network, called IPs-GRUAtt, was proposed to identify phosphorylation sites in SARS-CoV-2-infected host cells. Comparative results demonstrated that IPs-GRUAtt surpassed both state-of-the-art machine-learning methods and existing models for identifying phosphorylation sites. Moreover, the attention mechanism made IPs-GRUAtt able to extract the key features from protein sequences. These results demonstrated that the IPs-GRUAtt is a powerful tool for identifying phosphorylation sites. For facilitating its academic use, a freely available online web server for IPs-GRUAtt is provided at http://cbcb.cdutcm.edu.cn/phosphory/.

19.
Int J Biol Macromol ; 242(Pt 2): 124761, 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37156312

ABSTRACT

O-linked glycosylation is one of the most complex post-translational modifications (PTM) of human proteins modulating various cellular metabolic and signaling pathways. Unlike N-glycosylation, the O-glycosylation has non-specific sequence features and unstable glycan core structure, which makes identification of O-glycosites more challenging either by experimental or computational methods. Biochemical experiments to identify O-glycosites in batches are technically and economically demanding. Therefore, development of computation-based methods is greatly warranted. This study constructed a prediction model based on feature fusion for O-glycosites linked to the threonine residues in Homo sapiens. In the training model, we collected and sorted out high-quality human protein data with O-linked threonine glycosites. Seven feature coding methods were fused to represent the sample sequence. By comparison of different algorithms, random forest was selected as the final classifier to construct the classification model. Through 5-fold cross-validation, the proposed model, namely O-GlyThr, performed satisfactorily on both training set (AUC: 0.9308) and independent validation dataset (AUC: 0.9323). Compared with previously published predictors, O-GlyThr achieved the highest ACC of 0.8475 on the independent test dataset. These results demonstrated the high competency of our predictor in identifying O-glycosites on threonine residues. Furthermore, a user-friendly webserver named O-GlyThr (http://cbcb.cdutcm.edu.cn/O-GlyThr/) was developed to assist glycobiologists in the research associated with glycosylation structure and function.
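
The AUC values reported above summarize how well a model's scores rank positive samples above negative ones. One way to compute AUC, shown in the sketch below, uses its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive sample scores higher, with ties counted as half. The function name and the scores are hypothetical, and the quadratic loop is meant for clarity rather than speed.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical classifier scores for three glycosites and three non-sites.
    print(auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, independent of any particular decision threshold.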


Subjects
Protein Processing, Post-Translational , Threonine , Humans , Glycosylation , Algorithms , Computational Biology/methods
20.
Hortic Res ; 10(9): uhad139, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37671073

ABSTRACT

Polygala tenuifolia is a perennial medicinal plant that has been widely used in traditional Chinese medicine for treating mental diseases. However, the lack of genomic resources limits insight into its evolutionary and biological characterization. In the present work, we report the P. tenuifolia genome, the first genome assembly of the Polygalaceae family. We sequenced and assembled this genome by a combination of Illumina, PacBio HiFi, and Hi-C mapping. The assembly includes 19 pseudochromosomes covering ~92.68% of the assembled genome (~769.62 Mb). There are 36 463 protein-coding genes annotated in this genome. Detailed comparative genome analysis revealed that P. tenuifolia experienced two rounds of whole genome duplication that occurred ~39-44 and ~18-20 million years ago, respectively. Accordingly, we systematically reconstructed ancestral chromosomes of P. tenuifolia and inferred its chromosome evolution trajectories from the common ancestor of core eudicots to the present species. Based on the transcriptomics data, enzyme genes and transcription factors involved in the synthesis of triterpenoid saponin in P. tenuifolia were identified. Further analysis demonstrated that whole-genome duplications and tandem duplications play critical roles in the expansion of P450 and UGT gene families, which contributed to the synthesis of triterpenoid saponins. The genome and transcriptome data will not only provide valuable resources for comparative and functional genomic research on Polygalaceae, but also shed light on the synthesis of triterpenoid saponin.
