Results 1 - 20 of 44
1.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37225419

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) detects whole transcriptome signals for large amounts of individual cells and is powerful for determining cell-to-cell differences and investigating the functional characteristics of various cell types. scRNA-seq datasets are usually sparse and highly noisy. Many steps in the scRNA-seq analysis workflow, including reasonable gene selection, cell clustering and annotation, as well as discovering the underlying biological mechanisms from such datasets, are difficult. In this study, we proposed an scRNA-seq analysis method based on the latent Dirichlet allocation (LDA) model. The LDA model estimates a series of latent variables, i.e. putative functions (PFs), from the input raw cell-gene data. Thus, we incorporated the 'cell-function-gene' three-layer framework into scRNA-seq analysis, as this framework is capable of discovering latent and complex gene expression patterns via a built-in model approach and obtaining biologically meaningful results through a data-driven functional interpretation process. We compared our method with four classic methods on seven benchmark scRNA-seq datasets. The LDA-based method performed best in the cell clustering test in terms of both accuracy and purity. By analysing three complex public datasets, we demonstrated that our method could distinguish cell types with multiple levels of functional specialization, and precisely reconstruct cell development trajectories. Moreover, the LDA-based method accurately identified the representative PFs and the representative genes for the cell types/cell stages, enabling data-driven cell cluster annotation and functional interpretation. According to the literature, most of the previously reported marker/functionally relevant genes were recognized.
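
The abstract above reports clustering quality in terms of accuracy and purity. As a minimal sketch of how cluster purity is commonly computed (the function name and the toy labels are illustrative assumptions, not taken from the paper), each cluster is credited with the number of cells carrying its majority true label:

```python
from collections import Counter

def cluster_purity(true_labels, cluster_ids):
    """Fraction of cells that carry the majority true label of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one cell is required")
    # Group the true labels by predicted cluster.
    members = {}
    for truth, cluster in zip(true_labels, cluster_ids):
        members.setdefault(cluster, []).append(truth)
    # For each cluster, count the cells that carry its most common true label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)

# Toy example: six cells in three predicted clusters (labels are made up).
truth = ["T", "T", "B", "B", "NK", "NK"]
pred = [0, 0, 0, 1, 2, 2]
print(cluster_purity(truth, pred))
```

Purity rises trivially as the number of clusters grows, which is one reason it is usually reported alongside a second measure such as accuracy.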


Subjects
Gene Expression Profiling, Single-Cell Analysis, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Transcriptome, Cluster Analysis, Algorithms
2.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35279714

ABSTRACT

Messenger RNA (mRNA) vaccines have shown great potential for anti-tumor therapy due to the advantages in safety, efficacy and industrial production. However, it remains a challenge to identify suitable cancer neoantigens that can be targeted for mRNA vaccines. Abnormal alternative splicing occurs in a variety of tumors, which may result in the translation of abnormal transcripts into tumor-specific proteins. High-throughput technologies make it possible for systematic characterization of alternative splicing as a source of suitable target neoantigens for mRNA vaccine development. Here, we summarized difficulties and challenges for identifying alternative splicing-derived cancer neoantigens from RNA-seq data and proposed a conceptual framework for designing personalized mRNA vaccines based on alternative splicing-derived cancer neoantigens. In addition, several points were presented to spark further discussion toward improving the identification of alternative splicing-derived cancer neoantigens.


Subjects
Alternative Splicing, Neoplasms, Neoplasm Antigens/genetics, Humans, Immunotherapy, Neoplasms/genetics, Neoplasms/therapy, Messenger RNA/genetics, Synthetic Vaccines, mRNA Vaccines
3.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35870203

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for exploring biological phenomena at the single-cell level. The discovery of cell types is one of the major applications for researchers to explore the heterogeneity of cells. Some computational methods have been proposed to solve the problem of scRNA-seq data clustering. However, the unavoidable technical noise and notorious dropouts reduce the accuracy of clustering methods. Here, we propose the Cauchy-based bounded constraint low-rank representation (CBLRR), a low-rank representation-based method that introduces a Cauchy loss function (CLF) and bounded nuclear norm regularization to alleviate these issues. Specifically, as an effective loss function, the CLF is proven to enhance the robustness of the identification of cell types. Then, we adopt the bounded constraint to keep the entry values of single-cell data within the restricted interval. Finally, the performance of CBLRR is evaluated on 15 scRNA-seq datasets and compared with other state-of-the-art methods. The experimental results demonstrate that CBLRR performs accurately and robustly on clustering scRNA-seq data. Furthermore, CBLRR is an effective tool to cluster cells, and provides great potential for downstream analysis of single-cell data. The source code of CBLRR is available online at https://github.com/Ginnay/CBLRR.
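
To illustrate why a Cauchy loss is considered robust to noise, the sketch below compares it with the ordinary squared loss. It assumes the common textbook form (c^2 / 2) * log(1 + (r / c)^2) with a scale parameter c; the exact formulation and scale used in CBLRR may differ.

```python
import math

def cauchy_loss(residual, c=1.0):
    """Cauchy (Lorentzian) loss: (c^2 / 2) * log(1 + (r / c)^2)."""
    return 0.5 * c * c * math.log1p((residual / c) ** 2)

def squared_loss(residual):
    """Ordinary least-squares loss: r^2 / 2."""
    return 0.5 * residual * residual

# Small residuals are penalised almost identically by both losses;
# large residuals (outliers, dropouts) grow only logarithmically under Cauchy.
for r in (0.1, 1.0, 10.0):
    print(r, squared_loss(r), cauchy_loss(r))
```

Because the penalty grows logarithmically rather than quadratically, a single extreme entry contributes far less to the objective than it would under least squares.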


Subjects
Single-Cell Analysis, Software, Algorithms, Cluster Analysis, Gene Expression Profiling/methods, RNA-Seq, RNA Sequence Analysis/methods, Single-Cell Analysis/methods
4.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37740953

ABSTRACT

MOTIVATION: Cell-cell interactions (CCIs) play critical roles in many biological processes such as cellular differentiation, tissue homeostasis, and immune response. With the rapid development of high throughput single-cell RNA sequencing (scRNA-seq) technologies, it is of high importance to identify CCIs from the ever-increasing scRNA-seq data. However, limited by the algorithmic constraints, current computational methods based on statistical strategies ignore some key latent information contained in scRNA-seq data with high sparsity and heterogeneity. RESULTS: Here, we developed a deep learning framework named DeepCCI to identify meaningful CCIs from scRNA-seq data. Applications of DeepCCI to a wide range of publicly available datasets from diverse technologies and platforms demonstrate its ability to predict significant CCIs accurately and effectively. Powered by the flexible and easy-to-use software, DeepCCI can provide the one-stop solution to discover meaningful intercellular interactions and build CCI networks from scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The source code of DeepCCI is available online at https://github.com/JiangBioLab/DeepCCI.


Subjects
Deep Learning, Gene Expression Profiling, RNA Sequence Analysis, Single-Cell Analysis, Software, Cluster Analysis
5.
Nucleic Acids Res ; 50(22): e131, 2022 12 09.
Article in English | MEDLINE | ID: mdl-36250636

ABSTRACT

Recent advances in spatial transcriptomics (ST) have brought unprecedented opportunities to understand tissue organization and function in spatial context. However, it is still challenging to precisely dissect spatial domains with similar gene expression and histology in situ. Here, we present DeepST, an accurate and universal deep learning framework to identify spatial domains, which performs better than the existing state-of-the-art methods on benchmarking datasets of the human dorsolateral prefrontal cortex. Further testing on a breast cancer ST dataset, we showed that DeepST can dissect spatial domains in cancer tissue at a finer scale. Moreover, DeepST can achieve not only effective batch integration of ST data generated from multiple batches or different technologies, but also expandable capabilities for processing other spatial omics data. Together, our results demonstrate that DeepST has the exceptional capacity for identifying spatial domains, making it a desirable tool to gain novel insights from ST studies.


Subjects
Deep Learning, Gene Expression Profiling, Humans, Benchmarking, Gene Expression Profiling/methods, Transcriptome
6.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34254994

ABSTRACT

Epigenetic aberrations play a significant role in the pathophysiological state of colorectal cancer, and global DNA hypomethylation mainly occurs in partial methylation domains (PMDs). However, the distribution of PMDs in individual cells and the heterogeneity between cells are still unclear. In this study, the DNA methylation profiles of colorectal cancer detected by WGBS and scBS-seq were used to depict PMDs in individual cells for the first time. We found that more than half of the entire genome is covered by PMDs. Three subclasses of PMDs have distinct characteristics, and Gain-PMDs cover a higher proportion of protein-coding genes. Gain-PMDs show extensive epigenetic heterogeneity between different cells of the same tumor, and the DNA methylation in cells is affected by the tumor microenvironment. In addition, abnormally elevated promoter methylation in Gain-PMDs may further promote the growth, proliferation and metastasis of tumor cells through transcriptional silencing. The PMDs detected in this study have potential as epigenetic biomarkers and provide new insight for colorectal cancer research based on single-cell methylation data.


Subjects
Colorectal Neoplasms/metabolism, DNA Methylation, Cell Proliferation, Colorectal Neoplasms/pathology, Disease Progression, Genetic Epigenesis, Genetic Heterogeneity, Humans, Genetic Promoter Regions, Single-Cell Analysis, Tumor Microenvironment
7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34415016

ABSTRACT

Accurate prediction of immunogenic peptides recognized by the T cell receptor (TCR) can greatly benefit vaccine development and cancer immunotherapy. However, identifying immunogenic peptides accurately is still a huge challenge. Most of the antigen peptides predicted in silico without considering TCR as a key factor fail to elicit immune responses in vivo. This inevitably leads to costly and time-consuming experimental validation tests for predicted antigens. Therefore, it is necessary to develop novel computational methods for precisely and effectively predicting immunogenic peptides recognized by TCR. Here, we describe DLpTCR, a multimodal ensemble deep learning framework for predicting the likelihood of interaction between single/paired chain(s) of TCR and peptides presented by major histocompatibility complex molecules. To investigate the generality and robustness of the proposed model, COVID-19 data and IEDB data were constructed for independent evaluation. The DLpTCR model exhibited high predictive power, with an area under the curve of up to 0.91 on COVID-19 data when predicting the interaction between peptide and single TCR chain. Additionally, the DLpTCR model achieved an overall accuracy of 81.03% on IEDB data when predicting the interaction between peptide and paired TCR chains. The results demonstrate that DLpTCR has the ability to learn general interaction rules and generalize to antigen peptide recognition by TCR. A user-friendly webserver is available at http://jianglab.org.cn/DLpTCR/. Additionally, a stand-alone software package can be downloaded from https://github.com/jiangBiolab/DLpTCR.


Subjects
COVID-19 Drug Treatment, Epitopes/immunology, Peptides/immunology, T-Cell Antigen Receptors/immunology, SARS-CoV-2/immunology, Amino Acid Sequence/genetics, COVID-19/genetics, COVID-19/immunology, COVID-19/virology, Computer Simulation, Deep Learning, Epitopes/genetics, Humans, Peptides/genetics, Peptides/therapeutic use, Protein Binding/genetics, T-Cell Antigen Receptors/genetics, SARS-CoV-2/genetics, SARS-CoV-2/pathogenicity, Software
8.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34015809

ABSTRACT

The world is facing a pandemic of Corona Virus Disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Adaptive immune responses are essential for SARS-CoV-2 virus clearance. Although a large body of studies has been conducted to investigate the immune mechanism in COVID-19 patients, we still lack a comprehensive understanding of the BCR repertoire in patients. In this study, we used single-cell V(D)J sequencing to characterize the BCR repertoire across convalescent COVID-19 patients. We observed that BCR diversity was significantly reduced in disease compared with healthy controls, and that BCRs tended to skew toward different V gene segments in COVID-19 patients and healthy controls. The CDR3 sequences of the heavy chain in clonal BCRs in patients were more convergent than those in healthy controls. In addition, we discovered increased IgG and IgA isotypes in the disease, including IgG1, IgG3 and IgA1. In all clonal BCRs, IgG isotypes had the most frequent class switch recombination events and the highest somatic hypermutation rate, especially IgG3. Moreover, we found that an IgG3 cluster from different clonal groups had the same IGHV, IGHJ and CDR3 sequences (IGHV4-4-CARLANTNQFYDSSSYLNAMDVW-IGHJ6). Overall, our study provides a comprehensive characterization of the BCR repertoire in COVID-19 patients, which contributes to the understanding of the mechanism of the immune response to SARS-CoV-2 infection.


Subjects
COVID-19/immunology, B-Cell Antigen Receptors/genetics, SARS-CoV-2/immunology, VDJ Exons/genetics, B-Lymphocytes/immunology, COVID-19/genetics, COVID-19/virology, Female, Humans, Immunoglobulin A/genetics, Immunoglobulin A/immunology, Immunoglobulin G/genetics, Immunoglobulin G/immunology, Male, B-Cell Antigen Receptors/immunology, SARS-CoV-2/pathogenicity, Sequence Analysis, Single-Cell Analysis, VDJ Exons/immunology
9.
Genomics ; 114(6): 110486, 2022 11.
Article in English | MEDLINE | ID: mdl-36126833

ABSTRACT

DNA methylation is an important epigenetic modification that occurs in the early stages of tumor formation, so it is of great significance to find the relationship between DNA methylation and cancer. This paper proposes a novel model, iCancer-Pred, to identify cancer and further classify its type. Datasets of DNA methylation information for 7 cancer types were collected from The Cancer Genome Atlas (TCGA). The coefficient of variation is first used to reduce the number of features, and then the elastic network is applied to select important features. Finally, a fully connected neural network is constructed with these selected features. In predicting seven types of cancer, iCancer-Pred achieved an overall accuracy of over 97% with 5-fold cross-validation. For convenience of application, a user-friendly web server is available at http://bioinfo.jcu.edu.cn/cancer or http://121.36.221.79/cancer/. The source codes are freely available for download at https://github.com/Huerhu/iCancer-Pred.
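
The first feature-reduction step mentioned above, filtering by coefficient of variation (CV), can be sketched as follows. This is an illustrative sketch only: the threshold, the use of the population standard deviation, the CpG names and the beta values are all assumptions, not details from the paper.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / abs(m)

def filter_by_cv(feature_columns, threshold):
    """Return the names of features whose CV is at least `threshold`."""
    return [
        name
        for name, column in feature_columns.items()
        if coefficient_of_variation(column) >= threshold
    ]

# Hypothetical methylation beta values for three CpG sites across four samples.
beta = {
    "cg_a": [0.50, 0.50, 0.50, 0.50],  # constant across samples
    "cg_b": [0.10, 0.90, 0.20, 0.80],  # highly variable
    "cg_c": [0.48, 0.52, 0.50, 0.50],  # nearly constant
}
print(filter_by_cv(beta, threshold=0.2))
```

Features that barely vary across samples carry little discriminative signal, so removing them shrinks the input before a more expensive selector such as the elastic net is applied.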


Subjects
DNA Methylation, Neoplasms, Humans, Epigenomics, Neoplasms/genetics
10.
Int J Mol Sci ; 24(5)2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36901929

ABSTRACT

A norm in modern medicine is to prescribe polypharmacy to treat disease. The core concern with the co-administration of drugs is that it may produce adverse drug-drug interaction (DDI), which can cause unexpected bodily injury. Therefore, it is essential to identify potential DDI. Most existing methods in silico only judge whether two drugs interact, ignoring the importance of interaction events to study the mechanism implied in combination drugs. In this work, we propose a deep learning framework named MSEDDI that comprehensively considers multi-scale embedding representations of the drug for predicting drug-drug interaction events. In MSEDDI, we design three-channel networks to process biomedical network-based knowledge graph embedding, SMILES sequence-based notation embedding, and molecular graph-based chemical structure embedding, respectively. Finally, we fuse three heterogeneous features from channel outputs through a self-attention mechanism and feed them to the linear layer predictor. In the experimental section, we evaluate the performance of all methods on two different prediction tasks on two datasets. The results show that MSEDDI outperforms other state-of-the-art baselines. Moreover, we also reveal the stable performance of our model in a broader sample set via case studies.


Subjects
Knowledge Bases, Polypharmacy, Humans, Drug Interactions
11.
Bioinformatics ; 37(2): 171-177, 2021 04 19.
Article in English | MEDLINE | ID: mdl-32766811

ABSTRACT

MOTIVATION: Protein carbonylation is one of the most important oxidative stress-induced post-translational modifications, and is generally characterized by stability, irreversibility and relatively early formation. It plays a significant role in orchestrating various biological processes and has already been demonstrated to be related to many diseases. However, the experimental technologies for carbonylation site identification are not only costly and time-consuming, but also incapable of processing a large number of proteins at a time. Thus, rapidly and effectively identifying carbonylation sites by computational methods will provide key clues for the analysis of the occurrence and development of diseases. RESULTS: In this study, we developed a predictor called iCarPS to identify carbonylation sites based on sequence information. A novel feature encoding scheme, residue conical coordinates combined with their physicochemical properties, was proposed to formulate carbonylated and non-carbonylated protein samples. To remove potentially redundant features and improve the prediction performance, a feature selection technique was used. The accuracy and robustness of iCarPS were demonstrated by experiments on training and independent datasets. Comparison with other published methods showed that the proposed method provides strong performance for carbonylation site identification. AVAILABILITY AND IMPLEMENTATION: Based on the proposed model, a user-friendly webserver and a software package were constructed, which can be freely accessed at http://lin-group.cn/server/iCarPS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Post-Translational Protein Processing, Proteins, Computational Biology, Oxidative Stress, Protein Carbonylation, Proteins/metabolism
12.
Genomics ; 113(2): 456-462, 2021 03.
Article in English | MEDLINE | ID: mdl-33383142

ABSTRACT

T-cell receptor (TCR) is crucial in T cell-mediated virus clearance. To date, TCR bias has been observed in various diseases. However, studies on the TCR repertoire of COVID-19 patients are lacking. Here, we used single-cell V(D)J sequencing to conduct comparative analyses of TCR repertoire between 12 COVID-19 patients and 6 healthy controls, as well as other virus-infected samples. We observed distinct T cell clonal expansion in COVID-19. Further analysis of VJ gene combination revealed 6 VJ pairs significantly increased, while 139 pairs significantly decreased in COVID-19 patients. When considering the VJ combination of α and β chains at the same time, the combination with the highest frequency on COVID-19 was TRAV12-2-J27-TRBV7-9-J2-3. Besides, preferential usage of V and J gene segments was also observed in samples infected by different viruses. Our study provides novel insights on TCR in COVID-19, which contribute to our understanding of the immune response induced by SARS-CoV-2.


Subjects
COVID-19/genetics, High-Throughput Nucleotide Sequencing, T-Cell Antigen Receptors/genetics, SARS-CoV-2, Single-Cell Analysis, COVID-19/immunology, Female, Humans, Male, T-Lymphocytes/immunology
13.
Int J Mol Sci ; 23(19)2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36232325

ABSTRACT

N6,2'-O-dimethyladenosine (m6Am) is a post-transcriptional modification that may be associated with regulatory roles in the control of cellular functions. Therefore, it is crucial to accurately identify transcriptome-wide m6Am sites to understand underlying m6Am-dependent mRNA regulation mechanisms and biological functions. Here, we used three sequence-based feature-encoding schemes, including one-hot, nucleotide chemical property (NCP), and nucleotide density (ND), to represent RNA sequence samples. Additionally, we proposed an ensemble deep learning framework, named DLm6Am, to identify m6Am sites. DLm6Am consists of three similar base classifiers, each of which contains a multi-head attention module, an embedding module with two parallel deep learning sub-modules, a convolutional neural network (CNN) and a Bi-directional long short-term memory (BiLSTM), and a prediction module. To demonstrate the superior performance of our model's architecture, we compared multiple model frameworks with our method by analyzing the training data and independent testing data. Additionally, we compared our model with the existing state-of-the-art computational methods, m6AmPred and MultiRM. The accuracy (ACC) for the DLm6Am model was improved by 6.45% and 8.42% compared to that of m6AmPred and MultiRM on independent testing data, respectively, while the area under receiver operating characteristic curve (AUROC) for the DLm6Am model was increased by 4.28% and 5.75%, respectively. All the results indicate that DLm6Am achieved the best prediction performance in terms of ACC, Matthews correlation coefficient (MCC), AUROC, and the area under precision and recall curves (AUPR). To further assess the generalization performance of our proposed model, we implemented chromosome-level leave-out cross-validation, and found that the obtained AUROC values were greater than 0.83, indicating that our proposed method is robust and can accurately predict m6Am sites.
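
The three sequence encodings named above (one-hot, NCP and ND) can be sketched per position as below. The NCP triples follow the widely used convention (ring structure, functional group, hydrogen-bond strength); the one-hot column order and the concatenation into a single 8-dimensional vector are assumptions for illustration and may not match how DLm6Am feeds its sub-modules.

```python
# Nucleotide chemical property: (purine ring, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}
# One-hot columns in the assumed order A, C, G, U.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}

def encode_rna(sequence):
    """Return one 8-tuple per position: one-hot (4) + NCP (3) + ND (1)."""
    seen = {}
    rows = []
    for position, base in enumerate(sequence.upper(), start=1):
        if base not in NCP:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        seen[base] = seen.get(base, 0) + 1
        # Nucleotide density: share of this base among the first `position` bases.
        density = seen[base] / position
        rows.append(ONE_HOT[base] + NCP[base] + (density,))
    return rows

for row in encode_rna("AUA"):
    print(row)
```

The density column is the only one that depends on position, so it lets an otherwise position-agnostic encoding carry cumulative composition information.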


Subjects
Algorithms, Deep Learning, Base Sequence, Nucleotides, Messenger RNA/genetics
14.
Int J Mol Sci ; 23(24)2022 Dec 07.
Article in English | MEDLINE | ID: mdl-36555143

ABSTRACT

N6-methyladenosine (m6A) is the most abundant modification within eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, existing computational models usually predict m6A sites in a single species only, without considering tissue-level differences, and most of them are built on low-confidence data generated by m6A antibody immunoprecipitation (IP)-based sequencing methods, which restricts the reliability and generalizability of the proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred using ensemble deep learning to accurately identify m6A sites based on high-confidence data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation to achieve an effective stacked model. im6APred achieves an area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that the model is able to learn general methylation rules on RNA bases and generalize to transcriptome-wide m6A identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
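
Since the results above are reported as AUROC, a small sketch of what that number measures may help. The function below uses the rank interpretation (the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half); the labels and scores are made-up toy values, not data from the study.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy predictions: 1 = modified site, 0 = unmodified site.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
print(auroc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; production code would normally use a sort-based implementation from an established library.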


Subjects
Deep Learning, Animals, Reproducibility of Results, RNA/metabolism, Adenosine/genetics, Adenosine/metabolism, Mammals/metabolism, Computational Biology/methods
15.
BMC Cancer ; 21(1): 703, 2021 Jun 15.
Article in English | MEDLINE | ID: mdl-34130646

ABSTRACT

BACKGROUND: Cancer stemness is associated with metastases in kidney renal clear cell carcinoma (KIRC) and negatively correlates with immune infiltrates. Recent stemness evaluation methods based on absolute expression have been proposed to reveal the relationship between stemness and cancer. However, we found that existing methods do not perform well in assessing the stemness of KIRC patients, and they overlook the impact of alternative splicing. Alternative splicing not only progresses during the differentiation of stem cells, but also changes during the acquisition of the stemness features of cancer stem cells. There is an urgent need for a new method to predict KIRC-specific stemness more accurately, so as to help in selecting treatment options. METHODS: The corresponding RNA-Seq data were obtained from The Cancer Genome Atlas (TCGA) data portal. We also downloaded stem cell RNA sequence data from the Progenitor Cell Biology Consortium (PCBC) Synapse Portal. Independent validation sets with large sample sizes and common clinicopathological characteristics were obtained from the Gene Expression Omnibus (GEO) database. We constructed a KIRC-specific stemness prediction model using an algorithm called one-class logistic regression based on the expression and alternative splicing data to predict stemness indices of KIRC patients, and the model was externally validated. We identified stemness-associated alternative splicing events (SASEs) by analyzing differential alternative splicing events between high- and low-stemness groups. Univariate Cox and multivariable logistic regression analyses were carried out to detect the prognosis-related SASEs. The area under the curve (AUC) of the receiver operating characteristic (ROC) was used to evaluate the predictive value of our model. RESULTS: Here, we constructed a KIRC-specific stemness prediction model with an AUC of 0.968. To provide a user-friendly interface to our model for KIRC stemness analysis, we developed KIRC Stemness Calculator and Visualization (KSCV), which is hosted on a Shiny server and can be accessed via a web browser at https://jiang-lab.shinyapps.io/kscv/. When applied to 605 KIRC patients, our stemness indices had a higher correlation with the gender, smoking history and metastasis of the patients than the previous stemness indices, and revealed intratumor heterogeneity at the stemness level. We identified 77 novel SASEs by dividing patients into high- and low-stemness groups with significantly different outcomes, and they had significant correlations with the expression of 17 experimentally validated splicing factors. Both univariate and multivariate survival analyses demonstrated that SASEs closely correlated with the overall survival of patients. CONCLUSIONS: Based on the stemness indices, we found that not only immune infiltration but also alternative splicing events differed significantly at the stemness level. More importantly, we highlight the critical role of these differential alternative splicing events in poor prognosis, and we believe in the potential for their further translation into targets for immunotherapy.


Subjects
Alternative Splicing/genetics, Renal Cell Carcinoma/genetics, Kidney Neoplasms/genetics, Machine Learning/standards, Renal Cell Carcinoma/mortality, Renal Cell Carcinoma/pathology, Humans, Kidney Neoplasms/mortality, Kidney Neoplasms/pathology, Prognosis, Survival Analysis
16.
Bioinformatics ; 35(23): 4922-4929, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31077296

ABSTRACT

MOTIVATION: Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases, and its levels are increased in cancerous tissues. Therefore, it is necessary to detect D in RNA to further understand its functional roles. Since wet-lab experimental techniques for this purpose are time-consuming and laborious, there is an urgent need for computational models that identify D modification sites in RNA. RESULTS: We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequences. In this predictor, the RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. A support vector machine was used to perform the classification. The final model achieved an overall accuracy of 96.18% with an area under the receiver operating characteristic curve of 0.9839 in the jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model. AVAILABILITY AND IMPLEMENTATION: A user-friendly web-server called iRNAD is freely accessible at http://lin-group.cn/server/iRNAD, which will provide convenience and guidance to users for further studying D modification.
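
The jackknife (leave-one-out) test mentioned above holds out each sample once, trains on the remainder, and scores the held-out sample. The sketch below shows that loop only; to stay dependency-free it swaps the paper's support vector machine for a simple nearest-centroid classifier, and the feature vectors are invented toy values.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Assign `query` to the class whose mean feature vector is closest."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1

    def distance(label):
        # Squared Euclidean distance between the class centroid and the query.
        return sum((s / counts[label] - q) ** 2 for s, q in zip(sums[label], query))

    return min(sorted(sums), key=distance)

def jackknife_accuracy(features, labels):
    """Leave each sample out once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

# Toy data: 1 = modified site, 0 = unmodified site (values are made up).
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.8, 1.0], [0.9, 1.1]]
toy_y = [0, 0, 0, 1, 1, 1]
print(jackknife_accuracy(toy_x, toy_y))
```

Because every sample is tested exactly once and the split is deterministic, the jackknife yields a single reproducible figure for a given dataset, at the cost of training the model once per sample.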


Subjects
Support Vector Machine, Base Sequence, Computational Biology, Nucleotides, RNA, Reproducibility of Results
17.
Genomics ; 111(6): 1785-1793, 2019 12.
Article in English | MEDLINE | ID: mdl-30529532

ABSTRACT

The promoter is a regulatory DNA region about 81-1000 base pairs long, usually located upstream of a given gene near the transcription start site (TSS). By binding proteins called transcription factors, the promoter provides the starting point for regulated gene transcription, and hence plays a vitally important role in gene transcriptional regulation. With the explosive growth of DNA sequences in the post-genomic age, it has become an urgent challenge to develop computational methods for effectively identifying promoters, because the information thus obtained is very useful for both basic research and drug development. Although some prediction methods have been developed in this regard, most of them are limited to identifying whether a query DNA sequence is a promoter or not. However, based on their distinct strength levels for transcriptional activation and expression, promoters should be divided into two categories: strong and weak types. Here a new two-layer predictor, called "iPSW(2L)-PseKNC", was developed by fusing the physicochemical properties of nucleotides and their nucleotide density into PseKNC (pseudo K-tuple nucleotide composition). Its 1st layer serves to predict whether a query DNA sequence sample is a promoter or not, while its 2nd layer predicts the strength of promoters. It has been observed through rigorous cross-validations that the 1st-layer sub-predictor is remarkably superior to the existing state-of-the-art predictors in identifying promoters and non-promoters, and that the 2nd-layer sub-predictor can do what is beyond the reach of the existing predictors. Moreover, the web-server for iPSW(2L)-PseKNC has been established at http://www.jci-bioinfo.cn/iPSW(2L)-PseKNC, by which the majority of experimental scientists can easily get the results they need.
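
The "K-tuple nucleotide composition" underlying PseKNC is, at its core, a normalised k-mer frequency vector. The sketch below computes only that base component; it deliberately omits the pseudo components (physicochemical correlation terms) and the nucleotide-density fusion described in the abstract, and the function name and default k are illustrative assumptions.

```python
from itertools import product

def kmer_composition(sequence, k=2, alphabet="ACGT"):
    """Normalised frequency of every k-mer over `alphabet`, in lexicographic order."""
    sequence = sequence.upper()
    total = len(sequence) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        window = sequence[i:i + k]
        if window in counts:  # windows containing ambiguous bases (e.g. N) are skipped
            counts[window] += 1
    return [counts[kmer] / total for kmer in kmers]

vector = kmer_composition("ACGTAC", k=2)
print(len(vector), vector[1])  # 16 dinucleotides; index 1 corresponds to "AC"
```

A fixed-length vector like this lets sequences of different lengths be fed to the same classifier, which is what makes a two-layer design (promoter or not, then strong or weak) straightforward to build on top.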


Subjects
Base Sequence, Genetic Promoter Regions, DNA Sequence Analysis, Software, Transcription Initiation Site, Transcriptional Activation
18.
Genomics ; 110(5): 239-246, 2018 09.
Article in English | MEDLINE | ID: mdl-29107015

ABSTRACT

Lysine crotonylation (Kcr) is an evolution-conserved histone posttranslational modification (PTM), occurring in both human somatic and mouse male germ cell genomes. It is important for male germ cell differentiation. Information of Kcr sites in proteins is very useful for both basic research and drug development. But it is time-consuming and expensive to determine them by experiments alone. Here, we report a novel predictor called iKcr-PseEns that is established by incorporating five tiers of amino acid pairwise couplings into the general pseudo amino acid composition. It has been observed via rigorous cross-validations that the new predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 90.53%, 95.27%, 94.49%, and 0.826, respectively. For the convenience of most experimental scientists, a user-friendly web-server for iKcr-PseEns has been established at http://www.jci-bioinfo.cn/iKcr-PseEns, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
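
The four figures quoted above (Sn, Sp, Acc, MCC) all derive from a binary confusion matrix. The sketch below shows the standard formulas; MCC conventionally denotes the Matthews correlation coefficient. The counts in the example are invented for illustration and are not the paper's data, and the function assumes both classes are non-empty.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (Sn, Sp, Acc, MCC) from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity: recall on true sites
    sp = tn / (tn + fp)                      # specificity: recall on non-sites
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

print(binary_metrics(tp=90, fn=10, tn=80, fp=20))
```

Unlike accuracy, MCC stays informative when positives are much rarer than negatives, which is the usual situation for modification-site datasets.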


Subjects
Histones/metabolism , Post-Translational Protein Processing , Protein Sequence Analysis/methods , Software , Crotonates/chemistry , Crotonates/metabolism , Histones/chemistry , Humans , Lysine/chemistry , Lysine/metabolism
19.
Molecules ; 24(3)2019 Jan 22.
Article in English | MEDLINE | ID: mdl-30678171

ABSTRACT

As an abundant post-transcriptional modification, dihydrouridine (D) has been found in transfer RNA (tRNA) from bacteria, eukaryotes, and archaea. Nonetheless, knowledge of the exact biochemical roles of dihydrouridine in mediating tRNA function is still limited. Accurate identification of the position of D sites is essential for understanding their functions. Therefore, it is desirable to develop novel methods to identify D sites. In this study, an ensemble classifier was proposed for the detection of D modification sites in the Saccharomyces cerevisiae transcriptome by using heterogeneous features. The jackknife test results demonstrate that the proposed predictor is promising for the identification of D modification sites. It is anticipated that the proposed method can be widely used for identifying D modification sites in tRNA.
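The jackknife test mentioned above is leave-one-out validation: each sample in turn is held out, the model is trained on the rest, and the held-out sample is predicted. A minimal sketch follows, assuming a simple 1-nearest-neighbour classifier as a stand-in; the ensemble classifier and heterogeneous features of the paper are not reproduced, and all names here are hypothetical.

```python
def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier: return the label of the closest training sample."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]


def jackknife_accuracy(samples, labels, train_and_predict):
    """Leave-one-out (jackknife) accuracy for any train_and_predict callable."""
    correct = 0
    for i in range(len(samples)):
        # Hold out sample i and train on all remaining samples.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if train_and_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    # Toy one-dimensional feature vectors, for illustration only.
    xs = [[0.0], [0.1], [1.0], [1.1]]
    ys = [0, 0, 1, 1]
    print(jackknife_accuracy(xs, ys, nearest_neighbor))
```

Because every sample is tested exactly once and there is no random split, the jackknife yields a single deterministic result for a given dataset, at the cost of training the model once per sample.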


Subjects
Transfer RNA/chemistry , Saccharomyces cerevisiae/chemistry , Support Vector Machine , Uridine/chemistry , Algorithms , Chemical Phenomena , Nucleic Acid Conformation , Reproducibility of Results , Uridine/analogs & derivatives
20.
Bioinformatics ; 32(20): 3116-3123, 2016 10 15.
Article in English | MEDLINE | ID: mdl-27334473

ABSTRACT

MOTIVATION: Post-translational modification (PTM) refers to the change of the amino acid side chains of a protein after its biosynthesis. Owing to its significance for in-depth understanding of various biological processes and for developing effective drugs, prediction of PTM sites in proteins has become a hot topic in bioinformatics. Although many computational methods have been established to identify various single-label PTM types and their occurrence sites in proteins, no method has been developed for multi-label PTM types. As one of the most frequently observed PTMs, K-PTM, namely modification occurring at lysine (K), can usually accommodate many different types, such as 'acetylation', 'crotonylation', 'methylation' and 'succinylation'. This raises an interesting challenge: given an uncharacterized protein sequence containing many K residues, which ones can accommodate two or more types of PTM, which ones only one, and which ones none? RESULTS: To address this problem, a multi-label predictor called iPTM-mLys has been developed. It represents the first multi-label PTM predictor ever established. The novel predictor is featured by incorporating the sequence-coupled effects into the general PseAAC, and by fusing an array of basic random forest classifiers into an ensemble system. Rigorous cross-validations via a set of multi-label metrics indicate that the first multi-label PTM predictor is very promising. AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a user-friendly web server for iPTM-mLys has been established at http://www.jci-bioinfo.cn/iPTM-mLys, through which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. CONTACT: wqiu@gordonlifescience.org, xxiao@gordonlifescience.org, kcchou@gordonlifescience.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
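In a multi-label setting, each lysine site carries a set of PTM types (possibly empty) rather than a single class, so evaluation compares sets. The sketch below shows two generic set-based scores, exact-match ratio and mean Jaccard overlap. These are assumptions chosen for illustration; the abstract does not name the specific multi-label metrics used for iPTM-mLys, and the function name is hypothetical.

```python
def multilabel_scores(true_sets, pred_sets):
    """Return (exact_match_ratio, mean_jaccard) over paired label sets."""
    n = len(true_sets)
    exact = 0
    jaccard_sum = 0.0
    for truth, pred in zip(true_sets, pred_sets):
        if truth == pred:
            exact += 1
        union = truth | pred
        # Two empty sets (an unmodified site predicted as unmodified) agree fully.
        jaccard_sum += len(truth & pred) / len(union) if union else 1.0
    return exact / n, jaccard_sum / n


if __name__ == "__main__":
    # Toy labels for three lysine sites, for illustration only.
    truth = [{"acetylation", "methylation"}, {"succinylation"}, set()]
    pred = [{"acetylation"}, {"succinylation"}, set()]
    print(multilabel_scores(truth, pred))
```

The exact-match ratio penalizes any partially correct prediction as a full miss, while the Jaccard score gives partial credit, so reporting both gives a fuller picture of how a multi-label predictor behaves.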


Subjects
Algorithms , Lysine , Post-Translational Protein Processing , Amino Acids , Animals , Humans , Proteins/metabolism