Results 1 - 19 of 19
1.
Genome Res ; 32(1): 150-161, 2022 01.
Article in English | MEDLINE | ID: mdl-34261731

ABSTRACT

Archived formalin-fixed paraffin-embedded (FFPE) samples are the global standard format for preservation of the majority of biopsies in both basic research and translational cancer studies, and profiling chromatin accessibility in the archived FFPE tissues is fundamental to understanding gene regulation. Accurate mapping of chromatin accessibility from FFPE specimens is challenging because of the high degree of DNA damage. Here, we first showed that standard ATAC-seq can be applied to purified FFPE nuclei but yields lower library complexity and a smaller proportion of long DNA fragments. We then present FFPE-ATAC, the first highly sensitive method for decoding chromatin accessibility in FFPE tissues that combines Tn5-mediated transposition and T7 in vitro transcription. The FFPE-ATAC generates high-quality chromatin accessibility profiles with 500 nuclei from a single FFPE tissue section, enables the dissection of chromatin profiles from the regions of interest with the aid of hematoxylin and eosin (H&E) staining, and reveals disease-associated chromatin regulation from the human colorectal cancer FFPE tissue archived for >10 yr. In summary, the approach allows decoding of the chromatin states that regulate gene expression in archival FFPE tissues, thereby permitting investigators to better understand epigenetic regulation in cancer and precision medicine.


Subjects
Chromatin , Formaldehyde , Chromatin/genetics , Epigenesis, Genetic , Gene Expression Profiling/methods , Humans , Paraffin Embedding/methods , Tissue Fixation/methods
2.
Nucleic Acids Res ; 49(21): e125, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34534335

ABSTRACT

The majority of biopsies in both basic research and translational cancer studies are preserved as archived formalin-fixed paraffin-embedded (FFPE) samples. Profiling histone modifications in archived FFPE tissues is critically important to understand gene regulation in human disease. Current genome-wide histone modification profiling of FFPE samples requires either 10-20 tissue sections or whole tissue blocks as input, which prevents better-resolved analyses. Because clinical tissues of interest are limited, however, it is desirable to consume a minimal amount of FFPE tissue sections in the analysis. Here, we present FFPE tissue with antibody-guided chromatin tagmentation with sequencing (FACT-seq), the first highly sensitive method to efficiently profile histone modifications in FFPE tissues, which combines transposition by a novel fusion protein of hyperactive Tn5 transposase and protein A (T7-pA-Tn5) with T7 in vitro transcription. FACT-seq generates high-quality chromatin profiles for different histone modifications from a low number of FFPE nuclei. We showed that a very small piece of FFPE tissue section containing ∼4000 nuclei is sufficient to decode H3K27ac modifications with FACT-seq. H3K27ac FACT-seq revealed disease-specific super enhancers in archived FFPE human colorectal cancer and glioblastoma tissue. In summary, FACT-seq allows decoding of histone modifications in archival FFPE tissues with high sensitivity and helps researchers better understand epigenetic regulation in cancer and human disease.


Subjects
Chromatin/metabolism , Epigenesis, Genetic , Histones/analysis , Animals , Cell Line , Humans , Mice , Protein Processing, Post-Translational , Staphylococcal Protein A/metabolism , Transposases/metabolism
3.
RNA ; 25(2): 205-218, 2019 02.
Article in English | MEDLINE | ID: mdl-30425123

ABSTRACT

N6-Methyladenosine (m6A) refers to methylation of the adenosine nucleotide at the nitrogen-6 position. Many conventional computational methods for identifying N6-methyladenosine sites are limited by the small amount of data available. Taking advantage of the thousands of m6A sites detected by high-throughput sequencing, it is now possible to discover the characteristics of m6A sequences using deep learning techniques. To the best of our knowledge, our work is the first attempt to use word embedding and deep neural networks for m6A prediction from mRNA sequences. Using four deep neural networks, we developed a model inferred from a larger sequence shifting window that can predict m6A accurately and robustly. Four prediction schemes were built with various RNA sequence representations and optimized convolutional neural networks. Soft voting over the four deep networks was shown to outperform all of the state-of-the-art methods. We evaluated these predictors on a rigorous independent test data set and showed that our proposed method outperforms the state-of-the-art predictors. The training, independent, and cross-species testing data sets are much larger than in previous studies, which could help to avoid overfitting. Furthermore, an online prediction web server implementing the four proposed predictors has been built and is available at http://server.malab.cn/Gene2vec/.
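The soft-voting step can be illustrated with a minimal sketch. It assumes that each of the four networks outputs a probability that a candidate site is m6A, that the networks are weighted equally, and that a 0.5 cut-off is used; the abstract does not state the weights or threshold, and `soft_vote` is a hypothetical helper, not part of the Gene2vec server.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each site's positive-class probability across models, then threshold.

    prob_lists[m][i] is model m's probability that candidate site i is m6A.
    Equal model weights and the 0.5 default threshold are illustrative assumptions.
    """
    if not prob_lists:
        raise ValueError("need at least one model")
    n_sites = len(prob_lists[0])
    if any(len(p) != n_sites for p in prob_lists):
        raise ValueError("every model must score the same sites")
    labels = []
    for i in range(n_sites):
        mean_prob = sum(p[i] for p in prob_lists) / len(prob_lists)
        labels.append(1 if mean_prob >= threshold else 0)  # 1 = predicted m6A site
    return labels


# Four hypothetical model outputs for two candidate sites.
print(soft_vote([[0.9, 0.2], [0.6, 0.4], [0.4, 0.3], [0.7, 0.45]]))  # [1, 0]
```

Averaging probabilities lets one confident model outweigh several borderline ones: for a site scored 0.95, 0.45, 0.45, and 0.45, a hard majority vote would reject it, whereas the mean of 0.575 passes the threshold.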


Subjects
Adenosine/analogs & derivatives , Computational Biology/methods , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods , Adenosine/genetics , Base Sequence/genetics , High-Throughput Nucleotide Sequencing , Machine Learning , Models, Theoretical , Neural Networks, Computer
4.
J Proteome Res ; 16(5): 2044-2053, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28436664

ABSTRACT

Cell-penetrating peptides (CPPs) have been proven to be important drug-delivery vehicles, demonstrating their potential as therapeutic candidates. The past decade has witnessed a rapid growth in CPP-based research. Recently, many computational efforts have been made to develop machine-learning-based methods for identifying CPPs. Although much progress has been made, existing methods still suffer from low feature representation capability, which limits further performance improvement. In this study, we propose a novel predictor called CPPred-RF, in which we integrate multiple sequence-based feature descriptors to sufficiently explore distinct information embedded in CPPs, employ a well-established feature selection technique to improve the feature representation, and, for the first time, construct a two-layer prediction framework based on the random forest algorithm. The jackknife results on benchmark data sets show that the proposed CPPred-RF is at least competitive with the state-of-the-art predictors. Moreover, we establish the first online web server for predicting CPPs and their uptake efficiency simultaneously. It is freely available at http://server.malab.cn/CPPred-RF.
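As we read the abstract, the two-layer framework first decides whether a peptide is a CPP and then predicts uptake efficiency only for predicted CPPs. The cascade can be sketched as follows, under stated assumptions: the trained random forests are replaced by hypothetical scoring callables, both layers use an assumed 0.5 threshold, and the toy scorers based on cationic residues are for demonstration only and are not the CPPred-RF models.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins for the two trained random-forest layers: any callable
# mapping a peptide sequence to a probability in [0, 1] fits this slot.
Scorer = Callable[[str], float]


def two_layer_predict(seq: str, cpp_scorer: Scorer, uptake_scorer: Scorer,
                      threshold: float = 0.5) -> Tuple[bool, Optional[str]]:
    """Layer 1: CPP vs non-CPP. Layer 2 (only for predicted CPPs): high vs low uptake."""
    if cpp_scorer(seq) < threshold:
        return False, None  # non-CPP, so uptake efficiency is not predicted
    efficiency = "high" if uptake_scorer(seq) >= threshold else "low"
    return True, efficiency


# Toy scorers for demonstration only; they are NOT the CPPred-RF models.
def toy_cpp_score(seq: str) -> float:
    return sum(c in "RK" for c in seq) / len(seq)  # fraction of cationic residues


def toy_uptake_score(seq: str) -> float:
    return seq.count("R") / len(seq)  # fraction of arginine


for peptide in ["RRRRRRRR", "KKKKAAAA", "AAAAGGGG"]:
    print(peptide, two_layer_predict(peptide, toy_cpp_score, toy_uptake_score))
```

Gating the second layer on the first means the uptake model only ever sees peptides already predicted to be CPPs, which matches the idea of predicting both properties in one pass.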


Subjects
Algorithms , Cell-Penetrating Peptides/genetics , Sequence Analysis, Protein/methods , Drug Delivery Systems , Machine Learning , Support Vector Machine
5.
Article in English | MEDLINE | ID: mdl-38329860

ABSTRACT

Graph neural networks (GNNs) have attracted extensive research attention in recent years due to their capability to process graph data and have been widely used in practical applications. As societies become increasingly concerned with the need for data privacy protection, GNNs face the need to adapt to this new normal. In addition, as clients in federated learning (FL) may have relationships with one another, more powerful tools are required to utilize such implicit information to boost performance. This has led to the rapid development of the emerging research field of federated GNNs (FedGNNs). This promising interdisciplinary field is highly challenging for interested researchers to grasp, and the lack of an insightful survey on this topic further exacerbates the entry difficulty. In this article, we bridge this gap by offering a comprehensive survey of this emerging field. We propose a 2-D taxonomy of the FedGNN literature: 1) the main taxonomy provides a clear perspective on the integration of GNNs and FL by analyzing how GNNs enhance FL training as well as how FL assists GNN training and 2) the auxiliary taxonomy provides a view on how FedGNNs deal with heterogeneity across FL clients. Through discussions of key ideas, challenges, and limitations of existing works, we envision future research directions that can help build more robust, explainable, efficient, fair, inductive, and comprehensive FedGNNs.

6.
Leukemia ; 38(5): 1086-1098, 2024 May.
Article in English | MEDLINE | ID: mdl-38600314

ABSTRACT

Blastic plasmacytoid dendritic cell neoplasm (BPDCN) constitutes a rare and aggressive malignancy originating from plasmacytoid dendritic cells (pDCs) with a primarily cutaneous tropism followed by dissemination to the bone marrow and other organs. We conducted a genome-wide analysis of the tumor methylome in an extended cohort of 45 BPDCN patients, supplemented by WES and RNA-seq as well as ATAC-seq on selected cases. We determined the BPDCN DNA methylation profile and observed a dramatic loss of DNA methylation during malignant transformation from early and mature DCs towards BPDCN. DNA methylation profiles further differentiate BPDCN from AML, CMML, and T-ALL: by comparison, BPDCN exhibits the most striking global demethylation, mitotic stress, and merely localized DNA hypermethylation, resulting in pronounced inactivation of tumor suppressor genes. DNA methylation-based analysis of the tumor microenvironment by MethylCIBERSORT yielded two prognostically relevant clusters (IC1 and IC2) with specific cellular composition and mutational spectra. Further, the transcriptional subgroups of BPDCN (C1 and C2) differ by DNA methylation signatures in interleukin/inflammatory signaling genes but also by higher transcription factor activity of JAK-STAT and NF-κB signaling in C2, in contrast to an EZH2 dependence in C1-BPDCN. Our integrative characterization of BPDCN offers novel molecular insights and potential diagnostic applications.


Subjects
DNA Methylation , Dendritic Cells , Humans , Dendritic Cells/pathology , Dendritic Cells/metabolism , Female , Male , Middle Aged , Hematologic Neoplasms/genetics , Hematologic Neoplasms/pathology , Tumor Microenvironment/genetics , Aged , Adult , Prognosis , Gene Expression Regulation, Neoplastic , Mutation , Biomarkers, Tumor/genetics
7.
Cell Death Discov ; 9(1): 260, 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37495566

ABSTRACT

Cutaneous squamous cell carcinoma (cSCC) is a fast-increasing cancer with metastatic potential. Extracellular vesicles (EVs) are small membrane-bound vesicles that play important roles in intercellular communication, particularly in the tumor microenvironment (TME). Here we report that cSCC cells secrete an increased number of EVs relative to normal human epidermal keratinocytes (NHEKs) and that interfering with the capacity of cSCC to secrete EVs inhibits tumor growth in vivo in a xenograft model of human cSCC. Transcriptome analysis of tumor xenografts by RNA-sequencing, enabling the simultaneous quantification of both the human and the mouse transcripts, revealed that impaired EV production by cSCC cells prominently altered the phenotype of stromal cells, in particular genes related to extracellular matrix (ECM) formation and epithelial-mesenchymal transition (EMT). In line with these results, co-culturing of human dermal fibroblasts (HDFs) with cSCC cells, but not with normal keratinocytes, in vitro resulted in acquisition of a cancer-associated fibroblast (CAF) phenotype. Interestingly, EVs derived from metastatic cSCC cells, but not primary cSCCs or NHEKs, were efficient in converting HDFs to CAFs. Multiplex bead-based flow cytometry assay and mass-spectrometry (MS)-based proteomic analyses revealed the heterogeneous cargo of cSCC-derived EVs and that especially EVs derived from metastatic cSCCs carry proteins associated with EV biogenesis, EMT, and cell migration. Mechanistically, EVs from metastatic cSCC cells result in the activation of TGF-β signaling in HDFs. Altogether, our study suggests that cSCC-derived EVs mediate cancer-stroma communication, in particular the conversion of fibroblasts to CAFs, which eventually contribute to cSCC progression.

8.
Curr Protoc ; 2(8): e535, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35994571

ABSTRACT

In basic and translational cancer research, the majority of biopsies are stored as formalin-fixed paraffin-embedded (FFPE) samples. Chromatin accessibility reflects the degree to which nuclear macromolecules can physically interact with chromatinized DNA and plays a key role in gene regulation in different physiological conditions. As such, the profiling of chromatin accessibility in archived FFPE tissue can be critical to understanding gene regulation in health and disease. Due to the high degree of DNA damage in FFPE samples, accurate mapping of chromatin accessibility in these specimens is extremely difficult. To address this issue, we recently established FFPE-ATAC, a highly sensitive method based on T7-Tn5-mediated transposition followed by in vitro transcription (IVT), to generate high-quality chromatin accessibility profiles with 500-50,000 nuclei from a single FFPE tissue section. In FFPE-ATAC, which we describe here, the T7-Tn5 adaptors are inserted into the genome after FFPE sample preparation and are unlikely to sustain the DNA breakage that occurs during reverse cross-linking of these samples. They should, therefore, remain at the ends of broken accessible chromatin sites after reverse cross-linking. IVT is then used to convert the two ends of the broken DNA fragments to RNA molecules before making sequencing libraries from the IVT RNAs and further decoding Tn5 adaptor insertion sites in the genome. Through this strategy, users can decode the flanking sequences of the accessible chromatin even if there are breaks between adjacent pairs of T7-Tn5 adaptor insertion sites. This method is applicable to dissecting chromatin profiles of a small section of the tissue sample, characterizing stage- and region-specific gene regulation, and characterizing disease-associated chromatin regulation in FFPE tissues. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Nuclei isolation from FFPE tissue samples
Basic Protocol 2: T7-Tn5 transposase tagmentation, reverse-crosslinking, and in vitro transcription
Basic Protocol 3: Preparation of libraries for high-throughput sequencing


Subjects
Chromatin , DNA , Chromatin/genetics , DNA/genetics , Formaldehyde , Paraffin Embedding , Sequence Analysis, DNA/methods
9.
Bio Protoc ; 12(10)2022 May 20.
Article in English | MEDLINE | ID: mdl-35865114

ABSTRACT

The majority of biopsies in both basic research and translational cancer studies are preserved as archived formalin-fixed paraffin-embedded (FFPE) samples. Profiling histone modifications in archived FFPE tissues is critically important to understand gene regulation in human disease. Current genome-wide histone modification profiling of FFPE samples requires either 10-20 tissue sections or whole tissue blocks as input, which prevents better-resolved analyses. Nevertheless, it is desirable to consume a minimal amount of FFPE tissue sections in the analysis, as clinical tissue of interest is limited. Here, we present FFPE tissue with antibody-guided chromatin tagmentation with sequencing (FACT-seq), a highly sensitive method to efficiently profile histone modifications in FFPE tissue, which combines transposition by a novel fusion protein of hyperactive Tn5 transposase and protein A (T7-pA-Tn5) with T7 in vitro transcription. FACT-seq generates high-quality chromatin profiles for different histone modifications from a low number of FFPE nuclei. We showed that a very small piece of FFPE tissue section containing ~4000 nuclei is sufficient to decode H3K27ac modifications with FACT-seq. In archived FFPE human colorectal cancer and glioblastoma tissue, H3K27ac FACT-seq revealed disease-specific super enhancers. In summary, FACT-seq allows researchers to decode histone modifications such as H3K27ac and H3K27me3 in archival FFPE tissues with high sensitivity, thus helping us to understand epigenetic regulation. Graphical abstract: (i) FFPE tissue section; (ii) isolated nuclei; (iii) primary antibody, secondary antibody, and T7-pA-Tn5 bind to targets; (iv) DNA purification; (v) in vitro transcription and sequencing library preparation; (vi) sequencing.

10.
Nat Commun ; 13(1): 2236, 2022 04 25.
Article in English | MEDLINE | ID: mdl-35469026

ABSTRACT

There is ample support for developmental regulation of glioblastoma stem cells. To examine how cell lineage controls glioblastoma stem cell function, we present a cross-species epigenome analysis of mouse and human glioblastoma stem cells. We analyze and compare the chromatin-accessibility landscape of nine mouse glioblastoma stem cell cultures of three defined origins and 60 patient-derived glioblastoma stem cell cultures by assay for transposase-accessible chromatin using sequencing. This separates the mouse cultures according to cell of origin and identifies three human glioblastoma stem cell clusters that show overlapping characteristics with each of the mouse groups and a distribution along an axis of proneural to mesenchymal phenotypes. The epigenetic-based human glioblastoma stem cell clusters display distinct functional properties and can separate patient survival. Cross-species analyses reveal conserved epigenetic regulation of mouse and human glioblastoma stem cells. We conclude that epigenetic control of glioblastoma stem cells is primarily dictated by developmental origin, which impacts clinically relevant glioblastoma stem cell properties and patient survival.


Subjects
Glioblastoma , Cell Lineage/genetics , Chromatin/genetics , Epigenesis, Genetic , Glioblastoma/genetics , Humans , Neoplastic Stem Cells
11.
Plant Commun ; 3(5): 100333, 2022 09 12.
Article in English | MEDLINE | ID: mdl-35643085

ABSTRACT

The tribe Triticeae provides important staple cereal crops and contains elite wild species with wide genetic diversity and high tolerance to abiotic stresses. Sea barleygrass (Hordeum marinum Huds.), a wild Triticeae species, thrives in saline marshlands and is well known for its high tolerance to salinity and waterlogging. Here, a 3.82-Gb high-quality reference genome of sea barleygrass is assembled de novo, with 3.69 Gb (96.8%) of its sequences anchored onto seven chromosomes. In total, 41 045 high-confidence (HC) genes are annotated by homology, de novo prediction, and transcriptome analysis. Phylogenetics, non-synonymous/synonymous mutation ratios (Ka/Ks), and transcriptomic and functional analyses provide genetic evidence for the divergence in morphology and salt tolerance among sea barleygrass, barley, and wheat. The large variation in post-domestication genes (e.g. IPA1 and MOC1) may cause interspecies differences in plant morphology. The extremely high salt tolerance of sea barleygrass is mainly attributed to low Na+ uptake and root-to-shoot translocation, which are mainly controlled by SOS1, HKT, and NHX transporters. Agrobacterium-mediated transformation and CRISPR/Cas9-mediated gene editing systems were developed for sea barleygrass to promote its utilization for exploration and functional studies of hub genes and for the genetic improvement of cereal crops.


Subjects
Domestication , Hordeum , Crops, Agricultural/genetics , Edible Grain/genetics , Gene Editing , Hordeum/genetics , Poaceae/genetics , Salt Tolerance/genetics
12.
OMICS ; 25(10): 652-659, 2021 10.
Article in English | MEDLINE | ID: mdl-34520261

ABSTRACT

Type 2 diabetes (T2D) is characterized by pathophysiological alterations in lipid metabolism. One strategy to understand the molecular mechanisms behind these abnormalities is to identify cis-regulatory elements (CREs) located in chromatin-accessible regions of the genome that regulate key genes. In this study, we integrated assay for transposase-accessible chromatin followed by sequencing (ATAC-seq) data, widely used to decode chromatin accessibility, with multi-omics data and publicly available CRE databases to identify candidate CREs associated with T2D for further experimental validation. We performed highly sensitive ATAC-seq in nine human liver samples from normal and T2D donors and identified a set of differentially accessible regions (DARs). We identified seven DARs, including a candidate enhancer for the ACOT1 gene, which regulates the balance of acyl-CoA and free fatty acids (FFAs) in the cytoplasm. The relevance of ACOT1 regulation in T2D was supported by the analysis of transcriptomics and proteomics data in liver tissue. Long-chain acyl-CoA thioesterases (ACOTs) are a group of enzymes that hydrolyze acyl-CoA esters to FFAs and coenzyme A. ACOTs have been associated with regulation of triglyceride levels, fatty acid oxidation, mitochondrial function, and insulin signaling, linking their regulation to the pathogenesis of T2D. Our strategy integrating chromatin accessibility with DNA binding and other types of omics provides novel insights into the role of genetic regulation in T2D and is extendable to other complex multifactorial diseases.


Subjects
Diabetes Mellitus, Type 2 , Lipid Metabolism , Chromatin/metabolism , Chromatin Immunoprecipitation Sequencing , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Humans , Lipid Metabolism/genetics , Liver/metabolism , Thiolester Hydrolases/genetics , Thiolester Hydrolases/metabolism
13.
Nat Commun ; 12(1): 6489, 2021 11 11.
Article in English | MEDLINE | ID: mdl-34764264

ABSTRACT

The role of focal amplifications and extrachromosomal DNA (ecDNA) in gastric cardia adenocarcinoma (GCA) is unknown. Here, we identify frequent focal amplifications and ecDNAs in Chinese GCA patient samples, and find that focal amplifications in the GCA cohort are associated with the chromothripsis process and may be induced by accumulated DNA damage due to local dietary habits. We observe diverse correlations between the presence of oncogene focal amplifications and prognosis: ERBB2 focal amplifications positively correlate with prognosis, whereas EGFR focal amplifications negatively correlate with prognosis. Large-scale ERBB2 immunohistochemistry results from 1668 GCA patients show that the survival probability of ERBB2-positive patients is lower than that of ERBB2-negative patients when their survival time is under 2 years; however, the trend is reversed when their survival time is longer than 2 years. Our observations indicate that ERBB2 focal amplifications may represent a good prognostic marker in GCA patients.


Subjects
Adenocarcinoma/genetics , Adenocarcinoma/pathology , Chromothripsis , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Chromosomal Instability/genetics , Chromosomal Instability/physiology , DNA Methylation/genetics , Humans , Immunohistochemistry , Prognosis
14.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1264-1273, 2019.
Article in English | MEDLINE | ID: mdl-28222000

ABSTRACT

Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. The accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in improving predictive performance. However, the predictive performance of existing methods is not satisfactory in terms of overall accuracy. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, which integrates several discriminative sequence-based feature descriptors and improves feature representation capability using a powerful feature selection technique. Importantly, unlike other methods based on multiple, complex information inputs, our proposed MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via rigorous jackknife tests indicate that our proposed MePred-RF method remarkably outperforms other state-of-the-art predictors, leading by an average of 4.5 percent in terms of overall accuracy. A user-friendly webserver that implements the proposed method has been established for researchers' convenience and is now freely available for public use through http://server.malab.cn/MePred-RF. We anticipate our research tool to be useful for the large-scale prediction and analysis of protein methylation sites.


Subjects
Computational Biology/methods , Protein Processing, Post-Translational , Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Databases, Protein , Humans , Methylation , Peptides/chemistry , Predictive Value of Tests , Reproducibility of Results , Support Vector Machine
15.
Sci Rep ; 7: 40242, 2017 01 12.
Article in English | MEDLINE | ID: mdl-28079126

ABSTRACT

As one of the most abundant RNA post-transcriptional modifications, N6-methyladenosine (m6A) is involved in a broad spectrum of biological and physiological processes ranging from mRNA splicing and stability to cell differentiation and reprogramming. However, experimental identification of m6A sites is expensive and laborious. Therefore, there is an urgent need for computational methods that reliably predict m6A sites from primary RNA sequences. In the current study, a new method called RAM-ESVM was developed for detecting m6A sites in the Saccharomyces cerevisiae transcriptome; it employs ensemble support vector machine classifiers and novel sequence features. The jackknife test results show that RAM-ESVM outperforms single support vector machine classifiers and other existing methods, indicating that it would be a useful computational tool for detecting m6A sites in S. cerevisiae. Furthermore, a web server named RAM-ESVM was constructed and is freely accessible at http://server.malab.cn/RAM-ESVM/.


Subjects
Adenosine/analogs & derivatives , Computational Biology , RNA/analysis , Support Vector Machine , Transcriptome , Adenosine/analysis , Adenosine/metabolism , RNA/metabolism , RNA Processing, Post-Transcriptional , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
16.
IEEE Trans Nanobioscience ; 16(4): 240-247, 2017 06.
Article in English | MEDLINE | ID: mdl-28166503

ABSTRACT

Many recent efforts have been made to develop machine learning-based methods for fast and accurate phosphorylation site prediction. Currently, a majority of well-performing methods build prediction models from hybrid information, such as evolutionary information, disorder information, and so on. Unfortunately, this type of method suffers from two major limitations: first, it is of little help for protein phosphorylation site prediction when no obvious homology is detected; second, computing such complicated information is time-consuming, which probably limits the usage of predictors in practical applications. In this paper, we present a simple, fast, and powerful feature representation algorithm, which sufficiently explores the sequential information from multiple perspectives based only on primary sequences, and successfully captures the differences between true phosphorylation sites and non-phosphorylation sites. Using the proposed features, we propose a random forest-based predictor named PhosPred-RF for the prediction of protein phosphorylation sites. We evaluate and compare the proposed predictor with the state-of-the-art predictors on several benchmark data sets. The experimental results show that PhosPred-RF outperforms other existing predictors, demonstrating its potential to be a useful tool for protein phosphorylation site prediction. Currently, the proposed PhosPred-RF is freely accessible to the public through the user-friendly webserver http://server.malab.cn/PhosPred-RF.
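As an illustration of a representation built only from the primary sequence, the sketch below extracts a fixed window around a candidate phosphorylation site and encodes it by amino acid composition. The window half-width of 7, the `X` padding character, and the choice of composition as the descriptor are assumptions made for this example; the abstract does not list PhosPred-RF's actual feature set, and the helper names are hypothetical.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def site_window(protein: str, pos: int, flank: int = 7, pad: str = "X") -> str:
    """Residues within `flank` positions of the 0-based candidate site `pos`,
    padded with `pad` where the window runs past either end of the protein."""
    if protein[pos] not in "STY":
        raise ValueError("candidate phosphorylation sites are S, T, or Y")
    left = protein[max(0, pos - flank):pos].rjust(flank, pad)
    right = protein[pos + 1:pos + 1 + flank].ljust(flank, pad)
    return left + protein[pos] + right


def amino_acid_composition(window: str) -> List[float]:
    """Fraction of each standard residue in the window; padding counts toward
    the window length but matches no residue."""
    return [window.count(aa) / len(window) for aa in AMINO_ACIDS]


w = site_window("MSTYKS", 1, flank=2)
print(w)  # XMSTY
features = amino_acid_composition(w)  # 20 values, e.g. 0.2 for S
```

Because the vector depends only on the sequence window, it can be computed for any protein without homology search, which is the practical limitation of hybrid-information methods that the abstract highlights.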


Subjects
Phosphoproteins/analysis , Phosphoproteins/chemistry , Protein Processing, Post-Translational , Sequence Analysis, Protein/methods , Databases, Protein , Phosphorylation
17.
Sci Rep ; 7: 46757, 2017 04 25.
Article in English | MEDLINE | ID: mdl-28440291

ABSTRACT

N6-methyladenosine (m6A) refers to methylation of the adenosine nucleotide at the nitrogen-6 position. It plays an important role in a series of biological processes, such as splicing events, mRNA export, nascent mRNA synthesis, nuclear translocation, and translation. Numerous experiments have successfully characterized m6A sites within sequences since high-resolution mapping of m6A sites was established. However, with the explosive growth of genomic sequences, using experimental methods to identify m6A sites is time-consuming and expensive. Thus, it is highly desirable to develop fast and accurate computational identification methods. In this study, we propose a sequence-based predictor called RAM-NPPS for identifying m6A sites within RNA sequences, in which we present a novel feature representation algorithm based on multi-interval nucleotide pair position specificity and use a support vector machine classifier to construct the prediction model. Comparison results show that our proposed method outperforms the state-of-the-art predictors on three benchmark datasets across three species, indicating the effectiveness and robustness of our method. Moreover, an online webserver implementing the proposed predictor has been established at http://server.malab.cn/RAM-NPPS/. It is anticipated to be a useful prediction tool to help biologists reveal the mechanisms of m6A site functions.
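One way to build position-specific nucleotide pair features over multiple intervals is sketched below. For each gap k and each position i, it compares how often the pair (s[i], s[i+k]) occurs at that position in positive versus negative training sequences. This is a simplified, hypothetical reading of the idea named in the abstract: the exact RAM-NPPS formula, the interval set (1, 2, 3), and the helper names are all assumptions.

```python
from collections import Counter
from typing import List, Sequence


def pair_freqs(seqs: Sequence[str], k: int) -> List[Counter]:
    """For each position i, the frequency of the nucleotide pair (s[i], s[i+k])
    across equal-length training sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("training sequences must share one length")
    tables = []
    for i in range(length - k):
        counts = Counter(s[i] + s[i + k] for s in seqs)
        tables.append(Counter({pair: c / len(seqs) for pair, c in counts.items()}))
    return tables


def pair_position_features(seq: str, pos_seqs: Sequence[str], neg_seqs: Sequence[str],
                           intervals: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Concatenate, over each gap k, f_pos - f_neg of the pair seen at each position.
    A pair never observed at a position has frequency 0 (Counter's default)."""
    if len(seq) != len(pos_seqs[0]):
        raise ValueError("query must have the training sequence length")
    features = []
    for k in intervals:
        pos_t, neg_t = pair_freqs(pos_seqs, k), pair_freqs(neg_seqs, k)
        for i in range(len(seq) - k):
            pair = seq[i] + seq[i + k]
            features.append(pos_t[i][pair] - neg_t[i][pair])
    return features


# Tiny hypothetical training sets of 5-nt windows.
print(pair_position_features("GGACU", ["GGACU", "GGACA"], ["UUACU", "AGACU"], intervals=(1,)))
# [1.0, 0.5, 0.0, -0.5]
```

Positive values mark pairs enriched at that position in the positive set and negative values mark pairs enriched in the negative set; a vector of this kind could then be fed to an SVM, as the abstract describes.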


Subjects
Adenosine/analogs & derivatives , Arabidopsis/genetics , Computational Biology/methods , Nucleotides/chemistry , RNA, Messenger/metabolism , Saccharomyces cerevisiae/genetics , Support Vector Machine , Adenosine/chemistry , Adenosine/genetics , Adenosine/metabolism , Algorithms , Datasets as Topic , Humans , Nucleotides/genetics , Nucleotides/metabolism , RNA, Messenger/genetics
18.
Sci Rep ; 7(1): 16437, 2017 11 27.
Article in English | MEDLINE | ID: mdl-29180805

ABSTRACT

Selecting informative genes, including individually discriminant genes and synergic genes, from expression data has been useful for medical diagnosis and prognosis. Detecting synergic genes is more difficult than selecting individually discriminant genes. Several efforts have recently been made to detect gene-gene synergies, such as dendrogram-based I(X1; X2; Y) (mutual information), doublets (gene pairs), and MIC(X1; X2; Y), based on the maximal information coefficient. It is unclear whether dendrogram-based I(X1; X2; Y) and doublets can capture synergies efficiently. Although MIC(X1; X2; Y) can capture a wide range of interactions, it has a high computational cost owing to its 3-D search. In this paper, we developed a simple and fast approach based on the abs conversion type (i.e., Z = |X1 - X2|) and the t-test to detect interactions in simulated and real-world datasets. Our results showed that dendrogram-based I(X1; X2; Y) and doublets fail to discover pair-wise gene interactions, whereas our approach can discover typical pair-wise synergic genes efficiently. These synergic genes can reach accuracy comparable to that of the individually discriminant genes using the same number of genes. Classifiers cannot learn well if synergic genes have not been converted properly. Combining individually discriminant and synergic genes can improve prediction performance.
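The abs conversion plus t-test can be sketched with the standard library alone. The sketch assumes Welch's (unequal-variance) t statistic, since the abstract does not say which t-test variant is used, and it reports only the statistic, not a p-value; `synergy_score` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def abs_conversion(x1: Sequence[float], x2: Sequence[float]) -> List[float]:
    """Z = |X1 - X2|, computed sample by sample for one candidate gene pair."""
    return [abs(a - b) for a, b in zip(x1, x2)]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's two-sample t statistic (unequal variances); needs >= 2 samples per group."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def synergy_score(x1, x2, labels) -> float:
    """t statistic of Z between class-1 and class-0 samples."""
    z = abs_conversion(x1, x2)
    z1 = [v for v, y in zip(z, labels) if y == 1]
    z0 = [v for v, y in zip(z, labels) if y == 0]
    return welch_t(z1, z0)


# Hypothetical toy data: each gene alone has the same class means (3.25),
# but the two genes diverge in class 1 and stay close in class 0.
x1 = [5, 1, 6, 1, 3, 2, 4, 4]
x2 = [1, 5, 1, 6, 3, 3, 4, 3]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(welch_t(x1[:4], x1[4:]))                  # 0.0: gene 1 alone does not separate
print(round(synergy_score(x1, x2, labels), 3))  # 9.798: the pair does
```

The toy data shows why the conversion matters: a per-gene test sees no difference, while Z separates the classes, which matches the abstract's point that classifiers cannot learn well from synergic genes that have not been converted.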


Subjects
Algorithms , Genes , Computer Simulation , Databases, Genetic , Gene Expression Regulation , Humans , Neoplasms/genetics
19.
Artif Intell Med ; 83: 67-74, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28320624

ABSTRACT

Computational methods are employed in bioinformatics to predict protein-protein interactions (PPIs). PPIs and protein-protein non-interactions (PPNIs) display different levels of development, and the number of PPIs is considerably greater than that of PPNIs. This significant difference in the number of PPIs and PPNIs increases the cost of constructing a balanced dataset. PPIs can be classified as either physical or genetic. However, ready-made PPNI databases have only been shown to lack physical interactions, not genetic interactions. Hence, ready-made PPNI databases contain false-negative non-interactions. In this study, two PPNI datasets were artificially generated from a PPI database. In contrast to various traditional PPI feature extraction methods based on sequential information, two types of novel feature extraction methods were proposed: one based on secondary structure information and the other based on the physicochemical properties of proteins. The experimental results on the RandomPairs dataset validate the efficiency and effectiveness of the proposed prediction model. These results reveal the potential of constructing a PPI negative dataset to reduce false negatives. Related datasets, tools, and source code are accessible at http://lab.malab.cn/soft/PPIPre/PPIPre.html.
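A dataset in the spirit of "RandomPairs" can be sketched by pairing proteins from a PPI list at random and keeping only pairs that are not recorded as interacting. The abstract does not give the actual generation procedure, so the sampling scheme, the fixed seed, and `random_negative_pairs` itself are assumptions for illustration.

```python
import random
from typing import Iterable, List, Tuple


def random_negative_pairs(ppis: Iterable[Tuple[str, str]], n: int,
                          seed: int = 0) -> List[Tuple[str, str]]:
    """Sample n distinct unordered protein pairs absent from the known PPI list."""
    known = {frozenset(p) for p in ppis}
    proteins = sorted({prot for pair in known for prot in pair})
    # Number of distinct-protein pairs not already recorded as interacting.
    total = len(proteins) * (len(proteins) - 1) // 2
    available = total - sum(1 for p in known if len(p) == 2)
    if n > available:
        raise ValueError(f"only {available} non-interacting pairs are available")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(proteins, 2))  # two different proteins
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


print(random_negative_pairs([("A", "B"), ("B", "C"), ("C", "D")], 3))
# [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

Storing pairs as `frozenset`s makes (A, B) and (B, A) the same pair, and the `available` check keeps the sampling loop from running forever when more negatives are requested than exist.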


Subjects
Computational Biology/methods , Natural Language Processing , Protein Interaction Maps , Proteins/metabolism , Amino Acid Sequence , Databases, Protein , Humans , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Reproducibility of Results , Structure-Activity Relationship