1.
Hereditas ; 159(1): 7, 2022 Jan 21.
Article in English | MEDLINE | ID: mdl-35063044

ABSTRACT

BACKGROUND: Breast cancer is the malignant tumor with the highest incidence in women. DNA methylation has an important effect on breast cancer, but the effect of abnormal DNA methylation on gene expression in breast cancer is still unclear. Therefore, it is very important to find therapeutic targets related to DNA methylation. RESULTS: In this work, we calculated the DNA methylation distribution and gene expression level in cancer and para-cancerous tissues for breast cancer samples. We found that DNA methylation in key regions is closely related to gene expression by analyzing the relationship between the distribution characteristics of DNA methylation in different regions and the change of gene expression level. Finally, the 18 key genes (17 tumor suppressor genes and 1 oncogene) related to prognosis were confirmed by the survival analysis of clinical data. Some important DNA methylation regions in these genes that result in breast cancer were found. CONCLUSIONS: We believe that 17 TSGs and 1 oncogene may be breast cancer biomarkers regulated by DNA methylation in key regions. These results will help to explore DNA methylation biomarkers as potential therapeutic targets for breast cancer.


Subjects
Breast Neoplasms; DNA Methylation; Biomarkers, Tumor/genetics; Breast Neoplasms/genetics; Female; Gene Expression Regulation, Neoplastic; Humans; Promoter Regions, Genetic
2.
Genomics ; 112(1): 853-858, 2020 01.
Article in English | MEDLINE | ID: mdl-31170440

ABSTRACT

Abnormal histone modifications (HMs) and transcription factors (TFs) can alter the expression of cancer-related genes to promote tumorigenesis. We studied the variations of 11 HMs and 2 TFs in human breast cancer cells (MCF-7) compared to human normal mammary epithelial cells (HMEC), and the effects of HMs/TFs in various regions of the genome on the expression changes of breast cancer-related genes. Based on HMs and TFs signals' differences between MCF-7 and HMEC flanking TSSs, the up- and down-regulated genes in MCF-7 were predicted by Random Forest, and important HMs and regions were found. Results indicate that H3K79me2, H3K27ac, and H3K4me1 are particularly important for the changes of gene expression in MCF-7. Especially, H3K79me2 around the 60-th bin flanking TSSs may be the key for regulating gene expression. Our studies reveal H3K79me2 may be a core HM for breast cancer.


Subjects
Breast Neoplasms/genetics; Gene Expression Regulation, Neoplastic; Histone Code; Breast Neoplasms/metabolism; Female; Humans; MCF-7 Cells; Transcription Factors/metabolism; Transcription Initiation Site
3.
Genomics ; 112(2): 2072-2079, 2020 03.
Article in English | MEDLINE | ID: mdl-31809797

ABSTRACT

A promoter is an important functional element of a DNA sequence that controls the initiation of gene transcription. Recognizing promoters is important for understanding the related biological phenomena. Based on the concept that a promoter is mainly determined by its sequence and structure, a novel statistical physics model for predicting promoters in Escherichia coli K-12 is proposed. The total energy of the DNA local structure of sequence segments, the sole prediction parameter, is calculated for three benchmark promoter sequence datasets using principles from statistical physics and information theory. Better results are obtained. A web server, PhysMPrePro, for predicting promoters is available at http://202.207.14.87:8032/bioinformation/PhysMPrePro/index.asp, so that other scientists can easily obtain their desired results.


Subjects
DNA, Bacterial/chemistry; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Software; DNA, Bacterial/genetics; Escherichia coli; Thermodynamics
4.
J Theor Biol ; 445: 136-150, 2018 05 14.
Article in English | MEDLINE | ID: mdl-29476833

ABSTRACT

Enhancer-promoter interactions (EPIs) with strong tissue specificity play an important role in the cis-regulatory mechanisms of human cell lines. However, predicting these interactions remains challenging because they are regulated by the cooperation of diverse functional genomic signatures, DNA spatial structure, and DNA sequence elements. In this paper, by adding DNA structure properties and transcription factor binding motifs, we present an improved computational method to predict EPIs in human cell lines. Compared with the results of another group on the same datasets, our best cross-validation accuracies were about 15%-24% higher in the same cell lines, and our independent-test accuracies were about 11%-15% higher in new cell lines. We also found that transcription factor binding motifs and DNA structure properties carry information that largely determines long-range EPI prediction. Distribution comparisons revealed distinct differences between interacting and non-interacting sets in each cell line. Correlation analysis and network models of the relationships among top-ranked functional genomic signatures indicated that diverse genomic signatures cooperatively establish a complex regulatory network to facilitate long-range EPIs. The experimental results provide additional insights into the roles of DNA intrinsic properties and functional genomic signatures in EPI prediction.


Subjects
Genome, Human/physiology; Models, Biological; Nucleotide Motifs/physiology; Response Elements/physiology; Transcription Factors/metabolism; Cell Line; Humans
5.
Genomics ; 109(5-6): 341-352, 2017 10.
Article in English | MEDLINE | ID: mdl-28579514

ABSTRACT

Enhancer-promoter interaction (EPI) is an important cis-regulatory mechanism in the regulation of tissue-specific gene expression. However, precisely identifying these interactions remains difficult. In this paper, using diverse genomic features for various regulatory regions, we present a computational approach that predicts EPIs with improved accuracy. We also comprehensively studied additional potential regulatory factors that are important for EPI prediction, such as nucleosome occupancy and enhancer RNA, and found that the contributions of diverse regulatory signatures are cell line-specific and region-specific. By adding genomic signatures of segmented regulatory regions, our best cross-validation accuracies were about 11%-16% higher than previous results, indicating the location specificity of genomic signatures within a regulatory region for predicting EPIs. Additionally, more training samples and related features can provide reliable performance in new cell lines. Consequently, our study provides additional insights into the roles of diverse signature features in predicting long-range EPIs.


Subjects
Computational Biology/methods; Enhancer Elements, Genetic; Nucleosomes/genetics; Algorithms; Cell Line; HeLa Cells; Humans; K562 Cells; Models, Genetic; Organ Specificity; Promoter Regions, Genetic
6.
Genomics ; 107(1): 9-15, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26697761

ABSTRACT

Non-coding RNA (ncRNA) genes produce transcripts just as protein-coding genes do, but ncRNAs function directly as RNAs rather than serving as blueprints for proteins. As the function of an ncRNA is closely related to its organelle genome, it is desirable to explore ncRNA function by confirming its provenance. In this paper, the topological secondary structure, motifs, and the triplets under three reading frames are considered as parameters of ncRNAs. A support vector machine (SVM) combined with the increment of diversity (ID) algorithm is applied to construct the classifier. When the method is applied to an ncRNA dataset with less than 80% sequence identity, the overall accuracies reach 95.57% and 96.40% in five-fold cross-validation and the jackknife test, respectively. Further, for the independent testing dataset, the average prediction success rate of our method reached 93.24%. These high success rates indicate that our method is very helpful for distinguishing ncRNAs from various organelle genomes.


Subjects
Algorithms; DNA, Chloroplast/chemistry; DNA, Kinetoplast/chemistry; DNA, Mitochondrial/chemistry; Open Reading Frames; RNA, Untranslated/chemistry; Base Sequence; DNA, Chloroplast/genetics; DNA, Kinetoplast/genetics; DNA, Mitochondrial/genetics; Nucleic Acid Conformation; RNA, Untranslated/genetics; Sequence Analysis, DNA/methods
7.
Bioinformatics ; 29(6): 678-85, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23335013

ABSTRACT

MOTIVATION: Protein-DNA interactions often take part in various crucial processes, which are essential for cellular function. The identification of DNA-binding sites in proteins is important for understanding the molecular mechanisms of protein-DNA interaction. Thus, we have developed an improved method to predict DNA-binding sites by integrating structural alignment algorithm and support vector machine-based methods. RESULTS: Evaluated on a new non-redundant protein set with 224 chains, the method has 80.7% sensitivity and 82.9% specificity in the 5-fold cross-validation test. In addition, it predicts DNA-binding sites with 85.1% sensitivity and 85.3% specificity when tested on a dataset with 62 protein-DNA complexes. Compared with a recently published method, BindN+, our method predicts DNA-binding sites with a 7% better area under the receiver operating characteristic curve value when tested on the same dataset. Many important problems in cell biology require the dense non-linear interactions between functional modules be considered. Thus, our prediction method will be useful in detecting such complex interactions.


Subjects
Algorithms; DNA-Binding Proteins/chemistry; DNA/chemistry; Binding Sites; DNA/metabolism; DNA-Binding Proteins/metabolism; Protein Structure, Secondary; ROC Curve; Sequence Analysis, Protein; Support Vector Machine
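
The sensitivity and specificity figures quoted in record 7 are standard confusion-matrix ratios. As a minimal sketch, assuming per-residue binary labels (binding site vs. non-binding site), they could be computed as below. The function name and the counts are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true binding sites recovered.
    specificity = TN / (TN + FP): fraction of non-binding sites correctly rejected.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sn, sp = sensitivity_specificity(tp=80, fn=20, tn=170, fp=30)
print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

Reporting both values matters here because binding residues are usually a small minority, so overall accuracy alone can look high even when few true sites are found.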
8.
Genomics ; 102(4): 215-22, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23891614

ABSTRACT

For a successful RNA interference (RNAi) experiment, selecting the small interfering RNA (siRNA) candidates that maximize the knockdown effect on a given gene is the critical step. Although various computational approaches have been attempted, the design of efficient siRNA candidates is still far from satisfactory. In this study, we propose a novel feature selection algorithm combining random forest and support vector machine to predict active siRNAs. Using a publicly available dataset, we demonstrate that predictive accuracy is markedly improved when context sequence features outside the target site are included. The Pearson correlation coefficient for regression is as high as 0.721, compared to 0.671, 0.668, 0.680, and 0.645 for Biopredsi, i-score, ThermoComposition21, and DSIR, respectively. This reveals that siRNA-target interaction requires appropriate sequence context not only in the target site but also in a broad region flanking the target site.


Subjects
Computational Biology/methods; RNA Interference; RNA, Small Interfering/genetics; Algorithms; Base Sequence; Databases, Genetic; Models, Molecular; Regression Analysis; Sequence Analysis, RNA; Support Vector Machine
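
Record 8 compares methods by the Pearson correlation coefficient between predicted and measured siRNA activity. The sketch below shows how that coefficient is defined; the function name and the example values are hypothetical, and the paper's own evaluation pipeline is not described in the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical predicted vs. measured knockdown efficiencies.
predicted = [0.2, 0.5, 0.7, 0.9]
measured = [0.1, 0.6, 0.65, 0.95]
print(round(pearson_r(predicted, measured), 3))
```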
9.
Epigenomics ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38511238

ABSTRACT

Aim: The present study was designed to investigate the coregulatory effects of multiple histone modifications (HMs) on gene expression in lung adenocarcinoma (LUAD). Materials & methods: Ten histones for LUAD were analyzed using ChIP-seq and RNA-seq data. An innovative computational method is proposed to quantify the coregulatory effects of multiple HMs on gene expression to identify strong coregulatory genes and regions. This method was applied to explore the coregulatory mechanisms of key ferroptosis-related genes in LUAD. Results: Nine strong coregulatory regions were identified for six ferroptosis-related genes with diverse coregulatory patterns (CA9, PGD, CDKN2A, PML, OTUB1 and NFE2L2). Conclusion: This quantitative method could be used to identify important HM coregulatory genes and regions that may be epigenetic regulatory targets in cancers.

10.
Comput Biol Med ; 173: 108396, 2024 May.
Article in English | MEDLINE | ID: mdl-38574529

ABSTRACT

Acute myeloid leukemia (AML) is an aggressive malignancy characterized by challenges in treatment, including drug resistance and frequent relapse. Recent research highlights the crucial roles of tumor microenvironment (TME) in assisting tumor cell immune escape and promoting tumor aggressiveness. This study delves into the interplay between AML and TME. Through the exploration of potential driver genes, we constructed an AML prognostic index (AMLPI). Cross-platform data and multi-dimensional internal and external validations confirmed that the AMLPI outperforms existing models in terms of areas under the receiver operating characteristic curves, concordance index values, and net benefits. High AMLPIs in AML patients were indicative of unfavorable prognostic outcomes. Immune analyses revealed that the high-AMLPI samples exhibit higher expression of HLA-family genes and immune checkpoint genes (including PD1 and CTLA4), along with lower T cell infiltration and higher macrophage infiltration. Genetic variation analyses revealed that the high-AMLPI samples associate with adverse variation events, including TP53 mutations, secondary NPM1 co-mutations, and copy number deletions. Biological interpretation indicated that ALDH2 and SPATS2L contribute significantly to AML patient survival, and their abnormal expression correlates with DNA methylation at cg12142865 and cg11912272. Drug response analyses revealed that different AMLPI samples tend to have different clinical selections, with low-AMLPI samples being more likely to benefit from immunotherapy. Finally, to facilitate broader access to our findings, a user-friendly and publicly accessible webserver was established and available at http://bioinfor.imu.edu.cn/amlpi. This server provides tools including TME-related AML driver genes mining, AMLPI construction, multi-dimensional validations, AML patients risk assessment, and figures drawing.


Subjects
Leukemia, Myeloid, Acute; Nucleophosmin; Humans; Prognosis; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Leukemia, Myeloid, Acute/genetics; Leukemia, Myeloid, Acute/pathology; Leukemia, Myeloid, Acute/therapy; DNA Methylation; Tumor Microenvironment; Aldehyde Dehydrogenase, Mitochondrial/genetics; Aldehyde Dehydrogenase, Mitochondrial/metabolism
11.
Comput Biol Med ; 169: 107884, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38154158

ABSTRACT

Overall hypomethylation in cancer has been identified in the past, but it is not clear exactly which hypomethylation sites are most important for the occurrence of cancer. To identify key hypomethylation sites, we studied the effect of hypomethylation in twelve regions on gene expression in colon adenocarcinoma (COAD). The key DNA methylation sites cg18949415 and cg22193385 and the important genes C6orf223 and KRT7 were found through a series of in-depth systematic calculations and analyses, including construction of a prognostic model, survival analysis, and random combination prediction, and the results were validated using the GEO database and immune microenvironment, drug, and functional enrichment analyses. Based on the expression values of the C6orf223 and KRT7 genes and the DNA methylation values of the cg18949415 and cg22193385 sites, the least diversity increment algorithm was used to distinguish COAD samples from normal samples. A reliability of 100% and a correctness of 97.12% in predicting tumor samples were obtained in the jackknife test. Moreover, we found that the C6orf223 gene and the cg18949415 site play a more important role in COAD than the KRT7 gene and the cg22193385 site. In addition, we investigated the impact of key methylation sites on three-dimensional chromatin structure. Our results will be helpful for experimental studies and may provide epigenetic biomarkers for COAD.


Subjects
Adenocarcinoma; Colonic Neoplasms; Humans; DNA Methylation; Reproducibility of Results; Biomarkers; Tumor Microenvironment
12.
Amino Acids ; 44(2): 573-80, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22851052

ABSTRACT

The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a two-class K-nearest neighbor classifier to classify thermophilic and mesophilic proteins, and the resulting KNN-ID classifier was successfully developed to predict thermophilic proteins. Instead of extracting features from protein sequences as done previously, our approach is based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences is first calculated to quantitatively measure how similar one given sequence is to the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with multiple recently published methods showed that the KNN-ID proposed in this study outperforms the other methods. The improved predictive performance indicates that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy is discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm .


Subjects
Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Databases, Protein; Sequence Homology, Amino Acid
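
Record 12 describes using the increment of diversity (ID) as a composition-based distance inside a K-nearest neighbor classifier. The following is a rough sketch of that idea, assuming the commonly used definitions D(X) = N ln N - sum(n_i ln n_i) for the diversity of a count vector and ID(X, Y) = D(X + Y) - D(X) - D(Y). The function names, the value of k, and the toy sequences with their labels are hypothetical; the abstract does not give the authors' exact features or parameters.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Count vector of the 20 standard amino acids in a sequence."""
    counts = Counter(seq)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def diversity(counts):
    """Diversity measure D = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); near 0 when compositions are proportional."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def knn_id_predict(query, training, k=3):
    """Majority vote among the k training sequences with the smallest ID to the query."""
    q = composition(query)
    ranked = sorted(
        training,
        key=lambda item: increment_of_diversity(q, composition(item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy sequences with arbitrary labels, for illustration only (not real proteins).
training = [
    ("KKEEKKEE", "thermophilic"),
    ("KEKEKEKK", "thermophilic"),
    ("QQNNSSTT", "mesophilic"),
    ("QNSTQNST", "mesophilic"),
]
print(knn_id_predict("KKKKEEEE", training, k=3))
```

Because ID depends only on symbol counts, no alignment or hand-built feature vector is needed, which matches the abstract's point that the approach works directly on a diversity measure of symbol sequences.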
13.
J Theor Biol ; 334: 45-51, 2013 Oct 07.
Article in English | MEDLINE | ID: mdl-23770403

ABSTRACT

Bioluminescent proteins are highly sensitive optical reporters for imaging in live animals; they have been extensively used in analytical applications in intracellular monitoring, genetic regulation and detection, and immune and binding assays. In this work, we systematically analyzed the sequence and structure information of 199 bioluminescent and nonbioluminescent proteins, respectively. Based on the results, we presented a novel method called auto covariance of averaged chemical shift (acACS) for extracting structure features from a sequence. A classifier of support vector machine (SVM) fusing increment of diversity (ID) was used to distinguish bioluminescent proteins from nonbioluminescent proteins by combining dipeptide composition, reduced amino acid composition, evolutionary information, and acACS. The overall prediction accuracy evaluated by jackknife validation reached 82.16%. This result was better than that obtained by other existing methods. Improvement of the overall prediction accuracy reached up to 5.33% higher than those of the SVM and auto covariance of sequential evolution information by 10-fold cross-validation. The acACS algorithm also outperformed other feature extraction methods, indicating that our approach is better than other existing methods in the literature.


Subjects
Amino Acids/genetics; Evolution, Molecular; Luminescent Proteins/genetics; Support Vector Machine; Amino Acids/chemistry; Amino Acids/metabolism; Animals; Databases, Protein; Genetic Variation; Luminescent Proteins/chemistry; Luminescent Proteins/metabolism; Models, Genetic
14.
Biophys Rep ; 9(1): 45-56, 2023 Feb 28.
Article in English | MEDLINE | ID: mdl-37426199

ABSTRACT

Abnormal histone modifications (HMs) can promote the occurrence of breast cancer. To elucidate the relationship between HMs and gene expression, we analyzed HM binding patterns and calculated their signal changes between breast tumor cells and normal cells. On this basis, the influences of HM signal changes on the expression changes of breast cancer-related genes were estimated by three different methods. The results showed that H3K79me2 and H3K36me3 may contribute more to gene expression changes. Subsequently, 2109 genes with differential H3K79me2 or H3K36me3 levels during cancerogenesis were identified by the Shannon entropy and submitted to perform functional enrichment analyses. Enrichment analyses displayed that these genes were involved in pathways in cancer, human papillomavirus infection, and viral carcinogenesis. Univariate Cox, LASSO, and multivariate Cox regression analyses were then adopted, and nine potential breast cancer-related driver genes were extracted from the genes with differential H3K79me2/H3K36me3 levels in the TCGA cohort. To facilitate the application, the expression levels of nine driver genes were transformed into a risk score model, and its robustness was tested via time-dependent receiver operating characteristic curves in the TCGA dataset and an independent GEO dataset. At last, the distribution levels of H3K79me2 and H3K36me3 in the nine driver genes were reanalyzed in the two cell lines and the regions with significant signal changes were located.
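
Record 14 mentions transforming the expression levels of nine driver genes into a risk score model. A common way to do this, assumed here rather than confirmed by the abstract, is a linear combination of expression values weighted by multivariate Cox regression coefficients, followed by a split at the median score. The gene names, coefficients, and expression values below are hypothetical placeholders.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of (Cox coefficient * expression) over the model genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if value > cutoff else "low") for sample, value in scores.items()}


# Hypothetical coefficients and expression profiles (not from the paper).
coefficients = {"gene_a": 0.5, "gene_b": -1.0}
cohort = {
    "sample_1": {"gene_a": 2.0, "gene_b": 1.0},
    "sample_2": {"gene_a": 4.0, "gene_b": 0.5},
    "sample_3": {"gene_a": 1.0, "gene_b": 2.0},
}
scores = {name: risk_score(expr, coefficients) for name, expr in cohort.items()}
print(stratify(scores))
```

The median cutoff is only one possible choice; the groups produced this way are what a time-dependent ROC or survival comparison would then evaluate.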

15.
Amino Acids ; 43(2): 545-55, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22102053

ABSTRACT

Knowledge of the submitochondria location of protein is integral to understanding its function and a necessity in the proteomics era. In this work, a new submitochondria data set is constructed, and an approach for predicting protein submitochondria locations is proposed by combining the amino acid composition, dipeptide composition, reduced physicochemical properties, gene ontology, evolutionary information, and pseudo-average chemical shift. The overall prediction accuracy is 93.57% for the submitochondria location and 97.79% for the three membrane protein types in the mitochondria inner membrane using the algorithm of the increment of diversity combined with the support vector machine. The performance of the pseudo-average chemical shift is excellent. For contrast, the method is also used to predict submitochondria locations in the data set constructed by Du and Li; an accuracy of 94.95% is obtained by our method, which is better than that of other existing methods.


Subjects
Computer Simulation; Mitochondrial Proteins/chemistry; Models, Molecular; Algorithms; Amino Acid Sequence; Evolution, Molecular; Mitochondrial Proteins/genetics; Protein Sorting Signals; Protein Transport; Software; Support Vector Machine
16.
Amino Acids ; 42(4): 1309-16, 2012 Apr.
Article in English | MEDLINE | ID: mdl-21191803

ABSTRACT

Due to the complexity of the Plasmodium falciparum (PF) genome, predicting mitochondrial proteins of PF is more difficult than for other species. In this study, using the n-peptide composition of a reduced amino acid alphabet (RAAA) obtained from the structural alphabet named Protein Blocks as the feature parameter, the increment of diversity (ID) is developed for the first time to predict mitochondrial proteins. By choosing the 1-peptide compositions of the 20-residue N-terminal regions as the only input vector, the prediction achieves 86.86% accuracy with a Matthews correlation coefficient (MCC) of 0.69 in the jackknife test. Moreover, by combining the hydropathy distribution along the protein sequence with several reduced amino acid alphabets, we achieved a maximum MCC of 0.82 with 92% accuracy in the jackknife test using the developed ID model. When evaluated on an independent dataset, our method performs better than existing methods. The results indicate that the ID is a simple and efficient prediction method for mitochondrial proteins of the malaria parasite.


Subjects
Computational Biology/methods; Mitochondrial Proteins/chemistry; Plasmodium falciparum/chemistry; Protozoan Proteins/chemistry; Protozoan Proteins/genetics; Amino Acid Sequence; Databases, Protein; Mitochondrial Proteins/genetics; Molecular Sequence Data; Plasmodium falciparum/genetics; Sequence Analysis, Protein
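
Record 16 reports its results as accuracy together with the Matthews correlation coefficient (MCC). As a brief sketch of how MCC is defined for a binary classifier, the function below computes it from confusion-matrix counts; the function name and the counts are hypothetical, and returning 0 for a zero denominator is a convention chosen for this example.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
print(round(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10), 3))
```

MCC ranges from -1 to 1 and stays informative when the two classes differ in size, which is presumably why it is reported alongside plain accuracy.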
17.
J Theor Biol ; 312: 55-64, 2012 Nov 07.
Article in English | MEDLINE | ID: mdl-22874580

ABSTRACT

RNA-protein interactions play important roles in various biological processes. The precise detection of RNA-protein interaction sites is very important for understanding essential biological processes and annotating the function of the proteins. In this study, based on various features from amino acid sequence and structure, including evolutionary information, solvent accessible surface area and torsion angles (φ, ψ) in the backbone structure of the polypeptide chain, a computational method for predicting RNA-binding sites in proteins is proposed. When the method is applied to predict RNA-binding sites in three datasets: RBP86 containing 86 protein chains, RBP107 containing 107 proteins chains and RBP109 containing 109 proteins chains, better sensitivities and specificities are obtained compared to previously published methods in five-fold cross-validation tests. In order to make further examination for the efficiency of our method, the RBP107 dataset is used as training set, RBP86 and RBP109 datasets are used as the independent test sets. In addition, as examples of our prediction, RNA-binding sites in a few proteins are presented. The annotated results are consistent with the PDB annotation. These results show that our method is useful for annotating RNA binding sites of novel proteins.


Subjects
Evolution, Molecular; Molecular Sequence Annotation/methods; RNA-Binding Proteins/chemistry; RNA-Binding Proteins/genetics; RNA/chemistry; RNA/genetics
18.
J Theor Biol ; 304: 88-95, 2012 Jul 07.
Article in English | MEDLINE | ID: mdl-22459701

ABSTRACT

Mycobacterium tuberculosis (MTB) is a pathogenic bacterial species in the genus Mycobacterium and the causative agent of most cases of tuberculosis (Berman et al., 2000). Knowledge of the localization of Mycobacterial protein may help unravel the normal function of this protein. Automated prediction of Mycobacterial protein subcellular localization is an important tool for genome annotation and drug discovery. In this work, a benchmark data set with 638 non-redundant mycobacterial proteins is constructed and an approach for predicting Mycobacterium subcellular localization is proposed by combining amino acid composition, dipeptide composition, reduced physicochemical property, evolutionary information, pseudo-average chemical shift. The overall prediction accuracy is 87.77% for Mycobacterial subcellular localizations and 85.03% for three membrane protein types in Integral membranes using the algorithm of increment of diversity combined with support vector machine. The performance of pseudo-average chemical shift is excellent. In order to check the performance of our method, the data set constructed by Rashid was also predicted and the accuracy of 98.12% was obtained. This indicates that our approach was better than other existing methods in literature.


Subjects
Amino Acids/analysis; Bacterial Proteins/analysis; Mycobacterium tuberculosis/chemistry; Algorithms; Chemistry, Physical; Computational Biology/methods; Databases, Protein; Evolution, Molecular; Magnetic Resonance Imaging/methods; Membrane Proteins/analysis; Mycobacterium tuberculosis/genetics; Subcellular Fractions/chemistry
19.
Genomics ; 97(2): 112-20, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21112384

ABSTRACT

Accurate identification of core promoters is important for gaining more insight about the understanding of the eukaryotic transcription regulation. In this study, the authors focused on the biologically realistic promoter prediction of plant genomes. By analyzing the correlative conservation, GC-compositional bias and specific structural patterns of TATA and TATA-less promoters in PlantPromDB, a hybrid multi-feature approach based on support vector machine (SVM) for predicting the two types of promoters were developed by integrating local word content, GC-Skew and DNA geometric flexibility. Compared with the TSSP-TCM program on the same test dataset, better prediction results were obtained. Especially for the TATA-less promoter, the accuracy is 10% higher than the result of TSSP-TCM program. The good performance of the hybrid promoters and the experimental data also indicate that our method has the ability to locate the promoter region of the plant genome.


Subjects
Arabidopsis/genetics; Gene Expression Regulation, Plant; Genome, Plant/genetics; Promoter Regions, Genetic; Base Composition; Base Sequence; Computational Biology; Data Mining; Sequence Analysis, DNA
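
One of the features named in record 19 is GC-skew. A minimal sketch of a sliding-window GC-skew profile is given below, assuming the usual definition (G - C) / (G + C) per window. The window size, step, function names, and example sequence are hypothetical and are not the settings used in the paper.

```python
def gc_skew(seq):
    """GC skew of a DNA segment: (G - C) / (G + C), or 0.0 if the segment has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def sliding_gc_skew(seq, window, step=1):
    """GC skew for each full-length window along the sequence."""
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical sequence fragment, for illustration only.
profile = sliding_gc_skew("TATAAAGGCGCCATGGC", window=6, step=3)
print(profile)
```

A profile like this could be concatenated with other per-window features before being passed to a classifier such as an SVM.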
20.
Front Cell Dev Biol ; 10: 815843, 2022.
Article in English | MEDLINE | ID: mdl-35178391

ABSTRACT

Breast cancer is the most common cancer in the world, and DNA methylation plays a key role in its occurrence and development. However, the effect of DNA methylation in different gene functional regions on gene expression, and the effect of gene expression on breast cancer, are not completely clear. In our study, we computed and analyzed DNA methylation, gene expression, and clinical data from the TCGA database. First, we calculated the distribution of abnormally methylated DNA probes in 12 regions and found that abnormally methylated probes were highly enriched in down-regulated genes and that the number of hypermethylated probes in the promoter region was 6.5 times that of hypomethylated probes. Second, the correlation coefficients between abnormal DNA methylation values in each functional region of differentially expressed genes and gene expression values were calculated. Then, co-expression analysis of differentially expressed genes was performed, and 34 hub genes in cancer-related pathways were obtained, of which 11 genes were regulated by abnormal DNA methylation. Finally, a multivariate Cox regression analysis was performed on 27 probes of the 11 genes. Three DNA methylation probes related to survival (cg13569051 and cg14399183 of GSN, and cg25274503 of CAV2) were used to construct a prognostic model, which has good prognostic ability. Furthermore, we found that hypermethylation of cg25274503 in the promoter region inhibited the expression of CAV2, and hypermethylation of cg13569051 and cg14399183 in the 5'UTR region inhibited the expression of GSN. These results may provide possible molecular targets for breast cancer.
