Search | Portal Regional da BVS

Assessment of transcriptional importance of cell line-specific features based on GTRD and FANTOM5 data.

Sharipov, Ruslan N; Kondrakhin, Yury V; Ryabova, Anna S; Yevshin, Ivan S; Kolpakov, Fedor A.

PLoS One ; 15(12): e0243332, 2020.

Article in English | MEDLINE | ID: mdl-33347457

ABSTRACT

Creating a complete picture of the regulation of transcription seems to be an urgent task of modern biology. Regulation of transcription is a complex process carried out by transcription factors (TFs) and auxiliary proteins. Over the past decade, ChIP-Seq has become the most common experimental technology for studying genome-wide interactions between TFs and DNA. We assessed the transcriptional significance of cell line-specific features using regression analysis of ChIP-Seq datasets from the GTRD database and transcriptional start site (TSS) activities from the FANTOM5 expression atlas. For this purpose, we initially generated a large number of features that were defined as the presence or absence of TFs in different promoter regions around TSSs. Using feature selection and regression analysis, we identified sets of the most important TFs that affect the expression activity of TSSs in human cell lines such as HepG2, K562 and HEK293. We demonstrated that some TFs can be classified as repressors or activators depending on their location relative to the TSS.
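The idea of relating TF presence/absence features to TSS activity can be illustrated with a minimal sketch. It is not the authors' pipeline (which combines feature selection with regression over many features); it only uses the fact that a least-squares regression on a single 0/1 feature has a slope equal to the difference of group means. The TF name, region labels, and activity values below are hypothetical toy data.

```python
from statistics import mean

def binary_feature_slope(present, activity):
    """OLS slope of activity on one 0/1 feature (equals the difference of group means)."""
    with_tf = [a for p, a in zip(present, activity) if p == 1]
    without_tf = [a for p, a in zip(present, activity) if p == 0]
    if not with_tf or not without_tf:
        raise ValueError("feature must be present at some TSSs and absent at others")
    return mean(with_tf) - mean(without_tf)

# Hypothetical log-scale activities for six TSSs.
activity = [5.0, 4.0, 6.0, 1.0, 2.0, 3.0]

# Hypothetical features: presence (1) or absence (0) of a TF in a region around each TSS.
features = {
    ("TF_A", "upstream"): [1, 1, 1, 0, 0, 0],
    ("TF_A", "downstream"): [0, 0, 0, 1, 1, 0],
}

slopes = {name: binary_feature_slope(col, activity) for name, col in features.items()}
for (tf, region), slope in slopes.items():
    role = "activator-like" if slope > 0 else "repressor-like"
    print(f"{tf} {region}: slope={slope:+.2f} ({role})")
```

In this toy setting the same TF gets a positive slope in one region and a negative slope in another, which mirrors the abstract's remark that the apparent role of a TF can depend on its position relative to the TSS.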

Subjects

Databases, Nucleic Acid , Gene Expression Profiling , Transcription Factors , Transcriptome , HEK293 Cells , Hep G2 Cells , Humans , K562 Cells , Transcription Factors/classification , Transcription Factors/metabolism

Population size estimation for quality control of ChIP-Seq datasets.

Kolmykov, Semyon K; Kondrakhin, Yury V; Yevshin, Ivan S; Sharipov, Ruslan N; Ryabova, Anna S; Kolpakov, Fedor A.

PLoS One ; 14(8): e0221760, 2019.

Article in English | MEDLINE | ID: mdl-31465497

ABSTRACT

Chromatin immunoprecipitation followed by sequencing, i.e. ChIP-Seq, is a widely used experimental technology for the identification of functional protein-DNA interactions. Nowadays, databases such as ENCODE, GTRD, ChIP-Atlas and ReMap systematically collect and annotate a large number of ChIP-Seq datasets. Comprehensive control of dataset quality is currently indispensable for selecting the most reliable data for further analysis. In addition to existing quality control metrics, we have developed two novel metrics that make it possible to control false positives and false negatives in ChIP-Seq datasets. For this purpose, we have adapted a well-known population size estimate to determine the unknown number of genuine transcription factor binding regions. Determination of the proposed metrics was based on overlapping distinct binding sites derived from processing one ChIP-Seq experiment by different peak callers. Moreover, the metrics can also be useful for assessing the quality of datasets obtained from processing distinct ChIP-Seq experiments by a given peak caller. We have also shown that these metrics appear to be useful not only for dataset selection but also for comparison of peak callers and identification of site motifs based on ChIP-Seq datasets. The developed algorithm for determination of the false positive control metric and false negative control metric for ChIP-Seq datasets was implemented as a plugin for the BioUML platform: https://ict.biouml.org/bioumlweb/chipseq_analysis.html.
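The abstract does not name the population size estimate it adapts. As an illustration only, the sketch below assumes a two-sample capture-recapture estimate in Chapman's bias-corrected form, applied to the overlap between peak sets from two hypothetical peak callers. The peak coordinates are toy data, and the paper's two quality metrics themselves are not reproduced here.

```python
def count_overlapping(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (start, end) half-open intervals on the same chromosome.
    """
    return sum(
        any(a_start < b_end and b_start < a_end for b_start, b_end in peaks_b)
        for a_start, a_end in peaks_a
    )

def chapman_estimate(n1, n2, m):
    """Chapman's capture-recapture estimate of total population size.

    n1, n2: sizes of the two samples; m: number of items found in both.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical peak sets produced by two peak callers from one experiment.
caller_1 = [(100, 200), (500, 600), (900, 1000), (1500, 1600)]
caller_2 = [(150, 250), (550, 650), (2000, 2100)]

m = count_overlapping(caller_1, caller_2)
n_hat = chapman_estimate(len(caller_1), len(caller_2), m)
print(f"shared peaks: {m}, estimated number of binding regions: {n_hat:.2f}")
```

The quadratic overlap scan keeps the sketch short; real peak sets would normally be sorted and swept, or handled with an interval-tree library.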

Subjects

Chromatin Immunoprecipitation Sequencing , Databases, Nucleic Acid , Sequence Analysis, DNA , Algorithms , Area Under Curve , Binding Sites , Quality Control , ROC Curve , Transcription Factors/metabolism

Comparative analysis of protein-coding and long non-coding transcripts based on RNA sequence features.

Volkova, Oxana A; Kondrakhin, Yury V; Kashapov, Timur A; Sharipov, Ruslan N.

J Bioinform Comput Biol ; 16(2): 1840013, 2018 04.

Article in English | MEDLINE | ID: mdl-29739305

ABSTRACT

RNA plays an important role in the intracellular life of the cell and in the organism in general. Besides the well-established protein-coding RNAs (messenger RNAs, mRNAs), long non-coding RNAs (lncRNAs) have gained the attention of researchers in recent years. Although lncRNAs have been classified as non-coding, some authors have reported the presence of corresponding sequences in ribosome profiling data (Ribo-seq). Ribo-seq technology is a powerful experimental tool used to characterize RNA translation in the cell, with a focus on initiation (harringtonine, lactimidomycin) and elongation (cycloheximide). By exploiting translation starts obtained from the Ribo-seq experiment, we developed a novel position weight matrix model for the prediction of translation starts. This model allowed us to achieve 96% accuracy of discrimination between human mRNAs and lncRNAs. When the same model was used for the prediction of putative ORFs in RNAs, we discovered that the majority of lncRNAs contained only small ORFs ([Formula: see text][Formula: see text]nt) in contrast to mRNAs.
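For readers unfamiliar with position weight matrices, the following is a generic sketch of how a log-odds PWM can be built from aligned start-site contexts and used to scan a transcript. It is not the novel model described in the abstract; the training sites, the uniform background, the pseudocount, and the test sequence are all assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGU"

def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds PWM (uniform background) from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / 0.25) for base in ALPHABET})
    return pwm

def score(pwm, window):
    """Sum of per-position log-odds scores for a window of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_start(pwm, seq):
    """Index of the highest-scoring window in seq."""
    width = len(pwm)
    return max(range(len(seq) - width + 1), key=lambda i: score(pwm, seq[i:i + width]))

# Hypothetical aligned contexts around annotated AUG start codons.
training_sites = ["ACCAUGG", "GCCAUGG", "ACCAUGA", "AACAUGG"]
pwm = build_pwm(training_sites)

transcript = "UUUGCCAUGGCUUAA"  # hypothetical sequence
start = best_start(pwm, transcript)
print(start, transcript[start:start + len(pwm)])
```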

Subjects

Computational Biology/methods , Proteins/genetics , RNA, Long Noncoding , 3' Untranslated Regions , 5' Untranslated Regions , Algorithms , Open Reading Frames , Protein Biosynthesis , RNA, Messenger/genetics , Ribosomes/genetics , Sequence Analysis, RNA

Assessment of translational importance of mammalian mRNA sequence features based on Ribo-Seq and mRNA-Seq data.

Volkova, Oxana A; Kondrakhin, Yury V; Yevshin, Ivan S; Valeev, Tagir F; Sharipov, Ruslan N.

J Bioinform Comput Biol ; 14(2): 1641006, 2016 04.

Article in English | MEDLINE | ID: mdl-27122318

ABSTRACT

Ribosome profiling technology (Ribo-Seq) has made it possible to reveal more details of mRNA translation in the cell and to obtain additional information on the importance of mRNA sequence features for this process. Application of translation inhibitors such as harringtonine and cycloheximide, along with the mRNA-Seq technique, helped to assess such an important characteristic as translation efficiency. We assessed the translational importance of features of mRNA sequences through statistical analysis of Ribo-Seq and mRNA-Seq data. Translationally important features known from the literature, as well as features proposed by the authors, were used in the analysis. Comparisons such as protein-coding versus non-coding RNAs and high- versus low-translated mRNAs were performed. We revealed a set of features that made it possible to discriminate between the compared categories of RNA. Significant relationships between mRNA features and efficiency of translation were also established.
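The abstract does not spell out how translation efficiency was computed. A commonly used definition, assumed here for illustration, is the ratio of length- and depth-normalized Ribo-Seq signal to normalized mRNA-Seq signal for the same transcript, usually reported on a log2 scale. The read counts and library sizes below are hypothetical.

```python
import math

def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_nt * total_mapped_reads)

def log2_translation_efficiency(ribo_rpkm, mrna_rpkm):
    """log2 ratio of ribosome footprint density to mRNA abundance."""
    if ribo_rpkm <= 0 or mrna_rpkm <= 0:
        raise ValueError("both densities must be positive")
    return math.log2(ribo_rpkm / mrna_rpkm)

# Hypothetical counts for one 2,000-nt transcript in libraries of 10 million reads each.
ribo = rpkm(read_count=400, length_nt=2000, total_mapped_reads=10_000_000)
mrna = rpkm(read_count=100, length_nt=2000, total_mapped_reads=10_000_000)
te = log2_translation_efficiency(ribo, mrna)
print(f"Ribo-Seq RPKM={ribo:.1f}, mRNA-Seq RPKM={mrna:.1f}, log2 TE={te:.2f}")
```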

Subjects

Mammals/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , 3' Untranslated Regions , 5' Untranslated Regions , Animals , Codon, Initiator , Humans , Mice , Protein Biosynthesis , Proteins/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Ribosomes/genetics

Identification of differentially expressed genes by meta-analysis of microarray data on breast cancer.

Kondrakhin, Yury V; Sharipov, Ruslan N; Keld, Alexander E; Kolpakov, Fedor A.

In Silico Biol ; 8(5-6): 383-411, 2008.

Article in English | MEDLINE | ID: mdl-19374127

ABSTRACT

Despite the large amount of microarray data available on breast cancer, reliable identification of genes associated with breast cancer development remains a challenge. The aim of this work was to develop a novel method of meta-analysis for the identification of differentially expressed genes by integrating the results of several independent microarray experiments. We developed a statistical method for the identification of up- and down-regulated genes to perform the meta-analysis. The method takes advantage of hypergeometric and binomial distributions. Using our method, we performed a meta-analysis of five data sets from independent cDNA-microarray experiments on breast cancer. The meta-analysis revealed that 3.2% and 2.8% of the 24,726 analyzed genes are significantly (P-value < 0.01) down- and up-regulated, respectively. We also show that properly applied meta-analysis is a good tool for comparison of different breast cancer subtypes. Our meta-analysis showed that the expression of the majority of genes does not differ significantly between subtypes of breast cancer. Here, we report the rationale, development and application of a meta-analysis that enables us to identify biologically meaningful features of breast cancer. The algorithm we propose for the meta-analysis can reveal the features specific to the breast cancer subtypes and those common to breast cancer. The results allow us to revise the previously generated lists of genes associated with breast cancer and also to identify the most promising anticancer drug-target genes.
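The abstract states only that the method relies on hypergeometric and binomial distributions. The sketch below shows, under assumptions, how upper-tail probabilities from these two distributions could be used in a meta-analysis setting: a binomial tail for a gene called up-regulated in at least k of n independent experiments, and a hypergeometric tail for the overlap between two gene lists. The function names and all numbers are hypothetical and do not reproduce the authors' algorithm.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def hypergeom_tail(k, population, marked, drawn):
    """P(X >= k) when drawing `drawn` items without replacement from a
    population containing `marked` items of interest."""
    upper = min(marked, drawn)
    favourable = sum(
        comb(marked, i) * comb(population - marked, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)

# Hypothetical: a gene is called up-regulated in 4 of 5 experiments,
# where a chance call has probability 0.1 per experiment.
p_binom = binomial_tail(k=4, n=5, p=0.1)

# Hypothetical: two lists of 5 genes each, drawn from 10 genes, share all 5.
p_hyper = hypergeom_tail(k=5, population=10, marked=5, drawn=5)

print(f"binomial tail: {p_binom:.5f}, hypergeometric tail: {p_hyper:.5f}")
```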

Subjects

Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic/genetics , Oligonucleotide Array Sequence Analysis , Algorithms , Breast Neoplasms/classification , Genetic Heterogeneity , Humans

Recognition of interferon-inducible sites, promoters, and enhancers.

Ananko, Elena A; Kondrakhin, Yury V; Merkulova, Tatiana I; Kolchanov, Nikolay A.

BMC Bioinformatics ; 8: 56, 2007 Feb 19.

Article in English | MEDLINE | ID: mdl-17309789

ABSTRACT

BACKGROUND: Computational analysis of gene regulatory regions is important for predicting the functions of many uncharacterized genes. With this in mind, the search for target genes of interferon (IFN) induction is of interest. IFNs are multi-functional cytokines. Their effects are immunomodulatory, antiviral, antibacterial, and antitumor. The interaction of the IFNs with their cell surface receptors activates several transcription factors. Four regulatory factors, ISGF3, STAT1, IRF1, and NF-kappaB, are essential for the function of the IFN system. The aim of this work is the development of computational approaches for the recognition of DNA binding sites for these factors and of computer programs for the prediction of the IFN-inducible regions. RESULTS: We developed computational approaches to the recognition of the binding sites for ISGF3, STAT1, IRF1, and NF-kappaB. Analysis of the distribution of these binding sites demonstrated that the regions -500 upstream of the transcription start site in IFN-inducible genes are enriched in putative binding sites for these transcription factors. Based on selected combinations of the sites whose frequencies were significantly higher than in the other functional gene groups, we developed methods for the prediction of the IFN-inducible promoters and enhancers. We analyzed 1,004 sequences of the IFN-inducible genes compiled using microarray data analyses and also about 10,000 human gene sequences from the EPD and RefSeq databases; 74 of 1,664 human genes annotated in EPD were significantly IFN-inducible. CONCLUSION: Analyses of several control datasets demonstrated that the developed methods predict IFN-inducible genes with high accuracy. Application of these methods to several datasets suggested that the number of IFN-inducible genes in the human genome is approximately 1500-2000.

Subjects

Chromosome Mapping/methods , Enhancer Elements, Genetic/genetics , Interferons/genetics , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Transcriptional Activation/genetics , Base Sequence , Binding Sites , Molecular Sequence Data , Promoter Regions, Genetic , Protein Binding , Sequence Alignment
