Search | Fiocruz Virtual Health Library

1.

HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors.

Vorontsov, Ilya E; Eliseeva, Irina A; Zinkevich, Arsenii; Nikonov, Mikhail; Abramov, Sergey; Boytsov, Alexandr; Kamenets, Vasily; Kasianova, Alexandra; Kolmykov, Semyon; Yevshin, Ivan S; Favorov, Alexander; Medvedeva, Yulia A; Jolma, Arttu; Kolpakov, Fedor; Makeev, Vsevolod J; Kulakovskiy, Ivan V.

Nucleic Acids Res ; 52(D1): D154-D163, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971293

ABSTRACT

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.

Subjects

Genetic Databases , Gene Expression Regulation , Protein Interaction Domains and Motifs , Transcription Factors , Animals , Humans , Mice , Binding Sites/genetics , Nucleotide Motifs , Transcription Factors/genetics , Transcription Factors/metabolism , Internet , Protein Interaction Domains and Motifs/genetics

2.

ANANASTRA: annotation and enrichment analysis of allele-specific transcription factor binding at SNPs.

Boytsov, Alexandr; Abramov, Sergey; Aiusheeva, Ariuna Z; Kasianova, Alexandra M; Baulin, Eugene; Kuznetsov, Ivan A; Aulchenko, Yurii S; Kolmykov, Semyon; Yevshin, Ivan; Kolpakov, Fedor; Vorontsov, Ilya E; Makeev, Vsevolod J; Kulakovskiy, Ivan V.

Nucleic Acids Res ; 50(W1): W51-W56, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35446421

ABSTRACT

We present ANANASTRA, https://ananastra.autosome.org, a web server for the identification and annotation of regulatory single-nucleotide polymorphisms (SNPs) with allele-specific binding events. ANANASTRA accepts a list of dbSNP IDs or a VCF file and reports allele-specific binding (ASB) sites of particular transcription factors or in specific cell types, highlighting those with ASBs significantly enriched at SNPs in the query list. ANANASTRA is built on top of a systematic analysis of allelic imbalance in ChIP-Seq experiments and performs the ASB enrichment test against background sets of SNPs found in the same source experiments as ASB sites but not displaying significant allelic imbalance. We illustrate ANANASTRA usage with selected case studies and expect that ANANASTRA will help to conduct the follow-up of GWAS in terms of establishing functional hypotheses and designing experimental verification.
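The enrichment test described in this abstract can be illustrated with a small sketch. It assumes a one-sided Fisher's exact test on a 2x2 table (query vs. non-query SNPs, ASB vs. non-ASB background SNPs); the abstract does not name the exact statistic, so that choice, the function names, and the rsIDs below are all illustrative assumptions rather than the server's actual implementation.

```python
from math import comb


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value, P(X >= a).

    2x2 table (assumed layout, not taken from the ANANASTRA paper):
                    ASB    non-ASB background
        in query     a        b
        not in query c        d
    """
    total = a + b + c + d
    asb_total = a + c          # all ASB SNPs
    query_total = a + b        # all query SNPs present in the universe
    upper = min(asb_total, query_total)
    # Sum hypergeometric probabilities for overlaps at least as large as a.
    numerator = sum(
        comb(asb_total, x) * comb(total - asb_total, query_total - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, query_total)


def asb_enrichment(query: set, asb: set, background: set) -> float:
    """Build the 2x2 table from SNP ID sets and return the p-value.

    `asb` and `background` are assumed to be disjoint: background SNPs come
    from the same experiments but show no significant allelic imbalance.
    """
    a = len(query & asb)
    b = len(query & background)
    c = len(asb - query)
    d = len(background - query)
    return fisher_right_tail(a, b, c, d)


if __name__ == "__main__":
    # Hypothetical rsIDs, for illustration only.
    asb_snps = {"rs1", "rs2", "rs3", "rs4"}
    background_snps = {"rs5", "rs6", "rs7", "rs8"}
    query_snps = {"rs1", "rs2", "rs3", "rs5"}
    print(asb_enrichment(query_snps, asb_snps, background_snps))
```

Query SNPs that appear in neither set are ignored by this sketch, because only SNPs observed in the source experiments can contribute to the table.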

Subjects

Single Nucleotide Polymorphism , Transcription Factors , Alleles , Binding Sites , Genome-Wide Association Study , Protein Binding , Transcription Factors/chemistry , Transcription Factors/metabolism , DNA-Binding Proteins

3.

GTRD: an integrated view of transcription regulation.

Kolmykov, Semyon; Yevshin, Ivan; Kulyashov, Mikhail; Sharipov, Ruslan; Kondrakhin, Yury; Makeev, Vsevolod J; Kulakovskiy, Ivan V; Kel, Alexander; Kolpakov, Fedor.

Nucleic Acids Res ; 49(D1): D104-D111, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33231677

ABSTRACT

The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.

Subjects

Genetic Databases , Gene Expression Regulation , Genome , Transcription Factors/genetics , Genetic Transcription , Animals , Cell Line , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Ontology , Humans , Internet , Mice , Molecular Sequence Annotation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Software , Transcription Factors/classification , Transcription Factors/metabolism

4.

A holistic view of mouse enhancer architectures reveals analogous pleiotropic effects and correlation with human disease.

Sethi, Siddharth; Vorontsov, Ilya E; Kulakovskiy, Ivan V; Greenaway, Simon; Williams, John; Makeev, Vsevolod J; Brown, Steve D M; Simon, Michelle M; Mallon, Ann-Marie.

BMC Genomics ; 21(1): 754, 2020 Nov 02.

Article in English | MEDLINE | ID: mdl-33138777

ABSTRACT

BACKGROUND: Efforts to elucidate the function of enhancers in vivo are underway but their vast numbers alongside differing enhancer architectures make it difficult to determine their impact on gene activity. By systematically annotating multiple mouse tissues with super- and typical-enhancers, we have explored their relationship with gene function and phenotype. RESULTS: Though super-enhancers drive high total- and tissue-specific expression of their associated genes, we find that typical-enhancers also contribute heavily to the tissue-specific expression landscape on account of their large numbers in the genome. Unexpectedly, we demonstrate that both enhancer types are preferentially associated with relevant 'tissue-type' phenotypes and exhibit no difference in phenotype effect size or pleiotropy. Modelling regulatory data alongside molecular data, we built a predictive model to infer gene-phenotype associations and use this model to predict potentially novel disease-associated genes. CONCLUSION: Overall our findings reveal that differing enhancer architectures have a similar impact on mammalian phenotypes whilst harbouring differing cellular and expression effects. Together, our results systematically characterise enhancers with predicted phenotypic traits endorsing the role for both types of enhancers in human disease and disorders.

Subjects

Genetic Enhancer Elements , Animals , Genetic Enhancer Elements/genetics , Humans , Mice , Phenotype

5.

HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis.

Kulakovskiy, Ivan V; Vorontsov, Ilya E; Yevshin, Ivan S; Sharipov, Ruslan N; Fedorova, Alla D; Rumynskiy, Eugene I; Medvedeva, Yulia A; Magana-Mora, Arturo; Bajic, Vladimir B; Papatsenko, Dmitry A; Kolpakov, Fedor A; Makeev, Vsevolod J.

Nucleic Acids Res ; 46(D1): D252-D259, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29140464

ABSTRACT

We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.

Subjects

Genetic Databases , Transcription Factors/metabolism , Animals , Binding Sites/genetics , Chromatin Immunoprecipitation , Humans , Mice , Genetic Models , Nucleotide Motifs , DNA Sequence Analysis

6.

HDL activates expression of genes stimulating cholesterol efflux in human monocyte-derived macrophages.

Orekhov, Alexander N; Pushkarsky, Tatiana; Oishi, Yumiko; Nikiforov, Nikita G; Zhelankin, Andrey V; Dubrovsky, Larisa; Makeev, Vsevolod J; Foxx, Kathy; Jin, Xueting; Kruth, Howard S; Sobenin, Igor A; Sukhorukov, Vasily N; Zakiev, Emile R; Kontush, Anatol; Le Goff, Wilfried; Bukrinsky, Michael.

Exp Mol Pathol ; 105(2): 202-207, 2018 10.

Article in English | MEDLINE | ID: mdl-30118702

ABSTRACT

High density lipoproteins (HDL) are key components of reverse cholesterol transport pathway. HDL removes excessive cholesterol from peripheral cells, including macrophages, providing protection from cholesterol accumulation and conversion into foam cells, which is a key event in pathogenesis of atherosclerosis. The mechanism of cellular cholesterol efflux stimulation by HDL involves interaction with the ABCA1 lipid transporter and ensuing transfer of cholesterol to HDL particles. In this study, we looked for additional proteins contributing to HDL-dependent cholesterol efflux. Using RNAseq, we analyzed mRNAs induced by HDL in human monocyte-derived macrophages and identified three genes, fatty acid desaturase 1 (FADS1), insulin induced gene 1 (INSIG1), and the low-density lipoprotein receptor (LDLR), expression of which was significantly upregulated by HDL. We individually knocked down these genes in THP-1 cells using gene silencing by siRNA, and measured cellular cholesterol efflux to HDL. Knock down of FADS1 did not significantly change cholesterol efflux (p = 0.70), but knockdown of INSIG1 and LDLR resulted in highly significant reduction of the efflux to HDL (67% and 75% of control, respectively, p < 0.001). Importantly, the suppression of cholesterol efflux was independent of known effects of these genes on cellular cholesterol content, as cells were loaded with cholesterol using acetylated LDL. These results indicate that HDL particles stimulate expression of genes that enhance cellular cholesterol transfer to HDL.

Subjects

HDL Cholesterol/genetics , Macrophages/physiology , ATP Binding Cassette Transporter 1/genetics , Atherosclerosis/physiopathology , Biological Transport , Cholesterol , HDL Cholesterol/metabolism , Delta-5 Fatty Acid Desaturase , Fatty Acid Desaturases/genetics , Fatty Acid Desaturases/metabolism , Foam Cells , Gene Expression Profiling , Gene Expression Regulation/genetics , Gene Silencing , Humans , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism , HDL Lipoproteins/genetics , HDL Lipoproteins/metabolism , Macrophages/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Messenger RNA , Small Interfering RNA , LDL Receptors/genetics , LDL Receptors/metabolism , THP-1 Cells , Up-Regulation

7.

HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models.

Kulakovskiy, Ivan V; Vorontsov, Ilya E; Yevshin, Ivan S; Soboleva, Anastasiia V; Kasianov, Artem S; Ashoor, Haitham; Ba-Alawi, Wail; Bajic, Vladimir B; Medvedeva, Yulia A; Kolpakov, Fedor A; Makeev, Vsevolod J.

Nucleic Acids Res ; 44(D1): D116-25, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26586801

ABSTRACT

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns, recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up to date collection of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate a practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
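As a rough illustration of how a position weight matrix with a score threshold is applied to a sequence, the sketch below builds a log-odds PWM from a few aligned sites and scans a DNA string for windows scoring at or above a cutoff. The sites, the pseudocount of 0.5, the uniform background, and the function names are assumptions made for this example; they are not HOCOMOCO models, its precomputed thresholds, or its command-line tools.

```python
import math

BASES = "ACGT"


def sites_to_pwm(sites, pseudocount=0.5, background=0.25):
    """Convert equal-length aligned sites into a log2-odds PWM.

    Each column is a dict base -> log2((count + pseudocount) /
    (n_sites + 4 * pseudocount) / background).
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        column = {}
        for base in BASES:
            count = sum(1 for site in sites if site[pos] == base)
            freq = (count + pseudocount) / (len(sites) + 4 * pseudocount)
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def scan(pwm, sequence, threshold):
    """Return (offset, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        # The PWM score is the sum of per-position log-odds weights.
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((offset, window, score))
    return hits


if __name__ == "__main__":
    # Toy aligned sites, invented for illustration only.
    toy_pwm = sites_to_pwm(["TGAC", "TGAC", "TGAT", "TGAC"])
    for offset, window, score in scan(toy_pwm, "AATGACGG", threshold=0.0):
        print(offset, window, round(score, 3))
```

The sketch scans the forward strand only; a practical scan would also score the reverse complement and would pick the threshold from a target P-value rather than a fixed number.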

Subjects

Genetic Databases , Transcriptional Regulatory Elements , Transcription Factors/metabolism , Animals , Binding Sites , Chromatin Immunoprecipitation , Humans , Mice , Biological Models , DNA Sequence Analysis

8.

Negative selection maintains transcription factor binding motifs in human cancer.

Vorontsov, Ilya E; Khimulya, Grigory; Lukianova, Elena N; Nikolaeva, Daria D; Eliseeva, Irina A; Kulakovskiy, Ivan V; Makeev, Vsevolod J.

BMC Genomics ; 17 Suppl 2: 395, 2016 06 23.

Article in English | MEDLINE | ID: mdl-27356864

ABSTRACT

BACKGROUND: Somatic mutations in cancer cells affect various genomic elements disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and, consequently, expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer. Cancer somatic mutations in binding sites of selected transcription factors have been found under positive selection. However, action and significance of negative selection in non-coding regions remain controversial. RESULTS: Here we present analysis of transcription factor binding motifs co-localized with non-coding variants. To avoid statistical bias we account for mutation signatures of different cancer types. For many transcription factors, including multiple members of FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations than expected by chance that increase or decrease affinity of predicted binding sites. Such stability of binding motifs is even more exhibited in DNase accessible regions. CONCLUSIONS: Our data demonstrate negative selection against binding sites alterations and suggest that such selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors with conserved binding motifs can reveal cell regulatory pathways crucial for the survivability of various human cancers.

Subjects

DNA/metabolism , Mutation , Neoplasms/genetics , Transcription Factors/metabolism , Binding Sites , DNA/chemistry , DNA/genetics , Humans , Neoplasms/metabolism , Genetic Promoter Regions , Protein Binding , Genetic Selection , Transcription Factors/chemistry

9.

Phenomenon of individual difference in human monocyte activation.

Orekhov, Alexander N; Nikiforov, Nikita G; Elizova, Natalia V; Ivanova, Ekaterina A; Makeev, Vsevolod J.

Exp Mol Pathol ; 99(1): 151-4, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26107006

ABSTRACT

Macrophages play an important role in the pathogenesis of atherosclerosis, including the early pre-clinical stages of the disease development. We have explored the possibility that the disease onset could be associated with altered monocyte/macrophage response to activating pro- and anti-inflammatory stimuli. We evaluated the susceptibility of circulating monocytes from healthy individuals and patients with asymptomatic carotid atherosclerosis to M1 and M2 activation. The obtained data indicated the existence of a remarkable individual difference in susceptibility to activation among monocytes isolated from the blood of different subjects, regardless of the presence or absence of atherosclerosis. The identified differences in susceptibility to activation between monocytes may explain the individual peculiarities of the immune response in different subjects.

Subjects

Carotid Artery Diseases/immunology , Monocytes/cytology , Monocytes/immunology , CD4-Positive T-Lymphocytes/cytology , CD4-Positive T-Lymphocytes/immunology , Carotid Artery Diseases/pathology , Carotid Intima-Media Thickness , CC Chemokines/genetics , CC Chemokines/metabolism , Cross-Sectional Studies , Disease Progression , Humans , Innate Immunity/immunology , Macrophages , Monocytes/metabolism , Tumor Necrosis Factor-alpha/genetics , Tumor Necrosis Factor-alpha/metabolism

10.

Complete Genome Sequence of Bifidobacterium longum GT15: Identification and Characterization of Unique and Global Regulatory Genes.

Zakharevich, Natalia V; Averina, Olga V; Klimina, Ksenia M; Kudryavtseva, Anna V; Kasianov, Artem S; Makeev, Vsevolod J; Danilenko, Valery N.

Microb Ecol ; 70(3): 819-34, 2015 Oct.

Article in English | MEDLINE | ID: mdl-25894918

ABSTRACT

In this study, we report the first completely annotated genome sequence of the Russia origin Bifidobacterium longum subsp. longum strain GT15. Comparative genomic analysis of this genome with other available completely annotated genome sequences of B. longum strains isolated from other countries has revealed a high degree of conservation and synteny across the entire genomes. However, it was discovered that the open reading frames to 35 genes were detected only from the B. longum GT15 genome and absent from other genomes B. longum strains (not of Russian origin). These so-called unique genes (UGs) represent a total length of 39,066 bp, with G + C content ranging from 37 to 65 %. Interestingly, certain genes were detected in other B. longum strains of Russian origin. In our analysis, we examined genes for global regulatory systems: proteins of toxin-antitoxin (TA) systems type II, serine/threonine protein kinases (STPKs) of eukaryotic type, and genes of the WhiB-like family proteins. In addition, we have made in silico analysis of all the most significant probiotic genes and considered genes involved in epigenetic regulation and genes responsible for producing various neuromediators. This genome sequence may elucidate the biology of this probiotic strain as a promising candidate for practical (pharmaceutical) applications.

Subjects

Bifidobacterium/genetics , Bacterial Chromosomes/genetics , Bacterial Genome , Bifidobacterium/metabolism , Chromosome Mapping , Bacterial Chromosomes/metabolism , Genetic Epigenesis , Molecular Sequence Data , Phylogeny , Russia , DNA Sequence Analysis

11.

HOCOMOCO: a comprehensive collection of human transcription factor binding sites models.

Kulakovskiy, Ivan V; Medvedeva, Yulia A; Schaefer, Ulf; Kasianov, Artem S; Vorontsov, Ilya E; Bajic, Vladimir B; Makeev, Vsevolod J.

Nucleic Acids Res ; 41(Database issue): D195-202, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23175603

ABSTRACT

Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in the improved model quality probably due to partial correction of source specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was a clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.

Subjects

Genetic Databases , Transcriptional Regulatory Elements , Transcription Factors/metabolism , Binding Sites , Humans , Internet , Genetic Models , Position-Specific Scoring Matrices

12.

Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data.

Levitsky, Victor G; Kulakovskiy, Ivan V; Ershov, Nikita I; Oshchepkov, Dmitry Yu; Makeev, Vsevolod J; Hodgman, T C; Merkulova, Tatyana I.

BMC Genomics ; 15: 80, 2014 Jan 29.

Article in English | MEDLINE | ID: mdl-24472686

ABSTRACT

BACKGROUND: ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TF), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold to avoid recognition of too many false-positive BSs, and to compare the actual performance of different models. RESULTS: Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated affinity of 64 predicted FoxA BSs using EMSA that allows safely distinguishing sequences able to bind TF. As a result we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. It was found that the performance of conventional position weight matrix (PWM) models was inferior with the highest false positive rate. On the contrary, the best recognition efficiency was achieved by the combination of SiteGA & diChIPMunk/ChIPMunk models, properly identifying FoxA BSs in up to 90% of loci for both mouse and human ChIP-Seq datasets. CONCLUSIONS: The experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS-recognition in ChIP-Seq data analysis. Regarding ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. 
A combination of models from different principles allowed identification of proper TFBSs.

Subjects

Chromatin Immunoprecipitation , Computational Biology , Transcription Factors/metabolism , Animals , Binding Sites , Mice

13.

CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A.

Nucleic Acids Res ; 40(12): e93, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

Subjects

Gene Expression Regulation , Transcriptional Regulatory Elements , DNA Sequence Analysis , Algorithms , Animals , Body Patterning/genetics , Drosophila/embryology , Drosophila/genetics , Drosophila/metabolism , Genetic Enhancer Elements , Developmental Gene Expression Regulation , Muscles/metabolism , Position-Specific Scoring Matrices , Software

14.

Annotation of nuclear lncRNAs based on chromatin interactions.

Agrawal, Saumya; Buyan, Andrey; Severin, Jessica; Koido, Masaru; Alam, Tanvir; Abugessaisa, Imad; Chang, Howard Y; Dostie, Josée; Itoh, Masayoshi; Kere, Juha; Kondo, Naoto; Li, Yunjing; Makeev, Vsevolod J; Mendez, Mickaël; Okazaki, Yasushi; Ramilowski, Jordan A; Sigorskikh, Andrey I; Strug, Lisa J; Yagi, Ken; Yasuzawa, Kayoko; Yip, Chi Wai; Hon, Chung Chau; Hoffman, Michael M; Terao, Chikashi; Kulakovskiy, Ivan V; Kasukawa, Takeya; Shin, Jay W; Carninci, Piero; de Hoon, Michiel J L.

PLoS One ; 19(5): e0295971, 2024.

Article in English | MEDLINE | ID: mdl-38709794

ABSTRACT

The human genome is pervasively transcribed and produces a wide variety of long non-coding RNAs (lncRNAs), constituting the majority of transcripts across human cell types. Some specific nuclear lncRNAs have been shown to be important regulatory components acting locally. As RNA-chromatin interaction and Hi-C chromatin conformation data showed that chromatin interactions of nuclear lncRNAs are determined by the local chromatin 3D conformation, we used Hi-C data to identify potential target genes of lncRNAs. RNA-protein interaction data suggested that nuclear lncRNAs act as scaffolds to recruit regulatory proteins to target promoters and enhancers. Nuclear lncRNAs may therefore play a role in directing regulatory factors to locations spatially close to the lncRNA gene. We provide the analysis results through an interactive visualization web portal at https://fantom.gsc.riken.jp/zenbu/reports/#F6_3D_lncRNA.

Subjects

Chromatin , Long Noncoding RNA , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , Chromatin/metabolism , Chromatin/genetics , Humans , Molecular Sequence Annotation , Cell Nucleus/metabolism , Cell Nucleus/genetics , Human Genome , Genetic Promoter Regions

15.

Exploring massive, genome scale datasets with the GenometriCorr package.

Favorov, Alexander; Mularoni, Loris; Cope, Leslie M; Medvedeva, Yulia; Mironov, Andrey A; Makeev, Vsevolod J; Wheelan, Sarah J.

PLoS Comput Biol ; 8(5): e1002529, 2012 May.

Article in English | MEDLINE | ID: mdl-22693437

ABSTRACT

UNLABELLED: We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. AVAILABILITY AND IMPLEMENTATION: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.

Subjects

Genetic Databases , Genomics/methods , Information Storage and Retrieval , Genetic Models , Statistical Models , Software , Animals , Chromosomes , Epigenomics , Genetic Loci , Genome , Humans , Internet , Transfer RNA/genetics , Nonparametric Statistics , User-Computer Interface

16.

Zebrafish pigment cells develop directly from persistent highly multipotent progenitors.

Subkhankulova, Tatiana; Camargo Sosa, Karen; Uroshlev, Leonid A; Nikaido, Masataka; Shriever, Noah; Kasianov, Artem S; Yang, Xueyan; Rodrigues, Frederico S L M; Carney, Thomas J; Bavister, Gemma; Schwetlick, Hartmut; Dawes, Jonathan H P; Rocco, Andrea; Makeev, Vsevolod J; Kelsh, Robert N.

Nat Commun ; 14(1): 1258, 2023 03 06.

Article in English | MEDLINE | ID: mdl-36878908

ABSTRACT

Neural crest cells are highly multipotent stem cells, but it remains unclear how their fate restriction to specific fates occurs. The direct fate restriction model hypothesises that migrating cells maintain full multipotency, whilst progressive fate restriction envisages fully multipotent cells transitioning to partially-restricted intermediates before committing to individual fates. Using zebrafish pigment cell development as a model, we show applying NanoString hybridization single cell transcriptional profiling and RNAscope in situ hybridization that neural crest cells retain broad multipotency throughout migration and even in post-migratory cells in vivo, with no evidence for partially-restricted intermediates. We find that leukocyte tyrosine kinase early expression marks a multipotent stage, with signalling driving iridophore differentiation through repression of fate-specific transcription factors for other fates. We reconcile the direct and progressive fate restriction models by proposing that pigment cell development occurs directly, but dynamically, from a highly multipotent state, consistent with our recently-proposed Cyclical Fate Restriction model.

Subjects

Automobile Driving , Zebrafish , Animals , Zebrafish/genetics , Hematopoietic Stem Cells , Multipotent Stem Cells , Cell Differentiation/genetics

17.

The complete genome sequence of Pantoea ananatis AJ13355, an organism with great biotechnological potential.

Hara, Yoshihiko; Kadotani, Naoki; Izui, Hiroshi; Katashkina, Joanna I; Kuvaeva, Tatiana M; Andreeva, Irina G; Golubeva, Lyubov I; Malko, Dmitry B; Makeev, Vsevolod J; Mashko, Sergey V; Kozlov, Yurii I.

Appl Microbiol Biotechnol ; 93(1): 331-41, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22159605

ABSTRACT

Pantoea ananatis AJ13355 is a newly identified member of the Enterobacteriaceae family with promising biotechnological applications. This bacterium is able to grow at an acidic pH and is resistant to saturating concentrations of L-glutamic acid, making this organism a suitable host for the production of L-glutamate. In the current study, the complete genomic sequence of P. ananatis AJ13355 was determined. The genome was found to consist of a single circular chromosome consisting of 4,555,536 bp [DDBJ: AP012032] and a circular plasmid, pEA320, of 321,744 bp [DDBJ: AP012033]. After automated annotation, 4,071 protein-coding sequences were identified in the P. ananatis AJ13355 genome. For 4,025 of these genes, functions were assigned based on homologies to known proteins. A high level of nucleotide sequence identity (99%) was revealed between the genome of P. ananatis AJ13355 and the previously published genome of P. ananatis LMG 20103. Short colinear regions, which are identical to DNA sequences in the Escherichia coli MG1655 chromosome, were found to be widely dispersed along the P. ananatis AJ13355 genome. Conjugal gene transfer from E. coli to P. ananatis, mediated by homologous recombination between short identical sequences, was also experimentally demonstrated. The determination of the genome sequence has paved the way for the directed metabolic engineering of P. ananatis to produce biotechnologically relevant compounds.

Subjects

DNA, Bacterial/chemistry; DNA, Bacterial/genetics; Genome, Bacterial; Pantoea/genetics; Chromosomes, Bacterial; Conjugation, Genetic; DNA, Circular/chemistry; DNA, Circular/genetics; Escherichia coli/genetics; Gene Transfer, Horizontal; Molecular Sequence Data; Plasmids; Recombination, Genetic; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid

18.

Positional weight matrices have sufficient prediction power for analysis of noncoding variants.

Boytsov, Alexandr; Abramov, Sergey; Makeev, Vsevolod J; Kulakovskiy, Ivan V.

F1000Res ; 11: 33, 2022.

Article in English | MEDLINE | ID: mdl-35811788

ABSTRACT

The position weight matrix, also called the position-specific scoring matrix, is the commonly accepted model to quantify the specificity of transcription factor binding to DNA. Position weight matrices are used in thousands of projects and software tools in regulatory genomics, including computational prediction of the regulatory impact of single-nucleotide variants. Yet, recently Yan et al. reported that "the position weight matrices of most transcription factors lack sufficient predictive power" if applied to the analysis of regulatory variants studied with a newly developed experimental method, SNP-SELEX. Here, we re-analyze the rich experimental dataset obtained by Yan et al. and show that appropriately selected position weight matrices in fact can adequately quantify transcription factor binding to alternative alleles.

Subjects

Software; Transcription Factors; Binding Sites/genetics; Position-Specific Scoring Matrices; Protein Binding; Transcription Factors/genetics; Transcription Factors/metabolism

19.

The gene regulation knowledge commons: the action area of GREEKC.

Kuiper, Martin; Bonello, Joseph; Fernández-Breis, Jesualdo T; Bucher, Philipp; Futschik, Matthias E; Gaudet, Pascale; Kulakovskiy, Ivan V; Licata, Luana; Logie, Colin; Lovering, Ruth C; Makeev, Vsevolod J; Orchard, Sandra; Panni, Simona; Perfetto, Livia; Sant, David; Schulz, Stefan; Vercruysse, Steven; Zerbino, Daniel R; Lægreid, Astrid.

Biochim Biophys Acta Gene Regul Mech ; 1865(1): 194768, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34757206

ABSTRACT

As computational modeling becomes more essential to analyze and understand biological regulatory mechanisms, governance of the many databases and knowledge bases that support this domain is crucial to guarantee reliability and interoperability of resources. To address this, the COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC, CA15205, www.greekc.org) organized nine workshops in a four-year period, starting September 2016. The workshops brought together a wide range of experts from all over the world working on various steps in the knowledge management process that focuses on understanding gene regulatory mechanisms. The discussions between ontologists, curators, text miners, biologists, bioinformaticians, philosophers and computational scientists spawned a host of activities aimed to standardize and update existing knowledge management workflows and involve end-users in the process of designing the Gene Regulation Knowledge Commons (GRKC). Here the GREEKC consortium describes its main achievements in improving this GRKC.

Subjects

Gene Expression Regulation; Reproducibility of Results

20.

De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum).

Logacheva, Maria D; Kasianov, Artem S; Vinogradov, Dmitriy V; Samigullin, Tagir H; Gelfand, Mikhail S; Makeev, Vsevolod J; Penin, Aleksey A.

BMC Genomics ; 12: 30, 2011 Jan 13.

Article in English | MEDLINE | ID: mdl-21232141

ABSTRACT

BACKGROUND: Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales--a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations Fagopyrum species have not been the subject of large-scale sequencing projects. RESULTS: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 (for F. esculentum) and 229 (F. tataricum) thousands of reads with average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousands of contigs for each species, with 7.5-8.2× coverage. Comparative analysis of two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationships of the Caryophyllales and asterids now gained high support from nuclear gene sequences. CONCLUSIONS: 454 transcriptome sequencing and de novo assembly was performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated.

Subjects

Fagopyrum/genetics; Flowers/genetics; Gene Expression Profiling; Fagopyrum/classification; Molecular Sequence Annotation; RNA, Plant/genetics; Sequence Analysis, DNA
