Results 1 - 20 of 65
1.
Nucleic Acids Res ; 52(D1): D154-D163, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971293

ABSTRACT

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.


Subjects
Databases, Genetic; Gene Expression Regulation; Protein Interaction Domains and Motifs; Transcription Factors; Animals; Humans; Mice; Binding Sites/genetics; Nucleotide Motifs; Transcription Factors/genetics; Transcription Factors/metabolism; Internet; Protein Interaction Domains and Motifs/genetics
2.
BMC Bioinformatics ; 25(1): 238, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003441

ABSTRACT

MOTIVATION: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through Next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study, depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods. RESULTS: In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. 
Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
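
The parameters k and w quoted above are the k-mer length and window size of a minimizer index. As a rough illustration of why known variants matter at the indexing stage, the sketch below computes (w, k)-minimizers for a toy reference and for the same sequence carrying one SNV. It is a simplified assumption-laden sketch, not the minimap2_index_modifier implementation: it picks the lexicographically smallest k-mer per window on the forward strand only, and the function names and toy sequence are hypothetical.

```python
def minimizers(sequence, k, w):
    """Return the set of (position, k-mer) minimizers.

    For each window of w consecutive k-mers, keep the lexicographically
    smallest k-mer (leftmost on ties). Forward strand only; real aligners
    typically hash k-mers and consider both strands.
    """
    kmers = [(sequence[i:i + k], i) for i in range(len(sequence) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        kmer, pos = min(kmers[start:start + w])
        selected.add((pos, kmer))
    return selected


def apply_snv(sequence, pos, alt):
    """Return a copy of the sequence with a single-base substitution (0-based)."""
    return sequence[:pos] + alt + sequence[pos + 1:]


# Toy values chosen for readability; the abstract uses k = 27 and w = 14.
reference = "ACGTTGCATGGA"
alternate = apply_snv(reference, 6, "A")

ref_minimizers = minimizers(reference, k=3, w=4)
alt_minimizers = minimizers(alternate, k=3, w=4)

# Minimizers that exist only on the variant-carrying sequence: a read with
# the alternate allele can seed on these but not on the plain reference index.
variant_only = alt_minimizers - ref_minimizers
```

A read carrying the alternate allele produces seeds from `variant_only`, which a purely reference-based index would not contain; adding such entries is one plausible way to make a linear index variant-aware.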


Subjects
Genetic Variation; Genome, Human; Whole Genome Sequencing; Humans; Whole Genome Sequencing/methods; Genetic Variation/genetics; High-Throughput Nucleotide Sequencing/methods; Polymorphism, Single Nucleotide/genetics; Sequence Alignment/methods; Software; Algorithms; Genome-Wide Association Study/methods
3.
Development ; 148(22), 2021 11 15.
Article in English | MEDLINE | ID: mdl-35020872

ABSTRACT

Neural crest cells are crucial in development, not least because of their remarkable multipotency. Early findings stimulated two hypotheses for how fate specification and commitment from fully multipotent neural crest cells might occur, progressive fate restriction (PFR) and direct fate restriction, differing in whether partially restricted intermediates were involved. Initially hotly debated, they remain unreconciled, although PFR has become favoured. However, testing of a PFR hypothesis of zebrafish pigment cell development refutes this view. We propose a novel 'cyclical fate restriction' hypothesis, based upon a more dynamic view of transcriptional states, reconciling the experimental evidence underpinning the traditional hypotheses.


Subjects
Cell Differentiation/genetics; Cell Lineage/genetics; Neural Crest/growth & development; Zebrafish/growth & development; Animals; Cell Lineage/physiology; Epithelial-Mesenchymal Transition/genetics; Gene Expression Regulation, Developmental/genetics; Melanocytes/metabolism; Pigmentation/genetics; Zebrafish/genetics; Zebrafish Proteins/genetics
4.
Nucleic Acids Res ; 50(W1): W51-W56, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35446421

ABSTRACT

We present ANANASTRA, https://ananastra.autosome.org, a web server for the identification and annotation of regulatory single-nucleotide polymorphisms (SNPs) with allele-specific binding events. ANANASTRA accepts a list of dbSNP IDs or a VCF file and reports allele-specific binding (ASB) sites of particular transcription factors or in specific cell types, highlighting those with ASBs significantly enriched at SNPs in the query list. ANANASTRA is built on top of a systematic analysis of allelic imbalance in ChIP-Seq experiments and performs the ASB enrichment test against background sets of SNPs found in the same source experiments as ASB sites but not displaying significant allelic imbalance. We illustrate ANANASTRA usage with selected case studies and expect that ANANASTRA will help to conduct the follow-up of GWAS in terms of establishing functional hypotheses and designing experimental verification.
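
Allelic imbalance at a heterozygous SNP can be illustrated with a two-sided exact binomial test on reference and alternate read counts. The abstract does not specify the statistical model behind ANANASTRA, so the sketch below is only a simplified stand-in under the assumption of an expected 50:50 allelic ratio; the function name and read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p_ref=0.5):
    """Two-sided exact binomial p-value for the observed allelic read counts.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null hypothesis that a read supports the
    reference allele with probability p_ref.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref)**(n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(prob for prob in pmf if prob <= observed * (1 + 1e-9)))


# Hypothetical counts: balanced coverage versus a strongly skewed site.
balanced = binomial_two_sided_p(5, 5)
skewed = binomial_two_sided_p(10, 0)
```

In practice the expected ratio depends on factors such as local copy number and reference mapping bias, so a fixed `p_ref=0.5` is an assumption made here for clarity only.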


Subjects
Polymorphism, Single Nucleotide; Transcription Factors; Alleles; Binding Sites; Genome-Wide Association Study; Protein Binding; Transcription Factors/chemistry; Transcription Factors/metabolism; DNA-Binding Proteins
5.
Nucleic Acids Res ; 50(18): 10264-10277, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36130228

ABSTRACT

The mutational spectrum of mitochondrial DNA (mtDNA) does not resemble any of the known mutational signatures of the nuclear genome, and the variation in mtDNA mutational spectra between organisms remains poorly understood. Since mitochondria are responsible for aerobic respiration, the mtDNA mutational spectrum is expected to be affected by oxidative damage. Assuming that oxidative damage increases with age, we analyse mtDNA mutagenesis in different species with regard to their generation length. Analysing (i) tens of thousands of somatic mtDNA mutations in samples of different ages, (ii) 70053 polymorphic synonymous mtDNA substitutions reconstructed in 424 mammalian species with different generation lengths, and (iii) the synonymous nucleotide content of 650 complete mitochondrial genomes of mammalian species, we observed that the frequency of AH > GH substitutions (H: heavy strand notation) is twice as high in species with long versus short generation length, making their mtDNA more AH poor and GH rich. Considering that AH > GH substitutions are also sensitive to the time spent single-stranded (TSSS) during asynchronous mtDNA replication, we demonstrated that the AH > GH substitution rate is a function of both species-specific generation length and position-specific TSSS. We propose that AH > GH is a mitochondria-specific signature of oxidative damage associated with both aging and TSSS.


Subjects
Aging; DNA, Mitochondrial; Mammals; Aging/genetics; Animals; DNA, Mitochondrial/genetics; Mammals/genetics; Mitochondria/genetics; Mutation; Nucleotides
6.
Nucleic Acids Res ; 49(D1): D104-D111, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33231677

ABSTRACT

The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.


Subjects
Databases, Genetic; Gene Expression Regulation; Genome; Transcription Factors/genetics; Transcription, Genetic; Animals; Cell Line; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; Gene Ontology; Humans; Internet; Mice; Molecular Sequence Annotation; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Software; Transcription Factors/classification; Transcription Factors/metabolism
7.
BMC Genomics ; 21(1): 754, 2020 Nov 02.
Article in English | MEDLINE | ID: mdl-33138777

ABSTRACT

BACKGROUND: Efforts to elucidate the function of enhancers in vivo are underway but their vast numbers alongside differing enhancer architectures make it difficult to determine their impact on gene activity. By systematically annotating multiple mouse tissues with super- and typical-enhancers, we have explored their relationship with gene function and phenotype. RESULTS: Though super-enhancers drive high total- and tissue-specific expression of their associated genes, we find that typical-enhancers also contribute heavily to the tissue-specific expression landscape on account of their large numbers in the genome. Unexpectedly, we demonstrate that both enhancer types are preferentially associated with relevant 'tissue-type' phenotypes and exhibit no difference in phenotype effect size or pleiotropy. Modelling regulatory data alongside molecular data, we built a predictive model to infer gene-phenotype associations and use this model to predict potentially novel disease-associated genes. CONCLUSION: Overall our findings reveal that differing enhancer architectures have a similar impact on mammalian phenotypes whilst harbouring differing cellular and expression effects. Together, our results systematically characterise enhancers with predicted phenotypic traits endorsing the role for both types of enhancers in human disease and disorders.


Subjects
Enhancer Elements, Genetic; Animals; Enhancer Elements, Genetic/genetics; Humans; Mice; Phenotype
8.
Nucleic Acids Res ; 46(D1): D252-D259, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140464

ABSTRACT

We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.


Subjects
Databases, Genetic; Transcription Factors/metabolism; Animals; Binding Sites/genetics; Chromatin Immunoprecipitation; Humans; Mice; Models, Genetic; Nucleotide Motifs; Sequence Analysis, DNA
9.
Int J Mol Sci ; 21(8), 2020 Apr 14.
Article in English | MEDLINE | ID: mdl-32295185

ABSTRACT

Accumulation of lipid-laden (foam) cells in the arterial wall is known to be the earliest step in the pathogenesis of atherosclerosis. There is almost no doubt that atherogenic modified low-density lipoproteins (LDL) are the main sources of accumulating lipids in foam cells. Atherogenic modified LDL are taken up by arterial cells, such as macrophages, pericytes, and smooth muscle cells in an unregulated manner bypassing the LDL receptor. The present study was conducted to reveal possible common mechanisms in the interaction of macrophages with associates of modified LDL and non-lipid latex particles of a similar size. To determine regulatory pathways that are potentially responsible for cholesterol accumulation in human macrophages after the exposure to naturally occurring atherogenic or artificially modified LDL, we used transcriptome analysis. Previous studies of our group demonstrated that any type of LDL modification facilitates the self-association of lipoprotein particles. The size of such self-associates hinders their interaction with a specific LDL receptor. As a result, self-associates are taken up by nonspecific phagocytosis bypassing the LDL receptor. That is why we used latex beads as a stimulator of macrophage phagocytotic activity. We revealed at least 12 signaling pathways that were regulated by the interaction of macrophages with the multiple-modified atherogenic naturally occurring LDL and with latex beads in a similar manner. Therefore, modified LDL was shown to stimulate phagocytosis through the upregulation of certain genes. We have identified at least three genes (F2RL1, EIF2AK3, and IL15) encoding inflammatory molecules and associated with signaling pathways that were upregulated in response to the interaction of modified LDL with macrophages. Knockdown of two of these genes, EIF2AK3 and IL15, completely suppressed cholesterol accumulation in macrophages. Correspondingly, the upregulation of EIF2AK3 and IL15 promoted cholesterol accumulation. 
These data confirmed our hypothesis of the following chain of events in atherosclerosis: LDL particles undergo atherogenic modification; this is accompanied by the formation of self-associates; large LDL associates stimulate phagocytosis; as a result of phagocytosis stimulation, pro-inflammatory molecules are secreted; these molecules cause or at least contribute to the accumulation of intracellular cholesterol. This chain of events may explain the relationship between cholesterol accumulation and inflammation. The primary sequence of events in this chain is related to inflammatory response rather than cholesterol accumulation.


Subjects
Cholesterol/metabolism; Foam Cells/metabolism; Lipid Metabolism; Signal Transduction; Biomarkers; Disease Susceptibility; Foam Cells/pathology; Gene Expression Profiling; Humans; Inflammation/etiology; Inflammation/metabolism; Inflammation/pathology; Inflammation Mediators/metabolism; Macrophages/metabolism; Macrophages/pathology; Models, Biological
11.
Exp Mol Pathol ; 105(2): 202-207, 2018 10.
Article in English | MEDLINE | ID: mdl-30118702

ABSTRACT

High-density lipoproteins (HDL) are key components of the reverse cholesterol transport pathway. HDL removes excess cholesterol from peripheral cells, including macrophages, protecting them from cholesterol accumulation and conversion into foam cells, which is a key event in the pathogenesis of atherosclerosis. The mechanism by which HDL stimulates cellular cholesterol efflux involves interaction with the ABCA1 lipid transporter and the ensuing transfer of cholesterol to HDL particles. In this study, we looked for additional proteins contributing to HDL-dependent cholesterol efflux. Using RNAseq, we analyzed mRNAs induced by HDL in human monocyte-derived macrophages and identified three genes, fatty acid desaturase 1 (FADS1), insulin induced gene 1 (INSIG1), and the low-density lipoprotein receptor (LDLR), whose expression was significantly upregulated by HDL. We individually knocked down these genes in THP-1 cells using siRNA-mediated gene silencing and measured cellular cholesterol efflux to HDL. Knockdown of FADS1 did not significantly change cholesterol efflux (p = 0.70), but knockdown of INSIG1 and LDLR resulted in a highly significant reduction of the efflux to HDL (67% and 75% of control, respectively, p < 0.001). Importantly, the suppression of cholesterol efflux was independent of the known effects of these genes on cellular cholesterol content, as cells were loaded with cholesterol using acetylated LDL. These results indicate that HDL particles stimulate expression of genes that enhance cellular cholesterol transfer to HDL.


Subjects
Cholesterol, HDL/genetics; Macrophages/physiology; ATP Binding Cassette Transporter 1/genetics; Atherosclerosis/physiopathology; Biological Transport; Cholesterol; Cholesterol, HDL/metabolism; Delta-5 Fatty Acid Desaturase; Fatty Acid Desaturases/genetics; Fatty Acid Desaturases/metabolism; Foam Cells; Gene Expression Profiling; Gene Expression Regulation/genetics; Gene Silencing; Humans; Intracellular Signaling Peptides and Proteins/genetics; Intracellular Signaling Peptides and Proteins/metabolism; Lipoproteins, HDL/genetics; Lipoproteins, HDL/metabolism; Macrophages/metabolism; Membrane Proteins/genetics; Membrane Proteins/metabolism; RNA, Messenger; RNA, Small Interfering; Receptors, LDL/genetics; Receptors, LDL/metabolism; THP-1 Cells; Up-Regulation
12.
Nucleic Acids Res ; 44(D1): D116-25, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26586801

ABSTRACT

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models, covering 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
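
To make the roles of a PWM and a score threshold concrete, the sketch below scans a sequence with a mononucleotide PWM and reports windows scoring at or above a cutoff. It is a minimal illustration, not the HOCOMOCO command-line tools: the matrix values, the threshold, and the function names are invented for the example, and only the forward strand is scanned.

```python
# Toy log-odds PWM, one dict per motif position (values are made up).
TOY_PWM = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.1, "C": 1.0, "G": -0.7, "T": -0.4},
    {"A": -0.9, "C": -0.6, "G": 1.3, "T": -1.2},
    {"A": -1.0, "C": -0.9, "G": -0.3, "T": 1.1},
]


def score_window(pwm, window):
    """Sum the per-position weights for a window as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (offset, score) for each forward-strand window at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = score_window(pwm, window)
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return hits


# Hypothetical threshold; real collections ship thresholds tied to P-values.
hits = scan(TOY_PWM, "TTACGTAA", threshold=4.0)
```

A complete scanner would also score the reverse complement of each window; a dinucleotide PWM would index weights by adjacent base pairs instead of single bases.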


Subjects
Databases, Genetic; Regulatory Elements, Transcriptional; Transcription Factors/metabolism; Animals; Binding Sites; Chromatin Immunoprecipitation; Humans; Mice; Models, Biological; Sequence Analysis, DNA
13.
BMC Genomics ; 17 Suppl 2: 395, 2016 06 23.
Article in English | MEDLINE | ID: mdl-27356864

ABSTRACT

BACKGROUND: Somatic mutations in cancer cells affect various genomic elements, disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and, consequently, expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer. Cancer somatic mutations in binding sites of selected transcription factors have been found to be under positive selection. However, the action and significance of negative selection in non-coding regions remain controversial. RESULTS: Here we present an analysis of transcription factor binding motifs co-localized with non-coding variants. To avoid statistical bias, we account for mutation signatures of different cancer types. For many transcription factors, including multiple members of the FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations that increase or decrease the affinity of predicted binding sites than expected by chance. Such stability of binding motifs is even more pronounced in DNase-accessible regions. CONCLUSIONS: Our data demonstrate negative selection against binding-site alterations and suggest that such selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors with conserved binding motifs can reveal cell regulatory pathways crucial for the survivability of various human cancers.


Subjects
DNA/metabolism; Mutation; Neoplasms/genetics; Transcription Factors/metabolism; Binding Sites; DNA/chemistry; DNA/genetics; Humans; Neoplasms/metabolism; Promoter Regions, Genetic; Protein Binding; Selection, Genetic; Transcription Factors/chemistry
14.
Exp Mol Pathol ; 99(1): 151-4, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26107006

ABSTRACT

Macrophages play an important role in the pathogenesis of atherosclerosis, including the early pre-clinical stages of the disease development. We have explored the possibility that the disease onset could be associated with altered monocyte/macrophage response to activating pro- and anti-inflammatory stimuli. We evaluated the susceptibility of circulating monocytes from healthy individuals and patients with asymptomatic carotid atherosclerosis to M1 and M2 activation. The obtained data indicated the existence of a remarkable individual difference in susceptibility to activation among monocytes isolated from the blood of different subjects, regardless of the presence or absence of atherosclerosis. The identified differences in susceptibility to activation between monocytes may explain the individual peculiarities of the immune response in different subjects.


Subjects
Carotid Artery Diseases/immunology; Monocytes/cytology; Monocytes/immunology; CD4-Positive T-Lymphocytes/cytology; CD4-Positive T-Lymphocytes/immunology; Carotid Artery Diseases/pathology; Carotid Intima-Media Thickness; Chemokines, CC/genetics; Chemokines, CC/metabolism; Cross-Sectional Studies; Disease Progression; Humans; Immunity, Innate/immunology; Macrophages; Monocytes/metabolism; Tumor Necrosis Factor-alpha/genetics; Tumor Necrosis Factor-alpha/metabolism
15.
Microb Ecol ; 70(3): 819-34, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25894918

ABSTRACT

In this study, we report the first completely annotated genome sequence of a Bifidobacterium longum subsp. longum strain of Russian origin, GT15. Comparative genomic analysis of this genome with other available completely annotated genome sequences of B. longum strains isolated in other countries revealed a high degree of conservation and synteny across the entire genomes. However, the open reading frames of 35 genes were detected only in the B. longum GT15 genome and were absent from the genomes of other B. longum strains (not of Russian origin). These so-called unique genes (UGs) have a total length of 39,066 bp, with G + C content ranging from 37 to 65%. Interestingly, certain genes were detected in other B. longum strains of Russian origin. In our analysis, we examined genes for global regulatory systems: proteins of type II toxin-antitoxin (TA) systems, eukaryotic-type serine/threonine protein kinases (STPKs), and genes of the WhiB-like protein family. In addition, we performed an in silico analysis of the most significant probiotic genes and considered genes involved in epigenetic regulation and genes responsible for producing various neuromediators. This genome sequence may elucidate the biology of this probiotic strain as a promising candidate for practical (pharmaceutical) applications.


Subjects
Bifidobacterium/genetics; Chromosomes, Bacterial/genetics; Genome, Bacterial; Bifidobacterium/metabolism; Chromosome Mapping; Chromosomes, Bacterial/metabolism; Epigenesis, Genetic; Molecular Sequence Data; Phylogeny; Russia; Sequence Analysis, DNA
16.
Nucleic Acids Res ; 41(Database issue): D195-202, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23175603

ABSTRACT

Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integrating TFBS data from various types of experiments into a single model typically improves model quality, probably owing to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.


Subjects
Databases, Genetic; Regulatory Elements, Transcriptional; Transcription Factors/metabolism; Binding Sites; Humans; Internet; Models, Genetic; Position-Specific Scoring Matrices
17.
BMC Genomics ; 15: 80, 2014 Jan 29.
Article in English | MEDLINE | ID: mdl-24472686

ABSTRACT

BACKGROUND: ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TF), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold to avoid recognition of too many false-positive BSs, and to compare the actual performance of different models. RESULTS: Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated affinity of 64 predicted FoxA BSs using EMSA that allows safely distinguishing sequences able to bind TF. As a result we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. It was found that the performance of conventional position weight matrix (PWM) models was inferior with the highest false positive rate. On the contrary, the best recognition efficiency was achieved by the combination of SiteGA & diChIPMunk/ChIPMunk models, properly identifying FoxA BSs in up to 90% of loci for both mouse and human ChIP-Seq datasets. CONCLUSIONS: The experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS-recognition in ChIP-Seq data analysis. Regarding ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. 
A combination of models from different principles allowed identification of proper TFBSs.


Subjects
Chromatin Immunoprecipitation; Computational Biology; Transcription Factors/metabolism; Animals; Binding Sites; Mice
18.
Nucleic Acids Res ; 40(12): e93, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method, CORECLUST (COnservative REgulatory CLUster STructure), that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may subsequently be used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.


Subjects
Gene Expression Regulation; Regulatory Elements, Transcriptional; Sequence Analysis, DNA; Algorithms; Animals; Body Patterning/genetics; Drosophila/embryology; Drosophila/genetics; Drosophila/metabolism; Enhancer Elements, Genetic; Gene Expression Regulation, Developmental; Muscles/metabolism; Position-Specific Scoring Matrices; Software
19.
PLoS One ; 19(5): e0295971, 2024.
Article in English | MEDLINE | ID: mdl-38709794

ABSTRACT

The human genome is pervasively transcribed and produces a wide variety of long non-coding RNAs (lncRNAs), constituting the majority of transcripts across human cell types. Some specific nuclear lncRNAs have been shown to be important regulatory components acting locally. As RNA-chromatin interaction and Hi-C chromatin conformation data showed that chromatin interactions of nuclear lncRNAs are determined by the local chromatin 3D conformation, we used Hi-C data to identify potential target genes of lncRNAs. RNA-protein interaction data suggested that nuclear lncRNAs act as scaffolds to recruit regulatory proteins to target promoters and enhancers. Nuclear lncRNAs may therefore play a role in directing regulatory factors to locations spatially close to the lncRNA gene. We provide the analysis results through an interactive visualization web portal at https://fantom.gsc.riken.jp/zenbu/reports/#F6_3D_lncRNA.


Subjects
Chromatin; RNA, Long Noncoding; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; Chromatin/metabolism; Chromatin/genetics; Humans; Molecular Sequence Annotation; Cell Nucleus/metabolism; Cell Nucleus/genetics; Genome, Human; Promoter Regions, Genetic
20.
PLoS Comput Biol ; 8(5): e1002529, 2012 May.
Article in English | MEDLINE | ID: mdl-22693437

ABSTRACT

We have created a statistically grounded tool for determining the correlation of genome-wide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets rather than to provide immediate answers. The software enables several biologically motivated approaches to these data, and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and, instead of running analyses with predefined metrics, computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. AVAILABILITY AND IMPLEMENTATION: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the SourceForge repository. The package is pending submission to Bioconductor.
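
One simple way to quantify spatial association between two interval sets is the Jaccard index: the length covered by both sets divided by the length covered by either. The sketch below is an illustrative Python example of that single measure on one chromosome with half-open intervals; it is not the GenometriCorr R API, and the function names and coordinates are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals):
    """Total number of positions covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(query, reference):
    """Jaccard index of two interval sets: shared length over union length."""
    union = covered_length(list(query) + list(reference))
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    intersection = covered_length(query) + covered_length(reference) - union
    return intersection / union


# Hypothetical coordinates on a single chromosome.
peaks = [(0, 10), (20, 30)]
features = [(5, 25)]
overlap_score = jaccard(peaks, features)
```

Assessing significance would additionally require a null model, for example comparing the observed value against values obtained after randomly shifting or permuting one interval set.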


Subjects
Databases, Genetic; Genomics/methods; Information Storage and Retrieval; Models, Genetic; Models, Statistical; Software; Animals; Chromosomes; Epigenomics; Genetic Loci; Genome; Humans; Internet; RNA, Transfer/genetics; Statistics, Nonparametric; User-Computer Interface