1.
Nucleic Acids Res ; 51(W1): W379-W386, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37166953

ABSTRACT

MiniPromoters, or compact promoters, are short DNA sequences that can drive expression in specific cells and tissues. While broadly useful, they are of high relevance to gene therapy due to their role in enabling precise control of where a therapeutic gene will be expressed. Here, we present OnTarget (http://ontarget.cmmt.ubc.ca), a webserver that streamlines the MiniPromoter design process. Users only need to specify a gene of interest or custom genomic coordinates on which to focus the identification of promoters and enhancers, and can also provide relevant cell-type-specific genomic evidence (e.g. accessible chromatin regions or histone modifications). OnTarget combines the provided data with internal data to identify candidate promoters and enhancers and design MiniPromoters. To illustrate the utility of OnTarget, we designed and characterized two MiniPromoters targeting different cell populations relevant to Parkinson Disease.


Subject(s)
Computational Biology , Computer Simulation , Promoter Regions, Genetic , Software , Enhancer Elements, Genetic/genetics , Genome , Genomics , Promoter Regions, Genetic/genetics , Internet , Computational Biology/instrumentation , Computational Biology/methods
2.
Nucleic Acids Res ; 46(D1): D260-D266, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140473

ABSTRACT

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.


Subject(s)
Databases, Genetic , Transcription Factors/metabolism , Animals , Binding Sites/genetics , Genomics , Humans , Internet , Plants/genetics , Plants/metabolism , Position-Specific Scoring Matrices , Protein Binding/genetics , User-Computer Interface , Vertebrates/genetics , Vertebrates/metabolism
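
The JASPAR 2018 abstract above describes binding profiles stored as position frequency matrices (PFMs) and used to predict TF-binding sites. As a rough illustration of how a PFM is commonly turned into a log-odds position weight matrix and used to score a candidate site, here is a minimal Python sketch. The pseudocount of 0.8, the uniform background, the toy counts and the function names are all assumptions made for illustration; none of this is taken from the JASPAR code base or its API.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.8, background=None):
    """Convert a position frequency matrix into a log-odds weight matrix.

    pfm maps each base to a list of counts, one entry per motif position.
    The pseudocount and uniform background are illustrative assumptions.
    """
    background = background or {b: 0.25 for b in BASES}
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # Smoothed relative frequency of base b at position i.
            freq = (pfm[b][i] + pseudocount * background[b]) / (total + pseudocount)
            pwm[b].append(math.log2(freq / background[b]))
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds weights for a candidate site."""
    return sum(pwm[base][i] for i, base in enumerate(site))


# Hypothetical 4-bp motif; the counts are invented for the example.
toy_pfm = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}
toy_pwm = pfm_to_pwm(toy_pfm)
```

With this toy matrix, `score(toy_pwm, "AGCT")` is higher than the score of any other 4-mer, because each position of "AGCT" carries the most frequent base.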
3.
Nucleic Acids Res ; 45(D1): D737-D743, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794045

ABSTRACT

Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and the functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and that the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.


Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Genomics/methods , Mammals/genetics , Software , Web Browser , Animals , Computational Biology , Humans , Search Engine
4.
Nucleic Acids Res ; 44(D1): D110-5, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26531826

ABSTRACT

JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.


Subject(s)
Databases, Genetic , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Animals , Binding Sites , DNA-Binding Proteins/chemistry , Protein Structure, Tertiary , Software , Transcription Factors/chemistry
6.
Bioinformatics ; 32(18): 2858-60, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27334471

ABSTRACT

With the emergence of large-scale Cap Analysis of Gene Expression (CAGE) datasets from individual labs and the FANTOM consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived genomic regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The webtool allows for the analysis of CAGE-derived transcription start sites (TSSs) either provided by the user or selected from ∼1300 mammalian samples from the FANTOM5 project with pre-computed TFBSs predicted with JASPAR TF binding profiles. The tool helps power insights into the regulation of genes through the study of the specific usage of TSSs within specific cell types and/or under specific conditions. AVAILABILITY AND IMPLEMENTATION: The CAGEd-oPOSSUM web tool is implemented in Perl, MySQL and Apache and is available at http://cagedop.cmmt.ubc.ca/CAGEd_oPOSSUM. CONTACTS: anthony.mathelier@ncmm.uio.no or wyeth@cmmt.ubc.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Nucleotide Motifs , Software , Transcription Initiation Site , Binding Sites , High-Throughput Nucleotide Sequencing , Humans , Regulatory Sequences, Nucleic Acid , Transcription Factors
7.
Nucleic Acids Res ; 42(Database issue): D142-7, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24194598

ABSTRACT

JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR, the JASPAR CORE subcollection (which contains curated, non-redundant profiles), with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods.


Subject(s)
Databases, Genetic , Regulatory Elements, Transcriptional , Transcription Factors/metabolism , Animals , Arabidopsis/genetics , Binding Sites , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Humans , Internet , Mice , Position-Specific Scoring Matrices
8.
Hum Mutat ; 36(4): 432-8, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25703386

ABSTRACT

Advances in next-generation sequencing (NGS) technologies have helped reveal causal variants for genetic diseases. In order to establish causality, it is often necessary to compare genomes of unrelated individuals with similar disease phenotypes to identify common disrupted genes. When working with cases of rare genetic disorders, finding similar individuals can be extremely difficult. We introduce a web tool, GeneYenta, which facilitates the matchmaking process, allowing clinicians to coordinate detailed comparisons for phenotypically similar cases. Importantly, the system is focused on phenotype annotation, with explicit limitations on highly confidential data that create barriers to participation. The procedure for matching of patient phenotypes, inspired by online dating services, uses an ontology-based semantic case matching algorithm with attribute weighting. We evaluate the capacity of the system using a curated reference data set and 19 clinician entered cases comparing four matching algorithms. We find that the inclusion of clinician weights can augment phenotype matching.


Subject(s)
Databases, Genetic , Genetic Association Studies/methods , Phenotype , Rare Diseases/diagnosis , Rare Diseases/genetics , Software , Algorithms , Computational Biology/methods , Exome , Gene Ontology , High-Throughput Nucleotide Sequencing , Humans , Internet
9.
BMC Genomics ; 16: 545, 2015 Jul 24.
Article in English | MEDLINE | ID: mdl-26204903

ABSTRACT

BACKGROUND: Nr2e1 (nuclear receptor subfamily 2, group e, member 1) encodes a transcription factor important in neocortex development. Previous work has shown that nuclear receptors can have hundreds of target genes, and bind more than 300 co-interacting proteins. However, recognition of the critical role of Nr2e1 in neural stem cells and neocortex development is relatively recent, thus the molecular mechanisms involved for this nuclear receptor are only beginning to be understood. Serial analysis of gene expression (SAGE) has given researchers both qualitative and quantitative information pertaining to biological processes. Thus, in this work, six LongSAGE mouse libraries were generated from laser microdissected tissue samples of dorsal VZ/SVZ (ventricular zone and subventricular zone) from the telencephalon of wild-type (Wt) and Nr2e1-null embryos at the critical developmental ages E13.5, E15.5, and E17.5. We then used a novel approach, implementing multiple computational methods followed by biological validation to further our understanding of Nr2e1 in neocortex development. RESULTS: In this work, we have generated a list of 1279 genes that are differentially expressed in response to altered Nr2e1 expression during in vivo neocortex development. We have refined this list to 64 candidate direct-targets of NR2E1. Our data suggested distinct roles for Nr2e1 during different neocortex developmental stages. Most importantly, our results suggest a possible novel pathway by which Nr2e1 regulates neurogenesis, which includes Lhx2 as one of the candidate direct-target genes, and SOX9 as a co-interactor. CONCLUSIONS: We have provided new candidate interacting partners and numerous well-developed testable hypotheses for understanding the pathways by which Nr2e1 functions to regulate neocortex development.


Subject(s)
Neocortex/growth & development , Neurogenesis , Receptors, Cytoplasmic and Nuclear/biosynthesis , Transcription Factors/genetics , Animals , Binding Sites , Gene Expression Regulation, Developmental , Laser Capture Microdissection , Mice , Neocortex/metabolism , Receptors, Cytoplasmic and Nuclear/genetics
11.
Proc Natl Acad Sci U S A ; 107(38): 16589-94, 2010 Sep 21.
Article in English | MEDLINE | ID: mdl-20807748

ABSTRACT

The Pleiades Promoter Project integrates genomewide bioinformatics with large-scale knockin mouse production and histological examination of expression patterns to develop MiniPromoters and related tools designed to study and treat the brain by directed gene expression. Genes with brain expression patterns of interest are subjected to bioinformatic analysis to delineate candidate regulatory regions, which are then incorporated into a panel of compact human MiniPromoters to drive expression to brain regions and cell types of interest. Using single-copy, homologous-recombination "knockins" in embryonic stem cells, each MiniPromoter reporter is integrated immediately 5' of the Hprt locus in the mouse genome. MiniPromoter expression profiles are characterized in differentiation assays of the transgenic cells or in mouse brains following transgenic mouse production. Histological examination of adult brains, eyes, and spinal cords for reporter gene activity is coupled to costaining with cell-type-specific markers to define expression. The publicly available Pleiades MiniPromoter Project is a key resource to facilitate research on brain development and therapies.


Subject(s)
Brain/metabolism , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Animals , Cell Differentiation/genetics , Computational Biology , Databases, Genetic , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , Gene Expression , Gene Expression Profiling/statistics & numerical data , Gene Knock-In Techniques , Genes, Reporter , Genomics , Humans , Mice , Mice, Transgenic , Neurons/cytology , Neurons/metabolism
12.
PLoS Comput Biol ; 7(12): e1002256, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22144875

ABSTRACT

We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs were validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions.


Subject(s)
Computational Biology/methods , Models, Genetic , Muscle, Skeletal/physiology , Regulatory Sequences, Nucleic Acid , Animals , Base Composition , Chromatin Immunoprecipitation , Computer Simulation , Conserved Sequence , Genome , Histones/genetics , Humans , Mice , Models, Statistical , Muscle Fibers, Skeletal/physiology , MyoD Protein/genetics , NIH 3T3 Cells , Phylogeny , Reproducibility of Results , Sequence Analysis, DNA
13.
Nucleic Acids Res ; 38(Database issue): D105-10, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19906716

ABSTRACT

JASPAR (http://jaspar.genereg.net) is the leading open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. Its fourth major release is the largest expansion of the core database to date: the database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. The introduction of a yeast division brings the convenience of JASPAR to an active research community. As binding models are refined by newer data, the JASPAR database now uses versioning of matrices: in this release, 12% of the older models were updated to improved versions. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family. The changes in the database prepare the system for more rapid acquisition of new high-throughput data sources. Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Transcription Factors/chemistry , Access to Information , Algorithms , Animals , Chromatin Immunoprecipitation , Computational Biology/trends , Databases, Protein , Fungal Proteins/chemistry , Genome , Humans , Information Storage and Retrieval/methods , Protein Binding , Software
14.
Nucleic Acids Res ; 38(17): 5718-34, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20460467

ABSTRACT

The Nrf2 (nuclear factor E2 p45-related factor 2) transcription factor responds to diverse oxidative and electrophilic environmental stresses by circumventing repression by Keap1, translocating to the nucleus, and activating cytoprotective genes. Nrf2 responses provide protection against chemical carcinogenesis, chronic inflammation, neurodegeneration, emphysema, asthma and sepsis in murine models. Nrf2 regulates the expression of a plethora of genes that detoxify oxidants and electrophiles and repair or remove damaged macromolecules, such as through proteasomal processing. However, many direct targets of Nrf2 remain undefined. Here, mouse embryonic fibroblasts (MEF) with either constitutive nuclear accumulation (Keap1(-/-)) or depletion (Nrf2(-/-)) of Nrf2 were utilized to perform chromatin-immunoprecipitation with parallel sequencing (ChIP-Seq) and global transcription profiling. This unique Nrf2 ChIP-Seq dataset is highly enriched for Nrf2-binding motifs. Integrating ChIP-Seq and microarray analyses, we identified 645 basal and 654 inducible direct targets of Nrf2, with 244 genes at the intersection. Modulated pathways in stress response and cell proliferation distinguish the inducible and basal programs. Results were confirmed in an in vivo stress model of cigarette smoke-exposed mice. This study reveals global circuitry of the Nrf2 stress response emphasizing Nrf2 as a central node in cell survival response.


Subject(s)
Gene Regulatory Networks , NF-E2-Related Factor 2/metabolism , Regulatory Elements, Transcriptional , Animals , Antioxidants/metabolism , Binding Sites , Cell Cycle , Cell Proliferation , Cell Survival , Chromatin Immunoprecipitation , Gene Expression Profiling , Male , Mice , Mice, Knockout , NF-E2-Related Factor 2/genetics , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA , Transcription, Genetic , Xenobiotics/metabolism
15.
BMC Bioinformatics ; 12: 67, 2011 Mar 04.
Article in English | MEDLINE | ID: mdl-21375730

ABSTRACT

BACKGROUND: To understand biological processes and diseases, it is crucial to unravel the concerted interplay of transcription factors (TFs), microRNAs (miRNAs) and their targets within regulatory networks and fundamental sub-networks. An integrative computational resource generating a comprehensive view of these regulatory molecular interactions at a genome-wide scale would be of great interest to biologists, but is not available to date. RESULTS: To identify and analyze molecular interaction networks, we developed MIR@NT@N, an integrative approach based on a meta-regulation network model and a large-scale database. MIR@NT@N uses a graph-based approach to predict novel molecular actors across multiple regulatory processes (i.e. TFs acting on protein-coding or miRNA genes, or miRNAs acting on messenger RNAs). Exploiting these predictions, the user can generate networks and further analyze them to identify sub-networks, including motifs such as feedback and feedforward loops (FBL and FFL). In addition, networks can be built from lists of molecular actors with an a priori role in a given biological process to predict novel and unanticipated interactions. Analyses can be contextualized and filtered by integrating additional information such as microarray expression data. All results, including generated graphs, can be visualized, saved and exported into various formats. The performance of MIR@NT@N has been evaluated using published data, and the tool was then applied to the regulatory program underlying epithelium to mesenchyme transition (EMT), an evolutionarily conserved process which is implicated in embryonic development and disease. CONCLUSIONS: MIR@NT@N is an effective computational approach to identify novel molecular regulations and to predict gene regulatory networks and sub-networks including conserved motifs within a given biological context. Taking advantage of the M@IA environment, MIR@NT@N is a user-friendly web resource freely available at http://mironton.uni.lu which will be updated on a regular basis.


Subject(s)
Databases, Genetic , Gene Regulatory Networks , MicroRNAs/genetics , Transcription Factors/genetics , Amino Acid Motifs/genetics , Computational Biology/methods , Gene Expression Regulation , Humans , Internet , MicroRNAs/metabolism , RNA, Messenger/genetics , Transcription Factors/metabolism
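
The MIR@NT@N abstract above mentions identifying network motifs such as feedforward loops (FFLs) in a graph of TF and miRNA regulations. The abstract does not describe the algorithm, so the following Python sketch only shows one simple way such a motif could be enumerated in a directed edge list: a regulator X acts on Y, and both X and Y act on a shared target Z. The function name, the edge representation and the node names are hypothetical.

```python
def find_feedforward_loops(edges):
    """Enumerate feed-forward loops X -> Y, X -> Z, Y -> Z.

    edges is an iterable of (regulator, target) pairs. This is an
    illustrative brute-force search, not the MIR@NT@N implementation.
    """
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    loops = set()
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-regulation
            # Z must be regulated by both X and Y.
            for z in targets.get(y, set()) & x_targets:
                if z not in (x, y):
                    loops.add((x, y, z))
    return sorted(loops)


# Hypothetical toy network: a TF regulates a miRNA, and both act on GeneX.
toy_edges = [
    ("TF1", "miR-a"),
    ("TF1", "GeneX"),
    ("miR-a", "GeneX"),
    ("TF2", "GeneY"),
]
toy_loops = find_feedforward_loops(toy_edges)
```

In the toy network the only loop found is the triple TF1, miR-a, GeneX; the isolated TF2 to GeneY edge does not form a motif.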
16.
Nucleic Acids Res ; 37(Database issue): D54-60, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18971253

ABSTRACT

The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein-DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data 'boutiques' within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk.


Subject(s)
Databases, Genetic , Gene Expression Regulation , Regulatory Elements, Transcriptional , Software , Transcription Factors/metabolism , Base Sequence , Binding Sites , Conserved Sequence , Sequence Alignment , Sequence Analysis, DNA
17.
PLoS Comput Biol ; 4(1): e5, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18208319

ABSTRACT

Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.


Subject(s)
Genetic Variation/genetics , Polymorphism, Single Nucleotide/genetics , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Software , Transcription Factors/genetics , Algorithms , Binding Sites , Internet , Protein Binding
18.
Nucleic Acids Res ; 35(Web Server issue): W245-52, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17576675

ABSTRACT

The identification of over-represented transcription factor binding sites from sets of co-expressed genes provides insights into the mechanisms of regulation for diverse biological contexts. oPOSSUM, an internet-based system for such studies of regulation, has been improved and expanded in this new release. New features include a worm-specific version for investigating binding sites conserved between Caenorhabditis elegans and C. briggsae, as well as a yeast-specific version for the analysis of co-expressed sets of Saccharomyces cerevisiae genes. The human and mouse applications feature improvements in ortholog mapping, sequence alignments and the delineation of multiple alternative promoters. oPOSSUM2, introduced for the analysis of over-represented combinations of motifs in human and mouse genes, has been integrated with the original oPOSSUM system. Analysis using user-defined background gene sets is now supported. The transcription factor binding site models have been updated to include new profiles from the JASPAR database. oPOSSUM is available at http://www.cisreg.ca/oPOSSUM/


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Gene Expression Profiling , Gene Expression Regulation , Promoter Regions, Genetic , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , Caenorhabditis elegans/genetics , Humans , Internet , Mice , NF-kappa B/metabolism , Saccharomyces cerevisiae/genetics
19.
Sci Data ; 5: 180141, 2018 07 24.
Article in English | MEDLINE | ID: mdl-30040077

ABSTRACT

Interpreting the functional impact of noncoding variants is an ongoing challenge in the field of genome analysis. With most noncoding variants associated with complex traits and disease residing in regulatory regions, altered transcription factor (TF) binding has been proposed as a mechanism of action. It is therefore imperative to develop methods that predict the impact of noncoding variants at TF binding sites (TFBSs). Here, we describe the update of our MANTA database that stores: 1) TFBS predictions in the human genome, and 2) the potential impact on TF binding for all possible single nucleotide variants (SNVs) at these TFBSs. TFBSs were predicted by combining experimental ChIP-seq data from ReMap and computational position weight matrices (PWMs) derived from JASPAR. Impact of SNVs at these TFBSs was assessed by means of PWM scores computed on the alternate alleles. The updated database, MANTA2, provides the scientific community with a critical map of TFBSs and SNV impact scores to improve the interpretation of noncoding variants in the human genome.


Subject(s)
Genome, Human , Transcription Factors , Binding Sites , Databases, Genetic , Humans , Nucleotide Motifs , Transcription Factors/genetics , Transcription Factors/metabolism
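
The MANTA2 abstract above states that SNV impact was assessed by means of PWM scores computed on the alternate alleles. A minimal Python sketch of that idea follows: score the reference site, substitute each alternate base at each position, and report the change in score. The toy weight matrix, the site and the function names are assumptions for illustration, and the actual MANTA2 scoring and normalization may differ.

```python
def score_site(pwm, site):
    """Sum the per-position weights of a site under a PWM."""
    return sum(pwm[base][pos] for pos, base in enumerate(site))


def snv_impacts(pwm, ref_site):
    """Map (position, alt_base) to alt_score - ref_score for every possible SNV."""
    ref_score = score_site(pwm, ref_site)
    impacts = {}
    for pos, ref_base in enumerate(ref_site):
        for alt in "ACGT":
            if alt == ref_base:
                continue
            alt_site = ref_site[:pos] + alt + ref_site[pos + 1:]
            impacts[(pos, alt)] = score_site(pwm, alt_site) - ref_score
    return impacts


# Hypothetical 3-bp log-odds matrix; the values are invented for the example.
TOY_PWM = {
    "A": [1.5, -2.0, -2.0],
    "C": [-2.0, -2.0, 1.2],
    "G": [-1.0, 1.8, -2.0],
    "T": [-1.0, -2.0, 0.1],
}
toy_impacts = snv_impacts(TOY_PWM, "AGC")
```

A negative value means the alternate allele scores lower than the reference under this matrix. Because "AGC" is the top-scoring site of the toy matrix, all nine possible SNVs have a negative impact here.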
20.
Nucleic Acids Res ; 33(10): 3154-64, 2005.
Article in English | MEDLINE | ID: mdl-15933209

ABSTRACT

Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes.


Subject(s)
Databases, Nucleic Acid , Gene Expression Profiling , Gene Expression Regulation , Promoter Regions, Genetic , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , Humans , Internet , Mice , NF-kappa B/metabolism , Oligonucleotide Array Sequence Analysis
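
The oPOSSUM abstract above refers to statistical methods for identifying TFBSs over-represented in a set of co-expressed genes, without naming them. One common choice for this kind of question is a one-sided Fisher exact test on gene counts, sketched below in Python using only the standard library. The choice of test, the function name and the example counts are assumptions for illustration and should not be read as the oPOSSUM implementation.

```python
from math import comb


def fisher_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher exact P-value for over-representation.

    hits_fg / n_fg: co-expressed (foreground) genes with a site / total.
    hits_bg / n_bg: background genes with a site / total.
    Returns P(X >= hits_fg) under the hypergeometric null.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denom = comb(total, n_fg)
    tail = 0
    for k in range(hits_fg, min(n_fg, total_hits) + 1):
        # Ways to draw k genes with a site and n_fg - k genes without one.
        tail += comb(total_hits, k) * comb(total - total_hits, n_fg - k)
    return tail / denom


# Hypothetical counts: 8 of 10 co-expressed genes carry the site,
# compared with 20 of 100 background genes.
p_enriched = fisher_enrichment_p(8, 10, 20, 100)
```

A small P-value suggests the site occurs in more foreground genes than expected by chance; in practice a threshold would be chosen empirically and corrected for the number of TF profiles tested.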