1.
Genome Res ; 34(4): 620-632, 2024 May 15.
Article En | MEDLINE | ID: mdl-38631728

Differential gene expression in response to perturbations is mediated at least in part by changes in binding of transcription factors (TFs) and other proteins at specific genomic regions. Association of these cis-regulatory elements (CREs) with their target genes is a challenging task that is essential to address many biological and mechanistic questions. Many current approaches rely on chromatin conformation capture techniques or single-cell correlational methods to establish CRE-to-gene associations. These methods can be effective but have limitations, including resolution, gaps in detectable association distances, and cost. As an alternative, we have developed DegCre, a nonparametric method that evaluates correlations between measurements of perturbation-induced differential gene expression and differential regulatory signal at CREs to score possible CRE-to-gene associations. It has several unique features, including the ability to use any type of CRE activity measurement, yield probabilistic scores for CRE-to-gene pairs, and assess CRE-to-gene pairings across a wide range of sequence distances. We apply DegCre to six data sets, each using different perturbations and containing a variety of regulatory signal measurements, including chromatin openness, histone modifications, and TF occupancy. To test their efficacy, we compare DegCre associations to Hi-C loop calls and CRISPR-validated CRE-to-gene associations, establishing good performance by DegCre that is comparable or superior to competing methods. DegCre is a novel approach to the association of CREs to genes from a perturbation-differential perspective, with strengths that are complementary to existing approaches and allow for new insights into gene regulation.


Chromatin , Transcription Factors , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Chromatin/metabolism , Chromatin/genetics , Gene Expression Regulation , Regulatory Sequences, Nucleic Acid , Regulatory Elements, Transcriptional
2.
Genome Res ; 2023 Oct 18.
Article En | MEDLINE | ID: mdl-37852782

Transcription factors (TFs) are trans-acting proteins that bind cis-regulatory elements (CREs) in DNA to control gene expression. Here, we analyzed the genomic localization profiles of 529 sequence-specific TFs and 151 cofactors and chromatin regulators in the human cancer cell line HepG2, for a total of 680 broadly termed DNA-associated proteins (DAPs). We used this deep collection to model each TF's impact on gene expression, and identified a cohort of 26 candidate transcriptional repressors. We examine high occupancy target (HOT) sites in the context of three-dimensional genome organization and show biased motif placement in distal-promoter connections involving HOT sites. We also found a substantial number of closed chromatin regions with multiple DAPs bound, and explored their properties, finding that a MAFF/MAFK TF pair correlates with transcriptional repression. Altogether, these analyses provide novel insights into the regulatory logic of the human cell line HepG2 genome and show the usefulness of large genomic analyses for elucidation of individual TF functions.

3.
Am J Hum Genet ; 110(2): 215-227, 2023 02 02.
Article En | MEDLINE | ID: mdl-36586412

Neurodevelopmental disorders (NDDs) result from highly penetrant variation in hundreds of different genes, some of which have not yet been identified. Using the MatchMaker Exchange, we assembled a cohort of 27 individuals with rare, protein-altering variation in the transcriptional coregulator ZMYM3, located on the X chromosome. Most (n = 24) individuals were males, 17 of whom have a maternally inherited variant; six individuals (4 male, 2 female) harbor de novo variants. Overlapping features included developmental delay, intellectual disability, behavioral abnormalities, and a specific facial gestalt in a subset of males. Variants in almost all individuals (n = 26) are missense, including six that recurrently affect two residues. Four unrelated probands were identified with inherited variation affecting Arg441, a site at which variation has been previously seen in NDD-affected siblings, and two individuals have de novo variation resulting in p.Arg1294Cys (c.3880C>T). All variants affect evolutionarily conserved sites, and most are predicted to damage protein structure or function. ZMYM3 is relatively intolerant to variation in the general population, is widely expressed across human tissues, and encodes a component of the KDM1A-RCOR1 chromatin-modifying complex. ChIP-seq experiments on one variant, p.Arg1274Trp, indicate dramatically reduced genomic occupancy, supporting a hypomorphic effect. While we are unable to perform statistical evaluations to definitively support a causative role for variation in ZMYM3, the totality of the evidence, including 27 affected individuals, recurrent variation at two codons, overlapping phenotypic features, protein-modeling data, evolutionary constraint, and experimentally confirmed functional effects, strongly supports ZMYM3 as an NDD-associated gene.


Intellectual Disability , Nervous System Malformations , Neurodevelopmental Disorders , Humans , Male , Female , Neurodevelopmental Disorders/genetics , Intellectual Disability/genetics , Phenotype , Gene Expression Regulation , Face , Nuclear Proteins/genetics , Histone Demethylases/genetics
4.
HGG Adv ; 2(2)2021 Apr 08.
Article En | MEDLINE | ID: mdl-33937879

Exome and genome sequencing have proven to be effective tools for the diagnosis of neurodevelopmental disorders (NDDs), but large fractions of NDDs cannot be attributed to currently detectable genetic variation. This is likely, at least in part, a result of the fact that many genetic variants are difficult or impossible to detect through typical short-read sequencing approaches. Here, we describe a genomic analysis using Pacific Biosciences circular consensus sequencing (CCS) reads, which are both long (>10 kb) and accurate (>99% bp accuracy). We used CCS on six proband-parent trios with NDDs that were unexplained despite extensive testing, including genome sequencing with short reads. We identified variants and created de novo assemblies in each trio, with global metrics indicating these datasets are more accurate and comprehensive than those provided by short-read data. In one proband, we identified a likely pathogenic (LP), de novo L1-mediated insertion in CDKL5 that results in duplication of exon 3, leading to a frameshift. In a second proband, we identified multiple large de novo structural variants, including insertion-translocations affecting DGKB and MLLT3, which we show disrupt MLLT3 transcript levels. We consider this extensive structural variation likely pathogenic. The breadth and quality of variant detection, coupled to finding variants of clinical and research interest in two of six probands with unexplained NDDs, support the hypothesis that long-read genome sequencing can substantially improve rare disease genetic discovery rates.

5.
Genome Res ; 31(5): 866-876, 2021 05.
Article En | MEDLINE | ID: mdl-33879525

Massively parallel reporter assays (MPRAs) are useful tools to characterize regulatory elements in human genomes. An aspect of MPRAs that is not typically the focus of analysis is their intrinsic ability to differentiate activity levels for a given sequence element when placed in both of its possible orientations relative to the reporter construct. Here, we describe pervasive strand asymmetry of MPRA signals in data sets from multiple reporter configurations in both published and newly reported data. These effects are reproducible across different cell types and in different treatments within a cell type and are observed both within and outside of annotated regulatory elements. From elements in gene bodies, MPRA strand asymmetry favors the sense strand, suggesting that function related to endogenous transcription is driving the phenomenon. Similarly, we find that within Alu mobile element insertions, strand asymmetry favors the transcribed strand of the ancestral retrotransposon. The effect is consistent across the multiplicity of Alu elements in human genomes and is more pronounced in less diverged Alu elements. We find sequence features driving MPRA strand asymmetry and show its prediction from sequence alone. We see some evidence for RNA stabilization and transcriptional activation mechanisms and hypothesize that the effect is driven by natural selection favoring efficient transcription. Our results indicate that strand asymmetry is a pervasive and reproducible feature in MPRA data. More importantly, the fact that MPRA asymmetry favors naturally transcribed strands suggests that it stems from preserved biological functions that have a substantial, global impact on gene and genome evolution.


Genome, Human , Regulatory Sequences, Nucleic Acid , Gene Expression Regulation , Genes, Reporter , Humans
6.
Genet Med ; 23(2): 280-288, 2021 02.
Article En | MEDLINE | ID: mdl-32989269

PURPOSE: To evaluate the effectiveness and specificity of population-based genomic screening in Alabama. METHODS: The Alabama Genomic Health Initiative (AGHI) has enrolled and evaluated 5369 participants for the presence of pathogenic/likely pathogenic (P/LP) variants using the Illumina Global Screening Array (GSA), with validation of all P/LP variants via Sanger sequencing in a CLIA-certified laboratory before return of results. RESULTS: Among 131 variants identified by the GSA that were evaluated by Sanger sequencing, 67 (51%) were false positives (FP). For 39 of the 67 FP variants, a benign/likely benign variant was present at or near the targeted P/LP variant. Variants detected within African American individuals were significantly enriched for FPs, likely due to a higher rate of nontargeted alternative alleles close to array-targeted P/LP variants. CONCLUSION: In AGHI, we have implemented an array-based process to screen for highly penetrant genetic variants in actionable disease genes. We demonstrate the need for clinical validation of array-identified variants in direct-to-consumer or population testing, especially for diverse populations.


Genetic Testing , Genomics , Alabama , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans
7.
Nature ; 583(7818): 720-728, 2020 07.
Article En | MEDLINE | ID: mdl-32728244

Transcription factors are DNA-binding proteins that have key roles in gene regulation1,2. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes3-6. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP-seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.


Chromatin Immunoprecipitation Sequencing , Chromatin/genetics , Chromatin/metabolism , DNA-Binding Proteins/metabolism , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Datasets as Topic , Enhancer Elements, Genetic/genetics , Hep G2 Cells , Humans , Nucleotide Motifs/genetics , Promoter Regions, Genetic/genetics , Protein Binding , Transcription Factors/metabolism
8.
Genome Res ; 30(7): 939-950, 2020 07.
Article En | MEDLINE | ID: mdl-32616518

DNA-associated proteins (DAPs) classically regulate gene expression by binding to regulatory loci such as enhancers or promoters. As expanding catalogs of genome-wide DAP binding maps reveal thousands of loci that, unlike the majority of conventional enhancers and promoters, associate with dozens of different DAPs with apparently little regard for motif preference, an understanding of DAP association and coordination at such regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DAPs assayed in three cell lines and integrated these data with an orthogonal data set of 352 nonredundant, in vitro-derived motifs mapped to the genome within DNase I hypersensitivity footprints to characterize regions with high numbers of DAP associations. We establish a generalizable definition for high occupancy target (HOT) loci and identify putative driver DAP motifs in HepG2 cells, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and show sequence conservation at HOT loci. The number of different DAPs associated with an element is positively associated with evidence of regulatory activity, and by systematically mutating 245 HOT loci with a massively parallel mutagenesis assay, we localized regulatory activity to a central core region that depends on the motif sequences of our previously nominated driver DAPs. In sum, this work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.


Enhancer Elements, Genetic , Gene Expression Regulation , Promoter Regions, Genetic , Transcription Factors/metabolism , Base Composition , Cell Line , Chromatin/chemistry , Chromatin Immunoprecipitation Sequencing , DNA/chemistry , Genetic Loci , Genome , Hep G2 Cells , Humans , Mutagenesis , Mutation , Nucleotide Motifs
9.
Methods Mol Biol ; 2117: 3-34, 2020.
Article En | MEDLINE | ID: mdl-31960370

Chromatin immunoprecipitation followed by next-generation DNA sequencing (ChIP-seq) has been used to identify transcription factor (TF) binding proteins throughout the genome. Unfortunately, this approach traditionally requires commercially available, ChIP-seq grade antibodies that frequently fail to generate acceptable datasets. To obtain data for the many TFs for which there is no appropriate antibody, we recently developed a new method for performing ChIP-seq by epitope tagging endogenous TFs using CRISPR/Cas9 genome editing technology (CETCh-seq). Here, we describe our general protocol of CETCh-seq for both adherent and nonadherent cell lines using a commercially available FLAG antibody.


Epitopes/metabolism , Transcription Factors/analysis , Transcription Factors/genetics , Binding Sites , CRISPR-Cas Systems , Cell Adhesion , Chromatin Immunoprecipitation Sequencing , Gene Editing , Hep G2 Cells , Humans , Protein Binding
10.
Oncotarget ; 8(5): 8226-8238, 2017 Jan 31.
Article En | MEDLINE | ID: mdl-28030809

Breast cancer is a heterogeneous disease comprising four molecular subtypes defined by whether the tumor-originating cells are luminal or basal epithelial cells. Breast cancers arising from the luminal mammary duct often express estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). Tumors expressing ER and/or PR are treated with anti-hormonal therapies, while tumors overexpressing HER2 are targeted with monoclonal antibodies. Immunohistochemical detection of ER, PR, and HER2 receptors/proteins is a critical step in breast cancer diagnosis and guided treatment. Breast tumors that do not express these proteins are known as "triple negative breast cancer" (TNBC) and are typically basal-like. TNBCs are the most aggressive subtype, with the highest mortality rates and no targeted therapy, so there is a pressing need to identify important TNBC tumor regulators. The signal transducer and activator of transcription 3 (STAT3) transcription factor has been previously implicated as a constitutively active oncogene in TNBC. However, its direct regulatory gene targets and tumorigenic properties have not been well characterized. By integrating RNA-seq and ChIP-seq data from 2 TNBC tumors and 5 cell lines, we discovered novel gene signatures directly regulated by STAT3 that were enriched for processes involving inflammation, immunity, and invasion in TNBC. Functional analysis revealed that STAT3 has a key role in regulating invasion and metastasis, a characteristic often associated with TNBC. Our findings suggest therapies targeting STAT3 may be important for preventing TNBC metastasis.


Cell Movement , Gene Expression Regulation, Neoplastic , Genome, Human , STAT3 Transcription Factor/genetics , Transcriptome , Triple Negative Breast Neoplasms/genetics , Cell Line, Tumor , Female , Gene Expression Profiling , Humans , Neoplasm Invasiveness , Neoplasm Metastasis , Protein Binding , RNA Interference , STAT3 Transcription Factor/metabolism , Signal Transduction , Transfection , Triple Negative Breast Neoplasms/metabolism , Triple Negative Breast Neoplasms/pathology
11.
Bioessays ; 38(8): 801-11, 2016 08.
Article En | MEDLINE | ID: mdl-27311628

Genome-wide identification of transcription factor binding sites with the ChIP-seq method is an extremely important scientific endeavor - one that should ideally be performed for every transcription factor in as many cell types as possible. A major hurdle on the way to this goal is the necessity for a specific, ChIP-grade antibody for each transcription factor of interest, which is often not available. Here, we describe CETCh-seq, a recently published method utilizing genome engineering with the CRISPR/Cas9 system to circumvent the need for a specific antibody. Using the CETCh-seq method, targeted genomic editing results in an epitope-tagged transcription factor, which is recognized by a well-characterized, standard antibody, efficacious for ChIP-seq. We have used CETCh-seq in human cancer cell lines as well as mouse embryonic stem cells. We find that roughly 60% of transcription factors tagged using CETCh-seq produce a high quality ChIP-seq map, a significant improvement over traditional antibody-based methods.


Genome, Human , Genomics/methods , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Animals , CRISPR-Cas Systems , Chromatin Immunoprecipitation/methods , DNA/metabolism , Epitopes , Humans , Mice , Protein Binding , Sequence Analysis, DNA/methods , Transcription Factors/immunology
12.
Genome Res ; 25(12): 1791-800, 2015 Dec.
Article En | MEDLINE | ID: mdl-26486725

Transcription factors (TFs) bind to thousands of DNA sequences in mammalian genomes, but most of these binding events appear to have no direct effect on gene expression. It is unclear why only a subset of TF bound sites are actively involved in transcriptional regulation. Moreover, the key genomic features that accurately discriminate between active and inactive TF binding events remain ambiguous. Recent studies have identified promoter-distal RNA polymerase II (RNAP2) binding at enhancer elements, suggesting that these interactions may serve as a marker for active regulatory sequences. Despite these correlative analyses, a thorough functional validation of these genomic co-occupancies is still lacking. To characterize the gene regulatory activity of DNA sequences underlying promoter-distal TF binding events that co-occur with RNAP2 and TF sites devoid of RNAP2 occupancy using a functional reporter assay, we performed cis-regulatory element sequencing (CRE-seq). We tested more than 1000 promoter-distal CCAAT/enhancer-binding protein beta (CEBPB)-bound sites in HepG2 and K562 cells, and found that CEBPB-bound sites co-occurring with RNAP2 were more likely to exhibit enhancer activity. CEBPB-bound sites further maintained substantial cell-type specificity, indicating that local DNA sequence can accurately convey cell-type-specific regulatory information. By comparing our CRE-seq results to a comprehensive set of genome annotations, we identified a variety of genomic features that are strong predictors of regulatory element activity and cell-type-specific activity. Collectively, our functional assay results indicate that RNAP2 occupancy can be used as a key genomic marker that can distinguish active from inactive TF bound sites.


Binding Sites , CCAAT-Enhancer-Binding Protein-beta/metabolism , Promoter Regions, Genetic , RNA Polymerase II/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation , Hep G2 Cells , Histones/metabolism , Humans , K562 Cells , Organ Specificity/genetics , Protein Binding , Response Elements , Sequence Analysis, DNA
13.
Genome Res ; 25(10): 1581-9, 2015 Oct.
Article En | MEDLINE | ID: mdl-26355004

Chromatin immunoprecipitation followed by next-generation DNA sequencing (ChIP-seq) is a widely used technique for identifying transcription factor (TF) binding events throughout an entire genome. However, ChIP-seq is limited by the availability of suitable ChIP-seq grade antibodies, and the vast majority of commercially available antibodies fail to generate usable data sets. To ameliorate these technical obstacles, we present a robust methodological approach for performing ChIP-seq through epitope tagging of endogenous TFs. We used clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-based genome editing technology to develop CRISPR epitope tagging ChIP-seq (CETCh-seq) of DNA-binding proteins. We assessed the feasibility of CETCh-seq by tagging several DNA-binding proteins spanning a wide range of endogenous expression levels in the hepatocellular carcinoma cell line HepG2. Our data exhibit strong correlations between both replicate types as well as with standard ChIP-seq approaches that use TF antibodies. Notably, we also observed minimal changes to the cellular transcriptome and to the expression of the tagged TF. To examine the robustness of our technique, we further performed CETCh-seq in the breast adenocarcinoma cell line MCF7 as well as mouse embryonic stem cells and observed similarly high correlations. Collectively, these data highlight the applicability of CETCh-seq to accurately define the genome-wide binding profiles of DNA-binding proteins, allowing for a straightforward methodology to potentially assay the complete repertoire of TFs, including the large fraction for which ChIP-quality antibodies are not available.


Clustered Regularly Interspaced Short Palindromic Repeats , DNA-Binding Proteins/immunology , Epitope Mapping , Oligonucleotide Array Sequence Analysis , Animals , Epitope Mapping/methods , Epitopes/analysis , Feasibility Studies , Gene Expression Profiling , Humans , Mice , Oligonucleotide Array Sequence Analysis/methods , Transcription Factors/analysis , Transcription Factors/immunology , Transcriptome , Tumor Cells, Cultured
14.
Mol Cell ; 52(1): 25-36, 2013 Oct 10.
Article En | MEDLINE | ID: mdl-24076218

Most human transcription factors bind a small subset of potential genomic sites and often use different subsets in different cell types. To identify mechanisms that govern cell-type-specific transcription factor binding, we used an integrative approach to study estrogen receptor α (ER). We found that ER exhibits two distinct modes of binding. Shared sites, bound in multiple cell types, are characterized by high-affinity estrogen response elements (EREs), inaccessible chromatin, and a lack of DNA methylation, while cell-specific sites are characterized by a lack of EREs, co-occurrence with other transcription factors, and cell-type-specific chromatin accessibility and DNA methylation. These observations enabled accurate quantitative models of ER binding that suggest tethering of ER to one-third of cell-specific sites. The distinct properties of cell-specific binding were also observed with glucocorticoid receptor and for ER in primary mouse tissues, representing an elegant genomic encoding scheme for generating cell-type-specific gene regulation.


Estrogen Receptor alpha/metabolism , Promoter Regions, Genetic , Transcription Factors/metabolism , Amino Acid Sequence , Animals , Binding Sites , Cell Line , Conserved Sequence , DNA Methylation , Estradiol/pharmacology , Estrogen Receptor alpha/drug effects , Estrogen Receptor alpha/genetics , Estrogens/pharmacology , Evolution, Molecular , Gene Expression Regulation , Humans , Mice , Models, Biological , Promoter Regions, Genetic/drug effects , RNA Interference , Receptors, Glucocorticoid/genetics , Receptors, Glucocorticoid/metabolism , Response Elements , Thermodynamics , Transcription Factors/genetics , Transfection
15.
Genome Biol ; 13(9): R50, 2012 Sep 26.
Article En | MEDLINE | ID: mdl-22951020

BACKGROUND: The binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1. RESULTS: In each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall rate for observing function in any of the cell lines was 70%. Transcription factor binding resulted in transcriptional repression in more than a third of functional sites. When compared with predicted binding sites whose function was not experimentally verified, the functional binding sites had higher conservation and were located closer to transcriptional start sites (TSSs). Among functional sites, repressive sites tended to be located further from TSSs than were activating sites. Our data provide significant insight into the functional characteristics of YY1 binding sites, most notably the detection of distinct activating and repressing classes of YY1 binding sites. Repressing sites were located closer to, and often overlapped with, translational start sites and presented a distinctive variation on the canonical YY1 binding motif. CONCLUSIONS: The genomic properties that we found to associate with functional TF binding sites on promoters -- conservation, TSS proximity, motifs and their variations -- point the way to improved accuracy in future TFBS predictions.


Promoter Regions, Genetic , Transcription Factors/metabolism , YY1 Transcription Factor/metabolism , Binding Sites , Cell Line , Genome, Human , Humans , Nucleotide Motifs , Transcription Initiation Site
16.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article En | MEDLINE | ID: mdl-22955619

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
17.
Eur J Hum Genet ; 18(5): 560-8, 2010 May.
Article En | MEDLINE | ID: mdl-20051991

Breast cancer is a major cause of morbidity and mortality in women, and its metastatic spread is the principal reason behind the fatal outcome. Metastasis-related research on breast cancer is, however, underdeveloped when compared with the abundant literature on primary tumors. We applied an unexplored approach comparing at high resolution the genomic profiles of primary tumors and synchronous axillary lymph node metastases from 13 patients with breast cancer. Overall, primary tumors displayed a 20% higher number of aberrations than metastases. In all but two patients, we detected in total 157 statistically significant differences between primary lesions and matched metastases. We further observed differences that can be linked to metastatic disease and there was also an overlapping pattern of changes between different patients. Many of the differences described here have been previously linked to poor patient survival, suggesting that this is a viable approach toward finding biomarkers for disease progression and definition of new targets useful for development of anticancer drugs. Frequent genetic differences between primary tumors and metastases in breast cancer also question, at least to some extent, the role of primary tumors as a surrogate subject of study for the systemic disease.


Biomarkers, Tumor/analysis , Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Disease Progression , Lymphatic Metastasis/genetics , Adult , Aged , Chromosomes, Human, Pair 11/genetics , DNA Copy Number Variations/genetics , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Humans , Middle Aged , Oligonucleotide Array Sequence Analysis
18.
Hum Mutat ; 29(9): 1118-24, 2008 Sep.
Article En | MEDLINE | ID: mdl-18570184

Two major types of genetic variation are known: single nucleotide polymorphisms (SNPs), and a more recently discovered structural variation, involving changes in copy number (CNVs) of kilobase- to megabase-sized chromosomal segments. It is unknown whether CNVs arise in somatic cells, but it is generally assumed that normal cells are genetically identical. We tested 34 tissue samples from three subjects and, having analyzed for each tissue ≤10⁻⁶ of all cells expected in an adult human, we observed at least six CNVs, affecting a single organ or one or more tissues of the same subject. The CNVs ranged from 82 to 176 kb, often encompassing known genes, potentially affecting gene function. Our results indicate that humans are commonly affected by somatic mosaicism for stochastic CNVs, which occur in a substantial fraction of cells. The majority of described CNVs were previously shown to be polymorphic between unrelated subjects, suggesting that some CNVs previously reported as germline might represent somatic events, since in most studies of this kind, only one tissue is typically examined and analysis of parents for the studied subjects is not routinely performed. A considerable number of human phenotypes are a consequence of a somatic process. Thus, our conclusions will be important for the delineation of genetic factors behind these phenotypes. Consequently, biobanks should consider sampling multiple tissues to better address mosaicism in the studies of somatic disorders.


Gene Dosage , Mosaicism , Polymorphism, Genetic , Adult , Chromosomes, Human , Genetic Predisposition to Disease , Genomics , Humans , Oligonucleotide Array Sequence Analysis , Organ Specificity , Tissue Distribution
19.
Am J Hum Genet ; 82(3): 763-71, 2008 Mar.
Article En | MEDLINE | ID: mdl-18304490

The exploration of copy-number variation (CNV), notably of somatic cells, is an understudied aspect of genome biology. Any differences in the genetic makeup between twins derived from the same zygote represent an irrefutable example of somatic mosaicism. We studied 19 pairs of monozygotic twins with either concordant or discordant phenotype by using two platforms for genome-wide CNV analyses and showed that CNVs exist within pairs in both groups. These findings have an impact on our views of genotypic and phenotypic diversity in monozygotic twins and suggest that CNV analysis in phenotypically discordant monozygotic twins may provide a powerful tool for identifying disease-predisposition loci. Our results also imply that caution should be exercised when interpreting disease causality of de novo CNVs found in patients based on analysis of a single tissue in routine disease-related DNA diagnostics.


Chromosomes, Human/genetics , Genetic Variation , Neurodegenerative Diseases/genetics , Twins, Monozygotic/genetics , DNA/chemistry , DNA/genetics , Female , Humans , Male , Oligonucleotide Array Sequence Analysis , Phenotype
...