ABSTRACT
Differential gene expression in response to perturbations is mediated at least in part by changes in binding of transcription factors (TFs) and other proteins at specific genomic regions. Association of these cis-regulatory elements (CREs) with their target genes is a challenging task that is essential to address many biological and mechanistic questions. Many current approaches rely on chromatin conformation capture techniques or single-cell correlational methods to establish CRE-to-gene associations. These methods can be effective but have limitations, including resolution, gaps in detectable association distances, and cost. As an alternative, we have developed DegCre, a nonparametric method that evaluates correlations between measurements of perturbation-induced differential gene expression and differential regulatory signal at CREs to score possible CRE-to-gene associations. It has several unique features, including the ability to use any type of CRE activity measurement, yield probabilistic scores for CRE-to-gene pairs, and assess CRE-to-gene pairings across a wide range of sequence distances. We apply DegCre to six data sets, each using different perturbations and containing a variety of regulatory signal measurements, including chromatin openness, histone modifications, and TF occupancy. To test their efficacy, we compare DegCre associations to Hi-C loop calls and CRISPR-validated CRE-to-gene associations, establishing good performance by DegCre that is comparable or superior to competing methods. DegCre is a novel approach to the association of CREs to genes from a perturbation-differential perspective, with strengths that are complementary to existing approaches and allow for new insights into gene regulation.
Subjects
Chromatin; Transcription Factors; Humans; Transcription Factors/metabolism; Transcription Factors/genetics; Chromatin/metabolism; Chromatin/genetics; Gene Expression Regulation; Regulatory Sequences, Nucleic Acid; Regulatory Elements, Transcriptional
ABSTRACT
Transcription factors (TFs) are trans-acting proteins that bind cis-regulatory elements (CREs) in DNA to control gene expression. Here, we analyzed the genomic localization profiles of 529 sequence-specific TFs and 151 cofactors and chromatin regulators in the human cancer cell line HepG2, for a total of 680 broadly termed DNA-associated proteins (DAPs). We used this deep collection to model each TF's impact on gene expression, and identified a cohort of 26 candidate transcriptional repressors. We examine high occupancy target (HOT) sites in the context of three-dimensional genome organization and show biased motif placement in distal-promoter connections involving HOT sites. We also found a substantial number of closed chromatin regions with multiple DAPs bound, and explored their properties, finding that a MAFF/MAFK TF pair correlates with transcriptional repression. Altogether, these analyses provide novel insights into the regulatory logic of the human cell line HepG2 genome and show the usefulness of large genomic analyses for elucidation of individual TF functions.
ABSTRACT
Neurodevelopmental disorders (NDDs) result from highly penetrant variation in hundreds of different genes, some of which have not yet been identified. Using the MatchMaker Exchange, we assembled a cohort of 27 individuals with rare, protein-altering variation in the transcriptional coregulator ZMYM3, located on the X chromosome. Most (n = 24) individuals were males, 17 of whom have a maternally inherited variant; six individuals (4 male, 2 female) harbor de novo variants. Overlapping features included developmental delay, intellectual disability, behavioral abnormalities, and a specific facial gestalt in a subset of males. Variants in almost all individuals (n = 26) are missense, including six that recurrently affect two residues. Four unrelated probands were identified with inherited variation affecting Arg441, a site at which variation has been previously seen in NDD-affected siblings, and two individuals have de novo variation resulting in p.Arg1294Cys (c.3880C>T). All variants affect evolutionarily conserved sites, and most are predicted to damage protein structure or function. ZMYM3 is relatively intolerant to variation in the general population, is widely expressed across human tissues, and encodes a component of the KDM1A-RCOR1 chromatin-modifying complex. ChIP-seq experiments on one variant, p.Arg1274Trp, indicate dramatically reduced genomic occupancy, supporting a hypomorphic effect. While we are unable to perform statistical evaluations to definitively support a causative role for variation in ZMYM3, the totality of the evidence, including 27 affected individuals, recurrent variation at two codons, overlapping phenotypic features, protein-modeling data, evolutionary constraint, and experimentally confirmed functional effects, strongly supports ZMYM3 as an NDD-associated gene.
Subjects
Intellectual Disability; Nervous System Malformations; Neurodevelopmental Disorders; Humans; Male; Female; Neurodevelopmental Disorders/genetics; Intellectual Disability/genetics; Phenotype; Gene Expression Regulation; Face; Nuclear Proteins/genetics; Histone Demethylases/genetics
ABSTRACT
Exome and genome sequencing have proven to be effective tools for the diagnosis of neurodevelopmental disorders (NDDs), but large fractions of NDDs cannot be attributed to currently detectable genetic variation. This is likely, at least in part, a result of the fact that many genetic variants are difficult or impossible to detect through typical short-read sequencing approaches. Here, we describe a genomic analysis using Pacific Biosciences circular consensus sequencing (CCS) reads, which are both long (>10 kb) and accurate (>99% bp accuracy). We used CCS on six proband-parent trios with NDDs that were unexplained despite extensive testing, including genome sequencing with short reads. We identified variants and created de novo assemblies in each trio, with global metrics indicating these datasets are more accurate and comprehensive than those provided by short-read data. In one proband, we identified a likely pathogenic (LP), de novo L1-mediated insertion in CDKL5 that results in duplication of exon 3, leading to a frameshift. In a second proband, we identified multiple large de novo structural variants, including insertion-translocations affecting DGKB and MLLT3, which we show disrupt MLLT3 transcript levels. We consider this extensive structural variation likely pathogenic. The breadth and quality of variant detection, coupled to finding variants of clinical and research interest in two of six probands with unexplained NDDs, support the hypothesis that long-read genome sequencing can substantially improve rare disease genetic discovery rates.
ABSTRACT
Massively parallel reporter assays (MPRAs) are useful tools to characterize regulatory elements in human genomes. An aspect of MPRAs that is not typically the focus of analysis is their intrinsic ability to differentiate activity levels for a given sequence element when placed in both of its possible orientations relative to the reporter construct. Here, we describe pervasive strand asymmetry of MPRA signals in data sets from multiple reporter configurations in both published and newly reported data. These effects are reproducible across different cell types and in different treatments within a cell type and are observed both within and outside of annotated regulatory elements. For elements in gene bodies, MPRA strand asymmetry favors the sense strand, suggesting that function related to endogenous transcription is driving the phenomenon. Similarly, we find that within Alu mobile element insertions, strand asymmetry favors the transcribed strand of the ancestral retrotransposon. The effect is consistent across the multiplicity of Alu elements in human genomes and is more pronounced in less diverged Alu elements. We find sequence features driving MPRA strand asymmetry and show its prediction from sequence alone. We see some evidence for RNA stabilization and transcriptional activation mechanisms and hypothesize that the effect is driven by natural selection favoring efficient transcription. Our results indicate that strand asymmetry is a pervasive and reproducible feature in MPRA data. More importantly, the fact that MPRA asymmetry favors naturally transcribed strands suggests that it stems from preserved biological functions that have a substantial, global impact on gene and genome evolution.
Subjects
Genome, Human; Regulatory Sequences, Nucleic Acid; Gene Expression Regulation; Genes, Reporter; Humans
ABSTRACT
PURPOSE: To evaluate the effectiveness and specificity of population-based genomic screening in Alabama. METHODS: The Alabama Genomic Health Initiative (AGHI) has enrolled and evaluated 5369 participants for the presence of pathogenic/likely pathogenic (P/LP) variants using the Illumina Global Screening Array (GSA), with validation of all P/LP variants via Sanger sequencing in a CLIA-certified laboratory before return of results. RESULTS: Among 131 variants identified by the GSA that were evaluated by Sanger sequencing, 67 (51%) were false positives (FP). For 39 of the 67 FP variants, a benign/likely benign variant was present at or near the targeted P/LP variant. Variants detected within African American individuals were significantly enriched for FPs, likely due to a higher rate of nontargeted alternative alleles close to array-targeted P/LP variants. CONCLUSION: In AGHI, we have implemented an array-based process to screen for highly penetrant genetic variants in actionable disease genes. We demonstrate the need for clinical validation of array-identified variants in direct-to-consumer or population testing, especially for diverse populations.
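The proportions reported in the RESULTS above can be re-derived from the quoted counts. The following is a minimal sketch of that arithmetic only; the variable names and the `percent` helper are illustrative assumptions, not part of the AGHI analysis pipeline.

```python
# Counts quoted in the abstract above; names are hypothetical, chosen for illustration.
sanger_tested = 131          # GSA-identified variants evaluated by Sanger sequencing
false_positives = 67         # variants reported as false positives (FP)
fp_with_nearby_benign = 39   # FPs with a benign/likely benign variant at or near the target


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


fp_rate = percent(false_positives, sanger_tested)               # share of tested variants that were FP
confirmed = sanger_tested - false_positives                     # variants not counted as FP
nearby_share = percent(fp_with_nearby_benign, false_positives)  # share of FPs with a nearby benign variant

print(f"False-positive rate: {fp_rate}%")
print(f"Variants not counted as false positives: {confirmed}")
print(f"FPs with a nearby benign variant: {nearby_share}%")
```

The first figure agrees with the 51% stated in the abstract; the share of false positives with a nearby benign variant is not stated there and follows only from the two quoted counts.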
Subjects
Genetic Testing; Genomics; Alabama; Genetic Variation; High-Throughput Nucleotide Sequencing; Humans
ABSTRACT
Transcription factors are DNA-binding proteins that have key roles in gene regulation [1,2]. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes [3-6]. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP-seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.
Subjects
Chromatin Immunoprecipitation Sequencing; Chromatin/genetics; Chromatin/metabolism; DNA-Binding Proteins/metabolism; Molecular Sequence Annotation; Regulatory Sequences, Nucleic Acid/genetics; Datasets as Topic; Enhancer Elements, Genetic/genetics; Hep G2 Cells; Humans; Nucleotide Motifs/genetics; Promoter Regions, Genetic/genetics; Protein Binding; Transcription Factors/metabolism
ABSTRACT
DNA-associated proteins (DAPs) classically regulate gene expression by binding to regulatory loci such as enhancers or promoters. As expanding catalogs of genome-wide DAP binding maps reveal thousands of loci that, unlike the majority of conventional enhancers and promoters, associate with dozens of different DAPs with apparently little regard for motif preference, an understanding of DAP association and coordination at such regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DAPs assayed in three cell lines and integrated these data with an orthogonal data set of 352 nonredundant, in vitro-derived motifs mapped to the genome within DNase I hypersensitivity footprints to characterize regions with high numbers of DAP associations. We establish a generalizable definition for high occupancy target (HOT) loci and identify putative driver DAP motifs in HepG2 cells, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and show sequence conservation at HOT loci. The number of different DAPs associated with an element is positively associated with evidence of regulatory activity, and by systematically mutating 245 HOT loci with a massively parallel mutagenesis assay, we localized regulatory activity to a central core region that depends on the motif sequences of our previously nominated driver DAPs. In sum, this work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.
Subjects
Enhancer Elements, Genetic; Gene Expression Regulation; Promoter Regions, Genetic; Transcription Factors/metabolism; Base Composition; Cell Line; Chromatin/chemistry; Chromatin Immunoprecipitation Sequencing; DNA/chemistry; Genetic Loci; Genome; Hep G2 Cells; Humans; Mutagenesis; Mutation; Nucleotide Motifs
ABSTRACT
Chromatin immunoprecipitation followed by next-generation DNA sequencing (ChIP-seq) has been used to identify transcription factor (TF) binding sites throughout the genome. Unfortunately, this approach traditionally requires commercially available, ChIP-seq grade antibodies that frequently fail to generate acceptable datasets. To obtain data for the many TFs for which there is no appropriate antibody, we recently developed a new method for performing ChIP-seq by epitope tagging endogenous TFs using CRISPR/Cas9 genome editing technology (CETCh-seq). Here, we describe our general protocol of CETCh-seq for both adherent and nonadherent cell lines using a commercially available FLAG antibody.
Subjects
Epitopes/metabolism; Transcription Factors/analysis; Transcription Factors/genetics; Binding Sites; CRISPR-Cas Systems; Cell Adhesion; Chromatin Immunoprecipitation Sequencing; Gene Editing; Hep G2 Cells; Humans; Protein Binding
ABSTRACT
Breast cancer is a heterogeneous disease composed of four molecular subtypes defined by whether the tumor-originating cells are luminal or basal epithelial cells. Breast cancers arising from the luminal mammary duct often express estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). Tumors expressing ER and/or PR are treated with anti-hormonal therapies, while tumors overexpressing HER2 are targeted with monoclonal antibodies. Immunohistochemical detection of ER, PR, and HER2 receptors/proteins is a critical step in breast cancer diagnosis and guided treatment. Breast tumors that do not express these proteins are known as "triple negative breast cancer" (TNBC) and are typically basal-like. TNBCs are the most aggressive subtype, with the highest mortality rates and no targeted therapy, so there is a pressing need to identify important TNBC tumor regulators. The signal transducer and activator of transcription 3 (STAT3) transcription factor has been previously implicated as a constitutively active oncogene in TNBC. However, its direct regulatory gene targets and tumorigenic properties have not been well characterized. By integrating RNA-seq and ChIP-seq data from 2 TNBC tumors and 5 cell lines, we discovered novel gene signatures directly regulated by STAT3 that were enriched for processes involving inflammation, immunity, and invasion in TNBC. Functional analysis revealed that STAT3 has a key role regulating invasion and metastasis, a characteristic often associated with TNBC. Our findings suggest therapies targeting STAT3 may be important for preventing TNBC metastasis.
Subjects
Cell Movement; Gene Expression Regulation, Neoplastic; Genome, Human; STAT3 Transcription Factor/genetics; Transcriptome; Triple Negative Breast Neoplasms/genetics; Cell Line, Tumor; Female; Gene Expression Profiling; Humans; Neoplasm Invasiveness; Neoplasm Metastasis; Protein Binding; RNA Interference; STAT3 Transcription Factor/metabolism; Signal Transduction; Transfection; Triple Negative Breast Neoplasms/metabolism; Triple Negative Breast Neoplasms/pathology
ABSTRACT
Genome-wide identification of transcription factor binding sites with the ChIP-seq method is an extremely important scientific endeavor, one that should ideally be performed for every transcription factor in as many cell types as possible. A major hurdle on the way to this goal is the necessity for a specific, ChIP-grade antibody for each transcription factor of interest, which is often not available. Here, we describe CETCh-seq, a recently published method utilizing genome engineering with the CRISPR/Cas9 system to circumvent the need for a specific antibody. Using the CETCh-seq method, targeted genomic editing results in an epitope-tagged transcription factor, which is recognized by a well-characterized, standard antibody, efficacious for ChIP-seq. We have used CETCh-seq in human cancer cell lines as well as mouse embryonic stem cells. We find that roughly 60% of transcription factors tagged using CETCh-seq produce a high-quality ChIP-seq map, a significant improvement over traditional antibody-based methods.
Subjects
Genome, Human; Genomics/methods; Regulatory Sequences, Nucleic Acid; Transcription Factors/metabolism; Animals; CRISPR-Cas Systems; Chromatin Immunoprecipitation/methods; DNA/metabolism; Epitopes; Humans; Mice; Protein Binding; Sequence Analysis, DNA/methods; Transcription Factors/immunology
ABSTRACT
Transcription factors (TFs) bind to thousands of DNA sequences in mammalian genomes, but most of these binding events appear to have no direct effect on gene expression. It is unclear why only a subset of TF bound sites are actively involved in transcriptional regulation. Moreover, the key genomic features that accurately discriminate between active and inactive TF binding events remain ambiguous. Recent studies have identified promoter-distal RNA polymerase II (RNAP2) binding at enhancer elements, suggesting that these interactions may serve as a marker for active regulatory sequences. Despite these correlative analyses, a thorough functional validation of these genomic co-occupancies is still lacking. To characterize the gene regulatory activity of DNA sequences underlying promoter-distal TF binding events that co-occur with RNAP2 and TF sites devoid of RNAP2 occupancy using a functional reporter assay, we performed cis-regulatory element sequencing (CRE-seq). We tested more than 1000 promoter-distal CCAAT/enhancer-binding protein beta (CEBPB)-bound sites in HepG2 and K562 cells, and found that CEBPB-bound sites co-occurring with RNAP2 were more likely to exhibit enhancer activity. CEBPB-bound sites further maintained substantial cell-type specificity, indicating that local DNA sequence can accurately convey cell-type-specific regulatory information. By comparing our CRE-seq results to a comprehensive set of genome annotations, we identified a variety of genomic features that are strong predictors of regulatory element activity and cell-type-specific activity. Collectively, our functional assay results indicate that RNAP2 occupancy can be used as a key genomic marker that can distinguish active from inactive TF bound sites.
Subjects
Binding Sites; CCAAT-Enhancer-Binding Protein-beta/metabolism; Promoter Regions, Genetic; RNA Polymerase II/metabolism; Enhancer Elements, Genetic; Gene Expression Regulation; Hep G2 Cells; Histones/metabolism; Humans; K562 Cells; Organ Specificity/genetics; Protein Binding; Response Elements; Sequence Analysis, DNA
ABSTRACT
Chromatin immunoprecipitation followed by next-generation DNA sequencing (ChIP-seq) is a widely used technique for identifying transcription factor (TF) binding events throughout an entire genome. However, ChIP-seq is limited by the availability of suitable ChIP-seq grade antibodies, and the vast majority of commercially available antibodies fail to generate usable data sets. To ameliorate these technical obstacles, we present a robust methodological approach for performing ChIP-seq through epitope tagging of endogenous TFs. We used clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-based genome editing technology to develop CRISPR epitope tagging ChIP-seq (CETCh-seq) of DNA-binding proteins. We assessed the feasibility of CETCh-seq by tagging several DNA-binding proteins spanning a wide range of endogenous expression levels in the hepatocellular carcinoma cell line HepG2. Our data exhibit strong correlations between both replicate types as well as with standard ChIP-seq approaches that use TF antibodies. Notably, we also observed minimal changes to the cellular transcriptome and to the expression of the tagged TF. To examine the robustness of our technique, we further performed CETCh-seq in the breast adenocarcinoma cell line MCF7 as well as mouse embryonic stem cells and observed similarly high correlations. Collectively, these data highlight the applicability of CETCh-seq to accurately define the genome-wide binding profiles of DNA-binding proteins, allowing for a straightforward methodology to potentially assay the complete repertoire of TFs, including the large fraction for which ChIP-quality antibodies are not available.
Subjects
Clustered Regularly Interspaced Short Palindromic Repeats; DNA-Binding Proteins/immunology; Epitope Mapping; Oligonucleotide Array Sequence Analysis; Animals; Epitope Mapping/methods; Epitopes/analysis; Feasibility Studies; Gene Expression Profiling; Humans; Mice; Oligonucleotide Array Sequence Analysis/methods; Transcription Factors/analysis; Transcription Factors/immunology; Transcriptome; Tumor Cells, Cultured
ABSTRACT
Most human transcription factors bind a small subset of potential genomic sites and often use different subsets in different cell types. To identify mechanisms that govern cell-type-specific transcription factor binding, we used an integrative approach to study estrogen receptor α (ER). We found that ER exhibits two distinct modes of binding. Shared sites, bound in multiple cell types, are characterized by high-affinity estrogen response elements (EREs), inaccessible chromatin, and a lack of DNA methylation, while cell-specific sites are characterized by a lack of EREs, co-occurrence with other transcription factors, and cell-type-specific chromatin accessibility and DNA methylation. These observations enabled accurate quantitative models of ER binding that suggest tethering of ER to one-third of cell-specific sites. The distinct properties of cell-specific binding were also observed with glucocorticoid receptor and for ER in primary mouse tissues, representing an elegant genomic encoding scheme for generating cell-type-specific gene regulation.
Subjects
Estrogen Receptor alpha/metabolism; Promoter Regions, Genetic; Transcription Factors/metabolism; Amino Acid Sequence; Animals; Binding Sites; Cell Line; Conserved Sequence; DNA Methylation; Estradiol/pharmacology; Estrogen Receptor alpha/drug effects; Estrogen Receptor alpha/genetics; Estrogens/pharmacology; Evolution, Molecular; Gene Expression Regulation; Humans; Mice; Models, Biological; Promoter Regions, Genetic/drug effects; RNA Interference; Receptors, Glucocorticoid/genetics; Receptors, Glucocorticoid/metabolism; Response Elements; Thermodynamics; Transcription Factors/genetics; Transfection
ABSTRACT
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
Subjects
DNA/genetics; Encyclopedias as Topic; Gene Regulatory Networks/genetics; Genome, Human/genetics; Molecular Sequence Annotation; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism; Alleles; Cell Line; GATA1 Transcription Factor/metabolism; Gene Expression Profiling; Genomics; Humans; K562 Cells; Organ Specificity; Phosphorylation/genetics; Polymorphism, Single Nucleotide/genetics; Protein Interaction Maps; RNA, Untranslated/genetics; RNA, Untranslated/metabolism; Selection, Genetic/genetics; Transcription Initiation Site
ABSTRACT
BACKGROUND: The binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1. RESULTS: In each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall rate for observing function in any of the cell lines was 70%. Transcription factor binding resulted in transcriptional repression in more than a third of functional sites. When compared with predicted binding sites whose function was not experimentally verified, the functional binding sites had higher conservation and were located closer to transcriptional start sites (TSSs). Among functional sites, repressive sites tended to be located further from TSSs than were activating sites. Our data provide significant insight into the functional characteristics of YY1 binding sites, most notably the detection of distinct activating and repressing classes of YY1 binding sites. Repressing sites were located closer to, and often overlapped with, translational start sites and presented a distinctive variation on the canonical YY1 binding motif. CONCLUSIONS: The genomic properties that we found to associate with functional TF binding sites on promoters -- conservation, TSS proximity, motifs and their variations -- point the way to improved accuracy in future TFBS predictions.
Subjects
Promoter Regions, Genetic; Transcription Factors/metabolism; YY1 Transcription Factor/metabolism; Binding Sites; Cell Line; Genome, Human; Humans; Nucleotide Motifs; Transcription Initiation Site
ABSTRACT
Breast cancer is a major cause of morbidity and mortality in women, and its metastatic spread is the principal reason behind the fatal outcome. Metastasis-related research of breast cancer is, however, underdeveloped when compared with the abundant literature on primary tumors. We applied an unexplored approach comparing at high resolution the genomic profiles of primary tumors and synchronous axillary lymph node metastases from 13 patients with breast cancer. Overall, primary tumors displayed a 20% higher number of aberrations than metastases. In all but two patients, we detected in total 157 statistically significant differences between primary lesions and matched metastases. We further observed differences that can be linked to metastatic disease and there was also an overlapping pattern of changes between different patients. Many of the differences described here have been previously linked to poor patient survival, suggesting that this is a viable approach toward finding biomarkers for disease progression and definition of new targets useful for development of anticancer drugs. Frequent genetic differences between primary tumors and metastases in breast cancer also question, at least to some extent, the role of primary tumors as a surrogate subject of study for the systemic disease.
Subjects
Biomarkers, Tumor/analysis; Biomarkers, Tumor/genetics; Breast Neoplasms/genetics; Disease Progression; Lymphatic Metastasis/genetics; Adult; Aged; Chromosomes, Human, Pair 11/genetics; DNA Copy Number Variations/genetics; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genome, Human/genetics; Humans; Middle Aged; Oligonucleotide Array Sequence Analysis
ABSTRACT
Two major types of genetic variation are known: single nucleotide polymorphisms (SNPs), and a more recently discovered structural variation, involving changes in copy number (CNVs) of kilobase- to megabase-sized chromosomal segments. It is unknown whether CNVs arise in somatic cells, but it is generally assumed that normal cells are genetically identical. We tested 34 tissue samples from three subjects and, having analyzed for each tissue ≤10⁻⁶ of all cells expected in an adult human, we observed at least six CNVs, affecting a single organ or one or more tissues of the same subject. The CNVs ranged from 82 to 176 kb, often encompassing known genes, potentially affecting gene function. Our results indicate that humans are commonly affected by somatic mosaicism for stochastic CNVs, which occur in a substantial fraction of cells. The majority of described CNVs were previously shown to be polymorphic between unrelated subjects, suggesting that some CNVs previously reported as germline might represent somatic events, since in most studies of this kind, only one tissue is typically examined and analysis of parents for the studied subjects is not routinely performed. A considerable number of human phenotypes are a consequence of a somatic process. Thus, our conclusions will be important for the delineation of genetic factors behind these phenotypes. Consequently, biobanks should consider sampling multiple tissues to better address mosaicism in the studies of somatic disorders.
Subjects
Gene Dosage; Mosaicism; Polymorphism, Genetic; Adult; Chromosomes, Human; Genetic Predisposition to Disease; Genomics; Humans; Oligonucleotide Array Sequence Analysis; Organ Specificity; Tissue Distribution
ABSTRACT
The exploration of copy-number variation (CNV), notably of somatic cells, is an understudied aspect of genome biology. Any differences in the genetic makeup between twins derived from the same zygote represent an irrefutable example of somatic mosaicism. We studied 19 pairs of monozygotic twins with either concordant or discordant phenotype by using two platforms for genome-wide CNV analyses and showed that CNVs exist within pairs in both groups. These findings have an impact on our views of genotypic and phenotypic diversity in monozygotic twins and suggest that CNV analysis in phenotypically discordant monozygotic twins may provide a powerful tool for identifying disease-predisposition loci. Our results also imply that caution should be exercised when interpreting disease causality of de novo CNVs found in patients based on analysis of a single tissue in routine disease-related DNA diagnostics.