ABSTRACT
Polycomb Repressive Complex 2 (PRC2) regulates key developmental genes in embryonic stem (ES) cells and during development. Here we show that Jarid2/Jumonji, a protein enriched in pluripotent cells and a founding member of the Jumonji C (JmjC) domain protein family, is a PRC2 subunit in ES cells. Genome-wide ChIP-seq analyses of Jarid2, Ezh2, and Suz12 binding reveal that Jarid2 and PRC2 occupy the same genomic regions. We further show that Jarid2 promotes PRC2 recruitment to the target genes while inhibiting PRC2 histone methyltransferase activity, suggesting that it acts as a "molecular rheostat" that finely calibrates PRC2 functions at developmental genes. Using Xenopus laevis as a model we demonstrate that Jarid2 knockdown impairs the induction of gastrulation genes in blastula embryos and results in failure of differentiation. Our findings illuminate a mechanism of histone methylation regulation in pluripotent cells and during early cell-fate transitions.
Subject(s)
Nerve Tissue Proteins/metabolism , Repressor Proteins/metabolism , Animals , Embryonic Stem Cells/metabolism , Gene Knockdown Techniques , Humans , Mice , Mitochondria/metabolism , Polycomb Repressive Complex 2 , Polycomb-Group Proteins , RNA/metabolism , Retinoblastoma-Binding Protein 2/metabolism
ABSTRACT
Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or "mutational signatures". Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates a user-specified background signature, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using a variety of standard metrics. We then apply SparseSignatures to whole genome sequences of pancreatic and breast tumors, discovering well-differentiated signatures that are linked to known mutagenic mechanisms and are strongly associated with patient clinical features.
Subject(s)
DNA Mutational Analysis/statistics & numerical data , Neoplasms/genetics , Point Mutation , Algorithms , Biomarkers, Tumor/genetics , Breast Neoplasms/classification , Breast Neoplasms/genetics , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Female , Genes, BRCA1 , Genes, BRCA2 , Genome, Human , Humans , Pancreatic Neoplasms/classification , Pancreatic Neoplasms/genetics , Software
ABSTRACT
The mechanisms by which the p53 tumor suppressor acts remain incompletely understood. To gain new insights into p53 biology, we used high-throughput sequencing to analyze global p53 transcriptional networks in primary mouse embryo fibroblasts in response to DNA damage. Chromatin immunoprecipitation sequencing reveals 4785 p53-bound sites in the genome located near 3193 genes involved in diverse biological processes. RNA sequencing analysis shows that only a subset of p53-bound genes is transcriptionally regulated, yielding a list of 432 p53-bound and regulated genes. Interestingly, we identify a host of autophagy genes as direct p53 target genes. While the autophagy program is regulated predominantly by p53, the p53 family members p63 and p73 contribute to activation of this autophagy gene network. Induction of autophagy genes in response to p53 activation is associated with enhanced autophagy in diverse settings and depends on p53 transcriptional activity. While p53-induced autophagy does not affect cell cycle arrest in response to DNA damage, it is important for both robust p53-dependent apoptosis triggered by DNA damage and transformation suppression by p53. Together, our data highlight an intimate connection between p53 and autophagy through a vast transcriptional network and indicate that autophagy contributes to p53-dependent apoptosis and cancer suppression.
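The abstract reports 4785 p53-bound sites located near 3193 genes but does not say how peaks were assigned to genes here. One common convention, assumed for this sketch, is to assign each peak summit to the nearest transcription start site (TSS) on the same chromosome, within a fixed distance cutoff. The function name, the `max_dist` value and the gene names are hypothetical.

```python
from bisect import bisect_left

def nearest_tss(peaks, tss_by_chrom, max_dist=10_000):
    """Assign each peak summit to the closest TSS on its chromosome (hypothetical helper).

    peaks: list of (chrom, summit_position)
    tss_by_chrom: dict chrom -> list of (tss_position, gene_name)
    max_dist: assumed cutoff; peaks farther than this from every TSS get None.
    Returns a list of gene names (or None) parallel to `peaks`.
    """
    # Sort each chromosome's TSS list once so it can be binary-searched.
    sorted_tss = {c: sorted(v) for c, v in tss_by_chrom.items()}
    positions = {c: [p for p, _ in v] for c, v in sorted_tss.items()}
    assignments = []
    for chrom, summit in peaks:
        pos_list = positions.get(chrom, [])
        if not pos_list:
            assignments.append(None)
            continue
        i = bisect_left(pos_list, summit)
        # Only the TSS immediately left and right of the summit can be nearest.
        best = None
        for j in (i - 1, i):
            if 0 <= j < len(pos_list):
                d = abs(pos_list[j] - summit)
                if best is None or d < best[0]:
                    best = (d, sorted_tss[chrom][j][1])
        assignments.append(best[1] if best[0] <= max_dist else None)
    return assignments
```

The number of distinct bound genes is then `len(set(assignments) - {None})`; binary search keeps the per-peak cost logarithmic in the number of TSSs.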
Subject(s)
Autophagy/genetics , DNA Damage/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Up-Regulation , Animals , Cell Cycle Checkpoints/genetics , Cell Survival/genetics , Cells, Cultured , Embryo, Mammalian , Gene Expression Regulation, Developmental/genetics , Genome-Wide Association Study , Mice , Protein Binding , Sequence Analysis, RNA
ABSTRACT
In read cloud approaches, microfluidic partitioning of long genomic DNA fragments and barcoding of shorter fragments derived from these fragments retains long-range information in short sequencing reads. This combination of short reads with long-range information represents a powerful alternative to single-molecule long-read sequencing. We develop Genome-wide Reconstruction of Complex Structural Variants (GROC-SVs) for SV detection and assembly from read cloud data and apply this method to Illumina-sequenced 10x Genomics sarcoma and breast cancer data sets. Compared with short-fragment sequencing, GROC-SVs substantially improves the specificity of breakpoint detection at comparable sensitivity. This approach also performs sequence assembly across multiple breakpoints simultaneously, enabling the reconstruction of events exhibiting remarkable complexity. We show that chromothriptic rearrangements occurred before copy number amplifications, and that rates of single-nucleotide variants and SVs are not correlated. Our results support the use of read cloud approaches to advance the characterization of large and complex structural variation.
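GROC-SVs' own algorithm is not described in this abstract. The sketch below only illustrates the read-cloud signal it builds on: reads from the same long DNA fragment share a barcode, so two genomic windows that are far apart in the reference yet share many barcodes hint at a rearrangement joining them. The window size, the Jaccard score and both thresholds are assumptions for illustration, not GROC-SVs parameters.

```python
from collections import defaultdict
from itertools import combinations

def barcode_sets(reads, window=10_000):
    """Group barcodes by (chrom, window index). reads: iterable of (chrom, pos, barcode)."""
    sets = defaultdict(set)
    for chrom, pos, barcode in reads:
        sets[(chrom, pos // window)].add(barcode)
    return sets

def candidate_junctions(reads, window=10_000, min_jaccard=0.5, min_gap_windows=10):
    """Return (window1, window2, jaccard) for distant windows whose barcode sets overlap strongly.

    Nearby windows on the same chromosome naturally share barcodes (one long fragment
    spans both), so pairs closer than `min_gap_windows` are skipped.
    """
    sets = barcode_sets(reads, window)
    hits = []
    for (w1, b1), (w2, b2) in combinations(sorted(sets.items()), 2):
        if w1[0] == w2[0] and abs(w1[1] - w2[1]) < min_gap_windows:
            continue
        jaccard = len(b1 & b2) / len(b1 | b2)
        if jaccard >= min_jaccard:
            hits.append((w1, w2, round(jaccard, 3)))
    return hits
```

The all-pairs loop is quadratic in the number of windows, which is fine for a toy example; a genome-scale tool would index windows by barcode instead.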
Subject(s)
Algorithms , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Genetic Variation/genetics , Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
ABSTRACT
BACKGROUND: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because true DNMs are few relative to the genome-wide false positives in next-generation sequencing (NGS) data. Software such as DeNovoGear and TrioDeNovo has been developed to detect DNMs, but at good sensitivity these tools still produce many false positive calls. RESULTS: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked-read sequencing to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to one of the two haplotypes, and a haploid genotype is then generated for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes, because they are almost certainly false positives. Our experiments on 10X Chromium linked-read sequencing trio data reveal that HAPDeNovo eliminates 80 to 99% of false positives regardless of the size of the candidate DNM set. CONCLUSIONS: HAPDeNovo leverages haplotype information from linked-read sequencing to remove false positive DNMs effectively, and it dramatically increases the accuracy of DNM detection without sacrificing sensitivity.
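The filtering rule described above can be sketched as follows. A haplotype is haploid, so once reads are split by haplotype each one should show a single allele; a candidate whose reads on one haplotype carry both alleles is discarded. The `(ref_reads, alt_reads)` input and the `min_frac`/`min_depth` cutoffs are assumptions for illustration, not HAPDeNovo's actual interface.

```python
def haploid_call(ref_reads, alt_reads, min_frac=0.9, min_depth=2):
    """Call one haplotype as 'ref', 'alt', 'het' (mixed alleles) or None (too few reads).

    min_frac and min_depth are assumed thresholds for this sketch.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    if alt_reads / depth >= min_frac:
        return "alt"
    if ref_reads / depth >= min_frac:
        return "ref"
    return "het"  # both alleles on one haploid haplotype: inconsistent

def keep_candidate(hap1, hap2, **kw):
    """hap1, hap2: (ref_reads, alt_reads) for the child's two phased haplotypes.

    Mirrors the rule in the abstract: drop the candidate if either haplotype
    is called heterozygous; keep it otherwise.
    """
    return all(haploid_call(*h, **kw) != "het" for h in (hap1, hap2))
```

For example, `keep_candidate((12, 0), (0, 9))` keeps a call whose alternate allele sits cleanly on one haplotype, while `keep_candidate((6, 6), (0, 9))` rejects one with mixed reads on the first haplotype.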
Subject(s)
Genome, Human , Haplotypes , Mutation , Software , Algorithms , Computational Biology , DNA Mutational Analysis , Genotype , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Evolutionary mechanisms in cancer progression give tumors their individuality. Cancer evolution is different from organismal evolution, however, and we discuss where concepts from evolutionary genetics are useful or limited in facilitating an understanding of cancer. Based on these concepts we construct and apply the simplest plausible model of tumor growth and progression. Simulations using this simple model illustrate the importance of stochastic events early in tumorigenesis, highlight the dominance of exponential growth over linear growth and differentiation, and explain the clonal substructure of tumors.
Subject(s)
Neoplasms/genetics , Neoplasms/pathology , Animals , Cell Differentiation , Disease Progression , Genetic Heterogeneity , Humans , Models, Biological , Mutation , Neoplasms/etiology
ABSTRACT
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.
Subject(s)
Genetic Variation , Genome, Human , Sequence Analysis, DNA/methods , Algorithms , Carcinoma, Ductal/genetics , Carcinoma, Ductal, Breast/genetics , DNA Fragmentation , Humans , Sequence Alignment/methods
ABSTRACT
We present the discovery of genes recurrently involved in structural variation in nasopharyngeal carcinoma (NPC) and the identification of a novel type of somatic structural variant. We identified the variants with high complexity mate-pair libraries and a novel computational algorithm specifically designed for tumor-normal comparisons, SMASH. SMASH combines signals from split reads and mate-pair discordance to detect somatic structural variants. We demonstrate a >90% validation rate and a breakpoint reconstruction accuracy of 3 bp by Sanger sequencing. Our approach identified three in-frame gene fusions (YAP1-MAML2, PTPLB-RSRC1, and SP3-PTK2) that had strong levels of expression in corresponding NPC tissues. We found two cases of a novel type of structural variant, which we call "coupled inversion," one of which produced the YAP1-MAML2 fusion. To investigate whether the identified fusion genes are recurrent, we performed fluorescent in situ hybridization (FISH) to screen 196 independent NPC cases. We observed recurrent rearrangements of MAML2 (three cases), PTK2 (six cases), and SP3 (two cases), corresponding to a combined rate of structural variation recurrence of 6% among tested NPC tissues.
Subject(s)
Gene Expression Regulation, Neoplastic , Genomic Structural Variation , Nasopharyngeal Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Adaptor Proteins, Signal Transducing/genetics , Carcinoma , DNA-Binding Proteins/genetics , Focal Adhesion Kinase 1/genetics , Gene Fusion/genetics , Humans , Hydro-Lyases , In Situ Hybridization, Fluorescence , Membrane Proteins/genetics , Nasopharyngeal Carcinoma , Nasopharyngeal Neoplasms/pathology , Nuclear Proteins/genetics , Phosphoproteins/genetics , Protein Tyrosine Phosphatases/genetics , Sp3 Transcription Factor/genetics , Trans-Activators , Transcription Factors/genetics , YAP-Signaling Proteins
ABSTRACT
Nucleosomes are the basic packaging units of chromatin, modulating accessibility of regulatory proteins to DNA and thus influencing eukaryotic gene regulation. Elaborate chromatin remodelling mechanisms have evolved that govern nucleosome organization at promoters, regulatory elements, and other functional regions in the genome. Analyses of chromatin landscape have uncovered a variety of mechanisms, including DNA sequence preferences, that can influence nucleosome positions. To identify major determinants of nucleosome organization in the human genome, we used deep sequencing to map nucleosome positions in three primary human cell types and in vitro. A majority of the genome showed substantial flexibility of nucleosome positions, whereas a small fraction showed reproducibly positioned nucleosomes. Certain sites that position in vitro can anchor the formation of nucleosomal arrays that have cell type-specific spacing in vivo. Our results unveil an interplay of sequence-based nucleosome preferences and non-nucleosomal factors in determining nucleosome organization within mammalian cells.
Subject(s)
Chromatin Assembly and Disassembly/physiology , Gene Expression Regulation , Nucleosomes/metabolism , CD4-Positive T-Lymphocytes/metabolism , CD8-Positive T-Lymphocytes/metabolism , Cells, Cultured , Genome, Human/genetics , Granulocytes/metabolism , High-Throughput Nucleotide Sequencing , Humans , Micrococcal Nuclease/metabolism , Nucleosomes/chemistry , Nucleosomes/genetics , Organ Specificity , Transcription, Genetic
ABSTRACT
Cancer evolution involves cycles of genomic damage, epigenetic deregulation, and increased cellular proliferation that eventually culminate in the carcinoma phenotype. Early neoplasias, which are often found concurrently with carcinomas and are histologically distinguishable from normal breast tissue, are less advanced in phenotype than carcinomas and are thought to represent precursor stages. To elucidate their role in cancer evolution we performed comparative whole-genome sequencing of early neoplasias, matched normal tissue, and carcinomas from six patients, for a total of 31 samples. By using somatic mutations as lineage markers we built trees that relate the tissue samples within each patient. On the basis of these lineage trees we inferred the order, timing, and rates of genomic events. In four out of six cases, an early neoplasia and the carcinoma share a mutated common ancestor with recurring aneuploidies, and in all six cases evolution accelerated in the carcinoma lineage. Transition spectra of somatic mutations are stable and consistent across cases, suggesting that accumulation of somatic mutations is a result of increased ancestral cell division rather than specific mutational mechanisms. In contrast to highly advanced tumors that are the focus of much of the current cancer genome sequencing, neither the early neoplasia genomes nor the carcinomas are enriched with potentially functional somatic point mutations. Aneuploidies that occur in common ancestors of neoplastic and tumor cells are the earliest events that affect a large number of genes and may predispose breast tissue to eventual development of invasive carcinoma.
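The abstract does not specify how the lineage trees were built. One minimal way to use somatic mutations as lineage markers, assumed here, is the perfect-phylogeny view: each mutation defines the set of samples that carry it, and those sets must be nested or disjoint for a tree to exist. The sketch checks that condition and recovers each clade's parent; the sample names and the binary presence/absence input are hypothetical, and real data would need to tolerate sequencing error and copy-number changes.

```python
def build_clades(sample_mutations):
    """sample_mutations: dict sample -> set of somatic mutation IDs.

    Returns dict clade (frozenset of samples) -> parent clade (None for the root),
    or raises ValueError if two mutations define overlapping, non-nested sample
    sets (i.e. no perfect phylogeny exists).
    """
    # Invert the input: which samples carry each mutation?
    carriers = {}
    for sample, muts in sample_mutations.items():
        for m in muts:
            carriers.setdefault(m, set()).add(sample)
    clades = {frozenset(s) for s in carriers.values()}
    root = frozenset(sample_mutations)
    clades.add(root)
    # Compatibility: every pair of clades must be nested or disjoint.
    for a in clades:
        for b in clades:
            if a & b and not (a <= b or b <= a):
                raise ValueError(f"incompatible clades {sorted(a)} and {sorted(b)}")
    # In a nested family the supersets of a clade form a chain,
    # so the smallest one is its unique parent.
    parents = {}
    for c in clades:
        supersets = [s for s in clades if c < s]
        parents[c] = min(supersets, key=len) if supersets else None
    return parents
```

With `{"neoplasia_1": {"m1", "m2"}, "neoplasia_2": {"m1", "m3"}, "carcinoma": {"m1", "m3", "m4"}}`, mutation `m1` is shared by every sample (the common ancestor), and `m3` groups `neoplasia_2` with `carcinoma`, mirroring the kind of shared-ancestor relationship the abstract describes.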
Subject(s)
Breast Neoplasms/genetics , Cell Transformation, Neoplastic/genetics , Genome, Human , Mutation , Alleles , Aneuploidy , Breast Neoplasms/pathology , Carcinoma/genetics , Carcinoma/pathology , Disease Progression , Female , Gene Frequency , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
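Polymerase slippage acts on tandem repeats, so a basic sequence-context feature for any slippage model is the length of the repeat tract around a site. The abstract does not give the paper's quantitative model; the sketch below only computes that feature (the longest exact tandem repeat, with unit length 1 to `max_unit`, covering a position) under assumed conventions such as measuring tract length in bases and requiring at least two copies.

```python
def repeat_tract(seq, pos, max_unit=6):
    """Return (unit, tract_length_in_bases) of the longest tandem repeat covering seq[pos].

    For each unit length k, try every unit starting within k bases to the left of
    pos, then extend the run of exact copies to the left and right.
    A position outside any repeat returns (base, 1).
    """
    seq = seq.upper()
    best = (seq[pos], 1)
    for k in range(1, max_unit + 1):
        for start in range(max(0, pos - k + 1), pos + 1):
            unit = seq[start:start + k]
            if len(unit) < k:
                continue
            left = start
            while left - k >= 0 and seq[left - k:left] == unit:
                left -= k
            right = start + k
            while seq[right:right + k] == unit:
                right += k
            length = right - left
            # Keep tracts with at least two copies that actually cover pos.
            if length >= 2 * k and left <= pos < right and length > best[1]:
                best = (unit, length)
    return best
```

For example, position 4 of `"TTCACACAGG"` lies in a `CA` dinucleotide repeat spanning 6 bases; a slippage model would then relate indel rates to features like this unit and tract length.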
Subject(s)
Evolution, Molecular , Genome, Human , INDEL Mutation/genetics , Genetics, Population , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Mutagenesis, Insertional , Mutation Rate , Polymorphism, Single Nucleotide
ABSTRACT
UNLABELLED: Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only reads relevant to a candidate SV. svviz works by searching the input BAM file(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele, and identifying reads that match one allele better than the other. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele than the single reference genome-based view common to most current read browsers. The browser view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations, and manually refining breakpoints. svviz supports data from most modern sequencing platforms. AVAILABILITY AND IMPLEMENTATION: svviz is implemented in Python and freely available from http://svviz.github.io/.
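The core idea described above, realigning each read against both the reference and the putative variant allele and keeping only reads that clearly prefer one, can be sketched with a plain Smith-Waterman local-alignment score. The scoring values and the `margin` threshold are assumptions for this sketch; svviz's own aligner and parameters differ.

```python
def sw_score(read, allele, match=2, mismatch=-3, gap=-4):
    """Best local (Smith-Waterman) alignment score of `read` against `allele`, linear gaps."""
    prev = [0] * (len(allele) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(allele) + 1)
        for j in range(1, len(allele) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == allele[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def assign_read(read, ref_allele, alt_allele, margin=4):
    """Return 'ref', 'alt' or 'ambiguous' depending on which allele the read matches better."""
    ref, alt = sw_score(read, ref_allele), sw_score(read, alt_allele)
    if alt - ref >= margin:
        return "alt"
    if ref - alt >= margin:
        return "ref"
    return "ambiguous"
```

With a reference `"GATTACAGCTTCGAAGTCCA"` and an alternate allele carrying a 4 bp deletion (`"GATTACACGAAGTCCA"`), a read spanning the deletion junction is assigned to `alt`, a read spanning the deleted bases to `ref`, and a read from sequence shared by both alleles stays `ambiguous`; only the first two kinds would be informative in an allele-specific view.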
Subject(s)
Genomic Structural Variation , Genomics/methods , Software , Alleles , Sequence Alignment
ABSTRACT
To investigate the epigenetic landscape at the interface between mother and fetus, we provide a comprehensive analysis of parent-of-origin bias in the mouse placenta. Using F1 interspecies hybrids between Mus musculus (C57BL/6J) and Mus musculus castaneus, we sequenced RNA from 23 individual midgestation placentas, five late-stage placentas, and two yolk sac samples, and then used SNPs to determine whether transcripts were preferentially generated from the maternal or paternal allele. In the placenta, we find 103 genes that show significant and reproducible parent-of-origin bias, of which 78 are novel candidates. Most (96%) show a strong maternal bias, which we demonstrate, via multiple mathematical models, pyrosequencing, and FISH, is not due to maternal decidual contamination. Analysis of the X chromosome also reveals paternal expression of Xist and several genes that escape inactivation, most significantly Alas2, Fhl1, and Slc38a5. Finally, sequencing individual placentas allowed us to reveal notable expression similarity between littermates. In all, we observe a striking preference for maternal transcription in the midgestation mouse placenta and a dynamic imprinting landscape in extraembryonic tissues, reflecting the complex nature of epigenetic pathways in the placenta.
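Once SNP-overlapping reads are counted per allele, a standard way to ask whether a single gene shows parent-of-origin bias is an exact two-sided binomial test of the maternal read count against an expected fraction of 0.5. The study combined several models and replicate placentas, which this sketch does not reproduce; the function names are hypothetical.

```python
from math import comb

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no likelier than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with `observed` are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))

def maternal_bias(maternal_reads, paternal_reads):
    """Return (maternal fraction, two-sided p-value) for one gene."""
    n = maternal_reads + paternal_reads
    return maternal_reads / n, binom_two_sided(maternal_reads, n)
```

For instance, 10 maternal and 0 paternal reads give p = 2/1024 (about 0.002), while a 5/5 split gives p = 1. Across many genes the resulting p-values would still need multiple-testing correction.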
Subject(s)
Chromosomes, Mammalian/genetics , Genomic Imprinting , Placenta/metabolism , X Chromosome/genetics , 5-Aminolevulinate Synthetase/genetics , Amino Acid Transport Systems, Neutral/genetics , Animals , Cluster Analysis , Female , Gene Expression Regulation, Developmental , Gestational Age , Hybridization, Genetic , Inheritance Patterns , Intracellular Signaling Peptides and Proteins/genetics , LIM Domain Proteins/genetics , Male , Mice , Mice, Inbred C57BL , Muscle Proteins/genetics , Placenta/embryology , Placentation , Polymorphism, Single Nucleotide , Pregnancy , RNA, Long Noncoding/genetics , Sequence Analysis, RNA/methods , Species Specificity , Transcriptome , X Chromosome Inactivation
ABSTRACT
Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.
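The abstract names three sources of heterogeneity that CAGT accounts for: signal magnitude, shape and strand orientation. CAGT's actual clustering procedure is not described here, so the following is only an illustrative assumption: a distance between two aggregate profiles that ignores magnitude (via Pearson correlation) and orientation (by also comparing against the reversed profile).

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (invariant to scale and offset)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    if va == 0 or vb == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / (va * vb)

def oriented_distance(a, b):
    """Return (1 - best correlation over both orientations of b, whether b was flipped)."""
    fwd, rev = pearson(a, b), pearson(a, b[::-1])
    return (1 - rev, True) if rev > fwd else (1 - fwd, False)
```

Under this distance, a profile that is a scaled, mirror-image copy of another (for example, the same asymmetric nucleosome pattern seen from the opposite strand) scores near zero and is flagged as flipped, so a clustering step built on it could group the two before aligning their orientation.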
Subject(s)
Chromatin Assembly and Disassembly , Genetic Heterogeneity , Regulatory Sequences, Nucleic Acid , Binding Sites/genetics , Cell Line , Cluster Analysis , Computational Biology/methods , Humans , K562 Cells , Nucleosomes/genetics , Nucleosomes/metabolism , Protein Binding , Software , Transcription Initiation Site
ABSTRACT
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
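As one concrete example of data-quality assessment, library complexity is commonly summarized by the non-redundant fraction (NRF), the number of distinct read positions divided by the total number of reads, and by PBC1 (PCR bottleneck coefficient), the number of genomic positions with exactly one read divided by the number of distinct positions. These definitions follow our understanding of the ENCODE quality metrics. The position key used here (chromosome, 5' start, strand) is an assumption, and thresholds for each metric should be taken from the current ENCODE documentation rather than from this sketch.

```python
from collections import Counter

def library_complexity(read_positions):
    """read_positions: iterable of (chrom, five_prime_pos, strand) for uniquely mapped reads.

    Returns (NRF, PBC1).
    """
    counts = Counter(read_positions)
    total = sum(counts.values())      # all reads
    distinct = len(counts)            # distinct positions (duplicates collapsed)
    singletons = sum(1 for c in counts.values() if c == 1)
    nrf = distinct / total
    pbc1 = singletons / distinct
    return nrf, pbc1
```

Both values approach 1 for a complex library; heavy PCR duplication stacks many reads on the same positions and pulls both down.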
Subject(s)
Chromatin Immunoprecipitation/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Animals , Genome/genetics , Genomics/methods , Guidelines as Topic , Histones/metabolism , Humans , Internet , Transcription Factors/metabolism
ABSTRACT
BACKGROUND: High-occupancy target (HOT) regions are compact genome loci occupied by many different transcription factors (TFs). HOT regions were initially defined in invertebrate model organisms, and we here show that they are a ubiquitous feature of the human gene-regulation landscape. RESULTS: We identified HOT regions by a comprehensive analysis of ChIP-seq data from 96 DNA-associated proteins in 5 human cell lines. Most HOT regions co-localize with RNA polymerase II binding sites, but many are not near the promoters of annotated genes. At HOT promoters, TF occupancy is strongly predictive of transcription preinitiation complex recruitment and moderately predictive of initiating Pol II recruitment, but only weakly predictive of elongating Pol II and RNA transcript abundance. TF occupancy varies quantitatively within human HOT regions; we used this variation to discover novel associations between TFs. The sequence motif associated with any given TF's direct DNA binding is somewhat predictive of its empirical occupancy, but a great deal of occupancy occurs at sites without the TF's motif, implying indirect recruitment by another TF whose motif is present. CONCLUSIONS: Mammalian HOT regions are regulatory hubs that integrate the signals from diverse regulatory pathways to quantitatively tune the promoter for RNA polymerase II recruitment.
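A minimal sketch of the kind of analysis described: tile the genome into fixed windows, count how many distinct factors have a ChIP-seq peak overlapping each window, and flag windows above a threshold as HOT. The window size, the threshold and the factor names are placeholders; the paper's actual definition of HOT regions is not given in this abstract.

```python
from collections import defaultdict

def hot_windows(peaks, window=2_000, min_factors=3):
    """peaks: iterable of (factor, chrom, start, end), half-open [start, end).

    Returns {(chrom, window_start): n_distinct_factors} for windows overlapped by
    at least `min_factors` distinct factors (both parameters are assumptions).
    """
    occupancy = defaultdict(set)
    for factor, chrom, start, end in peaks:
        # Record the factor in every window that [start, end) touches.
        for w in range(start // window, (end - 1) // window + 1):
            occupancy[(chrom, w * window)].add(factor)
    return {key: len(fs) for key, fs in occupancy.items() if len(fs) >= min_factors}
```

Counting distinct factors (a set per window) rather than raw peaks keeps one factor with several nearby peaks from inflating occupancy; the per-window factor sets could also feed a co-occupancy analysis of the kind the abstract uses to find associations between TFs.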
Subject(s)
RNA Polymerase II/metabolism , Transcription Factors/genetics , Cell Line , Chromatin Immunoprecipitation , Cluster Analysis , Genetic Loci , Genome, Human , HeLa Cells , Hep G2 Cells , Humans , K562 Cells , Promoter Regions, Genetic , Protein Binding , Sequence Analysis, DNA
ABSTRACT
ProPhylER (Protein Phylogeny and Evolutionary Rates) is a next-generation curated proteome resource that uses comparative sequence analysis to predict constraint and mutation impact for eukaryotic proteins. Its purpose is to inform any research program for which protein function and structure are relevant, by the predictive power of evolutionary constraint analyses. ProPhylER currently has nearly 9000 clusters of related proteins, including more than 200,000 sequences. It serves data via two interfaces. The "ProPhylER Interface" displays predictive analyses in sequence space; the "CrystalPainter" maps evolutionary constraints onto solved protein structures. Here we summarize ProPhylER's data content and analysis pipeline, demonstrate the use of ProPhylER's interfaces, and evaluate ProPhylER's unique regional analysis of evolutionary constraint. The high accuracy of ProPhylER's regional analysis complements the high resolution of its single-site analysis to effectively guide and inform structure-function investigations and predict the impact of polymorphisms.
Subject(s)
Databases, Protein , Eukaryota , Evolution, Molecular , Internet , Phylogeny , Proteins , Eukaryota/genetics , Eukaryota/metabolism , Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Structure-Activity Relationship , User-Computer Interface
ABSTRACT
Here, we demonstrate how comparative sequence analysis facilitates genome-wide base-pair-level interpretation of individual genetic variation and address two questions of importance for human personal genomics: first, whether an individual's functional variation comes mostly from noncoding or coding polymorphisms; and, second, whether population-specific or globally-present polymorphisms contribute more to functional variation in any given individual. Neither has been definitively answered by analyses of existing variation data because of a focus on coding polymorphisms, ascertainment biases in favor of common variation, and a lack of base-pair-level resolution for identifying functional variants. We resequenced 575 amplicons within 432 individuals at genomic sites enriched for evolutionary constraint and also analyzed variation within three published human genomes. We find that single-site measures of evolutionary constraint derived from mammalian multiple sequence alignments are strongly predictive of reductions in modern-day genetic diversity across a range of annotation categories and across the allele frequency spectrum from rare (<1%) to high frequency (>10% minor allele frequency). Furthermore, we show that putatively functional variation in an individual genome is dominated by polymorphisms that do not change protein sequence and that originate from our shared ancestral population and commonly segregate in human populations. These observations show that common, noncoding alleles contribute substantially to human phenotypes and that constraint-based analyses will be of value to identify phenotypically relevant variants in individual genomes.
Subject(s)
Alleles , Gene Frequency , Genetic Variation , Genome, Human , Sequence Alignment , Amino Acid Sequence , Animals , Base Sequence , Biological Evolution , Genetic Testing , Genome , Genomics , Humans , Mammals/genetics , Phenotype , Polymorphism, Genetic , Regulatory Sequences, Nucleic Acid
ABSTRACT
Technological advances hold the promise of rapidly catalyzing the discovery of pathogenic variants for genetic disease. However, this possibility is tempered by limitations in interpreting the functional consequences of genetic variation at candidate loci. Here, we present a systematic approach, grounded in physiologically relevant assays, to evaluate the mutational content (125 alleles) of the 14 genes associated with Bardet-Biedl syndrome (BBS). A combination of in vivo assays with subsequent in vitro validation suggests that a significant fraction of BBS-associated mutations have a dominant-negative mode of action. Moreover, we find that a subset of common alleles, previously considered to be benign, are, in fact, detrimental to protein function and can interact with strong rare alleles to modulate disease presentation. These data represent a comprehensive evaluation of genetic load in a multilocus disease. Importantly, superimposing these results onto human genetics data suggests a previously underappreciated complexity in disease architecture that might be shared among diverse clinical phenotypes.
Subject(s)
Bardet-Biedl Syndrome/genetics , Mutation , Alleles , Animals , Female , Gene Expression Regulation , Humans , Male , Models, Animal , Pedigree , Phenotype , Zebrafish/embryology , Zebrafish/genetics
ABSTRACT
We outline the features of the R package SparseSignatures and its application to determine the signatures contributing to mutation profiles of tumor samples. We describe installation details and illustrate a step-by-step approach to (1) prepare the data for signature analysis, (2) determine the optimal parameters, and (3) employ them to determine the signatures and related exposure levels in the point mutation dataset. For complete details on the use and execution of this protocol, please refer to Lal et al. (2021).
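Step (1), preparing the data, typically means turning point mutations into counts over the 96 trinucleotide-context categories (6 pyrimidine-referenced substitution classes times 16 flanking-base combinations). The protocol itself uses the R package; the Python sketch below only illustrates that bookkeeping, assuming each mutation is supplied as its reference trinucleotide plus the alternate base. None of these names are part of the SparseSignatures API.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def context_label(trinuc, alt):
    """Label like 'A[C>T]G', reverse-complementing so the mutated base is C or T."""
    trinuc, alt = trinuc.upper(), alt.upper()
    if trinuc[1] in "AG":  # purine reference: describe the mutation on the other strand
        trinuc = trinuc.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{trinuc[0]}[{trinuc[1]}>{alt}]{trinuc[2]}"

# All 96 categories in a fixed column order.
CATEGORIES = [
    f"{l}[{ref}>{alt}]{r}"
    for ref, alts in (("C", "AGT"), ("T", "ACG"))
    for alt in alts
    for l, r in product("ACGT", repeat=2)
]

def count_matrix(mutations_by_sample):
    """mutations_by_sample: dict sample -> list of (ref_trinucleotide, alt_base).

    Returns dict sample -> list of 96 counts in CATEGORIES order.
    """
    matrix = {}
    for sample, muts in mutations_by_sample.items():
        counts = Counter(context_label(t, a) for t, a in muts)
        matrix[sample] = [counts[c] for c in CATEGORIES]
    return matrix
```

Collapsing to the pyrimidine strand means a G>A change in `TGA` and a C>T change in `TCA` land in the same `T[C>T]A` column, since they are the same event read from opposite strands; the resulting samples-by-96 matrix is the usual input to signature extraction.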