Abstract
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of predicted protein-truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Subject(s)
Exome/genetics; Genetic Variation/genetics; DNA Mutational Analysis; Datasets as Topic; Humans; Phenotype; Proteome/genetics; Rare Diseases/genetics; Sample Size
Abstract
Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component of risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
Subject(s)
Alleles; Apolipoproteins A/genetics; Exome/genetics; Genetic Predisposition to Disease/genetics; Myocardial Infarction/genetics; Receptors, LDL/genetics; Age Factors; Age of Onset; Apolipoprotein A-V; Case-Control Studies; Cholesterol, LDL/blood; Coronary Artery Disease/genetics; Female; Genetics, Population; Heterozygote; Humans; Male; Middle Aged; Mutation/genetics; Myocardial Infarction/blood; National Heart, Lung, and Blood Institute (U.S.); Triglycerides/blood; United States
Abstract
Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
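The effect of mutational heterogeneity on gene-level significance can be illustrated with a minimal sketch. It is not MutSigCV: it assumes a simple Poisson model, and the gene length, cohort size, and mutation rates are hypothetical values chosen only for illustration. The point is that the same mutation count can look highly significant under a uniform background rate and unremarkable under a higher gene-specific rate.

```python
import math

def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term              # accumulate P(X = i)
        term *= lam / (i + 1)    # move on to P(X = i + 1)
    return max(0.0, 1.0 - cdf)

def gene_p_value(n_mut, length_bp, n_samples, rate_per_bp):
    """Chance of seeing at least n_mut mutations given a background rate."""
    expected = length_bp * n_samples * rate_per_bp
    return poisson_sf(n_mut, expected)

# Hypothetical long gene in a hypothetical cohort (illustrative numbers only).
LENGTH_BP = 100_000
N_SAMPLES = 100
OBSERVED = 45

# Assumption 1: one uniform background rate for every gene.
p_uniform = gene_p_value(OBSERVED, LENGTH_BP, N_SAMPLES, 2e-6)
# Assumption 2: a gene-specific rate three times higher (for example, a
# late-replicating, weakly expressed region).
p_local = gene_p_value(OBSERVED, LENGTH_BP, N_SAMPLES, 6e-6)

print(f"uniform background: p = {p_uniform:.2e}")
print(f"gene-specific background: p = {p_local:.2f}")
```

Under the uniform rate the gene would be reported as significantly mutated; under the gene-specific rate the observed count is below expectation, which is the kind of false positive the abstract attributes to ignoring heterogeneity.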
Subject(s)
Genetic Heterogeneity; Mutation/genetics; Neoplasms/genetics; Oncogenes/genetics; Artifacts; DNA Replication Timing; Exome/genetics; False Positive Reactions; Gene Expression; Genome, Human/genetics; Humans; Lung Neoplasms/genetics; Mutation Rate; Neoplasms/classification; Neoplasms/pathology; Neoplasms, Squamous Cell/genetics; Reproducibility of Results; Sample Size
Abstract
The naked mole rat (Heterocephalus glaber) is a strictly subterranean, extraordinarily long-lived eusocial mammal. Although it is the size of a mouse, its maximum lifespan exceeds 30 years, making this animal the longest-living rodent. Naked mole rats show negligible senescence, no age-related increase in mortality, and high fecundity until death. In addition to delayed ageing, they are resistant to both spontaneous cancer and experimentally induced tumorigenesis. Naked mole rats pose a challenge to the theories that link ageing, cancer and redox homeostasis. Although characterized by significant oxidative stress, the naked mole rat proteome does not show age-related susceptibility to oxidative damage or increased ubiquitination. Naked mole rats naturally reside in large colonies with a single breeding female, the 'queen', who suppresses the sexual maturity of her subordinates. They also live in full darkness, at low oxygen and high carbon dioxide concentrations, and are unable to sustain thermogenesis or feel certain types of pain. Here we report the sequencing and analysis of the naked mole rat genome, which reveals unique genome features and molecular adaptations consistent with cancer resistance, poikilothermy, hairlessness and insensitivity to low oxygen, and altered visual function, circadian rhythms and taste sensing. This information provides insights into the naked mole rat's exceptional longevity and ability to live in hostile conditions, in the dark and at low oxygen. The extreme traits of the naked mole rat, together with the reported genome and transcriptome information, offer opportunities for understanding ageing and advancing other areas of biological and biomedical research.
Subject(s)
Adaptation, Physiological/genetics; Genome/genetics; Longevity/genetics; Mole Rats/genetics; Mole Rats/physiology; Aging/genetics; Amino Acid Sequence; Animals; Body Temperature Regulation/genetics; Carbon Dioxide/analysis; Carbon Dioxide/metabolism; Circadian Rhythm/genetics; Darkness; Genes/genetics; Genomic Instability/genetics; Genomics; Humans; Ion Channels/genetics; Longevity/physiology; Male; Mitochondrial Proteins/genetics; Molecular Sequence Data; Mutagenesis/genetics; Oxygen/analysis; Oxygen/metabolism; Taste/genetics; Transcriptome/genetics; Uncoupling Protein 1; Visual Perception/genetics
Abstract
Osteosarcoma is the most common primary bone tumor, yet there have been no substantial advances in treatment or survival in three decades. We examined 59 tumor/normal pairs by whole-exome, whole-genome, and RNA-sequencing. Only the TP53 gene was mutated at significant frequency across all samples. The mean nonsilent somatic mutation rate was 1.2 mutations per megabase, and there was a median of 230 somatic rearrangements per tumor. Complex chains of rearrangements and localized hypermutation were detected in almost all cases. Given the intertumor heterogeneity, the extent of genomic instability, and the difficulty in acquiring a large sample size in a rare tumor, we used several methods to identify genomic events contributing to osteosarcoma survival. Pathway analysis, a heuristic analytic algorithm, a comparative oncology approach, and an shRNA screen converged on the phosphatidylinositol 3-kinase/mammalian target of rapamycin (PI3K/mTOR) pathway as a central vulnerability for therapeutic exploitation in osteosarcoma. Osteosarcoma cell lines are responsive to pharmacologic and genetic inhibition of the PI3K/mTOR pathway both in vitro and in vivo.
Subject(s)
Bone Neoplasms/metabolism; Genome, Human; Osteosarcoma/metabolism; Phosphatidylinositol 3-Kinases/metabolism; TOR Serine-Threonine Kinases/metabolism; Bone Neoplasms/genetics; Bone Neoplasms/pathology; Cell Line, Tumor; Cell Proliferation; Genetic Heterogeneity; Germ-Line Mutation; Humans; Osteosarcoma/genetics; Osteosarcoma/pathology; Tumor Suppressor Protein p53/genetics
Abstract
Large-scale population sequencing studies provide a complete picture of human genetic variation within the studied populations. A key challenge is to identify, among the myriad alleles, those variants that have an effect on molecular function, phenotypes, and reproductive fitness. Most non-neutral variation consists of deleterious alleles segregating at low population frequency due to incessant mutation. To date, studies characterizing selection against deleterious alleles have been based on allele frequency (testing for a relative excess of rare alleles) or ratio of polymorphism to divergence (testing for a relative increase in the number of polymorphic alleles). Here, starting from Maruyama's theoretical prediction (Maruyama T (1974), Am J Hum Genet USA 6:669-673) that a (slightly) deleterious allele is, on average, younger than a neutral allele segregating at the same frequency, we devised an approach to characterize selection based on allelic age. Unlike existing methods, it compares sets of neutral and deleterious sequence variants at the same allele frequency. When applied to human sequence data from the Genome of the Netherlands Project, our approach distinguishes low-frequency coding non-synonymous variants from synonymous and non-coding variants at the same allele frequency and discriminates between sets of variants independently predicted to be benign or damaging for protein structure and function. The results confirm the abundance of slightly deleterious coding variation in humans.
Subject(s)
Alleles; Genetic Drift; Genetic Variation; Genetics, Population; Selection, Genetic; Evolution, Molecular; Gene Frequency; Genome, Human; Humans; Models, Theoretical; Sequence Deletion
Abstract
Assessing the significance of novel genetic variants revealed by DNA sequencing is a major challenge to the integration of genomic techniques with medical practice. Many variants remain difficult to classify by traditional genetic methods. Computational methods have been developed that could contribute to classifying these variants, but they have not been properly validated and are generally not considered mature enough to be used effectively in a clinical setting. We developed a computational method for predicting the effects of missense variants detected in patients with hypertrophic cardiomyopathy (HCM). We used a curated clinical data set of 74 missense variants in six genes associated with HCM to train and validate an automated predictor. The predictor is based on support vector regression and uses phylogenetic and structural features specific to genes involved in HCM. Ten-fold cross validation estimated our predictor's sensitivity at 94% (95% confidence interval: 83%-98%) and specificity at 89% (95% confidence interval: 72%-100%). This corresponds to an odds ratio of 10 for a prediction of pathogenic (95% confidence interval: 4.0-infinity), or an odds ratio of 9.9 for a prediction of benign (95% confidence interval: 4.6-21). Coverage (proportion of variants for which a prediction was made) was 57% (95% confidence interval: 49%-64%). This performance exceeds that of existing methods that are not specifically designed for HCM. The accuracy of this predictor provides support for the clinical use of automated predictions alongside family segregation and population frequency data in the interpretation of new missense variants and suggests future development of similar tools for other diseases.
Subject(s)
Cardiomyopathy, Hypertrophic/genetics; Computational Biology; Genetic Variation/genetics; Mutation, Missense/genetics; Nuclear Proteins/genetics; Genetic Predisposition to Disease; Humans
Abstract
MOTIVATION: Proteomics presents the opportunity to provide novel insights about the global biochemical state of a tissue. However, a significant problem with current methods is that shotgun proteomics has limited success at detecting many low-abundance proteins, such as transcription factors, in complex mixtures of cells and tissues. The ability to assay for these proteins in the context of the entire proteome would be useful in many areas of experimental biology. RESULTS: We used network-based inference in an approach named SNIPE (Software for Network Inference of Proteomics Experiments) that selectively highlights proteins that are more likely to be active but are otherwise undetectable in a shotgun proteomic sample. SNIPE integrates spectral counts from paired case-control samples over a network neighbourhood and assesses the statistical likelihood of enrichment by a permutation test. As an initial application, SNIPE was able to select several proteins required for early murine tooth development. Multiple lines of additional experimental evidence confirm that SNIPE can uncover previously unreported transcription factors in this system. We conclude that SNIPE can enhance the utility of shotgun proteomics data to facilitate the study of poorly detected proteins in complex mixtures. AVAILABILITY AND IMPLEMENTATION: An implementation for the R statistical computing environment named snipeR has been made freely available at http://genetics.bwh.harvard.edu/snipe/. CONTACT: ssunyaev@rics.bwh.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
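The idea of integrating case-control spectral counts over a network neighbourhood and testing enrichment by permutation can be sketched as follows. This is an illustrative sketch, not the snipeR implementation: the scoring statistic (sum of case-minus-control counts), the null model (random protein sets of the same size), and all function and protein names are assumptions, since the abstract does not specify them.

```python
import random

def neighbourhood_score(diff, neighbours):
    """Sum of case-minus-control spectral counts over a set of proteins."""
    return sum(diff.get(p, 0) for p in neighbours)

def permutation_p_value(diff, universe, neighbours, n_perm=2000, seed=0):
    """One-sided permutation p-value for enrichment of a neighbourhood.

    diff:       dict mapping protein -> case count minus control count
    universe:   list of all proteins in the network (undetected ones count as 0)
    neighbours: network neighbours of the protein of interest
    """
    rng = random.Random(seed)
    values = [diff.get(p, 0) for p in universe]
    observed = neighbourhood_score(diff, neighbours)
    k = len(neighbours)
    hits = 0
    for _ in range(n_perm):
        # Assumed null: the neighbourhood is a random set of k proteins.
        if sum(rng.sample(values, k)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy network of 50 proteins; only P0..P4 show a case excess.
universe = [f"P{i}" for i in range(50)]
diff = {f"P{i}": 10 for i in range(5)}

p_enriched = permutation_p_value(diff, universe, universe[:5])
p_quiet = permutation_p_value(diff, universe, universe[45:])
print(p_enriched, p_quiet)
```

A protein whose neighbours all show a case excess receives a small p-value even if the protein itself was never detected, which is the behaviour the abstract describes; a neighbourhood with no signal receives a p-value of 1.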
Subject(s)
Algorithms; Proteome/analysis; Proteomics/methods; Software; Animals; Computational Biology/methods; Mice; Tooth/metabolism
Abstract
Along with the traditional effects of aging and carcinogen exposure, inherited DNA variation contributes substantially to cancer risk. The extraordinary progress made in the analysis of common variation with GWAS methodology does not provide sufficient resolution to understand rare variation. To address the missing classification of rare germline variation, we assembled a dataset of whole-exome sequences from >2000 patients (selected cases who tested negative for candidate genes, and unselected cases) with different types of cancer (breast cancer, colon cancer, and cutaneous and ocular melanomas) matched to more than 7000 non-cancer controls. We analyzed germline variation in known cancer-predisposing genes to identify common properties of disease-associated DNA variation and to aid future searches for new cancer susceptibility genes. Cancer-predisposing genes were divided into non-overlapping classes according to the mode of inheritance of the related cancer syndrome or known tumor suppressor activity. Of all classes, only genes linked to dominant syndromes showed significant enrichment of rare germline variants in cases. Separate analysis of protein-truncating and missense variation in this list of genes confirmed a significant excess of protein-truncating variants in cases only in loss-of-function-tolerant genes (pLI < 0.1), while ultra-rare missense variants were significantly overrepresented in cases only in constrained genes (pLI > 0.9). In addition to the findings in genetically enriched cases, we observed a significant burden of rare variation in unselected cases, suggesting a substantial role for inherited variation even in relatively late cancer manifestation. Taken together, our findings provide a reference for the distribution and types of DNA variation underlying inherited predisposition to some common cancer types.
Subject(s)
Genetic Predisposition to Disease; Germ-Line Mutation/genetics; Neoplasms/genetics; Case-Control Studies; DNA/genetics; Humans
Abstract
BACKGROUND: MicroRNAs (miRNAs), present in most metazoans, are small non-coding RNAs that control gene expression by negatively regulating translation through binding to the 3'UTR of mRNA transcripts. Previously, experimental and computational methods were used to construct miRNA gene repositories in accordance with careful submission guidelines. RESULTS: We developed an algorithm, miRNAminer, for searching for homologous conserved miRNA genes in several animal species. Given a search query, candidate homologs from different species are tested for known miRNA properties, such as secondary structure, energy, alignment and conservation, in order to assess their fidelity. Applying miRNAminer to seven mammalian species, we identified several hundred high-confidence homologous miRNAs, increasing the total collection of miRBase miRNAs in these species by more than 50%. miRNAminer uses stringent criteria and exhibits high sensitivity and specificity. CONCLUSION: We present miRNAminer, the first web server for homologous miRNA gene search in animals. miRNAminer can be used to identify conserved homologous miRNA genes and can also be used prior to depositing miRNAs in public databases. miRNAminer is available at http://pag.csail.mit.edu/mirnaminer.
Subject(s)
MicroRNAs/genetics; Sequence Homology, Nucleic Acid; Animals; Base Sequence; Cattle; Databases, Genetic/trends; Dogs; Humans; Internet/trends; Mice; MicroRNAs/analysis; Molecular Sequence Data; Opossums; Pan troglodytes; Rats
Abstract
Embryoid bodies (EBs) can serve as a system for evaluating pluripotency, cellular differentiation, and tissue morphogenesis. In this study, we use EBs derived from mouse embryonic stem cells (mESCs) and human amniocyte-derived induced pluripotent stem cells (hAdiPSCs) as a model for ovarian granulosa cell (GC) development and steroidogenic cell commitment. We demonstrated that spontaneously differentiated murine EBs (mEBs) and human EBs (hEBs) displayed ovarian GC markers, such as aromatase (CYP19A1), FOXL2, AMHR2, FSHR, and GJA1. Comparative microarray analysis identified both shared and unique gene expression between mEBs and the maturing mouse ovary. Gene sets related to gonadogenesis, lipid metabolism, and ovarian development were significantly overrepresented in EBs. Of the 29 genes, 15 that were differentially regulated in steroidogenic mEBs displayed temporal expression changes between embryonic, postnatal, and mature ovarian tissues by polymerase chain reaction. Importantly, both mEBs and hEBs were capable of gonadotropin-responsive estradiol (E2) synthesis in vitro (217-759 pg/mL). Live fluorescence-activated cell sorting-sorted AMHR2+ granulosa-like cells from mEBs continued to produce E2 after purification (15.3 pg/mL) and secreted significantly more E2 than AMHR2- cells (8.6 pg/mL, P < .05). We conclude that spontaneously differentiated EBs of both mESC and hAdiPSC origin can serve as a biologically relevant model for ovarian GC differentiation and steroidogenic cell commitment. These cells should be further investigated for therapeutic uses, such as stem cell-based hormone replacement therapy and in vitro maturation of oocytes.
Subject(s)
Embryoid Bodies/physiology; Granulosa Cells/physiology; Induced Pluripotent Stem Cells/physiology; Steroids/biosynthesis; Animals; Embryoid Bodies/metabolism; Embryonic Stem Cells/metabolism; Embryonic Stem Cells/physiology; Female; Gene Expression; Granulosa Cells/metabolism; Humans; Induced Pluripotent Stem Cells/metabolism; Mice
Abstract
Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of class I HLA-A, B and C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these 'hotspot' sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.
Subject(s)
Histocompatibility Antigens Class I/genetics; Mutation/genetics; Neoplasms/genetics; Computational Biology; DNA Mutational Analysis; Databases, Genetic; Humans; Software
Abstract
To explore restoration of ovarian function using epigenetically-related, induced pluripotent stem cells (iPSCs), we functionally evaluated the epigenetic memory of novel iPSC lines, derived from mouse and human ovarian granulosa cells (GCs) using c-Myc, Klf4, Sox2 and Oct4 retroviral vectors. The stem cell identity of the mouse and human GC-derived iPSCs (mGriPSCs, hGriPSCs) was verified by demonstrating embryonic stem cell (ESC) antigen expression using immunocytochemistry and RT-PCR analysis, as well as formation of embryoid bodies (EBs) and teratomas that are capable of differentiating into cells from all three germ layers. GriPSCs' gene expression profiles associate more closely with those of ESCs than of the originating GCs as demonstrated by genome-wide analysis of mRNA and microRNA. A comparative analysis of EBs generated from three different mouse cell lines (mGriPSCs; fibroblast-derived iPSC, mFiPSCs; G4 embryonic stem cells, G4 mESCs) revealed that differentiated mGriPSC-EBs synthesize 10-fold more estradiol (E2) than either differentiated FiPSC- or mESC-EBs under identical culture conditions. By contrast, mESC-EBs primarily synthesize progesterone (P4) and FiPSC-EBs produce neither E2 nor P4. Differentiated mGriPSC-EBs also express ovarian markers (AMHR, FSHR, Cyp19a1, ER and Inha) as well as markers of early gametogenesis (Mvh, Dazl, Gdf9, Boule and Zp1) more frequently than EBs of the other cell lines. These results provide evidence of preferential homotypic differentiation of mGriPSCs into ovarian cell types. Collectively, our data support the hypothesis that generating iPSCs from the desired tissue type may prove advantageous due to the iPSCs' epigenetic memory.
Subject(s)
Epigenesis, Genetic; Estradiol/metabolism; Granulosa Cells/cytology; Induced Pluripotent Stem Cells/cytology; Induced Pluripotent Stem Cells/metabolism; Progesterone/metabolism; Animals; Cell Differentiation; Cells, Cultured; Embryoid Bodies/cytology; Embryoid Bodies/immunology; Embryoid Bodies/metabolism; Embryonic Stem Cells/cytology; Embryonic Stem Cells/immunology; Embryonic Stem Cells/metabolism; Female; Germ Layers/cytology; Germ Layers/immunology; Germ Layers/metabolism; Humans; Induced Pluripotent Stem Cells/immunology; Kruppel-Like Factor 4; Kruppel-Like Transcription Factors/genetics; Mice; Octamer Transcription Factor-3/genetics; Proto-Oncogene Proteins c-myc/genetics; Retroviridae/genetics; Retroviridae/immunology; SOXB1 Transcription Factors/genetics
Abstract
Pediatric Ewing sarcoma is characterized by the expression of chimeric fusions of EWS and ETS family transcription factors, representing a paradigm for studying cancers driven by transcription factor rearrangements. In this study, we describe the somatic landscape of pediatric Ewing sarcoma. These tumors are among the most genetically normal cancers characterized to date, with only EWS-ETS rearrangements identified in the majority of tumors. STAG2 loss, however, is present in more than 15% of Ewing sarcoma tumors; occurs by point mutation, rearrangement, and likely nongenetic mechanisms; and is associated with disease dissemination. Perhaps the most striking finding is the paucity of mutations in immediately targetable signal transduction pathways, highlighting the need for new therapeutic approaches to target EWS-ETS fusions in this disease. SIGNIFICANCE: We performed next-generation sequencing of Ewing sarcoma, a pediatric cancer involving bone, characterized by expression of EWS-ETS fusions. We found remarkably few mutations. However, we discovered that loss of STAG2 expression occurs in 15% of tumors and is associated with metastatic disease, suggesting a potential genetic vulnerability in Ewing sarcoma.
Subject(s)
Antigens, Nuclear/genetics; Bone Neoplasms/genetics; Sarcoma, Ewing/genetics; Antigens, Nuclear/metabolism; Bone Neoplasms/metabolism; Cell Cycle Proteins; Cell Line, Tumor; Child; DNA, Neoplasm/genetics; Female; Gene Rearrangement; Genomics; Humans; Male; Mutation; Sarcoma, Ewing/metabolism; Sequence Analysis, DNA
Abstract
Translating whole-exome sequencing (WES) for prospective clinical use may have an impact on the care of patients with cancer; however, multiple innovations are necessary for clinical implementation. These include rapid and robust WES of DNA derived from formalin-fixed, paraffin-embedded tumor tissue, analytical output similar to data from frozen samples and clinical interpretation of WES data for prospective use. Here, we describe a prospective clinical WES platform for archival formalin-fixed, paraffin-embedded tumor samples. The platform employs computational methods for effective clinical analysis and interpretation of WES data. When applied retrospectively to 511 exomes, the interpretative framework revealed a 'long tail' of somatic alterations in clinically important genes. Prospective application of this approach identified clinically relevant alterations in 15 out of 16 patients. In one patient, previously undetected findings guided clinical trial enrollment, leading to an objective clinical response. Overall, this methodology may inform the widespread implementation of precision cancer medicine.
Subject(s)
Algorithms; Exome/genetics; Neoplasms/genetics; Precision Medicine/methods; Sequence Analysis, DNA/methods; Computational Biology/methods; Databases, Genetic; HEK293 Cells; Humans; Massachusetts; Mutagenesis, Site-Directed; Neoplasms/pathology; Precision Medicine/trends; Statistics, Nonparametric
Abstract
Advances in next-generation sequencing technology have enabled systematic exploration of the contribution of rare variation to Mendelian and complex diseases. Although it is well known that population stratification can generate spurious associations with common alleles, its impact on rare variant association methods remains poorly understood. Here, we performed exhaustive coalescent simulations with demographic parameters calibrated from exome sequence data to evaluate the performance of nine rare variant association methods in the presence of fine-scale population structure. We find that all methods have an inflated spurious association rate for parameter values that are consistent with levels of differentiation typical of European populations. For example, at a nominal significance level of 5%, some test statistics have a spurious association rate as high as 40%. Finally, we empirically assess the impact of population stratification in a large data set of 4,298 European American exomes. Our results have important implications for the design, analysis, and interpretation of rare variant genome-wide association studies.
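How population structure alone can inflate a rare-variant test can be seen in a toy simulation. This is not the coalescent framework used in the study: the two subpopulations, carrier frequencies, and sample mixes below are hypothetical and deliberately exaggerated, and the test is a plain two-proportion z-test on carrier counts rather than any of the nine methods evaluated. Genotype has no effect on phenotype anywhere in the simulation, so every rejection is spurious.

```python
import math
import random

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if var == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))

def count_carriers(rng, n, freq):
    """Number of rare-variant carriers among n individuals."""
    return sum(rng.random() < freq for _ in range(n))

def spurious_rate(case_mix, control_mix, freqs, n_rep=200, alpha=0.05, seed=1):
    """Fraction of null replicates in which the test rejects at level alpha.

    case_mix / control_mix: individuals sampled from each subpopulation
    freqs:                  carrier frequency in each subpopulation
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        x_case = sum(count_carriers(rng, n, f) for n, f in zip(case_mix, freqs))
        x_ctrl = sum(count_carriers(rng, n, f) for n, f in zip(control_mix, freqs))
        p = two_proportion_p(x_case, sum(case_mix), x_ctrl, sum(control_mix))
        rejections += p < alpha
    return rejections / n_rep

FREQS = (0.05, 0.01)  # hypothetical carrier frequencies in subpopulations A and B

# Cases drawn mostly from A, controls mostly from B: ancestry is confounded
# with case status.
stratified = spurious_rate((800, 200), (200, 800), FREQS)
# Same ancestry mix in cases and controls: no confounding.
balanced = spurious_rate((500, 500), (500, 500), FREQS)

print(f"stratified design: {stratified:.2f}, balanced design: {balanced:.2f}")
```

With matched ancestry the rejection rate stays near the nominal 5%, while the confounded design rejects far more often. The magnitude here reflects the exaggerated toy parameters and should not be compared with the 40% figure reported in the abstract.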
Subject(s)
Exome; Genetic Variation; Models, Genetic; Alleles; Gene Frequency; Genetics, Population; Genome-Wide Association Study; High-Throughput Nucleotide Sequencing; Humans; Principal Component Analysis; Sequence Analysis, DNA; White People
Abstract
Recent studies indicate that a subclass of APOBEC cytidine deaminases, which convert cytosine to uracil during RNA editing and retrovirus or retrotransposon restriction, may induce mutation clusters in human tumors. We show here that throughout cancer genomes APOBEC-mediated mutagenesis is pervasive and correlates with APOBEC mRNA levels. Mutation clusters in whole-genome and exome data sets conformed to the stringent criteria indicative of an APOBEC mutation pattern. Applying these criteria to 954,247 mutations in 2,680 exomes from 14 cancer types, mostly from The Cancer Genome Atlas (TCGA), showed a significant presence of the APOBEC mutation pattern in bladder, cervical, breast, head and neck, and lung cancers, reaching 68% of all mutations in some samples. Within breast cancer, the HER2-enriched subtype was clearly enriched for tumors with the APOBEC mutation pattern, suggesting that this type of mutagenesis is functionally linked with cancer development. The APOBEC mutation pattern also extended to cancer-associated genes, implying that ubiquitous APOBEC-mediated mutagenesis is carcinogenic.
Subject(s)
Cytidine Deaminase/genetics; Mutagenesis; Neoplasms/genetics; APOBEC-1 Deaminase; Breast Neoplasms; Cell Transformation, Neoplastic/genetics; Exome; Female; Genome, Human; Genomics; Humans; Male; Mutation; RNA, Messenger/genetics; Receptor, ErbB-2/genetics
Abstract
The diagnosed incidence of small intestine neuroendocrine tumors (SI-NETs) is increasing, and the underlying genomic mechanisms have not yet been defined. Using exome- and genome-sequence analysis of SI-NETs, we identified recurrent somatic mutations and deletions in CDKN1B, the cyclin-dependent kinase inhibitor gene, which encodes p27. We observed frameshift mutations of CDKN1B in 14 of 180 SI-NETs, and we detected hemizygous deletions encompassing CDKN1B in 7 out of 50 SI-NETs, nominating p27 as a tumor suppressor and implicating cell cycle dysregulation in the etiology of SI-NETs.
Subject(s)
Cyclin-Dependent Kinase Inhibitor p27/genetics; Intestinal Neoplasms/genetics; Mutation; Neuroendocrine Tumors/genetics; Cell Cycle/genetics; Cohort Studies; Genes, Tumor Suppressor; Genetic Predisposition to Disease; Humans; Intestinal Neoplasms/epidemiology; Intestinal Neoplasms/pathology; Intestine, Small/pathology; Neuroendocrine Tumors/epidemiology; Neuroendocrine Tumors/pathology; Sequence Analysis, DNA
Abstract
Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 affected individuals (cases) using a combination of whole-exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per Mb (0.48 nonsilent) and notably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, and an additional 7.1% had focal deletions), MYCN (1.7%, causing a recurrent p.Pro44Leu alteration) and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1 and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies that rely on frequently altered oncogenic drivers.