ABSTRACT
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals1. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Subject(s)
Genome, Human , Genomics , Humans , Diploidy , Genome, Human/genetics , Haplotypes/genetics , Sequence Analysis, DNA , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation
ABSTRACT
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
ABSTRACT
The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.
Subject(s)
Genome, Human , Genomics , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
ABSTRACT
We report the results of whole-genome and transcriptome sequencing of tumor and adjacent normal tissue samples from 17 patients with non-small cell lung carcinoma (NSCLC). We identified 3,726 point mutations and more than 90 indels in the coding sequence, with an average mutation frequency more than 10-fold higher in smokers than in never-smokers. Novel alterations in genes involved in chromatin modification and DNA repair pathways were identified, along with DACH1, CFTR, RELN, ABCB5, and HGF. Deep digital sequencing revealed diverse clonality patterns in both never-smokers and smokers. All validated EGFR and KRAS mutations were present in the founder clones, suggesting possible roles in cancer initiation. Analysis revealed 14 fusions, including ROS1 and ALK, as well as novel metabolic enzymes. Cell-cycle and JAK-STAT pathways are significantly altered in lung cancer, along with perturbations in 54 genes that are potentially targetable with currently available drugs.
Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Smoking/genetics , Smoking/pathology , Carcinoma, Non-Small-Cell Lung/therapy , Chromosome Aberrations , Female , Gene Expression Profiling , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Lung Neoplasms/therapy , Male , Molecular Targeted Therapy , Point Mutation , Reelin Protein
ABSTRACT
Most mutations in cancer genomes are thought to be acquired after the initiating event, which may cause genomic instability and drive clonal evolution. However, for acute myeloid leukemia (AML), normal karyotypes are common, and genomic instability is unusual. To better understand clonal evolution in AML, we sequenced the genomes of M3-AML samples with a known initiating event (PML-RARA) versus the genomes of normal karyotype M1-AML samples and the exomes of hematopoietic stem/progenitor cells (HSPCs) from healthy people. Collectively, the data suggest that most of the mutations found in AML genomes are actually random events that occurred in HSPCs before they acquired the initiating mutation; the mutational history of that cell is "captured" as the clone expands. In many cases, only one or two additional, cooperating mutations are needed to generate the malignant founding clone. Cells from the founding clone can acquire additional cooperating mutations, yielding subclones that can contribute to disease progression and/or relapse.
Subject(s)
Clonal Evolution , Leukemia, Myeloid, Acute/genetics , Mutation , Adult , Aged , DNA Mutational Analysis , Disease Progression , Female , Genome-Wide Association Study , Hematopoietic Stem Cells/metabolism , Humans , Leukemia, Myeloid, Acute/physiopathology , Male , Middle Aged , Oncogene Proteins, Fusion/genetics , Recurrence , Skin/metabolism , Young Adult
ABSTRACT
Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation by using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels; some of these genes have not been previously associated with lipid levels when using rare coding variation from population-based samples. We prioritize 32 genes in array-based genome-wide association study (GWAS) loci based on aggregations of rare coding variants; three (EVI5, SH2B3, and PLIN1) had no prior association of rare coding variants with lipid levels. Most of our associated genes showed evidence of association among multiple ancestries. Finally, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes and for genes closest to GWAS index single-nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.
Subject(s)
Exome , Genetic Variation , Genome-Wide Association Study , Lipids/blood , Open Reading Frames , Alleles , Blood Glucose/genetics , Case-Control Studies , Computational Biology/methods , Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Genetic Predisposition to Disease , Genetics, Population , Genome-Wide Association Study/methods , Humans , Lipid Metabolism/genetics , Liver/metabolism , Liver/pathology , Molecular Sequence Annotation , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Studies of Y Chromosome evolution have focused primarily on gene decay, a consequence of suppression of crossing-over with the X Chromosome. Here, we provide evidence that suppression of X-Y crossing-over unleashed a second dynamic: selfish X-Y arms races that reshaped the sex chromosomes in mammals as different as cattle, mice, and men. Using super-resolution sequencing, we explore the Y Chromosome of Bos taurus (bull) and find it to be dominated by massive, lineage-specific amplification of testis-expressed gene families, making it the most gene-dense Y Chromosome sequenced to date. As in mice, an X-linked homolog of a bull Y-amplified gene has become testis-specific and amplified. This evolutionary convergence implies that lineage-specific X-Y coevolution through gene amplification, and the selfish forces underlying this phenomenon, were dominatingly powerful among diverse mammalian lineages. Together with Y gene decay, X-Y arms races molded mammalian sex chromosomes and influenced the course of mammalian evolution.
Subject(s)
Sequence Analysis, DNA/veterinary , X Chromosome/genetics , Y Chromosome/genetics , Animals , Cattle , Cell Lineage , Crossing Over, Genetic , Evolution, Molecular , Female , Gene Amplification , Humans , Male , Mice , Organ Specificity , Testis/chemistry
ABSTRACT
The Alzheimer's Disease Sequencing Project (ADSP) undertook whole exome sequencing in 5,740 late-onset Alzheimer disease (AD) cases and 5,096 cognitively normal controls primarily of European ancestry (EA), among whom 218 cases and 177 controls were Caribbean Hispanic (CH). An age-, sex- and APOE based risk score and family history were used to select cases most likely to harbor novel AD risk variants and controls least likely to develop AD by age 85 years. We tested ~1.5 million single nucleotide variants (SNVs) and 50,000 insertion-deletion polymorphisms (indels) for association to AD, using multiple models considering individual variants as well as gene-based tests aggregating rare, predicted functional, and loss of function variants. Sixteen single variants and 19 genes that met criteria for significant or suggestive associations after multiple-testing correction were evaluated for replication in four independent samples; three with whole exome sequencing (2,778 cases, 7,262 controls) and one with genome-wide genotyping imputed to the Haplotype Reference Consortium panel (9,343 cases, 11,527 controls). The top findings in the discovery sample were also followed up in the ADSP whole-genome sequenced family-based dataset (197 members of 42 EA families and 501 members of 157 CH families). We identified novel and predicted functional genetic variants in genes previously associated with AD. We also detected associations in three novel genes: IGHG3 (p = 9.8 × 10⁻⁷), an immunoglobulin gene whose antibodies interact with β-amyloid, a long non-coding RNA AC099552.4 (p = 1.2 × 10⁻⁷), and a zinc-finger protein ZNF655 (gene-based p = 5.0 × 10⁻⁶). The latter two suggest an important role for transcriptional regulation in AD pathogenesis.
Subject(s)
Alzheimer Disease/genetics , Alzheimer Disease/immunology , Exome Sequencing , Gene Expression Regulation/genetics , Immunity/genetics , Transcription, Genetic/genetics , Aged , Aged, 80 and over , Alzheimer Disease/pathology , Amyloid beta-Peptides/immunology , Apolipoproteins E/genetics , Female , Haplotypes/genetics , Humans , Immunoglobulin G , Kruppel-Like Transcription Factors/genetics , Male , Polymorphism, Genetic/genetics , RNA, Long Noncoding/genetics
ABSTRACT
Members of the nuclear factor-κB (NF-κB) family of transcriptional regulators are central mediators of the cellular inflammatory response. Although constitutive NF-κB signalling is present in most human tumours, mutations in pathway members are rare, complicating efforts to understand and block aberrant NF-κB activity in cancer. Here we show that more than two-thirds of supratentorial ependymomas contain oncogenic fusions between RELA, the principal effector of canonical NF-κB signalling, and an uncharacterized gene, C11orf95. In each case, C11orf95-RELA fusions resulted from chromothripsis involving chromosome 11q13.1. C11orf95-RELA fusion proteins translocated spontaneously to the nucleus to activate NF-κB target genes, and rapidly transformed neural stem cells (the cell of origin of ependymoma) to form these tumours in mice. Our data identify a highly recurrent genetic alteration of RELA in human cancer, and the C11orf95-RELA fusion protein as a potential therapeutic target in supratentorial ependymoma.
Subject(s)
Cell Transformation, Neoplastic , Ependymoma/genetics , Ependymoma/metabolism , NF-kappa B/metabolism , Proteins/metabolism , Signal Transduction , Transcription Factor RelA/metabolism , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Animals , Base Sequence , Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Brain Neoplasms/pathology , Cell Line , Cell Nucleus/metabolism , Cell Transformation, Neoplastic/genetics , Chromosomes, Human, Pair 11/genetics , Ependymoma/pathology , Female , Humans , Mice , Models, Genetic , Molecular Sequence Data , NF-kappa B/genetics , Neural Stem Cells/metabolism , Neural Stem Cells/pathology , Oncogene Proteins, Fusion/genetics , Oncogene Proteins, Fusion/metabolism , Phosphoproteins/genetics , Phosphoproteins/metabolism , Proteins/genetics , Transcription Factor RelA/genetics , Transcription Factors , Translocation, Genetic/genetics , YAP-Signaling Proteins
ABSTRACT
Retinoblastoma is an aggressive childhood cancer of the developing retina that is initiated by the biallelic loss of RB1. Tumours progress very quickly following RB1 inactivation but the underlying mechanism is not known. Here we show that the retinoblastoma genome is stable, but that multiple cancer pathways can be epigenetically deregulated. To identify the mutations that cooperate with RB1 loss, we performed whole-genome sequencing of retinoblastomas. The overall mutational rate was very low; RB1 was the only known cancer gene mutated. We then evaluated the role of RB1 in genome stability and considered non-genetic mechanisms of cancer pathway deregulation. For example, the proto-oncogene SYK is upregulated in retinoblastoma and is required for tumour cell survival. Targeting SYK with a small-molecule inhibitor induced retinoblastoma tumour cell death in vitro and in vivo. Thus, retinoblastomas may develop quickly as a result of the epigenetic deregulation of key cancer pathways as a direct or indirect result of RB1 loss.
Subject(s)
Epigenesis, Genetic/genetics , Genomics , Molecular Targeted Therapy , Protein Kinase Inhibitors/pharmacology , Retinoblastoma/drug therapy , Retinoblastoma/genetics , Aneuploidy , Animals , Cell Death/drug effects , Cell Line , Cell Survival/drug effects , Chromosomal Instability/genetics , Gene Expression Regulation, Neoplastic , Genes, Retinoblastoma/genetics , Humans , Intracellular Signaling Peptides and Proteins/antagonists & inhibitors , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism , Mice , Mutation/genetics , Protein Kinase Inhibitors/therapeutic use , Protein-Tyrosine Kinases/antagonists & inhibitors , Protein-Tyrosine Kinases/genetics , Protein-Tyrosine Kinases/metabolism , Proto-Oncogene Mas , Retinoblastoma/pathology , Retinoblastoma Protein/deficiency , Retinoblastoma Protein/genetics , Sequence Analysis, DNA , Syk Kinase , Xenograft Model Antitumor Assays
ABSTRACT
Most patients with acute myeloid leukaemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level. To determine the mutational spectrum associated with relapse, we sequenced the primary tumour and relapse genomes from eight AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to define clonality and clonal evolution patterns precisely at relapse. In addition to discovering novel, recurrently mutated genes (for example, WAC, SMC3, DIS3, DDX41 and DAXX) in AML, we also found two major clonal evolution patterns during AML relapse: (1) the founding clone in the primary tumour gained mutations and evolved into the relapse clone, or (2) a subclone of the founding clone survived initial therapy, gained additional mutations and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific versus primary tumour mutations in all eight cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped, in part, by the chemotherapy that the patients receive to establish and maintain remissions.
Subject(s)
Clonal Evolution/genetics , Genome, Human/genetics , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , Antineoplastic Agents/adverse effects , Antineoplastic Agents/therapeutic use , Clone Cells/drug effects , Clone Cells/metabolism , Clone Cells/pathology , DNA Damage/drug effects , DNA Mutational Analysis , Genes, Neoplasm/genetics , Genome, Human/drug effects , High-Throughput Nucleotide Sequencing , Humans , Leukemia, Myeloid, Acute/drug therapy , Mutagenesis/drug effects , Mutagenesis/genetics , Recurrence , Reproducibility of Results
ABSTRACT
Medulloblastoma is a malignant childhood brain tumour comprising four discrete subgroups. Here, to identify mutations that drive medulloblastoma, we sequenced the entire genomes of 37 tumours and matched normal blood. One-hundred and thirty-six genes harbouring somatic mutations in this discovery set were sequenced in an additional 56 medulloblastomas. Recurrent mutations were detected in 41 genes not yet implicated in medulloblastoma; several target distinct components of the epigenetic machinery in different disease subgroups, such as regulators of H3K27 and H3K4 trimethylation in subgroups 3 and 4 (for example, KDM6A and ZMYM3), and CTNNB1-associated chromatin re-modellers in WNT-subgroup tumours (for example, SMARCA4 and CREBBP). Modelling of mutations in mouse lower rhombic lip progenitors that generate WNT-subgroup tumours identified genes that maintain this cell lineage (DDX3X), as well as mutated genes that initiate (CDH1) or cooperate (PIK3CA) in tumorigenesis. These data provide important new insights into the pathogenesis of medulloblastoma subgroups and highlight targets for therapeutic development.
Subject(s)
Cerebellar Neoplasms/classification , Cerebellar Neoplasms/genetics , Medulloblastoma/classification , Medulloblastoma/genetics , Mutation/genetics , Animals , Antigens, CD , CREB-Binding Protein/genetics , Cadherins/genetics , Cdh1 Proteins , Cell Cycle Proteins/deficiency , Cell Cycle Proteins/genetics , Cell Lineage , Cerebellar Neoplasms/pathology , Child , Class I Phosphatidylinositol 3-Kinases , DEAD-box RNA Helicases/genetics , DNA Copy Number Variations , DNA Helicases/genetics , DNA Mutational Analysis , Disease Models, Animal , Genome, Human/genetics , Genomics , Hedgehog Proteins/metabolism , Histone Demethylases/genetics , Histones/metabolism , Humans , Medulloblastoma/pathology , Methylation , Mice , Nuclear Proteins/genetics , Phosphatidylinositol 3-Kinases/genetics , Transcription Factors/genetics , Wnt Proteins/metabolism , beta Catenin/genetics
ABSTRACT
To correlate the variable clinical features of oestrogen-receptor-positive breast cancer with somatic alterations, we studied pretreatment tumour biopsies accrued from patients in two studies of neoadjuvant aromatase inhibitor therapy by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes (RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to haematopoietic disorders. Mutant MAP3K1 was associated with luminal A status, low-grade histology and low proliferation rates, whereas mutant TP53 was associated with the opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon aromatase inhibitor treatment. Pathway analysis demonstrated that mutations in MAP2K4, a MAP3K1 substrate, produced similar perturbations as MAP3K1 loss. Distinct phenotypes in oestrogen-receptor-positive breast cancer are associated with specific patterns of somatic mutations that map into cellular pathways linked to tumour biology, but most recurrent mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing.
Subject(s)
Aromatase Inhibitors/therapeutic use , Aromatase/metabolism , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Genome, Human/genetics , Anastrozole , Androstadienes/pharmacology , Androstadienes/therapeutic use , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , DNA Repair , Exome/genetics , Exons/genetics , Female , Genetic Variation/genetics , Humans , Letrozole , MAP Kinase Kinase 4/genetics , MAP Kinase Kinase Kinase 1/genetics , Mutation/genetics , Nitriles/pharmacology , Nitriles/therapeutic use , Receptors, Estrogen/metabolism , Treatment Outcome , Triazoles/pharmacology , Triazoles/therapeutic use
ABSTRACT
Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods.
Subject(s)
Adaptation, Physiological/genetics , Chagas Disease , Host-Parasite Interactions/genetics , Insect Vectors , Rhodnius , Trypanosoma cruzi/physiology , Animals , Base Sequence , Gene Transfer, Horizontal , Humans , Insect Vectors/genetics , Insect Vectors/parasitology , Molecular Sequence Data , Rhodnius/genetics , Rhodnius/parasitology , Wolbachia/genetics
ABSTRACT
'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. 
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
Subject(s)
Genetic Variation , Genome/genetics , Pongo abelii/genetics , Pongo pygmaeus/genetics , Animals , Centromere/genetics , Cerebrosides/metabolism , Chromosomes , Evolution, Molecular , Female , Gene Rearrangement/genetics , Genetic Speciation , Genetics, Population , Humans , Male , Phylogeny , Population Density , Population Dynamics , Species Specificity
ABSTRACT
Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions (the population frequency of individual clones, their genetic composition, and their evolutionary relationships), which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Subject(s)
Clonal Evolution/genetics , Clone Cells , Leukemia, Myeloid, Acute/genetics , Single-Cell Analysis , Adult , Aged , Cell Line, Tumor , Female , Gene Frequency , High-Throughput Nucleotide Sequencing , Humans , Leukemia, Myeloid, Acute/pathology , Male , Middle Aged , Mutation , Myelodysplastic Syndromes/genetics , Myelodysplastic Syndromes/pathology , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/pathology , Polymorphism, Single Nucleotide
ABSTRACT
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
Subject(s)
Cholesterol, HDL/genetics , Cholesterol/genetics , Genome-Wide Association Study , Quantitative Trait Loci , Cholesterol/metabolism , Cholesterol, HDL/metabolism , Finland , Genotype , High-Throughput Nucleotide Sequencing , Humans , Linkage Disequilibrium , Phenotype , Population Groups , White People
ABSTRACT
BACKGROUND: Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS: We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. RESULTS: AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation-related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS: We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.).