ABSTRACT
Rare coding variants that substantially affect function provide insights into the biology of a gene [1-3]. However, ascertaining the frequency of such variants requires large sample sizes [4-8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
Subjects
Exome , Genetic Variation , Proteins , Humans , Alleles , Exome/genetics , Exome Sequencing , Gene Frequency , Genetic Variation/genetics , Heterozygote , Loss of Function Mutation/genetics , Missense Mutation/genetics , Open Reading Frames/genetics , Proteins/genetics , RNA Splice Sites/genetics , Precision Medicine
ABSTRACT
Human mitochondrial DNA (mtDNA) is replicated and repaired by the mtDNA polymerase gamma, polγ. Polγ is composed of three subunits encoded by two nuclear genes: (1) POLG codes for the 140-kilodalton (kDa) catalytic subunit, p140, and (2) POLG2 encodes the ~110-kDa homodimeric accessory subunit, p55. Specific mutations are associated with POLG- or POLG2-related disorders. During DNA replication the p55 accessory subunit binds to p140 and increases processivity by preventing polγ's dissociation from the template. To date, studies have demonstrated that homodimeric p55 disease variants are deficient in the ability to stimulate p140; however, all patients currently identified with POLG2-related disorders are heterozygotes. In these patients, we expect p55 to occur as 25% wild-type (WT) homodimers, 25% variant homodimers and 50% heterodimers. We report the development of a tandem affinity strategy to isolate p55 heterodimers. The WT/G451E p55 heterodimer impairs polγ function in vitro, demonstrating that the POLG2 c.1352G>A/p.G451E mutation encodes a dominant negative protein. To analyze the subcellular consequence of disease mutations in HEK293 cells, we designed plasmids encoding p55 disease variants tagged with green fluorescent protein (GFP). P205R and L475DfsX2 p55 variants exhibit irregular diffuse mitochondrial fluorescence and, unlike WT p55, they fail to form distinct puncta associated with mtDNA nucleoids. Furthermore, homogeneous preparations of P205R and L475DfsX2 p55 form aberrant reducible multimers. We predict that abnormal protein folding or aggregation or both contribute to the pathophysiology of these disorders. Examination of mitochondrial bioenergetics in stable cell lines overexpressing GFP-tagged p55 variants revealed impaired mitochondrial reserve capacity.
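The 25%/25%/50% expectation for p55 dimers is what random pairing of monomers gives when both alleles contribute equally to the monomer pool. The sketch below illustrates that arithmetic only. Random dimerization and equal allelic expression are assumptions made here, not measurements from the study, and the function name `dimer_fractions` is hypothetical.

```python
def dimer_fractions(wt_fraction=0.5):
    """Expected dimer species when monomers pair at random.

    wt_fraction is the share of wild-type monomers in the pool.
    0.5 models a heterozygote with equal allelic expression (an assumption).
    """
    if not 0.0 <= wt_fraction <= 1.0:
        raise ValueError("wt_fraction must lie in [0, 1]")
    variant_fraction = 1.0 - wt_fraction
    return {
        "WT/WT": wt_fraction ** 2,                         # both subunits wild-type
        "WT/variant": 2 * wt_fraction * variant_fraction,  # heterodimer, either order
        "variant/variant": variant_fraction ** 2,          # both subunits variant
    }


if __name__ == "__main__":
    for species, fraction in dimer_fractions().items():
        print(f"{species}: {fraction:.0%}")
```

If the variant allele were expressed or retained at a lower level than WT, passing a larger `wt_fraction` shows how quickly the heterodimer share would move away from 50%.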
Subjects
DNA-Directed DNA Polymerase/genetics , DNA-Directed DNA Polymerase/metabolism , Carrier Proteins , Cell Line , Cell Respiration , DNA/metabolism , Mitochondrial DNA/genetics , Mitochondrial DNA/metabolism , DNA-Directed DNA Polymerase/chemistry , DNA-Directed DNA Polymerase/isolation & purification , Gene Expression , Dominant Genes , Humans , Mitochondria/metabolism , Protein Binding , Protein Multimerization , Protein Subunits/metabolism , Protein Transport , Recombinant Fusion Proteins
ABSTRACT
For more than 20 years, the Ethical, Legal, and Social Implications (ELSI) Program of the National Human Genome Research Institute has supported empirical and conceptual research to anticipate and address the ethical, legal, and social implications of genomics. As a component of the agency that funds much of the underlying science, the program has always been an experiment. The ever-expanding number of issues the program addresses and the relatively low level of commitment on the part of other funding agencies to support such research make setting priorities especially challenging. Program-supported studies have had a significant impact on the conduct of genomics research, the implementation of genomic medicine, and broader public policies. The program's influence is likely to grow as ELSI research, genomics research, and policy development activities become increasingly integrated. Achieving the benefits of increased integration while preserving the autonomy, objectivity, and intellectual independence of ELSI investigators presents ongoing challenges and new opportunities.
Subjects
Human Genome/genetics , National Human Genome Research Institute (U.S.)/ethics , National Human Genome Research Institute (U.S.)/legislation & jurisprudence , Public Policy , Genetic Testing , Humans , National Human Genome Research Institute (U.S.)/trends , United States
ABSTRACT
The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science.
Subjects
Genomics/ethics , Health Information Management/ethics , Health Information Management/trends , Humans , Informed Consent/ethics , Privacy
ABSTRACT
Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these, this work is the first to document at least one pLOF homozygote. Additional insights from the Regeneron Genetics Center Million Exome (RGC-ME) dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 of which lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser.
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
ABSTRACT
Female mammals are functional mosaics of their parental X-linked gene expression due to X chromosome inactivation (XCI). This process inactivates one copy of the X chromosome in each cell during embryogenesis and that state is maintained clonally through mitosis. In mice, the choice of which parental X chromosome remains active is determined by the X chromosome controlling element (Xce), which has been mapped to a 176-kb candidate interval. A series of functional Xce alleles has been characterized or inferred for classical inbred strains based on biased, or skewed, inactivation of the parental X chromosomes in crosses between strains. To further explore the function, structural basis, and location of the Xce, we measured allele-specific expression of X-linked genes in a large population of F1 females generated from Collaborative Cross (CC) strains. Using published sequence data and applying a Bayesian "Pólya urn" model of XCI skew, we report two major findings. First, inter-individual variability in XCI suggests mouse epiblasts contain on average 20-30 cells contributing to the brain. Second, CC founder strain NOD/ShiLtJ has a novel and unique functional allele, Xce^g, that is the weakest in the Xce allelic series. Despite phylogenetic analysis confirming that NOD/ShiLtJ carries a haplotype almost identical to the well-characterized C57BL/6J (Xce^b), we observed unexpected patterns of XCI skewing in females carrying the NOD/ShiLtJ haplotype within the Xce. Copy number variation is common at the Xce locus and we conclude that the observed allelic series is a product of independent and recurring duplications shared between weak Xce alleles.
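The link between inter-individual variability in XCI and the number of founding cells can be illustrated with a deliberately simplified binomial model, which is not the Bayesian Pólya urn model used in the study. If each of n precursor cells independently keeps a given parental X active with probability p, the per-animal proportion has variance p(1 - p)/n, so n can be estimated as p(1 - p) divided by the observed variance. The function names, the simulated population and all parameter values below are hypothetical.

```python
import random
import statistics


def simulate_skew(n_cells, p_active, n_individuals, seed=0):
    """Simulate the per-animal fraction of precursor cells that keep one
    given parental X active (independent choice per cell; an assumption)."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_active for _ in range(n_cells)) / n_cells
        for _ in range(n_individuals)
    ]


def estimate_founder_cells(proportions):
    """Method-of-moments estimate of n from Var(proportion) = p(1 - p) / n."""
    mean_p = statistics.fmean(proportions)
    var_p = statistics.variance(proportions)
    return mean_p * (1.0 - mean_p) / var_p


if __name__ == "__main__":
    # Hypothetical population: 25 precursor cells, no skew (p = 0.5).
    proportions = simulate_skew(n_cells=25, p_active=0.5, n_individuals=5000, seed=1)
    print(f"estimated founder cells: {estimate_founder_cells(proportions):.1f}")
```

In real data, measurement noise in allele-specific expression adds to the observed variance, so this naive estimate would tend to understate n. Handling that kind of uncertainty is one reason a full hierarchical model is preferable to this shortcut.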
Subjects
Genetic Dosage Compensation , X Chromosome Inactivation/genetics , X Chromosome/genetics , Alleles , Animals , Bayes Theorem , Chromosome Mapping/methods , DNA Copy Number Variations/genetics , X-Linked Genes/genetics , Haplotypes , Mice , 129 Strain Mice , Inbred C57BL Mice , Inbred NOD Mice , Phylogeny , Long Noncoding RNA/genetics
ABSTRACT
Lung cancer is the leading cause of cancer-related mortality. While the majority of lung cancers are associated with tobacco smoke, approximately 10-15% of U.S. lung cancers occur in never smokers. Evidence suggests that lung cancer in never smokers is a distinct disease caused by driver mutations that differ from the genetic pathways observed in lung cancer in smokers. A meta-analysis of human epidemiologic data was conducted to evaluate the profile of common or therapy-targetable mutations in lung cancers of never and ever smokers. Epidemiologic studies (N=167) representing over 63,000 lung cancer cases were identified and used to calculate summary odds ratios for lung cancer in never and ever smokers carrying one of three gene alterations: EGFR mutations, chromosomal rearrangements fusing EML4 and ALK, and KRAS mutations. This analysis also considered the effect of histopathology, smoking status, sex, and ethnicity. There were significantly increased odds of presenting the EGFR and ALK-EML4 mutations in 1) adenocarcinomas compared to non-small cell lung cancer (NSCLC) and 2) never smokers compared to ever smokers. The prevalence of EGFR mutations was higher in Asian women than in women of Caucasian/Mixed ethnicity. As smoking history increased, the odds of exhibiting the EGFR mutation decreased, particularly for cases with >30 pack-years. Compared to ever smokers, never smokers had decreased odds of KRAS mutations among those of Caucasian/Mixed ethnicity (OR=0.22, 95% CI: 0.17-0.29) and those of Asian ethnicity (OR=0.39, 95% CI: 0.30-0.50). Our findings show that key driver mutations and several patient features are highly prevalent in lung cancers of never smokers. These associations may be helpful as patient demographic models are developed to predict successful outcomes of targeted therapeutic interventions in NSCLC.
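An odds ratio with a 95% CI, as reported above, can be computed from a 2x2 table of mutation status by smoking status, and several studies can be pooled by inverse-variance weighting on the log scale. The sketch below shows that generic calculation. The abstract does not state which pooling model the authors used, so the fixed-effect approach here is an assumption, and the function names and counts are hypothetical illustrations rather than study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for one 2x2 table.

    a, b: mutation-positive / mutation-negative cases among never smokers
    c, d: mutation-positive / mutation-negative cases among ever smokers
    All four counts must be positive.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def pooled_odds_ratio(tables, z=1.96):
    """Fixed-effect, inverse-variance pooled odds ratio over (a, b, c, d) tables."""
    weights = []
    weighted_logs = []
    for a, b, c, d in tables:
        variance = 1 / a + 1 / b + 1 / c + 1 / d  # variance of the log odds ratio
        weight = 1 / variance
        weights.append(weight)
        weighted_logs.append(weight * math.log((a * d) / (b * c)))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


if __name__ == "__main__":
    # Hypothetical counts for two studies, for illustration only.
    studies = [(10, 90, 40, 60), (15, 120, 70, 95)]
    estimate, lower, upper = pooled_odds_ratio(studies)
    print(f"pooled OR = {estimate:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

A random-effects model would widen the interval when studies disagree more than sampling error alone predicts, which is a common situation across populations as heterogeneous as those summarized here.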