Search | VHL Regional Portal

A deep catalogue of protein-coding variation in 983,578 individuals.

Sun, Kathie Y; Bai, Xiaodong; Chen, Siying; Bao, Suying; Zhang, Chuanyi; Kapoor, Manav; Backman, Joshua; Joseph, Tyler; Maxwell, Evan; Mitra, George; Gorovits, Alexander; Mansfield, Adam; Boutkov, Boris; Gokhale, Sujit; Habegger, Lukas; Marcketta, Anthony; Locke, Adam E; Ganel, Liron; Hawes, Alicia; Kessler, Michael D; Sharma, Deepika; Staples, Jeffrey; Bovijn, Jonas; Gelfman, Sahar; Di Gioia, Alessandro; Rajagopal, Veera M; Lopez, Alexander; Varela, Jennifer Rico; Alegre, Jesus; Berumen, Jaime; Tapia-Conyer, Roberto; Kuri-Morales, Pablo; Torres, Jason; Emberson, Jonathan; Collins, Rory; Cantor, Michael; Thornton, Timothy; Kang, Hyun Min; Overton, John D; Shuldiner, Alan R; Cremona, M Laura; Nafde, Mona; Baras, Aris; Abecasis, Goncalo; Marchini, Jonathan; Reid, Jeffrey G; Salerno, William; Balasubramanian, Suganthi.

Nature ; 2024 May 20.

Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that significantly impact function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.

A deep catalog of protein-coding variation in 985,830 individuals.

Sun, Kathie Y; Bai, Xiaodong; Chen, Siying; Bao, Suying; Kapoor, Manav; Zhang, Chuanyi; Backman, Joshua; Joseph, Tyler; Maxwell, Evan; Mitra, George; Gorovits, Alexander; Mansfield, Adam; Boutkov, Boris; Gokhale, Sujit; Habegger, Lukas; Marcketta, Anthony; Locke, Adam; Kessler, Michael D; Sharma, Deepika; Staples, Jeffrey; Bovijn, Jonas; Gelfman, Sahar; Gioia, Alessandro Di; Rajagopal, Veera; Lopez, Alexander; Varela, Jennifer Rico; Alegre, Jesus; Berumen, Jaime; Tapia-Conyer, Roberto; Kuri-Morales, Pablo; Torres, Jason; Emberson, Jonathan; Collins, Rory; Cantor, Michael; Thornton, Timothy; Kang, Hyun Min; Overton, John; Shuldiner, Alan R; Cremona, M Laura; Nafde, Mona; Baras, Aris; Abecasis, Goncalo; Marchini, Jonathan; Reid, Jeffrey G; Salerno, William; Balasubramanian, Suganthi.

bioRxiv ; 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37214792

ABSTRACT

Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.

Bayesian modeling of skewed X inactivation in genetically diverse mice identifies a novel Xce allele associated with copy number changes.

Sun, Kathie Y; Oreper, Daniel; Schoenrock, Sarah A; McMullan, Rachel; Giusti-Rodríguez, Paola; Zhabotynsky, Vasyl; Miller, Darla R; Tarantino, Lisa M; Pardo-Manuel de Villena, Fernando; Valdar, William.

Genetics ; 218(1), 2021 05 17.

Article in English | MEDLINE | ID: mdl-33693696

ABSTRACT

Female mammals are functional mosaics of their parental X-linked gene expression due to X chromosome inactivation (XCI). This process inactivates one copy of the X chromosome in each cell during embryogenesis and that state is maintained clonally through mitosis. In mice, the choice of which parental X chromosome remains active is determined by the X chromosome controlling element (Xce), which has been mapped to a 176-kb candidate interval. A series of functional Xce alleles has been characterized or inferred for classical inbred strains based on biased, or skewed, inactivation of the parental X chromosomes in crosses between strains. To further explore the function, structural basis, and location of the Xce, we measured allele-specific expression of X-linked genes in a large population of F1 females generated from Collaborative Cross (CC) strains. Using published sequence data and applying a Bayesian "Pólya urn" model of XCI skew, we report two major findings. First, inter-individual variability in XCI suggests mouse epiblasts contain on average 20-30 cells contributing to brain. Second, CC founder strain NOD/ShiLtJ has a novel and unique functional allele, Xce^g, that is the weakest in the Xce allelic series. Despite phylogenetic analysis confirming that NOD/ShiLtJ carries a haplotype almost identical to the well-characterized C57BL/6J (Xce^b), we observed unexpected patterns of XCI skewing in females carrying the NOD/ShiLtJ haplotype within the Xce. Copy number variation is common at the Xce locus and we conclude that the observed allelic series is a product of independent and recurring duplications shared between weak Xce alleles.

Subject(s)

Dosage Compensation, Genetic , X Chromosome Inactivation/genetics , X Chromosome/genetics , Alleles , Animals , Bayes Theorem , Chromosome Mapping/methods , DNA Copy Number Variations/genetics , Genes, X-Linked/genetics , Haplotypes , Mice , Mice, 129 Strain , Mice, Inbred C57BL , Mice, Inbred NOD , Phylogeny , RNA, Long Noncoding/genetics

Lung cancer mutation profile of EGFR, ALK, and KRAS: Meta-analysis and comparison of never and ever smokers.

Chapman, Aaron M; Sun, Kathie Y; Ruestow, Peter; Cowan, Dallas M; Madl, Amy K.

Lung Cancer ; 102: 122-134, 2016 12.

Article in English | MEDLINE | ID: mdl-27987580

ABSTRACT

Lung cancer is the leading cause of cancer-related mortality. While the majority of lung cancers are associated with tobacco smoke, approximately 10-15% of U.S. lung cancers occur in never smokers. Evidence suggests that lung cancer in never smokers is a distinct disease caused by driver mutations that differ from the genetic pathways observed in lung cancer in smokers. A meta-analysis of human epidemiologic data was conducted to evaluate the profile of common or therapy-targetable mutations in lung cancers of never and ever smokers. Epidemiologic studies (N=167) representing over 63,000 lung cancer cases were identified and used to calculate summary odds ratios for lung cancer in never and ever smokers containing gene mutations: EGFR, chromosomal rearrangements and fusion of EML4 and ALK, and KRAS. This analysis also considered the effect of histopathology, smoking status, sex, and ethnicity. There were significantly increased odds of presenting the EGFR and ALK-EML4 mutations in 1) adenocarcinomas compared to non-small cell lung cancer and 2) never smokers compared to ever smokers. The prevalence of EGFR mutations was higher in Asian women than in women of Caucasian/Mixed ethnicity. As smoking history increased, the odds of exhibiting the EGFR mutation decreased, particularly for cases >30 pack-years. Compared to ever smokers, never smokers had decreased odds of KRAS mutations among those of Caucasian/Mixed ethnicity (OR=0.22, 95% CI: 0.17-0.29) and those of Asian ethnicity (OR=0.39, 95% CI: 0.30-0.50). Our findings show that key driver mutations and several patient features are highly prevalent in lung cancers of never smokers. These associations may be helpful as patient demographic models are developed to predict successful outcomes of targeted therapeutic interventions in NSCLC.
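The odds ratios and 95% confidence intervals quoted in this abstract are the usual summary measures for a 2x2 comparison. As a minimal sketch only, the following shows how a single-study odds ratio with a Wald-type interval can be computed. The function name and the counts are hypothetical and are not taken from the article, and the pooling method the authors used to combine studies is not described in the abstract and is not reproduced here.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for one 2x2 table.

    a: never smokers with the mutation     b: never smokers without it
    c: ever smokers with the mutation      d: ever smokers without it
    (hypothetical layout, chosen for illustration)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

if __name__ == "__main__":
    # Made-up counts, not data from the meta-analysis.
    or_, lo, hi = odds_ratio_ci(10, 90, 30, 70)
    print(f"OR={or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that lies entirely below 1, as in the KRAS results above, indicates decreased odds in the first group relative to the second.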

Subject(s)

ErbB Receptors/genetics , Lung Neoplasms/genetics , Proto-Oncogene Proteins p21(ras)/genetics , Receptor Protein-Tyrosine Kinases/genetics , Smoking/genetics , Anaplastic Lymphoma Kinase , Humans , Lung Neoplasms/enzymology , Lung Neoplasms/epidemiology , Mutation , Prevalence , Smoking/epidemiology , Smoking/metabolism

POLG2 disease variants: analyses reveal a dominant negative heterodimer, altered mitochondrial localization and impaired respiratory capacity.

Young, Matthew J; Humble, Margaret M; DeBalsi, Karen L; Sun, Kathie Y; Copeland, William C.

Hum Mol Genet ; 24(18): 5184-97, 2015 Sep 15.

Article in English | MEDLINE | ID: mdl-26123486

ABSTRACT

Human mitochondrial DNA (mtDNA) is replicated and repaired by the mtDNA polymerase gamma, polγ. Polγ is composed of three subunits encoded by two nuclear genes: (1) POLG codes for the 140-kilodalton (kDa) catalytic subunit, p140 and (2) POLG2 encodes the ~110-kDa homodimeric accessory subunit, p55. Specific mutations are associated with POLG- or POLG2-related disorders. During DNA replication the p55 accessory subunit binds to p140 and increases processivity by preventing polγ's dissociation from the template. To date, studies have demonstrated that homodimeric p55 disease variants are deficient in the ability to stimulate p140; however, all patients currently identified with POLG2-related disorders are heterozygotes. In these patients, we expect p55 to occur as 25% wild-type (WT) homodimers, 25% variant homodimers and 50% heterodimers. We report the development of a tandem affinity strategy to isolate p55 heterodimers. The WT/G451E p55 heterodimer impairs polγ function in vitro, demonstrating that the POLG2 c.1352G>A/p.G451E mutation encodes a dominant negative protein. To analyze the subcellular consequence of disease mutations in HEK293 cells, we designed plasmids encoding p55 disease variants tagged with green fluorescent protein (GFP). P205R and L475DfsX2 p55 variants exhibit irregular diffuse mitochondrial fluorescence and unlike WT p55, they fail to form distinct puncta associated with mtDNA nucleoids. Furthermore, homogenous preparations of P205R and L475DfsX2 p55 form aberrant reducible multimers. We predict that abnormal protein folding or aggregation or both contribute to the pathophysiology of these disorders. Examination of mitochondrial bioenergetics in stable cell lines overexpressing GFP-tagged p55 variants revealed impaired mitochondrial reserve capacity.
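The 25% / 25% / 50% expectation stated in this abstract follows from simple pairing arithmetic. The sketch below assumes, for illustration only, that WT and variant p55 monomers are equally abundant in a heterozygote and pair at random with equal stability; the abstract does not state these assumptions explicitly, and the function name is hypothetical.

```python
from fractions import Fraction

def dimer_fractions(variant_share=Fraction(1, 2)):
    """Expected dimer classes when two monomer types pair at random.

    variant_share: fraction of monomers that are the variant form
    (1/2 is an assumed value for a heterozygote with equal expression).
    """
    q = Fraction(variant_share)
    p = 1 - q  # fraction of monomers that are wild-type
    return {
        "WT homodimer": p * p,
        "heterodimer": 2 * p * q,  # WT/variant and variant/WT orderings
        "variant homodimer": q * q,
    }

if __name__ == "__main__":
    for name, share in dimer_fractions().items():
        print(f"{name}: {float(share):.0%}")
```

Under these assumptions, only a quarter of dimers are fully wild-type, which is why the behaviour of the heterodimer matters for heterozygous patients.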

Subject(s)

DNA-Directed DNA Polymerase/genetics , DNA-Directed DNA Polymerase/metabolism , Carrier Proteins , Cell Line , Cell Respiration , DNA/metabolism , DNA, Mitochondrial/genetics , DNA, Mitochondrial/metabolism , DNA-Directed DNA Polymerase/chemistry , DNA-Directed DNA Polymerase/isolation & purification , Gene Expression , Genes, Dominant , Humans , Mitochondria/metabolism , Protein Binding , Protein Multimerization , Protein Subunits/metabolism , Protein Transport , Recombinant Fusion Proteins

The Ethical, Legal, and Social Implications Program of the National Human Genome Research Institute: reflections on an ongoing experiment.

McEwen, Jean E; Boyer, Joy T; Sun, Kathie Y; Rothenberg, Karen H; Lockhart, Nicole C; Guyer, Mark S.

Annu Rev Genomics Hum Genet ; 15: 481-505, 2014.

Article in English | MEDLINE | ID: mdl-24773317

ABSTRACT

For more than 20 years, the Ethical, Legal, and Social Implications (ELSI) Program of the National Human Genome Research Institute has supported empirical and conceptual research to anticipate and address the ethical, legal, and social implications of genomics. As a component of the agency that funds much of the underlying science, the program has always been an experiment. The ever-expanding number of issues the program addresses and the relatively low level of commitment on the part of other funding agencies to support such research make setting priorities especially challenging. Program-supported studies have had a significant impact on the conduct of genomics research, the implementation of genomic medicine, and broader public policies. The program's influence is likely to grow as ELSI research, genomics research, and policy development activities become increasingly integrated. Achieving the benefits of increased integration while preserving the autonomy, objectivity, and intellectual independence of ELSI investigators presents ongoing challenges and new opportunities.

Subject(s)

Genome, Human/genetics , National Human Genome Research Institute (U.S.)/ethics , National Human Genome Research Institute (U.S.)/legislation & jurisprudence , Public Policy , Genetic Testing , Humans , National Human Genome Research Institute (U.S.)/trends , United States

Evolving approaches to the ethical management of genomic data.

McEwen, Jean E; Boyer, Joy T; Sun, Kathie Y.

Trends Genet ; 29(6): 375-82, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23453621

ABSTRACT

The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science.

Subject(s)

Genomics/ethics , Health Information Management/ethics , Health Information Management/trends , Humans , Informed Consent/ethics , Privacy
