Results 1 - 20 of 37
1.
Nature ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that significantly impact function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.

2.
Nature ; 622(7984): 784-793, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821707

ABSTRACT

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.


Subject(s)
Exome Sequencing , Genome, Human , Genotype , Hispanic or Latino , Adult , Humans , Africa/ethnology , Americas/ethnology , Europe/ethnology , Gene Frequency/genetics , Genetics, Population , Genome, Human/genetics , Genotyping Techniques , Hispanic or Latino/genetics , Homozygote , Loss of Function Mutation/genetics , Mexico , Prospective Studies
3.
Cell ; 148(6): 1293-307, 2012 Mar 16.
Article in English | MEDLINE | ID: mdl-22424236

ABSTRACT

Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.


Subject(s)
Genome, Human , Genomics , Precision Medicine , Diabetes Mellitus, Type 2/genetics , Female , Gene Expression Profiling , Humans , Male , Metabolomics , Middle Aged , Mutation , Proteomics , Respiratory Syncytial Viruses/isolation & purification , Rhinovirus/isolation & purification
4.
Nature ; 599(7886): 628-634, 2021 11.
Article in English | MEDLINE | ID: mdl-34662886

ABSTRACT

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.


Subject(s)
Biological Specimen Banks , Databases, Genetic , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Quantitative Trait, Heritable , United Kingdom
5.
Nature ; 586(7831): 749-756, 2020 10.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world [1]. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
7.
Genet Med ; 24(3): 703-711, 2022 03.
Article in English | MEDLINE | ID: mdl-34906480

ABSTRACT

PURPOSE: Recurrent pathogenic copy number variants (pCNVs) have large-effect impacts on brain function and represent important etiologies of neurodevelopmental psychiatric disorders (NPDs), including autism and schizophrenia. Patterns of health care utilization in adults with pCNVs have gone largely unstudied and are likely to differ in significant ways from those of children. METHODS: We compared the prevalence of NPDs and electronic health record-based medical conditions in 928 adults with 26 pCNVs to a demographically-matched cohort of pCNV-negative controls from >135,000 patient-participants in Geisinger's MyCode Community Health Initiative. We also evaluated 3 quantitative health care utilization measures (outpatient, inpatient, and emergency department visits) in both groups. RESULTS: Adults with pCNVs (24.9%) were more likely than controls (16.0%) to have a documented NPD. They had significantly higher rates of several chronic diseases, including diabetes (29.3% in participants with pCNVs vs 20.4% in participants without pCNVs) and dementia (2.2% in participants with pCNVs vs 1.0% in participants without pCNVs), and twice as many annual emergency department visits. CONCLUSION: These findings highlight the potential for genetic information, specifically pCNVs, to inform the study of health care outcomes and utilization in adults. If, as our findings suggest, adults with pCNVs have poorer health and require disproportionate health care resources, early genetic diagnosis paired with patient-centered interventions may help to anticipate problems, improve outcomes, and reduce the associated economic burden.


Subject(s)
DNA Copy Number Variations , Delivery of Health Care , Adult , Child , Cohort Studies , DNA Copy Number Variations/genetics , Humans , Patient Acceptance of Health Care , Prevalence
8.
Am J Hum Genet ; 102(5): 874-889, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727688

ABSTRACT

Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications on downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ∼66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, given that DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ∼1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.


Subject(s)
Exome/genetics , Precision Medicine , Cohort Studies , Computer Simulation , Electronic Health Records , Exons/genetics , Family , Female , Genetics, Population , Geography , Heterozygote , Humans , Male , Mutation/genetics , Pedigree , Phenotype , Reproducibility of Results
9.
Bioorg Med Chem ; 48: 116389, 2021 10 15.
Article in English | MEDLINE | ID: mdl-34543844

ABSTRACT

With the emergence of the third infectious and virulent coronavirus within the past two decades, it has become increasingly important to understand how the virus causes infection. This will inform therapeutic strategies that target vulnerabilities in the vital processes through which the virus enters cells. This review identifies enzymes responsible for SARS-CoV-2 viral entry into cells (ACE2, Furin, TMPRSS2) and discusses compounds proposed to inhibit viral entry with the end goal of treating COVID-19 infection. We argue that TMPRSS2 inhibitors show the most promise in potentially treating COVID-19, in addition to being a pre-existing medication with fewer predicted side-effects.


Subject(s)
Angiotensin Receptor Antagonists/therapeutic use , Angiotensin-Converting Enzyme 2/antagonists & inhibitors , Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Janus Kinase Inhibitors/therapeutic use , SARS-CoV-2/drug effects , Animals , Drug Combinations , Humans , Methotrexate/therapeutic use , Receptors, Angiotensin/metabolism , Signal Transduction/drug effects
10.
N Engl J Med ; 377(3): 211-221, 2017 07 20.
Article in English | MEDLINE | ID: mdl-28538136

ABSTRACT

BACKGROUND: Loss-of-function variants in the angiopoietin-like 3 gene (ANGPTL3) have been associated with decreased plasma levels of triglycerides, low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol. It is not known whether such variants or therapeutic antagonism of ANGPTL3 are associated with a reduced risk of atherosclerotic cardiovascular disease. METHODS: We sequenced the exons of ANGPTL3 in 58,335 participants in the DiscovEHR human genetics study. We performed tests of association for loss-of-function variants in ANGPTL3 with lipid levels and with coronary artery disease in 13,102 case patients and 40,430 controls from the DiscovEHR study, with follow-up studies involving 23,317 case patients and 107,166 controls from four population studies. We also tested the effects of a human monoclonal antibody, evinacumab, against Angptl3 in dyslipidemic mice and against ANGPTL3 in healthy human volunteers with elevated levels of triglycerides or LDL cholesterol. RESULTS: In the DiscovEHR study, participants with heterozygous loss-of-function variants in ANGPTL3 had significantly lower serum levels of triglycerides, HDL cholesterol, and LDL cholesterol than participants without these variants. Loss-of-function variants were found in 0.33% of case patients with coronary artery disease and in 0.45% of controls (adjusted odds ratio, 0.59; 95% confidence interval, 0.41 to 0.85; P=0.004). These results were confirmed in the follow-up studies. In dyslipidemic mice, inhibition of Angptl3 with evinacumab resulted in a greater decrease in atherosclerotic lesion area and necrotic content than a control antibody. In humans, evinacumab caused a dose-dependent placebo-adjusted reduction in fasting triglyceride levels of up to 76% and LDL cholesterol levels of up to 23%. 
CONCLUSIONS: Genetic and therapeutic antagonism of ANGPTL3 in humans and of Angptl3 in mice was associated with decreased levels of all three major lipid fractions and decreased odds of atherosclerotic cardiovascular disease. (Funded by Regeneron Pharmaceuticals and others; ClinicalTrials.gov number, NCT01749878.)


Subject(s)
Angiopoietins/antagonists & inhibitors , Antibodies, Monoclonal/administration & dosage , Atherosclerosis/drug therapy , Coronary Artery Disease/genetics , Dyslipidemias/drug therapy , Lipids/blood , Mutation , Aged , Angiopoietin-Like Protein 3 , Angiopoietin-like Proteins , Angiopoietins/genetics , Animals , Antibodies, Monoclonal/adverse effects , Antibodies, Monoclonal/pharmacology , Atherosclerosis/metabolism , Cardiovascular Diseases/prevention & control , Coronary Artery Disease/metabolism , Disease Models, Animal , Dose-Response Relationship, Drug , Double-Blind Method , Dyslipidemias/blood , Female , Humans , Lipid Metabolism/drug effects , Male , Mice , Mice, Inbred Strains , Middle Aged
11.
N Engl J Med ; 374(12): 1123-33, 2016 Mar 24.
Article in English | MEDLINE | ID: mdl-26933753

ABSTRACT

BACKGROUND: Higher-than-normal levels of circulating triglycerides are a risk factor for ischemic cardiovascular disease. Activation of lipoprotein lipase, an enzyme that is inhibited by angiopoietin-like 4 (ANGPTL4), has been shown to reduce levels of circulating triglycerides. METHODS: We sequenced the exons of ANGPTL4 in samples obtained from 42,930 participants of predominantly European ancestry in the DiscovEHR human genetics study. We performed tests of association between lipid levels and the missense E40K variant (which has been associated with reduced plasma triglyceride levels) and other inactivating mutations. We then tested for associations between coronary artery disease and the E40K variant and other inactivating mutations in 10,552 participants with coronary artery disease and 29,223 controls. We also tested the effect of a human monoclonal antibody against ANGPTL4 on lipid levels in mice and monkeys. RESULTS: We identified 1661 heterozygotes and 17 homozygotes for the E40K variant and 75 participants who had 13 other monoallelic inactivating mutations in ANGPTL4. The levels of triglycerides were 13% lower and the levels of high-density lipoprotein (HDL) cholesterol were 7% higher among carriers of the E40K variant than among noncarriers. Carriers of the E40K variant were also significantly less likely than noncarriers to have coronary artery disease (odds ratio, 0.81; 95% confidence interval, 0.70 to 0.92; P=0.002). K40 homozygotes had markedly lower levels of triglycerides and higher levels of HDL cholesterol than did heterozygotes. Carriers of other inactivating mutations also had lower triglyceride levels and higher HDL cholesterol levels and were less likely to have coronary artery disease than were noncarriers. Monoclonal antibody inhibition of Angptl4 in mice and monkeys reduced triglyceride levels.
CONCLUSIONS: Carriers of E40K and other inactivating mutations in ANGPTL4 had lower levels of triglycerides and a lower risk of coronary artery disease than did noncarriers. The inhibition of Angptl4 in mice and monkeys also resulted in corresponding reductions in these values. (Funded by Regeneron Pharmaceuticals.)


Subject(s)
Angiopoietins/genetics , Coronary Artery Disease/genetics , Gene Silencing , Mutation , Aged , Angiopoietin-Like Protein 4 , Angiopoietins/antagonists & inhibitors , Animals , Cholesterol/blood , Disease Models, Animal , Female , Heterozygote , Humans , Macaca mulatta , Male , Mice , Middle Aged , Risk Factors , Triglycerides/blood
12.
Genes Dev ; 25(1): 1-10, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-21205862

ABSTRACT

The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.


Subject(s)
Gene Silencing/physiology , Genetic Privacy , Molecular Sequence Annotation , Genetic Privacy/trends , Genetic Variation , Genome, Human/genetics , Humans , Polymorphism, Single Nucleotide
13.
Bioinformatics ; 32(1): 133-5, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-26382196

ABSTRACT

MOTIVATION: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm, Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS), which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum. RESULTS: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan quantitative polymerase chain reaction to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the Supplementary Materials (available at the CLAMMS GitHub repository), we present our methods and validation results in greater detail. AVAILABILITY AND IMPLEMENTATION: https://github.com/rgcgithub/clamms (implemented in C). CONTACT: jeffrey.reid@regeneron.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , DNA Copy Number Variations/genetics , Exome/genetics , Sequence Analysis, DNA/methods , Humans , Markov Chains , Reproducibility of Results
14.
Nature ; 470(7333): 214-20, 2011 Feb 10.
Article in English | MEDLINE | ID: mdl-21307934

ABSTRACT

Prostate cancer is the second most common cause of male cancer deaths in the United States. However, the full range of prostate cancer genomic alterations is incompletely characterized. Here we present the complete sequence of seven primary human prostate cancers and their paired normal counterparts. Several tumours contained complex chains of balanced (that is, 'copy-neutral') rearrangements that occurred within or adjacent to known cancer genes. Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumours lacking ETS fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic aberrations. Three tumours contained rearrangements that disrupted CADM2, and four harboured events disrupting either PTEN (unbalanced events), a prostate tumour suppressor, or MAGI2 (balanced events), a PTEN interacting protein not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or chromatin aberrancies and engage prostate tumorigenic mechanisms.


Subject(s)
Genome, Human/genetics , Prostatic Neoplasms/genetics , Adaptor Proteins, Signal Transducing , Carrier Proteins/genetics , Case-Control Studies , Cell Adhesion Molecules/genetics , Chromatin/genetics , Chromatin/metabolism , Chromosome Aberrations , Chromosome Breakpoints , Epigenesis, Genetic/genetics , Gene Expression Regulation, Neoplastic , Guanylate Kinases , Humans , Male , PTEN Phosphohydrolase/genetics , PTEN Phosphohydrolase/metabolism , Recombination, Genetic/genetics , Signal Transduction/genetics , Transcription, Genetic
15.
Genome Res ; 23(12): 2042-52, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24026178

ABSTRACT

In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either "retrogenes" coding for functioning proteins, or expressed "processed pseudogenes," which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We have developed new methodologies that allow us to identify "novel" retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of The 1000 Genomes Project Consortium. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.


Subject(s)
Cell Division/genetics , Gene Duplication , Retroelements/genetics , Computational Biology/methods , Evolution, Molecular , Genome, Human , Genotype , Humans , Phylogeny , Pseudogenes , Reproducibility of Results , Sequence Analysis, DNA
16.
Proc Natl Acad Sci U S A ; 109(31): 12656-61, 2012 Jul 31.
Article in English | MEDLINE | ID: mdl-22797897

ABSTRACT

Gene expression differences are shaped by selective pressures and contribute to phenotypic differences between species. We identified 964 copy number differences (CNDs) of conserved sequences across three primate species and examined their potential effects on gene expression profiles. Samples with copy number different genes had significantly different expression than samples with neutral copy number. Genes encoding regulatory molecules differed in copy number and were associated with significant expression differences. Additionally, we identified 127 CNDs that were processed pseudogenes and some of which were expressed. Furthermore, there were copy number-different regulatory regions such as ultraconserved elements and long intergenic noncoding RNAs with the potential to affect expression. We postulate that CNDs of these conserved sequences fine-tune developmental pathways by altering the levels of RNA.


Subject(s)
DNA, Intergenic/physiology , Gene Dosage/physiology , Gene Expression Regulation/physiology , Pseudogenes/physiology , RNA, Untranslated/physiology , Regulatory Elements, Transcriptional/physiology , Animals , Cell Line , Humans , Macaca mulatta , Pan troglodytes , Species Specificity
17.
Genome Res ; 21(1): 56-67, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21036922

ABSTRACT

Half of prostate cancers harbor gene fusions between TMPRSS2 and members of the ETS transcription factor family. To date, little is known about the presence of non-ETS fusion events in prostate cancer. We used next-generation transcriptome sequencing (RNA-seq) in order to explore the whole transcriptome of 25 human prostate cancer samples for the presence of chimeric fusion transcripts. We generated more than 1 billion sequence reads and used a novel computational approach (FusionSeq) in order to identify novel gene fusion candidates with high confidence. In total, we discovered and characterized seven new cancer-specific gene fusions, two involving the ETS genes ETV1 and ERG, and four involving non-ETS genes such as CDKN1A (p21), CD9, and IKBKB (IKK-beta), genes known to exhibit key biological roles in cellular homeostasis or assumed to be critical in tumorigenesis of other tumor entities, as well as the oncogene PIGU and the tumor suppressor gene RSRC2. The novel gene fusions are found to be of low frequency, but, interestingly, the non-ETS fusions were all present in prostate cancer harboring the TMPRSS2-ERG gene fusion. Future work will focus on determining if the ETS rearrangements in prostate cancer are associated or directly predispose to a rearrangement-prone phenotype.


Subject(s)
Gene Fusion , Prostatic Neoplasms/genetics , Proto-Oncogene Proteins c-ets/genetics , Sequence Analysis, RNA/methods , Antigens, CD/genetics , Computational Biology/methods , Cyclin-Dependent Kinase Inhibitor p21/genetics , Gene Expression Profiling , Humans , I-kappa B Kinase/genetics , In Situ Hybridization, Fluorescence , Male , Membrane Glycoproteins/genetics , Molecular Sequence Data , Prostatic Neoplasms/pathology , Reverse Transcriptase Polymerase Chain Reaction , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism , Tetraspanin 29 , Trans-Activators/metabolism , Transcriptional Regulator ERG
18.
Bioinformatics ; 28(17): 2267-9, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22743228

ABSTRACT

UNLABELLED: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. AVAILABILITY AND IMPLEMENTATION: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.


Subject(s)
Genome, Human , Genomics/methods , Information Storage and Retrieval/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Genotype , Humans , Internet
19.
Proc Natl Acad Sci U S A ; 107(11): 5254-9, 2010 Mar 16.
Article in English | MEDLINE | ID: mdl-20194744

ABSTRACT

To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of hESCs into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation, namely N1 (early initiation), N2 (neural progenitor), and N3 (early glial-like), were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.


Subject(s)
Cell Differentiation/genetics , Embryonic Stem Cells/cytology , Gene Expression Profiling , Neurons/cytology , Neurons/metabolism , Sequence Analysis, DNA/methods , Alternative Splicing/genetics , Base Sequence , Cells, Cultured , Embryonic Stem Cells/metabolism , Gene Expression Regulation, Developmental , Humans , RNA/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcription, Genetic
20.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37214792

ABSTRACT

Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
