ABSTRACT
Rare coding variants that substantially affect function provide insights into the biology of a gene1-3. However, ascertaining the frequency of such variants requires large sample sizes4-8. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
Subjects
Exome , Genetic Variation , Proteins , Humans , Alleles , Exome/genetics , Exome Sequencing , Gene Frequency , Genetic Variation/genetics , Heterozygote , Loss of Function Mutation/genetics , Mutation, Missense/genetics , Open Reading Frames/genetics , Proteins/genetics , RNA Splice Sites/genetics , Precision Medicine
ABSTRACT
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City1. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.
Subjects
Exome Sequencing , Genome, Human , Genotype , Hispanic or Latino , Adult , Humans , Africa/ethnology , Americas/ethnology , Europe/ethnology , Gene Frequency/genetics , Genetics, Population , Genome, Human/genetics , Genotyping Techniques , Hispanic or Latino/genetics , Homozygote , Loss of Function Mutation/genetics , Mexico , Prospective Studies
ABSTRACT
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
Subjects
Genome, Human , Genomics , Precision Medicine , Diabetes Mellitus, Type 2/genetics , Female , Gene Expression Profiling , Humans , Male , Metabolomics , Middle Aged , Mutation , Proteomics , Respiratory Syncytial Viruses/isolation & purification , Rhinovirus/isolation & purification
ABSTRACT
A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing1 to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study2. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.
Subjects
Biological Specimen Banks , Databases, Genetic , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Quantitative Trait, Heritable , United Kingdom
ABSTRACT
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world1. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Subjects
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
ABSTRACT
PURPOSE: Recurrent pathogenic copy number variants (pCNVs) have large effects on brain function and represent important etiologies of neurodevelopmental psychiatric disorders (NPDs), including autism and schizophrenia. Patterns of health care utilization in adults with pCNVs have gone largely unstudied and are likely to differ in significant ways from those of children. METHODS: We compared the prevalence of NPDs and electronic health record-based medical conditions in 928 adults with 26 pCNVs to a demographically matched cohort of pCNV-negative controls drawn from >135,000 patient-participants in Geisinger's MyCode Community Health Initiative. We also evaluated 3 quantitative health care utilization measures (outpatient, inpatient, and emergency department visits) in both groups. RESULTS: Adults with pCNVs (24.9%) were more likely than controls (16.0%) to have a documented NPD. They had significantly higher rates of several chronic diseases, including diabetes (29.3% in participants with pCNVs vs 20.4% in participants without pCNVs) and dementia (2.2% in participants with pCNVs vs 1.0% in participants without pCNVs), and twice as many annual emergency department visits. CONCLUSION: These findings highlight the potential for genetic information (specifically, pCNVs) to inform the study of health care outcomes and utilization in adults. If, as our findings suggest, adults with pCNVs have poorer health and require disproportionate health care resources, early genetic diagnosis paired with patient-centered interventions may help to anticipate problems, improve outcomes, and reduce the associated economic burden.
Subjects
DNA Copy Number Variations , Delivery of Health Care , Adult , Child , Cohort Studies , DNA Copy Number Variations/genetics , Humans , Patient Acceptance of Health Care , Prevalence
ABSTRACT
Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications for downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort. Using SimProgeny, a custom simulation framework we developed, we show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ~66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, assuming DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ~1,783 genes. Finally, through reconstructed pedigrees, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication in LDLR that causes familial hypercholesterolemia. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
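Close relationships like those counted above are commonly inferred from pairwise kinship coefficients estimated from genotype data. The sketch below shows how pairs could be binned by degree and how the share of participants involved in first- or second-degree relationships (the kind of figure reported as 55.6% above) could be computed. It assumes KING-style cutoffs, which are geometric midpoints between the expected kinship values 0.5, 0.25, 0.125 and 0.0625; the abstract does not say which estimator or thresholds DiscovEHR used. The input format, a list of (id_a, id_b, kinship) tuples, and both function names are hypothetical.

```python
# Hypothetical sketch: bin pairwise kinship coefficients into relationship
# degrees using KING-style cutoffs (an assumption; not necessarily the
# method or thresholds used for DiscovEHR).


def relationship_degree(kinship: float) -> str:
    """Map a kinship coefficient to a relationship degree.

    Cutoffs are geometric midpoints between expected kinship values:
    0.5 (duplicates/MZ twins), 0.25 (first degree), 0.125 (second degree)
    and 0.0625 (third degree).
    """
    if kinship > 2 ** -1.5:    # > ~0.354
        return "duplicate/MZ twin"
    if kinship > 2 ** -2.5:    # > ~0.177
        return "first-degree"
    if kinship > 2 ** -3.5:    # > ~0.088
        return "second-degree"
    if kinship > 2 ** -4.5:    # > ~0.044
        return "third-degree"
    return "unrelated"


def fraction_in_close_relationships(pairs, n_participants: int) -> float:
    """Fraction of participants in at least one first- or second-degree pair.

    `pairs` is an iterable of (id_a, id_b, kinship) tuples (hypothetical format).
    """
    close = {"first-degree", "second-degree"}
    involved = set()
    for id_a, id_b, kinship in pairs:
        if relationship_degree(kinship) in close:
            involved.update((id_a, id_b))
    return len(involved) / n_participants


if __name__ == "__main__":
    # Toy data for illustration only, not DiscovEHR results.
    toy_pairs = [("A", "B", 0.25), ("B", "C", 0.12), ("D", "E", 0.03)]
    print(fraction_in_close_relationships(toy_pairs, n_participants=5))  # 0.6
```

A participant is counted once, however many close relatives they have. This is why the share of participants involved can stay well below the number of close pairs divided by the cohort size.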
Subjects
Exome/genetics , Precision Medicine , Cohort Studies , Computer Simulation , Electronic Health Records , Exons/genetics , Family , Female , Genetics, Population , Geography , Heterozygote , Humans , Male , Mutation/genetics , Pedigree , Phenotype , Reproducibility of Results
ABSTRACT
With the emergence of the third infectious and virulent coronavirus within the past two decades, it has become increasingly important to understand how the virus causes infection. This will inform therapeutic strategies that target vulnerabilities in the vital processes through which the virus enters cells. This review identifies the enzymes responsible for SARS-CoV-2 entry into cells (ACE2, furin, TMPRSS2) and discusses compounds proposed to inhibit viral entry, with the end goal of treating COVID-19. We argue that TMPRSS2 inhibitors show the most promise for potentially treating COVID-19. They have the added advantage of being pre-existing medications with fewer predicted side effects.
Subjects
Angiotensin Receptor Antagonists/therapeutic use , Angiotensin-Converting Enzyme 2/antagonists & inhibitors , Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Janus Kinase Inhibitors/therapeutic use , SARS-CoV-2/drug effects , Animals , Drug Combinations , Humans , Methotrexate/therapeutic use , Receptors, Angiotensin/metabolism , Signal Transduction/drug effects
ABSTRACT
BACKGROUND: Loss-of-function variants in the angiopoietin-like 3 gene (ANGPTL3) have been associated with decreased plasma levels of triglycerides, low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol. It is not known whether such variants or therapeutic antagonism of ANGPTL3 are associated with a reduced risk of atherosclerotic cardiovascular disease. METHODS: We sequenced the exons of ANGPTL3 in 58,335 participants in the DiscovEHR human genetics study. We performed tests of association for loss-of-function variants in ANGPTL3 with lipid levels and with coronary artery disease in 13,102 case patients and 40,430 controls from the DiscovEHR study, with follow-up studies involving 23,317 case patients and 107,166 controls from four population studies. We also tested the effects of a human monoclonal antibody, evinacumab, against Angptl3 in dyslipidemic mice and against ANGPTL3 in healthy human volunteers with elevated levels of triglycerides or LDL cholesterol. RESULTS: In the DiscovEHR study, participants with heterozygous loss-of-function variants in ANGPTL3 had significantly lower serum levels of triglycerides, HDL cholesterol, and LDL cholesterol than participants without these variants. Loss-of-function variants were found in 0.33% of case patients with coronary artery disease and in 0.45% of controls (adjusted odds ratio, 0.59; 95% confidence interval, 0.41 to 0.85; P=0.004). These results were confirmed in the follow-up studies. In dyslipidemic mice, inhibition of Angptl3 with evinacumab resulted in a greater decrease in atherosclerotic lesion area and necrotic content than a control antibody. In humans, evinacumab caused a dose-dependent placebo-adjusted reduction in fasting triglyceride levels of up to 76% and LDL cholesterol levels of up to 23%. 
CONCLUSIONS: Genetic and therapeutic antagonism of ANGPTL3 in humans and of Angptl3 in mice was associated with decreased levels of all three major lipid fractions and decreased odds of atherosclerotic cardiovascular disease. (Funded by Regeneron Pharmaceuticals and others; ClinicalTrials.gov number, NCT01749878.)
Subjects
Angiopoietins/antagonists & inhibitors , Antibodies, Monoclonal/administration & dosage , Atherosclerosis/drug therapy , Coronary Artery Disease/genetics , Dyslipidemias/drug therapy , Lipids/blood , Mutation , Aged , Angiopoietin-Like Protein 3 , Angiopoietin-like Proteins , Angiopoietins/genetics , Animals , Antibodies, Monoclonal/adverse effects , Antibodies, Monoclonal/pharmacology , Atherosclerosis/metabolism , Cardiovascular Diseases/prevention & control , Coronary Artery Disease/metabolism , Disease Models, Animal , Dose-Response Relationship, Drug , Double-Blind Method , Dyslipidemias/blood , Female , Humans , Lipid Metabolism/drug effects , Male , Mice , Mice, Inbred Strains , Middle Aged
ABSTRACT
BACKGROUND: Higher-than-normal levels of circulating triglycerides are a risk factor for ischemic cardiovascular disease. Activation of lipoprotein lipase, an enzyme that is inhibited by angiopoietin-like 4 (ANGPTL4), has been shown to reduce levels of circulating triglycerides. METHODS: We sequenced the exons of ANGPTL4 in samples obtained from 42,930 participants of predominantly European ancestry in the DiscovEHR human genetics study. We performed tests of association between lipid levels and the missense E40K variant (which has been associated with reduced plasma triglyceride levels) and other inactivating mutations. We then tested for associations between coronary artery disease and the E40K variant and other inactivating mutations in 10,552 participants with coronary artery disease and 29,223 controls. We also tested the effect of a human monoclonal antibody against ANGPTL4 on lipid levels in mice and monkeys. RESULTS: We identified 1661 heterozygotes and 17 homozygotes for the E40K variant and 75 participants who had 13 other monoallelic inactivating mutations in ANGPTL4. The levels of triglycerides were 13% lower and the levels of high-density lipoprotein (HDL) cholesterol were 7% higher among carriers of the E40K variant than among noncarriers. Carriers of the E40K variant were also significantly less likely than noncarriers to have coronary artery disease (odds ratio, 0.81; 95% confidence interval, 0.70 to 0.92; P=0.002). K40 homozygotes had markedly lower levels of triglycerides and higher levels of HDL cholesterol than did heterozygotes. Carriers of other inactivating mutations also had lower triglyceride levels and higher HDL cholesterol levels and were less likely to have coronary artery disease than were noncarriers. Monoclonal antibody inhibition of Angptl4 in mice and monkeys reduced triglyceride levels. 
CONCLUSIONS: Carriers of E40K and other inactivating mutations in ANGPTL4 had lower levels of triglycerides and a lower risk of coronary artery disease than did noncarriers. The inhibition of Angptl4 in mice and monkeys also resulted in corresponding reductions in these values. (Funded by Regeneron Pharmaceuticals.).
Subjects
Angiopoietins/genetics , Coronary Artery Disease/genetics , Gene Silencing , Mutation , Aged , Angiopoietin-Like Protein 4 , Angiopoietins/antagonists & inhibitors , Animals , Cholesterol/blood , Disease Models, Animal , Female , Heterozygote , Humans , Macaca mulatta , Male , Mice , Middle Aged , Risk Factors , Triglycerides/blood
ABSTRACT
The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.
Subjects
Gene Silencing/physiology , Genetic Privacy , Molecular Sequence Annotation , Genetic Privacy/trends , Genetic Variation , Genome, Human/genetics , Humans , Polymorphism, Single Nucleotide
ABSTRACT
MOTIVATION: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm, Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS), which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum. RESULTS: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan quantitative polymerase chain reaction to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the Supplementary Materials (available at the CLAMMS Github repository), we present our methods and validation results in greater detail. AVAILABILITY AND IMPLEMENTATION: https://github.com/rgcgithub/clamms (implemented in C). CONTACT: jeffrey.reid@regeneron.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
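In the qPCR comparison, precision is the fraction of predicted CNVs that validate, TP / (TP + FP). Recall is the fraction of true CNVs that were called, TP / (TP + FN). The sketch below computes both per locus and then averages them. It assumes per-locus true-positive, false-positive and false-negative counts are available, and that the reported means are unweighted averages across the common-variant loci; the abstract states neither detail. The function names are hypothetical, and the sketch is not part of CLAMMS.

```python
# Hypothetical sketch: per-locus precision/recall for CNV calls checked
# against an orthogonal assay (e.g. qPCR), then averaged across loci.
# Treating the reported "mean" as an unweighted average over loci is an
# assumption.
from statistics import mean


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall); NaN when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def mean_precision_recall(per_locus: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Unweighted mean of per-locus precision and recall."""
    scores = [precision_recall(tp, fp, fn) for tp, fp, fn in per_locus]
    return mean(p for p, _ in scores), mean(r for _, r in scores)


if __name__ == "__main__":
    # Toy (tp, fp, fn) counts for two loci, for illustration only.
    loci = [(9, 1, 1), (10, 0, 0)]
    print(mean_precision_recall(loci))
```

Averaging per locus gives each locus equal weight. Pooling all counts before dividing would instead weight loci by how many samples carry the variant.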
Subjects
Algorithms , DNA Copy Number Variations/genetics , Exome/genetics , Sequence Analysis, DNA/methods , Humans , Markov Chains , Reproducibility of Results
ABSTRACT
Prostate cancer is the second most common cause of male cancer deaths in the United States. However, the full range of prostate cancer genomic alterations is incompletely characterized. Here we present the complete sequence of seven primary human prostate cancers and their paired normal counterparts. Several tumours contained complex chains of balanced (that is, 'copy-neutral') rearrangements that occurred within or adjacent to known cancer genes. Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumours lacking ETS fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic aberrations. Three tumours contained rearrangements that disrupted CADM2, and four harboured events disrupting either PTEN (unbalanced events), a prostate tumour suppressor, or MAGI2 (balanced events), a PTEN interacting protein not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or chromatin aberrancies and engage prostate tumorigenic mechanisms.
Subjects
Genome, Human/genetics , Prostatic Neoplasms/genetics , Adaptor Proteins, Signal Transducing , Carrier Proteins/genetics , Case-Control Studies , Cell Adhesion Molecules/genetics , Chromatin/genetics , Chromatin/metabolism , Chromosome Aberrations , Chromosome Breakpoints , Epigenesis, Genetic/genetics , Gene Expression Regulation, Neoplastic , Guanylate Kinases , Humans , Male , PTEN Phosphohydrolase/genetics , PTEN Phosphohydrolase/metabolism , Recombination, Genetic/genetics , Signal Transduction/genetics , Transcription, Genetic
ABSTRACT
In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either "retrogenes" coding for functioning proteins, or expressed "processed pseudogenes," which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We have developed new methodologies that allow us to identify "novel" retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of The 1000 Genomes Project Consortium. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.
Subjects
Cell Division/genetics , Gene Duplication , Retroelements/genetics , Computational Biology/methods , Evolution, Molecular , Genome, Human , Genotype , Humans , Phylogeny , Pseudogenes , Reproducibility of Results , Sequence Analysis, DNA
ABSTRACT
Gene expression differences are shaped by selective pressures and contribute to phenotypic differences between species. We identified 964 copy number differences (CNDs) of conserved sequences across three primate species and examined their potential effects on gene expression profiles. Samples carrying genes that differed in copy number had significantly different expression than samples with neutral copy number. Genes encoding regulatory molecules differed in copy number and were associated with significant expression differences. Additionally, we identified 127 CNDs that were processed pseudogenes, some of which were expressed. Furthermore, regulatory regions such as ultraconserved elements and long intergenic noncoding RNAs also differed in copy number and had the potential to affect expression. We postulate that CNDs of these conserved sequences fine-tune developmental pathways by altering the levels of RNA.
Subjects
DNA, Intergenic/physiology , Gene Dosage/physiology , Gene Expression Regulation/physiology , Pseudogenes/physiology , RNA, Untranslated/physiology , Regulatory Elements, Transcriptional/physiology , Animals , Cell Line , Humans , Macaca mulatta , Pan troglodytes , Species Specificity
ABSTRACT
Half of prostate cancers harbor gene fusions between TMPRSS2 and members of the ETS transcription factor family. To date, little is known about the presence of non-ETS fusion events in prostate cancer. We used next-generation transcriptome sequencing (RNA-seq) to explore the whole transcriptome of 25 human prostate cancer samples for the presence of chimeric fusion transcripts. We generated more than 1 billion sequence reads and used a novel computational approach (FusionSeq) to identify novel gene fusion candidates with high confidence. In total, we discovered and characterized seven new cancer-specific gene fusions. Two involve the ETS genes ETV1 and ERG. Four involve non-ETS genes such as CDKN1A (p21), CD9, and IKBKB (IKK-beta), which are known to play key roles in cellular homeostasis or are assumed to be critical in the tumorigenesis of other tumor entities. The new fusions also involve the oncogene PIGU and the tumor suppressor gene RSRC2. The novel gene fusions occur at low frequency but, interestingly, the non-ETS fusions were all present in prostate cancers harboring the TMPRSS2-ERG gene fusion. Future work will focus on determining whether ETS rearrangements in prostate cancer are associated with, or directly predispose to, a rearrangement-prone phenotype.
Subjects
Gene Fusion , Prostatic Neoplasms/genetics , Proto-Oncogene Proteins c-ets/genetics , Sequence Analysis, RNA/methods , Antigens, CD/genetics , Computational Biology/methods , Cyclin-Dependent Kinase Inhibitor p21/genetics , Gene Expression Profiling , Humans , I-kappa B Kinase/genetics , In Situ Hybridization, Fluorescence , Male , Membrane Glycoproteins/genetics , Molecular Sequence Data , Prostatic Neoplasms/pathology , Reverse Transcriptase Polymerase Chain Reaction , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism , Tetraspanin 29 , Trans-Activators/metabolism , Transcriptional Regulator ERG
ABSTRACT
Coronavirus disease 2019 (COVID-19) and influenza are respiratory illnesses caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses, respectively. Both diseases share symptoms and clinical risk factors1, but the extent to which these conditions have a common genetic etiology is unknown. This is partly because host genetic risk factors are well characterized for COVID-19 but not for influenza, with the largest published genome-wide association studies for these conditions including >2 million individuals2 and about 1,000 individuals3-6, respectively. Shared genetic risk factors could point to targets to prevent or treat both infections. Through a genetic study of 18,334 cases with a positive test for influenza and 276,295 controls, we show that published COVID-19 risk variants are not associated with influenza. Furthermore, we discovered and replicated an association between influenza infection and noncoding variants in B3GALT5 and ST6GAL1, neither of which was associated with COVID-19. In vitro small interfering RNA knockdown of ST6GAL1 reduced influenza infectivity by 57%. ST6GAL1 is an enzyme that adds sialic acid to the cell surface, which the virus uses for entry. These results mirror the observation that variants that downregulate ACE2, the SARS-CoV-2 receptor, protect against COVID-19 (ref. 7). Collectively, these findings highlight downregulation of key cell surface receptors used for viral entry as treatment opportunities to prevent COVID-19 and influenza.
Subjects
COVID-19 , Genetic Predisposition to Disease , Genome-Wide Association Study , Influenza, Human , SARS-CoV-2 , Humans , Influenza, Human/genetics , Influenza, Human/epidemiology , Influenza, Human/virology , COVID-19/genetics , COVID-19/virology , Risk Factors , SARS-CoV-2/genetics , Male , Female , Polymorphism, Single Nucleotide , Case-Control Studies , Middle Aged
ABSTRACT
UNLABELLED: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can be run either through a command-line interface or as a web application. Finally, to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. AVAILABILITY AND IMPLEMENTATION: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
Subjects
Genome, Human , Genomics/methods , Information Storage and Retrieval/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Genotype , Humans , Internet
ABSTRACT
To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of human embryonic stem cells (hESCs) into the neural lineage. Undifferentiated hESCs and cells at three stages of early neural differentiation were analyzed using a combination of single-read, paired-end and long-read RNA sequencing. The three stages were N1 (early initiation), N2 (neural progenitor) and N3 (early glial-like). The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific to each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.