ABSTRACT
Stage T1 bladder cancers have the highest progression and recurrence rates of all non-muscle-invasive bladder cancers (NMIBCs). Most T1 cancers are treated with bacillus Calmette-Guérin (BCG), but many will progress or recur, and some T1 patients will die from bladder cancer. Particularly aggressive tumors could be treated with early cystectomy. To better understand the molecular heterogeneity of T1 cancers, we performed transcriptome profiling and unsupervised clustering, and identified five consensus subtypes of T1 tumors treated with repeat transurethral resection (reTUR) and induction and maintenance BCG. The T1-LumGU subtype was associated with carcinoma in situ (CIS; six/13, 46% of all CIS), had high E2F1 and EZH2 expression, and was enriched in E2F target and G2M checkpoint hallmarks. The T1-Inflam subtype was inflamed and infiltrated with immune cells. While most T1 tumors were classified as luminal papillary, the T1-TLum subtype had the highest median luminal papillary score and FGFR3 expression, no recurrence events, and the fewest copy number gains. T1-Myc and T1-Early subtypes had the most recurrences (14/30 within 24 mo), the highest median MYC expression, and, when combined, had significantly worse recurrence-free survival than the other three subtypes. T1-Early had five (38%) recurrences within the first 6 mo of BCG, and repressed IFN-α and IFN-γ hallmarks and inflammation. We developed a single-patient T1 classifier and validated our subtype biology in a second cohort of T1 tumors. Future research will be necessary to validate the proposed T1 subtypes and to determine if therapies can be individualized for each subtype. PATIENT SUMMARY: We identified and characterized expression subtypes of high-grade stage T1 bladder cancer that are biologically heterogeneous and have variable responses to bacillus Calmette-Guérin treatment. We validated the subtypes and describe a single-patient classifier.
Subjects
Urinary Bladder Neoplasms/classification; Urinary Bladder Neoplasms/pathology; Combined Modality Therapy; Humans; Neoplasm Staging; Transcriptome; Urinary Bladder Neoplasms/genetics; Urinary Bladder Neoplasms/therapy
ABSTRACT
Plasma lipid levels are risk factors for cardiovascular disease, a leading cause of death worldwide. While many studies have been conducted on the genetic variation underlying lipid levels, they mainly comprise individuals of European ancestry, and thus their transferability to non-European populations is unclear. We performed genome-wide (GWAS) and imputed transcriptome-wide association studies of four lipid traits in the Hispanic Community Health Study/Study of Latinos cohort (HCHS/SoL, n = 11,103), replicated top hits in the Multi-Ethnic Study of Atherosclerosis (MESA, n = 3,855), and compared the results to the larger, predominantly European ancestry meta-analysis by the Global Lipids Genetics Consortium (GLGC, n = 196,475). In our GWAS, we found significant SNP associations in regions within or near known lipid genes, but in our admixture mapping analysis, we did not find significant associations between local ancestry and lipid phenotypes. In the imputed transcriptome-wide association study in multiple tissues and in different ethnicities, we found 59 significant gene-tissue-phenotype associations (P < 3.61 × 10⁻⁸) with 14 unique significant genes, many of which occurred across multiple phenotypes, tissues, and ethnicities and replicated in MESA (45/59) and in GLGC (44/59). These include well-studied lipid genes such as SORT1, CETP, and PSRC1, as well as genes that have been implicated in cardiovascular phenotypes, such as CCL22 and ICAM1. The majority (40/59) of significant associations colocalized with expression quantitative trait loci (eQTLs), indicating a possible mechanism of gene regulation in lipid level variation. To fully characterize the genetic architecture of lipid traits in diverse populations, larger studies in non-European ancestry populations are needed.
Subjects
Gene Expression Regulation; Hispanic or Latino/genetics; Lipids/blood; Ethnicity; Female; Genome-Wide Association Study; Humans; Lipids/genetics; Male; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci; White People/genetics
ABSTRACT
Nearly 100 loci have been identified for pulmonary function, almost exclusively in studies of European ancestry populations. We extend previous research by meta-analyzing genome-wide association studies of 1000 Genomes imputed variants in relation to pulmonary function in a multiethnic population of 90,715 individuals of European (N = 60,552), African (N = 8,429), Asian (N = 9,959), and Hispanic/Latino (N = 11,775) ethnicities. We identify over 50 additional loci at genome-wide significance in ancestry-specific or multiethnic meta-analyses. Using recent fine-mapping methods incorporating functional annotation, gene expression, and differences in linkage disequilibrium between ethnicities, we further shed light on potential causal variants and genes at known and newly identified loci. Several of the novel genes, including KCNK2 and CDK12, encode proteins that are predicted or established drug targets. Our study highlights the utility of multiethnic and integrative genomics approaches to extend existing knowledge of the genetics of lung function and the clinical relevance of implicated loci.
Subjects
Genome-Wide Association Study; Linkage Disequilibrium; Lung Diseases/ethnology; Lung Diseases/genetics; Lung/physiology; Polymorphism, Single Nucleotide; Asian People; Black People/genetics; Female; Forced Expiratory Volume; Genetic Predisposition to Disease; Genomics; Hispanic or Latino; Humans; Male; Pulmonary Disease, Chronic Obstructive; Quantitative Trait Loci; Regression Analysis; Sample Size; Smoking; Vital Capacity; White People/genetics
ABSTRACT
For many complex traits, gene regulation is likely to play a crucial mechanistic role. How the genetic architectures of complex traits vary between populations, and the subsequent effects on genetic prediction, are not well understood, in part due to the historical paucity of GWAS in populations of non-European ancestry. We used data from the MESA (Multi-Ethnic Study of Atherosclerosis) cohort to characterize the genetic architecture of gene expression within and between diverse populations. Genotype and monocyte gene expression were available in individuals with African American (AFA, n = 233), Hispanic (HIS, n = 352), and European (CAU, n = 578) ancestry. We performed expression quantitative trait loci (eQTL) mapping in each population and show that genetic correlation of gene expression depends on shared ancestry proportions. Using elastic net modeling with cross-validation to optimize genotypic predictors of gene expression in each population, we show that the genetic architecture of gene expression for most predictable genes is sparse. We found that the best predicted gene in each population, TACSTD2 in AFA and CHURC1 in CAU and HIS, had similar prediction performance across populations, with R² > 0.8 in each population. However, we identified a subset of genes that are well predicted in one population but poorly predicted in another. We show that these differences in predictive performance are due to allele frequency differences between populations. Using genotype weights trained in MESA to predict gene expression in independent populations showed that a training set with ancestry similar to the test set is better at predicting gene expression in test populations, demonstrating an urgent need for diverse population sampling in genomics. Our predictive models and performance statistics in diverse cohorts are made publicly available for use in transcriptome mapping methods at https://github.com/WheelerLab/DivPop.
Subjects
Ethnicity/genetics; Gene Expression Regulation; Genetics, Population; Black or African American/genetics; Antigens, Neoplasm/genetics; Antigens, Neoplasm/metabolism; Cell Adhesion Molecules/genetics; Cell Adhesion Molecules/metabolism; Chromosome Mapping; Gene Frequency; Genome-Wide Association Study; Genomics; Genotyping Techniques; Hispanic or Latino/genetics; Humans; Models, Genetic; Multifactorial Inheritance; Phenotype; Quantitative Trait Loci; Transcriptome; White People/genetics
ABSTRACT
Plasma lipid levels are risk factors for cardiovascular disease, a leading cause of death worldwide. While many studies have been conducted on lipid genetics, they mainly focus on Europeans, and thus their transferability to diverse populations is unclear. We performed SNP- and gene-level genome-wide association studies (GWAS) of four lipid traits in cohorts from Nigeria and the Philippines and compared them to the results of larger, predominantly European meta-analyses. Two previously implicated loci met genome-wide significance in our SNP-level GWAS in the Nigerian cohort: rs34065661 in CETP, associated with HDL cholesterol (P = 9.0 × 10⁻¹⁰), and rs1065853 upstream of APOE, associated with LDL cholesterol (P = 6.6 × 10⁻⁹). The top SNP in the Filipino cohort was associated with triglyceride levels (rs662799; P = 2.7 × 10⁻¹⁶) and has been previously implicated in other East Asian studies. While this SNP is located directly upstream of the well-known APOA5, we show it may also be involved in the regulation of BACE1 and SIDT2. Our gene-based association analysis, PrediXcan, revealed that decreased expression of BACE1 and of SIDT2 in several tissues, all driven by rs662799, is significantly associated with increased triglyceride levels in Filipinos (FDR < 0.1). In addition, our PrediXcan analysis implicated gene regulation as the mechanism underlying the associations of many other previously discovered lipid loci. Our novel BACE1 and SIDT2 findings were confirmed using summary statistics from the Global Lipids Genetics Consortium (GLGC) meta-GWAS.
ABSTRACT
DNA-protein loops can be essential for gene regulation. The Escherichia coli lactose (lac) operon is controlled by DNA-protein loops that have been studied for decades. Here we adapt this model to test the hypothesis that negative superhelical strain facilitates the formation of short-range (6-8 DNA turns) repression loops in E. coli. The natural negative superhelicity of E. coli DNA is regulated by the interplay of gyrase and topoisomerase enzymes, which add and remove negative supercoils, respectively. We quantitatively measured DNA looping in three E. coli strains characterized by different levels of global supercoiling: wild type, gyrase mutant (gyrB226), and topoisomerase mutant (ΔtopA10). DNA looping in each strain was measured by assaying repression of the endogenous lac operon and repression of ten reporter constructs with DNA loop sizes between 70 and 85 base pairs. Our data are most simply interpreted as supporting the hypothesis that negative supercoiling facilitates gene repression by small DNA-protein loops in living bacteria.
Subjects
DNA, Bacterial/metabolism; DNA, Superhelical/metabolism; Escherichia coli/genetics; DNA Gyrase/genetics; DNA Gyrase/metabolism; DNA Topoisomerases, Type I/genetics; DNA Topoisomerases, Type I/metabolism; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; DNA, Superhelical/chemistry; DNA, Superhelical/genetics; Electrophoresis, Agar Gel; Genes, Reporter; Lac Operon/genetics; Mutation; Nucleic Acid Conformation
ABSTRACT
This work probes the mystery of what balance of forces creates the extraordinary mechanical stiffness of DNA to bending and twisting. Here we explore the relationship between base stacking, functional group occupancy of the DNA minor and major grooves, and DNA mechanical properties. We study double-helical DNA molecules substituting either inosine for guanosine or 2,6-diaminopurine for adenine. These DNA variants, respectively, remove an amino group from or add an amino group to the DNA minor groove, with corresponding changes in hydrogen bonding and base stacking energy. Using the techniques of ligase-catalyzed cyclization kinetics, atomic force microscopy, and force spectroscopy with optical tweezers, we show that these DNA variants have bending persistence lengths within the range of values reported for sequence-dependent variation of the natural DNA bases. Comparison with seven additional DNA variants that modify the DNA major groove reveals that DNA bending stiffness is not correlated with base stacking energy or groove occupancy. Data from circular dichroism spectroscopy indicate that base analog substitution can alter DNA helical geometry, suggesting a complex relationship among base stacking, groove occupancy, helical structure, and DNA bending stiffness.
Subjects
2-Aminopurine/analogs & derivatives; DNA/chemistry; Nucleic Acid Conformation; Nucleosides/chemistry; Static Electricity; 2-Aminopurine/chemistry; Base Pairing; Hydrogen Bonding; Stress, Mechanical
ABSTRACT
BACKGROUND: Retrotransposons are mobile DNA elements that spread through genomes via the action of element-encoded reverse transcriptases. They are ubiquitous constituents of most eukaryotic genomes, especially those of higher plants. The pericentromeric regions of soybean (Glycine max) chromosomes contain >3,200 intact copies of the Gmr9/GmOgre retrotransposon. Between the 3' end of the coding region and the long terminal repeat, this retrotransposon family contains a polymorphic minisatellite region composed of five distinct, interleaved minisatellite families. To better understand the possible role and origin of retrotransposon-associated minisatellites, a computational project to map and physically characterize all members of these families in the G. max genome, irrespective of their association with Gmr9, was undertaken. METHODS: A computational pipeline was developed to map and analyze the organization and distribution of five Gmr9-associated minisatellites throughout the soybean genome. Polymerase chain reaction amplifications were used to experimentally assess the computational outputs. RESULTS: A total of 63,841 copies of Gmr9-associated minisatellites were recovered from the assembled G. max genome. Ninety percent were associated with Gmr9, an additional 9% with other annotated retrotransposons, and 1% with uncharacterized repetitive DNAs. Monomers were tandemly interleaved and repeated up to 149 times per locus. CONCLUSIONS: The computational pipeline enabled a fast, accurate, and detailed characterization of known minisatellites in a large, downloaded DNA database, and PCR amplification supported the general organization of these arrays.
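The mapping step described in METHODS can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes exact-match monomers (real minisatellites are polymorphic), and the function name `find_tandem_arrays`, the family labels, and the toy sequences are all hypothetical. It scans a sequence for runs of adjacent monomers from several families and reports where each array lies and in what order the families are interleaved.

```python
import re
from typing import Dict, List, Tuple


def find_tandem_arrays(
    genome: str, monomers: Dict[str, str], min_copies: int = 2
) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, family order) for each run of adjacent monomers.

    Assumption: monomers match exactly; a real pipeline would need
    approximate matching to tolerate polymorphic repeat units.
    """
    names_by_seq = {seq: name for name, seq in monomers.items()}
    # Try longer monomers first so a short motif cannot shadow a longer one.
    alternation = "|".join(
        re.escape(s) for s in sorted(names_by_seq, key=len, reverse=True)
    )
    array_re = re.compile(f"(?:{alternation}){{{min_copies},}}")
    unit_re = re.compile(alternation)

    arrays = []
    for match in array_re.finditer(genome):
        # Split the array back into its monomers to record the interleaving.
        order = [names_by_seq[u.group()] for u in unit_re.finditer(match.group())]
        arrays.append((match.start(), match.end(), order))
    return arrays


# Toy example: families "A" and "B" interleaved once, plus a lone "B" copy.
toy_genome = "TTT" + "ACGT" + "GGCA" + "ACGT" + "CCC" + "GGCA"
toy_monomers = {"A": "ACGT", "B": "GGCA"}
print(find_tandem_arrays(toy_genome, toy_monomers))
```

In this toy input, only the three-unit run starting at position 3 is reported; the isolated final copy falls below `min_copies` and is skipped.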
Subjects
Genome, Plant; Glycine max/genetics; Minisatellite Repeats; Retroelements; Chromosome Mapping; DNA, Plant/chemistry
ABSTRACT
Retrotransposons constitute the majority of pseudogenic protein-coding regions of most eukaryotic genomes. Most genomes carry tens to thousands of retrotransposon copies derived from dozens of distinct families, but most if not all of these copies are non-functional and contain disabling mutations, including large numbers of indels. Until recently, most regions rich in these elements were virtually ignored in all but the most complete genome sequencing projects, and the full extent of their impact on the structure and function of the genomes of higher eukaryotes was under-appreciated. Even when new retrotransposons are encountered and annotated by automated gene-finding programs and similarity searches, coding regions are treated as exons and, not surprisingly, are invariably mistranslated because of numerous frameshift mutations and large indels. Very few functional retrotransposons contain introns, contrary to what in silico annotations imply. While many repetitive DNA consensus sequences have been assembled from collections of largely full-length copies using full-length templates, we have shown that repetitive DNA consensus sequence contigs representing long, moderately high copy-number elements can also be generated ex novo, in the absence of templates, from very short overlapping sequences. We have devised an in silico strategy to recover and reconstruct consensus sequences of elements up to 20,000 bp by building dense contigs of hundreds of overlapping 400- to 900-bp records found in the GenBank Genome Survey Sequence database. The results are hypothetical ancestral sequences that encode elements that appear to be fully functional, with intact open reading frames and other conserved features.
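The idea of building a contig from short overlapping records without a template can be sketched as follows. This is an illustration only, not the strategy described above: it swaps in a simple greedy merge on exact suffix-prefix overlaps, whereas a real consensus would need error-tolerant alignment and per-column voting across many records. The function names `overlap` and `greedy_assemble` and the toy reads are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


# Toy example: three short reads that tile a 13-bp sequence.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(toy_reads))
```

Greedy merging is chosen here only because it is short and easy to follow; in repeat-rich regions it can join reads from different element copies, which is one reason a dense, consensus-based approach is preferable in practice.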