ABSTRACT
Short-read sequencing is the workhorse of cancer genomics yet is thought to miss many structural variants (SVs), particularly large chromosomal alterations. To characterize missing SVs in short-read whole genomes, we analyzed 'loose ends': local violations of mass balance between adjacent DNA segments. In the landscape of loose ends across 1,330 high-purity cancer whole genomes, most large (>10-kb) clonal SVs were fully resolved by short reads in the 87% of the human genome where copy number could be reliably measured. Some loose ends represent neotelomeres, which we propose as a hallmark of the alternative lengthening of telomeres phenotype. These pan-cancer findings were confirmed by long-molecule profiles of 38 breast cancer and melanoma cases. Our results indicate that aberrant homologous recombination is unlikely to drive the majority of large cancer SVs. Furthermore, analysis of mass balance in short-read whole genome data provides a surprisingly complete picture of cancer chromosomal structure.
Subjects
Breast Neoplasms , Genomics , Humans , Female , Genomics/methods , Sequence Analysis, DNA/methods , Genome, Human/genetics , Chromosome Aberrations , High-Throughput Nucleotide Sequencing/methods , Genomic Structural Variation/genetics
ABSTRACT
High-order three-dimensional (3D) interactions between more than two genomic loci are common in human chromatin, but their role in gene regulation is unclear. Previous high-order 3D chromatin assays either measure distant interactions across the genome or proximal interactions at selected targets. To address this gap, we developed Pore-C, which combines chromatin conformation capture with nanopore sequencing of concatemers to profile proximal high-order chromatin contacts at the genome scale. We also developed the statistical method Chromunity to identify sets of genomic loci with frequencies of high-order contacts significantly higher than background ('synergies'). Applying these methods to human cell lines, we found that synergies were enriched in enhancers and promoters in active chromatin and in highly transcribed and lineage-defining genes. In prostate cancer cells, these included binding sites of androgen-driven transcription factors and the promoters of androgen-regulated genes. Concatemers of high-order contacts in highly expressed genes were demethylated relative to pairwise contacts at the same loci. Synergies in breast cancer cells were associated with tyfonas, a class of complex DNA amplicons. These results rigorously link genome-wide high-order 3D interactions to lineage-defining transcriptional programs and establish Pore-C and Chromunity as scalable approaches to assess high-order genome structure.
Subjects
Nanopore Sequencing , Nanopores , Androgens , Chromatin/genetics , Humans , Transcription Factors/genetics
ABSTRACT
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.
Subjects
Genomic Structural Variation/genetics , Genomics/methods , Neoplasms/genetics , Chromosome Inversion/genetics , Chromothripsis , DNA Copy Number Variations/genetics , Gene Rearrangement/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Whole Genome Sequencing/methods
ABSTRACT
Kinases play a critical role in cellular signaling and are dysregulated in a number of diseases, such as cancer, diabetes, and neurodegeneration. Therapeutics targeting kinases currently account for roughly 50% of cancer drug discovery efforts. The ability to explore human kinase biochemistry and biophysics in the laboratory is essential to designing selective inhibitors and studying drug resistance. Bacterial expression systems are superior to insect or mammalian cells in terms of simplicity and cost effectiveness but have historically struggled with human kinase expression. Following the discovery that phosphatase coexpression produced high yields of Src and Abl kinase domains in bacteria, we have generated a library of 52 His-tagged human kinase domain constructs that express above 2 µg/mL of culture in an automated bacterial expression system utilizing phosphatase coexpression (YopH for Tyr kinases and lambda phosphatase for Ser/Thr kinases). Here, we report a structural bioinformatics approach to identifying kinase domain constructs previously expressed in bacteria and likely to express well in our protocol; experiments demonstrating that our simple construct selection strategy yields constructs with good expression in a test of 84 potential kinase domain boundaries for Abl; and yields from a high-throughput expression screen of 96 human kinase constructs. Using a fluorescence-based thermostability assay and a fluorescent ATP-competitive inhibitor, we show that the highest-expressing kinases are folded and have well-formed ATP binding sites. We also demonstrate that these constructs can enable characterization of clinical mutations by expressing a panel of 48 Src and 46 Abl mutations. The wild-type kinase construct library is available publicly via Addgene.
Subjects
Bacteria/metabolism , Binding Sites , Escherichia coli/metabolism , Humans , Phosphorylation , Protein Structure, Secondary , Protein-Tyrosine Kinases/metabolism , Proto-Oncogene Proteins c-abl/metabolism , src-Family Kinases/metabolism
ABSTRACT
Many eukaryotic protein kinases are activated by phosphorylation on a specific conserved residue in the regulatory activation loop, a post-translational modification thought to stabilize the active DFG-In state of the catalytic domain. Here we use a battery of spectroscopic methods that track different catalytic elements of the kinase domain to show that the ~100-fold activation of the mitotic kinase Aurora A (AurA) by phosphorylation occurs without a population shift from the DFG-Out to the DFG-In state, and that the activation loop of the activated kinase remains highly dynamic. Instead, molecular dynamics simulations and electron paramagnetic resonance experiments show that phosphorylation triggers a switch within the DFG-In subpopulation from an autoinhibited DFG-In substate to an active DFG-In substate, leading to catalytic activation. This mechanism raises new questions about the functional role of the DFG-Out state in protein kinases.
Subjects
Allosteric Regulation , Aurora Kinase A/chemistry , Aurora Kinase A/metabolism , Enzyme Activation , Protein Processing, Post-Translational , Electron Spin Resonance Spectroscopy , Molecular Dynamics Simulation , Phosphorylation , Spectrum Analysis
ABSTRACT
The catalytic activity of many protein kinases is controlled by conformational changes of a conserved Asp-Phe-Gly (DFG) motif. We used an infrared probe to track the DFG motif of the mitotic kinase Aurora A (AurA) and found that allosteric activation by the spindle-associated protein Tpx2 involves an equilibrium shift toward the active DFG-in state. Förster resonance energy transfer experiments show that the activation loop undergoes a nanometer-scale movement that is tightly coupled to the DFG equilibrium. Tpx2 further activates AurA by stabilizing a water-mediated allosteric network that links the C-helix to the active site through an unusual polar residue in the regulatory spine. The polar spine residue and water network of AurA are essential for phosphorylation-driven activation, but an alternative form of the water network found in related kinases can support Tpx2-driven activation, suggesting that variations in the water-mediated hydrogen bond network mediate regulatory diversification in protein kinases.
Subjects
Aurora Kinase A/metabolism , Water/metabolism , Allosteric Regulation , Enzyme Activation , Humans , Models, Molecular , Water/chemistry
ABSTRACT
Atomistic molecular simulations are a powerful way to make quantitative predictions, but the accuracy of these predictions depends entirely on the quality of the force field employed. Although experimental measurements of fundamental physical properties offer a straightforward approach for evaluating force field quality, the bulk of this information has been tied up in formats that are not machine-readable. Compiling benchmark data sets of physical properties from non-machine-readable sources requires substantial human effort and is prone to the accumulation of human errors, hindering the development of reproducible benchmarks of force-field accuracy. Here, we examine the feasibility of benchmarking atomistic force fields against the NIST ThermoML data archive of physicochemical measurements, which aggregates thousands of experimental measurements in a portable, machine-readable, self-annotating IUPAC-standard format. As a proof of concept, we present a detailed benchmark of the generalized Amber small-molecule force field (GAFF) using the AM1-BCC charge model against experimental measurements (specifically, bulk liquid densities and static dielectric constants at ambient pressure) automatically extracted from the archive and discuss the extent of data available for use in larger-scale (or continuously performed) benchmarks. The results of even this limited initial benchmark highlight a general problem with fixed-charge force fields in the representation of low-dielectric environments, such as those seen in binding cavities or biological membranes.