ABSTRACT
Genomic DNA breakages and the insertion and deletion mutations that follow them are important contributors to genome instability and associated diseases. Unlike in research on point mutations, the relationship between DNA sequence context and the propensity for strand breaks remains elusive. Here, by analyzing the differences and commonalities across numerous genomic breakage datasets, we extract the sequence-linked rules and patterns behind DNA fragility. We deconvolute the overall sequence influence into short-, mid- and long-range effects, and show stressor-dependent differences in the range and compositional effects that define DNA fragility. We summarize and release our feature compendium as a library that can be seamlessly incorporated into genomic machine learning pipelines wherever DNA fragility is of concern, and we train a generalized DNA fragility model on cancer-associated breakages. Structural variants (SVs) tend to stabilize the regions in which they emerge, with the effect most pronounced for pathogenic SVs. In contrast, the effects of chromothripsis are seen across regions less prone to breakages. We find that viral integration may introduce genome fragility, particularly for cancer-associated viruses. Overall, this work offers novel insights into the genomic sequence basis of DNA fragility and presents a powerful machine learning resource to further enhance our understanding of genome (in)stability and evolution.
ABSTRACT
Chargaff's second parity rule (PR-2) states that complementary base and k-mer contents match within the same strand of double-stranded DNA (dsDNA), and it is a phenomenon that has invited many explanations. Nearly all nuclear dsDNA strictly complies with PR-2, which implies that its explanation should be equally robust. In this work, we revisited the possibility that mutation rates drive PR-2 compliance. Starting from an assumption-free approach, we constructed kinetic equations for unconstrained simulations. We analysed the results for PR-2 compliance using symbolic regression and machine learning techniques. We arrived at a generalised set of mutation rate interrelations that hold in most species and allow for their full PR-2 compliance. Importantly, our constraints explain PR-2 in genomes that lie beyond the scope of prior explanations, which were based on equilibration under mutation rates with simpler no-strand-bias constraints. We thus reinstate the role of mutation rates in PR-2 through its molecular core, which, under our formulation, is now shown to tolerate previously noted strand biases and incomplete compositional equilibration. We further investigate the time required for a genome to reach PR-2, showing that PR-2 is generally reached earlier than compositional equilibrium and well within the age of life on Earth.
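A minimal Python sketch can make PR-2 concrete. It checks intra-strand parity by comparing the count of each k-mer with the count of its reverse complement on the same strand. This is our own illustration under stated assumptions: the mean relative-difference score and all function names are hypothetical. It is not the kinetic-equation and symbolic-regression framework described above.

```python
from collections import Counter

# Watson-Crick complement table for uppercase DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an uppercase DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def pr2_deviation(seq: str, k: int) -> float:
    """Mean |n(w) - n(rc(w))| / (n(w) + n(rc(w))) over k-mer/reverse-complement pairs.

    Hypothetical score: 0.0 means perfect intra-strand parity at this k, and
    1.0 means every observed k-mer lacks its reverse complement on the strand.
    Self-complementary k-mers (w == rc(w)) are trivially balanced and skipped.
    """
    counts = kmer_counts(seq, k)
    visited = set()
    scores = []
    for kmer in list(counts):
        rc = reverse_complement(kmer)
        if kmer in visited or kmer == rc:
            continue
        visited.update((kmer, rc))
        n, n_rc = counts[kmer], counts[rc]
        scores.append(abs(n - n_rc) / (n + n_rc))
    return sum(scores) / len(scores) if scores else 0.0
```

A sequence that equals its own reverse complement, such as AACCGGTT, scores 0.0 at k = 2. A homopolymer such as AAAA scores 1.0 at k = 1.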
Subject(s)
DNA , Genome , Mutation Rate , DNA/chemistry , DNA/genetics , Genomics , Humans , Animals , Eukaryota/genetics , Prokaryotic Cells/chemistry
ABSTRACT
MOTIVATION: Various computational biology calculations require a probabilistic optimization protocol to determine the parameters that capture the system at a desired state in the configurational space. Many existing methods excel in certain scenarios but fail in others. This is due in part to inefficient exploration of the parameter space and easy trapping in local minima. Here, we developed a general-purpose optimization engine in R that can be plugged into any modelling initiative, simple or complex, through a few simple interfacing functions, to perform seamless optimization with rigorous parameter sampling. RESULTS: ROptimus features simulated annealing and replica exchange implementations equipped with adaptive thermoregulation. These drive the Monte Carlo optimization process flexibly, through a constrained acceptance frequency but unconstrained adaptive pseudo-temperature regimens. We exemplify the applicability of our R optimizer on a diverse set of problems spanning data analysis and computational biology tasks. AVAILABILITY AND IMPLEMENTATION: ROptimus is written and implemented in R, and is freely available from CRAN (http://cran.r-project.org/web/packages/ROptimus/index.html) and GitHub (http://github.com/SahakyanLab/ROptimus).
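The adaptive thermoregulation idea can be illustrated with a short Python sketch. It runs a Metropolis simulated-annealing loop whose pseudo-temperature is nudged up or down so that the observed acceptance frequency tracks a target value. This is not ROptimus's R interface. The function name, the multiplicative 1.1/0.9 update rule and the default parameters are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def adaptive_annealing(
    cost: Callable[[List[float]], float],
    x0: List[float],
    step: float = 0.1,
    n_steps: int = 20000,
    target_acceptance: float = 0.3,
    window: int = 100,
    t0: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `cost` by Metropolis Monte Carlo with acceptance-driven temperature.

    Hypothetical sketch: the acceptance frequency is constrained (steered toward
    `target_acceptance`), while the pseudo-temperature itself is free to adapt.
    """
    rng = random.Random(seed)
    x = list(x0)
    e = cost(x)
    best_x, best_e = list(x), e
    temp = t0
    accepted = 0
    for i in range(1, n_steps + 1):
        # Propose a Gaussian perturbation of one randomly chosen parameter.
        cand = list(x)
        j = rng.randrange(len(cand))
        cand[j] += rng.gauss(0.0, step)
        e_cand = cost(cand)
        # Metropolis criterion: always accept downhill, uphill with Boltzmann probability.
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / temp):
            x, e = cand, e_cand
            accepted += 1
            if e < best_e:
                best_x, best_e = list(x), e
        # Every `window` steps, heat up if too few moves were accepted, cool down otherwise.
        if i % window == 0:
            rate = accepted / window
            temp *= 1.1 if rate < target_acceptance else 0.9
            accepted = 0
    return best_x, best_e
```

With this design, the sampler never needs a hand-tuned cooling schedule: the temperature settles wherever the target acceptance is met. The best state seen is tracked separately from the current one.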
Subject(s)
Computational Biology , Software , Computational Biology/methods , Monte Carlo Method , Temperature
ABSTRACT
Genomic maps of DNA G-quadruplexes (G4s) can help elucidate the roles that these secondary structures play in various organisms. Herein, we employ an improved version of a G-quadruplex sequencing method (G4-seq) to generate whole-genome G4 maps for 12 species that include widely studied model organisms and pathogens of clinical relevance. We identify G4 structures that form under physiological K+ conditions, as well as G4s that are stabilized by the G4-targeting small molecule pyridostatin (PDS). We discuss the various structural features of the experimentally observed G-quadruplexes (OQs), highlighting differences in their prevalence and enrichment across species. Our study describes the diversity in sequence composition and genomic location of OQs in the different species. It also reveals that, among the species studied, enrichment of OQs in gene promoters is particular to mammals such as mouse and human. The multi-species maps have been made publicly available as a resource to the research community. They can serve as blueprints for biological experiments in those model organisms where G4 structures may play a role.
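The G4-seq maps are experimental. A common computational baseline for comparison is the canonical putative G4 motif: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The Python sketch below scans both strands for that pattern. The function name is hypothetical. By construction, the heuristic cannot report noncanonical structures, such as those with longer loops or bulges.

```python
import re
from typing import List, Tuple

# Canonical putative quadruplex sequence (PQS): four runs of >= 3 G separated by
# loops of 1-7 nt. G-runs on the given strand indicate a G4 on that strand ("+");
# C-runs indicate a G4 on the complementary strand ("-").
PQS_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")
PQS_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3}C{3,}")


def find_pqs(seq: str) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, strand) tuples of non-overlapping canonical PQS matches."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", PQS_PLUS), ("-", PQS_MINUS)):
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)
```

For the human telomeric repeat GGGTTAGGGTTAGGGTTAGGG, find_pqs returns [(0, 21, '+')].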
Subject(s)
Chromosome Mapping/methods , G-Quadruplexes , Genome , Aminoquinolines/chemistry , Animals , Arabidopsis/classification , Arabidopsis/genetics , Base Sequence , Caenorhabditis elegans , Drosophila melanogaster/classification , Drosophila melanogaster/genetics , Escherichia coli/classification , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Leishmania major/classification , Leishmania major/genetics , Mice , Phylogeny , Picolinic Acids/chemistry , Plasmodium falciparum/classification , Plasmodium falciparum/genetics , Rhodobacter sphaeroides/classification , Rhodobacter sphaeroides/genetics , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/genetics , Trypanosoma brucei brucei/classification , Trypanosoma brucei brucei/genetics , Zebrafish/classification , Zebrafish/genetics
ABSTRACT
Recent studies indicate that i-DNA, a four-stranded cytosine-rich DNA structure also known as the i-motif, is actually formed in vivo. However, a systematic study of sequence effects on its stability has been missing. Herein, an unprecedented number of different sequences (271), bearing four runs of 3-6 cytosines with different spacer lengths, has been tested. While i-DNA stability is nearly independent of total spacer length, the central spacer plays a special role in stability. Stability also depends on the length of the C-tracts at both acidic and neutral pH. Thanks to the large size of the introduced data set, this study provides a global picture of i-DNA stability. It reveals unexpected features and allows us to conclude that the determinants of i-DNA stability do not mirror those of G-quadruplexes. Our results illustrate the structural roles of loops and C-tracts in i-DNA stability, confirm its formation in cells, and allow rules to be established for predicting its stability.
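The sequence design described above can be parsed with a short Python helper: four C-tracts of 3 to 6 cytosines separated by three spacers, the middle one being the central spacer. The helper extracts the features the study relates to stability. The helper and its names are hypothetical, and it assumes the spacers contain no cytosines. It computes descriptors only and does not predict stability.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Four C-tracts of 3-6 cytosines separated by three C-free spacers (assumption).
_IMOTIF = re.compile(r"^(C{3,6})([AGT]+)(C{3,6})([AGT]+)(C{3,6})([AGT]+)(C{3,6})$")


class IMotifLayout(NamedTuple):
    tracts: Tuple[int, ...]   # C-tract lengths, 5' to 3'
    spacers: Tuple[int, ...]  # spacer lengths, 5' to 3'

    @property
    def central_spacer(self) -> int:
        """Length of the middle spacer, highlighted in the abstract as special."""
        return self.spacers[1]

    @property
    def total_spacer(self) -> int:
        """Sum of all three spacer lengths."""
        return sum(self.spacers)


def parse_imotif(seq: str) -> Optional[IMotifLayout]:
    """Split a candidate i-motif sequence into C-tracts and spacers, or return None."""
    m = _IMOTIF.match(seq.upper())
    if m is None:
        return None
    groups = m.groups()
    tracts = tuple(len(groups[i]) for i in (0, 2, 4, 6))
    spacers = tuple(len(groups[i]) for i in (1, 3, 5))
    return IMotifLayout(tracts, spacers)
```

For example, CCCTAACCCCTTTTCCCCCTACCCCCC parses into tracts (3, 4, 5, 6) and spacers (3, 4, 2), with a central spacer of 4 nt.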
ABSTRACT
The alphabet of modified DNA bases goes beyond the conventional four letters, with biological roles being found for many such modifications. Herein, we describe the observation of a modified thymine base that arises from spontaneous N1-C2 ring opening of the oxidation product 5-formyluracil, after N3 deprotonation. We first observed this phenomenon in silico through ab initio calculations. We then verified its formation in vitro, at the mononucleoside level and in a synthetic DNA oligonucleotide context. We show that the new base modification (Trex, thymine ring expunged) can form under physiological conditions and is resistant to the action of common repair machineries. Furthermore, we detected naturally occurring Trex while screening a number of human cell types and mouse embryonic stem cells (mESC, E14), which suggests potential biological relevance of this modification.
Subject(s)
DNA/metabolism , Thymine/metabolism , Cell Line, Tumor , DNA/genetics , HeLa Cells , Humans , Molecular Structure , Oxidation-Reduction , Thymine/chemistry
ABSTRACT
We introduce RNA G-quadruplex sequencing (rG4-seq), a transcriptome-wide RNA G-quadruplex (rG4) profiling method that couples rG4-mediated reverse transcriptase stalling with next-generation sequencing. Using rG4-seq on poly(A)-enriched HeLa RNA, we generated a global in vitro map of thousands of canonical and noncanonical rG4 structures. We characterize rG4 formation relative to cytosine content and alternative RNA structure stability, uncover rG4-dependent differences in RNA folding and show evolutionarily conserved enrichment of rG4s in transcripts mediating RNA processing and stability.
Subject(s)
G-Quadruplexes , High-Throughput Nucleotide Sequencing/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Cytosine/metabolism , Guanine/metabolism , HeLa Cells , Humans , RNA Stability , RNA, Messenger/metabolism
ABSTRACT
BACKGROUND: Accurate knowledge of the core components of substitution rates is of vital importance for understanding genome evolution and dynamics. By performing a direct, single-genome analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. RESULTS: The rates are characterised through rate constants in the time domain and are made available through a dedicated program (Trek) and a stand-alone database. Owing to the method design and the stringency criteria imposed, we expect our rate constants to be good estimates of the rates of spontaneous mutations. Using these data, we carry out four further analyses. First, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome. Second, we characterise novel, CpG-independent motifs that are either substitution-prone or substitution-resistant. Third, we confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types. Fourth, we produce a Trek-based estimate of the overall mutation rate in humans. CONCLUSIONS: The extended set of rate constants we report may enrich available resources and help advance our understanding of genome dynamics and evolution. This has possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and in the neutral evolution of proteomes.
Subject(s)
Genome, Human , Germ-Line Mutation , Mutation Rate , Base Composition , Chromosome Mapping , Computational Biology/methods , Genomics/methods , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
ABSTRACT
Proline isomerization is a ubiquitous process that plays a key role in the folding of proteins and in the regulation of their functions. Different families of enzymes, known as "peptidyl-prolyl isomerases" (PPIases), catalyze this reaction, which involves the interconversion between the cis and trans isomers of the N-terminal amide bond of the amino acid proline. However, complete descriptions of the mechanisms by which these enzymes function have remained elusive. We show here that cyclophilin A, one of the most common PPIases, provides a catalytic environment that acts on the substrate through an electrostatic handle mechanism. In this mechanism, the electrostatic field in the catalytic site turns the electric dipole associated with the carbonyl group of the amino acid preceding the proline in the substrate, thus causing the rotation of the peptide bond between the two residues. We identified this mechanism using a combination of NMR measurements, molecular dynamics simulations, and density functional theory calculations to simultaneously determine the cis-bound and trans-bound conformations of cyclophilin A and its substrate as the enzymatic reaction takes place. We anticipate that this approach will be helpful in elucidating whether the electrostatic handle mechanism that we describe here is common to other PPIases and, more generally, in characterizing other enzymatic processes.
Subject(s)
Cyclophilin A/chemistry , Molecular Dynamics Simulation , Proline/chemistry , Catalysis , Humans , Nuclear Magnetic Resonance, Biomolecular , Static Electricity
ABSTRACT
BACKGROUND: The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. Reasoning that complex genes should be particularly vulnerable to such events, we explore here the link between simple properties of human genes and their involvement in the pathways linked to cancer and other multigenic diseases. These properties include transcript length, number of splice variants and exon/intron composition. RESULTS: We reveal a substantial enrichment of cancer pathways with long genes and with genes that have multiple splice variants. Although these two factors are interdependent, we show that overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey of pathways enriched with the longest genes and with genes that have multiple splice variants reveals, along with cancer pathways, pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We also report a correlation between gene length and the number of somatic mutations. CONCLUSIONS: Our work is a step forward in assessing the role of simple gene characteristics in cancer and a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. We also note that, unlike the cancer pathways, the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to be an additional potential contributing factor in the emergence of pathologies, should topoisomerases become impaired.
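The Python sketch below illustrates one way a length-versus-mutation relationship might be quantified. It computes a Spearman rank correlation, assigning average ranks to tied values. The choice of a rank-based statistic is our assumption, since the abstract does not specify which correlation measure was used. The function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A rank statistic is insensitive to the heavy right tail of gene lengths, which is one reason it is a common default for this kind of comparison.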
Subject(s)
Alternative Splicing , Cardiomyopathies/genetics , Diabetes Mellitus, Type 2/genetics , Neoplasms/genetics , Exons , Humans , Introns , Mutation
ABSTRACT
RNA G-quadruplex (rG4) structures are of fundamental importance to biology. A novel approach is introduced to detect and structurally map rG4s at single-nucleotide resolution in RNAs. The approach, denoted SHALiPE, couples selective 2'-hydroxyl acylation with lithium ion-based primer extension, and identifies characteristic structural fingerprints for rG4 mapping. We apply SHALiPE to interrogate the human precursor microRNA 149, and reveal the formation of an rG4 structure in this non-coding RNA. Additional analyses support the SHALiPE results and uncover that this rG4 has a parallel topology, is thermally stable, and is conserved in mammals. An in vitro Dicer assay shows that this rG4 inhibits Dicer processing, supporting the potential role of rG4 structures in microRNA maturation and post-transcriptional regulation of mRNAs.
Subject(s)
G-Quadruplexes , Hydroxides/chemistry , MicroRNAs/analysis , Acylation , Humans , Molecular Structure
ABSTRACT
We present a chemical method to selectively tag and enrich two thymine modifications found naturally in DNA, 5-formyluracil (5-fU) and 5-hydroxymethyluracil (5-hmU). Inherent reactivity differences have enabled us to tag 5-fU chemoselectively over its cytosine counterpart, 5-formylcytosine (5-fC). We rationalized the enhanced reactivity of 5-fU compared to 5-fC via ab initio quantum mechanical calculations. We exploited this chemical tagging reaction to provide proof of concept for the enrichment of 5-fU-containing DNA from a pool that contains 5-fC or no modification. We further demonstrate that 5-hmU can be chemically oxidized to 5-fU, providing a strategy for the enrichment of 5-hmU. These methods will enable the mapping of 5-fU and 5-hmU in genomic DNA, providing insights into their functional role and dynamics in biology.
Subject(s)
DNA/chemistry , Thymine/chemistry , Base Sequence , DNA/genetics , Models, Molecular , Nucleic Acid Conformation , Oligodeoxyribonucleotides/chemistry , Oligodeoxyribonucleotides/genetics , Pentoxyl/analogs & derivatives , Pentoxyl/chemistry , Uracil/analogs & derivatives , Uracil/chemistry
ABSTRACT
Recent improvements in the accuracy of structure-based methods for the prediction of nuclear magnetic resonance chemical shifts have inspired numerous approaches for determining the secondary and tertiary structures of proteins. Such advances also suggest the possibility of using chemical shifts to characterize the conformational fluctuations of these molecules. Here we describe a method of using methyl chemical shifts as restraints in replica-averaged molecular dynamics (MD) simulations, which enables us to determine the conformational ensemble of the HU dimer and characterize the range of motions accessible to its flexible β-arms. Our analysis suggests that the bending action of HU on DNA is mediated by a mechanical clamping mechanism, in which metastable structural intermediates sampled during the hinge motions of the β-arms in the free state are presculpted to bind DNA. These results illustrate that using side-chain chemical shift data in conjunction with MD simulations can provide quantitative information about the free energy landscapes of proteins and yield detailed insights into their functional mechanisms.
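In replica-averaged restraints, the experimental shift is compared with the mean of the shifts predicted across replicas. Individual conformers may therefore deviate, as long as the ensemble average agrees. The Python sketch below assumes a simple harmonic penalty with force constant k. The function name and functional form are illustrative assumptions, and the published method may differ in detail.

```python
from typing import Sequence


def replica_averaged_penalty(
    calc_shifts: Sequence[Sequence[float]],
    exp_shifts: Sequence[float],
    k: float = 1.0,
) -> float:
    """Harmonic restraint energy on replica-averaged chemical shifts (hypothetical form).

    calc_shifts[r][i] is the shift predicted for nucleus i in replica r;
    exp_shifts[i] is the measured shift for the same nucleus.
    """
    n_rep = len(calc_shifts)
    energy = 0.0
    for i, d_exp in enumerate(exp_shifts):
        # Average the prediction over replicas before comparing with experiment.
        d_mean = sum(replica[i] for replica in calc_shifts) / n_rep
        energy += (d_exp - d_mean) ** 2
    return k * energy
```

Consider two replicas that predict 1.0 and 3.0 ppm for a nucleus observed at 2.0 ppm. The penalty is zero, even though neither replica matches the observation on its own.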
Subject(s)
Bacterial Proteins/chemistry , DNA-Binding Proteins/chemistry , DNA/chemistry , Magnetic Resonance Spectroscopy , Molecular Dynamics Simulation , Binding Sites , Dimerization , Methane/chemistry , Molecular Conformation
ABSTRACT
Almost (all atom molecular simulation toolkit) is an open source computational package for structure determination and analysis of complex molecular systems, including proteins and nucleic acids. Almost has been designed with two primary goals: to provide tools for molecular structure determination using various types of experimental measurements as conformational restraints, and to provide methods for the analysis and assessment of structural and dynamical properties of complex molecular systems. The methods incorporated in Almost include the determination of structural and dynamical features of proteins using distance restraints derived from nuclear Overhauser effect measurements, orientational restraints obtained from residual dipolar couplings, and structural restraints from chemical shifts. Here, we present the first public release of Almost, highlight the key aspects of its computational design and discuss the main features currently implemented. Almost is available for the most common Unix-based operating systems, including Linux and Mac OS X. Almost is distributed free of charge under the GNU General Public License, and is available both as source code and as a binary executable from the project web site at http://www.open-almost.org. Interested users can follow and contribute to the further development of Almost at http://sourceforge.net/projects/almost.
Subject(s)
Molecular Dynamics Simulation , Proteins/chemistry , Software , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation
ABSTRACT
We are witnessing a steep increase in model development initiatives in genomics that employ high-end machine learning methodologies. Of particular interest are models that predict certain genomic characteristics based solely on DNA sequence. These models, however, treat DNA as a mere string of four letters (A, T, G and C). They disregard past scientific advances that make it possible to use more intricate information from nucleic acid sequences. Here, we provide a comprehensive database of quantum mechanical (QM) and geometric features for all permutations of 7-meric DNA in their representative B, A and Z conformations. The database was generated using the applicable high-cost, time-consuming QM methodologies. This makes it seamless to associate a wealth of novel molecular features with any DNA sequence: one scans the sequence with a matching k-meric window and pulls the pre-computed values from our database for further use in modelling. We demonstrate the usefulness of our deposited features through their exclusive use in developing a model for A→C mutation rates.
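Pulling pre-computed values with a sliding k-meric window amounts to one dictionary lookup per window. The Python sketch below assumes a hypothetical in-memory table that maps each k-mer to a feature vector. The real database's file layout, keys and feature columns are not described here, so the table structure and the function name are assumptions.

```python
from typing import Dict, List, Optional, Sequence


def scan_features(
    seq: str,
    table: Dict[str, Sequence[float]],
    k: int = 7,
) -> List[Optional[Sequence[float]]]:
    """Look up pre-computed features for every overlapping k-mer window.

    Hypothetical helper: returns one entry per window start, in order; None
    marks k-mers absent from the table (for example, windows containing N).
    """
    seq = seq.upper()
    return [table.get(seq[i : i + k]) for i in range(len(seq) - k + 1)]
```

Windows near the sequence ends are not padded here. How edge positions are handled in a real pipeline is a design choice left open.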
Subject(s)
DNA , Machine Learning , Quantum Theory
ABSTRACT
It has recently been shown that NMR chemical shifts can be used to determine the structures of proteins. To begin extending this type of approach to nucleic acids, we present an equation that relates the structural parameters of the ribose group to its (13)C chemical shifts. The parameters in the equation were determined by maximizing the agreement between DFT-derived chemical shifts and those predicted by the equation for a database of ribose structures. Our results indicate that this type of approach is a promising way of establishing quantitative and computationally efficient analytical relationships between chemical shifts and structural parameters in nucleic acids.
Subject(s)
Quantum Theory , RNA/chemistry , Ribose/chemistry , Nucleosides/chemistry , Nucleotides/chemistry
ABSTRACT
Protein methyl groups have recently been the subject of much attention in NMR spectroscopy because of the opportunities they provide to obtain information about the structure and dynamics of proteins and protein complexes. With the advent of selective labeling schemes, methyl groups have become particularly interesting for chemical shift-based protein structure determination. To date, that approach has primarily exploited the mapping between protein structures and backbone chemical shifts. To extend the scope of chemical shifts for structure determination, we present here the CH3Shift method for structure-based prediction of methyl chemical shifts. The terms considered in the predictions account for ring current, magnetic anisotropy, electric field, rotameric type and dihedral angle effects, which are considered in conjunction with polynomial functions of interatomic distances. We show that the CH3Shift method achieves prediction accuracies ranging from 0.133 to 0.198 ppm for the (1)H chemical shifts of Ala, Thr, Val, Leu and Ile methyl groups. We illustrate the use of the method by assessing the accuracy of side-chain structures in structural ensembles representing the dynamics of proteins.
Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Crystallography, X-Ray , Databases, Protein , Isoleucine/chemistry , Leucine/chemistry , Methane/chemistry , Models, Molecular , Protein Conformation , Ubiquitin/chemistry , Valine/chemistry
ABSTRACT
Electric field (EF)-induced changes in one-bond indirect spin-spin coupling constants are investigated for a wide range of molecules, including peptide models. Using the hybrid density functional theory method, we both applied EFs externally and calculated internal EFs without any external EF applied. Calculated one-bond J-couplings show reliable agreement with experimental data. We analyze how several factors shape the interconnection between one-bond J-couplings and the EF: the EF sign and direction, its internal and induced components, hydrogen bonding, internuclear distance and hyperconjugative interactions. A linear dependence of 1J on the EF projection along the bond is obtained under two conditions. The bonded atoms must possess sufficiently different electron densities, and an EF determined by the electronic polarization must exist along the bond. Highlighting 1JNH couplings as possible EF-sensitive parameters, we carried out a systematic study on two sets of molecules with a large variation in the native internal EF value. The component of the 1JNH coupling constant most affected by the EF is the spin-dipole term of Ramsey's formulation; however, in the total J-coupling, the EF influence on the Fermi contact term is the most significant. The induced EF projection along the bond is 6.7 times weaker in magnitude than the simulated external uniform field. The absolute EF dependence of the one-bond J-coupling involves only the internal field. This is the sum of the induced field (if an external field exists) and the internuclear field determined by the native polarization. This linear and universal dependence unites the corresponding couplings across a diverse set of molecules under various electrostatic conditions. Many types of one-bond J-couplings can potentially be measured in biomolecules. Studying their relation to the electrostatic properties at the corresponding sites opens a new avenue toward fully exploiting NMR-measurable parameters, with novel and exciting applications.
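The two quantities linked by the linear relation can be computed directly: the projection of a field vector onto the bond axis, and a least-squares line of 1J against that projection. In the Python sketch below, the function names are hypothetical and no coefficients from the study are assumed. Any input values would come from the user's own DFT calculations.

```python
import math
from typing import Sequence, Tuple


def field_projection(
    field: Sequence[float], atom_a: Sequence[float], atom_b: Sequence[float]
) -> float:
    """Component of a 3D field vector along the unit vector from atom_a to atom_b."""
    bond = [b - a for a, b in zip(atom_a, atom_b)]
    length = math.sqrt(sum(c * c for c in bond))
    return sum(f * c for f, c in zip(field, bond)) / length


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope
```

The sign of the projection depends on which atom is taken as atom_a, so the bond direction must be fixed consistently across molecules before fitting.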
Subject(s)
Electrons , Models, Chemical , Quantum Theory , Acetamides/chemistry , Electricity , Formamides/chemistry , Hydrogen Bonding , Peptides/chemistry , Sensitivity and Specificity , Vibration
ABSTRACT
To estimate the torsion sensitivity of dipolar couplings, biphenylic molecules were chosen as probes. They have a relatively simple structure, yet their only flexible parameter, the inter-ring torsion angle, is surprisingly ambiguous. Solution structures of 4,4'-dibromobiphenyl and 4,4'-diiodobiphenyl in two liquid crystals, I52 and ZLI 1695, are reported for the first time. We compared the NMR structures of various para-substituted biphenyls (BPs), calculated by the additive potential maximum entropy (APME) approach. The comparison shows that the small spread of torsion angle values across different solvents and para-substituents agrees well with theoretical expectations from hybrid density functional theory (DFT) methods. Furthermore, these small fluctuations correctly reveal real structural changes in the inter-ring torsion and the prevalence of solvent effects over para-halosubstitution.
ABSTRACT
The effects of dielectric permittivity (epsilon) and temperature on indirect spin-spin coupling constants were studied using acetonitrile as a probe molecule. Experiments were accompanied by hybrid DFT (density functional theory) calculations, in which the solvent was modeled using the polarizable continuum model. Owing to its numerous types of J-couplings, acetonitrile is a very convenient molecule against which various basis sets can be tested, or the best basis set selected for a given study. The results show reasonable agreement between calculated and experimental values. According to our data, scalar spin-spin coupling constants undergo substantial shifts at lower values of the dielectric constant. Thus, J-coupling values are not transferable between measurements made under different epsilon conditions. Assuming that J-couplings are independent of epsilon can lead to serious errors in experiments using low-epsilon media. Dielectric permittivity also causes small geometric fluctuations within the molecule, which can themselves affect J-coupling values. Comparing results computed with frozen and relaxed geometries shows that geometry mediation mostly affects the spin-dipole term of the J-coupling. Hence, frozen geometries are not acceptable for an accurate evaluation of that term. We also reveal a connection between the dielectric properties of the solvent and the slopes of the temperature dependence of J-couplings in the corresponding media.