Search | BVS Regional Portal

1.

Modeling the mosaic structure of bacterial genomes to infer their evolutionary history.

Sheinman, Michael; Arndt, Peter F; Massip, Florian.

Proc Natl Acad Sci U S A ; 121(13): e2313367121, 2024 Mar 26.

Article in English | MEDLINE | ID: mdl-38517978

ABSTRACT

The chronology and phylogeny of bacterial evolution are difficult to reconstruct due to a scarce fossil record. The analysis of bacterial genomes remains challenging because of large sequence divergence, the plasticity of bacterial genomes due to frequent gene loss, horizontal gene transfer, and differences in selective pressure from one locus to another. Therefore, taking advantage of the rich and rapidly accumulating genomic data requires accurate modeling of genome evolution. An important technical consideration is that loci with high effective mutation rates may diverge beyond the detection limit of the alignment algorithms used, biasing the genome-wide divergence estimates toward smaller divergences. In this article, we propose a novel method to gain insight into bacterial evolution based on statistical properties of genome comparisons. We find that the length distribution of sequence matches is shaped by the effective mutation rates of different loci, by horizontal transfers, and by the aligner sensitivity. Based on these inputs, we build a model and show that it accounts for the empirically observed distributions, taking the Enterobacteriaceae family as an example. Our method makes it possible to distinguish segments of vertical and horizontal origin and to estimate the divergence time and exchange rate between any pair of taxa from genome-wide alignments. Based on the estimated divergence times, we construct a time-calibrated phylogenetic tree to demonstrate the accuracy of the method.

Subject(s)

Bacterial Genome, Genetic Models, Phylogeny, Bacterial Genome/genetics, Genomics/methods, Bacteria/genetics, Molecular Evolution

2.

Modeling gene expression cascades during cell state transitions.

Rosebrock, Daniel; Vingron, Martin; Arndt, Peter F.

iScience ; 27(4): 109386, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38500834

ABSTRACT

During cellular processes such as differentiation or response to external stimuli, cells exhibit dynamic changes in their gene expression profiles. Single-cell RNA sequencing (scRNA-seq) can be used to investigate these dynamic changes. To this end, cells are typically ordered along a pseudotemporal trajectory which recapitulates the progression of cells as they transition from one cell state to another. We infer transcriptional dynamics by modeling the gene expression profiles in pseudotemporally ordered cells using a Bayesian inference approach. This enables ordering genes along transcriptional cascades, estimating differences in the timing of gene expression dynamics, and deducing regulatory gene interactions. Here, we apply this approach to scRNA-seq datasets derived from mouse embryonic forebrain and pancreas samples. This analysis demonstrates the utility of the method to derive the ordering of gene dynamics and regulatory relationships critical for proper cellular differentiation and maturation across a variety of developmental contexts.

3.

Enhanced cortical neural stem cell identity through short SMAD and WNT inhibition in human cerebral organoids facilitates emergence of outer radial glial cells.

Rosebrock, Daniel; Arora, Sneha; Mutukula, Naresh; Volkman, Rotem; Gralinska, Elzbieta; Balaskas, Anastasios; Aragonés Hernández, Amèlia; Buschow, René; Brändl, Björn; Müller, Franz-Josef; Arndt, Peter F; Vingron, Martin; Elkabetz, Yechiel.

Nat Cell Biol ; 24(6): 981-995, 2022 06.

Article in English | MEDLINE | ID: mdl-35697781

ABSTRACT

Cerebral organoids exhibit broad regional heterogeneity accompanied by limited cortical cellular diversity despite the tremendous upsurge in derivation methods, suggesting inadequate patterning of early neural stem cells (NSCs). Here we show that a short and early Dual SMAD and WNT inhibition course is necessary and sufficient to establish robust and lasting cortical organoid NSC identity, efficiently suppressing non-cortical NSC fates, while other widely used methods are inconsistent in their cortical NSC-specification capacity. Accordingly, this method selectively enriches for outer radial glia NSCs, which cyto-architecturally demarcate well-defined outer sub-ventricular-like regions propagating from superiorly radially organized, apical cortical rosette NSCs. Finally, this method culminates in the emergence of molecularly distinct deep and upper cortical layer neurons, and reliably uncovers cortex-specific microcephaly defects. Thus, a short SMAD and WNT inhibition is critical for establishing a rich cortical cell repertoire that enables mirroring of fundamental molecular and cyto-architectural features of cortical development and meaningful disease modelling.

Subject(s)

Neural Stem Cells, Organoids, Cell Differentiation, Cerebral Cortex, Ependymoglial Cells, Humans, Neurogenesis, Neurons

4.

Modelling segmental duplications in the human genome.

Abdullaev, Eldar T; Umarova, Iren R; Arndt, Peter F.

BMC Genomics ; 22(1): 496, 2021 Jul 02.

Article in English | MEDLINE | ID: mdl-34215180

ABSTRACT

BACKGROUND: Segmental duplications (SDs) are long DNA sequences that are repeated in a genome and have high sequence identity. In contrast to repetitive elements, they are often unique and only sometimes have multiple copies in a genome. There are several well-studied mechanisms responsible for segmental duplications: non-allelic homologous recombination, non-homologous end joining and replication slippage. Such duplications play an important role in evolution; however, we do not have a full understanding of the dynamic properties of the duplication process. RESULTS: We study segmental duplications through a graph representation where nodes represent genomic regions and edges represent duplications between them. The resulting network (the SD network) is quite complex and has distinct features which allow us to make inferences about the evolution of segmental duplications. We propose a network growth model that explains features of the SD network, thus giving us insight into the dynamics of segmental duplications in the human genome. Based on our analysis of genomes of other species, the network growth model seems to be applicable to multiple mammalian genomes. CONCLUSIONS: Our analysis suggests that duplication rates of genomic loci grow linearly with the number of copies of a duplicated region. Several scenarios explaining such preferential duplication rates are suggested.

Subject(s)

Human Genome, Genomic Segmental Duplications, Animals, Molecular Evolution, Gene Duplication, Genomics, Humans
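
The conclusion of the abstract above, that duplication rates grow linearly with copy number, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' network growth model: the function name, the fixed number of regions, and the one-duplication-per-step scheme are all hypothetical choices made for illustration.

```python
import random


def simulate_preferential_duplication(n_regions, n_steps, seed=0):
    """Toy model (assumption): at each step one region is duplicated,
    chosen with probability proportional to its current copy number."""
    rng = random.Random(seed)
    copies = [1] * n_regions  # every region starts as a single copy
    for _ in range(n_steps):
        # Weighting by copy number gives a duplication rate that is
        # linear in the number of existing copies.
        idx = rng.choices(range(n_regions), weights=copies, k=1)[0]
        copies[idx] += 1
    return copies


if __name__ == "__main__":
    result = simulate_preferential_duplication(n_regions=1000, n_steps=5000)
    print("largest copy number:", max(result))
```

Under this rule, regions that already have many copies tend to gain more, so the copy-number distribution becomes strongly skewed.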

5.

Identical sequences found in distant genomes reveal frequent horizontal transfer across the bacterial domain.

Sheinman, Michael; Arkhipova, Ksenia; Arndt, Peter F; Dutilh, Bas E; Hermsen, Rutger; Massip, Florian.

Elife ; 10, 2021 06 14.

Article in English | MEDLINE | ID: mdl-34121661

ABSTRACT

Horizontal gene transfer (HGT) is an essential force in microbial evolution. Despite detailed studies on a variety of systems, a global picture of HGT in the microbial world is still missing. Here, we exploit that HGT creates long identical DNA sequences in the genomes of distant species, which can be found efficiently using alignment-free methods. Our pairwise analysis of 93,481 bacterial genomes identified 138,273 HGT events. We developed a model to explain their statistical properties as well as estimate the transfer rate between pairs of taxa. This reveals that long-distance HGT is frequent: our results indicate that HGT between species from different phyla has occurred in at least 8% of the species. Finally, our results confirm that the function of sequences strongly impacts their transfer rate, which varies by more than three orders of magnitude between different functional categories. Overall, we provide a comprehensive view of HGT, illuminating a fundamental process driving bacterial evolution.

Subject(s)

Bacteria, Molecular Evolution, Horizontal Gene Transfer/genetics, Bacterial Genome/genetics, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Archaeal Genome/genetics, Genomics, Sequence Alignment, DNA Sequence Analysis

6.

Widespread Chromosomal Losses and Mitochondrial DNA Alterations as Genetic Drivers in Hürthle Cell Carcinoma.

Gopal, Raj K; Kübler, Kirsten; Calvo, Sarah E; Polak, Paz; Livitz, Dimitri; Rosebrock, Daniel; Sadow, Peter M; Campbell, Braidie; Donovan, Samuel E; Amin, Salma; Gigliotti, Benjamin J; Grabarek, Zenon; Hess, Julian M; Stewart, Chip; Braunstein, Lior Z; Arndt, Peter F; Mordecai, Scott; Shih, Angela R; Chaves, Frances; Zhan, Tiannan; Lubitz, Carrie C; Kim, Jiwoong; Iafrate, A John; Wirth, Lori; Parangi, Sareh; Leshchiner, Ignaty; Daniels, Gilbert H; Mootha, Vamsi K; Dias-Santagata, Dora; Getz, Gad; McFadden, David G.

Cancer Cell ; 34(2): 242-255.e5, 2018 08 13.

Article in English | MEDLINE | ID: mdl-30107175

ABSTRACT

Hürthle cell carcinoma of the thyroid (HCC) is a form of thyroid cancer recalcitrant to radioiodine therapy that exhibits an accumulation of mitochondria. We performed whole-exome sequencing on a cohort of primary, recurrent, and metastatic tumors, and identified recurrent mutations in DAXX, TP53, NRAS, NF1, CDKN1A, ARHGAP35, and the TERT promoter. Parallel analysis of mtDNA revealed recurrent homoplasmic mutations in subunits of complex I of the electron transport chain. Analysis of DNA copy-number alterations uncovered widespread loss of chromosomes culminating in near-haploid chromosomal content in a large fraction of HCC, which was maintained during metastatic spread. This work uncovers a distinct molecular origin of HCC compared with other thyroid malignancies.

Subject(s)

Chromosome Aberrations, Mitochondrial DNA/genetics, Mutation, Thyroid Neoplasms/genetics, DNA Copy Number Variations, Haploidy, Humans, Neoplasm Metastasis, Telomerase/genetics, Thyroid Neoplasms/pathology, Exome Sequencing

7.

Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans.

Smith, Thomas C A; Arndt, Peter F; Eyre-Walker, Adam.

PLoS Genet ; 14(3): e1007254, 2018 03.

Article in English | MEDLINE | ID: mdl-29590096

ABSTRACT

It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investigate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differs between datasets at the 1MB and 100KB scales, probably as a consequence of differences in sequencing technology and processing. In particular, datasets show different patterns of correlation to genomic variables such as replication time. Nevertheless, there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scale that cannot be explained by variation at smaller scales; however, the level of this variation is modest at large scales: at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert, which suggests the pattern of mutation is relatively constant across the genome. We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore structure of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected, the rate of divergence between species is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence was due to variation in the mutation rate. We provide evidence that this is due to the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered.

Subject(s)

Genetic Variation, Animals, Base Composition, Datasets as Topic, Gene Conversion, Human Genome, Germ-Line Mutation, Humans

8.

The information capacity of the genetic code: Is the natural code optimal?

Kuruoglu, Ercan E; Arndt, Peter F.

J Theor Biol ; 419: 227-237, 2017 04 21.

Article in English | MEDLINE | ID: mdl-28163008

ABSTRACT

We envision the molecular evolution process as an information transfer process and provide a quantitative measure for information preservation in terms of the channel capacity according to the channel coding theorem of Shannon. We calculate information capacities of DNA on the nucleotide (for non-coding DNA) and the amino acid (for coding DNA) level using various substitution models. We extend our results on coding DNA to a discussion about the optimality of the natural codon-amino acid code. We provide the results of an adaptive search algorithm in the code domain and demonstrate the existence of a large number of genetic codes with higher information capacity. Our results support the hypothesis of an ancient extension from a 2-nucleotide codon to the current 3-nucleotide codon code to encode the various amino acids.

Subject(s)

Algorithms, Codon/genetics, Genetic Code/genetics, Genetic Models, Amino Acids/genetics, Base Sequence, Molecular Evolution
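
To make the notion of channel capacity in the abstract above concrete, the sketch below computes the capacity of a symmetric substitution channel over the four nucleotides. The choice of a fully symmetric (Jukes-Cantor-style) channel is an assumption made for illustration; the abstract only says that various substitution models were used. For any symmetric channel the capacity is log2(alphabet size) minus the entropy of one row of the transition matrix.

```python
import math


def symmetric_channel_capacity(p_sub, alphabet_size=4):
    """Capacity in bits per symbol of a symmetric substitution channel.

    p_sub is the total probability that a symbol is replaced by one of
    the other symbols, each being equally likely (an assumption).
    """
    if not 0.0 <= p_sub <= 1.0:
        raise ValueError("p_sub must lie in [0, 1]")
    others = alphabet_size - 1
    row = [1.0 - p_sub] + [p_sub / others] * others
    # Shannon entropy of one row of the transition matrix, in bits.
    row_entropy = -sum(q * math.log2(q) for q in row if q > 0.0)
    return math.log2(alphabet_size) - row_entropy


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 0.75):
        print(p, symmetric_channel_capacity(p))
```

With no substitutions the channel carries the full 2 bits per nucleotide, and the capacity drops to zero once every output nucleotide is equally likely.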

9.

Variation in the molecular clock of primates.

Moorjani, Priya; Amorim, Carlos Eduardo G; Arndt, Peter F; Przeworski, Molly.

Proc Natl Acad Sci U S A ; 113(38): 10607-12, 2016 09 20.

Article in English | MEDLINE | ID: mdl-27601674

ABSTRACT

Events in primate evolution are often dated by assuming a constant rate of substitution per unit time, but the validity of this assumption remains unclear. Among mammals, it is well known that there exists substantial variation in yearly substitution rates. Such variation is to be expected from differences in life history traits, suggesting it should also be found among primates. Motivated by these considerations, we analyze whole genomes from 10 primate species, including Old World Monkeys (OWMs), New World Monkeys (NWMs), and apes, focusing on putatively neutral autosomal sites and controlling for possible effects of biased gene conversion and methylation at CpG sites. We find that substitution rates are up to 64% higher in lineages leading from the hominoid-NWM ancestor to NWMs than to apes. Within apes, rates are ~2% higher in chimpanzees and ~7% higher in the gorilla than in humans. Substitution types subject to biased gene conversion show no more variation among species than those not subject to it. Not all mutation types behave similarly, however; in particular, transitions at CpG sites exhibit a more clocklike behavior than do other types, presumably because of their nonreplicative origin. Thus, not only the total rate, but also the mutational spectrum, varies among primates. This finding suggests that events in primate evolution are most reliably dated using CpG transitions. Taking this approach, we estimate the human and chimpanzee divergence time is 12.1 million years, and the human and gorilla divergence time is 15.1 million years.

Subject(s)

Molecular Evolution, Genetic Variation, Genome/genetics, Primates/genetics, Amino Acid Substitution/genetics, Animals, Biological Evolution, DNA Methylation/genetics, Gene Conversion/genetics, Gorilla gorilla/genetics, Humans, Pan troglodytes/genetics

10.

Evolutionary dynamics of selfish DNA explains the abundance distribution of genomic subsequences.

Sheinman, Michael; Ramisch, Anna; Massip, Florian; Arndt, Peter F.

Sci Rep ; 6: 30851, 2016 08 04.

Article in English | MEDLINE | ID: mdl-27488939

ABSTRACT

Since the sequencing of large genomes, many statistical features of their sequences have been found. One intriguing feature is that certain subsequences are much more abundant than others. In fact, abundances of subsequences of a given length are distributed with a scale-free power-law tail, resembling properties of human texts, such as Zipf's law. Despite recent efforts, the understanding of this phenomenon is still lacking. Here we find that selfish DNA elements, such as those belonging to the Alu family of repeats, dominate the power-law tail. Interestingly, for the Alu elements the power-law exponent increases with the length of the considered subsequences. Motivated by these observations, we develop a model of selfish DNA expansion. The predictions of this model qualitatively and quantitatively agree with the empirical observations. This allows us to estimate parameters for the process of selfish DNA spreading in a genome during its evolution. The obtained results shed light on how evolution of selfish DNA elements shapes non-trivial statistical properties of genomes.

Subject(s)

Alu Elements/genetics, Molecular Evolution, Human Genome/genetics, Genetic Models, Humans

11.

Comparing the Statistical Fate of Paralogous and Orthologous Sequences.

Massip, Florian; Sheinman, Michael; Schbath, Sophie; Arndt, Peter F.

Genetics ; 204(2): 475-482, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27474728

ABSTRACT

For several decades, sequence alignment has been a widely used tool in bioinformatics. For instance, finding homologous sequences with a known function in large databases is used to get insight into the function of nonannotated genomic regions. Very efficient tools like BLAST have been developed to identify and rank possible homologous sequences. To estimate the significance of the homology, the ranking of alignment scores takes a background model for random sequences into account. Using this model we can estimate the probability to find two exactly matching subsequences by chance in two unrelated sequences. For two homologous sequences, the corresponding probability is much higher, which allows us to identify them. Here we focus on the distribution of lengths of exact sequence matches between protein-coding regions of pairs of evolutionarily distant genomes. We show that this distribution exhibits a power-law tail with an exponent [Formula: see text]. Developing a simple model of sequence evolution by substitutions and segmental duplications, we show analytically and computationally that paralogous and orthologous gene pairs contribute differently to this distribution. Our model explains the differences observed in the comparison of coding and noncoding parts of genomes, thus providing a better understanding of statistical properties of genomic sequences and their evolution.

Subject(s)

Computational Biology/methods, Molecular Evolution, Sequence Alignment/methods, Sequence Homology, Genome, Genomics, Genetic Models, Probability, Genomic Segmental Duplications/genetics

12.

sciReptor: analysis of single-cell level immunoglobulin repertoires.

Imkeller, Katharina; Arndt, Peter F; Wardemann, Hedda; Busse, Christian E.

BMC Bioinformatics ; 17: 67, 2016 Feb 04.

Article in English | MEDLINE | ID: mdl-26847109

ABSTRACT

BACKGROUND: The sequencing of immunoglobulin (Ig) transcripts from single B cells yields essential information about Ig heavy:light chain pairing, which is lost in conventional bulk sequencing experiments. The previously limited throughput of single-cell approaches has recently been overcome by the introduction of multiple next-generation sequencing (NGS)-based platforms. Furthermore, single-cell techniques allow the assignment of additional data types (e.g. cell surface marker expression), which are crucial for biological interpretation. However, the currently available computational tools are not designed to handle single-cell data and do not provide integral solutions for linking of sequence data to other biological data. RESULTS: Here we introduce sciReptor, a flexible toolkit for the processing and analysis of antigen receptor repertoire sequencing data at single-cell level. The software combines bioinformatics tools for immunoglobulin sequence annotation with a relational database, where raw data and analysis results are stored and linked. sciReptor supports attribution of additional data categories such as cell surface marker expression or immunological metadata. Furthermore, it comprises a quality control module as well as basic repertoire visualization tools. CONCLUSION: sciReptor is a flexible framework for standardized sequence analysis of antigen receptor repertoires on single-cell level. The relational database allows easy data sharing and downstream analyses as well as immediate comparisons between different data sets.

Subject(s)

Computational Biology/methods, Immunoglobulin Genes, High-Throughput Nucleotide Sequencing/methods, Immunoglobulins/genetics, Single-Cell Analysis/methods, Software, Humans, Molecular Sequence Annotation, Immunologic Receptors/genetics

13.

Genome-wide patterns and properties of de novo mutations in humans.

Francioli, Laurent C; Polak, Paz P; Koren, Amnon; Menelaou, Androniki; Chun, Sung; Renkens, Ivo; van Duijn, Cornelia M; Swertz, Morris; Wijmenga, Cisca; van Ommen, Gertjan; Slagboom, P Eline; Boomsma, Dorret I; Ye, Kai; Guryev, Victor; Arndt, Peter F; Kloosterman, Wigard P; de Bakker, Paul I W; Sunyaev, Shamil R.

Nat Genet ; 47(7): 822-826, 2015 Jul.

Article in English | MEDLINE | ID: mdl-25985141

ABSTRACT

Mutations create variation in the population, fuel evolution and cause genetic diseases. Current knowledge about de novo mutations is incomplete and mostly indirect. Here we analyze 11,020 de novo mutations from the whole genomes of 250 families. We show that de novo mutations in the offspring of older fathers are not only more numerous but also occur more frequently in early-replicating, genic regions. Functional regions exhibit higher mutation rates due to CpG dinucleotides and show signatures of transcription-coupled repair, whereas mutation clusters with a unique signature point to a new mutational mechanism. Mutation and recombination rates independently associate with nucleotide diversity, and regional variation in human-chimpanzee divergence is only partly explained by heterogeneity in mutation rate. Finally, we provide a genome-wide mutation rate map for medical and population genetics applications. Our results provide new insights and refine long-standing hypotheses about human mutagenesis.

Subject(s)

Germ-Line Mutation, Animals, Molecular Evolution, Female, Human Genome, Humans, Male, Genetic Models, Mutation Rate, Pan troglodytes/genetics, Paternal Age

14.

Quantification of GC-biased gene conversion in the human genome.

Glémin, Sylvain; Arndt, Peter F; Messer, Philipp W; Petrov, Dmitri; Galtier, Nicolas; Duret, Laurent.

Genome Res ; 25(8): 1215-28, 2015 Aug.

Article in English | MEDLINE | ID: mdl-25995268

ABSTRACT

Much evidence indicates that GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, a detailed quantification of the process is still lacking. The strength of gBGC can be measured from the analysis of derived allele frequency spectra (DAF), but this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors and by spatial heterogeneity in gBGC strength. We propose a new general method to quantify gBGC from DAF spectra, incorporating polarization errors, taking spatial heterogeneity into account, and jointly estimating mutation bias. Applying it to human polymorphism data from the 1000 Genomes Project, we show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. Genome-wide, the intensity of gBGC is in the nearly neutral area. However, given that recombination occurs primarily within recombination hotspots, 1%-2% of the human genome is subject to strong gBGC. On average, gBGC is stronger in African than in non-African populations, reflecting differences in effective population sizes. However, due to more heterogeneous recombination landscapes, the fraction of the genome affected by strong gBGC is larger in non-African than in African populations. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that, in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.

Subject(s)

Base Composition, Gene Conversion, Human Genome, Racial Groups/genetics, CpG Islands, Gene Frequency, Humans, Genetic Models, Genetic Polymorphism

15.

Statistical properties of pairwise distances between leaves on a random Yule tree.

Sheinman, Michael; Massip, Florian; Arndt, Peter F.

PLoS One ; 10(3): e0120206, 2015.

Article in English | MEDLINE | ID: mdl-25826216

ABSTRACT

A Yule tree is the result of a branching process with constant birth and death rates. Such a process serves as an instructive null model of many empirical systems, for instance, the evolution of species leading to a phylogenetic tree. However, often in phylogeny the only available information is the pairwise distances between a small fraction of extant species representing the leaves of the tree. In this article we study statistical properties of the pairwise distances in a Yule tree. Using a method based on a recursion, we derive an exact, analytic and compact formula for the expected number of pairs separated by a certain time distance. This number turns out to follow an increasing exponential function. This property of a Yule tree can serve as a simple test for empirical data to be well described by a Yule process. We further use this recursive method to calculate the expected number of the n-most closely related pairs of leaves and the number of cherries separated by a certain time distance. To make our results more useful for realistic scenarios, we explicitly take into account that the leaves of a tree may be incompletely sampled and derive a criterion for poorly sampled phylogenies. We show that our result can account for empirical data, using two families of bird species.

Subject(s)

Plant Leaves, Trees/growth & development
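
The pairwise-distance statistic discussed in the abstract above can be explored numerically. The following is a minimal sketch that simulates a pure-birth tree, that is, with the death rate set to zero as a simplifying assumption, and returns the time since divergence for every pair of leaves. It does not reproduce the recursion or the analytic formula from the article, and the function name and parameters are hypothetical.

```python
import random


def simulate_yule_pair_ages(birth_rate, total_time, seed=0):
    """Simulate a pure-birth tree for total_time and return
    (number_of_leaves, sorted list of pairwise divergence ages)."""
    rng = random.Random(seed)
    split_time = {}  # (i, j) with i < j -> time at which i and j diverged
    n_lineages = 1
    t = 0.0
    while True:
        # With n lineages, the waiting time to the next split is
        # exponential with rate n * birth_rate.
        t += rng.expovariate(n_lineages * birth_rate)
        if t >= total_time:
            break
        parent = rng.randrange(n_lineages)
        child = n_lineages  # index of the newly created lineage
        for other in range(n_lineages):
            if other != parent:
                key = (min(parent, other), max(parent, other))
                # The child inherits the parent's history with others.
                split_time[(other, child)] = split_time[key]
        split_time[(parent, child)] = t
        n_lineages += 1
    ages = sorted(total_time - s for s in split_time.values())
    return n_lineages, ages


if __name__ == "__main__":
    n, ages = simulate_yule_pair_ages(birth_rate=1.0, total_time=5.0)
    print(n, "leaves,", len(ages), "pairs")
```

A histogram of the returned ages, averaged over many seeds, is one way to compare simulated trees with the exponential behaviour described in the abstract.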

16.

Evolutionary consequences of DNA methylation on the GC content in vertebrate genomes.

Mugal, Carina F; Arndt, Peter F; Holm, Lena; Ellegren, Hans.

G3 (Bethesda) ; 5(3): 441-7, 2015 Jan 15.

Article in English | MEDLINE | ID: mdl-25591920

ABSTRACT

The genomes of many vertebrates show a characteristic variation in GC content. To explain its origin and evolution, three main mechanisms have been proposed: selection for GC content, mutation bias, and GC-biased gene conversion. At present, the mechanism of GC-biased gene conversion, i.e., short-scale, unidirectional exchanges between homologous chromosomes in the neighborhood of recombination-initiating double-strand breaks in favor of GC nucleotides, is the most widely accepted hypothesis. Here we suggest that DNA methylation also plays an important role in the evolution of GC content in vertebrate genomes. To test this hypothesis, we investigated one mammalian (human) and one avian (chicken) genome. We used bisulfite sequencing to generate a whole-genome methylation map of chicken sperm and made use of a publicly available whole-genome methylation map of human sperm. Inclusion of these methylation maps into a model of GC content evolution provided significant support for the impact of DNA methylation on the local equilibrium GC content. Moreover, two different estimates of equilibrium GC content, one that neglects and one that incorporates the impact of DNA methylation and the concomitant CpG hypermutability, give estimates that differ by approximately 15% in both genomes, arguing for a strong impact of DNA methylation on the evolution of GC content. Thus, our results indicate that previous estimates of equilibrium GC content, which neglect the hypermutability of CpG dinucleotides, need to be reevaluated.

Subject(s)

Base Composition, DNA Methylation, Molecular Evolution, Human Genome, Animals, Chickens, Humans

17.

How evolution of genomes is reflected in exact DNA sequence match statistics.

Massip, Florian; Sheinman, Michael; Schbath, Sophie; Arndt, Peter F.

Mol Biol Evol ; 32(2): 524-35, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25398628

ABSTRACT

Genome evolution is shaped by a multitude of mutational processes, including point mutations, insertions, and deletions of DNA sequences, as well as segmental duplications. These mutational processes can leave distinctive qualitative marks in the statistical features of genomic DNA sequences. One such feature is the match length distribution (MLD) of exactly matching sequence segments within an individual genome or between the genomes of related species. These have been observed to exhibit characteristic power law decays in many species. Here, we show that simple dynamical models consisting solely of duplication and mutation processes can already explain the characteristic features of MLDs observed in genomic sequences. Surprisingly, we find that these features are largely insensitive to details of the underlying mutational processes and do not necessarily rely on the action of natural selection. Our results demonstrate how analyzing statistical features of DNA sequences can help us reveal and quantify the different mutational processes that underlie genome evolution.

Subject(s)

Genome/genetics, Genomics/methods, Animals, Biological Evolution, Molecular Evolution, Gene Duplication/genetics, Humans, Genomic Segmental Duplications/genetics, Genetic Selection
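
The match length distribution (MLD) named in the abstract above can be illustrated on a small scale. The sketch below assumes two sequences that are already aligned position by position with no gaps, which is a strong simplification of a genome-wide comparison, and counts the maximal runs of identical positions. The function name is hypothetical.

```python
from collections import Counter


def match_length_distribution(seq_a, seq_b):
    """Count maximal runs of identical positions between two
    equal-length, gap-free aligned sequences (an assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    counts = Counter()
    run = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            run += 1
        else:
            if run:
                counts[run] += 1  # a mismatch closes the current run
            run = 0
    if run:
        counts[run] += 1  # close a run that reaches the sequence end
    return counts


if __name__ == "__main__":
    print(match_length_distribution("ACGTACGTAA", "ACGAACGTTA"))
```

Plotting these counts against match length on logarithmic axes is the usual way to look for the power-law decay that the abstract describes.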

18.

Germline methylation patterns determine the distribution of recombination events in the dog genome.

Berglund, Jonas; Quilez, Javier; Arndt, Peter F; Webster, Matthew T.

Genome Biol Evol ; 7(2): 522-30, 2014 Dec 19.

Article in English | MEDLINE | ID: mdl-25527838

ABSTRACT

The positive-regulatory domain containing nine gene, PRDM9, which strongly associates with the location of recombination events in several vertebrates, is inferred to be inactive in the dog genome. Here, we address several questions regarding the control of recombination and its influence on genome evolution in dogs. First, we address whether the association between CpG islands (CGIs) and recombination hotspots is generated by lack of methylation, GC-biased gene conversion (gBGC), or both. Using a genome-wide dog single nucleotide polymorphism data set and comparisons of the dog genome with related species, we show that recombination-associated CGIs have low CpG mutation rates, and that CpG mutation rate is negatively correlated with recombination rate genome wide, indicating that nonmethylation attracts the recombination machinery. We next use a neighbor-dependent model of nucleotide substitution to disentangle the effects of CpG mutability and gBGC and analyze the effects that loss of PRDM9 has on these rates. We infer that methylation patterns have been stable during canid genome evolution, but that dog CGIs have experienced a drastic increase in substitution rate due to gBGC, consistent with increased levels of recombination in these regions. We also show that gBGC is likely to have generated many new CGIs in the dog genome, but these mostly occur away from genes, whereas the number of CGIs in gene promoter regions has not increased greatly in recent evolutionary history. Recombination has a major impact on the distribution of CGIs that are detected in the dog genome due to the interaction between methylation and gBGC. The results indicate that germline methylation patterns are the main determinant of recombination rates in the absence of PRDM9.

Subject(s)

DNA Methylation/genetics , Dogs/genetics , Genome , Germ Cells/metabolism , Genetic Recombination , Animals , Base Composition/genetics , CpG Islands/genetics , Single Nucleotide Polymorphism/genetics , Ursidae/genetics

19.

Evidence of a cancer type-specific distribution for consecutive somatic mutation distances.

Muiño, Jose M; Kuruoglu, Ercan E; Arndt, Peter F.

Comput Biol Chem ; 53 Pt A: 79-83, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25179009

ABSTRACT

Specific molecular mechanisms may affect the pattern of mutation in particular regions, thereby leaving a footprint or signature of their activity in the DNA. The common approach to identifying these signatures is to study the frequency of substitutions. However, such an analysis ignores spatial information, which matters for the statistics of mutation occurrence. In this work, we propose that studying the distribution of distances between consecutive mutations along the DNA molecule can provide information about the types of somatic mutational processes. In particular, we have found that specific cancer types show a power law in interoccurrence distances, instead of the exponential distribution expected under the Poisson assumption commonly made in the literature. Cancer genomes exhibiting power-law interoccurrence distances were enriched in cancer types where the main mutational process is described to be the activity of the APOBEC protein family, which produces a particular pattern of mutations called kataegis. Therefore, the observation of a power law in interoccurrence distances could be used to identify cancer genomes with kataegis.
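The contrast drawn in this abstract, exponential interoccurrence distances under a Poisson assumption versus a power law, can be illustrated with a minimal sketch. It is not the authors' method. It assumes continuous distances, a fixed lower cutoff `x_min`, and a plain maximum-likelihood comparison; all function names are hypothetical, and the simulated data are illustrative only.

```python
import math
import random


def interoccurrence_distances(positions):
    """Distances between consecutive mutation positions on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]


def loglik_exponential(distances, x_min=1.0):
    """Maximized log-likelihood of a shifted exponential on d >= x_min."""
    tail = [d for d in distances if d >= x_min]
    rate = len(tail) / sum(d - x_min for d in tail)  # MLE of the rate
    return sum(math.log(rate) - rate * (d - x_min) for d in tail)


def loglik_power_law(distances, x_min=1.0):
    """Maximized log-likelihood of p(d) ~ d**(-alpha) on d >= x_min."""
    tail = [d for d in distances if d >= x_min]
    alpha = 1.0 + len(tail) / sum(math.log(d / x_min) for d in tail)  # MLE
    norm = math.log((alpha - 1.0) / x_min)
    return sum(norm - alpha * math.log(d / x_min) for d in tail)


def log_likelihood_ratio(distances, x_min=1.0):
    """Positive values favor the power law, negative values the exponential."""
    return loglik_power_law(distances, x_min) - loglik_exponential(distances, x_min)


def simulate_positions(draw_gap, n_mutations):
    """Hypothetical helper: place mutations by accumulating random gaps."""
    positions, current = [], 0.0
    for _ in range(n_mutations):
        current += draw_gap()
        positions.append(current)
    return positions


if __name__ == "__main__":
    rng = random.Random(0)
    # Poisson-like placement: exponential gaps with a mean of 1000 bases.
    uniform = simulate_positions(lambda: rng.expovariate(1 / 1000.0), 5000)
    # Clustered placement: heavy-tailed (Pareto) gaps.
    clustered = simulate_positions(lambda: rng.paretovariate(1.5), 5000)
    for label, pos in (("exponential gaps", uniform), ("Pareto gaps", clustered)):
        ratio = log_likelihood_ratio(interoccurrence_distances(pos))
        print(label, "favors", "power law" if ratio > 0 else "exponential")
```

The sign of the ratio only indicates which of the two fitted models describes the distances better. A real analysis would also need a significance test, a principled choice of `x_min`, and a treatment of integer-valued genomic coordinates.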

Subject(s)

Apolipoproteins B/genetics , Breast Neoplasms/genetics , Human Genome , Mutation , Neoplasm Proteins/genetics , Neoplasms/genetics , Female , Humans , Male , Genetic Models , Neoplasms/classification , Statistical Distributions

20.

Distribution of segmental duplications in the context of higher order chromatin organisation of human chromosome 7.

Ebert, Grit; Steininger, Anne; Weißmann, Robert; Boldt, Vivien; Lind-Thomsen, Allan; Grune, Jana; Badelt, Stefan; Heßler, Melanie; Peiser, Matthias; Hitzler, Manuel; Jensen, Lars R; Müller, Ines; Hu, Hao; Arndt, Peter F; Kuss, Andreas W; Tebel, Katrin; Ullmann, Reinhard.

BMC Genomics ; 15: 537, 2014 Jun 29.

Article in English | MEDLINE | ID: mdl-24973960

ABSTRACT

BACKGROUND: Segmental duplications (SDs) are not evenly distributed along chromosomes. The reasons for this biased susceptibility to SD insertion are poorly understood. Accumulation of SDs is associated with increased genomic instability, which can lead to structural variants and genomic disorders such as the Williams-Beuren syndrome. Despite these adverse effects, SDs have become fixed in the human genome. Focusing on chromosome 7, which is particularly rich in interstitial SDs, we have investigated the distribution of SDs in the context of evolution and the three-dimensional organisation of the chromosome in order to gain insights into the mutual relationship of SDs and chromatin topology. RESULTS: Intrachromosomal SDs preferentially accumulate in those segments of chromosome 7 that are homologous to marmoset chromosome 2. Although this formerly compact segment has been redistributed to three different sites during primate evolution, we can show by means of public data on long-distance chromatin interactions that these three intervals, and consequently the paralogous SDs mapping to them, have retained their spatial proximity in the nucleus. Focusing on SD clusters implicated in the aetiology of the Williams-Beuren syndrome locus, we demonstrate by cross-species comparison that these SDs have inserted at the borders of a topological domain and that they flank regions with distinct DNA conformation. CONCLUSIONS: Our study suggests a link between nuclear architecture and the propagation of SDs across chromosome 7, either by promoting regional SD insertion or by contributing to the establishment of higher order chromatin organisation themselves. The latter could compensate for the high risk of structural rearrangements and thus may have contributed to their evolutionary fixation in the human genome.

Subject(s)

Chromatin/genetics , Human Chromosome Pair 7 , Genomic Segmental Duplications , Acetylation , Chromatin/metabolism , Human Chromosome Pair 2 , Genetic Epistasis , Molecular Evolution , Genetic Loci , Genomics , Histones/metabolism , Humans , Genetic Transcription , Williams Syndrome/genetics
