ABSTRACT
BACKGROUND: The eukaryotic genome is compartmentalized into structural and functional domains. One concept of higher-order chromatin organization posits that DNA is organized in constrained loops that behave as independent functional domains. The Nuclear Matrix (NuMat), a ribo-proteinaceous nucleoskeleton, provides the structural basis for this organization. DNA sequences located at the base of the loops are known as Matrix Attachment Regions (MARs). NuMat relates to multiple nuclear processes and is partly cell-type specific in composition. It is a biochemically defined structure, and several protocols have been used to isolate it, some steps of which have been critically evaluated. Because MARs play an important role in genomic organization, it is imperative to know their dynamics during development and differentiation. RESULTS: Here we look into the dynamics of MARs when the preparation process is varied and during embryonic development of D. melanogaster. A subset of MARs termed "Core-MARs", abundant in pericentromeric heterochromatin, are constant, unalterable anchor points: they associate with NuMat throughout embryonic development and are independent of the isolation procedure. Euchromatic MARs are dynamic and reflect the transcriptomic profile of the cell. New MARs are generated by nuclear stabilization and during development, mostly at paused RNA polymerase II promoters. Paused Pol II MARs depend on RNA transcripts for NuMat association. CONCLUSIONS: Our data reveal the role of MARs in a functionally dynamic nucleus and contribute to the current understanding of nuclear architecture in a genomic context.
Subject(s)
Drosophila melanogaster , Heterochromatin , Animals , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Heterochromatin/metabolism , RNA Polymerase II/metabolism , Nuclear Matrix/genetics , Nuclear Matrix/chemistry , Nuclear Matrix/metabolism , Chromatin/genetics , Chromatin/metabolism , DNA/metabolism , RNA/metabolism
ABSTRACT
Microsatellites are short tandem repeats of 1-6 nucleotide motifs, studied for their utility as genome markers and in forensics. Recent evidence points to the role of microsatellites in important regulatory functions, and their length polymorphisms in coding regions are linked to various neurodegenerative disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and their evolution remains poorly understood. Though other databases of microsatellites exist, they fall short on several fronts. MSDB (MicroSatellite DataBase) is a collection of >4 billion microsatellites from 37 680 genomes presented in a user-friendly web portal for easy, interactive analysis and visualization. This is by far the most comprehensive, annotated, updated database to access and analyze microsatellite data of multiple species. The features of MSDB enable users to explore the data as tables that can be filtered and exported, and also as interactive charts to view and compare the data of multiple species simultaneously. Its modularity and architecture permit seamless updates with new data, making it a powerful tool and useful resource for researchers working on this important class of DNA elements, particularly in the context of their evolution and emerging roles in genome organization and gene regulation.
Subject(s)
Nucleic Acid Databases , Microsatellite Repeats , Genome , Humans , Molecular Sequence Annotation
ABSTRACT
BACKGROUND: Microsatellites, or Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes. Emerging evidence points to their role in cellular processes and gene regulation. Despite the huge resource of genomic information currently available, SSRs have been studied in a limited context and compared across relatively few species. RESULTS: We have identified ~685 million eukaryotic microsatellites and analyzed their genomic trends across 15 taxonomic subgroups from protists to mammals. The distribution of SSRs reveals taxon-specific variations in their exonic, intronic and intergenic densities. Our analysis reveals differences among unrelated species and novel patterns that uniquely demarcate closely related species. We document several repeats common across subgroups as well as rare SSRs that are excluded almost throughout evolution. We further identify species-specific signatures in pathogens like Leishmania as well as in cereal crops, Drosophila, birds and primates. We also find that distinct SSRs preferentially exist as long repeating units in different subgroups; most unicellular organisms show no length preference for any SSR class, while many SSR motifs accumulate as long repeats in complex organisms, especially in mammals. CONCLUSIONS: We present a comprehensive analysis of SSRs across taxa at an unprecedented scale. Our analysis indicates that the SSR composition of organisms with heterogeneous cell types is highly constrained, while simpler organisms such as protists, green algae and fungi show greater diversity in motif abundance, density and GC content. The microsatellite dataset generated in this work provides a large number of candidates for functional analysis and for studying their roles across the evolutionary landscape.
Subject(s)
Eukaryota/genetics , Microsatellite Repeats , Animals , Genome , Genomics , Humans , Nucleotide Motifs
ABSTRACT
Motivation: Microsatellites, or Simple Sequence Repeats (SSRs), are short tandem repeats of DNA motifs present in all genomes. They have long been used for a variety of purposes in population genetics, genotyping, marker-assisted selection and forensics. Numerous studies have highlighted their functional roles in genome organization and gene regulation. Though several tools are currently available to identify SSRs from genomic sequences, they have significant limitations. Results: We present a novel algorithm called PERF for extremely fast and comprehensive identification of microsatellites from DNA sequences of any size. PERF is severalfold faster than existing algorithms and uses up to 5-fold less memory. It provides a clean and flexible command-line interface to change the default settings, and produces output in an easily parseable tab-separated format. In addition, PERF generates an interactive, stand-alone HTML report with charts and tables for easy downstream analysis. Availability and implementation: PERF is implemented in the Python programming language. It is freely available on PyPI under the package name perf_ssr, and can be installed directly using pip or easy_install. The documentation of PERF is available at https://github.com/rkmlab/perf. The source code of PERF is deposited in GitHub at https://github.com/rkmlab/perf under an MIT license. Contact: tej@ccmb.res.in. Supplementary information: Supplementary data are available at Bioinformatics online.
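To illustrate what identifying perfect microsatellites involves, a minimal Python sketch follows. It is a naive scan and not the PERF algorithm, whose internals the abstract does not describe. The function name `find_ssrs`, the 1-6 nt motif range, and the 12 nt minimum repeat length are assumptions chosen for illustration only.

```python
def find_ssrs(seq, min_motif=1, max_motif=6, min_length=12):
    """Naive scan for perfect tandem repeats (illustrative, not PERF).

    Returns a list of (start, end, motif, copies) tuples with 0-based,
    end-exclusive coordinates. All thresholds are hypothetical defaults.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    pos = 0
    while pos < n:
        best = None
        for k in range(min_motif, max_motif + 1):
            motif = seq[pos:pos + k]
            # Stop at the sequence end or at ambiguous bases such as N.
            if len(motif) < k or not set(motif) <= set("ACGT"):
                break
            # Extend the repeat one whole motif copy at a time.
            end = pos + k
            while seq[end:end + k] == motif:
                end += k
            copies = (end - pos) // k
            if end - pos >= min_length and copies >= 2:
                # Keep the longest repeat; on ties the shortest motif
                # wins because it is tried first.
                if best is None or end - pos > best[1] - best[0]:
                    best = (pos, end, motif, copies)
        if best is not None:
            hits.append(best)
            pos = best[1]  # resume after the reported repeat
        else:
            pos += 1
    return hits


if __name__ == "__main__":
    example = "GGC" + "AT" * 8 + "CCG" + "A" * 12 + "T"
    for start, end, motif, copies in find_ssrs(example):
        # Tab-separated output, one repeat per line.
        print(start, end, motif, copies, sep="\t")
```

This scan re-tests every motif size at every position, so it is far slower than a purpose-built tool on genome-scale input; it is meant only to make the notion of a perfect repeat of a 1-6 nt motif concrete.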
Subject(s)
Algorithms , Human Genome , Genomics/methods , Microsatellite Repeats , DNA Sequence Analysis/methods , Humans , Software
ABSTRACT
Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly those of eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length in coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb.