ABSTRACT
BACKGROUND: Over the past decades, microsatellites have increasingly been recognized as biologically significant for the human genome and human health. The complete human reference sequence T2T-CHM13 now makes a comprehensive study of short tandem repeats (STRs) in the human genome possible. RESULTS: We built microsatellite density landscapes for all 24 chromosomes of T2T-CHM13, the first complete reference sequence of the human genome. These landscapes show that STRs tend to aggregate, forming a large number of STR density peaks. We classified 8,823 High Microsatellites Density Peaks (HMDPs), 35,257 Middle Microsatellites Density Peaks (MMDPs) and 199,649 Low Microsatellites Density Peaks (LMDPs) on the 24 chromosomes, and also classified the motif types of every microsatellite density peak. These density peaks are mainly composed of a single motif; AT is the most dominant motif, followed by AATGG and CCATT. In addition, 514 genomic regions in the full T2T-CHM13 genome were characterized by their microsatellite density features. CONCLUSIONS: These landscape maps show that microsatellites aggregate at many genomic positions in the complete reference genome, forming a large number of density peaks composed mainly of a single motif type, which indicates that local microsatellite density varies enormously along every chromosome of T2T-CHM13.
Subject(s)
Human Genome , Microsatellite Repeats , Humans , Genomics/methods , Nucleotide Motifs , Human Chromosomes/genetics
ABSTRACT
The phylogenetic structure of the genus Niviventer has been studied based on several individual mitochondrial and nuclear genes, but the results seem to be inconsistent. In order to clarify the phylogeny of Niviventer, we sequenced the complete mitochondrial genome of white-bellied rat (Niviventer andersoni of the family Muridae) by next-generation sequencing. The 16,291 bp mitochondrial genome consists of 22 transfer RNA genes, 13 protein-coding genes (PCGs), two ribosomal RNA genes, and one noncoding control region (D-Loop). Phylogenetic analyses of the nucleotide sequences of all 13 PCGs, PCGs minus ND6, and the entire mitogenome sequence except for the D-loop revealed well-resolved topologies supporting that N. andersoni was clustered with N. excelsior forming a sister division with N. confucianus, which statistically rejected the hypothesis based on the tree of cytochrome b (cytb) gene that N. confucianus is sister to N. fulvescens. Our research provides the first annotated complete mitochondrial genome of N. andersoni, extending the understanding about taxonomy and mitogenomic evolution of the genus Niviventer.
ABSTRACT
Many viral genomes, including those of Ebolavirus, contain microsatellites (SSRs), and the majority of Ebolavirus microsatellite sites are located in protein-coding regions. Here, we identified a total of 212 reserved microsatellite sites in the protein-coding regions of 213 genomic sequences from five Ebolavirus species. Among these reserved sites, only one is significantly conserved across the sampled Ebolavirus genomic sequences. This microsatellite is located at the RNA editing site of the GP gene, suggesting that selection on it is related to RNA editing at that site. This analysis may help to further explore the biological significance of various microsatellites in Ebolavirus genomes.
Subject(s)
Ebolavirus/genetics , Microsatellite Repeats/genetics , Gene Editing
ABSTRACT
Plant pathogens have agricultural impacts on a global scale and resolving the timing and route of their spread can aid crop protection and inform control strategies. However, the evolutionary and phylogeographic history of plant pathogens in Eurasia remains largely unknown because of the difficulties in sampling across such a large landmass. Here, we show that turnip mosaic potyvirus (TuMV), a significant pathogen of brassica crops, spread from west to east across Eurasia from about the 17th century CE. We used a Bayesian phylogenetic approach to analyze 579 whole genome sequences and up to 713 partial sequences of TuMV, including 122 previously unknown genome sequences from isolates that we collected over the past five decades. Our phylogeographic and molecular clock analyses showed that TuMV isolates of the Asian-Brassica/Raphanus (BR) and basal-BR groups and world-Brassica3 (B3) subgroup spread from the center of emergence to the rest of Eurasia in relation to the host plants grown in each country. The migration pathways of TuMV have retraced some of the major historical trade arteries in Eurasia, a network that formed the Silk Road, and the regional variation of the virus is partly characterized by different type patterns of recombinants. Our study presents a complex and detailed picture of the timescale and major transmission routes of an important plant pathogen.
Subject(s)
Brassica/virology , Economics , Viral Genome , Genomics , Plant Diseases/virology , Potyvirus/physiology , Genetic Variation , Genomics/methods , Geography , Phylogeny , Phylogeography , Potyvirus/classification
ABSTRACT
BACKGROUND: Although interest in human simple sequence repeats (SSRs) is increasing, little is known about the exact distributional features of the numerous SSRs in human Y-DNA at the chromosomal level. Here, we established a total of 540 maps that clearly display the SSR landscape in every 1 kilobase-pair (Kbp) bin along the sequenced part of the human reference Y-DNA (NC_000024.10). The maps were produced with a differential method that we developed to improve on the existing method for revealing SSR distributional characteristics in large genomic sequences. RESULTS: The maps show that SSRs accumulate significantly, forming density peaks in at least 2040 bins of 1 Kbp that span coding, noncoding and intergenic regions of the Y-DNA. Ten especially high density peaks have been reported to be associated with biological significance, suggesting that the other hundreds of especially high density peaks might also be biologically significant and worth further analysis. In contrast, the maps also show that SSRs are extremely sparse in at least 207 bins of 1 Kbp, including many noncoding and intergenic regions of the Y-DNA. This is inconsistent with the widely accepted view that SSRs are most abundant in these regions, and the sparse distributions are possibly due to strong regional selection. Additionally, many regions of the Y-DNA harbor clusters of SSRs with the same or similar motifs. CONCLUSIONS: These 540 maps provide position-related information on SSR distributional features along the human reference Y-DNA, which may support a better understanding of its genome structure. This study may contribute to further exploring the biological significance and distribution patterns of the large numbers of SSRs in human Y-DNA.
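The idea of an SSR landscape over 1 Kbp bins can be illustrated with a minimal Python sketch. It is not the authors' differential method, which the abstract does not describe. The repeat thresholds in `MIN_REPEATS`, the restriction to perfect repeats, and the function names `find_ssrs` and `ssr_density` are all assumptions made for illustration.

```python
import re

# Assumed minimum repeat counts per motif length (1-6 bp);
# the abstract does not state the thresholds that were actually used.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit ('ATAT' is not primitive)."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, end, motif) tuples for perfect SSRs; end is exclusive."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One captured motif of length k followed by at least min_rep - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1)))
    return sorted(hits)


def ssr_density(seq, bin_size=1000):
    """Fraction of bases covered by SSRs in each consecutive bin of bin_size bases."""
    covered = set()
    for start, end, _ in find_ssrs(seq):
        covered.update(range(start, end))
    densities = []
    for lo in range(0, len(seq), bin_size):
        hi = min(lo + bin_size, len(seq))
        densities.append(sum(1 for p in range(lo, hi) if p in covered) / (hi - lo))
    return densities


# Toy sequence: an 8-bp filler unit that stays below every threshold above,
# with a single (AT)10 repeat placed at the start of the second bin.
filler = "ACGTTGCA"
seq = filler * 125 + "AT" * 10 + filler * 10
print(find_ssrs(seq))
print(ssr_density(seq))
```

Bins whose density exceeds a chosen cutoff would then be candidate density peaks, and bins with a density of zero would be candidate sparse regions. The cutoff itself is another assumption that the abstract leaves open.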
Subject(s)
Microsatellite Repeats , Genetic Polymorphism , DNA/genetics , Genome , Plant Genome , Humans , Microsatellite Repeats/genetics , DNA Sequence Analysis
ABSTRACT
Uruguayan beef is one of the most popular products in the export market. In this study, we report the complete mitochondrial genome sequence of Uruguayan native cattle for the first time. The total mitochondrial genome sequence is 16,339 bp in length with the base composition of 33.4% for A, 27.2% for T, 26.0% for C, and 13.4% for G. The description of all genes is similar to the typical mitochondrial genomes of cattle. The annotated mitochondrial genome of Uruguayan native cattle would serve as an important genetic data set for further study.
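A base composition such as the one reported above is a simple per-base percentage. The sketch below shows one way to compute it in Python and to check that the reported figures sum to 100%. The function name `base_composition` and the toy sequence are illustrative assumptions; the actual mitogenome sequence is not reproduced here.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, T, C and G among the unambiguous bases of seq, rounded to 0.1."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: round(100 * counts[b] / total, 1) for b in "ATCG"}


# Toy sequence for illustration only, not the Uruguayan cattle mitogenome.
print(base_composition("AATTCCGGAA"))

# Consistency check on the percentages reported in the abstract.
reported = {"A": 33.4, "T": 27.2, "C": 26.0, "G": 13.4}
print(round(sum(reported.values()), 1))
```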
ABSTRACT
In recent years, high energy density polymer capacitors have attracted considerable scientific interest because of their potential applications in advanced power systems and electronic devices. Here, core-shell structured TiO2@SrTiO3@polydopamine nanowires (TiO2@SrTiO3@PDA NWs) were synthesized by combining a surface conversion reaction with in-situ polymerization, and then incorporated into a poly(vinylidene fluoride) (PVDF) matrix. Our results show that a small amount of TiO2@SrTiO3@PDA NWs can simultaneously enhance the breakdown strength and electric displacement of nanocomposite (NC) films, resulting in improved energy storage capability. The 5 wt% TiO2@SrTiO3@PDA NWs/PVDF NC has a maximum discharge energy density 1.72 times that of pristine PVDF (10.34 J/cm3 at 198 MV/m vs. 6.01 J/cm3 at 170 MV/m). In addition, the NC with 5 wt% TiO2@SrTiO3@PDA NWs also shows an excellent charge-discharge efficiency (69% at 198 MV/m). The enhanced energy storage performance is attributed to hierarchical interfacial polarization among the multiple interfaces, the large aspect ratio, and the surface modification of the TiO2@SrTiO3 NWs. The results of this study provide guidelines and a foundation for preparing polymer NCs with outstanding discharge energy density.
ABSTRACT
BACKGROUND: The ubiquitous presence of short tandem repeats (STRs) in virtually all genomes implies that they are functionally relevant, yet a widely accepted definition of an STR has not been established. Previous studies have mainly focused on relatively long STRs and generally excluded shorter repeats. Here, we adopted more generous criteria that include shorter repeats, which defines a much larger number of STRs that have not previously been analyzed. Using this definition, we analyzed the short repeats in 55 randomly selected segments from 55 randomly selected genomic sequences spanning a wide range of species, including animals, plants, fungi, protozoa, bacteria, archaea and viruses. RESULTS: Our analysis reveals a high percentage of short repeats in all 55 randomly selected segments, indicating that a high content of short repeats could be a common characteristic of genomes across all biological kingdoms. It is therefore reasonable to assume a mechanism that continuously produces repeats and makes the replication process only relatively semi-conservative. We propose a folded replication slippage model that considers the geometric space of nucleotides and hydrogen bond stability to explain this mechanism more explicitly, improving on the existing straight-line slippage model. With appropriate folding angles, the folded slippage model can explain the expansion and contraction of mono- to hexanucleotide repeats. Analysis of the external forces on the folded template strands also suggests that expansion is more common than contraction in short tandem repeats. CONCLUSION: The folded replication slippage model provides a reasonable explanation for the continuous occurrence of simple sequence repeats in genomes. The model also contributes to explaining STR-to-genome evolution and is an alternative model that complements semi-conservative replication.
Subject(s)
Genome , Microsatellite Repeats , Animals , Genomics , Microsatellite Repeats/genetics
ABSTRACT
Microsatellites (SSRs) are ubiquitous in the coding and non-coding regions of Ebolavirus genomes. We comprehensively analyzed the microsatellites in the whole genomes and terminal regions of 219 Ebolavirus genomes from five species. The Ebolavirus sequences showed small intraspecific variation and large interspecific variation, especially in the terminal non-coding regions. Only five conserved microsatellites were detected in the complete genomes. Four of them base-pair well and help to form conserved stem-loop structures, and these appear mainly in the terminal non-coding regions. These results suggest that the conserved microsatellites may have been evolutionarily selected to form conserved secondary structures at the 5' and 3' termini of Ebolavirus genomes. This may help to understand the biological significance of microsatellites in Ebolavirus and in other virus genomes.
Subject(s)
Conserved Sequence , Ebolavirus/genetics , Viral Genome , Inverted Repeat Sequences , Microsatellite Repeats , Viral RNA/genetics , Base Pairing , Genetic Databases , Ebolavirus/classification , Molecular Evolution , Nucleic Acid Conformation , Viral RNA/chemistry , Genetic Selection , Sequence Alignment , Nucleic Acid Sequence Homology
ABSTRACT
High-throughput reporter assays have been recently developed to directly and quantitatively assess enhancer activity for thousands of regulatory elements. However, there is still no database to collect these enhancers. We developed RAEdb, the first database to collect enhancers identified by high-throughput reporter assays. RAEdb includes 538 320 enhancers derived from eight studies, most of which were from six human cell lines. An activity score was assigned to each enhancer based on reporter assays. Based on these enhancers, 7658 epromoters (promoters with enhancer activity) were identified and stored in the database. RAEdb provides two ways of searches: the first is to search studies by species and cell line; the other is to search enhancers or epromoters by position, activity score, sequence and gene. RAEdb also provides a genome browser to query, visualize and compare enhancers. All data in RAEdb is freely available for download.
Subject(s)
Database Management Systems , Genetic Databases , Genetic Enhancer Elements/genetics , High-Throughput Nucleotide Sequencing/methods , Humans
ABSTRACT
MOTIVATION: Receptor-mediated entry is the first step of viral infection. However, the question of how viruses select receptors remains unanswered. RESULTS: Here, we manually curated a high-quality database of 268 pairs of mammalian virus-host receptor interactions, covering 128 unique viral species or sub-species and 119 virus receptors. We found that the viral receptors are structurally and functionally diverse, yet share several features that distinguish them from other cell membrane proteins: more protein domains, a higher level of N-glycosylation, a higher ratio of self-interaction, more interaction partners, and higher expression in most tissues of the host. This study could deepen our understanding of virus-receptor interactions. AVAILABILITY AND IMPLEMENTATION: The database of mammalian virus-host receptor interactions is available at http://www.computationalbiology.cn:5000/viralReceptor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Virus Diseases , Animals , Glycosylation , Mammals , Membrane Proteins , Virus Internalization , Viruses
ABSTRACT
Rapid determination of the antigenicity of influenza A virus could help to identify antigenic variants in time. Currently, computational models for predicting antigenic variants are lacking for some common hemagglutinin (HA) subtypes of influenza A viruses. Using sequence analysis, we demonstrate here that multiple HA subtypes of influenza A virus undergo similar mutation patterns in the HA1 protein (the immunogenic part of HA). Further analysis of the antigenic variation of influenza A virus H1N1, H3N2 and H5N1 showed that the contribution of individual amino acid residues to antigenic variation differed greatly among these subtypes, whereas the regional bands, defined by their distance to the top of HA1, played conserved roles in the antigenic variation of these subtypes. Moreover, computational models for predicting antigenic variants that were based on regional bands performed much better on the test HA subtype than models based on amino acid residues. Therefore, a universal computational model, named PREDAV-FluA, was built on the regional bands to predict antigenic variants for all HA subtypes of influenza A viruses. The model achieved an accuracy of 0.77 when tested with avian influenza H9N2 viruses. It may help with the rapid identification of antigenic variants in influenza surveillance.
Subject(s)
Antigenic Variation , Viral Antigens/immunology , Viral Hemagglutinins/immunology , Influenza A Virus/immunology , Animals , Viral Antigens/chemistry , Computational Biology , Influenza A Virus/genetics , Theoretical Models , Protein Sequence Analysis
ABSTRACT
Microsatellites, or simple sequence repeats (SSRs), are known to be ubiquitous in the genomes of eukaryotes, prokaryotes and viruses. We performed a comprehensive analysis of microsatellites and compound microsatellites (CM) in 67 T4-like bacteriophage genomes. We found that the number of repeats was generally proportional to the size of the genome. CM were more abundant in genic regions, while their relative abundance was higher in intergenic regions. The number of CM decreased rapidly with increasing complexity but increased gradually with higher dMAX (the maximum distance between any two adjacent microsatellites). (A)n/(T)n, (AT)n/(TA)n and (AAG)n were the most abundant mono-, di- and trinucleotide microsatellites, respectively. The number of microsatellites in the reference sequences was significantly lower than in the corresponding random sequences. This result was mainly attributable to mono- and dinucleotide repeats, which hardly exceeded 6 bp in T4-like viruses. These observations may help in understanding the distribution of microsatellites and viral genetic diversity in T4-like viruses.
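The relationship between dMAX and the number of compound microsatellites can be sketched in Python as follows. The sketch assumes that microsatellites are already available as half-open `(start, end)` intervals and that a compound microsatellite is a chain of at least two microsatellites whose adjacent gaps are each at most dMAX, with complexity taken as the number of members. The function name `compound_microsatellites` and the toy coordinates are hypothetical and are not taken from the study.

```python
def compound_microsatellites(ssrs, dmax):
    """Group (start, end) intervals into compound microsatellites.

    Adjacent microsatellites are chained while the gap between the end of one
    and the start of the next is at most dmax; chains of length >= 2 are returned.
    """
    groups, current = [], []
    for ssr in sorted(ssrs):
        if current and ssr[0] - current[-1][1] <= dmax:
            current.append(ssr)
        else:
            if len(current) >= 2:
                groups.append(current)
            current = [ssr]
    if len(current) >= 2:
        groups.append(current)
    return groups


# Hypothetical coordinates: gaps of 2, 5 and 69 bases between neighbours.
ssrs = [(0, 10), (12, 20), (25, 31), (100, 112)]
for dmax in (0, 2, 5):
    groups = compound_microsatellites(ssrs, dmax)
    print(dmax, len(groups), [len(g) for g in groups])
```

On these toy coordinates, raising dMAX first creates a compound microsatellite and then lengthens it. This shows why both the count and the complexity of compound microsatellites depend on the chosen dMAX.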
Subject(s)
Bacteriophage T4/genetics , Microsatellite Repeats , Viral DNA/analysis , Genetic Variation , Genome Size
ABSTRACT
Simple sequence repeats (SSRs), or microsatellites, are DNA/RNA sequences made up of a repeated unit of 1-6 bp. The genomes of Herpesvirales contain many repeat structures, which makes them an excellent system for studying the evolution and roles of microsatellites and compound microsatellites in viruses. We therefore selected 56 Herpesvirales genomes and investigated the occurrence, composition and complexity of the different repeats. A total of 63,939 microsatellites and 5825 compound microsatellites were extracted from the 56 genomes. We found that GC content is strongly and significantly correlated with both the counts of microsatellites (CM) and the counts of compound microsatellites (CCM). Genome size, however, is only moderately correlated with CM and shows almost no correlation with CCM. Compound microsatellites are clearly more numerous in genic regions than in intergenic regions. In general, the number of compound microsatellites decreases as complexity (C, the number of individual microsatellites that make up a compound microsatellite) increases, and the complexity hardly exceeds C=4. When C≥10, the vast majority of compound microsatellites lie in intergenic regions. The distributions of SSRs tend to be organism-specific rather than host-specific in herpesvirus genomes. The diversity of microsatellites and compound microsatellites may help in better understanding viral genetic diversity, genotyping and evolutionary biology in herpesvirus genomes.
Subject(s)
Viral Genome , Herpesviridae/genetics , Microsatellite Repeats , Base Composition , Intergenic DNA , Genome Size , Host-Pathogen Interactions/genetics
ABSTRACT
Mononucleotide repeats (MNRs) have been systematically investigated in the genomes of eukaryotic and prokaryotic organisms. However, detailed information on the distribution of MNRs in viral genomes is limited. In this study, we examined the distribution of MNRs in 256 fully sequenced virus genomes. The distribution varies extensively across viral genomes and is significantly influenced by both genome size and CG content. Furthermore, the ratio of the observed to the expected number of MNRs (O/E ratio) appears to be influenced by both the host range and the genome type of a particular virus. Additionally, the densities and frequencies of MNRs in genic regions are lower than in non-coding regions, suggesting that selective pressure acts on viral genomes. We also discuss the potential functional roles that these MNR loci could play in virus genomes. To our knowledge, this is the first analysis focusing on MNRs in viruses, and our study could have implications for a deeper understanding of virus genome stability and of the co-evolution between a virus and its host.
Subject(s)
Viral DNA/genetics , Viral Genome/genetics , Microsatellite Repeats/genetics , Viruses/genetics , Animals , Base Composition/genetics , Base Sequence , Chromosome Mapping , Viral DNA/analysis , Genetic Variation , Genome Size , Host Specificity/genetics , Humans
ABSTRACT
Extensive simple sequence repeat (SSR) surveys have been performed for eukaryotic, prokaryotic and viral genomes, but information regarding SSRs in viroids is limited. We undertook a survey to examine the presence of SSRs in viroid genomes. Our results show that the distribution of SSRs in viroids may influence secondary structure, and that SSRs could play a role in generating genetic diversity. We also discuss the potential evolutionary role of repeated sequences in the viroid genome. This is the first report of SSR loci in viroids, and our study could be helpful in understanding the structure and evolution of viroid genomes.
ABSTRACT
BACKGROUND: The relationship between the level of repetitiveness in a genomic sequence and genome size has been investigated using complete prokaryotic and eukaryotic genomes, but such studies have rarely been carried out for virus genomes. RESULTS: In this study, a total of 257 viruses were examined, covering 90% of genera. The results showed that the occurrence of simple sequence repeats (SSRs) is strongly, positively and significantly correlated with genome size. Each repeat class is distributed within a certain range of genome sequence length. Mono-, di- and trinucleotide repeats are widely distributed in all virus genomes, and tetranucleotide SSRs are a common component of genomes larger than 100 kb. Among genomes smaller than 100 kb, no more than 50% contain penta- and hexanucleotide SSRs. Principal components analysis (PCA) indicated that dinucleotide repeats contribute most strongly to the differences in SSRs among virus genomes. The results showed that SSRs tend to accumulate in larger virus genomes, and that the longer the genome sequence, the longer the repeat units. CONCLUSIONS: This study examined SSRs across viruses as a whole. We conclude that genome size is an important factor affecting the occurrence of SSRs, and that hosts also account for the variance in SSR content to a certain degree.
Subject(s)
Genome Size , Viral Genome , Microsatellite Repeats/genetics , Viruses/genetics , Base Sequence , Molecular Evolution , Principal Component Analysis
ABSTRACT
Compound microsatellites consist of two or more individual microsatellites, and may originate from dynamic mutations or from imperfections in microsatellites. Previous studies found microsatellites in 81 complete Human Immunodeficiency Virus Type 1 (HIV-1) genomes, suggesting that compound microsatellites may exist in viral genomes. Until now, however, compound microsatellites have not been analyzed in any viral genome. We identified and characterized 238 compound microsatellites in 81 complete HIV-1 genomes. Between 0 and 24.24% of all microsatellites could be categorized as compound microsatellites. The distribution of compound microsatellites differs among HIV-1 genomes in two respects. First, the number and motifs of compound microsatellites vary between the surveyed genomes. Second, both the relative abundance and the relative density of compound microsatellites differ very significantly between the surveyed genomes. The relative abundance and relative density of compound microsatellites were weakly correlated with genome size and microsatellite density. We observed a more dynamic picture of compound microsatellites than previously reported in eukaryotes. This might be attributed to the lack of proofreading in HIV-1 genomes, as it has been demonstrated that the loss of polymerase proofreading activity can greatly increase the mutation rate of microsatellites.
Subject(s)
Viral Genome , HIV Infections/virology , HIV-1/genetics , Microsatellite Repeats , Base Sequence , Chromosome Mapping , Genotype , Humans , DNA Sequence Analysis
ABSTRACT
The presence, locations and composition of simple sequence repeats (SSRs) in the Herpes simplex virus type 1 (HSV-1) genome were extracted and analyzed using the software Imperfect Microsatellite Extractor (IMEx). We observed 663 mono-, 502 di-, 184 tri-, 20 tetra-, 4 penta- and 4 hexanucleotide SSRs, which were distributed differently between the coding and noncoding regions of the HSV-1 genome. G/C, GC/CG and (GGC)(n) were predominant among the mononucleotide, dinucleotide and trinucleotide repeats, respectively. The results also showed that the GC content of the simple sequence repeats was notably higher than that of the entire HSV-1 genome. Our data might be helpful for studying the pathogenesis, genome structure and evolution of HSV-1.
Subject(s)
Viral DNA/genetics , Viral Genome/genetics , Human Herpesvirus 1/genetics , Microsatellite Repeats/genetics , Animals , Base Composition/genetics , Base Composition/physiology , Base Sequence , Chromosome Mapping , Viral DNA/analysis , Humans
ABSTRACT
Previous work has demonstrated that ligninolytic enzymes mediate effective degradation of lignin wastes. Their degrading ability relies greatly on the interactions of the ligninolytic enzymes with lignin. The main ligninolytic enzymes are laccase (Lac), lignin peroxidase (LiP) and manganese peroxidase (MnP). In the present study, the binding modes of lignin to Lac, LiP and MnP were systematically determined. The robustness of these modes was further verified by molecular dynamics (MD) simulations. The residues most crucial for lignin binding were GLU460, PRO346 and SER113 in Lac, ARG43, ALA180 and ASP183 in LiP, and ARG42, HIS173 and ARG177 in MnP. Interaction analyses showed that hydrophobic contacts were the most abundant and play an important role in determining substrate specificity. This information contributes to the detailed understanding of enzyme-catalyzed reactions in lignin biodegradation and can serve as a reference for designing enzyme mutants with better lignin-degrading activity.