ABSTRACT
The genomes of higher plants and animals are highly differentiated, and are composed of a relatively small number of genes and a large fraction of repetitive DNA. The bulk of this repetitive DNA consists of transposable, and especially retrotransposable, elements. It has been hypothesized that most of these elements are heavily methylated relative to genes, but the evidence for this is controversial. We show here that repeat sequences in maize are largely excluded from genomic shotgun libraries by the selection of an appropriate host strain because of their sensitivity to bacterial restriction-modification systems. In contrast, unmethylated genic regions are preserved in these genetically filtered libraries if the insert size is less than the average size of genes. The representation of unique maize sequences not found in plant reference genomes is also greatly enriched. This demonstrates that repeats, and not genes, are the primary targets of methylation in maize. The use of restrictive libraries in shotgun sequencing of plant genomes should allow significant representation of genes, reducing the number of reactions required.
Subjects
Molecular Cloning/methods, DNA Methylation, Plant Genes/genetics, Plant Genome, Retroelements/genetics, Zea mays/genetics, DNA Restriction Enzymes/metabolism, Escherichia coli/genetics, Genomic Library, Molecular Sequence Data, Nucleic Acid Hybridization, DNA Sequence Analysis/methods, Nucleic Acid Sequence Homology
ABSTRACT
Nine different regions totaling 9.7 Mb of the 4.02 Gb Aegilops tauschii genome were sequenced using Sanger sequencing and compared with orthologous Brachypodium distachyon, Oryza sativa (rice), and Sorghum bicolor (sorghum) genomic sequences. The ancestral gene content in these regions was inferred and used to estimate gene deletion and gene duplication rates along each branch of the phylogenetic tree relating the four species. The total gene number in the extant Ae. tauschii genome was estimated to be 36,371. The gene deletion and gene duplication rates and total gene numbers in the four genomes were used to estimate the total gene number at each node of the phylogenetic tree. The common ancestor of the Brachypodieae and Triticeae lineages was estimated to have had 28,558 genes, and the common ancestor of the Panicoideae, Ehrhartoideae, and Pooideae subfamilies was estimated to have had 27,152 or 28,350 genes, depending on the ancestral gene scenario. Relative to the Brachypodieae and Triticeae common ancestor, the gene number was reduced in B. distachyon by 3,026 genes and increased in Ae. tauschii by 7,813 genes. The sum of gene deletion and gene duplication rates, which reflects the rate of gene synteny loss, was correlated with the rate of structural chromosome rearrangements and was highest in the Ae. tauschii lineage and lowest in the rice lineage. The high rate of gene space evolution in the Ae. tauschii lineage accounts for the fact that, contrary to expectations, the level of synteny between the phylogenetically more closely related Ae. tauschii and B. distachyon genomes is similar to the level of synteny between the Ae. tauschii genome and the genomes of the more distantly related rice and sorghum. The ratio of gene duplication to gene deletion rates in these four grass species closely parallels both the total number of genes in a species and the overall genome size.
Because the overall genome size is to a large extent a function of the repeated sequence content in a genome, we suggest that the amount and activity of repeated sequences are important factors determining the number of genes in a genome.
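The gene counts quoted in this abstract can be checked against one another with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract. The variable names are illustrative, and the B. distachyon total is a value derived here, not one the abstract reports.

```python
# Gene counts taken from the abstract above.
ancestor_genes = 28_558        # Brachypodieae/Triticeae common ancestor
loss_in_brachypodium = 3_026   # net reduction in B. distachyon
gain_in_ae_tauschii = 7_813    # net increase in Ae. tauschii

# Derived totals for the two extant genomes.
brachypodium_genes = ancestor_genes - loss_in_brachypodium
ae_tauschii_genes = ancestor_genes + gain_in_ae_tauschii

print(brachypodium_genes)  # 25532 (derived, not stated in the abstract)
print(ae_tauschii_genes)   # 36371, the Ae. tauschii estimate quoted above
```

The ancestral count plus the net gain reproduces the 36,371 genes estimated for Ae. tauschii, so the figures in the abstract are internally consistent.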
Subjects
Plant Genome, Primulaceae, DNA Sequence Analysis/methods, Tandem Repeat Sequences, Brachypodium/genetics, Molecular Evolution, Gene Deletion, Gene Duplication, Oryza/genetics, Primulaceae/genetics, Sorghum/genetics
ABSTRACT
Transcription factors containing the Myb-homologous DNA-binding domain are widely found in eukaryotes. In plants, R2R3 Myb-domain proteins are involved in the control of form and metabolism. The Arabidopsis genome harbors >100 R2R3 Myb genes, but few have been found in monocots, animals, and fungi. Using RT-PCR from different maize organs, we cloned 480 fragments corresponding to a 42-44 residue-long sequence spanning the region between the conserved DNA-recognition helices (Myb(BRH)) of R2R3 Myb domains. We determined that maize expresses >80 different R2R3 Myb genes, and evolutionary distances among maize Myb(BRH) sequences indicate that most of the amplification of the R2R3 Myb gene family occurred after the origin of land plants but prior to the separation of monocots and dicots. In addition, evidence is provided for the very recent duplication of particular classes of R2R3 Myb genes in the grasses. Together, these findings provide a novel line of evidence for the amplification of the R2R3 Myb gene family in the early history of land plants and suggest that maize provides a possible model system to examine the hypothesis that the expansion of Myb genes is associated with the regulation of novel plant cellular functions.
Subjects
DNA-Binding Proteins/genetics, Molecular Evolution, Gene Amplification/genetics, Plant Genes/genetics, Plant Proteins/genetics, Proto-Oncogene Proteins c-myb, Zea mays/genetics, Amino Acid Sequence, Animals, Arabidopsis Proteins, Molecular Cloning, Codon/genetics, DNA-Binding Proteins/chemistry, Gene Duplication, Plant Genes/physiology, Genetic Variation/genetics, Humans, Molecular Sequence Data, Multigene Family/genetics, Mutation/genetics, Phylogeny, Plant Proteins/chemistry, Plant Structures/genetics, Sequence Alignment
ABSTRACT
We have developed a new strategy, designated SIMF (Systematic Insertional Mutagenesis of Families), to identify DNA insertions in many members of a gene family simultaneously. This method requires only a short amino acid sequence conserved in all members of the family to make a degenerate oligonucleotide, and a sequence from the end of the DNA insertion. The SIMF strategy was successfully applied to the large maize R2R3 Myb family of regulatory genes, and Mutator insertions in several novel Myb genes were identified. Applying this technique to other large gene families could significantly reduce the effort of screening simultaneously for insertions in all members of gene groups that share limited sequence identity.
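To illustrate how a short conserved amino acid sequence can be turned into a degenerate oligonucleotide, the following minimal sketch back-translates a peptide with the standard genetic code and collapses each codon position into an IUPAC ambiguity code. It is an assumption-laden illustration, not the authors' primer design: the function name `degenerate_oligo` and the example peptides are hypothetical, and real primer design would also weigh length, melting temperature, and codon usage.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes, keyed by the sorted set of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def degenerate_oligo(peptide):
    """Return (oligo, degeneracy) for a peptide, coding-strand orientation.

    Each codon position is collapsed independently, so the oligo can also
    match codons for other amino acids (e.g. Leu -> YTN also covers Phe).
    Degeneracy is the number of distinct sequences in the oligo pool.
    """
    oligo = []
    degeneracy = 1
    for aa in peptide.upper():
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        for pos in range(3):
            bases = sorted({c[pos] for c in codons})
            oligo.append(IUPAC["".join(bases)])
            degeneracy *= len(bases)
    return "".join(oligo), degeneracy

# Hypothetical four-residue motif, chosen only for illustration.
print(degenerate_oligo("WNDE"))  # ('TGGAAYGAYGAR', 8)
```

Motifs rich in Trp or Met keep the degeneracy low, while residues such as Leu, Ser, or Arg, with six codons each, raise it quickly. In the strategy described above, such an oligonucleotide is used together with a sequence from the end of the DNA insertion.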