ABSTRACT
BACKGROUND: At the end of the Pliocene and the beginning of the Pleistocene glaciation and deglaciation cycles, Ginkgo biloba went extinct all over the world, and only a few populations remained in China in relict areas serving as sanctuaries for Tertiary relict trees. Yet the status of these regions as refuge areas with naturally existing populations was proven only about a decade ago. Herein we elaborated the hypothesis that during the Pleistocene cooling periods G. biloba expanded its distribution range in China repeatedly. Whole plastid genomes were sequenced, assembled and annotated, and sequence data were analyzed in a phylogenetic framework spanning all gymnosperms to establish a robust spatio-temporal framework for gymnosperms and in particular for the Pleistocene evolutionary history of G. biloba. RESULTS: Using a phylogenetic approach, we identified that the Ginkgoatae stem group age is about 325 million years, whereas crown group radiation of extant Ginkgo started not earlier than 390,000 years ago. During repeated warming phases, Ginkgo populations were separated and isolated by contraction of the distribution range and retreated into mountainous regions serving as refuges for warm-temperate deciduous forests. Diversification and phylogenetic splits correlate with the onset of cooling phases, when Ginkgo expanded its distribution range and gene pools merged. CONCLUSIONS: Analysis of whole plastid genome sequence data representing the entire spatio-temporal genetic variation of wild extant Ginkgo populations revealed the deepest temporal footprint dating back to approximately 390,000 years ago. Present-day directional West-East admixture of genetic diversity is shown to be the result of pronounced effects of the last cooling period. Our evolutionary framework will serve as a conceptual roadmap for forthcoming genomic sequence data, which can then provide deep insights into the demographic history of Ginkgo.
Subjects
Biological Evolution , Genetic Variation , Genetics, Population , Genome, Plastid , Ginkgo biloba/genetics , DNA Barcoding, Taxonomic , Ecosystem , Phylogeny , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Climatic and edaphic conditions over geological timescales have generated enormous diversity of adaptive traits and high speciation within the genus Eucalyptus (L'Hér.). Eucalypt species occur from high rainfall to semi-arid zones and from the tropics to latitudes as high as 43°S. Despite several morphological and metabolomic characterizations, little is known regarding gene expression differences that underpin differences in tolerance to environmental change. Using species of contrasting taxonomy, morphology and physiology (E. globulus and E. cladocalyx), this study combines physiological characterizations with 'second-generation' sequencing to identify key genes involved in eucalypt responses to medium-term water limitation. RESULTS: One hundred twenty million high-quality HiSeq reads were generated from 14 tissue samples of plants that had been successfully subjected to a water deficit treatment or a well-watered control. Alignment to the E. grandis genome identified 23,623 genes, of which 468 exhibited differential expression (FDR < 0.01) in one or both ecotypes in response to the treatment. Further analysis identified 80 genes that demonstrated a significant species-specific response, of which 74 were linked to the 'dry' species E. cladocalyx; 23 of these genes were uncharacterised. The majority (approximately 80%) of these differentially expressed genes were expressed in stem tissue. Key genes that differentiated species responses were linked to photoprotection/redox balance, phytohormone/signalling, primary photosynthesis/cellular metabolism and secondary metabolism based on plant metabolic pathway network analysis. CONCLUSION: These results highlight a more definitive response to water deficit by a 'dry' climate eucalypt, particularly in stem tissue, identifying key pathways and associated genes that are responsible for the differences between 'wet' and 'dry' climate eucalypts.
This knowledge provides the opportunity to further investigate and understand the mechanisms and genetic variation linked to this important environmental response that will assist with genomic efforts in managing native populations as well as in tree improvement programs under future climate scenarios.
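Editorial illustration: the abstract above reports genes as differentially expressed at FDR < 0.01 but does not name the correction procedure. The sketch below assumes the widely used Benjamini-Hochberg adjustment; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here only as a common way to control the
    false discovery rate; the study's actual procedure is not stated.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(gene_pvalues, fdr=0.01):
    """Return the genes whose adjusted p-value falls below the FDR threshold."""
    genes = list(gene_pvalues)
    adjusted = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < fdr]


# Hypothetical per-gene p-values, for illustration only.
example = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.5}
hits = significant_genes(example, fdr=0.01)
```

With a threshold of 0.01, a gene is reported only if its adjusted value, not its raw p-value, falls below the cutoff.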
Subjects
Droughts , Eucalyptus/genetics , Gene Expression Regulation, Plant , Stress, Physiological/genetics , Transcriptome , Computational Biology/methods , Ecotype , Eucalyptus/metabolism , Gene Expression Profiling , Gene Ontology , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Metabolic Networks and Pathways , Molecular Sequence Annotation , Plant Leaves , Signal Transduction
ABSTRACT
BACKGROUND: A positive relationship between genome size and intron length is observed across eukaryotes including angiosperm plants, indicating a co-evolution of genome size and gene structure. Conifers have very large genomes and longer introns on average than most plants, but the impacts of their large genomes and longer introns on gene structure have not been described. RESULTS: Gene structure was analyzed for 35 genes of Picea glauca obtained from BAC sequencing and genome assembly, including comparisons with A. thaliana, P. trichocarpa and Z. mays. We aimed to develop an understanding of the impact of long introns on the structure of individual genes. The number and length of exons were well conserved among the species compared but, on average, P. glauca introns were longer and genes had four times more intronic sequence than Arabidopsis, and two times more than poplar and maize. However, pairwise comparisons of individual genes gave variable results and not all contrasts were statistically significant. Genes generally accumulated one or a few longer introns in species with larger genomes, but the position of long introns was variable between plant lineages. In P. glauca, highly expressed genes generally had more intronic sequence than tissue preferential genes. Comparisons with the Pinus taeda BACs and genome scaffolds showed a high conservation for position of long introns and for sequence of short introns. A survey of 1836 P. glauca genes obtained by sequence capture, mostly containing introns <1 Kbp, showed that repeated sequences were 10× more abundant in introns than in exons. CONCLUSION: Conifers have large amounts of intronic sequence per gene for seed plants due to the presence of a few long introns, and repetitive element sequences are ubiquitous in their introns. Results indicate a complex landscape of intron sizes and distribution across taxa and between genes with different expression profiles.
Subjects
Genes, Plant , Introns/genetics , Picea/genetics , Base Sequence , Databases, Genetic , Evolution, Molecular , Exons/genetics , Gene Expression Profiling , Gene Expression Regulation, Plant , Genome Size , Pinus/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Repetitive Sequences, Nucleic Acid/genetics , Sequence Homology, Nucleic Acid
ABSTRACT
We present a phylogenetic analysis and comparison of structural features of chloroplast genomes for 39 species of the eucalypt group (genera Eucalyptus, Corymbia, Angophora, and outgroups Allosyncarpia and Stockwellia). We use 41 complete chloroplast genome sequences, adding 39 finished-quality chloroplast genomes to two previously published genomes. Maximum parsimony and Bayesian analyses, based on >7000 variable nucleotide positions, produced one fully resolved phylogenetic tree (35 supported nodes, 27 with 100% bootstrap support). Eucalyptus and its sister lineage Angophora+Corymbia show a deep divergence. Within Eucalyptus, three lineages are resolved: the 'eudesmid', 'symphyomyrt' and 'monocalypt' groups. Corymbia is paraphyletic with respect to Angophora. Gene content and order do not vary among eucalypt chloroplasts; length mutations, especially frame shifts, are uncommon in protein-coding genes. Some non-synonymous mutations are highly incongruent with the overall phylogenetic signal, notably in rbcL, and may be adaptive. Application of custom informatics pipelines (GYDLE Inc.) enabled direct chloroplast genome assembly, resolving each genome to finished-quality with no need for PCR gap-filling or contig order resolution. Analysis of whole chloroplast genomes resolved major eucalypt clades and revealed variable regions of the genome that will be useful in lower-level genetic studies (including phylogeography and geneflow).
Subjects
Genome, Chloroplast , Genome, Plant , Myrtaceae/classification , Phylogeny , Bayes Theorem , Comparative Genomic Hybridization , DNA, Plant/genetics , Eucalyptus/genetics , Frameshift Mutation , Genetic Variation , Myrtaceae/genetics , Ribulose-Bisphosphate Carboxylase/genetics , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembly. RESULTS: To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. CONCLUSIONS: Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants.
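Editorial illustration: the abstract above mentions a statistical approach based on kernel density estimation for analysing gene density along the genetic map. The sketch below shows a plain Gaussian kernel density estimate over map positions. It is not the authors' method: the kernel, the bandwidth, the function name and the positions (in centimorgans) are assumptions chosen for illustration.

```python
import math


def gaussian_kde_density(positions_cm, query_cm, bandwidth_cm):
    """Estimate gene density at each query position with a Gaussian kernel.

    positions_cm : map positions of genes on one linkage group (hypothetical)
    query_cm     : positions at which the density is evaluated
    bandwidth_cm : kernel standard deviation (an assumed tuning parameter)
    """
    if bandwidth_cm <= 0:
        raise ValueError("bandwidth_cm must be positive")
    if not positions_cm:
        raise ValueError("positions_cm must not be empty")
    # Normalising constant so that the density integrates to 1.
    norm = len(positions_cm) * bandwidth_cm * math.sqrt(2.0 * math.pi)
    densities = []
    for x in query_cm:
        total = sum(
            math.exp(-0.5 * ((x - p) / bandwidth_cm) ** 2) for p in positions_cm
        )
        densities.append(total / norm)
    return densities


# Hypothetical positions: four clustered genes and one isolated gene.
genes = [10.0, 11.0, 12.0, 13.0, 60.0]
in_cluster, isolated = gaussian_kde_density(genes, [11.5, 60.0], bandwidth_cm=2.0)
```

In this toy example the estimated density is higher inside the cluster than at the isolated gene, which is the kind of contrast a gene-rich region would produce. The bandwidth controls how finely such regions are resolved.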
Subjects
Chromosome Mapping , DNA Shuffling , Evolution, Molecular , Genome, Plant/genetics , Phylogeny , Picea/genetics , Chromosomes, Plant/genetics , Extinction, Biological , Gene Duplication/genetics , Gene Expression Regulation, Plant , Genes, Plant/genetics , Genetic Linkage , Methyltransferases/genetics , Molecular Sequence Annotation , Multigene Family/genetics , Picea/enzymology , Pinus/genetics
ABSTRACT
BACKGROUND: Conifers have very large genomes (13 to 30 Gigabases) that are mostly uncharacterized although extensive cDNA resources have recently become available. This report presents a global overview of transcriptome variation in a conifer tree and documents conservation and diversity of gene expression patterns among major vegetative tissues. RESULTS: An oligonucleotide microarray was developed from Picea glauca and P. sitchensis cDNA datasets. It represents 23,853 unique genes and was shown to be suitable for transcriptome profiling in several species. A comparison of secondary xylem and phelloderm tissues showed that preferential expression in these vascular tissues was highly conserved among Picea spp. RNA-Sequencing strongly confirmed tissue preferential expression and provided a robust validation of the microarray design. A small database of transcription profiles called PiceaGenExpress was developed from over 150 hybridizations spanning eight major tissue types. In total, transcripts were detected for 92% of the genes on the microarray, in at least one tissue. Non-annotated genes were predominantly expressed at low levels in fewer tissues than genes of known or predicted function. Diversity of expression within gene families may be rapidly assessed from PiceaGenExpress. In conifer trees, dehydrins and late embryogenesis abundant (LEA) osmotic regulation proteins occur in large gene families compared to angiosperms. Strong contrasts and low diversity were observed in the dehydrin family, while diverse patterns suggested a greater degree of diversification among LEAs. CONCLUSION: Together, the oligonucleotide microarray and the PiceaGenExpress database represent the first resource of this kind for gymnosperm plants. The spruce transcriptome analysis reported here is expected to accelerate genetic studies in the large and important group comprised of conifer trees.
Subjects
Databases, Genetic , Gene Expression Regulation, Plant , Genome, Plant , Picea/genetics , Plant Proteins/genetics , Xylem/genetics , Biological Transport , DNA, Complementary/genetics , Expressed Sequence Tags , Gene Expression Profiling , Genetic Variation , Genome Size , Multigene Family , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Phylogeny , Plant Proteins/classification , Sequence Analysis, RNA , Water/metabolism
ABSTRACT
BACKGROUND: The domestic pig is an important livestock species and there is strong interest in the factors that affect the development of viable embryos and offspring in this species. A limited understanding of the molecular mechanisms involved in early embryonic development has inhibited our ability to fully elucidate these factors. Next generation deep sequencing and microarray technologies are powerful tools for delineation of molecular pathways involved in the developing embryo. RESULTS: Here we present the development of a porcine-embryo-specific microarray platform created from a large expressed sequence tag (EST) analysis generated by Roche/454 next-generation sequencing of cDNAs constructed from critical stages of in vivo or in vitro porcine preimplantation embryos. Two cDNA libraries constructed from in vitro and in vivo produced preimplantation porcine embryos were normalized and sequenced using 454 Titanium pyrosequencing technology. Over one million high-quality EST sequences were obtained and used to develop the EMbryogene Porcine Version 1 (EMPV1) microarray composed of 43,795 probes. Based on an initial probe sequence annotation, the EMPV1 features 17,409 protein-coding, 473 pseudogenes, 46 retrotransposed, 2,359 non-coding RNA, 4,121 splice variants in 2,862 genes and a total of 12,324 Novel Transcript Regions (NTR). After re-annotation, the total unique genes increased from 11,961 to 16,281 and 1.9% of them belonged to a large olfactory receptor (OR) gene family. Quality control on the EMPV1 was performed and revealed an even distribution of ten clusters of spiked-in control spots and array to array (dye-swap) correlation was 0.97. CONCLUSIONS: Using next-generation deep sequencing we have produced a large EST dataset to allow for the selection of probe sequences for the development of the EMPV1 microarray platform. 
The quality of this embryo-specific array was confirmed with a high level of reproducibility using current Agilent microarray technology. With more than an estimated 20,000 unique genes represented on the EMPV1, this platform will provide the foundation for future research into the in vivo and in vitro factors that affect the viability of porcine embryos, as well as the effects of these factors on the live offspring that result from these embryos.
Subjects
Embryo, Mammalian/metabolism , Oligonucleotide Array Sequence Analysis/methods , Animals , Swine
ABSTRACT
Several angiosperm plant genomes, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), and grapevine (Vitis vinifera), have been sequenced, but the lack of reference genomes in gymnosperm phyla reduces our understanding of plant evolution and restricts the potential impacts of genomics research. A gene catalog was developed for the conifer tree Picea glauca (white spruce) through large-scale expressed sequence tag sequencing and full-length cDNA sequencing to facilitate genome characterizations, comparative genomics, and gene mapping. The resource incorporates new and publicly available sequences into 27,720 cDNA clusters, 23,589 of which are represented by full-length insert cDNAs. Expressed sequence tags, mate-pair cDNA clone analysis, and custom sequencing were integrated through an iterative process to improve the accuracy of clustering outcomes. The entire catalog spans 30 Mb of unique transcribed sequence. We estimated that the P. glauca nuclear genome contains up to 32,520 transcribed genes owing to incomplete, partially sequenced, and unsampled transcripts and that its transcriptome could span up to 47 Mb. These estimates are in the same range as the Arabidopsis and rice transcriptomes. Next-generation methods confirmed and enhanced the catalog by providing deeper coverage for rare transcripts, by extending many incomplete clusters, and by augmenting the overall transcriptome coverage to 38 Mb of unique sequence. Genomic sample sequencing at 8.5% of the 19.8-Gb P. glauca genome identified 1,495 clusters representing highly repeated sequences among the cDNA clusters. With a conifer transcriptome in full view, functional and protein domain annotations clearly highlighted the divergences between conifers and angiosperms, likely reflecting their respective evolutionary paths.
Subjects
Genome, Plant , Tracheophyta/genetics , DNA, Complementary/genetics , Evolution, Molecular , Expressed Sequence Tags , Multigene Family , RNA, Messenger/genetics
ABSTRACT
While most assisted reproductive technologies (ART) are considered routine for the reproduction of species of economic importance, such as the bovine, the impact of these manipulations on the developing embryo remains largely unknown. In an effort to obtain a comprehensive survey of the bovine embryo transcriptome and how it is modified by ART, resources were combined to design an embryo-specific microarray. Close to one million high-quality reads were produced from subtracted bovine embryo libraries using Roche 454 Titanium deep sequencing technology, which enabled the creation of an augmented bovine genome catalog. This catalog was enriched with bovine embryo transcripts, and included newly discovered indel type and 3'UTR variants. Using this augmented bovine genome catalog, the EmbryoGENE Bovine Microarray was designed and is composed of a total of 42,242 probes, including 21,139 known reference genes; 9,322 probes for novel transcribed regions (NTRs); 3,677 alternatively spliced exons; 3,353 3'-tiling probes; and 3,723 controls. A suite of bioinformatics tools was also developed to facilitate microarray data analysis and database creation; it includes a quality control module, a Laboratory Information Management System (LIMS) and microarray analysis software. Results obtained during this study have already led to the identification of differentially expressed blastocyst targets, NTRs, splice variants of the indel type, and 3'UTR variants. We were able to confirm microarray results by real-time PCR, indicating that the EmbryoGENE bovine microarray has the power to detect physiologically relevant changes in gene expression.
Subjects
Cattle/embryology , Cattle/genetics , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Oligonucleotide Array Sequence Analysis/methods , Transcriptome/physiology , Animals , Computational Biology , Database Management Systems , Databases, Genetic , Embryo, Mammalian , Female , Gene Expression Profiling/standards , Granulosa Cells/metabolism , Real-Time Polymerase Chain Reaction , Reproducibility of Results , Reproductive Techniques, Assisted , User-Computer Interface
ABSTRACT
Angiosperms have become the dominant terrestrial plant group by diversifying for ~145 million years into a broad range of environments. During the course of evolution, numerous morphological innovations arose, often preceded by whole genome duplications (WGD). The mustard family (Brassicaceae), a successful angiosperm clade with ~4000 species, has been diversifying into many evolutionary lineages for more than 30 million years. Here we develop a species inventory, analyze morphological variation, and present a maternal, plastome-based genus-level phylogeny. We show that increased morphological disparity, despite an apparent absence of clade-specific morphological innovations, is found in tribes with WGDs or diversification rate shifts. Both are important processes in Brassicaceae, resulting in an overall high net diversification rate. Character states show frequent and independent gain and loss, and form varying combinations. Therefore, Brassicaceae pave the way to concepts of phylogenetic genome-wide association studies to analyze the evolution of morphological form and function.
Subjects
Biological Evolution , Brassicaceae/classification , Brassicaceae/genetics , Evolution, Molecular , Genome, Plant/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Phylogeny
ABSTRACT
The NLRs or NBS-LRRs (nucleotide-binding, leucine-rich-repeat) form the largest resistance gene family in plants, with lineage-specific contingents of TNL, CNL and RNL subfamilies and a central role in resilience to stress. The origin, evolution and distribution of NLR sequences have been unclear owing in part to the variable size and diversity of the RNL subfamily and a lack of data in gymnosperms. We developed, searched and annotated transcriptome assemblies of seven conifers and identified a resource of 3816 expressed NLR sequences. Our analyses encompassed sequence data spanning the major groups of land plants and determinations of NLR transcript levels in response to drought in white spruce. We showed that conifers have among the most diverse and numerous RNLs in tested land plants. We report an evolutionary swap in the formation of RNLs, which emerged from the fusion of an RPW8 domain to an NB-ARC domain of CNL. We uncovered a quantitative relationship between RNLs and TNLs across all land plants investigated, with an average ratio of 1:10. The conifer RNL repertoire harbours four distinct groups, two of which differ from angiosperms; one of these contained several sequences upregulated in response to drought, whereas the majority of responsive NLRs were downregulated.
Subjects
Droughts , Genes, Plant , NLR Proteins/genetics , Plant Proteins/genetics , Tracheophyta/genetics , Adaptation, Physiological/genetics , Amino Acid Sequence , Evolution, Molecular , NLR Proteins/chemistry , Plant Proteins/chemistry , Tracheophyta/physiology , Transcriptome
ABSTRACT
BACKGROUND: Leishmania parasites cause a diverse spectrum of diseases in humans ranging from spontaneously healing skin lesions (e.g., L. major) to life-threatening visceral diseases (e.g., L. infantum). The high conservation in gene content and genome organization between Leishmania major and Leishmania infantum contrasts with their distinct pathophysiologies, suggesting that highly regulated hierarchical and temporal changes in gene expression may be involved. RESULTS: We used a multispecies DNA oligonucleotide microarray to compare whole-genome expression patterns of promastigote (sandfly vector) and amastigote (mammalian macrophages) developmental stages between L. major and L. infantum. Seven per cent of the total L. infantum genome and 9.3% of the L. major genome were differentially expressed at the RNA level throughout development. The main variations were found in genes involved in metabolism, cellular organization and biogenesis, and transport, and in genes of unknown function. Remarkably, this comparative global interspecies analysis demonstrated that only 10-12% of the differentially expressed genes were common to L. major and L. infantum. Differentially expressed genes are randomly distributed across chromosomes, further supporting posttranscriptional control, which is likely to involve a variety of 3'UTR elements. CONCLUSION: This study highlighted substantial differences in gene expression patterns between L. major and L. infantum. These important species-specific differences in stage-regulated gene expression may contribute to the disease tropism that distinguishes L. major from L. infantum.
Subjects
Gene Expression Profiling , Genome, Protozoan , Leishmania infantum/growth & development , Leishmania infantum/genetics , Leishmania major/growth & development , Leishmania major/genetics , Life Cycle Stages , 3' Untranslated Regions/genetics , Animals , Cell Line , Gene Expression Regulation, Developmental , Humans , Mice , Mice, Inbred A , Oligonucleotide Array Sequence Analysis , RNA, Messenger/isolation & purification , RNA, Protozoan/isolation & purification , Retroelements , Reverse Transcriptase Polymerase Chain Reaction , Species Specificity
ABSTRACT
Presently, there are numerous bioinformatics databases available on different websites. Although RDF was proposed as a standard format for the web, these databases are still available in various formats. With the increasing popularity of semantic web technologies and the ever-growing number of databases in bioinformatics, there is a pressing need to develop mashup systems to help the process of bioinformatics knowledge integration. Bio2RDF is such a system, built from rdfizer programs written in JSP, the Sesame open source triplestore technology and an OWL ontology. With Bio2RDF, documents from public bioinformatics databases such as KEGG, PDB, MGI, HGNC and several of NCBI's databases can now be made available in RDF format through a unique URL in the form of http://bio2rdf.org/namespace:id. The Bio2RDF project has successfully applied semantic web technology to publicly available databases by creating a knowledge space of RDF documents linked together with normalized URIs and sharing a common ontology. Bio2RDF is based on a three-step approach to build mashups of bioinformatics data. The present article details this new approach and illustrates the building of a mashup used to explore the implication of four transcription factor genes in Parkinson's disease. The Bio2RDF repository can be queried at http://bio2rdf.org.
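Editorial illustration: the abstract above gives the URL pattern http://bio2rdf.org/namespace:id. The sketch below builds a URL in that form. The function name is hypothetical, and the normalisation step (trimming whitespace and lowercasing the namespace) is an assumption about what "normalized" might involve, not the project's documented rule. The identifier in the example is a placeholder rather than a real record.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"  # base URL quoted in the abstract


def bio2rdf_url(namespace: str, identifier: str) -> str:
    """Build a URL of the form http://bio2rdf.org/namespace:id.

    Assumption: the namespace is trimmed and lowercased; the identifier
    is only trimmed, since identifiers may be case-sensitive.
    """
    ns = namespace.strip().lower()
    ident = identifier.strip()
    if not ns or not ident:
        raise ValueError("namespace and identifier must be non-empty")
    return f"{BIO2RDF_BASE}{ns}:{ident}"


# Placeholder identifier, for illustration only.
example_url = bio2rdf_url(" HGNC ", "12345")
```

Keeping the namespace and identifier as separate arguments makes it straightforward to apply the same pattern to each source database named in the abstract.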
Subjects
Artificial Intelligence , Computational Biology/methods , Database Management Systems , Information Storage and Retrieval/methods , Programming Languages , Animals , Humans , Information Dissemination/methods , Internet/statistics & numerical data , Parkinson Disease/genetics , Semantics , Systems Integration , Terminology as Topic , Transcription Factors/analysis , Transcription Factors/metabolism , Vocabulary, Controlled
ABSTRACT
Temperatures are expected to increase over the next century in all terrestrial biomes and particularly in boreal forests, where drought-induced mortality has been predicted to rise. Genomics research is helping to develop hypotheses regarding the molecular basis of drought tolerance, and recent work proposed that the osmo-protecting dehydrin proteins have undergone a clade-specific expansion in the Pinaceae, a major group of conifer trees. The objectives of this study were to identify all of the putative members of the gene family, trace their evolutionary origin, examine their structural diversity and test for drought-responsive expression. We identified 41 complete dehydrin coding sequences in Picea glauca, which is four times more than in most angiosperms studied to date, and more than in pines. Phylogenetic reconstructions indicated that the family has undergone an expansion in conifers, with parallel evolution implicating the sporadic resurgence of certain amino acid sequence motifs, and a major duplication giving rise to a clade specific to the Pinaceae. A variety of plant dehydrin structures were identified with variable numbers of the A-, E-, S- and K-segments and an N-terminal (N1) amino acid motif, including assemblages specific to conifers. The expression of several of the spruce dehydrins was tissue preferential under non-stressful conditions or responded to water stress after 7-18 days without watering, reflecting changes in osmotic potential. We found that dehydrins with N1 K2 and N1 AESK2 sequences were the most responsive to the lack of water. Together, the family expansion, drought-responsive expression and structural diversification involving loss and gain of amino acid motifs suggest that subfunctionalization has driven the diversification seen among dehydrin gene duplicates.
Our findings clearly indicate that dehydrins represent a large family of candidate genes for drought tolerance in spruces and in other Pinaceae that may underpin adaptability in spatially and temporally variable environments.
Subjects
Evolution, Molecular , Gene Expression Regulation, Plant , Multigene Family/genetics , Pinaceae/physiology , Plant Proteins/genetics , Droughts , Phylogeny , Picea/genetics , Picea/physiology , Pinaceae/genetics , Plant Proteins/metabolism , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Numerous scaffold-level sequences for wheat are now being released and, in this context, we report on a strategy for improving the overall assembly to a level comparable to that of the human genome. RESULTS: Using chromosome 7A of wheat as a model, sequence-finished megabase-scale sections of this chromosome were established by combining a new independent assembly using a bacterial artificial chromosome (BAC)-based physical map, BAC pool paired-end sequencing, chromosome-arm-specific mate-pair sequencing and Bionano optical mapping with the International Wheat Genome Sequencing Consortium RefSeq v1.0 sequence and its underlying raw data. The combined assembly results in 18 super-scaffolds across the chromosome. The value of finished genome regions is demonstrated for two approximately 2.5 Mb regions associated with yield and the grain quality phenotype of fructan carbohydrate grain levels. In addition, the 50 Mb centromere region analysis incorporates cytological data highlighting the importance of non-sequence data in the assembly of this complex genome region. CONCLUSIONS: Sufficient genome sequence information is shown to now be available for the wheat community to produce sequence-finished releases of each chromosome of the reference genome. The high-level completion identified that an array of seven fructosyl transferase genes underpins grain quality and that yield attributes are affected by five F-box-only-protein-ubiquitin ligase domain and four root-specific lipid transfer domain genes. The completed sequence also includes the centromere.
Subjects
Agriculture , Genome, Plant , Optical Phenomena , Physical Chromosome Mapping/methods , Triticum/genetics , Centromere/metabolism , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Plant/genetics , Fructans/analysis , Seeds/genetics
ABSTRACT
BACKGROUND: Next-generation sequencing technologies provide new opportunities to identify the genetic components responsible for trait variation. However, in species with large polyploid genomes, such as bread wheat, the ability to rapidly identify genes underlying quantitative trait loci (QTL) remains non-trivial. To overcome this, we introduce a novel pipeline that analyses, by RNA-sequencing, multiple near-isogenic lines segregating for a targeted QTL. RESULTS: We use this approach to characterize a major and widely utilized seed dormancy QTL located on chromosome 4AL. It exploits the power and mapping resolution afforded by large multi-parent mapping populations, whilst reducing complexity by using multi-allelic contrasts at the targeted QTL region. Our approach identifies two adjacent candidate genes within the QTL region belonging to the ABA-induced Wheat Plasma Membrane 19 family. One of them, PM19-A1, is highly expressed during grain maturation in dormant genotypes. The second, PM19-A2, shows changes in sequence causing several amino acid alterations between dormant and non-dormant genotypes. We confirm that PM19 genes are positive regulators of seed dormancy. CONCLUSIONS: The efficient identification of these strong candidates demonstrates the utility of our transcriptomic pipeline for rapid QTL to gene mapping. By using this approach we are able to provide a comprehensive genetic analysis of the major source of grain dormancy in wheat. Further analysis across a diverse panel of bread and durum wheats indicates that this important dormancy QTL predates hexaploid wheat. The use of these genes by wheat breeders could assist in the elimination of pre-harvest sprouting in wheat.
Subjects
Plant Gene Expression Regulation , Plant Dormancy/genetics , Plant Proteins/genetics , Triticum/genetics , Chromosome Mapping , Plant Chromosomes/genetics , Gene Expression Profiling , Gene Silencing , Genotype , Germination , Multigene Family , Polyploidy , Quantitative Trait Loci , RNA Sequence Analysis , Triticum/classification
ABSTRACT
High-density SNP genotyping arrays can be designed for any species given sufficient sequence information of high quality. Two high-density SNP arrays relying on the Infinium iSelect technology (Illumina) were designed for use in the conifer white spruce (Picea glauca). One array contained 7338 segregating SNPs representative of 2814 genes of various molecular functional classes for main uses in genetic association and population genetics studies. The other one contained 9559 segregating SNPs representative of 9543 genes for main uses in population genetics, linkage mapping of the genome and genomic prediction. The SNPs assayed were discovered from various sources of gene resequencing data. SNPs predicted from high-quality sequences derived from genomic DNA reached a genotyping success rate of 64.7%. Nonsingleton in silico SNPs (i.e. a sequence polymorphism present in at least two reads) predicted from expressed sequence tags obtained with the Roche 454 technology and Illumina GAII analyser resulted in a similar genotyping success rate of 71.6% when the deepest alignment was used and the most favourable SNP probe per gene was selected. A variable proportion of these SNPs was shared by other nordic and subtropical spruce species from North America and Europe. The number of shared SNPs was inversely proportional to phylogenetic divergence and standing genetic variation in the recipient species, but positively related to allele frequency in P. glauca natural populations. These validated SNP resources should open up new avenues for population genetics and comparative genetic mapping at a genomic scale in spruce species.
Subjects
Oligonucleotide Array Sequence Analysis/methods , Picea/genetics , Single Nucleotide Polymorphism , Genomics , Genotype , Phylogeny , Picea/classification
ABSTRACT
Marker-assisted selection holds promise for highly influencing tree breeding, especially for wood traits, by considerably reducing breeding cycles and increasing selection accuracy. In this study, we used a candidate gene approach to test for associations between 944 single-nucleotide polymorphism markers from 549 candidate genes and 25 wood quality traits in white spruce. A mixed-linear model approach, including a weak but nonsignificant population structure, was implemented for each marker-trait combination. Relatedness among individuals was controlled using a kinship matrix estimated either from the known half-sib structure or from the markers. Both additive and dominance effect models were tested. Between 8 and 21 single-nucleotide polymorphisms (SNPs) were found to be significantly associated (P ≤ 0.01) with each of earlywood, latewood, or total wood traits. After controlling for multiple testing (Q ≤ 0.10), 13 SNPs were still significant across as many genes belonging to different families, each accounting for between 3 and 5% of the phenotypic variance in 10 wood characters. Transcript accumulation was determined for genes containing SNPs associated with these traits. Significantly different transcript levels (P ≤ 0.05) were found among the SNP genotypes of a 1-aminocyclopropane-1-carboxylate oxidase, a β-tonoplast intrinsic protein, and a long-chain acyl-CoA synthetase 9. These results should contribute toward the development of efficient marker-assisted selection in an economically important tree species.
Subjects
Plant Gene Expression Regulation , Genetic Association Studies , Picea/genetics , Heritable Quantitative Trait , Wood/genetics , Cluster Analysis , Gene Expression Profiling , Plant Genes/genetics , Genotype , Linkage Disequilibrium/genetics , Single Nucleotide Polymorphism/genetics , Population Dynamics , Messenger RNA/genetics , Messenger RNA/metabolism
ABSTRACT
BACKGROUND: Drug resistance can be complex, and several mutations responsible for it can co-exist in a resistant cell. Transcriptional profiling is ideally suited for studying complex resistance genotypes and has the potential to lead to novel discoveries. We generated full-genome 70-mer oligonucleotide microarrays for all protein-coding genes of the human protozoan parasites Leishmania major and Leishmania infantum. These arrays were used to monitor gene expression in methotrexate-resistant parasites. RESULTS: Leishmania is a eukaryotic organism with minimal control at the level of transcription initiation, and few genes were differentially expressed without concomitant changes in DNA copy number. One exception was found in Leishmania major, where the expression of whole chromosomes was down-regulated. The microarrays highlighted several mechanisms by which the copy number of genes involved in resistance was altered; these include gene deletion, formation of extrachromosomal circular or linear amplicons, and the presence of supernumerary chromosomes. In the case of gene deletion or gene amplification, the rearrangements have occurred at the sites of repeated (direct or inverted) sequences. These repeats appear highly conserved in both species to facilitate the amplification of key genes during environmental changes. When direct or inverted repeats are absent in the vicinity of a gene conferring a selective advantage, Leishmania will resort to supernumerary chromosomes to increase the levels of a gene product. CONCLUSION: Aneuploidy has been suggested as an important cause of drug resistance in several organisms and additional studies should reveal the potential importance of this phenomenon in drug resistance in Leishmania.
Subjects
Drug Resistance/genetics , Leishmania/drug effects , Leishmania/genetics , Mutation , Protozoan Proteins/genetics , Aneuploidy , Animals , Anion Transport Proteins/genetics , Gene Amplification , Gene Deletion , Gene Expression Profiling , Protozoan Genes , Leishmania infantum/drug effects , Leishmania infantum/genetics , Leishmania infantum/metabolism , Leishmania major/drug effects , Leishmania major/genetics , Leishmania major/metabolism , Methotrexate/pharmacology , Multienzyme Complexes/genetics , Oligonucleotide Array Sequence Analysis , Oxidoreductases/genetics , Tetrahydrofolate Dehydrogenase/genetics , Thymidylate Synthase/genetics
ABSTRACT
We have developed a new microarray technology for quantitative gene-expression profiling on the basis of randomly assembled arrays of beads. Each bead carries a gene-specific probe sequence. There are multiple copies of each sequence-specific bead in an array, which contributes to measurement precision and reliability. We optimized the system for specific and sensitive analysis of mammalian RNA, and using RNA controls of defined concentration, obtained the following estimates of system performance: specificity of 1:250,000 in mammalian poly(A)+ mRNA; limit of detection 0.13 pM; dynamic range 3.2 logs; and sufficient precision to detect 1.3-fold differences with 95% confidence within the dynamic range. Measurements of expression differences between human brain and liver were validated by concordance with quantitative real-time PCR (R² = 0.98 for log-transformed ratios, and slope of the best-fit line = 1.04, for 20 genes). Quantitative performance was further verified using a mouse B- and T-cell model system. We found published reports of B- or T-cell-specific expression for 42 of 59 genes that showed the greatest differential expression between B- and T-cells in our system. All of the literature observations were concordant with our results. Our experiments were carried out on a 96-array matrix system that requires only 100 ng of input RNA and uses standard microtiter plates to process samples in parallel. Our technology has advantages for analyzing multiple samples, is scalable to all known genes in a genome, and is flexible, allowing the use of standard or custom probes in an array.