Abstract
Dynamic intron retention (IR) in vertebrate cells is of widespread biological importance. Aberrant IR is associated with numerous human diseases including several cancers. Despite consistent reports demonstrating that intrinsic sequence features can help introns evade splicing, conflicting findings about cell type- or condition-specific IR regulation by trans-regulatory and epigenetic mechanisms demand an unbiased and systematic analysis of IR in a controlled experimental setting. We integrated matched mRNA sequencing (mRNA-Seq), whole-genome bisulfite sequencing (WGBS), nucleosome occupancy methylome sequencing (NOMe-Seq) and chromatin immunoprecipitation sequencing (ChIP-Seq) data from primary human myeloid and lymphoid cells. Using these multi-omics data and machine learning, we trained two complementary models to determine the role of epigenetic factors in the regulation of IR in cells of the innate immune system. We show that increased chromatin accessibility, as revealed by nucleosome-free regions, contributes substantially to the retention of introns in a cell-specific manner. We also confirm that intrinsic characteristics of introns are key for them to evade splicing. This study suggests an important role for chromatin architecture in IR regulation. With an increasing appreciation that pathogenic alterations are linked to RNA processing, our findings may provide useful insights for the development of novel therapeutic approaches that target aberrant splicing.
Subjects
Cell Differentiation , Chromatin , Introns , Humans , Chromatin/genetics , Introns/genetics , Nucleosomes/genetics , Messenger RNA
Abstract
Chromatin accessibility maps are important for the functional interpretation of the genome. Here, we systematically analysed assay-specific differences between DNase I-seq, ATAC-seq and NOMe-seq in a side-by-side experimental and bioinformatic setup. We observe that the most prominent nucleosome-depleted regions (NDRs, e.g. in promoters) are robustly called by all three or at least two assays. However, we also find a high proportion of assay-specific NDRs that are called by only one of the assays. We show evidence that these assay-specific NDRs are indeed genuine open chromatin sites and contribute important information for accurate gene expression prediction. While ATAC-seq and DNase I-seq provide a high NDR calling rate at relatively low sequencing cost compared with NOMe-seq, NOMe-seq stands out for its genome-wide coverage, which allows detection not only of NDRs but also of endogenous DNA methylation and, as we show here, genome-wide segmentation into heterochromatic B domains and local phasing of nucleosomes outside of NDRs. In summary, our comparisons strongly suggest that assay-specific differences should be considered in experimental design and in generalized and comparative functional interpretations.
Subjects
Chromatin Immunoprecipitation Sequencing/methods , Chromatin Immunoprecipitation Sequencing/standards , Hep G2 Cells , Humans , Nucleosomes/chemistry , Nucleosomes/metabolism , Genetic Promoter Regions
Abstract
Across kingdoms, RNA interference (RNAi) has been shown to control gene expression at the transcriptional or the post-transcriptional level. Here, we describe a mechanism that involves both aspects: truncated transgenes, which fail to produce intact mRNA, induce siRNA accumulation and silencing of homologous loci in trans in the ciliate Paramecium. We show that silencing is achieved by co-transcriptional silencing, associated with repressive histone marks at the endogenous gene. This is accompanied by secondary siRNA accumulation, strictly limited to the open reading frame of the remote locus. Our data show that in this mechanism, heterochromatic marks depend on a variety of RNAi components. These include RDR3 and PTIWI14 as well as a second set of components, which are also involved in post-transcriptional silencing: RDR2, PTIWI13, DCR1 and CID2. Our data indicate differential processing of nascent unspliced and long, spliced transcripts, thus suggesting a hitherto unrecognized functional interaction between post-transcriptional and co-transcriptional RNAi. Both sets of RNAi components are required for efficient trans-acting RNAi at the chromatin level, and our data indicate that similar mechanisms contribute to genome-wide regulation of gene expression by epigenetic mechanisms.
Subjects
Heterochromatin/metabolism , Paramecium/genetics , Protozoan Proteins/genetics , RNA Interference , Double-Stranded RNA/genetics , Transgenes , Chromatin Assembly and Disassembly , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Profiling , Gene Expression Regulation , Gene Ontology , Heterochromatin/chemistry , Molecular Sequence Annotation , Paramecium/metabolism , Plasmids/chemistry , Plasmids/metabolism , Polynucleotide Adenylyltransferase/genetics , Polynucleotide Adenylyltransferase/metabolism , Protozoan Proteins/antagonists & inhibitors , Protozoan Proteins/metabolism , Double-Stranded RNA/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Small Interfering RNA/genetics , Small Interfering RNA/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
Abstract
Mapping-by-sequencing, as implemented in SHOREmap ('SHOREmapping'), is greatly accelerating the identification of causal mutations. The original SHOREmap approach based on resequencing of bulked segregants required a highly accurate and complete reference sequence. However, current whole-genome or transcriptome assemblies from next-generation sequencing data of non-model organisms do not produce chromosome-length scaffolds. We have therefore developed a method that exploits synteny with a related genome for genetic mapping. We first demonstrate how mapping-by-sequencing can be performed using a reduced number of markers, and how the associated decrease in the number of markers can be compensated for by enrichment of marker sequences. As proof of concept, we apply this method to Arabidopsis thaliana gene models ordered by synteny with the genome sequence of the distant relative Brassica rapa, whose genome has several large-scale rearrangements relative to A. thaliana. Our approach provides an alternative method for high-resolution genetic mapping in species that lack finished genome reference sequences or for which only RNA-seq assemblies are available. Finally, for improved identification of causal mutations by fine-mapping, we introduce a new likelihood ratio test statistic, transforming local allele frequency estimations into a confidence interval similar to conventional mapping intervals.
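The idea of turning local allele frequency estimates into a mapping interval can be illustrated with a small sliding-window calculation. This is a minimal sketch under stated assumptions and is not the SHOREmap implementation or its likelihood ratio statistic: the marker tuple layout, the function names `windowed_allele_frequency` and `peak_window`, and the window parameters are all hypothetical. It relies only on the general expectation that, in a pool of mutant segregants, the mutant-parent allele frequency approaches 1 near the causal mutation.

```python
from typing import List, Tuple

# Hypothetical marker layout: (position, mutant_parent_reads, total_reads).
Marker = Tuple[int, int, int]
Window = Tuple[int, int, float]  # (start, end, allele frequency)


def windowed_allele_frequency(markers: List[Marker],
                              window_size: int,
                              step: int) -> List[Window]:
    """Pool read counts of all markers in each window and return the
    mutant-parent allele frequency per window. Empty windows are skipped."""
    if not markers:
        return []
    ordered = sorted(markers)
    results: List[Window] = []
    start = ordered[0][0]
    last_pos = ordered[-1][0]
    while start <= last_pos:
        end = start + window_size
        in_window = [(alt, tot) for pos, alt, tot in ordered if start <= pos < end]
        total = sum(tot for _, tot in in_window)
        if total > 0:
            alt_sum = sum(alt for alt, _ in in_window)
            results.append((start, end, alt_sum / total))
        start += step
    return results


def peak_window(windows: List[Window]) -> Window:
    """Return the window with the highest allele frequency (candidate region)."""
    return max(windows, key=lambda w: w[2])


if __name__ == "__main__":
    toy_markers = [(100, 5, 10), (200, 6, 10), (1100, 10, 10),
                   (1200, 9, 10), (2100, 4, 10)]
    windows = windowed_allele_frequency(toy_markers, window_size=1000, step=1000)
    print(peak_window(windows))
```

Pooling read counts per window, rather than averaging per-marker frequencies, weights well-covered markers more heavily, which matters when markers are sparse, as in the reduced-marker setting the abstract describes.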
Subjects
Arabidopsis/genetics , Brassica rapa/genetics , Chromosome Mapping/methods , Plant Genome/genetics , Synteny/genetics , Arabidopsis Proteins , DNA Mutational Analysis , Plant DNA/chemistry , Plant DNA/genetics , Flowers/genetics , Gene Frequency , Gene Library , Genetic Linkage , High-Throughput Nucleotide Sequencing/methods , MADS Domain Proteins , Mutation , DNA Sequence Analysis/methods , Transcriptome
Abstract
Skeletal muscle has an enormous plastic potential to adapt to various external and internal perturbations. Although morphological changes in endurance-trained muscles are well described, the molecular underpinnings of training adaptation are poorly understood. We therefore aimed to elucidate the molecular signature of muscles of trained male mice and unravel the training status-dependent responses to an acute bout of exercise. Our results reveal that, even though at baseline an unexpectedly low number of genes define the trained muscle, training status substantially affects the transcriptional response to an acute challenge, both quantitatively and qualitatively, in part associated with epigenetic modifications. Finally, transiently activated factors such as the peroxisome proliferator-activated receptor-γ coactivator 1α are indispensable for normal training adaptation. Together, these results provide a molecular framework of the temporal and training status-dependent exercise response that underpins muscle plasticity in training.
Subjects
Endurance Training , Animal Physical Conditioning , Humans , Mice , Male , Animals , Skeletal Muscle/physiology , Animal Physical Conditioning/physiology
Abstract
The Solute Carriers (SLCs) are membrane proteins that regulate transport of many types of substances across the cell membrane. The SLCs are found in at least 46 gene families in the human genome. Here, we performed the first evolutionary analysis of the entire SLC family based on whole genome sequences. We systematically mined and analyzed the genomes of 17 species to identify SLC genes. In all, we identified 4,813 SLC sequences in these genomes, and we delineated the evolutionary history of each of the subgroups. Moreover, we also identified ten new human sequences not previously classified as SLCs, which most likely belong to the SLC family. We found that 43 of the 46 SLC families found in Homo sapiens were also found in Caenorhabditis elegans, whereas 42 of them were also found in insects. Mammals have a higher number of SLC genes in most families, perhaps reflecting important roles for these in central nervous system functions. This study provides a systematic analysis of the evolutionary history of the SLC families in Eukaryotes, showing that the SLC superfamily is ancient, with multiple branches that were present before the early divergence of Bilateria. The results provide a foundation for overall classification of SLC genes and are valuable for annotation and prediction of substrates for the many SLCs that have not been tested in experimental transport assays.
Subjects
Biological Evolution , Molecular Evolution , Membrane Transport Proteins/genetics , Animals , Cluster Analysis , Genetic Databases , Humans , Membrane Transport Proteins/classification , Multigene Family , Phylogeny
Abstract
Several families of G protein-coupled receptors (GPCRs) show no significant sequence similarities to each other, and it has been debated which of them share a common origin. We developed and performed integrated and independent HHsearch, Needleman-Wunsch-based and motif analyses on more than 6,600 unique GPCRs from 12 species. Moreover, we mined the evolutionarily important Trichoplax adhaerens, Nematostella vectensis, Thalassiosira pseudonana, and Strongylocentrotus purpuratus genomes, revealing remarkably rich vertebrate-like GPCR repertoires already in the early Metazoan species. We found strong evidence that the Adhesion and Frizzled families are children of the cyclic AMP (cAMP) family, with HHsearch homology probabilities of 99.8% and 99.4%, respectively, also supported by the Needleman-Wunsch analysis and several motifs. We also found that the large Rhodopsin family is likely a child of the cAMP family, with an HHsearch homology probability of 99.4% and conserved motifs. Therefore, we suggest that the Adhesion and Frizzled families originated from the cAMP family in an event close to that which gave rise to the Rhodopsin family. We also found convincing evidence that the Rhodopsin family is parent to the important sensory families: Taste 2 and Vomeronasal type 1 as well as the Nematode chemoreceptor families. The insect odorant, gustatory, and Trehalose receptors, frequently referred to as GPCRs, form a separate cluster without relationship to the other families, and we propose, based on these and others' results, that these families are ligand-gated ion channels rather than GPCRs. Overall, we suggest common descent of at least 97% of the GPCR sequences found in humans.
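For readers unfamiliar with the Needleman-Wunsch step mentioned above, the following is a minimal sketch of the global alignment score computed by dynamic programming. It is an illustration only: the function name and the simple match/mismatch/gap scores are assumptions, whereas a protein analysis like the one described would normally use an amino acid substitution matrix and gap penalties that the abstract does not specify.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Uses the standard Needleman-Wunsch recurrence with a linear gap
    penalty, keeping only the previous row of the score matrix.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Because the score is global, it penalizes length differences between sequences, which is one reason such pairwise scores are usually combined with profile-based searches and motif evidence when comparing distantly related families.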
Subjects
Molecular Evolution , Multigene Family/genetics , Phylogeny , G-Protein-Coupled Receptors/genetics , Amino Acid Sequence , Animals , Computational Biology , Conserved Sequence , Eukaryota , Genome , Humans , Molecular Sequence Data , Rhodopsin/genetics
Abstract
BACKGROUND: Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (RSPR) operation, and the number of RSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of RSPRs separating two trees is NP-hard, but the problem is fixed-parameter tractable. A number of heuristic approaches and two exact approaches to identifying the minimum number of RSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees. RESULTS: We present the SPR Identification Tool (SPRIT), a novel algorithm that solves the fixed-parameter tractable minimum RSPR problem, and its GPL-licensed Java implementation. The algorithm can be used in two ways: an exhaustive search that guarantees the minimum RSPR distance, and a heuristic approach that guarantees finding a solution, but not necessarily the minimum one. We benchmarked SPRIT against other software in two different settings: small to medium-sized trees (five to one hundred taxa) and large trees (thousands of taxa). In the small to medium tree size setting with random artificial incongruence, SPRIT's heuristic mode outperforms the other software by always delivering a solution with a low overestimation of the RSPR distance. In the large tree setting, SPRIT compares well to the alternatives when benchmarked on finding a minimum solution within a reasonable time. SPRIT presents both the minimum RSPR distance and the intermediate trees. CONCLUSIONS: When used in exhaustive search mode, SPRIT identifies the minimum number of RSPRs needed to reconcile two incongruent rooted trees. SPRIT also performs quick approximations of the minimum RSPR distance, which are comparable to, and often better than, purely heuristic solutions. Put together, SPRIT is an excellent tool for identifying HGT events and pinpointing which taxa have been involved in HGT.
Subjects
Horizontal Gene Transfer , Phylogeny , Software , Algorithms
Abstract
The Adhesion G-protein-coupled receptors (GPCRs) are the most complex gene family among GPCRs with large genomic size, multiple introns, and a fascinating flora of functional domains, though the evolutionary origin of this family has been obscure. Here we studied the evolution of all class B (7tm2)-related genes, including the Adhesion, Secretin, and Methuselah families of GPCRs with a focus on nine genomes. We found that the cnidarian genome of Nematostella vectensis has a remarkably rich set of Adhesion GPCRs with a broad repertoire of N-terminal domains although this genome did not have any Secretin GPCRs. Moreover, the single-celled and colony-forming eukaryotes Monosiga brevicollis and Dictyostelium discoideum contain Adhesion-like GPCRs although these genomes do not have any Secretin GPCRs suggesting that the Adhesion types of GPCRs are the most ancient among class B GPCRs. Phylogenetic analysis found Adhesion group V (that contains GPR133 and GPR144) to be the closest relative to the Secretin family in the Adhesion family. Moreover, Adhesion group V sequences in N. vectensis share the same splice site setup as the Secretin GPCRs. Additionally, one of the most conserved motifs in the entire Secretin family is only found in group V of the Adhesion family. We suggest therefore that the Secretin family of GPCRs could have descended from group V Adhesion GPCRs. We found a set of unique Adhesion-like GPCRs in N. vectensis that have long N-termini containing one Somatomedin B domain each, which is a domain configuration similar to that of a set of Adhesion-like GPCRs found in Branchiostoma floridae. These sequences show slight similarities to Methuselah sequences found in insects. The extended class B GPCRs have a very complex evolutionary history with several species-specific expansions, and we identified at least 31 unique N-terminal domains originating from other protein classes. 
The overall N-terminal domain structure, however, concurs with the phylogenetic analysis of the transmembrane domains, thus enabling us to track the origin of most of the subgroups.
Subjects
Molecular Evolution , G-Protein-Coupled Receptors/genetics , Secretin/genetics , Animals , Genome , Phylogeny , RNA Splice Sites
Abstract
The Adhesion family is unique among the GPCR (G protein-coupled receptor) families because of several features, including long N-termini with multiple domains. The gene repertoire has recently been mined in great detail in several species including mouse, rat, dog, chicken and the early vertebrate Branchiostoma (Branchiostoma floridae) and one of the most primitive animals, the cnidarian Nematostella (Nematostella vectensis). There is a one-to-one relationship between the rodent (mouse and rat) and human orthologues, with the exception of EMR2 and EMR3, which do not seem to have orthologues in either rat or mouse. All 33 human Adhesion GPCR genes are present in the dog genome, but the dog genome also contains 5 additional full-length Adhesion genes. The dog and human Adhesion orthologues have higher average protein sequence identity than the rodent (rat and mouse) and human sequences. The Adhesion family is well represented in chicken, with 21 one-to-one orthologues with humans, while 12 human Adhesion GPCRs lack a chicken orthologue. Branchiostoma has a rich repertoire of Adhesion GPCRs with at least 37 genes. Moreover, the Adhesion GPCRs in Branchiostoma have several novel domains in their N-termini, such as Somatomedin B, Kringle, Lectin C-type, SRCR, LDLa, Immunoglobulin I-set, CUB and TNFR. Nematostella also has Adhesion GPCRs that show domain structure and sequence similarities in the transmembrane regions with different classes of mammalian GPCRs. The Nematostella genome has a unique set of Adhesion-like sequences lacking GPS domains. There is considerable evidence showing that the Adhesion family is ancestral to the peptide hormone-binding Secretin family of GPCRs.
Subjects
Biological Evolution , Protein Isoforms/classification , Protein Isoforms/genetics , G-Protein-Coupled Receptors/classification , G-Protein-Coupled Receptors/genetics , Animals , Humans , Phylogeny , Protein Isoforms/chemistry , Protein Isoforms/metabolism , Tertiary Protein Structure , G-Protein-Coupled Receptors/chemistry , G-Protein-Coupled Receptors/metabolism
Abstract
BACKGROUND: Membrane proteins form key nodes in mediating the cell's interaction with the surroundings, which is one of the main reasons why the majority of drug targets are membrane proteins. RESULTS: Here we mined the human proteome and identified the membrane proteome subset using three prediction tools for alpha-helices: Phobius, TMHMM, and SOSUI. This dataset was reduced to a non-redundant set by aligning it to the human genome and then clustered with our own interactive implementation of the ISODATA algorithm. The genes were classified and each protein group was manually curated, virtually evaluating each sequence of the clusters, applying systematic comparisons with a range of databases and other resources. We identified 6,718 human membrane proteins and classified the majority of them into 234 families of which 151 belong to the three major functional groups: receptors (63 groups, 1,352 members), transporters (89 groups, 817 members) or enzymes (7 groups, 533 members). Also, 74 miscellaneous groups with 697 members were determined. Interestingly, we find that 41% of the membrane proteins are singlets with no apparent affiliation or identity to any human protein family. Our results identify major differences between the human membrane proteome and the ones in unicellular organisms and we also show a strong bias towards certain membrane topologies for different functional classes: 77% of all transporters have more than six helices while 60% of proteins with an enzymatic function and 88% receptors, that are not GPCRs, have only one single membrane spanning alpha-helix. Further, we have identified and characterized new gene families and novel members of existing families. CONCLUSION: Here we present the most detailed roadmap of gene numbers and families to our knowledge, which is an important step towards an overall classification of the entire human proteome. 
We estimate that 27% of the total human proteome are alpha-helical transmembrane proteins and provide an extended classification together with in-depth investigations of the membrane proteome's functional, structural, and evolutionary features.
Subjects
Protein Databases , Membrane Proteins/physiology , Proteome/chemistry , Amino Acid Sequence Homology , Protein Structural Homology , Algorithms , Enzymes/chemistry , Molecular Evolution , Humans , Membrane Proteins/chemistry , Membrane Proteins/classification , Membrane Transport Proteins/chemistry , Multigene Family , Phylogeny , Secondary Protein Structure , Cell Surface Receptors/chemistry
Abstract
We studied the genomic positions of 38,129 putative ncRNAs from the RIKEN dataset in relation to protein-coding genes. We found that the dataset has 41% sense, 6% antisense, 24% intronic and 29% intergenic transcripts. Interestingly, 17,678 (47%) of the FANTOM3 transcripts were found to potentially be internally primed from longer transcripts. The highest fraction of these transcripts was found among the intronic transcripts, and as many as 77% (6929) of the intronic transcripts were both internally primed and unspliced. We defined a filtered subset of 8535 transcripts that did not overlap with protein-coding genes, did not contain ORFs longer than 100 residues and were not internally primed. This dataset contains 53% of the FANTOM3 transcripts associated with known ncRNA in RNAdb and expands previous similar efforts with 6523 novel transcripts. This bioinformatic filtering of the FANTOM3 non-coding dataset has generated a lead dataset of transcripts without signs of being artefacts, providing a suitable dataset for investigation with hybridization-based techniques.
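The three filtering criteria described above (no overlap with protein-coding genes, no ORF longer than 100 residues, not internally primed) can be sketched as a simple predicate. This is a hedged illustration, not the authors' pipeline: the `Transcript` record, its boolean fields, and the ORF definition used here (ATG-initiated, forward strand only, three frames, open-ended ORFs counted) are all assumptions, since the abstract does not say how overlap, internal priming or ORFs were determined.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_ORF_RESIDUES = 100  # threshold taken from the abstract


@dataclass
class Transcript:
    # Hypothetical record; the boolean flags are assumed to be precomputed.
    name: str
    sequence: str
    overlaps_coding_gene: bool
    internally_primed: bool


def longest_orf_residues(sequence: str) -> int:
    """Length in residues (stop codon excluded) of the longest ATG-initiated
    ORF in the three forward frames. ORFs running off the end are counted."""
    seq = sequence.upper()
    best = 0
    for frame in range(3):
        current = None  # residues since the most recent in-frame ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
        if current is not None:
            best = max(best, current)
    return best


def passes_filter(t: Transcript) -> bool:
    """Apply the three criteria; True means the transcript is retained."""
    return (not t.overlaps_coding_gene
            and not t.internally_primed
            and longest_orf_residues(t.sequence) <= MAX_ORF_RESIDUES)


if __name__ == "__main__":
    example = Transcript("toy", "ATGAAATAA", False, False)
    print(passes_filter(example))
```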
Subjects
Genetic Databases , Untranslated RNA/genetics , Genetic Transcription , Computational Biology , Expressed Sequence Tags , Human Genome , Humans , Introns/genetics , Proteins/genetics , Messenger RNA/genetics , RNA Sequence Analysis
Abstract
Catalytically inactive dCas9 fused to transcriptional activators (dCas9-VPR) enables activation of silent genes. Many disease genes have counterparts, which serve similar functions but are expressed in distinct cell types. One attractive option to compensate for the missing function of a defective gene could be to transcriptionally activate its functionally equivalent counterpart via dCas9-VPR. Key challenges of this approach include the delivery of dCas9-VPR, activation efficiency, long-term expression of the target gene, and adverse effects in vivo. Using dual adeno-associated viral vectors expressing split dCas9-VPR, we show efficient transcriptional activation and long-term expression of cone photoreceptor-specific M-opsin (Opn1mw) in a rhodopsin-deficient mouse model for retinitis pigmentosa. One year after treatment, this approach yields improved retinal function and attenuated retinal degeneration with no apparent adverse effects. Our study demonstrates that dCas9-VPR-mediated transcriptional activation of functionally equivalent genes has great potential for the treatment of genetic disorders.
Subjects
CRISPR-Cas Systems , Genetic Therapy , Animals , Blindness/genetics , Blindness/therapy , Mice , Transcription Factors/genetics , Transcriptional Activation
Abstract
BACKGROUND: G protein-coupled receptors (GPCRs) are one of the largest families of genes in mammals. Branchiostoma floridae (amphioxus) is one of the species most closely related to vertebrates. RESULTS: Mining and phylogenetic analysis of the amphioxus genome showed the presence of at least 664 distinct GPCRs distributed among all the main families of GPCRs: Glutamate (18), Rhodopsin (570), Adhesion (37), Frizzled (6) and Secretin (16). Surprisingly, the Adhesion GPCR repertoire in amphioxus includes receptors with many new domains not previously observed in this family. We found many Rhodopsin GPCRs from all main groups, including many amine- and peptide-binding receptors, and several previously uncharacterized expansions were also identified. This genome, however, has no genes coding for bitter taste receptors (TAS2), sweet and umami receptors (TAS1), pheromone receptors (VR1 or VR2) or mammalian olfactory receptors. CONCLUSION: The amphioxus genome is remarkably rich in various GPCR subtypes, while the main GPCR groups known to sense exogenous substances in other bilateral species (such as Taste 2, mammalian olfactory, nematode chemosensory, gustatory, vomeronasal and odorant receptors) are absent.
Subjects
Nonvertebrate Chordata/genetics , Genetic Variation , Genome , G-Protein-Coupled Receptors/genetics , Animals , Molecular Evolution , Humans , Likelihood Functions , Phylogeny
Abstract
Solute carriers (SLCs) are the largest group of transporters, embracing transporters for inorganic ions, amino acids, neurotransmitters, sugars, purines and fatty acids, among other substrates. We mined the finished assembly of the human genome using Hidden Markov Models (HMMs), obtaining a total of 384 unique SLC sequences. Detailed clustering and phylogenetic analysis of the entire SLC family showed that 15 of the families fall into four large phylogenetic clusters, the largest containing eight SLC families, suggesting that many of the distinct families of SLCs have a common evolutionary origin. This study represents the first overall genomic roadmap of the SLCs, providing large sequence sets, and clarifies the phylogenetic relationships among the families of the second largest group of membrane proteins.
Subjects
Human Genome , Membrane Transport Proteins/classification , Membrane Transport Proteins/genetics , Chromosome Mapping , Molecular Evolution , Humans , Markov Chains , Phylogeny , DNA Sequence Analysis
Abstract
Phenotypic variation of a single genotype is achieved by alterations in gene expression patterns. Regulation of such alterations depends on their time scale, where short-time adaptations differ from permanently established gene expression patterns maintained by epigenetic mechanisms. In the ciliate Paramecium, serotypes were described for an epigenetically controlled gene expression pattern of an individual multigene family. Paradoxically, individual serotypes can be triggered in Paramecium by alternating environments but are then stabilized by epigenetic mechanisms, thus raising the question of to what extent their expression follows environmental stimuli. To characterize environmental adaptation in the context of epigenetically controlled serotype expression, we used RNA-seq to characterize transcriptomes of serotype-pure cultures. The resulting vegetative transcriptome resource is first analysed for genes involved in the adaptive response to the altered environment. Secondly, we identified groups of genes that do not follow the adaptive response but show co-regulation with the epigenetically controlled serotype system, suggesting that their gene expression pattern becomes manifested by similar mechanisms. In our experimental set-up, serotype expression and the entire group of co-regulated genes were stable among environmental changes and only heat-shock genes altered expression of these gene groups. The data suggest that the maintenance of these gene expression patterns in a lineage represents epigenetically controlled robustness counteracting short-time adaptation processes.
Subjects
Genetic Epigenesis , Gene Expression Regulation , Paramecium tetraurelia/genetics , Serogroup , Transcriptome , Biological Adaptation/genetics , Protozoan Antigens/genetics , Cluster Analysis , Cold Temperature , DNA/metabolism , Gene Expression Profiling , Heat-Shock Response/genetics , Multigene Family , Paramecium tetraurelia/classification , Paramecium tetraurelia/metabolism , Protein Biosynthesis , Starvation/genetics
Abstract
Despite evolutionarily conserved mechanisms to silence transposable element activity, there are drastic differences in the abundance of transposable elements even among closely related plant species. We conducted a de novo assembly for the 375 Mb genome of the perennial model plant, Arabis alpina. Analysing this genome revealed long-lasting and recent transposable element activity predominantly driven by Gypsy long terminal repeat retrotransposons, which extended the low-recombining pericentromeres and transformed large formerly euchromatic regions into repeat-rich pericentromeric regions. This reduced capacity for long terminal repeat retrotransposon silencing and removal in A. alpina co-occurs with unexpectedly low levels of DNA methylation. Most remarkably, the striking reduction of symmetrical CG and CHG methylation suggests weakened DNA methylation maintenance in A. alpina compared with Arabidopsis thaliana. Phylogenetic analyses indicate a highly dynamic evolution of some components of methylation maintenance machinery that might be related to the unique methylation in A. alpina.
Abstract
Genes underlying mutant phenotypes can be isolated by combining marker discovery, genetic mapping and resequencing, but a more straightforward strategy for mapping mutations would be the direct comparison of mutant and wild-type genomes. Applying such an approach, however, is hampered by the need for reference sequences and by mutational loads that confound the unambiguous identification of causal mutations. Here we introduce NIKS (needle in the k-stack), a reference-free algorithm based on comparing k-mers in whole-genome sequencing data for precise discovery of homozygous mutations. We applied NIKS to eight mutants induced in nonreference rice cultivars and to two mutants of the nonmodel species Arabis alpina. In both species, comparing pooled F2 individuals selected for mutant phenotypes revealed small sets of mutations including the causal changes. Moreover, comparing M3 seedlings of two allelic mutants unambiguously identified the causal gene. Thus, for any species amenable to mutagenesis, NIKS enables forward genetics without requiring segregating populations, genetic maps and reference sequences.
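The core idea of comparing k-mers between two sequencing datasets without a reference can be illustrated in a few lines. This is a toy sketch and not the NIKS algorithm itself: the function names, the `min_count` threshold used to discard rare (likely erroneous) k-mers, and the decision to ignore reverse complements are all simplifying assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the given reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(sample_a: Iterable[str],
                          sample_b: Iterable[str],
                          k: int,
                          min_count: int = 2) -> Tuple[Set[str], Set[str]]:
    """Return (k-mers seen only in A, k-mers seen only in B).

    A k-mer must occur at least min_count times in its own sample and not
    at all in the other sample to be reported.
    """
    counts_a = count_kmers(sample_a, k)
    counts_b = count_kmers(sample_b, k)
    only_a = {km for km, n in counts_a.items() if n >= min_count and km not in counts_b}
    only_b = {km for km, n in counts_b.items() if n >= min_count and km not in counts_a}
    return only_a, only_b


if __name__ == "__main__":
    wild_type = ["ACGTACGTAC"] * 2
    mutant = ["ACGTTCGTAC"] * 2  # one substitution relative to wild type
    only_wt, only_mut = sample_specific_kmers(wild_type, mutant, k=5)
    print(sorted(only_wt), sorted(only_mut))
```

A single-base difference changes every k-mer that spans it, so a homozygous point mutation shows up as a small cluster of up to k sample-specific k-mers in each dataset, which is what makes this kind of comparison informative without a reference sequence.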
Subjects
Algorithms , Arabis/genetics , Plant Genome/genetics , Mutation/genetics , Oryza/genetics , DNA Sequence Analysis/methods , Alleles , Arabidopsis/metabolism , Base Pairing/genetics , Base Sequence , Chromosome Mapping , Genetic Crosses , Ethyl Methanesulfonate , Flowers/genetics , Plant Genes/genetics , Molecular Sequence Data , Plant Proteins/genetics , Plant Proteins/metabolism , Reference Standards , Sequence Deletion
Abstract
Perennial plants live for more than 1 year and flower only after an extended vegetative phase. We used Arabis alpina, a perennial relative of annual Arabidopsis thaliana, to study how increasing age and exposure to winter cold (vernalization) coordinate to establish competence to flower. We show that the APETALA2 transcription factor, a target of microRNA miR172, prevents flowering before vernalization. Additionally, miR156 levels decline as A. alpina ages, causing increased production of SPL (SQUAMOSA PROMOTER BINDING PROTEIN LIKE) transcription factors and ensuring that flowering occurs in response to cold. The age at which plants respond to vernalization can be altered by manipulating miR156 levels. Although miR156 and miR172 levels are uncoupled in A. alpina, miR156 abundance represents the timer controlling age-dependent flowering responses to cold.
Subjects
Arabis/physiology , Cold Temperature , Flowers/physiology , Seasons , Arabis/genetics , Flowers/genetics , Plant Gene Expression Regulation , MicroRNAs/metabolism , Molecular Sequence Data , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism , Time Factors , Transcription Factors/classification , Transcription Factors/metabolism
Abstract
Mapping-by-sequencing combines genetic mapping with whole-genome sequencing in order to accelerate mutant identification. However, application of mapping-by-sequencing requires decisions on various practical settings on the experimental design that are not intuitively answered. Following an experimentally determined recombination landscape of Arabidopsis and next generation sequencing-specific biases, we simulated more than 400,000 mapping-by-sequencing experiments. This allowed us to evaluate a broad range of different types of experiments and to develop general rules for mapping-by-sequencing in Arabidopsis. Most importantly, this informs about the properties of different crossing scenarios, the number of recombinants and sequencing depth needed for successful mapping experiments.