ABSTRACT
The Solanaceae or "nightshade" family is an economically important group with remarkable diversity. To gain a better understanding of how the unique biology of the Solanaceae relates to the family's small RNA (sRNA) genomic landscape, we downloaded over 255 publicly available sRNA data sets that comprise over 2.6 billion reads of sequence data. We applied a suite of computational tools to predict and annotate two major sRNA classes: (1) microRNAs (miRNAs), typically 20- to 22-nucleotide (nt) RNAs generated from a hairpin precursor and functioning in gene silencing and (2) short interfering RNAs (siRNAs), including 24-nt heterochromatic siRNAs typically functioning to repress repetitive regions of the genome via RNA-directed DNA methylation, as well as secondary phased siRNAs and trans-acting siRNAs generated via miRNA-directed cleavage of a polymerase II-derived RNA precursor. Our analyses described thousands of sRNA loci, including poorly understood clusters of 22-nt siRNAs that accumulate during viral infection. The birth, death, expansion, and contraction of these sRNA loci are dynamic evolutionary processes that characterize the Solanaceae family. These analyses indicate that individuals within the same genus share similar sRNA landscapes, whereas comparisons between distinct genera within the Solanaceae reveal relatively few commonalities.
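The size classes named above (20- to 22-nt miRNA-sized reads and 24-nt heterochromatic siRNA-sized reads) can be illustrated with a minimal sketch that tallies reads by length. This is not the authors' pipeline: the function names and the toy reads are hypothetical, and read length alone does not establish which class a small RNA belongs to.

```python
from collections import Counter

def size_class(read):
    """Bin a read by length only (an illustrative assumption, not a classifier)."""
    n = len(read)
    if 20 <= n <= 22:
        return "20-22 nt"   # size range typical of miRNAs
    if n == 24:
        return "24 nt"      # size typical of heterochromatic siRNAs
    return "other"

def tally_size_classes(reads):
    """Count how many reads fall into each size bin."""
    return Counter(size_class(r) for r in reads)

# Toy reads invented for illustration; not real sequence data.
reads = ["A" * 21, "C" * 24, "G" * 22, "U" * 18]
counts = tally_size_classes(reads)
```

A real annotation would also need genome mapping and precursor structure, as the abstract indicates; the length tally is only a first summary of a library.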
Subject(s)
MicroRNAs , RNA, Small Interfering , Solanaceae , DNA Methylation , DNA-Directed RNA Polymerases/genetics , Gene Silencing , MicroRNAs/genetics , RNA, Plant/genetics , RNA, Small Interfering/genetics , Solanaceae/genetics
ABSTRACT
In monocots other than maize (Zea mays) and rice (Oryza sativa), the repertoire and diversity of microRNAs (miRNAs) and the populations of phased, secondary, small interfering RNAs (phasiRNAs) are poorly characterized. To remedy this, we sequenced small RNAs (sRNAs) from vegetative and dissected inflorescence tissue in 28 phylogenetically diverse monocots and several early-diverging angiosperm lineages, and we also included publicly available data from 10 additional monocot species. We annotated miRNAs, small interfering RNAs (siRNAs), and phasiRNAs across the monocot phylogeny, identifying miRNAs apparently lost or gained in the grasses relative to other monocot families, as well as a number of transfer RNA fragments misannotated as miRNAs. Using our miRNA database cleaned of these misannotations, we identified conservation at the 8th, 9th, 19th, and 3'-end positions that we hypothesize are signatures of selection for processing, targeting, or Argonaute sorting. We show that 21-nucleotide (nt) reproductive phasiRNAs are far more numerous in grass genomes than in other monocots. Based on sequenced monocot genomes and transcriptomes, DICER-LIKE5, important to 24-nt phasiRNA biogenesis, likely originated via gene duplication before the diversification of the grasses. This curated database of phylogenetically diverse monocot miRNAs, siRNAs, and phasiRNAs represents a large collection of data that should facilitate continued exploration of sRNA diversification in flowering plants.
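The per-position conservation mentioned above (8th, 9th, 19th, and 3'-end positions) can be sketched as the frequency of the most common nucleotide at a given position across a set of mature miRNA sequences. This is a minimal illustration under stated assumptions, not the authors' method: the function name is hypothetical, the sequences are invented toy strings, and positions are counted without any alignment.

```python
from collections import Counter

def position_conservation(seqs, position):
    """Return (most common base, its frequency) at a 1-based position.

    Positive positions count from the 5' end; negative positions count
    from the 3' end (-1 is the last nucleotide). Sequences too short to
    have that position are skipped. Returns None if none qualify.
    """
    if position == 0:
        raise ValueError("position is 1-based; use -1 for the 3' end")
    idx = position - 1 if position > 0 else position
    bases = [s[idx] for s in seqs if len(s) >= abs(position)]
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base, count / len(bases)

# Toy sequences invented for illustration; not real miRNAs.
seqs = ["UGACA", "UGGCA", "UGACC"]
first = position_conservation(seqs, 1)    # 5'-most nucleotide
third = position_conservation(seqs, 3)
last = position_conservation(seqs, -1)    # 3'-end nucleotide
```

Counting the 3' end with a negative index keeps sequences of different lengths comparable at that position, which matters because mature miRNAs vary in length.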
Subject(s)
Inflorescence/genetics , Magnoliopsida/growth & development , Magnoliopsida/genetics , RNA, Plant , Reproduction/genetics , Reproduction/physiology , Gene Expression Regulation, Plant , Genetic Variation , Genotype , Inflorescence/physiology , MicroRNAs , Sequence Analysis, RNA
ABSTRACT
We developed public web sites and resources for data access, display, and analysis of plant small RNAs. These web sites are interconnected with related data types. The current generation of these informatics tools was developed for Illumina data, evolving over more than 15 years of improvements. Our online databases have customized web interfaces to uniquely handle and display RNA-derived data from diverse plant species, ranging from Arabidopsis (Arabidopsis thaliana) to wheat (Triticum spp.), including many crop and model species. The web interface displays the abundance and genomic context of data for small RNAs, parallel analysis of RNA ends/degradome reads, RNA sequencing, and even chromatin immunoprecipitation sequencing data; it also provides information about potentially novel transcripts (antisense transcripts, alternative splice isoforms, and regulatory intergenic transcripts). Numerous options are included for downloading data as tables or via web services. Interpretation of these data is facilitated by the inclusion of extensive repeat or transposon data in our genome viewer. We have developed graphical and analytical tools, including a new viewer and a query page for the analysis of phased small RNAs; these are particularly useful for understanding the complex small RNA pathways of plants. These public databases are accessible at https://mpss.danforthcenter.org.