ABSTRACT
The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Subject(s)
Biological Evolution , Fishes/classification , Fishes/genetics , Genome/genetics , Animals , Genetically Modified Animals , Chick Embryo , Conserved Sequence/genetics , Genetic Enhancer Elements/genetics , Molecular Evolution , Extremities/anatomy & histology , Extremities/growth & development , Fishes/anatomy & histology , Fishes/physiology , Homeobox Genes/genetics , Genomics , Immunoglobulin M/genetics , Mice , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Sequence Alignment , DNA Sequence Analysis , Vertebrates/anatomy & histology , Vertebrates/genetics , Vertebrates/physiology
ABSTRACT
Aspalathin, the main polyphenol of rooibos (Aspalathus linearis), is associated with diverse health promoting properties of the tea. During fermentation, aspalathin is oxidized and concentrations are significantly reduced. Standardized methods for quality control of rooibos products do not investigate aspalathin, since current techniques of aspalathin detection require expensive equipment and expertise. Here, we describe a simple and fast thin-layer chromatography (TLC) method that can reproducibly visualize aspalathin in rooibos herbal tea and plant extracts at a limit of detection (LOD) equal to 178.7 ng and a limit of quantification (LOQ) equal to 541.6 ng. Aspalathin is a rare compound, so far only found in A. linearis and its (rare) sister species A. pendula. Therefore, aspalathin could serve as a marker compound for authentication and quality control of rooibos products, and the described TLC method represents a cost-effective approach for high-throughput screening of plant and herbal tea extracts.
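The abstract reports the LOD and LOQ but does not say how they were derived. As a hedged illustration only, the sketch below assumes the widely used calibration-curve definitions LOD = 3.3·σ/S and LOQ = 10·σ/S, where S is the slope and σ the residual standard deviation of a linear fit; the reported LOQ/LOD ratio of about 3.03 is consistent with that convention but does not confirm it. The function name `lod_loq` and the calibration values are hypothetical and are not taken from the study.

```python
from math import sqrt


def lod_loq(amounts_ng, signals):
    """Return (LOD, LOQ) in ng from a linear calibration curve.

    Assumption (not stated in the abstract): LOD = 3.3 * sigma / S and
    LOQ = 10 * sigma / S, with S the slope and sigma the residual
    standard deviation of an ordinary least-squares fit.
    """
    n = len(amounts_ng)
    if n < 3 or n != len(signals):
        raise ValueError("need at least three paired calibration points")
    mean_x = sum(amounts_ng) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts_ng)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts_ng, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard deviation with n - 2 degrees of freedom.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(amounts_ng, signals))
    sigma = sqrt(ss_res / (n - 2))
    return 3.3 * sigma / slope, 10.0 * sigma / slope


# Hypothetical calibration series: spotted amount (ng) vs. densitometric signal.
amounts = [250, 500, 1000, 2000, 4000]
signals = [510, 1040, 1980, 4050, 7990]
lod, loq = lod_loq(amounts, signals)
print(f"LOD = {lod:.1f} ng, LOQ = {loq:.1f} ng")
```

With real data, `amounts` would hold the spotted aspalathin standards and `signals` the corresponding band intensities read from the plate.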
Subject(s)
Aspalathus/chemistry , Chalcones/analysis , Herbal Teas/standards , Thin Layer Chromatography , High-Throughput Screening Assays , Plant Extracts/standards , Quality Control
ABSTRACT
The fungal family Clavicipitaceae includes plant symbionts and parasites that produce several psychoactive and bioprotective alkaloids. The family includes grass symbionts in the epichloae clade (Epichloë and Neotyphodium species), which are extraordinarily diverse both in their host interactions and in their alkaloid profiles. Epichloae produce alkaloids of four distinct classes, all of which deter insects, and some, including the infamous ergot alkaloids, have potent effects on mammals. The exceptional chemotypic diversity of the epichloae may relate to their broad range of host interactions, whereby some are pathogenic and contagious, others are mutualistic and vertically transmitted (seed-borne), and still others vary in pathogenic or mutualistic behavior. We profiled the alkaloids and sequenced the genomes of 10 epichloae, three ergot fungi (Claviceps species), a morning-glory symbiont (Periglandula ipomoeae), and a bamboo pathogen (Aciculosporium take), and compared the gene clusters for four classes of alkaloids. Results indicated a strong tendency for alkaloid loci to have conserved cores that specify the skeleton structures and peripheral genes that determine chemical variations that are known to affect their pharmacological specificities. Generally, gene locations in cluster peripheries positioned them near transposon-derived, AT-rich repeat blocks, which were probably involved in gene losses, duplications, and neofunctionalizations. The alkaloid loci in the epichloae had unusual structures riddled with large, complex, and dynamic repeat blocks. This feature was not reflective of overall differences in repeat contents in the genomes, nor was it characteristic of most other specialized metabolism loci. The organization and dynamics of alkaloid loci and abundant repeat blocks in the epichloae suggested that these fungi are under selection for alkaloid diversification.
We suggest that such selection is related to the variable life histories of the epichloae, their protective roles as symbionts, and their associations with the highly speciose and ecologically diverse cool-season grasses.
Subject(s)
Alkaloids , Claviceps , Epichloe , Ergot Alkaloids , Genetic Selection , Alkaloids/chemistry , Alkaloids/classification , Alkaloids/genetics , Alkaloids/metabolism , Claviceps/genetics , Claviceps/metabolism , Claviceps/pathogenicity , Epichloe/genetics , Epichloe/metabolism , Epichloe/pathogenicity , Ergot Alkaloids/genetics , Ergot Alkaloids/metabolism , Fungal Gene Expression Regulation , Hypocreales/genetics , Hypocreales/metabolism , Neotyphodium , Poaceae/genetics , Poaceae/metabolism , Poaceae/parasitology , Symbiosis/genetics
ABSTRACT
BACKGROUND: De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrag annotations and propagation of mis-assemblies. RESULTS: Here, we propose a structured framework that consists of a few steps in a pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error-tolerant CDS prediction, 3) identification of coding potential, and 4) complementation of BLAST with a multiple domain architecture annotation that reduces non-specific domain annotation. We demonstrate that, independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is inferred on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags compared to the multiple k-mer assemblies. However, Trinity identified a comparable number of predicted coding sequences and gene loci to the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4 and 6% are lost when orphan transfrags are excluded, and this represents only a tiny fraction of annotation derived from functional transference by sequence similarity.
The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33-50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represent sequences from intronic, untranslated (5' and 3') regions and non-coding gene loci. CONCLUSIONS: IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.
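The first IFRAT step, removal of identical subsequences, can be pictured with a small sketch. This is not the IFRAT implementation, which the abstract does not describe: it is a naive quadratic filter that assumes "identical subsequence" means a transfrag exactly contained in a longer retained transfrag on either strand. The names `drop_contained` and `reverse_complement` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_contained(transfrags):
    """Keep only transfrags not exactly contained in a longer kept one.

    transfrags: dict mapping name -> uppercase sequence.
    Assumption: containment is tested on both strands, and exact
    duplicates are collapsed onto the first name in sort order.
    """
    kept = {}
    # Process longest first so any containing sequence is already kept.
    ordered = sorted(transfrags.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue
        kept[name] = seq
    return kept


example = {
    "t1": "ATGGCCAAGTTTGA",
    "t2": "GCCAAG",          # contained in t1
    "t3": "CTTGGC",          # reverse complement of t2, so also contained
    "t4": "ATGGCCAAGTTTGA",  # exact duplicate of t1
    "t5": "TTTTCCCC",        # unrelated, retained
}
retained = drop_contained(example)
print(sorted(retained))
```

For assemblies with many thousands of transfrags, an index-based approach would be needed instead of this pairwise scan.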
Subject(s)
Algorithms , Fungal Genome , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation , Neurospora crassa/genetics , RNA Sequence Analysis/methods , Transcriptome , Cluster Analysis , Software
ABSTRACT
G-protein coupled chemosensory receptors (GPCR-CRs) aid in the perception of odors and tastes in vertebrates. So far, six GPCR-CR families have been identified that are conserved in most vertebrate species. Phylogenetic analyses indicate differing evolutionary dynamics between teleost fish and tetrapods. The coelacanth Latimeria chalumnae belongs to the lobe-finned fishes, which represent a phylogenetic link between these two groups. We searched the genome of L. chalumnae for GPCR-CRs and found that coelacanth taste receptors are more similar to those in tetrapods than in teleost fish: two coelacanth T1R2s co-segregate with the tetrapod T1R2s that recognize sweet substances, and our phylogenetic analyses indicate that the teleost T1R2s are more closely related to T1R1s (umami taste receptors) than to tetrapod T1R2s. Furthermore, coelacanths are the first fish with a large repertoire of bitter taste receptors (58 T2Rs). Considering current knowledge of the feeding habits of coelacanths, the question arises whether perception of bitter taste is the only function of these receptors. Similar to teleost fish, coelacanths have a variety of olfactory receptors (ORs) necessary for perception of water-soluble substances. However, they also have seven genes in the two tetrapod OR subfamilies predicted to recognize airborne molecules. The two coelacanth vomeronasal receptor families are larger than those in teleost fish and similar to those in tetrapods, and they form monophyletic V1R and V2R clades. This may point to an advanced development of the vomeronasal organ, as reported for lungfish. Our results show that the intermediate position of Latimeria in the phylogeny is reflected in its GPCR-CR repertoire.
Subject(s)
Fishes/genetics , Odorant Receptors/genetics , Taste/genetics , Animals , Molecular Evolution , Phylogeny , G-Protein-Coupled Receptors/genetics , Vertebrates/genetics , Vomeronasal Organ
ABSTRACT
Recent advances in sequencing technologies have made genome sequencing of non-model organisms with very large and complex genomes possible. The data can be used to estimate diverse genome characteristics, including genome size, repeat content, and levels of heterozygosity. K-mer analysis is a powerful biocomputational approach with a wide range of applications, including estimation of genome sizes. However, interpretation of the results is not always straightforward. Here, I review k-mer-based genome size estimation, focusing specifically on k-mer theory and peak calling in k-mer frequency histograms. I highlight common pitfalls in data analysis and result interpretation, and provide a comprehensive overview on current methods and programs developed to conduct these analyses.
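To make the idea of peak calling in a k-mer frequency histogram concrete, here is a minimal sketch, not any of the reviewed programs. It assumes the commonly used relation genome size ≈ (total number of k-mers) / (depth of the main peak), takes the first local minimum of the histogram as the boundary between error k-mers and genomic k-mers, and expects a dense histogram without gaps at low depths. The function names and the toy histogram are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_histogram(reads, k):
    """Map k-mer multiplicity -> number of distinct canonical k-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def call_main_peak(hist):
    """Return (valley, peak) depths of a k-mer frequency histogram."""
    max_depth = max(hist)
    valley = 1
    # Walk down the low-depth error tail until counts stop decreasing.
    while valley < max_depth and hist.get(valley + 1, 0) < hist.get(valley, 0):
        valley += 1
    # The main peak is the most frequent depth at or beyond the valley.
    peak = max(range(valley, max_depth + 1), key=lambda d: hist.get(d, 0))
    return valley, peak


def estimate_genome_size(hist):
    """Total k-mers at or above the valley, divided by the peak depth."""
    valley, peak = call_main_peak(hist)
    total = sum(depth * n for depth, n in hist.items() if depth >= valley)
    return total / peak


# Toy histogram: error k-mers at depths 1-3, main peak at depth 10.
toy_hist = {1: 1000, 2: 200, 3: 40, 4: 10, 5: 20, 6: 50, 7: 120, 8: 200,
            9: 260, 10: 300, 11: 260, 12: 200, 13: 120, 14: 50, 15: 20,
            20: 30}
print(call_main_peak(toy_hist), estimate_genome_size(toy_hist))
```

In heterozygous genomes the histogram can show a second peak at half the main depth, so a simple argmax like this may pick the wrong peak; that is one of the interpretation pitfalls such a review addresses.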
Subject(s)
Algorithms , Software , Genome Size , DNA Sequence Analysis/methods , Chromosome Mapping , Base Sequence
ABSTRACT
While plant genome analysis is gaining speed worldwide, few plant genomes have been sequenced and analyzed on the African continent. Yet, this information holds the potential to transform diverse industries as it unlocks medicinally and industrially relevant biosynthesis pathways for bioprospecting. Considering that South Africa is home to the highly diverse Cape Floristic Region, local establishment of methods for plant genome analysis is essential. Long-read sequencing is becoming standard procedure for plant genome research, as these reads can span repetitive regions of the DNA, substantially facilitating reassembly of a contiguous genome. With the MinION, Oxford Nanopore offers a cost-efficient sequencing method to generate long reads; however, DNA purification protocols must be adapted for each plant species to generate ultra-pure DNA, essential for these analyses. Here, we describe a cost-effective procedure for the extraction and purification of plant DNA and evaluate diverse genome assembly approaches for the reconstruction of the genome of rooibos (Aspalathus linearis), an endemic South African medicinal plant widely used for tea production. We discuss the pros and cons of nine tested assembly programs, specifically Redbean and NextDenovo, which generated the most contiguous assemblies, and Flye, which produced an assembly closest to the predicted genome size.
ABSTRACT
Symbiotic associations between plants and fungi are a dominant feature of many terrestrial ecosystems, yet relatively little is known about the signaling, and associated transcriptome profiles, that define the symbiotic metabolic state. Using the Epichloë festucae-perennial ryegrass (Lolium perenne) association as a model symbiotic experimental system, we show an essential role for the fungal stress-activated mitogen-activated protein kinase (sakA) in the establishment and maintenance of this mutualistic interaction. Deletion of sakA switches the fungal interaction with the host from mutualistic to pathogenic. Infected plants exhibit loss of apical dominance, premature senescence, and dramatic changes in development, including the formation of bulb-like structures at the base of tillers that lack anthocyanin pigmentation. A comparison of the transcriptome of wild-type and sakA associations using high-throughput mRNA sequencing reveals dramatic changes in fungal gene expression consistent with the transition from restricted to proliferative growth, including a down-regulation of several clusters of secondary metabolite genes and up-regulation of a large set of genes that encode hydrolytic enzymes and transporters. Analysis of the plant transcriptome reveals up-regulation of host genes involved in pathogen defense and transposon activation as well as dramatic changes in anthocyanin and hormone biosynthetic/responsive gene expression. These results highlight the fine balance between mutualism and antagonism in a plant-fungal interaction and the power of deep mRNA sequencing to identify candidate sets of genes underlying the symbiosis.
Subject(s)
Epichloe/genetics , Fungal Proteins/metabolism , Gene Expression Profiling , Lolium/microbiology , Mitogen-Activated Protein Kinase 1/metabolism , Symbiosis , Anthocyanins/biosynthesis , DNA Transposable Elements , Epichloe/enzymology , Epichloe/physiology , Fungal Proteins/genetics , Gene Deletion , Fungal Gene Expression Regulation , Plant Gene Expression Regulation , Gene Library , Lolium/growth & development , Lolium/metabolism , Mitogen-Activated Protein Kinase 1/genetics , Plant Growth Regulators/biosynthesis , Messenger RNA/genetics , Plant RNA/genetics , Reactive Oxygen Species/metabolism , RNA Sequence Analysis
ABSTRACT
Plant genomes provide information on biosynthetic pathways involved in the production of industrially relevant compounds. Genome size estimates are essential for the initiation of genome projects. The genome size of rooibos (Aspalathus linearis species complex) was estimated using DAPI flow cytometry and k-mer analyses. For flow cytometry, a suitable nuclei isolation buffer, plant tissue and a transport medium for rooibos ecotype samples collected from distant locations were identified. When using radicles from commercial rooibos seedlings, Woody Plant Buffer and Vicia faba as an internal standard, the flow cytometry-estimated genome size of rooibos was 1.24 ± 0.01 Gbp. The estimates for eight wild rooibos growth types did not deviate significantly from this value. K-mer analysis was performed using Illumina paired-end sequencing data from one commercial rooibos genotype. For biocomputational estimation of the genome size, four k-mer analysis methods were investigated: a standard formula and three popular programs (BBNorm, GenomeScope, and FindGSE). GenomeScope estimates were strongly affected by parameter settings, specifically CovMax. When using the complete k-mer frequency histogram (up to 9 × 10^5), the programs did not deviate significantly, estimating an average rooibos genome size of 1.03 ± 0.04 Gbp. Differences between the flow cytometry and biocomputational estimates are discussed.
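Why a maximum-coverage setting can shift a k-mer-based estimate is easiest to see with a toy calculation. The sketch below does not reproduce GenomeScope's model fitting or the abstract's unspecified "standard formula"; it assumes the simple relation genome size ≈ total k-mers / peak depth and shows how discarding histogram bins above a cap removes high-copy repeat k-mers from the total. The function `genome_size`, its parameters, and all numbers are hypothetical.

```python
def genome_size(hist, peak_depth, min_depth=1, max_depth=None):
    """Estimate genome size as (sum of depth * count) / peak depth.

    hist: dict mapping k-mer depth -> number of distinct k-mers.
    max_depth: optional cap; bins above it are ignored (assumption:
    this mimics the effect of a maximum-coverage cutoff in spirit only).
    """
    total = sum(
        depth * n
        for depth, n in hist.items()
        if depth >= min_depth and (max_depth is None or depth <= max_depth)
    )
    return total / peak_depth


# Toy histogram: single-copy peak at depth 20, a two-copy shoulder,
# and a small number of very high-copy repeat k-mers.
toy = {20: 1_000_000, 40: 100_000, 1_000: 2_000, 100_000: 50}

full = genome_size(toy, peak_depth=20)
capped = genome_size(toy, peak_depth=20, max_depth=1_000)
print(f"full histogram: {full:,.0f} bp; capped at 1,000x: {capped:,.0f} bp")
```

In this toy case the 50 repeat k-mers at depth 100,000 account for about 16% of the full estimate, which is why truncating the histogram lowers the result.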
ABSTRACT
Rooibos (Aspalathus linearis), widely known as a herbal tea, is endemic to the Cape Floristic Region of South Africa (SA). It produces a wide range of phenolic compounds that have been associated with diverse health promoting properties of the plant. The species comprises several growth forms that differ in their morphology and biochemical composition, only one of which is cultivated and used commercially. Here, we established methodologies for non-invasive transcriptome research of wild-growing South African plant species, including (1) harvesting and transport of plant material suitable for RNA sequencing; (2) inexpensive, high-throughput biochemical sample screening; (3) extraction of high-quality RNA from recalcitrant, polysaccharide- and polyphenol rich plant material; and (4) biocomputational analysis of Illumina sequencing data, together with the evaluation of programs for transcriptome assembly (Trinity, IDBA-Trans, SOAPdenovo-Trans, CLC), protein prediction, as well as functional and taxonomic transcript annotation. In the process, we established a biochemically characterized sample pool from 44 distinct rooibos ecotypes (1-5 harvests) and generated four in-depth annotated transcriptomes (each comprising on average ≈86,000 transcripts) from rooibos plants that represent distinct growth forms and differ in their biochemical profiles. These resources will serve future rooibos research and plant breeding endeavours.
ABSTRACT
BACKGROUND: Subtilisin-like proteases (SLPs) form a superfamily of enzymes that act to degrade protein substrates. In fungi, SLPs can play either a general nutritive role, or may play specific roles in cell metabolism, or as pathogenicity or virulence factors. RESULTS: Fifteen different genes encoding SLPs were identified in the genome of the grass endophytic fungus Epichloë festucae. Phylogenetic analysis indicated that these SLPs belong to four different subtilisin families: proteinase K, kexin, pyrolysin and subtilisin. The pattern of intron loss and gain is consistent with this phylogeny. E. festucae is exceptional in that it contains two kexin-like genes. Phylogenetic analysis in Hypocreales fungi revealed an extensive history of gene loss and duplication. CONCLUSION: This study provides new insights into the evolution of the SLP superfamily in filamentous fungi.
Subject(s)
Epichloe/genetics , Molecular Evolution , Multigene Family , Subtilisins/genetics , Molecular Cloning , Conserved Sequence , Fungal DNA/genetics , Epichloe/enzymology , Gene Duplication , Fungal Genes , Fungal Genome , Genomic Library , Introns , Phylogeny , DNA Sequence Analysis , Synteny
ABSTRACT
Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using simulated viromes that mimic environmental data challenges we assessed the performance of five assemblers (CLC-Workbench, IDBA-UD, SPAdes, RayMeta, ABySS). Individual analyses of relevant scaffold length fractions revealed shortcomings of some programs in reconstruction of viral genomes with excessive read coverage (IDBA-UD, RayMeta), and in accurate assembly of scaffolds ≥50 kb (SPAdes, RayMeta, ABySS). The CLC-Workbench assembler performed best in terms of genome recovery (including highly covered genomes) and correct reconstruction of large scaffolds; and was used to assemble a virome from a copper rich site in the Namib Desert. We found that scaffold network analysis and cluster-specific read reassembly improved reconstruction of sequences with excessive read coverage, and that strict data filtering for non-viral sequences prior to downstream analyses was essential. In this study we describe novel viral genomes identified in the Namib Desert copper site virome. Taxonomic affiliations of diverse proteins in the dataset and phylogenetic analyses of circovirus-like proteins indicated links to the marine habitat. Considering additional evidence from this dataset we hypothesize that viruses may have been carried from the Atlantic Ocean into the Namib Desert by fog and wind, highlighting the impact of the extended environment on an investigated niche in metagenome studies.
ABSTRACT
Downstream analyses of short reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on the associated base quality scores to retain a read, or a portion of it, when the score is above a predefined threshold. Without a reference, it is difficult to differentiate sequencing error from biological variation using quality scores. The effects of quality-score-based trimming have not been systematically studied in de novo transcriptome assembly. Using Illumina RNA-Seq data, we teased out the effects of quality-score-based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the view that de novo assembly of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms.
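For readers unfamiliar with quality-score-based trimming, the following is a minimal sketch of one common variant: removing low-quality bases from the 3' end of a read and discarding reads that become too short. The abstract does not name the tool, thresholds, or trimming rule used in the study, so the Phred+33 encoding, the threshold of 20, the minimum length, and the function names are all assumptions made for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integers."""
    return [ord(char) - offset for char in quality]


def trim_3prime(seq, quality, threshold=20, min_len=25):
    """Trim trailing bases whose quality is below `threshold`.

    Returns (trimmed_seq, trimmed_quality), or None when the trimmed
    read is shorter than `min_len` and would be filtered out.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quality[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(trim_3prime("ACGTACGT", "IIIII###", threshold=20, min_len=4))
print(trim_3prime("ACGTACGT", "IIIII###", threshold=20, min_len=6))
```

Running the same reads through several `threshold` values and assembling each subset separately is one way to set up the kind of comparison the abstract describes.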
ABSTRACT
Miniature inverted-repeat transposable elements (MITEs) are abundant repeat elements in plant and animal genomes; however, there are few analyses of these elements in fungal genomes. Analysis of the draft genome sequence of the fungal endophyte Epichloë festucae revealed 13 MITE families that make up almost 1% of the E. festucae genome, and relics of putative autonomous parent elements were identified for three families. Sequence and DNA hybridization analyses suggest that at least some of the MITEs identified in the study were active early in the evolution of Epichloë but are not found in closely related genera. Analysis of MITE integration sites showed that these elements have a moderate integration site preference for 5' genic regions of the E. festucae genome and are particularly enriched near genes for secondary metabolism. Copies of the EFT-3m/Toru element appear to have mediated recombination events that may have abolished synthesis of two fungal alkaloids in different epichloae. This work provides insight into the potential impact of MITEs on epichloae evolution and provides a foundation for analysis in other fungal genomes.