ABSTRACT
Moringa oleifera is a plant well known for its nutritional value, drought resistance and medicinal properties. cDNA libraries from five different tissues (leaf, root, stem, seed and flower) of M. oleifera cultivar Bhagya were generated and sequenced. We developed a bioinformatics pipeline to assemble the transcriptome and, together with the previously published M. oleifera genome, predicted 17,148 gene models. A few candidate genes related to the biosynthesis of secondary metabolites and vitamins, and to ion transporters, were identified. Their expression was further confirmed by real-time quantitative PCR experiments for a few promising leads. Quantitative estimation of metabolites, as well as elemental analysis, was also carried out to support our observations. Enzymes involved in the biosynthesis of vitamins and of metabolites such as quercetin and kaempferol are highly expressed in leaves, flowers and seeds. Expression of iron transporters and calcium storage proteins was observed in roots and leaves. In general, leaves retain the highest amounts of the small molecules of interest.
Subjects
Gene Expression Profiling, Gene Expression Regulation, Plant/physiology, Moringa oleifera, Secondary Metabolism/physiology, Transcriptome/physiology, Gene Library, Moringa oleifera/genetics, Moringa oleifera/metabolism
ABSTRACT
BACKGROUND: Krishna Tulsi, a member of the Lamiaceae family, is a herb well known for its spiritual, religious and medicinal importance in India. The common name of this plant is 'Tulsi' (or 'Tulasi' or 'Thulasi'), and it is considered sacred by Hindus. In this report we present the draft genome of Ocimum tenuiflorum L. (subtype Krishna Tulsi). Paired-end and mate-pair libraries were generated and the whole genome was sequenced on the Illumina HiSeq 1000, resulting in an assembled genome of 374 Mb with a genome coverage of 61% (estimated genome size 612 Mb). We have also studied the transcriptomes (RNA-Seq) of two subtypes of O. tenuiflorum, Krishna and Rama Tulsi, and report the relative expression of genes in both varieties. RESULTS: The pathways leading to the production of medicinally important specialized metabolites have been studied in detail, in relation to similar pathways in Arabidopsis thaliana and other plants. Expression levels of anthocyanin biosynthesis-related genes in leaf samples of Krishna Tulsi were relatively high, explaining the purple colouration of Krishna Tulsi leaves. The expression of six important genes identified from the genome data was validated by qRT-PCR in different tissues of five different species, which showed high expression of ursolic acid-producing genes in young leaves of the Rama subtype. In addition, the presence of eugenol and ursolic acid, implicated as potential drugs for many diseases including cancer, was confirmed using mass spectrometry. CONCLUSIONS: The availability of the whole genome of O. tenuiflorum and our sequence analysis suggest that small amino acid changes at the functional sites of genes involved in metabolite synthesis pathways confer special medicinal properties on this herb.
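As a quick arithmetic check, the 61% genome coverage reported above matches the assembled length divided by the estimated genome size (374 / 612 ≈ 0.611). The sketch below assumes that this ratio is what "genome coverage" means here, as opposed to sequencing depth; the function name is illustrative only.

```python
# Assumption: "genome coverage" = assembled length / estimated genome size,
# not sequencing depth. Both sizes are in megabases.
def assembly_coverage(assembled_mb: float, estimated_mb: float) -> float:
    """Return the assembled fraction of the estimated genome, in percent."""
    if estimated_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return 100.0 * assembled_mb / estimated_mb


# Values quoted in the abstract: 374 Mb assembled, 612 Mb estimated.
print(f"{assembly_coverage(374, 612):.1f}%")  # 61.1%
```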
Subjects
Gene Expression Regulation, Plant, Genome, Plant, Ocimum/genetics, India, Ocimum/metabolism, Plant Leaves/metabolism, Plants, Medicinal/genetics, Plants, Medicinal/metabolism
ABSTRACT
Lantana camara L. is an invasive species of global concern. An ornamental plant originating from Central America, it has now spread across natural and human-dominated habitats in tropical and subtropical regions worldwide. Understanding the population and evolutionary genetics of this species could provide deeper insights into invasion biology and tools for more effective management. Such investigations require a reasonably good genome assembly. Although a transcriptome has been reported, constructing a genome assembly has been challenging because of the large genome size. Here we present the first draft genome assembly of Lantana camara L., which has an N50 of 62 kb, a genome completeness of 99.3% and a genome coverage of 74.3%. We hope that this assembly will help researchers study the colonization history and the genetic basis of adaptation and invasiveness of this plant, and design strategies to contain its invasive spread, allowing biodiversity to recover in several parts of the globe.
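The N50 value reported above is a standard assembly contiguity statistic (the transcript N50 values in the fenugreek and Moringa abstracts use the same definition). A minimal sketch, assuming the standard definition: sort the sequence lengths in descending order and return the first length at which the running total reaches half of all assembled bases. The function name and the toy lengths are illustrative, not taken from any of the assemblies.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig, scaffold or transcript lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy lengths (not real data): total = 300, and 100 + 80 = 180 >= 150.
print(n50([100, 80, 60, 40, 20]))  # 80
```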
ABSTRACT
Fenugreek (Trigonella foenum-graecum L.) is a self-pollinated leguminous crop belonging to the Fabaceae family. It is a multipurpose crop used as a herb, spice, vegetable and forage. It is a traditional medicinal plant in India, credited with several nutritional and medicinal properties, including antidiabetic and anticancer activity. We performed a combined transcriptome assembly from RNA sequencing data derived from leaf, stem and root tissues. A total of 209,831 transcripts were obtained from the assembly, which has 92% completeness and an N50 of 1382 bases. Secondary metabolites of medicinal value, such as trigonelline, diosgenin, 4-hydroxyisoleucine and quercetin, are distributed across several tissues; we report transcripts that bear sequence signatures of enzymes involved in the biosynthesis of these metabolites and that are highly expressed in leaves, stems and roots. Trigonelline, one of the antidiabetic alkaloids, and its biosynthetic enzyme are highly abundant in leaves. These findings are of value to the nutrition and pharmaceutical industries.
Subjects
Diosgenin, Plants, Medicinal, Trigonella, Diosgenin/metabolism, Hypoglycemic Agents/metabolism, Plant Extracts/metabolism, Plants, Medicinal/genetics, Plants, Medicinal/metabolism, Transcriptome, Trigonella/genetics, Trigonella/metabolism
ABSTRACT
This protocol describes a stepwise process to identify proteins of interest from a query proteome derived from NGS data. We implemented this protocol on the Moringa oleifera transcriptome to identify proteins involved in secondary metabolite and vitamin biosynthesis and in ion transport. This knowledge-driven protocol identifies proteins using an integrated approach involving sensitive sequence searches and evolutionary relationships. We make use of functionally important residues (FIRs) specific to the query protein family, identified from its homologous sequences and the literature. We screen protein hits based on their clustering with true homologues in reconstructed phylogenetic trees, complemented by FIR mapping. The protocol was validated on the protein hits through qRT-PCR and transcriptome quantification. Our protocol showed higher specificity than other methods, particularly in distinguishing cross-family hits. This protocol was effective in the transcriptome data analysis of M. oleifera described in Pasha et al.
• Knowledge-driven protocol to identify secondary metabolite-synthesizing proteins in a highly specific manner.
• Use of functionally important residues for screening true hits.
• Beneficial for metabolite pathway reconstruction from any NGS data (single species or metagenomics).
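The FIR-mapping step described above can be sketched as follows, assuming each hit has already been aligned pairwise to a reference family member whose FIR positions are known. Only the residue check is shown; the phylogenetic clustering step is not. The function names, the encoding of allowed residues as a string, and the `min_fraction` threshold are hypothetical and are not taken from the published protocol.

```python
def reference_to_column(aligned_ref: str) -> dict[int, int]:
    """Map 1-based ungapped reference positions to 0-based alignment columns."""
    mapping: dict[int, int] = {}
    pos = 0
    for col, residue in enumerate(aligned_ref):
        if residue != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def fir_conservation(aligned_ref: str, aligned_query: str,
                     firs: dict[int, str]) -> float:
    """Fraction of FIRs conserved in the query (hypothetical helper).

    firs maps a 1-based reference position to the allowed residue(s),
    e.g. {57: "H", 102: "DE"} accepts D or E at reference position 102.
    A gap in the query at a FIR column counts as not conserved.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    if not firs:
        raise ValueError("no FIRs given")
    columns = reference_to_column(aligned_ref)
    conserved = 0
    for pos, allowed in firs.items():
        col = columns.get(pos)
        if col is not None and aligned_query[col] in allowed:
            conserved += 1
    return conserved / len(firs)


def passes_fir_screen(aligned_ref: str, aligned_query: str,
                      firs: dict[int, str], min_fraction: float = 1.0) -> bool:
    """Hypothetical rule: keep a hit if it conserves >= min_fraction of FIRs."""
    return fir_conservation(aligned_ref, aligned_query, firs) >= min_fraction


print(fir_conservation("MK-HLD", "MRQHLE", {3: "H", 5: "D"}))  # 0.5
```

With the gapped reference `MK-HLD` and query `MRQHLE`, requiring H at reference position 3 and D at position 5 gives a conservation of 0.5, so the hit fails a strict screen; allowing D or E at position 5 lets it pass.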
ABSTRACT
In this paper, we present the data acquired during transcriptome analysis of the plant Moringa oleifera [1] from five different tissues (root, stem, leaf, flower and seed) by RNA sequencing. A total of 271 million reads were assembled, giving an N50 of 2094 bp. The combined transcriptome was assessed for transcript abundance across the five tissues. The protein-coding genes identified from the transcripts were annotated and used for orthology analysis. Further, enzymes involved in the biosynthesis of selected medicinally important secondary metabolites and vitamins, as well as ion transporters, were identified and their expression levels across tissues were examined. The RNA sequencing data have been deposited in the NCBI public repository under accession number PRJNA394193 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA394193).
ABSTRACT
Domains are the basic building blocks of proteins and can combine to give rise to different domain architectures. Annotating the domains in a sequence is the first step towards understanding its biological function. Since the number of folds is limited and evolutionarily related proteins have similar structures, function can be inferred through remote homology. Computational sequence searches for remote homologues were performed on the genomes of around 160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and revealed that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues, and these data were compiled for each superfamily and organism.
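The abstract does not say how overlapping domain assignments were resolved when computing domain architectures. A common choice, assumed here, is a greedy pass that keeps the most significant hit (lowest E-value), discards any hit overlapping one already kept, and then reports the surviving domains from N- to C-terminus. The class, the function names and the superfamily labels below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    name: str      # superfamily label (hypothetical labels in the example)
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    evalue: float


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def domain_architecture(hits: list[DomainHit]) -> str:
    """Greedy resolution (assumed): accept hits by increasing E-value,
    skip any that overlap an accepted hit, report N- to C-terminal."""
    accepted: list[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, kept) for kept in accepted):
            accepted.append(hit)
    accepted.sort(key=lambda h: h.start)
    return "-".join(h.name for h in accepted)


# Toy hits: SF_B overlaps the more significant SF_A and is dropped.
hits = [
    DomainHit("SF_A", 10, 120, 1e-20),
    DomainHit("SF_B", 100, 200, 1e-5),
    DomainHit("SF_C", 210, 300, 1e-10),
]
print(domain_architecture(hits))  # SF_A-SF_C
```

A greedy pass is simple and deterministic, but it is not guaranteed to maximise total coverage; other pipelines may instead pick the best-scoring set of non-overlapping hits.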
Subjects
Genome-Wide Association Study, Protein Domains, Proteins/chemistry, Proteins/genetics, Animals, Computational Biology/methods, Databases, Protein, Humans, Models, Molecular, Multigene Family, Protein Conformation, Protein Domains/genetics, Reproducibility of Results, Structure-Activity Relationship
ABSTRACT
The availability of the genome sequence of Mycobacterium tuberculosis H37Rv has encouraged the determination of large numbers of protein structures and the detailed definition of the biological information encoded therein; yet the functions of many proteins in M. tuberculosis remain unknown. The emergence of multidrug-resistant strains makes it a priority to exploit recent advances in homology recognition and structure prediction to re-analyse its gene products. Here we report the structural and functional characterization of gene products encoded in the M. tuberculosis genome using sensitive profile-based remote homology search and fold recognition algorithms, resulting in an enhanced annotation of the proteome in which 95% of the M. tuberculosis proteins were associated, wholly or partly, with structural or functional information. New information includes the association of 244 proteins with 205 domain families and, separately, new fold assignments for 64 proteins. Extending structural information across uncharacterized protein families represented in the M. tuberculosis proteome, by determining superfamily relationships between families of known and unknown structure, has enhanced knowledge of the proteome's structural content. In retrospect, such superfamily relationships have facilitated recognition of the probable structure and/or function of several uncharacterized protein families, eventually aiding recognition of probable functions for homologous proteins belonging to those families. No function could be identified for 183 gene products unique to mycobacteria; of these, 18 were determined to be M. tuberculosis specific. Such pathogen-specific proteins are speculated to harbour virulence factors required for pathogenesis. A re-annotated proteome of M. tuberculosis, with greater completeness of annotated proteins and domain-assigned regions, provides a valuable basis for experimental endeavours designed to better understand pathogenesis and to accelerate drug target discovery.