ABSTRACT
Red alder (Alnus rubra Bong.) is an ecologically significant, fast-growing commercial tree species native to the western coastal and riparian regions of North America, with highly desirable wood, pigment, and medicinal properties. We have sequenced the genome of a rapidly growing clone. The assembly is nearly complete, containing the full complement of expected genes. It supports our objectives of identifying and studying genes and pathways involved in nitrogen-fixing symbiosis and those related to the secondary metabolites that underlie red alder's many interesting defense, pigmentation, and wood quality traits. We established that this clone is most likely diploid and identified a set of SNPs that will be useful in future breeding and selection efforts, as well as in ongoing population studies. Our work adds a well-characterized genome to those available for the order Fagales and improves significantly upon the only other published alder genome sequence, that of Alnus glutinosa. It also initiated a detailed comparative analysis of members of the order Fagales and established some similarities with previous reports in this clade, suggesting a biased retention of certain gene functions in the vestiges of an ancient genome duplication compared with more recent tandem duplications.
Subjects
Alnus , Alnus/metabolism , Diploidy , Plant Breeding , Symbiosis , Trees
ABSTRACT
The HapMap (haplotype map) projects have produced valuable genetic resources for life science research communities, allowing researchers to investigate sequence variation and conduct genome-wide association study (GWAS) analyses. A typical HapMap project may require sequencing hundreds, even thousands, of individual lines or accessions within a species. Because of limitations in current sequencing technology, the genotypes of some accessions cannot be called clearly. Additionally, allelic heterozygosity can be very high in some lines, causing genetic and sometimes phenotypic segregation in their descendants. Such segregation degrades the original accession's specificity and makes it difficult to distinguish one accession from another. It is therefore vitally important to determine and validate HapMap accessions before conducting a GWAS analysis. However, to the best of our knowledge, no prior methodology or tool can readily distinguish or validate multiple accessions in a HapMap population. We devised a bioinformatics approach to distinguish multiple HapMap accessions using a minimal number of genetic markers. First, we assign each candidate marker a distinguishing score (DS), which measures its capability to distinguish accessions. The DS prioritizes markers with higher percentages of homozygous genotypes (allele combinations), as these can be stably passed on to offspring. Next, we apply the "set-partitioning" concept to select optimal markers by recursively partitioning accession sets. Finally, we build a hierarchical decision tree in which each path represents the selected markers and the homozygous genotypes that distinguish one accession from the others in the HapMap population.
Based on these algorithms, we developed a web tool named MAD-HiDTree (Multiple Accession Distinguishment-Hierarchical Decision Tree), designed to analyze a user-input genotype matrix and construct a hierarchical decision tree. Using genetic marker data extracted from the Medicago truncatula HapMap population, we successfully constructed hierarchical decision trees by which the original 262 M. truncatula accessions could be efficiently distinguished. PCR experiments verified the proposed method, confirming that MAD-HiDTree can be used to identify a specific accession. MAD-HiDTree was developed in C/C++ on Linux. Both the source code and test data are publicly available at https://bioinfo.noble.org/MAD-HiDTree/.
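A minimal Python sketch of the two ideas above, marker scoring followed by recursive set partitioning, is given below. It is not MAD-HiDTree's C/C++ implementation: the scoring formula (fraction of homozygous calls multiplied by how evenly the homozygous genotypes split the set), the genotype encoding ("AA" homozygous, "AG" heterozygous, "NN" missing), and all function names are assumptions made for illustration. In this simplified version only markers that are homozygous in every accession of the current subset may split it, so heterozygous or missing calls never decide a branch.

```python
from collections import defaultdict


def is_homozygous(call):
    """A diploid call such as "AA" is homozygous; "AG" (heterozygous) and "NN" (missing) are not."""
    return len(call) == 2 and call[0] == call[1] and call[0] != "N"


def distinguishing_score(matrix, marker, accessions):
    """Hypothetical DS: fraction of homozygous calls times how evenly the
    homozygous genotypes split the accessions (1 - share of the largest group)."""
    calls = [matrix[a][marker] for a in accessions]
    hom = [c for c in calls if is_homozygous(c)]
    if not hom:
        return 0.0
    group_sizes = defaultdict(int)
    for c in hom:
        group_sizes[c] += 1
    hom_fraction = len(hom) / len(calls)
    evenness = 1.0 - max(group_sizes.values()) / len(hom)
    return hom_fraction * evenness


def build_tree(matrix, markers, accessions):
    """Recursively partition accessions; a leaf is a sorted list of accessions
    that the remaining markers cannot separate."""
    if len(accessions) <= 1:
        return sorted(accessions)
    # Only markers homozygous in every accession of this subset may split it.
    usable = [m for m in markers
              if all(is_homozygous(matrix[a][m]) for a in accessions)]
    best = max(usable, key=lambda m: distinguishing_score(matrix, m, accessions),
               default=None)
    if best is None or distinguishing_score(matrix, best, accessions) == 0.0:
        return sorted(accessions)
    groups = defaultdict(list)
    for a in accessions:
        groups[matrix[a][best]].append(a)
    remaining = [m for m in markers if m != best]
    return {
        "marker": best,
        "children": {g: build_tree(matrix, remaining, members)
                     for g, members in sorted(groups.items())},
    }
```

Each root-to-leaf path then lists the markers and homozygous genotypes that single out one accession; ties between equally scored markers are broken by marker order in this sketch.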
ABSTRACT
We report an advanced web server, the plant-specific small noncoding RNA interference tool pssRNAit, which can be used to design a pool of small interfering RNAs (siRNAs) for highly effective, specific, and nontoxic gene silencing in plants. In developing this tool, we integrated plant transcript datasets, several rules governing gene silencing, and a series of computational models of the biological mechanism of the RNA interference (RNAi) pathway. The designed pool of siRNAs can be used to construct a long double-stranded RNA and expressed through virus-induced gene silencing (VIGS) or synthetic trans-acting siRNA vectors for gene silencing. We demonstrated the performance of pssRNAit by designing and expressing VIGS constructs to silence Phytoene desaturase (PDS) or a ribosomal protein-encoding gene, RPL10 (QM), in Nicotiana benthamiana. We analyzed the expression levels of predicted intended-target and off-target genes using reverse transcription quantitative PCR. We further conducted an RNA-sequencing-based transcriptome analysis to assess genome-wide off-target gene silencing triggered by pssRNAit-designed fragments targeting different homologous regions of the PDS gene. Our analyses confirmed the high accuracy of siRNA constructs designed using pssRNAit. The pssRNAit server, freely available at https://plantgrn.noble.org/pssRNAit/, supports the design of highly effective and specific RNAi, VIGS, or synthetic trans-acting siRNA constructs for high-throughput functional genomics and trait improvement in >160 plant species.
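The sketch below illustrates, under stated assumptions, the kind of filtering an siRNA design tool performs: it slides a 21-nt window along the target transcript, keeps windows whose GC fraction falls in an assumed 30-52% range, and discards any window that matches another transcript within an assumed mismatch budget. The thresholds, function names, and brute-force search are hypothetical; pssRNAit's actual rule set, its toxicity checks, and its assembly of the surviving siRNAs into a long dsRNA fragment are not reproduced here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_off_target(window, transcripts, target_id, max_mismatches):
    """True if any non-target transcript contains a near-identical window."""
    k = len(window)
    for tid, seq in transcripts.items():
        if tid == target_id:
            continue
        for i in range(len(seq) - k + 1):
            if hamming(window, seq[i:i + k]) <= max_mismatches:
                return True
    return False


def design_sirnas(transcripts, target_id, k=21, gc_range=(0.30, 0.52), max_mismatches=3):
    """Return (position, window) pairs on the target that pass both hypothetical filters."""
    target = transcripts[target_id]
    picks = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        if not gc_range[0] <= gc_fraction(window) <= gc_range[1]:
            continue  # outside the assumed GC window
        if has_off_target(window, transcripts, target_id, max_mismatches):
            continue  # would likely silence another transcript too
        picks.append((i, window))
    return picks
```

A production tool would replace the quadratic scan with an indexed aligner and add the efficacy and toxicity rules the abstract refers to.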
Subjects
Genome, Plant/genetics , Gene Expression Regulation, Plant , Oxidoreductases/genetics , Oxidoreductases/metabolism , RNA Interference/physiology , RNA, Small Interfering/genetics , Nicotiana/genetics
ABSTRACT
A growing number of small secreted peptides (SSPs) in plants are recognized as important regulatory molecules with roles in processes such as growth, development, reproduction, stress tolerance, and pathogen defense. Recent discoveries further implicate SSPs in regulating root nodule development, which is of particular significance for legumes. SSP-coding genes are frequently overlooked because genome annotation pipelines generally ignore small open reading frames, which are the ones most likely to encode SSPs. In addition, SSP-coding small open reading frames are often expressed at low levels or only under specific conditions, and are thus underrepresented in non-tissue-targeted or non-condition-optimized RNA-sequencing projects. We previously identified 4,439 SSP-encoding genes in the model legume Medicago truncatula. To support systematic characterization and annotation of these putative SSP-encoding genes, we developed the M. truncatula Small Secreted Peptide Database (MtSSPdb; https://mtsspdb.noble.org/). MtSSPdb currently hosts (1) a compendium of M. truncatula SSP candidates with putative function and family annotations; (2) a large-scale M. truncatula RNA-sequencing-based gene expression atlas integrated with various analytical tools, including differential expression, coexpression, and pathway enrichment analyses; (3) an online plant SSP prediction tool capable of analyzing protein sequences at the genome scale using the same protocol as for the identification of SSP genes; and (4) information about a library of synthetic peptides and root and nodule phenotyping data from synthetic peptide screens in planta. These datasets and analytical tools make MtSSPdb a unique and valuable resource for the plant research community. MtSSPdb also has the potential to become the most complete database of SSPs in plants.
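Because the abstract hinges on annotation pipelines skipping small ORFs, a minimal forward-strand small-ORF scanner may help make the idea concrete. The 10-230 amino acid window, the function name, and the restriction to ATG starts on one strand are assumptions for this sketch; it does not predict signal peptides, which a real SSP pipeline such as the one behind MtSSPdb would also need.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def small_orfs(seq, min_aa=10, max_aa=230):
    """Return sorted (start, end, aa_length) tuples for ATG-initiated ORFs on the
    forward strand whose length (including Met, excluding stop) is in range."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until an in-frame stop codon.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop codon downstream in this frame
                aa_len = (j - i) // 3
                if min_aa <= aa_len <= max_aa:
                    hits.append((i, j + 3, aa_len))
                i = j + 3  # resume scanning after this ORF's stop codon
                continue
            i += 3
    return sorted(hits)
```

Nested ATGs inside an ORF are skipped here, so each hit corresponds to the first start codon in its frame.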
Subjects
Medicago truncatula/genetics , Peptides/metabolism , Plant Proteins/metabolism , Databases, Factual , Genome, Plant/genetics , Peptides/genetics , Plant Proteins/genetics
ABSTRACT
Plant regulatory small RNAs (sRNAs), which include most microRNAs (miRNAs) and a subset of small interfering RNAs (siRNAs), such as phased siRNAs (phasiRNAs), play important roles in regulating gene expression. Although they are generated by genetically distinct biogenesis pathways, these regulatory sRNAs share the same mechanisms of post-transcriptional gene silencing and translational inhibition. psRNATarget was developed to identify plant sRNA targets by (i) analyzing complementary matching between the sRNA sequence and the target mRNA sequence using a predefined scoring schema and (ii) evaluating target site accessibility. This update improves its analytical performance with a new scoring schema that discovers miRNA-mRNA interactions at higher 'recall rates' without significantly increasing the total prediction output. Users can customize the scoring procedure to search for both canonical and non-canonical targets. The update also enables the transfer and analysis of 'big' data through (a) multi-threaded, chunked file uploading that can be paused and resumed, implemented with HTML5 APIs, and (b) the allocation of significantly more computing nodes to its back-end Linux cluster. The updated psRNATarget server has clear, user-friendly interfaces that present data concisely. psRNATarget is freely available at http://plantgrn.noble.org/psRNATarget/.
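To show what a customizable complementarity scoring schema looks like, here is a gap-free sketch. The weights (0.5 for a G:U wobble, 1 for other mismatches, and a 1.5x multiplier inside an assumed seed region at sRNA positions 2-13) are placeholders loosely modeled on common plant miRNA scoring conventions; they are not claimed to be psRNATarget's current values, and the function names are hypothetical. Lower totals mean better complementarity.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def perfect_site(srna):
    """Target site (5'->3') that is perfectly complementary to the sRNA."""
    return "".join(COMPLEMENT[b] for b in reversed(srna))


def pair_penalty(s, t):
    """Penalty for one sRNA base s paired with one target base t."""
    if (s, t) in WATSON_CRICK:
        return 0.0
    if (s, t) in WOBBLE:
        return 0.5
    return 1.0


def expectation(srna, site, seed=(2, 13), seed_weight=1.5):
    """Gap-free penalty score; srna and site are equal-length RNA strings, both 5'->3'."""
    if len(srna) != len(site):
        raise ValueError("sRNA and site must have equal length in this gap-free sketch")
    total = 0.0
    # sRNA position pos (1-based from its 5' end) pairs with the site read 3'->5'.
    for pos, (s, t) in enumerate(zip(srna, reversed(site)), start=1):
        penalty = pair_penalty(s, t)
        if seed[0] <= pos <= seed[1]:
            penalty *= seed_weight  # mismatches in the seed count more
        total += penalty
    return total
```

Exposing `seed` and `seed_weight` as parameters mirrors, in miniature, the idea of letting users relax the schema to search for non-canonical targets.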
Subjects
Computational Biology , Internet , MicroRNAs/genetics , RNA, Small Interfering/genetics , Software , Arabidopsis/genetics , Gene Expression Regulation, Plant , RNA, Plant , Sequence Analysis, RNA
ABSTRACT
The biological networks controlling plant signal transduction, metabolism, and gene regulation are composed not only of tens of thousands of genes, compounds, proteins, and RNAs but also of the complicated interactions and coordination among them. These networks play critical roles in many fundamental processes, such as plant growth, development, and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput datasets, and third-party databases. Many 'unknown' yet important interactions among genes still need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism, and gene regulatory networks. HRGRN uses Neo4j, a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds, and small RNAs that were either validated experimentally or predicted computationally. Interactions involved in a biological pathway are also marked with the associated pathway information to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs, and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures, and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/.
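Graph path search is what distinguishes this design from a SQL lookup: finding how two entities are connected requires traversing an unknown number of edges. Neo4j exposes such queries through Cypher (for example its `shortestPath` function); the stand-alone sketch below instead uses a plain breadth-first search over an in-memory edge list. The node names are made-up placeholders, interactions are treated as undirected, and none of this reflects HRGRN's actual schema or algorithms.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search for one shortest path between two nodes, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)  # treat each interaction as undirected
    if source not in graph or target not in graph:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

For example, with edges linking sRNA1 to geneA, geneA to geneB, and geneB to compound1, the function returns the chain `["sRNA1", "geneA", "geneB", "compound1"]`.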
Subjects
Arabidopsis/genetics , Databases, Genetic , Gene Regulatory Networks , Signal Transduction , Algorithms , Arabidopsis/physiology , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Internet , Software
ABSTRACT
RNA interference (RNAi) is one of the most popular and effective molecular technologies for knocking down the expression of an individual gene of interest in living organisms. Yet the technology still faces the major issue of nonspecific gene silencing, which can compromise gene functional characterization and the interpretation of phenotypes associated with individual gene knockdown. Designing an effective and target-specific small interfering RNA (siRNA) to induce RNAi is therefore the major challenge in RNAi-based gene silencing. A 'good' siRNA molecule must possess three key features: (a) the ability to specifically silence an individual gene of interest, (b) little or no effect on the expression of unintended siRNA targets (off-target genes), and (c) no cell toxicity. Although several siRNA design and analysis algorithms have been developed, only a few focus specifically on gene silencing in plants. Furthermore, current algorithms lack a comprehensive consideration of siRNA specificity, efficacy, and nontoxicity, mainly because they do not integrate all known rules that govern the different steps of the RNAi pathway. In this review, we first describe popular RNAi methods that have been used for gene silencing in plants and their serious limitations regarding gene-silencing potency and specificity. We then present novel, rational design strategies that combine computational and experimental approaches to induce potent, specific, and nontoxic gene silencing in plants.
Subjects
Algorithms , Plants/genetics , RNA Interference , Computational Biology/methods , Gene Knockdown Techniques/methods , Genes, Plant
ABSTRACT
The accurate construction and interpretation of gene association networks (GANs) is challenging but crucial to understanding gene function, interaction, and cellular behavior at the genome level. Most current state-of-the-art computational methods for genome-wide GAN reconstruction require high-performance computational resources. However, even high-performance computing cannot fully address the complexity of constructing GANs from very large-scale expression profile datasets, especially for organisms with medium-sized to large genomes, such as most plant species. Here, we present a new approach, GPLEXUS (http://plantgrn.noble.org/GPLEXUS/), which integrates a series of novel algorithms in a parallel-computing environment to construct and analyze genome-wide GANs. GPLEXUS adopts an ultra-fast estimation of pairwise mutual information that is similar in accuracy and sensitivity to the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and runs ~1000 times faster. GPLEXUS integrates the Markov Clustering Algorithm to effectively identify functional subnetworks. Furthermore, GPLEXUS includes a novel 'condition-removing' method that identifies, from very large-scale gene expression datasets spanning several experimental conditions, the major conditions in which each subnetwork operates, allowing users to annotate the various subnetworks with experiment-specific conditions. We demonstrate GPLEXUS's capabilities by constructing global GANs and analyzing subnetworks related to defense against biotic and abiotic stress, and to cell cycle, growth, and division in Arabidopsis thaliana.
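For readers unfamiliar with pairwise mutual information on expression data, the sketch below shows a plain plug-in estimator: each profile is discretized into equal-width bins and MI is computed from the joint bin frequencies (natural log). This is a baseline for illustration only, not GPLEXUS's ultra-fast estimator, whose details are not given here; the bin count and function names are assumptions.

```python
import math
from collections import Counter


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant profile: a single bin
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Plug-in MI estimate (in nats) between two expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of samples")
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (a, b), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[a] / n) * (py[b] / n)))
    return mi
```

Computing this for every gene pair is quadratic in the number of genes, which is why the abstract stresses estimator speed and parallel computing.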
Subjects
Algorithms , Gene Expression Profiling/methods , Gene Regulatory Networks , Arabidopsis/genetics , Arabidopsis/growth & development , Cell Division/genetics , Cell Growth Processes/genetics , Genomics/methods , Markov Chains
ABSTRACT
Along with the canonical miRNA, distinct miRNA-like sequences called sibling miRNAs (sib-miRs) are generated from the same pre-miRNA. Among them, isomeric sequences with slight variations at their termini relative to the canonical miRNA constitute a pool of isomeric sibling miRNAs (isomiRs). Despite the high prevalence of isomiRs in eukaryotes, their features and relevance remain elusive. In this study, we comprehensively analyzed mature and precursor miRNA (pre-miRNA) sequences from Arabidopsis to understand their features and regulatory targets, including the influence of isomiR terminal heterogeneity on target binding. Our analyses suggested a novel computational strategy that uses a miRNA together with its isomiRs to improve the accuracy of regulatory target prediction in Arabidopsis. A few targets are shared by several isomiRs, but this phenomenon was not typical. Gene Ontology (GO) enrichment analysis showed that commonly targeted mRNAs were enriched for certain GO terms. Moreover, comparison of these commonly targeted genes with validated targets from published data showed that validated targets are bound by most isomiRs, not only by the canonical miRNA. The biological role of isomiRs in target cleavage was further supported by degradome data. Incorporating this finding, we predicted potential target genes of several miRNAs and confirmed them by experimental assays. This study proposes a novel strategy to improve the accuracy of miRNA target prediction through the combined use of a miRNA and its isomiRs.
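The notion of terminal heterogeneity can be made concrete by describing each isomiR as a pair of offsets, how far its 5' and 3' ends sit from those of the canonical miRNA on the precursor. The sketch below assumes perfect matches, uses only the first match position in the precursor, and applies a hypothetical `max_shift` of 3 nt; it is an illustration, not the procedure used in the study.

```python
def classify_isomirs(pre_mirna, canonical, reads, max_shift=3):
    """Map each read to (five_prime_offset, three_prime_offset) relative to the
    canonical miRNA; (0, 0) is the canonical sequence itself."""
    start = pre_mirna.find(canonical)
    if start < 0:
        raise ValueError("canonical miRNA not found in precursor")
    end = start + len(canonical)
    result = {}
    for read in reads:
        pos = pre_mirna.find(read)
        if pos < 0:
            continue  # read does not map perfectly to this precursor
        five = pos - start               # negative: 5' end extended upstream
        three = (pos + len(read)) - end  # positive: 3' end extended downstream
        if abs(five) <= max_shift and abs(three) <= max_shift:
            result[read] = (five, three)
    return result
```

Each retained isomiR could then be submitted to target prediction alongside the canonical miRNA, which is the combined-use idea described above.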
Subjects
Arabidopsis/genetics , MicroRNAs/genetics , RNA Precursors/genetics , RNA, Plant/genetics , RNA, Small Untranslated/genetics , Arabidopsis Proteins/genetics , Base Sequence , Computational Biology/methods , Databases, Genetic/statistics & numerical data , Gene Expression Regulation, Plant , Gene Ontology , Molecular Sequence Data , RNA, Messenger/genetics , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Sequence Homology, Amino Acid
ABSTRACT
Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, accelerating the discovery of new knowledge about biological processes, pathways, and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs by mutual information (MI), is one of the most accurate methods currently available for inferring GNs. However, MI-based reverse-engineering methods achieve satisfactory performance only when the sample size exceeds one hundred, which limits their application to expression datasets with small sample sizes. We developed a high-performance web server, DeGNServer, to reverse-engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods suitable for analyzing datasets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at high speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
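The CLR step itself is simple to state: for each gene i, a similarity score s_ij is turned into a z-score against gene i's own background, z_i = max(0, (s_ij - mean_i) / sd_i), and the pair score is sqrt(z_i^2 + z_j^2). The sketch below applies that transform to an absolute Pearson correlation matrix, in the spirit of combining CLR with correlation measures for smaller datasets; the choice of |Pearson|, the exclusion of the diagonal from each background, and the function names are assumptions, not DeGNServer's code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def similarity_matrix(profiles):
    """Absolute Pearson correlation between every pair of genes (diagonal left at 0)."""
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = abs(pearson(profiles[i], profiles[j]))
    return sim


def clr(sim):
    """CLR transform: background z-scores per gene, combined as sqrt(z_i^2 + z_j^2)."""
    n = len(sim)
    background = []
    for i in range(n):
        row = [sim[i][j] for j in range(n) if j != i]
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        background.append((mean, sd))

    def z(i, value):
        mean, sd = background[i]
        return max(0.0, (value - mean) / sd) if sd > 0 else 0.0

    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = math.sqrt(z(i, sim[i][j]) ** 2 + z(j, sim[i][j]) ** 2)
    return out
```

The per-gene background is what lets CLR down-weight promiscuous genes that correlate moderately with everything.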
Subjects
Computational Biology , Gene Expression Profiling/methods , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Gene Expression , Genome , Internet , Reverse Genetics , Software
ABSTRACT
BACKGROUND: Plants regulate intrinsic gene expression through transcription factors (TFs), transcriptional regulators (TRs), chromatin regulators (CRs), and the basal transcription machinery. An understanding of plant gene regulatory mechanisms at a systems level requires the identification of these regulatory elements on a genomic scale. RESULTS: Here, we present PlantTFcat, a high-performance web-based analysis tool that is designed to identify and categorize plant TF/TR/CR genes from genome-scale protein and nucleic acid sequences by systematically analyzing InterProScan domain patterns in protein sequences. The comprehensive prediction logic included in PlantTFcat is based on relationships between gene families and conserved domains from 108 published plant TF/TR/CR families. This logic effectively distinguishes TF/TR/CR families that share conserved domains. Our systematic performance evaluations indicate that PlantTFcat annotates known TF/TR/CR families with high coverage and sensitivity. CONCLUSIONS: PlantTFcat provides an analysis tool to identify and categorize plant TF/TR/CR genes on a genomic scale. PlantTFcat is freely available to the public at http://plantgrn.noble.org/PlantTFcat/.
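Domain-pattern logic of this kind can be expressed as "required" and "forbidden" domain sets per family, which is how families sharing a domain get told apart. The four rules below are illustrative only (for instance, RAV proteins carry both AP2 and B3 domains, so a protein with both should not be called AP2/ERF or B3); the domain labels are simplified names rather than literal InterPro accessions, and neither the rules nor the function reproduce PlantTFcat's 108-family logic.

```python
# Hypothetical rules: family -> (required domains, forbidden domains).
RULES = {
    "RAV": ({"AP2", "B3"}, set()),
    "AP2/ERF": ({"AP2"}, {"B3"}),
    "ARF": ({"B3", "Auxin_resp"}, set()),
    "B3": ({"B3"}, {"AP2", "Auxin_resp"}),
}


def classify(domains, rules=RULES):
    """Return the sorted families whose required domains are all present
    and whose forbidden domains are all absent."""
    found = set(domains)
    return sorted(family for family, (required, forbidden) in rules.items()
                  if required <= found and not (forbidden & found))
```

Forbidden sets are what keep a multi-domain protein from being double-counted in a broader family that shares one of its domains.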
Subjects
Gene Expression Regulation, Plant , Transcription Factors/genetics , Arabidopsis/genetics , Chromatin/genetics , Computational Biology/methods , Forecasting , Genome, Plant/genetics , Internet , Medicago truncatula/genetics , Plant Proteins/genetics , Predictive Value of Tests , Software , Zea mays/genetics
ABSTRACT
Legumes play a vital role in maintaining the nitrogen cycle of the biosphere. They perform symbiotic nitrogen fixation through endosymbiotic relationships with bacteria in root nodules. However, this and other characteristics of legumes, including mycorrhization, compound leaf development, and profuse secondary metabolism, are absent in the typical model plant Arabidopsis thaliana. We present LegumeIP (http://plantgrn.noble.org/LegumeIP/), an integrative database for comparative genomics and transcriptomics of model legumes, for studying gene function and genome evolution in legumes. LegumeIP compiles gene and gene family information, syntenic and phylogenetic context, and tissue-specific transcriptomic profiles. The database holds the genomic sequences of three model legumes, Medicago truncatula, Glycine max, and Lotus japonicus, plus two reference plant species, A. thaliana and Populus trichocarpa, with annotations based on UniProt, InterProScan, Gene Ontology, and the Kyoto Encyclopedia of Genes and Genomes databases. LegumeIP also contains large-scale microarray and RNA-Seq-based gene expression data. The database supports systematic synteny analysis across M. truncatula, G. max, L. japonicus, and A. thaliana, as well as construction and phylogenetic analysis of gene families across the five hosted species. Finally, LegumeIP provides comprehensive search and visualization tools that enable flexible queries based on gene annotation, gene family, synteny, and relative gene expression.
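Synteny analysis typically starts from "anchor" pairs of homologous genes and looks for runs in which gene order is conserved. The greedy chaining below is a deliberately simplified illustration for one chromosome pair: anchors are (gene index in genome A, gene index in genome B), a block grows while both indices increase by at most `max_gap`, and blocks shorter than `min_anchors` are dropped. The thresholds and the function name are assumptions; inverted blocks are ignored, and this is not the method LegumeIP uses.

```python
def collinear_blocks(anchors, max_gap=5, min_anchors=3):
    """Greedily chain anchor pairs (index_in_A, index_in_B) into forward collinear blocks."""
    blocks, current = [], []
    for a, b in sorted(anchors):
        if current:
            prev_a, prev_b = current[-1]
            if 0 < a - prev_a <= max_gap and 0 < b - prev_b <= max_gap:
                current.append((a, b))  # order conserved within the gap limit
                continue
            if len(current) >= min_anchors:
                blocks.append(current)  # close a sufficiently long block
        current = [(a, b)]
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks
```

Dedicated synteny tools score chains with dynamic programming and handle both orientations; the greedy version only conveys the underlying idea of conserved gene order.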
Subjects
Databases, Genetic , Fabaceae/genetics , Gene Expression Profiling , Genome, Plant , Cytochrome P-450 Enzyme System/genetics , Fabaceae/metabolism , Fabaceae/microbiology , Genes, Plant , Genomics , Glycosyltransferases/genetics , Lotus/genetics , Medicago truncatula/genetics , Models, Genetic , Software , Glycine max/genetics , Symbiosis , Systems Integration
ABSTRACT
Plant endogenous non-coding small RNAs (20-24 nt), including microRNAs (miRNAs) and a subset of small interfering RNAs, the trans-acting siRNAs (ta-siRNAs), play important roles in gene regulatory networks (GRNs). For example, many transcription factors and development-related genes have been reported as targets of these regulatory small RNAs. Although a number of miRNA target prediction algorithms and programs have been developed, most were designed for animal miRNAs, which differ significantly from plant miRNAs in the target recognition process. These differences demand the development of separate target analysis tools for plant miRNAs (and ta-siRNAs). We present psRNATarget, a plant small RNA target analysis server, which features two important analysis functions: (i) reverse complementary matching between the small RNA and the target transcript using a proven scoring schema, and (ii) target-site accessibility evaluation by calculating the unpaired energy (UPE) required to 'open' the secondary structure around the small RNA's target site on the mRNA. psRNATarget incorporates recent discoveries in plant miRNA target recognition; for example, it distinguishes translational from post-transcriptional inhibition, and it reports the number of small RNA/target site pairs that may affect small RNA binding activity to the target transcript. The psRNATarget server is designed for high-throughput analysis of next-generation data with an efficient distributed computing back-end pipeline that runs on a Linux cluster. The server front-end integrates three simplified, user-friendly interfaces that accept user-submitted or preloaded small RNA and transcript sequences, and it outputs a comprehensive list of small RNA/target pairs along with online tools for batch downloading, keyword searching, and results sorting. The psRNATarget server is freely available at http://plantgrn.noble.org/psRNATarget/.
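As a hedged illustration of the translational-versus-cleavage distinction, the sketch below applies a commonly cited heuristic, assumed here rather than taken from psRNATarget's code: a non-Watson-Crick pair at sRNA positions 10 or 11 (counted from the 5' end), the region opposite the expected cleavage site, is taken to favor translational inhibition, while full pairing there is taken to permit cleavage. The function name and the `central` parameter are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def inhibition_type(srna, site, central=(10, 11)):
    """Return "cleavage" or "translation" for an equal-length sRNA/site pair,
    both given 5'->3' as RNA strings."""
    if len(srna) != len(site):
        raise ValueError("sRNA and site must have equal length")
    paired_target = site[::-1]  # target base opposite each sRNA position
    for pos in central:
        s, t = srna[pos - 1], paired_target[pos - 1]
        if COMPLEMENT[s] != t:  # any non-Watson-Crick pair, G:U included
            return "translation"
    return "cleavage"
```

Real predictions weigh this signal together with overall complementarity and site accessibility rather than using it alone.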
Subjects
RNA, Plant/chemistry , RNA, Small Untranslated/chemistry , Software , Gene Expression Regulation, Plant , MicroRNAs/chemistry , MicroRNAs/metabolism , RNA, Messenger/chemistry , RNA, Messenger/metabolism , RNA, Plant/metabolism , RNA, Small Interfering/chemistry , RNA, Small Interfering/metabolism , RNA, Small Untranslated/metabolism , Sequence Analysis, RNA
ABSTRACT
The plant microRNA (miRNA) target recognition mechanism was once thought to be simple and straightforward, i.e. through perfect reverse complementary matching; therefore, very few target prediction tools and algorithms were developed for plants compared with those for animals. However, the discovery of transcriptional suppression and the more recent observation of widespread translational regulation by miRNAs highlight the enormous diversity and complexity of gene regulation in plant systems. This, in turn, necessitates advanced computational tools and algorithms for comprehensive miRNA target analysis to help understand miRNA regulatory mechanisms. Yet advanced, comprehensive plant miRNA target analysis tools are still lacking despite their importance, especially tools able to predict translational inhibition and to integrate transcriptome data. This review focuses on recent progress in understanding the plant miRNA target recognition mechanism, the principles of target prediction based on this understanding, a comparison of current prediction tools and algorithms for plant miRNA target analysis, and the outlook for future development of plant miRNA target tools and algorithms.
Subjects
Computational Biology/methods , Gene Expression Regulation, Plant , MicroRNAs/metabolism , Plants/genetics , RNA, Plant/metabolism , Algorithms , Base Sequence , Molecular Sequence Data
ABSTRACT
Eukaryotic messenger RNA (mRNA) contains not only protein-coding regions but also a plethora of functional cis-elements that influence or coordinate a number of regulatory aspects of gene expression, such as mRNA stability, splicing forms, and translation rates. Understanding the rules that apply to each of these element types (e.g., whether the element is defined by primary or higher-order structure) allows for the discovery of novel mechanisms of gene expression as well as the design of transcripts with controlled expression. Bioinformatics plays a major role in creating databases and finding non-evident patterns governing each type of eukaryotic functional element. Much of what we currently know about mRNA regulatory elements in eukaryotes is derived from microorganism and animal systems, with the particularities of plant systems lagging behind. In this review, we provide a general introduction to the most well-known eukaryotic mRNA regulatory motifs (splicing regulatory elements, internal ribosome entry sites, iron-responsive elements, AU-rich elements, zipcodes, and polyadenylation signals) and describe available bioinformatics resources (databases and analysis tools) to analyze eukaryotic transcripts in search of functional elements, focusing on recent trends in bioinformatics methods and tool development. We also discuss future directions in the development of better computational tools based upon current knowledge of these functional elements. Improved computational tools would advance our understanding of the processes underlying gene regulation. We encourage plant bioinformaticians to turn their attention to this subject to help identify novel mechanisms of gene expression regulation using RNA motifs that have potentially evolved or diverged in plant species.
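The simplest class of motif search mentioned above is an exact sequence scan. The sketch below reports every (including overlapping) occurrence of two well-known primary-sequence motifs: the AUUUA pentamer at the core of AU-rich elements and the canonical AAUAAA polyadenylation signal. The motif dictionary and function name are assumptions; structure-defined elements such as IRESs or iron-responsive elements need folding-based tools, and plant polyadenylation signals are more degenerate, so a real plant scan would need more permissive patterns.

```python
import re

MOTIFS = {
    "ARE_pentamer": "AUUUA",    # core pentamer of AU-rich elements
    "PAS_canonical": "AAUAAA",  # canonical polyadenylation signal
}


def scan_motifs(utr, motifs=MOTIFS):
    """Return {motif_name: [0-based start positions]} for an RNA or DNA sequence."""
    utr = utr.upper().replace("T", "U")  # accept DNA input as well
    hits = {}
    for name, motif in motifs.items():
        # A zero-width lookahead reports overlapping occurrences too.
        hits[name] = [m.start() for m in re.finditer(f"(?={motif})", utr)]
    return hits
```

For the 3' UTR fragment `GAUUUAUUUAGCAAUAAAGC`, the scan finds two overlapping AUUUA pentamers at positions 1 and 5 and one AAUAAA signal at position 12.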
ABSTRACT
Plant secretory trichomes have a unique capacity for chemical synthesis and secretion and have been described as biofactories for the production of natural products. However, until recently, most trichome-specific metabolic pathways and genes involved in various trichome developmental stages have remained unknown. Furthermore, only a very limited amount of plant trichome genomics information is available in scattered databases. We present an integrated "omics" database, TrichOME, to facilitate the study of plant trichomes. The database hosts a large volume of functional omics data, including expressed sequence tag/unigene sequences, microarray hybridizations from both trichome and control tissues, mass spectrometry-based trichome metabolite profiles, and trichome-related genes curated from published literature. The expressed sequence tag/unigene sequences have been annotated based upon sequence similarity with popular databases (e.g. Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and Transporter Classification Database). The unigenes, metabolites, curated genes, and probe sets have been mapped against each other to enable comparative analysis. The database also integrates bioinformatics tools with a focus on the mining of trichome-specific genes in unigenes and microarray-based gene expression profiles. TrichOME is a valuable and unique resource for plant trichome research, since the genes and metabolites expressed in trichomes are often underrepresented in regular non-tissue-targeted cDNA libraries. TrichOME is freely available at http://www.planttrichome.org/.
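Mining trichome-specific genes from EST data usually starts from how often each unigene is sampled in trichome versus control libraries. The sketch below flags unigenes whose pseudocount-smoothed frequency is at least an assumed 5-fold higher in trichome libraries. The threshold, pseudocount, and function name are hypothetical, and TrichOME's own statistics are not described in this abstract; a real analysis would add a significance test such as Fisher's exact test.

```python
def trichome_enriched(counts, trichome_total, control_total, min_ratio=5.0, pseudocount=1):
    """counts maps unigene -> (trichome EST count, control EST count);
    returns {unigene: frequency ratio} for unigenes meeting min_ratio."""
    enriched = {}
    for gene, (t, c) in counts.items():
        t_freq = (t + pseudocount) / trichome_total  # smoothed trichome frequency
        c_freq = (c + pseudocount) / control_total   # smoothed control frequency
        ratio = t_freq / c_freq
        if ratio >= min_ratio:
            enriched[gene] = ratio
    return enriched
```

The pseudocount keeps unigenes absent from the control libraries from producing infinite ratios, which matters because, as noted above, trichome transcripts are often missing from non-tissue-targeted libraries.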
Subjects
Databases, Factual , Plants/metabolism , Plants/ultrastructure , Expressed Sequence Tags , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Stems/ultrastructure , Protein Array Analysis
ABSTRACT
BACKGROUND: Membrane transporters play crucial roles in living cells. Experimental characterization of transporters is costly and time-consuming. Current computational methods for transporter characterization still require extensive curation efforts, especially for eukaryotic organisms. We developed a novel genome-scale transporter prediction and characterization system called TransportTP that combines homology-based and machine learning methods in a two-phase classification approach. First, traditional homology methods were employed to predict novel transporters based on sequence similarity to known classified proteins in the Transporter Classification Database (TCDB). Second, machine learning methods were used to integrate a variety of features to refine the initial predictions. A set of rules based on transporter features was developed by machine learning using well-curated proteomes as guides. RESULTS: In a cross-validation using the yeast proteome for training and the proteomes of ten other organisms for testing, TransportTP achieved equal recall and precision of 81.8%, based on TransportDB, a manually annotated transporter database. In an independent test using the Arabidopsis proteome for training and four recently sequenced plant proteomes for testing, it achieved a recall of 74.6% and a precision of 73.4%, according to our manual curation. CONCLUSIONS: TransportTP is the most effective tool for eukaryotic transporter characterization to date.
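The two reported metrics are precision = TP / (number predicted) and recall = TP / (number in the reference set), where TP counts predicted transporters that are also in the reference. They coincide exactly when the predicted set and the reference set have the same size, which is one way an equal recall and precision can arise. The toy sets in the test below are invented for illustration (9 shared items out of 11 predicted and 11 reference give 9/11, about 0.818) and are not TransportTP data.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```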
Subjects
Computational Biology/methods , Membrane Transport Proteins/classification , Software , Artificial Intelligence , Databases, Protein , Membrane Transport Proteins/chemistry , Sequence Analysis, Protein
ABSTRACT
In plants, short RNAs, including approximately 21-nt microRNAs (miRNAs) and 21-nt trans-acting siRNAs (ta-siRNAs), compose a 'miRNA --> ta-siRNA --> target gene' cascade pathway that regulates gene expression at the post-transcriptional level. In this cascade, biogenesis of ta-siRNA clusters requires 21-nt intervals (i.e. phasing) and a miRNA (phase-initiator) cleavage site on the TAS transcript. Here, we report a novel web server, pssRNAMiner, developed to identify both clusters of phased small RNAs and their potential phase-initiators. To detect phased small RNA clusters, pssRNAMiner maps input small RNAs against user-specified transcript/genomic sequences and then identifies phased clusters by evaluating hypergeometric P-values. To identify potential phase-initiators, pssRNAMiner aligns input phase-initiators with TAS candidate transcripts using the Smith-Waterman algorithm. Potential cleavage sites on TAS candidates are further identified from the complementary regions by weighting the alignment expectation and its distance to the detected phased small RNA clusters. The pssRNAMiner web server is freely available at http://bioinfo3.noble.org/pssRNAMiner/.
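The hypergeometric phasing test can be sketched as follows. In a window of `cycles` x 21 positions on one strand, exactly `cycles` positions are in phase with a chosen register. If m distinct small RNA 5' ends fall in the window and x of them are in phase, the P-value is P(X >= x) = sum over i from x to min(m, cycles) of C(cycles, i) C(n - cycles, m - i) / C(n, m), with n = 21 x cycles. The 10-cycle window, the single-strand simplification, and the function name are assumptions for illustration; pssRNAMiner's exact window definition is not given in the abstract.

```python
from math import comb


def phasing_pvalue(positions, register, cycles=10, period=21):
    """Hypergeometric P(X >= x) that x of the distinct sRNA 5' ends inside the
    window [register, register + cycles * period) are in phase by chance."""
    n = cycles * period  # all candidate positions in the window
    k = cycles           # positions in phase with the register
    window = {p for p in positions if register <= p < register + n}
    m = len(window)
    x = sum(1 for p in window if (p - register) % period == 0)
    tail = sum(comb(k, i) * comb(n - k, m - i) for i in range(x, min(m, k) + 1))
    return tail / comb(n, m)
```

Four small RNAs starting exactly 21 nt apart give a P-value of about 2.7e-6, whereas four adjacent, unphased starts give 1.0.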