ABSTRACT
Agriculture faces increasing demand for yield, higher plant-derived protein content and diversity while facing pressure to achieve sustainability. Although the genomes of many of the important crops have been sequenced, the subcellular locations of most of the encoded proteins remain unknown or are only predicted. Protein subcellular location is crucial in determining protein function and accumulation patterns in plants, and is critical for targeted improvements in yield and resilience. Integrating location data from over 800 studies for 12 major crop species into the cropPAL2020 data collection showed that while >80% of proteins in most species are not localised by experimental data, combining species data or integrating predictions can help bridge gaps at similar accuracy. The collation and integration of over 61 505 experimental localisations and more than 6 million predictions showed that the relative sizes of the protein catalogues located in different subcellular compartments are comparable between crops and Arabidopsis. A comprehensive cross-species comparison showed that between 50% and 80% of the subcellulomes are conserved across species and that conservation only depends to some degree on the phylogenetic relationship of the species. Protein subcellular locations in major biosynthesis pathways are more often conserved than in metabolic pathways. Underlying this conservation is a clear potential for subcellular diversity in protein location between species by means of gene duplication and alternative splicing. Our cropPAL data set and search platform (https://crop-pal.org) provide a comprehensive subcellular proteomics resource to drive compartmentation-based approaches for improving yield, protein composition and resilience in future crop varieties.
Subject(s)
Agricultural Crops/metabolism, Protein Databases, Plant Proteins/metabolism, Cell Compartmentation, Agricultural Crops/cytology, Plant Breeding, Plant Cells/metabolism, Species Specificity
ABSTRACT
In eukaryotic organisms, subcellular protein location is critical in defining protein function and understanding sub-functionalization of gene families. Some proteins have defined locations, whereas others have low specificity targeting and complex accumulation patterns. There is no single approach that can be considered entirely adequate for defining the in vivo location of all proteins. By combining evidence from different approaches, the strengths and weaknesses of different technologies can be estimated, and a location consensus can be built. The Subcellular Location of Proteins in Arabidopsis database ( http://suba.live/ ) combines experimental data sets that have been reported in the literature and is analyzing these data to provide useful tools for biologists to interpret their own data. Foremost among these tools is a consensus classifier (SUBAcon) that computes a proposed location for all proteins based on balancing the experimental evidence and predictions. Further tools analyze sets of proteins to define the abundance of cellular structures. Extending these types of resources to plant crop species has been complex due to polyploidy, gene family expansion and contraction, and the movement of pathways and processes within cells across the plant kingdom. The Crop Proteins of Annotated Location database ( http://crop-pal.org/ ) has developed a range of subcellular location resources including a species-specific voting consensus for 12 plant crop species that offers collated evidence and filters for current crop proteomes akin to SUBA. Comprehensive cross-species comparison of these data shows that the sub-cellular proteomes (subcellulomes) depend only to some degree on phylogenetic relationship and are more conserved in major biosynthesis than in metabolic pathways. Together SUBA and cropPAL created reference subcellulomes for plants as well as species-specific subcellulomes for cross-species data mining. 
These data collections are increasingly used by the research community to provide a subcellular protein location layer, inform models of compartmented cell function and protein-protein interaction networks, guide future molecular crop breeding strategies, or simply answer a specific question: where is my protein of interest inside the cell?
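The voting consensus mentioned in this abstract can be pictured with a minimal sketch. The abstract does not describe the actual cropPAL voting rules, so the function name, the equal weighting of all calls, and the example data below are assumptions for illustration only.

```python
from collections import Counter


def voting_consensus(calls):
    """Return the location(s) with the most votes and their vote share.

    Assumption: every experimental or predicted call counts as one equal
    vote; the real cropPAL consensus may weight sources differently.
    """
    if not calls:
        return [], 0.0
    counts = Counter(calls)
    top = max(counts.values())
    # Ties are reported as several winners, sorted for stable output.
    winners = sorted(loc for loc, n in counts.items() if n == top)
    return winners, top / len(calls)


# Hypothetical location calls for one protein.
calls = ["plastid", "plastid", "mitochondrion", "plastid", "cytosol"]
winners, share = voting_consensus(calls)
print(winners, share)
```

Returning every tied location, rather than picking one arbitrarily, keeps ambiguous cases visible to the user.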
Subject(s)
Arabidopsis, Arabidopsis/genetics, Protein Databases, Humans, Phylogeny, Proteomics, Species Specificity, Subcellular Fractions
ABSTRACT
We applied 15N labeling approaches to leaves of the Arabidopsis thaliana rosette to characterize their protein degradation rate and understand its determinants. The progressive labeling of new peptides with 15N and measuring the decrease in the abundance of >60,000 existing peptides over time allowed us to define the degradation rate of 1228 proteins in vivo. We show that Arabidopsis protein half-lives vary from several hours to several months based on the exponential constant of the decay rate for each protein. This rate was calculated from the relative isotope abundance of each peptide and the fold change in protein abundance during growth. Protein complex membership and specific protein domains were found to be strong predictors of degradation rate, while N-end amino acid, hydrophobicity, or aggregation propensity of proteins were not. We discovered rapidly degrading subunits in a variety of protein complexes in plastids and identified the set of plant proteins whose degradation rate changed in different leaves of the rosette and correlated with leaf growth rate. From this information, we have calculated the protein turnover energy costs in different leaves and their key determinants within the proteome.
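The relationship between an exponential decay constant and a half-life can be sketched as follows. The abstract states only that the rate was calculated from the relative isotope abundance and the fold change in protein abundance. The specific way the two are combined below, and all function and parameter names, are assumptions and not necessarily the authors' exact calculation.

```python
import math


def degradation_rate(labeled_fraction, fold_change, days):
    """First-order degradation constant k_d (per day).

    Assumption: the unlabeled (old) share of the protein pool, scaled by
    the fold change in total protein abundance, equals the fraction of the
    starting protein that is still present after `days`.
    """
    remaining = fold_change * (1.0 - labeled_fraction)
    if not 0.0 < remaining < 1.0:
        raise ValueError("remaining fraction must lie strictly between 0 and 1")
    return -math.log(remaining) / days


def half_life(k_d):
    """Half-life (days) of a protein decaying exponentially with constant k_d."""
    return math.log(2) / k_d


# Hypothetical protein: 50% newly labeled after 1 day, no net growth.
k = degradation_rate(labeled_fraction=0.5, fold_change=1.0, days=1.0)
print(k, half_life(k))
```

With these inputs half of the original protein remains after one day, so the half-life comes out at one day.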
Subject(s)
Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Arabidopsis/growth & development, Nitrogen Isotopes, Plant Leaves/growth & development, Plant Leaves/metabolism, Proteolysis, Proteome
ABSTRACT
The SUBcellular location database for Arabidopsis proteins (SUBA4, http://suba.live) is a comprehensive collection of manually curated published data sets of large-scale subcellular proteomics, fluorescent protein visualization, protein-protein interaction (PPI) as well as subcellular targeting calls from 22 prediction programs. SUBA4 contains an additional 35 568 localizations totalling more than 60 000 experimental protein location claims as well as 37 new suborganellar localization categories. The experimental PPI data has been expanded to 26 327 PPI pairs including 856 PPI localizations from experimental fluorescent visualizations. The new SUBA4 user interface enables users to choose quickly from the filter categories: 'subcellular location', 'protein properties', 'protein-protein interaction' and 'affiliations' to build complex queries. This allows substantial expansion of search parameters into 80 annotation types comprising 1 150 204 new annotations to study metadata associated with subcellular localization. The 'BLAST' tab contains a sequence alignment tool to enable a sequence fragment from any species to find the closest match in Arabidopsis and retrieve data on subcellular location. Using the location consensus SUBAcon, the SUBA4 toolbox delivers three novel data services allowing interactive analysis of user data to provide relative compartmental protein abundances and proximity relationship analysis of PPI and coexpression partners from a submitted list of Arabidopsis gene identifiers.
Subject(s)
Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Computational Biology/methods, Protein Databases, Protein Interaction Mapping, Protein Interaction Maps, Intracellular Space/metabolism, Molecular Sequence Annotation, Protein Transport, Proteomics, Software, Web Browser
ABSTRACT
Measuring changes in protein or organelle abundance in the cell is an essential, but challenging aspect of cell biology. Frequently-used methods for determining organelle abundance typically rely on detection of a very few marker proteins, so are unsatisfactory. In silico estimates of protein abundances from publicly available protein spectra can provide useful standard abundance values but contain only data from tissue proteomes, and are not coupled to organelle localization data. A new protein abundance score, the normalized protein abundance scale (NPAS), expands on the number of scored proteins and the scoring accuracy of lower-abundance proteins in Arabidopsis. NPAS was combined with subcellular protein localization data, facilitating quantitative estimations of organelle abundance during routine experimental procedures. A suite of targeted proteomics markers for subcellular compartment markers was developed, enabling independent verification of in silico estimates for relative organelle abundance. Estimation of relative organelle abundance was found to be reproducible and consistent over a range of tissues and growth conditions. In silico abundance estimations and localization data have been combined into an online tool, multiple marker abundance profiling, available in the SUBA4 toolbox (http://suba.live).
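Combining per-protein abundance scores with localization calls to estimate relative organelle abundance might look roughly like the sketch below. It does not reproduce the NPAS scoring or the SUBA4 toolbox implementation. The identifiers, scores and the simple summation are hypothetical.

```python
from collections import defaultdict


def relative_compartment_abundance(proteins, abundance, location):
    """Share of total abundance score contributed by each compartment.

    Assumption: each protein has one abundance score and one location call;
    proteins missing either value are skipped.
    """
    totals = defaultdict(float)
    for protein in proteins:
        if protein in abundance and protein in location:
            totals[location[protein]] += abundance[protein]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {comp: value / grand_total for comp, value in totals.items()}


# Hypothetical protein list with made-up scores and locations.
abundance = {"P1": 3.0, "P2": 1.0, "P3": 4.0}
location = {"P1": "plastid", "P2": "plastid", "P3": "mitochondrion"}
print(relative_compartment_abundance(["P1", "P2", "P3", "P4"], abundance, location))
```

Summing over many proteins per compartment, instead of relying on one or two markers, is the idea the abstract motivates.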
Subject(s)
Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Proteome, Proteomics, Biomarkers/metabolism, Organelles/metabolism, Protein Transport
ABSTRACT
The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30-40 amino acid motifs that form an extended binding surface capable of sequence-specific recognition of RNA strands. Almost all of them are post-translationally targeted to plastids and mitochondria, where they play important roles in post-transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide-binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C-terminus of many RNA-editing factors. We show that the super-helical RNA-binding surface of RNA-editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre-requisite for accurate large-scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.
Subject(s)
Embryophyta/genetics, Structural Models, Plant Proteins/chemistry, RNA Editing/genetics, Amino Acid Motifs, Amino Acid Sequence, Embryophyta/metabolism, Mitochondria/metabolism, Molecular Models, Molecular Sequence Annotation, Plant Proteins/genetics, Plant Proteins/metabolism, Plastids/metabolism, Protein Transport, RNA Recognition Motif Proteins/chemistry, RNA Recognition Motif Proteins/genetics, RNA Recognition Motif Proteins/metabolism, Plant RNA/genetics, Sequence Alignment
ABSTRACT
Barley, wheat, rice and maize provide the bulk of human nutrition and have extensive industrial use as agricultural products. The genomes of these crops each contain >40,000 genes encoding proteins; however, the major genome databases for these species lack annotation information of protein subcellular location for >80% of these gene products. We address this gap by constructing a compendium of crop protein subcellular locations called crop Proteins with Annotated Locations (cropPAL). Subcellular location is most commonly determined by fluorescent protein tagging of live cells or mass spectrometry detection in subcellular purifications, but can also be predicted from amino acid sequence or protein expression patterns. The cropPAL database collates 556 previously published studies from >300 research institutes in >30 countries, and compiles eight pre-computed subcellular predictions for all Hordeum vulgare, Triticum aestivum, Oryza sativa and Zea mays protein sequences. The data collection, including metadata for proteins and published studies, can be accessed through a search portal at http://crop-PAL.org. The subcellular localization information housed in cropPAL helps to depict plant cells as compartmentalized protein networks that can be investigated for improving crop yield and quality, and developing new biotechnological solutions to agricultural challenges.
Subject(s)
Genetic Databases, Plant Genome/genetics, Hordeum/genetics, Oryza/genetics, Triticum/genetics, Zea mays/genetics, Amino Acid Sequence, Computational Biology, Agricultural Crops, Hordeum/metabolism, Plant Proteins/genetics, Protein Transport
ABSTRACT
MOTIVATION: Knowing the subcellular location of proteins is critical for understanding their function and developing accurate networks representing eukaryotic biological processes. Many computational tools have been developed to predict proteome-wide subcellular location, and abundant experimental data from green fluorescent protein (GFP) tagging or mass spectrometry (MS) are available in the model plant, Arabidopsis. None of these approaches is error-free, and thus, results are often contradictory. RESULTS: To help unify these multiple data sources, we have developed the SUBcellular Arabidopsis consensus (SUBAcon) algorithm, a naive Bayes classifier that integrates 22 computational prediction algorithms, experimental GFP and MS localizations, protein-protein interaction and co-expression data to derive a consensus call and probability. SUBAcon classifies protein location in Arabidopsis more accurately than single predictors. AVAILABILITY: SUBAcon is a useful tool for recovering proteome-wide subcellular locations of Arabidopsis proteins and is displayed in the SUBA3 database (http://suba.plantenergy.uwa.edu.au). The source code and input data are available through the SUBA3 server (http://suba.plantenergy.uwa.edu.au//SUBAcon.html) and the Arabidopsis SUbproteome REference (ASURE) training set can be accessed using the ASURE web portal (http://suba.plantenergy.uwa.edu.au/ASURE).
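As an illustration of how a naive Bayes classifier can merge contradictory location calls into a consensus call and probability, a toy sketch follows. It is not the SUBAcon source code. The two evidence sources, the priors and every likelihood value are made up, and the function name is hypothetical.

```python
import math


def naive_bayes_consensus(priors, likelihoods, evidence):
    """Combine independent evidence sources into posterior location probabilities.

    priors:      {location: P(location)}
    likelihoods: {source: {location: {call: P(call | location)}}}
    evidence:    {source: observed call}
    Assumption: sources are conditionally independent given the true
    location, and all likelihoods are non-zero.
    """
    log_post = {}
    for loc, prior in priors.items():
        score = math.log(prior)
        for source, call in evidence.items():
            score += math.log(likelihoods[source][loc][call])
        log_post[loc] = score
    # Normalise in a numerically stable way.
    peak = max(log_post.values())
    unnorm = {loc: math.exp(s - peak) for loc, s in log_post.items()}
    total = sum(unnorm.values())
    posterior = {loc: v / total for loc, v in unnorm.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Made-up numbers for two compartments and two evidence sources.
priors = {"plastid": 0.5, "mitochondrion": 0.5}
likelihoods = {
    "predictor": {
        "plastid": {"plastid": 0.8, "mitochondrion": 0.2},
        "mitochondrion": {"plastid": 0.3, "mitochondrion": 0.7},
    },
    "gfp": {
        "plastid": {"plastid": 0.9, "mitochondrion": 0.1},
        "mitochondrion": {"plastid": 0.1, "mitochondrion": 0.9},
    },
}
best, posterior = naive_bayes_consensus(
    priors, likelihoods, {"predictor": "plastid", "gfp": "plastid"}
)
print(best, posterior)
```

Working in log space avoids underflow once many sources are multiplied together, which matters when integrating more than twenty predictors.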
Subject(s)
Algorithms, Arabidopsis Proteins/analysis, Arabidopsis/chemistry, Proteome/analysis, Arabidopsis/genetics, Arabidopsis/metabolism, Arabidopsis Proteins/genetics, Arabidopsis Proteins/metabolism, Bayes Theorem, Protein Databases, Green Fluorescent Proteins/genetics, Mass Spectrometry, Membrane Proteins/analysis, Protein Interaction Mapping, Proteome/genetics, Proteome/metabolism, Software
ABSTRACT
In reverse genetic knockout (KO) studies that aim to assign function to specific genes, confirming the reduction in abundance of the encoded protein will often aid the link between genotype and phenotype. However, measuring specific protein abundance is particularly difficult in plant research, where only a limited number of antibodies are available. This problem is enhanced when studying gene families or different proteins derived from the same gene (isoforms), as many antibodies cross react with more than one protein. We show that utilizing selected reaction monitoring (SRM) mass spectrometry allows researchers to confirm protein abundance in mutant lines, even when discrimination between very similar proteins is needed. Selecting the best peptides for SRM analysis to ensure that protein- or gene-specific information can be obtained requires a series of steps, aids, and interpretation. To enable this process in Arabidopsis (Arabidopsis thaliana), we have built a Web-based tool, the Arabidopsis Proteotypic Predictor, to select candidate SRM transitions when no previous mass spectrometry evidence exists. We also provide an in-depth analysis of the theoretical Arabidopsis proteome and its use in selecting candidate SRM peptides to establish assays for use in determining protein abundance. To test the effectiveness of SRM mass spectrometry in determining protein abundance in mutant lines, we selected two enzymes with multiple isoforms, aconitase and malate dehydrogenase. Selected peptides were quantified to estimate the abundance of each of the two mitochondrial isoforms in wild-type, KO, double KO, and complemented plant lines. We show that SRM protein analysis is a sensitive and rapid approach to quantify protein abundance differences in Arabidopsis for specific and highly related enzyme isoforms.
Subject(s)
Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Mass Spectrometry/methods, Software, Aconitate Hydratase/metabolism, Amino Acid Sequence, Arabidopsis Proteins/chemistry, Computer Simulation, Gene Knockout Techniques, Malate Dehydrogenase/metabolism, Mitochondria/enzymology, Mitochondrial Proteins/chemistry, Mitochondrial Proteins/metabolism, Molecular Sequence Data, Peptides/chemistry, Peptides/metabolism, Plant Extracts/metabolism, Plant Leaves/metabolism, Proteome/metabolism, Trypsin/metabolism
ABSTRACT
The subcellular location database for Arabidopsis proteins (SUBA3, http://suba.plantenergy.uwa.edu.au) combines manual literature curation of large-scale subcellular proteomics, fluorescent protein visualization and protein-protein interaction (PPI) datasets with subcellular targeting calls from 22 prediction programs. More than 14 500 new experimental locations have been added since its first release in 2007. Overall, nearly 650 000 new calls of subcellular location for 35 388 non-redundant Arabidopsis proteins are included (almost six times the information in the previous SUBA version). A re-designed interface makes the SUBA3 site more intuitive and easier to use than earlier versions and provides powerful options to search for PPIs within the context of cell compartmentation. SUBA3 also includes detailed localization information for reference organelle datasets and incorporates green fluorescent protein (GFP) images for many proteins. To determine as objectively as possible where a particular protein is located, we have developed SUBAcon, a Bayesian approach that incorporates experimental localization and targeting prediction data to best estimate a protein's location in the cell. The probabilities of subcellular location for each protein are provided and displayed as a pictographic heat map of a plant cell in SUBA3.
Subject(s)
Arabidopsis Proteins/analysis, Protein Databases, Internet, Protein Interaction Mapping, Proteomics, Systems Integration, User-Computer Interface
ABSTRACT
Omics research in Oryza sativa (rice) relies on the use of multiple databases to obtain different types of information to define gene function. We present Rice DB, an Oryza information portal that is a functional genomics database, linking gene loci to comprehensive annotations, expression data and the subcellular location of encoded proteins. Rice DB has been designed to integrate the direct comparison of rice with Arabidopsis (Arabidopsis thaliana), based on orthology or 'expressology', thus using and combining available information from two pre-eminent plant models. To establish Rice DB, gene identifiers (more than 40 types) and annotations from a variety of sources were compiled, functional information based on large-scale and individual studies was manually collated, hundreds of microarrays were analysed to generate expression annotations, and the occurrences of potential functional regulatory motifs in promoter regions were calculated. A range of computational subcellular localization predictions were also run for all putative proteins encoded in the rice genome, and experimentally confirmed protein localizations have been collated, curated and linked to functional studies in rice. A single search box allows anything from gene identifiers (for rice and/or Arabidopsis), motif sequences, subcellular location, to keyword searches to be entered, with the capability of Boolean searches (such as AND/OR). To demonstrate the utility of Rice DB, several examples are presented including a rice mitochondrial proteome, which draws on a variety of sources for subcellular location data within Rice DB. Comparisons of subcellular location, functional annotations, as well as transcript expression in parallel with Arabidopsis reveals examples of conservation between rice and Arabidopsis, using Rice DB (http://ricedb.plantenergy.uwa.edu.au).
Subject(s)
Genetic Databases, Plant Genome/genetics, Genomics, Oryza/genetics, User-Computer Interface, Arabidopsis/genetics, Arabidopsis/metabolism, Base Sequence, Biological Evolution, Internet, Mitochondria/genetics, Molecular Sequence Annotation, Oryza/metabolism, Plant Proteins/genetics, Plant Proteins/metabolism, Proteome, Messenger RNA/genetics, Plant RNA/genetics, Software, Transcriptome
ABSTRACT
Proteomics has become a critical tool in the functional understanding of plant processes at the molecular level. Proteomics-based studies have also contributed to the ever-expanding array of data in modern biology, with many generating Web portals and online resources that contain incrementally expanding and updated information. Many of these resources reflect specialist research areas with significant and novel information that is not currently captured by centralized repositories. The Arabidopsis (Arabidopsis thaliana) community is well served by a number of online proteomics resources that hold an abundance of functional information. These sites can be difficult to locate among a multitude of online resources. Furthermore, they can be difficult to navigate in order to identify specific features of interest without significant technical knowledge. Recently, members of the Arabidopsis proteomics community involved in developing many of these resources decided to develop a summary aggregation portal that is capable of retrieving proteomics data from a series of online resources on the fly. The Web portal is known as the MASCP Gator and can be accessed at the following address: http://gator.masc-proteomics.org/. Significantly, proteomics data displayed at this site retrieve information from the data repositories upon each request. This means that information is always up to date and displays the latest data sets. The site also provides hyperlinks back to the source information hosted at each of the curated databases to facilitate more in-depth analysis of the primary data.
Subject(s)
Arabidopsis/metabolism, Protein Databases, Internet, Proteomics/methods, Arabidopsis/enzymology, Arabidopsis Proteins/metabolism, Data Mining, Phosphorylation, Protein Kinases/metabolism, User-Computer Interface
ABSTRACT
The provision of precise metadata is an important but largely underrated challenge for modern science [Nature 2009, 461, 145]. We describe here a dictionary methods language, dREL, that has been designed to enable complex data relationships to be expressed as formulaic scripts in data dictionaries written in DDLm [Spadaccini and Hall, J. Chem. Inf. Model. 2012, doi:10.1021/ci300075z]. dREL describes data relationships in a simple but powerful canonical form that is easy to read and understand and can be executed computationally to evaluate or validate data. The execution of dREL expressions is not a substitute for traditional scientific computation; its purpose is to provide precise data dependency information to domain-specific definitions and a means for cross-validating data. Some scientific fields apply conventional programming languages to methods scripts but these tend to inhibit both dictionary development and accessibility. dREL removes the programming barrier and encourages the production of the metadata needed for seamless data archiving and exchange in science.
Subject(s)
Dictionaries as Topic, Informatics/methods, Programming Languages
ABSTRACT
The increased diversity and scale of published biological data has led to a growing appreciation for the applications of machine learning and statistical methodologies to gain new insights. Key to achieving this aim is solving the Relationship Extraction problem, which specifies the semantic interaction between two or more biological entities in a published study. Here, we employed two deep neural network natural language processing (NLP) methods, namely the continuous bag of words (CBOW) and the bi-directional long short-term memory (bi-LSTM). These methods were employed to predict relations between entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the SUBA manually curated dataset. The system combines pre-processing of full-text articles in a machine-readable format with relevant sentence extraction for downstream NLP analysis. Using the SUBA corpus, the neural network classifier predicted interactions between protein name, subcellular localisation and experimental methodology with average precision, recall, accuracy and F1 scores of 95.1%, 82.8%, 89.3% and 88.4%, respectively (n = 30). Comparable scoring metrics were obtained using the CropPAL database, which stores protein subcellular localisation in crop species, as an independent testing dataset, demonstrating the wide applicability of the prediction model. We provide a framework for extracting protein functional features from unstructured text in the literature with high accuracy, improving data dissemination and unlocking the potential of big data text analytics for generating new hypotheses.
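For readers unfamiliar with the four scores reported above, the sketch below shows how precision, recall, accuracy and F1 are conventionally derived from confusion-matrix counts. The counts are invented and are not taken from the study. The reported figures are averages over 30 runs, so they need not satisfy the F1 formula exactly.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification scores from confusion-matrix counts."""
    precision = tp / (tp + fp)          # correct positives among predicted positives
    recall = tp / (tp + fn)             # correct positives among actual positives
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Invented counts for illustration only.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```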
ABSTRACT
BACKGROUND: Arabidopsis thaliana is clearly established as the model plant species. Given the ever-growing demand for food, there is a need to translate the knowledge learned in Arabidopsis to agronomically important species, such as rice (Oryza sativa). To gain a comparative insight into the similarities and differences in how organs are built and how plants respond to stress, the transcriptomes of Arabidopsis and rice were compared at the level of gene orthology and functional categorisation. RESULTS: Organ-specific transcripts in rice and Arabidopsis display less overlap in terms of gene orthology compared to the orthology observed between both genomes. Although greater overlap in terms of functional classification was observed between root-specific transcripts in rice and Arabidopsis, this did not extend to flower-, leaf- or seed-specific transcripts. In contrast, the overall abiotic stress response transcriptome displayed a significantly greater overlap in terms of gene orthology compared to the orthology observed between both genomes. However, ~50% or less of these orthologues responded in a similar manner in both species. In fact, under cold and heat treatments as many or more orthologous genes responded in an opposite manner or were unchanged in one species compared to the other. Examples of transcripts that responded oppositely include several genes encoding proteins involved in stress and redox responses and non-symbiotic hemoglobins that play central roles in stress signalling pathways. The differences observed in the abiotic transcriptomes were mirrored in the presence of cis-acting regulatory elements in the promoter regions of stress-responsive genes and the transcription factors that potentially bind these regulatory elements. Thus, both the abiotic transcriptome and its regulation differ between rice and Arabidopsis.
CONCLUSIONS: These results reveal significant divergence between Arabidopsis and rice, in terms of the abiotic stress response and its regulation. Both plants are shown to employ unique combinations of genes to achieve growth and stress responses. Comparison of these networks provides a more rational approach to translational studies that is based on the response observed in these two diverse plant models.
Subject(s)
Arabidopsis/genetics, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis/statistics & numerical data, Oryza/genetics, Cold Temperature, Genetic Databases, Droughts, Plant Gene Expression Regulation/drug effects, Reverse Transcriptase Polymerase Chain Reaction, Sodium Chloride/pharmacology
ABSTRACT
The RNA-binding pentatricopeptide repeat (PPR) family comprises hundreds to thousands of genes in most plants, but only a few dozen in algae, indicating massive gene expansions during land plant evolution. The nature and timing of these expansions has not been well defined due to the sparse sequence data available from early-diverging land plant lineages. In this study, we exploit the comprehensive OneKP datasets of over 1000 transcriptomes from diverse plants and algae toward establishing a clear picture of the evolution of this massive gene family, focusing on the proteins typically associated with RNA editing, which show the most spectacular variation in numbers and domain composition across the plant kingdom. We characterize over 2 250 000 PPR motifs in over 400 000 proteins. In lycophytes, polypod ferns, and hornworts, nearly 10% of expressed protein-coding genes encode putative PPR editing factors, whereas they are absent from algae and complex-thalloid liverworts. We show that rather than a single expansion, most land plant lineages with high numbers of editing factors have continued to generate novel sequence diversity. We identify sequence variations that imply functional differences between PPR proteins in seed plants versus non-seed plants and variations we propose to be linked to seed-plant-specific editing co-factors. Finally, using the sequence variations across the datasets, we develop a structural model of the catalytic DYW domain associated with C-to-U editing and identify a clade of unique DYW variants that are strong candidates as U-to-C RNA-editing factors, given their phylogenetic distribution and sequence characteristics.
Subject(s)
Embryophyta/genetics, Plant Proteins/genetics, RNA Editing/genetics, RNA-Binding Proteins/genetics, Amino Acid Motifs, Genetic Databases, Embryophyta/classification, Molecular Evolution, Gene Duplication, Genetic Variation, Molecular Models, Phylogeny, Plant Proteins/chemistry, Plant Proteins/metabolism, Plants/classification, Plants/genetics, Protein Domains, Plant RNA/metabolism, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/metabolism, Amino Acid Repetitive Sequences
ABSTRACT
Plant mitochondria play central roles in cellular energy production, metabolism and stress responses. Recent phosphoproteomic studies in mammalian and yeast mitochondria have presented evidence indicating that protein phosphorylation is a likely regulatory mechanism across a broad range of important mitochondrial processes. This study investigated protein phosphorylation in purified mitochondria from cell suspensions of the model plant Arabidopsis thaliana using affinity enrichment and proteomic tools. Eighteen putative phosphoproteins consisting of mitochondrial metabolic enzymes, HSPs, a protease and several proteins of unknown function were detected on 2-DE separations of Arabidopsis mitochondrial proteins and affinity-enriched phosphoproteins using the Pro-Q Diamond phospho-specific in-gel dye. Comparisons with mitochondrial phosphoproteomes of yeast and mouse indicate that these three species share few validated phosphoproteins. Phosphorylation sites for seven of the eighteen mitochondrial proteins were characterized by titanium dioxide enrichment and MS/MS. In the process, 71 phosphopeptides from Arabidopsis proteins which are not present in mitochondria but found as contaminants in various types of mitochondrial preparations were also identified, indicating the low level of phosphorylation of mitochondrial components compared with other cellular components in Arabidopsis. Information gained from this study provides a better understanding of protein phosphorylation at both the subcellular and the cellular level in Arabidopsis.
Subject(s)
Arabidopsis Proteins/analysis, Arabidopsis/metabolism, Mitochondrial Proteins/analysis, Phosphoproteins/analysis, Proteome/analysis, Adenosine Triphosphate/pharmacology, Animals, Affinity Chromatography, Two-Dimensional Gel Electrophoresis, Isotope Labeling, Mice, Phosphopeptides/analysis, Phosphorus Radioisotopes, Phosphorylation/drug effects
ABSTRACT
Queens of social insects make all mate-choice decisions on a single day, except in honeybees whose queens can conduct mating flights for several days even when already inseminated by a number of drones. Honeybees therefore appear to have a unique, evolutionarily derived form of sexual conflict: a queen's decision to pursue risky additional mating flights is driven by later-life fitness gains from genetically more diverse worker-offspring but reduces paternity shares of the drones she already mated with. We used artificial insemination, RNA-sequencing and electroretinography to show that seminal fluid induces a decline in queen vision by perturbing the phototransduction pathway within 24-48 hr. Follow-up field trials revealed that queens receiving seminal fluid flew two days earlier than sister queens inseminated with saline, and failed more often to return. These findings are consistent with seminal fluid components manipulating queen eyesight to reduce queen promiscuity across mating flights.
Subject(s)
Bees/physiology, Biological Factors/metabolism, Animal Flight, Semen/chemistry, Animal Sexual Behavior, Survival, Ocular Vision/drug effects, Animals, Electroretinography, RNA Sequence Analysis
ABSTRACT
Sub-functionalization during the expansion of gene families in eukaryotes has occurred in part through specific subcellular localization of different family members. To better understand this process in plants, compiled records of large-scale proteomic and fluorescent protein localization datasets can be explored and bioinformatic predictions for protein localization can be used to predict the gaps in experimental data. This process can be followed by targeted experiments to test predictions. The SUBA3 database is a free web-service at http://suba.plantenergy.uwa.edu.au that helps users to explore reported experimental data and predictions concerning proteins encoded by gene families and to define the experiments required to locate these homologous sets of proteins. Here we show how SUBA3 can be used to explore the subcellular location of the Deg protease family of ATP-independent serine endopeptidases (Deg1-Deg16). Combined data integration and new experiments refined location information for Deg1 and Deg9, confirmed Deg2, Deg5, and Deg8 in plastids and Deg15 in peroxisomes, and provided substantial experimental evidence for mitochondrially localized Deg proteases. Two of these, Deg3 and Deg10, additionally localized to the plastid, revealing novel dual-targeted Deg proteases in the plastid and the mitochondrion. SUBA3 is continually updated to ensure that researchers can use the latest published data when planning the experimental steps remaining to localize gene family functions.
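
The kind of gene-family summary described in this abstract can be illustrated with a minimal Python sketch. It is not SUBA3's interface or data model: the table below is a hypothetical toy structure holding only the locations the abstract names explicitly, and the function name `summarize` is an assumption made for illustration. A family member missing from the toy table is not necessarily missing from SUBA3 (Deg1 and Deg9, for example, have refined locations that the abstract does not spell out).

```python
# Toy table: only locations stated explicitly in the abstract above.
# This is an illustrative structure, not SUBA3's schema or API.
EXPERIMENTAL = {
    "Deg2": {"plastid"},
    "Deg5": {"plastid"},
    "Deg8": {"plastid"},
    "Deg15": {"peroxisome"},
    "Deg3": {"mitochondrion", "plastid"},
    "Deg10": {"mitochondrion", "plastid"},
}

# The family is described as Deg1-Deg16.
FAMILY = [f"Deg{i}" for i in range(1, 17)]


def summarize(family, experimental):
    """Split a gene family into located, dual-targeted and unlisted members."""
    located = {m: sorted(experimental[m]) for m in family if m in experimental}
    dual = [m for m in family if len(experimental.get(m, ())) > 1]
    unlisted = [m for m in family if m not in experimental]
    return located, dual, unlisted


located, dual_targeted, unlisted = summarize(FAMILY, EXPERIMENTAL)
print("dual-targeted:", dual_targeted)
print("not listed in this toy table:", unlisted)
```

Members in the unlisted group are the ones for which a user would next consult predictions or plan targeted experiments.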
ABSTRACT
Fluorescent protein (FP) tagging approaches are widely used to determine the subcellular location of plant proteins. Here we give a brief overview of FP approaches, highlight potential technical problems, and discuss what to consider when designing FP/protein fusion constructs and performing transformation assays. We analyze published FP tagging data sets along with data from proteomics studies collated in SUBA3, a subcellular location database for Arabidopsis proteins, and assess the reliability of these data sets by comparing them. We also outline the limitations of the FP tagging approach for defining protein location and investigate multiple localization claims by FP tagging. We conclude that the collation of localization datasets in databases like SUBA3 is helpful for revealing discrepancies in location attributions by different techniques and/or by different research groups.
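
Comparing data sets in this way can be sketched as a simple concordance check between FP-based and proteomics-based location calls. This is only an assumed illustration, not the authors' method: the protein identifiers, the locations and the function name `concordance` are all hypothetical, and two calls are counted as agreeing when they share at least one compartment.

```python
def concordance(fp_calls, ms_calls):
    """Return the agreement rate and the proteins whose calls conflict.

    Only proteins present in both data sets are compared. Two calls
    agree when they share at least one compartment.
    """
    shared = sorted(set(fp_calls) & set(ms_calls))
    agree = [p for p in shared if fp_calls[p] & ms_calls[p]]
    disagree = [p for p in shared if not (fp_calls[p] & ms_calls[p])]
    rate = len(agree) / len(shared) if shared else None
    return rate, disagree


# Hypothetical toy calls, not taken from SUBA3 or any publication.
fp = {
    "protA": {"plastid"},
    "protB": {"nucleus"},
    "protC": {"mitochondrion", "plastid"},
    "protD": {"cytosol"},
}
ms = {
    "protA": {"plastid"},
    "protB": {"cytosol"},
    "protC": {"mitochondrion"},
    "protE": {"vacuole"},
}

rate, conflicts = concordance(fp, ms)
print(f"agreement on shared proteins: {rate:.2f}")
print("conflicting calls:", conflicts)
```

Counting any overlap as agreement is a lenient choice. A stricter comparison would require identical compartment sets, which would also flag possible multiple-localization claims such as the hypothetical `protC`.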