ABSTRACT
Novel platelet and megakaryocyte transcriptome analysis allows prediction of the full or theoretical proteome of a representative human platelet. Here, we integrated the established platelet proteomes from six cohorts of healthy subjects, encompassing 5.2 k proteins, with two novel genome-wide transcriptomes (57.8 k mRNAs). For 14.8 k protein-coding transcripts, we assigned the proteins to 21 UniProt-based classes, based on their preferential intracellular localization and presumed function. This classified transcriptome-proteome profile of platelets revealed: (i) Absence of 37.2 k genome-wide transcripts. (ii) High quantitative similarity of platelet and megakaryocyte transcriptomes (R = 0.75) for 14.8 k protein-coding genes, but not for 3.8 k RNA genes or 1.9 k pseudogenes (R = 0.43-0.54), suggesting redistribution of mRNAs upon platelet shedding from megakaryocytes. (iii) Copy numbers of 3.5 k proteins that were restricted in size by the corresponding transcript levels. (iv) Near-complete coverage of identified proteins in the relevant transcriptome (log2fpkm > 0.20) except for plasma-derived secretory proteins, pointing to adhesion and uptake of such proteins. (v) Underrepresentation in the identified proteome of nuclear-related, membrane and signaling proteins, as well as proteins with low-level transcripts. We then constructed a prediction model, based on protein function, transcript level and (peri)nuclear localization, and calculated the achievable proteome at ~ 10 k proteins. Model validation identified 1.0 k additional proteins in the predicted classes. Network and database analysis revealed the presence of 2.4 k proteins with a possible role in thrombosis and hemostasis, and 138 proteins linked to platelet-related disorders. This genome-wide platelet transcriptome and (non)identified proteome database thus provides a scaffold for discovering the roles of unknown platelet proteins in health and disease.
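The transcript-level coverage described in point (iv) can be illustrated with a minimal sketch. Only the log2fpkm > 0.20 cutoff comes from the abstract; the function name, gene names and values are hypothetical and chosen purely for illustration.

```python
# Cutoff quoted in the abstract; everything else here is an assumption.
LOG2FPKM_CUTOFF = 0.20


def coverage(identified_proteins, log2fpkm, cutoff=LOG2FPKM_CUTOFF):
    """Return the fraction of identified proteins whose transcript
    level exceeds the cutoff. Genes without a transcript value are
    treated as not covered."""
    if not identified_proteins:
        return 0.0
    covered = [
        gene for gene in identified_proteins
        if log2fpkm.get(gene, float("-inf")) > cutoff
    ]
    return len(covered) / len(identified_proteins)


# Invented example values (not data from the study).
log2fpkm = {"GENE_A": 5.1, "GENE_B": 0.05, "GENE_C": 2.3, "GENE_D": 0.9}
identified = ["GENE_A", "GENE_B", "GENE_C", "GENE_E"]

print(coverage(identified, log2fpkm))
```

In this toy input, GENE_B falls below the cutoff and GENE_E has no transcript value, so both count as uncovered, which loosely mirrors the case of proteins detected without a matching transcript.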
Subjects
Blood Platelets/metabolism , Hematologic Diseases/genetics , Megakaryocytes/metabolism , Proteome/genetics , Transcriptome , Humans , Molecular Sequence Annotation , Proteome/classification , Proteome/metabolism
ABSTRACT
For proteome analyses, tissue samples are mostly preserved either snap-frozen or in formalin-fixed, paraffin-embedded form. The use of RNAlater, a non-toxic solution primarily used to stabilize the RNA content of samples, for tissue preservation in proteome analysis was recently described as equally reliable as snap-frozen preservation in human tissues. Even though RNAlater storage has great potential for the preservation of peripheral blood mononuclear cells (PBMCs), its impact on the results of proteome analysis is poorly described in qualitative and quantitative terms. The present study investigated the protein profiles of RNAlater-preserved and fresh PBMCs using three extraction buffers, viz. Triton X-100, RIPA and SDS. Proteins were separated by SDS-PAGE and quantified using densitometry. On average, 19.3 bands from fresh cells and 15.6 bands from RNAlater-stored cells were obtained, with molecular weights ranging from 25 to > 250 kDa. RNAlater storage yielded fewer and less abundant low molecular weight proteins, while it yielded a similar or higher quantity of high molecular weight protein fractions. Principal component analysis showed that Triton X-100 is inferior to SDS and RIPA with respect to the number and quantity of protein bands yielded. While RNAlater is effective in preserving PBMCs for proteome analysis, our findings warrant caution in its use in proteomics experiments, especially if the targets are low molecular weight proteins.
Subjects
Leukocytes, Mononuclear/chemistry , Proteome/isolation & purification , RNA/chemistry , Tissue Preservation/methods , Animals , Cattle , Complex Mixtures/chemistry , Electrophoresis, Polyacrylamide Gel , Liquid Phase Microextraction/methods , Molecular Weight , Octoxynol/chemistry , Preservatives, Pharmaceutical/chemistry , Primary Cell Culture , Principal Component Analysis , Proteome/chemistry , Proteome/classification , RNA/isolation & purification , Sodium Dodecyl Sulfate/chemistry
ABSTRACT
Extracellular vesicle (EV) is a unified term for membrane-enclosed vesicular species that are ubiquitously secreted by almost every cell type and are present in all body fluids. EVs carry a cargo of lipids, metabolites, nucleic acids and proteins, both for clearance from cells and for cell-to-cell communication. The exact composition of EVs and their specific functions are not well understood because separation protocols are underdeveloped, especially for the central nervous system, including animal and human brain tissues and cerebrospinal fluid, and because the protein yield of separated EVs is low. To understand their exact molecular composition and functional roles, reliable protocols for EV separation need to be developed. Here we report methods for EV separation from unfixed frozen human and mouse brain tissues by sucrose step-gradient ultracentrifugation, and from human cerebrospinal fluid by affinity capture. The separated EVs were assessed for morphological, biophysical and proteomic properties by nanoparticle tracking analysis, transmission electron microscopy, and labeled and label-free mass spectrometry for protein profiling, with step-by-step protocols for each assessment.
Subjects
Brain/metabolism , Extracellular Vesicles/chemistry , Nerve Tissue Proteins/isolation & purification , Proteome/isolation & purification , Proteomics/methods , Animals , Biomarkers/cerebrospinal fluid , Brain Chemistry , Cell Communication , Centrifugation, Density Gradient/methods , Chromatography, Affinity/methods , Chromatography, Gel/methods , Extracellular Vesicles/metabolism , Humans , Mice , Nerve Tissue Proteins/classification , Neurons/chemistry , Neurons/metabolism , Proteome/classification , Proteomics/instrumentation , Ultracentrifugation/methods
ABSTRACT
Leishmania donovani is the primary cause of visceral leishmaniasis (VL), a fatal disease, in East Africa and the Indian subcontinent. Human beings are the only known reservoir of L. donovani, and control of the disease is becoming more difficult owing to the emergence and spread of drug resistance. Identifying novel drug targets is therefore very important for developing new drugs and combating drug resistance. Experimental determination of targets is costly and time-consuming; hence it is necessary first to identify promising targets with an accurate mathematical method and then to proceed to in vitro/in vivo studies. Earlier, we predicted the role of proteins as targets with a Naïve Bayes probabilistic classifier applied to the proteins identified in our L. donovani membrane proteomics study. Here we used an alternative, popular method, the Rough Set method (an important soft computing method with relevance to many real-world applications), to revisit and validate our earlier L. donovani membrane proteomics findings and, additionally, to assign membrane proteins of unknown class/family to known ones. Compared with other classifiers (NB, SVM, RF, C4.5 decision tree), the Rough Set method performed best, with an accuracy of 89.28%. This study strongly validates our previous findings and predicts the class/family of unknown proteins, which is very important for identifying and selecting novel, still unexplored drug targets and, ultimately, for developing effective antileishmanials.
Subjects
Leishmania donovani/metabolism , Proteome/classification , Protozoan Proteins/classification , Bayes Theorem , Decision Theory , Humans , Leishmaniasis, Visceral/parasitology , Mathematical Concepts , Membrane Proteins/classification , Membrane Proteins/metabolism , Models, Biological , Models, Statistical , Proteome/metabolism , Proteomics/statistics & numerical data , Protozoan Proteins/metabolism
ABSTRACT
The costs and benefits of protein expression are balanced through evolution. Expression of unutilized proteins (those that have no benefit in the current environment) incurs a quantifiable fitness cost on cellular growth rates; however, the magnitude and variability of unutilized protein expression in natural settings are unknown, largely due to the challenge of determining environment-specific proteome utilization. We address this challenge using absolute and global proteomics data combined with a recently developed genome-scale model of Escherichia coli that computes the environment-specific cost and utility of the proteome on a per-gene basis. We show that nearly half of the proteome mass is unused in certain environments and that accounting for the cost of this unused protein expression explains >95% of the variance in growth rates of Escherichia coli across 16 distinct environments. Furthermore, reduction in unused protein expression is shown to be a common mechanism to increase cellular growth rates in adaptive evolution experiments. Classification of the unused protein reveals that it encodes several nutrient- and stress-preparedness functions, which may convey fitness benefits in varying environments. Thus, unused protein expression is the source of large and pervasive fitness costs that may provide the benefit of hedging against environmental change.
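The notion of "unused proteome mass" above can be made concrete with a small sketch. It assumes that per-gene protein mass and the set of genes used in a given environment are already known; in the study the latter comes from a genome-scale model, which is not reproduced here. All names and numbers are hypothetical.

```python
def unused_mass_fraction(abundance, used_genes):
    """Fraction of total proteome mass contributed by genes that are
    not in the set of used genes.

    abundance:  dict mapping gene -> protein mass (any consistent unit)
    used_genes: set of genes considered utilized in the environment
    """
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total proteome mass must be positive")
    unused = sum(mass for gene, mass in abundance.items()
                 if gene not in used_genes)
    return unused / total


# Invented masses; the 'used' set stands in for a model prediction.
abundance = {"geneA": 40.0, "geneB": 10.0, "geneC": 30.0, "geneD": 20.0}
used = {"geneA", "geneD"}

print(unused_mass_fraction(abundance, used))
```

Repeating this calculation per environment would give one unused fraction per condition, which is the kind of quantity the abstract relates to growth rate.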
Subjects
Computational Biology/methods , Escherichia coli Proteins , Escherichia coli , Proteome , Databases, Protein , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli/physiology , Escherichia coli Proteins/analysis , Escherichia coli Proteins/classification , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Models, Biological , Proteome/analysis , Proteome/classification , Proteome/genetics , Proteome/metabolism
ABSTRACT
BACKGROUND: Schistosomiasis remains an important parasitic disease and a major economic problem in many countries. The Schistosoma mansoni genome and predicted proteome sequences were recently published, providing the opportunity to identify new drug candidates. Eukaryotic protein kinases (ePKs) play a central role in mediating signal transduction through complex networks and are considered druggable targets from the medical and chemical viewpoints. Our work aimed at analyzing the S. mansoni predicted proteome in order to identify and classify all ePKs of this parasite through combined computational approaches. Functional annotation was performed mainly to yield insights into the parasite signaling processes relevant to its complex lifestyle and to select some ePKs as potential drug targets. RESULTS: We have identified 252 ePKs, which correspond to 1.9% of the S. mansoni predicted proteome, through sequence similarity searches using HMMs (Hidden Markov Models). Amino acid sequences corresponding to the conserved catalytic domain of ePKs were aligned by MAFFT and further used in distance-based phylogenetic analysis as implemented in PHYLIP. Our analysis also included the ePK homologs from six other eukaryotes. The results show that S. mansoni has proteins in all ePK groups. Most of them are clearly clustered with known ePKs in other eukaryotes according to the phylogenetic analysis. None of the ePKs are exclusively found in S. mansoni or belong to an expanded family in this parasite. Only 16 S. mansoni ePKs have been experimentally studied, 12 proteins are predicted to be catalytically inactive and approximately 2% of the parasite ePKs remain unclassified. Some proteins are highlighted as good targets for drug development since they have a predicted essential function for the parasite. CONCLUSIONS: Our approach has improved the functional annotation of 40% of S. mansoni ePKs through combined similarity and phylogenetic-based approaches.
As we continue this work, we will highlight the biochemical and physiological adaptations of S. mansoni in response to diverse environments during parasite development, vector interaction, and host infection.
Subjects
Protein Kinases/classification , Protein Kinases/metabolism , Proteomics , Schistosoma mansoni/enzymology , Schistosoma mansoni/parasitology , Animals , Catalytic Domain , Markov Chains , Phylogeny , Protein Kinases/chemistry , Proteome/chemistry , Proteome/classification , Proteome/metabolism , Schistosoma mansoni/cytology , Signal Transduction
ABSTRACT
BACKGROUND: Cells react to changing intra- and extracellular signals by dynamically modulating complex biochemical networks. Cellular responses to extracellular signals lead to changes in gene and protein expression. Since the majority of genes encode proteins, we investigated possible correlations between protein parameters and gene expression patterns to identify proteome-wide characteristics indicative of trends common to expressed proteins. RESULTS: Numerous bioinformatics methods were used to filter and merge information regarding gene and protein annotations. A new statistical time point-oriented analysis was developed for the study of dynamic correlations in large time series data. The method was applied to investigate microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. CONCLUSION: We show that the properties of proteins synthesized correlate dynamically with the gene expression profile, indicating that not only is the actual identity and function of expressed proteins important for cellular responses but that several physicochemical and other protein properties correlate with gene expression as well. Gene expression correlates strongly with amino acid composition, composition- and sequence-derived variables, functional, structural, localization and gene ontology parameters. Thus, our results suggest that a dynamic relationship exists between proteome properties and gene expression in many biological systems, and therefore this relationship is fundamental to understanding cellular mechanisms in health and disease.
Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Proteome/classification , Animals , B-Lymphocytes/physiology , Cell Cycle/genetics , Computational Biology/methods , Data Display , Drosophila melanogaster/genetics , Electronic Data Processing , Gene Frequency , Humans , Information Storage and Retrieval/methods , Lymphocyte Activation/genetics , Markov Chains , Models, Biological , Saccharomyces cerevisiae/genetics , Sequence Analysis, Protein/methods , Signal Transduction/genetics , Software , T-Lymphocytes/physiology
ABSTRACT
With the recent development of experimental high-throughput techniques, the types and volume of accumulated biological data have increased dramatically in recent years. Mining different types of data may yield new biological insights. We present a new methodology for systematically combining three different datasets to find biologically active metabolic paths/patterns. This method consists of two steps. First, it efficiently synthesizes metabolic paths from a given set of chemical reactions that are already known and whose enzymes are co-expressed. It then represents the obtained metabolic paths in a more comprehensible way by estimating the parameters of a probabilistic model from these synthesized paths. This model is built upon the assumption that an entire set of chemical reactions corresponds to a Markov state transition diagram. Furthermore, the model is a hierarchical latent variable model, containing a set of protein classes as a latent variable, for clustering input paths in terms of existing knowledge of protein classes. We tested the performance of our method using the main pathway of glycolysis, and found that it achieved higher predictive performance in classifying gene expression than other unsupervised methods. We further analyzed the estimated parameters of our probabilistic models, and found that biologically active paths were clustered into only two or three patterns for each expression experiment type, and that each pattern suggested some new long-range relations in the glycolysis pathway.
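The Markov state transition assumption can be sketched in its simplest form: treating each reaction as a state and estimating transition probabilities by counting consecutive reactions in the synthesized paths. This is a deliberately reduced illustration, not the hierarchical latent variable model of the abstract (it has no protein-class latent variable), and the reaction labels and paths are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(paths):
    """Maximum-likelihood estimate of first-order Markov transition
    probabilities from a list of paths (each path is a list of states)."""
    counts = defaultdict(Counter)
    for path in paths:
        # Count each consecutive (source, destination) pair in the path.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Invented reaction paths for illustration only.
paths = [
    ["R1", "R2", "R3"],
    ["R1", "R2", "R4"],
    ["R1", "R5"],
]

probs = transition_probabilities(paths)
print(probs["R1"])
print(probs["R2"])
```

A latent-variable extension would fit one such transition table per cluster of paths, typically with an expectation-maximization procedure, which is beyond this sketch.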
Subjects
Gene Expression Profiling/methods , Gene Expression/physiology , Glycolysis/physiology , Models, Biological , Multienzyme Complexes/metabolism , Proteome/classification , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation , Markov Chains , Models, Statistical
ABSTRACT
The functional classification of genes on a genome-wide scale is now in its infancy, and we make a first attempt to assess existing methods and identify sources of error. To this end, we compared two independent efforts for associating proteins with functions, one implemented by FlyBase and the other by PANTHER at Celera Genomics. Both methods make inferences based on sequence similarity and the available experimental evidence. However, they differ considerably in methodology and process. Overall, assuming that the systematic error across the two methods is relatively small, we find the protein-to-function association error rate of both the FlyBase and PANTHER methods to be <2%. The primary source of error for both methods appears to be simple human error. Although homology-based inference can certainly cause errors in annotation, our analysis indicates that the frequency of such errors is relatively small compared with the number of correct inferences. Moreover, these homology errors can be minimized by careful tree-based inference, such as that implemented in PANTHER. Often, functional associations are made by one method and not the other, indicating that one of the greatest challenges lies in improving the completeness of available ontology associations.
Subjects
Databases, Protein , Drosophila Proteins/classification , Drosophila Proteins/physiology , Drosophila melanogaster/genetics , Genome , Proteome/classification , Proteome/physiology , Animals , Drosophila Proteins/genetics , Proteomics/methods , Proteomics/standards , Sequence Homology, Nucleic Acid
ABSTRACT
This paper reports an analysis of the encoded proteins (the proteome) of the genomes of human, fly, worm, yeast, and representatives of bacteria and archaea in terms of the three-dimensional structures of their globular domains together with a general sequence-based study. We show that 39% of the human proteome can be assigned to known structures. We estimate that for 77% of the proteome, there is some functional annotation, but only 26% of the proteome can be assigned to standard sequence motifs that characterize function. Of the human protein sequences, 13% are transmembrane proteins, but only 3% of the residues in the proteome form membrane-spanning regions. There are substantial differences in the composition of globular domains of transmembrane proteins between the proteomes we have analyzed. Commonly occurring structural superfamilies are identified within the proteome. The frequencies of these superfamilies enable us to estimate that 98% of the human proteome evolved by domain duplication, with four of the 10 most duplicated superfamilies specific for multicellular organisms. The zinc-finger superfamily is massively duplicated in human compared to fly and worm, and occurrence of domains in repeats is more common in metazoa than in single cellular organisms. Structural superfamilies over- and underrepresented in human disease genes have been identified. Data and results can be downloaded and analyzed via web-based applications at http://www.sbg.bio.ic.ac.uk.