ABSTRACT
A high-quality genome annotation greatly facilitates successful cell line engineering. Standard draft genome annotation pipelines are based largely on de novo gene prediction, homology, and RNA-Seq data. However, draft annotations can suffer from incorrect predictions of translated sequence, inaccurate splice isoforms, and missing genes. Here, we generated a draft annotation for the newly assembled Chinese hamster genome and used RNA-Seq, proteomics, and Ribo-Seq to experimentally annotate the genome. We identified 3529 new proteins compared to the hamster RefSeq protein annotation and 2256 novel translational events (e.g., alternative splices, mutations, and novel splices). Finally, we used this pipeline to identify the source of translated retroviruses contaminating recombinant products from Chinese hamster ovary (CHO) cell lines, including 119 type-C retroviruses, thus enabling future efforts to eliminate retroviruses and reduce the costs incurred with retroviral particle clearance. In summary, the improved annotation provides a more accurate resource for CHO cell line engineering by facilitating the interpretation of omics data, the definition of cellular pathways, and the engineering of complex phenotypes.
Subjects
Cricetulus/genetics, Genome/genetics, Proteogenomics, Proteomics/methods, Animals, CHO Cells, Cricetinae, Molecular Sequence Annotation/methods, RNA-Seq/methods, RNA Sequence Analysis/methods
ABSTRACT
Mass spectrometry is being used to identify protein biomarkers that can facilitate development of drug treatment. Mass spectrometry-based labeling proteomic experiments produce complex data that are hierarchical in nature, often from studies with small sample sizes. The generalized linear model (GLM) is the most popular approach in proteomics to compare protein abundances between groups. However, GLM does not address all the complexities of proteomics data, such as repeated measures and variance heterogeneity. Linear models for microarray data (LIMMA) and mixed models are two approaches that can address some of these data complexities to provide better statistical estimates. We compared these three statistical models (GLM, LIMMA, and mixed models) under two different normalization approaches (quantile normalization and median sweeping) to demonstrate when each approach performs best for tagged proteins. We evaluated these methods using a spiked-in data set of known protein abundances, a systemic lupus erythematosus (SLE) data set, and simulated data from multiplexed labeling experiments that use tandem mass tags (TMT). Data are available via ProteomeXchange with identifier PXD005486. We found median sweeping to be the preferred approach to data normalization, and with this normalization approach the findings overlapped across all methods, with the GLM findings being a subset of the mixed model findings. We conclude that the mixed model had the best type I error with median sweeping, whereas LIMMA had the better overall statistical properties regardless of normalization approach.
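As an illustration of the median sweeping normalization evaluated above, the following is a minimal sketch assuming one common three-step formulation: center each spectrum on its median across channels, summarize each protein by the median of its spectra, then center each channel on its median across proteins. The abstract does not spell out these steps, so the procedure, the function name `median_sweep`, and the input layout are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def median_sweep(spectra):
    """Normalize log2 reporter-ion intensities by median sweeping.

    spectra: list of (protein_id, [log2 intensity per channel]) tuples.
    This layout is hypothetical; real pipelines will differ.
    Returns {protein_id: [normalized value per channel]}.
    """
    # Step 1: center each spectrum on its own median across channels.
    centered = [
        (prot, [v - median(vals) for v in vals]) for prot, vals in spectra
    ]

    # Step 2: summarize each protein per channel as the median over its spectra.
    by_protein = defaultdict(list)
    for prot, vals in centered:
        by_protein[prot].append(vals)
    protein_values = {
        prot: [median(col) for col in zip(*rows)]
        for prot, rows in by_protein.items()
    }

    # Step 3: center each channel on its median across all proteins.
    n_channels = len(next(iter(protein_values.values())))
    channel_medians = [
        median(vals[c] for vals in protein_values.values())
        for c in range(n_channels)
    ]
    return {
        prot: [v - m for v, m in zip(vals, channel_medians)]
        for prot, vals in protein_values.items()
    }
```

The normalized protein-by-channel values would then be passed to whichever model is being compared (GLM, LIMMA, or a mixed model); that modeling step is not shown here.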
Subjects
Blood Proteins/isolation & purification, Escherichia coli Proteins/isolation & purification, Systemic Lupus Erythematosus/genetics, Statistical Models, Protein Array Analysis/statistics & numerical data, Blood Proteins/chemistry, Escherichia coli/genetics, Escherichia coli/metabolism, Escherichia coli Proteins/chemistry, Humans, Systemic Lupus Erythematosus/blood, Systemic Lupus Erythematosus/diagnosis, Systemic Lupus Erythematosus/pathology, Proteomics/methods, Proteomics/statistics & numerical data, Staining and Labeling/methods
ABSTRACT
Chinese hamster ovary cells represent the dominant host for therapeutic recombinant protein production. However, few large-scale data sets have been generated to characterize this host organism and derived CHO cell lines at the proteomics level. Consequently, an extensive label-free quantitative proteomics analysis of two cell lines (CHO-S and CHO DG44) and two Chinese hamster tissues (liver and ovary) was used to identify a total of 11,801 unique proteins containing at least two unique peptides. 9359 unique proteins were identified specifically in the cell lines, representing a 56% increase over previous work. Additionally, 6663 unique proteins were identified across liver and ovary tissues, providing the first Chinese hamster tissue proteome. Protein expression was more conserved within cell lines during both growth phases than across cell lines, suggesting large genetic differences across cell lines. Overall, both gene ontology and KEGG pathway analysis revealed enrichment of cell-cycle activity in cells. In contrast, upregulated molecular functions in tissue include glycosylation and lipid transporter activity. Furthermore, cellular components including Golgi apparatus are upregulated in both tissues. In conclusion, this large-scale proteomics analysis enables us to delineate specific changes between tissues and cells derived from these tissues, which can help explain specific tissue function and the adaptations cells incur for applications in biopharmaceutical production.
Subjects
CHO Cells/metabolism, Proteome/genetics, Proteomics, Recombinant Proteins/genetics, Animals, Cricetinae, Cricetulus/genetics, Cricetulus/metabolism, Recombinant Proteins/metabolism, Tandem Mass Spectrometry
ABSTRACT
Derivatization of peptides with isobaric tags such as iTRAQ and TMT is widely employed in proteomics due to their compatibility with multiplex quantitative measurements. We recently made publicly available a large peptide library derived from iTRAQ 4-plex labeled spectra. This resource has not been used for identifying peptides labeled with related tags of different masses, because the masses of virtually all precursor ions and most product ions differ between tags, as do the tag-specific peaks. We describe a method for interconverting spectra from iTRAQ 4-plex to TMT (6- and 10-plex) and to iTRAQ 8-plex. We interconvert spectra by appropriately mass shifting sequence ions and discarding derivative-specific peaks. After this "cleaning" of search spectra, we demonstrate that the converted libraries perform well in terms of peptide spectral matches. This is demonstrated by comparing results using sequence database searches as well as by comparing search effectiveness using original and converted libraries. At 1% FDR, TMT-labeled query spectra match 97% as many spectra against a converted iTRAQ library as against an original TMT library. Overall, this interconversion strategy provides a practical way to extend results from one derivatization method to others that share related chemistry and do not significantly alter fragmentation profiles.
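The mass-shifting idea described above can be sketched roughly as follows. This is not the authors' implementation: the peak layout (a dict with `mz`, `intensity`, `ion`, `n_tags`, `charge`), the function name `convert_peaks`, and the reporter-ion window are hypothetical, and the tag masses are the commonly quoted monoisotopic values for iTRAQ 4-plex and TMT 6-/10-plex. Each annotated sequence ion is shifted by the tag mass difference times the number of tag-bearing sites it contains (N-terminus and lysines), divided by its charge, and unannotated peaks in the iTRAQ reporter region are discarded.

```python
ITRAQ4_MASS = 144.102063  # commonly quoted monoisotopic mass of the iTRAQ 4-plex tag
TMT_MASS = 229.162932     # commonly quoted monoisotopic mass of the TMT 6-/10-plex tag


def convert_peaks(peaks, delta=TMT_MASS - ITRAQ4_MASS,
                  reporter_window=(113.5, 117.5)):
    """Convert iTRAQ 4-plex library peaks to their TMT equivalents.

    peaks: list of dicts with keys 'mz', 'intensity', 'ion' ('b', 'y', or
    None when unannotated), 'n_tags' (tag-bearing sites in the fragment),
    and 'charge'. This structure is an assumption for illustration.
    """
    low, high = reporter_window  # assumed iTRAQ 4-plex reporter-ion region
    converted = []
    for peak in peaks:
        if peak["ion"] is None:
            # Discard tag-specific (reporter) peaks; keep other
            # unannotated peaks unchanged.
            if low <= peak["mz"] <= high:
                continue
            converted.append(dict(peak))
            continue
        # Shift sequence ions by the tag mass difference per tagged site.
        shift = peak["n_tags"] * delta / peak["charge"]
        converted.append(dict(peak, mz=peak["mz"] + shift))
    return converted
```

A full conversion would also need to shift the precursor m/z in the same way and handle other tag-derived fragments; both are omitted from this sketch.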
Subjects
Peptide Library, Proteomics/methods, Protein Databases, Mass Spectrometry, Molecular Weight, Staining and Labeling
ABSTRACT
Mammalian expression systems such as Chinese hamster ovary (CHO), mouse myeloma (NS0), and human embryonic kidney (HEK) cells serve a critical role in the biotechnology industry as the production host of choice for recombinant protein therapeutics. Most recombinant biologics are glycoproteins that contain complex oligosaccharide or glycan attachments representing a principal component of product quality. Both N-glycans and O-glycans are present in these mammalian cells, but the engineering of N-linked glycosylation is of critical interest in industry, and many efforts have been directed at improving this pathway, because altering the N-glycan composition can change the product quality of recombinant biotherapeutics in mammalian hosts. In addition, sialylation and fucosylation represent components of the glycosylation pathway that affect circulatory half-life and antibody-dependent cellular cytotoxicity, respectively. In this chapter, we first offer an overview of the glycosylation, sialylation, and fucosylation networks in mammalian cells, specifically CHO cells, which are extensively used in antibody production. Next, genetic engineering technologies used in CHO cells to modulate glycosylation pathways are described. We provide examples of their use in CHO cell engineering approaches to highlight these technologies further. Specifically, we describe efforts to overexpress glycosyltransferases and sialyltransferases, and efforts to decrease sialidase cleavage and fucosylation. Finally, this chapter covers new strategies and future directions of CHO cell glycoengineering, such as the application of glycoproteomics, glycomics, and the integration of 'omics' approaches to identify, quantify, and characterize the glycosylated proteins in CHO cells.
Subjects
Glycoproteins, Animals, CHO Cells, Cricetinae, Cricetulus, Glycoproteins/genetics, Glycoproteins/metabolism, Glycosylation, Recombinant Proteins/genetics
ABSTRACT
Chinese hamster ovary (CHO) cells are the predominant production vehicle for biotherapeutics. Quantitative proteomics data were obtained from two CHO cell lines (CHO-S and CHO DG44) and compared with seven Chinese hamster (Cricetulus griseus) tissues (brain, heart, kidney, liver, lung, ovary and spleen) by tandem mass tag (TMT) labeling followed by mass spectrometry, providing a comprehensive hamster tissue and cell line proteomics atlas. Of the 8470 unique proteins identified, high similarity was observed between CHO-S and CHO DG44 and included increases in proteins involved in DNA replication, cell cycle, RNA processing, and chromosome processing. Alternatively, gene ontology and pathway analysis in tissues indicated increased protein intensities related to important tissue functionalities. Proteins enriched in the brain included those involved in acidic amino acid metabolism, Golgi apparatus, and ion and phospholipid transport. The lung showed enrichment in proteins involved in BCAA catabolism, ROS metabolism, vesicle trafficking, and lipid synthesis while the ovary exhibited enrichments in extracellular matrix and adhesion proteins. The heart proteome included vasoconstriction, complement activation, and lipoprotein metabolism enrichments. These detailed comparisons of CHO cell lines and hamster tissues will enhance understanding of the relationship between proteins and tissue function and pinpoint potential pathways of biotechnological relevance for future cell engineering.
Subjects
CHO Cells/metabolism, Cricetulus/metabolism, Animals, Brain/metabolism, Cell Cycle, Mammalian Chromosomes/metabolism, DNA Replication, Female, Kidney/metabolism, Lung/metabolism, Myocardium/metabolism, Ovary/metabolism, Proteins/metabolism, Proteomics, Spleen/metabolism, Tandem Mass Spectrometry
ABSTRACT
BACKGROUND: Meningiomas are heterogeneous, with differences in anatomical, histopathological, and clinical characteristics. Such spatial variability in meningioma biology is thought to result from differences in the expression of critical developmental regulators. We hypothesized that the variability in meningioma biology would follow gradients such as in embryology and tested a cohort of 366 meningiomas for histopathological and immunohistochemical gradients. METHODS: The medical records from 366 patients treated for meningiomas from 2003 to 2016 were retrospectively analyzed for age, gender, anatomical localization, recurrence-free survival, overall survival, histopathological diagnosis, and immunohistochemistry findings for 6 markers: epithelial membrane antigen (EMA), progesterone receptor (PR), CD34, S100, p53, and Ki-67 labeling index. RESULTS: EMA, PR, S100, p53, and CD34 were expressed in 94%, 73%, 49%, 26%, and 23% of the tumors, respectively. p53 expression correlated positively with Ki-67 and World Health Organization (WHO) grade (rτ = 0.31 and rτ = 0.4, respectively). PR positivity correlated inversely with S100, p53, Ki-67, and WHO grade (rτ = -0.19, rτ = -0.14, rτ = -0.15, and rτ = -0.16, respectively). All secretory meningiomas were positive for EMA and PR and negative for S100, and this pattern exhibited a rostrocaudal gradient. The overall proportion of EMA+PR+S100- cases was significantly lower in the cranial vault (30.3%) than in the skull base (45.89%; P = 0.021). The proportion of WHO grade II-III tumors was greater in cranial vault than in skull base meningiomas. CONCLUSIONS: Unsupervised methods detected an association between the anatomical location and tumor biology in meningiomas. Unlike the categorical associations that former studies had indicated, the present study revealed a rostrocaudal gradient in both the cranial vault and the skull base, correlating with human developmental biology.
Subjects
Meningeal Neoplasms/immunology, Meningioma/immunology, Adolescent, Adult, Aged, Aged 80 and over, Female, Follow-Up Studies, Humans, Immunohistochemistry, Male, Meningeal Neoplasms/mortality, Meningeal Neoplasms/pathology, Meningeal Neoplasms/surgery, Meninges/immunology, Meninges/pathology, Meningioma/mortality, Meningioma/pathology, Meningioma/surgery, Middle Aged, Retrospective Studies, Survival Analysis, Young Adult
ABSTRACT
The number of proteins encoded in the human genome has been estimated at between 20,000 and 25,000, despite estimates that the entire proteome contains more than a million proteins. One reason for this difference is the many post-translational modifications of proteins that contribute to proteome complexity. Among these, glycosylation is of particular relevance because it modifies a large number of cellular proteins. Glycogenomics, glycoproteomics, glycomics, and glycoinformatics are helping to accelerate our understanding of the cellular events involved in generating the glycoproteome, the variety of glycan structures possible, and the important roles that glycans play in therapeutics and disease. Indeed, interest in glycosylation has expanded rapidly over the past decade, as large amounts of experimental 'omics data relevant to glycosylation processing have accumulated. Furthermore, new and more sophisticated glycoinformatics tools and databases are now available for glycan and glycosylation pathway analysis. Here, we summarize some of the recent advances in both experimental profiling and analytical methods involving N- and O-linked glycosylation processing for biotechnological and medically relevant cells, together with the unique opportunities and challenges associated with interrogating and assimilating multiple, disparate high-throughput glycosylation data sets. This emerging era of advanced glycomics will lead to the discovery of key glycan biomarkers linked to diseases and help establish a better understanding of physiology and improved control of glycosylation processing in diverse cells and tissues important to disease and production of recombinant therapeutics. Furthermore, methodologies that facilitate the integration of glycomics measurements with other 'omics data sets will lead to deeper insights into the nature of glycosylation as a complex cellular process.
Subjects
Glycomics/methods, Polysaccharides/metabolism, Post-Translational Protein Processing/physiology, Proteome/metabolism, Biomarkers/metabolism, Glycosylation, Humans
ABSTRACT
Recent advancements in proteomics have enabled the generation of high-quality data sets useful for applications ranging from target and monoclonal antibody (mAb) discovery to bioprocess optimization. Comparative proteomics approaches have recently been used to identify novel disease targets in oncology and other disease conditions. Proteomics has also been applied as a new avenue for mAb discovery. Finally, CHO and Escherichia coli cells represent the dominant production hosts for biopharmaceutical development, yet the physiology of these cell types has yet to be fully established. Proteomics approaches can provide new insights into these cell types, aiding in recombinant protein production, cell growth regulation, and medium formulation. Optimization of sample preparation and developments in protein databases are enhancing the quantity and accuracy of proteomic results. In these ways, innovations in proteomics are enriching biotechnology and bioprocessing research across a wide spectrum of applications.