ABSTRACT
The ability to reliably and reproducibly measure any protein of the human proteome in any tissue or cell type would be transformative for understanding systems-level properties as well as specific pathways in physiology and disease. Here, we describe the generation and verification of a compendium of highly specific assays that enable quantification of 99.7% of the 20,277 annotated human proteins by selected reaction monitoring (SRM), a widely accessible, sensitive, and robust targeted mass spectrometric method. This human SRMAtlas provides definitive coordinates that conclusively identify the respective peptide in biological samples. We report data on 166,174 proteotypic peptides, providing multiple independent assays to quantify any human protein as well as numerous splice variants, non-synonymous mutations, and post-translational modifications. The data are freely accessible as a resource at http://www.srmatlas.org/, and we demonstrate its utility by examining the network response to inhibition of cholesterol synthesis in liver cells and to docetaxel in prostate cancer cell lines.
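The notion of a proteotypic peptide can be illustrated with a minimal sketch: digest each protein in silico with the standard trypsin rule (cleave after K or R, but not before P) and keep only peptides that map to a single protein. This is a simplified illustration, not the SRMAtlas pipeline; the length window, function names, accessions, and toy sequences are hypothetical, and real assay selection also weighs detectability, missed cleavages, and modifications.

```python
import re
from collections import defaultdict

# Hypothetical length window for SRM-friendly peptides (illustrative only).
MIN_LEN, MAX_LEN = 7, 25

def tryptic_digest(sequence: str) -> list[str]:
    """Fully cleaved in silico trypsin digest: split after K/R unless the next residue is P."""
    # Zero-width split points (re.split supports empty matches since Python 3.7).
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if MIN_LEN <= len(p) <= MAX_LEN]

def unique_peptides(proteome: dict[str, str]) -> dict[str, list[str]]:
    """Map each protein accession to the peptides that occur in no other protein."""
    owners: dict[str, set[str]] = defaultdict(set)
    for accession, sequence in proteome.items():
        for peptide in tryptic_digest(sequence):
            owners[peptide].add(accession)
    result: dict[str, list[str]] = {accession: [] for accession in proteome}
    for peptide, accessions in owners.items():
        if len(accessions) == 1:
            (only,) = accessions
            result[only].append(peptide)
    return result

# Toy proteome with hypothetical accessions; "AAAAAAAK" is shared, so it is excluded.
toy = {"P1": "AAAAAAAKGGGGGGGR", "P2": "AAAAAAAKCCCCCCCR"}
print(unique_peptides(toy))  # {'P1': ['GGGGGGGR'], 'P2': ['CCCCCCCR']}
```

The shared peptide is dropped because it cannot distinguish the two proteins, which is the property that makes a peptide useful as a protein-specific assay target.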
Subjects
Databases, Protein, Proteome, Access to Information, Antineoplastic Agents/therapeutic use, Cell Line, Tumor, Cholesterol/biosynthesis, Docetaxel, Female, Humans, Internet, Liver/drug effects, Male, Mutation, Prostatic Neoplasms/drug therapy, RNA Splicing, Taxoids/therapeutic use
ABSTRACT
Preterm birth (PTB) complications are the leading cause of long-term morbidity and mortality in children. Using whole blood samples, we integrated whole-genome sequencing (WGS), RNA sequencing (RNA-seq), and DNA methylation data for 270 PTB and 521 control families. We analyzed this combined dataset to identify genomic variants associated with PTB, and we performed secondary analyses to identify variants associated with very early PTB (VEPTB) and with other disease subcategories that may contribute to PTB. We identified differentially expressed genes (DEGs) and differentially methylated genomic loci, and we performed expression and methylation quantitative trait loci analyses to link genomic variants to these expression and methylation changes. We performed enrichment tests to identify overlaps between new and known PTB candidate gene systems. We identified 160 significant genomic variants associated with PTB-related phenotypes. The most significant variants, DEGs, and differentially methylated loci were associated with VEPTB. Integration of all data types identified a set of 72 candidate biomarker genes for VEPTB, encompassing both newly identified genes and genes previously associated with PTB. Notably, the PTB-associated genes RAB31 and RBPJ were identified by all three data types (WGS, RNA-seq, and methylation). Pathways associated with VEPTB include EGFR and prolactin signaling pathways, inflammation- and immunity-related pathways, chemokine signaling, IFN-γ signaling, and Notch1 signaling. Progress in identifying the molecular components of a complex disease is aided by integrated analyses of multiple molecular data types and clinical data. With these data, and by stratifying PTB by subphenotype, we have identified associations between VEPTB and the underlying biology.
Subjects
Genetic Predisposition to Disease/genetics, Premature Birth/genetics, DNA Methylation/genetics, Female, Genomics/methods, Humans, Infant, Newborn, Male, Phenotype, Polymorphism, Single Nucleotide/genetics, Signal Transduction/genetics, Whole Genome Sequencing/methods
ABSTRACT
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now gives all users access to large-scale computing resources, limited only by the available Amazon Web Services infrastructure. The Trans-Proteomic Pipeline can run in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources, to tackle large search studies. Alternatively, it can be run on a local computer, with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud instance types, and present online tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job, and we demonstrate the use of the cloud-enabled Trans-Proteomic Pipeline by processing over 1,100 tandem mass spectrometry files through four proteomic search engines in 9 h at very low cost.
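The resource and cost estimate described above rests on simple arithmetic, sketched here under loudly simplifying assumptions: every search takes the same CPU time, work spreads perfectly across all virtual CPUs, and billing is proportional to wall-clock time. The instance label, price, and per-search CPU time are hypothetical placeholders, not Trans-Proteomic Pipeline defaults or current AWS prices, and this is not the TPP's own estimator.

```python
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str            # hypothetical label, not a real EC2 instance type
    vcpus: int
    usd_per_hour: float  # hypothetical price; check current AWS pricing

def estimate_job(n_files: int, n_engines: int, cpu_minutes_per_search: float,
                 instance: InstanceType, n_instances: int) -> tuple[float, float]:
    """Return (wall_clock_hours, cost_usd), assuming perfect parallelism and no overhead."""
    # Each file is searched once by each engine.
    total_cpu_hours = n_files * n_engines * cpu_minutes_per_search / 60
    wall_hours = total_cpu_hours / (instance.vcpus * n_instances)
    cost = wall_hours * n_instances * instance.usd_per_hour
    return wall_hours, cost

node = InstanceType(name="example-8vcpu", vcpus=8, usd_per_hour=0.50)
hours, cost = estimate_job(n_files=100, n_engines=4, cpu_minutes_per_search=6,
                           instance=node, n_instances=5)
print(f"{hours:.1f} h, ${cost:.2f}")  # prints "1.0 h, $2.50"
```

Under these assumptions, doubling `n_instances` halves the wall-clock time while leaving the cost unchanged, which is why elastic scaling can shorten an analysis without raising the bill; real runs add startup overhead and data transfer.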
Subjects
Internet, Proteomics/methods, Software, Statistics as Topic, Computers, User-Computer Interface
ABSTRACT
Methanogens catalyze the critical methane-producing step (called methanogenesis) in the anaerobic decomposition of organic matter. Here, we present the first predictive model of global gene regulation of methanogenesis in a hydrogenotrophic methanogen, Methanococcus maripaludis. We generated a comprehensive list of genes (protein-coding and noncoding) for M. maripaludis through integrated analysis of the transcriptome structure and a newly constructed Peptide Atlas. The environment and gene-regulatory influence network (EGRIN) model of the strain was constructed from a compendium of transcriptome data collected over 58 different steady-state and time-course experiments, performed in chemostats or batch cultures under a spectrum of environmental perturbations that modulated methanogenesis. Analyses of the EGRIN model have revealed novel components of methanogenesis, including at least three additional protein-coding genes of previously unknown function as well as one noncoding RNA. We discovered that at least five regulatory mechanisms act in a combinatorial scheme to coordinate key steps of methanogenesis with other processes such as motility, ATP biosynthesis, and carbon assimilation. Through a combination of genetic and environmental perturbation experiments, we have validated the EGRIN-predicted role of two novel transcription factors in the phosphate-dependent repression of formate dehydrogenase, a key enzyme in the methanogenesis pathway. The EGRIN model demonstrates regulatory affiliations within methanogenesis as well as between methanogenesis and other cellular functions.
Subjects
Genes, Archaeal, Metabolic Networks and Pathways/genetics, Methane/biosynthesis, Methanococcus/enzymology, Methanococcus/genetics, Archaeal Proteins/genetics, Archaeal Proteins/metabolism, Formate Dehydrogenases/genetics, Gene Expression Profiling, Gene Expression Regulation, Archaeal, Gene-Environment Interaction, Hydrogen/metabolism, Methanococcus/metabolism, Models, Genetic, Sequence Deletion
ABSTRACT
Assembly of genes into operons is generally viewed as an important process during the continual adaptation of microbes to changing environmental challenges. However, the genome reorganization events that drive this process are also the roots of instability for existing operons. We found a statistically significant trend that correlates the proportion of genes encoded in operons in archaea with their phylogenetic lineage. We have further characterized how microbes deal with operon instability by mapping and comparing the transcriptome architectures of four phylogenetically diverse extremophiles that span the range of operon stabilities observed across archaeal lineages: a photoheterotrophic halophile (Halobacterium salinarum NRC-1), a hydrogenotrophic methanogen (Methanococcus maripaludis S2), an acidophilic and aerobic thermophile (Sulfolobus solfataricus P2), and an anaerobic hyperthermophile (Pyrococcus furiosus DSM 3638). We demonstrate how the evolution of transcriptional elements (promoters and terminators) generates new operons, restores the coordinated regulation of translocated, inverted, and newly acquired genes, and introduces completely novel regulation for even some of the most conserved operonic genes, such as those encoding subunits of the ribosome. The inverse correlation (r = -0.92) between the proportion of operons with such internally located transcriptional elements and the fraction of conserved operons in each of the four archaea reveals an unprecedented view into varying stages of operon evolution. Importantly, our integrated analysis has revealed that organisms adapted to higher growth temperatures have lower tolerance for genome reorganization events that disrupt operon structures.
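The reported r is a Pearson correlation computed across the four archaea, each contributing one point: the proportion of operons with internal transcriptional elements versus the fraction of conserved operons. A minimal sketch of the statistic follows; the input values are hypothetical placeholders, not the study's measurements.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical values for four organisms (one point each), not data from the study.
internal_elements = [0.10, 0.20, 0.30, 0.40]
conserved_fraction = [0.80, 0.60, 0.40, 0.20]
print(round(pearson_r(internal_elements, conserved_fraction), 2))  # -1.0 for this perfectly linear toy data
```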
Subjects
Evolution, Molecular, Genome, Archaeal, Transcriptome, Adenosine Triphosphatases/genetics, Archaea/classification, Archaea/genetics, Gene Expression Profiling, Gene Expression Regulation, Archaeal, Genes, Archaeal, Operon, Phylogeny, Promoter Regions, Genetic, Protein Biosynthesis/genetics, RNA Transport, Transcription, Genetic, Transcriptional Activation
ABSTRACT
Tumor-infiltrating neoantigen-reactive T cells can mediate regression of metastatic gastrointestinal cancers yet remain poorly characterized. We performed immunological screening against personalized neoantigens in combination with single-cell RNA sequencing on tumor-infiltrating lymphocytes from bile duct and pancreatic cancer patients to characterize the transcriptomic landscape of neoantigen-reactive T cells. We found that most neoantigen-reactive CD8+ T cells displayed an exhausted state with significant CXCL13 and GZMA co-expression compared with non-neoantigen-reactive bystander cells. Most neoantigen-reactive CD4+ T cells from a patient with bile duct cancer also exhibited an exhausted phenotype but with overexpression of HOPX or ADGRG1 while lacking IL7R expression. Thus, neoantigen-reactive T cells infiltrating gastrointestinal cancers harbor distinct transcriptomic signatures, which may provide new opportunities for harnessing these cells for therapy.
Subjects
CD8-Positive T-Lymphocytes, Gastrointestinal Neoplasms, Antigens, Neoplasm, Gastrointestinal Neoplasms/genetics, Humans, Lymphocytes, Tumor-Infiltrating, Transcriptome
ABSTRACT
The ISB Cancer Genomics Cloud (ISB-CGC) is one of three pilot projects funded by the National Cancer Institute to explore new approaches to computing on large cancer datasets in a cloud environment. With a focus on Data as a Service, the ISB-CGC offers multiple avenues for accessing and analyzing The Cancer Genome Atlas, TARGET, and other important references such as GENCODE and COSMIC using the Google Cloud Platform. This open approach lets researchers choose the methods best suited to the task at hand: analyzing terabytes of data with complex workflows, developing new analysis methods in common languages such as Python, R, and SQL, or using an interactive web application to create synthetic patient cohorts and explore the wealth of available genomic data. Links to resources and documentation can be found at www.isb-cgc.org. Cancer Res; 77(21); e7-10. ©2017 AACR.
Subjects
Cloud Computing, Computational Biology, Genomics, Neoplasms/genetics, Datasets as Topic, Genome, Human, Humans, Internet, National Cancer Institute (U.S.), Research/trends, Software, United States
ABSTRACT
Democratization of genomics technologies has enabled the rapid determination of genotypes. More recently, the democratization of comprehensive proteomics technologies is enabling the determination of the cellular phenotype and the molecular events that define its dynamic state. Core proteomic technologies include MS to define protein sequence, protein:protein interactions, and protein PTMs. Key enabling technologies for proteomics are bioinformatic pipelines to identify, quantitate, and summarize these events. The Trans-Proteomic Pipeline (TPP) is a robust, open-source, standardized data processing pipeline for large-scale, reproducible, quantitative MS proteomics. It supports all major operating systems and instrument vendors via open data formats. Here, we provide a review of the overall proteomics workflow supported by the TPP, its major tools, and how it can be used in its various modes, from desktop to cloud computing. We describe new features of the TPP, including data visualization functionality. We conclude by describing some common perils that affect the analysis of MS/MS datasets, as well as some major upcoming features.