Results 1 - 20 of 57
1.
Bioinform Adv ; 4(1): vbad190, 2024.
Article in English | MEDLINE | ID: mdl-38282976

ABSTRACT

Motivation: Anti-cancer drug response prediction is a central problem within stratified medicine. Transcriptomic profiles of cancer cell lines are typically used for drug response prediction, but we hypothesize that proteomics or phosphoproteomics might be more suitable as they give a more direct insight into cellular processes. However, there has not yet been a systematic comparison between all three of these datatypes using consistent evaluation criteria. Results: Due to the limited number of cell lines with phosphoproteomics profiles we use learning curves, a plot of predictive performance as a function of dataset size, to compare the current performance and predict the future performance of the three omics datasets with more data. We use neural networks and XGBoost and compare them against a simple rule-based benchmark. We show that phosphoproteomics slightly outperforms RNA-seq and proteomics using the 38 cell lines with profiles of all three omics data types. Furthermore, using the 877 cell lines with proteomics and RNA-seq profiles, we show that RNA-seq slightly outperforms proteomics. With the learning curves we predict that the mean squared error using the phosphoproteomics dataset would decrease by ∼15% if a dataset of the same size as the proteomics/transcriptomics was collected. For the cell lines with proteomics and RNA-seq profiles the learning curves reveal that for smaller dataset sizes neural networks outperform XGBoost and vice versa for larger datasets. Furthermore, the trajectory of the XGBoost curve suggests that it will improve faster than the neural networks as more data are collected. Availability and implementation: See https://github.com/Nik-BB/Learning-curves-for-DRP for the code used.
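Learning curves like those in the entry above are often summarised by fitting a curve to error measured at increasing training-set sizes and then extrapolating. The minimal sketch below assumes a pure power law, error ≈ a·n^(-b), fitted by least squares on log-log axes; the abstract does not say which curve family the authors fit, and `fit_power_law`, `predict_error` and `relative_reduction` are hypothetical helper names, not code from the linked repository.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~= a * n**(-b) by ordinary least squares on log-log axes.

    Assumption: a pure power law with no irreducible-error offset.
    Returns the pair (a, b).
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_error(a, b, n):
    """Extrapolate the fitted curve to a training-set size n."""
    return a * n ** (-b)


def relative_reduction(a, b, n_now, n_future):
    """Fractional drop in predicted error when growing from n_now to n_future."""
    return 1.0 - predict_error(a, b, n_future) / predict_error(a, b, n_now)
```

With errors measured on subsamples of a small dataset, `relative_reduction(a, b, 38, 877)` would give the projected fractional improvement from growing 38 cell lines to 877 under this assumed power law; the published ∼15% figure comes from the authors' own fits, not from this sketch.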

2.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37930029

ABSTRACT

The principal use of mass cytometry is to identify distinct cell types and changes in their composition, phenotype and function in different samples and conditions. Combining data from different studies has the potential to increase the power of these discoveries in diverse fields such as immunology, oncology and infection. However, current tools lack scalable, reproducible and automated methods to integrate and study data sets from mass cytometry that often use heterogeneous approaches to study similar samples. To address these limitations, we present two novel developments: (1) a pre-trained cell identification model named Immunopred that allows automated identification of immune cells without user-defined prior knowledge of expected cell types and (2) a fully automated cytometry meta-analysis pipeline built around Immunopred. We evaluated this pipeline on six COVID-19 study data sets comprising 270 unique samples and uncovered novel significant phenotypic changes in the wider immune landscape of COVID-19 that were not identified when each study was analyzed individually. Applied widely, our approach will support the discovery of novel findings in research areas where cytometry data sets are available for integration.


Subject(s)
COVID-19 , Neural Networks, Computer , Humans , Flow Cytometry/methods , Phenotype
3.
PLoS Comput Biol ; 19(6): e1010459, 2023 06.
Article in English | MEDLINE | ID: mdl-37352361

ABSTRACT

Phosphoproteomics allows one to measure the activity of kinases that drive the fluxes of signal transduction pathways involved in biological processes such as immune function, senescence and cell growth. However, deriving knowledge of signalling network circuitry from these data is challenging due to a scarcity of phosphorylation sites that define kinase-kinase relationships. To address this issue, we previously identified around 6,000 phosphorylation sites as markers of kinase-kinase relationships (that may be conceptualised as network edges), from which empirical cell-model-specific weighted kinase networks may be reconstructed. Here, we assess whether the application of community detection algorithms to such networks can identify new components linked to canonical signalling pathways. Phosphoproteomics data from acute myeloid leukaemia (AML) cells treated separately with PI3K, AKT, MEK and ERK inhibitors were used to reconstruct individual kinase networks. We used modularity maximisation to detect communities in each network, and selected the community containing the main target of the inhibitor used to treat cells. These analyses returned communities that contained known canonical signalling components. Interestingly, in addition to canonical PI3K/AKT/mTOR members, the community assignments returned TTK (also known as MPS1) as a likely component of PI3K/AKT/mTOR signalling. We drew similar insights from an external phosphoproteomics dataset from breast cancer cells treated with rapamycin and oestrogen. We confirmed this observation with wet-lab experiments showing that TTK phosphorylation was decreased in AML cells treated with AKT and MTOR inhibitors. This study illustrates the application of community detection algorithms to the analysis of empirical kinase networks to uncover new members linked to canonical signalling pathways.


Subject(s)
Leukemia, Myeloid, Acute , Proto-Oncogene Proteins c-akt , Humans , Proto-Oncogene Proteins c-akt/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Signal Transduction , TOR Serine-Threonine Kinases/metabolism , Phosphotransferases/metabolism
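The kinase-network entry above detects communities by modularity maximisation. As a minimal sketch of the quantity being maximised, the function below scores a given partition of a weighted, undirected graph with Newman's modularity Q; it does not search for the best partition (heuristics such as Louvain do that), and the edge-list input format and the name `modularity` are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of a weighted, undirected graph.

    edges: iterable of (u, v, weight) tuples, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Computes Q = sum_c [ W_c / m - (S_c / 2m) ** 2 ], where W_c is the total
    weight of edges inside community c, S_c is the summed node strength in c,
    and m is the total edge weight.
    """
    total = 0.0
    inside = defaultdict(float)    # edge weight internal to each community
    strength = defaultdict(float)  # summed node strength per community
    for u, v, w in edges:
        total += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    if total == 0:
        raise ValueError("graph has no edge weight")
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)
```

For two unit-weight triangles joined by a single bridge edge, splitting the graph into the two triangles gives Q = 5/14, while putting every node in one community gives Q = 0; a maximisation routine would prefer the first partition.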
4.
Clin Mol Hepatol ; 29(2): 417-432, 2023 04.
Article in English | MEDLINE | ID: mdl-36727210

ABSTRACT

BACKGROUND/AIMS: Immune and inflammatory cells respond to multiple pathological hits in the development of nonalcoholic steatohepatitis (NASH) and fibrosis. Relatively little is known about how their type and function change through the non-alcoholic fatty liver disease (NAFLD) spectrum. Here we used multi-dimensional mass cytometry and a tailored bioinformatic approach to study circulating immune cells sampled from healthy individuals and people with NAFLD. METHODS: Cytometry by time of flight using 36 metal-conjugated antibodies was applied to peripheral blood mononuclear cells (PBMCs) from biopsy-proven NASH fibrosis (late disease), steatosis (early disease), and healthy controls. Supervised and unsupervised analyses were used, findings confirmed, and mechanisms assessed using independent healthy and disease PBMC samples. RESULTS: Of 36 PBMC clusters, 21 changed between controls and disease samples. Significant differences were observed between disease stages with changes in T cells and myeloid cells throughout disease and B cell changes in late stages. Semi-supervised gating and re-clustering showed that disease stages were associated with fewer monocytes with active signalling and more inactive NK cells; B and T cells bearing activation markers were reduced in late stages, while B cells bearing co-stimulatory molecules were increased. Functionally, disease states were associated with fewer activated mucosal-associated invariant T cells and reduced toll-like receptor-mediated cytokine production in late disease. CONCLUSION: A range of innate and adaptive immune changes begin early in NAFLD, and disease stages are associated with a functionally less active phenotype compared to controls. Further study of the immune response in NAFLD spectrum may give insight into mechanisms of disease with potential clinical application.


Subject(s)
Non-alcoholic Fatty Liver Disease , Humans , Non-alcoholic Fatty Liver Disease/pathology , Liver/pathology , Leukocytes, Mononuclear , Phenotype , Fibrosis
5.
Value Health ; 26(7): 1057-1066, 2023 07.
Article in English | MEDLINE | ID: mdl-36804528

ABSTRACT

OBJECTIVES: Clinical outcome assessment (COA) developers must ensure that measures assess aspects of health that are meaningful to the target patient population. Although the methodology for doing this is well understood for certain COAs, such as patient-reported outcome measures, there are fewer examples of this practice in the development of digital endpoints using mobile sensor technology such as physical activity monitors. This study explored the utility of social media data, specifically, posts on online health boards, in understanding meaningful aspects of health related to physical activity in 3 different chronic diseases: fibromyalgia, chronic obstructive pulmonary disease, and chronic heart failure. METHODS: We used machine learning and manual coding to summarize the content of posts extracted from 4 online health boards. Where available, patient age and sex were retrieved from post content or user profiles. We utilized analytical approaches to assess the robustness of findings to differences in the characteristics of online samples compared to the true patient population. Finally, we assessed concept saturation by measuring the convergence of autocorrelations. RESULTS: We identify a number of aspects of health described as important by patients in our samples, and summarize these into concepts for measurement. For chronic heart failure, these included purposeful walking duration and speed, fatigue, difficulty going upstairs, standing, and aspects of physical exercise. Overall and age-adjusted results did not differ considerably for each disease group. CONCLUSIONS: This study illustrates the potential of performing concept elicitation research using social media data, which may provide valuable insight to inform COA development.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Humans , Fatigue , Patient Reported Outcome Measures , Exercise , Machine Learning
6.
BMJ Open ; 11(11): e056601, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34740937

ABSTRACT

OBJECTIVES: Online health forums provide rich and untapped real-time data on population health. Through novel data extraction and natural language processing (NLP) techniques, we characterise the evolution of mental and physical health concerns relating to the COVID-19 pandemic among online health forum users. SETTING AND DESIGN: We obtained data from three leading online health forums: HealthBoards, Inspire and HealthUnlocked, from the period 1 January 2020 to 31 May 2020. Using NLP, we analysed the content of posts related to COVID-19. PRIMARY OUTCOME MEASURES: (1) Proportion of forum posts containing COVID-19 keywords; (2) proportion of forum users making their very first post about COVID-19; (3) proportion of COVID-19-related posts containing content related to physical and mental health comorbidities. RESULTS: Data from 739 434 posts created by 53 134 unique users were analysed. A total of 35 581 posts (4.8%) contained a COVID-19 keyword. Posts discussing COVID-19 and related comorbid disorders spiked in early March to mid-March around the time of global implementation of lockdowns, prompting a large number of users to post on online health forums for the first time. Over a quarter of COVID-19-related thread titles mentioned a physical or mental health comorbidity. CONCLUSIONS: We demonstrate that it is feasible to characterise the content of online health forum user posts regarding COVID-19 and measure changes over time. The pandemic and corresponding public response have had a significant impact on posters' queries regarding mental health. Social media data sources such as online health forums can be harnessed to strengthen population-level mental health surveillance.


Subject(s)
COVID-19 , Social Media , Communicable Disease Control , Humans , Natural Language Processing , Pandemics , SARS-CoV-2
7.
Nat Commun ; 11(1): 5420, 2020 10 27.
Article in English | MEDLINE | ID: mdl-33110080

ABSTRACT

Biomarkers are needed for predicting the effectiveness of disease modifying antirheumatic drugs (DMARDs). Here, using functional lipid mediator profiling and deeply phenotyped patients with early rheumatoid arthritis (RA), we observe that peripheral blood specialized pro-resolving mediator (SPM) concentrations are linked with both DMARD responsiveness and disease pathotype. Machine learning analysis demonstrates that baseline plasma concentrations of resolvin D4, 10S,17S-dihydroxy-docosapentaenoic acid, 15R-Lipoxin (LX)A4 and n-3 docosapentaenoic-derived Maresin 1 are predictive of DMARD responsiveness at 6 months. Assessment of circulating SPM concentrations 6 months after treatment initiation establishes that differences between responders and non-responders are maintained, with a decrease in SPM concentrations in patients resistant to DMARD therapy. These findings elucidate the potential utility of plasma SPM concentrations as biomarkers of DMARD responsiveness in RA.


Subject(s)
Antirheumatic Agents/administration & dosage , Arthritis, Rheumatoid/blood , Arthritis, Rheumatoid/drug therapy , Synovial Fluid/drug effects , Antirheumatic Agents/blood , Arthritis, Rheumatoid/pathology , Cohort Studies , Docosahexaenoic Acids/blood , Fatty Acids, Unsaturated/blood , Humans , Lipoxins/blood , Treatment Outcome
8.
NPJ Breast Cancer ; 6: 38, 2020.
Article in English | MEDLINE | ID: mdl-32885042

ABSTRACT

Widespread mammographic screening programs and improved self-monitoring allow for breast cancer to be detected earlier than ever before. Breast-conserving surgery is a successful treatment for select women. However, up to 40% of women develop local recurrence after surgery despite apparently tumor-free margins. This suggests that morphologically normal breast may harbor early alterations that contribute to increased risk of cancer recurrence. We conducted a comprehensive transcriptomic and proteomic analysis to characterize 57 fresh-frozen tissues from breast cancers and matched histologically normal tissues resected proximal to (<2 cm) and distant from (5-10 cm) the primary tumor, using tissues from cosmetic reduction mammoplasties as baseline. Four distinct transcriptomic subtypes are identified within matched normal tissues: metabolic; immune; matrisome/epithelial-mesenchymal transition, and non-coding enriched. Key components of the subtypes are supported by proteomic and tissue composition analyses. We find that the metabolic subtype is associated with poor prognosis (p < 0.001, HR 6.1). Examination of genes representing the metabolic signature identifies several genes able to prognosticate outcome from histologically normal tissues. A subset of these have been reported for their predictive ability in cancer but, to the best of our knowledge, these have not been reported as altered in matched normal tissues. This study takes an important first step toward characterizing matched normal tissues resected at pre-defined margins from the primary tumor. Unlocking the predictive potential of unexcised tissue could prove key to driving the realization of personalized medicine for breast cancer patients, allowing for more biologically-driven analyses of tissue margins than morphology alone.

9.
Nat Biotechnol ; 38(4): 493-502, 2020 04.
Article in English | MEDLINE | ID: mdl-31959955

ABSTRACT

Understanding how oncogenic mutations rewire regulatory-protein networks is important for rationalizing the mechanisms of oncogenesis and for individualizing anticancer treatments. We report a chemical phosphoproteomics method to elucidate the topology of kinase-signaling networks in mammalian cells. We identified >6,000 protein phosphorylation sites that can be used to infer >1,500 kinase-kinase interactions and devised algorithms that can reconstruct kinase network topologies from these phosphoproteomics data. Application of our methods to primary acute myeloid leukemia and breast cancer tumors quantified the relationship between kinase expression and activity, and enabled the identification of hitherto unknown kinase network topologies associated with drug-resistant phenotypes or specific genetic mutations. Using orthogonal methods we validated that PIK3CA wild-type cells adopt MAPK-dependent circuitries in breast cancer cells and that the kinase TTK is important in acute myeloid leukemia. Our phosphoproteomic signatures of network circuitry can identify kinase topologies associated with both phenotypes and genotypes of cancer cells.


Subject(s)
Neoplasms/metabolism , Phosphotransferases/metabolism , Algorithms , Biomarkers, Tumor/antagonists & inhibitors , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Drug Resistance, Neoplasm , Genomics , Humans , Neoplasms/genetics , Neoplasms/pathology , Phosphorylation/drug effects , Phosphotransferases/antagonists & inhibitors , Phosphotransferases/genetics , Protein Kinase Inhibitors/pharmacology , Proteomics , Signal Transduction , Tumor Cells, Cultured
10.
Nucleic Acids Res ; 46(D1): D1223-D1228, 2018 01 04.
Article in English | MEDLINE | ID: mdl-30053269

ABSTRACT

PITDB is a freely available database of translated genomic elements (TGEs) that have been observed in PIT (proteomics informed by transcriptomics) experiments. In PIT, a sample is analyzed using both RNA-seq transcriptomics and proteomic mass spectrometry. Transcripts assembled from RNA-seq reads are used to create a library of sample-specific amino acid sequences against which the acquired mass spectra are searched, permitting detection of any TGE, not just those in canonical proteome databases. At the time of writing, PITDB contains over 74 000 distinct TGEs from four species, supported by more than 600 000 peptide spectrum matches. The database, accessible via http://pitdb.org, provides supporting evidence for each TGE, often from multiple experiments and an indication of the confidence in the TGE's observation and its type, ranging from known protein (exact match to a UniProt protein sequence), through multiple types of protein variant including various splice isoforms, to a putative novel molecule. PITDB's modern web interface allows TGEs to be viewed individually or by species or experiment, and downloaded for further analysis. PITDB is for bench scientists seeking to share their PIT results, for researchers investigating novel genome products in model organisms and for those wishing to construct proteomes for lesser studied species.


Subject(s)
Databases, Factual , Proteins/chemistry , Proteins/genetics , Sequence Analysis, RNA , Algorithms , Amino Acid Sequence , Animals , Data Display , Humans , Internet , Open Reading Frames , Protein Biosynthesis , Protein Isoforms/genetics , Proteomics/methods , Tandem Mass Spectrometry , User-Computer Interface
11.
Nucleic Acids Res ; 46(10): 4893-4902, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29718325

ABSTRACT

Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of known proteins, and which are simply artefacts of poor sequence assembly, is challenging. We have designed and implemented an automated solution that classifies putative TGEs by comparing to reference proteome sequences. This allows large-scale identification of sequence polymorphisms, splice isoforms and novel TGEs supported by presence or absence of variant-specific peptide evidence. Unlike previously reported methods, ours does not require a catalogue of known variants, making it more applicable to non-model organisms. The method was validated on human PIT data, then applied to Mus musculus, Pteropus alecto and Aedes aegypti. Novel discoveries included 60 human protein isoforms, 32 392 polymorphisms in P. alecto, and TGEs with non-methionine start sites including tyrosine.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Proteomics/methods , Aedes/genetics , Aedes/metabolism , Animals , Cell Line , Chiroptera/genetics , Chiroptera/metabolism , Codon, Initiator , Humans , Insect Proteins/genetics , Insect Proteins/metabolism , Mice , Open Reading Frames , Polymorphism, Genetic , Reproducibility of Results , Tandem Mass Spectrometry , Tyrosine/genetics
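The entry above classifies putative TGEs by comparing them with reference proteome sequences. The toy sketch below illustrates only the basic idea with plain string tests: an exact match is a known protein, an equal-length sequence differing at one residue is a single-substitution variant, and anything else is putative novel. The published method instead aligns sequences and weighs variant-specific peptide evidence; the categories and the name `classify_tge` are hypothetical.

```python
def classify_tge(orf, reference):
    """Toy classification of a translated ORF against a reference proteome.

    reference: dict mapping protein accession -> amino-acid sequence.
    Returns (category, accession or None). Categories are illustrative only:
    exact match, single-residue substitution of an equal-length protein,
    or putative novel.
    """
    # 1. Exact match to a reference protein.
    for acc, seq in reference.items():
        if orf == seq:
            return "known", acc
    # 2. Same length as a reference protein and differs at exactly one residue.
    for acc, seq in reference.items():
        if len(orf) == len(seq):
            mismatches = sum(1 for a, b in zip(orf, seq) if a != b)
            if mismatches == 1:
                return "single_substitution", acc
    # 3. No close reference counterpart under these simple tests.
    return "putative_novel", None
```

Splice isoforms, indels and fragments would all fall into the last category here, which is one reason real pipelines rely on alignment rather than equal-length comparison.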
12.
Cancer Discov ; 8(3): 304-319, 2018 03.
Article in English | MEDLINE | ID: mdl-29196464

ABSTRACT

We have profiled, for the first time, an evolving human metastatic microenvironment by measuring gene expression, matrisome proteomics, cytokine and chemokine levels, cellularity, extracellular matrix organization, and biomechanical properties, all on the same sample. Using biopsies of high-grade serous ovarian cancer metastases that ranged from minimal to extensive disease, we show how nonmalignant cell densities and cytokine networks evolve with disease progression. Multivariate integration of the different components allowed us to define, for the first time, gene and protein profiles that predict extent of disease and tissue stiffness, while also revealing the complexity and dynamic nature of matrisome remodeling during development of metastases. Although we studied a single metastatic site from one human malignancy, a pattern of expression of 22 matrisome genes distinguished patients with a shorter overall survival in ovarian and 12 other primary solid cancers, suggesting that there may be a common matrix response to human cancer. Significance: Conducting multilevel analysis with data integration on biopsies with a range of disease involvement identifies important features of the evolving tumor microenvironment. The data suggest that despite the large spectrum of genomic alterations, some human malignancies may have a common and potentially targetable matrix response that influences the course of disease. Cancer Discov; 8(3); 304-19. ©2017 AACR. This article is highlighted in the In This Issue feature, p. 253.


Subject(s)
Extracellular Matrix/metabolism , Extracellular Matrix/pathology , Gene Expression Regulation, Neoplastic , Ovarian Neoplasms/pathology , Tumor Microenvironment/physiology , Biomarkers, Tumor/metabolism , Cell Count , Cytokines/metabolism , Extracellular Matrix/genetics , Female , Humans , Ovarian Neoplasms/genetics , Ovarian Neoplasms/mortality , Prognosis , Tumor Microenvironment/genetics
13.
BMC Genomics ; 18(1): 101, 2017 01 19.
Article in English | MEDLINE | ID: mdl-28103802

ABSTRACT

BACKGROUND: Aedes aegypti is a vector for the (re-)emerging human pathogens dengue, chikungunya, yellow fever and Zika viruses. Almost half of the Ae. aegypti genome is composed of transposable elements (TEs). Transposons have been linked to diverse cellular processes, including the establishment of viral persistence in insects, an essential step in the transmission of vector-borne viruses. However, up until now it has not been possible to study the overall proteome derived from an organism's mobile genetic elements, partly due to the highly divergent nature of TEs. Furthermore, as for many non-model organisms, incomplete genome annotation has hampered proteomic studies on Ae. aegypti. RESULTS: We analysed the Ae. aegypti proteome using our new proteomics informed by transcriptomics (PIT) technique, which bypasses the need for genome annotation by identifying proteins through matched transcriptomic (rather than genomic) data. Our data vastly increase the number of experimentally confirmed Ae. aegypti proteins. The PIT analysis also identified hotspots of incomplete genome annotation, and showed that poor sequence and assembly quality do not explain all annotation gaps. Finally, in a proof-of-principle study, we developed criteria for the characterisation of proteomically active TEs. Protein expression did not correlate with a TE's genomic abundance at different levels of classification. Most notably, long terminal repeat (LTR) retrotransposons were markedly enriched compared to other elements. PIT was superior to 'conventional' proteomic approaches in both our transposon and genome annotation analyses. CONCLUSIONS: We present the first proteomic characterisation of an organism's repertoire of mobile genetic elements, which will open new avenues of research into the function of transposon proteins in health and disease. Furthermore, our study provides a proof-of-concept that PIT can be used to evaluate a genome's annotation to guide annotation efforts, which has the potential to improve the efficiency of annotation projects in non-model organisms. PIT therefore represents a valuable new tool to study the biology of the important vector species Ae. aegypti, including its role in transmitting emerging viruses of global public health concern.


Subject(s)
Aedes/metabolism , DNA Transposable Elements/genetics , Genome , Proteome/analysis , Proteomics/methods , Aedes/genetics , Animals , Cell Line , Chromatography, High Pressure Liquid , Contig Mapping , Insect Proteins/analysis , Insect Proteins/isolation & purification , RNA/isolation & purification , RNA/metabolism , Sequence Analysis, RNA , Tandem Mass Spectrometry
14.
Metabolomics ; 12(1): 16, 2016.
Article in English | MEDLINE | ID: mdl-26617479

ABSTRACT

Today's researchers have access to an unprecedented range of powerful machine learning tools with which to build models for classifying samples according to their metabolomic profile (e.g. separating diseased samples from healthy controls). However, such powerful tools need to be used with caution and the diagnostic performance of models produced by them should be rigorously evaluated if their output is to be believed. This involves considerable processing time, and has hitherto required expert knowledge in machine learning. By adopting a constrained nonlinear simplex optimisation for the tuning of support vector machines (SVMs) we have reduced SVM training times more than tenfold compared to a traditional grid search, allowing us to implement a high performance R package that makes it possible for a typical bench scientist to produce powerful SVM ensemble classifiers within a reasonable timescale, with automated bootstrapped training and rigorous permutation testing. This puts a state-of-the-art open source multivariate classification pipeline into the hands of every metabolomics researcher, allowing them to build robust classification models with realistic performance metrics.
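The Metabolomics entry above replaces grid search with a simplex optimisation when tuning SVMs. As an illustration of the simplex idea only, the sketch below is a basic, unconstrained Nelder-Mead minimiser with standard coefficients and inside contraction; it is not the authors' constrained variant or their R package, and `nelder_mead` is a hypothetical name. The quadratic objective in the usage note is a stand-in for a cross-validated error surface.

```python
def _along(centroid, worst, coef):
    """Point on the line from the worst vertex through the centroid."""
    return [c + coef * (c - w) for c, w in zip(centroid, worst)]


def nelder_mead(f, x0, step=1.0, tol=1e-10, max_iter=1000):
    """Minimise f over R^n with a basic Nelder-Mead simplex search.

    Unconstrained; coefficients: reflection 1, expansion 2, inside
    contraction 0.5, shrink 0.5. Returns (best_point, best_value).
    """
    n = len(x0)
    simplex = [[float(v) for v in x0]]
    for i in range(n):
        vertex = [float(v) for v in x0]
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        ranked = sorted(zip(values, simplex), key=lambda pair: pair[0])
        values = [v for v, _ in ranked]
        simplex = [p for _, p in ranked]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        reflected = _along(centroid, worst, 1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = _along(centroid, worst, 2.0)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = _along(centroid, worst, -0.5)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]
```

For example, `nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])` converges close to (1, -2). In an SVM setting one might, as an assumption, let the two coordinates be log2(C) and log2(gamma) and let f return cross-validated error; each iteration then costs only one or two model evaluations instead of a full grid sweep.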

15.
Mol Cell Proteomics ; 14(11): 3087-93, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26269333

ABSTRACT

With the recent advent of RNA-seq technology the proteomics community has begun to generate sample-specific protein databases for peptide and protein identification, an approach we call proteomics informed by transcriptomics (PIT). This approach has gained a lot of interest, particularly among researchers who work with nonmodel organisms or with particularly dynamic proteomes such as those observed in developmental biology and host-pathogen studies. PIT has been shown to improve coverage of known proteins, and to reveal potential novel gene products. However, many groups are impeded in their use of PIT by the complexity of the required data analysis. Necessarily, this analysis requires complex integration of a number of different software tools from at least two different communities, and because PIT has a range of biological applications a single software pipeline is not suitable for all use cases. To overcome these problems, we have created GIO, a software system that uses the well-established Galaxy platform to make PIT analysis available to the typical bench scientist via a simple web interface. Within GIO we provide workflows for four common use cases: a standard search against a reference proteome; PIT protein identification without a reference genome; PIT protein identification using a genome guide; and PIT genome annotation. These workflows comprise individual tools that can be reconfigured and rearranged within the web interface to create new workflows to support additional use cases.


Subject(s)
Proteomics/methods , Software , Transcriptome , Algorithms , Data Mining , Databases, Protein , Humans , Mass Spectrometry/statistics & numerical data , Workflow
16.
Proteomics ; 15(18): 3152-62, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26037908

ABSTRACT

The mzQuantML standard has been developed by the Proteomics Standards Initiative for capturing, archiving and exchanging quantitative proteomic data, derived from mass spectrometry. It is a rich XML-based format, capable of representing data about two-dimensional features from LC-MS data, and peptides, proteins or groups of proteins that have been quantified from multiple samples. In this article we report the development of an open source Java-based library of routines for mzQuantML, called the mzqLibrary, and associated software for visualising data called the mzqViewer. The mzqLibrary contains routines for mapping (peptide) identifications on quantified features, inference of protein (group)-level quantification values from peptide-level values, normalisation and basic statistics for differential expression. These routines can be accessed via the command line, via a Java application programming interface or via a basic graphical user interface. The mzqLibrary also contains several file format converters, including import converters (to mzQuantML) from OpenMS, Progenesis LC-MS and MaxQuant, and exporters (from mzQuantML) to other standards or useful formats (mzTab, HTML, csv). The mzqViewer contains in-built routines for viewing the tables of data (about features, peptides or proteins), and connects to the R statistical library for more advanced plotting options. The mzqLibrary and mzqViewer packages are available from https://code.google.com/p/mzq-lib/.


Subject(s)
Database Management Systems , Databases, Protein/standards , Proteomics/methods , Proteomics/standards , Software
17.
Biochim Biophys Acta ; 1844(1 Pt A): 88-97, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23584085

ABSTRACT

The Human Proteome Organisation - Proteomics Standards Initiative (HUPO-PSI) has been working for ten years on the development of standardised formats that facilitate data sharing and public database deposition. In this article, we review three HUPO-PSI data standards - mzML, mzIdentML and mzQuantML, which can be used to design a complete quantitative analysis pipeline in mass spectrometry (MS)-based proteomics. In this tutorial, we briefly describe the content of each data model, sufficient for bioinformaticians to devise proteomics software. We also provide guidance on the use of recently released application programming interfaces (APIs) developed in Java for each of these standards, which makes it straightforward to read and write files of any size. We have produced a set of example Java classes and a basic graphical user interface to demonstrate how to use the most important parts of the PSI standards, available from http://code.google.com/p/psi-standard-formats-tutorial. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.


Subject(s)
Proteomics , Software , Computational Biology , Humans , Programming Languages
18.
Inflamm Bowel Dis ; 19(10): 2069-78, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23867873

ABSTRACT

BACKGROUND: The aim of this study was to determine whether volatile organic compounds (VOCs) present in the headspace of feces could be used to diagnose or distinguish between chronic diseases of the gastrointestinal tract and apparently healthy volunteers. METHODS: A total of 87 people were recruited, divided between 4 categories: healthy volunteers (n = 19), Crohn's disease (n = 22), ulcerative colitis (n = 20), and irritable bowel syndrome (n = 26). Each participant supplied fecal samples before treatment and, except for the healthy volunteers, after treatment. Fecal samples were incubated in a sample bag with added purified air at 40°C, and headspace samples were taken and concentrated on thermal sorption tubes. These were then desorbed and analyzed by gas chromatography-mass spectrometry. The concentrations of a selection of high-abundance compounds were determined and assessed for differences in concentration between the groups. RESULTS: Crohn's disease samples showed significant elevations in the concentrations of ester and alcohol derivatives of short-chain fatty acids and indole compared with the other groups; indole and phenol were elevated in ulcerative colitis and irritable bowel syndrome but not at a statistically significant level. After treatment, the levels of many of the VOCs were significantly reduced and were more similar to those concentrations in healthy controls. CONCLUSIONS: The abundance of a number of VOCs in feces differs markedly between Crohn's disease and other gastrointestinal conditions. Following treatment, the VOC profile is altered to more closely resemble that of healthy volunteers.


Subject(s)
Bacterial Infections/diagnosis , Colitis, Ulcerative/diagnosis , Crohn Disease/diagnosis , Feces/chemistry , Irritable Bowel Syndrome/diagnosis , Volatile Organic Compounds/analysis , Bacteria/isolation & purification , Bacterial Infections/microbiology , Case-Control Studies , Chronic Disease , Colitis, Ulcerative/microbiology , Crohn Disease/microbiology , Feces/microbiology , Female , Follow-Up Studies , Gas Chromatography-Mass Spectrometry , Healthy Volunteers , Humans , Irritable Bowel Syndrome/microbiology , Male , Prognosis
19.
Methods Mol Biol ; 1007: 219-35, 2013.
Article in English | MEDLINE | ID: mdl-23666728

ABSTRACT

Selected reaction monitoring (SRM) is becoming the tool of choice for targeted quantitative proteomics, with applications as diverse as clinical diagnostics and systems biology. Assay design is critical to the success of every SRM experiment. For each protein of interest, it is necessary to find a set of peptides that can be monitored as surrogates for that protein. These peptides must satisfy a number of criteria, including uniqueness in the proteome, detectability by mass spectrometry, and suitability of their product ion series. Finding peptides that meet all these criteria is time-consuming, especially when seeking to quantify multiple proteins in a single run. In response to these challenges, a number of groups have developed freely available tools to assist in SRM assay design; these include databases, online tools, and stand-alone software. This chapter introduces some of these tools and explains how they can help to facilitate reliable SRM experiments.
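The first of these criteria, uniqueness in the proteome, can be checked with a simple in-silico digest. The Python sketch below is a hypothetical illustration and is not taken from any of the tools the chapter covers. It assumes that trypsin cleaves after K or R except before P, with no missed cleavages. It also assumes illustrative length bounds of 7 to 25 residues, and it keeps only peptides that map to a single protein. Detectability and product-ion suitability, which real assay-design tools also weigh, are not modeled. The function names are invented for this example.

```python
import re
from collections import defaultdict

# Assumed trypsin rule: cleave after K or R unless the next residue is P (no missed cleavages).
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence):
    """Return the fully cleaved tryptic peptides of a protein sequence."""
    return [pep for pep in TRYPSIN_SITE.split(sequence) if pep]


def unique_peptide_candidates(proteome, min_len=7, max_len=25):
    """Map each protein ID to peptides that are unique to it and within the length bounds.

    proteome: dict of protein ID -> amino-acid sequence.
    min_len/max_len are illustrative defaults, not values from the chapter.
    """
    # Record which proteins each peptide occurs in.
    occurrences = defaultdict(set)
    for protein_id, sequence in proteome.items():
        for pep in tryptic_digest(sequence):
            occurrences[pep].add(protein_id)

    candidates = defaultdict(list)
    for pep, proteins in occurrences.items():
        # Keep peptides seen in exactly one protein (unique within this proteome).
        if len(proteins) == 1 and min_len <= len(pep) <= max_len:
            (protein_id,) = proteins
            candidates[protein_id].append(pep)
    return dict(candidates)
```

In a toy two-protein proteome, a peptide shared by both proteins drops out of both candidate lists, so only the peptides specific to each protein remain.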


Subject(s)
Computer Simulation , Proteins/analysis , Proteomics/methods , Research Design , Databases, Protein , Internet , Peptides/analysis , Proteins/chemistry , Proteome , Software , Systems Biology/methods
20.
Mol Cell Proteomics ; 12(8): 2332-40, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23599424

ABSTRACT

The range of heterogeneous approaches available for quantifying protein abundance via mass spectrometry (MS) leads to considerable challenges in modeling, archiving, exchanging, or submitting experimental data sets as supplemental material to journals. To date, there has been no widely accepted format for capturing the evidence trail of how quantitative analysis has been performed by software, for transferring data between software packages, or for submitting to public databases. In the context of the Proteomics Standards Initiative, we have developed the mzQuantML data standard. The standard can represent quantitative data about regions in two-dimensional retention time versus mass/charge space (called features), peptides, and proteins and protein groups (where there is ambiguity regarding peptide-to-protein inference), and it offers limited support for small molecule (metabolomic) data. The format has structures for representing replicate MS runs, grouping of replicates (for example, as study variables), and capturing the parameters used by software packages to arrive at these values. The format has the capability to reference other standards such as mzML and mzIdentML, and thus the evidence trail for the MS workflow as a whole can now be described. Several software implementations are available, and we encourage other bioinformatics groups to use mzQuantML as an input, internal, or output format for quantitative software and for structuring local repositories. All project resources are available in the public domain from the HUPO Proteomics Standards Initiative http://www.psidev.info/mzquantml.
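To make the idea of grouping replicate runs more concrete, the sketch below takes per-assay abundance values for quantified entities (features, peptides, or proteins) and averages them within each study variable. It is a hypothetical Python data model loosely based on the abstract's description. It is not the mzQuantML schema or any of its software implementations. The names `StudyVariable` and `study_variable_means`, and the use of the arithmetic mean as the summary statistic, are assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class StudyVariable:
    """Hypothetical grouping of replicate assays (MS runs) under one condition."""
    name: str
    assay_ids: tuple


def study_variable_means(assay_values, study_variables):
    """Summarise per-assay values into per-study-variable means.

    assay_values: dict of entity ID (feature, peptide or protein) -> {assay ID: abundance}.
    Returns: dict of entity ID -> {study variable name: mean abundance across its assays}.
    """
    summary = {}
    for entity_id, by_assay in assay_values.items():
        summary[entity_id] = {
            sv.name: mean(by_assay[assay_id] for assay_id in sv.assay_ids)
            for sv in study_variables
        }
    return summary
```

Keeping the assay-level values alongside the summarized ones mirrors the evidence-trail goal described above, because each study-variable figure can still be traced back to the individual runs.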


Subject(s)
Mass Spectrometry/standards , Proteomics/standards , Databases, Protein , Mass Spectrometry/methods , Models, Theoretical , Proteomics/methods , Software