1.
PLoS One ; 13(9): e0204425, 2018.
Article in English | MEDLINE | ID: mdl-30261000

ABSTRACT

MOTIVATION: The measurement of disease biomarkers in easily-obtained bodily fluids has opened the door to a new type of non-invasive medical diagnostics. New technologies are being developed and fine-tuned in order to make this possibility a reality. One such technology is Field Asymmetric Ion Mobility Spectrometry (FAIMS), which allows the measurement of volatile organic compounds (VOCs) in biological samples such as urine. These VOCs are known to contain a range of information on the relevant person's metabolism and can in principle be used for disease diagnostic purposes. Key to the effective use of such data are well-developed data processing pipelines, which are necessary to extract the most useful data from the complex underlying biological structure. RESULTS: In this study, we present a new data analysis pipeline for FAIMS data, and demonstrate a number of improvements over previously used methods. We evaluate the effect of a series of candidate operational steps during data processing, such as the use of wavelet transforms, principal component analysis (PCA), and classifier ensembles. We also demonstrate the use of FAIMS data in our pipeline to diagnose diabetes on the basis of a simple urine sample using machine learning classifiers. We present results for data generated from a case-control study of 115 urine samples, collected from 72 type II diabetic patients, with 43 healthy volunteers as negative controls. The resulting pipeline combines the steps that resulted in the best classification model performance. These include the use of a two-dimensional discrete wavelet transform, and the Wilcoxon rank-sum test for feature selection. We are able to achieve a best ROC curve AUC of 0.825 (0.747-0.9, 95% CI) for classification of diabetes vs control. We also note that this result is robust to changes in the data pipeline and different analysis runs, with AUC > 0.80 achieved in a range of cases. 
This is a substantial improvement in performance over previously used data processing methods in this area. Our ability to make strong statements about the ability of FAIMS to diagnose diabetes is, however, limited, as we found confounding effects from the demographics when including these data in the pipeline. The demographics alone produced a best AUC of 0.87 (0.795-0.94, 95% CI). While the combination of the demographics and FAIMS data improved the AUC (0.907; 0.848-0.97, 95% CI), the difference was not statistically significant. Nevertheless, the pipeline itself shows a significant improvement in performance over more basic methods that have been used with FAIMS data in the past.
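The abstract pairs Wilcoxon rank-sum feature selection with ROC AUC as the performance measure. The two are closely linked: the Mann-Whitney U statistic (equivalent to the rank-sum test) divided by n_cases × n_controls equals the empirical ROC AUC of a single feature used as a score. The Python sketch below illustrates that link and a simple "keep the k features with the smallest rank-sum p-value" filter. The function names, the toy data and the use of the plain normal approximation (no tie or continuity correction) are assumptions for illustration; this is not the authors' pipeline, which also applies a two-dimensional wavelet transform before selection.

```python
import math
from typing import List, Sequence


def mann_whitney_u(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Count (case, control) pairs where the case value is larger; ties count 0.5."""
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical ROC AUC of a single feature used as a score: U / (n_pos * n_neg)."""
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))


def rank_sum_p_value(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Two-sided p-value from the normal approximation (no tie or continuity correction)."""
    n1, n0 = len(pos), len(neg)
    mean_u = n1 * n0 / 2.0
    sd_u = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12.0)
    z = (mann_whitney_u(pos, neg) - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_features(cases: List[List[float]], controls: List[List[float]], k: int) -> List[int]:
    """Return the indices of the k features with the smallest rank-sum p-values."""
    scored = []
    for j in range(len(cases[0])):
        pos = [sample[j] for sample in cases]
        neg = [sample[j] for sample in controls]
        scored.append((rank_sum_p_value(pos, neg), j))
    scored.sort()  # smallest p-value first; ties broken by feature index
    return [j for _, j in scored[:k]]


# Toy data (hypothetical, not from the study): rows are samples, columns are features.
cases = [[5.0, 1.0, 0.2], [6.0, 2.0, 0.1], [7.0, 1.5, 0.3]]
controls = [[1.0, 1.2, 0.2], [2.0, 1.8, 0.4], [3.0, 1.1, 0.1]]
print(select_features(cases, controls, 2))  # [0, 1]
print(auc([s[0] for s in cases], [s[0] for s in controls]))  # 1.0
```

The pairwise count is O(n_cases × n_controls), which is trivial for a cohort of 115 samples; a rank-based implementation would scale better to larger studies.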


Subject(s)
Diabetes Mellitus/urine , Diagnosis, Computer-Assisted/methods , Volatile Organic Compounds/urine , Area Under Curve , Biomarkers/urine , Female , Humans , Machine Learning , Male , Middle Aged , Pilot Projects
2.
PLoS One ; 12(12): e0188879, 2017.
Article in English | MEDLINE | ID: mdl-29252995

ABSTRACT

OBJECTIVES: New point-of-care diagnostics are urgently needed to reduce the over-prescription of antimicrobials for bacterial respiratory tract infection (RTI). We performed a pilot cross-sectional study to assess the feasibility of a gas-capillary column ion mobility spectrometer (GC-IMS) for the analysis of volatile organic compounds (VOC) in exhaled breath to diagnose bacterial RTI in hospital inpatients. METHODS: 71 patients were prospectively recruited from the Acute Medical Unit of the Royal Liverpool University Hospital between March and May 2016 and classified as having confirmed or probable bacterial or viral RTI on the basis of microbiologic, biochemical and radiologic testing. Breath samples were collected at the patient's bedside directly into the electronic nose device, which recorded a VOC spectrum for each sample. Sparse principal component analysis and sparse logistic regression were used to develop a diagnostic model to classify VOC spectra as being caused by bacterial or non-bacterial RTI. RESULTS: The summary area under the receiver operating characteristic curve was 0.73 (95% CI 0.61-0.86); summary sensitivity and specificity were 62% (95% CI 41-80%) and 80% (95% CI 64-91%), respectively (p = 0.00147). CONCLUSIONS: GC-IMS analysis of exhaled VOC for the diagnosis of bacterial RTI shows promise in this pilot study, and further trials are warranted to assess this technique.


Subject(s)
Bacterial Infections/diagnosis , Electronic Nose , Metabolomics , Respiratory Tract Infections/diagnosis , Volatile Organic Compounds/analysis , Aged , Bacterial Infections/microbiology , Female , Humans , Male , Middle Aged , Pilot Projects , ROC Curve , Respiratory Tract Infections/microbiology
3.
Arthritis Res Ther ; 18(1): 250, 2016 10 27.
Article in English | MEDLINE | ID: mdl-27788684

ABSTRACT

BACKGROUND: There is currently no blood-based test for detection of early-stage osteoarthritis (OA), and the anti-cyclic citrullinated peptide (CCP) antibody test for rheumatoid arthritis (RA) has relatively low sensitivity for early-stage disease. Morbidity in arthritis could be markedly decreased if early-stage arthritis could be routinely detected and classified by a clinical chemistry test. We hypothesised that damage to proteins of the joint by oxidation, nitration and glycation, with signatures released into plasma as oxidized, nitrated and glycated amino acids, may facilitate early-stage diagnosis and typing of arthritis. METHODS: Patients with knee joint early-stage and advanced OA and RA or other inflammatory joint disease (non-RA) and healthy subjects with good skeletal health were recruited for the study (n = 225). Plasma/serum and synovial fluid were analysed for oxidized, nitrated and glycated proteins and amino acids by quantitative liquid chromatography-tandem mass spectrometry. Data-driven machine learning methods were employed to explore the diagnostic utility of the measurements for detecting and classifying early-stage OA and RA, non-RA and good skeletal health, with training set and independent test set cohorts. RESULTS: Glycated, oxidized and nitrated proteins and amino acids were detected in synovial fluid and plasma of arthritic patients with characteristic patterns found in early and advanced OA and RA, and non-RA, with respect to healthy controls. In early-stage disease, two algorithms for consecutive use in diagnosis were developed: (1) disease versus healthy control, and (2) classification as OA, RA and non-RA. The algorithms featured 10 damaged amino acids in plasma, hydroxyproline and anti-CCP antibody status. Sensitivities/specificities were: (1) good skeletal health, 0.92/0.91; (2) early-stage OA, 0.92/0.90; early-stage RA, 0.80/0.78; and non-RA, 0.70/0.65 (training set). These were confirmed in independent test set validation. 
Damaged amino acids increased further in severe and advanced OA and RA. CONCLUSIONS: Oxidized, nitrated and glycated amino acids combined with hydroxyproline and anti-CCP antibody status provided a plasma-based biochemical test of relatively high sensitivity and specificity for early-stage diagnosis and typing of arthritic disease.


Subject(s)
Biomarkers/blood , Early Diagnosis , Osteoarthritis, Knee/diagnosis , Protein Processing, Post-Translational , Adult , Aged , Algorithms , Amino Acids/metabolism , Area Under Curve , Chromatography, Liquid , Disease Progression , Female , Humans , Male , Middle Aged , Nitrosation , Osteoarthritis, Knee/blood , Oxidation-Reduction , Oxidative Stress , ROC Curve , Sensitivity and Specificity , Tandem Mass Spectrometry
4.
Tuberculosis (Edinb) ; 99: 143-146, 2016 07.
Article in English | MEDLINE | ID: mdl-27450016

ABSTRACT

Tuberculosis (TB) remains one of the world's major health burdens, with 9.6 million new infections globally. Though considerable progress has been made in reducing TB incidence and mortality, there is a continuous need for lower-cost, simpler and more robust means of diagnosis. One method that may fulfil these requirements is breath analysis. In this study we analysed the breath of 21 patients with pulmonary or extra-pulmonary TB, recruited from a UK teaching hospital (University Hospital Coventry and Warwickshire) before or within 1 week of commencing treatment for TB. TB diagnosis was confirmed by reference tests (mycobacterial culture), histology or radiology. 19 controls were recruited to calculate specificity; these patients were all interferon-gamma release assay negative (T.SPOT®.TB, Oxford Immunotec Ltd.). Whole breath samples were collected with subsequent chemical analysis undertaken by Ion Mobility Spectrometry. Our results produced a sensitivity of 81% and a specificity of 79% for all cases of TB (pulmonary and extra-pulmonary). Though lower than in other studies analysing pulmonary TB alone, we believe that this technique shows promise, and a higher sensitivity could be achieved by further improving our sample capture methodology.


Subject(s)
Breath Tests/methods , Ions , Mycobacterium tuberculosis/pathogenicity , Tuberculosis, Pulmonary/diagnosis , Adolescent , Adult , Aged , Aged, 80 and over , Antitubercular Agents/therapeutic use , Area Under Curve , Bacteriological Techniques , Breath Tests/instrumentation , Case-Control Studies , England , Equipment Design , Female , Hospitals, Teaching , Humans , Interferon-gamma Release Tests , Male , Middle Aged , Motion , Mycobacterium tuberculosis/drug effects , Pilot Projects , Predictive Value of Tests , ROC Curve , Reproducibility of Results , Spectrum Analysis , Tuberculosis, Pulmonary/drug therapy , Tuberculosis, Pulmonary/microbiology , Young Adult
5.
R Soc Open Sci ; 3(2): 140501, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26998311

ABSTRACT

Predicting response to treatment and disease-specific deaths are key tasks in cancer research yet there is a lack of methodologies to achieve these. Large-scale 'omics and digital pathology technologies have led to the need for effective statistical methods for data fusion to extract the most useful patterns from these diverse data types. We present FusionGP, a method for combining heterogeneous data types designed specifically for predicting outcome of treatment and disease. FusionGP is a Gaussian process model that includes a generalization of feature selection for biomarker discovery, allowing for simultaneous, sparse feature selection across multiple data types. Importantly, it can accommodate highly nonlinear structure in the data, and automatically infers the optimal contribution from each input data type. FusionGP compares favourably to several popular classification methods, including the Random Forest classifier, a stepwise logistic regression model and the Support Vector Machine on single data types. By combining gene expression, copy number alteration and digital pathology image data in 119 estrogen receptor (ER)-negative and 345 ER-positive breast tumours, we aim to predict two important clinical outcomes: death and chemoinsensitivity. While gene expression data give the best predictive performance in the majority of cases, the digital pathology data are much better for predicting death in ER cases. Thus, FusionGP is a new tool for selecting informative features from heterogeneous data types and predicting treatment response and prognosis.

6.
PLoS One ; 11(2): e0149756, 2016.
Article in English | MEDLINE | ID: mdl-26901314

ABSTRACT

BACKGROUND: Highly sensitive and specific urine-based tests to detect either primary or recurrent bladder cancer have proved elusive to date. Our ever-increasing knowledge of the genomic aberrations in bladder cancer should enable the development of such tests based on urinary DNA. METHODS: DNA was extracted from urine cell pellets and PCR was used to amplify the regions of the TERT promoter and coding regions of FGFR3, PIK3CA, TP53, HRAS, KDM6A and RXRA, which are frequently mutated in bladder cancer. The PCR products were barcoded and pooled, and paired-end 2 × 250 bp sequencing was performed on an Illumina MiSeq. Urinary DNA was analysed from 20 non-cancer controls, 120 primary bladder cancer patients (41 pTa, 40 pT1, 39 pT2+) and 91 bladder cancer patients post-TURBT (89 cancer-free). RESULTS: Despite the small quantities of DNA extracted from some urine cell pellets, 96% of the samples yielded mean read depths >500. Analysing only previously reported point mutations, TERT mutations were found in 55% of patients with bladder cancer (independent of stage), FGFR3 mutations in 30%, PIK3CA mutations in 14% and TP53 mutations in 12%. Overall, these previously reported bladder cancer mutations were detected in 86 out of 122 bladder cancer patients (70% sensitivity) and in only 3 out of 109 patients with no detectable bladder cancer (97% specificity). CONCLUSION: This simple, cost-effective approach could be used for the non-invasive surveillance of patients with non-muscle-invasive bladder cancers harbouring these mutations. The method has a low DNA input requirement and can detect low levels of mutant DNA in a large excess of normal DNA. These genes represent a minimal biomarker panel to which extra markers could be added to develop a highly sensitive diagnostic test for bladder cancer.
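The headline figures can be reproduced from the counts in the abstract. The denominators appear to combine the cohorts listed under METHODS: 122 patients with cancer (the 120 primary cases plus the 2 post-TURBT patients who were not cancer-free) and 109 without detectable cancer (20 controls plus 89 cancer-free post-TURBT patients). That reading of the denominators is our inference, not a statement from the paper. A short Python check:

```python
def sensitivity(true_pos: int, n_with_disease: int) -> float:
    """Fraction of patients with the disease who test positive."""
    return true_pos / n_with_disease


def specificity(false_pos: int, n_without_disease: int) -> float:
    """Fraction of patients without the disease who test negative."""
    return (n_without_disease - false_pos) / n_without_disease


# Assumed composition of the denominators (our reading of METHODS, not stated in the paper):
n_diseased = 120 + (91 - 89)  # primary cases + post-TURBT patients who were not cancer-free
n_healthy = 20 + 89           # non-cancer controls + cancer-free post-TURBT patients

sens = sensitivity(86, n_diseased)
spec = specificity(3, n_healthy)
print(n_diseased, n_healthy)  # 122 109
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")  # sensitivity=70.5%, specificity=97.2%
```

The abstract reports these rounded to whole percentages (70% and 97%).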


Subject(s)
DNA, Neoplasm , High-Throughput Nucleotide Sequencing/methods , Multiplex Polymerase Chain Reaction/methods , Mutation , Neoplasm Proteins/genetics , Urinary Bladder Neoplasms , Aged , Aged, 80 and over , DNA, Neoplasm/genetics , DNA, Neoplasm/urine , Female , Humans , Male , Sensitivity and Specificity , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/urine
7.
BMC Syst Biol ; 9: 76, 2015 Nov 09.
Article in English | MEDLINE | ID: mdl-26553024

ABSTRACT

BACKGROUND: Cytokine-hormone network deregulations underpin pathologies ranging from autoimmune disorders to cancer, but our understanding of these networks in physiological/pathophysiological states remains patchy. We employed Bayesian networks to analyze cytokine-hormone interactions in vivo using murine lactation as a dynamic, physiological model system. RESULTS: Circulating levels of estrogen, progesterone, prolactin and twenty-three cytokines were profiled in postpartum mice with/without pups. The resultant networks were very robust and assembled around structural hubs, with evidence that interleukin (IL)-12 (p40), IL-13 and monocyte chemoattractant protein (MCP)-1 were the primary drivers of network behavior. Network structural conservation across physiological scenarios, coupled with the successful empirical validation of our approach, suggested that in silico network perturbations can predict in vivo qualitative responses. In silico perturbation of network components also captured biological features of cytokine interactions (antagonism, synergy, redundancy). CONCLUSION: These findings highlight the potential of network-based approaches in identifying novel cytokine pharmacological targets and in predicting the effects of their exogenous manipulation in inflammatory/immune disorders.


Subject(s)
Chemokine CCL2/metabolism , Cytokines/metabolism , Interleukin-12/metabolism , Interleukin-13/metabolism , Models, Biological , Animals , Bayes Theorem , Female , Hormones/blood , Lactation/physiology , Mice , Postpartum Period , Protein Interaction Maps
8.
J Gastrointestin Liver Dis ; 24(2): 197-201, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26114180

ABSTRACT

BACKGROUND & AIMS: Non-Alcoholic Fatty Liver Disease (NAFLD) is the commonest cause of chronic liver disease in the western world. Current diagnostic methods, including Fibroscan, have limitations; thus there is a need for more robust non-invasive screening methods. The gut microbiome is altered in several gastrointestinal and hepatic disorders, resulting in altered, unique gut fermentation patterns detectable by analysis of volatile organic compounds (VOCs) in urine, breath and faeces. We performed a proof-of-principle pilot study to determine whether progressive fatty liver disease, specifically NAFLD and Non-Alcoholic Steatohepatitis (NASH), produces an altered urinary VOC pattern. METHODS: 34 patients were recruited: 8 NASH cirrhotics (NASH-C), 7 non-cirrhotic NASH, 4 NAFLD and 15 controls. Urine was collected and stored frozen. For assay, the samples were defrosted and aliquoted into vials, which were heated to 40±0.1°C, and the headspace was analyzed by FAIMS (Field Asymmetric Ion Mobility Spectroscopy). A previously used data processing pipeline employing a Random Forest classification algorithm and a 10-fold cross-validation method was applied. RESULTS: Urinary VOC results demonstrated a sensitivity of 0.58 (0.33 - 0.88) but a specificity of 0.93 (0.68 - 1.00) and an Area Under Curve (AUC) of 0.73 (0.55 - 0.90) to distinguish between liver disease and controls. However, NASH/NASH-C was separated from the NAFLD/controls with a sensitivity of 0.73 (0.45 - 0.92), specificity of 0.79 (0.54 - 0.94) and AUC of 0.79 (0.64 - 0.95). CONCLUSIONS: This pilot study suggests that urinary VOC detection may offer the potential for early non-invasive characterisation of liver disease, using 'smell prints' to distinguish between NASH and NAFLD.
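With only 34 samples, how the 10 cross-validation folds are formed matters. The Python sketch below shows one way to build 10 folds, assuming a stratified split (each class dealt round-robin across folds so every fold gets a similar mix of cases and controls). Stratification, the function names and the fixed seed are our assumptions; the abstract does not say how the folds were formed. Each training split would then be used to fit the classifier (the Random Forest itself is not shown).

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin so every
    fold gets a similar mix of cases and controls."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # randomise which samples land in which fold
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)  # continue dealing where the previous class stopped
    return folds


def train_test_splits(labels: Sequence[int], k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    for test in stratified_k_fold(labels, k, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


# 19 liver-disease samples (8 NASH-C + 7 NASH + 4 NAFLD) vs 15 controls, as in the abstract.
labels = [1] * 19 + [0] * 15
folds = stratified_k_fold(labels)
print([len(f) for f in folds])  # [4, 4, 4, 4, 3, 3, 3, 3, 3, 3]
```

Continuing the round-robin across classes keeps fold sizes within one sample of each other, which matters when some folds would otherwise receive only 3 samples.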


Subject(s)
Non-alcoholic Fatty Liver Disease/urine , Volatile Organic Compounds/urine , Aged , Area Under Curve , Biomarkers/urine , Case-Control Studies , Diagnosis, Differential , Female , Humans , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/diagnosis , Pilot Projects , Predictive Value of Tests , Prospective Studies , ROC Curve , Spectrum Analysis , Urinalysis
9.
BMC Cancer ; 15: 117, 2015 Mar 11.
Article in English | MEDLINE | ID: mdl-25886033

ABSTRACT

BACKGROUND: Patient response to chemotherapy for ovarian cancer is extremely heterogeneous and there are currently no tools to aid the prediction of sensitivity or resistance to chemotherapy and allow treatment stratification. Such a tool could greatly improve patient survival by identifying the most appropriate treatment on a patient-specific basis. METHODS: PubMed was searched for studies predicting response or resistance to chemotherapy using gene expression measurements of human tissue in ovarian cancer. RESULTS: 42 studies were identified and both the data collection and modelling methods were compared. The majority of studies utilised fresh-frozen or formalin-fixed paraffin-embedded tissue. Modelling techniques varied, the most popular being Cox proportional hazards regression and hierarchical clustering which were used by 17 and 11 studies respectively. The gene signatures identified by the various studies were not consistent, with very few genes being identified by more than two studies. Patient cohorts were often noted to be heterogeneous with respect to chemotherapy treatment undergone by patients. CONCLUSIONS: A clinically applicable gene signature capable of predicting patient response to chemotherapy has not yet been identified. Research into a predictive, as opposed to prognostic, model could be highly beneficial and aid the identification of the most suitable treatment for patients.


Subject(s)
Antineoplastic Agents/therapeutic use , Drug Resistance, Neoplasm/drug effects , Ovarian Neoplasms/drug therapy , Animals , Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm/genetics , Female , Humans , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/genetics , Predictive Value of Tests
10.
Am J Gastroenterol ; 110(4): 588-94, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25823766

ABSTRACT

OBJECTIVES: A rapid test to diagnose Clostridium difficile infection (CDI) on hospital wards could minimize common but critical diagnostic delay. Field asymmetric ion mobility spectrometry (FAIMS) is a portable mass spectrometry instrument that quickly analyses the chemical composition of gaseous mixtures (e.g., above a stool sample). Can FAIMS accurately distinguish C. difficile-positive from -negative stool samples? METHODS: We analyzed 213 stool samples with FAIMS, of which 71 were C. difficile positive by microbiological analysis. The samples were divided into training, test, and validation samples. We used the training and test samples (n=135) to identify which sample characteristics discriminate between positive and negative samples, and to build machine learning algorithms interpreting these characteristics. The best performing algorithm was then prospectively validated on new, blinded validation samples (n=78). The predicted probability of CDI (as calculated by the algorithm) was compared with the microbiological test results (direct toxin test and culture). RESULTS: Using a Random Forest classification algorithm, FAIMS had a high discriminatory ability on the training and test samples (C-statistic 0.91 (95% confidence interval (CI): 0.86-0.97)). When applied to the blinded validation samples, the C-statistic was 0.86 (0.75-0.97). For samples analyzed ≤7 days of collection (n=76), diagnostic accuracy was even higher (C-statistic: 0.93 (0.85-1.00)). A cutoff value of 0.32 for predicted probability corresponded with a sensitivity of 92.3% (95% CI: 77.4-98.6%) and specificity of 86.0% (78.3-89.3%). For even fresher samples, discriminatory ability further increased. CONCLUSIONS: FAIMS analysis of unprocessed stool samples can differentiate between C. difficile-positive and -negative samples with high diagnostic accuracy.
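The 0.32 cutoff turns the algorithm's predicted probability of CDI into a positive or negative call, and sensitivity and specificity are then read off the resulting counts. The Python sketch below shows that step, assuming a sample is called positive when its probability is greater than or equal to the cutoff (the abstract does not state the tie convention). The probabilities are hypothetical, not study data; they only show how moving the cutoff trades sensitivity for specificity.

```python
from typing import Sequence, Tuple


def sens_spec_at_cutoff(probs: Sequence[float], truth: Sequence[int], cutoff: float) -> Tuple[float, float]:
    """Call a sample positive when its predicted probability is >= cutoff (assumed convention);
    truth is 1 for a microbiologically positive sample, 0 otherwise."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, truth):
        predicted_positive = p >= cutoff
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and reference results (not from the study).
probs = [0.90, 0.75, 0.40, 0.30, 0.35, 0.20, 0.10, 0.05]
truth = [1, 1, 1, 1, 0, 0, 0, 0]
print(sens_spec_at_cutoff(probs, truth, 0.32))  # (0.75, 0.75)
print(sens_spec_at_cutoff(probs, truth, 0.50))  # (0.5, 1.0)
```

A low cutoff such as 0.32 favours sensitivity, which suits a rapid ward-level screen where a missed infection is costlier than a confirmatory test.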


Subject(s)
Algorithms , Clostridioides difficile/isolation & purification , Enterocolitis, Pseudomembranous/diagnosis , Feces/microbiology , Spectrum Analysis/methods , Clostridium Infections/diagnosis , Enterocolitis, Pseudomembranous/microbiology , Feces/chemistry , Humans , Point-of-Care Systems , Prospective Studies , Research Design , Sensitivity and Specificity , Spectrum Analysis/instrumentation
11.
Sci Rep ; 5: 9259, 2015 Mar 19.
Article in English | MEDLINE | ID: mdl-25788417

ABSTRACT

There is currently no biochemical test for detection of early-stage osteoarthritis (eOA). Tests for early-stage rheumatoid arthritis (eRA) such as rheumatoid factor (RF) and anti-cyclic citrullinated peptide (CCP) antibodies require refinement to improve clinical utility. We developed robust mass spectrometric methods to quantify citrullinated protein (CP) and free hydroxyproline in body fluids. We detected CP in the plasma of healthy subjects and surprisingly found that CP was increased in both patients with eOA and eRA whereas anti-CCP antibodies were predominantly present in eRA. A 4-class diagnostic algorithm combining plasma/serum CP, anti-CCP antibody and hydroxyproline applied to a cohort gave specific and sensitive detection and discrimination of eOA, eRA, other non-RA inflammatory joint diseases and good skeletal health. This provides a first-in-class plasma/serum-based biochemical assay for diagnosis and type discrimination of early-stage arthritis to facilitate improved treatment and patient outcomes, exploiting citrullinated protein and related differential autoimmunity.


Subject(s)
Arthritis, Rheumatoid/diagnosis , Biomarkers/analysis , Musculoskeletal Diseases/diagnosis , Osteoarthritis/diagnosis , Tandem Mass Spectrometry , Adult , Aged , Algorithms , Area Under Curve , Autoantibodies/blood , Chromatography, High Pressure Liquid , Citrulline/chemistry , Citrulline/metabolism , Early Diagnosis , Female , Humans , Hydroxyproline/analysis , Hydroxyproline/blood , Male , Middle Aged , ROC Curve , Sensitivity and Specificity
12.
PLoS One ; 8(10): e75748, 2013.
Article in English | MEDLINE | ID: mdl-24194826

ABSTRACT

Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC often infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/
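Conjugacy is what makes the normal-gamma prior convenient here: after observing data, the posterior over (mean, precision) is again normal-gamma, so cluster-level quantities have closed forms. The Python sketch below shows the standard textbook posterior update for a univariate Gaussian with unknown mean and precision. The parameter names (mu, kappa, alpha, beta, with beta as the gamma rate) follow a common parameterisation and are our assumption; GBHC's own implementation may parameterise or apply the prior differently (for example, per dimension and via marginal likelihoods when merging clusters).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NormalGamma:
    mu: float     # location of the mean
    kappa: float  # pseudo-observation count for the mean
    alpha: float  # gamma shape for the precision
    beta: float   # gamma rate for the precision


def posterior(prior: NormalGamma, xs: Sequence[float]) -> NormalGamma:
    """Conjugate update of a normal-gamma prior with observations xs."""
    n = len(xs)
    if n == 0:
        return prior
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # within-sample sum of squares
    kappa_n = prior.kappa + n
    mu_n = (prior.kappa * prior.mu + n * mean) / kappa_n
    alpha_n = prior.alpha + n / 2.0
    beta_n = prior.beta + 0.5 * ss + prior.kappa * n * (mean - prior.mu) ** 2 / (2.0 * kappa_n)
    return NormalGamma(mu_n, kappa_n, alpha_n, beta_n)


post = posterior(NormalGamma(0.0, 1.0, 1.0, 1.0), [1.0, 2.0, 3.0])
print(post)  # NormalGamma(mu=1.5, kappa=4.0, alpha=2.5, beta=3.5)
```

Under this parameterisation the posterior mean of the precision is alpha / beta (here 2.5 / 3.5), and the posterior mean of mu shrinks the sample mean of 2.0 toward the prior location 0.0 in proportion to kappa.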


Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Models, Genetic , Bayes Theorem , Cluster Analysis , Humans , Likelihood Functions , Normal Distribution
13.
PLoS One ; 8(4): e59795, 2013.
Article in English | MEDLINE | ID: mdl-23565168

ABSTRACT

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make the analysis of larger genomic data sets practical, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from https://sites.google.com/site/randomisedbhc/.


Subject(s)
Algorithms , Bayes Theorem , Cluster Analysis , Computational Biology/methods , Internet , Microarray Analysis , Models, Statistical , Time Factors
14.
Bioinformatics ; 28(24): 3290-7, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23047558

ABSTRACT

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.


Subject(s)
Genomics/methods , Models, Statistical , Bayes Theorem , Chromatin Immunoprecipitation , Cluster Analysis , Gene Expression , Gene Expression Profiling/methods , Normal Distribution , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Systems Biology
15.
PLoS Comput Biol ; 7(10): e1002227, 2011 Oct.
Article in English | MEDLINE | ID: mdl-22028636

ABSTRACT

Different data types can offer complementary perspectives on the same biological phenomenon. In cancer studies, for example, data on copy number alterations indicate losses and amplifications of genomic regions in tumours, while transcriptomic data point to the impact of genomic and environmental events on the internal wiring of the cell. Fusing different data provides a more comprehensive model of the cancer cell than that offered by any single type. However, biological signals in different patients exhibit diverse degrees of concordance due to cancer heterogeneity and inherent noise in the measurements. This is a particularly important issue in cancer subtype discovery, where personalised strategies to guide therapy are of vital importance. We present a nonparametric Bayesian model for discovering prognostic cancer subtypes by integrating gene expression and copy number variation data. Our model is constructed from a hierarchy of Dirichlet Processes and addresses three key challenges in data fusion: (i) to separate concordant from discordant signals, (ii) to select informative features, and (iii) to estimate the number of disease subtypes. Concordance of signals is assessed individually for each patient, giving us an additional level of insight into the underlying disease structure. We exemplify the power of our model in prostate cancer and breast cancer and show that it outperforms competing methods. In the prostate cancer data, we identify an entirely new subtype with extremely poor survival outcome and show how other analyses fail to detect it. In the breast cancer data, we find subtypes with superior prognostic value by using the concordant results. These discoveries were crucially dependent on our model's ability to distinguish concordant and discordant signals within each patient sample, and would otherwise have been missed. We therefore demonstrate the importance of taking a patient-specific approach, using highly flexible nonparametric Bayesian methods.


Subject(s)
Bayes Theorem , Breast Neoplasms/mortality , Models, Biological , Models, Statistical , Prostatic Neoplasms/mortality , Breast Neoplasms/classification , Breast Neoplasms/genetics , DNA Copy Number Variations/genetics , Female , Gene Expression Profiling/statistics & numerical data , Humans , Male , Prognosis , Prostatic Neoplasms/classification , Prostatic Neoplasms/genetics , Signal Transduction , Statistics, Nonparametric , Survival Analysis
16.
BMC Bioinformatics ; 12: 399, 2011 Oct 13.
Article in English | MEDLINE | ID: mdl-21995452

ABSTRACT

BACKGROUND: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques. RESULTS: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles. CONCLUSIONS: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. 
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
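To illustrate the two noise-handling ideas in this abstract, here is a minimal sketch under stated assumptions: a squared-exponential Gaussian process on (possibly non-uniform) time points, a two-component mixture in which each observation is explained either by the GP or by a broad zero-mean "outlier" Gaussian, and a pooled replicate variance used as a plug-in noise estimate. This is not the BHC package's implementation. All function names, kernel settings, and the outlier weight are hypothetical, and the plug-in variance is a simplification of the empirical Bayes prior on the noise variance that the abstract describes.

```python
import numpy as np


def rbf_kernel(t1, t2, signal_var=1.0, length_scale=2.0):
    """Squared-exponential covariance between two sets of time points."""
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def pooled_replicate_variance(replicates):
    """Mean within-time-point variance from an (n_reps, n_times) array.

    Hypothetical plug-in stand-in for letting replicates inform the noise.
    """
    reps = np.asarray(replicates, dtype=float)
    return float(reps.var(axis=0, ddof=1).mean())


def normal_pdf(x, mean, var):
    """Univariate Gaussian density, vectorised over numpy arrays."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)


def outlier_responsibilities(t, y, noise_var, outlier_var=25.0,
                             outlier_weight=0.05, **kernel_args):
    """Posterior probability that each y_i comes from the outlier component.

    The inlier density is the leave-one-out (LOO) GP predictive N(mu_i, s2_i),
    so each point is judged against a fit to the remaining points only.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(t, t, **kernel_args) + noise_var * np.eye(len(t))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    s2 = 1.0 / np.diag(K_inv)        # LOO predictive variance (includes noise)
    mu = y - alpha * s2              # LOO predictive mean
    p_in = (1.0 - outlier_weight) * normal_pdf(y, mu, s2)
    p_out = outlier_weight * normal_pdf(y, 0.0, outlier_var)
    return p_out / (p_in + p_out)


if __name__ == "__main__":
    t = np.array([0, 1, 2, 4, 7, 8, 10], dtype=float)  # non-uniform sampling
    y = np.sin(t / 3.0)
    y[3] += 5.0                                        # inject one outlier
    print(np.round(outlier_responsibilities(t, y, noise_var=0.1), 3))
```

The LOO mean and variance use the standard closed-form identities for GP regression, which avoids refitting the model once per held-out point.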


Subject(s)
Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling , Humans , Models, Biological , Normal Distribution , Saccharomyces cerevisiae
17.
Bioinformatics ; 26(12): i158-67, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529901

ABSTRACT

MOTIVATION: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. RESULTS: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs. AVAILABILITY: The code for the work presented in this article is available from the authors on request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , Transcription Factors/metabolism , Bayes Theorem , Binding Sites , Multigene Family , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
18.
Semin Cell Dev Biol ; 20(7): 863-8, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19682595

ABSTRACT

A major challenge in systems biology is the ability to model complex regulatory interactions, such as gene regulatory networks, and a number of computational approaches have been developed over recent years to address this challenge. This paper reviews a number of these approaches, with a focus on probabilistic graphical models and the integration of diverse data sets, such as gene expression and transcription factor binding site location and activity.


Subject(s)
Chromatin Immunoprecipitation/methods , Gene Expression , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Systems Biology/methods
19.
BMC Bioinformatics ; 10: 242, 2009 Aug 06.
Article in English | MEDLINE | ID: mdl-19660130

ABSTRACT

BACKGROUND: Although clustering has rapidly become one of the standard computational approaches in microarray gene expression data analysis, little attention has been paid to the uncertainty in the results obtained. RESULTS: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. CONCLUSION: Biologically plausible results are presented from a well-studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, for example having to decide in advance how many clusters there should be and how to choose a principled distance metric.
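The merge rule summarised in this abstract can be sketched in the style of the Bayesian hierarchical clustering recursion of Heller and Ghahramani: each candidate merge k of subtrees i and j is scored by the posterior probability r_k that all its members come from a single cluster, and the pair with the highest r_k is merged. The conjugate Gaussian likelihood below (known variance `sigma2`, Normal prior on the mean with variance `tau2`, independent dimensions) is an assumption chosen for brevity, not the model used in the R/Bioconductor package, and the function names are hypothetical.

```python
import itertools
import math

import numpy as np


def log_marginal_gaussian(x, sigma2=1.0, tau2=1.0):
    """log p(D | H1): all rows share one mean per dimension.

    Model: mu ~ N(0, tau2), x_n ~ N(mu, sigma2), dimensions independent.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[0]
    s = x.sum(axis=0)           # per-dimension sum
    ss = (x ** 2).sum(axis=0)   # per-dimension sum of squares
    per_dim = (-0.5 * n * math.log(2 * math.pi * sigma2)
               + 0.5 * math.log(sigma2 / (sigma2 + n * tau2))
               - ss / (2 * sigma2)
               + tau2 * s ** 2 / (2 * sigma2 * (sigma2 + n * tau2)))
    return float(per_dim.sum())


def bhc(data, alpha=1.0, **lik):
    """Greedy Bayesian agglomerative clustering; returns merge history.

    Each history entry is (members_i, members_j, r_k). All quantities are
    kept in log space: d_k = alpha*Gamma(n_k) + d_i*d_j,
    pi_k = alpha*Gamma(n_k)/d_k, 1 - pi_k = d_i*d_j/d_k,
    p(D_k|T_k) = pi_k p(D_k|H1) + (1 - pi_k) p(D_i|T_i) p(D_j|T_j).
    """
    data = np.asarray(data, dtype=float)
    log_alpha = math.log(alpha)
    # Leaves: d_i = alpha, pi_i = 1, so p(D_i|T_i) = p(D_i|H1).
    nodes = [([i], log_alpha, log_marginal_gaussian(data[[i]], **lik))
             for i in range(len(data))]
    history = []
    while len(nodes) > 1:
        best = None
        for a, b in itertools.combinations(range(len(nodes)), 2):
            mi, ld_i, lt_i = nodes[a]
            mj, ld_j, lt_j = nodes[b]
            members = mi + mj
            lh1 = log_marginal_gaussian(data[members], **lik)
            log_prior_term = log_alpha + math.lgamma(len(members))
            ld_k = float(np.logaddexp(log_prior_term, ld_i + ld_j))
            log_pi = log_prior_term - ld_k
            log_1m_pi = ld_i + ld_j - ld_k
            lt_k = float(np.logaddexp(log_pi + lh1, log_1m_pi + lt_i + lt_j))
            log_r = log_pi + lh1 - lt_k
            if best is None or log_r > best[0]:
                best = (log_r, a, b, (members, ld_k, lt_k))
        log_r, a, b, merged = best
        history.append((nodes[a][0], nodes[b][0], math.exp(log_r)))
        nodes = [nd for idx, nd in enumerate(nodes) if idx not in (a, b)]
        nodes.append(merged)
    return history


if __name__ == "__main__":
    for left, right, r in bhc([[0.0], [0.1], [5.0], [5.2]], sigma2=0.1, tau2=10.0):
        print(left, right, round(r, 3))
```

Cutting the tree wherever r_k falls below 0.5 yields the clusters directly, which is how this style of method avoids fixing a cluster count or choosing a distance metric up front.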


Subject(s)
Gene Expression Profiling/methods , Software Design , Algorithms , Arabidopsis/genetics , Bayes Theorem , Cluster Analysis , Oligonucleotide Array Sequence Analysis , Time Factors