1.
Bioelectron Med ; 10(1): 15, 2024 Jun 17.
Article En | MEDLINE | ID: mdl-38880906

BACKGROUND: Vagus nerve stimulation (VNS) is an established therapy for treating a variety of chronic diseases, such as epilepsy, depression, and obesity, and for stroke rehabilitation. However, lack of precision and side effects have hindered its efficacy and extension to new conditions. Achieving a better understanding of the relationship between VNS parameters and neural and physiological responses is therefore necessary to enable the design of personalized dosing procedures and improve precision and efficacy of VNS therapies. METHODS: We used biomarkers from recorded evoked fiber activity and short-term physiological responses (throat muscle, cardiac and respiratory activity) to understand the response to a wide range of VNS parameters in anaesthetised pigs. Using signal processing, Gaussian processes (GP) and parametric regression models, we analysed the relationship between VNS parameters and neural and physiological responses. RESULTS: Firstly, we illustrate how considering multiple stimulation parameters in VNS dosing can improve the efficacy and precision of VNS therapies. Secondly, we describe the relationship between different VNS parameters and the evoked fiber activity and show how spatially selective electrodes can be used to improve fiber recruitment. Thirdly, we provide a detailed exploration of the relationship between the activations of neural fiber types and different physiological effects. Finally, based on these results, we discuss how recordings of evoked fiber activity can help design VNS dosing procedures that optimize short-term physiological effects safely and efficiently. CONCLUSION: An understanding of evoked fiber activity during VNS provides powerful biomarkers that could improve the precision, safety and efficacy of VNS therapies.

2.
J Neural Eng ; 21(2)2024 Apr 02.
Article En | MEDLINE | ID: mdl-38479016

Objective. In bioelectronic medicine, neuromodulation therapies induce neural signals to the brain or organs, modifying their function. Stimulation devices capable of triggering exogenous neural signals using electrical waveforms require a complex and multi-dimensional parameter space to control such waveforms. Determining the best combination of parameters (waveform optimization or dosing) for treating a particular patient's illness is therefore challenging. Comprehensive parameter searching for an optimal stimulation effect is often infeasible in a clinical setting due to the size of the parameter space. Restricting this space, however, may lead to suboptimal therapeutic results, reduced responder rates, and adverse effects. Approach. As an alternative to a full parameter search, we present a flexible machine learning, data acquisition, and processing framework for optimizing neural stimulation parameters, requiring as few steps as possible using Bayesian optimization. This optimization builds a model of the neural and physiological responses to stimulations, enabling it to optimize stimulation parameters and provide estimates of the accuracy of the response model. The vagus nerve (VN) innervates, among other thoracic and visceral organs, the heart, thus controlling heart rate (HR), making it an ideal candidate for demonstrating the effectiveness of our approach. Main results. The efficacy of our optimization approach was first evaluated on simulated neural responses, then applied to VN stimulation intraoperatively in porcine subjects. Optimization converged quickly on parameters achieving target HRs and optimizing neural B-fiber activations despite high intersubject variability. Significance. An optimized stimulation waveform was achieved in real time with far fewer stimulations than required by alternative optimization strategies, thus minimizing exposure to side effects. Uncertainty estimates helped avoid stimulations outside a safe range. Our approach shows that a complex set of neural stimulation parameters can be optimized in real time for a patient to achieve personalized precision dosing.
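
As an illustration of the general idea of Bayesian optimization described in this abstract, here is a minimal sketch in plain Python. It is not the authors' framework. Everything specific is an assumption made for illustration: the one-dimensional normalised stimulation amplitude, the toy quadratic heart-rate response `simulated_heart_rate`, the target of 85 bpm, the squared-exponential kernel and its length scale, and the lower-confidence-bound acquisition rule. A Gaussian process surrogate models the loss, and each new stimulation is chosen where the predicted loss minus a multiple of the predictive standard deviation is smallest.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)      # K^-1 y
    v = solve(K, k_star)      # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = 1.0 - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

def bayes_opt(loss, grid, n_iter=12, kappa=2.0):
    """Minimise `loss` over `grid` using a lower-confidence-bound rule."""
    xs = [grid[0], grid[-1]]              # start from the two extremes
    ys = [loss(x) for x in xs]
    for _ in range(n_iter):
        candidates = [x for x in grid if x not in xs]

        def lcb(x):
            mean, var = gp_posterior(xs, ys, x)
            return mean - kappa * math.sqrt(var)

        x_next = min(candidates, key=lcb)  # most promising untested setting
        xs.append(x_next)
        ys.append(loss(x_next))
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], len(xs)

TARGET_HR = 85.0  # hypothetical target heart rate in bpm

def simulated_heart_rate(x):
    """Hypothetical response: heart rate falls as amplitude x in [0, 1] rises."""
    return 100.0 - 30.0 * x ** 2

def loss(x):
    """Scaled squared distance between the simulated heart rate and the target."""
    return ((simulated_heart_rate(x) - TARGET_HR) / 30.0) ** 2

grid = [i / 50 for i in range(51)]
best_x, best_loss, n_evals = bayes_opt(loss, grid)
print(best_x, best_loss, n_evals)
```

The sketch evaluates 14 settings out of 51 candidates rather than all of them, which is the point the abstract makes about avoiding a full parameter search. A real application would need a multi-dimensional parameter space, measured responses, and explicit safety constraints.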


Vagus Nerve Stimulation , Humans , Animals , Swine , Vagus Nerve Stimulation/methods , Bayes Theorem , Vagus Nerve/physiology , Heart , Nerve Fibers, Myelinated
3.
Clin Gastroenterol Hepatol ; 20(11): 2514-2523.e3, 2022 11.
Article En | MEDLINE | ID: mdl-35183768

BACKGROUND & AIMS: Dysplasia in Barrett's esophagus is often invisible on high-resolution white-light endoscopy (HRWLE). We compared the diagnostic accuracy for inconspicuous dysplasia of the combination of autofluorescence imaging (AFI)-guided probe-based confocal laser endomicroscopy (pCLE) and molecular biomarkers vs HRWLE with Seattle protocol biopsies. METHODS: Barrett's esophagus patients with no dysplastic lesions were block-randomized to standard endoscopy (HRWLE with the Seattle protocol) or AFI-guided pCLE with targeted biopsies for molecular biomarkers (p53 and cyclin A by immunohistochemistry; aneuploidy by image cytometry), with crossover to the other arm after 6 to 12 weeks. The primary end point was the histologic diagnosis from all study biopsies (trial histology). A sensitivity analysis was performed for overall histology, which included diagnoses within 12 months from the first study endoscopy. Endoscopists were blinded to the referral endoscopy and histology results. The primary outcome was diagnostic accuracy for dysplasia by real-time pCLE vs HRWLE biopsies. RESULTS: Of 154 patients recruited, 134 completed both arms. In the primary outcome analysis (trial histology analysis), AFI-guided pCLE had similar sensitivity for dysplasia compared with standard endoscopy (74.3%; 95% CI, 56.7-87.5 vs 80.0%; 95% CI, 63.1-91.6; P = .48). Multivariate logistic regression showed pCLE optical dysplasia, aberrant p53, and aneuploidy had the strongest correlation with dysplasia (secondary outcome). This 3-biomarker panel had higher sensitivity for any grade of dysplasia than the Seattle protocol (81.5% vs 51.9%; P < .001) in the overall histology analysis, but not in the trial histology analysis (91.4% vs 80.0%; P = .16), with an area under the receiver operating characteristic curve of 0.83. CONCLUSIONS: Seattle protocol biopsies miss dysplasia in approximately half of patients with inconspicuous neoplasia. AFI-guided pCLE has similar accuracy to the current gold standard. The addition of molecular biomarkers could improve diagnostic accuracy.


Barrett Esophagus , Esophageal Neoplasms , Humans , Barrett Esophagus/complications , Esophagoscopy/methods , Tumor Suppressor Protein p53 , Esophageal Neoplasms/pathology , Microscopy, Confocal/methods , Biopsy , Hyperplasia , Biomarkers/analysis , Aneuploidy , Randomized Controlled Trials as Topic
4.
J Neurotrauma ; 38(4): 455-463, 2021 02 15.
Article En | MEDLINE | ID: mdl-33108942

Loss to follow-up and missing outcomes data are important issues for longitudinal observational studies and clinical trials in traumatic brain injury. One popular solution to missing 6-month outcomes has been to use the last observation carried forward (LOCF). The purpose of the current study was to compare the performance of model-based single-imputation methods with that of the LOCF approach. We hypothesized that model-based methods would perform better as they potentially make better use of available outcome data. The Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) study (n = 4509) included longitudinal outcome collection at 2 weeks, 3 months, 6 months, and 12 months post-injury; a total of 8185 Glasgow Outcome Scale extended (GOSe) observations were included in the database. We compared single imputation of 6-month outcomes using LOCF, a multiple imputation (MI) panel imputation, a mixed-effect model, a Gaussian process regression, and a multi-state model. Model performance was assessed via cross-validation on the subset of individuals with a valid GOSe value within 180 ± 14 days post-injury (n = 1083). All models were fit on the entire available data after removing the 180 ± 14 days post-injury observations from the respective test fold. The LOCF method showed lower accuracy (i.e., poorer agreement between imputed and observed values) than model-based methods of imputation, and showed a strong negative bias (i.e., it imputed lower than observed outcomes). Accuracy and bias for the model-based approaches were similar to one another, with the multi-state model having the best overall performance. All methods of imputation showed variation across different outcome categories, with better performance for more frequent outcomes. We conclude that model-based methods of single imputation have substantial performance advantages over LOCF, in addition to providing more complete outcome data.
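
For readers unfamiliar with the baseline method compared in this abstract, the following is a minimal sketch of last observation carried forward (LOCF) for a 6-month outcome. The function name `locf_impute`, the 180-day target and the example patient are assumptions for illustration only and are not taken from the CENTER-TBI analysis code.

```python
def locf_impute(observations, target_day=180):
    """Return the most recent outcome observed on or before target_day.

    `observations` is a list of (day_post_injury, gose_value) pairs.
    Returns None when nothing was observed early enough to carry forward.
    """
    prior = [(day, value) for day, value in observations if day <= target_day]
    if not prior:
        return None
    return max(prior, key=lambda obs: obs[0])[1]

# Hypothetical patient: GOSe recorded at 2 weeks, 3 months and 12 months,
# but missing at 6 months.
patient = [(14, 4), (90, 5), (365, 7)]
imputed = locf_impute(patient)
print(imputed)
```

In this made-up case LOCF carries the 3-month value forward and ignores the later 12-month observation. A patient who is still recovering would therefore tend to receive an imputed value below the true 6-month outcome, which is consistent with the negative bias the abstract reports.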


Brain Injuries, Traumatic , Data Interpretation, Statistical , Models, Neurological , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Glasgow Outcome Scale , Humans , Infant , Infant, Newborn , Male , Middle Aged , Prognosis , Recovery of Function/physiology , Research Design , Young Adult
5.
Ann Appl Stat ; 14(1): 74-93, 2020 Mar.
Article En | MEDLINE | ID: mdl-34992706

A prompt public health response to a new epidemic relies on the ability to monitor and predict its evolution in real time as data accumulate. The 2009 A/H1N1 outbreak in the UK revealed pandemic data as noisy, contaminated, potentially biased and originating from multiple sources. This seriously challenges the capacity for real-time monitoring. Here, we assess the feasibility of real-time inference based on such data by constructing an analytic tool combining an age-stratified SEIR transmission model with various observation models describing the data generation mechanisms. As batches of data become available, a sequential Monte Carlo (SMC) algorithm is developed to synthesise multiple imperfect data streams, iterate epidemic inferences and assess model adequacy amidst a rapidly evolving epidemic environment, substantially reducing computation time in comparison to standard MCMC, to ensure timely delivery of real-time epidemic assessments. In application to simulated data designed to mimic the 2009 A/H1N1 epidemic, SMC is shown to have additional benefits in terms of assessing predictive performance and coping with parameter nonidentifiability.
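
The transmission component mentioned in this abstract can be pictured with a minimal sketch of a deterministic SEIR model. This is a deliberate simplification: it uses a single age group rather than age strata, omits the observation models and the sequential Monte Carlo inference entirely, and all parameter values and the function name `simulate_seir` are assumptions chosen for illustration.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate a single-population SEIR model with fixed Euler steps.

    beta: transmission rate, sigma: rate of leaving the exposed state,
    gamma: recovery rate (all per day). Returns one (S, E, I, R) tuple per day,
    starting with the initial state.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r
    trajectory = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory

# Hypothetical outbreak in a closed population of 10,000 people.
trajectory = simulate_seir(beta=0.5, sigma=0.25, gamma=0.2,
                           s0=9990, e0=0, i0=10, r0=0, days=60)
final_s, final_e, final_i, final_r = trajectory[-1]
print(round(final_s), round(final_e), round(final_i), round(final_r))
```

In a real-time setting such as the one the abstract describes, a model of this kind would be rerun for many candidate parameter values as each batch of data arrives, which is why the computational cost of the inference algorithm matters.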

6.
Bioinformatics ; 36(5): 1484-1491, 2020 03 01.
Article En | MEDLINE | ID: mdl-31608923

MOTIVATION: Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. RESULTS: The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with non-parametric Bayesian clustering methods, efficient Markov Chain Monte Carlo sampling and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. AVAILABILITY AND IMPLEMENTATION: An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Algorithms , Single-Cell Analysis , Bayes Theorem , Cluster Analysis , Markov Chains
7.
United European Gastroenterol J ; 7(10): 1389-1398, 2019 12 01.
Article En | MEDLINE | ID: mdl-31807307

Background: Proton-pump inhibitors (PPIs) are the mainstay of gastroesophageal reflux disease (GERD) treatment; however, up to 30% of patients have a poor symptomatic response. pH-impedance is the gold standard to assess whether this is due to persistent acid reflux. We aimed to characterize clinical predictors of persistent esophageal acid reflux on PPIs, including gastric pH measured during endoscopy. Methods: We prospectively recruited patients with GERD and/or Barrett's esophagus (BE) on PPIs. All patients completed a symptom questionnaire (RDQ) and underwent gastroscopy with gastric pH analysis, immediately followed by ambulatory 24-hour pH-impedance. We used a modified cut-off of 1.3% for pathological esophageal acid exposure time (AET). A multiple linear regression model was used to analyze the correlation between AET and predictive variables. Results: We recruited 122 patients, of whom 92 (75.4%) were included in the final analysis [44 male (47.8%), median age 53 years (IQR: 43-66)]. Forty-four patients (47.8%) had persistent acid reflux with a median total AET of 2.2 (IQR 1.2-5.0), as compared to 0.1 (IQR 0.0-0.2) in patients without persistent reflux (n=48; P<.001). There was no difference in age, gender, BMI, PPI-regimen, diagnosis of hiatus hernia or BE, and severity of symptoms between patients with normal and abnormal AET. Median gastric pH was significantly lower in patients with abnormal AET (5.8 vs 6.6, P=.032) and it correlated with the total AET (P=.045; R²=12.0%). With a pH cut-off of 5.05, single point endoscopic gastric pH analysis had an area under the ROC curve (AUC) of 63.0% (95% CI 51.3-74.7) for prediction of pathological esophageal AET. Conclusions: Symptoms and clinical characteristics are not useful to predict persistent acid reflux in patients on PPIs. One-point gastric pH correlates with 24-hour esophageal AET and could guide clinicians to assess response to PPIs; however, its utility needs validation in larger studies.


Esophageal pH Monitoring , Gastroesophageal Reflux/diagnosis , Gastroesophageal Reflux/drug therapy , Gastroscopy , Proton Pump Inhibitors/therapeutic use , Adult , Aged , Barrett Esophagus/diagnosis , Biomarkers , Diagnosis, Differential , Esophageal pH Monitoring/methods , Gastroscopy/methods , Humans , Middle Aged , Prognosis , Symptom Assessment
8.
PLoS One ; 14(7): e0213221, 2019.
Article En | MEDLINE | ID: mdl-31335867

The copy numbers of genes in cancer samples are often highly disrupted and form a natural amplification/deletion experiment encompassing multiple genes. Matched array comparative genomics and transcriptomics datasets from such samples can be used to predict inter-chromosomal gene regulatory relationships. Previously we published the database METAMATCHED, comprising the results from such an analysis of a large number of publicly available cancer datasets. Here we investigate genes in the database which are unusual in that their copy number exhibits consistent heterogeneous disruption in a high proportion of the cancer datasets. We assess the potential relevance of these genes to the pathology of the cancer samples, in light of their predicted regulatory relationships and enriched biological pathways. A network-based method was used to identify enriched pathways from the genes' inferred targets. The analysis predicts both known and new regulator-target interactions and pathway memberships. We examine examples in detail, in particular the gene POGZ, which is disrupted in many of the cancer datasets and has an unusually large number of predicted targets, from which the network analysis predicts membership of cancer-related pathways. The results suggest close involvement in known cancer pathways of genes exhibiting consistent heterogeneous copy number disruption. Further experimental work would clarify their relevance to tumor biology. The results of the analysis presented in the database METAMATCHED, and included here as an R archive file, constitute a large number of predicted regulatory relationships and pathway memberships which we anticipate will be useful in informing such experiments.


Databases, Nucleic Acid , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Neoplasm Proteins , Neoplasms , Oncogenes , Transposases , Genomics , Humans , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Neoplasms/genetics , Neoplasms/metabolism , Transposases/biosynthesis , Transposases/genetics
9.
Int J Cancer ; 145(12): 3389-3401, 2019 12 15.
Article En | MEDLINE | ID: mdl-31050820

Cancers occurring at the gastroesophageal junction (GEJ) are classified as predominantly esophageal or gastric, which is often difficult to decipher. We hypothesized that the transcriptomic profile might reveal molecular subgroups which could help to define the tumor origin and behavior beyond anatomical location. The gene expression profiles of 107 treatment-naïve, intestinal type, gastroesophageal adenocarcinomas were assessed by the Illumina-HTv4.0 beadchip. Differential gene expression (limma), unsupervised subgroup assignment (mclust) and pathway analysis (gage) were undertaken in R statistical computing and results were related to demographic and clinical parameters. Unsupervised assignment of the gene expression profiles revealed three distinct molecular subgroups, which were not associated with anatomical location, tumor stage or grade (p > 0.05). Group 1 was enriched for pathways involved in cell turnover, Group 2 was enriched for metabolic processes and Group 3 for immune-response pathways. Patients in group 1 showed the worst overall survival (p = 0.019). Key genes for the three subtypes were confirmed by immunohistochemistry. The newly defined intrinsic subtypes were analyzed in four independent datasets of gastric and esophageal adenocarcinomas with transcriptomic data available (RNAseq data: OCCAMS cohort, n = 158; gene expression arrays: Belfast, n = 63; Singapore, n = 191; Asian Cancer Research Group, n = 300). The subgroups were represented in the independent cohorts and pooled analysis confirmed the prognostic effect of the new subtypes. In conclusion, adenocarcinomas at the GEJ comprise three distinct molecular phenotypes which do not reflect anatomical location but rather inform our understanding of the key pathways expressed.


Adenocarcinoma/genetics , Adenocarcinoma/pathology , Esophageal Neoplasms/genetics , Esophageal Neoplasms/pathology , Esophagogastric Junction/pathology , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Transcriptome/genetics , Gene Expression Profiling/methods , Humans , Immunohistochemistry/methods , Phenotype , Prognosis , Prospective Studies
10.
Clin Transl Gastroenterol ; 10(4): e00014, 2019 04.
Article En | MEDLINE | ID: mdl-30985335

OBJECTIVES: Low-grade dysplasia (LGD) in Barrett's esophagus (BE) is generally inconspicuous on conventional and magnified endoscopy. Probe-based confocal laser endomicroscopy (pCLE) provides insight into gastrointestinal mucosa at cellular resolution. We aimed to identify endomicroscopic features and develop pCLE diagnostic criteria for BE-related LGD. METHODS: This was a retrospective study on pCLE videos generated in 2 prospective studies. In phase I, 2 investigators assessed 30 videos to identify LGD endomicroscopic features, which were then validated in an independent video set (n = 25). Criteria with average accuracy >80% and interobserver agreement κ > 0.4 were taken forward. In phase II, 6 endoscopists evaluated the criteria in an independent video set (n = 57). The receiver operating characteristic curve was constructed to find the best cutoff. Sensitivity, specificity, interobserver, and intraobserver agreements were calculated. RESULTS: In phase I, 6 out of 8 criteria achieved the agreement and accuracy thresholds: (i) dark nonround glands, (ii) irregular gland shape, (iii) lack of goblet cells, (iv) sharp cutoff of darkness, (v) variable cell size, and (vi) cellular stratification. The best cutoff for LGD diagnosis was 3 out of 6 positive criteria. In phase II, the diagnostic criteria had a sensitivity and specificity for LGD of 81.9% and 74.6%, respectively, with an area under the receiver operating characteristic curve of 0.888. The interobserver agreement was substantial (κ = 0.654), and the mean intraobserver agreement was moderate (κ = 0.590). CONCLUSIONS: We have generated and validated pCLE criteria for LGD in BE. Using these criteria, pCLE diagnosis of LGD is reproducible and has a substantial interobserver agreement.
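
The decision rule reported in this abstract (LGD is called when at least 3 of the 6 criteria are positive) can be written down as a short sketch. The criterion names come from the abstract; the function names and the four example videos are hypothetical and are included only to show how sensitivity and specificity follow from such a rule.

```python
CRITERIA = (
    "dark nonround glands",
    "irregular gland shape",
    "lack of goblet cells",
    "sharp cutoff of darkness",
    "variable cell size",
    "cellular stratification",
)

def predict_lgd(positive, cutoff=3):
    """Call LGD when at least `cutoff` of the six criteria are positive."""
    return len(set(positive) & set(CRITERIA)) >= cutoff

def sensitivity_specificity(predicted, truth):
    """Compute (sensitivity, specificity) from paired boolean lists."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up videos: (criteria scored positive, histology shows LGD).
videos = [
    (CRITERIA[:4], True),    # 4 criteria, LGD     -> true positive
    (CRITERIA[:2], True),    # 2 criteria, LGD     -> false negative
    (CRITERIA[4:5], False),  # 1 criterion, no LGD -> true negative
    (CRITERIA[1:4], False),  # 3 criteria, no LGD  -> false positive
]
predicted = [predict_lgd(positive) for positive, _ in videos]
truth = [has_lgd for _, has_lgd in videos]
sens, spec = sensitivity_specificity(predicted, truth)
print(predicted, sens, spec)
```

Repeating the calculation for every possible cutoff from 1 to 6 gives the points of a receiver operating characteristic curve, which is how a cutoff such as 3 of 6 would typically be selected.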


Barrett Esophagus/diagnostic imaging , Esophageal Mucosa/pathology , Esophageal Neoplasms/prevention & control , Esophagoscopy/methods , Barrett Esophagus/pathology , Biopsy , Esophageal Mucosa/diagnostic imaging , Esophageal Neoplasms/pathology , Esophagoscopy/standards , Humans , Microscopy, Confocal/methods , Microscopy, Confocal/standards , Observer Variation , Prospective Studies , ROC Curve , Reference Standards , Reproducibility of Results , Retrospective Studies , Video Recording
11.
Bayesian Anal ; 14(1): 81-109, 2019 Jan.
Article En | MEDLINE | ID: mdl-30631389

Analysing multiple evidence sources is often feasible only via a modular approach, with separate submodels specified for smaller components of the available evidence. Here we introduce a generic framework that enables fully Bayesian analysis in this setting. We propose a generic method for forming a suitable joint model when joining submodels, and a convenient computational algorithm for fitting this joint model in stages, rather than as a single, monolithic model. The approach also enables splitting of large joint models into smaller submodels, allowing inference for the original joint model to be conducted via our multi-stage algorithm. We motivate and demonstrate our approach through two examples: joining components of an evidence synthesis of A/H1N1 influenza, and splitting a large ecology model.

12.
EMBO J ; 38(1)2019 01 03.
Article En | MEDLINE | ID: mdl-30257965

An intricate link is becoming apparent between metabolism and cellular identities. Here, we explore the basis for such a link in an in vitro model for early mouse embryonic development: from naïve pluripotency to the specification of primordial germ cells (PGCs). Using single-cell RNA-seq with statistical modelling and modulation of energy metabolism, we demonstrate a functional role for oxidative mitochondrial metabolism in naïve pluripotency. We link mitochondrial tricarboxylic acid cycle activity to IDH2-mediated production of alpha-ketoglutarate and through it, the activity of key epigenetic regulators. Accordingly, this metabolite has a role in the maintenance of naïve pluripotency as well as in PGC differentiation, likely through preserving a particular histone methylation status underlying the transient state of developmental competence for the PGC fate. We reveal a link between energy metabolism and epigenetic control of cell state transitions during a developmental trajectory towards germ cell specification, and establish a paradigm for stabilizing fleeting cellular states through metabolic modulation.


Cell Differentiation/drug effects , Embryonic Stem Cells/drug effects , Germ Cells/drug effects , Ketoglutaric Acids/pharmacology , Pluripotent Stem Cells/drug effects , Animals , Cell Differentiation/genetics , Cells, Cultured , Embryo, Mammalian , Embryonic Stem Cells/physiology , Epigenesis, Genetic/drug effects , Epigenesis, Genetic/genetics , Female , Gene Expression Regulation, Developmental/drug effects , Germ Cells/physiology , Ketoglutaric Acids/metabolism , Male , Metabolic Networks and Pathways/drug effects , Metabolic Networks and Pathways/genetics , Mice , Mice, Inbred C57BL , Mice, Transgenic , Pluripotent Stem Cells/physiology
13.
Bioinformatics ; 35(4): 611-618, 2019 02 15.
Article En | MEDLINE | ID: mdl-30052778

MOTIVATION: A number of pseudotime methods have provided point estimates of the ordering of cells for scRNA-seq data. A still limited number of methods also model the uncertainty of the pseudotime estimate. However, there is still a need for a method to sample from complicated and multi-modal distributions of orders, and to estimate changes in the amount of uncertainty of the order during the course of a biological development, as this can support the selection of suitable cells for the clustering of genes or for network inference. RESULTS: In applications to scRNA-seq data we demonstrate the potential of GPseudoRank to sample from complex and multi-modal posterior distributions and to identify phases of lower and higher pseudotime uncertainty during a biological process. GPseudoRank also correctly identifies cells precocious in their antiviral response and links uncertainty in the ordering to metastable states. A variant of the method extends the advantages of Bayesian modelling and MCMC to large droplet-based scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION: Our method is available on GitHub: https://github.com/magStra/GPseudoRank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Single-Cell Analysis , Software , Bayes Theorem , Cluster Analysis
14.
Bioinformatics ; 34(17): i1005-i1013, 2018 09 01.
Article En | MEDLINE | ID: mdl-30423108

Motivation: A common class of behaviour encountered in the biological sciences involves branching and recombination. During branching, a statistical process bifurcates resulting in two or more potentially correlated processes that may undergo further branching; the contrary is true during recombination, where two or more statistical processes converge. A key objective is to identify the time of this bifurcation (branch or recombination time) from time series measurements, e.g. by comparing a control time series with perturbed time series. Gaussian processes (GPs) represent an ideal framework for such analysis, allowing for nonlinear regression that includes a rigorous treatment of uncertainty. Currently, however, GP models only exist for two-branch systems. Here, we highlight how arbitrarily complex branching processes can be built using the correct composition of covariance functions within a GP framework, thus outlining a general framework for the treatment of branching and recombination in the form of branch-recombinant Gaussian processes (B-RGPs). Results: We first benchmark the performance of B-RGPs compared to a variety of existing regression approaches, and demonstrate robustness to model misspecification. B-RGPs are then used to investigate the branching patterns of Arabidopsis thaliana gene expression following inoculation with the hemibiotrophic bacterium Pseudomonas syringae DC3000 and a disarmed mutant strain, hrpA. By grouping genes according to the number of branches, we could naturally separate out genes involved in basal immune response from those subverted by the virulent strain, and show enrichment for targets of pathogen protein effectors. Finally, we identify two early branching genes, WRKY11 and WRKY17, and show that genes that branched at similar times to WRKY11/17 were enriched for W-box binding motifs, and overrepresented for genes differentially expressed in WRKY11/17 knockouts, suggesting that branch time could be used for identifying direct and indirect binding targets of key transcription factors. Availability and implementation: https://github.com/cap76/BranchingGPs. Supplementary information: Supplementary data are available at Bioinformatics online.


Arabidopsis Proteins , Arabidopsis , Pseudomonas syringae , Transcription Factors , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Computational Biology , Pseudomonas syringae/genetics , Transcription Factors/metabolism
15.
Sci Rep ; 8(1): 12799, 2018 08 24.
Article En | MEDLINE | ID: mdl-30143660

Perinatal depression involves interplay between individual chronic and acute disease burdens, biological and psychosocial environmental and behavioural factors. Here we explored the predictive potential of specific psycho-socio-demographic characteristics for antenatal and postpartum depression symptoms and contribution to severity scores on the Edinburgh Postnatal Depression Scale (EPDS) screening tool. We determined depression risk trajectories in 480 women who prospectively completed the EPDS during pregnancy (TP1) and postpartum (TP2). Multinomial logistic and penalised linear regression investigated covariates associated with increased antenatal and postpartum EPDS scores contributing to the average or the difference of paired scores across time points. History of anxiety was identified as the strongest contributor to antenatal EPDS scores, followed by social status, whereas a history of depression, postpartum depression (PPD) and family history of PPD exhibited the strongest association with postpartum EPDS. These covariates were the strongest differentiating factors that increased the spread between antenatal and postpartum EPDS scores. Available covariates appeared better suited to predict EPDS scores antenatally than postpartum. As women move from the antenatal to the postpartum period, socio-demographic and lifestyle risk factors appear to play a smaller role in risk, and a personal and family history of depression and PPD become increasingly important.


Depression/psychology , Life Style , Peripartum Period/psychology , Female , Humans , Logistic Models , Predictive Value of Tests , Prospective Studies , ROC Curve , Risk , Risk Factors
16.
Brief Bioinform ; 19(1): 162-173, 2018 01 01.
Article En | MEDLINE | ID: mdl-27780826

Integrated analysis of multiple genome-wide transcription factor (TF)-binding profiles will be vital to advance our understanding of the global impact of TF binding. However, existing methods for measuring similarity in large numbers of chromatin immunoprecipitation assays with sequencing (ChIP-seq), such as correlation, mutual information or enrichment analysis, are limited in their ability to display functionally relevant TF relationships. In this study, we propose the use of graphical models to determine conditional independence between TFs and show that network visualization provides a promising alternative to distinguish 'direct' versus 'indirect' TF interactions. We applied four algorithms to measure 'direct' dependence to a compendium of 367 mouse haematopoietic TF ChIP-seq samples and obtained a consensus network known as a 'TF association network' where edges in the network corresponded to likely causal pairwise relationships between TFs. The 'TF association network' illustrates the role of TFs in developmental pathways, is reminiscent of combinatorial TF regulation, corresponds to known protein-protein interactions and indicates substantial TF-binding reorganization in leukemic cell types. With the rapid increase in TF ChIP-seq data sets, the approach presented here will be a powerful tool to study transcriptional programmes across a wide range of biological systems.


Computer Graphics , Gene Expression Regulation , Genome , Hematopoietic Stem Cells/metabolism , Leukemia/genetics , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , Cells, Cultured , Chromatin Immunoprecipitation , Computational Biology/methods , Hematopoietic Stem Cells/cytology , Leukemia/pathology , Mice , Models, Statistical , Protein Binding , Transcription Factors/genetics
17.
PLoS Comput Biol ; 13(10): e1005781, 2017 Oct.
Article En | MEDLINE | ID: mdl-29036190

Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation, etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However, in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.


Algorithms , Cluster Analysis , Computational Biology/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Gene Expression Profiling , Humans , Survival Analysis
18.
BMC Genomics ; 18(1): 606, 2017 Aug 11.
Article En | MEDLINE | ID: mdl-28800724

BACKGROUND: Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only a few samples available, a model-driven approach was the only option. In the meantime, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. RESULTS: We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes, due to the large number of possible combinations. Most of the available training data comprise samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single-serotype arrays. CONCLUSIONS: With the enhanced training set, the machine learning algorithms outperform the original Bayesian model. However, for serotypes currently lacking sufficient training data, the best-performing implementation was a combination of the results of the Bayesian model and the Gradient Boosting Machine. As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological insights, which we illustrate with an example.
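
The step of creating artificial mixture training data from single-serotype arrays can be sketched as follows. The abstract does not say how the raw data were combined, so the element-wise weighted average used here is purely an assumption, as are the function name `make_mixtures`, the probe-intensity vectors and the toy values.

```python
from itertools import combinations


def make_mixtures(single_arrays, weights=(0.5, 0.5)):
    """Build artificial two-serotype training examples.

    single_arrays: dict mapping a serotype name to a list of probe
    intensities from an array containing only that serotype.
    ASSUMPTION: a mixture is simulated as a weighted element-wise
    average of the two single-serotype intensity vectors.
    Returns a list of (frozenset of serotype names, mixed intensities).
    """
    lengths = {len(values) for values in single_arrays.values()}
    if len(lengths) > 1:
        raise ValueError("all arrays must have the same number of probes")
    w1, w2 = weights
    mixtures = []
    for (s1, x1), (s2, x2) in combinations(sorted(single_arrays.items()), 2):
        mixed = [w1 * a + w2 * b for a, b in zip(x1, x2)]
        mixtures.append((frozenset({s1, s2}), mixed))
    return mixtures


# Toy intensities (hypothetical): one probe per serotype.
toy = {
    "19F": [1.0, 0.0, 0.0],
    "6B": [0.0, 1.0, 0.0],
    "23F": [0.0, 0.0, 1.0],
}
mixtures = make_mixtures(toy)
```

Each returned pair carries a multi-label target (the set of serotypes present), so the artificial examples could be appended to the single-serotype training set before fitting a classifier.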


Machine Learning , Models, Statistical , Serotyping/methods , Streptococcus pneumoniae/genetics , Bayes Theorem , Oligonucleotide Array Sequence Analysis
19.
Nat Commun ; 8: 16058, 2017 Jul 13.
Article En | MEDLINE | ID: mdl-28703137

Linking non-coding genetic variants associated with the risk of diseases or disease-relevant traits to target genes is a crucial step towards realizing the potential of GWAS in the introduction of precision medicine. Here we set out to determine the mechanisms underpinning variant association with platelet quantitative traits using cell type-matched epigenomic data and promoter long-range interactions. We identify potential regulatory functions for 423 of 565 (75%) non-coding variants associated with platelet traits and we demonstrate, through ex vivo and proof-of-principle genome editing validation, that variants in super enhancers play an important role in controlling archetypical platelet functions.


Blood Platelets/physiology , Enhancer Elements, Genetic , Erythroblasts/chemistry , Genetic Variation , Megakaryocytes/chemistry , Chromatin , Humans , Promoter Regions, Genetic
20.
BMC Bioinformatics ; 17(1): 355, 2016 Sep 06.
Article En | MEDLINE | ID: mdl-27600248

BACKGROUND: Rapid technological innovation for the generation of single-cell genomics data presents new challenges and opportunities for bioinformatics analysis. One such area lies in the development of new ways to train gene regulatory networks. The use of single-cell expression profiling techniques allows the profiling of the expression states of hundreds of cells, but these expression states are typically noisier due to the presence of technical artefacts such as drop-outs. While many algorithms exist to infer a gene regulatory network, very few of them are able to harness the extra expression states present in single-cell expression data without being adversely affected by the substantial technical noise present. RESULTS: Here we introduce BTR, an algorithm for training asynchronous Boolean models with single-cell expression data using a novel Boolean state space scoring function. BTR is capable of refining existing Boolean models and reconstructing new Boolean models by improving the match between model prediction and expression data. We demonstrate that the Boolean scoring function performed favourably against the BIC scoring function for Bayesian networks. In addition, we show that BTR outperforms many other network inference algorithms in both bulk and single-cell synthetic expression data. Lastly, we introduce two case studies, in which we use BTR to improve published Boolean models in order to generate potentially new biological insights. CONCLUSIONS: BTR provides a novel way to refine or reconstruct Boolean models using single-cell expression data. Boolean models are particularly useful for network reconstruction using single-cell data because they are more robust to the effect of drop-outs. In addition, because BTR does not assume any relationship in the expression states among cells, it is useful for reconstructing a gene regulatory network with as few assumptions as possible. Given the simplicity of Boolean models and the rapid adoption of single-cell genomics by biologists, BTR has the potential to make an impact across many fields of biomedical research.
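
For readers unfamiliar with asynchronous Boolean models, the update scheme can be sketched as below. This shows only the general asynchronous rule (one gene changes per step), not BTR or its scoring function; the function name `async_successors`, the two-gene network and its rules are hypothetical.

```python
def async_successors(state, rules):
    """Return all states reachable in one asynchronous update step.

    state: dict mapping gene name to a bool (expressed or not).
    rules: dict mapping gene name to a function that takes the current
    state and returns that gene's next value.
    In an asynchronous model only one gene is updated per step, so each
    successor differs from `state` in exactly one gene.
    """
    successors = []
    for gene in sorted(rules):
        new_value = rules[gene](state)
        if new_value != state[gene]:
            next_state = dict(state)
            next_state[gene] = new_value
            successors.append(next_state)
    return successors


# Toy network (hypothetical): B represses A, A activates B.
rules = {
    "A": lambda s: not s["B"],
    "B": lambda s: s["A"],
}
step1 = async_successors({"A": True, "B": False}, rules)
step2 = async_successors({"A": True, "B": True}, rules)
```

Repeatedly expanding successors from an initial state enumerates the model's reachable state space, which is the kind of object a state space scoring function would compare against observed expression states.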


Cells/chemistry , Computational Biology/methods , Algorithms , Animals , Bayes Theorem , Cells/cytology , Cells/metabolism , Gene Expression Profiling , Gene Regulatory Networks , Humans , Models, Genetic , Single-Cell Analysis
...