ABSTRACT
BACKGROUND: Many algorithms have been developed to infer the topology of gene regulatory networks from gene expression data. These methods typically produce a ranking of links between genes with associated confidence scores, after which a threshold is chosen to produce the inferred topology. However, the structural properties of the predicted network do not resemble those typical of a gene regulatory network, as most algorithms only take into account connections found in the data and do not include known graph properties in their inference process. This lowers the prediction accuracy of these methods, limiting their usability in practice. RESULTS: We propose Netter, a post-processing algorithm that is applicable to any confidence ranking of regulatory interactions obtained from a network inference method. It can use, inter alia, graphlets and several graph-invariant properties to re-rank the links into a more accurate prediction. To demonstrate the potential of our approach, we re-rank predictions of six different state-of-the-art algorithms using three simple network properties as optimization criteria and show that Netter can improve the predictions made on artificially generated data as well as on the DREAM4 and DREAM5 benchmarks. Additionally, the DREAM5 E. coli community prediction inferred from real expression data is further improved. Furthermore, Netter compares favorably to other post-processing algorithms and is not restricted to correlation-like predictions. Lastly, we demonstrate that the performance increase is robust for a wide range of parameter settings. Netter is available at http://bioinformatics.intec.ugent.be. CONCLUSIONS: Network inference from high-throughput data is a long-standing challenge. In this work, we present Netter, which can further refine network predictions based on a set of user-defined graph properties. Netter is a flexible system which can be applied in unison with any method producing a ranking from omics data. It can be tailored to specific prior knowledge by expert users but can also be applied in general use cases. In conclusion, we believe that Netter is an interesting second step in the network inference process that can further increase the quality of the prediction.
Subjects
Algorithms, Computational Biology/methods, Escherichia coli/genetics, Gene Regulatory Networks, Benchmarking, Gene Expression Regulation, Humans
ABSTRACT
Several analytical techniques, namely spectroscopic techniques such as Near-Infrared (NIR) and Mid-Infrared (MIR) spectroscopy, Hyperspectral Imaging (HSI), Gas Chromatography coupled to Mass Spectrometry (GC-MS) and Proton-Transfer-Reaction Time-of-Flight Mass Spectrometry (PTR-TOF-MS), each combined with chemometrics, are examined for their potential to answer different food authenticity questions, using oregano as a case study. In total, 102 oregano samples from one harvest season were analyzed for origin and variety assessment, 159 samples for adulteration assessment and 72 samples for batch-to-batch control. The Gaussian Process Latent Variable Model (GP-LVM) was applied as a technique to obtain a reduced two-dimensional space. A Random Forest regression algorithm was used as the regression model for the adulteration assessment. Prediction rates of more than 89% were achieved for origin assessment and of more than 78% for variety assessment. Batch-to-batch control was performed successfully with NIR and PTR-TOF-MS. Adulteration was detected successfully at levels of 10% and above with HSI, NIR and PTR-TOF-MS.
Subjects
Origanum, Gas Chromatography-Mass Spectrometry, Food, Algorithms, Chemometrics
ABSTRACT
INTRODUCTION: Blood cultures are often performed in the intensive care unit (ICU) to detect bloodstream infections and identify the pathogen type, which further guides treatment. Early detection is essential, as a bloodstream infection can give rise to sepsis, a severe immune response associated with an increased risk of organ failure and death. PROBLEM STATEMENT: The early clinical detection of a bloodstream infection is challenging, but rapid targeted treatment, in the first place with antimicrobials, substantially increases survival chances. As blood cultures require time to incubate, early clinical detection using physiological signals combined with indicative lab values is pivotal. OBJECTIVE: In this work, a novel method is constructed and explored for predicting the outcome of a blood culture test. The approach is based on a temporal computational model which uses nine clinical parameters measured over time. METHODOLOGY: We use a bidirectional long short-term memory neural network, a type of recurrent neural network well suited for tasks where the time lag between a predictive event and outcome is unknown. Evaluation is performed using a novel high-quality database consisting of 2177 ICU admissions at the Ghent University Hospital located in Belgium. RESULTS: The network achieves, on average, an area under the receiver operating characteristic curve of 0.99 and an area under the precision-recall curve of 0.82. In addition, our results show that predicting several hours in advance is possible with only a small decrease in predictive power. In this setting, it outperforms traditional non-temporal machine learning models. CONCLUSION: Our proposed computational model accurately predicts the outcome of blood culture tests using nine clinical parameters. Moreover, it can be used in the ICU as an early warning system to detect patients at risk of bloodstream infection.
Subjects
Blood Culture, Intensive Care Units/organization & administration, Computer Neural Networks, Electronic Health Records, Humans, Short-Term Memory
ABSTRACT
MOTIVATION: Graphlets are small network patterns that can be counted in order to characterise the structure of a network (topology). As part of a topology optimisation process, one could use graphlet counts to iteratively modify a network and keep track of the graphlet counts, in order to achieve certain topological properties. Up until now, however, graphlets were not suited as a metric for performing topology optimisation; when millions of minor changes are made to the network structure it becomes computationally intractable to recalculate all the graphlet counts for each of the edge modifications. RESULTS: IncGraph is a method for calculating the differences in graphlet counts with respect to the network in its previous state, which is much more efficient than calculating the graphlet occurrences from scratch at every edge modification made. In comparison to static counting approaches, our findings show IncGraph reduces the execution time by several orders of magnitude. The usefulness of this approach was demonstrated by developing a graphlet-based metric to optimise gene regulatory networks. IncGraph is able to quickly quantify the topological impact of small changes to a network, which opens novel research opportunities to study changes in topologies in evolving or online networks, or develop graphlet-based criteria for topology optimisation. AVAILABILITY: IncGraph is freely available as an open-source R package on CRAN (incgraph). The development version is also available on GitHub (rcannood/incgraph).
Subjects
Software, Algorithms, Gene Regulatory Networks, Biological Models
ABSTRACT
Sleep apnea is one of the most common sleep disorders. It is characterized by the cessation of breathing during sleep due to airway blockages (obstructive sleep apnea) or disturbances in the signals from the brain (central sleep apnea). The gold standard for diagnosing sleep apnea is performing an overnight polysomnography recording which contains, among others, a wide array of respiratory signals. Respiration information can also be extracted from other physiological signals such as an electrocardiogram or from a bio-impedance measurement on the chest. Studies have shown that algorithms can be developed for automated sleep apnea detection using one of these many respiratory signals. In this work, the predictive power of these different respiratory signals is analyzed and compared. The results provide useful insights into the comparative predictive power of the different respiratory signals in a realistic setting for automated sleep apnea detection and provide a basis for the development of less obtrusive measurement techniques.
Subjects
Polysomnography, Sleep Apnea Syndromes/diagnosis, Adult, Aged, Algorithms, Electrocardiography, Female, Humans, Male, Middle Aged, Respiration, Central Sleep Apnea/diagnosis, Obstructive Sleep Apnea/diagnosis
ABSTRACT
Bone age is an essential measure of skeletal maturity in children with growth disorders. It is typically assessed by a trained physician using radiographs of the hand and a reference model. However, the reference models have been reported to leave room for interpretation, leading to large inter-observer and intra-observer variation. In this work, we explore a novel method for automated bone age assessment to assist physicians with their estimation. It combines deep learning with Gaussian process regression. Using this combination, the sensitivity of the deep learning model to rotations and flips of the input images can be exploited to increase overall predictive performance compared to using the deep learning network alone. We validate our approach retrospectively on a set of 12611 radiographs of patients between 0 and 19 years of age.
Subjects
Age Determination by Skeleton, Deep Learning, Hand Bones/diagnostic imaging, Computer-Assisted Image Interpretation, Adolescent, Child, Preschool Child, Female, Humans, Infant, Newborn Infant, Normal Distribution, Observer Variation, Radiography, Retrospective Studies, Young Adult
ABSTRACT
Predicting the bed occupancy of an intensive care unit (ICU) is a daunting task. The uncertainty associated with the prognosis of critically ill patients and the random arrival of new patients can lead to capacity problems and the need for reactive measures. In this paper, we work towards a predictive model based on Random Survival Forests which can assist physicians in estimating the bed occupancy. As input data, we make use of the Sequential Organ Failure Assessment (SOFA) score collected and calculated from 4098 patients at two ICUs of Ghent University Hospital over a period of four years. We compare the performance of our system with a baseline and with a standard Random Forest regression approach. Our results indicate that Random Survival Forests can effectively be used to assist in the occupancy prediction problem. Furthermore, we show that a group-based approach, such as Random Survival Forests, performs better than a setting in which the length of stay of each patient is assessed individually.
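One possible reading of the group-based idea is sketched below. It assumes a survival model has already produced, for each current patient, the probability of still occupying a bed h days ahead; the expected occupancy is then the sum of those probabilities. The curves are made-up numbers for illustration, new arrivals are ignored, and the function name is hypothetical; this is not the authors' implementation.

```python
def expected_occupancy(stay_curves, horizon):
    """Sum per-patient probabilities of still being in the ICU after `horizon` days.

    stay_curves: list of lists, where curve[h] is the (hypothetical) predicted
    probability that the patient still occupies a bed h days from now.
    """
    return sum(curve[horizon] for curve in stay_curves)


# Made-up curves for three current patients, covering 0 to 3 days ahead.
curves = [
    [1.0, 0.9, 0.7, 0.5],
    [1.0, 0.5, 0.2, 0.1],
    [1.0, 0.8, 0.8, 0.6],
]
forecast = [expected_occupancy(curves, h) for h in range(4)]
```

Summing probabilities over the group avoids committing to a single predicted discharge day per patient, so errors on individual patients can partly cancel out in the total.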
Subjects
Bed Occupancy, Critical Care/organization & administration, Critical Illness/therapy, Intensive Care Units, Algorithms, Computer Simulation, Critical Illness/mortality, Data Collection, Factual Databases, Hospitals, Humans, Length of Stay, Machine Learning, Medical Informatics, Multiple Organ Failure/mortality, Multiple Organ Failure/therapy, Organ Dysfunction Scores, Patient Admission, Regression Analysis, Survival Analysis
ABSTRACT
INTRODUCTION: The length of stay of critically ill patients in the intensive care unit (ICU) is an indication of patient ICU resource usage and varies considerably. Planning of postoperative ICU admissions is important as ICUs often have no unoccupied beds available. PROBLEM STATEMENT: Estimation of the ICU bed availability for the coming days is entirely based on clinical judgement by intensivists and therefore too inaccurate. For this reason, predictive models have much potential for improving the planning of ICU patient admissions. OBJECTIVE: Our goal is to develop and optimize models for patient survival and ICU length of stay (LOS) based on monitored ICU patient data. Furthermore, these models are compared on their use of Sequential Organ Failure Assessment (SOFA) scores as well as the underlying raw data as input features. METHODOLOGY: Different machine learning techniques are trained, using a dataset of 14,480 patients, both on SOFA scores and on their underlying raw data values from the first five days after admission, in order to predict (i) the patient LOS and (ii) the patient mortality. Furthermore, to help physicians in assessing the prediction credibility, a probabilistic model is tailored to the output of our best-performing model, assigning a belief to each patient status prediction. A two-by-two grid is built using the classification outputs of the mortality and prolonged stay predictors to improve the patient LOS regression models. RESULTS: For predicting patient mortality and a prolonged stay, the best-performing model is a support vector machine (SVM) with G_{A,D} = 65.9% (area under the curve (AUC) of 0.77) and G_{S,L} = 73.2% (AUC of 0.82). In terms of LOS regression, the best-performing model is support vector regression, achieving a mean absolute error of 1.79 days and a median absolute error of 1.22 days for those patients surviving a nonprolonged stay. CONCLUSION: Using a classification grid based on the predicted patient mortality and prolonged stay allows more accurate modeling of the patient LOS. The detailed models can support the decisions made by physicians in an ICU setting.
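The two error measures reported for the LOS regression can be computed as in the minimal sketch below. The lengths of stay are made-up values chosen for illustration and are not data from the study; the function name is hypothetical.

```python
from statistics import mean, median


def absolute_errors(actual, predicted):
    """Per-patient absolute difference between actual and predicted LOS (days)."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


# Made-up lengths of stay in days; the last patient has an unusually long stay.
actual_los = [2.0, 5.0, 3.0, 10.0]
predicted_los = [3.0, 4.5, 3.0, 6.0]

errors = absolute_errors(actual_los, predicted_los)
mean_abs_error = mean(errors)      # sensitive to the single large miss
median_abs_error = median(errors)  # robust to it
```

In this toy example one badly predicted long stay raises the mean absolute error well above the median, the same ordering as the two figures quoted in the abstract.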
Subjects
Critical Illness/mortality, Length of Stay, Organ Dysfunction Scores, Support Vector Machine, Belgium, Datasets as Topic, Female, Humans, Intensive Care Units, Male, Survival Analysis
ABSTRACT
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems, one for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that it outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
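Rankwise averaging can be illustrated with a minimal sketch. It assumes every method scores the same set of candidate links and that a higher score means more confidence. The gene names, scores and function names are hypothetical, ties are broken by input order, and this is not the published NIMEFI implementation.

```python
def ranks(scores):
    """Map each link to its rank under one method (1 = most confident)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {link: position for position, link in enumerate(ordered, start=1)}


def rank_average(score_tables):
    """Average the per-method ranks of every link, then sort by mean rank."""
    per_method = [ranks(table) for table in score_tables]
    links = list(per_method[0])
    mean_rank = {
        link: sum(r[link] for r in per_method) / len(per_method) for link in links
    }
    return sorted(links, key=mean_rank.get)


# Hypothetical confidence scores from three methods for three candidate links.
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.4, ("g2", "g3"): 0.1}
method_b = {("g1", "g2"): 0.3, ("g1", "g3"): 0.8, ("g2", "g3"): 0.2}
method_c = {("g1", "g2"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): 0.5}

consensus = rank_average([method_a, method_b, method_c])
```

Working on ranks rather than raw scores sidesteps the fact that different feature importance methods produce scores on incomparable scales.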