Search | VHL Regional Portal

Flexible, cluster-based analysis of the electronic medical record of sepsis with composite mixture models.

Mayhew, Michael B; Petersen, Brenden K; Sales, Ana Paula; Greene, John D; Liu, Vincent X; Wasson, Todd S.

J Biomed Inform ; 78: 33-42, 2018 Feb.

Article in English | MEDLINE | ID: mdl-29196114

ABSTRACT

The widespread adoption of electronic medical records (EMRs) in healthcare has provided vast new amounts of data for statistical machine learning researchers in their efforts to model and predict patient health status, potentially enabling novel advances in treatment. In the case of sepsis, a debilitating, dysregulated host response to infection, extracting subtle, uncataloged clinical phenotypes from the EMR with statistical machine learning methods has the potential to impact patient diagnosis and treatment early in the course of their hospitalization. However, there are significant barriers that must be overcome to extract these insights from EMR data. First, EMR datasets consist of both static and dynamic observations of discrete and continuous-valued variables, many of which may be missing, precluding the application of standard multivariate analysis techniques. Second, clinical populations observed via EMRs and relevant to the study and management of conditions like sepsis are often heterogeneous; properly accounting for this heterogeneity is critical. Here, we describe an unsupervised, probabilistic framework called a composite mixture model that can simultaneously accommodate the wide variety of observations frequently observed in EMR datasets, characterize heterogeneous clinical populations, and handle missing observations. We demonstrate the efficacy of our approach on a large-scale sepsis cohort, developing novel techniques built on our model-based clusters to track patient mortality risk over time and identify physiological trends and distinct subgroups of the dataset associated with elevated risk of mortality during hospitalization.

Subject(s)

Electronic Health Records/classification; Electronic Health Records/statistics & numerical data; Models, Statistical; Sepsis/diagnosis; Sepsis/epidemiology; Cluster Analysis; Databases, Factual; Humans; Risk

Learning protein-DNA interaction landscapes by integrating experimental data through computational models.

Zhong, Jianling; Wasson, Todd; Hartemink, Alexander J.

Bioinformatics ; 30(20): 2868-74, 2014 Oct 15.

Article in English | MEDLINE | ID: mdl-24974204

ABSTRACT

MOTIVATION: Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors (TFs), nucleosomes and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only partial information regarding one aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase can be used to reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Currently, few statistical frameworks jointly model these different data sources to reveal an accurate, holistic view of the in vivo protein-DNA interaction landscape. RESULTS: Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation. AVAILABILITY AND IMPLEMENTATION: The C source code for compete and Python source code for MCMC-based inference are available at http://www.cs.duke.edu/~amink. CONTACT: amink@cs.duke.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods; DNA-Binding Proteins/metabolism; DNA/metabolism; Models, Biological; Binding, Competitive; DNA/genetics; Gene Expression Regulation; Nucleosomes/genetics; Nucleosomes/metabolism; Protein Binding; Thermodynamics; Transcription Factors/metabolism; Transcription, Genetic

An ensemble model of competitive multi-factor binding of the genome.

Wasson, Todd; Hartemink, Alexander J.

Genome Res ; 19(11): 2101-12, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19720867

ABSTRACT

Hundreds of different factors adorn the eukaryotic genome, binding to it in large number. These DNA binding factors (DBFs) include nucleosomes, transcription factors (TFs), and other proteins and protein complexes, such as the origin recognition complex (ORC). DBFs compete with one another for binding along the genome, yet many current models of genome binding do not consider different types of DBFs together simultaneously. Additionally, binding is a stochastic process that results in a continuum of binding probabilities at any position along the genome, but many current models tend to consider positions as being either binding sites or not. Here, we present a model that allows a multitude of DBFs, each at different concentrations, to compete with one another for binding sites along the genome. The result is an "occupancy profile," a probabilistic description of the DNA occupancy of each factor at each position. We implement our model efficiently as the software package COMPETE. We demonstrate genome-wide and at specific loci how modeling nucleosome binding alters TF binding, and vice versa, and illustrate how factor concentration influences binding occupancy. Binding cooperativity between nearby TFs arises implicitly via mutual competition with nucleosomes. Our method applies not only to TFs, but also recapitulates known occupancy profiles of a well-studied replication origin with and without ORC binding. Importantly, the sequence preferences our model takes as input are derived from in vitro experiments. This ensures that the calculated occupancy profiles are the result of the forces of competition represented explicitly in our model and the inherent sequence affinities of the constituent DBFs.

Subject(s)

Models, Biological; Nucleosomes/metabolism; Transcription Factors/metabolism; Binding Sites/genetics; Binding, Competitive; Genome/genetics; Genome-Wide Association Study; Protein Binding; Software
