1.
PLoS One ; 18(11): e0290692, 2023.
Article in English | MEDLINE | ID: mdl-37972008

ABSTRACT

Disparities in healthcare access and utilization associated with demographic and socioeconomic status hinder advancement of health equity. Thus, we designed a novel equity-focused approach to quantify variations of healthcare access/utilization from the expectation in national target populations. We additionally applied survey-weighted logistic regression models to identify factors associated with usage of a particular type of health care. To facilitate generation of analysis datasets, we built a National Health and Nutrition Examination Survey (NHANES) knowledge graph to help automate source-level dynamic analyses across different survey years and subjects' characteristics. We performed a cross-sectional subgroup disparity analysis of 2013-2018 NHANES on U.S. adults for receipt of diabetes treatments and vaccines against Hepatitis A (HAV), Hepatitis B (HBV), and Human Papillomavirus (HPV). Results show that in populations with hemoglobin A1c level ≥6%, patients with non-private insurance were less likely to receive newer and more beneficial antidiabetic medications; being Asian further exacerbated these disparities. For widely used drugs such as insulin, Asians experienced insignificant disparities in odds of prescription compared to White patients but received highly inadequate treatments with regard to their distribution in the U.S. diabetic population. Vaccination rates were associated with some demographic/socioeconomic factors but not others, and to different degrees for different diseases. For instance, while equity scores increase with rising education levels for HBV, they decrease with rising wealth levels for HPV. Among women vaccinated against HPV, minorities and poor communities usually received Cervarix while non-Hispanic White and higher-income groups received the more comprehensive Gardasil vaccine. Our study identified and quantified the impact of determinants of healthcare utilization for antidiabetic medications and vaccinations. Our new methods for semantics-aware disparity analysis of NHANES data could be readily generalized to other public health goals to support more rapid identification of disparities and development of policies, thus advancing health equity.


Subject(s)
Hepatitis A , Papillomavirus Infections , Adult , Humans , Female , United States , Nutrition Surveys , Cross-Sectional Studies , Papillomavirus Infections/prevention & control , Socioeconomic Factors , Health Services Accessibility , Healthcare Disparities , Hypoglycemic Agents , Demography
2.
J Biomed Semantics ; 14(1): 8, 2023 07 18.
Article in English | MEDLINE | ID: mdl-37464259

ABSTRACT

BACKGROUND: Clinical decision support systems have been widely deployed to guide healthcare decisions on patient diagnosis, treatment choices, and patient management through evidence-based recommendations. These recommendations are typically derived from clinical practice guidelines created by clinical specialties or healthcare organizations. Although there have been many different technical approaches to encoding guideline recommendations into decision support systems, much of the previous work has not focused on enabling system-generated recommendations through the formalization of changes in a guideline, the provenance of a recommendation, and applicability of the evidence. Prior work indicates that healthcare providers may not find that guideline-derived recommendations always meet their needs for reasons such as lack of relevance, transparency, time pressure, and applicability to their clinical practice. RESULTS: We introduce several semantic techniques that model diseases based on clinical practice guidelines, provenance of the guidelines, and the study cohorts they are based on to enhance the capabilities of clinical decision support systems. We have explored ways to enable clinical decision support systems with semantic technologies that can represent and link to details in related items from the scientific literature, quickly adapt to changing information from the guidelines, identify gaps, and support personalized explanations. Previous semantics-driven clinical decision systems have limited support in all these aspects, and we present the ontologies and Semantic Web-based software tools in three distinct areas that are unified using a standard set of ontologies and a custom-built knowledge graph framework: (i) guideline modeling to characterize diseases, (ii) guideline provenance to attach evidence to treatment decisions from authoritative sources, and (iii) study cohort modeling to identify relevant research publications for complicated patients. CONCLUSIONS: We have enhanced existing, evidence-based knowledge by developing ontologies and software that enable clinicians to conveniently access updates to and provenance of guidelines, as well as gather additional information from research studies applicable to their patients' unique circumstances. Our software solutions leverage many well-used existing biomedical ontologies and build upon decades of knowledge representation and reasoning work, leading to explainable results.


Subject(s)
Biological Ontologies , Decision Support Systems, Clinical , Humans , Software , Knowledge Bases , Publications
3.
Front Nutr ; 10: 1196520, 2023.
Article in English | MEDLINE | ID: mdl-37305078

ABSTRACT

Introduction and aims: Dietary Rational Gene Targeting (DRGT) is a therapeutic dietary strategy that uses healthy dietary agents to modulate the expression of disease-causing genes back toward normal. Here we use the DRGT approach to (1) identify human studies assessing gene expression after ingestion of healthy dietary agents with an emphasis on whole foods, and (2) use these data to construct an online dietary guide app prototype toward eventually aiding patients, healthcare providers, the community, and researchers in treating and preventing numerous health conditions. Methods: We used the keywords "human", "gene expression" and separately, 51 different dietary agents with reported health benefits to search GEO, PubMed, Google Scholar, Clinical trials, Cochrane library, and EMBL-EBI databases for related studies. Studies meeting qualifying criteria were assessed for gene modulations. The R-Shiny platform was utilized to construct an interactive app called "Eat4Genes". Results: Fifty-one human ingestion studies (37 whole food related) and 96 key risk genes were identified. Human gene expression studies were found for 18 of 41 searched whole foods or extracts. App construction included the option to select either specific conditions/diseases or genes followed by food guide suggestions, key target genes, data sources and links, dietary suggestion rankings, bar chart or bubble chart visualization, optional full report, and nutrient categories. We also present user scenarios from physician and researcher perspectives. Conclusion: An interactive dietary guide app prototype has been constructed as a first step towards eventually translating our DRGT strategy into an innovative, low-cost, healthy, and readily translatable public resource to improve health.

4.
IEEE J Biomed Health Inform ; 27(2): 1084-1095, 2023 02.
Article in English | MEDLINE | ID: mdl-36355718

ABSTRACT

Randomized clinical trial (RCT) studies are the gold standard for scientific evidence on treatment benefits to patients. RCT outcomes may not be generalizable to clinical practice if the trial population is not representative of the patients for which the treatment is intended. Specifically, enrollment plans may not adequately include groups of patients with protected attributes, such as gender, race, or ethnicity. Inequities in RCTs are a major concern for funding agencies such as the National Institutes of Health (NIH) and for policy makers. We address this challenge by proposing a goal-programming approach, explicitly integrating measurable enrollment goals, to design equitable enrollment plans for RCTs. We evaluate our model in both single and multisite settings using the enrollment criteria and study population from the Systolic Blood Pressure Intervention Trial (SPRINT) study. Our model can successfully generate equitable enrollment plans that satisfy multiple goals such as sample representativeness and minimum total financial cost. Our model can detect deviations from a target plan during the enrollment process and update the plan to reduce deviations in the remaining process. Finally, through appropriate site selection in the planning stage, the model can demonstrate the possibility of enrolling a nationally representative study population if geographic constraints exist in multisite recruitment (e.g., clinical centers in a particular region). Our model can be used to prospectively produce and retrospectively evaluate how equitable enrollment plans are based on subjects' protected attributes, and it allows researchers to provide justifications on validity of scientific analysis and evaluation of subgroup disparities.


Subject(s)
Goals , Research Design , Humans
5.
AMIA Annu Symp Proc ; 2023: 530-539, 2023.
Article in English | MEDLINE | ID: mdl-38222411

ABSTRACT

Randomized Clinical Trials (RCTs) measure an intervention's efficacy, but they may not be generalizable to a desired target population if the RCT is not equitable. Thus, representativeness of RCTs has become a national priority. Synthetic Controls (SCs) that incorporate observational data into RCTs have shown great potential to produce more efficient studies, but their equity is rarely considered. Here, we examine how to improve treatment effect estimation and equity of a trial by augmenting "on-trial" concurrent controls with SCs to form a Hybrid Control Arm (HCA). We introduce FRESCA, a framework to evaluate HCA construction methods using RCT simulations. FRESCA shows that applying propensity and equity adjustment when constructing the HCA leads to accurate population treatment effect estimates while meeting equity goals with potentially fewer "on-trial" patients. This work represents the first investigation of equity in HCA design that provides definitions, metrics, compelling questions, and resources for future work.

6.
G3 (Bethesda) ; 12(9)2022 08 25.
Article in English | MEDLINE | ID: mdl-35876788

ABSTRACT

Circadian rhythms broadly regulate physiological functions by tuning oscillations in the levels of mRNAs and proteins to the 24-h day/night cycle. Globally assessing which mRNAs and proteins are timed by the clock necessitates accurate recognition of oscillations in RNA and protein data, particularly in large omics data sets. Tools that employ fixed-amplitude models have previously been used to positive effect. However, the recognition of amplitude change in circadian oscillations required a new generation of analytical software to enhance the identification of these oscillations. To address this gap, we created the Pipeline for Amplitude Integration of Circadian Exploration suite. Here, we demonstrate the Pipeline for Amplitude Integration of Circadian Exploration suite's increased utility to detect circadian trends through the joint modeling of the Mus musculus macrophage transcriptome and proteome. Our enhanced detection confirmed extensive circadian posttranscriptional regulation in macrophages but highlighted that some of the reported discrepancy between mRNA and protein oscillations was due to noise in data. We further applied the Pipeline for Amplitude Integration of Circadian Exploration suite to investigate the circadian timing of noncoding RNAs, documenting extensive circadian timing of long noncoding RNAs and small nuclear RNAs, which control the recognition of mRNA in the spliceosome complex. By tracking oscillating spliceosome complex proteins using the PAICE suite, we noted that the clock broadly regulates the spliceosome, particularly the major spliceosome complex. As most of the above-noted rhythms had damped amplitude changes in their oscillations, this work highlights the importance of the PAICE suite in the thorough enumeration of oscillations in omics-scale datasets.


Subject(s)
Circadian Clocks , Spliceosomes , Animals , Circadian Clocks/genetics , Circadian Rhythm/genetics , Gene Expression Regulation , Macrophages/metabolism , Mice , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Untranslated , Spliceosomes/genetics , Spliceosomes/metabolism
7.
Entropy (Basel) ; 23(9)2021 Sep 04.
Article in English | MEDLINE | ID: mdl-34573790

ABSTRACT

Access to healthcare data such as electronic health records (EHR) is often restricted by laws established to protect patient privacy. These restrictions hinder the reproducibility of existing results based on private healthcare data and also limit new research. Synthetically generated healthcare data solve this problem by preserving privacy and enabling researchers and policymakers to drive decisions and methods based on realistic data. Healthcare data can include information about multiple inpatient and outpatient visits, making it a time-series dataset that is often influenced by protected attributes such as age, gender, and race. The COVID-19 pandemic has exacerbated health inequities, with certain subgroups experiencing poorer outcomes and less access to healthcare. To combat these inequities, synthetic data must "fairly" represent diverse minority subgroups such that the conclusions drawn on synthetic data are correct and the results can be generalized to real data. In this article, we develop two fairness metrics for synthetic data and examine all subgroups defined by protected attributes to assess the bias in three published synthetic research datasets. These covariate-level disparity metrics revealed that synthetic data may not be representative at the univariate and multivariate subgroup levels and thus, fairness should be addressed when developing data generation methods. We discuss the need for measuring fairness in synthetic healthcare data to enable the development of robust machine learning models to create more equitable synthetic healthcare datasets.
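
The univariate versus multivariate subgroup comparison mentioned in this abstract can be illustrated with a small sketch. This is not the paper's fairness metric, which the abstract does not define; it simply compares each subgroup's share in a real dataset with its share in a synthetic one. The function names `subgroup_shares` and `share_gaps`, the attribute names, and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def subgroup_shares(records, attrs):
    """Share of records falling in each subgroup defined by the given attributes."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    n = len(records)
    return {values: c / n for values, c in counts.items()}


def share_gaps(real, synthetic, protected):
    """Synthetic share minus real share for every subgroup present in the real data.

    Subgroups are formed from every non-empty combination of protected attributes,
    so both univariate and multivariate subgroups are covered.
    """
    gaps = {}
    for size in range(1, len(protected) + 1):
        for attrs in combinations(protected, size):
            real_s = subgroup_shares(real, attrs)
            syn_s = subgroup_shares(synthetic, attrs)
            for values, share in real_s.items():
                gaps[(attrs, values)] = syn_s.get(values, 0.0) - share
    return gaps


# Toy data (hypothetical): gender looks balanced, but one race group is missing.
real_rows = [
    {"gender": "F", "race": "X"}, {"gender": "F", "race": "Y"},
    {"gender": "M", "race": "X"}, {"gender": "M", "race": "Y"},
]
synthetic_rows = [
    {"gender": "F", "race": "X"}, {"gender": "F", "race": "X"},
    {"gender": "M", "race": "X"}, {"gender": "M", "race": "X"},
]
for key, gap in sorted(share_gaps(real_rows, synthetic_rows, ["gender", "race"]).items()):
    print(key, round(gap, 3))
```

In this toy case the gender shares match exactly while the subgroups involving race "Y" are absent from the synthetic data, which is the kind of gap a subgroup-level check is meant to surface.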

8.
JAMIA Open ; 4(3): ooab077, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34568771

ABSTRACT

OBJECTIVE: We help identify subpopulations underrepresented in randomized clinical trial (RCT) cohorts with respect to national, community-based, or health system target populations by formulating population representativeness of RCTs as a machine learning (ML) fairness problem, deriving new representation metrics, and deploying them in easy-to-understand interactive visualization tools. MATERIALS AND METHODS: We represent RCT cohort enrollment as random binary classification fairness problems, and then show how ML fairness metrics based on enrollment fraction can be efficiently calculated using easily computed rates of subpopulations in RCT cohorts and target populations. We propose standardized versions of these metrics and deploy them in an interactive tool to analyze 3 RCTs with respect to type 2 diabetes and hypertension target populations in the National Health and Nutrition Examination Survey. RESULTS: We demonstrate how the proposed metrics and associated statistics enable users to rapidly examine representativeness of all subpopulations in the RCT defined by a set of categorical traits (eg, gender, race, ethnicity, smoking status, and blood pressure) with respect to target populations. DISCUSSION: The normalized metrics provide an intuitive standardized scale for evaluating representation across subgroups, which may have vastly different enrollment fractions and rates in RCT study cohorts. The metrics are beneficial complements to other approaches (eg, enrollment fractions) used to assess generalizability and health equity of RCTs. CONCLUSION: By quantifying the gaps between RCT and target populations, the proposed methods can support generalizability evaluation of existing RCT cohorts. The interactive visualization tool can be readily applied to identify underrepresented subgroups with respect to any desired source or target populations.
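
The comparison of subgroup rates in an RCT cohort against a target population, as described in this abstract, can be sketched as below. This is an illustrative assumption and not necessarily one of the paper's metrics: it uses a log odds ratio of the two rates as one plausible standardized scale. The function name `log_disparity` and the example rates are hypothetical.

```python
import math


def log_disparity(rate_trial: float, rate_target: float) -> float:
    """Log odds ratio of a subgroup's rate in the trial vs. the target population.

    0 means the subgroup is equally represented, a negative value means it is
    underrepresented in the trial, and a positive value means overrepresented.
    """
    if not (0.0 < rate_trial < 1.0 and 0.0 < rate_target < 1.0):
        raise ValueError("rates must be strictly between 0 and 1")
    odds_trial = rate_trial / (1.0 - rate_trial)
    odds_target = rate_target / (1.0 - rate_target)
    return math.log(odds_trial / odds_target)


# Hypothetical example: a subgroup is 10% of the trial but 20% of the target.
print(round(log_disparity(0.10, 0.20), 3))
```

A log scale is symmetric around zero, so equal degrees of under- and overrepresentation have the same magnitude, which makes subgroups with very different base rates easier to compare.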

9.
AMIA Jt Summits Transl Sci Proc ; 2021: 555-564, 2021.
Article in English | MEDLINE | ID: mdl-34457171

ABSTRACT

In this exploratory study, we scrutinize a database of over one million tweets collected from March to July 2020 to illustrate public attitudes towards mask usage during the COVID-19 pandemic. We employ natural language processing, clustering and sentiment analysis techniques to organize tweets relating to mask-wearing into high-level themes, then relay narratives for each theme using automatic text summarization. In recent months, a body of literature has highlighted the robustness of trends in online activity as proxies for the sociological impact of COVID-19. We find that topic clustering based on mask-related Twitter data offers revealing insights into societal perceptions of COVID-19 and techniques for its prevention. We observe that the volume and polarity of mask-related tweets have greatly increased. Importantly, the analysis pipeline presented may be leveraged by the health community for qualitative assessment of public response to health intervention techniques in real time.


Subject(s)
COVID-19 , Social Media , Humans , Masks , Natural Language Processing , Pandemics , SARS-CoV-2
10.
Bioinformatics ; 37(6): 767-774, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33051654

ABSTRACT

MOTIVATION: Circadian rhythms are approximately 24-h endogenous cycles that control many biological functions. To identify these rhythms, biological samples are taken over circadian time and analyzed using a single omics type, such as transcriptomics or proteomics. By comparing data from these single omics approaches, it has been shown that transcriptional rhythms are not necessarily conserved at the protein level, implying extensive circadian post-transcriptional regulation. However, as proteomics methods are known to be noisier than transcriptomic methods, this suggests that previously identified arrhythmic proteins with rhythmic transcripts could have been missed due to noise and may not be due to post-transcriptional regulation. RESULTS: To determine if one can use information from less-noisy transcriptomic data to inform rhythms in more-noisy proteomic data, and thus more accurately identify rhythms in the proteome, we have created the Multi-Omics Selection with Amplitude Independent Criteria (MOSAIC) application. MOSAIC combines model selection and joint modeling of multiple omics types to recover significant circadian and non-circadian trends. Using both synthetic data and proteomic data from Neurospora crassa, we showed that MOSAIC accurately recovers circadian rhythms at higher rates in not only the proteome but the transcriptome as well, outperforming existing methods for rhythm identification. In addition, by quantifying non-circadian trends in addition to circadian trends in data, our methodology allowed for the recognition of the diversity of circadian regulation as compared to non-circadian regulation. AVAILABILITY AND IMPLEMENTATION: MOSAIC's full interface is available at https://github.com/delosh653/MOSAIC. An R package for this functionality, mosaic.find, can be downloaded at https://CRAN.R-project.org/package=mosaic.find. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neurospora crassa , Proteomics , Circadian Rhythm/genetics , Neurospora crassa/genetics , Proteome , Transcriptome
11.
Methods ; 179: 101-110, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32446958

ABSTRACT

We propose a machine learning driven approach to derive insights from observational healthcare data to improve public health outcomes. Our goal is to simultaneously identify patient subpopulations with differing health risks and to find those risk factors within each subpopulation. We develop two supervised mixture of experts models: a Supervised Gaussian Mixture model (SGMM) for general features and a Supervised Bernoulli Mixture model (SBMM) tailored to binary features. We demonstrate the two approaches on an analysis of high cost drivers of Medicaid expenditures for inpatient stays. We focus on the three diagnostic categories that accounted for the highest percentage of inpatient expenditures in New York State (NYS) in 2016. When compared with state-of-the-art learning methods (random forests, boosting, neural networks), our approaches provide comparable prediction performance while also extracting insightful subpopulation structure and risk factors. For problems with binary features the proposed SBMM provides as good or better performance than alternative methods while offering insightful explanations. Our results indicate the promise of such approaches for extracting population health insights from electronic health care records.


Subject(s)
Information Storage and Retrieval/methods , Medical Informatics/methods , Population Health/statistics & numerical data , Supervised Machine Learning , Electronic Health Records/statistics & numerical data , Humans , Normal Distribution
12.
IEEE J Biomed Health Inform ; 24(3): 916-925, 2020 03.
Article in English | MEDLINE | ID: mdl-31107669

ABSTRACT

We consider the problem in precision health of grouping people into subpopulations based on their degree of vulnerability to a risk factor. These subpopulations cannot be discovered with traditional clustering techniques because their quality is evaluated with a supervised metric: The ease of modeling a response variable for observations within them. Instead, we apply the more appropriate supervised cadre model (SCM). We extend the SCM formalism so that it may be applied to multivariate regression and binary classification problems and develop a way to use conditional entropy to assess the confidence in the process by which a subject is assigned their cadre. Using the SCM, we generalize the environment-wide association study (EWAS) to be able to model heterogeneity in population risk. In our EWAS, we consider more than 200 environmental exposure factors and find their association with diastolic blood pressure, systolic blood pressure, and hypertension. This requires adapting the SCM to be applicable to data generated by a complex survey design. After correcting for false positives, we found 25 exposure variables that had a significant association with at least one of our response variables. Eight of these were significant for a discovered subpopulation but not for the overall population. Some of these associations have been identified by previous researchers, whereas others appear to be novel. We examine discovered subpopulations in detail, finding that they are interpretable and suggestive of further research questions.


Subject(s)
Computational Biology/methods , Hypertension/epidemiology , Models, Statistical , Supervised Machine Learning , Big Data , Environment , Humans , Knowledge Discovery , Nutrition Surveys , Risk Factors
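
The use of entropy to gauge confidence in a subject's cadre assignment, mentioned in the abstract for this record, can be sketched as follows. This is a generic illustration under stated assumptions and not the paper's formulation: it assumes each subject has a vector of cadre-membership probabilities, and the names `assignment_entropy` and `mean_assignment_entropy` are hypothetical.

```python
import math


def assignment_entropy(probs):
    """Shannon entropy, in bits, of one subject's cadre-membership probabilities.

    Low entropy means the subject is assigned to a cadre with high confidence.
    """
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p*log(p) is 0).
    return sum(-p * math.log2(p) for p in probs if p > 0)


def mean_assignment_entropy(all_probs):
    """Average entropy over subjects: an empirical conditional entropy of the
    cadre given the subject's features."""
    return sum(assignment_entropy(p) for p in all_probs) / len(all_probs)


# Hypothetical membership probabilities for three subjects and two cadres.
subjects = [[0.5, 0.5], [1.0, 0.0], [0.9, 0.1]]
for probs in subjects:
    print(probs, round(assignment_entropy(probs), 3))
print("mean:", round(mean_assignment_entropy(subjects), 3))
```

An even split between two cadres gives the maximum of 1 bit, while a subject placed entirely in one cadre gives 0 bits.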
13.
Bioinformatics ; 36(3): 773-781, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31384918

ABSTRACT

MOTIVATION: Time courses utilizing genome scale data are a common approach to identifying the biological pathways that are controlled by the circadian clock, an important regulator of organismal fitness. However, the methods used to detect circadian oscillations in these datasets are not able to accommodate changes in the amplitude of the oscillations over time, leading to an underestimation of the impact of the clock on biological systems. RESULTS: We have created a program to efficaciously identify oscillations in large-scale datasets, called the Extended Circadian Harmonic Oscillator application, or ECHO. ECHO utilizes an extended solution of the fixed amplitude oscillator that incorporates the amplitude change coefficient. Employing synthetic datasets, we determined that ECHO outperforms existing methods in detecting rhythms with decreasing oscillation amplitudes and in recovering phase shift. Rhythms with changing amplitudes identified from published biological datasets revealed distinct functions from those oscillations that were harmonic, suggesting purposeful biologic regulation to create this subtype of circadian rhythms. AVAILABILITY AND IMPLEMENTATION: ECHO's full interface is available at https://github.com/delosh653/ECHO. An R package for this functionality, echo.find, can be downloaded at https://CRAN.R-project.org/package=echo.find. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Circadian Clocks , Circadian Rhythm
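
The idea of an oscillator whose amplitude changes over time, described in the abstract for this record, can be illustrated with a standard damped or driven cosine. The abstract does not give ECHO's exact parameterization, so the form below is an assumption: a cosine scaled by an exponential term controlled by an amplitude change coefficient `gamma`. The function name `extended_oscillator` and all parameter values are hypothetical.

```python
import math


def extended_oscillator(t, amplitude, gamma, period, phase, offset):
    """Cosine wave whose amplitude changes exponentially with time.

    gamma > 0: damped (peaks shrink); gamma < 0: driven (peaks grow);
    gamma == 0: ordinary fixed-amplitude harmonic oscillation.
    """
    omega = 2.0 * math.pi / period  # angular frequency for the given period
    envelope = amplitude * math.exp(-gamma * t / 2.0)
    return envelope * math.cos(omega * t + phase) + offset


# Peak values at the start of day 1 and day 2 for a 24-h rhythm (hypothetical values).
for label, gamma in [("damped", 0.1), ("harmonic", 0.0), ("driven", -0.05)]:
    day1 = extended_oscillator(0.0, 2.0, gamma, 24.0, 0.0, 1.0)
    day2 = extended_oscillator(24.0, 2.0, gamma, 24.0, 0.0, 1.0)
    print(label, round(day1, 3), round(day2, 3))
```

A fixed-amplitude model corresponds to `gamma == 0`, so fitting `gamma` alongside the other parameters lets the same curve describe damped, harmonic, and driven rhythms.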
14.
ACM BCB ; 2019: 5-14, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31754663

ABSTRACT

Circadian rhythms are 24-hour biological cycles that control daily molecular rhythms in many organisms. The cellular elements that fall under the regulation of the clock are often studied through the use of omics-scale data sets gathered over time to determine how circadian regulation impacts cellular physiology. Previously, we created the ECHO (Extended Circadian Harmonic Oscillator) tool to identify rhythms in these data sets. Using ECHO, we found that circadian oscillations widely undergo a change in amplitude over time and that these amplitude changes have a biological function in the cell. However, ECHO does not align gene ontologies with the identified oscillating genes to give functional context. Thus, we created ENCORE (ECHO Native Circadian Ontological Rhythmicity Explorer), a novel visualization tool which combines the disparate databases of Gene Ontologies, protein-protein interactions, and auxiliary information to uncover the meaning of circadianly-regulated genes. This freely-available tool performs automatic enrichment and creates publication-worthy visualizations which we used to extend previously-gathered data on circadian regulation of physiology from published omics-scale studies in three circadian model organisms: mouse, fruit fly, and Neurospora crassa.

15.
Sci Rep ; 9(1): 2740, 2019 02 26.
Article in English | MEDLINE | ID: mdl-30809014

ABSTRACT

Increased understanding of developmental disorders of the brain has shown that genetic mutations, environmental toxins and biological insults typically act during developmental windows of susceptibility. Identifying these vulnerable periods is a necessary and vital step for safeguarding women and their fetuses against disease causing agents during pregnancy and for developing timely interventions and treatments for neurodevelopmental disorders. We analyzed developmental time-course gene expression data derived from human pluripotent stem cells, with disease association, pathway, and protein interaction databases to identify windows of disease susceptibility during development and the time periods for productive interventions. The results are displayed as interactive Susceptibility Windows Ontological Transcriptome (SWOT) Clocks illustrating disease susceptibility over developmental time. Using this method, we determine the likely windows of susceptibility for multiple neurological disorders using known disease associated genes and genes derived from RNA-sequencing studies including autism spectrum disorder, schizophrenia, and Zika virus induced microcephaly. SWOT clocks provide a valuable tool for integrating data from multiple databases in a developmental context with data generated from next-generation sequencing to help identify windows of susceptibility.


Subject(s)
Autism Spectrum Disorder/pathology , Developmental Disabilities/pathology , Gene Expression Regulation, Developmental , Genetic Predisposition to Disease , Pluripotent Stem Cells/cytology , Schizophrenia/pathology , Transcriptome , Autism Spectrum Disorder/genetics , Brain/metabolism , Brain/pathology , Brain/virology , Child , Developmental Disabilities/genetics , Female , Genetic Testing , Humans , Pluripotent Stem Cells/metabolism , Pregnancy , Schizophrenia/genetics , Zika Virus/isolation & purification , Zika Virus Infection/complications , Zika Virus Infection/virology
16.
ACM BCB ; 2017: 455-463, 2017 Aug.
Article in English | MEDLINE | ID: mdl-31844846

ABSTRACT

Circadian rhythms are endogenous cycles of approximately 24 hours reinforced by external cues such as light. These cycles are typically modeled as harmonic oscillators with fixed amplitude peaks. Using experimental data measuring global gene transcription in Neurospora crassa over 48 hours in the dark (i.e., with external cues removed), we demonstrate that many circadian genes frequently exhibit either damped harmonic oscillations, in which the peak amplitudes decrease each day, or driven harmonic oscillations, in which the peak amplitudes increase each day. By fitting extended harmonic oscillator models which include a damping ratio coefficient, we detected additional circadian genes that were not identified by the current standard tools that use fixed amplitude waves as reference, e.g., JTK_CYCLE. Functional Catalogue analysis confirms that our identified damped or driven genes exhibit distinct biological functions. The application of extended damped/driven harmonic oscillator models thus can not only elucidate previously unidentified circadian genes but also characterize gene subsets with expression patterns of biological relevance. Thus, expanded harmonic oscillators provide a powerful new tool for circadian systems biology.

17.
Big Data ; 3(4): 238-48, 2015 Dec.
Article in English | MEDLINE | ID: mdl-27441405

ABSTRACT

Electronic Healthcare Records (EHRs) have the potential to improve healthcare quality and to decrease costs by providing quality metrics, discovering actionable insights, and supporting decision-making to improve future outcomes. Within the United States Medicaid Program, rates of recidivism among emergency department (ED) patients serve as metrics of hospital performance that help ensure efficient and effective treatment within the ED. We analyze ED Medicaid patient data from 1,149,738 EHRs provided by a hospital over a 2-year period to understand the characteristics of ED return visits within a 72-hour time frame. Frequent flyer patients with multiple revisits account for 47% of Medicaid patient revisits over this period. ED encounters by frequent flyer patients with prior 72-hour revisits in the last 6 months are three times more likely to result in a readmission than those of infrequent patients. Statistical L1-logistic regression and random forest analyses reveal distinct patterns of ED usage and patient diagnoses between frequent and infrequent patient encounters, suggesting distinct opportunities for interventions to improve efficacy of care and streamline ED workflow. This work forms a foundation for future development of predictive models, which could flag patients at high risk of revisiting.
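
Identifying ED return visits within a 72-hour time frame, the quantity this abstract studies, can be sketched as below. This is a simplified illustration and not the study's procedure: it assumes each visit is a (patient ID, arrival time) pair and measures the gap from one arrival to the next, whereas a real analysis may measure from discharge. The function name `flag_72h_revisits` and the sample visits are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_72h_revisits(visits, window=timedelta(hours=72)):
    """Return the set of (patient_id, arrival) visits that occur within `window`
    of the same patient's previous visit."""
    by_patient = defaultdict(list)
    for patient_id, arrival in visits:
        by_patient[patient_id].append(arrival)

    revisits = set()
    for patient_id, arrivals in by_patient.items():
        arrivals.sort()
        # Compare each visit with the one immediately before it.
        for previous, current in zip(arrivals, arrivals[1:]):
            if current - previous <= window:
                revisits.add((patient_id, current))
    return revisits


# Hypothetical visits: patient "a" returns after 48 hours, then again a week later.
sample_visits = [
    ("a", datetime(2015, 1, 1, 8)),
    ("a", datetime(2015, 1, 3, 8)),
    ("a", datetime(2015, 1, 10, 8)),
    ("b", datetime(2015, 1, 1, 8)),
]
print(sorted(flag_72h_revisits(sample_visits)))
```

Counting flagged revisits per patient over a trailing period would then give a simple way to separate frequent from infrequent revisitors.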

18.
Biomed Res Int ; 2014: 398484, 2014.
Article in English | MEDLINE | ID: mdl-24864238

ABSTRACT

We develop a novel approach for incorporating expert rules into Bayesian networks for classification of Mycobacterium tuberculosis complex (MTBC) clades. The proposed knowledge-based Bayesian network (KBBN) treats sets of expert rules as prior distributions on the classes. Unlike prior knowledge-based support vector machine approaches which require rules expressed as polyhedral sets, KBBN directly incorporates the rules without any modification. KBBN uses data to refine rule-based classifiers when the rule set is incomplete or ambiguous. We develop a predictive KBBN model for 69 MTBC clades found in the SITVIT international collection. We validate the approach using two testbeds that model knowledge of the MTBC obtained from two different experts and large DNA fingerprint databases to predict MTBC genetic clades and sublineages. These models represent strains of MTBC using high-throughput biomarkers called spacer oligonucleotide types (spoligotypes), since these are routinely gathered from MTBC isolates of tuberculosis (TB) patients. Results show that incorporating rules into problems can drastically increase classification accuracy if data alone are insufficient. The SITVIT KBBN is publicly available for use on the World Wide Web.


Subject(s)
Knowledge Bases , Mycobacterium tuberculosis/classification , Bayes Theorem , Databases as Topic , Reproducibility of Results
19.
J Chem Inf Model ; 53(12): 3352-66, 2013 Dec 23.
Article in English | MEDLINE | ID: mdl-24261543

ABSTRACT

Computational methods that can identify CYP-mediated sites of metabolism (SOMs) of drug-like compounds have become required tools for early stage lead optimization. In recent years, methods that combine CYP binding site features with CYP/ligand binding information have been sought in order to increase the prediction accuracy of such hybrid models over those that use only one representation. Two challenges that any hybrid ligand/structure-based method must overcome are (1) identifying the best binding pose for a specific ligand with a given CYP and (2) appropriately incorporating the results of docking with ligand reactivity. To address these challenges we have created Docking-Regioselectivity-Predictor (DR-Predictor), a method that incorporates flexible docking-derived information with specialized electronic reactivity and multiple-instance-learning methods to predict CYP-mediated SOMs. In this study, the hybrid ligand-structure-based DR-Predictor method was tested on substrate sets for CYP 1A2 and CYP 2A6. For these data, the DR-Predictor model was found to identify the experimentally observed SOM within the top two predicted rank-positions for 86% of the 261 1A2 substrates and 83% of the 100 2A6 substrates. Given the accuracy and extendibility of the DR-Predictor method, we anticipate that it will further facilitate the prediction of CYP metabolism liabilities and aid in in-silico ADMET assessment of novel structures.
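The "top two predicted rank-positions" figure reported above is a top-k hit rate. The sketch below shows one way such a metric could be computed; the function name `top_k_hit_rate`, the atom labels, and the toy rankings are hypothetical and are not taken from DR-Predictor or its data.

```python
def top_k_hit_rate(ranked_sites, observed_sites, k=2):
    """Fraction of substrates for which at least one experimentally
    observed site appears among the top-k ranked candidate sites."""
    if len(ranked_sites) != len(observed_sites):
        raise ValueError("one ranking and one observed set per substrate")
    hits = 0
    for ranked, observed in zip(ranked_sites, observed_sites):
        # A substrate counts as a hit if any top-k site was observed.
        if any(site in observed for site in ranked[:k]):
            hits += 1
    return hits / len(ranked_sites)


# Hypothetical rankings (best first) for three toy substrates.
ranked = [["C3", "C7", "N1"], ["C1", "C2", "C5"], ["O4", "C9"]]
# Hypothetical experimentally observed sites for the same substrates.
observed = [{"C7"}, {"C5"}, {"O4", "C2"}]

rate = top_k_hit_rate(ranked, observed, k=2)  # 2 of the 3 substrates hit
```

Under this definition, the second toy substrate is a miss because its only observed site is ranked third.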


Subject(s)
Artificial Intelligence , Aryl Hydrocarbon Hydroxylases/chemistry , Cytochrome P-450 CYP1A2/chemistry , Molecular Docking Simulation , Small Molecule Libraries/chemistry , Aryl Hydrocarbon Hydroxylases/metabolism , Biotransformation , Catalytic Domain , Cytochrome P-450 CYP1A2/metabolism , Cytochrome P-450 CYP2A6 , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Ligands , Protein Binding , Small Molecule Libraries/metabolism , Structure-Activity Relationship , Substrate Specificity , Thermodynamics
20.
IEEE Trans Nanobioscience ; 11(3): 191-202, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22987125

ABSTRACT

Biomarkers of Mycobacterium tuberculosis complex (MTBC) mutate over time. Among the biomarkers of MTBC, spacer oligonucleotide type (spoligotype) and mycobacterium interspersed repetitive unit (MIRU) patterns are commonly used to genotype clinical MTBC strains. In this study, we present an evolution model of spoligotype rearrangements using MIRU patterns to disambiguate the ancestors of spoligotypes. We use a large patient dataset from the United States Centers for Disease Control and Prevention (CDC) to generate this model. Based on the contiguous deletion assumption and rare observation of convergent evolution, we first generate the most parsimonious forest of spoligotypes, called a spoligoforest, using three genetic distance measures. An analysis of topological attributes of the spoligoforest and number of variations at the direct repeat (DR) locus of each strain reveals interesting properties of deletions in the DR region. First, we compare our mutation model to existing mutation models of spoligotypes and find that our mutation model produces as many within-lineage mutation events as other models, with slightly higher segregation accuracy. Second, based on our mutation model, the number of descendant spoligotypes follows a power law distribution. Third, contrary to prior studies, the power law distribution does not plausibly fit the mutation length frequency. Moreover, we find that the total number of mutation events at consecutive spacers follows a spatially bimodal distribution. The two modes are spacers 13 and 40, which are hotspots for chromosomal rearrangements, and the change point is spacer 34, which is absent in most MTBC strains. Based on this observation, we built two alternative models for mutation length frequency: the Starting Point Model (SPM) and the Longest Block Model (LBM). Both models are plausibly good fits to the mutation length frequency distribution, as verified by the goodness-of-fit test. We also apply SPM and LBM to a dataset from Institut Pasteur de Guadeloupe and verify that these models hold for different strain datasets.
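The contiguous deletion assumption mentioned in this abstract can be made concrete with a small check on binary spoligotype strings ("1" for a spacer that is present, "0" for one that is absent). This is only a sketch under assumptions: the function name `is_single_contiguous_deletion` is hypothetical, the toy strings use 8 spacers instead of the usual 43 for readability, and spacers already absent in the parent are allowed to lie inside the deleted block. It is not the authors' spoligoforest algorithm.

```python
def is_single_contiguous_deletion(parent, child):
    """True if `child` can be obtained from `parent` by deleting one
    contiguous block of spacers, with no spacer gained."""
    if len(parent) != len(child):
        raise ValueError("spoligotypes must have the same number of spacers")

    # Positions where a spacer present in the parent is missing in the child.
    lost = [i for i, (p, c) in enumerate(zip(parent, child))
            if p == "1" and c == "0"]
    # A spacer present only in the child would require a gain event.
    gained = any(p == "0" and c == "1" for p, c in zip(parent, child))
    if gained or not lost:
        return False

    # Every spacer between the first and last lost position must be absent
    # in the child; otherwise more than one deletion event is needed.
    first, last = lost[0], lost[-1]
    return "1" not in child[first:last + 1]


# Toy 8-spacer patterns (real spoligotypes have 43 spacers).
parent = "11101111"
one_block = is_single_contiguous_deletion(parent, "11000011")   # one block
two_blocks = is_single_contiguous_deletion(parent, "10101011")  # two blocks
```

A parent-child edge in a parsimonious forest would, under this assumption, only be proposed between pairs for which such a check succeeds.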


Subject(s)
Bacterial Genes , Interspersed Repetitive Sequences/genetics , Genetic Models , Mutation , Mycobacterium tuberculosis/genetics , Algorithms , Bacterial Typing Techniques , Bacterial DNA/analysis , Bacterial DNA/chemistry , Genetic Databases , Molecular Evolution , Genetic Markers , Mycobacterium tuberculosis/classification