ABSTRACT
BACKGROUND: Analytic tools to study important clinical issues in complex, chronic diseases such as Crohn's disease (CD) include randomized trials, claims database studies, and small longitudinal epidemiologic cohorts. Using natural language processing (NLP), we sought to define the computable phenotype health state of pediatric and adult CD and to develop patient-level longitudinal histories for health outcomes. METHODS: We defined 6 health states for CD by combining a subjective symptom-based assessment (symptomatic/asymptomatic) with an objective disease state assessment (active/inactive/no testing). The gold standard for the 6 health states was derived through an iterative review by our CD experts. We calculated transition probabilities and estimated the time to transition between health states using nonparametric Kaplan-Meier estimation and a Markov model. Finally, we determined a standard utility measure from clinical patients assigned to different health states. RESULTS: Based on a blinded chart evaluation, the NLP computable phenotype health state model correctly ascertained the objective test results 96% of the time and symptoms 85% of the time. In our model, >25% of patients who begin as asymptomatic/active transition to symptomatic/active over the following year. For both adult and pediatric CD health states, the utility assessments of a symptomatic/inactive health state closely resembled those of a symptomatic/active health state. CONCLUSIONS: Our methodology for a computable phenotype health state demonstrates the application of real-world data to define progression and optimal management of a chronic disease such as CD. The application of the model has the potential to lead to a better understanding of the true impact of a therapeutic intervention and can provide long-term cost-effectiveness analyses for a new therapy.
HIGHLIGHTS: Using natural language processing, we defined the computable phenotype health state of Crohn's disease and developed patient-level longitudinal histories for health outcomes. Our methodology demonstrates the application of real-world data to define the progression of a chronic disease. The application of the model has the potential to provide better understanding of the true impact of a new therapy.
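
As an illustration of the methods summarized above, the following minimal sketch enumerates the 6 health states (2 symptom levels by 3 objective disease levels) and estimates discrete-time Markov transition probabilities from patient-level state sequences. It is not the authors' implementation: the function name `transition_matrix`, the state labels as strings, and the `toy_histories` data are hypothetical and chosen only for illustration. The simple count-based estimate also assumes equally spaced observations and does not reproduce the Kaplan-Meier time-to-transition analysis.

```python
from collections import Counter
from itertools import product

# The two assessments named in the abstract; their product gives 6 health states.
SYMPTOMS = ("symptomatic", "asymptomatic")
DISEASE = ("active", "inactive", "no testing")
STATES = [f"{s}/{d}" for s, d in product(SYMPTOMS, DISEASE)]


def transition_matrix(histories):
    """Estimate first-order Markov transition probabilities by counting.

    `histories` is a list of per-patient state sequences, assumed here to be
    sampled at equally spaced visits (an assumption of this sketch).
    """
    counts = Counter()
    for history in histories:
        # Count each consecutive (from_state, to_state) pair.
        for current, following in zip(history, history[1:]):
            counts[(current, following)] += 1

    matrix = {}
    for a in STATES:
        total = sum(counts[(a, b)] for b in STATES)
        # Rows with no observed departures are left as all zeros.
        matrix[a] = {
            b: (counts[(a, b)] / total if total else 0.0) for b in STATES
        }
    return matrix


# Hypothetical toy data, not taken from the study.
toy_histories = [
    ["asymptomatic/active", "symptomatic/active", "symptomatic/active"],
    ["asymptomatic/active", "asymptomatic/active", "asymptomatic/inactive"],
]

P = transition_matrix(toy_histories)
```

In this toy input, three transitions leave asymptomatic/active, one to each of three destinations, so each of those entries in `P` is 1/3. With real longitudinal histories, the row for asymptomatic/active would be the place to read off a quantity comparable to the one-year transition to symptomatic/active reported in the results, provided the visit spacing matches the time horizon of interest.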