ABSTRACT
Large language models (LLMs) have shown promise for task-oriented dialogue across a range of domains, but their use in health and fitness coaching is under-explored. Behavior science frameworks such as COM-B, which conceptualizes behavior change in terms of Capability (C), Opportunity (O) and Motivation (M), can be used to architect coaching interventions in a way that promotes sustained change. Here we aim to incorporate behavior science principles into an LLM using two knowledge infusion techniques: coach message priming (where exemplar coach responses are provided as context to the LLM), and dialogue re-ranking (where the COM-B category of the LLM output is matched to the inferred user need). Simulated conversations were conducted between the primed or unprimed LLM and a member of the research team, and then evaluated by 8 human raters. Ratings for the primed conversations were significantly higher in terms of empathy and actionability. The same raters also compared a single response generated by the unprimed, primed and re-ranked models, finding a significant uplift in actionability and empathy from the re-ranking technique. This is a proof of concept of how behavior science frameworks can be infused into automated conversational agents for a more principled coaching experience.
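The two knowledge infusion techniques lend themselves to a compact sketch. The code below is illustrative rather than the study's implementation: the exemplar coach messages are invented, and `classify_comb` stands in for whatever COM-B classifier and LLM backend are available.

```python
# Illustrative sketch of coach message priming and COM-B dialogue re-ranking
# (assumptions throughout; not the study's code).
from typing import Callable, List

# Hypothetical exemplar coach responses used for priming.
EXEMPLAR_COACH_MESSAGES = [
    "User: I keep skipping my evening walks.\nCoach: What usually gets in the way on those evenings?",
    "User: I lose steam by Friday.\nCoach: Which small win this week felt most doable to repeat?",
]

def build_primed_prompt(user_message: str) -> str:
    """Coach message priming: prepend exemplar coach responses as context for the LLM."""
    exemplars = "\n\n".join(EXEMPLAR_COACH_MESSAGES)
    return f"{exemplars}\n\nUser: {user_message}\nCoach:"

def rerank_by_comb(user_message: str,
                   candidates: List[str],
                   classify_comb: Callable[[str], str]) -> str:
    """Dialogue re-ranking: prefer the candidate response whose COM-B category
    (capability / opportunity / motivation) matches the user's inferred need."""
    user_need = classify_comb(user_message)
    matches = [c for c in candidates if classify_comb(c) == user_need]
    return matches[0] if matches else candidates[0]
```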
ABSTRACT
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate the Pathways Language Model [1] (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM [2], on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA [3], MedMCQA [4], PubMedQA [5] and Measuring Massive Multitask Language Understanding (MMLU) clinical topics [6]), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
Subject(s)
Benchmarking, Computer Simulation, Knowledge, Medicine, Natural Language Processing, Bias, Clinical Competence, Comprehension, Datasets as Topic, Licensure, Medicine/methods, Medicine/standards, Patient Safety, Physicians
ABSTRACT
OBJECTIVES: Identifying modifiable risk factors associated with central line-associated bloodstream infections (CLABSIs) may lead to modifications to central line (CL) management. We hypothesize that the number of CL accesses per day is associated with an increased risk for CLABSI and that a significant fraction of CL access may be substituted with non-CL routes. DESIGN: We conducted a retrospective cohort study of patients with at least one CL device day from January 1, 2015, to December 31, 2019. A multivariate mixed-effects logistic regression model was used to estimate the association between the number of CL accesses in a given CL device day and prevalence of CLABSI within the following 3 days. SETTING: A 395-bed pediatric academic medical center. PATIENTS: Patients with at least one CL device day from January 1, 2015, to December 31, 2019. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: There were 138,411 eligible CL device days across 6,543 patients, with 639 device days within 3 days of a CLABSI (a total of 217 CLABSIs). The number of per-day CL accesses was independently associated with risk of CLABSI in the next 3 days (adjusted odds ratio, 1.007; 95% CI, 1.003-1.012; p = 0.002). Of medications administered through CLs, 88% were candidates for delivery through a peripheral line. On average, these accesses contributed a 6.3% increase in daily risk for CLABSI. CONCLUSIONS: The number of daily CL accesses is independently associated with risk of CLABSI in the next 3 days. In the pediatric population examined, most medications delivered through CLs could be safely administered peripherally. Efforts to reduce CL access may be an important strategy to include in contemporary CLABSI-prevention bundles.
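As a rough illustration of the device-day analysis described above, the sketch below fits a plain logistic regression to simulated device-day records and reports odds ratios. The study itself used a multivariate mixed-effects model with patient-level clustering; the column names, covariates and effect sizes here are invented.

```python
# Hedged sketch: logistic regression of CLABSI risk on daily central-line accesses
# over simulated device days (simpler than the study's mixed-effects model).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000  # hypothetical central-line (CL) device days
df = pd.DataFrame({
    "n_accesses": rng.poisson(12, n),       # CL accesses on that device day
    "age_years": rng.integers(0, 18, n),    # one of several possible covariates
})
# Simulate a small per-access effect on CLABSI risk within the following 3 days.
logit_p = -4.0 + 0.007 * df["n_accesses"] + 0.01 * df["age_years"]
df["clabsi_next_3d"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

fit = smf.logit("clabsi_next_3d ~ n_accesses + age_years", data=df).fit(disp=False)
print(np.exp(fit.params))      # odds ratios, e.g. OR per additional CL access
print(np.exp(fit.conf_int()))  # 95% confidence intervals on the odds-ratio scale
```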
Subject(s)
Bacteremia, Catheter-Related Infections, Catheterization, Central Venous, Central Venous Catheters, Humans, Child, Catheter-Related Infections/etiology, Retrospective Studies, Catheterization, Central Venous/adverse effects, Bacteremia/epidemiology, Bacteremia/etiology, Central Venous Catheters/adverse effects
ABSTRACT
Despite the rapid growth of wearables as a consumer technology sector and a growing evidence base supporting their use, they have been slow to be adopted by the health system into clinical care. As regulatory, reimbursement, and technical barriers recede, a persistent challenge remains: how to make wearable data actionable for clinicians, transforming disconnected grains of wearable data into meaningful clinical "pearls". In order to bridge this adoption gap, wearable data must become visible, interpretable, and actionable for the clinician. We showcase emerging trends and best practices that illustrate these 3 pillars, and offer some recommendations on how the ecosystem can move forward.
Subject(s)
Wearable Electronic Devices, Humans, Sand, Ecosystem
ABSTRACT
OBJECTIVES: Few machine learning (ML) models are successfully deployed in clinical practice. One of the common pitfalls across the field is inappropriate problem formulation: designing ML to fit the data rather than to address a real-world clinical pain point. METHODS: We introduce a practical toolkit for user-centred design consisting of four questions covering: (1) solvable pain points, (2) the unique value of ML (eg, automation and augmentation), (3) the actionability pathway and (4) the model's reward function. This toolkit was implemented in a series of six participatory design workshops with care managers in an academic medical centre. RESULTS: Pain points amenable to ML solutions included outpatient risk stratification and risk factor identification. The endpoint definitions, triggering frequency and evaluation metrics of the proposed risk scoring model were directly influenced by care manager workflows and real-world constraints. CONCLUSIONS: Integrating user-centred design early in the ML life cycle is key for configuring models in a clinically actionable way. This toolkit can guide problem selection and influence choices about the technical setup of the ML problem.
Asunto(s)
Machine Learning, User-Centered Design, Delivery of Health Care, Humans, Pain, Workflow
ABSTRACT
Background: Explicit documentation of stage is a quality metric endorsed by the National Quality Forum. Clinical and pathological cancer staging is inconsistently recorded within clinical narratives but can be derived from text in the Electronic Health Record (EHR). To address this need, we developed a Natural Language Processing (NLP) solution for extracting clinical and pathological TNM stages from the clinical notes of patients with prostate cancer. Methods: Data for patients diagnosed with prostate cancer between 2010 and 2018 were collected from a tertiary care academic healthcare system's EHR records in the United States. This system is linked to the California Cancer Registry, and contains data on diagnosis, histology, cancer stage, treatment and outcomes. A randomly selected sample of patients was manually annotated for stage to establish the ground truth for training and validating the NLP methods. For each patient, a vector representation of clinical text (written in English) was used to train a machine learning model alongside a rule-based model and compared with the ground truth. Results: A total of 5,461 prostate cancer patients were identified in the clinical data warehouse and over 30% were missing stage information. Thirty-three to thirty-six percent of patients were missing a clinical stage, and the models accurately imputed the stage in 21-32% of cases. Twenty-one percent had a missing pathological stage; using NLP, 71% of missing T stages and 56% of missing N stages were imputed. For both clinical and pathological T and N stages, the rule-based NLP approach outperformed the ML approach, with minimum F1 scores of 0.71 and 0.40, respectively. For clinical M stage, the ML approach outperformed the rule-based model, with minimum F1 scores of 0.88 and 0.79, respectively. Conclusions: We developed an NLP pipeline to successfully extract clinical and pathological staging information from clinical narratives. Our results can serve as a proof of concept for using NLP to augment clinical and pathological stage reporting in cancer registries and EHRs to enhance the secondary use of these data.
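A rule-based extractor of the kind compared above can be as simple as a TNM regular expression applied to note text. The pattern and example note below are illustrative assumptions, not the rules used in the study.

```python
# Minimal sketch of a rule-based TNM stage extractor (hypothetical pattern and note text).
import re

TNM_PATTERN = re.compile(
    r"\b(?P<prefix>c|p|yp)?"            # clinical / pathological prefix, if present
    r"(?P<t>T[0-4][a-c]?|Tx|Tis)\s*"    # T stage (required)
    r"(?P<n>N[0-3][a-c]?|Nx)?\s*"       # N stage (optional)
    r"(?P<m>M[0-1][a-c]?|Mx)?\b",       # M stage (optional)
    flags=re.IGNORECASE,
)

def extract_tnm(note_text: str):
    """Return (prefix, T, N, M) tuples for each TNM mention found in a clinical note."""
    return [(m.group("prefix"), m.group("t"), m.group("n"), m.group("m"))
            for m in TNM_PATTERN.finditer(note_text)]

example_note = "Prostate adenocarcinoma, Gleason 4+3. Pathologic stage pT3a N0 Mx."
print(extract_tnm(example_note))
```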
ABSTRACT
INTRODUCTION: Central line-associated bloodstream infections (CLABSIs) are the most common hospital-acquired infection in pediatric patients. High adherence to the CLABSI bundle mitigates CLABSIs. Our institution lacked a hospital-wide system for measuring bundle adherence. We developed an electronic dashboard to monitor CLABSI bundle adherence across the hospital in real time. METHODS: Institutional stakeholders and areas of opportunity were identified through interviews and data analyses. We created a data pipeline to pull adherence data from twice-daily bundle checks and populate a dashboard in the electronic health record. The dashboard was developed to allow visualization of overall and individual element bundle adherence across units. Monthly dashboard accesses and element-level bundle adherence were recorded, and the nursing staff's feedback about the dashboard was obtained. RESULTS: Following deployment in September 2018, the dashboard was primarily accessed by quality improvement, clinical effectiveness and analytics, and infection prevention and control. Quality improvement and infection prevention and control specialists presented dashboard data at improvement meetings to inform unit-level accountability initiatives. All-element adherence across the hospital increased from 25% in September 2018 to 44% in December 2019, and average adherence to each bundle element increased between 2018 and 2019. CONCLUSIONS: CLABSI bundle adherence, overall and by element, increased across the hospital following the deployment of a real-time electronic data dashboard. The dashboard enabled population-level surveillance of CLABSI bundle adherence that informed bundle accountability initiatives. Data transparency enabled by electronic dashboards promises to be a useful tool for infectious disease control.
ABSTRACT
OBJECTIVE: Multitask learning (MTL) using electronic health records allows concurrent prediction of multiple endpoints. MTL has shown promise in improving model performance and training efficiency; however, it often suffers from negative transfer: impaired learning when tasks are not appropriately selected. We introduce a sequential subnetwork routing (SeqSNR) architecture that uses soft parameter sharing to find related tasks and encourage cross-learning between them. MATERIALS AND METHODS: Using the MIMIC-III (Medical Information Mart for Intensive Care-III) dataset, we train deep neural network models to predict the onset of 6 endpoints including specific organ dysfunctions and general clinical outcomes: acute kidney injury, continuous renal replacement therapy, mechanical ventilation, vasoactive medications, mortality, and length of stay. We compare single-task (ST) models with naive multitask and SeqSNR in terms of discriminative performance and label efficiency. RESULTS: SeqSNR showed a modest yet statistically significant performance boost across 4 of 6 tasks compared with ST and naive multitasking. When the size of the training dataset was reduced for a given task (label efficiency), SeqSNR outperformed ST in all cases, showing average area under the precision-recall curve boosts of 2.1%, 2.9%, and 2.1% for tasks using 1%, 5%, and 10% of labels, respectively. CONCLUSIONS: The SeqSNR architecture shows superior label efficiency compared with ST and naive multitasking, suggesting utility in scenarios in which endpoint labels are difficult to ascertain.
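For orientation, the sketch below shows the naive multitask baseline referenced above: a shared encoder with one prediction head per endpoint. SeqSNR's subnetwork routing and soft parameter sharing are not reproduced here, and the feature dimension, hidden size and task names are assumptions.

```python
# Sketch of a naive multitask baseline (shared encoder + per-task heads), not SeqSNR.
import torch
import torch.nn as nn

TASKS = ["aki", "crrt", "mech_vent", "vasoactive", "mortality", "long_los"]

class NaiveMultitaskNet(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, 1) for t in TASKS})

    def forward(self, x):
        h = self.shared(x)                         # representation shared by all tasks
        return {t: head(h).squeeze(-1) for t, head in self.heads.items()}

model = NaiveMultitaskNet(n_features=32)
x = torch.randn(8, 32)                             # batch of 8 patient feature vectors
labels = {t: torch.randint(0, 2, (8,)).float() for t in TASKS}
logits = model(x)
loss = sum(nn.functional.binary_cross_entropy_with_logits(logits[t], labels[t]) for t in TASKS)
loss.backward()                                    # gradients flow through the shared encoder
```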
Subject(s)
Machine Learning, Multiple Organ Failure, Electronic Health Records, Humans, Intensive Care Units, Neural Networks, Computer
ABSTRACT
Early prediction of patient outcomes is important for targeting preventive care. This protocol describes a practical workflow for developing deep-learning risk models that can predict various clinical and operational outcomes from structured electronic health record (EHR) data. The protocol comprises five main stages: formal problem definition, data pre-processing, architecture selection, calibration and uncertainty, and generalizability evaluation. We have applied the workflow to four endpoints (acute kidney injury, mortality, length of stay and 30-day hospital readmission). The workflow can enable continuous (e.g., triggered every 6 h) and static (e.g., triggered at 24 h after admission) predictions. We also provide an open-source codebase that illustrates some key principles in EHR modeling. This protocol can be used by interdisciplinary teams with programming and clinical expertise to build deep-learning prediction models with alternate data sources and prediction tasks.
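The continuous versus static triggering distinction can be made concrete with a small scheduling sketch; the times and intervals below are hypothetical.

```python
# Sketch of continuous (every 6 h) versus static (once at 24 h) prediction triggers.
from datetime import datetime, timedelta

def continuous_triggers(admit_time, end_time, every_hours=6):
    """Yield prediction timestamps at a fixed cadence throughout the stay."""
    t = admit_time
    while t <= end_time:
        yield t
        t += timedelta(hours=every_hours)

def static_trigger(admit_time, at_hours=24):
    """Single prediction timestamp at a fixed offset from admission."""
    return admit_time + timedelta(hours=at_hours)

admit = datetime(2020, 1, 1, 8, 0)
print(list(continuous_triggers(admit, admit + timedelta(days=1))))
print(static_trigger(admit))
```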
Subject(s)
Deep Learning, Electronic Health Records, Research Design, Risk Assessment/methods, Humans, Software, Workflow
ABSTRACT
Importance: Randomized clinical trials (RCTs) are considered the criterion standard for clinical evidence. Despite their many benefits, RCTs have limitations, such as costliness, that may reduce the generalizability of their findings among diverse populations and routine care settings. Objective: To assess the performance of an RCT-derived prognostic model that predicts survival among patients with metastatic castration-resistant prostate cancer (CRPC) when the model is applied to real-world data from electronic health records (EHRs). Design, Setting, and Participants: The RCT-trained model and patient data from the RCTs were obtained from the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge for prostate cancer, which occurred from March 16 to July 27, 2015. This challenge included 4 phase 3 clinical trials of patients with metastatic CRPC. Real-world data were obtained from the EHRs of a tertiary care academic medical center that includes a comprehensive cancer center. In this study, the DREAM challenge RCT-trained model was applied to real-world data from January 1, 2008, to December 31, 2019; the model was then retrained using EHR data with optimized feature selection. Patients with metastatic CRPC were divided into RCT and EHR cohorts based on data source. Data were analyzed from March 23, 2018, to October 22, 2020. Exposures: Patients who received treatment for metastatic CRPC. Main Outcomes and Measures: The primary outcome was the performance of an RCT-derived prognostic model that predicts survival among patients with metastatic CRPC when the model is applied to real-world data. Model performance was compared using 10-fold cross-validation according to time-dependent integrated area under the curve (iAUC) statistics. Results: Among 2113 participants with metastatic CRPC, 1600 participants were included in the RCT cohort, and 513 participants were included in the EHR cohort. The RCT cohort comprised a larger proportion of White participants (1390 patients [86.9%] vs 337 patients [65.7%]) and a smaller proportion of Hispanic participants (14 patients [0.9%] vs 42 patients [8.2%]), Asian participants (41 patients [2.6%] vs 88 patients [17.2%]), and participants older than 75 years (388 patients [24.3%] vs 191 patients [37.2%]) compared with the EHR cohort. Participants in the RCT cohort also had fewer comorbidities (mean [SD], 1.6 [1.8] comorbidities vs 2.5 [2.6] comorbidities, respectively) compared with those in the EHR cohort. Of the 101 variables used in the RCT-derived model, 10 were not available in the EHR data set, 3 of which were among the top 10 features in the DREAM challenge RCT model. The best-performing EHR-trained model included only 25 of the 101 variables included in the RCT-trained model. The performance of the RCT-trained and EHR-trained models was adequate in the EHR cohort (mean [SD] iAUC, 0.722 [0.118] and 0.762 [0.106], respectively); model optimization was associated with improved performance of the best-performing EHR model (mean [SD] iAUC, 0.792 [0.097]). The EHR-trained model classified 256 patients as having a high risk of mortality and 256 patients as having a low risk of mortality (hazard ratio, 2.7; 95% CI, 2.0-3.7; log-rank P < .001). Conclusions and Relevance: In this study, although the RCT-trained models did not perform well when applied to real-world EHR data, retraining the models using real-world EHR data and optimizing variable selection was beneficial for model performance. 
As clinical evidence evolves to include more real-world data, both industry and academia will likely search for ways to balance model optimization with generalizability. This study provides a pragmatic approach to applying RCT-trained models to real-world data.
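For readers unfamiliar with time-dependent AUC statistics, the sketch below computes a cumulative/dynamic AUC over a grid of evaluation times with scikit-survival, on simulated risk scores. It illustrates the kind of metric reported above rather than the study's exact iAUC procedure or cross-validation setup.

```python
# Hedged sketch: time-dependent AUC for a survival risk score on simulated data.
import numpy as np
from sksurv.util import Surv
from sksurv.metrics import cumulative_dynamic_auc

rng = np.random.default_rng(0)
n = 300
risk_score = rng.normal(size=n)                              # model output (higher = worse)
time = rng.exponential(scale=np.exp(-0.5 * risk_score)) * 24  # follow-up, risk-dependent
event = rng.random(n) < 0.7                                  # ~70% observed deaths

y = Surv.from_arrays(event=event, time=time)
eval_times = np.quantile(time[event], [0.2, 0.4, 0.6, 0.8])  # grid inside follow-up range
auc_t, mean_auc = cumulative_dynamic_auc(y, y, risk_score, eval_times)
print(auc_t, mean_auc)   # AUC(t) at each evaluation time and its mean over the grid
```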
Subject(s)
Decision Making, Computer-Assisted, Models, Statistical, Prostatic Neoplasms, Castration-Resistant/mortality, Adolescent, Adult, Aged, Electronic Health Records, Humans, Machine Learning, Male, Middle Aged, Prognosis, Prostatic Neoplasms, Castration-Resistant/diagnosis, Prostatic Neoplasms, Castration-Resistant/epidemiology, Randomized Controlled Trials as Topic, Survival Analysis, Young Adult
Subject(s)
Cardiomyopathies, Lupus Erythematosus, Systemic, Cardiomyopathies/diagnostic imaging, Cardiomyopathies/epidemiology, Fibrosis, Humans, Lupus Erythematosus, Systemic/diagnosis, Lupus Erythematosus, Systemic/epidemiology, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Prevalence
ABSTRACT
OBJECTIVE: The development of machine learning (ML) algorithms to address a variety of issues faced in clinical practice has increased rapidly. However, questions have arisen regarding biases in their development that can affect their applicability in specific populations. We sought to evaluate whether studies developing ML models from electronic health record (EHR) data report sufficient demographic data on the study populations to demonstrate representativeness and reproducibility. MATERIALS AND METHODS: We searched PubMed for articles applying ML models to improve clinical decision-making using EHR data. We limited our search to papers published between 2015 and 2019. RESULTS: Across the 164 studies reviewed, demographic variables were inconsistently reported and/or included as model inputs. Race/ethnicity was not reported in 64% of studies; gender and age were not reported in 24% and 21% of studies, respectively. Socioeconomic status of the population was not reported in 92% of studies. Studies that mentioned these variables often did not report whether they were included as model inputs. Few models (12%) were validated using external populations. Few studies (17%) open-sourced their code. Populations in the ML studies included higher proportions of White and Black subjects and lower proportions of Hispanic subjects compared with the general US population. DISCUSSION: The demographic characteristics of study populations are poorly reported in the ML literature based on EHR data. Demographic representativeness in training data and model transparency are necessary to ensure that ML models are deployed in an equitable and reproducible manner. Wider adoption of reporting guidelines is warranted to improve representativeness and reproducibility.
Subject(s)
Demography, Electronic Health Records, Machine Learning, Ethnicity, Female, Humans, Male, Nutrition Surveys, Socioeconomic Factors
ABSTRACT
OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network. MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site. RESULTS: Compared to the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site. DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibited good portability between sites within the USA but limited portability internationally, indicating that classifier generalizability may have geographic limitations; consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.
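The "multiple mentions" labeling heuristic can be sketched in a few lines. The code set, threshold and column names below are hypothetical, and APHRODITE itself is an R package operating on the OMOP Common Data Model; this Python fragment only illustrates the labeling idea.

```python
# Sketch of noisy labeling: a patient is a probable case if disease-specific
# codes appear at least k times in their record (hypothetical codes and schema).
import pandas as pd

DISEASE_CODES = {"E11.9", "E11.65", "250.00"}   # hypothetical disease-specific codes

def noisy_labels(condition_rows: pd.DataFrame, k: int = 2) -> pd.Series:
    """condition_rows has one row per (patient_id, code) occurrence."""
    hits = condition_rows[condition_rows["code"].isin(DISEASE_CODES)]
    counts = hits.groupby("patient_id").size()
    patients = condition_rows["patient_id"].unique()
    return counts.reindex(patients, fill_value=0).ge(k)

rows = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 3, 3],
    "code":       ["E11.9", "E11.9", "I10", "E11.9", "I10", "I25.10"],
})
print(noisy_labels(rows))   # patient 1 -> True; patients 2 and 3 -> False
```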
Subject(s)
Electronic Health Records/classification, Medical Informatics, Supervised Machine Learning, Classification/methods, Data Science, Humans, Observational Studies as Topic
ABSTRACT
BACKGROUND: Reducing hospital-acquired pressure ulcers (PUs) in intensive care units (ICUs) has emerged as an important quality metric for health systems internationally. Limited work has been done to characterize the profile of PUs in the ICU using observational data from the electronic health record (EHR). Consequently, there are limited EHR-based prognostic tools for determining a patient's risk of PU development, with most institutions relying on nurse-calculated risk scores such as the Braden score to identify high-risk patients. METHODS AND RESULTS: Using EHR data from 50,851 admissions in a tertiary ICU (MIMIC-III), we show that the prevalence of PUs at stage 2 or above is 7.8 percent. For the 1,690 admissions where a PU was recorded on day 2 or beyond, we evaluated the prognostic value of the Braden score measured within the first 24 hours. A high-risk Braden score (<=12) had precision 0.09 and recall 0.50 for the future development of a PU. We trained a range of machine learning algorithms using demographic parameters, diagnosis codes, laboratory values and vitals available from the EHR within the first 24 hours. A weighted linear regression model showed precision 0.09 and recall 0.71 for future PU development. Classifier performance was not improved by integrating Braden score elements into the model. CONCLUSION: We demonstrate that an EHR-based model can outperform the Braden score as a screening tool for PUs. This may be a useful tool for automatic risk stratification early in an admission, helping to guide quality protocols in the ICU, including the allocation and timing of prophylactic interventions.
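The screening comparison above reduces to the precision and recall of a binary flag. The toy example below shows the computation for a high-risk Braden threshold; the numbers are invented, not the study's.

```python
# Precision/recall of a Braden <= 12 screening flag on toy outcomes.
from sklearn.metrics import precision_score, recall_score

developed_pu = [1, 0, 0, 1, 0, 0, 1, 0]          # future pressure-ulcer development
braden_day1  = [11, 14, 12, 18, 9, 15, 12, 10]   # Braden score within the first 24 h
braden_flag  = [int(score <= 12) for score in braden_day1]

print("precision:", precision_score(developed_pu, braden_flag))
print("recall:   ", recall_score(developed_pu, braden_flag))
```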
ABSTRACT
Clinical and pathological stage are defining parameters in oncology that direct a patient's treatment options and inform prognosis. Pathology reports contain a wealth of staging information that is not stored in structured form in most electronic health records (EHRs). Therefore, we evaluated three supervised machine learning methods (Support Vector Machine, Decision Trees, Gradient Boosting) to classify free-text pathology reports for prostate cancer into T, N and M stage groups.
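A minimal version of this setup is a bag-of-words (TF-IDF) pipeline feeding one of the evaluated classifiers. The report snippets, labels and TF-IDF featurization below are assumptions for illustration, not the study's data or exact features.

```python
# Sketch: TF-IDF features + linear SVM for T-stage classification of pathology text
# (decision trees or gradient boosting could be swapped in).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

reports = [
    "Tumor confined to prostate, involving one lobe.",
    "Extracapsular extension with seminal vesicle invasion.",
    "Organ-confined disease, both lobes involved.",
    "Tumor invades bladder neck and levator muscles.",
]
t_stage = ["T2", "T3", "T2", "T4"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(reports, t_stage)
print(clf.predict(["Disease confined to the prostate gland."]))
```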
Subject(s)
Machine Learning, Prostatic Neoplasms, Electronic Health Records, Humans, Male
ABSTRACT
Background: The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence on these data. We present and demonstrate the accuracy of an NLP pipeline that assesses the presence, absence, or discussion of risk of two important PCOs following prostate cancer treatment: urinary incontinence (UI) and bowel dysfunction (BD). Methods: We propose a weakly supervised NLP approach which annotates electronic medical record clinical notes without requiring manual chart review. A weighted function of neural word embeddings was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. Sentence vectors were used as input for a multinomial logistic model, with the output being the presence, absence or risk discussion of UI/BD. The classifier was trained based on automated sentence annotation depending only on domain-specific dictionaries (weak supervision). Results: The model achieved an average F1 score of 0.86 for the sentence-level, three-tier classification task (presence/absence/risk) for both UI and BD. The model also outperformed a pre-existing rule-based model for note-level annotation of UI by a significant margin. Conclusions: We demonstrate a machine learning method that categorizes clinical notes according to important PCOs by training a classifier on sentence vector representations labeled with a domain-specific dictionary, eliminating the need for manual engineering of linguistic rules or manual chart review to extract the PCOs. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared to rule-based algorithms.
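The weak-supervision recipe described above (dictionary-derived labels, weighted averages of word embeddings as sentence vectors, and a multinomial logistic classifier) can be sketched as follows. The dictionary phrases, uniform weights and toy 8-dimensional random embeddings are stand-ins, not the study's resources.

```python
# Hedged sketch of weakly supervised sentence classification for a PCO (UI).
import numpy as np
from sklearn.linear_model import LogisticRegression

DICTIONARY = {   # hypothetical domain dictionary providing weak labels
    "presence": ["leaks urine", "uses pads", "incontinence persists"],
    "absence": ["no incontinence", "fully continent", "denies leakage"],
    "risk": ["risk of incontinence", "may experience leakage", "discussed incontinence risk"],
}

rng = np.random.default_rng(0)
vocab = sorted({w for phrases in DICTIONARY.values() for p in phrases for w in p.split()})
word_vec = {w: rng.normal(size=8) for w in vocab}   # toy word embeddings
weight = {w: 1.0 for w in vocab}                    # stand-in for the weighting function

def sentence_vector(sentence: str) -> np.ndarray:
    words = [w for w in sentence.lower().split() if w in word_vec]
    if not words:
        return np.zeros(8)
    return np.mean([weight[w] * word_vec[w] for w in words], axis=0)

X, y = [], []
for label, phrases in DICTIONARY.items():           # weakly labeled training sentences
    for p in phrases:
        X.append(sentence_vector(p))
        y.append(label)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)  # multinomial logistic model
print(clf.predict([sentence_vector("patient still leaks urine at night")]))
```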
ABSTRACT
The vision of precision medicine relies on the integration of large-scale clinical, molecular and environmental datasets. Data integration may be thought of along two axes: data fusion across institutions, and data fusion across modalities. Cross-institutional data sharing that maintains semantic integrity hinges on the adoption of data standards and a push toward ontology-driven integration. The goal should be the creation of queryable data repositories spanning primary and tertiary care providers, disease registries and research organizations to produce rich longitudinal datasets. Cross-modality sharing involves the integration of multiple data streams, from structured EHR data (diagnosis codes, laboratory tests) to genomics, imaging, monitors and patient-generated data, including wearable devices. This integration presents unique technical, semantic and ethical challenges; however, recent work suggests that multi-modal clinical data can significantly improve the performance of phenotyping and prediction algorithms, powering knowledge discovery at the patient and population level.
Subject(s)
Big Data, Information Dissemination/methods, Knowledge Discovery/methods, Computational Biology, Humans, Precision Medicine/methods, Precision Medicine/statistics & numerical data, United States
ABSTRACT
BACKGROUND: The collection of patient-reported outcomes (PROs) is an emerging priority internationally, guiding clinical care, quality improvement projects and research studies. After the deployment of Patient-Reported Outcomes Measurement Information System (PROMIS) surveys in routine outpatient workflows at an academic cancer center, electronic health record data were used to evaluate survey completion rates and self-reported global health measures across 2 tumor types: breast and prostate cancer. METHODS: This study retrospectively analyzed 11,657 PROMIS surveys from patients with breast cancer and 4411 surveys from patients with prostate cancer, and it calculated survey completion rates and global physical health (GPH) and global mental health (GMH) scores between 2013 and 2018. RESULTS: A total of 36.6% of eligible patients with breast cancer and 23.7% of patients with prostate cancer completed at least 1 survey, with completion rates lower among black patients for both tumor types (P < .05). The mean T scores (calibrated to a general population mean of 50) for GPH were 48.4 ± 9 for breast cancer and 50.6 ± 9 for prostate cancer, and the GMH scores were 52.7 ± 8 and 52.1 ± 9, respectively. GPH and GMH were frequently lower among ethnic minorities, patients without private health insurance, and those with advanced disease. CONCLUSIONS: This analysis provides important baseline data on patient-reported global health in breast and prostate cancer. Demonstrating that PROs can be integrated into clinical workflows, this study shows that supportive efforts may be needed to improve PRO collection and global health endpoints in vulnerable populations.
Subject(s)
Breast Neoplasms/epidemiology, Prostatic Neoplasms/epidemiology, Academic Medical Centers, Adult, Aged, Aged, 80 and over, Breast Neoplasms/ethnology, Electronic Health Records/statistics & numerical data, Female, Health Surveys/statistics & numerical data, Humans, Male, Mental Health, Middle Aged, Patient Reported Outcome Measures, Prostatic Neoplasms/ethnology, Retrospective Studies, Self Report
ABSTRACT
BACKGROUND: Electronic health record (EHR) based research in oncology can be limited by missing data and a lack of structured data elements. Clinical research data warehouses for specific cancer types can enable the creation of more robust research cohorts. METHODS: We linked data from the Stanford University EHR with the Stanford Cancer Institute Research Database (SCIRDB) and the California Cancer Registry (CCR) to create a research data warehouse for prostate cancer. The database was supplemented with information from clinical trials, natural language processing of clinical notes and surveys on patient-reported outcomes. RESULTS: A total of 11,898 unique patients with prostate cancer were identified in the Stanford EHR, of whom 3,936 were matched to the Stanford cancer registry and 6,153 to the CCR. In all, 7,158 patients with EHR data and at least one of SCIRDB and CCR data were initially included in the warehouse. CONCLUSIONS: A disease-specific clinical research data warehouse combining multiple data sources can facilitate secondary data use and enhance observational research in oncology.
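At its core, assembling such a warehouse is a record-linkage and join problem. The sketch below shows a naive key-based version with hypothetical identifiers and columns; real linkage to registries typically relies on curated or probabilistic matching rather than a simple key join.

```python
# Sketch: join EHR patients to two registry-style sources and keep anyone with
# EHR data plus at least one registry match (hypothetical identifiers and columns).
import pandas as pd

ehr = pd.DataFrame({"mrn": [1, 2, 3, 4], "first_prostate_ca_dx": ["2012", "2015", "2016", "2018"]})
scirdb = pd.DataFrame({"mrn": [1, 3], "gleason": [7, 9]})
ccr = pd.DataFrame({"mrn": [2, 3], "summary_stage": ["Localized", "Regional"]})

cohort = (
    ehr.merge(scirdb, on="mrn", how="left", indicator="in_scirdb")
       .merge(ccr, on="mrn", how="left", indicator="in_ccr")
)
cohort["in_scirdb"] = cohort["in_scirdb"].eq("both")
cohort["in_ccr"] = cohort["in_ccr"].eq("both")
warehouse = cohort[cohort["in_scirdb"] | cohort["in_ccr"]]
print(warehouse)   # patient 4 (EHR only, no registry match) is excluded
```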