1.
J Am Med Inform Assoc ; 31(3): 640-650, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38128118

ABSTRACT

OBJECTIVE: High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity). MATERIALS AND METHODS: ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB). RESULTS: ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average. DISCUSSION: ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software. CONCLUSION: When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
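
The imputation idea described in the abstract above can be sketched in a few lines. This is only a minimal illustration, not the authors' ssROC implementation (which the abstract says is available in R): it assumes a Gaussian-kernel (Nadaraya-Watson) smoother for the imputation step, a fixed bandwidth, and hypothetical function names. Labels are imputed as smoothed probabilities from the small labeled set, and sensitivity and false-positive rate at a cutoff are then computed over the large unlabeled set.

```python
import math


def kernel_impute(s, labeled, bandwidth):
    """Nadaraya-Watson estimate of P(Y = 1 | score = s) from labeled (score, y) pairs."""
    num = 0.0
    den = 0.0
    for score, y in labeled:
        w = math.exp(-0.5 * ((s - score) / bandwidth) ** 2)
        num += w * y
        den += w
    # Fall back to 0.5 only if every kernel weight underflowed to zero.
    return num / den if den > 0 else 0.5


def semi_supervised_roc_point(labeled, unlabeled_scores, cutoff, bandwidth=0.1):
    """Return (TPR, FPR) at `cutoff`, using imputed labels on the unlabeled scores."""
    imputed = [kernel_impute(s, labeled, bandwidth) for s in unlabeled_scores]
    pos = sum(imputed)                 # expected number of cases
    neg = len(imputed) - pos           # expected number of non-cases
    tp = sum(m for s, m in zip(unlabeled_scores, imputed) if s >= cutoff)
    fp = sum(1.0 - m for s, m in zip(unlabeled_scores, imputed) if s >= cutoff)
    return tp / pos, fp / neg


def supervised_roc_point(labeled, cutoff):
    """Classical (TPR, FPR) at `cutoff` using only the labeled pairs."""
    pos = sum(y for _, y in labeled)
    neg = len(labeled) - pos
    tp = sum(y for s, y in labeled if s >= cutoff)
    fp = sum(1 - y for s, y in labeled if s >= cutoff)
    return tp / pos, fp / neg
```

Calling `semi_supervised_roc_point` over a grid of cutoffs traces out an ROC curve. The supervised version is included only for comparison on the same labeled data.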


Subject(s)
Algorithms , Software , ROC Curve , Electronic Health Records , Phenotype
3.
J Med Internet Res ; 25: e45662, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37227772

ABSTRACT

Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we also develop a reporting guideline for RWE that covers the necessary information to facilitate transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We also showcase our pipeline and provide guidance on the deployment of relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.


Subject(s)
Colonic Neoplasms , Electronic Health Records , Humans , Algorithms , Informatics , Research Design
4.
J Am Med Inform Assoc ; 30(2): 367-381, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36413056

ABSTRACT

OBJECTIVE: Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS: We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS: Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION: Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION: Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.


Subject(s)
Biomedical Research , Electronic Health Records , Machine Learning , Algorithms , Phenotype
5.
Can J Psychiatry ; 68(6): 426-435, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36453004

ABSTRACT

OBJECTIVE: To investigate how primary care access, intensity and quality of care changed among patients living with schizophrenia before and after the onset of the COVID-19 pandemic in Ontario, Canada. METHODS: This cohort study was performed using primary care electronic medical record data from the University of Toronto Practice-Based Research Network (UTOPIAN), a network of > 500 family physicians in Ontario, Canada. Data were collected during primary care visits from 2643 patients living with schizophrenia. Rates of primary care health service use (in-person and virtual visits with family physicians) and key preventive health indices indicated in antipsychotic monitoring (blood pressure readings, hemoglobin A1c, cholesterol and complete blood cell count [CBC] tests) were measured and compared in the 12 months before and after onset of the COVID-19 pandemic. RESULTS: Access to in-person care dropped with the onset of the COVID-19 pandemic. During the first year of the pandemic only 39.5% of patients with schizophrenia had at least one in-person visit compared to 81.0% the year prior. There was a corresponding increase in virtual visits such that 78.0% of patients had a primary care appointment virtually during the pandemic period. Patients prescribed injectable antipsychotics were more likely to continue having more frequent in-person appointments during the pandemic than patients prescribed only oral or no antipsychotic medications. The proportion of patients who did not have recommended tests increased from 41.0% to 72.4% for blood pressure readings, from 48.9% to 60.2% for hemoglobin A1c, from 57.0% to 67.8% for LDL cholesterol and 45.0% to 56.0% for CBC tests during the pandemic. CONCLUSIONS: There were substantial decreases in preventive care after the onset of the pandemic, although primary care access was largely maintained through virtual care. 
Addressing these deficiencies will be essential to promoting health equity and reducing the risk of poor health outcomes.


Subject(s)
Antipsychotic Agents , COVID-19 , Schizophrenia , Humans , Ontario/epidemiology , Pandemics , Schizophrenia/drug therapy , Schizophrenia/epidemiology , Cohort Studies , Glycated Hemoglobin , Antipsychotic Agents/therapeutic use , Primary Health Care
6.
J R Stat Soc Series B Stat Methodol ; 84(4): 1353-1391, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36275859

ABSTRACT

In many contemporary applications, large amounts of unlabeled data are readily available while labeled examples are limited. There has been substantial interest in semi-supervised learning (SSL) which aims to leverage unlabeled data to improve estimation or prediction. However, current SSL literature focuses primarily on settings where labeled data is selected uniformly at random from the population of interest. Stratified sampling, while posing additional analytical challenges, is highly applicable to many real world problems. Moreover, no SSL methods currently exist for estimating the prediction performance of a fitted model when the labeled data is not selected uniformly at random. In this paper, we propose a two-step SSL procedure for evaluating a prediction rule derived from a working binary regression model based on the Brier score and overall misclassification rate under stratified sampling. In step I, we impute the missing labels via weighted regression with nonlinear basis functions to account for stratified sampling and to improve efficiency. In step II, we augment the initial imputations to ensure the consistency of the resulting estimators regardless of the specification of the prediction model or the imputation model. The final estimator is then obtained with the augmented imputations. We provide asymptotic theory and numerical studies illustrating that our proposals outperform their supervised counterparts in terms of efficiency gain. Our methods are motivated by electronic health record (EHR) research and validated with a real data analysis of an EHR-based study of diabetic neuropathy.
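
The two accuracy measures named in the abstract above, and the reason imputation can stand in for missing labels, can be written out as follows. The notation is assumed for illustration and is not taken from the paper; in particular, the stratified-sampling weights and the augmentation in step II are omitted.

```latex
% Assumed notation: Y in {0,1} is the label, \hat\pi(X) the predicted
% probability from the working model, c a classification threshold.
D_{\mathrm{Brier}} = E\big[\{Y - \hat\pi(X)\}^2\big], \qquad
D_{\mathrm{OMR}} = E\big[\,\lvert Y - I\{\hat\pi(X) > c\}\rvert\,\big].

% Because Y is binary, Y^2 = Y, so the Brier score is linear in Y:
D_{\mathrm{Brier}} = E\big[\,Y\{1 - 2\hat\pi(X)\} + \hat\pi(X)^2\,\big].

% Replacing Y by an imputation \hat m(X) \approx E[Y \mid X] lets the
% average run over all N records, labeled or not:
\hat D_{\mathrm{Brier}}^{\mathrm{imp}}
  = \frac{1}{N}\sum_{i=1}^{N}
    \Big[\hat m(X_i)\{1 - 2\hat\pi(X_i)\} + \hat\pi(X_i)^2\Big].
```

The misclassification rate is linear in Y in the same way, since |Y - I| = Y + I - 2YI for binary Y and I, so the same substitution applies.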

7.
Ann Fam Med ; 20(20 Suppl 1)2022 04 01.
Article in English | MEDLINE | ID: mdl-35947415

ABSTRACT

Context: Many people have experienced poorer mental health and increased distress during the COVID-19 pandemic. It is unclear to what extent this has resulted in increases in the number of patients presenting with anxiety and/or depression in primary care. Objective: To determine whether more patients are visiting their family doctor for anxiety/depression during the COVID-19 pandemic compared to before the pandemic, and to determine whether these effects varied based on patient demographic characteristics. Study Design: A retrospective cohort study of family medicine patients from 2017-2020. Data Source: Electronic medical records (EMRs) from the University of Toronto Practice-Based Research Network (UTOPIAN) Data Safe Haven. The majority of physicians in the UTOPIAN EMR database practice in the Greater Toronto Area, a high-COVID region of Canada. Population Studied: Active family practice patients aged 10 and older with at least 1 year of EMR data. Outcome Measures: Visits for anxiety and/or depression; prescriptions for antidepressant medications. Results: Changes in visits for anxiety and depression during the COVID-19 pandemic were consistent with an increased demand for mental healthcare and an increase in the number of individuals with anxiety and depression. Increases in visits for anxiety and depression were larger for younger patients, women, and later in the pandemic. Among younger patients, prescriptions for antidepressants were substantially reduced during the first few months of the pandemic (April-May 2020) but incidence rates increased later in 2020. Increases in visit volume during the pandemic were consistent with more frequent visits for anxiety/depression and more new patients presenting with anxiety or depression. Conclusion: The COVID-19 pandemic has resulted in an increased demand for mental health services from family physicians. 
Increases in anxiety and depression were especially pronounced among younger female patients and increased throughout the pandemic. Our findings highlight the need for continued efforts to support and address mental health concerns in primary care.


Subject(s)
COVID-19 , Antidepressive Agents , Anxiety/epidemiology , Anxiety/etiology , Anxiety/psychology , COVID-19/epidemiology , Cohort Studies , Depression/epidemiology , Depression/psychology , Female , Humans , Ontario/epidemiology , Pandemics , Primary Health Care , Retrospective Studies , SARS-CoV-2
8.
BMJ Open ; 12(5): e059130, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35534063

ABSTRACT

INTRODUCTION: Through the INTernational ConsoRtium of Primary Care BIg Data Researchers (INTRePID), we compared the pandemic impact on the volume of primary care visits and uptake of virtual care in Australia, Canada, China, Norway, Singapore, South Korea, Sweden, the UK and the USA. METHODS: Visit definitions were agreed on centrally, implemented locally across the various settings in INTRePID countries, and weekly visit counts were shared centrally for analysis. We evaluated the weekly rate of primary care physician visits during 2019 and 2020. Rate ratios (RRs) of total weekly visit volume and the proportion of weekly visits that were virtual in the pandemic period in 2020 compared with the same prepandemic period in 2019 were calculated. RESULTS: In 2019 and 2020, there were 80 889 386 primary care physician visits across INTRePID. During the pandemic, average weekly visit volume dropped in China, Singapore, South Korea, and the USA but was stable overall in Australia (RR 0.98 (95% CI 0.92 to 1.05, p=0.59)), Canada (RR 0.96 (95% CI 0.89 to 1.03, p=0.24)), Norway (RR 1.01 (95% CI 0.88 to 1.17, p=0.85)), Sweden (RR 0.91 (95% CI 0.79 to 1.06, p=0.22)) and the UK (RR 0.86 (95% CI 0.72 to 1.03, p=0.11)). In countries that had negligible virtual care prepandemic, the proportion of visits that were virtual was highest in Canada (77.0%) and Australia (41.8%). In Norway (RR 8.23 (95% CI 5.30 to 12.78, p<0.001)), the UK (RR 2.36 (95% CI 2.24 to 2.50, p<0.001)) and Sweden (RR 1.33 (95% CI 1.17 to 1.50, p<0.001)), where virtual visits existed prepandemic, the proportion increased significantly during the pandemic. CONCLUSIONS: The drop in primary care in-person visits during the pandemic was a global phenomenon across INTRePID countries. In several countries, primary care shifted to virtual visits mitigating the drop in in-person visits.


Subject(s)
COVID-19 , Telemedicine , Big Data , COVID-19/epidemiology , Humans , Pandemics , Primary Health Care , SARS-CoV-2
9.
IEEE J Biomed Health Inform ; 26(8): 4197-4206, 2022 08.
Article in English | MEDLINE | ID: mdl-35588417

ABSTRACT

As different scientific disciplines begin to converge on machine learning for causal inference, we demonstrate the application of machine learning algorithms in the context of longitudinal causal estimation using electronic health records. Our aim is to formulate a marginal structural model for estimating diabetes care provisions in which we envisioned hypothetical (i.e. counterfactual) dynamic treatment regimes using a combination of drug therapies to manage diabetes: metformin, sulfonylurea and SGLT-2i. The binary outcome of diabetes care provisions was defined using a composite measure of chronic disease prevention and screening elements [27] including (i) primary care visit, (ii) blood pressure, (iii) weight, (iv) hemoglobin A1c, (v) lipid, (vi) ACR, (vii) eGFR and (viii) statin medication. We used several statistical learning algorithms to describe causal relationships between the prescription of three common classes of diabetes medications and quality of diabetes care using the electronic health records contained in the National Diabetes Repository. In particular, we generated an ensemble of statistical learning algorithms using the SuperLearner framework based on the following base learners: (i) least absolute shrinkage and selection operator, (ii) ridge regression, (iii) elastic net, (iv) random forest, (v) gradient boosting machines, and (vi) neural network. Each statistical learning algorithm was fitted using the pseudo-population generated from the marginalization of the time-dependent confounding process. Covariate balance was assessed using the longitudinal (i.e. cumulative-time product) stabilized weights with calibrated restrictions. Our results indicated that the treatment drop-in cohorts (with respect to metformin, sulfonylurea and SGLT-2i) may have improved diabetes care provisions in relation to the treatment-naïve (i.e. no treatment) cohort. 
As a clinical utility, we hope that this article will facilitate discussions around the prevention of adverse chronic outcomes associated with type II diabetes through the improvement of diabetes care provisions in primary care.
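
For readers unfamiliar with the "cumulative-time product" stabilized weights mentioned in the abstract above, the standard textbook form is sketched here. The notation is assumed for illustration and the abstract does not state the exact specification used, so the conditioning sets and the outcome model below are generic placeholders.

```latex
% Assumed notation: A_{ik} is the treatment of patient i at visit k,
% \bar A_{i,k-1} the treatment history before visit k, V_i the baseline
% covariates, and \bar L_{ik} the time-varying confounder history.
SW_i(t) = \prod_{k=0}^{t}
  \frac{f\big(A_{ik} \mid \bar A_{i,k-1},\, V_i\big)}
       {f\big(A_{ik} \mid \bar A_{i,k-1},\, \bar L_{ik}\big)}

% A marginal structural model for a binary outcome is then fitted in the
% weighted pseudo-population, for example
\operatorname{logit} P\big(Y^{\bar a} = 1 \mid V\big)
  = \beta_0 + \beta_1\, g(\bar a) + \beta_2^{\top} V,
% where g(\bar a) is a chosen summary of the treatment history.
```

Each factor is the ratio of a treatment model without the time-varying confounders to one with them, so the product over visits reweights patients to mimic a population in which treatment at each visit does not depend on those confounders.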


Subject(s)
Diabetes Mellitus, Type 2 , Metformin , Cohort Studies , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/drug therapy , Glycated Hemoglobin/analysis , Humans , Metformin/therapeutic use , Models, Structural
10.
JMIR Ment Health ; 8(8): e27589, 2021 Aug 10.
Article in English | MEDLINE | ID: mdl-34383685

ABSTRACT

BACKGROUND: Although effective mental health treatments exist, the ability to match individuals to optimal treatments is poor, and timely assessment of response is difficult. One reason for these challenges is the lack of objective measurement of psychiatric symptoms. Sensors and active tasks recorded by smartphones provide a low-burden, low-cost, and scalable way to capture real-world data from patients that could augment clinical decision-making and move the field of mental health closer to measurement-based care. OBJECTIVE: This study tests the feasibility of a fully remote study on individuals with self-reported depression using an Android-based smartphone app to collect subjective and objective measures associated with depression severity. The goals of this pilot study are to develop an engaging user interface for high task adherence through user-centered design; test the quality of collected data from passive sensors; start building clinically relevant behavioral measures (features) from passive sensors and active inputs; and preliminarily explore connections between these features and depression severity. METHODS: A total of 600 participants were asked to download the study app to join this fully remote, observational 12-week study. The app passively collected 20 sensor data streams (eg, ambient audio level, location, and inertial measurement units), and participants were asked to complete daily survey tasks, weekly voice diaries, and the clinically validated Patient Health Questionnaire (PHQ-9) self-survey. Pairwise correlations between derived behavioral features (eg, weekly minutes spent at home) and PHQ-9 were computed. Using these behavioral features, we also constructed an elastic net penalized multivariate logistic regression model predicting depressed versus nondepressed PHQ-9 scores (ie, dichotomized PHQ-9). RESULTS: A total of 415 individuals logged into the app. 
Over the course of the 12-week study, these participants completed 83.35% (4151/4980) of the PHQ-9s. Applying data sufficiency rules for minimally necessary daily and weekly data resulted in 3779 participant-weeks of data across 384 participants. Using a subset of 34 behavioral features, we found that 11 features showed a significant (P<.001 Benjamini-Hochberg adjusted) Spearman correlation with weekly PHQ-9, including voice diary-derived word sentiment and ambient audio levels. Restricting the data to those cases in which all 34 behavioral features were present, we had available 1013 participant-weeks from 186 participants. The logistic regression model predicting depression status resulted in a 10-fold cross-validated mean area under the curve of 0.656 (SD 0.079). CONCLUSIONS: This study finds a strong proof of concept for the use of a smartphone-based assessment of depression outcomes. Behavioral features derived from passive sensors and active tasks show promising correlations with a validated clinical measure of depression (PHQ-9). Future work is needed to increase scale that may permit the construction of more complex (eg, nonlinear) predictive models and better handle data missingness.
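
The Benjamini-Hochberg adjustment applied to the correlation P values above can be sketched as follows. This is a generic standard-library illustration of the procedure, not the study's analysis code, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A feature would then be flagged when its adjusted value falls below the chosen threshold (the abstract reports P<.001 after adjustment).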

11.
PLoS One ; 16(8): e0254798, 2021.
Article in English | MEDLINE | ID: mdl-34383766

ABSTRACT

As society has moved past the initial phase of the COVID-19 crisis that relied on broad-spectrum shutdowns as a stopgap method, industries and institutions have faced the daunting question of how to return to a stabilized state of activities and more fully reopen the economy. A core problem is how to return people to their workplaces and educational institutions in a manner that is safe, ethical, grounded in science, and takes into account the unique factors and needs of each organization and community. In this paper, we introduce an epidemiological model (the "Community-Workplace" model) that accounts for SARS-CoV-2 transmission within the workplace, within the surrounding community, and between them. We use this multi-group deterministic compartmental model to consider various testing strategies that, together with symptom screening, exposure tracking, and nonpharmaceutical interventions (NPI) such as mask wearing and physical distancing, aim to reduce disease spread in the workplace. Our framework is designed to be adaptable to a variety of specific workplace environments to support planning efforts as reopenings continue. Using this model, we consider a number of case studies, including an office workplace, a factory floor, and a university campus. Analysis of these cases illustrates that continuous testing can help a workplace avoid an outbreak by reducing undetected infectiousness even in high-contact environments. We find that a university setting, where individuals spend more time on campus and have a higher contact load, requires more testing to remain safe, compared to a factory or office setting. Under the modeling assumptions, we find that maintaining a prevalence below 3% can be achieved in an office setting by testing its workforce every two weeks, whereas achieving this same goal for a university could require as much as fourfold more testing (i.e., testing the entire campus population twice a week). 
Our model also simulates the dynamics of reduced spread that result from the introduction of mitigation measures when test results reveal the early stages of a workplace outbreak. We use this to show that a vigilant university that has the ability to quickly react to outbreaks can be justified in implementing testing at the same rate as a lower-risk office workplace. Finally, we quantify the devastating impact that an outbreak in a small-town college could have on the surrounding community, which supports the notion that communities can be better protected by supporting their local places of business in preventing onsite spread of disease.
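
A heavily simplified sketch of a multi-group deterministic compartmental model of the kind described above is given below. It is not the paper's "Community-Workplace" model: it assumes only two groups (workplace and community), susceptible-infectious-removed compartments, Euler time stepping, and testing represented as an extra removal rate for infectious workers. All parameter names and values are hypothetical.

```python
def simulate_two_group(days, beta, gamma, test_rate, s0, i0, dt=0.1):
    """Euler simulation of a two-group SIR model.

    Group 0 is the workplace, group 1 the surrounding community.
    beta[g][h] is the transmission rate from infectious people in group h
    to susceptible people in group g. States are fractions of each group.
    """
    s = list(s0)
    i = list(i0)
    r = [1.0 - s0[g] - i0[g] for g in range(2)]
    peak = list(i0)
    # Assumption: testing isolates infectious workers at an extra rate.
    removal = [gamma + test_rate, gamma]
    for _ in range(int(round(days / dt))):
        force = [sum(beta[g][h] * i[h] for h in range(2)) for g in range(2)]
        new_inf = [force[g] * s[g] * dt for g in range(2)]
        new_rem = [removal[g] * i[g] * dt for g in range(2)]
        for g in range(2):
            s[g] -= new_inf[g]
            i[g] += new_inf[g] - new_rem[g]
            r[g] += new_rem[g]
            peak[g] = max(peak[g], i[g])
    return {"s": s, "i": i, "r": r, "peak_infectious": peak}
```

Running the function with `test_rate=0.0` and again with a positive value gives a rough feel for how faster detection lowers the peak infectious fraction in the workplace under these assumptions.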


Subject(s)
COVID-19/prevention & control , Contact Tracing/methods , Disease Outbreaks/prevention & control , Physical Distancing , Universities , Workplace , Humans
12.
PLoS One ; 16(8): e0255992, 2021.
Article in English | MEDLINE | ID: mdl-34383844

ABSTRACT

PURPOSE: We aimed to determine the degree to which reasons for primary care visits changed during the COVID-19 pandemic. METHODS: We used data from the University of Toronto Practice Based Research Network (UTOPIAN) to compare the most common reasons for primary care visits before and after the onset of the COVID-19 pandemic, focusing on the number of visits and the number of patients seen for each of the 25 most common diagnostic codes. The proportion of visits involving virtual care was assessed as a secondary outcome. RESULTS: UTOPIAN family physicians (N = 379) conducted 702,093 visits, involving 264,942 patients between March 14 and December 31, 2019 (pre-pandemic period), and 667,612 visits, involving 218,335 patients between March 14 and December 31, 2020 (pandemic period). Anxiety was the most common reason for visit, accounting for 9.2% of the total visit volume during the pandemic compared to 6.5% the year before. Diabetes and hypertension remained among the top 5 reasons for visit during the pandemic, but there were 23.7% and 26.2% fewer visits and 19.5% and 28.8% fewer individual patients accessing care for diabetes and hypertension, respectively. Preventive care visits were substantially reduced, with 89.0% fewer periodic health exams and 16.2% fewer well-baby visits. During the pandemic, virtual care became the dominant care format (77.5% virtual visits). Visits for anxiety and depression were the most common reasons for a virtual visit (90.6% virtual visits). CONCLUSION: The decrease in primary care visit volumes during the COVID-19 pandemic varied based on the reason for the visit, with increases in visits for anxiety and decreases for preventive care and visits for chronic diseases. Implications of increased demands for mental health services and gaps in preventive care and chronic disease management may require focused efforts in primary care.


Subject(s)
COVID-19 , Office Visits , Primary Health Care , Adult , Aged , Aged, 80 and over , Canada , Cross-Sectional Studies , Female , Humans , Male , Middle Aged , Pandemics
13.
CMAJ Open ; 9(2): E651-E658, 2021.
Article in English | MEDLINE | ID: mdl-34131028

ABSTRACT

BACKGROUND: It has been suggested that the COVID-19 pandemic has worsened socioeconomic disparities in access to primary care. Given these concerns, we investigated whether the pandemic affected visits to family physicians differently across sociodemographic groups. METHODS: We conducted a retrospective cohort study using electronic medical records from family physician practices within the University of Toronto Practice-Based Research Network. We evaluated primary care visits for a fixed cohort of patients who were active within the database as of Jan. 1, 2019, to estimate the number of patients who visited their family physician (visitor rate) and the number of distinct visits (visit volume) between Jan. 1, 2019, and June 30, 2020. We compared trends in visitor rate and visit volume during the pandemic (Mar. 14 to June 30, 2020) with the same period in the previous year (Mar. 14 to June 30, 2019) across sociodemographic factors, including age, sex, neighbourhood income, material deprivation and ethnic concentration. RESULTS: We included 365 family physicians and 372 272 patients. Compared with the previous year, visitor rates during the pandemic period dropped by 34.5%, from 357 visitors per 1000 people to 292 visitors per 1000 people. Declines in visit volume during the pandemic were less pronounced (21.8% fewer visits), as the mean number of visits per patient increased during the pandemic (from 1.64 to 1.96). The declines in visitor rate and visit volume varied based on patient age and sex, but not socioeconomic status. INTERPRETATION: Although the number of visits to family physicians dropped substantially during the first few weeks of the COVID-19 pandemic in Ontario, patients from communities with low socioeconomic status did not appear to be disproportionately affected. In this primary care setting, the pandemic appears not to have worsened socioeconomic disparities in access to care.


Subject(s)
Appointments and Schedules , Family Practice/trends , Healthcare Disparities/statistics & numerical data , Primary Health Care/trends , Adolescent , Adult , Age Factors , Aged , COVID-19 , Cohort Studies , Female , Health Services Accessibility , Humans , Male , Middle Aged , Ontario , Retrospective Studies , SARS-CoV-2 , Sex Factors , Social Class , Young Adult
14.
Stat Med ; 39(22): 3024-3025, 2020 09 30.
Article in English | MEDLINE | ID: mdl-32914464
15.
Stat Med ; 39(3): 252-264, 2020 02 10.
Article in English | MEDLINE | ID: mdl-31820458

ABSTRACT

Meta-analysis allows for the aggregation of results from multiple studies to improve statistical inference for the parameter of interest. In recent years, random-effect meta-analysis has been employed to synthesize estimates of incidence rates of adverse events across heterogeneous clinical trials to evaluate treatment safety. However, the validity of existing approaches relies on asymptotic approximation as the number of studies becomes large. In practice, a limited number of trials are typically available for analysis. Moreover, adverse events are typically rare; thus, study-specific incidence rate estimates may be unstable or undefined. In this paper, we present a method for construction of an exact confidence interval for the location parameter of the beta-binomial model through inversion of exact tests. The coverage level of the proposed confidence interval is guaranteed to achieve at least the nominal level, regardless of the number of studies or the within-study sample size, making it particularly applicable to the study of rare-event data.
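
The general idea of building an exact interval by inverting exact tests can be illustrated in a much simpler setting than the paper's. The sketch below assumes a single binomial sample rather than the beta-binomial meta-analysis model, and computes the classical Clopper-Pearson interval by bisection: the interval contains every event probability that neither one-sided exact test rejects at level alpha/2. Function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(0, k + 1))


def _solve_increasing(f, target, tol=1e-12):
    """Find p in (0, 1) with f(p) = target, for f increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x events in n trials via test inversion."""
    if x == 0:
        lower = 0.0
    else:
        # Smallest p at which P(X >= x | p) reaches alpha / 2.
        lower = _solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    if x == n:
        upper = 1.0
    else:
        # Largest p at which P(X <= x | p) is still alpha / 2.
        upper = _solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper
```

The interval stays well defined when no events are observed (x = 0), which is the rare-event situation that motivates exact methods.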


Subject(s)
Confidence Intervals , Meta-Analysis as Topic , Computer Simulation , Humans
16.
J Am Med Inform Assoc ; 26(11): 1255-1262, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31613361

ABSTRACT

OBJECTIVE: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). MATERIALS AND METHODS: We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort phenome-wide association studies (PheWAS) for 2 single nucleotide polymorphisms with known associations. RESULTS: The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC 0.943 for MAP, AUC 0.941 for manual curation). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes. CONCLUSION: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.
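
To make the "latent mixture model" step above more concrete, here is a minimal sketch of one such model: a two-component Poisson mixture fitted by EM to a single count feature (for example, a patient's number of relevant ICD codes). It is an assumed simplification, not the MAP algorithm itself, which the abstract describes as an ensemble over ICD and NLP counts together with health care utilization. The function names and the initialization are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(counts, iters=200):
    """EM for a two-component Poisson mixture.

    Returns (weight, lam, post), where weight is the estimated share of the
    high-count component, lam = [low_rate, high_rate], and post[i] is the
    posterior probability from the final E-step that counts[i] belongs to
    the high-count component.
    """
    n = len(counts)
    mean = sum(counts) / n
    lam = [max(0.5 * mean, 1e-3), 1.5 * mean + 1.0]  # low and high start values
    weight = 0.5
    post = [0.5] * n
    for _ in range(iters):
        # E-step: posterior probability of the high-count component.
        post = []
        for k in counts:
            a = math.log(1 - weight) + poisson_logpmf(k, lam[0])
            b = math.log(weight) + poisson_logpmf(k, lam[1])
            mx = max(a, b)
            ea, eb = math.exp(a - mx), math.exp(b - mx)
            post.append(eb / (ea + eb))
        # M-step: update the mixing weight and both Poisson rates.
        total = sum(post)
        weight = min(max(total / n, 1e-6), 1 - 1e-6)
        lam[1] = max(sum(p * k for p, k in zip(post, counts)) / max(total, 1e-12), 1e-6)
        lam[0] = max(sum((1 - p) * k for p, k in zip(post, counts)) / max(n - total, 1e-12), 1e-6)
    return weight, lam, post
```

The posterior probabilities play the role of a predicted probability of phenotype for each patient. Choosing a classification threshold on them is a separate step not shown here.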


Subject(s)
Algorithms , Electronic Health Records , International Classification of Diseases , Natural Language Processing , Phenotype , Polymorphism, Single Nucleotide , Area Under Curve , Humans , Unified Medical Language System
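The abstract above describes fitting latent mixture models to count features to obtain a per-patient probability and a yes/no classification. A minimal sketch of that idea follows, under loud simplifications: a single feature (a log-transformed ICD count), one two-component Gaussian mixture fitted by EM instead of an ensemble, no health care utilization adjustment, and a fixed 0.5 cutoff. The function name `mixture_posteriors` and the toy counts are hypothetical, and this is not the MAP implementation.

```python
import math

def mixture_posteriors(x, iters=200):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns, for each x[i], the posterior probability of the component
    with the larger mean (read here as "phenotype present").
    """
    lo, hi = min(x), max(x)
    mu = [lo, hi]                           # start the two means at the extremes
    sd = [max((hi - lo) / 4.0, 1e-3)] * 2   # common starting spread
    w = [0.5, 0.5]                          # mixing weights
    post = [0.5] * len(x)
    for _ in range(iters):
        # E-step: responsibility of the high-mean component for each point
        post = []
        for xi in x:
            d = [w[k] / sd[k] * math.exp(-0.5 * ((xi - mu[k]) / sd[k]) ** 2)
                 for k in range(2)]
            total = d[0] + d[1]
            post.append(d[1] / total if total > 0 else 0.5)
        # M-step: weighted means, standard deviations, and mixing weights
        n1 = sum(post)
        n0 = len(x) - n1
        if min(n0, n1) < 1e-9:
            break
        mu = [sum((1 - p) * xi for p, xi in zip(post, x)) / n0,
              sum(p * xi for p, xi in zip(post, x)) / n1]
        var0 = sum((1 - p) * (xi - mu[0]) ** 2 for p, xi in zip(post, x)) / n0
        var1 = sum(p * (xi - mu[1]) ** 2 for p, xi in zip(post, x)) / n1
        sd = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
        w = [n0 / len(x), n1 / len(x)]
    return post

# Hypothetical ICD code counts for 14 patients (illustrative only).
icd_counts = [0, 0, 1, 0, 2, 1, 0, 15, 22, 18, 30, 0, 1, 25]
features = [math.log1p(c) for c in icd_counts]
probs = mixture_posteriors(features)
has_phenotype = [p > 0.5 for p in probs]   # 0.5 cutoff is an assumption
```

In this toy data the patients with many codes fall into the high-mean component. A real application would need a data-driven threshold and several features, as the abstract indicates.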
17.
Biometrics ; 75(1): 268-277, 2019 03.
Article in English | MEDLINE | ID: mdl-30353541

ABSTRACT

The use of Electronic Health Records (EHR) for translational research can be challenging due to difficulty in extracting accurate disease phenotype data. Historically, EHR algorithms for annotating phenotypes have been either rule-based or trained with billing codes and gold standard labels curated via labor-intensive medical chart review. These simplistic algorithms tend to have unpredictable portability across institutions and low accuracy for many disease phenotypes due to imprecise billing codes. Recently, more sophisticated machine learning algorithms have been developed to improve the robustness and accuracy of EHR phenotyping algorithms. These algorithms are typically trained via supervised learning, relating gold standard labels to a wide range of candidate features including billing codes, procedure codes, medication prescriptions and relevant clinical concepts extracted from narrative notes via Natural Language Processing (NLP). However, due to the time intensiveness of gold standard labeling, the size of the training set is often insufficient to build a generalizable algorithm with the large number of candidate features extracted from EHR. To reduce the number of candidate predictors and in turn improve model performance, we present an automated feature selection method based entirely on unlabeled observations. The proposed method generates a comprehensive surrogate for the underlying phenotype with an unsupervised clustering of disease status based on several highly predictive features such as diagnosis codes and mentions of the disease in text fields available in the entire set of EHR data. A sparse regression model is then built with the estimated outcomes and remaining covariates to identify those features most informative of the phenotype of interest. Relying on the results of Li and Duan (1989), we demonstrate that variable selection for the underlying phenotype model can be achieved by fitting the surrogate-based model.
We explore the performance of our methods in numerical simulations and present the results of a prediction model for Rheumatoid Arthritis (RA) built on a large EHR data mart from the Partners Health System consisting of billing codes and NLP terms. Empirical results suggest that our procedure reduces the number of gold-standard labels necessary for phenotyping, thereby harnessing the automated power of EHR data and improving efficiency.


Subject(s)
Algorithms , Electronic Health Records/statistics & numerical data , Machine Learning , Arthritis, Rheumatoid/diagnosis , Cluster Analysis , Humans , Logistic Models , Natural Language Processing , Phenotype , Supervised Machine Learning , Translational Research, Biomedical
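The core idea in the abstract above, regressing a label-free surrogate on the remaining covariates with a sparsity penalty to pick informative features, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the unsupervised clustering step is replaced by a single simulated continuous surrogate, the sparse regression is a plain lasso fitted by coordinate descent, and the simulated data, the penalty `lam=0.25`, and all function names are hypothetical.

```python
import random

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; the lasso proximal step."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(columns, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes y is centered and every column is standardized.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the partial residual
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta

# Simulated data (hypothetical): the true disease status is never used for fitting.
rng = random.Random(0)
n = 400
disease = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
surrogate = [2.0 * d + rng.gauss(0, 0.5) for d in disease]   # e.g. a scaled code count
informative = [d + rng.gauss(0, 0.5) for d in disease]       # feature tied to disease
noise_a = [rng.gauss(0, 1) for _ in range(n)]                # unrelated features
noise_b = [rng.gauss(0, 1) for _ in range(n)]

columns = [standardize(c) for c in (informative, noise_a, noise_b)]
beta = lasso(columns, standardize(surrogate), lam=0.25)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Here `selected` holds the indices of features with nonzero coefficients. The point of the sketch is only that a feature related to the latent phenotype remains correlated with the surrogate and survives the penalty, while unrelated features are shrunk to zero.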
18.
AJR Am J Roentgenol ; 210(2): 431-437, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29261347

ABSTRACT

OBJECTIVE: The purpose of this study was to describe CT angiography (CTA) findings of lumen contrast heterogeneity and intraluminal thrombus volume and to evaluate their relationship with rapid aneurysm growth in patients with abdominal aortic aneurysms (AAA) between 3 and 5 cm. MATERIALS AND METHODS: This institutional review board-approved and HIPAA-compliant single-center retrospective study included CTA studies obtained between January 2004 and December 2014 in 140 patients with AAA (101 men, 39 women; mean age ± SD, 70 ± 9 years old; age range, 22-87 years old). Standardized measurements for aneurysm intraluminal thrombus volume and a relatively new metric termed "lumen contrast heterogeneity" were obtained from the CTA images. AAA growth rate data were acquired from all subsequent cross-sectional studies. The association between the imaging findings and rapid aneurysm growth (> 0.4 cm/y) was evaluated using multivariate logistic regression. Patient comorbidities and medications were added to the regression model to assess for further associations with rapid growth rate. RESULTS: Using a baseline logistic regression model, lumen contrast heterogeneity (odds ratio [OR], 1.16; 95% CI, 1.05-1.32), intraluminal thrombus volume (OR, 2.15; 95% CI, 1.26-3.86), and maximum AAA diameter (OR, 1.69; 95% CI, 1.03-2.84) were independently associated with increased likelihood of rapid aneurysm growth. None of the patient comorbidities or medications were significantly associated with the outcome when added to the baseline model. CONCLUSION: Both intraluminal thrombus and lumen contrast heterogeneity are seen on AAA CTA studies and can be quantified; both of these metrics are independently associated with rapid growth rate and should be recognized by radiologists evaluating patients with AAA during surveillance.


Subject(s)
Aortic Aneurysm, Abdominal/diagnostic imaging , Aortic Aneurysm, Abdominal/pathology , Computed Tomography Angiography/methods , Adult , Aged , Aged, 80 and over , Comorbidity , Cross-Sectional Studies , Disease Progression , Female , Humans , Male , Middle Aged , Retrospective Studies
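The odds ratios and 95% confidence intervals reported in the abstract above come from logistic regression coefficients. As a hedged illustration of the standard relationship (OR = exp(beta), with a Wald interval exp(beta ± 1.96·SE)), the sketch below converts in both directions. The helper names are hypothetical, and the assumption that the published intervals are Wald intervals is not confirmed by the source.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_reported(odds_ratio, lower, upper, z=1.96):
    """Back out the coefficient and its standard error from a reported OR and CI.

    Assumes the interval is symmetric on the log scale (a Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Intraluminal thrombus volume as reported: OR 2.15 (95% CI, 1.26-3.86).
beta, se = coef_from_reported(2.15, 1.26, 3.86)
rebuilt = odds_ratio_ci(beta, se)
```

The rebuilt limits only approximate the published ones, because the reported interval is not exactly symmetric around the odds ratio on the log scale. Rounding or a different interval method may account for that.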
19.
J Am Med Inform Assoc ; 25(1): 54-60, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29126253

ABSTRACT

Objective: Electronic health record (EHR)-based phenotyping infers whether a patient has a disease based on the information in his or her EHR. A human-annotated training set with gold-standard disease status labels is usually required to build an algorithm for phenotyping based on a set of predictive features. The time intensiveness of annotation and feature curation severely limits the ability to achieve high-throughput phenotyping. While previous studies have successfully automated feature curation, annotation remains a major bottleneck. In this paper, we present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training. Methods: The most predictive features, such as the number of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes or mentions of the target phenotype, are normalized to resemble a normal mixture distribution with high area under the receiver operating characteristic curve (AUC) for prediction. The transformed features are then denoised and combined into a score for accurate disease classification. Results: We validated the accuracy of PheNorm with 4 phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUCs of the PheNorm score reached 0.90, 0.94, 0.95, and 0.94 for the 4 phenotypes, respectively, which were comparable to the accuracy of supervised algorithms trained with sample sizes of 100-300, with no statistically significant difference. Conclusion: The accuracy of the PheNorm algorithms is on par with algorithms trained with annotated samples. PheNorm fully automates the generation of accurate phenotyping algorithms and demonstrates the capacity for EHR-driven annotations to scale to the next level: phenotypic big data.


Subject(s)
Algorithms , Big Data , Electronic Health Records , Phenotype , Area Under Curve , Datasets as Topic , Humans , Intercellular Signaling Peptides and Proteins , International Classification of Diseases , Peptides , Precision Medicine
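Several abstracts in this list, including the one above, report accuracy as AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the fraction of (case, non-case) pairs in which the case receives the higher score, counting ties as half. The scores and labels are made up for illustration and are not taken from any of the studies.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # case ranked above non-case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical phenotyping scores and chart-review labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)   # 8 of the 9 case/non-case pairs are ordered correctly
```

The pairwise form is quadratic in the number of patients. It is used here for clarity; a rank-based computation would be preferable for large cohorts.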
20.
J Pediatr ; 188: 224-231.e5, 2017 09.
Article in English | MEDLINE | ID: mdl-28625502

ABSTRACT

OBJECTIVES: To compare registry and electronic health record (EHR) data mining approaches for cohort ascertainment in patients with pediatric pulmonary hypertension (PH) in an effort to overcome some of the limitations of registry enrollment alone in identifying patients with particular disease phenotypes. STUDY DESIGN: This study was a single-center retrospective analysis of EHR and registry data at Boston Children's Hospital. The local Informatics for Integrating Biology and the Bedside (i2b2) data warehouse was queried for billing codes, prescriptions, and narrative data related to pediatric PH. Computable phenotype algorithms were developed by fitting penalized logistic regression models to a physician-annotated training set. Algorithms were applied to a candidate patient cohort, and performance was evaluated using a separate set of 136 records and 179 registry patients. We compared clinical and demographic characteristics of patients identified by computable phenotype and the registry. RESULTS: The computable phenotype had an area under the receiver operating characteristic curve of 90% (95% CI, 85%-95%), a positive predictive value of 85% (95% CI, 77%-93%), and identified 413 patients (an additional 231%) with pediatric PH who were not enrolled in the registry. Patients identified by the computable phenotype were clinically distinct from registry patients, with a greater prevalence of diagnoses related to perinatal distress and left heart disease. CONCLUSIONS: Mining of EHRs using computable phenotypes identified a large cohort of patients not recruited using a classic registry. Fusion of EHR and registry data can improve cohort ascertainment for the study of rare diseases. TRIAL REGISTRATION: ClinicalTrials.gov: NCT02249923.


Subject(s)
Data Mining , Electronic Health Records , Hypertension, Pulmonary/diagnosis , Registries , Algorithms , Child , Humans , Hypertension, Pulmonary/epidemiology , Phenotype , Predictive Value of Tests , Retrospective Studies , Sensitivity and Specificity , United States/epidemiology