1.
NPJ Parkinsons Dis ; 10(1): 58, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38480700

ABSTRACT

Characterization of Parkinson's disease (PD) progression using real-world evidence could guide clinical trial design and identify subpopulations. Efforts to curate research populations, the increasing availability of real-world data, and advances in natural language processing, particularly large language models, allow for a more granular comparison of populations than previously possible. This study includes two research populations and two real-world data-derived (RWD) populations. The research populations are the Harvard Biomarkers Study (HBS, N = 935), a longitudinal biomarkers cohort study with in-person structured study visits; and Fox Insights (N = 36,660), an online self-survey-based research study of the Michael J. Fox Foundation. Real-world cohorts are the Optum Integrated Claims-electronic health records (N = 157,475), representing wide-scale linked medical and claims data, and de-identified data from Mass General Brigham (MGB, N = 22,949), an academic hospital system. Structured, de-identified electronic health records data at MGB are supplemented using manually validated natural language processing with a large language model to extract measurements of PD progression. Motor and cognitive progression scores change more rapidly in MGB than in HBS (median survival until H&Y 3: 5.6 years vs. >10, p < 0.001; mini-mental state exam median decline 0.28 vs. 0.11, p < 0.001; and clinically recognized cognitive decline, p = 0.001). In real-world populations, patients are diagnosed more than eleven years later (RWD mean of 72.2 vs. research mean of 60.4, p < 0.001). After diagnosis, in real-world cohorts, treatment with PD medications is initiated an average of 2.3 years later (95% CI: [2.1-2.4]; p < 0.001). This study provides a detailed characterization of Parkinson's progression in diverse populations. It delineates systemic divergences in the patient populations enrolled in research settings vs. patients in the real world.
These divergences are likely due to a combination of selection bias and real population differences, but exact attribution of the causes is challenging. This study emphasizes a need to utilize multiple data sources and to diligently consider potential biases when planning, choosing data sources, and performing downstream tasks and analyses.
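
The "median survival until H&Y 3" comparison in the abstract above is a time-to-event quantity. As a minimal sketch of how such a median could be read off a Kaplan-Meier curve (this is not the authors' code; the function name `km_median` and the toy follow-up data are hypothetical, and the abstract does not say which estimator the study used):

```python
def km_median(times, events):
    """Return the Kaplan-Meier median time-to-event, or None if not reached.

    times  : follow-up time for each patient (e.g. years since diagnosis)
    events : 1 if the event (e.g. reaching H&Y 3) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        censored = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                observed += 1
            else:
                censored += 1
            i += 1
        if observed:
            # Patients censored at time t still count as at risk at t.
            survival *= 1 - observed / at_risk
        at_risk -= observed + censored
        if survival <= 0.5:
            return t
    return None  # survival never dropped to 50%: median not reached


# Hypothetical toy cohort, for illustration only.
toy_times = [2, 3, 4, 5, 6, 8]
toy_events = [1, 0, 1, 1, 0, 1]
print(km_median(toy_times, toy_events))
```

A return value of `None` corresponds to the situation reported as ">10" years in the abstract: the survival curve does not fall to 50% within the observed follow-up.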

2.
medRxiv ; 2024 Feb 18.
Article in English | MEDLINE | ID: mdl-38405736

ABSTRACT

Characterization of Parkinson's disease (PD) progression using real-world evidence could guide clinical trial design and identify subpopulations. Efforts to curate research populations, the increasing availability of real-world data and recent advances in natural language processing, particularly large language models, allow for a more granular comparison of populations and the methods of data collection describing these populations than previously possible. This study includes two research populations and two real-world data derived (RWD) populations. The research populations are the Harvard Biomarkers Study (HBS, N = 935), a longitudinal biomarkers cohort study with in-person structured study visits; and Fox Insights (N = 36,660), an online self-survey-based research study of the Michael J. Fox Foundation. Real-world cohorts are the Optum Integrated Claims-electronic health records (N = 157,475), representing wide-scale linked medical and claims data and de-identified data from Mass General Brigham (MGB, N = 22,949), an academic hospital system. Structured, de-identified electronic health records data at MGB are supplemented using natural language processing with a large language model to extract measurements of PD progression. This extraction process is manually validated for accuracy. Motor and cognitive progression scores change more rapidly in MGB than HBS (median survival until H&Y 3: 5.6 years vs. >10, p<0.001; mini-mental state exam median decline 0.28 vs. 0.11, p<0.001; and clinically recognized cognitive decline, p=0.001). In the real-world populations, patients are diagnosed more than eleven years later (RWD mean of 72.2 vs. research mean of 60.4, p<0.001). After diagnosis, in real-world cohorts, treatment with PD medications is initiated 2.3 years later on average (95% CI: [2.1-2.4]; p<0.001). This study provides a detailed characterization of Parkinson's progression in diverse populations. 
It delineates systemic divergences in the patient populations enrolled in research settings vs. patients in the real world. These divergences are likely due to a combination of selection bias and real population differences, but exact attribution of the causes is challenging using existing data. This study emphasizes a need to utilize multiple data sources and to diligently consider potential biases when planning, choosing data sources, and performing downstream tasks and analyses.

3.
Lancet Digit Health ; 5(12): e882-e894, 2023 12.
Article in English | MEDLINE | ID: mdl-38000873

ABSTRACT

BACKGROUND: The evaluation and management of first-time seizure-like events in children can be difficult because these episodes are not always directly observed and might be epileptic seizures or other conditions (seizure mimics). We aimed to evaluate whether machine learning models using real-world data could predict seizure recurrence after an initial seizure-like event. METHODS: This retrospective cohort study compared models trained and evaluated on two separate datasets between Jan 1, 2010, and Jan 1, 2020: electronic medical records (EMRs) at Boston Children's Hospital and de-identified, patient-level, administrative claims data from the IBM MarketScan research database. The study population comprised patients with an initial diagnosis of either epilepsy or convulsions before the age of 21 years, based on International Classification of Diseases, Clinical Modification (ICD-CM) codes. We compared machine learning-based predictive modelling using structured data (logistic regression and XGBoost) with emerging techniques in natural language processing by use of large language models. FINDINGS: The primary cohort comprised 14 021 patients at Boston Children's Hospital matching inclusion criteria with an initial seizure-like event and the comparison cohort comprised 15 062 patients within the IBM MarketScan research database. Seizure recurrence based on a composite expert-derived definition occurred in 57% of patients at Boston Children's Hospital and 63% of patients within IBM MarketScan. Large language models with additional domain-specific and location-specific pre-training on patients excluded from the study (F1-score 0·826 [95% CI 0·817-0·835], AUC 0·897 [95% CI 0·875-0·913]) performed best. All large language models, including the base model without additional pre-training (F1-score 0·739 [95% CI 0·738-0·741], AUROC 0·846 [95% CI 0·826-0·861]) outperformed models trained with structured data. 
With structured data only, XGBoost outperformed logistic regression, and XGBoost models trained with the Boston Children's Hospital EMR (logistic regression: F1-score 0·650 [95% CI 0·643-0·657], AUC 0·694 [95% CI 0·685-0·705]; XGBoost: F1-score 0·679 [0·676-0·683], AUC 0·725 [0·717-0·734]) performed similarly to models trained on the IBM MarketScan database (logistic regression: F1-score 0·596 [0·590-0·601], AUC 0·670 [0·664-0·675]; XGBoost: F1-score 0·678 [0·668-0·687], AUC 0·710 [0·703-0·714]). INTERPRETATION: Physicians' clinical notes about an initial seizure-like event include substantial signals for prediction of seizure recurrence, and additional domain-specific and location-specific pre-training can significantly improve the performance of clinical large language models, even for specialised cohorts. FUNDING: UCB, National Institute of Neurological Disorders and Stroke (US National Institutes of Health).


Subject(s)
Epilepsy; Seizures; Child; Humans; Young Adult; Adult; Retrospective Studies; Seizures/diagnosis; Machine Learning; Electronic Health Records
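
The seizure-recurrence entry above reports model performance as F1-score and AUC/AUROC. As a minimal sketch of what those two metrics measure (not the authors' evaluation code; the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration):

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Uses the pairwise (Mann-Whitney) formulation; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.5]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(round(f1_score(toy_labels, toy_preds), 3))
print(auroc(toy_labels, toy_scores))
```

F1 depends on the chosen threshold while AUROC does not, which is one reason abstracts such as this one tend to report both.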
4.
J Med Internet Res ; 25: e45662, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37227772

ABSTRACT

Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we also develop a reporting guideline for RWE that covers the necessary information to facilitate transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We also showcase our pipeline and provide guidance on the deployment of relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.


Subject(s)
Colonic Neoplasms; Electronic Health Records; Humans; Algorithms; Informatics; Research Design
5.
Commun Med (Lond) ; 3(1): 24, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36788316

ABSTRACT

BACKGROUND: Aortic Stenosis and Mitral Regurgitation are common valvular conditions representing a hidden burden of disease within the population. The aim of this study was to develop and validate deep learning-based screening and diagnostic tools that can help guide clinical decision making. METHODS: In this multi-center retrospective cohort study, we acquired Transthoracic Echocardiogram reports from five Mount Sinai hospitals within New York City representing a demographically diverse cohort of patients. We developed a Natural Language Processing pipeline to extract ground-truth labels about valvular status and paired these to Electrocardiograms (ECGs). We developed and externally validated deep learning models capable of detecting valvular disease, in addition to considering scenarios of clinical deployment. RESULTS: We use 617,338 ECGs paired to transthoracic echocardiograms from 123,096 patients to develop a deep learning model for detection of Mitral Regurgitation. Area Under Receiver Operating Characteristic curve (AUROC) is 0.88 (95% CI:0.88-0.89) in internal testing, and 0.81 (95% CI:0.80-0.82) in external validation. To develop a model for detection of Aortic Stenosis, we use 617,338 Echo-ECG pairs for 128,628 patients. AUROC is 0.89 (95% CI: 0.88-0.89) in internal testing, going to 0.86 (95% CI: 0.85-0.87) in external validation. The model's performance increases leading up to the time of the diagnostic echo, and it performs well in validation against requirement of Transcatheter Aortic Valve Replacement procedures. CONCLUSIONS: Deep learning based tools can increase the amount of information extracted from ubiquitous investigations such as the ECG. Such tools are inexpensive, can help in earlier disease detection, and potentially improve prognosis.


The valves of the heart have flaps that open and close when the heart beats to maintain the flow of blood in the correct direction. Valvular disease, such as backflow or narrowing, puts additional strain upon heart muscles which can lead to heart failure. Usually, these conditions are diagnosed by doing an echocardiogram, an ultrasound scan of the heart and nearby blood vessels. The electrocardiogram (ECG) records the electrical signal generated by the heart and can be obtained more easily. We used deep learning neural networks, self-learning computer algorithms which excel at finding patterns within complex data. This enabled us to develop computer software able to diagnose valvular disease from ECGs. Earlier detection of such disease can help in improving overall outcome, while also reducing costs related to treatment.

6.
Neurol Clin Pract ; 12(4): e49-e57, 2022 Aug.
Article in English | MEDLINE | ID: mdl-36382117

ABSTRACT

Background and Objectives: Epilepsy is an important comorbidity that affects outcomes for people with multiple sclerosis (MS). However, it is unclear whether seizure severity among individuals with coexistence of MS and epilepsy (MS + E) is higher than in those with other focal epilepsies. Our goal was to compare the overall severity of epilepsy in individuals with MS + E vs those with focal epilepsy without MS (E - MS), as defined by seizure-related health care utilization, frequency and duration of status epilepticus, and frequency of antiseizure medication (ASM) regimen changes. Methods: In this hypothesis-generating study, we analyzed a US commercial nationwide deidentified claims data set with >86 million individuals between January 1, 2008, and August 31, 2019. Using validated algorithms, we identified adults with E - MS and those with MS + E. We compared the number and length of seizure-related hospital admissions, the number of claims and unique days with claims for status epilepticus, and the rates of ASM regimen changes between the MS + E and E - MS groups. Results: During the study period, 66,708 individuals with E - MS and 537 with MS + E had ≥2 years of coverage after their initial diagnosis of epilepsy. There was no difference between the MS + E and E - MS groups in the percentage of individuals admitted for seizures and/or status epilepticus. However, MS + E with seizure-related admissions had more admissions and longer hospital stays than those with E - MS. MS + E who experienced status epilepticus had more unique days with status epilepticus claims compared with E - MS. MS + E were more likely to have ASM regimen changes in 2 years after the initial diagnosis of epilepsy and had more ASM changes during 2 years compared with E - MS. Among individuals with MS + E, there were no differences in our measures of seizure severity for those treated with sodium channel blockers/modulators vs other ASM classes. 
Discussion: This study supports the notion that individuals with MS + E can have more severe epilepsy than those with E - MS. Seizure severity among individuals with MS + E treated with sodium channel blockers/modulators vs other ASM classes shows no significant differences. Classification of Evidence: This study provides Class III evidence that individuals with MS + E can have more severe epilepsy than those with E - MS.

7.
NPJ Digit Med ; 5(1): 74, 2022 Jun 13.
Article in English | MEDLINE | ID: mdl-35697747

ABSTRACT

Given the growing number of prediction algorithms developed to predict COVID-19 mortality, we evaluated the transportability of a mortality prediction algorithm using a multi-national network of healthcare systems. We predicted COVID-19 mortality using baseline commonly measured laboratory values and standard demographic and clinical covariates across healthcare systems, countries, and continents. Specifically, we trained a Cox regression model with nine measured laboratory test values, standard demographics at admission, and comorbidity burden pre-admission. These models were compared at site, country, and continent level. Of the 39,969 hospitalized patients with COVID-19 (68.6% male), 5717 (14.3%) died. In the Cox model, age, albumin, AST, creatinine, CRP, and white blood cell count are most predictive of mortality. The baseline covariates are more predictive of mortality during the early days of COVID-19 hospitalization. Models trained at healthcare systems with larger cohort size largely retain good transportability performance when porting to different sites. The combination of routine laboratory test values at admission along with basic demographic features can predict mortality in patients hospitalized with COVID-19. Importantly, this potentially deployable model differs from prior work by demonstrating not only consistent performance but also reliable transportability across healthcare systems in the US and Europe, highlighting the generalizability of this model and the overall approach.

8.
Sci Rep ; 11(1): 20238, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34642371

ABSTRACT

Neurological complications worsen outcomes in COVID-19. To define the prevalence of neurological conditions among hospitalized patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test in geographically diverse multinational populations during the early pandemic, we used electronic health records (EHR) from 338 participating hospitals across 6 countries and 3 continents (January-September 2020) for a cross-sectional analysis. We assessed the frequency of International Classification of Disease codes for neurological conditions by country, healthcare system, time before and after admission for COVID-19, and COVID-19 severity. Among 35,177 hospitalized patients with SARS-CoV-2 infection, there was an increase in the proportion with disorders of consciousness (5.8%, 95% confidence interval [CI] 3.7-7.8%, pFDR < 0.001) and unspecified disorders of the brain (8.1%, 5.7-10.5%, pFDR < 0.001) when compared to the pre-admission proportion. During hospitalization, the relative risks of disorders of consciousness (22%, 19-25%), cerebrovascular diseases (24%, 13-35%), nontraumatic intracranial hemorrhage (34%, 20-50%), encephalitis and/or myelitis (37%, 17-60%) and myopathy (72%, 67-77%) were higher for patients with severe COVID-19 when compared to those who never experienced severe COVID-19. Leveraging a multinational network to capture standardized EHR data, we highlighted the increased prevalence of central and peripheral neurological phenotypes in patients hospitalized with COVID-19, particularly among those with severe disease.


Subject(s)
COVID-19; Nervous System Diseases; Pandemics; Adolescent; Adult; Aged; Aged, 80 and over; COVID-19/complications; COVID-19/epidemiology; Child; Child, Preschool; Cross-Sectional Studies; Female; Humans; Infant; Infant, Newborn; Male; Middle Aged; Nervous System Diseases/epidemiology; Nervous System Diseases/etiology; Prevalence; Severity of Illness Index; Young Adult
9.
JAMIA Open ; 4(2): ooab045, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34142018

ABSTRACT

OBJECTIVE: Case-control study designs are commonly used in retrospective analyses of real-world evidence (RWE). Due to the increasingly wide availability of RWE, it can be difficult to determine whether findings are robust or the result of testing multiple hypotheses. MATERIALS AND METHODS: We investigate the potential effects of modifying cohort definitions in a case-control association study between depression and type 2 diabetes mellitus. We used a large (>75 million individuals) de-identified administrative claims database to observe the effects of minor changes to the requirements of glucose and hemoglobin A1c tests in the control group. RESULTS: We found that small permutations to the criteria used to define the control population result in significant shifts in both the demographic structure of the identified cohort as well as the odds ratio of association. These differences remain present when testing against age- and sex-matched controls. DISCUSSION: Analyses of RWE need to be carefully designed to avoid issues of multiple testing. Minor changes to control cohorts can lead to significantly different results and have the potential to alter even prospective studies through selection bias. CONCLUSION: We believe this work offers strong support for the need for robust guidelines, best practices, and regulations around the use of observational RWE for clinical or regulatory decision-making.
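
The entry above turns on how the odds ratio of a case-control association shifts when the control cohort is redefined. As a minimal sketch of that calculation (not the authors' analysis; the function name, the Woolf log-scale confidence interval, and every count below are hypothetical choices made for illustration):

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio from a 2x2 table with a Woolf (log-scale) confidence interval.

    z=1.96 gives an approximate 95% interval. All four cells must be positive.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: the same cases compared against two control definitions.
cases = (200, 800)        # (with depression, without depression)
controls_a = (100, 900)   # e.g. controls defined under one lab-test requirement
controls_b = (150, 850)   # e.g. controls defined under a slightly different one

for name, controls in (("definition A", controls_a), ("definition B", controls_b)):
    estimate, lower, upper = odds_ratio_ci(*cases, *controls)
    print(f"{name}: OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With identical cases, only the control counts differ between the two runs, so any change in the estimate comes entirely from the control definition, which is the sensitivity the abstract warns about.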

10.
medRxiv ; 2021 Jan 29.
Article in English | MEDLINE | ID: mdl-33655281

ABSTRACT

OBJECTIVE: Neurological complications can worsen outcomes in COVID-19. We defined the prevalence of a wide range of neurological conditions among patients hospitalized with COVID-19 in geographically diverse multinational populations. METHODS: Using electronic health record (EHR) data from 348 participating hospitals across 6 countries and 3 continents between January and September 2020, we performed a cross-sectional study of hospitalized adult and pediatric patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test, both with and without severe COVID-19. We assessed the frequency of each disease category and 3-character International Classification of Disease (ICD) code of neurological diseases by countries, sites, time before and after admission for COVID-19, and COVID-19 severity. RESULTS: Among the 35,177 hospitalized patients with SARS-CoV-2 infection, there was increased prevalence of disorders of consciousness (5.8%, 95% confidence interval [CI]: 3.7%-7.8%, p FDR <.001) and unspecified disorders of the brain (8.1%, 95%CI: 5.7%-10.5%, p FDR <.001), compared to pre-admission prevalence. During hospitalization, patients who experienced severe COVID-19 status had 22% (95%CI: 19%-25%) increase in the relative risk (RR) of disorders of consciousness, 24% (95%CI: 13%-35%) increase in other cerebrovascular diseases, 34% (95%CI: 20%-50%) increase in nontraumatic intracranial hemorrhage, 37% (95%CI: 17%-60%) increase in encephalitis and/or myelitis, and 72% (95%CI: 67%-77%) increase in myopathy compared to those who never experienced severe disease. INTERPRETATION: Using an international network and common EHR data elements, we highlight an increase in the prevalence of central and peripheral neurological phenotypes in patients hospitalized with SARS-CoV-2 infection, particularly among those with severe disease.

11.
NPJ Digit Med ; 4(1): 62, 2021 Mar 30.
Article in English | MEDLINE | ID: mdl-33785839

ABSTRACT

Machine learning can help clinicians to make individualized patient predictions only if researchers demonstrate models that contribute novel insights, rather than learning the most likely next step in a set of actions a clinician will take. We trained deep learning models using only clinician-initiated, administrative data for 42.9 million admissions using three subsets of data: demographic data only, demographic data and information available at admission, and the previous data plus charges recorded during the first day of admission. Models trained on charges during the first day of admission achieve performance close to published full EMR-based benchmarks for inpatient outcomes: in-hospital mortality (0.89 AUC), prolonged length of stay (0.82 AUC), and 30-day readmission rate (0.71 AUC). Similar performance between models trained with only clinician-initiated data and those trained with full EMR data purporting to include information about patient state and physiology should raise concern in the deployment of these models. Furthermore, these models exhibited significant declines in performance when evaluated over only myocardial infarction (MI) patients relative to models trained over MI patients alone, highlighting the importance of physician diagnosis in the prognostic performance of these models. These results provide a benchmark for predictive accuracy trained only on prior clinical actions and indicate that models with similar performance may derive their signal by looking over clinicians' shoulders, using clinical behavior as the expression of preexisting intuition and suspicion to generate a prediction. For models to guide clinicians in individual decisions, performance exceeding these benchmarks is necessary.

12.
J Am Med Inform Assoc ; 28(7): 1411-1420, 2021 07 14.
Article in English | MEDLINE | ID: mdl-33566082

ABSTRACT

OBJECTIVE: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity. MATERIALS AND METHODS: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site. RESULTS: The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability, up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review. DISCUSSION: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions. CONCLUSIONS: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.


Subject(s)
COVID-19; Electronic Health Records; Severity of Illness Index; COVID-19/classification; Hospitalization; Humans; Machine Learning; Prognosis; ROC Curve; Sensitivity and Specificity
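
The 4CE entry above validates a severity phenotype by its pooled sensitivity and specificity across sites. As a minimal sketch of those two quantities (not the consortium's code; pooling by summing per-site confusion-matrix counts is an assumption, since the abstract does not state the pooling method, and the site names and counts are hypothetical):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def pooled_sensitivity_specificity(sites):
    """Pool sites by summing their confusion-matrix counts (assumed method)."""
    totals = {key: sum(site[key] for site in sites) for key in ("tp", "fn", "tn", "fp")}
    return sensitivity_specificity(**totals)


# Hypothetical per-site counts: phenotype flag vs. ICU admission and/or death.
toy_sites = [
    {"name": "site A", "tp": 80, "fn": 20, "tn": 170, "fp": 30},
    {"name": "site B", "tp": 60, "fn": 40, "tn": 160, "fp": 40},
]

for site in toy_sites:
    sens, spec = sensitivity_specificity(site["tp"], site["fn"], site["tn"], site["fp"])
    print(f'{site["name"]}: sensitivity {sens:.2f}, specificity {spec:.2f}')

pooled_sens, pooled_spec = pooled_sensitivity_specificity(toy_sites)
print(f"pooled: sensitivity {pooled_sens:.3f}, specificity {pooled_spec:.3f}")
```

Printing the per-site values next to the pooled ones makes between-site variability visible, which the abstract reports as substantial for individual code categories.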
13.
Nat Commun ; 12(1): 1107, 2021 02 17.
Article in English | MEDLINE | ID: mdl-33597541

ABSTRACT

One of the primary tools that researchers use to predict risk is the case-control study. We identify a flaw, temporal bias, that is specific to and uniquely associated with these studies that occurs when the study period is not representative of the data that clinicians have during the diagnostic process. Temporal bias acts to undermine the validity of predictions by over-emphasizing features close to the outcome of interest. We examine the impact of temporal bias across the medical literature, and highlight examples of exaggerated effect sizes, false-negative predictions, and replication failure. Given the ubiquity and practical advantages of case-control studies, we discuss strategies for estimating the influence of and preventing temporal bias where it exists.


Subject(s)
Biomedical Research/standards; Clinical Trials as Topic/standards; Patient Selection; Research Design/standards; Bias; Biomedical Research/methods; Biomedical Research/trends; Case-Control Studies; Clinical Trials as Topic/methods; Forecasting; Humans; Reproducibility of Results
14.
J Med Internet Res ; 23(3): e22219, 2021 03 02.
Article in English | MEDLINE | ID: mdl-33600347

ABSTRACT

Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.


Subject(s)
COVID-19/epidemiology; Data Collection/methods; Electronic Health Records; Data Collection/standards; Humans; Peer Review, Research/standards; Publishing/standards; Reproducibility of Results; SARS-CoV-2/isolation & purification
15.
medRxiv ; 2021 Feb 05.
Article in English | MEDLINE | ID: mdl-33564777

ABSTRACT

Objectives: To perform an international comparison of the trajectory of laboratory values among hospitalized patients with COVID-19 who develop severe disease and identify optimal timing of laboratory value collection to predict severity across hospitals and regions. Design: Retrospective cohort study. Setting: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE), an international multi-site data-sharing collaborative of 342 hospitals in the US and in Europe. Participants: Patients hospitalized with COVID-19, admitted before or after PCR-confirmed result for SARS-CoV-2. Primary and secondary outcome measures: Patients were categorized as "ever-severe" or "never-severe" using the validated 4CE severity criteria. Eighteen laboratory tests associated with poor COVID-19-related outcomes were evaluated for predictive accuracy by area under the curve (AUC), compared between the severity categories. Subgroup analysis was performed to validate a subset of laboratory values as predictive of severity against a published algorithm. A subset of laboratory values (CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin) was compared between North American and European sites for severity prediction. Results: Of 36,447 patients with COVID-19, 19,953 (43.7%) were categorized as ever-severe. Most patients (78.7%) were 50 years of age or older and male (60.5%). Longitudinal trajectories of CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin showed association with disease severity. Significant differences of laboratory values at admission were found between the two groups. With the exception of D-dimer, predictive discrimination of laboratory values did not improve after admission. Sub-group analysis using age, D-dimer, CRP, and lymphocyte count as predictive of severity at admission showed similar discrimination to a published algorithm (AUC=0.88 and 0.91, respectively). Both models deteriorated in predictive accuracy as the disease progressed. 
On average, no difference in severity prediction was found between North American and European sites. Conclusions: Laboratory test values at admission can be used to predict severity in patients with COVID-19. Prediction models show consistency across international sites highlighting the potential generalizability of these models.

17.
NPJ Digit Med ; 3: 109, 2020.
Article in English | MEDLINE | ID: mdl-32864472

ABSTRACT

We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across five countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.

18.
Clin Pharmacol Ther ; 107(4): 843-852, 2020 04.
Article in English | MEDLINE | ID: mdl-31562770

ABSTRACT

The 21st Century Cures Act passed by the United States Congress mandates the US Food and Drug Administration to develop guidance to evaluate the use of real-world evidence (RWE) to support the regulatory process. RWE has generated important medical discoveries, especially in areas where traditional clinical trials would be unethical or infeasible. However, RWE suffers from several issues that hinder its ability to provide proof of treatment efficacy at a level comparable to randomized controlled trials. In this review article, we summarized the advantages and limitations of RWE, identified the key opportunities for RWE, and pointed the way forward to maximize the potential of RWE for regulatory purposes.


Subject(s)
Clinical Trials as Topic/legislation & jurisprudence , Evidence-Based Medicine/legislation & jurisprudence , United States Food and Drug Administration/legislation & jurisprudence , Clinical Trials as Topic/methods , Clinical Trials as Topic/statistics & numerical data , Decision Making , Evidence-Based Medicine/methods , Evidence-Based Medicine/statistics & numerical data , Humans , United States
19.
Circ Cardiovasc Qual Outcomes ; 12(7): e005122, 2019 07.
Article in English | MEDLINE | ID: mdl-31284738

ABSTRACT

BACKGROUND: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. METHODS AND RESULTS: Using pairs of deep neural networks, we generated simulated, synthetic participants that closely resemble participants of the SPRINT trial (Systolic Blood Pressure Intervention Trial). We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants' data could identify a real participant in the trial. Machine learning predictors built on the synthetic population generalize to the original data set. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. CONCLUSIONS: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical data sets by enhancing data sharing while preserving participant privacy.


Subject(s)
Computer Security , Confidentiality , Deep Learning , Information Dissemination/methods , Antihypertensive Agents/therapeutic use , Blood Pressure/drug effects , Computer Simulation , Data Collection , Humans , Hypertension/diagnosis , Hypertension/drug therapy , Hypertension/physiopathology , Randomized Controlled Trials as Topic , Treatment Outcome
20.
Pac Symp Biocomput ; 24: 8-17, 2019.
Article in English | MEDLINE | ID: mdl-30864306

ABSTRACT

Biomedical association studies are increasingly done using clinical concepts, and in particular diagnostic codes from clinical data repositories, as phenotypes. Clinical concepts can be represented in a meaningful vector space using word embedding models. These embeddings allow for comparison between clinical concepts or for straightforward input to machine learning models. Using traditional approaches, good representations require high dimensionality, making downstream tasks such as visualization more difficult. We applied Poincaré embeddings in a 2-dimensional hyperbolic space to a large-scale administrative claims database and show performance comparable to 100-dimensional embeddings in a Euclidean space. We then examine disease relationships under different disease contexts to better understand potential phenotypes.


Subject(s)
Computational Biology/methods , Deep Learning , Databases, Factual , Humans , International Classification of Diseases , Machine Learning , Medical Informatics , Natural Language Processing , Phenotype , Semantics
...