ABSTRACT
Machine Learning models trained from real-world data have demonstrated promise in predicting suicide attempts in adolescents. However, their transportability, namely the performance of a model trained on one dataset and applied to different data, is largely unknown, hindering the clinical adoption of these models. Here we developed different machine learning-based suicide prediction models based on real-world data collected in different contexts (inpatient, outpatient, and all encounters) with varying purposes (administrative claims and electronic health records), and compared their cross-data performance. The three datasets used were the All-Payer Claims Database in Connecticut, the Hospital Inpatient Discharge Database in Connecticut, and the Electronic Health Records data provided by the Kansas Health Information Network. We included 285,320 patients among whom we identified 3389 (1.2%) suicide attempters and 66% of the suicide attempters were female. Different machine learning models were evaluated on source datasets where models were trained and then applied to target datasets. More complex models, particularly deep long short-term memory neural network models, did not outperform simpler regularized logistic regression models in terms of both local and transported performance. Transported models exhibited varying performance, showing drops or even improvements compared to their source performance. While they can achieve satisfactory transported performance, they are usually upper-bounded by the best performance of locally developed models, and they can identify additional new cases in target data. Our study uncovers complex transportability patterns and could facilitate the development of suicide prediction models with better performance and generalizability.
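The source-versus-target comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name the evaluation metric, so AUC is assumed here, and `auc`, `transport_report`, and the toy data are hypothetical names and values, not the authors' code or results.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def transport_report(model, source, target):
    """Score a fitted model on its source data and on a target dataset.

    model  : callable mapping one feature row to a risk score (assumed)
    source : (rows, labels) from the dataset the model was trained on
    target : (rows, labels) from a different dataset
    """
    src = auc(source[1], [model(x) for x in source[0]])
    tgt = auc(target[1], [model(x) for x in target[0]])
    return {"source_auc": src, "target_auc": tgt, "change": tgt - src}


# Toy illustration: the "model" simply returns the first feature.
toy_model = lambda row: row[0]
toy_source = ([[0.1], [0.4], [0.35], [0.8]], [0, 0, 1, 1])
toy_target = ([[0.2], [0.9], [0.3], [0.1]], [0, 1, 1, 0])
report = transport_report(toy_model, toy_source, toy_target)
```

A negative `change` corresponds to the performance drop the abstract mentions, and a positive one to the improvements it also reports.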
Subjects
Electronic Health Records, Machine Learning, Attempted Suicide, Humans, Female, Male, Adolescent, Attempted Suicide/psychology, Attempted Suicide/statistics & numerical data, Connecticut, Longitudinal Studies, Factual Databases, Suicide/psychology
ABSTRACT
BACKGROUND: SARS-CoV-2-infected patients may develop new conditions in the period after the acute infection. These conditions, the post-acute sequelae of SARS-CoV-2 infection (PASC, or Long COVID), involve a diverse set of organ systems. Limited studies have investigated the predictability of Long COVID development and its associated risk factors. METHODS: In this retrospective cohort study, we used electronic healthcare records from two large-scale PCORnet clinical research networks, INSIGHT (~1.4 million patients from New York) and OneFlorida+ (~0.7 million patients from Florida), to identify factors associated with having Long COVID, and to develop machine learning-based models for predicting Long COVID development. Both SARS-CoV-2-infected and non-infected adults were analysed during the period of March 2020 to November 2021. Factors associated with Long COVID risk were identified by removing background associations and correcting for multiple tests. RESULTS: We observed complex association patterns between baseline factors and a variety of Long COVID conditions, and we highlight that severe acute SARS-CoV-2 infection, being underweight, and having baseline comorbidities (e.g., cancer and cirrhosis) are likely associated with increased risk of developing Long COVID. Several Long COVID conditions, e.g., dementia, malnutrition, chronic obstructive pulmonary disease, heart failure, PASC diagnosis U099, and acute kidney failure are well predicted (C-index > 0.8). Moderately predictable conditions include atelectasis, pulmonary embolism, diabetes, pulmonary fibrosis, and thromboembolic disease (C-index 0.7-0.8). Less predictable conditions include fatigue, anxiety, sleep disorders, and depression (C-index around 0.6). CONCLUSIONS: This observational study suggests that association patterns between investigated factors and Long COVID are complex, and the predictability of different Long COVID conditions varies. 
However, machine learning-based predictive models can help in identifying patients who are at risk of developing a variety of Long COVID conditions.
Most people who develop COVID-19 make a full recovery, but some go on to develop post-acute sequelae of SARS-CoV-2 infection, commonly known as Long COVID. Up to now, we did not know why some people are affected by Long COVID whilst others are not. We conducted a study to identify risk factors for Long COVID and developed a mathematical modeling approach to predict those at risk. We find that Long COVID is associated with some factors such as experiencing severe acute COVID-19, being underweight, and having conditions including cancer or cirrhosis. Due to the wide variety of symptoms defined as Long COVID, it may be challenging to come up with a set of risk factors that can predict the whole spectrum of Long COVID. However, our approach could be used to predict a variety of Long COVID conditions.
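The C-index values quoted in the abstract measure how often a model ranks patients in the right order. The abstract does not say which estimator was used, so the sketch below assumes Harrell's concordance index for right-censored data. The function name and toy values are illustrative only.

```python
def concordance_index(times, events, risks):
    """Harrell's C for right-censored data (assumed estimator).

    times  : follow-up time for each patient
    events : 1 if the condition was observed, 0 if censored
    risks  : predicted risk score (higher = expected earlier event)

    A pair (i, j) is comparable when patient i had the event and
    did so strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.1]
c = concordance_index(toy_times, toy_events, toy_risks)
```

On this scale 0.5 is chance-level ranking and 1.0 is perfect ranking, which is how the bands in the abstract (above 0.8, 0.7 to 0.8, around 0.6) should be read.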
ABSTRACT
Paxlovid has been approved for use in patients who are at high risk for severe acute COVID-19 illness. Evidence regarding whether Paxlovid protects against Post-Acute Sequelae of SARS-CoV-2 infection (PASC), or Long COVID, is mixed in high-risk patients and lacking in low-risk patients. With a target trial emulation framework, we evaluated the association of Paxlovid treatment within 5 days of SARS-CoV-2 infection with incident Long COVID and hospitalization or death from any cause in the post-acute period (30-180 days after infection) using electronic health records from the Patient-Centered Clinical Research Networks (PCORnet) RECOVER repository. The study population included 497,499 SARS-CoV-2-positive patients between March 1, 2022, and February 1, 2023, among whom 165,256 were treated with Paxlovid within 5 days of infection and 307,922 were not treated with Paxlovid or other COVID-19 treatments. Compared with the non-treated group, Paxlovid treatment was associated with reduced risk of Long COVID with a Hazard Ratio (HR) of 0.88 (95% CI, 0.87 to 0.89) and absolute risk reduction of 2.99 events per 100 persons (95% CI, 2.65 to 3.32). Paxlovid treatment was associated with reduced risk of all-cause death (HR, 0.53, 95% CI 0.46 to 0.60; risk reduction 0.23 events per 100 persons, 95% CI 0.19 to 0.28) and hospitalization (HR, 0.70, 95% CI 0.68 to 0.73; risk reduction 2.37 events per 100 persons, 95% CI 2.19 to 2.56) in the post-acute phase. For those without documented risk factors, the associations (HR, 1.03, 95% CI 0.95 to 1.11; risk increase 0.80 events per 100 persons, 95% CI -0.84 to 2.45) were inconclusive. Overall, high-risk, nonhospitalized adult patients with COVID-19 who were treated with Paxlovid within 5 days of SARS-CoV-2 infection had a lower risk of Long COVID and all-cause hospitalization or death in the post-acute period. However, Long COVID risk reduction with Paxlovid was not observed in low-risk patients.
ABSTRACT
Background: Little is known about post-acute sequelae of SARS-CoV-2 infection (PASC) after acquiring SARS-CoV-2 infection during pregnancy. We aimed to evaluate the association between acquiring SARS-CoV-2 during pregnancy compared with acquiring SARS-CoV-2 outside of pregnancy and the development of PASC. Methods: This retrospective cohort study from the Researching COVID to Enhance Recovery (RECOVER) Initiative Patient-Centred Clinical Research Network (PCORnet) used electronic health record (EHR) data from 19 U.S. health systems. Females aged 18-49 years with lab-confirmed SARS-CoV-2 infection from March 2020 through June 2022 were included. Validated algorithms were used to identify pregnancies with a delivery at >20 weeks' gestation. The primary outcome was PASC, as previously defined by computable phenotype in the adult non-pregnant PCORnet EHR dataset, identified 30-180 days post-SARS-CoV-2 infection. Secondary outcomes were the 24 component diagnoses contributing to the PASC phenotype definition. Univariable comparisons were made for baseline characteristics between individuals with SARS-CoV-2 infection acquired during pregnancy compared with outside of pregnancy. Using inverse probability of treatment weighting to adjust for baseline differences, the association between SARS-CoV-2 infection acquired during pregnancy and the selected outcomes was modelled. The incident risk is reported as the adjusted hazard ratio (aHR) with 95% confidence intervals. Findings: In total, 83,915 females with SARS-CoV-2 infection acquired outside of pregnancy and 5397 females with SARS-CoV-2 infection acquired during pregnancy were included in analysis. Non-pregnant females with SARS-CoV-2 infection were more likely to be older and have comorbid health conditions. SARS-CoV-2 infection acquired in pregnancy as compared with acquired outside of pregnancy was associated with a lower incidence of PASC (25.5% vs 33.9%; aHR 0.85, 95% CI 0.80-0.91). 
SARS-CoV-2 infection acquired during pregnancy was associated with increased risk for some PASC component diagnoses including abnormal heartbeat (aHR 1.67, 95% CI 1.43-1.94), abdominal pain (aHR 1.34, 95% CI 1.16-1.55), and thromboembolism (aHR 1.88, 95% CI 1.17-3.04), but decreased risk for other diagnoses including malaise (aHR 0.35, 95% CI 0.27-0.47), pharyngitis (aHR 0.36, 95% CI 0.26-0.48) and cognitive problems (aHR 0.39, 95% CI 0.27-0.56). Interpretation: SARS-CoV-2 infection acquired during pregnancy was associated with lower risk of development of PASC at 30-180 days after incident SARS-CoV-2 infection in this nationally representative sample. These findings may be used to counsel pregnant and pregnancy-capable individuals, and to direct future prospective study. Funding: National Institutes of Health (NIH) Other Transaction Agreement (OTA) OT2HL16184.
ABSTRACT
IMPORTANCE: The frequency and characteristics of post-acute sequelae of SARS-CoV-2 infection (PASC) may vary by SARS-CoV-2 variant. OBJECTIVE: To characterize PASC-related conditions among individuals likely infected by the ancestral strain in 2020 and individuals likely infected by the Delta variant in 2021. DESIGN: Retrospective cohort study of electronic medical record data for approximately 27 million patients from March 1, 2020-November 30, 2021. SETTING: Healthcare facilities in New York and Florida. PARTICIPANTS: Patients who were at least 20 years old and had diagnosis codes that included at least one SARS-CoV-2 viral test during the study period. EXPOSURE: Laboratory-confirmed COVID-19 infection, classified by the most common variant prevalent in those regions at the time. MAIN OUTCOME(S) AND MEASURE(S): Relative risk (estimated by adjusted hazard ratio [aHR]) and absolute risk difference (estimated by adjusted excess burden) of new conditions, defined as new documentation of symptoms or diagnoses, in persons between 31-180 days after a positive COVID-19 test compared to persons without a COVID-19 test or diagnosis during the 31-180 days after the last negative test. RESULTS: We analyzed data from 560,752 patients. The median age was 57 years; 60.3% were female, 20.0% non-Hispanic Black, and 19.6% Hispanic. During the study period, 57,616 patients had a positive SARS-CoV-2 test; 503,136 did not. For infections during the ancestral strain period, pulmonary fibrosis, edema (excess fluid), and inflammation had the largest aHR, comparing those with a positive test to those without a COVID-19 test or diagnosis (aHR 2.32 [95% CI 2.09, 2.57]), and dyspnea (shortness of breath) carried the largest excess burden (47.6 more cases per 1,000 persons). 
For infections during the Delta period, pulmonary embolism had the largest aHR comparing those with a positive test to a negative test (aHR 2.18 [95% CI 1.57, 3.01]), and abdominal pain carried the largest excess burden (85.3 more cases per 1,000 persons). CONCLUSIONS AND RELEVANCE: We documented a substantial relative risk of pulmonary embolism and a large absolute risk difference of abdomen-related symptoms after SARS-CoV-2 infection during the Delta variant period. As new SARS-CoV-2 variants emerge, researchers and clinicians should monitor patients for changing symptoms and conditions that develop after infection.
Subjects
COVID-19, Electronic Health Records, SARS-CoV-2, Humans, COVID-19/epidemiology, COVID-19/diagnosis, Female, Male, Middle Aged, SARS-CoV-2/isolation & purification, Retrospective Studies, Adult, Aged, United States/epidemiology, Post-Acute COVID-19 Syndrome, Florida/epidemiology, Cohort Studies
ABSTRACT
A drug molecule is a substance that changes an organism's mental or physical state. Every approved drug has an indication, which refers to the therapeutic use of that drug for treating a particular medical condition. While the Large Language Model (LLM), a generative Artificial Intelligence (AI) technique, has recently demonstrated effectiveness in translating between molecules and their textual descriptions, there remains a gap in research regarding their application in facilitating the translation between drug molecules and indications (which describes the disease, condition or symptoms for which the drug is used), or vice versa. Addressing this challenge could greatly benefit the drug discovery process. The capability of generating a drug from a given indication would allow for the discovery of drugs targeting specific diseases or targets and ultimately provide patients with better treatments. In this paper, we first propose a new task, the translation between drug molecules and corresponding indications, and then test existing LLMs on this new task. Specifically, we consider nine variations of the T5 LLM and evaluate them on two public datasets obtained from ChEMBL and DrugBank. Our experiments show the early results of using LLMs for this task and provide a perspective on the state-of-the-art. We also emphasize the current limitations and discuss future work that has the potential to improve the performance on this task. The creation of molecules from indications, or vice versa, will allow for more efficient targeting of diseases and significantly reduce the cost of drug discovery, with the potential to revolutionize the field of drug discovery in the era of generative AI.
Subjects
Artificial Intelligence, Drug Discovery, Humans, Drug Discovery/methods, Pharmaceutical Preparations/chemistry
ABSTRACT
Estimates of the incidence of post-acute sequelae of SARS-CoV-2 infection (PASC), also known as Long COVID, have varied across studies and changed over time. We estimated PASC incidence among adult and pediatric populations in three nationwide research networks of electronic health records (EHR) participating in the RECOVER Initiative using different classification algorithms (computable phenotypes). Overall, 7% of children and 8.5%-26.4% of adults developed PASC, depending on the computable phenotype used. Excess incidence among SARS-CoV-2 patients was 4% in children and ranged from 4% to 7% among adults, representing a lower-bound incidence estimate based on two control groups: contemporary COVID-19-negative patients and historical patients (2019). Temporal patterns were consistent across networks, with peaks associated with the introduction of new viral variants. Our findings indicate that preventing and mitigating Long COVID remains a public health priority. Examining temporal patterns and risk factors of PASC incidence informs our understanding of etiology and can improve prevention and management.
ABSTRACT
Corticosteroids decrease the duration of organ dysfunction in a range of infectious critical illnesses, but their risk and benefit are not fully defined using this construct. This retrospective multicenter study aimed to evaluate the association between usage of corticosteroids and mortality of patients with infectious critical illness by emulating a target trial framework. The study employed a novel stratification method with predictive machine learning (ML) subphenotyping based on organ dysfunction trajectory. Our analysis revealed that corticosteroids' effectiveness varied depending on the stratification method. The ML-based approach identified four distinct subphenotypes, two of which had a large enough sample size in our patient cohorts for further evaluation: "Rapidly Improving" (RI) and "Rapidly Worsening" (RW), which showed divergent responses to corticosteroid treatment. Specifically, the RW group either benefited from or was not harmed by corticosteroids, whereas the RI group appeared to be harmed. In the development cohort, which comprised patients from the eICU and MIMIC-IV datasets, the hazard ratio estimate for the primary outcome, 28-day mortality, was 1.05 (95% CI: 0.96 - 1.04) in the RW group and 1.40 (95% CI: 1.28 - 1.54) in the RI group. For the validation cohort, which comprised patients from the Critical carE Database for Advanced Research, estimates for 28-day mortality for the RW and RI groups were 1.24 (95% CI: 1.05 - 1.46) and 1.34 (95% CI: 1.14 - 1.59), respectively. For secondary outcomes, the RW group had a shorter time to ICU discharge and time to cessation of mechanical ventilation with corticosteroid treatment, whereas the RI group again showed harm. The findings support matching treatment strategies to empirically observed pathobiology and offer a more nuanced understanding of corticosteroid utility. 
Our results have implications for the design and interpretation of both observational studies and randomized controlled trials (RCTs), suggesting the need for stratification methods that account for the differential response to standard of care.
ABSTRACT
Target trial emulation is the process of mimicking target randomized trials using real-world data, where effective confounding control for unbiased treatment effect estimation remains a main challenge. Although various approaches have been proposed for this challenge, a systematic evaluation is still lacking. Here we emulated trials for thousands of medications from two large-scale real-world data warehouses, covering over 10 years of clinical records for over 170 million patients, aiming to identify new indications of approved drugs for Alzheimer's disease. We assessed different propensity score models under the inverse probability of treatment weighting framework and suggested a model selection strategy for improved baseline covariate balancing. We also found that the deep learning-based propensity score model did not necessarily outperform logistic regression-based methods in covariate balancing. Finally, we highlighted five top-ranked drugs (pantoprazole, gabapentin, atorvastatin, fluticasone, and omeprazole) originally intended for other indications with potential benefits for Alzheimer's patients.
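The covariate-balancing step described above can be illustrated with a small sketch. It assumes that propensity scores have already been estimated by some model. It then forms inverse probability of treatment weights and checks balance with the standardized mean difference (SMD). The function names, the toy data, and the common convention that an absolute SMD below 0.1 indicates balance are assumptions for illustration, not details taken from the study.

```python
import math


def iptw_weights(treated, propensity):
    """Weight 1/e for treated and 1/(1-e) for untreated patients."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def _weighted_mean_var(values, weights):
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def smd(covariate, treated, weights=None):
    """Standardized mean difference of one covariate between arms."""
    if weights is None:
        weights = [1.0] * len(covariate)
    groups = {0: ([], []), 1: ([], [])}
    for x, t, w in zip(covariate, treated, weights):
        groups[t][0].append(x)
        groups[t][1].append(w)
    m1, v1 = _weighted_mean_var(*groups[1])
    m0, v0 = _weighted_mean_var(*groups[0])
    pooled_sd = math.sqrt((v1 + v0) / 2.0)
    return 0.0 if pooled_sd == 0.0 else (m1 - m0) / pooled_sd


# Toy data: a binary covariate that strongly drives treatment.
x = [1, 1, 1, 1, 0, 0, 0, 0]
treated = [1, 1, 1, 0, 1, 0, 0, 0]
propensity = [0.75] * 4 + [0.25] * 4  # assumed to come from a fitted model

w = iptw_weights(treated, propensity)
smd_before = smd(x, treated)
smd_after = smd(x, treated, w)
```

In this toy case the covariate is badly imbalanced before weighting and balanced afterwards. Comparing such before-and-after SMDs across candidate propensity models is one way to carry out the model selection the abstract mentions.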
Subjects
Alzheimer Disease, Humans, Alzheimer Disease/drug therapy, Drug Repositioning, Propensity Score, Atorvastatin/therapeutic use
ABSTRACT
The objective of this study was to investigate the potential association between the use of four frequently prescribed drug classes, namely antihypertensive drugs, statins, selective serotonin reuptake inhibitors, and proton-pump inhibitors, and the likelihood of disease progression from mild cognitive impairment (MCI) to dementia using electronic health records (EHRs). We conducted a retrospective cohort study using observational EHRs from a cohort of approximately 2 million patients seen at a large, multi-specialty urban academic medical center in New York City, USA between 2008 and 2020 to automatically emulate randomized controlled trials. For each drug class, two exposure groups were identified based on the prescription orders documented in the EHRs following the MCI diagnosis. During follow-up, we measured drug efficacy based on the incidence of dementia and estimated the average treatment effect (ATE) of various drugs. To ensure the robustness of our findings, we confirmed the ATE estimates via bootstrapping and presented associated 95% confidence intervals (CIs). Our analysis identified 14,269 MCI patients, among whom 2501 (17.5%) progressed to dementia. Using ATE estimation and bootstrapping confirmation, we observed that rosuvastatin (ATE = -0.0140 [-0.0191, -0.0088], p value < 0.001), citalopram (ATE = -0.1128 [-0.125, -0.1005], p value < 0.001), escitalopram (ATE = -0.0560 [-0.0615, -0.0506], p value < 0.001), and omeprazole (ATE = -0.0201 [-0.0299, -0.0103], p value < 0.001) were significantly associated with slower progression from MCI to dementia. The findings support a potential role for these commonly prescribed drugs in altering the progression from MCI to dementia and warrant further investigation.
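The bootstrap confirmation of an ATE estimate can be sketched as follows. This is a deliberately simplified illustration: it treats the ATE as an unadjusted difference in dementia incidence between two exposure groups and uses a percentile bootstrap. The abstract does not describe the estimator or the bootstrap variant, so both are assumptions, and the toy outcome vectors are invented for the example.

```python
import random


def risk_difference(outcomes_exposed, outcomes_comparator):
    """Difference in outcome proportion (exposed minus comparator)."""
    p1 = sum(outcomes_exposed) / len(outcomes_exposed)
    p0 = sum(outcomes_comparator) / len(outcomes_comparator)
    return p1 - p0


def bootstrap_ci(outcomes_exposed, outcomes_comparator,
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the risk difference."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        e = rng.choices(outcomes_exposed, k=len(outcomes_exposed))
        c = rng.choices(outcomes_comparator, k=len(outcomes_comparator))
        estimates.append(risk_difference(e, c))
    estimates.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Toy outcomes: 1 = progressed to dementia, 0 = did not (invented data).
exposed = [1] * 10 + [0] * 90
comparator = [1] * 20 + [0] * 80

ate = risk_difference(exposed, comparator)
ci_low, ci_high = bootstrap_ci(exposed, comparator)
```

A negative ATE with an interval that excludes zero is read, as in the abstract, as an association with slower progression. A real analysis would also adjust for confounding before resampling.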
Subjects
Alzheimer Disease, Cognitive Dysfunction, Humans, Alzheimer Disease/diagnosis, Retrospective Studies, Electronic Health Records, Disease Progression, Cognitive Dysfunction/drug therapy, Cognitive Dysfunction/epidemiology, Cognitive Dysfunction/diagnosis, Randomized Controlled Trials as Topic
ABSTRACT
Recent studies have investigated post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) using real-world patient data such as electronic health records (EHR). Prior studies have typically been conducted on cohorts drawn from specific patient populations, which makes their generalizability unclear. This study aims to characterize PASC using the EHR data warehouses from two large Patient-Centered Clinical Research Networks (PCORnet), INSIGHT and OneFlorida+, which include 11 million patients in the New York City (NYC) area and 16.8 million patients in Florida, respectively. With a high-throughput screening pipeline based on propensity scores and inverse probability of treatment weighting, we identified a broad list of diagnoses and medications that exhibited significantly higher incidence risk for patients 30-180 days after laboratory-confirmed SARS-CoV-2 infection compared to non-infected patients. We identified more PASC diagnoses in NYC than in Florida under our screening criteria, and conditions including dementia, hair loss, pressure ulcers, pulmonary fibrosis, dyspnea, pulmonary embolism, chest pain, abnormal heartbeat, malaise, and fatigue were replicated across both cohorts. Our analyses highlight potentially heterogeneous risks of PASC in different populations.
Subjects
COVID-19, Post-Acute COVID-19 Syndrome, Humans, COVID-19/epidemiology, Electronic Health Records, SARS-CoV-2, Propensity Score
ABSTRACT
Background: Patients infected with SARS-CoV-2 may develop newly incident conditions in the post-acute infection period. These conditions, denoted as the post-acute sequelae of SARS-CoV-2 infection (PASC), are highly heterogeneous and involve a diverse set of organ systems. Limited studies have investigated the predictability of these conditions and their associated risk factors. Method: In this retrospective cohort study, we investigated two large-scale PCORnet clinical research networks, INSIGHT and OneFlorida+, including 11 million patients in the New York City area and 16.8 million patients from Florida, to develop machine learning prediction models for those who are at risk for newly incident PASC and to identify factors associated with newly incident PASC conditions. Adult patients aged 20 or older with SARS-CoV-2 infection and without recorded infection between March 1st, 2020, and November 30th, 2021, were used for identifying factors associated with incident PASC after removing background associations. The predictive models were developed on infected adults. Results: We found that several incident PASC conditions, e.g., malnutrition, COPD, dementia, and acute kidney failure, were associated with severe acute SARS-CoV-2 infection, defined by hospitalization and ICU stay. Older age and extremes of weight were also associated with these incident conditions. These conditions were better predicted (C-index >0.8). Moderately predictable conditions included diabetes and thromboembolic disease (C-index 0.7-0.8). These were associated with a wider variety of baseline conditions. Less predictable conditions included fatigue, anxiety, sleep disorders, and depression (C-index around 0.6). 
Conclusions: This observational study suggests that a set of likely risk factors for different PASC conditions were identifiable from EHRs, predictability of different PASC conditions was heterogeneous, and using machine learning-based predictive models might help in identifying patients who were at risk of developing incident PASC.
ABSTRACT
Importance: The frequency and characteristics of post-acute sequelae of SARS-CoV-2 infection (PASC) may vary by SARS-CoV-2 variant. Objective: To characterize PASC-related conditions among individuals likely infected by the ancestral strain in 2020 and individuals likely infected by the Delta variant in 2021. Design: Retrospective cohort study of electronic medical record data for approximately 27 million patients from March 1, 2020-November 30, 2021. Setting: Healthcare facilities in New York and Florida. Participants: Patients who were at least 20 years old and had diagnosis codes that included at least one SARS-CoV-2 viral test during the study period. Exposure: Laboratory-confirmed COVID-19 infection, classified by the most common variant prevalent in those regions at the time. Main Outcomes and Measures: Relative risk (estimated by adjusted hazard ratio [aHR]) and absolute risk difference (estimated by adjusted excess burden) of new conditions, defined as new documentation of symptoms or diagnoses, in persons between 31-180 days after a positive COVID-19 test compared to persons with only negative tests during the 31-180 days after the last negative test. Results: We analyzed data from 560,752 patients. The median age was 57 years; 60.3% were female, 20.0% non-Hispanic Black, and 19.6% Hispanic. During the study period, 57,616 patients had a positive SARS-CoV-2 test; 503,136 did not. For infections during the ancestral strain period, pulmonary fibrosis, edema (excess fluid), and inflammation had the largest aHR, comparing those with a positive test to those with a negative test (aHR 2.32 [95% CI 2.09, 2.57]), and dyspnea (shortness of breath) carried the largest excess burden (47.6 more cases per 1,000 persons). 
For infections during the Delta period, pulmonary embolism had the largest aHR comparing those with a positive test to a negative test (aHR 2.18 [95% CI 1.57, 3.01]), and abdominal pain carried the largest excess burden (85.3 more cases per 1,000 persons). Conclusions and Relevance: We documented a substantial relative risk of pulmonary embolism and large absolute risk difference of abdomen-related symptoms after SARS-CoV-2 infection during the Delta variant period. As new SARS-CoV-2 variants emerge, researchers and clinicians should monitor patients for changing symptoms and conditions that develop after infection.
ABSTRACT
BACKGROUND: Compared to white individuals, Black and Hispanic individuals have higher rates of COVID-19 hospitalization and death. Less is known about racial/ethnic differences in post-acute sequelae of SARS-CoV-2 infection (PASC). OBJECTIVE: Examine racial/ethnic differences in potential PASC symptoms and conditions among hospitalized and non-hospitalized COVID-19 patients. DESIGN: Retrospective cohort study using data from electronic health records. PARTICIPANTS: 62,339 patients with COVID-19 and 247,881 patients without COVID-19 in New York City between March 2020 and October 2021. MAIN MEASURES: New symptoms and conditions 31-180 days after COVID-19 diagnosis. KEY RESULTS: The final study population included 29,331 white patients (47.1%), 12,638 Black patients (20.3%), and 20,370 Hispanic patients (32.7%) diagnosed with COVID-19. After adjusting for confounders, significant racial/ethnic differences in incident symptoms and conditions existed among both hospitalized and non-hospitalized patients. For example, 31-180 days after a positive SARS-CoV-2 test, hospitalized Black patients had higher odds of being diagnosed with diabetes (adjusted odds ratio [OR]: 1.96, 95% confidence interval [CI]: 1.50-2.56, q<0.001) and headaches (OR: 1.52, 95% CI: 1.11-2.08, q=0.02), compared to hospitalized white patients. Hospitalized Hispanic patients had higher odds of headaches (OR: 1.62, 95% CI: 1.21-2.17, q=0.003) and dyspnea (OR: 1.22, 95% CI: 1.05-1.42, q=0.02), compared to hospitalized white patients. Among non-hospitalized patients, Black patients had higher odds of being diagnosed with pulmonary embolism (OR: 1.68, 95% CI: 1.20-2.36, q=0.009) and diabetes (OR: 2.13, 95% CI: 1.75-2.58, q<0.001), but lower odds of encephalopathy (OR: 0.58, 95% CI: 0.45-0.75, q<0.001), compared to white patients. 
Hispanic patients had higher odds of being diagnosed with headaches (OR: 1.41, 95% CI: 1.24-1.60, q<0.001) and chest pain (OR: 1.50, 95% CI: 1.35-1.67, q < 0.001), but lower odds of encephalopathy (OR: 0.64, 95% CI: 0.51-0.80, q<0.001). CONCLUSIONS: Compared to white patients, patients from racial/ethnic minority groups had significantly different odds of developing potential PASC symptoms and conditions. Future research should examine the reasons for these differences.
Subjects
Brain Diseases, COVID-19, Humans, COVID-19/complications, Ethnicity, Cohort Studies, Post-Acute COVID-19 Syndrome, SARS-CoV-2, Retrospective Studies, COVID-19 Testing, Minority Groups, New York City/epidemiology, Headache/diagnosis, Headache/epidemiology
ABSTRACT
Post-acute sequelae of SARS-CoV-2 infection (PASC) affects a wide range of organ systems among a large proportion of patients with SARS-CoV-2 infection. Although studies have identified a broad set of patient-level risk factors for PASC, little is known about the association between the "exposome" (the totality of environmental exposures) and the risk of PASC. Using electronic health data of patients with COVID-19 from two large clinical research networks in New York City and Florida, we identified environmental risk factors for 23 PASC symptoms and conditions from nearly 200 exposome factors. The exposome comprises three domains: the natural environment, the built environment, and the social environment. We conducted a two-phase environment-wide association study. In Phase 1, we ran a mixed effects logistic regression with 5-digit ZIP Code tabulation area (ZCTA5) random intercepts for each PASC outcome and each exposome factor, adjusting for a comprehensive set of patient-level confounders. In Phase 2, we ran a mixed effects logistic regression for each PASC outcome including all significant (false discovery rate-adjusted p-value < 0.05) exposome characteristics identified in Phase 1 and adjusting for confounders. We identified air toxicants (e.g., methyl methacrylate), particulate matter (PM2.5) compositions (e.g., ammonium), neighborhood deprivation, and built environment (e.g., food access) that were associated with increased risk of PASC conditions related to nervous, blood, circulatory, endocrine, and other organ systems. Specific environmental risk factors for each PASC condition and symptom differed between the New York City area and Florida. Future research is warranted to extend the analyses to other regions and examine more granular exposome characteristics to inform public health efforts to help patients recover from SARS-CoV-2 infection.
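The hand-off from Phase 1 to Phase 2 depends on adjusting many p-values for multiple testing. The abstract does not name the adjustment procedure. The sketch below assumes the Benjamini-Hochberg method, a common choice for false discovery control, and uses hypothetical factor names and p-values. The mixed effects regressions themselves are not reproduced here.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def phase1_select(pvalue_by_factor, alpha=0.05):
    """Keep exposome factors whose adjusted p-value is below alpha."""
    names = list(pvalue_by_factor)
    adjusted = bh_adjust([pvalue_by_factor[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q < alpha]


# Hypothetical Phase 1 p-values for one PASC outcome.
phase1_pvalues = {
    "pm25_ammonium": 0.01,
    "food_access": 0.04,
    "neighborhood_deprivation": 0.03,
    "green_space": 0.20,
}
carried_forward = phase1_select(phase1_pvalues)
```

Only the factors in `carried_forward` would enter the joint Phase 2 model for that outcome, so the adjustment directly controls how many exposome characteristics are modelled together.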
ABSTRACT
The post-acute sequelae of SARS-CoV-2 infection (PASC) refers to a broad spectrum of symptoms and signs that are persistent, exacerbated or newly incident in the period after acute SARS-CoV-2 infection. Most studies have examined these conditions individually without providing evidence on co-occurring conditions. In this study, we leveraged the electronic health record data of two large cohorts, INSIGHT and OneFlorida+, from the national Patient-Centered Clinical Research Network. We created a development cohort from INSIGHT and a validation cohort from OneFlorida+ including 20,881 and 13,724 patients, respectively, who were SARS-CoV-2 infected, and we investigated their newly incident diagnoses 30-180 days after a documented SARS-CoV-2 infection. Through machine learning analysis of over 137 symptoms and conditions, we identified four reproducible PASC subphenotypes, dominated by cardiac and renal (including 33.75% and 25.43% of the patients in the development and validation cohorts); respiratory, sleep and anxiety (32.75% and 38.48%); musculoskeletal and nervous system (23.37% and 23.35%); and digestive and respiratory system (10.14% and 12.74%) sequelae. These subphenotypes were associated with distinct patient demographics, underlying conditions before SARS-CoV-2 infection and acute infection phase severity. Our study provides insights into the heterogeneity of PASC and may inform stratified decision-making in the management of PASC conditions.
Subjects
COVID-19, Humans, COVID-19/epidemiology, SARS-CoV-2, Post-Acute COVID-19 Syndrome, Anxiety, Anxiety Disorders, Disease Progression
ABSTRACT
CONTEXT: Machine learning (ML) allows for the analysis of massive quantities of high-dimensional clinical laboratory data, thereby revealing complex patterns and trends. Thus, ML can potentially improve the efficiency of clinical data interpretation and the practice of laboratory medicine. However, the risks of generating biased or unrepresentative models, which can lead to misleading clinical conclusions or overestimation of the model performance, should be recognized. OBJECTIVES: To discuss the major components for creating ML models, including data collection, data preprocessing, model development, and model evaluation. We also highlight many of the challenges and pitfalls in developing ML models, which could result in misleading clinical impressions or inaccurate model performance, and provide suggestions and guidance on how to circumvent these challenges. DATA SOURCES: The references for this review were identified through searches of the PubMed database, US Food and Drug Administration white papers and guidelines, conference abstracts, and online preprints. CONCLUSIONS: With the growing interest in developing and implementing ML models in clinical practice, laboratorians and clinicians need to be educated in order to collect sufficiently large and high-quality data, properly report the data set characteristics, and combine data from multiple institutions with proper normalization. They will also need to assess the reasons for missing values, determine the inclusion or exclusion of outliers, and evaluate the completeness of a data set. In addition, they require the necessary knowledge to select a suitable ML model for a specific clinical question and accurately evaluate the performance of the ML model, based on objective criteria. Domain-specific knowledge is critical in the entire workflow of developing ML models.
Subjects
Computer Simulation, Machine Learning, Humans
ABSTRACT
Post-acute sequelae of SARS-CoV-2 infection (PASC) affects a wide range of organ systems among a large proportion of patients with SARS-CoV-2 infection. Although studies have identified a broad set of patient-level risk factors for PASC, little is known about the contextual and spatial risk factors for PASC. Using electronic health data of patients with COVID-19 from two large clinical research networks in New York City and Florida, we identified contextual and spatial risk factors from nearly 200 environmental characteristics for 23 PASC symptoms and conditions of eight organ systems. We conducted a two-phase environment-wide association study. In Phase 1, we ran a mixed effects logistic regression with 5-digit ZIP Code tabulation area (ZCTA5) random intercepts for each PASC outcome and each contextual and spatial factor, adjusting for a comprehensive set of patient-level confounders. In Phase 2, we ran a mixed effects logistic regression for each PASC outcome including all significant (false discovery rate-adjusted p-value < 0.05) contextual and spatial characteristics identified in Phase 1 and adjusting for confounders. We identified air toxicants (e.g., methyl methacrylate), criteria air pollutants (e.g., sulfur dioxide), particulate matter (PM2.5) compositions (e.g., ammonium), neighborhood deprivation, and built environment (e.g., food access) that were associated with increased risk of PASC conditions related to nervous, respiratory, blood, circulatory, endocrine, and other organ systems. The specific contextual and spatial risk factors for each PASC condition and symptom differed between the New York City area and Florida. Future research is warranted to extend the analyses to other regions and examine more granular contextual and spatial characteristics to inform public health efforts to help patients recover from SARS-CoV-2 infection.
ABSTRACT
Borderline personality disorder (BoPD or BPD) is highly prevalent and characterized by reactive moods, impulsivity, behavioral dysregulation, and distorted self-image. Yet the BoPD diagnosis is underutilized, and patients with BoPD are frequently misdiagnosed, resulting in lost opportunities for appropriate treatment. Automated screening of electronic health records (EHRs) is one potential strategy to help identify possible BoPD patients who are otherwise undiagnosed. We present the development and analytical validation of a BoPD screening algorithm based on routinely collected and structured EHRs. This algorithm integrates rule-based selection and machine learning (ML) in a two-step framework by first selecting potential patients based on the presence of comorbidities and characteristics commonly associated with BoPD, and then predicting whether the patients most likely have BoPD. Leveraging a large-scale US-based de-identified EHR database and our clinical expert's rating of two random samples of patient EHRs, results show that our screening algorithm has a high consistency with our clinical expert's ratings, with area under the receiver operating characteristic curve (AUROC) 0.837 [95% confidence interval (CI) 0.778-0.892], positive predictive value 0.717 (95% CI 0.583-0.836), accuracy 0.820 (95% CI 0.768-0.873), sensitivity 0.541 (95% CI 0.417-0.667) and specificity 0.922 (95% CI 0.880-0.960). Our aim is to provide an additional resource to facilitate clinical decision making and promote the development of digital medicine.
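For readers less familiar with the measures reported above, the following Python sketch shows how sensitivity, specificity, positive predictive value, and accuracy follow from a confusion matrix of algorithm predictions against expert ratings. The counts are made up for illustration and are not the study's data; the names `Confusion` and `screening_metrics` are hypothetical, and the confidence intervals and AUROC reported in the abstract are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Counts of algorithm predictions compared with expert ratings."""
    tp: int  # predicted positive, rated positive
    fp: int  # predicted positive, rated negative
    tn: int  # predicted negative, rated negative
    fn: int  # predicted negative, rated positive


def screening_metrics(c: Confusion) -> Dict[str, float]:
    """Point estimates of the usual screening performance measures."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "accuracy": (c.tp + c.tn) / total,
    }


# Made-up counts for illustration only (not taken from the study).
counts = Confusion(tp=30, fp=10, tn=140, fn=20)
metrics = screening_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A pattern like the one reported, with high specificity and moderate sensitivity, corresponds to few false positives among patients rated negative and a larger share of missed cases among patients rated positive.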
Subjects
Borderline Personality Disorder, Electronic Health Records, Algorithms, Borderline Personality Disorder/diagnosis, Borderline Personality Disorder/epidemiology, Factual Databases, Humans, Machine Learning
ABSTRACT
The post-acute sequelae of SARS-CoV-2 infection (PASC) refers to a broad spectrum of symptoms and signs that are persistent, exacerbated, or newly incident in the post-acute SARS-CoV-2 infection period of COVID-19 patients. Most studies have examined these conditions individually without providing conclusive evidence on co-occurring conditions. To address this gap, this study leveraged electronic health records (EHRs) from two large clinical research networks within the national Patient-Centered Clinical Research Network (PCORnet) and investigated patients' newly incident diagnoses that appeared within 30 to 180 days after a documented SARS-CoV-2 infection. Through machine learning, we identified four reproducible subphenotypes of PASC dominated by blood and circulatory system, respiratory, musculoskeletal and nervous system, and digestive system problems, respectively. We also demonstrated that these subphenotypes were associated with distinct patterns of patient demographics, underlying conditions present prior to SARS-CoV-2 infection, acute infection phase severity, and use of new medications in the post-acute period. Our study provides novel insights into the heterogeneity of PASC and can inform stratified decision-making in the treatment of COVID-19 patients with PASC conditions.