ABSTRACT
Despite clinical observations of cardiotoxicity among cancer patients treated with tyrosine kinase inhibitors (TKIs), the molecular mechanisms by which these drugs affect the heart remain largely unknown. Mechanistic understanding of TKI-induced cardiotoxicity has been limited in part due to the complexity of tyrosine kinase signaling pathways and the multi-targeted nature of many of these drugs. TKI treatment has been associated with reactive oxygen species generation, mitochondrial dysfunction, and apoptosis in cardiomyocytes. To gain insight into the mechanisms mediating TKI-induced cardiotoxicity, this study constructs and validates a computational model of cardiomyocyte apoptosis, integrating intrinsic apoptotic and tyrosine kinase signaling pathways. The model predicts high levels of apoptosis in response to sorafenib, sunitinib, ponatinib, trastuzumab, and gefitinib, and lower levels of apoptosis in response to nilotinib and erlotinib, with the highest level of apoptosis induced by sorafenib. Knockdown simulations identified AP1, ASK1, JNK, MEK47, p53, and ROS as positive functional regulators of sorafenib-induced apoptosis of cardiomyocytes. Overexpression simulations identified Akt, IGF1, PDK1, and PI3K among the negative functional regulators of sorafenib-induced cardiomyocyte apoptosis. A combinatorial screen of the positive and negative regulators of sorafenib-induced apoptosis revealed ROS knockdown coupled with overexpression of FLT3, FGFR, PDGFR, VEGFR, or KIT as a particularly potent combination in reducing sorafenib-induced apoptosis. Network simulations of combinatorial treatment with sorafenib and the antioxidant N-acetyl cysteine (NAC) suggest that NAC may protect cardiomyocytes from sorafenib-induced apoptosis.
Subject(s)
Antineoplastic Agents/adverse effects, Apoptosis/drug effects, Cardiotoxicity/etiology, Cardiotoxicity/metabolism, Biological Models, Cardiac Myocytes/drug effects, Cardiac Myocytes/metabolism, Protein Kinase Inhibitors/adverse effects, Antineoplastic Agents/pharmacology, Biomarkers, Computational Biology/methods, Disease Susceptibility, Gene Regulatory Networks, Humans, Protein Kinase Inhibitors/pharmacology, Reproducibility of Results, Signal Transduction
ABSTRACT
OBJECTIVE: Identifying symptoms and characteristics highly specific to coronavirus disease 2019 (COVID-19) would improve the clinical and public health response to this pandemic challenge. Here, we describe a high-throughput approach, the Concept-Wide Association Study (ConceptWAS), that systematically scans a disease's clinical manifestations from clinical notes. We used this method to identify symptoms specific to COVID-19 early in the course of the pandemic. METHODS: We created a natural language processing pipeline to extract concepts from clinical notes in a local ER corresponding to the PCR testing date for patients who had a COVID-19 test and evaluated these concepts as predictors for developing COVID-19. We identified predictors using Firth's logistic regression adjusted for age, gender, and race. We also performed ConceptWAS using cumulative data every two weeks to identify the timeline for recognition of early COVID-19-specific symptoms. RESULTS: We processed 87,753 notes from 19,692 patients subjected to COVID-19 PCR testing between March 8, 2020, and May 27, 2020 (1,483 COVID-19-positive). We found 68 concepts significantly associated with a positive COVID-19 test. We identified symptoms associated with increased risk of COVID-19, including "anosmia" (odds ratio [OR] = 4.97, 95% confidence interval [CI] = 3.21-7.50), "fever" (OR = 1.43, 95% CI = 1.28-1.59), "cough with fever" (OR = 2.29, 95% CI = 1.75-2.96), and "ageusia" (OR = 5.18, 95% CI = 3.02-8.58). Using ConceptWAS, we were able to detect loss of smell and loss of taste three weeks prior to their inclusion as symptoms of the disease by the Centers for Disease Control and Prevention (CDC). CONCLUSION: ConceptWAS, a high-throughput approach for exploring specific symptoms and characteristics of a disease like COVID-19, shows promise for enabling EHR-powered early identification of disease manifestations.
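The odds ratios reported above come from Firth's logistic regression with covariate adjustment. As a much simpler, unadjusted illustration of what an odds ratio and its 95% confidence interval express, the sketch below computes both from a 2x2 table of concept mentions by test result using the Woolf (log-scale) method. This is not the study's method; the function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: test-positive, concept present   b: test-positive, concept absent
    c: test-negative, concept present   d: test-negative, concept absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts, for illustration only (not from the study).
or_, lo, hi = odds_ratio_ci(a=30, b=970, c=60, d=9940)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

With rare concepts, some cells can be zero or very small, which is one reason a penalized method such as Firth's regression is preferred over this simple interval.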
Subject(s)
COVID-19/diagnosis, Natural Language Processing, Symptom Assessment/methods, Adult, Ageusia, COVID-19 Nucleic Acid Testing, Cough, Female, Fever, Humans, Male, Middle Aged, Pandemics, United States
ABSTRACT
OBJECTIVE: Decompressive craniectomy (DC) improves functional outcomes in selected patients with malignant hemispheric infarction (MHI), but variability in the surgical technique and occasional complications may be limiting the effectiveness of this procedure. Our aim was to evaluate predefined perioperative CT measurements for association with post-DC midline brain shift in patients with MHI. METHODS: At two medical centers we identified 87 consecutive patients with MHI and DC between January 2007 and December 2019. We used our previously tested methods to measure the craniectomy surface area, extent of transcalvarial brain herniation, thickness of tissues overlying the craniectomy, diameter of the cerebral ventricle atrium contralateral to the stroke, extension of infarction beyond the craniectomy edges, and the pre- and post-DC midline brain shifts. To avoid potential confounding from medical treatments and additional surgical procedures, we excluded patients with the first CT delayed >30 hours post-DC, resection of infarcted brain, or insertion of an external ventricular drain during DC. The primary outcome in multiple linear regression analysis was the postoperative midline brain shift. RESULTS: We analyzed 72 qualified patients. The average midline brain shift decreased from 8.7 mm pre-DC to 5.4 mm post-DC. The only factors significantly associated with post-DC midline brain shift at the p<0.01 level were preoperative midline shift (coefficient 0.32, standard error 0.10, p=0.002) and extent of transcalvarial brain herniation (coefficient -0.20, standard error 0.05, p<0.001). CONCLUSIONS: In patients with MHI and DC, smaller post-DC midline shift is associated with smaller pre-DC midline brain shift and greater transcalvarial brain herniation. This knowledge may prove helpful in assessing DC candidacy and surgical success. Additional studies to enhance the surgical success of DC are warranted.
Subject(s)
Brain Edema/surgery, Cerebral Infarction/surgery, Decompressive Craniectomy, Hernia/prevention & control, Adult, Brain Edema/diagnostic imaging, Brain Edema/physiopathology, Cerebral Infarction/diagnostic imaging, Cerebral Infarction/physiopathology, Clinical Decision-Making, Decompressive Craniectomy/adverse effects, Female, Georgia, Hernia/diagnostic imaging, Hernia/etiology, Humans, Male, Middle Aged, Recovery of Function, Registries, Retrospective Studies, Risk Assessment, Risk Factors, X-Ray Computed Tomography, Treatment Outcome, Virginia
ABSTRACT
Background: Statins reduce low-density lipoprotein cholesterol (LDL-C) and are efficacious in the prevention of atherosclerotic cardiovascular disease (ASCVD). Dose-response to statins varies among patients and can be modeled using three distinct pharmacological properties: (1) E0 (baseline LDL-C); (2) ED50 (potency: median dose achieving 50% reduction in LDL-C); and (3) Emax (efficacy: maximum LDL-C reduction). However, individualized dose-response and its association with ASCVD events remain unknown. Objective: We analyze the relationship of ED50 and Emax with real-world cardiovascular disease outcomes. Method: We leveraged de-identified electronic health record data to identify individuals exposed to multiple doses of the three most commonly prescribed statins (atorvastatin, simvastatin, or rosuvastatin) within the context of their longitudinal healthcare. We derived ED50 and Emax to quantify the relationship with a composite outcome of ASCVD events and all-cause mortality. Results: We estimated ED50 and Emax for 3,033 unique individuals (atorvastatin: 1,632, simvastatin: 1,089, and rosuvastatin: 312) using a nonlinear, mixed effects dose-response model. Time-to-event analyses revealed that ED50 and Emax are independently associated with the primary endpoint. Hazard ratios were 0.85 (p < 0.01), 0.83 (p < 0.01), and 0.87 (p = 0.10) for ED50 and 1.13 (p < 0.001), 1.06 (p < 0.001), and 1.15 (p = 0.009) for Emax in the atorvastatin, simvastatin, and rosuvastatin cohorts, respectively. Conclusion: The class-wide association of ED50 and Emax with clinical outcomes indicates that these measures influence the risk for ASCVD events in patients on statins.
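The abstract does not state the functional form of its dose-response model. As a minimal sketch, assuming the standard hyperbolic Emax form, the three parameters relate dose to predicted LDL-C as follows. In this form ED50 is the dose producing half of the maximum reduction. The function name `ldl_c` and all numeric values are hypothetical, and the mixed-effects estimation used in the study is not shown.

```python
def ldl_c(dose, e0, emax, ed50):
    """Predicted LDL-C at a daily dose under an assumed hyperbolic Emax model.

    e0:   baseline LDL-C (no drug)
    emax: maximum achievable LDL-C reduction, in the same units as e0
    ed50: dose that produces half of emax
    """
    if dose < 0 or ed50 <= 0:
        raise ValueError("dose must be non-negative and ed50 positive")
    return e0 - emax * dose / (ed50 + dose)

# Hypothetical patient: baseline 160 mg/dL, maximum reduction 80 mg/dL, ED50 10 mg.
for dose in (0, 10, 40):
    print(dose, ldl_c(dose, e0=160.0, emax=80.0, ed50=10.0))
```

Under this form, a lower ED50 means a given dose gets closer to the maximum reduction, while Emax caps how far LDL-C can fall regardless of dose.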
ABSTRACT
Current studies regarding the secondary use of electronic health records (EHR) predominantly rely on domain expertise and existing medical knowledge. Though significant efforts have been devoted to investigating the application of machine learning algorithms in the EHR, efficient and powerful representation of patients is needed to unleash the potential of discovering new medical patterns underlying the EHR. Here, we present an unsupervised method for embedding high-dimensional EHR data at the patient level, aimed at characterizing patient heterogeneity in complex diseases and identifying new disease patterns associated with clinical outcome disparities. Inspired by the architecture of modern language models, specifically transformers with attention mechanisms, we use patient diagnosis and procedure codes as vocabularies and treat each patient as a sentence to perform the patient embedding. We applied this approach to 34,851 unique medical codes across 1,046,649 longitudinal patient events, including 102,739 patients from the electronic Medical Records and GEnomics (eMERGE) Network. The resulting patient vectors demonstrated excellent performance in predicting future disease events (median AUROC = 0.87 within one year) and bulk phenotyping (median AUROC = 0.84). We then illustrated the utility of these patient vectors in revealing heterogeneous comorbidity patterns, exemplified by disease subtypes in colorectal cancer and systemic lupus erythematosus, and capturing distinct longitudinal disease trajectories. External validation using EHR data from the University of Washington confirmed robust model performance, with median AUROCs of 0.83 and 0.84 for bulk phenotyping tasks and disease onset prediction, respectively. Importantly, the model reproduced the clustering results of disease subtypes identified in the eMERGE cohort and uncovered variations in overall mortality among these subtypes.
Together, these results underscore the potential of representation learning in EHRs to enhance patient characterization and associated clinical outcomes, thereby advancing disease forecasting and facilitating personalized medicine.
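The idea of treating medical codes as a vocabulary and each patient as a sentence can be sketched as a tokenization step, shown below. This is only an illustrative preprocessing sketch under assumed conventions: the special tokens, the padding scheme, the helper names `build_vocab` and `encode`, and the example patients are all hypothetical. The transformer model itself is not shown.

```python
def build_vocab(patients, specials=("[PAD]", "[UNK]")):
    """Map special tokens and every observed medical code to an integer id."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    codes = sorted({code for seq in patients.values() for code in seq})
    for code in codes:
        vocab[code] = len(vocab)
    return vocab

def encode(codes, vocab, max_len):
    """Turn one patient's ordered code list into a fixed-length id sequence."""
    ids = [vocab.get(code, vocab["[UNK]"]) for code in codes[:max_len]]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))

# Hypothetical patients: each "sentence" is a time-ordered list of codes.
patients = {
    "p1": ["E11.9", "I10", "99213"],
    "p2": ["I10", "N18.3"],
}
vocab = build_vocab(patients)
encoded = {pid: encode(seq, vocab, max_len=4) for pid, seq in patients.items()}
print(encoded)
```

Sequences in this form are what a language-model-style encoder would consume to produce one vector per patient.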
ABSTRACT
OBJECTIVE: Pediatric patients have different diseases and outcomes than adults; however, existing phecodes do not capture the distinctive pediatric spectrum of disease. We aim to develop specialized pediatric phecodes (Peds-Phecodes) to enable efficient, large-scale phenotypic analyses of pediatric patients. MATERIALS AND METHODS: We adopted a hybrid data- and knowledge-driven approach leveraging electronic health records (EHRs) and genetic data from Vanderbilt University Medical Center to modify the most recent version of phecodes to better capture pediatric phenotypes. First, we compared the prevalence of patient diagnoses in pediatric and adult populations to identify disease phenotypes differentially affecting children and adults. We then used clinical domain knowledge to remove phecodes representing phenotypes unlikely to affect pediatric patients and create new phecodes for phenotypes relevant to the pediatric population. We further compared phenome-wide association study (PheWAS) outcomes replicating known pediatric genotype-phenotype associations between Peds-Phecodes and phecodes. RESULTS: The Peds-Phecodes aggregate 15 533 ICD-9-CM codes and 82 949 ICD-10-CM codes into 2051 distinct phecodes. Peds-Phecodes replicated more known pediatric genotype-phenotype associations than phecodes (248 vs 192 out of 687 SNPs, P < .001). DISCUSSION: We introduce Peds-Phecodes, a high-throughput EHR phenotyping tool tailored for use in pediatric populations. We successfully validated the Peds-Phecodes using genetic replication studies. Our findings also reveal the potential use of Peds-Phecodes in detecting novel genotype-phenotype associations for pediatric conditions. We expect that Peds-Phecodes will facilitate large-scale phenomic and genomic analyses in pediatric populations. CONCLUSION: Peds-Phecodes capture higher-quality pediatric phenotypes and deliver superior PheWAS outcomes compared to phecodes.
Subject(s)
Electronic Health Records, Genome-Wide Association Study, Child, Humans, Genetic Association Studies, Genomics, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Drug repurposing represents an attractive alternative to the costly and time-consuming process of new drug development, particularly for serious, widespread conditions with limited effective treatments, such as Alzheimer's disease (AD). Emerging generative artificial intelligence (GAI) technologies like ChatGPT offer the promise of expediting the review and summary of scientific knowledge. To examine the feasibility of using GAI for identifying drug repurposing candidates, we iteratively tasked ChatGPT with proposing the twenty most promising drugs for repurposing in AD, and tested the top ten for risk of incident AD in exposed and unexposed individuals over age 65 in two large clinical datasets: (1) Vanderbilt University Medical Center and (2) the All of Us Research Program. Among the candidates suggested by ChatGPT, metformin, simvastatin, and losartan were associated with lower AD risk in meta-analysis. These findings suggest GAI technologies can assimilate scientific insights from an extensive Internet-based search space, helping to prioritize drug repurposing candidates and facilitate the treatment of diseases.
ABSTRACT
OBJECTIVES: Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts. MATERIALS AND METHODS: We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (i.e., type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network. RESULTS: GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values). CONCLUSION: GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
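The trade-off between excessively restrictive and overly broad algorithms can be made concrete by comparing the patients an algorithm flags against a reference set. The sketch below assumes both are available as sets of patient identifiers. The function name `recall_and_ppv` and the identifiers are hypothetical, and the study's actual evaluation pipeline is not described in the abstract.

```python
def recall_and_ppv(flagged, reference):
    """Recall and positive predictive value of flagged cases vs. reference cases."""
    flagged, reference = set(flagged), set(reference)
    true_pos = len(flagged & reference)
    recall = true_pos / len(reference) if reference else float("nan")
    ppv = true_pos / len(flagged) if flagged else float("nan")
    return recall, ppv

# Hypothetical patient ids, for illustration only.
reference = {1, 2, 3, 4}
restrictive = {1}                      # misses most true cases
broad = {1, 2, 3, 4, 5, 6, 7, 8}       # catches every case plus many non-cases

print(recall_and_ppv(restrictive, reference))  # low recall, high PPV
print(recall_and_ppv(broad, reference))        # high recall, low PPV
```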
Subject(s)
Algorithms, Electronic Health Records, Phenotype, Humans, Type 2 Diabetes Mellitus, Dementia, Hypothyroidism, Natural Language Processing
ABSTRACT
Background: Alzheimer's disease (AD) is a debilitating neurodegenerative condition with few treatment options available. Drug repurposing studies have sought to identify existing drugs that could be repositioned to treat AD; however, the effectiveness of drug repurposing for AD remains unclear. This review systematically analyzes the progress made in drug repurposing for AD throughout the last decade, summarizing the suggested drug candidates and analyzing changes in the repurposing strategies used over time. We also examine the different types of data that have been leveraged to validate suggested drug repurposing candidates for AD, which to our knowledge has not been previously investigated, although this information may be especially useful in appraising the potential of suggested drug repurposing candidates. We ultimately hope to gain insight into the suggested drugs representing the most promising repurposing candidates for AD. Methods: We queried the PubMed database for AD drug repurposing studies published between 2012 and 2022. 124 articles were reviewed. We used RxNorm to standardize drug names across the reviewed studies, map drugs to their constituent ingredients, and identify prescribable drugs. We used the Anatomical Therapeutic Chemical (ATC) Classification System to group drugs. Results: 573 unique drugs were proposed for repurposing in AD over the last 10 years. These suggested repurposing candidates included drugs acting on the nervous system (17%), antineoplastic and immunomodulating agents (16%), and drugs acting on the cardiovascular system (12%). Clozapine, a second-generation antipsychotic medication, was the most frequently suggested repurposing candidate (N = 6). 61% (76/124) of the reviewed studies performed a validation, yet only 4% (5/124) used real-world data for validation. Conclusion: A large number of potential drug repurposing candidates for AD has accumulated over the last decade.
However, among these drugs, no single drug has emerged as the top candidate, making it difficult to establish research priorities. Validation of drug repurposing hypotheses is inconsistently performed, and real-world data has been critically underutilized for validation. Given the urgent need for new AD therapies, the utility of real-world data in accelerating identification of high-priority candidates for AD repurposing warrants further investigation.
ABSTRACT
OBJECTIVE: Decompressive craniectomy (DC) is an established optional treatment for malignant hemispheric infarction (MHI). We analyzed relevant clinical factors and computed tomography (CT) measurements in patients with DC for MHI to identify predictors of functional outcome 3-6 months after stroke. METHODS: This study was performed at 2 comprehensive stroke centers. The inclusion criteria required DC for MHI, no additional intraoperative procedures (strokectomy or cerebral ventricular drain placement), and documented functional status 3-6 months after the stroke. We classified functional outcome as acceptable if the modified Rankin Scale score was <5, or as unacceptable if it was 5 or 6 (bedbound and totally dependent on others or death). Multiple logistic regression analyzed relevant clinical factors and multiple perioperative CT measurements to identify predictors of acceptable functional outcome. RESULTS: Of 87 identified consecutive patients, 66 met the inclusion criteria. Acceptable functional outcome occurred in 35 of 66 (53%) patients. Likelihood of acceptable functional outcome decreased significantly with increasing age (OR 0.92, 95% CI 0.82-0.97, P = 0.004) and with increasing post-DC midline brain shift (OR 0.78, 95% CI 0.64-0.96, P = 0.016), and decreased non-significantly with left-sided stroke (OR 0.30, 95% CI 0.08-1.10, P = 0.069) and with increasing craniectomy barrier thickness (OR 0.92, 95% CI 0.85-1.01, P = 0.076). CONCLUSIONS: Patient age and the post-DC midline shift may be useful in prognosticating functional outcome after DC for MHI. Stroke side and craniectomy barrier thickness merit further, ideally prospective, testing as outcome predictors.
Subject(s)
Decompressive Craniectomy, Stroke, Cerebral Infarction/diagnostic imaging, Cerebral Infarction/surgery, Decompressive Craniectomy/methods, Humans, Prospective Studies, Stroke/surgery, X-Ray Computed Tomography, Treatment Outcome
ABSTRACT
OBJECTIVES: To test the reliability of three simplified measurements made on computed tomography (CT) scans after decompressive hemicraniectomy (DHC) for malignant hemispheric infarction. PATIENTS AND METHODS: We defined new simple methods to measure the thickness of the soft tissues overlying the craniectomy defect and the extent of infarction beyond the anterior and posterior craniectomy edges on post-DHC CT. Multiple raters independently made the three new CT measurements in 49 patients from two institutions. We used the intraclass correlation coefficient (ICC) to assess interrater agreement (reliability). RESULTS: Between two raters at Augusta University Medical Center, each measuring 21 CT scans, the ICC point estimates were good to excellent (0.83 - 0.92). Among four raters at University of Virginia Medical Center, with three raters measuring each of 28 CT scans, the ICC point estimates were good to excellent (0.87 - 0.95). CONCLUSIONS: The proposed simple methods to obtain three additional CT measurements after DHC in malignant hemispheric infarction have good to excellent reliability in two independent patient samples. The clinical usefulness of these measurements should be investigated.
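The abstract does not say which form of the ICC was used. As a minimal sketch, assuming the one-way random-effects, single-rating form, often written ICC(1,1), interrater agreement can be computed from a table with one row per scan and one column per rating. The function name `icc_oneway` and the measurements are hypothetical.

```python
def icc_oneway(ratings):
    """ICC(1,1): one-way random-effects, single-rating intraclass correlation.

    ratings: list of rows, one per scan, each holding k ratings of that scan.
    Raises ZeroDivisionError if every rating is identical (no variance at all).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 scans, each with the same k >= 2 ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-scan and within-scan mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical tissue-thickness measurements (mm) by two raters on four scans.
scans = [[10.0, 11.0], [14.0, 14.5], [8.0, 8.5], [20.0, 19.0]]
print(round(icc_oneway(scans), 3))
```

Values near 1 mean that most of the variation lies between scans and little between raters measuring the same scan.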
Subject(s)
Brain Infarction/diagnostic imaging, Brain Infarction/surgery, Decompressive Craniectomy/methods, X-Ray Computed Tomography, Adolescent, Adult, Aged, Child, Female, Humans, Male, Middle Aged, Reproducibility of Results, Young Adult
ABSTRACT
Objective: Identifying symptoms highly specific to COVID-19 would improve the clinical and public health response to infectious outbreaks. Here, we describe a high-throughput approach, the Concept-Wide Association Study (ConceptWAS), that systematically scans a disease's clinical manifestations from clinical notes. We used this method to identify symptoms specific to COVID-19 early in the course of the pandemic. Methods: Using the Vanderbilt University Medical Center (VUMC) EHR, we parsed clinical notes through a natural language processing pipeline to extract clinical concepts. We examined the difference in concepts derived from the notes of COVID-19-positive and COVID-19-negative patients on the PCR testing date. We performed ConceptWAS using the cumulative data every two weeks to identify specific COVID-19 symptoms early. Results: We processed 87,753 notes from 19,692 patients (1,483 COVID-19-positive) subjected to COVID-19 PCR testing between March 8, 2020, and May 27, 2020. We found 68 clinical concepts significantly associated with COVID-19. We identified symptoms associated with increased risk of COVID-19, including "absent sense of smell" (odds ratio [OR] = 4.97, 95% confidence interval [CI] = 3.21-7.50), "fever" (OR = 1.43, 95% CI = 1.28-1.59), "cough with fever" (OR = 2.29, 95% CI = 1.75-2.96), and "ageusia" (OR = 5.18, 95% CI = 3.02-8.58). Using ConceptWAS, we were able to detect loss of the sense of smell or taste three weeks prior to their inclusion as symptoms of the disease by the Centers for Disease Control and Prevention (CDC). Conclusion: ConceptWAS is a high-throughput approach for exploring specific symptoms of a disease like COVID-19, with promise for enabling EHR-powered early identification of disease manifestations.