ABSTRACT
One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. Lack of inclusion must be addressed at scale to identify causal disease factors and understand the genetic causes of health disparities. We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations. Fine-mapping identified causal variants at 6318 signals across 613 traits. One-third (n = 2069) were identified in participants from non-European populations. This reveals a broadly similar genetic architecture across populations, highlights genetic insights gained from underrepresented groups, and presents an extensive atlas of genetic associations.
Subjects
Genetic Predisposition to Disease; Genome-Wide Association Study; Quantitative Trait Loci; Veterans; Humans; Male; Genetic Variation; Longitudinal Studies; Polymorphism, Single Nucleotide; United States; United States Department of Veterans Affairs; Female
ABSTRACT
BACKGROUND: Post-COVID-19 condition (colloquially known as "long COVID-19"), characterized as postacute sequelae of SARS-CoV-2, has no universal clinical case definition. Recent efforts have focused on understanding long COVID-19 symptoms, and electronic health record (EHR) data provide a unique resource for understanding this condition. The introduction of the International Classification of Diseases, Tenth Revision (ICD-10) code U09.9 for "Post COVID-19 condition, unspecified" to identify patients with long COVID-19 has provided a method of evaluating this condition in EHRs; however, the accuracy of this code is unclear. OBJECTIVE: This study aimed to characterize the utility and accuracy of the U09.9 code across 3 health care systems (the Veterans Health Administration, the Beth Israel Deaconess Medical Center, and the University of Pittsburgh Medical Center) against patients identified with long COVID-19 via a chart review by operationalizing the World Health Organization (WHO) and Centers for Disease Control and Prevention (CDC) definitions. METHODS: Patients who were COVID-19 positive with either a U07.1 ICD-10 code or a positive polymerase chain reaction test within these health care systems were identified for chart review. Among this cohort, we sampled patients based on two approaches: (1) with a U09.9 code and (2) without a U09.9 code but with a new onset long COVID-19-related ICD-10 code, which allows us to assess the sensitivity of the U09.9 code. To operationalize the long COVID-19 definition based on health agency guidelines, symptoms were grouped into a "core" cluster of 11 commonly reported symptoms among patients with long COVID-19 and an extended cluster that captured all other symptoms by disease domain. Patients having ≥2 symptoms persisting for ≥60 days that were new onset after their COVID-19 infection, with ≥1 symptom in the core cluster, were labeled as having long COVID-19 per chart review.
The code's performance was compared across 3 health care systems and across different time periods of the pandemic. RESULTS: Overall, 900 patient charts were reviewed across 3 health care systems. The prevalence of long COVID-19 among the cohort with the U09.9 ICD-10 code based on the operationalized WHO definition was between 23.2% and 62.4% across these health care systems. We also evaluated a less stringent version of the WHO definition and the CDC definition and observed an increase in the prevalence of long COVID-19 at all 3 health care systems. CONCLUSIONS: This is one of the first studies to evaluate the U09.9 code against a clinical case definition for long COVID-19, as well as the first to apply this definition to EHR data using a chart review approach on a nationwide cohort across multiple health care systems. This chart review approach can be implemented at other EHR systems to further evaluate the utility and performance of the U09.9 code.
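The chart-review labeling rule stated in the Methods (≥2 new-onset symptoms persisting ≥60 days, with ≥1 in the core cluster) can be sketched in Python as follows. This is a minimal illustration, not the study's code: the `Symptom` record, its field names, and the example symptom names are hypothetical, and because the abstract does not enumerate the 11 core symptoms, the core cluster is passed in as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    """Hypothetical chart-review record for one documented symptom."""
    name: str
    duration_days: int  # how long the symptom persisted
    new_onset: bool     # True if it first appeared after the COVID-19 infection


def has_long_covid(symptoms, core_cluster, min_symptoms=2, min_days=60, min_core=1):
    """Apply the operationalized rule: enough qualifying symptoms, some of them core."""
    # A symptom qualifies only if it is new onset and persisted long enough.
    qualifying = {
        s.name for s in symptoms
        if s.new_onset and s.duration_days >= min_days
    }
    n_core = len(qualifying & set(core_cluster))
    return len(qualifying) >= min_symptoms and n_core >= min_core
```

Keeping the thresholds as parameters makes it straightforward to express a less stringent variant of the definition, although the abstract does not say which thresholds its relaxed version changed.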
ABSTRACT
OBJECTIVE: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility. We describe the development of the Centralized Interactive Phenomics Resource (CIPHER) knowledgebase, a comprehensive public-facing phenotype library, which aims to facilitate clinical and health services research. MATERIALS AND METHODS: The platform was designed to collect and catalog EHR-based computable phenotype algorithms from any healthcare system, scale metadata management, facilitate phenotype discovery, and allow for integration of tools and user workflows. Phenomics experts were engaged in the development and testing of the site. RESULTS: The knowledgebase stores phenotype metadata using the CIPHER standard, and definitions are accessible through complex searching. Phenotypes are contributed to the knowledgebase via webform, allowing metadata validation. Data visualization tools linking to the knowledgebase enhance user interaction with content and accelerate phenotype development. DISCUSSION: The CIPHER knowledgebase was developed in the largest healthcare system in the United States and piloted with external partners. The design of the CIPHER website supports a variety of front-end tools and features to facilitate phenotype development and reuse. Health data users are encouraged to contribute their algorithms to the knowledgebase for wider dissemination to the research community, and to use the platform as a springboard for phenotyping. CONCLUSION: CIPHER is a public resource for all health data users available at https://phenomics.va.ornl.gov/ which facilitates phenotype reuse, development, and dissemination of phenotyping knowledge.
Subjects
Electronic Health Records; Phenomics; Phenotype; Knowledge Bases; Algorithms
ABSTRACT
To address gaps in understanding the pathophysiology of Gulf War Illness (GWI), the VA Million Veteran Program (MVP) developed and implemented a survey to MVP enrollees who served in the U.S. military during the 1990-1991 Persian Gulf War (GW). Eligible Veterans were invited via mail to complete a survey assessing health conditions as well as GW-specific deployment characteristics and exposures. We evaluated the representativeness of this GW-era cohort relative to the broader population by comparing demographic, military, and health characteristics between respondents and non-respondents, as well as with all GW-era Veterans who have used Veterans Health Administration (VHA) services and the full population of U.S. GW-deployed Veterans. A total of 109,976 MVP GW-era Veterans were invited to participate and 45,270 (41%) returned a completed survey. Respondents were 84% male, 72% White, 8% Hispanic, with a mean age of 61.6 years (SD = 8.5). Respondents were more likely to be older, White, married, better educated, slightly healthier, and have higher socioeconomic status than non-respondents, but reported similar medical conditions and comparable health status. Although generally similar to all GW-era Veterans using VHA services and the full population of U.S. GW Veterans, respondents included higher proportions of women and military officers, and were slightly older. In conclusion, sample characteristics of the MVP GW-era cohort can be considered generally representative of the broader GW-era Veteran population. The sample represents the largest research cohort of GW-era Veterans established to date and provides a uniquely valuable resource for conducting in-depth studies to evaluate health conditions affecting 1990-1991 GW-era Veterans.
Subjects
Military Personnel; Veterans; Humans; Female; Male; Middle Aged; Gulf War; Health Status; Health Surveys
ABSTRACT
Importance: A significant proportion of SARS-CoV-2 infected individuals experience post-COVID-19 condition months after initial infection. Objective: To determine the rates, clinical setting, risk factors, and symptoms associated with the documentation of International Statistical Classification of Diseases, Tenth Revision (ICD-10) code U09.9 for post-COVID-19 condition after acute infection. Design, Setting, and Participants: This retrospective cohort study was performed within the US Department of Veterans Affairs (VA) health care system. Veterans with a positive SARS-CoV-2 test result between October 1, 2021, the date ICD-10 code U09.9 was introduced, and January 31, 2023 (n = 388 980), and a randomly selected subsample of patients with the U09.9 code (n = 350) whose symptom prevalence was assessed by systematic medical record review, were included in the analysis. Exposure: Positive SARS-CoV-2 test result. Main Outcomes and Measures: Rates, clinical setting, risk factors, and symptoms associated with ICD-10 code U09.9 in the medical record. Results: Among the 388 980 persons with a positive SARS-CoV-2 test, the mean (SD) age was 61.4 (16.1) years; 87.3% were men. In terms of race and ethnicity, 0.8% were American Indian or Alaska Native, 1.4% were Asian, 20.7% were Black, 9.3% were Hispanic or Latino, 1.0% were Native Hawaiian or Other Pacific Islander, and 67.8% were White. Cumulative incidence of U09.9 documentation was 4.79% (95% CI, 4.73%-4.87%) at 6 months and 5.28% (95% CI, 5.21%-5.36%) at 12 months after infection. Factors independently associated with U09.9 documentation included older age, female sex, Hispanic or Latino ethnicity, comorbidity burden, and severe acute infection manifesting by symptoms, hospitalization, or ventilation. Primary vaccination (adjusted hazard ratio [AHR], 0.80 [95% CI, 0.78-0.83]) and booster vaccination (AHR, 0.66 [95% CI, 0.64-0.69]) were associated with a lower likelihood of U09.9 documentation.
Marked differences by geographic region and facility in U09.9 code documentation may reflect local screening and care practices. Among the 350 patients undergoing systematic medical record review, the most common symptoms documented in the medical records among patients with the U09.9 code were shortness of breath (130 [37.1%]), fatigue or exhaustion (78 [22.3%]), cough (63 [18.0%]), reduced cognitive function or brain fog (22 [6.3%]), and change in smell and/or taste (20 [5.7%]). Conclusions and Relevance: In this cohort study of 388 980 veterans, documentation of ICD-10 code U09.9 had marked regional and facility-level variability. Strong risk factors for U09.9 documentation were identified, while vaccination appeared to be protective. Accurate and consistent documentation of U09.9 is needed to maximize its utility in tracking patients for clinical care and research. Future studies should examine the long-term trajectory of individuals with U09.9 documentation.
Subjects
COVID-19; SARS-CoV-2; Male; Humans; Female; Middle Aged; COVID-19/epidemiology; Cohort Studies; Retrospective Studies; International Classification of Diseases; Post-Acute COVID-19 Syndrome; Chronic Disease
ABSTRACT
Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by a multi-population meta-analysis for 2,068 traits derived from electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to the respective African (121,177), Admixed American (59,048), East Asian (6,702), and European (449,042) superpopulations defined by the 1000 Genomes Project. We identified 38,270 independent variants associating with one or more traits at experiment-wide significance (P < 4.6 × 10⁻¹¹) and fine-mapped 6,318 signals identified from 613 traits to single-variant resolution. Among these, a third (2,069) of the associations were found only among participants genetically similar to non-European reference populations, demonstrating the importance of expanding diversity in genetic studies. Our work provides a comprehensive atlas of phenome-wide genetic associations for future studies dissecting the architecture of complex traits in diverse populations.
ABSTRACT
The development of phenotypes using electronic health records is a resource-intensive process. Therefore, the cataloging of phenotype algorithm metadata for reuse is critical to accelerate clinical research. The Department of Veterans Affairs (VA) has developed a standard for phenotype metadata collection which is currently used in the VA phenomics knowledgebase library, CIPHER (Centralized Interactive Phenomics Resource), to capture over 5000 phenotypes. The CIPHER standard improves upon existing phenotype library metadata collection by capturing the context of algorithm development, phenotyping method used, and approach to validation. While the standard was iteratively developed with VA phenomics experts, it is applicable to the capture of phenotypes across healthcare systems. We describe the framework of the CIPHER standard for phenotype metadata collection, the rationale for its development, and its current application to the largest healthcare system in the United States.
Subjects
Electronic Health Records; Phenomics; United States; Phenotype; Algorithms; Metadata
ABSTRACT
OBJECTIVE: Accurately assigning phenotype information to individual patients via computational phenotyping using Electronic Health Records (EHRs) has been seen as the first step towards enabling EHRs for precision medicine research. Chart review labels annotated by clinical experts, also known as "gold standard" labels, are essential for the development and validation of computational phenotyping algorithms. However, given the complexity of EHR systems, the process of chart review is both labor intensive and time consuming. We propose a fully automated algorithm, referred to as pGUESS, to rank EHR notes according to their relevance to a given phenotype. By identifying the most relevant notes, pGUESS can greatly improve the efficiency and accuracy of chart reviews. METHOD: pGUESS uses prior guided semantic similarity to measure the informativeness of a clinical note to a given phenotype. We first select candidate clinical concepts from a pool of comprehensive medical concepts using public knowledge sources and then derive the semantic embedding vector (SEV) for a reference article (SEVref) and each note (SEVnote). The algorithm scores the relevance of a note as the cosine similarity between SEVnote and SEVref. RESULTS: The algorithm was validated against four sets of 200 notes that were manually annotated by clinical experts to assess their informativeness to one of three disease phenotypes. pGUESS algorithm substantially outperforms existing unsupervised approaches for classifying the relevance status with respect to both accuracy and scalability across phenotypes. Averaging over the three phenotypes, the rank correlation between the algorithm ranking and gold standard label was 0.64 for pGUESS, but only 0.47 and 0.35 for the next two best performing algorithms. pGUESS is also much more computationally scalable compared to existing algorithms. 
CONCLUSION: pGUESS algorithm can substantially reduce the burden of chart review and holds potential in improving the efficiency and accuracy of human annotation.
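The scoring step described in the Method (relevance of a note as the cosine similarity between SEVnote and SEVref) can be sketched as below. This is an illustration only: how pGUESS selects concepts and derives the embedding vectors is not covered, and the function names and the dictionary of note vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_notes(sev_ref, sev_notes):
    """Return (note_id, score) pairs sorted from most to least similar to the reference.

    sev_ref   -- embedding vector of the reference article (SEVref)
    sev_notes -- hypothetical mapping of note id -> embedding vector (SEVnote)
    """
    scores = {
        note_id: cosine_similarity(vec, sev_ref)
        for note_id, vec in sev_notes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because cosine similarity ignores vector length, a long note is not favored over a short one merely for containing more text, which is one plausible reason to prefer it for ranking.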
Subjects
Algorithms; Semantics; Electronic Health Records; Humans; Natural Language Processing; Phenotype; Precision Medicine
ABSTRACT
Risk prediction models for cardiovascular disease (CVD) death developed from patients without vascular disease may not be suitable for myocardial infarction (MI) survivors. Prediction of mortality risk after MI may help to guide secondary prevention. Using national electronic record data from the Veterans Health Administration 2002 to 2012, we developed risk prediction models for CVD death and all-cause death based on 5-year follow-up data of 100,601 survivors of MI using Cox proportional hazards models. Model performance was evaluated using a cross-validation approach. During follow-up, there were 31,622 deaths and 12,901 CVD deaths. In men, older age, current smoking, atrial fibrillation, heart failure, peripheral artery disease, and lower body mass index were associated with greater risk of death from CVD or all-causes, and statin treatment, hypertension medication, estimated glomerular filtration rate level, and high body mass index were significantly associated with reduced risk of fatal outcomes. Similar associations and slightly different predictors were observed in women. The estimated Harrell's C-statistics of the final model versus the cross-validation estimates were 0.77 versus 0.77 in men and 0.81 versus 0.77 in women for CVD death. Similarly, the C-statistics were 0.75 versus 0.75 in men, 0.78 versus 0.75 in women for all-cause mortality. The predicted risk of death was well calibrated compared with the observed risk. In conclusion, we developed and internally validated risk prediction models of 5-year risk for CVD and all-cause death for outpatient survivors of MI. Traditional risk factors, co-morbidities, and lack of blood pressure or lipid treatment were all associated with greater risk of CVD and all-cause mortality.
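For readers unfamiliar with Harrell's C-statistic, the discrimination measure reported above, a minimal pure-Python sketch follows. It is not the authors' implementation: it uses the simplest pairing convention (a pair counts only when the subject with the shorter follow-up had an observed event, and pairs with tied follow-up times are skipped), and the argument names are assumptions.

```python
def harrell_c(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk subject failed first.

    times       -- follow-up time for each subject
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model-predicted risk (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must have an observed event strictly before subject j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, which puts the reported 0.75 to 0.81 range in context.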
Subjects
Cardiovascular Diseases; Myocardial Infarction; Veterans; Blood Pressure; Cause of Death; Female; Glomerular Filtration Rate; Humans; Male; Myocardial Infarction/etiology; Risk Factors
ABSTRACT
BACKGROUND: Estimated 10-year atherosclerotic cardiovascular disease (ASCVD) risk in diabetes mellitus patients is used to guide primary prevention, but the performance of risk estimators (2013 Pooled Cohort Equations [PCE] and Risk Equations for Complications of Diabetes [RECODe]) varies across populations. Data from electronic health records could be used to improve risk estimation for a health system's patients. We aimed to evaluate risk equations for initial ASCVD events in US veterans with diabetes mellitus and improve model performance in this population. METHODS AND RESULTS: We studied 183 096 adults with diabetes mellitus and without prior ASCVD who received care in the Veterans Affairs Healthcare System (VA) from 2002 to 2016 with mean follow-up of 4.6 years. We evaluated model discrimination, using Harrell's C statistic, and calibration, using the reclassification χ² test, of the PCE and RECODe equations to predict fatal or nonfatal myocardial infarction or stroke and cardiovascular mortality. We then tested whether model performance was affected by deriving VA-specific β-coefficients. Discrimination of ASCVD events by the PCE was improved by deriving VA-specific β-coefficients (C statistic increased from 0.560 to 0.597) and improved further by including measures of glycemia, renal function, and diabetes mellitus treatment (C statistic, 0.632). Discrimination by the RECODe equations was improved by substituting VA-specific coefficients (C statistic increased from 0.604 to 0.621). Absolute risk estimation by PCE and RECODe equations also improved with VA-specific coefficients; the calibration P increased from <0.001 to 0.08 for PCE and from <0.001 to 0.005 for RECODe, where higher P indicates better calibration. Approximately two-thirds of veterans would meet a guideline indication for high-intensity statin therapy based on the PCE versus only 10% to 15% using VA-fitted models.
CONCLUSIONS: Existing ASCVD risk equations overestimate risk in veterans with diabetes mellitus, potentially impacting guideline-indicated statin therapy. Prediction model performance can be improved for a health system's patients using readily available electronic health record data.
ABSTRACT
Importance: Data are limited regarding statin therapy for primary prevention of atherosclerotic cardiovascular disease (ASCVD) in adults 75 years and older. Objective: To evaluate the role of statin use for mortality and primary prevention of ASCVD in veterans 75 years and older. Design, Setting, and Participants: Retrospective cohort study that used Veterans Health Administration (VHA) data on adults 75 years and older, free of ASCVD, and with a clinical visit in 2002-2012. Follow-up continued through December 31, 2016. All data were linked to Medicare and Medicaid claims and pharmaceutical data. A new-user design was used, excluding those with any prior statin use. Cox proportional hazards models were fit to evaluate the association of statin use with outcomes. Analyses were conducted using propensity score overlap weighting to balance baseline characteristics. Exposures: Any new statin prescription. Main Outcomes and Measures: The primary outcomes were all-cause and cardiovascular mortality. Secondary outcomes included a composite of ASCVD events (myocardial infarction, ischemic stroke, and revascularization with coronary artery bypass graft surgery or percutaneous coronary intervention). Results: Of 326 981 eligible veterans (mean [SD] age, 81.1 [4.1] years; 97% men; 91% white), 57 178 (17.5%) newly initiated statins during the study period. During a mean follow-up of 6.8 (SD, 3.9) years, a total of 206 902 deaths occurred, including 53 296 cardiovascular deaths, with 78.7 and 98.2 total deaths/1000 person-years among statin users and nonusers, respectively (weighted incidence rate difference [IRD]/1000 person-years, -19.5 [95% CI, -20.4 to -18.5]). There were 22.6 and 25.7 cardiovascular deaths per 1000 person-years among statin users and nonusers, respectively (weighted IRD/1000 person-years, -3.1 [95% CI, -3.6 to -2.6]).
For the composite ASCVD outcome there were 123 379 events, with 66.3 and 70.4 events/1000 person-years among statin users and nonusers, respectively (weighted IRD/1000 person-years, -4.1 [95% CI, -5.1 to -3.0]). After propensity score overlap weighting was applied, the hazard ratio was 0.75 (95% CI, 0.74-0.76) for all-cause mortality, 0.80 (95% CI, 0.78-0.81) for cardiovascular mortality, and 0.92 (95% CI, 0.91-0.94) for a composite of ASCVD events when comparing statin users with nonusers. Conclusions and Relevance: Among US veterans 75 years and older and free of ASCVD at baseline, new statin use was significantly associated with a lower risk of all-cause and cardiovascular mortality. Further research, including from randomized clinical trials, is needed to more definitively determine the role of statin therapy in older adults for primary prevention of ASCVD.
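As a quick consistency check on the Results, the incidence rate differences can be recomputed from the per-1000-person-year rates quoted in the abstract. The sketch below only subtracts the reported rates; it does not perform propensity score overlap weighting, and the abstract does not say how the weighting enters the reported rates, so agreement here shows arithmetic consistency and nothing more. The dictionary and function names are hypothetical.

```python
def rate_difference(rate_users, rate_nonusers):
    """Difference in events per 1000 person-years (users minus nonusers), to 1 decimal."""
    return round(rate_users - rate_nonusers, 1)


# Rates per 1000 person-years as quoted in the abstract: (statin users, nonusers).
reported_rates = {
    "all-cause death": (78.7, 98.2),
    "cardiovascular death": (22.6, 25.7),
    "composite ASCVD": (66.3, 70.4),
}

differences = {
    outcome: rate_difference(users, nonusers)
    for outcome, (users, nonusers) in reported_rates.items()
}
```

The three differences come out to -19.5, -3.1, and -4.1 per 1000 person-years, matching the weighted IRDs given in the text.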
Subjects
Atherosclerosis/prevention & control; Cardiovascular Diseases/mortality; Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use; Veterans; Aged; Aged, 80 and over; Cardiovascular Diseases/prevention & control; Cause of Death; Confounding Factors, Epidemiologic; Female; Humans; Male; Mortality; Propensity Score; Retrospective Studies; United States/epidemiology; Veterans Health Services
ABSTRACT
Importance: Current guidelines recommend statin therapy for millions of US residents for the primary prevention of atherosclerotic cardiovascular disease (ASCVD). It is unclear whether traditional prediction models that do not account for current widespread statin use are sufficient for risk assessment. Objectives: To examine the performance of the Pooled Cohort Equations (PCE) for 5-year ASCVD risk estimation in a contemporary cohort and to test the hypothesis that inclusion of statin therapy improves model performance. Design, Setting, and Participants: This cohort study included adult patients in the Veterans Affairs health care system without baseline ASCVD. Using national electronic health record data, 3 Cox proportional hazards models were developed to estimate 5-year ASCVD risk, as follows: the variables and published β coefficients from the PCE (model 1), the PCE variables with cohort-derived β coefficients (model 2), and model 2 plus baseline statin use (model 3). Data were collected from January 2002 to December 2012 and analyzed from June 2016 to March 2020. Exposures: Traditional ASCVD risk factors from the PCE plus baseline statin use. Main Outcomes and Measures: Incident ASCVD and ASCVD mortality. Results: Of 1 672 336 patients in the cohort (mean [SD] baseline age 58.0 [13.8] years, 1 575 163 [94.2%] men, 1 383 993 [82.8%] white), 312 155 (18.7%) were receiving statin therapy at baseline. During 5 years of follow-up, 66 605 (4.0%) experienced an ASCVD event, and 31 878 (1.9%) experienced ASCVD death. Compared with the original PCE, the cohort-derived model did not improve model discrimination in any of the 4 age-sex strata but did improve model calibration.
The PCE overestimated ASCVD risk compared with the cohort-derived model; 211 237 of 1 136 161 white men (18.6%), 29 634 of 218 463 black men (13.6%), 1741 of 44 399 white women (3.9%), and 836 of 16 034 black women (5.2%) would be potentially eligible for statin therapy under the PCE but not the cohort-derived model. When added to the cohort-derived model, baseline statin therapy was associated with a 7% (95% CI, 5%-9%) lower relative risk of ASCVD and a 25% (95% CI, 23%-28%) lower relative risk for ASCVD death. Conclusions and Relevance: In this study, lower than expected rates of incident ASCVD events in a contemporary national cohort were observed. The PCE overestimated ASCVD risk, and more than 15% of patients would be potentially eligible for statin therapy based on the PCE but not on a cohort-derived model. In the statin era, health care professionals and systems should base ASCVD risk assessment on models calibrated to their patient populations.
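The stratum-level percentages and the "more than 15%" summary can be reproduced from the counts quoted above. The sketch below is a simple arithmetic check; the variable names are hypothetical, and the pooled figure is computed over the four race-sex strata listed in the abstract, which is an assumption about how the summary was derived (the four strata total fewer patients than the full cohort).

```python
# (patients eligible under the PCE but not the cohort-derived model, stratum size)
reclassified = {
    "white men": (211_237, 1_136_161),
    "black men": (29_634, 218_463),
    "white women": (1_741, 44_399),
    "black women": (836, 16_034),
}


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


per_stratum = {
    group: percent(count, size) for group, (count, size) in reclassified.items()
}

# Pool the four strata to get a single overall proportion.
total_count = sum(count for count, _ in reclassified.values())
total_size = sum(size for _, size in reclassified.values())
overall = percent(total_count, total_size)
```

The per-stratum values come out to 18.6%, 13.6%, 3.9%, and 5.2%, and the pooled value to 17.2%, in line with the abstract's statement that more than 15% of patients would be reclassified.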
Subjects
Coronary Artery Disease; Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use; Veterans Health/statistics & numerical data; Cohort Studies; Coronary Artery Disease/epidemiology; Coronary Artery Disease/therapy; Female; Heart Disease Risk Factors; Humans; Male; Middle Aged; Models, Statistical; Risk Assessment/methods; Risk Factors; United States/epidemiology; United States Department of Veterans Affairs
ABSTRACT
Electronic health records (EHRs) provide a wealth of data for phenotype development in population health studies, and researchers invest considerable time to curate data elements and validate disease definitions. The ability to reproduce well-defined phenotypes increases data quality, comparability of results and expedites research. In this paper, we present a standardized approach to organize and capture phenotype definitions, resulting in the creation of an open, online repository of phenotypes. This resource captures phenotype development, provenance and process from the Million Veteran Program, a national mega-biobank embedded in the Veterans Health Administration (VHA). To ensure that the repository is searchable, extendable, and sustainable, it is necessary to develop both a proper digital catalog architecture and underlying metadata infrastructure to enable effective management of the data fields required to define each phenotype. Our methods provide a resource for VHA investigators and a roadmap for researchers interested in standardizing their phenotype definitions to increase portability.
ABSTRACT
The Department of Veterans Affairs (VA) archives one of the largest corpora of clinical notes in its corporate data warehouse as unstructured text data. Unstructured text readily supports keyword searches and regular expressions, but these simple searches often do not adequately support the complex queries that need to be performed on notes. For example, a researcher may want all notes with a Duke Treadmill Score of less than five, or all patients who smoke more than one pack per day. Range queries like these and more can be supported by modelling text as semi-structured documents. In this paper, we implement a scalable machine learning pipeline that models plain medical text as useful semi-structured documents. We improve on existing models, achieve an F1-score of 0.912, and scale our methods to the entire VA corpus.
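To make the idea of a range query over notes concrete, the sketch below turns a free-text mention into a numeric field and then filters on it. It is not the paper's machine learning pipeline: a hand-written regular expression stands in for the learned extraction step, and the pattern, function names, and sample notes are all hypothetical.

```python
import re

# Hypothetical pattern: "Duke Treadmill Score", an optional connective, then a number.
SCORE_PATTERN = re.compile(
    r"duke treadmill score\s*(?:of|is|was|:|=)?\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_scores(note_text):
    """Return every Duke Treadmill Score value mentioned in a note, as floats."""
    return [float(value) for value in SCORE_PATTERN.findall(note_text)]


def notes_with_score_below(notes, threshold):
    """Range query: ids of notes that mention at least one score below the threshold.

    notes -- mapping of note id -> raw note text
    """
    return [
        note_id
        for note_id, text in notes.items()
        if any(score < threshold for score in extract_scores(text))
    ]
```

Once the value is a number rather than a substring, comparisons such as "less than five" become ordinary filters. A regular expression like this one is brittle against varied phrasing, which is the gap a learned semi-structured model is meant to close.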
ABSTRACT
INTRODUCTION: Previous studies of the relationship between fried food consumption and coronary artery disease (CAD) have yielded conflicting results. We tested the hypothesis that frequent fried food consumption is associated with a higher risk of incident CAD events in Million Veteran Program (MVP) participants. METHODS: Veterans Health Administration electronic health record data were linked to questionnaires completed at MVP enrollment. Self-reported fried food consumption at baseline was categorized as <1, 1-3, or 4-6 times per week, or daily. The outcome of interest was non-fatal myocardial infarction or CAD events. We fitted a Cox regression model adjusting for age, sex, race, education, exercise, smoking, and alcohol consumption. RESULTS: Of 154,663 MVP enrollees with survey data, mean age was 64 years and 90% were men. During a mean follow-up of approximately 3 years, there were 6,725 CAD events. There was a positive linear relationship between frequency of fried food consumption and risk of CAD (p for trend 0.0015). Multivariable adjusted hazard ratios (95% CI) were 1.0 (ref), 1.07 (1.01-1.13), 1.08 (1.01-1.16), and 1.14 (1.03-1.27) across consecutive increasing categories of fried food intake. CONCLUSIONS: In a large national cohort of U.S. Veterans, fried food consumption has a positive, dose-dependent association with CAD.
Subjects
Cooking/methods; Coronary Artery Disease/epidemiology; Dietary Fats, Unsaturated/administration & dosage; Veterans/statistics & numerical data; Cohort Studies; Female; Follow-Up Studies; Humans; Incidence; Male; Middle Aged; Risk Assessment; Self Report; Surveys and Questionnaires; United States/epidemiology
ABSTRACT
Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
Subjects
Data Analysis; Electronic Health Records/statistics & numerical data; High-Throughput Screening Assays/methods; Algorithms; Data Interpretation, Statistical; Humans; Machine Learning; Natural Language Processing; Phenotype
ABSTRACT
Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs.
Subjects
Ethnicity/genetics , Genome-Wide Association Study , Racial Groups/genetics , Algorithms , Humans , Machine Learning , Support Vector Machine
ABSTRACT
BACKGROUND: Habitual alcohol use can be an indicator of alcohol dependence, which is associated with a wide range of serious health problems. METHODS: We completed a genome-wide association study in 126,936 European American and 17,029 African American subjects in the Veterans Affairs Million Veteran Program for a quantitative phenotype based on maximum habitual alcohol consumption. RESULTS: ADH1B, on chromosome 4, was the lead locus for both populations: for the European American sample, rs1229984 (p = 4.9 × 10⁻⁴⁷); for African American, rs2066702 (p = 2.3 × 10⁻¹²). In the European American sample, we identified three additional genome-wide-significant maximum habitual alcohol consumption loci: on chromosome 17, rs77804065 (p = 1.5 × 10⁻¹²), at CRHR1 (corticotropin-releasing hormone receptor 1); the protein product of this gene is involved in stress and immune responses; and on chromosomes 8 and 10. European American and African American samples were then meta-analyzed; the associated region at CRHR1 increased in significance to 1.02 × 10⁻¹³, and we identified two additional genome-wide significant loci, FGF14 (p = 9.86 × 10⁻⁹) (chromosome 13) and a locus on chromosome 11. Besides ADH1B, none of the five loci have prior genome-wide significant support. Post-genome-wide association study analysis identified genetic correlation to other alcohol-related traits, smoking-related traits, and many others. Replications were observed in UK Biobank data. Genetic correlation between maximum habitual alcohol consumption and alcohol dependence was 0.87 (p = 4.78 × 10⁻⁹). Enrichment for cell types included dopaminergic and gamma-aminobutyric acidergic neurons in midbrain, and pancreatic delta cells. CONCLUSIONS: The present study supports five novel alcohol-use risk loci, with particularly strong statistical support for CRHR1. Additionally, we provide novel insight regarding the biology of harmful alcohol use.
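The abstract above reports that the two ancestry-specific samples were meta-analyzed but does not name the method. As an illustration only, the sketch below shows fixed-effect inverse-variance weighting, one common way to combine per-cohort effect estimates. The function name `fixed_effect_meta`, the effect sizes, and the standard errors are hypothetical and are not taken from the study; the 5e-8 threshold is the conventional genome-wide significance level, assumed here.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Combine per-cohort effect estimates by inverse-variance weighting.

    Hypothetical helper for illustration; not the study's actual pipeline.
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, p_value

# Made-up per-cohort estimates for a single variant (not study data).
beta, se, z, p = fixed_effect_meta([0.10, 0.20], [0.02, 0.04])
genome_wide_significant = p < 5e-8  # conventional threshold, assumed
```

Under this weighting the larger cohort (smaller standard error) dominates the pooled estimate, which is one way a signal that is already strong in one sample can gain significance once a second sample with a concordant effect is added.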
Subjects
Alcohol Drinking/genetics , Black or African American/statistics & numerical data , Receptors, Corticotropin-Releasing Hormone/genetics , White People/statistics & numerical data , Adolescent , Adult , Aged , Aged, 80 and over , Alcohol Drinking/ethnology , Alcoholism/ethnology , Alcoholism/genetics , Female , Genome-Wide Association Study , Humans , Linear Models , Male , Middle Aged , United States , Veterans , Young Adult
ABSTRACT
We developed an algorithm for identifying U.S. veterans with a history of posttraumatic stress disorder (PTSD), using the Department of Veterans Affairs (VA) electronic medical record (EMR) system. This work was motivated by the need to create a valid EMR-based phenotype to identify thousands of cases and controls for a genome-wide association study of PTSD in veterans. We used manual chart review (n = 500) as the gold standard. For both the algorithm and chart review, three classifications were possible: likely PTSD, possible PTSD, and likely not PTSD. We used Lasso regression with cross-validation to select statistically significant predictors of PTSD from the EMR and then generate a predicted probability score of being a PTSD case for every participant in the study population (range: 0-1.00). Comparing the performance of our probabilistic approach (Lasso algorithm) to a rule-based approach (International Classification of Diseases [ICD] algorithm), the Lasso algorithm showed modestly higher overall percent agreement with chart review than the ICD algorithm (80% vs. 75%), higher sensitivity (0.95 vs. 0.84), and higher accuracy (AUC = 0.95 vs. 0.90). We applied a 0.7 probability cut-point to the Lasso results to determine final PTSD case-control status for the VA population. The final algorithm had a 0.99 sensitivity, 0.99 specificity, 0.95 positive predictive value, and 1.00 negative predictive value for PTSD classification (grouping possible PTSD and likely not PTSD) as determined by chart review. This algorithm may be useful for other research and quality improvement endeavors within the VA.
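The cut-point step and the reported validation metrics can be illustrated with a minimal sketch. It assumes each participant has a predicted probability and a binary chart-review label (with "possible PTSD" and "likely not PTSD" grouped as non-cases, as in the abstract). The function names, the choice of `>=` at the cut-point, and the example values are assumptions for illustration, not the authors' code or data.

```python
def classify(probabilities, cut_point=0.7):
    """Label a participant a case when the predicted probability meets the cut-point.

    Using >= (rather than >) at the boundary is an assumption.
    """
    return [p >= cut_point for p in probabilities]

def diagnostic_metrics(predicted, truth):
    """Compute sensitivity, specificity, PPV, and NPV against gold-standard labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)

    def ratio(numerator, denominator):
        # Return None when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Made-up probabilities and chart-review labels (True = likely PTSD).
scores = [0.95, 0.80, 0.72, 0.65, 0.40, 0.10, 0.05, 0.75]
chart_review = [True, True, True, True, False, False, False, False]
metrics = diagnostic_metrics(classify(scores), chart_review)
```

Raising the cut-point generally trades sensitivity for positive predictive value, so the threshold chosen depends on whether missed cases or false cases are more costly for the downstream study.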
Spanish Abstracts by Asociación Chilena de Estrés Traumático (ACET) Validation of an electronic medical record-based algorithm for identifying posttraumatic stress disorder in U.S. veterans. VALIDATION OF A PTSD ALGORITHM We developed an algorithm to identify U.S. veterans with a history of posttraumatic stress disorder (PTSD), using the electronic medical record (EMR) system of the Department of Veterans Affairs (VA). This work was motivated by the need to create a valid EMR-based phenotype to identify thousands of cases and controls for a genome-wide association study of PTSD in veterans. We used manual chart review (n = 500) as the gold standard. For both the algorithm and the chart review, three classifications were possible: likely PTSD, possible PTSD, and likely not PTSD. We used Lasso regression with cross-validation to select statistically significant predictors of PTSD from the EMR and then generate a predicted probability score of being a PTSD case for each participant in the study population (range: 0-1.00). Comparing the performance of our probabilistic approach (Lasso algorithm) with a rule-based approach (International Classification of Diseases [ICD] algorithm), the Lasso algorithm showed modestly higher overall percent agreement with chart review than the ICD algorithm (80% vs. 75%), higher sensitivity (0.95 vs. 0.84), and higher accuracy (AUC = 0.95 vs. 0.90). We applied a 0.7 probability cut-point to the Lasso results to determine final PTSD case-control status for the VA population.
The final algorithm had a sensitivity of 0.99, a specificity of 0.99, a positive predictive value of 0.95, and a negative predictive value of 1.00 for PTSD classification (grouping possible PTSD and likely not PTSD) as determined by chart review. This algorithm may be useful for other research and quality improvement efforts within the VA.