ABSTRACT
BACKGROUND: Sequencing Mendelian arrhythmia genes in individuals without an indication for arrhythmia genetic testing can identify carriers of pathogenic or likely pathogenic (P/LP) variants. However, the extent to which these variants are associated with clinically meaningful phenotypes before or after return of variant results is unclear. In addition, the majority of discovered variants are currently classified as variants of uncertain significance, limiting clinical actionability. METHODS: The eMERGE-III study (Electronic Medical Records and Genomics Phase III) is a multicenter prospective cohort that included 21 846 participants without previous indication for cardiac genetic testing. Participants were sequenced for 109 Mendelian disease genes, including 10 linked to arrhythmia syndromes. Variant carriers were assessed with electronic health record-derived phenotypes and follow-up clinical examination. Selected variants of uncertain significance (n=50) were characterized in vitro with automated electrophysiology experiments in HEK293 cells. RESULTS: As previously reported, 3.0% of participants had P/LP variants in the 109 genes. Herein, we report 120 participants (0.6%) with P/LP arrhythmia variants. Compared with noncarriers, arrhythmia P/LP carriers had a significantly higher burden of arrhythmia phenotypes in their electronic health records. Fifty-four participants had variant results returned. Nineteen of these 54 participants had inherited arrhythmia syndrome diagnoses (primarily long-QT syndrome), and 12 of these 19 diagnoses were made only after variant results were returned (0.05%). After in vitro functional evaluation of 50 variants of uncertain significance, we reclassified 11 variants: 3 to likely benign and 8 to P/LP. CONCLUSIONS: Genome sequencing in a large population without indication for arrhythmia genetic testing identified phenotype-positive carriers of variants in congenital arrhythmia syndrome disease genes. 
As the genomes of large numbers of people are sequenced, the disease risk from rare variants in arrhythmia genes can be assessed by integrating genomic screening, electronic health record phenotypes, and in vitro functional studies. REGISTRATION: URL: https://www.clinicaltrials.gov; Unique identifier: NCT03394859.
Subjects
Arrhythmias, Cardiac , Genetic Testing , Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/genetics , Genetic Predisposition to Disease , Genetic Testing/methods , Genomics , HEK293 Cells , Humans , Phenotype , Prospective Studies
ABSTRACT
We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.
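The three performance measures named in this abstract (AUC, PPV, and sensitivity) can be illustrated with a minimal sketch. The scores and labels below are made-up toy values, not study data, and the function names are hypothetical; the abstract does not describe the authors' software.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv_and_sensitivity(scores, labels, threshold):
    """Classify score >= threshold as a predicted event, then compare
    against the adjudicated (gold-standard) labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Toy example (hypothetical values): 1 = validated anaphylaxis event
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]

print(auc(toy_scores, toy_labels))
print(ppv_and_sensitivity(toy_scores, toy_labels, threshold=0.65))
```

Raising the threshold generally trades sensitivity for PPV, which is the trade-off behind the single operating point reported in the abstract.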
Subjects
Anaphylaxis , Natural Language Processing , Humans , Anaphylaxis/diagnosis , Anaphylaxis/epidemiology , Machine Learning , Algorithms , Emergency Service, Hospital , Electronic Health Records
ABSTRACT
BACKGROUND: Acute pancreatitis is a serious gastrointestinal disease that is an important target for drug safety surveillance. Little is known about the accuracy of ICD-10 codes for acute pancreatitis in the United States, or their performance in specific clinical settings. We conducted a validation study to assess the accuracy of acute pancreatitis ICD-10 diagnosis codes in inpatient, emergency department (ED), and outpatient settings. METHODS: We reviewed electronic medical records for encounters with acute pancreatitis diagnosis codes in an integrated healthcare system from October 2015 to December 2019. Trained abstractors and physician adjudicators determined whether events met criteria for acute pancreatitis. RESULTS: Out of 1,844 eligible events, we randomly sampled 300 for review. Across all clinical settings, 182 events met validation criteria for an overall positive predictive value (PPV) of 61% (95% confidence intervals [CI] = 55, 66). The PPV was 87% (95% CI = 79, 92%) for inpatient codes, but only 45% for ED (95% CI = 35, 54%) and outpatient (95% CI = 34, 55%) codes. ED and outpatient encounters accounted for 43% of validated events. Acute pancreatitis codes from any encounter type with lipase >3 times the upper limit of normal had a PPV of 92% (95% CI = 86, 95%) and identified 85% of validated events (95% CI = 79, 89%), while codes with lipase <3 times the upper limit of normal had a PPV of only 22% (95% CI = 16, 30%). CONCLUSIONS: These results suggest that ICD-10 codes accurately identified acute pancreatitis in the inpatient setting, but not in the ED and outpatient settings. Laboratory data substantially improved algorithm performance.
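The overall PPV reported above (182 validated events out of 300 reviewed) can be reproduced with a short calculation. The abstract does not say which confidence interval method was used; the Wilson score interval below is an assumption chosen for illustration.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (assumed method;
    the abstract does not name the interval used)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts taken from the abstract: 182 of 300 sampled events were validated
ppv = 182 / 300
low, high = wilson_interval(182, 300)
print(f"PPV = {ppv:.1%}, 95% CI = ({low:.1%}, {high:.1%})")
```

Under this assumption the interval comes out at roughly 55% to 66%, in line with the reported values.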
Subjects
Delivery of Health Care, Integrated , Pancreatitis , Adult , Humans , United States/epidemiology , Acute Disease , Pancreatitis/diagnosis , Pancreatitis/epidemiology , International Classification of Diseases , Predictive Value of Tests , Lipase
ABSTRACT
Carotid artery atherosclerotic disease (CAAD) is a risk factor for stroke. We used a genome-wide association study (GWAS) approach to discover genetic variants associated with CAAD in participants in the electronic Medical Records and Genomics (eMERGE) Network. We identified adult CAAD cases with unilateral or bilateral carotid artery stenosis and controls without evidence of stenosis from electronic health records at eight eMERGE sites. We performed GWAS with a model adjusting for age, sex, study site, and genetic principal components of ancestry. In eMERGE we found 1793 CAAD cases and 17,958 controls. Two loci reached genome-wide significance, on chr6 in LPA (rs10455872, odds ratio [OR] (95% confidence interval [CI]) = 1.50 (1.30-1.73), p = 2.1 × 10⁻⁸) and on chr7, an intergenic single nucleotide variant (SNV; rs6952610, OR (95% CI) = 1.25 (1.16-1.36), p = 4.3 × 10⁻⁸). The chr7 association remained significant in the presence of the LPA SNV as a covariate. The LPA SNV was also associated with coronary heart disease (CHD; 4199 cases and 11,679 controls) in this study (OR (95% CI) = 1.27 (1.13-1.43), p = 5 × 10⁻⁵) but the chr7 SNV was not (OR (95% CI) = 1.03 (0.97-1.09), p = .37). Both variants replicated in UK Biobank. Elevated lipoprotein(a) concentrations ([Lp(a)]) and LPA variants associated with elevated [Lp(a)] have previously been associated with CAAD and CHD, including rs10455872. With electronic health record phenotypes in eMERGE and UK Biobank, we replicated a previously known association and identified a novel locus associated with CAAD.
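The reported odds ratios, confidence intervals, and p values are linked by the usual Wald approximation on the log-odds scale. The sketch below back-calculates an approximate p value from a rounded OR and CI; it is an illustration only, so the results differ slightly from the published p values. The 5 × 10⁻⁸ threshold is the conventional genome-wide cutoff and is an assumption here, since the abstract does not state it.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold (assumption, not from the abstract)


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald p value from an OR and its 95% CI."""
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


# Values copied from the abstract
results = {
    "rs10455872 (CAAD)": p_from_or_ci(1.50, 1.30, 1.73),
    "rs6952610 (CAAD)": p_from_or_ci(1.25, 1.16, 1.36),
    "rs6952610 (CHD)": p_from_or_ci(1.03, 0.97, 1.09),
}
for name, (z, p) in results.items():
    flag = "genome-wide significant" if p < GENOME_WIDE_ALPHA else "not significant"
    print(f"{name}: z = {z:.2f}, p = {p:.2g} ({flag})")
```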
Subjects
Carotid Stenosis , Genome-Wide Association Study , Electronic Health Records , Genetic Predisposition to Disease , Genomics , Humans , Lipoprotein(a)/genetics , Models, Genetic , Polymorphism, Single Nucleotide
ABSTRACT
As clinical testing for Mendelian causes of colorectal cancer (CRC) is largely driven by recognition of family history and early age of onset, the rate of such findings among individuals with prevalent CRC not recognized to have these features is largely unknown. We evaluated actionable genomic findings in community-based participants ascertained by three phenotypes: (1) CRC, (2) one or more adenomatous colon polyps, and (3) control participants over age 59 years without CRC or colon polyps. These participants underwent sequencing for a panel of genes that included colorectal cancer/polyp (CRC/P)-associated and actionable incidental findings genes. Those with CRC had a 3.8% rate of positive results (pathogenic or likely pathogenic) for a CRC-associated gene variant, despite generally being older at CRC onset (mean 72 years). Those ascertained for polyps had a 0.8% positive rate and those with no CRC/P had a positive rate of 0.2%. Though incidental finding rates unrelated to colon cancer were similar for all groups, our positive rate for cardiovascular findings exceeds disease prevalence, suggesting variant interpretation challenges or low penetrance in these genes. The rate of HFE c.845G>A (p.Cys282Tyr) homozygotes in the CRC group reinforces a previously reported, but relatively unexplored, association between hemochromatosis and CRC. These results in a general clinical population suggest that current testing strategies could be improved in order to better detect Mendelian CRC-associated conditions. These data also underscore the need for additional functional and familial evidence to clarify the pathogenicity and penetrance of variants deemed pathogenic or likely pathogenic, particularly among the actionable genes associated with cardiovascular disease.
Subjects
Colonic Polyps/genetics , Colorectal Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged
ABSTRACT
BACKGROUND: Currently available medications for chronic osteoarthritis pain are only moderately effective, and their use is limited in many patients because of serious adverse effects and contraindications. The primary surgical option for osteoarthritis is total joint replacement (TJR). The objectives of this study were to describe the treatment history of patients with osteoarthritis receiving prescription pain medications and/or intra-articular corticosteroid injections, and to estimate the incidence of TJR in these patients. METHODS: This retrospective, multicenter, cohort study utilized health plan administrative claims data (January 1, 2013, through December 31, 2019) of adult patients with osteoarthritis in the Innovation in Medical Evidence Development and Surveillance Distributed Database, a subset of the US FDA Sentinel Distributed Database. Patients were analyzed in two cohorts: those with prevalent use of "any pain medication" (prescription non-steroidal anti-inflammatory drugs [NSAIDs], opioids, and/or intra-articular corticosteroid injections) using only the first qualifying dispensing (index date); and those with prevalent use of "each specific pain medication class" with all qualifying treatment episodes identified. RESULTS: Among 1 992 670 prevalent users of "any pain medication", pain medications prescribed on the index date were NSAIDs (596 624 [29.9%] patients), opioids (1 161 806 [58.3%]), and intra-articular corticosteroids (323 459 [16.2%]). Further, 92 026 patients received multiple pain medications on the index date, including 71 632 (3.6%) receiving both NSAIDs and opioids. Altogether, 20.6% of patients used an NSAID at any time following an opioid index dispensing and 17.2% used an opioid following an NSAID index dispensing. 
The TJR incidence rates per 100 person-years (95% confidence interval [CI]) were 3.21 (95% CI: 3.20-3.23) in the "any pain medication" user cohort, and among those receiving "each specific pain medication class" were NSAIDs, 4.63 (95% CI: 4.58-4.67); opioids, 7.45 (95% CI: 7.40-7.49); and intra-articular corticosteroids, 8.05 (95% CI: 7.97-8.13). CONCLUSIONS: In patients treated with prescription medications for osteoarthritis pain, opioids were more commonly prescribed at index than NSAIDs and intra-articular corticosteroid injections. Of the pain medication classes examined, the incidence of TJR was highest in patients receiving intra-articular corticosteroids and lowest in patients receiving NSAIDs.
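An incidence rate per 100 person-years, as reported above, is the event count divided by the accumulated follow-up time, scaled by 100. The abstract gives neither the event counts nor the person-years, so the numbers below are hypothetical and chosen only to reproduce a point estimate of 3.21; the log-scale normal approximation for the interval is likewise an assumption.

```python
import math


def incidence_rate_per_100py(events, person_years, z=1.959964):
    """Incidence rate per 100 person-years with an approximate 95% CI.
    Uses SE(log rate) ~ 1 / sqrt(events), a common Poisson approximation."""
    rate = events / person_years * 100
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Hypothetical counts (not from the study): 321 TJR events in 10,000 person-years
rate, low, high = incidence_rate_per_100py(events=321, person_years=10_000)
print(f"{rate:.2f} per 100 person-years (95% CI: {low:.2f}-{high:.2f})")
```

With these toy counts the interval is much wider than the published one; the narrow intervals in the abstract reflect the far larger number of events in a cohort of nearly two million patients.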
Subjects
Arthroplasty, Replacement , Chronic Pain , Osteoarthritis , Adrenal Cortex Hormones/adverse effects , Adult , Analgesics, Opioid/therapeutic use , Anti-Inflammatory Agents, Non-Steroidal , Arthroplasty, Replacement/adverse effects , Chronic Pain/drug therapy , Chronic Pain/epidemiology , Cohort Studies , Humans , Incidence , Osteoarthritis/drug therapy , Osteoarthritis/epidemiology , Osteoarthritis/surgery , Retrospective Studies
ABSTRACT
BACKGROUND: Patients and their loved ones often report symptoms or complaints of cognitive decline that clinicians note in free clinical text, but no structured screening or diagnostic data are recorded. These symptoms/complaints may be signals that predict who will go on to be diagnosed with mild cognitive impairment (MCI) and ultimately develop Alzheimer's Disease or related dementias. Our objective was to develop a natural language processing system and prediction model for identification of MCI from clinical text in the absence of screening or other structured diagnostic information. METHODS: There were two populations of patients: 1794 participants in the Adult Changes in Thought (ACT) study and 2391 patients in the general population of Kaiser Permanente Washington. All individuals had standardized cognitive assessment scores. We excluded patients with a diagnosis of Alzheimer's Disease, Dementia or use of donepezil. We manually annotated 10,391 clinic notes to train the NLP model. Standard Python code was used to extract phrases from notes and map each phrase to a cognitive functioning concept. Concepts derived from the NLP system were used to predict future MCI. The prediction model was trained on the ACT cohort and 60% of the general population cohort with 40% withheld for validation. We used a least absolute shrinkage and selection operator logistic regression approach (LASSO) to fit a prediction model with MCI as the prediction target. Using the predicted case status from the LASSO model and known MCI from standardized scores, we constructed receiver operating curves to measure model performance. RESULTS: Chart abstraction identified 42 MCI concepts. Prediction model performance in the validation data set was modest with an area under the curve of 0.67. Setting the cutoff for correct classification at 0.60, the classifier yielded sensitivity of 1.7%, specificity of 99.7%, PPV of 70% and NPV of 70.5% in the validation cohort. 
DISCUSSION AND CONCLUSION: Although the sensitivity of the machine learning model was poor, negative predictive value was high, an important characteristic of models used for population-based screening. While an AUC of 0.67 is generally considered moderate performance, it is also comparable to several tests that are widely used in clinical practice.
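The abstract mentions Python code that extracts phrases from notes and maps each phrase to a cognitive functioning concept. A minimal sketch of that idea follows. The patterns, concept names, and example note are all hypothetical; the study's actual 42 concepts and extraction rules are not given in the abstract.

```python
import re

# Hypothetical phrase patterns mapped to hypothetical concept labels
CONCEPT_PATTERNS = {
    "memory_complaint": re.compile(
        r"\b(memory (loss|problems?|complaints?)|forgetful(ness)?)\b", re.I),
    "word_finding_difficulty": re.compile(
        r"\b(word[- ]finding (difficulty|problems?)|trouble finding words)\b", re.I),
    "family_concern": re.compile(
        r"\b(wife|husband|daughter|son|family) (is |are )?(concerned|worried) about\b", re.I),
}


def extract_concepts(note_text):
    """Return {concept: [matched phrases]} for every concept found in the note."""
    found = {}
    for concept, pattern in CONCEPT_PATTERNS.items():
        matches = [m.group(0) for m in pattern.finditer(note_text)]
        if matches:
            found[concept] = matches
    return found


def concept_features(note_text):
    """Binary indicator per concept, usable as inputs to a penalized
    logistic regression such as LASSO."""
    found = extract_concepts(note_text)
    return {concept: int(concept in found) for concept in CONCEPT_PATTERNS}


note = "Patient reports forgetfulness; daughter is concerned about her memory problems."
print(extract_concepts(note))
print(concept_features(note))
```

A production system would also need negation handling ("denies memory loss") and attribution of the complaint to the right person, neither of which this sketch attempts.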
Subjects
Alzheimer Disease , Cognitive Dysfunction , Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Humans , Machine Learning , Mass Screening , Natural Language Processing
ABSTRACT
INTRODUCTION: Currently, one of the commonly used methods for disseminating electronic health record (EHR)-based phenotype algorithms is providing a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which leads to ambiguity and vagueness. METHODS: This study examines incidents of under-specification that occurred during the implementation of 34 narrative phenotyping algorithms in the electronic Medical Record and Genomics (eMERGE) network. We reviewed the online communication history between algorithm developers and implementers within the Phenotype Knowledge Base (PheKB) platform, where questions could be raised and answered regarding the intended implementation of a phenotype algorithm. RESULTS: We developed a taxonomy of under-specification categories via an iterative review process between two groups of annotators. Under-specifications that lead to ambiguity and vagueness were consistently found across narrative phenotype algorithms developed by all involved eMERGE sites. DISCUSSION AND CONCLUSION: Our findings highlight that under-specification is an impediment to the accuracy and efficiency of the implementation of current narrative phenotyping algorithms, and we propose approaches for mitigating these issues and improved methods for disseminating EHR phenotyping algorithms.
Subjects
Algorithms , Electronic Health Records , Genomics , Humans , Knowledge Bases , Phenotype
ABSTRACT
Background: Most states have legalized medical cannabis, yet little is known about how medical cannabis use is documented in patients' electronic health records (EHRs). We used natural language processing (NLP) to calculate the prevalence of clinician-documented medical cannabis use among adults in an integrated health system in Washington State where medical and recreational use are legal. Methods: We analyzed EHRs of patients ≥18 years old screened for past-year cannabis use (November 1, 2017-October 31, 2018), to identify clinician-documented medical cannabis use. We defined medical use as any documentation of cannabis that was recommended by a clinician or described by the clinician or patient as intended to manage health conditions or symptoms. We developed and applied an NLP system that included NLP-assisted manual review to identify such documentation in encounter notes. Results: Medical cannabis use was documented for 16,684 (5.6%) of 299,597 outpatient encounters with routine screening for cannabis use among 203,489 patients seeing 1,274 clinicians. The validated NLP system identified 54% of documentation and NLP-assisted manual review the remainder. Language documenting reasons for cannabis use included 125 terms indicating medical use, 28 terms indicating non-medical use and 41 ambiguous terms. Implicit documentation of medical use (e.g., "edible THC nightly for lumbar pain") was more common than explicit (e.g., "continues medical cannabis use"). Conclusions: Clinicians use diverse and often ambiguous language to document patients' reasons for cannabis use. Automating extraction of documentation about patients' cannabis use could facilitate clinical decision support and epidemiological investigation but will require large amounts of gold standard training data.
Subjects
Medical Marijuana , Natural Language Processing , Adolescent , Adult , Documentation , Humans , Medical Marijuana/therapeutic use , Patient Reported Outcome Measures , Primary Health Care
ABSTRACT
BACKGROUND: Anaphylaxis is a life-threatening allergic reaction that is difficult to identify accurately with administrative data. We conducted a population-based validation study to assess the accuracy of ICD-10 diagnosis codes for anaphylaxis in outpatient, emergency department, and inpatient settings. METHODS: In an integrated healthcare system in Washington State, we obtained medical records from healthcare encounters with anaphylaxis diagnosis codes (potential events) from October 2015 to December 2018. To capture events missed by anaphylaxis diagnosis codes, we also obtained records on a sample of serious allergic and drug reactions. Two physicians determined whether potential events met established clinical criteria for anaphylaxis (validated events). RESULTS: Out of 239 potential events with anaphylaxis diagnosis codes, the overall positive predictive value (PPV) for validated events was 64% (95% CI = 58 to 70). The PPV decreased with increasing age. Common precipitants for anaphylaxis were food (39%), medications (35%), and insect bite or sting (12%). The sensitivity of emergency department and inpatient anaphylaxis diagnosis codes for all validated events was 58% (95% CI = 51 to 65), but sensitivity increased to 95% (95% CI = 74 to 99) when outpatient diagnosis codes were included. Using information from all validated events and sampling weights, the incidence rate for anaphylaxis was 3.6 events per 10,000 person-years (95% CI = 3.1 to 4.0). CONCLUSIONS: In this population-based setting, ICD-10 diagnosis codes for anaphylaxis from emergency department and inpatient settings had moderate PPV and sensitivity for validated events. These findings have implications for epidemiologic studies that seek to estimate risks of anaphylaxis using electronic health data.
Subjects
Anaphylaxis , Anaphylaxis/diagnosis , Anaphylaxis/epidemiology , Electronic Health Records , Humans , International Classification of Diseases , Predictive Value of Tests , Washington/epidemiology
ABSTRACT
Individuals participating in biobanks and other large research projects are increasingly asked to provide broad consent for open-ended research use and widespread sharing of their biosamples and data. We assessed willingness to participate in a biobank using different consent and data sharing models, hypothesizing that willingness would be higher under more restrictive scenarios. Perceived benefits, concerns, and information needs were also assessed. In this experimental survey, individuals from 11 US healthcare systems in the Electronic Medical Records and Genomics (eMERGE) Network were randomly allocated to one of three hypothetical scenarios: tiered consent and controlled data sharing; broad consent and controlled data sharing; or broad consent and open data sharing. Of 82,328 eligible individuals, exactly 13,000 (15.8%) completed the survey. Overall, 66% (95% CI: 63%-69%) of population-weighted respondents stated they would be willing to participate in a biobank; willingness and attitudes did not differ between respondents in the three scenarios. Willingness to participate was associated with self-identified white race, higher educational attainment, lower religiosity, perceiving more research benefits, fewer concerns, and fewer information needs. Most (86%, CI: 84%-87%) participants would want to know what would happen if a researcher misused their health information; fewer (51%, CI: 47%-55%) would worry about their privacy. The concern that the use of broad consent and open data sharing could adversely affect participant recruitment is not supported by these findings. Addressing potential participants' concerns and information needs and building trust and relationships with communities may increase acceptance of broad consent and wide data sharing in biobank research.
Subjects
Biological Specimen Banks/ethics , Information Dissemination/ethics , Informed Consent/ethics , Public Opinion , Adolescent , Adult , Aged , Biomedical Research/ethics , Electronic Health Records/ethics , Female , Genome, Human , Genomics , Humans , Male , Middle Aged , Privacy , Socioeconomic Factors , United States , Young Adult
ABSTRACT
BACKGROUND: Primary care providers prescribe most long-term opioid therapy and are increasingly asked to taper the opioid doses of these patients to safer levels. A recent systematic review suggests that multiple interventions may facilitate opioid taper, but many of these are not feasible within the usual primary care practice. OBJECTIVE: To determine if opioid taper plans documented by primary care providers in the electronic health record are associated with significant and sustained opioid dose reductions among patients on long-term opioid therapy. DESIGN: A nested case-control design was used to compare cases (patients with a sustained opioid taper defined as average daily opioid dose of ≤ 30 mg morphine equivalent (MME) or a 50% reduction in MME) to controls (patients matched to cases on year and quarter of cohort entry, sex, and age group, who had not achieved a sustained taper). Each case was matched with four controls. PARTICIPANTS: Two thousand four hundred nine patients receiving a ≥ 60-day supply of opioids with an average daily dose of ≥ 50 MME during 2011-2015. MAIN MEASURES: Opioid taper plans documented in prescription instructions or clinical notes within the electronic health record identified through natural language processing; opioid dosing, patient characteristics, and taper plan components also abstracted from the electronic health record. KEY RESULTS: Primary care taper plans were associated with an increased likelihood of sustained opioid taper after adjusting for all patient covariates and near peak dose (OR = 3.63 [95% CI 2.96-4.46], p < 0.0001). Both taper plans in prescription instructions (OR = 4.03 [95% CI 3.19-5.09], p < 0.0001) and in clinical notes (OR = 2.82 [95% CI 2.00-3.99], p < 0.0001) were associated with sustained taper. CONCLUSIONS: These results suggest that planning for opioid taper during primary care visits may facilitate significant and sustained opioid dose reduction.
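The odds ratios in this abstract are adjusted estimates from a matched case-control analysis, and the underlying counts are not reported. As a simplified illustration of what an odds ratio measures, the sketch below computes a crude (unadjusted, unmatched) OR with a Woolf-type confidence interval from a hypothetical 2×2 table that keeps the study's 1:4 case-to-control ratio.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.959964):
    """Crude odds ratio with a Woolf (log-scale) 95% CI.
    This ignores matching and covariate adjustment, unlike the study's analysis."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts: "exposed" = documented taper plan.
# 100 cases (sustained taper) and 400 controls (1:4 ratio).
or_, low, high = odds_ratio_with_ci(exposed_cases=40, unexposed_cases=60,
                                    exposed_controls=60, unexposed_controls=340)
print(f"OR = {or_:.2f} (95% CI {low:.2f}-{high:.2f})")
```

A matched design like the one described would more typically be analyzed with conditional logistic regression, which this sketch does not attempt.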
Subjects
Analgesics, Opioid , Drug Tapering , Electronic Health Records , Analgesics, Opioid/adverse effects , Case-Control Studies , Humans , Primary Health Care
ABSTRACT
Resting-state white blood cell (WBC) count is a marker of inflammation and immune system health. There is evidence that WBC count is not fixed over time and that there is heterogeneity in WBC trajectory that is associated with morbidity and mortality. Latent class mixed modeling (LCMM) is a method that can identify unobserved heterogeneity in longitudinal data and attempts to classify individuals into groups based on a linear model of repeated measurements. We applied LCMM to repeated WBC count measures derived from electronic medical records of participants of the National Human Genome Research Institute (NHGRI) electronic MEdical Record and GEnomics (eMERGE) network study, revealing two WBC count trajectory phenotypes. Advancing these phenotypes to GWAS, we found genetic associations between trajectory class membership and regions on chromosome 1p34.3 and chromosome 11q13.4. The chromosome 1 region contains CSF3R, which encodes the granulocyte colony-stimulating factor receptor. This protein is a major factor in neutrophil stimulation and proliferation. The associated region on chromosome 11 contains the genes RNF169 and XRRA1, both involved in the regulation of double-strand break DNA repair.
Subjects
Leukocyte Count/methods , Leukocytes/classification , Adult , Aged , Databases, Genetic , Electronic Health Records , Female , Genome-Wide Association Study , Humans , Latent Class Analysis , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Proteins/genetics , Receptors, Colony-Stimulating Factor/genetics , Ubiquitin-Protein Ligases/genetics
ABSTRACT
BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β. 
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
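The abstract reports associations at a false discovery rate of q<0.1 without naming the procedure. A common choice is the Benjamini-Hochberg step-up method, sketched below as an assumption; the p values are toy numbers, not study results.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Benjamini-Hochberg step-up procedure (assumed method).
    Returns a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p value is at or below (k / m) * q
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below that k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Toy p values for five hypothetical protein-phenotype tests
toy_p = [0.001, 0.04, 0.03, 0.2, 0.5]
print(benjamini_hochberg(toy_p, q=0.1))
```

In a screen of hundreds of proteins against more than a thousand phenotypes, controlling the false discovery rate is far less conservative than a Bonferroni correction, which is why the discovery step and the validation step in the abstract use different thresholds.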
Subjects
Biomarkers/blood , Carotid Artery Diseases/diagnosis , Genome-Wide Association Study , Proteome/analysis , Adult , Aged , Aged, 80 and over , Carotid Artery Diseases/genetics , Female , Genotype , Humans , Lectins, C-Type/analysis , Male , Middle Aged , Odds Ratio , Phenotype , Polymorphism, Single Nucleotide , Proteomics , Receptor, Platelet-Derived Growth Factor beta/blood
ABSTRACT
BACKGROUND & AIMS: There is significant variation among endoscopists in their adenoma detection rates (ADRs). We explored associations between ADR and characteristics of endoscopists, including personality traits and financial incentives. METHODS: We collected electronic health record data from October 2013 through September 2015 and calculated ADRs for physicians from 4 health systems. ADRs were risk-adjusted for differences in patient populations. Physicians were surveyed to assess financial motivations, knowledge and perceptions about colonoscopy quality, and personality traits. Of 140 physicians sent the survey, 117 responded. RESULTS: The median risk-adjusted ADR for all surveyed physicians was 29.3% (interquartile range, 24.1%-35.5%). We found no significant association between ADR and financial incentives, malpractice concerns, or physicians' perceptions of ADR as a quality metric. ADR was associated with the degree of self-reported compulsiveness relative to peers: among endoscopists who described themselves as much more compulsive, the ADR was 33.1%; among those who described themselves as somewhat more compulsive, the ADR was 32.9%; among those who described themselves as about the same as others, the ADR was 26.4%; and among those who described themselves as somewhat less compulsive, the ADR was 27.3% (P = .0019). ADR was also associated with perceived thoroughness (much more thorough than peers, ADR = 31.5%; somewhat more, 31.9%; same/somewhat less, 27.1%; P = .0173). Physicians who reported feeling rushed, having difficulty pacing themselves, or having difficulty in accomplishing goals had higher ADRs. A secondary analysis found the same associations between personality and adenomas per colonoscopy. CONCLUSIONS: We found no significant association between ADR and financial incentives, malpractice concerns, or perceptions of ADR as a quality metric. 
However, ADRs were higher among physicians who described themselves as more compulsive or thorough, and among those who reported feeling rushed or having difficulty accomplishing goals.
Subjects
Adenoma/diagnosis , Colonic Neoplasms/diagnosis , Colonoscopy/methods , Early Detection of Cancer/methods , Personality , Physicians/psychology , Quality Indicators, Health Care , Adenoma/epidemiology , Colonic Neoplasms/epidemiology , Female , Follow-Up Studies , Humans , Incidence , Male , Retrospective Studies , United States/epidemiology
ABSTRACT
BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD are complex, with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition. METHODS: First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls and histological data from liver tissue in 235 available participants. These include 1242 pediatric participants (396 cases, 846 controls). The algorithm included billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls and case-only analyses using histologic scores and liver function tests adjusting for age, sex, site, ancestry, PC, and body mass index (BMI). RESULTS: Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants with European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed the strongest association (best SNP rs738409, p = 1.70 × 10⁻²⁰). This effect was consistent in both pediatric (p = 9.92 × 10⁻⁶) and adult (p = 9.73 × 10⁻¹⁵) cohorts. Additionally, this variant was also associated with disease severity and NAFLD Activity Score (NAS) (p = 3.94 × 10⁻⁸, beta = 0.85). PheWAS analysis linked this locus to a spectrum of liver diseases beyond NAFLD, with a novel negative correlation with gout (p = 1.09 × 10⁻⁴). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS score near IL17RA (rs5748926, p = 3.80 × 10⁻⁸), and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10⁻¹¹).
Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses. CONCLUSIONS: In summary, this study demonstrates clear confirmation of a previously described NAFLD risk locus and several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to further validate the novel findings.
Subjects
Non-alcoholic Fatty Liver Disease/genetics , Adult , Aged , Body Mass Index , Case-Control Studies , Community Networks/organization & administration , Community Networks/statistics & numerical data , Disease Progression , Electronic Health Records/organization & administration , Electronic Health Records/statistics & numerical data , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Genomics/organization & administration , Genomics/statistics & numerical data , Humans , Lipase/genetics , Male , Membrane Proteins/genetics , Middle Aged , Morbidity , Non-alcoholic Fatty Liver Disease/epidemiology , Phenotype , Polymorphism, Single Nucleotide , Signal Transduction/genetics
ABSTRACT
PURPOSE: To provide a validated method to confidently identify exon-containing copy-number variants (CNVs), with a low false discovery rate (FDR), in targeted sequencing data from a clinical laboratory with particular focus on single-exon CNVs. METHODS: DNA sequence coverage data are normalized within each sample and subsequently exonic CNVs are identified in a batch of samples, when the target log2 ratio of the sample to the batch median exceeds defined thresholds. The quality of exonic CNV calls is assessed by C-scores (Z-like scores) using thresholds derived from gold standard samples and simulation studies. We integrate an ExonQC threshold to lower FDR and compare performance with alternate software (VisCap). RESULTS: Thirteen CNVs were used as a truth set to validate Atlas-CNV and compared with VisCap. We demonstrated FDR reduction in validation, simulation, and 10,926 eMERGESeq samples without sensitivity loss. Sixty-four multiexon and 29 single-exon CNVs with high C-scores were assessed by Multiplex Ligation-dependent Probe Amplification (MLPA). CONCLUSION: Atlas-CNV is validated as a method to identify exonic CNVs in targeted sequencing data generated in the clinical laboratory. The ExonQC and C-score assignment can reduce FDR (identification of targets with high variance) and improve calling accuracy of single-exon CNVs respectively. We propose guidelines and criteria to identify high confidence single-exon CNVs.
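The calling step described in METHODS above (within-sample normalization, then a per-target log2 ratio against the batch median, compared with thresholds) can be illustrated with a minimal sketch. This is not the Atlas-CNV implementation: the function name, the choice of median-based normalization, and both threshold values are assumptions made for illustration, and the C-score and ExonQC filters are not modeled.

```python
import math
from statistics import median


def call_exon_cnvs(coverage, del_threshold=-0.6, dup_threshold=0.4):
    """Flag targets whose log2(sample / batch median) crosses a threshold.

    coverage: dict mapping sample name -> list of per-target read depths.
    Threshold values are hypothetical, not taken from the abstract.
    """
    # Assumption: normalize each sample by its own median depth.
    normalized = {}
    for sample, depths in coverage.items():
        sample_median = median(depths)
        normalized[sample] = [d / sample_median for d in depths]

    n_targets = len(next(iter(normalized.values())))
    # Batch median of normalized coverage, computed per target.
    batch_median = [
        median(values[t] for values in normalized.values())
        for t in range(n_targets)
    ]

    calls = []
    for sample, values in normalized.items():
        for t, value in enumerate(values):
            if batch_median[t] <= 0:
                continue  # target has no usable coverage in this batch
            # Zero coverage is treated as an unbounded negative ratio.
            ratio = math.log2(value / batch_median[t]) if value > 0 else float("-inf")
            if ratio <= del_threshold:
                calls.append((sample, t, "deletion", ratio))
            elif ratio >= dup_threshold:
                calls.append((sample, t, "duplication", ratio))
    return calls
```

In this sketch a heterozygous single-exon deletion shows up as a ratio near -1 on one target while the other samples in the batch stay near 0.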
Subjects
DNA Copy Number Variations/genetics , Exons/genetics , Genome, Human/genetics , Software , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
ABSTRACT
RATIONALE: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA. OBJECTIVE: To identify additional AAA risk loci using data from all available genome-wide association studies. METHODS AND RESULTS: Through a meta-analysis of 6 genome-wide association study data sets and a validation study totaling 10 204 cases and 107 766 controls, we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches, we observed no new associations between the lead AAA single nucleotide polymorphisms and coronary artery disease, blood pressure, lipids, or diabetes mellitus. Network analyses identified ERG, IL6R, and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9. CONCLUSIONS: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits, suggesting that traditional cardiovascular risk factor management may only have limited value in preventing the progression of aneurysmal disease.
Subjects
Aortic Aneurysm, Abdominal/diagnosis , Aortic Aneurysm, Abdominal/genetics , Genetic Loci/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Aortic Aneurysm, Abdominal/epidemiology , Genetic Predisposition to Disease/epidemiology , Genetic Variation/genetics , Genome-Wide Association Study/trends , Humans
ABSTRACT
BACKGROUND: Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing the implementation efforts, it is important to develop portable algorithms. METHODS: We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category. RESULTS: A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks.
CONCLUSION: This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure portability of phenotype algorithms for quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize portability with regard to knowledge, interpretation, and programming. CDMs can be used to improve the portability for some 'knowledge-oriented' tasks.
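The KIP arithmetic described in RESULTS above (each aspect scored 0, 1, or 2, summed to a clause-level total between 0 and 6, then averaged across clauses) can be sketched as follows. The class and function names are hypothetical and the sketch only illustrates the scoring arithmetic; it says nothing about how the study's engineers assigned individual scores.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClauseScore:
    """Per-clause portability scores: 0 = easy, 1 = moderate, 2 = difficult."""
    knowledge: int
    interpretation: int
    programming: int

    def __post_init__(self):
        # Each aspect must be one of the three levels named in the abstract.
        for name in ("knowledge", "interpretation", "programming"):
            value = getattr(self, name)
            if value not in (0, 1, 2):
                raise ValueError(f"{name} score must be 0, 1, or 2, got {value!r}")

    @property
    def kip(self) -> int:
        """Total KIP score for the clause, in the range 0 to 6."""
        return self.knowledge + self.interpretation + self.programming


def average_kip(clauses):
    """Mean clause-wise KIP score over an iterable of ClauseScore."""
    return mean(clause.kip for clause in clauses)
```

A lower average indicates clauses that are easier to port; the abstract reports a clause-wise average of 1.37 on this 0 to 6 scale.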