Results 1 - 20 of 29
1.
PLoS One ; 18(5): e0283553, 2023.
Article in English | MEDLINE | ID: mdl-37196047

ABSTRACT

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our developed algorithm showed a significant improvement in patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis of the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, we showcased that heterogeneous EHR data can be mapped through an integrative analytical pipeline and reveal significant genotype-phenotype associations with clinical interpretation.
CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance a deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.


Subjects
Diverticular Diseases , Diverticulitis , Diverticulum , Humans , Electronic Health Records , Genome-Wide Association Study/methods , Natural Language Processing , Phenotype , Algorithms , Polymorphism, Single Nucleotide
2.
BMC Med Inform Decis Mak ; 22(1): 129, 2022 05 12.
Article in English | MEDLINE | ID: mdl-35549702

ABSTRACT

BACKGROUND: Patients and their loved ones often report symptoms or complaints of cognitive decline that clinicians note in free clinical text, but no structured screening or diagnostic data are recorded. These symptoms/complaints may be signals that predict who will go on to be diagnosed with mild cognitive impairment (MCI) and ultimately develop Alzheimer's Disease or related dementias. Our objective was to develop a natural language processing system and prediction model for identification of MCI from clinical text in the absence of screening or other structured diagnostic information. METHODS: There were two populations of patients: 1794 participants in the Adult Changes in Thought (ACT) study and 2391 patients in the general population of Kaiser Permanente Washington. All individuals had standardized cognitive assessment scores. We excluded patients with a diagnosis of Alzheimer's Disease, Dementia or use of donepezil. We manually annotated 10,391 clinic notes to train the NLP model. Standard Python code was used to extract phrases from notes and map each phrase to a cognitive functioning concept. Concepts derived from the NLP system were used to predict future MCI. The prediction model was trained on the ACT cohort and 60% of the general population cohort with 40% withheld for validation. We used a least absolute shrinkage and selection operator logistic regression approach (LASSO) to fit a prediction model with MCI as the prediction target. Using the predicted case status from the LASSO model and known MCI from standardized scores, we constructed receiver operating characteristic curves to measure model performance. RESULTS: Chart abstraction identified 42 MCI concepts. Prediction model performance in the validation data set was modest with an area under the curve of 0.67. Setting the cutoff for correct classification at 0.60, the classifier yielded sensitivity of 1.7%, specificity of 99.7%, PPV of 70% and NPV of 70.5% in the validation cohort.
DISCUSSION AND CONCLUSION: Although the sensitivity of the machine learning model was poor, negative predictive value was high, an important characteristic of models used for population-based screening. While an AUC of 0.67 is generally considered moderate performance, it is also comparable to several tests that are widely used in clinical practice.
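The four screening metrics quoted in this abstract all derive from a single 2×2 confusion matrix. The sketch below shows that relationship in Python. The counts are hypothetical, chosen only so that the resulting metrics resemble the reported pattern (very low sensitivity, very high specificity); they are not the study's data, and the names `ConfusionCounts` and `screening_metrics` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: predicted case status vs. reference MCI status."""
    tp: int  # predicted MCI, reference MCI
    fp: int  # predicted MCI, reference non-MCI
    fn: int  # predicted non-MCI, reference MCI
    tn: int  # predicted non-MCI, reference non-MCI


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts (an assumption for illustration, not taken from the paper).
hypothetical = ConfusionCounts(tp=5, fp=2, fn=295, tn=698)
metrics = screening_metrics(hypothetical)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With a strict cutoff, almost nobody is flagged, so specificity stays near 100% while sensitivity collapses; NPV then mostly reflects the share of non-cases in the cohort.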


Subjects
Alzheimer Disease , Cognitive Dysfunction , Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Humans , Machine Learning , Mass Screening , Natural Language Processing
3.
Circulation ; 142(17): 1633-1646, 2020 10 27.
Article in English | MEDLINE | ID: mdl-32981348

ABSTRACT

BACKGROUND: Abdominal aortic aneurysm (AAA) is an important cause of cardiovascular mortality; however, its genetic determinants remain incompletely defined. In total, 10 previously identified risk loci explain a small fraction of AAA heritability. METHODS: We performed a genome-wide association study in the Million Veteran Program testing ≈18 million DNA sequence variants with AAA (7642 cases and 172 172 controls) in veterans of European ancestry with independent replication in up to 4972 cases and 99 858 controls. We then used mendelian randomization to examine the causal effects of blood pressure on AAA. We examined the association of AAA risk variants with aneurysms in the lower extremity, cerebral, and iliac arterial beds, and derived a genome-wide polygenic risk score (PRS) to identify a subset of the population at greater risk for disease. RESULTS: Through a genome-wide association study, we identified 14 novel loci, bringing the total number of known significant AAA loci to 24. In our mendelian randomization analysis, we demonstrate that a genetic increase of 10 mm Hg in diastolic blood pressure (odds ratio, 1.43 [95% CI, 1.24-1.66]; P=1.6×10⁻⁶), as opposed to systolic blood pressure (odds ratio, 1.06 [95% CI, 0.97-1.15]; P=0.2), likely has a causal relationship with AAA development. We observed that 19 of 24 AAA risk variants associate with aneurysms in at least 1 other vascular territory. A 29-variant PRS was strongly associated with AAA (PRS odds ratio, 1.26 [95% CI, 1.18-1.36]; P=2.7×10⁻¹¹ per SD increase in PRS), independent of family history and smoking risk factors (PRS odds ratio adjusted for family history and smoking, 1.24 [95% CI, 1.14-1.35]; P=1.27×10⁻⁶). Using this PRS, we identified a subset of the population with AAA prevalence greater than that observed in screening trials informing current guidelines.
CONCLUSIONS: We identify novel AAA genetic associations with therapeutic implications and identify a subset of the population at significantly increased genetic risk of AAA independent of family history. Our data suggest that extending current screening guidelines to include testing to identify those with high polygenic AAA risk, once the cost of genotyping becomes comparable with that of screening ultrasound, would significantly increase the yield of current screening at reasonable cost.


Subjects
Aortic Aneurysm, Abdominal/genetics , Humans , Veterans
4.
World J Surg ; 44(1): 84-94, 2020 01.
Article in English | MEDLINE | ID: mdl-31605180

ABSTRACT

BACKGROUND: The extent to which obesity and genetics determine postoperative complications is incompletely understood. METHODS: We performed a retrospective study using two population cohorts with electronic health record (EHR) data. The first included 736,726 adults with body mass index (BMI) recorded between 1990 and 2017 at Vanderbilt University Medical Center. The second cohort consisted of 65,174 individuals from 12 institutions contributing EHR and genome-wide genotyping data to the Electronic Medical Records and Genomics (eMERGE) Network. Pairwise logistic regression analyses were used to measure the association of BMI categories with postoperative complications derived from International Classification of Disease-9 codes, including postoperative infection, incisional hernia, and intestinal obstruction. A genetic risk score was constructed from 97 obesity-risk single-nucleotide polymorphisms for a Mendelian randomization study to determine the association of genetic risk of obesity with postoperative complications. Logistic regression analyses were adjusted for sex, age, site, and race/principal components. RESULTS: Individuals with overweight or obese BMI (≥25 kg/m²) had increased risk of incisional hernia (odds ratio [OR] 1.7-5.5, p < 3.1 × 10⁻²⁰), and people with obesity (BMI ≥ 30 kg/m²) had increased risk of postoperative infection (OR 1.2-2.3, p < 2.5 × 10⁻⁵). In the eMERGE cohort, genetically predicted BMI was associated with incisional hernia (OR 2.1 [95% CI 1.8-2.5], p = 1.4 × 10⁻⁶) and postoperative infection (OR 1.6 [95% CI 1.4-1.9], p = 3.1 × 10⁻⁶). Association findings were similar after limitation of the cohorts to those who underwent abdominal procedures. CONCLUSIONS: Clinical and Mendelian randomization studies suggest that obesity, as measured by BMI, is associated with the development of postoperative incisional hernia and infection.


Subjects
Mendelian Randomization Analysis/methods , Obesity/complications , Postoperative Complications/genetics , Adult , Body Mass Index , Female , Humans , Logistic Models , Male , Middle Aged , Polymorphism, Single Nucleotide , Postoperative Complications/etiology , Retrospective Studies , Risk Factors
5.
Am J Hum Genet ; 105(3): 526-533, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31422818

ABSTRACT

As clinical testing for Mendelian causes of colorectal cancer (CRC) is largely driven by recognition of family history and early age of onset, the rates of such findings among individuals with prevalent CRC not recognized to have these features are largely unknown. We evaluated actionable genomic findings in community-based participants ascertained by three phenotypes: (1) CRC, (2) one or more adenomatous colon polyps, and (3) control participants over age 59 years without CRC or colon polyps. These participants underwent sequencing for a panel of genes that included colorectal cancer/polyp (CRC/P)-associated and actionable incidental findings genes. Those with CRC had a 3.8% rate of positive results (pathogenic or likely pathogenic) for a CRC-associated gene variant, despite generally being older at CRC onset (mean 72 years). Those ascertained for polyps had a 0.8% positive rate and those with no CRC/P had a positive rate of 0.2%. Though incidental finding rates unrelated to colon cancer were similar for all groups, our positive rate for cardiovascular findings exceeds disease prevalence, suggesting variant interpretation challenges or low penetrance in these genes. The rate of HFE c.845G>A (p.Cys282Tyr) homozygotes in the CRC group reinforces a previously reported, but relatively unexplored, association between hemochromatosis and CRC. These results in a general clinical population suggest that current testing strategies could be improved in order to better detect Mendelian CRC-associated conditions. These data also underscore the need for additional functional and familial evidence to clarify the pathogenicity and penetrance of variants deemed pathogenic or likely pathogenic, particularly among the actionable genes associated with cardiovascular disease.


Subjects
Colonic Polyps/genetics , Colorectal Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged
6.
Genes Immun ; 20(7): 555-565, 2019 09.
Article in English | MEDLINE | ID: mdl-30459343

ABSTRACT

Resting-state white blood cell (WBC) count is a marker of inflammation and immune system health. There is evidence that WBC count is not fixed over time and there is heterogeneity in WBC trajectory that is associated with morbidity and mortality. Latent class mixed modeling (LCMM) is a method that can identify unobserved heterogeneity in longitudinal data and attempts to classify individuals into groups based on a linear model of repeated measurements. We applied LCMM to repeated WBC count measures derived from electronic medical records of participants of the National Human Genome Research Institute (NHGRI) electronic MEdical Record and GEnomics (eMERGE) network study, revealing two WBC count trajectory phenotypes. Advancing these phenotypes to GWAS, we found genetic associations between trajectory class membership and regions on chromosome 1p34.3 and chromosome 11q13.4. The chromosome 1 region contains CSF3R, which encodes the granulocyte colony-stimulating factor receptor. This protein is a major factor in neutrophil stimulation and proliferation. The associated region on chromosome 11 contains the genes RNF169 and XRRA1, both involved in the regulation of double-strand break DNA repair.


Subjects
Leukocyte Count/methods , Leukocytes/classification , Adult , Aged , Databases, Genetic , Electronic Health Records , Female , Genome-Wide Association Study , Humans , Latent Class Analysis , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Proteins/genetics , Receptors, Colony-Stimulating Factor/genetics , Ubiquitin-Protein Ligases/genetics
7.
Clin Gastroenterol Hepatol ; 17(8): 1571-1579.e7, 2019 07.
Article in English | MEDLINE | ID: mdl-30326300

ABSTRACT

BACKGROUND & AIMS: There is significant variation among endoscopists in their adenoma detection rates (ADRs). We explored associations between ADR and characteristics of endoscopists, including personality traits and financial incentives. METHODS: We collected electronic health record data from October 2013 through September 2015 and calculated ADRs for physicians from 4 health systems. ADRs were risk-adjusted for differences in patient populations. Physicians were surveyed to assess financial motivations, knowledge and perceptions about colonoscopy quality, and personality traits. Of 140 physicians sent the survey, 117 responded. RESULTS: The median risk-adjusted ADR for all surveyed physicians was 29.3% (interquartile range, 24.1%-35.5%). We found no significant association between ADR and financial incentives, malpractice concerns, or physicians' perceptions of ADR as a quality metric. ADR was associated with the degree of self-reported compulsiveness relative to peers: among endoscopists who described themselves as much more compulsive, the ADR was 33.1%; among those who described themselves as somewhat more compulsive, the ADR was 32.9%; among those who described themselves as about the same as others, the ADR was 26.4%; and among those who described themselves as somewhat less compulsive, the ADR was 27.3% (P = .0019). ADR was also associated with perceived thoroughness (much more thorough than peers, ADR = 31.5%; somewhat more, 31.9%; same/somewhat less, 27.1%; P = .0173). Physicians who reported feeling rushed, having difficulty pacing themselves, or having difficulty in accomplishing goals had higher ADRs. A secondary analysis found the same associations between personality and adenomas per colonoscopy. CONCLUSIONS: We found no significant association between ADR and financial incentives, malpractice concerns, or perceptions of ADR as a quality metric.
However, ADRs were higher among physicians who described themselves as more compulsive or thorough, and among those who reported feeling rushed or having difficulty accomplishing goals.


Subjects
Adenoma/diagnosis , Colonic Neoplasms/diagnosis , Colonoscopy/methods , Early Detection of Cancer/methods , Personality , Physicians/psychology , Quality Indicators, Health Care , Adenoma/epidemiology , Colonic Neoplasms/epidemiology , Female , Follow-Up Studies , Humans , Incidence , Male , Retrospective Studies , United States/epidemiology
8.
Circulation ; 138(22): 2469-2481, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.


Subjects
Biomarkers/blood , Carotid Artery Diseases/diagnosis , Genome-Wide Association Study , Proteome/analysis , Adult , Aged , Aged, 80 and over , Carotid Artery Diseases/genetics , Female , Genotype , Humans , Lectins, C-Type/analysis , Male , Middle Aged , Odds Ratio , Phenotype , Polymorphism, Single Nucleotide , Proteomics , Receptor, Platelet-Derived Growth Factor beta/blood
9.
Endoscopy ; 50(10): 984-992, 2018 10.
Article in English | MEDLINE | ID: mdl-29689571

ABSTRACT

BACKGROUND: Serrated polyps are important colorectal cancer precursors that are variably detected during colonoscopy. We measured serrated polyp detection rate (SPDR) in a large, multicenter, cross-sectional study of colonoscopy quality to identify drivers of SPDR variation. METHODS: Colonoscopy and pathology reports were collected for a 2-year period (10/2013-9/2015) from four sites across the United States. Data from reports, including size, location, and histology of polyps, were abstracted using a validated natural language processing algorithm. SPDR was defined as the proportion of colonoscopies with ≥ 1 serrated polyp (not including hyperplastic polyps). Multivariable logistic regression was performed to determine endoscopist characteristics associated with serrated polyp detection. RESULTS: A total of 104 618 colonoscopies were performed by 201 endoscopists who varied with respect to specialty (86 % were gastroenterologists), sex (18 % female), years in practice (range 1 - 51), and number of colonoscopies performed during the study period (range 30 - 2654). The overall mean SPDR was 5.1 % (SD 3.8 %, range 0 - 18.8 %). In multivariable analysis, gastroenterology specialty training (odds ratio [OR] 1.89, 95 % confidence interval [CI] 1.33 - 2.70), fewer years in practice (≤ 9 years vs. ≥ 27 years: OR 1.52, 95 %CI 1.14 - 2.04), and higher procedure volumes (highest vs. lowest quartile: OR 1.77, 95 %CI 1.27 - 2.46) were independently associated with serrated polyp detection. CONCLUSIONS: Gastroenterology specialization, more recent completion of training, and greater procedure volume are associated with serrated polyp detection. These findings imply that both repetition and training are likely to be important contributors to adequate detection of these important cancer precursors. Additional efforts to improve SPDR are needed.
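The SPDR definition given in the methods (proportion of colonoscopies with at least one serrated polyp, hyperplastic polyps excluded) can be sketched as a short Python function. The input layout, a list of histology labels per colonoscopy, and the label strings themselves are assumptions made for illustration; the abstract does not describe the data format produced by the study's NLP algorithm.

```python
# Hypothetical label for the histology excluded by the SPDR definition.
EXCLUDED_HISTOLOGY = "hyperplastic"
# Hypothetical labels counted as serrated polyps for this sketch.
SERRATED_HISTOLOGIES = {"sessile serrated", "traditional serrated"}


def serrated_polyp_detection_rate(colonoscopies):
    """Fraction of colonoscopies with >= 1 qualifying serrated polyp.

    `colonoscopies` is a list in which each element is the list of polyp
    histology labels found during one examination (empty if none).
    """
    if not colonoscopies:
        raise ValueError("at least one colonoscopy is required")
    detected = sum(
        1
        for polyps in colonoscopies
        if any(
            label in SERRATED_HISTOLOGIES and label != EXCLUDED_HISTOLOGY
            for label in polyps
        )
    )
    return detected / len(colonoscopies)


# Four made-up examinations: only the second contains a qualifying polyp.
example = [
    [],
    ["adenoma", "sessile serrated"],
    ["hyperplastic"],
    ["adenoma"],
]
spdr = serrated_polyp_detection_rate(example)
print(f"SPDR = {spdr:.1%}")
```

The rate counts examinations, not polyps, so a colonoscopy with several serrated polyps contributes the same as one with a single serrated polyp.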


Subjects
Colonic Polyps/diagnostic imaging , Colonoscopy/statistics & numerical data , Gastroenterology/statistics & numerical data , Specialization/statistics & numerical data , Clinical Competence , Colonoscopy/education , Colonoscopy/standards , Colorectal Surgery/statistics & numerical data , Cross-Sectional Studies , Family Practice/statistics & numerical data , Female , Gastroenterology/education , General Surgery/statistics & numerical data , Humans , Male , Thoracic Surgery/statistics & numerical data
10.
Am J Gastroenterol ; 113(3): 431-439, 2018 03.
Article in English | MEDLINE | ID: mdl-29380819

ABSTRACT

OBJECTIVES: Endoscopist quality measures such as adenoma detection rate (ADR) and serrated polyp detection rates (SPDRs) depend on pathologist classification of histology. Although variation in pathologic interpretation is recognized, we add to the literature by quantifying the impact of pathologic variability on endoscopist performance. METHODS: We used natural language processing to abstract relevant data from colonoscopy and related pathology reports performed over 2 years at four clinical sites. We quantified each pathologist's likelihood of classifying polyp specimens as adenomas or serrated polyps. We estimated the impact on endoscopists' ADR and SPDR of sending their specimens to pathologists with higher or lower classification rates. RESULTS: We observed 85,526 colonoscopies performed by 119 endoscopists; 50,453 had a polyp specimen, which were analyzed by 48 pathologists. There was greater variation across pathologists in classification of serrated polyps than in classification of adenomas. We estimate the endoscopist's average SPDR would be 0.5% if all their specimens were analyzed by the pathologist in our sample with the lowest classification rate and 12.0% if all their specimens were analyzed by the pathologist with the highest classification rate. In contrast, the endoscopist's average ADR would be 28.5% and 42.4% if their specimens were analyzed by the pathologist with lowest and highest classification rate, respectively. CONCLUSIONS: There is significant variation in pathologic interpretation, which more substantially affects endoscopist SPDR than ADR.

11.
Gastrointest Endosc ; 87(3): 778-786.e5, 2018 03.
Article in English | MEDLINE | ID: mdl-28866456

ABSTRACT

BACKGROUND AND AIMS: Patients who receive a colonoscopy from a physician with a low adenoma detection rate (ADR) are at higher risk of subsequent colorectal cancer. It is unclear what drives the variation across physicians in ADR. We describe physician characteristics associated with higher ADR. METHODS: In this retrospective cohort study a natural language processing system was used to analyze all outpatient colonoscopy examinations and their associated pathology reports from October 2013 to September 2015 for adults age 40 years and older across physicians from 4 diverse health systems. Physician performance on ADR was risk adjusted for differences in patient population and procedure indication. Our sample included 201 physicians performing at least 30 colonoscopy examinations during the study period, totaling 104,618 colonoscopy examinations. RESULTS: The mean ADR was 33.2% (range, 6.3%-58.7%). Higher ADR was seen among female physicians (4.2 percentage points higher than men, P = .020), gastroenterologists (9.4 percentage points higher than nongastroenterologists, P < .001), and physicians with ≤9 years since their residency completion (6.0 percentage points higher than physicians who have had 27-51 years of practice, P = .004). CONCLUSIONS: Gastroenterologists, female physicians, and more recently trained physicians had higher performance in adenoma detection.


Subjects
Adenoma/diagnosis , Clinical Competence/statistics & numerical data , Colonoscopy/statistics & numerical data , Colorectal Neoplasms/diagnosis , Physicians/statistics & numerical data , Adenoma/pathology , Adult , Aged , Cohort Studies , Colorectal Neoplasms/pathology , Female , Humans , Male , Middle Aged , Natural Language Processing , Registries , Retrospective Studies
12.
J Am Med Inform Assoc ; 24(5): 986-991, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28419261

ABSTRACT

OBJECTIVE: Widespread application of clinical natural language processing (NLP) systems requires taking existing NLP systems and adapting them to diverse and heterogeneous settings. We describe the challenges faced and lessons learned in adapting an existing NLP system for measuring colonoscopy quality. MATERIALS AND METHODS: Colonoscopy and pathology reports from 4 settings during 2013-2015, varying by geographic location, practice type, compensation structure, and electronic health record. RESULTS: Though successful, adaptation required considerably more time and effort than anticipated. Typical NLP challenges in assembling corpora, diverse report structures, and idiosyncratic linguistic content were greatly magnified. DISCUSSION: Strategies for addressing adaptation challenges include assessing site-specific diversity, setting realistic timelines, leveraging local electronic health record expertise, and undertaking extensive iterative development. More research is needed on how to make it easier to adapt NLP systems to new clinical settings. CONCLUSIONS: A key challenge in widespread application of NLP is adapting existing systems to new clinical settings.


Subjects
Colonoscopy , Early Detection of Cancer , Electronic Health Records , Natural Language Processing , Data Collection , Humans , Information Dissemination , Medical Records Systems, Computerized , Pathology, Clinical
13.
EGEMS (Wash DC) ; 4(1): 1254, 2016.
Article in English | MEDLINE | ID: mdl-27668266

ABSTRACT

INTRODUCTION: The incidence of incidentally detected lung nodules is rapidly rising, but little is known about their management or associated patient outcomes. One barrier to studying lung nodule care is the inability to efficiently and reliably identify the cohort of interest (i.e. cases). Investigators at Kaiser Permanente Southern California (KPSC) recently developed an automated method to identify individuals with an incidentally discovered lung nodule, but the feasibility of implementing this method across other health systems is unknown. METHODS: A random sample of Group Health (GH) members who had a computed tomography in 2012 underwent chart review to determine if a lung nodule was documented in the radiology report. A previously developed natural language processing (NLP) algorithm was implemented at our site using only knowledge of the key words, qualifiers, excluding terms, and the logic linking these parameters. RESULTS: Among 499 subjects, 156 (31%, 95% confidence interval [CI] 27-36%) had an incidentally detected lung nodule. NLP identified 189 (38%, 95% CI 33-42%) individuals with a nodule. The accuracy of NLP at GH was similar to its accuracy at KPSC: sensitivity 90% (95% CI 85-95%) and specificity 86% (95% CI 82-89%) versus sensitivity 96% (95% CI 88-100%) and specificity 86% (95% CI 75-94%). CONCLUSION: Automated methods designed to identify individuals with an incidentally detected lung nodule can feasibly and independently be implemented across health systems. Use of these methods will likely facilitate the efficient conduct of multi-site studies evaluating practice patterns and associated outcomes.
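The proportions in the results (156 of 499 and 189 of 499) come with 95% confidence intervals, and the sketch below shows one common way to compute such an interval from the counts. The Wilson score method is an assumption here, since the abstract does not say which interval the authors used; other methods can shift the rounded bounds by about a percentage point.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval.
    Returns (lower, upper) as fractions in [0, 1].
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: chart-review prevalence of lung nodules.
lower, upper = wilson_interval(156, 499)
print(f"156/499 = {156 / 499:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

The Wilson interval is used here because it behaves well for moderate sample sizes; an exact (Clopper-Pearson) interval would be slightly wider.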

14.
Science ; 351(6274): 737-41, 2016 Feb 12.
Article in English | MEDLINE | ID: mdl-26912863

ABSTRACT

Many modern human genomes retain DNA inherited from interbreeding with archaic hominins, such as Neandertals, yet the influence of this admixture on human traits is largely unknown. We analyzed the contribution of common Neandertal variants to over 1000 electronic health record (EHR)-derived phenotypes in ~28,000 adults of European ancestry. We discovered and replicated associations of Neandertal alleles with neurological, psychiatric, immunological, and dermatological phenotypes. Neandertal alleles together explained a significant fraction of the variation in risk for depression and skin lesions resulting from sun exposure (actinic keratosis), and individual Neandertal alleles were significantly associated with specific human phenotypes, including hypercoagulation and tobacco use. Our results establish that archaic admixture influences disease risk in modern humans, provide hypotheses about the effects of hundreds of Neandertal haplotypes, and demonstrate the utility of EHR data in evolutionary analyses.


Subjects
Disease/genetics , Neanderthals/genetics , Alleles , Animals , Depression/genetics , Evolution, Molecular , Genetic Variation , Genome, Human , Haplotypes , Humans , Keratosis, Actinic/genetics , Phenotype , White People/genetics
15.
JAMA ; 315(1): 47-57, 2016 Jan 05.
Article in English | MEDLINE | ID: mdl-26746457

ABSTRACT

IMPORTANCE: Large-scale DNA sequencing identifies incidental rare variants in established Mendelian disease genes, but the frequency of related clinical phenotypes in unselected patient populations is not well established. Phenotype data from electronic medical records (EMRs) may provide a resource to assess the clinical relevance of rare variants. OBJECTIVE: To determine the clinical phenotypes from EMRs for individuals with variants designated as pathogenic by expert review in arrhythmia susceptibility genes. DESIGN, SETTING, AND PARTICIPANTS: This prospective cohort study included 2022 individuals recruited for nonantiarrhythmic drug exposure phenotypes from October 5, 2012, to September 30, 2013, for the Electronic Medical Records and Genomics Network Pharmacogenomics project from 7 US academic medical centers. Variants in SCN5A and KCNH2, disease genes for long QT and Brugada syndromes, were assessed for potential pathogenicity by 3 laboratories with ion channel expertise and by comparison with the ClinVar database. Relevant phenotypes were determined from EMRs, with data available from 2002 (or earlier for some sites) through September 10, 2014. EXPOSURES: One or more variants designated as pathogenic in SCN5A or KCNH2. MAIN OUTCOMES AND MEASURES: Arrhythmia or electrocardiographic (ECG) phenotypes defined by International Classification of Diseases, Ninth Revision (ICD-9) codes, ECG data, and manual EMR review. RESULTS: Among 2022 study participants (median age, 61 years [interquartile range, 56-65 years]; 1118 [55%] female; 1491 [74%] white), a total of 122 rare (minor allele frequency <0.5%) nonsynonymous and splice-site variants in 2 arrhythmia susceptibility genes were identified in 223 individuals (11% of the study cohort). Forty-two variants in 63 participants were designated potentially pathogenic by at least 1 laboratory or ClinVar, with low concordance across laboratories (Cohen κ = 0.26). 
An ICD-9 code for arrhythmia was found in 11 of 63 (17%) variant carriers vs 264 of 1959 (13%) of those without variants (difference, +4%; 95% CI, -5% to +13%; P = .35). In the 1270 (63%) with ECGs, corrected QT intervals were not different in variant carriers vs those without (median, 429 vs 439 milliseconds; difference, -10 milliseconds; 95% CI, -16 to +3 milliseconds; P = .17). After manual review, 22 of 63 participants (35%) with designated variants had any ECG or arrhythmia phenotype, and only 2 had corrected QT interval longer than 500 milliseconds. CONCLUSIONS AND RELEVANCE: Among laboratories experienced in genetic testing for cardiac arrhythmia disorders, there was low concordance in designating SCN5A and KCNH2 variants as pathogenic. In an unselected population, the putatively pathogenic genetic variants were not associated with an abnormal phenotype. These findings raise questions about the implications of notifying patients of incidental genetic findings.


Subjects
Arrhythmias, Cardiac/genetics , Electronic Health Records , Ether-A-Go-Go Potassium Channels/genetics , Genetic Variation , Laboratories/standards , NAV1.5 Voltage-Gated Sodium Channel/genetics , Phenotype , Aged , Aged, 80 and over , Alleles , Arrhythmias, Cardiac/ethnology , Arrhythmias, Cardiac/physiopathology , Brugada Syndrome/genetics , ERG1 Potassium Channel , Female , Genetic Predisposition to Disease , Genetic Testing/standards , Genomics , Heterozygote , Humans , Incidental Findings , Male , Middle Aged , Mutation, Missense , Prospective Studies , Random Allocation , Statistics, Nonparametric , Young Adult
16.
Am J Hum Genet ; 97(4): 512-20, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26365338

ABSTRACT

Hereditary hemochromatosis (HH) is a common autosomal-recessive disorder associated with pathogenic HFE variants, most commonly those resulting in p.Cys282Tyr and p.His63Asp. Recommendations on returning incidental findings of HFE variants in individuals undergoing genome-scale sequencing should be informed by penetrance estimates of HH in unselected samples. We used the eMERGE Network, a multicenter cohort with genotype data linked to electronic medical records, to estimate the diagnostic rate and clinical penetrance of HH in 98 individuals homozygous for the variant coding for HFE p.Cys282Tyr and 397 compound heterozygotes with variants resulting in p.[His63Asp];[Cys282Tyr]. The diagnostic rate of HH in males was 24.4% for p.Cys282Tyr homozygotes and 3.5% for compound heterozygotes (p < 0.001); in females, it was 14.0% for p.Cys282Tyr homozygotes and 2.3% for compound heterozygotes (p < 0.001). Only males showed differences across genotypes in transferrin saturation levels (100% of homozygotes versus 37.5% of compound heterozygotes with transferrin saturation > 50%; p = 0.003), serum ferritin levels (77.8% versus 33.3% with serum ferritin > 300 ng/ml; p = 0.006), and diabetes (44.7% versus 28.0%; p = 0.03). No differences were found in the prevalence of heart disease, arthritis, or liver disease, except for the rate of liver biopsy (10.9% versus 1.8% [p = 0.013] in males; 9.1% versus 2% [p = 0.035] in females). Given the higher rate of HH diagnosis than in prior studies, the high penetrance of iron overload, and the frequency of at-risk genotypes, in addition to other suggested actionable adult-onset genetic conditions, opportunistic screening should be considered for p.[Cys282Tyr];[Cys282Tyr] individuals with existing genomic data.


Subjects
Genetic Variation/genetics , Hemochromatosis/epidemiology , Hemochromatosis/genetics , Histocompatibility Antigens Class I/genetics , Membrane Proteins/genetics , Adult , Aged , Amino Acid Substitution , Child , Cohort Studies , Female , Follow-Up Studies , Genotype , Hemochromatosis/diagnosis , Hemochromatosis Protein , Heterozygote , Homozygote , Humans , Male , Middle Aged , Penetrance , Prognosis , United States/epidemiology
17.
Gastrointest Endosc ; 82(4): 676-82, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26385276

ABSTRACT

BACKGROUND: Colonoscopy is the predominant method for colorectal cancer screening in the United States. Previous studies have documented variation across physicians in colonoscopy quality as measured by the adenoma detection rate (ADR). ADR is the primary quality measure of colonoscopy examinations and an indicator of the likelihood of subsequent colorectal cancer. There is interest in mechanisms to improve the ADR. In Central Illinois, a local employer and a quality improvement organization partnered to publicly report physician colonoscopy quality. OBJECTIVE: We assessed whether this initiative was associated with an improvement in the ADR. DESIGN: We compared ADRs before and after public reporting at a private practice endoscopy center with 11 gastroenterologists in Peoria, Illinois, who participated in the initiative. To generate the ADR, colonoscopy and pathology reports from examinations performed over 4 years at the endoscopy center were analyzed by using previously validated natural language processing software. SETTING: A central Illinois endoscopy center. RESULTS: The ADR was 34.3% in the pre-public reporting period and 39.2% in the post-public reporting period (an increase of 4.9%, P < .001). The increase in the right-sided ADR was 5.1% (P < .01), whereas the increase in the left-sided ADR was 2.1% (P < .05). The increase in the ADR was 7.8% for screening colonoscopies (P < .05) and 3.5% for nonscreening colonoscopies (P < .05). All but 1 physician's ADR increased (range -2.7% to 10.5%). There was no statistically significant change in the advanced ADR (increase of 0.8%, P = .22). LIMITATIONS: There was no concurrent control group to assess whether the increased ADR was due to a secular trend. CONCLUSION: A public reporting initiative on colonoscopy quality was associated with an increase in ADR.
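As a rough illustration of the before-and-after comparison described above, the sketch below computes an ADR per reporting period as the share of examinations in which at least one adenoma was found, then takes the difference. The exam records are made up for demonstration and are not study data; the function name `adr_by_period` and the record layout are assumptions, and the study's NLP-based extraction of adenoma status is not reproduced here.

```python
from collections import defaultdict

def adr_by_period(exams):
    """Return the adenoma detection rate for each period.

    `exams` is an iterable of (period, adenoma_found) pairs, where
    adenoma_found is True if at least one adenoma was detected.
    """
    totals = defaultdict(int)
    with_adenoma = defaultdict(int)
    for period, adenoma_found in exams:
        totals[period] += 1
        if adenoma_found:
            with_adenoma[period] += 1
    return {period: with_adenoma[period] / totals[period] for period in totals}

# Hypothetical records for illustration only (not data from the study)
exams = [
    ("pre", True), ("pre", False), ("pre", False), ("pre", False),
    ("post", True), ("post", True), ("post", False), ("post", False),
]
rates = adr_by_period(exams)
change = rates["post"] - rates["pre"]  # absolute change between periods
print(f"pre: {rates['pre']:.1%}, post: {rates['post']:.1%}, change: {change:+.1%}")
```

The change computed this way is an absolute difference between two rates, which matches how the abstract's reported increase (39.2% minus 34.3%) is expressed.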


Subjects
Adenoma/diagnosis , Colonoscopy/standards , Colorectal Neoplasms/diagnosis , Early Detection of Cancer/standards , Information Dissemination , Quality Improvement/statistics & numerical data , Quality Indicators, Health Care/statistics & numerical data , Access to Information , Adult , Aged , Aged, 80 and over , Colonoscopy/statistics & numerical data , Early Detection of Cancer/statistics & numerical data , Female , Humans , Illinois , Male , Middle Aged , Program Evaluation , Retrospective Studies
18.
J Pathol Inform ; 6: 38, 2015.
Article in English | MEDLINE | ID: mdl-26167382

ABSTRACT

BACKGROUND: Pathology reports typically require manual review to abstract research data. We developed a natural language processing (NLP) system to automatically interpret free-text breast pathology reports with limited assistance from manual abstraction. METHODS: We used an iterative approach of machine learning algorithms and constructed groups of related findings to identify breast-related procedures and results from free-text pathology reports. We evaluated the NLP system using an all-or-nothing approach to determine which reports could be processed entirely using NLP and which reports needed manual review beyond NLP. We divided 3234 reports into development (2910, 90%) and evaluation (324, 10%) sets, using manually reviewed pathology data as our gold standard. RESULTS: NLP correctly coded 12.7% of the evaluation set, flagged 49.1% of reports for manual review, incorrectly coded 30.8%, and correctly omitted 7.4% from the evaluation set due to irrelevancy (i.e., not breast-related). Common procedures and results were identified correctly (e.g., invasive ductal with 95.5% precision and 94.0% sensitivity), but entire reports were flagged for manual review because of rare findings and substantial variation in pathology report text. CONCLUSIONS: The NLP system we developed did not perform sufficiently for abstracting entire breast pathology reports. The all-or-nothing approach resulted in too broad a scope of work and limited our flexibility to identify breast pathology procedures and results. Our NLP system was also limited by the lack of gold standard data on rare findings and wide variation in pathology text. Focusing on individual, common elements and improving pathology text report standardization may improve performance.

19.
Genome Med ; 7(1): 67, 2015.
Article in English | MEDLINE | ID: mdl-26221186

ABSTRACT

BACKGROUND: In an effort to return actionable results from variant data to electronic health records (EHRs), participants in the Electronic Medical Records and Genomics (eMERGE) Network are being sequenced with the targeted Pharmacogenomics Research Network sequence platform (PGRNseq). This cost-effective, highly scalable, and highly accurate platform was created to explore rare variation in 84 key pharmacogenetic genes with strong drug phenotype associations. METHODS: To return Clinical Laboratory Improvement Amendments (CLIA) results to our participants at the Group Health Cooperative, we sequenced the DNA of 900 participants (61% female) with non-CLIA biobanked samples. We then selected 450 of those to be re-consented, to redraw blood, and ultimately to validate CLIA variants in anticipation of returning the results to the participant and EHR. These 450 were selected using an algorithm we designed to harness data from self-reported race, diagnosis and procedure codes, medical notes, laboratory results, and variant-level bioinformatics to ensure selection of an informative sample. We annotated the multi-sample variant call format by a combination of SeattleSeq and SnpEff tools, with additional custom variables including evidence from ClinVar, OMIM, HGMD, and prior clinical associations. RESULTS: We focused our analyses on 27 actionable genes, largely driven by the Clinical Pharmacogenetics Implementation Consortium. We derived a ranking system based on the total number of coding variants per participant (75.2 ± 14.7), and the number of coding variants with high or moderate impact (11.5 ± 3.9). Notably, we identified 11 stop-gained (1%) and 519 missense (20%) variants out of a total of 1785 in these 27 genes. 
Finally, we prioritized variants to be returned to the EHR with prior clinical evidence of pathogenicity or annotated as stop-gain for the following genes: CACNA1S and RYR1 (malignant hyperthermia); SCN5A, KCNH2, and RYR2 (arrhythmia); and LDLR (high cholesterol). CONCLUSIONS: The incorporation of genetics into the EHR for clinical decision support is a complex undertaking for many reasons including lack of prior consent for return of results, lack of biospecimens collected in a CLIA environment, and EHR integration. Our study design accounts for these hurdles and is an example of a pilot system that can be utilized before expanding to an entire health system.

20.
J Biomed Inform ; 54: 77-84, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25661260

ABSTRACT

OBJECTIVE: Structured data on mammographic findings are difficult to obtain without manual review. We developed and evaluated a rule-based natural language processing (NLP) system to extract mammographic findings from free-text mammography reports. MATERIALS AND METHODS: The NLP system extracted four mammographic findings (mass, calcification, asymmetry, and architectural distortion) using a dictionary look-up method on 93,705 mammography reports from Group Health. Status annotations and anatomical location annotations were associated with each NLP-detected finding through association rules. After excluding negated, uncertain, and historical findings, affirmative mentions of detected findings were summarized. Confidence flags were developed to denote reports with highly confident NLP results and reports with possible NLP errors. A random sample of 100 reports was manually abstracted to evaluate the accuracy of the system. RESULTS: The NLP system correctly coded 96-99 out of our sample of 100 reports, depending on the finding. Measures of sensitivity, specificity, and negative predictive value exceeded 0.92 for all findings. Positive predictive values were relatively low for some findings due to their low prevalence. DISCUSSION: Our NLP system was implemented entirely in SAS Base, which makes it portable and easy to implement. It performed reasonably well with multiple applications, such as using confidence flags as a filter to improve the efficiency of manual review. Refinements of library and association rules, and testing on more diverse samples, may further improve its performance. CONCLUSION: Our NLP system successfully extracts clinically useful information from mammography reports. Moreover, SAS is a feasible platform for implementing NLP algorithms.
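The dictionary look-up with negation filtering described above can be sketched in a few lines. This is a minimal illustration, not the authors' system: it is written in Python rather than SAS Base, the term lists and negation cues are hypothetical, negation is applied crudely at the sentence level, and the uncertain and historical status rules, anatomical locations, and confidence flags from the abstract are not modeled.

```python
import re

# Hypothetical dictionary: finding category -> surface terms to look up
FINDING_TERMS = {
    "mass": ["mass", "masses"],
    "calcification": ["calcification", "calcifications"],
    "asymmetry": ["asymmetry", "asymmetries"],
    "architectural distortion": ["architectural distortion"],
}

# Hypothetical negation cues; a sentence containing any cue is skipped entirely
NEGATION_CUES = ["no", "without", "negative for"]

def contains_term(term, text):
    """Whole-word (or whole-phrase) match of `term` in `text`."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

def affirmed_findings(report):
    """Return the sorted finding categories mentioned in non-negated sentences."""
    found = set()
    for sentence in re.split(r"[.;\n]+", report.lower()):
        if any(contains_term(cue, sentence) for cue in NEGATION_CUES):
            continue  # crude sentence-level negation filter
        for finding, terms in FINDING_TERMS.items():
            if any(contains_term(term, sentence) for term in terms):
                found.add(finding)
    return sorted(found)

report = (
    "There is a spiculated mass in the left breast. "
    "No suspicious calcifications. Stable asymmetry is noted."
)
print(affirmed_findings(report))
```

Skipping a whole sentence on any negation cue is simple but will miss affirmed findings that share a sentence with a negated one; a rule set closer to the one described would scope each status annotation to the individual finding mention.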


Subjects
Mammography/methods , Natural Language Processing , Algorithms , Female , Humans , Sensitivity and Specificity , Software