1.
NPJ Precis Oncol ; 8(1): 120, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796637

ABSTRACT

A small number of cancer patients respond exceptionally well to therapies and survive significantly longer than patients with similar diagnoses. Profiling the germline genetic backgrounds of exceptional responder (ER) patients, with extreme survival times, can yield insights into the germline polymorphisms that influence response to therapy. As ERs showed a high incidence in autoimmune diseases, we hypothesized the differences in autoimmune disease risk could reflect the immune background of ERs and contribute to better cancer treatment responses. We analyzed the germline variants of 51 ERs using polygenic risk score (PRS) analysis. Compared to typical cancer patients, the ERs had significantly elevated PRSs for several autoimmune-related diseases: type 1 diabetes, hypothyroidism, and psoriasis. This indicates that an increased genetic predisposition towards these autoimmune diseases is more prevalent among the ERs. In contrast, ERs had significantly lower PRSs for developing inflammatory bowel disease. The left-skew of type 1 diabetes score was significant for exceptional responders. Variants on genes involved in the T1D PRS model associated with cancer drug response are more likely to co-occur with other variants among ERs. In conclusion, ERs exhibited different risks for autoimmune diseases compared to typical cancer patients, which suggests that changes in a patient's immune set point or immune surveillance specificity could be a potential mechanistic link to their exceptional response. These findings expand upon previous research on immune checkpoint inhibitor-treated patients to include those who received chemotherapy or radiotherapy.
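As a rough illustration of the polygenic risk score (PRS) analysis mentioned in this abstract, a PRS is commonly computed as a weighted sum of risk-allele dosages. The sketch below assumes that standard definition; the variant names, weights, and genotypes are hypothetical and are not taken from the study.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    # Variants missing from a patient's genotype are treated as dosage 0.
    return sum(weights[v] * dosages.get(v, 0) for v in weights)


# Hypothetical per-variant effect sizes (e.g., log odds ratios); illustrative only.
weights = {"rs_a": 0.30, "rs_b": -0.10, "rs_c": 0.25}

# Hypothetical genotypes: number of risk alleles carried at each variant.
cohort = {
    "patient_1": {"rs_a": 2, "rs_b": 0, "rs_c": 1},
    "patient_2": {"rs_a": 0, "rs_b": 2, "rs_c": 0},
    "patient_3": {"rs_a": 1, "rs_b": 1, "rs_c": 2},
}

scores = {pid: polygenic_risk_score(g, weights) for pid, g in cohort.items()}
```

Comparing the distribution of such scores between two groups of patients is then a separate statistical step, which this sketch does not attempt.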

3.
Psychiatry Res ; 323: 115175, 2023 05.
Article in English | MEDLINE | ID: mdl-37003169

ABSTRACT

Growing evidence has shown that applying machine learning models to large clinical data sources may exceed clinician performance in suicide risk stratification. However, many existing prediction models either suffer from "temporal bias" (a bias that stems from using case-control sampling) or require training on all available patient visit data. Here, we adopt a "landmark model" framework that aligns with clinical practice for prediction of suicide-related behaviors (SRBs) using a large electronic health record database. Using the landmark approach, we developed models for SRB prediction (regularized Cox regression and random survival forest) that establish a time-point (e.g., clinical visit) from which predictions are made over user-specified prediction windows using historical information up to that point. We applied this approach to cohorts from three clinical settings: general outpatient, psychiatric emergency department, and psychiatric inpatients, for varying prediction windows and lengths of historical data. Models achieved high discriminative performance (area under the Receiver Operating Characteristic curve 0.74-0.93 for the Cox model) across different prediction windows and settings, even with relatively short periods of historical data. In short, we developed accurate, dynamic SRB risk prediction models with the landmark approach that reduce bias and enhance the reliability and portability of suicide risk prediction models.


Subjects
Hospital Emergency Service , Attempted Suicide , Humans , Attempted Suicide/psychology , Reproducibility of Results , ROC Curve
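The landmark idea described in this abstract (predict from a fixed time point, using only history up to that point, over a chosen prediction window) can be sketched as a data-preparation step. This is a simplified illustration, not the authors' pipeline: the function name, feature names, and day counts are hypothetical, and censoring other than the end of the window is ignored.

```python
def landmark_row(visits, event_days, landmark, history_days, window_days):
    """Build one (features, outcome) row for a single patient and landmark day.

    visits: list of (day, feature_name) tuples.
    event_days: days on which the outcome occurred.
    Features count only records in (landmark - history_days, landmark].
    The outcome is (follow-up time, event flag), truncated at window_days.
    """
    features = {}
    for day, name in visits:
        if landmark - history_days < day <= landmark:
            features[name] = features.get(name, 0) + 1

    # Only events strictly after the landmark count as outcomes.
    future = [d for d in event_days if d > landmark]
    if future and min(future) - landmark <= window_days:
        outcome = (min(future) - landmark, True)
    else:
        outcome = (window_days, False)
    return features, outcome


# Hypothetical patient history, in days since first contact.
visits = [(10, "depression_dx"), (200, "depression_dx"),
          (350, "ed_visit"), (400, "ed_visit")]
event_days = [420]

features, outcome = landmark_row(
    visits, event_days, landmark=365, history_days=180, window_days=90
)
```

Rows built this way could then be passed to a time-to-event model; records dated after the landmark never enter the features, which is the property the landmark framing is meant to guarantee.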
4.
J Am Med Inform Assoc ; 28(12): 2582-2592, 2021 11 25.
Article in English | MEDLINE | ID: mdl-34608931

ABSTRACT

OBJECTIVE: Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases may capture more comprehensive pictures of patient health and enable novel research studies. When no gold standard mappings between patient records are available, researchers may probabilistically link records from separate databases and analyze the linked data. However, previous linked data inference methods are constrained to certain linkage settings and exhibit low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data. MATERIALS AND METHODS: Missing variables are imputed at various thresholds using a weighted average method that propagates uncertainty from probabilistic linkage. Next, estimated effect sizes are obtained using a generalized linear model. ATLAS then conducts the threshold combination test by optimally combining P values obtained from data imputed at varying thresholds using Fisher's method and perturbation resampling. RESULTS: In simulations, ATLAS controls for type I error and exhibits high power compared to previous methods. In a real-world genetic association study, meta-analysis of ATLAS-enabled analyses on a linked cohort with analyses using an existing cohort yielded additional significant associations between rheumatoid arthritis genetic risk score and laboratory biomarkers. DISCUSSION: Weighted average imputation weathers false matches and increases contribution of true matches to mitigate linkage error-induced bias. The threshold combination test avoids arbitrarily choosing a threshold to rule a match, thus automating linked data-enabled analyses and preserving power. CONCLUSION: ATLAS promises to enable novel and powerful research studies using linked data to capitalize on all available data sources.


Subjects
Algorithms , Medical Record Linkage , Bias , Factual Databases , Routine Diagnostic Tests , Humans
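The abstract above mentions combining P values with Fisher's method. A minimal sketch of the textbook version follows, assuming independent tests. ATLAS combines P values from the same data imputed at different thresholds, which are not independent, and the abstract indicates it uses perturbation resampling for that reason; this sketch does not reproduce that step.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom. For even degrees of freedom the upper tail has a closed form:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one P value in (0, 1]")

    half_x = -sum(math.log(p) for p in p_values)  # equals x / 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)^i / i!, built incrementally
        tail += term
    return math.exp(-half_x) * tail


combined = fisher_combined_p([0.05, 0.05])
```

With a single P value the function returns that value unchanged, which is a quick sanity check on the formula.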
5.
Clin Gastroenterol Hepatol ; 18(8): 1890-1892, 2020 07.
Article in English | MEDLINE | ID: mdl-31404664

ABSTRACT

Crohn's disease (CD) and ulcerative colitis (UC) are heterogeneous. With availability of therapeutic classes with distinct immunologic mechanisms of action, it has become imperative to identify markers that predict likelihood of response to each drug class. However, robust development of such tools has been challenging because of need for large prospective cohorts with systematic and careful assessment of treatment response using validated indices. Most hospitals in the United States use electronic health records (EHRs) that warehouse a large amount of narrative (free-text) and codified (administrative) data generated during routine clinical care. These data have been used to construct virtual disease cohorts for epidemiologic research as well as for defining genetic basis of disease states or discrete laboratory values [1-3]. Whether EHR-based data can be used to validate genetic associations for more nuanced outcomes such as treatment response has not been examined previously.


Subjects
Ulcerative Colitis , Crohn Disease , Inflammatory Bowel Diseases , Electronic Health Records , Humans , Inflammatory Bowel Diseases/drug therapy , Prospective Studies , United States
6.
Nat Protoc ; 14(12): 3426-3444, 2019 12.
Article in English | MEDLINE | ID: mdl-31748751

ABSTRACT

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).


Subjects
Data Analysis , Electronic Health Records/statistics & numerical data , High-Throughput Screening Assays/methods , Algorithms , Statistical Data Interpretation , Humans , Machine Learning , Natural Language Processing , Phenotype
7.
Sci Data ; 6: 180298, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30620344

ABSTRACT

We develop an algorithm for probabilistic linkage of de-identified research datasets at the patient level, when only diagnosis codes with discrepancies and no personal health identifiers such as name or date of birth are available. It relies on Bayesian modelling of binarized diagnosis codes, and provides a posterior probability of matching for each patient pair, while considering all the data at once. Both in our simulation study (using an administrative claims dataset for data generation) and in two real use-cases linking patient electronic health records from a large tertiary care network, our method exhibits good performance and compares favourably to the standard baseline Fellegi-Sunter algorithm. We propose a scalable, fast and efficient open-source implementation in the ludic R package available on CRAN, which also includes the anonymized diagnosis code data from our real use-case. This work suggests it is possible to link de-identified research databases stripped of any personal health identifiers using only diagnosis codes, provided sufficient information is shared between the data sources.


Subjects
Algorithms , Datasets as Topic , Information Storage and Retrieval/methods , Bayes Theorem , Confidentiality , Electronic Health Records , Humans
8.
J Am Med Inform Assoc ; 25(1): 54-60, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29126253

ABSTRACT

Objective: Electronic health record (EHR)-based phenotyping infers whether a patient has a disease based on the information in his or her EHR. A human-annotated training set with gold-standard disease status labels is usually required to build an algorithm for phenotyping based on a set of predictive features. The time intensiveness of annotation and feature curation severely limits the ability to achieve high-throughput phenotyping. While previous studies have successfully automated feature curation, annotation remains a major bottleneck. In this paper, we present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training. Methods: The most predictive features, such as the number of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes or mentions of the target phenotype, are normalized to resemble a normal mixture distribution with high area under the receiver operating curve (AUC) for prediction. The transformed features are then denoised and combined into a score for accurate disease classification. Results: We validated the accuracy of PheNorm with 4 phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUCs of the PheNorm score reached 0.90, 0.94, 0.95, and 0.94 for the 4 phenotypes, respectively, which were comparable to the accuracy of supervised algorithms trained with sample sizes of 100-300, with no statistically significant difference. Conclusion: The accuracy of the PheNorm algorithms is on par with algorithms trained with annotated samples. PheNorm fully automates the generation of accurate phenotyping algorithms and demonstrates the capacity for EHR-driven annotations to scale to the next level - phenotypic big data.


Subjects
Algorithms , Big Data , Electronic Health Records , Phenotype , Area Under Curve , Datasets as Topic , Humans , Intercellular Signaling Peptides and Proteins , International Classification of Diseases , Peptides , Precision Medicine
9.
J Am Med Inform Assoc ; 24(e1): e143-e149, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-27632993

ABSTRACT

OBJECTIVE: Phenotyping algorithms are capable of accurately identifying patients with specific phenotypes from within electronic medical records systems. However, developing phenotyping algorithms in a scalable way remains a challenge due to the extensive human resources required. This paper introduces a high-throughput unsupervised feature selection method, which improves the robustness and scalability of electronic medical record phenotyping without compromising its accuracy. METHODS: The proposed Surrogate-Assisted Feature Extraction (SAFE) method selects candidate features from a pool of comprehensive medical concepts found in publicly available knowledge sources. The target phenotype's International Classification of Diseases, Ninth Revision and natural language processing counts, acting as noisy surrogates to the gold-standard labels, are used to create silver-standard labels. Candidate features highly predictive of the silver-standard labels are selected as the final features. RESULTS: Algorithms were trained to identify patients with coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis using various numbers of labels to compare the performance of features selected by SAFE, a previously published automated feature extraction for phenotyping procedure, and domain experts. The out-of-sample area under the receiver operating characteristic curve and F-score from SAFE algorithms were remarkably higher than those from the other two, especially at small label sizes. CONCLUSION: SAFE advances high-throughput phenotyping methods by automatically selecting a succinct set of informative features for algorithm training, which in turn reduces overfitting and the needed number of gold-standard labels. SAFE also potentially identifies important features missed by automated feature extraction for phenotyping or experts.


Subjects
Algorithms , Data Mining , Electronic Health Records , Humans , Machine Learning , Natural Language Processing , Phenotype
10.
Inflamm Bowel Dis ; 22(4): 880-5, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26933751

ABSTRACT

BACKGROUND: The availability of monoclonal antibodies to tumor necrosis factor α has revolutionized management of Crohn's disease (CD) and ulcerative colitis. However, limited data exist regarding comparative effectiveness of these agents to inform clinical practice. METHODS: This study consisted of patients with CD or ulcerative colitis initiating either infliximab (IFX) or adalimumab (ADA) between 1998 and 2010. A validated likelihood of nonresponse classification score using frequency of narrative mentions of relevant symptoms in the electronic health record was applied to assess comparative effectiveness at 1 year. Inflammatory bowel disease-related surgery, hospitalization, and use of steroids were determined during this period. RESULTS: Our final cohort included 1060 new initiations of IFX (68% for CD) and 391 of ADA (79% for CD). In CD, the likelihood of nonresponse was higher in ADA than IFX (odds ratio, 1.62; 95% CI, 1.21-2.17). Similar differences favoring efficacy of IFX were observed for the individual symptoms of diarrhea, pain, bleeding, and fatigue. However, there was no difference in inflammatory bowel disease-related surgery, hospitalizations, or prednisone use within 1 year after initiation of IFX or ADA in CD. There was no difference in narrative or codified outcomes between the 2 agents in ulcerative colitis. CONCLUSIONS: We identified a modestly higher likelihood of symptomatic nonresponse at 1 year for ADA compared with IFX in patients with CD. However, there were no differences in inflammatory bowel disease-related surgery or hospitalizations, suggesting these treatments are broadly comparable in effectiveness in routine clinical practice.


Subjects
Adalimumab/therapeutic use , Ulcerative Colitis/drug therapy , Crohn Disease/drug therapy , Infliximab/therapeutic use , Tumor Necrosis Factor-alpha/antagonists & inhibitors , Adult , Anti-Inflammatory Agents/therapeutic use , Combination Drug Therapy , Female , Follow-Up Studies , Gastrointestinal Agents/therapeutic use , Humans , Male , Prognosis
11.
Clin Gastroenterol Hepatol ; 14(7): 973-9, 2016 07.
Article in English | MEDLINE | ID: mdl-26905907

ABSTRACT

BACKGROUND & AIMS: Inflammatory bowel diseases (IBDs) such as Crohn's disease and ulcerative colitis are associated with an increased risk of colorectal cancer (CRC). Chemopreventive strategies have produced weak or inconsistent results. Statins have been associated inversely with sporadic CRC. We examined their role as chemopreventive agents in patients with IBD. METHODS: We collected data from 11,001 patients with IBD receiving care at hospitals in the Greater Boston metropolitan area from 1998 through 2010. Diagnoses of CRC were determined using validated International Classification of Diseases, 9th revision, Clinical Modification codes. Statin use before diagnosis was assessed through analysis of electronic prescriptions. We performed multivariate logistic regression analyses, adjusting for potential confounders including primary sclerosing cholangitis, smoking, increased levels of inflammation markers, and CRC screening practices to identify an independent association between statin use and CRC. We performed sensitivity analyses using propensity score adjustment and variation in the definition of statin use. RESULTS: In our cohort, 1376 of the patients (12.5%) received 1 or more prescriptions for a statin. Patients using statins were more likely to be older, male, white, smokers, and have greater comorbidity than nonusers. Over a follow-up period of 9 years, 2% of statin users developed CRC compared with 3% of nonusers (age-adjusted odds ratio, 0.35; 95% confidence interval, 0.24-0.53). On multivariate analysis, statin use remained independently and inversely associated with CRC (odds ratio, 0.42; 95% confidence interval, 0.28-0.62). Our findings were robust on a variety of sensitivity and subgroup analyses. CONCLUSIONS: Statin use was associated inversely with the risk of CRC in a large IBD cohort. Prospective studies on the role of statins as chemopreventive agents are warranted.


Subjects
Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/prevention & control , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Inflammatory Bowel Diseases/complications , Adult , Aged , Boston/epidemiology , Cohort Studies , Female , Humans , Male , Middle Aged , Risk Assessment
12.
Inflamm Bowel Dis ; 22(1): 151-8, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26332313

ABSTRACT

BACKGROUND: Electronic health records, increasingly a part of healthcare, provide a wealth of untapped narrative free text data that have the potential to accurately inform clinical outcomes. METHODS: From a validated cohort of patients with Crohn's disease or ulcerative colitis, we identified patients with ≥1 coded or narrative mention of monoclonal antibodies to tumor necrosis factor α. Chart review ascertained true use of therapy, time of initiation, and cessation of treatment, and also clinical response stratified as nonresponse, partial, or complete response at 1 year. Internal consistency was assessed in an independent validation cohort. RESULTS: A total of 3087 patients had a mention of an antibody to tumor necrosis factor α. Actual therapy initiation was within 60 days of the first coded mention in 74% of patients. In the derivation cohort, 18% of antibodies to tumor necrosis factor α starts were classified as nonresponse at 1 year, 21% as partial, and 56% as complete response. On multivariate analysis, the number of narrative mentions of diarrhea (odds ratio 1.08; 95% confidence interval, 1.02-1.14) and fatigue (odds ratio 1.16; 95% confidence interval, 1.02-1.32) was independently associated with nonresponse at 1 year (area under the curve 0.82). A likelihood of nonresponse score comprising a weighted sum of both demonstrated a good dose-response relationship across nonresponders (2.18), partial (1.20), and complete (0.50) responders (P < 0.0001) and correlated well with need for surgery or hospitalizations. CONCLUSIONS: Narrative data in an electronic health record offer considerable potential to define temporally evolving disease outcomes such as nonresponse to treatment.


Subjects
Algorithms , Ulcerative Colitis/drug therapy , Crohn Disease/drug therapy , Statistical Data Interpretation , Electronic Health Records , Gastrointestinal Agents/therapeutic use , Tumor Necrosis Factor-alpha/antagonists & inhibitors , Adult , Cohort Studies , Female , Follow-Up Studies , Humans , Male , Middle Aged , Prognosis , Young Adult
13.
Inflamm Bowel Dis ; 21(11): 2507-14, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26241000

ABSTRACT

BACKGROUND: The accuracy and utility of electronic health record (EHR)-derived phenotypes in replicating genotype-phenotype relationships have been infrequently examined. Low circulating vitamin D levels are associated with severe outcomes in inflammatory bowel disease (IBD); however, the genetic basis for vitamin D insufficiency in this population has not been examined previously. METHODS: We compared the accuracy of physician-assigned phenotypes in a large prospective IBD registry to that identified by an EHR algorithm incorporating codified and structured data. Genotyping for IBD risk alleles was performed on the Immunochip and a genetic risk score calculated and compared between EHR-defined patients and those in the registry. Additionally, 4 vitamin D risk alleles were genotyped and serum 25-hydroxy vitamin D [25(OH)D] levels compared across genotypes. RESULTS: A total of 1131 patients captured by our EHR algorithm were also included in our prospective registry (656 Crohn's disease, 475 ulcerative colitis). The overall genetic risk score for Crohn's disease (P = 0.13) and ulcerative colitis (P = 0.32) was similar between EHR-defined patients and a prospective registry. Three of the 4 vitamin D risk alleles were associated with low vitamin D levels in patients with IBD and contributed an additional 3% of the variance explained. Vitamin D genetic risk score did not predict normalization of vitamin D levels. CONCLUSIONS: EHR cohorts form valuable data sources for examining genotype-phenotype relationships. Vitamin D risk alleles explain 3% of the variance in vitamin D levels in patients with IBD.


Subjects
Inflammatory Bowel Diseases/genetics , Vitamin D/analogs & derivatives , Adolescent , Adult , Alleles , Cohort Studies , Electronic Health Records , Female , Genetic Variation , Genotype , Humans , Logistic Models , Male , Multivariate Analysis , Phenotype , Risk Factors , United States , Vitamin D/blood , Young Adult
14.
PLoS One ; 10(8): e0136651, 2015.
Article in English | MEDLINE | ID: mdl-26301417

ABSTRACT

BACKGROUND: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. METHODS AND RESULTS: We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. CONCLUSIONS: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.


Subjects
Coronary Artery Disease/epidemiology , Diabetes Mellitus/epidemiology , Electronic Health Records , Adult , Aged , Algorithms , Rheumatoid Arthritis/complications , Rheumatoid Arthritis/epidemiology , Rheumatoid Arthritis/physiopathology , Coronary Artery Disease/complications , Coronary Artery Disease/physiopathology , Diabetes Mellitus/physiopathology , Female , Humans , Hyperlipidemias/complications , Hyperlipidemias/epidemiology , Hyperlipidemias/physiopathology , Inflammatory Bowel Diseases/complications , Inflammatory Bowel Diseases/epidemiology , Inflammatory Bowel Diseases/physiopathology , Male , Middle Aged , Natural Language Processing , Phenotype , Risk Factors
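The abstract above reports specificity, sensitivity, and positive predictive value (PPV) for a phenotyping algorithm. As a reminder of how these relate, the sketch below computes them from a confusion matrix using the standard definitions. The counts are hypothetical and were chosen only so the arithmetic is easy to follow; they are not from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and PPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases left unflagged
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
    }


# Hypothetical chart-review results for 400 patients; illustrative only.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=270)
```

Because PPV depends on how common the condition is in the cohort, the same algorithm can show a different PPV in a cohort with lower prevalence even when its sensitivity and specificity are unchanged.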
15.
J Am Med Inform Assoc ; 22(5): 993-1000, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25929596

ABSTRACT

OBJECTIVE: Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. MATERIALS AND METHODS: Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. RESULTS: The authors applied the method to develop algorithms to identify patients with rheumatoid arthritis and coronary artery disease cases among those with rheumatoid arthritis from a large multi-institutional EHR. The areas under the receiver operating characteristic curve (AUCs) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to the AUCs of 0.938 and 0.929 by models trained with expert-curated features. DISCUSSION: Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. CONCLUSION: The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping.


Subjects
Algorithms , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Rheumatoid Arthritis/diagnosis , Humans , Unified Medical Language System
16.
Am J Psychiatry ; 172(4): 363-72, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25827034

ABSTRACT

OBJECTIVE: The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. METHOD: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis. RESULTS: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. CONCLUSIONS: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.


Subjects
Bipolar Disorder/diagnosis , Electronic Health Records , Natural Language Processing , Adult , Aged , Algorithms , Bipolar Disorder/classification , Bipolar Disorder/psychology , Case-Control Studies , Cohort Studies , Female , Humans , Male , Middle Aged , Phenotype , Predictive Value of Tests , Reproducibility of Results , Sensitivity and Specificity
18.
Clin Gastroenterol Hepatol ; 13(2): 322-329.e1, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25041865

ABSTRACT

BACKGROUND & AIMS: Crohn's disease and ulcerative colitis are associated with an increased risk of colorectal cancer (CRC). Surveillance colonoscopy is recommended at 2- to 3-year intervals beginning 8 years after diagnosis of inflammatory bowel disease (IBD). However, there have been no reports of whether colonoscopy examination reduces the risk for CRC in patients with IBD. METHODS: In a retrospective study, we analyzed data from 6823 patients with IBD (2764 with a recent colonoscopy, 4059 without a recent colonoscopy) seen and followed up for at least 3 years at 2 tertiary referral hospitals in Boston, Massachusetts. The primary outcome was diagnosis of CRC. We examined the proportion of patients undergoing a colonoscopy within 36 months before a diagnosis of CRC or at the end of the follow-up period, excluding colonoscopies performed within 6 months before a diagnosis of CRC, to avoid inclusion of prevalent cancers. Multivariate logistic regression was performed, adjusting for plausible confounders. RESULTS: A total of 154 patients developed CRC. The incidence of CRC among patients without a recent colonoscopy (2.7%) was significantly higher than among patients with a recent colonoscopy (1.6%) (odds ratio [OR], 0.56; 95% confidence interval [CI], 0.39-0.80). This difference persisted in multivariate analysis (OR, 0.65; 95% CI, 0.45-0.93) and was robust when adjusted for a range of assumptions in sensitivity analyses. Among patients with CRC, a colonoscopy within 6 to 36 months before diagnosis was associated with a reduced mortality rate (OR, 0.34; 95% CI, 0.12-0.95). CONCLUSIONS: Recent colonoscopy (within 36 months) is associated with a reduced incidence of CRC in patients with IBD, and lower mortality rates in those diagnosed with CRC.


Subjects
Colonoscopy , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/prevention & control , Inflammatory Bowel Diseases/complications , Adult , Boston , Female , Humans , Incidence , Male , Middle Aged , Retrospective Studies , Risk Assessment
19.
Dig Dis Sci ; 60(2): 471-7, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25213079

ABSTRACT

INTRODUCTION: Inflammatory bowel diseases [IBD; Crohn's disease (CD), ulcerative colitis] often affect women in their reproductive years. Few studies have analyzed the impact of mode of childbirth on long-term IBD outcomes. METHODS: We used a multi-institutional IBD cohort to identify all women in the reproductive age-group with a diagnosis of IBD prior to pregnancy. We identified the occurrence of a new diagnosis code for perianal complications, IBD-related hospitalization and surgery, and initiation of medical therapy after either a vaginal delivery or caesarean section (CS). Cox proportional hazards models adjusting for potential confounders were used to estimate the independent effect of mode of childbirth on IBD outcomes. RESULTS: Our cohort included 360 women with IBD (161 CS). Women in the CS group were more likely to be older and to have complicated disease behavior prior to pregnancy. During follow-up, there was no difference in the likelihood of IBD-related surgery (multivariate hazard ratio [HR] 1.75, 95% confidence interval [CI] 0.40-7.75), IBD-related hospitalization (HR 1.39), initiation of immunomodulator therapy (HR 1.45), or anti-TNF therapy (HR 1.11). Among the 133 CD pregnancies with no prior perianal disease, we found no excess risk of subsequent newly diagnosed perianal fistulae with vaginal delivery compared to CS (HR 0.19, 95% CI 0.04-1.05). CONCLUSIONS: Mode of delivery did not influence the natural history of IBD. In our cohort, vaginal delivery was not associated with increased risk of subsequent perianal disease in women with CD.


Subjects
Cesarean Section/adverse effects , Colitis, Ulcerative/complications , Crohn Disease/complications , Delivery, Obstetric/adverse effects , Parturition , Adolescent , Adult , Boston , Chi-Square Distribution , Colitis, Ulcerative/diagnosis , Colitis, Ulcerative/therapy , Crohn Disease/diagnosis , Crohn Disease/therapy , Delivery, Obstetric/methods , Digestive System Surgical Procedures , Disease Progression , Female , Humans , Immunologic Factors/therapeutic use , Kaplan-Meier Estimate , Middle Aged , Multivariate Analysis , Patient Readmission , Pregnancy , Prognosis , Proportional Hazards Models , Rectal Fistula/etiology , Risk Factors , Time Factors , Young Adult
20.
Hum Genet ; 133(11): 1369-82, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25062868

ABSTRACT

To reduce costs and improve clinical relevance of genetic studies, there has been increasing interest in performing such studies in hospital-based cohorts by linking phenotypes extracted from electronic medical records (EMRs) to genotypes assessed in routinely collected medical samples. A fundamental difficulty in implementing such studies is extracting accurate information about disease outcomes and important clinical covariates from large numbers of EMRs. Recently, numerous algorithms have been developed to infer phenotypes by combining information from multiple structured and unstructured variables extracted from EMRs. Although these algorithms are quite accurate, they typically do not provide perfect classification due to the difficulty in inferring meaning from the text. Some algorithms can produce for each patient a probability that the patient is a disease case. This probability can be thresholded to define case-control status, and this estimated case-control status has been used to replicate known genetic associations in EMR-based studies. However, using the estimated disease status in place of true disease status results in outcome misclassification, which can diminish test power and bias odds ratio estimates. We propose to instead directly model the algorithm-derived probability of being a case. We demonstrate how our approach improves test power and effect estimation in simulation studies, and we describe its performance in a study of rheumatoid arthritis. Our work provides an easily implemented solution to a major practical challenge that arises in the use of EMR data, which can facilitate the use of EMR infrastructure for more powerful, cost-effective, and diverse genetic studies.
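The bias described above, where a thresholded, imperfect case status stands in for true disease status, can be illustrated with a small calculation. The Python sketch below shows only the problem (attenuation of the odds ratio under nondifferential outcome misclassification), not the authors' proposed model. All numbers (disease risks, sensitivity, specificity) and the function names are hypothetical values chosen for illustration.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def observed_case_probability(true_risk, sensitivity, specificity):
    """Probability of being labelled a case by an imperfect classifier."""
    true_positives = sensitivity * true_risk
    false_positives = (1.0 - specificity) * (1.0 - true_risk)
    return true_positives + false_positives


# HYPOTHETICAL inputs: disease risk in non-carriers and carriers of a variant,
# and the accuracy of a thresholded phenotyping algorithm.
risk_noncarrier, risk_carrier = 0.05, 0.10
sensitivity, specificity = 0.80, 0.95

true_or = odds(risk_carrier) / odds(risk_noncarrier)

# The same misclassification rates apply to both genotype groups
# (nondifferential), which pulls the observed odds ratio toward 1.
obs_noncarrier = observed_case_probability(risk_noncarrier, sensitivity, specificity)
obs_carrier = observed_case_probability(risk_carrier, sensitivity, specificity)
observed_or = odds(obs_carrier) / odds(obs_noncarrier)

print(f"true OR: {true_or:.2f}, observed OR: {observed_or:.2f}")
```

With these assumed inputs the observed odds ratio falls between 1 and the true odds ratio, which is the attenuation that motivates modeling the algorithm-derived probability directly.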


Subjects
Arthritis, Rheumatoid/genetics , Genetic Association Studies/methods , Models, Genetic , Algorithms , Arthritis, Rheumatoid/classification , Arthritis, Rheumatoid/epidemiology , Case-Control Studies , Cohort Studies , Computer Simulation , Electronic Health Records , Genetic Research , Genotype , Humans , Medical Audit , Phenotype , Prevalence , Sample Size , Software , United States/epidemiology