Results 1 - 20 of 35
1.
Stud Health Technol Inform ; 310: 1086-1090, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269982

ABSTRACT

Clinical trial enrollment is impeded by the significant time burden placed on research coordinators screening eligible patients. With 50,000 new cancer cases every year, the Veterans Health Administration (VHA) has made increased access for Veterans to high-quality clinical trials a priority. To aid in this effort, we worked with research coordinators to build the MPACT (Matching Patients to Accelerate Clinical Trials) platform with a goal of improving efficiency in the screening process. MPACT supports both a trial prescreening workflow and a screening workflow, employing Natural Language Processing and Data Science methods to produce reliable phenotypes of trial eligibility criteria. MPACT also has a functionality to track a patient's eligibility status over time. Qualitative feedback has been promising with users reporting a reduction in time spent on identifying eligible patients.


Subject(s)
Neoplasms , Technology , Humans , Workflow , Data Science , Eligibility Determination , Neoplasms/diagnosis , Neoplasms/therapy
2.
Health Informatics J ; 29(3): 14604582231198021, 2023.
Article in English | MEDLINE | ID: mdl-37635280

ABSTRACT

Introduction: PD-L1 expression is used to determine oncology patients' response to and eligibility for immunologic treatments; however, PD-L1 expression status often only exists in unstructured clinical notes, limiting the ability to use it in population-level studies. Methods: We developed and evaluated a machine learning-based natural language processing (NLP) tool to extract PD-L1 expression values from the nationwide Veterans Affairs electronic health record system. Results: The model demonstrated strong evaluation performance across multiple levels of label granularity. Mean precision of the overall PD-L1 positive label was 0.859 (sd, 0.039), recall 0.994 (sd, 0.013), and F1 0.921 (sd, 0.024). When a numeric PD-L1 value was identified, the mean absolute error of the value was 0.537 on a scale of 0 to 100. Conclusion: We presented an accurate NLP method for deriving PD-L1 status from clinical notes. By reducing the time and manual effort needed to review medical records, our work will enable future population-level studies in cancer immunotherapy.


Subject(s)
B7-H1 Antigen , Natural Language Processing , Humans , Medical Records , Software , Machine Learning , Electronic Health Records
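The metrics quoted in the abstract above can be illustrated with a short sketch. It assumes the standard definitions of F1 (harmonic mean of precision and recall) and mean absolute error; the function names and the example PD-L1 values are hypothetical and not taken from the study. Note that an F1 computed from mean precision and mean recall need not equal the mean of per-run F1 scores, so only approximate agreement with the reported 0.921 should be expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between paired numeric values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# F1 computed from the mean precision and recall quoted in the abstract.
f1_from_means = f1_score(0.859, 0.994)

# Hypothetical extracted vs. chart-reviewed PD-L1 percentages (not study data).
mae_example = mean_absolute_error([50, 1, 90, 0], [50, 0, 90, 1])
```

On a 0 to 100 scale, a mean absolute error near 0.5 means the extracted value is typically within about half a percentage point of the reviewed value.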
3.
Biochemistry (Mosc) ; 87(10): 1138-1148, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36273882

ABSTRACT

The effect of dipyridamole (DIP) at concentrations up to 1 mM on the fluorescence characteristics of the light-harvesting complexes LH2 and LH1, as well as on the condition of the photosynthetic electron transport chain in bacterial chromatophores of Rba. sphaeroides, was investigated. DIP was found to affect the efficiency of energy transfer from the light-harvesting complex LH2 to the LH1-reaction center core complex and to produce a long-wavelength ("red") shift of the absorption band of light-harvesting bacteriochlorophyll molecules in the IR spectral region at 840-900 nm. This shift is associated with the membrane transition to the energized state. It was shown that DIP is able to reduce the photooxidized bacteriochlorophyll of the reaction center, which accelerated electron flow along the electron transport chain, thereby stimulating generation of the transmembrane potential on the chromatophore membrane. The results are important for clarifying possible mechanisms of DIP influence on the activity of membrane-bound functional proteins. In particular, they might be significant for interpreting numerous therapeutic effects of DIP.


Subject(s)
Chromatophores , Rhodobacter sphaeroides , Rhodobacter sphaeroides/metabolism , Light-Harvesting Protein Complexes/metabolism , Bacteriochlorophylls/metabolism , Dipyridamole/pharmacology , Dipyridamole/metabolism , Energy Transfer , Membrane Proteins/metabolism , Chromatophores/metabolism , Bacterial Proteins/metabolism
4.
Methods Inf Med ; 61(5-06): 167-173, 2022 12.
Article in English | MEDLINE | ID: mdl-36070785

ABSTRACT

OBJECTIVE: To provide high-quality data for coronavirus disease 2019 (COVID-19) research, we validated derived COVID-19 clinical indicators and 22 associated machine learning phenotypes in the Mass General Brigham (MGB) COVID-19 Data Mart. METHODS: Fifteen reviewers performed a retrospective manual chart review for 150 COVID-19-positive patients in the data mart. To support rapid chart review for a wide range of target data, we offered a natural language processing (NLP)-based chart review tool, the Digital Analytic Patient Reviewer (DAPR). For this work, we designed a dedicated patient summary view and developed 127 new NLP logics to extract COVID-19 relevant medical concepts and target phenotypes. Moreover, we transformed DAPR for research purposes so that patient information is used for an approved research purpose only and enabled fast access to the integrated patient information. Lastly, we performed a survey to evaluate the validation difficulty and usefulness of the DAPR. RESULTS: The concepts for COVID-19-positive cohort, COVID-19 index date, COVID-19-related admission, and the admission date were shown to have high values in all evaluation metrics. However, three phenotypes showed notably lower performance than their positive predictive value in the prepandemic population. Based on these results, we removed the three phenotypes from our data mart. In the survey about using the tool, participants expressed positive attitudes toward using DAPR for chart review. They assessed that the validation was easy and DAPR helped find relevant information. Some validation difficulties were also discussed. CONCLUSION: Use of NLP technology in the chart review helped to cope with the challenges of the COVID-19 data validation task and accelerated the process. As a result, we could provide more reliable research data promptly and respond to the COVID-19 crisis. DAPR's benefit can be expanded to other domains. We plan to operationalize it for wider research groups.


Subject(s)
COVID-19 , Humans , Retrospective Studies , Data Warehousing , Natural Language Processing , Data Accuracy
5.
Dig Dis Sci ; 67(2): 473-480, 2022 02.
Article in English | MEDLINE | ID: mdl-33590405

ABSTRACT

BACKGROUND AND AIMS: Conventional adenomas (CAs) and serrated polyps (SPs) are precursors to colorectal cancer (CRC). Understanding of metachronous cancer risk is poor due to lack of accurate large-volume datasets. We outline the use of natural language processing (NLP) in forming the Partners Colonoscopy Cohort, an integrated longitudinal cohort of patients undergoing colonoscopies. METHODS: We identified endoscopy quality data from endoscopy reports for colonoscopies performed from 2007 to 2018 in a large integrated healthcare system (Mass General Brigham). Through modification of an established NLP pipeline, we extracted histopathological data (polyp location, histology and dysplasia) from corresponding pathology reports. Pathology and endoscopy data were merged by polyp location using a four-stage algorithm. NLP and merging procedures were validated by manual review of 500 pathology reports. RESULTS: 305,656 colonoscopies in 213,924 patients were identified. After merging, 76,137 patients had matched polyp data for 334,750 polyps. CAs and SPs were present in 86,707 (28.5%) and 55,373 (18.2%) colonoscopies. Among patients with polyps at index screening colonoscopy, 14,931 (33.4%) had follow-up colonoscopy (median 46.4, interquartile range 33.8-62.4 months); 91 (0.2%) and 1127 (2.5%) patients developed metachronous CRC and high-risk polyps (polyps ≥ 10 mm or CAs having high-grade dysplasia/villous/tubulovillous histology or SPs with dysplasia). Genetic data were available for 23,787 (31.7%) patients with polyps from the Partners Biobank. The validation study showed a positive predictive value of 100% for polyp histology and locations. CONCLUSION: We created the Partners Colonoscopy Cohort providing essential infrastructure for future studies to better understand the natural history of CRC and improve screening and post-polypectomy strategies.


Subject(s)
Adenoma , Colonic Polyps , Colonoscopy , Colorectal Neoplasms , Datasets as Topic , Adenomatous Polyps , Adult , Aged , Cohort Studies , Female , Humans , Longitudinal Studies , Male , Middle Aged , Natural Language Processing
6.
J Am Med Inform Assoc ; 29(4): 643-651, 2022 03 15.
Article in English | MEDLINE | ID: mdl-34849976

ABSTRACT

OBJECTIVE: Integrating and harmonizing disparate patient data sources into one consolidated data portal enables researchers to conduct analysis efficiently and effectively. MATERIALS AND METHODS: We describe an implementation of Informatics for Integrating Biology and the Bedside (i2b2) to create the Mass General Brigham (MGB) Biobank Portal data repository. The repository integrates data from primary and curated data sources and is updated weekly. The data are made readily available to investigators in a data portal where they can easily construct and export customized datasets for analysis. RESULTS: As of July 2021, there are 125 645 consented patients enrolled in the MGB Biobank. 88 527 (70.5%) have a biospecimen, 55 121 (43.9%) have completed the health information survey, 43 552 (34.7%) have genomic data and 124 760 (99.3%) have EHR data. Twenty machine learning computed phenotypes are calculated on a weekly basis. There are currently 1220 active investigators who have run 58 793 patient queries and exported 10 257 analysis files. DISCUSSION: The Biobank Portal allows noninformatics researchers to conduct study feasibility by querying across many data sources and then extract data that are most useful to them for clinical studies. While institutions require substantial informatics resources to establish and maintain integrated data repositories, they yield significant research value to a wide range of investigators. CONCLUSION: The Biobank Portal and other patient data portals that integrate complex and simple datasets enable diverse research use cases. i2b2 tools to implement these registries and make the data interoperable are open source and freely available.


Subject(s)
Biological Specimen Banks , Information Storage and Retrieval , Data Collection , Humans , Informatics
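As a quick arithmetic check, the percentages in the abstract above can be recomputed from the quoted counts. This sketch assumes each percentage uses the 125 645 enrolled patients as its denominator, which the abstract implies but does not state outright; the dictionary keys and function name are illustrative.

```python
ENROLLED = 125_645  # consented patients, as stated in the abstract

# Counts quoted in the abstract, keyed by data type.
counts = {
    "biospecimen": 88_527,
    "health information survey": 55_121,
    "genomic data": 43_552,
    "EHR data": 124_760,
}


def percent_of_enrolled(count: int, total: int = ENROLLED) -> float:
    """Share of enrolled patients, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: percent_of_enrolled(n) for name, n in counts.items()}
```

Under that assumption the recomputed shares agree with the reported 70.5%, 43.9%, 34.7%, and 99.3%.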
7.
Sci Rep ; 11(1): 19959, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34620889

ABSTRACT

Electronic health records (EHR) provide an unprecedented opportunity to conduct large, cost-efficient, population-based studies. However, the studies of heterogeneous diseases, such as chronic obstructive pulmonary disease (COPD), often require labor-intensive clinical review and testing, limiting widespread use of these important resources. To develop a generalizable and efficient method for accurate identification of large COPD cohorts in EHRs, a COPD datamart was developed from 3420 participants meeting inclusion criteria in the Mass General Brigham Biobank. Training and test sets were selected and labeled with gold-standard COPD classifications obtained from chart review by pulmonologists. Multiple classes of algorithms were built utilizing both structured (e.g. ICD codes) and unstructured (e.g. medical notes) data via elastic net regression. Models explicitly including and excluding spirometry features were compared. External validation of the final algorithm was conducted in an independent biobank with a different EHR system. The final COPD classification model demonstrated excellent positive predictive value (PPV; 91.7%), sensitivity (71.7%), and specificity (94.4%). This algorithm performed well not only within the MGBB, but also demonstrated similar or improved classification performance in an independent biobank (PPV 93.5%, sensitivity 61.4%, specificity 90%). Ancillary comparisons showed that the classification model built including a binary feature for FEV1/FVC produced substantially higher sensitivity than those excluding. This study fills a gap in COPD research involving population-based EHRs, providing an important resource for the rapid, automated classification of COPD cases that is both cost-efficient and requires minimal information from unstructured medical records.


Subject(s)
Algorithms , Electronic Health Records , Chronic Obstructive Pulmonary Disease/diagnosis , Factual Databases , Forced Expiratory Volume , Humans , Vital Capacity
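The three performance measures reported in the abstract above all derive from a 2x2 comparison of algorithm output against chart review. The sketch below shows those standard definitions. The class name and the counts are hypothetical: the counts were picked only so that the ratios land near the reported percentages, and they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # algorithm says COPD, chart review agrees
    fp: int  # algorithm says COPD, chart review disagrees
    fn: int  # algorithm misses a chart-confirmed case
    tn: int  # algorithm and chart review both say no COPD

    def ppv(self) -> float:
        """Positive predictive value: confirmed cases among predicted cases."""
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        """Share of chart-confirmed cases that the algorithm finds."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of chart-confirmed non-cases that the algorithm rejects."""
        return self.tn / (self.tn + self.fp)


# Hypothetical counts chosen for illustration; they are not the study's data.
example = ConfusionCounts(tp=66, fp=6, fn=26, tn=102)
```

A high PPV with moderate sensitivity, as in this example, describes a classifier that rarely mislabels a non-case as COPD but misses a fair share of true cases.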
8.
JAMIA Open ; 4(3): ooab074, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34485848

ABSTRACT

OBJECTIVE: To best meet our point-of-care research (POC-R) needs, we developed ProjectFlow, a configurable, clinical research workflow management application. In this article, we describe ProjectFlow and how it is used to manage study processes for the Diuretic Comparison Project (DCP) and the Research Precision Oncology Program (RePOP). MATERIALS AND METHODS: The Veterans Health Administration (VHA) is the largest integrated health care system in the United States. ProjectFlow is a flexible web-based workflow management tool specifically created to facilitate the conduct of our clinical research initiatives within the VHA. The application was developed using the Grails web framework and allows researchers to create custom workflows using Business Process Model and Notation. RESULTS: As of January 2021, ProjectFlow has facilitated management of study recruitment, enrollment, randomization, and drug orders for over 10 000 patients for the DCP clinical trial. It has also helped us evaluate over 3800 patients for recruitment and enroll over 370 of them into RePOP for use in data sharing partnerships and predictive analytics aimed at optimizing cancer treatment in the VHA. DISCUSSION: The POC-R study design embeds research processes within day-to-day clinical care and leverages longitudinal electronic health record (EHR) data for study recruitment, monitoring, and outcome reporting. Software that allows flexibility in study workflow creation and integrates with enterprise EHR systems is critical to the success of POC-R. CONCLUSIONS: We developed a flexible web-based informatics solution called ProjectFlow that supports custom research workflow configuration and has the ability to integrate data from existing VHA EHR systems.

10.
J Am Med Inform Assoc ; 27(11): 1716-1720, 2020 11 01.
Article in English | MEDLINE | ID: mdl-33067628

ABSTRACT

OBJECTIVE: Reducing risk of coronavirus disease 2019 (COVID-19) infection among healthcare personnel requires a robust occupational health response involving multiple disciplines. We describe a flexible informatics solution to enable such coordination, and we make it available as open-source software. MATERIALS AND METHODS: We developed a stand-alone application that integrates data from several sources, including electronic health record data and data captured outside the electronic health record. RESULTS: The application facilitates workflows from different hospital departments, including Occupational Health and Infection Control, and has been used extensively. As of June 2020, 4629 employees and 7768 patients have been added for tracking by the application, and the application has been accessed over 46 000 times. DISCUSSION: Data captured by the application provide both a historical and real-time view into the operational impact of COVID-19 within the hospital, enabling aggregate and patient-level reporting to support identification of new cases, contact tracing, outbreak investigations, and employee workforce management. CONCLUSIONS: We have developed an open-source application that facilitates communication and workflow across multiple disciplines to manage hospital employees impacted by the COVID-19 pandemic.


Subject(s)
Coronavirus Infections/transmission , Data Management , Health Personnel , Occupational Health , Patient Identification Systems/methods , Viral Pneumonia/transmission , Software , Workflow , Boston , COVID-19 , Disease Outbreaks , Veterans Hospitals , Humans , Patient-to-Professional Infectious Disease Transmission/prevention & control , Pandemics , Systems Integration , United States
11.
Physiol Plant ; 165(3): 476-486, 2019 Mar.
Article in English | MEDLINE | ID: mdl-29345315

ABSTRACT

The development of high-performance photobioreactors equipped with automatic systems for non-invasive real-time monitoring of cultivation conditions and photosynthetic parameters is a challenge in algae biotechnology. Therefore, we developed a chlorophyll (Chl) fluorescence measuring system for the online recording of the light-induced fluorescence rise and the dark relaxation of the flash-induced fluorescence yield (QA- re-oxidation kinetics) in photobioreactors. This system provides automatic measurements in a broad range of Chl concentrations at high frequency of gas-tight sampling, and advanced data analysis. The performance of this new technique was tested on the green microalga Chlamydomonas reinhardtii subjected to a sulfur deficiency stress and to long-term dark anaerobic conditions. More than a thousand fluorescence kinetic curves were recorded and analyzed during aerobic and anaerobic stages of incubation. Lifetime and amplitude values of kinetic components were determined, and their dynamics plotted on heatmaps. Out of these data, stress-sensitive kinetic parameters were specified. This implemented apparatus can therefore be useful for the continuous real-time monitoring of algal photosynthesis in photobioreactors.


Subject(s)
Chlorophyll/metabolism , Photobioreactors/microbiology , Photosynthesis/physiology , Chlamydomonas reinhardtii/metabolism , Fluorescence , Kinetics
12.
Photosynth Res ; 139(1-3): 441-448, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30353420

ABSTRACT

The temperature dependence of the tryptophan fluorescence lifetime in trimeric photosystem I (PSI) complexes from the cyanobacterium Synechocystis sp. PCC 6803 has been studied during heating of preparations pre-frozen to -180 °C in the dark or under light activation. Fluorescence lifetime in samples frozen in the light was longer than in samples frozen in the dark. For samples in 65% glycerol at λreg = 335 nm and at 20 °C, the lifetimes of the components were as follows: τ1 ≈ 1.2 ns, τ2 ≈ 4.9 ns, and τ3 ≈ 20 ns. The contribution of the first component was negligible. To analyze the contribution of components 2 and 3 derived from frozen-thawed samples, two temperature ranges, from -180 to -90 °C and above -90 °C, are considered. Within these ranges, the contributions of the two components vary in antiphase to each other. The dependence on temperature of these contributions is explained by the influence of the microconformational protein dynamics on the tryptophan fluorescence lifetime. In the present work, a comparative analysis of temperature-dependent conformational dynamics and electron transfer in cyanobacterial PSI (Schlodder et al., in Biochemistry 37:9466-9476, 1998) and Rhodobacter sphaeroides reaction center complexes (Knox et al., in J Photochem Photobiol B 180:140-148, 2018) was also carried out.


Subject(s)
Cyanobacteria/metabolism , Fluorescence , Light , Photosystem I Protein Complex/metabolism , Tryptophan/chemistry , Cyanobacteria/radiation effects , Temperature
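The three lifetimes quoted in the abstract above describe a multi-exponential fluorescence decay. A minimal sketch of that model follows, assuming the common form I(t) = Σ a_i·exp(-t/τ_i) and the usual definition of a component's fractional steady-state contribution, a_i·τ_i / Σ a_j·τ_j. The amplitudes are hypothetical because the abstract does not report them, and the abstract does not say which definition of "contribution" the authors used.

```python
import math

# Lifetimes (ns) quoted in the abstract for 65% glycerol at 20 °C.
LIFETIMES_NS = (1.2, 4.9, 20.0)


def decay_intensity(t_ns, amplitudes, lifetimes=LIFETIMES_NS):
    """I(t) = sum_i a_i * exp(-t / tau_i) for a multi-exponential decay."""
    return sum(a * math.exp(-t_ns / tau) for a, tau in zip(amplitudes, lifetimes))


def fractional_contributions(amplitudes, lifetimes=LIFETIMES_NS):
    """Steady-state intensity fraction per component: a_i*tau_i / sum_j a_j*tau_j."""
    weighted = [a * tau for a, tau in zip(amplitudes, lifetimes)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical amplitudes; the abstract does not report them.
amplitudes = (0.05, 0.60, 0.35)
fractions = fractional_contributions(amplitudes)
```

With these illustrative amplitudes the short 1.2 ns component carries well under 1% of the steady-state intensity, which shows how a fast component can be negligible even with a nonzero amplitude.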
13.
AMIA Annu Symp Proc ; 2019: 408-417, 2019.
Article in English | MEDLINE | ID: mdl-32308834

ABSTRACT

We consider the task of producing a useful clustering of healthcare providers from their clinical action signature: their drug, procedure, and billing codes. Because high-dimensional sparse count vectors are challenging to cluster, we develop a novel autoencoder framework to address this task. Our solution creates a low-dimensional embedded representation of the high-dimensional space that preserves angular relationships and assigns examples to clusters while optimizing the quality of this clustering. Our method is able to find a better clustering than under a two-step alternative, e.g., projected K means/medoids, where a representation is learned and then clustering is applied to the representation. We demonstrate our method's characteristics through quantitative and qualitative analysis of real and simulated data, including in several real-world healthcare case studies. Finally, we develop a tool to enhance exploratory analysis of providers based on their clinical behaviors.


Subject(s)
Cluster Analysis , Computer Simulation , Health Personnel , Medicare , Aged , Algorithms , Humans , Automated Pattern Recognition , United States
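The "angular relationships" mentioned in the abstract above are commonly measured with cosine similarity, which compares the mix of codes rather than their volume. The sketch below computes it for sparse count vectors stored as dictionaries. It illustrates the quantity an embedding might try to preserve, not the authors' autoencoder; the provider names and code labels are hypothetical.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine of the angle between two sparse count vectors stored as dicts."""
    dot = sum(count * b.get(code, 0) for code, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical providers and code counts (illustrative labels only).
provider_a = {"code_1": 40, "code_2": 10}
provider_b = {"code_1": 80, "code_2": 20}  # same mix, double the volume
provider_c = {"code_3": 25}                # no codes in common with the others

same_mix = cosine_similarity(provider_a, provider_b)
no_overlap = cosine_similarity(provider_a, provider_c)
```

Two providers with the same proportions of codes score 1 regardless of how many claims each files, which is why an angular measure suits count data with very different volumes.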
14.
Dig Dis Sci ; 63(7): 1794-1800, 2018 07.
Article in English | MEDLINE | ID: mdl-29696479

ABSTRACT

BACKGROUND: Adenoma detection rate (ADR) is a widely used colonoscopy quality indicator. Calculation of ADR is labor-intensive and cumbersome using current electronic medical databases. Natural language processing (NLP) is a method used to extract meaning from unstructured or free text data. AIMS: To develop and validate an accurate automated process for calculation of ADR and serrated polyp detection rate (SDR) on data stored in widely used electronic health record systems, specifically the Epic electronic health record system, Provation® endoscopy reporting system, and Sunquest PowerPath pathology reporting system. METHODS: Screening colonoscopies performed between June 2010 and August 2015 were identified using the Provation® reporting tool. An NLP pipeline was developed to identify adenomas and sessile serrated polyps (SSPs) on pathology reports corresponding to these colonoscopy reports. The pipeline was validated using a manual search. Precision, recall, and effectiveness of the natural language processing pipeline were calculated. ADR and SDR were then calculated. RESULTS: We identified 8032 screening colonoscopies that were linked to 3821 pathology reports (47.6%). The NLP pipeline had an accuracy of 100% for adenomas and 100% for SSPs. Mean total ADR was 29.3% (range 14.7-53.3%); mean male ADR was 35.7% (range 19.7-62.9%); and mean female ADR was 24.9% (range 9.1-51.0%). Mean total SDR was 4.0% (0-9.6%). CONCLUSIONS: We developed and validated an NLP pipeline that accurately and automatically calculates ADRs and SDRs using data stored in Epic, Provation® and Sunquest PowerPath. This NLP pipeline can be used to evaluate colonoscopy quality parameters at both individual and practice levels.


Subject(s)
Adenocarcinoma/diagnosis , Adenomatous Polyps/diagnosis , Colonic Neoplasms/diagnosis , Colonic Polyps/diagnosis , Colonoscopy , Early Detection of Cancer/methods , Electronic Health Records , Natural Language Processing , Adenocarcinoma/pathology , Adenomatous Polyps/pathology , Automation , Colonic Neoplasms/pathology , Colonic Polyps/pathology , Colonoscopy/standards , Early Detection of Cancer/standards , Female , Humans , Male , Predictive Value of Tests , Health Care Quality Indicators , Reproducibility of Results
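Once each screening colonoscopy carries an adenoma flag, the detection rate in the abstract above reduces to a per-endoscopist proportion. The sketch below assumes the usual definition of ADR (screening colonoscopies with at least one adenoma divided by all screening colonoscopies). The record layout, endoscopist labels, and values are hypothetical, and the NLP step that would produce the flag is not shown.

```python
from collections import defaultdict

# Hypothetical screening colonoscopy records: (endoscopist, adenoma_found).
# In the pipeline described above, the flag would come from NLP on pathology text.
records = [
    ("endoscopist_1", True),
    ("endoscopist_1", False),
    ("endoscopist_1", False),
    ("endoscopist_1", True),
    ("endoscopist_2", False),
    ("endoscopist_2", True),
    ("endoscopist_2", False),
    ("endoscopist_2", False),
]


def detection_rates(rows):
    """Return {endoscopist: fraction of screening exams with at least one adenoma}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for endoscopist, adenoma_found in rows:
        totals[endoscopist] += 1
        positives[endoscopist] += int(adenoma_found)
    return {name: positives[name] / totals[name] for name in totals}


rates = detection_rates(records)
```

The same function would give a serrated polyp detection rate if the flag recorded sessile serrated polyps instead of adenomas.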
15.
J Photochem Photobiol B ; 180: 140-148, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29413697

ABSTRACT

The temperature dependencies of the rate of dark recombination of separated charges between the photoactive bacteriochlorophyll and the primary quinone acceptor (QA) in photosynthetic reaction centers (RCs) of the purple bacteria Rhodobacter sphaeroides (Rb. sphaeroides) were investigated. Measurements were performed in water-glycerol and trehalose environments after freezing to -180 °C in the dark and under actinic light with subsequent heating. Simultaneously, the RC tryptophanyl fluorescence lifetime in the spectral range between 323 and 348 nm was measured under these conditions. A correlation was found between the temperature dependencies of the functional and dynamic parameters of RCs in different solvent mixtures. For the first time, differences in the average fluorescence lifetime of tryptophanyl residues were measured between RCs frozen in the dark and in the actinic light. The obtained results can be explained by the RC transitions between different conformational states and the dynamic processes in the structure of the hydrogen bonds of RCs. We assumed that RCs exist in two main microconformations - "fast" and "slow", which are characterized by different rates of P+ and QA- recombination reactions. The "fast" conformation is induced in frozen RCs in the dark, while the "slow" conformation of RC occurs when the RC preparation is frozen under actinic light. An explanation of the temperature dependencies of tryptophan fluorescence lifetimes in RC proteins was made under the assumption that temperature changes affect mainly the electron transfer from the indole ring of the tryptophan molecule to the nearest amide or carboxyl groups.


Subject(s)
Benzoquinones/chemistry , Photosynthetic Reaction Center Complex Proteins/chemistry , Rhodobacter sphaeroides/metabolism , Tryptophan/chemistry , Electron Transport , Hydrogen Bonding , Kinetics , Light , Photosynthetic Reaction Center Complex Proteins/metabolism , Quantum Theory , Fluorescence Spectrometry , Temperature , Tryptophan/metabolism
16.
J Pers Med ; 6(1)2016 Feb 26.
Article in English | MEDLINE | ID: mdl-26927184

ABSTRACT

We have designed a Biobank Portal that lets researchers request Biobank samples and genotypic data, query associated electronic health records, and design and download datasets containing de-identified attributes about consented Biobank subjects. This do-it-yourself functionality puts a wide variety and volume of data at the fingertips of investigators, allowing them to create custom datasets for their clinical and genomic research from complex phenotypic data and quickly obtain corresponding samples and genomic data. The Biobank Portal is built upon the i2b2 infrastructure [1] and uses an open-source web client that is available to faculty members and other investigators behind an institutional firewall. Built-in privacy measures [2] ensure that the data in the Portal are utilized only according to the processes to which the patients have given consent.

17.
PLoS One ; 10(8): e0136651, 2015.
Article in English | MEDLINE | ID: mdl-26301417

ABSTRACT

BACKGROUND: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. METHODS AND RESULTS: We studied 3 established EMR based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. CONCLUSIONS: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.


Subject(s)
Coronary Artery Disease/epidemiology , Diabetes Mellitus/epidemiology , Electronic Health Records , Adult , Aged , Algorithms , Rheumatoid Arthritis/complications , Rheumatoid Arthritis/epidemiology , Rheumatoid Arthritis/physiopathology , Coronary Artery Disease/complications , Coronary Artery Disease/physiopathology , Diabetes Mellitus/physiopathology , Female , Humans , Hyperlipidemias/complications , Hyperlipidemias/epidemiology , Hyperlipidemias/physiopathology , Inflammatory Bowel Diseases/complications , Inflammatory Bowel Diseases/epidemiology , Inflammatory Bowel Diseases/physiopathology , Male , Middle Aged , Natural Language Processing , Phenotype , Risk Factors
18.
Am J Psychiatry ; 172(4): 363-72, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25827034

ABSTRACT

OBJECTIVE: The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. METHOD: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis. RESULTS: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. CONCLUSIONS: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.


Subject(s)
Bipolar Disorder/diagnosis , Electronic Health Records , Natural Language Processing , Adult , Aged , Algorithms , Bipolar Disorder/classification , Bipolar Disorder/psychology , Case-Control Studies , Cohort Studies , Female , Humans , Male , Middle Aged , Phenotype , Predictive Value of Tests , Reproducibility of Results , Sensitivity and Specificity
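The abstract above mentions training a classifier "with 95% specificity". One common way to reach a target specificity is to choose the score cutoff from the scores of known non-cases, and the sketch below shows that idea. It is an assumption about how such a target could be met, not the authors' procedure. It also assumes a subject is called a case when the score is at or above the cutoff, and the scores are hypothetical.

```python
def specificity_at(threshold, control_scores):
    """Fraction of true controls scoring below the threshold (predicted negative)."""
    return sum(s < threshold for s in control_scores) / len(control_scores)


def threshold_for_specificity(control_scores, target=0.95):
    """Smallest observed score that, used as a cutoff, meets the target specificity."""
    for candidate in sorted(set(control_scores)):
        if specificity_at(candidate, control_scores) >= target:
            return candidate
    return float("inf")  # no observed score suffices: classify nobody as a case


# Hypothetical classifier scores for 100 chart-reviewed non-cases: 0, 1, ..., 99.
control_scores = list(range(100))
cutoff = threshold_for_specificity(control_scores)
```

Raising the cutoff trades sensitivity for specificity, so in practice the choice would be checked against held-out reviewed charts rather than the scores used to set it.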
19.
J Stroke Cerebrovasc Dis ; 23(8): 2031-2035, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25085345

ABSTRACT

BACKGROUND: Spinal manipulation has been associated with cervical arterial dissection and stroke but a causal relationship has been questioned by population-based studies. Earlier studies identified cases using International Classification of Diseases Ninth Revision (ICD-9) codes specific to anatomic stroke location rather than stroke etiology. We hypothesized that case misclassification occurred in these previous studies and resulted in an underestimation of the strength of the association. We also predicted that case misclassification would differ by patient age. METHODS: We identified cases in the Veterans Health Administration database using the same strategy as the prior studies. The electronic medical record was then screened for the word "dissection." The presence of atraumatic dissection was determined by medical record review by a neurologist. RESULTS: Of 3690 patients found by ICD-9 codes over a 30-month period, 414 (11.2%) had confirmed cervical artery dissection with a positive predictive value of 10.5% (95% confidence interval [CI] 9.6%-11.5%). The positive predictive value was higher in patients less than 45 years of age vs 45 years of age or older (41% vs 9%, P < .001). We reanalyzed a previous study, which reported no association between spinal manipulation and cervical artery dissection (odds ratio [OR] = 1.12, 95% CI .77-1.63) and recalculated an odds ratio of 2.15 (95% CI .98-4.69). For patients less than 45 years of age, the OR was 6.91 (95% CI 2.59-13.74). CONCLUSIONS: Prior studies grossly misclassified cases of cervical dissection and mistakenly dismissed a causal association with manipulation. Our study indicates that the OR for spinal manipulation exposure in cervical artery dissection is higher than previously reported.


Subject(s)
Aging/pathology , Manipulation, Spinal/classification , Manipulation, Spinal/statistics & numerical data , Vertebral Artery Dissection/classification , Vertebral Artery Dissection/epidemiology , Adult , Aged , Electronic Health Records , Female , Humans , International Classification of Diseases/standards , Male , Middle Aged , Odds Ratio , Risk Factors
20.
J Am Med Inform Assoc ; 19(5): 809-16, 2012.
Article in English | MEDLINE | ID: mdl-22707743

ABSTRACT

OBJECTIVE: This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. DESIGN: Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. MEASUREMENTS: Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. RESULTS: The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. CONCLUSION: For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
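
The two query strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that DIST queries the unlabeled items closest to a linear classifier's decision boundary and that DIV queries items farthest from those already selected. The function names, the toy weights, and the toy data are all hypothetical.

```python
import math


def margin_distance(weights, bias, x):
    """Absolute distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(sum(w * xi for w, xi in zip(weights, x)) + bias) / norm


def select_dist(weights, bias, pool, k):
    """Distance-based query (assumed reading of DIST):
    indices of the k pool items nearest the decision boundary."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: margin_distance(weights, bias, pool[i]))
    return ranked[:k]


def select_div(labeled, pool, k):
    """Diversity-based query (assumed reading of DIV): greedily pick the
    pool item whose nearest already-selected neighbor is farthest away.
    Requires at least one labeled point to measure against."""
    reference = list(labeled)
    candidates = set(range(len(pool)))
    chosen = []
    for _ in range(min(k, len(pool))):
        best = max(sorted(candidates),
                   key=lambda i: min(math.dist(pool[i], r) for r in reference))
        chosen.append(best)
        reference.append(pool[best])
        candidates.remove(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors and a hypothetical linear model.
    pool = [(0.0, 0.1), (2.0, 0.0), (1.0, 1.2), (-3.0, 1.0)]
    print(select_dist([1.0, -1.0], 0.0, pool, k=2))
    print(select_div([(0.0, 0.0)], pool, k=2))
```

Passive learning, the baseline in the abstract, would instead draw the next items to label at random from the pool. A combined strategy could rank candidates by some mix of the two scores; how the study weighted them is not stated here.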


Subject(s)
Data Mining/methods , Natural Language Processing , Algorithms , Artificial Intelligence , Humans , ROC Curve