Results 1-2 of 2
1.
Circ Cardiovasc Qual Outcomes; 13(10): e006516, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33079591

ABSTRACT

BACKGROUND: The electronic medical record contains a wealth of information buried in free text. We created a natural language processing algorithm to identify patients with atrial fibrillation (AF) using text alone.

METHODS AND RESULTS: We created 3 data sets from patients with at least one AF billing code from 2010 to 2017: a training set (n=886), an internal validation set from site no. 1 (n=285), and an external validation set from site no. 2 (n=276). A team of clinicians reviewed and adjudicated patients as AF present or absent, which served as the reference standard. We trained 54 algorithms to classify each patient, varying the model, the number of features, the number of stop words, and the method used to create the feature set. The algorithm with the highest F-score (the harmonic mean of sensitivity and positive predictive value) in the training set was applied to the validation sets. F-scores and areas under the receiver operating characteristic curve were compared between site no. 1 and site no. 2 using bootstrapping. Adjudicated AF prevalence was 75.1% at site no. 1 and 86.2% at site no. 2. Among the 54 algorithms, the best-performing model was logistic regression using 1000 features, 100 stop words, and the term frequency-inverse document frequency (TF-IDF) method to create the feature set, with a sensitivity of 92.8%, a specificity of 93.9%, and an area under the receiver operating characteristic curve of 0.93 in the training set. At site no. 1, sensitivity was 92.5% and specificity was 88.7%, with an area under the receiver operating characteristic curve of 0.91. At site no. 2, sensitivity was 89.5% and specificity was 71.1%, with an area under the receiver operating characteristic curve of 0.80. The F-score was lower at site no. 2 than at site no. 1 (92.5% [SD, 1.1%] versus 94.2% [SD, 1.1%]; P<0.001).

CONCLUSIONS: We developed a natural language processing algorithm to identify patients with AF using text alone, with an F-score >90% at 2 separate sites. This approach allows better use of the clinical narrative and creates an opportunity for precise, high-throughput cohort identification.
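The abstract defines the F-score as the harmonic mean of sensitivity and positive predictive value (PPV) and compares it between sites by bootstrapping. The Python sketch below shows one way to compute these quantities from adjudicated labels (1 = AF present, 0 = AF absent) and algorithm predictions. The function names, the default of 1,000 resamples, and reporting the bootstrap mean and SD are illustrative assumptions, not the authors' published procedure.

```python
import random
import statistics
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn); label 1 means AF present, 0 means AF absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of sensitivity (recall) and positive predictive value (precision)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        # Without true positives, sensitivity is 0 and PPV is 0 or undefined; report 0.
        return 0.0
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_f_score(y_true: Sequence[int], y_pred: Sequence[int],
                      n_boot: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Resample patients with replacement; return the mean and SD of the F-score."""
    rng = random.Random(seed)  # seeded so the resamples are reproducible
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, with adjudicated labels `[1, 1, 1, 1, 0, 0]` and predictions `[1, 1, 1, 0, 1, 0]` there are 3 true positives, 1 false positive, and 1 false negative, so sensitivity and PPV are both 0.75 and `f_score` returns 0.75. Running `bootstrap_f_score` separately on each site's data gives a spread for each site's F-score; the abstract does not state the exact bootstrap test behind its P value.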


Subject(s)
Atrial Fibrillation/diagnosis; Diagnosis, Computer-Assisted; Electronic Health Records; Natural Language Processing; Aged; Aged, 80 and over; Atrial Fibrillation/classification; Atrial Fibrillation/epidemiology; Chicago/epidemiology; Female; Humans; Male; Middle Aged; Predictive Value of Tests; Prevalence; Reproducibility of Results; Utah/epidemiology
2.
J Am Heart Assoc; 9(5): e014527, 2020 Mar 3.
Article in English | MEDLINE | ID: mdl-32098599

ABSTRACT

BACKGROUND: Electronic medical records (EMRs) allow identification of disease-specific patient populations, but varying electronic cohort definitions could result in different populations. We compared the characteristics of an EMR-derived atrial fibrillation (AF) patient population using 5 different electronic cohort definitions.

METHODS AND RESULTS: Adult patients with at least 1 AF billing code from January 1, 2010, to December 31, 2017, were included. Based on different electronic cohort definitions, we trained 5 different logistic regression models using a labeled training data set (n=786). Each model yielded a predicted probability; patients were classified as having AF if the probability was higher than a specified cut point. Test characteristics were calculated for each model. These models were then applied to the full cohort, and the resulting characteristics were compared. In the training set, the comprehensive model (including demographics, billing codes, and natural language processing results) performed best, with an area under the curve of 0.89, a sensitivity of 0.90, and a specificity of 0.87. Among a candidate population (n=22,000), the proportion of patients identified as having AF varied from 61% in the model using diagnosis or procedure International Classification of Diseases (ICD) billing codes to 83% in the model using natural language processing of clinical notes. Among identified AF patients, the proportion with a CHA2DS2-VASc score ≥2 varied from 69% to 85%; oral anticoagulant treatment rates varied from 50% to 66% depending on the model.

CONCLUSIONS: Different electronic cohort definitions result in substantially different AF study samples. This difference threatens the quality and reproducibility of EMR-based research and quality initiatives.
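The second abstract's decision rule is that a logistic regression model yields a predicted probability, and a patient is classified as having AF when that probability is higher than a specified cut point. The Python sketch below illustrates that rule and the sensitivity/specificity calculation against adjudicated labels. The feature names, coefficients, intercept, and the 0.5 cut point in the usage block are hypothetical, because the abstract reports neither the fitted coefficients nor the cut point.

```python
import math
from typing import Mapping, Sequence


def predicted_probability(features: Mapping[str, float], coefs: Mapping[str, float],
                          intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = intercept + sum(b * features.get(name, 0.0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(probabilities: Sequence[float], cut_point: float) -> list[int]:
    """Label a patient as AF (1) when the predicted probability is higher than the cut point."""
    return [1 if p > cut_point else 0 for p in probabilities]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Compare predicted labels with adjudicated labels (1 = AF present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical features and coefficients, for illustration only (not from the study).
    coefs = {"af_billing_code_count": 0.8, "nlp_af_mention_count": 1.2}
    patients = [
        {"af_billing_code_count": 1, "nlp_af_mention_count": 0},
        {"af_billing_code_count": 3, "nlp_af_mention_count": 4},
        {"af_billing_code_count": 0, "nlp_af_mention_count": 2},
    ]
    probs = [predicted_probability(p, coefs, intercept=-2.0) for p in patients]
    labels = classify(probs, cut_point=0.5)  # hypothetical cut point
    print(labels)  # [0, 1, 1]
    print(f"proportion identified as AF: {sum(labels) / len(labels):.0%}")
```

Changing `cut_point`, or changing which inputs enter the model (billing codes only, NLP output only, or both), changes which patients are labeled AF. That is the mechanism by which different electronic cohort definitions can select different populations from the same candidate pool.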


Subject(s)
Atrial Fibrillation/diagnosis; Electronic Health Records; Adult; Aged; Anticoagulants/therapeutic use; Atrial Fibrillation/therapy; Cohort Studies; Current Procedural Terminology; Electrocardiography; Female; Humans; International Classification of Diseases; Logistic Models; Male; Middle Aged; Natural Language Processing; Sensitivity and Specificity