Translating and evaluating historic phenotyping algorithms using SNOMED CT.

Elkheder, Musaab; Gonzalez-Izquierdo, Arturo; Qummer Ul Arfeen, Muhammad; Kuan, Valerie; Lumbers, R Thomas; Denaxas, Spiros; Shah, Anoop D

Affiliation

Elkheder M; Institute of Health Informatics, University College London, London, UK.
Gonzalez-Izquierdo A; Institute of Health Informatics, University College London, London, UK.
Qummer Ul Arfeen M; Health Data Research UK, London, UK.
Kuan V; Institute of Health Informatics, University College London, London, UK.
Lumbers RT; Institute of Health Informatics, University College London, London, UK.
Denaxas S; Institute of Health Informatics, University College London, London, UK.
Shah AD; Barts Health NHS Trust, London, UK.

J Am Med Inform Assoc; 30(2): 222-232, 2023 Jan 18.

Article in English | MEDLINE | ID: mdl-36083213

ABSTRACT

OBJECTIVE:

Patient phenotype definitions based on terminologies are required for the computational use of electronic health records. Within UK primary care research databases, such definitions have typically been represented as flat lists of Read terms, but Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT), a widely employed international reference terminology, enables the use of relationships between concepts, which could facilitate the phenotyping process. We implemented SNOMED CT-based phenotyping approaches and investigated their performance in the CPRD Aurum primary care database.

MATERIALS AND METHODS:

We developed SNOMED CT phenotype definitions for 3 exemplar diseases (diabetes mellitus, asthma, and heart failure) using 3 methods: "primary" (the primary concept and its descendants), "extended" (the primary concept, its descendants, and additional relations), and "value set" (based on text searches of term descriptions). We also derived SNOMED CT codelists in a semiautomated manner for 276 disease phenotypes used in a study of health across the life course. Cohorts selected using each codelist were compared with "gold standard" manually curated Read codelists in a sample of 500 000 patients from CPRD Aurum.
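The three codelist strategies described above can be sketched over a toy concept graph. This is a minimal illustration under stated assumptions, not the authors' implementation: the concept names, "is a" edges, attribute relations, and term descriptions are hypothetical placeholders rather than real SNOMED CT content, and `extended_codelist` assumes that "additional relations" means concepts linked to the primary set by a non-"is a" attribute (for example a "due to" relation), which the abstract does not specify.

```python
import re
from collections import defaultdict, deque

# Hypothetical toy "is a" edges (child -> parents). NOT real SNOMED CT IDs.
IS_A = {
    "type1_dm": {"diabetes_mellitus"},
    "type2_dm": {"diabetes_mellitus"},
    "type2_dm_with_nephropathy": {"type2_dm"},
    "diabetic_retinopathy": {"retinal_disorder"},
    "retinal_disorder": {"clinical_finding"},
    "diabetes_mellitus": {"clinical_finding"},
}

# Hypothetical non-"is a" attribute relations (source -> targets).
ATTRIBUTES = {
    "diabetic_retinopathy": {"diabetes_mellitus"},
}

# Hypothetical term descriptions used by the text-search ("value set") approach.
TERMS = {
    "diabetes_mellitus": "Diabetes mellitus",
    "type1_dm": "Type 1 diabetes mellitus",
    "type2_dm": "Type 2 diabetes mellitus",
    "type2_dm_with_nephropathy": "Type 2 diabetes mellitus with nephropathy",
    "diabetic_retinopathy": "Diabetic retinopathy",
    "retinal_disorder": "Disorder of retina",
    "clinical_finding": "Clinical finding",
}


def children_index(is_a):
    """Invert child -> parents edges into a parent -> children index."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    return children


def primary_codelist(root, is_a=IS_A):
    """'Primary': the root concept plus all descendants (breadth-first over is-a)."""
    children = children_index(is_a)
    selected, queue = {root}, deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


def extended_codelist(root, is_a=IS_A, attributes=ATTRIBUTES):
    """'Extended' (assumed reading): the primary set plus any concept whose
    attribute relation points into it, together with that concept's descendants."""
    selected = primary_codelist(root, is_a)
    linked = {src for src, targets in attributes.items() if targets & selected}
    for concept in linked:
        selected |= primary_codelist(concept, is_a)
    return selected


def value_set_codelist(pattern, terms=TERMS):
    """'Value set': concepts whose description matches a case-insensitive regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {cid for cid, term in terms.items() if regex.search(term)}


if __name__ == "__main__":
    for name, codes in [
        ("primary", primary_codelist("diabetes_mellitus")),
        ("extended", extended_codelist("diabetes_mellitus")),
        ("value set", value_set_codelist(r"diabet")),
    ]:
        print(name, sorted(codes))
```

On this toy graph, the "extended" and "value set" lists also pick up diabetic retinopathy, which the "primary" list does not; in miniature, this shows how broader strategies can capture more patients at some risk to precision.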

RESULTS:

SNOMED CT codelists selected patient sets similar to those selected by Read codelists, with F1 scores exceeding 0.93 and similar age and sex distributions. The "value set" and "extended" codelists had slightly greater recall but lower precision than the "primary" codelists. We were able to represent 257 of the 276 phenotypes by a single concept hierarchy, and for 135 phenotypes the F1 score was greater than 0.9.
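The cohort comparison behind these precision, recall, and F1 figures can be sketched as set operations on patient identifiers: a patient enters a cohort if any recorded code is in the codelist, and the SNOMED CT cohort is scored against the Read reference cohort. F1 is the harmonic mean of precision and recall. The records and code strings below are hypothetical placeholders, and the sketch ignores details such as record dates and CPRD Aurum code mappings, which the abstract does not describe.

```python
# Hypothetical toy data: patient ID -> set of codes recorded for that patient.
# "READ_*" and "SCT_*" strings are placeholders, not real Read or SNOMED CT codes.
RECORDS = {
    "p1": {"READ_DM", "SCT_type2_dm"},
    "p2": {"READ_DM", "SCT_type1_dm"},
    "p3": {"SCT_diabetic_retinopathy"},
    "p4": {"READ_ASTHMA", "SCT_asthma"},
    "p5": {"READ_DM"},
}

READ_DM_CODELIST = {"READ_DM"}  # stands in for the curated "gold standard" list
SNOMED_DM_CODELIST = {"SCT_type1_dm", "SCT_type2_dm", "SCT_diabetic_retinopathy"}


def cohort(records, codelist):
    """Return patients with at least one recorded code in the codelist."""
    return {pid for pid, codes in records.items() if codes & codelist}


def compare_cohorts(candidate, reference):
    """Return (precision, recall, F1) of a candidate cohort against a reference."""
    true_pos = len(candidate & reference)
    precision = true_pos / len(candidate) if candidate else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    # F1 = 2TP / (2TP + FP + FN) = 2TP / (|candidate| + |reference|)
    denom = len(candidate) + len(reference)
    f1 = 2 * true_pos / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = cohort(RECORDS, READ_DM_CODELIST)
    snomed = cohort(RECORDS, SNOMED_DM_CODELIST)
    precision, recall, f1 = compare_cohorts(snomed, gold)
    print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Writing F1 as 2TP / (|candidate| + |reference|) avoids computing it from the already-rounded precision and recall, and it handles the case of two empty cohorts without a division by zero.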

CONCLUSIONS:

SNOMED CT provides an efficient way to define disease phenotypes, resulting in similar patient populations to manually curated codelists.

Subjects

Asthma; Systematized Nomenclature of Medicine; Humans; Algorithms; Electronic Health Records; Databases, Factual

Keywords

SNOMED CT; electronic health records; ontology; phenotype; terminology


Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Asthma / Systematized Nomenclature of Medicine | Limits: Humans | Language: English | Year of publication: 2023 | Document type: Article