Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit.

Kraljevic, Zeljko; Searle, Thomas; Shek, Anthony; Roguski, Lukasz; Noor, Kawsar; Bean, Daniel; Mascio, Aurelie; Zhu, Leilei; Folarin, Amos A; Roberts, Angus; Bendayan, Rebecca; Richardson, Mark P; Stewart, Robert; Shah, Anoop D; Wong, Wai Keong; Ibrahim, Zina; Teo, James T; Dobson, Richard J B

Affiliation

Kraljevic Z; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
Searle T; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
Shek A; Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
Roguski L; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
Noor K; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
Bean D; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK.
Mascio A; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
Zhu L; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
Folarin AA; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and K
Roberts A; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and Ki
Bendayan R; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
Richardson MP; Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
Stewart R; Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
Shah AD; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
Wong WK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
Ibrahim Z; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
Teo JT; Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Department of Neurology, King's College Hospital NHS Foundation Trust, London, UK.
Dobson RJB; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR Biomedical

Artif Intell Med; 117: 102083, 2021 Jul.

Article in English | MEDLINE | ID: mdl-34127232

ABSTRACT

Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1 0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ~8.8B words from ~17M clinical records and further fine-tuning with ~6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.

Subject(s)

Natural Language Processing; Systematized Nomenclature of Medicine; Electronic Health Records; Information Storage and Retrieval; Unified Medical Language System

Keywords

Clinical concept embeddings; Clinical natural language processing; Clinical ontology embeddings; Electronic health record information extraction

Full text: 1 | Database: MEDLINE | Main subject: Natural Language Processing / Systematized Nomenclature of Medicine | Study type: Prognostic_studies | Language: En | Journal: Artif Intell Med | Journal subject: MEDICAL INFORMATICS | Year: 2021 | Document type: Article | Country of affiliation: United Kingdom