Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit.
Kraljevic, Zeljko; Searle, Thomas; Shek, Anthony; Roguski, Lukasz; Noor, Kawsar; Bean, Daniel; Mascio, Aurelie; Zhu, Leilei; Folarin, Amos A; Roberts, Angus; Bendayan, Rebecca; Richardson, Mark P; Stewart, Robert; Shah, Anoop D; Wong, Wai Keong; Ibrahim, Zina; Teo, James T; Dobson, Richard J B.
Affiliations
  • Kraljevic Z; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
  • Searle T; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
  • Shek A; Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
  • Roguski L; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
  • Noor K; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
  • Bean D; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK.
  • Mascio A; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
  • Zhu L; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
  • Folarin AA; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and K
  • Roberts A; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and Ki
  • Bendayan R; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
  • Richardson MP; Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
  • Stewart R; Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
  • Shah AD; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
  • Wong WK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK.
  • Ibrahim Z; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
  • Teo JT; Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Department of Neurology, King's College Hospital NHS Foundation Trust, London, UK.
  • Dobson RJB; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR Biomedical
Artif Intell Med; 117: 102083, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34127232
ABSTRACT
Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations with the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1 0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at three large London hospitals with self-supervised training over ∼8.8B words from ∼17M clinical records and further fine-tuning with ∼6K clinician-annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types, indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
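The F1 figures quoted in the abstract are the harmonic mean of precision and recall. As a minimal sketch of how such a score can be computed for concept extraction, the Python below compares predicted annotations against gold annotations using exact matching on span and concept identifier. The `Annotation` type, the function name, the exact-match criterion and the placeholder identifiers are assumptions made for illustration; the abstract does not state the paper's matching rules, and this is not MedCAT's evaluation code.

```python
from typing import Iterable, NamedTuple, Tuple


class Annotation(NamedTuple):
    """A hypothetical extracted concept: character span plus concept identifier."""
    start: int
    end: int
    concept_id: str


def precision_recall_f1(
    gold: Iterable[Annotation], predicted: Iterable[Annotation]
) -> Tuple[float, float, float]:
    """Score predictions against gold annotations using exact matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder identifiers, not real UMLS or SNOMED-CT codes.
gold = [
    Annotation(0, 8, "CONCEPT_A"),
    Annotation(15, 27, "CONCEPT_B"),
    Annotation(40, 52, "CONCEPT_C"),
    Annotation(60, 71, "CONCEPT_D"),
]
predicted = [
    Annotation(0, 8, "CONCEPT_A"),
    Annotation(15, 27, "CONCEPT_B"),
]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case both predictions are correct but half of the gold annotations are missed, so precision is perfect while recall, and therefore F1, is pulled down.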

Full text: 1 | Database: MEDLINE | Main subject: Natural Language Processing / Systematized Nomenclature of Medicine | Type of study: Prognostic_studies | Language: En | Journal: Artif Intell Med | Journal subject: MEDICAL INFORMATICS | Year: 2021 | Document type: Article | Country of affiliation: United Kingdom
