Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions.

Petit-Jean, Thomas; Gérardin, Christel; Berthelot, Emmanuelle; Chatellier, Gilles; Frank, Marie; Tannier, Xavier; Kempf, Emmanuelle; Bey, Romain

Affiliation

Petit-Jean T; Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France.
Gérardin C; Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France.
Berthelot E; Institut Pierre-Louis d'Epidémiologie et de Santé Publique, INSERM, Sorbonne Université, Paris, 75012, France.
Chatellier G; Department of Cardiology, Hôpital Bicêtre, Assistance Publique-Hôpitaux de Paris, Le Kremlin Bicêtre, 94270, France.
Frank M; Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, 75012, France.
Tannier X; Department of Medical Informatics, Assistance Publique-Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, 75015, France.
Kempf E; Department of Medical Informatics, Hôpitaux Universitaires Paris-Saclay, Assistance Publique-Hôpitaux de Paris, Le Kremlin-Bicêtre, 94270, France.
Bey R; Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), INSERM, Université Sorbonne Paris Nord, Sorbonne Université, Paris, 75005, France.

J Am Med Inform Assoc ; 31(6): 1280-1290, 2024 May 20.

Article in En | MEDLINE | ID: mdl-38573195

ABSTRACT

OBJECTIVE:

To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow.

MATERIALS AND METHODS:

The detection pipeline relied on rule-based algorithms for named entity recognition and on machine learning algorithms for entity qualification. We used a large language model pre-trained on millions of clinical notes along with annotated clinical notes in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. We estimated the added values of the advanced technologies and of the collaborative setting.
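The two-stage design described above (rule-based entity recognition followed by entity qualification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term lists, the function names, and especially the qualification step, which here is a simple negation-cue lookup standing in for the machine learning qualifier the abstract mentions. It is not the authors' implementation.

```python
import re

# Hypothetical term lists; the abstract does not give the actual rules.
TERMS = {
    "diabetes": [r"diab[eè]te"],
    "myocardial_infarction": [r"infarctus du myocarde"],
}

# Hypothetical negation cues, used as a stand-in for a trained qualifier.
NEGATION_CUES = [r"\bpas d[e']", r"\babsence d[e']", r"\bsans\b"]


def find_entities(text):
    """Stage 1: rule-based named entity recognition via regular expressions."""
    entities = []
    for label, patterns in TERMS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                entities.append(
                    {"label": label, "start": match.start(), "end": match.end()}
                )
    return sorted(entities, key=lambda e: e["start"])


def qualify(text, entity, window=30):
    """Stage 2: entity qualification.

    Placeholder logic: flag the entity as negated when a cue appears in the
    characters just before it. A learned classifier would replace this.
    """
    left_context = text[max(0, entity["start"] - window): entity["start"]]
    negated = any(
        re.search(cue, left_context, flags=re.IGNORECASE) for cue in NEGATION_CUES
    )
    return {**entity, "negated": negated}


def detect_conditions(text):
    """Return the set of condition labels that are mentioned and not negated."""
    qualified = [qualify(text, e) for e in find_entities(text)]
    return {e["label"] for e in qualified if not e["negated"]}


note = "Diabète de type 2 connu. Pas d'infarctus du myocarde."
print(detect_conditions(note))
```

Separating recognition from qualification keeps the term rules easy to audit while leaving the context-dependent decision (negation, hypothesis, family history) to a component that can be swapped for a trained model. The fixed character window used here ignores sentence boundaries, so it is only suitable as a toy.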

RESULTS:

The pipeline reached macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95% CI 94.5-96.3), 95.4 (95% CI 94.0-96.3), 96.0 (95% CI 94.0-96.7), and 99.2 (95% CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry.
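For readers unfamiliar with macro-averaging, the sketch below shows one common way to compute the four reported metrics: each metric is computed per condition from its confusion counts, then averaged with equal weight across conditions. The `Counts` class, the function names, and the counts in the usage example are hypothetical. The abstract does not state how the confidence intervals were obtained, so they are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one condition (hypothetical container)."""
    tp: int
    fp: int
    fn: int
    tn: int


def per_condition_metrics(c):
    """Compute F1, PPV, sensitivity, and specificity for a single condition."""
    ppv = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    sensitivity = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    specificity = c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if (ppv + sensitivity) else 0.0
    return {"f1": f1, "ppv": ppv, "sensitivity": sensitivity, "specificity": specificity}


def macro_average(counts_by_condition):
    """Average each metric over conditions, giving every condition equal weight."""
    per_condition = [per_condition_metrics(c) for c in counts_by_condition.values()]
    return {
        name: sum(m[name] for m in per_condition) / len(per_condition)
        for name in per_condition[0]
    }


# Illustrative counts only; these are not results from the study.
example = {
    "condition_a": Counts(tp=8, fp=2, fn=2, tn=88),
    "condition_b": Counts(tp=5, fp=0, fn=5, tn=90),
}
print(macro_average(example))
```

Because every condition contributes equally, a rare condition with poor detection lowers the macro-average as much as a frequent one would, which is why this averaging is often preferred when condition prevalence varies widely.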

CONCLUSIONS:

We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes.

Subjects

Electronic Health Records; Machine Learning; Natural Language Processing; Workflow; Humans; Data Warehousing; Algorithms; France; Confidentiality

Keywords

Charlson score index; comorbidities; domain adaptation; natural language processing; privacy

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Natural Language Processing / Electronic Health Records / Workflow / Machine Learning Limit: Humans Country/Region as subject: Europe Language: En Journal: J Am Med Inform Assoc Journal subject: MEDICAL INFORMATICS Year of publication: 2024 Document type: Article Country of affiliation: France
