
Final Report on the German Clinical Reference Corpus 3000PA.

Hahn, Udo; Modersohn, Luise; Faller, Jakob; Lohr, Christina.

Stud Health Technol Inform ; 310: 599-603, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269879

ABSTRACT

We here report on one of the outcomes of a large-scale German research program, the Medical Informatics Initiative (MII), aiming at the development of a solid data and software infrastructure for German-language clinical natural language processing. Within this framework, we have developed 3000PA, a national clinical reference corpus composed of patient records from three clinical university sites and annotated with a multitude of semantic annotation layers (including medical named entities, semantic and temporal relations between entities, as well as certainty and negation information related to entities and relations). This non-sharable corpus has been complemented by three sharable ones (JSYNCC, GGPONC, and GRASCCO). Overall, 3000PA, JSYNCC and GRASCCO feature about 2.1 million metadata points.

Subject(s)

Language, Medical Informatics, Humans, Semantics, Metadata, Natural Language Processing

De-Identifying GRASCCO - A Pilot Study for the De-Identification of the German Medical Text Project (GeMTeX) Corpus.

Lohr, Christina; Matthies, Franz; Faller, Jakob; Modersohn, Luise; Riedel, Andrea; Hahn, Udo; Kiser, Rebekka; Boeker, Martin; Meineke, Frank.

Stud Health Technol Inform ; 317: 171-179, 2024 Aug 30.

Article in English | MEDLINE | ID: mdl-39234720

ABSTRACT

INTRODUCTION: The German Medical Text Project (GeMTeX) is one of the largest infrastructure efforts targeting German-language clinical documents. We here introduce the architecture of the de-identification pipeline of GeMTeX. METHODS: This pipeline comprises the export of raw clinical documents from the local hospital information system, the import into the annotation platform INCEpTION, fully automatic pre-tagging with protected health information (PHI) items by the Averbis Health Discovery pipeline, a manual curation step of these pre-annotated data, and, finally, the automatic replacement of PHI items with type-conformant substitutes. This design was implemented in a pilot study involving six annotators and two curators each at the Data Integration Centers of the University Hospitals Leipzig and Erlangen. RESULTS: As a proof of concept, the publicly available Graz Synthetic Text Clinical Corpus (GRASCCO) was enhanced with PHI annotations in an annotation campaign for which reasonable inter-annotator agreement values of Krippendorff's α ≈ 0.97 can be reported. CONCLUSION: These curated 1.4 K PHI annotations are released as open-source data constituting the first publicly available German clinical language text corpus with PHI metadata.

Subject(s)

Electronic Health Records, Pilot Projects, Germany, Natural Language Processing, Confidentiality, Humans, Computer Security
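
The last pipeline step named in the abstract above, replacing curated PHI items with type-conformant substitutes, can be illustrated with a minimal sketch. This is not the GeMTeX implementation: the `PhiSpan` type, the `SUBSTITUTES` pools, the PHI type labels, and the `replace_phi` function are all hypothetical names chosen for illustration, and the sketch assumes that curated annotations arrive as non-overlapping character offsets with a type label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhiSpan:
    """Hypothetical curated PHI annotation: character offsets plus a type label."""
    start: int
    end: int
    phi_type: str


# Hypothetical substitute pools, one per PHI type (assumed labels, not GeMTeX's).
SUBSTITUTES = {
    "NAME": ["Erika Mustermann", "Max Beispiel"],
    "DATE": ["01.01.2000", "15.06.1999"],
    "CITY": ["Musterstadt", "Beispielheim"],
}


def replace_phi(text: str, spans: list[PhiSpan]) -> str:
    """Replace each PHI span with a substitute of the same type.

    Identical surface strings of the same type map to the same substitute,
    so repeated mentions stay consistent within one document.
    """
    mapping: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    pieces: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("overlapping PHI spans are not supported")
        original = text[span.start:span.end]
        key = (span.phi_type, original)
        if key not in mapping:
            pool = SUBSTITUTES[span.phi_type]
            index = counters.get(span.phi_type, 0)
            mapping[key] = pool[index % len(pool)]  # cycle through the pool
            counters[span.phi_type] = index + 1
        pieces.append(text[cursor:span.start])  # untouched text before the span
        pieces.append(mapping[key])
        cursor = span.end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces)
```

Building the output from left to right, rather than substituting in place, keeps the original offsets valid even when a substitute is longer or shorter than the text it replaces.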
