Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

Carrell, David S; Malin, Bradley A; Cronkite, David J; Aberdeen, John S; Clark, Cheryl; Li, Muqun Rachel; Bastakoty, Dikshya; Nyemba, Steve; Hirschman, Lynette.

J Am Med Inform Assoc ; 27(9): 1374-1382, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32930712

ABSTRACT

OBJECTIVE: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.

MATERIALS AND METHODS: Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.

RESULTS: Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal readers and 22% (precision = 33%) for external readers.

DISCUSSION AND CONCLUSIONS: Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario, more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.
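
The HIPS idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the surrogate pools, the `hips_resynthesize` function, and the example note are all hypothetical, and a real tagger would be a trained model rather than hand-written spans. The sketch only shows that tagged PII is swapped for fictitious content while an untagged identifier (a leak) stays in the text, where it looks like just another surrogate.

```python
import random

# Hypothetical surrogate pools (assumption); a real system would use far larger lists.
SURROGATES = {
    "NAME": ["Alice Moreno", "Brian Cho", "Carla Devlin"],
    "DATE": ["03/14/2011", "11/02/2009", "07/21/2013"],
}


def hips_resynthesize(text, tagged_spans, rng):
    """Replace each tagged (start, end, pii_type) span with a random surrogate."""
    pieces = []
    cursor = 0
    for start, end, pii_type in sorted(tagged_spans):
        pieces.append(text[cursor:start])                   # untouched text before the span
        pieces.append(rng.choice(SURROGATES[pii_type]))     # fictitious replacement
        cursor = end
    pieces.append(text[cursor:])                            # untouched tail
    return "".join(pieces)


def span_of(text, phrase, pii_type):
    """Helper for the example: locate a phrase and return its tagged span."""
    start = text.index(phrase)
    return (start, start + len(phrase), pii_type)


# Invented example note; the simulated tagger misses the doctor's name (a leak).
note = "Pt John Smith seen 01/05/2012 by Dr. Mary Jones."
spans = [span_of(note, "John Smith", "NAME"), span_of(note, "01/05/2012", "DATE")]

released = hips_resynthesize(note, spans, random.Random(0))
print(released)
```

In the released text the leaked name "Mary Jones" sits next to surrogate names and dates, so a reader cannot tell from the text alone which identifiers are real.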

Subjects

Confidentiality, Data Anonymization, Electronic Health Records, Computer Security, Humans, Natural Language Processing

The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight.

Carrell, David S; Cronkite, David J; Li, Muqun Rachel; Nyemba, Steve; Malin, Bradley A; Aberdeen, John S; Hirschman, Lynette.

J Am Med Inform Assoc ; 26(12): 1536-1544, 2019 12 01.

Article in English | MEDLINE | ID: mdl-31390016

ABSTRACT

OBJECTIVE: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus.

MATERIALS AND METHODS: We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from a large, integrated health care delivery system in Washington State. These notes were deidentified by a machine-learned PII tagger and HIPS resynthesis. A malicious attacker obtained the corpus and performed a parrot attack intended to expose leaked PII in it. Specifically, the attacker mimicked the defender's process by manually annotating all PII-like content in half of the released corpus, training a PII tagger on these data, and using the trained model to tag the remaining encounter notes. The attacker hypothesized that untagged identifiers would be leaked PII, discoverable by manual review. We evaluated the attacker's success using measures of leak-detection rate and accuracy.

RESULTS: The attacker correctly hypothesized that 211 (68%) of 310 actual PII leaks in the corpus were leaks, and wrongly hypothesized that 191 resynthesized PII instances were also leaks. One-third of actual leaks remained undetected.

DISCUSSION AND CONCLUSION: A malicious parrot attack to reveal leaked PII in clinical text deidentified by machine-learned HIPS resynthesis can attenuate but not eliminate the protective effect of HIPS deidentification.
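
The counts reported in the results can be turned into a leak-detection rate and a precision figure with a short Python sketch. The function name `leak_detection_metrics` is hypothetical, and the precision value is derived here rather than stated in the abstract: it assumes the 191 resynthesized instances are the attacker's only false alarms.

```python
def leak_detection_metrics(true_leaks_found, total_true_leaks, false_alarms):
    """Return (recall, precision) for an attacker's leak hypotheses."""
    # Recall: share of actual leaks that the attacker flagged.
    recall = true_leaks_found / total_true_leaks
    # Precision: share of flagged items that were actual leaks
    # (assumes false_alarms covers every wrong hypothesis).
    precision = true_leaks_found / (true_leaks_found + false_alarms)
    return recall, precision


# Counts taken from the abstract: 211 of 310 leaks found, 191 false alarms.
recall, precision = leak_detection_metrics(211, 310, 191)
print(f"recall={recall:.1%} precision={precision:.1%}")
```

Under that assumption, roughly half of the attacker's hypothesized leaks would be fictitious surrogates, which is consistent with the conclusion that the attack weakens but does not remove the protection HIPS provides.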

Subjects

Computer Security, Confidentiality, Data Anonymization, Electronic Health Records, Machine Learning, Personally Identifiable Information, Ambulatory Care Facilities, Delivery of Health Care, Humans, Washington
