Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 12 de 12
Filtrar
1.
J Biomed Inform ; 65: 76-96, 2017 01.
Artigo em Inglês | MEDLINE | ID: mdl-27832965

RESUMO

Publishing data about patients that contain both demographics and diagnosis codes is essential to perform large-scale, low-cost medical studies. However, preserving the privacy and utility of such data is challenging, because it requires: (i) guarding against identity disclosure (re-identification) attacks based on both demographics and diagnosis codes, (ii) ensuring that the anonymized data remain useful in intended analysis tasks, and (iii) minimizing the information loss, incurred by anonymization, to preserve the utility of general analysis tasks that are difficult to determine before data publishing. Existing anonymization approaches are not suitable for being used in this setting, because they cannot satisfy all three requirements. Therefore, in this work, we propose a new approach to deal with this problem. We enforce the requirement (i) by applying (k,km)-anonymity, a privacy principle that prevents re-identification from attackers who know the demographics of a patient and up to m of their diagnosis codes, where k and m are tunable parameters. To capture the requirement (ii), we propose the concept of utility constraint for both demographics and diagnosis codes. Utility constraints limit the amount of generalization and are specified by data owners (e.g., the healthcare institution that performs anonymization). We also capture requirement (iii), by employing well-established information loss measures for demographics and for diagnosis codes. To realize our approach, we develop an algorithm that enforces (k,km)-anonymity on a dataset containing both demographics and diagnosis codes, in a way that satisfies the specified utility constraints and with minimal information loss, according to the measures. Our experiments with a large dataset containing more than 200,000 electronic health records show the effectiveness and efficiency of our algorithm.


Assuntos
Algoritmos , Confidencialidade , Grupos Diagnósticos Relacionados , Registros Eletrônicos de Saúde , Demografia , Humanos
2.
J Biomed Inform ; 50: 4-19, 2014 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-24936746

RESUMO

The dissemination of Electronic Health Records (EHRs) can be highly beneficial for a range of medical studies, spanning from clinical trials to epidemic control studies, but it must be performed in a way that preserves patients' privacy. This is not straightforward, because the disseminated data need to be protected against several privacy threats, while remaining useful for subsequent analysis tasks. In this work, we present a survey of algorithms that have been proposed for publishing structured patient data, in a privacy-preserving way. We review more than 45 algorithms, derive insights on their operation, and highlight their advantages and disadvantages. We also provide a discussion of some promising directions for future research in this area.


Assuntos
Algoritmos , Registros Eletrônicos de Saúde , Privacidade , Revelação da Verdade
3.
J Biomed Inform ; 50: 46-61, 2014 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-24879898

RESUMO

The dissemination of Electronic Health Record (EHR) data, beyond the originating healthcare institutions, can enable large-scale, low-cost medical studies that have the potential to improve public health. Thus, funding bodies, such as the National Institutes of Health (NIH) in the U.S., encourage or require the dissemination of EHR data, and a growing number of innovative medical investigations are being performed using such data. However, simply disseminating EHR data, after removing identifying information, may risk privacy, as patients can still be linked with their record, based on diagnosis codes. This paper proposes the first approach that prevents this type of data linkage using disassociation, an operation that transforms records by splitting them into carefully selected subsets. Our approach preserves privacy with significantly lower data utility loss than existing methods and does not require data owners to specify diagnosis codes that may lead to identity disclosure, as these methods do. Consequently, it can be employed when data need to be shared broadly and be used in studies, beyond the intended ones. Through extensive experiments using EHR data, we demonstrate that our method can construct data that are highly useful for supporting various types of clinical case count studies and general medical analysis tasks.


Assuntos
Registros Eletrônicos de Saúde , Privacidade , Humanos , Estados Unidos
4.
Proc Natl Acad Sci U S A ; 107(17): 7898-903, 2010 Apr 27.
Artigo em Inglês | MEDLINE | ID: mdl-20385806

RESUMO

Genome-wide association studies (GWAS) facilitate the discovery of genotype-phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients' standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data "as is" may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients' identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them in a way that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt's University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks.


Assuntos
Algoritmos , Testes Anônimos/métodos , Coleta de Dados/métodos , Registros Eletrônicos de Saúde , Privacidade Genética , Estudo de Associação Genômica Ampla/métodos , Medicina de Precisão/métodos
5.
Expert Syst Appl ; 39(10): 9764-9777, 2012 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-22563145

RESUMO

Transaction data record various information about individuals, including their purchases and diagnoses, and are increasingly published to support large-scale and low-cost studies in domains such as marketing and medicine. However, the dissemination of transaction data may lead to privacy breaches, as it allows an attacker to link an individual's record to their identity. Approaches that anonymize data by eliminating certain values in an individual's record or by replacing them with more general values have been proposed recently, but they often produce data of limited usefulness. This is because these approaches adopt value transformation strategies that do not guarantee data utility in intended applications and objective measures that may lead to excessive data distortion. In this paper, we propose a novel approach for anonymizing data in a way that satisfies data publishers' utility requirements and incurs low information loss. To achieve this, we introduce an accurate information loss measure and an effective anonymization algorithm that explores a large part of the problem space. An extensive experimental study, using click-stream and medical data, demonstrates that our approach permits many times more accurate query answering than the state-of-the-art methods, while it is comparable to them in terms of efficiency.

6.
AMIA Annu Symp Proc ; 2022: 279-288, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-37128430

RESUMO

Data access limitations have stifled COVID-19 disparity investigations in the United States. Though federal and state legislation permits publicly disseminating de-identified data, methods for de-identification, including a recently proposed dynamic policy approach to pandemic data sharing, remain unproved in their ability to support pandemic disparity studies. Thus, in this paper, we evaluate how such an approach enables timely, accurate, and fair disparity detection, with respect to potential adversaries with varying prior knowledge about the population. We show that, when considering reasonably enabled adversaries, dynamic policies support up to three times earlier disparity detection in partially synthetic data than data sharing policies derived from two current, public datasets. Using real-world COVID-19 data, we also show how granular date information, which dynamic policies were designed to share, improves disparity characterization. Our results highlight the potential of the dynamic policy approach to publish data that supports disparity investigations in current and future pandemics.


Assuntos
COVID-19 , Humanos , Estados Unidos , COVID-19/epidemiologia , Políticas , Disseminação de Informação , Pandemias , Vigilância em Saúde Pública/métodos
7.
J Am Med Inform Assoc ; 29(5): 853-863, 2022 04 13.
Artigo em Inglês | MEDLINE | ID: mdl-35182149

RESUMO

OBJECTIVE: Supporting public health research and the public's situational awareness during a pandemic requires continuous dissemination of infectious disease surveillance data. Legislation, such as the Health Insurance Portability and Accountability Act of 1996 and recent state-level regulations, permits sharing deidentified person-level data; however, current deidentification approaches are limited. Namely, they are inefficient, relying on retrospective disclosure risk assessments, and do not flex with changes in infection rates or population demographics over time. In this paper, we introduce a framework to dynamically adapt deidentification for near-real time sharing of person-level surveillance data. MATERIALS AND METHODS: The framework leverages a simulation mechanism, capable of application at any geographic level, to forecast the reidentification risk of sharing the data under a wide range of generalization policies. The estimates inform weekly, prospective policy selection to maintain the proportion of records corresponding to a group size less than 11 (PK11) at or below 0.1. Fixing the policy at the start of each week facilitates timely dataset updates and supports sharing granular date information. We use August 2020 through October 2021 case data from Johns Hopkins University and the Centers for Disease Control and Prevention to demonstrate the framework's effectiveness in maintaining the PK11 threshold of 0.01. RESULTS: When sharing COVID-19 county-level case data across all US counties, the framework's approach meets the threshold for 96.2% of daily data releases, while a policy based on current deidentification techniques meets the threshold for 32.3%. CONCLUSION: Periodically adapting the data publication policies preserves privacy while enhancing public health utility through timely updates and sharing epidemiologically critical features.


Assuntos
COVID-19 , Privacidade , Humanos , Pandemias , Políticas , Estudos Prospectivos , Saúde Pública , Estudos Retrospectivos
8.
IEEE J Biomed Health Inform ; 24(11): 3182-3188, 2020 11.
Artigo em Inglês | MEDLINE | ID: mdl-32750932

RESUMO

Healthcare enterprises are starting to adopt cloud computing due to its numerous advantages over traditional infrastructures. This has become a necessity because of the increased volume, velocity and variety of healthcare data, and the need to facilitate data correlation and large-scale analysis. Cloud computing infrastructures have the power to offer continuous acquisition of data from multiple heterogeneous sources, efficient data integration, and big data analysis. At the same time, security, availability, and disaster recovery are critical factors aiding towards the adoption of cloud computing. However, the migration of healthcare workloads to cloud is not straightforward due to the vagueness in healthcare data standards, heterogeneity and sensitive nature of healthcare data, and many regulations that govern its usage. This paper highlights the need for providing healthcare data acquisition using cloud infrastructures and presents the challenges, requirements, use-cases, and best practices for building a state-of-the-art healthcare data ingestion service on cloud.


Assuntos
Computação em Nuvem , Segurança Computacional , Big Data , Atenção à Saúde , Ingestão de Alimentos , Humanos
10.
AMIA Annu Symp Proc ; 2019: 313-322, 2019.
Artigo em Inglês | MEDLINE | ID: mdl-32308824

RESUMO

Using electronic health data to predict adverse drug reaction (ADR) incurs practical challenges, such as lack of adequate data from any single site for rare ADR detection, resource constraints on integrating data from multiple sources, and privacy concerns with creating a centralized database from person-specific, sensitive data. We introduce a federated learning framework that can learn a global ADR prediction model from distributed health data held locally at different sites. We propose two novel methods of local model aggregation to improve the predictive capability of the global model. Through comprehensive experimental evaluation using real-world health data from 1 million patients, we demonstrate the effectiveness of our proposed approach in achieving comparable performance to centralized learning and outperforming localized learning models for two types of ADRs. We also demonstrate that, for varying data distributions, our aggregation methods outperform state-of-the-art techniques, in terms of precision, recall, and accuracy.


Assuntos
Sistemas de Notificação de Reações Adversas a Medicamentos , Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos , Registros Eletrônicos de Saúde , Aprendizado de Máquina , Bases de Dados Factuais , Humanos , Modelos Logísticos , Máquina de Vetores de Suporte
11.
J Am Med Inform Assoc ; 21(2): 337-44, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-24045907

RESUMO

OBJECTIVE: Common chronic diseases such as hypertension are costly and difficult to manage. Our ultimate goal is to use data from electronic health records to predict the risk and timing of deterioration in hypertension control. Towards this goal, this work predicts the transition points at which hypertension is brought into, as well as pushed out of, control. METHOD: In a cohort of 1294 patients with hypertension enrolled in a chronic disease management program at the Vanderbilt University Medical Center, patients are modeled as an array of features derived from the clinical domain over time, which are distilled into a core set using an information gain criteria regarding their predictive performance. A model for transition point prediction was then computed using a random forest classifier. RESULTS: The most predictive features for transitions in hypertension control status included hypertension assessment patterns, comorbid diagnoses, procedures and medication history. The final random forest model achieved a c-statistic of 0.836 (95% CI 0.830 to 0.842) and an accuracy of 0.773 (95% CI 0.766 to 0.780). CONCLUSIONS: This study achieved accurate prediction of transition points of hypertension control status, an important first step in the long-term goal of developing personalized hypertension management plans.


Assuntos
Gerenciamento Clínico , Registros Eletrônicos de Saúde , Hipertensão/terapia , Anti-Hipertensivos/uso terapêutico , Doença Crônica , Humanos , Modelos Teóricos , Prognóstico
12.
ITAB Corfu Greece (2010) ; 2010: 1-6, 2010 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-25580471

RESUMO

Patient-specific records contained in Electronic Medical Record (EMR) systems are increasingly combined with genomic sequences and deposited into bio-repositories. This allows researchers to perform large-scale, low-cost biomedical studies, such as Genome-Wide Association Studies (GWAS) aimed at identifying associations between genetic factors and complex health-related phenomena, which are an integral facet of personalized medicine. Disseminating this data, however, raises serious privacy concerns because patients' genomic sequences can be linked to their identities through diagnosis codes. This work proposes an approach that guards against this type of data linkage by modifying diagnosis codes in a way that limits the probability of associating a patient's identity to their genomic sequence. Experiments using EMRs from the Vanderbilt University Medical Center verify that our approach generates data that can support up to 29.4% more GWAS than the best-so-far method, while permitting biomedical analysis tasks several orders of magnitude more accurately.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA