1.
IEEE Trans Knowl Data Eng ; 34(2): 996-1010, 2022 Feb.
Article in English | MEDLINE | ID: mdl-36158636

ABSTRACT

The Cox proportional hazards model is a popular semi-parametric model for survival analysis. In this paper, we aim at developing a federated algorithm for the Cox proportional hazards model over vertically partitioned data (i.e., data from the same patient are stored at different institutions). We propose a novel algorithm, namely VERTICOX, to obtain the global model parameters in a distributed fashion based on the Alternating Direction Method of Multipliers (ADMM) framework. The proposed model computes intermediary statistics and exchanges them to calculate the global model without collecting individual patient-level data. We demonstrate that our algorithm achieves equivalent accuracy for the estimation of model parameters and statistics to that of its centralized realization. The proposed algorithm converges linearly under the ADMM framework. Its computational complexity and communication costs are polynomially and linearly associated with the number of subjects, respectively. Experimental results show that VERTICOX can achieve accurate model parameter estimation to support federated survival analysis over vertically distributed data by saving bandwidth and avoiding exchange of information about individual patients. The source code for VERTICOX is available at: https://github.com/daiwenrui/VERTICOX.
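The following is a minimal sketch, not the VERTICOX algorithm itself, of why the Cox model lends itself to vertical partitioning: the linear predictor for each patient is a sum of contributions that each institution can compute from its own columns. The two-party split, the toy data, and all function names are assumptions made for illustration. The sketch does not implement ADMM and offers no privacy protection, since it combines per-patient values directly.

```python
import math

def linear_predictor(X, beta):
    """Compute x_i . beta for every row x_i of X."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def cox_neg_log_partial_likelihood(eta, times, events):
    """Negative log partial likelihood of the Cox model (Breslow form)."""
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk)
    return nll

# Hypothetical toy data: the same four patients, with different
# covariates held by institution A and institution B.
X_a = [[0.5], [1.0], [-0.3], [0.2]]
X_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
beta_a = [0.4]
beta_b = [-0.2, 0.7]
times = [5.0, 3.0, 8.0, 2.0]
events = [1, 1, 0, 1]

# Each party computes its share of the linear predictor locally.
eta_a = linear_predictor(X_a, beta_a)
eta_b = linear_predictor(X_b, beta_b)
eta_sum = [a + b for a, b in zip(eta_a, eta_b)]

# Centralized reference: concatenate the columns and the coefficients.
X_full = [row_a + row_b for row_a, row_b in zip(X_a, X_b)]
eta_full = linear_predictor(X_full, beta_a + beta_b)

nll_split = cox_neg_log_partial_likelihood(eta_sum, times, events)
nll_full = cox_neg_log_partial_likelihood(eta_full, times, events)
```

The split and centralized objectives agree up to floating-point rounding, which is the property a federated method can build on when replacing the direct sum with exchanged intermediary statistics.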

2.
J Biomed Inform ; 113: 103667, 2021 01.
Article in English | MEDLINE | ID: mdl-33359112

ABSTRACT

Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching for these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble the input query. In our solution, we propose a similarity notion based on the Longest Common Subsequence (LCSS), which is used to measure the similarity between the query and the patient's temporal medical data and to ensure robustness against noise in the data. Our solution adopts locality-sensitive hashing techniques to address the high dimensionality of medical data, by embedding the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large EHR datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. The evaluations conducted using a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
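As a rough illustration of the LCSS idea mentioned in this abstract, the sketch below computes the longest common subsequence between a query and a patient's event sequence using the textbook dynamic program. Normalizing by the query length, the exact-match comparison of codes, and the example codes are assumptions for illustration. The paper's actual similarity notion, hashing, and filtering steps are not reproduced here.

```python
def lcss_length(query, sequence):
    """Length of the longest common subsequence of two event sequences."""
    prev = [0] * (len(sequence) + 1)
    for q in query:
        curr = [0]
        for j, s in enumerate(sequence, start=1):
            if q == s:
                # Matching events extend the best alignment of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise skip one event from either side.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcss_similarity(query, sequence):
    """Fraction of the query matched in order (assumed normalization)."""
    if not query:
        return 0.0
    return lcss_length(query, sequence) / len(query)

# Hypothetical diagnosis-code sequences; unrelated events in the patient
# record (here "J45" and "Z00") do not lower the score.
query = ["I10", "E11", "N18"]
patient = ["I10", "J45", "E11", "Z00", "N18"]
score = lcss_similarity(query, patient)
```

Because a subsequence may skip intervening events, extra or spurious entries in the patient record leave the score unchanged, which is the sense in which LCSS-style measures tolerate noise.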

3.
J Biomed Inform ; 78: 43-53, 2018 02.
Article in English | MEDLINE | ID: mdl-29277597

ABSTRACT

Modern medical information systems enable the collection of massive temporal health data. Although these data have great potential for advancing medical research, data exploration and the extraction of useful knowledge present significant challenges. In this work, we develop a new pattern matching technique which aims to facilitate the discovery of clinically useful knowledge from large temporal datasets. Our approach receives as input a set of temporal patterns modeling specific events of interest (e.g., doctor's knowledge, symptoms of diseases) and returns data instances matching these patterns (e.g., patients exhibiting the specified symptoms). The resulting instances are ranked according to a significance score based on the p-value. Our experimental evaluations on a real-world dataset demonstrate the efficiency and effectiveness of our approach.


Subjects
Data Mining/methods; Electronic Health Records/classification; Patients/classification; Pattern Recognition, Automated/methods; Data Curation; Databases, Factual; Delivery of Health Care; Humans; Time Factors
4.
Article in English | MEDLINE | ID: mdl-38152352

ABSTRACT

Measuring spatial accessibility to healthcare resources and facilities has long been an important problem in public health. For example, during disease outbreaks, sharing spatial accessibility data such as individual travel distances to health facilities is vital to policy making and designing effective interventions. However, sharing these data may raise privacy concerns, as information about individual data contributors (e.g., health status and residential address) may be disclosed. In this work, we investigate the unintended information leakage in spatial accessibility analysis. Specifically, we are interested in understanding whether sharing data for spatial accessibility computations may disclose individual participation (i.e., membership inference) and personally identifiable information (i.e., address inference). Furthermore, we propose two provably private algorithms that mitigate these privacy risks. The evaluation is conducted with real population and healthcare facilities data from Mecklenburg County, NC and Nashville, TN. Compared to state-of-the-art privacy practices, our methods effectively reduce the risks of membership and address disclosure, while providing useful data for spatial accessibility analysis.

5.
IEEE Int Conf Healthc Inform ; 2023: 81-90, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38152589

ABSTRACT

Deep neural networks have been increasingly integrated into healthcare applications to enable accurate predictive analyses. Sharing trained deep models not only facilitates knowledge integration in collaborative research efforts but also enables equitable access to computational intelligence. However, recent studies have shown that an adversary may leverage a shared model to learn the participation of a target individual in the training set. In this work, we investigate privacy-protecting model sharing for survival studies. Specifically, we pose three research questions. (1) Do deep survival models leak membership information? (2) How effective is differential privacy in defending against membership inference in deep survival analyses? (3) Are there other effects of differential privacy on deep survival analyses? Our study assesses the membership leakage in emerging deep survival models and develops differentially private training procedures to provide rigorous privacy protection. The experimental results show that deep survival models leak membership information and our approach effectively reduces membership inference risks. The results also show that differential privacy introduces a limited performance loss, and may improve the model robustness in the presence of noisy data, compared to non-private models.

6.
Proc ACM Int Conf Inf Knowl Manag ; 2023: 131-141, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37906633

ABSTRACT

Sharing health data is vital in advancing medical research and transforming knowledge into clinical practice. Meanwhile, protecting the privacy of data contributors is of paramount importance. To that end, several privacy approaches have been proposed to protect individual data contributors in data sharing, including data anonymization and data synthesis techniques. These approaches have shown promising results in providing privacy protection at the dataset level. In this work, we study the privacy challenges in enabling fine-grained privacy in health data sharing. Our work is motivated by recent research findings showing that patients and healthcare providers may have different privacy preferences and policies that need to be addressed. Specifically, we propose a novel and effective privacy solution that enables data curators (e.g., healthcare providers) to protect sensitive data elements while preserving data usefulness. Our solution builds on randomized techniques to provide rigorous privacy protection for sensitive elements and leverages graphical models to mitigate privacy leakage due to dependent elements. To enhance the usefulness of the shared data, our randomized mechanism incorporates domain knowledge to preserve semantic similarity and adopts a block-structured design to minimize utility loss. Evaluations with real-world health data demonstrate the effectiveness of our approach and the usefulness of the shared data for health applications.

7.
Proc IEEE Int Conf Big Data ; 2023: 5444-5453, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38585488

ABSTRACT

Effective disease surveillance systems require large-scale epidemiological data to improve health outcomes and quality of care for the general population. As data may be limited within a single site, multi-site data (e.g., from a number of local/regional health systems) need to be considered. Leveraging distributed data across multiple sites for epidemiological analysis poses significant challenges. Due to the sensitive nature of epidemiological data, it is imperative to design distributed solutions that provide strong privacy protections. Current privacy solutions often assume a central site, which is responsible for aggregating the distributed data and applying privacy protection before sharing the results (e.g., aggregation via secure primitives and differential privacy for sharing aggregate results). However, identifying such a central site may be difficult in practice and relying on a central site may introduce potential vulnerabilities (e.g., single point of failure). Furthermore, to support clinical interventions and inform policy decisions in a timely manner, epidemiological analyses need to reflect dynamic changes in the data. Yet, existing distributed privacy-protecting approaches were largely designed for static data (e.g., one-time data sharing) and cannot fulfill dynamic data requirements. In this work, we propose a privacy-protecting approach that supports the sharing of dynamic epidemiological analyses and provides strong privacy protection in a decentralized manner. We apply our solution in continuous survival analysis using the Kaplan-Meier estimation model while providing differential privacy protection. Our evaluations on a real dataset containing COVID-19 cases show that our method provides highly usable results.

8.
Article in English | MEDLINE | ID: mdl-36120416

ABSTRACT

The use of deep learning techniques in medical applications holds great promise for advancing health care. However, there are growing privacy concerns regarding what information about individual data contributors (i.e., patients in the training set) these deep models may reveal when shared with external users. In this work, we first investigate the membership privacy risks in sharing deep learning models for cancer genomics tasks, and then study the applicability of privacy-protecting strategies for mitigating these privacy risks.

9.
Article in English | MEDLINE | ID: mdl-36120417

ABSTRACT

Sharing time-to-event data is beneficial for enabling collaborative research efforts (e.g., survival studies), facilitating the design of effective interventions, and advancing patient care (e.g., early diagnosis). Despite numerous privacy solutions for sharing time-to-event data, recent research studies have shown that external information may become available (e.g., self-disclosure of study participation on social media) to an adversary, posing new privacy concerns. In this work, we formulate a cohort inference attack for time-to-event data sharing, in which an informed adversary aims at inferring the membership of a target individual in a specific cohort. Our study investigates the privacy risks associated with time-to-event data and evaluates the empirical privacy protection offered by popular privacy-protecting solutions (e.g., binning, differential privacy). Furthermore, we propose a novel approach to privately release individual-level time-to-event data with high utility, while providing indistinguishability guarantees for the input value. Our method, TE-Sanitizer, is shown to provide effective mitigation against the inference attacks and high usefulness in survival analysis. The results and discussion provide domain experts with insights on the privacy and the usefulness of the studied methods.

10.
J Am Med Inform Assoc ; 29(7): 1152-1160, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35380666

ABSTRACT

OBJECTIVE: Emerging technologies (eg, wearable devices) have made it possible to collect data directly from individuals (eg, time-series), providing new insights on the health and well-being of individual patients. Broadening the access to these data would facilitate the integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared to traditional health data, these data are collected directly from individuals, are highly unique and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining the usability for data analytics. METHODS AND MATERIALS: We propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages a dimensionality reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis. RESULTS: We conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization while the data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering). DISCUSSION: Our study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors. CONCLUSION: The results suggest our proposed privacy-protection method provides strong privacy protections while preserving the usefulness of the data.


Subjects
Information Dissemination; Privacy; Electrocardiography; Genomics; Humans; Information Dissemination/methods; Information Storage and Retrieval
11.
Am J Ophthalmol ; 227: 74-86, 2021 07.
Article in English | MEDLINE | ID: mdl-33497675

ABSTRACT

PURPOSE: To (1) use All of Us (AoU) data to validate a previously published single-center model predicting the need for surgery among individuals with glaucoma, (2) train new models using AoU data, and (3) share insights regarding this novel data source for ophthalmic research. DESIGN: Development and evaluation of machine learning models. METHODS: Electronic health record data were extracted from AoU for 1,231 adults diagnosed with primary open-angle glaucoma. The single-center model was applied to AoU data for external validation. AoU data were then used to train new models for predicting the need for glaucoma surgery using multivariable logistic regression, artificial neural networks, and random forests. Five-fold cross-validation was performed. Model performance was evaluated based on area under the receiver operating characteristic curve (AUC), accuracy, precision, and recall. RESULTS: The mean (standard deviation) age of the AoU cohort was 69.1 (10.5) years, with 57.3% women and 33.5% black, significantly exceeding representation in the single-center cohort (P = .04 and P < .001, respectively). Of 1,231 participants, 286 (23.2%) needed glaucoma surgery. When applying the single-center model to AoU data, accuracy was 0.69 and AUC was only 0.49. Using AoU data to train new models resulted in superior performance: AUCs ranged from 0.80 (logistic regression) to 0.99 (random forests). CONCLUSIONS: Models trained with national AoU data achieved superior performance compared with using single-center data. Although AoU does not currently include ophthalmic imaging, it offers several strengths over similar big-data sources such as claims data. AoU is a promising new data source for ophthalmic research.
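The four evaluation metrics named in this abstract can be sketched in a few lines of plain Python. This is only an illustrative computation on made-up labels and scores. The 0.5 decision threshold and the function names are assumptions, and none of the study's data or models are reproduced.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, and recall at an assumed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Made-up example: 1 = needed surgery, 0 = did not.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
example_auc = auc(labels, scores)
example_metrics = threshold_metrics(labels, scores)

# With 286 of 1,231 participants needing surgery, a model that always
# predicts "no surgery" is right for the other 945 participants.
majority_accuracy = (1231 - 286) / 1231
```

The last line gives roughly 0.77, which helps explain how an accuracy of 0.69 can coexist with an AUC near 0.5: with imbalanced classes, accuracy alone says little about how well a model ranks cases.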


Subjects
Databases, Factual/statistics & numerical data; Electronic Health Records/statistics & numerical data; Filtering Surgery/methods; Glaucoma, Open-Angle/diagnosis; Glaucoma, Open-Angle/surgery; Aged; Aged, 80 and over; Female; Humans; Information Storage and Retrieval/methods; Logistic Models; Machine Learning; Male; Middle Aged; Models, Statistical; Neural Networks, Computer; ROC Curve
12.
Nat Genet ; 52(7): 646-654, 2020 07.
Article in English | MEDLINE | ID: mdl-32601475

ABSTRACT

The sharing of genomic data holds great promise in advancing precision medicine and providing personalized treatments and other types of interventions. However, these opportunities come with privacy concerns, and data misuse could potentially lead to privacy infringement for individuals and their blood relatives. With the rapid growth and increased availability of genomic datasets, understanding the current genome privacy landscape and identifying the challenges in developing effective privacy-protecting solutions are imperative. In this work, we provide an overview of major privacy threats identified by the research community and examine the privacy challenges in the context of emerging direct-to-consumer genetic-testing applications. We additionally present general privacy-protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. Finally, we discuss limitations in current privacy-protection methods, highlight possible mitigation strategies and suggest future research opportunities for advancing genomic data sharing.


Subjects
Genetic Privacy; Genetic Testing; Genomics; Information Dissemination; Datasets as Topic; Forensic Genetics; Genetic Research; Genomics/trends; Humans; Pedigree; Risk Assessment
13.
J Am Med Inform Assoc ; 27(3): 366-375, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31750926

ABSTRACT

OBJECTIVE: Survival analysis is the cornerstone of many healthcare applications in which the "survival" probability (eg, time free from a certain disease, time to death) of a group of patients is computed to guide clinical decisions. It is widely used in biomedical research and healthcare applications. However, frequent sharing of exact survival curves may reveal information about the individual patients, as an adversary may infer the presence of a person of interest as a participant of a study or of a particular group. Therefore, it is imperative to develop methods to protect patient privacy in survival analysis. MATERIALS AND METHODS: We develop a framework based on the formal model of differential privacy, which provides provable privacy protection against a knowledgeable adversary. We show the performance of privacy-protecting solutions for the widely used Kaplan-Meier nonparametric survival model. RESULTS: We empirically evaluated the usefulness of our privacy-protecting framework and the reduced privacy risk for a popular epidemiology dataset and a synthetic dataset. Results show that our methods significantly reduce the privacy risk when compared with their nonprivate counterparts, while retaining the utility of the survival curves. DISCUSSION: The proposed framework demonstrates the feasibility of conducting privacy-protecting survival analyses. We discuss future research directions to further enhance the usefulness of our proposed solutions in biomedical research applications. CONCLUSION: The results suggest that our proposed privacy-protection methods provide strong privacy protections while preserving the usefulness of survival analyses.
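For readers unfamiliar with the two ingredients named in this abstract, the sketch below shows the standard Kaplan-Meier product-limit estimator and the textbook Laplace mechanism for releasing a count under differential privacy. It is not the framework proposed in the paper. The function names, the toy data, and the choice of the Laplace mechanism with sensitivity 1 are assumptions for illustration.

```python
import random

def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_count(count, epsilon, rng):
    """Release a count with Laplace noise, assuming sensitivity 1."""
    return count + laplace_noise(1.0 / epsilon, rng)

# Toy cohort: event indicator 1 = event observed, 0 = censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)

# Hypothetical private release of the number of observed events.
rng = random.Random(0)
private_events = noisy_count(sum(events), epsilon=1.0, rng=rng)
```

A private survival curve would have to be derived from perturbed quantities like these, with care taken over how the privacy budget is split across time points. Those design choices are what a full framework has to settle, and they are not shown here.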


Subjects
Confidentiality; Kaplan-Meier Estimate; Survival Analysis; Humans; Privacy
14.
IEEE Trans Big Data ; 6(2): 296-308, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32478127

ABSTRACT

Biomedical research often involves studying patient data that contain personal information. Inappropriate use of these data might lead to leakage of sensitive information, which can put patient privacy at risk. The problem of preserving patient privacy has received increasing attention in the era of big data. Many privacy methods have been developed to protect against various attack models. This paper reviews relevant topics in the context of biomedical research. We discuss privacy-preserving technologies related to (1) record linkage, (2) synthetic data generation, and (3) genomic data privacy. We also discuss the ethical implications of big data privacy in biomedicine and present challenges in future research directions for improving data privacy in biomedical research.

15.
Stat Methods Med Res ; 27(11): 3304-3324, 2018 11.
Article in English | MEDLINE | ID: mdl-29298592

ABSTRACT

Modern medical research relies on multi-institutional collaborations which enhance knowledge discovery and data reuse. While these collaborations allow researchers to perform analytics otherwise impossible on individual datasets, they often pose significant challenges in the data integration process. Due to the lack of a unique identifier, data integration solutions often have to rely on patients' protected health information (PHI). In many situations, such information cannot leave the institutions or must be strictly protected. Furthermore, the presence of noisy values for these attributes may result in poor overall utility. While much research has been done to address these challenges, most of the current solutions are designed for a static setting without considering the temporal information of the data (e.g., EHR). In this work, we propose a novel approach that uses non-PHI for linking patient longitudinal data. Specifically, our technique captures the diagnosis dependencies using patterns which are shown to provide important indications for linking patient records. Our solution can be used as a standalone technique to perform temporal record linkage using non-protected health information data or it can be combined with Privacy Preserving Record Linkage (PPRL) solutions when protected health information is available. In this case, our approach can resolve ambiguities in the results. Experimental evaluations on real datasets demonstrate the effectiveness of our technique.


Subjects
Computer Security; Medical Record Linkage/methods; Biomedical Research; Electronic Health Records
16.
Proc Int Conf Data Eng ; 2017: 1533-1540, 2017 04.
Article in English | MEDLINE | ID: mdl-28757793

ABSTRACT

The study of patients in Intensive Care Units (ICUs) is a crucial task in critical care research which has significant implications both in identifying clinical risk factors and defining institutional guidance. The mortality study of ICU patients is of particular interest because it provides useful indications to healthcare institutions for improving patients' experience, internal policies, and procedures (e.g., allocation of resources). To this end, many studies have focused on the length of stay (LOS) for ICU patients as a feature for studying mortality. In this work, we propose a novel mortality study based on the notion of burstiness, where the temporal information of patients' longitudinal data is taken into consideration. The burstiness of temporal data is a popular measure in network analysis and time-series anomaly detection, where high values of burstiness indicate the presence of rapidly occurring events in short time periods (i.e., bursts). Our intuition is that these bursts may relate to possible complications in the patient's medical condition and hence provide indications on mortality. Compared to the LOS, the burstiness parameter captures the temporality of the medical events, providing information about the overall dynamics of the patient's condition. To the best of our knowledge, we are the first to apply the burstiness measure in the clinical research domain. Our preliminary results on a real dataset show that patients with high values of burstiness tend to have a higher mortality rate compared to patients with more regular medical events. Overall, our study shows promising results and provides useful insights for developing predictive models on temporal data and advancing modern critical care medicine.
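The abstract does not state which burstiness definition the authors use. One widely used definition from the network-science literature is the Goh-Barabási parameter B = (σ − μ) / (σ + μ), where μ and σ are the mean and standard deviation of the inter-event times. The sketch below computes that quantity as an assumed stand-in, and the timestamps are made up.

```python
import statistics

def burstiness(event_times):
    """Burstiness B = (sigma - mu) / (sigma + mu) of inter-event times.

    B is -1 for perfectly regular events, near 0 for Poisson-like
    events, and approaches 1 for highly bursty sequences.
    """
    ts = sorted(event_times)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events share the same timestamp")
    return (sigma - mu) / (sigma + mu)

# Hypothetical event timestamps (hours since admission).
regular = [0, 1, 2, 3, 4]
bursty = [0, 0.1, 0.2, 0.3, 10, 10.1, 10.2]

b_regular = burstiness(regular)
b_bursty = burstiness(bursty)
```

Evenly spaced events give the minimum value of -1, while the clustered sequence scores above 0, matching the intuition in the abstract that rapid runs of events separated by long gaps yield high burstiness.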

17.
Ann N Y Acad Sci ; 1387(1): 73-83, 2017 01.
Article in English | MEDLINE | ID: mdl-27681358

ABSTRACT

Accessing and integrating human genomic data with phenotypes are important for biomedical research. Making genomic data accessible for research purposes, however, must be handled carefully to avoid leakage of sensitive individual information to unauthorized parties and improper use of data. In this article, we focus on data sharing within the scope of data accessibility for research. Current common practices to gain biomedical data access are strictly rule based, without a clear and quantitative measurement of the risk of privacy breaches. In addition, several types of studies require privacy-preserving linkage of genotype and phenotype information across different locations (e.g., genotypes stored in a sequencing facility and phenotypes stored in an electronic health record) to accelerate discoveries. The computer science community has developed a spectrum of techniques for data privacy and confidentiality protection, many of which have yet to be tested on real-world problems. In this article, we discuss clinical, technical, and ethical aspects of genome data privacy and confidentiality in the United States, as well as potential solutions for privacy-preserving genotype-phenotype linkage in biomedical research.


Subjects
Genetic Privacy; Genomics/methods; Computational Biology/ethics; Computational Biology/standards; Computational Biology/trends; Computer Security; Data Mining/ethics; Data Mining/standards; Data Mining/trends; Genetic Privacy/ethics; Genetic Privacy/legislation & jurisprudence; Genetic Privacy/standards; Genetic Privacy/trends; Genomics/ethics; Genomics/standards; Genomics/trends; Humans; Informed Consent/legislation & jurisprudence; Informed Consent/standards; Medical Record Linkage/standards; Risk Management; United States