Pesquisa | BVS IEC

1.

Sociotechnical safeguards for genomic data privacy.

Wan, Zhiyu; Hazel, James W; Clayton, Ellen Wright; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley A.

Nat Rev Genet ; 23(7): 429-445, 2022 07.

Artigo em Inglês | MEDLINE | ID: mdl-35246669

RESUMO

Recent developments in a variety of sectors, including health care, research and the direct-to-consumer industry, have led to a dramatic increase in the amount of genomic data that are collected, used and shared. This state of affairs raises new and challenging concerns for personal privacy, both legally and technically. This Review appraises existing and emerging threats to genomic data privacy and discusses how well current legal frameworks and technical safeguards mitigate these concerns. It concludes with a discussion of remaining and emerging challenges and illustrates possible solutions that can balance protecting privacy and realizing the benefits that result from the sharing of genetic information.

Assuntos

Genômica , Privacidade , Genoma

2.

Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics.

Venkatesaramani, Rajagopal; Wan, Zhiyu; Malin, Bradley A; Vorobeychik, Yevgeniy.

Genome Res ; 33(7): 1113-1123, 2023 07.

Artigo em Inglês | MEDLINE | ID: mdl-37217251

RESUMO

The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web services called Beacons. However, even such limited releases are susceptible to likelihood ratio-based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the data set and those who are not. We further introduce highly scalable approaches for approximately solving the privacy-utility tradeoff problem when information is in the form of either summary statistics or presence/absence queries. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public data sets.

Assuntos

Disseminação de Informação , Privacidade , Humanos , Disseminação de Informação/métodos , Genômica , Frequência do Gene , Alelos

3.

Publisher Correction: Sociotechnical safeguards for genomic data privacy.

Wan, Zhiyu; Hazel, James W; Clayton, Ellen Wright; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley A.

Nat Rev Genet ; 23(7): 453, 2022 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-35332247

4.

Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach.

Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi; Clayton, Ellen Wright; Kantarcioglu, Murat; Malin, Bradley.

Am J Hum Genet ; 100(2): 316-322, 2017 02 02.

Artigo em Inglês | MEDLINE | ID: mdl-28065469

RESUMO

Emerging scientific endeavors are creating big data repositories of data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals-the Sequence and Phenotype Integration Exchange (SPHINX)-and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.

Assuntos

Bases de Dados Genéticas , Privacidade Genética/legislação & jurisprudência , Genômica , Disseminação de Informação , Modelos Teóricos , Registros Eletrônicos de Saúde , Humanos , Polimorfismo de Nucleotídeo Único

5.

Anatomical Context Protects Deep Learning from Adversarial Perturbations in Medical Imaging.

Li, Yi; Zhang, Huahong; Bermudez, Camilo; Chen, Yifan; Landman, Bennett A; Vorobeychik, Yevgeniy.

Neurocomputing (Amst) ; 379: 370-378, 2020 Feb 28.

Artigo em Inglês | MEDLINE | ID: mdl-32863583

RESUMO

Deep learning has achieved impressive performance across a variety of tasks, including medical image processing. However, recent research has shown that deep neural networks are susceptible to small adversarial perturbations in the image. We study the impact of such adversarial perturbations in medical image processing where the goal is to predict an individual's age based on a 3D MRI brain image. We consider two models: a conventional deep neural network, and a hybrid deep learning model which additionally uses features informed by anatomical context. We find that we can introduce significant errors in predicted age by adding imperceptible noise to an image, can accomplish this even for large batches of images using a single perturbation, and that the hybrid model is much more robust to adversarial perturbations than the conventional deep neural network. Our work highlights limitations of current deep learning techniques in clinical applications, and suggests a path forward.

6.

Integrating linear optimization with structural modeling to increase HIV neutralization breadth.

Sevy, Alexander M; Panda, Swetasudha; Crowe, James E; Meiler, Jens; Vorobeychik, Yevgeniy.

PLoS Comput Biol ; 14(2): e1005999, 2018 02.

Artigo em Inglês | MEDLINE | ID: mdl-29451898

RESUMO

Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods only sample a fraction of available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the Rosetta software suite with machine learning and integer linear programming to overcome limitations in the Rosetta sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking the performance on increasing predicted breadth of anti-HIV antibodies. We use this novel method to increase predicted breadth of naturally-occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in Rosetta and show that we can outperform the existing method significantly. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. Although our modeled antibodies were not tested in vitro, we predict that these variants would have greatly increased breadth compared to the wild-type antibody.

Assuntos

Anticorpos Neutralizantes/imunologia , Biologia Computacional , Epitopos/imunologia , Anticorpos Anti-HIV/imunologia , Infecções por HIV/imunologia , Algoritmos , Motivos de Aminoácidos , HIV-1 , Humanos , Modelos Lineares , Aprendizado de Máquina , Análise de Regressão , Software , Máquina de Vetores de Suporte

7.

A method for analyzing inpatient care variability through physicians' orders.

Lenert, Matthew C; Miller, Randolph A; Vorobeychik, Yevgeniy; Walsh, Colin G.

J Biomed Inform ; 91: 103111, 2019 03.

Artigo em Inglês | MEDLINE | ID: mdl-30710635

RESUMO

OBJECTIVE: Administrators assess care variability through chart review or cost variability to inform care standardization efforts. Chart review is costly and cost variability is imprecise. This study explores the potential of physician orders as an alternative measure of care variability. MATERIALS & METHODS: The authors constructed an order variability metric from adult Vanderbilt University Hospital patients treated between 2013 and 2016. The study compared how well a cost variability model predicts variability in the length of stay compared to an order variability model. Both models adjusted for covariates such as severity of illness, comorbidities, and hospital transfers. RESULTS: The order variability model significantly minimized the Akaike information criterion (superior outcome) compared to the cost variability model. This result also held when excluding patients who received intensive care. CONCLUSION: Order variability can potentially typify care variability better than cost variability. Order variability is a scalable metric, calculable during the course of care.

Assuntos

Hospitalização , Pacientes Internados , Médicos , Padrões de Prática Médica , Adulto , Feminino , Custos de Cuidados de Saúde , Humanos , Tempo de Internação , Masculino , Corpo Clínico Hospitalar , Pessoa de Meia-Idade , Qualidade da Assistência à Saúde , Estudos Retrospectivos

8.

Scalable Iterative Classification for Sanitizing Large-Scale Datasets.

Li, Bo; Vorobeychik, Yevgeniy; Li, Muqun; Malin, Bradley.

IEEE Trans Knowl Data Eng ; 29(3): 698-711, 2017 Mar 01.

Artigo em Inglês | MEDLINE | ID: mdl-28943741

RESUMO

Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose personally identifiable information. Much of this data exhibits weak structure (e.g., text), such that machine learning approaches have been developed to detect and remove identifiers from it. While learning is never perfect, and relying on such approaches to sanitize data can leak sensitive information, a small risk is often acceptable. Our goal is to balance the value of published data and the risk of an adversary discovering leaked identifiers. We model data sanitization as a game between 1) a publisher who chooses a set of classifiers to apply to data and publishes only instances predicted as non-sensitive and 2) an attacker who combines machine learning and manual inspection to uncover leaked identifying information. We introduce a fast iterative greedy algorithm for the publisher that ensures a low utility for a resource-limited adversary. Moreover, using five text data sets we illustrate that our algorithm leaves virtually no automatically identifiable sensitive instances for a state-of-the-art learning algorithm, while sharing over 93% of the original data, and completes after at most 5 iterations.

9.

Optimizing annotation resources for natural language de-identification via a game theoretic framework.

Li, Muqun; Carrell, David; Aberdeen, John; Hirschman, Lynette; Kirby, Jacqueline; Li, Bo; Vorobeychik, Yevgeniy; Malin, Bradley A.

J Biomed Inform ; 61: 97-109, 2016 06.

Artigo em Inglês | MEDLINE | ID: mdl-27020263

RESUMO

OBJECTIVE: Electronic medical records (EMRs) are increasingly repurposed for activities beyond clinical care, such as to support translational research and public policy analysis. To mitigate privacy risks, healthcare organizations (HCOs) aim to remove potentially identifying patient information. A substantial quantity of EMR data is in natural language form and there are concerns that automated tools for detecting identifiers are imperfect and leak information that can be exploited by ill-intentioned data recipients. Thus, HCOs have been encouraged to invest as much effort as possible to find and detect potential identifiers, but such a strategy assumes the recipients are sufficiently incentivized and capable of exploiting leaked identifiers. In practice, such an assumption may not hold true and HCOs may overinvest in de-identification technology. The goal of this study is to design a natural language de-identification framework, rooted in game theory, which enables an HCO to optimize their investments given the expected capabilities of an adversarial recipient. METHODS: We introduce a Stackelberg game to balance risk and utility in natural language de-identification. This game represents a cost-benefit model that enables an HCO with a fixed budget to minimize their investment in the de-identification process. We evaluate this model by assessing the overall payoff to the HCO and the adversary using 2100 clinical notes from Vanderbilt University Medical Center. We simulate several policy alternatives using a range of parameters, including the cost of training a de-identification model and the loss in data utility due to the removal of terms that are not identifiers. In addition, we compare policy options where, when an attacker is fined for misuse, a monetary penalty is paid to the publishing HCO as opposed to a third party (e.g., a federal regulator). RESULTS: Our results show that when an HCO is forced to exhaust a limited budget (set to $2000 in the study), the precision and recall of the de-identification of the HCO are 0.86 and 0.8, respectively. A game-based approach enables a more refined cost-benefit tradeoff, improving both privacy and utility for the HCO. For example, our investigation shows that it is possible for an HCO to release the data without spending all their budget on de-identification and still deter the attacker, with a precision of 0.77 and a recall of 0.61 for the de-identification. There also exist scenarios in which the model indicates an HCO should not release any data because the risk is too great. In addition, we find that the practice of paying fines back to a HCO (an artifact of suing for breach of contract), as opposed to a third party such as a federal regulator, can induce an elevated level of data sharing risk, where the HCO is incentivized to bait the attacker to elicit compensation. CONCLUSIONS: A game theoretic framework can be applied in leading HCO's to optimized decision making in natural language de-identification investments before sharing EMR data.

Assuntos

Confidencialidade , Registros Eletrônicos de Saúde , Processamento de Linguagem Natural , Humanos , Idioma , Risco

10.

Behavioral dynamics and influence in networked coloring and consensus.

Judd, Stephen; Kearns, Michael; Vorobeychik, Yevgeniy.

Proc Natl Acad Sci U S A ; 107(34): 14978-82, 2010 Aug 24.

Artigo em Inglês | MEDLINE | ID: mdl-20696936

RESUMO

We report on human-subject experiments on the problems of coloring (a social differentiation task) and consensus (a social agreement task) in a networked setting. Both tasks can be viewed as coordination games, and despite their cognitive similarity, we find that within a parameterized family of social networks, network structure elicits opposing behavioral effects in the two problems, with increased long-distance connectivity making consensus easier for subjects and coloring harder. We investigate the influence that subjects have on their network neighbors and the collective outcome, and find that it varies considerably, beyond what can be explained by network position alone. We also find strong correlations between influence and other features of individual subject behavior. In contrast to much of the recent research in network science, which often emphasizes network topology out of the context of any specific problem and places primacy on network position, our findings highlight the potential importance of the details of tasks and individuals in social networks.

Assuntos

Teoria dos Jogos , Comportamento Social , Apoio Social , Consenso , Humanos , Modelos Psicológicos , Interface Usuário-Computador

11.

A game theoretic approach to balance privacy risks and familial benefits.

Guo, Jia; Clayton, Ellen Wright; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Wooders, Myrna; Wan, Zhiyu; Yin, Zhijun; Malin, Bradley A.

Sci Rep ; 13(1): 6932, 2023 04 28.

Artigo em Inglês | MEDLINE | ID: mdl-37117219

RESUMO

As recreational genomics continues to grow in its popularity, many people are afforded the opportunity to share their genomes in exchange for various services, including third-party interpretation (TPI) tools, to understand their predisposition to health problems and, based on genome similarity, to find extended family members. At the same time, these services have increasingly been reused by law enforcement to track down potential criminals through family members who disclose their genomic information. While it has been observed that many potential users shy away from such data sharing when they learn that their privacy cannot be assured, it remains unclear how potential users' valuations of the service will affect a population's behavior. In this paper, we present a game theoretic framework to model interdependent privacy challenges in genomic data sharing online. Through simulations, we find that in addition to the boundary cases when (1) no player and (2) every player joins, there exist pure-strategy Nash equilibria when a relatively small portion of players choose to join the genomic database. The result is consistent under different parametric settings. We further examine the stability of Nash equilibria and illustrate that the only equilibrium that is resistant to a random dropping of players is when all players join the genomic database. Finally, we show that when players consider the impact that their data sharing may have on their relatives, the only pure strategy Nash equilibria are when either no player or every player shares their genomic data.

Assuntos

Hepatopatia Gordurosa não Alcoólica , Privacidade , Humanos , Disseminação de Informação , Família , Genômica

12.

Managing re-identification risks while providing access to the All of Us research program.

Xia, Weiyi; Basford, Melissa; Carroll, Robert; Clayton, Ellen Wright; Harris, Paul; Kantacioglu, Murat; Liu, Yongtai; Nyemba, Steve; Vorobeychik, Yevgeniy; Wan, Zhiyu; Malin, Bradley A.

J Am Med Inform Assoc ; 30(5): 907-914, 2023 04 19.

Artigo em Inglês | MEDLINE | ID: mdl-36809550

RESUMO

OBJECTIVE: The All of Us Research Program makes individual-level data available to researchers while protecting the participants' privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data was transformed to meet generally accepted re-identification risk levels. METHODS: At the time of the study, the resource consisted of 329 084 participants. Systematic amendments were applied to the data to mitigate re-identification risk (eg, generalization of geographic regions, suppression of public events, and randomization of dates). We computed the re-identification risk for each participant using a state-of-the-art adversarial model specifically assuming that it is known that someone is a participant in the program. We confirmed the expected risk is no greater than 0.09, a threshold that is consistent with guidelines from various US state and federal agencies. We further investigated how risk varied as a function of participant demographics. RESULTS: The results indicated that 95th percentile of the re-identification risk of all the participants is below current thresholds. At the same time, we observed that risk levels were higher for certain race, ethnic, and genders. CONCLUSIONS: While the re-identification risk was sufficiently low, this does not imply that the system is devoid of risk. Rather, All of Us uses a multipronged data protection strategy that includes strong authentication practices, active monitoring of data misuse, and penalization mechanisms for users who violate terms of service.

Assuntos

Saúde da População , Humanos , Masculino , Feminino , Privacidade , Gestão de Riscos , Segurança Computacional , Pesquisadores

13.

A Representativeness-informed Model for Research Record Selection from Electronic Medical Record Systems.

Borza, Victor A; Clayton, Ellen Wright; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Malin, Bradley A.

AMIA Annu Symp Proc ; 2022: 259-268, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-37128377

RESUMO

Scientific and clinical studies have a long history of bias in recruitment of underprivileged and minority populations. This underrepresentation leads to inaccurate, inapplicable, and non-generalizable results. Electronic medical record (EMR) systems, which now drive much research, often poorly represent these groups. We introduce a method for quantifying representativeness using information theoretic measures and an algorithmic approach to select a more representative record cohort than random selection when resource limitations preclude researchers from reviewing every record in the database. We apply this method to select cohorts of 2,000-20,000 records from a large (2M+ records) EMR database at the Vanderbilt University Medical Center and assess representativeness based on age, ethnicity, race, and gender. Compared to random selection - which will on average mirror the EMR database demographics - we find that a representativeness-informed approach can compose a cohort of records that is approximately 5.8 times more representative.

Assuntos

Gerenciamento de Dados , Registros Eletrônicos de Saúde , Humanos , Software , Bases de Dados Factuais

14.

Implicit Incentives Among Reddit Users to Prioritize Attention Over Privacy and Reveal Their Faces When Discussing Direct-to-Consumer Genetic Test Results: Topic and Attention Analysis.

Liu, Yongtai; Yin, Zhijun; Wan, Zhiyu; Yan, Chao; Xia, Weiyi; Ni, Congning; Clayton, Ellen Wright; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley A.

JMIR Infodemiology ; 2(2): e35702, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-37113452

RESUMO

Background: As direct-to-consumer genetic testing services have grown in popularity, the public has increasingly relied upon online forums to discuss and share their test results. Initially, users did so anonymously, but more recently, they have included face images when discussing their results. Various studies have shown that sharing images on social media tends to elicit more replies. However, users who do this forgo their privacy. When these images truthfully represent a user, they have the potential to disclose that user's identity. Objective: This study investigates the face image sharing behavior of direct-to-consumer genetic testing users in an online environment to determine if there exists an association between face image sharing and the attention received from other users. Methods: This study focused on r/23andme, a subreddit dedicated to discussing direct-to-consumer genetic testing results and their implications. We applied natural language processing to infer the themes associated with posts that included a face image. We applied a regression analysis to characterize the association between the attention that a post received, in terms of the number of comments, the karma score (defined as the number of upvotes minus the number of downvotes), and whether the post contained a face image. Results: We collected over 15,000 posts from the r/23andme subreddit, published between 2012 and 2020. Face image posting began in late 2019 and grew rapidly, with over 800 individuals revealing their faces by early 2020. The topics in posts including a face were primarily about sharing, discussing ancestry composition, or sharing family reunion photos with relatives discovered via direct-to-consumer genetic testing. On average, posts including a face image received 60% (5/8) more comments and had karma scores 2.4 times higher than other posts. Conclusions: Direct-to-consumer genetic testing consumers in the r/23andme subreddit are increasingly posting face images and testing reports on social platforms. The association between face image posting and a greater level of attention suggests that people are forgoing their privacy in exchange for attention from others. To mitigate this risk, platform organizers and moderators could inform users about the risk of posting face images in a direct, explicit manner to make it clear that their privacy may be compromised if personal images are shared.

15.

A Review of Incident Prediction, Resource Allocation, and Dispatch Models for Emergency Management.

Mukhopadhyay, Ayan; Pettet, Geoffrey; Vazirizade, Sayyed Mohsen; Lu, Di; Jaimes, Alejandro; Said, Said El; Baroud, Hiba; Vorobeychik, Yevgeniy; Kochenderfer, Mykel; Dubey, Abhishek.

Accid Anal Prev ; 165: 106501, 2022 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-34929574

RESUMO

In the last fifty years, researchers have developed statistical, data-driven, analytical, and algorithmic approaches for designing and improving emergency response management (ERM) systems. The problem has been noted as inherently difficult and constitutes spatio-temporal decision making under uncertainty, which has been addressed in the literature with varying assumptions and approaches. This survey provides a detailed review of these approaches, focusing on the key challenges and issues regarding four sub-processes: (a) incident prediction, (b) incident detection, (c) resource allocation, and (c) computer-aided dispatch for emergency response. We highlight the strengths and weaknesses of prior work in this domain and explore the similarities and differences between different modeling paradigms. We conclude by illustrating open challenges and opportunities for future research in this complex domain.

Assuntos

Acidentes de Trânsito , Alocação de Recursos , Humanos , Incerteza

16.

Noncooperatively optimized tolerance: decentralized strategic optimization in complex systems.

Vorobeychik, Yevgeniy; Mayo, Jackson R; Armstrong, Robert C; Ruthruff, Joseph R.

Phys Rev Lett ; 107(10): 108702, 2011 Sep 02.

Artigo em Inglês | MEDLINE | ID: mdl-21981540

RESUMO

We introduce noncooperatively optimized tolerance (NOT), a game theoretic generalization of highly optimized tolerance (HOT), which we illustrate in the forest fire framework. As the number of players increases, NOT retains features of HOT, such as robustness and self-dissimilar landscapes, but also develops features of self-organized criticality. The system retains considerable robustness even as it becomes fractured, due in part to emergent cooperation between players, and at the same time exhibits increasing resilience against changes in the environment, giving rise to intermediate regimes where the system is robust to a particular distribution of adverse events, yet not very fragile to changes.

17.

Influence and dynamic behavior in random boolean networks.

Seshadhri, C; Vorobeychik, Yevgeniy; Mayo, Jackson R; Armstrong, Robert C; Ruthruff, Joseph R.

Phys Rev Lett ; 107(10): 108701, 2011 Sep 02.

Artigo em Inglês | MEDLINE | ID: mdl-21981539

RESUMO

We present a rigorous mathematical framework for analyzing dynamics of a broad class of boolean network models. We use this framework to provide the first formal proof of many of the standard critical transition results in boolean network analysis, and offer analogous characterizations for novel classes of random boolean networks. We show that some of the assumptions traditionally made in the more common mean-field analysis of boolean networks do not hold in general. For example, we offer evidence that imbalance (internal inhomogeneity) of transfer functions is a crucial feature that tends to drive quiescent behavior far more strongly than previously observed.

18.

Re-identification of individuals in genomic datasets using public face images.

Venkatesaramani, Rajagopal; Malin, Bradley A; Vorobeychik, Yevgeniy.

Sci Adv ; 7(47): eabg3296, 2021 Nov 19.

Artigo em Inglês | MEDLINE | ID: mdl-34788101

RESUMO

Recent studies suggest that genomic data can be matched to images of human faces, raising the concern that genomic data can be re-identified with relative ease. However, such investigations assume access to well-curated images, which are rarely available in practice and challenging to derive from photos not generated in a controlled laboratory setting. In this study, we reconsider re-identification risk and find that, for most individuals, the actual risk posed by linkage attacks to typical face images is substantially smaller than claimed in prior investigations. Moreover, we show that only a small amount of well-calibrated noise, imperceptible to humans, can be added to images to markedly reduce such risk. The results of this investigation create an opportunity to create image filters that enable individuals to have better control over re-identification risk based on linkage.

19.

Matching Soulmates.

Leo, Greg; Lou, Jian; Van der Linden, Martin; Vorobeychik, Yevgeniy; Wooders, Myrna.

J Public Econ Theory ; 23(5): 822-857, 2021 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-34924745

RESUMO

We study iterated matching of soulmates [IMS], a recursive process of forming coalitions that are mutually preferred by members to any other coalition containing individuals as yet unmatched by this process. If all players can be matched this way, preferences are IMS-complete. A mechanism is a soulmate mechanism if it allows the formation of all soulmate coalitions. Our model follows Banerjee, Konishi and Sönmez (2001), except reported preferences are strategic variables. We investigate the incentive and stability properties of soulmate mechanisms. In contrast to prior literature, we do not impose conditions that ensure IMS-completeness. A fundamental result is that, (1) any group of players who could change their reported preferences and mutually benefit does not contain any players who were matched as soulmates and reported their preferences truthfully. As corollaries, (2) for any IMS-complete profile, soulmate mechanisms have a truthful strong Nash equilibrium, and (3) as long as all players matched as soulmates report their preferences truthfully, there is no incentive for any to deviate. Moreover, (4) soulmate coalitions are invariant core coalitions - that is, any soulmate coalition will be a coalition in every outcome in the core. To accompany our theoretical results, we present real-world data analysis and simulations that highlight the prevalence of situations in which many, but not all, players can be matched as soulmates. In an Appendix we relate IMS to other well-known coalition formation processes.

20.

De-identifying Socioeconomic Data at the Census Tract Level for Medical Research Through Constraint-based Clustering.

Liu, Yongtai; Conway, Douglas; Wan, Zhiyu; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Malin, Bradley A.

AMIA Annu Symp Proc ; 2021: 793-802, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-35309009

RESUMO

Numerous studies have shown that a person's health status is closely related to their socioeconomic status. It is evident that incorporating socioeconomic data associated with a patient's geographic area of residence into clinical datasets will promote medical research. However, most socioeconomic variables are unique in combination and are affiliated with small geographical regions (e.g., census tracts) that are often associated with less than 20,000 people. Thus, sharing such tract-level data can violate the Safe Harbor implementation of de-identification under the Health Insurance Portability and Accountability Act of 1996 (HIPAA). In this paper, we introduce a constraint-based k-means clustering approach to generate census tract-level socioeconomic data that is de-identification compliant. Our experimental analysis with data from the American Community Survey illustrates that the approach generates a protected dataset with high similarity to the unaltered values, and achieves a substantially better data utility than the HIPAA Safe Harbor recommendation of 3-digit ZIP code.

Assuntos

Pesquisa Biomédica , Setor Censitário , Análise por Conglomerados , Health Insurance Portability and Accountability Act , Humanos , Classe Social , Estados Unidos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA