Search | BVS - MINISTÉRIO DA SAÚDE

1.

Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics.

Venkatesaramani, Rajagopal; Wan, Zhiyu; Malin, Bradley A; Vorobeychik, Yevgeniy.

Genome Res ; 33(7): 1113-1123, 2023 07.

Article in English | MEDLINE | ID: mdl-37217251

ABSTRACT

The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web services called Beacons. However, even such limited releases are susceptible to likelihood ratio-based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the data set and those who are not. We further introduce highly scalable approaches for approximately solving the privacy-utility tradeoff problem when information is in the form of either summary statistics or presence/absence queries. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public data sets.

Subjects

Information Dissemination, Privacy, Humans, Information Dissemination/methods, Genomics, Gene Frequency, Alleles
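The likelihood ratio test mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic textbook-style statistic for released allele frequencies, not the authors' attack or defense; the function names, the 0/1 genotype encoding, the reference frequencies, and the zero threshold are all assumptions made for illustration.

```python
import math

def likelihood_ratio_score(genotype, pool_freqs, ref_freqs):
    """Log-likelihood ratio of 'target is in the pool' vs. 'target is from the reference population'.

    genotype   -- list of 0/1 flags (1 = target carries the alternate allele); assumed encoding
    pool_freqs -- released alternate-allele frequencies of the shared data set
    ref_freqs  -- frequencies in a reference population the attacker is assumed to know
    All frequencies must lie strictly between 0 and 1.
    """
    score = 0.0
    for x, p_pool, p_ref in zip(genotype, pool_freqs, ref_freqs):
        if x == 1:
            score += math.log(p_pool / p_ref)
        else:
            score += math.log((1.0 - p_pool) / (1.0 - p_ref))
    return score

def claims_membership(genotype, pool_freqs, ref_freqs, threshold=0.0):
    """Attacker claims membership when the score exceeds a (hypothetical) threshold."""
    return likelihood_ratio_score(genotype, pool_freqs, ref_freqs) > threshold

# Hypothetical frequencies for three variants.
pool = [0.6, 0.2, 0.7]
ref = [0.3, 0.4, 0.4]
print(claims_membership([1, 0, 1], pool, ref))  # genotype that resembles the pool
print(claims_membership([0, 1, 0], pool, ref))  # genotype that resembles the reference
```

Suppressing a variant or adding noise to its released frequency changes the corresponding term of the sum, which is the lever the defenses described in the abstract act on.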

2.

A game theoretic approach to balance privacy risks and familial benefits.

Guo, Jia; Clayton, Ellen Wright; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Wooders, Myrna; Wan, Zhiyu; Yin, Zhijun; Malin, Bradley A.

Sci Rep ; 13(1): 6932, 2023 04 28.

Article in English | MEDLINE | ID: mdl-37117219

ABSTRACT

As recreational genomics continues to grow in its popularity, many people are afforded the opportunity to share their genomes in exchange for various services, including third-party interpretation (TPI) tools, to understand their predisposition to health problems and, based on genome similarity, to find extended family members. At the same time, these services have increasingly been reused by law enforcement to track down potential criminals through family members who disclose their genomic information. While it has been observed that many potential users shy away from such data sharing when they learn that their privacy cannot be assured, it remains unclear how potential users' valuations of the service will affect a population's behavior. In this paper, we present a game theoretic framework to model interdependent privacy challenges in genomic data sharing online. Through simulations, we find that in addition to the boundary cases when (1) no player and (2) every player joins, there exist pure-strategy Nash equilibria when a relatively small portion of players choose to join the genomic database. The result is consistent under different parametric settings. We further examine the stability of Nash equilibria and illustrate that the only equilibrium that is resistant to a random dropping of players is when all players join the genomic database. Finally, we show that when players consider the impact that their data sharing may have on their relatives, the only pure strategy Nash equilibria are when either no player or every player shares their genomic data.

Subjects

Non-alcoholic Fatty Liver Disease, Privacy, Humans, Information Dissemination, Family, Genomics
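To make the notion of a pure-strategy Nash equilibrium in the abstract above concrete, here is a toy check for a binary join/stay-out game. The payoff function (a benefit per other joiner minus a fixed privacy cost) and its parameter values are hypothetical and much simpler than the paper's model; the sketch only reproduces the two boundary equilibria, not the intermediate ones reported by the authors.

```python
def utility(player, profile, benefit=1.0, cost=2.0):
    """Hypothetical payoff: joiners gain `benefit` per other joiner and pay a fixed `cost`."""
    if profile[player] == 0:
        return 0.0
    others_in = sum(profile) - 1
    return benefit * others_in - cost

def is_pure_nash(profile):
    """True if no single player can strictly gain by flipping their own join decision."""
    for player in range(len(profile)):
        deviation = list(profile)
        deviation[player] = 1 - profile[player]
        if utility(player, deviation) > utility(player, profile):
            return False
    return True

print(is_pure_nash([0, 0, 0, 0, 0]))  # nobody joins
print(is_pure_nash([1, 1, 1, 1, 1]))  # everybody joins
print(is_pure_nash([1, 0, 0, 0, 0]))  # a lone joiner would rather leave
```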

3.

Managing re-identification risks while providing access to the All of Us research program.

Xia, Weiyi; Basford, Melissa; Carroll, Robert; Clayton, Ellen Wright; Harris, Paul; Kantarcioglu, Murat; Liu, Yongtai; Nyemba, Steve; Vorobeychik, Yevgeniy; Wan, Zhiyu; Malin, Bradley A.

J Am Med Inform Assoc ; 30(5): 907-914, 2023 04 19.

Article in English | MEDLINE | ID: mdl-36809550

ABSTRACT

OBJECTIVE: The All of Us Research Program makes individual-level data available to researchers while protecting the participants' privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data was transformed to meet generally accepted re-identification risk levels. METHODS: At the time of the study, the resource consisted of 329 084 participants. Systematic amendments were applied to the data to mitigate re-identification risk (eg, generalization of geographic regions, suppression of public events, and randomization of dates). We computed the re-identification risk for each participant using a state-of-the-art adversarial model specifically assuming that it is known that someone is a participant in the program. We confirmed the expected risk is no greater than 0.09, a threshold that is consistent with guidelines from various US state and federal agencies. We further investigated how risk varied as a function of participant demographics. RESULTS: The results indicated that the 95th percentile of the re-identification risk of all the participants is below current thresholds. At the same time, we observed that risk levels were higher for certain racial, ethnic, and gender groups. CONCLUSIONS: While the re-identification risk was sufficiently low, this does not imply that the system is devoid of risk. Rather, All of Us uses a multipronged data protection strategy that includes strong authentication practices, active monitoring of data misuse, and penalization mechanisms for users who violate terms of service.

Subjects

Population Health, Humans, Male, Female, Privacy, Risk Management, Computer Security, Research Personnel

4.

Publisher Correction: Sociotechnical safeguards for genomic data privacy.

Wan, Zhiyu; Hazel, James W; Clayton, Ellen Wright; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley A.

Nat Rev Genet ; 23(7): 453, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35332247

5.

Sociotechnical safeguards for genomic data privacy.

Wan, Zhiyu; Hazel, James W; Clayton, Ellen Wright; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley A.

Nat Rev Genet ; 23(7): 429-445, 2022 07.

Article in English | MEDLINE | ID: mdl-35246669

ABSTRACT

Recent developments in a variety of sectors, including health care, research and the direct-to-consumer industry, have led to a dramatic increase in the amount of genomic data that are collected, used and shared. This state of affairs raises new and challenging concerns for personal privacy, both legally and technically. This Review appraises existing and emerging threats to genomic data privacy and discusses how well current legal frameworks and technical safeguards mitigate these concerns. It concludes with a discussion of remaining and emerging challenges and illustrates possible solutions that can balance protecting privacy and realizing the benefits that result from the sharing of genetic information.

Subjects

Genomics, Privacy, Genome

6.

A Review of Incident Prediction, Resource Allocation, and Dispatch Models for Emergency Management.

Mukhopadhyay, Ayan; Pettet, Geoffrey; Vazirizade, Sayyed Mohsen; Lu, Di; Jaimes, Alejandro; Said, Said El; Baroud, Hiba; Vorobeychik, Yevgeniy; Kochenderfer, Mykel; Dubey, Abhishek.

Accid Anal Prev ; 165: 106501, 2022 Feb.

Article in English | MEDLINE | ID: mdl-34929574

ABSTRACT

In the last fifty years, researchers have developed statistical, data-driven, analytical, and algorithmic approaches for designing and improving emergency response management (ERM) systems. The problem has been noted as inherently difficult and constitutes spatio-temporal decision making under uncertainty, which has been addressed in the literature with varying assumptions and approaches. This survey provides a detailed review of these approaches, focusing on the key challenges and issues regarding four sub-processes: (a) incident prediction, (b) incident detection, (c) resource allocation, and (d) computer-aided dispatch for emergency response. We highlight the strengths and weaknesses of prior work in this domain and explore the similarities and differences between different modeling paradigms. We conclude by illustrating open challenges and opportunities for future research in this complex domain.

Subjects

Traffic Accidents, Resource Allocation, Humans, Uncertainty

7.

Implicit Incentives Among Reddit Users to Prioritize Attention Over Privacy and Reveal Their Faces When Discussing Direct-to-Consumer Genetic Test Results: Topic and Attention Analysis.

Liu, Yongtai; Yin, Zhijun; Wan, Zhiyu; Yan, Chao; Xia, Weiyi; Ni, Congning; Clayton, Ellen Wright; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley A.

JMIR Infodemiology ; 2(2): e35702, 2022.

Article in English | MEDLINE | ID: mdl-37113452

ABSTRACT

Background: As direct-to-consumer genetic testing services have grown in popularity, the public has increasingly relied upon online forums to discuss and share their test results. Initially, users did so anonymously, but more recently, they have included face images when discussing their results. Various studies have shown that sharing images on social media tends to elicit more replies. However, users who do this forgo their privacy. When these images truthfully represent a user, they have the potential to disclose that user's identity. Objective: This study investigates the face image sharing behavior of direct-to-consumer genetic testing users in an online environment to determine if there exists an association between face image sharing and the attention received from other users. Methods: This study focused on r/23andme, a subreddit dedicated to discussing direct-to-consumer genetic testing results and their implications. We applied natural language processing to infer the themes associated with posts that included a face image. We applied a regression analysis to characterize the association between the attention that a post received, in terms of the number of comments, the karma score (defined as the number of upvotes minus the number of downvotes), and whether the post contained a face image. Results: We collected over 15,000 posts from the r/23andme subreddit, published between 2012 and 2020. Face image posting began in late 2019 and grew rapidly, with over 800 individuals revealing their faces by early 2020. The topics in posts including a face were primarily about sharing, discussing ancestry composition, or sharing family reunion photos with relatives discovered via direct-to-consumer genetic testing. On average, posts including a face image received 60% (5/8) more comments and had karma scores 2.4 times higher than other posts. 
Conclusions: Direct-to-consumer genetic testing consumers in the r/23andme subreddit are increasingly posting face images and testing reports on social platforms. The association between face image posting and a greater level of attention suggests that people are forgoing their privacy in exchange for attention from others. To mitigate this risk, platform organizers and moderators could inform users about the risk of posting face images in a direct, explicit manner to make it clear that their privacy may be compromised if personal images are shared.

8.

A Representativeness-informed Model for Research Record Selection from Electronic Medical Record Systems.

Borza, Victor A; Clayton, Ellen Wright; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Malin, Bradley A.

AMIA Annu Symp Proc ; 2022: 259-268, 2022.

Article in English | MEDLINE | ID: mdl-37128377

ABSTRACT

Scientific and clinical studies have a long history of bias in recruitment of underprivileged and minority populations. This underrepresentation leads to inaccurate, inapplicable, and non-generalizable results. Electronic medical record (EMR) systems, which now drive much research, often poorly represent these groups. We introduce a method for quantifying representativeness using information theoretic measures and an algorithmic approach to select a more representative record cohort than random selection when resource limitations preclude researchers from reviewing every record in the database. We apply this method to select cohorts of 2,000-20,000 records from a large (2M+ records) EMR database at the Vanderbilt University Medical Center and assess representativeness based on age, ethnicity, race, and gender. Compared to random selection - which will on average mirror the EMR database demographics - we find that a representativeness-informed approach can compose a cohort of records that is approximately 5.8 times more representative.

Subjects

Data Management, Electronic Health Records, Humans, Software, Factual Databases
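The abstract above does not say which information theoretic measure is used to quantify representativeness. As one plausible illustration only, the sketch below scores a cohort by the Kullback-Leibler divergence between a target demographic distribution and the cohort's distribution (lower means closer to the target). The choice of KL divergence, its direction, the uniform target, and the example proportions are all assumptions.

```python
import math

def kl_divergence(target, cohort):
    """KL(target || cohort) over matching demographic categories.

    Both arguments are lists of proportions summing to 1; every cohort
    proportion must be positive wherever the target proportion is positive.
    """
    return sum(t * math.log(t / c) for t, c in zip(target, cohort) if t > 0)

# Hypothetical four-category example with an equal-share target.
target = [0.25, 0.25, 0.25, 0.25]
skewed_cohort = [0.70, 0.10, 0.10, 0.10]
balanced_cohort = [0.30, 0.30, 0.20, 0.20]

print(kl_divergence(target, skewed_cohort))
print(kl_divergence(target, balanced_cohort))
```

A selection procedure could then prefer the cohort with the smaller divergence; how the paper's algorithm actually searches for such a cohort is not described in the abstract.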

9.

Matching Soulmates.

Leo, Greg; Lou, Jian; Van der Linden, Martin; Vorobeychik, Yevgeniy; Wooders, Myrna.

J Public Econ Theory ; 23(5): 822-857, 2021 Oct.

Article in English | MEDLINE | ID: mdl-34924745

ABSTRACT

We study iterated matching of soulmates [IMS], a recursive process of forming coalitions that are mutually preferred by members to any other coalition containing individuals as yet unmatched by this process. If all players can be matched this way, preferences are IMS-complete. A mechanism is a soulmate mechanism if it allows the formation of all soulmate coalitions. Our model follows Banerjee, Konishi and Sönmez (2001), except reported preferences are strategic variables. We investigate the incentive and stability properties of soulmate mechanisms. In contrast to prior literature, we do not impose conditions that ensure IMS-completeness. A fundamental result is that, (1) any group of players who could change their reported preferences and mutually benefit does not contain any players who were matched as soulmates and reported their preferences truthfully. As corollaries, (2) for any IMS-complete profile, soulmate mechanisms have a truthful strong Nash equilibrium, and (3) as long as all players matched as soulmates report their preferences truthfully, there is no incentive for any to deviate. Moreover, (4) soulmate coalitions are invariant core coalitions - that is, any soulmate coalition will be a coalition in every outcome in the core. To accompany our theoretical results, we present real-world data analysis and simulations that highlight the prevalence of situations in which many, but not all, players can be matched as soulmates. In an Appendix we relate IMS to other well-known coalition formation processes.
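The recursive process described in the abstract above can be sketched for small examples as follows. The representation is an assumption: each player supplies a strict ranking of coalitions (as frozensets that contain that player, including the singleton), and a coalition counts as a soulmate coalition when it is every member's favorite among coalitions made up of still-unmatched players.

```python
def favorite(player, prefs, remaining):
    """Most-preferred coalition of `player` that uses only still-unmatched players."""
    for coalition in prefs[player]:
        if coalition <= remaining:
            return coalition
    return None

def iterated_matching_of_soulmates(prefs):
    """Repeatedly match coalitions that are every member's favorite feasible coalition."""
    remaining = set(prefs)
    matched = []
    progress = True
    while progress and remaining:
        progress = False
        for player in sorted(remaining):
            top = favorite(player, prefs, remaining)
            if top is not None and all(favorite(m, prefs, remaining) == top for m in top):
                matched.append(top)
                remaining -= top
                progress = True
                break
    return matched, remaining

# Hypothetical preferences: a and b are soulmates; c and d become soulmates once a is gone.
prefs = {
    "a": [frozenset("ab"), frozenset("ac"), frozenset("a")],
    "b": [frozenset("ab"), frozenset("b")],
    "c": [frozenset("ac"), frozenset("cd"), frozenset("c")],
    "d": [frozenset("cd"), frozenset("d")],
}
matched, unmatched = iterated_matching_of_soulmates(prefs)
print(matched, unmatched)
```

When `unmatched` comes back empty, the profile is IMS-complete in the sense used in the abstract.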

10.

Using game theory to thwart multistage privacy intrusions when sharing data.

Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi; Liu, Yongtai; Wooders, Myrna; Guo, Jia; Yin, Zhijun; Clayton, Ellen Wright; Kantarcioglu, Murat; Malin, Bradley A.

Sci Adv ; 7(50): eabe9986, 2021 Dec 10.

Article in English | MEDLINE | ID: mdl-34890225

ABSTRACT

Person-specific biomedical data are now widely collected, but their sharing raises privacy concerns, specifically about the re-identification of seemingly anonymous records. Formal re-identification risk assessment frameworks can inform decisions about whether and how to share data; current techniques, however, focus on scenarios where the data recipients use only one resource for re-identification purposes. This is a concern because recent attacks show that adversaries can access multiple resources, combining them in a stage-wise manner, to enhance the chance of an attack's success. In this work, we represent a re-identification game using a two-player Stackelberg game of perfect information, which can be applied to assess risk, and suggest an optimal data sharing strategy based on a privacy-utility tradeoff. We report on experiments with large-scale genomic datasets to show that, using game theoretic models accounting for adversarial capabilities to launch multistage attacks, most data can be effectively shared with low re-identification risk.
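A two-player Stackelberg game of perfect information, as named in the abstract above, is solved by backward induction: the leader (data sharer) anticipates the follower's (attacker's) best response to each option. The sketch below is a single-stage toy with hypothetical strategies, payoffs, and costs; it does not model the multistage attacks studied in the paper.

```python
def attacker_best_response(reid_prob, gain=20.0, cost=4.0):
    """Follower attacks only if the expected gain exceeds the (hypothetical) attack cost."""
    return reid_prob * gain - cost > 0

def sharer_payoff(benefit, reid_prob, loss=30.0):
    """Leader's payoff: sharing benefit minus expected loss if the attacker chooses to attack."""
    attacked = attacker_best_response(reid_prob)
    return benefit - (reid_prob * loss if attacked else 0.0)

def best_sharing_strategy(strategies):
    """Backward induction: pick the option with the best payoff given the attacker's response."""
    return max(strategies, key=lambda name: sharer_payoff(*strategies[name]))

# Hypothetical options: name -> (sharing benefit, re-identification probability).
strategies = {
    "share_all": (10.0, 0.5),
    "generalize": (8.0, 0.1),
    "share_nothing": (0.0, 0.0),
}
print(best_sharing_strategy(strategies))
```

With these made-up numbers, sharing everything invites an attack, while the generalized release makes attacking unprofitable and keeps most of the benefit.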

11.

Re-identification of individuals in genomic datasets using public face images.

Venkatesaramani, Rajagopal; Malin, Bradley A; Vorobeychik, Yevgeniy.

Sci Adv ; 7(47): eabg3296, 2021 Nov 19.

Article in English | MEDLINE | ID: mdl-34788101

ABSTRACT

Recent studies suggest that genomic data can be matched to images of human faces, raising the concern that genomic data can be re-identified with relative ease. However, such investigations assume access to well-curated images, which are rarely available in practice and challenging to derive from photos not generated in a controlled laboratory setting. In this study, we reconsider re-identification risk and find that, for most individuals, the actual risk posed by linkage attacks to typical face images is substantially smaller than claimed in prior investigations. Moreover, we show that only a small amount of well-calibrated noise, imperceptible to humans, can be added to images to markedly reduce such risk. The results of this investigation create an opportunity to design image filters that enable individuals to have better control over re-identification risk based on linkage.

12.

Enabling realistic health data re-identification risk assessment through adversarial modeling.

Xia, Weiyi; Liu, Yongtai; Wan, Zhiyu; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Nyemba, Steve; Clayton, Ellen Wright; Malin, Bradley A.

J Am Med Inform Assoc ; 28(4): 744-752, 2021 03 18.

Article in English | MEDLINE | ID: mdl-33448306

ABSTRACT

OBJECTIVE: Re-identification risk methods for biomedical data often assume a worst case, in which attackers know all identifiable features (eg, age and race) about a subject. Yet, worst-case adversarial modeling can overestimate risk and induce heavy editing of shared data. The objective of this study is to introduce a framework for assessing the risk considering the attacker's resources and capabilities. MATERIALS AND METHODS: We integrate 3 established risk measures (ie, prosecutor, journalist, and marketer risks) and compute re-identification probabilities for data subjects. This probability is dependent on an attacker's capabilities (eg, ability to obtain external identified resources) and the subject's decision on whether to reveal their participation in a dataset. We illustrate the framework through case studies using data from over 1 000 000 patients from Vanderbilt University Medical Center and show how re-identification risk changes when attackers are pragmatic and use 2 known resources for attack: (1) voter registration lists and (2) social media posts. RESULTS: Our framework illustrates that the risk is substantially smaller in the pragmatic scenarios than in the worst case. Our experiments yield a median worst-case risk of 0.987 (where 0 is least risky and 1 is most risky); however, the median reduction in risk was 90.1% in the voter registration scenario and 100% in the social media posts scenario. Notably, these observations hold true for a wide range of adversarial capabilities. CONCLUSIONS: This research illustrates that re-identification risk is situationally dependent and that appropriate adversarial modeling may permit biomedical data sharing on a wider scale than is currently the case.

Subjects

Computer Security, Confidentiality, Data Anonymization, Probability, Humans, Risk, Risk Assessment
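The three risk measures named in the abstract above have commonly used textbook definitions based on equivalence classes, that is, groups of records sharing the same quasi-identifier values. The sketch below uses those common definitions; how the paper integrates them with attacker capabilities is not shown here, and the records and population counts are hypothetical.

```python
from collections import Counter

def prosecutor_risks(records):
    """Per-record risk 1/f, where f is the record's equivalence-class size in the shared data.

    Assumes the attacker knows the target is in the data set.
    """
    sizes = Counter(records)
    return [1.0 / sizes[r] for r in records]

def journalist_risks(records, population_counts):
    """Per-record risk 1/F, where F is the class size in the wider population.

    Assumes the attacker does not know whether the target is in the data set.
    """
    return [1.0 / population_counts[r] for r in records]

def marketer_risk(records):
    """Expected share of records re-identified when an attacker links all of them at once.

    Simplifying assumption: the attacker's identified list covers exactly these individuals.
    """
    return len(set(records)) / len(records)

# Hypothetical quasi-identifiers: (age band, race code).
records = [("30-39", "A"), ("30-39", "A"), ("40-49", "B"), ("40-49", "A")]
population_counts = {("30-39", "A"): 10, ("40-49", "B"): 4, ("40-49", "A"): 5}

print(prosecutor_risks(records))
print(journalist_risks(records, population_counts))
print(marketer_risk(records))
```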

13.

De-identifying Socioeconomic Data at the Census Tract Level for Medical Research Through Constraint-based Clustering.

Liu, Yongtai; Conway, Douglas; Wan, Zhiyu; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Malin, Bradley A.

AMIA Annu Symp Proc ; 2021: 793-802, 2021.

Article in English | MEDLINE | ID: mdl-35309009

ABSTRACT

Numerous studies have shown that a person's health status is closely related to their socioeconomic status. It is evident that incorporating socioeconomic data associated with a patient's geographic area of residence into clinical datasets will promote medical research. However, most socioeconomic variables are unique in combination and are affiliated with small geographical regions (e.g., census tracts) that are often associated with fewer than 20,000 people. Thus, sharing such tract-level data can violate the Safe Harbor implementation of de-identification under the Health Insurance Portability and Accountability Act of 1996 (HIPAA). In this paper, we introduce a constraint-based k-means clustering approach to generate census tract-level socioeconomic data that is de-identification compliant. Our experimental analysis with data from the American Community Survey illustrates that the approach generates a protected dataset with high similarity to the unaltered values, and achieves a substantially better data utility than the HIPAA Safe Harbor recommendation of 3-digit ZIP code.

Subjects

Biomedical Research, Census Tract, Cluster Analysis, Health Insurance Portability and Accountability Act, Humans, Social Class, United States

14.

Anatomical Context Protects Deep Learning from Adversarial Perturbations in Medical Imaging.

Li, Yi; Zhang, Huahong; Bermudez, Camilo; Chen, Yifan; Landman, Bennett A; Vorobeychik, Yevgeniy.

Neurocomputing (Amst) ; 379: 370-378, 2020 Feb 28.

Article in English | MEDLINE | ID: mdl-32863583

ABSTRACT

Deep learning has achieved impressive performance across a variety of tasks, including medical image processing. However, recent research has shown that deep neural networks are susceptible to small adversarial perturbations in the image. We study the impact of such adversarial perturbations in medical image processing where the goal is to predict an individual's age based on a 3D MRI brain image. We consider two models: a conventional deep neural network, and a hybrid deep learning model which additionally uses features informed by anatomical context. We find that we can introduce significant errors in predicted age by adding imperceptible noise to an image, can accomplish this even for large batches of images using a single perturbation, and that the hybrid model is much more robust to adversarial perturbations than the conventional deep neural network. Our work highlights limitations of current deep learning techniques in clinical applications, and suggests a path forward.
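How a small, structured perturbation can shift a predicted age, as described in the abstract above, can be seen on a toy linear predictor. This is a sign-of-gradient (FGSM-style) sketch on a three-pixel "image" with made-up weights; it is not the deep network, the hybrid model, or the attack procedure used in the paper.

```python
def predict_age(weights, bias, image):
    """Toy linear age predictor: weighted sum of pixel intensities plus a bias."""
    return sum(w * x for w, x in zip(weights, image)) + bias

def sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)

def perturb_to_raise_age(weights, image, epsilon):
    """Move every pixel by at most `epsilon` in the direction that raises the prediction.

    For a linear model the gradient of the output with respect to each pixel
    is simply that pixel's weight, so the step is epsilon * sign(weight).
    """
    return [x + epsilon * sign(w) for w, x in zip(weights, image)]

# Hypothetical model and input.
weights, bias = [2.0, -1.0, 0.5], 30.0
image = [0.2, 0.4, 0.6]
adversarial = perturb_to_raise_age(weights, image, epsilon=0.01)

shift = predict_age(weights, bias, adversarial) - predict_age(weights, bias, image)
print(shift)  # equals epsilon times the sum of absolute weights, up to rounding
```

In this linear toy the perturbation does not depend on the input image, which gives some intuition for why a single perturbation can affect a whole batch; for deep networks that property has to be established empirically, as the abstract reports.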

15.

How to Hide One's Relationships from Link Prediction Algorithms.

Waniek, Marcin; Zhou, Kai; Vorobeychik, Yevgeniy; Moro, Esteban; Michalak, Tomasz P; Rahwan, Talal.

Sci Rep ; 9(1): 12208, 2019 08 21.

Article in English | MEDLINE | ID: mdl-31434975

ABSTRACT

Our private connections can be exposed by link prediction algorithms. To date, this threat has only been addressed from the perspective of a central authority, completely neglecting the possibility that members of the social network can themselves mitigate such threats. We fill this gap by studying how an individual can rewire her own network neighborhood to hide her sensitive relationships. We prove that the optimization problem faced by such an individual is NP-complete, meaning that any attempt to identify an optimal way to hide one's relationships is futile. Based on this, we shift our attention towards developing effective, albeit not optimal, heuristics that are readily-applicable by users of existing social media platforms to conceal any connections they deem sensitive. Our empirical evaluation reveals that it is more beneficial to focus on "unfriending" carefully-chosen individuals rather than befriending new ones. In fact, by avoiding communication with just 5 individuals, it is possible for one to hide some of her relationships in a massive, real-life telecommunication network, consisting of 829,725 phone calls between 248,763 individuals. Our analysis also shows that link prediction algorithms are more susceptible to manipulation in smaller and denser networks. Evaluating the error vs. attack tolerance of link prediction algorithms reveals that rewiring connections randomly may end up exposing one's sensitive relationships, highlighting the importance of the strategic aspect. In an age where personal relationships continue to leave digital traces, our results empower the general public to proactively protect their private relationships.

Subjects

Algorithms, Interpersonal Relations, Theoretical Models, Social Media, Female, Humans, Male
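The effect of "unfriending" on a link predictor, discussed in the abstract above, can be illustrated with the common-neighbors score, one of the simplest similarity indices. The choice of this particular index and the tiny example network are assumptions; the abstract does not state which predictors or heuristics the authors evaluate.

```python
def common_neighbors(graph, u, v):
    """Link-prediction score: number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

def remove_edge(graph, u, v):
    """Return a copy of the undirected graph with the edge u-v removed."""
    updated = {node: set(neighbors) for node, neighbors in graph.items()}
    updated[u].discard(v)
    updated[v].discard(u)
    return updated

# Hypothetical observed network; alice wants to hide her tie to bob,
# which does not appear here and is what the predictor tries to infer.
graph = {
    "alice": {"carol", "dave"},
    "bob": {"carol", "dave"},
    "carol": {"alice", "bob"},
    "dave": {"alice", "bob"},
}
before = common_neighbors(graph, "alice", "bob")
after = common_neighbors(remove_edge(graph, "alice", "carol"), "alice", "bob")
print(before, after)
```

Dropping one carefully chosen connection lowers the score of the hidden tie, which is the kind of local rewiring the heuristics in the abstract rely on.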

16.

A method for analyzing inpatient care variability through physicians' orders.

Lenert, Matthew C; Miller, Randolph A; Vorobeychik, Yevgeniy; Walsh, Colin G.

J Biomed Inform ; 91: 103111, 2019 03.

Article in English | MEDLINE | ID: mdl-30710635

ABSTRACT

OBJECTIVE: Administrators assess care variability through chart review or cost variability to inform care standardization efforts. Chart review is costly and cost variability is imprecise. This study explores the potential of physician orders as an alternative measure of care variability. MATERIALS & METHODS: The authors constructed an order variability metric from adult Vanderbilt University Hospital patients treated between 2013 and 2016. The study compared how well a cost variability model predicts variability in the length of stay compared to an order variability model. Both models adjusted for covariates such as severity of illness, comorbidities, and hospital transfers. RESULTS: The order variability model significantly minimized the Akaike information criterion (superior outcome) compared to the cost variability model. This result also held when excluding patients who received intensive care. CONCLUSION: Order variability can potentially typify care variability better than cost variability. Order variability is a scalable metric, calculable during the course of care.

Subjects

Hospitalization, Inpatients, Physicians, Physicians' Practice Patterns, Adult, Female, Health Care Costs, Humans, Length of Stay, Male, Hospital Medical Staff, Middle Aged, Quality of Health Care, Retrospective Studies
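The model comparison in the abstract above rests on the Akaike information criterion, AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L the maximized likelihood; the model with the lower AIC is preferred. The sketch below applies the formula to two made-up fits, since the abstract reports no parameter counts or likelihood values.

```python
def aic(num_params, log_likelihood):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical fits of length-of-stay variability with the same number of parameters.
order_model_aic = aic(num_params=6, log_likelihood=-1200.0)
cost_model_aic = aic(num_params=6, log_likelihood=-1250.0)

preferred = "order variability" if order_model_aic < cost_model_aic else "cost variability"
print(order_model_aic, cost_model_aic, preferred)
```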

17.

Biomedical Research Cohort Membership Disclosure on Social Media.

Liu, Yongtai; Yan, Chao; Yin, Zhijun; Wan, Zhiyu; Xia, Weiyi; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Clayton, Ellen Wright; Malin, Bradley A.

AMIA Annu Symp Proc ; 2019: 607-616, 2019.

Article in English | MEDLINE | ID: mdl-32308855

ABSTRACT

To accelerate medical knowledge discovery, an increasing number of research programs are gathering and sharing data on a large number of participants. Due to the privacy concerns and legal restrictions on data sharing, these programs apply various strategies to mitigate privacy risk. However, the activities of participants and research program sponsors, particularly on social media, might reveal an individual's membership in a study, making it easier to recognize participants' records and uncover the information they have yet to disclose. This behavior can jeopardize the privacy of the participants themselves, the reputation of the projects, sponsors, and the research enterprise. To investigate the dangers of self-disclosure behavior, we gathered and analyzed 4,020 tweets, and uncovered over 100 tweets disclosing the individuals' memberships in over 15 programs. Our investigation showed that self-disclosure on social media can reveal participants' membership in research cohorts, and such activity might lead to the leakage of a person's identity, genomic, and other sensitive health information.

Subjects

Biomedical Research, Disclosure, Information Dissemination, Self Disclosure, Social Media, Clinical Trials as Topic, Female, Humans, Male, Privacy

18.

A Crowdsourcing Framework for Medical Data Sets.

Ye, Cheng; Coco, Joseph; Epishova, Anna; Hajaj, Chen; Bogardus, Henry; Novak, Laurie; Denny, Joshua; Vorobeychik, Yevgeniy; Lasko, Thomas; Malin, Bradley; Fabbri, Daniel.

AMIA Jt Summits Transl Sci Proc ; 2017: 273-280, 2018.

Article in English | MEDLINE | ID: mdl-29888085

ABSTRACT

Crowdsourcing services like Amazon Mechanical Turk allow researchers to ask questions to crowds of workers and quickly receive high quality labeled responses. However, crowds drawn from the general public are not suitable for labeling sensitive and complex data sets, such as medical records, due to various concerns. Major challenges in building and deploying a crowdsourcing system for medical data include, but are not limited to: managing access rights to sensitive data and ensuring data privacy controls are enforced; identifying workers with the necessary expertise to analyze complex information; and efficiently retrieving relevant information in massive data sets. In this paper, we introduce a crowdsourcing framework to support the annotation of medical data sets. We further demonstrate a workflow for crowdsourcing clinical chart reviews including (1) the design and decomposition of research questions; (2) the architecture for storing and displaying sensitive data; and (3) the development of tools to support crowd workers in quickly analyzing information from complex data sets.

19.

Integrating linear optimization with structural modeling to increase HIV neutralization breadth.

Sevy, Alexander M; Panda, Swetasudha; Crowe, James E; Meiler, Jens; Vorobeychik, Yevgeniy.

PLoS Comput Biol ; 14(2): e1005999, 2018 02.

Article in English | MEDLINE | ID: mdl-29451898

ABSTRACT

Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods only sample a fraction of available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the Rosetta software suite with machine learning and integer linear programming to overcome limitations in the Rosetta sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking the performance on increasing predicted breadth of anti-HIV antibodies. We use this novel method to increase predicted breadth of naturally-occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in Rosetta and show that we can outperform the existing method significantly. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. Although our modeled antibodies were not tested in vitro, we predict that these variants would have greatly increased breadth compared to the wild-type antibody.

Subjects

Neutralizing Antibodies/immunology, Computational Biology, Epitopes/immunology, HIV Antibodies/immunology, HIV Infections/immunology, Algorithms, Amino Acid Motifs, HIV-1, Humans, Linear Models, Machine Learning, Regression Analysis, Software, Support Vector Machine

20.

Detecting the Presence of an Individual in Phenotypic Summary Data.

Liu, Yongtai; Wan, Zhiyu; Xia, Weiyi; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Clayton, Ellen Wright; Kho, Abel; Carrell, David; Malin, Bradley A.

AMIA Annu Symp Proc ; 2018: 760-769, 2018.

Article in English | MEDLINE | ID: mdl-30815118

ABSTRACT

As the quantity and detail of association studies between clinical phenotypes and genotypes grows, there is a push to make summary statistics widely available. Genome wide summary statistics have been shown to be vulnerable to the inference of a targeted individual's presence. In this paper, we show that presence attacks are feasible with phenome wide summary statistics as well. We use data from three healthcare organizations and an online resource that publishes summary statistics. We introduce a novel attack that achieves over 80% recall and precision within a population of 16,346, where 8,173 individuals are targets. However, the feasibility of the attack is dependent on the attacker's knowledge about 1) the targeted individual and 2) the reference dataset. Within a population of over 2 million, where 8,173 individuals are targets, our attack achieves 31% recall and 17% precision. As a result, it is plausible that sharing of phenomic summary statistics may be accomplished with an acceptable level of privacy risk.

Subjects

Computer Security, Personally Identifiable Information, Phenotype, Genome-Wide Association Study, Genotype, Humans
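To read the recall and precision figures in the abstract above as counts, recall is the share of true targets that the attack flags, and precision is the share of flagged individuals who are true targets. The sketch below back-calculates approximate counts for the 2-million-person setting from the reported 31% recall and 17% precision; the rounded counts are derived here for illustration and are not reported in the abstract.

```python
def precision(true_pos, false_pos):
    """Share of flagged individuals who really are targets."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    """Share of targets that the attack manages to flag."""
    return true_pos / (true_pos + false_neg)

targets = 8173                         # from the abstract
true_pos = round(0.31 * targets)       # implied by 31% recall
flagged = round(true_pos / 0.17)       # implied by 17% precision
false_pos = flagged - true_pos
false_neg = targets - true_pos

print(true_pos, flagged, false_pos, false_neg)
print(precision(true_pos, false_pos), recall(true_pos, false_neg))
```

Under these figures, roughly 2,500 of the targets are found among roughly 15,000 flagged individuals, so most of the attacker's membership claims would be wrong.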
