Results 1 - 20 of 63
1.
Sci Rep ; 13(1): 6932, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37117219

ABSTRACT

As recreational genomics continues to grow in popularity, many people have the opportunity to share their genomes in exchange for various services, including third-party interpretation (TPI) tools, to understand their predisposition to health problems and, based on genome similarity, to find extended family members. At the same time, law enforcement has increasingly reused these services to track down potential criminals through family members who disclose their genomic information. While many potential users have been observed to shy away from such data sharing when they learn that their privacy cannot be assured, it remains unclear how potential users' valuations of the service will affect a population's behavior. In this paper, we present a game theoretic framework to model interdependent privacy challenges in online genomic data sharing. Through simulations, we find that in addition to the boundary cases in which (1) no player joins and (2) every player joins, there exist pure-strategy Nash equilibria in which a relatively small portion of players choose to join the genomic database. This result holds under different parameter settings. We further examine the stability of the Nash equilibria and show that the only equilibrium resistant to a random dropping of players is the one in which all players join the genomic database. Finally, we show that when players consider the impact that their data sharing may have on their relatives, the only pure-strategy Nash equilibria are those in which either no player or every player shares their genomic data.
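The equilibrium idea in this abstract can be made concrete with a small enumeration. This sketch is not the authors' model: it assumes a symmetric game in which every player has the same hypothetical payoff functions `u_join(k)` and `u_out(k)`, so a strategy profile is summarized by the number of joiners k. A value of k is a pure-strategy Nash equilibrium when no joiner gains by leaving and no abstainer gains by joining.

```python
from typing import Callable, List


def pure_nash_equilibria(
    n: int,
    u_join: Callable[[int], float],
    u_out: Callable[[int], float],
) -> List[int]:
    """Return every joiner count k that is a pure-strategy Nash equilibrium.

    Assumes a symmetric game: u_join(k) is the payoff to a player who joins
    when k players (itself included) are in the database, and u_out(k) is the
    payoff to a player who abstains when k others are in the database.
    """
    equilibria = []
    for k in range(n + 1):
        # No joiner gains by leaving: k joiners would become k - 1.
        joiners_stable = k == 0 or u_join(k) >= u_out(k - 1)
        # No abstainer gains by joining: k joiners would become k + 1.
        abstainers_stable = k == n or u_out(k) >= u_join(k + 1)
        if joiners_stable and abstainers_stable:
            equilibria.append(k)
    return equilibria


# Hypothetical payoffs for N = 10 players.
N = 10


def u_join(k: int) -> float:
    # Relative-matching value grows with database size; fixed privacy cost of 1.
    return 3.0 * (k - 1) / (N - 1) - 1.0


def u_out(k: int) -> float:
    # Abstainers are still partially exposed through relatives who joined.
    return -0.05 * k


print(pure_nash_equilibria(N, u_join, u_out))  # [0, 10]
```

With these made-up payoffs only the two boundary cases survive; interior equilibria like those the abstract reports would require differently shaped payoffs.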


Subjects
Non-alcoholic Fatty Liver Disease, Privacy, Humans, Information Dissemination, Family, Genomics
2.
Turk J Gastroenterol ; 34(2): 161-169, 2023 02.
Article in English | MEDLINE | ID: mdl-36262101

ABSTRACT

BACKGROUND: Regular coffee consumption has beneficial and preventive effects on liver and chronic neurodegenerative diseases. However, studies of the ingredients found in coffee beverages have not clarified the responsible mechanisms. Exosomes are small, membrane-coated cargo packages secreted by prokaryotic and eukaryotic cells. Exosomes regulate intercellular communication and affect cellular metabolic activities, even across different species. In this study, we aimed to isolate and characterize edible plant-derived exosome-like nanoparticles from roasted hot coffee beverages, hypothesizing that these nanoparticles are responsible for the beneficial effects of coffee. METHODS: Size exclusion chromatography and commercial kits were used for isolation. Efficient coffee edible plant-derived exosome-like nanoparticle fractions were determined by ultraviolet-visible spectrophotometry. Harvested coffee edible plant-derived exosome-like nanoparticles were characterized by transmission electron microscopy. Quantification was performed using a commercial kit. The proliferative effects of coffee edible plant-derived exosome-like nanoparticles on human hepatic stellate cells and human hepatocellular carcinoma cells were studied using an MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) assay. Whole-exosome RNA sequencing was performed. RESULTS: Transmission electron microscopy analysis showed round-shaped nanoparticles with sizes ranging from 40 to 100 nm. Both the size exclusion chromatography-isolated and the kit-isolated edible plant-derived exosome-like nanoparticle samples showed maximum absorbance at 227.5 nm in ultraviolet-visible spectrophotometry. In terms of quantitation, kit isolation was more efficient than size exclusion chromatography when harvested particle numbers were compared.
An important MTT assay finding confirmed the observed beneficial effects of coffee beverages: coffee edible plant-derived exosome-like nanoparticles significantly suppressed hepatocellular carcinoma cell proliferation. Sequencing identified 15 mature miRNAs. A MapReduce-based microRNA target prediction method (the MR-microT algorithm of DIANA tools) highlighted 2 genes specifically associated with the miRNAs we obtained: KMT2C and ZNF773. CONCLUSION: Coffee edible plant-derived exosome-like nanoparticles were identified here for the first time in the literature. These nanoparticles may have therapeutic effects on chronic liver diseases; therefore, experimental studies in disease models should be performed to demonstrate their efficacy.


Subjects
Carcinoma, Hepatocellular, Exosomes, Liver Neoplasms, MicroRNAs, Nanoparticles, Humans, Coffee/metabolism, Exosomes/chemistry, Exosomes/genetics, Exosomes/metabolism, MicroRNAs/metabolism
3.
Article in English | MEDLINE | ID: mdl-35865106

ABSTRACT

Blockchain is an emerging technology that has enabled many applications, from cryptocurrencies to digital asset management and supply chains. Due to this surge of popularity, analyzing the data stored on blockchains poses a new critical challenge in data science. To assist data scientists in various analytic tasks for a blockchain, in this tutorial, we provide a systematic and comprehensive overview of the fundamental elements of blockchain network models. We discuss how we can abstract blockchain data as various types of networks and further use such associated network abstractions to reap important insights on blockchains' structure, organization, and functionality. This article is categorized under: Technologies > Data Preprocessing; Application Areas > Business and Industry; Fundamental Concepts of Data and Knowledge > Data Concepts; Fundamental Concepts of Data and Knowledge > Knowledge Representation.
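As a minimal illustration of abstracting blockchain data as a network, the sketch below assumes an account-based ledger in which each transaction is a hypothetical `(sender, receiver, amount)` triple, and builds a directed, weighted address graph from it. UTXO-based chains such as Bitcoin call for a different abstraction, and the tutorial discusses more network types than this single one.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical account-based transaction record: (sender, receiver, amount).
Transaction = Tuple[str, str, float]


def build_address_graph(txs: Iterable[Transaction]) -> Dict[str, Dict[str, float]]:
    """Directed weighted graph: graph[u][v] is the total amount sent from u to v."""
    graph: Dict[str, Dict[str, float]] = {}
    for sender, receiver, amount in txs:
        graph.setdefault(receiver, {})  # receive-only addresses still become nodes
        out_edges = graph.setdefault(sender, {})
        out_edges[receiver] = out_edges.get(receiver, 0.0) + amount
    return graph


def degree_summary(graph: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[int, int]]:
    """Map each address to (out-degree, in-degree), counting distinct counterparties."""
    in_degree = {node: 0 for node in graph}
    for out_edges in graph.values():
        for target in out_edges:
            in_degree[target] += 1
    return {node: (len(graph[node]), in_degree[node]) for node in graph}


txs = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 0.5), ("A", "B", 3.0)]
graph = build_address_graph(txs)
print(graph["A"])  # {'B': 4.0, 'C': 2.0}
print(degree_summary(graph))
```

Repeated transfers between the same pair collapse into one weighted edge, which is one common design choice; keeping each transaction as its own edge (a multigraph) preserves timing information instead.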

5.
Nat Rev Genet ; 23(7): 429-445, 2022 07.
Article in English | MEDLINE | ID: mdl-35246669

ABSTRACT

Recent developments in a variety of sectors, including health care, research and the direct-to-consumer industry, have led to a dramatic increase in the amount of genomic data that are collected, used and shared. This state of affairs raises new and challenging concerns for personal privacy, both legally and technically. This Review appraises existing and emerging threats to genomic data privacy and discusses how well current legal frameworks and technical safeguards mitigate these concerns. It concludes with a discussion of remaining and emerging challenges and illustrates possible solutions that can balance protecting privacy and realizing the benefits that result from the sharing of genetic information.


Subjects
Genomics, Privacy, Genome
6.
J Am Med Inform Assoc ; 29(5): 853-863, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35182149

ABSTRACT

OBJECTIVE: Supporting public health research and the public's situational awareness during a pandemic requires continuous dissemination of infectious disease surveillance data. Legislation, such as the Health Insurance Portability and Accountability Act of 1996 and recent state-level regulations, permits sharing deidentified person-level data; however, current deidentification approaches are limited. Namely, they are inefficient, relying on retrospective disclosure risk assessments, and do not flex with changes in infection rates or population demographics over time. In this paper, we introduce a framework to dynamically adapt deidentification for near-real time sharing of person-level surveillance data. MATERIALS AND METHODS: The framework leverages a simulation mechanism, capable of application at any geographic level, to forecast the reidentification risk of sharing the data under a wide range of generalization policies. The estimates inform weekly, prospective policy selection to maintain the proportion of records corresponding to a group size less than 11 (PK11) at or below 0.1. Fixing the policy at the start of each week facilitates timely dataset updates and supports sharing granular date information. We use August 2020 through October 2021 case data from Johns Hopkins University and the Centers for Disease Control and Prevention to demonstrate the framework's effectiveness in maintaining the PK11 threshold of 0.01. RESULTS: When sharing COVID-19 county-level case data across all US counties, the framework's approach meets the threshold for 96.2% of daily data releases, while a policy based on current deidentification techniques meets the threshold for 32.3%. CONCLUSION: Periodically adapting the data publication policies preserves privacy while enhancing public health utility through timely updates and sharing epidemiologically critical features.
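PK11, as defined in this abstract, is the proportion of records that fall in a group of fewer than 11 records sharing the same released attribute values. A minimal sketch of computing it, and of picking the most granular generalization policy that meets a threshold, follows. Unlike the framework, which forecasts risk by simulation before each week, this evaluates policies only on records already in hand; the records, attributes, and policy names are made up, and the threshold is passed in as a parameter.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def pk_below(records: Sequence[Hashable], k: int = 11) -> float:
    """Fraction of records whose attribute tuple is shared by fewer than k records."""
    if not records:
        return 0.0
    group_sizes = Counter(records)
    at_risk = sum(size for size in group_sizes.values() if size < k)
    return at_risk / len(records)


# A policy is a (name, generalization function) pair; ordered from most to least granular.
Policy = Tuple[str, Callable[[tuple], tuple]]


def choose_policy(records: List[tuple], policies: List[Policy], threshold: float) -> Optional[str]:
    """Return the first (most granular) policy whose generalized records satisfy PK11 <= threshold."""
    for name, generalize in policies:
        if pk_below([generalize(r) for r in records]) <= threshold:
            return name
    return None


# Made-up records: (county, age, sex).
records = [("Davidson", 25, "F")] * 20 + [("Davidson", 84, "M")] * 5
policies: List[Policy] = [
    ("10-year age bands", lambda r: (r[0], r[1] // 10 * 10, r[2])),
    ("age suppressed", lambda r: (r[0], r[2])),
    ("county only", lambda r: (r[0],)),
]
print(pk_below(records))                      # 0.2
print(choose_policy(records, policies, 0.1))  # county only
```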


Subjects
COVID-19, Privacy, Humans, Pandemics, Policy, Prospective Studies, Public Health, Retrospective Studies
7.
JMIR Infodemiology ; 2(2): e35702, 2022.
Article in English | MEDLINE | ID: mdl-37113452

ABSTRACT

Background: As direct-to-consumer genetic testing services have grown in popularity, the public has increasingly relied upon online forums to discuss and share their test results. Initially, users did so anonymously, but more recently, they have included face images when discussing their results. Various studies have shown that sharing images on social media tends to elicit more replies. However, users who do this forgo their privacy. When these images truthfully represent a user, they have the potential to disclose that user's identity. Objective: This study investigates the face image sharing behavior of direct-to-consumer genetic testing users in an online environment to determine if there exists an association between face image sharing and the attention received from other users. Methods: This study focused on r/23andme, a subreddit dedicated to discussing direct-to-consumer genetic testing results and their implications. We applied natural language processing to infer the themes associated with posts that included a face image. We applied a regression analysis to characterize the association between the attention that a post received, in terms of the number of comments, the karma score (defined as the number of upvotes minus the number of downvotes), and whether the post contained a face image. Results: We collected over 15,000 posts from the r/23andme subreddit, published between 2012 and 2020. Face image posting began in late 2019 and grew rapidly, with over 800 individuals revealing their faces by early 2020. The topics in posts including a face were primarily about sharing, discussing ancestry composition, or sharing family reunion photos with relatives discovered via direct-to-consumer genetic testing. On average, posts including a face image received 60% (5/8) more comments and had karma scores 2.4 times higher than other posts. 
Conclusions: Direct-to-consumer genetic testing consumers in the r/23andme subreddit are increasingly posting face images and testing reports on social platforms. The association between face image posting and a greater level of attention suggests that people are forgoing their privacy in exchange for attention from others. To mitigate this risk, platform organizers and moderators could inform users about the risk of posting face images in a direct, explicit manner to make it clear that their privacy may be compromised if personal images are shared.

8.
AMIA Annu Symp Proc ; 2022: 259-268, 2022.
Article in English | MEDLINE | ID: mdl-37128377

ABSTRACT

Scientific and clinical studies have a long history of bias in the recruitment of underprivileged and minority populations. This underrepresentation leads to inaccurate, inapplicable, and non-generalizable results. Electronic medical record (EMR) systems, which now drive much research, often poorly represent these groups. We introduce a method for quantifying representativeness using information theoretic measures, together with an algorithmic approach that selects a more representative record cohort than random selection when resource limitations preclude researchers from reviewing every record in the database. We apply this method to select cohorts of 2,000-20,000 records from a large (2M+ records) EMR database at the Vanderbilt University Medical Center and assess representativeness based on age, ethnicity, race, and gender. Compared to random selection, which on average mirrors the EMR database demographics, we find that a representativeness-informed approach can compose a cohort of records that is approximately 5.8 times more representative.
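The abstract does not name its information theoretic measure or its selection algorithm. Purely as an assumption for illustration, the sketch below scores a cohort by the Kullback-Leibler divergence between its demographic mix and a target mix, and builds the cohort greedily, adding at each step one record from whichever group lowers that divergence most. The groups and target proportions are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) in bits; q must be positive wherever p is positive."""
    return sum(pv * math.log2(pv / q[key]) for key, pv in p.items() if pv > 0)


def distribution(counts: Counter) -> Dict[Hashable, float]:
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}


def greedy_cohort(groups: Sequence[Hashable], target: Dict[Hashable, float], size: int) -> List[int]:
    """Greedily pick record indices so the cohort's group mix approaches `target`.

    groups[i] is record i's demographic group (e.g. an (age band, race, gender) tuple).
    """
    pools: Dict[Hashable, List[int]] = {}
    for idx, group in enumerate(groups):
        pools.setdefault(group, []).append(idx)
    counts: Counter = Counter()
    chosen: List[int] = []
    for _ in range(min(size, len(groups))):
        best_group, best_score = None, math.inf
        for group, pool in pools.items():
            if not pool:
                continue
            counts[group] += 1  # tentatively add one record from this group
            score = kl_divergence(distribution(counts), target)
            counts[group] -= 1
            if score < best_score:
                best_group, best_score = group, score
        chosen.append(pools[best_group].pop())
        counts[best_group] += 1
    return chosen


# Hypothetical database: 80% group "A", 20% group "B"; assumed target mix is 50/50.
groups = ["A"] * 80 + ["B"] * 20
target = {"A": 0.5, "B": 0.5}
cohort = greedy_cohort(groups, target, 20)
print(Counter(groups[i] for i in cohort))  # 10 records from each group
```

A random 20-record sample from this database would, on average, mirror its 80/20 split; the greedy cohort tracks the assumed target instead.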


Subjects
Data Management, Electronic Health Records, Humans, Software, Databases, Factual
9.
AMIA Annu Symp Proc ; 2022: 279-288, 2022.
Article in English | MEDLINE | ID: mdl-37128430

ABSTRACT

Data access limitations have stifled COVID-19 disparity investigations in the United States. Though federal and state legislation permits publicly disseminating de-identified data, methods for de-identification, including a recently proposed dynamic policy approach to pandemic data sharing, remain unproved in their ability to support pandemic disparity studies. Thus, in this paper, we evaluate how such an approach enables timely, accurate, and fair disparity detection, with respect to potential adversaries with varying prior knowledge about the population. We show that, when considering reasonably enabled adversaries, dynamic policies support up to three times earlier disparity detection in partially synthetic data than data sharing policies derived from two current, public datasets. Using real-world COVID-19 data, we also show how granular date information, which dynamic policies were designed to share, improves disparity characterization. Our results highlight the potential of the dynamic policy approach to publish data that supports disparity investigations in current and future pandemics.


Subjects
COVID-19, Humans, United States, COVID-19/epidemiology, Policy, Information Dissemination, Pandemics, Public Health Surveillance/methods
10.
Sci Adv ; 7(50): eabe9986, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34890225

ABSTRACT

Person-specific biomedical data are now widely collected, but their sharing raises privacy concerns, specifically about the re-identification of seemingly anonymous records. Formal re-identification risk assessment frameworks can inform decisions about whether and how to share data; current techniques, however, focus on scenarios in which data recipients use only one resource for re-identification. This is a concern because recent attacks show that adversaries can access multiple resources and combine them in a stage-wise manner to increase an attack's chance of success. In this work, we model re-identification as a two-player Stackelberg game of perfect information, which can be applied to assess risk, and we suggest an optimal data sharing strategy based on a privacy-utility tradeoff. We report on experiments with large-scale genomic datasets showing that, when game theoretic models account for adversaries' ability to launch multistage attacks, most data can be effectively shared with low re-identification risk.
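A Stackelberg game of perfect information is solved by backward induction: the follower (here, the adversary) best-responds to whatever the leader (the data sharer) does, and the leader chooses knowing that response. The sketch below illustrates this with entirely hypothetical sharing options, attack names, success probabilities, and payoffs; it is not the paper's model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SharingOption:
    name: str
    utility: float                   # hypothetical value of releasing the data
    success_prob: Dict[str, float]   # attack name -> hypothetical re-identification probability


def solve_stackelberg(
    options: List[SharingOption],
    attack_cost: Dict[str, float],
    gain: float,
    loss: float,
) -> Tuple[str, str]:
    """Backward induction: the adversary best-responds to each sharing option,
    and the data sharer picks the option with the highest resulting payoff."""
    best = None
    for opt in options:
        # Follower: choose the attack with the highest positive payoff, else "none".
        response, response_payoff = "none", 0.0
        for attack, cost in attack_cost.items():
            payoff = gain * opt.success_prob[attack] - cost
            if payoff > response_payoff:
                response, response_payoff = attack, payoff
        # Leader: data utility minus expected loss from the anticipated attack.
        p_success = opt.success_prob.get(response, 0.0)
        leader_payoff = opt.utility - loss * p_success
        if best is None or leader_payoff > best[0]:
            best = (leader_payoff, opt.name, response)
    return best[1], best[2]


options = [
    SharingOption("share all", 12.0, {"one_stage": 0.3, "multi_stage": 0.6}),
    SharingOption("suppress some", 7.0, {"one_stage": 0.05, "multi_stage": 0.2}),
    SharingOption("share nothing", 0.0, {"one_stage": 0.0, "multi_stage": 0.0}),
]
# Adversary can launch a one-stage or a (costlier) multistage attack.
print(solve_stackelberg(options, {"one_stage": 1.0, "multi_stage": 3.0}, gain=10.0, loss=10.0))
# ('suppress some', 'none')
# Analysis that only anticipates one-stage attacks.
print(solve_stackelberg(options, {"one_stage": 1.0}, gain=10.0, loss=10.0))
# ('share all', 'one_stage')
```

In this toy setting, anticipating only one-stage attacks would recommend sharing everything, while accounting for the multistage attack favors partial suppression, which mirrors the concern the abstract raises.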

11.
AMIA Annu Symp Proc ; 2021: 793-802, 2021.
Article in English | MEDLINE | ID: mdl-35309009

ABSTRACT

Numerous studies have shown that a person's health status is closely related to their socioeconomic status. Incorporating socioeconomic data associated with a patient's geographic area of residence into clinical datasets will thus promote medical research. However, most socioeconomic variables are unique in combination and are tied to small geographic regions (e.g., census tracts) that often contain fewer than 20,000 people. Thus, sharing such tract-level data can violate the Safe Harbor implementation of de-identification under the Health Insurance Portability and Accountability Act of 1996 (HIPAA). In this paper, we introduce a constraint-based k-means clustering approach to generate census tract-level socioeconomic data that are de-identification compliant. Our experimental analysis with data from the American Community Survey illustrates that the approach generates a protected dataset with high similarity to the unaltered values and achieves substantially better data utility than the 3-digit ZIP code generalization recommended by HIPAA Safe Harbor.
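The authors' constraint-based k-means is not specified in the abstract. As a simplified stand-in, the sketch below runs ordinary k-means on a single socioeconomic variable, then repeatedly merges any cluster whose combined population is below the 20,000-person figure cited in the abstract into the cluster with the nearest population-weighted mean, and finally releases that mean for every tract in the cluster. The tract values and populations are invented, and populations are assumed to be positive.

```python
from typing import Dict, List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means on a single variable; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic initialization at evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def enforce_min_population(
    values: Sequence[float], populations: Sequence[int], labels: Sequence[int], min_pop: int = 20_000
) -> List[float]:
    """Merge undersized clusters into the nearest cluster, then release each
    tract's population-weighted cluster mean."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    def pop(members: List[int]) -> int:
        return sum(populations[i] for i in members)

    def center(members: List[int]) -> float:
        return sum(values[i] * populations[i] for i in members) / pop(members)

    while len(clusters) > 1:
        small = [c for c, m in clusters.items() if pop(m) < min_pop]
        if not small:
            break
        victim = min(small, key=lambda c: pop(clusters[c]))  # smallest cluster first
        victim_center = center(clusters[victim])
        nearest = min(
            (c for c in clusters if c != victim),
            key=lambda c: abs(center(clusters[c]) - victim_center),
        )
        clusters[nearest].extend(clusters.pop(victim))

    released = [0.0] * len(values)
    for members in clusters.values():
        mean = center(members)
        for i in members:
            released[i] = mean
    return released


# Hypothetical tracts: median income (thousands of dollars) and population.
income = [30, 32, 31, 80, 82, 120]
population = [8000, 7000, 6000, 9000, 10000, 3000]
labels = kmeans_1d(income, k=3)
print(enforce_min_population(income, population, labels))
```

Here the single high-income tract is absorbed by the middle-income cluster, leaving two released values of roughly 30.95 and 86.36, each backed by more than 20,000 people.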


Subjects
Biomedical Research, Census Tract, Cluster Analysis, Health Insurance Portability and Accountability Act, Humans, Social Class, United States
12.
IEEE Trans Dependable Secure Comput ; 18(5): 2061-2073, 2021.
Article in English | MEDLINE | ID: mdl-35342375

ABSTRACT

Transparency has become a critical need in machine learning (ML) applications. Designing transparent ML models helps increase trust, ensure accountability, and scrutinize fairness. Some organizations may opt out of transparency to protect individuals' privacy. Therefore, there is great demand for transparency models that consider both privacy and security risks. Such transparency models can motivate organizations to improve their credibility by making the ML-based decision-making process comprehensible to end users. Differential privacy (DP) provides an important technique to disclose information while protecting individual privacy. However, it has been shown that DP alone cannot prevent certain types of privacy attacks against disclosed ML models. DP with low ϵ values can provide strong privacy guarantees but may result in significantly less accurate ML models. On the other hand, setting ϵ too high may lead to successful privacy attacks. This raises the question of whether we can disclose accurate, transparent ML models while preserving privacy. In this paper, we introduce a novel technique that complements DP to ensure model transparency and accuracy while being robust against model inversion attacks. We show that combining the proposed technique with DP provides highly transparent and accurate ML models while preserving privacy against model inversion attacks.
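The ϵ trade-off described here can be seen with the standard Laplace mechanism, which adds noise with scale sensitivity/ϵ to a query answer. This is a generic DP illustration, not the paper's complementary technique, which the abstract does not detail; the counting query and its true value are hypothetical.

```python
import random


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Release true_value plus Laplace(0, sensitivity / epsilon) noise, which is
    epsilon-differentially private for a query with that L1 sensitivity."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def mean_abs_error(epsilon: float, trials: int = 20_000, seed: int = 0) -> float:
    """Average |noisy - true| for a hypothetical counting query (sensitivity 1)."""
    rng = random.Random(seed)
    true_count = 100.0
    total = sum(abs(laplace_mechanism(true_count, 1.0, epsilon, rng) - true_count)
                for _ in range(trials))
    return total / trials


for eps in (0.1, 1.0, 10.0):
    # Expected |Laplace(0, b)| is b = 1 / epsilon here, so error shrinks as epsilon grows.
    print(eps, round(mean_abs_error(eps), 2))
```

The average error is roughly 1/ϵ for this sensitivity-1 query: about 10 at ϵ = 0.1 and about 0.1 at ϵ = 10, which is the accuracy cost of a low ϵ that the abstract describes.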

13.
Hepatol Forum ; 2(3): 89-90, 2021 Sep.
Article in English | MEDLINE | ID: mdl-35784905
14.
BMC Med Genomics ; 13(Suppl 7): 82, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32693807

ABSTRACT

BACKGROUND: Blockchain has emerged as a decentralized and distributed framework that enables tamper-resilience and, thus, practical immutability for stored data. This immutability property is important in scenarios where auditability is desired, such as maintaining access logs for sensitive healthcare and biomedical data. However, the underlying data structure of blockchain does not, by default, support efficient queries over the stored data. In this investigation, we show that it is possible to efficiently run complex audit queries over access log data stored on blockchains by using additional key-value stores. This paper specifically reports on the approach we designed for the blockchain track of the iDASH Privacy & Security Workshop 2018 competition. In this track, participants were asked to devise an efficient way to run conjunctive equality and range queries on a genomic dataset access log trail after storing it in a permissioned blockchain network, created with the Multichain platform, consisting of 4 identical nodes, each representing a different site. METHODS: Multichain duplicates and indexes blockchain data locally at each node in a key-value store to support later retrieval requests. To leverage the key-value storage mechanism efficiently, we applied various techniques and optimizations, such as bucketization, simple data duplication, and batch loading, that account for the query types required by the competition and the interface provided by Multichain. In particular, we implemented our solution and compared its loading and query-response performance with that of SQLite, a commonly used relational database, using the data provided by the iDASH 2018 organizers. RESULTS: Depending on the query type and the data size, the run time difference between blockchain-based and SQLite-based query-response ranged from 0.2 seconds to 6 seconds.
A deeper inspection revealed that range queries were the bottleneck of our solution, which nevertheless scales linearly. CONCLUSIONS: This investigation demonstrates that blockchain-based systems can provide reasonable query-response times for complex queries even if they use only simple key-value stores to manage their data. Consequently, we show that blockchains may be useful for maintaining data with auditability and immutability requirements across multiple sites.
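The bucketization idea can be sketched with a plain dictionary standing in for the key-value store that Multichain keeps at each node. The key scheme, field names, and bucket width below are hypothetical, and this is not the competition code: each entry id is written under one key per equality field, timestamps are grouped into fixed-width buckets, and a range query reads only the overlapping buckets before filtering exactly.

```python
from typing import Dict, List, Optional, Set


class BucketedLogIndex:
    """In-memory stand-in for a key-value store that indexes access-log entries.

    The key scheme ("user:...", "resource:...", "bucket:...") is hypothetical.
    """

    def __init__(self, bucket_width: int = 3600) -> None:
        self.bucket_width = bucket_width
        self.entries: List[dict] = []
        self.kv: Dict[str, Set[int]] = {}  # key -> ids of matching entries

    def _put(self, key: str, entry_id: int) -> None:
        self.kv.setdefault(key, set()).add(entry_id)

    def load(self, batch: List[dict]) -> None:
        """Batch-load entries; each entry id is written under one key per indexed field."""
        for entry in batch:
            entry_id = len(self.entries)
            self.entries.append(entry)
            self._put(f"user:{entry['user']}", entry_id)
            self._put(f"resource:{entry['resource']}", entry_id)
            self._put(f"bucket:{entry['timestamp'] // self.bucket_width}", entry_id)

    def query(self, user: Optional[str] = None, resource: Optional[str] = None,
              t_from: Optional[int] = None, t_to: Optional[int] = None) -> List[dict]:
        """Conjunctive equality query, plus an inclusive time range when both bounds are given."""
        id_sets: List[Set[int]] = []
        if user is not None:
            id_sets.append(self.kv.get(f"user:{user}", set()))
        if resource is not None:
            id_sets.append(self.kv.get(f"resource:{resource}", set()))
        if t_from is not None and t_to is not None:
            in_range: Set[int] = set()
            # Read only the buckets overlapping [t_from, t_to], then filter exactly.
            for b in range(t_from // self.bucket_width, t_to // self.bucket_width + 1):
                for entry_id in self.kv.get(f"bucket:{b}", set()):
                    if t_from <= self.entries[entry_id]["timestamp"] <= t_to:
                        in_range.add(entry_id)
            id_sets.append(in_range)
        ids = set.intersection(*id_sets) if id_sets else set(range(len(self.entries)))
        return [self.entries[i] for i in sorted(ids)]


log = BucketedLogIndex(bucket_width=100)
log.load([
    {"user": "alice", "resource": "dataset1", "timestamp": 50},
    {"user": "bob", "resource": "dataset1", "timestamp": 120},
    {"user": "alice", "resource": "dataset2", "timestamp": 250},
    {"user": "alice", "resource": "dataset1", "timestamp": 310},
])
print(log.query(user="alice", resource="dataset1", t_from=0, t_to=300))
# [{'user': 'alice', 'resource': 'dataset1', 'timestamp': 50}]
```

A range query touches every bucket between its bounds, so its cost grows with the width of the range, which is one way range queries can become the slowest query type in such a design.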


Subjects
Blockchain, Information Storage and Retrieval, Algorithms, Humans
15.
AMIA Annu Symp Proc ; 2019: 607-616, 2019.
Article in English | MEDLINE | ID: mdl-32308855

ABSTRACT

To accelerate medical knowledge discovery, an increasing number of research programs are gathering and sharing data on a large number of participants. Due to privacy concerns and legal restrictions on data sharing, these programs apply various strategies to mitigate privacy risk. However, the activities of participants and research program sponsors, particularly on social media, might reveal an individual's membership in a study, making it easier to recognize participants' records and uncover information they have not yet disclosed. This behavior can jeopardize the privacy of the participants themselves as well as the reputation of the projects, their sponsors, and the research enterprise. To investigate the dangers of self-disclosure behavior, we gathered and analyzed 4,020 tweets and uncovered over 100 tweets disclosing individuals' memberships in over 15 programs. Our investigation showed that self-disclosure on social media can reveal participants' membership in research cohorts, and such activity might lead to the leakage of a person's identity, genomic data, and other sensitive health information.


Subjects
Biomedical Research, Disclosure, Information Dissemination, Self Disclosure, Social Media, Clinical Trials as Topic, Female, Humans, Male, Privacy
17.
AMIA Annu Symp Proc ; 2018: 760-769, 2018.
Article in English | MEDLINE | ID: mdl-30815118

ABSTRACT

As the quantity and detail of association studies between clinical phenotypes and genotypes grow, there is a push to make summary statistics widely available. Genome-wide summary statistics have been shown to be vulnerable to inference of a targeted individual's presence. In this paper, we show that presence attacks are feasible with phenome-wide summary statistics as well. We use data from three healthcare organizations and an online resource that publishes summary statistics. We introduce a novel attack that achieves over 80% recall and precision within a population of 16,346, of whom 8,173 are targets. However, the feasibility of the attack depends on the attacker's knowledge about 1) the targeted individual and 2) the reference dataset. Within a population of over 2 million, where 8,173 individuals are targets, our attack achieves 31% recall and 17% precision. As a result, it is plausible that phenomic summary statistics may be shared with an acceptable level of privacy risk.
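The abstract does not describe its novel attack. For orientation only, the sketch below adapts the classic Homer-style distance statistic from genome-wide presence attacks to binary phenotype indicators: for each phenotype, it compares how far the target's value lies from the attacker's reference frequency versus from the published pool frequency, and a positive total suggests membership. All frequencies are hypothetical; the reference frequencies stand in for the attacker's knowledge of the reference dataset.

```python
from typing import Sequence


def homer_distance(target: Sequence[int], pool_freq: Sequence[float], ref_freq: Sequence[float]) -> float:
    """Sum over features of |y - ref| - |y - pool|; positive totals suggest the target is in the pool.

    target[j] is 1 if the individual has phenotype j, else 0; pool_freq and ref_freq are
    the phenotype frequencies in the published summary statistics and in the attacker's
    reference population.
    """
    return sum(abs(y - r) - abs(y - p) for y, p, r in zip(target, pool_freq, ref_freq))


pool = [0.6, 0.1, 0.5, 0.4]  # hypothetical published frequencies
ref = [0.2, 0.1, 0.3, 0.1]   # hypothetical reference frequencies
print(round(homer_distance([1, 0, 1, 1], pool, ref), 3))  # 0.9, flagged as a likely member
print(round(homer_distance([0, 0, 0, 0], pool, ref), 3))  # -0.9, not flagged
```

If the reference frequencies poorly match the true background population, the statistic drifts for non-members too, which is consistent with the abstract's point that attack feasibility depends on the attacker's knowledge of the reference dataset.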


Subjects
Computer Security, Personally Identifiable Information, Phenotype, Genome-Wide Association Study, Genotype, Humans
18.
J Am Med Inform Assoc ; 25(1): 25-31, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29036325

ABSTRACT

Objective: Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, under which penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds and what it implies for data science policy. Methods: This study tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a linear mixed-effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health. Results: The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further find that the depreciation rate is slow, only ∼10% per year, on average. Conclusion: The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
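To see why a slow depreciation rate weakens short revocations, assume (the abstract does not say how the rate accrues) that the ∼10% annual rate compounds geometrically. The value retained after t years, and the share lost during a six-month revocation, then work out as follows:

```latex
\begin{aligned}
V(t) &= V_0\,(1 - d)^{t}, \qquad d \approx 0.10 \text{ per year},\\
\frac{V(0.5)}{V_0} &= 0.9^{0.5} \approx 0.949 \quad (\text{about } 5\% \text{ of the value lost over a six-month revocation}),\\
\frac{V(5)}{V_0} &= 0.9^{5} \approx 0.590 .
\end{aligned}
```

Under this assumption, a recipient whose rights are suspended for several months still returns to data that retain roughly 95% of their value.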


Subjects
Biomedical Research, Datasets as Topic, Information Dissemination, Journal Impact Factor, Publications/standards, Time Factors
19.
Mol Imaging Radionucl Ther ; 26(3): 83-92, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28976330

ABSTRACT

OBJECTIVE: Non-Hodgkin's lymphomas arising from tissues other than primary lymphatic sites are classified as primary extranodal lymphomas (PEL). PELs of the gastrointestinal system (PGISL) originate from the lymphatic tissues within the gastrointestinal tract. The prognostic value of 18F-FDG PET/CT in lymphomas is high in terms of both overall survival (OS) and disease-free survival (DFS). Our aim was to investigate the uptake patterns and properties of low-grade and high-grade PGISL on primary staging 18F-FDG PET/CT, as well as the prognostic significance of metabolic tumor parameters in high-grade PGISL. METHODS: Thirty-nine patients with PGISL were enrolled in this retrospective cohort study between 2004 and 2015. Primary staging 18F-FDG PET/CT was performed, and the quantitative parameters SUVmax, SUVmean, metabolic tumor volume (MTV), and total lesion glycolysis (TLG) were calculated for all patients prior to treatment. Low-grade and high-grade PGISL were compared in terms of metabolic tumor parameters. Cox regression models were used to determine factors that correlate with DFS in high-grade PGISL. RESULTS: There were statistically significant differences between high-grade and low-grade PGISL in terms of SUVmax, SUVmean, MTV, TLG, recurrence, mortality, DFS, and OS. None of the potential risk factors (sex, age, site, SUVmax, SUVmean, MTV, TLG) for recurrence and metastasis in high-grade PGISL was identified as a risk factor in univariate or multivariate Cox regression analysis. CONCLUSION: Metabolic tumor parameters are not predictive markers in high-grade PGISL, especially in the diffuse large B cell variant and primary gastric lymphoma. These initial findings suggest that they will not play a role in patient management.
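For readers outside nuclear medicine, the quantitative parameters in this abstract follow standard definitions: SUV normalizes tissue uptake to injected activity per unit body weight, and TLG is the product of SUVmean and MTV. The lesion values in the last line are hypothetical, chosen only to show the arithmetic.

```latex
\begin{aligned}
\mathrm{SUV} &= \frac{\text{tissue activity concentration}}{\text{injected activity} / \text{body weight}},\\
\mathrm{TLG} &= \mathrm{SUV}_{\text{mean}} \times \mathrm{MTV},\\
\text{e.g. } \mathrm{SUV}_{\text{mean}} = 5.2,\ \mathrm{MTV} = 30\ \text{mL} &\;\Rightarrow\; \mathrm{TLG} = 5.2 \times 30 = 156 .
\end{aligned}
```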

20.
BMC Med Genomics ; 10(Suppl 2): 39, 2017 07 26.
Article in English | MEDLINE | ID: mdl-28786360

ABSTRACT

BACKGROUND: Genomic data are increasingly collected by a wide array of organizations. As such, there is a growing demand to make summary information about such collections more widely available. However, over the past decade, a series of investigations have shown that attacks rooted in statistical inference methods can be applied to discern the presence of a known individual's DNA sequence in the pool of subjects. Recently, it was shown that the Beacon Project of the Global Alliance for Genomics and Health, a web service for querying the presence (or absence) of a specific allele, was vulnerable. The Integrating Data for Analysis, Anonymization, and Sharing (iDASH) Center organized a track in its third Privacy Protection Challenge on how to mitigate the Beacon vulnerability. We developed the winning solution for this track. METHODS: This paper describes our computational method to optimize the tradeoff between the utility and the privacy of the Beacon service. We generalize the genomic data sharing problem beyond the one introduced in the iDASH Challenge to better represent real-world scenarios and to allow a more comprehensive evaluation. We then conduct a sensitivity analysis of our method against several state-of-the-art methods using a dataset of 400,000 positions on Chromosome 10 for 500 individuals from Phase 3 of the 1000 Genomes Project. All methods are evaluated for utility, privacy, and efficiency. RESULTS: Our method achieves better performance than all state-of-the-art methods, irrespective of how key factors (e.g., the allele frequency in the population, the size of the pool, and utility weights) change from the original parameters of the problem. We further illustrate that our method can exhibit subpar performance for special cases of allele query sequences.
However, we show that our method can be extended to address this issue when the query sequence is fixed and known a priori to the data custodian, so that they may stage their responses accordingly. CONCLUSIONS: This research shows that it is possible to use computational methods to thwart the attack on Beacon services without substantially altering the utility of the system. The method we initially developed is limited by the design of the scenario and evaluation protocol for the iDASH Challenge; however, it can be improved by allowing the data custodian to act in a staged manner.
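To make the Beacon setting concrete, the sketch below shows a service that answers allele-presence queries while hiding some rare alleles. The perturbation rule (report an allele carried by at most `rare_max` individuals as absent with probability `hide_prob`) is a simple strategy chosen for illustration; it is not the winning method, which the abstract describes as optimizing the utility-privacy tradeoff. Variant keys, counts, and parameters are hypothetical.

```python
import random
from typing import Dict, Tuple

# Hypothetical variant key: (chromosome, position, allele).
Variant = Tuple[str, int, str]


class PerturbedBeacon:
    """Answers allele-presence queries, hiding some rare alleles.

    Alleles carried by at most `rare_max` individuals are reported as absent with
    probability `hide_prob`. Each decision is cached so repeating a query cannot
    average the noise away.
    """

    def __init__(self, carrier_counts: Dict[Variant, int], rare_max: int = 1,
                 hide_prob: float = 0.5, seed: int = 0) -> None:
        self.carrier_counts = carrier_counts
        self.rare_max = rare_max
        self.hide_prob = hide_prob
        self.rng = random.Random(seed)
        self._answers: Dict[Variant, bool] = {}

    def query(self, variant: Variant) -> bool:
        count = self.carrier_counts.get(variant, 0)
        if count == 0:
            return False  # truthful "no": nobody in the pool carries it
        if variant not in self._answers:
            hide = count <= self.rare_max and self.rng.random() < self.hide_prob
            self._answers[variant] = not hide
        return self._answers[variant]


counts = {("10", 100_001, "A"): 250, ("10", 100_002, "G"): 1}
beacon = PerturbedBeacon(counts, hide_prob=0.5)
print(beacon.query(("10", 100_001, "A")))  # True: common allele, answered truthfully
print(beacon.query(("10", 100_003, "T")))  # False: absent from the pool
print({beacon.query(("10", 100_002, "G")) for _ in range(10)})  # a single cached answer
```

Every hidden rare allele is a false "no" for legitimate users, so `hide_prob` and `rare_max` trade utility for privacy; an optimized method would choose which answers to alter rather than fixing one probability.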


Subjects
Computer Security, Genomics, Information Dissemination/methods, Gene Frequency, Humans