Results 1 - 20 of 3,697
1.
Cell ; 183(4): 905-917.e16, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33186529

ABSTRACT

The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.


Subjects
Computer Security , Genomics , Privacy , Human Genome , Genotype , High-Throughput Nucleotide Sequencing , Humans , Phenotype , Phylogeny , Reproducibility of Results , RNA Sequence Analysis , Single-Cell Analysis
2.
Cell ; 175(3): 848-858.e6, 2018 10 18.
Article in English | MEDLINE | ID: mdl-30318150

ABSTRACT

In familial searching in forensic genetics, a query DNA profile is tested against a database to determine whether it represents a relative of a database entrant. We examine the potential for using linkage disequilibrium to identify pairs of profiles as belonging to relatives when the query and database rely on nonoverlapping genetic markers. Considering data on individuals genotyped with both microsatellites used in forensic applications and genome-wide SNPs, we find that ∼30%-32% of parent-offspring pairs and ∼35%-36% of sib pairs can be identified from the SNPs of one member of the pair and the microsatellites of the other. The method suggests the possibility of performing familial searches of microsatellite databases using query SNP profiles, or vice versa. It also reveals that privacy concerns arising from computations across multiple databases that share no genetic markers in common entail risks, not only for database entrants, but for their close relatives as well.


Subjects
Family , Forensic Genetics/methods , Population Genetics/methods , Genotyping Techniques/methods , Single Nucleotide Polymorphism , Female , Humans , Linkage Disequilibrium , Male , Microsatellite Repeats , Genetic Models , Statistical Models , Pedigree
3.
Trends Genet ; 40(5): 383-386, 2024 May.
Article in English | MEDLINE | ID: mdl-38637270

ABSTRACT

Artificial intelligence (AI) in omics analysis raises privacy threats to patients. Here, we briefly discuss risk factors to patient privacy in data sharing, model training, and release, as well as methods to safeguard and evaluate patient privacy in AI-driven omics methods.


Subjects
Artificial Intelligence , Genomics , Humans , Genomics/methods , Privacy , Information Dissemination
4.
Proc Natl Acad Sci U S A ; 121(21): e2400787121, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38758697

ABSTRACT

We show that adding noise before publishing data effectively screens p-hacked findings: spurious explanations produced by fitting many statistical models (data mining). Noise creates "baits" that affect two types of researchers differently. Uninformed p-hackers, who are fully ignorant of the true mechanism and engage in data mining, often fall for baits. Informed researchers, who start with an ex ante hypothesis, are minimally affected. We show that as the number of observations grows large, dissemination noise asymptotically achieves optimal screening. In a tractable special case where the informed researchers' theory can identify the true causal mechanism with very few data, we characterize the optimal level of dissemination noise and highlight the relevant trade-offs. Dissemination noise is a tool that statistical agencies currently use to protect privacy. We argue this existing practice can be repurposed to screen p-hackers and thus improve research credibility.

5.
Trends Genet ; 39(5): 335-337, 2023 05.
Article in English | MEDLINE | ID: mdl-36707316

ABSTRACT

Re-identification from data used in precision medicine research is presumed to create minimal risk but may disproportionately impact health disparity populations. We consider plausible privacy risks and the negative ramifications thereof for people with disabilities, the largest health disparity population in the USA, and suggest measures to address these concerns.


Subjects
Persons with Disabilities , Precision Medicine , Humans , Privacy
6.
Annu Rev Genomics Hum Genet ; 24: 333-346, 2023 08 25.
Article in English | MEDLINE | ID: mdl-36630592

ABSTRACT

This article reviews evolving legal implications for clinicians and researchers as genomics is used more widely in both the clinic and in translational research, reflecting rapid changes in scientific knowledge as well as the surrounding cultural and political environment. Professionals will face new and changing duties to make or act upon a genetic diagnosis, address direct-to-consumer genetic testing in patient care, consider the health implications of results for patients' family members, and recontact patients when test results change over time. Professional duties in reproductive genetic testing will need to be recalibrated in response to disruptive changes to reproductive rights in the United States. We also review the debate over who controls the flow of genetic information and who is responsible for its protection, considering the globally influential European Union General Data Protection Regulation and the rapidly evolving data privacy law landscape of the United States.


Subjects
Ambulatory Care Facilities , Direct-To-Consumer Screening and Testing , Humans , European Union , Family , Genomics
7.
Annu Rev Genomics Hum Genet ; 24: 347-368, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37253596

ABSTRACT

Continued advances in precision medicine rely on the widespread sharing of data that relate human genetic variation to disease. However, data sharing is severely limited by legal, regulatory, and ethical restrictions that safeguard patient privacy. Federated analysis addresses this problem by transferring the code to the data, providing the technical and legal capability to analyze the data within their secure home environment rather than transferring the data to another institution for analysis. This allows researchers to gain new insights from data that cannot be moved, while respecting patient privacy and the data stewards' legal obligations. Because federated analysis is a technical solution to the legal challenges inherent in data sharing, the technology and policy implications must be evaluated together. Here, we summarize the technical approaches to federated analysis and provide a legal analysis of their policy implications.


Subjects
Fenbendazole , Privacy , Humans , Health Facilities , Information Dissemination , Policy
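
The "code goes to the data" pattern described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not anything taken from the article: the function names `local_summary` and `combine`, the two example sites, and the choice of a pooled mean and variance as the analysis are all assumptions made for the example.

```python
def local_summary(values):
    """Hypothetical analysis code that runs inside one site's secure environment.

    Only the three aggregates below leave the site; the raw values do not.
    """
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def combine(summaries):
    """Hypothetical coordinator step: pool per-site aggregates into a mean and sample variance."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Illustrative data held at two separate sites (made-up numbers).
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]

mean, variance = combine([local_summary(site_a), local_summary(site_b)])
```

The pooled result matches what a central analysis of all five values would give, yet no individual value crosses a site boundary. Aggregates can still leak information in some settings, so a sketch like this is only a starting point.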
8.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39073827

ABSTRACT

Genome-wide association studies (GWAS) serve as a crucial tool for identifying genetic factors associated with specific traits. However, ethical constraints prevent the direct exchange of genetic information, prompting the need for privacy preservation solutions. To address these issues, earlier works are based on cryptographic mechanisms such as homomorphic encryption, secure multi-party computing, and differential privacy. Very recently, federated learning has emerged as a promising solution for enabling secure and collaborative GWAS computations. This work provides an extensive overview of existing methods for GWAS privacy preserving, with the main focus on collaborative and distributed approaches. This survey provides a comprehensive analysis of the challenges faced by existing methods, their limitations, and insights into designing efficient solutions.


Subjects
Genetic Privacy , Genome-Wide Association Study , Genome-Wide Association Study/methods , Humans , Genomics/methods , Computer Security
9.
Mol Cell Proteomics ; 23(3): 100731, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38331191

ABSTRACT

Proteomics data sharing has profound benefits at the individual level as well as at the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, the reluctance of researchers with regard to data sharing is evident as many share only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially making the dataset useless. This behavior can be explained by a lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can accelerate the norm of data sharing for the field of proteomics, as has been the standard in genomics for decades. In this article, we have put together various repository options available for proteomics data. We have also added pros and cons of those repositories to facilitate researchers in selecting the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Data sets that will be useless without personal identifiers need to be shared in a controlled access repository so that only authorized researchers can access the data and personal identifiers are kept safe.


Subjects
Privacy , Proteomics , Humans , Genomics , Metadata , Information Dissemination
10.
Proc Natl Acad Sci U S A ; 120(8): e2218605120, 2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36800385

ABSTRACT

A reconstruction attack on a private dataset D takes as input some publicly accessible information about the dataset and produces a list of candidate elements of D. We introduce a class of data reconstruction attacks based on randomized methods for nonconvex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of D from aggregate query statistics Q(D) ∈ ℝ^m but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset D was sampled, demonstrating that they are exploiting information in the aggregate statistics Q(D) and not simply the overall structure of the distribution. In other words, the queries Q(D) are permitting reconstruction of elements of this dataset, not the distribution from which D was drawn. These findings are established both on 2010 US decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset and provide further motivation for the careful application of provably private techniques such as differential privacy.
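
The abstract above closes by pointing to differential privacy as a way to avoid releasing numerically precise aggregates. One standard building block is the Laplace mechanism for a counting query, sketched below. The function names, the example ages, and the value of epsilon are assumptions chosen for illustration; nothing here reproduces the article's attacks or experiments.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws follows a Laplace(0, 1) distribution.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise calibrated to epsilon.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)          # seeded only to make the sketch repeatable
ages = [23, 35, 41, 58, 62, 67, 70]   # made-up records
noisy = private_count(ages, lambda a: a >= 60, epsilon=0.5, rng=rng)
```

A smaller epsilon means more noise and stronger protection; the released value is no longer an exact aggregate, which is the property the abstract argues for.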

11.
Proc Natl Acad Sci U S A ; 120(33): e2304415120, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37549296

ABSTRACT

Real-world healthcare data sharing is instrumental in constructing broader-based and larger clinical datasets that may improve clinical decision-making research and outcomes. Stakeholders are frequently reluctant to share their data without guaranteed patient privacy, proper protection of their datasets, and control over the usage of their data. Fully homomorphic encryption (FHE) is a cryptographic capability that can address these issues by enabling computation on encrypted data without intermediate decryptions, so the analytics results are obtained without revealing the raw data. This work presents a toolset for collaborative privacy-preserving analysis of oncological data using multiparty FHE. Our toolset supports survival analysis, logistic regression training, and several common descriptive statistics. We demonstrate using oncological datasets that the toolset achieves high accuracy and practical performance, which scales well to larger datasets. As part of this work, we propose a cryptographic protocol for interactive bootstrapping in multiparty FHE, which is of independent interest. The toolset we develop is general-purpose and can be applied to other collaborative medical and healthcare application domains.


Subjects
Computer Security , Privacy , Humans , Logistic Models , Clinical Decision-Making
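
The core idea in the abstract above, computing on encrypted data without decrypting it, can be illustrated with a much simpler scheme than the one the article uses. The sketch below is a toy Paillier cryptosystem, which is only additively homomorphic; it is not fully homomorphic, not multiparty, and not the article's toolset. The key sizes are far too small for real use, and all function names are assumptions for the example.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    # Two known Mersenne primes; toy-sized, chosen only so the sketch runs instantly.
    n = p * q
    phi = (p - 1) * (q - 1)
    assert math.gcd(n, phi) == 1   # required for decryption to work
    mu = pow(phi, -1, n)           # modular inverse (Python 3.8+)
    return n, (phi, mu)            # public key n, private key (phi, mu)


def encrypt(n, m):
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With generator g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    phi, mu = private_key
    u = pow(c, phi, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % (n * n)


n, private_key = keygen()
encrypted_sum = add_encrypted(n, encrypt(n, 17), encrypt(n, 25))
total = decrypt(n, private_key, encrypted_sum)
```

Whoever performs `add_encrypted` never sees 17 or 25, only ciphertexts. Fully homomorphic schemes extend this to both addition and multiplication, which is what makes analyses such as regression training possible.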
12.
Proc Natl Acad Sci U S A ; 119(40): e2121024119, 2022 10 04.
Article in English | MEDLINE | ID: mdl-36166477

ABSTRACT

A set of 20 short tandem repeats (STRs) is used by the US criminal justice system to identify suspects and to maintain a database of genetic profiles for individuals who have been previously convicted or arrested. Some of these STRs were identified in the 1990s, with a preference for markers in putative gene deserts to avoid forensic profiles revealing protected medical information. We revisit that assumption, investigating whether forensic genetic profiles reveal information about gene-expression variation or potential medical information. We find six significant correlations (false discovery rate = 0.23) between the forensic STRs and the expression levels of neighboring genes in lymphoblastoid cell lines. We explore possible mechanisms for these associations, showing evidence compatible with forensic STRs causing expression variation or being in linkage disequilibrium with a causal locus in three cases and weaker or potentially spurious associations in the other three cases. Together, these results suggest that forensic genetic loci may reveal expression levels and, perhaps, medical information.


Subjects
Forensic Genetics , Genetic Loci , Microsatellite Repeats , Privacy , Forensic Genetics/legislation & jurisprudence , Forensic Genetics/methods , Gene Frequency , Population Genetics , Humans , Linkage Disequilibrium
13.
Am J Epidemiol ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38973755

ABSTRACT

Epidemiologic studies frequently use risk ratios to quantify associations between exposures and binary outcomes. When the data are physically stored at multiple data partners, it can be challenging to perform individual-level analysis if data cannot be pooled centrally due to privacy constraints. Existing methods either require multiple file transfers between each data partner and an analysis center (e.g., distributed regression) or only provide approximate estimation of the risk ratio (e.g., meta-analysis). Here we develop a practical method that requires a single transfer of eight summary-level quantities from each data partner. Our approach leverages an existing risk-set method and software originally developed for Cox regression. Sharing only summary-level information, the proposed method provides risk ratio estimates and confidence intervals identical to those that would be provided - if individual-level data were pooled - by the modified Poisson regression. We justify the method theoretically, confirm its performance using simulated data, and implement it in a distributed analysis of COVID-19 data from the U.S. Food and Drug Administration's Sentinel System.
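
To make concrete what estimating a risk ratio from summary-level information can look like, the sketch below pools four counts per data partner into a crude risk ratio with a Wald interval on the log scale. This is deliberately simpler than the article's method: it is not the eight-quantity approach described above, it does not reproduce modified Poisson regression, and it ignores confounding and between-site differences. The function name, dictionary keys, and counts are hypothetical.

```python
from math import exp, sqrt


def pooled_risk_ratio(site_counts, z=1.96):
    """Crude risk ratio from per-site 2x2 counts, with an approximate 95% interval."""
    a = sum(s["exposed_events"] for s in site_counts)
    n1 = sum(s["exposed_total"] for s in site_counts)
    c = sum(s["unexposed_events"] for s in site_counts)
    n0 = sum(s["unexposed_total"] for s in site_counts)

    rr = (a / n1) / (c / n0)
    # Standard error of log(RR) for cumulative-incidence data.
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, rr * exp(-z * se_log_rr), rr * exp(z * se_log_rr)


# Made-up counts from two data partners; only these eight numbers are shared in total.
sites = [
    {"exposed_events": 30, "exposed_total": 100, "unexposed_events": 10, "unexposed_total": 100},
    {"exposed_events": 20, "exposed_total": 100, "unexposed_events": 15, "unexposed_total": 100},
]
rr, lower, upper = pooled_risk_ratio(sites)
```

Each partner makes a single transfer of counts and no individual-level rows move, which is the constraint the abstract is working within.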

14.
Trends Genet ; 37(2): 106-108, 2021 02.
Article in English | MEDLINE | ID: mdl-32943209

ABSTRACT

Along with the potential for breakthroughs in care and prevention, the search for genetic mechanisms underlying the spread and severity of coronavirus disease 2019 (COVID-19) introduces the risk of discrimination against those found to have markers for susceptibility. We propose new legal protections to mitigate gaps in protections under existing laws.


Subjects
COVID-19/genetics , Genetic Predisposition to Disease/genetics , Genetic Privacy/legislation & jurisprudence , SARS-CoV-2/physiology , COVID-19/prevention & control , COVID-19/virology , Genetic Markers/genetics , Genetic Testing/legislation & jurisprudence , Humans
15.
Hum Brain Mapp ; 45(9): e26721, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38899549

ABSTRACT

With the rise of open data, identifiability of individuals based on 3D renderings obtained from routine structural magnetic resonance imaging (MRI) scans of the head has become a growing privacy concern. To protect subject privacy, several algorithms have been developed to de-identify imaging data using blurring, defacing or refacing. Completely removing facial structures provides the best re-identification protection but can significantly impact post-processing steps, like brain morphometry. As an alternative, refacing methods that replace individual facial structures with generic templates have a lower effect on the geometry and intensity distribution of original scans, and are able to provide more consistent post-processing results at the price of higher re-identification risk and computational complexity. In the current study, we propose a novel method for anonymized face generation for defaced 3D T1-weighted scans based on a 3D conditional generative adversarial network. To evaluate the performance of the proposed de-identification tool, a comparative study was conducted between several existing defacing and refacing tools, with two different segmentation algorithms (FAST and Morphobox). The aim was to evaluate (i) impact on brain morphometry reproducibility, (ii) re-identification risk, (iii) balance between (i) and (ii), and (iv) the processing time. The proposed method takes 9 s for face generation and is suitable for recovering consistent post-processing results after defacing.


Subjects
Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Adult , Brain/diagnostic imaging , Brain/anatomy & histology , Male , Female , Computer Neural Networks , Three-Dimensional Imaging/methods , Neuroimaging/methods , Neuroimaging/standards , Data Anonymization , Young Adult , Computer-Assisted Image Processing/methods , Computer-Assisted Image Processing/standards , Algorithms
16.
Breast Cancer Res Treat ; 203(3): 523-531, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37882921

ABSTRACT

PURPOSE: This observational study aims to assess the feasibility of calculating indicators developed by the European Commission Initiative on Breast Cancer (ECIBC) for the Dutch breast cancer population. METHODS: Patients diagnosed with invasive or in situ breast cancer between 2012 and 2018 were selected from the Netherlands Cancer Registry (NCR). Outcomes of the quality indicators (QI) were presented as mean scores and were compared to a stated norm. Variation between hospitals was assessed by standard deviations and funnel plots and trends over time were evaluated. The quality indicator calculator (QIC) was validated by comparing these outcomes with the outcomes of constructed algorithms in Stata. RESULTS: In total, 133,527 patients were included. Data for 24 out of 26 QIs were available in the NCR. For 67% and 67% of the QIs, a mean score above the norm and low or medium hospital variation was observed, respectively. The proportion of patients undergoing a breast reconstruction or neoadjuvant systemic therapy increased over time. The proportion treated within 4 weeks from diagnosis, having >10 lymph nodes removed or estrogen negative breast cancer who underwent adjuvant chemotherapy decreased. The outcomes of the constructed algorithms in this study and the QIC showed 100% similarity. CONCLUSION: Data from the NCR could be used for the calculation of more than 92% of the ECIBC indicators. The quality of breast cancer care in the Netherlands is high, as more than half of the QIs already score above the norm and medium hospital variation was observed. The QIC can be easily and reliably applied.


Subjects
Breast Carcinoma In Situ , Breast Neoplasms , Humans , Female , Health Care Quality Indicators , Breast Neoplasms/diagnosis , Breast Neoplasms/epidemiology , Breast Neoplasms/therapy , Netherlands/epidemiology , Hospitals
17.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36215114

ABSTRACT

Precision medicine relies on molecular and systems biology methods as well as bidirectional association studies of phenotypes and (high-throughput) genomic data. However, the integrated use of such data often faces obstacles, especially with regard to data protection. An important prerequisite for research data processing is usually informed consent. But collecting consent is not always feasible, in particular when data are to be analyzed retrospectively. For phenotype data, anonymization, i.e. the altering of data in such a way that individuals cannot be identified, can provide an alternative. Several re-identification attacks have shown that this is a complex task and that simply removing directly identifying attributes such as names is usually not enough. More formal approaches are needed that use mathematical models to quantify risks and guide their reduction. Due to the complexity of these techniques, it is challenging and not advisable to implement them from scratch. Open software libraries and tools can provide a robust alternative. However, the range of available anonymization tools is also heterogeneous, and obtaining an overview of their strengths and weaknesses is difficult due to the complexity of the problem space. We therefore performed a systematic review of open anonymization tools for structured phenotype data described in the literature between 1990 and 2021. Through a two-step eligibility assessment process, we selected 13 tools for an in-depth analysis. By comparing the supported anonymization techniques and further aspects, such as maturity, we derive recommendations for tools to use for anonymizing phenotype datasets with different properties.


Subjects
Biomedical Research , Privacy , Retrospective Studies , Data Anonymization , Phenotype
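
The abstract above refers to mathematical models that quantify re-identification risk without naming one. A widely used example is k-anonymity, and the sketch below shows only how such a measure is checked, in line with the abstract's advice not to build full anonymization from scratch. The function name, the attribute names, and the records are assumptions invented for the example.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Made-up phenotype records with names already removed.
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "A"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "B"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "A"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "C"},
    {"age_band": "40-49", "zip3": "100", "diagnosis": "B"},
]

k_fine = k_anonymity(records, ["age_band", "zip3"])
k_coarse = k_anonymity(records, ["age_band"])
```

Here one record is unique on the combination of age band and ZIP prefix even though no name is present, which illustrates why removing direct identifiers alone is usually not enough. Dropping or coarsening an attribute raises k at the cost of detail.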
18.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36384083

ABSTRACT

BACKGROUND: Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods were developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the usage of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns may arise while estimating and reporting sensitive population-level statistics such as inbreeding coefficients for the concerns around marginalization and stigmatization. RESULTS: Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in the admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples in two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352. CONCLUSIONS: Analysis of relatedness is fundamentally important for identifying relatives, in association studies, and for estimation of population-level estimates of inbreeding. As the awareness of individual and group genomic privacy is growing, privacy-preserving methods for the estimation of relatedness are needed. Presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations. 
SHORT ABSTRACT: Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome wide association studies and for estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements, and occasionally do not consider individuals from admixed ancestries. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.


Subjects
Genome-Wide Association Study , Privacy , Humans , Genotype , Genetic Privacy , Genome
19.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35388408

ABSTRACT

Reproducibility of results obtained using ribonucleic acid (RNA) data across labs remains a major hurdle in cancer research. Often, molecular predictors trained on one dataset cannot be applied to another due to differences in RNA library preparation and quantification, which inhibits the validation of predictors across labs. While current RNA correction algorithms reduce these differences, they require simultaneous access to patient-level data from all datasets, which necessitates the sharing of training data for predictors when sharing predictors. Here, we describe SpinAdapt, an unsupervised RNA correction algorithm that enables the transfer of molecular models without requiring access to patient-level data. It computes data corrections only via aggregate statistics of each dataset, thereby maintaining patient data privacy. Despite an inherent trade-off between privacy and performance, SpinAdapt outperforms current correction methods, like Seurat and ComBat, on publicly available cancer studies, including TCGA and ICGC. Furthermore, SpinAdapt can correct new samples, thereby enabling unbiased evaluation on validation cohorts. We expect this novel correction paradigm to enhance research reproducibility and to preserve patient privacy.


Subjects
Confidentiality , Privacy , Algorithms , Humans , RNA , Reproducibility of Results
20.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35106557

ABSTRACT

DNA sequencing technologies have advanced significantly in the last few years, leading to advancements in biomedical research that have improved personalised medicine and the discovery of new treatments for diseases. Sequencing technology advancement has also reduced the cost of DNA sequencing, which has led to the rise of direct-to-consumer (DTC) sequencing, e.g. 23andme.com, ancestry.co.uk, etc. In the meantime, concerns have emerged over privacy and security in collecting, handling, analysing and sharing DNA and genomic data. DNA data are unique and can be used to identify individuals. Moreover, those data provide information on people's current disease status and disposition, e.g. mental health or susceptibility for developing cancer. DNA privacy violation does not only affect the owner but also affects their close consanguinity due to its hereditary nature. This article introduces and defines the term 'digital DNA life cycle' and presents an overview of privacy and security threats and their mitigation techniques for predigital DNA and throughout the digital DNA life cycle. It covers DNA sequencing hardware, software and DNA sequence pipeline in addition to common privacy attacks and their countermeasures when DNA digital data are stored, queried or shared. Likewise, the article examines DTC genomic sequencing privacy and security.


Subjects
Genomics , Privacy , Animals , DNA/genetics , Genome , Genomics/methods , Humans , Life Cycle Stages