Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 28
Filtrar
1.
Genome Biol ; 24(1): 271, 2023 Dec 06.
Artigo em Inglês | MEDLINE | ID: mdl-38053191

RESUMO

BACKGROUND: Genotype imputation is an essential step in genetic studies to improve data quality and statistical power. Public imputation servers are widely used by researchers to impute their data using otherwise access-controlled reference panels of high-fidelity genomes held by these servers. RESULTS: We report evidence against the prevailing assumption that providing access to panels only indirectly via imputation servers poses a negligible privacy risk to individuals in the panels. To this end, we present algorithmic strategies for adaptively constructing artificial input samples and interpreting their imputation results that lead to the accurate reconstruction of reference panel haplotypes. We illustrate this possibility on three reference panels of real genomes for a range of imputation tools and output settings. Moreover, we demonstrate that reconstructed haplotypes from the same individual could be linked via their genetic relatives using our Bayesian linking algorithm, which allows a substantial portion of the individual's diploid genome to be reassembled. We also provide population genetic estimates of the proportion of a panel that could be linked when an adversary holds a varying number of genomes from the same population. CONCLUSIONS: Our results show that genomes in imputation server reference panels can be vulnerable to reconstruction, implying that additional safeguards may need to be considered. We suggest possible mitigation measures based on our findings. Our work illustrates the value of adversarial algorithms in uncovering new privacy risks to help inform the genomics community towards secure data sharing practices.


Assuntos
Estudo de Associação Genômica Ampla , Genoma , Humanos , Teorema de Bayes , Estudo de Associação Genômica Ampla/métodos , Genótipo , Haplótipos , Polimorfismo de Nucleotídeo Único
2.
Genome Res ; 33(7): 1101-1112, 2023 07.
Artigo em Inglês | MEDLINE | ID: mdl-37541758

RESUMO

Gene expression data provide molecular insights into the functional impact of genetic variation, for example, through expression quantitative trait loci (eQTLs). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression profiles could be matched to genotype profiles of the same individuals in another data set, known as a linking attack. Prior works show such a risk could analyze only a fraction of eQTLs that is independent owing to restrictive model assumptions, leaving the full extent of this risk incompletely understood. To address this challenge, we introduce the discriminative sequence model (DSM), a novel probabilistic framework for predicting a sequence of genotypes based on gene expression data. By modeling the joint distribution over all known eQTLs in a genomic region, DSM improves the power of linking attacks with necessary calibration for linkage disequilibrium and redundant predictive signals. We show greater linking accuracy of DSM compared with existing approaches across a range of attack scenarios and data sets including up to 22,288 individuals, suggesting that DSM helps uncover a substantial additional risk overlooked by previous studies. Our work provides a unified framework for assessing the privacy risks of sharing diverse omics data sets beyond transcriptomics.


Assuntos
Estudo de Associação Genômica Ampla , Transcriptoma , Humanos , Perfilação da Expressão Gênica , Genótipo , Locos de Características Quantitativas , Polimorfismo de Nucleotídeo Único
3.
Nucleic Acids Res ; 51(W1): W535-W541, 2023 07 05.
Artigo em Inglês | MEDLINE | ID: mdl-37246709

RESUMO

Advances in genomics are increasingly depending upon the ability to analyze large and diverse genomic data collections, which are often difficult to amass due to privacy concerns. Recent works have shown that it is possible to jointly analyze datasets held by multiple parties, while provably preserving the privacy of each party's dataset using cryptographic techniques. However, these tools have been challenging to use in practice due to the complexities of the required setup and coordination among the parties. We present sfkit, a secure and federated toolkit for collaborative genomic studies, to allow groups of collaborators to easily perform joint analyses of their datasets without compromising privacy. sfkit consists of a web server and a command-line interface, which together support a range of use cases including both auto-configured and user-supplied computational environments. sfkit provides collaborative workflows for the essential tasks of genome-wide association study (GWAS) and principal component analysis (PCA). We envision sfkit becoming a one-stop server for secure collaborative tools for a broad range of genomic analyses. sfkit is open-source and available at: https://sfkit.org.


Assuntos
Estudo de Associação Genômica Ampla , Genômica , Software , Estudo de Associação Genômica Ampla/métodos , Genômica/métodos , Internet , Privacidade , Fluxo de Trabalho
4.
Genome Biol ; 24(1): 5, 2023 01 11.
Artigo em Inglês | MEDLINE | ID: mdl-36631897

RESUMO

Secure multiparty computation (MPC) is a cryptographic tool that allows computation on top of sensitive biomedical data without revealing private information to the involved entities. Here, we introduce Sequre, an easy-to-use, high-performance framework for developing performant MPC applications. Sequre offers a set of automatic compile-time optimizations that significantly improve the performance of MPC applications and incorporates the syntax of Python programming language to facilitate rapid application development. We demonstrate its usability and performance on various bioinformatics tasks showing up to 3-4 times increased speed over the existing pipelines with 7-fold reductions in codebase sizes.


Assuntos
Biologia Computacional , Disseminação de Informação
5.
Proc IEEE Symp Secur Priv ; 2023: 1908-1925, 2023 May.
Artigo em Inglês | MEDLINE | ID: mdl-38665901

RESUMO

Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.

7.
Comput Vis ECCV ; 13681: 661-678, 2022 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-37525827

RESUMO

The application of modern machine learning to retinal image analyses offers valuable insights into a broad range of human health conditions beyond ophthalmic diseases. Additionally, data sharing is key to fully realizing the potential of machine learning models by providing a rich and diverse collection of training data. However, the personallyidentifying nature of retinal images, encompassing the unique vascular structure of each individual, often prevents this data from being shared openly. While prior works have explored image de-identification strategies based on synthetic averaging of images in other domains (e.g. facial images), existing techniques face difficulty in preserving both privacy and clinical utility in retinal images, as we demonstrate in our work. We therefore introduce k-SALSA, a generative adversarial network (GAN)-based framework for synthesizing retinal fundus images that summarize a given private dataset while satisfying the privacy notion of k-anonymity. k-SALSA brings together state-of-the-art techniques for training and inverting GANs to achieve practical performance on retinal images. Furthermore, k-SALSA leverages a new technique, called local style alignment, to generate a synthetic average that maximizes the retention of fine-grain visual patterns in the source images, thus improving the clinical utility of the generated images. On two benchmark datasets of diabetic retinopathy (EyePACS and APTOS), we demonstrate our improvement upon existing methods with respect to image fidelity, classification performance, and mitigation of membership inference attacks. Our work represents a step toward broader sharing of retinal images for scientific collaboration. Code is available at https://github.com/hcholab/k-salsa.

8.
IEEE Trans Inf Theory ; 68(6): 4090-4105, 2022 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-37283781

RESUMO

Motivated by the growing availability of personal genomics services, we study an information-theoretic privacy problem that arises when sharing genomic data: a user wants to share his or her genome sequence while keeping the genotypes at certain positions hidden, which could otherwise reveal critical health-related information. A straightforward solution of erasing (masking) the chosen genotypes does not ensure privacy, because the correlation between nearby positions can leak the masked genotypes. We introduce an erasure-based privacy mechanism with perfect information-theoretic privacy, whereby the released sequence is statistically independent of the sensitive genotypes. Our mechanism can be interpreted as a locally-optimal greedy algorithm for a given processing order of sequence positions, where utility is measured by the number of positions released without erasure. We show that finding an optimal order is NP-hard in general and provide an upper bound on the optimal utility. For sequences from hidden Markov models, a standard modeling approach in genetics, we propose an efficient algorithmic implementation of our mechanism with complexity polynomial in sequence length. Moreover, we illustrate the robustness of the mechanism by bounding the privacy leakage from erroneous prior distributions. Our work is a step towards more rigorous control of privacy in genomic data sharing.

11.
Nat Commun ; 12(1): 5910, 2021 10 11.
Artigo em Inglês | MEDLINE | ID: mdl-34635645

RESUMO

Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data. We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics. Using our system, we accurately and efficiently reproduce two published centralized studies in a federated setting, enabling biomedical insights that are not possible from individual institutions alone. Our work represents a necessary key step towards overcoming the privacy hurdle in enabling multi-centric scientific collaborations.


Assuntos
Medicina de Precisão , Privacidade , Algoritmos , Segurança Computacional , Atenção à Saúde , Estudo de Associação Genômica Ampla , Humanos , Estimativa de Kaplan-Meier , Análise de Sobrevida
12.
Cell Syst ; 12(10): 983-993.e7, 2021 10 20.
Artigo em Inglês | MEDLINE | ID: mdl-34450045

RESUMO

Genotype imputation is an essential tool in genomics research, whereby missing genotypes are inferred using reference genomes to enhance downstream analyses. Recently, public imputation servers have allowed researchers to leverage large-scale genomic data resources for imputation. However, privacy concerns about uploading one's genetic data to a server limit the utility of these services. We introduce a secure hardware-based solution for privacy-preserving genotype imputation, which keeps the input genomes private by processing them within Intel SGX's trusted execution environment. Our solution features SMac, an efficient and secure imputation algorithm designed for Intel SGX, which employs a state-of-the-art imputation strategy also utilized by existing imputation servers. SMac achieves imputation accuracy equivalent to existing tools and provides protection against known side-channel attacks on SGX while maintaining scalability. We also show the necessity of our enhanced security by identifying vulnerabilities in existing imputation software. Our work represents a step toward privacy-preserving genomic analysis services.


Assuntos
Genômica , Privacidade , Algoritmos , Genótipo , Software
13.
Bioinformatics ; 37(Suppl_1): i349-i357, 2021 07 12.
Artigo em Inglês | MEDLINE | ID: mdl-34252956

RESUMO

MOTIVATION: Recent advances in single-cell RNA-sequencing (scRNA-seq) technologies promise to enable the study of gene regulatory associations at unprecedented resolution in diverse cellular contexts. However, identifying unique regulatory associations observed only in specific cell types or conditions remains a key challenge; this is particularly so for rare transcriptional states whose sample sizes are too small for existing gene regulatory network inference methods to be effective. RESULTS: We present ShareNet, a Bayesian framework for boosting the accuracy of cell type-specific gene regulatory networks by propagating information across related cell types via an information sharing structure that is adaptively optimized for a given single-cell dataset. The techniques we introduce can be used with a range of general network inference algorithms to enhance the output for each cell type. We demonstrate the enhanced accuracy of our approach on three benchmark scRNA-seq datasets. We find that our inferred cell type-specific networks also uncover key changes in gene associations that underpin the complex rewiring of regulatory networks across cell types, tissues and dynamic biological processes. Our work presents a path toward extracting deeper insights about cell type-specific gene regulation in the rapidly growing compendium of scRNA-seq datasets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. AVAILABILITY AND IMPLEMENTATION: The code for ShareNet is available at http://sharenet.csail.mit.edu and https://github.com/alexw16/sharenet.


Assuntos
Perfilação da Expressão Gênica , Análise de Célula Única , Teorema de Bayes , Disseminação de Informação , Análise de Sequência de RNA , Software
14.
Nat Biotechnol ; 39(6): 765-774, 2021 06.
Artigo em Inglês | MEDLINE | ID: mdl-33462509

RESUMO

Nonlinear data visualization methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), summarize the complex transcriptomic landscape of single cells in two dimensions or three dimensions, but they neglect the local density of data points in the original space, often resulting in misleading visualizations where densely populated subsets of cells are given more visual space than warranted by their transcriptional diversity in the dataset. Here we present den-SNE and densMAP, which are density-preserving visualization tools based on t-SNE and UMAP, respectively, and demonstrate their ability to accurately incorporate information about transcriptomic variability into the visual interpretation of single-cell RNA sequencing data. Applied to recently published datasets, our methods reveal significant changes in transcriptomic variability in a range of biological processes, including heterogeneity in transcriptomic variability of immune cells in blood and tumor, human immune cell specialization and the developmental trajectory of Caenorhabditis elegans. Our methods are readily applicable to visualizing high-dimensional data in other scientific domains.


Assuntos
Visualização de Dados , Análise de Célula Única , Transcriptoma , Algoritmos , Perfilação da Expressão Gênica/métodos , Humanos , Análise de Componente Principal
15.
Cell Syst ; 10(5): 408-416.e9, 2020 05 20.
Artigo em Inglês | MEDLINE | ID: mdl-32359425

RESUMO

Sharing data across research groups is an essential driver of biomedical research. While interactive query-answering systems for biomedical databases aim to facilitate the sharing of aggregate insights without divulging sensitive individual-level data, query answers can still leak private information about the individuals in the database. Here, we draw upon recent advances in differential privacy to introduce query-answering mechanisms that provably maximize the utility (e.g., accuracy) of the system while achieving formal privacy guarantees. We demonstrate our accuracy improvement over existing approaches for a range of use cases, including cohort discovery, variant lookup, and association testing. Our new theoretical results extend the proof of optimality of the underlying mechanism, previously known only for count queries with symmetric utility functions, to more general utility functions needed for key biomedical research workflows. Our work presents a path toward interactive biomedical databases that achieve the optimal privacy-utility trade-offs permitted by the theory of differential privacy.


Assuntos
Disseminação de Informação/ética , Disseminação de Informação/métodos , Bases de Dados Factuais , Genômica/métodos , Humanos , Prontuários Médicos , Privacidade
16.
Genome Biol ; 20(1): 128, 2019 07 02.
Artigo em Inglês | MEDLINE | ID: mdl-31262363

RESUMO

As the scale of genomic and health-related data explodes and our understanding of these data matures, the privacy of the individuals behind the data is increasingly at stake. Traditional approaches to protect privacy have fundamental limitations. Here we discuss emerging privacy-enhancing technologies that can enable broader data sharing and collaboration in genomics research.


Assuntos
Privacidade Genética , Genômica/ética , Disseminação de Informação/ética , Genoma Humano , Humanos
17.
Cell Syst ; 8(6): 483-493.e7, 2019 06 26.
Artigo em Inglês | MEDLINE | ID: mdl-31176620

RESUMO

Large-scale single-cell RNA sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a dataset using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks, such as scRNA-seq data integration, while maintaining accuracy. We anticipate our algorithm will become an increasingly essential step when sharing and analyzing the rapidly growing volume of scRNA-seq data and help enable the democratization of single-cell omics.


Assuntos
Análise de Célula Única/métodos , Transcriptoma , Algoritmos , Animais , Análise de Dados , Conjuntos de Dados como Assunto , Heterogeneidade Genética , Humanos , Macrófagos , RNA-Seq
18.
Proc Mach Learn Res ; 89: 1832-1840, 2019 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-32832915

RESUMO

Representing data in hyperbolic space can effectively capture latent hierarchical relationships. To enable accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and describe its theoretical connection to the Euclidean counterpart. We also generalize Euclidean kernel SVM to hyperbolic space, allowing nonlinear hyperbolic decision boundaries and providing a geometric interpretation for a certain class of indefinite kernels. Hyperbolic SVM improves classification accuracy in simulation and in real-world problems involving complex networks and word embeddings. Our work enables end-to-end analyses based on the inherent hyperbolic geometry of the data without resorting to ill-fitting tools developed for Euclidean space.

19.
Science ; 362(6412): 347-350, 2018 10 19.
Artigo em Inglês | MEDLINE | ID: mdl-30337410

RESUMO

Although combining data from multiple entities could power life-saving breakthroughs, open sharing of pharmacological data is generally not viable because of data privacy and intellectual property concerns. To this end, we leverage modern cryptographic tools to introduce a computational protocol for securely training a predictive model of drug-target interactions (DTIs) on a pooled dataset that overcomes barriers to data sharing by provably ensuring the confidentiality of all underlying drugs, targets, and observed interactions. Our protocol runs within days on a real dataset of more than 1 million interactions and is more accurate than state-of-the-art DTI prediction methods. Using our protocol, we discover previously unidentified DTIs that we experimentally validated via targeted assays. Our work lays a foundation for more effective and cooperative biomedical research.


Assuntos
Confidencialidade , Bases de Dados de Produtos Farmacêuticos/legislação & jurisprudência , Sistemas de Liberação de Medicamentos , Disseminação de Informação/legislação & jurisprudência , Disseminação de Informação/métodos , Farmacologia/legislação & jurisprudência , Simulação por Computador , Humanos
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA