Search | Virtual Health Library (Biblioteca Virtual em Saúde)

Finding Highly Similar Regions of Genomic Sequences Through Homomorphic Encryption.

Bataa, Magsarjav; Song, Siwoo; Park, Kunsoo; Kim, Miran; Cheon, Jung Hee; Kim, Sun.

J Comput Biol ; 31(3): 197-212, 2024 03.

Article in English | MEDLINE | ID: mdl-38531050

ABSTRACT

Finding highly similar regions of genomic sequences is a basic computation in genomic analysis. Genomic analyses on large amounts of data can be processed efficiently in cloud environments, but outsourcing them to a cloud raises privacy and security concerns. Homomorphic encryption (HE) is a powerful cryptographic primitive that preserves the privacy of genomic data in various analyses processed in an untrusted cloud environment. We introduce an efficient algorithm for finding highly similar regions of two homomorphically encrypted sequences, and describe how to implement it using the bit-wise and word-wise HE schemes. In the experiment, our algorithm outperforms an existing algorithm by up to two orders of magnitude in elapsed time. Overall, it finds highly similar regions of sequences in real data sets in feasible time.
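The abstract does not define "highly similar" precisely. As a plaintext reference only, the sketch below assumes two already-aligned, equal-length sequences and reports every window of length w with at most t mismatches. The function name and the window/threshold definition are assumptions, not the paper's algorithm. A prefix sum of mismatch indicators gives each window's count in constant time. Under HE, the per-position mismatch indicator is the part that would be computed on ciphertexts, for example as an XOR in a bit-wise scheme.

```python
from typing import List, Tuple


def similar_windows(a: str, b: str, w: int, t: int) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every length-w window of the aligned
    sequences a and b with at most t mismatching positions.
    The window/threshold definition is an assumption for illustration."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    n = len(a)
    # prefix[i] = number of mismatching positions in a[:i] versus b[:i]
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + (1 if a[i] != b[i] else 0)
    hits = []
    for start in range(n - w + 1):
        mismatches = prefix[start + w] - prefix[start]
        if mismatches <= t:
            hits.append((start, mismatches))
    return hits


print(similar_windows("ACGTACGTAC", "ACGTTCGTAA", w=4, t=0))
# [(0, 0), (5, 0)]
```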

Subjects

Computer Security , Genomics , Algorithms

Secure tumor classification by shallow neural network using homomorphic encryption.

Hong, Seungwan; Park, Jai Hyun; Cho, Wonhee; Choe, Hyeongmin; Cheon, Jung Hee.

BMC Genomics ; 23(1): 284, 2022 Apr 09.

Article in English | MEDLINE | ID: mdl-35395714

ABSTRACT

BACKGROUND: Disclosure of patients' genetic information when machine learning techniques are applied to tumor classification compromises the privacy of personal information. Homomorphic encryption (HE), which supports operations on encrypted data, is one tool for performing such computation without information leakage. However, the limited set of operations that HE supports makes it very challenging to apply general machine learning algorithms directly. In particular, non-polynomial activation functions, including the softmax function, are difficult to implement with HE and require a suitable approximation method to minimize the loss of accuracy. One task of the secure genome analysis competition iDASH 2020 was multi-label tumor classification that predicts the class of samples from their genetic information using HE. METHODS: We develop a secure multi-label tumor classification method using HE to ensure privacy during all computations of the model inference process. Our solution is based on a 1-layer neural network model with softmax activation and uses an approximate HE scheme. We present an approximation method that enables softmax activation in the model under HE, and a technique for encoding data efficiently to reduce computational costs. In addition, we propose an HE-friendly data filtering method to reduce the size of large-scale genetic data. RESULTS: We analyze a dataset from The Cancer Genome Atlas (TCGA) consisting of 3,622 samples from 11 cancer types, with genetic features from 25,128 genes. Our preprocessing method reduces the number of genes to 4,096 or fewer and achieves a microAUC value of 0.9882 (85% accuracy) with a 1-layer shallow neural network. Using our model, we successfully compute the tumor classification inference steps on the encrypted test data in 3.75 minutes.
On the strength of its exceptionally high microAUC value, our solution was awarded co-first place in iDASH 2020 Track 1: "Secure multi-label Tumor classification using Homomorphic Encryption". CONCLUSIONS: Our solution is the first implementation of a neural network model with softmax activation using HE. In addition, the HE optimization methods presented in this work enable other machine learning implementations using HE, as well as other challenging HE applications.
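HE natively supports only additions and multiplications, so the exponential inside softmax has to be replaced by a polynomial. One well-known HE-friendly substitute is e^x ≈ (1 + x/2^r)^(2^r), which costs r squarings. It is shown here purely as an illustration and is not necessarily the approximation the authors use. The helper names are hypothetical, and the final division is left in plaintext form.

```python
import math
from typing import Callable, List


def approx_exp(x: float, r: int = 6) -> float:
    """Approximate e^x by (1 + x / 2^r)^(2^r): one addition, one constant
    multiplication and r squarings. Accurate only for |x| well below 2^r."""
    y = 1.0 + x / (2 ** r)
    for _ in range(r):
        y = y * y
    return y


def softmax(z: List[float], exp_fn: Callable[[float], float] = math.exp) -> List[float]:
    """Softmax with a pluggable exponential. The final division would also
    need an HE-friendly approximation, which this sketch does not show."""
    e = [exp_fn(v) for v in z]
    total = sum(e)
    return [v / total for v in e]


logits = [2.0, 1.0, 0.1]
exact = softmax(logits)
approx = softmax(logits, approx_exp)
print(max(abs(p - q) for p, q in zip(exact, approx)) < 0.01)  # True
```

With r = 6, the approximate softmax of [2.0, 1.0, 0.1] stays within 0.01 of the exact one and preserves the predicted class. The approximation degrades as |x| approaches 2^r, so inputs are normally scaled first.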

Subjects

Computer Security , Privacy , Algorithms , Genome-Wide Association Study , Humans , Neural Networks, Computer

Ultrafast homomorphic encryption models enable secure outsourcing of genotype imputation.

Kim, Miran; Harmanci, Arif Ozgun; Bossuat, Jean-Philippe; Carpov, Sergiu; Cheon, Jung Hee; Chillotti, Ilaria; Cho, Wonhee; Froelicher, David; Gama, Nicolas; Georgieva, Mariya; Hong, Seungwan; Hubaux, Jean-Pierre; Kim, Duhyeong; Lauter, Kristin; Ma, Yiping; Ohno-Machado, Lucila; Sofia, Heidi; Son, Yongha; Song, Yongsoo; Troncoso-Pastoriza, Juan; Jiang, Xiaoqian.

Cell Syst ; 12(11): 1108-1120.e4, 2021 11 17.

Article in English | MEDLINE | ID: mdl-34464590

ABSTRACT

Genotype imputation is a fundamental step in genomic data analysis, in which missing variant genotypes are predicted from the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit sharing genetic data with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data remain secure in transit, at rest, and during analysis, and can be decrypted only by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security with comparable accuracy for common variants. HE-based methods have time and memory requirements comparable to or lower than those of the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.
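A common HE-friendly way to impute a missing variant is a linear model over the genotypes of nearby tag variants. With plaintext weights and encrypted genotypes, the score needs only additions and plaintext-ciphertext multiplications. The sketch below assumes that setup. The weights, bias, and function names are hypothetical, and it is not the specific model evaluated in the paper.

```python
from typing import Sequence


def impute_score(tags: Sequence[int], weights: Sequence[float], bias: float) -> float:
    """Linear score for one target variant from the genotypes (0, 1 or 2)
    of its nearby tag variants. Only additions and multiplications by
    plaintext weights are used, which is what makes the model HE-friendly."""
    if len(tags) != len(weights):
        raise ValueError("expected one weight per tag variant")
    return bias + sum(w * g for w, g in zip(weights, tags))


def call_genotype(score: float) -> int:
    """Round a decrypted score to the nearest genotype in {0, 1, 2}.
    In an HE workflow this step would run on the data owner's side."""
    return min(2, max(0, round(score)))


weights = [0.5, 0.3, 0.2]  # hypothetical trained weights
print(call_genotype(impute_score([2, 2, 1], weights, bias=0.0)))  # 2
print(call_genotype(impute_score([0, 0, 1], weights, bias=0.0)))  # 0
```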

Subjects

Outsourced Services , Computer Security , Genome-Wide Association Study , Genotype , Privacy

Privacy-preserving approximate GWAS computation based on homomorphic encryption.

Kim, Duhyeong; Son, Yongha; Kim, Dongwoo; Kim, Andrey; Hong, Seungwan; Cheon, Jung Hee.

BMC Med Genomics ; 13(Suppl 7): 77, 2020 07 21.

Article in English | MEDLINE | ID: mdl-32693801

ABSTRACT

BACKGROUND: One of the three tasks in the secure genome analysis competition iDASH 2018 was to develop a solution for privacy-preserving GWAS computation based on homomorphic encryption. In this scenario, a data holder encrypts a number of individual records, each consisting of phenotype and genotype data, and provides the encrypted data to an untrusted server. The server then runs a GWAS algorithm on the encrypted data without the decryption key and outputs the result in encrypted form, so that no information about the sensitive data leaks to the server. METHODS: We develop a privacy-preserving semi-parallel GWAS algorithm by applying the approximate homomorphic encryption scheme HEAAN. The Fisher scoring and semi-parallel GWAS algorithms are modified so that they can be computed efficiently over homomorphically encrypted data, using several optimizations. We substitute an adjoint matrix for matrix inversion, avoid computing a superfluous matrix of very large size, and transform the algorithm into an approximate version. RESULTS: Our modified semi-parallel GWAS algorithm, based on homomorphic encryption at 128-bit security, takes 30-40 minutes for 245 samples containing 10,000-15,000 SNPs. Compared to the true p-values from the original semi-parallel GWAS algorithm, the F1 score of our p-value results is over 0.99. CONCLUSIONS: Privacy-preserving semi-parallel GWAS computation can be done efficiently with homomorphic encryption, with sufficiently high accuracy compared to semi-parallel GWAS computation on unencrypted data.
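The adjoint-matrix substitution works because A · adj(A) = det(A) · I. It follows that A^(-1) = adj(A) / det(A), and both adj(A) and det(A) are polynomials in the entries of A, so HE can evaluate them. The sketch below checks this identity for a 3x3 matrix using exact fractions. The 3x3 size and the helper names are illustrative assumptions, and how the paper handles the remaining division by det(A) is not shown.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def det3(m: Matrix) -> Fraction:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def adj3(m: Matrix) -> Matrix:
    """Adjugate (transposed cofactor matrix) of a 3x3 matrix: only
    additions and multiplications, so it can be evaluated under HE."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]


def matmul3(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


A = [[Fraction(v) for v in row] for row in [[4, 1, 0], [1, 3, 1], [0, 1, 2]]]
d = det3(A)
print(d)  # 18
# A * adj(A) == det(A) * I, so the inverse is adj(A) scaled by 1/det(A)
print(matmul3(A, adj3(A)) == [[d if r == c else 0 for c in range(3)] for r in range(3)])  # True
inverse = [[entry / d for entry in row] for row in adj3(A)]
```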

Subjects

Computer Security , Genome-Wide Association Study , Privacy , Algorithms , Genomics , Humans , Polymorphism, Single Nucleotide

A secure SNP panel scheme using homomorphically encrypted K-mers without SNP calling on the user side.

Park, Sungjoon; Kim, Minsu; Seo, Seokjun; Hong, Seungwan; Han, Kyoohyung; Lee, Keewoo; Cheon, Jung Hee; Kim, Sun.

BMC Genomics ; 20(Suppl 2): 188, 2019 Apr 04.

Article in English | MEDLINE | ID: mdl-30967116

ABSTRACT

BACKGROUND: Single nucleotide polymorphisms (SNPs) in the genome have become crucial information for clinical use. For example, targeted cancer therapy relies primarily on information about which clinically important SNPs are detectable in the tumor. Many hospitals have developed their own panels of clinically important SNPs, and exchanging genome information between patients and hospitals has become more common. However, genome sequence information is innate and cannot be changed, so its leakage has serious consequences. Protecting one's genome information is therefore critical. On the other hand, hospitals may need to protect their own panels, and there is no known secure SNP panel scheme that protects both. RESULTS: In this paper, we propose a secure SNP panel scheme using homomorphically encrypted K-mers that requires no SNP calling on the user side and does not reveal the panel information to the user. The powerful homomorphic encryption technique is desirable here, but there is no known algorithm to efficiently align two homomorphically encrypted sequences. We therefore designed and implemented a novel secure SNP panel scheme that uses the computationally feasible equality test on two homomorphically encrypted K-mers. For the scheme to work correctly, sequence variations at the population level must be addressed in addition to the SNPs in the panel. We designed the concept of a Point Deviation Tolerance (PDT) level to address false positives and false negatives. Using the TCGA BRCA dataset, we demonstrated that our scheme works at the scale of over a hundred thousand somatic mutations. We also provide a computational guideline for panel design, including the K-mer size and the number of SNPs. CONCLUSIONS: The proposed method is the first of its kind to protect both the user's sequence and the hospital's panel information using the powerful homomorphic encryption scheme.
We demonstrated that the scheme works with a simulated dataset and the TCGA BRCA dataset. This study shows only the feasibility of the proposed scheme, and much more effort is needed to make it usable in clinical practice.
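The equality test on two K-mers can be expressed with the additions and multiplications that HE supports. Assume each base is encoded in 2 bits; the encoding here is hypothetical. Two K-mers are then equal exactly when the product of (1 - (x_i - y_i)^2) over all bit positions is 1. The sketch evaluates that circuit in plaintext. It illustrates the general technique, not the paper's implementation, and it omits the PDT logic.

```python
from typing import List

# Hypothetical 2-bit encoding of nucleotides
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode_kmer(kmer: str) -> List[int]:
    return [bit for base in kmer for bit in BASE_BITS[base]]


def eq_circuit(x: List[int], y: List[int]) -> int:
    """1 if the bit vectors are equal, else 0, computed as
    prod_i (1 - (x_i - y_i)^2); for bits, (x_i - y_i)^2 is 1 exactly when
    x_i != y_i, so any mismatch zeroes the product."""
    if len(x) != len(y):
        raise ValueError("K-mers must have the same length")
    result = 1
    for xi, yi in zip(x, y):
        diff = xi - yi
        result *= 1 - diff * diff
    return result


def kmers(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


panel_kmer = encode_kmer("CGTA")  # hypothetical panel entry
print([eq_circuit(encode_kmer(q), panel_kmer) for q in kmers("ACGTAC", 4)])
# [0, 1, 0]
```

This loop multiplies the factors sequentially. Multiplying them pairwise in a balanced tree instead keeps the multiplicative depth at about log2(2K), which matters under HE.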

Subjects

Cloud Computing/standards , Computer Security , Data Mining/methods , Genomics/methods , Polymorphism, Single Nucleotide , Algorithms , Humans

Logistic regression model training based on the approximate homomorphic encryption.

Kim, Andrey; Song, Yongsoo; Kim, Miran; Lee, Keewoo; Cheon, Jung Hee.

BMC Med Genomics ; 11(Suppl 4): 83, 2018 Oct 11.

Article in English | MEDLINE | ID: mdl-30309349

ABSTRACT

BACKGROUND: Security concerns have been raised since big data became a prominent tool in data analysis. For instance, many machine learning algorithms aim to generate prediction models from training data that contain sensitive information about individuals. The cryptography community is considering secure computation as a solution for privacy protection, and practical requirements have triggered research on the efficiency of cryptographic primitives. METHODS: This paper presents a method to train a logistic regression model without information leakage. We apply the homomorphic encryption scheme of Cheon et al. (ASIACRYPT 2017) for efficient arithmetic over real numbers, and devise a new encoding method to reduce the storage of the encrypted database. In addition, we adapt Nesterov's accelerated gradient method to reduce the number of iterations and the computational cost while maintaining the quality of the output classifier. RESULTS: Our method shows state-of-the-art performance for a homomorphic encryption system in a real-world application. The submission based on this work was selected as the best solution of Track 3 at the iDASH privacy and security competition 2017. For example, it took about six minutes to obtain a logistic regression model given a dataset of 1579 samples, each with 18 features and a binary outcome variable. CONCLUSIONS: We present a practical solution for outsourcing analysis tools such as logistic regression while preserving data confidentiality.
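The sketch below shows the shape of the training loop in plaintext. HE cannot evaluate 1/(1 + e^-x) directly, so the sigmoid is replaced by a low-degree polynomial, and the update uses Nesterov's accelerated gradient. The following are all assumptions of this sketch rather than the paper's parameters: the polynomial coefficients (a degree-3 least-squares fit on [-8, 8]), the momentum schedule t/(t + 3), the learning rate, and all names. Refit the coefficients before relying on them.

```python
from typing import List, Sequence


def sigmoid_poly3(x: float) -> float:
    """Assumed degree-3 least-squares fit of 1 / (1 + e^-x) on [-8, 8];
    only meaningful for inputs in roughly that range."""
    return 0.5 + 0.15012 * x - 0.001593 * x ** 3


def gradient(w: List[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Gradient of the average logistic loss, with the sigmoid replaced by
    its polynomial approximation."""
    n = len(X)
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sigmoid_poly3(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j, xj in enumerate(xi):
            g[j] += err * xj / n
    return g


def train_nesterov(X, y, lr: float = 1.0, iterations: int = 10) -> List[float]:
    d = len(X[0])
    w = [0.0] * d  # look-ahead point where the gradient is evaluated
    v = [0.0] * d  # previous plain gradient-step iterate
    for t in range(iterations):
        g = gradient(w, X, y)
        v_next = [wj - lr * gj for wj, gj in zip(w, g)]
        mu = t / (t + 3)  # momentum schedule (an assumption of this sketch)
        w = [vn + mu * (vn - vo) for vn, vo in zip(v_next, v)]
        v = v_next
    return w


# Each row is [1 (bias), feature]; labels are 0/1.
X = [[1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0]]
y = [0, 0, 1, 1]
w = train_nesterov(X, y)
print([1 if w[0] + w[1] * x[1] > 0 else 0 for x in X])  # [0, 0, 1, 1]
```

On this tiny separable dataset, ten iterations are enough to classify every sample correctly. The cubic approximation is only meaningful for inputs in roughly [-8, 8], so features are normally scaled before training.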

Subjects

Computer Security , Models, Theoretical , Databases as Topic , Logistic Models

A Full RNS Variant of Approximate Homomorphic Encryption.

Cheon, Jung Hee; Han, Kyoohyung; Kim, Andrey; Kim, Miran; Song, Yongsoo.

Sel Areas Cryptogr ; 11349: 347-368, 2018.

Article in English | MEDLINE | ID: mdl-33870337

ABSTRACT

Homomorphic encryption technology has improved rapidly over the past few years, and cutting-edge implementations are efficient enough for practical applications. Recently, Cheon et al. (ASIACRYPT'17) proposed a homomorphic encryption scheme that supports arithmetic on approximate numbers over encryption. This scheme shows the current best performance in computation over the real numbers. However, its implementation could not employ core optimization techniques based on Residue Number System (RNS) decomposition and the Number Theoretic Transformation (NTT). In this paper, we present a variant of approximate homomorphic encryption that is optimal for implementation on standard computer systems. We first introduce a new structure of ciphertext modulus that allows us to use both the RNS decomposition of cyclotomic polynomials and the NTT conversion on each of the RNS components. We also suggest new approximate modulus switching procedures without any RNS composition. Previous exact algorithms required multi-precision arithmetic, whereas our algorithms can be performed using only word-size (64-bit) operations. Our scheme achieves a significant performance gain from its full RNS implementation. For example, when the dimension of the cyclotomic ring is 32768, our implementation showed speed-ups of 17.3, 6.4, and 8.3 times over the earlier implementation for decryption, constant multiplication, and homomorphic multiplication, respectively. We also give experimental results for the evaluation of some advanced circuits used in machine learning and statistical analysis. Finally, we demonstrate the practicability of our library by applying it to a machine learning algorithm. For example, our single-core implementation takes 1.8 minutes to build a logistic regression model from encrypted data when the dataset consists of 575 samples, compared to the previous best result of 3.5 minutes using four cores.
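The idea behind RNS is to store an integer modulo Q = q_1 · q_2 · ... · q_k as its residues modulo each q_i. Multiplication then happens independently on each residue in word-size arithmetic, and the Chinese Remainder Theorem recovers the integer. The toy sketch below uses three small primes chosen for illustration. It does not show the cyclotomic-polynomial structure, the NTT, or the approximate modulus switching described in the paper.

```python
from math import prod
from typing import List


def to_rns(x: int, moduli: List[int]) -> List[int]:
    """Represent x modulo Q = prod(moduli) by its residues."""
    return [x % q for q in moduli]


def rns_mul(a: List[int], b: List[int], moduli: List[int]) -> List[int]:
    """Multiply component-wise: each residue product only needs arithmetic
    modulo a single small q_i, never modulo the full Q."""
    return [(ai * bi) % q for ai, bi, q in zip(a, b, moduli)]


def from_rns(residues: List[int], moduli: List[int]) -> int:
    """Chinese Remainder Theorem reconstruction (moduli pairwise coprime)."""
    big_q = prod(moduli)
    x = 0
    for r, q in zip(residues, moduli):
        q_hat = big_q // q
        x += r * q_hat * pow(q_hat, -1, q)  # modular inverse; Python 3.8+
    return x % big_q


moduli = [97, 101, 103]  # toy primes; real schemes use word-size NTT-friendly primes
a, b = 123456, 7890
product = from_rns(rns_mul(to_rns(a, moduli), to_rns(b, moduli), moduli), moduli)
print(product == (a * b) % prod(moduli))  # True
```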

Secure searching of biomarkers through hybrid homomorphic encryption scheme.

Kim, Miran; Song, Yongsoo; Cheon, Jung Hee.

BMC Med Genomics ; 10(Suppl 2): 42, 2017 07 26.

Article in English | MEDLINE | ID: mdl-28786366

ABSTRACT

BACKGROUND: As genome sequencing technology develops rapidly, there is an increasing need to keep genomic data secure even when they are stored in the cloud and still used for research. We are interested in designing a protocol for securely outsourcing the matching problem on encrypted data. METHOD: We propose an efficient method to securely search for the position matching the query data and extract information at that position. After decryption, only a small number of comparisons with the query information need to be performed in plaintext. We apply this method to find a set of biomarkers in encrypted genomes. The key feature of our method is encoding a genomic database as a single element of a polynomial ring. RESULT: Our method requires only a single homomorphic multiplication of the hybrid scheme for query computation. It therefore has advantages over previous methods in parameter size, computational complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that the user has not queried but also halves the communication cost. We evaluate the performance of our method and verify that computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search for and extract the reference and alternate sequences at the queried position in a database of size 4M. CONCLUSION: Our solution for finding a set of biomarkers in DNA sequences shows that cryptographic techniques have progressed to the point where they can support real-world genome data analysis in a cloud environment.
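A single-polynomial encoding is convenient partly because of a property of the ring Z[X]/(X^N + 1). Multiplying by a monomial X^k shifts coefficients, and because X^N = -1, a coefficient that wraps around flips sign. Multiplying by X^(2N - i) therefore moves coefficient i to the constant term. The sketch below demonstrates this on plaintext coefficients only, as a general illustration. The paper's exact encoding and its hybrid-scheme extraction procedure are not modeled, and the function names are hypothetical.

```python
from typing import List


def mul_monomial(a: List[int], k: int) -> List[int]:
    """Multiply a(X) = sum a_j X^j by X^k in Z[X]/(X^N + 1).
    Since X^N = -1 (and X^(2N) = 1), a term that lands on degree e with
    N <= e < 2N becomes -X^(e - N)."""
    n = len(a)
    out = [0] * n
    for j, c in enumerate(a):
        e = (j + k) % (2 * n)
        if e >= n:
            out[e - n] -= c
        else:
            out[e] += c
    return out


def coefficient_to_front(a: List[int], i: int) -> List[int]:
    """Multiply by X^(2N - i) = X^(-i) so coefficient i becomes the constant term."""
    n = len(a)
    return mul_monomial(a, (2 * n - i) % (2 * n))


db = [5, 7, 11, 13]  # hypothetical database values packed as coefficients
print(mul_monomial([1, 2, 3, 4], 1))  # [-4, 1, 2, 3]
print(coefficient_to_front(db, 2)[0])  # 11
```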

Subjects

Computer Security , Data Mining/methods , Genomics , Biomarkers/metabolism , Cloud Computing
