Search | BVS España Search Portal

Sketching algorithms for genomic data analysis and querying in a secure enclave.

Kockan, Can; Zhu, Kaiyuan; Dokmai, Natnatee; Karpov, Nikolai; Kulekci, M Oguzhan; Woodruff, David P; Sahinalp, S Cenk.

Nat Methods ; 17(3): 295-301, 2020 03.

Article in English | MEDLINE | ID: mdl-32132732

ABSTRACT

Genome-wide association studies (GWAS), especially on rare diseases, may necessitate exchange of sensitive genomic data between multiple institutions. Since genomic data sharing is often infeasible due to privacy concerns, cryptographic methods, such as secure multiparty computation (SMC) protocols, have been developed with the aim of offering privacy-preserving collaborative GWAS. Unfortunately, the computational overhead of these methods remains prohibitive for human-genome-scale data. Here we introduce SkSES (https://github.com/ndokmai/sgx-genome-variants-search), a hardware-software hybrid approach for privacy-preserving collaborative GWAS, which improves the running time of the most advanced cryptographic protocols by two orders of magnitude. The SkSES approach is based on trusted execution environments (TEEs) offered by current-generation microprocessors, in particular Intel's SGX. To overcome the severe memory limitation of the TEEs, SkSES employs novel 'sketching' algorithms that maintain essential statistical information on genomic variants in input VCF files. By additionally incorporating efficient data compression and population stratification reduction methods, SkSES identifies the top k genomic variants in a cohort quickly, accurately and in a privacy-preserving manner.
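To illustrate the kind of 'sketching' the abstract refers to, the following is a minimal Python sketch of a count-min sketch that tracks approximate per-variant counts in fixed memory and keeps a running top-k list. It is a generic textbook structure, not the SkSES implementation: the class and function names, the variant key format, the use of raw counts as the ranking statistic, and the `width`/`depth` defaults are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters; estimates never fall below the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One independent-looking hash per row, obtained by salting BLAKE2b.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # The minimum over rows limits the effect of hash collisions.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


def top_k_variants(variant_stream, k, width=1024, depth=4):
    """Return up to k (variant, estimated_count) pairs, largest first."""
    sketch = CountMinSketch(width, depth)
    candidates = {}  # variant -> latest estimate, never more than k entries
    for variant in variant_stream:
        sketch.add(variant)
        est = sketch.estimate(variant)
        if variant in candidates or len(candidates) < k:
            candidates[variant] = est
        else:
            weakest = min(candidates, key=candidates.get)
            if est > candidates[weakest]:
                del candidates[weakest]
                candidates[variant] = est
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
```

Memory use depends only on `width`, `depth` and `k`, not on the number of distinct variants in the stream, which is the property that makes this family of structures attractive when working memory is tightly limited. A real association study would rank variants by a test statistic rather than by raw counts.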

Subject(s)

Computational Biology/methods, Genome-Wide Association Study, Genomics/methods, Algorithms, Genetic Variation, Human Genome, Genotype, Humans, Phenotype, Single Nucleotide Polymorphism, Software

Structural variation and fusion detection using targeted sequencing data from circulating cell free DNA.

Gawronski, Alexander R; Lin, Yen-Yi; McConeghy, Brian; LeBihan, Stephane; Asghari, Hossein; Koçkan, Can; Orabi, Baraa; Adra, Nabil; Pili, Roberto; Collins, Colin C; Sahinalp, S Cenk; Hach, Faraz.

Nucleic Acids Res ; 47(7): e38, 2019 04 23.

Article in English | MEDLINE | ID: mdl-30759232

ABSTRACT

MOTIVATION: Cancer is a complex disease that involves rapidly evolving cells, often forming multiple distinct clones. In order to effectively understand progression of a patient-specific tumor, one needs to comprehensively sample tumor DNA at multiple time points, ideally obtained through inexpensive and minimally invasive techniques. Current sequencing technologies make the 'liquid biopsy' possible, which involves sampling a patient's blood or urine and sequencing the circulating cell free DNA (cfDNA). A certain percentage of this DNA originates from the tumor, known as circulating tumor DNA (ctDNA). The ratio of ctDNA may be extremely low in the sample, and the ctDNA may originate from multiple tumors or clones. These factors present unique challenges for applying existing tools and workflows to the analysis of ctDNA, especially in the detection of structural variations, which relies on sufficient read coverage.

RESULTS: Here we introduce SViCT, a structural variation (SV) detection tool designed to handle the challenges associated with cfDNA analysis. SViCT can detect breakpoints and sequences of various structural variations including deletions, insertions, inversions, duplications and translocations. SViCT extracts discordant read pairs, one-end anchors and soft-clipped/split reads, assembles them into contigs, and re-maps contig intervals to a reference genome using an efficient k-mer indexing approach. The intervals are then joined using a combination of graph and greedy algorithms to identify specific structural variant signatures. We assessed the performance of SViCT and compared it to state-of-the-art tools using simulated cfDNA datasets with properties matching those of real cfDNA samples. The positive predictive value and sensitivity of our tool were superior to those of all the tested tools, and reasonable performance was maintained down to the lowest dilution of 0.01% tumor DNA in simulated datasets. Additionally, SViCT was able to detect all known SVs in two real cfDNA reference datasets (at 0.6-5% ctDNA) and predict a novel structural variant in a prostate cancer cohort.

AVAILABILITY: SViCT is available at https://github.com/vpc-ccg/svict. Contact: faraz.hach@ubc.ca.
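The re-mapping step mentioned in the abstract can be pictured with a small Python sketch of k-mer indexing: every k-mer of the reference is stored with its positions, and a contig is placed at the reference offset that most of its k-mers agree on. This is a simplified illustration under stated assumptions, not SViCT's code: the function names, the exact-match voting scheme, and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def best_offset(contig, index, k):
    """Return (reference_offset, supporting_kmers) for the best placement, or None.

    Each contig k-mer found in the index votes for the reference position
    at which the contig would have to start for that k-mer to line up.
    """
    votes = Counter()
    for i in range(len(contig) - k + 1):
        for pos in index.get(contig[i:i + k], ()):
            votes[pos - i] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]
```

For example, with `reference = "ACGTTGCAAGGCTTACGGATCCATG"` and `k = 5`, a contig copied from `reference[8:20]` is placed at offset 8 with all eight of its k-mers in support. A contig spanning a breakpoint would instead split its votes between two offsets, which is the kind of signal a structural variant caller can build on.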

Subject(s)

Algorithms, Cell-Free Nucleic Acids/blood, Cell-Free Nucleic Acids/genetics, DNA Mutational Analysis/methods, Mutation, Circulating Tumor DNA/blood, Circulating Tumor DNA/genetics, Datasets as Topic, Humans, Male, Prostatic Neoplasms/genetics, Sensitivity and Specificity

SiNVICT: ultra-sensitive detection of single nucleotide variants and indels in circulating tumour DNA.

Kockan, Can; Hach, Faraz; Sarrafi, Iman; Bell, Robert H; McConeghy, Brian; Beja, Kevin; Haegert, Anne; Wyatt, Alexander W; Volik, Stanislav V; Chi, Kim N; Collins, Colin C; Sahinalp, S Cenk.

Bioinformatics ; 33(1): 26-34, 2017 01 01.

Article in English | MEDLINE | ID: mdl-27531099

ABSTRACT

MOTIVATION: Successful development and application of precision oncology approaches require robust elucidation of the genomic landscape of a patient's cancer and, ideally, the ability to monitor therapy-induced genomic changes in the tumour in an inexpensive and minimally invasive manner. Thanks to recent advances in sequencing technologies, 'liquid biopsy', the sampling of a patient's bodily fluids such as blood and urine, is considered one of the most promising approaches to achieve this goal. In many cancer patients, and especially those with advanced metastatic disease, deep sequencing of circulating cell free DNA (cfDNA) obtained from the patient's blood yields a mixture of reads originating from normal DNA and from multiple tumour subclones, the latter called circulating tumour DNA or ctDNA. The ctDNA/cfDNA ratio as well as the proportion of ctDNA originating from specific tumour subclones depend on multiple factors, making comprehensive detection of mutations difficult, especially at early stages of cancer. Furthermore, sensitive and accurate detection of single nucleotide variants (SNVs) and indels from cfDNA is constrained by several factors such as sequencing errors, PCR artifacts, and mapping errors related to repeat regions within the genome. In this article, we introduce SiNVICT, a computational method that increases the sensitivity and specificity of SNV and indel detection at very low variant allele frequencies. SiNVICT can handle multiple sequencing platforms with different error properties; it minimizes false positives resulting from mapping errors and other technology-specific artifacts including strand bias and low base quality at read ends. SiNVICT can also perform time-series analysis, where samples from a patient sequenced at multiple time points are jointly examined to report locations of interest where certain clones may have been wiped out by some treatment while some subclones gained selective advantage.

RESULTS: We tested SiNVICT on simulated data as well as prostate cancer cell lines and cfDNA obtained from castration-resistant prostate cancer patients. On both simulated and biological data, SiNVICT was able to detect SNVs and indels with variant allele percentages as low as 0.5%. The lowest amounts of total DNA used for the biological data where SNVs and indels could be detected with very high sensitivity were 2.5 ng on the Ion Torrent platform and 10 ng on Illumina. With increased sequencing and mapping accuracy, SiNVICT might be utilized in clinical settings, making it possible to track the progress of point mutations and indels that are associated with resistance to cancer therapies and provide patients with personalized treatment. We also compared SiNVICT with other popular SNV callers such as MuTect, VarScan2 and Freebayes. Our results show that SiNVICT performs better than these tools in most cases and allows further data exploration such as time-series analysis on cfDNA sequencing data.

AVAILABILITY AND IMPLEMENTATION: SiNVICT is available at: https://sfu-compbio.github.io/sinvict

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

CONTACT: cenk@sfu.ca.
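To make the difficulty of calling variants at allele percentages near 0.5% concrete, here is a hedged Python sketch of a generic low-frequency check: it computes the variant allele frequency (VAF) at a site and asks whether the number of alternate reads is larger than sequencing error alone would plausibly produce, using a one-sided binomial test. This is not SiNVICT's statistical model or filter cascade; the function names, the `error_rate`, `alpha` and `min_vaf` defaults, and the binomial error model are all assumptions chosen for illustration.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Natural log of P(X = i) for X ~ Binomial(n, p), computed in log space."""
    return (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log(1 - p)
    )


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    lower = sum(exp(log_binom_pmf(i, n, p)) for i in range(k))
    return max(0.0, 1.0 - lower)


def call_variant(alt_reads, depth, error_rate=0.001, alpha=1e-6, min_vaf=0.005):
    """Return (vaf, p_value, is_called) for a single site.

    A site is called when its VAF reaches min_vaf and the alternate read
    count is unlikely (p_value < alpha) under the assumed per-base error rate.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    vaf = alt_reads / depth
    p_value = binomial_tail(alt_reads, depth, error_rate)
    return vaf, p_value, (vaf >= min_vaf and p_value < alpha)
```

With the assumed error rate of 0.1%, a site covered by 10,000 reads is expected to show about 10 erroneous alternate reads, so 50 alternate reads (0.5% VAF) stand out clearly while 10 do not. The calculation also shows why deep coverage matters: at a depth of 200, a 0.5% variant corresponds to a single read.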

Subject(s)

DNA Mutational Analysis/methods, Neoplasm DNA/blood, INDEL Mutation, Neoplasms/genetics, Point Mutation, Software, Gene Frequency, High-Throughput Nucleotide Sequencing/methods, Humans, Male, Neoplasms/blood, Sensitivity and Specificity

Toolkit for automated and rapid discovery of structural variants.

Soylev, Arda; Kockan, Can; Hormozdiari, Fereydoun; Alkan, Can.

Methods ; 129: 3-7, 2017 10 01.

Article in English | MEDLINE | ID: mdl-28583483

ABSTRACT

Structural variations (SV) are broadly defined as genomic alterations that affect >50 bp of DNA, and they have been shown to have a significant effect on evolution and disease. The advent of high throughput sequencing (HTS) technologies and the ability to perform whole genome sequencing (WGS) make it feasible to study these variants in depth. However, discovery of all forms of SV using WGS has proven to be challenging, as the short reads produced by the predominant HTS platforms (<200 bp for current technologies) and the fact that most genomes include large amounts of repeats make it very difficult to unambiguously map and accurately characterize such variants. Furthermore, existing tools for SV discovery are primarily developed for only a few of the SV types, which may have conflicting sequence signatures (i.e. read pairs, read depth, split reads) with other, untargeted SV classes. Here we introduce a new framework, Tardis, which combines multiple read signatures into a single package to characterize most SV types simultaneously, while preventing such conflicts. Tardis also has a modular structure that makes it easy to extend for the discovery of additional forms of SV.

Subject(s)

Genomic Structural Variation/genetics, Genomics, High-Throughput Nucleotide Sequencing/methods, Software, Algorithms, Human Genome, High-Throughput Nucleotide Sequencing/trends, Humans, DNA Sequence Analysis, Whole Genome Sequencing

Privacy-preserving genotype imputation in a trusted execution environment.

Dokmai, Natnatee; Kockan, Can; Zhu, Kaiyuan; Wang, XiaoFeng; Sahinalp, S Cenk; Cho, Hyunghoon.

Cell Syst ; 12(10): 983-993.e7, 2021 10 20.

Article in English | MEDLINE | ID: mdl-34450045

ABSTRACT

Genotype imputation is an essential tool in genomics research, whereby missing genotypes are inferred using reference genomes to enhance downstream analyses. Recently, public imputation servers have allowed researchers to leverage large-scale genomic data resources for imputation. However, privacy concerns about uploading one's genetic data to a server limit the utility of these services. We introduce a secure hardware-based solution for privacy-preserving genotype imputation, which keeps the input genomes private by processing them within Intel SGX's trusted execution environment. Our solution features SMac, an efficient and secure imputation algorithm designed for Intel SGX, which employs a state-of-the-art imputation strategy also utilized by existing imputation servers. SMac achieves imputation accuracy equivalent to existing tools and provides protection against known side-channel attacks on SGX while maintaining scalability. We also show the necessity of our enhanced security by identifying vulnerabilities in existing imputation software. Our work represents a step toward privacy-preserving genomic analysis services.

Subject(s)

Genomics, Privacy, Algorithms, Genotype, Software

Privacy-Preserving Genotype Imputation in a Trusted Execution Environment.

Dokmai, Natnatee; Kockan, Can; Zhu, Kaiyuan; Wang, XiaoFeng; Sahinalp, S Cenk; Cho, Hyunghoon.

Res Comput Mol Biol ; 12(10): 983-993.e7, 2021 Oct 20.

Article in English | MEDLINE | ID: mdl-34859247
