Search | BVS Regional Portal

Recurrent repeat expansions in human cancer genomes.

Erwin, Graham S; Gürsoy, Gamze; Al-Abri, Rashid; Suriyaprakash, Ashwini; Dolzhenko, Egor; Zhu, Kevin; Hoerner, Christian R; White, Shannon M; Ramirez, Lucia; Vadlakonda, Ananya; Vadlakonda, Alekhya; von Kraut, Konor; Park, Julia; Brannon, Charlotte M; Sumano, Daniel A; Kirtikar, Raushun A; Erwin, Alicia A; Metzner, Thomas J; Yuen, Ryan K C; Fan, Alice C; Leppert, John T; Eberle, Michael A; Gerstein, Mark; Snyder, Michael P.

Nature ; 613(7942): 96-102, 2023 01.

Article in English | MEDLINE | ID: mdl-36517591

ABSTRACT

Expansion of a single repetitive DNA sequence, termed a tandem repeat (TR), is known to cause more than 50 diseases1,2. However, repeat expansions are often not explored beyond neurological and neurodegenerative disorders. In some cancers, mutations accumulate in short tracts of TRs, a phenomenon termed microsatellite instability; however, larger repeat expansions have not been systematically analysed in cancer3-8. Here we identified TR expansions in 2,622 cancer genomes spanning 29 cancer types. In seven cancer types, we found 160 recurrent repeat expansions (rREs), most of which (155/160) were subtype specific. We found that rREs were non-uniformly distributed in the genome with enrichment near candidate cis-regulatory elements, suggesting a potential role in gene regulation. One rRE, a GAAA-repeat expansion, located near a regulatory element in the first intron of UGT2B7 was detected in 34% of renal cell carcinoma samples and was validated by long-read DNA sequencing. Moreover, in preliminary experiments, treating cells that harbour this rRE with a GAAA-targeting molecule led to a dose-dependent decrease in cell proliferation. Overall, our results suggest that rREs may be an important but unexplored source of genetic variation in human cancer, and we provide a comprehensive catalogue for further study.
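To make the idea of a tandem-repeat expansion concrete, the toy Python sketch below finds the longest uninterrupted run of a motif such as GAAA in a read and flags the read when that run reaches a cutoff. It is an illustrative assumption, not the detection pipeline used in the study. The function names and the `min_copies` cutoff are hypothetical, and the sketch ignores sequencing errors, rotated motifs (e.g. AAAG), and the reverse-complement strand.

```python
def longest_motif_run(seq: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found anywhere in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        # Count consecutive copies of the motif beginning at this offset.
        count, pos = 0, start
        while seq[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best


def is_expanded(seq: str, motif: str, min_copies: int) -> bool:
    """Flag a read whose longest motif run reaches a hypothetical copy-number cutoff."""
    return longest_motif_run(seq, motif) >= min_copies


read = "TTC" + "GAAA" * 12 + "GTC"
print(longest_motif_run(read, "GAAA"))  # 12
print(is_expanded(read, "GAAA", min_copies=10))  # True
```

Real callers work from alignments across many reads and must also account for reads that span the whole repeat versus reads that fall entirely inside it; a per-read count like this is only the simplest building block.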

Subject(s)

DNA Repeat Expansion , Genome, Human , Neoplasms , Humans , Base Sequence , DNA Repeat Expansion/genetics , Genome, Human/genetics , Neoplasms/classification , Neoplasms/genetics , Neoplasms/pathology , Sequence Analysis, DNA , Gene Expression Regulation , Regulatory Elements, Transcriptional/genetics , Introns/genetics , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Cell Proliferation/drug effects , Reproducibility of Results

Storing and analyzing a genome on a blockchain.

Gürsoy, Gamze; Brannon, Charlotte M; Ni, Eric; Wagner, Sarah; Khanna, Amol; Gerstein, Mark.

Genome Biol ; 23(1): 134, 2022 06 29.

Article in English | MEDLINE | ID: mdl-35765079

ABSTRACT

There are major efforts underway to make genome sequencing a routine part of clinical practice. A critical barrier to these efforts is achieving practical solutions for data ownership and integrity. Blockchain provides solutions to these challenges in other realms, such as finance. However, its use in genomics is stymied by the difficulty of storing large-scale data on-chain, slow transaction speeds, and limitations on querying. To overcome these roadblocks, we developed a private blockchain network to store genomic variants and reference-aligned reads on-chain. It uses nested database indexing with an accompanying tool suite to rapidly access and analyze the data.

Subject(s)

Blockchain , Genome , Genomics

Privacy-preserving genotype imputation with fully homomorphic encryption.

Gürsoy, Gamze; Chielle, Eduardo; Brannon, Charlotte M; Maniatakos, Michail; Gerstein, Mark.

Cell Syst ; 13(2): 173-182.e3, 2022 02 16.

Article in English | MEDLINE | ID: mdl-34758288

ABSTRACT

Genotype imputation is the inference of unknown genotypes using known population structure observed in large genomic datasets; it can further our understanding of phenotype-genotype relationships and is useful for QTL mapping and GWASs. However, the compute-intensive nature of genotype imputation can overwhelm local servers for computation and storage. Hence, many researchers are moving toward using cloud services, raising privacy concerns. We address these concerns by developing an efficient, privacy-preserving algorithm called p-Impute. Our method uses homomorphic encryption, allowing calculations on ciphertext, thereby avoiding the decryption of private genotypes in the cloud. It is similar to k-nearest neighbor approaches, inferring missing genotypes in a genomic block based on the SNP genotypes of genetically related individuals in the same block. Our results demonstrate accuracy in agreement with the state-of-the-art plaintext solutions. Moreover, p-Impute is scalable to real-world applications as its memory and time requirements increase linearly with the increasing number of samples. p-Impute is freely available for download here: https://doi.org/10.5281/zenodo.5542001.
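The k-nearest-neighbour idea that p-Impute builds on can be illustrated in plaintext. The sketch below is a simplification under stated assumptions: genotypes are coded 0/1/2 within one genomic block, distance is the summed absolute genotype difference over the SNPs observed in the target individual, and each missing call is filled by majority vote among the k closest reference individuals. It performs no encryption, so it shows only the kind of inference p-Impute evaluates on ciphertext; `knn_impute` is a hypothetical name and not part of the p-Impute release.

```python
from collections import Counter
from typing import List, Optional


def knn_impute(target: List[Optional[int]], reference: List[List[int]], k: int = 3) -> List[int]:
    """Fill None genotypes (coded 0/1/2) in one block from the k nearest reference individuals."""
    if not reference:
        raise ValueError("need at least one reference individual")
    observed = [j for j, g in enumerate(target) if g is not None]

    def distance(ref: List[int]) -> int:
        # Summed absolute genotype difference over the SNPs the target actually has.
        return sum(abs(ref[j] - target[j]) for j in observed)

    # sorted() is stable, so equally distant references keep their input order.
    neighbours = sorted(reference, key=distance)[:k]
    imputed = []
    for j, g in enumerate(target):
        if g is None:
            # Majority vote among neighbours; ties go to the genotype encountered first,
            # i.e. the one carried by the nearer neighbour.
            g = Counter(ref[j] for ref in neighbours).most_common(1)[0][0]
        imputed.append(g)
    return imputed


reference = [[0, 1, 2, 1], [0, 1, 2, 2], [2, 0, 0, 0], [0, 1, 1, 2]]
print(knn_impute([0, 1, 2, None], reference, k=3))  # [0, 1, 2, 2]
```

Because every step here is a sum, an absolute difference, or a count, it hints at why a nearest-neighbour formulation is a reasonable fit for homomorphic evaluation, although the encrypted version has to replace sorting and voting with operations the encryption scheme supports.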

Subject(s)

Computer Security , Privacy , Cloud Computing , Genome-Wide Association Study , Genotype

Functional genomics data: privacy risk assessment and technological mitigation.

Gürsoy, Gamze; Li, Tianxiao; Liu, Susanna; Ni, Eric; Brannon, Charlotte M; Gerstein, Mark B.

Nat Rev Genet ; 23(4): 245-258, 2022 04.

Article in English | MEDLINE | ID: mdl-34759381

ABSTRACT

The generation of functional genomics data by next-generation sequencing has increased greatly in the past decade. Broad sharing of these data is essential for research advancement but poses notable privacy challenges, some of which are analogous to those that occur when sharing genetic variant data. However, there are also unique privacy challenges that arise from cryptic information leakage during the processing and summarization of functional genomics data from raw reads to derived quantities, such as gene expression values. Here, we review these challenges and present potential solutions for mitigating privacy risks while allowing broad data dissemination and analysis.

Subject(s)

Genetic Privacy , Privacy , Genomics , High-Throughput Nucleotide Sequencing , Risk Assessment

Author Correction: Functional genomics data: privacy risk assessment and technological mitigation.

Gürsoy, Gamze; Li, Tianxiao; Liu, Susanna; Ni, Eric; Brannon, Charlotte M; Gerstein, Mark B.

Nat Rev Genet ; 23(4): 259, 2022 Apr.

Article in English | MEDLINE | ID: mdl-34811555

FANCY: fast estimation of privacy risk in functional genomics data.

Gürsoy, Gamze; Brannon, Charlotte M; Navarro, Fabio C P; Gerstein, Mark.

Bioinformatics ; 36(21): 5145-5150, 2021 01 29.

Article in English | MEDLINE | ID: mdl-32726397

ABSTRACT

MOTIVATION: Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. RESULTS: FANCY can predict the cumulative number of leaking SNVs with an average R² of 0.95 across all independent test sets. Accurate prediction matters most when the number of leaked variants is low, so we developed a special version of the model that makes more accurate predictions in that regime. AVAILABILITY AND IMPLEMENTATION: Python and MATLAB implementations of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters of the regression model on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
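As a minimal illustration of the supervised-regression setup and of the R² figure quoted above, the sketch below fits an ordinary least-squares line from a single sequencing statistic to a count of leaking SNVs and scores it on held-out points. FANCY uses several overall sequencing statistics as features; the single feature, the helper names `fit_line` and `r_squared`, and the toy numbers are assumptions for illustration only, not FANCY's model or data.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y: List[float], y_hat: List[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy values only: x stands for a sequencing statistic, y for leaking SNVs found by genotyping.
a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
x_test, y_test = [5, 6], [10, 12.5]
print(r_squared(y_test, [a + b * xi for xi in x_test]))  # ≈ 0.92
```

Evaluating R² on points that were not used for fitting mirrors the "independent test sets" in the abstract; the value reported there comes from the authors' own model and data, not from anything like this toy.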

Subject(s)

Privacy , Software , Genomics , Humans , RNA-Seq , Exome Sequencing

Data Sanitization to Reduce Private Information Leakage from Functional Genomics.

Gürsoy, Gamze; Emani, Prashant; Brannon, Charlotte M; Jolanki, Otto A; Harmanci, Arif; Strattan, J Seth; Cherry, J Michael; Miranker, Andrew D; Gerstein, Mark.

Cell ; 183(4): 905-917.e16, 2020 11 12.

Article in English | MEDLINE | ID: mdl-33186529

ABSTRACT

The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.
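The linking step can be pictured with a deliberately simple scoring rule. In the hedged sketch below, each known individual is scored by the fraction of variants observed in a study participant's reads that the individual also carries, and the margin between the best and second-best score serves as a rough measure of how confidently the participant is linked. The variant-tuple encoding, the scoring rule, and `link_sample` are hypothetical simplifications, not the statistical linking procedure used in the paper.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str]  # hypothetical encoding: (chromosome, position, alternate allele)


def link_sample(observed: Set[Variant], panel: Dict[str, Set[Variant]]) -> Tuple[str, float]:
    """Return the best-matching known individual and its score margin over the runner-up."""
    if not observed or not panel:
        raise ValueError("need at least one observed variant and one known individual")
    # Fraction of the participant's observed variants that each known individual carries.
    scores = {pid: len(observed & variants) / len(observed) for pid, variants in panel.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_id, best_score - runner_up


observed = {("chr1", 100, "A"), ("chr2", 200, "T"), ("chr3", 300, "G")}
panel = {
    "ind1": observed | {("chr4", 400, "C")},
    "ind2": {("chr1", 100, "A")},
}
print(link_sample(observed, panel))  # ('ind1', 0.666...)
```

In this toy scoring, a margin near zero means the reads do not single out any one known individual, which is the kind of outcome a sanitization step would aim for.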

Subject(s)

Computer Security , Genomics , Privacy , Genome, Human , Genotype , High-Throughput Nucleotide Sequencing , Humans , Phenotype , Phylogeny , Reproducibility of Results , Sequence Analysis, RNA , Single-Cell Analysis

Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts.

Gürsoy, Gamze; Brannon, Charlotte M; Gerstein, Mark.

BMC Med Genomics ; 13(1): 74, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32487214

ABSTRACT

BACKGROUND: As pharmacogenomics data become increasingly integral to clinical treatment decisions, appropriate data storage and sharing protocols need to be adopted. One promising option for secure, high-integrity storage and sharing is Ethereum smart contracts. Ethereum is a blockchain platform, and smart contracts are immutable pieces of code running on virtual machines in this platform that can be invoked by a user or another contract (in the blockchain network). The 2019 iDASH (Integrating Data for Analysis, Anonymization, and Sharing) competition for Secure Genome Analysis challenged participants to develop time- and space-efficient Ethereum smart contracts for gene-drug relationship data. METHODS: Here we design a specific smart contract to store and query gene-drug interactions in Ethereum using an index-based, multi-mapping approach. Our contract stores each pharmacogenomics observation, a gene-variant-drug triplet with outcome, in a mapping searchable by a unique identifier, allowing time- and space-efficient storage and query. This solution ranked in the top three at the 2019 iDASH competition. We further improve our "challenge solution" and develop an alternate "fastQuery" smart contract, which combines identical gene-variant-drug combinations into a single storage entry, leading to significantly better scalability and query efficiency. RESULTS: On a private, proof-of-authority network, both our challenge and fastQuery solutions exhibit approximately linear memory and time usage for inserting into and querying small databases (<1,000 entries). For larger databases (1,000 to 10,000 entries), fastQuery maintains this scaling. Furthermore, both solutions can query by a single field ("0-AND") or a combination of fields ("1- or 2-AND"). Specifically, the challenge solution can complete a 2-AND query from a small database (100 entries) in 35 ms using 0.1 MB of memory. For the same query, fastQuery has a 2-fold improvement in time and a 10-fold improvement in memory. CONCLUSION: We show that pharmacogenomics data can be stored and queried efficiently using Ethereum blockchain. Our solutions could potentially be used to store a range of clinical data and extended to other fields requiring high-integrity data storage and efficient access.
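The difference between the two storage layouts can be modelled off-chain in Python. The sketch below is an assumption-laden analogue, not the authors' Solidity contracts: `ChallengeStore` keeps every observation under its own id with per-field indexes, so a query intersects id sets (one field is a "0-AND" query, two or three fields are "1-AND" or "2-AND"), while `FastQueryStore` merges identical gene-variant-drug triplets into one entry holding outcome counts. Class names, field names, and the placeholder gene, variant, and drug values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

FIELDS = ("gene", "variant", "drug")
Triplet = Tuple[str, str, str]


class ChallengeStore:
    """One record per observation, looked up through per-field indexes of record ids."""

    def __init__(self) -> None:
        self.records: Dict[int, Tuple[str, str, str, str]] = {}
        self.index: Dict[str, Dict[str, Set[int]]] = {f: defaultdict(set) for f in FIELDS}
        self.next_id = 0

    def insert(self, gene: str, variant: str, drug: str, outcome: str) -> None:
        rid = self.next_id
        self.next_id += 1
        self.records[rid] = (gene, variant, drug, outcome)
        for field, value in zip(FIELDS, (gene, variant, drug)):
            self.index[field][value].add(rid)

    def query(self, **criteria: str) -> List[Tuple[str, str, str, str]]:
        # One criterion is a "0-AND" query; two or three criteria are "1-AND"/"2-AND".
        if not criteria:
            raise ValueError("give at least one of gene=, variant=, drug=")
        ids = set.intersection(*(self.index[f].get(v, set()) for f, v in criteria.items()))
        return [self.records[i] for i in sorted(ids)]


class FastQueryStore:
    """Identical gene-variant-drug triplets share a single entry of outcome counts."""

    def __init__(self) -> None:
        self.entries: Dict[Triplet, Counter] = defaultdict(Counter)
        self.index: Dict[str, Dict[str, Set[Triplet]]] = {f: defaultdict(set) for f in FIELDS}

    def insert(self, gene: str, variant: str, drug: str, outcome: str) -> None:
        key = (gene, variant, drug)
        self.entries[key][outcome] += 1  # repeated observations only bump a counter
        for field, value in zip(FIELDS, key):
            self.index[field][value].add(key)

    def query(self, **criteria: str) -> Dict[Triplet, Dict[str, int]]:
        if not criteria:
            raise ValueError("give at least one of gene=, variant=, drug=")
        keys = set.intersection(*(self.index[f].get(v, set()) for f, v in criteria.items()))
        return {k: dict(self.entries[k]) for k in sorted(keys)}


for store in (ChallengeStore(), FastQueryStore()):
    store.insert("GENE_A", "v1", "drugX", "improved")
    store.insert("GENE_A", "v1", "drugX", "improved")
    store.insert("GENE_A", "v2", "drugY", "unchanged")
    print(store.query(gene="GENE_A", drug="drugX"))
```

Because `FastQueryStore` holds one entry per distinct triplet rather than one per observation, duplicate observations add no new records or index entries, which is the intuition behind fastQuery's better memory scaling.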

Subject(s)

Algorithms , Blockchain/standards , Decision Making , Genes , Information Storage and Retrieval/methods , Pharmaceutical Preparations/analysis , Pharmacogenetics , Delivery of Health Care , Humans
