1.
Cell ; 183(4): 905-917.e16, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33186529

ABSTRACT

The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.


Subject(s)
Computer Security , Genomics , Privacy , Genome, Human , Genotype , High-Throughput Nucleotide Sequencing , Humans , Phenotype , Phylogeny , Reproducibility of Results , Sequence Analysis, RNA , Single-Cell Analysis
2.
Cell ; 180(5): 915-927.e16, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32084333

ABSTRACT

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and that different signatures encode mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.


Subject(s)
Genome, Human/genetics , Genomics/methods , Mutation/genetics , Neoplasms/genetics , DNA Mutational Analysis/methods , Disease Progression , Humans , Neoplasms/pathology , Whole Genome Sequencing
3.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36384083

ABSTRACT

BACKGROUND: Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods have been developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the usage of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns may arise when estimating and reporting sensitive population-level statistics such as inbreeding coefficients, because of the risk of marginalization and stigmatization. RESULTS: Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples in two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352. CONCLUSIONS: Analysis of relatedness is fundamentally important for identifying relatives, in association studies, and for population-level estimation of inbreeding. As the awareness of individual and group genomic privacy is growing, privacy-preserving methods for the estimation of relatedness are needed. The presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations.
SHORT ABSTRACT: Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome-wide association studies and estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements, and occasionally do not consider individuals from admixed ancestries. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.


Subject(s)
Genome-Wide Association Study , Privacy , Humans , Genotype , Genetic Privacy , Genome
4.
J Neurooncol ; 168(3): 515-524, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38811523

ABSTRACT

PURPOSE: Accurate classification of cancer subgroups is essential for precision medicine, tailoring treatments to individual patients based on their cancer subtypes. In recent years, advances in high-throughput sequencing technologies have enabled the generation of large-scale transcriptomic data from cancer samples. These data have provided opportunities for developing computational methods that can improve cancer subtyping and enable better personalized treatment strategies. METHODS: In this study, we evaluated different feature selection schemes in the context of meningioma classification. To integrate interpretable features from the bulk (n = 77 samples) and single-cell profiling (∼10 K cells), we developed an algorithm named CLIPPR, which combines the top-performing single-cell models, RNA-inferred copy number variation (CNV) signals, and the initial bulk model to create a meta-model. RESULTS: While the scheme relying solely on bulk transcriptomic data showed good classification accuracy, it exhibited confusion between malignant and benign molecular classes in approximately 8% of meningioma samples. In contrast, models trained on features learned from meningioma single-cell data accurately resolved the sub-groups confused by bulk-transcriptomic data but showed limited overall accuracy. CLIPPR showed superior overall accuracy and resolved benign-malignant confusion as validated on n = 789 bulk meningioma samples gathered from multiple institutions. Finally, we showed the generalizability of our algorithm using our in-house single-cell (∼200 K cells) and bulk TCGA glioma data (n = 711 samples). CONCLUSION: Overall, our algorithm CLIPPR synergizes the resolution of single-cell data with the depth of bulk sequencing and enables improved cancer sub-group diagnoses and insights into their biology.


Subject(s)
Algorithms , Meningeal Neoplasms , Meningioma , Sequence Analysis, RNA , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Meningeal Neoplasms/genetics , Meningeal Neoplasms/pathology , Meningeal Neoplasms/classification , Meningioma/genetics , Meningioma/pathology , Meningioma/classification , Sequence Analysis, RNA/methods , DNA Copy Number Variations , Biomarkers, Tumor/genetics , High-Throughput Nucleotide Sequencing/methods , Transcriptome , Gene Expression Profiling/methods
5.
J Neurooncol ; 163(2): 397-405, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37318677

ABSTRACT

INTRODUCTION: Meningiomas are the most common primary intracranial tumor. Recently, various genetic classification systems for meningioma have been described. We sought to identify clinical drivers of different molecular changes in meningioma. In particular, the clinical and genomic consequences of smoking in patients with meningiomas remain unexplored. METHODS: 88 tumor samples were analyzed in this study. Whole exome sequencing (WES) was used to assess somatic mutation burden. RNA sequencing data were used to identify differentially expressed genes (DEG) and gene sets (GSEA). RESULTS: Fifty-seven patients had no history of smoking, twenty-two were past smokers, and nine were current smokers. The clinical data showed no major differences in natural history across smoking status. WES revealed an absence of AKT1 mutations in current or past smokers compared to non-smokers (p = 0.046). Current smokers had an increased mutation rate in NOTCH2 compared to past and never smokers (p < 0.05). Mutational signatures from current and past smokers showed disrupted DNA mismatch repair (cosine-similarity = 0.759 and 0.783). DEG analysis revealed that the xenobiotic metabolic genes UGT2A1 and UGT2A2 were both significantly downregulated in current smokers compared to past (Log2FC = -3.97, padj = 0.0347 and Log2FC = -4.18, padj = 0.0304) and never smokers (Log2FC = -3.86, padj = 0.0235 and Log2FC = -4.20, padj = 0.0149). GSEA of current smokers showed downregulation of xenobiotic metabolism and enrichment for G2M checkpoint, E2F targets, and mitotic spindle compared to past and never smokers (FDR < 25% each). CONCLUSION: In this study, we conducted a comparative analysis of meningioma patients based on their smoking history, examining both their clinical trajectories and molecular changes. Meningiomas from current smokers were more likely to harbor NOTCH2 mutations, and AKT1 mutations were absent in current or past smokers. Moreover, both current and past smokers exhibited a mutational signature associated with DNA mismatch repair. Meningiomas from current smokers demonstrate downregulation of the xenobiotic metabolic enzymes UGT2A1 and UGT2A2, which are downregulated in other smoking-related cancers. Furthermore, current smokers exhibited downregulation of xenobiotic metabolic gene sets, as well as enrichment in gene sets related to mitotic spindle, E2F targets, and G2M checkpoint, which are hallmark pathways involved in cell division and DNA replication control. In aggregate, our results demonstrate novel alterations in meningioma molecular biology in response to systemic carcinogens.


Subject(s)
Meningeal Neoplasms , Meningioma , Humans , Meningioma/genetics , Meningioma/pathology , Xenobiotics , Smoking/adverse effects , Smoking/genetics , Mutation , Genomics , Meningeal Neoplasms/pathology , Glucuronosyltransferase/genetics
6.
BMC Bioinformatics ; 23(1): 409, 2022 Oct 01.
Article in English | MEDLINE | ID: mdl-36182914

ABSTRACT

BACKGROUND: Sequencing of thousands of samples provides genetic variants with allele frequencies spanning a very large spectrum and gives invaluable insight into genetic determinants of diseases. Protecting the genetic privacy of participants is challenging, as only a few rare variants can easily re-identify an individual among millions. In certain cases, there are policy barriers against sharing genetic data from indigenous populations and stigmatizing conditions. RESULTS: We present SVAT, a method for secure outsourcing of variant annotation and aggregation, which are two basic steps in variant interpretation and detection of causal variants. SVAT uses homomorphic encryption to encrypt the data on the client side. The data always stay encrypted while they are stored, in transit, and, most importantly, while they are analyzed. SVAT makes use of a vectorized data representation to convert annotation and aggregation into efficient vectorized operations in a single framework. Also, SVAT utilizes a secure re-encryption approach so that multiple disparate genotype datasets can be combined for federated aggregation and secure computation of allele frequencies on the aggregated dataset. CONCLUSIONS: Overall, SVAT provides a secure, flexible, and practical framework for privacy-aware outsourcing of annotation, filtering, and aggregation of genetic variants. SVAT is publicly available for download from https://github.com/harmancilab/SVAT.


Subject(s)
Cloud Computing , Outsourced Services , Computer Security , Gene Frequency , Genotype , Humans
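
The federated allele-frequency aggregation mentioned in the SVAT abstract can be illustrated with a minimal plaintext sketch. This is not SVAT's implementation: no encryption is performed, and the function names `site_allele_counts` and `aggregate_frequencies` are hypothetical. It only shows why a vectorized representation suits the task: each site reduces its genotype matrix to one count vector, and combining sites needs nothing beyond element-wise addition, an operation that additively homomorphic schemes can also evaluate on ciphertexts.

```python
def site_allele_counts(genotypes):
    """Summarize one site's data.

    genotypes: list of samples, each a list of alternate-allele counts
    (0, 1 or 2) per variant. Returns (per-variant alternate allele counts,
    number of chromosomes), assuming diploid samples.
    """
    n_variants = len(genotypes[0])
    counts = [0] * n_variants
    for sample in genotypes:
        for j, g in enumerate(sample):
            counts[j] += g
    return counts, 2 * len(genotypes)


def aggregate_frequencies(site_summaries):
    """Combine per-site summaries into pooled alternate-allele frequencies."""
    count_vectors = [counts for counts, _ in site_summaries]
    total_counts = [sum(column) for column in zip(*count_vectors)]
    total_chromosomes = sum(n for _, n in site_summaries)
    return [c / total_chromosomes for c in total_counts]


# Two hypothetical sites holding the same three variants.
site_a = [[0, 1, 2], [1, 1, 0]]
site_b = [[2, 0, 0]]
summaries = [site_allele_counts(site_a), site_allele_counts(site_b)]
frequencies = aggregate_frequencies(summaries)
```

Only the count vectors and sample sizes leave each site in this sketch; in an encrypted setting those vectors would be ciphertexts rather than plain integers.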
7.
BMC Bioinformatics ; 23(1): 356, 2022 Aug 29.
Article in English | MEDLINE | ID: mdl-36038834

ABSTRACT

BACKGROUND: The decreasing cost of DNA sequencing has led to a great increase in our knowledge about genetic variation. While population-scale projects bring important insight into genotype-phenotype relationships, the cost of performing whole-genome sequencing on large samples is still prohibitive. In-silico genotype imputation coupled with genotyping-by-arrays is a cost-effective and accurate alternative for genotyping of common and uncommon variants. Imputation methods compare the genotypes of the typed variants with large population-specific reference panels and estimate the genotypes of untyped variants by making use of the linkage disequilibrium patterns. The most accurate imputation methods are based on the Li-Stephens hidden Markov model (HMM), which treats the sequence of each chromosome as a mosaic of the haplotypes from the reference panel. RESULTS: Here we assess the accuracy of vicinity-based HMMs, where each untyped variant is imputed using the typed variants in a small window around itself (as small as 1 centimorgan). Locality-based imputation has recently been used by machine learning-based genotype imputation approaches. We assess how the parameters of the vicinity-based HMMs impact the imputation accuracy in a comprehensive set of benchmarks and show that vicinity-based HMMs can accurately impute common and uncommon variants. CONCLUSIONS: Our results indicate that locality-based imputation models can be effectively used for genotype imputation. The parameter settings that we identified can be used in future methods, and vicinity-based HMMs can be used for re-structuring and parallelizing new imputation methods. The source code for the vicinity-based HMM implementations is publicly available at https://github.com/harmancilab/LoHaMMer.


Subject(s)
Polymorphism, Single Nucleotide , Software , Genome-Wide Association Study/methods , Genotype , Haplotypes , Linkage Disequilibrium , Sequence Analysis, DNA/methods
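
The idea of a vicinity-based Li-Stephens HMM described in the abstract above can be sketched as follows. This is a simplified haploid illustration, not the LoHaMMer code: the function name `impute_dosage`, the parameters `switch_rate` and `eps`, and the exponential form of the switch probability are assumptions made for the example. The target variant is imputed with a forward-backward pass restricted to typed variants within `window_cm` centimorgans of it.

```python
import math


def impute_dosage(ref_haps, positions_cm, typed, target,
                  window_cm=1.0, switch_rate=0.1, eps=0.01):
    """Posterior alternate-allele dosage at an untyped variant.

    ref_haps: reference haplotypes, each a list of 0/1 alleles per variant.
    positions_cm: genetic positions (ascending) of the variants.
    typed: dict mapping variant index -> observed allele of the target haplotype.
    target: index of the variant to impute (treated as unobserved).
    """
    K = len(ref_haps)
    center = positions_cm[target]
    # Keep only the target and the typed variants inside the window.
    sites = [i for i in range(len(positions_cm))
             if abs(positions_cm[i] - center) <= window_cm
             and (i in typed or i == target)]

    def emit(k, i):
        if i == target:
            return 1.0  # no observation at the site being imputed
        return 1.0 - eps if ref_haps[k][i] == typed[i] else eps

    def switch(i, j):
        # Assumed probability of jumping to a random reference haplotype.
        return 1.0 - math.exp(-switch_rate * abs(positions_cm[j] - positions_cm[i]))

    # Forward pass with per-site normalization.
    fwd = []
    for t, i in enumerate(sites):
        if t == 0:
            cur = [emit(k, i) / K for k in range(K)]
        else:
            sw = switch(sites[t - 1], i)
            prev = fwd[-1]
            total = sum(prev)
            cur = [emit(k, i) * ((1.0 - sw) * prev[k] + sw * total / K)
                   for k in range(K)]
        norm = sum(cur)
        fwd.append([c / norm for c in cur])

    # Backward pass with per-site normalization.
    bwd = [None] * len(sites)
    bwd[-1] = [1.0] * K
    for t in range(len(sites) - 2, -1, -1):
        nxt = sites[t + 1]
        sw = switch(sites[t], nxt)
        weighted = [emit(k, nxt) * bwd[t + 1][k] for k in range(K)]
        total = sum(weighted)
        cur = [(1.0 - sw) * weighted[k] + sw * total / K for k in range(K)]
        norm = sum(cur)
        bwd[t] = [c / norm for c in cur]

    t_star = sites.index(target)
    post = [f * b for f, b in zip(fwd[t_star], bwd[t_star])]
    z = sum(post)
    return sum(p * ref_haps[k][target] for k, p in enumerate(post)) / z
```

Because each target uses only its own window, targets can be imputed independently of one another, which is the property the abstract points to for parallelizing imputation.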
8.
BMC Genomics ; 23(1): 841, 2022 Dec 20.
Article in English | MEDLINE | ID: mdl-36539717

ABSTRACT

BACKGROUND: RNA-sequencing has become a standard tool for analyzing gene activity in bulk samples and at the single-cell level. By increasing sample sizes and cell counts, this technique can uncover substantial information about cellular transcriptional states. Beyond quantification of gene expression, RNA-seq can be used for detecting variants, including single nucleotide polymorphisms, small insertions/deletions, and larger variants, such as copy number variants. Notably, joint analysis of variants with cellular transcriptional states may provide insights into the impact of mutations, especially for complex and heterogeneous samples. However, this analysis is often challenging due to a prohibitively high number of variants and cells, which are difficult to summarize and visualize. Further, there is a dearth of methods that assess and summarize the association between detected variants and cellular transcriptional states. RESULTS: Here, we introduce XCVATR (eXpressed Clusters of Variant Alleles in Transcriptome pRofiles), a method that identifies variants and detects local enrichment of expressed variants within the embedding of samples and cells in single-cell and bulk RNA-seq datasets. XCVATR visualizes local "clumps" of small and large-scale variants and searches for patterns of association between each variant and cellular states, as described by the coordinates of the cell embedding, which can be computed independently using any type of distance metric or dimensionality-reduction method, such as principal component analysis or t-distributed stochastic neighbor embedding. Through simulations and analysis of real datasets, we demonstrate that XCVATR can detect enrichment of expressed variants and provide insight into the transcriptional states of cells and samples. We next sequenced 2 new single-cell RNA-seq tumor samples and applied XCVATR. XCVATR revealed subtle differences in CNV impact on tumors. CONCLUSIONS: XCVATR is publicly available to download from https://github.com/harmancilab/XCVATR.


Subject(s)
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Transcriptome , RNA-Seq , Sequence Analysis, RNA/methods , RNA/genetics , Single-Cell Analysis/methods
9.
Bioinformatics ; 36(4): 1014-1021, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31501853

ABSTRACT

MOTIVATION: Functional genomics experiments generate genome-wide signal profiles that are dense information sources for annotating regulatory elements. These profiles measure epigenetic activity at nucleotide resolution, and they exhibit distinctive patterns as they fluctuate along the genome. The most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by the majority of analysis pipelines. RESULTS: We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell line-specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys. AVAILABILITY AND IMPLEMENTATION: EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Epigenomics , Regulatory Sequences, Nucleic Acid , Epigenesis, Genetic , Histones
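
The notion of a signal "valley" in the EpiSAFARI abstract can be made concrete with a small sketch. It is not EpiSAFARI's algorithm: a plain moving average stands in for the smoothing method the abstract mentions, technical factors such as mappability are ignored, and the names `moving_average`, `find_valleys` and `min_height_ratio` are hypothetical. A valley is reported here as a local minimum flanked on both sides by summits that are sufficiently higher than the dip.

```python
def moving_average(signal, half_window):
    """Smooth a signal with a centered moving average (edges use shorter windows)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def find_valleys(signal, half_window=1, min_height_ratio=2.0):
    """Return (left_summit, dip, right_summit) index triples for each valley."""
    s = moving_average(signal, half_window)
    valleys = []
    for i in range(1, len(s) - 1):
        if s[i] < s[i - 1] and s[i] <= s[i + 1]:  # local minimum
            left = i
            while left > 0 and s[left - 1] >= s[left]:
                left -= 1  # climb to the nearest summit on the left
            right = i
            while right < len(s) - 1 and s[right + 1] >= s[right]:
                right += 1  # climb to the nearest summit on the right
            # Both summits must clear the dip by the required ratio.
            if min(s[left], s[right]) >= min_height_ratio * max(s[i], 1e-9):
                valleys.append((left, i, right))
    return valleys


# A hypothetical profile with one dip between two peaks.
profile = [0, 5, 9, 5, 1, 5, 9, 5, 0]
valleys = find_valleys(profile, half_window=0)
```

With `half_window=0` the profile is left unsmoothed, which makes the example easy to check by hand; a noisy real profile would need a wider window or a better smoother.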
10.
J Neurooncol ; 149(2): 219-230, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32949309

ABSTRACT

INTRODUCTION: Meningiomas are the most common primary intracranial tumor. Recent next-generation sequencing analyses have elaborated the molecular drivers of this disease. We aimed to identify and characterize novel fusion genes in meningiomas. METHODS: We performed a secondary analysis of our RNA sequencing data of 145 primary meningiomas from 140 patients to detect fusion genes. Semi-quantitative RT-PCR was performed to confirm transcription of the fusion genes in the original tumors. Whole exome sequencing was performed to identify copy number variations within each tumor sample. Comparative RNA-seq analysis was performed to assess the clonality of the fusion constructs within the tumor. RESULTS: We detected six fusion events (NOTCH3-SETBP1, NF2-SPATA13, SLC6A3-AGBL3, PHF19-FOXP2 in two patients, and ITPK1-FBP2) in five out of 145 tumor samples. All but one event (NF2-SPATA13) led to extremely short reading frames, making these events de facto null alleles. Three of the five patients had a history of childhood radiation. Four out of six fusion events were detected in expression type C tumors, which represent the most aggressive meningiomas. We validated the presence of the RNA transcripts in the tumor tissue by semi-quantitative RT-PCR. All but the two PHF19-FOXP2 fusions demonstrated high degrees of clonality. CONCLUSIONS: Fusion genes occur infrequently in meningiomas and are more likely to be found in tumors with a greater degree of genomic instability (expression type C) or in patients with a history of cranial irradiation.


Subject(s)
Biomarkers, Tumor/genetics , Meningeal Neoplasms/genetics , Meningioma/genetics , Mutation , Oncogene Proteins, Fusion/genetics , Adult , Aged , Cohort Studies , Female , Follow-Up Studies , High-Throughput Nucleotide Sequencing , Humans , Male , Meningeal Neoplasms/pathology , Meningioma/pathology , Middle Aged , Prognosis
11.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA
12.
Nat Methods ; 13(3): 251-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26828419

ABSTRACT

Studies on genomic privacy have traditionally focused on identifying individuals using DNA variants. In contrast, molecular phenotype data, such as gene expression levels, are generally assumed to be free of such identifying information. Although there is no explicit genotypic information in phenotype data, adversaries can statistically link phenotypes to genotypes using publicly available genotype-phenotype correlations such as expression quantitative trait loci (eQTLs). This linking can be accurate when high-dimensional data (i.e., many expression levels) are used, and the resulting links can then reveal sensitive information (for example, the fact that an individual has cancer). Here we develop frameworks for quantifying the leakage of characterizing information from phenotype data sets. These frameworks can be used to estimate the leakage from large data sets before release. We also present a general three-step procedure for practically instantiating linking attacks and a specific attack using outlier gene expression levels that is simple yet accurate. Finally, we describe the effectiveness of this outlier attack under different scenarios.


Subject(s)
Anonymous Testing/methods , Computer Security , Confidentiality , Data Mining/methods , Databases, Genetic , Genetic Privacy/organization & administration , Genotype , Humans , Phenotype , Quantitative Trait Loci/genetics
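
To make the outlier-based linking idea in the abstract above easier to follow when assessing leakage before a data release, here is a toy sketch. The abstract does not spell out the procedure, so everything below is an assumption made for illustration: the thresholds, the rule mapping extreme expression to a homozygous genotype, the match-count score, and the names `predict_genotypes` and `link_individual`. The point is only that extreme expression values, combined with public eQTL directions, yield genotype guesses that can be compared against a genotype panel.

```python
def predict_genotypes(expr, eqtls, low, high):
    """Guess genotypes from outlier expression levels.

    expr: gene -> expression level for one individual.
    eqtls: list of (gene, variant, direction); direction > 0 means the
           alternate allele is associated with higher expression.
    low, high: gene -> outlier thresholds (assumed to be given).
    Returns variant -> predicted alternate-allele count (0 or 2); variants
    whose gene is not an outlier are left unpredicted.
    """
    predictions = {}
    for gene, variant, direction in eqtls:
        level = expr[gene]
        if level >= high[gene]:
            predictions[variant] = 2 if direction > 0 else 0
        elif level <= low[gene]:
            predictions[variant] = 0 if direction > 0 else 2
    return predictions


def link_individual(predictions, genotype_panel):
    """Return the panel entry whose genotypes match the most predictions."""
    def score(genotypes):
        return sum(1 for v, g in predictions.items() if genotypes.get(v) == g)
    return max(genotype_panel, key=lambda name: score(genotype_panel[name]))


# Entirely made-up data for two panel entries and three eQTLs.
eqtls = [("G1", "rs1", 1), ("G2", "rs2", -1), ("G3", "rs3", 1)]
low = {"G1": 1.0, "G2": 1.0, "G3": 1.0}
high = {"G1": 9.0, "G2": 9.0, "G3": 9.0}
expr = {"G1": 10.0, "G2": 12.0, "G3": 5.0}
panel = {"id_a": {"rs1": 2, "rs2": 0, "rs3": 1},
         "id_b": {"rs1": 0, "rs2": 2, "rs3": 1}}
best_match = link_individual(predict_genotypes(expr, eqtls, low, high), panel)
```

Counting how often such a match is correct on held-out data is one simple way to gauge how much identifying information an expression data set carries.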
16.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
17.
bioRxiv ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38496434

ABSTRACT

Prior studies have described the complex interplay that exists between glioma cells and neurons; however, the electrophysiological properties endogenous to tumor cells remain obscure. To address this, we employed Patch-sequencing on human glioma specimens and found that one third of patched cells in IDH mutant (IDHmut) tumors demonstrate properties of both neurons and glia by firing single, short action potentials. To define these hybrid cells (HCs) and discern whether they are tumor in origin, we developed a computational tool, Single Cell Rule Association Mining (SCRAM), to annotate each cell individually. SCRAM revealed that HCs represent tumor and non-tumor cells that feature GABAergic neuron and oligodendrocyte precursor cell signatures. These studies are the first to characterize the combined electrophysiological and molecular properties of human glioma cells and describe a new cell type in human glioma with unique electrophysiological and transcriptomic properties that are likely also present in the non-tumor mammalian brain.

18.
Bioinformatics ; 28(17): 2267-9, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22743228

ABSTRACT

The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. AVAILABILITY AND IMPLEMENTATION: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.


Subject(s)
Genome, Human , Genomics/methods , Information Storage and Retrieval/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Genotype , Humans , Internet
19.
iScience ; 26(8): 107227, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37529100

ABSTRACT

Federated association testing is a powerful approach to conduct large-scale association studies where sites share intermediate statistics through a central server. There are, however, several standing challenges. Confounding factors like population stratification should be carefully modeled across sites. In addition, it is crucial to consider disease etiology using flexible models to prevent biases. Privacy protections for participants pose another significant challenge. Here, we propose distributed Mixed Effects Genome-wide Association study (dMEGA), a method that enables federated generalized linear mixed model-based association testing across multiple sites without explicitly sharing genotype and phenotype data. dMEGA employs a reference projection to correct for population-stratification and utilizes efficient local-gradient updates among sites, incorporating both fixed and random effects. The accuracy and efficiency of dMEGA are demonstrated through simulated and real datasets. dMEGA is publicly available at https://github.com/Li-Wentao/dMEGA.
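
The exchange of intermediate statistics described in the dMEGA abstract can be sketched with a much simpler model. The code below is not dMEGA: it fits a fixed-effects logistic regression only (no random effects, no reference projection, no privacy layer), and the names `local_gradient`, `federated_fit`, `lr` and `n_rounds` are hypothetical. It shows the core federated pattern: each site computes a gradient on its own data, and the server only sums those gradients to update shared coefficients.

```python
import math


def local_gradient(X, y, beta):
    """Gradient of the logistic negative log-likelihood on one site's data."""
    grad = [0.0] * len(beta)
    for row, label in zip(X, y):
        linear = sum(b * x for b, x in zip(beta, row))
        prob = 1.0 / (1.0 + math.exp(-linear))
        for j, x in enumerate(row):
            grad[j] += (prob - label) * x
    return grad, len(y)


def federated_fit(sites, n_features, lr=0.5, n_rounds=200):
    """Gradient descent where sites share gradients, never raw rows."""
    beta = [0.0] * n_features
    for _ in range(n_rounds):
        total = [0.0] * n_features
        n_samples = 0
        for X, y in sites:
            grad, m = local_gradient(X, y, beta)  # computed at the site
            total = [t + g for t, g in zip(total, grad)]
            n_samples += m
        # Server-side update using only the aggregated gradient.
        beta = [b - lr * t / n_samples for b, t in zip(beta, total)]
    return beta


# Two hypothetical sites; the first column is an intercept term.
site_1 = ([[1.0, 0.0], [1.0, 1.0]], [0, 1])
site_2 = ([[1.0, 2.0], [1.0, 3.0], [1.0, 1.0]], [1, 1, 0])
beta_federated = federated_fit([site_1, site_2], n_features=2)
```

Since the summed gradient equals the gradient on the pooled data, this scheme reaches the same coefficients as a centralized fit, up to floating-point rounding.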

20.
Genome Biol ; 24(1): 204, 2023 09 11.
Article in English | MEDLINE | ID: mdl-37697426

ABSTRACT

Growing regulatory requirements set barriers around genetic data sharing and collaborations. Moreover, existing privacy-aware paradigms are challenging to deploy in collaborative settings. We present COLLAGENE, a tool base for building secure collaborative genomic data analysis methods. COLLAGENE protects data using shared-key homomorphic encryption and combines encryption with multiparty strategies for efficient privacy-aware collaborative method development. COLLAGENE provides ready-to-run tools for encryption/decryption, matrix processing, and network transfers, which can be immediately integrated into existing pipelines. We demonstrate the usage of COLLAGENE by building a practical federated GWAS protocol for binary phenotypes and a secure meta-analysis protocol. COLLAGENE is available at https://zenodo.org/record/8125935.


Subject(s)
Genomics , Privacy , Data Analysis , Information Dissemination , Phenotype , Meta-Analysis as Topic