1.
Bioinformatics; 35(14): i218-i224, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31510659

ABSTRACT

MOTIVATION: Human genomic datasets often contain sensitive information that limits the use and sharing of the data. In particular, simple anonymization strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has achieved promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) also needs to avoid leaking private information.

RESULTS: We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, principal component analysis and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification and drug sensitivity prediction. The experiments demonstrate a significant benefit from all representation learning methods, with variational autoencoders most often providing the most accurate predictions. Our results significantly improve over the previous state of the art in the accuracy of differentially private drug sensitivity prediction.

AVAILABILITY AND IMPLEMENTATION: Code used in the experiments is available at https://github.com/DPBayes/dp-representation-transfer.


Subjects
Machine Learning , Humans , Neoplasms
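The RESULTS paragraph describes learning a representation on public data and then training a differentially private model on the projected private data. The sketch below illustrates that pipeline under stated assumptions: PCA (one of the three compared methods) is the representation, and the private step is ridge regression by sufficient-statistic perturbation with the Gaussian mechanism, a swapped-in technique not taken from the paper. All function names, the clipping bound and the ridge penalty are hypothetical. Only the regression step consumes privacy budget, because the PCA map never sees private data.

```python
import numpy as np


def fit_public_pca(x_public: np.ndarray, k: int):
    """Learn a k-dimensional PCA map from public data only (no privacy cost)."""
    mean = x_public.mean(axis=0)
    # Rows of vt are principal directions ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x_public - mean, full_matrices=False)
    return mean, vt[:k].T  # shapes (d,) and (d, k)


def clip_rows(z: np.ndarray, bound: float) -> np.ndarray:
    """Rescale each row so that its L2 norm is at most `bound`."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z * np.minimum(1.0, bound / np.maximum(norms, 1e-12))


def dp_linear_regression(z, y, epsilon, delta, rng, bound=1.0, ridge=1.0):
    """Ridge regression by sufficient-statistic perturbation (Gaussian mechanism).

    After clipping, adding or removing one record changes the pair
    (z.T @ z, z.T @ y) by at most sqrt(2) * bound**2 in L2 norm. The classic
    noise scale below gives (epsilon, delta)-DP for 0 < epsilon < 1.
    """
    z = clip_rows(z, bound)
    y = np.clip(y, -bound, bound)
    k = z.shape[1]
    sensitivity = np.sqrt(2.0) * bound**2
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    a = z.T @ z + rng.normal(0.0, sigma, size=(k, k))
    a = (a + a.T) / 2.0  # symmetrizing is post-processing: no extra privacy cost
    b = z.T @ y + rng.normal(0.0, sigma, size=k)
    return np.linalg.solve(a + ridge * np.eye(k), b)


# Usage with random stand-in data (hypothetical sizes).
rng = np.random.default_rng(0)
x_public = rng.normal(size=(500, 50))   # large public expression matrix
x_private = rng.normal(size=(100, 50))  # sensitive samples
y_private = np.tanh(x_private[:, 0])    # e.g. a scaled drug response
mean, proj = fit_public_pca(x_public, k=5)
w = dp_linear_regression((x_private - mean) @ proj, y_private,
                         epsilon=0.5, delta=1e-5, rng=rng)
```

Choosing `k` using only the public data is one simple way to keep the hyperparameter choice from touching private records, in line with the concern raised in the MOTIVATION paragraph.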
2.
PeerJ; 5: e3742, 2017.
Article in English | MEDLINE | ID: mdl-28970965

ABSTRACT

We have developed a machine learning approach to predict stimulation-dependent enhancer-promoter interactions using evidence from changes in genomic protein occupancy over time. The occupancy of estrogen receptor alpha (ERα), RNA polymerase II (Pol II) and the histone marks H2AZ and H3K4me3 was measured over time using ChIP-Seq experiments in MCF7 cells stimulated with estrogen. A Bayesian classifier was developed that uses the correlation of temporal binding patterns at enhancers and promoters, together with genomic proximity, as features to predict interactions. This method was trained using experimentally determined interactions from the same system and was shown to achieve much higher precision than predictions based on the genomic proximity of the nearest ERα binding site. We use the method to identify a confident genome-wide set of ERα target genes and their regulatory enhancers. Validation with publicly available GRO-Seq data demonstrates that our predicted targets are much more likely to show early nascent transcription than predictions based on genomic ERα binding proximity alone.
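As a rough illustration of the classifier described above, the sketch scores enhancer-promoter pairs from two features, the Pearson correlation of their occupancy time series and the log10 genomic distance, using a two-class Gaussian naive Bayes model. The naive Bayes form, the feature choice and all names are assumptions for illustration. The published classifier may combine several marks and model the features differently.

```python
import numpy as np


def pair_features(enhancer_ts, promoter_ts, distance_bp):
    """Features for one enhancer-promoter pair: temporal correlation, log10 distance."""
    r = np.corrcoef(enhancer_ts, promoter_ts)[0, 1]
    return np.array([r, np.log10(distance_bp)])


class GaussianNaiveBayes:
    """Two-class naive Bayes: features are independent Gaussians within each class."""

    def fit(self, x, labels):
        classes = (0, 1)  # 0 = no interaction, 1 = interaction
        self.log_prior = np.log([np.mean(labels == c) for c in classes])
        self.mu = np.array([x[labels == c].mean(axis=0) for c in classes])
        self.var = np.array([x[labels == c].var(axis=0) + 1e-6 for c in classes])
        return self

    def predict_proba(self, x):
        """Posterior probability that each pair interacts (class 1)."""
        diff = x[:, None, :] - self.mu[None, :, :]  # (pairs, classes, features)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var) + diff**2 / self.var).sum(axis=2)
        log_post = log_lik + self.log_prior
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        return post[:, 1] / post.sum(axis=1)
```

Training on experimentally determined interactions, as the abstract describes, would correspond to calling `fit` on labelled pairs and then ranking all candidate pairs by `predict_proba`.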

3.
Bioinformatics; 32(12): i147-i155, 2016 Jun 15.
Article in English | MEDLINE | ID: mdl-27307611

ABSTRACT

MOTIVATION: Alternative splicing is an important mechanism in which the regions of pre-mRNAs are differentially joined to form different transcript isoforms. Alternative splicing is involved in the regulation of normal physiological functions but is also linked to the development of diseases such as cancer. We analyse differential expression and splicing using RNA-sequencing time series in three different settings: overall gene expression levels, absolute transcript expression levels and relative transcript expression levels.

RESULTS: Using the estrogen receptor α signaling response as a model system, our Gaussian process-based test identifies genes with differential splicing and/or differentially expressed transcripts. We discover genes with consistent changes in alternative splicing independent of changes in absolute expression, and genes where some transcripts change whereas others stay constant in absolute level. The results suggest classes of genes with different modes of alternative splicing regulation during the experiment.

AVAILABILITY AND IMPLEMENTATION: R and MATLAB code implementing the method is available at https://github.com/PROBIC/diffsplicing. An interactive browser for viewing all model fits is available at http://users.ics.aalto.fi/hande/splicingGP/.

CONTACT: hande.topa@helsinki.fi or antti.honkela@helsinki.fi

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Alternative Splicing , Gene Expression Profiling , Humans , Protein Isoforms , RNA Precursors , Sequence Analysis, RNA
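A Gaussian process test of the kind mentioned above can be sketched as a comparison of marginal likelihoods: a time-varying GP (squared-exponential kernel plus noise) against a constant profile plus noise. The sketch below assumes fixed, hypothetical hyperparameters and applies the test to one transcript's relative level (its share of the gene total). The published test fits its models differently, so treat this only as an illustration of the likelihood-ratio idea.

```python
import numpy as np


def rbf_kernel(t, variance, lengthscale):
    """Squared-exponential covariance between all pairs of time points."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_log_marginal(y, cov):
    """log N(y | 0, cov), computed through a Cholesky factorization."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return (-0.5 * y @ alpha - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))


def log_bayes_factor(t, y, noise=0.01, variance=0.1, lengthscale=20.0):
    """Evidence for a time-varying GP over a constant profile plus white noise.

    Positive values favour change over time. Hyperparameters are fixed here;
    a real test would fit or integrate over them.
    """
    y = y - y.mean()  # remove the constant level
    eye = np.eye(len(t))
    null = gp_log_marginal(y, noise * eye)
    alt = gp_log_marginal(y, rbf_kernel(t, variance, lengthscale) + noise * eye)
    return alt - null


# Usage: relative level of one transcript (its share of the gene's total).
t = np.linspace(0.0, 100.0, 10)  # hypothetical time points
share = 0.5 + 0.4 * np.tanh((t - 50.0) / 20.0)
score = log_bayes_factor(t, share)  # > 0 suggests the share changes over time
```

Running the same test on absolute transcript levels and on overall gene levels would give the three settings listed in the MOTIVATION paragraph.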
4.
Proc Natl Acad Sci U S A; 112(42): 13115-20, 2015 Oct 20.
Article in English | MEDLINE | ID: mdl-26438844

ABSTRACT

Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor α activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.


Subjects
Genome, Human , Models, Genetic , RNA/biosynthesis , Transcription, Genetic , Estrogen Receptor alpha/metabolism , Humans , Kinetics , MCF-7 Cells , RNA/genetics , Signal Transduction
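The joint model above links transcriptional activation to mature mRNA through a production delay and a degradation rate. A minimal sketch, assuming a linear delayed ODE dm/dt = beta * p(t - delay) - alpha * m with a known activation profile p (for example one derived from pol-II ChIP-Seq), solves it with forward Euler and recovers the delay by grid search. The parameter names, grids and time units (minutes) are hypothetical, and the nonparametric activation model and Bayesian uncertainty quantification described in the abstract are not attempted here.

```python
import numpy as np


def simulate_mrna(p, t_grid, beta, alpha, delay):
    """Forward-Euler solution of dm/dt = beta * p(t - delay) - alpha * m, m(0) = 0.

    `p` is the transcription-activity profile; it is treated as 0 before time 0.
    """
    m = np.zeros_like(t_grid)
    for i in range(1, len(t_grid)):
        dt = t_grid[i] - t_grid[i - 1]
        s = t_grid[i - 1] - delay
        drive = p(s) if s >= 0 else 0.0
        m[i] = m[i - 1] + dt * (beta * drive - alpha * m[i - 1])
    return m


def fit_delay(p, t_obs, m_obs, delays, alphas, dt=0.5):
    """Grid search over (delay, alpha); beta enters linearly, so solve it exactly."""
    t_grid = np.arange(0.0, t_obs.max() + dt, dt)
    best = None
    for delay in delays:
        for alpha in alphas:
            unit = np.interp(t_obs, t_grid, simulate_mrna(p, t_grid, 1.0, alpha, delay))
            denom = unit @ unit
            beta = (unit @ m_obs) / denom if denom > 0 else 0.0
            sse = np.sum((m_obs - beta * unit) ** 2)
            if best is None or sse < best[0]:
                best = (sse, delay, alpha, beta)
    return best[1:]  # (delay, alpha, beta)
```

Solving for `beta` in closed form at each grid point keeps the search two-dimensional; a delay estimate well above zero would correspond to the long splicing-associated delays reported in the abstract.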
5.
Nat Biotechnol; 32(12): 1202-12, 2014 Dec.
Article in English | MEDLINE | ID: mdl-24880487

ABSTRACT

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power among the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask multiple kernel learning (MKL), and provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.


Subjects
Antineoplastic Agents/therapeutic use , Drug Resistance, Neoplasm/genetics , Gene Expression Profiling , Neoplasms/drug therapy , Algorithms , Antineoplastic Agents/adverse effects , Epigenomics/methods , Gene Expression Regulation, Neoplastic/drug effects , Genomics/methods , Humans , Neoplasms/genetics , Proteomics/methods
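The abstract credits nonlinear models and the combination of several profiling data sets. The sketch below is a much simpler stand-in for that idea and is not Bayesian multitask MKL: it uses one Gaussian kernel per data set ("view"), sums the kernels with fixed weights instead of learning them, and fits kernel ridge regression with all drugs sharing the same kernel. The weights, `gamma`, `lam` and all names are hypothetical.

```python
import numpy as np


def rbf(xa, xb, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between the rows of xa and xb."""
    sq = (xa**2).sum(1)[:, None] + (xb**2).sum(1)[None, :] - 2.0 * xa @ xb.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def combined_kernel(views_a, views_b, weights, gamma=0.1):
    """Fixed-weight sum of one kernel per profiling data set ("view")."""
    return sum(w * rbf(a, b, gamma) for w, a, b in zip(weights, views_a, views_b))


def fit_predict(train_views, y_train, test_views, weights, lam=1e-2, gamma=0.1):
    """Kernel ridge regression; each column of y_train is one drug.

    All drugs share one kernel, a crude stand-in for multitask learning.
    """
    k_train = combined_kernel(train_views, train_views, weights, gamma)
    coef = np.linalg.solve(k_train + lam * np.eye(len(k_train)), y_train)
    return combined_kernel(test_views, train_views, weights, gamma) @ coef


# Usage: two views (e.g. expression and methylation) for 30 cell lines, 4 drugs.
rng = np.random.default_rng(0)
expr, meth = rng.normal(size=(30, 20)), rng.normal(size=(30, 8))
resp = rng.normal(size=(30, 4))
pred = fit_predict([expr[:25], meth[:25]], resp[:25], [expr[25:], meth[25:]], [0.7, 0.3])
```

Learning the view weights and sharing information across drugs, rather than fixing them as here, is what the multiple kernel learning and multitask parts of the top-performing method add.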
6.
PLoS Comput Biol; 10(5): e1003598, 2014 May.
Article in English | MEDLINE | ID: mdl-24830797

ABSTRACT

Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression. In this work, we present a probabilistic model of transcription dynamics which is fitted to pol-II occupancy time course data measured using ChIP-Seq. The model can be used to estimate transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated either by maximum likelihood or by Bayesian inference using Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the use of priors that capture domain knowledge, e.g. the expected range of transcription speeds based on previous experiments. The model describes the movement of pol-II down the gene body and can be used to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter activity time profiles, we are able to determine which genes respond quickly to stimuli and to group genes that share activity profiles and may therefore be co-regulated. We apply our methodology to biological data obtained using ChIP-seq to measure pol-II occupancy genome-wide when MCF-7 human breast cancer cells are treated with estradiol (E2). The transcription speeds we obtain agree with those obtained previously for smaller numbers of genes, with the advantage that our approach can be applied genome-wide. We validate the biological significance of the pol-II promoter activity clusters by investigating cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced genes are enriched for both estrogen receptor alpha (ERα) and FOXA1 binding in their proximal promoter regions.


Subjects
Chromatin Immunoprecipitation/methods , DNA-Directed RNA Polymerases/genetics , Models, Genetic , Models, Statistical , Promoter Regions, Genetic/genetics , Transcription, Genetic/genetics , Transcriptional Activation/genetics , Animals , Computer Simulation , Humans , Protein Binding
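The core idea of the model above is that pol-II moving down the gene body at a constant speed turns the promoter activity profile into a delayed signal along the gene. A deliberately simplified, deterministic sketch, assuming occupancy at position x and time t equals activity(t - x / v) and a known activity profile, estimates the speed v by least squares over a grid. The probabilistic noise model, inference of the activity profile and MCMC from the paper are left out. Units (kb, minutes) and all names are hypothetical.

```python
import numpy as np


def occupancy(activity, positions_kb, times_min, speed_kb_per_min):
    """Pol-II density per (time, position) if polymerases move at constant speed.

    `activity` maps promoter time to pol-II loading and is 0 before induction.
    """
    delays = positions_kb / speed_kb_per_min
    return np.array([[activity(t - d) for d in delays] for t in times_min])


def estimate_speed(observed, activity, positions_kb, times_min, speeds):
    """Least-squares grid search over speed, with a free overall scale factor."""
    obs = observed.ravel()
    best_speed, best_sse = None, np.inf
    for v in speeds:
        pred = occupancy(activity, positions_kb, times_min, v).ravel()
        scale = (pred @ obs) / (pred @ pred) if pred @ pred > 0 else 0.0
        sse = np.sum((obs - scale * pred) ** 2)
        if sse < best_sse:
            best_speed, best_sse = v, sse
    return best_speed


# Usage: gradual induction at t = 0, bins every 10 kb along a 100 kb gene.
act = lambda s: max(0.0, 1.0 - np.exp(-s / 10.0))
pos = np.arange(0.0, 101.0, 10.0)
times = np.array([0.0, 10.0, 20.0, 40.0, 80.0])
data = occupancy(act, pos, times, 2.0)  # synthetic "observed" profile
print(estimate_speed(data, act, pos, times, [1.0, 2.0, 3.0, 4.0]))  # 2.0
```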
7.
Methods Mol Biol; 939: 59-67, 2013.
Article in English | MEDLINE | ID: mdl-23192541

ABSTRACT

Reverse engineering the gene regulatory network is challenging because the amount of available data is very limited compared to the complexity of the underlying network. We present a technique that addresses this problem by focusing on a more limited problem: inferring direct targets of a transcription factor from short expression time series. The method is based on combining Gaussian process priors and ordinary differential equation models, allowing inference from limited, potentially unevenly sampled data. The method is implemented as an R/Bioconductor package and is demonstrated by ranking candidate targets of the p53 tumour suppressor.


Subjects
Computational Biology/methods , Data Mining/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Transcription Factors/genetics , Gene Expression Regulation , Genome, Human , Humans , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods , Software , Transcription Factors/metabolism , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
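The ranking described above rests on an ODE linking a transcription factor profile f(t) to a target's mRNA x(t). A common form is dx/dt = B + S f(t) - D x, whose solution is x(t) = B (1 - e^{-Dt}) / D + x0 e^{-Dt} + S ∫_0^t e^{-D(t-u)} f(u) du and is linear in (B, x0, S) once D is fixed. The sketch below assumes that form and a known, vectorized `tf_profile`, and ranks candidates by least-squares fit over a grid of decay rates. This swaps in plain least squares for the Gaussian process inference in the abstract (which also infers f from data), and all names are hypothetical.

```python
import numpy as np


def design_matrix(t_obs, tf_profile, decay, dt=0.01):
    """Columns whose linear combination (B, x0, S) gives x(t) at t_obs."""
    grid = np.arange(0.0, t_obs.max() + dt, dt)
    f = tf_profile(grid)
    # int_0^t e^{-D(t-u)} f(u) du via a cumulative trapezoid rule on the grid.
    h = f * np.exp(decay * grid)
    cum = np.concatenate([[0.0], np.cumsum((h[1:] + h[:-1]) / 2 * np.diff(grid))])
    conv = np.exp(-decay * grid) * cum
    g = np.interp(t_obs, grid, conv)
    return np.column_stack([(1 - np.exp(-decay * t_obs)) / decay,
                            np.exp(-decay * t_obs), g])


def target_score(t_obs, x_obs, tf_profile, decays):
    """Smallest residual sum of squares over the decay grid (lower = better fit)."""
    best = np.inf
    for d in decays:
        a = design_matrix(t_obs, tf_profile, d)
        coef, *_ = np.linalg.lstsq(a, x_obs, rcond=None)
        best = min(best, float(np.sum((x_obs - a @ coef) ** 2)))
    return best


def rank_targets(t_obs, expression, tf_profile, decays):
    """Candidate indices sorted from most to least plausible direct target."""
    scores = [target_score(t_obs, x, tf_profile, decays) for x in expression]
    return np.argsort(scores)
```

Because the time points enter only through `t_obs`, unevenly sampled series need no special handling in this sketch.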
8.
Inf Process Med Imaging; 22: 735-47, 2011.
Article in English | MEDLINE | ID: mdl-21761700

ABSTRACT

Extensive imaging is routinely used in brain tumor patients to monitor the state of the disease and to evaluate therapeutic options. A large number of multi-modal and multi-temporal image volumes are acquired in standard clinical cases, requiring new approaches for the comprehensive integration of information from different image sources and different time points. In this work, we propose a joint generative model of tumor growth and of image observation that naturally handles multimodal and longitudinal data. We use the model for analyzing imaging data in patients with glioma. The tumor growth model is based on a reaction-diffusion framework. Model personalization relies only on a forward model for the growth process and on the image likelihood. We take advantage of an adaptive sparse grid approximation for efficient inference via Markov chain Monte Carlo sampling. The approach can be used for integrating information from different multi-modal imaging protocols and can easily be adapted to other tumor growth models.


Subjects
Algorithms , Brain Neoplasms/pathology , Glioma/pathology , Image Interpretation, Computer-Assisted/methods , Models, Biological , Cell Proliferation , Computer Simulation , Humans , Image Enhancement/methods , Magnetic Resonance Imaging , Reproducibility of Results , Sensitivity and Specificity
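The abstract specifies only that growth follows a reaction-diffusion framework. A standard example is the Fisher-Kolmogorov equation du/dt = D d²u/dx² + rho u (1 - u), where u is normalized tumor cell density. The sketch integrates its one-dimensional version with explicit finite differences as a stand-in forward model. Using this equation, the parameter values and the 1D grid are all assumptions. The paper works on patient images and performs inference with MCMC on an adaptive sparse grid, which this solver does not cover.

```python
import numpy as np


def simulate_tumor_1d(u0, diffusion, growth, dx, dt, steps):
    """Explicit scheme for du/dt = D u_xx + rho u (1 - u) with zero-flux ends."""
    if dt > dx**2 / (2 * diffusion):
        raise ValueError("time step too large for the explicit scheme")
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        padded = np.pad(u, 1, mode="edge")  # edge-copied ghost cells: zero flux
        lap = (padded[2:] - 2 * u + padded[:-2]) / dx**2
        u = u + dt * (diffusion * lap + growth * u * (1 - u))
    return u


# Usage: a small seed of tumor cells in the middle of a 100 mm line.
u0 = np.zeros(101)
u0[48:53] = 0.5
u = simulate_tumor_1d(u0, diffusion=0.1, growth=0.2, dx=1.0, dt=0.5, steps=100)
```

The step-size check enforces the usual stability limit dt <= dx² / (2D) for explicit diffusion; an implicit scheme would remove that restriction at the cost of a linear solve per step.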