Results 1 - 12 of 12
1.
Biom J ; 63(2): 289-304, 2021 02.
Article in English | MEDLINE | ID: mdl-33155717

ABSTRACT

In precision medicine, a common problem is drug sensitivity prediction from cancer tissue cell lines. These types of problems entail modelling multivariate drug responses on high-dimensional molecular feature sets in typically >1000 cell lines. The dimensions of the problem require specialised models and estimation methods. In addition, external information on both the drugs and the features is often available. We propose to model the drug responses through a linear regression with shrinkage enforced through a normal inverse Gaussian prior. We let the prior depend on the external information, and estimate the model and external information dependence in an empirical-variational Bayes framework. We demonstrate the usefulness of this model in both a simulated setting and in the publicly available Genomics of Drug Sensitivity in Cancer data.


Subjects
Genomics , Pharmaceutical Preparations , Bayes Theorem , Normal Distribution , Precision Medicine
2.
Biometrics ; 75(4): 1288-1298, 2019 12.
Article in English | MEDLINE | ID: mdl-31009060

ABSTRACT

Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypasses the exploration of the model space. Specifically, we introduce closed-form Bayes factors under the Gaussian conjugate model to evaluate the null hypotheses of marginal and conditional independence between variables. Their computation for all pairs of variables is shown to be extremely efficient, thereby allowing us to address large problems with thousands of nodes as required by modern applications. Moreover, we derive exact tail probabilities from the null distributions of the Bayes factors. These allow the use of any multiplicity correction procedure to control error rates for incorrect edge inclusion. We demonstrate the proposed approach on various simulated examples as well as on a large gene expression data set from The Cancer Genome Atlas.


Subjects
Bayes Theorem , Statistical Models , Normal Distribution , Computer Simulation , Gene Expression Profiling , Neoplasm Genes , Genome , Humans
3.
Biom J ; 59(5): 932-947, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28393396

ABSTRACT

Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks, such knowledge may come, for instance, from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simultaneous Equation Model, we develop an appealing Empirical Bayes (EB) procedure that automatically assesses the agreement of the used prior knowledge with the data at hand. We use a variational Bayes method to approximate the posterior densities and compare its accuracy with that of a Gibbs sampling strategy. Our method is computationally fast, and can outperform known competitors. In a simulation study, we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In particular, the edges of the recovered network have superior reproducibility (compared to that of competitors) over resampled versions of the data.


Subjects
Biometry/methods , Statistical Models , Bayes Theorem , Computer Simulation , Gene Regulatory Networks , Pilot Projects , Reproducibility of Results
4.
Biostatistics ; 14(1): 113-28, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22988280

ABSTRACT

Next generation sequencing is quickly replacing microarrays as a technique to probe different molecular levels of the cell, such as DNA or RNA. The technology provides higher resolution, while reducing bias. RNA sequencing results in counts of RNA strands. This type of data imposes new statistical challenges. We present a novel, generic approach to model and analyze such data. Our approach aims at large flexibility of the likelihood (count) model and the regression model alike. Hence, a variety of count models is supported, such as the popular negative binomial (NB) model, which accounts for overdispersion. In addition, complex, non-balanced designs and random effects are accommodated. Like some other methods, our method provides shrinkage of dispersion-related parameters. However, we extend it by enabling joint shrinkage of parameters, including those for which inference is desired. We argue that this is essential for Bayesian multiplicity correction. Shrinkage is achieved by empirically estimating priors. We discuss several parametric (mixture) and non-parametric priors and develop procedures to estimate (parameters of) those. Inference is provided by means of local and Bayesian false discovery rates. We illustrate our method on several simulations and two data sets, also to compare it with other methods. Model- and data-based simulations show substantial improvements in the sensitivity at the given specificity. The data motivate the use of the zero-inflated NB (ZI-NB) as a powerful alternative to the NB, which results in higher detection rates for low-count data. Finally, compared with other methods, the results on small sample subsets are more reproducible when validated on their large sample complements, illustrating the importance of the type of shrinkage.


Subjects
Bayes Theorem , Statistical Data Interpretation , Statistical Models , RNA/chemistry , RNA Sequence Analysis/methods , Base Sequence , Computer Simulation , Molecular Sequence Data , RNA/genetics
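
Illustration for entry 4: the abstract above notes that the NB count model accounts for overdispersion and that a zero-inflated variant (ZI-NB) can help with low-count data. The following is a minimal Python sketch of those two distributions, not the authors' implementation. The mean/dispersion parameterisation (variance = mu + phi * mu^2), the function names, and the parameter values are assumptions made for illustration only.

```python
import math

def nb_pmf(k, mu, phi):
    """Negative binomial pmf with mean mu and dispersion phi.

    Assumed parameterisation: variance = mu + phi * mu**2,
    so phi > 0 gives more variance than a Poisson with the same mean.
    """
    r = 1.0 / phi              # "size" parameter
    p = r / (r + mu)           # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def zinb_pmf(k, mu, phi, pi_zero):
    """Zero-inflated NB: an extra point mass pi_zero at zero (hypothetical name)."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, phi)
    return pi_zero + base if k == 0 else base

def moments(pmf, upper=2000):
    """Numerical mean and variance of a count distribution, truncated at `upper`."""
    mean = sum(k * pmf(k) for k in range(upper))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(upper))
    return mean, var

# Illustrative values only: mean 10, dispersion 0.5.
mean, var = moments(lambda k: nb_pmf(k, mu=10.0, phi=0.5))
print(mean, var)  # approximately 10 and 60 (= 10 + 0.5 * 10**2)
```

With phi close to zero the variance approaches the mean, which is the Poisson case; the zero-inflated version only adds probability mass at zero and rescales the rest.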
5.
Bioinformatics ; 29(8): 1081-2, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23419375

ABSTRACT

SUMMARY: DNA copy number and mRNA expression are commonly used data types in cancer studies. Available software for integrative analysis arbitrarily fixes the parametric form of the association between the two molecular levels and hence offers no opportunities for modelling it. We present a new tool for flexible modelling of this association. PLRS uses a wide class of interpretable models, including popular ones, and incorporates prior biological knowledge. It can identify the gene-specific type of relationship between gene copy number and mRNA expression. Moreover, it tests the strength of the association and provides confidence intervals. We illustrate PLRS using glioblastoma data from The Cancer Genome Atlas. AVAILABILITY AND IMPLEMENTATION: PLRS is implemented as an R package and available from Bioconductor (as of version 2.12; http://bioconductor.org). Additional code for parallel computations is available as Supplementary Material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Dosage , Messenger RNA/metabolism , Software , DNA Copy Number Variations , Glioblastoma/genetics , Glioblastoma/metabolism , Humans , Genetic Models
6.
Proc Natl Acad Sci U S A ; 108(1): 220-5, 2011 Jan 04.
Article in English | MEDLINE | ID: mdl-21173219

ABSTRACT

Because invasive species threaten the integrity of natural ecosystems, a major goal in ecology is to develop predictive models to determine which species may become widespread and where they may invade. Indeed, considerable progress has been made in understanding the factors that influence the local pattern of spread for specific invaders and the factors that are correlated with the number of introduced species that have become established in a given region. However, few studies have examined the relative importance of multiple drivers of invasion success for widespread species at global scales. Here, we use a dataset of >5,000 presence/absence records to examine the interplay between climatic suitability, biotic resistance by native taxa, human-aided dispersal, and human modification of habitats, in shaping the distribution of one of the world's most notorious invasive species, the Argentine ant (Linepithema humile). Climatic suitability and the extent of human modification of habitats are primarily responsible for the distribution of this global invader. However, we also found some evidence for biotic resistance by native communities. Somewhat surprisingly, and despite the often cited importance of propagule pressure as a crucial driver of invasions, metrics of the magnitude of international traded commodities among countries were not related to global distribution patterns. Together, our analyses on the global-scale distribution of this invasive species provide strong evidence for the interplay of biotic and abiotic determinants of spread and also highlight the challenges of limiting the spread and subsequent impact of highly invasive species.


Subjects
Ants/growth & development , Climate , Ecology/methods , Ecosystem , Introduced Species/trends , Biological Models , Animals , Commerce , Computer Simulation , Factual Databases , Geography , Human Activities , Humans , Regression Analysis
7.
Food Chem Toxicol ; 178: 113928, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37406754

ABSTRACT

Equivalence testing is an important component of safety assessments, used for example by the European Food Safety Authority, to allow new food or feed products on the market. The aim of such tests is to demonstrate equivalence of characteristics of test and reference crops. Equivalence tests are typically univariate and applied to each measured analyte (characteristic) separately without multiplicity correction. This increases the probability of making false claims of equivalence (type I errors) when evaluating multiple analytes simultaneously. To solve this problem, familywise error rate (FWER) control using Hochberg's method has been proposed. This paper demonstrates that, in the context of equivalence testing, other FWER-controlling methods are more powerful than Hochberg's. Particularly, it is shown that Hommel's method is guaranteed to perform at least as well as Hochberg's and that an "adaptive" version of Bonferroni's method, which uses an estimator of the proportion of non-equivalent characteristics, often substantially outperforms Hommel's method. Adaptive Bonferroni takes better advantage of the particular context of food safety where a large proportion of true equivalences is expected, a situation where other methods are particularly conservative. The different methods are illustrated by their application to two compositional datasets and further assessed and compared using simulated data.


Subjects
Agricultural Crops , Food Safety , Probability
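
Illustration for entry 7: the abstract above compares familywise error rate (FWER) procedures. The following is a minimal Python sketch of adjusted p-values for plain Bonferroni, Hochberg's step-up method, and a Bonferroni variant that plugs in an estimated proportion of true null hypotheses (here, non-equivalent characteristics). It is not the paper's code: the abstract does not say which estimator is used, so `pi0_hat` is a hypothetical input supplied by the caller, and the p-values are made-up numbers. Hommel's method is omitted for brevity.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p-value by m, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def hochberg(pvals):
    """Hochberg step-up adjusted p-values.

    Walk from the largest p-value down; the i-th largest is multiplied by i,
    and a running minimum keeps the adjusted values monotone.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):       # rank 0 is the largest p-value
        running_min = min(running_min, (rank + 1) * pvals[idx])
        adjusted[idx] = running_min
    return adjusted

def adaptive_bonferroni(pvals, pi0_hat):
    """Bonferroni with m replaced by the estimated number of true nulls.

    pi0_hat is an ASSUMED, externally estimated proportion of true nulls;
    how to estimate it is outside this sketch.
    """
    m0_hat = max(1.0, pi0_hat * len(pvals))
    return [min(1.0, m0_hat * p) for p in pvals]

# Made-up p-values for four analytes.
pvals = [0.01, 0.02, 0.03, 0.04]
print(bonferroni(pvals))
print(hochberg(pvals))
print(adaptive_bonferroni(pvals, pi0_hat=0.5))
```

In this toy input every Hochberg-adjusted value is at or below its Bonferroni counterpart, which is the general relationship between the two procedures; whether the adaptive variant keeps FWER control depends on the estimator behind `pi0_hat`.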
8.
BMC Bioinformatics ; 13: 80, 2012 May 04.
Article in English | MEDLINE | ID: mdl-22559006

ABSTRACT

BACKGROUND: An increasing number of published genomic studies interrogate more than one molecular level. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Often such analyses require knowledge of which elements of one platform link to those of another. Although this matching is important, many integrative analyses describe it insufficiently or not at all. RESULTS: We describe, illustrate and discuss six matching procedures. They are implemented in the R-package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic, and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in the use of the available data, and may even lead to different results for individual genes. CONCLUSIONS: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R-package sigaR, available from Bioconductor.


Subjects
Comparative Genomic Hybridization , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , DNA Copy Number Variations , Gene Expression , Genomics/methods , Software
9.
Food Chem Toxicol ; 170: 113446, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36191656

ABSTRACT

Products for food and feed derived from genetically modified (GM) crops are only allowed on the market when they are deemed to be safe for human health and the environment. The European Food Safety Authority (EFSA) performs safety assessment including a comparative approach: the compositional characteristics of a GM genotype are compared to those of reference genotypes that have a history of safe use. Statistical equivalence tests are used to carry out such a comparative assessment. These tests are univariate and therefore only consider one measured variable at a time. Phenotypic data, however, often comprise measurements on multiple variables that must be integrated to arrive at a single decision on acceptance in the regulatory process. The surge of modern molecular phenotyping platforms further challenges this integration, due to the large number of characteristics measured on the plants. This paper presents a new multivariate equivalence test that naturally extends a recently proposed univariate equivalence test and allows equivalence to be assessed across all variables simultaneously. The proposed test is illustrated on plant compositional data from a field study on maize grain and on untargeted metabolomic data of potato tubers, while its performance is assessed on simulated data.


Subjects
Genetically Modified Food , Humans , Genetically Modified Plants/genetics , Food Safety , Agricultural Crops/genetics , Zea mays/genetics
10.
Biol Psychiatry ; 83(1): 70-80, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-28688579

ABSTRACT

BACKGROUND: Peripheral inflammation is often associated with major depressive disorder (MDD), and immunological biomarkers of depression remain a focus of investigation. METHODS: We used microarray data on whole blood from two independent case-control studies of MDD: the GlaxoSmithKline-High-Throughput Disease-specific target Identification Program [GSK-HiTDiP] study (113 patients and 57 healthy control subjects) and the Janssen-Brain Resource Company study (94 patients and 100 control subjects). Genome-wide differential gene expression analysis (18,863 probes) resulted in a p value for each gene in each study. A Bayesian method identified the largest p-value threshold (q = .025) associated with twice the number of genes differentially expressed in both studies compared with the number of coincidental case-control differences expected by chance. RESULTS: A total of 165 genes were differentially expressed in both studies with concordant direction of fold change. The 90 genes overexpressed (or UP genes) in MDD were significantly enriched for immune response to infection, were concentrated in a module of the gene coexpression network associated with innate immunity, and included clusters of genes with correlated expression in monocytes, monocyte-derived dendritic cells, and neutrophils. In contrast, the 75 genes underexpressed (or DOWN genes) in MDD were associated with the adaptive immune response and included clusters of genes with correlated expression in T cells, natural killer cells, and erythroblasts. Consistently, the MDD patients with overexpression of UP genes also had underexpression of DOWN genes (correlation > .70 in both studies). CONCLUSIONS: MDD was replicably associated with proinflammatory activation of the peripheral innate immune system, coupled with relative inactivation of the adaptive immune system, indicating the potential of transcriptional biomarkers for immunological stratification of patients with depression.


Subjects
Major Depressive Disorder/blood , Major Depressive Disorder/immunology , Innate Immunity , Biomarkers/blood , Case-Control Studies , Major Depressive Disorder/genetics , Gene Expression , Gene Expression Regulation , Humans , Innate Immunity/genetics , Microarray Analysis , Transcriptome , Wernicke Encephalopathy
11.
Ann Appl Stat ; 11(1): 41-68, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28408966

ABSTRACT

Reconstructing a gene network from high-throughput molecular data is an important but challenging task, as the number of parameters to estimate is easily much larger than the sample size. A conventional remedy is to regularize or penalize the model likelihood. In network models, this is often done locally in the neighbourhood of each node or gene. However, estimation of the many regularization parameters is often difficult and can result in large statistical uncertainties. In this paper we propose to combine local regularization with global shrinkage of the regularization parameters to borrow strength between genes and improve inference. We employ a simple Bayesian model with non-sparse, conjugate priors to facilitate the use of fast variational approximations to posteriors. We discuss empirical Bayes estimation of hyper-parameters of the priors, and propose a novel approach to rank-based posterior thresholding. Using extensive model- and data-based simulations, we demonstrate that the proposed inference strategy outperforms popular (sparse) methods, yields more stable edges, and is more reproducible. The proposed method, termed ShrinkNet, is then applied to glioblastoma data to investigate the interactions between genes associated with patient survival.

12.
Neurobiol Aging ; 34(7): 1825-36, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23428183

ABSTRACT

To characterize the promoterome of caudate and putamen regions (striatum), frontal and temporal cortices, and hippocampi from aged human brains, we used high-throughput cap analysis of gene expression to profile the transcription start sites and to quantify the differences in gene expression across the 5 brain regions. We also analyzed the extent to which methylation influenced the observed expression profiles. We sequenced more than 71 million cap analysis of gene expression tags corresponding to 70,202 promoter regions and 16,888 genes. More than 7000 transcripts were differentially expressed, mainly because of differential alternative promoter usage. Unexpectedly, 7% of differentially expressed genes were neurodevelopmental transcription factors. Functional pathway analysis on the differentially expressed genes revealed an overrepresentation of several signaling pathways (e.g., fibroblast growth factor and wnt signaling) in hippocampus and striatum. We also found that although 73% of methylation signals mapped within genes, the influence of methylation on the expression profile was small. Our study underscores alternative promoter usage as an important mechanism for determining the regional differences in gene expression at old age.


Subjects
Aging/genetics , Aging/pathology , Brain/pathology , Brain/physiology , Gene Expression Regulation/physiology , Genetic Promoter Regions/physiology , Aged , Aged 80 and over , Caudate Nucleus/pathology , Caudate Nucleus/physiology , Female , Frontal Lobe/pathology , Frontal Lobe/physiology , Hippocampus/pathology , Hippocampus/physiology , Humans , Male , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/pathology , Putamen/pathology , Putamen/physiology , Temporal Lobe/pathology , Temporal Lobe/physiology