Results 1 - 20 of 210
1.
Stat Med ; 43(10): 1867-1882, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38409877

ABSTRACT

Throughout the course of an epidemic, the rate at which disease spreads varies with behavioral changes, the emergence of new disease variants, and the introduction of mitigation policies. Estimating such changes in transmission rates can help us better model and predict the dynamics of an epidemic, and provide insight into the efficacy of control and intervention strategies. We present a method for likelihood-based estimation of parameters in the stochastic susceptible-infected-removed model under a time-inhomogeneous transmission rate composed of piecewise constant components. In doing so, our method simultaneously learns change points in the transmission rate via a Markov chain Monte Carlo algorithm. The method targets the exact model posterior in a difficult missing data setting given only partially observed case counts over time. We validate performance on simulated data before applying our approach to data from an Ebola outbreak in Western Africa and a COVID-19 outbreak on a university campus.
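
To make the model class in this abstract concrete, the following is a minimal forward-simulation sketch of a stochastic susceptible-infected-removed process whose transmission rate is piecewise constant. It is not the authors' inference method (the abstract describes MCMC-based estimation, which is not reproduced here). The function names, the frequency-dependent infection rate beta * S * I / N, and all parameter values are assumptions chosen for illustration.

```python
import random
from bisect import bisect_right


def beta_at(t, change_points, betas):
    """Piecewise constant rate: betas[k] applies between change point k-1 and k."""
    return betas[bisect_right(change_points, t)]


def simulate_sir(s0, i0, betas, change_points, gamma, t_end, seed=0):
    """Gillespie-style simulation of a stochastic SIR model (illustrative only).

    Assumes len(betas) == len(change_points) + 1, sorted change points,
    and gamma > 0. Returns a list of (time, susceptible, infected) states.
    """
    rng = random.Random(seed)
    n = s0 + i0
    t, s, i = 0.0, s0, i0
    path = [(t, s, i)]
    while i > 0 and t < t_end:
        infection = beta_at(t, change_points, betas) * s * i / n
        removal = gamma * i
        total = infection + removal
        wait = rng.expovariate(total)
        # Time at which the current rates stop being valid.
        k = bisect_right(change_points, t)
        boundary = change_points[k] if k < len(change_points) else float("inf")
        boundary = min(boundary, t_end)
        if t + wait > boundary:
            # No event before the boundary: move there and redraw, which is
            # valid because exponential waiting times are memoryless.
            t = boundary
            continue
        t += wait
        if rng.random() < infection / total:
            s, i = s - 1, i + 1  # infection event
        else:
            i -= 1               # removal event
        path.append((t, s, i))
    return path


# Hypothetical scenario: transmission drops from 1.5 to 0.3 at time 5.
example_path = simulate_sir(99, 1, [1.5, 0.3], [5.0], gamma=0.5, t_end=20.0)
```

In the setting the abstract describes, only partial case counts from such a path would be observed, and both the rate values and the change points would be unknown quantities to infer.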


Subjects
Epidemics , Hemorrhagic Fever, Ebola , Humans , Likelihood Functions , Markov Chains , Disease Outbreaks , Monte Carlo Method , Bayes Theorem , Stochastic Processes
2.
Cereb Cortex ; 33(9): 5307-5322, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36320163

ABSTRACT

The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of 72 people across the age span, enriched for the APOE4 genotype to reveal vulnerable networks associated with a composite AD risk factor including age, genotype, and sex. Sparse canonical correlation analysis (CCA) revealed a high weight associated with genotype, and subgraphs involving the cuneus, temporal, cingulate cortices, and cerebellum. Adding cognitive metrics to the risk factor revealed the highest cumulative degree of connectivity for the pericalcarine cortex, insula, banks of the superior sulcus, and the cerebellum. To enable scaling up our approach, we extended tensor network principal component analysis, introducing CCA components. We developed sparse regression predictive models with errors of 17% for genotype, 24% for family risk factor for AD, and 5 years for age. Age prediction in groups including cognitively impaired subjects revealed regions not found using only normal subjects, i.e. middle and transverse temporal, paracentral and superior banks of temporal sulcus, as well as the amygdala and parahippocampal gyrus. These modeling approaches represent stepping stones towards single subject prediction.


Subjects
Alzheimer Disease , Humans , Alzheimer Disease/pathology , Magnetic Resonance Imaging , Brain/pathology , Genotype , Aging
3.
IEEE Trans Signal Process ; 72: 70-83, 2024.
Article in English | MEDLINE | ID: mdl-38283047

ABSTRACT

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.
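
For readers unfamiliar with the Cayley transform named in this abstract, the sketch below shows the transform itself in two dimensions: it maps a skew-symmetric matrix S to an orthogonal matrix Q. The convention Q = (I - S)(I + S)^{-1} is one common choice and is an assumption here; the abstract does not say how CTEF parameterizes ellipsoids with it, so this is not a sketch of the fitting algorithm.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def cayley_2d(s):
    """Cayley transform of S = [[0, s], [-s, 0]] using Q = (I - S)(I + S)^{-1}."""
    det = 1.0 + s * s  # determinant of I + S, never zero for real s
    i_plus_s_inv = [[1.0 / det, -s / det],
                    [s / det, 1.0 / det]]
    i_minus_s = [[1.0, -s],
                 [s, 1.0]]
    return matmul2(i_minus_s, i_plus_s_inv)


def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


q = cayley_2d(0.7)
should_be_identity = matmul2(transpose2(q), q)  # Q^T Q, orthogonality check
```

A single unconstrained real number s therefore yields a valid rotation, which is the general appeal of the transform: orthogonal matrices can be searched over without explicit constraints.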

4.
Neuroimage ; 276: 120214, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37286151

ABSTRACT

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs. In this article, we propose a human trait prediction framework utilizing a tractography-based representation of the brain connectome, which clusters fiber endpoints to define a data-driven white matter parcellation targeted to explain variation among individuals and predict human traits. This leads to Principal Parcellation Analysis (PPA), representing individual brain connectomes by compositional vectors building on a basis system of fiber bundles that captures the connectivity at the population level. PPA eliminates the need to choose atlases and ROIs a priori, and provides a simpler, vector-valued representation that facilitates easier statistical analysis compared to the complex graph structures encountered in classical connectome analyses. We illustrate the proposed approach through applications to data from the Human Connectome Project (HCP) and show that PPA connectomes improve power in predicting human traits over state-of-the-art methods based on classical connectomes, while dramatically improving parsimony and maintaining interpretability. Our PPA package is publicly available on GitHub, and can be implemented routinely for diffusion image data.


Subjects
Connectome , White Matter , Humans , Connectome/methods , Brain/diagnostic imaging
5.
Bioinformatics ; 38(16): 4011-4018, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35762974

ABSTRACT

MOTIVATION: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are of such poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank. RESULTS: ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications. AVAILABILITY AND IMPLEMENTATION: ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Neuroimaging , Humans , Brain/diagnostic imaging , Software
6.
Biometrics ; 79(4): 2987-2997, 2023 12.
Article in English | MEDLINE | ID: mdl-37431147

ABSTRACT

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP). The transmission rate model is further embedded in a hierarchy to allow information borrowing across parallel streams of regional incidence data. Crucially, the method makes use of optional vaccination data as a first step toward modeling of endemic infectious diseases. Computational techniques borrowed from the Bayesian spatial analysis literature enable fast and reliable posterior computation. Simulation studies reveal that the method recovers true covariate effects at nominal coverage levels. We analyze data from the COVID-19 pandemic and validate forecast intervals on held-out data. User-friendly software is provided to enable practitioners to easily deploy the method in public health research.


Subjects
Communicable Diseases , Pandemics , Humans , Models, Statistical , Epidemiological Models , Bayes Theorem , Communicable Diseases/epidemiology , Forecasting
7.
J Math Biol ; 85(4): 36, 2022 09 20.
Article in English | MEDLINE | ID: mdl-36125562

ABSTRACT

The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters based on noisy observations early in the outbreak, well before the epidemic reaches its peak. This allows prediction of the subsequent course of the epidemic and design of appropriate interventions. However, accurately inferring SIR model parameters in such scenarios is problematic. This article provides novel, theoretical insight on this issue of practical identifiability of the SIR model. Our theory provides new understanding of the inferential limits of routinely used epidemic models and provides a valuable addition to current simulate-and-check methods. We illustrate some practical implications through application to a real-world epidemic data set.


Subjects
Communicable Diseases , Epidemics , Communicable Diseases/epidemiology , Disease Outbreaks , Disease Susceptibility/epidemiology , Epidemiological Models , Humans
8.
Neuroimage ; 245: 118750, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34823023

ABSTRACT

There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high-dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal components analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer their relationships to human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to study the structural brain connectome and its relationship with cognition. Numerical results demonstrate huge advantages of GATE over competitors in terms of prediction accuracy, statistical inference, and computing efficiency. We found that the structural connectome has a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.


Subjects
Brain/growth & development , Brain/physiology , Cognition/physiology , Connectome/methods , Magnetic Resonance Imaging , Adolescent , Algorithms , Child , Computer Simulation , Datasets as Topic , Female , Humans , Imaging, Three-Dimensional , Male , Models, Neurological , Nonlinear Dynamics , Phenotype , Reading , Young Adult
9.
Bioinformatics ; 36(11): 3522-3527, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32176244

ABSTRACT

MOTIVATION: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly estimate t-SNE embeddings that are not driven by batch effects. Without correction, interesting structure in the data can be obscured by batch effects. The proposed algorithm can therefore significantly aid visualization of high-dimensional data. RESULTS: The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batches related with mice identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, and endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumours. AVAILABILITY AND IMPLEMENTATION: Source code implementing the proposed approach is available as an R package at https://github.com/emanuelealiverti/BC_tSNE, including a tutorial to reproduce the simulation studies. CONTACT: aliverti@stat.unipd.it.


Subjects
Endothelial Cells , Software , Algorithms , Animals , Gene Expression , Gene Expression Profiling , Mice
10.
Blood ; 134(19): 1598-1607, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31558468

ABSTRACT

Burkitt lymphoma (BL) is an aggressive, MYC-driven lymphoma comprising 3 distinct clinical subtypes: sporadic BLs that occur worldwide, endemic BLs that occur predominantly in sub-Saharan Africa, and immunodeficiency-associated BLs that occur primarily in the setting of HIV. In this study, we comprehensively delineated the genomic basis of BL through whole-genome sequencing (WGS) of 101 tumors representing all 3 subtypes of BL to identify 72 driver genes. These data were additionally informed by CRISPR screens in BL cell lines to functionally annotate the role of oncogenic drivers. Nearly every driver gene was found to have both coding and non-coding mutations, highlighting the importance of WGS for identifying driver events. Our data implicate coding and non-coding mutations in IGLL5, BACH2, SIN3A, and DNMT1. Epstein-Barr virus (EBV) infection was associated with higher mutation load, with type 1 EBV showing a higher mutational burden than type 2 EBV. Although sporadic and immunodeficiency-associated BLs had similar genetic profiles, endemic BLs manifested more frequent mutations in BCL7A and BCL6 and fewer genetic alterations in DNMT1, SNTB2, and CTCF. Silencing mutations in ID3 were a common feature of all 3 subtypes of BL. In vitro, mass spectrometry-based proteomics demonstrated that the ID3 protein binds primarily to TCF3 and TCF4. In vivo knockout of ID3 potentiated the effects of MYC, leading to rapid tumorigenesis and tumor phenotypes consistent with those observed in the human disease.


Subjects
Burkitt Lymphoma/genetics , Whole Genome Sequencing/methods , Animals , Humans , Mice
11.
Neuroimage ; 197: 330-343, 2019 08 15.
Article in English | MEDLINE | ID: mdl-31029870

ABSTRACT

Advanced brain imaging techniques make it possible to measure individuals' structural connectomes in large cohort studies non-invasively. Given the availability of large-scale data sets, it is extremely interesting and important to build a set of advanced tools for structural connectome extraction and statistical analysis that emphasize both interpretability and predictive power. In this paper, we developed and integrated a set of toolboxes, including an advanced structural connectome extraction pipeline and a novel tensor network principal components analysis (TN-PCA) method, to study relationships between structural connectomes and various human traits such as alcohol and drug use, cognition and motion abilities. The structural connectome extraction pipeline produces a set of connectome features for each subject that can be organized as a tensor network, and TN-PCA maps the high-dimensional tensor network data to a lower-dimensional Euclidean space. Combined with classical hypothesis testing, canonical correlation analysis and linear discriminant analysis techniques, we analyzed over 1100 scans of 1076 subjects from the Human Connectome Project (HCP) and the Sherbrooke test-retest data set, as well as 175 human traits measuring different domains including cognition, substance use, motor, sensory and emotion. The test-retest data validated the developed algorithms. With the HCP data, we found that structural connectomes are associated with a wide range of traits, e.g., fluid intelligence, language comprehension, and motor skills are associated with increased cortical-cortical brain structural connectivity, while the use of alcohol, tobacco, and marijuana is associated with decreased cortical-cortical connectivity. We also demonstrated that our extracted structural connectomes and analysis method can give superior prediction accuracies compared with alternative connectome constructions and other tensor and network regression methods.


Subjects
Brain/anatomy & histology , Connectome/methods , Diffusion Tensor Imaging/methods , Image Processing, Computer-Assisted/methods , Personality/physiology , Brain/diagnostic imaging , Data Interpretation, Statistical , Female , Humans , Male , Models, Neurological , Neural Pathways/anatomy & histology , Principal Component Analysis
12.
Bioinformatics ; 34(14): 2457-2464, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29506206

ABSTRACT

Motivation: Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include functional linear models, functional principal components regression and cluster-based approaches, such as latent trajectory analysis. This article is motivated by applications in which the dynamics in a predictor, across times when the value is relatively extreme, are particularly informative about the response. For example, physicians are interested in relating the dynamics of blood pressure changes during surgery to post-surgery adverse outcomes, and it is thought that the dynamics are more important when blood pressure is significantly elevated or lowered. Results: We propose a novel class of extrema-weighted feature (XWF) extraction models. Key components in defining XWFs include the marginal density of the predictor, a function up-weighting values at extreme quantiles of this marginal, and functionals characterizing local dynamics. Algorithms are proposed for fitting of XWF-based regression and classification models, and are compared with current methods for functional predictors in simulations and a blood pressure during surgery application. XWFs find features of intraoperative blood pressure trajectories that are predictive of postoperative mortality. By their nature, most of these features cannot be found by previous methods. Availability and implementation: The R package 'xwf' is available at the CRAN repository: https://cran.r-project.org/package=xwf. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Blood Pressure , Computational Biology/methods , Postoperative Complications , Software , Algorithms , Female , Humans , Male , Treatment Outcome
13.
IEEE Trans Signal Process ; 67(7): 1929-1940, 2019 Apr 01.
Article in English | MEDLINE | ID: mdl-37216010

ABSTRACT

There is an increasing interest in learning a set of small outcome-relevant subgraphs in network-predictor regression. The extracted signal subgraphs can greatly improve the interpretation of the association between the network predictor and the response. In brain connectomics, the brain network for an individual corresponds to a set of interconnections among brain regions and there is a strong interest in linking the brain connectome to human cognitive traits. Modern neuroimaging technology allows a very fine segmentation of the brain, producing very large structural brain networks. Therefore, accurate and efficient methods for identifying a set of small predictive subgraphs become crucial, leading to discovery of key interconnected brain regions related to the trait and important insights on the mechanism of variation in human cognitive traits. We propose a symmetric bilinear model with L1 penalty to search for small clique subgraphs that contain useful information about the response. A coordinate descent algorithm is developed to estimate the model where we derive analytical solutions for a sequence of conditional convex optimizations. Application of this method on human connectome and language comprehension data shows interesting discovery of relevant interconnections among several small sets of brain regions and better predictive performance than competitors.

14.
Neuroimage ; 172: 130-145, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29355769

ABSTRACT

Advances in understanding the structural connectomes of the human brain require improved approaches for the construction, comparison and integration of high-dimensional whole-brain tractography data from a large number of individuals. This article develops a population-based structural connectome (PSC) mapping framework to address these challenges. PSC simultaneously characterizes a large number of white matter bundles within and across different subjects by registering different subjects' brains based on coarse cortical parcellations, compressing the bundles of each connection, and extracting novel connection weights. A robust tractography algorithm and streamline post-processing techniques, including dilation of gray matter regions, streamline cutting, and outlier streamline removal are applied to improve the robustness of the extracted structural connectomes. The developed PSC framework can be used to reproducibly extract binary networks, weighted networks and streamline-based brain connectomes. We apply the PSC to Human Connectome Project data to illustrate its application in characterizing normal variations and heritability of structural connectomes in healthy subjects.


Subjects
Brain/diagnostic imaging , Connectome/methods , Image Processing, Computer-Assisted/methods , Algorithms , Humans
15.
Bioinformatics ; 33(12): 1859-1866, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28165112

ABSTRACT

MOTIVATION: There is increasing interest in learning how human brain networks vary as a function of a continuous trait, but flexible and efficient procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which combines low-rank factorizations and flexible Gaussian process priors to learn changes in the conditional expectation of a network-valued random variable across the values of a continuous predictor, while including subject-specific random effects. RESULTS: The formulation leads to a general framework for inference on changes in brain network structures across human traits, facilitating borrowing of information and coherently characterizing uncertainty. We provide an efficient Gibbs sampler for posterior computation along with simple procedures for inference, prediction and goodness-of-fit assessments. The model is applied to learn how human brain networks vary across individuals with different intelligence scores. Results provide interesting insights on the association between intelligence and brain connectivity, while demonstrating good predictive performance. AVAILABILITY AND IMPLEMENTATION: Source code implemented in R and data are available at https://github.com/wangronglu/BNRR. CONTACT: rl.wang@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Brain/anatomy & histology , Computational Biology/methods , Models, Biological , Nerve Net/anatomy & histology , Software , Algorithms , Bayes Theorem , Brain/physiology , Computer Simulation , Humans , Nerve Net/physiology
16.
Biometrics ; 74(4): 1331-1340, 2018 12.
Article in English | MEDLINE | ID: mdl-29894557

ABSTRACT

There is wide interest in studying how the distribution of a continuous response changes with a predictor. We are motivated by environmental applications in which the predictor is the dose of an exposure and the response is a health outcome. A main focus in these studies is inference on dose levels associated with a given increase in risk relative to a baseline. In addressing this goal, popular methods either dichotomize the continuous response or focus on modeling changes with the dose in the expectation of the outcome. Such choices may lead to information loss and provide inaccurate inference on dose-response relationships. We instead propose a Bayesian convex mixture regression model that allows the entire distribution of the health outcome to be unknown and changing with the dose. To balance flexibility and parsimony, we rely on a mixture model for the density at the extreme doses, and express the conditional density at each intermediate dose via a convex combination of these extremal densities. This representation generalizes classical dose-response models for quantitative outcomes, and provides a more parsimonious, but still powerful, formulation compared to nonparametric methods, thereby improving interpretability and efficiency in inference on risk functions. A Markov chain Monte Carlo algorithm for posterior inference is developed, and the benefits of our methods are outlined in simulations, along with a study on the impact of DDE exposure on gestational age.
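
The convex-combination idea in this abstract can be sketched as follows: the density at an intermediate dose is a weighted average of the densities at the lowest and highest doses. To keep the sketch short, each extremal density is a single Gaussian (the abstract specifies mixture models there), the weight function is linear in dose, and all numbers are invented for illustration. None of these choices come from the paper, and no Bayesian fitting is shown.

```python
import math


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mean, sd):
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def weight(dose, max_dose):
    """Hypothetical weight on the high-dose density: 0 at dose 0, 1 at max_dose."""
    return min(max(dose / max_dose, 0.0), 1.0)


def conditional_density(y, dose, max_dose, low, high):
    """f(y | dose) = (1 - w) * f_low(y) + w * f_high(y), with w = weight(dose)."""
    w = weight(dose, max_dose)
    return (1.0 - w) * normal_pdf(y, *low) + w * normal_pdf(y, *high)


def risk_below(threshold, dose, max_dose, low, high):
    """P(Y < threshold | dose) under the same convex combination."""
    w = weight(dose, max_dose)
    return (1.0 - w) * normal_cdf(threshold, *low) + w * normal_cdf(threshold, *high)


# Invented (mean, sd) pairs for an outcome such as gestational age in weeks.
low_dose_density = (39.5, 1.5)
high_dose_density = (37.5, 2.0)
risks = [risk_below(37.0, d, 10.0, low_dose_density, high_dose_density)
         for d in (0.0, 5.0, 10.0)]
```

Because the same weight applies to the whole density, any risk functional (here, the probability of falling below a threshold) inherits the same interpolation between the two extremes.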


Subjects
Biometry/methods , Computer Simulation/statistics & numerical data , Regression Analysis , Risk Assessment/statistics & numerical data , Bayes Theorem , Environmental Exposure , Female , Gestational Age , Humans , Outcome Assessment, Health Care , Pregnancy , Prenatal Exposure Delayed Effects , Risk Assessment/methods
17.
Ecol Lett ; 20(5): 561-576, 2017 05.
Article in English | MEDLINE | ID: mdl-28317296

ABSTRACT

Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hierarchical Modelling of Species Communities (HMSC) as a general, flexible framework for modern analysis of community data. While non-manipulative data allow for only correlative and not causal inference, this framework facilitates the formulation of data-driven hypotheses regarding the processes that structure communities. We model environmental filtering by variation and covariation in the responses of individual species to the characteristics of their environment, with potential contingencies on species traits and phylogenetic relationships. We capture biotic assembly rules by species-to-species association matrices, which may be estimated at multiple spatial or temporal scales. We operationalise the HMSC framework as a hierarchical Bayesian joint species distribution model, and implement it as R- and Matlab-packages which enable computationally efficient analyses of large data sets. Armed with this tool, community ecologists can make sense of many types of data, including spatially explicit data and time-series data. We illustrate the use of this framework through a series of diverse ecological examples.


Subjects
Biodiversity , Ecosystem , Models, Theoretical , Software , Bayes Theorem
18.
Proc Biol Sci ; 284(1855)2017 May 31.
Article in English | MEDLINE | ID: mdl-28539525

ABSTRACT

Estimation of intra- and interspecific interactions from time-series on species-rich communities is challenging due to the high number of potentially interacting species pairs. The previously proposed sparse interactions model overcomes this challenge by assuming that most species pairs do not interact. We propose an alternative model that does not assume that any of the interactions are necessarily zero, but summarizes the influences of individual species by a small number of community-level drivers. The community-level drivers are defined as linear combinations of species abundances, and they may thus represent e.g. the total abundance of all species or the relative proportions of different functional groups. We show with simulated and real data how our approach can be used to compare different hypotheses on community structure. In an empirical example using aquatic microorganisms, the community-level drivers model clearly outperformed the sparse interactions model in predicting independent validation data.
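
A small numerical sketch may help with the phrase "linear combinations of species abundances". Below, two hypothetical community-level drivers are computed from three species, and each species responds to the drivers rather than to every other species directly. The weights, effect sizes, and the assumption that drivers enter linearly are all illustrative; the abstract does not give the paper's exact model equations.

```python
def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(left, right):
    """Matrix-matrix product for nested lists."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in left]


def parameter_counts(n_species, n_drivers):
    """Free parameters: full interaction matrix vs. driver weights plus effects."""
    return n_species * n_species, 2 * n_species * n_drivers


abundances = [10.0, 4.0, 6.0]

# Rows define drivers (hypothetical): total abundance, and the summed
# abundance of a functional group made up of species 0 and 1.
driver_weights = [[1.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0]]

# driver_effects[i][k]: effect of driver k on species i (invented values).
driver_effects = [[-0.10, 0.00],
                  [-0.05, -0.02],
                  [-0.20, 0.10]]

drivers = matvec(driver_weights, abundances)

# Species-by-species interaction matrix implied by the drivers. No entry is
# forced to zero, but its rank is at most the number of drivers.
implied_interactions = matmul(driver_effects, driver_weights)
```

With 50 species and 2 drivers, `parameter_counts(50, 2)` gives 2500 entries for an unrestricted interaction matrix against 200 for the driver formulation, which illustrates how the approach reduces dimensionality without assuming sparsity.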


Subjects
Biota , Ecology/methods , Models, Biological , Computer Simulation , Water Microbiology
19.
Biometrics ; 73(3): 1018-1028, 2017 09.
Article in English | MEDLINE | ID: mdl-28083869

ABSTRACT

High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These genomic variables can naturally be grouped by the gene they encode, among other criteria. However, standard practice in such applications is independent screening with a universal correction for multiplicity. We propose a Bayesian approach in which the prior probability of an association for a given genomic variable depends on its gene, and the gene-specific probabilities are modeled nonparametrically. This hierarchical model allows for appropriate gene and genome-wide multiplicity adjustments, and can be incorporated into a variety of Bayesian association screening methodologies with negligible increase in computational complexity. We describe an application to screening for differences in DNA methylation between lower grade glioma and glioblastoma multiforme tumor samples from The Cancer Genome Atlas. Software is available via the package BayesianScreening for R: github.com/lockEF/BayesianScreening.


Subjects
Genome , Bayes Theorem , CpG Islands , DNA Methylation , Epigenesis, Genetic , Epigenomics , Glioblastoma , Humans
20.
Ann Stat ; 45(1): 1-38, 2017.
Article in English | MEDLINE | ID: mdl-29332971

ABSTRACT

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
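
The reduced-rank factorization mentioned in this abstract can be illustrated with a standard latent class model: the joint probability of a cell is a sum over latent classes of the class probability times a product of per-variable conditional probabilities, which is a PARAFAC-type decomposition with nonnegative rank at most the number of classes. The sketch below builds such a probability tensor; the class probabilities and conditional tables are invented, and the collapsed Tucker decomposition proposed in the paper is not implemented.

```python
from itertools import product


def latent_class_pmf(class_probs, cond_probs):
    """Joint pmf of categorical variables under a latent class model.

    class_probs[h] = P(class h)
    cond_probs[j][h][c] = P(variable j takes level c | class h)
    Returns a dict mapping each cell (tuple of levels) to its probability.
    """
    levels = [len(variable[0]) for variable in cond_probs]
    pmf = {}
    for cell in product(*(range(d) for d in levels)):
        total = 0.0
        for h, class_prob in enumerate(class_probs):
            term = class_prob
            for j, level in enumerate(cell):
                term *= cond_probs[j][h][level]
            total += term
        pmf[cell] = total
    return pmf


# Invented example: two latent classes, three variables with 2, 3, 2 levels.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
table = latent_class_pmf(class_probs, cond_probs)
```

Here a 2 x 3 x 2 table with 11 free probabilities is described by two classes, whereas a log-linear model would instead reduce dimension by setting interaction terms to zero.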
