Search | VHL Search Portal

1.

Identifying vulnerable brain networks associated with Alzheimer's disease risk.

Mahzarnia, Ali; Stout, Jacques A; Anderson, Robert J; Moon, Hae Sol; Yar Han, Zay; Beck, Kate; Browndyke, Jeffrey N; Dunson, David B; Johnson, Kim G; O'Brien, Richard J; Badea, Alexandra.

Cereb Cortex ; 33(9): 5307-5322, 2023 04 25.

Article in English | MEDLINE | ID: mdl-36320163

ABSTRACT

The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of 72 people across the age span, enriched for the APOE4 genotype to reveal vulnerable networks associated with a composite AD risk factor including age, genotype, and sex. Sparse canonical correlation analysis (CCA) revealed a high weight associated with genotype, and subgraphs involving the cuneus, temporal, cingulate cortices, and cerebellum. Adding cognitive metrics to the risk factor revealed the highest cumulative degree of connectivity for the pericalcarine cortex, insula, banks of the superior sulcus, and the cerebellum. To enable scaling up our approach, we extended tensor network principal component analysis, introducing CCA components. We developed sparse regression predictive models with errors of 17% for genotype, 24% for family risk factor for AD, and 5 years for age. Age prediction in groups including cognitively impaired subjects revealed regions not found using only normal subjects, i.e. middle and transverse temporal, paracentral and superior banks of temporal sulcus, as well as the amygdala and parahippocampal gyrus. These modeling approaches represent stepping stones towards single subject prediction.

Subject(s)

Alzheimer Disease , Humans , Alzheimer Disease/pathology , Magnetic Resonance Imaging , Brain/pathology , Genotype , Aging

2.

Ellipsoid fitting with the Cayley transform.

Melikechi, Omar; Dunson, David B.

IEEE Trans Signal Process ; 72: 70-83, 2024.

Article in English | MEDLINE | ID: mdl-38283047

ABSTRACT

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.
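As a hedged illustration of the Cayley transform named in this abstract (not the CTEF algorithm itself), the sketch below maps unconstrained parameters to a rotation matrix and builds a positive definite shape matrix from it. The convention Q = (I - S)(I + S)^{-1}, the helper name `cayley`, and the numeric values are assumptions chosen for the example; the abstract does not specify them.

```python
import numpy as np

def cayley(params, dim):
    """Map dim*(dim-1)/2 free parameters to a rotation matrix (illustrative helper)."""
    S = np.zeros((dim, dim))
    S[np.triu_indices(dim, k=1)] = params
    S = S - S.T  # skew-symmetric: S.T == -S
    I = np.eye(dim)
    # I + S is always invertible when S is skew-symmetric
    return (I - S) @ np.linalg.inv(I + S)

# Hypothetical parameters and semi-axis lengths, chosen only for illustration
Q = cayley(np.array([0.3, -1.2, 0.7]), dim=3)
axes = np.array([2.0, 1.0, 0.5])

# Shape matrix of the ellipsoid {x : (x - c)^T A (x - c) = 1}
A = Q @ np.diag(1.0 / axes**2) @ Q.T
```

Because any real parameter vector yields an orthogonal Q, and positive axis lengths then give a positive definite A, a parameterization of this kind can only describe ellipsoids. That is one plausible reading of "ellipsoid specific", offered here as an assumption.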

3.

PPA: Principal parcellation analysis for brain connectomes and multiple traits.

Liu, Rongjie; Li, Meng; Dunson, David B.

Neuroimage ; 276: 120214, 2023 08 01.

Article in English | MEDLINE | ID: mdl-37286151

ABSTRACT

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs. In this article, we propose a human trait prediction framework utilizing a tractography-based representation of the brain connectome, which clusters fiber endpoints to define a data-driven white matter parcellation targeted to explain variation among individuals and predict human traits. This leads to Principal Parcellation Analysis (PPA), representing individual brain connectomes by compositional vectors building on a basis system of fiber bundles that captures the connectivity at the population level. PPA eliminates the need to choose atlases and ROIs a priori, and provides a simpler, vector-valued representation that facilitates easier statistical analysis compared to the complex graph structures encountered in classical connectome analyses. We illustrate the proposed approach through applications to data from the Human Connectome Project (HCP) and show that PPA connectomes improve power in predicting human traits over state-of-the-art methods based on classical connectomes, while dramatically improving parsimony and maintaining interpretability. Our PPA package is publicly available on GitHub, and can be implemented routinely for diffusion image data.
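The compositional-vector representation described in this abstract can be sketched roughly as follows. This assumes the fiber-clustering step has already assigned each of a subject's fibers to a population-level bundle; the function name and the labels are hypothetical and are not taken from the PPA package.

```python
import numpy as np

def compositional_vector(bundle_labels, n_bundles):
    """Fraction of a subject's fibers falling in each bundle (illustrative helper)."""
    counts = np.bincount(bundle_labels, minlength=n_bundles)
    return counts / counts.sum()

# Made-up bundle assignments for eight fibers of one subject, five bundles in total
labels = np.array([0, 2, 2, 1, 2, 0, 3, 2])
v = compositional_vector(labels, n_bundles=5)
```

Each subject is then summarized by a nonnegative vector summing to one, which is easier to pass to standard regression tools than an adjacency matrix.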

Subject(s)

Connectome , White Matter , Humans , Connectome/methods , Brain/diagnostic imaging

4.

Outlier detection for multi-network data.

Dey, Pritam; Zhang, Zhengwu; Dunson, David B.

Bioinformatics ; 38(16): 4011-4018, 2022 08 10.

Article in English | MEDLINE | ID: mdl-35762974

ABSTRACT

MOTIVATION: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are so poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank. RESULTS: ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications. AVAILABILITY AND IMPLEMENTATION: ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Neuroimaging , Humans , Brain/diagnostic imaging , Software

5.

Explaining transmission rate variations and forecasting epidemic spread in multiple regions with a semiparametric mixed effects SIR model.

Buch, David A; Johndrow, James E; Dunson, David B.

Biometrics ; 79(4): 2987-2997, 2023 12.

Article in English | MEDLINE | ID: mdl-37431147

ABSTRACT

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP). The transmission rate model is further embedded in a hierarchy to allow information borrowing across parallel streams of regional incidence data. Crucially, the method makes use of optional vaccination data as a first step toward modeling of endemic infectious diseases. Computational techniques borrowed from the Bayesian spatial analysis literature enable fast and reliable posterior computation. Simulation studies reveal that the method recovers true covariate effects at nominal coverage levels. We analyze data from the COVID-19 pandemic and validate forecast intervals on held-out data. User-friendly software is provided to enable practitioners to easily deploy the method in public health research.
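To make the role of a time-varying transmission rate concrete, here is a minimal deterministic, discrete-time SIR sketch. It is not the authors' semiparametric Bayesian model: the log-linear link, the covariate, the coefficients, and the sine curve standing in for a smooth Gaussian process draw are all assumptions made for illustration.

```python
import numpy as np

def simulate_sir(beta, gamma, s0, i0, r0):
    """Discrete-time SIR with one transmission rate per time step (illustrative)."""
    n = s0 + i0 + r0
    S, I, R = [s0], [i0], [r0]
    for b in beta:
        new_inf = min(b * S[-1] * I[-1] / n, S[-1])  # cannot infect more than S
        new_rec = gamma * I[-1]
        S.append(S[-1] - new_inf)
        I.append(I[-1] + new_inf - new_rec)
        R.append(R[-1] + new_rec)
    return np.array(S), np.array(I), np.array(R)

T = 100
t = np.arange(T)
x = (t >= 40).astype(float)               # hypothetical covariate (e.g. an intervention)
smooth = 0.1 * np.sin(2 * np.pi * t / 50)  # stand-in for a smooth GP realization
beta = np.exp(np.log(0.4) - 0.8 * x + smooth)  # assumed log-linear form

S, I, R = simulate_sir(beta, gamma=0.1, s0=9990.0, i0=10.0, r0=0.0)
```

In the paper the covariate effects and the smooth term are inferred from incidence data; here they are fixed by hand to show how they shape the epidemic curve.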

Subject(s)

Communicable Diseases , Pandemics , Humans , Models, Statistical , Epidemiological Models , Bayes Theorem , Communicable Diseases/epidemiology , Forecasting

6.

Graph auto-encoding brain networks with applications to analyzing large-scale brain imaging datasets.

Liu, Meimei; Zhang, Zhengwu; Dunson, David B.

Neuroimage ; 245: 118750, 2021 12 15.

Article in English | MEDLINE | ID: mdl-34823023

ABSTRACT

There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high-dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal components analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer their relationships to human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to study the structural brain connectome and its relationship with cognition. Numerical results demonstrate huge advantages of GATE over competitors in terms of prediction accuracy, statistical inference, and computing efficiency. We found that the structural connectome has a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.

Subject(s)

Brain/growth & development , Brain/physiology , Cognition/physiology , Connectome/methods , Magnetic Resonance Imaging , Adolescent , Algorithms , Child , Computer Simulation , Datasets as Topic , Female , Humans , Imaging, Three-Dimensional , Male , Models, Neurological , Nonlinear Dynamics , Phenotype , Reading , Young Adult

7.

Projected t-SNE for batch correction.

Aliverti, Emanuele; Tilson, Jeffrey L; Filer, Dayne L; Babcock, Benjamin; Colaneri, Alejandro; Ocasio, Jennifer; Gershon, Timothy R; Wilhelmsen, Kirk C; Dunson, David B.

Bioinformatics ; 36(11): 3522-3527, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32176244

ABSTRACT

MOTIVATION: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly estimate t-SNE embeddings that are not driven by batch effects. Without correction, interesting structure in the data can be obscured by batch effects. The proposed algorithm can therefore significantly aid visualization of high-dimensional data. RESULTS: The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batches related with mice identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, and endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumours. AVAILABILITY AND IMPLEMENTATION: Source code implementing the proposed approach is available as an R package at https://github.com/emanuelealiverti/BC_tSNE, including a tutorial to reproduce the simulation studies. CONTACT: aliverti@stat.unipd.it.

Subject(s)

Endothelial Cells , Software , Algorithms , Animals , Gene Expression , Gene Expression Profiling , Mice

8.

The whole-genome landscape of Burkitt lymphoma subtypes.

Panea, Razvan I; Love, Cassandra L; Shingleton, Jennifer R; Reddy, Anupama; Bailey, Jeffrey A; Moormann, Ann M; Otieno, Juliana A; Ong'echa, John Michael; Oduor, Cliff I; Schroeder, Kristin M S; Masalu, Nestory; Chao, Nelson J; Agajanian, Megan; Major, Michael B; Fedoriw, Yuri; Richards, Kristy L; Rymkiewicz, Grzegorz; Miles, Rodney R; Alobeid, Bachir; Bhagat, Govind; Flowers, Christopher R; Ondrejka, Sarah L; Hsi, Eric D; Choi, William W L; Au-Yeung, Rex K H; Hartmann, Wolfgang; Lenz, Georg; Meyerson, Howard; Lin, Yen-Yu; Zhuang, Yuan; Luftig, Micah A; Waldrop, Alexander; Dave, Tushar; Thakkar, Devang; Sahay, Harshit; Li, Guojie; Palus, Brooke C; Seshadri, Vidya; Kim, So Young; Gascoyne, Randy D; Levy, Shawn; Mukhopadyay, Minerva; Dunson, David B; Dave, Sandeep S.

Blood ; 134(19): 1598-1607, 2019 11 07.

Article in English | MEDLINE | ID: mdl-31558468

ABSTRACT

Burkitt lymphoma (BL) is an aggressive, MYC-driven lymphoma comprising 3 distinct clinical subtypes: sporadic BLs that occur worldwide, endemic BLs that occur predominantly in sub-Saharan Africa, and immunodeficiency-associated BLs that occur primarily in the setting of HIV. In this study, we comprehensively delineated the genomic basis of BL through whole-genome sequencing (WGS) of 101 tumors representing all 3 subtypes of BL to identify 72 driver genes. These data were additionally informed by CRISPR screens in BL cell lines to functionally annotate the role of oncogenic drivers. Nearly every driver gene was found to have both coding and non-coding mutations, highlighting the importance of WGS for identifying driver events. Our data implicate coding and non-coding mutations in IGLL5, BACH2, SIN3A, and DNMT1. Epstein-Barr virus (EBV) infection was associated with higher mutation load, with type 1 EBV showing a higher mutational burden than type 2 EBV. Although sporadic and immunodeficiency-associated BLs had similar genetic profiles, endemic BLs manifested more frequent mutations in BCL7A and BCL6 and fewer genetic alterations in DNMT1, SNTB2, and CTCF. Silencing mutations in ID3 were a common feature of all 3 subtypes of BL. In vitro, mass spectrometry-based proteomics demonstrated that the ID3 protein binds primarily to TCF3 and TCF4. In vivo knockout of ID3 potentiated the effects of MYC, leading to rapid tumorigenesis and tumor phenotypes consistent with those observed in the human disease.

Subject(s)

Burkitt Lymphoma/genetics , Whole Genome Sequencing/methods , Animals , Humans , Mice

9.

Extrema-weighted feature extraction for functional data.

van den Boom, Willem; Mao, Callie; Schroeder, Rebecca A; Dunson, David B.

Bioinformatics ; 34(14): 2457-2464, 2018 07 15.

Article in English | MEDLINE | ID: mdl-29506206

ABSTRACT

MOTIVATION: Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include functional linear models, functional principal components regression and cluster-based approaches, such as latent trajectory analysis. This article is motivated by applications in which the dynamics in a predictor, across times when the value is relatively extreme, are particularly informative about the response. For example, physicians are interested in relating the dynamics of blood pressure changes during surgery to post-surgery adverse outcomes, and it is thought that the dynamics are more important when blood pressure is significantly elevated or lowered. RESULTS: We propose a novel class of extrema-weighted feature (XWF) extraction models. Key components in defining XWFs include the marginal density of the predictor, a function up-weighting values at extreme quantiles of this marginal, and functionals characterizing local dynamics. Algorithms are proposed for fitting of XWF-based regression and classification models, and are compared with current methods for functional predictors in simulations and a blood pressure during surgery application. XWFs find features of intraoperative blood pressure trajectories that are predictive of postoperative mortality. By their nature, most of these features cannot be found by previous methods. AVAILABILITY AND IMPLEMENTATION: The R package 'xwf' is available at the CRAN repository: https://cran.r-project.org/package=xwf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Blood Pressure , Computational Biology/methods , Postoperative Complications , Software , Algorithms , Female , Humans , Male , Treatment Outcome

10.

Bayesian network-response regression.

Wang, Lu; Durante, Daniele; Jung, Rex E; Dunson, David B.

Bioinformatics ; 33(12): 1859-1866, 2017 Jun 15.

Article in English | MEDLINE | ID: mdl-28165112

ABSTRACT

MOTIVATION: There is increasing interest in learning how human brain networks vary as a function of a continuous trait, but flexible and efficient procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which combines low-rank factorizations and flexible Gaussian process priors to learn changes in the conditional expectation of a network-valued random variable across the values of a continuous predictor, while including subject-specific random effects. RESULTS: The formulation leads to a general framework for inference on changes in brain network structures across human traits, facilitating borrowing of information and coherently characterizing uncertainty. We provide an efficient Gibbs sampler for posterior computation along with simple procedures for inference, prediction and goodness-of-fit assessments. The model is applied to learn how human brain networks vary across individuals with different intelligence scores. Results provide interesting insights on the association between intelligence and brain connectivity, while demonstrating good predictive performance. AVAILABILITY AND IMPLEMENTATION: Source code implemented in R and data are available at https://github.com/wangronglu/BNRR. CONTACT: rl.wang@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Brain/anatomy & histology , Computational Biology/methods , Models, Biological , Nerve Net/anatomy & histology , Software , Algorithms , Bayes Theorem , Brain/physiology , Computer Simulation , Humans , Nerve Net/physiology

11.

Convex mixture regression for quantitative risk assessment.

Canale, Antonio; Durante, Daniele; Dunson, David B.

Biometrics ; 74(4): 1331-1340, 2018 12.

Article in English | MEDLINE | ID: mdl-29894557

ABSTRACT

There is wide interest in studying how the distribution of a continuous response changes with a predictor. We are motivated by environmental applications in which the predictor is the dose of an exposure and the response is a health outcome. A main focus in these studies is inference on dose levels associated with a given increase in risk relative to a baseline. In addressing this goal, popular methods either dichotomize the continuous response or focus on modeling changes with the dose in the expectation of the outcome. Such choices may lead to information loss and provide inaccurate inference on dose-response relationships. We instead propose a Bayesian convex mixture regression model that allows the entire distribution of the health outcome to be unknown and changing with the dose. To balance flexibility and parsimony, we rely on a mixture model for the density at the extreme doses, and express the conditional density at each intermediate dose via a convex combination of these extremal densities. This representation generalizes classical dose-response models for quantitative outcomes, and provides a more parsimonious, but still powerful, formulation compared to nonparametric methods, thereby improving interpretability and efficiency in inference on risk functions. A Markov chain Monte Carlo algorithm for posterior inference is developed, and the benefits of our methods are outlined in simulations, along with a study on the impact of DDE exposure on gestational age.
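The convex-combination idea in this abstract can be sketched as below. This is only an illustration: the paper uses mixture models for the two extremal densities and infers the weight function, whereas this example assumes single Gaussians, a linear weight in the dose, and made-up numbers loosely on a gestational-age scale in weeks.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def conditional_density(y, dose, max_dose):
    """Density of the outcome at a given dose as a convex combination (illustrative)."""
    w = dose / max_dose                  # assumed weight: 0 at lowest dose, 1 at highest
    f_low = normal_pdf(y, 40.0, 1.5)     # hypothetical density at the lowest dose
    f_high = normal_pdf(y, 37.0, 2.5)    # hypothetical density at the highest dose
    return (1 - w) * f_low + w * f_high

y = np.linspace(20, 55, 3501)
dens = conditional_density(y, dose=30.0, max_dose=100.0)
area = np.sum(dens) * (y[1] - y[0])      # Riemann-sum check that it integrates to ~1
```

Since the weights are nonnegative and sum to one, the result is a valid density at every dose, and only the weight function changes with the dose.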

Subject(s)

Biometry/methods , Computer Simulation/statistics & numerical data , Regression Analysis , Risk Assessment/statistics & numerical data , Bayes Theorem , Environmental Exposure , Female , Gestational Age , Humans , Outcome Assessment, Health Care , Pregnancy , Prenatal Exposure Delayed Effects , Risk Assessment/methods

12.

Bayesian genome- and epigenome-wide association studies with gene level dependence.

Lock, Eric F; Dunson, David B.

Biometrics ; 73(3): 1018-1028, 2017 09.

Article in English | MEDLINE | ID: mdl-28083869

ABSTRACT

High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These genomic variables can naturally be grouped by the gene they encode, among other criteria. However, standard practice in such applications is independent screening with a universal correction for multiplicity. We propose a Bayesian approach in which the prior probability of an association for a given genomic variable depends on its gene, and the gene-specific probabilities are modeled nonparametrically. This hierarchical model allows for appropriate gene and genome-wide multiplicity adjustments, and can be incorporated into a variety of Bayesian association screening methodologies with negligible increase in computational complexity. We describe an application to screening for differences in DNA methylation between lower grade glioma and glioblastoma multiforme tumor samples from The Cancer Genome Atlas. Software is available via the package BayesianScreening for R: github.com/lockEF/BayesianScreening.

Subject(s)

Genome , Bayes Theorem , CpG Islands , DNA Methylation , Epigenesis, Genetic , Epigenomics , Glioblastoma , Humans

13.

TENSOR DECOMPOSITIONS AND SPARSE LOG-LINEAR MODELS.

Johndrow, James E; Bhattacharya, Anirban; Dunson, David B.

Ann Stat ; 45(1): 1-38, 2017.

Article in English | MEDLINE | ID: mdl-29332971

ABSTRACT

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
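For readers unfamiliar with the reduced-rank factorization this abstract refers to, the following sketch builds the probability tensor of a latent class (PARAFAC-type) model for three categorical variables. It illustrates the standard construction only, not the collapsed Tucker decomposition proposed in the paper; the dimensions, the number of classes, and the random parameters are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
k, dims = 2, (3, 4, 2)   # two latent classes; variables with 3, 4 and 2 levels

nu = rng.dirichlet(np.ones(k))                           # latent class probabilities
lam = [rng.dirichlet(np.ones(d), size=k) for d in dims]  # per-class marginals, shape (k, d_j)

# P[a, b, c] = sum_h nu[h] * lam0[h, a] * lam1[h, b] * lam2[h, c]
P = np.einsum("h,ha,hb,hc->abc", nu, lam[0], lam[1], lam[2])

# Flattening two modes gives a matrix whose rank is at most k
flat_rank = np.linalg.matrix_rank(P.reshape(dims[0], -1))
```

The tensor is a valid joint probability mass function expressed as a sum of k nonnegative rank-one terms, which is the sense in which latent structure models reduce dimensionality.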

14.

Nonparametric Bayes modeling for case control studies with many predictors.

Zhou, Jing; Herring, Amy H; Bhattacharya, Anirban; Olshan, Andrew F; Dunson, David B.

Biometrics ; 72(1): 184-92, 2016 Mar.

Article in English | MEDLINE | ID: mdl-26394204

ABSTRACT

It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.

Subject(s)

Bayes Theorem , Case-Control Studies , Congenital Abnormalities/epidemiology , Models, Statistical , Statistics, Nonparametric , Computer Simulation , Data Interpretation, Statistical , Humans , Incidence , Infant, Newborn , Reproducibility of Results , Risk Assessment/methods , Sample Size , Sensitivity and Specificity

15.

Personalised estimation of a woman's most fertile days.

Li, Daniel; Heyer, Leslie; Jennings, Victoria H; Smith, Colin A; Dunson, David B.

Eur J Contracept Reprod Health Care ; 21(4): 323-8, 2016 Aug.

Article in English | MEDLINE | ID: mdl-27297611

ABSTRACT

OBJECTIVES: We propose a new, personalised approach of estimating a woman's most fertile days that only requires recording the first day of menses and can use a smartphone to convey this information to the user so that she can plan or prevent pregnancy. METHODS: We performed a retrospective analysis of two cohort studies (a North Carolina-based study and the Early Pregnancy Study [EPS]) and a prospective multicentre trial (World Health Organization [WHO] study). The North Carolina study consisted of 68 sexually active women with either an intrauterine device or tubal ligation. The EPS comprised 221 women who planned to become pregnant and had no known fertility problems. The WHO study consisted of 706 women from five geographically and culturally diverse settings. Bayesian statistical methods were used to design our proposed method, Dynamic Optimal Timing (DOT). Simulation studies were used to estimate the cumulative pregnancy risk. RESULTS: For the proposed method, simulation analyses indicated a 4.4% cumulative probability of pregnancy over 13 cycles with correct use. After a calibration window, this method flagged between 11 and 13 days when unprotected intercourse should be avoided per cycle. Eligible women should have cycle lengths between 20 and 40 days with a variability range less than or equal to 9 days. CONCLUSIONS: DOT can easily be implemented by computer or smartphone applications, allowing for women to make more informed decisions about their fertility. This approach is already incorporated into a patent-pending system and is available for free download on iPhones and Androids.
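The eligibility criterion stated in the RESULTS can be written as a short check. This sketch is not the DOT algorithm and is not intended for real-world use; it assumes that the bounds are inclusive and that "variability range" means the longest recorded cycle minus the shortest, neither of which the abstract spells out.

```python
def is_eligible(cycle_lengths):
    """Check recorded cycle lengths (in days) against the stated criteria (illustrative)."""
    if not cycle_lengths:
        return False
    in_range = all(20 <= c <= 40 for c in cycle_lengths)   # assumed inclusive bounds
    spread = max(cycle_lengths) - min(cycle_lengths)       # assumed meaning of "range"
    return in_range and spread <= 9

print(is_eligible([28, 30, 27, 31]))  # within bounds, spread of 4 days
print(is_eligible([21, 31]))          # within bounds, but spread of 10 days
```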

Subject(s)

Bayes Theorem , Fertility/physiology , Menstrual Cycle/physiology , Mobile Applications , Natural Family Planning Methods/methods , Female , Humans , Smartphone

16.

Joint eQTL assessment of whole blood and dura mater tissue from individuals with Chiari type I malformation.

Lock, Eric F; Soldano, Karen L; Garrett, Melanie E; Cope, Heidi; Markunas, Christina A; Fuchs, Herbert; Grant, Gerald; Dunson, David B; Gregory, Simon G; Ashley-Koch, Allison E.

BMC Genomics ; 16: 11, 2015 Jan 22.

Article in English | MEDLINE | ID: mdl-25609184

ABSTRACT

BACKGROUND: Expression quantitative trait loci (eQTL) play an important role in the regulation of gene expression. Gene expression levels and eQTLs are expected to vary from tissue to tissue, and therefore multi-tissue analyses are necessary to fully understand complex genetic conditions in humans. Dura mater tissue likely interacts with cranial bone growth and thus may play a role in the etiology of Chiari Type I Malformation (CMI) and related conditions, but it is often inaccessible and its gene expression has not been well studied. A genetic basis to CMI has been established; however, the specific genetic risk factors are not well characterized. RESULTS: We present an assessment of eQTLs for whole blood and dura mater tissue from individuals with CMI. A joint-tissue analysis identified 239 eQTLs in either dura or blood, with 79% of these eQTLs shared by both tissues. Several identified eQTLs were novel and these implicate genes involved in bone development (IPO8, XYLT1, and PRKAR1A), and ribosomal pathways related to marrow and bone dysfunction, as potential candidates in the development of CMI. CONCLUSIONS: Despite strong overall heterogeneity in expression levels between blood and dura, the majority of cis-eQTLs are shared by both tissues. The power to detect shared eQTLs was improved by using an integrative statistical approach. The identified tissue-specific and shared eQTLs provide new insight into the genetic basis for CMI and related conditions.

Subject(s)

Arnold-Chiari Malformation/genetics , Quantitative Trait Loci , Adolescent , Arnold-Chiari Malformation/pathology , Bone Development/genetics , Child , Child, Preschool , Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/blood , Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/genetics , Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/metabolism , Dura Mater/metabolism , Female , Gene Regulatory Networks , Genotype , Humans , Male , Pentosyltransferases/blood , Pentosyltransferases/genetics , Pentosyltransferases/metabolism , Polymorphism, Single Nucleotide , beta Karyopherins/blood , beta Karyopherins/genetics , beta Karyopherins/metabolism , UDP Xylose-Protein Xylosyltransferase

17.

Learning phenotype densities conditional on many interacting predictors.

Kessler, David C; Taylor, Jack A; Dunson, David B.

Bioinformatics ; 30(11): 1562-8, 2014 Jun 01.

Article in English | MEDLINE | ID: mdl-24501099

ABSTRACT

MOTIVATION: Estimating a phenotype distribution conditional on a set of discrete-valued predictors is a commonly encountered task. For example, interest may be in how the density of a quantitative trait varies with single nucleotide polymorphisms and patient characteristics. The subset of important predictors is not usually known in advance. This becomes more challenging with a high-dimensional predictor set when there is the possibility of interaction. RESULTS: We demonstrate a novel non-parametric Bayes method based on a tensor factorization of predictor-dependent weights for Gaussian kernels. The method uses multistage predictor selection for dimension reduction, providing succinct models for the phenotype distribution. The resulting conditional density morphs flexibly with the selected predictors. In a simulation study and an application to molecular epidemiology data, we demonstrate advantages over commonly used methods.

Subject(s)

Phenotype , Algorithms , Bayes Theorem , Humans , Polymorphism, Single Nucleotide

18.

Bayesian consensus clustering.

Lock, Eric F; Dunson, David B.

Bioinformatics ; 29(20): 2610-6, 2013 Oct 15.

Article in English | MEDLINE | ID: mdl-23990412

ABSTRACT

MOTIVATION: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either independently determine a separate clustering for each data source or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources. RESULTS: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas. AVAILABILITY: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html.

Subject(s)

Genomics/methods , Algorithms , Bayes Theorem , Cluster Analysis , Gene Dosage , Humans , Models, Statistical

19.

Bayesian nonparametric regression with varying residual density.

Pati, Debdeep; Dunson, David B.

Ann Inst Stat Math ; 66(1): 1-31, 2014 Feb.

Article in English | MEDLINE | ID: mdl-24465053

ABSTRACT

We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications.

20.

APOE, Immune Factors, Sex, and Diet Interact to Shape Brain Networks in Mouse Models of Aging.

Winter, Steven; Mahzarnia, Ali; Anderson, Robert J; Han, Zay Yar; Tremblay, Jessica; Stout, Jacques; Moon, Hae Sol; Marcellino, Daniel; Dunson, David B; Badea, Alexandra.

bioRxiv ; 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-39005377

ABSTRACT

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic, fixed and modifiable risk factors influence susceptibility to AD are under intense investigation, yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear. To model multiple risk factors including APOE genotype, age, sex, diet, and immunity we leveraged mice expressing the human APOE and NOS2 genes, conferring a reduced immune response compared to mouse Nos2. Employing graph analyses of brain connectomes derived from accelerated diffusion-weighted MRI, we assessed the global and local impact of risk factors in the absence of AD pathology. Aging and a high-fat diet impacted extensive networks comprising AD-vulnerable regions, including the temporal association cortex, amygdala, and the periaqueductal gray, involved in stress responses. Sex impacted networks including sexually dimorphic regions (thalamus, insula, hypothalamus) and key memory-processing areas (fimbria, septum). APOE genotypes modulated connectivity in memory, sensory, and motor regions, while diet and immunity both impacted the insula and hypothalamus. Notably, these risk factors converged on a circuit comprising 63 of 54,946 total connections (0.11% of the connectome), highlighting shared vulnerability amongst multiple AD risk factors in regions essential for sensory integration, emotional regulation, decision making, motor coordination, memory, homeostasis, and interoception. These network-based biomarkers hold translational value for distinguishing high-risk versus low-risk participants at preclinical AD stages, suggest circuits as potential therapeutic targets, and advance our understanding of network fingerprints associated with AD risk. 
Significance Statement: Current interventions for Alzheimer's disease (AD) do not provide a cure, and are delivered years after neuropathological onset. Addressing the impact of risk factors on brain networks holds promise for early detection, prevention, and revealing putative therapeutic targets at preclinical stages. We utilized six mouse models to investigate the impact of factors, including APOE genotype, age, sex, immunity, and diet, on brain networks. Large structural connectomes were derived from high-resolution compressed sensing diffusion MRI. A highly parallelized graph classification identified subnetworks associated with unique risk factors, revealing their network fingerprints, and a common network composed of 63 connections with shared vulnerability to all risk factors. APOE genotype-specific immune signatures support the design of interventions tailored to risk profiles.
