1.
Cereb Cortex ; 33(9): 5307-5322, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36320163

ABSTRACT

The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of 72 people across the age span, enriched for the APOE4 genotype to reveal vulnerable networks associated with a composite AD risk factor including age, genotype, and sex. Sparse canonical correlation analysis (CCA) revealed a high weight associated with genotype, and subgraphs involving the cuneus, temporal, cingulate cortices, and cerebellum. Adding cognitive metrics to the risk factor revealed the highest cumulative degree of connectivity for the pericalcarine cortex, insula, banks of the superior sulcus, and the cerebellum. To enable scaling up our approach, we extended tensor network principal component analysis, introducing CCA components. We developed sparse regression predictive models with errors of 17% for genotype, 24% for family risk factor for AD, and 5 years for age. Age prediction in groups including cognitively impaired subjects revealed regions not found using only normal subjects, i.e. middle and transverse temporal, paracentral and superior banks of temporal sulcus, as well as the amygdala and parahippocampal gyrus. These modeling approaches represent stepping stones towards single subject prediction.


Subjects
Alzheimer Disease, Humans, Alzheimer Disease/pathology, Magnetic Resonance Imaging, Brain/pathology, Genotype, Aging
2.
IEEE Trans Signal Process ; 72: 70-83, 2024.
Article in English | MEDLINE | ID: mdl-38283047

ABSTRACT

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.
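As a rough illustration of the building block named in this abstract, the Cayley transform maps a skew-symmetric matrix to a rotation matrix, which can then orient an ellipsoid. The sketch below is not the CTEF algorithm; the function name `cayley`, the example axis lengths, and the way the rotation is combined with the axes are assumptions made only for illustration.

```python
import numpy as np

def cayley(S):
    """Map a skew-symmetric matrix S to an orthogonal matrix (Cayley transform)."""
    I = np.eye(S.shape[0])
    # Q = (I - S)(I + S)^{-1}; I + S is invertible whenever S is skew-symmetric.
    return (I - S) @ np.linalg.inv(I + S)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
S = A - A.T                      # skew-symmetric by construction: S.T == -S
Q = cayley(S)                    # rotation matrix

# Hypothetical semi-axis lengths; together with Q they define a positive
# definite shape matrix M, so {x : x^T M x = 1} is always an ellipsoid.
axes = np.array([3.0, 2.0, 1.0])
M = Q @ np.diag(1.0 / axes**2) @ Q.T
```

Because every skew-symmetric matrix yields a valid rotation, an optimizer can search over unconstrained entries of `S` and still land on an ellipsoid, which is one plausible reading of what "ellipsoid specific" means here.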

3.
Neuroimage ; 276: 120214, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37286151

ABSTRACT

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs. In this article, we propose a human trait prediction framework utilizing a tractography-based representation of the brain connectome, which clusters fiber endpoints to define a data-driven white matter parcellation targeted to explain variation among individuals and predict human traits. This leads to Principal Parcellation Analysis (PPA), representing individual brain connectomes by compositional vectors building on a basis system of fiber bundles that captures the connectivity at the population level. PPA eliminates the need to choose atlases and ROIs a priori, and provides a simpler, vector-valued representation that facilitates easier statistical analysis compared to the complex graph structures encountered in classical connectome analyses. We illustrate the proposed approach through applications to data from the Human Connectome Project (HCP) and show that PPA connectomes improve power in predicting human traits over state-of-the-art methods based on classical connectomes, while dramatically improving parsimony and maintaining interpretability. Our PPA package is publicly available on GitHub, and can be implemented routinely for diffusion image data.


Subjects
Connectome, White Matter, Humans, Connectome/methods, Brain/diagnostic imaging
4.
Bioinformatics ; 38(16): 4011-4018, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35762974

ABSTRACT

MOTIVATION: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are so poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank. RESULTS: ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications. AVAILABILITY AND IMPLEMENTATION: ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Neuroimaging, Humans, Brain/diagnostic imaging, Software
5.
Biometrics ; 79(4): 2987-2997, 2023 12.
Article in English | MEDLINE | ID: mdl-37431147

ABSTRACT

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP). The transmission rate model is further embedded in a hierarchy to allow information borrowing across parallel streams of regional incidence data. Crucially, the method makes use of optional vaccination data as a first step toward modeling of endemic infectious diseases. Computational techniques borrowed from the Bayesian spatial analysis literature enable fast and reliable posterior computation. Simulation studies reveal that the method recovers true covariate effects at nominal coverage levels. We analyze data from the COVID-19 pandemic and validate forecast intervals on held-out data. User-friendly software is provided to enable practitioners to easily deploy the method in public health research.


Subjects
Communicable Diseases, Pandemics, Humans, Statistical Models, Epidemiological Models, Bayes Theorem, Communicable Diseases/epidemiology, Forecasting
6.
Neuroimage ; 245: 118750, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34823023

ABSTRACT

There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high-dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal components analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer their relationships to human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to study the structural brain connectome and its relationship with cognition. Numerical results demonstrate huge advantages of GATE over competitors in terms of prediction accuracy, statistical inference, and computing efficiency. We found that the structural connectome has a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.


Subjects
Brain/growth & development, Brain/physiology, Cognition/physiology, Connectome/methods, Magnetic Resonance Imaging, Adolescent, Algorithms, Child, Computer Simulation, Datasets as Topic, Female, Humans, Three-Dimensional Imaging, Male, Neurological Models, Nonlinear Dynamics, Phenotype, Reading, Young Adult
7.
Bioinformatics ; 36(11): 3522-3527, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32176244

ABSTRACT

MOTIVATION: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly estimate t-SNE embeddings that are not driven by batch effects. Without correction, interesting structure in the data can be obscured by batch effects. The proposed algorithm can therefore significantly aid visualization of high-dimensional data. RESULTS: The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batches related with mice identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, and endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumours. AVAILABILITY AND IMPLEMENTATION: Source code implementing the proposed approach is available as an R package at https://github.com/emanuelealiverti/BC_tSNE, including a tutorial to reproduce the simulation studies. CONTACT: aliverti@stat.unipd.it.


Subjects
Endothelial Cells, Software, Algorithms, Animals, Gene Expression, Gene Expression Profiling, Mice
8.
Blood ; 134(19): 1598-1607, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31558468

ABSTRACT

Burkitt lymphoma (BL) is an aggressive, MYC-driven lymphoma comprising 3 distinct clinical subtypes: sporadic BLs that occur worldwide, endemic BLs that occur predominantly in sub-Saharan Africa, and immunodeficiency-associated BLs that occur primarily in the setting of HIV. In this study, we comprehensively delineated the genomic basis of BL through whole-genome sequencing (WGS) of 101 tumors representing all 3 subtypes of BL to identify 72 driver genes. These data were additionally informed by CRISPR screens in BL cell lines to functionally annotate the role of oncogenic drivers. Nearly every driver gene was found to have both coding and non-coding mutations, highlighting the importance of WGS for identifying driver events. Our data implicate coding and non-coding mutations in IGLL5, BACH2, SIN3A, and DNMT1. Epstein-Barr virus (EBV) infection was associated with higher mutation load, with type 1 EBV showing a higher mutational burden than type 2 EBV. Although sporadic and immunodeficiency-associated BLs had similar genetic profiles, endemic BLs manifested more frequent mutations in BCL7A and BCL6 and fewer genetic alterations in DNMT1, SNTB2, and CTCF. Silencing mutations in ID3 were a common feature of all 3 subtypes of BL. In vitro, mass spectrometry-based proteomics demonstrated that the ID3 protein binds primarily to TCF3 and TCF4. In vivo knockout of ID3 potentiated the effects of MYC, leading to rapid tumorigenesis and tumor phenotypes consistent with those observed in the human disease.


Subjects
Burkitt Lymphoma/genetics, Whole Genome Sequencing/methods, Animals, Humans, Mice
9.
Bioinformatics ; 34(14): 2457-2464, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29506206

ABSTRACT

Motivation: Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include functional linear models, functional principal components regression and cluster-based approaches, such as latent trajectory analysis. This article is motivated by applications in which the dynamics in a predictor, across times when the value is relatively extreme, are particularly informative about the response. For example, physicians are interested in relating the dynamics of blood pressure changes during surgery to post-surgery adverse outcomes, and it is thought that the dynamics are more important when blood pressure is significantly elevated or lowered. Results: We propose a novel class of extrema-weighted feature (XWF) extraction models. Key components in defining XWFs include the marginal density of the predictor, a function up-weighting values at extreme quantiles of this marginal, and functionals characterizing local dynamics. Algorithms are proposed for fitting of XWF-based regression and classification models, and are compared with current methods for functional predictors in simulations and a blood pressure during surgery application. XWFs find features of intraoperative blood pressure trajectories that are predictive of postoperative mortality. By their nature, most of these features cannot be found by previous methods. Availability and implementation: The R package 'xwf' is available at the CRAN repository: https://cran.r-project.org/package=xwf. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Blood Pressure, Computational Biology/methods, Postoperative Complications, Software, Algorithms, Female, Humans, Male, Treatment Outcome
10.
Bioinformatics ; 33(12): 1859-1866, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28165112

ABSTRACT

MOTIVATION: There is increasing interest in learning how human brain networks vary as a function of a continuous trait, but flexible and efficient procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which combines low-rank factorizations and flexible Gaussian process priors to learn changes in the conditional expectation of a network-valued random variable across the values of a continuous predictor, while including subject-specific random effects. RESULTS: The formulation leads to a general framework for inference on changes in brain network structures across human traits, facilitating borrowing of information and coherently characterizing uncertainty. We provide an efficient Gibbs sampler for posterior computation along with simple procedures for inference, prediction and goodness-of-fit assessments. The model is applied to learn how human brain networks vary across individuals with different intelligence scores. Results provide interesting insights on the association between intelligence and brain connectivity, while demonstrating good predictive performance. AVAILABILITY AND IMPLEMENTATION: Source code implemented in R and data are available at https://github.com/wangronglu/BNRR. CONTACT: rl.wang@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Brain/anatomy & histology, Computational Biology/methods, Biological Models, Nerve Net/anatomy & histology, Software, Algorithms, Bayes Theorem, Brain/physiology, Computer Simulation, Humans, Nerve Net/physiology
11.
Biometrics ; 74(4): 1331-1340, 2018 12.
Article in English | MEDLINE | ID: mdl-29894557

ABSTRACT

There is wide interest in studying how the distribution of a continuous response changes with a predictor. We are motivated by environmental applications in which the predictor is the dose of an exposure and the response is a health outcome. A main focus in these studies is inference on dose levels associated with a given increase in risk relative to a baseline. In addressing this goal, popular methods either dichotomize the continuous response or focus on modeling changes with the dose in the expectation of the outcome. Such choices may lead to information loss and provide inaccurate inference on dose-response relationships. We instead propose a Bayesian convex mixture regression model that allows the entire distribution of the health outcome to be unknown and changing with the dose. To balance flexibility and parsimony, we rely on a mixture model for the density at the extreme doses, and express the conditional density at each intermediate dose via a convex combination of these extremal densities. This representation generalizes classical dose-response models for quantitative outcomes, and provides a more parsimonious, but still powerful, formulation compared to nonparametric methods, thereby improving interpretability and efficiency in inference on risk functions. A Markov chain Monte Carlo algorithm for posterior inference is developed, and the benefits of our methods are outlined in simulations, along with a study on the impact of DDE exposure on gestational age.
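The convex-combination idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the two extremal densities are single Gaussians with made-up parameters (the abstract describes mixture models there), and the weight is taken to be linear in dose, whereas the abstract does not specify the weight function. The names `normal_pdf` and `conditional_density` are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def conditional_density(y, dose, max_dose=1.0):
    """Density of the outcome y at a given dose as a convex combination
    of the densities at the lowest and highest dose."""
    # Assumed weight: linear in dose, clipped to [0, 1].
    w = min(max(dose / max_dose, 0.0), 1.0)
    f_low = normal_pdf(y, 40.0, 1.5)    # placeholder density at dose 0
    f_high = normal_pdf(y, 37.0, 2.5)   # placeholder density at max_dose
    return (1.0 - w) * f_low + w * f_high
```

Because the weights are non-negative and sum to one, the result is a valid density at every dose, and it reduces to the extremal densities at the two ends of the dose range.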


Subjects
Biometry/methods, Computer Simulation/statistics & numerical data, Regression Analysis, Risk Assessment/statistics & numerical data, Bayes Theorem, Environmental Exposure, Female, Gestational Age, Humans, Health Care Outcome Assessment, Pregnancy, Prenatal Exposure Delayed Effects, Risk Assessment/methods
12.
Biometrics ; 73(3): 1018-1028, 2017 09.
Article in English | MEDLINE | ID: mdl-28083869

ABSTRACT

High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These genomic variables can naturally be grouped by the gene they encode, among other criteria. However, standard practice in such applications is independent screening with a universal correction for multiplicity. We propose a Bayesian approach in which the prior probability of an association for a given genomic variable depends on its gene, and the gene-specific probabilities are modeled nonparametrically. This hierarchical model allows for appropriate gene and genome-wide multiplicity adjustments, and can be incorporated into a variety of Bayesian association screening methodologies with negligible increase in computational complexity. We describe an application to screening for differences in DNA methylation between lower grade glioma and glioblastoma multiforme tumor samples from The Cancer Genome Atlas. Software is available via the package BayesianScreening for R: github.com/lockEF/BayesianScreening.


Subjects
Genome, Bayes Theorem, CpG Islands, DNA Methylation, Genetic Epigenesis, Epigenomics, Glioblastoma, Humans
13.
Ann Stat ; 45(1): 1-38, 2017.
Article in English | MEDLINE | ID: mdl-29332971

ABSTRACT

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.

14.
Biometrics ; 72(1): 184-92, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26394204

ABSTRACT

It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.


Subjects
Bayes Theorem, Case-Control Studies, Congenital Abnormalities/epidemiology, Statistical Models, Nonparametric Statistics, Computer Simulation, Statistical Data Interpretation, Humans, Incidence, Newborn Infant, Reproducibility of Results, Risk Assessment/methods, Sample Size, Sensitivity and Specificity
15.
Eur J Contracept Reprod Health Care ; 21(4): 323-8, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27297611

ABSTRACT

OBJECTIVES: We propose a new, personalised approach of estimating a woman's most fertile days that only requires recording the first day of menses and can use a smartphone to convey this information to the user so that she can plan or prevent pregnancy. METHODS: We performed a retrospective analysis of two cohort studies (a North Carolina-based study and the Early Pregnancy Study [EPS]) and a prospective multicentre trial (World Health Organization [WHO] study). The North Carolina study consisted of 68 sexually active women with either an intrauterine device or tubal ligation. The EPS comprised 221 women who planned to become pregnant and had no known fertility problems. The WHO study consisted of 706 women from five geographically and culturally diverse settings. Bayesian statistical methods were used to design our proposed method, Dynamic Optimal Timing (DOT). Simulation studies were used to estimate the cumulative pregnancy risk. RESULTS: For the proposed method, simulation analyses indicated a 4.4% cumulative probability of pregnancy over 13 cycles with correct use. After a calibration window, this method flagged between 11 and 13 days when unprotected intercourse should be avoided per cycle. Eligible women should have cycle lengths between 20 and 40 days with a variability range less than or equal to 9 days. CONCLUSIONS: DOT can easily be implemented by computer or smartphone applications, allowing for women to make more informed decisions about their fertility. This approach is already incorporated into a patent-pending system and is available for free download on iPhones and Androids.


Subjects
Bayes Theorem, Fertility/physiology, Menstrual Cycle/physiology, Mobile Applications, Natural Family Planning Methods/methods, Female, Humans, Smartphone
16.
BMC Genomics ; 16: 11, 2015 Jan 22.
Article in English | MEDLINE | ID: mdl-25609184

ABSTRACT

BACKGROUND: Expression quantitative trait loci (eQTL) play an important role in the regulation of gene expression. Gene expression levels and eQTLs are expected to vary from tissue to tissue, and therefore multi-tissue analyses are necessary to fully understand complex genetic conditions in humans. Dura mater tissue likely interacts with cranial bone growth and thus may play a role in the etiology of Chiari Type I Malformation (CMI) and related conditions, but it is often inaccessible and its gene expression has not been well studied. A genetic basis to CMI has been established; however, the specific genetic risk factors are not well characterized. RESULTS: We present an assessment of eQTLs for whole blood and dura mater tissue from individuals with CMI. A joint-tissue analysis identified 239 eQTLs in either dura or blood, with 79% of these eQTLs shared by both tissues. Several identified eQTLs were novel and these implicate genes involved in bone development (IPO8, XYLT1, and PRKAR1A), and ribosomal pathways related to marrow and bone dysfunction, as potential candidates in the development of CMI. CONCLUSIONS: Despite strong overall heterogeneity in expression levels between blood and dura, the majority of cis-eQTLs are shared by both tissues. The power to detect shared eQTLs was improved by using an integrative statistical approach. The identified tissue-specific and shared eQTLs provide new insight into the genetic basis for CMI and related conditions.


Subjects
Arnold-Chiari Malformation/genetics, Quantitative Trait Loci, Adolescent, Arnold-Chiari Malformation/pathology, Bone Development/genetics, Child, Preschool Child, Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/blood, Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/genetics, Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/metabolism, Dura Mater/metabolism, Female, Gene Regulatory Networks, Genotype, Humans, Male, Pentosyltransferases/blood, Pentosyltransferases/genetics, Pentosyltransferases/metabolism, Single Nucleotide Polymorphism, beta Karyopherins/blood, beta Karyopherins/genetics, beta Karyopherins/metabolism, UDP Xylose-Protein Xylosyltransferase
17.
Bioinformatics ; 30(11): 1562-8, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24501099

ABSTRACT

MOTIVATION: Estimating a phenotype distribution conditional on a set of discrete-valued predictors is a commonly encountered task. For example, interest may be in how the density of a quantitative trait varies with single nucleotide polymorphisms and patient characteristics. The subset of important predictors is not usually known in advance. This becomes more challenging with a high-dimensional predictor set when there is the possibility of interaction. RESULTS: We demonstrate a novel non-parametric Bayes method based on a tensor factorization of predictor-dependent weights for Gaussian kernels. The method uses multistage predictor selection for dimension reduction, providing succinct models for the phenotype distribution. The resulting conditional density morphs flexibly with the selected predictors. In a simulation study and an application to molecular epidemiology data, we demonstrate advantages over commonly used methods.


Subjects
Phenotype, Algorithms, Bayes Theorem, Humans, Single Nucleotide Polymorphism
18.
Bioinformatics ; 29(20): 2610-6, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-23990412

ABSTRACT

MOTIVATION: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either independently determine a separate clustering for each data source or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources. RESULTS: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas. AVAILABILITY: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html.


Subjects
Genomics/methods, Algorithms, Bayes Theorem, Cluster Analysis, Gene Dosage, Humans, Statistical Models
19.
Ann Inst Stat Math ; 66(1): 1-31, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24465053

ABSTRACT

We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications.

20.
Epidemiology ; 24(6): 921-8, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24051893

ABSTRACT

BACKGROUND: Some environmental chemical exposures are lipophilic and need to be adjusted by serum lipid levels before data analyses. There are currently various strategies that attempt to account for this problem, but all have their drawbacks. To address such concerns, we propose a new method that uses Box-Cox transformations and a simple Bayesian hierarchical model to adjust for lipophilic chemical exposures. METHODS: We compared our Box-Cox method to existing methods. We ran simulation studies in which increasing levels of lipid-adjusted chemical exposure did and did not increase the odds of having a disease, and we looked at both single-exposure and multiple-exposure cases. We also analyzed an epidemiology dataset that examined the effects of various chemical exposures on the risk of birth defects. RESULTS: Compared with existing methods, our Box-Cox method produced unbiased estimates, good coverage, similar power, and lower type I error rates. This was the case in both single- and multiple-exposure simulation studies. Results from analysis of the birth-defect data differed from results using existing methods. CONCLUSION: Our Box-Cox method is a novel and intuitive way to account for the lipophilic nature of certain chemical exposures. It addresses some of the problems with existing methods, is easily extendable to multiple exposures, and can be used in any analysis that involves concomitant variables.
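For readers unfamiliar with the transformation named in this abstract, the standard one-parameter Box-Cox transform is sketched below. It shows only the transform itself; how the paper embeds it in the Bayesian hierarchical model, and which quantities it is applied to, is not stated in the abstract and is not reproduced here. The function name `box_cox` is chosen for illustration.

```python
import math

def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires a positive value")
    if lam == 0:
        # Limiting case as lam -> 0.
        return math.log(x)
    return (x ** lam - 1.0) / lam
```

Setting `lam` to 1 leaves the scale essentially unchanged (a shift by one), while `lam` equal to 0 gives the log transform, so a single parameter covers a range of familiar adjustments.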


Subjects
Environmental Exposure, Lipids/blood, Theoretical Models, Organic Chemicals/chemistry, Bayes Theorem, Computer Simulation, Humans, Organic Chemicals/blood