Results 1 - 20 of 172
1.
Cereb Cortex ; 33(9): 5307-5322, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36320163

ABSTRACT

The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of 72 people across the age span, enriched for the APOE4 genotype to reveal vulnerable networks associated with a composite AD risk factor including age, genotype, and sex. Sparse canonical correlation analysis (CCA) revealed a high weight associated with genotype, and subgraphs involving the cuneus, temporal, cingulate cortices, and cerebellum. Adding cognitive metrics to the risk factor revealed the highest cumulative degree of connectivity for the pericalcarine cortex, insula, banks of the superior sulcus, and the cerebellum. To enable scaling up our approach, we extended tensor network principal component analysis, introducing CCA components. We developed sparse regression predictive models with errors of 17% for genotype, 24% for family risk factor for AD, and 5 years for age. Age prediction in groups including cognitively impaired subjects revealed regions not found using only normal subjects, i.e. middle and transverse temporal, paracentral and superior banks of temporal sulcus, as well as the amygdala and parahippocampal gyrus. These modeling approaches represent stepping stones towards single subject prediction.


Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/pathology , Magnetic Resonance Imaging , Brain/pathology , Genotype , Aging
2.
IEEE Trans Signal Process ; 72: 70-83, 2024.
Article in English | MEDLINE | ID: mdl-38283047

ABSTRACT

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.
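As a rough illustration of the building block named in this abstract, the sketch below applies the Cayley transform Q = (I - S)(I + S)^{-1} to a skew-symmetric matrix S to obtain a rotation, then uses that rotation to place points on an ellipsoid. It is not the CTEF fitting algorithm; the function names, center, axis lengths, and entries of S are hypothetical values chosen for the example.

```python
import numpy as np

def cayley(S):
    """Map a skew-symmetric matrix S to the rotation Q = (I - S)(I + S)^{-1}."""
    I = np.eye(S.shape[0])
    # I + S is always invertible when S is skew-symmetric.
    return (I - S) @ np.linalg.inv(I + S)

def ellipsoid_points(center, axes, S, n=200, seed=0):
    """Sample n points on the ellipsoid with the given center,
    semi-axis lengths `axes`, and orientation cayley(S)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n, len(center)))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # points on the unit sphere
    Q = cayley(S)
    # Each point is x = center + Q @ (axes * u_i), written row-wise.
    return center + (u * axes) @ Q.T

# Hypothetical 2-D example (an ellipse).
S = np.array([[0.0, 0.5], [-0.5, 0.0]])
center = np.array([1.0, -2.0])
axes = np.array([3.0, 1.0])
Q = cayley(S)
pts = ellipsoid_points(center, axes, S)
```

Because every skew-symmetric S yields a valid rotation, an optimizer can search over the free entries of S without constraints, which is one common reason to use this parametrization.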

3.
Neuroimage ; 276: 120214, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37286151

ABSTRACT

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs. In this article, we propose a human trait prediction framework utilizing a tractography-based representation of the brain connectome, which clusters fiber endpoints to define a data-driven white matter parcellation targeted to explain variation among individuals and predict human traits. This leads to Principal Parcellation Analysis (PPA), representing individual brain connectomes by compositional vectors building on a basis system of fiber bundles that captures the connectivity at the population level. PPA eliminates the need to choose atlases and ROIs a priori, and provides a simpler, vector-valued representation that facilitates easier statistical analysis compared to the complex graph structures encountered in classical connectome analyses. We illustrate the proposed approach through applications to data from the Human Connectome Project (HCP) and show that PPA connectomes improve power in predicting human traits over state-of-the-art methods based on classical connectomes, while dramatically improving parsimony and maintaining interpretability. Our PPA package is publicly available on GitHub, and can be implemented routinely for diffusion image data.


Subject(s)
Connectome , White Matter , Humans , Connectome/methods , Brain/diagnostic imaging
4.
Bioinformatics ; 38(16): 4011-4018, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35762974

ABSTRACT

MOTIVATION: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are so poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank. RESULTS: ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications. AVAILABILITY AND IMPLEMENTATION: ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Neuroimaging , Humans , Brain/diagnostic imaging , Software
5.
Biometrics ; 79(4): 2987-2997, 2023 12.
Article in English | MEDLINE | ID: mdl-37431147

ABSTRACT

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP). The transmission rate model is further embedded in a hierarchy to allow information borrowing across parallel streams of regional incidence data. Crucially, the method makes use of optional vaccination data as a first step toward modeling of endemic infectious diseases. Computational techniques borrowed from the Bayesian spatial analysis literature enable fast and reliable posterior computation. Simulation studies reveal that the method recovers true covariate effects at nominal coverage levels. We analyze data from the COVID-19 pandemic and validate forecast intervals on held-out data. User-friendly software is provided to enable practitioners to easily deploy the method in public health research.


Subject(s)
Communicable Diseases , Pandemics , Humans , Models, Statistical , Epidemiological Models , Bayes Theorem , Communicable Diseases/epidemiology , Forecasting
6.
Neuroimage ; 245: 118750, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34823023

ABSTRACT

There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high-dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal components analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer their relationships to human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to study the structural brain connectome and its relationship with cognition. Numerical results demonstrate huge advantages of GATE over competitors in terms of prediction accuracy, statistical inference, and computing efficiency. We found that the structural connectome has a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.


Subject(s)
Brain/growth & development , Brain/physiology , Cognition/physiology , Connectome/methods , Magnetic Resonance Imaging , Adolescent , Algorithms , Child , Computer Simulation , Datasets as Topic , Female , Humans , Imaging, Three-Dimensional , Male , Models, Neurological , Nonlinear Dynamics , Phenotype , Reading , Young Adult
7.
Bioinformatics ; 36(11): 3522-3527, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32176244

ABSTRACT

MOTIVATION: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly estimate t-SNE embeddings that are not driven by batch effects. Without correction, interesting structure in the data can be obscured by batch effects. The proposed algorithm can therefore significantly aid visualization of high-dimensional data. RESULTS: The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batches related with mice identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, and endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumours. AVAILABILITY AND IMPLEMENTATION: Source code implementing the proposed approach is available as an R package at https://github.com/emanuelealiverti/BC_tSNE, including a tutorial to reproduce the simulation studies. CONTACT: aliverti@stat.unipd.it.


Subject(s)
Endothelial Cells , Software , Algorithms , Animals , Gene Expression , Gene Expression Profiling , Mice
8.
Blood ; 134(19): 1598-1607, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31558468

ABSTRACT

Burkitt lymphoma (BL) is an aggressive, MYC-driven lymphoma comprising 3 distinct clinical subtypes: sporadic BLs that occur worldwide, endemic BLs that occur predominantly in sub-Saharan Africa, and immunodeficiency-associated BLs that occur primarily in the setting of HIV. In this study, we comprehensively delineated the genomic basis of BL through whole-genome sequencing (WGS) of 101 tumors representing all 3 subtypes of BL to identify 72 driver genes. These data were additionally informed by CRISPR screens in BL cell lines to functionally annotate the role of oncogenic drivers. Nearly every driver gene was found to have both coding and non-coding mutations, highlighting the importance of WGS for identifying driver events. Our data implicate coding and non-coding mutations in IGLL5, BACH2, SIN3A, and DNMT1. Epstein-Barr virus (EBV) infection was associated with higher mutation load, with type 1 EBV showing a higher mutational burden than type 2 EBV. Although sporadic and immunodeficiency-associated BLs had similar genetic profiles, endemic BLs manifested more frequent mutations in BCL7A and BCL6 and fewer genetic alterations in DNMT1, SNTB2, and CTCF. Silencing mutations in ID3 were a common feature of all 3 subtypes of BL. In vitro, mass spectrometry-based proteomics demonstrated that the ID3 protein binds primarily to TCF3 and TCF4. In vivo knockout of ID3 potentiated the effects of MYC, leading to rapid tumorigenesis and tumor phenotypes consistent with those observed in the human disease.


Subject(s)
Burkitt Lymphoma/genetics , Whole Genome Sequencing/methods , Animals , Humans , Mice
9.
Bioinformatics ; 34(14): 2457-2464, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29506206

ABSTRACT

Motivation: Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include functional linear models, functional principal components regression and cluster-based approaches, such as latent trajectory analysis. This article is motivated by applications in which the dynamics in a predictor, across times when the value is relatively extreme, are particularly informative about the response. For example, physicians are interested in relating the dynamics of blood pressure changes during surgery to post-surgery adverse outcomes, and it is thought that the dynamics are more important when blood pressure is significantly elevated or lowered. Results: We propose a novel class of extrema-weighted feature (XWF) extraction models. Key components in defining XWFs include the marginal density of the predictor, a function up-weighting values at extreme quantiles of this marginal, and functionals characterizing local dynamics. Algorithms are proposed for fitting of XWF-based regression and classification models, and are compared with current methods for functional predictors in simulations and a blood pressure during surgery application. XWFs find features of intraoperative blood pressure trajectories that are predictive of postoperative mortality. By their nature, most of these features cannot be found by previous methods. Availability and implementation: The R package 'xwf' is available at the CRAN repository: https://cran.r-project.org/package=xwf. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Blood Pressure , Computational Biology/methods , Postoperative Complications , Software , Algorithms , Female , Humans , Male , Treatment Outcome
10.
Bioinformatics ; 33(12): 1859-1866, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28165112

ABSTRACT

MOTIVATION: There is increasing interest in learning how human brain networks vary as a function of a continuous trait, but flexible and efficient procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which combines low-rank factorizations and flexible Gaussian process priors to learn changes in the conditional expectation of a network-valued random variable across the values of a continuous predictor, while including subject-specific random effects. RESULTS: The formulation leads to a general framework for inference on changes in brain network structures across human traits, facilitating borrowing of information and coherently characterizing uncertainty. We provide an efficient Gibbs sampler for posterior computation along with simple procedures for inference, prediction and goodness-of-fit assessments. The model is applied to learn how human brain networks vary across individuals with different intelligence scores. Results provide interesting insights on the association between intelligence and brain connectivity, while demonstrating good predictive performance. AVAILABILITY AND IMPLEMENTATION: Source code implemented in R and data are available at https://github.com/wangronglu/BNRR. CONTACT: rl.wang@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Brain/anatomy & histology , Computational Biology/methods , Models, Biological , Nerve Net/anatomy & histology , Software , Algorithms , Bayes Theorem , Brain/physiology , Computer Simulation , Humans , Nerve Net/physiology
11.
Biometrics ; 74(4): 1331-1340, 2018 12.
Article in English | MEDLINE | ID: mdl-29894557

ABSTRACT

There is wide interest in studying how the distribution of a continuous response changes with a predictor. We are motivated by environmental applications in which the predictor is the dose of an exposure and the response is a health outcome. A main focus in these studies is inference on dose levels associated with a given increase in risk relative to a baseline. In addressing this goal, popular methods either dichotomize the continuous response or focus on modeling changes with the dose in the expectation of the outcome. Such choices may lead to information loss and provide inaccurate inference on dose-response relationships. We instead propose a Bayesian convex mixture regression model that allows the entire distribution of the health outcome to be unknown and changing with the dose. To balance flexibility and parsimony, we rely on a mixture model for the density at the extreme doses, and express the conditional density at each intermediate dose via a convex combination of these extremal densities. This representation generalizes classical dose-response models for quantitative outcomes, and provides a more parsimonious, but still powerful, formulation compared to nonparametric methods, thereby improving interpretability and efficiency in inference on risk functions. A Markov chain Monte Carlo algorithm for posterior inference is developed, and the benefits of our methods are outlined in simulations, along with a study on the impact of DDE exposure on gestational age.
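The convex-combination idea in this abstract can be sketched numerically: the density at an intermediate dose is (1 - beta) times the density at the lowest dose plus beta times the density at the highest dose, with beta in [0, 1]. The code below is only an illustration under loud assumptions: each extremal density is a single Gaussian rather than a mixture, beta is linear in dose, and all means, standard deviations, and dose limits are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Gaussian density evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def conditional_density(y, dose, dose_min=0.0, dose_max=1.0):
    """Density of the outcome at `dose` as a convex combination of
    the densities at the two extreme doses."""
    # Assumption: the weight grows linearly from 0 to 1 with dose.
    beta = (dose - dose_min) / (dose_max - dose_min)
    f_low = normal_pdf(y, 40.0, 1.5)   # hypothetical density at the lowest dose
    f_high = normal_pdf(y, 37.0, 2.5)  # hypothetical density at the highest dose
    return (1.0 - beta) * f_low + beta * f_high

y = np.linspace(20.0, 55.0, 7001)
dens = conditional_density(y, dose=0.3)

# Trapezoidal check that the combination is still a proper density.
area = float(np.sum((dens[:-1] + dens[1:]) * np.diff(y)) / 2.0)
```

Since both extremal densities integrate to one and the weights sum to one, the result is a valid density at every dose, which is what keeps the formulation parsimonious.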


Subject(s)
Biometry/methods , Computer Simulation/statistics & numerical data , Regression Analysis , Risk Assessment/statistics & numerical data , Bayes Theorem , Environmental Exposure , Female , Gestational Age , Humans , Outcome Assessment, Health Care , Pregnancy , Prenatal Exposure Delayed Effects , Risk Assessment/methods
12.
Biometrics ; 73(3): 1018-1028, 2017 09.
Article in English | MEDLINE | ID: mdl-28083869

ABSTRACT

High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These genomic variables can naturally be grouped by the gene they encode, among other criteria. However, standard practice in such applications is independent screening with a universal correction for multiplicity. We propose a Bayesian approach in which the prior probability of an association for a given genomic variable depends on its gene, and the gene-specific probabilities are modeled nonparametrically. This hierarchical model allows for appropriate gene and genome-wide multiplicity adjustments, and can be incorporated into a variety of Bayesian association screening methodologies with negligible increase in computational complexity. We describe an application to screening for differences in DNA methylation between lower grade glioma and glioblastoma multiforme tumor samples from The Cancer Genome Atlas. Software is available via the package BayesianScreening for R: github.com/lockEF/BayesianScreening.


Subject(s)
Genome , Bayes Theorem , CpG Islands , DNA Methylation , Epigenesis, Genetic , Epigenomics , Glioblastoma , Humans
13.
Ann Stat ; 45(1): 1-38, 2017.
Article in English | MEDLINE | ID: mdl-29332971

ABSTRACT

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
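To make the reduced-rank statement concrete, the sketch below builds the probability tensor of a latent class model for three categorical variables: P(y1, y2, y3) = sum over h of nu_h * lam1[h, y1] * lam2[h, y2] * lam3[h, y3], which is a nonnegative PARAFAC factorization with as many components as latent classes. It does not implement the collapsed Tucker decompositions proposed in the paper; the number of classes, the numbers of categories, and the random parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3               # number of latent classes (hypothetical)
levels = [2, 3, 4]  # categories per variable (hypothetical)

# Class weights nu_h, summing to one.
nu = rng.dirichlet(np.ones(k))
# lam[j][h, c] = P(y_j = c | class h); each row sums to one.
lam = [rng.dirichlet(np.ones(d), size=k) for d in levels]

# Joint probability mass function as a sum of k rank-one tensors.
P = np.einsum("h,ha,hb,hc->abc", nu, lam[0], lam[1], lam[2])

# Marginal of the first variable, computed directly from the factors.
marginal_y1 = nu @ lam[0]
```

The full table has 2 * 3 * 4 = 24 cells, while the factorization is described by the class weights plus one small matrix per variable, which is the dimensionality reduction the abstract refers to.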

14.
Biometrics ; 72(1): 184-92, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26394204

ABSTRACT

It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.


Subject(s)
Bayes Theorem , Case-Control Studies , Congenital Abnormalities/epidemiology , Models, Statistical , Statistics, Nonparametric , Computer Simulation , Data Interpretation, Statistical , Humans , Incidence , Infant, Newborn , Reproducibility of Results , Risk Assessment/methods , Sample Size , Sensitivity and Specificity
15.
Eur J Contracept Reprod Health Care ; 21(4): 323-8, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27297611

ABSTRACT

OBJECTIVES: We propose a new, personalised approach of estimating a woman's most fertile days that only requires recording the first day of menses and can use a smartphone to convey this information to the user so that she can plan or prevent pregnancy. METHODS: We performed a retrospective analysis of two cohort studies (a North Carolina-based study and the Early Pregnancy Study [EPS]) and a prospective multicentre trial (World Health Organization [WHO] study). The North Carolina study consisted of 68 sexually active women with either an intrauterine device or tubal ligation. The EPS comprised 221 women who planned to become pregnant and had no known fertility problems. The WHO study consisted of 706 women from five geographically and culturally diverse settings. Bayesian statistical methods were used to design our proposed method, Dynamic Optimal Timing (DOT). Simulation studies were used to estimate the cumulative pregnancy risk. RESULTS: For the proposed method, simulation analyses indicated a 4.4% cumulative probability of pregnancy over 13 cycles with correct use. After a calibration window, this method flagged between 11 and 13 days when unprotected intercourse should be avoided per cycle. Eligible women should have cycle lengths between 20 and 40 days with a variability range less than or equal to 9 days. CONCLUSIONS: DOT can easily be implemented by computer or smartphone applications, allowing for women to make more informed decisions about their fertility. This approach is already incorporated into a patent-pending system and is available for free download on iPhones and Androids.
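The eligibility criterion stated in the results can be written as a small check. This is only a sketch of that one rule, not of the DOT method itself: the function name is hypothetical, and it assumes that "variability range" means the longest recorded cycle minus the shortest and that the 20 and 40 day bounds are inclusive.

```python
def is_eligible(cycle_lengths):
    """Return True if every recorded cycle is 20 to 40 days long
    and the longest and shortest cycles differ by at most 9 days."""
    if not cycle_lengths:
        return False  # no recorded cycles, so eligibility cannot be established
    in_range = all(20 <= c <= 40 for c in cycle_lengths)
    spread = max(cycle_lengths) - min(cycle_lengths)
    return in_range and spread <= 9

regular = is_eligible([28, 30, 27, 31])  # spread of 4 days
irregular = is_eligible([22, 35])        # spread of 13 days
```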


Subject(s)
Bayes Theorem , Fertility/physiology , Menstrual Cycle/physiology , Mobile Applications , Natural Family Planning Methods/methods , Female , Humans , Smartphone
16.
BMC Genomics ; 16: 11, 2015 Jan 22.
Article in English | MEDLINE | ID: mdl-25609184

ABSTRACT

BACKGROUND: Expression quantitative trait loci (eQTL) play an important role in the regulation of gene expression. Gene expression levels and eQTLs are expected to vary from tissue to tissue, and therefore multi-tissue analyses are necessary to fully understand complex genetic conditions in humans. Dura mater tissue likely interacts with cranial bone growth and thus may play a role in the etiology of Chiari Type I Malformation (CMI) and related conditions, but it is often inaccessible and its gene expression has not been well studied. A genetic basis to CMI has been established; however, the specific genetic risk factors are not well characterized. RESULTS: We present an assessment of eQTLs for whole blood and dura mater tissue from individuals with CMI. A joint-tissue analysis identified 239 eQTLs in either dura or blood, with 79% of these eQTLs shared by both tissues. Several identified eQTLs were novel and these implicate genes involved in bone development (IPO8, XYLT1, and PRKAR1A), and ribosomal pathways related to marrow and bone dysfunction, as potential candidates in the development of CMI. CONCLUSIONS: Despite strong overall heterogeneity in expression levels between blood and dura, the majority of cis-eQTLs are shared by both tissues. The power to detect shared eQTLs was improved by using an integrative statistical approach. The identified tissue-specific and shared eQTLs provide new insight into the genetic basis for CMI and related conditions.


Subject(s)
Arnold-Chiari Malformation/genetics , Quantitative Trait Loci , Adolescent , Arnold-Chiari Malformation/pathology , Bone Development/genetics , Child , Child, Preschool , Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/blood , Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/genetics , Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/metabolism , Dura Mater/metabolism , Female , Gene Regulatory Networks , Genotype , Humans , Male , Pentosyltransferases/blood , Pentosyltransferases/genetics , Pentosyltransferases/metabolism , Polymorphism, Single Nucleotide , beta Karyopherins/blood , beta Karyopherins/genetics , beta Karyopherins/metabolism , UDP Xylose-Protein Xylosyltransferase
17.
Bioinformatics ; 30(11): 1562-8, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24501099

ABSTRACT

MOTIVATION: Estimating a phenotype distribution conditional on a set of discrete-valued predictors is a commonly encountered task. For example, interest may be in how the density of a quantitative trait varies with single nucleotide polymorphisms and patient characteristics. The subset of important predictors is not usually known in advance. This becomes more challenging with a high-dimensional predictor set when there is the possibility of interaction. RESULTS: We demonstrate a novel non-parametric Bayes method based on a tensor factorization of predictor-dependent weights for Gaussian kernels. The method uses multistage predictor selection for dimension reduction, providing succinct models for the phenotype distribution. The resulting conditional density morphs flexibly with the selected predictors. In a simulation study and an application to molecular epidemiology data, we demonstrate advantages over commonly used methods.


Subject(s)
Phenotype , Algorithms , Bayes Theorem , Humans , Polymorphism, Single Nucleotide
18.
Bioinformatics ; 29(20): 2610-6, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-23990412

ABSTRACT

MOTIVATION: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either independently determine a separate clustering for each data source or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources. RESULTS: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas. AVAILABILITY: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html.


Subject(s)
Genomics/methods , Algorithms , Bayes Theorem , Cluster Analysis , Gene Dosage , Humans , Models, Statistical
19.
Ann Inst Stat Math ; 66(1): 1-31, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24465053

ABSTRACT

We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications.

20.
bioRxiv ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39005377

ABSTRACT

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic, fixed and modifiable risk factors influence susceptibility to AD are under intense investigation, yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear. To model multiple risk factors including APOE genotype, age, sex, diet, and immunity we leveraged mice expressing the human APOE and NOS2 genes, conferring a reduced immune response compared to mouse Nos2. Employing graph analyses of brain connectomes derived from accelerated diffusion-weighted MRI, we assessed the global and local impact of risk factors in the absence of AD pathology. Aging and a high-fat diet impacted extensive networks comprising AD-vulnerable regions, including the temporal association cortex, amygdala, and the periaqueductal gray, involved in stress responses. Sex impacted networks including sexually dimorphic regions (thalamus, insula, hypothalamus) and key memory-processing areas (fimbria, septum). APOE genotypes modulated connectivity in memory, sensory, and motor regions, while diet and immunity both impacted the insula and hypothalamus. Notably, these risk factors converged on a circuit comprising 63 of 54,946 total connections (0.11% of the connectome), highlighting shared vulnerability amongst multiple AD risk factors in regions essential for sensory integration, emotional regulation, decision making, motor coordination, memory, homeostasis, and interoception. These network-based biomarkers hold translational value for distinguishing high-risk versus low-risk participants at preclinical AD stages, suggest circuits as potential therapeutic targets, and advance our understanding of network fingerprints associated with AD risk. 
Significance Statement: Current interventions for Alzheimer's disease (AD) do not provide a cure and are delivered years after neuropathological onset. Addressing the impact of risk factors on brain networks holds promise for early detection, prevention, and revealing putative therapeutic targets at preclinical stages. We utilized six mouse models to investigate the impact of factors, including APOE genotype, age, sex, immunity, and diet, on brain networks. Large structural connectomes were derived from high-resolution compressed sensing diffusion MRI. A highly parallelized graph classification identified subnetworks associated with unique risk factors, revealing their network fingerprints, and a common network composed of 63 connections with shared vulnerability to all risk factors. APOE genotype-specific immune signatures support the design of interventions tailored to risk profiles.
