Results 1 - 20 of 209
1.
Stat Med ; 43(10): 1867-1882, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38409877

ABSTRACT

Throughout the course of an epidemic, the rate at which disease spreads varies with behavioral changes, the emergence of new disease variants, and the introduction of mitigation policies. Estimating such changes in transmission rates can help us better model and predict the dynamics of an epidemic, and provide insight into the efficacy of control and intervention strategies. We present a method for likelihood-based estimation of parameters in the stochastic susceptible-infected-removed model under a time-inhomogeneous transmission rate composed of piecewise constant components. In doing so, our method simultaneously learns change points in the transmission rate via a Markov chain Monte Carlo algorithm. The method targets the exact model posterior in a difficult missing data setting given only partially observed case counts over time. We validate performance on simulated data before applying our approach to data from an Ebola outbreak in Western Africa and a COVID-19 outbreak on a university campus.


Subject(s)
Epidemics, Ebola Hemorrhagic Fever, Humans, Likelihood Functions, Markov Chains, Disease Outbreaks, Monte Carlo Method, Bayes Theorem, Stochastic Processes
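
The model class named in the abstract above, a stochastic SIR process with a piecewise constant transmission rate, can be illustrated with a short forward simulation. This is a minimal sketch, not the authors' inference method: it only simulates the process (Gillespie algorithm) and does not estimate change points. The function name `simulate_sir` and all parameter names are assumptions made for illustration, and `betas` is assumed to have exactly one more entry than `change_points`.

```python
import random


def simulate_sir(n, i0, betas, change_points, gamma, t_max, seed=0):
    """Gillespie simulation of a stochastic SIR model.

    Hypothetical helper: betas[k] is the transmission rate on the k-th
    time segment, and change_points holds the increasing segment
    boundaries, so len(betas) == len(change_points) + 1.
    """
    rng = random.Random(seed)
    bounds = list(change_points) + [t_max]  # right end of each segment
    s, i, r, t, seg = n - i0, i0, 0, 0.0, 0
    path = [(t, s, i, r)]
    while i > 0 and t < t_max:
        infect = betas[seg] * s * i / n  # rate of S -> I events
        remove = gamma * i               # rate of I -> R events
        total = infect + remove
        wait = rng.expovariate(total)
        if t + wait >= bounds[seg]:
            # No event before the boundary. Exponential waiting times are
            # memoryless, so jump to the boundary and redraw with the
            # next segment's rate.
            t = bounds[seg]
            seg += 1
            if seg >= len(betas):
                break
            continue
        t += wait
        if rng.random() < infect / total:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path
```

Calling `simulate_sir(100, 5, [0.8, 0.2], [10.0], 0.3, 50.0)` would simulate a population of 100 in which the transmission rate drops from 0.8 to 0.2 at time 10, as might happen after a mitigation policy is introduced.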
2.
IEEE Trans Signal Process ; 72: 70-83, 2024.
Article in English | MEDLINE | ID: mdl-38283047

ABSTRACT

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data and several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features in data that other machine learning methods fail to identify. For example, on the clustering examples CTEF outperforms 10 popular algorithms.
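
For readers unfamiliar with the Cayley transform mentioned in the abstract above, the standard definition is given here as a worked equation. The abstract does not state how CTEF uses the transform, so the ellipsoid parametrization in the last line is only an assumption about how a rotation matrix could enter an ellipsoid fit; the symbols $A$, $Q$, $c$, and $\Lambda$ are introduced here for illustration.

```latex
% Cayley transform of a skew-symmetric matrix A (A^T = -A).
% I + A is invertible because the eigenvalues of A are purely imaginary.
\[
  Q = (I - A)(I + A)^{-1}, \qquad A^{\top} = -A .
\]
% Q is orthogonal, because (I - A) and (I + A) commute:
\[
  Q^{\top} Q = (I - A)^{-1}(I + A)(I - A)(I + A)^{-1} = I .
\]
% Assumed parametrization of an ellipsoid with center c, rotation Q,
% and positive diagonal matrix Lambda:
\[
  \mathcal{E} = \{\, x : (x - c)^{\top} Q \Lambda Q^{\top} (x - c) = 1 \,\},
  \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p),\ \lambda_j > 0 .
\]
```

Under this assumed parametrization, any choice of skew-symmetric $A$, center $c$, and positive $\lambda_j$ yields an ellipsoid, which is one way a method could be ellipsoid specific.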

3.
Front Neurosci ; 17: 1200373, 2023.
Article in English | MEDLINE | ID: mdl-37901431

ABSTRACT

The brain structural connectome is generated by a collection of white matter fiber bundles constructed from diffusion weighted MRI (dMRI), acting as highways for neural activity. There has been abundant interest in studying how the structural connectome varies across individuals in relation to their traits, ranging from age and gender to neuropsychiatric outcomes. After applying tractography to dMRI to get white matter fiber bundles, a key question is how to represent the brain connectome to facilitate statistical analyses relating connectomes to traits. The current standard divides the brain into regions of interest (ROIs), and then relies on an adjacency matrix (AM) representation. Each cell in the AM is a measure of connectivity, e.g., number of fiber curves, between a pair of ROIs. Although the AM representation is intuitive, a disadvantage is the high-dimensionality due to the large number of cells in the matrix. This article proposes a simpler tree representation of the brain connectome, which is motivated by ideas in computational topology and takes topological and biological information on the cortical surface into consideration. We demonstrate that our tree representation preserves useful information and interpretability, while reducing dimensionality to improve statistical and computational efficiency. Applications to data from the Human Connectome Project (HCP) are considered and code is provided for reproducing our analyses.
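
The adjacency matrix (AM) representation described in the abstract above can be sketched as follows. This is an illustrative example of the standard representation only, not the tree representation the article proposes. The function names and the input format (one pair of ROI indices per fiber) are assumptions, as is the choice to ignore fibers that start and end in the same ROI.

```python
def adjacency_matrix(fiber_endpoints, n_rois):
    """Count fibers between each pair of ROIs.

    fiber_endpoints: iterable of (roi_a, roi_b) index pairs, one per fiber
    (a hypothetical input format). Within-ROI fibers are skipped, which is
    an assumption of this sketch.
    """
    am = [[0] * n_rois for _ in range(n_rois)]
    for a, b in fiber_endpoints:
        if a == b:
            continue
        am[a][b] += 1
        am[b][a] += 1  # the matrix is symmetric
    return am


def n_unique_cells(n_rois):
    """Number of distinct ROI pairs, i.e. free cells in the symmetric AM."""
    return n_rois * (n_rois - 1) // 2
```

The number of free cells grows quadratically with the number of ROIs, which is the dimensionality concern the abstract raises.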

4.
J R Stat Soc Ser C Appl Stat ; 72(4): 912-936, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37662555

ABSTRACT

Targeted brain stimulation has the potential to treat mental illnesses. We develop an approach to help design protocols by identifying relevant multi-region electrical dynamics. Our approach models these dynamics as a superposition of latent networks, where the latent variables predict a relevant outcome. We use supervised autoencoders (SAEs) to improve predictive performance in this context, describe the conditions where SAEs improve predictions, and provide modelling constraints to ensure biological relevance. We experimentally validate our approach by finding a network associated with stress that aligns with a previous stimulation protocol and characterizing a genotype associated with bipolar disorder.

5.
Biometrics ; 79(4): 2987-2997, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37431147

ABSTRACT

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP). The transmission rate model is further embedded in a hierarchy to allow information borrowing across parallel streams of regional incidence data. Crucially, the method makes use of optional vaccination data as a first step toward modeling of endemic infectious diseases. Computational techniques borrowed from the Bayesian spatial analysis literature enable fast and reliable posterior computation. Simulation studies reveal that the method recovers true covariate effects at nominal coverage levels. We analyze data from the COVID-19 pandemic and validate forecast intervals on held-out data. User-friendly software is provided to enable practitioners to easily deploy the method in public health research.


Subject(s)
Communicable Diseases, Pandemics, Humans, Statistical Models, Epidemiological Models, Bayes Theorem, Communicable Diseases/epidemiology, Forecasting
6.
Neuroimage ; 276: 120214, 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37286151

ABSTRACT

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs. In this article, we propose a human trait prediction framework utilizing a tractography-based representation of the brain connectome, which clusters fiber endpoints to define a data-driven white matter parcellation targeted to explain variation among individuals and predict human traits. This leads to Principal Parcellation Analysis (PPA), representing individual brain connectomes by compositional vectors building on a basis system of fiber bundles that captures the connectivity at the population level. PPA eliminates the need to choose atlases and ROIs a priori, and provides a simpler, vector-valued representation that facilitates easier statistical analysis compared to the complex graph structures encountered in classical connectome analyses. We illustrate the proposed approach through applications to data from the Human Connectome Project (HCP) and show that PPA connectomes improve power in predicting human traits over state-of-the-art methods based on classical connectomes, while dramatically improving parsimony and maintaining interpretability. Our PPA package is publicly available on GitHub, and can be implemented routinely for diffusion image data.


Subject(s)
Connectome, White Matter, Humans, Connectome/methods, Brain/diagnostic imaging
7.
J R Stat Soc Ser C Appl Stat ; 72(2): 254-270, 2023 May.
Article in English | MEDLINE | ID: mdl-37197290

ABSTRACT

We aim to infer the bioactivity of each chemical-by-assay-endpoint combination, addressing the sparsity of toxicology data. We propose a Bayesian hierarchical framework that borrows information across different chemicals and assay endpoints, facilitates out-of-sample prediction of activity for chemicals not yet assayed, quantifies uncertainty of predicted activity, and adjusts for multiplicity in hypothesis testing. Furthermore, this paper makes a novel attempt in toxicology to simultaneously model heteroscedastic errors and a nonparametric mean function, leading to a broader definition of activity whose need has been suggested by toxicologists. An application to real data identifies the chemicals most likely to be active for neurodevelopmental disorders and obesity.

8.
PLoS One ; 18(4): e0284904, 2023.
Article in English | MEDLINE | ID: mdl-37099536

ABSTRACT

Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual information (MI), a statistical summary of data interdependence with appealing properties that make it a suitable alternative or addition to correlation for identifying relationships in data. MI: (i) captures all types of dependence, both linear and nonlinear, (ii) is zero only when random variables are independent, (iii) serves as a measure of relationship strength (similar to but more general than R²), and (iv) is interpreted the same way for numerical and categorical data. Unfortunately, MI typically receives little to no attention in introductory statistics courses and is more difficult than correlation to estimate from data. In this article, we motivate the use of MI in the analyses of epidemiologic data, while providing a general introduction to estimation and interpretation. We illustrate its utility through a retrospective study relating intraoperative heart rate (HR) and mean arterial pressure (MAP). We: (i) show postoperative mortality is associated with decreased MI between HR and MAP and (ii) improve existing postoperative mortality risk assessment by including MI and additional hemodynamic statistics.


Subject(s)
Hemodynamics, Humans, Retrospective Studies, Heart Rate
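
The properties of mutual information listed in the abstract above can be checked on small examples with a plug-in estimate for discrete data. This is a minimal sketch and not the estimator used in the article: the abstract does not describe its estimation procedure, continuous measurements such as HR and MAP would need binning or a different estimator, and the plug-in estimate is known to be biased in small samples. The function name `mutual_information` is an assumption.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in nats) for paired
    discrete observations, using empirical frequencies as probabilities."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )
```

For two empirically independent variables the estimate is zero, and for two identical binary variables with balanced classes it equals log 2, the entropy of either variable.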
9.
Cereb Cortex ; 33(9): 5307-5322, 2023 Apr 25.
Article in English | MEDLINE | ID: mdl-36320163

ABSTRACT

The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of 72 people across the age span, enriched for the APOE4 genotype to reveal vulnerable networks associated with a composite AD risk factor including age, genotype, and sex. Sparse canonical correlation analysis (CCA) revealed a high weight associated with genotype, and subgraphs involving the cuneus, temporal, cingulate cortices, and cerebellum. Adding cognitive metrics to the risk factor revealed the highest cumulative degree of connectivity for the pericalcarine cortex, insula, banks of the superior sulcus, and the cerebellum. To enable scaling up our approach, we extended tensor network principal component analysis, introducing CCA components. We developed sparse regression predictive models with errors of 17% for genotype, 24% for family risk factor for AD, and 5 years for age. Age prediction in groups including cognitively impaired subjects revealed regions not found using only normal subjects, i.e. middle and transverse temporal, paracentral and superior banks of temporal sulcus, as well as the amygdala and parahippocampal gyrus. These modeling approaches represent stepping stones towards single subject prediction.


Subject(s)
Alzheimer Disease, Humans, Alzheimer Disease/pathology, Magnetic Resonance Imaging, Brain/pathology, Genotype, Aging
10.
J Am Stat Assoc ; 118(544): 2521-2532, 2023.
Article in English | MEDLINE | ID: mdl-38501061

ABSTRACT

We aim at modeling the appearance of distinct tags in a sequence of labeled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarized via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects. We propose a novel Bayesian method for species sampling modeling by directly specifying the probability of a new discovery, therefore, allowing for flexible specifications. The asymptotic behavior and finite sample properties of such an approach are extensively studied. Interestingly, our enlarged class of sequential processes includes highly tractable special cases. We present a subclass of models characterized by appealing theoretical and computational properties, including one that shares the same discovery probability with the Dirichlet process. Moreover, due to strong connections with logistic regression models, the latter subclass can naturally account for covariates. We finally test our proposal on both synthetic and real data, with special emphasis on a large fungal biodiversity study in Finland. Supplementary materials for this article are available online.
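
The idea of specifying the probability of a new discovery directly can be illustrated with the Dirichlet process case that the abstract above mentions, where the probability that the next object carries a previously unseen tag, after n objects have been observed, is alpha / (alpha + n). This is a minimal sketch of that one special case, not the authors' general model class; the function names and the concentration parameter name `alpha` are assumptions.

```python
import random


def accumulation_curve(n, alpha, seed=0):
    """Simulate the number of distinct tags seen after each of n objects,
    using the Dirichlet process discovery probability alpha / (alpha + seen)."""
    rng = random.Random(seed)
    k, curve = 0, []
    for seen in range(n):
        # The first object (seen == 0) is always a new discovery.
        if rng.random() < alpha / (alpha + seen):
            k += 1
        curve.append(k)
    return curve


def expected_distinct(n, alpha):
    """Expected number of distinct tags after n objects under the same rule."""
    return sum(alpha / (alpha + i) for i in range(n))
```

Replacing the expression `alpha / (alpha + seen)` with another decreasing function of `seen` would give a different accumulation curve, which is the kind of flexibility the abstract describes.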

11.
Ann Appl Stat ; 16(3): 1380-1399, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36465815

ABSTRACT

We introduce a new class of semiparametric latent variable models for long memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rain forest; the timings of vocalizations exhibit self-similarity and long range dependence. This rules out Poisson process based models where the rate function itself is not long range dependent. The proposed class of FRActional Probit (FRAP) models is based on thresholding a latent process. This latent process is modeled as the sum of a smooth Gaussian process and a fractional Brownian motion. We develop a Bayesian approach to inference using Markov chain Monte Carlo and show good performance in simulation studies. Applying the methods to the Amazon bird vocalization data, we find substantial evidence for self-similarity and non-Markovian/Poisson dynamics. To accommodate the bird vocalization data in which there are many different species of birds exhibiting their own vocalization dynamics, a hierarchical expansion of FRAP is provided in the Supplementary Material.

13.
Ann Appl Stat ; 16(4): 2369-2395, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36425314

ABSTRACT

Reliably learning group structures among nodes in network data is challenging in several applications. We are particularly motivated by studying covert networks that encode relationships among criminals. These data are subject to measurement errors, and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may unveil key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of routinely-used community detection algorithms, and requires extensions of model-based solutions to realistically characterize the node partition process, incorporate information from node attributes, and provide improved strategies for estimation and uncertainty quantification. To cover these gaps, we develop a new class of extended stochastic block models (esbm) that infer groups of nodes having common connectivity patterns via Gibbs-type priors on the partition process. This choice encompasses many realistic priors for criminal networks, covering solutions with fixed, random and infinite number of possible groups, and facilitates the inclusion of node attributes in a principled manner. Among the new alternatives in our class, we focus on the Gnedin process as a realistic prior that allows the number of groups to be finite, random and subject to a reinforcement process coherent with criminal networks. A collapsed Gibbs sampler is proposed for the whole esbm class, and refined strategies for estimation, prediction, uncertainty quantification and model selection are outlined. The esbm performance is illustrated in realistic simulations and in an application to an Italian mafia network, where we unveil key complex block structures, mostly hidden from state-of-the-art alternatives.
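
The block structures named in the abstract above (assortative, disassortative, core-periphery) are easiest to picture through the plain stochastic block model, in which the probability of an edge depends only on the groups of its two endpoints. This is a minimal sketch of that basic model with fixed, known group labels; it does not implement the extended model (esbm), its Gibbs-type priors, or the collapsed Gibbs sampler. The function name `sample_sbm` and its arguments are assumptions.

```python
import random


def sample_sbm(labels, block_probs, seed=0):
    """Sample an undirected network from a stochastic block model.

    labels[u] is the group of node u, and block_probs[g][h] is the edge
    probability between groups g and h (assumed symmetric).
    """
    rng = random.Random(seed)
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):  # no self-loops
            if rng.random() < block_probs[labels[u]][labels[v]]:
                adj[u][v] = adj[v][u] = 1
    return adj
```

With `block_probs = [[0.9, 0.4], [0.4, 0.05]]`, group 0 would act as a densely connected core and group 1 as a sparse periphery; large diagonal and small off-diagonal entries would instead give an assortative structure.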

14.
J Math Biol ; 85(4): 36, 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36125562

ABSTRACT

The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters based on noisy observations early in the outbreak, well before the epidemic reaches its peak. This allows prediction of the subsequent course of the epidemic and design of appropriate interventions. However, accurately inferring SIR model parameters in such scenarios is problematic. This article provides novel, theoretical insight on this issue of practical identifiability of the SIR model. Our theory provides new understanding of the inferential limits of routinely used epidemic models and provides a valuable addition to current simulate-and-check methods. We illustrate some practical implications through application to a real-world epidemic data set.


Subject(s)
Communicable Diseases, Epidemics, Communicable Diseases/epidemiology, Disease Outbreaks, Disease Susceptibility/epidemiology, Epidemiological Models, Humans
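
For reference, the SIR equations discussed in the abstract above are written out here in their standard form, followed by a well-known early-outbreak approximation. The approximation is offered as one standard illustration of why early data can be weakly informative about individual parameters; it is not necessarily the argument the article makes. The symbols $\beta$ (transmission rate), $\gamma$ (recovery rate), and $N$ (population size) are the usual notation and are not taken from the abstract.

```latex
% Standard deterministic SIR model with population size N = S + I + R.
\[
  \frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad
  \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad
  \frac{dR}{dt} = \gamma I .
\]
% Early in the outbreak S is close to N, so the infectious count grows
% approximately exponentially:
\[
  \frac{dI}{dt} \approx (\beta - \gamma)\, I
  \quad\Longrightarrow\quad
  I(t) \approx I(0)\, e^{(\beta - \gamma) t} .
\]
```

In this approximation the early growth curve depends on the parameters only through the difference $\beta - \gamma$, so different pairs $(\beta, \gamma)$ with the same difference produce nearly the same early trajectory.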
15.
Article in English | MEDLINE | ID: mdl-35891979

ABSTRACT

High resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes are known to not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably less attention has been devoted to methods for large scale data that allow the description of complex relationships between several outcomes recorded at high resolutions by different sensors. Our Bayesian multivariate regression models based on spatial multivariate trees (SpamTrees) achieve scalability via conditional independence assumptions on latent random effects following a treed directed acyclic graph. Information-theoretic arguments and considerations on computational efficiency guide the construction of the tree and the related efficient sampling algorithms in imbalanced multivariate settings. In addition to simulated data examples, we illustrate SpamTrees using a large climate data set which combines satellite data with land-based station data. Software and source code are available on CRAN at https://CRAN.R-project.org/package=spamtree.

16.
Front Neurosci ; 16: 848654, 2022.
Article in English | MEDLINE | ID: mdl-35784847

ABSTRACT

Spatial navigation and orientation are emerging as promising markers for altered cognition in prodromal Alzheimer's disease, and even in cognitively normal individuals at risk for Alzheimer's disease. The different APOE gene alleles confer various degrees of risk. The APOE2 allele is considered protective, APOE3 is seen as control, while APOE4 carriage is the major known genetic risk for Alzheimer's disease. We have used mouse models carrying the three humanized APOE alleles and tested them in a spatial memory task in the Morris water maze. We introduce a new metric, the absolute winding number, to characterize the spatial search strategy, through the shape of the swim path. We show that this metric is robust to noise, and works for small group samples. Moreover, the absolute winding number better differentiated APOE3 carriers, through their straighter swim paths relative to both APOE2 and APOE4 genotypes. Finally, this novel metric supported increased vulnerability in APOE4 females. We hypothesized differences in spatial memory and navigation strategies are linked to differences in brain networks, and showed that different genotypes have different reliance on the hippocampal and caudate putamen circuits, pointing to a role for white matter connections. Moreover, differences were most pronounced in females. This departure from a hippocampal centric to a brain network approach may open avenues for identifying regions linked to increased risk for Alzheimer's disease, before overt disease manifestation. Further exploration of novel biomarkers based on spatial navigation strategies may enlarge the windows of opportunity for interventions. The proposed framework will be significant in dissecting vulnerable circuits associated with cognitive changes in prodromal Alzheimer's disease.

17.
Ann Appl Stat ; 16(2): 765-790, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35813556

ABSTRACT

Psychiatric studies of suicide provide fundamental insights on the evolution of severe psychopathologies, and contribute to the development of early treatment interventions. Our focus is on modelling different traits of psychosis and their interconnections, through a case study on suicide attempt survivors. Such aspects are recorded via multivariate categorical data, involving a large number of items for multiple subjects. Current methods for multivariate categorical data, such as penalized log-linear models and latent structure analysis, are either limited to low-dimensional settings or include parameters with difficult interpretation. Motivated by this application, this article proposes a new class of approaches, which we refer to as Mixture of Log Linear models (mills). Combining latent class analysis and log-linear models, mills defines a novel Bayesian approach to model complex multivariate categorical data with flexibility and interpretability, providing interesting insights on the relationship between psychotic diseases and psychological aspects in suicide attempt survivors.

18.
Ann Appl Stat ; 16(1): 391-413, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35757598

ABSTRACT

Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even when using a moderate number of classes (say 4). Mixed membership models quantify this phenomenon, but they typically focus on goodness-of-fit more than on interpretable inference. To achieve a good numerical fit, these models may in fact require many extreme profiles, making the results difficult to interpret. We introduce a new class of multivariate mixed membership models that, when variables can be partitioned into subject-matter based domains, can provide a good fit to the data using fewer profiles than standard formulations. The proposed model explicitly accounts for the blocks of variables corresponding to the distinct domains along with a cross-domain correlation structure, which provides new information about shared membership of individuals in a complex classification scheme. We specify a multivariate logistic normal distribution for the membership vectors, which allows easy introduction of auxiliary information leveraging a latent multivariate logistic regression. A Bayesian approach to inference, relying on Pólya gamma data augmentation, facilitates efficient posterior computation via Markov Chain Monte Carlo. We apply this methodology to a spatially explicit study of malaria risk over time on the Brazilian Amazon frontier.

19.
Bioinformatics ; 38(16): 4011-4018, 2022 Aug 10.
Article in English | MEDLINE | ID: mdl-35762974

ABSTRACT

MOTIVATION: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity between a pair of brain regions. There is an emerging statistical literature describing methods for the analysis of such multi-network data in which nodes are common across networks but the edges vary. However, there has been essentially no consideration of the important problem of outlier detection. In particular, for certain subjects, the neuroimaging data are of such poor quality that the network cannot be reliably reconstructed. For such subjects, the resulting adjacency matrix may be mostly zero or exhibit a bizarre pattern not consistent with a functioning brain. These outlying networks may serve as influential points, contaminating subsequent statistical analyses. We propose a simple Outlier DetectIon for Networks (ODIN) method relying on an influence measure under a hierarchical generalized linear model for the adjacency matrices. An efficient computational algorithm is described, and ODIN is illustrated through simulations and an application to data from the UK Biobank. RESULTS: ODIN was successful in identifying moderate to extreme outliers. Removing such outliers can significantly change inferences in downstream applications. AVAILABILITY AND IMPLEMENTATION: ODIN has been implemented in both Python and R and these implementations along with other code are publicly available at github.com/pritamdey/ODIN-python and github.com/pritamdey/ODIN-r, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Neuroimaging, Humans, Brain/diagnostic imaging, Software
20.
Article in English | MEDLINE | ID: mdl-35162394

ABSTRACT

Humans are exposed to a diverse mixture of chemical and non-chemical exposures across their lifetimes. Well-designed epidemiology studies as well as sophisticated exposure science and related technologies enable the investigation of the health impacts of mixtures. While existing statistical methods can address the most basic questions related to the association between environmental mixtures and health endpoints, there were gaps in our ability to learn from mixtures data in several common epidemiologic scenarios, including high correlation among health and exposure measures in space and/or time, the presence of missing observations, the violation of important modeling assumptions, and the presence of computational challenges incurred by current implementations. To address these and other challenges, NIEHS initiated the Powering Research through Innovative methods for Mixtures in Epidemiology (PRIME) program, to support work on the development and expansion of statistical methods for mixtures. Six independent projects supported by PRIME have been highly productive but their methods have not yet been described collectively in a way that would inform application. We review 37 new methods from PRIME projects and summarize the work across previously published research questions, to inform methods selection and increase awareness of these new methods. We highlight important statistical advancements considering data science strategies, exposure-response estimation, timing of exposures, epidemiological methods, the incorporation of toxicity/chemical information, spatiotemporal data, risk assessment, and model performance, efficiency, and interpretation. Importantly, we link to software to encourage application and testing on other datasets. This review can enable more informed analyses of environmental mixtures. We stress training for early career scientists as well as innovation in statistical methodology as an ongoing need. Ultimately, we direct efforts to the common goal of reducing harmful exposures to improve public health.


Subject(s)
National Institute of Environmental Health Sciences (U.S.), Research Design, Environmental Exposure/analysis, Epidemiologic Methods, Epidemiologic Studies, Humans, Risk Assessment, United States