Results 1 - 20 of 61
1.
Biometrics ; 80(2), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38567733

ABSTRACT

Brain effective connectivity analysis quantifies the directed influence of one neural element or region over another, and it is of great scientific interest to understand how the effective connectivity pattern is affected by variations in subject conditions. Vector autoregression (VAR) is a useful tool for this type of problem. However, there is a paucity of solutions when there is measurement error, when there are multiple subjects, and when the focus is the inference of the transition matrix. In this article, we study the problem of transition matrix inference under the high-dimensional VAR model with measurement error and multiple subjects. We propose a simultaneous testing procedure with three key components: a modified expectation-maximization (EM) algorithm, a test statistic based on the tensor regression of a bias-corrected estimator of the lagged auto-covariance given the covariates, and a properly thresholded simultaneous test. We establish the uniform consistency of the estimators from our modified EM, and show that the subsequent test achieves consistent false discovery control and that its power approaches one asymptotically. We demonstrate the efficacy of our method through both simulations and a brain connectivity study of task-evoked functional magnetic resonance imaging.
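
To illustrate why measurement error matters for transition estimation, the following minimal sketch simulates a one-dimensional autoregression (the scalar special case of a VAR) and compares the lag-1 least-squares estimate on the latent series with the same estimate on a noisy observed series. It is not the procedure proposed in the article; the function names `simulate` and `lag1_ls`, the coefficient 0.8, and the noise levels are all assumptions chosen for illustration.

```python
import random


def simulate(a, n, noise_sd, seed=0):
    """Simulate a latent AR(1) series and a noisy observed copy (illustrative only)."""
    rng = random.Random(seed)
    x = 0.0
    latent, observed = [], []
    for _ in range(n):
        # Latent dynamics: x_t = a * x_{t-1} + unit-variance innovation.
        x = a * x + rng.gauss(0.0, 1.0)
        latent.append(x)
        # Observation: latent state plus independent measurement error.
        observed.append(x + rng.gauss(0.0, noise_sd))
    return latent, observed


def lag1_ls(series):
    """Least-squares estimate of the lag-1 coefficient (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den


latent, observed = simulate(a=0.8, n=20000, noise_sd=1.0)
a_clean = lag1_ls(latent)    # estimate from the error-free series
a_naive = lag1_ls(observed)  # estimate that ignores measurement error
print(a_clean, a_naive)
```

In this toy setting the naive estimate is shrunk toward zero by roughly the ratio of the latent variance to the latent-plus-noise variance, which is the kind of bias a measurement-error-aware estimator has to correct.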


Subject(s)
Brain , Magnetic Resonance Imaging , Humans , Time Factors , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Brain/physiology
2.
bioRxiv ; 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38328176

ABSTRACT

Computational cognitive modeling is an important tool for understanding the processes supporting human and animal decision-making. Choice data in decision-making tasks are inherently noisy, and separating noise from signal can improve the quality of computational modeling. Common approaches to modeling decision noise often assume constant levels of noise or exploration throughout learning (e.g., the ϵ-softmax policy). However, this assumption is not guaranteed to hold: for example, a subject might disengage and lapse into an inattentive phase for a series of trials in the middle of otherwise low-noise performance. Here, we introduce a new, computationally inexpensive method to dynamically infer the levels of noise in choice behavior, under a model assumption that agents can transition between two discrete latent states (e.g., fully engaged and random). Using simulations, we show that modeling noise levels dynamically instead of statically can substantially improve model fit and parameter estimation, especially in the presence of long periods of noisy behavior, such as prolonged attentional lapses. We further demonstrate the empirical benefits of dynamic noise estimation at the individual and group levels by validating it on four published datasets featuring diverse populations, tasks, and models. Based on the theoretical and empirical evaluation of the method reported in the current work, we expect that dynamic noise estimation will improve modeling in many decision-making paradigms over the static noise estimation method currently used in the modeling literature, while keeping additional model complexity and assumptions minimal.
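
For readers unfamiliar with the static-noise baseline mentioned above, here is a minimal sketch of one common form of an ϵ-softmax policy: a softmax over action values mixed with a uniform random choice at a fixed rate ϵ. The abstract does not define the policy, so the exact parameterization, the function name `epsilon_softmax`, and the example values are assumptions; the dynamic two-state method proposed in the article is not implemented here.

```python
import math


def epsilon_softmax(q_values, beta, epsilon):
    """Choice probabilities: (1 - epsilon) * softmax(beta * Q) + epsilon / K.

    Assumed parameterization; epsilon is a constant (static) noise level.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    k = len(q_values)
    return [(1.0 - epsilon) * e / total + epsilon / k for e in exps]


# Two actions, the first with the higher value; 10% of choices are uniform noise.
probs = epsilon_softmax([1.0, 0.0], beta=3.0, epsilon=0.1)
print(probs)
```

Because ϵ is fixed across trials, this policy cannot represent a subject who is engaged on most trials but lapses for a stretch, which is the limitation the abstract describes.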

3.
J Am Stat Assoc ; 118(543): 2158-2170, 2023.
Article in English | MEDLINE | ID: mdl-38143786

ABSTRACT

Thanks to its fine balance between model flexibility and interpretability, the nonparametric additive model has been widely used, and variable selection for this type of model has been frequently studied. However, none of the existing solutions can control the false discovery rate (FDR) unless the sample size tends to infinity. The knockoff framework is a recent proposal that can address this issue, but few knockoff solutions are directly applicable to nonparametric models. In this article, we propose a novel kernel knockoffs selection procedure for the nonparametric additive model. We integrate three key components: the knockoffs, the subsampling for stability, and the random feature mapping for nonparametric function approximation. We show that the proposed method is guaranteed to control the FDR for any sample size, and achieves a power that approaches one as the sample size tends to infinity. We demonstrate the efficacy of our method through intensive simulations and comparisons with alternative solutions. Our proposal thus makes useful contributions to the methodology of nonparametric variable selection, FDR-based inference, as well as knockoffs.
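
As background on how knockoff-based selection controls the FDR, the sketch below implements the standard knockoff+ data-dependent threshold from the general knockoff framework, applied to a vector of feature statistics W (large positive values favor the original feature over its knockoff). This is a generic illustration only: it omits the kernel, subsampling, and random-feature components of the article's procedure, and the function names and the example statistics are made up.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:
        neg = sum(1 for v in w if v <= -t)  # proxy count of false discoveries
        pos = sum(1 for v in w if v >= t)   # number of selections at level t
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")  # no threshold meets the target level


def select(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]


# Hypothetical statistics for ten features; target FDR level q = 0.2.
w = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.4, 2.0, 1.5, -0.2]
selected = select(w, q=0.2)
print(selected)
```

The count of large negative statistics serves as an estimate of the number of false positives among the large positive ones, which is what lets the rule target the FDR without asymptotic arguments.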

4.
J Am Stat Assoc ; 118(543): 1984-1996, 2023.
Article in English | MEDLINE | ID: mdl-38099062

ABSTRACT

Multimodal data are now prevalent in scientific research. One of the central questions in multimodal integrative analysis is to understand how two data modalities associate and interact with each other given another modality or demographic variables. The problem can be formulated as studying the associations among three sets of random variables, a question that has received relatively little attention in the literature. In this article, we propose a novel generalized liquid association analysis method, which offers a new and unique angle on this important class of problems of studying three-way associations. We extend the notion of liquid association of Li (2002) from the univariate setting to the sparse, multivariate, and high-dimensional setting. We establish a population dimension reduction model, transform the problem to sparse Tucker decomposition of a three-way tensor, and develop a higher-order orthogonal iteration algorithm for parameter estimation. We derive the non-asymptotic error bound and asymptotic consistency of the proposed estimator, while allowing the variable dimensions to be larger than and diverge with the sample size. We demonstrate the efficacy of the method through both simulations and a multimodal neuroimaging application for Alzheimer's disease research.

5.
J R Stat Soc Series B Stat Methodol ; 85(4): 1204-1222, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37780936

ABSTRACT

The Markov property is widely imposed in the analysis of time series data. Correspondingly, testing the Markov property and, relatedly, inferring the order of a Markov model are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the order of the Markov model. We show that the test controls the type-I error asymptotically and has power approaching one. Our proposal makes novel contributions in several ways. We utilise and extend state-of-the-art deep generative learning to estimate the conditional density functions, and establish a sharp upper bound on the approximation error of the estimators. We derive a doubly robust test statistic, which employs a nonparametric estimation but achieves a parametric convergence rate. We further adopt sample splitting and cross-fitting to minimise the conditions required to ensure the consistency of the test. We demonstrate the efficacy of the test through both simulations and three data applications.

6.
J Am Stat Assoc ; 118(543): 1796-1810, 2023.
Article in English | MEDLINE | ID: mdl-37771509

ABSTRACT

Multimodal imaging has transformed neuroscience research. While it presents unprecedented opportunities, it also imposes serious challenges. Particularly, it is difficult to combine the merits of the interpretability attributed to a simple association model with the flexibility achieved by a highly adaptive nonlinear model. In this article, we propose an orthogonalized kernel debiased machine learning approach, which is built upon the Neyman orthogonality and a form of decomposition orthogonality, for multimodal data analysis. We target the setting that naturally arises in almost all multimodal studies, where there is a primary modality of interest, plus additional auxiliary modalities. We establish the root-N-consistency and asymptotic normality of the estimated primary parameter, the semi-parametric estimation efficiency, and the asymptotic validity of the confidence band of the predicted primary modality effect. Our proposal enjoys, to a good extent, both model interpretability and model flexibility. It is also considerably different from the existing statistical methods for multimodal data integration, as well as the orthogonality-based methods for high-dimensional inferences. We demonstrate the efficacy of our method through both simulations and an application to a multimodal neuroimaging study of Alzheimer's disease.

7.
J Am Stat Assoc ; 118(542): 830-845, 2023.
Article in English | MEDLINE | ID: mdl-37519438

ABSTRACT

Point process modeling is gaining increasing attention, as point process type data are emerging in a large variety of scientific applications. In this article, motivated by a neuronal spike trains study, we propose a novel point process regression model, where both the response and the predictor can be a high-dimensional point process. We model the predictor effects through the conditional intensities using a set of basis transferring functions in a convolutional fashion. We organize the corresponding transferring coefficients in the form of a three-way tensor, then impose the low-rank, sparsity, and subgroup structures on this coefficient tensor. These structures help reduce the dimensionality, integrate information across different individual processes, and facilitate the interpretation. We develop a highly scalable optimization algorithm for parameter estimation. We derive the large sample error bound for the recovered coefficient tensor, and establish the subgroup identification consistency, while allowing the dimension of the multivariate point process to diverge. We demonstrate the efficacy of our method through both simulations and a cross-area neuronal spike trains analysis in a sensory cortex study.

8.
J Am Stat Assoc ; 118(541): 424-439, 2023.
Article in English | MEDLINE | ID: mdl-37333062

ABSTRACT

In modern data science, dynamic tensor data prevail in numerous applications. An important task is to characterize the relationship between dynamic tensor datasets and external covariates. However, the tensor data are often only partially observed, rendering many existing methods inapplicable. In this article, we develop a regression model with a partially observed dynamic tensor as the response and external covariates as the predictor. We introduce the low-rankness, sparsity, and fusion structures on the regression coefficient tensor, and consider a loss function projected over the observed entries. We develop an efficient nonconvex alternating updating algorithm, and derive the finite-sample error bound of the actual estimator from each step of our optimization algorithm. Unobserved entries in the tensor response have imposed serious challenges. As a result, our proposal differs considerably in terms of estimation algorithm, regularity conditions, as well as theoretical properties, compared to the existing tensor completion or tensor response regression solutions. We illustrate the efficacy of our proposed method using simulations and two real applications, including a neuroimaging dementia study and a digital advertising study.

9.
J Am Stat Assoc ; 118(541): 257-271, 2023.
Article in English | MEDLINE | ID: mdl-37193511

ABSTRACT

Graphical modeling of multivariate functional data is becoming increasingly important in a wide variety of applications. Changes in graph structure can often be attributed to external variables, such as the diagnosis status or time, the latter of which gives rise to the problem of dynamic graphical modeling. Most existing methods focus on estimating the graph by aggregating samples, but largely ignore the subject-level heterogeneity due to the external variables. In this article, we introduce a conditional graphical model for multivariate random functions, where we treat the external variables as the conditioning set, and allow the graph structure to vary with the external variables. Our method is built on two new linear operators, the conditional precision operator and the conditional partial correlation operator, which extend the precision matrix and the partial correlation matrix to both the conditional and functional settings. We show that their nonzero elements can be used to characterize the conditional graphs, and develop the corresponding estimators. We establish the uniform convergence of the proposed estimators and the consistency of the estimated graph, while allowing the graph size to grow with the sample size, and accommodating both completely and partially observed data. We demonstrate the efficacy of the method through both simulations and a study of a brain functional connectivity network.

10.
J Comput Graph Stat ; 32(1): 252-262, 2023.
Article in English | MEDLINE | ID: mdl-36970553

ABSTRACT

Multiple-subject network data have been fast emerging in recent years, where a separate connectivity matrix is measured over a common set of nodes for each individual subject, along with subject covariate information. In this article, we propose a new generalized matrix response regression model, where the observed network is treated as a matrix-valued response and the subject covariates as predictors. The new model characterizes the population-level connectivity pattern through a low-rank intercept matrix, and the effect of subject covariates through a sparse slope tensor. We develop an efficient alternating gradient descent algorithm for parameter estimation, and establish the non-asymptotic error bound for the actual estimator from the algorithm, which quantifies the interplay between the computational and statistical errors. We further show the strong consistency for graph community recovery, as well as the edge selection consistency. We demonstrate the efficacy of our method through simulations and two brain connectivity studies.

11.
J R Stat Soc Series B Stat Methodol ; 85(5): 1589-1614, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38584801

ABSTRACT

Delineating associations between images and covariates is a central aim of imaging studies. To tackle this problem, we propose a novel non-parametric approach in the framework of spatially varying coefficient models, where the spatially varying functions are estimated through deep neural networks. Our method incorporates spatial smoothness, handles subject heterogeneity, and provides straightforward interpretations. It is also highly flexible and accurate, making it ideal for capturing complex association patterns. We establish estimation and selection consistency and derive asymptotic error bounds. We demonstrate the method's advantages through intensive simulations and analyses of two functional magnetic resonance imaging data sets.

12.
Microb Biotechnol ; 15(11): 2758-2772, 2022 11.
Article in English | MEDLINE | ID: mdl-36070350

ABSTRACT

L-5-Methyltetrahydrofolate (L-5-MTHF) is the only biologically active form of folate in the human body. Production of L-5-MTHF by using microbes is an emerging consideration for green synthesis. However, microbes naturally produce only a small amount of L-5-MTHF. Here, Escherichia coli BL21(DE3) was engineered to increase the production of L-5-MTHF by overexpressing the intrinsic genes of dihydrofolate reductase and methylenetetrahydrofolate (methylene-THF) reductase, introducing the genes encoding formate-THF ligase, formyl-THF cyclohydrolase and methylene-THF dehydrogenase from the one-carbon metabolic pathway of Methylobacterium extorquens or Clostridium autoethanogenum and disrupting the gene of methionine synthase involved in the consumption and synthesis inhibition of the target product. Thus, alongside its native pathway, an additional pathway for L-5-MTHF synthesis was developed in E. coli, which was further analysed and confirmed by qRT-PCR, enzyme assays and metabolite determination. After optimizing the conditions of induction time, temperature, cell density and concentration of IPTG and supplementing exogenous substances (folic acid, sodium formate and glucose) to the culture, the highest yield of 527.84 µg g⁻¹ of dry cell weight for L-5-MTHF was obtained, which was about 11.8-fold that of the original strain. This study paves the way for further metabolic engineering to improve the biosynthesis of L-5-MTHF in E. coli.


Subject(s)
Escherichia coli Infections , Escherichia coli , Humans , Escherichia coli/genetics , Escherichia coli/metabolism , Tetrahydrofolates/metabolism , Tetrahydrofolates/pharmacology , Folic Acid/metabolism , Folic Acid/pharmacology
13.
Stat Med ; 41(25): 5113-5133, 2022 11 10.
Article in English | MEDLINE | ID: mdl-35983945

ABSTRACT

In this article, we tackle the estimation and inference problem of analyzing distributed streaming data that is collected continuously over multiple data sites. We propose an online two-way approach via linear mixed-effects models. We explicitly model the site-specific effects as random-effect terms, and tackle both between-site heterogeneity and within-site correlation. We develop an online updating procedure that does not need to re-access the previous data and can efficiently update the parameter estimate, when either new data sites, or new streams of sample observations of the existing data sites, become available. We derive the non-asymptotic error bound for our proposed online estimator, and show that it is asymptotically equivalent to the offline counterpart based on all the raw data. We compare with some key alternative solutions both analytically and numerically, and demonstrate the advantages of our proposal. We further illustrate our method with two data applications.


Subject(s)
Research Design , Humans , Computer Simulation , Linear Models
14.
Microbiol Spectr ; 10(4): e0043622, 2022 08 31.
Article in English | MEDLINE | ID: mdl-35762779

ABSTRACT

Thermotoga maritima is an anaerobic hyperthermophilic bacterium that efficiently produces H₂ by fermenting carbohydrates. A high concentration of H₂ inhibits the growth of T. maritima, and S⁰ can eliminate the inhibition and stimulate growth through its reduction. The mechanism of sulfur reduction in T. maritima, however, is not fully understood. Herein, based on its similarity with archaeal NAD(P)H-dependent sulfur reductases (NSR), the ORF THEMA_RS02810 was identified and expressed in Escherichia coli, and the recombinant protein was characterized. The purified flavoprotein possessed NAD(P)H-dependent S⁰ reductase activity (1.3 U/mg for NADH and 0.8 U/mg for NADPH), polysulfide reductase activity (0.32 U/mg for NADH and 0.35 U/mg for NADPH), and thiosulfate reductase activity (2.3 U/mg for NADH and 2.5 U/mg for NADPH), which increased 3- to 4-fold upon coenzyme A stimulation. Quantitative RT-PCR analysis showed that nsr was upregulated together with the mbx, yeeE, and rnf genes when the strain grew in S⁰- or thiosulfate-containing medium. The mechanism of sulfur reduction in T. maritima, which may affect the redox balance and energy metabolism of the organism, is discussed. A genome search revealed that NSR homologs are widely distributed in thermophilic bacteria and archaea, implying an important role in the sulfur cycle of geothermal environments. IMPORTANCE The reduction of S⁰ and thiosulfate is essential in the sulfur cycle of geothermal environments, in which thermophiles play an important role. Despite previous research on some sulfur reductases of thermophilic archaea, the mechanism of sulfur reduction in thermophilic bacteria is still not clearly understood. Herein, we confirmed the presence of a cytoplasmic NAD(P)H-dependent polysulfide reductase (NSR) from the hyperthermophile T. maritima, with S⁰, polysulfide, and thiosulfate reduction activities, in contrast to other sulfur reductases. When grown in S⁰- or thiosulfate-containing medium, its expression was upregulated. The putative membrane-bound MBX and Rnf may also play a role in the metabolism, which might influence the redox balance and energy metabolism of T. maritima. This is distinct from the mechanism of sulfur reduction in mesophiles such as Wolinella succinogenes. NSR homologs are widely distributed among heterotrophic thermophiles, suggesting that they may be vital in the sulfur cycle in geothermal environments.


Subject(s)
NAD , Thermotoga maritima , Archaea/metabolism , Bacteria/metabolism , NAD/metabolism , NADP/metabolism , Oxidation-Reduction , Oxidoreductases/genetics , Oxidoreductases/metabolism , Sulfur/metabolism , Sulfurtransferases , Thermotoga maritima/genetics , Thermotoga maritima/metabolism , Thiosulfates/metabolism
15.
Can J Stat ; 50(1): 59-85, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35530428

ABSTRACT

In this article, we propose a new sparse neural ordinary differential equation (ODE) model to characterize flexible relations among multiple functional processes. We characterize the latent states of the functions via a set of ordinary differential equations. We then model the dynamic changes of the latent states using a deep neural network (DNN) with a specially designed architecture and a sparsity-inducing regularization. The new model is able to capture both nonlinear and sparse dependent relations among multivariate functions. We develop an efficient optimization algorithm to estimate the unknown weights for the DNN under the sparsity constraint. We establish both the algorithmic convergence and selection consistency, which constitute the theoretical guarantees of the proposed method. We illustrate the efficacy of the method through simulations and a gene regulatory network example.

16.
J R Stat Soc Series B Stat Methodol ; 84(2): 600-629, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35450387

ABSTRACT

In this article, we introduce a functional structural equation model for estimating directional relations from multivariate functional data. We decouple the estimation into two major steps: directional order determination and selection through sparse functional regression. We first propose a score function at the linear operator level, and show that its minimization can recover the true directional order when the relation between each function and its parental functions is nonlinear. We then develop a sparse functional additive regression, where both the response and the multivariate predictors are functions and the regression relation is additive and nonlinear. We also propose strategies to speed up the computation and scale up our method. In theory, we establish the consistencies of order determination, sparse functional additive regression, and directed acyclic graph estimation, while allowing both the dimension of the Karhunen-Loève expansion coefficients and the number of random functions to diverge with the sample size. We illustrate the efficacy of our method through simulations and an application to brain effective connectivity analysis.

17.
Stat ; 11(1), 2022 Dec.
Article in English | MEDLINE | ID: mdl-35450402

ABSTRACT

Motivated by a multimodal neuroimaging study for Alzheimer's disease, in this article, we study the inference problem, i.e., hypothesis testing, of sequential mediation analysis. The existing sequential mediation solutions mostly focus on sparse estimation, while hypothesis testing is an utterly different and more challenging problem. Meanwhile, the few mediation testing solutions often ignore the potential dependency among the mediators, or cannot be applied to the sequential problem directly. We propose a statistical inference procedure to test mediation pathways when there are sequentially ordered multiple data modalities and each modality involves multiple mediators. We allow the mediators to be conditionally dependent, and the number of mediators within each modality to diverge with the sample size. We produce the explicit significance quantification and establish the theoretical guarantees in terms of asymptotic size, power, and false discovery control. We demonstrate the efficacy of the method through both simulations and an application to a multimodal neuroimaging pathway analysis of Alzheimer's disease.

18.
Hum Brain Mapp ; 43(8): 2519-2533, 2022 06 01.
Article in English | MEDLINE | ID: mdl-35129252

ABSTRACT

Motivated by an imaging proteomics study for Alzheimer's disease (AD), in this article, we propose a mediation analysis approach with high-dimensional exposures and high-dimensional mediators to integrate data collected from multiple platforms. The proposed method combines principal component analysis with penalized least squares estimation for a set of linear structural equation models. The former reduces the dimensionality and produces uncorrelated linear combinations of the exposure variables, whereas the latter achieves simultaneous path selection and effect estimation while allowing the mediators to be correlated. Applying the method to the AD data identifies numerous interesting protein peptides, brain regions, and protein-structure-memory paths, which are in accordance with and also supplement existing findings of AD research. Additional simulations further demonstrate the effective empirical performance of the method.


Subject(s)
Alzheimer Disease , Mediation Analysis , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Humans , Least-Squares Analysis , Principal Component Analysis
19.
Stat Sin ; 32: 293-321, 2022.
Article in English | MEDLINE | ID: mdl-35002179

ABSTRACT

Comparing two population means of network data is of paramount importance in a wide range of scientific applications. Numerous existing network inference solutions focus on global testing of entire networks, without comparing individual network links. The observed data often take the form of vectors or matrices, and the problem is formulated as comparing two covariance or precision matrices under a normal or matrix normal distribution. Moreover, many tests suffer from limited power under a small sample size. In this article, we tackle the problem of network comparison, covering both global and simultaneous inference, when the data come in a different format, i.e., in the form of a collection of symmetric matrices, each of which encodes the network structure of an individual subject. Such a data format commonly arises in applications such as brain connectivity analysis and clinical genomics. We no longer require the underlying data to follow a normal distribution, but instead impose some moment conditions that are easily satisfied for numerous types of network data. Furthermore, we propose a power enhancement procedure, and show that it can control the false discovery, while it has the potential to substantially enhance the power of the test. We investigate the efficacy of our testing procedure through both an asymptotic analysis and a simulation study under a finite sample size. We further illustrate our method with examples of brain connectivity analysis.

20.
J Am Stat Assoc ; 117(540): 2014-2027, 2022.
Article in English | MEDLINE | ID: mdl-36945327

ABSTRACT

A central question in high-dimensional mediation analysis is to infer the significance of individual mediators. The main challenge is that the total number of potential paths that go through any mediator is super-exponential in the number of mediators. Most existing mediation inference solutions either explicitly impose that the mediators are conditionally independent given the exposure, or ignore any potential directed paths among the mediators. In this article, we propose a novel hypothesis testing procedure to evaluate individual mediation effects, while taking into account potential interactions among the mediators. Our proposal thus fills a crucial gap, and greatly extends the scope of existing mediation tests. Our key idea is to construct the test statistic using the logic of Boolean matrices, which enables us to establish the proper limiting distribution under the null hypothesis. We further employ screening, data splitting, and decorrelated estimation to reduce the bias and increase the power of the test. We show that our test can control both the size and false discovery rate asymptotically, and the power of the test approaches one, while allowing the number of mediators to diverge to infinity with the sample size. We demonstrate the efficacy of the method through simulations and a neuroimaging study of Alzheimer's disease. A Python implementation of the proposed procedure is available at https://github.com/callmespring/LOGAN.
