Results 1 - 20 of 40
1.
Biostatistics ; 24(2): 481-501, 2023 04 14.
Article in English | MEDLINE | ID: mdl-34654923

ABSTRACT

In recent years, a number of methods have been proposed to estimate the times at which a neuron spikes on the basis of calcium imaging data. However, quantifying the uncertainty associated with these estimated spikes remains an open problem. We consider a simple and well-studied model for calcium imaging data, which states that calcium decays exponentially in the absence of a spike, and instantaneously increases when a spike occurs. We wish to test the null hypothesis that the neuron did not spike (i.e., that there was no increase in calcium) at a particular timepoint at which a spike was estimated. In this setting, classical hypothesis tests lead to inflated Type I error, because the spike was estimated on the same data used for testing. To overcome this problem, we propose a selective inference approach. We describe an efficient algorithm to compute finite-sample $p$-values that control selective Type I error, and confidence intervals with correct selective coverage, for spikes estimated using a recent proposal from the literature. We apply our proposal in simulation and on calcium imaging data from the $\texttt{spikefinder}$ challenge.
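
The generative model described in this abstract can be written as an autoregressive process in which the underlying calcium level decays geometrically between spikes and jumps when a spike occurs. The following minimal simulation sketch illustrates that model; the parameter values (decay rate, noise level, spike probability) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed for this sketch, not from the paper)
T, gamma, sigma = 1000, 0.96, 0.3    # timesteps, per-step decay rate, noise SD
spike_prob = 0.01                    # probability of a spike at each timestep

spikes = rng.random(T) < spike_prob  # latent spike indicators
calcium = np.zeros(T)
for t in range(1, T):
    # Calcium decays exponentially and increases instantaneously at a spike.
    calcium[t] = gamma * calcium[t - 1] + float(spikes[t])

# Observed fluorescence: underlying calcium plus Gaussian noise.
fluorescence = calcium + sigma * rng.standard_normal(T)
```

In this notation, the null hypothesis considered in the paper is that $c_t = \gamma c_{t-1}$ (no increase in calcium) at a timepoint where a spike was estimated; the selective p-value accounts for the fact that this timepoint was chosen using the same data.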


Subject(s)
Calcium , Diagnostic Imaging , Humans , Uncertainty , Action Potentials/physiology , Computer Simulation , Algorithms
2.
PLoS Comput Biol ; 19(10): e1011509, 2023 10.
Article in English | MEDLINE | ID: mdl-37824442

ABSTRACT

A major goal of computational neuroscience is to build accurate models of the activity of neurons that can be used to interpret their function in circuits. Here, we explore using functional cell types to refine single-cell models by grouping them into functionally relevant classes. Formally, we define a hierarchical generative model for cell types, single-cell parameters, and neural responses, and then derive an expectation-maximization algorithm with variational inference that maximizes the likelihood of the neural recordings. We apply this "simultaneous" method to estimate cell types and fit single-cell models from simulated data, and find that it accurately recovers the ground truth parameters. We then apply our approach to in vitro neural recordings from neurons in mouse primary visual cortex, and find that it yields improved prediction of single-cell activity. We demonstrate that the discovered cell-type clusters are well separated and generalizable, and thus amenable to interpretation. We then compare discovered cluster memberships with locational, morphological, and transcriptomic data. Our findings reveal the potential to improve models of neural responses by explicitly allowing for shared functional properties across neurons.
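
Read schematically, the hierarchical generative model described above assigns each neuron $i$ a latent cell type $z_i$, draws that neuron's single-cell parameters $\theta_i$ from a type-specific distribution, and then generates the recorded responses $y_i$ from those parameters. The distributional form below is an illustrative assumption rather than the paper's exact specification:

$$ z_i \sim \mathrm{Categorical}(\pi), \qquad \theta_i \mid z_i \sim p(\theta \mid \eta_{z_i}), \qquad y_i \mid \theta_i \sim p(y \mid \theta_i), $$

where the expectation-maximization algorithm with variational inference alternates between inferring the cell-type assignments $z_i$ and updating $\pi$, the type-level parameters $\eta_k$, and the single-cell parameters $\theta_i$ to increase the likelihood of the recordings.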


Subject(s)
Algorithms , Neurons , Mice , Animals , Computer Simulation , Neurons/physiology , Probability , Models, Neurological , Action Potentials/physiology
3.
Biostatistics ; 21(4): 709-726, 2020 10 01.
Article in English | MEDLINE | ID: mdl-30753436

ABSTRACT

Calcium imaging data promises to transform the field of neuroscience by making it possible to record from large populations of neurons simultaneously. However, determining the exact moment in time at which a neuron spikes, from a calcium imaging data set, amounts to a non-trivial deconvolution problem which is of critical importance for downstream analyses. While a number of formulations have been proposed for this task in the recent literature, in this article, we focus on a formulation recently proposed in Jewell and Witten (2018. Exact spike train inference via $\ell_{0}$ optimization. The Annals of Applied Statistics, 12(4), 2457-2482) that can accurately estimate not just the spike rate, but also the specific times at which the neuron spikes. We develop a much faster algorithm that can be used to deconvolve a fluorescence trace of 100,000 timesteps in less than a second. Furthermore, we present a modification to this algorithm that precludes the possibility of a "negative spike". We demonstrate the performance of this algorithm for spike deconvolution on calcium imaging datasets that were recently released as part of the $\texttt{spikefinder}$ challenge (http://spikefinder.codeneuro.org/). The algorithm presented in this article was used in the Allen Institute for Brain Science's "platform paper" to decode neural activity from the Allen Brain Observatory; this is the main scientific paper in which their data resource is presented. Our $\texttt{C++}$ implementation, along with $\texttt{R}$ and $\texttt{python}$ wrappers, is publicly available. $\texttt{R}$ code is available on $\texttt{CRAN}$ and $\texttt{Github}$, and $\texttt{python}$ wrappers are available on $\texttt{Github}$; see https://github.com/jewellsean/FastLZeroSpikeInference.
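
For context, the $\ell_0$ formulation of Jewell and Witten (2018) that this algorithm solves can be sketched as follows, with $y_t$ the observed fluorescence, $c_t$ the underlying calcium, and $\gamma \in (0, 1)$ a known decay parameter:

$$ \underset{c_1, \ldots, c_T}{\mathrm{minimize}} \;\; \frac{1}{2} \sum_{t=1}^{T} (y_t - c_t)^2 + \lambda \sum_{t=2}^{T} 1\left(c_t \neq \gamma c_{t-1}\right), $$

where each violation of the exponential-decay dynamics is interpreted as a spike, and the modification that precludes negative spikes additionally requires $c_t \geq \gamma c_{t-1}$. This is a rough statement of the formulation; see the cited papers for the exact constraints and the choice of the tuning parameter $\lambda$.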


Subject(s)
Calcium , Neurons , Algorithms , Brain/diagnostic imaging , Diagnostic Imaging , Humans
4.
Hum Brain Mapp ; 41(10): 2553-2566, 2020 07.
Article in English | MEDLINE | ID: mdl-32216125

ABSTRACT

Brain networks are increasingly characterized at different scales, including summary statistics, community connectivity, and individual edges. While research relating brain networks to behavioral measurements has yielded many insights into brain-phenotype relationships, common analytical approaches only consider network information at a single scale. Here, we designed, implemented, and deployed Multi-Scale Network Regression (MSNR), a penalized multivariate approach for modeling brain networks that explicitly respects both edge- and community-level information by assuming a low-rank and sparse structure, both of which encourage less complex and more interpretable modeling. Capitalizing on a large neuroimaging cohort (n = 1,051), we demonstrate that MSNR recapitulates interpretable and statistically significant connectivity patterns associated with brain development, sex differences, and motion-related artifacts. Compared to single-scale methods, MSNR achieves a balance between prediction performance and model complexity, with improved interpretability. Together, by jointly exploiting both edge- and community-level information, MSNR has the potential to yield novel insights into brain-behavior relationships.


Subject(s)
Brain/physiology , Connectome/methods , Magnetic Resonance Imaging/methods , Models, Statistical , Nerve Net/physiology , Adolescent , Brain/diagnostic imaging , Cross-Sectional Studies , Female , Humans , Individuality , Male , Nerve Net/diagnostic imaging , Phenotype , Regression Analysis , Sex Characteristics
5.
Genome Res ; 27(1): 38-52, 2017 01.
Article in English | MEDLINE | ID: mdl-27831498

ABSTRACT

Candidate enhancers can be identified on the basis of chromatin modifications, the binding of chromatin modifiers and transcription factors and cofactors, or chromatin accessibility. However, validating such candidates as bona fide enhancers requires functional characterization, typically achieved through reporter assays that test whether a sequence can increase expression of a transcriptional reporter via a minimal promoter. A longstanding concern is that reporter assays are mainly implemented on episomes, which are thought to lack physiological chromatin. However, the magnitude and determinants of differences in cis-regulation for regulatory sequences residing in episomes versus chromosomes remain almost completely unknown. To address this systematically, we developed and applied a novel lentivirus-based massively parallel reporter assay (lentiMPRA) to directly compare the functional activities of 2236 candidate liver enhancers in an episomal versus a chromosomally integrated context. We find that the activities of chromosomally integrated sequences are substantially different from the activities of the identical sequences assayed on episomes, and furthermore are correlated with different subsets of ENCODE annotations. The results of chromosomally based reporter assays are also more reproducible and more strongly predictable by both ENCODE annotations and sequence-based models. With a linear model that combines chromatin annotations and sequence information, we achieve a Pearson's $R^2$ of 0.362 for predicting the results of chromosomally integrated reporter assays. This level of prediction is better than with either chromatin annotations or sequence information alone and also outperforms predictive models of episomal assays. Our results have broad implications for how cis-regulatory elements are identified, prioritized and functionally validated.


Subject(s)
Chromatin/genetics , Enhancer Elements, Genetic/genetics , Gene Expression Regulation/genetics , Plasmids/genetics , Chromatin Assembly and Disassembly/genetics , Chromosomes/genetics , Genes, Reporter , High-Throughput Nucleotide Sequencing , Humans , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors
7.
Nucleic Acids Res ; 40(9): 3849-55, 2012 May.
Article in English | MEDLINE | ID: mdl-22266657

ABSTRACT

A growing body of experimental evidence supports the hypothesis that the 3D structure of chromatin in the nucleus is closely linked to important functional processes, including DNA replication and gene regulation. In support of this hypothesis, several research groups have examined sets of functionally associated genomic loci, with the aim of determining whether those loci are statistically significantly colocalized. This work presents a critical assessment of two previously reported analyses, both of which used genome-wide DNA-DNA interaction data from the yeast Saccharomyces cerevisiae, and both of which rely upon a simple notion of the statistical significance of colocalization. We show that these previous analyses rely upon a faulty assumption, and we propose a correct non-parametric resampling approach to the same problem. Applying this approach to the same data set does not support the hypothesis that transcriptionally coregulated genes tend to colocalize, but strongly supports the colocalization of centromeres, and provides some evidence of colocalization of origins of early DNA replication, chromosomal breakpoints and transfer RNAs.
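
The resampling logic can be illustrated generically: compute a colocalization statistic (for example, the mean pairwise 3D distance) for the loci of interest, then compare it with the same statistic computed on repeatedly drawn null sets of loci. The sketch below uses hypothetical inputs (a coords array of 3D locus coordinates) and draws null sets uniformly at random; it is only an illustration of the resampling idea, not the authors' corrected scheme, which governs how the null sets should be drawn.

```python
import numpy as np
from scipy.spatial.distance import pdist

def mean_pairwise_distance(points):
    """Mean pairwise Euclidean distance among a set of 3D points."""
    return pdist(points).mean()

def colocalization_pvalue(coords, idx_of_interest, n_resamples=10_000, seed=0):
    """One-sided p-value: are the loci of interest closer together than
    equally sized sets of loci drawn at random from the same genome?"""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_distance(coords[idx_of_interest])
    k = len(idx_of_interest)
    null = np.array([
        mean_pairwise_distance(coords[rng.choice(len(coords), size=k, replace=False)])
        for _ in range(n_resamples)
    ])
    # The +1 correction keeps the p-value strictly positive.
    return (1 + np.sum(null <= observed)) / (1 + n_resamples)
```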


Subject(s)
Genome Components , Genomics/methods , Saccharomyces cerevisiae/genetics , Data Interpretation, Statistical , Gene Expression Regulation, Fungal , Genes, Fungal , Genome, Fungal , Statistics, Nonparametric , Transcription, Genetic
8.
bioRxiv ; 2024 Jul 07.
Article in English | MEDLINE | ID: mdl-39005417

ABSTRACT

The central amygdala (CeA) has emerged as an important brain region for regulating both negative (fear and anxiety) and positive (reward) affective behaviors. The CeA has been proposed to encode affective information in the form of valence (whether the stimulus is good or bad) or salience (how significant the stimulus is), but the extent to which these two types of stimulus representation occur in the CeA is not known. Here, we used single cell calcium imaging in mice during appetitive and aversive conditioning and found that the majority of CeA neurons (∼65%) encode the valence of the unconditioned stimulus (US), with a smaller subset of cells (∼15%) encoding the salience of the US. Valence and salience encoding of the conditioned stimulus (CS) was also observed, albeit to a lesser extent. These findings show that the CeA is a site of convergence for encoding oppositely valenced US information.

9.
Biostatistics ; 13(3): 523-38, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22003245

ABSTRACT

We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.
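
As a generic point of reference, a log-linear count model with a per-sample normalization offset can be fit for a single gene as sketched below. This is only an illustrative Poisson GLM with made-up numbers; the method in the paper uses its own normalization and a novel FDR estimation procedure rather than per-gene Wald tests.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: counts for one gene across six samples, a two-class
# outcome, and per-sample sequencing depths used as a normalization offset.
counts = np.array([12, 30, 25, 7, 41, 38])
group = np.array([0, 0, 0, 1, 1, 1])
depth = np.array([1.0e6, 2.1e6, 1.7e6, 0.9e6, 2.4e6, 2.0e6])

X = sm.add_constant(group.astype(float))
model = sm.GLM(counts, X, family=sm.families.Poisson(), offset=np.log(depth))
fit = model.fit()
print(fit.params, fit.pvalues)  # group coefficient (log fold change) and its Wald p-value
```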


Subject(s)
Data Interpretation, Statistical , Models, Statistical , RNA, Messenger/genetics , Sequence Analysis, DNA/methods , Humans , RNA, Messenger/chemistry , Reverse Transcriptase Polymerase Chain Reaction
10.
J Mach Learn Res ; 24, 2023 May.
Article in English | MEDLINE | ID: mdl-38264325

ABSTRACT

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of k-means clustering. In this paper, we propose a p-value that conditions on all of the intermediate clustering assignments in the k-means algorithm. We show that the p-value controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering in finite samples, and can be efficiently computed. We apply our proposal to hand-written digits data and to single-cell RNA-sequencing data.
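
To make the problem concrete, the sketch below sets up the naive analysis that the paper shows is invalid: cluster with k-means, then run a classical two-sample test between two of the estimated clusters on the same data. The proposed selective p-value replaces the final step by conditioning on the clustering; that conditional computation is not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy import stats

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))  # pure noise: no true clusters

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Naive two-sample t-test on the first feature between clusters 0 and 1.
# Because the clusters were estimated from the same data, this test has an
# inflated Type I error rate; the paper's selective p-value corrects for this.
t_stat, naive_p = stats.ttest_ind(X[labels == 0, 0], X[labels == 1, 0])
print(t_stat, naive_p)
```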

11.
bioRxiv ; 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36909648

ABSTRACT

A major goal of computational neuroscience is to build accurate models of the activity of neurons that can be used to interpret their function in circuits. Here, we explore using functional cell types to refine single-cell models by grouping them into functionally relevant classes. Formally, we define a hierarchical generative model for cell types, single-cell parameters, and neural responses, and then derive an expectation-maximization algorithm with variational inference that maximizes the likelihood of the neural recordings. We apply this "simultaneous" method to estimate cell types and fit single-cell models from simulated data, and find that it accurately recovers the ground truth parameters. We then apply our approach to in vitro neural recordings from neurons in mouse primary visual cortex, and find that it yields improved prediction of single-cell activity. We demonstrate that the discovered cell-type clusters are well separated and generalizable, and thus amenable to interpretation. We then compare discovered cluster memberships with locational, morphological, and transcriptomic data. Our findings reveal the potential to improve models of neural responses by explicitly allowing for shared functional properties across neurons.

12.
bioRxiv ; 2023 Mar 20.
Article in English | MEDLINE | ID: mdl-36993278

ABSTRACT

Material- and cell-based technologies such as engineered tissues hold great promise as human therapies. Yet, the development of many of these technologies becomes stalled at the stage of pre-clinical animal studies due to the tedious and low-throughput nature of in vivo implantation experiments. We introduce a 'plug and play' in vivo screening array platform called Highly Parallel Tissue Grafting (HPTG). HPTG enables parallelized in vivo screening of 43 three-dimensional microtissues within a single 3D printed device. Using HPTG, we screen microtissue formulations with varying cellular and material components and identify formulations that support vascular self-assembly, integration, and tissue function. Our studies highlight the importance of combinatorial screens that vary cellular and material formulation variables concomitantly, by revealing that inclusion of stromal cells can "rescue" vascular self-assembly in a manner that is material-dependent. HPTG provides a route for accelerating pre-clinical progress for diverse medical applications including tissue therapy, cancer biomedicine, and regenerative medicine.

13.
Article in English | MEDLINE | ID: mdl-38481523

ABSTRACT

We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type I error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type I error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
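
For orientation, the quantities being compared are mean responses within terminal nodes of a fitted CART tree. The sketch below uses scikit-learn's CART implementation as a stand-in to show how those node memberships and node means arise; the selective test itself, which conditions on how the tree was grown, is not reproduced here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 3))
y = rng.standard_normal(200)  # no true signal

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
leaf = tree.apply(X)          # terminal-node id for each observation

# Mean response within each terminal node; naively comparing two of these
# means ignores that the tree (and hence the nodes) was estimated from y,
# which is precisely what the selective inference framework accounts for.
for node in np.unique(leaf):
    print(node, y[leaf == node].mean())
```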

14.
PLoS One ; 16(6): e0252345, 2021.
Article in English | MEDLINE | ID: mdl-34086726

ABSTRACT

Calcium imaging has led to discoveries about neural correlates of behavior in subcortical neurons, including dopamine (DA) neurons. However, spike inference methods have not been tested in most populations of subcortical neurons. To address this gap, we simultaneously performed calcium imaging and electrophysiology in DA neurons in brain slices and applied a recently developed spike inference algorithm to the GCaMP fluorescence. This revealed that individual spikes can be inferred accurately in this population. Next, we inferred spikes in vivo from calcium imaging from these neurons during Pavlovian conditioning, as well as during navigation in virtual reality. In both cases, we quantitatively recapitulated previous in vivo electrophysiological observations. Our work provides a validated approach to infer spikes from calcium imaging in DA neurons and implies that aspects of both tonic and phasic spike patterns can be recovered.


Subject(s)
Calcium/metabolism , Dopamine/metabolism , Dopaminergic Neurons/metabolism , Action Potentials/physiology , Algorithms , Animals , Brain/metabolism , Calcium Signaling/physiology , Conditioning, Classical/physiology , Electrophysiological Phenomena/physiology , Mice
15.
Biostatistics ; 10(3): 515-34, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19377034

ABSTRACT

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others, 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
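
A simplified sketch of computing a single sparse factor by alternating soft-thresholded updates is given below. In the actual PMD, the $L_1$ penalties are imposed as constraints whose threshold levels are chosen by a search; the fixed lam_u and lam_v here are assumptions made purely for illustration.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def pmd_rank1(X, lam_u=0.1, lam_v=0.1, n_iter=100):
    """One sparse factor d * u * v^T of X via alternating updates
    (a simplified sketch of the PMD idea, not the exact published algorithm)."""
    v = np.linalg.svd(X, full_matrices=False)[2][0]  # leading right singular vector
    for _ in range(n_iter):
        u = soft_threshold(X @ v, lam_u)
        u /= np.linalg.norm(u) + 1e-12
        v = soft_threshold(X.T @ u, lam_v)
        v /= np.linalg.norm(v) + 1e-12
    d = u @ X @ v
    return d, u, v
```

Subsequent factors can then, roughly speaking, be obtained by repeating the procedure on the residual $X - d u v^T$.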


Subject(s)
Biometry/methods , Principal Component Analysis/methods , Algorithms , Breast Neoplasms/genetics , Chromosomes, Human, Pair 1/genetics , DNA, Neoplasm/genetics , Data Interpretation, Statistical , Female , Gene Dosage , Genomics/statistics & numerical data , Humans , Models, Statistical
16.
Stat Appl Genet Mol Biol ; 8: Article 28, 2009.
Article in English | MEDLINE | ID: mdl-19572827

ABSTRACT

In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations. Sparse CCA is a method for identifying sparse linear combinations of the two sets of variables that are highly correlated with each other. It has been shown to be useful in the analysis of high-dimensional genomic data, when two sets of assays are available on the same set of samples. In this paper, we propose two extensions to the sparse CCA methodology. (1) Sparse CCA is an unsupervised method; that is, it does not make use of outcome measurements that may be available for each observation (e.g., survival time or cancer subtype). We propose an extension to sparse CCA, which we call sparse supervised CCA, which results in the identification of linear combinations of the two sets of variables that are correlated with each other and associated with the outcome. (2) It is becoming increasingly common for researchers to collect data on more than two assays on the same set of samples; for instance, SNP, gene expression, and DNA copy number measurements may all be available. We develop sparse multiple CCA in order to extend the sparse CCA methodology to the case of more than two data sets. We demonstrate these new methods on simulated data and on a recently published and publicly available diffuse large B-cell lymphoma data set.
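
For reference, the sparse CCA criterion that these extensions build on can be stated loosely as follows: given data matrices $X$ and $Z$ measured on the same $n$ samples (columns standardized), find sparse weight vectors $u$ and $v$ that solve

$$ \underset{u, v}{\mathrm{maximize}} \;\; u^T X^T Z v \quad \text{subject to} \quad \|u\|_2^2 \leq 1, \;\; \|v\|_2^2 \leq 1, \;\; \|u\|_1 \leq c_1, \;\; \|v\|_1 \leq c_2. $$

The supervised variant described above additionally ties the selected variables to the outcome, and sparse multiple CCA extends the criterion by summing analogous cross-product terms over all pairs of data sets; the exact constraints and algorithms are given in the cited work.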


Subject(s)
Algorithms , Genomics/statistics & numerical data , Models, Statistical , Humans
17.
Biometrika ; 107(2): 293-310, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32454528

ABSTRACT

The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the $K$-nearest-neighbours fused lasso, involves computing the $K$-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the $K$-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an $\epsilon$-graph rather than a $K$-nearest-neighbours graph and contrast it with the $K$-nearest-neighbours fused lasso.
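
Concretely, writing $E$ for the edge set of the $K$-nearest-neighbours graph of the design points, the estimator described above solves a fused lasso problem of the generic graph form

$$ \hat{\theta} = \underset{\theta \in \mathbb{R}^n}{\arg\min} \;\; \frac{1}{2} \sum_{i=1}^{n} (y_i - \theta_i)^2 + \lambda \sum_{(i, j) \in E} |\theta_i - \theta_j|, $$

so that the penalty encourages the fitted function to be piecewise constant over neighbouring design points.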

18.
Mol Biol Evol ; 25(6): 1025-42, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18199829

ABSTRACT

A beneficial mutation that has nearly but not yet fixed in a population produces a characteristic haplotype configuration, called a partial selective sweep. Whether nonadaptive processes might generate similar haplotype configurations has not been extensively explored. Here, we consider 5 population genetic data sets taken from regions flanking high-frequency transposable elements in North American strains of Drosophila melanogaster, each of which appears to be consistent with the expectations of a partial selective sweep. We use coalescent simulations to explore whether incorporation of the species' demographic history, purifying selection against the element, or suppression of recombination caused by the element could generate putatively adaptive haplotype configurations. Whereas most of the data sets would be rejected as nonneutral under the standard neutral null model, only the data set for which there is strong external evidence in support of an adaptive transposition appears to be nonneutral under the more complex null model and in particular when demography is taken into account. High-frequency, derived mutations from a recently bottlenecked population, such as we study here, are of great interest to evolutionary genetics in the context of scans for adaptive events; we discuss the broader implications of our findings in this context.


Subject(s)
Adaptation, Biological/genetics , Drosophila melanogaster/genetics , Drosophila melanogaster/physiology , Models, Genetic , Mutation , Animals , Base Sequence , Computer Simulation , DNA Transposable Elements , Genomics , Molecular Sequence Data , Recombination, Genetic
19.
Foot Ankle Int ; 29(11): 1063-8, 2008 Nov.
Article in English | MEDLINE | ID: mdl-19026197

ABSTRACT

BACKGROUND: Orthopaedic procedures have been reported to have the highest incidence of pain compared to other types of operations. There are limited studies in the literature that investigate postoperative pain. MATERIALS AND METHODS: A prospective study of 98 patients undergoing orthopaedic foot and ankle operations was undertaken to evaluate their pain experience. A Short-Form McGill Pain Questionnaire (SF-MPQ) was administered preoperatively and postoperatively. RESULTS: The results showed that patients who experienced pain before the operation anticipated feeling higher pain intensity immediately postoperatively. Patients, on average, experienced higher pain intensity 3 days after the operation than they had anticipated. Postoperative pain intensity was most severe at 3 days and least severe at 6 weeks. Age, gender, and preoperative diagnosis (acute versus chronic) did not have a significant effect on the severity of pain that patients experienced. Six weeks after the operation, the majority of patients felt no pain. In addition, the severity of preoperative pain was highly predictive of anticipated postoperative pain and of 6-week postoperative pain, and both preoperative pain and anticipated pain predicted higher immediate postoperative pain. CONCLUSION: The intensity of patients' preoperative pain was predictive of their anticipated postoperative pain. Preoperative pain and anticipated postoperative pain were independently predictive of 3-day postoperative pain. The higher a patient's preoperative pain intensity, the greater their postoperative pain severity. Therefore, surgeons should be aware of these findings when treating postoperative pain after orthopaedic foot and ankle operations.


Subject(s)
Foot/surgery , Orthopedic Procedures , Pain, Postoperative/epidemiology , Pain, Postoperative/psychology , Adolescent , Adult , Aged , Aged, 80 and over , Female , Follow-Up Studies , Health Surveys , Humans , Male , Middle Aged , Pain Measurement , Pain, Postoperative/diagnosis , Prospective Studies , Risk Factors , Young Adult
20.
J Am Stat Assoc ; 112(520): 1697-1707, 2017.
Article in English | MEDLINE | ID: mdl-29618851

ABSTRACT

We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.
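
Schematically, an additive ordinary differential equation model for $p$ variables takes the form (stated loosely; the paper's exact specification and estimation procedure differ in their details)

$$ \frac{d x_j(t)}{dt} = \sum_{k=1}^{p} f_{jk}\left(x_k(t)\right), \qquad j = 1, \ldots, p, $$

where each $f_{jk}$ is estimated nonparametrically and a nonzero $f_{jk}$ corresponds to an edge from node $k$ to node $j$ in the recovered network; the proposed approach avoids estimating the left-hand-side derivatives from noisy observations.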
