Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 39
Filtrar
1.
PLoS Comput Biol ; 19(10): e1011509, 2023 10.
Artigo em Inglês | MEDLINE | ID: mdl-37824442

RESUMO

A major goal of computational neuroscience is to build accurate models of the activity of neurons that can be used to interpret their function in circuits. Here, we explore using functional cell types to refine single-cell models by grouping them into functionally relevant classes. Formally, we define a hierarchical generative model for cell types, single-cell parameters, and neural responses, and then derive an expectation-maximization algorithm with variational inference that maximizes the likelihood of the neural recordings. We apply this "simultaneous" method to estimate cell types and fit single-cell models from simulated data, and find that it accurately recovers the ground truth parameters. We then apply our approach to in vitro neural recordings from neurons in mouse primary visual cortex, and find that it yields improved prediction of single-cell activity. We demonstrate that the discovered cell-type clusters are well separated and generalizable, and thus amenable to interpretation. We then compare discovered cluster memberships with locational, morphological, and transcriptomic data. Our findings reveal the potential to improve models of neural responses by explicitly allowing for shared functional properties across neurons.


Assuntos
Algoritmos , Neurônios , Camundongos , Animais , Simulação por Computador , Neurônios/fisiologia , Probabilidade , Modelos Neurológicos , Potenciais de Ação/fisiologia
2.
bioRxiv ; 2023 Mar 01.
Artigo em Inglês | MEDLINE | ID: mdl-36909648

RESUMO

A major goal of computational neuroscience is to build accurate models of the activity of neurons that can be used to interpret their function in circuits. Here, we explore using functional cell types to refine single-cell models by grouping them into functionally relevant classes. Formally, we define a hierarchical generative model for cell types, single-cell parameters, and neural responses, and then derive an expectation-maximization algorithm with variational inference that maximizes the likelihood of the neural recordings. We apply this "simultaneous" method to estimate cell types and fit single-cell models from simulated data, and find that it accurately recovers the ground truth parameters. We then apply our approach to in vitro neural recordings from neurons in mouse primary visual cortex, and find that it yields improved prediction of single-cell activity. We demonstrate that the discovered cell-type clusters are well separated and generalizable, and thus amenable to interpretation. We then compare discovered cluster memberships with locational, morphological, and transcriptomic data. Our findings reveal the potential to improve models of neural responses by explicitly allowing for shared functional properties across neurons.

3.
bioRxiv ; 2023 Mar 20.
Artigo em Inglês | MEDLINE | ID: mdl-36993278

RESUMO

Material- and cell-based technologies such as engineered tissues hold great promise as human therapies. Yet, the development of many of these technologies becomes stalled at the stage of pre-clinical animal studies due to the tedious and low-throughput nature of in vivo implantation experiments. We introduce a 'plug and play' in vivo screening array platform called Highly Parallel Tissue Grafting (HPTG). HPTG enables parallelized in vivo screening of 43 three-dimensional microtissues within a single 3D printed device. Using HPTG, we screen microtissue formations with varying cellular and material components and identify formulations that support vascular self-assembly, integration and tissue function. Our studies highlight the importance of combinatorial studies that vary cellular and material formulation variables concomitantly, by revealing that inclusion of stromal cells can "rescue" vascular self-assembly in manner that is material-dependent. HPTG provides a route for accelerating pre-clinical progress for diverse medical applications including tissue therapy, cancer biomedicine, and regenerative medicine.

4.
Biostatistics ; 24(2): 481-501, 2023 04 14.
Artigo em Inglês | MEDLINE | ID: mdl-34654923

RESUMO

In recent years, a number of methods have been proposed to estimate the times at which a neuron spikes on the basis of calcium imaging data. However, quantifying the uncertainty associated with these estimated spikes remains an open problem. We consider a simple and well-studied model for calcium imaging data, which states that calcium decays exponentially in the absence of a spike, and instantaneously increases when a spike occurs. We wish to test the null hypothesis that the neuron did not spike-i.e., that there was no increase in calcium-at a particular timepoint at which a spike was estimated. In this setting, classical hypothesis tests lead to inflated Type I error, because the spike was estimated on the same data used for testing. To overcome this problem, we propose a selective inference approach. We describe an efficient algorithm to compute finite-sample $p$-values that control selective Type I error, and confidence intervals with correct selective coverage, for spikes estimated using a recent proposal from the literature. We apply our proposal in simulation and on calcium imaging data from the $\texttt{spikefinder}$ challenge.


Assuntos
Cálcio , Diagnóstico por Imagem , Humanos , Incerteza , Potenciais de Ação/fisiologia , Simulação por Computador , Algoritmos
5.
J Mach Learn Res ; 242023 May.
Artigo em Inglês | MEDLINE | ID: mdl-38264325

RESUMO

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly-tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of k-means clustering. In this paper, we propose a p-value that conditions on all of the intermediate clustering assignments in the k-means algorithm. We show that the p-value controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering in finite samples, and can be efficiently computed. We apply our proposal on hand-written digits data and on single-cell RNA-sequencing data.

6.
Artigo em Inglês | MEDLINE | ID: mdl-38481523

RESUMO

We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.

7.
PLoS One ; 16(6): e0252345, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-34086726

RESUMO

Calcium imaging has led to discoveries about neural correlates of behavior in subcortical neurons, including dopamine (DA) neurons. However, spike inference methods have not been tested in most populations of subcortical neurons. To address this gap, we simultaneously performed calcium imaging and electrophysiology in DA neurons in brain slices and applied a recently developed spike inference algorithm to the GCaMP fluorescence. This revealed that individual spikes can be inferred accurately in this population. Next, we inferred spikes in vivo from calcium imaging from these neurons during Pavlovian conditioning, as well as during navigation in virtual reality. In both cases, we quantitatively recapitulated previous in vivo electrophysiological observations. Our work provides a validated approach to infer spikes from calcium imaging in DA neurons and implies that aspects of both tonic and phasic spike patterns can be recovered.


Assuntos
Cálcio/metabolismo , Dopamina/metabolismo , Neurônios Dopaminérgicos/metabolismo , Potenciais de Ação/fisiologia , Algoritmos , Animais , Encéfalo/metabolismo , Sinalização do Cálcio/fisiologia , Condicionamento Clássico/fisiologia , Fenômenos Eletrofisiológicos/fisiologia , Camundongos
8.
Biometrika ; 107(2): 293-310, 2020 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-32454528

RESUMO

The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the [Formula: see text]-nearest-neighbours fused lasso, involves computing the [Formula: see text]-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the [Formula: see text]-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an [Formula: see text]-graph rather than a [Formula: see text]-nearest-neighbours graph and contrast it with the [Formula: see text]-nearest-neighbours fused lasso.

9.
Hum Brain Mapp ; 41(10): 2553-2566, 2020 07.
Artigo em Inglês | MEDLINE | ID: mdl-32216125

RESUMO

Brain networks are increasingly characterized at different scales, including summary statistics, community connectivity, and individual edges. While research relating brain networks to behavioral measurements has yielded many insights into brain-phenotype relationships, common analytical approaches only consider network information at a single scale. Here, we designed, implemented, and deployed Multi-Scale Network Regression (MSNR), a penalized multivariate approach for modeling brain networks that explicitly respects both edge- and community-level information by assuming a low rank and sparse structure, both encouraging less complex and more interpretable modeling. Capitalizing on a large neuroimaging cohort (n = 1, 051), we demonstrate that MSNR recapitulates interpretable and statistically significant connectivity patterns associated with brain development, sex differences, and motion-related artifacts. Compared to single-scale methods, MSNR achieves a balance between prediction performance and model complexity, with improved interpretability. Together, by jointly exploiting both edge- and community-level information, MSNR has the potential to yield novel insights into brain-behavior relationships.


Assuntos
Encéfalo/fisiologia , Conectoma/métodos , Imageamento por Ressonância Magnética/métodos , Modelos Estatísticos , Rede Nervosa/fisiologia , Adolescente , Encéfalo/diagnóstico por imagem , Estudos Transversais , Feminino , Humanos , Individualidade , Masculino , Rede Nervosa/diagnóstico por imagem , Fenótipo , Análise de Regressão , Caracteres Sexuais
10.
Biostatistics ; 21(4): 709-726, 2020 10 01.
Artigo em Inglês | MEDLINE | ID: mdl-30753436

RESUMO

Calcium imaging data promises to transform the field of neuroscience by making it possible to record from large populations of neurons simultaneously. However, determining the exact moment in time at which a neuron spikes, from a calcium imaging data set, amounts to a non-trivial deconvolution problem which is of critical importance for downstream analyses. While a number of formulations have been proposed for this task in the recent literature, in this article, we focus on a formulation recently proposed in Jewell and Witten (2018. Exact spike train inference via $\ell_{0} $ optimization. The Annals of Applied Statistics12(4), 2457-2482) that can accurately estimate not just the spike rate, but also the specific times at which the neuron spikes. We develop a much faster algorithm that can be used to deconvolve a fluorescence trace of 100 000 timesteps in less than a second. Furthermore, we present a modification to this algorithm that precludes the possibility of a "negative spike". We demonstrate the performance of this algorithm for spike deconvolution on calcium imaging datasets that were recently released as part of the $\texttt{spikefinder}$ challenge (http://spikefinder.codeneuro.org/). The algorithm presented in this article was used in the Allen Institute for Brain Science's "platform paper" to decode neural activity from the Allen Brain Observatory; this is the main scientific paper in which their data resource is presented. Our $\texttt{C++}$ implementation, along with $\texttt{R}$ and $\texttt{python}$ wrappers, is publicly available. $\texttt{R}$ code is available on $\texttt{CRAN}$ and $\texttt{Github}$, and $\texttt{python}$ wrappers are available on $\texttt{Github}$; see https://github.com/jewellsean/FastLZeroSpikeInference.


Assuntos
Cálcio , Neurônios , Algoritmos , Encéfalo/diagnóstico por imagem , Diagnóstico por Imagem , Humanos
12.
Genome Res ; 27(1): 38-52, 2017 01.
Artigo em Inglês | MEDLINE | ID: mdl-27831498

RESUMO

Candidate enhancers can be identified on the basis of chromatin modifications, the binding of chromatin modifiers and transcription factors and cofactors, or chromatin accessibility. However, validating such candidates as bona fide enhancers requires functional characterization, typically achieved through reporter assays that test whether a sequence can increase expression of a transcriptional reporter via a minimal promoter. A longstanding concern is that reporter assays are mainly implemented on episomes, which are thought to lack physiological chromatin. However, the magnitude and determinants of differences in cis-regulation for regulatory sequences residing in episomes versus chromosomes remain almost completely unknown. To address this systematically, we developed and applied a novel lentivirus-based massively parallel reporter assay (lentiMPRA) to directly compare the functional activities of 2236 candidate liver enhancers in an episomal versus a chromosomally integrated context. We find that the activities of chromosomally integrated sequences are substantially different from the activities of the identical sequences assayed on episomes, and furthermore are correlated with different subsets of ENCODE annotations. The results of chromosomally based reporter assays are also more reproducible and more strongly predictable by both ENCODE annotations and sequence-based models. With a linear model that combines chromatin annotations and sequence information, we achieve a Pearson's R2 of 0.362 for predicting the results of chromosomally integrated reporter assays. This level of prediction is better than with either chromatin annotations or sequence information alone and also outperforms predictive models of episomal assays. Our results have broad implications for how cis-regulatory elements are identified, prioritized and functionally validated.


Assuntos
Cromatina/genética , Elementos Facilitadores Genéticos/genética , Regulação da Expressão Gênica/genética , Plasmídeos/genética , Montagem e Desmontagem da Cromatina/genética , Cromossomos/genética , Genes Reporter , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Regiões Promotoras Genéticas , Sequências Reguladoras de Ácido Nucleico/genética , Fatores de Transcrição
13.
J Am Stat Assoc ; 112(520): 1697-1707, 2017.
Artigo em Inglês | MEDLINE | ID: mdl-29618851

RESUMO

We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.

14.
Biometrika ; 103(4): 761-777, 2016 12.
Artigo em Inglês | MEDLINE | ID: mdl-28736452

RESUMO

In classical statistics, much thought has been put into experimental design and data collection. In the high-dimensional setting, however, experimental design has been less of a focus. In this paper, we stress the importance of collecting multiple replicates for each subject in this setting. We consider learning the structure of a graphical model with latent variables, under the assumption that these variables take a constant value across replicates within each subject. By collecting multiple replicates for each subject, we are able to estimate the conditional dependence relationships among the observed variables given the latent variables. To test the null hypothesis of conditional independence between two observed variables, we propose a pairwise decorrelated score test. Theoretical guarantees are established for parameter estimation and for this test. We show that our proposal is able to estimate latent variable graphical models more accurately than some existing proposals, and apply the proposed method to a brain imaging dataset.

15.
Biometrika ; 102(1): 47-64, 2015 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-27625437

RESUMO

We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.

16.
J Comput Graph Stat ; 23(4): 985-1008, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-25364221

RESUMO

We consider the task of simultaneously clustering the rows and columns of a large transposable data matrix. We assume that the matrix elements are normally distributed with a bicluster-specific mean term and a common variance, and perform biclustering by maximizing the corresponding log likelihood. We apply an ℓ1 penalty to the means of the biclusters in order to obtain sparse and interpretable biclusters. Our proposal amounts to a sparse, symmetrized version of k-means clustering. We show that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of our proposal, and that a relaxation of our proposal yields the singular value decomposition. In addition, we propose a framework for bi-clustering based on the matrix-variate normal distribution. The performances of our proposals are demonstrated in a simulation study and on a gene expression data set. This article has supplementary material online.

17.
Technometrics ; 56(1): 112-122, 2014 Feb 20.
Artigo em Inglês | MEDLINE | ID: mdl-24817772

RESUMO

In the high-dimensional regression setting, the elastic net produces a parsimonious model by shrinking all coefficients towards the origin. However, in certain settings, this behavior might not be desirable: if some features are highly correlated with each other and associated with the response, then we might wish to perform less shrinkage on the coefficients corresponding to that subset of features. We propose the cluster elastic net, which selectively shrinks the coefficients for such variables towards each other, rather than towards the origin. Instead of assuming that the clusters are known a priori, the cluster elastic net infers clusters of features from the data, on the basis of correlation among the variables as well as association with the response. These clusters are then used in order to more accurately perform regression. We demonstrate the theoretical advantages of our proposed approach, and explore its performance in a simulation study, and in an application to HIV drug resistance data. Supplementary Materials are available online.

18.
J R Stat Soc Series B Stat Methodol ; 76(2): 373-397, 2014 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-24817823

RESUMO

We consider the problem of estimating multiple related Gaussian graphical models from a high-dimensional data set with observations belonging to distinct classes. We propose the joint graphical lasso, which borrows strength across the classes in order to estimate multiple graphical models that share certain characteristics, such as the locations or weights of nonzero edges. Our approach is based upon maximizing a penalized log likelihood. We employ generalized fused lasso or group lasso penalties, and implement a fast ADMM algorithm to solve the corresponding convex optimization problems. The performance of the proposed method is illustrated through simulated and real data examples.

19.
Nat Genet ; 46(3): 310-5, 2014 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-24487276

RESUMO

Current methods for annotating and interpreting human genetic variation tend to exploit a single information type (for example, conservation) and/or are restricted in scope (for example, to missense changes). Here we describe Combined Annotation-Dependent Depletion (CADD), a method for objectively integrating many diverse annotations into a single measure (C score) for each variant. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human-derived alleles from 14.7 million simulated variants. We precompute C scores for all 8.6 billion possible human single-nucleotide variants and enable scoring of short insertions-deletions. C scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects and complex trait associations, and they highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.


Assuntos
Variação Genética , Genoma Humano , Modelos Genéticos , Estudo de Associação Genômica Ampla , Humanos , Mutação INDEL , Anotação de Sequência Molecular , Polimorfismo de Nucleotídeo Único , Seleção Genética , Máquina de Vetores de Suporte
20.
Stat Interface ; 6(2): 211-221, 2013.
Artigo em Inglês | MEDLINE | ID: mdl-23875057

RESUMO

We consider the problem of performing unsupervised learning in the presence of outliers - that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an "error" term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations' errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...