Results 1 - 20 of 1,730

1.
Annu Rev Cell Dev Biol ; 30: 23-37, 2014.
Article in English | MEDLINE | ID: mdl-25000992

ABSTRACT

The physicist Ernest Rutherford said, "If your experiment needs statistics, you ought to have done a better experiment." Although this aphorism remains true for much of today's research in cell biology, a basic understanding of statistics can be useful to cell biologists to help in monitoring the conduct of their experiments, in interpreting the results, in presenting them in publications, and in critically evaluating research by others. However, training in statistics is often focused on the sophisticated needs of clinical researchers, psychologists, and epidemiologists, whose conclusions depend wholly on statistics, rather than on the practical needs of cell biologists, whose experiments often provide evidence that is not statistical in nature. This review describes some of the basic statistical principles that may be of use to experimental biologists, but it does not cover the sophisticated statistics needed for papers that contain evidence of no other kind.


Subjects
Cell Biology, Statistics as Topic, Causality, Statistical Data Interpretation, Probability, Reproducibility of Results, Research Design, Statistical Distributions
2.
Nature ; 577(7792): 671-675, 2020 01.
Article in English | MEDLINE | ID: mdl-31942076

ABSTRACT

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
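
To make the hypothesized mechanism concrete, here is a minimal, hypothetical simulation (not the paper's analysis code) of a population of value predictors with asymmetric learning rates for positive versus negative prediction errors; the reward distribution, learning rate and number of units are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reward():
    # Bimodal stochastic outcome standing in for a probabilistic reward.
    return rng.normal(1.0, 0.3) if rng.random() < 0.5 else rng.normal(5.0, 0.3)

n_units = 20
values = np.zeros(n_units)
# Each unit balances its learning rates for positive vs. negative prediction
# errors differently (its degree of "optimism"); values chosen for illustration.
taus = np.linspace(0.05, 0.95, n_units)
lr = 0.02

for _ in range(50_000):
    delta = sample_reward() - values                   # per-unit prediction errors
    values += lr * np.where(delta > 0, taus, 1 - taus) * delta

# Pessimistic units converge below the mean reward and optimistic units above it,
# so the population of value estimates spans the reward distribution.
print(np.round(np.sort(values), 2))
```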


Subjects
Dopamine/metabolism, Learning/physiology, Neurological Models, Reinforcement (Psychology), Reward, Animals, Artificial Intelligence, Dopaminergic Neurons/metabolism, GABAergic Neurons/metabolism, Mice, Optimism, Pessimism, Probability, Statistical Distributions, Ventral Tegmental Area/cytology, Ventral Tegmental Area/physiology
3.
Bioinformatics ; 39(5), 2023 05 04.
Article in English | MEDLINE | ID: mdl-37018147

ABSTRACT

MOTIVATION: Three-way data structures, characterized by three entities, the units, the variables and the occasions, are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks. RESULTS: In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix variate structure, full information on the conditions and occasions of the RNA sequencing dataset is simultaneously considered, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery. AVAILABILITY AND IMPLEMENTATION: The GitHub R package for this work is available at https://github.com/anjalisilva/mixMVPLN and is released under the open-source MIT license.
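
As a rough illustration of the model class (not the mixMVPLN implementation), the sketch below samples counts from a single matrix variate Poisson-log normal component; the mean matrix and the row/column covariances are made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mvpln(n_genes, M, U, V):
    """Generative sketch of one component: a latent matrix-normal variable on the
    log scale (row covariance U over conditions, column covariance V over
    occasions) drives Poisson read counts."""
    lu, lv = np.linalg.cholesky(U), np.linalg.cholesky(V)
    counts = []
    for _ in range(n_genes):
        z = rng.standard_normal(M.shape)
        x = M + lu @ z @ lv.T                 # matrix-normal draw
        counts.append(rng.poisson(np.exp(x)))
    return np.array(counts)                   # shape: genes x conditions x occasions

# 3 conditions x 2 occasions; mean log-expression and covariances are invented.
M = np.log(np.array([[50.0, 60.0], [20.0, 25.0], [80.0, 90.0]]))
U = np.array([[0.30, 0.10, 0.00], [0.10, 0.30, 0.10], [0.00, 0.10, 0.30]])
V = np.array([[0.20, 0.05], [0.05, 0.20]])
print(sample_mvpln(5, M, U, V))
```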


Subjects
Transcriptome, Normal Distribution, Computer Simulation, Statistical Distributions, RNA Sequence Analysis
4.
Cereb Cortex ; 33(16): 9439-9449, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37409647

ABSTRACT

Numbers of neurons and their spatial variation are fundamental organizational features of the brain. Despite the large corpus of cytoarchitectonic data available in the literature, the statistical distributions of neuron densities within and across brain areas remain largely uncharacterized. Here, we show that neuron densities are compatible with a lognormal distribution across cortical areas in several mammalian species, and find that this also holds true within cortical areas. A minimal model of noisy cell division, in combination with distributed proliferation times, can account for the coexistence of lognormal distributions within and across cortical areas. Our findings uncover a new organizational principle of cortical cytoarchitecture: the ubiquitous lognormal distribution of neuron densities, which adds to a long list of lognormal variables in the brain.
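
The proposed mechanism can be caricatured in a few lines: if each sampled location undergoes a random number of noisy division rounds, the resulting density is a product of random factors and hence approximately lognormal. The factor distribution and the range of proliferation times below are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy noisy-cell-division model with distributed proliferation times: each
# location's density is a product of positive random growth factors, i.e. the
# sum of their logs, so it is approximately lognormal.
n_locations = 20_000
rounds = rng.integers(20, 40, size=n_locations)        # assumed range of division rounds
density = np.array([np.exp(rng.normal(0.05, 0.15, size=k).sum()) for k in rounds])

# Densities are strongly right-skewed, while their logs are close to symmetric.
print(f"skew(density) = {stats.skew(density):.2f}, "
      f"skew(log density) = {stats.skew(np.log(density)):.2f}")
```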


Subjects
Brain, Neurons, Animals, Neurons/physiology, Brain/physiology, Mammals, Cerebral Cortex/physiology, Statistical Distributions
5.
Bioinformatics ; 38(18): 4352-4359, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35916726

ABSTRACT

MOTIVATION: The Chemical Master Equation is a stochastic approach to describe the evolution of a (bio)chemical reaction system. Its solution is a time-dependent probability distribution on all possible configurations of the system. As this number is typically large, the Master Equation is often practically unsolvable. The Method of Moments reduces the system to the evolution of a few moments, which are described by ordinary differential equations. Those equations are not closed, since lower order moments generally depend on higher order moments. Various closure schemes have been suggested to solve this problem. Two major problems with these approaches are, first, that they are open-loop systems, which can diverge from the true solution, and, second, that some of them are computationally expensive. RESULTS: Here we introduce Quasi-Entropy Closure, a moment-closure scheme for the Method of Moments. It estimates higher order moments by reconstructing the distribution that minimizes the distance to a uniform distribution subject to lower order moment constraints. Quasi-Entropy Closure can be regarded as an advancement of Zero-Information Closure, which similarly maximizes the information entropy. Results show that both approaches outperform truncation schemes. Quasi-Entropy Closure is computationally much faster than Zero-Information Closure, although both methods consider solutions on the space of configurations and hence do not completely overcome the curse of dimensionality. In addition, our scheme includes a plausibility check for the existence of a distribution satisfying a given set of moments on the feasible set of configurations. All results are evaluated on different benchmark problems. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
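
A toy version of the closure idea, assuming a single species on a small truncated state space and example values for the known lower-order moments (the actual method operates on the configuration space of a reaction network), might look as follows.

```python
import numpy as np
from scipy.optimize import minimize

# One species on a truncated state space (assumed), with the first two moments
# taken as given by the moment ODEs; the third moment is "closed" by
# reconstructing the distribution closest to uniform that matches them.
states = np.arange(51)
m1, m2 = 10.0, 115.0                       # example lower-order moments
n = len(states)

cons = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},
    {"type": "eq", "fun": lambda p: p @ states - m1},
    {"type": "eq", "fun": lambda p: p @ states**2 - m2},
]
res = minimize(lambda p: np.sum((p - 1.0 / n) ** 2),   # distance to the uniform distribution
               x0=np.full(n, 1.0 / n),
               bounds=[(0.0, 1.0)] * n,
               constraints=cons)

# res.success doubles as a plausibility check: if no distribution on the feasible
# states matches the given moments, the solver reports failure.
print(res.success, "closed third moment:", res.x @ states**3)
```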


Subjects
Biological Models, Stochastic Processes, Entropy, Probability, Statistical Distributions
6.
Biometrics ; 79(2): 1159-1172, 2023 06.
Article in English | MEDLINE | ID: mdl-35178716

ABSTRACT

Combining dependent tests of significance has broad applications but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. The problem could lead to significant false discoveries in big data analyses. This paper provides two main contributions. First, it presents a general family of Fisher type statistics, referred to as the GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, weighted Z-score combination, and so forth. The GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts proper weights and the statistic-defining parameters to given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution, and more robust under the generalized linear model and the multivariate t-distribution. The applications of the GFisher and the new p-value calculation methods are demonstrated by a gene-based single nucleotide polymorphism (SNP)-set association study. The relevant computation has been implemented in the R package GFisher, available on the Comprehensive R Archive Network.
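
For orientation, the sketch below shows the two classical ingredients that the GFisher family generalizes: Fisher's combination statistic and a Brown-style scaled chi-square approximation for correlated tests. The polynomial covariance approximation is the commonly used Kost-McDermott form, included here as an assumption; the GFisher package's moment-ratio matching and joint-distribution surrogating are more elaborate than this.

```python
import numpy as np
from scipy import stats

def fisher_combination(pvals):
    """Classic Fisher statistic T = -2 * sum(log p_i), chi^2 with 2k df under independence."""
    t = -2.0 * np.sum(np.log(pvals))
    return t, stats.chi2.sf(t, df=2 * len(pvals))

def brown_combination(pvals, corr):
    """Brown-style approximation: match the mean and variance of T to a scaled
    chi-square when the underlying test statistics are correlated. The covariance
    of the -2*log p terms uses an assumed polynomial in the correlation."""
    k = len(pvals)
    t = -2.0 * np.sum(np.log(pvals))
    r = corr[np.triu_indices(k, 1)]
    # cov(-2 log p_i, -2 log p_j) ~= 3.263*r + 0.710*r^2 + 0.027*r^3 (assumed form)
    var_t = 4.0 * k + 2.0 * np.sum(3.263 * r + 0.710 * r**2 + 0.027 * r**3)
    mean_t = 2.0 * k
    df = 2.0 * mean_t**2 / var_t
    scale = var_t / (2.0 * mean_t)
    return stats.chi2.sf(t / scale, df=df)

pvals = np.array([0.01, 0.20, 0.03, 0.50])
print(fisher_combination(pvals))
print(brown_combination(pvals, corr=np.full((4, 4), 0.3) + 0.7 * np.eye(4)))
```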


Subjects
Linear Models, Statistical Distributions, Genetic Association Studies, Normal Distribution
7.
Biometrics ; 79(4): 3818-3830, 2023 12.
Article in English | MEDLINE | ID: mdl-36795803

ABSTRACT

Contact tracing is one of the most effective tools in infectious disease outbreak control. A capture-recapture approach based upon ratio regression is suggested to estimate the completeness of case detection. Ratio regression has recently been developed as a flexible tool for count data modeling and has proved to be successful in the capture-recapture setting. The methodology is applied here to COVID-19 contact tracing data from Thailand. A simple weighted straight-line approach is used, which includes the Poisson and geometric distributions as special cases. For the Thailand contact tracing case study, a completeness of 83% was found, with a 95% confidence interval of 74%-93%.
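
A minimal sketch of the ratio-plot idea, with invented frequency counts rather than the Thailand data: fit a weighted straight line to the ratios r_x = (x+1) f_{x+1} / f_x, extrapolate it to x = 0 to estimate the number of undetected cases, and convert that into a completeness estimate. The weighting scheme below is a simple choice, not necessarily the paper's.

```python
import numpy as np

# Frequencies of cases detected x = 1, 2, 3, ... times; f_0 (undetected cases)
# is unobserved. Counts are illustrative only.
x = np.array([1, 2, 3, 4, 5])
f = np.array([4000, 1200, 300, 70, 15])

# Ratio plot: r_x = (x + 1) * f_{x+1} / f_x, here defined for x = 1, ..., 4.
rx = (x[:-1] + 1) * f[1:] / f[:-1]

# Weighted straight-line fit r_x = a + b*x (sqrt-frequency weights, a simple choice).
b, a = np.polyfit(x[:-1], rx, deg=1, w=np.sqrt(f[:-1]))

r0 = a                      # extrapolated ratio at x = 0
f0_hat = f[0] / r0          # since r_0 = 1 * f_1 / f_0
n_obs = f.sum()
completeness = n_obs / (n_obs + f0_hat)
print(f"estimated undetected cases: {f0_hat:.0f}, completeness: {completeness:.1%}")
```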


Subjects
COVID-19, Humans, COVID-19/epidemiology, Contact Tracing, Disease Outbreaks, Statistical Distributions
8.
Stat Med ; 42(8): 1113-1126, 2023 04 15.
Article in English | MEDLINE | ID: mdl-36650701

ABSTRACT

Non-inferiority (NI) trials are implemented when there is a practical demand for alternatives to standard therapies, for example to reduce side effects. An experimental treatment is considered non-inferior to the standard treatment when it exhibits a clinically insignificant loss of efficacy. Ordinal categorical responses are frequently observed in clinical trials, and it has been reported that responses measured on an ordinal scale produce a more informative analysis than responses collapsed into binary outcomes. We study NI trials with ordinal endpoints. We propose a latent variable model for ordinal categorical responses. Based on the proposed latent variable model, the mean efficacy of the different treatments is denoted by the corresponding mean parameter of the underlying continuous distributions. A two-step procedure is proposed for model identification and parameter estimation. A non-inferiority analysis can then be conducted based on the latent variable model and the corresponding estimation procedure. We also develop a method and an algorithm to produce an optimal sample size configuration based on the proposed testing procedure. Two clinical examples are provided for demonstrative purposes.
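
The latent-variable idea can be sketched in a few lines: ordinal categories arise by thresholding a continuous latent response whose mean encodes treatment efficacy. The normal latent distribution and the cut-points below are assumptions for illustration; the paper's model and its two-step estimation procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)
cutpoints = np.array([-1.0, 0.0, 1.0])   # assumed thresholds defining 4 ordered categories

def ordinal_from_latent(mu, n):
    """Ordinal responses obtained by thresholding a continuous latent variable;
    treatment efficacy is represented by the latent mean mu."""
    latent = rng.normal(loc=mu, scale=1.0, size=n)
    return np.searchsorted(cutpoints, latent)          # categories 0..3

standard = ordinal_from_latent(mu=0.0, n=500)
experimental = ordinal_from_latent(mu=-0.1, n=500)     # small loss on the latent mean
print(np.bincount(standard, minlength=4), np.bincount(experimental, minlength=4))
```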


Subjects
Drug-Related Side Effects and Adverse Reactions, Statistical Models, Humans, Sample Size, Statistical Distributions
9.
J Biopharm Stat ; 33(3): 386-399, 2023 05 04.
Article in English | MEDLINE | ID: mdl-36511635

ABSTRACT

The Weibull distribution is applied to the number of days between the start date of drug administration and the date of occurrence of an adverse event. The tendency of occurrence of adverse events can be clarified by estimating the two- or three-parameter Weibull distribution from these day counts. Our purpose is to estimate the parameters of the Weibull distribution with high accuracy, even for adverse events with few reports, such as those involving new drugs, polypharmacy and small clinical trials. Furthermore, the two-sample Kolmogorov-Smirnov test (two-sided) is used to examine whether the tendency of occurrence of adverse events differs between two Weibull distributions estimated from two drugs with similar efficacy. We used discrete data derived from the FDA Adverse Event Reporting System (FAERS), as the FAERS data are reported in years, months and days without hours and minutes. Because this study focuses on early-onset adverse events, the data may contain values of 0 days. The discreteness of the data and the fact that they may include zero make this distribution different from the general Weibull distribution, which is defined for continuous data greater than zero. We search for the optimal parameter estimation method for the Weibull distribution under these two conditions, and verify its effectiveness using Monte Carlo simulations and FAERS data. Because the results obtained from FAERS data may differ depending on data handling, we describe the data handling technique and provide sample code that can reproduce the results.
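
A hedged sketch of one possible workflow on simulated stand-in data (not the FAERS extracts, and not necessarily the estimation method the paper identifies as optimal): shift the discrete, possibly zero, day counts off zero, fit the two-parameter Weibull by maximum likelihood, and compare two drugs with the two-sided two-sample Kolmogorov-Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Simulated days-to-onset for two drugs (integer days, possibly 0 for same-day onset).
days_a = np.floor(stats.weibull_min.rvs(c=0.8, scale=20, size=300, random_state=rng))
days_b = np.floor(stats.weibull_min.rvs(c=1.3, scale=25, size=300, random_state=rng))

def fit_weibull(days, shift=0.5):
    # Shift the discrete (and possibly zero) day counts off zero before fitting a
    # continuous two-parameter Weibull; the shift value is an assumption.
    shape, loc, scale = stats.weibull_min.fit(days + shift, floc=0)
    return shape, scale

print("drug A (shape, scale):", fit_weibull(days_a))
print("drug B (shape, scale):", fit_weibull(days_b))
# Two-sided two-sample Kolmogorov-Smirnov test on the raw onset times.
print(stats.ks_2samp(days_a, days_b))
```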


Subjects
Drug-Related Side Effects and Adverse Reactions, United States, Humans, Drug-Related Side Effects and Adverse Reactions/diagnosis, Drug-Related Side Effects and Adverse Reactions/epidemiology, Adverse Drug Reaction Reporting Systems, United States Food and Drug Administration, Software, Statistical Distributions
10.
An Acad Bras Cienc ; 95(2): e20200841, 2023.
Article in English | MEDLINE | ID: mdl-37531487

ABSTRACT

In this paper, a new class of semi-continuous distributions called zero-adjusted log-symmetric is introduced and studied. Some properties are derived, parameter estimation by the maximum likelihood method is developed, and confidence intervals (CIs) are constructed. A simulation study is conducted to evaluate the properties of the maximum likelihood estimators under lighter- and heavier-tailed distributions. Finally, an application to a real data set is presented to illustrate the flexibility of the proposed class of distributions.


Subjects
Computer Simulation, Statistical Distributions
11.
Behav Res Methods ; 55(8): 4343-4368, 2023 12.
Article in English | MEDLINE | ID: mdl-37277644

ABSTRACT

The multibridge R package allows a Bayesian evaluation of informed hypotheses [Formula: see text] applied to frequency data from an independent binomial or multinomial distribution. multibridge uses bridge sampling to efficiently compute Bayes factors for the following hypotheses concerning the latent category proportions 𝜃: (a) hypotheses that postulate equality constraints (e.g., 𝜃1 = 𝜃2 = 𝜃3); (b) hypotheses that postulate inequality constraints (e.g., 𝜃1 < 𝜃2 < 𝜃3 or 𝜃1 > 𝜃2 > 𝜃3); (c) hypotheses that postulate combinations of inequality constraints and equality constraints (e.g., 𝜃1 < 𝜃2 = 𝜃3); and (d) hypotheses that postulate combinations of (a)-(c) (e.g., 𝜃1 < (𝜃2 = 𝜃3),𝜃4). Any informed hypothesis [Formula: see text] may be compared against the encompassing hypothesis [Formula: see text] that all category proportions vary freely, or against the null hypothesis [Formula: see text] that all category proportions are equal. multibridge facilitates the fast and accurate comparison of large models with many constraints and models for which relatively little posterior mass falls in the restricted parameter space. This paper describes the underlying methodology and illustrates the use of multibridge through fully reproducible examples.
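
In the simplest special case, no bridge sampling is needed: the Bayes factor for the null hypothesis that all category proportions are equal against the encompassing hypothesis has a closed form under a Dirichlet prior. The sketch below illustrates that case only, with an assumed Dirichlet(1, ..., 1) prior; multibridge itself handles the general equality/inequality constraints.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of multinomial counts under a Dirichlet(alpha) prior
    (the multinomial coefficient is omitted; it cancels in Bayes factors)."""
    counts, alpha = np.asarray(counts, float), np.asarray(alpha, float)
    return (gammaln(alpha.sum()) - gammaln(alpha.sum() + counts.sum())
            + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

def bf0e_equal_proportions(counts, alpha=None):
    """Bayes factor for H0: all category proportions equal, against the
    encompassing hypothesis He with a Dirichlet(alpha) prior on the proportions."""
    counts = np.asarray(counts, float)
    k, n = len(counts), counts.sum()
    if alpha is None:
        alpha = np.ones(k)                    # uniform Dirichlet prior (assumed)
    log_m0 = -n * np.log(k)                   # likelihood with all proportions fixed at 1/k
    log_me = log_marginal_dirichlet_multinomial(counts, alpha)
    return np.exp(log_m0 - log_me)

print(bf0e_equal_proportions([30, 28, 32]))   # near-equal counts: BF favours H0
print(bf0e_equal_proportions([10, 30, 50]))   # unequal counts: BF favours He
```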


Subjects
Bayes Theorem, Humans, Statistical Distributions
12.
Stat Med ; 41(7): 1172-1190, 2022 03 30.
Article in English | MEDLINE | ID: mdl-34786744

ABSTRACT

Confidence intervals for the mean of discrete exponential families are widely used in many applications. Since missing data are commonly encountered, interval estimation for incomplete data is an important problem. The performance of the existing multiple imputation confidence intervals is unsatisfactory. We propose modified multiple imputation confidence intervals to improve the existing confidence intervals for the mean of discrete exponential families with quadratic variance functions. A simulation study shows that the coverage probabilities of the modified confidence intervals are closer to the nominal level than those of the existing confidence intervals when the true mean is near the boundaries of the parameter space. These confidence intervals are also illustrated with real data examples.
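
For reference, the baseline these modified intervals improve upon is the standard multiple-imputation interval combined by Rubin's rules; a minimal sketch for a binomial mean, with a deliberately simple imputation model, is shown below (the paper's modifications are not reproduced).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Bernoulli data with values missing completely at random (illustrative).
p_true, n = 0.05, 200
y = rng.binomial(1, p_true, size=n).astype(float)
y[rng.random(n) < 0.2] = np.nan
observed = y[~np.isnan(y)]

m = 50                                     # number of imputations
means, variances = [], []
for _ in range(m):
    # Simple imputation model (assumed): draw p from its Beta posterior under a
    # flat prior, then fill in the missing Bernoulli values.
    p_draw = rng.beta(1 + observed.sum(), 1 + (observed == 0).sum())
    completed = y.copy()
    miss = np.isnan(completed)
    completed[miss] = rng.binomial(1, p_draw, size=miss.sum())
    means.append(completed.mean())
    variances.append(completed.mean() * (1 - completed.mean()) / n)

# Rubin's rules: combine within- and between-imputation variances.
qbar, w = np.mean(means), np.mean(variances)
b = np.var(means, ddof=1)
total_var = w + (1 + 1 / m) * b
df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
half = stats.t.ppf(0.975, df) * np.sqrt(total_var)
print(f"95% MI interval: ({qbar - half:.3f}, {qbar + half:.3f})")
```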


Subjects
Confidence Intervals, Computer Simulation, Humans, Probability, Statistical Distributions
13.
Stat Med ; 41(25): 5061-5083, 2022 11 10.
Article in English | MEDLINE | ID: mdl-35973712

ABSTRACT

In clinical trials, comparisons of treatments with ordinal responses are frequently conducted using the proportional odds model. However, the use of this model necessitates the proportional odds assumption, which may not be appropriate. In particular, when responses are skewed, the use of the proportional odds model may result in a markedly inflated type I error rate. The latent Weibull distribution has recently been proposed to remedy this problem, and it has been demonstrated to be superior to the proportional odds model, especially when response-adaptive randomization is incorporated. However, there are several drawbacks associated with the latent Weibull model and the previously suggested response-adaptive treatment randomization scheme. In this paper, we propose a modified latent Weibull model to address these issues. Based on the modified latent Weibull model, the original response-adaptive design is also revised. In addition, a group sequential monitoring mechanism is included so that interim analyses can determine, during a trial, whether a specific treatment is significantly more effective than another; if so, the trial can be terminated at a much earlier stage than one based on a fixed sample size. We performed a simulation study that clearly demonstrated the merits of our proposed framework. Furthermore, we redesigned a clinical study to further illustrate the advantages of our response-adaptive approach.


Subjects
Research Design, Humans, Random Allocation, Sample Size, Statistical Distributions, Computer Simulation
14.
An Acad Bras Cienc ; 94(3): e20201542, 2022.
Article in English | MEDLINE | ID: mdl-36350887

ABSTRACT

We investigate the use of the Probabilistic Incremental Programming Evolution (PIPE) algorithm as a tool to construct continuous cumulative distribution functions to model given data sets. The PIPE algorithm can generate several candidate functions to fit the empirical distribution of the data. These candidates are generated by following a set of probability rules, and the set of rules is then evolved over a number of iterations to generate better candidates with respect to some optimality criteria. This approach rivals that of generated distributions, obtained by adding parameters to existing probability distributions. The method has two main advantages. The first is that it is possible to explicitly control the complexity of the candidate functions, by specifying which mathematical functions and operators can be used and how lengthy the mathematical expression of a candidate can be. The second advantage is that this approach deals with model selection and estimation at the same time. The overall performance on both simulated and real data was very satisfactory. For the real data applications, the PIPE algorithm obtained better likelihoods for the data than existing models, with remarkably simpler mathematical expressions.


Subjects
Algorithms, Probability, Statistical Distributions
15.
An Acad Bras Cienc ; 94(4): e20191597, 2022.
Article in English | MEDLINE | ID: mdl-36287483

ABSTRACT

This paper introduces two new families of distributions that allow fitting unimodal, bimodal or trimodal data sets. Statistical properties such as the distribution function, moments, moment generating function and stochastic representation of these new families are studied in detail. The problem of estimating parameters is addressed by the maximum likelihood method, and Fisher information matrices are derived. A small Monte Carlo simulation study is conducted to examine the performance of the obtained estimators. The methodology developed is illustrated with three real data applications.


Subjects
Statistical Distributions, Monte Carlo Method, Computer Simulation
16.
An Acad Bras Cienc ; 94(2): e20201972, 2022.
Article in English | MEDLINE | ID: mdl-35857939

ABSTRACT

We define two new flexible families of continuous distributions to fit real data by compounding the Marshall-Olkin class and the power series distribution. These families are very competitive with the popular beta and Kumaraswamy generators. Their densities have linear representations of exponentiated densities. In fact, as the main properties of thirty-five exponentiated distributions are well known, we can easily obtain several properties of about three hundred and fifty distributions using the references of this article and five special cases of the power series distribution. We provide a package implemented in the R software that shows numerically the precision of one of the linear representations. This package is useful for calculating numerical values of some statistical measures of the generated distributions. We estimate the parameters by maximum likelihood. We define a regression based on one of the two families. The usefulness of a generated distribution and the associated regression is demonstrated empirically.
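
For context, the two ingredients being compounded are standard and are recalled below in one common parameterization (the article's exact construction and notation may differ): the Marshall-Olkin transform of a baseline survival function $\bar F$ with parameter $\alpha>0$, and the zero-truncated power series family.

$$\bar G(x;\alpha)=\frac{\alpha\,\bar F(x)}{1-(1-\alpha)\,\bar F(x)},\qquad P(N=n)=\frac{a_n\,\theta^{n}}{C(\theta)},\quad C(\theta)=\sum_{n\geq 1}a_n\,\theta^{n},\quad n=1,2,\ldots$$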


Subjects
Statistical Distributions
17.
Biom J ; 64(3): 617-634, 2022 03.
Article in English | MEDLINE | ID: mdl-34873728

ABSTRACT

With improvements to cancer diagnoses and treatments, incidences and mortality rates have changed. However, the most commonly used analysis methods do not account for such distributional changes. In survival analysis, change point problems can concern a shift in a distribution for a set of time-ordered observations, potentially under censoring or truncation. We propose a sequential testing approach for detecting multiple change points in the Weibull accelerated failure time model, since this is sufficiently flexible to accommodate increasing, decreasing, or constant hazard rates and is also the only continuous distribution for which the accelerated failure time model can be reparameterized as a proportional hazards model. Our sequential testing procedure does not require the number of change points to be known; this information is instead inferred from the data. We conduct a simulation study to show that the method accurately detects change points and estimates the model. The numerical results along with real data applications demonstrate that our proposed method can detect change points in the hazard rate.
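
The reparameterization referred to here is worth making explicit. With a Weibull baseline, an accelerated failure time effect on the time scale is equivalent to a proportional hazards effect, shown below in one common parameterization:

$$h_0(t)=\lambda k\,t^{k-1},\qquad S(t\mid x)=S_0\!\left(t\,e^{-\beta x}\right)\;\Longrightarrow\; h(t\mid x)=\lambda k\,e^{-k\beta x}\,t^{k-1}=h_0(t)\,e^{-k\beta x},$$

so an AFT coefficient $\beta$ corresponds to a proportional hazards coefficient $-k\beta$.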


Subjects
Proportional Hazards Models, Computer Simulation, Statistical Distributions, Survival Analysis
18.
BMC Bioinformatics ; 22(1): 67, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33579202

ABSTRACT

BACKGROUND: The search for statistically significant relationships between molecular markers and outcomes is challenging when dealing with high-dimensional, noisy and collinear multivariate omics data, such as metabolomic profiles. Permutation procedures allow for the estimation of adjusted significance levels without assuming independence among metabolomic variables. Nevertheless, the complex non-normal structure of metabolic profiles and outcomes may bias the permutation results, leading to overly conservative threshold estimates, i.e. lower than those from a Bonferroni or Sidak correction. METHODS: Within a univariate permutation procedure, we employ parametric simulation methods based on the multivariate (log-)normal distribution to obtain adjusted significance levels that are consistent across different outcomes while effectively controlling the type I error rate. Next, we derive an alternative closed-form expression for estimating the number of non-redundant metabolic variates based on the spectral decomposition of their correlation matrix. The performance of the method is tested for different model parametrizations and across a wide range of correlation levels of the variates, using synthetic and real data sets. RESULTS: Both the permutation-based formulation and the more practical closed-form expression are found to give an effective indication of the number of independent metabolic effects exhibited by the system, while guaranteeing that the derived adjusted threshold is stable across outcome measures with diverse properties.
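
The paper's own closed-form expression is not given in the abstract; as a point of comparison, the sketch below computes a widely used spectral estimate of the effective number of independent tests (a Li-and-Ji-style formula, an assumption here) from the correlation matrix of the variates.

```python
import numpy as np

def effective_number_of_tests(corr):
    """Estimate of the number of independent variates from the eigenvalues of
    their correlation matrix (Li & Ji-style formula, assumed here; the paper
    derives its own closed-form expression)."""
    lam = np.abs(np.linalg.eigvalsh(corr))
    return int(round(np.sum((lam >= 1).astype(float) + (lam - np.floor(lam)))))

# Example: 100 metabolite-like variates arranged in 10 strongly correlated blocks.
block = 0.8 * np.ones((10, 10)) + 0.2 * np.eye(10)
corr = np.kron(np.eye(10), block)
m_eff = effective_number_of_tests(corr)
print(m_eff, "effective tests; adjusted alpha ~", 0.05 / m_eff)
```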


Subjects
Metabolome, Metabolomics, Biological Models, Genetic Markers/genetics, Metabolomics/methods, Statistical Distributions
19.
Biostatistics ; 21(3): 384-399, 2020 07 01.
Article in English | MEDLINE | ID: mdl-30260365

ABSTRACT

In observational studies of treatment effects, it is common to have several outcomes, perhaps of uncertain quality and relevance, each purporting to measure the effect of the treatment. A single planned combination of several outcomes may increase both power and insensitivity to unmeasured bias when the plan is wisely chosen, but it may miss opportunities in other cases. A method is proposed that uses both one planned combination, with only a mild correction for multiple testing, and exhaustive consideration of all possible combinations, fully correcting for multiple testing. The method works with the joint distribution of $\boldsymbol{\kappa}^{T}(\mathbf{T}-\boldsymbol{\mu})/\sqrt{\boldsymbol{\kappa}^{T}\boldsymbol{\Sigma}\boldsymbol{\kappa}}$ and $\max_{\boldsymbol{\lambda}\neq\mathbf{0}}\,\boldsymbol{\lambda}^{T}(\mathbf{T}-\boldsymbol{\mu})/\sqrt{\boldsymbol{\lambda}^{T}\boldsymbol{\Sigma}\boldsymbol{\lambda}}$, where $\boldsymbol{\kappa}$ is chosen a priori and the test statistic $\mathbf{T}$ is asymptotically $N_{L}(\boldsymbol{\mu},\boldsymbol{\Sigma})$. The correction for multiple testing has a smaller effect on the power of $\boldsymbol{\kappa}^{T}(\mathbf{T}-\boldsymbol{\mu})/\sqrt{\boldsymbol{\kappa}^{T}\boldsymbol{\Sigma}\boldsymbol{\kappa}}$ than does switching to a two-tailed test, even though the opposite tail does receive consideration when $\boldsymbol{\lambda}=-\boldsymbol{\kappa}$. In the application, there are three measures of cognitive decline, and the a priori comparison $\boldsymbol{\kappa}$ is their first principal component, computed without reference to treatment assignments. The method is implemented in the R package sensitivitymult.
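
A small sketch of the two statistics (the multiplicity correction and the joint null distribution, which sensitivitymult implements, are not reproduced here): the planned combination uses a fixed kappa, here taken as the first principal component of the outcomes computed without reference to treatment, while the maximum over all combinations has the closed form of a Mahalanobis norm.

```python
import numpy as np

def planned_combination_z(T, mu, Sigma, kappa):
    """A-priori combination kappa^T (T - mu) / sqrt(kappa^T Sigma kappa)."""
    d = np.asarray(T, float) - np.asarray(mu, float)
    kappa = np.asarray(kappa, float)
    return kappa @ d / np.sqrt(kappa @ Sigma @ kappa)

def max_combination_z(T, mu, Sigma):
    """sup over lambda != 0 of lambda^T (T - mu) / sqrt(lambda^T Sigma lambda),
    attained in closed form as the Mahalanobis norm of T - mu."""
    d = np.asarray(T, float) - np.asarray(mu, float)
    return np.sqrt(d @ np.linalg.solve(Sigma, d))

# kappa chosen a priori as the first principal component of the outcomes,
# computed without reference to treatment assignment (illustrative data).
outcomes = np.random.default_rng(5).normal(size=(200, 3))
kappa = np.linalg.svd(outcomes - outcomes.mean(axis=0), full_matrices=False)[2][0]

Sigma = np.cov(outcomes, rowvar=False)
T, mu = np.array([2.1, 1.4, 0.3]), np.zeros(3)
print(planned_combination_z(T, mu, Sigma, kappa), max_combination_z(T, mu, Sigma))
```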


Subjects
Statistical Data Interpretation, Statistical Models, Observational Studies as Topic/statistics & numerical data, Outcome Assessment (Health Care)/statistics & numerical data, Cognitive Dysfunction/diagnosis, Humans, Principal Component Analysis, Statistical Distributions
20.
Biostatistics ; 21(3): 531-544, 2020 07 01.
Article in English | MEDLINE | ID: mdl-30590499

ABSTRACT

We propose a novel model for hierarchical time-to-event data, for example, healthcare data in which patients are grouped by their healthcare provider. The most common model for this kind of data is the Cox proportional hazard model, with frailties that are common to patients in the same group and given a parametric distribution. We relax the parametric frailty assumption in this class of models by using a non-parametric discrete distribution. This improves the flexibility of the model by allowing very general frailty distributions and enables the data to be clustered into groups of healthcare providers with a similar frailty. A tailored Expectation-Maximization algorithm is proposed for estimating the model parameters, methods of model selection are compared, and the code is assessed in simulation studies. This model is particularly useful for administrative data in which there are a limited number of covariates available to explain the heterogeneity associated with the risk of the event. We apply the model to a clinical administrative database recording times to hospital readmission, and related covariates, for patients previously admitted once to hospital for heart failure, and we explore latent clustering structures among healthcare providers.


Subjects
Algorithms, Health Personnel/statistics & numerical data, Patient Admission, Proportional Hazards Models, Time-to-Treatment/statistics & numerical data, Cluster Analysis, Computer Simulation, Humans, Statistical Distributions, Nonparametric Statistics, Time Factors