Results 1 - 20 of 34
1.
Biom J ; 66(1): e2200237, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38285404

ABSTRACT

The two-sample problem is one of the earliest problems in statistics: given two samples, the question is whether or not the observations were sampled from the same distribution. Many statistical tests have been developed for this problem, and many have been evaluated in simulation studies, but hardly any study has attempted a neutral comparison. In this paper, we introduce an open science initiative that potentially allows for neutral comparisons of two-sample tests. It is designed as an open-source R package, a repository, and an online R Shiny app. This paper describes the principles and design of the system and illustrates its use.
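
The R package itself is not reproduced here, but the underlying two-sample testing problem can be illustrated with a generic permutation test (a minimal Python sketch; the function name and statistic are illustrative, not from the package):

```python
import numpy as np

def permutation_two_sample_test(x, y, n_perm=10_000, seed=0):
    """Two-sample permutation test using the absolute difference in means.

    Returns an estimated two-sided p-value for the null hypothesis that
    x and y were sampled from the same distribution.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = abs(perm[: len(x)].mean() - perm[len(x):].mean())
        if stat >= observed:
            count += 1
    # Add-one correction keeps the estimate a valid p-value
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
same = permutation_two_sample_test(rng.normal(0, 1, 50), rng.normal(0, 1, 50))
shifted = permutation_two_sample_test(rng.normal(0, 1, 50), rng.normal(2, 1, 50))
print(same, shifted)
```

Under exchangeability this test is exact up to Monte Carlo error, which is one reason permutation tests recur throughout the results below.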


Subjects
Computer Simulation
2.
Entropy (Basel) ; 25(5)2023 Apr 28.
Article in English | MEDLINE | ID: mdl-37238489

ABSTRACT

We obtain expressions for the asymptotic distributions of the Rényi and Tsallis entropies of order q, and of the Fisher information, when computed on the maximum likelihood estimator of probabilities from multinomial random samples. We verify that these asymptotic models, two of which (Tsallis and Fisher) are normal, describe a variety of simulated data well. In addition, we obtain test statistics for comparing (possibly different types of) entropies from two samples without requiring the same number of categories. Finally, we apply these tests to social survey data and verify that the results are consistent with, but more general than, those obtained with a χ2 test.
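
The plug-in estimators behind these asymptotics are straightforward: substitute the maximum likelihood probabilities p_i = n_i / n into the entropy formulas. A small sketch (the asymptotic variance expressions from the paper are not reproduced):

```python
import numpy as np

def renyi_entropy(counts, q):
    """Rényi entropy of order q (q != 1), plugging in the maximum
    likelihood estimates p_i = n_i / n from multinomial counts."""
    p = np.asarray(counts, float)
    p = p / p.sum()
    p = p[p > 0]
    return np.log(np.sum(p ** q)) / (1.0 - q)

def tsallis_entropy(counts, q):
    """Tsallis entropy of order q (q != 1) from multinomial counts."""
    p = np.asarray(counts, float)
    p = p / p.sum()
    p = p[p > 0]
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

counts = [30, 30, 30, 30]          # uniform over 4 categories
print(renyi_entropy(counts, 2.0))  # log(4) for any order q under uniformity
```

For a uniform distribution over k categories the Rényi entropy equals log k for every order q, which makes a convenient sanity check.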

3.
Int J Mol Sci ; 23(14)2022 Jul 10.
Article in English | MEDLINE | ID: mdl-35886973

ABSTRACT

Making statistical inference on quantities defining various characteristics of a temporally measured biochemical process and analyzing its variability across different experimental conditions is a core challenge in various branches of science. This problem is particularly difficult when the amount of data that can be collected is limited in terms of both the number of replicates and the number of time points per process trajectory. We propose a method for analyzing the variability of smooth functionals of the growth or production trajectories associated with such processes across different experimental conditions. Our modeling approach is based on a spline representation of the mean trajectories. We also develop a bootstrap-based inference procedure for the parameters while accounting for possible multiple comparisons. This methodology is applied to study two quantities, the "time to harvest" and "maximal productivity", in the context of an experiment on the production of recombinant proteins. We complement the findings with extensive numerical experiments comparing the effectiveness of different types of bootstrap procedures for various tests of hypotheses. These numerical experiments convincingly demonstrate that the proposed method yields reliable inference on complex characteristics of the processes even in a data-limited environment where more traditional methods for statistical inference are typically not reliable.
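
The core bootstrap idea, resampling whole replicate trajectories and recomputing a functional such as "time to harvest", can be sketched generically (the paper's spline representation and multiplicity adjustments are not reproduced; all names and the example functional are illustrative):

```python
import numpy as np

def bootstrap_ci(samples, functional, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a functional of the data.

    `samples` is a 2-D array (replicates x time points); resampling is done
    over whole replicates, mimicking a small-replicate experimental design.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, float)
    n = samples.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample whole trajectories
        stats[b] = functional(samples[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Hypothetical example: "time to maximum" of the mean trajectory
t = np.linspace(0, 10, 50)
rng = np.random.default_rng(1)
trajectories = np.sin(t * 0.4)[None, :] + rng.normal(0, 0.05, (8, 50))
time_to_max = lambda curves: t[np.argmax(curves.mean(axis=0))]
print(bootstrap_ci(trajectories, time_to_max))
```

Resampling at the replicate level, rather than pointwise, preserves the within-trajectory dependence that makes such data hard to analyze with only a handful of curves.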


Subjects
Research Design, Recombinant Proteins/genetics
4.
Ann Appl Probab ; 32(4): 2967-3003, 2022 Aug.
Article in English | MEDLINE | ID: mdl-36034074

ABSTRACT

We study the sample covariance matrix for real-valued data with general population covariance, as well as MANOVA-type covariance estimators in variance components models under null hypotheses of global sphericity. In the limit as matrix dimensions increase proportionally, the asymptotic spectra of such estimators may have multiple disjoint intervals of support, possibly intersecting the negative half line. We show that the distribution of the extremal eigenvalue at each regular edge of the support has a GOE Tracy-Widom limit. Our proof extends a comparison argument of Ji Oon Lee and Kevin Schnelli, replacing a continuous Green function flow by a discrete Lindeberg swapping scheme.

5.
Biometrics ; 77(3): 1037-1049, 2021 09.
Article in English | MEDLINE | ID: mdl-33434289

ABSTRACT

Changepoint detection methods are used in many areas of science and engineering, for example, in the analysis of copy number variation data to detect abnormalities in copy numbers along the genome. Despite the broad array of available tools, methodology for quantifying our uncertainty in the strength (or the presence) of given changepoints post-selection is lacking. Post-selection inference offers a framework to fill this gap, but the most straightforward application of these methods results in low-powered hypothesis tests and leaves open several important questions about practical usability. In this work, we carefully tailor post-selection inference methods toward changepoint detection, focusing on copy number variation data. To accomplish this, we study commonly used changepoint algorithms: binary segmentation, as well as two of its most popular variants, wild and circular, and the fused lasso. We implement some of the latest developments in post-selection inference theory, mainly auxiliary randomization, which improves power but requires Markov chain Monte Carlo algorithms (importance sampling and hit-and-run sampling) to carry out the tests. We also provide recommendations for improving practical usability, detailed simulations, and example analyses on array comparative genomic hybridization as well as sequencing data.
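
Binary segmentation, the first of the algorithms studied, can be sketched in a few lines: find the split maximizing a CUSUM-type statistic, then recurse on both halves. This shows only the detection step, not the post-selection inference the paper develops (threshold and names are illustrative):

```python
import numpy as np

def cusum_stat(x):
    """Location and value of the maximal CUSUM statistic for a mean shift."""
    n = len(x)
    best_k, best_val = None, -np.inf
    for k in range(1, n):
        # Scaled difference in means at candidate split k
        val = abs(x[:k].mean() - x[k:].mean()) * np.sqrt(k * (n - k) / n)
        if val > best_val:
            best_k, best_val = k, val
    return best_k, best_val

def binary_segmentation(x, threshold, start=0):
    """Recursively split the series wherever the CUSUM exceeds `threshold`.
    Returns a sorted list of detected changepoint indices."""
    x = np.asarray(x, float)
    if len(x) < 2:
        return []
    k, val = cusum_stat(x)
    if val < threshold:
        return []
    return (binary_segmentation(x[:k], threshold, start)
            + [start + k]
            + binary_segmentation(x[k:], threshold, start + k))

signal = np.concatenate([np.zeros(40), 3 * np.ones(40), np.zeros(40)])
x = signal + np.random.default_rng(2).normal(0, 0.5, 120)
print(binary_segmentation(x, threshold=3.0))
```

The point of the paper is precisely that the indices returned here were chosen by looking at the data, so naive follow-up tests at those indices are invalid without post-selection corrections.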


Subjects
Algorithms, DNA Copy Number Variations, Comparative Genomic Hybridization, DNA Copy Number Variations/genetics, Markov Chains, Monte Carlo Method
6.
Stat Med ; 39(17): 2291-2307, 2020 07 30.
Article in English | MEDLINE | ID: mdl-32478440

ABSTRACT

In lifetime data, as in cancer studies, there may be long-term survivors, which leads to heavy censoring at the end of the follow-up period. Since a standard survival model is not appropriate for these data, a cure model is needed. In the literature, covariate hypothesis tests for cure models are limited to parametric and semiparametric methods. We fill this important gap by proposing a nonparametric covariate hypothesis test for the probability of cure in mixture cure models. A bootstrap method is proposed to approximate the null distribution of the test statistic. The procedure can be applied to any type of covariate and could be extended to the multivariate setting. Its efficiency is evaluated in a Monte Carlo simulation study. Finally, the method is applied to a colorectal cancer dataset.


Subjects
Statistical Models, Survivors, Computer Simulation, Humans, Monte Carlo Method, Probability
7.
BMC Med Res Methodol ; 20(1): 197, 2020 07 25.
Article in English | MEDLINE | ID: mdl-32711456

ABSTRACT

BACKGROUND: Under competing risks, the commonly used sub-distribution hazard ratio (SHR) is not easy to interpret clinically and is valid only under the proportional sub-distribution hazard (SDH) assumption. This paper introduces an alternative statistical measure: the restricted mean time lost (RMTL). METHODS: First, the definition and estimation methods of the measures are introduced. Second, based on the differences in RMTLs, a basic difference test (Diff) and a supremum difference test (sDiff) are constructed. Then, the corresponding sample size estimation method is proposed. The statistical properties of the methods and the estimated sample size are evaluated using Monte Carlo simulations, and these methods are also applied to two real examples. RESULTS: The simulation results show that sDiff performs well and has relatively high test efficiency in most situations. Regarding sample size calculation, sDiff exhibits good performance in various situations. The methods are illustrated using two examples. CONCLUSIONS: RMTL can meaningfully summarize treatment effects for clinical decision making, which can then be reported with the SDH ratio for competing risks data. The proposed sDiff test and the two calculated sample size formulas have wide applicability and can be considered in real data analysis and trial design.
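
For uncensored competing-risks data the RMTL has a simple closed form: the time lost to cause k before horizon tau equals the average of (tau - T) over subjects failing from cause k before tau. A minimal sketch under that simplifying no-censoring assumption (the paper's estimator handles censoring via the cumulative incidence function):

```python
import numpy as np

def rmtl(times, causes, cause, tau):
    """Restricted mean time lost to `cause` before horizon tau, for
    uncensored competing-risks data.

    Uses the identity RMTL_k = E[(tau - T)_+ * 1{cause = k}], which equals
    the integral of the cause-k cumulative incidence function up to tau.
    """
    t = np.asarray(times, float)
    c = np.asarray(causes)
    return float(np.mean(np.clip(tau - t, 0.0, None) * (c == cause)))

times = [2.0, 5.0, 7.0, 12.0]
causes = [1, 2, 1, 1]        # cause 2 is the competing event
print(rmtl(times, causes, cause=1, tau=10.0))  # (8 + 0 + 3 + 0) / 4 = 2.75
```

Because RMTL is a time quantity ("years of life lost before tau"), differences in RMTL are clinically interpretable even when the proportional sub-distribution hazard assumption fails.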


Subjects
Proportional Hazards Models, Computer Simulation, Humans, Monte Carlo Method, Sample Size, Time Factors
8.
BMC Med Res Methodol ; 20(1): 244, 2020 09 30.
Article in English | MEDLINE | ID: mdl-32998683

ABSTRACT

BACKGROUND: Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review some simple methods to aid researchers in interpreting statistical outputs. These methods emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. METHODS: We use the Shannon transform of the P-value p, also known as the binary surprisal or S-value s = -log2(p), to provide a measure of the information supplied by the testing procedure, and to help calibrate intuitions against simple physical experiments like coin tossing. We also use tables or graphs of test statistics for alternative hypotheses, and interval estimates for different percentile levels, to thwart fallacies arising from arbitrary dichotomies. Finally, we reinterpret P-values and interval estimates in unconditional terms, which describe compatibility of data with the entire set of analysis assumptions. We illustrate these methods with a reanalysis of data from an existing record-based cohort study. CONCLUSIONS: In line with other recent recommendations, we advise that teaching materials and research reports discuss P-values as measures of compatibility rather than significance, compute P-values for alternative hypotheses whenever they are computed for null hypotheses, and interpret interval estimates as showing values of high compatibility with data, rather than regions of confidence. Our recommendations emphasize cognitive devices for displaying the compatibility of the observed data with various hypotheses of interest, rather than focusing on single hypothesis tests or interval estimates. We believe these simple reforms are well worth the minor effort they require.
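
The Shannon transform described above is a one-liner, and its coin-tossing calibration is direct: s bits of surprisal correspond to seeing s heads in a row from a fair coin.

```python
import math

def s_value(p):
    """Shannon transform (binary surprisal) of a p-value: s = -log2(p).

    s measures the information supplied by the test against the test
    hypothesis, in bits; s = 4 is about as surprising as 4 heads in a row.
    """
    return -math.log2(p)

print(s_value(0.05))   # about 4.3 bits: roughly 4 heads in a row
print(s_value(0.005))  # about 7.6 bits
```

Note how the transform deflates the apparent gulf between thresholds: p = 0.05 carries only about 4.3 bits of information, versus about 7.6 bits for p = 0.005.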


Subjects
Cognition, Semantics, Bayes Theorem, Cohort Studies, Confidence Intervals, Humans, Probability
9.
J Res Natl Inst Stand Technol ; 125: 125003, 2020.
Article in English | MEDLINE | ID: mdl-38343525

ABSTRACT

Given a composite null hypothesis ℋ0, test supermartingales are non-negative supermartingales with respect to ℋ0 with an initial value of 1. Large values of test supermartingales provide evidence against ℋ0. As a result, test supermartingales are an effective tool for rejecting ℋ0, particularly when the p-values obtained are very small and serve as certificates against the null hypothesis. Examples include the rejection of local realism as an explanation of Bell test experiments in the foundations of physics and the certification of entanglement in quantum information science. Test supermartingales have the advantage of being adaptable during an experiment and allowing for arbitrary stopping rules. By inversion of acceptance regions, they can also be used to determine confidence sets. We used an example to compare the performance of test supermartingales for computing p-values and confidence intervals to Chernoff-Hoeffding bounds and the "exact" p-value. The example is the problem of inferring the probability of success in a sequence of Bernoulli trials. There is a cost in using a technique that has no restriction on stopping rules, and, for a particular test supermartingale, our study quantifies this cost.
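
For the Bernoulli setting the classic construction is the likelihood-ratio martingale: multiply in the ratio of a fixed alternative q to the null p0 at each observation. This is a generic sketch of that construction, not the specific supermartingale analyzed in the paper:

```python
def lr_supermartingale(xs, p0, q):
    """Likelihood-ratio test (super)martingale for H0: P(X = 1) = p0.

    M_0 = 1 and M_n multiplies in the likelihood ratio of a fixed
    alternative q to p0 at each Bernoulli observation; under H0 it is a
    martingale, and by Ville's inequality 1 / max_n M_n bounds the
    p-value at any stopping time.
    """
    m, path = 1.0, [1.0]
    for x in xs:
        m *= (q / p0) if x == 1 else ((1 - q) / (1 - p0))
        path.append(m)
    return path

# 18 successes out of 20 is strong evidence against H0: p = 0.5
path = lr_supermartingale([1] * 18 + [0] * 2, p0=0.5, q=0.9)
print(path[-1], 1 / max(path))
```

Because the p-value bound 1 / max M_n is valid at every stopping time, the experimenter may monitor the path and stop whenever it is large, which is the flexibility (and, as the paper quantifies, the cost) of the approach.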

10.
Stat Appl Genet Mol Biol ; 17(3)2018 06 13.
Article in English | MEDLINE | ID: mdl-29897889

ABSTRACT

MOTIVATION: Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes, enabling studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. RESULTS: We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The true probability distribution of the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we present a simple example based on a small hypothetical network and the activation measurements of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed for a public database, available through NCBI-GEO, presented as Supplementary Material. AVAILABILITY: This method is implemented in an R package available on the BioConductor project website under the name maigesPack.


Subjects
Gene Regulatory Networks, Metabolic Networks and Pathways/genetics, Statistical Models, Genetic Databases, Esophageal Neoplasms/genetics, Esophageal Neoplasms/pathology, Gene Expression Profiling/statistics & numerical data, Humans, Genetic Models, Stomach Neoplasms/genetics, Stomach Neoplasms/pathology
11.
Biom J ; 61(1): 162-165, 2019 01.
Article in English | MEDLINE | ID: mdl-30417414

ABSTRACT

A well-known problem in classical two-tailed hypothesis testing is that P-values go to zero as the sample size goes to infinity, irrespective of the effect size. This pitfall can make tests on very large samples potentially unreliable. In this note, we propose testing for relevant differences to overcome this issue. We illustrate the proposed test on a real data set of about 40 million privately insured patients.
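
The idea of testing for a relevant difference is to replace the point null with an interval null H0: |mu1 - mu2| <= delta, so a tiny but clinically irrelevant effect no longer drives p to zero at huge sample sizes. A simple z-test sketch of this idea (illustrative only; the note's actual test statistic may differ):

```python
import math

def relevant_difference_test(mean1, mean2, se, delta, alpha=0.05):
    """Test H0: |mu1 - mu2| <= delta against H1: |mu1 - mu2| > delta.

    `se` is the standard error of the estimated difference and `delta`
    the smallest difference considered relevant. Returns the one-sided
    normal p-value and the rejection decision.
    """
    z = (abs(mean1 - mean2) - delta) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal p-value
    return p, p < alpha

# Huge sample: a 0.001 difference is "significant" classically (z = 10),
# but irrelevant when differences below 0.1 do not matter
print(relevant_difference_test(10.001, 10.0, se=0.0001, delta=0.1))
```

With delta = 0 this reduces to the classical two-sample z-test, which makes the contrast between the two nulls easy to see.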


Subjects
Biometry/methods, Hospital Emergency Service/statistics & numerical data, Humans, Sample Size, Virus Diseases/epidemiology
12.
Biometrics ; 74(1): 196-206, 2018 03.
Article in English | MEDLINE | ID: mdl-29542118

ABSTRACT

Researchers in genetics and other life sciences commonly use permutation tests to evaluate differences between groups. Permutation tests have desirable properties, including exactness if data are exchangeable, and are applicable even when the distribution of the test statistic is analytically intractable. However, permutation tests can be computationally intensive. We propose both an asymptotic approximation and a resampling algorithm for quickly estimating small permutation p-values (e.g., <10-6) for the difference and ratio of means in two-sample tests. Our methods are based on the distribution of test statistics within and across partitions of the permutations, which we define. In this article, we present our methods and demonstrate their use through simulations and an application to cancer genomic data. Through simulations, we find that our resampling algorithm is more computationally efficient than another leading alternative, particularly for extremely small p-values (e.g., <10-30). Through application to cancer genomic data, we find that our methods can successfully identify up- and down-regulated genes. While we focus on the difference and ratio of means, we speculate that our approaches may work in other settings.


Subjects
Genomics/methods, Statistical Models, Algorithms, Animals, Computer Simulation, Gene Expression Profiling, Gene Expression Regulation, Genomics/statistics & numerical data, Humans, Neoplasms/genetics
13.
Biometrics ; 73(2): 441-451, 2017 06.
Article in English | MEDLINE | ID: mdl-27918612

ABSTRACT

This article investigates a generalized semiparametric varying-coefficient model for longitudinal data that can flexibly model three types of covariate effects: time-constant effects, time-varying effects, and covariate-varying effects. Different link functions can be selected to provide a rich family of models for longitudinal data. The model assumes that the time-varying effects are unspecified functions of time and the covariate-varying effects are parametric functions of an exposure variable specified up to a finite number of unknown parameters. The estimation procedure is developed using local linear smoothing and profile weighted least squares estimation techniques. Hypothesis testing procedures are developed to test the parametric functions of the covariate-varying effects. The asymptotic distributions of the proposed estimators are established. A working formula for bandwidth selection is discussed and examined through simulations. Our simulation study shows that the proposed methods have satisfactory finite sample performance. The proposed methods are applied to the ACTG 244 clinical trial of HIV-infected patients treated with Zidovudine to examine the effects of antiretroviral treatment switching before and after HIV develops the T215Y/F drug resistance mutation. Our analysis shows benefits of switching to the combination therapies as compared to continuing with ZDV monotherapy before and after developing the 215 mutation.


Subjects
Random Allocation, Computer Simulation, Humans, Least-Squares Analysis, Research Design
15.
Pharm Stat ; 14(2): 139-50, 2015.
Article in English | MEDLINE | ID: mdl-25641830

ABSTRACT

Drug development is not the only industrial-scientific enterprise subject to government regulations. In some fields of ecology and environmental science, the application of statistical methods is also regulated by ordinance. Over the past 20 years, ecologists and environmental scientists have argued against the unthinking application of null hypothesis significance tests. More recently, Canadian ecologists have suggested a new approach to significance testing that takes account of the costs of both type I and type II errors. In this paper, we investigate the implications of this approach for testing in drug development and demonstrate that its adoption leads directly to the likelihood principle and Bayesian approaches.


Subjects
Statistical Data Interpretation, Drug Discovery/methods, Drug Discovery/statistics & numerical data, Bayes Theorem, Humans, Sample Size
16.
Pharm Stat ; 12(5): 255-9, 2013.
Article in English | MEDLINE | ID: mdl-23893876

ABSTRACT

In May 2012, the Committee of Health and Medicinal Products issued a concept paper on the need to review the points to consider document on multiplicity issues in clinical trials. In preparation for the release of the updated guidance document, Statisticians in the Pharmaceutical Industry held a one-day expert group meeting in January 2013. Topics debated included multiplicity and the drug development process, the usefulness and limitations of newly developed strategies to deal with multiplicity, multiplicity issues arising from interim decisions and multiregional development, and the need for simultaneous confidence intervals (CIs) corresponding to multiple test procedures. A clear message from the meeting was that multiplicity adjustments need to be considered when the intention is to make a formal statement about efficacy or safety based on hypothesis tests. Statisticians have a key role when designing studies to assess what adjustment really means in the context of the research being conducted. More thought during the planning phase needs to be given to multiplicity adjustments for secondary endpoints given these are increasing in importance in differentiating products in the market place. No consensus was reached on the role of simultaneous CIs in the context of superiority trials. It was argued that unadjusted intervals should be employed as the primary purpose of the intervals is estimation, while the purpose of hypothesis testing is to formally establish an effect. The opposing view was that CIs should correspond to the test decision whenever possible.


Subjects
Drug Industry/statistics & numerical data, Research Design/statistics & numerical data, Research Personnel, Clinical Trials as Topic/statistics & numerical data, Confidence Intervals, Humans
17.
Psychometrika ; 88(2): 636-655, 2023 06.
Article in English | MEDLINE | ID: mdl-36892727

ABSTRACT

Research questions in the human sciences often seek to answer if and when a process changes across time. In functional MRI studies, for instance, researchers may seek to assess the onset of a shift in brain state. For daily diary studies, the researcher may seek to identify when a person's psychological process shifts following treatment. The timing and presence of such a change may be meaningful in terms of understanding state changes. Currently, dynamic processes are typically quantified as static networks where edges indicate temporal relations among nodes, which may be variables reflecting emotions, behaviors, or brain activity. Here we describe three methods for detecting changes in such correlation networks from a data-driven perspective. Networks here are quantified using the lag-0 pair-wise correlation (or covariance) estimates as the representation of the dynamic relations among variables. We present three methods for change point detection: dynamic connectivity regression, max-type method, and a PCA-based method. The change point detection methods each include different ways to test if two given correlation network patterns from different segments in time are significantly different. These tests can also be used outside of the change point detection approaches to test any two given blocks of data. We compare the three methods for change point detection as well as the complementary significance testing approaches on simulated and empirical functional connectivity fMRI data examples.
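
The max-type idea, comparing two correlation networks by their largest entrywise difference and calibrating it by permuting segment labels, can be sketched as follows (an illustrative simplification, not the authors' exact procedure):

```python
import numpy as np

def max_type_corr_diff(X, Y, n_perm=2000, seed=0):
    """Permutation max-type test for a difference between the correlation
    matrices of two data segments (rows = time points, cols = variables).

    The statistic is the largest absolute entrywise difference between the
    two correlation matrices; time points are permuted between segments to
    approximate the null distribution.
    """
    rng = np.random.default_rng(seed)
    def stat(a, b):
        return np.max(np.abs(np.corrcoef(a.T) - np.corrcoef(b.T)))
    observed = stat(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if stat(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))                    # independent variables
Y = rng.normal(size=(200, 3))
Y[:, 1] = Y[:, 0] + 0.3 * rng.normal(size=200)   # an edge appears in segment 2
print(max_type_corr_diff(X, Y))
```

Permuting time points ignores temporal autocorrelation, which is one reason the paper's calibrations for fMRI and diary data are more involved than this sketch.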


Subjects
Brain Mapping, Magnetic Resonance Imaging, Humans, Magnetic Resonance Imaging/methods, Brain Mapping/methods, Neural Pathways, Psychometrics, Brain/diagnostic imaging
18.
Genes (Basel) ; 13(1)2022 01 14.
Article in English | MEDLINE | ID: mdl-35052480

ABSTRACT

The inference of ancestry has become part of the services many forensic genetic laboratories provide. Ancestry may be of interest to provide investigative leads or to identify the region of origin in cases of unidentified missing persons. Many biostatistical methods have been developed for the study of population structure in population genetics. However, the challenges and questions are slightly different in forensic genetics, where the origin of a specific sample is of interest, rather than the understanding of population histories and genealogies. In this paper, methodologies for modelling population admixture and inferring ancestral populations are reviewed, with a focus on their strengths and weaknesses in relation to ancestry inference in the forensic context.


Subjects
Ethnicity/genetics, Forensic Genetics/methods, Genetic Markers, Population Genetics, Single Nucleotide Polymorphism, Racial Groups/genetics, Humans
19.
MethodsX ; 9: 101660, 2022.
Article in English | MEDLINE | ID: mdl-35345788

ABSTRACT

Large sets of autocorrelated data are common in fields such as remote sensing and genomics. For example, remote sensing can produce maps of information for millions of pixels, and the information from nearby pixels will likely be spatially autocorrelated. Although there are well-established statistical methods for testing hypotheses using autocorrelated data, these methods become computationally impractical for large datasets. • The method developed here makes it feasible to perform F-tests, likelihood ratio tests, and t-tests for large autocorrelated datasets. The method involves subsetting the dataset into partitions, analyzing each partition separately, and then combining the separate tests to give an overall test. • The separate statistical tests on partitions are non-independent, because the points in different partitions are not independent. Therefore, combining separate analyses of partitions requires accounting for the non-independence of the test statistics among partitions. • The methods can be applied to a wide range of data, including not only purely spatial data but also spatiotemporal data. For spatiotemporal data, it is possible to estimate coefficients from time-series models at different spatial locations and then analyze the spatial distribution of the estimates. The spatial analysis can be simplified by estimating spatial autocorrelation directly from the spatial autocorrelation among time series.
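
The key step above, combining partition-level tests while accounting for their non-independence, can be illustrated with correlated z-statistics: if the vector of partition statistics has a known correlation matrix under the null, the sum is normal with variance given by that matrix's grand sum. This is an illustrative sketch of the principle, not the authors' algorithm:

```python
import math

import numpy as np

def combine_correlated_z(z, corr):
    """Combine z-statistics from K non-independent partitions.

    Under the null each z_k ~ N(0, 1) and the vector has correlation
    matrix `corr`; the sum is then N(0, 1' corr 1), so dividing by that
    standard deviation gives a single standard-normal combined statistic.
    """
    z = np.asarray(z, float)
    var = np.sum(corr)               # 1' corr 1, the grand sum
    z_comb = z.sum() / np.sqrt(var)
    # Two-sided normal p-value via the error function
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z_comb) / math.sqrt(2.0))))
    return z_comb, p

# Three partitions whose test statistics share pairwise correlation 0.4
corr = np.array([[1.0, 0.4, 0.4],
                 [0.4, 1.0, 0.4],
                 [0.4, 0.4, 1.0]])
print(combine_correlated_z([2.1, 1.8, 2.4], corr))
```

Treating the partitions as independent would use variance 3 instead of 5.4 here and overstate the combined evidence, which is exactly the failure mode the method above is designed to avoid.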

20.
J Appl Stat ; 49(14): 3659-3676, 2022.
Article in English | MEDLINE | ID: mdl-36246862

ABSTRACT

The problem of testing the intercept and slope parameters of doubly multivariate linear models with site-dependent covariates using Rao's score test (RST) is studied. The RST statistic is developed for a block exchangeable covariance structure on the error vector under the assumption of multivariate normality. We compare our developed RST statistic with the likelihood ratio test (LRT) statistic. Monte Carlo simulations indicate that the RST statistic is much more accurate than its counterpart LRT statistic and it takes significantly less computation time than the LRT statistic. The proposed method is illustrated with an example of multiple response variables measured on multiple trees in a single plot in an agricultural study.
