Results 1 - 18 of 18
1.
BMC Genomics ; 21(1): 75, 2020 Jan 28.
Article in English | MEDLINE | ID: mdl-31992223

ABSTRACT

BACKGROUND: High-throughput RNA sequencing (RNA-seq) has evolved into an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus on which normalization and statistical methods are most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainties in data interpretation and study reproducibility, especially with studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log estimate), TMM (trimmed mean of M-values) and UQ (upper quartile normalization), in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis with respect to: 1) normalization combined with the choice of a Wald test from DESeq2 or an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths. RESULTS: Using the MAQC RNA-seq datasets with small numbers of sample replicates, we found that UQ-pgQ2 normalization combined with an exact test achieves better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large and that a QL F-test performs best for sample sizes of 5, 10 and 15 under any normalization. The RLE, TMM and UQ methods performed similarly for a given sample size. CONCLUSION: We found that UQ-pgQ2 combined with an exact test/QL F-test is the best choice for controlling false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. Based on the simulated data, we observed that read depth has a minimal impact on differential gene expression analysis.
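As an illustration of the normalization step being compared, a minimal sketch of plain upper-quartile (UQ) scaling in Python is given below; the matrix and function names are hypothetical, and the per-gene rescaling that distinguishes UQ-pgQ2 is not reproduced.

import numpy as np

def upper_quartile_normalize(counts):
    # Scale each sample (column) by its upper quartile of nonzero gene counts.
    # Sketch of plain UQ normalization only; UQ-pgQ2 adds a per-gene step.
    counts = np.asarray(counts, dtype=float)
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    scale = uq / uq.mean()          # keep normalized counts on a comparable scale
    return counts / scale           # divide each sample by its scale factor

# toy matrix: 5 genes x 3 samples
counts = np.array([[10, 20, 5],
                   [0, 40, 10],
                   [100, 180, 60],
                   [3, 7, 2],
                   [50, 90, 30]])
print(upper_quartile_normalize(counts))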


Subjects
Gene Expression Profiling , Gene Library , High-Throughput Nucleotide Sequencing , Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Profiling/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Monte Carlo Method , Neoplasms/genetics , Reproducibility of Results , Sensitivity and Specificity , Software
2.
Stat Med ; 39(23): 3173-3183, 2020 10 15.
Article in English | MEDLINE | ID: mdl-32557688

ABSTRACT

We analytically obtain the average success probability (ASP) and the contemplated average success probability (CASP) for normally distributed observed differences between the treatment-group and placebo-group means of an early trial and a confirmatory trial, assuming a uniform noninformative prior for the population treatment effect and a common known variance of the observations from both groups. For the CASP optimization problem in which the subtotal per-arm sample size of the early and confirmatory trials is fixed and larger than a threshold, we obtain the optimal allocation of sample sizes in a theorem. In the same theorem, we obtain an analytical formula for the optimal CASP as an increasing function of the subtotal sample size. We then calculate the numerical values of the ASP and compare them with those in Table 1 of Chuang-Stein (2006). Finally, we investigate the numerical features of the CASP and find the optimal plan of the sample sizes for a given subtotal sample size.
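The ASP is closely related to what is often called assurance, and it can be approximated by Monte Carlo: draw the treatment effect from its posterior given the early-trial result, then average the confirmatory trial's power over those draws. A sketch under a flat prior and known variance follows; the sample sizes and effect shown are illustrative, not the paper's optimal plan.

import numpy as np
from scipy.stats import norm

def average_success_prob(diff_early, sigma, n_early, n_confirm,
                         alpha=0.025, n_draws=100_000, seed=0):
    # Monte Carlo ASP: average confirmatory power over the posterior of the
    # true effect given the early-trial observed difference (flat prior,
    # known common variance sigma^2, two-arm trials with n per arm).
    rng = np.random.default_rng(seed)
    post_sd = sigma * np.sqrt(2.0 / n_early)           # posterior sd of the effect
    delta = rng.normal(diff_early, post_sd, n_draws)   # posterior draws
    se_confirm = sigma * np.sqrt(2.0 / n_confirm)
    z_crit = norm.ppf(1 - alpha)
    power = norm.cdf(delta / se_confirm - z_crit)      # one-sided success probability
    return power.mean()

print(average_success_prob(diff_early=0.4, sigma=1.0, n_early=50, n_confirm=150))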


Subjects
Sample Size , Bayes Theorem , Humans , Probability
3.
Stat Med ; 38(12): 2282-2291, 2019 05 30.
Article in English | MEDLINE | ID: mdl-30773666

ABSTRACT

In the 1990s, China experienced a high degree of antibiotic abuse, which resulted in increased drug resistance. In response, the World Health Organization introduced a program for children under the age of 5 years who had an acute respiratory tract infection. We analyze data on the treatment provided by doctors in several hospitals in China in order to understand the relationships in these data. The data are nested in a three-level hierarchical structure with small cluster sizes ranging from 2 to 10. While large-sample theory provides a mechanism to construct confidence intervals and test hypotheses about regression coefficients, the estimation algorithms often fail to converge when applied to such small cluster sizes. This paper presents the split bootstrap, a novel combination of the cluster bootstrap and primary-unit-splitting methods, which can be used as an alternative when analyzing data on antibiotic abuse in China with small cluster sizes. The split bootstrap method provides accurate estimates with a minimal reduction in precision.
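A plain cluster bootstrap, which resamples primary units with replacement, is one ingredient of the split bootstrap; a minimal sketch with hypothetical column names follows (the primary-unit-splitting step is not reproduced).

import numpy as np
import pandas as pd

def cluster_bootstrap(df, cluster_col, stat_fn, n_boot=1000, seed=0):
    # Resample whole clusters (primary units) with replacement and recompute
    # a statistic, returning its bootstrap distribution.
    rng = np.random.default_rng(seed)
    clusters = df[cluster_col].unique()
    stats = []
    for _ in range(n_boot):
        sampled = rng.choice(clusters, size=len(clusters), replace=True)
        boot_df = pd.concat([df[df[cluster_col] == c] for c in sampled],
                            ignore_index=True)
        stats.append(stat_fn(boot_df))
    return np.array(stats)

# toy example: mean outcome with hospital as the cluster
df = pd.DataFrame({"hospital": [1, 1, 2, 2, 3, 3, 3],
                   "y": [0.2, 0.4, 0.1, 0.3, 0.5, 0.6, 0.4]})
draws = cluster_bootstrap(df, "hospital", lambda d: d["y"].mean())
print(draws.mean(), np.percentile(draws, [2.5, 97.5]))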


Subjects
Algorithms , Biometry/methods , Statistical Models , Anti-Bacterial Agents/therapeutic use , China , Computer Simulation , Drug Utilization , Humans , Inappropriate Prescribing , Respiratory Tract Infections/drug therapy
4.
J Stat Plan Inference ; 194: 106-121, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29358843

ABSTRACT

As the US health care system undergoes unprecedented changes, the need for adequately powered studies to understand the multiple levels of main and interaction factors that influence patient and other care outcomes in hierarchical settings has taken center stage. We consider two-level models where n lower-level units are nested within each of J higher-level clusters (e.g., patients within practices and practices within networks) and where two factors may have arbitrary numbers of levels, a and b, respectively. The two factors may together represent a × b treatment combinations, or one of them may be a pretreatment covariate. Placing both factors at the same higher or lower hierarchical level, or one factor per hierarchical level, yields a cluster (C), multisite (M), or split-plot randomized design (S). We express statistical power to detect main, interaction, or any treatment effects as a function of sample sizes (n, J), factor levels a and b, intraclass correlation ρ, and effect sizes δ given each design d ∈ {C, M, S}. The power function given a, b, ρ, δ and d determines adequate sample sizes to achieve a minimum power requirement. We then compare the impact of the designs on power to facilitate selection of the optimal design and sample sizes in a way that minimizes the total cost given budget and logistic constraints. Our approach enables accurate and conservative power computation with a priori knowledge of only three effect size differences, regardless of how large a × b is, simplifying previously available computation methods for health services and other research.
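For the simplest special case, a two-arm cluster-randomized (C) design with a single main effect, power as a function of n, J, ρ and δ reduces to a familiar design-effect formula; the sketch below covers that case only and is not the paper's general a × b machinery.

import numpy as np
from scipy.stats import norm

def crt_main_effect_power(n, J, rho, delta, alpha=0.05):
    # Approximate power to detect a standardized main effect delta in a
    # two-arm cluster-randomized design: J clusters of size n split evenly
    # between arms, intraclass correlation rho.
    design_effect = 1 + (n - 1) * rho
    ncp = delta * np.sqrt(n * J / (4 * design_effect))
    return norm.cdf(ncp - norm.ppf(1 - alpha / 2))

print(crt_main_effect_power(n=20, J=30, rho=0.05, delta=0.3))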

5.
J Biopharm Stat ; 27(2): 220-232, 2017.
Article in English | MEDLINE | ID: mdl-28060570

ABSTRACT

Large sample size imbalance is not uncommon in biosimilar development. At the beginning of product development, the numbers of biosimilar and reference product batches may be limited, so a sample size calculation may not be feasible. During the development stage, more reference product batches may be added later to obtain a more reliable estimate of the reference variability. On the other hand, a sufficient number of biosimilar batches is also needed to understand the product well. These challenges lead to potential sample size imbalance. In this paper, we show that large sample size imbalance may increase the power of the equivalence test in an unfavorable way, giving higher power for less similar products when the sample size of the biosimilar is much smaller than that of the reference product. It is therefore necessary to make sample size imbalance adjustments that motivate a sufficient sample size for the biosimilar as well. This paper discusses two adjustment methods for the equivalence test in analytical biosimilarity studies. Keep in mind that sufficient sample sizes for both biosimilar and reference products (if feasible) are desirable during the planning stage.
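The equivalence test in question is typically a two one-sided tests (TOST) procedure on the difference in batch means; a generic sketch with an illustrative fixed margin follows (the paper's imbalance adjustments, and the usual dependence of the margin on reference variability, are not reproduced).

import numpy as np
from scipy import stats

def tost_equivalence(x_test, x_ref, margin, alpha=0.05):
    # Two one-sided tests (TOST) for equivalence of means within +/- margin.
    x_test, x_ref = np.asarray(x_test, float), np.asarray(x_ref, float)
    diff = x_test.mean() - x_ref.mean()
    n1, n2 = len(x_test), len(x_ref)
    sp2 = ((n1 - 1) * x_test.var(ddof=1) + (n2 - 1) * x_ref.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    t_lower = (diff + margin) / se        # H0: diff <= -margin
    t_upper = (diff - margin) / se        # H0: diff >=  margin
    p = max(1 - stats.t.cdf(t_lower, df), stats.t.cdf(t_upper, df))
    return diff, p, p < alpha             # equivalence concluded if p < alpha

rng = np.random.default_rng(1)
biosim = rng.normal(100, 3, size=6)       # few biosimilar batches
ref = rng.normal(101, 3, size=20)         # many reference batches
print(tost_equivalence(biosim, ref, margin=5.0))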


Subjects
Biosimilar Pharmaceuticals/standards , Statistical Data Interpretation , Research Design , Sample Size , Humans
6.
Patterns (N Y) ; 5(2): 100910, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38370125

ABSTRACT

Big genomic data and artificial intelligence (AI) are ushering in an era of precision medicine, providing opportunities to study previously under-represented subtypes and rare diseases rather than categorize them as variances. However, clinical researchers face challenges in accessing such novel technologies as well as reliable methods to study small datasets or subcohorts with unique phenotypes. To address this need, we developed an integrative approach, GAiN, to capture patterns of gene expression from small datasets on the basis of an ensemble of generative adversarial networks (GANs) while leveraging big population data. Where conventional biostatistical methods fail, GAiN reliably discovers differentially expressed genes (DEGs) and enriched pathways between two cohorts with limited numbers of samples (n = 10) when benchmarked against a gold standard. GAiN is freely available at GitHub. Thus, GAiN may serve as a crucial tool for gene expression analysis in scenarios with limited samples, as in the context of rare diseases, under-represented populations, or limited investigator resources.

7.
Ecol Evol ; 13(11): e10747, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38020673

ABSTRACT

How to effectively obtain species-related low-dimensional data from massive numbers of environmental variables has become an urgent problem for species distribution models (SDMs). In this study, we explore whether dimensionality reduction of environmental variables can improve the predictive performance of SDMs. We first used two linear (principal component analysis (PCA) and independent component analysis) and two nonlinear (kernel principal component analysis (KPCA) and uniform manifold approximation and projection) dimensionality reduction techniques (DRTs) to reduce the dimensionality of high-dimensional environmental data. We then built five SDMs based on the dimensionality-reduced environmental variables for 23 real plant species and nine virtual species, and compared their predictive performance with that of SDMs based on environmental variables selected using Pearson's correlation coefficient (PCC). In addition, we studied the effects of DRTs, model complexity, and sample size on the predictive performance of SDMs. The predictive performance of SDMs under DRTs other than KPCA was better than under PCC-based selection, and SDMs using linear DRTs performed better than those using nonlinear DRTs. Moreover, applying DRTs to the environmental variables affected the predictive performance of SDMs no less than model complexity and sample size did. At the complex level of model complexity, PCA improved the predictive performance of SDMs the most, by 2.55% compared with PCC. At the middle level of sample size, PCA improved the predictive performance of SDMs by 2.68% compared with PCC. Our study demonstrates that DRTs have a significant effect on the predictive performance of SDMs. Specifically, linear DRTs, especially PCA, are more effective at improving model predictive performance under relatively complex model settings or large sample sizes.
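As a minimal illustration of the PCA-then-model workflow, the sketch below reduces simulated environmental predictors with PCA before fitting a simple presence/absence model; the logistic-regression SDM and all variable names are illustrative stand-ins for the models compared in the paper.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
env = rng.normal(size=(500, 30))                 # 500 sites x 30 environmental variables
presence = (env[:, 0] + 0.5 * env[:, 1] + rng.normal(size=500) > 0).astype(int)

# reduce the predictors to 5 components, then fit the presence/absence model
sdm = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression())
sdm.fit(env, presence)
print(sdm.score(env, presence))                  # in-sample accuracy of the sketch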

8.
Br J Math Stat Psychol ; 75(3): 444-465, 2022 11.
Article in English | MEDLINE | ID: mdl-35094381

ABSTRACT

Cochran's Q statistic is routinely used for testing heterogeneity in meta-analysis. Its expected value is also used in several popular estimators of the between-study variance, τ². Those applications generally have not considered the implications of its use of estimated variances in the inverse-variance weights. Importantly, those weights make approximating the distribution of Q (more explicitly, Q_IV) rather complicated. As an alternative, we investigate a new Q statistic, Q_F, whose constant weights use only the studies' effective sample sizes. For the standardized mean difference as the measure of effect, we study, by simulation, approximations to the distributions of Q_IV and Q_F, as the basis for tests of heterogeneity and for new point and interval estimators of τ². These include new DerSimonian-Kacker-type moment estimators based on the first moment of Q_F, and novel median-unbiased estimators. The results show that: an approximation based on an algorithm of Farebrother follows both the null and the alternative distributions of Q_F reasonably well, whereas the usual chi-squared approximation for the null distribution of Q_IV and the Biggerstaff-Jackson approximation to its alternative distribution are poor; in estimating τ², our moment estimator based on Q_F is almost unbiased, the Mandel-Paule estimator has some negative bias in some situations, and the DerSimonian-Laird and restricted maximum likelihood estimators have considerable negative bias; and all 95% interval estimators have coverage that is too high when τ² = 0, but otherwise the Q-profile interval performs very well.
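Both statistics are weighted sums of squared deviations from a weighted mean effect and differ only in the weights; a sketch for standardized mean differences, with illustrative data, follows.

import numpy as np

def cochran_q(effects, weights):
    # Generalized Cochran's Q: weighted squared deviations from the weighted mean.
    effects, weights = np.asarray(effects, float), np.asarray(weights, float)
    mean = np.sum(weights * effects) / np.sum(weights)
    return np.sum(weights * (effects - mean) ** 2)

# standardized mean differences with per-study arm sizes (illustrative data)
smd = np.array([0.30, 0.10, 0.45, -0.05])
n1 = np.array([20, 35, 15, 50])
n2 = np.array([22, 30, 18, 48])

var_iv = (n1 + n2) / (n1 * n2) + smd**2 / (2 * (n1 + n2))  # usual SMD variance estimate
q_iv = cochran_q(smd, 1 / var_iv)             # Q_IV: estimated inverse-variance weights
n_eff = n1 * n2 / (n1 + n2)                   # effective sample sizes
q_f = cochran_q(smd, n_eff)                   # Q_F: constant, effect-free weights
print(q_iv, q_f)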


Subjects
Algorithms , Statistical Models , Computer Simulation
9.
Eval Health Prof ; 45(1): 36-53, 2022 03.
Article in English | MEDLINE | ID: mdl-35225017

ABSTRACT

Single-case experimental designs (SCEDs) are increasingly recognized as a valuable alternative to group designs. Mediation analysis is useful in SCED contexts because it informs researchers about the underlying mechanism through which an intervention influences the outcome. However, methods for conducting mediation analysis in SCEDs have only recently been proposed. Furthermore, repeated measures of a target behavior present the challenges of autocorrelation and missing data. This paper aims to extend methods for estimating indirect effects in piecewise regression analysis in SCEDs by (1) evaluating three methods for modeling autocorrelation, namely Newey-West (NW) estimation, feasible generalized least squares (FGLS) estimation, and explicit modeling of an autoregressive structure of order one (AR(1)) in the error terms, and (2) evaluating multiple imputation in the presence of data that are missing completely at random. FGLS and AR(1) outperformed NW and ordinary least squares (OLS) estimation in terms of efficiency, Type I error rates, and coverage, while OLS was superior to the other methods in terms of power for larger samples. The performance of all methods was consistent across the 0% and 20% missing-data conditions, whereas 50% missing data led to unsatisfactory power and biased estimates. In light of these findings, we provide recommendations for applied researchers.
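A minimal piecewise (interrupted time-series) regression for one case with Newey-West standard errors can be fitted with statsmodels as sketched below; the design matrix, lag choice, and simulated data are illustrative.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T, change = 40, 20
time = np.arange(T)
phase = (time >= change).astype(float)                  # 0 = baseline, 1 = intervention
y = 2 + 0.05 * time + 1.5 * phase + 0.1 * phase * (time - change)
y += rng.normal(scale=0.5, size=T)                      # noise (autocorrelation omitted here)

# level/slope-change design matrix; Newey-West (HAC) covariance with lag 1
X = sm.add_constant(np.column_stack([time, phase, phase * (time - change)]))
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 1})
print(fit.params, fit.bse)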


Subjects
Mediation Analysis , Research Design , Humans
10.
Stat Methods Med Res ; 30(7): 1667-1690, 2021 07.
Article in English | MEDLINE | ID: mdl-34110941

ABSTRACT

Contemporary statistical publications rely on simulation to evaluate performance of new methods and compare them with established methods. In the context of random-effects meta-analysis of log-odds-ratios, we investigate how choices in generating data affect such conclusions. The choices we study include the overall log-odds-ratio, the distribution of probabilities in the control arm, and the distribution of study-level sample sizes. We retain the customary normal distribution of study-level effects. To examine the impact of the components of simulations, we assess the performance of the best available inverse-variance-weighted two-stage method, a two-stage method with constant sample-size-based weights, and two generalized linear mixed models. The results show no important differences between fixed and random sample sizes. In contrast, we found differences among data-generation models in estimation of heterogeneity variance and overall log-odds-ratio. This sensitivity to design poses challenges for use of simulation in choosing methods of meta-analysis.
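A sketch of one such data-generation model, with normally distributed study effects, uniformly drawn control-arm probabilities and random per-arm sample sizes, is given below; the ranges and the continuity correction are illustrative choices, not the paper's exact settings.

import numpy as np

def simulate_lor_meta(k, mu, tau2, p_ctrl_low, p_ctrl_high, n_low, n_high, seed=0):
    # Generate one random-effects meta-analysis of k two-arm studies on the
    # log-odds-ratio scale (normal study effects, uniform control probabilities).
    rng = np.random.default_rng(seed)
    theta = rng.normal(mu, np.sqrt(tau2), k)                 # study-level true LORs
    p0 = rng.uniform(p_ctrl_low, p_ctrl_high, k)
    n = rng.integers(n_low, n_high + 1, k)                   # same size in both arms
    odds1 = np.exp(theta) * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)
    x0 = rng.binomial(n, p0)
    x1 = rng.binomial(n, p1)
    # 0.5 continuity correction keeps the estimates finite with zero cells
    lor = np.log((x1 + 0.5) / (n - x1 + 0.5)) - np.log((x0 + 0.5) / (n - x0 + 0.5))
    var = 1/(x1 + 0.5) + 1/(n - x1 + 0.5) + 1/(x0 + 0.5) + 1/(n - x0 + 0.5)
    return lor, var

lor, var = simulate_lor_meta(k=10, mu=0.5, tau2=0.1,
                             p_ctrl_low=0.1, p_ctrl_high=0.4, n_low=20, n_high=200)
print(lor.round(2))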


Subjects
Statistical Models , Computer Simulation , Linear Models , Odds Ratio , Sample Size
11.
J Appl Stat ; 47(13-15): 2641-2657, 2020.
Article in English | MEDLINE | ID: mdl-35707435

ABSTRACT

When applying analysis of variance, the sample sizes may not be known in advance, so it is more appropriate to consider them as realizations of random variables. A motivating example is the collection of observations during a fixed time span in a study comparing, say, several pathologies of patients arriving at a hospital. This paper extends the theory of analysis of variance to such situations by considering mixed effects models. We assume that the occurrences of observations correspond to a counting process and that the sample sizes follow a Poisson distribution. The proposed approach is applied to a study of cancer patients.
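The setup can be mimicked by first drawing the realized group sizes from Poisson distributions and then performing the analysis of variance; a fixed-effects sketch (the paper treats mixed effects models) follows with illustrative rates and means.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rates = [30, 30, 30]                       # expected arrivals per pathology group
means = [10.0, 10.5, 11.0]
groups = []
for rate, mu in zip(rates, means):
    n = max(rng.poisson(rate), 2)          # random realized sample size for this group
    groups.append(rng.normal(mu, 2.0, size=n))

f_stat, p_value = stats.f_oneway(*groups)  # one-way ANOVA on the realized samples
print([len(g) for g in groups], f_stat, p_value)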

12.
Assessment ; 25(6): 793-800, 2018 09.
Article in English | MEDLINE | ID: mdl-27655971

ABSTRACT

Sample sizes of 50 have been cited as sufficient to obtain stable means and standard deviations in normative test data. The influence of skewness on this minimum number, however, has not been evaluated. Normative test data with varying levels of skewness were compiled for 12 measures from 7 tests collected as part of ongoing normative studies in Brisbane, Australia. Means and standard deviations were computed from sample sizes of 10 to 100 drawn with replacement from larger samples of 272 to 973 cases. The minimum sample size was determined as the number at which both the mean and standard deviation estimates remained within the 90% confidence intervals surrounding the population estimates. Sample sizes greater than 85 were found to generate stable means and standard deviations regardless of the level of skewness, with smaller samples sufficing for less skewed distributions. A formula was derived to compute the recommended sample size at differing levels of skewness.
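The resampling design can be sketched as below: repeatedly draw samples of each candidate size with replacement from a large (here simulated, skewed) normative sample and track how far the sample means and SDs stray from the full-sample values; the stability rule based on 90% confidence intervals is left to the reader.

import numpy as np

def estimate_stability(population, sizes=(10, 25, 50, 85, 100), n_draws=5000, seed=0):
    # For each candidate size n, resample with replacement and summarize the
    # spread of sample means and SDs around the full-sample values.
    rng = np.random.default_rng(seed)
    print("population mean/SD:", population.mean().round(2),
          population.std(ddof=1).round(2))
    for n in sizes:
        means = np.array([rng.choice(population, n).mean() for _ in range(n_draws)])
        sds = np.array([rng.choice(population, n).std(ddof=1) for _ in range(n_draws)])
        print(f"n={n:3d}  mean 5-95%: {np.percentile(means, [5, 95]).round(2)}  "
              f"SD 5-95%: {np.percentile(sds, [5, 95]).round(2)}")

skewed = np.random.default_rng(1).gamma(2.0, 2.0, size=900)   # skewed normative sample
estimate_stability(skewed)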


Subjects
Sample Size , Statistics as Topic , Adolescent , Adult , Humans , Middle Aged , Neuropsychological Tests , Young Adult
13.
Struct Equ Modeling ; 24(5): 666-683, 2017.
Article in English | MEDLINE | ID: mdl-29662296

ABSTRACT

It has been suggested that Bayesian methods have the potential to increase power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect with the power of normal-theory, distribution-of-the-product, percentile, and bias-corrected bootstrap confidence intervals at N ≤ 200. Bayesian methods with diffuse priors had power comparable to the distribution-of-the-product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of the prior distributions were also examined. Increased precision led to greater power only when N ≥ 100 and the effects were small, when N < 60 and the effects were large, or when N < 200 and the effects were medium. An empirical example from psychology illustrates a Bayesian analysis of the single-mediator model, from prior selection to interpretation of results.
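With diffuse priors, a credibility interval for the mediated effect a·b can be approximated by drawing a and b from normal posteriors centered at their estimates and taking quantiles of the product; a sketch with illustrative estimates follows (informative priors, studied in the paper, would shift and tighten these draws).

import numpy as np

def mediated_effect_ci(a_hat, se_a, b_hat, se_b, n_draws=100_000, cred=0.95, seed=0):
    # Monte Carlo interval for the indirect effect a*b under approximately
    # normal posteriors for a and b (a diffuse-prior approximation).
    rng = np.random.default_rng(seed)
    ab = rng.normal(a_hat, se_a, n_draws) * rng.normal(b_hat, se_b, n_draws)
    lo, hi = np.percentile(ab, [(1 - cred) / 2 * 100, (1 + cred) / 2 * 100])
    return lo, hi

print(mediated_effect_ci(a_hat=0.35, se_a=0.10, b_hat=0.40, se_b=0.12))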

14.
BMC Res Notes ; 10(1): 446, 2017 Sep 06.
Article in English | MEDLINE | ID: mdl-28877742

ABSTRACT

BACKGROUND: In the first stage of meta-analytic structural equation modeling (MASEM), researchers synthesize studies using univariate meta-analysis (UM) or multivariate meta-analysis (MM) approaches. The MM approaches are known to perform better than the UM approaches in meta-analyses of equally sized studies. In real situations, however, where studies may differ in size, the empirical performance of these approaches in the first and second stages of MASEM has yet to be studied. The present study aimed to evaluate the performance of the UM and MM methods when the primary studies have unequal sample sizes. Testing the homogeneity of correlation matrices, the empirical power, estimation of the pooled correlation matrix, and estimation of the parameters of a path model were investigated for these approaches by simulation. RESULTS: The results for the first stage showed that the Type I error rate was well under control at the 0.05 level when the average sample size was 200 or more, irrespective of the method or the sample sizes used. Moreover, the relative percentage biases of the pooled correlation matrices were lower than 2.5% for all methods. There was a dramatic decrease in empirical power for all synthesis methods when the inequality of the sample sizes increased. In fitting the path model at the second stage, the MM methods provided better estimates of the parameters. CONCLUSIONS: This study showed that the four methods differ in statistical power, especially when the sample sizes of the primary studies are highly unequal. Moreover, in fitting the path model, the MM approaches provided better estimates of the parameters.


Subjects
Meta-Analysis as Topic , Theoretical Models , Sample Size
15.
Genetics ; 204(3): 921-931, 2016 11.
Article in English | MEDLINE | ID: mdl-27646141

ABSTRACT

Single nucleotide polymorphism (SNP) set tests have been a powerful method for analyzing next-generation sequencing (NGS) data. The popular sequence kernel association test (SKAT) tests a set of variants as random effects in the linear mixed model setting. Its P-value is calculated from asymptotic theory that requires a large sample size; consequently, SKAT is known to be conservative and can lose power at small or moderate sample sizes. Given the current cost of sequencing technology, the scale of NGS studies is still limited. In this report, we derive and implement computationally efficient, exact (nonasymptotic) score (eScore), likelihood ratio (eLRT), and restricted likelihood ratio (eRLRT) tests, ExactVCTest, that can achieve high power even when sample sizes are small. We perform simulation studies under various genetic scenarios. Our ExactVCTest (i.e., eScore, eLRT, eRLRT) exhibits well-controlled type I error. Under the alternative model, eScore P-values are universally smaller than those from SKAT. eLRT and eRLRT demonstrate significantly higher power than eScore, SKAT, and SKAT optimal (SKAT-o) across all scenarios and various sample sizes. We applied these tests to an exome sequencing study. Our findings replicate previous results and shed light on rare-variant effects within genes. The software package is implemented in the open-source, high-performance technical computing language Julia and is freely available at https://github.com/Tao-Hu/VarianceComponentTest.jl. Analysis of each trait in the exome sequencing data set, with 399 individuals and 16,619 genes, takes around 1 minute on a desktop computer.


Subjects
Contig Mapping/methods , Single Nucleotide Polymorphism , Software , Exome , Humans
16.
Vaccine ; 33(6): 749-52, 2015 Feb 04.
Article in English | MEDLINE | ID: mdl-25454855

ABSTRACT

Phase 1 preventive HIV vaccine trials are often designed as randomized, double-blind studies with the inclusion of placebo recipients. Careful consideration is needed to determine when the inclusion of placebo recipients is highly advantageous and when it is optional for achieving the study objectives of assessing vaccine safety, tolerability and immunogenicity. The inclusion of placebo recipients is generally important to form a reference group that ensures fair evaluation and interpretation of subjective study endpoints, or endpoints whose levels may change due to exposures besides vaccination. In some settings, however, placebo recipients are less important because other data sources and tools are available to achieve the study objectives.


Subjects
AIDS Vaccines/immunology , Phase I Clinical Trials as Topic , HIV Infections/prevention & control , Randomized Controlled Trials as Topic , Research Design , Vaccination , AIDS Vaccines/administration & dosage , Double-Blind Method , HIV Infections/immunology , HIV Infections/virology , HIV-1/immunology , Humans , Placebos
17.
Eur J Cancer ; 51(9): 1082-90, 2015 Jun.
Article in English | MEDLINE | ID: mdl-24239127

ABSTRACT

Detecting statistically significant trends in incidence with cancer registry data depends not only on the size of the covered population but also on the level of the incidence rates, the duration of the diagnostic period, and the type of temporal variation. We simulated sample sizes of newly diagnosed cases based on a variety of plausible levels of cancer rates and scenarios of changing trends over a period of about 30 years. Each simulated set of cases was then analysed with joinpoint regression models. Power was derived as the relative frequency of simulation runs in which the p-value of the trend coefficient was less than 0.05 under the alternative model. In the case of a decreasing trend with no change of direction (no joinpoint), an annual percentage change (APC) of 1% for an average rate of 10 per 100,000 is detectable in populations of half a million inhabitants or more with a nominal power of 80%. In a model with one joinpoint followed by an increasing trend, the minimum detectable APC increases, and an APC of about 2% can be detected only with populations of at least 2 million. For analyses requiring a larger sample size than the actual covered population, alternative organisational strategies should be considered, such as extending population coverage or pooling and merging data from registries with comparable data (i.e., when heterogeneity across the merged registries is low or acceptable for the specific study question).
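The power computation for the simplest scenario, a single log-linear trend with no joinpoint, can be sketched by simulating Poisson case counts and fitting a Poisson GLM; the function below is illustrative and does not reproduce the joinpoint-regression fitting used in the paper.

import numpy as np
import statsmodels.api as sm

def power_linear_trend(pop, rate_per_100k, apc, years=30, alpha=0.05,
                       n_sim=1000, seed=0):
    # Simulate yearly case counts under a constant annual percentage change
    # and estimate power as the share of runs with a significant slope.
    rng = np.random.default_rng(seed)
    t = np.arange(years)
    base = pop * rate_per_100k / 1e5
    mu = base * (1 + apc / 100) ** t                  # expected counts under the APC
    X = sm.add_constant(t)
    hits = 0
    for _ in range(n_sim):
        y = rng.poisson(mu)
        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        hits += fit.pvalues[1] < alpha
    return hits / n_sim

print(power_linear_trend(pop=500_000, rate_per_100k=10, apc=-1.0))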


Subjects
Statistical Data Interpretation , Neoplasms/epidemiology , Registries , Cohort Studies , Computer Simulation , Europe/epidemiology , European Union/statistics & numerical data , Humans , Linear Models , Population , Registries/standards , Registries/statistics & numerical data , SEER Program/statistics & numerical data , Sample Size
18.
Ther Innov Regul Sci ; 48(5): 613-622, 2014 Sep.
Article in English | MEDLINE | ID: mdl-30231455

ABSTRACT

The European Union (EU) test for uniformity of dosage units using large sample sizes was published in European Pharmacopoeia 7.7 in 2012. There are two alternative tests. Option 1 is a parametric two-sided tolerance interval-based method modified with an indifference zone and a count of units outside (0.75M, 1.25M), where M is defined from the sample mean x̄ as M = 98.5% if x̄ < 98.5%, M = 101.5% if x̄ > 101.5%, and M = x̄ otherwise. Option 2 is a nonparametric counting method with an additional indifference-zone concept. The authors previously extended the parametric two one-sided tolerance interval-based method, originally proposed for dose content uniformity testing based on 30 tablets, to large sample sizes, with the restriction that all operating characteristic curves of the two one-sided tolerance intervals for any given sample size intersect the operating characteristic curve of the US Pharmacopoeia harmonized method for a sample size of 30 at an acceptance probability of 90% when individual tablets with an on-target mean are assumed to be normally distributed. This paper studies the acceptance probabilities in relation to the batch mean and batch standard deviation for the two EU options and the authors' proposed method. The acceptance probabilities of EU options 1 and 2 and the proposed method were compared by simulation; the results revealed that both EU options produce larger acceptance probabilities when the batch mean is off-target. Furthermore, for a given standard deviation, the acceptance probability of EU option 2 at a mean of 102% of the label claim is larger than that at a mean of 100% of the label claim under the normality assumption.
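Two small pieces of the machinery can be sketched directly from the definitions above: the reference value M clipped to the 98.5-101.5% indifference zone, and the count of units outside (0.75M, 1.25M); the full acceptance rules of the EU options are not reproduced here.

import numpy as np

def m_reference(xbar):
    # M as defined above: the sample mean (in % of label claim) clipped to [98.5, 101.5].
    return min(max(xbar, 98.5), 101.5)

def count_outside(units):
    # Count dosage units outside (0.75*M, 1.25*M), one ingredient of the EU options.
    units = np.asarray(units, float)
    m = m_reference(units.mean())
    return int(np.sum((units < 0.75 * m) | (units > 1.25 * m)))

rng = np.random.default_rng(0)
batch = rng.normal(102.0, 4.0, size=500)     # 500 units, off-target mean, % of label claim
print(m_reference(batch.mean()), count_outside(batch))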
