ABSTRACT
Biologists currently have an assortment of high-throughput sequencing techniques that allow population dynamics to be studied in increasing detail. The utility of genetic estimates depends on their ability to recover meaningful approximations while filtering out noise produced by artifacts. In this study, we empirically compared the congruence of two reduced representation approaches, genotyping-by-sequencing (GBS) and whole-exome sequencing (WES), in estimating genetic diversity and population structure using SNP markers typed in a small number of wild jaguar (Panthera onca) samples from South America. Owing to its targeted nature, WES allowed a more straightforward reconstruction of loci than GBS, facilitating the identification of true polymorphisms across individuals. We therefore used WES-derived metrics as a benchmark against which GBS-derived indicators were compared, adjusting parameters for locus assembly and SNP filtering in the latter. We observed significant variation in SNP call rates across samples in the GBS datasets, leading to recurrent miscalling of heterozygous sites. This issue was further amplified by small sample sizes, ultimately affecting the consistency of summary statistics between genotyping methods. Recognizing that the genetic markers obtained from GBS and WES are intrinsically different because they are subject to different evolutionary pressures, particularly selection, we consider that our empirical comparison offers valuable insights and highlights key considerations for estimating population genetic attributes from reduced representation datasets. Our results emphasize the need for careful evaluation of missing data and stringent filtering to achieve reliable estimates of genetic diversity and differentiation in elusive wildlife species.
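A minimal sketch of why per-sample call rate matters for diversity estimates: it computes each individual's call rate and observed heterozygosity (Ho) from a genotype matrix. The 0/1/2 allele-count coding, the use of `None` for missing calls, and the toy genotypes are assumptions for illustration; this is not the authors' GBS or WES pipeline.

```python
from typing import Optional, Sequence

# Genotypes coded as alternate-allele counts: 0 = homozygous reference,
# 1 = heterozygous, 2 = homozygous alternate, None = missing call.
# (This coding is an assumption of the sketch.)
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of loci with a non-missing genotype call."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Proportion of heterozygous calls among the called loci only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    return sum(g == 1 for g in called) / len(called)


# Hypothetical individuals typed at the same 8 SNPs.
samples = {
    "ind_A": [0, 1, 2, 1, 0, 1, 2, 0],
    "ind_B": [0, None, 2, None, 0, 1, None, None],
}
for name, gts in samples.items():
    print(name, round(call_rate(gts), 3), round(observed_heterozygosity(gts), 3))
```

Here ind_B's Ho rests on only four called genotypes, so a single miscalled heterozygote would shift it by 0.25; filtering on a minimum call rate before computing summary statistics is one way to limit such effects.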
Subjects
Exome Sequencing , Panthera , Polymorphism, Single Nucleotide , Animals , Panthera/genetics , Exome Sequencing/methods , Genotyping Techniques/methods , Genetics, Population , Sample Size , High-Throughput Nucleotide Sequencing , Genetic Variation , Animals, Wild/genetics
ABSTRACT
OBJECTIVE: While statistical analysis plays a crucial role in medical science, some published studies may have used suboptimal analysis methods, potentially undermining the credibility of their findings. Critically appraising analytical approaches can help raise the standard of evidence and ensure that clinicians and other stakeholders have trustworthy results on which to base decisions. The aim of the present study was to examine the statistical characteristics of original articles published in Peruvian medical journals in 2021-2022. DESIGN AND SETTING: We performed a methodological study of articles published between 2021 and 2022 in nine medical journals indexed in SciELO-Peru, Scopus, and Medline. We included original articles with analytical designs (i.e., assessing associations between variables). The statistical characteristics assessed were the statistical software used for analysis, sample size, the statistical methods employed (measures of effect), whether confounders were controlled for, and the method or epidemiological approach used for confounder control. RESULTS: We included 313 articles (ranging from 11 to 77 across journals), of which 67.7% were cross-sectional studies. While 90.7% of articles specified the statistical software used, 78.3% omitted details on sample size calculation. Descriptive and bivariate statistics were commonly employed, whereas measures of association were less common. Only 13.4% of articles (ranging from 0% to 39% across journals) presented measures of effect controlling for confounding and explained the criteria for selecting such confounders. CONCLUSION: This study revealed important statistical deficiencies in analytical studies published in Peruvian journals, including inadequate reporting of sample sizes, absence of measures of association and confounding control, and suboptimal explanations of the methodologies employed for adjusted analyses. These findings highlight the need for better statistical reporting and researcher-editor collaboration to improve the quality of research production and dissemination in Peruvian journals.
Subjects
Periodicals as Topic , Peru , Periodicals as Topic/statistics & numerical data , Humans , Sample Size , Publishing/statistics & numerical data , Research Design
ABSTRACT
BACKGROUND: The physical therapy profession has made efforts to increase the use of confidence intervals because of the valuable information they provide for clinical decision-making. Confidence intervals indicate the precision of the results and describe the strength and direction of a treatment effect measure. OBJECTIVES: To determine the prevalence of reporting of confidence intervals, achievement of intended sample size, and adjustment for multiple primary outcomes in randomised trials of physical therapy interventions. METHODS: We randomly selected 100 trials published in 2021 and indexed on the Physiotherapy Evidence Database. Two independent reviewers extracted the number of participants, any sample size calculation, and any adjustments for multiple primary outcomes. We extracted whether at least one between-group comparison was reported with a 95 % confidence interval and whether any confidence intervals were interpreted. RESULTS: The prevalence of use of confidence intervals was 47 % (95 % CI: 38, 57). Only 6 % of trials (95 % CI: 3, 12) both reported and interpreted a confidence interval. Among the 100 trials, 59 (95 % CI: 49, 68) calculated and achieved the required sample size. Among the 100 trials, 19 % (95 % CI: 13, 28) had a problem with unadjusted multiplicity on the primary outcomes. CONCLUSIONS: Around half of trials of physical therapy interventions published in 2021 reported confidence intervals around between-group differences. This represents an increase of 5 % from five years earlier. Very few trials interpreted the confidence intervals. Most trials reported a sample size calculation, and among these, most achieved that sample size. There is still a need to increase the use of adjustment for multiple comparisons.
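The abstract does not name the interval method behind its 95% CIs. As a check, a Wilson score interval for a binomial proportion reproduces all four reported intervals after rounding to whole percentages; the sketch below is offered on that assumption, not as the authors' stated method.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts out of the 100 sampled trials, as reported in the abstract.
for label, k in [("CI reported", 47), ("CI reported and interpreted", 6),
                 ("sample size achieved", 59), ("unadjusted multiplicity", 19)]:
    lo, hi = wilson_interval(k, 100)
    print(f"{label}: {k}% (95% CI {100 * lo:.0f}, {100 * hi:.0f})")
```

The Wilson interval is a common choice for proportions near 0 or 1 (such as 6/100), where the simple Wald interval can behave poorly.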
Subjects
Physical Therapy Modalities , Randomized Controlled Trials as Topic , Humans , Sample Size , Confidence Intervals
ABSTRACT
At some point in our lives, we have probably been asked the following question: "How likely is it that you would recommend [company X] to a friend or colleague?" This question underlies the Net Promoter Score (NPS), a simple measure used by several companies as an indicator of customer loyalty. Even though it is well known in the business world, studies addressing its statistical properties or the sample size determination problem for this measure are still scarce. We adopt a Bayesian approach to provide point and interval estimators for the NPS and discuss the determination of the sample size. Computational tools were implemented to apply this methodology in practice, and an illustrative example with data from financial services is presented.
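The abstract does not specify the prior or the estimators used. As a hedged sketch, one natural Bayesian formulation treats the counts of detractors (scores 0-6), passives (7-8), and promoters (9-10) as multinomial with a Dirichlet prior, so the posterior of NPS = %promoters - %detractors can be simulated directly. The uniform Dirichlet(1, 1, 1) prior and the survey data are assumptions, not the authors' choices.

```python
import random
from collections import Counter


def classify(score: int) -> str:
    """Standard NPS categories for a 0-10 recommendation score."""
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def nps_posterior(scores, prior=(1.0, 1.0, 1.0), draws=20000, seed=1):
    """Monte Carlo draws of NPS under a Dirichlet-multinomial model.

    prior holds Dirichlet parameters for (detractor, passive, promoter);
    the uniform default is an assumption of this sketch.
    """
    counts = Counter(classify(s) for s in scores)
    alpha = [prior[0] + counts["detractor"],
             prior[1] + counts["passive"],
             prior[2] + counts["promoter"]]
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent gamma draws, normalized.
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        samples.append(100 * (g[2] - g[0]) / sum(g))
    samples.sort()
    return alpha, samples


def summarize(samples, level=0.95):
    """Posterior mean and equal-tailed credible interval from sorted draws."""
    n = len(samples)
    lo = samples[int(n * (1 - level) / 2)]
    hi = samples[int(n * (1 + level) / 2) - 1]
    return sum(samples) / n, (lo, hi)


# Hypothetical survey: 40 promoters, 35 passives, 25 detractors.
scores = [10] * 40 + [8] * 35 + [3] * 25
alpha, draws = nps_posterior(scores)
mean, (lo, hi) = summarize(draws)
print(f"posterior mean NPS = {mean:.1f}, 95% credible interval = ({lo:.1f}, {hi:.1f})")
```

Under this model the posterior mean also has a closed form, 100 × (α_promoter − α_detractor) / Σα, which is a useful check on the simulation.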
Subjects
Bayes Theorem , Sample Size , Humans , Consumer Behavior
ABSTRACT
Modern randomization methods in clinical trials are invariably adaptive, meaning that the assignment of the next subject to a treatment group uses the information accumulated in the trial. Some recent adaptive randomization methods use mathematical programming to construct attractive clinical trials that balance group features, such as group sizes and the covariate distributions of their subjects. We review some of these methods and compare their performance with that of common covariate-adaptive randomization methods for small clinical trials. We introduce an energy distance measure that quantifies the discrepancy between the two groups using the joint distribution of the subjects' covariates. This metric is more appealing than evaluating the discrepancy between the groups using their marginal covariate distributions. Using numerical experiments, we demonstrate the advantages of the mathematical programming methods under the new measure. In the supplementary material, we provide R code to reproduce our results and facilitate comparisons of different randomization procedures.
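The abstract does not give the exact definition or scaling of the proposed measure. The sketch below uses the standard two-sample energy distance computed on joint covariate vectors, which is one common way to realize the idea. The two hypothetical arms have identical marginal covariate distributions but opposite correlation, a difference that marginal comparisons miss.

```python
import math
from itertools import product


def _mean_pairwise(a, b):
    """Average Euclidean distance over all pairs (x, y) with x in a and y in b."""
    return sum(math.dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def energy_distance(group1, group2):
    """Two-sample energy distance 2E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic form).

    Computed on whole covariate vectors, so it reflects differences in the
    joint distribution (e.g. correlation), not only in the marginals.
    """
    return (2 * _mean_pairwise(group1, group2)
            - _mean_pairwise(group1, group1)
            - _mean_pairwise(group2, group2))


# Hypothetical standardized covariates (two per subject) in two arms.
arm_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
arm_b = [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]
print(energy_distance(arm_a, arm_a))  # identical groups
print(energy_distance(arm_a, arm_b))  # same marginals, opposite correlation
```

In practice covariates on different scales would be standardized first, since the Euclidean distance otherwise lets the widest-ranging covariate dominate.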
Subjects
Randomized Controlled Trials as Topic , Research Design , Humans , Randomized Controlled Trials as Topic/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Research Design/statistics & numerical data , Random Allocation , Models, Statistical , Sample Size
ABSTRACT
Computing the agreement between two continuous sequences is of great interest in statistics when comparing two instruments, or one instrument with a gold standard. The probability of agreement (PA) quantifies the similarity between two variables of interest and is useful for determining what constitutes a practically important difference. In this article, we introduce a generalization of the PA for spatial variables in which the PA depends on the spatial lag. We establish the conditions under which the PA decays as a function of the distance lag for isotropic stationary and nonstationary spatial processes. Estimation is addressed through a first-order approximation that guarantees the asymptotic normality of the sample version of the PA. The sensitivity of the PA to the covariance parameters is studied for finite sample sizes. The new method is illustrated with real data on autumnal changes in the green chromatic coordinate (Gcc), an index of "greenness" that captures the phenological stage of tree leaves, is associated with carbon flux from ecosystems, and is estimated from repeated images of forest canopies.
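To make the lag dependence concrete, the sketch below assumes a stationary, isotropic bivariate Gaussian process with an exponential cross-covariance and defines PA(h) = P(|X(s) − Y(s + h)| ≤ δ), where δ is the practically important difference. The covariance model, this exact definition, and all parameter values are assumptions; the paper's estimator and its first-order approximation are not reproduced. Under these assumptions, the variance of X(s) − Y(s + h) grows as the cross-covariance decays, so PA decreases with the lag whenever |μX − μY| < δ.

```python
import math
from statistics import NormalDist


def cross_cov(h, sigma_x, sigma_y, rho, phi):
    """Hypothetical exponential cross-covariance C_XY(h) = rho*sx*sy*exp(-h/phi)."""
    return rho * sigma_x * sigma_y * math.exp(-h / phi)


def prob_agreement(h, delta, mu_x, mu_y, sigma_x, sigma_y, rho, phi):
    """P(|X(s) - Y(s+h)| <= delta) for a stationary bivariate Gaussian process."""
    mean_d = mu_x - mu_y
    var_d = sigma_x**2 + sigma_y**2 - 2 * cross_cov(h, sigma_x, sigma_y, rho, phi)
    d = NormalDist(mean_d, math.sqrt(var_d))
    return d.cdf(delta) - d.cdf(-delta)


# Hypothetical parameters (rho < 1 keeps the difference variance positive).
params = dict(delta=0.5, mu_x=0.0, mu_y=0.1, sigma_x=1.0, sigma_y=1.0, rho=0.9, phi=2.0)
for h in [0.0, 1.0, 2.0, 5.0, 20.0]:
    print(h, round(prob_agreement(h, **params), 4))
```

As h grows, PA(h) approaches the value for independent X and Y, where the difference variance is σX² + σY².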
Subjects
Ecosystem , Forests , Probability , Sample Size
ABSTRACT
OBJECTIVE: To understand the social representations that people with tuberculosis hold about the disease and their implications for following treatment. METHOD: A descriptive, qualitative study based on the Theory of Social Representations, conducted in a municipal health unit in the city of Belém. The participants were people diagnosed with tuberculosis and undergoing directly observed treatment, with the sample size defined by the data saturation technique. Data were collected through semi-structured interviews and analyzed using thematic content analysis. RESULTS: The records converged into three categories: Representations of tuberculosis and its impacts on the diagnosis; The faces of treatment: challenges facing follow-up and hope; and Constructions of living with the disease in family and society. FINAL CONSIDERATIONS: Living with the disease transforms everyday life and relationships. The discrimination and prejudice reported point to the need to reshape these representations so that patients can be welcomed and supported.
Subjects
Tuberculosis , Humans , Data Collection , Prejudice , Qualitative Research , Sample Size
ABSTRACT
The European Medicines Agency's (EMA's) regulatory reference-scaled average bioequivalence (RSABE) approach for highly variable drugs has type I error control problems in the neighborhood of a 30% coefficient of variation (CV), where the bioequivalence (BE) limits switch from constant to linearly scaled. This paper analyzes BE inference methods based on the "leveling-off" (LO) limits, soft sigmoid expanding BE limits proposed as an appealing surrogate for the EMA's limits, and compares both approaches in replicated and partially replicated crossover designs. The initially proposed version of the LO method also inflates the type I error, albeit to a lesser extent. However, because it is mathematically more regular, it is more amenable to analytical corrections. Here we introduce two improvements over LO, one based on Howe's method and the other on correcting the estimation error. Both further reduce the type I error inflation, although the inflation does not disappear completely. Finally, we study the effect of heteroscedasticity on these results: it inflates or deflates the type I error depending on the design and the type of heteroscedasticity (variability of the test product greater than that of the reference product, or the opposite). The replicated design is much more stable against these effects than the partially replicated design and preserves the improvements much better.
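For orientation, the sketch below computes the EMA's expanding acceptance limits as they are commonly stated: 80.00-125.00% up to a reference within-subject CV of 30%, then exp(±0.760·s_wR) with s_wR = √ln(CV² + 1), capped at CV = 50%. It only illustrates the switch near 30% that the abstract discusses; the additional point-estimate constraint, the LO sigmoid limits, and the paper's corrections are not modeled.

```python
import math


def ema_expanded_limits(cv_wr: float, k: float = 0.760, cv_cap: float = 0.50):
    """BE acceptance limits (ratio scale) versus reference within-subject CV.

    Constant 0.80-1.25 up to CV = 30%; above that, exp(+/- k * s_wR) with
    s_wR = sqrt(ln(CV^2 + 1)), with CV capped at 50%.
    """
    if cv_wr <= 0.30:
        return 0.80, 1.25
    cv = min(cv_wr, cv_cap)
    s_wr = math.sqrt(math.log(cv**2 + 1))
    upper = math.exp(k * s_wr)
    return 1 / upper, upper


for cv in [0.20, 0.30, 0.31, 0.40, 0.50, 0.60]:
    lo, hi = ema_expanded_limits(cv)
    print(f"CVwR = {cv:.0%}: {100 * lo:.2f}% - {100 * hi:.2f}%")
```

At CV = 30% the scaled formula gives almost exactly 125%, so the limits are nearly continuous in value there, but their slope with respect to the CV changes abruptly; a smooth sigmoid such as LO removes that kink.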
Subjects
Therapeutic Equivalency , Humans , Cross-Over Studies , Sample Size
ABSTRACT
It is important to determine the number of plants to be evaluated so that accurate inferences can be made about the traits under evaluation. Investigating the linear relationships among traits is important for identifying traits for indirect selection. Thus, the objectives of this study were to determine the sample size (number of plants) necessary to estimate the means of forage pea traits and to investigate the relationships among these traits. Experiments were carried out in 2021 with three sowing dates (May 3, May 26, and July 13). Five hundred plants were randomly sampled, 100 plants on each of the five evaluation dates (June 25, August 30, July 24, September 17, and September 16). In these 500 plants, the following traits were evaluated: plant height, number of branches, number of nodes, number of leaves, number of pods, fresh matter of leaves, fresh matter of stems, fresh matter of pods, fresh matter of shoots, dry matter of leaves, dry matter of stems, dry matter of pods, and dry matter of shoots. The sample size needed to estimate the means of these traits was calculated based on Student's t-distribution, and the relationships among traits were investigated through correlation and path analysis. To estimate the means of these 13 forage pea traits in an experiment with an estimation error of approximately 10% of the mean, 99 plants per treatment should be sampled. The numbers of pods and leaves have a positive linear relationship with the fresh and dry matter of shoots.
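The abstract states only that the calculation is based on Student's t-distribution with an error of about 10% of the mean. A common form of that calculation, assumed here, takes the smallest n for which the half-width of the t-based confidence interval, t·CV/√n, does not exceed the target error E, with CV and E both expressed as percentages of the mean. The CV value below is hypothetical, not taken from the study; to stay dependency-free, the t quantile is obtained numerically rather than from a statistics library.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Student's t density (log-gamma form avoids overflow for large df)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x > 0 via Simpson's rule on [0, x], using symmetry about 0."""
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3


def t_quantile(p: float, df: int) -> float:
    """Upper quantile (p > 0.5) by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_size(cv_percent: float, error_percent: float, conf: float = 0.95) -> int:
    """Smallest n with t_{(1+conf)/2, n-1} * CV / sqrt(n) <= E (both in % of the mean)."""
    p = (1 + conf) / 2
    # Normal quantiles are smaller than t quantiles, so this n is a lower bound.
    z = NormalDist().inv_cdf(p)
    n = max(2, math.ceil((z * cv_percent / error_percent) ** 2))
    while t_quantile(p, n - 1) * cv_percent / math.sqrt(n) > error_percent:
        n += 1
    return n


# Hypothetical trait with CV = 30%; target error of 10% of the mean.
print(sample_size(cv_percent=30.0, error_percent=10.0))
```

Searching upward from the normal-approximation bound avoids the oscillation that can occur when n is re-estimated by repeatedly plugging the previous n into the t quantile.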
Subjects
Sample Size , Pisum sativum , Plant Breeding
ABSTRACT
We propose a new family of distributions, called the unit ratio-extended Weibull family, derived by applying a ratio transformation to an extended Weibull random variable. The use of this transformation is a novelty of the work, since it has been less explored than the exponential transformation and has not yet been studied within the extended Weibull class. Moreover, we offer a valuable alternative for modeling double-bounded variables on the unit interval. Five special models of the family are studied in detail, namely the i) unit ratio-Gompertz, ii) unit ratio-Burr XII, iii) unit ratio-Lomax, iv) unit ratio-Rayleigh, and v) unit ratio-Weibull distributions. We propose a quantile parameterization for the new family and present the maximum likelihood estimators (MLEs). A Monte Carlo study is performed to evaluate the behavior of the MLEs of the unit ratio-Gompertz and unit ratio-Rayleigh distributions; the latter has a closed-form MLE that is approximately unbiased for small sample sizes. Further, the submodels are fitted to dropout rates in Brazilian undergraduate courses, focusing on civil engineering, economics, computer science, and control engineering. The applications show that the new family is suitable for modeling educational data and may provide effective alternatives to other commonly used unit models, such as the beta, Kumaraswamy, and unit gamma distributions. Its submodels can also outperform some recent contributions in the unit distribution literature. Thus, the new family can provide competitive alternatives when those models are unsuitable.
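A sketch of the unit ratio-Rayleigh case, assuming the ratio transformation is Y = X/(1 + X), which maps (0, ∞) onto (0, 1), and that the Rayleigh member is written in extended Weibull form as F(x) = 1 − exp(−αx²). Both are assumptions about the paper's parameterization, and its quantile parameterization is not reproduced. Under them, the log-likelihood in α is n·log α − α·Σ(yᵢ/(1 − yᵢ))² plus constants, which gives a closed-form MLE.

```python
import math
import random


def sample_unit_ratio_rayleigh(alpha: float, n: int, rng: random.Random) -> list[float]:
    """Draw Y = X / (1 + X) with F_X(x) = 1 - exp(-alpha * x**2) (assumed form).

    Uses inversion: X = sqrt(-ln(1 - U) / alpha), U ~ Uniform(0, 1).
    """
    out = []
    for _ in range(n):
        u = rng.random()
        x = math.sqrt(-math.log1p(-u) / alpha)
        out.append(x / (1 + x))
    return out


def mle_alpha(y: list[float]) -> float:
    """Closed-form MLE under the assumed model: n / sum((y / (1 - y))**2)."""
    return len(y) / sum((v / (1 - v)) ** 2 for v in y)


def cdf(y: float, alpha: float) -> float:
    """CDF on (0, 1): F_Y(y) = 1 - exp(-alpha * (y / (1 - y))**2)."""
    return 1 - math.exp(-alpha * (y / (1 - y)) ** 2)


# Simulated (not real) data with a hypothetical alpha = 2.
rng = random.Random(2024)
y = sample_unit_ratio_rayleigh(alpha=2.0, n=5000, rng=rng)
print(round(mle_alpha(y), 3))
```

Because (yᵢ/(1 − yᵢ))² = Xᵢ² is exponential with rate α under these assumptions, the estimator is the familiar exponential-rate MLE applied to transformed data.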
Subjects
Engineering , Brazil , Statistical Distributions , Sample Size , Monte Carlo Method
ABSTRACT
Statistical sequential analysis of binary data is an important tool in clinical trials such as placebo-controlled trials, where a total of K individuals are randomly allocated into two groups, one of size κ1 receiving the treatment/drug and the other of size κ2 receiving placebo. The ratio z = κ2/κ1, known as the "matching ratio," determines the expected proportion of adverse events from the treatment group among the κ1 + κ2 individuals. Bernoulli-based designs are also used to monitor the safety of post-licensed drugs and vaccines. For instance, in a self-control design, z is the ratio between the risk and the control time windows. Irrespective of the type of application, the choice of z is a critical design criterion, as it determines the sample size, the statistical power, the expected sample size, and the expected time to signal of the sequential procedure. In this paper, we use exact calculations to offer a statistical rule of thumb for the choice of z. All calculations and examples are performed using the R package Sequential.
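To show how z enters the calculations, the sketch below uses the event probability implied by the design: with relative risk RR, an adverse event comes from the treatment group with probability RR/(RR + z), which is 1/(1 + z) under the null. It then computes exact size and power for a one-sided binomial test with a fixed number of events. This is a deliberately non-sequential simplification, not the paper's sequential procedure or the Sequential package's method; the event count and RR are hypothetical.

```python
import math


def event_prob(rr: float, z: float) -> float:
    """P(an adverse event comes from the treatment group) = RR / (RR + z)."""
    return rr / (rr + z)


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fixed_sample_test(n_events: int, z: float, rr: float, alpha: float = 0.05):
    """One-sided exact binomial test on n_events total events (non-sequential).

    Returns the critical value c (reject if treatment-group events >= c),
    the attained size under RR = 1, and the power at the given RR.
    """
    p0 = event_prob(1.0, z)
    c = next(k for k in range(n_events + 2) if binom_upper_tail(k, n_events, p0) <= alpha)
    size = binom_upper_tail(c, n_events, p0)
    power = binom_upper_tail(c, n_events, event_prob(rr, z))
    return c, size, power


for z in [0.5, 1.0, 2.0, 4.0]:
    c, size, power = fixed_sample_test(n_events=50, z=z, rr=2.0)
    print(f"z = {z}: critical value {c}, size {size:.4f}, power at RR = 2: {power:.3f}")
```

Because the test is exact and discrete, the attained size is at most α and usually below it, and how far below depends on z; this is one reason the choice of z matters even before sequential monitoring is added.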
Subjects
Research Design , Vaccines , Humans , Sample Size
ABSTRACT
It is important to determine an adequate number of plants to evaluate so that precise inferences can be made about the traits under evaluation. The study of linear relationships among traits provides important information, especially for identifying traits for indirect selection. Thus, the objectives of this work were to determine the sample size (number of plants) needed to estimate the mean of Crotalaria spectabilis traits and to investigate the relationships among traits. A total of 200 and 110 C. spectabilis plants were randomly selected in the experiments conducted in 2019/2020 and 2020/2021, respectively. In these 310 plants, the following traits were evaluated: plant height, stem diameter, number of nodes, number of leaves, leaf fresh matter, stem fresh matter, shoot fresh matter, leaf dry matter, stem dry matter, and shoot dry matter. The sample size needed to estimate the mean of these traits was calculated based on Student's t-distribution, and the relationships among traits were investigated through correlation and path analysis. To estimate the mean of these ten traits of C. spectabilis with a maximum error of 10% of the mean and a 95% confidence level, 64 plants are needed. In an experiment, to estimate the mean of each treatment with 10% precision, 64 plants per treatment must be evaluated. The number of leaves has a positive linear relationship with leaf, stem, and shoot fresh and dry matter.
Subjects
Linear Models , Sample Size , Crotalaria
ABSTRACT
In conceptual properties norming studies (CPNs), participants list properties that describe a set of concepts. Many different parameters are calculated from CPNs, such as semantic richness. A generally overlooked issue is that these values are only point estimates of the true, unknown population parameters. In the present work, we present an R package that allows these values to be treated as estimates of population parameters. Relatedly, a common practice in CPNs is to have an equal number of participants list properties for each concept (i.e., standardizing sample size). As we illustrate through examples, this procedure has negative effects on the statistical analyses of the data. Here, we argue that a better method is to standardize coverage (i.e., the proportion of sampled properties relative to the total number of properties that describe a concept), so that a similar coverage is achieved across concepts. When coverage rather than sample size is standardized, the concepts in a CPN are more likely to exhibit similar representativeness. Moreover, by computing coverage, researchers can decide whether a CPN has reached sufficiently high coverage for its results to be generalizable to other studies. The R package we make available allows one to compute coverage and to estimate the number of participants needed to reach a target coverage. We demonstrate this sampling procedure by applying the R package to real and simulated CPN data.
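The abstract does not say which coverage estimator the package implements. A classic choice, assumed here, is the Good-Turing estimate of sample coverage, 1 − f1/n, where f1 is the number of properties listed exactly once and n is the total number of property listings for a concept. Note that it weights properties by how often they are listed, which is one reading of the coverage idea described above; the listings are hypothetical.

```python
from collections import Counter


def good_turing_coverage(mentions):
    """Estimated sample coverage 1 - f1 / n (Good-Turing).

    mentions: one entry per listed property occurrence, pooled across participants.
    f1 = number of properties listed exactly once; n = total number of listings.
    """
    counts = Counter(mentions)
    n = sum(counts.values())
    f1 = sum(1 for c in counts.values() if c == 1)
    return 1 - f1 / n


# Hypothetical listings for one concept ("dog"), pooled across participants.
listings = ["barks", "has_fur", "barks", "pet", "has_tail", "barks", "pet", "loyal"]
print(round(good_turing_coverage(listings), 3))
```

Many singletons signal that further participants are likely to add new properties, so coverage rises as the same properties are listed repeatedly.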
Subjects
Research Design , Semantics , Humans , Sample Size
ABSTRACT
Sample size and statistical power are often limited in pediatric cardiology studies because specific congenital malformations of the heart and specific circulatory physiologies are relatively infrequent. The primary aim of this study was to determine what proportion of pediatric cardiology randomized controlled trials achieve 80% statistical power. Secondary aims included characterizing reporting habits in these studies. A systematic review was performed to identify relevant pediatric cardiology randomized controlled trials. The following data were collected: publication year, journal, whether "power" or "sample size" was mentioned, and whether a discrete primary endpoint was identified. Power analyses were conducted to assess whether the sample size was adequate to demonstrate results with 80% power at a p-value of less than 0.05. A total of 83 pediatric cardiology randomized controlled trials were included. Of these, 48% mentioned "power" or "sample size" in the methods, 49% in the results, 12% in the discussion, and 66% at any point in the manuscript; 63% defined a discrete primary endpoint. Thirty-eight studies (45%) had an adequate sample size to demonstrate differences with 80% power at a p-value of less than 0.05. Thus, most of these trials are not powered to reach the conventionally accepted 80% power target. Adequately powered studies were more likely to report "power" or "sample size" and to have a discrete primary endpoint.
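The review does not report which test it assumed for each trial. As a sketch of the kind of calculation involved, assuming a two-sided comparison of two means with a standardized effect size d, the normal approximation gives n per group = 2(z₁₋α/₂ + z₁₋β)²/d². The effect size and the trial size below are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided two-sample comparison of means (normal approximation).

    effect_size is the standardized difference d = (mu1 - mu2) / sigma.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / effect_size**2)


def achieved_power(n: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power with n subjects per group (ignores the negligible opposite tail)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n / 2) - z_a)


print(n_per_group(0.5))                    # per-group n for d = 0.5
print(round(achieved_power(20, 0.5), 3))   # a hypothetical small trial
```

The normal approximation gives 63 per group for d = 0.5; a t-based calculation gives a slightly larger value (64), and a trial with 20 per group would have roughly 35% power for the same effect.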
Subjects
Cardiology , Humans , Child , Randomized Controlled Trials as Topic , Sample Size
ABSTRACT
This study verified whether sample size affects the precision of the analysis of variance in experiments with cauliflower seedlings. An experiment was carried out in which the number of leaves and the shoot, root, and total lengths were measured. For each variable, resampling with replacement was performed in sample scenarios of 1, 2, ..., 100 seedlings per experimental unit, and the sample size was defined for the variance components through Schumacher models and maximum curvature points. The mean squares of the analysis of variance are directly affected by the number of sampled seedlings. Sampling 16 seedlings per experimental unit is enough to estimate the analysis of variance reliably, providing satisfactory gains in precision compared with sampling only one seedling per experimental unit.
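The resampling step can be sketched as follows: for each scenario of n seedlings per experimental unit, draw n values with replacement many times and measure how much the unit mean varies. The data are simulated and the function name is hypothetical; the study's analysis of variance on the resampled data and its Schumacher model fit are not reproduced.

```python
import random
import statistics


def bootstrap_variance_of_mean(values, n, reps=2000, rng=None):
    """Variance of the mean of n values drawn with replacement, over `reps` resamples."""
    rng = rng or random.Random(0)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(reps)]
    return statistics.variance(means)


# Simulated (hypothetical) shoot lengths, in cm, for 100 seedlings in one unit.
rng = random.Random(42)
unit = [rng.gauss(8.0, 1.5) for _ in range(100)]

for n in [1, 2, 4, 8, 16, 32, 64, 100]:
    v = bootstrap_variance_of_mean(unit, n, rng=rng)
    print(f"{n:3d} seedlings: variance of the unit mean = {v:.4f}")
```

The variance falls roughly in proportion to 1/n, so the gains in precision are large at first and taper off, which is the pattern a maximum curvature criterion is meant to locate.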
Subjects
Brassica/growth & development , Analysis of Variance , Sample Size , Plant Shoots/growth & development
ABSTRACT
This study analyzed the effect of sample size on Tukey's test for non-additivity and determined the sample size that optimizes the test for soybean grain yield. Six experiments were conducted in a randomized complete block design with either 20 or 30 cultivars and three replications of each treatment. Grain yield was determined per plant, totaling 9,000 sampled plants. Next, sample scenarios of up to 100 plants were simulated, and the F statistic of Tukey's one-degree-of-freedom test was estimated in each scenario. The optimal sample size was then defined via power models and the maximum curvature point. The results showed that the number of sampled plants per experimental unit influences the estimates of Tukey's test for non-additivity, and that sampling 14 to 19 plants per experimental unit maintains the accuracy of the test.
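For a fitted power model y = a·x^b, the maximum curvature point can be found in closed form by setting the derivative of the curvature |y''|/(1 + y'²)^{3/2} to zero, which gives x_MC = [a²b²(2b − 1)/(b − 2)]^{1/(2 − 2b)} for b < 0. The sketch below computes it and checks it against a grid search. The coefficients are hypothetical, and whether the study applied the curvature criterion in exactly this form is an assumption.

```python
import math


def curvature(x: float, a: float, b: float) -> float:
    """Curvature |y''| / (1 + y'^2)^(3/2) of the power model y = a * x**b."""
    d1 = a * b * x ** (b - 1)
    d2 = a * b * (b - 1) * x ** (b - 2)
    return abs(d2) / (1 + d1**2) ** 1.5


def max_curvature_point(a: float, b: float) -> float:
    """Closed form x_MC = [a^2 b^2 (2b - 1) / (b - 2)] ** (1 / (2 - 2b)), valid for b < 0."""
    return (a**2 * b**2 * (2 * b - 1) / (b - 2)) ** (1 / (2 - 2 * b))


# Hypothetical fitted coefficients for a decreasing curve versus plants per plot.
a, b = 50.0, -0.5
x_mc = max_curvature_point(a, b)
# Numerical check: grid search over 1..100 plants in steps of 0.001.
grid = [1 + i / 1000 for i in range(99001)]
x_grid = max(grid, key=lambda x: curvature(x, a, b))
print(round(x_mc, 3), round(x_grid, 3), math.ceil(x_mc))
```

Rounding x_MC up to the next whole plant gives a practical sample size per experimental unit under the fitted model.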
Subjects
Glycine max , Analysis of Variance , Sample Size
ABSTRACT
OBJECTIVES: This article introduces randomized clinical trials and basic concepts of statistical inference. We present methods for calculating the sample size according to the outcome type and the hypothesis to be tested, together with code in the R programming language. We describe four methods for adjusting the original sample size for interim analyses. We sought to introduce these topics in a simple and concrete way, considering both the mathematical expressions that support the results and their implementation in available statistical software, with the aim of bringing health students closer to statistics and the use of statistical software, aspects rarely considered during their training.
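As an illustration of a sample size calculation for a binary outcome, the sketch below uses the textbook normal-approximation formula for comparing two proportions with a two-sided test. It is one standard formula, not necessarily the exact expression or code used in the article, and the response rates are hypothetical.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n to detect p1 vs p2 with a two-sided test (normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Pooled variance under H0 for the alpha term, unpooled under H1 for the power term.
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical trial: 50% response with control vs 30% with treatment.
print(n_two_proportions(0.50, 0.30))
```

Smaller differences between the proportions drive the required n up quickly, because the difference enters the denominator squared; adjustments for interim analyses would then inflate this fixed-sample n.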