|

1.

Investigation of normalization procedures for transcriptome profiles of compounds oriented toward practical study design.

Mizuno, Tadahaya; Kusuhara, Hiroyuki.

J Toxicol Sci ; 49(6): 249-259, 2024.

Article En | MEDLINE | ID: mdl-38825484

The transcriptome profile is a representative phenotype-based descriptor of compounds, widely acknowledged for its ability to effectively capture compound effects. However, the presence of batch differences is inevitable. Despite the existence of sophisticated statistical methods, many of them presume a substantial sample size. How should we design a transcriptome analysis to obtain robust compound profiles, particularly in the context of small datasets frequently encountered in practical scenarios? This study addresses this question by investigating the normalization procedures for transcriptome profiles, focusing on the baseline distribution employed in deriving biological responses as profiles. Firstly, we investigated two large GeneChip datasets, comparing the impact of different normalization procedures. Through an evaluation of the similarity between response profiles of biological replicates within each dataset and the similarity between response profiles of the same compound across datasets, we revealed that the baseline distribution defined by all samples within each batch under batch-corrected condition is a good choice for large datasets. Subsequently, we conducted a simulation to explore the influence of the number of control samples on the robustness of response profiles across datasets. The results offer insights into determining the suitable quantity of control samples for diminutive datasets. It is crucial to acknowledge that these conclusions stem from constrained datasets. Nevertheless, we believe that this study enhances our understanding of how to effectively leverage transcriptome profiles of compounds and promotes the accumulation of essential knowledge for the practical application of such profiles.

Gene Expression Profiling , Research Design , Transcriptome , Gene Expression Profiling/methods , Humans , Oligonucleotide Array Sequence Analysis , Sample Size , Animals

2.

Multi-arm multi-stage (MAMS) randomised selection designs: impact of treatment selection rules on the operating characteristics.

Choodari-Oskooei, Babak; Blenkinsop, Alexandra; Handley, Kelly; Pinkney, Thomas; Parmar, Mahesh K B.

BMC Med Res Methodol ; 24(1): 124, 2024 Jun 03.

Article En | MEDLINE | ID: mdl-38831421

BACKGROUND: Multi-arm multi-stage (MAMS) randomised trial designs have been proposed to evaluate multiple research questions in the confirmatory setting. In designs with several interventions, such as the 8-arm 3-stage ROSSINI-2 trial for preventing surgical wound infection, there are likely to be strict limits on the number of individuals that can be recruited or the funds available to support the protocol. These limitations may mean that not all research treatments can continue to accrue the required sample size for the definitive analysis of the primary outcome measure at the final stage. In these cases, an additional treatment selection rule can be applied at the early stages of the trial to restrict the maximum number of research arms that can progress to the subsequent stage(s). This article provides guidelines on how to implement treatment selection within the MAMS framework. It explores the impact of treatment selection rules, interim lack-of-benefit stopping boundaries and the timing of treatment selection on the operating characteristics of the MAMS selection design. METHODS: We outline the steps to design a MAMS selection trial. Extensive simulation studies are used to explore the maximum/expected sample sizes, familywise type I error rate (FWER), and overall power of the design under both binding and non-binding interim stopping boundaries for lack-of-benefit. RESULTS: Pre-specification of a treatment selection rule reduces the maximum sample size by approximately 25% in our simulations. The familywise type I error rate of a MAMS selection design is smaller than that of the standard MAMS design with similar design specifications without the additional treatment selection rule. In designs with strict selection rules - for example, when only one research arm is selected from 7 arms - the final stage significance levels can be relaxed for the primary analyses to ensure that the overall type I error for the trial is not underspent. When conducting treatment selection from several treatment arms, it is important to select a large enough subset of research arms (that is, more than one research arm) at early stages to maintain the overall power at the pre-specified level. CONCLUSIONS: Multi-arm multi-stage selection designs gain efficiency over the standard MAMS design by reducing the overall sample size. Diligent pre-specification of the treatment selection rule, final stage significance level and interim stopping boundaries for lack-of-benefit are key to controlling the operating characteristics of a MAMS selection design. We provide guidance on these design features to ensure control of the operating characteristics.

Randomized Controlled Trials as Topic , Research Design , Humans , Randomized Controlled Trials as Topic/methods , Sample Size , Patient Selection

3.

Upstrapping to determine futility: predicting future outcomes nonparametrically from past data.

Wild, Jessica L; Ginde, Adit A; Lindsell, Christopher J; Kaizer, Alexander M.

Trials ; 25(1): 312, 2024 May 09.

Article En | MEDLINE | ID: mdl-38725072

BACKGROUND: Clinical trials often involve some form of interim monitoring to determine futility before planned trial completion. While many options for interim monitoring exist (e.g., alpha-spending, conditional power), nonparametric based interim monitoring methods are also needed to account for more complex trial designs and analyses. The upstrap is one recently proposed nonparametric method that may be applied for interim monitoring. METHODS: Upstrapping is motivated by the case resampling bootstrap and involves repeatedly sampling with replacement from the interim data to simulate thousands of fully enrolled trials. The p-value is calculated for each upstrapped trial and the proportion of upstrapped trials for which the p-value criteria are met is compared with a pre-specified decision threshold. To evaluate the potential utility for upstrapping as a form of interim futility monitoring, we conducted a simulation study considering different sample sizes with several different proposed calibration strategies for the upstrap. We first compared trial rejection rates across a selection of threshold combinations to validate the upstrapping method. Then, we applied upstrapping methods to simulated clinical trial data, directly comparing their performance with more traditional alpha-spending and conditional power interim monitoring methods for futility. RESULTS: The method validation demonstrated that upstrapping is much more likely to find evidence of futility in the null scenario than the alternative across a variety of simulations settings. Our three proposed approaches for calibration of the upstrap had different strengths depending on the stopping rules used. Compared to O'Brien-Fleming group sequential methods, upstrapped approaches had type I error rates that differed by at most 1.7% and expected sample size was 2-22% lower in the null scenario, while in the alternative scenario power fluctuated between 15.7% lower and 0.2% higher and expected sample size was 0-15% lower. CONCLUSIONS: In this proof-of-concept simulation study, we evaluated the potential for upstrapping as a resampling-based method for futility monitoring in clinical trials. The trade-offs in expected sample size, power, and type I error rate control indicate that the upstrap can be calibrated to implement futility monitoring with varying degrees of aggressiveness and that performance similarities can be identified relative to considered alpha-spending and conditional power futility monitoring methods.

Clinical Trials as Topic , Computer Simulation , Medical Futility , Research Design , Humans , Clinical Trials as Topic/methods , Sample Size , Data Interpretation, Statistical , Models, Statistical , Treatment Outcome

4.

Using Bayesian statistics in confirmatory clinical trials in the regulatory setting: a tutorial review.

Lee, Se Yoon.

BMC Med Res Methodol ; 24(1): 110, 2024 May 07.

Article En | MEDLINE | ID: mdl-38714936

Bayesian statistics plays a pivotal role in advancing medical science by enabling healthcare companies, regulators, and stakeholders to assess the safety and efficacy of new treatments, interventions, and medical procedures. The Bayesian framework offers a unique advantage over the classical framework, especially when incorporating prior information into a new trial with quality external data, such as historical data or another source of co-data. In recent years, there has been a significant increase in regulatory submissions using Bayesian statistics due to its flexibility and ability to provide valuable insights for decision-making, addressing the modern complexity of clinical trials where frequentist trials are inadequate. For regulatory submissions, companies often need to consider the frequentist operating characteristics of the Bayesian analysis strategy, regardless of the design complexity. In particular, the focus is on the frequentist type I error rate and power for all realistic alternatives. This tutorial review aims to provide a comprehensive overview of the use of Bayesian statistics in sample size determination, control of type I error rate, multiplicity adjustments, external data borrowing, etc., in the regulatory environment of clinical trials. Fundamental concepts of Bayesian sample size determination and illustrative examples are provided to serve as a valuable resource for researchers, clinicians, and statisticians seeking to develop more complex and innovative designs.

Bayes Theorem , Clinical Trials as Topic , Humans , Clinical Trials as Topic/methods , Clinical Trials as Topic/statistics & numerical data , Research Design/standards , Sample Size , Data Interpretation, Statistical , Models, Statistical

5.

Innovative statistical approaches: the use of neural networks reduces the sample size in the splenectomy-MCAO mouse model.

Romic, Dominik; Berecki, Monika; Srakocic, Sanja; Josic Dominovic, Paula; Justic, Helena; Hamer, Dominik; Petrinec, Daniela; Radmilovic, Marina; Hackenberger, Branimir; Gajovic, Srecko; Glasnovic, Anton.

Croat Med J ; 65(2): 122-137, 2024 Apr 30.

Article En | MEDLINE | ID: mdl-38706238

AIM: To compare the effectiveness of artificial neural network (ANN) and traditional statistical analysis on identical data sets within the splenectomy-middle carotid artery occlusion (MCAO) mouse model. METHODS: Mice were divided into the splenectomized (SPLX) and sham-operated (SPLX-sham) group. A splenectomy was conducted 14 days before middle carotid artery occlusion (MCAO). Magnetic resonance imaging (MRI), bioluminescent imaging, neurological scoring (NS), and histological analysis, were conducted at two, four, seven, and 28 days after MCAO. Frequentist statistical analyses and ANN analysis employing a multi-layer perceptron architecture were performed to assess the probability of discriminating between SPLX and SPLX-sham mice. RESULTS: Repeated measures ANOVA showed no significant differences in body weight (F (5, 45)=0.696, P=0.629), NS (F (2.024, 18.218)=1.032, P=0.377) and brain infarct size on MRI between the SPLX and SPLX-sham groups post-MCAO (F (2, 24)=0.267, P=0.768). ANN analysis was employed to predict SPLX and SPL-sham classes. The highest accuracy in predicting SPLX class was observed when the model was trained on a data set containing all variables (0.7736±0.0234). For SPL-sham class, the highest accuracy was achieved when it was trained on a data set excluding the variable combination MR contralateral/animal mass/NS (0.9284±0.0366). CONCLUSION: This study validated the neuroprotective impact of splenectomy in an MCAO model using ANN for data analysis with a reduced animal sample size, demonstrating the potential for leveraging advanced statistical methods to minimize sample sizes in experimental biomedical research.

Disease Models, Animal , Infarction, Middle Cerebral Artery , Magnetic Resonance Imaging , Neural Networks, Computer , Splenectomy , Animals , Mice , Splenectomy/methods , Infarction, Middle Cerebral Artery/surgery , Infarction, Middle Cerebral Artery/diagnostic imaging , Sample Size , Male

6.

Sequential covariate-adjusted randomization via hierarchically minimizing Mahalanobis distance and marginal imbalance.

Yang, Haoyu; Qin, Yichen; Li, Yang; Hu, Feifang.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38801258

In comparative studies, covariate balance and sequential allocation schemes have attracted growing academic interest. Although many theoretically justified adaptive randomization methods achieve the covariate balance, they often allocate patients in pairs or groups. To better meet the practical requirements where the clinicians cannot wait for other participants to assign the current patient for some economic or ethical reasons, we propose a method that randomizes patients individually and sequentially. The proposed method conceptually separates the covariate imbalance, measured by the newly proposed modified Mahalanobis distance, and the marginal imbalance, that is the sample size difference between the 2 groups, and it minimizes them with an explicit priority order. Compared with the existing sequential randomization methods, the proposed method achieves the best possible covariate balance while maintaining the marginal balance directly, offering us more control of the randomization process. We demonstrate the superior performance of the proposed method through a wide range of simulation studies and real data analysis, and also establish theoretical guarantees for the proposed method in terms of both the convergence of the imbalance measure and the subsequent treatment effect estimation.

Computer Simulation , Randomized Controlled Trials as Topic , Humans , Randomized Controlled Trials as Topic/statistics & numerical data , Randomized Controlled Trials as Topic/methods , Biometry/methods , Models, Statistical , Data Interpretation, Statistical , Random Allocation , Sample Size , Algorithms

7.

Tenecteplase for Stroke at 4.5 to 24 Hours.

Goyal, Mayank; Bosshart, Salome L; Ospel, Johanna M.

N Engl J Med ; 390(18): 1729, 2024 May 09.

Article En | MEDLINE | ID: mdl-38718369

Clinical Trials as Topic , Fibrinolytic Agents , Sample Size , Stroke , Tenecteplase , Humans , Fibrinolytic Agents/therapeutic use , Fibrinolytic Agents/adverse effects , Stroke/drug therapy , Tenecteplase/therapeutic use , Time-to-Treatment

8.

Tenecteplase for Stroke at 4.5 to 24 Hours. Reply.

Albers, Gregory W; Purdon, Barbara; Campbell, Bruce C V.

N Engl J Med ; 390(18): 1729, 2024 May 09.

Article En | MEDLINE | ID: mdl-38718370

Clinical Trials as Topic , Fibrinolytic Agents , Sample Size , Stroke , Tenecteplase , Humans , Fibrinolytic Agents/therapeutic use , Stroke/drug therapy , Tenecteplase/therapeutic use , Time-to-Treatment

9.

Perspective on statistical power and equivalence tests.

Neuhäuser, Markus; Ruxton, Graeme D.

Am J Physiol Heart Circ Physiol ; 326(6): H1420-H1423, 2024 Jun 01.

Article En | MEDLINE | ID: mdl-38700473

The use of both sexes or genders should be considered in experimental design, analysis, and reporting. Since there is no requirement to double the sample size or to have sufficient power to study sex differences, challenges for the statistical analysis can arise. In this article, we focus on the topics of statistical power and ways to increase this power. We also discuss the choice of an appropriate design and statistical method and include a separate section on equivalence tests needed to show the absence of a relevant difference.

Research Design , Animals , Female , Humans , Male , Data Interpretation, Statistical , Models, Statistical , Sample Size , Sex Factors

10.

Minimum sample size for estimating the Net Promoter Score under a Bayesian approach.

Costa, Eliardo G; Ponte, Rachel Tarini Q.

An Acad Bras Cienc ; 96(2): e20230991, 2024.

Article En | MEDLINE | ID: mdl-38808878

At some moment in our lives, we are probably faced with the following question: How likely is it that you would recommend [company X] to a friend or colleague?. This question is related to the Net Promoter Score (NPS), a simple measure used by several companies as indicator of customer loyalty. Even though it is a well-known measure in the business world, studies that address the statistical properties or the sample size determination problem related to this measure are still scarce. We adopt a Bayesian approach to provide point and interval estimators for the NPS and discuss the determination of the sample size. Computational tools were implemented to use this methodology in practice. An illustrative example with data from financial services is also presented.

Bayes Theorem , Sample Size , Humans , Consumer Behavior

11.

Integrating randomized and observational studies to estimate optimal dynamic treatment regimes.

Batorsky, Anna; Anstrom, Kevin J; Zeng, Donglin.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38804219

Sequential multiple assignment randomized trials (SMARTs) are the gold standard for estimating optimal dynamic treatment regimes (DTRs), but are costly and require a large sample size. We introduce the multi-stage augmented Q-learning estimator (MAQE) to improve efficiency of estimation of optimal DTRs by augmenting SMART data with observational data. Our motivating example comes from the Back Pain Consortium, where one of the overarching aims is to learn how to tailor treatments for chronic low back pain to individual patient phenotypes, knowledge which is lacking clinically. The Consortium-wide collaborative SMART and observational studies within the Consortium collect data on the same participant phenotypes, treatments, and outcomes at multiple time points, which can easily be integrated. Previously published single-stage augmentation methods for integration of trial and observational study (OS) data were adapted to estimate optimal DTRs from SMARTs using Q-learning. Simulation studies show the MAQE, which integrates phenotype, treatment, and outcome information from multiple studies over multiple time points, more accurately estimates the optimal DTR, and has a higher average value than a comparable Q-learning estimator without augmentation. We demonstrate this improvement is robust to a wide range of trial and OS sample sizes, addition of noise variables, and effect sizes.

Computer Simulation , Low Back Pain , Observational Studies as Topic , Randomized Controlled Trials as Topic , Humans , Observational Studies as Topic/statistics & numerical data , Randomized Controlled Trials as Topic/statistics & numerical data , Low Back Pain/therapy , Sample Size , Treatment Outcome , Models, Statistical , Biometry/methods

12.

Virus Quasispecies Rarefaction: Subsampling with or without Replacement?

Gregori, Josep; Ibañez-Lligoña, Marta; Colomer-Castell, Sergi; Campos, Carolina; Quer, Josep.

Viruses ; 16(5)2024 04 29.

Article En | MEDLINE | ID: mdl-38793592

In quasispecies diversity studies, the comparison of two samples of varying sizes is a common necessity. However, the sensitivity of certain diversity indices to sample size variations poses a challenge. To address this issue, rarefaction emerges as a crucial tool, serving to normalize and create fairly comparable samples. This study emphasizes the imperative nature of sample size normalization in quasispecies diversity studies using next-generation sequencing (NGS) data. We present a thorough examination of resampling schemes using various simple hypothetical cases of quasispecies showing different quasispecies structures in the sense of haplotype genomic composition, offering a comprehensive understanding of their implications in general cases. Despite the big numbers implied in this sort of study, often involving coverages exceeding 100,000 reads per sample and amplicon, the rarefaction process for normalization should be performed with repeated resampling without replacement, especially when rare haplotypes constitute a significant fraction of interest. However, it is noteworthy that different diversity indices exhibit distinct sensitivities to sample size. Consequently, some diversity indicators may be compared directly without normalization, or instead may be resampled safely with replacement.

Genetic Variation , Haplotypes , High-Throughput Nucleotide Sequencing , Quasispecies , Viruses , Quasispecies/genetics , High-Throughput Nucleotide Sequencing/methods , Viruses/genetics , Viruses/classification , Viruses/isolation & purification , Genome, Viral , Humans , Genomics/methods , Phylogeny , Sample Size

13.

Replication of null results: Absence of evidence or evidence of absence?

Pawel, Samuel; Heyard, Rachel; Micheloud, Charlotte; Held, Leonhard.

Elife ; 122024 May 13.

Article En | MEDLINE | ID: mdl-38739437

In several large-scale replication projects, statistically non-significant results in both the original and the replication study have been interpreted as a 'replication success.' Here, we discuss the logical problems with this approach: Non-significance in both studies does not ensure that the studies provide evidence for the absence of an effect and 'replication success' can virtually always be achieved if the sample sizes are small enough. In addition, the relevant error rates are not controlled. We show how methods, such as equivalence testing and Bayes factors, can be used to adequately quantify the evidence for the absence of an effect and how they can be applied in the replication setting. Using data from the Reproducibility Project: Cancer Biology, the Experimental Philosophy Replicability Project, and the Reproducibility Project: Psychology we illustrate that many original and replication studies with 'null results' are in fact inconclusive. We conclude that it is important to also replicate studies with statistically non-significant results, but that they should be designed, analyzed, and interpreted appropriately.

Bayes Theorem , Reproducibility of Results , Humans , Research Design , Sample Size , Data Interpretation, Statistical

14.

Isotonic design for single-arm biomarker stratified trials.

Li, Lang; Ivanova, Anastasia.

Stat Methods Med Res ; 33(6): 945-952, 2024 Jun.

Article En | MEDLINE | ID: mdl-38573793

In single-arm trials with a predefined subgroup based on baseline biomarkers, it is often assumed that a biomarker defined subgroup, the biomarker positive subgroup, has the same or higher response to treatment compared to its complement, the biomarker negative subgroup. The goal is to determine if the treatment is effective in each of the subgroups or in the biomarker positive subgroup only or not effective at all. We propose the isotonic stratified design for this problem. The design has a joint set of decision rules for biomarker positive and negative subjects and utilizes joint estimation of response probabilities using assumed monotonicity of response between the biomarker negative and positive subgroups. The new design reduces the sample size requirement when compared to running two Simon's designs in each biomarker positive and negative. For example, the new design requires 23%-35% fewer patients than running two Simon's designs for scenarios we considered. Alternatively, the new design allows evaluating the response probability in both biomarker negative and biomarker positive subgroups using only 40% more patients needed for running Simon's design in the biomarker positive subgroup only.

Biomarkers , Research Design , Humans , Sample Size , Clinical Trials as Topic/statistics & numerical data , Models, Statistical

15.

Bayesian estimation of the measurement of interactions in epidemiological studies.

Lin, Shaowei; Hu, Chanchan; Lin, Zhifeng; Hu, Zhijian.

PeerJ ; 12: e17128, 2024.

Article En | MEDLINE | ID: mdl-38562994

Background: Interaction identification is important in epidemiological studies and can be detected by including a product term in the model. However, as Rothman noted, a product term in exponential models may be regarded as multiplicative rather than additive to better reflect biological interactions. Currently, the additive interaction is largely measured by the relative excess risk due to interaction (RERI), the attributable proportion due to interaction (AP), and the synergy index (S), and confidence intervals are developed via frequentist approaches. However, few studies have focused on the same issue from a Bayesian perspective. The present study aims to provide a Bayesian view of the estimation and credible intervals of the additive interaction measures. Methods: Bayesian logistic regression was employed, and estimates and credible intervals were calculated from posterior samples of the RERI, AP and S. Since Bayesian inference depends only on posterior samples, it is very easy to apply this method to preventive factors. The validity of the proposed method was verified by comparing the Bayesian method with the delta and bootstrap approaches in simulation studies with example data. Results: In all the simulation studies, the Bayesian estimates were very close to the corresponding true values. Due to the skewness of the interaction measures, compared with the confidence intervals of the delta method, the credible intervals of the Bayesian approach were more balanced and matched the nominal 95% level. Compared with the bootstrap method, the Bayesian method appeared to be a competitive alternative and fared better when small sample sizes were used. Conclusions: The proposed Bayesian method is a competitive alternative to other methods. This approach can assist epidemiologists in detecting additive-scale interactions.

Bayes Theorem , Computer Simulation , Logistic Models , Epidemiologic Studies , Sample Size

16.

Uncertainty limits the use of power analysis.

Pek, Jolynn; Pitt, Mark A; Wegener, Duane T.

J Exp Psychol Gen ; 153(4): 1139-1151, 2024 Apr.

Article En | MEDLINE | ID: mdl-38587935

The calculation of statistical power has been taken up as a simple yet informative tool to assist in designing an experiment, particularly in justifying sample size. A difficulty with using power for this purpose is that the classical power formula does not incorporate sources of uncertainty (e.g., sampling variability) that can impact the computed power value, leading to a false sense of precision and confidence in design choices. We use simulations to demonstrate the consequences of adding two common sources of uncertainty to the calculation of power. Sampling variability in the estimated effect size (Cohen's d) can introduce a large amount of uncertainty (e.g., sometimes producing rather flat distributions) in power and sample-size determination. The addition of random fluctuations in the population effect size can cause values of its estimates to take on a sign opposite the population value, making calculated power values meaningless. These results suggest that calculated power values or use of such values to justify sample size add little to planning a study. As a result, researchers should put little confidence in power-based choices when planning future studies. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

Uncertainty , Humans , Sample Size

17.

Well-spread samples with dynamic sample sizes.

Robertson, Blair; Price, Chris; Reale, Marco.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38591365

A spatial sampling design determines where sample locations are placed in a study area so that population parameters can be estimated with relatively high precision. If the response variable has spatial trends, spatially balanced or well-spread designs give precise results for commonly used estimators. This article proposes a new method that draws well-spread samples over arbitrary auxiliary spaces and can be used for master sampling applications. All we require is a measure of the distance between population units. Numerical results show that the method generates well-spread samples and compares favorably with existing designs. We provide an example application using several auxiliary variables to estimate total aboveground biomass over a large study area in Eastern Amazonia, Brazil. Multipurpose surveys are also considered, where the totals of aboveground biomass, primary production, and clay content (3 responses) are estimated from a single well-spread sample over the auxiliary space.

Sample Size , Surveys and Questionnaires

18.

The effective sample size in Bayesian information criterion for level-specific fixed and random-effect selection in a two-level nested model.

Cho, Sun-Joo; Wu, Hao; Naveiras, Matthew.

Br J Math Stat Psychol ; 77(2): 289-315, 2024 May.

Article En | MEDLINE | ID: mdl-38591555

Popular statistical software provides the Bayesian information criterion (BIC) for multi-level models or linear mixed models. However, it has been observed that the combination of statistical literature and software documentation has led to discrepancies in the formulas of the BIC and uncertainties as to the proper use of the BIC in selecting a multi-level model with respect to level-specific fixed and random effects. These discrepancies and uncertainties result from different specifications of sample size in the BIC's penalty term for multi-level models. In this study, we derive the BIC's penalty term for level-specific fixed- and random-effect selection in a two-level nested design. In this new version of BIC, called BIC E 1 , this penalty term is decomposed into two parts if the random-effect variance-covariance matrix has full rank: (a) a term with the log of average sample size per cluster and (b) the total number of parameters times the log of the total number of clusters. Furthermore, we derive the new version of BIC, called BIC E 2 , in the presence of redundant random effects. We show that the derived formulae, BIC E 1 and BIC E 2 , adhere to empirical values via numerical demonstration and that BIC E ( E indicating either E 1 or E 2 ) is the best global selection criterion, as it performs at least as well as BIC with the total sample size and BIC with the number of clusters across various multi-level conditions through a simulation study. In addition, the use of BIC E 1 is illustrated with a textbook example dataset.

Software , Sample Size , Bayes Theorem , Linear Models , Computer Simulation

19.

Bayesian functional analysis for untargeted metabolomics data with matching uncertainty and small sample sizes.

Ma, Guoxuan; Kang, Jian; Yu, Tianwei.

Brief Bioinform ; 25(3)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38581417

Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application, given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches. Some existing methods attempt to reduce the matching uncertainty, but are far from being able to remove the uncertainty for most features. The existence of the uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.

Metabolomics , Mice , Animals , Bayes Theorem , Sample Size , Uncertainty , Metabolomics/methods , Computer Simulation

20.

Realism and robustness require increased sample size when studying both sexes.

Drobniak, Szymon M; Lagisz, Malgorzata; Yang, Yefeng; Nakagawa, Shinichi.

PLoS Biol ; 22(4): e3002456, 2024 Apr.

Article En | MEDLINE | ID: mdl-38603525

A recent article claimed that researchers need not increase the overall sample size for a study that includes both sexes. This Formal Comment points out that that study assumed two sexes to have the same variance, and explains why this is a unrealistic assumption.

Research Design , Male , Female , Humans , Sample Size