Search | VHL Search Portal

1.

Toward a more credible assessment of the credibility of science by many-analyst studies.

Auspurg, Katrin; Brüderl, Josef.

Proc Natl Acad Sci U S A ; 121(38): e2404035121, 2024 Sep 17.

Article in English | MEDLINE | ID: mdl-39236231

ABSTRACT

We discuss a relatively new meta-scientific research design: many-analyst studies that attempt to assess the replicability and credibility of research based on large-scale observational data. In these studies, a large number of analysts try to answer the same research question using the same data. The key idea is the greater the variation in results, the greater the uncertainty in answering the research question and, accordingly, the lower the credibility of any individual research finding. Compared to individual replications, the large crowd of analysts allows for a more systematic investigation of uncertainty and its sources. However, many-analyst studies are also resource-intensive, and there are some doubts about their potential to provide credible assessments. We identify three issues that any many-analyst study must address: 1) identifying the source of variation in the results; 2) providing an incentive structure similar to that of standard research; and 3) conducting a proper meta-analysis of the results. We argue that some recent many-analyst studies have failed to address these issues satisfactorily and have therefore provided an overly pessimistic assessment of the credibility of science. We also provide some concrete guidance on how future many-analyst studies could provide a more constructive assessment.

2.

Effects of Statistical Practices for Longitudinal Group Comparison of the Penetration-Aspiration Scale on Power and Effect Size Estimation: A Monte Carlo Simulation Study.

Borders, James C; Grande, Alessandro A; Barbon, Carly E A; Hutcheson, Katherine A; Troche, Michelle S.

Dysphagia ; 2024 Aug 17.

Article in English | MEDLINE | ID: mdl-39153045

ABSTRACT

Multiple bolus trials are administered during clinical and research swallowing assessments to comprehensively capture an individual's swallowing function. Despite valuable information obtained from these boluses, it remains common practice to use a single bolus (e.g., the worst score) to describe the degree of dysfunction. Researchers also often collapse continuous or ordinal swallowing measures into categories, potentially exacerbating information loss. These practices may adversely affect statistical power to detect and estimate smaller, yet potentially meaningful, treatment effects. This study sought to examine the impact of aggregating and categorizing penetration-aspiration scale (PAS) scores on statistical power and effect size estimates. We used a Monte Carlo approach to simulate three hypothetical within-subject treatment studies in Parkinson's disease and head and neck cancer across a range of data characteristics (e.g., sample size, number of bolus trials, variability). Different statistical models (aggregated or multilevel) as well as various PAS reduction approaches (i.e., types of categorizations) were performed to examine their impact on power and the accuracy of effect size estimates. Across all scenarios, multilevel models demonstrated higher statistical power to detect group-level longitudinal change and more accurate estimates compared to aggregated (worst score) models. Categorizing PAS scores also reduced power and biased effect size estimates compared to an ordinal approach, though this depended on the type of categorization and baseline PAS distribution. Multilevel models should be considered as a more robust approach for the statistical analysis of multiple boluses administered in standardized swallowing protocols due to its high sensitivity and accuracy to compare group-level changes in swallowing function. Importantly, this finding appears to be consistent across patient populations with distinct pathophysiology (i.e., PD and HNC) and patterns of airway invasion. The decision to categorize a continuous or ordinal outcome should be grounded in the clinical or research question with recognition that scale reduction may negatively affect the quality of statistical inferences in certain scenarios.

3.

Where do we intervene to optimize sports systems? Leverage Points the way.

Naughton, Mitchell; Salmon, Paul M; McLean, Scott.

J Sports Sci ; 42(7): 566-573, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38767324

ABSTRACT

Sport and sports research are inherently complex systems. This appears to be somewhat at odds with the current research paradigm in sport in which interventions are aimed are fixing or solving singular broken components within the system. In any complex system, such as sport, there are places where we can intervene to change behaviour and, ideally, system outcomes. Meadows influential work describes 12 different points with which to intervene in complex systems (termed "Leverage Points"), which are ordered from shallow to deeper based on their potential effectiveness to influence transformational change. Whether research in sport is aimed at shallow or deeper Leverage Points is unknown. This study aimed to assess highly impactful research in sports science, sports nutrition/metabolism, sports medicine, sport and exercise psychology, sports management, motor control, sports biomechanics and sports policy/law through a Leverage Points lens. The 10 most highly cited original-research manuscripts from each journal representing these fields were analysed for the Leverage Point with which the intervention described in the manuscript was focused. The results indicate that highly impactful research in sports science, sports nutrition/metabolism, sports biomechanics and sports medicine is predominantly focused at the shallow end of the Leverage Points hierarchy. Conversely, the interventions drawn from journals representing sports management and sports policy/law were focused on the deeper end. Other journals analysed had a mixed profile. Explanations for these findings include the dual practitioner/academic needing to "think fast" to solve immediate questions in sports science/medicine/nutrition, limited engagement with "working slow" systems and method experts and differences in incremental vs. non-incremental research strategies.

Subject(s)

Sports Medicine , Sports , Humans , Sports/physiology , Biomechanical Phenomena , Journal Impact Factor , Periodicals as Topic , Bibliometrics

4.

Many nonnormalities, one simulation: Do different data generation algorithms affect study results?

Fairchild, Amanda J; Yin, Yunhang; Baraldi, Amanda N; Astivia, Oscar L Olvera; Shi, Dexin.

Behav Res Methods ; 56(7): 6464-6484, 2024 10.

Article in English | MEDLINE | ID: mdl-38389030

ABSTRACT

Monte Carlo simulation studies are among the primary scientific outputs contributed by methodologists, guiding application of various statistical tools in practice. Although methodological researchers routinely extend simulation study findings through follow-up work, few studies are ever replicated. Simulation studies are susceptible to factors that can contribute to replicability failures, however. This paper sought to conduct a meta-scientific study by replicating one highly cited simulation study (Curran et al., Psychological Methods, 1, 16-29, 1996) that investigated the robustness of normal theory maximum likelihood (ML)-based chi-square fit statistics under multivariate nonnormality. We further examined the generalizability of the original study findings across different nonnormal data generation algorithms. Our replication results were generally consistent with original findings, but we discerned several differences. Our generalizability results were more mixed. Only two results observed under the original data generation algorithm held completely across other algorithms examined. One of the most striking findings we observed was that results associated with the independent generator (IG) data generation algorithm vastly differed from other procedures examined and suggested that ML was robust to nonnormality for the particular factor model used in the simulation. Findings point to the reality that extant methodological recommendations may not be universally valid in contexts where multiple data generation algorithms exist for a given data characteristic. We recommend that researchers consider multiple approaches to generating a specific data or model characteristic (when more than one is available) to optimize the generalizability of simulation results.

Subject(s)

Algorithms , Computer Simulation , Monte Carlo Method , Humans , Likelihood Functions , Data Interpretation, Statistical , Models, Statistical , Reproducibility of Results

5.

Toward Open and Reproducible Epidemiology.

Mathur, Maya B; Fox, Matthew P.

Am J Epidemiol ; 192(4): 658-664, 2023 04 06.

Article in English | MEDLINE | ID: mdl-36627249

ABSTRACT

Starting in the 2010s, researchers in the experimental social sciences rapidly began to adopt increasingly open and reproducible scientific practices. These practices include publicly sharing deidentified data when possible, sharing analytical code, and preregistering study protocols. Empirical evidence from the social sciences suggests such practices are feasible, can improve analytical reproducibility, and can reduce selective reporting. In academic epidemiology, adoption of open-science practices has been slower than in the social sciences (with some notable exceptions, such as registering clinical trials). Epidemiologic studies are often large, complex, conceived after data have already been collected, and difficult to replicate directly by collecting new data. These characteristics make it especially important to ensure their integrity and analytical reproducibility. Open-science practices can also pay immediate dividends to researchers' own work by clarifying scientific reasoning and encouraging well-documented, organized workflows. We consider how established epidemiologists and early-career researchers alike can help midwife a culture of open science in epidemiology through their research practices, mentorship, and editorial activities.

Subject(s)

Epidemiology , Research Design , Humans , Reproducibility of Results

6.

Using Normative Language When Describing Scientific Findings: Randomized Controlled Trial of Effects on Trust and Credibility.

Agley, Jon; Xiao, Yunyu; Thompson, Esi E; Golzarri-Arroyo, Lilian.

J Med Internet Res ; 25: e45482, 2023 03 30.

Article in English | MEDLINE | ID: mdl-36995753

ABSTRACT

BACKGROUND: Scientists often make cognitive claims (eg, the results of their work) and normative claims (eg, what should be done based on those results). Yet, these types of statements contain very different information and implications. This randomized controlled trial sought to characterize the granular effects of using normative language in science communication. OBJECTIVE: Our study examined whether viewing a social media post containing scientific claims about face masks for COVID-19 using both normative and cognitive language (intervention arm) would reduce perceptions of trust and credibility in science and scientists compared with an identical post using only cognitive language (control arm). We also examined whether effects were mediated by political orientation. METHODS: This was a 2-arm, parallel group, randomized controlled trial. We aimed to recruit 1500 US adults (age 18+) from the Prolific platform who were representative of the US population census by cross sections of age, race/ethnicity, and gender. Participants were randomly assigned to view 1 of 2 images of a social media post about face masks to prevent COVID-19. The control image described the results of a real study (cognitive language), and the intervention image was identical, but also included recommendations from the same study about what people should do based on the results (normative language). Primary outcomes were trust in science and scientists (21-item scale) and 4 individual items related to trust and credibility; 9 additional covariates (eg, sociodemographics, political orientation) were measured and included in analyses. RESULTS: From September 4, 2022, to September 6, 2022, 1526 individuals completed the study. For the sample as a whole (eg, without interaction terms), there was no evidence that a single exposure to normative language affected perceptions of trust or credibility in science or scientists. When including the interaction term (study arm × political orientation), there was some evidence of differential effects, such that individuals with liberal political orientation were more likely to trust scientific information from the social media post's author if the post included normative language, and political conservatives were more likely to trust scientific information from the post's author if the post included only cognitive language (ß=0.05, 95% CI 0.00 to 0.10; P=.04). CONCLUSIONS: This study does not support the authors' original hypotheses that single exposures to normative language can reduce perceptions of trust or credibility in science or scientists for all people. However, the secondary preregistered analyses indicate the possibility that political orientation may differentially mediate the effect of normative and cognitive language from scientists on people's perceptions. We do not submit this paper as definitive evidence thereof but do believe that there is sufficient evidence to support additional research into this topic, which may have implications for effective scientific communication. TRIAL REGISTRATION: OSF Registries osf.io/kb3yh; https://osf.io/kb3yh. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/41747.

Subject(s)

COVID-19 , Communication , Trust , Adult , Humans , COVID-19/epidemiology , COVID-19/prevention & control , Language , Social Media , Masks

7.

Low statistical power and overestimated anthropogenic impacts, exacerbated by publication bias, dominate field studies in global change biology.

Yang, Yefeng; Hillebrand, Helmut; Lagisz, Malgorzata; Cleasby, Ian; Nakagawa, Shinichi.

Glob Chang Biol ; 28(3): 969-989, 2022 02.

Article in English | MEDLINE | ID: mdl-34736291

ABSTRACT

Field studies are essential to reliably quantify ecological responses to global change because they are exposed to realistic climate manipulations. Yet such studies are limited in replicates, resulting in less power and, therefore, potentially unreliable effect estimates. Furthermore, while manipulative field experiments are assumed to be more powerful than non-manipulative observations, it has rarely been scrutinized using extensive data. Here, using 3847 field experiments that were designed to estimate the effect of environmental stressors on ecosystems, we systematically quantified their statistical power and magnitude (Type M) and sign (Type S) errors. Our investigations focused upon the reliability of field experiments to assess the effect of stressors on both ecosystem's response magnitude and variability. When controlling for publication bias, single experiments were underpowered to detect response magnitude (median power: 18%-38% depending on effect sizes). Single experiments also had much lower power to detect response variability (6%-12% depending on effect sizes) than response magnitude. Such underpowered studies could exaggerate estimates of response magnitude by 2-3 times (Type M errors) and variability by 4-10 times. Type S errors were comparatively rare. These observations indicate that low power, coupled with publication bias, inflates the estimates of anthropogenic impacts. Importantly, we found that meta-analyses largely mitigated the issues of low power and exaggerated effect size estimates. Rather surprisingly, manipulative experiments and non-manipulative observations had very similar results in terms of their power, Type M and S errors. Therefore, the previous assumption about the superiority of manipulative experiments in terms of power is overstated. These results call for highly powered field studies to reliably inform theory building and policymaking, via more collaboration and team science, and large-scale ecosystem facilities. Future studies also require transparent reporting and open science practices to approach reproducible and reliable empirical work and evidence synthesis.

Subject(s)

Anthropogenic Effects , Ecosystem , Biology , Publication Bias , Reproducibility of Results

8.

Statistical Power and Swallowing Rehabilitation Research: Current Landscape and Next Steps.

Borders, James C; Grande, Alessandro A; Troche, Michelle S.

Dysphagia ; 37(6): 1673-1688, 2022 12.

Article in English | MEDLINE | ID: mdl-35226185

ABSTRACT

Despite rapid growth in the number of treatments to rehabilitate dysphagia, studies often demonstrate mixed results with non-significant changes to functional outcomes. Given that power analyses are infrequently reported in dysphagia research, it remains unclear whether studies are adequately powered to detect a range of treatment effects. Therefore, this review sought to examine the current landscape of statistical power in swallowing rehabilitation research. Databases were searched for swallowing treatments using instrumental evaluations of swallowing and the penetration-aspiration scale as an outcome. Sensitivity power analyses based on each study's statistical test and sample size were performed to determine the minimum effect size detectable with 80% power. Eighty-nine studies with 94 treatment comparisons were included. Sixty-seven percent of treatment comparisons were unable to detect effects smaller than d = 0.80. The smallest detectable effect size was d = 0.29 for electrical stimulation, d = 0.49 for postural maneuvers, d = 0.52 for non-invasive brain stimulation, d = 0.61 for combined treatments, d = 0.63 for respiratory-based interventions, d = 0.70 for lingual strengthening, and d = 0.79 for oral sensory stimulation. Dysphagia treatments examining changes in penetration-aspiration scale scores were generally powered to reliably detect larger effect sizes and not smaller (but potentially clinically meaningful) effects. These findings suggest that non-significant results may be related to low statistical power, highlighting the need for collaborative, well-powered intervention studies that can detect smaller, clinically meaningful changes in swallowing function. To facilitate implementation, a tutorial on simulation-based power analyses for ordinal outcomes is provided ( https://osf.io/e6usd/ ).

Subject(s)

Deglutition Disorders , Deglutition , Humans , Deglutition/physiology , Rehabilitation Research , Combined Modality Therapy

9.

A meta-review of transparency and reproducibility-related reporting practices in published meta-analyses on clinical psychological interventions (2000-2020).

López-Nicolás, Rubén; López-López, José Antonio; Rubio-Aparicio, María; Sánchez-Meca, Julio.

Behav Res Methods ; 54(1): 334-349, 2022 02.

Article in English | MEDLINE | ID: mdl-34173943

ABSTRACT

Meta-analysis is a powerful and important tool to synthesize the literature about a research topic. Like other kinds of research, meta-analyses must be reproducible to be compliant with the principles of the scientific method. Furthermore, reproducible meta-analyses can be easily updated with new data and reanalysed applying new and more refined analysis techniques. We attempted to empirically assess the prevalence of transparency and reproducibility-related reporting practices in published meta-analyses from clinical psychology by examining a random sample of 100 meta-analyses. Our purpose was to identify the key points that could be improved, with the aim of providing some recommendations for carrying out reproducible meta-analyses. We conducted a meta-review of meta-analyses of psychological interventions published between 2000 and 2020. We searched PubMed, PsycInfo and Web of Science databases. A structured coding form to assess transparency indicators was created based on previous studies and existing meta-analysis guidelines. We found major issues concerning: completely reproducible search procedures report, specification of the exact method to compute effect sizes, choice of weighting factors and estimators, lack of availability of the raw statistics used to compute the effect size and of interoperability of available data, and practically total absence of analysis script code sharing. Based on our findings, we conclude with recommendations intended to improve the transparency, openness, and reproducibility-related reporting practices of meta-analyses in clinical psychology and related areas.

Subject(s)

Psychosocial Intervention , Research Design , Humans , Meta-Analysis as Topic , Prevalence , Reproducibility of Results

10.

Contextual sensitivity in scientific reproducibility.

Van Bavel, Jay J; Mende-Siedlecki, Peter; Brady, William J; Reinero, Diego A.

Proc Natl Acad Sci U S A ; 113(23): 6454-9, 2016 Jun 07.

Article in English | MEDLINE | ID: mdl-27217556

ABSTRACT

In recent years, scientists have paid increasing attention to reproducibility. For example, the Reproducibility Project, a large-scale replication attempt of 100 studies published in top psychology journals found that only 39% could be unambiguously reproduced. There is a growing consensus among scientists that the lack of reproducibility in psychology and other fields stems from various methodological factors, including low statistical power, researcher's degrees of freedom, and an emphasis on publishing surprising positive results. However, there is a contentious debate about the extent to which failures to reproduce certain results might also reflect contextual differences (often termed "hidden moderators") between the original research and the replication attempt. Although psychologists have found extensive evidence that contextual factors alter behavior, some have argued that context is unlikely to influence the results of direct replications precisely because these studies use the same methods as those used in the original research. To help resolve this debate, we recoded the 100 original studies from the Reproducibility Project on the extent to which the research topic of each study was contextually sensitive. Results suggested that the contextual sensitivity of the research topic was associated with replication success, even after statistically adjusting for several methodological characteristics (e.g., statistical power, effect size). The association between contextual sensitivity and replication success did not differ across psychological subdisciplines. These results suggest that researchers, replicators, and consumers should be mindful of contextual factors that might influence a psychological process. We offer several guidelines for dealing with contextual sensitivity in reproducibility.

Subject(s)

Psychology , Reproducibility of Results , Humans , Psychology/statistics & numerical data , Publishing , Research/statistics & numerical data , Science/statistics & numerical data

11.

Statistical considerations in reporting cardiovascular research.

Lindsey, Merry L; Gray, Gillian A; Wood, Susan K; Curran-Everett, Douglas.

Am J Physiol Heart Circ Physiol ; 315(2): H303-H313, 2018 08 01.

Article in English | MEDLINE | ID: mdl-30028200

ABSTRACT

The problem of inadequate statistical reporting is long standing and widespread in the biomedical literature, including in cardiovascular physiology. Although guidelines for reporting statistics have been available in clinical medicine for some time, there are currently no guidelines specific to cardiovascular physiology. To assess the need for guidelines, we determined the type and frequency of statistical tests and procedures currently used in the American Journal of Physiology-Heart and Circulatory Physiology. A PubMed search for articles published in the American Journal of Physiology-Heart and Circulatory Physiology between January 1, 2017, and October 6, 2017, provided a final sample of 146 articles evaluated for methods used and 38 articles for indepth analysis. The t-test and ANOVA accounted for 71% (212 of 300 articles) of the statistical tests performed. Of six categories of post hoc tests, Bonferroni and Tukey tests were used in 63% (62 of 98 articles). There was an overall lack in details provided by authors publishing in the American Journal of Physiology-Heart and Circulatory Physiology, and we compiled a list of recommended minimum reporting guidelines to aid authors in preparing manuscripts. Following these guidelines could substantially improve the quality of statistical reports and enhance data rigor and reproducibility.

Subject(s)

Biostatistics/methods , Peer Review/standards , Periodicals as Topic/standards , Physiology/standards , Heart/physiology , Practice Guidelines as Topic

12.

Diversity of Research Participant Gender, Race, and Ethnicity in Communication Sciences and Disorders: A Systematic Review and Quantitative Synthesis of American Speech-Language-Hearing Association Publications in 2020.

Millager, Ryan A; Feldman, Jacob I; Williams, Zachary J; Shibata, Kiiya; Martinez-Torres, Keysha A; Bryan, Katherine M; Pruett, Dillon G; Mitchell, Jade T; Markfeld, Jennifer E; Merritt, Brandon; Daniels, Derek E; Jones, Robin M; Woynaroski, Tiffany.

Perspect ASHA Spec Interest Groups ; 9(3): 836-852, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38912383

ABSTRACT

Purpose: One manifestation of systemic inequities in communication sciences and disorders (CSD) is the chronic underreporting and underrepresentation of sex, gender, race, and ethnicity in research. The present study characterized recent demographic reporting practices and representation of participants across CSD research. Methods: We systematically reviewed and extracted key reporting and participant data from empirical studies conducted in the United States (US) with human participants published in the year 2020 in journals by the American Speech-Language-Hearing Association (ASHA; k = 407 articles comprising a total n = 80,058 research participants, search completed November 2021). Sex, gender, race, and ethnicity were operationalized per National Institutes of Health guidelines (National Institutes of Health, 2015a, 2015b). Results: Sex or gender was reported in 85.5% of included studies; race was reported in 33.7%; and ethnicity was reported in 13.8%. Sex and gender were clearly differentiated in 3.4% of relevant studies. Where reported, median proportions for race and ethnicity were significantly different from the US population, with underrepresentation noted for all non-White racial groups and Hispanic participants. Moreover, 64.7% of studies that reported sex or gender and 67.2% of studies that reported race or ethnicity did not consider these respective variables in analyses or discussion. Conclusion: At present, research published in ASHA journals frequently fails to report key demographic data summarizing the characteristics of participants. Moreover, apparent gaps in representation of minoritized racial and ethnic groups threaten the external validity of CSD research and broader health care equity endeavors in the US. Although our study is limited to a single year and publisher, our results point to several steps for readers that may bring greater accountability, consistency, and diversity to the discipline.

13.

Dealing With Diversity in Psychology: Science and Ideology.

Roberts, Steven Othello.

Perspect Psychol Sci ; 19(3): 590-601, 2024 May.

Article in English | MEDLINE | ID: mdl-38652780

ABSTRACT

In the spirit of America's Shakespeare, August Wilson (1997), I have written this article as a testimony to the conditions under which I, and too many others, engage in scholarly discourse. I hope to make clear from the beginning that although the ideas presented here are not entirely my own-as they have been inherited from the minority of scholars who dared and managed to bring the most necessary, unpalatable, and unsettling truths about our discipline to the broader scientific community-I do not write for anyone but myself and those scholars who have felt similarly marginalized, oppressed, and silenced. And I write as a race scholar, meaning simply that I believe that race-and racism-affects the sociopolitical conditions in which humans, and scholars, develop their thoughts, feelings, and actions. I believe that it is important for all scholars to have a basic understanding of these conditions, as well as the landmines and pitfalls that define them, as they shape how research is conducted, reviewed, and disseminated. I also believe that to evolve one's discipline into one that is truly robust and objective, it must first become diverse and self-aware. Any effort to suggest otherwise, no matter how scholarly it might present itself, is intellectually unsound.

Subject(s)

Cultural Diversity , Psychology , Humans , Racism , Politics

14.

Analysis of NIH K99/R00 awards and the career progression of awardees.

Woitowich, Nicole C; Hengel, Sarah R; Solis, Christopher; Vilgalys, Tauras P; Babdor, Joel; Tyrrell, Daniel J.

Elife ; 122024 Jan 19.

Article in English | MEDLINE | ID: mdl-38240745

ABSTRACT

Many postdoctoral fellows and scholars who hope to secure tenure-track faculty positions in the United States apply to the National Institutes of Health (NIH) for a Pathway to Independence Award. This award has two phases (K99 and R00) and provides funding for up to 5 years. Using NIH data for the period 2006-2022, we report that ~230 K99 awards were made every year, representing up to ~$250 million annual investment. About 40% of K99 awardees were women and ~89% of K99 awardees went on to receive an R00 award annually. Institutions with the most NIH funding produced the most recipients of K99 awards and recruited the most recipients of R00 awards. The time between a researcher starting an R00 award and receiving a major NIH award (such as an R01) ranged between 4.6 and 7.4 years, and was significantly longer for women, for those who remained at their home institution, and for those hired by an institution that was not one of the 25 institutions with the most NIH funding. Shockingly, there has yet to be a K99 awardee at a historically Black college or university. We go on to show how K99 awardees flow to faculty positions, and to identify various factors that influence the future success of individual researchers and, therefore, also influence the composition of biomedical faculty at universities in the United States.

Subject(s)

Awards and Prizes , Biomedical Research , Humans , Female , United States , Male , National Institutes of Health (U.S.) , Health Personnel , Research Personnel

15.

Replication marketplaces would help science to become more self-correcting.

Hüffmeier, Joachim; Kühner, Clara.

R Soc Open Sci ; 11(10): 240850, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39359470

ABSTRACT

Independent replications are very rare in the behavioural and social sciences. This is problematic because they can help to detect 'false positives' in published research and, in turn, contribute to scientific self-correction. The lack of replication studies is, among other factors, due to a rather passive editorial approach concerning replications by many journals, which does not encourage and may sometimes even actively discourage submission of replications. In this Perspective article, we advocate for a more proactive editorial approach concerning replications and suggest introducing journal-based replication marketplaces as a new publication track. We argue that such replication marketplaces could solve the long-standing problem of lacking independent replications. To establish these marketplaces, a designated part of a journal's editorial board identifies the most relevant new findings reported within the journal's pages and publicly offers them for replication. This public offering could be combined with small grants for authors to support these replications. Authors then compete for the first accepted registered report to conduct the related replications and can thus be sure that their replication will be published independent of the later findings. Replication marketplaces would not only increase the prevalence of independent replications but also help science to become more self-correcting.

16.

The psychological reality of the learned "p < .05" boundary.

Rao, V N Vimal; Bye, Jeffrey K; Varma, Sashank.

Cogn Res Princ Implic ; 9(1): 27, 2024 05 03.

Article in English | MEDLINE | ID: mdl-38700660

ABSTRACT

The .05 boundary within Null Hypothesis Statistical Testing (NHST) "has made a lot of people very angry and been widely regarded as a bad move" (to quote Douglas Adams). Here, we move past meta-scientific arguments and ask an empirical question: What is the psychological standing of the .05 boundary for statistical significance? We find that graduate students in the psychological sciences show a boundary effect when relating p-values across .05. We propose this psychological boundary is learned through statistical training in NHST and reading a scientific literature replete with "statistical significance". Consistent with this proposal, undergraduates do not show the same sensitivity to the .05 boundary. Additionally, the size of a graduate student's boundary effect is not associated with their explicit endorsement of questionable research practices. These findings suggest that training creates distortions in initial processing of p-values, but these might be dampened through scientific processes operating over longer timescales.

Subject(s)

Statistics as Topic , Humans , Adult , Young Adult , Data Interpretation, Statistical , Male , Psychology , Female

17.

Effect-Size Discrepancies in Literature Versus Raw Datasets from Experimental Spinal Cord Injury Studies: A CLIMBER Meta-Analysis.

Iorio, Emma G; Khanteymoori, Alireza; Fond, Kenneth A; Keller, Anastasia V; Davis, Lex Maliga; Schwab, Jan M; Ferguson, Adam R; Torres-Espin, Abel; Watzlawick, Ralf.

Neurotrauma Rep ; 5(1): 686-698, 2024.

Article in English | MEDLINE | ID: mdl-39071986

ABSTRACT

Translation of spinal cord injury (SCI) therapeutics from pre-clinical animal studies into human studies is challenged by effect size variability, irreproducibility, and misalignment of evidence used by pre-clinical versus clinical literature. Clinical literature values reproducibility, with the highest grade evidence (class 1) consisting of meta-analysis demonstrating large therapeutic efficacy replicating across multiple studies. Conversely, pre-clinical literature values novelty over replication and lacks rigorous meta-analyses to assess reproducibility of effect sizes across multiple articles. Here, we applied modified clinical meta-analysis methods to pre-clinical studies, comparing effect sizes extracted from published literature to raw data on individual animals from these same studies. Literature-extracted data (LED) from numerical and graphical outcomes reported in publications were compared with individual animal data (IAD) deposited in a federally supported repository of SCI data. The animal groups from the IAD were matched with the same cohorts in the LED for a direct comparison. We applied random-effects meta-analysis to evaluate predictors of neuroconversion in LED versus IAD. We included publications with common injury models (contusive injuries) and standardized end-points (open field assessments). The extraction of data from 25 published articles yielded n = 1841 subjects, whereas IAD from these same articles included n = 2441 subjects. We observed differences in the number of experimental groups and animals per group, insufficient reporting of dropout animals, and missing information on experimental details. Meta-analysis revealed differences in effect sizes across LED versus IAD stratifications, for instance, severe injuries had the largest effect size in LED (standardized mean difference [SMD = 4.92]), but mild injuries had the largest effect size in IAD (SMD = 6.06). Publications with smaller sample sizes yielded larger effect sizes, while studies with larger sample sizes had smaller effects. The results demonstrate the feasibility of combining IAD analysis with traditional LED meta-analysis to assess effect size reproducibility in SCI.

18.

Doubling Down: Will Large Increases in the NIH Budget Promote More Meaningful Medical Innovation?

Sampat, Bhaven N.

J Law Med Ethics ; 51(S2): 21-23, 2023.

Article in English | MEDLINE | ID: mdl-38433677

ABSTRACT

Kesselheim proposes doubling the NIH's budget to promote clinically meaningful pharmaceutical innovation. Since the effects of a previous doubling (from 1998-2003) were mixed, I argue that policymakers should couple future budget growth with investments in experimentation and evaluation.

Subject(s)

Budgets , Investments , Humans , Empirical Research , Research Design

19.

Perspectives on scientific error.

van Ravenzwaaij, D; Bakker, M; Heesen, R; Romero, F; van Dongen, N; Crüwell, S; Field, S M; Held, L; Munafò, M R; Pittelkow, M M; Tiokhin, L; Traag, V A; van den Akker, O R; van 't Veer, A E; Wagenmakers, E J.

R Soc Open Sci ; 10(7): 230448, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37476516

ABSTRACT

Theoretical arguments and empirical investigations indicate that a high proportion of published findings do not replicate and are likely false. The current position paper provides a broad perspective on scientific error, which may lead to replication failures. This broad perspective focuses on reform history and on opportunities for future reform. We organize our perspective along four main themes: institutional reform, methodological reform, statistical reform and publishing reform. For each theme, we illustrate potential errors by narrating the story of a fictional researcher during the research cycle. We discuss future opportunities for reform. The resulting agenda provides a resource to usher in an era that is marked by a research culture that is less error-prone and a scientific publication landscape with fewer spurious findings.

20.

ChatGPT identifies gender disparities in scientific peer review.

Verharen, Jeroen P H.

Elife ; 122023 11 03.

Article in English | MEDLINE | ID: mdl-37922198

ABSTRACT

The peer review process is a critical step in ensuring the quality of scientific research. However, its subjectivity has raised concerns. To investigate this issue, I examined over 500 publicly available peer review reports from 200 published neuroscience papers in 2022-2023. OpenAI's generative artificial intelligence ChatGPT was used to analyze language use in these reports, which demonstrated superior performance compared to traditional lexicon- and rule-based language models. As expected, most reviews for these published papers were seen as favorable by ChatGPT (89.8% of reviews), and language use was mostly polite (99.8% of reviews). However, this analysis also demonstrated high levels of variability in how each reviewer scored the same paper, indicating the presence of subjectivity in the peer review process. The results further revealed that female first authors received less polite reviews than their male peers, indicating a gender bias in reviewing. In addition, published papers with a female senior author received more favorable reviews than papers with a male senior author, for which I discuss potential causes. Together, this study highlights the potential of generative artificial intelligence in performing natural language processing of specialized scientific texts. As a proof of concept, I show that ChatGPT can identify areas of concern in scientific peer review, underscoring the importance of transparent peer review in studying equitability in scientific publishing.

Peer review is a vital step in ensuring the quality and accuracy of scientific research before publication. Experts assess research manuscripts, advise journal editors on publishing them, and provide authors with recommendations for improvement. But some scientists have raised concerns about potential biases and subjectivity in the peer review process. Author attributes, such as gender, reputation, or how prestigious their institution is, may subconsciously influence reviewers' scores. Studying peer review to identify potential biases is challenging. The language reviewers use is very technical, and some of their commentary may be subjective and vary from reviewer to reviewer. The emergence of OpenAI's ChatGPT, which uses machine learning to process large amounts of information, may provide a new tool to analyze peer review for signs of bias. Verharen demonstrated that ChatGPT can be used to analyze peer review reports and found potential indications of gender bias in scientific publishing. In the experiments, Verharen asked ChatGPT to analyze more than 500 reviews of 200 neuroscience studies published in the scientific journal Nature Communications over the past year. The experiments found no evidence that institutional reputation influenced reviews. Yet, female first authors were more likely to receive impolite comments from reviewers. Female senior authors were more likely to receive higher review scores, which may indicate they had to clear a higher bar for publication. The experiments indicate that ChatGPT could be used to analyze peer review for fairness. Verharen suggests that reviewers might apply this tool to ensure their reviews are polite and accurate reflections of their opinions. Scientists or publishers might also use it for large-scale analyses of peer review in individual journals or in scientific publishing more widely. Journals might also use ChatGPT to assess the impact of bias-prevention interventions on review fairness.

Subject(s)

Artificial Intelligence , Publishing , Female , Male , Humans , Sexism , Peer Review , Research Report

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL