Búsqueda | Portal Regional de la BVS

1.

Correcting for outcome reporting bias in a meta-analysis: A meta-regression approach.

van Aert, Robbie C M; Wicherts, Jelte M.

Behav Res Methods ; 56(3): 1994-2012, 2024 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-37540470

RESUMEN

Outcome reporting bias (ORB) refers to the biasing effect caused by researchers selectively reporting outcomes within a study based on their statistical significance. ORB leads to inflated effect size estimates in meta-analysis if only the outcome with the largest effect size is reported due to ORB. We propose a new method (CORB) to correct for ORB that includes an estimate of the variability of the outcomes' effect size as a moderator in a meta-regression model. An estimate of the variability of the outcomes' effect size can be computed by assuming a correlation among the outcomes. Results of a Monte-Carlo simulation study showed that the effect size in meta-analyses may be severely overestimated without correcting for ORB. Estimates of CORB are close to the true effect size when overestimation caused by ORB is the largest. Applying the method to a meta-analysis on the effect of playing violent video games on aggression showed that the effect size estimate decreased when correcting for ORB. We recommend to routinely apply methods to correct for ORB in any meta-analysis. We provide annotated R code and functions to help researchers apply the CORB method.

Asunto(s)

Sesgo , Humanos , Simulación por Computador

2.

The dire disregard of measurement invariance testing in psychological science.

Maassen, Esther; D'Urso, E Damiano; van Assen, Marcel A L M; Nuijten, Michèle B; De Roover, Kim; Wicherts, Jelte M.

Psychol Methods ; 2023 Dec 25.

Artículo en Inglés | MEDLINE | ID: mdl-38147039

RESUMEN

Self-report scales are widely used in psychology to compare means in latent constructs across groups, experimental conditions, or time points. However, for these comparisons to be meaningful and unbiased, the scales must demonstrate measurement invariance (MI) across compared time points or (experimental) groups. MI testing determines whether the latent constructs are measured equivalently across groups or time, which is essential for meaningful comparisons. We conducted a systematic review of 426 psychology articles with openly available data, to (a) examine common practices in conducting and reporting of MI testing, (b) assess whether we could reproduce the reported MI results, and (c) conduct MI tests for the comparisons that enabled sufficiently powerful MI testing. We identified 96 articles that contained a total of 929 comparisons. Results showed that only 4% of the 929 comparisons underwent MI testing, and the tests were generally poorly reported. None of the reported MI tests were reproducible, and only 26% of the 174 newly performed MI tests reached sufficient (scalar) invariance, with MI failing completely in 58% of tests. Exploratory analyses suggested that in nearly half of the comparisons where configural invariance was rejected, the number of factors differed between groups. These results indicate that MI tests are rarely conducted and poorly reported in psychological studies. We observed frequent violations of MI, suggesting that reported differences between (experimental) groups may not be solely attributed to group differences in the latent constructs. We offer recommendations aimed at improving reporting and computational reproducibility practices in psychology. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

3.

Preregistration in practice: A comparison of preregistered and non-preregistered studies in psychology.

van den Akker, Olmo R; van Assen, Marcel A L M; Bakker, Marjan; Elsherif, Mahmoud; Wong, Tsz Keung; Wicherts, Jelte M.

Behav Res Methods ; 2023 Nov 10.

Artículo en Inglés | MEDLINE | ID: mdl-37950113

RESUMEN

Preregistration has gained traction as one of the most promising solutions to improve the replicability of scientific effects. In this project, we compared 193 psychology studies that earned a Preregistration Challenge prize or preregistration badge to 193 related studies that were not preregistered. In contrast to our theoretical expectations and prior research, we did not find that preregistered studies had a lower proportion of positive results (Hypothesis 1), smaller effect sizes (Hypothesis 2), or fewer statistical errors (Hypothesis 3) than non-preregistered studies. Supporting our Hypotheses 4 and 5, we found that preregistered studies more often contained power analyses and typically had larger sample sizes than non-preregistered studies. Finally, concerns about the publishability and impact of preregistered studies seem unwarranted, as preregistered studies did not take longer to publish and scored better on several impact measures. Overall, our data indicate that preregistration has beneficial effects in the realm of statistical power and impact, but we did not find robust evidence that preregistration prevents p-hacking and HARKing (Hypothesizing After the Results are Known).

4.

Comparing the prevalence of statistical reporting inconsistencies in COVID-19 preprints and matched controls: a registered report.

van Aert, Robbie C M; Nuijten, Michèle B; Olsson-Collentine, Anton; Stoevenbelt, Andrea H; van den Akker, Olmo R; Klein, Richard A; Wicherts, Jelte M.

R Soc Open Sci ; 10(8): 202326, 2023 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-37593717

RESUMEN

The COVID-19 outbreak has led to an exponential increase of publications and preprints about the virus, its causes, consequences, and possible cures. COVID-19 research has been conducted under high time pressure and has been subject to financial and societal interests. Doing research under such pressure may influence the scrutiny with which researchers perform and write up their studies. Either researchers become more diligent, because of the high-stakes nature of the research, or the time pressure may lead to cutting corners and lower quality output. In this study, we conducted a natural experiment to compare the prevalence of incorrectly reported statistics in a stratified random sample of COVID-19 preprints and a matched sample of non-COVID-19 preprints. Our results show that the overall prevalence of incorrectly reported statistics is 9-10%, but frequentist as well as Bayesian hypothesis tests show no difference in the number of statistical inconsistencies between COVID-19 and non-COVID-19 preprints. In conclusion, the literature suggests that COVID-19 research may on average have more methodological problems than non-COVID-19 research, but our results show that there is no difference in the statistical reporting quality.

5.

Are Speeded Tests Unfair? Modeling the Impact of Time Limits on the Gender Gap in Mathematics.

Stoevenbelt, Andrea H; Wicherts, Jelte M; Flore, Paulette C; Phillips, Lorraine A T; Pietschnig, Jakob; Verschuere, Bruno; Voracek, Martin; Schwabe, Inga.

Educ Psychol Meas ; 83(4): 684-709, 2023 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-37398839

RESUMEN

When cognitive and educational tests are administered under time limits, tests may become speeded and this may affect the reliability and validity of the resulting test scores. Prior research has shown that time limits may create or enlarge gender gaps in cognitive and academic testing. On average, women complete fewer items than men when a test is administered with a strict time limit, whereas gender gaps are frequently reduced when time limits are relaxed. In this study, we propose that gender differences in test strategy might inflate gender gaps favoring men, and relate test strategy to stereotype threat effects under which women underperform due to the pressure of negative stereotypes about their performance. First, we applied a Bayesian two-dimensional item response theory (IRT) model to data obtained from two registered reports that investigated stereotype threat in mathematics, and estimated the latent correlation between underlying test strategy (here, completion factor, a proxy for working speed) and mathematics ability. Second, we tested the gender gap and assessed potential effects of stereotype threat on female test performance. We found a positive correlation between the completion factor and mathematics ability, such that more able participants dropped out later in the test. We did not observe a stereotype threat effect but found larger gender differences on the latent completion factor than on latent mathematical ability, suggesting that test strategies affect the gender gap in timed mathematics performance. We argue that if the effect of time limits on tests is not taken into account, this may lead to test unfairness and biased group comparisons, and urge researchers to consider these effects in either their analyses or study planning.

6.

The effects of height-for-age and HIV on cognitive development of school-aged children in Nairobi, Kenya: a structural equation modelling analysis.

Maina, Rachel; He, Jia; Abubakar, Amina; Perez-Garcia, Miguel; Kumar, Manasi; Wicherts, Jelte M.

Front Public Health ; 11: 1171851, 2023.

Artículo en Inglés | MEDLINE | ID: mdl-37415707

RESUMEN

Background: Empirical evidence indicates that both HIV infection and stunting impede cognitive functions of school-going children. However, there is less evidence on how these two risk factors amplify each other's negative effects. This study aimed to examine the direct effects of stunting on cognitive outcomes and the extent to which stunting (partially) mediates the effects of HIV, age, and gender on cognitive outcomes. Methodology: We applied structural equation modelling to cross-sectional data from 328 children living with HIV and 260 children living without HIV aged 6-14 years from Nairobi, Kenya to test the mediating effect of stunting and predictive effects of HIV, age, and gender on cognitive latent variables flexibility, fluency, reasoning, and verbal memory. Results: The model predicting the cognitive outcomes fitted well (RMSEA = 0.041, CFI = 0.966, χ2 = 154.29, DF = 77, p < 0.001). Height-for-age (a continuous indicator of stunting) predicted fluency (ß = 0.14) and reasoning (ß = 0.16). HIV predicted height-for-age (ß = -0.24) and showed direct effects on reasoning (ß = -0.66), fluency (ß = -0.34), flexibility (ß = 0.26), and verbal memory (ß = -0.22), highlighting that the effect of HIV on cognitive variables was partly mediated by height-for-age. Conclusion: In this study, we found evidence that stunting partly explains the effects of HIV on cognitive outcomes. The model suggests there is urgency to develop targeted preventative and rehabilitative nutritional interventions for school children with HIV as part of a comprehensive set of interventions to improve cognitive functioning in this high-risk group of children. Being infected or having been born to a mother who is HIV positive poses a risk to normal child development.

Asunto(s)

Infecciones por VIH , Femenino , Humanos , Niño , Infecciones por VIH/epidemiología , Infecciones por VIH/complicaciones , Análisis de Clases Latentes , Kenia/epidemiología , Estudios Transversales , Trastornos del Crecimiento/epidemiología , Trastornos del Crecimiento/etiología , Cognición

7.

Meta-analyzing the multiverse: A peek under the hood of selective reporting.

Olsson-Collentine, Anton; van Aert, Robbie C M; Bakker, Marjan; Wicherts, Jelte.

Psychol Methods ; 2023 May 11.

Artículo en Inglés | MEDLINE | ID: mdl-37166859

RESUMEN

Researcher degrees of freedom refer to arbitrary decisions in the execution and reporting of hypothesis-testing research that allow for many possible outcomes from a single study. Selective reporting of results (p-hacking) from this "multiverse" of outcomes can inflate effect size estimates and false positive rates. We studied the effects of researcher degrees of freedom and selective reporting using empirical data from extensive multistudy projects in psychology (Registered Replication Reports) featuring 211 samples and 14 dependent variables. We used a counterfactual design to examine what biases could have emerged if the studies (and ensuing meta-analyses) had not been preregistered and could have been subjected to selective reporting based on the significance of the outcomes in the primary studies. Our results show the substantial variability in effect sizes that researcher degrees of freedom can create in relatively standard psychological studies, and how selective reporting of outcomes can alter conclusions and introduce bias in meta-analysis. Despite the typically thousands of outcomes appearing in the multiverses of the 294 included studies, only in about 30% of studies did significant effect sizes in the hypothesized direction emerge. We also observed that the effect of a particular researcher degree of freedom was inconsistent across replication studies using the same protocol, meaning multiverse analyses often fail to replicate across samples. We recommend hypothesis-testing researchers to preregister their preferred analysis and openly report multiverse analysis. We propose a descriptive index (underlying multiverse variability) that quantifies the robustness of results across alternative ways to analyze the data. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

8.

Psychometric evaluation of the computerized battery for neuropsychological evaluation of children (BENCI) among school aged children in the context of HIV in an urban Kenyan setting.

Rachel, Maina; Jia, He; Amina, Abubakar; Perez-Garcia, Miguel; Kumar, Manasi; Wicherts, Jelte M.

BMC Psychiatry ; 23(1): 373, 2023 05 29.

Artículo en Inglés | MEDLINE | ID: mdl-37248481

RESUMEN

INTRODUCTION: Culturally validated neurocognitive measures for children in Low- and Middle-Income Countries are important in the timely and correct identification of neurocognitive impairments. Such measures can inform development of interventions for children exposed to additional vulnerabilities like HIV infection. The Battery for Neuropsychological Evaluation of Children (BENCI) is an openly available, computerized neuropsychological battery specifically developed to evaluate neurocognitive impairment. This study adapted the BENCI and evaluated its reliability and validity in Kenya. METHODOLOGY: The BENCI was adapted using translation and back-translation from Spanish to English. The psychometric properties were evaluated in a case-control study of 328 children (aged 6 - 14 years) living with HIV and 260 children not living with HIV in Kenya. We assessed reliability, factor structure, and measurement invariance with respect to HIV. Additionally, we examined convergent validity of the BENCI using tests from the Kilifi Toolkit. RESULTS: Internal consistencies (0.49 < α < 0.97) and test-retest reliabilities (-.34 to .81) were sufficient-to-good for most of the subtests. Convergent validity was supported by significant correlations between the BENCI's Verbal memory and Kilifi's Verbal List Learning (r = .41), the BENCI's Visual memory and Kilifi's Verbal List Learning (r = .32) and the BENCI's Planning total time test and Kilifi's Tower Test (r = -.21) and the BENCI's Abstract Reasoning test and Kilifi's Raven's Progressive Matrix (r = .21). The BENCI subtests highlighted meaningful differences between children living with HIV and those not living with HIV. After some minor adaptions, a confirmatory four-factor model consisting of flexibility, fluency, reasoning and working memory fitted well (χ2 = 135.57, DF = 51, N = 604, p < .001, RMSEA = .052, CFI = .944, TLI = .914) and was partially scalar invariant between HIV positive and negative groups. CONCLUSION: The English version of the BENCI formally translated for use in Kenya can be further adapted and integrated in clinical and research settings as a valid and reliable cognitive test battery.

Asunto(s)

Infecciones por VIH , Humanos , Niño , Kenia , Infecciones por VIH/complicaciones , Infecciones por VIH/diagnóstico , Infecciones por VIH/psicología , Psicometría , Reproducibilidad de los Resultados , Estudios de Casos y Controles , Pruebas Neuropsicológicas , Encuestas y Cuestionarios

9.

The process of replication target selection in psychology: what to consider?

Pittelkow, Merle-Marie; Field, Sarahanne M; Isager, Peder M; Van't Veer, Anna E; Anderson, Thomas; Cole, Scott N; Dominik, Tomás; Giner-Sorolla, Roger; Gok, Sebahat; Heyman, Tom; Jekel, Marc; Luke, Timothy J; Mitchell, David B; Peels, Rik; Pendrous, Rosina; Sarrazin, Samuel; Schauer, Jacob M; Specker, Eva; Tran, Ulrich S; Vranka, Marek A; Wicherts, Jelte M; Yoshimura, Naoto; Zwaan, Rolf A; van Ravenzwaaij, Don.

R Soc Open Sci ; 10(2): 210586, 2023 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-36756069

RESUMEN

Increased execution of replication studies contributes to the effort to restore credibility of empirical research. However, a second generation of problems arises: the number of potential replication targets is at a serious mismatch with available resources. Given limited resources, replication target selection should be well-justified, systematic and transparently communicated. At present the discussion on what to consider when selecting a replication target is limited to theoretical discussion, self-reported justifications and a few formalized suggestions. In this Registered Report, we proposed a study involving the scientific community to create a list of considerations for consultation when selecting a replication target in psychology. We employed a modified Delphi approach. First, we constructed a preliminary list of considerations. Second, we surveyed psychologists who previously selected a replication target with regards to their considerations. Third, we incorporated the results into the preliminary list of considerations and sent the updated list to a group of individuals knowledgeable about concerns regarding replication target selection. Over the course of several rounds, we established consensus regarding what to consider when selecting a replication target. The resulting checklist can be used for transparently communicating the rationale for selecting studies for replication.

10.

How do psychology researchers interpret the results of multiple replication studies?

van den Akker, Olmo R; Wicherts, Jelte M; Alvarez, Linda Dominguez; Bakker, Marjan; van Assen, Marcel A L M.

Psychon Bull Rev ; 30(4): 1609-1620, 2023 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-36635588

RESUMEN

Employing two vignette studies, we examined how psychology researchers interpret the results of a set of four experiments that all test a given theory. In both studies, we found that participants' belief in the theory increased with the number of statistically significant results, and that the result of a direct replication had a stronger effect on belief in the theory than the result of a conceptual replication. In Study 2, we additionally found that participants' belief in the theory was lower when they assumed the presence of p-hacking, but that belief in the theory did not differ between preregistered and non-preregistered replication studies. In analyses of individual participant data from both studies, we examined the heuristics academics use to interpret the results of four experiments. Only a small proportion (Study 1: 1.6%; Study 2: 2.2%) of participants used the normative method of Bayesian inference, whereas many of the participants' responses were in line with generally dismissed and problematic vote-counting approaches. Our studies demonstrate that many psychology researchers overestimate the evidence in favor of a theory if one or more results from a set of replication studies are statistically significant, highlighting the need for better statistical education.

Asunto(s)

Heurística , Política , Humanos , Teorema de Bayes , Psicología

11.

Type D Personality as a Risk Factor for Adverse Outcome in Patients With Cardiovascular Disease: An Individual Patient-Data Meta-analysis.

Lodder, Paul; Wicherts, Jelte M; Antens, Marijn; Albus, Christian; Bessonov, Ivan S; Condén, Emelie; Dulfer, Karolijn; Gostoli, Sara; Grande, Gesine; Hedberg, Pär; Herrmann-Lingen, Christoph; Jaarsma, Tiny; Koo, Malcolm; Lin, Ping; Lin, Tin-Kwang; Meyer, Thomas; Pushkarev, Georgiy; Rafanelli, Chiara; Raykh, Olga I; Schaan de Quadros, Alexandre; Schmidt, Marcia; Sumin, Alexei N; Utens, Elisabeth M W J; van Veldhuisen, Dirk J; Wang, Yini; Kupper, Nina.

Psychosom Med ; 85(2): 188-202, 2023.

Artículo en Inglés | MEDLINE | ID: mdl-36640440

RESUMEN

OBJECTIVE: Type D personality, a joint tendency toward negative affectivity and social inhibition, has been linked to adverse events in patients with heart disease, although with inconsistent findings. Here, we apply an individual patient-data meta-analysis to data from 19 prospective cohort studies ( N = 11,151) to investigate the prediction of adverse outcomes by type D personality in patients with acquired cardiovascular disease. METHOD: For each outcome (all-cause mortality, cardiac mortality, myocardial infarction, coronary artery bypass grafting, percutaneous coronary intervention, major adverse cardiac event, any adverse event), we estimated type D's prognostic influence and the moderation by age, sex, and disease type. RESULTS: In patients with cardiovascular disease, evidence for a type D effect in terms of the Bayes factor (BF) was strong for major adverse cardiac event (BF = 42.5; odds ratio [OR] = 1.14) and any adverse event (BF = 129.4; OR = 1.15). Evidence for the null hypothesis was found for all-cause mortality (BF = 45.9; OR = 1.03), cardiac mortality (BF = 23.7; OR = 0.99), and myocardial infarction (BF = 16.9; OR = 1.12), suggesting that type D had no effect on these outcomes. This evidence was similar in the subset of patients with coronary artery disease (CAD), but inconclusive for patients with heart failure (HF). Positive effects were found for negative affectivity on cardiac and all-cause mortality, with the latter being more pronounced in male than female patients. CONCLUSION: Across 19 prospective cohort studies, type D predicts adverse events in patients with CAD, whereas evidence in patients with HF was inconclusive. In both patients with CAD and HF, we found evidence for a null effect of type D on cardiac and all-cause mortality.

Asunto(s)

Enfermedades Cardiovasculares , Enfermedad de la Arteria Coronaria , Infarto del Miocardio , Intervención Coronaria Percutánea , Personalidad Tipo D , Humanos , Masculino , Femenino , Enfermedades Cardiovasculares/epidemiología , Enfermedades Cardiovasculares/etiología , Estudios Prospectivos , Teorema de Bayes , Enfermedad de la Arteria Coronaria/etiología , Infarto del Miocardio/epidemiología , Infarto del Miocardio/etiología , Factores de Riesgo , Resultado del Tratamiento

12.

How to protect privacy in open data.

Wicherts, Jelte M; Klein, Richard A; Swaans, Sofie H F; Maassen, Esther; Stoevenbelt, Andrea H; Peeters, Victor H B T G; de Jonge, Myrthe; Rüffer, Franziska.

Nat Hum Behav ; 6(12): 1603-1605, 2022 12.

Artículo en Inglés | MEDLINE | ID: mdl-36333477

Asunto(s)

Confidencialidad , Privacidad , Humanos

13.

Prevalence of responsible research practices among academics in The Netherlands.

Gopalakrishna, Gowri; Wicherts, Jelte M; Vink, Gerko; Stoop, Ineke; van den Akker, Olmo R; Ter Riet, Gerben; Bouter, Lex M.

F1000Res ; 11: 471, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-36128558

RESUMEN

Background: Traditionally, research integrity studies have focused on research misbehaviors and their explanations. Over time, attention has shifted towards preventing questionable research practices and promoting responsible ones. However, data on the prevalence of responsible research practices, especially open methods, open codes and open data and their underlying associative factors, remains scarce. Methods: We conducted a web-based anonymized questionnaire, targeting all academic researchers working at or affiliated to a university or university medical center in The Netherlands, to investigate the prevalence and potential explanatory factors of 11 responsible research practices. Results: A total of 6,813 academics completed the survey, the results of which show that prevalence of responsible practices differs substantially across disciplines and ranks, with 99 percent avoiding plagiarism in their work but less than 50 percent pre-registering a research protocol. Arts and humanities scholars as well as PhD candidates and junior researchers engaged less often in responsible research practices. Publication pressure negatively affected responsible practices, while mentoring, scientific norms subscription and funding pressure stimulated them. Conclusions: Understanding the prevalence of responsible research practices across disciplines and ranks, as well as their associated explanatory factors, can help to systematically address disciplinary- and academic rank-specific obstacles, and thereby facilitate responsible conduct of research.

Asunto(s)

Humanidades , Investigadores , Humanos , Países Bajos , Prevalencia , Universidades

14.

Prevalence of questionable research practices, research misconduct and their potential explanatory factors: A survey among academic researchers in The Netherlands.

Gopalakrishna, Gowri; Ter Riet, Gerben; Vink, Gerko; Stoop, Ineke; Wicherts, Jelte M; Bouter, Lex M.

PLoS One ; 17(2): e0263023, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-35171921

RESUMEN

Prevalence of research misconduct, questionable research practices (QRPs) and their associations with a range of explanatory factors has not been studied sufficiently among academic researchers. The National Survey on Research Integrity targeted all disciplinary fields and academic ranks in the Netherlands. It included questions about engagement in fabrication, falsification and 11 QRPs over the previous three years, and 12 explanatory factor scales. We ensured strict identity protection and used the randomized response method for questions on research misconduct. 6,813 respondents completed the survey. Prevalence of fabrication was 4.3% (95% CI: 2.9, 5.7) and of falsification 4.2% (95% CI: 2.8, 5.6). Prevalence of QRPs ranged from 0.6% (95% CI: 0.5, 0.9) to 17.5% (95% CI: 16.4, 18.7) with 51.3% (95% CI: 50.1, 52.5) of respondents engaging frequently in at least one QRP. Being a PhD candidate or junior researcher increased the odds of frequently engaging in at least one QRP, as did being male. Scientific norm subscription (odds ratio (OR) 0.79; 95% CI: 0.63, 1.00) and perceived likelihood of detection by reviewers (OR 0.62, 95% CI: 0.44, 0.88) were associated with engaging in less research misconduct. Publication pressure was associated with more often engaging in one or more QRPs frequently (OR 1.22, 95% CI: 1.14, 1.30). We found higher prevalence of misconduct than earlier surveys. Our results suggest that greater emphasis on scientific norm subscription, strengthening reviewers in their role as gatekeepers of research quality and curbing the "publish or perish" incentive system promotes research integrity.

Asunto(s)

Investigación Biomédica/ética , Ética en Investigación , Proyectos de Investigación/normas , Investigadores/ética , Mala Conducta Científica/ética , Mala Conducta Científica/estadística & datos numéricos , Estudios Transversales , Femenino , Humanos , Masculino , Prevalencia , Encuestas y Cuestionarios

15.

Times are changing, bias isn't: A meta-meta-analysis on publication bias detection practices, prevalence rates, and predictors in industrial/organizational psychology.

Siegel, Magdalena; Eder, Junia Sophia Nur; Wicherts, Jelte M; Pietschnig, Jakob.

J Appl Psychol ; 107(11): 2013-2039, 2022 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-34968082

RESUMEN

Effect misestimations plague Psychological Science, but advances in the identification of dissemination biases in general and publication bias in particular have helped in dealing with biased effects in the literature. However, the application of publication bias detection methods appears to be not equally prevalent across subdisciplines. It has been suggested that particularly in I/O Psychology, appropriate publication bias detection methods are underused. In this meta-meta-analysis, we present prevalence estimates, predictors, and time trends of publication bias in 128 meta-analyses that were published in the Journal of Applied Psychology (7,263 effect sizes, 3,000,000 + participants). Moreover, we reanalyzed data of 87 meta-analyses and applied nine standard and more modern publication bias detection methods. We show that (a) the bias detection method applications are underused (only 41% of meta-analyses use at least one method) but have increased in recent years, (b) those meta-analyses that apply such methods now use more, but mostly inappropriate methods, and (c) the prevalence of potential publication bias is concerning but mostly remains undetected. Although our results indicate somewhat of a trend toward higher bias awareness, they substantiate concerns about potential publication bias in I/O Psychology, warranting increased researcher awareness about appropriate and state-of-the-art bias detection and triangulation. Embracing open science practices such as data sharing or study preregistration is needed to raise reproducibility and ultimately strengthen Psychological Science in general and I/O Psychology in particular. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

Asunto(s)

Psicología Industrial , Humanos , Sesgo de Publicación , Reproducibilidad de los Resultados , Prevalencia , Sesgo

16.

How misconduct helped psychological science to thrive.

Wicherts, Jelte.

Nature ; 597(7875): 153, 2021 09.

Artículo en Inglés | MEDLINE | ID: mdl-34493844

Asunto(s)

Mala Conducta Científica , Ética en Investigación

17.

A systematic review comparing two popular methods to assess a Type D personality effect.

Lodder, Paul; Kupper, Nina; Antens, Marijn; Wicherts, Jelte M.

Gen Hosp Psychiatry ; 71: 62-75, 2021.

Artículo en Inglés | MEDLINE | ID: mdl-33962138

RESUMEN

INTRODUCTION: Type D personality, operationalized as high scores on negative affectivity (NA) and social inhibition (SI), has been associated with various medical and psychosocial outcomes. The recent failure to replicate several earlier findings could result from the various methods used to assess the Type D effect. Despite recommendations to analyze the continuous NA and SI scores, a popular approach groups people as having Type D personality or not. This method does not adequately detect a Type D effect as it is also sensitive to main effects of NA or SI only, suggesting the literature contains false positive Type D effects. Here, we systematically assess the extent of this problem. METHOD: We conducted a systematic review including 44 published studies assessing a Type D effect with both a continuous and dichotomous operationalization. RESULTS: The dichotomous method showed poor agreement with the continuous Type D effect. Of the 89 significant dichotomous method effects, 37 (41.6%) were Type D effects according to the continuous method. The remaining 52 (58.4%) are therefore likely not Type D effects based on the continuous method, as 42 (47.2%) were main effects of NA or SI only. CONCLUSION: Half of the published Type D effect according to the dichotomous method may be false positives, with only NA or SI driving the outcome.

Asunto(s)

Personalidad Tipo D , Humanos , Inhibición Psicológica , Personalidad

18.

Ensuring the quality and specificity of preregistrations.

Bakker, Marjan; Veldkamp, Coosje L S; van Assen, Marcel A L M; Crompvoets, Elise A V; Ong, How Hwee; Nosek, Brian A; Soderberg, Courtney K; Mellor, David; Wicherts, Jelte M.

PLoS Biol ; 18(12): e3000937, 2020 12.

Artículo en Inglés | MEDLINE | ID: mdl-33296358

RESUMEN

Researchers face many, often seemingly arbitrary, choices in formulating hypotheses, designing protocols, collecting data, analyzing data, and reporting results. Opportunistic use of "researcher degrees of freedom" aimed at obtaining statistical significance increases the likelihood of obtaining and publishing false-positive results and overestimated effect sizes. Preregistration is a mechanism for reducing such degrees of freedom by specifying designs and analysis plans before observing the research outcomes. The effectiveness of preregistration may depend, in part, on whether the process facilitates sufficiently specific articulation of such plans. In this preregistered study, we compared 2 formats of preregistration available on the OSF: Standard Pre-Data Collection Registration and Prereg Challenge Registration (now called "OSF Preregistration," http://osf.io/prereg/). The Prereg Challenge format was a "structured" workflow with detailed instructions and an independent review to confirm completeness; the "Standard" format was "unstructured" with minimal direct guidance to give researchers flexibility for what to prespecify. Results of comparing random samples of 53 preregistrations from each format indicate that the "structured" format restricted the opportunistic use of researcher degrees of freedom better (Cliff's Delta = 0.49) than the "unstructured" format, but neither eliminated all researcher degrees of freedom. We also observed very low concordance among coders about the number of hypotheses (14%), indicating that they are often not clearly stated. We conclude that effective preregistration is challenging, and registration formats that provide effective guidance may improve the quality of research.

Asunto(s)

Recolección de Datos/métodos , Proyectos de Investigación/estadística & datos numéricos , Recolección de Datos/normas , Recolección de Datos/tendencias , Humanos , Control de Calidad , Sistema de Registros/estadística & datos numéricos , Proyectos de Investigación/tendencias

19.

Effect Sizes, Power, and Biases in Intelligence Research: A Meta-Meta-Analysis.

Nuijten, Michèle B; van Assen, Marcel A L M; Augusteijn, Hilde E M; Crompvoets, Elise A V; Wicherts, Jelte M.

J Intell ; 8(4)2020 Oct 02.

Artículo en Inglés | MEDLINE | ID: mdl-33023250

RESUMEN

In this meta-study, we analyzed 2442 effect sizes from 131 meta-analyses in intelligence research, published from 1984 to 2014, to estimate the average effect size, median power, and evidence for bias. We found that the average effect size in intelligence research was a Pearson's correlation of 0.26, and the median sample size was 60. Furthermore, across primary studies, we found a median power of 11.9% to detect a small effect, 54.5% to detect a medium effect, and 93.9% to detect a large effect. We documented differences in average effect size and median estimated power between different types of intelligence studies (correlational studies, studies of group differences, experiments, toxicology, and behavior genetics). On average, across all meta-analyses (but not in every meta-analysis), we found evidence for small-study effects, potentially indicating publication bias and overestimated effects. We found no differences in small-study effects between different study types. We also found no convincing evidence for the decline effect, US effect, or citation bias across meta-analyses. We concluded that intelligence research does show signs of low power and publication bias, but that these problems seem less severe than in many other scientific fields.

20.

Recommendations in pre-registrations and internal review board proposals promote formal power analyses but do not increase sample size.

Bakker, Marjan; Veldkamp, Coosje L S; van den Akker, Olmo R; van Assen, Marcel A L M; Crompvoets, Elise; Ong, How Hwee; Wicherts, Jelte M.

PLoS One ; 15(7): e0236079, 2020.

Artículo en Inglés | MEDLINE | ID: mdl-32735597

RESUMEN

In this preregistered study, we investigated whether the statistical power of a study is higher when researchers are asked to make a formal power analysis before collecting data. We compared the sample size descriptions from two sources: (i) a sample of pre-registrations created according to the guidelines for the Center for Open Science Preregistration Challenge (PCRs) and a sample of institutional review board (IRB) proposals from Tilburg School of Behavior and Social Sciences, which both include a recommendation to do a formal power analysis, and (ii) a sample of pre-registrations created according to the guidelines for Open Science Framework Standard Pre-Data Collection Registrations (SPRs) in which no guidance on sample size planning is given. We found that PCRs and IRBs (72%) more often included sample size decisions based on power analyses than the SPRs (45%). However, this did not result in larger planned sample sizes. The determined sample size of the PCRs and IRB proposals (Md = 90.50) was not higher than the determined sample size of the SPRs (Md = 126.00; W = 3389.5, p = 0.936). Typically, power analyses in the registrations were conducted with G*power, assuming a medium effect size, α = .05 and a power of .80. Only 20% of the power analyses contained enough information to fully reproduce the results and only 62% of these power analyses pertained to the main hypothesis test in the pre-registration. Therefore, we see ample room for improvements in the quality of the registrations and we offer several recommendations to do so.

Asunto(s)

Comités de Ética en Investigación , Tamaño de la Muestra , Estadística como Asunto/métodos

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA