RESUMEN
The Journal Impact Factor is often used as a proxy measure for journal quality, but the empirical evidence is scarce. In particular, it is unclear how peer review characteristics for a journal relate to its impact factor. We analysed 10,000 peer review reports submitted to 1,644 biomedical journals with impact factors ranging from 0.21 to 74.7. Two researchers hand-coded sentences using categories of content related to the thoroughness of the review (Materials and Methods, Presentation and Reporting, Results and Discussion, Importance and Relevance) and helpfulness (Suggestion and Solution, Examples, Praise, Criticism). We fine-tuned and validated transformer machine learning language models to classify sentences. We then examined the association between the number and percentage of sentences addressing different content categories and 10 groups defined by the Journal Impact Factor. The median length of reviews increased with higher impact factor, from 185 words (group 1) to 387 words (group 10). The percentage of sentences addressing Materials and Methods was greater in the highest Journal Impact Factor journals than in the lowest Journal Impact Factor group. The results for Presentation and Reporting went in the opposite direction, with the highest Journal Impact Factor journals giving less emphasis to such content. For helpfulness, reviews for higher impact factor journals devoted relatively less attention to Suggestion and Solution than lower impact factor journals. In conclusion, peer review in journals with higher impact factors tends to be more thorough, particularly in addressing study methods while giving relatively less emphasis to presentation or suggesting solutions. Differences were modest and variability high, indicating that the Journal Impact Factor is a bad predictor of the quality of peer review of an individual manuscript.