Mario Malički

Stanford University

Structured Peer Review: Pilot Results from 23 Elsevier Journals

Mario Malički, Bahar Mehmani

15 September 2023
Session 5 ‣ Research integrity, assessment and social impact
9:30 – 11:15

Background and objectives: Reviewers rarely comment on the same aspects of a manuscript, making it difficult to properly assess manuscripts’ quality and the quality of the peer review process itself. Regarding reviewers’ recommendations, a 2010 meta-analysis found a very low inter-reviewer agreement of 0.34,(1) and an analysis of Elsevier data covering 7,220,243 manuscripts from 2019 to 2021 across 2,416 journals found 30% absolute reviewer recommendation agreement in the first review round.(2) The goal of this study was to evaluate a pilot of structured peer review by: 1) exploring if and how reviewers answered structured peer review questions, 2) analysing their agreement, 3) comparing that agreement to the agreement rate before implementation of structured peer review, and 4) further enhancing the piloted set of structured peer review questions.

Design: Structured peer review consisting of 9 questions was piloted in August 2022 in 220 Elsevier journals. For the pilot analysis we aimed for 10% of this sample. We randomly selected journals across all fields and IF quartiles, and then selected research manuscripts that received 2 reviewer reports in the first 2 months of the pilot, leaving us with 107 manuscripts belonging to 23 journals. We did not have access to further review rounds or to the editors’ final recommendations for these manuscripts. Review reports were qualitatively analysed, with (partial) agreement defined as reviewers answering a question with the same answer (e.g., yes, no, NA, etc.) or a similar answer (i.e., one reviewer answering yes, the other – yes, but I would suggest improving…). Eight questions had open-ended fields, while the ninth question (on language editing) had only a yes/no option. After the 9 questions, reviewers could leave Comments-to-Author and Comments-to-Editor. All answers (for questions 1 to 8 and Comments-to-Author) were independently coded by MM and BM (with inter-rater agreement of 94%), who then met at regular intervals to reach the consensus used for results reporting.

Results: Almost all reviewers (n=196, 92%) provided answers to all questions, with 12 (6%) skipping one question and 6 (3%) skipping two questions. The overall length of reviewers’ answers to the 8 open-ended questions (the 9th was yes/no only) was 164 words (IQR 73 to 357), with the longest answers (Md 27 words, IQR 11 to 68) provided for question 2 (reporting methods in sufficient detail for replicability or reproducibility). Reviewers had the highest (partial) agreement (72%) when assessing the flow and structure of the manuscript, and the lowest (53%) when assessing whether the interpretation of results is supported by the data, and likewise (53%) when assessing whether statistical analyses were appropriate and reported in sufficient detail. Two-thirds of reviewers (n=145, 68%) filled out the Comments-to-Author section, which resembled standard peer review reports compiled during the review process and then copied into the field. Those Comments-to-Author sections covered on average 4 of the 9 topics (SD 2) addressed by the structured questions. Absolute agreement on final recommendations (exact match of recommendation choice) was 41%, which was higher than what those journals had in the period 2019 to 2021 (31% agreement, P=0.0275).
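The reported comparison (41% pilot agreement in 107 manuscripts vs. the 31% historical rate) can be roughly re-checked as below. The abstract does not state which test produced P=0.0275; this is only a minimal sketch assuming a one-sample z-test of the pilot proportion against the historical rate treated as a fixed reference, so its p-value will not exactly reproduce the reported one.

```python
from math import sqrt, erfc

# Figures from the abstract (the 44 is reconstructed from 41% of 107).
n_pilot = 107
agreed = round(0.41 * n_pilot)   # ~44 manuscripts with matching recommendations
p_hat = agreed / n_pilot
p0 = 0.31                        # 2019-2021 baseline agreement for these journals

# One-sample two-sided z-test of p_hat against the fixed baseline p0
# (an assumption -- the authors' actual test is not specified).
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_pilot)
p_two_sided = erfc(abs(z) / sqrt(2))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

Under this simplification the difference is likewise significant at the 5% level; a two-proportion test using the full 2019-2021 sample would be closer to what the authors likely ran.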

Conclusions: Our preliminary results indicate that adoption of structured peer review leads to reviewers covering more topics than they usually do in their reports. Individual question analysis indicated the highest disagreement regarding interpretation of results and the conduct and reporting of statistical analyses. While structured peer review was associated with an improvement in reviewers’ final recommendation agreement, this was not a randomised trial, and further studies are needed to corroborate the finding. Further research is also needed to determine whether structured peer review leads to greater knowledge transfer or improvement of the final versions of manuscripts.