variable,description
row,original rank order
keypoints,occasional notes by DBishop
ID,PubPeer ID
comment.timestamp,date/time for 1st comment
Publication.date,date of commented publication
Pubpeer.Link,weblink to 1st PubPeer comment
Article.title,as in variable name
publisher,as in variable name
authors,as in variable name
journal_name,as in variable name
author.affiliations,as in variable name
doi,as in variable name
url,as in variable name
html,as in variable name
uk.affiliations,as in variable name
N AUTHORS,as in variable name
newcat,Category 1-12 as described below
author response - either pubpeer or to journal,1 if any evidence of author engagement
greenticks,N authors with greentick for email - coded only for those categorised 3 or 4 (can determine if 3 or 4 appropriate)
journal action,"Any evidence of retraction, correction etc. (Some could be missed if not obvious)"
% inst involvement,"N authors from institution divided by all authors, except where 1st, last or corresponding author is from this institution, in which case 100%"
uni,Short name for university
query type,Notes on nature of PubPeer comments
notes,Notes on findings from web search etc on author
,
,
Newcat is a 12-option multiple choice (mutually exclusive - the higher category is awarded if several apply),
"1. Minor errors, such as mislabelling.",
"2. Detailed review of methods or logic. Some commentators engaged in detailed post-publication peer review of the kind that was originally envisaged for the PubPeer platform. Although these were often critical, they appeared to be robust scientific debate rather than to reflect misconduct. The commentators may sometimes point out problems with methods, though this is often difficult to evaluate without subject expertise. Some authors engage positively with commentators.",
"3. A serious problem with data that could plausibly reflect honest error, where the authors engage to explain this and/or correct it. Errors could reflect duplication of an image within a paper to illustrate groups or conditions that should be different, impossible or implausible data values, or a wrongly reported genetic sequence. In general, this code was used when it was judged that a reputable researcher would not ignore a problem this serious.",
"4. Errors in the data similar to those noted under (3), where the authors fail to give a satisfactory explanation, or obfuscate and/or attack the commentator. Where there was uncertainty about the sufficiency of an author's response, they were given the benefit of the doubt, so code 4 was used only if the author ignored a serious problem or attempted to cover it up. In general, corresponding authors are notified when a PubPeer comment appears and given the opportunity to respond; however, if the PubPeer record did not show an associated author email, code 3 was used, as it could not be assumed the author was aware of the comment.",
5. Undocumented departure from protocol in a registered study.,
6. Failure to report ethics approval for a study that requires it.,
7. Undeclared conflict of interest.,
"8. Authorship/affiliation issues. This included guest authorship, or a misleading affiliation or email. This code excluded articles with the hallmarks of a paper mill (see below).",
9. Evidence of plagiarism or self-plagiarism.,
"10. Evidence that the article comes from a paper mill. Paper mill outputs typically involve elements of fabrication or falsification, but were categorised separately here because they have the distinctive feature that researchers have purchased authorship. They can also be identified by other characteristics, including evidence from the Problematic Paper Screener (Cabanac, Labbé, & Magazinov, 2022), which notes ""tortured phrases"", citation of retracted, questionable or otherwise unreliable sources, and indicators of compromised editorial or peer review processes. Another indicator is numerous co-authors from many different countries in a context where this is hard to explain, or a co-author with a strong track record of paper mill publications. In general, there is no one definitive indicator of paper mill products. This code would not be applied if there was just a single instance of a tortured phrase or an inappropriate citation in an article; rather, the judgement is made from the presence of several ""red flags"" such as these. Finally, some paper mill products can be identified from a retraction notice; the publisher Hindawi retracted thousands of papers after its journals became infested with paper mill products (Bik, 2023).",
"11. Evidence of data fabrication or falsification. Some PubPeer comments include screenshots that show manipulations of data that are difficult to explain other than as a deliberate attempt to mislead. The main category here is manipulation of images using Photoshop. This goes beyond the kind of image duplication described in (3) to include overwriting sections of an image, or rotating, stretching or otherwise manipulating an image to hide the fact that it is a duplication.",
,
Comments that did not belong in any of the categories 1-11 were coded as Other (12).,
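The "% inst involvement" rule above can be sketched in Python. This is a minimal illustration of the codebook's stated arithmetic, not code from the dataset; the function and argument names are invented for the example.

```python
def pct_inst_involvement(n_inst_authors: int, n_authors: int,
                         key_author_from_inst: bool) -> float:
    """Percentage institutional involvement, per the codebook rule:
    N authors from the institution divided by all authors, except that
    the value is 100% whenever the 1st, last, or corresponding author
    is from the institution.

    (Illustrative sketch only; names are not from the dataset.)
    """
    if key_author_from_inst:
        # 1st, last, or corresponding author at the institution: 100%
        return 100.0
    return 100.0 * n_inst_authors / n_authors

# Example: 2 of 5 authors from the institution, none in a key position
print(pct_inst_involvement(2, 5, False))  # 40.0
# Same counts, but a key author is from the institution
print(pct_inst_involvement(2, 5, True))   # 100.0
```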