RAISE D4.2 Data Plagiarism Checker Results
Creators
Description
Dataset plagiarism is a growing concern in the research community. It is crucial to prevent researchers from uploading or generating slightly modified versions of existing resources and claiming ownership or authorship of research outputs without making a substantial contribution. This report investigates plagiarism detection methods tailored to common research dataset formats: images, time series, and comma-separated values (CSV) files. The result of this work will be used as a module of the RAISE EU project's distributed crowdsourced data processing system. First, we explore transformer models for image plagiarism detection. Second, we adapt the Symbolic Fourier Approximation (SFA) representation to detect plagiarism in time series. Finally, we explore cosine similarity, the Jaccard index, Earth Mover's Distance, and Cramér's V to detect plagiarism in CSV datasets. We evaluated our methods on publicly available datasets as well as on datasets generated by RAISE.
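To make two of the CSV measures named above concrete, the following is a minimal sketch of the Jaccard index and cosine similarity applied to a pair of columns. It is not the report's implementation: the function names, the choice to compare value-frequency vectors, and the sample columns are all assumptions made for illustration.

```python
import math
from collections import Counter


def jaccard_index(a, b):
    """Jaccard index of the sets of distinct values in two columns."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty columns are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine similarity between the value-frequency vectors of two columns."""
    freq_a, freq_b = Counter(a), Counter(b)
    # Only values present in both columns contribute to the dot product.
    dot = sum(freq_a[v] * freq_b[v] for v in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical columns: the suspect copy changes a single cell.
original = ["red", "green", "blue", "blue"]
suspect = ["red", "green", "blue", "cyan"]

print(jaccard_index(original, suspect))
print(cosine_similarity(original, suspect))
```

Scores close to 1 for many columns of a suspect file would flag it for closer inspection; any decision threshold is a separate choice that this sketch does not make.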
Files
RAISE D4.2 Data Plagiarism Checker Results.pdf (718.4 kB)
md5:64215e03949d463b914269b2cbf66ae9