Published February 25, 2025 | Version v1
Project deliverable Open

RAISE D4.2 Data Plagiarism Checker Results

  • 1. University of Western Macedonia
  • 2. Athena Research and Innovation Center In Information Communication & Knowledge Technologies

Description

Dataset plagiarism is a growing concern in the research community. It is crucial to prevent researchers from uploading or generating slightly modified versions of existing resources and claiming ownership or authorship of research outputs without making a substantial contribution. This report investigates plagiarism detection methods tailored to common research dataset formats: images, time series, and comma-separated values (CSV) files. The results of this work will be used as a module of the RAISE EU project’s distributed crowdsourced data processing system. First, we explore transformer models for image plagiarism detection. Second, we adapt the SFA representation to detect plagiarism in time series. Finally, we explore cosine similarity, the Jaccard index, Earth Mover’s Distance, and Cramér’s V to detect plagiarism in CSV datasets. We evaluated our methods on publicly available datasets as well as datasets generated by RAISE.
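
To make the tabular similarity measures named above more concrete, the following is a minimal sketch of three of them (Jaccard index, cosine similarity, and a one-dimensional Earth Mover’s Distance) applied to a single numeric column. It is an illustration only and is not taken from the deliverable: the function names, the sample columns, and the restriction of the Earth Mover’s Distance to equal-length samples with uniform weights are all assumptions made here.

```python
import math


def jaccard_index(a, b):
    """Jaccard index of two value collections treated as sets: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: undefined case reported as 0
    return dot / (norm_u * norm_v)


def emd_1d(u, v):
    """Earth Mover's Distance between two equal-size 1-D samples with uniform
    weights, computed as the mean absolute difference of the sorted values."""
    if len(u) != len(v):
        raise ValueError("this simplified version needs equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(u), sorted(v))) / len(u)


# Hypothetical column from an original dataset and a slightly modified copy.
original = [1.0, 2.0, 3.0, 4.0]
suspect = [1.0, 2.0, 3.0, 4.5]

print(jaccard_index(original, suspect))
print(cosine_similarity(original, suspect))
print(emd_1d(original, suspect))
```

In this toy case a single altered value lowers the Jaccard index noticeably, while the cosine similarity stays close to 1 and the Earth Mover’s Distance stays small, which suggests why several measures might be considered together rather than one alone.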

Files

RAISE D4.2 Data Plagiarism Checker Results.pdf (718.4 kB)
md5:64215e03949d463b914269b2cbf66ae9

Additional details

Funding

European Commission
RAISE - Research Analysis Identifier SystEm 101058479