Published June 27, 2024 | Version v1
Journal article Open

An Approach to Improve k-Anonymization Practices in Educational Data Mining

  • 1. University of Illinois Urbana–Champaign
  • 2. University of Pennsylvania

Description

Educational data mining has allowed for large improvements in educational outcomes and understanding
of educational processes. However, there remains a constant tension between advancing educational data mining
and protecting student privacy when educational datasets are used. Publicly available datasets have
facilitated numerous research projects while striving to preserve student privacy via strict anonymization
protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization
and utility of educational datasets for downstream educational data mining tasks, nor how anonymization
processes might be improved for such tasks. We provide a framework for strictly anonymizing educational
datasets with a focus on improving downstream performance in common tasks such as student
outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with
machine learning-based downstream task examples to demonstrate both the effect of anonymization and
our means to improve it. Our method improves downstream machine learning accuracy versus baseline
data anonymization by 30.59%, on average, by guiding the anonymization process toward strategies that
anonymize the least important information while leaving the most valuable information intact.
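
The idea of anonymizing the least important information first can be illustrated with a minimal sketch. This is not the authors' framework: it assumes plain column suppression instead of generalization, and the column names, importance scores, and the helper names `is_k_anonymous` and `suppress_least_important` are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_ids, k):
    """Return True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())


def suppress_least_important(rows, importance, k):
    """Drop quasi-identifier columns, least important first, until k-anonymity holds.

    `importance` maps column name -> score (hypothetical values here; in practice
    these might come from a model's feature importances). Returns the kept columns.
    """
    kept = sorted(importance, key=importance.get, reverse=True)  # most important first
    while kept and not is_k_anonymous(rows, kept, k):
        kept.pop()  # suppress the least important remaining column
    return kept


# Toy data, invented for illustration only.
rows = [
    {"age": 14, "zip": "61801", "grade": 9},
    {"age": 14, "zip": "61820", "grade": 9},
    {"age": 15, "zip": "61801", "grade": 10},
    {"age": 15, "zip": "61820", "grade": 10},
]
importance = {"grade": 0.6, "age": 0.3, "zip": 0.1}

kept = suppress_least_important(rows, importance, k=2)
print(kept)
```

In this toy case only the lowest-scored column has to be suppressed before each remaining combination of values is shared by at least two records. A real anonymization pipeline would typically generalize values (for example, binning ages) before dropping a column entirely.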

Files

764Stinar61To83.pdf (419.3 kB)
md5:cd90d1f7c8e12edb74306e60a67c0949
