An Approach to Improve k-Anonymization Practices in Educational Data Mining
Description
Educational data mining has allowed for large improvements in educational outcomes and in the understanding
of educational processes. However, there remains a constant tension between advancing educational data mining
and protecting student privacy when using educational datasets. Publicly available datasets have
facilitated numerous research projects while striving to preserve student privacy via strict anonymization
protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization
and utility of educational datasets for downstream educational data mining tasks, nor how anonymization
processes might be improved for such tasks. We provide a framework for strictly anonymizing educational
datasets with a focus on improving downstream performance in common tasks such as student
outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with
machine learning-based downstream task examples to demonstrate both the effect of anonymization and
our means to improve it. Our method improves downstream machine learning accuracy by 30.59% on average
compared with baseline data anonymization. It does so by guiding the anonymization process toward strategies that
anonymize the least important information while leaving the most valuable information intact.
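The abstract describes k-anonymity and a strategy of anonymizing the least important information first. The following is a minimal Python sketch of that idea under stated assumptions: records are plain dictionaries, each quasi-identifier has a hypothetical generalization hierarchy, and per-attribute importance scores decide which attribute to generalize next. In this toy example the scores are invented, though in practice they might come from a downstream model's feature importances. The function names, hierarchies, and data are illustrative only. This is a generic greedy full-domain generalization, not the authors' framework, whose implementation is in the linked repository.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
# Hypothetical generalization hierarchy: (raw value, level) -> generalized value.
# Level 0 returns the raw value; the highest level suppresses the value entirely.
Hierarchy = Callable[[object, int], object]


def is_k_anonymous(records: List[Record], quasi_ids: List[str], k: int) -> bool:
    """True if each combination of quasi-identifier values appears in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


def generalize(records: List[Record], hierarchies: Dict[str, Hierarchy],
               levels: Dict[str, int]) -> List[Record]:
    """Return new records with each quasi-identifier generalized to its current level."""
    return [
        {col: (hierarchies[col](val, levels[col]) if col in hierarchies else val)
         for col, val in r.items()}
        for r in records
    ]


def importance_guided_anonymize(records: List[Record], hierarchies: Dict[str, Hierarchy],
                                max_levels: Dict[str, int], importance: Dict[str, float],
                                k: int) -> Tuple[List[Record], Dict[str, int]]:
    """Greedy sketch: raise the generalization level of the least important
    quasi-identifier that can still be generalized, until k-anonymity holds."""
    quasi_ids = list(hierarchies)
    levels = {q: 0 for q in quasi_ids}
    while True:
        current = generalize(records, hierarchies, levels)
        if is_k_anonymous(current, quasi_ids, k):
            return current, levels
        candidates = [q for q in quasi_ids if levels[q] < max_levels[q]]
        if not candidates:
            raise ValueError("k-anonymity is unreachable with these hierarchies")
        # Coarsen the attribute the (assumed) importance scores rank lowest.
        levels[min(candidates, key=lambda q: importance[q])] += 1


def age_hierarchy(age, level):
    # Level 0: exact age; level 1: 10-year band; level 2: suppressed.
    if level == 0:
        return age
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


def zip_hierarchy(zipcode, level):
    # Level 0: full ZIP code; level 1: keep the 3-digit prefix; level 2: suppressed.
    if level == 0:
        return zipcode
    if level == 1:
        return zipcode[:3] + "**"
    return "*"


# Toy student records; "grade" is the downstream prediction target, not a quasi-identifier.
records = [
    {"age": 21, "zip": "10001", "grade": "A"},
    {"age": 23, "zip": "10002", "grade": "B"},
    {"age": 27, "zip": "10003", "grade": "A"},
    {"age": 34, "zip": "10011", "grade": "C"},
    {"age": 36, "zip": "10012", "grade": "B"},
    {"age": 38, "zip": "10013", "grade": "A"},
]
hierarchies = {"age": age_hierarchy, "zip": zip_hierarchy}
max_levels = {"age": 2, "zip": 2}
# Assumed importance scores, e.g. as they might come from a model trained on the raw data.
importance = {"age": 0.8, "zip": 0.2}

anonymized, levels = importance_guided_anonymize(records, hierarchies, max_levels, importance, k=3)
print(levels)  # {'age': 1, 'zip': 2}
```

In this toy run, the low-importance ZIP code is suppressed entirely while age keeps a 10-year band. A generalization order that ignored importance could instead coarsen age first and discard more of the information the downstream model relies on. A real pipeline would still need to choose k, the quasi-identifiers, and the importance measure for each dataset.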
Files
764Stinar61To83.pdf (419.3 kB), md5:cd90d1f7c8e12edb74306e60a67c0949
Additional details
Software
- Repository URL: https://github.com/fjstinar/improve-kanon-practices/