Privacy Preservation in Textual Data: A Systematic Mapping Study on Differential Privacy and Semantic Similarity
Description
Background: Artificial Intelligence and Machine Learning solutions rely heavily on extracting value from data, often in textual form. Ethical considerations and data protection regulations have intensified the focus on safeguarding sensitive information. Disclosure risks in textual datasets, especially when analyzed through the lens of differential privacy, are influenced by text frequency, semantic similarity, and the presence of rare events.

Goal: This work aims to identify state-of-the-art techniques for privacy-preserving processing of textual data. It focuses on enabling the application of privacy-enhancing methods to unstructured data, particularly text, and on approaches for semantic similarity.

Method: To achieve this goal, a Systematic Mapping Study (SMS) was conducted to investigate state-of-the-art privacy preservation techniques. Peer-reviewed studies published between 2010 and 2025 were retrieved from the ACM Digital Library, IEEE Xplore, Scopus, and Web of Science. Techniques reported in a significant number of studies were selected for deeper analysis and potential application in software engineering. The methodology draws on concepts from differential privacy, vector databases, semantic similarity, and rare event detection.

Results: This study identifies state-of-the-art techniques for privacy-preserving textual data analysis and text similarity. It also investigates how data science methods, large language models, and agent-based AI systems support the implementation of privacy-preserving mechanisms. Additionally, it highlights techniques used for semantic similarity and rare event detection in text-based contexts. The identified techniques provide a foundation for defining guidelines, best practices, and validated methods that can enhance software engineering maturity throughout the lifecycle of textual data (collection, storage, and processing) while addressing privacy risks and regulatory compliance requirements.
Files
Search-String.pdf
Total size: 2.5 MB
| MD5 checksum | Size |
|---|---|
| 1c6862cbef3fba988ac9c9a3cea74755 | 2.0 MB |
| 22e9c5c05e770ba221276f1c4777aec1 | 133.6 kB |
| b3ea5c520c0402e20848a23d51f67a8e | 28.8 kB |
| 2186b25fcd5c1720b377877621d3a43b | 28.3 kB |
| 37ee3e01187aa1c729cfa4e0c0d7aade | 26.8 kB |
| a6798eba2e0fa9b992426ee409383f01 | 25.7 kB |
| d86be2d008b160636552f44ff8838495 | 25.8 kB |
| 0b2af2952a85db2f99b42777b9888b2a | 27.4 kB |
| 267d1b4dcdd6b54ae96408fbdbd0f936 | 52.7 kB |
| 630a8e3b1e7a741547c2490181ff5b83 | 57.0 kB |
| 8eb713e09e8174be0655eb74ed66b0cb | 67.2 kB |
| 84928a931c0b18b08ec5031c49f766e9 | 41.7 kB |