There is a newer version of the record available.

Published November 25, 2025 | Version v2
Journal article Open

AI-Powered Secure Document Anonymization Pipeline: A Serverless AWS Architecture for PII Detection and Redaction

Authors/Creators

  • 1. Birla Institute of Technology and Science, Pilani

Description

Organizations struggle to manually redact Personally Identifiable Information (PII) from sensitive documents, which makes compliance with GDPR and HIPAA difficult. This work develops a serverless document anonymization pipeline on Amazon Web Services (AWS) that automates PII detection and redaction. AWS Step Functions orchestrates a microservices architecture that ingests documents through a secure web interface, extracts text with Amazon Textract, identifies sensitive information with Amazon Comprehend, and applies configurable anonymization strategies. The system integrates Lambda functions for processing logic, API Gateway for communication between frontend and backend, S3 for storage, DynamoDB for audit trails, and EventBridge for workflow management. Key features include a JavaScript-based frontend with real-time progress tracking, support for multiple document formats (PDF, TXT, and images), and PII detection covering names, Social Security numbers, email addresses, medical information, and other sensitive data. Security measures include malware scanning via GuardDuty, encryption at rest and in transit, fine-grained IAM policies, and audit logging via CloudTrail. The serverless architecture keeps cost proportional to usage through pay-per-use pricing and scales automatically. The implementation demonstrates a practical application of cloud-native architectures and AI services to data privacy challenges in enterprise environments.
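The detection and anonymization steps described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `anonymize` and `detect_pii`, the strategy names (`mask`, `label`, `remove`), and the `min_score` threshold are all hypothetical; the abstract says only that strategies are configurable. The entity dictionaries follow the shape returned by Amazon Comprehend's `DetectPiiEntities` operation (`Type`, `BeginOffset`, `EndOffset`, `Score`), and the sketch assumes the detected spans do not overlap.

```python
from typing import Dict, List

# Hypothetical strategy names; the source only says strategies are "configurable".
STRATEGIES = ("mask", "label", "remove")


def anonymize(text: str, entities: List[Dict], strategy: str = "mask",
              min_score: float = 0.5) -> str:
    """Replace each detected PII span in `text` according to `strategy`.

    `entities` is assumed to follow the Comprehend DetectPiiEntities shape:
    dicts with Type, BeginOffset, EndOffset and Score. Spans are assumed
    not to overlap.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Drop low-confidence detections (the threshold is an illustrative choice).
    kept = [e for e in entities if e.get("Score", 1.0) >= min_score]

    # Process spans from the end of the text so earlier offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = ent["BeginOffset"], ent["EndOffset"]
        if strategy == "mask":
            replacement = "*" * (end - start)   # keep the original length
        elif strategy == "label":
            replacement = f"[{ent['Type']}]"    # e.g. [NAME], [EMAIL]
        else:  # "remove"
            replacement = ""
        text = text[:start] + replacement + text[end:]
    return text


def detect_pii(text: str, language_code: str = "en") -> List[Dict]:
    """Call Amazon Comprehend; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so anonymize() works without AWS access

    client = boto3.client("comprehend")
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return response["Entities"]
```

In a pipeline like the one described, a Lambda function could pass the text extracted by Textract to `detect_pii` and then call `anonymize` with the strategy chosen by the user. Keeping `anonymize` free of AWS calls means the redaction logic can be unit-tested offline with hand-written entity lists.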

Files

document-anonymization-publication.pdf

Files (860.2 kB)

Name: document-anonymization-publication.pdf
Size: 860.2 kB
Checksum: md5:76d46ec0eb7b254416871ab27631ac84