Published December 19, 2012 | Version v1
Report Open

Identification of preservation risks in PDF with Apache Preflight - a first impression

  • 1. KB

Description

This report explores the feasibility of using the Apache Preflight PDF/A validator to detect 'risky' features in 'regular' (i.e. non-PDF/A) PDF documents.

The specific objectives of this work were:

  • To get a first impression of Apache Preflight, the PDF/A-1b validator that is part of PDFBox.
  • To investigate whether Apache Preflight can detect features that are unwanted from a preservation point of view, such as password protection, encryption and non-embedded fonts, in PDF files that are not necessarily of the PDF/A sub-type.
  • To provide a comparison with the Preflight module of Adobe Acrobat 9.5.
  • To decide whether further work on Apache Preflight (more elaborate testing, possible involvement in its development) is worthwhile.

Files

pdfProfilingJvdK19122012.pdf (573.1 kB)
md5:8032fef68a2377bc25f7ff6fa955f4c6

Additional details

Funding

European Commission
SCAPE - Scalable Preservation Environments 270137