Open-Source Email Curation Software Designed for Reusability
Description
Email is more than half a century old and fills a vital role in activities across all sectors of society. However, the professional curation of email is still relatively immature. Many proprietary tools that extract and query email content operate as black boxes and cannot be easily evaluated or integrated with other digital curation software. Recently, there has been progress in the development of open-source software for email curation, but many institutions struggle to integrate these tools into digital curation workflows that usually involve other applications and systems.
We report on a project called Review, Appraisal and Triage of Mail (RATOM) that developed software for interactive review, selection and appraisal of email collections held in PST, OST, and mbox formats. These tools allow users to create, validate, and query reports and metadata generated from email collections. We have designed the software for reusability from the ground up. The software is open source and distributed as several independent modules that can be incorporated into existing and emerging workflows. Output is structured to be easily queried in support of new and emerging tasks and access scenarios. We also describe a reusable tool to simplify management of machine learning models for identifying named entities in email.
Files
IDCC 2022: Open-Source Email Curation Software Designed for Reusability.pdf
Size: 1.2 MB
md5:981261ada281eb58f18ae9948be9d576