Published December 3, 2024 | Version v1
Presentation Open

Metacurate-ML: Metadata Extraction from CAIs

  • 1. School of Computer Science & Electronic Engineering, University of Surrey
  • 2. University of Surrey
  • 3. Scottish Centre for Social Research (ScotCen)
  • 4. CLOSER, UCL, Social Research Institute

Description

This work extends the results of our earlier work on pre-trained language models with recent developments in text-layout models and zero-shot techniques. Because relying solely on textual information makes it difficult to classify and extract metadata accurately, we will explore a combination of textual content and visual logic that incorporates vision transformers with optimisation techniques. This will allow us to extract specific items within questionnaires, such as question texts, responses and routing, to create a rich source of metadata that links the provenance of the data collection methodology to the resultant data and that can be transformed into DDI-Lifecycle. We will investigate the feasibility of multimodal document-understanding models that employ masked language modelling techniques and present the resulting challenges.
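As a rough illustration of the kind of items named above (question texts, responses and routing), the extracted metadata could be held in a simple structure such as the following. This is a minimal sketch only: the class names, field names and the example question are hypothetical, and it is neither the DDI-Lifecycle schema nor the output format of the models discussed in the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResponseOption:
    code: str
    label: str
    # Hypothetical routing field: name of the next question if this option is chosen.
    goto: Optional[str] = None


@dataclass
class QuestionItem:
    name: str
    text: str
    responses: List[ResponseOption] = field(default_factory=list)

    def next_question(self, code: str) -> Optional[str]:
        """Return the routing target for an answer code, or None if there is none."""
        for option in self.responses:
            if option.code == code:
                return option.goto
        return None


# Invented example item, not taken from any real questionnaire.
q1 = QuestionItem(
    name="Q1",
    text="Do you currently have a paid job?",
    responses=[
        ResponseOption("1", "Yes", goto="Q2"),
        ResponseOption("2", "No", goto="Q5"),
    ],
)
```

A structure along these lines keeps the routing attached to the response that triggers it, which is one plausible starting point for a later mapping to DDI-Lifecycle.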

Files

EDDI Metadata Extraction v1.pdf

md5:f5bc6c70e77c804bed950d698c227c90
4.0 MB