Modelling of multi-modal data in LiRI Corpus Platform and beyond
Description
While common solutions and standard practices exist for modelling text-only corpora, multimodal conversation corpora pose the particular challenge of integrating textual transcripts, temporal information, and annotations that can relate to speech and non-speech modalities alike. The LiRI Corpus Platform (LCP) was designed to accommodate both textual and multimodal corpora: it supports time-aligned annotations linked to multimedia files, so that users of the platform can efficiently browse and query multimodal corpora. LCP aims to become a reference solution for hosting multimodal corpora, and we especially encourage curators of such corpora to attend this session.
In this session, we report on the process of importing a multimodal conversation corpus into LCP as part of the FAIR FI-LD project. We give an overview of the different file formats involved in modelling the corpus and explain how we converged on the TEI/ISO format as our interoperable standard. Participants then engage in a practical exercise in which they are guided through a simple Python script that imports a simplified sample corpus into LCP. We end the session with a general discussion about setting up workflows for the curation of multimodal corpora and about the possibilities of automating the extraction and annotation of speech from videos.
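To illustrate what time-aligned TEI/ISO data looks like once it has been parsed, here is a minimal Python sketch. It assumes a heavily simplified transcription in the style of ISO 24624:2016 (the TEI-based standard for transcriptions of spoken language): a `<timeline>` of `<when>` anchors with `interval`/`since` offsets, and `<annotationBlock>` elements holding an utterance `<u>` plus optional `<spanGrp>` layers for non-speech annotations such as gestures. The sample data, the `Segment` record, and the function names are hypothetical; this is neither the workshop's script nor LCP's import format, which the session description does not specify.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# TEI namespace and the implicit xml: namespace used for xml:id
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily simplified sample in the style of ISO/TEI speech transcriptions
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <timeline unit="s">
      <when xml:id="T0"/>
      <when xml:id="T1" interval="1.2" since="T0"/>
      <when xml:id="T2" interval="2.8" since="T0"/>
    </timeline>
    <body>
      <annotationBlock who="#A" start="#T0" end="#T1">
        <u>hello   there</u>
        <spanGrp type="gesture"><span from="#T0" to="#T1">nod</span></spanGrp>
      </annotationBlock>
      <annotationBlock who="#B" start="#T1" end="#T2"><u>hi</u></annotationBlock>
    </body>
  </text>
</TEI>"""


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the timeline origin
    end: float
    text: str
    layer: str  # "speech" for <u>, otherwise the spanGrp @type


def resolve_timeline(root: ET.Element) -> dict[str, float]:
    """Map each <when> xml:id to an absolute offset in seconds.

    Assumes every `since` anchor appears earlier in the timeline.
    """
    offsets: dict[str, float] = {}
    for when in root.iterfind(".//tei:timeline/tei:when", NS):
        since = when.get("since")
        if since is None:
            offsets[when.get(XML_ID)] = 0.0  # the origin anchor
        else:
            offsets[when.get(XML_ID)] = offsets[since.lstrip("#")] + float(when.get("interval"))
    return offsets


def extract_segments(xml_text: str) -> list[Segment]:
    """Flatten speech and non-speech layers into one time-ordered list."""
    root = ET.fromstring(xml_text)
    t = resolve_timeline(root)
    segments: list[Segment] = []
    for block in root.iterfind(".//tei:annotationBlock", NS):
        who = block.get("who").lstrip("#")
        u = block.find("tei:u", NS)
        if u is not None:
            text = " ".join("".join(u.itertext()).split())  # normalize whitespace
            segments.append(Segment(who, t[block.get("start").lstrip("#")],
                                    t[block.get("end").lstrip("#")], text, "speech"))
        for grp in block.iterfind("tei:spanGrp", NS):
            for span in grp.iterfind("tei:span", NS):
                segments.append(Segment(who, t[span.get("from").lstrip("#")],
                                        t[span.get("to").lstrip("#")],
                                        (span.text or "").strip(),
                                        grp.get("type", "annotation")))
    return sorted(segments, key=lambda s: (s.start, s.layer))


if __name__ == "__main__":
    for seg in extract_segments(SAMPLE):
        print(f"{seg.start:5.2f}-{seg.end:5.2f} {seg.speaker} [{seg.layer}] {seg.text}")
```

Flattening every layer into (speaker, start, end, text, layer) rows is one simple way to keep speech and non-speech annotations on a shared time axis; mapping such rows onto LCP's actual data model would be a separate step.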
The CLARIN-CH Training Sessions 2025: Exploring Swiss Language Resources and Tools
This training series took place during the spring semester of 2025 and was organized by members of the CLARIN-CH ecosystem of infrastructures, with the aim of introducing participants to the Swiss national infrastructure for language data and deepening their knowledge of it.
Files
CLARIN-CH_Training_20250428_Miecznikowski+Zehr_Presentation.pdf
Total size: 811.4 MB

| Size | MD5 checksum |
|---|---|
| 2.1 MB | bca3911d53f639b86fb44781a32c8bc7 |
| 809.3 MB | 5e57995e9cb60029169763d75ab3a7f4 |
Additional details
Related works
- Is documented by: Report, 10.5281/zenodo.15826674 (DOI)
Funding
- Swissuniversities