Vagueness in Metadata
Description
Researchers looking for data across language archives often rely on harvesters such as OLAC or VLO. In doing so, they depend on mappings of metadata categories across archives.
While conducting a survey of metadata standards and landing pages of a number of DELAMAN archives, we came across problems with the notion that entering a search term in a unifying search box will reliably return all relevant available data, nothing more and nothing less.
Different types of vagueness in the metadata distort this picture. Taking “language”, probably the most relevant metadata category for linguists, as an example, it becomes clear that the category can refer to at least three different things: in some cases it describes the content language, in others the languages of the actor(s) in the recording, and in yet others the language into which the resource is translated.
In our talk, we identify different types of vagueness: the metadata category itself can be vague, the values entered can be poorly defined, values can overlap, and the relation of the metadata category to the resource can be unclear (e.g. does the date describe the publication date, the recording date, the date of metadata creation, or the duration of a documentation project?).
We show how vagueness affects the findability of resources and the reliability of search results. We sketch possible solutions for some types of vagueness and discuss whether we simply have to live with this vagueness in our metadata descriptions.
Files
MajkaSchwiertzRau2021_Vagueness.pdf (14.0 MB)
md5:a5ffccab169739c2af47acb95a7e6586