Published December 17, 2023 | Version v3
Dataset Open

Lists of stopwords, polarity shifters and AnAwords of Bosnian language

  • 1. University of Primorska, UP FAMNIT

Description

The dataset comprises three lists for the Bosnian language: a list of stopwords, a list of polarity shifters, and a list of AnAwords (split across two files).

Stopwords are the words on a stop list that are deliberately filtered out ("stopped") when processing natural-language text. They are typically very frequent words that contribute little or nothing to the meaning or context of a text.
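As a rough illustration of how a stop list is typically applied, the sketch below filters stopwords out of a tokenized sentence. It assumes the stopword file is UTF-8 text with one entry per line, which the record does not state; the helper names `load_wordlist` and `remove_stopwords` and the four sample entries are hypothetical and are not copied from the dataset.

```python
import re

def load_wordlist(lines):
    """Build a set of lowercased entries from an iterable of lines.

    Assumption: one entry per line; blank lines are skipped.
    """
    return {line.strip().lower() for line in lines if line.strip()}

def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in stopwords]

# Illustrative entries only; in practice the lines would come from the
# stopword file, e.g. load_wordlist(open(path, encoding="utf-8")).
sample_stopwords = load_wordlist(["i", "je", "u", "na"])
filtered = remove_stopwords("Knjiga je na stolu", sample_stopwords)
```

Here `filtered` keeps only the tokens that are not in the sample stop list. A set is used for the stop list so that each membership check is constant time.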

AnAwords are words that function primarily as intensifiers or diminishers, often appearing as adverbs of manner or adjectives. The list is organized into six sublists: maximizers, boosters, approximators, relative intensifiers, diminishers, and minimizers. It is split into two parts, intensifiers and diminishers, stored in two separate files.
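One common way to use intensifier and diminisher lists in lexicon-based sentiment scoring is to scale the polarity of the word that follows the modifier. The sketch below shows that idea under stated assumptions: the dataset supplies only the word lists, so the `boost` and `damp` factors, the tiny polarity lexicon, and the sample modifier words are all hypothetical and chosen purely for illustration.

```python
def score_with_modifiers(tokens, lexicon, intensifiers, diminishers,
                         boost=1.5, damp=0.5):
    """Sum word polarities, scaling each by the modifiers that precede it.

    boost and damp are assumed weights, not values from the dataset.
    """
    total = 0.0
    scale = 1.0
    for tok in tokens:
        if tok in intensifiers:
            scale *= boost          # strengthen the next scored word
        elif tok in diminishers:
            scale *= damp           # weaken the next scored word
        else:
            total += scale * lexicon.get(tok, 0.0)
            scale = 1.0             # modifiers apply to one following word
    return total

# Hypothetical sample data, not taken from the dataset files.
lexicon = {"dobar": 1.0}
intensifiers = {"veoma"}
diminishers = {"malo"}

plain = score_with_modifiers(["dobar"], lexicon, intensifiers, diminishers)
boosted = score_with_modifiers(["veoma", "dobar"], lexicon, intensifiers, diminishers)
damped = score_with_modifiers(["malo", "dobar"], lexicon, intensifiers, diminishers)
```

Because the list distinguishes six sublists, a finer-grained variant could assign a separate factor to each sublist instead of the two factors used here.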

Polarity shifters are words that change the polarity of a phrase by inverting or weakening it. The term is used here specifically for content words (verbs, nouns, and adjectives) that have this effect.
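A minimal sketch of how a shifter list might be applied: when a shifter occurs, the polarity of sentiment-bearing words within a short window after it is multiplied by a shift factor. The window size, the factor of -1.0 (full inversion), and the sample words are assumptions made for illustration; the dataset itself provides only the list of shifters.

```python
def apply_polarity_shifters(tokens, lexicon, shifters, window=2, factor=-1.0):
    """Return per-token polarities with shifters applied.

    A shifter affects up to `window` tokens that follow it. The default
    factor of -1.0 inverts polarity; a factor between 0 and 1 would
    model weakening instead. Both parameters are assumed values.
    """
    scores = []
    remaining = 0  # how many upcoming tokens are still inside a shifter's scope
    for tok in tokens:
        if tok in shifters:
            scores.append(0.0)
            remaining = window
            continue
        polarity = lexicon.get(tok, 0.0)
        if remaining > 0:
            polarity *= factor
            remaining -= 1
        scores.append(polarity)
    return scores

# Hypothetical sample data, not taken from the dataset files.
lexicon = {"kvaliteta": 1.0}
shifters = {"nedostatak"}
shifted = apply_polarity_shifters(["nedostatak", "kvaliteta"], lexicon, shifters)
unshifted = apply_polarity_shifters(["kvaliteta"], lexicon, shifters)
```

In this toy case the positive word ends up with a negative score once the shifter precedes it, and keeps its original score otherwise.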

Files

BOSNIAN_AnAwords_diminishers_2023.txt

Files (4.1 kB)

Checksum (MD5)                          Size
md5:0c120e3cc184f3f7bac2df43a1739f9b    455 Bytes
md5:a406a24226ce3e4788f291b8bad64d0a    827 Bytes
md5:b062bf0c095603b4697dfc0110ed2e88    260 Bytes
md5:d7d6bae65540d392418894b56a88b8e6    2.6 kB
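The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical. The listing does not show which checksum belongs to which file name, so the pairing has to be established by the user, for example by comparing file sizes.

```python
import hashlib

def md5_of_file(path, chunk_size=8192):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed_checksum):
    """Compare a file's MD5 digest with a listed value such as 'md5:0c12...'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:0c120e3cc184f3f7bac2df43a1739f9b")` would return `True` only if the file at `local_path` is byte-for-byte identical to the 455-byte file in the record.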