Other Open Access
Rhee, Bo-A; Pianzola, Federico; Choi, Gang-Ta
This is a list of labels assigned by the Google Vision API to a set of 10,000 images retrieved from Instagram using museum hashtags.
First, we applied computer vision techniques through the Google Cloud Platform Vision API. An object recognition algorithm returned at most ten tags for each image, for instance ‘Pyramid’, ‘Illustration’, and ‘Person’. Second, we trained a machine learning algorithm (word2vec) on all image tags to compute their semantic similarity. From the output, we identified clusters of similar words, which correspond to similar contents in the images: body, food, clothes, music, nature, interior, architecture, museum, animals, sport. Third, we edited these data-driven categories and combined them with top-down art categories relevant for museum research, creating the following final list of image types: art exhibition (e.g. performances, events, and graphics), artifact (e.g. sculptures, paintings, and pottery), architecture (e.g. buildings, or parts of them, and indoor spaces), selfie (e.g. faces), food, human body (e.g. non-face body parts and people), landscape (e.g. outdoor spaces and nature). Fourth, in order to maximize the number of tags retrieved for each category, we trained word2vec models separately on the tag subsets of 8 different museums and retrieved the 50 most similar tags for each category. We then manually checked all lists to make sure that they included only tags relevant to the respective categories, to resolve overlaps between categories, and to delete ambiguous tags.
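The retrieval of candidate tags in the fourth step can be illustrated with a minimal sketch. It is not the procedure used to build this list: in place of trained word2vec embeddings it uses simple co-occurrence counts (which tags appear together on the same image) and ranks tags by cosine similarity to a seed tag. The toy images, the seed tag, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(images):
    """Map each tag to a Counter of the tags it shares an image with."""
    vectors = {}
    for tags in images:
        unique = set(tags)
        for tag in unique:
            vec = vectors.setdefault(tag, Counter())
            for other in unique:
                if other != tag:
                    vec[other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed, vectors, topn=50):
    """Return up to `topn` (tag, score) pairs ranked by similarity to `seed`."""
    if seed not in vectors:
        return []
    scores = [
        (tag, cosine(vectors[seed], vec))
        for tag, vec in vectors.items()
        if tag != seed
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:topn]


# Hypothetical toy data: one list of tags per image.
images = [
    ["Food", "Dish", "Cuisine"],
    ["Food", "Dish", "Meal"],
    ["Sculpture", "Statue", "Art"],
    ["Sculpture", "Art", "Museum"],
]
vectors = cooccurrence_vectors(images)
candidates = most_similar("Food", vectors, topn=3)
```

The ranked candidates would still need the manual check described above, since similarity scores alone do not separate relevant tags from ambiguous ones.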
This list can be used in combination with the Google Vision API to categorize images: the labels returned for an image are matched against the tags listed for each category.
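One possible way to apply the list is sketched below, assuming the labels for an image have already been obtained from the Vision API. The `CATEGORY_TAGS` excerpt, the case-insensitive matching, and the rule of ranking categories by number of matching tags are illustrative assumptions, not part of the published dataset.

```python
from collections import Counter

# Hypothetical excerpt; the real tag lists come from this dataset.
CATEGORY_TAGS = {
    "artifact": {"sculpture", "painting", "pottery"},
    "architecture": {"building", "ceiling", "arch"},
    "food": {"food", "dish", "cuisine"},
}


def categorize(labels, category_tags=CATEGORY_TAGS):
    """Rank categories by how many of an image's labels appear in their tag list.

    Matching is case-insensitive. Returns a list of (category, hits) pairs,
    most hits first; categories without any matching label are omitted.
    """
    normalized = {label.strip().lower() for label in labels}
    counts = Counter()
    for category, tags in category_tags.items():
        hits = len(normalized & tags)
        if hits:
            counts[category] = hits
    return counts.most_common()


# Labels as they might be returned for a single image (hypothetical).
result = categorize(["Sculpture", "Art", "Building", "Pottery"])
```

An image can match several categories at once, so the function returns a ranking and leaves the choice of a single category, or a threshold, to the user.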