Published March 12, 2021 | Version v1
Thesis Open

Learning Video Embeddings for Similarity Computation

Authors/Creators

  • 1. Hochschule Luzern

Description

TikTok is one of the fastest-growing social media platforms in the world. It allows users to upload short videos of up to 60 seconds and enrich them with music, filters, and other features. Even though it was launched some years ago, it is still relatively unexplored as a marketing instrument. From a computer science perspective, TikTok offers a huge collection of videos on which to train machine learning models. Furthermore, companies have a strong commercial interest in placing ads in videos. Recently, self-supervised learning methods have demonstrated the ability to learn powerful feature representations without any labeled data. This project presents state-of-the-art research on learning semantic features from video data. For this purpose, a dataset of more than 290,000 TikTok videos, along with descriptive information and content-related statistics, was collected. Based on this dataset, novel self-supervised learning approaches, including popular contrastive learning models, were trained and evaluated. To select the most suitable model architecture, performance was evaluated on multiple downstream tasks, such as classification and video retrieval. Experiments revealed that the contrastive learning algorithm SimCLR performs best at extracting powerful representations of TikTok videos. Since the majority of TikTok videos use custom audio tracks played in the background, a novel multimodal model based on SimCLR was proposed. It learns visual and audio embeddings simultaneously by aligning both modalities. Leveraging the audio information significantly improved performance on all downstream tasks. To demonstrate how the proposed feature learning model can be applied to marketing research, two additional downstream tasks were tackled. In the first experiment, anomaly detection algorithms were used to approximate the originality of videos.
Surprisingly, the predicted originality scores exhibit a slightly negative correlation with the number of likes. Even though originality is generally seen as a panacea for success on social media, this analysis reveals that being original on TikTok harms rather than helps success. These results are relevant to marketers and firms that aim to increase the popularity of their content. To further demonstrate the usefulness of the proposed architecture, a regression model was trained to predict the number of likes a video receives based on both visual and audio information as well as the author's statistics. The regression model achieves an R² score of 0.813 on unseen test data. Finally, a web application was implemented that helps influencers maximize the visibility of their content by predicting the number of likes, suggesting appropriate hashtags, and showing similar videos.
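The idea of aligning visual and audio embeddings with a contrastive objective, and then using the embeddings for video retrieval, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the abstract does not specify the loss, the encoders, or any hyperparameters. The sketch assumes a symmetric InfoNCE-style loss (in the spirit of SimCLR's NT-Xent loss) over precomputed embeddings, and the function names, the temperature of 0.1, and the cosine-similarity retrieval are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_modal_contrastive_loss(visual, audio, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Assumption: row i of `visual` and row i of `audio` come from the same
    video (positive pair); all other rows in the batch act as negatives.
    """
    v = l2_normalize(visual)
    a = l2_normalize(audio)
    logits = v @ a.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(v))
    loss_v2a = -log_softmax(logits)[idx, idx].mean()    # visual -> audio
    loss_a2v = -log_softmax(logits.T)[idx, idx].mean()  # audio -> visual
    return 0.5 * (loss_v2a + loss_a2v)


def retrieve_similar(query, gallery, k=3):
    """Return indices of the k gallery embeddings closest to the query
    by cosine similarity (hypothetical retrieval rule)."""
    sims = l2_normalize(gallery) @ l2_normalize(query[None, :])[0]
    return np.argsort(-sims)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual = rng.normal(size=(8, 16))                  # stand-in visual embeddings
    audio = visual + 0.01 * rng.normal(size=(8, 16))   # nearly aligned audio embeddings
    print("aligned loss:   ", cross_modal_contrastive_loss(visual, audio))
    print("mismatched loss:", cross_modal_contrastive_loss(visual, audio[::-1]))
    print("nearest to video 2:", retrieve_similar(visual[2], visual, k=3))
```

In this toy setup the loss is lower when each video's audio embedding sits close to its own visual embedding than when the pairs are mismatched, which is the behavior a modality-aligning objective is meant to reward.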

Notes

+ Publication ID: hslu_82942
+ Type of contribution: Master's / licentiate / diploma thesis
+ Name of university / institution incl. location: Lucerne University of Applied Sciences and Arts, Lucerne School of Computer Science and Information Technology, Rotkreuz
+ Country of university / institution: Switzerland
+ Language: English
+ Last updated: 2021-04-08 16:43:49

Files

MSE_Thesis_MarcBravin_VideoEmbeddings_Publish.pdf

Files (7.5 MB)