Created by Serkan Sulun (serkan.sulun@inesctec.pt)

You can use Pickle to open the .pkl files as follows:

    import pickle

    with open('ekman6_features.pkl', 'rb') as file:
        ekman_data = pickle.load(file)

The resulting data has the following structure:

    {
        'samples': [
            {
                'features': {
                    'asr_sentiment': numpy.array,
                    'beats': numpy.array,
                    'clip': numpy.array,
                    'face_emotion': numpy.array,
                    'ocr_sentiment': numpy.array,
                },
                'label': numpy.array,
                'video_path': str,
            },
            (... and other videos)
        ],
        'stats': {
            'asr_sentiment': {
                'mean': float,
                'std': float,
                'min': float,
                'max': float,
            },
            (... and other features)
        },
        'idx_to_label': {0: 'anger', 1: 'disgust', 2: 'fear', 3: 'joy', 4: 'sadness', 5: 'surprise'}
    }

Features explained:

asr_sentiment: The video's audio is fed into the Whisper automatic speech recognizer (ASR). The resulting text is fed into a text sentiment classifier.
beats: The video's audio is fed into the BEATs audio classifier.
clip: Frames sampled at 1 FPS are fed into the CLIP image encoder.
face_emotion: Frames sampled at 1 FPS are fed into a YOLOv5 face detector, then the detected faces are fed into a Vision Transformer (ViT) facial expression classifier.
ocr_sentiment: Frames sampled at 1 FPS are fed into the Paddle optical character recognizer (OCR). The resulting text is fed into a text sentiment classifier.

All features are taken from the layer before the classification layer; they are also called "activations".

stats: Statistics for each feature, computed over the entire dataset. Since the ranges of the features differ widely, min-max normalization is strongly recommended.

label: One-hot label vector for each video.

idx_to_label: Maps indices to emotion categories, to make sense of the one-hot representation.

ekman6_blacklist.txt explained:

I went through all the videos of the Ekman6 dataset and listed the ones that I personally feel are wrongly labeled. I think the videos were scraped not only by searching the category name as a keyword, but also by using other related words.
In the following explanations, I mark the search keywords by surrounding them with asterisks (*). Some wrong examples for each category are:

anger: A single person being *annoy*ing, but with no other person around to be annoyed or angry.
disgust: Flashing lights or rapid camera movement, presumably to induce *dizzy*ness or *nausea*. The category also contains videos found via *boredom* and *loathing*.
fear: Counter-*terror*ism, *underwater* footage, *9/11*, suspect *apprehension*.
joy: *Joy*rides (car driving), the music "Ode to *Joy*", people named *Joy*.
sadness: *pensive*
surprise: *distraction*, people doing impressive things labeled as *astonishing*.

By excluding these videos, the classification accuracy increased from 60% to 67%. Granted, this is not a fair comparison, since videos are excluded from both the training and testing sets.

My personal opinion on VideoEmotion-8:

It has 8 categories, 6 of which are included in Ekman-6. For these 6 categories, Ekman-6 contains all the videos from VideoEmotion-8 and more. The remaining two categories in VideoEmotion-8 are "anticipation" and "trust". After viewing some videos from these categories, I personally don't think they reflect the emotions well, so my personal choice is to use Ekman-6 rather than VideoEmotion-8 in my research. However, VideoEmotion-8 is frequently used in the literature, and the choice belongs to the user.

Raw videos can be found at: https://yanweifu.github.io/Dataset.html

For Ekman-6, the surprise category (surprise.zip) is badly compressed. I managed to extract it somehow with the help of ChatGPT, but I don't remember exactly how; I think it was using the program 7z on Linux, with some special flags.

If you use this work in your research, please cite our paper using the following Bibtex:

    @article{SULUN2024125209,
        title = {Movie trailer genre classification using multimodal pretrained features},
        author = {Serkan Sulun and Paula Viana and Matthew E.P. Davies},
        journal = {Expert Systems with Applications},
        volume = {258},
        pages = {125209},
        year = {2024},
        issn = {0957-4174},
        doi = {https://doi.org/10.1016/j.eswa.2024.125209},
        url = {https://www.sciencedirect.com/science/article/pii/S0957417424020761},
    }
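Finally, a minimal sketch of how the pieces described above could be used together: blacklist filtering, the recommended min-max normalization via 'stats', and decoding the one-hot 'label' with 'idx_to_label'. The helper names and the tiny synthetic sample below are my own illustration, not part of the dataset; it also assumes that blacklist entries match the 'video_path' field of the samples.

    import numpy as np

    def minmax_normalize(x, feature_stats):
        """Scale a feature array to [0, 1] using the dataset-wide min/max."""
        return (x - feature_stats['min']) / (feature_stats['max'] - feature_stats['min'])

    def decode_label(one_hot, idx_to_label):
        """Convert a one-hot label vector to its emotion name."""
        return idx_to_label[int(np.argmax(one_hot))]

    def filter_blacklist(samples, blacklist):
        """Drop samples whose video path appears in the blacklist."""
        return [s for s in samples if s['video_path'] not in blacklist]

    # Tiny synthetic example mirroring the structure described above.
    idx_to_label = {0: 'anger', 1: 'disgust', 2: 'fear',
                    3: 'joy', 4: 'sadness', 5: 'surprise'}
    stats = {'clip': {'mean': 0.5, 'std': 1.0, 'min': -2.0, 'max': 2.0}}
    samples = [
        {'features': {'clip': np.array([0.0, 2.0, -2.0])},
         'label': np.eye(6)[3],  # one-hot for 'joy'
         'video_path': 'joy/001.mp4'},
        {'features': {'clip': np.array([1.0, 1.0, 1.0])},
         'label': np.eye(6)[0],
         'video_path': 'anger/bad_video.mp4'},
    ]
    blacklist = {'anger/bad_video.mp4'}  # e.g. read from ekman6_blacklist.txt

    kept = filter_blacklist(samples, blacklist)
    normalized = minmax_normalize(kept[0]['features']['clip'], stats['clip'])
    emotion = decode_label(kept[0]['label'], idx_to_label)
    # one sample kept; 'clip' feature scaled to [0, 1]; emotion decoded as 'joy'

With the real data, `samples` and `stats` would come from `ekman_data['samples']` and `ekman_data['stats']`, and the blacklist from reading ekman6_blacklist.txt line by line.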