Toward Advancing Emotion Recognition in LLMs: A Comparative Study of Prompt Strategies, Few-Shot Learning, and Model Ensembling

Park, Jungjoon; Mashayekh Esfahan, Nikoo; Narayan, Arasu

doi:10.5281/zenodo.15126322

Published April 28, 2025 | Version 1.0

Publication Open

1. Neuromatch Academy, Neuromatch, Inc.
2. Oracle Healthcare, Washington, United States
3. Northeastern University, Washington, United States

Emotion recognition plays a critical role in enhancing human–computer interaction. This study benchmarks the performance of commercial (GPT-4o, Claude 3.5 Sonnet) and open-source (Llama 3.1, Qwen 2.5, Mistral, Gemma 3) large language models (LLMs) on the ISEAR dataset using five prompt strategies: no persona, baseline persona, zero-shot, few-shot, and retrieval-augmented generation (RAG). Structured prompting consistently improved classification accuracy, particularly for smaller open-source models, while commercial models remained robust even under minimal prompt settings. Challenging emotions such as guilt, shame, and disgust showed lower classification performance across models. Model ensembling further improved results, with the Qwen 2.5 14B ensemble achieving the highest macro-F1 score of 78.4%. These results highlight the effectiveness of structured prompt design and model aggregation in optimizing emotion recognition. Future directions include integrating dimensional emotion frameworks and multimodal signals to build more context-sensitive and resilient systems.
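As a rough illustration of the two quantities the abstract reports, the sketch below combines predictions from several runs by majority vote and scores the result with macro-F1. The abstract does not say how the ensemble was built, so majority voting is an assumption here, and the function names, the tie-breaking rule, and the toy labels are all hypothetical rather than taken from the authors' code.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several runs' label lists into one label per sample.

    `predictions` is a list of equal-length label lists, one per run.
    Ties go to the label from the earliest run (an arbitrary, assumed rule).
    """
    ensembled = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # First vote (in run order) that reaches the top count wins.
        ensembled.append(next(v for v in votes if counts[v] == best))
    return ensembled


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores."""
    if labels is None:
        labels = sorted(set(y_true))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (invented labels, not ISEAR data).
y_true = ["joy", "guilt", "shame", "anger"]
run_a = ["joy", "shame", "shame", "anger"]
run_b = ["joy", "guilt", "guilt", "anger"]
run_c = ["joy", "guilt", "shame", "fear"]

ensembled = majority_vote([run_a, run_b, run_c])
print(macro_f1(y_true, run_a), macro_f1(y_true, ensembled))
```

In this toy case each run makes one error on a different sample, so the vote recovers every label. Macro-F1 averages over classes rather than samples, which is why weak classes such as guilt or shame pull the score down regardless of how many samples they cover.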

Files

Files (27.0 MB)

Name	MD5	Size
ISP2025-Emotion-Recognition-LLMs-code.zip	585f4fc9865192b0c2010e07190bfb6b	9.5 MB
ISP2025-Emotion-Recognition-LLMs-CRedit.csv	873194d5ea90ba9313fed48c63004d38	1.1 kB
ISP2025-Emotion-Recognition-LLMs-figure.png	a44477911eb8ff2a83f2bf7af3238f47	6.8 MB
ISP2025-Emotion-Recognition-LLMs-manuscript.pdf	6f3e1d52b2333bfd9f74918c0f323cf0	1.6 MB
ISP2025-Emotion-Recognition-LLMs-summary-public.png	1682a4c69c2024f10f055db6aaa9022f	2.4 MB
ISP2025-Emotion-Recognition-LLMs-supp-figure.png	80ee205d98858671193a54366cec5cef	6.0 MB
ISP2025-Emotion-Recognition-LLMs-supp-material.pdf	72744d2405f5b12baeaea8fff67a9a76	819.1 kB

Additional details

Is referenced by: Presentation: https://youtu.be/M7DXlHnzP_k?feature=shared (URL)

Neuromatch
Impact Scholars Program

Submitted: 2025-04-28

Programming language: Python

Annepaka, Y., & Pakray, P. (2025). Large language models: A survey of their development, capabilities, and applications. Knowledge and Information Systems, 67(3), 2967–3022. https://doi.org/10.1007/s10115-024-02310-4
Asghar, M. Z., Khan, A., Bibi, A., Kundi, F. M., & Ahmad, H. (2017). Sentence-Level Emotion Detection Framework Using Rule-Based Classification. Cognitive Computation, 9(6), 868–894. https://doi.org/10.1007/s12559-017-9503-3
Basile, A., Pérez-Torró, G., & Franco-Salvador, M. (2021). Probabilistic Ensembles of Zero- and Few-Shot Learning Models for Emotion Classification. In R. Mitkov & G. Angelova (Eds.), Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021) (pp. 128–137). INCOMA Ltd. https://aclanthology.org/2021.ranlp-1.16/
Berka, P. (2020). Sentiment analysis using rule-based and case-based reasoning. Journal of Intelligent Information Systems, 55(1), 51–66. https://doi.org/10.1007/s10844-019-00591-8
Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., & Liu, Z. (2024). BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation (No. arXiv:2402.03216). arXiv. https://doi.org/10.48550/arXiv.2402.03216
Esfahani, S. H. N., & Adda, M. (2024). Classical Machine Learning and Large Models for Text-Based Emotion Recognition. Procedia Computer Science, 241, 77–84. https://doi.org/10.1016/j.procs.2024.08.013
Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., … Ma, Z. (2024). The Llama 3 Herd of Models (No. arXiv:2407.21783). arXiv. https://doi.org/10.48550/arXiv.2407.21783
Hung, L. P., & Alias, S. (2023). Beyond Sentiment Analysis: A Review of Recent Trends in Text Based Sentiment Analysis and Emotion Detection. Journal of Advanced Computational Intelligence and Intelligent Informatics, 27(1), 84–95. https://doi.org/10.20965/jaciii.2023.p0084
Lefter, I., Rook, L., & Chaspari, T. (2024). Editorial: Multimodal interaction technologies for mental well-being. Frontiers in Computer Science, 6. https://doi.org/10.3389/fcomp.2024.1412727
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2021). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (No. arXiv:2005.11401). arXiv. https://doi.org/10.48550/arXiv.2005.11401
OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., … Zoph, B. (2024). GPT-4 Technical Report (No. arXiv:2303.08774). arXiv. https://doi.org/10.48550/arXiv.2303.08774
Plaza-del-Arco, F. M., Martín-Valdivia, M.-T., & Klinger, R. (2022). Natural Language Inference Prompts for Zero-shot Emotion Classification in Text across Corpora. In N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, & S.-H. Na (Eds.), Proceedings of the 29th International Conference on Computational Linguistics (pp. 6805–6817). International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.592/
Scherer, K. R., & Wallbott, H. G. (1994). Evidence for universality and cultural variation of differential emotion response patterning. Journal of Personality and Social Psychology, 66(2), 310–328. https://doi.org/10.1037/0022-3514.66.2.310
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models (No. arXiv:2302.13971). arXiv. https://doi.org/10.48550/arXiv.2302.13971
Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., … Fan, Z. (2024). Qwen2 Technical Report (No. arXiv:2407.10671). arXiv. https://doi.org/10.48550/arXiv.2407.10671
Yin, W., Hay, J., & Roth, D. (2019). Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach (No. arXiv:1909.00161). arXiv. https://doi.org/10.48550/arXiv.1909.00161
