Toward Advancing Emotion Recognition in LLMs: A Comparative Study of Prompt Strategies, Few-Shot Learning, and Model Ensembling
Authors/Creators
- 1. Neuromatch Academy, Neuromatch, Inc.
- 2. Oracle Healthcare, Washington, United States
- 3. Northeastern University, Washington, United States
Description
Emotion recognition plays a critical role in enhancing human–computer interactions. This study benchmarks the performance of commercial (GPT-4o, Claude 3.5 Sonnet) and open-source (Llama 3.1, Qwen 2.5, Mistral, Gemma 3) large language models (LLMs) on the ISEAR dataset using five prompt strategies: no persona, baseline persona, zero-shot, few-shot, and retrieval-augmented generation (RAG). Structured prompting consistently improved classification accuracy, particularly for smaller open-source models, while commercial models remained robust even under minimal prompt settings. Challenging emotions such as guilt, shame, and disgust exhibited lower classification performance across models. Model ensembling further enhanced results, with the Qwen 2.5 14B ensemble achieving the highest macro-F1 score of 78.4%. These results highlight the effectiveness of structured prompt design and model aggregation in optimizing emotion recognition. Future directions include integrating dimensional emotion frameworks and multimodal signals to build more context-sensitive and resilient systems.
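As a rough illustration of the two quantities the description relies on, the sketch below combines several sets of predicted labels by majority vote and scores the result with macro-F1 (the unweighted mean of per-class F1). The description does not say how the ensemble was aggregated, so the voting rule, the tie-breaking choice, the function names `majority_vote` and `macro_f1`, and the toy labels are all assumptions made for this example and are not taken from the released code.

```python
from collections import Counter


def majority_vote(runs):
    """Combine per-run label lists into one label per sample.

    Assumed aggregation rule: the most frequent label wins; on a tie,
    the tied label that appears first across the runs is kept.
    """
    combined = []
    for votes in zip(*runs):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 over the given labels.

    If labels is None, every label seen in y_true or y_pred is used.
    """
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data only (hypothetical labels, not results from the study).
y_true = ["joy", "guilt", "shame", "anger"]
runs = [
    ["joy", "shame", "shame", "anger"],
    ["joy", "guilt", "guilt", "anger"],
    ["joy", "guilt", "shame", "fear"],
]
ensemble = majority_vote(runs)
single_run_score = macro_f1(y_true, runs[0])
ensemble_score = macro_f1(y_true, ensemble)
```

In this toy case each individual run makes one mistake, but the mistakes fall on different samples, so the vote recovers every label. This is the kind of effect aggregation is meant to exploit, although real gains depend on how correlated the errors are.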
Files
ISP2025-Emotion-Recognition-LLMs-code.zip
(27.0 MB)
| MD5 checksum | Size |
|---|---|
| md5:585f4fc9865192b0c2010e07190bfb6b | 9.5 MB |
| md5:873194d5ea90ba9313fed48c63004d38 | 1.1 kB |
| md5:a44477911eb8ff2a83f2bf7af3238f47 | 6.8 MB |
| md5:6f3e1d52b2333bfd9f74918c0f323cf0 | 1.6 MB |
| md5:1682a4c69c2024f10f055db6aaa9022f | 2.4 MB |
| md5:80ee205d98858671193a54366cec5cef | 6.0 MB |
| md5:72744d2405f5b12baeaea8fff67a9a76 | 819.1 kB |
Additional details
Related works
- Is referenced by
- Presentation: https://youtu.be/M7DXlHnzP_k?feature=shared (URL)
Funding
- Neuromatch
- Impact Scholars Program
Dates
- Submitted
- 2025-04-28
Software
- Programming language
- Python
References
- Annepaka, Y., & Pakray, P. (2025). Large language models: A survey of their development, capabilities, and applications. Knowledge and Information Systems, 67(3), 2967–3022. https://doi.org/10.1007/s10115-024-02310-4
- Asghar, M. Z., Khan, A., Bibi, A., Kundi, F. M., & Ahmad, H. (2017). Sentence-Level Emotion Detection Framework Using Rule-Based Classification. Cognitive Computation, 9(6), 868–894. https://doi.org/10.1007/s12559-017-9503-3
- Basile, A., Pérez-Torró, G., & Franco-Salvador, M. (2021). Probabilistic Ensembles of Zero- and Few-Shot Learning Models for Emotion Classification. In R. Mitkov & G. Angelova (Eds.), Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021) (pp. 128–137). INCOMA Ltd. https://aclanthology.org/2021.ranlp-1.16/
- Berka, P. (2020). Sentiment analysis using rule-based and case-based reasoning. Journal of Intelligent Information Systems, 55(1), 51–66. https://doi.org/10.1007/s10844-019-00591-8
- Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., & Liu, Z. (2024). BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation (No. arXiv:2402.03216). arXiv. https://doi.org/10.48550/arXiv.2402.03216
- Esfahani, S. H. N., & Adda, M. (2024). Classical Machine Learning and Large Models for Text-Based Emotion Recognition. Procedia Computer Science, 241, 77–84. https://doi.org/10.1016/j.procs.2024.08.013
- Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., … Ma, Z. (2024). The Llama 3 Herd of Models (No. arXiv:2407.21783). arXiv. https://doi.org/10.48550/arXiv.2407.21783
- Hung, L. P., & Alias, S. (2023). Beyond Sentiment Analysis: A Review of Recent Trends in Text Based Sentiment Analysis and Emotion Detection. Journal of Advanced Computational Intelligence and Intelligent Informatics, 27(1), 84–95. https://doi.org/10.20965/jaciii.2023.p0084
- Lefter, I., Rook, L., & Chaspari, T. (2024). Editorial: Multimodal interaction technologies for mental well-being. Frontiers in Computer Science, 6. https://doi.org/10.3389/fcomp.2024.1412727
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2021). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (No. arXiv:2005.11401). arXiv. https://doi.org/10.48550/arXiv.2005.11401
- OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., … Zoph, B. (2024). GPT-4 Technical Report (No. arXiv:2303.08774). arXiv. https://doi.org/10.48550/arXiv.2303.08774
- Plaza-del-Arco, F. M., Martín-Valdivia, M. -T., & Klinger, R. (2022). Natural Language Inference Prompts for Zero-shot Emotion Classification in Text across Corpora. In N. Calzolari, C. -R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K. -S. Choi, P. -M. Ryu, H. -H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, & S. -H. Na (Eds.), Proceedings of the 29th International Conference on Computational Linguistics (pp. 6805–6817). International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.592/
- Scherer, K. R., & Wallbott, H. G. (1994). Evidence for universality and cultural variation of differential emotion response patterning. Journal of Personality and Social Psychology, 66(2), 310–328. https://doi.org/10.1037/0022-3514.66.2.310
- Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. -A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models (No. arXiv:2302.13971). arXiv. https://doi.org/10.48550/arXiv.2302.13971
- Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., … Fan, Z. (2024). Qwen2 Technical Report (No. arXiv:2407.10671). arXiv. https://doi.org/10.48550/arXiv.2407.10671
- Yin, W., Hay, J., & Roth, D. (2019). Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach (No. arXiv:1909.00161). arXiv. https://doi.org/10.48550/arXiv.1909.00161