Published May 1, 2026 | Version v1
Preprint Open

Intelligent Meeting Analytics: Real-Time Insight Extraction from Microsoft Teams Conversations Using Transformer-Based NLP and WebSocket Streaming

  • 1. Arab International University

Description

The proliferation of virtual collaboration platforms has fundamentally altered how professional teams communicate, coordinate, and make decisions. Despite this shift, a persistent challenge remains: valuable insights generated during meetings (action items, key decisions, open questions, and discussion threads) are routinely lost or require significant post-meeting effort to retrieve and document. This project presents MeetingMind, an AI-powered real-time meeting analytics extension for Microsoft Teams that addresses this gap by automatically extracting structured intelligence from live audio streams. The system uses a speech-to-text pipeline based on OpenAI Whisper for audio transcription, followed by a suite of fine-tuned transformer models from HuggingFace for downstream NLP tasks: extractive and abstractive summarization, action item detection with assignee and deadline recognition, intent classification, and topic segmentation. A Python FastAPI backend coordinates real-time data flow over WebSocket connections, providing low-latency processing and delivery of insights to a React-based dashboard. The extension is deployed through the Microsoft Teams Toolkit, so it integrates with the existing collaboration environment. Evaluation of the system against benchmark datasets demonstrates strong performance: the summarization module achieves a ROUGE-2 score of 0.41 and a ROUGE-L score of 0.61, while the action item extraction pipeline achieves an F1-score of 0.78 on a curated meeting transcript dataset. Topic segmentation using a fine-tuned BERT model reaches 84.6% segmentation accuracy. The post-meeting dashboard, tested under simulated meeting loads of up to 90 minutes, processes and renders all extracted insights within an acceptable latency threshold. Overall, MeetingMind demonstrates that combining real-time audio processing with transformer-based NLP can substantially reduce the cognitive overhead of meeting documentation and improve team productivity.
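
For readers unfamiliar with the metrics reported above, the following is a minimal, self-contained sketch of how ROUGE-2, ROUGE-L, and an extraction F1-score can be computed. It is illustrative only. The description does not state which ROUGE implementation, variant (recall or F-measure), or tokenization was used, so the whitespace tokenization, the F-measure variant, the exact-match criterion for action items, and all function names here are assumptions rather than the project's evaluation code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(overlap, candidate_total, reference_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=2):
    """ROUGE-N F-measure on lowercased, whitespace-split text (assumed tokenization)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f_measure(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F-measure based on the longest common subsequence of tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return f_measure(lcs_length(cand, ref), len(cand), len(ref))


def extraction_f1(predicted, gold):
    """F1 over sets of extracted items, counting only exact matches (an assumption)."""
    predicted, gold = set(predicted), set(gold)
    return f_measure(len(predicted & gold), len(predicted), len(gold))


if __name__ == "__main__":
    # Hypothetical summary pair, not taken from the evaluation data.
    system = "alice will send the report by friday"
    reference = "alice will send the budget report by friday"
    print(round(rouge_n(system, reference, n=2), 3))
    print(round(rouge_l(system, reference), 3))
```

On the hypothetical pair in the usage lines, the sketch gives a ROUGE-2 of about 0.77 and a ROUGE-L of about 0.93. Published evaluations usually rely on an established ROUGE package with stemming and sentence splitting, so scores from this sketch are not directly comparable to the figures reported above.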

This work was conducted at Arab International University (AIU), Syria. The official website of the university is: https://www.aiu.edu.sy

Files

Ammar_Alzoubi_APR_2026.pdf (507.3 kB)
md5:5550589031d429d53e53ac562588b254
