Published October 15, 2025 | Version v1
Publication Open

Gesture Talk: An Integrated Multimodal AI Assistant (Gesture, Voice, and Conversational Intelligence)

Description

Emerging trends in Human‑Computer Interaction (HCI) emphasize multimodal input systems that combine visual gestures, voice commands, and dialogue-based AI. This work presents a Python‑based assistant that integrates real‑time hand‑gesture recognition with MediaPipe and OpenCV, voice automation through the SpeechRecognition library and system subprocess calls, and a generative‑AI chatbot powered by Google's Gemini API. Inspired by prior multimodal studies and systems combining speech and gestures, the assistant provides real‑time control of volume, brightness, media playback, applications, and files, alongside AI chat, with all modules running concurrently via multithreading to preserve responsiveness. Evaluation shows high recognition accuracy and low latency, indicating promise for intuitive, accessible multimodal interfaces.
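The abstract states that the gesture, voice, and chat pipelines run concurrently via multithreading. A minimal sketch of that producer/consumer architecture follows, using only the Python standard library; the worker callables here are placeholders (a real system would call MediaPipe/OpenCV, SpeechRecognition, and the Gemini API inside each loop), and `run_assistant` and its parameters are illustrative names, not from the paper:

```python
import queue
import threading

def run_assistant(workers, n_events=3):
    """Run each modality worker in its own thread and dispatch events
    from a shared queue, as a sketch of concurrent multimodal input.

    workers: dict mapping a modality name to a zero-argument callable
    that returns the next recognized event (or None when idle).
    """
    events = queue.Queue()
    stop = threading.Event()

    def loop(name, produce):
        # Each modality polls its recognizer; real loops would block on
        # camera frames or microphone audio instead of a timed wait.
        while not stop.is_set():
            evt = produce()
            if evt is not None:
                events.put((name, evt))
            stop.wait(0.01)  # yield to the other threads

    threads = [threading.Thread(target=loop, args=(n, f), daemon=True)
               for n, f in workers.items()]
    for t in threads:
        t.start()

    # Main thread consumes events (volume/brightness/media commands, etc.)
    handled = []
    while len(handled) < n_events:
        handled.append(events.get(timeout=2))

    stop.set()
    for t in threads:
        t.join(timeout=1)
    return handled
```

Because each recognizer blocks independently, a slow speech-to-text call cannot stall gesture tracking; the shared queue serializes commands so the dispatcher applies them one at a time.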
 

Files

IJSRED-V8I5P181.pdf

94.8 kB
md5:28892a443df3c6d47d723d7352b54e9e