Published October 15, 2025 | Version v1
Gesture Talk: An Integrated Multimodal AI Assistant (Gesture, Voice, and Conversational Intelligence)
Description
Emerging trends in Human‑Computer Interaction (HCI) emphasize multimodal input systems that combine visual gestures, voice commands, and dialogue-based AI. This work presents a Python‑based assistant that integrates hand-gesture recognition via MediaPipe and OpenCV, voice automation through SpeechRecognition and system subprocesses, and a generative AI chatbot powered by Google's Gemini API. Inspired by prior multimodal studies and systems that combine speech and gestures, the assistant enables real‑time control of volume, brightness, media playback, applications, files, and AI chat, with all modalities running concurrently via multithreading for responsiveness. Evaluation demonstrates high accuracy and low latency, showing promise for intuitive, accessible multimodal interfaces.
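The concurrency design described above can be sketched in plain Python. This is a minimal, hypothetical illustration (not the authors' code): each modality runs in its own thread and posts events to a shared queue, so no single input loop blocks the others. The worker bodies are placeholders where the real system would call MediaPipe/OpenCV, SpeechRecognition, and the Gemini API.

```python
import threading
import queue
import time

# Shared event queue: each modality thread posts recognized commands here.
events: "queue.Queue[str]" = queue.Queue()

def gesture_worker(stop: threading.Event) -> None:
    # Placeholder for the MediaPipe/OpenCV gesture-recognition loop.
    while not stop.is_set():
        events.put("gesture:volume_up")  # e.g. a recognized hand gesture
        time.sleep(0.01)

def voice_worker(stop: threading.Event) -> None:
    # Placeholder for the SpeechRecognition command loop.
    while not stop.is_set():
        events.put("voice:open_browser")  # e.g. a transcribed command
        time.sleep(0.01)

def run_assistant(duration: float = 0.05) -> list:
    """Run both modality threads concurrently for `duration` seconds,
    then return all events collected while they ran."""
    stop = threading.Event()
    threads = [
        threading.Thread(target=gesture_worker, args=(stop,), daemon=True),
        threading.Thread(target=voice_worker, args=(stop,), daemon=True),
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    collected = []
    while not events.empty():
        collected.append(events.get())
    return collected

if __name__ == "__main__":
    print(run_assistant())
```

A dispatcher on the main thread would consume this queue and trigger the corresponding system action (volume, brightness, app launch, or a Gemini chat turn), which is what keeps the interface responsive under simultaneous inputs.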
Files

| Name | Size | MD5 |
|---|---|---|
| IJSRED-V8I5P181.pdf | 94.8 kB | 28892a443df3c6d47d723d7352b54e9e |