Published October 15, 2025 | Version v1
Publication Open

Gesture Talk: An Integrated Multimodal AI Assistant (Gesture, Voice, and Conversational Intelligence)

Description

Emerging trends in Human‑Computer Interaction (HCI) emphasize multimodal input systems that combine visual gestures, voice commands, and dialogue-based AI. This work presents a Python‑based assistant that integrates real‑time hand‑gesture recognition with MediaPipe and OpenCV, voice automation through the SpeechRecognition library and system subprocess calls, and a generative‑AI chatbot powered by Google's Gemini API. Inspired by prior multimodal studies and systems combining speech and gestures, the assistant provides real‑time control of volume, brightness, media playback, applications, and files, alongside AI chat, with all modules running concurrently via multithreading to preserve responsiveness. Evaluation shows high recognition accuracy and low latency, indicating promise for intuitive, accessible multimodal interfaces.
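The abstract states that the gesture, voice, and chat pipelines run concurrently via multithreading. A minimal sketch of that producer/consumer architecture follows, using only the Python standard library; the worker callables here are placeholders (a real system would call MediaPipe/OpenCV, SpeechRecognition, and the Gemini API inside each loop), and `run_assistant` and its parameters are illustrative names, not from the paper:

```python
import queue
import threading

def run_assistant(workers, n_events=3):
    """Run each modality worker in its own thread and dispatch events
    from a shared queue, as a sketch of concurrent multimodal input.

    workers: dict mapping a modality name to a zero-argument callable
    that returns the next recognized event (or None when idle).
    """
    events = queue.Queue()
    stop = threading.Event()

    def loop(name, produce):
        # Each modality polls its recognizer; real loops would block on
        # camera frames or microphone audio instead of a timed wait.
        while not stop.is_set():
            evt = produce()
            if evt is not None:
                events.put((name, evt))
            stop.wait(0.01)  # yield to the other threads

    threads = [threading.Thread(target=loop, args=(n, f), daemon=True)
               for n, f in workers.items()]
    for t in threads:
        t.start()

    # Main thread consumes events (volume/brightness/media commands, etc.)
    handled = []
    while len(handled) < n_events:
        handled.append(events.get(timeout=2))

    stop.set()
    for t in threads:
        t.join(timeout=1)
    return handled
```

Because each recognizer blocks independently, a slow speech-to-text call cannot stall gesture tracking; the shared queue serializes commands so the dispatcher applies them one at a time.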
 

Files

IJSRED-V8I5P181.pdf

94.8 kB
md5:28892a443df3c6d47d723d7352b54e9e