Published February 26, 2025 | Version v1

VLM Action Parser Library

  • 1. Jožef Stefan Institute

Description

Module for prediction and execution of robotic skills using vision-language models (VLMs).

Initial textual instructions (e.g. taskboard completion steps), along with an optional auxiliary image (e.g. one depicting the taskboard components), are processed into a robot-executable task list. The module relies on a skill library consisting of motion primitives for executing tasks (e.g. the steps in the taskboard benchmark).
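
The mapping from a model reply to an executable task list can be illustrated with a minimal sketch. It is not the module's actual code: the skill names, the JSON reply format, and the helpers `parse_task_list` and `execute` are all assumptions made for illustration, and the model reply is a canned string rather than a real VLM call.

```python
import json
from typing import Callable

# Hypothetical motion primitives. Real ones would command the robot;
# these only return a log string so the sketch runs anywhere.
def press_button(target: str) -> str:
    return f"pressed {target}"

def open_door(target: str) -> str:
    return f"opened {target}"

# Hypothetical skill library: skill name -> motion primitive.
SKILL_LIBRARY: dict[str, Callable[[str], str]] = {
    "press_button": press_button,
    "open_door": open_door,
}

def parse_task_list(vlm_reply: str) -> list[tuple[str, str]]:
    """Turn a JSON reply into (skill, target) pairs, rejecting unknown skills."""
    tasks = []
    for step in json.loads(vlm_reply):
        skill, target = step["skill"], step["target"]
        if skill not in SKILL_LIBRARY:
            raise ValueError(f"unknown skill: {skill}")
        tasks.append((skill, target))
    return tasks

def execute(tasks: list[tuple[str, str]]) -> list[str]:
    """Run each task through its primitive and collect the log lines."""
    return [SKILL_LIBRARY[skill](target) for skill, target in tasks]

# Canned reply standing in for the output of a VLM (assumed format).
reply = (
    '[{"skill": "press_button", "target": "blue button"},'
    ' {"skill": "open_door", "target": "door"}]'
)
tasks = parse_task_list(reply)
log = execute(tasks)
```

Validating every step against the skill library before execution means a hallucinated skill name fails early instead of reaching the robot.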

The module can also be queried to determine whether an action succeeded (e.g. whether the door has been opened).
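
A success query of this kind can be reduced to a yes/no question whose answer is parsed into a boolean. The sketch below assumes that approach; `build_success_prompt` and `parse_success` are hypothetical helpers, not functions of the module, and the prompt wording is illustrative only.

```python
def build_success_prompt(condition: str) -> str:
    """Phrase a success condition as a one-word yes/no question."""
    return (
        "Look at the attached image and answer with a single word, "
        f"YES or NO: {condition}"
    )

def parse_success(reply: str) -> bool:
    """Map a free-text reply to True/False; raise if it is neither."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!") if words else ""
    if first == "yes":
        return True
    if first == "no":
        return False
    raise ValueError(f"ambiguous reply: {reply!r}")

prompt = build_success_prompt("has the door been opened?")
```

Raising on an ambiguous reply, rather than defaulting to failure, lets the caller decide whether to re-query the model or stop the task.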

Internally, it uses LangChain, so the module can connect to different VLMs (local models or the OpenAI API).
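
The benefit of a common model interface can be sketched without any external dependency. LangChain chat models expose an `invoke` method whose reply carries its text in `content`; the stub classes, the `BACKENDS` registry, and the helpers `make_model` and `ask` below are assumptions that only mimic that shape and are not taken from the module.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Message:
    """Stand-in for a chat reply carrying its text in `content`."""
    content: str


class ChatModel(Protocol):
    """Minimal interface the calling code depends on."""
    def invoke(self, prompt: str) -> Message: ...


class StubLocalModel:
    """Placeholder for a locally hosted model."""
    def invoke(self, prompt: str) -> Message:
        return Message(content=f"[local] {prompt}")


class StubRemoteModel:
    """Placeholder for a hosted API model."""
    def invoke(self, prompt: str) -> Message:
        return Message(content=f"[remote] {prompt}")


# Hypothetical registry: backend name -> factory producing a model.
BACKENDS: dict[str, Callable[[], ChatModel]] = {
    "local": StubLocalModel,
    "remote": StubRemoteModel,
}


def make_model(name: str) -> ChatModel:
    """Build the backend registered under `name`."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name}") from None


def ask(model: ChatModel, prompt: str) -> str:
    """Query any backend through the shared interface."""
    return model.invoke(prompt).content
```

Because `ask` depends only on `invoke`, switching from a local model to a hosted one changes the name passed to `make_model` and nothing else.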

Files

vlm_action_parser-main.zip (642.1 kB)
md5:fc7ca107e73955c41eb5793d58c52f23

Additional details

Funding

European Commission
euROBIN - European ROBotics and AI Network (grant 101070596)

Software

Repository URL
https://repo.ijs.si/hcr/deep_learning/vlm_action_parser
Programming language
Python
Development Status
Active