Vision Language Models & Agentic Systems
Authors/Creators
Description
In these slide decks, we introduce Vision Language Models (VLMs), a form of Generative Artificial Intelligence that combines textual information and image data. On the one hand, VLMs can classify, interpret, and describe images; to retrieve the correct information from them, prompt engineering plays a key role. On the other hand, image-generation models can produce images from text alone, or from a combination of images and text. The second slide deck introduces agentic systems, computer systems that act on behalf of humans. Under the hood, large language models can call functions from a pre-defined list of available tools, or generate code. After function and/or code execution, these systems interpret the results and plan the next steps. As an example of ongoing standardization efforts in this domain, we learn about the Model Context Protocol. Both VLMs and agents bring new challenges to our society, which the materials also address as a starting point for discussion.
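To make the tool-calling loop described above more concrete, the following minimal Python sketch shows how such an agent loop might look. It is an illustration under simplifying assumptions, not the implementation used in the slides: the llm function is a stub standing in for a real language model, and all names (run_agent, get_weather, TOOLS) are hypothetical.

```python
# Minimal, hypothetical sketch of an agentic tool-calling loop.
# "llm" is a stub standing in for a real language model; all names are illustrative.

def get_weather(city: str) -> str:
    """Dummy tool: a real system would query an external weather service."""
    return f"It is sunny in {city}."

# Pre-defined list of tools the agent is allowed to call.
TOOLS = {"get_weather": get_weather}

def llm(messages):
    """Stub model: requests one tool call, then produces a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        # The model asks for a function call from the available tools.
        return {"tool_call": {"name": "get_weather", "arguments": {"city": "Berlin"}}}
    # After seeing the tool result, the model interprets it and answers.
    last_result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"answer": f"Based on the tool output: {last_result}"}

def run_agent(user_prompt: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = llm(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            # Execute the requested function from the pre-defined tool list...
            result = TOOLS[call["name"]](**call["arguments"])
            # ...and feed the result back so the model can plan the next step.
            messages.append({"role": "tool", "content": result})
        else:
            return reply["answer"]
    return "Step limit reached."

print(run_agent("What is the weather in Berlin?"))
```

Standardization efforts such as the Model Context Protocol define a common interface for exactly this kind of exchange between models and external tools.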