Published January 19, 2026 | Version v2
Presentation Open

Vision Language Models & Agentic Systems

Authors/Creators

  • 1. ROR icon Center for Scalable Data Analytics and Artificial Intelligence
  • 2. ROR icon Leipzig University

Description

In these slide decks, we introduce Vision Language Models (VLM), a form of Generative Artificial Intelligence that can combine textual information and image data. One the one hand these can classify, interpret and describe images. To retrieve the correct information from VLMs, prompt engineering plays a key-role. On the other hand, image-generation models can produce images from text alone, or by combining images and text. The second slide deck introduces agentic systems, as a form of computer system that acts on behalf of humans. Und the hood, large language models run that can call functions from a pre-defined list of available tools, or generate code. After function and/or code execution, these systems interpret the result and plan next steps. As an example of ongoing standardization efforts in this domain, we learn about the Model Context Protocol. Both, VLMs and agents, bring new challenges to our society which are also part of the materials to start a discussion.

Files

Vision-LMs.pdf

Files (75.1 MB)

Name Size Download all
md5:7d8001509ef1d43667a41dd25d4c0045
6.4 MB Preview Download
md5:52886864267052a05cab3d5b2ed3424f
20.3 MB Download
md5:edc6e8aabb8ced0ce79ca2c3207b2963
7.4 MB Preview Download
md5:9ca3ca490f5903f5d4f2b727c396e8aa
41.0 MB Download