Vision Language Models & Agentic Systems
Authors/Creators
Description
In these slide decks, we introduce Vision Language Models (VLMs), a form of Generative Artificial Intelligence that combines textual information and image data. On the one hand, VLMs can classify, interpret, and describe images; to retrieve the correct information from them, prompt engineering plays a key role. On the other hand, image-generation models can produce images from text alone, or from a combination of images and text. The second slide deck introduces agentic systems, computer systems that act on behalf of humans. Under the hood, large language models can call functions from a pre-defined list of available tools, or generate code. After function and/or code execution, these systems interpret the results and plan the next steps. As an example of ongoing standardization efforts in this domain, we learn about the Model Context Protocol. Both VLMs and agents bring new challenges to our society, which the materials also address as a starting point for discussion.
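To make the tool-calling loop described above more concrete, the following minimal Python sketch shows how such an agent loop might look. It is an illustration under simplifying assumptions, not the implementation used in the slides: the llm function is a stub standing in for a real language model, and all names (run_agent, get_weather, TOOLS) are hypothetical.

```python
# Minimal, hypothetical sketch of an agentic tool-calling loop.
# "llm" is a stub standing in for a real language model; all names are illustrative.

def get_weather(city: str) -> str:
    """Dummy tool: a real system would query an external weather service."""
    return f"It is sunny in {city}."

# Pre-defined list of tools the agent is allowed to call.
TOOLS = {"get_weather": get_weather}

def llm(messages):
    """Stub model: requests one tool call, then produces a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        # The model asks for a function call from the available tools.
        return {"tool_call": {"name": "get_weather", "arguments": {"city": "Berlin"}}}
    # After seeing the tool result, the model interprets it and answers.
    last_result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"answer": f"Based on the tool output: {last_result}"}

def run_agent(user_prompt: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = llm(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            # Execute the requested function from the pre-defined tool list...
            result = TOOLS[call["name"]](**call["arguments"])
            # ...and feed the result back so the model can plan the next step.
            messages.append({"role": "tool", "content": result})
        else:
            return reply["answer"]
    return "Step limit reached."

print(run_agent("What is the weather in Berlin?"))
```

Standardization efforts such as the Model Context Protocol define a common interface for exactly this kind of exchange between models and external tools.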