Published December 30, 2025 | Version 0.1.1
Software Open

CatLLM

  • 1. University of California, Berkeley

Description

CatLLM is a Python package for researchers who analyze text, images, and PDFs with large language and vision models. It provides a systematic interface to LLM capabilities for extracting categorical information from unstructured data. Output is returned in a consistent, reproducible format suited to statistical analysis, with optional CSV export for integration into research workflows.

Core Functionality:

Text Analysis Capabilities:

  • Extract multiple categories present within individual text responses

  • Identify the single most prominent category within individual text responses

  • Determine the most frequently occurring categories across entire text corpora
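
The three text tasks above differ mainly in how per-response labels are reduced. The following is a minimal sketch, assuming the model's reply has already been parsed into a list of category names for each response. The function names (`to_indicators`, `top_category`, `corpus_frequencies`) and the sample categories are hypothetical illustrations, not CatLLM's API.

```python
from collections import Counter

# Hypothetical category list, chosen only for illustration.
CATEGORIES = ["cost", "family", "job"]


def to_indicators(labels, categories=CATEGORIES):
    """Multi-category output: one 0/1 indicator per known category."""
    found = set(labels)
    return {c: int(c in found) for c in categories}


def top_category(labels):
    """Single-category output.

    Assumes the model lists the most prominent category first;
    returns None when no category was assigned.
    """
    return labels[0] if labels else None


def corpus_frequencies(all_labels, top_n=2):
    """Corpus-level output: the most common categories across responses.

    Each category is counted at most once per response.
    """
    counts = Counter(label for labels in all_labels for label in set(labels))
    return counts.most_common(top_n)


# Stand-in for parsed model output, one list per text response.
parsed = [["job", "cost"], ["family"], ["job"], []]
rows = [to_indicators(p) for p in parsed]
```

Each entry in `rows` has the same keys, so the rows can be stacked into a table with one indicator column per category.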

Image Analysis Capabilities:

  • Classify multiple categorical features present within individual images

  • Conduct open-ended feature identification within individual images

  • Generate quality scores for individual images using reference image comparisons

PDF Analysis Capabilities:

  • Classify multiple categories present within individual PDF pages
  • Process PDFs using image, text, or combined analysis modes
  • Extract structured data from documents with charts, tables, and figures
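
One way to picture the three processing modes is as a choice of what accompanies each page in the model request. The sketch below works under stated assumptions: the mode names `"image"`, `"text"`, and `"both"`, the function `build_page_payload`, and the payload layout are all hypothetical, chosen for illustration rather than taken from CatLLM.

```python
import base64

# Hypothetical mode names; the package's own names may differ.
VALID_MODES = ("image", "text", "both")


def build_page_payload(page_text, page_png, mode="both"):
    """Assemble the content sent to a model for one PDF page.

    page_text: text extracted from the page (str)
    page_png:  rendered page image as raw bytes
    mode:      which representation(s) of the page to include
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")

    payload = {"mode": mode}
    if mode in ("text", "both"):
        payload["text"] = page_text
    if mode in ("image", "both"):
        # Binary image data is base64-encoded so it can travel in JSON.
        payload["image_b64"] = base64.b64encode(page_png).decode("ascii")
    return payload


# Stand-in inputs; real pages would come from a PDF rendering library.
example = build_page_payload("Table 1: counts by year", b"\x89PNG", mode="both")
```

In general, a rendered image keeps the layout of charts and tables that plain text extraction tends to lose, while text alone is usually cheaper to send.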

Research-Oriented Design:
The package standardizes LLM interactions so that output formatting stays consistent across models and analysis tasks. All functions return structured data objects that are ready for statistical analysis without extensive post-processing. Results can be exported directly to CSV for use in statistical software, which shortens the path from qualitative data collection to quantitative analysis. By providing standardized interfaces and consistent output schemas regardless of the underlying model or the complexity of the task, this design addresses common reproducibility challenges in LLM-assisted research.
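
To illustrate what a consistent output schema can mean in practice, the sketch below coerces differently shaped model replies into one fixed set of columns and writes them as CSV. It uses only the Python standard library. The column names, the `normalize_reply` and `to_csv` helpers, and the sample replies are hypothetical and do not reflect CatLLM's actual functions or output format.

```python
import csv
import io
import json

# Hypothetical categories and column order, fixed in advance.
CATEGORIES = ["cost", "family", "job"]
FIELDNAMES = ["response_id", *CATEGORIES, "parse_ok"]


def normalize_reply(response_id, raw_reply):
    """Map one raw model reply onto the fixed schema.

    Unknown keys are dropped, missing categories default to 0, and
    replies that are not a JSON object are flagged with parse_ok = 0.
    """
    row = {"response_id": response_id, **{c: 0 for c in CATEGORIES}, "parse_ok": 1}
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        row["parse_ok"] = 0
        return row
    if not isinstance(data, dict):
        row["parse_ok"] = 0
        return row
    for c in CATEGORIES:
        # Only a JSON 1 or true counts as "category present".
        row[c] = 1 if data.get(c) in (1, True) else 0
    return row


def to_csv(rows):
    """Serialize rows with a stable header and column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Stand-ins for replies from different models.
replies = ['{"job": 1, "cost": true}', '{"family": 1, "extra": 1}', "not json"]
rows = [normalize_reply(i, r) for i, r in enumerate(replies, start=1)]
csv_text = to_csv(rows)
```

Because every row carries the same columns in the same order, the resulting file can be loaded into statistical software without per-model cleanup, and the `parse_ok` flag makes failed replies visible instead of silently dropping them.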

Files

Files (162.9 kB)

Checksum                                Size
md5:d5f1d32954662568ffc21f8b38f05010    394 Bytes
md5:3558b872e65d2cca011b42a563b6baab    384 Bytes
md5:87e724a78f661cee80d28d868bfb2244    9.9 kB
md5:1fffa0a3ef8d0833f1bdeab3e59698dc    52.5 kB
md5:8b175d8b38d7d6a55d52569d0ab1d05c    7.7 kB
md5:febd197a148d6d5ab565771739baedbe    2.3 kB
md5:210320e339ecb60d03d29cbf161e2b83    45.8 kB
md5:82fe91eda840179f1c772377720f062f    43.9 kB

Additional details

Software

Repository URL: https://github.com/chrissoria/cat-llm
Programming language: Python
Development Status: Active