Published August 28, 2025 | Version V1-Release-Zenodo
Software Open

dayanjan/PyZoBot-RAG: PyZoBot - A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation

  • 1. Virginia Commonwealth University

Description

PyZoBot v1.0 - Zenodo Release Notes

Overview

PyZoBot is an AI-driven platform that integrates Zotero reference management with advanced Retrieval-Augmented Generation (RAG) technology to address information overload in scientific research. This platform enables researchers to perform conversational information extraction and synthesis from their curated Zotero reference libraries.

Key Features

🚀 Four Implementation Variants

  • PyZoBot OpenAI: Standard RAG with OpenAI's GPT-3.5-Turbo and GPT-4o
  • PyZoBot OpenSource: Standard RAG with open-source LLMs (LLaMA 3.1, Mistral) via Ollama
  • PyZoBot GraphRAG OpenAI: Knowledge graph-enhanced RAG with OpenAI models
  • PyZoBot GraphRAG OpenSource: Knowledge graph-enhanced RAG with open-source models

📚 Document Processing Capabilities

  • Multiple Chunking Strategies:
    • Recursive chunking (default) - maintains context while splitting documents
    • Layout-aware chunking - preserves document structure and formatting
    • Semantic chunking - AI-driven segmentation based on meaning
  • Configurable Parameters: Adjustable chunk size, overlap, and retrieval settings
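The recursive strategy with configurable chunk size and overlap can be illustrated roughly as follows. This is a minimal standard-library sketch of the general technique, not PyZoBot's actual code; the function names (`recursive_split`, `merge`, `add_overlap`), the separator order, and the default sizes are assumptions made for illustration.

```python
def merge(pieces, chunk_size, sep):
    """Greedily join adjacent pieces with `sep` while staying within chunk_size."""
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters.

    Coarse separators (paragraphs) are tried first; finer ones (lines, words)
    are used only for parts that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            pieces = []
            for part in text.split(sep):
                pieces.extend(recursive_split(part, chunk_size, separators[i + 1:]))
            return merge(pieces, chunk_size, sep)
    # No separator applies: fall back to a hard character split.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def add_overlap(chunks, overlap):
    """Prefix each chunk with the last `overlap` characters of the previous chunk.

    Overlapped chunks can therefore be up to chunk_size + overlap characters long.
    """
    result = []
    for i, chunk in enumerate(chunks):
        prefix = chunks[i - 1][-overlap:] if i > 0 and overlap > 0 else ""
        result.append(prefix + chunk)
    return result
```

Larger chunks keep more context per retrieved passage, while a modest overlap reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.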

🔍 Advanced Retrieval Features

  • Vector Store Search: ChromaDB-powered similarity search with Maximal Marginal Relevance (MMR)
  • Knowledge Graph Visualization: Interactive exploration of entity relationships (GraphRAG versions)
  • Transparent Citations: Automatic in-text citations and reference generation
  • Source Tracking: Display of specific document chunks used for response generation
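To show what MMR-based retrieval does, the sketch below re-ranks candidate chunks so that each pick balances relevance to the query against similarity to chunks already chosen. It is a plain-Python illustration of the general MMR idea, not the ChromaDB or PyZoBot implementation; the names `cosine`, `mmr`, and `lambda_mult`, and the toy two-dimensional vectors in the usage note, are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize
    documents that resemble ones already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with a query `[1.0, 0.0]` and documents `[[0.9, 0.1], [0.9, 0.11], [0.6, -0.8]]`, pure relevance ranking (`lambda_mult=1.0`) picks the two near-duplicate documents, whereas `lambda_mult=0.5` picks the first document and then the dissimilar third one.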

💻 User-Friendly Interface

  • Intuitive Streamlit-based GUI requiring no coding expertise
  • Seamless Zotero integration via API
  • Support for both user and group libraries
  • Real-time processing status and feedback
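As an illustration of how user and group libraries differ at the API level, the sketch below builds (but does not send) a Zotero Web API v3 request for a library's items. The helper name `build_items_request` and the placeholder ID and key in the usage note are hypothetical, and PyZoBot itself may use a client library rather than raw HTTP requests.

```python
from urllib.request import Request

ZOTERO_API_BASE = "https://api.zotero.org"


def build_items_request(library_id, library_type, api_key, limit=25):
    """Build a request for the items of a Zotero user or group library.

    library_type must be "user" or "group"; the two differ only in the
    URL prefix ("users" vs "groups").
    """
    if library_type not in ("user", "group"):
        raise ValueError("library_type must be 'user' or 'group'")
    prefix = "users" if library_type == "user" else "groups"
    url = f"{ZOTERO_API_BASE}/{prefix}/{library_id}/items?limit={limit}"
    headers = {
        "Zotero-API-Key": api_key,    # key created in the Zotero account settings
        "Zotero-API-Version": "3",
    }
    return Request(url, headers=headers)
```

Calling `build_items_request("123456", "group", "YOUR_KEY")` yields a request for `https://api.zotero.org/groups/123456/items?limit=25`; passing it to `urllib.request.urlopen` would perform the actual fetch.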

🔐 Flexible Deployment Options

  • Cloud-based: Using OpenAI API for powerful language models
  • Local/Privacy-conscious: Fully offline operation with open-source models through Ollama
  • Hybrid: Mix and match components based on your needs

Technical Specifications

Requirements

  • Python 3.11.2+
  • Zotero API key and library access
  • For OpenAI versions: OpenAI API key
  • For open-source versions: Ollama installed locally
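The requirements above could be verified before launch with a small preflight check along these lines. This is a sketch only: the function name `check_requirements`, the variant labels `"openai"` and `"opensource"`, and the environment variable name `ZOTERO_API_KEY` are assumptions, and the release may read its keys from the GUI instead of the environment.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 11, 2)


def check_requirements(variant, env=None, python_version=None,
                       find_executable=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    variant is "openai" or "opensource" (labels assumed for this sketch).
    The optional arguments exist so the check can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:3]) if python_version is None else python_version
    problems = []
    if version < MIN_PYTHON:
        problems.append("Python 3.11.2 or newer is required")
    if not env.get("ZOTERO_API_KEY"):       # hypothetical variable name
        problems.append("Zotero API key is not set")
    if variant == "openai" and not env.get("OPENAI_API_KEY"):
        problems.append("OpenAI API key is not set")
    if variant == "opensource" and find_executable("ollama") is None:
        problems.append("Ollama executable was not found on PATH")
    return problems
```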

Embedding Models

  • OpenAI: text-embedding-3-large
  • Open-source: nomic-embed-text, bge-m3

Language Models

  • OpenAI: GPT-4o, GPT-3.5-Turbo
  • Open-source: LLaMA 3.1 (8B), Mistral (7B)
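One way to picture the cloud, local, and hybrid deployment options is as a small configuration object pairing a language model with an embedding model. The sketch below is illustrative only: the `ModelConfig` class, the `PRESETS` names, and the exact model identifier strings are assumptions, not PyZoBot's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Pairs a chat model with an embedding model, each with its provider."""
    llm_provider: str
    llm_model: str
    embedding_provider: str
    embedding_model: str

    @property
    def fully_local(self):
        """True when neither component sends data to a hosted API."""
        return self.llm_provider == "ollama" and self.embedding_provider == "ollama"


# Hypothetical presets built from the models listed in these notes.
PRESETS = {
    "cloud": ModelConfig("openai", "gpt-4o", "openai", "text-embedding-3-large"),
    "local": ModelConfig("ollama", "llama3.1:8b", "ollama", "nomic-embed-text"),
    "hybrid": ModelConfig("openai", "gpt-4o", "ollama", "bge-m3"),
}
```

Only the `local` preset reports `fully_local` as true, which is the property that matters for privacy-sensitive libraries.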

What's Included

  • Complete implementation of all four PyZoBot variants
  • Streamlit-based user interface
  • Document processing pipeline with multiple chunking strategies
  • Vector store indexing and retrieval system
  • Knowledge graph construction and visualization (GraphRAG versions)
  • Example usage and configuration templates
  • Comprehensive documentation

Use Cases

  • Literature review automation
  • Research synthesis and summarization
  • Knowledge discovery across document collections
  • Citation tracking and reference management
  • Cross-disciplinary connection identification
  • Evidence-based response generation

Performance Highlights

  • Demonstrated 60% reduction in literature review time in preliminary studies
  • Maintains expert-level accuracy through grounded responses
  • Transparent citation tracking ensures research integrity
  • Supports processing of large PDF collections from Zotero libraries

Authors

Suad Alshammari¹,², Walaa Abu Rukbah²,³, Lama Basalelah²,⁴, Ali Alsuhibani²,⁵, Ali Alghubayshi²,⁶, Bridget T McInnes⁷, Dayanjan S. Wijesinghe²

¹ Northern Border University, Saudi Arabia
² Virginia Commonwealth University, USA
³ University of Tabuk, Saudi Arabia
⁴ Imam Abdulrahman Bin Faisal University, Saudi Arabia
⁵ Qassim University, Saudi Arabia
⁶ University of Hail, Saudi Arabia
⁷ Virginia Commonwealth University (Computer Science), USA

Citation

If you use PyZoBot in your research, please cite:

Alshammari et al., "PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation," 2024.

License

[Add your license here - e.g., MIT, Apache 2.0, etc.]

Acknowledgments

This work addresses the critical challenge of information overload in biomedical research by combining human expertise through curated Zotero libraries with state-of-the-art AI capabilities.

For detailed installation instructions, usage examples, and documentation, please refer to the README.

Files

dayanjan/PyZoBot-RAG-V1-Release-Zenodo.zip (52.3 kB)
md5:28c9cc6b2b5216c2af27b89dbc30ec9a
