dayanjan/PyZoBot-RAG: PyZoBot - A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation
Description
PyZoBot v1.0 - Zenodo Release Notes
Overview
PyZoBot is an AI-driven platform that integrates Zotero reference management with advanced Retrieval-Augmented Generation (RAG) technology to address information overload in scientific research. This platform enables researchers to perform conversational information extraction and synthesis from their curated Zotero reference libraries.
Key Features
🚀 Four Implementation Variants
- PyZoBot OpenAI: Standard RAG with OpenAI's GPT-3.5-turbo and GPT-4o
- PyZoBot OpenSource: Standard RAG with open-source LLMs (LLaMA 3.1, Mistral) via Ollama
- PyZoBot GraphRAG OpenAI: Knowledge graph-enhanced RAG with OpenAI models
- PyZoBot GraphRAG OpenSource: Knowledge graph-enhanced RAG with open-source models
📚 Document Processing Capabilities
- Multiple Chunking Strategies:
  - Recursive chunking (default): maintains context while splitting documents
  - Layout-aware chunking: preserves document structure and formatting
  - Semantic chunking: AI-driven segmentation based on meaning
- Configurable Parameters: Adjustable chunk size, overlap, and retrieval settings
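To make the chunk size and overlap parameters concrete, here is a minimal sketch of recursive chunking in plain Python. It is not PyZoBot's actual implementation; the function names (`recursive_split`, `merge_with_overlap`, `chunk_text`), the separator order, and the character-based overlap are all assumptions chosen for illustration.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most chunk_size characters,
    trying coarse separators (paragraphs) before fine ones (words)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            pieces = []
            for part in text.split(sep):
                # Each part no longer contains sep, so only finer separators remain.
                pieces.extend(recursive_split(part, chunk_size, separators[i + 1:]))
            return pieces
    # No separator applies: fall back to a hard cut.
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]


def merge_with_overlap(pieces, chunk_size, overlap):
    """Greedily pack pieces into chunks, carrying the last `overlap`
    characters of each finished chunk into the next one."""
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        chunks.append(current)
        tail = current[-overlap:] if overlap else ""
        carried = f"{tail} {piece}".strip()
        # Drop the overlap if it would push the new chunk past the limit.
        current = carried if len(carried) <= chunk_size else piece.strip()
    if current:
        chunks.append(current)
    return chunks


def chunk_text(text, chunk_size=1000, overlap=100):
    """Hypothetical entry point: split recursively, then merge with overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    return merge_with_overlap(recursive_split(text, chunk_size), chunk_size, overlap)


if __name__ == "__main__":
    sample = "First paragraph of a paper.\n\nSecond paragraph with more detail."
    for chunk in chunk_text(sample, chunk_size=40, overlap=10):
        print(repr(chunk))
```

The sketch joins pieces with a single space, so original line breaks are not preserved, and the overlap is measured in characters and may begin mid-word. A production splitter would typically keep separators and may measure size in tokens.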
🔍 Advanced Retrieval Features
- Vector Store Search: ChromaDB-powered similarity search with Maximum Marginal Relevance (MMR)
- Knowledge Graph Visualization: Interactive exploration of entity relationships (GraphRAG versions)
- Transparent Citations: Automatic in-text citations and reference generation
- Source Tracking: Display of specific document chunks used for response generation
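The Maximum Marginal Relevance step mentioned above can be sketched as follows. This is a generic, dependency-free illustration of MMR rather than the code path ChromaDB or PyZoBot uses; the function names, the `lambda_mult` parameter name, and the toy 2-D vectors standing in for embeddings are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by MMR.

    lambda_mult=1.0 ranks purely by relevance to the query;
    lower values penalise similarity to already selected documents.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []

    def score(i):
        # Redundancy is the highest similarity to anything already picked.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[0.9, 0.1], [0.9, 0.12], [0.5, -0.5]]  # docs 0 and 1 are near-duplicates
    print(mmr_select(query, docs, k=2, lambda_mult=0.5))
    print(mmr_select(query, docs, k=2, lambda_mult=1.0))
```

With a balanced `lambda_mult`, the second pick skips the near-duplicate in favour of a less similar chunk, which is the behaviour MMR is meant to provide when several retrieved chunks say nearly the same thing.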
💻 User-Friendly Interface
- Intuitive Streamlit-based GUI requiring no coding expertise
- Seamless Zotero integration via API
- Support for both user and group libraries
- Real-time processing status and feedback
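For readers curious how user and group libraries differ at the API level, the sketch below builds a request for the Zotero Web API v3 without sending it. The helper name `build_items_request` and its parameters are hypothetical, and PyZoBot may well use a client library instead; the `/users/<id>/items` and `/groups/<id>/items` paths and the `Zotero-API-Key` and `Zotero-API-Version` headers follow the public Zotero Web API.

```python
from urllib.parse import urlencode

ZOTERO_API_BASE = "https://api.zotero.org"


def build_items_request(library_type, library_id, api_key, limit=25):
    """Return (url, headers) for listing items in a Zotero library.

    library_type is "user" or "group"; library_id is the numeric
    user ID or group ID shown in the Zotero account settings.
    """
    if library_type not in ("user", "group"):
        raise ValueError("library_type must be 'user' or 'group'")
    prefix = "users" if library_type == "user" else "groups"
    query = urlencode({"format": "json", "limit": limit})
    url = f"{ZOTERO_API_BASE}/{prefix}/{library_id}/items?{query}"
    headers = {
        "Zotero-API-Key": api_key,   # personal key created on zotero.org
        "Zotero-API-Version": "3",
    }
    return url, headers


if __name__ == "__main__":
    # Placeholder ID and key; replace with real values before sending.
    url, headers = build_items_request("group", "1234567", "YOUR_API_KEY")
    print(url)
```

The returned pair can be passed to any HTTP client, for example `urllib.request.Request(url, headers=headers)`.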
🔐 Flexible Deployment Options
- Cloud-based: Using OpenAI API for powerful language models
- Local/Privacy-conscious: Fully offline operation with open-source models through Ollama
- Hybrid: Mix and match components based on your needs
Technical Specifications
Requirements
- Python 3.11.2+
- Zotero API key and library access
- For OpenAI versions: OpenAI API key
- For open-source versions: Ollama installed locally
Embedding Models
- OpenAI: text-embedding-3-large
- Open-source: nomic-embed-text, bge-m3
Language Models
- OpenAI: GPT-4o, GPT-3.5-turbo
- Open-source: LLaMA 3.1 (8B), Mistral (7B)
What's Included
- Complete implementation of all four PyZoBot variants
- Streamlit-based user interface
- Document processing pipeline with multiple chunking strategies
- Vector store indexing and retrieval system
- Knowledge graph construction and visualization (GraphRAG versions)
- Example usage and configuration templates
- Comprehensive documentation
Use Cases
- Literature review automation
- Research synthesis and summarization
- Knowledge discovery across document collections
- Citation tracking and reference management
- Cross-disciplinary connection identification
- Evidence-based response generation
Performance Highlights
- Demonstrated 60% reduction in literature review time in preliminary studies
- Maintains expert-level accuracy through grounded responses
- Transparent citation tracking ensures research integrity
- Supports processing of large PDF collections from Zotero libraries
Authors
Suad Alshammari¹,², Walaa Abu Rukbah²,³, Lama Basalelah²,⁴, Ali Alsuhibani²,⁵, Ali Alghubayshi²,⁶, Bridget T. McInnes⁷, Dayanjan S. Wijesinghe²
¹ Northern Border University, Saudi Arabia
² Virginia Commonwealth University, USA
³ University of Tabuk, Saudi Arabia
⁴ Imam Abdulrahman Bin Faisal University, Saudi Arabia
⁵ Qassim University, Saudi Arabia
⁶ University of Hail, Saudi Arabia
⁷ Virginia Commonwealth University (Computer Science), USA
Citation
If you use PyZoBot in your research, please cite:
Alshammari et al., "PyZoBot: A Platform for Conversational Information Extraction and
Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented
Generation," 2024.
License
[Add your license here - e.g., MIT, Apache 2.0, etc.]
Acknowledgments
This work addresses the critical challenge of information overload in biomedical research by combining human expertise through curated Zotero libraries with state-of-the-art AI capabilities.
For detailed installation instructions, usage examples, and documentation, please refer to the README.
Files
Name | Size | Checksum |
---|---|---|
dayanjan/PyZoBot-RAG-V1-Release-Zenodo.zip | 52.3 kB | md5:28c9cc6b2b5216c2af27b89dbc30ec9a |
Additional details
Related works
- Is supplement to: Software, https://github.com/dayanjan/PyZoBot-RAG/tree/V1-Release-Zenodo
Software
- Repository URL: https://github.com/dayanjan/PyZoBot-RAG