# G-Pro Bot Implementation
## _Xiaohuan Wang_

## Overview
G-Pro Bot is a GIS domain chatbot that is able to comprehend GIS knowledge and generate accurate, context-aware responses. It marks the first application of Graph RAG in developing a chatbot for the GIS domain. 
This repository provides the original Anaconda environment for deploying G-Pro Bot. Follow the instructions below to complete the setup; you can then ask G-Pro Bot anything you want about GIS!

> **Note**:  
> This implementation guide is primarily based on the comprehensive Chinese tutorial available at:
> [Latest!!! Graphrag + ollama Local Deployment Tutorial for llm and embedding Models (Most Comprehensive and Detailed Pitfall Summary on the Internet) + Knowledge Graph Visualization + Manually Tuned Prompt for Custom Entity Recognition](https://blog.csdn.net/EEEric_/article/details/143951187?spm=1001.2014.3001.5506)

## Requirements
- Anaconda/Miniconda installed
- Python 3.10-3.12 (recommended: 3.12)
- Minimum 16GB RAM (32GB recommended for larger models)
- Python libraries: graphrag 0.5.0

## Dataset Information
The dataset used in this project (`./corpus_data/sample GIS knowledge base.txt`) is a curated corpus compiled from twelve publications in the GIS field. All non-textual content such as images, figures, and tables has been removed. Additionally, sensitive information including personal names and geographic identifiers has been excluded to ensure data anonymization and compliance with confidentiality requirements.
Please note: To ensure the confidentiality of this chatbot, the dataset provided in this repository is a sample only and does not contain the full corpus used in the actual implementation.

## Code Information
This repository provides the foundational Anaconda environment required to deploy the G-Pro Bot. The environment was initialized using the following command in the Anaconda Prompt:
```sh
conda create -n G-Pro_bot_implement python=3.12
```
Subsequent implementation steps, as detailed later in this README, incorporate components from Microsoft's GraphRAG project to support retrieval-augmented generation.

## Instruction for Implementation A: Environment Setup
1. Conda Environment Setup

Download the project from Zenodo at https://zenodo.org/uploads/15135292 and move it into `<WhereYouInstalledAnaconda>\envs\`.
Then launch Anaconda Prompt and enter the command below to activate the environment.
```sh
conda activate G-Pro_bot_implement
```
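2. Ollama Installation and Model Download

Use the command below to install the Ollama Python client. It will take a few minutes. Note that the `ollama pull` commands in the next step also require the Ollama application itself to be installed and running on this machine.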
```sh
pip install ollama
```
When it is done, pull these two models.
```sh
ollama pull llama3.1        # LLM model
ollama pull nomic-embed-text  # Embedding model
```
> **Tip**:  You can use this command to verify the installation:
```sh
ollama list
```
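The same check can be scripted. The sketch below is a minimal, hedged example: it assumes that `ollama list` prints a header row followed by one model per line, with the name in the first column in `name:tag` form. The helper names (`parse_model_names`, `missing_models`, `current_listing`) are hypothetical and not part of Ollama or GraphRAG.

```python
import subprocess


def parse_model_names(listing: str) -> set[str]:
    """Return model names (without the ':tag' suffix) from `ollama list` text."""
    names = set()
    for line in listing.splitlines()[1:]:  # skip the header row (assumed layout)
        fields = line.split()
        if fields:
            names.add(fields[0].split(":")[0])
    return names


def missing_models(listing: str, required=("llama3.1", "nomic-embed-text")) -> list[str]:
    """Return the required models that do not appear in the listing."""
    installed = parse_model_names(listing)
    return [name for name in required if name not in installed]


def current_listing() -> str:
    """Run `ollama list` and return its output (requires Ollama on PATH)."""
    return subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
```

With Ollama running, `missing_models(current_listing())` should return an empty list once both models have been pulled.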

3. GraphRAG Setup

Clone GraphRAG, enter the folder, and install it. G-Pro Bot is based on GraphRAG 0.5.0, so be sure to switch to that version before installing.
```sh
git clone https://github.com/microsoft/graphrag.git
cd graphrag
git checkout v0.5.0  # check out the 0.5.0 release tag (list tags with "git tag" if the name differs)
pip install -e .
```

4. Project Initialization

First, create a new directory. The sample GIS knowledge base text file (the corpus data in `./corpus_data`) will be placed here.
```sh
mkdir -p ./bot_implement/input
```
> **File Requirements**:
> - Place only UTF-8 encoded .txt files in ./bot_implement/input
> - Multiple files are supported

Use the command below to initialize the project.
```sh
graphrag init --root ./bot_implement
```
This creates the following files and directories:
- output/ - Model results directory
- logs/ - Operation logs
- settings.yaml - Configuration file
- prompts/ - Prompt templates
- .env - Environment variables

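Because the input directory should contain only UTF-8 encoded `.txt` files, it can help to check the files before indexing. The following is a minimal sketch of such a check; `check_input_files` is a hypothetical helper written for this README, not a GraphRAG function.

```python
from pathlib import Path


def check_input_files(input_dir: str) -> list[str]:
    """Return a list of problems found in the input directory (empty if none)."""
    problems = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # ignore sub-directories
        if path.suffix.lower() != ".txt":
            problems.append(f"{path.name}: not a .txt file")
            continue
        try:
            path.read_bytes().decode("utf-8")  # strict decode fails on other encodings
        except UnicodeDecodeError as err:
            problems.append(f"{path.name}: not valid UTF-8 ({err.reason})")
    return problems
```

Calling `check_input_files("./bot_implement/input")` returns an empty list when every file meets the requirements.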
## Instruction for Implementation B: Configuration
1. Open the generated files and modify their contents. Start with the `.env` file:
```sh
GRAPHRAG_API_KEY=ollama
GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True
```
2. Then edit `settings.yaml`. Change the following values and leave the other generated keys as they are:
```yaml
# LLM Configuration
llm:
  api_key: ollama
  model: llama3.1-8k  # must match a model name shown by `ollama list`
  api_base: http://localhost:11434/v1

# Embeddings Configuration
embeddings:
  llm:
    api_key: ollama
    model: nomic-embed-text
    api_base: http://localhost:11434/api

# Chunking Parameters
chunks:
  size: 300  # Adjust between 100-1200 based on performance

# Visualization Settings
snapshots:
  graphml: true  # Set false to disable knowledge graph visualization
```
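To get a feel for the chunk size setting, the sketch below splits a stand-in document into fixed-size chunks and counts them. It is only an illustration under simplifying assumptions: whitespace-separated words stand in for model tokens, no overlap is applied, and `chunk_tokens` is a hypothetical helper, not GraphRAG's own chunker.

```python
def chunk_tokens(tokens: list, size: int) -> list[list]:
    """Split a token list into consecutive chunks of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Stand-in document of 3000 "tokens" (whitespace words, not real model tokens).
words = ("gis " * 3000).split()
for size in (100, 300, 1200):
    # Prints 30, 10 and 3 chunks respectively.
    print(size, len(chunk_tokens(words, size)))
```

Smaller chunks produce more pieces for the same input, which generally means more LLM calls during indexing, while larger chunks give the model more text per call.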

## Instruction for Implementation C: Source Code Modifications
Before running the pipeline, some of the GraphRAG source code needs to be modified.
1. Open `graphrag/graphrag/llm/openai/openai_embeddings_llm.py`. Find the lines shown commented out below and replace them with the new code.
```python
import ollama  # add to the imports at the top of the file

# Original code (comment out or delete):
# embedding = await self.client.embeddings.create(
#     input=input,
#     **args,
# )
# return [d.embedding for d in embedding.data]

# Replacement: embed each input string through the local Ollama server
embedding_list = []
for inp in input:
    embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
    embedding_list.append(embedding["embedding"])
return embedding_list
```
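2. Open `graphrag/graphrag/query/llm/oai/embedding.py`. Find the lines shown commented out below and replace them with the new code.
```python
import ollama  # add to the imports at the top of the file

# Original code (comment out or delete):
# embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
# Replacement: embed the chunk through the local Ollama server
embedding = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]

# Original code (comment out or delete):
# chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
# chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
# return chunk_embeddings.tolist()
# Replacement: return the collected embeddings unchanged
return chunk_embeddings
```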
3. Open `graphrag/graphrag/query/llm/text_utils.py`. Add the following line to the `chunk_text()` function:
```python
tokens = token_encoder.decode(tokens)  # Decode tokens to string
```
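Before patching the files, the embedding loop used in the first two modifications can be tried on its own. The sketch below is a hedged, standalone version of that loop: `embed_texts` and `ollama_embed_one` are hypothetical helper names, and the embedding function is passed in as a parameter so the loop can also be exercised without a running Ollama server.

```python
from typing import Callable, Sequence


def embed_texts(
    texts: Sequence[str], embed_one: Callable[[str], Sequence[float]]
) -> list[list[float]]:
    """Embed each text separately, mirroring the per-input loop in the patches."""
    return [list(embed_one(text)) for text in texts]


def ollama_embed_one(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Embed one string through a running local Ollama server."""
    import ollama  # imported lazily so the sketch loads without the package

    return ollama.embeddings(model=model, prompt=text)["embedding"]
```

With Ollama running and the embedding model pulled, `embed_texts(["What is a shapefile?"], ollama_embed_one)` should return a list containing one vector of floats.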

## Instruction for Implementation D: Execute Indexing Pipeline and Query the chatbot!
After finishing the steps above, return to Anaconda Prompt and run the pipeline. This step may take hours, depending on the input data size and model size.
```sh
graphrag index --root ./bot_implement
```
Congratulations on implementing G-Pro Bot! You can now ask anything you want about GIS using the command below!
```sh
graphrag query --root ./bot_implement --method global --query "Your question"
```
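If you would rather send questions from a script, the query command above can be wrapped in Python. This is a minimal sketch that only assembles and runs the same CLI call shown above; `build_query_command` and `ask` are hypothetical helpers, and the format of the returned text is whatever the `graphrag` CLI prints.

```python
import subprocess


def build_query_command(
    question: str, root: str = "./bot_implement", method: str = "global"
) -> list[str]:
    """Assemble the `graphrag query` command line as an argument list."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return ["graphrag", "query", "--root", root, "--method", method, "--query", question]


def ask(question: str, **kwargs) -> str:
    """Run the query and return the CLI's standard output."""
    result = subprocess.run(
        build_query_command(question, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `print(ask("What is a spatial join?"))` runs a global query against the index built in the previous step. Passing the question as a list element avoids shell quoting problems.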

