Querying DSpace: An AI-Powered Conversation Application using RAG with LangChain
Authors/Creators
1. University of Oklahoma, United States of America
2. University of Oregon, United States of America
Description
AI has the potential to significantly impact the open access repository development landscape in various ways, such as enabling better search and content recommendation, identifying new patterns in scholarly content, and promoting openness in datasets and content. Large language models (LLMs) have emerged as crucial and widely used resources in natural language processing, a subfield of artificial intelligence (AI) that shares common ground with machine learning (ML). LLMs allow computers to comprehend and produce text in a manner that resembles human communication.
Our goal during the experiment was to create a conversational application that integrates OpenAI to query DSpace using natural language processing (NLP). We explored technologies such as LLMs, the OpenAI API, LangChain, embeddings, and vector stores. LLMs are deep learning models trained on large datasets. The OpenAI API provides a cloud interface for accessing OpenAI's machine learning models. LangChain is an AI framework for building language-based applications. Embeddings encode information in high-dimensional vector spaces. Vector stores are databases that store vector embeddings of non-numerical data. To produce better responses, we used retrieval-augmented generation (RAG) to incorporate additional, real-time data from DSpace, allowing the application to answer questions against the most up-to-date content in the repository.
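The sketch below illustrates the kind of RAG pipeline described above: repository records are embedded into a vector store, and a LangChain retrieval chain supplies the most relevant records to an OpenAI model as context. It is a minimal example, not the presenters' implementation; the fetch_dspace_items helper, the DSPACE_URL endpoint, and the exact JSON paths into the DSpace REST response are assumptions that may differ by DSpace and LangChain version.

```python
# Minimal RAG sketch with LangChain + OpenAI over DSpace records.
# Assumes: langchain, langchain-openai, langchain-community, faiss-cpu, requests,
# and an OPENAI_API_KEY in the environment.
import requests

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Hypothetical DSpace 7 search endpoint; adjust to your repository.
DSPACE_URL = "https://demo.dspace.org/server/api/discover/search/objects"


def fetch_dspace_items(query: str, page_size: int = 50) -> list[str]:
    """Illustrative helper: pull titles and abstracts from the DSpace REST API.

    The response structure shown here follows DSpace 7 conventions but may
    vary between versions, so treat the JSON paths as an assumption.
    """
    resp = requests.get(DSPACE_URL, params={"query": query, "size": page_size})
    resp.raise_for_status()
    objects = resp.json()["_embedded"]["searchResult"]["_embedded"]["objects"]
    texts = []
    for obj in objects:
        metadata = obj["_embedded"]["indexableObject"]["metadata"]
        title = metadata.get("dc.title", [{"value": ""}])[0]["value"]
        abstract = metadata.get("dc.description.abstract", [{"value": ""}])[0]["value"]
        texts.append(f"{title}\n{abstract}")
    return texts


# 1. Retrieve current repository content and embed it into a vector store.
texts = fetch_dspace_items("machine learning")
vector_store = FAISS.from_texts(texts, OpenAIEmbeddings())

# 2. Build a retrieval-augmented QA chain: records are retrieved by vector
#    similarity and passed to the LLM as grounding context.
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
)

# 3. Ask a natural-language question answered from the repository's own data.
print(qa_chain.invoke({"query": "Which recent items discuss machine learning?"}))
```

Because the retrieval step runs against content pulled from the repository at query time, the generated answers reflect what is currently in DSpace rather than only what the model saw during training.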
Files
146_Zhang_QueryingDSpace.pdf (1.4 MB, md5:63292708a9f60e1cf770f370d70396d9)