Published March 11, 2025 | Version v1.1.0
Software Open

GeneAgent: Self-verification Language Agent for Gene Set Analysis using Domain Databases

Creators

  • 1. National Institutes of Health

Description

Recent work in gene set analysis has shown promising performance using large language models (LLMs). Nonetheless, their results are subject to limitations common in LLMs, such as hallucinations. In response, we develop GeneAgent, a language agent for gene set analysis that self-verifies by autonomously interacting with biological databases, reducing hallucinations and enhancing accuracy. GeneAgent generates novel function names or aligns with notable enriched terms for input gene sets. Benchmarked on 1,106 gene sets from different sources, GeneAgent consistently outperforms vanilla GPT-4 by a significant margin. A detailed manual review confirms the effectiveness of the self-verification module in minimizing hallucinations and generating a more reliable explanatory analysis. We also apply GeneAgent to seven novel gene sets derived from mouse B2905 melanoma cell lines, with expert evaluations showing that GeneAgent offers valuable insights into gene functions and expedites knowledge discovery.

Files

ncbi-nlp/GeneAgent-v1.1.0.zip

Files (10.4 MB)

Name: ncbi-nlp/GeneAgent-v1.1.0.zip
Size: 10.4 MB
Checksum: md5:c235caec269d2df62e1ad8b59464fcf9
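
The listed MD5 checksum can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_download`, and the local file name in the usage note, are illustrative assumptions and are not part of the GeneAgent release.

```python
import hashlib

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "c235caec269d2df62e1ad8b59464fcf9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved locally as `GeneAgent-v1.1.0.zip` (a hypothetical path), calling `verify_download("GeneAgent-v1.1.0.zip")` returns `True` only when the file's digest matches the listed checksum. MD5 detects accidental corruption during transfer; it is not a guarantee against deliberate tampering.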

Additional details

Related works