Published January 9, 2026 | Version v1
Software Open

Matcher (Version 1) for Automated Task Alignment in the Genomic API for Model Evaluation (GAME)

Description

This record provides a Matcher container that runs an automated task alignment service powered by a local LLM using Ollama, to perform semantic ontology matching. It bundles the gemma3:12b model and all necessary Python dependencies to map fuzzy, free-text user inputs to canonical terms from a controlled vocabulary. It operates as a standalone TCP server, accepting JSON-formatted requests and returning the best-matched term.

Matcher V1: Recursive Tournament

This version is a direct evolution of V0 (deprecated V1), which used raw TCP sockets. It introduces significant algorithmic and accuracy improvements and adopts a REST API framework (FastAPI) for greater scalability. It retains the core architecture of V0: a robust framework for LLM-based entity matching in three key genomic domains: cell types, species, and binding molecules (e.g., transcription factors, histone modifications).

Core Functionality and Improvements from V0

  • LLM-Powered Matching: Utilizes the gemma3:12b model via the Ollama framework to understand the semantic content of a user's input term.
  • Prompting: Employs sophisticated, few-shot prompt engineering to guide the LLM's reasoning.
  • Recursive Tournament Algorithm: The "Chunk-and-Compete" method of V0 is upgraded to a more scalable, multi-stage recursive tournament.
    1. Chunking: The full list of potential choices is split into smaller, manageable chunks (e.g., 20 items each). The LLM then selects the best candidate ("champion") within each chunk.
    2. Recursive Chunking: After the initial chunking round, the algorithm checks the number of resulting champions. If it exceeds the chunk size, it treats the champions as a new list to be chunked and runs another elimination round. This process repeats recursively, like a tournament bracket, until a small group of finalists remains for the final decision. This ensures the matcher can gracefully handle massive choice lists without failure.
  • Enhanced Granularity Matching: The prompt for cell_type matching has been refined with new instructions and examples.
    • V1 is now better able to discern the required level of detail. For instance, given the input "mammary epithelial cell", it can correctly choose "mammary epithelial cell female" over the more specific "mammary epithelial cell female adult (23 years)" from a list of choices, and vice versa if the input is more specific. This leads to more contextually appropriate matches.
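
The recursive tournament described above can be sketched in Python as follows. This is an illustrative sketch, not the Matcher's actual code: the names chunked, tournament, and toy_pick are hypothetical, and toy_pick substitutes plain string similarity for the LLM call so that the example runs without a model.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence

# A picker receives the user's term and one chunk of choices and returns the
# champion of that chunk. In the Matcher this role is played by the LLM.
Picker = Callable[[str, Sequence[str]], str]


def chunked(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def tournament(query: str, choices: Sequence[str], pick_best: Picker,
               chunk_size: int = 20) -> str:
    """Run elimination rounds until one chunk of finalists remains."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each round shrinks the list")
    candidates = list(choices)
    if not candidates:
        raise ValueError("choices must not be empty")
    # Base case: the finalists fit in one chunk, so make the final decision.
    if len(candidates) <= chunk_size:
        return pick_best(query, candidates)
    # Elimination round: one champion per chunk, then recurse on the champions.
    champions = [pick_best(query, chunk) for chunk in chunked(candidates, chunk_size)]
    return tournament(query, champions, pick_best, chunk_size)


def toy_pick(query: str, chunk: Sequence[str]) -> str:
    """Stand-in for the LLM (assumption): pick the most similar string."""
    return max(chunk, key=lambda c: SequenceMatcher(None, query.lower(), c.lower()).ratio())
```

For example, tournament("mammary epithelial cells", choices, toy_pick) with a list of 101 choices and the default chunk size runs six first-round picks and one final pick. With a real LLM picker, the number of model calls grows roughly linearly with the list length, while each individual prompt stays bounded by the chunk size.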

Running the container

Ensure Apptainer is installed on the system where the container will run. Always start the Matcher first, so that it is listening for incoming connections from Predictors:

apptainer run --containall --nv matcher.sif MATCHER_IP MATCHER_PORT

Note on Flags:

  • --nv: This flag enables NVIDIA GPU support inside the container. It is essential for performance, as the LLM requires GPU acceleration for timely inference.
  • --containall: This flag ensures the container is fully self-contained. It prevents the container from accessing the user's home directory or other host system files, guaranteeing that the service runs with only the software and libraries packaged within it for maximum reproducibility.
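
Because the Matcher must be listening before any Predictor connects, a launcher script may want to wait until the port accepts connections. The following is a minimal sketch using only the Python standard library. The function name wait_for_port and its timeout defaults are assumptions for illustration and are not part of the container. It checks only that the TCP port accepts connections, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A launcher could call wait_for_port(MATCHER_IP, MATCHER_PORT), using the same values passed to apptainer run, and start the Predictors only if it returns True.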
 
Additional information about the GAME framework can be found on GitHub: Genomic API for Model Evaluation

Files

Files (9.8 GB)

md5:d074f117f22cf2f826b7e126603abbed
Size: 9.8 GB
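
Given the size of the download, it can be worth checking the file against the MD5 checksum listed for this record before running it. A minimal sketch follows. The helper name md5_of_file is hypothetical, and the file name matcher.sif is assumed from the run command above.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "d074f117f22cf2f826b7e126603abbed"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Stream the file so a multi-gigabyte image never sits in memory at once.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

A download can then be checked with md5_of_file("matcher.sif") == EXPECTED_MD5. A mismatch usually indicates an incomplete or corrupted download.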