Published February 24, 2025 | Version v1
Dataset Open

BioSample dataset for ontology mapping evaluation

  • 1. Research Organization of Information and Systems
  • 2. Chiba University
  • 3. Joho System Kenkyu Kiko Life Science Togo Database Center
  • 4. RIKEN
  • 5. Kumamoto University
  • 6. Kyoto University

Description

Datasets for evaluating the automatic curation of BioSample records using large language models (LLMs).

Files

  • biosample_cellosaurus_mapping_gold_standard.tsv
    The gold-standard dataset for the cell line ontology mapping task. This was manually created by the authors.

  • biosample_cellosaurus_mapping_testset.json
    The BioSample dataset used for the cell line ontology mapping task. This was collected using the EBI BioSamples API.

  • biosample_cellosaurus_mapping_result_llm_assisted.tsv
    The result of the cell line ontology mapping of the test dataset by the LLM-assisted pipeline.

  • biosample_cellosaurus_mapping_result_metasra.tsv
    The result of the cell line ontology mapping of the test dataset by directly using the MetaSRA pipeline.

  • biosample_gene_extraction_testset.json
    The BioSample dataset used for the gene name extraction task. This was collected using the EBI BioSamples API.

  • biosample_gene_extraction_result.tsv 
    The result of the gene name extraction from the test dataset by the LLM-assisted pipeline.
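
The files listed above are plain TSV and JSON, so they can be inspected with the Python standard library alone. The following is a minimal sketch, not the authors' tooling: the helper names (`load_tsv`, `load_json`, `summarize`) are hypothetical, the column layout of the TSV files is not documented here and is therefore left uninterpreted, and the JSON files are assumed to hold a list or object at the top level.

```python
import csv
import json
from pathlib import Path


def load_tsv(path):
    """Read a tab-separated file into a list of rows (each row a list of strings).

    No header handling is attempted, because the column layout is an unknown here.
    Quote characters are treated as literal text (QUOTE_NONE).
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def load_json(path):
    """Parse a JSON file and return the decoded Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(data_dir):
    """Map each .tsv/.json file name in data_dir to its row or top-level item count.

    Assumes every JSON file decodes to a list or dict, so that len() applies.
    """
    summary = {}
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix == ".tsv":
            summary[path.name] = len(load_tsv(path))
        elif path.suffix == ".json":
            summary[path.name] = len(load_json(path))
    return summary
```

Calling `summarize(".")` in the directory holding the downloaded files gives a quick size check before any task-specific parsing; interpreting the individual columns or JSON keys requires looking at the files themselves or at the repository.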

Licenses

biosample_cellosaurus_mapping_gold_standard.tsv is licensed under CC BY 4.0.

All other files are licensed under CC0.

Files (17.2 MB)

Checksum                                Size
md5:c9edfb4bea76656a22dd095b0d1eb4d3    22.2 kB
md5:90a11f62706353f226c9d36d161080d8    577.6 kB
md5:1b06e0478a3014fa99991d7901c16db4    47.8 kB
md5:ae35e7a975c397b13fb729d3e76adedd    1.8 MB
md5:2c1ffe542b93b31c857541ec05703b1f    3.6 MB
md5:b9258d7a1ba4a2f6463fd9e20c4e7e5f    11.2 MB
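
The md5 values listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's `hashlib`; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the expected value is assumed to be given in the `md5:<hex digest>` form used in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum string of the form 'md5:<hex digest>'."""
    algorithm, _, hex_digest = expected.partition(":")
    if algorithm != "md5" or not hex_digest:
        raise ValueError(f"expected 'md5:<hex digest>', got {expected!r}")
    return md5_of_file(path) == hex_digest.lower()
```

A call such as `matches_checksum(downloaded_path, "md5:...")`, with the checksum copied from the listing, returns `True` only when the local copy is byte-for-byte identical to the published file.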

Additional details

Dates

Available
2025-02-24

Software

Repository URL
https://github.com/sh-ikeda/bsllmner
Programming language
Python
Development Status
Active