Published February 24, 2025 | Version v1
Dataset
Open access
BioSample dataset for ontology mapping evaluation
Description
Datasets for the evaluation of automatic curation of BioSample using Large Language Models (LLMs).
Files
- biosample_cellosaurus_mapping_gold_standard.tsv
  The gold-standard dataset for the cell line ontology mapping task, created manually by the authors.
- biosample_cellosaurus_mapping_testset.json
  The BioSample dataset used for the cell line ontology mapping task, collected using the EBI BioSamples API.
- biosample_cellosaurus_mapping_result_llm_assisted.tsv
  The result of the cell line ontology mapping of the test dataset by the LLM-assisted pipeline.
- biosample_cellosaurus_mapping_result_metasra.tsv
  The result of the cell line ontology mapping of the test dataset by the MetaSRA pipeline used directly.
- biosample_gene_extraction_testset.json
  The BioSample dataset used for the gene name extraction task, collected using the EBI BioSamples API.
- biosample_gene_extraction_result.tsv
  The result of the gene name extraction from the test dataset by the LLM-assisted pipeline.
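Since the two mapping results are meant to be compared against the gold standard, the following is a minimal sketch of one common way to score them (precision, recall and F1 over exact Cellosaurus ID matches). It is not taken from the authors' repository. The column names `accession` and `cellosaurus_id` are hypothetical placeholders, because this page does not document the TSV headers; adjust `ACCESSION_COL` and `TERM_COL` after inspecting the files.

```python
import csv
from typing import Dict

# Hypothetical column names (assumption): the real TSV headers are not
# documented on this page, so change these to match the files.
ACCESSION_COL = "accession"
TERM_COL = "cellosaurus_id"


def load_mapping(path: str) -> Dict[str, str]:
    """Read a TSV file into a {BioSample accession: Cellosaurus ID} dict.

    Rows whose term column is empty are skipped and treated as "no mapping".
    """
    mapping: Dict[str, str] = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            term = (row.get(TERM_COL) or "").strip()
            if term:
                mapping[row[ACCESSION_COL].strip()] = term
    return mapping


def score(gold: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted mappings against the gold standard.

    A prediction counts as correct only if the gold standard maps the same
    accession to exactly the same term.
    """
    true_pos = sum(1 for acc, term in predicted.items() if gold.get(acc) == term)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    # If true_pos > 0, precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Usage, assuming the files have been downloaded to the working directory:

```python
gold = load_mapping("biosample_cellosaurus_mapping_gold_standard.tsv")
for name in ("biosample_cellosaurus_mapping_result_llm_assisted.tsv",
             "biosample_cellosaurus_mapping_result_metasra.tsv"):
    print(name, score(gold, load_mapping(name)))
```

Exact ID matching is the simplest choice; if the gold standard also records samples that should receive no mapping, those would need separate handling as true negatives.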
Licenses
biosample_cellosaurus_mapping_gold_standard.tsv is licensed under CC BY 4.0.
All other files are licensed under CC0.
Files (17.2 MB)
MD5 checksum | Size |
---|---|
c9edfb4bea76656a22dd095b0d1eb4d3 | 22.2 kB |
90a11f62706353f226c9d36d161080d8 | 577.6 kB |
1b06e0478a3014fa99991d7901c16db4 | 47.8 kB |
ae35e7a975c397b13fb729d3e76adedd | 1.8 MB |
2c1ffe542b93b31c857541ec05703b1f | 3.6 MB |
b9258d7a1ba4a2f6463fd9e20c4e7e5f | 11.2 MB |
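To check a download against the published MD5 checksums, a small helper built on Python's standard `hashlib` is enough. This is a sketch, not part of the dataset or its repository. Because the file table on this page does not show which file each checksum belongs to, it tests a file against the whole set of listed checksums instead of a single per-file value; the function names `md5sum` and `is_published_file` are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Union

# MD5 checksums listed in the file table of this record. The table does not
# pair them with file names, so files are checked against the whole set.
PUBLISHED_MD5 = {
    "c9edfb4bea76656a22dd095b0d1eb4d3",
    "90a11f62706353f226c9d36d161080d8",
    "1b06e0478a3014fa99991d7901c16db4",
    "ae35e7a975c397b13fb729d3e76adedd",
    "2c1ffe542b93b31c857541ec05703b1f",
    "b9258d7a1ba4a2f6463fd9e20c4e7e5f",
}


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_published_file(path: Union[str, Path]) -> bool:
    """True if the file's MD5 matches one of the published checksums."""
    return md5sum(path) in PUBLISHED_MD5
```

For example, `for p in sorted(Path(".").glob("biosample_*")): print(p.name, is_published_file(p))` reports each downloaded file. Reading in chunks keeps memory use flat even for the 11.2 MB file.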
Additional details
Dates
- Available: 2025-02-24
Software
- Repository URL: https://github.com/sh-ikeda/bsllmner
- Programming language: Python
- Development status: Active