Published May 23, 2025 | Version v5
Dataset
Open
AMSunda: A Novel Dataset for Sundanese Information Retrieval
Authors/Creators
Description
The AMSunda dataset was introduced as the first resource designed explicitly for fine-tuning and evaluating embedding models in the Sundanese language. It consists of two types of data: (1) triplet data, each record containing a query passage, a positive response, and a negative response, intended for fine-tuning embedding models; and (2) BEIR-compatible data structured for evaluating embedding models on retrieval tasks.
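To make the two data types concrete, the following is a minimal sketch of how such records might be read in Python. It is illustrative only: the triplet field names (`query`, `positive`, `negative`), the helper names `load_jsonl` and `load_qrels`, and all sample strings are assumptions and placeholders, not taken from the AMSunda files. The BEIR part follows the general BEIR layout (a JSON Lines corpus and query file keyed by `_id`, plus a tab-separated relevance file with `query-id`, `corpus-id`, and `score` columns); the actual file names and fields in this dataset should be checked against the downloaded archive.

```python
import csv
import io
import json

# Hypothetical triplet record. The keys are assumed names for the
# query passage, positive response, and negative response.
triplet = {
    "query": "placeholder query passage",
    "positive": "placeholder relevant response",
    "negative": "placeholder irrelevant response",
}

# Placeholder BEIR-style content, inlined so the sketch runs without files.
CORPUS_JSONL = (
    '{"_id": "d1", "title": "", "text": "placeholder passage one"}\n'
    '{"_id": "d2", "title": "", "text": "placeholder passage two"}\n'
)
QUERIES_JSONL = '{"_id": "q1", "text": "placeholder query"}\n'
QRELS_TSV = "query-id\tcorpus-id\tscore\nq1\td1\t1\n"


def load_jsonl(text):
    """Parse JSON Lines text into a dict keyed by each record's `_id`."""
    records = {}
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            records[record["_id"]] = record
    return records


def load_qrels(text):
    """Parse tab-separated relevance judgments into {query_id: {doc_id: score}}."""
    qrels = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return qrels


corpus = load_jsonl(CORPUS_JSONL)
queries = load_jsonl(QUERIES_JSONL)
qrels = load_qrels(QRELS_TSV)
```

In this layout, the triplets would feed a contrastive fine-tuning objective, while `corpus`, `queries`, and `qrels` together are what a retrieval evaluation needs: the documents to index, the queries to run, and the judgments to score against.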