Published April 27, 2025
Version v1
Resource type: Dataset
Access: Open
AMSunda: A Novel Dataset for Sundanese Information Retrieval
Authors/Creators
Description
The AMSunda dataset is introduced as the first resource designed explicitly for fine-tuning and evaluating embedding models in Sundanese. It consists of two parts: (1) triplet data, in which each triplet contains a query passage, a positive response, and a negative response, intended for fine-tuning embedding models; and (2) BEIR-compatible data, structured for evaluating embedding models on retrieval tasks.
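For the BEIR-compatible part, the sketch below assumes AMSunda follows the standard BEIR layout: a `corpus.jsonl` and `queries.jsonl` keyed by `_id`, plus a tab-separated qrels file with the header `query-id`, `corpus-id`, `score`. The file names and the toy records are assumptions for illustration, not AMSunda content. The field names of the triplet part are not documented on this record, so the sketch does not try to parse it.

```python
import csv
import io
import json
from collections import defaultdict


def load_jsonl(text):
    """Parse JSON Lines text into a dict keyed by each record's "_id" field."""
    records = {}
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            obj = json.loads(line)
            records[obj["_id"]] = obj
    return records


def load_qrels(text):
    """Parse a BEIR-style qrels TSV (header: query-id, corpus-id, score)
    into {query_id: {doc_id: relevance}}."""
    qrels = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        qrels[row["query-id"]][row["corpus-id"]] = int(row["score"])
    return dict(qrels)


# Toy placeholder content, NOT taken from AMSunda.
corpus_text = (
    '{"_id": "d1", "title": "", "text": "first passage"}\n'
    '{"_id": "d2", "title": "", "text": "second passage"}\n'
)
queries_text = '{"_id": "q1", "text": "a query"}\n'
qrels_text = "query-id\tcorpus-id\tscore\nq1\td1\t1\n"

corpus = load_jsonl(corpus_text)
queries = load_jsonl(queries_text)
qrels = load_qrels(qrels_text)
print(qrels)  # {'q1': {'d1': 1}}
```

To read real files, pass the file contents instead, e.g. `load_jsonl(open("corpus.jsonl", encoding="utf-8").read())`. If the data does follow the standard layout, the `beir` package's `GenericDataLoader(data_folder=...).load(split="test")` should load it directly.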
Files
Total size: 5.6 MB
| MD5 checksum | Size |
|---|---|
| e52f6f2fd7a24be7a6865aeea2383bca | 2.1 MB |
| 1b0229f379d1da04d7f88697da72dd08 | 569.3 kB |
| 40545d7d5e757a43d3c7478b829b5c91 | 736.6 kB |
| 9477161c8bab6424ec72f46ddf6ffa5d | 2.2 MB |
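Downloads can be checked against the MD5 checksums listed on this record. Because the listing here does not preserve file names, this minimal sketch only tests whether a downloaded file's digest matches one of the four listed values. The function names are illustrative, not part of any AMSunda tooling. MD5 is adequate for detecting corrupted downloads but should not be relied on against deliberate tampering.

```python
import hashlib

# MD5 checksums as listed on this record (file names are not preserved here).
LISTED_MD5 = {
    "e52f6f2fd7a24be7a6865aeea2383bca",
    "1b0229f379d1da04d7f88697da72dd08",
    "40545d7d5e757a43d3c7478b829b5c91",
    "9477161c8bab6424ec72f46ddf6ffa5d",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading it in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 matches one of the checksums listed on the record."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `is_listed("downloaded_file")` (a hypothetical path) should return `True` for an intact download.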