There is a newer version of the record available.

Published April 27, 2025 | Version v1
Dataset Open

AMSunda: A Novel Dataset for Sundanese Information Retrieval

Description

The AMSunda dataset was introduced as the first resource designed explicitly for fine-tuning and evaluating embedding models in the Sundanese language. It consists of two types of data: (1) triplet data, in which each example contains a query passage, a positive response, and a negative response, intended for fine-tuning embedding models, and (2) BEIR-compatible data structured for evaluating embedding models on retrieval tasks.
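The two data types described above can be pictured with a minimal Python sketch. The record does not state AMSunda's file names or field names, so everything here is an assumption: the `Triplet` class and its field names are hypothetical, the BEIR part follows the commonly used BEIR layout (a JSON Lines corpus and query file keyed by `_id`, plus a tab-separated qrels file with `query-id`, `corpus-id`, and `score` columns), and the sample strings are placeholders rather than dataset content.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one fine-tuning example."""
    query: str
    positive: str
    negative: str


def parse_jsonl(text):
    """Parse JSON Lines text into a dict keyed by the BEIR-style "_id" field."""
    records = {}
    for line in text.splitlines():
        if line.strip():
            row = json.loads(line)
            records[row["_id"]] = row
    return records


def parse_qrels(text):
    """Parse a BEIR-style qrels TSV into {query_id: {doc_id: score}}."""
    qrels = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return qrels


# Placeholder content only; none of this text comes from AMSunda.
corpus = parse_jsonl(
    '{"_id": "d1", "title": "", "text": "placeholder passage one"}\n'
    '{"_id": "d2", "title": "", "text": "placeholder passage two"}\n'
)
queries = parse_jsonl('{"_id": "q1", "text": "placeholder query"}\n')
qrels = parse_qrels("query-id\tcorpus-id\tscore\nq1\td1\t1\n")

# A triplet pairs a query with one relevant and one non-relevant text.
example = Triplet(
    query=queries["q1"]["text"],
    positive=corpus["d1"]["text"],
    negative=corpus["d2"]["text"],
)
```

The triplet form feeds contrastive fine-tuning, while the corpus, queries, and qrels form is what a retrieval evaluation needs to score ranked results against relevance judgments.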

Files (5.6 MB)

Checksum                                Size
md5:e52f6f2fd7a24be7a6865aeea2383bca    2.1 MB
md5:1b0229f379d1da04d7f88697da72dd08    569.3 kB
md5:40545d7d5e757a43d3c7478b829b5c91    736.6 kB
md5:9477161c8bab6424ec72f46ddf6ffa5d    2.2 MB
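The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the local path in the usage note is hypothetical because the file names are not shown in this listing.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as "md5:<hex digest>"."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("downloaded_file", "md5:e52f6f2fd7a24be7a6865aeea2383bca")` would return `True` only if the file saved at the hypothetical path `downloaded_file` is the 2.1 MB file listed first.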