RoBERTaSense-FACIL: A Technical Report and Model Selection Study for Meaning Preservation in Easy-to-Read Spanish Texts
Authors/Creators
Description
RoBERTaSense-FACIL is a Spanish Transformer-based model fine-tuned to evaluate meaning preservation in Easy-to-Read (E2R) text adaptations. The model builds on RoBERTa-base-bne and is fine-tuned on a balanced dataset of expert-validated E2R adaptations together with automatically generated hard negatives that introduce structural, semantic, and cross-textual distortions.
This technical report describes the full methodology: dataset construction, the hard-negative generation framework, the fine-tuning process, and a comparative evaluation of three models: MeaningBERT, RoBERTa-base-bne, and a BERTScore-based regression variant. Results show that the fine-tuned RoBERTa-base-bne, referred to as RoBERTaSense-FACIL, achieves the most robust and reliable performance for binary meaning-preservation classification on Spanish E2R texts.
Data availability:
The datasets and intermediate scripts used in this work cannot be made publicly available due to privacy and copyright restrictions; however, access may be granted upon reasonable request for academic research purposes.
Model availability:
The RoBERTaSense-FACIL model is publicly available on Hugging Face:
https://huggingface.co/oeg/RoBERTaSense-FACIL